JP4865607B2

JP4865607B2 - Karaoke apparatus, singing evaluation method and program

Info

Publication number: JP4865607B2
Application number: JP2007064108A
Authority: JP
Inventors: 達也入山; 拓弥 ▲高▼橋; 聡橘; 豪矢吹
Original assignee: Yamaha Corp; Daiichikosho Co Ltd
Current assignee: Yamaha Corp; Daiichikosho Co Ltd
Priority date: 2007-03-13
Filing date: 2007-03-13
Publication date: 2012-02-01
Anticipated expiration: 2027-03-13
Also published as: JP2008225115A

Description

The present invention relates to a technique for evaluating special singing techniques in a karaoke apparatus that scores singing.

Some karaoke apparatuses have a scoring function that displays the skill of a singer's singing as a score. Among such scoring functions, some compare data such as pitch data and volume data extracted from the singer's singing voice signal with data corresponding to the vocal melody (guide melody) of the karaoke song, so that the scoring result corresponds as closely as possible to the actual skill of the singing (see, for example, Patent Document 1).
Patent Document 1: Japanese Patent Laid-Open No. 10-69216

A karaoke apparatus with such a scoring function can score singing by comparing pitch changes and the like note by note. However, this scoring function compares the singer's singing against a guide melody stored as MIDI (Musical Instrument Digital Interface) data, so the scoring is limited to the notes written on the score. As a result, the scoring result can differ from the actual impression of skill. For example, when a singer lowers the pitch at the end of a note in the middle of a song (hereinafter referred to as fall singing), the singing may sound skillful, yet the score may be lowered because the pitch drifts away from the pitch that should be sung.

The present invention has been made in view of the above circumstances, and its object is to provide a karaoke apparatus, a singing evaluation method, and a program that detect and evaluate fall singing.

To solve the above problem, the present invention provides a karaoke apparatus comprising: playback means for playing back a song; voice input means for generating singer voice data based on the singing of a singer input during playback of the song; pitch extraction means for extracting the pitch of the singer's singing based on the singer voice data; first specifying means for specifying, when there is a period in which the pitch extraction means cannot extract a pitch continuously for a predetermined time or longer, the pitch extracted by the pitch extraction means at a predetermined timing before that period as a first pitch; second specifying means for specifying, as a second pitch, the pitch extracted by the pitch extraction means at a second timing that precedes, by a preset set time, the first timing at which the first pitch was extracted; identification means for outputting an identification signal when the second pitch is higher than the first pitch by a predetermined pitch or more; and evaluation means for performing a predetermined process based on the identification signal output from the identification means.

The present invention also provides a karaoke apparatus comprising: storage means for storing guide melody data indicating the melody that a singer should sing in correspondence with the progress of a song; voice input means for generating singer voice data based on the singing of the singer input in correspondence with the progress of the song; pitch extraction means for extracting the pitch of the singer's singing in correspondence with the progress of the song based on the singer voice data; first specifying means for specifying, when there is a period in which the pitch extraction means cannot extract a pitch continuously for a predetermined time or longer with respect to the progress of the song, the pitch extracted by the pitch extraction means at a predetermined timing before that period as a first pitch; pitch calculation means for calculating, among the sounds constituting the melody indicated by the guide melody data, the pitch of the sound corresponding to the timing, with respect to the progress of the song, at which the pitch extraction means extracted the first pitch, and for calculating a pitch lower than the calculated pitch by a predetermined amount as a second pitch; identification means for outputting an identification signal when the pitches extracted by the pitch extraction means in the period from a second timing, which precedes by a preset set time the first timing at which the first pitch was extracted, up to the first timing include a pitch higher than the second pitch, and the second pitch is higher than the first pitch by a predetermined pitch or more; and evaluation means for performing a predetermined process based on the identification signal output from the identification means.
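The identification condition of this guide-melody-based aspect can be sketched as follows. This is only an illustration under stated assumptions, not the implementation described in the patent: pitches are assumed to be per-frame values in cents with 0 meaning that no pitch was extracted, and the names `guide_based_fall`, `first_frame`, `n`, `margin`, and `k` are hypothetical.

```python
def guide_based_fall(pitch, guide, first_frame, n, margin, k):
    """Return True when the guide-melody-based fall condition holds.

    pitch       -- sung pitch per frame in cents (0 = no pitch; assumption)
    guide       -- guide melody pitch per frame in cents (assumption)
    first_frame -- frame at which the first pitch was extracted
    n           -- set time, in frames, between second and first timing
    margin      -- predetermined amount subtracted from the guide pitch
    k           -- predetermined pitch difference required for a fall
    """
    first_pitch = pitch[first_frame]
    # Second pitch: guide pitch at the first timing, lowered by a fixed amount.
    second_pitch = guide[first_frame] - margin
    # Pitches sung from the second timing up to the first timing.
    window = pitch[max(0, first_frame - n):first_frame + 1]
    # The singer must have been above the second pitch somewhere in the window.
    reached_note = any(p > second_pitch for p in window if p != 0)
    return reached_note and (second_pitch - first_pitch >= k)
```

Requiring that the sung pitch exceed the second pitch somewhere in the window distinguishes a singer who reached the note and then let it fall from one who was simply flat throughout.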

In another preferred aspect, the identification means may output the identification signal on the further condition that the pitches extracted by the pitch extraction means during a fixed period before the first timing fall within a pitch range of a predetermined width.

In another preferred aspect, the identification means may output the identification signal on the further condition that the variation of the pitches extracted by the pitch extraction means during a fixed period up to the first timing is monotonically decreasing.
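The monotonic-decrease condition could be checked along the following lines. This is a minimal sketch under assumptions: pitches are per-frame values in cents with 0 meaning no pitch, "monotonically decreasing" is read as non-increasing, and the names `is_monotonic_decrease`, `first_frame`, and `span` are hypothetical.

```python
def is_monotonic_decrease(pitch, first_frame, span):
    """Check that the last `span` frames up to first_frame never rise in pitch.

    Frames without a pitch (value 0) inside the window make the check fail;
    that handling is an assumption, since the text does not cover the case.
    """
    start = first_frame - span + 1
    if span < 2 or start < 0:
        return False
    window = pitch[start:first_frame + 1]
    if any(p == 0 for p in window):
        return False
    # Every frame must be at or below the pitch of the frame before it.
    return all(a >= b for a, b in zip(window, window[1:]))
```

A check like this would reject a drop that wobbles upward on the way down, which is less likely to be a deliberate fall.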

In another preferred aspect, the predetermined process in the evaluation means may be a process of counting the number of times the identification signal is output from the identification means and evaluating the singer's singing based on the counted number.

In another preferred aspect, the karaoke apparatus may further comprise range storage means for storing range specifying information that indicates a specific range of the song in correspondence with the progress of the song, and the identification means may output the identification signal on the further condition that the first timing falls within the specific range indicated by the range specifying information.

In another preferred aspect, the karaoke apparatus may further comprise setting storage means for storing setting information indicating a set value of the predetermined time, the predetermined timing, the set time, or the predetermined pitch, and setting means for setting the value of the predetermined time, the predetermined timing, the set time, or the predetermined pitch based on the setting information.

The present invention also provides a singing evaluation method comprising: a playback step of playing back a song; a voice input step of generating singer voice data based on the singing of a singer input during playback of the song; a pitch extraction step of extracting the pitch of the singer's singing based on the singer voice data; a first specifying step of specifying, when there is a period in which no pitch can be extracted in the pitch extraction step continuously for a predetermined time or longer, the pitch extracted in the pitch extraction step at a predetermined timing before that period as a first pitch; a second specifying step of specifying, as a second pitch, the pitch extracted in the pitch extraction step at a second timing that precedes, by a preset set time, the first timing at which the first pitch was extracted; an identification step of outputting an identification signal when the second pitch is higher than the first pitch by a predetermined pitch or more; and an evaluation step of performing a predetermined process based on the identification signal output in the identification step.

The present invention also provides a singing evaluation method comprising: a voice input step of generating singer voice data based on the singing of a singer input in correspondence with the progress of a song; a pitch extraction step of extracting the pitch of the singer's singing in correspondence with the progress of the song based on the singer voice data; a first specifying step of specifying, when there is a period in which no pitch can be extracted in the pitch extraction step continuously for a predetermined time or longer with respect to the progress of the song, the pitch extracted in the pitch extraction step at a predetermined timing before that period as a first pitch; a pitch calculation step of calculating, among the sounds constituting the melody indicated by guide melody data that is stored in storage means and indicates the melody the singer should sing in correspondence with the progress of the song, the pitch of the sound corresponding to the timing, with respect to the progress of the song, at which the first pitch was extracted, and of calculating a pitch lower than the calculated pitch by a predetermined amount as a second pitch; an identification step of outputting an identification signal when the pitches extracted in the pitch extraction step in the period from a second timing, which precedes by a preset set time the first timing at which the first pitch was extracted, up to the first timing include a pitch higher than the second pitch, and the second pitch is higher than the first pitch by a predetermined pitch or more; and an evaluation step of performing a predetermined process based on the identification signal output in the identification step.

The present invention also provides a program for causing a computer to realize: a playback function of playing back a song; a voice input function of generating singer voice data based on the singing of a singer input during playback of the song; a pitch extraction function of extracting the pitch of the singer's singing based on the singer voice data; a first specifying function of specifying, when there is a period in which the pitch extraction function cannot extract a pitch continuously for a predetermined time or longer, the pitch extracted by the pitch extraction function at a predetermined timing before that period as a first pitch; a second specifying function of specifying, as a second pitch, the pitch extracted by the pitch extraction function at a second timing that precedes, by a preset set time, the first timing at which the first pitch was extracted; an identification function of outputting an identification signal when the second pitch is higher than the first pitch by a predetermined pitch or more; and an evaluation function of performing a predetermined process based on the identification signal output by the identification function.

The present invention also provides a program for causing a computer to realize: a storage function of storing guide melody data indicating the melody that a singer should sing in correspondence with the progress of a song; a voice input function of generating singer voice data based on the singing of the singer input in correspondence with the progress of the song; a pitch extraction function of extracting the pitch of the singer's singing in correspondence with the progress of the song based on the singer voice data; a first specifying function of specifying, when there is a period in which the pitch extraction function cannot extract a pitch continuously for a predetermined time or longer with respect to the progress of the song, the pitch extracted by the pitch extraction function at a predetermined timing before that period as a first pitch; a pitch calculation function of calculating, among the sounds constituting the melody indicated by the guide melody data, the pitch of the sound corresponding to the timing, with respect to the progress of the song, at which the first pitch was extracted, and of calculating a pitch lower than the calculated pitch by a predetermined amount as a second pitch; an identification function of outputting an identification signal when the pitches extracted by the pitch extraction function in the period from a second timing, which precedes by a preset set time the first timing at which the first pitch was extracted, up to the first timing include a pitch higher than the second pitch, and the second pitch is higher than the first pitch by a predetermined pitch or more; and an evaluation function of performing a predetermined process based on the identification signal output by the identification function.

According to the present invention, it is possible to provide a karaoke apparatus, a singing evaluation method, and a program that detect and evaluate fall singing.

An embodiment of the present invention is described below.

<Embodiment>
This embodiment describes a karaoke apparatus 1 capable of evaluating fall singing. First, the hardware configuration of the karaoke apparatus 1 is described with reference to FIG. 1. FIG. 1 is a block diagram showing the hardware configuration of the karaoke apparatus 1 according to the first embodiment of the present invention.

A CPU (Central Processing Unit) 11 reads a program stored in a ROM (Read Only Memory) 12, loads it into a RAM (Random Access Memory) 13, and executes it, thereby controlling each part of the karaoke apparatus 1 via a bus 10. The RAM 13 also functions as a work area when the CPU 11 performs data processing and the like.

The storage unit 14 is a large-capacity storage means such as a hard disk, and has a song data storage area 14a and a singer voice data storage area 14b. The song data storage area 14a stores song data for a plurality of karaoke songs, and each item of song data has a guide melody track, an accompaniment data track, and a lyrics data track.

The guide melody track is data indicating the melody of the vocal part of the song. It contains event data, such as note-on events that instruct sound generation, note-off events that instruct muting, and control changes, together with delta time data indicating the time until the next event data is read and executed. The delta times make it possible to associate the time of each event to be executed with the time elapsed since the song started. Each note-on and note-off carries a note number indicating the pitch of the sound to be generated or muted. Each sound constituting the melody of the vocal part can therefore be defined by a note-on, a note-off, and delta times. The accompaniment data track consists of multiple tracks, one for each accompaniment instrument, and each instrument track has the same data structure as the guide melody track described above. In this embodiment, the data is stored in MIDI format.
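How delta times tie each note to the elapsed time of the song can be illustrated with a short sketch. It assumes a simplified event representation, a list of `(delta_ticks, kind, note_number)` tuples in which each delta is the wait before that event, as in a Standard MIDI File; the tuple layout and the name `notes_from_events` are hypothetical and not taken from the patent.

```python
def notes_from_events(events):
    """Convert delta-timed note events into (start, end, note_number) tuples.

    events -- iterable of (delta_ticks, kind, note_number), where kind is
              "note_on" or "note_off" (simplified, hypothetical format)
    """
    now = 0          # absolute time in ticks since the song started
    active = {}      # note_number -> start tick of the sounding note
    notes = []
    for delta, kind, number in events:
        now += delta  # accumulate delta times into an absolute time
        if kind == "note_on":
            active[number] = now
        elif kind == "note_off" and number in active:
            notes.append((active.pop(number), now, number))
    return notes
```

For example, `notes_from_events([(0, "note_on", 69), (480, "note_off", 69)])` yields one note with number 69 spanning ticks 0 to 480.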

The lyrics data track contains text data indicating the lyrics of the song, display timing data indicating when lyric captions are displayed on the display unit 15 (described later) as the song progresses, and wipe timing data indicating when the color of the displayed lyric captions is changed (hereinafter referred to as wipe). The CPU 11 plays back the song data stored in the song data storage area 14a, outputs audio data generated from the accompaniment data track of the song data to the audio processing unit 18 (described later), and displays lyric captions on the display unit 15 based on the lyrics data track.

The singer voice data storage area 14b stores, in time series and in a format such as WAVE or MP3, the audio data that is picked up by a microphone 17 (described later) and A/D converted by the audio processing unit 18 (hereinafter referred to as singer voice data). Because the data is stored in time series, each frame of a predetermined length in the singer voice data can be associated with the time elapsed since the song started.

The display unit 15 is a display device such as a liquid crystal display. Under the control of the CPU 11, it displays lyric captions together with a background image and the like as the song progresses, based on the lyrics data track stored in the song data storage area 14a of the storage unit 14. It also displays various screens, such as a menu screen for operating the karaoke apparatus 1 and a singing evaluation result screen. The operation unit 16 is, for example, a keyboard, a mouse, or a remote controller. When a user of the karaoke apparatus 1 operates the operation unit 16, data representing the operation is output to the CPU 11.

The microphone 17 picks up the singer's singing. The audio processing unit 18 A/D converts the sound picked up by the microphone 17 to generate singer voice data. The singer voice data is stored in the singer voice data storage area 14b of the storage unit 14 as described above. The audio processing unit 18 also D/A converts the audio data input from the CPU 11 and emits it from a speaker 19.

Next, the functions realized by the CPU 11 executing the program stored in the ROM 12 are described. FIG. 2 is a block diagram showing the software configuration of the functions realized by the CPU 11.

The pitch extraction unit 101 reads the singer voice data stored in the singer voice data storage area 14b and extracts the pitch of the singing in that data frame by frame, each frame having a predetermined length. It then outputs singing pitch data indicating the pitch extracted for each frame to the normal evaluation unit 103 and the fall singing evaluation unit 105. The pitch may be extracted from a spectrum generated by an FFT (Fast Fourier Transform) or by any other known method. Here, the pitch that the pitch extraction unit 101 extracts for a frame is written Pitch(f), where f is the frame number counted from the first frame. For example, the pitch extracted from the first frame is Pitch(1), followed in frame order by Pitch(2), Pitch(3), and so on. When no pitch can be extracted, for example because the singer voice data contains no singing, Pitch(f) = 0.
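As a rough illustration of frame-by-frame pitch extraction with the convention Pitch(f) = 0 for frames without a pitch, the sketch below uses time-domain autocorrelation, one well-known alternative to the FFT-based approach mentioned above. It is not the method of the patent. The function names, the silence threshold, and the 80 to 1000 Hz search range are all assumptions chosen for the example.

```python
import math

def frame_pitch(frame, sample_rate, fmin=80.0, fmax=1000.0, silence_rms=0.01):
    """Estimate the pitch of one frame in Hz, or return 0.0 if none is found."""
    n = len(frame)
    if n == 0:
        return 0.0
    rms = math.sqrt(sum(x * x for x in frame) / n)
    if rms < silence_rms:          # treat quiet frames as "no pitch"
        return 0.0
    lag_min = max(1, int(sample_rate / fmax))
    lag_max = min(n - 1, int(sample_rate / fmin))
    best_lag, best_score = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalized autocorrelation at this lag.
        score = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0

def pitch_track(samples, sample_rate, frame_len):
    """Split samples into consecutive frames and estimate a pitch for each."""
    return [frame_pitch(samples[i:i + frame_len], sample_rate)
            for i in range(0, len(samples) - frame_len + 1, frame_len)]
```

The resolution of this estimator is limited by the integer lag, so a practical system would likely interpolate around the peak or use a more robust method.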

The pitch calculation unit 102 reads the guide melody track of the song to be evaluated from the song data storage area 14a and recognizes the melody of the song from that track. For each sound constituting the recognized melody, it calculates the pitch frame by frame, each frame having a predetermined length. It then outputs melody pitch data indicating the guide melody pitch calculated for each frame to the normal evaluation unit 103. Because the pitch of each sound in the melody is defined by its note number, the pitch is determined by the note number. For example, when the note number is 69 (A4), the pitch is 440 Hz. If a table associating note numbers with pitches is stored in the storage unit 14, the pitch calculation unit 102 may calculate the pitch by referring to that table.
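Instead of a lookup table, the note-number-to-pitch mapping can also be computed directly. The sketch below assumes twelve-tone equal temperament with A4 (note number 69) at 440 Hz, which matches the example above. The function names are hypothetical.

```python
import math

def note_to_hz(note_number, a4_hz=440.0):
    """Frequency in Hz of a MIDI note number under equal temperament."""
    return a4_hz * 2.0 ** ((note_number - 69) / 12.0)

def cents_between(f_hz, ref_hz):
    """Signed interval from ref_hz to f_hz in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(f_hz / ref_hz)
```

With these helpers, `note_to_hz(69)` gives 440.0, and one semitone up from any note is 100 cents.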

The normal evaluation unit 103 compares the singing pitch data output from the pitch extraction unit 101 with the melody pitch data output from the pitch calculation unit 102 frame by frame, generates normal evaluation data indicating the degree to which the pitches match, and outputs it to the scoring unit 104. The degree of match may be calculated from the difference between the melody pitch and the singing pitch in each frame, or from the proportion of time during which the singing pitch substantially matches the melody pitch, that is, falls within a predetermined pitch range around it. The normal evaluation unit 103 may evaluate not only the singing pitch but also the volume and other feature quantities. In that case, extraction means for extracting each required feature quantity from the singing is provided, and the feature quantities serving as evaluation references are stored in the storage unit 14.
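The second option, the proportion of time within a predetermined range of the melody pitch, might look like the following sketch. Pitches are assumed to be per-frame values in Hz with 0 meaning no pitch, only frames where the guide melody has a note are scored, and the 50-cent tolerance is an arbitrary example value. None of these details come from the patent.

```python
import math

def match_ratio(sung_hz, guide_hz, tolerance_cents=50.0):
    """Fraction of guide-melody frames in which the sung pitch is in range."""
    scored = 0
    hits = 0
    for sung, guide in zip(sung_hz, guide_hz):
        if guide <= 0:
            continue              # no melody note in this frame
        scored += 1
        if sung > 0 and abs(1200.0 * math.log2(sung / guide)) <= tolerance_cents:
            hits += 1
    return hits / scored if scored else 0.0
```

Measuring the deviation in cents rather than Hz keeps the tolerance musically uniform across low and high notes.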

The fall singing evaluation unit 105 has a detection unit 1051, a measurement unit 1052, an accumulation unit 1053, and an evaluation unit 1054. Based on the singing pitch data output from the pitch extraction unit 101, it generates fall singing evaluation data indicating the evaluation result for fall singing and outputs it to the scoring unit 104. Each part of the fall singing evaluation unit 105 is described below.

The detection unit 1051 buffers a predetermined amount of the singing pitch data output from the pitch extraction unit 101, analyzes it frame by frame, and outputs to the measurement unit 1052 determination information indicating, for each frame, whether a singing pitch was extracted. When all frames have been processed, it outputs to the evaluation unit 1054 end information indicating completion and information indicating the total number of processed frames, Tall.

Based on the determination information output from the detection unit 1051, the measurement unit 1052 counts Tcount, the number of consecutive frames for which no pitch was extracted from the singing pitch data, that is, for which Pitch(f) was 0. When predetermined conditions are satisfied, it outputs an identification signal indicating this to the accumulation unit 1053. The predetermined conditions are the following two. First, Tcount has reached a predetermined number m; that is, the number of consecutive frames for which the detection unit 1051 found no pitch has reached m. Second, Pitch(α) − Pitch(β) is greater than a preset set value k (for example, 300 cents). Here, β is the last frame for which the detection unit 1051 determined Pitch(f) ≠ 0, and α is the frame a preset number of frames n before β. The measurement unit 1052 obtains Pitch(α) and Pitch(β) from the singing pitch data buffered by the detection unit 1051. When the detection unit 1051 outputs determination information indicating that a pitch was extracted, the measurement unit 1052 resets Tcount to 0. The predetermined number m is a number of frames, and because the duration of one frame is fixed, it can also be expressed as a time. The predetermined number m may be made changeable by operating the operation unit 16, and the same applies to the number of frames n and the set value k.
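The two conditions checked by the measurement unit 1052 can be sketched as a single pass over the per-frame pitches. This is a minimal illustration under assumptions, not the patented implementation: pitches are taken to be in cents (since k is given in cents) with 0 meaning no pitch, frames are indexed from 0 rather than 1, and a frame α that lies before the start of the data or has no pitch is simply skipped, a case the text does not address. The function name `detect_falls` is hypothetical.

```python
def detect_falls(pitch, m, n, k):
    """Return the frames beta at which an identification signal would be output.

    pitch -- per-frame pitch in cents, 0 where no pitch was extracted
    m     -- number of consecutive pitchless frames required (Tcount == m)
    n     -- number of frames between alpha and beta
    k     -- required drop Pitch(alpha) - Pitch(beta), e.g. 300 cents
    """
    signals = []
    t_count = 0      # consecutive frames with Pitch(f) == 0
    beta = None      # last frame with Pitch(f) != 0
    for f, p in enumerate(pitch):
        if p != 0:
            t_count = 0          # a pitch was extracted: reset the counter
            beta = f
            continue
        t_count += 1
        if t_count == m and beta is not None:   # first condition
            alpha = beta - n
            if alpha >= 0 and pitch[alpha] != 0 and pitch[alpha] - pitch[beta] > k:
                signals.append(beta)            # second condition
    return signals
```

Because the check fires only when Tcount equals m, each silent gap produces at most one identification signal, and `len(detect_falls(...))` corresponds to the count that the accumulation unit keeps.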

The accumulation unit 1053 counts the number of identification signals output from the measurement unit 1052. The counted value is called Ttotal. When the detection unit 1051 outputs the end information and the information indicating the total number of frames Tall, the evaluation unit 1054 reads Ttotal, the value accumulated by the accumulation unit 1053, and outputs to the scoring unit 104 fall singing evaluation data indicating the ratio of Ttotal to the total number of frames Tall. The fall singing evaluation data is not limited to Ttotal/Tall as long as it is generated based on Ttotal. For example, it may indicate Ttotal itself.

The scoring unit 104 calculates an evaluation score for the singer's singing based on the normal evaluation data output from the normal evaluation unit 103 and the fall singing evaluation data output from the fall singing evaluation unit 105. The CPU 11 then displays the calculated score on the display unit 15.

Next, the operation of the karaoke apparatus 1 will be described. First, the singer operates the operation unit 16 to select a song to sing. The CPU 11 reads the music data corresponding to the selected song from the music data storage area 14a. As the song progresses, it emits the accompaniment from the speaker 19 based on the accompaniment data track of the music data, and displays the lyrics on the display unit 15 with a wipe effect based on the lyrics data track. When the singer sings along with the song, the singing is picked up by the microphone 17 and stored in the singer voice data storage area 14b as singer voice data.

When the song has played to the end, the CPU 11 starts evaluating the singer's singing. The pitch extraction unit 101 reads the singer voice data stored in the singer voice data storage area 14b and outputs singing pitch data to the normal evaluation unit 103 and to the detection unit 1051 of the fall singing evaluation unit 105. The pitch calculation unit 102 reads the guide melody track of the song, which serves as the evaluation reference, from the music data storage area 14a and outputs melody pitch data to the normal evaluation unit 103.

The normal evaluation unit 103 compares the singing pitch data output from the pitch extraction unit 101 with the melody pitch data output from the pitch calculation unit 102 frame by frame, generates normal evaluation data indicating how closely the pitches match, and outputs it to the scoring unit 104. The fall singing evaluation unit 105 evaluates the singing pitch data output from the pitch extraction unit 101, generates fall singing evaluation data indicating the result, and outputs it to the scoring unit 104.

The evaluation flow of the fall singing evaluation unit 105 will now be described in detail with reference to FIG. 3, which is a flowchart of that flow.

First, when evaluation in the fall singing evaluation unit 105 starts, the parameters it uses, Tall, Ttotal, and Tcount, are all initialized to 0 (step S1).

Next, the detection unit 1051 increments Tall, which indicates the number of frames examined so far, by 1 (step S2). It then determines whether a pitch has been extracted for the first frame of the singing pitch data output from the pitch extraction unit 101, that is, whether Pitch(Tall) ≠ 0, where Tall = 1 at this point (step S3).

If it determines that a pitch has been extracted for the first frame (Pitch(1) ≠ 0) (step S3; Yes), the detection unit 1051 outputs determination information indicating that a pitch was extracted to the measurement unit 1052. The measurement unit 1052 then resets Tcount to 0 (step S4), and the process proceeds to step S9, described later.

If it determines that no pitch has been extracted (Pitch(1) = 0) (step S3; No), the detection unit 1051 outputs determination information indicating this to the measurement unit 1052. The measurement unit 1052 then increments Tcount by 1 (step S5) and determines whether Tcount has reached the predetermined number m (step S6). The description from here on applies to any frame, not only the first frame (Tall = 1).

If Tcount ≠ m, that is, if Tcount is smaller or larger than the predetermined number m (step S6; No), the process proceeds to step S9, described later. If Tcount = m (step S6; Yes), the measurement unit 1052 determines whether Pitch(α) - Pitch(β) is larger than the preset set value k (step S7). Here, β = Tall - m, the last frame in which a pitch was extracted, and α = Tall - m - n, the frame n frames before β.

If Pitch(α) - Pitch(β) is equal to or less than the set value k (step S7; No), the process proceeds to step S9, described later. If Pitch(α) - Pitch(β) is larger than the set value k (step S7; Yes), the measurement unit 1052 outputs an identification signal to the accumulation unit 1053, and the accumulation unit 1053 increments Ttotal by 1 (step S8). The process then proceeds to step S9.

When the processing in step S4, S6, S7, or S8 ends, the detection unit 1051 determines, based on the singing pitch data output from the pitch extraction unit 101, whether the song has ended (step S9). If the song has not ended (step S9; No), the processing described above is repeated from step S2. If the song has ended (step S9; Yes), the detection unit 1051 outputs end information and information indicating Tall to the evaluation unit 1054. On receiving the end information, the evaluation unit 1054 reads Ttotal, the count accumulated by the accumulation unit 1053 each time an identification signal was output, and outputs fall singing evaluation data indicating Ttotal/Tall to the scoring unit 104 (step S10). As noted above, the fall singing evaluation data need not indicate Ttotal/Tall, as long as it is generated based on Ttotal; for example, it may indicate Ttotal itself.
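The flow of steps S1 to S10 can be sketched in a few lines. This is a minimal illustration rather than the apparatus's actual implementation: it assumes the singing pitch data is a list of per-frame pitches in cents with 0 meaning "no pitch extracted", and it assumes that a silence occurring before n frames of history exist is simply skipped. The function name `evaluate_fall_singing` is hypothetical.

```python
def evaluate_fall_singing(pitch, m, n, k):
    """Return (Ttotal, Tall) for a list of per-frame pitches.

    pitch[i] is the pitch of frame i + 1 in cents; 0 means no pitch.
    """
    t_all = t_total = t_count = 0              # S1: initialize
    for value in pitch:                        # loop ends when the song ends (S9)
        t_all += 1                             # S2
        if value != 0:                         # S3: pitch extracted?
            t_count = 0                        # S4
            continue
        t_count += 1                           # S5
        if t_count != m:                       # S6
            continue
        beta = t_all - m                       # last frame with a pitch
        alpha = beta - n                       # n frames before beta
        if alpha < 1:                          # assumption: not enough history
            continue
        if pitch[alpha - 1] - pitch[beta - 1] > k:   # S7
            t_total += 1                       # S8
    return t_total, t_all                      # S10 uses Ttotal / Tall


# Nine voiced frames ending in a 400-cent fall, then three silent frames.
frames = [1000] * 5 + [900, 800, 700, 600] + [0] * 3
t_total, t_all = evaluate_fall_singing(frames, m=3, n=4, k=300)
ratio = t_total / t_all
```

In this example the silence reaches m = 3 frames at frame 12, so β = 9 and α = 5, and the 400-cent difference exceeds k.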

As a concrete example of the processing in the fall singing evaluation unit 105, consider the portion of a song shown in FIG. 4. FIG. 4 is an explanatory diagram of the singing pitch data, α, and β. The vertical axis represents pitch, the horizontal axis represents time, and the solid line is the pitch indicated by the singing pitch data. The k in the figure is the preset set value k, and m and n are the predetermined number m and the preset number of frames n, each converted from a number of frames into time. In the following description, quantities stated above in frames are likewise treated as times. Ttotal before this portion is evaluated is assumed to be γ.

First, after time t1 the pitch indicated by the singing pitch data becomes 0, and the measurement unit 1052 starts counting up Tcount. When Tcount = m, that is, at time t1 + m, the measurement unit 1052 obtains Pitch(t1), the pitch at t1 (corresponding to β), and Pitch(t1 - n), the pitch at t1 - n (corresponding to α). Because P1 = Pitch(t1 - n) - Pitch(t1) is smaller than the set value k, the measurement unit 1052 does not output an identification signal. At time t2 the pitch is no longer 0, so Tcount is reset to 0.

Next, after time t3 the pitch indicated by the singing pitch data becomes 0 again, and the measurement unit 1052 counts up Tcount. At time t4, before Tcount reaches m, the pitch is no longer 0, so Tcount returns to 0. After time t5 the pitch becomes 0 once more and the measurement unit 1052 counts up Tcount. At time t5 + m it obtains Pitch(t5), the pitch at t5 (corresponding to β), and Pitch(t5 - n), the pitch at t5 - n (corresponding to α). Because P5 = Pitch(t5 - n) - Pitch(t5) is larger than the set value k, the measurement unit 1052 outputs an identification signal to the accumulation unit 1053, which increments Ttotal so that Ttotal = γ + 1. At time t6 the pitch is no longer 0, so Tcount is reset to 0. In the portion used as this example, the fall singing evaluation unit 105 therefore increases Ttotal from γ to γ + 1.
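The three situations in this example (a long silence after a small drop, a silence shorter than m, and a long silence after a large drop) can be reproduced with synthetic data. The pitch values, the parameter values, and the helper name `detect_falls` below are all assumptions chosen for illustration; FIG. 4 itself gives no numbers.

```python
def detect_falls(pitch, m, n, k):
    """Return the 1-based frame numbers at which an identification
    signal would be emitted. 0 in `pitch` means no pitch extracted."""
    hits, silent = [], 0
    for frame, value in enumerate(pitch, start=1):
        silent = 0 if value != 0 else silent + 1
        if silent == m:
            beta = frame - m          # last voiced frame
            alpha = beta - n          # n frames earlier
            if alpha >= 1 and pitch[alpha - 1] - pitch[beta - 1] > k:
                hits.append(frame)
    return hits


m, n, k = 3, 4, 300
# Around t1: pitch falls by only 60 cents before a silence of 4 frames.
seg1 = [1000, 1000, 1000, 980, 960, 940] + [0] * 4
# Around t3..t4: silence of 2 frames, shorter than m.
seg2 = [1000] * 5 + [0] * 2
# Around t5: pitch falls by 400 cents before a silence of 4 frames.
seg3 = [1000, 1000, 900, 800, 700, 600] + [0] * 4

pitch = seg1 + seg2 + seg3
hits = detect_falls(pitch, m, n, k)
```

Only the third silence produces a signal, at frame 26, which corresponds to Ttotal going from γ to γ + 1.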

In this way, the processing in the fall singing evaluation unit 105 is performed for all frames, and the evaluation unit 1054 outputs fall singing evaluation data indicating Ttotal/Tall to the scoring unit 104.

The scoring unit 104 then calculates the evaluation score of the singer's singing by a predetermined algorithm, based on the normal evaluation data output from the normal evaluation unit 103 and the fall singing evaluation data output from the fall singing evaluation unit 105. The result is displayed on the display unit 15.

As described above, when no pitch can be extracted because the singing is silent for a predetermined time, such as at the end of a phrase, the karaoke apparatus 1 of this embodiment can identify Pitch(β), the last pitch extracted by the pitch extraction unit 101, and Pitch(α), the pitch n frames earlier. If Pitch(α) - Pitch(β) is larger than the set value k, the pitch fell by more than k within the time corresponding to n frames, which can be regarded as fall singing. The effect of fall singing can therefore be reflected in the score given to the singer.

An embodiment of the present invention has been described above, but the present invention can also be implemented in various other forms, as follows.

<Modification 1>
In the embodiment, the fall singing evaluation unit 105 evaluated fall singing using the singing pitch data output from the pitch extraction unit 101. The melody pitch data output from the pitch calculation unit 102 may additionally be used. In that case, as shown in FIG. 5, the pitch calculation unit 102 outputs the melody pitch data to the normal evaluation unit 103 and also to the measurement unit 1052 of the fall singing evaluation unit 105. The fall singing evaluation unit 105 then performs the processing shown in FIG. 6. The flowchart in FIG. 6 replaces step S7 of the flowchart in FIG. 3 with steps S71 and S72. The processing of steps S71 and S72 in the measurement unit 1052 is described below.

If Tcount = m (step S6; Yes), the measurement unit 1052 determines whether Pitch2(α) - Pitch(β) is larger than the preset set value k (step S71). Here, Pitch2(f) is a pitch corresponding to the pitch indicated by the melody pitch data at frame f, where "corresponding to" means slightly lower than that melody pitch (for example, 10 cents lower). The pitch indicated by the melody pitch data is calculated for each note of the guide melody track and does not change within a note. It therefore need not be the pitch at exactly frame α; the pitch at a frame slightly before or after is also acceptable. In other words, any method may be used as long as the note corresponding to frame α can be identified, and the pitch of that note is regarded as Pitch2(α).

If Pitch2(α) - Pitch(β) is not larger than the preset set value k (step S71; No), the process proceeds to step S9. If Pitch2(α) - Pitch(β) is larger than the set value k (step S71; Yes), the measurement unit 1052 determines whether, among the pitches of the singing pitch data, there is a frame between α and β whose pitch equals Pitch2(α), that is, whether Pitch(h) = Pitch2(α) for some h with α < h < β (step S72). Because Pitch2(α) is slightly lower than the pitch indicated by the melody pitch data at frame α, genuine fall singing still satisfies this condition even when the singer sang slightly below the pitch of the guide melody track.

If this condition is not satisfied (step S72; No), the process proceeds to step S9. If it is satisfied (step S72; Yes), the process proceeds to step S8. FIG. 7 shows the pitch of the melody pitch data and the pitch of the singing pitch data for part of a song: FIG. 7(a) shows a case that satisfies the condition of step S72, and FIG. 7(b) a case that does not. In FIG. 7(a), h lies between α and β. In FIG. 7(b), the sung pitch descends so gradually that h lies before α, and the passage is not recognized as fall singing. With this configuration, fall singing can be evaluated in the same manner as in the embodiment.
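Steps S71 and S72 might be sketched as below. Sampled pitch data rarely hits Pitch2(α) exactly, so this sketch substitutes a crossing test: it looks for a frame h between α and β at which the sung pitch passes downward through Pitch2(α). That substitution, the 10-cent offset default, and the function name `modification1_check` are assumptions made for illustration.

```python
def modification1_check(sung, melody, alpha, beta, k, offset=10):
    """Return True if both S71 and S72 hold.

    sung, melody: per-frame pitches in cents; index 0 is frame 1.
    alpha, beta: 1-based frame numbers with alpha < beta.
    """
    p2 = melody[alpha - 1] - offset            # Pitch2(alpha)
    if p2 - sung[beta - 1] <= k:               # S71 fails
        return False
    for h in range(alpha + 1, beta):           # alpha < h < beta
        # Sung pitch passes through Pitch2(alpha) between frames h-1 and h.
        if sung[h - 2] >= p2 >= sung[h - 1]:
            return True                        # S72 holds
    return False


melody = [1000] * 8
# Like FIG. 7(a): the singer holds the note, then falls quickly.
sung_a = [1000, 1000, 1000, 1000, 950, 850, 750, 650]
# Like FIG. 7(b): the descent starts early and is gradual.
sung_b = [1000, 985, 970, 955, 900, 820, 740, 650]

result_a = modification1_check(sung_a, melody, alpha=4, beta=8, k=300)
result_b = modification1_check(sung_b, melody, alpha=4, beta=8, k=300)
```

In the second sequence the sung pitch is already below Pitch2(α) at frame α, so the crossing lies before α and the passage is rejected.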

<Modification 2>
In the embodiment, fall singing was recognized and evaluated by determining that the pitch had dropped by the set value k within the time corresponding to n frames. In addition, fall singing may be recognized only during specific periods in the progress of the song. In this case, event data indicating that fall singing should be recognized is included in the music data at the data positions that indicate the specific periods. When the CPU 11 identifies this information while reproducing the music data, it outputs the information to the measurement unit 1052 and other units so that the measurement unit 1052 recognizes the specific period indicated by the event data. The measurement unit 1052 can then refrain from outputting the identification signal outside the specific periods, so nothing outside them is recognized as fall singing. Alternatively, data indicating the specific periods may be stored separately in the storage unit 14 and recognized by having the CPU 11 read it.

As another method, analysis means may be provided that analyzes the guide melody track and determines the specific periods based on the relationship between the pitches of adjacent notes of the melody, the note lengths, the silent time between notes, and so on, and the measurement unit 1052 may be made to recognize those periods. For example, when the pitch of a note is compared with the pitch of the following note and the two differ by a certain amount or more, the final part of the earlier note is set as a specific period. Likewise, when a note is longer than a certain length, or when the silence before the next note lasts a certain time or longer, the final part of that note is set as a specific period. Because fall singing is then recognized only in the specific periods, passages where fall singing is not wanted can be excluded from evaluation.
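A guide-melody analysis of this kind could look roughly like the following sketch. The `Note` layout, every threshold value, the length of the "final part" (`tail`), and the choice to treat the last note as followed by silence are all assumptions; the description leaves them open.

```python
from typing import NamedTuple


class Note(NamedTuple):
    start: int    # first frame of the note
    length: int   # duration in frames
    pitch: int    # pitch in cents


def fall_windows(notes, tail=10, min_leap=500, min_length=60, min_gap=30):
    """Return half-open frame ranges (start, end) in which fall singing
    should be recognized, one per qualifying note."""
    windows = []
    for i, note in enumerate(notes):
        end = note.start + note.length              # first frame after the note
        nxt = notes[i + 1] if i + 1 < len(notes) else None
        leap = nxt is not None and abs(nxt.pitch - note.pitch) >= min_leap
        long_note = note.length >= min_length
        # Assumption: the final note counts as followed by a long silence.
        gap = nxt is None or nxt.start - end >= min_gap
        if leap or long_note or gap:
            windows.append((max(note.start, end - tail), end))
    return windows


notes = [
    Note(0, 20, 6000),
    Note(20, 20, 6200),
    Note(40, 80, 6400),    # long note
    Note(150, 20, 6400),   # last note
]
windows = fall_windows(notes)
```

The measurement unit would then suppress the identification signal whenever β falls outside every returned range.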

<Modification 3>
In the embodiment, fall singing was recognized and evaluated by determining that the pitch had dropped by the set value k within the time corresponding to n frames. Further conditions may be added, so that the measurement unit 1052 outputs the identification signal only when it analyzes the singing pitch data buffered by the detection unit 1051 and determines that those conditions are satisfied. The following conditions are examples. One condition is that the pitches of a certain number of frames immediately before frame α stay within a certain pitch range; fall singing is then evaluated only when it follows a stable pitch. The pitch range may be determined by referring to the melody pitch data as in Modification 1, for example a range of 20 cents above and below Pitch2(α). Another condition is that Pitch(β) does not fall within a certain range of the pitch of the note that follows the melody note corresponding to Pitch2(α). This distinguishes fall singing from singing that glides continuously into the pitch of the next note (portamento).

In addition, vibrato detection means may be provided that detects singing with an oscillating pitch (vibrato) from the singing pitch data, and the absence of detected vibrato may be used as a condition. That is, when vibrato is detected, the measurement unit 1052 does not output the identification signal. Because the pitch in vibrato moves up and down, a downward swing could otherwise be recognized as fall singing; this condition prevents that. When the measurement unit 1052 outputs the identification signal only if every configured condition is satisfied, fall singing can be evaluated more accurately.
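Two of these extra conditions, the stable pitch before α and the exclusion of portamento, can be sketched as simple predicates. The window length, the tolerances, and both function names are hypothetical; vibrato detection is left out because the description does not say how it would be done.

```python
def is_stable_before(sung, alpha, frames=10, half_width=20, center=None):
    """True if the `frames` frames just before frame alpha all lie within
    half_width cents of center (default: their own mean).

    sung: per-frame pitches in cents, index 0 is frame 1, 0 = no pitch.
    """
    window = sung[max(0, alpha - 1 - frames):alpha - 1]
    if len(window) < frames or any(v == 0 for v in window):
        return False                      # assumption: require full, voiced history
    if center is None:
        center = sum(window) / len(window)
    return all(abs(v - center) <= half_width for v in window)


def is_portamento(pitch_beta, next_note_pitch, tolerance=50):
    """True if the fall lands near the next melody note, which suggests a
    glide into that note rather than fall singing."""
    return abs(pitch_beta - next_note_pitch) <= tolerance


sung = [1000, 1005, 995, 1000, 1010, 990, 1000, 1000, 1002, 998,
        1000, 900, 800, 700, 600]
stable = is_stable_before(sung, alpha=11)
stable_vs_melody = is_stable_before(sung, alpha=11, center=990)  # e.g. Pitch2(alpha)
glide = is_portamento(600, 620)
```

An identification signal would be emitted only when the stability predicate is true and the portamento predicate is false, in addition to the two conditions of the embodiment.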

<Modification 4>
In the embodiment, parameters such as the set value k, the count m, and the number of frames n were set in advance, but they may be changed as the song progresses. In this case, information indicating the parameter values is included in the music data, for example as event data. Setting means is then provided so that, when the CPU 11 identifies this information while reproducing the music data, the information is output to the detection unit 1051, the measurement unit 1052, and other units, and each parameter is set. Fall singing can then be evaluated differently at different points in the song. Passages in which fall singing should not be evaluated can also be handled through these settings (for example, by increasing the set value k or the count m), which gives the same effect as Modification 2. Alternatively, data indicating the parameter values may be stored separately in the storage unit 14, recognized by having the CPU 11 read it, and applied by setting means that sets each parameter. The user may be allowed to choose, by operating the operation unit 16, between a mode that uses fixed preset parameters as in the embodiment and a mode that uses parameters set from the music data as in this modification. The parameters may also be changed according to the singer's skill level, entered by the user through the operation unit 16.
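One way to picture event-driven parameters is a sorted list of (start frame, overrides) pairs looked up per frame. The event layout, the rule that each event overrides the defaults rather than the previous event, and the name `params_at` are assumptions for this sketch; the description does not define the event data format.

```python
from bisect import bisect_right


def params_at(frame, events, default):
    """Return the parameter set in effect at `frame`.

    events: list of (start_frame, overrides) sorted by start_frame.
    default: dict with keys 'k', 'm', 'n'.
    """
    starts = [start for start, _ in events]
    i = bisect_right(starts, frame)          # number of events at or before frame
    if i == 0:
        return dict(default)
    return dict(default, **events[i - 1][1])  # latest event overrides defaults


default = {"k": 300, "m": 20, "n": 30}
events = [
    (100, {"k": 200}),       # be more lenient in this passage
    (500, {"k": 10 ** 6}),   # effectively disable detection from here on
]
early = params_at(50, events, default)
middle = params_at(100, events, default)
late = params_at(600, events, default)
```

Setting k to a very large value, as in the last event, is the kind of setting the text describes for passages that should not be evaluated.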

<Modification 5>
In the embodiment, β = Tall - m, so Pitch(β) was the pitch of the frame immediately before the pitch extraction unit 101 became unable to extract a pitch from the singer voice data. A pitch from slightly earlier may be used instead. For example, β may be set to Tall - m - 10 so that Pitch(β) is the pitch a fixed number of frames earlier (here, 10 frames), or a pitch determined from the pitches of the frames from Tall - m - 10 to Tall - m, such as their average, may be regarded as Pitch(β). Pitch(β) then matches what is actually heard more closely than the last extractable pitch does, so fall singing can be evaluated more accurately. The fixed number of frames should preferably be smaller than the preset number of frames n. Because increasing the fixed number of frames narrows the interval between α and β, the measurement unit 1052 may reduce the set value k according to that interval, or it may leave k unchanged and set the number of frames n so that the interval between α and β does not change.
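The averaging variant can be sketched as follows, assuming per-frame pitches in cents with 0 for "no pitch". Skipping unvoiced frames inside the averaging window is an assumption, as is the function name `pitch_beta_average`.

```python
def pitch_beta_average(pitch, t_all, m, back=10):
    """Mean pitch over frames Tall-m-back .. Tall-m (1-based, inclusive).

    Frames with pitch 0 (no pitch extracted) are ignored; returns 0 if
    no voiced frame is found.
    """
    last = t_all - m                      # last voiced frame
    first = max(1, last - back)
    voiced = [v for v in pitch[first - 1:last] if v != 0]
    return sum(voiced) / len(voiced) if voiced else 0


pitch = [1000] * 5 + [900, 800, 700, 600, 500] + [0] * 3
# Silence reaches m = 3 at frame 13; average frames 8..10.
beta_pitch = pitch_beta_average(pitch, t_all=13, m=3, back=2)
```

Here the last extracted pitch is 500 cents, while the averaged value is 600 cents, which makes the measured fall smaller and illustrates why k or n may need adjusting.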

<Modification 6>
In the embodiment, the singing was judged to be fall singing whenever Pitch(α) exceeded Pitch(β) by more than the set value k. A condition on how the pitch changes in the intermediate frames (between α and β) may be added, so that the measurement unit 1052 outputs the identification signal only when that condition is satisfied. In this case, the measurement unit 1052 examines the change in pitch between frames α and β in the singing pitch data buffered by the detection unit 1051 and determines whether the change satisfies a predetermined condition. The predetermined condition may be, for example, that the pitch decreases monotonically from frame α to frame β (for example, that the first derivative is negative). This excludes singing in which the pitch rises partway before falling, so fall singing can be evaluated more accurately.
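For sampled pitch data, "first derivative is negative" can be read as every frame-to-frame difference between α and β being negative. The sketch below uses that strict reading; whether equal consecutive pitches should be tolerated is not stated in the description, and the name `is_monotonic_fall` is hypothetical.

```python
def is_monotonic_fall(pitch, alpha, beta):
    """True if the pitch strictly decreases at every step from frame
    alpha to frame beta (1-based, inclusive)."""
    segment = pitch[alpha - 1:beta]
    return all(later < earlier for earlier, later in zip(segment, segment[1:]))


steady_fall = is_monotonic_fall([1000, 950, 900, 800, 600], alpha=1, beta=5)
rise_then_fall = is_monotonic_fall([1000, 950, 980, 800, 600], alpha=1, beta=5)
```

The second sequence still drops 400 cents overall, but the rise at the third frame disqualifies it.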

<Modification 7>
In the embodiment, the processing by the fall singing evaluation unit 105 was performed after the song had ended, but it may instead be performed progressively while the singer is singing. In this case, as the song progresses, the pitch extraction unit 101 successively extracts the singing pitch from the singer voice data for the part already sung and outputs the singing pitch data to the fall singing evaluation unit 105, which processes the data as it arrives. Processing then finishes shortly after the song ends, so the evaluation result can be displayed on the display unit 15 sooner. In addition, at the moment the measurement unit 1052 outputs the identification signal, that is, when fall singing is detected, the CPU 11 can show on the display unit 15 that the detection has occurred.
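Processing frames as they arrive needs only a short history: enough frames to reach back from the current frame to α. The class below is a sketch under that observation, assuming per-frame pitches in cents with 0 for "no pitch"; the class and method names are hypothetical.

```python
from collections import deque


class StreamingFallDetector:
    """Feed one frame at a time; push() returns True on the frame at
    which an identification signal would be emitted."""

    def __init__(self, m, n, k):
        self.m, self.n, self.k = m, n, k
        # Holds frames Tall-m-n .. Tall once full.
        self.history = deque(maxlen=m + n + 1)
        self.t_all = 0
        self.t_count = 0
        self.t_total = 0

    def push(self, value):
        self.t_all += 1
        self.history.append(value)
        if value != 0:
            self.t_count = 0
            return False
        self.t_count += 1
        if self.t_count != self.m or len(self.history) < self.m + self.n + 1:
            return False
        pitch_alpha = self.history[0]        # frame Tall - m - n
        pitch_beta = self.history[self.n]    # frame Tall - m
        if pitch_alpha - pitch_beta > self.k:
            self.t_total += 1
            return True
        return False


detector = StreamingFallDetector(m=3, n=4, k=300)
stream = [1000] * 5 + [900, 800, 700, 600] + [0] * 3
flags = [detector.push(v) for v in stream]
```

A display routine could react to the `True` return value to show that fall singing was just detected, and `t_total / t_all` is available as soon as the song ends.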

FIG. 1 is a block diagram showing the hardware configuration of the karaoke apparatus according to the embodiment.
FIG. 2 is a block diagram showing the software configuration of the karaoke apparatus according to the embodiment.
FIG. 3 is a flowchart showing the flow of fall singing evaluation in the fall singing evaluation unit according to the embodiment.
FIG. 4 is an explanatory diagram showing part of the fall singing evaluation in the fall singing evaluation unit according to the embodiment.
FIG. 5 is a block diagram showing the software configuration of the karaoke apparatus according to Modification 1.
FIG. 6 is a flowchart showing the flow of fall singing evaluation in the fall singing evaluation unit according to Modification 1.
FIG. 7 is an explanatory diagram showing part of the fall singing evaluation in the fall singing evaluation unit according to Modification 1.

Explanation of symbols

1 ... Karaoke apparatus, 10 ... Bus, 11 ... CPU, 12 ... ROM, 13 ... RAM, 14 ... Storage unit, 14a ... Music data storage area, 14b ... Singer voice data storage area, 15 ... Display unit, 16 ... Operation unit, 17 ... Microphone, 18 ... Audio processing unit, 19 ... Speaker, 101 ... Pitch extraction unit, 102 ... Pitch calculation unit, 103 ... Normal evaluation unit, 104 ... Scoring unit, 105 ... Fall singing evaluation unit, 1051 ... Detection unit, 1052 ... Measurement unit, 1053 ... Accumulation unit, 1054 ... Evaluation unit

Claims

A karaoke apparatus comprising:
Playback means for playing back music;
Voice input means for generating singer voice data based on a singer's singing input during playback of the music;
Pitch extraction means for extracting the pitch of the singer's singing based on the singer voice data;
First specifying means for specifying, when there is a period during which the pitch extraction means cannot extract a pitch continuously for a predetermined time or more, the pitch extracted by the pitch extraction means at a predetermined timing before that period as a first pitch;
Second specifying means for specifying, as a second pitch, the pitch extracted by the pitch extraction means at a second timing that precedes, by a preset set time, the first timing at which the first pitch was extracted by the pitch extraction means;
Identification means for outputting an identification signal when the second pitch is higher than the first pitch by a predetermined pitch or more; and
Evaluation means for performing a predetermined process based on the identification signal output from the identification means.
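For illustration only, the detection logic recited in this claim might be sketched as follows. The function name `detect_falls`, the representation of pitch as one value in cents per analysis frame (with `None` where no pitch could be extracted), and every default parameter value are assumptions made for this sketch; the claim itself leaves the predetermined time, predetermined timing, set time, and predetermined pitch unspecified.

```python
from typing import List, Optional


def detect_falls(pitches: List[Optional[float]],
                 min_gap: int = 5,         # "predetermined time", in frames (assumed)
                 first_offset: int = 1,    # "predetermined timing" before the gap (assumed)
                 set_time: int = 10,       # "set time" between the two timings (assumed)
                 min_drop: float = 200.0   # "predetermined pitch", in cents (assumed)
                 ) -> List[int]:
    """Return the frame index of the first timing for every detected fall."""
    falls: List[int] = []
    n = len(pitches)
    i = 0
    while i < n:
        if pitches[i] is not None:
            i += 1
            continue
        # Measure the run of frames in which no pitch could be extracted.
        j = i
        while j < n and pitches[j] is None:
            j += 1
        if j - i >= min_gap:
            t1 = i - first_offset   # first timing
            t2 = t1 - set_time      # second timing
            if t2 >= 0:
                first, second = pitches[t1], pitches[t2]
                # Identification: the second pitch exceeds the first by min_drop or more.
                if (first is not None and second is not None
                        and second - first >= min_drop):
                    falls.append(t1)
        i = j
    return falls
```

Each returned index plays the role of one identification signal; the evaluation means would then act on those signals, for example by counting them.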

A karaoke apparatus comprising:
Storage means for storing guide melody data indicating a melody to be sung by a singer in accordance with the progress of the music;
Voice input means for generating singer voice data based on a singer's singing input corresponding to the progress of the music;
Pitch extraction means for extracting, based on the singer voice data, the pitch of the singer's singing corresponding to the progress of the music;
First specifying means for specifying, when there is a period during which the pitch extraction means cannot extract a pitch continuously for a predetermined time or more with respect to the progress of the music, the pitch extracted by the pitch extraction means at a predetermined timing before that period as a first pitch;
Pitch calculation means for calculating the pitch of the sound, among the sounds constituting the melody indicated by the guide melody data, that corresponds to the timing in the progress of the music at which the pitch extraction means extracted the first pitch, and calculating, as a second pitch, a pitch lower than the calculated pitch by a predetermined amount;
Identification means for outputting an identification signal when the pitches extracted by the pitch extraction means during the period from a second timing, which is a preset set time before the first timing in the progress of the music at which the pitch extraction means extracted the first pitch, to the first timing include a pitch higher than the second pitch, and the second pitch is higher than the first pitch by a predetermined pitch or more; and
Evaluation means for performing a predetermined process based on the identification signal output from the identification means.
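The guide-melody-based identification condition of this claim could be sketched roughly as below. The sketch assumes that the first timing `t1` has already been located (for example, by finding the unvoiced period), that `pitches` and `guide` hold one value in cents per frame on a shared time axis, and that `margin` stands in for the predetermined amount by which the second pitch lies below the guide pitch. The function name and all default values are hypothetical.

```python
from typing import Optional, Sequence


def is_fall_with_guide(pitches: Sequence[Optional[float]],
                       guide: Sequence[float],
                       t1: int,
                       set_time: int = 10,       # frames between second and first timing (assumed)
                       margin: float = 50.0,     # second pitch = guide pitch - margin (assumed)
                       min_drop: float = 200.0   # "predetermined pitch", in cents (assumed)
                       ) -> bool:
    """Check the two identification conditions for a first timing t1."""
    t2 = t1 - set_time
    first = pitches[t1]
    if t2 < 0 or first is None:
        return False
    # Second pitch: a fixed amount below the guide melody pitch at the first timing.
    second = guide[t1] - margin
    # Condition 1: somewhere between the second and first timing the singer
    # was above the second pitch, i.e. roughly on the guide note.
    window = [p for p in pitches[t2:t1 + 1] if p is not None]
    sang_near_note = any(p > second for p in window)
    # Condition 2: the first pitch lies below the second pitch by min_drop or more.
    dropped_enough = second - first >= min_drop
    return sang_near_note and dropped_enough
```

Tying the second pitch to the guide melody means that a singer who stays well below the intended note throughout the window is not credited with a fall, since the first condition is never satisfied.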

The karaoke apparatus according to claim 1 or 2, wherein the identification means outputs the identification signal on the further condition that the pitch extracted by the pitch extraction means a predetermined time before the first timing is included in a pitch range of a predetermined width.

The karaoke apparatus according to any one of claims 1 to 3, wherein the identification means outputs the identification signal on the further condition that the pitch extracted by the pitch extraction means during a predetermined interval before the first timing changes in a monotonically decreasing manner.

The karaoke apparatus according to any one of claims 1 to 4, wherein the predetermined process in the evaluation means is a process of counting the number of times the identification signal is output from the identification means and evaluating the singing of the singer based on the counted number of times.
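As a rough illustration of the monotonic-decrease condition and the count-based evaluation recited in the dependent claims, the following sketch may help. The function names, the choice to treat "monotonically decreasing" as non-increasing, and the mapping from signal count to bonus points (`points_per_fall`, `max_bonus`) are all assumptions; the claims do not specify how the counted number is converted into an evaluation.

```python
from typing import Optional, Sequence


def is_monotonic_decrease(pitches: Sequence[Optional[float]],
                          t1: int,
                          interval: int = 4) -> bool:
    """True if the pitch never rises over the `interval` frames leading to t1."""
    start = t1 - interval
    if start < 0:
        return False
    segment = pitches[start:t1 + 1]
    if any(p is None for p in segment):
        return False
    # Non-increasing from frame to frame (strictness is an assumption here).
    return all(a >= b for a, b in zip(segment, segment[1:]))


def fall_bonus(signal_count: int,
               points_per_fall: float = 1.0,  # hypothetical weighting
               max_bonus: float = 5.0         # hypothetical cap
               ) -> float:
    """Turn the number of identification signals into a capped bonus score."""
    return min(signal_count * points_per_fall, max_bonus)
```

Capping the bonus is one possible design choice so that a singer cannot inflate the score simply by adding a fall to every note.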

The karaoke apparatus according to any one of claims 1 to 5, further comprising range storage means for storing range specifying information indicating a specific range of the music corresponding to the progress of the music,
wherein the identification means outputs the identification signal on the further condition that the first timing is included in the specific range indicated by the range specifying information.

The karaoke apparatus according to claim 1, further comprising:
Setting storage means for storing setting information indicating a setting value of the predetermined time, the predetermined timing, the set time, or the predetermined pitch; and
Setting means for setting the value of the predetermined time, the predetermined timing, the set time, or the predetermined pitch based on the setting information.

A singing evaluation method comprising:
A playback process of playing back music;
A voice input process of generating singer voice data based on a singer's singing input during playback of the music;
A pitch extraction process of extracting the pitch of the singer's singing based on the singer voice data;
A first specifying process of specifying, when there is a period during which no pitch can be extracted continuously for a predetermined time or more in the pitch extraction process, the pitch extracted in the pitch extraction process at a predetermined timing before that period as a first pitch;
A second specifying process of specifying, as a second pitch, the pitch extracted in the pitch extraction process at a second timing that precedes, by a preset set time, the first timing at which the first pitch was extracted in the pitch extraction process;
An identification process of outputting an identification signal when the second pitch is higher than the first pitch by a predetermined pitch or more; and
An evaluation process of performing a predetermined process based on the identification signal output in the identification process.

A singing evaluation method comprising:
A voice input process of generating singer voice data based on a singer's singing input corresponding to the progress of the music;
A pitch extraction process of extracting, based on the singer voice data, the pitch of the singer's singing corresponding to the progress of the music;
A first specifying process of specifying, when there is a period during which no pitch can be extracted continuously for a predetermined time or more with respect to the progress of the music in the pitch extraction process, the pitch extracted in the pitch extraction process at a predetermined timing before that period as a first pitch;
A pitch calculation process of calculating the pitch of the sound, among the sounds constituting the melody indicated by guide melody data that is stored in storage means and indicates the melody the singer should sing in accordance with the progress of the music, that corresponds to the timing in the progress of the music at which the first pitch was extracted in the pitch extraction process, and calculating, as a second pitch, a pitch lower than the calculated pitch by a predetermined amount;
An identification process of outputting an identification signal when the pitches extracted in the pitch extraction process during the period from a second timing, which is a preset set time before the first timing in the progress of the music at which the first pitch was extracted in the pitch extraction process, to the first timing include a pitch higher than the second pitch, and the second pitch is higher than the first pitch by a predetermined pitch or more; and
An evaluation process of performing a predetermined process based on the identification signal output in the identification process.

A program for causing a computer to realize:
A playback function of playing back music;
A voice input function of generating singer voice data based on a singer's singing input during playback of the music;
A pitch extraction function of extracting the pitch of the singer's singing based on the singer voice data;
A first specifying function of specifying, when there is a period during which no pitch can be extracted continuously for a predetermined time or more by the pitch extraction function, the pitch extracted by the pitch extraction function at a predetermined timing before that period as a first pitch;
A second specifying function of specifying, as a second pitch, the pitch extracted by the pitch extraction function at a second timing that precedes, by a preset set time, the first timing at which the first pitch was extracted by the pitch extraction function;
An identification function of outputting an identification signal when the second pitch is higher than the first pitch by a predetermined pitch or more; and
An evaluation function of performing a predetermined process based on the identification signal output by the identification function.

A program for causing a computer to realize:
A storage function of storing guide melody data indicating a melody that a singer should sing in accordance with the progress of the music;
A voice input function of generating singer voice data based on a singer's singing input corresponding to the progress of the music;
A pitch extraction function of extracting, based on the singer voice data, the pitch of the singer's singing corresponding to the progress of the music;
A first specifying function of specifying, when there is a period during which no pitch can be extracted continuously for a predetermined time or more with respect to the progress of the music by the pitch extraction function, the pitch extracted by the pitch extraction function at a predetermined timing before that period as a first pitch;
A pitch calculation function of calculating the pitch of the sound, among the sounds constituting the melody indicated by the guide melody data, that corresponds to the timing in the progress of the music at which the first pitch was extracted by the pitch extraction function, and calculating, as a second pitch, a pitch lower than the calculated pitch by a predetermined amount;
An identification function of outputting an identification signal when the pitches extracted by the pitch extraction function during the period from a second timing, which is a preset set time before the first timing in the progress of the music at which the first pitch was extracted by the pitch extraction function, to the first timing include a pitch higher than the second pitch, and the second pitch is higher than the first pitch by a predetermined pitch or more; and
An evaluation function of performing a predetermined process based on the identification signal output by the identification function.