JP3562223B2 - Karaoke equipment

Info

Publication number: JP3562223B2
Application number: JP15255997A
Authority: JP
Inventors: 兼久鶴見
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 1997-06-10
Filing date: 1997-06-10
Publication date: 2004-09-08
Anticipated expiration: 2017-06-10
Also published as: JPH113087A

Description

[0001]
[Technical Field of the Invention]
The present invention relates to a karaoke apparatus having a function of scoring a user's singing ability.
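[0002]
[Prior Art]
Various karaoke apparatuses having a function of scoring a singer's singing ability have been developed. In general, this type of karaoke apparatus compares the volume, pitch, and the like of the singer's singing voice with a vocal-part reference included in the karaoke song information, and scores singing ability according to the degree of agreement.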
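[0003]
[Problems to Be Solved by the Invention]
In a conventional karaoke apparatus, when a song consisting of a plurality of vocal parts, such as a duet song, is sung, singing ability is scored by comparing a signal obtained by mixing the singing voices input from a plurality of microphones (hereinafter, "mics") with a vocal-part reference (usually the reference values of the main vocal). Consequently, the singing voice of each part could not be evaluated fairly, and an accurate scoring result could not be obtained.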
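[0004]
In such a case, one could score singing ability by comparing each singing voice with the guide melody of its own part and summing the two scoring results. However, when an overall score that takes into account the degree of cooperation between the two singers is wanted, the following problems arise.
First, if one singer sings correctly and the other does not sing, simply adding the two scoring results to form the overall score lets the singer who did not sing drag the result down, so the ability of the singer who sang correctly is not fairly reflected in the score.
Second, a duet song contains, besides mixed singing sections in which the man and the woman sing at the same time, male singing sections in which only the man sings and female singing sections in which only the woman sings. If the two scoring results are summed in a section where only one singer should sing, the voice of the other singer, who should not be singing, is also scored, and an accurate scoring result cannot be obtained.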
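[0005]
Furthermore, a karaoke apparatus that supports duet songs as described above must score the parts simultaneously, so it presupposes two scorers. On the other hand, the songs sung on a karaoke apparatus are not only duet songs; ordinary songs with a single vocal part are in fact more common. For such a song, singing ability can be scored with one scorer alone, but it would be convenient if the other scorer could be used to raise the scoring accuracy.
[0006]
The present invention has been made against this background. Its object is to provide a karaoke apparatus that, when a plurality of vocal parts are sung as in a duet song, can evaluate the singing voice of each part fairly and obtain an accurate scoring result. Another object is to improve the accuracy of singing-ability scoring.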
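[0007]
[Means for Solving the Problems]
To solve the above problems, the invention according to claim 1 is a karaoke apparatus that plays song data and comprises selection means, first comparison means, second comparison means, supply means, calculation means, and evaluation means. The song data includes a first reference value and a second reference value, and its mixed singing sections, first solo singing sections, and second solo singing sections are identifiable. When the performance is in a mixed singing section, the selection means outputs the singing voice signal input from a first microphone from a first output terminal and the singing voice signal input from a second microphone from a second output terminal; when the performance is in a first solo singing section, it outputs the singing voice signal input from the first microphone from both the first and second output terminals; and when the performance is in a second solo singing section, it outputs the singing voice signal input from the second microphone from both the first and second output terminals. The first comparison means compares a feature quantity of the singing voice signal output from the first output terminal with the first or second reference value supplied to it, and the second comparison means compares a feature quantity of the singing voice signal output from the second output terminal with the first or second reference value supplied to it. When the performance is in a mixed singing section, the supply means supplies the first reference value to the first comparison means and the second reference value to the second comparison means; when the performance is in a first solo singing section, it supplies the first reference value to both comparison means; and when the performance is in a second solo singing section, it supplies the second reference value to both comparison means. The calculation means calculates and outputs the average of the comparison results of the first and second comparison means, and the evaluation means evaluates singing ability based on the output of the calculation means.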
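[0010]
The invention according to claim 2 is the karaoke apparatus according to claim 1, wherein each of the first and second comparison means detects, as a non-singing period, a period in which no singing voice signal is input, and, when a non-singing period is detected, the calculation means outputs, in place of the average, the unmodified comparison result of whichever of the first and second comparison means is not in a non-singing period.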
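[0011]
[Embodiments of the Invention]
An embodiment of the present invention is described below with reference to the drawings.
A: Overall configuration of the embodiment
FIG. 1 is a block diagram showing the overall configuration of a karaoke apparatus according to one embodiment of the present invention. In the figure, reference numeral 30 denotes a CPU that controls each part of the apparatus. Connected to the CPU 30 via a bus BUS are a ROM 31, a RAM 32, a hard disk drive (HDD) 37, a communication control unit 36, a remote control receiving unit 33, a display panel 34, panel switches 35, a tone generator 38, a voice data processing unit 39, an effect DSP 40, a character display unit 43, an LD changer 44, a display control unit 45, and a voice processing DSP 49.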
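[0012]
The ROM 31 stores an initial program needed to start the karaoke apparatus. When the apparatus is powered on, this initial program loads the system program and application programs stored on the HDD 37 into the RAM 32. In addition to the system program and application programs, the HDD 37 stores a song data file 370 that holds song data for about 10,000 songs to be played back during karaoke performance.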
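[0013]
The content of the song data is now described with reference to FIGS. 2 to 4. FIG. 2 shows the format of the song data for one song. FIGS. 3 and 4 show the content of each track of the song data.
In FIG. 2, the song data consists of a header, a musical tone track, a guide melody track, a lyrics track, a voice track, an effect track, and a voice data section. The header holds various information about the song data, such as the song number, song title, genre, release date, and performance time (length) of the song.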
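[0014]
As shown in FIGS. 3 and 4, each track from the musical tone track through the effect track consists of sequence data made up of a plurality of event data and duration data Δt indicating the time interval between events. During karaoke performance, the CPU 30 reads the data of the tracks in parallel under a sequence program (an application program for karaoke performance). When reading the sequence data of a track, it counts Δt with a predetermined tempo clock and, when the count ends, reads the event data that follows and outputs it to the appropriate processing unit. As shown in FIG. 3, the musical tone track contains tracks for various parts, including a melody track and a rhythm track.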
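The Δt-driven readout described in this paragraph can be sketched as follows. This is only an illustration under stated assumptions: the `Event` type, the tick unit, and the track names are hypothetical and are not taken from the patent, which does not specify the data layout at this level.

```python
from dataclasses import dataclass


@dataclass
class Event:
    delta_ticks: int  # duration data Δt: tempo-clock ticks to wait before this event
    payload: str      # event data (hypothetical string form, for illustration only)


def schedule(track):
    """Convert one track's (Δt, event) sequence into (absolute_tick, payload) pairs."""
    now = 0
    out = []
    for ev in track:
        now += ev.delta_ticks  # count Δt, then emit the event that follows it
        out.append((now, ev.payload))
    return out


def merge_tracks(tracks):
    """Interleave the events of several tracks read in parallel, ordered by tick."""
    merged = []
    for name, track in tracks.items():
        merged.extend((tick, name, payload) for tick, payload in schedule(track))
    return sorted(merged, key=lambda item: item[0])
```

For example, a melody track with a note-on at tick 0 and a note-off 48 ticks later, merged with a guide track whose first event comes after 24 ticks, yields the three events in tick order 0, 24, 48.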
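[0015]
As shown in FIG. 4, the guide melody track holds the sequence data of the melody of the vocal part of the karaoke song, that is, the melody the singer should sing. The CPU 30 generates reference pitch data and volume data from this data and compares them with the singing voice. When there are a plurality of vocal parts (for example, a main melody and a chorus melody), as in a duet song, there is a guide melody track for each part.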
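[0016]
The lyrics track consists of sequence data for displaying lyrics on the monitor 46. Although this sequence data is not musical tone data, the track is also written in MIDI data format to keep the implementation uniform and simplify the production work. The data type is the system exclusive message. The lyrics track normally consists of the character codes for one line of lyrics displayed on the monitor, the display coordinates on the monitor screen, the display time, and wipe sequence data. Wipe sequence data is sequence data for changing the display color of the lyrics as the song progresses: the timing at which the color changes (the time elapsed since the lyrics were displayed) and the change position (coordinates) are recorded in order over the length of one line.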
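[0017]
The voice track is a sequence track that specifies, among other things, when the voice data n (n = 1, 2, 3, ...) stored in the voice data section is to be sounded. The voice data section stores human voices, such as backing choruses, that are difficult for the tone generator 38 to synthesize. The voice track holds voice designation data and duration data Δt, which specifies the interval at which the voice designation data is read, that is, the timing at which voice data is output to the voice data processing unit 39 to form a voice signal. The voice designation data consists of a voice data number, pitch data, and volume data. The voice data number is the identification number n of a voice data item recorded in the voice data section. The pitch data and volume data specify the pitch and volume of the voice to be formed. A wordless backing chorus such as "aah" or "wa-wa-wa-wa" can be reused many times if its pitch and volume are changed, so one copy is stored at a basic pitch and volume and is used repeatedly with the pitch and volume shifted from that copy. The voice data processing unit 39 sets the output level according to the volume data and sets the pitch of the voice signal by changing the readout interval of the voice data according to the pitch data.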
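[0018]
The effect track holds DSP control data for controlling the effect DSP 40. The effect DSP 40 applies reverberation-type effects such as reverb to the signals input from the tone generator 38 and the voice data processing unit 39. The DSP control data consists of data specifying the type of effect and data specifying its degree, such as delay time and echo level.
[0019]
Such song data is read from the HDD 37 at the start of karaoke performance and loaded into the RAM 32.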
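[0020]
Next, the memory map of the RAM 32 is described with reference to FIG. 5. As shown in the figure, the RAM 32 contains a program storage area 324 that stores the loaded system program and application programs, an execution data storage area 323 that stores the song data for karaoke performance, a MIDI buffer 320 that temporarily stores the guide melody, a reference data register 321 that stores the reference data extracted from the guide melody, and a difference data storage area 322 that accumulates the difference data obtained by comparing the reference with the singing voice. The reference data register 321 consists of a pitch data register 321a and a volume data register 321b. The difference data storage area 322 consists of a pitch difference data storage area 322a and a volume difference data storage area 322b.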
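[0021]
Returning to FIG. 1, the description of the configuration of the karaoke apparatus continues. In the figure, the communication control unit 36 downloads song data and the like from a host computer (not shown) over an ISDN line, and its internal DMA controller transfers the received song data directly to the HDD 37 without going through the CPU 30.
The remote control receiving unit 33 receives the infrared signal sent from the remote control 51 and restores the input data. The remote control 51 has command switches such as a song selection switch, numeric key switches, and so on; when the user operates these switches, it transmits an infrared signal modulated with a code corresponding to the operation.
The display panel 34 is provided on the front of the karaoke apparatus and displays the code of the song currently playing, the number of reserved songs, and the like. The panel switches 35 are provided on the front of the karaoke apparatus and include a song code input switch, a key change switch, and so on. The scoring function can be turned on or off from the remote control 51 or the panel switches 35.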
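[0022]
The tone generator 38 forms musical tone signals based on the data of the musical tone track of the song data. The song data is read by the CPU 30 during karaoke performance, and the guide melody track, which is the comparison data, is read in parallel with the musical tone track. The tone generator 38 reads the data of the individual tracks within the musical tone track in parallel and forms the musical tone signals of a plurality of parts simultaneously.
[0023]
The voice data processing unit 39 forms a voice signal of the specified length and pitch based on the voice data included in the song data. The voice data consists of signal waveforms, such as backing choruses, that are difficult for the tone generator 38 to generate electronically and are therefore stored directly as ADPCM data. The musical tone signal formed by the tone generator 38 and the voice signal formed by the voice data processing unit 39 constitute the karaoke performance sound and are input to the effect DSP 40. The effect DSP 40 applies effects such as reverb and echo to the karaoke performance sound. The performance sound with effects applied is converted to an analog signal by the D/A converter 41 and then output to the amplifier/speaker 42.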
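[0024]
Reference numerals 47a and 47b denote singing mics. The singing voice signals V1 and V2 input from the mics 47a and 47b are amplified by preamplifiers (not shown) and then each input to the amplifier/speaker 42 and the selector 48.
[0025]
Under the control of the CPU 30, the selector 48 selects the singing voice signals V1 and V2 and outputs them to the voice processing DSP 49. The selector 48 has two switching modes: a straight mode, in which the singing voice signal V1 supplied to input terminal X1 is output from output terminal Y1 and the singing voice signal V2 supplied to input terminal X2 is output from output terminal Y2, and a mix mode, in which the singing voice signals V1 and V2 supplied to input terminals X1 and X2 are mixed and then output to output terminals Y1 and Y2.
The mode is determined by the combination of the song data and the operation of the remote control 51. For example, some songs include data for a harmony part, but whether to use the harmony function is left to the user. Specifically, when the user wants to sing with the harmony function, the user operates the remote control 51 to say so, and both the harmony part and the main vocal part are played; when no such operation is made, only the main vocal part is played. In this case, the straight mode is used when the harmony function is used, and the mix mode is used when it is not. In other words, the mode is selected according to the song data as configured by the user, including the various effects.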
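As a rough sketch of the two selector modes, the function below routes two sample sequences to the two outputs. It assumes that "mixing" means a sample-wise sum and that signals are equal-length lists of numbers; neither detail is stated in the patent, and the function names are hypothetical.

```python
def selector(v1, v2, mode):
    """Return (y1, y2) for inputs X1=v1 and X2=v2 in the given mode."""
    if mode == "straight":
        # X1 -> Y1, X2 -> Y2
        return list(v1), list(v2)
    if mode == "mix":
        # Mix V1 and V2 (assumed: sample-wise sum) and send the result to both outputs
        mixed = [a + b for a, b in zip(v1, v2)]
        return mixed, list(mixed)
    raise ValueError(f"unknown mode: {mode}")


def choose_mode(harmony_enabled):
    """Harmony function in use -> straight mode; otherwise -> mix mode."""
    return "straight" if harmony_enabled else "mix"
```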
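[0026]
Each of the singing voice signals V1 and V2 input to the voice processing DSP 49 is converted to a digital signal and then subjected to signal processing for scoring. The voice processing DSP 49 together with the CPU 30 implements the function of the scoring processing unit 50, which is described later.
The amplifier/speaker 42 amplifies the input karaoke performance sound and the singing voice signals, applies effects such as echo to the singing voice signals, and emits the result from the speaker.
[0027]
When a character code is input, the character display unit 43 reads the corresponding font data, such as a song title or lyrics, from an internal ROM (not shown) and outputs it. The LD changer 44 plays back the background video of the corresponding LD according to the input video selection data (chapter number). The video selection data is determined from the genre data of the karaoke song. The genre data is written in the header of the song data and is read by the CPU 30 at the start of karaoke performance. The CPU 30 decides from the genre data which background video to play and outputs video selection data specifying that video to the LD changer 44. The LD changer 44 holds about five laser discs and can play back about 120 scenes of background video. One background video is selected from these by the video selection data and output as video data. This video data and the font data, such as lyrics, output from the character display unit 43 are superimposed by the display control unit 45, and the composite image is displayed on the monitor 46. When the scoring processing unit 50 calculates a scoring result, characters corresponding to it are output from the character display unit 43 and displayed on the monitor 46.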
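[0028]
B: Scoring processing unit 50
Next, the scoring processing unit 50 of this embodiment is described. The scoring processing unit 50 is made up of hardware, such as the voice processing DSP 49 and the CPU 30 described above, and scoring software. FIG. 6 is a block diagram showing the configuration of the scoring processing unit 50. In the figure, the scoring processing unit 50 consists of a first scoring unit 50A, a second scoring unit 50B, a combining unit 50C, and an evaluation unit 50D.
The first and second scoring units 50A and 50B are made up of a pair of A/D converters 501a and 501b, data extraction units 502a and 502b, comparison units 503a and 503b, and filters 504a and 504b.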
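[0029]
The A/D converters 501a and 501b each convert a singing voice signal output from the selector 48 into a digital signal. The data extraction units 502a and 502b extract pitch data and volume data from each digitized singing voice signal every 100 ms. The comparison units 503a and 503b compare the pitch data and volume data extracted from each singing voice signal with the pitch data and volume data of the reference melody data #A and #B, respectively, calculate the differences, and output them as difference data Diffa and Diffb.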
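[0030]
The difference data Diffa and Diffb consist of the following data.
Ti: measurement time data (measured in relative time of the performance clock)
ΔT: duration data (time since the previous measurement time)
Mi: reference melody state data (whether the section requires singing: "1" in a singing section, "0" in a non-singing section)
Si: singing state data (presence of singing: "1" while singing, "0" while not singing)
Fi: pitch difference data (pitch difference on a log scale, in cents)
Li: volume difference data (volume difference on a log scale, in dB)
Here, "i" indicates the i-th sample.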
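One way to picture a single difference-data sample is the record below. The field names and the helper functions are hypothetical; the patent only states that Fi is in cents and Li in dB, so the conversions use the standard definitions of those units and assume pitch is available in Hz and volume as a linear amplitude.

```python
import math
from dataclasses import dataclass


@dataclass
class DiffSample:
    t: float        # Ti: measurement time (performance-clock relative time)
    dt: float       # ΔT: time since the previous measurement
    m: int          # Mi: 1 in a section that requires singing, else 0
    s: int          # Si: 1 while the singer is singing, else 0
    f_cents: float  # Fi: pitch difference in cents
    l_db: float     # Li: volume difference in dB


def pitch_diff_cents(sung_hz, ref_hz):
    """Pitch difference on a log scale: 1200 cents per octave."""
    return 1200.0 * math.log2(sung_hz / ref_hz)


def level_diff_db(sung_amp, ref_amp):
    """Volume difference on a log scale, for linear amplitudes."""
    return 20.0 * math.log10(sung_amp / ref_amp)
```

Singing one octave above the reference gives a pitch difference of 1200 cents, and a tenfold amplitude ratio gives 20 dB.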
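[0031]
Because the pitch difference data Fi and the volume difference data Li are expressed on a log scale, the computation in the combining unit 50C downstream can be simplified.
The reference melody state data Mi is generated by the CPU 30 from the song data for each part recorded in the guide melody track. Specifically, it is generated from the note-on and note-off statuses in that song data.
The singing state data Si is generated by the comparison units 503a and 503b by comparing the volume data supplied from the data extraction units 502a and 502b with a predetermined threshold. The threshold is set to a level at which it is possible to tell whether the user is singing.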
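The two state flags can be sketched as below. The event representation, the function names, and the default threshold of -40 dB are illustrative assumptions; the patent does not give a threshold value. The strict comparison follows the description of FIG. 7, where Si is "1" only when the volume exceeds the threshold.

```python
def reference_state(note_events, t):
    """Mi at time t, from (time, "on" | "off") events sorted by time."""
    state = 0
    for when, kind in note_events:
        if when > t:
            break
        state = 1 if kind == "on" else 0
    return state


def singing_state(volume_db, threshold_db=-40.0):
    """Si: 1 if the extracted volume exceeds the threshold, else 0."""
    return 1 if volume_db > threshold_db else 0
```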
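[0032]
The singing voice data, reference data, and difference data Diff are now described with reference to FIG. 7. FIGS. 7(A) and 7(B) show an example of a guide melody serving as the reference. FIG. 7(A) shows the guide melody in staff notation, and FIG. 7(B) shows the same content converted to pitch data and volume data with a gate time of about 80 percent. The volume rises and falls following the marking mp → crescendo → mp. FIG. 7(C), by contrast, shows an example of a singing voice. Both pitch and volume deviate slightly from the values indicated by the reference. As shown in the figure, the singing state data Si in this case is "1" when the volume data exceeds the threshold and "0" when it is at or below the threshold. The evaluation unit 50D, described later, does not treat samples whose singing state data Si is "0" as valid samples. Low-volume portions are ignored in this way because noise makes up a larger share of the pitch difference data Fi or volume difference data Li in such intervals, which would degrade scoring accuracy.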
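[0033]
The pitch difference data Fi and volume difference data Li normally vary within a certain range. When these values change abruptly, it can be assumed that an erroneous calculation occurred, for example because of a malfunction caused by noise. If singing ability were scored from pitch difference data Fi and volume difference data Li affected by noise, the singer's ability could not be evaluated fairly. The filters 504a and 504b are provided to invalidate the pitch difference data Fi and volume difference data Li in such cases.
[0034]
Each of the filters 504a and 504b contains a buffer, a subtractor, and a comparator. The buffer holds the pitch difference data Fi-1 and volume difference data Li-1 calculated for the previous sample. When the pitch difference data Fi and volume difference data Li for the current sample are input, the subtractor calculates ΔLi = |Li − Li-1| and ΔFi = |Fi − Fi-1|. The comparator compares ΔLi and ΔFi with predetermined thresholds Lr and Fr, respectively, and outputs a control signal that is "1" when the threshold is exceeded and "0" when it is not. The thresholds are determined from measured data so that invalid samples can be identified. When the control signal is "1", the filters 504a and 504b invalidate the current pitch difference data Fi and volume difference data Li.
In this way, samples that change greatly from the previous sample are invalidated, and the singer's ability can be evaluated fairly.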
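The filter can be sketched as a small stateful object. Two points are assumptions, because the text leaves them open: the sample is treated as invalid if either ΔFi or ΔLi exceeds its threshold, and the buffer is updated with every sample, including invalidated ones. The class name and threshold values are hypothetical.

```python
class JumpFilter:
    """Invalidate a sample whose Fi or Li jumps too far from the previous sample."""

    def __init__(self, f_threshold, l_threshold):
        self.fr = f_threshold  # Fr: allowed pitch-difference change (cents)
        self.lr = l_threshold  # Lr: allowed volume-difference change (dB)
        self.prev = None       # buffer holding (Fi-1, Li-1)

    def accept(self, f, l):
        """Return True if the current sample (Fi, Li) should be kept."""
        prev = self.prev
        self.prev = (f, l)     # assumed: the buffer always stores the latest sample
        if prev is None:
            return True        # first sample has nothing to compare against
        delta_f = abs(f - prev[0])  # ΔFi = |Fi - Fi-1|
        delta_l = abs(l - prev[1])  # ΔLi = |Li - Li-1|
        return not (delta_f > self.fr or delta_l > self.lr)
```

One consequence of updating the buffer unconditionally is that the sample immediately after a spike is also compared against the spike, so it may be rejected as well; a variant that keeps the last valid sample in the buffer would behave differently.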
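[0035]
Next, the combining unit 50C refers to the measurement time data Ti, combines the difference data Diffa and Diffb for the same time, and generates combined difference data Diffc. Besides the measurement time data Ti and duration data ΔT, the combined difference data Diffc consists of combined reference melody state data Mi', combined singing state data Si', combined pitch difference data Fi', and combined volume difference data Li'.
[0036]
If each item of the difference data Diffa is written with the suffix "1" and each item of the difference data Diffb with the suffix "2", the combined reference melody state data Mi' is calculated as the logical OR of Mi1 and Mi2, and the combined singing state data Si' as the logical OR of Si1 and Si2. The combined pitch difference data Fi' and combined volume difference data Li' are calculated from the following expressions according to Mi1 and Mi2 and to Si1 and Si2.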
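[0037]
1) When Mi1 * Mi2 * Si1 * Si2 = 1
In this case, for both scoring units, the sample lies in a valid singing section and the singer is singing. The average of the difference data is therefore calculated.
Fi' = (Fi1 + Fi2) / 2
Li' = (Li1 + Li2) / 2
[0038]
2) When Mi1 * Si1 = 1 and Mi2 * Si2 = 0
In this case, for the second scoring unit 50B, the sample lies in a non-singing section or the singer is not singing. For the first scoring unit 50A, on the other hand, the sample lies in a valid singing section and the singer is singing. The difference data Diffb is therefore ignored.
Fi' = Fi1
Li' = Li1
[0039]
3) When Mi1 * Si1 = 0 and Mi2 * Si2 = 1
In this case, for the first scoring unit 50A, the sample lies in a non-singing section or the singer is not singing. For the second scoring unit 50B, on the other hand, the sample lies in a valid singing section and the singer is singing. The difference data Diffa is therefore ignored.
Fi' = Fi2
Li' = Li2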
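[0040]
With the combining unit 50C configured in this way, if, for example, the male singer sings correctly in a mixed singing section of a duet song and the female singer does not sing, the portion in which the female singer did not sing is excluded from scoring, and the ability of the male singer who sang correctly can be taken as the ability of the pair.
Likewise, in a solo singing section of a duet song, a singing voice that should not be present is not scored, and an accurate scoring result can be obtained from the intended singing voice alone.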
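The three cases of the combining unit 50C can be written as one function. The argument layout is hypothetical, and the final branch (neither scoring unit has a valid sample) is not covered by the three cases in the text; returning `None` for Fi' and Li' there is an assumption made for this sketch.

```python
def combine(m1, s1, f1, l1, m2, s2, f2, l2):
    """Return (Mi', Si', Fi', Li') for one pair of same-time samples."""
    m = m1 | m2          # Mi' = Mi1 OR Mi2
    s = s1 | s2          # Si' = Si1 OR Si2
    valid1 = m1 & s1     # scoring unit 50A: singing section and singing
    valid2 = m2 & s2     # scoring unit 50B: singing section and singing
    if valid1 and valid2:
        f, l = (f1 + f2) / 2, (l1 + l2) / 2   # case 1: average
    elif valid1:
        f, l = f1, l1                          # case 2: ignore Diffb
    elif valid2:
        f, l = f2, l2                          # case 3: ignore Diffa
    else:
        f, l = None, None                      # assumed: no valid sample
    return m, s, f, l
```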
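[0041]
Next, the evaluation unit 50D, which includes a storage unit and other elements (not shown), calculates the scoring result from the difference data Diffa and Diffb or from the combined difference data Diffc. When the difference data Diffa and Diffb or the combined difference data Diffc is input, it is accumulated in the storage unit (that is, the difference data storage area 322 of the RAM 32). Which of Diffa, Diffb, and Diffc is accumulated in the storage unit is controlled by the CPU 30. This accumulation takes place continuously while the song is playing.
[0042]
When the song ends, the evaluation unit 50D reads the accumulated difference data from the storage unit in order, totals it separately for each musical element (pitch and volume), and derives a deduction for scoring from each total. It then subtracts each deduction from the full score (100 points) to obtain a score for each musical element and outputs the average of these scores as the scoring result.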
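The end-of-song evaluation can be sketched as follows. The patent does not say how a total is turned into a deduction, so the mapping used here (mean absolute difference times a weight, capped at 100) and the weights `per_cent` and `per_db` are purely illustrative assumptions, as is the choice to score 0 when no valid sample exists.

```python
def element_score(diffs, points_per_unit):
    """Score one musical element (pitch or volume) out of 100."""
    valid = [abs(d) for d in diffs if d is not None]  # skip invalidated samples
    if not valid:
        return 0.0  # assumed: nothing sung earns no points
    total = sum(valid)                                # accumulate the differences
    deduction = min(100.0, points_per_unit * total / len(valid))
    return 100.0 - deduction                          # subtract from full marks


def final_score(pitch_diffs_cents, level_diffs_db, per_cent=0.5, per_db=5.0):
    """Average the per-element scores to get the overall scoring result."""
    pitch = element_score(pitch_diffs_cents, per_cent)
    volume = element_score(level_diffs_db, per_db)
    return (pitch + volume) / 2
```

With these illustrative weights, an average pitch error of 20 cents and an average volume error of 2 dB each cost 10 points.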
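[0043]
C: Scoring operation of the embodiment
Next, the scoring operation of this embodiment (that is, the operation of the scoring processing unit 50) is described. In these examples, unless otherwise noted, the singer is assumed to be singing in the sections that should be sung, so that the singing state data Si = 1.
C-1: Scoring operation when a battle song is sung
First, the case in which two singers sing a battle song is described. In this case, the selector 48 is set to the straight mode, and the same reference melody data #A is supplied to both the first scoring unit 50A and the second scoring unit 50B. When the singing voice signals V1 and V2 are input to the first and second scoring units 50A and 50B, the two units generate the difference data Diffa and Diffb. Because scoring must be done separately for each singer in this case, the evaluation unit 50D generates one scoring result from the difference data Diffa and another from the difference data Diffb.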
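[0044]
C-2: Scoring operation when an ordinary song is sung
Next, the case in which one singer sings an ordinary song is described. In this case the difference data could be generated by either scoring unit alone, but in this embodiment, to reduce noise, the first and second scoring units 50A and 50B process the signal simultaneously and scoring is based on the average of their outputs.
For this purpose, the selector 48 is set to the mix mode, and the same reference melody data #A is supplied to both the first scoring unit 50A and the second scoring unit 50B. The combining unit 50C calculates the average of the difference data Diffa and the difference data Diffb and outputs it as the combined difference data Diffc.
[0045]
In general, the noise component is random noise, so averaging reduces it by 3 dB. The signal component, by contrast, is unchanged by averaging. The S/N ratio of the combined pitch difference data Fi' and combined volume difference data Li' in the combined difference data Diffc is therefore 3 dB better than that of the difference data Diffa and Diffb.
This reduces the noise components that arise from quantization error in the A/D converters 501a and 501b, from pitch detection error, and so on, so that singing ability can be scored accurately.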
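The 3 dB figure can be checked with a short derivation. It assumes that the two scoring units see the same signal component $s$ and that their noise components $n_1$ and $n_2$ are zero-mean, uncorrelated, and of equal variance $\sigma^2$; these symbols are introduced here for illustration and do not appear in the patent.

```latex
\begin{align*}
x_1 &= s + n_1, \qquad x_2 = s + n_2, \\
\bar{x} &= \tfrac{1}{2}(x_1 + x_2) = s + \tfrac{1}{2}(n_1 + n_2), \\
\operatorname{Var}\!\left[\tfrac{1}{2}(n_1 + n_2)\right]
  &= \tfrac{1}{4}\left(\sigma^2 + \sigma^2\right) = \tfrac{1}{2}\sigma^2, \\
\Delta\mathrm{SNR} &= 10 \log_{10}\frac{\sigma^2}{\sigma^2/2}
  = 10 \log_{10} 2 \approx 3.01\ \mathrm{dB}.
\end{align*}
```

If the two noise components were partly correlated, the improvement would be smaller than 3 dB.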
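[0046]
C-3: Scoring operation when a duet song is sung
Next, the case in which a man and a woman sing a duet song is described. A duet song generally contains male singing sections in which only the man sings, female singing sections in which only the woman sings, mixed singing sections in which the man and the woman sing at the same time, and prelude and interlude sections in which neither sings. In a mixed section both sing at once, so singing ability must be scored by the first and second scoring units 50A and 50B separately. In a male or female singing section, scoring would be possible with difference data from just one unit, but in this embodiment, to improve scoring accuracy, both scoring units are used to generate difference data in this case as well, and the combining unit 50C averages them to obtain the combined difference data.
[0047]
This is explained concretely with reference to FIG. 8. In this example, the man sings into mic 47a and the woman into mic 47b. FIG. 8(A) shows an example of the progression of a duet song. The song in this example proceeds in the order prelude section T1 → male singing section T2 → female singing section T3 → mixed singing section T4 → interlude section T5. FIG. 8(B) shows the mode of the selector 48, FIG. 8(C) shows the reference melody data supplied to the first scoring unit 50A, and FIG. 8(D) shows the reference melody data supplied to the second scoring unit 50B. #M denotes the reference melody data for the male part and #W that for the female part.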
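[0048]
First, the prelude section T1 and the interlude section T5 are not sections to be sung, so, as shown in FIGS. 8(C) and 8(D), no guide melody exists and these sections are excluded from scoring. The switching mode of the selector 48 may therefore be either the straight mode or the mix mode.
[0049]
Next, in the male singing section T2, the selector 48 is set to the mix mode. In this case, the CPU 30 connects input terminal X1 of the selector 48 to output terminals Y1 and Y2 and leaves input terminal X2 open. The male singing voice signal V1 output from mic 47a is therefore supplied to both the first scoring unit 50A and the second scoring unit 50B. In this section the reference melody data #M is supplied to both scoring units 50A and 50B, so the male singing voice signal V1 is compared with the male-part reference melody data #M by the two scoring units 50A and 50B, and their average is generated in the combining unit 50C. The evaluation unit 50D scores the section based on the combined difference data Diffc from the combining unit 50C. The combined difference data Diffc in this case has a better S/N ratio than the difference data Diffa and Diffb.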
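[0050]
Next, in the female singing section T3, the selector 48 is set to the mix mode, as in the male singing section T2. The internal connections of the selector 48, however, differ from those in the male singing section T2. In this case, the CPU 30 connects input terminal X2 of the selector 48 to output terminals Y1 and Y2 and leaves input terminal X1 open. The male singing voice signal V1 is therefore not output from the selector 48. In a section where only one of the two singers should sing, the two singing voice signals are not mixed onto output terminals Y1 and Y2; instead, the input from the other mic is left open. The reason is that if, for example, the man claps his hands during the female singing section T3, the clapping would be mixed in as noise and the woman's singing ability could not be evaluated fairly.
[0051]
When the female singing voice signal V2 is thus supplied to the first and second scoring units 50A and 50B, both units perform the comparison against the reference melody data #W. The comparison results are averaged by the combining unit 50C and output as the combined difference data Diffc, and the evaluation unit 50D scores the section based on it. Here too, as in the male singing section T2, the combined difference data Diffc has a better S/N ratio than the difference data Diffa and Diffb.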
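[0052]
Next, in the mixed singing section T4, the selector 48 is set to the straight mode. In this case, the CPU 30 connects input terminal X1 of the selector 48 to output terminal Y1 and input terminal X2 to output terminal Y2. The male singing voice signal V1 is therefore supplied to the first scoring unit 50A and the female singing voice signal V2 to the second scoring unit 50B. In this section the reference melody data #M and #W are supplied to the first and second scoring units 50A and 50B, respectively, so the two units output different difference data Diffa and Diffb. The combining unit 50C calculates the average of the two and generates the combined difference data Diffc.
[0053]
Suppose that the woman does not sing during part of this section (T4'). The singing state data Si2 of the second scoring unit 50B is then as shown in FIG. 8(E). During the period T4', the combining unit 50C therefore does not calculate the average but outputs the pitch difference data Fi1 and volume difference data Li1 generated by the first scoring unit 50A as the combined difference data Diffc, so the overall score is based on the man's singing ability.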
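The per-section control exercised by the CPU 30 in this duet example can be summarized as a lookup table. The section labels and the dictionary layout are hypothetical; the routing itself (which input feeds Y1 and Y2, and which reference goes to scoring units 50A and 50B) follows the description of sections T1 to T5 above.

```python
# Which selector input feeds each output, and which reference each scorer receives.
ROUTING = {
    "male_solo":    {"y1": "X1", "y2": "X1", "ref_a": "#M", "ref_b": "#M"},
    "female_solo":  {"y1": "X2", "y2": "X2", "ref_a": "#W", "ref_b": "#W"},
    "mixed":        {"y1": "X1", "y2": "X2", "ref_a": "#M", "ref_b": "#W"},
    "instrumental": None,  # prelude / interlude: no guide melody, not scored
}


def route(section, v1, v2):
    """Return ((signal, reference) for 50A, (signal, reference) for 50B), or None."""
    cfg = ROUTING[section]
    if cfg is None:
        return None
    inputs = {"X1": v1, "X2": v2}
    return (inputs[cfg["y1"]], cfg["ref_a"]), (inputs[cfg["y2"]], cfg["ref_b"])
```

In the male solo section both scoring units receive V1 with reference #M, which is what allows the two results to be averaged for a better S/N ratio.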
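[0054]
Thus, according to this embodiment, the CPU 30 controls the switching of the selector 48 and the reference guide melody data supplied to the first and second scoring units 50A and 50B based on the combination of the song data and the operation of the remote control 51, so that both scoring units are put to effective use and an accurate and reasonable scoring result can be calculated.
That is, when one singer sings, the scoring result is obtained from the combined difference data Diffc with its improved S/N ratio, and for a duet song, the operation of the combining unit 50C is switched according to the nature of each singing section, which yields an accurate and reasonable scoring result.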
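[0055]
D: Modifications
The present invention is not limited to the embodiment described above, and various modifications such as the following are possible.
(1) The embodiment dealt with karaoke performance of a duet song, but the apparatus may be extended to handle chorus singing with three or more vocal parts. In that case, the scoring processing unit 50 is extended to as many systems as there are parts, and as many guide melody tracks are prepared as there are parts.
(2) Instead of taking the average over the musical elements as the scoring result, as in the embodiment, the pitch, volume, or rhythm scores may be output as separate scoring results for each musical element.
(3) In the embodiment, scoring is done all at once after the song ends, but a basic evaluation may instead be made per phrase or per note and tallied after the song ends. Furthermore, the scoring result may be displayed on the monitor 46 phrase by phrase, with the final scoring result displayed after the song ends.
(4) In the embodiment, the average of the scores obtained for the vocal parts of a duet song is output, but the scores may instead be output individually, or both may be output. To output them individually, the evaluation unit 50D calculates a scoring result from each of the difference data Diffa and Diffb.
(5) In addition, the users' enjoyment can be further increased by adopting various display modes, such as highlighting the score of the singer with the highest scoring result among the plurality of singing voices.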
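[0056]
[Effects of the Invention]
As described above, according to the present invention, when a plurality of vocal parts are sung, as in a duet song, overall singing ability can be scored, and the scoring accuracy for solo singing periods can also be improved.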
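[Brief Description of the Drawings]
[FIG. 1] A block diagram showing the configuration of a karaoke apparatus according to one embodiment of the present invention.
[FIG. 2] A diagram showing the data format of the song data in the embodiment.
[FIG. 3] A diagram showing the structure of the musical tone track of the song data.
[FIG. 4] A diagram showing the structure of the tracks of the song data other than the musical tone track.
[FIG. 5] A diagram showing the memory map of the RAM in the karaoke apparatus.
[FIG. 6] A block diagram showing the configuration of the scoring processing unit in the karaoke apparatus.
[FIG. 7] (A) shows an example of a guide melody in the embodiment in staff notation, (B) shows the reference pitch data and volume data based on that guide melody, and (C) shows the pitch data, volume data, and singing state data of a singing voice.
[FIG. 8] A timing chart for the case in which a duet song is sung on the karaoke apparatus.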
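[Explanation of Reference Numerals]
30: CPU (control means, scoring means); 31: ROM; 32: RAM; 37: hard disk drive; 38: tone generator; 47a, 47b: mics (first and second microphones); 49: voice processing DSP; 50: scoring processing unit; 501a, 501b: A/D converters; 502a, 502b: data extraction units (first and second extraction means); 503a, 503b: comparison units (first and second comparison means).
[0001]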
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a karaoke apparatus having a function of scoring a singing ability of a user.
[0002]
[Prior art]
Various karaoke apparatuses having a function of scoring a singer's singing ability have been developed. In general, this type of karaoke apparatus compares the volume, pitch, and the like of the singer's singing voice with a vocal-part reference included in the karaoke song information, and scores singing ability according to the degree of agreement.
[0003]
[Problems to be solved by the invention]
In a conventional karaoke apparatus, when a song consisting of a plurality of vocal parts, such as a duet song, is sung, singing ability is scored by comparing a signal obtained by mixing the singing voices input from a plurality of microphones (hereinafter, "mics") with a vocal-part reference (usually the reference values of the main vocal). Consequently, the singing voice of each part could not be evaluated fairly, and an accurate scoring result could not be obtained.
[0004]
In such a case, one could score singing ability by comparing each singing voice with the guide melody of its own part and summing the two scoring results. However, when an overall score that takes into account the degree of cooperation between the two singers is wanted, the following problems arise.
First, if one singer sings correctly and the other does not sing, simply adding the two scoring results to form the overall score lets the singer who did not sing drag the result down, so the ability of the singer who sang correctly is not fairly reflected in the score.
Second, a duet song contains, besides mixed singing sections in which the man and the woman sing at the same time, male singing sections in which only the man sings and female singing sections in which only the woman sings. If the two scoring results are summed in a section where only one singer should sing, the voice of the other singer, who should not be singing, is also scored, and an accurate scoring result cannot be obtained.
[0005]
Furthermore, a karaoke apparatus that supports duet songs as described above must score the parts simultaneously, so it presupposes two scorers. On the other hand, the songs sung on a karaoke apparatus are not only duet songs; ordinary songs with a single vocal part are in fact more common. For such a song, singing ability can be scored with one scorer alone, but it would be convenient if the other scorer could be used to raise the scoring accuracy.
[0006]
The present invention has been made against this background. Its object is to provide a karaoke apparatus that, when a plurality of vocal parts are sung as in a duet song, can evaluate the singing voice of each part fairly and obtain an accurate scoring result. Another object is to improve the accuracy of singing-ability scoring.
[0007]
[Means for Solving the Problems]
To solve the above problems, the invention according to claim 1 is a karaoke apparatus that plays song data and comprises selection means, first comparison means, second comparison means, supply means, calculation means, and evaluation means. The song data includes a first reference value and a second reference value, and its mixed singing sections, first solo singing sections, and second solo singing sections are identifiable. When the performance is in a mixed singing section, the selection means outputs the singing voice signal input from a first microphone from a first output terminal and the singing voice signal input from a second microphone from a second output terminal; when the performance is in a first solo singing section, it outputs the singing voice signal input from the first microphone from both the first and second output terminals; and when the performance is in a second solo singing section, it outputs the singing voice signal input from the second microphone from both the first and second output terminals. The first comparison means compares a feature quantity of the singing voice signal output from the first output terminal with the first or second reference value supplied to it, and the second comparison means compares a feature quantity of the singing voice signal output from the second output terminal with the first or second reference value supplied to it. When the performance is in a mixed singing section, the supply means supplies the first reference value to the first comparison means and the second reference value to the second comparison means; when the performance is in a first solo singing section, it supplies the first reference value to both comparison means; and when the performance is in a second solo singing section, it supplies the second reference value to both comparison means. The calculation means calculates and outputs the average of the comparison results of the first and second comparison means, and the evaluation means evaluates singing ability based on the output of the calculation means.
[0010]
The invention according to claim 2 is the karaoke apparatus according to claim 1, wherein each of the first and second comparison means detects, as a non-singing period, a period in which no singing voice signal is input, and, when a non-singing period is detected, the calculation means outputs, in place of the average, the unmodified comparison result of whichever of the first and second comparison means is not in a non-singing period.
[0011]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
A: Overall configuration of the embodiment
FIG. 1 is a block diagram showing the overall configuration of a karaoke apparatus according to one embodiment of the present invention. In the figure, reference numeral 30 denotes a CPU that controls each part of the apparatus. Connected to the CPU 30 via a bus BUS are a ROM 31, a RAM 32, a hard disk drive (HDD) 37, a communication control unit 36, a remote control receiving unit 33, a display panel 34, panel switches 35, a tone generator 38, a voice data processing unit 39, an effect DSP 40, a character display unit 43, an LD changer 44, a display control unit 45, and a voice processing DSP 49.
[0012]
The ROM 31 stores an initial program needed to start the karaoke apparatus. When the apparatus is powered on, this initial program loads the system program and application programs stored on the HDD 37 into the RAM 32. In addition to the system program and application programs, the HDD 37 stores a song data file 370 that holds song data for about 10,000 songs to be played back during karaoke performance.
[0013]
The content of the song data is now described with reference to FIGS. 2 to 4. FIG. 2 shows the format of the song data for one song. FIGS. 3 and 4 show the content of each track of the song data.
In FIG. 2, the music data includes a header, a musical sound track, a guide melody track, a lyrics track, an audio track, an effect track, and an audio data section. Various information regarding the music data is written in the header, and data such as a music number, a music title, a genre, a release date, and a music playing time (length) are written in the header.
[0014]
As shown in FIGS. 3 and 4, each track from the musical tone track through the effect track consists of sequence data made up of a plurality of event data and duration data Δt indicating the time interval between events. During karaoke performance, the CPU 30 reads the data of the tracks in parallel under a sequence program (an application program for karaoke performance). When reading the sequence data of a track, it counts Δt with a predetermined tempo clock and, when the count ends, reads the event data that follows and outputs it to the appropriate processing unit. As shown in FIG. 3, the musical tone track contains tracks for various parts, including a melody track and a rhythm track.
[0015]
As shown in FIG. 4, the melody of the vocal part of the karaoke tune, that is, the melody sequence data to be sung by the singer is written in the guide melody track. The CPU 30 generates reference pitch data and volume data based on the data and compares the generated data with the singing voice. When there are a plurality of vocal parts (for example, a main melody and a chorus melody) as in a duet song, there is a guide melody track corresponding to each part.
[0016]
The lyrics track consists of sequence data for displaying lyrics on the monitor 46. Although this sequence data is not musical tone data, the track is also written in MIDI data format to keep the implementation uniform and simplify the production work. The data type is the system exclusive message. The lyrics track normally consists of the character codes for one line of lyrics displayed on the monitor, the display coordinates on the monitor screen, the display time, and wipe sequence data. Wipe sequence data is sequence data for changing the display color of the lyrics as the song progresses: the timing at which the color changes (the time elapsed since the lyrics were displayed) and the change position (coordinates) are recorded in order over the length of one line.
[0017]
The voice track is a sequence track that specifies, among other things, when the voice data n (n = 1, 2, 3, ...) stored in the voice data section is to be sounded. The voice data section stores human voices, such as backing choruses, that are difficult for the tone generator 38 to synthesize. The voice track holds voice designation data and duration data Δt, which specifies the interval at which the voice designation data is read, that is, the timing at which voice data is output to the voice data processing unit 39 to form a voice signal. The voice designation data consists of a voice data number, pitch data, and volume data. The voice data number is the identification number n of a voice data item recorded in the voice data section. The pitch data and volume data specify the pitch and volume of the voice to be formed. A wordless backing chorus such as "aah" or "wa-wa-wa-wa" can be reused many times if its pitch and volume are changed, so one copy is stored at a basic pitch and volume and is used repeatedly with the pitch and volume shifted from that copy. The voice data processing unit 39 sets the output level according to the volume data and sets the pitch of the voice signal by changing the readout interval of the voice data according to the pitch data.
[0018]
In the effect track, DSP control data for controlling the effect DSP 40 is written. The effect DSP 40 applies reverberation or other reverberation-based effects to signals input from the sound source device 38 and the audio data processing unit 39. The DSP control data is composed of data for specifying the kind of the effect and data for specifying the degree of effect application such as the delay time and the echo level.
[0019]
Such music data is read from the HDD 37 at the start of the karaoke performance and loaded into the RAM 32.
[0020]
Next, the memory map of the RAM 32 is described with reference to FIG. 5. As shown in the figure, the RAM 32 contains a program storage area 324 that stores the loaded system program and application programs, an execution data storage area 323 that stores the song data for karaoke performance, a MIDI buffer 320 that temporarily stores the guide melody, a reference data register 321 that stores the reference data extracted from the guide melody, and a difference data storage area 322 that accumulates the difference data obtained by comparing the reference with the singing voice. The reference data register 321 consists of a pitch data register 321a and a volume data register 321b. The difference data storage area 322 consists of a pitch difference data storage area 322a and a volume difference data storage area 322b.
[0021]
Now, the configuration of the karaoke apparatus will be described again with reference to FIG. In the figure, a communication control unit 36 downloads music data and the like from a host computer (not shown) via an ISDN line, and transfers music data received by an internal DMA controller directly to the HDD 37 without passing through the CPU 30.
The remote control receiver 33 receives the infrared signal sent from the remote controller 51 and restores the input data. The remote controller 51 includes a command switch such as a music selection switch, a numeric key switch, and the like. When the user operates these switches, an infrared signal modulated with a code corresponding to the operation is transmitted.
The display panel 34 is provided on the front of the karaoke apparatus, and displays the currently playing music code and the number of reserved music. The panel switch 35 is provided on the front of the karaoke apparatus, and includes a music code input switch, a key change switch, and the like. Further, the on / off of the scoring function can be designated by the remote controller 51 or the panel switch 35.
[0022]
The sound source device 38 forms a tone signal based on the data of the tone track of the music data. The music data is read by the CPU 30 during a karaoke performance, and the guide melody track, which is comparison data, is read in parallel with the musical sound track. The tone generator 38 reads out the data of each of the musical tone tracks in parallel, and simultaneously generates musical tone signals of a plurality of parts.
[0023]
The audio data processing unit 39 forms an audio signal having a specified length and a specified pitch based on the audio data included in the music data. The audio data is a signal waveform that is hardly generated electronically by the sound source device 38 such as a back chorus and is directly converted into ADPCM data and stored. The tone signal formed by the sound source device 38 and the sound signal formed by the sound data processing section 39 are karaoke performance sounds, which are input to the effect DSP 40. The effect DSP 40 adds effects such as reverb and echo to the karaoke performance sound. The karaoke performance sound to which the effect has been added is converted into an analog signal by the D / A converter 41 and then output to the amplifier speaker 42.
[0024]
Reference numerals 47a and 47b denote singing mics. The singing voice signals V1 and V2 input from the mics 47a and 47b are amplified by preamplifiers (not shown) and then each input to the amplifier/speaker 42 and the selector 48.
[0025]
Under the control of the CPU 30, the selector 48 selects the singing voice signals V1 and V2 and outputs them to the voice processing DSP 49. The selector 48 has two switching modes: a straight mode, in which the singing voice signal V1 supplied to input terminal X1 is output from output terminal Y1 and the singing voice signal V2 supplied to input terminal X2 is output from output terminal Y2, and a mix mode, in which the singing voice signals V1 and V2 supplied to input terminals X1 and X2 are mixed and then output to output terminals Y1 and Y2.
The mode is determined by the combination of the song data and the operation of the remote control 51. For example, some songs include data for a harmony part, but whether to use the harmony function is left to the user. Specifically, when the user wants to sing with the harmony function, the user operates the remote control 51 to say so, and both the harmony part and the main vocal part are played; when no such operation is made, only the main vocal part is played. In this case, the straight mode is used when the harmony function is used, and the mix mode is used when it is not. In other words, the mode is selected according to the song data as configured by the user, including the various effects.
[0026]
Each of the singing voice signals V1 and V2 input to the voice processing DSP 49 is converted into a digital signal, and then subjected to signal processing for scoring processing. The configuration of the voice processing DSP 49 and the CPU 30 realizes the function of the scoring processing unit 50. This will be described later.
The amplifier speaker 42 amplifies the input karaoke performance sound and each singing voice signal, gives an effect such as an echo to each singing voice signal, and emits the sound from the speaker.
[0027]
When a character code is input, the character display unit 43 reads font data such as a song title and lyrics corresponding to the character code from an internal ROM (not shown) and outputs the data. The LD changer 44 reproduces the background video of the corresponding LD based on the input video selection data (chapter number). The video selection data is determined based on the genre data of the karaoke song. This genre data is written in the header of the music data, and is read by the CPU 30 at the start of the karaoke performance. The CPU 30 determines which background video is to be reproduced based on the genre data, and outputs video selection data specifying the background video to the LD changer 44. The LD changer 44 contains about five laser disks, and can reproduce about 120 scenes of background video. One background video is selected from among them according to the video selection data, and is output as video data. The video data and font data such as lyrics output from the character display unit 43 are superimposed by the display control unit 45, and the composite image is displayed on the monitor 46. When the scoring result is calculated by the scoring processing unit 50, a character corresponding to the scoring result is output from the character display unit 43 and displayed on the monitor 46.
[0028]
B: Scoring processing unit 50
Next, the scoring processing unit 50 of the present embodiment will be described. The scoring processing unit 50 is configured by hardware such as the above-described voice processing DSP 49 and CPU 30 and scoring software. FIG. 6 is a block diagram illustrating a configuration of the scoring processing unit 50. In the figure, the scoring processing unit 50 includes a first scoring unit 50A, a second scoring unit 50B, a combining unit 50C, and an evaluation unit 50D.
The first and second scoring units 50A and 50B are composed of a pair of A / D converters 501a and 501b, data extraction units 502a and 502b, comparison units 503a and 503b, and filters 504a and 504b.
[0029]
The A/D converters 501a and 501b each convert the singing voice signal output from the selector 48 into a digital signal. The data extraction units 502a and 502b extract pitch data and volume data from each digitized singing voice signal every 100 ms. The comparison units 503a and 503b compare the pitch data and the volume data extracted from each singing voice signal with the pitch data and the volume data of the reference melody data #A and #B, respectively, calculate the differences between them, and output the results as difference data Diffa and Diffb.
[0030]
Here, the difference data Diffa and Diffb are composed of the following data.
Ti: Measurement time data (measured by relative time of performance clock)
ΔT: Duration data (time since last measurement time)
Mi: Reference melody status data
(Whether the section requires singing, "1" for singing section, "0" for non-singing section)
Si: Singing state data (singing presence / absence, “1” during singing, “0” during non-singing)
Fi: Pitch difference data (pitch difference is indicated by log scale (cent unit))
Li: Volume difference data (indicating the volume difference on a log scale (dB unit))
Here, “i” indicates that it is the i-th sample.
[0031]
In this case, since the pitch difference data Fi and the volume difference data Li are expressed on a log scale, the calculation in the combining unit 50C at the subsequent stage can be simplified.
The reference melody state data Mi is generated by the CPU 30 based on music data corresponding to each part recorded on the guide melody track. Specifically, it is generated from the note-on status and the note-off status in the music data.
The singing state data Si is generated by each of the comparison units 503a and 503b by comparing each volume data supplied from the data extraction units 502a and 502b with a predetermined threshold. In this case, the threshold is set to a level at which it is possible to determine whether the user is singing.
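The items of the difference data described above can be illustrated with a minimal Python sketch. The record layout, the function name make_sample, the threshold value, and the use of frequency in Hz and linear amplitude as inputs are all assumptions made for illustration; the specification states only that Fi is expressed in cents, Li in dB, and that Si is obtained by comparing the volume with a predetermined threshold. Setting Fi and Li to zero for samples outside a valid singing period is likewise an assumption.

```python
import math
from dataclasses import dataclass

# Hypothetical linear-amplitude threshold above which the user is taken to be singing.
SING_THRESHOLD = 0.05


@dataclass
class DiffSample:
    t: float        # Ti: measurement time (relative time of the performance clock)
    dt: float       # delta T: time elapsed since the previous measurement
    m: int          # Mi: 1 in a section that requires singing, 0 otherwise
    s: int          # Si: 1 while singing is detected, 0 otherwise
    f_cents: float  # Fi: pitch difference in cents
    l_db: float     # Li: volume difference in dB


def make_sample(t, dt, note_on, sung_hz, sung_amp, ref_hz, ref_amp):
    """Build one difference-data sample from a sung frame and its reference."""
    s = 1 if sung_amp > SING_THRESHOLD else 0  # singing state from volume threshold
    m = 1 if note_on else 0                    # reference state from note-on/note-off
    if s and m:
        f = 1200.0 * math.log2(sung_hz / ref_hz)     # cents
        l = 20.0 * math.log10(sung_amp / ref_amp)    # dB
    else:
        f = l = 0.0  # assumption: no difference is computed outside valid singing
    return DiffSample(t, dt, m, s, f, l)
```

For example, a voice sung one octave above the reference at the same volume yields Fi = 1200 cents and Li = 0 dB.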
[0032]
Here, the singing voice data, the reference data, and the difference data Diff will be described with reference to FIG. 7. FIGS. 7A and 7B are diagrams showing examples of a guide melody serving as a reference. FIG. 7A shows the guide melody in staff notation, and FIG. 7B shows the contents of the staff converted into pitch data and volume data with a gate time of about 80%. The volume rises and falls according to the instruction of mp → crescendo → mp. On the other hand, FIG. 7C shows an example of a singing voice. Both the pitch and the volume fluctuate slightly from the values indicated by the reference. The singing state data Si in this case becomes “1” when the volume data exceeds the threshold, as shown in the figure, and becomes “0” when the volume data is lower than the threshold. The evaluation unit 50D, which will be described later, does not treat a sample whose singing state data Si is “0” as a valid sample. The low-volume part is ignored because, in such a section, the proportion of the noise component in the pitch difference data Fi and the volume difference data Li is large, and the scoring accuracy would be degraded.
[0033]
Incidentally, the pitch difference data Fi and the volume difference data Li usually fluctuate within a certain range, and when these values change suddenly, it can be assumed that an erroneous calculation was performed because of a malfunction caused by noise or the like. If the singing ability is scored based on pitch difference data Fi and volume difference data Li affected by such noise, the singing ability of the singer cannot be properly evaluated. The filters 504a and 504b are provided to invalidate the pitch difference data Fi and the volume difference data Li in such a case.
[0034]
Each of the filters 504a and 504b has a buffer, a subtractor, and a comparator. The buffer stores the pitch difference data F(i−1) and the volume difference data L(i−1) calculated for the immediately preceding sample. When the pitch difference data Fi and the volume difference data Li corresponding to the current sample are input, the subtractor calculates ΔLi = |Li − L(i−1)| and ΔFi = |Fi − F(i−1)|. The comparator compares ΔLi and ΔFi with predetermined thresholds Lr and Fr, respectively, and outputs a control signal that becomes “1” when a threshold is exceeded and “0” when both values are below their thresholds. Here, each threshold value is determined from various actually measured data so that invalid samples can be identified. When the control signal is “1”, the filters 504a and 504b invalidate the current pitch difference data Fi and volume difference data Li.
As a result, a sample that changes greatly from the previous sample can be invalidated, and the singing ability of the singer can be properly evaluated.
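The operation of the filters 504a and 504b may be sketched in Python as follows. The function name filter_samples and the default threshold values are hypothetical, and treating a sample as invalid when either ΔFi or ΔLi exceeds its threshold is one reading of the comparator described above, not a confirmed detail. The buffer is updated with every sample, valid or not, following the statement that it holds the immediately preceding sample.

```python
def filter_samples(pairs, fr=100.0, lr=6.0):
    """Flag samples that jump too far from the previous one.

    pairs: list of (Fi, Li) tuples in cents and dB.
    fr, lr: hypothetical thresholds Fr (cents) and Lr (dB).
    Returns a list of (Fi, Li, valid) tuples.
    """
    out = []
    prev = None  # buffer holding (F(i-1), L(i-1))
    for f, l in pairs:
        if prev is None:
            valid = True  # assumption: the first sample has nothing to compare with
        else:
            delta_f = abs(f - prev[0])  # subtractor output for pitch
            delta_l = abs(l - prev[1])  # subtractor output for volume
            control = 1 if (delta_f > fr or delta_l > lr) else 0  # comparator
            valid = control == 0
        out.append((f, l, valid))
        prev = (f, l)  # buffer always holds the immediately preceding sample
    return out
```

Under this reading, a single noisy spike invalidates both the spike itself and the sample that follows it, since the return to the normal range is also a large change from the buffered value.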
[0035]
Next, the combining unit 50C combines the difference data Diffa and Diffb having the same measurement time, by referring to the measurement time data Ti, to generate combined difference data Diffc. The combined difference data Diffc is composed of combined reference melody state data Mi′, combined singing state data Si′, combined pitch difference data Fi′, and combined volume difference data Li′, in addition to the measurement time data Ti and the duration data ΔT.
[0036]
Here, let each item of the difference data Diffa be denoted by the suffix “1” and each item of the difference data Diffb by the suffix “2”. The combined reference melody state data Mi′ is calculated as the logical sum of Mi1 and Mi2, and the combined singing state data Si′ is calculated as the logical sum of Si1 and Si2. The combined pitch difference data Fi′ and the combined volume difference data Li′ are calculated by the following equations, depending on Mi1, Mi2, Si1, and Si2.
[0037]
1) When Mi1 * Mi2 * Si1 * Si2 = 1
In this case, for both scoring units, the sample lies in a valid singing section and the singer is singing. Therefore, the average value of the difference data is calculated.
Fi ′ = (Fi1 + Fi2) / 2
Li ′ = (Li1 + Li2) / 2
[0038]
2) When Mi1 * Si1 = 1 and Mi2 * Si2 = 0
In this case, for the second scoring unit 50B, the sample lies either in a non-singing section or in a period during which the singer is not singing. For the first scoring unit 50A, on the other hand, the sample lies in a valid singing section and the singer is singing. Therefore, the difference data Diffb is ignored.
Fi ′ = Fi1
Li ′ = Li1
[0039]
3) When Mi1 * Si1 = 0 and Mi2 * Si2 = 1
In this case, for the first scoring unit 50A, the sample lies either in a non-singing section or in a period during which the singer is not singing. For the second scoring unit 50B, on the other hand, the sample lies in a valid singing section and the singer is singing. Therefore, the difference data Diffa is ignored.
Fi ′ = Fi2
Li ′ = Li2
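The three cases above can be summarized in a short Python sketch. The function name combine and the tuple return value are illustrative assumptions. The text does not define Fi′ and Li′ when neither scoring unit has a valid sample, so returning None for them in that situation is also an assumption.

```python
def combine(m1, s1, f1, l1, m2, s2, f2, l2):
    """Combine one sample from each scoring unit into (Mi', Si', Fi', Li')."""
    m = m1 | m2  # Mi': logical sum of Mi1 and Mi2
    s = s1 | s2  # Si': logical sum of Si1 and Si2
    valid1 = m1 & s1  # unit 50A: valid singing section and singing detected
    valid2 = m2 & s2  # unit 50B: valid singing section and singing detected
    if valid1 and valid2:   # case 1: average both units
        f, l = (f1 + f2) / 2, (l1 + l2) / 2
    elif valid1:            # case 2: ignore Diffb
        f, l = f1, l1
    elif valid2:            # case 3: ignore Diffa
        f, l = f2, l2
    else:                   # not covered by the text; assumed to carry no value
        f, l = None, None
    return m, s, f, l
```

For instance, if the first unit reports 10 cents and 2 dB, and the second reports 30 cents and 4 dB, with both valid, the combined sample is 20 cents and 3 dB; if the second singer is silent, the first unit's values pass through unchanged.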
[0040]
By configuring the combining unit 50C in this way, for example, in a mixed singing section of a duet song, if the male singer sings correctly and the female singer does not sing, the part that the female singer did not sing is excluded from the scoring, and the singing ability of the male singer who sang correctly can be taken as the singing ability of both.
In addition, in a single singing section of a duet song, a singing voice that is not supposed to be sung in that section is excluded from scoring, and an accurate scoring result can be obtained based only on the singing voice that is supposed to be sung.
[0041]
Next, the evaluation unit 50D includes a storage unit and the like (not shown), and calculates a scoring result based on the difference data Diffa and Diffb or the combined difference data Diffc. When the difference data Diffa and Diffb or the combined difference data Diffc is input, it is accumulated in the storage unit (that is, the difference data storage area 322 of the RAM 32). The CPU 30 controls which of Diffa, Diffb, and Diffc is stored in the storage unit. This accumulation is performed continuously during the performance of the music.
[0042]
When the performance of the music is completed, the evaluation unit 50D sequentially reads out the difference data stored in the storage unit, accumulates the difference data for each of the musical elements of pitch and volume, and obtains a deduction value for scoring based on each accumulated value. Each deduction value is then subtracted from the full score (100 points) to obtain a score for each musical element, and the average of these scores is output as the scoring result.
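The evaluation described above may be sketched as follows. The specification does not state how an accumulated value is converted into a deduction value, so the conversion used here (mean absolute difference divided by a hypothetical scale factor, capped at 100 points) is purely an assumption for illustration, as are the function name score and its parameters. Only samples with singing state data Si = 1 are treated as valid.

```python
def score(samples, cents_per_point=5.0, db_per_point=0.5):
    """Compute a scoring result from stored difference data.

    samples: list of (Si, Fi, Li) tuples.
    cents_per_point, db_per_point: hypothetical scale factors that turn a
    mean absolute difference into deducted points (not from the source).
    Returns the average of the pitch and volume scores, or None if no
    valid sample exists.
    """
    valid = [(f, l) for (s, f, l) in samples if s == 1]
    if not valid:
        return None
    n = len(valid)
    acc_pitch = sum(abs(f) for f, _ in valid)    # accumulated pitch difference
    acc_volume = sum(abs(l) for _, l in valid)   # accumulated volume difference
    deduct_pitch = min(100.0, acc_pitch / n / cents_per_point)
    deduct_volume = min(100.0, acc_volume / n / db_per_point)
    pitch_score = 100.0 - deduct_pitch           # subtract from the full score
    volume_score = 100.0 - deduct_volume
    return (pitch_score + volume_score) / 2.0    # average over musical elements
```

With these assumed scale factors, a singer who is consistently 50 cents and 5 dB away from the reference loses 10 points on each musical element.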
[0043]
C: Scoring operation of the embodiment
Next, the scoring operation (that is, the operation of the scoring processing unit 50) according to the present embodiment will be described. In this example, unless otherwise specified, it is assumed that the singer is singing in the section to be sung and the singing state data Si = 1.
C-1: Scoring operation when singing a battle song
First, a case where two singers sing a battle song will be described. In this case, the selector 48 is set to the straight mode, and the same reference melody data #A is supplied to the first scoring unit 50A and the second scoring unit 50B. As a result, when the singing voice signals V1 and V2 are input to the first and second scoring units 50A and 50B, the first scoring unit 50A and the second scoring unit 50B generate difference data Diffa and Diffb, respectively. In this case, since scoring must be performed for each singer, the evaluation unit 50D generates one scoring result based on the difference data Diffa and another based on the difference data Diffb.
[0044]
C-2: Scoring operation when singing a normal song
Next, a case where one singer sings a normal song will be described. In this case, the difference data could be generated by only one of the scoring units, but in the present embodiment, in order to reduce noise, the first and second scoring units 50A and 50B generate difference data simultaneously, and scoring is performed based on their average value.
Therefore, the selector 48 is set to the mix mode, and the same reference melody data #A is supplied to the first scoring unit 50A and the second scoring unit 50B. Then, the combining unit 50C calculates an average value of the difference data Diffa and the difference data Diffb, and outputs the result as combined difference data Diffc.
[0045]
In general, since the noise component is random noise, averaging the two outputs reduces the noise component by 3 dB. On the other hand, the signal component does not change even if the average is taken. Therefore, the SN ratio of the combined pitch difference data Fi′ and the combined volume difference data Li′ in the combined difference data Diffc is improved by 3 dB compared with that of the difference data Diffa and the difference data Diffb.
As a result, the noise component caused by quantization errors in the A/D converters 501a and 501b and by errors in pitch detection can be reduced, and the singing ability can be scored with high accuracy.
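The 3 dB figure follows from standard noise averaging, assuming that the noise in the two scoring units is independent and of equal power: the average of N such channels has 1/N of the noise power, while a signal common to all channels is unchanged. A minimal Python sketch of this arithmetic is given below; the function names are illustrative and do not appear in the specification.

```python
import math


def noise_power_after_averaging(noise_power, n_channels):
    """Noise power of the average of n independent, equal-power noise channels."""
    return noise_power / n_channels


def averaging_snr_gain_db(n_channels):
    """SN ratio improvement in dB obtained by averaging n such channels."""
    return 10.0 * math.log10(n_channels)
```

For the two scoring units 50A and 50B, averaging_snr_gain_db(2) evaluates to about 3.01 dB, which matches the improvement stated above. The gain would be smaller if the noise in the two units were correlated.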
[0046]
C-3: Scoring operation when singing duet music
Next, a case where male and female singers sing a duet song will be described. A duet song generally contains a male singing section in which only the man sings, a female singing section in which only the woman sings, a mixed singing section in which the man and the woman sing simultaneously, and prelude and interlude sections in which neither sings. In the mixed singing section, since both sing at the same time, the singing ability must be scored separately in the first and second scoring units 50A and 50B. In the male singing section or the female singing section, on the other hand, scoring could be performed by generating difference data in only one of the scoring units. In the present embodiment, however, in order to improve the scoring accuracy, difference data is generated using both scoring units in this case as well, and the difference data is averaged by the combining unit 50C to obtain the combined difference data.
[0047]
This point will be specifically described with reference to FIG. 8. In this example, it is assumed that a man sings with the microphone 47a and a woman sings with the microphone 47b. FIG. 8A shows an example of the progress of a duet song. The duet song in this example proceeds in the order of prelude section T1 → male singing section T2 → female singing section T3 → mixed singing section T4 → interlude section T5. FIG. 8B shows the mode of the selector 48, FIG. 8C shows the reference melody data supplied to the first scoring unit 50A, and FIG. 8D shows the reference melody data supplied to the second scoring unit 50B. Note that #M indicates the reference melody data corresponding to the male part and #W indicates the reference melody data corresponding to the female part.
[0048]
First, since the prelude section T1 and the interlude section T5 are not sections in which singing is intended, no guide melody exists, as shown in FIGS. 8B and 8C, and these sections are excluded from scoring. Therefore, the switching mode of the selector 48 may be either the straight mode or the mix mode.
[0049]
Next, in the male singing section T2, the selector 48 is set to the mix mode. In this case, the CPU 30 performs control so that the input terminal X1 of the selector 48 is connected to the output terminals Y1 and Y2 and the input terminal X2 of the selector 48 is left open. Therefore, the male singing voice signal V1 output from the microphone 47a is supplied to both the first scoring unit 50A and the second scoring unit 50B. In this section, since the reference melody data #M is supplied to the first and second scoring units 50A and 50B, the male singing voice signal V1 and the male part reference melody data #M are compared in the two scoring units 50A and 50B, and the average value is generated in the combining unit 50C. The evaluation unit 50D scores the section based on the combined difference data Diffc from the combining unit 50C. In this case, the combined difference data Diffc has an improved SN ratio compared with the difference data Diffa and Diffb.
[0050]
Next, in the female singing section T3, the selector 48 is set to the mix mode as in the male singing section T2. However, the connection state inside the selector 48 differs from that in the male singing section T2. In this case, the CPU 30 performs control so that the input terminal X2 of the selector 48 is connected to the output terminals Y1 and Y2 and the input terminal X1 of the selector 48 is left open. Therefore, the male singing voice signal V1 is not output from the selector 48. In a section where only one of the two singers is to sing, the two singing voice signals are not mixed and output to the output terminals Y1 and Y2; instead, the input from the other microphone is left open. This is because, for example, if the man claps his hands during the female singing section T3, the clapping would be picked up as noise, and the singing ability of the woman could not be properly evaluated.
[0051]
Thus, when the female singing voice signal V2 is supplied to the first and second scoring units 50A and 50B, the first and second scoring units 50A and 50B perform the comparison based on the reference melody data #W. When the comparison results are averaged by the combining unit 50C and output as the combined difference data Diffc, the evaluation unit 50D scores the section based on the combined difference data Diffc. In this case as well, as in the male singing section T2, the combined difference data Diffc has an improved SN ratio compared with the difference data Diffa and Diffb.
[0052]
Next, in the mixed singing section T4, the selector 48 is set to the straight mode. In this case, the CPU 30 performs control so that the input terminal X1 of the selector 48 is connected to the output terminal Y1 and the input terminal X2 is connected to the output terminal Y2. Therefore, the male singing voice signal V1 is supplied to the first scoring unit 50A, and the female singing voice signal V2 is supplied to the second scoring unit 50B. In this section, since the reference melody data #M and #W are supplied to the first and second scoring units 50A and 50B, respectively, the first and second scoring units 50A and 50B output mutually different difference data Diffa and Diffb. The combining unit 50C calculates the average value of the two and generates the combined difference data Diffc.
[0053]
Here, if the woman does not sing in a part (T4′) of the section, the singing state data Si2 related to the second scoring unit 50B is as shown in FIG. 8E. Therefore, during the period T4′, the combining unit 50C does not calculate the average value but outputs the pitch difference data Fi1 and the volume difference data Li1 generated by the first scoring unit 50A as the combined difference data Diffc, so that comprehensive scoring can be performed based on the male singer's singing ability.
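The control of the selector 48 and of the reference melody data for each section of the duet song can be summarized in a small Python sketch. The table name ROUTING, the section labels, and the function route are hypothetical; the connections themselves follow the description of sections T2 to T4 above, with the man on the microphone 47a (input terminal X1) and the woman on the microphone 47b (input terminal X2). Returning None for the prelude and interlude reflects that these sections are excluded from scoring.

```python
# Section label -> (source of Y1, source of Y2, reference for 50A, reference for 50B)
ROUTING = {
    "male":   ("X1", "X1", "#M", "#M"),  # T2: V1 to both units, X2 left open
    "female": ("X2", "X2", "#W", "#W"),  # T3: V2 to both units, X1 left open
    "mixed":  ("X1", "X2", "#M", "#W"),  # T4: straight mode
}


def route(section, v1, v2):
    """Return (Y1 signal, Y2 signal, reference for 50A, reference for 50B).

    Returns None for sections such as the prelude or interlude, which have
    no guide melody and are not scored.
    """
    entry = ROUTING.get(section)
    if entry is None:
        return None
    inputs = {"X1": v1, "X2": v2}
    y1_source, y2_source, ref_a, ref_b = entry
    return inputs[y1_source], inputs[y2_source], ref_a, ref_b
```

For example, route("female", "V1", "V2") returns ("V2", "V2", "#W", "#W"), so the male microphone contributes nothing to the scoring of the female singing section.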
[0054]
As described above, according to the present embodiment, the CPU 30 switches the selector 48 and controls the reference melody data supplied to the first and second scoring units 50A and 50B based on the combination of the music data and the operation of the remote controller 51. The first and second scoring units 50A and 50B can therefore be used effectively, and a highly accurate and appropriate scoring result can be calculated.
That is, when one singer sings, a scoring result can be obtained based on the combined difference data Diffc with an improved SN ratio. In a duet song, accurate and appropriate scoring results can be calculated by switching the operation of the combining unit 50C according to the nature of the singing section.
[0055]
D: Modified example
Note that the present invention is not limited to the above-described embodiment, and various modifications as described below are possible.
(1) For example, in the embodiment, the case of performing a karaoke performance of a duet song has been described as an example, but the invention may also be applied to songs with a larger number of parts. In this case, the scoring processing unit 50 may be extended to as many systems as there are parts, and guide melodies may be prepared on as many tracks as there are parts.
(2) Further, instead of obtaining the average value of each music element as the scoring result as in the embodiment, the score of the pitch, volume or rhythm may be output as the scoring result for each music element.
(3) In the embodiment, the scoring is performed collectively after the song is completed. However, the basic evaluation may be performed in units of phrases or notes, and the results may be totaled after the song is completed. Further, the scoring result may be displayed on the monitor 46 for each phrase, and the final scoring result may be displayed after the end of the song.
(4) In the embodiment, the average value of the scores obtained for each vocal part in the duet song is output. However, the scores may be output individually, or both the average and the individual scores may be output. In the case of individual output, the scoring result may be calculated by the evaluation unit 50D based on each of the difference data Diffa and Diffb.
(5) In addition, the user's enjoyment can be further increased by adopting various display modes such as highlighting the score of the highest scoring result among the plurality of singing voices.
[0056]
[Effect of the Invention]
As described above, according to the present invention, when a plurality of vocal parts are sung, as in a duet song, the overall singing ability can be scored. In addition, the scoring accuracy can be improved.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a karaoke apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing a data format of music data in the embodiment.
FIG. 3 is a diagram showing a configuration of a music track of the music data.
FIG. 4 is a diagram showing a configuration of a track other than a musical tone track of the music data.
FIG. 5 is a diagram showing contents of a memory map of a RAM in the karaoke apparatus.
FIG. 6 is a block diagram showing a configuration of a scoring processing unit in the karaoke apparatus.
FIG. 7A is a diagram showing an example of a guide melody in the embodiment in staff notation, FIG. 7B is a diagram showing pitch data and volume data of a reference based on the guide melody, and FIG. 7C is a diagram showing pitch data, volume data, and singing state data of a singing voice.
FIG. 8 is a timing chart in the case of singing a duet song in the karaoke apparatus.
[Explanation of symbols]
30 CPU (control means, scoring means), 31 ROM, 32 RAM, 37 hard disk device, 38 sound source device, 47a, 47b microphones (first and second microphones), 49 voice processing DSP, 50 scoring processing unit, 501a, 501b A/D converters, 502a, 502b data extraction units (first and second extraction means), 503a, 503b comparison units (first and second comparison means).

Claims

A karaoke apparatus that includes a selection means, a first comparing means, a second comparing means, a supply means, a calculating means, and an evaluation means, and that plays music data, wherein
The music data includes a first reference value and a second reference value, and a mixed singing section, a first single singing section, and a second single singing section are identifiable in the music data,
The selection means
When the performance is in the mixed singing section, the singing voice signal input from the first microphone is output from the first output terminal, and the singing voice signal input from the second microphone is output from the second output terminal;
When the performance is in the first single singing section, a singing voice signal input from the first microphone is output from the first and second output terminals,
When the performance is in the second single singing section, a singing voice signal input from the second microphone is output from the first and second output terminals,
The first comparing means compares the characteristic amount of the singing voice signal output from the first output terminal with the supplied first or second reference value,
The second comparing means compares the characteristic amount of the singing voice signal output from the second output terminal with the supplied first or second reference value,
The supply means
When the performance is in the mixed singing section, the first reference value is supplied to the first comparing means, and the second reference value is supplied to the second comparing means,
When the performance is in the first single singing section, the first reference value is supplied to the first comparing means and the second comparing means,
When the performance is in the second single singing section, the second reference value is supplied to the first comparing means and the second comparing means,
The calculating means calculates and outputs an average value of the comparison results of the first comparing means and the second comparing means,
The evaluation means evaluates the singing ability based on the output of the calculating means.
Karaoke equipment.

Each of the first and second comparing means detects a case where no singing voice signal is input as a non-singing period,
When the non-singing period is detected, the calculating means outputs the comparison result of the first or second comparing means which is not the non-singing period as it is, instead of the average value.
The karaoke apparatus according to claim 1.