JP2004138634A

JP2004138634A - "karaoke" singing equipment, program, and recording medium

Info

Publication number: JP2004138634A
Application number: JP2002300406A
Authority: JP
Inventors: Nobukimi Kobayashi; 小林　宣公
Original assignee: Brother Industries Ltd
Current assignee: Brother Industries Ltd
Priority date: 2002-10-15
Filing date: 2002-10-15
Publication date: 2004-05-13
Anticipated expiration: 2022-10-15
Also published as: JP4135461B2

Abstract

PROBLEM TO BE SOLVED: To perform scoring that suitably evaluates karaoke singing ability even when vibrato or figuration, generally regarded as skillful techniques, is used.

SOLUTION: When a special singing method such as vibrato or figuration is used, the pitch data D1 of the singing voice changes roughly periodically in the pitch direction around a reference pitch. As a result, the average value D31 of the absolute value D3 of the value obtained by subtracting the rating reference data D2 from the pitch data D1 becomes large, whereas the absolute value D42 of the average value D41 of D1-D2 stays small, because the average pitch is close to the rating reference data D2. On the other hand, when the same check is applied to a so-called poor singer who deviates from the reference pitch without using the special singing method, D31 and D42 become almost equal. Since the difference D31-D42 takes clearly different values in the two cases, the presence or absence of the special singing method can be judged in the karaoke singing equipment by using a determination value that separates them.

COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、カラオケ装置における歌唱採点に関し、より具体的には、ビブラートやこぶしのようなテクニックを使った場合でも適正な採点結果を得ることができるようにする技術に関する。
【０００２】
【従来の技術】
従来、カラオケ装置の付帯機能として採点機能が良く知られている。この採点機能は、マイクロホンから入力された歌唱者の音声信号をサンプリングすることで歌唱者が発声した音高や声量あるいはテンポなどの歌唱状態を示す歌唱データを生成する。この歌唱データとカラオケデータ中の主旋律データなどの採点基準データとを比較し、その比較結果に基づいて所定の得点を付与して採点データを生成する。そして、歌唱パートが終了するとこの採点データ中の得点を集計して総合得点を算出する。総合得点はそのままの得点をスコアボードやディスプレイに表示したり、所定のメッセージや所定の表現内容を含む映像など総合得点を反映した映像をディスプレイに出力したりする（例えば、特許文献１参照）。
【０００３】
【特許文献１】
特開平１１−３０５７８６号公報
【０００４】
【発明が解決しようとする課題】
しかしながら、従来の歌唱採点のやり方は、一意的に決められている採点基準に対する歌唱状態の一致度によって歌唱力の優劣を判定している。例えば音高の一致度合いを見る場合であれば、両者の音高の差分がなるべく零に近い方が歌唱力が優良であると判定している。そのため、例えばビブラートやこぶしのようなテクニックを使うと、歌唱者の音声信号から得た音高データと採点基準となる歌唱旋律の音高データとの差分が大きくなり、側で聞いている人にとっては上手く歌っているように感じていても、得点としては悪くなるという問題があった。
【０００５】
そこで本発明は、カラオケ歌唱する際に、一般的に上手なテクニックであるビブラートやこぶしにも対応して、歌唱力を適正に評価した採点が可能なカラオケ装置を提供することを目的とする。
【０００６】
【課題を解決するための手段及び発明の効果】
（１）上述した問題点を解決するためになされた請求項１に係るカラオケ装置は、曲データ記憶手段がカラオケ曲を演奏するためのカラオケ演奏データ及びそのカラオケ曲の歌唱旋律の音高データを含む採点基準データを記憶しており、カラオケ演奏手段が、指定されたカラオケ曲のカラオケ演奏データを曲データ記憶手段から読み出してカラオケ演奏を行なう。また、音声信号入力手段を介してカラオケ歌唱の音声信号が入力されると、音高抽出手段が、その音声信号をサンプリングしてカラオケ歌唱の音高データを抽出する。そして採点手段は、カラオケ演奏手段によるカラオケ演奏と同期してその演奏曲に対応する採点基準データを曲データ記憶手段から読み出し、その読み出した採点基準データと音高抽出手段によって抽出した音高データとに基づいて採点するのであるが、さらに詳しくは次のような採点を行う。
【０００７】
つまり、第１の音高差算出手段が、所定期間中に音高抽出手段によって抽出された複数の音高データと、曲データ記憶手段から読み出した採点基準データ中における対応する音高データとの差の絶対値をそれぞれ算出し、さらにそれら絶対値の平均値を求める。また、第２の音高差算出手段が、所定期間中に音高抽出手段によって抽出された複数の音高データと、曲データ記憶手段から読み出した採点基準データ中における対応する音高データとの差の平均値の絶対値を算出する。そして、特殊歌唱方法判定手段は、第１の音高差算出手段によって算出した音高差と第２の音高差算出手段によって算出した音高差との差分が、ビブラート又はこぶしを用いた特殊歌唱方法以外の歌唱方法では取り得ない判定値よりも大きい場合に、ビブラート又はこぶしを用いた特殊歌唱方法であると判定する。この特殊歌唱方法判定手段によってビブラート又はこぶしを用いた特殊歌唱方法であると判定された場合は、第２の音高差算出手段によって算出された音高差を用いて採点を行うのである。
【０００８】
ここで、特殊歌唱方法判定手段は、第１及び第２の音高差算出手段によってそれぞれ算出した音高差同士の差分に基づいて、ビブラート又はこぶしを用いた特殊歌唱方法であるか否かを判定しているが、このようにして判定できる理由を説明する。
【０００９】
まず、ビブラートやこぶしのようなテクニックを使った場合の歌唱の音声信号の特徴を考えてみると、その音高に注目した場合、基準となる音高を中心として高低方向にほぼ同じような差分を持った音高間をほぼ周期的に変化する「正弦波的な」波形信号になる。一般的に採点のためのサンプリング間隔はこの変化周期よりは短く、その変化周期と一致することはないため、全体としては正しい音高で歌唱しているにもかかわらず、あるサンプリングタイミングでは基準音高から大きく外れている状態が生じる。しかしながら、ビブラート部分全体を見てみると、あるサンプリングタイミングでは基準音高から高い側に外れた状態、別のサンプリングタイミングでは基準音高から低い側に外れた状態、さらに別のサンプリングタイミングでは基準音高に一致あるいは非常に近い状態が得られ、ビブラート部分全体の音高を平均してみると基準音高に近くなる。
【００１０】
これに対して、ビブラートやこぶしのようなテクニックを用いていないのに、基準音高を外して歌唱しているいわゆる「下手な」場合には、ほとんど全てのサンプリングタイミングにおいて、基準音高に対して同じ側に外れていることとなる。つまり、基準音高に対して高い側に外れている場合はほとんど全てのサンプリングタイミングにおいてほぼ同じ量だけ高い側に外れ、基準音高に対して低い側に外れている場合はほとんど全てのサンプリングタイミングにおいてほぼ同じ量だけ低い側に外れるため、音高差の正負は同じであり、また音高差の絶対値もほぼ等しくなる。
【００１１】
これらの分析に基づき、両者の違いを次のような物理量によって反映させることができると考えた。つまり、ビブラートやこぶしのようなテクニックを使い、且つ正しい音高で歌唱している場合には、上述のように「正弦波的な」波形信号となるため、歌唱音声信号から所定のサンプリング間隔で得た所定期間中の複数の音高データに対して、採点基準となる音高データとの差の絶対値の平均値は相対的に大きいが、採点基準となる音高データとの差の平均値の絶対値は相対的に小さくなる。これに対して、ビブラートやこぶしのようなテクニックを用いないで基準音高を外して歌唱している場合には、採点基準となる音高データとの差の絶対値の平均値及び採点基準となる音高データとの差の平均値の絶対値の両者が共に、相対的に大きな値となる。したがって、これらの差異に着目すれば両者の違いを判定できると考え、上述のような特殊歌唱方法判定手段の判定手法を採用した。この判定に用いる「ビブラート又はこぶしを用いた特殊歌唱方法以外の歌唱方法では取り得ない判定値」については、例えば実験等によって求めることが考えられる。例えばビブラートやこぶしのようなテクニックを使い、且つ正しい音高で歌唱した状態における、採点基準となる音高データとの差の絶対値の平均値を算出する。それが例えば２音（４セミトーン）程度であったならば、判定値を３セミトーンにするとか、いったことである。
【００１２】
なお、歌唱音声信号を周波数分析等によって解析して信号波形の情報を得るようにすれば、その波形からビブラートの有無を判定することは可能である。しかし、この場合はＦＦＴ等の周波数分析を行う必要があり、相対的に計算量が多くなる。これに対して本発明の場合は単純な四則演算で対応できるため、相対的に簡易な計算によってビブラートの有無を加味した適切な採点ができる点で非常に有利である。
【００１３】
そして本発明のカラオケ装置では、ビブラート又はこぶしを用いた特殊歌唱方法であると判定された場合、第２の音高差算出手段によって算出された音高差を用いて採点を行うのであるが、これは、上述のビブラートやこぶしのようなテクニックを使った場合の歌唱の音声信号の特徴分析からも分かるように、第２の音高差算出手段によって算出された音高差は、全体として見た場合の基準音高に対する一致度合いを反映していると考えられるからである。このように採点することで、カラオケ歌唱する際に、一般的に上手なテクニックであるビブラートやこぶしにも対応して、歌唱力を適正に評価した採点が可能となる。
【００１４】
（２）また、請求項２に示すように、特殊歌唱方法判定手段によってビブラート又はこぶしを用いた特殊歌唱方法であると判定された場合は、所定の得点を加算した採点を行うようにしてもよい。上述のように、第２の音高差算出手段によって算出された音高差を用いて採点を行うということは、ビブラート又はこぶしを用いた特殊歌唱方法であった場合に、不当に不利な採点をされないといういわば消極的な対処である。しかし、ビブラートやこぶしのようなテクニックは一般的に上手なテクニックであると考えられているため、そのようなテクニックを用いない歌唱の場合よりも有利な採点をすることも、「歌唱力を適正に評価した採点」という観点では有効である。そこで、ビブラート又はこぶしのようなテクニックを加点対象とし、そのようなテクニックが用いられた場合は所定得点を加算するようにした。
【００１５】
なお、このような採点結果である得点は、従来同様、ディスプレイ等に表示することが考えられるが、その際、例えばビブラート又はこぶしのようなテクニックが認められて加点された場合には、その旨を合わせて表示するようにしてもよい。このようにすれば、歌唱者は自分のテクニックが認められて加点されたことが分かり、満足度が高くなる。
【００１６】
（３）ところで、採点において、第１の音高差算出手段及び第２の音高差算出手段の算出対象として音高抽出手段が複数の音高データを抽出する所定期間に関しては、次のような工夫が考えられる。
例えば請求項３に示すように、曲データ記憶手段から読み出した採点基準データ中の算出対象となる同一音高の音高データが継続する期間を「所定期間」として採用する。同一音高が継続している場合には、その期間中の一部あるいは全部においてビブラート又はこぶしを用いた歌唱を行っている可能性がある。したがって、この期間中を全て対象とすることで、どの部分でビブラート又はこぶしが用いられても、歌唱力を適正に評価した採点が可能となる。
【００１７】
もちろん、採点基準データ中の算出対象となる同一音高の音高データが継続する期間全部ではなく、その一部の期間を「所定期間」として採用することも可能である。但し、採点基準となる音高データとの差の平均値の絶対値が相対的に小さくなるという性質を適切に把握するためには、上述のように「正弦波的な」波形信号のうち、１周期の整数倍単位の範囲で考えるので好ましい。但し、１周期を検出するため、上述のように歌唱音声信号を周波数分析等によって解析して信号波形の情報を得るのであれば、やはりＦＦＴ等の周波数分析を行う必要があり、相対的に計算量が多くなる。
【００１８】
そこで、請求項４に示す周期推定手段のように、音高抽出手段によって抽出された音高データと、曲データ記憶手段から読み出した採点基準データ中における対応する音高データとの差が正負転換するタイミングに基づいてビブラート又はこぶしによる音高波形の１周期を推定することが考えられる。例えば、正から負へ反転したタイミングから、次に同じように正から負へ反転したタイミングまでの時間を１周期として算出すればよい。
【００１９】
なお、このようにして設定した所定期間を１回の比較対象範囲とし、同一音高の音高データが継続する期間中において、その比較対象範囲を順次ずらしながら音高差の平均値を算出するという、いわゆる「移動平均」の手法を採用することも考えられる。
【００２０】
（４）また、請求項５に示すように、請求項１〜４の何れかに記載のカラオケ装置における採点手段をコンピュータにて実現する場合、例えばコンピュータで実行するプログラムとして備えることができる。このようなプログラムは、請求項６に示すように、例えばフレキシブルディスク、光磁気ディスク、ＣＤ−ＲＯＭ、ハードディスク、ＲＯＭ、ＲＡＭ等のコンピュータ読み取り可能な記録媒体に記録し、必要に応じてコンピュータにロードして実行したり、ネットワークを介してロードして実行することにより、採点手段としての機能を実現できる。
【００２１】
【発明の実施の形態】
以下、本発明が適用された実施例について図面を用いて説明する。なお、本発明の実施の形態は、下記の実施例に何ら限定されることなく、本発明の技術的範囲に属する限り種々の形態を採りうることは言うまでもない。
【００２２】
図１は、本実施例のカラオケ装置の構成を示すブロック図である。本実施例では、複数のカラオケ装置１ａ，１ｂ，１ｃ…がローカル・エリア・ネットワーク（ＬＡＮ）３０に接続されてカラオケシステムを構成している。基本的な構成はどのカラオケ装置１も同じであるが、カラオケ装置１ａのみが通信ネットワーク３を介してホストコンピュータ２に接続できるようになっている。そして、カラオケ装置１ａは、この通信ネットワーク３を介して接続したホストコンピュータ２からカラオケに関する音楽情報と画像情報とを取得することができる。そして、カラオケ装置１ａが取得したカラオケに関する音楽情報と画像情報は、ＬＡＮ３０を介して他のカラオケ装置１ｂ、１ｃ…も取得できるようにされている。
【００２３】
なお、ホストコンピュータ２は、通信ネットワーク３を介してカラオケ装置１ａとアクセス可能であって、カラオケ装置１ａに対して、最新の流行曲等の曲データを発信したり、どのような曲が何回演奏されたかといったログデータを含む関連情報をカラオケ装置１ａから受信したりして管理することができるようになっている。なお、この場合の「どのような曲が何回演奏されたかといったログデータ」については、カラオケ装置１ａ単体でのログデータではなく、カラオケシステム全体のログデータを指す。つまり、他のカラオケ装置１ｂ、１ｃ…におけるログデータもカラオケ装置１ａに集められ、カラオケシステム全体のログデータがカラオケ装置１ａからホストコンピュータ２へ送信される。
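[0024]
The host computer 2 also has a database, which stores the content data used for music performance: music information and image information such as background pictures and lyrics. In addition to content data, the host computer 2 stores upgraded system programs and the like in the database, and can read them out as needed and send them to the karaoke apparatus 1.
[0025]
Next, the configuration of the karaoke apparatus 1a will be described.
As shown in FIG. 1, the karaoke apparatus 1a includes: a communication device 19 that connects to the host computer 2 via the communication network 3 and exchanges various information; an operation panel 10 used for reserving songs and the like; a CPU 14 that controls the entire karaoke apparatus 1a; a RAM 15 that temporarily stores various information; a sound source reproducing device 18 that reproduces the performance; an amplifier 20 that amplifies the electric signals carrying the music information; a speaker 22 that receives the electric signals from the amplifier 20 and outputs the accompaniment and the user's singing voice; a microphone 23 for feeding the user's singing voice to the amplifier 20; a scoring unit 28 that samples and analyzes the singing voice input from the microphone 23 to extract the sung pitch and compares it with scoring reference data to produce a score; an EEPROM 12 and a hard disk 13 that store karaoke music information, image information, and other data; a video reproducing device 24 that renders the image information as video; a display device 26 that displays the background pictures and lyrics; and a network interface 17 for connecting to the LAN 30. Of these, the operation panel 10, the EEPROM 12, the hard disk 13, the RAM 15, the network interface 17, the sound source reproducing device 18, the video reproducing device 24, and the scoring unit 28 are connected to the CPU 14.
[0026]
The other karaoke apparatuses 1b, 1c, ... differ only in lacking the communication device 19 and have all the other components, so only the karaoke apparatus 1a is described here and the description of the karaoke apparatuses 1b, 1c, ... is omitted.
The communication device 19 is a modem that modulates and demodulates signals and, under the control of the CPU 14, can access the host computer 2 through the communication network 3. It can thereby receive song data and the like sent from the host computer 2 via the communication network 3, and transmit the related information mentioned above to the host computer 2.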
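[0027]
The operation panel 10 is operated by the user. It has an operation section for selecting a song, adjusting the key of the performance, adjusting the volume balance between performance and singing, and making other adjustments such as echo, volume, and tone, and a display section (not shown) for displaying, for example, the selected song number. By operating the operation section, the user can reserve a song to be played on the karaoke apparatus 1.
[0028]
The EEPROM 12 stores the system program, the setting data required for various settings, and the like.
The hard disk 13 stores content data such as music information and image information, and log data such as performance records. When a song is selected via the operation section of the operation panel 10, the CPU 14 reads the image information consisting of lyrics data and video data, and the music information consisting of performance data, from the hard disk 13 and outputs them in synchronization to the video reproducing device 24 and the sound source reproducing device 18. The hard disk 13 stores the performance data, lyrics data, and so on in association with a song number serving as identification information.
[0029]
The performance data output from the CPU 14 is then converted into an analog performance sound signal by the sound source reproducing device 18 and sent to the amplifier 20, where it is electrically amplified. The amplifier 20 mixes it at a suitable ratio with the singing sound signal of the user (singer) input via the microphone 23. The mixed singing and performance sound signals are output from the amplifier 20 to the speaker 22 and emitted from the speaker 22 as voice and performance sound.
[0030]
Meanwhile, the video reproducing device 24 reproduces images based on the image information read from the hard disk 13 under the control of the CPU 14. The lyrics data output by the CPU 14 is combined with the video data in the video reproducing device 24, and the lyrics captions are displayed on the screen of the display device 26 together with the background video.
[0031]
With this configuration, the user can sing with the microphone 23 along with the karaoke performance from the speaker 22 while reading the lyrics captions on the display device 26.
The karaoke apparatus 1a of this embodiment also has a singing scoring function that scores the singing of the person singing along with the karaoke performance and outputs the result. To use it, the user enters the series of song-number digits for a performance reservation on the operation panel 10 (a panel, a remote controller, or the like) and then presses a predetermined operation key. A song number carrying a code that indicates an activation command for the singing scoring function is thereby transferred to the CPU 14, and the CPU 14 reserves the song as one for which the scoring function is to be activated when it is played. Of course, the scoring function may instead be activated by a billing procedure. For example, a separate billing device may be connected to the karaoke apparatus 1a so that the scoring function is activated when a predetermined fee is inserted into the billing device at a suitable time, such as immediately before the song to be scored is played or during its introduction.
[0032]
Singing scoring is carried out by the scoring unit 28. The singing voice from the microphone 23 is input to the scoring unit 28, which samples and analyzes it to extract scoring elements such as sung pitch and rhythm. The scoring unit 28 also receives the accompaniment music generation data stored on the hard disk 13 via the data bus and the CPU 14, and obtains the vocal data (singing melody data) in it as scoring reference data. It then compares the scoring elements from the scoring reference data with those extracted from the singing voice, and scores according to how close the singing voice is to the vocal data. This vocal data is also used for the so-called guide melody function, in which it is output from the speaker 22 together with the accompaniment at a relatively low volume.
[0033]
The scoring result data is transferred to the CPU 14, which stores the received scoring data in the RAM 15. When the CPU 14 detects the end of the scoring section of the karaoke song, it makes the scoring unit 28 stop generating scoring data, totals the scoring data accumulated in the RAM 15 during the scoring section, and displays the resulting score on the display device 26.
[0034]
The above covers the configuration and outline of operation for scoring. In addition, the karaoke apparatus 1a of this embodiment can detect vibrato and kobushi and score more appropriately. This addresses the following problem. With the conventional method, which judges the simple degree of agreement between the pitch data extracted from the singing voice and the pitch data of the vocal data in the accompaniment music generation data, the difference from the reference pitch data becomes large when a technique such as vibrato or kobushi is used. The score therefore becomes worse even though listeners nearby feel that the singing is good. In the karaoke apparatus 1a of this embodiment, the part that judges the agreement of pitch data therefore takes the presence of vibrato or kobushi into account.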
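[0035]
This is described with reference to FIGS. 2 to 5. FIG. 2 is a flowchart explaining the algorithm that judges the agreement of pitch data, as part of the scoring process performed by the scoring unit 28 in the embodiment. FIG. 3 is an explanatory diagram of a scoring method based on the pitch data D1 extracted from the singing voice and the scoring reference data D2: for each note, the pitch difference between D1 and D2 is calculated and a predetermined score is given according to that difference. In the example of FIG. 3, vibrato or kobushi is present in the latter half of the last note. FIG. 4 illustrates the determination method of this embodiment, and FIG. 5 illustrates the conventional determination method.
[0036]
As shown in FIG. 2, the scoring unit 28 first takes in the singing voice signal input from the microphone 23 (S102). It samples and analyzes the signal at predetermined timings (for example, at a fixed interval) to extract the pitch data D1 (S104). It further receives the accompaniment music generation data via the data bus and obtains the pitch data in the vocal data as the scoring reference data D2 (S106).
[0037]
In S108, the scoring unit 28 subtracts, from each pitch data D1 extracted from the singing voice, the scoring reference data D2 corresponding to the singing timing at which that D1 was obtained, takes the absolute value D3 of the result, and then calculates the average value D31 of D3 over a predetermined period. As the "predetermined period", for example, the period during which pitch data of the same pitch continues in the scoring reference data D2 may be used. While the same pitch continues, vibrato or kobushi may be used in part or all of that period. Covering the whole period therefore allows scoring that properly evaluates singing ability wherever vibrato or kobushi is used.
[0038]
The calculation of the average value D31 in S108 is explained in more detail with reference to FIG. 4.
In FIG. 4(a), the vertical axis indicates frequency, and the horizontal axis is the time axis and also marks the frequency of the scoring reference data D2 obtained in S106. The black dots are the pitch data D1 of the singing voice extracted in S104, and the curve connecting them shows the continuous frequency of the singing voice when vibrato or kobushi is used. The "absolute value D3 of the value obtained by subtracting the scoring reference data D2 from the pitch data D1" of S108 is shown by black dots in FIG. 4(b), whose vertical axis is frequency and horizontal axis is time. The average value D31 of D3 is drawn as a dash-dot line. In FIG. 4(b), lines are drawn from the time axis to each black dot in the manner of a bar graph, and the average length of the bars corresponds to D31.
[0039]
Returning to the flowchart of FIG. 2, in S110 the scoring unit 28 calculates the value D4 obtained by subtracting, from each pitch data D1 extracted from the singing voice, the scoring reference data D2 corresponding to the singing timing at which that D1 was obtained, and then calculates the absolute value D42 of the average value D41 of D4 over the predetermined period. This "predetermined period" is the same as in S108.
[0040]
The subtracted value D4 of S110 is shown by black dots in FIG. 4(c), whose vertical axis is frequency and horizontal axis is time. In FIG. 4(c), the average value D41 of D4 is drawn as a dash-dot line and its absolute value D42 as a dash-double-dot line.
[0041]
In S112, it is determined whether the value obtained by subtracting the absolute value D42 (of the average value D41) calculated in S110 from the average value D31 (of the absolute value D3) calculated in S108 is equal to or greater than a predetermined value α. A value such as 3 semitones is conceivable for α. More is said about α later.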
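The threshold α above is stated in semitones, while FIG. 4 is described with a frequency axis. A minimal sketch of one way to express a sung-versus-reference gap in semitones follows. It assumes that pitch values are available as frequencies in Hz and that equal temperament applies; the description does not specify a unit conversion, and the function name `semitone_difference` is hypothetical.

```python
import math


def semitone_difference(sung_hz: float, reference_hz: float) -> float:
    """Signed gap between two frequencies in equal-tempered semitones.

    Positive when the sung pitch is above the reference, negative when below.
    Assumption: 12 semitones per octave (a doubling of frequency).
    """
    return 12.0 * math.log2(sung_hz / reference_hz)


# One octave above A4 (440 Hz) is 12 semitones.
octave_gap = semitone_difference(880.0, 440.0)
```

With a conversion like this, D3 and D4 can be compared directly against an α given in semitones.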
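[0042]
If (D31 - D42) ≥ α (S112: YES), the note is considered to have been sung with the special singing method using vibrato or kobushi, so the absolute value D42 (of the average value D41) calculated in S110 is used as the difference from the reference pitch for scoring (S114). A predetermined bonus is then added for using the special singing method (S116). This bonus is a fixed number of points.
[0043]
In this case the scoring unit 28, in the main scoring process executed separately, scores on the basis of the pitch difference obtained in S114 of FIG. 2 and adds the bonus points obtained in S116. The resulting score is displayed on the display device 26 as described above, and when points have been added through S116, a message to that effect (for example, "○ points added for vibrato or kobushi") may be displayed. This lets the singer know that the technique was recognized and rewarded, which increases the singer's satisfaction.
[0044]
If, on the other hand, (D31 - D42) < α (S112: NO), the note is considered not to have been sung with the special singing method, so the average value D31 (of the absolute value D3) calculated in S108 is used as the difference from the reference pitch for scoring (S118). In this case the scoring unit 28 scores on the basis of the pitch difference obtained in S118 in the main scoring process executed separately.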
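The per-note flow of S108 to S118 can be sketched as follows. This is an illustrative sketch, not the patented implementation: the function name `evaluate_note`, the default `alpha` of 3 semitones, and the `bonus_points` value of 5 are assumptions, and pitches are assumed to be expressed in semitone units.

```python
from statistics import fmean


def evaluate_note(d1, d2, alpha=3.0, bonus_points=5):
    """Return (pitch_difference_for_scoring, bonus) for one note.

    d1: sampled sung pitches over the predetermined period (semitones, assumed)
    d2: reference pitches at the same sampling timings
    """
    d4 = [sung - ref for sung, ref in zip(d1, d2)]  # signed differences D4
    d31 = fmean([abs(d) for d in d4])               # S108: mean of |D1 - D2|
    d42 = abs(fmean(d4))                            # S110: |mean of (D1 - D2)|
    if d31 - d42 >= alpha:                          # S112: special singing?
        return d42, bonus_points                    # S114 and S116
    return d31, 0                                   # S118


# Vibrato around the reference (60) versus a steady 2-semitone miss.
vibrato_result = evaluate_note([64, 56, 64, 56], [60, 60, 60, 60])
off_pitch_result = evaluate_note([62, 62, 62, 62], [60, 60, 60, 60])
```

The returned pitch difference would then feed whatever score mapping the main scoring process uses, which the description leaves unspecified.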
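[0045]
The following explains why, as stated for S112, the special singing method using vibrato or kobushi is considered to be used when (D31 - D42) ≥ α and not used when (D31 - D42) < α.
[0046]
The pitch of a singing voice that uses a technique such as vibrato or kobushi forms a "sinusoidal" waveform that changes almost periodically between pitches lying at roughly equal distances above and below the reference pitch, that is, a waveform like the one in FIG. 4(a). The sampling interval for scoring is shorter than this period of change and therefore never coincides with it. Consequently, as FIG. 4(a) shows, some samples deviate greatly from the reference pitch even though the singing as a whole is at the correct pitch. Over the whole vibrato portion, however, some samples deviate above the reference pitch, others below it, and still others coincide with it or lie very close to it. When the pitch over the whole vibrato portion is averaged, it therefore comes close to the reference pitch, that is, the scoring reference data D2, as shown by D41 (the average of D4 = D1 - D2) drawn as a dash-dot line in FIG. 4(c).
[0047]
For comparison, the same check is applied to the so-called "poor" case, in which the singer misses the reference pitch without using a technique such as vibrato or kobushi. FIG. 5(a) shows the pitch data D1 and the scoring reference data D2 for such singing. Here D1 deviates to the same side of the reference pitch at almost every sampling timing. That is, when the singing is sharp (frequency higher than D2), it is sharp by about the same amount at almost every sample, and when it is flat (frequency lower than D2), it is flat by about the same amount at almost every sample, so the pitch differences have the same sign and nearly equal absolute values.
[0048]
When S108 and S110 of FIG. 2 are applied to such data D1 and D2, the "absolute value D3 of the value obtained by subtracting the scoring reference data D2 from the pitch data D1" of S108 is shown by the black dots in FIG. 5(b), and its average value D31 by the dash-dot line in the same figure. The "value D4 obtained by subtracting the scoring reference data D2 from the pitch data D1" of S110 is shown by the black dots in FIG. 5(c), and the absolute value D42 of its average value D41 by the dash-double-dot line in the same figure. D31 and D42 thus take the same value.
[0049]
Because of this difference in the relation between D31 and D42 in FIGS. 4 and 5, (D31 - D42) is almost zero when the singer misses the reference pitch without vibrato or kobushi, as in FIG. 5(a), whereas it takes an appreciable value when the singer sings well near the reference pitch with vibrato or kobushi, as in FIG. 4(a). For example, if the vibrato width is about 2 to 2.5 whole tones (4 to 5 semitones) relative to the reference pitch, (D31 - D42) is expected to be about 1.5 whole tones (3 semitones), depending on the sampling timing. Such a value cannot arise in the case of FIG. 5(a). The predetermined value α used in S112 of FIG. 2 may therefore be set to about 3 semitones. Of course, the vibrato width may vary with the song and the singer and may be smaller than in this example, so a smaller value such as 2 semitones or 1 semitone may be adopted for α.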
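The figure of about 3 semitones can be checked numerically. The sketch below assumes an ideal sine-shaped vibrato whose peak deviation is 4.5 semitones (reading the stated "width" as peak deviation is an assumption), sampled 40 times per cycle over three cycles; the helper name `indicator` and all numeric choices are illustrative, not taken from the description.

```python
import math
from statistics import fmean


def indicator(diffs):
    """D31 - D42 for a list of signed differences D4 = D1 - D2."""
    d31 = fmean([abs(d) for d in diffs])
    d42 = abs(fmean(diffs))
    return d31 - d42


amplitude = 4.5          # assumed peak deviation in semitones
samples_per_cycle = 40   # assumed sampling density
cycles = 3               # FIG. 4 is described as showing three cycles

vibrato = [
    amplitude * math.sin(2 * math.pi * k / samples_per_cycle)
    for k in range(cycles * samples_per_cycle)
]
off_pitch = [amplitude] * len(vibrato)  # steady miss by the same amount

vibrato_value = indicator(vibrato)      # close to 2 * amplitude / pi
off_pitch_value = indicator(off_pitch)  # zero: D31 equals D42
```

For a sine, the mean absolute value is 2/π of the peak, so a peak deviation of 4 to 5 semitones gives roughly 2.5 to 3.2 semitones, consistent with the order of magnitude stated above.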
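[0050]
The determination value α may also be obtained by experiment. The average of the absolute values of the differences from the reference pitch data is calculated for singing that uses a technique such as vibrato or kobushi at the correct pitch. If the result is about 2 whole tones (4 semitones), the determination value is set to 3 semitones; if it is about 1 whole tone (2 semitones), the determination value is set to 1 semitone; and so on. Alternatively, the analysis may be based on the degree of vibrato or kobushi of the singer who sings the original song. The vocal part is extracted from music information that contains the singer's voice, and the value of (D31 - D42) is calculated in its vibrato or kobushi portions. Such sample values may be calculated from the vocal parts of several singers and, for example, averaged to determine α.
[0051]
As described above, the karaoke apparatus 1a of this embodiment (and likewise the other karaoke apparatuses 1b, 1c, ...) provides the following effects.
(1) Scoring that properly evaluates singing ability becomes possible even for vibrato and kobushi, which are generally regarded as skillful techniques in karaoke singing.
[0052]
The presence of vibrato could also be determined from the waveform if the singing voice signal were analyzed by frequency analysis or the like, but that requires frequency analysis such as an FFT and a relatively large amount of computation. In this embodiment, as the description of S108, S110, and S112 of FIG. 2 shows, simple arithmetic operations suffice, which is a great advantage: appropriate scoring that takes the presence of vibrato into account is achieved with relatively simple calculation.
[0053]
(2) As shown in S116 of FIG. 2, points are added when the special singing method using vibrato or kobushi is used. Because techniques such as vibrato and kobushi are generally regarded as skillful, scoring them more favorably than singing without them realizes "scoring that properly evaluates singing ability".
[0054]
In this embodiment, the hard disk 13 corresponds to the "music data storage means", and the CPU 14, the sound source reproducing device 18, and the like correspond to the "karaoke performance means". The microphone 23 corresponds to the voice signal input means, and the scoring unit 28 corresponds to the pitch extracting means and the scoring means. Of the processing in FIG. 2, S108, S110, and S112 correspond to the processing of the "first pitch difference calculating means", the "second pitch difference calculating means", and the "special singing method determining means", respectively.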
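[0055]
Although an embodiment has been described, the present invention is not limited to it and can be carried out in various forms. Some of them are described below.
(1) In the above embodiment, the period during which pitch data of the same pitch continues in the scoring reference data D2 was used as the "predetermined period" of data for calculating the average value D31 in S108 and the absolute value D42 in S110 of FIG. 2. However, a part of that period may be used as the "predetermined period" instead of the whole. Since S112 of FIG. 2 decides whether the special singing method is used from the magnitude of (D31 - D42), the following consideration is desirable.
[0056]
The vibrato or kobushi portion forms a "sinusoidal" waveform that changes almost periodically between pitches lying at roughly equal distances above and below the reference pitch. Taken as a whole, the absolute value D42 of the average value D41 of the difference D4 between the pitch data D1 and the scoring reference data D2 is therefore relatively small, as shown by the dash-double-dot line in FIG. 4(c). Since this property is used to decide whether the special singing method is used, it is preferable, in order to capture the character of vibrato or kobushi properly, to consider a range that is an integer multiple of one cycle of the "sinusoidal" waveform. FIG. 4, for example, shows three cycles of data. Even if the actual vibrato or kobushi portion lasts for more cycles, the calculation may focus on, say, only three of them.
[0057]
However, detecting one cycle by analyzing the singing voice signal with frequency analysis or the like would again require frequency analysis such as an FFT and a relatively large amount of computation. Instead, after the pitch data D1 is extracted in S104 and the scoring reference data D2 is obtained in S106 of FIG. 2, one cycle of the pitch waveform may be estimated from the timings at which their difference (D1 - D2) changes sign. For example, the time from one positive-to-negative reversal to the next positive-to-negative reversal may be taken as one cycle. Conversely, the time from one negative-to-positive reversal to the next negative-to-positive reversal may be taken as one cycle. Alternatively, the time from a positive-to-negative reversal to the next negative-to-positive reversal may be taken as a half cycle and doubled to obtain one cycle. The predetermined period described above is then set to an integer multiple of the one cycle estimated by this period estimating means.
[0058]
Of course, the period of a "sinusoidal" waveform is not necessarily constant, but in actual singing the period during vibrato or kobushi is expected to be nearly constant in most cases, so the above method poses no particular problem.
[0059]
The processing of S108 to S112 of FIG. 2 is then performed over the "predetermined period" set in this way as an integer multiple of one cycle. If only part of the period during which pitch data of the same pitch continues is used as the determination section, the special singing method may happen not to be used in that section while being used in another. Therefore, for example, S108 to S112 of FIG. 2 may be performed on predetermined determination sections taken from the first half and the second half of that period, and points may be added if the special singing method is found in any of them.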
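The idea of checking one determination section from the first half and one from the second half of a held note can be sketched as follows. This is an illustration under assumptions: `window` is taken to be the number of samples in an integer multiple of the estimated cycle, the sections are simply the first and last `window` samples, and the names `indicator` and `bonus_earned` are hypothetical.

```python
from statistics import fmean


def indicator(diffs):
    """D31 - D42 for a list of signed differences D4 = D1 - D2."""
    return fmean([abs(d) for d in diffs]) - abs(fmean(diffs))


def bonus_earned(diffs, window, alpha):
    """True if either the leading or the trailing section looks like
    vibrato or kobushi, i.e. its D31 - D42 reaches alpha."""
    sections = [diffs[:window], diffs[-window:]]
    return any(indicator(section) >= alpha for section in sections)


# Held note sung straight at first, with vibrato only at the end.
late_vibrato = [0, 0, 0, 0, 0, 0, 0, 0, 3, -3, 3, -3]
steady_miss = [2] * 12
```

Checking both ends avoids missing a vibrato that starts late in the note, which is the situation the paragraph above describes.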
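[0060]
Whether the whole or only a part of the period during which pitch data of the same pitch continues is used as the determination section, a "moving average" method such as the following may also be adopted. For example, when the whole period is the determination section, a predetermined period of two cycles is taken as one comparison range, and the average pitch difference is calculated while the comparison range is shifted step by step within the period during which pitch data of the same pitch continues.
[0061]
(2) In the above embodiment, song numbers and the like are entered via the operation section of the operation panel 10 provided on the main body of the karaoke apparatus 1. Alternatively, the operation buttons may be provided on a remote controller connected by wireless communication such as infrared signals or the Bluetooth standard, and the song selection data based on its operation may be transmitted to the karaoke apparatus main body.
[Brief description of the drawings]
[FIG. 1] A block diagram showing the schematic configuration of the karaoke apparatus of the embodiment.
[FIG. 2] A flowchart explaining the algorithm that judges the agreement of pitch data, as part of the scoring process performed in the karaoke apparatus of the embodiment.
[FIG. 3] An explanatory diagram of a scoring method based on the pitch data D1 extracted from the singing voice and the scoring reference data D2.
[FIG. 4] An explanatory diagram of the determination method of this embodiment.
[FIG. 5] An explanatory diagram of the conventional determination method.
[Explanation of symbols]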
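1a, 1b, 1c, ...: karaoke apparatus; 2: host computer; 3: communication network; 10: input device; 12: EEPROM; 13: hard disk; 14: CPU; 15: RAM; 17: network interface; 18: sound source reproducing device; 20: amplifier; 22: speaker; 23: microphone; 24: video reproducing device; 26: display device; 28: scoring unit.
[0001]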
TECHNICAL FIELD OF THE INVENTION
The present invention relates to singing scoring in a karaoke apparatus, and more specifically to a technique for obtaining an appropriate score even when the singer uses techniques such as vibrato or kobushi (ornamental pitch turns common in Japanese singing).
[0002]
[Prior art]
Conventionally, a scoring function is well known as an additional function of a karaoke apparatus. This scoring function generates singing data indicating a singing state such as a pitch, a voice volume, or a tempo of a singer by sampling a singer's voice signal input from a microphone. The singing data is compared with scoring reference data such as main melody data in the karaoke data, and a predetermined score is given based on the comparison result to generate scoring data. When the singing part is completed, the scores in the scoring data are totaled to calculate a total score. The total score is displayed as it is on a scoreboard or a display, or a video reflecting the total score such as a video including a predetermined message or a predetermined expression content is output to a display (for example, see Patent Document 1).
[0003]
[Patent Document 1]
JP-A-11-305786
[0004]
[Problems to be solved by the invention]
However, the conventional singing scoring method judges singing ability by how closely the singing matches a uniquely determined scoring standard. For pitch, for example, the singing is judged better the closer the difference between the two pitches is to zero. Consequently, when a technique such as vibrato or kobushi is used, the difference between the pitch data obtained from the singer's voice signal and the pitch data of the singing melody used as the scoring standard increases, and the score becomes worse even though listeners nearby feel that the singing is good.
[0005]
An object of the present invention is therefore to provide a karaoke apparatus capable of scoring that properly evaluates singing ability even for vibrato and kobushi, which are generally regarded as skillful techniques in karaoke singing.
[0006]
Means for Solving the Problems and Effects of the Invention
(1) In the karaoke apparatus according to claim 1, made to solve the above problem, the music data storage means stores karaoke performance data for playing a karaoke song and scoring reference data including pitch data of the singing melody of that song, and the karaoke performance means reads the karaoke performance data of the designated song from the music data storage means and performs the karaoke performance. When a karaoke singing voice signal is input via the voice signal input means, the pitch extracting means samples the signal and extracts pitch data of the singing. The scoring means reads the scoring reference data for the song being played from the music data storage means in synchronization with the karaoke performance, and scores on the basis of the scoring reference data thus read and the pitch data extracted by the pitch extracting means. In more detail, the scoring proceeds as follows.
[0007]
The first pitch difference calculating means calculates the absolute value of the difference between each of the plural pitch data extracted by the pitch extracting means during a predetermined period and the corresponding pitch data in the scoring reference data read from the music data storage means, and then obtains the average of those absolute values. The second pitch difference calculating means calculates the absolute value of the average of the differences between the same plural pitch data and the corresponding pitch data in the scoring reference data. The special singing method determining means determines that the special singing method using vibrato or kobushi is being used when the difference between the pitch difference calculated by the first pitch difference calculating means and that calculated by the second pitch difference calculating means is larger than a determination value that cannot be reached by any singing method other than the special singing method using vibrato or kobushi. When this determination is made, scoring uses the pitch difference calculated by the second pitch difference calculating means.
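The three means described above can be sketched in a few lines. This is a minimal illustration, not the claimed implementation: the function names are hypothetical, pitches are assumed to be plain numbers in a common unit (for example semitones), and the two input sequences are assumed to be aligned sample by sample.

```python
from statistics import fmean


def mean_abs_diff(sung, reference):
    """First pitch difference: average of |sung - reference|."""
    return fmean([abs(s - r) for s, r in zip(sung, reference)])


def abs_mean_diff(sung, reference):
    """Second pitch difference: |average of (sung - reference)|."""
    return abs(fmean([s - r for s, r in zip(sung, reference)]))


def is_special_singing(sung, reference, threshold):
    """True when the gap between the two pitch differences exceeds
    the determination value, suggesting vibrato or kobushi."""
    gap = mean_abs_diff(sung, reference) - abs_mean_diff(sung, reference)
    return gap > threshold


reference = [60, 60, 60, 60]
wobbling = [62, 58, 62, 58]   # swings around the reference
too_high = [62, 62, 62, 62]   # consistently sharp
```

Here `wobbling` yields a gap of 2 (mean absolute difference 2, absolute mean difference 0), while `too_high` yields a gap of 0, so only the former passes a threshold of 1.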
[0008]
The special singing method determining means thus decides whether the special singing method using vibrato or kobushi is being used from the difference between the pitch differences calculated by the first and second pitch difference calculating means. The reason this works is explained below.
[0009]
First, consider the characteristics of the singing voice signal when a technique such as vibrato or kobushi is used. In terms of pitch, the signal forms a "sinusoidal" waveform that changes almost periodically between pitches lying at roughly equal distances above and below the reference pitch. The sampling interval for scoring is generally shorter than this period of change and does not coincide with it, so some samples deviate greatly from the reference pitch even though the singing as a whole is at the correct pitch. Over the whole vibrato portion, however, some samples deviate above the reference pitch, others below it, and still others coincide with it or lie very close to it, so the pitch averaged over the whole vibrato portion is close to the reference pitch.
[0010]
By contrast, in the so-called "poor" case, where the singer misses the reference pitch without using a technique such as vibrato or kobushi, the pitch deviates to the same side of the reference pitch at almost every sampling timing. That is, sharp singing is sharp by about the same amount at almost every sample, and flat singing is flat by about the same amount at almost every sample, so the pitch differences have the same sign and nearly equal absolute values.
[0011]
Based on this analysis, the difference between the two cases can be captured by the following quantities. When the singer uses a technique such as vibrato or kobushi and sings at the correct pitch, the waveform is "sinusoidal" as described above. For the plural pitch data obtained from the singing voice signal at a predetermined sampling interval during the predetermined period, the average of the absolute values of the differences from the reference pitch data is therefore relatively large, while the absolute value of the average of those differences is relatively small. When the singer misses the reference pitch without such techniques, both quantities are relatively large. The determination method of the special singing method determining means described above was adopted on the view that this contrast distinguishes the two cases. The "determination value that cannot be reached by any singing method other than the special singing method using vibrato or kobushi" may be obtained, for example, by experiment: the average of the absolute values of the differences from the reference pitch data is calculated for singing that uses vibrato or kobushi at the correct pitch, and if it is about two whole tones (four semitones), for example, the determination value is set to three semitones.
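The contrast can be stated compactly. In the sketch below, D1, D2, D31, and D42 follow the naming of the abstract; N (the number of samples in the predetermined period) and the index i are notation assumed here for illustration.

```latex
D_{31} = \frac{1}{N}\sum_{i=1}^{N}\bigl|D1_i - D2_i\bigr|,
\qquad
D_{42} = \Bigl|\frac{1}{N}\sum_{i=1}^{N}\bigl(D1_i - D2_i\bigr)\Bigr|
% By the triangle inequality, D_{42} \le D_{31}, so D_{31} - D_{42} \ge 0.
% Equality holds exactly when all differences D1_i - D2_i share one sign.
```

When every sample misses on the same side, the two quantities coincide and their difference is zero. When the differences swing symmetrically around zero, D42 is close to zero and the difference is close to D31 itself, which is what makes a single threshold workable.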
[0012]
The presence of vibrato could also be determined from the waveform if the singing voice signal were analyzed by frequency analysis or the like. That, however, requires frequency analysis such as an FFT and a relatively large amount of computation. The present invention needs only simple arithmetic operations, which is a great advantage: appropriate scoring that takes the presence of vibrato into account is achieved with relatively simple calculation.
[0013]
In the karaoke apparatus of the present invention, when the special singing method using vibrato or kobushi is determined to be in use, scoring uses the pitch difference calculated by the second pitch difference calculating means. As the above analysis of the singing voice signal shows, this pitch difference is considered to reflect the degree of agreement with the reference pitch when the note is viewed as a whole. Scoring in this way properly evaluates singing ability even for vibrato and kobushi, which are generally regarded as skillful techniques in karaoke singing.
[0014]
(2) As set out in claim 2, when the special singing method determining means determines that the special singing method using vibrato or kobushi is being used, a predetermined number of points may be added to the score. Scoring with the pitch difference calculated by the second pitch difference calculating means, as described above, is a passive measure that merely keeps the special singing method from being scored unfairly low. Because techniques such as vibrato and kobushi are generally regarded as skillful, however, scoring them more favorably than singing without them is also effective from the standpoint of "scoring that properly evaluates singing ability". Techniques such as vibrato or kobushi are therefore made eligible for bonus points, and a predetermined score is added when such a technique is used.
[0015]
The resulting score may be shown on a display or the like as before. When points have been added because a technique such as vibrato or kobushi was recognized, that fact may be displayed as well. The singer then knows that the technique was recognized and rewarded, which increases the singer's satisfaction.
[0016]
(3) For the predetermined period during which the pitch extracting means extracts the plural pitch data used by the first and second pitch difference calculating means, the following refinements are conceivable.
For example, as set out in claim 3, the period during which pitch data of the same pitch continues in the scoring reference data read from the music data storage means is adopted as the "predetermined period". While the same pitch continues, vibrato or kobushi may be used in part or all of that period. Covering the whole period therefore allows scoring that properly evaluates singing ability wherever vibrato or kobushi is used.
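Finding these periods amounts to splitting the reference pitch sequence into runs of equal values. The sketch below assumes the scoring reference data is available as one reference pitch per sampling timing (for example MIDI note numbers); that representation and the function name `constant_pitch_periods` are assumptions for illustration.

```python
from itertools import groupby


def constant_pitch_periods(reference):
    """Return (start, end) index pairs, end exclusive, for each run of
    consecutive samples that share the same reference pitch."""
    periods = []
    start = 0
    for _pitch, run in groupby(reference):
        length = sum(1 for _ in run)
        periods.append((start, start + length))
        start += length
    return periods


# Three notes: two samples of 60, three of 62, one of 64.
periods = constant_pitch_periods([60, 60, 62, 62, 62, 64])
```

Each pair can then be used to slice the sung pitch data and the reference data for the two pitch difference calculations.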
[0017]
Of course, a part of the period during which pitch data of the same pitch continues may be adopted as the "predetermined period" instead of the whole. To capture properly the property that the absolute value of the average difference from the reference pitch data becomes relatively small, however, it is preferable to consider a range that is an integer multiple of one cycle of the "sinusoidal" waveform. Detecting one cycle by analyzing the singing voice signal with frequency analysis or the like would again require frequency analysis such as an FFT and a relatively large amount of computation.
[0018]
Instead, as with the period estimating means set out in claim 4, one cycle of the pitch waveform produced by vibrato or kobushi may be estimated from the timings at which the difference between the pitch data extracted by the pitch extracting means and the corresponding pitch data in the scoring reference data read from the music data storage means changes sign. For example, the time from one positive-to-negative reversal to the next positive-to-negative reversal may be taken as one cycle.
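A sketch of this sign-change estimate follows. It assumes the signed differences are sampled at a fixed interval and skips samples that are exactly zero when tracking the sign; the function name `estimate_period` and the choice to use only the first two reversals are assumptions for illustration.

```python
def estimate_period(diffs, sample_interval):
    """Estimate one vibrato cycle as the time between the first two
    positive-to-negative reversals of the signed pitch difference.

    Returns None when fewer than two such reversals are found.
    """
    crossings = []
    last_sign = 0
    for index, value in enumerate(diffs):
        sign = (value > 0) - (value < 0)
        if sign == 0:
            continue  # exactly on the reference pitch: keep the previous sign
        if last_sign > 0 and sign < 0:
            crossings.append(index)
        last_sign = sign
    if len(crossings) < 2:
        return None
    return (crossings[1] - crossings[0]) * sample_interval


# Two cycles of a six-sample wobble, sampled every 20 ms.
wobble = [1, 2, 1, -1, -2, -1, 1, 2, 1, -1, -2, -1]
period_ms = estimate_period(wobble, 20)
```

Only comparisons and one multiplication are involved, in keeping with the aim of avoiding an FFT.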
[0019]
The predetermined period set in this way may also be used as one comparison range in a so-called "moving average" method, in which the average pitch difference is calculated while the comparison range is shifted step by step within the period during which pitch data of the same pitch continues.
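One way to read this moving-average idea is sketched below: the comparison range slides one sample at a time over the signed differences, and the gap between the two pitch differences is computed for each position. The one-sample step, the function name `sliding_gap`, and the choice to report the gap rather than a score are assumptions; the text does not fix these details.

```python
from statistics import fmean


def sliding_gap(diffs, window):
    """For each window position, return (mean of |d|) - |mean of d|,
    where d are the signed sung-minus-reference differences."""
    gaps = []
    for start in range(len(diffs) - window + 1):
        chunk = diffs[start:start + window]
        mean_abs = fmean([abs(d) for d in chunk])
        abs_mean = abs(fmean(chunk))
        gaps.append(mean_abs - abs_mean)
    return gaps


# Steady sharp singing that turns into a wobble near the end.
gaps = sliding_gap([2, 2, 2, 2, 2, -2, 2, -2], window=4)
```

In this example the gap stays at zero while the singer is steadily sharp and rises as the window moves into the wobbling part, so the position of the technique within the note becomes visible.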
[0020]
(4) As described in claim 5, when the scoring means in the karaoke apparatus according to any one of claims 1 to 4 is realized by a computer, it can be provided as, for example, a program executed by the computer. Such a program is recorded on a computer-readable recording medium such as a flexible disk, a magneto-optical disk, a CD-ROM, a hard disk, a ROM, and a RAM, and loaded into the computer as necessary. By executing the program, or loading and executing the program via a network, a function as a scoring unit can be realized.
[0021]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, embodiments to which the present invention is applied will be described with reference to the drawings. It is needless to say that the embodiments of the present invention are not limited to the following examples, and can take various forms within the technical scope of the present invention.
[0022]
FIG. 1 is a block diagram illustrating the configuration of the karaoke apparatus according to the present embodiment. In this embodiment, a plurality of karaoke apparatuses 1a, 1b, 1c, ... are connected to a local area network (LAN) 30 to form a karaoke system. Although the basic configuration is the same for all karaoke apparatuses 1, only the karaoke apparatus 1a can be connected to the host computer 2 via the communication network 3. The karaoke apparatus 1a can thus acquire music information and image information related to karaoke from the host computer 2, and the music information and image information acquired by the karaoke apparatus 1a can in turn be acquired by the other karaoke apparatuses 1b, 1c, ....
[0023]
The host computer 2 can access the karaoke apparatus 1a via the communication network 3, transmit music data such as the latest popular songs to the karaoke apparatus 1a, and receive from the karaoke apparatus 1a and manage related information including log data indicating which songs have been played and how many times. Here, the “log data indicating which songs have been played and how many times” means not the log data of the karaoke apparatus 1a alone but the log data of the entire karaoke system. That is, the log data of the other karaoke apparatuses 1b, 1c, ... is also collected in the karaoke apparatus 1a, and the log data of the entire karaoke system is transmitted from the karaoke apparatus 1a to the host computer 2.
[0024]
Further, the host computer 2 has a database that stores content data used for music performance, such as music information and image information such as background images and lyrics. In addition to the content data, the host computer 2 also stores upgraded system programs and the like in the database, reads them out from the database as needed, and transmits them to the karaoke apparatus 1.
[0025]
Next, the configuration of the karaoke apparatus 1a will be described.
As shown in FIG. 1, the karaoke apparatus 1a includes: a communication device 19 that is connected to the host computer 2 via the communication network 3 to transmit and receive various kinds of information; an operation panel 10 for making song reservations and the like; a CPU 14 that controls the karaoke apparatus 1a as a whole; a RAM 15 for temporarily storing various information; a sound source reproducing device 18 for reproducing the performance sound; an amplifier 20 for amplifying electric signals related to music information; a speaker 22 that receives the electric signals from the amplifier 20 and outputs the accompaniment music and the user's singing voice; a microphone 23 for inputting the user's singing voice and the like to the amplifier 20; a scoring unit 28 that samples and analyzes the singing voice input from the microphone 23 to extract the singing pitch and compares it with scoring reference data to score the singing; an EEPROM 12 and a hard disk 13 that store various data such as music information and image information for karaoke; a video reproducing device 24 for rendering image information and the like as video; a display device 26 for displaying background video and lyrics as image information; and a network interface 17 for connecting to the LAN 30. The operation panel 10, the EEPROM 12, the hard disk 13, the RAM 15, the network interface 17, the sound source reproducing device 18, the video reproducing device 24, and the scoring unit 28 are connected to the CPU 14.
[0026]
The other karaoke apparatuses 1b, 1c, ... do not include the communication device 19 but are otherwise configured in the same way. Therefore, only the karaoke apparatus 1a is described here, and the description of the configuration of the other karaoke apparatuses 1b, 1c, ... is omitted.
The communication device 19 is a modulation / demodulation device that modulates and demodulates a signal, and is configured to be accessible to the host computer 2 through the communication network 3 under the control of the CPU 14. Thus, the communication device 19 can receive music data and the like transmitted from the host computer 2 via the communication network 3 and transmit the related information to the host computer 2.
[0027]
The operation panel 10 includes an operation unit, which the user operates to select an arbitrary song, adjust the key of the performance sound, adjust the volume balance between performance and singing, and make various adjustments such as echo, volume, and tone, and a display unit (not shown) that displays, for example, the selected song number. By operating the operation unit, the user can register a reservation for the song to be played on the karaoke apparatus 1.
[0028]
The EEPROM 12 stores a system program and setting data necessary for various settings.
The hard disk 13 stores content data such as music information and image information and log data such as performance records. When a song is selected via the operation unit of the operation panel 10, the CPU 14 reads from the hard disk 13 the music information including the lyrics data and the performance data and the image information including the video data, and outputs the data in synchronization with the sound source reproducing device 18. The hard disk 13 stores the performance data, lyrics data, and the like in association with a song number serving as identification information.
[0029]
Thereafter, the performance data output from the CPU 14 is converted into an analog performance sound signal in the sound source reproducing device 18 and then sent to the amplifier 20 to be electrically amplified. The amplifier 20 mixes in, at an appropriate ratio, the singing sound signal of the user (singer) input via the microphone 23, and the mixed singing sound signal and performance sound signal are transmitted from the amplifier 20 to the speaker 22 and output from the speaker 22 as singing voice and performance sound.
[0030]
On the other hand, the video reproduction device 24 reproduces an image based on the image information read from the hard disk 13 under the control of the CPU 14. Thereby, the lyrics data output by the CPU 14 is combined with the video data in the video playback device 24, and the lyrics telop is displayed on the screen of the display device 26 together with the background video.
[0031]
With such a configuration, the user can sing using the microphone 23 in accordance with the karaoke performance played from the speaker 22 while referring to the lyrics telop displayed on the display device 26.
Further, the karaoke apparatus 1a of the present embodiment has a singing scoring function that scores the singing of the person singing along with the karaoke performance and outputs the scoring result. A user who wants to use the singing scoring function on the karaoke apparatus 1a enters the song number for the performance reservation from the operation panel 10, such as a panel or a remote controller, and then presses a predetermined operation key. As a result, the song number, with a code indicating the command to activate the singing scoring function appended, is transferred to the CPU 14. When performing the music performance processing, the CPU 14 performs the performance reservation processing on the assumption that the singing scoring function is activated. Of course, the singing scoring function may instead be activated by a billing procedure or the like by the user. For example, a configuration may be adopted in which a separate charging device is connected to the karaoke apparatus 1a and the singing scoring function is activated by paying a predetermined fee into the charging device at an appropriate time, such as immediately before the song to be scored is played or during its prelude.
[0032]
The scoring unit 28 performs singing scoring. The scoring unit 28 is configured to input a singing voice from the microphone 23. The scoring unit 28 extracts the singing pitch and rhythm as a scoring element by sampling and analyzing the input singing voice. On the other hand, the accompaniment music generation data stored in the hard disk 13 is received via the data bus and the CPU 14, and the vocal data (singing melody data) in the data is acquired as scoring reference data. Then, the scoring element based on the acquired scoring reference data is compared with the scoring element extracted from the above-mentioned singing voice, and the singing voice is scored based on how close it is to the vocal data. The vocal data is also used to realize a function of outputting the accompaniment music sound from the speaker 22 with a relatively small volume as a so-called guide melody function.
[0033]
The scoring result data is transferred to the CPU 14, which stores the received scoring data in the RAM 15. On detecting the end point of the scoring section in the karaoke song, the CPU 14 causes the scoring unit 28 to stop generating scoring data, totals the scoring data accumulated in the RAM 15 during the scoring section, and displays the scoring result on the display device 26.
[0034]
The above is the configuration and general operation for scoring. In addition, the karaoke apparatus 1a of the present embodiment can detect vibrato and fist and thereby perform more appropriate scoring. This addresses the following problem. With the conventional method of judging the degree of coincidence between the pitch data extracted from the singing voice and the pitch data of the vocal data in the accompaniment music generation data, using a technique such as vibrato or fist increases the difference from the reference pitch data. Therefore, even if listeners feel that the singer is singing well, the score is poor. The karaoke apparatus 1a of the present embodiment therefore takes the presence of vibrato and fist into account in the part that judges the degree of coincidence of the pitch data.
[0035]
This will be described with reference to FIGS. 2 to 5. FIG. 2 is a flowchart illustrating an algorithm for judging the degree of coincidence of pitch data in the scoring process performed by the scoring unit 28 in the embodiment. FIG. 3 is an explanatory diagram of the scoring method based on the pitch data D1 extracted from the singing voice and the scoring reference data D2. A pitch difference between the pitch data D1 and the scoring reference data D2 is calculated for each note, and a predetermined score is given according to the obtained pitch difference. In the example shown in FIG. 3, vibrato or fist is present in the latter half of the last note. FIG. 4 is an explanatory diagram of the determination method according to the present embodiment, and FIG. 5 is an explanatory diagram of a conventional determination method.
[0036]
As shown in FIG. 2, the scoring unit 28 first takes in a singing voice signal input from the microphone 23 (S102). Then, the captured singing voice signal is sampled and analyzed at a predetermined timing (for example, a constant cycle) to extract pitch data D1 (S104). Further, the accompaniment music generation data is received via the data bus, and the pitch data in the vocal data in the data is acquired as the scoring reference data D2 (S106).
[0037]
In the next S108, the absolute value D3 of the value obtained by subtracting, from the pitch data D1 extracted from the singing voice, the scoring reference data D2 corresponding to the singing timing at which that pitch data D1 was obtained is calculated, and the average value D31 of the absolute value D3 during a predetermined period is calculated. As the “predetermined period”, for example, it is conceivable to adopt a period in which pitch data of the same pitch continues in the scoring reference data D2. When the same pitch continues, singing using vibrato or fist may be performed in part or all of that period. Therefore, by covering the whole of this period, scoring that appropriately evaluates singing ability can be performed regardless of where vibrato or fist is used.
[0038]
The calculation of the average value D31 in S108 will be specifically described with reference to FIG.
In FIG. 4A, the vertical axis indicates frequency and the horizontal axis is the time axis, which is drawn at the frequency of the scoring reference data D2 acquired in S106. The black circles are the pitch data D1 of the singing voice extracted in S104, and the curve connecting these black circles indicates the continuous frequency of the singing voice when singing with vibrato or fist. The absolute value D3 of the value obtained by subtracting the scoring reference data D2 from the pitch data D1 is indicated by the black circles in FIG. 4B. In FIG. 4B, the vertical axis is frequency and the horizontal axis is the time axis, and the average value D31 of the absolute value D3 is indicated by a dashed line. If a line is drawn from the time axis to each black circle in FIG. 4B to form a bar graph, the average bar length corresponds to D31.
[0039]
Returning to the description of the flowchart of FIG. 2, in the next S110, a value D4 is calculated by subtracting, from the pitch data D1 extracted from the singing voice, the scoring reference data D2 corresponding to the singing timing at which that pitch data D1 was obtained. Then, the absolute value D42 of the average value D41 of the subtraction value D4 during the predetermined period is calculated. This “predetermined period” is the same as the predetermined period in S108.
[0040]
The subtraction value D4 in S110 is indicated by the black circles in FIG. 4C. Note that the vertical axis in FIG. 4C is frequency, and the horizontal axis is the time axis. Further, in FIG. 4C, the average value D41 of the subtraction value D4 is indicated by a dashed line, and the absolute value D42 of the average value D41 is indicated by a two-dot chain line.
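As a rough illustration of S108 and S110, the two quantities can be computed from the same list of sampled differences. The sketch below is only one possible reading: it assumes the pitches are expressed in semitones, and the function name `pitch_differences` is hypothetical, not something named in the specification.

```python
from statistics import fmean


def pitch_differences(d1, d2):
    """Return (D31, D42) for one predetermined period.

    d1: sung pitch samples (D1); d2: reference pitch at the same timings (D2).
    Units are assumed to be semitones; the text itself plots frequency.
    """
    if len(d1) != len(d2) or not d1:
        raise ValueError("d1 and d2 must be non-empty and equally long")
    d4 = [sung - ref for sung, ref in zip(d1, d2)]  # signed difference D4 = D1 - D2
    d31 = fmean(abs(x) for x in d4)                 # S108: average of |D1 - D2|
    d42 = abs(fmean(d4))                            # S110: |average of (D1 - D2)|
    return d31, d42
```

For samples that swing evenly above and below the reference, D31 stays large while D42 collapses toward zero; for samples that sit on one side of the reference, the two values coincide.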
[0041]
In the next S112, it is determined whether or not the value obtained by subtracting the absolute value D42 (of the average value D41) calculated in S110 from the average value D31 (of the absolute value D3) calculated in S108 is equal to or greater than a predetermined value α. As the predetermined value α, for example, a value such as one and a half tones (3 semitones) can be considered. This predetermined value α will be discussed further later.
[0042]
If (D31−D42) ≧ α (S112: YES), this one note is considered to have been sung with the special singing method using vibrato or fist, so the absolute value D42 (of the average value D41) calculated in S110 is set as the difference from the pitch used as the scoring reference (S114). Then, predetermined points are added because the special singing method using vibrato or fist was used (S116). The points added here are a fixed score.
[0043]
In this case, in the main scoring process executed separately, the scoring unit 28 performs scoring based on the pitch difference obtained in S114 of FIG. 2 and adds the points obtained in S116. The score resulting from this scoring is displayed on the display device 26. If points were added through the process of S116, a message to that effect (for example, “N points added for vibrato or fist”) may also be displayed. In this way, the singer can know that a technique such as vibrato or fist has been recognized and rewarded with points, and the singer's satisfaction increases.
[0044]
On the other hand, if (D31-D42) <α (S112: NO), this one note is considered not to have been sung with the special singing method using vibrato or fist, so the average value D31 (of the absolute value D3) calculated in S108 is set as the difference from the pitch used as the scoring reference (S118). In this case, the scoring unit 28 performs scoring based on the pitch difference obtained in S118 of FIG. 2 in the main scoring process that is executed separately.
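The branch at S112 and its two outcomes (S114 and S116, or S118) can be sketched as a single function. This is a minimal sketch under stated assumptions: pitches are in semitones, `ALPHA` uses the 3-semitone example value from the text, and `BONUS_POINTS` is a purely hypothetical number, since the specification only says that predetermined points are added.

```python
from statistics import fmean

ALPHA = 3.0        # determination value α in semitones (example value from the text)
BONUS_POINTS = 5   # hypothetical; the text only says "predetermined" points


def judge_note(d1, d2, alpha=ALPHA):
    """Return (pitch_difference, bonus, is_special) for one note.

    d1: sung pitch samples, d2: reference pitch samples at the same timings.
    """
    d4 = [sung - ref for sung, ref in zip(d1, d2)]
    d31 = fmean(abs(x) for x in d4)   # S108
    d42 = abs(fmean(d4))              # S110
    if d31 - d42 >= alpha:            # S112: YES
        return d42, BONUS_POINTS, True    # S114 and S116
    return d31, 0, False                  # S112: NO, S118
```

The returned pitch difference and bonus would then be handed to the separately executed main scoring process, whose details the text does not specify.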
[0045]
Here, the reason why, as described for S112, the special singing method using vibrato or fist is considered to have been used if (D31−D42) ≧ α and not to have been used if (D31−D42) <α will be explained.
[0046]
Focusing on the pitch of the singing voice when a technique such as vibrato or fist is used, the pitch changes almost periodically between pitches that differ by almost the same amount above and below the reference pitch, resulting in a “sinusoidal” waveform signal. That is, it is a waveform signal as shown in FIG. 4A. Since the sampling interval for scoring is shorter than this change period, it does not coincide with the change period. Therefore, as can be seen from FIG. 4A, at a certain sampling timing the pitch may deviate greatly from the reference pitch even though the singing as a whole is at the correct pitch. However, looking at the entire vibrato portion, the pitch deviates to the higher side of the reference pitch at one sampling timing, deviates to the lower side at another sampling timing, and matches or is very close to the reference pitch at yet another sampling timing. Accordingly, when the pitches of the entire vibrato portion are averaged, the result is close to the reference pitch, that is, to the scoring reference data D2, as shown by D41 (the average value of D4 = D1 − D2) indicated by the dashed line in FIG. 4C.
[0047]
For comparison, the same examination is made for a so-called “poor” singer who does not use techniques such as vibrato and fist but sings off the reference pitch. FIG. 5A shows the pitch data D1 and the scoring reference data D2 when singing off the reference pitch without using a technique such as vibrato or fist. The characteristic of the pitch data D1 extracted from the singing voice in this case is that it deviates to the same side of the reference pitch at almost all sampling timings. That is, when the pitch deviates to the higher side (the side where the frequency is higher than the scoring reference data D2), it deviates to the higher side by almost the same amount at almost all sampling timings. When the pitch deviates to the lower side (the side where the frequency is lower than the scoring reference data D2), it deviates to the lower side by almost the same amount at almost all sampling timings. The absolute values of the differences are therefore also substantially equal.
[0048]
When the processing of S108 and S110 in FIG. 2 is performed on such data D1 and D2, the “absolute value D3 of the value obtained by subtracting the scoring reference data D2 from the pitch data D1” in S108 is as indicated by the black circles in FIG. 5B, so the average value D31 of the absolute value D3 is as indicated by the dashed line in FIG. 5B. On the other hand, the “value D4 obtained by subtracting the scoring reference data D2 from the pitch data D1” in S110 is as indicated by the black circles in FIG. 5C, so the absolute value D42 of the average value D41 of the subtraction value D4 is as indicated by the two-dot chain line in FIG. 5C. That is, D31 and D42 have the same value.
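The contrast between the vibrato case and the off-pitch case can be checked numerically with synthetic data. The sketch below is illustrative only: the vibrato is modeled as an ideal sine around the reference pitch, and the amplitude, the number of samples per cycle, and the 2-semitone offset are all assumed values chosen for the example (three cycles are used because FIG. 4 is described as showing three).

```python
import math
from statistics import fmean


def d31_d42(diffs):
    """Return (D31, D42) for a list of signed differences D4 = D1 - D2."""
    return fmean(abs(x) for x in diffs), abs(fmean(diffs))


SAMPLES_PER_CYCLE = 8   # assumed sampling density
CYCLES = 3              # whole cycles, as in the FIG. 4 description
AMPLITUDE = 5.0         # assumed vibrato deviation from the reference, semitones

# Vibrato: the difference swings symmetrically around zero.
vibrato = [AMPLITUDE * math.sin(2 * math.pi * k / SAMPLES_PER_CYCLE)
           for k in range(SAMPLES_PER_CYCLE * CYCLES)]
# Off-pitch singing: constantly 2 semitones sharp (assumed offset).
off_pitch = [2.0] * len(vibrato)

for name, diffs in (("vibrato", vibrato), ("off-pitch", off_pitch)):
    d31, d42 = d31_d42(diffs)
    print(f"{name}: D31={d31:.2f} D42={d42:.2f} D31-D42={d31 - d42:.2f}")
```

With these assumed values the vibrato gap comes out at roughly three semitones while the off-pitch gap is zero, which is the separation that the determination in S112 relies on.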
[0049]
Because of this difference between D31 and D42 shown in FIGS. 4 and 5, when singing off the reference pitch without using a technique such as vibrato or fist, as shown in FIG. 5, the value of (D31-D42) is almost zero, whereas when singing well, close to the reference pitch, using a technique such as vibrato or fist, as shown in FIG. 4, the value of (D31-D42) has a certain magnitude. For example, if the width of the vibrato is about two to two and a half tones (4 to 5 semitones) relative to the reference pitch, the value of (D31-D42) is about one and a half tones (3 semitones), depending on the sampling timing. Such a value cannot be obtained when singing off the reference pitch without using a technique such as vibrato or fist as shown in FIG. 5. Therefore, for example, the predetermined value α for the determination in S112 of FIG. 2 may be set to about three semitones. Of course, the width of the vibrato varies depending on the song and the singer and may be smaller than in the above example, so a smaller value such as two semitones or one semitone may be adopted as the predetermined value α for the determination.
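The figure of about three semitones can be cross-checked with a short calculation. The derivation below is a sketch under assumptions that the text does not state explicitly: the difference D4 is modeled as an ideal sine of amplitude A (the vibrato width is read here as the peak deviation from the reference pitch), it is sampled densely, and the predetermined period covers n whole cycles of length T.

```latex
% Assumption: D_4(t) = D_1(t) - D_2 = A \sin(2\pi t / T), densely sampled over n whole cycles.
\begin{align*}
D_{41} &= \frac{1}{nT}\int_0^{nT} A\sin\!\left(\frac{2\pi t}{T}\right)dt = 0
  \quad\Rightarrow\quad D_{42} = |D_{41}| = 0,\\
D_{31} &= \frac{1}{nT}\int_0^{nT} \left|A\sin\!\left(\frac{2\pi t}{T}\right)\right|dt
  = \frac{2A}{\pi} \approx 0.64\,A,\\
D_{31} - D_{42} &\approx 0.64\,A \approx 2.5 \text{ to } 3.2 \text{ semitones for } A = 4 \text{ to } 5 \text{ semitones}.
\end{align*}
% Constant offset c (off-pitch singing): D_{31} = |c| = D_{42}, so D_{31} - D_{42} = 0.
```

Under this model the gap depends only on the vibrato amplitude, which is consistent with the suggestion that a smaller α suits singers or songs with a narrower vibrato.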
[0050]
The determination value α may be determined, for example, by experiment. The average value of the absolute value of the difference from the reference pitch data is calculated for singing at the correct pitch using a technique such as vibrato or fist. For example, if it is about two tones (four semitones), the determination value is set to three semitones, and if it is about one tone (two semitones), the determination value is set to one semitone. Alternatively, the value may be based on an analysis of the degree of vibrato or fist of the singer who sings the original song: the singing voice portion (vocal portion) is extracted from music information containing the singer's voice, and the value of (D31-D42) is calculated for the vibrato and fist portions. It is also conceivable to calculate such sample values from the vocal portions of a plurality of singers and average them to determine the predetermined value α for the determination.
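One way to picture the averaging over several singers is the sketch below. It assumes that each sample is already a vibrato or fist passage given as paired lists of sung and reference pitches in semitones; the function names and the optional `margin` parameter are hypothetical additions for illustration and do not come from the specification.

```python
from statistics import fmean


def gap(d1, d2):
    """Return D31 - D42 for one sampled passage."""
    d4 = [sung - ref for sung, ref in zip(d1, d2)]
    return fmean(abs(x) for x in d4) - abs(fmean(d4))


def calibrate_alpha(samples, margin=0.0):
    """Average the (D31 - D42) values of several passages to obtain α.

    samples: iterable of (d1, d2) pairs taken from vibrato or fist passages.
    margin:  hypothetical safety margin subtracted from the average.
    """
    mean_gap = fmean(gap(d1, d2) for d1, d2 in samples)
    return max(mean_gap - margin, 0.0)
```

A nonzero margin would place α somewhat below the measured average, in the same spirit as the examples in the text that set α one semitone below the measured value.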
[0051]
As described above, according to the karaoke apparatus 1a of the present embodiment (the same applies to the other karaoke apparatuses 1b, 1c, etc.), the following effects can be obtained.
(1) When singing karaoke, scoring that appropriately evaluates singing ability can be performed in response to vibrato and fist, which are generally good techniques.
[0052]
Note that the presence or absence of vibrato could also be determined from the waveform if signal waveform information were obtained by frequency analysis of the singing voice signal, but that would require a frequency analysis such as an FFT, and the amount of calculation would be relatively large. In the present embodiment, by contrast, as can be understood from the description of S108, S110, and S112 in FIG. 2, only simple operations (subtraction, absolute value, and averaging) are needed, which is very advantageous in that scoring can be performed with little computation.
[0053]
(2) In addition, as shown in S116 of FIG. 2, points are added when the special singing method using vibrato or fist is used. Since techniques such as vibrato and fist are generally considered skillful, scoring such singing more favorably than singing without them realizes “scoring that appropriately evaluates singing ability”.
[0054]
In the present embodiment, the hard disk 13 corresponds to the “music data storage means”, and the CPU 14 and the sound source reproducing device 18 correspond to the “karaoke performance means”. The microphone 23 corresponds to the “voice signal input means”, and the scoring unit 28 corresponds to the “pitch extracting means” and the “scoring means”. In addition, in the processing of FIG. 2, S108, S110, and S112 correspond to the processing executed as the “first pitch difference calculating means”, the “second pitch difference calculating means”, and the “special singing method determining means”, respectively.
[0055]
Although the embodiments have been described above, the present invention is not limited to the above-described embodiments and can be implemented in various modes. Some of them are described below.
(1) In the above embodiment, the period during which pitch data of the same pitch continues in the scoring reference data D2 was adopted as the “predetermined period” of the data from which the average value D31 in S108 and the absolute value D42 in S110 of FIG. 2 are calculated. However, instead of the entire period in which the pitch data of the same pitch continues, a part of that period may be adopted as the “predetermined period”. In that case, since whether or not the special singing method using vibrato or fist is used is determined from the magnitude of the value of (D31-D42) in S112 of FIG. 2, it is desirable to take the following point into account.
[0056]
In other words, the vibrato or fist portion is a “sinusoidal” waveform signal that changes almost periodically between pitches that differ by substantially the same amount above and below the reference pitch, so that, viewed as a whole, the absolute value D42 of the average value D41 of the difference D4 between the pitch data D1 based on the singing voice and the scoring reference data D2 is relatively small, as indicated by the two-dot chain line in FIG. 4C. Since this property is used to determine whether or not a special singing method is used, in order to properly capture the characteristics of vibrato or fist it is preferable to evaluate the “sinusoidal” waveform signal over a range that is an integral multiple of one cycle. For example, FIG. 4 shows data for three cycles. Even if the vibrato or fist portion actually lasts longer, the calculation may be performed on, for example, only three cycles.
[0057]
However, if signal waveform information is obtained by frequency analysis of the singing voice signal in order to detect one cycle, a frequency analysis such as an FFT is required, and the amount of calculation is relatively large. For this reason, for example, after the pitch data D1 is extracted in S104 of FIG. 2 and the scoring reference data D2 is obtained in S106, one cycle of the pitch waveform can be estimated from the timing at which the difference (D1-D2) changes between positive and negative. For example, the period from the timing of inversion from positive to negative to the next timing of inversion from positive to negative may be calculated as one cycle. Of course, conversely, the period from the timing of inversion from negative to positive to the next timing of inversion from negative to positive may be calculated as one cycle. Further, for example, the period from the timing of inversion from positive to negative to the next timing of inversion from negative to positive may be taken as a half cycle and doubled to obtain one cycle. The above-mentioned predetermined period is then set to an integral multiple of the one cycle estimated in this way.
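The sign-change approach to estimating one cycle can be sketched as follows. This is an illustrative reading rather than a definitive implementation: the function names are hypothetical, the period is measured in samples, only positive-to-negative inversions are used, and zero is treated as belonging to the negative side, a tie-breaking choice the text does not address.

```python
def positive_to_negative_crossings(diffs):
    """Indices where the difference D1 - D2 flips from positive to non-positive."""
    return [i for i in range(1, len(diffs))
            if diffs[i - 1] > 0 and diffs[i] <= 0]


def estimate_period(diffs):
    """Return one cycle length in samples, or None if fewer than two crossings."""
    crossings = positive_to_negative_crossings(diffs)
    if len(crossings) < 2:
        return None
    return crossings[1] - crossings[0]


def whole_cycle_window(diffs, max_cycles=3):
    """Return a slice of diffs covering an integral number of estimated cycles."""
    period = estimate_period(diffs)
    if period is None:
        return None
    start = positive_to_negative_crossings(diffs)[0]
    cycles = min(max_cycles, (len(diffs) - start) // period)
    return diffs[start:start + cycles * period]
```

The window returned by `whole_cycle_window` could then serve as the predetermined period for S108 to S112; `max_cycles=3` simply mirrors the three cycles mentioned for FIG. 4.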
[0058]
Of course, even for a “sinusoidal” waveform signal the change period is not always constant. Considering actual singing, however, the change period when singing with vibrato or fist is assumed to be almost constant in many cases, so the above method poses no particular problem.
[0059]
Then, the processing of S108 to S112 in FIG. 2 is performed over the “predetermined period” set as described above to an integral multiple of one cycle. In addition, when only a part of the period in which the pitch data of the same pitch continues is used as the determination section, it is possible that a special singing method such as vibrato happens not to be used in the determination section itself but is used in another section. Therefore, for example, the processing of S108 to S112 in FIG. 2 may be performed on predetermined determination sections appropriately extracted from the first half and the second half of the period in which the pitch data of the same pitch continues, and points may be added if the special singing method is detected.
[0060]
In addition, whether the entirety or only a part of the period in which the pitch data of the same pitch continues is used as the determination section, the following “moving average” technique may also be used. For example, when the entirety of the period in which the pitch data of the same pitch continues is the determination section, a predetermined period of two cycles is set as one comparison target range, and the average value of the pitch difference is calculated while sequentially shifting the comparison target range within the period in which the pitch data of the same pitch continues.
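The moving-average idea can be pictured as a window sliding across the differences of one note. The sketch below makes several assumptions that the text leaves open: the window length is given in samples (for example, two estimated cycles), the window advances one sample at a time, and each position simply reports its own D31 - D42. How the per-window results are combined into one determination is not specified in the text and is not attempted here.

```python
from statistics import fmean


def sliding_gaps(diffs, window):
    """Return [(start, D31 - D42), ...] for each position of the comparison range.

    diffs:  signed differences D4 = D1 - D2 over the whole note.
    window: length of one comparison target range, in samples.
    """
    results = []
    for start in range(len(diffs) - window + 1):
        segment = diffs[start:start + window]
        d31 = fmean(abs(x) for x in segment)   # average of |D4| in this range
        d42 = abs(fmean(segment))              # |average of D4| in this range
        results.append((start, d31 - d42))
    return results
```

Window positions lying wholly inside a vibrato passage would show a large gap, while positions over steadily held pitch would show a gap near zero, which locates the technique within the note.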
[0061]
(2) In the above embodiment, the song number and the like are input through the operation unit of the operation panel 10 provided on the main body of the karaoke apparatus 1. However, for example, such operation buttons may be provided on a remote controller or the like connected by infrared communication or by wireless communication based on the Bluetooth standard, and song selection data based on their operation may be transmitted to the karaoke apparatus main body.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a schematic configuration of a karaoke apparatus according to an embodiment.
FIG. 2 is a flowchart illustrating an algorithm for determining the degree of coincidence of pitch data in the scoring process performed in the karaoke apparatus of the embodiment.
FIG. 3 is an explanatory diagram of a scoring method based on pitch data D1 extracted from singing voice and scoring reference data D2.
FIG. 4 is an explanatory diagram of a determination method according to the present embodiment.
FIG. 5 is an explanatory diagram of a conventional determination method.
[Explanation of symbols]
1a, 1b, 1c: karaoke device, 2: host computer, 3: communication network, 10: input device, 12: EEPROM, 13: hard disk, 14: CPU, 15: RAM, 17: network interface, 18: sound source reproduction device, 20: amplifier, 22: speaker, 23: microphone, 24: video playback device, 26: display device, 28: scoring unit.

Claims

Music data storage means for storing karaoke performance data for playing a karaoke song and scoring reference data including pitch data of the singing melody of the karaoke song;
Karaoke performance means for performing a karaoke performance by reading the karaoke performance data of a designated karaoke song from the music data storage means;
Voice signal input means for inputting a voice signal of karaoke singing;
Pitch extracting means for sampling the voice signal input via the voice signal input means and extracting pitch data of the karaoke singing; and
Scoring means for reading, in synchronization with the karaoke performance by the karaoke performance means, the scoring reference data corresponding to the song being performed from the music data storage means, and scoring based on the read scoring reference data and the pitch data extracted by the pitch extracting means;
A karaoke device comprising:
The scoring means,
First pitch difference calculating means for calculating the absolute value of the difference between each of a plurality of pitch data extracted by the pitch extraction means during a predetermined period and the corresponding pitch data in the scoring reference data read from the music data storage means, and for calculating the average of those absolute values;
Second pitch difference calculating means for calculating the absolute value of the average of the differences between the plurality of pitch data extracted by the pitch extraction means during the predetermined period and the corresponding pitch data in the scoring reference data read from the music data storage means; and
Special singing method determining means for determining that a special singing method using vibrato or kobushi (figuration) is being used when the difference between the pitch difference calculated by the first pitch difference calculating means and the pitch difference calculated by the second pitch difference calculating means is larger than a determination value that cannot occur with any singing method other than the special singing method;
and wherein, when the special singing method determining means determines that the special singing method using vibrato or kobushi is being used, the scoring means performs scoring using the pitch difference calculated by the second pitch difference calculating means.
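
The determination recited in claim 1 can be illustrated with a minimal sketch. It is not the claimed implementation: the function names, the use of MIDI-style note numbers as the pitch unit, the threshold value, and the fallback to the first pitch difference for ordinary singing are all assumptions made for illustration. `d31` corresponds to the average of the absolute differences (first pitch difference) and `d42` to the absolute value of the average difference (second pitch difference), following the symbols used in the abstract.

```python
from statistics import fmean


def pitch_differences(sung, reference):
    """Return (d31, d42) for one predetermined period.

    d31: average of |sung - reference|   (first pitch difference)
    d42: |average of (sung - reference)| (second pitch difference)
    """
    if not sung or len(sung) != len(reference):
        raise ValueError("sung and reference must be non-empty and equal in length")
    diffs = [s - r for s, r in zip(sung, reference)]
    d31 = fmean([abs(d) for d in diffs])
    d42 = abs(fmean(diffs))
    return d31, d42


def is_special_singing(sung, reference, threshold):
    """True when d31 - d42 exceeds the determination value (threshold is hypothetical)."""
    d31, d42 = pitch_differences(sung, reference)
    return (d31 - d42) > threshold


def scoring_difference(sung, reference, threshold):
    """Pitch difference to score with: d42 for special singing.

    Falling back to d31 for ordinary singing is an assumption of this sketch.
    """
    d31, d42 = pitch_differences(sung, reference)
    return d42 if (d31 - d42) > threshold else d31
```

With a reference held at note 60, a sung pitch alternating between 61 and 59 gives `d31 = 1.0` and `d42 = 0.0`, so the difference exceeds a threshold of 0.5 and the period is scored with `d42`. A sung pitch held at 61 gives `d31 = d42 = 1.0`, so the difference is zero and the period is treated as ordinary off-pitch singing.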

The karaoke device according to claim 1,
wherein the scoring means performs scoring by adding a predetermined score when the special singing method determining means determines that the special singing method using vibrato or kobushi is being used.

The karaoke device according to claim 1 or 2,
wherein the predetermined period, during which the pitch extraction means extracts the plurality of pitch data to be processed by the first pitch difference calculating means and the second pitch difference calculating means, is a period during which pitch data of the same pitch continues in the scoring reference data read from the music data storage means.
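
One way to obtain such periods is to split the scoring reference data into runs of identical pitch. The sketch below assumes the reference data is available as one pitch value per sampling instant; the function name and the index-pair representation are hypothetical.

```python
from itertools import groupby


def same_pitch_periods(reference):
    """Return (start, end) index pairs, end exclusive, one per run of equal reference pitch."""
    periods = []
    start = 0
    for _, run in groupby(reference):
        length = sum(1 for _ in run)
        periods.append((start, start + length))
        start += length
    return periods
```

Each returned pair delimits one candidate predetermined period; the sung pitch data falling inside that index range would then be passed to the two pitch difference calculations.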

The karaoke device according to claim 1 or 2,
wherein the scoring means further comprises period estimating means for estimating one period of the pitch waveform produced by vibrato or kobushi, based on the timing at which the difference between the pitch data extracted by the pitch extraction means and the corresponding pitch data in the scoring reference data read from the music data storage means changes between positive and negative,
and wherein the predetermined period, during which the pitch extraction means extracts the plurality of pitch data to be processed by the first pitch difference calculating means and the second pitch difference calculating means, is an integral multiple of the one period estimated by the period estimating means.
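
A minimal sketch of such a period estimate follows. It assumes the differences between sung and reference pitch are given as a list of signed samples, treats two successive sign changes as half a period each, and averages the spans between every second sign change; the function names and the rounding to whole samples are assumptions of this sketch rather than details taken from the claim.

```python
def sign_change_indices(diffs):
    """Indices where the difference takes the opposite sign from the last non-zero sample."""
    changes = []
    prev = 0
    for i, d in enumerate(diffs):
        sign = (d > 0) - (d < 0)
        if sign == 0:
            continue  # a sample exactly on the reference pitch carries no sign
        if prev != 0 and sign != prev:
            changes.append(i)
        prev = sign
    return changes


def estimate_period(diffs):
    """Estimated length of one period in samples, or None if too few sign changes."""
    changes = sign_change_indices(diffs)
    if len(changes) < 3:
        return None
    # One full period spans two sign changes.
    spans = [changes[k + 2] - changes[k] for k in range(len(changes) - 2)]
    return round(sum(spans) / len(spans))


def window_length(diffs, max_len):
    """Largest integral multiple of the estimated period that fits in max_len samples."""
    period = estimate_period(diffs)
    if period is None or period > max_len:
        return None
    return (max_len // period) * period
```

Choosing a window that is a whole number of periods means the positive and negative excursions cancel in the average of the differences, which is what keeps the second pitch difference small for vibrato or kobushi.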

A program for causing a computer to function as the scoring unit in the karaoke apparatus according to claim 1.

A recording medium in which a program for causing a computer to function as the scoring unit in the karaoke apparatus according to claim 1 is recorded.