JP4275762B2 - Voice instruction device and karaoke device

Info

Publication number: JP4275762B2
Application number: JP07386998A
Authority: JP
Inventors: トム蔡
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 1998-03-23
Filing date: 1998-03-23
Publication date: 2009-06-10
Anticipated expiration: 2018-03-23
Also published as: JPH11272283A

Description

【０００１】
【発明の属する技術分野】
この発明は、コマンドの入力やテンポ，キー設定などをマイクからの音声入力で行うことができる音声指示装置およびカラオケ装置に関する。
【０００２】
【従来の技術】
カラオケ装置を利用するためには、選曲や演奏スタートなどの操作が必要であるが、近年カラオケ装置の高機能化に伴って、選曲や演奏スタート以外に新機能を利用するための種々の操作が必要になっている。従来は、赤外線のリモコン装置のボタンや装置本体のパネルスイッチをオンすることによって上記操作を行っていた。
【０００３】
【発明が解決しようとする課題】
しかし、赤外線リモコンや装置本体のパネルスイッチなどで多くの機能の操作を行おうとすると、ボタンの数を増やしたり、複雑なキーシーケンスを定めたりする必要がある。ボタンの数を増やすためには、ハードを改造することが必要であり、容易に行うことができず、コストアップや対応に時間が掛かるなどの問題点があった。また、キーシーケンスで新機能を操作しようとすれば、キーシーケンスが複雑化し、キー操作が面倒になるという問題点があった。
【０００４】
この発明は、マイクから入力する音声信号によって種々の機能を操作できるようにした音声指示装置およびカラオケ装置を提供することを目的とする。
【０００５】
【課題を解決するための手段】
この出願の請求項１の発明は、連続する複数の音声信号のそれぞれの音楽要素の組み合わせであるコマンドパターンと該コマンドパターンに対応する処理内容とを対応して記憶したコマンドパターンテーブルと、マイクから入力された音声信号から検出された複数の音楽要素で前記コマンドパターンテーブルを検索し、一致したコマンドパターンが抽出されたとき、そのコマンドパターンに対応する処理を実行する制御部と、を備えたことを特徴とする。
【０００６】
この出願の請求項２の発明は、請求項１の発明において、前記音楽要素は、音声信号の音高、音量またはテンポであることを特徴とする。
【０００７】
この出願の請求項３の発明であるカラオケ装置は、請求項１または請求項２に記載の音声指示装置を備えたことを特徴とする。
【０００９】
請求項１の発明において、音楽要素としては、周波数，音量，時間的要素などを用いることができる。周波数はヘルツで表される連続値を用いてもよく段階化された音高を用いてもよい。時間的要素としては、複数の音声信号の発音開始タイミングの時間的間隔や１つの音声信号の長さなどを用いることができる。
【００１０】
これらの音楽要素を用いたコマンドパターンは、たとえば以下のようなもので構成される。
【００１１】
複数の音声信号から抽出した音高データ列や音量データ列
複数の音声信号間の時間間隔（テンポ）や周波数間隔（音程）
１の音声信号から抽出した音高データ，音量データなどの組み合わせ。
【００１２】
すなわち、コマンドパターンの複数の音楽要素は、連続する複数の音声信号から抽出した１種類の音楽要素列であっても、１つの音声信号から抽出した複数の音楽要素群であってもよい。コマンドパターンテーブルには、これらのコマンドパターンに対応して、カラオケ装置などの装置を制御する処理内容を記憶する。処理内容としては、たとえばこの発明をカラオケ装置に適用した場合、カラオケ装置が実行できるカラオケ演奏以外の機能（ゲームや占いなどのコンテンツ）の選択、ゲーム機能におけるキャラクタの動作など各コンテンツ内の操作などがある。この発明によれば、このような、従来のカラオケ装置にはない機能を追加したとき、ハード的な操作ボタンを追加することなく、マイクからの音声入力でこれを操作することができる。
【００１５】
【発明の実施の形態】
図面を参照してこの発明の実施形態であるカラオケ装置について説明する。このカラオケ装置は、歌唱採点機能を備えている。歌唱採点機能は、利用者の歌唱音声信号をリファレンスデータと比較することによって、その音程，音量，リズムについて採点し、歌唱終了後にその得点を表示するものである。この採点機能は、音声信号処理部５０によって実行される。
【００１６】
また、このカラオケ装置は音声コマンド機能を備えている。音声コマンド機能とは、カラオケ曲の演奏中でないとき、利用者がマイク４７から音声を入力することによってカラオケ装置を操作できる機能である。音声コマンド機能は、音高コマンド機能，音量コマンド機能，テンポコマンド機能の３種類があり、これら機能における音高，音量，テンポの判定は採点機能と同様、音声信号処理部５０によって行われる。カラオケ演奏中でないときに、マイク４７から複数の音声信号が連続して入力されると、その音高，音量またはテンポなどの音楽要素を音声信号処理部５０が検出する。その音楽要素の配列パターンがコマンドパターンテーブルに記憶されているコマンドパターンのいずれかと一致するかを判定し、一致するものがあったときそのコマンドパターンが指示するコマンド（処理）を実行する。
【００１７】
図１は同カラオケ装置のブロック図である。図２は、同カラオケ装置のＲＡＭ３２，ハードディスク３７および楽曲データの構成図である。
図１において、装置全体の動作を制御するＣＰＵ３０には、バスを介してＲＯＭ３１，ＲＡＭ３２，ハードディスク記憶装置（ＨＤＤ）３７，通信制御部３６，リモコン受信部３３，表示パネル３４，パネルスイッチ３５，音源装置３８，音声データ処理部３９，コントロールアンプ４０，文字表示部４３，ＣＤ−ＲＯＭチェンジャ４４，表示制御部４５および音声信号処理部５０が接続されている。
【００１８】
ＲＯＭ３１にはこの装置を起動するために必要な起動プログラムなどが記憶されている。この装置の動作を制御するシステムプログラム，カラオケ実行プログラムなどはＨＤＤ３７に記憶されており、装置の電源がオンされると上記起動プログラムによってＲＡＭ３２に読み込まれる。ＲＡＭ３２には、これらのプログラムを記憶するエリアなど図２（Ａ）に示すように種々の記憶エリアが設定されている。図２（Ａ）において、ＲＡＭ３２にはハードディスク３７から読み込まれたプログラムを記憶するプログラム記憶エリア３２０，演奏実行中のカラオケ曲の楽曲データを記憶する実行データ記憶エリア３２１，楽曲データ中のリファレンスデータと歌唱音声信号とを比較することによって求められたポイントを記憶するポイント記憶エリア３２２および上記音声コマンド機能のコマンドパターンを記憶するコマンドパターンテーブル３２３などが設けられている。プログラムおよびコマンドパターンテーブルは電源オン時にハードディスク３７から読み込まれ、実行データ記憶エリア３２１の楽曲データは利用者によって選曲されたときにハードディスク３７から読み込まれる。
【００１９】
また、ＨＤＤ３７には図２（Ｂ）に示すように、上記システムプログラムやアプリケーションプログラムを記憶するプログラム記憶エリア３７０のほか数千曲分の楽曲データを記憶する楽曲データファイル３７１，コマンドパターンテーブル３７２などが設定されている。利用者のカラオケ歌唱を採点するためのリファレンスデータは各カラオケ曲の楽曲データに含まれている。通信制御部３６は、ＩＳＤＮ回線を介してホストステーションから楽曲データなどをダウンロードし、内蔵しているＤＭＡ回路を用いてこの楽曲データをＣＰＵ３０を介さずに直接ＨＤＤ３７に書き込む。
【００２０】
選曲やカラオケ曲スタートなどの通常のコマンドは赤外線のリモコン装置５１から入力される。リモコン受信部３３はリモコン５１から送られてくる赤外線信号を受信してデータを復元する。リモコン５１は選曲スイッチなどのコマンドスイッチやテンキースイッチなどを備えており、利用者がこれらのスイッチを操作するとその操作に応じたコードで変調された赤外線信号を送信する。表示パネル３４はこのカラオケ装置の前面に設けられており、現在演奏中の曲コードや予約曲数などを表示するものである。パネルスイッチ３５はカラオケ装置の前面操作部に設けられており、テンポチェンジスイッチやキーチェンジスイッチなどを含んでいる。
【００２１】
図２（Ｃ）において、楽曲データは、ヘッダ，楽音トラック，ガイドメロディトラック，歌詞トラック，音声制御トラック，効果トラックおよび音声データ部からなっている。ヘッダは、この楽曲データに関する種々のデータが書き込まれる部分であり、曲名，ジャンル，発売日，曲の演奏時間（長さ）などのデータが書き込まれている。
【００２２】
楽音トラック〜効果トラックの各トラックは複数のイベントデータと各イベントデータ間の時間間隔を示すデュレーションデータΔｔからなるシーケンスデータで構成されている。ＣＰＵ３０は、カラオケ演奏時にシーケンスプログラムに基づき全トラックのデータを並行して読み出す。シーケンスプログラムは、所定のテンポクロックでΔｔをカウントし、Δｔをカウントアップしたときこれに続くイベントデータを読み出し、所定の処理部へ出力するプログラムである。
【００２３】
楽音トラックには、メロディトラック，リズムトラックを初めとして種々のパートのトラックが形成されている。ガイドメロディトラックには、このカラオケ曲の旋律すなわち歌唱者が歌うべき旋律のシーケンスデータが書き込まれている。ＣＰＵ３０はこのデータに基づいてリファレンスの音高データ，音量データを生成し、歌唱音声と比較する。
【００２４】
歌詞トラックは、モニタ４６上に歌詞を表示するためのシーケンスデータを記憶したトラックである。このシーケンスデータは楽音データではないが、インプリメンテーションの統一をとり、作業工程を容易にするためこのトラックもＭＩＤＩデータ形式で記述されている。データ種類は、システム・エクスクルーシブ・メッセージである。
【００２５】
音声制御トラックは、音声データ部に記憶されている音声データｎ（ｎ＝１，２，３，‥‥）の発生タイミングなどを指定するシーケンストラックである。音声データ部には、音源装置３８で合成しにくいバックコーラスやハーモニー歌唱などの人声が記憶されている。音声トラックには、音声指定データと、音声指定データの読み出し間隔、すなわち、音声データを音声データ処理部３９に出力して音声信号形成するタイミングを指定するデュレーションデータΔｔが書き込まれている。音声指定データは、音声データ番号，音程データおよび音量データからなっている。音声データ番号は、音声データ部に記録されている各音声データの識別番号ｎである。音程データ，音量データは、形成すべき音声データの音程や音量を指示するデータである。すなわち、言葉を伴わない「アー」や「ワワワワッ」などのバックコーラスは、音程や音量を変化させれば何度も利用できるため、基本的な音程，音量で１つ記憶しておき、このデータに基づいて音程や音量をシフトして繰り返し使用する。音声データ処理部３９は音量データに基づいて出力レベルを設定し、音程データに基づいて音声データの読出間隔を変えることによって音声信号の音程を設定する。
【００２６】
効果トラックには、コントロールアンプ４０を制御するための効果制御データが書き込まれている。コントロールアンプ４０は音源装置３８，音声データ処理部３９から入力される信号に対してリバーブなどの残響系の効果やフィルタ系の効果を付与する。効果制御データは、このような効果の種類を指定するデータおよびその程度を指示するデータなどからなっている。
【００２７】
図１において、カラオケ曲の演奏がスタートすると、ＣＰＵ３０は、テンポクロックに基づいて楽曲データの各トラックのイベントデータを順次読み出し、所定の動作部に入力する。楽曲データの楽音トラックのイベントデータは音源装置３８に入力される。また、リファレンスデータとして用いられるガイドメロディトラックのイベントデータは音声信号処理部５０に入力される。効果トラックのイベントデータはコントロールアンプ４０に入力される。ＣＰＵ３０が、歌詞トラックのイベントデータを読み出すと、このイベントデータに対応する文字パターンを文字表示部４３のＶＲＡＭ上に形成する。また、ＣＰＵ３０が、音声制御トラックのイベントデータを読み出すと、このイベントデータが指示する音声データを音声データ処理部３９に入力する。
【００２８】
音源装置３８は、ＣＰＵ３０から入力された楽音トラックのイベントデータに基づいて楽音信号を形成する。楽音トラックは上述したように複数トラックで構成されており、音源装置３８はこのデータに基づいて複数パートの楽音信号を同時に形成する。音声データ処理部３９は、入力された音声データに基づき、指定された長さ，指定された音高の音声信号を形成する。
【００２９】
音源装置３８が形成した楽音信号および音声データ処理部３９が形成した音声信号はコントロールアンプ４０に入力される。コントロールアンプ４０は、このカラオケ演奏音に対して残響系，フィルタ系の効果を付与する。この効果の種類や程度は前記効果トラックのイベントデータによって制御される。また、歌唱用のマイク４７から入力された歌唱音声信号もコントロールアンプ４０に入力される。コントロールアンプ４０はこの歌唱音声信号に対して残響系，フィルタ系の効果を付与する。この効果の種類や程度も効果トラックのイベントデータによって制御される。コントロールアンプ４０はカラオケ演奏音および歌唱音声信号をミキシングしてスピーカ４２に出力する。
【００３０】
一方、歌唱用のマイク４７から入力された歌唱音声信号はコントロールアンプ４０を介して音声信号処理部５０にも入力される。音声信号処理部５０は、入力された歌唱音声信号を５０ｍｓずつのフレームに区切り、各フレーム毎の平均周波数および平均音量を測定する。ＣＰＵ３０は、この周波数データと音量データとをリファレンスデータと比較することによって歌唱の音量および音程についての採点を行う。また、各フレームの音量データを読み取ることによって歌唱音声の切れ目を検出し、この歌唱音声の切れ目によってリズムについての採点を行う。歌唱の音量データ，周波数データ，リズムデータのリファレンスデータとの差をマイナス点として加算してゆき、カラオケ演奏が終了したとき、音量，音程，リズム毎にマイナス点を満点から減算することによって各得点を計算し、これを重み付け平均することによって総合得点を算出する。重み付けは曲のジャンルによって定められている。たとえば、ポップスはリズムの重みを大きくし、演歌は音程や音量の重みを大きくするなどである。なお、音声信号処理部５０を外付け装置とし音声信号処理部５０自身がリファレンスパターンとの比較を行うようにしてもよい。
【００３１】
一方、カラオケ演奏中でないときに、マイク４７から音声信号が入力されると、音声信号処理部５０はその音声信号の音高，音量やリズムなどを検出してＣＰＵ３０に入力する。ＣＰＵ３０は、入力された音高データ，音量データ，リズムデータをコマンドパターンテーブルのコマンドパターンとを比較し、一致したコマンドパターンの処理を実行する。
【００３２】
文字表示部４３はＣＰＵ３０から入力される文字パターンデータをＶＲＡＭ上に展開して歌詞の映像信号を発生する。ＣＤ−ＲＯＭチェンジャ４４はＣＰＵ３０から入力された映像選択データに基づいて所定の背景映像を再生する。映像選択データは当該カラオケ曲のジャンルデータなどに基づいて決定される。ジャンルデータは楽曲データのヘッダに書き込まれており、カラオケ演奏スタート時にＣＰＵ３０によって読み出される。ＣＤ−ＲＯＭチェンジャ４４には、６枚のＣＤ−ＲＯＭが内蔵されており約１２０シーンの背景映像を再生することができる。文字パターンの映像信号および背景映像の映像信号は表示制御部４５に入力される。表示制御部４５はこれらの映像信号をスーパーインポーズで合成してモニタ４６に表示する。
【００３３】
図３は音声コマンド機能のうち音高コマンド機能を実行する音高監視動作のフローチャートおよび音高のコマンドパターンテーブルを示す図である。同図（Ｂ）のコマンドパターンテーブルには、連続する３音の音高（第１音高，第２音高，第３音高）からなるコマンドパターンが、対応するコマンド（処理内容）とともに複数登録されている。たとえば、コマンドパターンＡ１，Ｂ１，Ｃ１はコマンド１に対応し、コマンドパターンＡ１，Ｃ１，Ｄ１はコマンド２に対応し、コマンドパターンＡ１，Ｅ１，Ｇ１はコマンド３に対応している。各コマンド（１〜ｎ）は、たとえばカラオケ装置で実行可能なインタラクティブな機能（コンテンツ）でのメニュー項目の選択機能などの処理内容のコマンドである。たとえば、コマンド１は占い機能の選択、コマンド２はゲーム機能の選択、コマンド３は新譜紹介機能の選択、コマンド４は食事注文機能の選択などである。
【００３４】
同図（Ａ）のフローチャートにおいて、カラオケ演奏がされていない間、マイク４７からの入力に対してこの動作を実行する。最初は音声信号が入力されるまでｓ１で待機する。音声信号が入力されると、この音声信号の音高を検出する（ｓ２）。この音高検出動作は音声信号処理部５０が実行する。音声信号の周波数が音高の検出が可能な許容範囲のものであれば（ｓ３）、検出し音高データをＣＰＵ３０に入力する。
【００３５】
音高データがＣＰＵ３０に入力されると、同図（Ｂ）のコマンドパターンテーブルを検索して、上記連続する３音の音高からなるコマンドパターンのうち第１音高が音声信号の音高データと一致するものを抽出する（ｓ４）。第１音高が音高データと一致するコマンドパターンがない場合にはｓ５の判断でｓ１に戻る。第１音高が音高データと一致するコマンドパターンが抽出された場合、次の音声信号が入力されるまでｓ６，ｓ７で待機する。音声信号の切れ目は、入力される音声信号が明確に別の音高に移行したとき、または、音量が所定値以下になったときとする。第１音が途切れたのち一定時間（たとえば１秒程度）以内に次の音声信号が入力されない場合には、連続した３音のコマンドパターンの入力ではないとして（ｓ７）、ｓ４の抽出をキャンセルして（ｓ１９）、ｓ１に戻る。
【００３６】
第２音の音声信号が入力されると（ｓ６）、この音声信号の音高を検出する（ｓ８）。音声信号が音階の周波数から大きく外れている場合や周波数が変動して一定しない場合など音高を検出できない場合には、コマンド入力ではないとしてｓ９の判断でｓ１９に進み、ｓ４の抽出をキャンセルしてｓ１に戻る。
【００３７】
第２音の音高データが検出され、音声信号処理部５０からＣＰＵ３０に入力されると、ｓ４で抽出された第１音高が一致したコマンドパターンのうち第２音高が第２音の音声信号から検出された音高データと一致するコマンドパターンを抽出する（ｓ１０）。第２音高一致するコマンドパターンがない場合にはｓ１１の判断でｓ１９に進み、ｓ４の抽出をキャンセルしてｓ１に戻る。第２音高一致するコマンドパターンが抽出されると、次の音声信号（第３音）が入力されるまでｓ１２，ｓ１３で待機する。第２音が途切れたのち一定時間次の第３音が入力されない場合には、連続した３音のコマンド入力ではないとして（ｓ１３）、ｓ１０の抽出をキャンセルして（ｓ１９）、ｓ１に戻る。
【００３８】
第３音の音声信号が入力されると（ｓ１２）、この音声信号の音高を検出する（ｓ１４）。音声信号が音階の周波数から大きく外れている場合や周波数が変動して一定しない場合など音高を検出できない場合には、コマンド入力ではないとしてｓ１５の判断でｓ１９に進み、ｓ１０の抽出をキャンセルしてｓ１に戻る。
【００３９】
第３音の音高データが検出され、音声信号処理部５０からＣＰＵ３０に入力されると、ｓ１０で抽出された第１音高，第２音高が一致したコマンドパターンのうち第３音高が第３音の音声信号から検出された音高と一致するコマンドパターンを抽出する（ｓ１６）。第３音高が一致するコマンドがない場合にはｓ１７の判断でｓ１９に進み、ｓ１０の抽出をキャンセルしてｓ１に戻る。第３音高が一致するコマンドパターンが抽出された場合、そのコマンドパターンに対応する処理をコマンドパターンテーブルから読み出して実行する（ｓ１８）。実行ののち、ｓ１にもどる。
【００４０】
この例では全てのコマンドを３音にしたがコマンドは３音以外でもよく、３音と別の音数のものを混在させてもよい。この場合には、長いコマンドの前半部と一致する短いコマンドを設定しないようにする。
【００４１】
図４は音声コマンド機能のうち、音量コマンド機能を処理する音量監視動作のフローチャートおよび音量のコマンドパターンテーブルを示す図である。同図（Ｂ）において、コマンドパターンテーブルには、連続する３音の音量の大／小（第１音量，第２音量，第３音量）からなるコマンドパターンが、対応するコマンド（処理内容）とともに複数登録されている。たとえば、コマンドパターン「大，大，大」はコマンド１に対応し、コマンドパターン「大，小，大」はコマンド２に対応し、コマンドパターン「大，大，小」はコマンド３に対応する。この音声コマンド機能も音量コマンド機能と同様のインタラクティブなコンテンツのメニュー選択機能に用いてもよく、音声コマンド機能とは異なる機能に用いてもよい。
【００４２】
同図（Ａ）のフローチャートにおいて、カラオケ演奏がされていない間、マイク４７からの入力に対してこの動作を実行する。第１音の音声信号が入力されるまでｓ３１で待機する。第１音の音声信号が入力されると、この音声信号の音量を検出し、その大小を判定する（ｓ３２）。音声信号の入力の有無は低いしきい値で判定し、音声信号の大小は中程度のしきい値で判定する。この音量判定動作は音声信号処理部５０が実行し、検出された音量判定データはＣＰＵ３０に入力される。
【００４３】
音量判定データがＣＰＵ３０に入力されると、同図（Ｂ）のコマンドパターンテーブルを検索して、上記音量判定データと第１音量が一致するコマンドパターンを抽出する（ｓ３３）。第１音量が一致するコマンドパターンを抽出したのち、第２音の音声信号が入力されるまでｓ３４，ｓ３５で待機する。音声信号の切れ目は、音量が上記低いしきい値以下になったときとする。第１音が途切れたのち一定時間（たとえば１秒程度）次の音声信号（第２音）が入力されない場合には、連続した３音のコマンドパターンの入力ではないとして（ｓ３５）、ｓ３３の抽出をキャンセルして（ｓ４５）、ｓ３１にもどる。
【００４４】
第２音の音声信号が入力されると（ｓ３４）、この音声信号の音量の大小を判定する（ｓ３６）。第２音の音量判定データが音声信号処理部５０からＣＰＵ３０に入力されると、ｓ３３で抽出されたコマンドパターンのうち第２音量がこの音量判定データと一致するコマンドを抽出する（ｓ３８）。上記コマンドパターンのなかで第２音量が音量判定データと一致するコマンドパターンがない場合にはｓ３８の判断でｓ４５に進み、ｓ３３の抽出をキャンセルしてｓ３１に戻る。第２音量が音量判定データと一致するコマンドパターンが１または複数抽出された場合、第３音の音声信号が入力されるまでｓ３９，ｓ４０で待機する。第２音が途切れたのち一定時間次の第３音が入力されない場合には、連続した３音のコマンドパターンの入力ではないとして（ｓ４０）、ｓ３７の抽出をキャンセルして（ｓ４５）、ｓ３１に戻る。
【００４５】
第３音の音声信号が入力されると（ｓ３９）、この音声信号の音量の大小を判定する（ｓ４１）。第３音の音量判定データが音声信号処理部５０からＣＰＵ３０に入力されると、ｓ３７で抽出されたコマンドパターンのうち第３音量が前記音量判定データと一致するコマンドを抽出する（ｓ４２）。上記コマンドパターンのなかで第３音量が音量判定データと一致するコマンドパターンがない場合にはｓ４３の判断でｓ４５に進み、ｓ３７の抽出をキャンセルしてｓ３１に戻る。第３音量が音量判定データと一致するコマンドパターンが抽出された場合、そのコマンドパターンに対応するコマンド（処理内容）を実行する（ｓ４４）。処理実行ののち、ｓ３１にもどる。
【００４６】
図５はテンポによる音声コマンドを処理するテンポ監視動作のフローチャートおよびテンポのコマンドパターンテーブルを示す図である。同図（Ｂ）のコマンドパターンテーブルには、連続する４音の３つの発音間隔からなるコマンドパターンが、それぞれコマンドと対応して複数登録されている。すなわち、このコマンドパターンは、第１音の入力タイミングと第２音の入力タイミングの時間間隔である第１間隔、第２音の入力タイミングと第３音の入力タイミングの時間間隔である第２間隔、第３音の入力タイミングと第４音の入力タイミングの時間間隔である第３間隔からなっている。この実施形態において時間間隔はｍｓであるが、これ以外に予め定められたテンポクロックのカウント数などを採用することができる。同図では、たとえば、コマンドパターン「４００，４００，２００」はコマンド１に対応し、コマンドパターン「４００，８００，２００」はコマンド２に対応し、コマンドパターン「６００，４００，２００」はコマンド３に対応している。
【００４７】
同図（Ａ）のフローチャートにおいて、カラオケ演奏がされていない間、マイク４７からの入力に対してこの動作を実行する。最初に音声信号が入力されるまでｓ５１で待機する。最初の（第１音の）音声信号が入力されると、次の第２音が入力されるまでの時間間隔をカウントする（ｓ５２，ｓ５４）。第１音ののち一定時間（たとえば１秒程度）次の第２音が入力されない場合には、コマンド入力ではないとして（ｓ５３）、カウントを中止してｓ５１にもどる。第１音と第２音の音声信号の切れ目は、音量が所定値以下になったときとする。このカウント動作は音声信号処理部５０が実行し、前記時間間隔のカウント値はＣＰＵ３０に入力される。
【００４８】
カウント値がＣＰＵ３０に入力されると、同図（Ｂ）のコマンドパターンテーブルを検索して、第１間隔の値が前記カウント値と一致するコマンドパターンを抽出する（ｓ５５）。第１間隔がカウント値と一致するコマンドがない場合にはｓ５６の判断でｓ５１の待機動作にもどる。第１間隔がカウント値と一致するコマンドが抽出されると、第２音の入力タイミングからの時間をカウントしながら（ｓ５７）、第３音が入力されるまで待機する（ｓ５８，ｓ５９）。第２音の入力ののち一定時間次の第３音が入力されない場合には、コマンド入力ではないと判断して（ｓ５９）、ｓ５５の抽出をキャンセルして（ｓ６８）、ｓ５１に戻る。
【００４９】
第３音の音声信号が入力されると（ｓ５８）、第２音の入力タイミングからのカウント値を読み出し、ｓ５５で抽出されたコマンドパターンのうち第２間隔の値がこのカウント値と一致するコマンドパターンを抽出する（ｓ６０）。上記コマンドパターンのなかで第２間隔がカウント値と一致するコマンドパターンがない場合にはｓ６１の判断でｓ６８に進み、ｓ５５の抽出をキャンセルしてｓ５１に戻る。第２間隔がカウント値と一致したコマンドが抽出されると、第３音の入力タイミングからの時間をカウントしながら（ｓ６２）、第４音が入力されるまで待機する（ｓ６３，ｓ６４）。第３音の入力ののち一定時間次の第４音が入力されない場合には、コマンド入力ではないと判断して（ｓ６４）、ｓ６０の抽出をキャンセルして（ｓ６８）、ｓ５１に戻る。
【００５０】
第４音の音声信号が入力され第３音の入力タイミングから第４音の入力タイミングまでのカウント値が音声信号処理部５０から入力されると（ｓ６３）、ｓ６０で抽出されたコマンドパターンのうち第３間隔がこのカウント値と一致するコマンドパターンを抽出する（ｓ６６）。上記コマンドパターンのなかで第３間隔がカウント値と一致するコマンドがない場合にはｓ６６の判断でｓ６８に進み、ｓ６０の抽出をキャンセルしてｓ５１に戻る。第３音がカウント値と一致するコマンドパターンが抽出された場合、そのコマンドパターンに対応するコマンド（処理内容）を実行する（ｓ６７）。実行ののち、ｓ５１にもどる。
【００５１】
この例では全てのコマンドを連続する４音の３つの時間間隔で決定するものにしたがコマンドは４音以外でもよく、４音と別の音数のものを混在させてもよい。この場合には、長いコマンドの前半部と一致する短いコマンドを設定しないようにする。
【００５２】
なお、カラオケ装置において、上記音声信号の音高，音量，テンポによる音声コマンド機能は、択一的にいずれか１つのみを機能させてもよく、３つの機能を並行して機能させてもよい。また、利用者が任意にいずれかの機能モードを選択できるようにしてもよい。
【００５３】
さらに、上記実施形態では音高コマンド機能は、絶対音高を用い、３音の絶対音高でコマンドを構成するようにしたが、テンポコマンド機能のように２音間の音高間隔（相対音程）によってコマンドを構成するようにしてもよい。たとえば、４音の音声信号を入力し、第１音と第２音の相対音程である第１音程、第２音と第３音の相対音程である第２音程、第３音と第４音の相対音程である第３音程によってコマンドを構成するなどである。
【００５４】
またさらに、入力された音声信号から音高，音量，テンポなどの複数の音楽要素を抽出し、これらを組み合わせてコマンドパターンとしてもよい。これであれば、少ない音数で多くのコマンドパターンを構成することができる。
【００５５】
また、上記実施形態では、音高，音量，テンポの音声コマンド機能はコンテンツの選択機能であるが、これ以外にカラオケ装置上で実行されるゲームの操作機能にこの音声コマンド機能を適用してもよい。たとえば、「○○曲の最初のメロディを歌え」というゲームの回答の場合、マイク４７から入力されたメロディが実際に○○曲のメロディであるかを判定する機能に用いることもできる。
【００５６】
また、画面上のキャラクタや車を上下や左右に移動させるゲームの場合、音高の高低や音量の大小で移動方向の上下・左右を制御できるようにしてもよい。また、キャラクタや車の移動速度を入力テンポで制御できるようにしてもよい。たとえば、入力音声のテンポが速いほどキャラクタの移動も速くなるなどである。
【００５７】
また、上記画面上でキャラクタや車を移動させるゲームにおいて、２本のマイクを用いて２個のキャラクタや車を移動させて対戦させるようにしてもよい。
【００５８】
さらに、この音声コマンド機能を用いて、カラオケ装置本来の機能であるカラオケ曲演奏機能の制御を行うようにしてもよい。たとえば、カラオケ曲の演奏がスタンバイ状態にあるときに、あるテンポで音声信号（たとえば、「ワン，ツー，ワン，ツー，スリー」など）を入力すれば、音声信号処理部５０がそのテンポを判断し、そのテンポで演奏がスタートするようにしてもよい。また、カラオケ曲の演奏がスタンバイしているときに、そのカラオケ曲の歌いだしを自分の好きなキーで歌うと、音声信号処理部５０がそのキーを判断して、カラオケ曲をそのキーに移調して演奏するようにすることもできる。
【００５９】
このフローチャートを図６に示す。図６（Ａ）は、テンポ設定動作を示すフローチャートである。カラオケ曲が選曲され演奏がスタンバイしている状態で連続した「ワン，ツー，ワン，ツー，スリー」などの音声が入力されると（ｓ７１）、その音声の間隔に基づいてテンポを検出する（ｓ７２）。そしてこのテンポを楽曲データを読み出すクロックに設定する（ｓ７３）。すなわち、楽曲データがデフォルトで設定したテンポをこのテンポに書き換える。そしてカラオケ演奏をスタートする（ｓ７４）。以後の曲中におけるテンポ切り換えは、このテンポが基本として行われる。なお、入力する音声信号は、「ワン，ツー，ワン，ツー，スリー」のようなものでなくカラオケ曲の歌いだしでもよい。
【００６０】
同図（Ｂ）は、キー設定動作を示すフローチャートである。カラオケ曲が選曲され演奏がスタンバイしている状態でその曲の歌いだしメロディが入力されると（ｓ８１）、このメロディの周波数からそのキー（調性）を検出する（ｓ８２）。このキーと楽曲データの原調に基づき、曲をこのキーに移調するための音高シフト量を設定する（ｓ８３）。すなわち、原調がハ長調で歌われたキーがニ長調であれば全音（２半音）全てのノートデータを上げるようにシフトする。このシフトはノートデータが読み出されたときリアルタイムに行ってもよく、予め全データを書き換えてもよい。こののちカラオケ演奏をスタートする（ｓ８４）。なお、最初に入力するメロディは、歌いだし以外の部分でもよい。たとえば、サビの部分でもよい。
【００６１】
また、この例では、歌唱者が曲の一部を歌唱し、その歌唱のキーに一致するように演奏のキーを移調するようにしているが、歌唱者が入力した音を主音とするキーに移調するようにしてもよく、歌唱者が入力した音がその曲の最高音または最低音となるようなキーに移調するようにしてもよい。これらの処理も上記フローチャートのｓ８２およびｓ８３で行うことができる。
【００６２】
また、上記実施形態のテンポコマンド機能では、音声信号のスタートタイミングの間隔のみの組み合わせでコマンドを決定しているが、音声信号の持続時間もコマンドの要素にしてもよい。これにより、スタートタイミングの間隔である第１間隔，第２間隔，第３間隔が同じパターンであってもその音声の長さで別のコマンドにすることができる。また、スタートタイミングは考慮せず、複数の音声信号の長さのみでコマンドを構成するようにしてもよい。
【００６３】
【発明の効果】
以上のようにこの発明によれば、音声信号によって装置を制御することができるため、装置が多機能であっても操作用のボタンの数を増やしたり、キーシーケンスを複雑にしたりする必要がなくなり、マイクから容易に装置を制御することができるようになる。
【図面の簡単な説明】
【図１】この発明の実施形態であるカラオケ装置のブロック図
【図２】同カラオケ装置のＲＡＭ，ハードディスク，楽曲データの構成を示す図
【図３】同カラオケ装置の音高コマンド機能を処理するフローチャート
【図４】同カラオケ装置の音量コマンド機能を処理するフローチャート
【図５】同カラオケ装置のリズムコマンド機能を処理するフローチャート
【図６】同カラオケ装置のテンポ設定動作およびキー設定動作を示すフローチャート
【符号の説明】
３０…ＣＰＵ、３２…ＲＡＭ、３７…ハードディスク、
４７…マイク、５０…音声信号処理部

[0001]
[Technical field of the invention]
The present invention relates to a voice instruction apparatus and a karaoke apparatus in which command input, tempo setting, key setting, and the like can be performed by voice input from a microphone.
[0002]
[Prior art]
To use a karaoke apparatus, operations such as song selection and performance start are necessary. In recent years, as karaoke apparatuses have gained functionality, various additional operations have become necessary to use new functions beyond song selection and performance start. Conventionally, these operations have been performed by pressing buttons on an infrared remote control or panel switches on the apparatus main body.
[0003]
[Problems to be solved by the invention]
However, operating many functions through an infrared remote control or the panel switches of the apparatus main body requires either more buttons or complicated key sequences. Adding buttons requires modifying the hardware, which is not easy and leads to problems such as increased cost and long lead times. Operating new functions through key sequences makes the sequences complicated and the key operations troublesome.
[0004]
An object of the present invention is to provide a voice instruction apparatus and a karaoke apparatus that can operate various functions by a voice signal input from a microphone.
[0005]
[Means for Solving the Problems]
The invention of claim 1 of this application comprises: a command pattern table that stores command patterns, each being a combination of the music elements of a plurality of consecutive audio signals, in association with the processing content corresponding to each command pattern; and a control unit that searches the command pattern table using a plurality of music elements detected from audio signals input from a microphone and, when a matching command pattern is extracted, executes the processing corresponding to that command pattern.
[0006]
The invention of claim 2 of this application is characterized in that, in the invention of claim 1, the music element is a pitch, volume or tempo of an audio signal.
[0007]
The karaoke apparatus according to the invention of claim 3 of this application comprises the voice instruction device according to claim 1 or claim 2.
[0009]
In the first aspect of the invention, the music element can be a frequency, volume, temporal element, or the like. The frequency may be a continuous value expressed in hertz or a stepped pitch. As a temporal element, a time interval between sound generation start timings of a plurality of audio signals, a length of one audio signal, or the like can be used.
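The difference between a continuous frequency value and a stepped pitch can be illustrated with a minimal sketch. It assumes twelve-tone equal temperament, a reference of A4 = 440 Hz, and MIDI-style note numbering; none of these are specified in this description, and the function names are hypothetical.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def hz_to_midi(freq_hz: float, a4_hz: float = 440.0) -> int:
    """Quantize a frequency in hertz to the nearest equal-tempered semitone.

    The 440 Hz reference and MIDI numbering (A4 = 69) are assumptions.
    """
    if freq_hz <= 0:
        raise ValueError("frequency must be positive")
    return round(69 + 12 * math.log2(freq_hz / a4_hz))

def midi_to_name(midi: int) -> str:
    """Return a note name such as 'A4' for a MIDI note number."""
    octave = midi // 12 - 1
    return f"{NOTE_NAMES[midi % 12]}{octave}"
```

With a stepped pitch, slightly sharp or flat input (for example 445 Hz instead of 440 Hz) maps to the same table key, which is convenient when the detected value is used to look up a command pattern.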
[0010]
A command pattern using these music elements is configured as follows, for example.
[0011]
Pitch data sequence and volume data sequence extracted from multiple audio signals
Time interval (tempo) and frequency interval (musical interval) between multiple audio signals
Combination of pitch data, volume data, etc. extracted from one audio signal.
[0012]
That is, the plurality of music elements of a command pattern may be a sequence of one type of music element extracted from a plurality of consecutive audio signals, or a group of several music elements extracted from a single audio signal. The command pattern table stores, in association with these command patterns, processing content for controlling a device such as a karaoke apparatus. When the present invention is applied to a karaoke apparatus, for example, the processing content includes selecting functions other than karaoke performance that the apparatus can execute (content such as games and fortune-telling) and operations within each content, such as moving a character in a game function. According to the present invention, when functions not found in conventional karaoke apparatuses are added, they can be operated by voice input from the microphone without adding hardware operation buttons.
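One way to picture such a command pattern table is as a mapping from a tuple of music elements to a processing routine. The sketch below is only an illustration under assumptions: the class name, the method names, and the returned strings are hypothetical, and the "character jumps" entry is an invented example of an in-content operation.

```python
from typing import Callable, Dict, Hashable, Iterable, Optional, Tuple

Pattern = Tuple[Hashable, ...]

class CommandPatternTable:
    """Stores command patterns together with the processing they trigger."""

    def __init__(self) -> None:
        self._entries: Dict[Pattern, Callable[[], str]] = {}

    def register(self, pattern: Iterable[Hashable], action: Callable[[], str]) -> None:
        self._entries[tuple(pattern)] = action

    def lookup(self, elements: Iterable[Hashable]) -> Optional[Callable[[], str]]:
        """Return the processing for an exactly matching pattern, else None."""
        return self._entries.get(tuple(elements))

table = CommandPatternTable()
# One type of music element taken from several consecutive audio signals.
table.register(("A1", "B1", "C1"), lambda: "select fortune-telling")
# Several music elements (pitch and loudness) taken from a single audio signal.
table.register((("A1", "loud"),), lambda: "character jumps")
```

Because the key is just a tuple, the same table can hold pitch sequences, volume sequences, intervals, or mixed element groups without changing the lookup logic.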
[0015]
DETAILED DESCRIPTION OF THE INVENTION
A karaoke apparatus according to an embodiment of the present invention will be described with reference to the drawings. This karaoke apparatus has a singing scoring function. The singing scoring function compares the user's singing voice signal with reference data, scores the singing for pitch, volume, and rhythm, and displays the score after the singing ends. This scoring function is executed by the audio signal processing unit 50.
[0016]
The karaoke apparatus has a voice command function. The voice command function is a function that allows the user to operate the karaoke device by inputting voice from the microphone 47 when the karaoke song is not being played. There are three types of voice command functions: a pitch command function, a volume command function, and a tempo command function. The pitch, volume, and tempo are determined by the voice signal processing unit 50 in the same manner as the scoring function. When a plurality of audio signals are continuously input from the microphone 47 when the karaoke performance is not being performed, the audio signal processing unit 50 detects a music element such as a pitch, volume, or tempo. It is determined whether the arrangement pattern of the music elements matches any of the command patterns stored in the command pattern table, and when there is a match, the command (processing) indicated by the command pattern is executed.
[0017]
FIG. 1 is a block diagram of the karaoke apparatus. FIG. 2 is a diagram showing the configuration of the RAM 32, the hard disk 37, and the music data of the karaoke apparatus.
In FIG. 1, the CPU 30, which controls the operation of the entire apparatus, is connected via a bus to a ROM 31, a RAM 32, a hard disk storage device (HDD) 37, a communication control unit 36, a remote control receiving unit 33, a display panel 34, a panel switch 35, a sound source device 38, an audio data processing unit 39, a control amplifier 40, a character display unit 43, a CD-ROM changer 44, a display control unit 45, and an audio signal processing unit 50.
[0018]
The ROM 31 stores a startup program and the like needed to start the apparatus. The system program that controls the operation of the apparatus, the karaoke execution program, and the like are stored in the HDD 37 and are read into the RAM 32 by the startup program when the apparatus is powered on. As shown in FIG. 2A, various storage areas, including an area for storing these programs, are set in the RAM 32. In FIG. 2A, the RAM 32 is provided with a program storage area 320 for storing programs read from the hard disk 37, an execution data storage area 321 for storing the music data of the karaoke song being played, a point storage area 322 for storing points obtained by comparing the reference data in the music data with the singing voice signal, a command pattern table 323 for storing the command patterns of the voice command function, and the like. The programs and the command pattern table are read from the hard disk 37 at power-on, and the music data in the execution data storage area 321 is read from the hard disk 37 when a song is selected by the user.
[0019]
As shown in FIG. 2B, the HDD 37 is provided with a program storage area 370 for storing the system program and application programs, a music data file 371 for storing music data for several thousand songs, a command pattern table 372, and the like. The reference data for scoring the user's karaoke singing is included in the music data of each karaoke song. The communication control unit 36 downloads music data and the like from a host station via an ISDN line and, using its built-in DMA circuit, writes the music data directly to the HDD 37 without going through the CPU 30.
[0020]
Ordinary commands such as song selection and karaoke song start are input from the infrared remote controller 51. The remote control receiving unit 33 receives the infrared signal sent from the remote control 51 and restores the data. The remote controller 51 includes a command switch such as a music selection switch, a numeric keypad switch, and the like. When the user operates these switches, the remote controller 51 transmits an infrared signal modulated with a code corresponding to the operation. The display panel 34 is provided on the front side of the karaoke apparatus, and displays the currently playing song code, the number of reserved songs, and the like. The panel switch 35 is provided in the front operation unit of the karaoke apparatus, and includes a tempo change switch, a key change switch, and the like.
[0021]
In FIG. 2C, the music data consists of a header, a musical sound track, a guide melody track, a lyrics track, an audio control track, an effect track, and an audio data section. The header is where various data about the music data is written, such as the song title, genre, release date, and performance time (length) of the song.
[0022]
Each track from the musical sound track to the effect track consists of sequence data made up of a plurality of event data and duration data Δt indicating the time interval between successive event data. During karaoke performance, the CPU 30 reads the data of all tracks in parallel under a sequence program. The sequence program counts Δt with a predetermined tempo clock and, when the count reaches Δt, reads the event data that follows and outputs it to the appropriate processing unit.
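The behavior of the sequence program can be sketched as a small simulation. This is an illustration under assumptions rather than the actual program: each track is modeled as a list of (Δt, event) pairs, where Δt is the number of tempo-clock ticks to wait before the event, and the track names and event strings are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

Track = List[Tuple[int, str]]  # (delta-t in clock ticks before the event, event data)

def play(tracks: Dict[str, Track]) -> Iterator[Tuple[int, str, str]]:
    """Yield (tick, track name, event) for all tracks read in parallel."""
    position = {name: 0 for name in tracks}  # index of the next event per track
    # Ticks still to count before the next event of each non-empty track.
    remaining = {name: trk[0][0] for name, trk in tracks.items() if trk}
    tick = 0
    while remaining:
        for name in list(remaining):
            # Output every event whose delta-t has been fully counted.
            while name in remaining and remaining[name] == 0:
                _, event = tracks[name][position[name]]
                yield tick, name, event
                position[name] += 1
                if position[name] < len(tracks[name]):
                    remaining[name] = tracks[name][position[name]][0]
                else:
                    del remaining[name]  # track finished
        # One tempo-clock tick elapses for every track still playing.
        for name in remaining:
            remaining[name] -= 1
        tick += 1

demo_tracks = {
    "musical_sound": [(0, "note_on C4"), (2, "note_off C4")],
    "lyrics": [(1, "show line 1")],
}
```

Changing the rate of the tempo clock changes how fast the ticks elapse, so the same sequence data can be played at any tempo without being rewritten.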
[0023]
The musical sound track comprises tracks for various parts, including a melody track and a rhythm track. The guide melody track contains sequence data for the melody of the karaoke song, that is, the melody the singer should sing. The CPU 30 generates reference pitch data and volume data from this data and compares them with the singing voice.
[0024]
The lyrics track is a track that stores sequence data for displaying lyrics on the monitor 46. This sequence data is not musical sound data, but this track is also described in the MIDI data format in order to unify the implementation and facilitate the work process. The data type is a system exclusive message.
[0025]
The audio control track is a sequence track that designates, among other things, the generation timing of the audio data n (n = 1, 2, 3, ...) stored in the audio data section. The audio data section stores human voices, such as backing chorus and harmony singing, that are difficult to synthesize with the sound source device 38. The audio control track contains audio designation data and duration data Δt, which designates the readout interval of the audio designation data, that is, the timing at which the audio data is output to the audio data processing unit 39 to form an audio signal. The audio designation data consists of an audio data number, pitch data, and volume data. The audio data number is the identification number n of each piece of audio data recorded in the audio data section. The pitch data and volume data specify the pitch and volume of the audio to be formed. That is, a wordless backing chorus such as "Ah" or "Wa-wa-wa-wa" can be reused many times by changing its pitch and volume, so it is stored once at a basic pitch and volume and then used repeatedly with the pitch and volume shifted on the basis of this data. The audio data processing unit 39 sets the output level based on the volume data and sets the pitch of the audio signal by changing the readout interval of the audio data based on the pitch data.
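Setting the pitch by changing the readout interval can be sketched as follows. This is a minimal illustration, assuming the stored audio is a list of samples, the pitch data is expressed in equal-tempered semitones, and linear interpolation is used between samples; the function name and parameters are hypothetical.

```python
from typing import List, Sequence

def shift_pitch(samples: Sequence[float], semitones: float, gain: float = 1.0) -> List[float]:
    """Re-read stored samples at a different interval to shift their pitch.

    A step greater than 1 reads the data faster (higher pitch, shorter sound);
    a step below 1 reads it more slowly (lower pitch, longer sound).
    """
    step = 2.0 ** (semitones / 12.0)  # readout interval measured in source samples
    out: List[float] = []
    pos = 0.0
    while pos <= len(samples) - 1:
        i = int(pos)
        frac = pos - i
        nxt = samples[i + 1] if i + 1 < len(samples) else samples[i]
        # Linear interpolation between neighbouring samples, scaled by the volume.
        out.append(gain * ((1.0 - frac) * samples[i] + frac * nxt))
        pos += step
    return out
```

Note that this simple approach changes the duration together with the pitch, so a unit that must also produce a specified length would need additional handling such as looping or truncation.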
[0026]
Effect control data for controlling the control amplifier 40 is written in the effect track. The control amplifier 40 applies reverberation-type effects such as reverb and filter-type effects to the signals input from the sound source device 38 and the audio data processing unit 39. The effect control data consists of data designating the type of effect, data designating its degree, and the like.
[0027]
In FIG. 1, when the performance of the karaoke song starts, the CPU 30 sequentially reads the event data of each track of the song data based on the tempo clock and inputs it to a predetermined operation unit. The event data of the musical tone track of the music data is input to the sound source device 38. Further, the event data of the guide melody track used as reference data is input to the audio signal processing unit 50. The event data of the effect track is input to the control amplifier 40. When the CPU 30 reads the event data of the lyrics track, a character pattern corresponding to the event data is formed on the VRAM of the character display unit 43. When the CPU 30 reads the event data of the audio control track, the audio data indicated by the event data is input to the audio data processing unit 39.
[0028]
The sound source device 38 forms musical sound signals based on the event data of the musical sound track input from the CPU 30. As described above, the musical sound track consists of a plurality of tracks, and the sound source device 38 forms musical sound signals for a plurality of parts simultaneously based on this data. The audio data processing unit 39 forms an audio signal of the specified length and specified pitch based on the input audio data.
[0029]
The musical sound signal formed by the sound source device 38 and the audio signal formed by the audio data processing unit 39 are input to the control amplifier 40. The control amplifier 40 applies reverberation-type and filter-type effects to this karaoke performance sound. The type and degree of these effects are controlled by the event data of the effect track. The singing voice signal input from the singing microphone 47 is also input to the control amplifier 40, which applies reverberation-type and filter-type effects to it as well. The type and degree of these effects are likewise controlled by the event data of the effect track. The control amplifier 40 mixes the karaoke performance sound and the singing voice signal and outputs them to the speaker 42.
[0030]
Meanwhile, the singing voice signal input from the singing microphone 47 is also input to the audio signal processing unit 50 via the control amplifier 40. The audio signal processing unit 50 divides the input singing voice signal into 50 ms frames and measures the average frequency and average volume of each frame. The CPU 30 scores the volume and pitch of the singing by comparing this frequency data and volume data with the reference data. It also detects breaks in the singing voice by reading the volume data of each frame and scores the rhythm based on these breaks. The differences between the singing's volume data, frequency data, and rhythm data and the reference data are accumulated as deduction points. When the karaoke performance ends, a score for each of volume, pitch, and rhythm is calculated by subtracting the deduction points from the full score, and the total score is calculated as a weighted average of these scores. The weights are determined by the genre of the song. For example, pop songs give more weight to rhythm, and enka gives more weight to pitch and volume. The audio signal processing unit 50 may instead be an external device that itself performs the comparison with the reference pattern.
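The final score calculation can be sketched as follows. The numeric values are assumptions made for illustration: a full score of 100, the particular genre weights, and clamping at zero are not given in this description, which only states that the weights depend on genre.

```python
from typing import Dict, List, Tuple

FULL_SCORE = 100.0  # assumed full score per category

# Hypothetical weights; the description only says that pop songs weight rhythm
# more heavily and enka weights pitch and volume more heavily.
GENRE_WEIGHTS: Dict[str, Dict[str, float]] = {
    "pops": {"pitch": 0.3, "volume": 0.2, "rhythm": 0.5},
    "enka": {"pitch": 0.4, "volume": 0.4, "rhythm": 0.2},
}

def category_score(deductions: List[float]) -> float:
    """Subtract the accumulated deduction points from the full score."""
    return max(0.0, FULL_SCORE - sum(deductions))  # clamping at 0 is an assumption

def total_score(deductions: Dict[str, List[float]], genre: str) -> Tuple[Dict[str, float], float]:
    """Return the per-category scores and their genre-weighted average."""
    weights = GENRE_WEIGHTS[genre]
    scores = {cat: category_score(deductions.get(cat, [])) for cat in weights}
    total = sum(weights[cat] * scores[cat] for cat in weights) / sum(weights.values())
    return scores, total
```

For example, with deductions of 10 points each for pitch and volume and 20 points for rhythm, the category scores are 90, 90, and 80; under the assumed weights, the weaker rhythm score lowers the total more for a pop song than for enka.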
[0031]
On the other hand, when a voice signal is input from the microphone 47 when the karaoke performance is not being performed, the voice signal processing unit 50 detects the pitch, volume, rhythm, and the like of the voice signal and inputs them to the CPU 30. The CPU 30 compares the input pitch data, volume data, and rhythm data with the command pattern of the command pattern table, and executes the process of the matched command pattern.
[0032]
The character display unit 43 develops the character pattern data input from the CPU 30 on the VRAM and generates a video signal of lyrics. The CD-ROM changer 44 reproduces a predetermined background video based on the video selection data input from the CPU 30. The video selection data is determined based on the genre data of the karaoke song. The genre data is written in the header of the music data, and is read out by the CPU 30 when the karaoke performance is started. The CD-ROM changer 44 incorporates six CD-ROMs and can reproduce background images of about 120 scenes. The video signal of the character pattern and the video signal of the background video are input to the display control unit 45. The display control unit 45 synthesizes these video signals by superimposing and displays them on the monitor 46.
[0033]
FIG. 3 shows a flowchart of the pitch monitoring operation, which executes the pitch command function of the voice command function, and a pitch command pattern table. In the command pattern table of FIG. 3B, a plurality of command patterns, each consisting of the pitches of three consecutive sounds (first pitch, second pitch, third pitch), are registered together with their corresponding commands (processing content). For example, command pattern A1, B1, C1 corresponds to command 1, command pattern A1, C1, D1 corresponds to command 2, and command pattern A1, E1, G1 corresponds to command 3. Each command (1 to n) designates processing content such as selecting a menu item among the interactive functions (content) that the karaoke apparatus can execute. For example, command 1 selects the fortune-telling function, command 2 selects the game function, command 3 selects the new release introduction function, and command 4 selects the meal ordering function.
[0034]
In the flowchart of FIG. 3A, this operation is performed on the input from the microphone 47 while no karaoke performance is in progress. Initially, the process waits at s1 until an audio signal is input. When an audio signal is input, the pitch of the audio signal is detected (s2). This pitch detection operation is executed by the audio signal processing unit 50. If the frequency of the audio signal is within the allowable range in which a pitch can be detected (s3), the detected pitch data is input to the CPU 30.
[0035]
When the pitch data is input to the CPU 30, the command pattern table of FIG. 3B is searched, and, from among the command patterns consisting of three consecutive pitches, those whose first pitch matches the pitch data of the voice signal are extracted (s4). If there is no command pattern whose first pitch matches the pitch data, the process returns to s1 by the determination of s5. When command patterns whose first pitch matches the pitch data are extracted, the process waits at s6 and s7 until the next voice signal is input. A break between audio signals occurs when the input audio signal clearly shifts to another pitch or when the volume falls to a predetermined value or less. If the next sound is not input within a certain time (for example, about 1 second) after the first sound ends, it is determined that a command pattern of three consecutive sounds is not being input (s7), the extraction of s4 is canceled (s19), and the process returns to s1.
[0036]
When the second sound signal is input (s6), the pitch of the sound signal is detected (s8). If the pitch cannot be detected, for example because the audio signal deviates significantly from the frequencies of the scale or because the frequency fluctuates and is not constant, it is determined at s9 that the command has been canceled, and the process proceeds to s19, cancels the extraction of s4, and returns to s1.
[0037]
When the pitch data of the second sound is detected and input to the CPU 30 from the voice signal processing unit 50, the command patterns whose second pitch matches the pitch data detected from the second voice signal are extracted from among the command patterns extracted in s4, that is, those whose first pitch matched (s10). If there is no command pattern whose second pitch matches, the process proceeds to s19 by the determination of s11, cancels the extraction of s4, and returns to s1. When command patterns whose second pitch matches are extracted, the process waits at s12 and s13 until the next voice signal (third sound) is input. If the third sound is not input within a certain time after the second sound ends, it is determined that a command of three consecutive sounds is not being input (s13), the extraction of s10 is canceled (s19), and the process returns to s1.
[0038]
When the third sound signal is input (s12), the pitch of the sound signal is detected (s14). If the pitch cannot be detected, for example because the audio signal deviates significantly from the frequencies of the scale or because the frequency fluctuates and is not constant, it is determined at s15 that the input is not a command, and the process proceeds to s19, cancels the extraction of s10, and returns to s1.
[0039]
When the pitch data of the third sound is detected and input to the CPU 30 from the voice signal processing unit 50, the command pattern whose third pitch matches the pitch detected from the third sound signal is extracted from among the command patterns extracted in s10, that is, those whose first and second pitches matched (s16). If there is no command pattern whose third pitch matches, the process proceeds to s19 by the determination of s17, cancels the extraction of s10, and returns to s1. When a command pattern whose third pitch matches is extracted, the processing corresponding to that command pattern is read from the command pattern table and executed (s18). After execution, the process returns to s1.
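The step-by-step narrowing of s4, s10, and s16 can be sketched as a small incremental matcher. This is only an illustration under assumptions: the class name `PitchCommandMatcher`, the method names, and the use of `None` to stand for an undetectable pitch or a timeout are hypothetical, and the table holds only the three example rows given for FIG. 3B.

```python
from typing import Optional

# Hypothetical table modeled on the three example rows of the pitch table.
PITCH_COMMANDS = {
    ("A1", "B1", "C1"): "command 1 (fortune telling)",
    ("A1", "C1", "D1"): "command 2 (game)",
    ("A1", "E1", "G1"): "command 3 (new music introduction)",
}


class PitchCommandMatcher:
    """Narrows the candidate patterns one detected pitch at a time."""

    def __init__(self, table):
        self.table = dict(table)
        self.reset()

    def reset(self):
        # Plays the role of s19: discard any partial extraction.
        self.candidates = list(self.table)
        self.position = 0

    def feed(self, pitch: Optional[str]) -> Optional[str]:
        """Feed one pitch; None means 'not detectable' or 'timed out'.

        Returns the command once a whole pattern has matched, else None.
        """
        if pitch is None:
            self.reset()
            return None
        # Keep only patterns whose pitch at the current position matches.
        self.candidates = [
            p for p in self.candidates
            if len(p) > self.position and p[self.position] == pitch
        ]
        if not self.candidates:
            self.reset()
            return None
        self.position += 1
        for pattern in self.candidates:
            if len(pattern) == self.position:
                command = self.table[pattern]
                self.reset()
                return command
        return None
```

Feeding "A1", "C1", "D1" in turn returns nothing for the first two sounds and the game command for the third; any mismatch, undetectable pitch, or timeout puts the matcher back in its initial waiting state.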
[0040]
In this example, all the commands consist of three sounds, but a command may consist of a number of sounds other than three, and commands with different numbers of sounds may be mixed. In that case, no short command is set that matches the leading part of a longer command.
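The restriction on mixed-length commands amounts to requiring that no pattern be a prefix of another, since otherwise the shorter command would fire before the longer one could be completed. A minimal check, with the hypothetical helper name `find_prefix_conflicts`, might look like this:

```python
def find_prefix_conflicts(patterns):
    """Return (short, long) pairs where `short` is a leading part of `long`."""
    patterns = [tuple(p) for p in patterns]
    conflicts = []
    for short in patterns:
        for long in patterns:
            if len(short) < len(long) and long[:len(short)] == short:
                conflicts.append((short, long))
    return conflicts
```

A table passes the check when the returned list is empty.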
[0041]
FIG. 4 shows a flowchart of the volume monitoring operation that executes the volume command function, one of the voice command functions, together with the volume command pattern table. In the command pattern table of FIG. 4B, a plurality of command patterns, each consisting of the loudness (large or small) of three consecutive sounds (first volume, second volume, third volume), are registered together with their corresponding commands (processing contents). For example, the command pattern "large, large, large" corresponds to command 1, the command pattern "large, small, large" corresponds to command 2, and the command pattern "large, large, small" corresponds to command 3. This volume command function may be used for an interactive content menu selection function in the same way as the pitch command function, or may be used for a function different from that of the pitch command function.
[0042]
In the flowchart of FIG. 4A, this operation is performed on the input from the microphone 47 while no karaoke performance is in progress. The process waits at s31 until the first sound signal is input. When the sound signal of the first sound is input, the volume of the sound signal is detected and classified as large or small (s32). The presence or absence of an audio signal is determined by a low threshold value, and whether the volume is large or small is determined by an intermediate threshold value. This volume determination operation is executed by the audio signal processing unit 50, and the resulting volume determination data is input to the CPU 30.
[0043]
When the volume determination data is input to the CPU 30, the command pattern table of FIG. 4B is searched to extract the command patterns whose first volume matches the volume determination data (s33). After the command patterns whose first volume matches are extracted, the process waits at s34 and s35 until the second sound signal is input. A break between audio signals occurs when the volume falls below the low threshold. If the next sound (second sound) is not input within a certain time (for example, about 1 second) after the first sound ends, it is determined that a command pattern of three consecutive sounds is not being input (s35), the extraction of s33 is canceled (s45), and the process returns to s31.
[0044]
When the audio signal of the second sound is input (s34), the volume level of the audio signal is determined (s36). When the volume determination data of the second sound is input from the audio signal processing unit 50 to the CPU 30, the command patterns whose second volume matches the volume determination data are extracted from the command patterns extracted in s33 (s37). If none of those command patterns has a second volume that matches the volume determination data, the process proceeds to s45 by the determination of s38, the extraction of s33 is canceled, and the process returns to s31. When one or more command patterns whose second volume matches the volume determination data are extracted, the process waits at s39 and s40 until the third sound signal is input. If the third sound is not input within a certain time after the second sound ends, it is determined that a command pattern of three consecutive sounds is not being input (s40), the extraction of s37 is canceled (s45), and the process returns to s31.
[0045]
When the audio signal of the third sound is input (s39), the volume level of the audio signal is determined (s41). When the volume determination data of the third sound is input from the audio signal processing unit 50 to the CPU 30, the command pattern whose third volume matches the volume determination data is extracted from the command patterns extracted in s37 (s42). If none of those command patterns has a third volume that matches the volume determination data, the process proceeds to s45 by the determination of s43, the extraction of s37 is canceled, and the process returns to s31. When a command pattern whose third volume matches the volume determination data is extracted, the command (processing content) corresponding to that command pattern is executed (s44). After execution, the process returns to s31.
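The two-threshold volume determination and the table lookup can be illustrated as follows. The threshold values, the normalized 0 to 1 level scale, and the function names are assumptions made for this sketch; the text specifies only that a low threshold decides presence and an intermediate threshold decides large or small. For brevity the sketch looks up all three sounds at once instead of narrowing the candidates sound by sound as the flowchart does.

```python
from typing import Optional

LOW_THRESHOLD = 0.05   # assumed: below this, no sound is considered present
MID_THRESHOLD = 0.40   # assumed: at or above this, the sound counts as "large"

# Hypothetical table modeled on the three example rows of the volume table.
VOLUME_COMMANDS = {
    ("large", "large", "large"): "command 1",
    ("large", "small", "large"): "command 2",
    ("large", "large", "small"): "command 3",
}


def classify_volume(level: float) -> Optional[str]:
    """Map a normalized level to 'large', 'small', or None (no sound)."""
    if level < LOW_THRESHOLD:
        return None
    return "large" if level >= MID_THRESHOLD else "small"


def match_volume_command(levels) -> Optional[str]:
    """Classify each sound's level and look the sequence up in the table."""
    labels = tuple(classify_volume(v) for v in levels)
    return VOLUME_COMMANDS.get(labels)
```

With these assumed thresholds, the levels 0.9, 0.2, 0.8 classify as "large, small, large" and select command 2.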
[0046]
FIG. 5 shows a flowchart of the tempo monitoring operation that processes voice commands given by tempo, together with the tempo command pattern table. In the command pattern table of FIG. 5B, a plurality of command patterns, each consisting of the three consecutive onset intervals of four sounds, are registered in association with commands. That is, each command pattern consists of a first interval, which is the time between the input timing of the first sound and that of the second sound; a second interval, which is the time between the input timing of the second sound and that of the third sound; and a third interval, which is the time between the input timing of the third sound and that of the fourth sound. In this embodiment the time intervals are expressed in milliseconds, but a count of a predetermined tempo clock or the like may be used instead. In the figure, for example, the command pattern "400, 400, 200" corresponds to command 1, the command pattern "400, 800, 200" corresponds to command 2, and the command pattern "600, 400, 200" corresponds to command 3.
[0047]
In the flowchart of FIG. 5A, this operation is performed on the input from the microphone 47 while no karaoke performance is in progress. The process first waits at s51 until an audio signal is input. When the first sound is input, the time until the second sound is input is counted (s52, s54). If the second sound is not input within a certain time (for example, about 1 second) after the first sound, it is determined that no command is being input (s53), the count is stopped, and the process returns to s51. The break between the first sound and the second sound is the point at which the volume falls below a predetermined value. This counting operation is performed by the audio signal processing unit 50, and the count value of the time interval is input to the CPU 30.
[0048]
When the count value is input to the CPU 30, the command pattern table of FIG. 5B is searched to extract the command patterns whose first interval matches the count value (s55). If there is no command pattern whose first interval matches the count value, the process returns to the standby state of s51 by the determination of s56. When command patterns whose first interval matches the count value are extracted, the process counts the time from the input timing of the second sound (s57) while waiting for the third sound to be input (s58, s59). If the third sound is not input within a certain time after the second sound is input, it is determined that no command is being input (s59), the extraction of s55 is canceled (s68), and the process returns to s51.
[0049]
When the third sound signal is input (s58), the count value from the input timing of the second sound is read, and the command patterns whose second interval matches this count value are extracted from the command patterns extracted in s55 (s60). If none of those command patterns has a second interval that matches the count value, the process proceeds to s68 by the determination of s61, the extraction of s55 is canceled, and the process returns to s51. When command patterns whose second interval matches the count value are extracted, the process counts the time from the input timing of the third sound (s62) while waiting for the fourth sound to be input (s63, s64). If the fourth sound is not input within a predetermined time after the third sound is input, it is determined that no command is being input (s64), the extraction of s60 is canceled (s68), and the process returns to s51.
[0050]
When the fourth sound is input and the count value from the input timing of the third sound to the input timing of the fourth sound is input from the audio signal processing unit 50 (s63), the command pattern whose third interval matches this count value is extracted from the command patterns extracted in s60 (s65). If none of those command patterns has a third interval that matches the count value, the process proceeds to s68 by the determination of s66, cancels the extraction of s60, and returns to s51. When a command pattern whose third interval matches the count value is extracted, the command (processing content) corresponding to that command pattern is executed (s67). After execution, the process returns to s51.
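The interval-based matching can be sketched as below. Two points are assumptions made for this sketch and are not taken from the text: a tolerance of 50 ms when comparing a measured interval with a table value (the description says only that the values "match"), and the treatment of any interval longer than 1000 ms as a timeout. The function names are hypothetical, and all three intervals are matched at once for brevity.

```python
from typing import Optional

# Hypothetical table modeled on the three example rows of the tempo table.
TEMPO_COMMANDS = {
    (400, 400, 200): "command 1",
    (400, 800, 200): "command 2",
    (600, 400, 200): "command 3",
}
TOLERANCE_MS = 50    # assumed matching tolerance, not specified in the text
TIMEOUT_MS = 1000    # "about 1 second" without a following sound


def onset_intervals(onsets_ms):
    """Time between consecutive sound onsets, in milliseconds."""
    return [b - a for a, b in zip(onsets_ms, onsets_ms[1:])]


def match_tempo_command(onsets_ms) -> Optional[str]:
    """Match the onset intervals of four sounds against the table."""
    intervals = onset_intervals(onsets_ms)
    if any(i > TIMEOUT_MS for i in intervals):
        return None  # treated as "no command is being input"
    for pattern, command in TEMPO_COMMANDS.items():
        if len(pattern) == len(intervals) and all(
            abs(i - p) <= TOLERANCE_MS for i, p in zip(intervals, pattern)
        ):
            return command
    return None
```

For onsets at 0, 410, 1190, and 1400 ms the intervals are 410, 780, and 210 ms, which fall within the assumed tolerance of "400, 800, 200" and select command 2.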
[0051]
In this example, every command is defined by the three time intervals of four consecutive sounds, but a command may consist of a number of sounds other than four, and commands of four sounds may be mixed with commands of other numbers of sounds. In that case, no short command is set that matches the leading part of a longer command.
[0052]
In the karaoke apparatus, only one of the voice command functions based on the pitch, volume, and tempo of the voice signal may be enabled selectively, or all three may operate in parallel. Further, the user may freely select which of the function modes to use.
[0053]
Further, in the above embodiment, the pitch command function uses absolute pitches, and each command is composed of three absolute pitches. However, as in the tempo command function, a command may instead be composed of the pitch intervals between successive sounds (relative pitches). For example, when four sound signals are input, the command is constituted by a first relative pitch, which is the interval between the first and second sounds; a second relative pitch, which is the interval between the second and third sounds; and a third relative pitch, which is the interval between the third and fourth sounds.
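As a small illustration of the relative-pitch variant, four detected pitches can be reduced to three signed intervals. Representing pitches as MIDI note numbers and measuring intervals in semitones are assumptions of this sketch, as is the function name.

```python
def relative_pattern(midi_notes):
    """Signed semitone intervals between consecutive notes."""
    return tuple(b - a for a, b in zip(midi_notes, midi_notes[1:]))
```

Because only the differences are kept, the same melodic shape sung from a different starting note yields the same pattern: both `[60, 62, 64, 60]` and `[65, 67, 69, 65]` reduce to `(2, 2, -4)`.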
[0054]
Furthermore, a plurality of music elements such as pitch, volume, and tempo may be extracted from the input audio signal and combined to form a command pattern. In this case, a large number of command patterns can be configured with a small number of sounds.
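The gain from combining elements can be checked with simple arithmetic. The figures used here (seven distinguishable pitches, two volume levels, three sounds) are illustrative assumptions only, and `count_patterns` is a hypothetical helper.

```python
import math


def count_patterns(values_per_element, sounds):
    """Distinct patterns when every sound carries one value per element."""
    per_sound = math.prod(values_per_element)
    return per_sound ** sounds
```

Under these assumptions, three sounds distinguished by pitch alone give 7^3 = 343 patterns, while pitch combined with a large or small volume gives 14^3 = 2744.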
[0055]
In the above embodiment, the voice command functions for pitch, volume, and tempo are used for content selection. However, a voice command function may also be applied to the operation of a game executed on the karaoke apparatus. For example, in a game that asks the player to "sing the opening melody of the song XX", it can be used to determine whether the melody input from the microphone 47 is actually the melody of the song XX.
[0056]
In addition, in the case of a game in which a character or car on the screen is moved up and down or left and right, the up and down and left and right of the moving direction may be controlled by the pitch or the volume. Moreover, you may enable it to control the moving speed of a character or a vehicle by input tempo. For example, the faster the input voice tempo, the faster the character moves.
[0057]
Further, in a game in which a character or a car is moved on the screen, two microphones may be used to move two characters or cars so that the players compete against each other.
[0058]
Further, this voice command function may be used to control the karaoke song performance function, which is the original function of the karaoke apparatus. For example, if a voice signal (for example, "One, Two, One, Two, Three") is input at a certain tempo while the performance of a karaoke song is on standby, the voice signal processing unit 50 may determine the tempo, and the performance may be started at that tempo. Also, if the singer sings part of the karaoke song in a key of his or her choice while the performance is on standby, the audio signal processing unit 50 may determine the key, and the karaoke song may be transposed to that key and played.
[0059]
This flowchart is shown in FIG. 6. FIG. 6A is a flowchart showing the tempo setting operation. When a continuous voice such as "One, Two, One, Two, Three" is input while a karaoke song is selected and the performance is on standby (s71), the tempo is detected from the intervals between the voice sounds (s72). This tempo is then set as the clock for reading the music data (s73). That is, the tempo set by default in the music data is rewritten to this tempo. The karaoke performance is then started (s74). Subsequent tempo changes are made relative to this tempo. Note that the input voice need not be "One, Two, One, Two, Three"; it may be the singing of the karaoke song itself.
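One plausible way to carry out the tempo detection of s72 is to average the intervals between onsets and convert the result to beats per minute. This is a sketch under the assumption that each spoken count falls on one beat; the description does not state how the tempo is computed, and the function name is hypothetical.

```python
def detect_tempo_bpm(onsets_ms):
    """Estimate tempo from onset times, assuming one onset per beat."""
    if len(onsets_ms) < 2:
        raise ValueError("at least two onsets are needed to measure an interval")
    intervals = [b - a for a, b in zip(onsets_ms, onsets_ms[1:])]
    mean_interval_ms = sum(intervals) / len(intervals)
    return 60000.0 / mean_interval_ms
```

Five counts spaced 500 ms apart give a mean interval of 500 ms, that is, 120 beats per minute.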
[0060]
FIG. 6B is a flowchart showing the key setting operation. When the melody of the song is input while a karaoke song is selected and the performance is on standby (s81), the key (tonality) is detected from the frequencies of this melody (s82). Based on this key and the original key of the song data, a pitch shift amount for transposing the song to this key is set (s83). That is, if a song in C major is sung in D major, all the note data are shifted up by a whole tone (two semitones). This shift may be applied in real time as the note data is read, or all the data may be rewritten in advance. The karaoke performance is then started (s84). The melody that is input need not be the opening of the song; for example, the chorus (sabi) may be used.
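The shift amount of s83 and its application to the note data can be sketched as follows. The pitch-class table, the use of MIDI note numbers, and the choice of the smaller of the upward and downward shifts are assumptions of this sketch; the description states only that the data are shifted by the difference between the original key and the sung key.

```python
PITCH_CLASS = {
    "C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
    "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11,
}


def shift_amount(original_key: str, sung_key: str) -> int:
    """Semitone shift from the original key to the sung key.

    Assumption: the smaller of the upward and downward shifts is used.
    """
    diff = (PITCH_CLASS[sung_key] - PITCH_CLASS[original_key]) % 12
    return diff - 12 if diff > 6 else diff


def transpose(midi_notes, shift: int):
    """Apply the shift to every note, as when rewriting all data in advance."""
    return [note + shift for note in midi_notes]
```

For the example in the text, `shift_amount("C", "D")` is 2, so a C major triad `[60, 64, 67]` becomes `[62, 66, 69]`.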
[0061]
In this example, the singer sings part of the song and the key of the performance is transposed to match the key of the singing. Alternatively, the song may be transposed to the key in which the sound input by the singer becomes the highest note or the lowest note of the song. These processes can also be performed in s82 and s83 of the flowchart.
[0062]
In the tempo command function of the above-described embodiment, a command is determined only by the combination of the intervals between the start timings of the audio signals. However, the duration of each audio signal may also be used as an element of the command. In that case, even when the first, second, and third intervals between start timings form the same pattern, different commands can be distinguished by the lengths of the sounds. Further, a command may be composed only of the lengths of a plurality of audio signals, without considering the start timings.
[0063]
[Effect of the invention]
As described above, according to the present invention, the apparatus can be controlled by audio signals, so there is no need to increase the number of operation buttons or to complicate the key sequences even when the apparatus becomes multifunctional, and the apparatus can be controlled easily from the microphone.
[Brief description of the drawings]
FIG. 1 is a block diagram of a karaoke apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing the configuration of the RAM, hard disk, and music data of the karaoke apparatus.
FIG. 3 is a flowchart for processing a pitch command function of the karaoke apparatus.
FIG. 4 is a flowchart for processing a volume command function of the karaoke apparatus.
FIG. 5 is a flowchart for processing a rhythm command function of the karaoke apparatus.
FIG. 6 is a flowchart showing a tempo setting operation and a key setting operation of the karaoke apparatus.
[Explanation of symbols]
30 ... CPU, 32 ... RAM, 37 ... hard disk,
47 ... Microphone, 50 ... Audio signal processor

Claims

A voice instruction device comprising:
a command pattern table that stores command patterns, each being a combination of the music elements of a plurality of consecutive audio signals, in association with the processing contents corresponding to each command pattern; and
a control unit that searches the command pattern table using a plurality of music elements detected from audio signals input from a microphone and, when a matching command pattern is extracted, executes the processing corresponding to that command pattern.

The voice instruction device according to claim 1, wherein the music element is a pitch, a volume, or a tempo of a voice signal.

A karaoke apparatus comprising the voice instruction device according to claim 1 or claim 2.