JP3895014B2

JP3895014B2 - Video playback device and karaoke device

Info

Publication number: JP3895014B2
Application number: JP25562497A
Authority: JP
Inventors: 三平浅井; 尚人稲葉
Original assignee: Pioneer Corp
Current assignee: Pioneer Corp
Priority date: 1997-09-19
Filing date: 1997-09-19
Publication date: 2007-03-22
Anticipated expiration: 2017-09-19
Also published as: CN1143262C; CN1212416A; JPH1195778A; TW372311B; HK1018838A1

Description

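[0001]
[Technical Field of the Invention]
The present invention relates to a synchronized video generation method for playing back video that moves in synchronization with audio playback, and to a karaoke apparatus using that method. More specifically, it relates to a video playback device and a karaoke apparatus that generate video of an object, such as a human, an animal, or an imitation of either, from data obtained by measuring the object's shape, movement, and so on, and display that video in synchronization with audio playback.
[0002]
[Prior Art]
Karaoke apparatuses are generally known that play back a song such as a popular ballad or pop song (the accompaniment part) while displaying a background image dedicated to that song. Such an apparatus has a storage device using, for example, a VCD (Video CD) or an LD (Laser Disc). This device holds the audio data for playing the song and the image data for displaying the background image and the lyrics. The audio data is, for example, PCM audio data, and the image data is formed by superimposing lyrics on filmed scenery. The apparatus reads the audio data from the storage device to play the song and, at the same time, reads the image data from the storage device to display the background image.
[0003]
The storage devices in these karaoke apparatuses use media with relatively large capacity, such as VCD or LD. However, karaoke uses an enormous number of songs, and image data in particular is very large. As a result, even these media cannot store the audio and image data for every song. In addition, many karaoke songs are current hits, so new songs must be added frequently. For these reasons, so-called communication karaoke apparatuses have become widespread in recent years. These add a communication function to the karaoke apparatus and receive the audio data and lyrics data for each song over a telephone line or similar link.
[0004]
A communication karaoke apparatus receives the audio data for a song over a telephone line, so the number of playable songs is not limited by the capacity of its storage device, and an enormous number of songs can be played. Even a newly released song can be played once its audio data has been received over the telephone line.
[0005]
However, the image data for background images is far larger than the audio data for playing songs. Transmitting it over a telephone line in the same way would therefore be costly in both time and money. For this reason, communication karaoke apparatuses still display background images the same way as earlier apparatuses, from image data stored in advance on a medium such as a VCD or LD. The number of available background images is limited by the capacity of that medium, so it is difficult to give every song its own background image. These apparatuses therefore select and display a background image that suits the song being played but is not directly related to it.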
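[0006]
[Problems to be Solved by the Invention]
For karaoke songs, especially rhythmic ones, it is desirable for the karaoke apparatus to display the movements a singer makes while performing the song, such as gestures or dance.
[0007]
However, these movements match the mood of the song being played, so they differ from song to song. Providing video data of such movements for every song would therefore require an enormous amount of video data. Even with data compression, that data would far exceed the storage capacity of the media in the karaoke apparatuses described above. It is therefore difficult to provide video data showing a singer's movements for each song.
[0008]
In addition, reproducing such movements requires the video data to be accurately synchronized with the song being played. A karaoke apparatus that can display video changing as the song progresses is known, for example, from Japanese Patent Application Laid-Open No. 7-199976. That apparatus holds video data consisting of polygon data and time data, and it uses the time data to synchronize the video display with song playback. However, providing such polygon-plus-time video data for every song would make the total amount of video data enormous, so providing per-song video data remains difficult.
[0009]
Meanwhile, the number of karaoke songs is enormous. A huge amount of audio data for playing them has already accumulated in installed karaoke apparatuses and in the center host computers that distribute audio data to communication karaoke apparatuses. To build a karaoke apparatus that displays a singer moving along with the song, video data showing the singer's movements must be added for all of these accumulated songs. Re-creating the existing audio data just to add video data would be very costly in time and money.
[0010]
Karaoke also requires songs to be played at various tempos (speeds). Karaoke apparatuses therefore normally let the user make the actual playback tempo slower or faster than the reference tempo. A karaoke apparatus that displays a singer moving along with the song must therefore keep the displayed singer moving naturally and smoothly even when the tempo is changed.
[0011]
Furthermore, users need to start a song partway through or repeat part of it, for example for practice, so karaoke apparatuses normally have a function for playing a song from the middle. When a song is played from the middle, such an apparatus must display the singer's movements for the part being played and keep them accurately synchronized with the song.
[0012]
The present invention was made in view of these problems. An object of the invention is to provide a video playback device and a karaoke apparatus that can display video that moves in synchronization with audio playback and, when there are multiple kinds of audio, moves differently for each.
[0013]
Another object is to provide a video playback device and a karaoke apparatus that can display video moving in synchronization with audio played from existing audio data, without re-creating that data.
[0014]
A further object is to provide a video playback device and a karaoke apparatus in which the displayed video moves smoothly and stays synchronized with the audio even when the audio playback speed is changed.
[0015]
Yet another object is to provide a video playback device and a karaoke apparatus that can display video moving in accurate synchronization with audio such as a song, even when the audio is played from the middle.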
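[0016]
[Means for Solving the Problems]
To solve the problems described above, the invention of claim 1 comprises the following means:
receiving means for receiving, from an audio playback device that plays back audio data, reference playback speed information indicating the reference playback speed of the audio data, and a sync signal carrying an identification code that indicates the playback position of the audio data;
shape data storage means for storing shape data that defines the shape of each component of an object;
video data storage means for storing video data in which motion data are arranged in playback order, each motion data item defining the position or movement of each component for one playback frame, the playback frame being the unit of video playback;
table information generation means for generating table information based on the received reference playback speed information and the playback period of the playback frames, the table information associating the identification code carried by the sync signal with a sync frame, which is the playback frame to be played at the moment that sync signal is received; and
video playback means for sequentially generating the playback frames from the shape data and the motion data and outputting them to a display device, for determining from the table information and the received sync signal whether the playback frame about to be output next is in sync, and, if it is not, for changing the playback frame to be output next to a different playback frame.
[0017]
The invention of claim 2 is the video playback device of claim 1. When no sync signal has been received and the playback frame about to be output next is the sync frame, the video playback means changes the playback frame to be output next to an interpolation frame for interpolating between playback frames.
[0018]
The invention of claim 3 is the video playback device of claim 1 or 2. When the sync signal is received and the playback frame about to be output next at that moment is not the sync frame, the video playback means changes the playback frame to be output next to the sync frame.
[0019]
The invention of claim 4 is the video playback device of any one of claims 1 to 3. The shape data storage means stores a plurality of sets of shape data, and the video playback means generates the playback frames from the shape data corresponding to selection data received from outside the video playback device.
[0020]
The invention of claim 5 is a karaoke apparatus comprising the video playback device of any one of claims 1 to 4, the display device, and the audio playback device. The audio playback device comprises the following means:
input means for inputting instructions from a user as instruction signals;
audio data storage means for storing a plurality of sets of audio data, each containing the reference playback speed information;
audio playback means for playing back the stored audio data;
sync signal output means for outputting, in synchronization with playback of the audio data by the audio playback means, the sync signal carrying the identification code that indicates the playback position of the audio data; and
audio playback control means for outputting the reference playback speed information contained in the audio data designated by an instruction signal from the input means, for causing the audio playback means to play that audio data at the playback speed indicated by that information, and for changing the playback speed of the audio data in the audio playback means based on an instruction signal from the input means.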
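[0049]
[Embodiments of the Invention]
An embodiment of the present invention is described below with reference to Figs. 1 to 15. In this embodiment, the communication karaoke apparatus 100 shown in Fig. 1 serves as the example of a karaoke apparatus that uses the synchronized video generation method of the present invention.
[0050]
(1) Configuration and operation of the communication karaoke apparatus
First, the configuration and operation of the communication karaoke apparatus 100 are described.
[0051]
As shown in Fig. 1, the communication karaoke apparatus 100 comprises four main units. A karaoke performance unit 10 plays back a song made up of audio and generates a lyrics image representing the song's lyrics. A video playback unit 30 generates video of an object, such as a singer, that moves in synchronization with song playback. A background image generation unit 40 generates a background image during song playback. A compositing unit 50 combines the lyrics image, the object video, and the background image. A center host computer 200 is connected to the karaoke performance unit 10 via a telephone line. A mixer amplifier 60, which mixes the song audio with audio input from a microphone 80, is also connected to the karaoke performance unit 10, and a speaker 70 and the microphone 80 are connected to the mixer amplifier 60. A monitor 90 is connected to the compositing unit 50 as a display device for the video and images it outputs.
[0052]
The karaoke performance unit 10 further comprises an audio CPU (Central Processing Unit) 11, a RAM (Random Access Memory) 12, a ROM (Read Only Memory) 13, a CD-ROM reader 14, a music data storage unit 16, an input unit 17, a lyrics image generation unit 18, a sound source unit 19, and a FIFO (First In First Out) circuit 20. The music data storage unit 16 serves as audio data storage means and is connected via an interface circuit 15. These components are interconnected via a bus 22.
[0053]
The audio CPU 11 performs overall control of the karaoke performance unit 10 and automatically performs songs (the accompaniment part) such as popular ballads and pop songs. Specifically, the audio CPU 11 can, for example, perform a song automatically from MIDI data structured according to the MIDI standard. The audio CPU 11 also has a timer and can generate the sync signal described later from the MIDI clock. The RAM 12 serves as a work area when the audio CPU 11 performs control processing and as temporary storage for various data. The ROM 13 stores a control program and other data that define the operation of the karaoke performance unit 10.
[0054]
The CD-ROM reader 14 reads the music data, video data, and other data described later from a CD-ROM. The CD-ROM can be inserted from outside and stores such music data and video data. Music data read by the CD-ROM reader 14 is transferred to the music data storage unit 16. In some cases, it is instead transferred to the RAM 12 and played immediately by the audio CPU 11 and the sound source unit 19. The music data storage unit 16 consists of, for example, a hard disk and stores music data for about 2,000 songs. It is rewritable, so music data received from the center host computer 200 via the modem 21, or transferred from the CD-ROM reader 14, can be added to it.
[0055]
The input unit 17 is used to enter instructions for controlling the communication karaoke apparatus 100. Examples are selecting the song to play, setting the playback tempo, setting the song's key, setting viewpoint and light source information, and fast-forwarding or rewinding the song. The lyrics image generation unit 18 consists of, for example, an OSD (On Screen Display) circuit and generates the lyrics image displayed on the monitor 90 during song playback. More specifically, as described later, the music data contains audio data for playing the song and lyrics data for generating the song's lyrics image, and the lyrics image generation unit 18 generates the lyrics image from that lyrics data.
[0056]
The sound source unit 19 synthesizes audio from the audio data contained in the music data. For example, the music data is MIDI data for automatic performance conforming to the MIDI standard, and the sound source unit 19 consists of a synthesizer or similar device that generates musical tones from that MIDI data. The FIFO circuit 20 serves as a buffer to the bus line of the video playback unit 30 and also passes the sync signal output from the audio CPU 11 to the video playback unit 30.
[0057]
The modem 21 is connected to the center host computer 200 via the telephone line and receives and demodulates data transmitted from the center host computer 200 over that line. The center host computer 200 stores a large amount of music data together with the video data corresponding to each song. Some songs, such as new hits, may be stored neither in the music data storage unit 16 of the communication karaoke apparatus 100 nor on the CD-ROM loaded in the CD-ROM reader 14. When such a song needs to be played, the center host computer 200 transmits its music data and related data to the communication karaoke apparatus 100. The modem 21 receives this data, demodulates it, and transfers it to the RAM 12 or the music data storage unit 16.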
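[0058]
The video playback unit 30 comprises a video CPU 31, a ROM 32, a video data storage unit 34, a shape data storage unit 35, and an object video generation unit 36. The video data storage unit 34 serves as motion data storage means and is connected via an interface circuit 33. The shape data storage unit 35 serves as shape data storage means, and the object video generation unit 36 serves as instantaneous video generation means. These components are interconnected via a bus 37. The bus 37 can exchange data with the bus 22 of the karaoke performance unit 10 via the FIFO circuit 20.
[0059]
The video CPU 31 performs overall control of the video playback unit 30. The ROM 32 stores a control program that defines the operation of the video playback unit 30 and a control program for the synchronized video generation process described later.
[0060]
The video data storage unit 34 consists of, for example, a hard disk and stores video data. It is rewritable, so video data received from the center host computer 200 via the modem 21, or transferred from the CD-ROM reader 14, can be added or updated. The shape data storage unit 35 stores the shape data described later and consists of RAM, ROM, a hard disk, or the like.
[0061]
The object video generation unit 36 consists of, for example, an OSD circuit. It generates instantaneous video data for playing back a moving object, such as a singer dancing to the song, as moving video. The instantaneous video data is generated from the motion data contained in the video data stored in the video data storage unit 34 or elsewhere, and from the shape data stored in the shape data storage unit 35. More specifically, as shown in Fig. 15, the communication karaoke apparatus 100 displays three elements on the monitor 90 during song playback: a lyrics image Im1 showing the song's lyrics, a background image Im2, and video Im3 of a singer dancing to the song. Of these, the object video generation unit 36 generates the instantaneous video data for displaying the singer video Im3. Here, instantaneous video data is the data that forms one frame of video Im3.
[0062]
The object video generation unit 36 is provided with a creation memory 36A and a display memory 36B. It receives motion data from the video data storage unit 34 and shape data from the shape data storage unit 35, expands both into the creation memory 36A, and generates instantaneous video data there. It then transfers the instantaneous video data from the creation memory 36A to the display memory 36B. The display memory 36B is a so-called video memory that corresponds one-to-one to a single image frame. Transferring the instantaneous video data to it therefore forms one frame of video of the singer or other object. That video is output to the compositing unit 50, combined with the lyrics image and the background image, and displayed on the monitor 90.
[0063]
The background image generation unit 40 consists of, for example, an LD player and generates the background image Im2 displayed on the monitor 90. Specifically, it reads image data for the background image from an LD (Laser Disc), generates the background image from that data, and outputs it to the compositing unit 50.
[0064]
The monitor 90 consists of, for example, a CRT (Cathode-Ray Tube) display or a liquid crystal display.
[0065]
To use the communication karaoke apparatus 100 for karaoke, the user first operates the input unit 17 to select a song, set the tempo, and so on, and then enters an instruction to start the performance. The music data for the selected song is then retrieved from the music data storage unit 16 of the karaoke performance unit 10. If the music data is not there, it is retrieved from the CD-ROM loaded in the CD-ROM reader 14. If it is in neither place, the apparatus asks the center host computer 200 to transmit it, and the music data is received from the center host computer 200 via the modem 21.
[0066]
The music data, whether retrieved from the music data storage unit 16 or the CD-ROM or transmitted from the center host computer 200, contains audio data for playing the song and lyrics data for generating the lyrics image. The music data is therefore split. The audio data is transferred to the RAM 12, and the lyrics data is transferred to the lyrics image generation unit 18.
[0067]
Next, the audio CPU 11 performs the song automatically from the audio data in the RAM 12. The audio synthesized by the sound source unit 19 is output to the speaker 70 via the mixer amplifier 60, and the song is played. In parallel, the lyrics image generation unit 18 generates the lyrics image from the lyrics data and outputs it to the compositing unit 50. At the same time, the audio CPU 11 outputs a sync signal corresponding to the performance to the video playback unit 30.
[0068]
Also in parallel, the video data for the selected song is retrieved from the video data storage unit 34 of the video playback unit 30. If the video data is not there, it is retrieved from the CD-ROM loaded in the CD-ROM reader 14. If it is in neither place, the apparatus asks the center host computer 200 to transmit it, and the video data is received from the center host computer 200 via the modem 21.
[0069]
Next, motion data is extracted from the video data, whether that video data came from the video data storage unit 34, the CD-ROM, or the center host computer 200. The motion data is transferred, together with the shape data stored in the shape data storage unit 35, to the creation memory 36A of the object video generation unit 36. The object video generation unit 36 then generates instantaneous video data forming one frame of video of the singer or other object and outputs it to the compositing unit 50. It outputs this data in synchronization with the sync signal sent from the audio CPU 11 to the video playback unit 30.
[0070]
Also in parallel, the background image generation unit 40 outputs the background image for the selected song to the compositing unit 50. The compositing unit 50 combines three inputs: the lyrics image from the lyrics image generation unit 18, the video of the singer or other object from the object video generation unit 36, and the background image from the background image generation unit 40. The result is displayed on the monitor 90.
[0071]
As a result, the selected song is played, and video of a singer or other figure dancing in synchronization with the song is displayed on the monitor 90 together with the lyrics image and the background image.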
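[0072]
(2) Structure of the music data and video data
Next, the structure of the music data and video data is described.
[0073]
Fig. 2 shows the structure of the music data. The music data consists of the following fields, among others:
- a header Hm;
- a song number Nm identifying the corresponding song;
- tempo data Tm indicating the playback speed of the song;
- background data Bm, a number designating the background image for the song; and
- a plurality of performance data Pm.
[0074]
Each performance data Pm consists of two blocks. One is made up of audio data Sd and time data Tsd, and the other is made up of lyrics data Wd and time data Twd. The audio data Sd is the data for playing the song and consists of, for example, MIDI data. More specifically, it contains commands to start a sound, to stop a sound, to specify which sound to play, and so on. The time data Tsd placed next to the audio data Sd controls the time (timing) at which each command in the audio data is executed. The lyrics data Wd is the data for generating the lyrics image, and the time data Twd placed next to it controls the time (timing) at which the lyrics image is generated.
[0075]
Fig. 3 shows the structure of the video data. The video data consists of a header Hp, a song number Np identifying the corresponding song, the number of motion data items Dp, a plurality of motion data Mp, and other fields.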
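[0076]
(3) Structure of the motion data and shape data
Next, the structure of the motion data and shape data is described. As noted above, the motion data and shape data are used to generate the instantaneous video data that forms video of a singer or other figure dancing in synchronization with song playback.
[0077]
The motion data divides the video of an object, such as a human, an animal, or an imitation of either, into a plurality of components and sets the position and rotation of each component. As shown in Fig. 3, multiple motion data items are arranged in the video data. Each item (instantaneous motion data) sets the movement (position and rotation) for one frame of video. As noted above, the motion data is part of the video data and is set individually for each song. The video data can also be received from the center host computer over the telephone line.
[0078]
More specifically, the motion data divides the video of an object such as a human, an animal, or an imitation of either into a plurality of components and sets the position or movement of each component.
[0079]
For example, as shown in Fig. 5, a human model imitating a human is defined virtually and divided into components such as the waist, chest, head, arms, and legs. Each of these components is called a "level". In the example of Fig. 5, the human model is divided into levels L1 to L17, where level L1 corresponds to the waist and level L5 to the head. Level L1 can translate along the X, Y, and Z axes and can rotate about axes parallel to each of them. Levels L2 to L17 are each attached at a joint R and can rotate about the X, Y, and Z axes relative to that joint. The motion data records the position and rotation of each of these levels.
[0080]
Fig. 4 shows the structure of the motion data. As shown in Fig. 4, the motion data consists of motion data for each of levels L1 to L17. The motion data for level L1 consists of position coordinates (X, Y, Z) and rotation angles (Xr, Yr, Zr). The motion data for each of levels L2 to L17 consists only of rotation angles (Xr, Yr, Zr).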
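The per-frame layout of Fig. 4 can be sketched as a small Python data structure. The sketch assumes a hypothetical in-memory representation. The class and field names are illustrative, not taken from the patent, and the patent specifies neither a numeric encoding nor a precision.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

NUM_LEVELS = 17  # levels L1..L17 of the human model in Fig. 5


@dataclass
class FrameMotion:
    """Instantaneous motion data for one playback frame (hypothetical layout)."""
    root_position: Vec3   # (X, Y, Z), level L1 only
    rotations: List[Vec3]  # (Xr, Yr, Zr) for L1..L17; index 0 is L1

    def __post_init__(self) -> None:
        if len(self.rotations) != NUM_LEVELS:
            raise ValueError(
                f"expected {NUM_LEVELS} rotations, got {len(self.rotations)}")

    def to_floats(self) -> List[float]:
        """Flatten to 3 + 17 * 3 = 54 numbers, e.g. for storage or transmission."""
        out = list(self.root_position)
        for rotation in self.rotations:
            out.extend(rotation)
        return out


@dataclass
class VideoData:
    song_number: int                                          # Np
    frames: List[FrameMotion] = field(default_factory=list)  # Mp, in playback order

    @property
    def count(self) -> int:  # Dp
        return len(self.frames)
```

Flattened, one frame is 3 + 17 × 3 = 54 numbers. This shows concretely why a per-frame pose is much smaller than a complete image for each frame.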
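[0081]
The motion data is created as follows. The song is played at the reference tempo Tm preset for it, and a person actually dances (performs the dance or gestures) to it. The position, rotation, and so on of each component of the dancing person are measured at a fixed measurement period, for example every 1/15 second, to collect coordinate data for levels L1 to L17. One motion data item is thus created every 1/15 second. This measurement period is the same as the display period described later.
[0082]
Motion data structured this way is much smaller than conventional data that provides a complete image for every frame. Motion data can therefore be set individually for each song and transmitted from the center host computer 200 to the communication karaoke apparatus 100 in a short time.
[0083]
The shape data divides the video of an object such as a human, an animal, or an imitation of either into a plurality of components and sets the shape of each component. The shape data is stored in the shape data storage unit 35. Unlike the motion data, it is not set individually for each song; it is mainly shared among multiple songs. Several kinds of shape data are provided for different kinds of displayed video. For example, switching to a different kind of shape data lets the displayed singer be changed into an animal.
[0084]
Shape data is set for each of levels L1 to L17. Fig. 6 shows the structure of the shape data for level L1. As shown in Fig. 6, this shape data consists of vertex coordinate data made up of vertex coordinates A1 to A8, and polygon link data made up of polygon data P1 to P6. As shown in Fig. 7, the vertex coordinate data defines the three-dimensional shape of level L1 by giving its vertex coordinates A1 to A8. The polygon link data defines the texture and properties of each face of level L1. Specifically, each of the polygon data P1 to P6 consists of surface data and vertex numbers. The surface data includes the surface color, ambient, transparency, and a flag indicating whether an image (texture) is mapped onto the surface. The vertex numbers identify the vertices that form each face. The shape data for levels L2 to L17 is structured in the same way as that for level L1. The shape data also contains information on how the levels are connected and where, that is, the coordinates of the joints R shown in Fig. 5.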
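The per-level shape data of Figs. 6 and 7 can be sketched in the same hypothetical style. The sketch assumes that level L1 is a simple cuboid, so its eight vertices A1 to A8 and six quadrilateral faces P1 to P6 can be generated from a width, height, and depth. The surface values and the joint position are placeholders, not values from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Surface:
    color: Tuple[int, int, int]  # RGB; hypothetical 0-255 encoding
    ambient: float
    transparency: float
    textured: bool               # whether a texture image is mapped


@dataclass
class Polygon:
    surface: Surface
    vertex_numbers: Tuple[int, ...]  # indices into LevelShape.vertices


@dataclass
class LevelShape:
    vertices: List[Vec3]     # A1..A8 for a box-shaped level
    polygons: List[Polygon]  # P1..P6
    joint: Vec3              # coordinates of the joint R


def box_level(w: float, h: float, d: float) -> LevelShape:
    """Build a cuboid level centered on the origin: 8 vertices, 6 quad faces."""
    x, y, z = w / 2, h / 2, d / 2
    verts = [(-x, -y, -z), (x, -y, -z), (x, y, -z), (-x, y, -z),
             (-x, -y, z), (x, -y, z), (x, y, z), (-x, y, z)]
    faces = [(0, 1, 2, 3), (4, 5, 6, 7), (0, 1, 5, 4),
             (2, 3, 7, 6), (1, 2, 6, 5), (0, 3, 7, 4)]
    skin = Surface(color=(200, 160, 140), ambient=0.2,
                   transparency=0.0, textured=False)
    # Placeholder: put the joint at the top center of the box.
    return LevelShape(verts, [Polygon(skin, f) for f in faces], joint=(0.0, y, 0.0))
```

Each vertex belongs to exactly three faces, as it does on a closed box. Because the shape is separate from the motion data, replacing `box_level` output with another model changes the figure's appearance without touching any per-song data.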
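[0085]
(4) Synchronizing the song and the video
Next, the synchronization between two outputs of the communication karaoke apparatus 100 of this embodiment is described: the song played by the karaoke performance unit 10, and the video of the singer or other figure that the video playback unit 30 generates and the monitor 90 displays.
[0086]
Song playback is performed mainly by the automatic performance function of the audio CPU 11. As noted above, the audio CPU 11 performs automatically from the audio data in accordance with the MIDI standard. The basic timing of the automatic performance is controlled by the MIDI clock.
[0087]
Each song has a preset reference tempo, which is described in the music data as tempo data Tm. The reference tempo differs from song to song. Rhythmic songs such as rock have a relatively fast reference tempo, and slow songs such as ballads have a relatively slow one. Songs are normally played at the reference tempo. However, when a tempo change is entered through the input unit 17, the song is played at the entered tempo, slower or faster than the reference tempo.
[0088]
Video of the singer or other figure is generated mainly by the video CPU 31 and the object video generation unit 36. As noted above, the object video generation unit 36 generates instantaneous video data forming one frame of video from the motion data and the shape data. It generates instantaneous video data one frame after another in the creation memory 36A and transfers each frame to the display memory 36B at a fixed display period. Video that changes at this display period is thus displayed on the monitor 90 via the compositing unit 50.
[0089]
The fixed display period is the same as the measurement period used when the motion data was measured, for example 1/15 second. As described above, the motion data is generated by playing the song at the reference tempo Tm, having a person actually dance to it, and measuring the position, rotation, and so on of each of that person's components at a fixed measurement period, for example every 1/15 second. The apparatus uses this motion data and displays the instantaneous video data at a display period equal to the measurement period. It therefore reproduces the person's movements exactly as measured and plays back video that moves naturally and smoothly.
[0090]
Furthermore, the display period of the instantaneous video data, which is also the measurement period of the motion data, is set to an integer multiple of the period at which the monitor 90 displays one frame. The monitor 90 is a CRT or liquid crystal display and displays one frame at a fixed period. With NTSC, for example, it displays one frame of two fields every 1/30 second. In that case, the display period of the object video generation unit 36 is, for example, twice the monitor's frame period, that is, 1/15 second. The same image is therefore displayed for two monitor frames (four fields of 1/60 second each).
[0091]
Song playback and video display are synchronized using the sync signal output from the audio CPU 11 to the video CPU 31 and the playback frame table Ts described below. The audio CPU 11 generates the sync signal from the MIDI clock. The MIDI clock is, for example, a signal that outputs 24 clock pulses per quarter note, so it follows the song's tempo. The sync signal is obtained by dividing down the MIDI clock, for example to one clock pulse per eighth note of the song.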
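The clock division just described can be sketched with exact rational arithmetic. The sketch assumes the text's example values: a MIDI clock of 24 pulses per quarter note and one sync pulse per eighth note, so the divider is 12. Each pulse also carries a running number 1, 2, 3, ... counted from the start of the song. The function names are hypothetical.

```python
from fractions import Fraction
from typing import Iterator, Tuple

MIDI_PPQN = 24                           # MIDI clock pulses per quarter note
SYNC_PER_QUARTER = 2                     # one sync pulse per eighth note
DIVIDER = MIDI_PPQN // SYNC_PER_QUARTER  # 12 MIDI clocks per sync pulse


def midi_clock_period(tempo_bpm: int) -> Fraction:
    """Seconds between MIDI clock pulses at a tempo in quarter notes per minute."""
    return Fraction(60, tempo_bpm * MIDI_PPQN)


def sync_pulses(tempo_bpm: int, n_midi_clocks: int) -> Iterator[Tuple[int, Fraction]]:
    """Divide the MIDI clock by DIVIDER and yield (pulse number, time in seconds).

    Pulses are numbered 1, 2, 3, ... from the start of the song."""
    period = midi_clock_period(tempo_bpm)
    for tick in range(n_midi_clocks):
        if tick % DIVIDER == 0:
            yield tick // DIVIDER + 1, tick * period
```

At 120 quarter notes per minute the MIDI clock ticks every 1/48 second, so 48 MIDI clocks yield four sync pulses at 0, 1/4, 1/2, and 3/4 second, one per eighth note. The pulse spacing scales with the tempo, which is what lets the video side track tempo changes.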
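[0092]
As shown in Fig. 9, each clock pulse of the sync signal is given an identification code that identifies the song's playback position. The code increases by one (1, 2, 3, ...) from the beginning of the song to the end.
[0093]
As shown in Fig. 8, the playback frame table Ts describes the timing correspondence between the clock pulses of the sync signal and the frames of the video. The instantaneous video data output by the object video generation unit 36 changes every 1/15 second and forms one frame of the video displayed on the monitor 90. As shown in Fig. 9, these frames (hereinafter "playback frames") are arranged in playback order at the fixed playback period t (1/15 second) and numbered F1, F2, ..., F11, and so on. The period of the sync signal is then calculated from the reference tempo of the song. From that period, the apparatus determines which of the playback frames F1, F2, ... coincides with each clock pulse identification code. The playback frame table Ts records the result. The table is created immediately before song playback starts.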
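Building the playback frame table Ts can be sketched as follows, under two assumptions not stated in the text. First, the reference tempo is given in quarter notes per minute. Second, each pulse is matched to the frame whose 1/15-second slot contains the pulse time, which means the elapsed time is rounded down. The function name is hypothetical. Under these assumptions, a reference tempo of 135 quarter notes per minute reproduces the pulse-to-frame pattern F1, F4, F7, F11 described for Fig. 8; the text does not say which tempo Fig. 8 uses.

```python
from fractions import Fraction
from typing import Dict

FRAME_PERIOD = Fraction(1, 15)  # playback period t, in seconds
SYNC_PER_QUARTER = 2            # one sync pulse per eighth note


def build_frame_table(reference_bpm: int, frame_count: int) -> Dict[int, int]:
    """Map sync-pulse identification code -> frame number (F1 is 1).

    Pulse k fires (k - 1) sync periods after the start of the song; the
    elapsed time is floored to a whole number of frame periods."""
    sync_period = Fraction(60, reference_bpm * SYNC_PER_QUARTER)
    table: Dict[int, int] = {}
    k = 1
    while True:
        # Fraction // Fraction is an exact floor, so no rounding drift.
        frame_index = (k - 1) * sync_period // FRAME_PERIOD
        if frame_index >= frame_count:  # frame_count plays the role of Dp
            break
        table[k] = frame_index + 1
        k += 1
    return table
```

For example, `build_frame_table(135, 12)` returns `{1: 1, 2: 4, 3: 7, 4: 11}`. The values of the table are the sync frames. Exact fractions are used because the fourth pulse lands exactly on a frame boundary at this tempo, and floating-point error could shift it to the previous frame.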
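[0094]
The following describes, with reference to Figs. 8 to 11, how the sync signal and the playback frame table Ts keep song playback and video display synchronized.
[0095]
(i) When the song is played at the reference tempo
When the song is played at the reference tempo, the first playback frame F1 is displayed as soon as song playback starts, as shown in Fig. 9. Playback frames F2 to F11 and onward are then displayed in playback order while the song plays. In other words, while the song plays, the object video generation unit 36 generates the instantaneous video data for each playback frame and outputs it to the compositing unit 50 every playback period t. The compositing unit 50 outputs the composited image data to the monitor 90 in synchronization with the monitor's display period.
[0096]
The motion data (instantaneous motion data) used to form the instantaneous video data was generated by measuring a person dancing to the song played at the reference tempo. Therefore, when the song is played at the reference tempo, starting song playback and video display at the same moment and running them in parallel keeps them synchronized. However, the tempo may be changed during playback, or the song may be stopped and restarted. To handle these cases, the apparatus must always know the timing correspondence between song playback and the display of each playback frame. The communication karaoke apparatus 100 therefore tracks this correspondence at all times using the sync signal and the playback frame table Ts. According to the table Ts in Fig. 8, clock pulse 1 of the sync signal corresponds to playback frame F1, pulse 2 to F4, pulse 3 to F7, and pulse 4 to F11. When the song is played at the reference tempo, the apparatus displays the video while checking two conditions: each clock pulse must be output at the same moment as the playback frame that the table Ts lists for it is displayed, and the pairing of pulses with frames must match the table.
[0097]
(ii) When the song is played slower than the reference tempo
When the song is played slower than the reference tempo, interpolation frames G1, G2, and so on are inserted among playback frames F1 to F8 and onward, as shown in Fig. 10. Even at a slower tempo, playback frames are still generated and displayed at the fixed display period t. Displaying them unchanged in playback order would therefore run ahead of the song and lose synchronization. Interpolation frames G1, G2, and so on are therefore inserted among playback frames F1 to F8 and onward. The insertions are placed so that each clock pulse is output at the same moment as its listed playback frame is displayed and the pulse-to-frame pairing matches the table Ts. Playback frames F1 to F8 and onward and interpolation frames G1, G2, and so on are then displayed in sequence.
[0098]
The motion data used to generate the instantaneous video data for interpolation frames G1 and G2 (hereinafter "interpolation motion data") is calculated from two neighbors. These are the motion data of the playback frame immediately before the insertion position (hereinafter "preceding motion data") and that of the playback frame immediately after it (hereinafter "following motion data"). More specifically, when the video is moving rapidly and the preceding and following motion data differ greatly, their average is used as the interpolation motion data. When the video is moving slowly and they differ only slightly, one of the two is used unchanged. Song playback and video display can therefore be synchronized easily even when the song is played slower than the reference tempo, and the displayed video still moves smoothly.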
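The rule for choosing the interpolation motion data can be sketched on flattened motion frames, where each frame is a list of numbers. The text does not specify how a "large" difference is judged. This sketch assumes the largest per-value change, compared against a hypothetical threshold. It also averages angles naively and ignores wrap-around at ±180 degrees, which a real implementation would need to handle.

```python
from typing import List


def interpolation_motion(prev: List[float], nxt: List[float],
                         threshold: float = 5.0) -> List[float]:
    """Interpolation motion data between two flattened motion frames.

    threshold is a hypothetical value (e.g. in degrees) separating 'fast'
    from 'slow' movement; the patent gives no number."""
    if len(prev) != len(nxt):
        raise ValueError("motion frames must have the same layout")
    max_change = max((abs(a - b) for a, b in zip(prev, nxt)), default=0.0)
    if max_change > threshold:
        # Fast movement: use the average of the preceding and following data.
        return [(a + b) / 2 for a, b in zip(prev, nxt)]
    # Slow movement: reuse one side (here, the preceding motion data).
    return list(prev)
```

Averaging only when the pose changes a lot keeps the intermediate frame on the path between its neighbors. For small changes, reusing the preceding pose is visually indistinguishable and costs nothing.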
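[0099]
(iii) When the song is played faster than the reference tempo
When the song is played faster than the reference tempo, some of playback frames F1 to F13 and onward, such as F3, F6, F9, F10, and F13, are removed (thinned out), as shown in Fig. 11. Even at a faster tempo, playback frames are still generated and displayed at the fixed display period t. Displaying all of them in playback order would therefore fall behind the song and lose synchronization. Frames are therefore thinned out during display so that each clock pulse is output at the same moment as its listed playback frame is displayed and the pulse-to-frame pairing matches the table Ts. Song playback and video display can therefore be synchronized easily even when the song is played faster than the reference tempo.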
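[0100]
(5) Synchronized video generation process
Next, the synchronized video generation process is described with reference to the flowcharts in Figs. 12 to 14. This process carries out the synchronization of song playback and video display described above. The video CPU 31 and the object video generation unit 36 execute it according to a control program stored in the ROM 32 of the video playback unit 30.
[0101]
First, the user selects a song through the input unit 17 of the communication karaoke apparatus 100 and enters a command to start playback. The audio CPU 11 of the karaoke performance unit 10 then transfers the music data for the selected song from the music data storage unit 16 or elsewhere to the RAM 12. The audio CPU 11 extracts the tempo data from the music data, determines the reference tempo of the selected song from it, and sends the reference tempo data to the video CPU 31. The audio CPU 11 also reads commands such as tempo changes entered through the input unit 17 and configures song playback accordingly.
[0102]
While the audio CPU 11 is configuring song playback, the video CPU 31 of the video playback unit 30 transfers the video data for the selected song from the video data storage unit 34 or elsewhere to the object video generation unit 36. It also transfers the shape data from the shape data storage unit 35 to the object video generation unit 36, and it forwards the reference tempo received from the audio CPU 11. The object video generation unit 36 then starts the synchronized video generation program described below.
[0103]
In step 1 of Fig. 12, it is determined whether the object video generation unit 36 has received any of three kinds of data: the video data transferred from the video data storage unit 34 or elsewhere, the shape data output from the shape data storage unit 35, or the reference tempo data output from the video CPU 31. If none has been received, step 1 repeats until one arrives.
[0104]
When shape data is received, the process goes from step 1 through step 2 to step 5. In step 5, the received shape data is expanded into the creation memory 36A of the object video generation unit 36, and the process returns to step 1. When video data is received in step 1, the process goes through steps 2 and 3 to step 5. In step 5, motion data is extracted from the video data and expanded into the creation memory 36A, and the process returns to step 1. Each motion data item is given a frame number and stored so that it can be accessed randomly by frame number; the frame numbers finally run from 1 to the data count Dp. When reference tempo data is received in step 1, the process goes through steps 2 and 3 to step 4, where the result is "YES". The received reference tempo data is stored in a memory provided in the object video generation unit 36, and the process proceeds to step 6. When data other than shape data, video data, or reference tempo data is received in step 1, the process goes through steps 2 and 3 to step 4, where the result is "NO", and returns to step 1.
[0105]
In step 6, it is determined whether all the shape data has been received. If not, the process returns to step 1 to wait for more shape data. If so, it proceeds to step 7.
[0106]
In step 7, it is determined whether all the video data (motion data) for the selected song has been received, that is, whether Dp motion data items have arrived. If not, the process returns to step 1 to wait for more video data. If so, it proceeds to step 8.
[0107]
In step 8, the viewpoint position and light source position for the video on the monitor 90 are set to their default (initial) positions. The object video generation unit 36 can set the viewpoint position, that is, the angle and distance from which the object is viewed on the monitor 90. For example, the singer can be shown not only from the front but also from the side or from behind. With the default viewpoint, the singer is seen from the front. The object video generation unit 36 can also set the light source position, that is, the angle from which the displayed video is lit. For example, the singer can be shown lit from the upper left, the upper right, or the front. With the default light source, the singer is lit from the upper left.
[0108]
In step 9, a playback frame table Ts like the one in Fig. 8 is created for the whole song. It is based on the reference tempo data received in step 4 and the data count Dp, that is, the number of motion data items Mp.
[0109]
Next, in step 10 of Fig. 13, the playback position of the song is determined. Normally a song is played from the beginning, but the user can specify a playback position through the input unit 17, and the song is then played from that position. In that case, data indicating the specified position is input to the object video generation unit 36. Each clock pulse of the sync signal carries an identification code that identifies the playback position. The apparatus finds the identification code of the clock pulse that corresponds to the specified position, and the song can then be played from that position.
[0110]
In step 11, it is determined whether an instruction to start playback has been received. When the user enters such an instruction through the input unit 17, it is sent to the object video generation unit 36. When the unit receives it, the result in step 11 is "YES" and the process proceeds to step 12. The karaoke performance unit 10 then starts playing the song, and the video playback unit 30 starts generating video, from the playback position determined in step 10. If no start instruction has been received, the result in step 11 is "NO", and step 11 repeats until one arrives.
[0111]
In step 12, it is determined whether an instruction to end playback has been received. When the user enters an instruction through the input unit 17 to end (stop) playback, it is sent to the object video generation unit 36. When the unit receives it, the result in step 12 is "YES", and song playback and the synchronized video generation process both end. Otherwise, the result in step 12 is "NO", and the process proceeds to step 13.
[0112]
In steps 13 and 14, it is determined whether viewpoint information or light source information has been received. The communication karaoke apparatus 100 can set and change, on the user's instructions, viewpoint information that sets the viewpoint position and light source information that sets the light source position. When the user enters either one through the input unit 17, the result in step 13 or 14 is "YES" and the process proceeds to step 15. In step 15, the information is stored in a memory in the object video generation unit 36.
[0113]
In step 16, a playback frame is generated from the shape data and motion data expanded into the creation memory 36A in step 5; that is, the instantaneous video data for the playback frame is generated from them. Step 17 repeats until 1/15 second has elapsed since the playback frame was generated. Step 18 then determines whether a clock pulse of the sync signal has been received from the audio CPU 11.
[0114]
When a clock pulse has been received in step 18, the process goes to step 22. Step 22 determines whether the playback frame that was about to be displayed next when the pulse arrived is a sync frame. A "sync frame" is a playback frame whose frame number is listed in the playback frame table Ts. As described above, the table Ts lists the playback frames that coincide in time with each clock pulse when the song is played at the reference tempo and playback frames are displayed at the fixed display period (for example, every 1/15 second). If the frame listed in the table Ts is displayed whenever a clock pulse is received, song playback and video display therefore stay synchronized. This is why the playback frames listed in the table are called "sync frames". For example, according to the table in Fig. 8, the sync frames are F1, F4, F7, F11, and so on. In Figs. 9 to 11, the hatched playback frames are sync frames.
[0115]
If step 22 finds that the playback frame about to be displayed next when the pulse arrived is a sync frame, the process proceeds to step 24 in Fig. 14 to display that sync frame.
[0116]
If step 22 finds that it is not a sync frame, the process proceeds to step 23, where that playback frame is discarded. In its place, the nearest sync frame that comes after it in time is generated from that sync frame's instantaneous motion data. The process then proceeds to step 24 in Fig. 14 to display this sync frame.
[0117]
This case arises when the song is played faster than the reference tempo: a clock pulse arrives while the playback frame about to be displayed next is not a sync frame. As shown in Fig. 9, at the reference tempo, displaying the playback frames in their original order makes each clock pulse coincide with the display of a sync frame, so song playback and video display are synchronized. Whenever a clock pulse is received in step 18, the playback frame about to be displayed next is then always a sync frame. In other words, the interval between clock pulses equals the interval between sync frames. In contrast, as shown in Fig. 11, at a tempo faster than the reference tempo the interval between clock pulses is shorter than the interval between sync frames. Step 22 can therefore find that the frame about to be displayed when a pulse arrives is not a sync frame. That frame is then thinned out: it is discarded, and the nearest later sync frame is displayed in its place. Each clock pulse therefore still coincides with the display of a sync frame even when the song is faster than the reference tempo, and song playback and video display remain synchronized. The displayed video moves faster in proportion to the faster song.
[0118]
If 1/15 second has elapsed since the playback frame was generated (step 17) and no clock pulse has been received in step 18, the process proceeds to step 19. Step 19 determines whether the playback frame about to be displayed next is a sync frame. If it is not, the process proceeds to step 24 in Fig. 14 to display that playback frame.
[0119]
If step 19 finds that the playback frame about to be displayed next is a sync frame, the process proceeds to step 20, where an interpolation frame is generated. In step 21, the generated interpolation frame is inserted immediately before that sync frame. The process then proceeds to step 24 in Fig. 14 to display the inserted interpolation frame.
[0120]
This case arises when the song is played slower than the reference tempo: no clock pulse has been received, yet the playback frame about to be displayed next is a sync frame. As shown in Fig. 10, at a slower tempo the interval between clock pulses is longer than the interval between sync frames. Step 19 can therefore find that the frame about to be displayed next is a sync frame even though no clock pulse has arrived. Interpolation frames are then inserted between sync frames so that each clock pulse coincides with the display of a sync frame. Song playback and video display therefore stay synchronized even when the song is slower than the reference tempo. The displayed video moves more slowly in proportion to the slower song.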
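The decision logic of steps 18 to 23 can be simulated on a timeline of 1/15-second output slots. This is a simplified sketch, not the device's firmware. It assumes that each sync pulse is noticed at the start of a slot, and it represents an interpolation frame as the string "G". Following the text, when a pulse arrives early it jumps to the nearest later sync frame. The function name and the pulse timings in the usage note are hypothetical.

```python
from bisect import bisect_right
from typing import List, Set, Union

Shown = Union[int, str]  # a playback frame number, or "G" for an interpolation frame


def schedule(sync_frames: List[int], pulse_slots: Set[int],
             n_slots: int, last_frame: int) -> List[Shown]:
    """Return what is displayed in each output slot (one slot = 1/15 s).

    sync_frames: sorted frame numbers listed in table Ts, e.g. [1, 4, 7, 11].
    pulse_slots: slots at which a sync pulse has arrived before the output.
    """
    sync_set = set(sync_frames)
    next_frame = 1
    shown: List[Shown] = []
    for slot in range(n_slots):
        if next_frame > last_frame:
            break
        if slot in pulse_slots:
            if next_frame not in sync_set:
                # Steps 22-23: discard this frame, jump to the nearest later sync frame.
                i = bisect_right(sync_frames, next_frame)
                if i < len(sync_frames):
                    next_frame = sync_frames[i]
            shown.append(next_frame)
            next_frame += 1
        elif next_frame in sync_set:
            # Steps 19-21: the sync frame is due but its pulse is not; stall with "G".
            shown.append("G")
        else:
            shown.append(next_frame)
            next_frame += 1
    return shown
```

With sync frames `[1, 4, 7, 11]` and `last_frame=11`, three pulse patterns give the following results. Pulses at slots `{0, 3, 6, 10}` (the reference tempo) show frames 1 to 11 unchanged. Pulses at `{0, 4, 8, 13}` (slower) show `1, 2, 3, G, 4, 5, 6, G, 7, 8, 9, 10, G, 11`. Pulses at `{0, 2, 5, 8}` (faster) drop frames 3 and 10. These are the three behaviors of Figs. 9 to 11.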
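[0121]
In step 24, the polygon coordinates in the display coordinate system are calculated from the motion data of the instantaneous video data for the playback frame and from the viewpoint information. The viewpoint position for the displayed video is calculated from the viewpoint information entered in step 13.
[0122]
In step 25, textures are mapped using the shape data, the light source information, and the viewpoint information. This adds surface patterns and textures, which change with the viewpoint, to each polygon face.
[0123]
In step 26, the polygons are shaded using the light source information and the viewpoint information. This adds shadows and similar effects that depend on the direction of the light source to each polygon.
[0124]
In step 27, the instantaneous video data generated in the creation memory 36A of the object video generation unit 36 is transferred to the display memory 36B. The video formed by that data is then displayed on the monitor 90 via the compositing unit 50.
[0125]
The process then returns to step 12. Steps 13 to 27 repeat until an instruction to end playback is received in step 12. As a result, the song is played and, as shown in Fig. 15, video of a singer or other figure dancing to the song is displayed on the monitor 90.
[0126]
Thus, the communication karaoke apparatus 100 of this embodiment can display on the monitor 90 video of a singer or other figure dancing to the song being played. In particular, the motion data and the shape data used to generate the instantaneous video data are stored separately. The relatively compact motion data can therefore be provided for each song while the shape data is shared among songs, so video with movements unique to each song can be generated easily from a small amount of data.
[0127]
Moreover, because the motion data is relatively compact, it can be transmitted from the center host computer 200 over the telephone line just like the music data. When the apparatus plays a song it does not have stored, such as a new release, it can receive the matching motion data from the center host computer 200 together with the music data. New songs can therefore be played at once, and their latest dance moves and choreography can be displayed at once in synchronization with the song.
[0128]
Also, because the motion data and the shape data are stored separately, several kinds of shape data can be provided for different kinds of video. Switching the shape data then changes the appearance of the figure dancing to the song considerably. For example, suppose two kinds of shape data are provided, one forming a male figure and one forming a female figure. For a single song, without changing the motion data, the apparatus can display either a man dancing or a woman dancing, or combine both and display them simultaneously.
[0129]
In addition, in the communication karaoke apparatus 100 of this embodiment, the music data and the motion data are completely separate, so motion data can easily be added later to existing music data. For example, motion data for existing music data can be created simply by playing that music data, having a person dance to it, and measuring the person's movements. The motion data created this way can then be transmitted, for example, from the center host computer 200 to the communication karaoke apparatus 100, which makes it easy to add motion data for existing songs. Existing music data therefore need not be re-created and can be used as is.
[0130]
Furthermore, in the communication karaoke apparatus 100 of this embodiment, motion data is generated by measuring, at a fixed measurement period, a person dancing to the song played at the reference tempo. The instantaneous video data generated from this motion data is displayed at a display period equal to that measurement period. When the song is played at the reference tempo, song playback and video display are therefore easy to synchronize, and the video of a singer or other figure dancing to the song is reproduced accurately.
[0131]
When the song is played slower than the reference tempo, interpolation frames are inserted between sync frames. Song playback and video display therefore stay synchronized, and the displayed video moves smoothly. When the song is played faster than the reference tempo, some playback frames are thinned out, so song playback and video display can easily be kept synchronized.
[0132]
In the communication karaoke apparatus 100 of this embodiment, the display period of each playback frame, that is, of each set of instantaneous video data, is an integer multiple of the period at which the monitor 90 displays one image frame. The video formed by the instantaneous video data from the object video generation unit 36 can therefore be displayed in step with the display period of the monitor 90. All of that video therefore reaches the monitor 90, and the displayed movement is smooth.
[0133]
Also, in the embodiment described above, the display period of each playback frame is shorter than the period of the sync signal. Song tempo is normally expressed as the number of quarter notes per minute, and even fast songs rarely exceed about 240 quarter notes per minute. The sync signal outputs one clock pulse per eighth note. A song at 240 quarter notes per minute therefore gives 8 eighth notes per second, and a sync signal period of 1/8 second. The display period of each playback frame, by contrast, is for example 1/15 second, which is shorter than the sync signal period.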
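The period comparison above can be checked with exact fractions. The helper names are hypothetical. The constants are the text's example values: one sync pulse per eighth note and a 1/15-second display period.

```python
from fractions import Fraction

SYNC_PER_QUARTER = 2            # one sync pulse per eighth note
FRAME_PERIOD = Fraction(1, 15)  # display period of one playback frame, seconds


def sync_period(tempo_bpm: int) -> Fraction:
    """Seconds between sync pulses at a tempo in quarter notes per minute."""
    return Fraction(60, tempo_bpm * SYNC_PER_QUARTER)


def frames_per_pulse(tempo_bpm: int) -> Fraction:
    """Number of playback frames that fit between consecutive sync pulses."""
    return sync_period(tempo_bpm) / FRAME_PERIOD
```

`sync_period(240)` is `Fraction(1, 8)` and `frames_per_pulse(240)` is `Fraction(15, 8)`. Almost two frames therefore fit between pulses even at that fast tempo. The ratio would fall to 1 only at 450 quarter notes per minute.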
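[0134]
Because the display period of each playback frame is shorter than the sync signal period, several playback frames can be displayed between successive clock pulses. For example, as shown in Fig. 9, four playback frames F1 to F4 can be displayed between the output of clock pulse 1 and the output of clock pulse 2. This has two consequences. When the tempo becomes faster than the reference tempo, song playback and video display can be synchronized easily and accurately by thinning out some playback frames, as in Fig. 11. When the tempo becomes slower than the reference tempo, inserting interpolation frames between playback frames, as shown in Fig. 10, makes each clock pulse coincide accurately with the display of each sync frame. In both cases, song playback and video display can be synchronized easily and accurately.
[0135]
Also, in the communication karaoke apparatus 100 of this embodiment, each clock pulse of the sync signal carries an identification code that identifies the song's playback position. The playback position is therefore always known, and playing from the middle of a song, fast-forwarding, rewinding, and similar operations are easy to perform.
[0136]
As noted above, several kinds of shape data are stored in the shape data storage unit 35. They can be selected according to the type of song, such as its mood or whether the singer is male or female. For example, if the music data contains selection data indicating which shape data to use, a standard singer for that song can be selected automatically. If shape data can also be selected through the input unit 17, the user can choose a singer or other figure to suit their preference. Selecting several singers lets the monitor 90 display multiple singers side by side, all performing the same movements.
[0137]
In the embodiment above, interpolation frames are inserted immediately before sync frames, but the invention is not limited to this. They may instead be inserted between playback frames that are not sync frames, or immediately after sync frames. When the tempo is much slower than the reference tempo, several interpolation frames may have to be inserted in succession. In that case, their insertion positions are spread out, which makes the displayed object move more smoothly. Similarly, when the tempo is much faster than the reference tempo, several playback frames may have to be thinned out in succession. In that case, the frames to be thinned out are spread out, which again makes the object move more smoothly. The spreading of interpolation frames or thinned-out frames may be computed during song playback, or beforehand, for example when the tempo is set before playback.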
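One way to spread interpolation frames or thinned-out frames is to pick slots at evenly spaced midpoints. The text does not specify a method, so the function below is an assumption, and its name is hypothetical. The "slots" are either the gaps between the playback frames of one interval (when inserting) or the non-sync frames of one interval (when thinning out).

```python
from typing import List


def spread(n_slots: int, k: int) -> List[int]:
    """Pick k distinct slot indices in range(n_slots), spaced as evenly as possible.

    The i-th choice is the floor of the midpoint of the i-th of k equal sub-ranges."""
    if not 0 <= k <= n_slots:
        raise ValueError("k must be between 0 and n_slots")
    return [(2 * i + 1) * n_slots // (2 * k) for i in range(k)]
```

For example, `spread(6, 2)` returns `[1, 4]` and `spread(10, 3)` returns `[1, 5, 8]`. The chosen slots are always at least one apart, so inserted or dropped frames never bunch together, which is the smoothness goal described above.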
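[0138]
In the embodiment above, the display period is 1/15 second, but the invention is not limited to this. If conditions such as the amount of motion data and the time needed to generate each image allow it, the display period may be shorter than 1/15 second so that movement is reproduced more smoothly. For example, each frame of the monitor 90 in the embodiment consists of two fields, so one field lasts 1/60 second, and the display period may be set to 1/60 second.
[0139]
In the embodiment above, the shape data is stored in the shape data storage unit 35 in the video playback unit 30, but the invention is not limited to this; for example, the shape data may be stored in the RAM 12. Likewise, the motion data is stored in the video data storage unit 34, but it may instead be stored in the music data storage unit 16 together with the music data. The shape data, motion data, and music data need not be stored in separate storage devices. It is enough that each can be handled as an independent file that can be accessed, added, changed, or deleted on its own.
[0140]
In the embodiment above, video of a human such as a singer is displayed, but the invention is not limited to this; animals, or imitations of humans or animals, may be displayed instead. For example, video of animal characters such as dogs or cats dancing like humans may be displayed.
[0141]
In the embodiment above, the shape data is stored in the shape data storage unit 35, but the invention is not limited to this. The shape data may be added or changed using a CD-ROM or similar medium. It may also be received from the center host computer 200 and added, changed, or deleted as needed. Because the shape data itself is relatively small, it can be transmitted from the center host computer 200 to the communication karaoke apparatus 100 in a short time, for example whenever the apparatus is not in use.
[0142]
Furthermore, the embodiment above applies the synchronized video generation method to a communication karaoke apparatus, but the invention is not limited to this. It can also be applied, for example, to a device that synchronizes aerobics music with video of an aerobics instructor, or to a device that synchronizes an announcer's voice with sign-language video.
[0143]
[Effects of the Invention]
As described in detail above, the invention of claim 1 can display video of a human, animal, or similar figure moving in synchronization with audio.
[0144]
According to the invention of claim 2, even when audio playback is slowed down, the displayed video moves smoothly while staying synchronized with the audio.
[0145]
According to the invention of claim 3, even when audio playback is sped up, video moving in synchronization with the audio can easily be displayed.
[0146]
According to the invention of claim 4, the shape and other attributes of the displayed object can be set or changed according to the song or the user's preference.
[0147]
According to the invention of claim 5, video of a human, animal, or similar figure moving in synchronization with audio can be displayed. In addition, audio playback and video display stay synchronized even when the playback speed of the audio data is changed.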
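[Brief Description of the Drawings]
[Fig. 1] A block diagram of a communication karaoke apparatus according to an embodiment of the present invention.
[Fig. 2] An explanatory diagram of the structure of the music data in the embodiment.
[Fig. 3] An explanatory diagram of the structure of the video data in the embodiment.
[Fig. 4] An explanatory diagram of the structure of the motion data in the embodiment.
[Fig. 5] An explanatory diagram of the object model in the embodiment.
[Fig. 6] An explanatory diagram of the structure of the shape data in the embodiment.
[Fig. 7] A perspective view of a component of the object model in the embodiment.
[Fig. 8] An explanatory diagram of the playback frame table in the embodiment.
[Fig. 9] A time chart of the MIDI clock, sync signal, and playback frames in synchronization while a song is played at the reference speed, in the embodiment.
[Fig. 10] A time chart of the MIDI clock, sync signal, and playback frames in synchronization while a song is played at a tempo slower than the reference speed, in the embodiment.
[Fig. 11] A time chart of the MIDI clock, sync signal, and playback frames in synchronization while a song is played at a tempo faster than the reference speed, in the embodiment.
[Fig. 12] A flowchart of the synchronized video generation process performed by the communication karaoke apparatus of the embodiment.
[Fig. 13] A flowchart of the synchronized video generation process, continued from Fig. 12.
[Fig. 14] A flowchart of the synchronized video generation process, continued from Fig. 13.
[Fig. 15] An explanatory diagram of video displayed on the monitor of the communication karaoke apparatus of the embodiment.
[Explanation of Reference Numerals]
10 Karaoke performance unit
11 Audio CPU (audio playback means, sync signal output means)
12 RAM
16 Music data storage unit (audio data storage means)
17 Input unit (playback speed changing means)
19 Sound source unit (audio playback means)
21 Modem (data receiving means)
30 Video playback unit
31 Video CPU
32 ROM
34 Video data storage unit (motion data storage means)
35 Shape data storage unit (shape data storage means)
36 Object video generation unit (instantaneous video generation means, display means)
50 Compositing unit
60 Mixer amplifier
70 Speaker
90 Monitor (display device)
100 Communication karaoke apparatus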
BACKGROUND OF THE INVENTION
  The present invention relates to a synchronized video generation method for playing back a video that moves in synchronization with the playback of audio and a karaoke apparatus using the same, and more particularly, the shape and movement of an object made of a human being, an animal, or an imitation thereof. Based on the data obtained by measurement, a video representing the object is generated, and the video is displayed while being synchronized with audio playback.Video playback device andIt relates to karaoke equipment.
[0002]
[Prior art]
A karaoke apparatus that reproduces a song (accompaniment part) such as a song or pop and displays a background image dedicated to the song to be reproduced is generally known. Such a karaoke apparatus has a storage device using, for example, a VCD (Video CD) or an LD (Laser Disc), and the storage device stores audio data, background images, and lyrics for playing music. The image data for displaying is stored and held. Here, the audio data is, for example, PCM audio data, and the image data is data formed by synthesizing lyrics with a photographed scene. And this karaoke apparatus reads audio | speech data from the said memory | storage device, reproduces | regenerates a music, and simultaneously reads image data from the said memory | storage device, and displays a background image.
[0003]
By the way, the storage device provided in the karaoke apparatus uses a storage medium having a relatively large storage capacity such as VCD or LD. However, the number of songs used in karaoke is enormous, and the capacity of image data is particularly large, so that even with the storage medium, it is not possible to store audio data and image data corresponding to all songs. It was. In addition, since there are many popular songs used in karaoke, there is a problem that new songs must be added frequently. Therefore, in recent years, a so-called communication karaoke apparatus, which has a communication function in a karaoke apparatus and transmits voice data and lyrics data for reproducing music via a telephone line or the like, has been widely used.
[0004]
Because such a communication karaoke apparatus receives the audio data and the like for playing songs over a telephone line, the number of songs it can play is not limited by the capacity of its storage device, so an enormous number of songs can be played. Even when the latest song must be played, it can be played once its audio data has been received over the telephone line.
[0005]
However, the image data for displaying a background image is far larger than the audio data for playing a song, so transmitting the image data over a telephone line or the like in the same way as the audio data is very costly in time or money. For this reason, even a communication karaoke apparatus as described above displays background images in the same way as a conventional karaoke apparatus, namely based on image data stored in advance on a storage medium such as a VCD or LD. Since the number of background images that can be displayed is limited by the capacity of the storage medium, it is difficult to give every song its own background image. Such karaoke apparatuses therefore select and display a background image that suits the song being played but is not directly related to it.
[0006]
[Problems to be solved by the invention]
For karaoke songs with a strong rhythm in particular, it is desirable for the karaoke apparatus to display the movements that the singer or the like makes while singing the song, for example choreographed gestures or dancing.
[0007]
However, such movements match the mood of each song and therefore differ from song to song. If video data showing these movements were provided for every song, the amount of video data would become enormous and, even with data compression, would far exceed the capacity of the storage medium in the karaoke apparatus described above. It is therefore difficult to provide, for each song, image data showing the movements of the singer or the like.
[0008]
Furthermore, to reproduce such movements, the video data representing the movements of the singer or the like must be accurately synchronized with the song being played. A karaoke apparatus capable of displaying images that change in synchronization with the progress of a song is known, for example, from Japanese Patent Application Laid-Open No. 7-199976. That apparatus has video data consisting of polygon data and time data, and synchronizes video display with song playback based on the time data contained in the video data. However, if video data consisting of polygon data and time data were provided for each song, the total amount of video data would become enormous, so it would still be difficult to provide video data for each song.
[0009]
In addition, the number of songs used for karaoke is enormous, and the large body of audio data created to date for playing these songs has been distributed to already installed karaoke apparatuses and communication karaoke apparatuses or is stored on a center host computer or the like. To realize a karaoke apparatus that displays a singer or the like moving along with song playback, video data showing such movements would have to be added for these accumulated songs. Re-creating the enormous existing audio data merely to add video data would be very costly in time or money.
[0010]
Karaoke also requires songs to be played at various tempos (speeds). A karaoke apparatus therefore usually lets the user make the actual playback tempo slower or faster than the reference tempo. Accordingly, in a karaoke apparatus that displays a singer or the like moving along with song playback, the displayed singer must move naturally and smoothly even when the tempo of the song is changed.
[0011]
Furthermore, karaoke users need to start a song partway through for practice, or to repeat part of a song, so a karaoke apparatus usually has a function for starting playback partway through a song. Accordingly, in a karaoke apparatus that displays a singer or the like moving along with song playback, when a song is started partway through, the movements must be displayed so as to correspond to the part of the song being played and must stay accurately synchronized with the song from that point on.
[0012]
  The present invention has been made in view of the above problems, and an object thereof is to provide a video playback device and a karaoke apparatus capable of displaying video that moves in synchronization with audio playback and that moves differently for each piece of audio when there are multiple pieces of audio.
[0013]
  Another object of the present invention is to provide a video playback device and a karaoke apparatus capable of displaying video that moves in synchronization with audio played from existing audio data, without re-creating the audio data created to date.
[0014]
  A further object of the present invention is to provide a video playback device and a karaoke apparatus capable of moving the displayed video smoothly in synchronization with audio playback even when the audio playback speed is changed.
[0015]
  A further object of the present invention is to provide a video playback device and a karaoke apparatus capable of displaying video that moves accurately in synchronization with the audio even when the audio, such as a song, is played from partway through.
[0016]
[Means for Solving the Problems]
  To solve the above problems, the video playback device according to the first aspect of the present invention comprises: receiving means for receiving, from an audio playback device that plays audio data, a synchronization signal accompanied by reference playback speed information indicating the reference playback speed of the audio data and by an identification code indicating the playback position in the audio data; shape data storage means for storing shape data that defines the shape of each component of an object; video data storage means for storing video data in which motion data, defining the position or motion of each component for each playback frame (the unit of video playback), are arranged in playback order; table information generation means for generating, based on the received reference playback speed information and the playback period of the playback frames, table information indicating the correspondence between each identification code and its synchronization frame, i.e., the playback frame that should be played when the synchronization signal carrying that code is received; and video playback means for sequentially generating the playback frames from the shape data and the motion data and outputting them to a display device, determining from the table information and the received synchronization signal whether the playback frame to be output next is synchronized, and, if it is not, changing the playback frame to be output next to another playback frame.
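A minimal sketch of the table information generation means follows. It assumes, hypothetically, that the identification code is a counter that increments a fixed number of times per beat, that the reference playback speed is a tempo in beats per minute, and that a code maps to the frame due at or just before its time; `build_sync_table`, `codes_per_beat`, and the floor rounding are illustrative choices, not taken from the source.

```python
from fractions import Fraction
import math


def build_sync_table(reference_bpm, frame_period, num_codes, codes_per_beat=1):
    """Hypothetical table information: identification code -> sync frame index.

    Assumes code n is emitted n / codes_per_beat beats after the song starts
    when the song is played at its reference tempo.
    """
    seconds_per_code = Fraction(60) / (reference_bpm * codes_per_beat)
    table = {}
    for code in range(num_codes):
        elapsed = code * seconds_per_code                 # time of this code at reference tempo
        table[code] = math.floor(elapsed / frame_period)  # frame due at that moment
    return table


# 90 bpm with one frame every 1/15 s: each beat lasts 2/3 s, i.e. exactly 10 frames.
print(build_sync_table(90, Fraction(1, 15), 4))  # {0: 0, 1: 10, 2: 20, 3: 30}
```

Exact rational arithmetic keeps the mapping free of floating-point drift over a long song. Because the table is built from the reference tempo rather than the actual tempo, it would stay valid when the user changes the tempo: the synchronization signals then arrive at different wall-clock times but still carry the same codes.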
[0017]
  According to a second aspect of the present invention, in the video playback device according to the first aspect, when the synchronization signal has not been received and the playback frame to be output next is the synchronization frame, the video playback means changes the playback frame to be output next to an interpolation frame that interpolates between playback frames.
[0018]
  According to a third aspect of the present invention, in the video playback device according to the first or second aspect, when the synchronization signal is received and the playback frame to be output next is not the synchronization frame, the video playback means changes the playback frame to be output next to the synchronization frame.
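The decision rules of the second and third aspects can be sketched as a per-tick frame selector. This is a minimal illustration, assuming the video playback means is called once per display period with the identification code of any synchronization signal received since the previous call; `SyncedPlayer`, `tick`, and the tuple return values are hypothetical, and how the interpolation frame is actually blended is not shown.

```python
class SyncedPlayer:
    """Hypothetical frame selector for the video playback means."""

    def __init__(self, table):
        self.table = table                      # identification code -> sync frame
        self.sync_frames = set(table.values())  # frames that must wait for a signal
        self.position = -1                      # index of the last real frame output

    def tick(self, received_code=None):
        """Choose the output for one display period (e.g. every 1/15 s)."""
        candidate = self.position + 1
        if received_code is not None:
            # Third aspect: a signal arrived, so output its sync frame.
            # If candidate already equals it this is a normal step;
            # otherwise the player jumps forward or back.
            self.position = self.table[received_code]
            return ("frame", self.position)
        if candidate in self.sync_frames:
            # Second aspect: no signal yet but the next frame is a sync frame,
            # meaning the video is ahead of the audio; hold on an interpolation
            # frame between the current and next poses (blending not shown).
            return ("interp", self.position, candidate)
        self.position = candidate
        return ("frame", candidate)
```

With the table `{0: 0, 1: 10, 2: 20}`, a player that receives code 0 and then nine empty ticks outputs frames 0 to 9; on the next empty tick it returns `("interp", 9, 10)` instead of running past frame 10. If the user slows the tempo, the video therefore waits at each sync frame; if the tempo is raised or playback starts partway through, the received code makes the player jump straight to the matching sync frame.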
[0019]
  According to a fourth aspect of the present invention, in the video playback device according to any one of the first to third aspects, the shape data storage means stores a plurality of pieces of shape data, and the video playback means generates the playback frames based on the shape data corresponding to selection data received from outside the video playback device.
[0020]
    A fifth aspect of the present invention is a karaoke apparatus comprising the video playback device according to any one of the first to fourth aspects, the display device, and the audio playback device, wherein the audio playback device comprises: input means for inputting a user's instruction as an instruction signal; audio data storage means for storing a plurality of pieces of audio data, each including the reference playback speed information; audio playback means for playing the stored audio data; synchronization signal output means for outputting, in synchronization with playback of the audio data by the audio playback means, the synchronization signal accompanied by the identification code indicating the playback position in the audio data; and audio playback control means for outputting the reference playback speed information included in the audio data designated by the instruction signal input from the input means, causing the audio playback means to play that audio data at the playback speed indicated by the reference playback speed information, and changing the playback speed of the audio data in the audio playback means based on the instruction signal input from the input means.
[0049]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention are described below with reference to the drawings. In this embodiment, the communication karaoke apparatus 100 shown in FIG. 1 is described as an example of a karaoke apparatus using the synchronized video generation method according to the present invention.
[0050]
(1) Configuration and operation of the communication karaoke apparatus
First, the configuration and operation of the communication karaoke apparatus 100 are described.
[0051]
As shown in FIG. 1, the communication karaoke apparatus 100 comprises a karaoke performance unit 10 that plays a song composed of sound and generates a lyrics image representing the song's lyrics; a video playback unit 30 that generates an object video, such as a singer moving in synchronization with song playback; a background image generation unit 40 that generates a background image while the song plays; and a synthesis unit 50 that combines the lyrics image, the object video, and the background image. The karaoke performance unit 10 is connected to a center host computer 200 via a telephone line. It is also connected to a mixer amplifier 60 that mixes the sound of the song with the sound input from a microphone 80; a speaker 70 and the microphone 80 are connected to the mixer amplifier 60. A monitor 90, serving as a display device for showing the video and images output from the synthesis unit 50, is connected to the synthesis unit 50.
[0052]
The karaoke performance unit 10 comprises an audio CPU (Central Processing Unit) 11, a RAM (Random Access Memory) 12, a ROM (Read Only Memory) 13, a CD-ROM reading unit 14, and, connected via an interface circuit 15, a music data storage unit 16 serving as the audio data storage means, an input unit 17, a lyrics image generation unit 18, a sound source unit 19, and a FIFO (First In First Out) circuit 20. These are interconnected by a bus 22.
[0053]
The audio CPU 11 controls the karaoke performance unit 10 as a whole and performs automatic performance of songs (accompaniment parts) such as popular songs and pop music. More specifically, the audio CPU 11 automatically plays a song based on, for example, MIDI data conforming to the MIDI standard. The audio CPU 11 also has a timer and generates a synchronization signal, described later, based on the MIDI clock. The RAM 12 serves as a work area for the control processing of the audio CPU 11 and temporarily stores various data. The ROM 13 stores a control program that governs the operation of the karaoke performance unit 10.
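The MIDI standard sends its timing clock 24 times per quarter note. Assuming, hypothetically, that the audio CPU 11 emits one synchronization signal per quarter note and numbers the signals consecutively as identification codes, deriving them from the MIDI clock could look like this; the class and method names are illustrative only.

```python
MIDI_CLOCKS_PER_QUARTER = 24  # fixed by the MIDI standard


class SyncSignalGenerator:
    """Hypothetical derivation of numbered sync signals from MIDI clocks."""

    def __init__(self, clocks_per_signal=MIDI_CLOCKS_PER_QUARTER):
        self.clocks_per_signal = clocks_per_signal
        self.clock_count = 0
        self.next_code = 0

    def on_midi_clock(self):
        """Call once per MIDI clock; returns an identification code or None."""
        code = None
        if self.clock_count % self.clocks_per_signal == 0:
            code = self.next_code  # a sync signal is due on this clock
            self.next_code += 1
        self.clock_count += 1
        return code
```

Because the MIDI clock speeds up or slows down with the tempo actually being played, signals generated this way follow the tempo the user selected rather than the reference tempo.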
[0054]
The CD-ROM reading unit 14 reads music data, video data, and the like, described later, from a CD-ROM, which can be loaded from outside and stores such data. Music data read by the CD-ROM reading unit 14 is transferred to the music data storage unit 16; alternatively, it may be transferred to the RAM 12 and played immediately by the audio CPU 11 and the sound source unit 19. The music data storage unit 16 is, for example, a hard disk and stores music data for about 2,000 songs. It is rewritable and can additionally store music data received from the center host computer 200 via the modem 21 and music data transferred from the CD-ROM reading unit 14.
[0055]
The input unit 17 is used to input control instructions such as selecting the song to be played, setting the playback tempo and key, setting viewpoint information, light source information, and the like, and fast-forwarding or rewinding the song. The lyrics image generation unit 18 includes, for example, an OSD (On Screen Display) circuit and generates the lyrics image displayed on the monitor 90 while the song plays. More specifically, as described later, the music data contains audio data for playing the song and lyrics data for generating the song's lyrics image, and the lyrics image generation unit 18 generates the lyrics image from the lyrics data contained in the music data.
[0056]
The sound source unit 19 synthesizes sound based on the audio data contained in the music data. For example, the music data is MIDI data for automatic performance conforming to the MIDI standard, and the sound source unit 19 is a synthesizer or the like that generates musical tones from the MIDI data. The FIFO circuit 20 acts as a buffer between the bus 22 and the bus of the video playback unit 30, and passes the synchronization signal output by the audio CPU 11 to the video playback unit 30.
[0057]
The modem 21 is connected to the center host computer 200 via a telephone line, and receives and demodulates data sent from the center host computer 200. The center host computer 200 stores a large number of music data sets and the video data corresponding to each. When the communication karaoke apparatus 100 needs to play a song, such as a newly released hit, that is stored neither in the music data storage unit 16 nor on the CD-ROM loaded in the CD-ROM reading unit 14, the music data and the like are sent from the center host computer 200 to the apparatus. The modem 21 then receives and demodulates the music data and transfers it to the RAM 12 or the music data storage unit 16.
[0058]
The video playback unit 30 comprises a video CPU 31, a ROM 32, and, connected via an interface circuit 33, a video data storage unit 34 serving as the motion data storage means, a shape data storage unit 35 serving as the shape data storage means, and an object video generation unit 36 serving as the instantaneous video generation means. These are interconnected by a bus 37, which can exchange data with the bus 22 of the karaoke performance unit 10 via the FIFO circuit 20.
[0059]
Here, the video CPU 31 performs comprehensive control of the video playback unit 30. The ROM 32 stores a control program for determining the operation of the video reproduction unit 30, a control program for performing a synchronous video generation process to be described later, and the like.
[0060]
The video data storage unit 34 is, for example, a hard disk and stores video data. It is rewritable, so video data received from the center host computer 200 via the modem 21 or transferred from the CD-ROM reading unit 14 can be added and updated. The shape data storage unit 35 stores the shape data described later and consists of RAM, ROM, a hard disk, or the like.
[0061]
The object video generation unit 36 consists of, for example, an OSD circuit or the like. Based on the motion data contained in the video data stored in the video data storage unit 34 or elsewhere and on the shape data stored in the shape data storage unit 35, it generates instantaneous video data for reproducing, as video (a moving image), a moving object such as a singer dancing to the song. More specifically, as shown in FIG. 15, while a song plays, the communication karaoke apparatus 100 displays on the monitor 90 a lyrics image Im1 showing the song's lyrics, a background image Im2, and a video Im3 of a singer dancing to the song. Of these, the object video generation unit 36 generates the instantaneous video data for displaying the video Im3. Each set of instantaneous video data forms one frame of the video Im3.
[0062]
The object video generation unit 36 has a creation memory 36A and a display memory 36B. The unit receives the motion data from the video data storage unit 34 and the shape data from the shape data storage unit 35, expands both in the creation memory 36A, and generates instantaneous video data there. It then transfers the instantaneous video data from the creation memory 36A to the display memory 36B. The display memory 36B is a so-called video memory that corresponds one-to-one with one frame of the image, so transferring the instantaneous video data into it forms one frame of an image representing an object such as a singer. The image formed in the display memory 36B is output to the synthesis unit 50, combined there with the lyrics image and the background image, and displayed on the monitor 90.
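The two-memory arrangement can be sketched as a simple off-screen buffer that is copied into video memory once a frame is complete. This is a minimal model under stated assumptions: one byte per pixel, and a caller-supplied `draw` function standing in for the actual rendering from motion and shape data; the class and method names are hypothetical.

```python
class ObjectVideoGenerator:
    """Hypothetical model of unit 36: creation memory 36A and display memory 36B."""

    def __init__(self, width, height):
        size = width * height                   # assumption: one byte per pixel
        self.creation_memory = bytearray(size)  # 36A: frame under construction
        self.display_memory = bytearray(size)   # 36B: frame currently shown

    def render(self, draw):
        """Build the next instantaneous frame without touching the visible one."""
        draw(self.creation_memory)

    def transfer(self):
        """Copy the finished frame into display memory in one step."""
        self.display_memory[:] = self.creation_memory
```

Keeping rendering out of the display memory presumably ensures that the monitor never shows a partly drawn pose: the visible frame changes only when `transfer` is called, once per display period.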
[0063]
The background image generation unit 40 is configured by, for example, an LD playback device and the like, and generates a background image Im2 to be displayed on the monitor 90. Specifically, the background image generation unit 40 reads image data for forming a background image recorded on an LD (Laser Disc), generates a background image based on the image data, and generates the generated background. The image is output to the synthesis unit 50.
[0064]
The monitor 90 is configured by, for example, a CRT (Cathode-Ray Tube) display or a liquid crystal display.
[0065]
With the communication karaoke apparatus 100 described above, karaoke begins with the user operating the input unit 17 to select a song, set the tempo and so on, and instruct the apparatus to start playback. The music data for the selected song is then retrieved from the music data storage unit 16 of the karaoke performance unit 10. If the music data storage unit 16 has no music data for the selected song, the music data is retrieved from the CD-ROM loaded in the CD-ROM reading unit 14. If neither the music data storage unit 16 nor the CD-ROM has it, the apparatus requests the center host computer 200 to send the music data for the selected song, and the music data is received from the center host computer 200 via the modem 21.
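The three-stage lookup can be sketched as follows. The stores are modeled, as an assumption, as dictionaries keyed by song number, and `request_from_host` is a hypothetical callable standing in for the modem transfer from the center host computer 200.

```python
def find_music_data(song_number, storage_unit, cdrom, request_from_host):
    """Look up music data in the order the text describes.

    storage_unit and cdrom are dict-like stores keyed by song number
    (assumption); cdrom may be None when no disc is loaded.
    """
    if song_number in storage_unit:
        return storage_unit[song_number], "music data storage unit 16"
    if cdrom is not None and song_number in cdrom:
        return cdrom[song_number], "CD-ROM"
    # Neither local source has the song: ask the center host computer.
    return request_from_host(song_number), "center host computer 200"
```

The same order (local storage, then CD-ROM, then host) applies to video data, with the video data storage unit 34 in place of the music data storage unit 16.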
[0066]
The music data retrieved from the music data storage unit 16 or the CD-ROM, or sent from the center host computer 200, contains audio data for playing the song and lyrics data for generating the lyrics image. The music data is therefore split into audio data, which is transferred to the RAM 12, and lyrics data, which is transferred to the lyrics image generation unit 18.
[0067]
Subsequently, the audio CPU 11 performs an automatic performance based on the audio data transferred to the RAM 12. Thereby, the sound synthesized by the sound source unit 19 is output to the speaker 70 via the mixer amplifier 60, and the music is played. In parallel with this, the lyric image generation unit 18 generates a lyric image based on the lyric data and outputs it to the synthesis unit 50. At this time, the audio CPU 11 outputs a synchronization signal corresponding to the performance of the music to the video reproduction unit 30.
[0068]
In parallel, the video data for the selected song is retrieved from the video data stored in the video data storage unit 34 of the video playback unit 30. If the video data storage unit 34 has no video data for the selected song, the video data is retrieved from the CD-ROM loaded in the CD-ROM reading unit 14. If neither has it, the apparatus requests the center host computer 200 to send the video data for the selected song, and the video data is received from the center host computer 200 via the modem 21.
[0069]
Next, the motion data is extracted from the video data retrieved from the video data storage unit 34 or the CD-ROM, or sent from the center host computer 200, and is transferred, together with the shape data stored in the shape data storage unit 35, to the creation memory 36A of the object video generation unit 36. The object video generation unit 36 then generates instantaneous video data forming one frame of a video representing an object such as a singer, and outputs it to the synthesis unit 50. In doing so, the object video generation unit 36 outputs the instantaneous video data in synchronization with the synchronization signal that the audio CPU 11 sends to the video playback unit 30.
[0070]
In parallel, a background image corresponding to the selected song is output from the background image generation unit 40 to the synthesis unit 50. The synthesis unit 50 combines the lyrics image from the lyrics image generation unit 18, the video of the singer from the object video generation unit 36, and the background image from the background image generation unit 40, and outputs the result to the monitor 90.
[0071]
As a result, the selected music is played back, and a video of a singer or the like dancing in synchronization with the music playback is displayed on the monitor 90 together with the lyrics image and the background image.
[0072]
(2) Composition of music data and video data
Next, the composition of music data and video data will be described.
[0073]
FIG. 2 shows the structure of the music data. The music data consists of a header Hm; a song number Nm identifying the song; tempo data Tm indicating the speed at which the song is played; background data Bm, a number designating the background image for the song; and a plurality of performance data Pm.
[0074]
Further, each performance data Pm is composed of a block made up of audio data Sd and time data Tsd, and a block made up of lyrics data Wd and time data Twd. The audio data Sd is data for playing back music, and is composed of, for example, MIDI data. More specifically, the audio data Sd includes a command for sounding, a command for stopping the sound, a command for designating which sound is to be played, and the like. The time data Tsd arranged next to the audio data Sd is data used to control the time (timing) for executing a command based on the audio data. On the other hand, the lyric data Wd is data for generating a lyric image. The time data Twd arranged next to the lyrics data Wd is data used to control the time (timing) for generating the lyrics image.
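The layout of FIG. 2, together with the later split of music data into audio data and lyrics data, can be modeled with dataclasses. The field types, the tick unit for the time data, and the bpm unit for Tm are assumptions for illustration; the source does not specify encodings.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PerformanceData:      # one Pm
    audio: bytes            # Sd: e.g. MIDI note-on/note-off commands
    audio_time: int         # Tsd: timing of the audio command (unit assumed: ticks)
    lyrics: str             # Wd: text used to build the lyrics image
    lyrics_time: int        # Twd: timing of the lyrics display (unit assumed: ticks)


@dataclass
class MusicData:
    header: bytes           # Hm
    song_number: int        # Nm
    tempo: int              # Tm: reference tempo (assumed to be in bpm)
    background: int         # Bm: number of the background image
    performance: List[PerformanceData] = field(default_factory=list)


def split_music_data(music: MusicData) -> Tuple[list, list]:
    """Separate audio (sent to RAM 12) from lyrics (sent to lyrics image unit 18)."""
    audio = [(p.audio, p.audio_time) for p in music.performance]
    lyrics = [(p.lyrics, p.lyrics_time) for p in music.performance]
    return audio, lyrics
```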
[0075]
FIG. 3 shows the structure of video data. The video data is composed of a header Hp, a music number Np that is a number for identifying the corresponding music, an operation data number Dp, a plurality of operation data Mp, and the like.
[0076]
(3) Structure of motion data and shape data
Next, the configuration of operation data and shape data will be described. As described above, the motion data and the shape data are data used to generate instantaneous video data for forming a video of a singer or the like dancing in synchronization with the reproduction of the music.
[0077]
The motion data divides a video representing an object, such as a human being, an animal, or an imitation of either, into a plurality of components and sets the position and rotation of each component. As shown in FIG. 3, the video data contains a plurality of motion data entries, each of which (instantaneous motion data) sets the motion (position and rotation) for one frame of the video. Because the motion data forms part of the video data, it is set individually for each song, and the video data can be received from the center host computer via a telephone line.
[0078]
More specifically, the motion data is data that divides an image representing an object made up of a human being, an animal, or a mimic thereof into a plurality of components and sets the positions or motions of these components.
[0079]
For example, as shown in FIG. 5, a human model imitating a person is virtually defined and divided into components such as the waist, chest, head, arms, and legs. Each component is called a “level”. As shown in FIG. 5, the human model is divided into levels L1 to L17, of which level L1 corresponds to the waist and level L5 to the head. Level L1 can translate along the X, Y, and Z axes and can rotate about each of those axes. Levels L2 to L17 are each joined by a connecting portion R and can rotate about the X, Y, and Z axes at that connecting portion. The motion data records the position and rotation of each level.
[0080]
Here, FIG. 4 shows a configuration of operation data. As shown in FIG. 4, the operation data is composed of operation data of level 1 to level 17, and the operation data of level 1 includes position coordinates (X, Y, Z) and rotation angles (Xr, Yr, Zr), and each operation data of level 2 to level 17 is composed only of the rotation angle (Xr, Yr, Zr).
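The per-frame layout of FIG. 4 and the video data of FIG. 3 can be modeled as follows. The float types are an assumption, and `motion_count` assumes the motion data number Dp is the number of Mp entries, which the source does not state explicitly.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
NUM_LEVELS = 17  # levels L1 (waist) to L17, per FIG. 5


@dataclass
class FrameMotion:
    """Motion data for one frame (one Mp entry)."""
    position: Vec3          # level L1 only: X, Y, Z
    rotations: List[Vec3]   # levels L1 to L17: Xr, Yr, Zr

    def __post_init__(self):
        if len(self.rotations) != NUM_LEVELS:
            raise ValueError(f"expected {NUM_LEVELS} rotations, got {len(self.rotations)}")

    def value_count(self) -> int:
        """Number of scalar values stored for this frame."""
        return len(self.position) + 3 * len(self.rotations)


@dataclass
class VideoData:
    header: bytes              # Hp
    song_number: int           # Np: ties the video data to its music data
    motion_count: int          # Dp (assumed: number of Mp entries)
    motions: List[FrameMotion]
```

Each frame therefore carries 3 + 17 × 3 = 54 scalar values, which is what makes the motion data so much smaller than per-frame image data.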
[0081]
The motion data is created as follows. The song is played at the reference tempo Tm preset for it, and a person actually dances (dance steps or choreographed gestures) to the music. The position, rotation, and so on of each of the dancer's components are measured at a predetermined measurement period, for example every 1/15 second, and data on the coordinates of levels L1 to L17 are collected. Motion data is thus created every 1/15 second. This measurement period is the same as the display period described later.
[0082]
The operation data configured as described above has a smaller amount of data compared to the conventional case where complete image data is prepared for each frame. Therefore, the operation data can be individually set for each music piece and can be transmitted from the center host computer 200 to the communication karaoke apparatus 100 in a short time.
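To put the size difference in rough numbers, the sketch below assumes each of the 54 values per frame (3 position values plus 3 rotation values for each of 17 levels) is stored as a 16-bit integer, and compares it with uncompressed 640 × 480, 24-bit frames at the same 15 frames per second. Both storage formats and the 270-second song length are illustrative assumptions, not figures from the source.

```python
FRAME_RATE = 15                 # motion frames per second (1/15 s period)
VALUES_PER_FRAME = 3 + 17 * 3   # level L1 position + rotations of levels L1 to L17
BYTES_PER_VALUE = 2             # assumption: 16-bit fixed-point values

RAW_FRAME_BYTES = 640 * 480 * 3  # assumption: uncompressed 24-bit RGB frame


def motion_bytes(seconds: int) -> int:
    """Size of the motion data for a song of the given length."""
    return seconds * FRAME_RATE * VALUES_PER_FRAME * BYTES_PER_VALUE


def raw_video_bytes(seconds: int) -> int:
    """Size of uncompressed video at the same frame rate."""
    return seconds * FRAME_RATE * RAW_FRAME_BYTES


song = 270  # a 4 min 30 s song
print(motion_bytes(song))                           # 437400
print(raw_video_bytes(song) // motion_bytes(song))  # 8533
```

Under these assumptions the motion data for a whole song is well under half a megabyte, several thousand times smaller than uncompressed video, which is small enough to send over a telephone line with the music data.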
[0083]
The shape data divides an image representing an object, such as a human being, an animal, or an imitation of either, into a plurality of components and sets the shape of each component. It is stored in the shape data storage unit 35. Unlike the motion data, the shape data is not set individually for each song; it is mainly shared among many songs. Several types of shape data are provided according to the type of video to be displayed; for example, switching the shape data allows the singer's video to be displayed as an animal instead.
[0084]
The shape data is set for each of levels L1 to L17. FIG. 6 shows the shape data of level L1. As shown in FIG. 6, the shape data consists of vertex coordinate data, made up of vertex coordinates A1 to A8, and polygon link data, made up of polygon data P1 to P6. As shown in FIG. 7, the vertex coordinate data defines the three-dimensional shape of level L1 by specifying its vertex coordinates A1 to A8. The polygon link data defines the texture and characteristics of each surface of level L1. More specifically, each of the polygon data P1 to P6 consists of surface data and vertex numbers. The surface data includes the surface color, ambient, transparency, an indication of whether an image (texture) is pasted on the surface, and the like. The vertex numbers indicate the vertices that form each surface. The shape data of levels L2 to L17 is structured in the same way as that of level L1. The shape data also includes information on how the levels are connected and where, namely the coordinates of the connecting portions R, as shown in FIG.
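One level's shape data can be modeled as a vertex list plus polygons that refer to vertices by number. The sketch below assumes, for illustration only, that level L1 is a box (8 vertices A1 to A8, 6 faces P1 to P6); the RGB color, the transparency convention, and the `box_level` helper are hypothetical. Vertex numbers here are 0-based list indices rather than the 1-based labels A1 to A8.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class SurfaceData:
    color: Tuple[int, int, int]   # surface color (RGB, assumed)
    ambient: float                # ambient coefficient
    transparency: float           # 0.0 = opaque (assumed convention)
    textured: bool                # whether an image (texture) is pasted on


@dataclass
class PolygonData:                   # one of P1 to P6
    surface: SurfaceData
    vertex_numbers: Tuple[int, ...]  # indices into LevelShape.vertices


@dataclass
class LevelShape:
    vertices: List[Vec3]          # A1 to A8
    polygons: List[PolygonData]   # P1 to P6
    joint: Vec3                   # coordinates of the connecting portion R


def box_level(w: float, h: float, d: float, surface: SurfaceData,
              joint: Vec3 = (0.0, 0.0, 0.0)) -> LevelShape:
    """Hypothetical box-shaped level: 8 vertices, 6 quadrilateral faces."""
    # Vertex index = 4*ix + 2*iy + iz for corner (ix, iy, iz) in {0, 1}^3.
    vertices = [(x, y, z) for x in (0.0, w) for y in (0.0, h) for z in (0.0, d)]
    faces = [(0, 1, 3, 2), (4, 5, 7, 6),   # x = 0, x = w
             (0, 1, 5, 4), (2, 3, 7, 6),   # y = 0, y = h
             (0, 2, 6, 4), (1, 3, 7, 5)]   # z = 0, z = d
    polygons = [PolygonData(surface, f) for f in faces]
    return LevelShape(vertices, polygons, joint)
```

Storing surfaces as vertex numbers rather than repeated coordinates lets each corner be shared by the three faces that meet there, and swapping one `LevelShape` set for another changes the figure's appearance without touching the motion data.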
[0085]
(4) Music and video synchronization
Next, the synchronization of the music reproduced by the karaoke performance unit 10 of the communication karaoke apparatus 100 according to the present embodiment and the video of the singer or the like generated by the video reproduction unit 30 and displayed on the monitor 90 will be described.
[0086]
Songs are played mainly by the automatic performance function of the audio CPU 11. As described above, the audio CPU 11 performs automatic performance conforming to the MIDI standard based on the audio data, with the basic timing of the performance governed by the MIDI clock.
[0087]
A reference tempo is preset for each song and recorded in the music data as the tempo data Tm. The reference tempo differs from song to song: a relatively fast reference tempo is set for an up-tempo song such as rhythmic rock, and a relatively slow one for a slow song such as a ballad. Songs are normally played at the reference tempo. However, when a tempo change is entered through the input unit 17, the song is played slower or faster than the reference tempo according to the entered tempo.
[0088]
On the other hand, the video of the singer or the like is generated mainly by the video CPU 31 and the object video generation unit 36. As described above, the object video generation unit 36 generates, from the motion data and the shape data, instantaneous video data that forms one frame of video. That is, the object video generation unit 36 successively generates instantaneous video data in the creation memory 36A and transfers it to the display memory 36B at a predetermined display cycle. As a result, a video that changes at the predetermined display cycle is displayed on the monitor 90 via the synthesis unit 50.
[0089]
Here, the predetermined display cycle is the same as the measurement cycle used when the motion data was measured, for example, 1/15 second. That is, as described above, the motion data is generated by playing the music at the reference tempo Tm, having a person dance (perform a dance or choreography) to it, and measuring the position, rotation, and the like of each body part of the dancing person at a predetermined measurement cycle, for example, every 1/15 second. Therefore, by displaying the instantaneous video data generated from this motion data at a display cycle equal to the measurement cycle, the movement of the person at the time of measurement is reproduced as it was, and the video plays back naturally and smoothly.
[0090]
Further, the display cycle at which the object video generation unit 36 displays the instantaneous video data, that is, the measurement cycle at which the motion data is measured, is set to an integral multiple of the cycle at which the monitor 90 displays one frame. The monitor 90 is a CRT or liquid crystal display and displays one frame at a constant cycle; in the NTSC system, for example, one frame consisting of two fields is displayed every 1/30 second. In this case, the display cycle of the instantaneous video data is set to, for example, twice the frame cycle of the monitor 90, that is, 1/15 second, so the same image is displayed over two consecutive monitor frames (four fields).
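The integral-multiple condition can be checked with exact fractions. This is a minimal sketch assuming the example values above (1/15 second per instantaneous image, 1/30 second per NTSC frame); the function names are hypothetical. It models the hand-off between the creation memory 36A and the display memory 36B simply as each image being repeated on consecutive monitor frames.

```python
from fractions import Fraction
from typing import List


def repeats_per_image(display_cycle: Fraction, monitor_frame_period: Fraction) -> int:
    """How many monitor frames each instantaneous image stays on screen.

    Raises ValueError unless display_cycle is an integral multiple of the
    monitor's frame period, the condition required in the text."""
    ratio = Fraction(display_cycle) / Fraction(monitor_frame_period)
    if ratio.denominator != 1 or ratio < 1:
        raise ValueError(f"{display_cycle} s is not an integral multiple of {monitor_frame_period} s")
    return int(ratio)


def monitor_schedule(num_images: int, display_cycle: Fraction,
                     monitor_frame_period: Fraction) -> List[int]:
    """Index of the instantaneous image shown on each successive monitor frame.

    The generator copies a new image from creation memory (36A) to display
    memory (36B) once per display cycle, while the monitor re-reads display
    memory every frame, so each image appears on k consecutive frames."""
    k = repeats_per_image(display_cycle, monitor_frame_period)
    return [image for image in range(num_images) for _ in range(k)]
```

With a 1/20-second display cycle the ratio to 1/30 second would be 3/2, so some images would be held for two monitor frames and others for one, giving uneven motion; the sketch rejects that case.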
[0091]
Music playback and video display are carried out on the basis of a synchronization signal output from the audio CPU 11 to the video CPU 31 and a playback frame table Ts, described later. The audio CPU 11 generates the synchronization signal from the MIDI clock. The MIDI clock outputs, for example, 24 clock pulses per quarter note, so its rate follows the tempo of the music. The synchronization signal is obtained by frequency-dividing the MIDI clock so that, for example, one clock pulse is output per eighth note, that is, one pulse for every 12 MIDI clock pulses.
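The relationship between the MIDI clock and the synchronization signal can be sketched as follows. The 24-pulses-per-quarter-note rate and the one-pulse-per-eighth-note division come from the text, and dividing by 12 follows from them; the function names and the choice of which MIDI pulse starts each sync pulse are assumptions.

```python
from fractions import Fraction
from typing import Iterator, Tuple

MIDI_PPQN = 24      # MIDI clock pulses per quarter note
SYNC_DIVIDER = 12   # 24 / 12 -> one synchronization pulse per eighth note


def midi_clock_period(bpm: int) -> Fraction:
    """Seconds between MIDI clock pulses at `bpm` quarter notes per minute."""
    return Fraction(60) / (Fraction(bpm) * MIDI_PPQN)


def sync_period(bpm: int) -> Fraction:
    """Seconds between synchronization pulses (one per eighth note)."""
    return midi_clock_period(bpm) * SYNC_DIVIDER


def sync_pulses(midi_pulse_count: int) -> Iterator[Tuple[int, int]]:
    """Divide a run of MIDI clock pulses into the synchronization signal.

    Yields (identification_code, midi_pulse_index) pairs; codes run 1, 2, 3, ...
    Emitting a sync pulse on MIDI pulses 0, 12, 24, ... is an assumption."""
    code = 0
    for i in range(midi_pulse_count):
        if i % SYNC_DIVIDER == 0:
            code += 1
            yield code, i
```

For example, at 120 quarter notes per minute the MIDI clock ticks every 1/48 second and the synchronization signal every 1/4 second. Because the signal is derived from the MIDI clock, a tempo change automatically changes the synchronization period as well.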
[0092]
Further, as shown in FIG. 9, each clock pulse of the synchronization signal carries an identification code that indicates the playback position in the music. The identification codes run 1, 2, 3, ..., increasing by one per pulse.
[0093]
Furthermore, as shown in FIG. 8, the playback frame table Ts describes the temporal correspondence between each clock pulse of the synchronization signal and the frames of the video. The instantaneous video data output from the object video generation unit 36 changes every 1/15 second, and each item forms one frame of the video displayed on the monitor 90. Each frame of video displayed on the monitor 90 (hereinafter, a "playback frame") is arranged in playback order at a predetermined playback cycle t (1/15 second), as shown in FIG., and is assigned a frame number F1, F2, ..., F11, and so on. The period of the synchronization signal is calculated from the reference tempo of the music to be played, and from that period it is determined which of the playback frames F1 to F11 and so on, arranged in playback order, coincides in time with each clock pulse, identified by its identification code. The result is recorded as the playback frame table Ts, which is created immediately before playback of the music starts.
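The table construction can be sketched as follows, under two assumptions not stated in the text: the synchronization period is derived from the reference tempo as one pulse per eighth note, and each pulse is paired with the playback frame that is on screen when it fires (rounding down). With a hypothetical reference tempo of 135 quarter notes per minute, this reproduces the pairing 1→F1, 2→F4, 3→F7, 4→F11 given for FIG. 8; the tempo actually behind FIG. 8 is not stated. Exact fractions are used so that a pulse falling exactly on a frame boundary is not misassigned by floating-point error.

```python
from fractions import Fraction
from math import floor
from typing import Dict

FRAME_PERIOD = Fraction(1, 15)  # playback cycle t in seconds (example value from the text)


def sync_period(bpm: int) -> Fraction:
    """Seconds per synchronization pulse: one pulse per eighth note."""
    return Fraction(60) / (2 * Fraction(bpm))


def build_frame_table(bpm: int, num_frames: int) -> Dict[int, int]:
    """Build the playback frame table Ts: pulse identification code -> frame number.

    At the reference tempo, pulse k fires (k - 1) * sync_period seconds after
    playback starts. It is paired with the frame on screen at that instant,
    floor(elapsed / t) + 1 (rounding down is an assumption)."""
    period = sync_period(bpm)
    table: Dict[int, int] = {}
    k = 1
    while True:
        frame = floor((k - 1) * period / FRAME_PERIOD) + 1
        if frame > num_frames:
            return table
        table[k] = frame
        k += 1
```

At 135 quarter notes per minute a pulse occurs every 2/9 second, which is 10/3 playback cycles, so the listed frames advance by 3, 3, and then 4.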
[0094]
Next, how music playback and video display are synchronized on the basis of the synchronization signal and the playback frame table Ts described above will be explained with reference to FIGS.
[0095]
(1) When the music is played at the reference tempo
First, when the music is played at the reference tempo, as shown in FIG. 9, the first playback frame F1 is displayed as music playback starts, and the subsequent playback frames F2 to F11 and so on are then displayed in playback order in parallel with music playback. That is, in parallel with music playback, the object video generation unit 36 generates the instantaneous video data forming each playback frame and outputs it to the synthesis unit 50 every playback cycle t. The synthesis unit 50 outputs the synthesized image data to the monitor 90 in synchronization with the display cycle of the monitor 90.
[0096]
Here, the motion data (instantaneous motion data) used to form the instantaneous video data was generated by measuring the movement of a person dancing to the music played at the reference tempo. Therefore, when the music is played at the reference tempo, starting the music and the video at the same time and running them in parallel keeps them synchronized. However, because the playback tempo may be changed during playback, or the music may be paused and resumed, the temporal correspondence between music playback and the display of each playback frame must be tracked at all times. The communication karaoke apparatus 100 therefore tracks this correspondence continuously using the synchronization signal and the playback frame table Ts. According to the playback frame table Ts shown in FIG. 8, clock pulse 1 of the synchronization signal corresponds to playback frame F1, clock pulse 2 to playback frame F4, clock pulse 3 to playback frame F7, and clock pulse 4 to playback frame F11. When the music is played at the reference tempo, the output of each clock pulse therefore coincides in time with the display of the playback frame listed for it in Ts, and the video is displayed while checking that each clock pulse and the frame being displayed match the table.
[0097]
(2) When the music is played slower than the reference tempo
Next, when the music is played slower than the reference tempo, interpolation frames G1, G2, and so on are inserted among the playback frames F1 to F8 and so on, as shown in FIG. Even when playback is slower than the reference tempo, each playback frame is still generated and displayed at the predetermined display cycle t, so if the frames were simply shown in playback order, the video would run ahead of the music and the two could not be synchronized. Interpolation frames G1, G2, and so on are therefore inserted among the playback frames F1 to F8 and so on so that the output of each clock pulse coincides in time with the display of the playback frame listed for it in Ts, and the correspondence between clock pulses and playback frames matches the table. The playback frames F1 to F8 and the interpolation frames G1 and G2 are then displayed in sequence.
[0098]
Here, the motion data used to generate the instantaneous video data forming the interpolation frames G1 and G2 (hereinafter, "interpolation motion data") is calculated from the motion data of the playback frame immediately before the insertion position (hereinafter, "preceding motion data") and the motion data of the playback frame immediately after it (hereinafter, "following motion data"). More specifically, when the video moves vigorously and the preceding and following motion data differ greatly, their average is used as the interpolation motion data; when the video moves slowly and the difference is small, either the preceding or the following motion data is used as is. As a result, even when the music is played slower than the reference tempo, music playback and video display are easily synchronized and the displayed video moves smoothly.
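The interpolation rule above (average the neighbours when the motion changes a lot, otherwise reuse one of them) can be sketched as follows. Representing motion data as a flat list of joint values, measuring the difference as the largest per-component change, the `threshold` parameter, and reusing the preceding rather than the following frame are all assumptions.

```python
from typing import List, Sequence


def interpolate_motion(prev: Sequence[float], nxt: Sequence[float],
                       threshold: float) -> List[float]:
    """Interpolation motion data for a frame inserted between two playback frames.

    prev / nxt: preceding and following motion data (hypothetical flat layout).
    If any component changes by more than `threshold`, the two are averaged;
    otherwise the preceding motion data is reused unchanged."""
    if len(prev) != len(nxt):
        raise ValueError("motion data must have the same length")
    diff = max((abs(a - b) for a, b in zip(prev, nxt)), default=0.0)
    if diff > threshold:
        return [(a + b) / 2 for a, b in zip(prev, nxt)]
    return list(prev)
```

Component-wise averaging of rotation angles ignores wrap-around (for example, between 179° and -179°); an implementation that stores angles would need to handle that case or interpolate rotations in another representation.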
[0099]
(3) When the music is played faster than the reference tempo
Next, when the music is played faster than the reference tempo, as shown in FIG. 11, some of the playback frames F1 to F13 and so on, such as F3, F6, F9, F10, and F13, are thinned out (skipped). Even when playback is faster than the reference tempo, each playback frame is still generated and displayed at the predetermined display cycle t, so if the frames were simply shown in playback order, the video would lag behind the music and the two could not be synchronized. The playback frames are therefore displayed with some of them thinned out, so that the output of each clock pulse coincides in time with the display of the playback frame listed for it in Ts and the correspondence between clock pulses and playback frames matches the table. Thus, even when the music is played faster than the reference tempo, music playback and video display are easily synchronized.
[0100]
(5) Synchronous video generation processing
Next, the synchronized video generation process will be described with reference to the flowcharts of FIGS. This process synchronizes music playback with video display as described above and is executed by the video CPU 31 and the object video generation unit 36 according to a control program stored in the ROM 32 of the video playback unit 30.
[0101]
First, when the user selects a piece of music with the input unit 17 and inputs a command to start playing it, the audio CPU 11 of the karaoke performance unit 10 transfers the music data for the selected piece from the music data storage unit 16 or the like to the RAM 12. The audio CPU 11 then extracts the tempo data from the music data, recognizes the reference tempo of the selected piece from it, and outputs reference tempo data to the video CPU 31. The audio CPU 11 also recognizes commands such as tempo changes input from the input unit 17 and configures music playback accordingly.
[0102]
While the audio CPU 11 configures music playback as described above, the video CPU 31 of the video playback unit 30 transfers the video data for the selected piece from the video data storage unit 34 or the like to the object video generation unit 36, and transfers the shape data from the shape data storage unit 35 to the object video generation unit 36. The video CPU 31 also outputs the reference tempo received from the audio CPU 11 to the object video generation unit 36. The object video generation unit 36 then starts the synchronized video generation program described below.
[0103]
First, in step 1 of FIG. 12, it is determined whether the object video generation unit 36 has received any of the following: the video data transferred from the video data storage unit 34, the shape data output from the shape data storage unit 35, or the reference tempo data output from the video CPU 31. If no data has been received, step 1 is repeated until some data is received.
[0104]
When shape data is received, the process goes from step 1 through step 2 to step 5, where the received shape data is expanded into the creation memory 36A of the object video generation unit 36, and the process then returns to step 1. When video data is received in step 1, the process goes from step 1 through steps 2 and 3 to step 5, where motion data is extracted from the video data and expanded into the creation memory 36A, and the process then returns to step 1. Each item of motion data is assigned a frame number and stored so that it can be accessed randomly by frame number; the frame numbers run from 1 to the number of data items Dp. When reference tempo data is received in step 1, the process goes from step 1 through steps 2 and 3 to step 4, where the determination is "YES"; the received reference tempo data is stored in a memory in the object video generation unit 36, and the process proceeds to step 6. When data other than shape data, video data, or reference tempo data is received in step 1, the process goes from step 1 through steps 2 and 3 to step 4, where the determination is "NO", and the process returns to step 1.
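Steps 1 to 7 amount to a small receive-and-dispatch loop. The class below is a hypothetical model of that flow; its names, the `kind` strings, and the representation of video data as a list of per-frame motion data are assumptions. Like the flowchart as described, it checks for completion (steps 6 and 7) only after reference tempo data arrives, so it assumes the tempo is sent last, which is the order given in [0102].

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ReceiveLoop:
    """Hypothetical model of steps 1 to 7 in FIG. 12."""
    expected_shapes: int   # number of shape-data items to wait for (e.g. one per level)
    expected_motion: int   # Dp, the number of motion-data items
    shapes: List[Any] = field(default_factory=list)
    motion: Dict[int, Any] = field(default_factory=dict)  # frame number -> motion data
    tempo: Optional[int] = None

    def receive(self, kind: str, payload: Any) -> bool:
        """Handle one received item; return True once step 8 may begin."""
        if kind == "shape":            # step 2 "YES" -> step 5: expand shape data
            self.shapes.append(payload)
            return False               # back to step 1
        if kind == "video":            # step 3 "YES" -> step 5: extract motion data
            for item in payload:       # number frames 1, 2, ... for random access
                self.motion[len(self.motion) + 1] = item
            return False               # back to step 1
        if kind == "tempo":            # step 4 "YES": store the reference tempo
            self.tempo = payload
            return self.complete()     # steps 6 and 7
        return False                   # step 4 "NO": ignore and return to step 1

    def complete(self) -> bool:
        return (len(self.shapes) >= self.expected_shapes
                and len(self.motion) >= self.expected_motion)
```

Storing motion data in a dictionary keyed by frame number mirrors the random access by frame number described above, which the later thinning step relies on when it jumps ahead to a synchronization frame.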
[0105]
In step 6, it is determined whether all of the shape data has been received. If not, the process returns to step 1 to wait for the remaining shape data; if so, the process proceeds to step 7.
[0106]
In step 7, it is determined whether all of the video data (motion data) for the selected piece has been received, that is, whether Dp items of motion data have been received. If not, the process returns to step 1 to wait for the remaining video data; if so, the process proceeds to step 8.
[0107]
In step 8, the viewpoint position and light source position of the video displayed on the monitor 90 are set to their default (initial) positions. The object video generation unit 36 can set the viewpoint position, that is, the angle and distance from which the object is viewed when its video is displayed on the monitor 90, so that the video of a singer or the like can be shown not only from the front but also from the side or back. With the default viewpoint position, the singer or the like is shown from the front. The object video generation unit 36 can also set the light source position, that is, the direction from which the object is lit when the video is displayed on the monitor 90, so that the singer or the like can be shown lit from, for example, the upper left, the upper right, or the front. With the default light source position, the singer or the like is shown lit from the upper left.
[0108]
In step 9, a playback frame table Ts as shown in FIG. is created from the reference tempo data received in step 4 and the number of data items Dp, that is, the number of motion data items Mp.
[0109]
Next, in step 10 of FIG. 13, the playback position of the music is recognized. Music is normally played from the beginning, but if the user specifies a playback position with the input unit 17, the music can be played from that position. In that case, data indicating the specified position is input to the object video generation unit 36. Because each clock pulse of the synchronization signal carries an identification code indicating the playback position, the music can be played from the specified position by identifying, from the position data, the identification code of the clock pulse that corresponds to that position.
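Because every sync pulse carries an identification code, a start position given as a code can be resolved to an offset into the music and the first playback frame to show. A minimal sketch, assuming evenly spaced pulses from the start of the piece and a table Ts like the one in FIG. 8; the function name is hypothetical.

```python
from fractions import Fraction
from typing import Dict, Tuple


def seek(table: Dict[int, int], sync_period: Fraction,
         pulse_code: int) -> Tuple[Fraction, int]:
    """Resolve a start position given as a sync-pulse identification code.

    Returns (offset into the music in seconds, first playback frame to show).
    The offset assumes pulse 1 fires at time 0 and pulses are `sync_period` apart."""
    if pulse_code not in table:
        raise KeyError(f"pulse {pulse_code} is not in the playback frame table")
    return (pulse_code - 1) * sync_period, table[pulse_code]
```

Starting from a listed synchronization frame means that music and video begin aligned, just as they do when playback starts from pulse 1 and frame F1.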
[0110]
In step 11, it is determined whether an instruction to start playing the music has been received. When the user inputs a start instruction with the input unit 17, it is transmitted to the object video generation unit 36, which determines "YES" in step 11 and proceeds to step 12. The karaoke performance unit 10 then starts playing the music, and the video playback unit 30 starts generating video, both from the playback position specified in step 10. If no start instruction has been received, "NO" is determined in step 11, and step 11 is repeated until one is received.
[0111]
In step 12, it is determined whether an instruction to end playback of the music has been received. When the user inputs an instruction to end (stop) playback with the input unit 17, it is transmitted to the object video generation unit 36, which determines "YES" in step 12, ends playback of the music, and ends the synchronized video generation process. Otherwise, "NO" is determined in step 12 and the process proceeds to step 13.
[0112]
In steps 13 and 14, it is determined whether viewpoint information or light source information has been received. The communication karaoke apparatus 100 can set and change, on user instruction, viewpoint information (which sets the viewpoint position) and light source information (which sets the light source position). When the user inputs viewpoint or light source information with the input unit 17, "YES" is determined in step 13 or step 14, the process proceeds to step 15, and the information is stored in a memory in the object video generation unit 36.
[0113]
In step 16, a playback frame is generated from the shape data and motion data expanded into the creation memory 36A of the object video generation unit 36 in step 5; that is, the instantaneous video data forming the playback frame is generated from the shape data and the motion data. Step 17 is then repeated until 1/15 second has elapsed since the playback frame was generated, after which, in step 18, it is determined whether a clock pulse of the synchronization signal has been received from the audio CPU 11.
[0114]
When a clock pulse of the synchronization signal is received in step 18, the process proceeds to step 22, where it is determined whether the playback frame to be displayed next, at the time the clock pulse is received, is a synchronization frame. A "synchronization frame" is a playback frame whose frame number is listed in the playback frame table Ts. As described above, Ts lists, for each clock pulse of the synchronization signal, the playback frame that coincides with it in time when the music is played at the reference tempo and the playback frames are displayed at the predetermined display cycle (for example, 1/15 second). Therefore, displaying the playback frame listed in Ts each time a clock pulse is received keeps music playback and video display synchronized; this is why such playback frames are called synchronization frames. According to the playback frame table in FIG. 8, for example, the synchronization frames are F1, F4, F7, F11, and so on. In FIGS. 9 to 11, the hatched playback frames are synchronization frames.
[0115]
If the determination in step 22 is that the playback frame to be displayed next is a synchronization frame when the clock pulse is received, the process proceeds to step 24 in FIG. 14 to display that synchronization frame.
[0116]
If instead the playback frame to be displayed next is not a synchronization frame when the clock pulse is received, the process proceeds to step 23, where that playback frame is discarded. In its place, the nearest synchronization frame following the discarded frame is generated from the instantaneous motion data that forms it, and the process proceeds to step 24 in FIG. 14 to display that synchronization frame.
[0117]
The situation in step 22 in which the frame to be displayed next is not a synchronization frame when a clock pulse arrives occurs when the music is played faster than the reference tempo. As shown in FIG. 9, at the reference tempo, displaying the playback frames in playback order makes the output of each clock pulse coincide with the display of a synchronization frame, so music playback and video display stay synchronized, and the frame to be displayed next when a clock pulse is received in step 18 is always a synchronization frame. In other words, the interval between clock pulses equals the interval between synchronization frames. By contrast, as shown in FIG. 11, when the music is played faster than the reference tempo, the interval between clock pulses is shorter than the interval between synchronization frames, so the frame to be displayed next when a clock pulse arrives may not be a synchronization frame. In that case the frame to be displayed next is thinned out: it is discarded, and the nearest synchronization frame following it is displayed instead.
As a result, even when the music is played faster than the reference tempo, the output of each clock pulse coincides in time with the display of each synchronization frame, and music playback and video display stay synchronized. In other words, the displayed video moves faster in proportion to the faster playback.
[0118]
If, when 1/15 second has elapsed since the playback frame was generated (step 17), no clock pulse of the synchronization signal has been received in step 18, the process proceeds to step 19, where it is determined whether the playback frame to be displayed next is a synchronization frame. If it is not, the process proceeds to step 24 in FIG. 14 to display that playback frame.
[0119]
If the playback frame to be displayed next is a synchronization frame, the process proceeds to step 20, where an interpolation frame is generated, and in step 21 the interpolation frame is inserted immediately before the synchronization frame to be displayed next. The process then proceeds to step 24 in FIG. 14 to display the inserted interpolation frame.
[0120]
The situation in step 19 in which the frame to be displayed next is a synchronization frame even though no clock pulse has been received occurs when the music is played slower than the reference tempo. As shown in FIG. 10, the interval between clock pulses is then longer than the interval between synchronization frames, so the video reaches the next synchronization frame before the corresponding clock pulse arrives. In this case, interpolation frames are inserted between synchronization frames so that the output of each clock pulse coincides with the display of each synchronization frame. Thus, even when the music is played slower than the reference tempo, music playback and video display stay synchronized; in other words, the displayed video slows down in proportion to the slower playback.
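The decisions in steps 17 to 23 can be simulated tick by tick. In this sketch, which is a model rather than the apparatus's control program, each display tick either shows the next playback frame, jumps to the synchronization frame listed in Ts for a newly received pulse (thinning out any skipped frames), or shows an interpolation frame "G" while waiting for a pulse. Assumed: exact fractional timing, and that a pulse counts as received for the frame whose display interval contains it, which matches the rounding-down used to pair pulses with frames when building Ts. On receiving a pulse the sketch jumps to that pulse's entry in Ts; when pulses arrive one per synchronization frame, this is the nearest synchronization frame after the discarded one, as described above.

```python
from fractions import Fraction
from typing import Dict, List, Optional

FRAME_PERIOD = Fraction(1, 15)  # display cycle t in seconds (example value from the text)


def schedule(table: Dict[int, int], sync_period: Fraction, num_ticks: int) -> List[str]:
    """Simulate steps 17 to 23: choose what is shown at each display tick.

    table: playback frame table Ts (pulse identification code -> frame number).
    sync_period: actual seconds between sync pulses at the current tempo.
    Returns labels such as "F4" (playback frame) or "G" (interpolation frame)."""
    if sync_period <= 0:
        raise ValueError("sync_period must be positive")
    sync_frames = set(table.values())
    shown: List[str] = []
    next_frame = 1  # next playback frame in playback order
    pulse = 1       # identification code of the next pulse not yet seen
    for n in range(num_ticks):
        end = (n + 1) * FRAME_PERIOD
        received: Optional[int] = None
        # Pulse k fires at (k - 1) * sync_period; collect those before this tick ends.
        while (pulse - 1) * sync_period < end:
            received = pulse
            pulse += 1
        if received is not None and received in table:
            # Step 22: show the synchronization frame listed for this pulse.
            # Frames between next_frame and the target are discarded (step 23).
            target = table[received]
            shown.append(f"F{target}")
            next_frame = target + 1
        elif received is None and next_frame in sync_frames:
            # Steps 19 to 21: a synchronization frame was reached before its
            # pulse (slow tempo), so show an interpolation frame and wait.
            shown.append("G")
        else:
            shown.append(f"F{next_frame}")
            next_frame += 1
    return shown
```

With the FIG. 8 table and hypothetical sync periods corresponding to 135, 108, and 162 quarter notes per minute (a reference tempo plus 80 % and 120 % of it), the sketch shows F1 to F11 in order, inserts an interpolation frame before F4 and before F7, and skips F3 and F10, respectively.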
[0121]
In step 24, the polygon coordinates in the display coordinate system are calculated from the motion data of the instantaneous video data forming the playback frame and from the viewpoint information, so that the video is rendered from the viewpoint position set by the viewpoint information input in step 13.
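Step 24 corresponds to a standard viewing transformation followed by perspective projection. The sketch below is one simple way to do it, assuming the viewpoint information is an azimuth around the vertical axis plus a distance (the text mentions viewing angle and distance but not how they are encoded); it omits elevation, roll, and clipping, and the function name is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def to_display_coords(vertices: Sequence[Vec3], azimuth_deg: float,
                      distance: float, focal: float = 1.0) -> List[Tuple[float, float]]:
    """Perspective-project vertices for a camera orbiting the figure.

    The camera stays at height 0, `distance` units from the origin, and looks
    at the origin. azimuth_deg = 0 puts it in front (+z side), 90 at the +x
    side, 180 behind. Returns image-plane (x, y) at focal length `focal`."""
    a = math.radians(azimuth_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    projected = []
    for x, y, z in vertices:
        # Rotate the scene by -azimuth about the vertical (y) axis so that the
        # camera ends up on the +z axis looking toward -z.
        xr = cos_a * x - sin_a * z
        zr = sin_a * x + cos_a * z
        depth = distance - zr  # distance from the camera along its view axis
        if depth <= 0:
            raise ValueError("vertex is at or behind the camera")
        projected.append((focal * xr / depth, focal * y / depth))
    return projected
```

Viewing the figure from behind (azimuth 180) mirrors it left to right, which is what a camera walking around the dancer would see.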
[0122]
In step 25, textures are mapped on the basis of the shape data, the light source information, and the viewpoint information. As a result, each polygon surface is given a surface pattern or texture that varies with the viewpoint position.
[0123]
In step 26, the polygons are shaded on the basis of the light source information and the viewpoint information. As a result, each polygon is given shading that depends on the direction of the light source.
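The text does not name a shading model. As a swapped-in example, flat Lambert (diffuse) shading computes one brightness per polygon from its normal and the direction to the light; the ambient floor of 0.2 and the vector used for the default upper-left light are assumptions. Lambert shading is view-independent, so the viewpoint information mentioned for step 26 would matter only for view-dependent terms such as specular highlights, which this sketch leaves out.

```python
import math
from typing import List, Sequence


def _unit(v: Sequence[float]) -> List[float]:
    """Normalize a vector to length 1."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0:
        raise ValueError("zero-length vector")
    return [c / length for c in v]


def lambert_shade(normal: Sequence[float], to_light: Sequence[float],
                  ambient: float = 0.2) -> float:
    """Flat Lambert shading for one polygon.

    normal: the polygon's surface normal; to_light: direction from the surface
    toward the light source. Returns a brightness factor in [ambient, 1]."""
    n, l = _unit(normal), _unit(to_light)
    diffuse = max(0.0, sum(a * b for a, b in zip(n, l)))
    return ambient + (1.0 - ambient) * diffuse


# Hypothetical encoding of the default light position: upper left, in front.
UPPER_LEFT = (-1.0, 1.0, 1.0)
```

With the light at the upper left, a polygon facing left comes out brighter than one facing right, which receives only the ambient level.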
[0124]
In step 27, the instantaneous video data generated in the creation memory 36A of the object video generation unit 36 is transferred to the display memory 36B. As a result, the video formed by the instantaneous video data is displayed on the monitor 90 via the synthesis unit 50.
[0125]
Thereafter, the process returns to step 12, and steps 13 to 27 are repeated until an instruction to end playback of the music is received in step 12. As a result, the music is played while the monitor 90 displays a video of a singer or the like dancing to it, as shown in FIG.
[0126]
As described above, the communication karaoke apparatus 100 of this embodiment can display on the monitor 90 a video of a singer or the like dancing in time with the music. In particular, because the motion data and the shape data used to generate the instantaneous video data are stored separately, providing relatively compact motion data for each piece while sharing the shape data among pieces makes it easy to generate, with little data, a video whose movements are unique to each piece.
[0127]
Furthermore, because the motion data is relatively small, it can be transmitted from the center host computer 200 over a telephone line in the same way as the music data. Thus, when a piece not stored in the communication karaoke apparatus 100, such as a new release, is to be played, its motion data can be received from the center host computer 200 together with its music data. The piece can therefore be played immediately, and its latest dance or choreography can be displayed in synchronization with it.
[0128]
Furthermore, because the motion data and the shape data are stored separately, providing several types of shape data according to the kind of video to be displayed makes it possible to change the appearance of the dancing figure to suit the music simply by switching the shape data. For example, if two types of shape data are provided, one forming a male figure and one forming a female figure, a video of a man dancing or of a woman dancing can be selected for the same piece without changing the motion data, or both can be combined and displayed simultaneously.
[0129]
Moreover, in the communication karaoke apparatus 100 of this embodiment, the music data and the motion data are completely separate, so motion data can easily be added to existing music data. For example, motion data for an existing piece can be generated by playing the existing music data, having a person dance to it, and measuring that person's movement. If the motion data generated in this way is transmitted from the center host computer 200 to the communication karaoke apparatus 100, motion data for existing music data can be added easily. There is therefore no need to recreate existing music data, which can continue to be used effectively.
[0130]
Further, in the communication karaoke apparatus 100 of this embodiment, motion data is generated by measuring, at a predetermined measurement cycle, the movement of a person dancing to music played at the reference tempo, and the instantaneous video data generated from that motion data is displayed at a display cycle equal to the measurement cycle. Therefore, when the music is played at the reference tempo, music playback and video display are easily synchronized, and the video of a singer or the like dancing to the music is reproduced accurately.
[0131]
Further, when the music is played more slowly than the reference tempo, interpolation frames are inserted between the synchronization frames, so music playback and video display remain synchronized and the displayed video still moves smoothly. Conversely, when the music is played faster than the reference tempo, some of the playback frames are thinned out, so music playback and video display can again be synchronized easily.
[0132]
Moreover, in the communication karaoke apparatus 100 according to the present embodiment, the display period of each playback frame, that is, of each piece of instantaneous video data, is set to an integral multiple of the period at which the monitor 90 displays one frame image. The video formed by each piece of instantaneous video data generated by the object image generation unit 36 can therefore be displayed in step with the display cycle of the monitor 90. As a result, every image formed from that instantaneous video data is shown on the monitor 90, and the movement of the displayed video is smooth.
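The integral-multiple condition can be checked with exact rational arithmetic. This sketch assumes the monitor timing given in paragraph [0138] (one frame made of two 1/60-second fields, so 1/30 second per frame); the function name and the choice to raise an error are illustrative only.

```python
from fractions import Fraction

FIELD_PERIOD = Fraction(1, 60)           # one field of monitor 90, in seconds
MONITOR_FRAME_PERIOD = 2 * FIELD_PERIOD  # one frame = two fields = 1/30 s


def refreshes_per_playback_frame(display_cycle, monitor_period=MONITOR_FRAME_PERIOD):
    """How many monitor periods each playback frame stays on screen.

    Raises ValueError if display_cycle is not a positive integral multiple of
    monitor_period, since frame changes would then not line up with refreshes.
    """
    ratio = Fraction(display_cycle) / monitor_period
    if ratio.denominator != 1 or ratio < 1:
        raise ValueError(
            f"{display_cycle} s is not an integral multiple of {monitor_period} s")
    return ratio.numerator


print(refreshes_per_playback_frame(Fraction(1, 15)))  # 2
```

With the 1/15-second display cycle of the embodiment, each playback frame is held for exactly two monitor frames; a cycle such as 1/20 second would be rejected.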
[0133]
Further, according to the above-described embodiment, the display cycle for each playback frame is shorter than the cycle of the synchronization signal. The tempo of a piece of music is usually expressed as the number of quarter notes per minute, and even for fast music this is at most about 240. Since the synchronization signal outputs one clock pulse per eighth note, its cycle is 1/8 second when music at 240 quarter notes per minute is played. The display cycle for each playback frame, by contrast, is 1/15 second, for example. The display cycle for each playback frame is therefore shorter than the cycle of the synchronization signal.
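The arithmetic in this paragraph can be reproduced directly. The helper names below are hypothetical; the only assumptions are the ones stated in the text: one sync clock pulse per eighth note and a 1/15-second display cycle.

```python
from fractions import Fraction

DISPLAY_CYCLE = Fraction(1, 15)  # example display cycle for one playback frame (s)


def sync_period(quarter_notes_per_minute):
    """Seconds between sync clock pulses when one pulse is output per eighth note."""
    eighth_notes_per_second = Fraction(quarter_notes_per_minute) * 2 / 60
    return 1 / eighth_notes_per_second


def frames_per_pulse(quarter_notes_per_minute, display_cycle=DISPLAY_CYCLE):
    """Average number of playback frames displayed between consecutive pulses."""
    return sync_period(quarter_notes_per_minute) / display_cycle


print(sync_period(240))       # 1/8
print(frames_per_pulse(240))  # 15/8
```

So even at about 240 quarter notes per minute, slightly under two playback frames fall between pulses on average, and more do at slower tempos.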
[0134]
Thus, if the display cycle for each playback frame is shorter than the cycle of the synchronization signal, a plurality of playback frames can be displayed between successive clock pulses of the synchronization signal. For example, as shown in FIG. 9, four playback frames F1 to F4 can be displayed after clock pulse 1 is output and before the next clock pulse 2 is output. Because several playback frames fall between clock pulses, music playback and video display can be synchronized easily and accurately even when the tempo of the music is faster than the reference tempo, by thinning out some of the playback frames as shown in FIG. 11. Likewise, when the tempo is slower than the reference tempo, inserting an interpolation frame between playback frames, as shown in FIG. 10, keeps the output of each clock pulse aligned in time with the display of each synchronization frame, so music playback and video display can again be synchronized easily and accurately.
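The frame-selection rules described here (and in claims 2 and 3) can be simulated tick by tick. This is a minimal sketch under several assumptions: an integer number of playback frames between pulses at the reference tempo, pulse k's synchronization frame being frame k times that number, and interpolation frames represented by a placeholder marker rather than an actual interpolated pose. When a pulse arrives early, the sketch jumps straight to its synchronization frame, so the thinned frames are the ones just before it; the embodiment does not fix which frames are thinned.

```python
from bisect import bisect_right
from fractions import Fraction

INTERP = "interp"  # placeholder for an interpolation frame


def schedule(num_ticks, frames_per_pulse, pulse_times, display_cycle):
    """Return the frame shown at each display tick.

    frames_per_pulse: playback frames per pulse at the reference tempo (4 in FIG. 9);
        pulse k's synchronization frame is frame k * frames_per_pulse.
    pulse_times: sorted arrival times (s) of sync pulses 0, 1, 2, ...
    """
    shown = []
    nxt = 0           # playback frame due next
    last_synced = -1  # highest pulse whose synchronization frame has been shown
    for tick in range(num_ticks):
        # Highest pulse id received by this tick (-1 if none yet).
        received = bisect_right(pulse_times, tick * display_cycle) - 1
        if received > last_synced:
            # Claim 3 rule: a pulse has arrived, so show its synchronization
            # frame now, skipping (thinning) any frames not yet shown.
            nxt = received * frames_per_pulse
            last_synced = received
        elif nxt % frames_per_pulse == 0:
            # Claim 2 rule: the next frame is a synchronization frame whose pulse
            # has not arrived, so hold with an interpolation frame instead.
            shown.append(INTERP)
            continue
        shown.append(nxt)
        nxt += 1
    return shown


cycle = Fraction(1, 15)
slow = [k * Fraction(1, 3) for k in range(10)]  # pulses every 5 ticks instead of 4
print(schedule(11, 4, slow, cycle))
```

At the reference tempo (a pulse every 4/15 second) the output is simply frames 0, 1, 2, ... in order. With pulses every 1/3 second an interpolation marker appears just before each synchronization frame, as in FIG. 10; with pulses every 1/5 second frames 3, 7, 11 are skipped, as in FIG. 11.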
[0135]
Moreover, in the communication karaoke apparatus 100 according to the present embodiment, an identification code indicating the music playback position is attached to each clock pulse of the synchronization signal, so the playback position of the music can always be recognized. This makes it easy to start playback from the middle of a piece, and to fast-forward or rewind it.
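The identification codes make a simple lookup from playback position to video frame possible. The sketch below builds that association (the table information of claim 1) from the reference tempo and the display cycle. It assumes one pulse per eighth note and that the identification codes are consecutive integers starting at 0, neither of which the text specifies for the code format. When music starts mid-piece or after a fast-forward, the video side would resume from the table entry for the code on the first pulse it receives.

```python
from fractions import Fraction


def build_sync_table(reference_qpm, display_cycle, num_pulses):
    """Map each pulse identification code to its synchronization frame.

    reference_qpm: reference tempo in quarter notes per minute (int or Fraction).
    display_cycle: seconds per playback frame.
    Assumes one clock pulse per eighth note and codes 0, 1, 2, ...
    """
    pulse_period = Fraction(60) / (2 * Fraction(reference_qpm))  # s per eighth note
    cycle = Fraction(display_cycle)
    # Frame whose display interval contains the pulse's nominal arrival time.
    return {code: (code * pulse_period) // cycle for code in range(num_pulses)}


def resume_frame(table, code):
    """Playback frame to resume from when a pulse carrying `code` arrives."""
    return table[code]


table = build_sync_table(120, Fraction(1, 15), 5)
print(table)  # {0: 0, 1: 3, 2: 7, 3: 11, 4: 15}
```

Flooring to the frame whose interval contains the pulse time is one reasonable choice when the frames-per-pulse ratio (3.75 here) is not an integer; rounding would be another.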
[0136]
Further, as described above, a plurality of types of shape data are stored in the shape data storage unit 35 and can be selected according to the type of music (musical style, whether the singer is male or female, etc.). For example, by writing selection data into the music data indicating which shape data to use, a default singer suited to the music can be selected automatically. In addition, if shape data can also be selected by input from the input unit 17, a singer or other figure matching the user's preference can be selected. Furthermore, if a plurality of singers are selected, several singers performing the same motion can be displayed side by side on the monitor 90.
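The two ways of choosing shape data (selection data carried with the music, or input from the input unit 17) and the side-by-side display of several singers can be sketched as follows. The function names, the string IDs, and the rule that a user's choice overrides the music's default are all assumptions for illustration.

```python
def choose_shapes(available, music_selection, user_selection=None):
    """Pick the shape-data IDs to render.

    music_selection: default IDs taken from selection data in the music data.
    user_selection: IDs entered via the input unit; assumed here to override
        the default when given.
    """
    chosen = user_selection or music_selection
    missing = [s for s in chosen if s not in available]
    if missing:
        raise KeyError(f"unknown shape data: {missing}")
    return list(chosen)


def side_by_side_offsets(n, spacing=1.0):
    """Horizontal offsets that center n figures performing the same motion."""
    return [(i - (n - 1) / 2) * spacing for i in range(n)]


shapes = choose_shapes({"male", "female"}, ["female"], ["male", "female"])
print(shapes, side_by_side_offsets(len(shapes)))  # ['male', 'female'] [-0.5, 0.5]
```

Every chosen shape would be driven by the same motion data frame, shifted horizontally by its offset.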
[0137]
In the above embodiment, the interpolation frame is inserted immediately before the synchronization frame. However, the present invention is not limited to this; the interpolation frame may instead be inserted between playback frames that are not synchronization frames, or immediately after the synchronization frame. When the tempo of the music is much slower than the reference tempo, a plurality of interpolation frames would have to be inserted in succession; in this case, the insertion positions of the individual interpolation frames are dispersed, which makes the movement of the displayed object smoother. Similarly, when the tempo is much faster than the reference tempo, a plurality of playback frames would have to be thinned out in succession; in this case, the frames to be thinned out are dispersed, which likewise makes the movement smoother. The dispersed insertion positions of the interpolation frames, or the dispersed positions of the playback frames to be thinned out, may be calculated in advance, for example when the tempo is set before music playback, or calculated during music playback.
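One way to disperse the inserted or thinned frames is a Bresenham-style spread, computed once per pulse interval (for example, when the tempo is set). This is a swapped-in technique for illustration, not the method of the embodiment, which leaves the calculation open. INTERP is a placeholder for an interpolation frame, and the synchronization frame at the start of each interval is never thinned.

```python
INTERP = "interp"  # placeholder for an interpolation frame


def spread_positions(span, count):
    """Choose `count` of the indices 0..span-1, spaced as evenly as possible.

    Index i is chosen whenever the running quotient count*(i+1)//span steps up,
    the same error accumulation used in Bresenham's line algorithm.
    """
    if not 0 <= count <= span:
        raise ValueError("count must be between 0 and span")
    return [i for i in range(span) if (i + 1) * count // span > i * count // span]


def stretch(frames, extra):
    """Insert `extra` interpolation frames, spread over the gaps after each frame.

    frames[0] is this interval's synchronization frame; the last gap is the one
    immediately before the next interval's synchronization frame.
    """
    where = set(spread_positions(len(frames), extra))
    out = []
    for i, frame in enumerate(frames):
        out.append(frame)
        if i in where:
            out.append(INTERP)
    return out


def shrink(frames, drop):
    """Thin out `drop` frames spread over the interval, never dropping frames[0]."""
    where = {i + 1 for i in spread_positions(len(frames) - 1, drop)}
    return [frame for i, frame in enumerate(frames) if i not in where]


print(stretch(list(range(8)), 2))  # [0, 1, 2, 3, 'interp', 4, 5, 6, 7, 'interp']
print(shrink(list(range(8)), 2))   # [0, 1, 2, 3, 5, 6]
```

With a single extra frame, `stretch` places it immediately before the next synchronization frame, which matches the basic embodiment; larger counts are spread through the interval instead of being bunched together.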
[0138]
In the above-described embodiment, the display cycle is 1/15 second. However, the present invention is not limited to this. If conditions such as the amount of motion data and the time required for image generation permit, the display cycle may be made shorter than 1/15 second to reproduce smoother movement. For example, in the above embodiment one frame of the monitor 90 consists of two fields, so its display cycle is 1/60 second per field; the display cycle may therefore be set to 1/60 second.
[0139]
In the embodiment, the shape data is stored in the shape data storage unit 35 in the video playback unit 30, but the present invention is not limited to this; for example, the shape data may be stored in the RAM 12. Likewise, the motion data is stored in the video data storage unit 34 in the embodiment, but it may instead be stored, for example, in the music data storage unit 16 together with the music data. In other words, as long as the shape data, the motion data, and the music data are handled as independent files that can each be accessed individually and added, changed, or deleted, they need not be stored in separate storage devices.
[0140]
Moreover, although the above embodiment takes as an example the display of video of a human such as a singer, the present invention is not limited to this; an animal, or an imitation of a human or an animal, may be displayed instead. For example, video in which animal characters such as dogs and cats dance like humans may be displayed.
[0141]
In the embodiment, the shape data is described as being stored in the shape data storage unit 35. However, the present invention is not limited to this; the shape data may be added or changed using a CD-ROM or the like. The shape data may also be received from the center host computer 200 and added, changed, or deleted as appropriate. Because the shape data itself is relatively small, it can be transmitted from the center host computer 200 to the communication karaoke apparatus 100 in a short time, so it may be transmitted as needed while the communication karaoke apparatus 100 is not in use.
[0142]
Furthermore, although the above embodiment describes, as an example, the case where the synchronized video generation method is applied to a communication karaoke apparatus, the present invention is not limited to this. It can also be applied to an apparatus that synchronizes aerobics music with video of an aerobics instructor, or to an apparatus that synchronizes an announcer's voice with sign-language video.
[0143]
【Effects of the Invention】
As detailed above, according to the invention of claim 1, it is possible to display video of people, animals, and the like that move in synchronization with the audio.
[0144]
According to the invention of claim 2, even when the audio playback speed is reduced, the displayed video moves smoothly while remaining synchronized with the audio playback.
[0145]
According to the invention of claim 3, even when the audio playback speed is increased, video that moves in synchronization with the audio playback can easily be displayed.
[0146]
According to the invention of claim 4, the shape and other attributes of the displayed object can be set or changed to suit the music or the user's preference.
[0147]
According to the invention of claim 5, it is possible to display video of people, animals, and the like that move in synchronization with the audio. Further, even if the reproduction speed of the audio data is changed, the reproduction of the audio data and the display of the video can be kept synchronized.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a communication karaoke apparatus according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram showing the configuration of music data in the embodiment of the present invention.
FIG. 3 is an explanatory diagram showing a configuration of video data in the embodiment of the present invention.
FIG. 4 is an explanatory diagram showing a configuration of operation data in the embodiment of the present invention.
FIG. 5 is an explanatory diagram showing an object model in the embodiment of the present invention.
FIG. 6 is an explanatory diagram showing a configuration of shape data in the embodiment of the present invention.
FIG. 7 is a perspective view showing components of an object model in the embodiment of the present invention.
FIG. 8 is an explanatory diagram showing a playback frame table in the embodiment of the present invention.
FIG. 9 is a time chart showing a state in which a MIDI clock, a synchronization signal, and a reproduction frame are synchronized when a music piece is reproduced at a reference speed in the embodiment of the present invention.
FIG. 10 is a time chart showing a state in which a MIDI clock, a synchronization signal, and a reproduction frame are synchronized when a music piece is reproduced at a tempo slower than a reference speed in the embodiment of the present invention.
FIG. 11 is a time chart showing a state in which a MIDI clock, a synchronization signal and a reproduction frame are synchronized when a music piece is reproduced at a tempo faster than a reference speed in the embodiment of the present invention.
FIG. 12 is a flowchart showing synchronous video generation processing by the communication karaoke apparatus according to the embodiment of the present invention.
FIG. 13 is a flowchart showing the synchronized video generation process following FIG. 12.
FIG. 14 is a flowchart showing the synchronized video generation process following FIG. 13.
FIG. 15 is an explanatory diagram showing an image displayed on the monitor of the communication karaoke apparatus according to the embodiment of the present invention.
[Explanation of symbols]
10 Karaoke performance unit
11 Audio CPU (audio reproducing means, synchronization signal output means)
12 RAM
16 Music data storage unit (audio data storage means)
17 Input section (reproduction speed changing means)
19 Sound source section (sound reproduction means)
21 Modem (data receiving means)
30 Video playback unit
31 Video CPU
32 ROM
34 Video data storage unit (operation data storage means)
35 Shape data storage unit (shape data storage means)
36 Object image generation unit (instant image generation means, display means)
50 Synthesizer
60 Mixer amplifier
70 Speaker
90 Monitor (display device)
100 Communication karaoke apparatus

Claims

Receiving means for receiving, from an audio reproduction device for reproducing audio data, reference reproduction speed information indicating a reference reproduction speed of the audio data and a synchronization signal to which an identification code indicating a reproduction position of the audio data is attached;
Shape data storage means for storing shape data defining the shape of each component of the object;
Video data storage means for storing video data in which operation data defining the position or motion of each component for each playback frame, the playback frame being the unit of video playback, is arranged in playback order;
Table information generating means for generating, based on the received reference playback speed information and the playback period of the playback frames, table information that associates the identification code attached to the synchronization signal with a synchronization frame, the synchronization frame being the playback frame to be played back when that synchronization signal is received;
Video playback means for sequentially generating the playback frames based on the shape data and the operation data and outputting them to the display device, the video playback means determining, based on the table information and the received synchronization signal, whether the playback frame to be output next is synchronized and, when it is not synchronized, changing the playback frame to be output next to another playback frame;
A video playback apparatus comprising:

The video playback apparatus according to claim 1, wherein, when the playback frame to be output next is the synchronization frame and the synchronization signal has not been received, the video playback means changes the playback frame to be output next to an interpolation frame for interpolating that playback frame.

The video playback apparatus according to claim 1, wherein, when the synchronization signal has been received and the playback frame to be output next is not the synchronization frame, the video playback means changes the playback frame to be output next to the synchronization frame.

The video playback apparatus according to any one of claims 1 to 3, wherein
the shape data storage means stores a plurality of pieces of the shape data, and
the video playback means generates the playback frames based on the shape data corresponding to selection data received from outside the video playback apparatus.

A karaoke apparatus comprising the video playback apparatus according to any one of claims 1 to 4, the display device, and the audio reproduction device, wherein the audio reproduction device comprises:
input means for inputting an instruction from a user as an instruction signal;
audio data storage means for storing a plurality of pieces of the audio data, each including the reference reproduction speed information;
audio reproducing means for reproducing the stored audio data;
synchronization signal output means for outputting, in synchronization with the reproduction of the audio data by the audio reproducing means, the synchronization signal to which the identification code indicating the reproduction position of the audio data is attached; and
audio reproduction control means for outputting the reference reproduction speed information included in the audio data designated by the instruction signal input from the input means, causing the audio reproducing means to reproduce that audio data at the reproduction speed indicated by the reference reproduction speed information, and changing the reproduction speed of the audio data in the audio reproducing means based on the instruction signal input from the input means.