JP3925349B2

JP3925349B2 - Apparatus and method for synchronous reproduction of audio data and performance data

Info

Publication number: JP3925349B2
Application number: JP2002242481A
Authority: JP
Inventors: 潤石井; 令古川
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2002-08-22
Filing date: 2002-08-22
Publication date: 2007-06-06
Anticipated expiration: 2022-08-22
Also published as: JP2004085609A

Description

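[0001]
[Technical Field of the Invention]
The present invention relates to an apparatus and a method for reproducing performance data, which contain information on the performance control of a piece of music, in synchronization with the reproduction of audio data.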
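[0002]
[Prior Art]
One means of reproducing a piece of music is an apparatus that reads audio data from a storage medium such as a music CD (Compact Disc), generates sound from the read audio data, and outputs it. Another means is an apparatus that reads data containing information on the performance control of a piece of music from a storage medium such as an FD (Floppy Disk) and performs automatically by controlling the sound generation of a tone generator with the read data. One example of data containing information on the performance control of a piece of music is MIDI data created according to the MIDI (Musical Instrument Digital Interface) standard.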
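[0003]
Recently, methods have been proposed for synchronizing automatic performance based on MIDI data with the reproduction of audio data recorded on a music CD. One of them uses the time codes recorded on the music CD (see, for example, Patent Documents 1 and 2). This method is described below.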
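[0004]
First, a music CD player reproduces the audio data and time codes of a music CD. The audio data is output as sound, and the time codes are supplied to a recording apparatus. Here, a time code is data associated with a certain unit of audio data, and each time code represents the time elapsed from the start of the piece to the reproduction timing of the audio data corresponding to that time code. In time with the reproduction of the music CD, an instrument is played, and MIDI data is supplied sequentially from the instrument to the recording apparatus. When the recording apparatus receives MIDI data from the instrument, it records the MIDI data on a recording medium together with time information indicating the timing of reception. Likewise, when the recording apparatus receives a time code from the music CD player, it records the time code on the recording medium together with time information indicating the timing of its reception. As a result, a file in which time codes and MIDI data are intermixed is created on the recording medium. In this file, each time code and each item of MIDI data carries time information representing the time elapsed from the start of reproduction of the piece to its own reproduction time.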
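[0005]
Once the MIDI data and time codes have been recorded on the recording medium in this way, whenever the audio data of the same piece is subsequently reproduced from the music CD, the MIDI data can be read from the recording medium in synchronization with it to perform automatically. This operation proceeds as follows.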
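[0006]
First, the music CD player reproduces audio data and time codes from the music CD. The audio data is output as sound, and the time codes are supplied to a MIDI data reproducing apparatus. At the same time, the reproducing apparatus reads the MIDI data recorded in the file according to the time information recorded with it and sends the data sequentially to an instrument capable of automatic performance from MIDI data. In doing so, the reproducing apparatus adjusts the time offset between the reproduction of the audio data of the music CD and the reproduction of the MIDI data, based on the time codes received from the music CD player and the time codes read from the file together with the MIDI data. As a result, synchronized reproduction of the audio data of the music CD and the MIDI data is achieved.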
[0007]
Patent Document 1: Japanese Patent Application No. 2002-7872
Patent Document 2: Japanese Patent Application No. 2002-7873
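[0008]
[Problems to Be Solved by the Invention]
However, for a music CD on which the same piece is recorded with different time codes, the method using the time codes of the music CD cannot achieve synchronized reproduction of the audio data of the music CD and the MIDI data.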
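[0009]
Today there are many different editions of music CDs containing the same piece. Even when the content is the same piece, different editions may differ in the length of the silence preceding the piece, so the time code at which the performance of the piece actually begins can differ considerably. That is, if MIDI data for synchronized performance created with the conventional time-code technique is used with a different edition of a music CD containing the same piece, the MIDI performance either starts before the piece actually begins or, conversely, does not start for some time after the piece has begun, so that the MIDI performance as a whole is shifted relative to the piece on the music CD.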
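[0010]
Consequently, the conventional time-code technique had the problem that, even for music CDs recording the audio data of the same piece, different MIDI data for synchronized performance had to be prepared for each variation in the time code corresponding to the point at which the performance of the piece actually begins.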
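[0011]
In view of the circumstances described above, an object of the present invention is to provide a recording apparatus, a reproducing apparatus, a recording method, a reproducing method, and a program for performance data such as MIDI data that can be reproduced in synchronization with multiple editions of the audio data of the same piece, even when those editions differ in the point at which the actual performance of the piece begins.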
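[0012]
[Means for Solving the Problems]
To solve the problems described above, the present invention provides a recording apparatus comprising: first receiving means for receiving audio data representing the audio waveform of a piece of music; second receiving means for receiving control data instructing performance control; generating means for generating reference data that abstracts the audio waveform represented by partial data, which is a part of the audio data; and recording means for recording the reference data and for recording performance data consisting of the control data and time data indicating the temporal relationship between the reproduction timing of the partial data and the reception timing of the control data.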
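[0013]
The present invention also provides a reproducing apparatus comprising: first receiving means for receiving reference data that abstracts an audio waveform, and performance data consisting of control data instructing performance control and time data instructing the execution timing of that performance control; second receiving means for receiving audio data representing the audio waveform of a piece of music; selecting means for selecting, from the audio data, data representing an audio waveform similar to the audio waveform represented by the reference data as partial data; and transmitting means for transmitting the control data at a timing determined by the reproduction timing of the partial data and the time data.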
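[0014]
The present invention also provides a recording method comprising: a first receiving step of receiving audio data representing the audio waveform of a piece of music; a second receiving step of receiving control data instructing performance control; a generating step of generating reference data that abstracts the audio waveform represented by partial data, which is a part of the audio data; and a recording step of recording the reference data and recording performance data consisting of the control data and time data indicating the temporal relationship between the reproduction timing of the partial data and the reception timing of the control data.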
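[0015]
The present invention also provides a reproducing method comprising: a first receiving step of receiving reference data that abstracts an audio waveform, and performance data consisting of control data instructing performance control and time data instructing the execution timing of that performance control; a second receiving step of receiving audio data representing the audio waveform of a piece of music; a selecting step of selecting, from the audio data, data representing an audio waveform similar to the audio waveform represented by the reference data as partial data; and a transmitting step of transmitting the control data at a timing determined by the reproduction timing of the partial data and the time data.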
[0016]
The present invention also provides a program that causes a computer to execute processing using these recording and reproducing methods.
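[0017]
The present invention also provides a recording medium on which are recorded reference data that abstracts an audio waveform, and performance data consisting of control data instructing performance control and time data instructing the execution timing of that performance control.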
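[0018]
With the apparatus, methods, program, and recording medium configured as above, when audio data is reproduced, the temporal position of the reference data relative to the audio data can be determined from the similarity of the waveforms represented by the audio data. The timing for reproducing the control data can then be determined from that temporal position. As a result, synchronized reproduction of the audio data and the control data is achieved.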
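[0019]
The recording apparatus according to the present invention may further comprise third receiving means for receiving time codes indicating the reproduction timing of the audio data. In this case, the recording means may generate the time data based on the time information indicated by the time codes.
The reproducing apparatus according to the present invention may likewise comprise third receiving means for receiving time codes indicating the reproduction timing of the audio data. In this case, the transmitting means may transmit the control data based on the time information indicated by the time codes.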
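[0020]
With a recording apparatus and a reproducing apparatus configured in this way, timekeeping follows the time codes. The control data is therefore reproduced correctly in synchronization even with audio data reproduced by a player whose reproduction speed is biased.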
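[0021]
In the recording apparatus according to the present invention, the generating means may comprise filter means for removing the DC component of the audio waveform represented by the input data. Alternatively, the generating means may comprise filter means for extracting the components of a specific frequency band contained in the audio waveform represented by the input data.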
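[0022]
In the reproducing apparatus according to the present invention, the selecting means may comprise generating means for generating determination data that abstracts the audio waveform represented by a part of the audio data. The generating means may then comprise filter means for removing the DC component of the audio waveform represented by the input data.
Alternatively, in the reproducing apparatus according to the present invention, the selecting means may comprise generating means for generating determination data that abstracts the audio waveform represented by a part of the audio data. The generating means may then comprise filter means for extracting the components of a specific frequency band contained in the audio waveform represented by the input data.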
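[0023]
With a recording apparatus and a reproducing apparatus configured in this way, the temporal position of the reference data relative to the audio data can be determined with high accuracy when it is determined from the similarity of the audio waveforms represented by the audio data.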
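[0024]
In the recording apparatus according to the present invention, the generating means may comprise downsampling means for downsampling the input data.
In the reproducing apparatus according to the present invention, the selecting means may comprise generating means for generating determination data that abstracts the audio waveform represented by a part of the audio data. The generating means may then comprise downsampling means for downsampling the input data.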
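[0025]
With a recording apparatus and a reproducing apparatus configured in this way, the amount of reference data is reduced, which makes the data easier to record, transmit, and receive.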
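[0026]
In the reproducing apparatus according to the present invention, the selecting means may comprise generating means for generating determination data that abstracts the audio waveform represented by a part of the audio data. The selecting means may then select the partial data based on an index obtained by dividing the sum of products of the reference data and the determination data by the sum of squares of the reference data.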
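[0027]
In the reproducing apparatus according to the present invention, the selecting means may comprise generating means for generating determination data that abstracts the audio waveform represented by a part of the audio data. The selecting means may then select the partial data based on an index obtained by dividing the square of the sum of products of the reference data and the determination data by the product of the sum of squares of the reference data and the sum of squares of the determination data.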
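[0028]
In the reproducing apparatus according to the present invention, the selecting means may comprise generating means for generating determination data that abstracts the audio waveform represented by a part of the audio data. The selecting means may then select the partial data based on the rate of change of the sum of products of the reference data and the determination data.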
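[0029]
With a reproducing apparatus configured in this way, the temporal position of the reference data relative to the audio data can be determined with high accuracy when it is determined from the similarity of the audio waveforms represented by the audio data.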
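[0030]
[Embodiments of the Invention]
[1] First Embodiment
[1.1] Configuration, Functions, and Data Format
[1.1.1] Overall Configuration
Fig. 1 shows the configuration of a synchronized recording/reproducing apparatus SS according to the first embodiment of the present invention. The synchronized recording/reproducing apparatus SS consists of the following components:
- a music CD drive 1
- an FD drive 2
- an automatic piano unit 3
- a sound generating unit 4
- an operation/display unit 5
- a controller unit 6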
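[0031]
The music CD drive 1, FD drive 2, automatic piano unit 3, sound generating unit 4, and operation/display unit 5 are each connected to the controller unit 6 by communication lines. The automatic piano unit 3 and the sound generating unit 4 are also connected directly to each other by a communication line.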
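[0032]
[1.1.2] Music CD Drive
The audio stream recorded on a music CD contains audio data representing the sound information, together with time codes indicating the reproduction timing of the audio data. In accordance with instructions from the controller unit 6, the music CD drive 1 reads the audio stream from the loaded music CD and sequentially outputs the audio data it contains. The music CD drive 1 is connected to the communication interface 65 of the controller unit 6 by a communication line.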
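[0033]
The audio data output from the music CD drive 1 is two-channel (left and right) digital audio data with a sampling frequency of 44,100 Hz and 16-bit quantization. The data output from the music CD drive 1 does not include time codes. The music CD drive 1 is configured like an ordinary music CD drive capable of outputting audio data digitally, so its description is omitted.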
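[0034]
[1.1.3] FD Drive
The FD drive 2 records an SMF (Standard MIDI File) on an FD. It also reads an SMF recorded on an FD and transmits the read SMF. The FD drive 2 is connected to the communication interface 65 of the controller unit 6 by a communication line. The FD drive 2 is configured like an ordinary FD drive, so its description is omitted.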
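[0035]
[1.1.4] MIDI Events and SMF
An SMF is a file containing two kinds of data:
- MIDI events, which are performance control data conforming to the MIDI standard;
- delta times, which indicate the execution timing of each MIDI event.

The formats of MIDI events and of the SMF are described with reference to Figs. 2 and 3.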
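[0036]
Fig. 2 shows three examples of MIDI events: a note-on event, a note-off event, and a system exclusive event.

A note-on event instructs a tone to sound. It consists of:
- 9nH, indicating note-on (n is the channel number and H denotes hexadecimal; likewise below);
- a note number, indicating the pitch;
- a velocity, indicating the strength of the sound (or the speed of the key stroke).

Similarly, a note-off event instructs a tone to be silenced. It consists of:
- 8nH, indicating note-off;
- a note number, indicating the pitch;
- a velocity, indicating the strength at release (or the speed at which the key is released).

A system exclusive event is used to transmit, receive, or record data in a format freely defined by the manufacturer of a product or software. It consists of:
- F0H, indicating the start of the system exclusive event;
- the data length;
- the data;
- F7H, indicating the end of the event.

As this shows, MIDI events carry no time information. They are used to sound, silence, and otherwise control tones in real time.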
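The byte layouts in Fig. 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions and is not the apparatus's firmware. The function names are hypothetical, and channels are numbered 0 to 15 as they appear in the status nibble. The system exclusive helper follows the Standard MIDI File convention, in which the length is written as a variable-length quantity that counts the data bytes plus the closing F7H. That convention is a detail of the SMF specification and is not stated in this text.

```python
def vlq(value: int) -> bytes:
    """Encode a non-negative integer as an SMF variable-length quantity."""
    if value < 0:
        raise ValueError("value must be non-negative")
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)  # continuation bit on all but the last byte
        value >>= 7
    return bytes(reversed(out))


def _check_7bit(*values: int) -> None:
    """MIDI data bytes must keep the top bit clear."""
    for v in values:
        if not 0 <= v <= 0x7F:
            raise ValueError(f"data byte out of range: {v}")


def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Note-on: status 9nH, note number, velocity."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0-15")
    _check_7bit(note, velocity)
    return bytes([0x90 | channel, note, velocity])


def note_off(channel: int, note: int, velocity: int = 64) -> bytes:
    """Note-off: status 8nH, note number, release velocity."""
    if not 0 <= channel <= 15:
        raise ValueError("channel must be 0-15")
    _check_7bit(note, velocity)
    return bytes([0x80 | channel, note, velocity])


def sysex_event(payload: bytes) -> bytes:
    """SMF system exclusive event: F0H, length (VLQ), data, F7H."""
    _check_7bit(*payload)
    return b"\xF0" + vlq(len(payload) + 1) + payload + b"\xF7"
```

The processed reference audio data must travel inside such an event, so its values would have to be packed into 7-bit bytes. The text does not say how the apparatus performs that packing.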
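[0037]
Fig. 3 outlines the SMF format. An SMF consists of a header chunk and track chunks:
- The header chunk contains control data on the format of the data in the track chunks, the time unit, and so on.
- A track chunk contains MIDI events and delta times indicating the execution timing of each MIDI event.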
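[0038]
In an SMF, a delta time can be expressed in either of two ways:
- as a time relative to the immediately preceding MIDI event, in a time unit called clocks;
- as an absolute time from the beginning of the piece, in a combination of time units called hours, minutes, seconds, and frames.

For ease of explanation, the following description treats delta times as absolute times from a reference point, expressed in seconds.
In this specification, "MIDI data" is a general term for data created according to the MIDI standard.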
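The simplification above, absolute times in seconds, can be related to the relative, clock-based form with a short conversion. The sketch below is illustrative only. `ticks_per_second` is a hypothetical fixed resolution: a real SMF derives tick duration from the header chunk's time division and, for metrical timing, from the tempo. The example times are the delta times of Figs. 6 and 7.

```python
from typing import List, Sequence


def absolute_to_relative_ticks(times_s: Sequence[float],
                               ticks_per_second: int = 1000) -> List[int]:
    """Convert absolute event times (seconds) into relative tick counts."""
    deltas: List[int] = []
    previous_tick = 0
    for t in times_s:
        # Quantize each absolute time first so rounding errors do not accumulate.
        tick = round(t * ticks_per_second)
        if tick < previous_tick:
            raise ValueError("event times must be non-decreasing")
        deltas.append(tick - previous_tick)
        previous_tick = tick
    return deltas


def relative_ticks_to_absolute(deltas: Sequence[int],
                               ticks_per_second: int = 1000) -> List[float]:
    """Inverse conversion: running sum of ticks, expressed in seconds."""
    times: List[float] = []
    total = 0
    for d in deltas:
        total += d
        times.append(total / ticks_per_second)
    return times
```

For example, the events at 0.00, 1.25, 2.63, and 3.71 seconds become relative deltas of 0, 1250, 1380, and 1080 ticks at the assumed resolution.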
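[0039]
[1.1.5] Automatic Piano Unit
The automatic piano unit 3 is a tone generating device with three functions:
- In response to key and pedal operations by the user of the synchronized recording/reproducing apparatus SS, it outputs acoustic piano sound and electronically synthesized piano sound.
- It generates MIDI events in response to the user's key and pedal operations and transmits the generated MIDI events.
- It receives MIDI events and performs automatically, with acoustic piano sound and electronically synthesized piano sound, in accordance with the received MIDI events.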
[0040]
The automatic piano unit 3 consists of a piano 31, key sensors 32, pedal sensors 33, a MIDI event control circuit 34, a tone generator 35, and a drive unit 36.
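[0041]
A key sensor 32 is provided for each key of the piano 31, and a pedal sensor 33 for each pedal. These sensors detect the positions of the keys and pedals. They send the detected position information to the MIDI event control circuit 34, together with the identification number of the corresponding key or pedal and the time of detection.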
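[0042]
The MIDI event control circuit 34 receives the position information of each key and pedal from the key sensors 32 and pedal sensors 33, together with the identification information of the keys and pedals and the time information. From this information it immediately generates MIDI events such as note-on and note-off events. It outputs the generated MIDI events to the controller unit 6 and the tone generator 35.

The MIDI event control circuit 34 also receives MIDI events from the controller unit 6 and forwards them to either the tone generator 35 or the drive unit 36. The controller unit 6 instructs it which of the two to use.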
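[0043]
The tone generator 35 receives MIDI events from the MIDI event control circuit 34. Based on them, it outputs the sounds of various instruments as two-channel (left and right) digital audio data. It electronically synthesizes digital audio data at the pitch instructed by each received MIDI event and sends the data to the mixer 41 of the sound generating unit 4.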
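[0044]
The drive unit 36 consists of solenoids provided for the keys and pedals of the piano 31, which drive them, and a control circuit that controls those solenoids. When the control circuit receives a MIDI event from the MIDI event control circuit 34, it adjusts the current supplied to the solenoid of the corresponding key or pedal. This controls the magnetic force the solenoid generates, so the key or pedal moves in accordance with the MIDI event.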
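[0045]
[1.1.6] Sound Generating Unit
The sound generating unit 4 receives audio data from the automatic piano unit 3 and the controller unit 6, converts the received audio data into sound, and outputs it. The sound generating unit 4 consists of a mixer 41, a D/A converter 42, an amplifier 43, and a speaker 44.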
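[0046]
The mixer 41 is a digital stereo mixer. It receives multiple streams of two-channel (left and right) digital audio data and converts them into a single left/right pair of digital audio data. It receives digital audio data from two sources at the same time:
- the tone generator 35 of the automatic piano unit 3;
- the music CD drive 1, which reads the data from the music CD and passes it through the controller unit 6.

The mixer 41 takes the arithmetic mean of the received digital audio data and sends the result to the D/A converter 42 as a single left/right pair.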
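The arithmetic mean performed by the mixer 41 amounts to the following per-frame operation. This is a sketch under assumptions. Frames are (R, L) tuples of 16-bit integers, as in the text, and results are rounded to the nearest integer. The function name is hypothetical.

```python
from typing import Sequence, Tuple

StereoFrame = Tuple[int, int]  # (R, L), each in -32768..32767


def mix_frames(frames: Sequence[StereoFrame]) -> StereoFrame:
    """Average simultaneous stereo frames from several sources into one frame."""
    if not frames:
        raise ValueError("need at least one frame")
    count = len(frames)
    right = sum(frame[0] for frame in frames)
    left = sum(frame[1] for frame in frames)
    # The mean of in-range values is itself in range, so no clipping is needed.
    return (round(right / count), round(left / count))
```

Averaging never overflows the 16-bit range. The trade-off is that each source's level is halved when two sources are mixed.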
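[0047]
The D/A converter 42 receives digital audio data from the mixer 41, converts it into an analog audio signal, and outputs the signal to the amplifier 43. The amplifier 43 amplifies the analog audio signal and outputs it to the speaker 44. The speaker 44 converts the amplified analog audio signal into sound. As a result, the audio data recorded on the music CD and the audio data generated by the tone generator 35 are output from the sound generating unit 4 as stereo sound.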
[0048]
[1.1.7] Operation/Display Unit
The operation/display unit 5 is the user interface through which the user operates the synchronized recording/reproducing apparatus SS.
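[0049]
The operation/display unit 5 has, among other things, a keypad that the user presses to give instructions to the apparatus SS and a liquid crystal display on which the user checks the state of the apparatus SS.
- When the user presses a key on the keypad, the operation/display unit 5 outputs a signal corresponding to the pressed key to the controller unit 6.
- When the operation/display unit 5 receives bitmap data containing text or graphics from the controller unit 6, it displays that text or those graphics on the liquid crystal display.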
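[0050]
[1.1.8] Controller Unit
The controller unit 6 controls the entire synchronized recording/reproducing apparatus SS. It consists of a ROM (Read Only Memory) 61, a CPU (Central Processing Unit) 62, a DSP (Digital Signal Processor) 63, a RAM (Random Access Memory) 64, and a communication interface 65, which are interconnected by a bus.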
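[0051]
The components of the controller unit 6 are as follows.

- ROM 61: nonvolatile memory that stores various control programs. Besides programs for general control processing, these include programs that cause the CPU 62 to execute the processing in the SMF recording and reproducing operations described below.
- CPU 62: a general-purpose microprocessor. It reads the control programs from the ROM 61 and performs control processing according to them.
- DSP 63: a microprocessor capable of high-speed processing of digital audio data. The controller unit 6 receives digital audio data from the music CD drive 1 or the FD drive 2. Under the control of the CPU 62, the DSP 63 applies to that data the filtering and other processing required by the correlation determination data generation processing and the correlation determination processing described below. It then sends the resulting data to the CPU 62.
- RAM 64: volatile memory that temporarily stores the data used by the CPU 62 and the DSP 63.
- Communication interface 65: an interface that can transmit and receive digital data in various formats. It performs the format conversion required for data exchanged with the music CD drive 1, FD drive 2, automatic piano unit 3, sound generating unit 4, and operation/display unit 5. It also relays data between each of these devices and the controller unit 6.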
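[0052]
[1.2] Operation
Next, the operation of the synchronized recording/reproducing apparatus SS is described.
[1.2.1] Recording Operation
First, we describe how the apparatus SS operates when the user plays the piano along with a commercially available music CD and the performance is recorded on an FD as MIDI data. The music CD used in this recording operation is called music CD-A, to distinguish it from the music CD used in the reproducing operation described later.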
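[0053]
[1.2.1.1] Starting the Recording
The user loads music CD-A into the music CD drive 1 and a blank FD into the FD drive 2. The user then presses the key on the operation/display unit 5 that corresponds to starting performance data recording. The operation/display unit 5 outputs a signal corresponding to the pressed key to the controller unit 6.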
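[0054]
When the CPU 62 of the controller unit 6 receives the signal to start performance data recording, it sends a play command to the music CD drive 1. In response, the music CD drive 1 sends the audio data recorded on music CD-A to the controller unit 6 in sequence. The controller unit 6 receives one left/right pair of data every 1/44,100 second.

Hereinafter, the values of a left/right pair are written (R(n), L(n)). These values, and the values of each data item generated from such a pair in the correlation determination data generation processing described below, are called "sample values". R(n) and L(n) are the right-channel and left-channel values, respectively, and each is an integer from -32768 to 32767. n is an integer giving the order of the audio data, counting 0, 1, 2, ... from the first data item.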
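[0055]
[1.2.1.2] Sending Audio Data to the Sound Generating Unit
As the CPU 62 receives the sample values (R(0), L(0)), (R(1), L(1)), (R(2), L(2)), ..., it sends them to the sound generating unit 4. The sound generating unit 4 converts the sample values into sound and outputs it. As a result, the user can hear the piece recorded on music CD-A.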
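[0056]
[1.2.1.3] Recording the Raw Reference Audio Data in RAM
While sending the received sample values to the sound generating unit 4, the CPU 62 also records in the RAM 64 the sample values for a fixed period near the beginning of the piece.
In this embodiment, as an example, the CPU 62 records 2¹⁶, that is, 65,536 pairs of sample values in the RAM 64. These 65,536 pairs amount to about 1.49 seconds of data.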
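[0057]
First, for each sample value, the CPU 62 checks whether its absolute value exceeds a predetermined threshold. For example, if the threshold is 1000, the comparison gives a positive result when the absolute value of either R(n) or L(n) is greater than 1000.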
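[0058]
For the sake of explanation, assume that for the audio data of music CD-A, the threshold is first exceeded at the sample pair n = 52156. That is, (R(52156), L(52156)) is the first pair in which the absolute value of R(52156) or L(52156) exceeds the threshold.

The comparisons for (R(0), L(0)) through (R(52155), L(52155)) therefore give negative results, and during that period the CPU 62 does not record these sample values in the RAM 64. As a result, the silence or near-silence at the beginning of the piece is not recorded. In this example, the unrecorded leading sample values have a playback time of about 1.18 seconds.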
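[0059]
The CPU 62 then receives (R(52156), L(52156)), and the comparison for that pair gives a positive result. From that point on, the CPU 62 records the next 65,536 pairs of sample values in the RAM 64, namely (R(52156), L(52156)) through (R(117691), L(117691)). Hereinafter, this series of sample values is called the "raw reference audio data".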
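The start detection and capture described above can be sketched as follows. The sketch assumes the whole stream is already available as a list of (R, L) pairs, whereas the apparatus processes samples as they arrive. `capture_reference` is a hypothetical name. The threshold of 1000 and the window of 2¹⁶ pairs are the example values from the text.

```python
from typing import List, Optional, Sequence, Tuple

SAMPLE_RATE = 44100
THRESHOLD = 1000
WINDOW = 2 ** 16  # 65,536 sample pairs

Pair = Tuple[int, int]


def capture_reference(frames: Sequence[Pair],
                      threshold: int = THRESHOLD,
                      window: int = WINDOW) -> Optional[Tuple[int, List[Pair]]]:
    """Return (start index, captured pairs), or None if no full window is found."""
    for n, (right, left) in enumerate(frames):
        # Positive result as soon as either channel's magnitude exceeds the threshold.
        if abs(right) > threshold or abs(left) > threshold:
            captured = list(frames[n:n + window])
            if len(captured) < window:
                return None  # the stream ended before a full window was received
            return n, captured
    return None
```

With the CD-A example values, a start index of 52156 gives a last index of 52156 + 65535 = 117691. The skipped lead-in lasts 52156 / 44100 ≈ 1.18 s, and the window lasts 65536 / 44100 ≈ 1.49 s.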
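[0060]
[1.2.1.4] Starting Timekeeping
When the CPU 62 receives the last sample value of the raw reference audio data, (R(117691), L(117691)), it stops recording the raw reference audio data. At the same moment, it starts timekeeping, using that moment as the reference point.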
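[0061]
[1.2.1.5] Generating the Processed Reference Audio Data
When the CPU 62 finishes recording the raw reference audio data, it commands the DSP 63 to run the correlation determination data generation processing on that data.

This processing takes audio data sampled at 44,100 Hz and produces audio data sampled at about 172.27 Hz, for use in the correlation determination processing. The correlation determination processing judges how similar two sets of audio data are; it is described in detail later. The correlation determination data generation processing is described below with reference to Fig. 4.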
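[0062]
When the DSP 63 receives from the CPU 62 the command to execute the correlation determination data generation processing on the raw reference audio data, it reads the raw reference audio data recorded in the RAM 64 (step S1). It then converts the stereo data into monaural data by taking the arithmetic mean of the left and right values of each sample value (step S2). This conversion reduces the load on the DSP 63 in the later steps.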
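[0063]
Next, the DSP 63 applies high-pass filtering to the monaural sample values (step S3). This removes the DC component from the waveform, so the sample values become evenly distributed between positive and negative.

The correlation determination processing compares two sets of audio data by their cross-correlation. Such comparisons are more accurate when the sample values are evenly distributed on both sides of zero. The purpose of this step is therefore to improve the accuracy of the correlation determination processing.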
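[0064]
Next, the DSP 63 takes the absolute value of each high-pass-filtered sample value (step S4). This step yields a substitute for the power of each sample value. The absolute value is smaller than the square, which represents power, and it is easier to process, so this embodiment uses it in place of the square. If the DSP 63 has sufficient processing capacity, the square of each sample value may be computed in this step instead.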
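Steps S2 to S4 can be sketched as follows. The text says only that step S3 is a high-pass filter; for the experiment of Fig. 9 it was a single-stage IIR filter at 25 Hz. The first-order RC-style difference equation used here is one common choice and is an assumption, as are the function names.

```python
import math
from typing import List, Sequence, Tuple


def to_mono(frames: Sequence[Tuple[int, int]]) -> List[float]:
    """Step S2: average the right and left values of each pair."""
    return [(right + left) / 2.0 for right, left in frames]


def highpass_first_order(x: Sequence[float], cutoff_hz: float = 25.0,
                         fs: float = 44100.0) -> List[float]:
    """Step S3 (assumed form): y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    a = rc / (rc + 1.0 / fs)
    y: List[float] = []
    prev_x = prev_y = 0.0
    for value in x:
        prev_y = a * (prev_y + value - prev_x)
        prev_x = value
        y.append(prev_y)
    return y


def rectify(x: Sequence[float]) -> List[float]:
    """Step S4: absolute value as a cheap stand-in for power (use v * v if affordable)."""
    return [abs(v) for v in x]
```

A constant (DC) input decays toward zero at the high-pass output. This is the property the text relies on when it says the samples become evenly distributed around zero.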
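[0065]
Next, the DSP 63 filters the rectified sample values from step S4 with a comb filter (step S5). This step extracts the low-frequency components of the waveform, in which changes in the waveform are easy to capture. A low-pass filter is the usual way to extract low-frequency components. However, a comb filter usually places a smaller load on the DSP 63 than a low-pass filter, so this embodiment uses a comb filter in its place.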
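[0066]
Fig. 5 shows the structure of an example comb filter that can be used in step S5. In Fig. 5:
- Square blocks denote delay operations. The k in z^-k means that the delay time of that operation is (sampling period × k). As noted above, the sampling frequency of a music CD is 44,100 Hz, so the sampling period is 1/44,100 second.
- Triangular blocks denote multiplication, and the value inside each triangle is the multiplication coefficient.

In Fig. 5, K is given by the following equation (1).
[Equation 1]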

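[0067]
Multiplication by the coefficient K gives this comb filter the function of a high-pass filter with frequency f. As a result, the filtering in this step removes the DC component from the waveform once again. The values of k and f can be changed freely; they are determined empirically so that the correlation determination processing is as accurate as possible.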
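Fig. 5 and Equation (1) are not reproduced here, so the following is only a hypothetical stand-in. It has the properties the text attributes to step S5: it uses a k-sample delay and a multiplier K, it smooths the rectified signal like a moving sum over k samples, and it rejects DC. It implements H(z) = (1 - z^-k) / (1 - K z^-1) with the assumed choice K = exp(-2πf/fs). The actual structure and Equation (1) may differ.

```python
import math
from typing import List, Sequence


def comb_filter(x: Sequence[float], k: int = 4410, f: float = 1.0,
                fs: float = 44100.0) -> List[float]:
    """Hypothetical step S5: y[n] = x[n] - x[n-k] + K * y[n-1].

    With K = 1 this is an exact k-sample moving sum (a low-pass comb).
    With K slightly below 1, the numerator zero at DC is no longer cancelled,
    so the DC gain becomes 0 and frequencies below about f are attenuated.
    """
    big_k = math.exp(-2.0 * math.pi * f / fs)  # ASSUMED form of Equation (1)
    y: List[float] = []
    prev_y = 0.0
    for n, value in enumerate(x):
        delayed = x[n - k] if n >= k else 0.0
        prev_y = value - delayed + big_k * prev_y
        y.append(prev_y)
    return y
```

The defaults k = 4410 and f = 1 are the constants quoted for Fig. 9. With k = 4410, the moving-sum part has its first null at 44100 / 4410 = 10 Hz.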
[0068]
Next, the DSP 63 filters the output of step S5 again, this time with a low-pass filter (step S6). This step prevents aliasing noise in the downsampling performed in the next step, S7.

Step S7 downsamples data from 44,100 Hz to about 172.27 Hz. To prevent aliasing, components at or above half of that rate, about 86.13 Hz, must be removed. However, because of the comb filter's characteristics, the filtering in step S5 does not remove high-frequency components sufficiently. This step therefore uses a low-pass filter to remove the remaining components at or above about 86.13 Hz.

If the DSP 63 has sufficient processing capacity, a single high-precision low-pass filter may be used instead of the two filters of steps S5 and S6.
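[0069]
Next, the DSP 63 downsamples the output of step S6 by a factor of 1/256 (step S7). That is, it keeps one sample value out of every 256, which reduces the series from 65,536 sample values to 256.

Hereinafter, each sample value obtained in step S7 is written X(m), where m is an integer from 0 to 255. The series X(0) to X(255) is called the "processed reference audio data". The DSP 63 records the processed reference audio data in the RAM 64 (step S8).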
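Steps S6 and S7 can be sketched as a first-order low-pass filter followed by keeping one sample in every 256. As with step S3, the filter form is an assumption: a one-pole IIR, matching the 25 Hz single-stage filter mentioned for Fig. 9. Keeping the first sample of each block of 256 is also an assumption, since the text says only that one sample is extracted per 256.

```python
import math
from typing import List, Sequence

DECIMATION = 256


def lowpass_first_order(x: Sequence[float], cutoff_hz: float = 25.0,
                        fs: float = 44100.0) -> List[float]:
    """Step S6 (assumed form): y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / fs
    alpha = dt / (rc + dt)
    y: List[float] = []
    prev_y = 0.0
    for value in x:
        prev_y += alpha * (value - prev_y)
        y.append(prev_y)
    return y


def decimate(x: Sequence[float], factor: int = DECIMATION) -> List[float]:
    """Step S7: keep every `factor`-th sample (65,536 -> 256 for factor 256)."""
    return list(x[::factor])
```

The output rate is 44100 / 256 ≈ 172.27 Hz, so its Nyquist frequency is about 86.13 Hz. This is why step S6 has to remove everything above that frequency before step S7 runs.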
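[0070]
[1.2.1.6] Recording MIDI Events in RAM
While the DSP 63 generates the processed reference audio data, the user begins playing the piano 31. That is, after the CPU 62 has finished recording the raw reference audio data and started timekeeping, the user listens to the piece on music CD-A from the sound generating unit 4 and plays the keys and pedals of the piano 31 along with it.
The key sensors 32 and pedal sensors 33 detect the user's performance as key and pedal movements. The MIDI event control circuit 34 converts these movements into MIDI events and sends them to the controller unit 6.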
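[0071]
When the CPU 62 in the controller unit 6 receives a MIDI event from the automatic piano unit 3, it records in the RAM 64 the event together with its delta time. The delta time is the timer value at the moment of reception, that is, the time elapsed since the CPU 62 received the last sample value of the raw reference audio data.

Fig. 6 is a schematic diagram of the temporal relationship between the audio data of music CD-A and the MIDI events. It shows that the CPU 62 starts timekeeping about 2.67 seconds after playback of music CD-A begins. Measured from that point, the CPU 62 receives the first MIDI event after 1.25 seconds, the second after 2.63 seconds, and the third after 3.71 seconds.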
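[0072]
[1.2.1.7] Recording the SMF on the FD
When playback of music CD-A has ended and the user has finished playing the piano 31, the user presses the key on the operation/display unit 5 that corresponds to ending performance data recording. The operation/display unit 5 sends the corresponding signal to the controller unit 6. On receiving it, the CPU 62 sends a stop command to the music CD drive 1, which then stops playing music CD-A.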
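[0073]
Next, the CPU 62 reads from the RAM 64 the processed reference audio data generated by the DSP 63, together with the MIDI events and delta times generated by the user's performance on the piano 31. It combines these data into an SMF track chunk, adds a corresponding header chunk, and so generates the SMF.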
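[0074]
Fig. 7 outlines the SMF generated by the CPU 62. The data portion of the track chunk begins with a system exclusive event that contains the processed reference audio data; its delta time is 0.00 seconds. The MIDI events from the user's performance on the piano 31 follow in order.

In the example of Fig. 6, the user's performance produces three MIDI events:

| Event | Type | Delta time |
| --- | --- | --- |
| First | Note-on, C5 | 1.25 s |
| Second | Note-on, E6 | 2.63 s |
| Third | Note-off, C5 | 3.71 s |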
[0075]
When the CPU 62 has finished generating the SMF, it sends the SMF to the FD drive 2 with a write command. The FD drive 2 then writes the SMF to the loaded FD.
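[0076]
The temporal relationship between the audio data of music CD-A and the MIDI events written to the SMF is summarized below with reference to Fig. 6. The description uses two time bases, distinguished by suffixes:
- (T): time measured from the start of playback of music CD-A, taken as 0 seconds;
- (D): delta time in the SMF.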
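[0077]
First, at about 1.18 seconds (T), the absolute value of the audio data of music CD-A exceeds the threshold of 1000, so recording of the raw reference audio data begins. The raw reference audio data is then recorded for about 1.49 seconds, that is, until about 2.67 seconds (T).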
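[0078]
Next, timekeeping for calculating delta times starts, with about 2.67 seconds (T) as 0 seconds. The three events then occur and are recorded as follows:

| Event | Delta time (D) | Time on CD-A (T) |
| --- | --- | --- |
| First | 1.25 s | about 3.92 s |
| Second | 2.63 s | about 5.30 s |
| Third | 3.71 s | about 6.38 s |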
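[0079]
As shown in the lower part of Fig. 7, the raw reference audio data, from which the processed reference audio data was generated, is played before 0.00 seconds (D). In the SMF, however, the processed reference audio data is recorded as system exclusive data at 0.00 seconds (D).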
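[0080]
[1.2.2] Reproducing Operation
Next, we describe how the SMF recorded as above is played back and how the audio data of a music CD is synchronized with the MIDI data of the SMF.

The music CD used in this reproducing operation contains the same piece as music CD-A, but it is a different edition. It differs from music CD-A in the following ways:
- The time from the start of playback to the start of the piece is different.
- The level of the audio waveform represented by the audio data is different.
- When the pressing data for this CD was created from the master data of the piece, the audio data was edited for sound effects and the like. Its content therefore differs slightly from the data of the same piece on music CD-A.

To distinguish it from music CD-A, the music CD used in the reproducing operation is called music CD-B.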
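[0081]
[1.2.2.1] Starting the Reproduction
The user loads music CD-B into the music CD drive 1 and the FD holding the SMF into the FD drive 2. The user then presses the key on the operation/display unit 5 that corresponds to starting performance data reproduction. The operation/display unit 5 outputs the corresponding signal to the controller unit 6.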
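[0082]
When the CPU 62 receives the signal to start performance data reproduction, it first sends the FD drive 2 a command to transmit the SMF. The FD drive 2 reads the SMF from the FD and sends it to the controller unit 6. The CPU 62 receives the SMF and records it in the RAM 64.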
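[0083]
Next, the CPU 62 sends a play command to the music CD drive 1. In response, the music CD drive 1 sends the audio data recorded on music CD-B to the controller unit 6 in sequence. The controller unit 6 receives one left/right pair of data every 1/44,100 second.

The values received by the CPU 62 are written here as (r(n), l(n)). The range of values of r(n) and l(n), the meaning of n, and the definition of "sample value" are the same as for R(n) and L(n).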
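[0084]
[1.2.2.2] Sending Audio Data to the Sound Generating Unit
As the CPU 62 receives the sample values (r(0), l(0)), (r(1), l(1)), (r(2), l(2)), ... from the music CD drive 1, it sends them to the sound generating unit 4. The sound generating unit 4 converts the sample values into sound and outputs it. As a result, the user can hear the piece recorded on music CD-B.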
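[0085]
[1.2.2.3] Correlation Determination Processing
While sending the received sample values to the sound generating unit 4, the CPU 62 also sends the DSP 63 a command to execute the correlation determination processing. It then forwards the received sample values to the DSP 63 in sequence.

The correlation determination processing judges how similar two data sets are:
- the processed determination audio data, generated from the series of sample values received from the music CD drive 1;
- the processed reference audio data contained in the SMF.

The correlation determination processing is described below with reference to Fig. 8.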
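[0086]
The DSP 63 receives the command to execute the correlation determination processing from the CPU 62, followed by the sample values (r(0), l(0)), (r(1), l(1)), (r(2), l(2)), ... in sequence. It records each received sample value in the RAM 64. Hereinafter, the series of 65,536 sample values beginning with (r(n), l(n)) is called "raw determination audio data (n)".

When the DSP 63 receives the 65,536th sample value, (r(65535), l(65535)), and records it in the RAM 64, it reads (r(0), l(0)) through (r(65535), l(65535)) from the RAM 64. This series is raw determination audio data (0). The DSP 63 applies to it the correlation determination data generation processing described above, that is, the same processing as steps S1 to S8 in Fig. 4. This produces 256 sample values, which are recorded in the RAM 64 (step S11).

Hereinafter, the 256 sample values obtained by applying the correlation determination data generation processing to raw determination audio data (n) are written Y_n(0) to Y_n(255). This series is called "processed determination audio data (n)".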
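[0087]
Next, the DSP 63 reads two data sets from the RAM 64 (step S12):
- the processed reference audio data from the system exclusive event of the SMF, that is, X(0) to X(255);
- processed determination audio data (0), recorded in step S11, that is, Y_0(0) to Y_0(255).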
[0088]
Next, the DSP 63 performs the two tests given by the following equations (2) and (3) (step S13).
[Equation 2]
$$\frac{\sum_{m=0}^{255} X(m)\,Y_0(m)}{\sum_{m=0}^{255} X(m)^2} \ge p \qquad (2)$$
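[Equation 3]
$$\frac{\left(\sum_{m=0}^{255} X(m)\,Y_0(m)\right)^2}{\sum_{m=0}^{255} X(m)^2 \; \sum_{m=0}^{255} Y_0(m)^2} \ge q \qquad (3)$$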
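[0089]
The left side of equation (2) approaches 1 as the values of X(m) and Y_0(m) become closer. Pair the processed reference audio data with processed determination audio data (0) index by index. The more closely the values in each pair agree, the larger the left side becomes. In the following description, this left-side value is called the absolute correlation index.

The value of p can be changed freely within the range 0 to 1. It is determined empirically so that the test of equation (2) behaves as follows:
- It gives a positive result (hereinafter "Yes") when the processed reference audio data and the processed determination audio data were generated from the same part of the piece.
- It gives a negative result (hereinafter "No") when they were obtained from different parts of the piece, even if those parts are similar.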
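[0090]
The left side of equation (3) takes values from 0 to 1. It approaches 1 as the shape of the waveform represented by X(m) comes closer to being a scaled copy of the waveform represented by Y_0(m). In the following description, this left-side value is called the relative correlation index.

The two indices respond differently to recording level, even when the processed reference and determination data come from the same part of the piece:
- Absolute correlation index: if the waveform of the processed determination audio data is lower in level than that of the processed reference audio data, the index falls below 1 in proportion to that level. If it is higher in level, the index rises above 1.
- Relative correlation index: it stays close to 1 in either case.

The test of equation (3) therefore gives Yes even when different editions of a music CD were recorded at different levels. The value of q can be changed freely within the range 0 to 1 and, like p, is determined empirically.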
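Equations (2) and (3), and the step S13 test, can be written directly from their definitions. In this sketch, p = 0.5 and q = 0.8 are the values quoted for Fig. 9. The function names are hypothetical, and the zero-energy guards are an added assumption.

```python
from typing import Sequence


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x * y for x, y in zip(a, b))


def absolute_index(x: Sequence[float], y: Sequence[float]) -> float:
    """Left side of Eq. (2): sum X*Y / sum X^2 (scales with the level of Y)."""
    energy_x = _dot(x, x)
    if energy_x == 0:
        raise ValueError("reference data must not be all zeros")
    return _dot(x, y) / energy_x


def relative_index(x: Sequence[float], y: Sequence[float]) -> float:
    """Left side of Eq. (3): (sum X*Y)^2 / (sum X^2 * sum Y^2), between 0 and 1."""
    energy_x, energy_y = _dot(x, x), _dot(y, y)
    if energy_x == 0 or energy_y == 0:
        return 0.0
    sxy = _dot(x, y)
    return sxy * sxy / (energy_x * energy_y)


def step_s13(x: Sequence[float], y: Sequence[float],
             p: float = 0.5, q: float = 0.8) -> bool:
    """Yes only if both Eq. (2) and Eq. (3) hold."""
    return absolute_index(x, y) >= p and relative_index(x, y) >= q
```

A copy of X at half the level gives an absolute index of 0.5 and a relative index of 1.0, so it passes both tests. The same values in reverse order still pass Eq. (2) but fail Eq. (3).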
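[0091]
If either or both of the step S13 tests give No, the DSP 63 stops working on processed determination audio data (0). It then waits for the CPU 62 to report that the next sample value has been written to the RAM 64.

When the CPU 62 receives a new sample value from the music CD drive 1 (step S14), it records the value in the RAM 64 and notifies the DSP 63. On receiving this notification, the DSP 63 returns to step S11. This time, instead of raw determination audio data (0), it processes the raw determination audio data whose last sample value is the one just recorded. As a result, the n-th execution of step S11 records processed determination audio data (n-1) in the RAM 64.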
[0092]
If both step S13 tests give Yes, the DSP 63 performs the tests given by the following equations (4) and (5) (step S15).
[Equation 4]
$$\frac{d}{dn}\sum_{m=0}^{255} X(m)\,Y_n(m) = 0 \qquad (4)$$
[Equation 5]
$$\frac{d^2}{dn^2}\sum_{m=0}^{255} X(m)\,Y_n(m) < 0 \qquad (5)$$
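[0093]
The left side of equation (4) is the rate of change, with respect to n, of the sum of products of X(m) and Y_n(m). In the following description, this sum of products is called the correlation value. Pair the processed reference audio data with processed determination audio data (n) index by index. The more closely the values in each pair agree, the larger the correlation value.

Arrange the correlation values for X(m) and Y_0(m), X(m) and Y_1(m), and so on as a time series. Where the correlation value reaches an extremum, its rate of change is 0. The test of equation (4) therefore checks whether the correlation value is at an extremum. The test of equation (5) checks that this extremum is a maximum.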
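[0094]
When n = 0, however, there is no preceding correlation value, so the test cannot be made. In this embodiment, step S15 therefore gives No when n = 0.

The reason is as follows. The raw reference audio data was not taken from the beginning of music CD-A. It was taken from the point at which the waveform first exceeded the threshold. It is therefore extremely unlikely that the matching audio data lies at the very beginning of music CD-B.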
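[0095]
More precisely, X(m) and Y_n(m) are discrete values in this embodiment, so the left side of equation (4) rarely equals exactly 0. In practice, the step S15 test is carried out as follows:

1. The DSP 63 computes the difference between the sum of products of X(m) and Y_n(m) and the sum of products of X(m) and Y_{n-1}(m). This difference is hereinafter called D_n.
2. The DSP 63 checks whether D_{n-1} is greater than 0 and D_n is 0 or less.
3. If both conditions hold, the rate of change of the correlation value has gone from positive to 0, or has crossed 0, at D_n. The correlation value at that point is therefore a maximum or an approximation of one, and step S15 gives Yes.

This procedure requires n ≥ 2. For n = 1, step S15 gives No, for the same reason as for n = 0.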
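The discrete procedure for step S15 can be sketched as a scan over the correlation values C_n = Σ X(m)·Y_n(m), gated by the step S13 result. This is an illustration under assumptions. The correlation values and gate results are supplied as precomputed lists, whereas the apparatus computes them one window at a time as samples arrive. The function name is hypothetical.

```python
from typing import Optional, Sequence


def first_match(correlations: Sequence[float],
                s13_passed: Sequence[bool]) -> Optional[int]:
    """Return the first n at which step S15 answers Yes, or None.

    D_n = C_n - C_(n-1); Yes when D_(n-1) > 0 and D_n <= 0, i.e. the
    correlation value has just stopped rising (a local maximum).
    """
    if len(correlations) != len(s13_passed):
        raise ValueError("inputs must have the same length")
    for n in range(2, len(correlations)):  # n = 0 and n = 1 are always No
        if not s13_passed[n]:
            continue  # step S13 said No, so step S15 is not evaluated
        d_prev = correlations[n - 1] - correlations[n - 2]
        d_cur = correlations[n] - correlations[n - 1]
        if d_prev > 0 and d_cur <= 0:
            return n
    return None
```

As the text notes, the n reported is at the true maximum or one step past it. Because each step advances by one CD sample, that is within about 23 µs (1/44,100 s).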
[0096]
If the step S15 test gives No, the DSP 63 waits for the CPU 62 to report that a new sample value has been written. When that report arrives (step S14), the DSP 63 returns to step S11, and new processed determination audio data is recorded in the RAM 64.
[0097]
Whenever step S13 or step S15 gives No, processing returns through step S14 to step S11, and the DSP 63 again performs steps S12 to S15. The DSP 63 thus moves through processed determination audio data (0), (1), (2), ... in turn until step S15 gives Yes.
[0098]
As an example, assume that the piece on music CD-B is recorded 51,600 samples, or about 1.17 seconds, later as a whole than the piece on music CD-A, measured from the start of playback. The raw reference audio data was taken from music CD-A as (R(52156), L(52156)) through (R(117691), L(117691)). The matching audio data on music CD-B is therefore (r(103756), l(103756)) through (r(169291), l(169291)).
[0099]
In this case, step S13 or step S15 gives No for processed determination audio data (0) through (103755). The raw determination audio data (0) through (103755), from which they were generated, does not correspond to the raw reference audio data, so the correlation is insufficient.
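[0100]
For processed determination audio data (103756), however, the DSP 63 gets Yes from step S13 and then Yes from step S15. This is because raw determination audio data (103756), from which it was generated, corresponds to the raw reference audio data, so the correlation is sufficient. The DSP 63 then ends the correlation determination processing and notifies the CPU 62 that it has succeeded.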
[0101]
Fig. 9 plots the expressions used in the step S13 and step S15 tests, calculated for real audio data. The following settings were used:

| Step | Setting |
| --- | --- |
| S3 (Fig. 4) | Single-stage IIR (Infinite Impulse Response) high-pass filter, 25 Hz |
| S5 (Fig. 4) | Comb filter constants k = 4410, f = 1 |
| S6 (Fig. 4) | Single-stage IIR low-pass filter, 25 Hz |
| S13 (Fig. 8) | Test constants p = 0.5, q = 0.8 |
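[0102]
Fig. 9 contains three graphs, each plotted against n:
- Upper graph: the numerator of the left side of equation (2), and the right side after the denominator has been moved across to it.
- Middle graph: the same two quantities for equation (3).
- Lower graph: the left side of equation (4).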
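[0103]
Fig. 9 shows the following:
- When n lies in interval A, the numerator of equation (2) is equal to or greater than the right side with the denominator moved across, so equation (2) is satisfied.
- When n lies in interval B, which is inside interval A, the same holds for equation (3), so equation (3) is also satisfied. Step S13 therefore gives Yes.
- Within interval B, at the value of n marked by arrow C, the left side of equation (4) goes from a positive value to 0, and equation (5) is also satisfied. Step S15 therefore gives Yes.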
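[0104]
[1.2.2.4] Reproducing the MIDI Events
When the CPU 62 is notified by the DSP 63 that the correlation determination processing has succeeded, it starts timekeeping, taking that moment as 0 seconds. At the same time, the CPU 62 reads the SMF from the RAM 64 and compares the elapsed time with each delta time in the SMF in turn. When the elapsed time matches a delta time, the CPU 62 sends the corresponding MIDI event to the automatic piano unit 3.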
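[0105]
In the automatic piano unit 3, the MIDI event control circuit 34 forwards each MIDI event it receives from the CPU 62 to either the tone generator 35 or the drive unit 36.
- If the events go to the tone generator 35, it sends audio data for the instrument sounds to the sound generating unit 4 in turn. The sound generating unit 4 plays that performance through the speaker 44 together with the piece from music CD-B, which is already playing.
- If the events go to the drive unit 36, it moves the keys and pedals of the piano 31 according to the events.

In either case, the user hears the piece recorded on music CD-B together with the instrument performance recorded in the SMF.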
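[0106]
[1.2.2.5] Temporal Relationship Between Audio Data and MIDI Events
As described above, the user can play the music CD and the MIDI events in the SMF at the same time. The difference in the start time of the piece between music CD-A and music CD-B is compensated, so the CD playback and the MIDI playback stay together.

The temporal relationship between the audio data of the two CDs and the MIDI events is summarized below with reference to Fig. 10. Fig. 10 shows an example in which the waveform of music CD-B is lower in level overall than that of music CD-A. Times measured from the start of playback of music CD-B, taken as 0 seconds, are suffixed with (T').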
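[0107]
Suppose the difference in start time between music CD-A and music CD-B were not compensated, and the MIDI events were timed from the start of CD playback. The events would then be sent to the automatic piano unit 3 at 3.92 seconds (T'), 5.30 seconds (T'), and 6.38 seconds (T'). The MIDI performance would therefore come too early relative to the piece on the CD.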
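[0108]
In fact, for about the first 3.84 seconds of playback of music CD-B, the raw determination audio data taken from music CD-B differs greatly from the raw reference audio data taken earlier from music CD-A. The processed data generated from the two therefore do not correlate sufficiently, and MIDI playback does not begin.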
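[0109]
At about 3.84 seconds (T'), the two sets of audio data correlate sufficiently and are judged to come from the same part of the piece on the two CDs. This moment corresponds to about 2.67 seconds (T) on music CD-A, and the delta times are measured from it. The MIDI events are therefore sent to the automatic piano unit 3 as follows:
- first event: about 5.09 seconds (T');
- second event: about 6.47 seconds (T');
- third event: about 7.55 seconds (T').

In this way, the sending times of the MIDI events are adjusted, and the MIDI performance plays at the correct timing relative to the piece on music CD-B.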
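The figures in this example follow from the sample indices alone. The short check below reproduces the times quoted for music CD-A and music CD-B. It is a sketch that assumes the decision instant is the playback time of the last sample of the matching window, that is, the sample index divided by 44,100. The function name is hypothetical.

```python
from typing import List, Sequence

FS = 44100


def event_times(last_window_sample: int, delta_times: Sequence[float],
                fs: int = FS) -> List[float]:
    """Disc times (s) at which MIDI events fire when timing starts at the window end."""
    anchor = last_window_sample / fs
    return [round(anchor + d, 2) for d in delta_times]


deltas = [1.25, 2.63, 3.71]
print(event_times(117691, deltas))       # recording on CD-A: [3.92, 5.3, 6.38]
print(event_times(169291, deltas))       # playback on CD-B: [5.09, 6.47, 7.55]
print(round((169291 - 117691) / FS, 2))  # offset between the discs: 1.17
```

The 1.17-second offset is simply 51,600 samples at 44,100 Hz. The playback side never needs to know this offset explicitly, because it is implied by the point at which the match occurs.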
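[0110]
[2] Second Embodiment
In the second embodiment of the present invention, the time codes recorded on the music CD are used to synchronize playback of the CD's audio data with playback of the MIDI events in the SMF.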
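[0111]
[2.1] Music CD Drive
In the second embodiment, the overall configuration, the functions of each component, and the MIDI data format are the same as in the first embodiment, except for the function of the music CD drive 1. Only the music CD drive 1 is described here.
In the second embodiment, the music CD drive 1 sends the time codes recorded on the music CD to the controller unit 6 along with the audio data. In all other respects it is the same as in the first embodiment.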
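[0112]
[2.2] Operation
The operation of the synchronized recording/reproducing apparatus SS in the second embodiment differs from the first embodiment in three respects:
(1) The system exclusive event in the SMF also records the time code at which the raw reference audio data begins. (The raw reference audio data is the data from which the processed reference audio data is generated.)
(2) For the other MIDI events in the SMF, the time code at which each event occurred is recorded as its delta time.
(3) During playback, MIDI events are sent to the automatic piano unit 3 according to the time codes from the music CD drive 1, not according to timekeeping with the clock of the controller unit 6.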
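[0113]
All other operations in the second embodiment are the same as in the first embodiment, so they are not described in detail. As in the first embodiment, music CD-A is used for recording and music CD-B for playback. Time codes are expressed in hours, minutes, seconds, and frames. For simplicity, however, the following description expresses them in seconds, as it does for the delta times in the SMF.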
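[0114]
[2.2.1] Recording Operation
In the second embodiment, when the user instructs the start of performance data recording on the operation/display unit 5, the music CD drive 1 sends the audio data of music CD-A, together with its time codes, to the controller unit 6 in sequence.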
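[0115]
In the controller unit 6, the CPU 62 passes the received audio data to the sound generating unit 4, which plays the piece on music CD-A. When the absolute value of a received sample value first exceeds the predetermined threshold, the CPU 62 takes the time code received just before it, converts it to delta-time format, and records it in the RAM 64. In this example, "1.18 seconds" is recorded. This delta time is hereinafter called the "reference audio data start time".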
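[0116]
As soon as it records the reference audio data start time, the CPU 62 begins recording sample values in the RAM 64. About 1.49 seconds of sample values are recorded as the raw reference audio data.
When the CPU 62 has finished recording the raw reference audio data, the DSP 63 reads it from the RAM 64 and applies the correlation determination data generation processing to it. The resulting processed reference audio data is recorded in the RAM 64.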
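[0117]
While the DSP 63 performs the correlation determination data generation processing, the user begins playing the piano 31 along with the piece on music CD-A. The user's performance is sent from the automatic piano unit 3 to the controller unit 6 as MIDI events. When the CPU 62 receives a MIDI event, it takes the time code most recently received from the music CD drive 1, converts it to delta-time format, and records it in the RAM 64 together with the MIDI event.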
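[0118]
When the piece on music CD-A has ended and the user has finished playing, the user instructs the end of performance data recording on the operation/display unit 5. The music CD drive 1 then stops playing music CD-A. Next, the CPU 62 reads the following from the RAM 64:
- the reference audio data start time;
- the processed reference audio data;
- the MIDI events from the user's performance;
- the delta times associated with those MIDI events.

It combines these data to generate an SMF.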
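[0119]
Fig. 11 outlines the SMF generated by the CPU 62. In this SMF, the system exclusive event stores the reference audio data start time as well as the processed reference audio data. The delta time of each other MIDI event holds the time given by the time code the CPU 62 received at almost the same moment as that event. For example, the delta time of the first event is 3.92 seconds, which means that the first event occurred 3.92 seconds after playback of music CD-A began.
The CPU 62 sends the SMF to the FD drive 2 with a write command, and the FD drive 2 writes it to the FD.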
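[0120]
[2.2.2] Reproducing Operation
Next, we describe how the SMF recorded as above is played back and synchronized with the audio data of music CD-B.
When the user instructs the start of performance data reproduction on the operation/display unit 5, the FD drive 2 first sends the SMF on the FD to the CPU 62, which records it in the RAM 64. The music CD drive 1 then starts playing music CD-B and sends its audio data and time codes to the controller unit 6 in sequence. The CPU 62 passes the audio data to the sound generating unit 4, which plays the piece on music CD-B. At the same time, the CPU 62 records the audio data, together with its time codes, in the RAM 64.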
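[0121]
When the CPU 62 has recorded the 65,536th sample value in the RAM 64, the DSP 63 starts the correlation determination processing on the recorded audio data. Until step S15 of Fig. 8 gives Yes, the DSP 63 keeps generating processed determination audio data from the advancing raw determination audio data and repeats the step S13 and step S15 tests on it.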
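[0122]
Step S15 gives Yes for processed determination audio data (103756). The DSP 63 then ends the correlation determination processing and notifies the CPU 62 that it has succeeded. The notification includes the number "103756", identifying processed determination audio data (103756) as the last data used.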
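[0123]
On receiving the notification, the CPU 62 uses the number "103756" to look up the time code recorded in the RAM 64 with the first sample value of raw determination audio data (103756), that is, (r(103756), l(103756)). In this example, that time code is 2.35 seconds. The CPU 62 then calculates the difference between this time and the reference audio data start time stored in the system exclusive event of the SMF.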
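[0124]
Here the reference audio data start time is 1.18 seconds, so the difference is 1.17 seconds. This means that the delta times in the SMF are, as a whole, 1.17 seconds early relative to the piece on music CD-B. The CPU 62 therefore adds 1.17 seconds to every delta time in the SMF:

| Event | Before | After |
| --- | --- | --- |
| First | 3.92 s | 5.09 s |
| Second | 5.30 s | 6.47 s |
| Third | 6.38 s | 7.55 s |

This operation is hereinafter called "timing adjustment processing".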
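Timing adjustment processing, together with the comparison against incoming time codes, can be sketched as follows. The names are hypothetical, and times are plain seconds, as in the text's simplification. Time codes arrive in discrete steps and may never exactly equal a delta time. The sketch therefore assumes an event is due once the received time code reaches or passes its delta time. If the matched start comes earlier than the recorded start, the offset is negative, and the delta times are reduced.

```python
from typing import List, Sequence, Tuple


def adjust_delta_times(delta_times: Sequence[float], reference_start: float,
                       matched_start: float) -> List[float]:
    """Shift every delta time by (matched start on this disc - recorded start)."""
    offset = matched_start - reference_start  # negative if this disc runs earlier
    return [t + offset for t in delta_times]


def due_events(events: Sequence[Tuple[float, str]], timecode: float,
               next_index: int) -> Tuple[List[str], int]:
    """Return events whose delta time has been reached, and the next index to check.

    `events` must be sorted by delta time; `next_index` is where the last call stopped.
    """
    fired: List[str] = []
    while next_index < len(events) and events[next_index][0] <= timecode:
        fired.append(events[next_index][1])
        next_index += 1
    return fired, next_index


adjusted = adjust_delta_times([3.92, 5.30, 6.38], reference_start=1.18, matched_start=2.35)
print([round(t, 2) for t in adjusted])  # [5.09, 6.47, 7.55]
```

The same function covers the third embodiment. There, the reference start is 180 seconds and the match is found at 181.17 seconds, which gives the same 1.17-second shift.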
[0125]
The CPU 62 then compares each time code of music CD-B arriving from the music CD drive 1 with the adjusted delta times. When the times match, it sends the corresponding MIDI event to the automatic piano unit 3.
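[0126]
The automatic piano unit 3 plays automatically according to the MIDI events sent from the controller unit 6. As a result, the user hears the piece recorded on music CD-B together with the performance recorded in the SMF.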
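[0127]
[2.2.3] Temporal Relationship Between Audio Data and MIDI Events
Fig. 12 shows how the audio data of music CD-A and music CD-B relate in time to the MIDI events, during both the recording and the reproducing operations.
The upper diagram of Fig. 12 covers the recording operation. It shows the times given by the time codes of music CD-A alongside the delta times of the recorded MIDI events. As the diagram shows, the time code at the moment each MIDI event occurs is recorded unchanged as that event's delta time.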
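[0128]
The middle diagram of Fig. 12 covers the reproducing operation. It shows the times given by the time codes of music CD-B alongside the delta times after timing adjustment processing.

If the MIDI events were played against the time codes of music CD-B using the unadjusted delta times, they would come early relative to the piece on music CD-B. Timing adjustment processing removes this offset. With the adjusted delta times, the MIDI events play at the correct timing relative to the piece on music CD-B.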
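[0129]
The music CD drive 1 generates a 44,100 Hz clock signal by dividing down the reference clock from its own oscillator. It sends the audio data on the music CD to the controller unit 6 at the rate set by this clock. If the oscillator is unstable, the playback speed may differ slightly from one playback to the next, even for exactly the same music CD.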
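[0130]
The lower diagram of Fig. 12 shows what happens when music CD-B plays slightly faster than in the middle diagram. It plots the times given by the time codes of music CD-B against the adjusted delta times.

If the MIDI events were timed by the clock of the CPU 62, they would lag slightly behind the piece on music CD-B as a whole. Suppose the middle diagram of Fig. 12 represents time according to the CPU 62's clock, and that this clock and its frequency division are error-free. Errors in the clock signal of the music CD drive 1 and its frequency division would then make each event play late relative to music CD-B: the first by t1, the second by t2, and the third by t3.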
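The lag described for the lower diagram grows with elapsed time. The following back-of-the-envelope model is an assumption and is not taken from the text. Suppose the drive runs `speed_ratio` times faster than nominal. The audio for disc time t then plays at wall-clock time t / speed_ratio, while an event timed by the controller's own clock fires at t. The event therefore lags by t·(1 - 1/speed_ratio). The 0.1 % speed error used below is hypothetical.

```python
def clock_lag_seconds(disc_time_s: float, speed_ratio: float) -> float:
    """Lag of an event scheduled on the controller clock behind the faster-running disc."""
    if speed_ratio <= 0:
        raise ValueError("speed_ratio must be positive")
    return disc_time_s * (1.0 - 1.0 / speed_ratio)


for t in (5.09, 6.47, 7.55):  # adjusted delta times from the second embodiment
    print(f"{t:.2f} s -> lag {clock_lag_seconds(t, 1.001) * 1000:.2f} ms")
```

Under this model, the three events lag by roughly 5 to 7.5 ms, and the lag grows in the order t1 < t2 < t3, as in Fig. 12. Following the time codes from the CD itself removes any dependence on the controller's clock.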
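[0131]
In the second embodiment, however, MIDI events are played according to the time codes that the music CD drive 1 sends to the CPU 62 in real time. The MIDI events therefore never drift out of time with the piece on music CD-B.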
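[0132]
[3] Third Embodiment
In the third embodiment of the present invention, the raw reference audio data is taken from the middle of the piece on the music CD rather than from its beginning. As in the second embodiment, the time codes recorded on the music CD are used to synchronize playback of the MIDI events in the SMF.
The overall configuration, the functions of each component, and the MIDI data format are the same as in the second embodiment, so they are not described again.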
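[0133]
[3.1] Operation
The operation of the synchronized recording/reproducing apparatus SS in the third embodiment differs from the second embodiment in two respects:
(1) During recording, the raw reference audio data is taken from the middle of the piece on the music CD.
(2) During playback, the correlation determination processing on the CD's audio data first determines when the MIDI events should play. The audio data on the CD is then played again from the beginning.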
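[0134]
[3.1.1] Recording Operation
In the recording operation of the third embodiment, any part of the audio data on the music CD can be taken as the raw reference audio data. For example, the raw reference audio data may be about 1.49 seconds of sample values starting 3 minutes into the piece. Alternatively, it may be about 1.49 seconds of sample values that include a part of the piece with a distinctive waveform. In the following example, about 1.49 seconds starting at a time code of 3 minutes, that is, 180 seconds, into the piece on music CD-A is taken as the raw reference audio data.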
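[0135]
When the user instructs the start of performance data recording, the music CD drive 1 first sends the CPU 62 65,536 frames of the audio stream, starting 180 seconds into the piece on music CD-A. The CPU 62 converts the first time code in the received stream to delta-time format and records it in the RAM 64 as the reference audio data start time. It also records the sample values of the audio data in the stream in the RAM 64 as the raw reference audio data. The CPU 62 applies the correlation determination data generation processing to this raw reference audio data, and the resulting processed reference audio data is recorded in the RAM 64.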
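[0136]
The music CD drive 1 then plays music CD-A from the beginning. The CPU 62 receives the audio stream from the music CD drive 1 and passes the audio data it contains to the sound generating unit 4. The user plays the piano 31 along with the piece on music CD-A, and the performance is sent to the CPU 62 as MIDI events. When the CPU 62 receives a MIDI event, it takes the time code most recently received from the music CD drive 1, converts it to delta-time format, and records it in the RAM 64 together with the MIDI event.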
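[0137]
When the user instructs the end of performance data recording, the music CD drive 1 stops playing music CD-A. At the same time, the CPU 62 generates the SMF shown in Fig. 13 from the data recorded in the RAM 64. The FD drive 2 writes the SMF to the FD.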
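[0138]
[3.1.2] Reproducing Operation
Next, we describe how the SMF recorded as above is played back in synchronization with music CD-B. When the user instructs the start of performance data reproduction, the FD drive 2 first sends the SMF to the CPU 62, which records it in the RAM 64. The music CD drive 1 then sends the audio data and time codes of music CD-B to the CPU 62 in sequence.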
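[0139]
When the CPU 62 receives the 65,536th sample value of audio data, it starts the correlation determination processing on the samples received so far. Under the control of the CPU 62, the DSP 63 advances the raw determination audio data as new sample values arrive. The processing continues until step S15 of Fig. 8 gives Yes.

The piece on music CD-B lags the piece on music CD-A by about 1.17 seconds overall. The CPU 62 therefore gets Yes from step S15 when it has received the sample value about 182.66 seconds into music CD-B. That is the point at which it processes the raw determination audio data ending with that sample value. The correlation determination processing then ends. The time code of the first sample of the raw determination audio data that produced the match is 181.17 seconds.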
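[0140]
Next, the CPU 62 calculates the difference between this time code and the reference audio data start time (180 seconds) stored in the system exclusive event of the SMF. The difference is 1.17 seconds, so the CPU 62 adds 1.17 seconds to every delta time in the SMF. As in the second embodiment, this adjusts each delta time to the correct timing for the piece on music CD-B.

This completes the determination of when the MIDI events should play. During this processing, the audio of music CD-B is not sent to the sound generating unit 4, so the user does not hear it.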
【０１４１】
以上の処理が終了すると、音楽ＣＤドライブ１は再度、音楽ＣＤ−Ｂの再生を楽曲の先頭から行う。音楽ＣＤ−Ｂの楽曲の音声データは、ＣＰＵ６２を介して発音部４に送信され、ユーザは発音部４から楽曲の音を聴くことができる。同時に、ＣＰＵ６２は音楽ＣＤドライブ１から受信する音楽ＣＤ−Ｂのタイムコードと、更新後のＳＭＦにおけるデルタタイムを順次比較し、それらの時間情報が一致すると、そのデルタタイムに対応するＭＩＤＩイベントを自動演奏ピアノ部３に送信する。その結果、自動演奏ピアノ部３による自動演奏が行われる。
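The playback step described above, in which each time code from the drive is compared against the adjusted delta times, can be sketched as follows. This is a minimal illustration under stated assumptions, not the apparatus's firmware: the event format `(delta_time_seconds, midi_bytes)` and the function name `dispatch_events` are hypothetical. Rather than requiring an exact match, the sketch sends an event once the time code has reached or passed its delta time, so that events falling between two consecutive time codes are not skipped.

```python
from typing import Iterable, List, Tuple

# Hypothetical event format: (delta time in seconds, raw MIDI bytes),
# already shifted for the CD being played and sorted by time.
Event = Tuple[float, bytes]


def dispatch_events(events: List[Event], time_codes: Iterable[float]) -> List[Event]:
    """Return (time code, MIDI bytes) pairs in the order they would be sent.

    For each time code received from the CD drive, every pending event whose
    delta time is less than or equal to that time code is sent.
    """
    sent: List[Event] = []
    next_index = 0
    for time_code in time_codes:
        while next_index < len(events) and events[next_index][0] <= time_code:
            sent.append((time_code, events[next_index][1]))
            next_index += 1
    return sent


if __name__ == "__main__":
    song = [(1.0, bytes([0x90, 60, 100])), (1.7, bytes([0x80, 60, 0]))]
    for tc, msg in dispatch_events(song, [0.0, 0.5, 1.0, 1.5, 2.0]):
        print(tc, msg.hex())
```

With time codes 0.5 s apart, the note-off at 1.7 s goes out at time code 2.0; finer time-code spacing reduces this lag.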
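[0142]
FIG. 14 is a schematic diagram showing the relationship among the reference raw audio data, the reference processed audio data, the determination raw audio data, and the determination processed audio data in the third embodiment. The reference raw audio data consists of about 1.49 seconds of audio data taken from the point at which time T1 has elapsed from the beginning of music CD-A. The correlation determination data generation processing is applied to this reference raw audio data to generate the reference processed audio data, which is stored at the head of the SMF together with time information indicating time T1.
On music CD-B, the audio data corresponding to the reference raw audio data of music CD-A is recorded as about 1.49 seconds of audio data starting at the point at which time T2 has elapsed from the beginning.
[0143]
The delta times of the MIDI events in the SMF are adjusted on the basis of the difference between T1 and T2: if T1 is smaller than T2, the difference is added to the delta times in the SMF, and if T1 is larger than T2, the difference is subtracted from them.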
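The adjustment rule above reduces to adding the signed offset T2 - T1 to every delta time, which covers both the "add" case (T1 < T2) and the "subtract" case (T1 > T2). A minimal sketch, assuming delta times are held as absolute seconds as in this description; the function name `shift_delta_times` and the event tuple format are hypothetical:

```python
from typing import List, Tuple

Event = Tuple[float, bytes]  # hypothetical: (absolute delta time in s, MIDI bytes)


def shift_delta_times(events: List[Event], t1: float, t2: float) -> List[Event]:
    """Re-time events recorded against CD-A so they line up with CD-B.

    t1: start of the reference audio data on CD-A (stored in the SMF).
    t2: start of the matching audio data found on CD-B.
    A positive offset (t1 < t2) delays every event; a negative offset
    (t1 > t2) advances every event by the same amount.
    """
    offset = t2 - t1
    return [(time + offset, message) for time, message in events]
```

With the values of the third embodiment, T1 = 180.0 s and T2 = 181.17 s, every delta time grows by 1.17 s (up to floating-point rounding).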
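[0144]
[4] Modifications
The first, second, and third embodiments described above are each examples of embodiments of the present invention, and various modifications can be made to them without departing from the spirit of the invention. Modifications are described below.
[0145]
[4.1] First modification
In the first modification, the components of the synchronous recording/reproducing apparatus SS are not housed in a single apparatus but are arranged separately in groups.
For example, they can be separated into the following groups:
(1) Music CD drive 1
(2) FD drive 2
(3) Automatic performance piano unit 3
(4) Mixer 41 and D/A converter 42
(5) Amplifier 43
(6) Speaker 44
(7) Operation display unit 5 and controller unit 6
Furthermore, the controller unit 6 may be split into an apparatus that performs only the recording operation and an apparatus that performs only the playback operation.
[0146]
The groups of components are connected by audio cables, MIDI cables, optical audio cables, USB (Universal Serial Bus) cables, dedicated control cables, and the like. Commercially available products may also be used for the FD drive 2, the amplifier 43, the speaker 44, and so on.
According to the first modification, the synchronous recording/reproducing apparatus SS can be arranged more flexibly, and the user can reduce costs by acquiring only the components actually needed instead of the entire apparatus SS.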
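[0147]
[4.2] Second modification
In the second modification, the synchronous recording/reproducing apparatus SS has neither the music CD drive 1 nor the FD drive 2. Instead, the communication interface can connect to a LAN (Local Area Network) and is connected to external communication devices via the LAN and a WAN. In addition, the controller unit 6 has an HD (Hard Disk).
[0148]
The controller unit 6 receives digital audio data, including audio data and time codes, from other communication devices via the LAN and records it on the HD. Likewise, the controller unit 6 receives from other communication devices via the LAN an SMF created for that digital audio data and records the received SMF on the HD.
[0149]
Instead of receiving the audio data and time codes of a music CD from the music CD drive 1, the controller unit 6 reads the digital audio data from the HD. Likewise, instead of writing and reading the SMF via the FD drive 2, the controller unit 6 performs the same operations on the HD.
According to the second modification, the user can exchange digital audio data and SMFs via the LAN with geographically distant communication devices. The LAN may also be connected to a wide area network such as the Internet.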
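[0150]
[4.3] Third modification
In the embodiments described above, steps S13 and S15 of the correlation determination processing use all three judgments: the judgment based on the absolute correlation index, the judgment based on the relative correlation index, and the judgment based on the correlation value. In the third modification, the correlation determination processing uses only one of these judgments or a combination of several of them. The judgment, or combination of judgments, may also be made freely selectable.
According to the third modification, determination results with the required accuracy can be obtained more flexibly.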
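[0151]
[4.4] Fourth modification
In the embodiments described above, step S15 of the correlation determination processing detects a local maximum of the correlation value through the judgments of equations (4) and (5). In the fourth modification, only the judgment of equation (4) is made, and an extremum of the correlation value is detected.
[0152]
More specifically, in step S15 the DSP 63 computes the product of D_{n-1} and D_n and determines whether it is 0 or less. If the product is 0 or less, the rate of change of the correlation value is either 0 or has crossed 0, so the correlation value at that point is an extremum or an approximation of one. Accordingly, when the product of D_{n-1} and D_n is 0 or less, the result of the determination in step S15 is Yes.
[0153]
According to the fourth modification, when a local minimum is unlikely to appear near the local maximum, the same determination result as in step S15 of the embodiments above can be obtained with simpler determination processing.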
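The fourth modification's test can be sketched as below. Equations (4) and (5) are not reproduced in this section, so the sketch assumes that D_n is the first difference C_n - C_{n-1} of successive correlation values C; the function name `first_extremum_index` is hypothetical. Because a product of zero or less fires on any change of slope sign, the sketch reports minima as well as maxima, which is why the shortcut is limited to cases where a minimum is unlikely near the maximum.

```python
from typing import Optional, Sequence


def first_extremum_index(correlation: Sequence[float]) -> Optional[int]:
    """Return the index at which an extremum is first detected, or None.

    Assumption: D_n = C_n - C_{n-1}. As soon as D_{n-1} * D_n <= 0
    (slope zero or changing sign), the extremum lies at index n-1.
    """
    prev_d = None
    for n in range(1, len(correlation)):
        d = correlation[n] - correlation[n - 1]
        if prev_d is not None and prev_d * d <= 0:
            return n - 1
        prev_d = d
    return None
```

For example, the sequence 0.1, 0.4, 0.9, 0.7, 0.2 yields index 2, the position of its maximum.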
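[0154]
[Effects of the Invention]
As described above, according to the present invention, synchronized playback of performance data can be started at the correct timing for any version of the audio data of a piece of music, even when different versions record the piece with different start points. There is therefore no need to prepare separate performance data for different versions of the same piece, which simplifies creating and managing the data.
[0155]
Different versions of the same piece may also be recorded at different levels. According to the present invention, one of the indices used to decide when to start playing back the performance data can be an index indicating how similar the shape of the audio waveform represented by the reference audio data is to the shape of the audio waveform represented by the actual audio data, so the playback start timing can be determined correctly even for versions of the audio data recorded at different levels.
[0156]
Furthermore, when performance data is played back using time codes under the present invention, it is played back at the correct timing relative to the audio data even if the playback speed of that audio data is unstable.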
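[Brief Description of the Drawings]
FIG. 1 is a diagram showing the configuration of a synchronous recording/reproducing apparatus SS according to the first and second embodiments of the present invention.
FIG. 2 is a diagram showing the structure of MIDI events.
FIG. 3 is a diagram showing the structure of an SMF.
FIG. 4 is a flowchart of the correlation determination data generation processing according to the first and second embodiments of the present invention.
FIG. 5 is a diagram showing the configuration of a comb filter according to the first and second embodiments of the present invention.
FIG. 6 is a diagram showing the temporal relationship between audio data and MIDI events in the recording operation according to the first embodiment of the present invention.
FIG. 7 is a diagram showing an outline of the SMF according to the first embodiment of the present invention.
FIG. 8 is a flowchart of the correlation determination processing according to the first and second embodiments of the present invention.
FIG. 9 is a diagram showing the relationship between changes in the values of the calculation formulas and the determination results in the correlation determination processing according to the first and second embodiments of the present invention.
FIG. 10 is a diagram showing the temporal relationship among the audio data in the recording operation, the audio data in the playback operation, and MIDI events according to the first embodiment of the present invention.
FIG. 11 is a diagram showing an outline of the SMF according to the second embodiment of the present invention.
FIG. 12 is a diagram showing the temporal relationship among the audio data in the recording operation, the audio data in the playback operation, and MIDI events according to the second embodiment of the present invention.
FIG. 13 is a diagram showing an outline of the SMF according to the third embodiment of the present invention.
FIG. 14 is a diagram showing the relationship among the reference raw audio data, the reference processed audio data, the determination raw audio data, and the determination processed audio data according to the third embodiment of the present invention.
[Explanation of Reference Numerals]
1: music CD drive, 2: FD drive, 3: automatic performance piano unit, 4: sound generation unit, 5: operation display unit, 6: controller unit, 31: piano, 32: key sensor, 33: pedal sensor, 34: MIDI event control circuit, 35: sound source, 36: drive unit, 41: mixer, 42: D/A converter, 43: amplifier, 44: speaker, 61: ROM, 62: CPU, 63: DSP, 64: RAM, 65: communication interface.
[0001]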
BACKGROUND OF THE INVENTION
The present invention relates to an apparatus and method for reproducing performance data including information related to music performance control in synchronization with reproduction of audio data.
[0002]
[Prior art]
One means of reproducing music is an apparatus that reads audio data from a storage medium such as a music CD (Compact Disc) and generates and outputs sound from the read audio data. Another is an apparatus that reads data containing information for controlling the performance of music from a storage medium such as an FD (Floppy Disk) and performs automatically by using that data to control sound generation in a sound source device. One kind of data containing information for controlling the performance of music is MIDI data, created according to the MIDI (Musical Instrument Digital Interface) standard.
[0003]
Recently, a method has been proposed in which automatic performance using MIDI data is synchronized with reproduction of audio data recorded on a music CD. One of them is a method using a time code recorded on a music CD (see, for example, Patent Document 1 and Patent Document 2). Hereinafter, this method will be described.
[0004]
First, the music CD reproducing device reproduces the audio data and time codes of a music CD. The audio data is output as sound, and the time codes are supplied to a recording device. Here, a time code is data associated with a given unit of audio data; each time code represents the time elapsed from the start of the music to the playback timing of the audio data associated with it. A musical instrument is played along with the playback of the music CD, and MIDI data is sequentially supplied from the instrument to the recording device. When the recording device receives MIDI data from the instrument, it records the MIDI data on a recording medium together with time information indicating when it was received. Likewise, when the recording device receives a time code from the music CD reproducing device, it records the time code on the recording medium together with time information indicating when it was received. As a result, a file containing both time codes and MIDI data is created on the recording medium. In this file, each time code and each item of MIDI data is accompanied by time information indicating the time elapsed from the start of music playback to its own playback time.
[0005]
Once the MIDI data and time codes have been recorded on the recording medium in this way, whenever the audio data of the same music is later played back from the music CD, the MIDI data can be read from the recording medium in synchronization with it to perform automatically. The operation is as follows.
[0006]
First, audio data and time code are reproduced from the music CD by the music CD reproducing device. The audio data is output as sound, and the time code is supplied to a MIDI data reproducing apparatus. At the same time, the playback device reads the MIDI data recorded in the file in accordance with the time information recorded together and sequentially transmits it to the musical instrument capable of automatic performance using the MIDI data. At that time, the playback device adjusts the time lag between the playback of the audio data of the music CD and the playback of the MIDI data based on the time code received from the music CD playback device and the time code read from the file together with the MIDI data. As a result, the synchronized playback of the audio data of the music CD and the MIDI data is realized.
[0007]
Patent Document 1: Japanese Patent Application No. 2002-7872
Patent Document 2: Japanese Patent Application No. 2002-7873
[0008]
[Problems to be solved by the invention]
However, the method using the time codes of a music CD cannot achieve synchronized playback of the audio data and MIDI data for music CDs that carry different time codes, even when they contain the same music.
[0009]
Currently, many different versions of music CDs exist for the same song. Even for the same music, different versions of a music CD can have different amounts of silence before the music, so the time code at the moment the performance actually starts may differ considerably between CDs. Consequently, if MIDI data for synchronized performance created with the conventional time-code technique is used with a different version of a music CD of the same music, the MIDI performance either begins before the music actually starts or does not begin until some time after the music has started, and the MIDI performance is offset from the music on the CD throughout.
[0010]
Thus, with the conventional time-code technique, even music CDs recording the audio data of the same music required separate MIDI data for synchronized performance, one for each variation in the time code corresponding to the actual start of the performance.
[0011]
In view of the situation described above, an object of the present invention is to provide a recording apparatus, a playback apparatus, a recording method, a playback method, and a program that make it possible to reproduce performance data such as MIDI data in synchronization with several versions of audio data of the same music whose actual performance start points differ.
[0012]
[Means for Solving the Problems]
To solve the problems described above, the present invention provides a recording apparatus comprising: first receiving means for receiving audio data representing the audio waveform of music; second receiving means for receiving control data instructing performance control; generating means for generating reference data that abstracts the audio waveform represented by partial data, which is a part of the audio data; and recording means for recording performance data composed of the reference data, the control data, and time data indicating the temporal relationship between the playback timing of the partial data and the reception timing of the control data.
[0013]
The present invention further provides a playback apparatus comprising: first receiving means for receiving performance data that includes reference data abstracting an audio waveform, control data instructing performance control, and time data instructing the execution timing of the performance control; second receiving means for receiving audio data representing the audio waveform of music; selecting means for selecting, as partial data, data in the audio data that represents an audio waveform similar to the waveform represented by the reference data; and transmitting means for transmitting the control data at a timing determined by the playback timing of the partial data and by the time data.
[0014]
The present invention also provides a recording method comprising: a first receiving process of receiving audio data representing the audio waveform of music; a second receiving process of receiving control data instructing performance control; a generating process of generating reference data that abstracts the audio waveform represented by partial data, which is a part of the audio data; and a recording process of recording performance data composed of the reference data, the control data, and time data indicating the temporal relationship between the playback timing of the partial data and the reception timing of the control data.
[0015]
The present invention also provides a playback method comprising: a first receiving process of receiving performance data composed of reference data abstracting an audio waveform, control data instructing performance control, and time data instructing the execution timing of the performance control; a second receiving process of receiving audio data representing the audio waveform of music; a selecting process of selecting, as partial data, data in the audio data that represents an audio waveform similar to the waveform represented by the reference data; and a transmitting process of transmitting the control data at a timing determined by the playback timing of the partial data and by the time data.
[0016]
The present invention also provides a program for causing a computer to execute processing using these recording methods and reproducing methods.
[0017]
The present invention also provides a recording medium on which performance data comprising reference data abstracted from a sound waveform, control data for instructing performance control, and time data for instructing execution timing of the performance is recorded. To do.
[0018]
With an apparatus, method, program, or recording medium configured in this way, the temporal position of the reference data within the audio data being played back can be determined from the similarity of the waveforms represented by the audio data, and the timing for playing back the control data can then be determined from that position. As a result, synchronized playback of the audio data and the control data is achieved.
[0019]
The recording apparatus according to the present invention may further include third receiving means for receiving a time code indicating the playback timing of the audio data, with the recording means generating the time data on the basis of the time information indicated by the time code.
The playback apparatus according to the present invention may further include third receiving means for receiving a time code indicating the playback timing of the audio data, with the transmitting means transmitting the control data on the basis of the time information indicated by the time code.
[0020]
When recording and playback apparatuses configured in this way are used, time is measured according to the time code, so the time data remain accurate even for audio data played back by a device whose playback speed deviates from nominal.
[0021]
In the recording apparatus according to the present invention, the generating means may include filter means for removing the direct-current (DC) component of the audio waveform represented by the input data. The generating means may also include filter means for extracting the components in a specific frequency band contained in the audio waveform represented by the input data.
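As one illustration of filter means that removes the DC component, a one-pole DC-blocking filter is sketched below. This is a generic, swapped-in technique, not the filter of this apparatus: the embodiments use a comb filter (FIG. 5) whose details are outside this section. The coefficient `r` and the function name `remove_dc` are assumptions; a value of `r` closer to 1 gives a lower cutoff frequency.

```python
from typing import List, Sequence


def remove_dc(x: Sequence[float], r: float = 0.995) -> List[float]:
    """One-pole DC blocker: y[n] = x[n] - x[n-1] + r * y[n-1].

    The zero at DC cancels any constant offset; the pole at r keeps the
    response close to unity for frequencies well above the cutoff.
    """
    y: List[float] = []
    prev_x = 0.0
    prev_y = 0.0
    for sample in x:
        out = sample - prev_x + r * prev_y
        y.append(out)
        prev_x, prev_y = sample, out
    return y
```

A constant input decays geometrically toward zero at the output, while a rapidly alternating input passes through with almost unchanged amplitude.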
[0022]
In the playback apparatus according to the present invention, the selecting means may include generating means that generates determination data abstracting the audio waveform represented by a part of the audio data, and this generating means may include filter means for removing the DC component of the audio waveform represented by the input data.
Likewise, in the playback apparatus according to the present invention, the selecting means may include generating means that generates determination data abstracting the audio waveform represented by a part of the audio data, and this generating means may include filter means for extracting the components in a specific frequency band contained in the audio waveform represented by the input data.
[0023]
When recording and playback apparatuses configured in this way are used, the temporal position of the reference data within the audio data can be determined with high accuracy from the similarity of the audio waveforms represented by the audio data.
[0024]
In the recording apparatus according to the present invention, the generating means may include downsampling means for downsampling the input data.
In the playback apparatus according to the present invention, the selecting means may include generating means that generates determination data abstracting the audio waveform represented by a part of the audio data, and this generating means may include downsampling means for downsampling the input data.
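A minimal sketch of downsampling means: average each block of `factor` consecutive samples and keep one value per block, which combines a crude low-pass filter with decimation. The block-averaging approach, the factor, and the function name `downsample` are assumptions for illustration; the downsampling ratio used by the embodiments is not given in this section.

```python
from typing import List, Sequence


def downsample(samples: Sequence[float], factor: int) -> List[float]:
    """Reduce the sample rate by `factor` using block averages.

    Trailing samples that do not fill a whole block are discarded.
    """
    if factor < 1:
        raise ValueError("factor must be at least 1")
    n_blocks = len(samples) // factor
    return [sum(samples[i * factor:(i + 1) * factor]) / factor for i in range(n_blocks)]
```

Downsampling 65536 reference samples by a factor of 16, for instance, would leave 4096 values to store, which illustrates the reduction in data amount described in [0025].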
[0025]
When the recording apparatus and the reproducing apparatus having such a configuration are used, the data amount of the reference data is reduced, and data recording and transmission / reception are facilitated.
[0026]
In the playback apparatus according to the present invention, the selecting means may include generating means that generates determination data abstracting the audio waveform represented by a part of the audio data, and may select the partial data on the basis of an index obtained by dividing the sum of products of the reference data and the determination data by the sum of squares of the reference data.
[0027]
In the playback apparatus according to the present invention, the selecting means may include generating means that generates determination data abstracting the audio waveform represented by a part of the audio data, and may select the partial data on the basis of an index obtained by dividing the square of the sum of products of the reference data and the determination data by the product of the sum of squares of the reference data and the sum of squares of the determination data.
[0028]
In the playback apparatus according to the present invention, the selecting means may include generating means that generates determination data abstracting the audio waveform represented by a part of the audio data, and may select the partial data on the basis of the rate of change of the sum of products of the reference data and the determination data.
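The indices of [0026] and [0027] can be written out directly. The sketch below computes the sum of products of reference data r and determination data d, that sum divided by the sum of squares of r, and the squared sum of products divided by the product of both sums of squares. The function names are hypothetical, no mapping to the index names used in the embodiments is assumed, and returning 0.0 for an all-zero input is a design choice of the sketch. The second index does not depend on the scale of d and, by the Cauchy-Schwarz inequality, never exceeds 1, which suits matching versions recorded at different levels; the first index instead tracks the level ratio between d and r.

```python
from typing import Sequence


def product_sum(ref: Sequence[float], det: Sequence[float]) -> float:
    """Sum of products of reference data and determination data."""
    return sum(r * d for r, d in zip(ref, det))


def level_index(ref: Sequence[float], det: Sequence[float]) -> float:
    """[0026]: product sum divided by the sum of squares of the reference data."""
    energy_ref = sum(r * r for r in ref)
    return product_sum(ref, det) / energy_ref if energy_ref else 0.0


def shape_index(ref: Sequence[float], det: Sequence[float]) -> float:
    """[0027]: squared product sum over (sum of squares of ref * sum of squares of det).

    Equals 1.0 when det is a scaled copy of ref, whatever the scale.
    """
    denom = sum(r * r for r in ref) * sum(d * d for d in det)
    return product_sum(ref, det) ** 2 / denom if denom else 0.0
```

If d is r recorded at half the level, the first index is 0.5 and the second is 1.0.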
[0029]
When a playback apparatus configured in this way is used, the temporal position of the reference data within the audio data can be determined with high accuracy from the similarity of the audio waveforms represented by the audio data.
[0030]
DETAILED DESCRIPTION OF THE INVENTION
[1] First embodiment
[1.1] Configuration, function, and data format
[1.1.1] Overall configuration
FIG. 1 is a diagram showing a configuration of a synchronous recording / reproducing apparatus SS according to the first embodiment of the present invention. The synchronous recording / reproducing apparatus SS includes a music CD drive 1, an FD drive 2, an automatic performance piano unit 3, a sound generation unit 4, an operation display unit 5, and a controller unit 6.
[0031]
The music CD drive 1, the FD drive 2, the automatic performance piano unit 3, the sound generation unit 4, and the operation display unit 5 are connected to the controller unit 6 through communication lines. Further, the automatic performance piano unit 3 and the sound generation unit 4 are directly connected by a communication line.
[0032]
[1.1.2] Music CD drive
The data recorded on a music CD includes audio data carrying the audio information and time codes indicating the playback timing of the audio data. The music CD drive 1 reads this data from a loaded music CD in accordance with instructions from the controller unit 6 and sequentially outputs the audio data contained in it. The music CD drive 1 is connected to the communication interface 65 of the controller unit 6 by a communication line.
[0033]
The audio data output from the music CD drive 1 is digital audio data consisting of left and right channels with a sampling frequency of 44100 Hz and a quantization bit number of 16. The data output from the music CD drive 1 does not include a time code. Since the configuration of the music CD drive 1 is the same as that of a general music CD drive capable of digitally outputting audio data, the description thereof is omitted.
[0034]
[1.1.3] FD drive
The FD drive 2 is a device that records SMF (Standard MIDI File) on the FD, reads the SMF recorded on the FD, and transmits the read SMF. The FD drive 2 is connected to the communication interface 65 of the controller unit 6 through a communication line. Since the configuration of the FD drive 2 is the same as that of a general FD drive, the description thereof is omitted.
[0035]
[1.1.4] MIDI events and SMF
The SMF is a file including a MIDI event that is performance control data according to the MIDI standard, and a delta time that is data indicating the execution timing of each MIDI event. The MIDI event and SMF format will be described with reference to FIGS.
[0036]
FIG. 2 shows a note-on event, a note-off event, and a system exclusive event as examples of MIDI events. The note-on event is a MIDI event instructing a musical tone to sound. It consists of 9nH indicating note-on (n is the channel number and H denotes a hexadecimal number; the same applies below), a note number indicating the pitch, and a velocity indicating the strength of the sound (or the speed of the keystroke). Similarly, the note-off event is a MIDI event instructing a musical tone to be muted; it consists of 8nH indicating note-off, a note number indicating the pitch, and a velocity indicating the strength of the muting (or the speed of the key release). The system exclusive event, on the other hand, is a MIDI event for transmitting, receiving, or recording data in a format that the manufacturer of a product or software may define freely; it consists of F0H indicating the start of the system exclusive event, the data length, the data, and F7H indicating its end. As these examples show, MIDI events carry no time information; they are used to control sounding, muting, and other functions in real time.
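As a concrete illustration of the byte layouts in FIG. 2, the helpers below build note-on (9nH) and note-off (8nH) messages. This is a sketch with hypothetical function names; the range checks reflect the MIDI rule that the channel occupies the low nibble of the status byte (0 to 15) and that data bytes are 7-bit values (0 to 127).

```python
def _check(channel: int, note: int, velocity: int) -> None:
    # The channel fits in the low nibble of the status byte; data bytes are 7-bit.
    if not 0 <= channel <= 15:
        raise ValueError(f"channel out of range: {channel}")
    for name, value in (("note", note), ("velocity", velocity)):
        if not 0 <= value <= 127:
            raise ValueError(f"{name} out of range: {value}")


def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Note-on: status 9nH, then note number (pitch) and velocity (strength)."""
    _check(channel, note, velocity)
    return bytes([0x90 | channel, note, velocity])


def note_off(channel: int, note: int, velocity: int) -> bytes:
    """Note-off: status 8nH, then note number and release velocity."""
    _check(channel, note, velocity)
    return bytes([0x80 | channel, note, velocity])
```

For example, `note_on(0, 60, 100)` yields the three bytes 90H, 3CH, 64H.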
[0037]
FIG. 3 shows an outline of the SMF format. An SMF consists of a header chunk and a track chunk. The header chunk contains control data on the format and time unit of the data in the track chunk. The track chunk contains MIDI events and delta times indicating the execution timing of each MIDI event.
[0038]
In an SMF, a delta time can be expressed either as the time relative to the immediately preceding MIDI event, in units of time called clocks, or as the absolute time from the beginning of the music, as a combination of time units called hours, minutes, seconds, and frames. For ease of explanation, the following description treats the delta time as the absolute time from a reference point, expressed in seconds.
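The two delta-time conventions can be related by a short conversion. The sketch below turns absolute event times in seconds (the convention used in this description) into relative delta times in clocks, or ticks (the usual SMF convention). It assumes a single constant tempo; `division` (ticks per quarter note, from the header chunk) and the tempo of 500000 microseconds per quarter note (120 BPM, the SMF default) are illustrative parameters, and the function name is hypothetical.

```python
from typing import List, Sequence


def absolute_seconds_to_delta_ticks(
    times_s: Sequence[float],
    division: int = 480,
    tempo_us_per_quarter: int = 500_000,
) -> List[int]:
    """Convert sorted absolute event times (s) to relative delta times (ticks)."""
    ticks_per_second = division * 1_000_000 / tempo_us_per_quarter
    # Round each absolute time first, then difference, so rounding errors
    # do not accumulate from event to event.
    abs_ticks = [round(t * ticks_per_second) for t in times_s]
    return [cur - prev for prev, cur in zip([0] + abs_ticks[:-1], abs_ticks)]
```

With the defaults there are 960 ticks per second, so events at 0.0, 0.5, and 1.25 s become the delta times 0, 480, and 720.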
In this specification, MIDI data is a generic name for data created in accordance with the MIDI standard.
[0039]
[1.1.5] Automatic piano part
The automatic performance piano unit 3 is a musical sound generating device that produces acoustic piano sound, and electronically synthesized piano sound, in response to key and pedal operations by the user of the synchronous recording/reproducing apparatus SS. The automatic performance piano unit 3 generates MIDI events in response to the user's key and pedal operations and transmits them. It also receives MIDI events and, in accordance with them, performs automatically with acoustic piano sound and electronically synthesized piano sound.
[0040]
The automatic performance piano unit 3 includes a piano 31, a key sensor 32, a pedal sensor 33, a MIDI event control circuit 34, a sound source 35, and a drive unit 36.
[0041]
The key sensor 32 and the pedal sensor 33 are disposed on each of the plurality of keys and the plurality of pedals of the piano 31, respectively, and detect the positions of the keys and the pedals, respectively. The key sensor 32 and the pedal sensor 33 transmit the detected position information to the MIDI event control circuit 34 together with the identification number and detection time information corresponding to each key and pedal.
[0042]
The MIDI event control circuit 34 receives the position information of the keys and pedals from the key sensor 32 and the pedal sensor 33, together with the identification information and time information of each key and pedal, generates MIDI events such as note-on and note-off events from this information, and outputs the generated MIDI events to the controller unit 6 and the sound source 35. The MIDI event control circuit 34 also receives MIDI events from the controller unit 6 and transfers them to the sound source 35 or the drive unit 36. Whether a MIDI event received from the controller unit 6 is transferred to the sound source 35 or to the drive unit 36 is determined by an instruction from the controller unit 6.
[0043]
The sound source 35 is a device that receives a MIDI event from the MIDI event control circuit 34 and outputs sound information of various musical instruments as digital audio data of two left and right channels based on the received MIDI event. The sound source 35 electronically synthesizes the digital audio data of the pitch designated by the received MIDI event and transmits it to the mixer 41 of the sound generator 4.
[0044]
The drive unit 36 consists of a group of solenoids, arranged at each key and each pedal of the piano 31 to drive them, and a control circuit that controls the solenoids. When the control circuit of the drive unit 36 receives a MIDI event from the MIDI event control circuit 34, it adjusts the current supplied to the solenoid of the corresponding key or pedal, thereby controlling the magnetic force the solenoid generates and moving the key or pedal as the MIDI event instructs.
[0045]
[1.1.6] Sound generator
The sound generation unit 4 is a device that receives audio data from the automatic performance piano unit 3 and the controller unit 6, converts the received audio data into sound, and outputs the sound. The sound generation unit 4 includes a mixer 41, a D / A converter 42, an amplifier 43, and a speaker 44.
[0046]
The mixer 41 is a digital stereo mixer that receives a plurality of digital audio data consisting of left and right channels and converts them into a set of left and right digital audio data. The mixer 41 receives the digital audio data from the sound source 35 of the automatic performance piano unit 3 and simultaneously receives the digital audio data read from the music CD by the music CD drive 1 via the controller unit 6. The mixer 41 arithmetically averages the received digital audio data and transmits the digital audio data to the D / A converter 42 as a pair of left and right digital audio data.
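A frame-by-frame sketch of the averaging performed by the mixer 41 is shown below. It assumes 16-bit (R, L) integer frames as delivered by the music CD drive 1; the function name is hypothetical, and the rounding rule (Python floor division here) is an assumption, since the text does not specify one. Averaging instead of summing keeps the output inside the 16-bit range, at the cost of halving each input's level.

```python
from typing import List, Tuple

Frame = Tuple[int, int]  # (R, L) 16-bit sample values, -32768..32767


def mix_average(a: List[Frame], b: List[Frame]) -> List[Frame]:
    """Average two stereo streams frame by frame.

    Floor division (//) rounds toward negative infinity. zip() stops at the
    shorter input, so trailing frames of the longer stream are dropped.
    """
    return [((ra + rb) // 2, (la + lb) // 2) for (ra, la), (rb, lb) in zip(a, b)]
```

Two full-scale inputs average to full scale, so no clipping occurs.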
[0047]
The D / A converter 42 receives the digital audio data from the mixer 41, converts the received digital audio data into an analog audio signal, and outputs the analog audio signal to the amplifier 43. The amplifier 43 amplifies the analog audio signal input from the D / A converter 42 and outputs it to the speaker 44. The speaker 44 converts the amplified analog audio signal input from the amplifier 43 into sound. As a result, the sound data recorded on the music CD and the sound data generated by the sound source 35 are output from the sound generator 4 as stereo sound.
[0048]
[1.1.7] Operation display section
The operation display unit 5 is a user interface used when the user of the synchronous recording / reproducing apparatus SS performs various operations of the synchronous recording / reproducing apparatus SS.
[0049]
The operation display unit 5 has a keypad to be pressed when the user gives an instruction to the synchronous recording / reproducing apparatus SS, a liquid crystal display for the user to confirm the state of the synchronous recording / reproducing apparatus SS, and the like. When the user presses the keypad, the operation display unit 5 outputs a signal corresponding to the pressed keypad to the controller unit 6. When the operation display unit 5 receives bitmap data including character and graphic information from the controller unit 6, the operation display unit 5 displays the characters and graphics on the liquid crystal display based on the received bitmap data.
[0050]
[1.1.8] Controller part
The controller unit 6 is a device that controls the entire synchronous recording / reproducing apparatus SS. The controller unit 6 includes a ROM (Read Only Memory) 61, a CPU (Central Processing Unit) 62, a DSP (Digital Signal Processor) 63, a RAM (Random Access Memory) 64, and a communication interface 65. These components are connected to each other by a bus.
[0051]
The ROM 61 is a nonvolatile memory that stores various control programs. In addition to programs for general control processing, these include programs that cause the CPU 62 to execute the processing of the SMF recording and playback operations described later. The CPU 62 is a microprocessor capable of general-purpose processing; it reads a control program from the ROM 61 and performs control processing according to that program. The DSP 63 is a microprocessor capable of processing digital audio data at high speed. Under the control of the CPU 62, it performs the correlation determination data generation processing described later, and processing such as the filtering required by the correlation determination processing, on the digital audio data that the controller unit 6 receives from the music CD drive 1 or the FD drive 2, and sends the resulting data to the CPU 62. The RAM 64 is a volatile memory that temporarily stores data used by the CPU 62 and the DSP 63. The communication interface 65 can transmit and receive digital data in various formats; it performs the format conversion required for digital data exchanged with the music CD drive 1, the FD drive 2, the automatic performance piano unit 3, the sound generation unit 4, and the operation display unit 5, and relays data between these devices and the controller unit 6.
[0052]
[1.2] Operation
Next, the operation of the synchronous recording / reproducing apparatus SS will be described.
[1.2.1] Recording operation
First, the operation of the synchronous recording / reproducing apparatus SS when the user of the synchronous recording / reproducing apparatus SS plays a piano in accordance with the reproduction of a commercially available music CD and records the performance information on the FD as MIDI data will be described. Note that a music CD used in a recording operation described below is referred to as a music CD-A in order to distinguish it from a music CD used in a reproduction operation described later.
[0053]
[1.2.1.1] Recording start operation
The user loads music CD-A into the music CD drive 1 and a blank FD into the FD drive 2. Next, the user presses the keypad of the operation display unit 5 corresponding to the start of performance data recording. The operation display unit 5 outputs a signal corresponding to the pressed keypad to the controller unit 6.
[0054]
When the CPU 62 of the controller unit 6 receives from the operation display unit 5 a signal corresponding to the start of performance data recording, it transmits a music CD playback command to the music CD drive 1. In response, the music CD drive 1 sequentially transmits the audio data recorded on music CD-A to the controller unit 6. The controller unit 6 receives one set of left and right data from the music CD drive 1 every 1/44100 second. Hereinafter, the values of one set of left and right data are written (R(n), L(n)), and the values of such a set, as well as each data value generated from it in the correlation determination data generation processing described later, are called “sample values”. R(n) and L(n) are the values of the right-channel and left-channel data, respectively, and are integers from -32768 to 32767. n is an integer representing the order of the audio data and takes the values 0, 1, 2, ... in sequence.
[0055]
[1.2.1.2] Transmission of sound data to sound generator
When the CPU 62 receives the sample values (R(0), L(0)), (R(1), L(1)), (R(2), L(2)), ..., it transmits each received sample value to the sound generation unit 4. When the sound generation unit 4 receives a sample value from the controller unit 6, it converts it into sound and outputs it. The user can thus listen to the music recorded on music CD-A.
[0056]
[1.2.1.3] Recording of reference raw audio data to the RAM
While transmitting the received sample values to the sound generation unit 4, the CPU 62 also records in the RAM 64 those received sample values that correspond to a fixed length of time near the beginning of the music.
In this embodiment, as an example, the CPU 62 records 2^16 = 65536 sets of sample values in the RAM 64. These 65536 sets correspond to about 1.49 seconds of audio (65536 / 44100 ≈ 1.486 seconds).
[0057]
For each received sample value, the CPU 62 first determines whether its absolute value exceeds a predetermined threshold. Specifically, with a threshold of 1000, the CPU 62 obtains a positive result in this comparison if the absolute value of either R(n) or L(n) is greater than 1000.
[0058]
Hereinafter, as an example, assume that for the audio data of music CD-A, the sample value set (R(52156), L(52156)) is the first in which the absolute value of R(52156) or L(52156) exceeds the threshold. The CPU 62 therefore obtains a negative result in the comparison for (R(0), L(0)) through (R(52155), L(52155)) and does not record these sample values in the RAM 64. As a result, sample values corresponding to the silence, or near-silence, at the beginning of the music are not recorded in the RAM 64. In this example, the playback time of the first recorded sample value, (R(52156), L(52156)), is about 1.18 seconds (52156 / 44100 ≈ 1.183 seconds).
[0059]
The CPU 62 then receives (R(52156), L(52156)) and obtains a positive result in the comparison for this sample value. Upon this positive result, the CPU 62 records in the RAM 64 the 65536 sets of sample values received from this point on, that is, (R(52156), L(52156)) through (R(117691), L(117691)). Hereinafter, this series of sample values is called "reference raw audio data".
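The onset detection and capture described above can be sketched in Python as follows. This is a minimal illustration, not the embodiment's firmware: the function name `capture_reference` is hypothetical, the stereo input is assumed to be a list of (R(n), L(n)) integer tuples, and the threshold 1000 and window length 2^16 are the example values from the text.

```python
FS = 44100          # music CD sampling frequency in Hz
THRESHOLD = 1000    # example threshold from the text
REF_LEN = 2 ** 16   # 65536 sample sets (about 1.49 s)


def capture_reference(stereo, threshold=THRESHOLD, length=REF_LEN):
    """Return (start_index, window) for the first sample set whose |R| or |L|
    exceeds the threshold, or None if no such set exists or fewer than
    `length` sets remain from that point."""
    for n, (r, l) in enumerate(stereo):
        if abs(r) > threshold or abs(l) > threshold:
            window = stereo[n:n + length]
            return (n, window) if len(window) == length else None
    return None


# With the onset from the text (n = 52156), the captured window spans:
start = 52156
end = start + REF_LEN - 1
print(end)                     # 117691, last index of the reference window
print(round(start / FS, 2))    # 1.18, playback time of the first recorded set
print(round(REF_LEN / FS, 2))  # 1.49, duration of the window in seconds
```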
[0060]
[1.2.1.4] Start of time measurement
When the CPU 62 receives the last sample value of the reference raw audio data, that is, (R(117691), L(117691)), it ends the recording of the reference raw audio data and starts measuring time from that moment.
[0061]
[1.2.1.5] Generation of reference processed audio data
When the recording of the reference raw audio data is complete, the CPU 62 sends the DSP 63 a command to perform the correlation determination data generation process on the reference raw audio data. This process generates, from audio data with a sampling frequency of 44100 Hz, audio data with a sampling frequency of about 172.27 Hz (44100 / 256) for use in the correlation determination process. The correlation determination process, described in detail later, determines the similarity between two sets of audio data. The correlation determination data generation process is described below with reference to FIG. 4.
[0062]
When the DSP 63 receives from the CPU 62 the command to perform the correlation determination data generation process on the reference raw audio data, it reads the reference raw audio data recorded in the RAM 64 (step S1). The DSP 63 then converts the stereo data into monaural data by taking the arithmetic mean of the left and right values of each sample value (step S2). This monaural conversion reduces the processing load on the DSP 63 in the subsequent steps.
[0063]
Next, the DSP 63 applies high-pass filtering to the series of monaural sample values (step S3). This removes the DC component from the audio waveform represented by the sample values, so that the values are distributed evenly on the positive and negative sides. The correlation determination process compares two sets of audio data using their cross-correlation, and this comparison is more accurate when the sample values are distributed evenly about zero. This step therefore aims to improve the accuracy of the correlation determination process.
[0064]
Next, the DSP 63 takes the absolute value of each high-pass-filtered sample value (step S4). This step obtains a substitute for the power of each sample value. Because the absolute value is smaller than the square, which represents power, and is easier to process, this embodiment uses the absolute value in place of the square of each sample value. If the DSP 63 has sufficient processing capacity, the square of each sample value may be calculated in this step instead of its absolute value.
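Steps S2 to S4 can be sketched in Python as below. This is an illustration under assumptions, not the DSP 63's actual code: the function names are hypothetical, and the high-pass filter is a textbook first-order (RC-style) IIR section with a 25 Hz cutoff, chosen because the experiment of FIG. 9 reports a single-stage 25 Hz IIR high-pass filter without giving its exact design.

```python
import math


def to_mono(stereo):
    """Step S2: arithmetic mean of the right and left values of each set."""
    return [(r + l) / 2 for r, l in stereo]


def one_pole_highpass(x, cutoff_hz=25.0, fs=44100):
    """Step S3 (one possible realization): first-order IIR high-pass
    y[n] = a * (y[n-1] + x[n] - x[n-1]), which removes the DC component."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    a = rc / (rc + 1.0 / fs)
    out, prev_x, prev_y = [], 0.0, 0.0
    for v in x:
        prev_y = a * (prev_y + v - prev_x)
        prev_x = v
        out.append(prev_y)
    return out


def rectify(x, use_square=False):
    """Step S4: absolute value as a cheap stand-in for power; the square
    can be used instead when processing capacity allows."""
    return [v * v for v in x] if use_square else [abs(v) for v in x]


offset = to_mono([(1000, 1000)] * 44100)   # one second of pure DC
filtered = one_pole_highpass(offset)
print(abs(filtered[-1]) < 1e-6)            # True: the DC offset has decayed away
```

Fed a constant offset, the high-pass output decays toward zero, which is the DC removal that step S3 aims for.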
[0065]
Next, the DSP 63 filters the series of absolute values obtained in step S4 with a comb filter (step S5). This step extracts, from the audio waveform represented by the sample values, the low-frequency component in which changes in the waveform are easy to capture. A low-pass filter is normally used to extract low-frequency components, but because a comb filter places a smaller load on the DSP 63 than a low-pass filter, this embodiment uses a comb filter in its place.
[0066]
FIG. 5 shows an example configuration of a comb filter that can be used in step S5. In FIG. 5, the rectangles denote delay processing, where z^-k denotes a delay of (sampling period × k). Since the sampling frequency of a music CD is 44100 Hz, as described above, the sampling period is 1/44100 seconds. The triangles denote multiplication, and the value inside each triangle is the multiplication coefficient. In FIG. 5, K is given by the following formula (1).
[Expression 1]

[0067]
Multiplication by the coefficient K gives this comb filter the function of a high-pass filter with frequency f. As a result, the filtering in this step again removes the DC component from the audio waveform represented by the sample values. The values of k and f can be changed freely and are determined empirically so as to increase the accuracy of the correlation determination process.
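FIG. 5 and formula (1) are not reproduced in this text, so the exact filter structure and the definition of K are unknown here. Purely as a hypothetical illustration of how a comb filter can both smooth the rectified envelope and act as a high-pass filter at a low frequency f, the sketch below uses a "leaky" moving-sum comb, y[n] = x[n] − x[n−k] + K·y[n−1], with the assumed coefficient K = exp(−2πf/fs). The term 1 − z^-k places a zero at DC, and the leaky recursion makes the response near DC behave like a first-order high-pass filter with a corner around f Hz. The defaults k = 4410 and f = 1 are the values reported for the experiment of FIG. 9; the function name `leaky_comb` is hypothetical.

```python
import math


def leaky_comb(x, k=4410, f=1.0, fs=44100):
    """Hypothetical comb filter y[n] = x[n] - x[n-k] + K * y[n-1], where
    K = exp(-2*pi*f/fs) is an assumed stand-in for formula (1)."""
    K = math.exp(-2 * math.pi * f / fs)
    out, prev = [], 0.0
    for n, v in enumerate(x):
        delayed = x[n - k] if n >= k else 0.0  # x[n-k], zero before start
        prev = v - delayed + K * prev
        out.append(prev)
    return out


y = leaky_comb([1.0] * 100000)       # constant (DC) input
print(abs(y[-1]) < 1e-3 * max(y))    # True: the DC response dies out
```

With K = 1 the recursion would be an exact moving sum over k samples and a constant input would settle at k instead of decaying; the leak K < 1 is what removes DC in this sketch.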
[0068]
Next, the DSP 63 further filters the output of step S5 with a low-pass filter (step S6). This step prevents aliasing noise from arising in the downsampling of the next step, S7. Because step S7 downsamples data with a sampling frequency of 44100 Hz to a sampling frequency of about 172.27 Hz, frequency components at or above about 86.13 Hz, half the new sampling frequency, must be removed beforehand to prevent aliasing. The comb filtering in step S5, however, does not sufficiently remove high-frequency components, owing to the characteristics of the comb filter. The remaining components at about 86.13 Hz and above are therefore removed by the low-pass filtering in this step. If the DSP 63 has sufficient processing capacity, filtering with a single high-precision low-pass filter may replace the two filtering stages of steps S5 and S6.
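A matching first-order IIR low-pass filter for step S6 might look like the sketch below. It is again an assumed textbook design with the 25 Hz cutoff reported for FIG. 9, and `one_pole_lowpass` is a hypothetical name. The last line computes the Nyquist frequency after the 1/256 downsampling of step S7.

```python
import math


def one_pole_lowpass(x, cutoff_hz=25.0, fs=44100):
    """Step S6 (one possible realization): first-order IIR low-pass
    y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    a = (1.0 / fs) / (rc + 1.0 / fs)
    out, prev = [], 0.0
    for v in x:
        prev += a * (v - prev)
        out.append(prev)
    return out


smoothed = one_pole_lowpass([1.0] * 44100)  # a step input settles to 1.0
print(44100 / 256 / 2)                      # 86.1328125 (Hz)
```

Note that a single first-order section attenuates components at 86 Hz only to roughly 0.28 of their amplitude (about −11 dB).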
[0069]
Next, the DSP 63 downsamples the output of step S6 by a factor of 256 (step S7); that is, it extracts one sample value out of every 256. This reduces the number of sample values in the series from 65536 to 256. Hereinafter, each sample value obtained in step S7 is written X(m), where m is an integer from 0 to 255, and the series X(0) to X(255) is called "reference processed audio data". The DSP 63 records the reference processed audio data in the RAM 64 (step S8).
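Step S7 amounts to keeping every 256th value. A minimal sketch, in which the name `downsample` is hypothetical and keeping the first value of each group of 256 is an assumption, since the text does not say which one is kept:

```python
def downsample(x, factor=256):
    """Step S7: keep one sample out of every `factor` (the first of each group)."""
    return x[::factor]


reference = downsample(list(range(65536)))
print(len(reference))   # 256
print(44100 / 256)      # 172.265625, the new sampling frequency in Hz
```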
[0070]
[1.2.1.6] Recording MIDI events to RAM
While the DSP 63 is generating the reference processed audio data as described above, the user begins performing on the piano 31. That is, after the CPU 62 has finished recording the reference raw audio data and started measuring time, the user listens to music CD-A output from the sound generation unit 4 and operates the keys and pedals of the piano 31 along with the music.
The user's performance on the piano 31 is detected as key and pedal movements by the key sensor 32 and the pedal sensor 33, converted into MIDI events by the MIDI event control circuit 34, and transmitted to the controller unit 6.
[0071]
In the controller unit 6, when the CPU 62 receives a MIDI event from the automatic performance piano unit 3, it records in the RAM 64, together with the MIDI event, the time measured at the moment of reception, that is, a delta time representing the time elapsed from when the CPU 62 received the last sample value of the reference raw audio data until it received the MIDI event. FIG. 6 is a schematic diagram of the temporal relationship between the audio data of music CD-A and the MIDI events. FIG. 6 shows that the CPU 62 starts measuring time about 2.67 seconds after playback of the audio data of music CD-A begins, and that it receives the first, second, and third MIDI events 1.25, 2.63, and 3.71 seconds after that point, respectively.
[0072]
[1.2.1.7] Recording of SMF to FD
When the reproduction of the music CD-A is finished and the performance by the user using the piano 31 is finished, the user presses the keypad of the operation display unit 5 corresponding to the end of the recording of the performance data. The operation display unit 5 transmits a signal corresponding to the pressed keypad to the controller unit 6. When the CPU 62 receives a signal indicating the end of recording of performance data from the operation display unit 5, it transmits a music CD playback stop command to the music CD drive 1. In response to this playback stop command, the music CD drive 1 stops playback of the music CD-A.
[0073]
Next, the CPU 62 reads from the RAM 64 the reference processed audio data generated by the DSP 63, together with the MIDI events and delta times generated from the user's performance on the piano 31. The CPU 62 combines these data to create an SMF track chunk, and then adds a header chunk corresponding to the track chunk to generate an SMF.
[0074]
FIG. 7 is a diagram showing an outline of the SMF generated by the CPU 62. At the beginning of the data portion of the track chunk, a system exclusive event including reference processed audio data is recorded together with the delta time. This delta time is 0.00 seconds. Following the system exclusive event including the processed audio data for reference, MIDI events corresponding to the performance of the user using the piano 31 are sequentially recorded. According to the example of FIG. 6, the first MIDI event by the user's performance is a C5 note on event, the second MIDI event is an E6 note on event, and the third MIDI event is a C5 note off event. The delta times for them are 1.25 seconds, 2.63 seconds, and 3.71 seconds, respectively.
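The following Python sketch shows one way such an SMF could be assembled with the standard library. It is not the embodiment's encoder, and several points are assumptions. The text gives delta times in seconds, so the sketch uses the SMPTE form of the header division (25 frames per second × 40 ticks per frame, that is, 1 tick per millisecond). Standard SMF delta times are measured from the previous event, so the elapsed times 1.25, 2.63, and 3.71 seconds become deltas of 1250, 1380, and 1080 ticks. The system exclusive payload starts with the non-commercial manufacturer ID 0x7D and simply clamps each X(m) into the 7-bit data range, because the text does not say how the reference processed audio data is packed. The function names and the note numbers in the example are placeholders.

```python
import struct


def vlq(value):
    """Encode a non-negative integer as an SMF variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))


def build_smf(reference, events):
    """Build a format-0 SMF: one sysex event carrying the reference data at
    time 0, then (seconds, status, data1, data2) channel events."""
    # SMPTE division: 25 frames/s x 40 ticks/frame = 1 tick per millisecond
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, 0xE728)
    # Hypothetical packing: manufacturer ID 0x7D, then each value clamped to 7 bits
    payload = bytes([0x7D]) + bytes(min(127, max(0, int(v))) for v in reference)
    track = vlq(0) + b"\xF0" + vlq(len(payload) + 1) + payload + b"\xF7"
    last_tick = 0
    for seconds, status, d1, d2 in events:
        tick = round(seconds * 1000)          # elapsed ms since time 0
        track += vlq(tick - last_tick) + bytes([status, d1, d2])
        last_tick = tick
    track += vlq(0) + b"\xFF\x2F\x00"        # end-of-track meta event
    return header + b"MTrk" + struct.pack(">I", len(track)) + track


smf = build_smf([10] * 256, [(1.25, 0x90, 60, 64),   # note on
                             (2.63, 0x90, 76, 64),   # note on
                             (3.71, 0x80, 60, 0)])   # note off
```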
[0075]
When completing the generation of the SMF, the CPU 62 transmits the generated SMF to the FD drive 2 together with a write command. When the FD drive 2 receives the write command and the SMF from the CPU 62, the FD drive 2 writes the SMF into the set FD.
[0076]
The temporal relationship between the audio data of music CD-A and the MIDI events written to the SMF is summarized below with reference to FIG. 6. To distinguish two different time bases, times measured from the start of playback of music CD-A (taken as 0 seconds) are suffixed with (T), and delta times in the SMF are suffixed with (D).
[0077]
First, at about 1.18 seconds (T), the absolute value of the audio data of music CD-A exceeds the threshold of 1000, so recording of the reference raw audio data begins. The reference raw audio data is then recorded for about 1.49 seconds, that is, until about 2.67 seconds (T).
[0078]
Next, with the time of about 2.67 seconds (T) taken as 0 seconds, measurement of the time used to calculate delta times begins. The first event then occurs and is recorded at 1.25 seconds (D), that is, about 3.92 seconds (T). Similarly, the second event is recorded at 2.63 seconds (D), that is, about 5.30 seconds (T), and the third at 3.71 seconds (D), that is, about 6.38 seconds (T).
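The conversion from (D) to (T) is a simple offset. A sketch, assuming that time measurement starts at the end of the last reference sample, that is, at (117691 + 1) / 44100 ≈ 2.67 seconds (T); `disc_time` is a hypothetical helper name:

```python
FS = 44100


def disc_time(window_end_index, elapsed_d, fs=FS):
    """Map an elapsed time (D) to disc time, assuming measurement starts
    right after sample `window_end_index` has been received."""
    return (window_end_index + 1) / fs + elapsed_d


for d in (1.25, 2.63, 3.71):
    print(round(disc_time(117691, d), 2))   # prints 3.92, then 5.3, then 6.38
```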
[0079]
As shown in the lower part of FIG. 7, the playback time of the reference raw audio data from which the reference processed audio data is derived precedes 0.00 seconds (D), but in the SMF the reference processed audio data is recorded as system exclusive data at the position 0.00 seconds (D).
[0080]
[1.2.2] Playback operation
Next, the operation of reproducing the SMF recorded as described above in synchronization with the audio data of a music CD will be described. The music CD used in the following reproduction operation contains the same music as music CD-A used in the recording operation, but it is a different version: the time from the start of disc playback to the start of the music differs, as does the level of the audio waveform represented by the audio data. In addition, because sound effects and the like are edited in the audio data when press data is created from the master data of the music, the data for the same music on the two discs differs slightly in content. To distinguish it from music CD-A, the music CD used in the reproduction operation described below is called music CD-B.
[0081]
[1.2.2.1] Playback start operation
The user sets the music CD-B in the music CD drive 1 and the FD on which the SMF is recorded in the FD drive 2. Subsequently, the user presses the keypad of the operation display unit 5 corresponding to the start of reproduction of the performance data. The operation display unit 5 outputs a signal corresponding to the pressed keypad to the controller unit 6.
[0082]
When the CPU 62 receives a signal for instructing the start of reproduction of performance data from the operation display unit 5, it first transmits an SMF transmission command to the FD drive 2. The FD drive 2 reads the SMF from the FD in response to the SMF transmission command, and transmits the read SMF to the controller unit 6. The CPU 62 receives the SMF from the FD drive 2 and records the received SMF in the RAM 64.
[0083]
Subsequently, the CPU 62 transmits a music CD playback command to the music CD drive 1. In response to this playback command, the music CD drive 1 sequentially transmits audio data recorded on the music CD-B to the controller unit 6. The controller unit 6 receives one set of left and right data from the music CD drive 1 every 1/44100 seconds. Here, the value of data received by the CPU 62 from the music CD drive 1 is represented as (r (n), l (n)). The range of values of r (n) and l (n), and the definition of “sample value” used below are the same as those in R (n) and L (n).
[0084]
[1.2.2.2] Transmission of sound data to sound generator
When the CPU 62 receives the sample values (r(0), l(0)), (r(1), l(1)), (r(2), l(2)), ... from the music CD drive 1, it transmits each received sample value to the sound generation unit 4. When the sound generation unit 4 receives a sample value from the controller unit 6, it converts it into sound and outputs it. The user can thus listen to the music recorded on music CD-B.
[0085]
[1.2.2.3] Correlation determination process
While transmitting the sample values received from the music CD drive 1 to the sound generation unit 4, the CPU 62 sends the DSP 63 a command to execute the correlation determination process and then sequentially transmits the received sample values to the DSP 63. The correlation determination process determines the similarity between processed audio data for determination, generated from a series of sample values received from the music CD drive 1, and the reference processed audio data contained in the SMF. The correlation determination process is described below with reference to FIG.
[0086]
After receiving the command to execute the correlation determination process from the CPU 62, the DSP 63 sequentially records the sample values (r(0), l(0)), (r(1), l(1)), (r(2), l(2)), ... in the RAM 64. Hereinafter, the series of 65536 sample values beginning with (r(n), l(n)) is called "raw audio data for determination (n)". When the DSP 63 receives the 65536th sample value, (r(65535), l(65535)), and records it in the RAM 64, it reads (r(0), l(0)) through (r(65535), l(65535)), that is, raw audio data for determination (0). The DSP 63 then applies the correlation determination data generation process described above, that is, the same processing as steps S1 to S8 in FIG. 4, to raw audio data for determination (0). As a result, the DSP 63 generates 256 sample values and records them in the RAM 64 (step S11). Hereinafter, the 256 sample values obtained by applying the correlation determination data generation process to raw audio data for determination (n) are written Y_n(0) to Y_n(255), and this series is called "processed audio data for determination (n)".
[0087]
Next, the DSP 63 reads from the RAM 64 the reference processed audio data contained in the system exclusive event of the SMF, that is, X(0) to X(255), and the processed audio data for determination (0) recorded in step S11, that is, Y_0(0) to Y_0(255) (step S12).
[0088]
Subsequently, the DSP 63 performs a determination process represented by the following equations (2) and (3) (step S13).
[Expression 2]

[Equation 3]

[0089]
The left side of equation (2) approaches 1 as the values of X(m) and Y_0(m) become closer. If the reference processed audio data and processed audio data for determination (0) are each arranged in order and values with the same index are paired, the closer the values in each pair agree, the closer the left side is to 1. Hereinafter, this value is called the absolute correlation index. The value of p can be set anywhere from 0 to 1. It is determined empirically so that the determination by equation (2) gives a positive result (hereinafter "Yes") when the reference processed audio data and the processed audio data for determination are generated from the same portion of the music's audio data, and a negative result (hereinafter "No") when they are generated from different portions, even if those portions are similar.
[0090]
The left side of equation (3) takes values from 0 to 1 and approaches 1 as the shape of the audio waveform represented by X(m) becomes more similar to that represented by Y_0(m). Hereinafter, this value is called the relative correlation index. The absolute correlation index, by contrast, depends on level: even when the reference processed audio data and the processed audio data for determination are generated from the same portion of the music's audio data, the absolute correlation index is smaller than 1, to a degree that depends on the level, if the audio waveform represented by the determination data has a lower level than that represented by the reference data. Conversely, if the determination data has the higher level, the absolute correlation index exceeds 1 to a degree that depends on the level. The relative correlation index, on the other hand, is close to 1 in either case, so the determination by equation (3) gives Yes even when different versions of a music CD have different recording levels. The value of q can be set anywhere from 0 to 1 and is determined empirically in the same way as p.
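Expressions (2) and (3) are not reproduced in this text. One form that is consistent with the description above, offered only as an assumption, is an absolute index ΣX(m)Y_0(m) / ΣX(m)², which scales with the level of Y_0, and a relative index ΣX(m)Y_0(m) / √(ΣX(m)² · ΣY_0(m)²), a normalized cross-correlation that ignores overall level (it lies between 0 and 1 when all values are non-negative). The sketch checks both against p and q; the defaults p = 0.5 and q = 0.8 are the constants reported for FIG. 9, and the function names are hypothetical.

```python
import math


def absolute_index(x, y):
    """Assumed left side of expression (2): sum(X*Y) / sum(X^2)."""
    xx = sum(a * a for a in x)
    return sum(a * b for a, b in zip(x, y)) / xx if xx else 0.0


def relative_index(x, y):
    """Assumed left side of expression (3): normalized cross-correlation."""
    denom = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return sum(a * b for a, b in zip(x, y)) / denom if denom else 0.0


def step_s13(x, y, p=0.5, q=0.8):
    """Yes only if both conditions hold."""
    return absolute_index(x, y) >= p and relative_index(x, y) >= q


x = [1, 2, 3, 2, 1]
quieter = [0.5 * v for v in x]      # same shape, half the level
print(absolute_index(x, quieter))   # 0.5
print(relative_index(x, quieter))   # 1.0
```

Halving the level halves the absolute index but leaves the relative index at 1.0, which is the level independence the text attributes to equation (3).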
[0091]
If either or both of the two determinations in step S13 give No, the DSP 63 ends the correlation determination process using processed audio data for determination (0) and waits for the CPU 62 to notify it that the next sample value has been written to the RAM 64. When the CPU 62 receives a new sample value from the music CD drive 1 (step S14), it records the value in the RAM 64 and notifies the DSP 63 that writing is complete. On receiving this notification, the DSP 63 returns to step S11, but this time applies the correlation determination data generation process to the raw audio data for determination whose last sample value is the newly recorded one, instead of raw audio data for determination (0). In this way, the n-th execution of step S11 records processed audio data for determination (n−1) in the RAM 64.
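The sliding window of raw audio data for determination can be pictured as a fixed-length buffer that advances by one sample each time a new sample value arrives. A minimal sketch, in which the generator name is hypothetical and the window is copied at each step for clarity rather than efficiency:

```python
from collections import deque


def determination_windows(samples, size=65536):
    """Yield (n, window), where window is raw audio data for determination (n):
    `size` consecutive samples starting at index n."""
    window = deque(maxlen=size)   # oldest sample drops out automatically
    for i, s in enumerate(samples):
        window.append(s)
        if len(window) == size:
            yield i - size + 1, list(window)


print(list(determination_windows(range(5), size=3)))
# [(0, [0, 1, 2]), (1, [1, 2, 3]), (2, [2, 3, 4])]
```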
[0092]
On the other hand, when both the results of the two determination processes in step S13 are Yes, the DSP 63 performs a determination process represented by the following expressions (4) and (5) (step S15).
[Expression 4]

[Equation 5]

[0093]
The left side of equation (4) is the rate of change, with respect to n, of the product sum of X(m) and Y_n(m). Hereinafter, this product sum is called the correlation value. If the reference processed audio data and processed audio data for determination (n) are each arranged in order and values with the same index are paired, the correlation value becomes larger the more closely the values in each pair agree. If the correlation values of X(m) and Y_0(m), of X(m) and Y_1(m), and so on are arranged in time series, their rate of change is 0 where the correlation value takes an extreme value. The determination by equation (4) therefore checks whether the correlation value is at an extreme value, and equation (5) checks that this extreme value is a maximum.
[0094]
When n = 0, there is no preceding correlation value, so the determination cannot be made; in this embodiment, the result of step S15 is therefore No when n = 0. This causes no problem because the reference raw audio data is not taken from the very beginning of music CD-A but from the point where the audio waveform first exceeds the threshold, so it is very unlikely that the corresponding audio data lies at the very beginning of music CD-B.
[0095]
More precisely, because X(m) and Y_n(m) in this embodiment are discrete values, the left side of equation (4) is almost never exactly 0. The determination in step S15 is therefore actually performed as follows. First, the DSP 63 takes the difference between the product sum of X(m) and Y_n(m) and the product sum of X(m) and Y_(n−1)(m); call this difference D_n. The DSP 63 then determines whether D_(n−1) is greater than 0 and D_n is 0 or less. If so, the rate of change of the correlation value has gone from positive to 0 or crossed 0, so the correlation value at that point is a local maximum or an approximation of one, and the result of step S15 is Yes. This procedure requires n ≥ 2; when n = 1, the result of step S15 is No for the same reason as for n = 0.
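The discrete peak test of step S15 can be sketched as follows. `correlation_values` computes the product sums C_n of X(m) and Y_n(m) over a toy signal, and `first_peak` applies the condition D_(n−1) > 0 and D_n ≤ 0 with D_n = C_n − C_(n−1), optionally gated by the step S13 result for each n. Both names are hypothetical, and sliding over an already processed sequence is a simplification of the per-sample processing described in the text.

```python
def correlation_values(x, signal):
    """C_n = sum over m of X(m) * Y_n(m), with Y_n = signal[n:n+len(x)]."""
    size = len(x)
    return [sum(a * b for a, b in zip(x, signal[n:n + size]))
            for n in range(len(signal) - size + 1)]


def first_peak(c, s13_ok=None):
    """Return the first n >= 2 where step S13 passed (if given), D_(n-1) > 0
    and D_n <= 0; C_(n-1) is then a local maximum. None if there is none."""
    for n in range(2, len(c)):
        if s13_ok is not None and not s13_ok[n]:
            continue
        if c[n - 1] - c[n - 2] > 0 and c[n] - c[n - 1] <= 0:
            return n
    return None


x = [0, 1, 2, 3, 2, 1, 0, 0]
signal = [0] * 5 + x + [0] * 5
c = correlation_values(x, signal)
print(c)              # [0, 1, 4, 10, 16, 19, 16, 10, 4, 1, 0]
print(first_peak(c))  # 6 (the maximum, 19, is at n = 5)
```

Note that the test fires at n = 6, one step after the maximum at n = 5.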
[0096]
If the result of step S15 is No, the DSP 63 waits for the CPU 62 to notify it that a new sample value has been written. When it receives this notification (step S14), the DSP 63 returns to step S11, and new processed audio data for determination is recorded in the RAM 64.
[0097]
Each time the result of step S13 or step S15 is No and the process returns to step S11 via step S14, the DSP 63 executes step S11 and then steps S12 to S15 again. Until step S15 gives Yes, the DSP 63 thus successively updates the processed audio data for determination: processed audio data for determination (0), (1), (2), and so on.
[0098]
Here, as an example, assume that the audio data recorded on music CD-B is, as a whole, delayed by 51600 samples, or about 1.17 seconds, relative to the audio data recorded on music CD-A, measured from the start of playback. The reference raw audio data consists of the audio data (R(52156), L(52156)) to (R(117691), L(117691)) extracted from music CD-A, so the corresponding audio data on music CD-B is (r(103756), l(103756)) to (r(169291), l(169291)).
[0099]
In this case, the determinations in step S13 or step S15 performed using processed audio data for determination (0) through (103755) give No. This is because raw audio data for determination (0) through (103755), from which that processed data is generated, does not correspond to the reference raw audio data, so there is no sufficient correlation.
[0100]
The determination in step S13 performed using processed audio data for determination (103756) gives Yes, and the determination in step S15 then also gives Yes. This is because raw audio data for determination (103756), from which processed audio data for determination (103756) is generated, corresponds to the reference raw audio data, so a sufficient correlation is obtained. The DSP 63 then ends the correlation determination process and sends the CPU 62 a notification that it has succeeded.
[0101]
FIG. 9 is a graph of the values of the formulas used in the determinations of steps S13 and S15, computed for actual audio data samples. To create this graph, a single-stage IIR (Infinite Impulse Response) filter with a frequency of 25 Hz was used as the high-pass filter in step S3 of FIG. 4, the comb filter constants in step S5 were k = 4410 and f = 1, and a single-stage IIR filter with a frequency of 25 Hz was used as the low-pass filter in step S6. The constants of the determination formulas in step S13 of FIG. were p = 0.5 and q = 0.8.
[0102]
The upper graph in FIG. 9 plots, against n, the value of the numerator on the left side of equation (2) and the value of the right side obtained by moving the left-side denominator to the right side. The middle graph plots the corresponding two values for equation (3). The lower graph plots the value of the left side of equation (4).
[0103]
According to FIG. 9, when n lies in section A, the numerator on the left side of equation (2) is equal to or greater than the right side obtained by moving the left-side denominator across, so the condition of equation (2) is satisfied. Within section A, when n lies in section B, the numerator on the left side of equation (3) likewise equals or exceeds the corresponding right side, so the condition of equation (3) is also satisfied. A Yes result is therefore obtained in step S13. Within section B, when n takes the value indicated by arrow C, the left side of equation (4) changes from a positive value to 0 and the condition of equation (5) is also satisfied, so a Yes result is obtained in step S15.
[0104]
[1.2.2.4] Playback of MIDI events
When the CPU 62 receives from the DSP 63 the notification that the correlation determination process has succeeded, it resets the time to 0 seconds and starts measuring time. At the same time, the CPU 62 reads the SMF from the RAM 64, successively compares the measured time with the delta times in the SMF, and transmits each MIDI event whose delta time has been reached to the automatic performance piano unit 3.
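The comparison of the measured time against the delta times can be sketched as a polling function called from a periodic timer. This is a hypothetical illustration: the text does not describe the timer mechanism, `due_events` is an invented name, and the event labels stand in for real MIDI messages.

```python
def due_events(events, next_index, elapsed):
    """Return (events whose delta time has been reached, new next_index).
    `events` is a list of (delta_seconds, event) sorted by delta time, and
    `elapsed` is the time measured since the correlation success."""
    due = []
    while next_index < len(events) and events[next_index][0] <= elapsed:
        due.append(events[next_index][1])
        next_index += 1
    return due, next_index


events = [(1.25, "C5 on"), (2.63, "E6 on"), (3.71, "C5 off")]
sent, idx = due_events(events, 0, 2.7)
print(sent, idx)   # ['C5 on', 'E6 on'] 2
```

Keeping the index of the next unsent event means each call only inspects events that have not yet been transmitted.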
[0105]
In the automatic performance piano unit 3, when the MIDI event control circuit 34 receives a MIDI event from the CPU 62, it transmits the received MIDI event to the sound source 35 or the drive unit 36. When a MIDI event is transmitted to the sound source 35, the sound source 35 sequentially transmits sound data indicating the sound of the musical instrument to the sound generation unit 4 in accordance with the received MIDI event. The sound generation unit 4 outputs the performance of the musical instrument sound received from the sound source 35 from the speaker 44 together with the sound of the music CD-B that has already been reproduced. On the other hand, when a MIDI event is transmitted to the drive unit 36, the drive unit 36 moves the keys and pedals of the piano 31 according to the received MIDI event. In any case, the user can simultaneously listen to the music recorded on the music CD-B and the performance based on the musical instrument sound based on the performance information recorded on the SMF.
[0106]
[1.2.2.5] Temporal relationship between audio data and MIDI events
As described above, the user can play back the music CD and the MIDI events recorded in the SMF at the same time, and because the difference in start time between music CD-A and music CD-B is compensated for, the playback of the music CD and the playback of the MIDI events do not drift apart. The temporal relationship between the audio data of music CD-A and music CD-B and the MIDI events is summarized below with reference to FIG. 10. FIG. 10 shows an example in which the level of the audio waveform represented by the audio data of music CD-B is lower than that of the audio waveform represented by the audio data of music CD-A. To distinguish the two time bases, times measured with the playback start of music CD-B taken as 0 seconds are marked (T′).
[0107]
If the start time difference between music CD-A and music CD-B were not compensated for and the MIDI events were played back relative to the start of music CD playback, the first event would be transmitted to the automatic performance piano unit 3 at 3.92 seconds (T′), the second at 5.30 seconds (T′), and the third at 6.38 seconds (T′). The performance by the MIDI events would therefore run ahead of the music CD.
[0108]
In practice, however, for roughly the first 3.84 seconds after playback of music CD-B starts, the determination unprocessed audio data extracted from music CD-B differs greatly from the reference unprocessed audio data extracted in advance from music CD-A. There is therefore no sufficient correlation between the determination processed audio data and the reference processed audio data, and playback of the MIDI events does not start.
[0109]
At about 3.84 seconds (T′), sufficient correlation is obtained between the two sets of audio data, and it is determined that they were generated from the same part of the music on music CD-B and music CD-A. Because the delta times of the MIDI events are measured from the point corresponding to about 3.84 seconds (T′), that is, about 2.67 seconds (T) on music CD-A, the first event is transmitted to the automatic performance piano unit 3 at about 5.09 seconds (T′), the second at about 6.47 seconds (T′), and the third at about 7.55 seconds (T′). The transmission timing of the MIDI events is thus adjusted, and the MIDI events play along with the music recorded on music CD-B at the correct timing.
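The figures in FIG. 10 can be reproduced with a short calculation. This is a sketch under stated assumptions: the 44100 Hz rate and the 65536-sample window come from elsewhere in this description, the 1.18-second reference start on music CD-A is taken from the second embodiment's example, and treating the "2.67 seconds (T)" point as the end of the reference window (1.18 + 65536/44100 ≈ 2.67) is an inference. The function name is hypothetical.

```python
# Sketch: first-embodiment send times on the CD-B time base (T').
SAMPLE_RATE_HZ = 44100   # CD sampling rate
WINDOW_SAMPLES = 65536   # reference window length (about 1.49 s)


def cd_b_send_times(event_times_a, ref_start_a, match_time_b):
    """Map MIDI event times on the CD-A time base (T) to send times on CD-B (T').

    event_times_a: event times in seconds from the start of music CD-A.
    ref_start_a:   start of the reference unprocessed audio data on CD-A (s).
    match_time_b:  time (T') at which correlation determination succeeds on CD-B.
    """
    # Assumed anchor: the end of the reference window on CD-A (~2.67 s (T)).
    ref_end_a = ref_start_a + WINDOW_SAMPLES / SAMPLE_RATE_HZ
    # The timer restarts at 0 on the success notification, so each event is
    # due (event time - anchor) seconds after the match on CD-B.
    return [match_time_b + (t - ref_end_a) for t in event_times_a]


times = cd_b_send_times([3.92, 5.30, 6.38], ref_start_a=1.18, match_time_b=3.84)
print([round(t, 2) for t in times])  # [5.09, 6.47, 7.55]
```

The effective offset is 3.84 - 2.67 ≈ 1.17 seconds, the same start-time difference that appears in the second embodiment.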
[0110]
[2] Second embodiment
In the second embodiment of the present invention, the time code recorded on the music CD is used for the synchronization adjustment of the reproduction of the audio data recorded on the music CD and the MIDI event recorded on the SMF.
[0111]
[2.1] Music CD drive
In the second embodiment, the overall configuration, the functions of the components, and the data format of the MIDI data are the same as in the first embodiment except for the function of the music CD drive 1. Only the function of the music CD drive 1 is therefore described, and other descriptions are omitted.
In the second embodiment, the music CD drive 1 transmits the time code to the controller unit 6 together with the audio data recorded on the music CD. The other points are the same as those of the music CD drive 1 in the first embodiment.
[0112]
[2.2] Operation
The operation of the synchronous recording/reproducing apparatus SS in the second embodiment differs from that in the first embodiment in the following three points.
(1) The SMF records, in its system exclusive event, the time code at the start of the reference unprocessed audio data used to generate the reference processed audio data.
(2) As a delta time of other MIDI events recorded in the SMF, a time code corresponding to the occurrence time of those MIDI events is recorded.
(3) In the MIDI event playback operation, MIDI events are transmitted to the automatic performance piano unit 3 based on the time code transmitted from the music CD drive 1, rather than on time measured with the clock of the controller unit 6.
[0113]
Other operations in the second embodiment are the same as in the first embodiment, so their detailed description is omitted. As in the first embodiment, the following description assumes that music CD-A is used in the recording operation and music CD-B in the playback operation. A time code is expressed in hours, minutes, seconds, and frames; for simplicity, however, the following description expresses the time indicated by a time code in seconds, in the same way as the delta times recorded in the SMF.
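The conversion from a time code to the seconds used in this description can be sketched as follows. The frame rate is an assumption: the text does not state it, so the sketch uses 75 frames per second, the rate of the CD-DA subcode. The function name is hypothetical.

```python
FRAMES_PER_SECOND = 75  # assumption: CD-DA subcode frame rate


def time_code_to_seconds(hours, minutes, seconds, frames, fps=FRAMES_PER_SECOND):
    """Convert an (hours, minutes, seconds, frames) time code to seconds."""
    if not 0 <= frames < fps:
        raise ValueError(f"frame index {frames} out of range for {fps} fps")
    return hours * 3600 + minutes * 60 + seconds + frames / fps


print(time_code_to_seconds(0, 3, 0, 0))   # 180.0 (the 3-minute point)
print(time_code_to_seconds(0, 0, 3, 69))  # 3.92
```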
[0114]
[2.2.1] Recording operation
In the synchronous recording/reproducing apparatus SS of the second embodiment, when the user instructs the start of performance data recording using the operation display unit 5, the music CD drive 1 sequentially transmits the audio data of music CD-A, together with the time code, to the controller unit 6.
[0115]
In the controller unit 6, the CPU 62 sequentially transmits the received audio data to the sound generation unit 4, which outputs the music of music CD-A as sound. Meanwhile, when the absolute value of a sample value of the received audio data first exceeds a predetermined threshold, the CPU 62 converts the time code received immediately before into delta time format and records it in the RAM 64. In this example, 1.18 seconds is recorded in the RAM 64 as the delta time. This delta time is hereinafter referred to as the “reference audio data start time”.
[0116]
The CPU 62 starts recording sample values in the RAM 64 at the same moment it records the reference audio data start time, and the sample values for the following approximately 1.49 seconds are recorded in the RAM 64 as the reference unprocessed audio data.
When the CPU 62 has finished recording the reference unprocessed audio data, the DSP 63 reads it from the RAM 64 and performs the correlation determination data generation process on it. As a result, reference processed audio data is recorded in the RAM 64.
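The start-time detection and window capture described above can be sketched as follows. Assumptions: the stream is mono, and each sample is paired with the time code received most recently before it; the threshold value and the function name are hypothetical. The 65536-sample window matches the roughly 1.49 seconds stated here, since 65536 / 44100 ≈ 1.486 s.

```python
SAMPLE_RATE_HZ = 44100
WINDOW_SAMPLES = 65536  # 65536 / 44100 ≈ 1.486 s, i.e. "about 1.49 seconds"


def extract_reference(frames, threshold, window=WINDOW_SAMPLES):
    """Return (reference_start_time, samples) for the reference unprocessed audio data.

    frames:    list of (time_code_seconds, sample_value) pairs in playback order,
               where time_code_seconds is the time code received just before the sample.
    threshold: level that the absolute sample value must exceed.
    Returns (None, []) if no sample exceeds the threshold.
    """
    for i, (time_code, sample) in enumerate(frames):
        if abs(sample) > threshold:
            # The preceding time code becomes the reference audio data start time,
            # and recording of sample values starts at this sample.
            return time_code, [s for _, s in frames[i:i + window]]
    return None, []


frames = [(0.0, 0), (0.0, 3), (0.5, 10), (0.5, -20), (1.0, 5)]
print(extract_reference(frames, threshold=8, window=2))  # (0.5, [10, -20])
```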
[0117]
While the DSP 63 performs the correlation determination data generation process, the user starts playing the piano 31 along with the sound of music CD-A heard from the sound generation unit 4. Information on the user's performance is transmitted from the automatic performance piano unit 3 to the controller unit 6 as MIDI events. On receiving a MIDI event, the CPU 62 converts the time code received from the music CD drive 1 immediately before into delta time format and records it in the RAM 64 in association with the MIDI event.
[0118]
When the reproduction of the music CD-A is finished and the performance by the user is finished, the user uses the operation display unit 5 to give an instruction to finish recording the performance data. When this user instruction is given, first, the reproduction of the music CD-A by the music CD drive 1 is stopped. Subsequently, the CPU 62 reads the reference audio data start time, the reference processed audio data, the MIDI event generated by the user's performance, and the delta time associated with the MIDI event from the RAM 64. The CPU 62 combines these read data to generate an SMF.
[0119]
FIG. 11 is a diagram showing an outline of the SMF generated by the CPU 62. In this SMF, the system exclusive event stores the reference audio data start time in addition to the reference processed audio data. The delta time of each other MIDI event holds the same time information as the time code that the CPU 62 received at almost the same moment as that MIDI event. For example, the delta time of the first event is 3.92 seconds, indicating that the first event occurred 3.92 seconds after the start of playback of the audio data of music CD-A.
The CPU 62 transmits the generated SMF together with a write command to the FD drive 2, and the SMF is written to the FD by the FD drive 2.
[0120]
[2.2.2] Playback operation
Next, an operation for reproducing the SMF recorded by the above-described method and synchronizing the audio data of the music CD-B and the SMF MIDI data will be described.
When the user instructs the start of performance data playback using the operation display unit 5, the SMF recorded on the FD is first transmitted from the FD drive 2 to the CPU 62, which records it in the RAM 64. The music CD drive 1 then starts playing music CD-B and sequentially transmits the audio data and time code recorded on music CD-B to the controller unit 6. The CPU 62 sequentially transmits the received audio data to the sound generation unit 4, which outputs the music of music CD-B as sound, and at the same time records the audio data in the RAM 64 together with the time code.
[0121]
When the CPU 62 has recorded the 65536th sample value in the RAM 64, the DSP 63 starts the correlation determination process on the audio data recorded in the RAM 64. Until the determination in step S15 of FIG. 8 yields Yes, the DSP 63 repeatedly generates determination processed audio data from the sequentially updated determination unprocessed audio data and performs the determinations of steps S13 and S15 on the generated determination processed audio data.
[0122]
The determination in step S15, performed using determination processed audio data (103756), yields Yes; the DSP 63 then ends the series of correlation determination steps and sends the CPU 62 a notification that the correlation determination process has succeeded. This success notification includes the number "103756" of determination processed audio data (103756), the data used last in the correlation determination process.
[0123]
On receiving the success notification from the DSP 63, the CPU 62 uses the number "103756" contained in it to read from the RAM 64 the time code recorded together with the first sample value of determination unprocessed audio data (103756), that is, (r(103756), l(103756)). In this case, the time indicated by the time code is 2.35 seconds. The CPU 62 then calculates the difference between the time indicated by this time code and the reference audio data start time included in the system exclusive event of the SMF recorded in the RAM 64.
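As a consistency check on these figures, the start and end of determination window (103756) can be computed from the sample index. This assumes, as a simplification not stated in the text, that window n begins at sample index n counted from the start of playback at 44100 Hz and spans 65536 samples; the CPU 62 itself reads the recorded time code rather than computing it. The function name is hypothetical.

```python
SAMPLE_RATE_HZ = 44100
WINDOW_SAMPLES = 65536


def window_bounds_seconds(n, rate=SAMPLE_RATE_HZ, window=WINDOW_SAMPLES):
    """Return (start, end) times in seconds of determination window n.

    Assumes window n covers samples n .. n + window - 1, counted from the
    start of playback.
    """
    return n / rate, (n + window - 1) / rate


start, end = window_bounds_seconds(103756)
print(round(start, 2), round(end, 2))  # 2.35 3.84
```

The start (about 2.35 seconds) agrees with the time code read here, and the end (about 3.84 seconds) agrees with the moment at which correlation succeeds on music CD-B in the first embodiment.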
[0124]
Since the reference audio data start time is 1.18 seconds, the difference is 1.17 seconds. This indicates that the delta times recorded in the SMF are, as a whole, 1.17 seconds too early for the music on music CD-B. The CPU 62 therefore adds 1.17 seconds to each delta time in the SMF. As a result, the delta time of the first event changes from 3.92 seconds to 5.09 seconds, that of the second event from 5.30 seconds to 6.47 seconds, and that of the third event from 6.38 seconds to 7.55 seconds. This operation is hereinafter referred to as “timing adjustment processing”.
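Timing adjustment processing reduces to shifting every delta time by one signed offset. A minimal sketch with hypothetical names, using the values of this example:

```python
def adjust_delta_times(delta_times, reference_start, matched_start):
    """Timing adjustment processing (sketch).

    delta_times:     delta times in the SMF, in seconds.
    reference_start: reference audio data start time stored in the SMF (s).
    matched_start:   time code of the first sample of the matching
                     determination unprocessed audio data on the playback CD (s).
    A positive offset delays every event; a negative offset advances it.
    """
    offset = matched_start - reference_start
    return [d + offset for d in delta_times]


adjusted = adjust_delta_times([3.92, 5.30, 6.38], reference_start=1.18, matched_start=2.35)
print([round(d, 2) for d in adjusted])  # [5.09, 6.47, 7.55]
```

Because the offset is signed, the same function also covers a playback CD whose music starts earlier than on the recording CD, in which case every delta time is reduced.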
[0125]
The CPU 62 then sequentially compares the time codes of music CD-B arriving from the music CD drive 1 with the updated delta times and, when the time information matches, transmits the MIDI event corresponding to that delta time to the automatic performance piano unit 3.
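The comparison loop can be sketched as below. Because time codes arrive in discrete steps, the sketch treats a "match" as the incoming time code reaching or passing an event's delta time; that interpretation, the 1/75-second step in the usage lines, and all names are assumptions.

```python
def dispatch_by_time_code(time_codes, events):
    """Return (time_code, event) pairs in the order events would be sent.

    time_codes: increasing time codes in seconds, as received from the CD drive.
    events:     (adjusted_delta_time, midi_event) pairs sorted by delta time.
    """
    sent = []
    next_index = 0
    for tc in time_codes:
        # Send every pending event whose delta time has been reached.
        while next_index < len(events) and tc >= events[next_index][0]:
            sent.append((tc, events[next_index][1]))
            next_index += 1
    return sent


codes = [k / 75 for k in range(8 * 75)]  # 8 s of time codes in 1/75-s steps
events = [(5.09, "first"), (6.47, "second"), (7.55, "third")]
for tc, ev in dispatch_by_time_code(codes, events):
    print(f"{tc:.3f} s -> {ev}")
```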
[0126]
In the automatic performance piano unit 3, automatic performance is performed according to the MIDI event transmitted from the controller unit 6. As a result, the user can simultaneously listen to the music recorded on the music CD-B and the performance based on the performance information recorded on the SMF.
[0127]
[2.2.3] Temporal relationship between audio data and MIDI events
FIG. 12 is a diagram showing a temporal relationship between the audio data of the music CD-A and the music CD-B and the MIDI event in the MIDI data recording operation and reproducing operation.
The upper diagram of FIG. 12 shows the relationship between the time indicated by the time code of the music CD-A and the time indicated by the delta time corresponding to the recorded MIDI event in the MIDI data recording operation. As shown in this figure, the time information indicated by the time code at the time of occurrence of the MIDI event is recorded as it is in the delta time.
[0128]
The middle diagram of FIG. 12 shows the relationship between the time indicated by the time code of music CD-B and the time indicated by the delta times after timing adjustment processing in the MIDI data playback operation. If the MIDI events were played back against the time code of music CD-B using the delta times before timing adjustment processing, they would be played earlier than music CD-B. Because timing adjustment processing corrects this time lag, however, playing back the MIDI events against the time code of music CD-B using the adjusted delta times makes them play at the correct timing relative to music CD-B.
[0129]
The music CD drive 1 divides the reference clock signal from its internal oscillator to generate a 44100 Hz clock signal and, in accordance with this clock signal, sequentially transmits the audio data recorded on the music CD to the controller unit 6. If the oscillator does not operate stably, the playback speed may differ slightly from one playback to the next, even for exactly the same music CD.
[0130]
The lower diagram of FIG. 12 shows the relationship between the time indicated by the time code of music CD-B and the time indicated by the delta times after timing adjustment processing, for the case where music CD-B is played at a slightly higher speed than in the middle diagram. If the MIDI events were played back according to the clock signal of the CPU 62, their playback would lag slightly behind music CD-B as a whole. Specifically, suppose the middle diagram of FIG. 12 represents time according to the clock signal of the CPU 62, that the clock signal of the CPU 62 and its frequency division are error-free, and that the clock signal and frequency division in the music CD drive 1 contain an error. The first event would then be played back late relative to music CD-B by time t1, the second event by time t2, and the third event by time t3.
[0131]
In the second embodiment, however, MIDI events are played back according to the time code transmitted in real time from the music CD drive 1 to the CPU 62, so they are never played back out of step with music CD-B.
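The effect of a drive speed error can be quantified with a simple model. Assumptions: the CPU 62 clock is exact, the drive runs at a constant `speed_factor` times nominal speed (1.001 below is a hypothetical 0.1% error), and each delta time is expressed on the CD's own time base. Under clock-based playback the lag grows in proportion to the event's delta time; under time-code-based playback the event is sent when the CD itself reaches the delta time, so the residual error is limited to the granularity of the time code.

```python
def clock_based_lag(delta_time, speed_factor):
    """Lag (s) of a clock-scheduled event behind the CD.

    delta_time:   event time on the CD's own time base (s).
    speed_factor: actual drive speed / nominal speed (> 1 means the drive runs fast).
    """
    # The CD reaches delta_time after delta_time / speed_factor real seconds,
    # but an exact CPU clock fires the event after delta_time real seconds.
    return delta_time - delta_time / speed_factor


for d in (5.09, 6.47, 7.55):
    print(f"event at {d:.2f} s: lag {clock_based_lag(d, 1.001) * 1000:.2f} ms")
```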
[0132]
[3] Third embodiment
In the third embodiment of the present invention, the reference unprocessed audio data is extracted not from the beginning of the music indicated by the audio data recorded on the music CD but from the middle of the music. In the third embodiment, as in the second embodiment, the time code recorded on the music CD is used for the synchronization adjustment of the reproduction of the MIDI event recorded in the SMF.
Since the overall configuration, functions of each component, and data format in MIDI data in the third embodiment are the same as those in the second embodiment, description thereof is omitted.
[0133]
[3.1] Operation
The operation of the synchronous recording/reproducing apparatus SS in the third embodiment differs from that in the second embodiment in the following two points.
(1) In the recording operation of the MIDI event, the reference raw audio data is extracted from the middle part of the music indicated by the audio data recorded on the music CD.
(2) In the MIDI event playback operation, the playback timing of the MIDI events is first determined by the correlation determination process on the audio data recorded on the music CD, and the audio data recorded on the music CD is then played back again from the beginning.
[0134]
[3.1.1] Recording operation
In the MIDI event recording operation of the third embodiment, any part of the audio data recorded on the music CD can be extracted as reference unprocessed audio data. For example, the samples covering about 1.49 seconds starting 3 minutes into the music may be used, or the samples covering about 1.49 seconds that include a portion with a characteristic audio waveform within the whole piece may be used. In the following description, it is assumed as an example that the roughly 1.49 seconds starting at the 3-minute point, that is, 180 seconds, in the time code of music CD-A are extracted as reference unprocessed audio data.
[0135]
When the user instructs the start of performance data recording, the music CD drive 1 first transmits 65536 sets of audio data, starting 180 seconds from the beginning of music CD-A, to the CPU 62. The CPU 62 converts the first time code included in the received audio data into delta time format and records it in the RAM 64 as the reference audio data start time. The CPU 62 also records the sample values included in the received audio data in the RAM 64 as reference unprocessed audio data. The CPU 62 performs the correlation determination data generation process on the reference unprocessed audio data, and the resulting reference processed audio data is recorded in the RAM 64.
[0136]
Subsequently, the music CD drive 1 plays the music CD-A from the beginning. The CPU 62 sequentially receives audio data from the music CD drive 1 and transmits sound data included in the received audio data to the sound generation unit 4. The user performs a performance using the piano 31 in accordance with the sound of the music on the music CD-A emitted from the sound generation unit 4, and the performance information is sequentially transmitted to the CPU 62 as a MIDI event. When receiving the MIDI event, the CPU 62 converts the time code received from the music CD drive 1 immediately before to the delta time format, and records the data in the RAM 64 in association with the MIDI event.
[0137]
When the user gives an instruction to end recording of performance data, the music CD drive 1 stops the reproduction of the music CD-A. At the same time, the CPU 62 generates the SMF shown in FIG. 13 from the data recorded in the RAM 64. The generated SMF is written to the FD by the FD drive 2.
[0138]
[3.1.2] Playback operation
When the SMF recorded by the above method is then played back in synchronization with music CD-B, the SMF is first transmitted from the FD drive 2 to the CPU 62 in response to the user's instruction to start playing the performance data, and is recorded in the RAM 64. The audio data and time code of music CD-B are then sequentially transmitted from the music CD drive 1 to the CPU 62.
[0139]
When the CPU 62 receives the 65536th sample value, it starts the correlation determination process on the received series of sample values. Under the control of the CPU 62, the DSP 63 continues the correlation determination process, updating the determination unprocessed audio data with the sequentially received sample values, until the determination in step S15 of FIG. 8 yields Yes. Because music CD-B lags music CD-A by about 1.17 seconds as a whole, the determination in step S15 yields Yes, and the correlation determination process ends, once the CPU 62 has received the sample value at about 182.66 seconds from the beginning of music CD-B (181.17 seconds plus the roughly 1.49-second window) and the process has been performed on the determination unprocessed audio data ending with that sample. The CPU 62 obtains 181.17 seconds as the time code corresponding to the first sample of the determination unprocessed audio data used when the correlation determination process succeeded.
[0140]
The CPU 62 then calculates the difference between the time indicated by this time code and the time indicated by the delta time included in the system exclusive event of the SMF. Since the difference here is 1.17 seconds, the CPU 62 adds 1.17 seconds to each delta time in the SMF. As in the second embodiment, each delta time is thereby adjusted to indicate the correct timing for the music on music CD-B. This completes the process of determining the playback timing of the MIDI events. During this process, the audio data of music CD-B is not transmitted to the sound generation unit 4, so the user does not hear music CD-B.
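The silent first pass of the third embodiment is a search for the window of music CD-B that resembles the reference. The sketch below shows only that search structure. It deliberately swaps in plain normalized (Pearson) correlation with a fixed threshold in place of the correlation determination data generation process and the determinations of steps S13 and S15, which are not reproduced here; the 0.95 threshold, the sample values, and all names are hypothetical.

```python
import math


def pearson(x, y):
    """Normalized (Pearson) correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def find_match(stream, reference, threshold=0.95):
    """Slide a window the length of `reference` over `stream`; return the first
    start index whose correlation with `reference` reaches `threshold`, else None."""
    w = len(reference)
    for start in range(len(stream) - w + 1):
        if pearson(stream[start:start + w], reference) >= threshold:
            return start
    return None


reference = [0.0, 3.0, -2.0, 5.0, 1.0, -4.0]                      # made-up samples
stream = [0.0] * 10 + [0.5 * v for v in reference] + [0.0] * 5  # quieter, delayed copy
print(find_match(stream, reference))  # 10
```

If the stream starts at the beginning of the music, dividing the returned index by 44100 gives the time of the window's first sample, which plays the role of the 181.17-second time code above.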
[0141]
When the above processing is complete, the music CD drive 1 plays music CD-B again from the beginning of the music. The audio data of music CD-B is transmitted to the sound generation unit 4 via the CPU 62, and the user hears the music from the sound generation unit 4. At the same time, the CPU 62 sequentially compares the time codes of music CD-B received from the music CD drive 1 with the updated delta times in the SMF and, when the time information matches, transmits the MIDI event corresponding to that delta time to the automatic performance piano unit 3. The automatic performance piano unit 3 thus performs automatically.
[0142]
FIG. 14 is a schematic diagram showing the relationship among the reference unprocessed audio data, reference processed audio data, determination unprocessed audio data, and determination processed audio data in the third embodiment. The reference unprocessed audio data consists of about 1.49 seconds of audio data extracted starting at time T1 from the beginning of music CD-A. The correlation determination data generation process is performed on the reference unprocessed audio data to generate reference processed audio data, which is stored at the head of the SMF together with time information indicating time T1.
In the music CD-B, the audio data corresponding to the reference unprocessed audio data in the music CD-A is recorded as audio data for about 1.49 seconds after the time T2 has elapsed from the beginning.
[0143]
The delta times of the MIDI events in the SMF are adjusted based on the difference between T1 and T2: if T1 is smaller than T2, the difference is added to each delta time in the SMF, and if T1 is larger than T2, the difference is subtracted from each delta time.
[0144]
[4] Modification
The first, second, and third embodiments described above are examples of embodiments of the present invention, and various modifications can be made to them without departing from the spirit of the present invention. Modifications are described below.
[0145]
[4.1] First modification
In the first modification, each component of the synchronous recording / reproducing apparatus SS is not arranged in the same apparatus, but is arranged separately for each group.
For example, it is possible to separately arrange in the following groups.
(1) Music CD drive 1
(2) FD drive 2
(3) Automatic performance piano part 3
(4) Mixer 41 and D / A converter 42
(5) Amplifier 43
(6) Speaker 44
(7) Operation display unit 5 and controller unit 6
Furthermore, the controller unit 6 may be configured separately for a device that performs only a recording operation and a device that performs only a reproduction operation.
[0146]
The groups of these components are connected by an audio cable, a MIDI cable, an audio optical cable, a USB (Universal Serial Bus) cable, a dedicated control cable, and the like. Commercially available FD drive 2, amplifier 43, speaker 44, etc. may be used.
According to the first modification, the synchronous recording/reproducing apparatus SS can be arranged more flexibly, and because the user need not acquire the whole apparatus but only the components actually needed, the required cost can be reduced.
[0147]
[4.2] Second modification
In the second modification, the synchronous recording/reproducing apparatus SS has neither the music CD drive 1 nor the FD drive 2. Instead, the communication interface can be connected to a LAN (Local Area Network) and is connected to external communication devices via the LAN and a WAN. In addition, the controller unit 6 has an HD (Hard Disk).
[0148]
The controller unit 6 receives digital audio data including audio data and a time code from another communication device via the LAN, and records the received audio data on the HD. Similarly, the controller unit 6 receives the SMF created corresponding to the audio data from another communication device via the LAN, and records the received SMF on the HD.
[0149]
Instead of receiving the audio data and time code of the music CD from the music CD drive 1, the controller unit 6 reads the digital audio data from the HD. Likewise, the controller unit 6 writes the SMF to and reads it from the HD instead of the FD drive 2.
According to the second modification, the user can transmit / receive digital audio data and SMF to / from a geographically distant communication device via the LAN. The LAN may be connected to a wide area communication network such as the Internet.
[0150]
[4.3] Third modification
In the embodiments described above, steps S13 and S15 of the correlation determination process use all of the following: the determination based on the absolute correlation index, the determination based on the relative correlation index, and the determination based on the correlation value. In the third modification, the correlation determination process uses any one of these determinations or a combination of two or more of them, and the determination or combination used may be made freely selectable.
According to the third modified example, it is possible to obtain a determination result of required accuracy more flexibly.
[0151]
[4.4] Fourth modification
In the embodiments described above, the maximum of the correlation value is detected in step S15 of the correlation determination process by the determinations expressed by Equations (4) and (5). In the fourth modification, only the determination expressed by Equation (4) is performed, and an extremum of the correlation value is detected.
[0152]
More specifically, in step S15 the DSP 63 determines whether the product of $D_{n-1}$ and $D_n$ is 0 or less, that is, whether $D_{n-1} \cdot D_n \le 0$. When the product is 0 or less, the rate of change of the correlation value is 0 or has changed sign, so the correlation value at that point is an extremum or an approximation of one. Therefore, if the product of $D_{n-1}$ and $D_n$ is 0 or less, the determination in step S15 yields Yes.
[0153]
According to the fourth modified example, when the possibility that a local minimum value appears in the vicinity of the local maximum value is low, a determination result similar to that in step S15 in the above-described embodiment can be obtained by a simpler determination process.
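The difference between the embodiment's maximum test and the fourth modification's simpler extremum test can be sketched as follows. Assumptions not stated in this section: $D_n$ is taken as the first difference of successive correlation values, and Equation (5) is modeled as requiring the earlier rate of change to be positive. The correlation values and function names are hypothetical.

```python
def is_extremum(d_prev, d_curr):
    """Fourth modification: Equation (4) only, D_{n-1} * D_n <= 0."""
    return d_prev * d_curr <= 0


def is_maximum(d_prev, d_curr):
    """Maximum test; the extra d_prev > 0 condition stands in for Equation (5)."""
    return is_extremum(d_prev, d_curr) and d_prev > 0


def first_hit(correlations, test):
    """Index of the first correlation value accepted by `test`, or None."""
    # Assumed rate of change: D[k] = C[k + 1] - C[k].
    d = [b - a for a, b in zip(correlations, correlations[1:])]
    for n in range(1, len(d)):
        if test(d[n - 1], d[n]):
            return n  # the rate of change reaches 0 or changes sign at C[n]
    return None


corr = [0.5, 0.3, 0.4, 0.9, 0.6]
print(first_hit(corr, is_extremum))  # 1 (a local minimum)
print(first_hit(corr, is_maximum))   # 3 (the local maximum)
```

When no local minimum occurs just before the maximum, both tests return the same index, which is the situation in which the simpler test suffices.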
[0154]
【The invention's effect】
As described above, according to the present invention, even when different versions of the same music have different recorded audio data, playback of the performance data can be started in synchronization, at the correct timing, with any of those versions. It is therefore unnecessary to prepare separate performance data for different versions of the same music, which simplifies the creation and management of data.
[0155]
Different versions of the same music may also be recorded at different levels. According to the present invention, however, an index indicating the degree of similarity between the shape of the audio waveform represented by the reference data and the shape of the audio waveform represented by the actual audio data can be used as one of the indexes for determining when to start playing the performance data, so the playback start timing can be determined correctly even for audio data recorded at different levels.
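The level-independence argument can be illustrated with the normalized (Pearson) correlation coefficient, which is unchanged when a waveform is scaled by a positive gain. Whether this matches the exact index used in the embodiments is not stated here; the sketch only demonstrates the property, using made-up sample values.

```python
import math


def pearson(x, y):
    """Normalized (Pearson) correlation; assumes neither sequence is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


ref = [0.0, 3.0, -2.0, 5.0, 1.0, -4.0]
quiet = [0.25 * v for v in ref]  # same waveform recorded about 12 dB lower
print(pearson(ref, quiet))  # 1.0
# A level-sensitive measure, the mean absolute difference, does change:
print(sum(abs(a - b) for a, b in zip(ref, quiet)) / len(ref))  # 1.875
```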
[0156]
Furthermore, when the performance data is played back using the time code, the performance data is played back at the correct timing relative to the audio data even if the playback speed of the audio data is unstable.
[Brief description of the drawings]
FIG. 1 is a diagram showing a configuration of a synchronous recording / reproducing apparatus SS according to a first embodiment and a second embodiment of the present invention.
FIG. 2 is a diagram showing a configuration of a MIDI event.
FIG. 3 is a diagram illustrating a configuration of an SMF.
FIG. 4 is a flowchart of correlation determination data generation processing according to the first and second embodiments of the present invention.
FIG. 5 is a diagram showing a configuration of a comb filter according to the first embodiment and the second embodiment of the present invention.
FIG. 6 is a diagram showing a temporal relationship between audio data and a MIDI event in the recording operation according to the first embodiment of the present invention.
FIG. 7 is a diagram showing an outline of the SMF according to the first embodiment of the present invention.
FIG. 8 is a flowchart of correlation determination processing according to the first and second embodiments of the present invention.
FIG. 9 is a diagram illustrating a relationship between a change in a value of a calculation formula and a determination result in the correlation determination processing according to the first embodiment and the second embodiment of the present invention.
FIG. 10 is a diagram showing a temporal relationship between audio data in a recording operation, audio data in a reproduction operation, and a MIDI event according to the first embodiment of the present invention.
FIG. 11 is a diagram schematically showing an SMF according to a second embodiment of the present invention.
FIG. 12 is a diagram showing a temporal relationship between audio data in a recording operation, audio data in a reproduction operation, and a MIDI event according to the second embodiment of the present invention.
FIG. 13 is a diagram schematically showing an SMF according to a third embodiment of the present invention.
FIG. 14 is a diagram illustrating a relationship among reference unprocessed audio data, reference processed audio data, determination unprocessed audio data, and determination processed audio data according to the third embodiment of the present invention.
[Explanation of symbols]
DESCRIPTION OF SYMBOLS 1 ... Music CD drive, 2 ... FD drive, 3 ... Automatic performance piano part, 4 ... Sound generation part, 5 ... Operation display part, 6 ... Controller part, 31 ... Piano, 32 ... key sensor, 33 ... pedal sensor, 34 ... MIDI event control circuit, 35 ... sound source, 36 ... drive unit, 41 ... mixer, 42 ... D / A Converter, 43 ... Amplifier, 44 ... Speaker, 61 ... ROM, 62 ... CPU, 63 ... DSP, 64 ... RAM, 65 ... Communication interface.

Claims

A recording apparatus comprising:
first receiving means for receiving audio data representing the audio waveform of a piece of music;
second receiving means for receiving control data instructing control of a performance;
generating means for generating reference data that abstracts the audio waveform represented by partial data, the partial data being a part of the audio data; and
recording means for recording the reference data and for recording performance data comprising the control data and time data indicating the temporal relationship between the reproduction timing of the partial data and the reception timing of the control data.
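The recording apparatus above can be illustrated with a minimal Python sketch. The claim does not say how the audio waveform is "abstracted", so this sketch assumes a frame-wise RMS envelope; the names `rms_envelope`, `record` and `PerformanceEvent`, and the 256-sample frame size, are hypothetical. The time data is taken to be the reception time of each control message minus the reproduction time of the first sample of the partial data.

```python
import math
from dataclasses import dataclass


@dataclass
class PerformanceEvent:
    control_data: bytes  # e.g. a raw MIDI note-on message
    time_data: float     # seconds relative to the start of the partial data


def rms_envelope(samples, frame_size):
    """Abstract a waveform into one RMS value per non-overlapping frame (assumed abstraction)."""
    envelope = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        envelope.append(math.sqrt(sum(x * x for x in frame) / frame_size))
    return envelope


def record(samples, sample_rate, partial_start, partial_len, received, frame_size=256):
    """Build reference data and performance data.

    samples:       audio data as mono floats
    partial_start: index of the first sample of the partial data
    partial_len:   number of samples in the partial data
    received:      list of (reception_time_sec, control_data) pairs
    """
    partial = samples[partial_start:partial_start + partial_len]
    reference = rms_envelope(partial, frame_size)
    # Reproduction timing of the partial data, in seconds from the start of the audio
    partial_time = partial_start / sample_rate
    performance = [PerformanceEvent(data, t - partial_time) for t, data in received]
    return reference, performance
```

Storing time relative to the partial data, rather than to the start of the track, is what lets playback re-anchor the events once the partial data has been located in a different recording of the same piece.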

A reproduction apparatus comprising:
first receiving means for receiving reference data that abstracts an audio waveform, and performance data including control data instructing control of a performance and time data instructing the execution timing of that control;
second receiving means for receiving audio data representing the audio waveform of a piece of music;
selecting means for selecting from the audio data, as partial data, data representing an audio waveform similar to the audio waveform represented by the reference data; and
transmitting means for transmitting the control data at a timing determined by the reproduction timing of the partial data and the time data.
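On the reproduction side, the selecting means can be sketched as a sliding similarity search over the incoming audio. This minimal Python sketch assumes the same RMS-envelope abstraction as at recording time, uses Pearson correlation as the similarity measure, and accepts a match only above a hypothetical threshold of 0.9; it is not necessarily the correlation determination processing of FIG. 8, and all function names are hypothetical. Performance data is represented here as plain `(time_data, control_data)` pairs.

```python
import math


def rms_envelope(samples, frame_size):
    # Assumed abstraction: RMS per non-overlapping frame, as at recording time
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_size]) / frame_size)
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def correlation(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def locate_partial(reference, samples, sample_rate, frame_size=256, threshold=0.9):
    """Return the start time (s) of the segment most similar to reference, or None."""
    env = rms_envelope(samples, frame_size)
    best_score, best_frame = -1.0, None
    for k in range(len(env) - len(reference) + 1):
        score = correlation(reference, env[k:k + len(reference)])
        if score > best_score:
            best_score, best_frame = score, k
    if best_frame is None or best_score < threshold:
        return None  # no sufficiently similar partial data found
    return best_frame * frame_size / sample_rate


def schedule(performance, partial_time):
    """Absolute transmit time for each (time_data, control_data) pair."""
    return [(partial_time + dt, data) for dt, data in performance]
```

Because the search advances one frame at a time, the located timing is only accurate to one frame (256 samples, about 5.8 ms at 44.1 kHz); tighter alignment would need a finer search around the best frame.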

A recording method comprising:
a first receiving step of receiving audio data representing the audio waveform of a piece of music;
a second receiving step of receiving control data instructing control of a performance;
a generating step of generating reference data that abstracts the audio waveform represented by partial data, the partial data being a part of the audio data; and
a recording step of recording the reference data and recording performance data comprising the control data and time data indicating the temporal relationship between the reproduction timing of the partial data and the reception timing of the control data.

A reproduction method comprising:
a first receiving step of receiving reference data that abstracts an audio waveform, and performance data including control data instructing control of a performance and time data instructing the execution timing of that control;
a second receiving step of receiving audio data representing the audio waveform of a piece of music;
a selecting step of selecting from the audio data, as partial data, data representing an audio waveform similar to the audio waveform represented by the reference data; and
a transmitting step of transmitting the control data at a timing determined by the reproduction timing of the partial data and the time data.

A program causing a computer to execute:
a first receiving process of receiving audio data representing the audio waveform of a piece of music;
a second receiving process of receiving control data instructing control of a performance;
a generating process of generating reference data that abstracts the audio waveform represented by partial data, the partial data being a part of the audio data; and
a recording process of recording the reference data and recording performance data comprising the control data and time data indicating the temporal relationship between the reproduction timing of the partial data and the reception timing of the control data.

A program causing a computer to execute:
a first receiving process of receiving reference data that abstracts an audio waveform, and performance data including control data instructing control of a performance and time data instructing the execution timing of that control;
a second receiving process of receiving audio data representing the audio waveform of a piece of music;
a selecting process of selecting from the audio data, as partial data, data representing an audio waveform similar to the audio waveform represented by the reference data; and
a transmitting process of transmitting the control data at a timing determined by the reproduction timing of the partial data and the time data.

A recording medium on which is recorded performance data comprising reference data obtained by abstracting an audio waveform, control data instructing control of a performance, and time data instructing the execution timing of that control.
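The figures describe the performance data as an SMF (Standard MIDI File). One hypothetical way to lay out the recorded content in an SMF is sketched below in Python: the reference data goes into a sequencer-specific meta event (`FF 7F`) at tick 0, followed by the control data as ordinary MIDI events. This placement is an assumption, not the layout shown in FIG. 7, 11 or 13, and `vlq` and `build_smf` are hypothetical names. The sketch writes a format 0 file and relies on the SMF default tempo of 120 BPM (500,000 µs per quarter note), since it emits no tempo event.

```python
import struct

DEFAULT_TEMPO_US = 500_000  # SMF default tempo (120 BPM) when no tempo event is present


def vlq(value):
    """Encode a non-negative integer as an SMF variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)  # continuation bit on all but the last byte
        value >>= 7
    return bytes(reversed(out))


def build_smf(reference, events, ticks_per_beat=480):
    """Build a format 0 SMF.

    reference: reference data as bytes (stored in a sequencer-specific meta event)
    events:    list of (time_sec, midi_message_bytes), sorted by time, times >= 0
    """
    ticks_per_sec = ticks_per_beat * 1_000_000 / DEFAULT_TEMPO_US
    track = bytearray()
    # Hypothetical placement of the reference data: FF 7F <len> <data> at tick 0
    track += vlq(0) + b"\xff\x7f" + vlq(len(reference)) + reference
    last_tick = 0
    for time_sec, message in events:
        tick = round(time_sec * ticks_per_sec)
        track += vlq(tick - last_tick) + message  # delta time, then the event
        last_tick = tick
    track += vlq(0) + b"\xff\x2f\x00"  # End of Track meta event
    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, ticks_per_beat)
    return header + b"MTrk" + struct.pack(">I", len(track)) + bytes(track)
```

A sequencer-specific meta event is a convenient container because SMF readers that do not recognize it are expected to skip it, so the file still plays as ordinary MIDI data.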