JP2004511838A

JP2004511838A - Method and system for finding matches in a database for waveforms

Info

Publication number: JP2004511838A
Application number: JP2002514577A
Authority: JP
Inventors: スティーブン　シェルフ; ポール　クイン
Original assignee: グレースノート　インコーポレイテッド
Priority date: 2000-07-21
Filing date: 2001-07-20
Publication date: 2004-04-15
Also published as: NO20030319L; WO2002008943A3; AU2001277034A1; EP1303817A2; WO2002008943A2; NO20030319D0

Abstract

【課題】波形に関するデータベース内に整合を発見する方法およびシステム
【解決手段】波形を含むファイルに対応するデータベースにレコード（ｒｅｃｏｒｄｓ）があるか否かを判断するために、ディジタル的にサンプリングされた波形の１つ以上のセグメントが使用され、波形の振幅シグネチャを形成する。振幅シグネチャは、複数の振幅バンドまたはスロットの各々において、波形のセグメント内の発生回数をカウントすることによって生成される。波形の振幅シグネチャはデータベース内の振幅シグネチャとのファジー比較を実行する。１つ以上の潜在的な整合が発見された場合、より正確な比較が実行される。この技術は、例えば各トラック最初、真ん中、および最後から５秒のサンプルセグメントを取ることによってコンパクトディスク（ＣＤ）で使用し、ＣＤのサンプルセグメントに記録されている１／７５秒フレームの５８８個のサンプルの各々において波形の振幅を検出できる。ＣＤの振幅シグネチャは、ＣＤの全サンプルセグメントに対して各振幅スロット内で信号の発生を蓄積することによって、波形の最低振幅から最高振幅までの約２０００個の振幅バンドまたはスロットから形成されてもよい。振幅シグネチャが使用されて、トラック数と各トラックの長さを示すＣＤについての目次（ＴＯＣ）データに基づいて得られた複数の潜在的整合を識別することができる。

A method and system for finding a match in a database for waveforms
United States Patent Application 20070135157 Kind Code: A1
One or more segments of a digitally sampled waveform are used to form an amplitude signature of the waveform, in order to determine whether a database contains records corresponding to a file containing the waveform. The amplitude signature is generated by counting the number of occurrences, within the segments of the waveform, in each of a plurality of amplitude bands or slots. A fuzzy comparison of the amplitude signature of the waveform with the amplitude signatures in the database is performed. If one or more potential matches are found, a more precise comparison is performed. The technique can be used on compact discs (CDs), for example, by taking five-second sample segments from the beginning, middle, and end of each track and detecting the amplitude of the waveform at each of the 588 samples in each 1/75-second frame recorded in the sample segments of the CD. The amplitude signature of the CD may be formed from approximately 2000 amplitude bands or slots, spanning the lowest to the highest amplitude of the waveform, by accumulating the occurrences of the signal within each amplitude slot over all sample segments of the CD. The amplitude signature can be used to distinguish among multiple potential matches obtained from table of contents (TOC) data for the CD, which indicates the number of tracks and the length of each track.

Description

（関連特許）
本件は、米国特許第５，９８７，５２５号として１９９９年１１月１６日に発行され参照としてここに組み込まれる、米国特許出願番号第０８／８３８，０８２号の分割出願である、１９９９年７月１６日に出願された米国特許出願番号第０９／３５４，１６４号の一部継続出願である。
【０００１】
（発明の背景）
（発明の分野）
本発明は、データベース内でレコード（ｒｅｃｏｒｄｓ）を発見すること、具体的には波形を示すレコードのデータベースに波形の整合を発見することことについてである。
【０００２】
（従来技術の説明）
過去数年に渡って、オンラインサービスは爆発的な成長を経験し、エンタテイメントの１つの主要な形態となった。この新たなエンタテイメントと並行して、音楽レコーディングなど多くの従来の形態が大規模に消費され続けている。
【０００３】
音楽レコーディングにおける従来の体験は、１つの部屋に一緒に集まって少人数のグループで聞くことである。音楽は音響的に部屋を満たすが、関連の映像コンテンツはほとんどなく、音量を設定したりオーディオイコライザを使用したりするなど、実質的にどのトラックを演奏するかや、記録された音楽に単純な変化を与えるという、レコーディングとの限られた対話性があるだけである。この従来の体験はほぼ１００年前の７８ｒ．ｐ．ｍ音楽レコーディングの初期に遡る。
【０００４】
音楽レコーディングの従来の生産はレコーディングの従来の経験を補完する。レコーディングは多数のレコーディングセッションにより行なわれ、慎重なミキシングと編集にかけられ、そして一般大衆に発売される。その時点で固定化された媒体、今日では音楽用ＣＤに記録されるが、その目的は、作詞（曲）者、ミュージシャン、プロデューサ、およびレコーディングエンジニアによって設計された最終的な音響体験を可能な限り忠実に記録することにある。
【０００５】
ミュージックビデオは、このような媒体のトラック上に映像コンテンツと対応付けることで、音楽レコーディングの従来の体験を補完してきた。しかしながら、とりわけミュージックビデオは、それが併せ持つユーザコントロールの欠如についてのすべての問題が解決されないまま放送されてきており、それらは、対話性や消費者による参加に寄与してこなかった。
【０００６】
オンラインサービスは音楽レコーディングと関連する経験を豊富にする機会を提供する。本発明は、この問題を解決するコンピュータプログラム、システム、およびプロトコルに関する。
【０００７】
（発明の概要）
従って、本発明の１つの目的は、インターネットなどのオンラインサービスによって、プロデューサが音楽レコーディングを補完するエンタテイメントを配信することを可能にするコンピュータプログラム、システム、およびプロトコルを提供することである。本発明のさらなる目的は、このような補完的なエンタテイメントが消費者に対して有意義に対話的であり、それによって消費者もまたクリエータとしての体験を可能にするコンピュータプログラム、システム、およびプロトコルを提供することである。
【０００８】
本発明のさらなる目的は、オンラインサービスの基準によるさらなる発展に適合するような柔軟性を保持しつつ、既存の環境とプログラムの統合を特にインターネット上で達成するように設計されることによって上記目的を達成することにある。
【０００９】
本発明の１つの態様において、リモートホストコンピュータ上で稼動中のコンピュータプログラムが、ユーザのコンピュータ上でコンパクトディスク（ＣＤ）プレーヤ、ＤＶＤプレーヤなどをコントロールできるソフトウェアが提供される（便宜上、「ＣＤプレーヤ」という用語は、ＤＶＤプレーヤおよび類似の装置にも言及することにする。）。ソフトウェアはリモートホストコンピュータが、ＣＤプレーヤ上で動作を開始し、かつＣＤプレーヤのフロントパネル上のボタンや別のＣＤプレーヤコントロールプログラムなど他のコントロール手段によってユーザが開始した動作に気づくように設計されている。本発明のこの態様は、これらのレコーディングが、オーディオＣＤという現在一般的な形態として固定化されている場合、音楽レコーディングへの補完的なエンタテイメントの提供に対するビルディングブロックである。
【００１０】
本発明の第２の態様において、双方向コンテンツを含む映像コンテンツは、音楽レコーディングからのコンテンツの配信に同期するように、オンラインサービス上で配信されてもよい。このような映像コンテンツは、例えば、ユーザのコンピュータでオーディオＣＤの演奏に同期してもよい。映像コンテンツは、例えばミュージックビデオのように、音楽レコーディングにテーマ別にリンクされている。
【００１１】
本発明の第３の態様において、一意の識別子を、多数のトラックからなる音楽レコーディングに割り当てる方法が提供される。一意の識別子は、映像コンテンツを配信するソフトウェアによって、オーディオＣＤが、映像コンテンツが対応している真に正しいＣＤであるということを確かめられる点において、オーディオＣＤの演奏と連動した映像コンテンツの配信に有用な補足となる。映像コンテンツが、例えばハインリッヒ・イグナッツ・フランツ・ビーバー（Ｈｅｉｎｒｉｃｈ　Ｉｇｎａｚ　Ｆｒａｎｚ　Ｂｉｂｅｒ）のロザリオのソナタ（Ｒｏｓａｒｙ　Ｓｏｎａｔａｓ）を伴奏するように設計されている場合、ユーザのプレーヤＣＤが映画メリーポピンズ（Ｍａｒｙ　Ｐｏｐｐｉｎｓ）のサウンドトラックでなければそれはおそらくうまく機能しないであろう。また一意の識別子によって、プレミアムウェブエリアにアクセスするためのキーとしてＣＤが使用される。さらに、一意の識別子によって、ユーザは、ユーザのマシーンにあるＣＤに対応するウェブのエリアに導かれる。
【００１２】
本発明の第４の態様において、一般的に「チャットルーム」と称される非常に人気のあるオンラインサービスは、チャットルームのすべての人が聞いている音楽レコーディングへのリンクによって拡張されてもよい。オンラインサービスにおいて今日存在しているようなチャットルームの体験は、識別可能な環境がある、従来からの直接、面と向き合う社交的な出会いと比較すると非現実的な性質を有している。今日のチャットユーザの唯一の共通の体験は、コンピュータスクリーン上を飛び回るようなチャットの言葉であり、おそらくはスクリーンの小さな空間を占めるユーザのアイコン（「権化」）または他の映像コンテンツである。チャットルームと連動した音楽レコーディングの使用は、従来の社交的な出会いの共有された雰囲気の度合いを回復する可能性を開く。さらに、音楽レコーディングは、チャット検索者が、特定のタイプのレコーディングへの共有された興味によって集まることができる中心地点を提供する。
【００１３】
（好ましい実施形態の説明）
本発明の好ましい実施形態は、ＷＷＷ（ワールド・ワイド・ウェブ（Ｗｏｒｌｄ　Ｗｉｄｅ　Ｗｅｂ））上で動作する。ＷＷＷによって提供されるソフトウェア実施環境は多数の書物、例えば、Ｊｏｈｎ　Ｄｅｃｅｍｂｅｒ　＆　Ｍａｒｋ　Ｇｉｎｓｂｕｒｇによる、ＨＴＭＬ３．２およびＣＧＩ　Ｕｎｌｅａｓｈｅｄ（１９９６年）に説明されている。ＷＷＷは、Ｔ．Ｂｅｒｍｅｒｓ―Ｌｅｅらのハイパーテキスト転送プロトコル（Ｈｙｐｅｒｔｅｘｔ　Ｔｒａｎｓｆｅｒ　Ｐｒｏｔｏｃｏｌ）―ＨＴＴＰ／１．０（Ｉｎｔｅｒｎｅｔ　Ｒｅｑｕｅｓｔ　Ｆｏｒ　Ｃｏｍｍｅｎｔｓ第１９４５号、１９９６年）に説明されている、ＨＴＴＰ（ハイパーテキスト転送プロトコル）と称されるネットワークプロトコルに基づいている。ＨＴＴＰプロトコルは、Ｄｏｕｇｌａｓ　Ｅ．ＣｏｍｅｒによるＴＣＰ／ＩＰによるインターネットワーキング（Ｉｎｔｅｒｎｅｔｗｏｒｋｉｎｇ　ｗｉｔｈ　ＴＣＰ／ＩＰ）（第３版、１９９５年）に説明されている、今日では一般的にＴＣＰ／ＩＰである、一般接続指向プロトコル上で稼動されなければならない。しかしながら、ここに説明される発明は、特定の種類のネットワークのソフトウェアまたはハードウェア上で稼動するＨＴＴＰに限定されない。本発明の原理は、ＨＴＴＰに匹敵または取って代わるようになるかもしれない、遠隔情報にアクセスするための他のプロトコルにも適用される。
【００１４】
図１に示されるように、ウェブのユーザは自分のコンピュータの前に座り、ブラウザと称されるコンピュータプログラムを稼動する。ブラウザはＨＴＴＰ要求を、サーバと称される他のコンピュータに送出する。この要求において、サーバ上で使用可能なリソースと称される特定のデータはユニフォーム・リソース・ロケータ（ＵＲＬ）によって参照される。また、Ｂｅｒｍｅｒｓ―Ｌｅｅらによるｓｕｐｒａ．Ａ　ＵＲＬに定義される特定のフォーマットの文字列は、サーバの識別とサーバ内の特定データの識別との両方を含んでいる。この要求に反応して、サーバはユーザのブラウザに返答し、ブラウザは、一般的に数種類のコンテンツをユーザに表示することによって、これらの応答に反応する。
【００１５】
応答のコンテンツ部分は、ハイパーテキスト・マークアップ・ランゲージ（Ｈｙｐｅｒｔｅｘｔ　Ｍａｒｋｕｐ　Ｌａｎｇｕａｇｅ）（ＨＴＭＬ）で表される「ウェブページ」であってもよい。その言語によって、（アンカーおよびハイパーリンクとしても知られている）ビットマップフォーマットイメージおよびリンクを散りばめたテキストからなるコンテンツを表すことができる。リンクは、ブラウザがユーザの指示でさらに要求を送ってもよい、さらなるＵＲＬリンクである。
【００１６】
応答はまた、ブラウザ、例えば動画をもたらすコマンドによって解釈される、より複雑なコマンドを含んでいてもよい。ＨＴＭＬ自体は複雑なコマンドを定義しないが、むしろそれらのコマンドは、個別に定義されたスクリプト言語に属するものと考えられる。このスクリプト言語のうち最も一般的な２つのスクリプト言語としてＪａｖａ（登録商標）ＳｃｒｉｐｔとＶＢＳｃｒｉｐｔがある。
【００１７】
スクリプト言語で記述されたコードによってブラウザの機能を拡張することに加えて、コンパイルされたコードによってブラウザの機能を拡張することも可能である。このようなコンパイルされたコードは「プラグイン」と称される。プラグインを記述するための正確なプロトコルは特定のブラウザによる。マイクロソフト（Ｍｉｃｒｏｓｏｆｔ）のブラウザ用のプラグインは、アクティブ・エックス・コントロール（Ａｃｔｉｖｅ　Ｘ　ｃｏｎｔｒｏｌｓ）の名称で呼ばれる。
【００１８】
プラグインは非常に複雑でありうる。本発明に従って好都合に使用されてもよいプラグインはマクロメディア（Ｍａｃｒｏｍｅｄｉａ）によるショックウェーブ（Ｓｈｏｃｋｗａｖｅ）である。それによって、サーバの応答の一部である動画はユーザにダウンロードされプレイされる。ショックウェーブ（Ｓｈｏｃｋｗａｖｅ）はＬｉｎｇｏと称されるそれ自身のスクリプト言語を定義する。Ｌｉｎｇｏスクリプトはショックウェーブ（Ｓｈｏｃｋｗａｖｅ）プラグインがプレイできるダウンロード可能な動画内に含まれる。ショックウェーブ（Ｓｈｏｃｋｗａｖｅ）動画の一般的なフォーマットは、タイムライン内の特定のフレームで現れ、動き、そして消える多数の映像オブジェクトと共に一連のフレームからなるタイムラインである。ショックウェーブ（Ｓｈｏｃｋｗａｖｅ）動画内でより複雑な効果を達成するために、Ｌｉｎｇｏスクリプトは所定のビジュアルオブジェクトに加えて使用されてもよい。
【００１９】
本発明の好ましい実施形態は、音楽レコーディングの演奏を詳細に命令する能力をスクリプト言語に提供する、コマンドプラグインと称されるプラグインを使用する。コマンドプラグインは最低限以下の基本的機能を提供するべきである：
（１）演奏を開始および停止する
（２）トラック内の現在の曲と位置を把握する
（３）トラック内の曲と位置を検索する
（４）音量を設定する
（５）ＣＤに関する情報を得る（例えば、トラック数、その長さ、トラック間のポーズ）
（６）ＣＤドライブの性能に関する情報を得る
他の機能も提供されてもよく、基本オペレーティングシステムサービスが提供できるものによってのみ制限される。
【００２０】
コマンドプラグインは好ましくは、Ｃ^＋＋などの従来のプログラミング言語で記述される。プラグインは、マイクロソフト（Ｍｉｃｒｏｓｏｆｔ）のアクティブ・エックス・オブジェクト（Ａｃｔｉｖｅ　Ｘ　ｏｂｊｅｃｔ）に必要とされるような、プラグインの既存の規格に準拠するべきである。情報を得、コマンドプラグインがスクリプト言語に使用可能にする機能を実行するために、コマンドプラグインは、演奏中の音楽レコーディングに関するコントロールおよび情報を提供する機能に依存する。これらの機能はレコーディングの正確なソースに依存する。現在の好ましい実施形態のように、レコーディングがコンピュータのＣＤプレーヤのオーディオＣＤで演奏されており、かつブラウザがマイクロソフト（Ｍｉｃｒｏｓｏｆｔ）のウィンドウズ（登録商標）３．１またはウィンドウズ（登録商標）９５で稼動中の場合、これらの機能はＷｉｎ３２アプリケーションプログラミングインタフェースの一部を形成するＭＣＩ機能である。これらの機能は、例えばマイクロソフト（Ｍｉｃｒｏｓｏｆｔ）のＷｉｎ３２のプログラマーズリファレンスに文書化されている。異なる機能は、ストリーミングオーディオレシーバによって提供されてもよく、例えば、ネットワーク接続によってユーザのコンピュータに入ってくる、ＭＰＥＧなどの適切な音声符号化フォーマットで符号化されたオーディオを取り入れるレシーバなどが考えられる。
【００２１】
コマンドプラグインの実装について言及する際の重要なポイントは、例えば検索などの実行する動作が約１秒かかるということである。コマンドプラグインが、その１秒の間にマシーンのコントロールを保つのは望ましいことではなく、長い動作の場合はいつもプラグインがブラウザに対するマシーンのコントロールを放棄し、共通のスクリプト言語で使用されている非同期イベントハンドル性能を介して動作の結果を報告することが重要である。
【００２２】
コマンドプラグインが提供する機能、プラグインの記述方法（例えばアクティブ・エックス・オブジェクト（Ａｃｒｉｖｅ　Ｘ　ｏｂｊｅｃｔ）の記述方法）に関する一般知識、および音楽レコーディングの演奏をコントロールするための関連のアプリケーションプログラミングインタフェース（例えばＷｉｎ３２のＭＣＩ）の上記概要を考えると、当業者は容易にかつ必要以上の実験なしで、実際に動作するコマンドプラグインを開発することができる。このため、コマンドプラグインがどのように実行されるかについてのさらなる詳細はここでは説明しない。
【００２３】
スクリプト言語によって上述した機能を提供するコマンドプラグインの存在は、音楽レコーディングを補足するエンタテイメントが構築されてもよい基礎である。とりわけ、この基礎の上にたつ、オーディオＣＤ上で生じるイベントを有するスクリプト言語によって映像コンテンツの表示を同期させる方法を考案することができる。
【００２４】
本発明の好ましい実施形態において、映像コンテンツのオーディオＣＤへの同期は以下のように進行する。映像コンテンツは、サーバからダウンロードされて、ショックウェーブ（Ｓｈｏｃｋｗａｖｅ）プラグインによってユーザに提供され、ショックウェーブ（Ｓｈｏｃｋｗａｖｅ）動画によって表示される。このダウンロードは動画が表示される前に行われてもよく、あるいはまた、ネットワークへのユーザの接続が適切な速度でダウンロードをサポートするのに十分速い場合、動画が表示されるのと同時に行われてもよい。ダウンロードはショックウェーブ（Ｓｈｏｃｋｗａｖｅ）プラグイン自身によって提供される機能である。
【００２５】
ショックウェーブ（Ｓｈｏｃｋｗａｖｅ）動画がプレイされると、Ｌｉｎｇｏスクリプトは、フレームが表示を終了するたびに実行する。Ｌｉｎｇｏスクリプトは、動画のフレームと、トラック数と時間とによって識別される音楽レコーディングのセグメントとの間に存在すべき関係についての記述を含んでいる。Ｌｉｎｇｏスクリプトは上記のコマンドプラグインによって、オーディオＣＤが演奏するトラックおよび時間を判断する。そして、動画のいずれのフレームがオーディオＣＤの部分に対応するかを判断するために記述を参照する。現在のフレームがこれらのフレームうちの１つでない場合、Ｌｉｎｇｏスクリプトは動画のタイムラインをリセットし、動画はオーディオＣＤの現在の位置に対応するフレームで演奏を開始する。これによって、例えばネットワークからのダウンロードが遅れたり、ユーザのコンピュータが全速で動画をプレイするサイクルを欠いていたり、またはユーザがＣＤを速くダウンロードしたりするために、まだＣＤの現在の位置が遅れている場合、映像コンテンツを使用することができる。
【００２６】
（図２に示される）この同期アルゴリズムの変形形態において、動画のフレームは隣接するフレームのグループに配置される。このようなフレームの各グループとオーディオレコーディングの特定のセグメント間の対応が確立される（図２のボックス２００）。動画の各フレームの終わりで、オーディオ演奏位置が判断される（ボックス２１０）。オーディオ演奏位置が次の連続するフレームが属するフレームのグループに対応するレコーディングのセグメント内にあるか否かを判断するためのテストが実行される（ボックス２１５）。オーディオ演奏位置がそのセグメント内にある場合、動画の再生はその次のフレームに続く（ボックス２３０）。オーディオ演奏位置がそのセグメント内にない場合、動画の再生はオーディオがある位置に対応するフレームに進む（ボックス２２０および２２５）。
【００２７】
本発明のさらなる態様は、コマンドプラグインを利用することによって、大容量記憶装置に記憶されてもよいレコーディングに対する一意の識別子を確立する技術を提供する能力である。大容量記憶装置以外にも、集積回路、磁気媒体（例えばハードディスク）または他の媒体、あるいはオーディオＣＤなどのリムーバブル媒体、携帯型フラッシュメモリやメモリスティック^ＴＭなどの集積回路メモリ、コンピュータのＣＤ−ＲＯＭドライブによってアクセスされるもの、例えば、ＭＰ３プレーヤ／レコーダまたは媒体にアクセス可能な他の装置でもよい。一意の識別子は目次（Ｔａｂｌｅ　ｏｆ　Ｃｏｎｔｅｎｔｓ（ＴＯＣ））データまたはレコーディング自体のコンテンツからの（フレームで測定された、すなわち１秒の７５分の１番目の）トラックの数および長さに基づいていてもよい。識別子は単に、ファジー比較アルゴリズムで、また１つ以上の可能な整合が見つかる場合はより正確な整合のために使用される、連結したトラック長である。
【００２８】
以下は本発明に使用されるファジー比較アルゴリズムの一例である。比較される２つのオーディオＣＤの各々に対して、記録媒体の全トラックの長さをミリ秒単位で決定する。次いで、各トラック長を８ビットずつ右にシフトする。実際には、２^８＝２５６で切り捨て除算を実行する。次いで、進行するにつれて２つの数、すなわち、整合した合計数と整合エラーの数とを累積しながら、双方の記録媒体についてトラックごとに実行する。これらの数値は両方とも比較開始の時点でゼロに初期化される。各トラックについて、比較される第１のＣＤのシフトされたトラック長によって整合した合計数をインクリメントし、２つのＣＤにおけるシフトされたトラック長の差分の絶対値によって整合エラーをインクリメントする。ＣＤの一方が他方よりもトラック数が少ない場合、少ないトラック数を有するＣＤの最終トラックに行き着くと、整合した合計数と整合エラーの数をシフトされた残りのトラック長によってインクリメントしながら、他方のＣＤのトラックで演算を続ける。トラックを介して実行するこれらのステップに続いて、アルゴリズムは整合エラーの数を整合数によって除算し、その結果の指数を１から減算し、その差分を、２つのＣＤがどの程度整合しているかを示すパーセンテージに変換する。
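The fuzzy comparison of track lengths described in this paragraph (shift each millisecond track length right by 8 bits, accumulate a match total and a match error, then convert 1 minus their quotient to a percentage) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and returning 0.0 when the match total is zero is an assumption, since the text does not cover that case.

```python
from typing import Sequence


def shifted_length(track_ms: int) -> int:
    """Shift a track length in milliseconds right by 8 bits (floor division by 256)."""
    return track_ms >> 8


def toc_match_percent(first_cd_ms: Sequence[int], second_cd_ms: Sequence[int]) -> float:
    """Return how well two CDs match, as a percentage, from their track lengths."""
    match_total = 0  # both accumulators start at zero
    match_error = 0

    # Tracks present on both CDs: add the first CD's shifted length to the
    # total and the absolute difference of the shifted lengths to the error.
    for a_ms, b_ms in zip(first_cd_ms, second_cd_ms):
        a, b = shifted_length(a_ms), shifted_length(b_ms)
        match_total += a
        match_error += abs(a - b)

    # Tracks present on only one CD: add their shifted length to both numbers.
    common = min(len(first_cd_ms), len(second_cd_ms))
    longer = first_cd_ms if len(first_cd_ms) > len(second_cd_ms) else second_cd_ms
    for extra_ms in longer[common:]:
        extra = shifted_length(extra_ms)
        match_total += extra
        match_error += extra

    if match_total == 0:
        return 0.0  # assumption: the text does not define this case
    return (1 - match_error / match_total) * 100
```

Under these assumptions, two CDs with identical track lengths score 100.0, and a CD compared against a copy with one extra track of equal length scores 50.0. The result is not clipped, so very dissimilar CDs can produce a negative percentage.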
【００２９】
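The use of track lengths to generate an identifier for a recording is best suited to media having multiple tracks, preferably media such as CDs and DVDs that also have a track storing table of contents (TOC) information. In addition, the use of track lengths or TOC data has been found to work best with fuzzy matching, which may result in more than one possible match being found. An alternative or supplement to the TOC data is to use the content on the medium. However, it is desirable to use a relatively small content-based identifier in order to minimize storage space and bandwidth requirements.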
【００３０】
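An embodiment of the present invention uses an amplitude signature, which can provide a content-based identifier generated from short (for example, five-second) sample segments taken from the beginning, middle, and end of each track (if the recording has more than one track). An example of one such sample segment is illustrated as waveform 410 in FIG. 4A. (The term "sample segment" is used to distinguish a segment used to generate an identifier from a segment identified in the TOC, which on a CD is commonly called a "track".) According to the present invention, a plurality of amplitude bands or slots is defined, and the number of occurrences of the waveform within each slot is counted over all segments. Red Book CD audio is a digital audio file sampled in 16-bit stereo with 44.1K samples per second per channel and 75 frames of data per second. Accordingly, there are at most 220,500 occurrences in one five-second sample segment (75 frames/second × 588 samples/frame × 5 seconds = 220,500 samples in 5 seconds of data). To ensure uniqueness, it is desirable to use about 2000 slots (for example, 2^11 or 2048), but other sizes, numbers, and types of samples and a different number of slots may be used, depending on the characteristics of the waveforms being compared. To simplify the description of the invention, a much coarser example is described with reference to FIGS. 4A and 4B.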
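The sample arithmetic in this paragraph can be checked with a few lines of Python. The constant and function names are illustrative only, and the three-segments-per-track total is derived here from the "beginning, middle, and end" description rather than stated in the text.

```python
SAMPLE_RATE_HZ = 44_100          # Red Book CD audio, samples per second per channel
FRAMES_PER_SECOND = 75           # CD frames per second
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ // FRAMES_PER_SECOND  # 588 samples in each frame
SEGMENT_SECONDS = 5              # length of one sample segment
SLOT_COUNT = 2 ** 11             # 2048 amplitude slots ("about 2000")


def samples_per_segment(seconds: int = SEGMENT_SECONDS) -> int:
    """Maximum number of waveform occurrences in one sample segment."""
    return FRAMES_PER_SECOND * SAMPLES_PER_FRAME * seconds


def max_occurrences_per_track(segments_per_track: int = 3) -> int:
    """Derived figure: occurrences over the beginning, middle, and end segments."""
    return segments_per_track * samples_per_segment()


if __name__ == "__main__":
    print(SAMPLES_PER_FRAME)            # 588
    print(samples_per_segment())        # 220500
    print(max_occurrences_per_track())  # 661500
```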
【００３１】
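In the preferred embodiment, the first step is to normalize the waveform so that the first and last slots each contain at least one occurrence of the waveform. Waveform 410 of FIG. 4A is normalized over the seven slots 420 illustrated in FIG. 4A to produce waveform 410b of FIG. 4B, in which the slots 420 are individually labeled as slots 421 through 427. In the simplified example of FIG. 4B, sixteen time samples are taken, one at each vertical line. Thus, there is one sample of waveform 410b in slot 421, three in slot 422, two in slot 423, one in slot 424, two in slot 425, three in slot 426, and four in slot 427. This may be represented by the linear array A1[1,3,2,1,2,3,4].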
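The normalization and slot-counting step can be sketched as a histogram whose range is stretched to the waveform's own minimum and maximum, so that the first and last slots are never empty. This is a sketch under stated assumptions: the function name and the sixteen sample values are invented for illustration (the actual values in FIG. 4B are not given in the text) and were chosen only so that the slot counts reproduce the array A1.

```python
from typing import List, Sequence


def amplitude_signature(samples: Sequence[int], slot_count: int) -> List[int]:
    """Count how many samples fall in each of slot_count equal amplitude slots.

    The slots span the lowest to the highest sample value, which places the
    minimum in the first slot and the maximum in the last slot.
    """
    lowest, highest = min(samples), max(samples)
    span = highest - lowest
    counts = [0] * slot_count
    for value in samples:
        if span == 0:
            index = 0  # assumption: a flat waveform lands entirely in slot 0
        else:
            # Integer scaling; the maximum value is clamped into the last slot.
            index = min((value - lowest) * slot_count // span, slot_count - 1)
        counts[index] += 1
    return counts


# Hypothetical 16 time samples, chosen to reproduce the seven-slot example.
EXAMPLE_SAMPLES = [35, 48, 58, 67, 70, 64, 55, 41, 27, 18, 12, 0, 15, 22, 52, 61]

if __name__ == "__main__":
    print(amplitude_signature(EXAMPLE_SAMPLES, 7))  # [1, 3, 2, 1, 2, 3, 4]
```

For a real CD the same counting would run over every sample segment with about 2048 slots, accumulating into a single array.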
【００３２】
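If array A1 is the identifying signature array representing a selected recording for which a match is sought in the database, a fuzzy match may be performed by calculating the average of the differences between the elements of A1 and the elements of an existing signature array. For example, one of the records in the database may have the signature array A2[2,3,4,1,1,3,3], giving a difference array of [1,0,2,0,1,0,1] and an average difference of 5/7, or 0.714. A "fuzzy match" based on the average difference allows for errors in the waveform and for imperfect location of the starting point when the signature is generated. However, the average difference accepted as a match must be set so as to minimize false positives. Alternatively, the number or length of the sample segments may be increased to reduce false positives, but this increases the time required to read the recording medium and calculate the signature array. For the waveforms tested, an average difference of 10 was found to locate virtually all possible matches while eliminating a large number of false positives, when CD waveforms were used with three five-second sample segments and 2048 slots. Under these conditions, 256 slots were found to generate too many matches between dissimilar waveforms, while 4000 slots were spaced so finely that many close matches were separated. The exact number of slots will vary with the size of the sample segments and the type of waveform being sampled.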
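The average-difference fuzzy match can be worked through with the arrays A1 and A2 from this paragraph. The sketch below is illustrative only: the function names, the dictionary used as a stand-in database, and the record keys are hypothetical, and the threshold passed to the search is a parameter rather than a recommended value for the toy seven-slot arrays.

```python
from typing import Dict, List, Sequence


def average_difference(sig_a: Sequence[int], sig_b: Sequence[int]) -> float:
    """Mean absolute element-wise difference between two signature arrays."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signature arrays must have the same number of slots")
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b)) / len(sig_a)


def fuzzy_candidates(
    query: Sequence[int],
    database: Dict[str, Sequence[int]],
    max_average_difference: float,
) -> List[str]:
    """Return the keys of all records whose average difference is small enough."""
    return [
        key
        for key, signature in database.items()
        if average_difference(query, signature) <= max_average_difference
    ]


A1 = [1, 3, 2, 1, 2, 3, 4]
A2 = [2, 3, 4, 1, 1, 3, 3]

if __name__ == "__main__":
    print([abs(a - b) for a, b in zip(A1, A2)])   # [1, 0, 2, 0, 1, 0, 1]
    print(round(average_difference(A1, A2), 3))   # 0.714
```

With 2048-slot signatures built from three five-second segments, the text reports that a threshold of 10 worked well for the waveforms tested.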
【００３３】
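If one or more possible matches are found, a more precise comparison of the identifying signature array and the existing signature arrays is performed. Either the number of slots that match exactly or the number of slots that are within one occurrence of matching may be used. In the example above, 6 of 7, or 86%, of the elements of arrays A1 and A2 match if an error of one (a tolerance of one) is permitted, while 3 of 7, or 43%, of the elements match exactly. It has been found that a match of 80% or more with a tolerance of one, or of 70% or more with no tolerance, is likely to indicate a possible match. The tolerance may be increased beyond one to allow more leeway in matching waveforms.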
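The more precise comparison counts how many slots agree, either exactly or within a tolerance. A minimal sketch follows, reusing the arrays A1 and A2 from the example. The function names are hypothetical, and combining the two thresholds with "or" in a single helper is one reading of the text, not a stated rule.

```python
from typing import Sequence


def match_fraction(sig_a: Sequence[int], sig_b: Sequence[int], tolerance: int = 0) -> float:
    """Fraction of slots whose counts differ by at most `tolerance`."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signature arrays must have the same number of slots")
    hits = sum(1 for a, b in zip(sig_a, sig_b) if abs(a - b) <= tolerance)
    return hits / len(sig_a)


def is_likely_match(sig_a: Sequence[int], sig_b: Sequence[int]) -> bool:
    """Apply the reported thresholds: 80% at tolerance one, or 70% exact."""
    return (
        match_fraction(sig_a, sig_b, tolerance=1) >= 0.80
        or match_fraction(sig_a, sig_b, tolerance=0) >= 0.70
    )


A1 = [1, 3, 2, 1, 2, 3, 4]
A2 = [2, 3, 4, 1, 1, 3, 3]

if __name__ == "__main__":
    print(round(match_fraction(A1, A2, 0) * 100))  # 43
    print(round(match_fraction(A1, A2, 1) * 100))  # 86
    print(is_likely_match(A1, A2))                 # True
```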
【００３４】
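The unique identifier for a music recording may be used as a database key. A site may maintain a database of information about CDs; for example, information about all CDs released by a particular record company may be maintained on that record company's site. There are various ways for users to navigate this information. For example, a user may use a Web page containing many hyperlinks as a table of contents, or may use a conventional search engine. A third way of searching, made possible by the unique identifier of the present invention, is for there to be a Web page that invites the user to place the CD about which information is sought in the computer's CD drive. Upon detecting the presence of the CD in the drive, a script in the Web page computes the unique identifier corresponding to the CD and sends it to the server. The server then displays information about the CD retrieved from the database on the basis of that unique identifier. This information may include a Web address (URL) related to the audio CD (for example, that of the artist's home page), simple data such as song titles, and complementary entertainment, potentially including photographs (for example, of the band), artwork, animations, and video clips. It is also possible to arrange matters so that, when the user inserts an audio CD into the computer, (i) the browser is started if it is not already running, (ii) the browser computes the unique identifier of the CD and derives a URL from that identifier, and (iii) the browser performs an HTTP GET transaction on that URL.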
【００３５】
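Another application of the unique identifier for a music recording is to use an audio CD as a key for entering a premium area of the Web. There are at present premium areas of the Web to which admission is by subscription. A simple form of admission based on the unique identifier is to require that, before accessing a particular area of the Web, the user place in his or her CD drive a particular CD, a CD released by a particular company, or a CD containing the music of a particular band or artist. This is readily accomplished by a script that uses the functions provided by the command plug-in to compute the unique identifier.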
【００３６】
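A third aspect of the present invention is the connection of chat rooms with music recordings. The goal is to provide all participants in a chat room with the same music at approximately the same time.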
【００３７】
The prevailing network protocol for chat services is Internet Relay Chat (IRC), described in J. Oikarinen & D. Reed, Internet Relay Chat Protocol (Internet Request for Comments No. 1459, 1993). In this protocol, on becoming a client of a chat server, one sends the name of a chat room. The chat server receives messages from all of its clients and relays the messages sent by one client to all other clients connected to the same room as that client. The messages a client sends are typically typed by the user running the client, and the messages a client receives are typically displayed for that user to read.
【００３８】
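In the preferred embodiment of the present invention, the chat client is customized by a plug-in that we call the chat plug-in. The chat client is started by the browser as follows (see FIG. 3). The user connects by means of the browser to a central Web page (box 300) which, upon being downloaded, asks the user to insert a CD into his or her player (box 305). The unique identifier of the CD is computed, using the control plug-in described above under the command of a script in the central Web page, and sent back to the server (box 310). The server then uses the unique identifier to determine whether it has a chat room focused on the CD (box 315). This step may be carried out by looking up the unique identifier in a database using techniques well known in the art. There is an extensive literature on connecting Web pages to databases, e.g., December & Ginsburg, supra, chapter 21. If a chat room focused on the CD exists or can be created, the server responds with the name of that chat room, and the browser starts a chat client on the user's computer as a client of that chat room (box 320).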
【００３９】
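The name of the chat room is set by the server to contain information about the track that the CD is playing on the machines of the other chat room clients and the time at which the track started playing, as well as the volume at which the CD is being played. The chat client plug-in uses that information to direct the control plug-in to set the CD in the user's computer to play so that it is approximately synchronized with the CD playing on the machines of the other chat room clients (box 320).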
【００４０】
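Each user in the chat room can control the CD playing on his or her own machine. A control action results in the chat plug-in sending the chat server a message describing the control action being taken (box 325). For example, such a message may indicate a change in the position of the CD, a change in the volume, or the ejection of the CD in order to replace it with another. The chat plug-ins running on the other users' machines, on seeing a message of this kind, replicate the action (as far as possible) on those machines by using the control plug-in described above (box 330).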
【００４１】
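In a further aspect of the present invention, a chat room focused on a particular music recording may select particular tracks by a voting procedure. A simple voting procedure is for each chat plug-in to act on a change message of the kind described in the preceding paragraph only when it sees two identical consecutive change messages. This means that, in order to change the track being played, two users must change to that track. The number two may be replaced by a higher number.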
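The simple voting procedure (act on a change message only after seeing the same message a set number of times in a row) can be sketched as a small filter that each chat plug-in would consult before replicating an action. The class and method names are hypothetical, messages are modeled as plain strings, and resetting the count after a successful vote is a design assumption that the text does not specify.

```python
class ConsecutiveVoteFilter:
    """Approve a change message only after `required` identical messages in a row."""

    def __init__(self, required: int = 2) -> None:
        if required < 1:
            raise ValueError("required must be at least 1")
        self.required = required
        self._last_message = None
        self._run_length = 0

    def should_apply(self, message: str) -> bool:
        """Record one incoming change message and report whether to act on it."""
        if message == self._last_message:
            self._run_length += 1
        else:
            # A different message breaks the run and starts a new one.
            self._last_message = message
            self._run_length = 1

        if self._run_length >= self.required:
            # Assumption: start counting afresh once the change has been applied.
            self._last_message = None
            self._run_length = 0
            return True
        return False


if __name__ == "__main__":
    votes = ConsecutiveVoteFilter(required=2)
    for msg in ["seek track 3", "seek track 5", "seek track 5"]:
        print(msg, votes.should_apply(msg))
```

In this run only the third message is approved, because it is the first one to repeat its predecessor. Passing a larger `required` value corresponds to replacing the number two with a higher number.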
【００４２】
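In a further aspect of the present invention, the messages delivered to the users of a chat may be driven from a text file rather than typed by hand. This allows a pre-recorded experience to be played back to a group of chat users. Such a technique may be used to create a pre-recorded, narrated tour of an audio CD.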
【００４３】
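An important advantage of the preferred embodiment described above is that it may be used with any chat server software that supports the minimal functionality required by Internet Relay Chat or by a protocol providing a similar minimal chat service. The additional software required resides in the chat client plug-in and in the central Web page, with its connection to a database of CD information.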
【００４４】
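The many features and advantages of the present invention are apparent from the detailed specification, and it is therefore intended that the appended claims cover all such features and advantages of the system that fall within the spirit and scope of the invention. Further, numerous modifications and changes will readily occur to those skilled in the art from the disclosure of the invention. It is not desired to limit the invention to the exact construction and operation illustrated and described; accordingly, suitable modifications and equivalents may be resorted to, as long as they do not depart from the scope and spirit of the invention.
[Brief description of the drawings]
[FIG. 1]
FIG. 1 is a block diagram of the environment in which the preferred embodiment operates.
[FIG. 2]
FIG. 2 is a flowchart of the synchronization code of the present invention.
[FIG. 3]
FIG. 3 is a flowchart of the sequence of operations for connecting to a chat room focused on a music recording.
[FIG. 4]
FIGS. 4A and 4B are explanatory diagrams of waveform analysis according to the present invention.

(Related patents)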
This application is a continuation-in-part of U.S. Patent Application Serial No. 09/354,164, filed July 16, 1999, which is a divisional of U.S. Patent Application Serial No. 08/838,082, issued November 16, 1999 as U.S. Patent No. 5,987,525 and incorporated herein by reference.
[0001]
(Background of the Invention)
(Field of the Invention)
The present invention relates to finding records in a database and, specifically, to finding a match for a waveform in a database of records representing waveforms.
[0002]
(Description of the prior art)
Over the past few years, online services have experienced explosive growth and have become one major form of entertainment. In parallel with this new entertainment, many conventional forms, such as music recordings, continue to be consumed on a large scale.
[0003]
The traditional experience of a music recording is listening in a small group gathered together in a room. The music fills the room acoustically, but there is little associated visual content, and there is only limited interactivity with the recording, consisting essentially of deciding which tracks to play and making simple changes to the recorded sound, such as setting the volume or applying an audio equalizer. This traditional experience dates back to the early days of 78 r.p.m. music recordings, almost 100 years ago.
[0004]
The traditional production of music recordings complements the traditional experience of the recording. A recording is made in a number of recording sessions, subjected to careful mixing and editing, and then released to the general public. At that point it is recorded on a fixed medium, today a music CD, whose purpose is to record as faithfully as possible the final acoustic experience designed by the songwriters, musicians, producers, and recording engineers.
[0005]
Music videos have complemented the traditional experience of music recordings by associating visual content with the tracks of such media. In practice, however, music videos have been broadcast, with all the problems of lack of user control that this implies, and they have not contributed to interactivity or to participation by the consumer.
[0006]
Online services offer the opportunity to enrich the experience associated with music recordings. The present invention relates to computer programs, systems, and protocols that solve this problem.
[0007]
(Summary of the Invention)
Accordingly, one object of the present invention is to provide computer programs, systems, and protocols that allow producers to deliver entertainment that complements music recordings by means of online services such as the Internet. A further object of the present invention is to provide computer programs, systems, and protocols in which such complementary entertainment is meaningfully interactive for the consumer, thereby allowing the consumer also to experience being a creator.
[0008]
A further object of the present invention is to achieve the above objects by being designed to integrate with existing environments and programs, particularly on the Internet, while retaining the flexibility to adapt to further developments in the standards of online services.
[0009]
In one aspect of the invention, software is provided that allows a computer program running on a remote host computer to control a compact disc (CD) player, DVD player, or the like on a user's computer. (For convenience, the term "CD player" will also be used to refer to DVD players and similar devices.) The software is designed so that the remote host computer can initiate actions on the CD player and become aware of actions that the user has initiated by other control means, such as the buttons on the front panel of the CD player or another CD player control program. This aspect of the invention is a building block for providing complementary entertainment for music recordings when those recordings are fixed in the currently prevailing form, the audio CD.
[0010]
In a second aspect of the present invention, video content, including interactive content, may be delivered over an online service so as to be synchronized with the delivery of content from a music recording. Such video content may, for example, be synchronized with the playing of an audio CD on the user's computer. The video content is thematically linked to the music recording, as in a music video, for example.
[0011]
In a third aspect of the present invention, there is provided a method of assigning a unique identifier to a music recording consisting of a number of tracks. The unique identifier is a useful complement to the delivery of video content in conjunction with the playing of an audio CD, in that it allows the software that delivers the video content to verify that the audio CD is indeed the correct CD to which the video content corresponds. If the video content is designed, for example, to accompany the Rosary Sonatas of Heinrich Ignaz Franz Biber, it will presumably not work well if the CD in the user's player is the soundtrack of the film Mary Poppins. The unique identifier also allows a CD to be used as a key for access to a premium Web area. Further, the unique identifier allows the user to be directed to an area of the Web corresponding to the CD in the user's machine.
[0012]
In a fourth aspect of the present invention, the very popular online service commonly referred to as a "chat room" may be extended by a link to a music recording to which everyone in the chat room is listening. The chat room experience as it exists today in online services has a somewhat unreal quality compared with traditional face-to-face social encounters, which take place in an identifiable setting. The only common experience of today's chat users is the words of the chat as they fly by on the computer screen, and perhaps the users' icons ("avatars") or other visual content occupying a small space on the screen. The use of music recordings in conjunction with chat rooms opens up the possibility of restoring some of the shared atmosphere of traditional social encounters. In addition, music recordings provide a focal point around which chat seekers can gather through a shared interest in a particular type of recording.
[0013]
(Description of a preferred embodiment)
The preferred embodiment of the present invention operates on the World Wide Web (WWW). The software implementation environment provided by the WWW is described in numerous books, for example John December & Mark Ginsburg, HTML 3.2 & CGI Unleashed (1996). The WWW is based on a network protocol called HTTP (Hypertext Transfer Protocol), described in T. Berners-Lee et al., Hypertext Transfer Protocol -- HTTP/1.0 (Internet Request for Comments No. 1945, 1996). HTTP must run over a general connection-oriented protocol, which today is generally TCP/IP, described in Douglas E. Comer, Internetworking with TCP/IP (3rd ed. 1995). However, the invention described herein is not limited to HTTP running on any particular type of network software or hardware. The principles of the present invention also apply to other protocols for accessing remote information that may come to rival or replace HTTP.
[0014]
As shown in FIG. 1, a Web user sits in front of his or her computer and runs a computer program called a browser. The browser sends HTTP requests to other computers, called servers. In a request, particular data available on the server, referred to as a resource, is referenced by a Uniform Resource Locator (URL). A URL, a string of characters in a specific format defined in Berners-Lee et al., supra, contains both an identification of the server and an identification of the particular data within the server. In reply to the request, the server sends a response to the user's browser, and the browser reacts to the response, generally by displaying some kind of content to the user.
[0015]
The content portion of the response may be a "Web page" expressed in Hypertext Markup Language (HTML). That language can express content consisting of text interspersed with bitmap-format images and links (also known as anchors and hyperlinks). The links are further URLs to which the browser may, at the user's direction, send further requests.
[0016]
The response may also include more complex commands to be interpreted by the browser, for example commands that produce an animation. HTML itself does not define complex commands; rather, those commands are considered to belong to separately defined scripting languages. The two most common scripting languages are JavaScript and VBScript.
[0017]
In addition to extending the functionality of the browser with code written in a scripting language, it is also possible to extend the functionality of the browser with compiled code. Such compiled code is called a "plug-in". The exact protocol for writing a plug-in depends on the particular browser. Plug-ins for Microsoft's browser are called ActiveX controls.
[0018]
Plug-ins can be very complex. A plug-in that may be advantageously used in accordance with the present invention is Shockwave by Macromedia. It allows a movie that is part of the server's response to be downloaded to the user and played. Shockwave defines its own scripting language, called Lingo. Lingo scripts are contained in the downloadable movies that the Shockwave plug-in can play. The general format of a Shockwave movie is a timeline consisting of a series of frames, with a number of visual objects that appear, move, and disappear at particular frames in the timeline. To achieve more complex effects in a Shockwave movie, Lingo scripts may be used in addition to the predefined visual objects.
[0019]
The preferred embodiment of the present invention uses a plug-in, called a command plug-in, that provides the scripting language with the ability to command the performance of a music recording in detail. Command plugins should provide at least the following basic functions:
(1) Start and stop playing
(2) Get the current track and the position within the track
(3) Seek to a track and a position within it
(4) Set the volume
(5) Obtain information about the CD (for example, the number of tracks, their lengths, and the pauses between tracks)
(6) Obtain information about the capabilities of the CD drive
Other features may also be provided and are limited only by what the basic operating system services can provide.
[0020]
The command plug-in is preferably written in a conventional programming language such as C++. The plug-in should conform to existing standards for plug-ins, such as those required of Microsoft ActiveX objects. To obtain the information and carry out the functions that it makes available to the scripting language, the command plug-in relies on functions that provide control of, and information about, the music recording being played. These functions depend on the exact source of the recording. When, as in the presently preferred embodiment, the recording is played from an audio CD in the computer's CD player and the browser is running under Microsoft Windows® 3.1 or Windows® 95, these functions are the MCI functions that form part of the Win32 application programming interface. These functions are documented, for example, in Microsoft's Win32 Programmer's Reference. Different functions may be provided by a streaming audio receiver, for example a receiver that takes in audio encoded in a suitable audio encoding format, such as MPEG, arriving at the user's computer over a network connection.
[0021]
An important point about the implementation of the command plug-in is that the operations it performs, such as seeks, can take about one second. It is undesirable for the command plug-in to retain control of the machine for that second. It is therefore important that, for long operations, the plug-in relinquish control of the machine to the browser and report the result of the operation through the asynchronous event-handling capability found in common scripting languages.
[0022]
Given the above summary of the functions that the command plug-in provides, general knowledge of how to write plug-ins (e.g., how to write ActiveX objects), and the relevant application programming interfaces for controlling the playing of music recordings (e.g., Win32 MCI), a person skilled in the art can develop a working command plug-in readily and without undue experimentation. For this reason, further details of how the command plug-in is implemented are not described here.
[0023]
The presence of a command plug-in that provides the above-described functionality through a scripting language is the basis on which entertainment that complements music recordings may be built. Above all, on this basis, it is possible to devise a method of synchronizing the display of the video content by means of a scripting language with events occurring on an audio CD.
[0024]
In a preferred embodiment of the invention, the synchronization of the video content to the audio CD proceeds as follows. The video content is downloaded from the server and displayed to the user as a Shockwave movie by means of the Shockwave plug-in. The download may take place before the movie is displayed or, if the user's network connection is fast enough to support downloading at an adequate rate, while the movie is being displayed. Downloading is a function provided by the Shockwave plug-in itself.
[0025]
When a Shockwave movie is played, a Lingo script is executed each time a frame finishes displaying. The Lingo script contains a description of the relationship that should exist between the frames of the movie and the segments of the music recording, which are identified by track number and time. The Lingo script uses the command plug-in described above to determine the track and time at which the audio CD is playing. It then refers to the description to determine which frames of the movie correspond to that portion of the audio CD. If the current frame is not one of those frames, the Lingo script resets the timeline of the movie so that the movie begins playing at the frame corresponding to the current position of the audio CD. This allows the video content to catch up if it has fallen behind the current position of the CD, for example because the download from the network is slow, because the user's computer lacks the cycles to play the movie at full speed, or because the user has fast-forwarded the CD.
[0026]
In a variation of this synchronization algorithm (shown in FIG. 2), the frames of the movie are arranged in groups of adjacent frames. A correspondence is established between each such group of frames and a particular segment of the audio recording (box 200 in FIG. 2). At the end of each frame of the movie, the audio playing position is determined (box 210). A test is performed to determine whether the audio playing position lies within the segment of the recording that corresponds to the group of frames to which the next consecutive frame belongs (box 215). If the audio playing position is within that segment, playback of the movie continues with the next frame (box 230). If the audio playing position is not within that segment, playback of the movie proceeds to the frame corresponding to the position at which the audio is playing (boxes 220 and 225).
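The per-frame decision of FIG. 2 can be sketched as a function that is called at the end of every frame. This is an illustrative sketch, not the Lingo script itself: the type and function names are hypothetical, audio positions are modeled as a track number plus seconds into the track, and jumping to the first frame of the matching group (and simply advancing when no group matches) are assumptions the text leaves open.

```python
from typing import NamedTuple, Sequence


class FrameGroup(NamedTuple):
    """A run of adjacent movie frames tied to one segment of the recording (box 200)."""
    first_frame: int
    last_frame: int
    track: int
    start_s: float
    end_s: float

    def holds_frame(self, frame: int) -> bool:
        return self.first_frame <= frame <= self.last_frame

    def holds_audio(self, track: int, seconds: float) -> bool:
        return track == self.track and self.start_s <= seconds < self.end_s


def next_frame(current_frame: int, track: int, seconds: float,
               groups: Sequence[FrameGroup]) -> int:
    """Pick the frame to show next, given the audio position read in box 210."""
    candidate = current_frame + 1
    # Box 215: is the audio inside the segment for the next frame's group?
    for group in groups:
        if group.holds_frame(candidate) and group.holds_audio(track, seconds):
            return candidate  # box 230: continue with the next frame
    # Boxes 220 and 225: jump to the group that matches the audio position.
    for group in groups:
        if group.holds_audio(track, seconds):
            return group.first_frame  # assumption: resume at the group's start
    return candidate  # assumption: no matching group, so just keep playing


# Hypothetical mapping: frames 0-99 cover 0-10 s of track 1, frames 100-199 cover 10-20 s.
GROUPS = [FrameGroup(0, 99, 1, 0.0, 10.0), FrameGroup(100, 199, 1, 10.0, 20.0)]

if __name__ == "__main__":
    print(next_frame(41, 1, 4.2, GROUPS))   # 42: still in step with the audio
    print(next_frame(41, 1, 12.0, GROUPS))  # 100: the movie fell behind, so it skips ahead
```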
[0027]
A further aspect of the present invention is the ability, by utilizing the command plug-in, to provide a technique for establishing a unique identifier for a recording. The recording may be stored on a mass storage device, an integrated circuit, magnetic media (e.g., a hard disk) or other media, or on removable media, such as an audio CD accessed by a computer's CD-ROM drive, or integrated circuit memory, such as portable flash memory or a Memory Stick™, used in an MP3 player/recorder or other device capable of accessing the medium. The unique identifier may be based on the number and lengths of the tracks (measured in frames, i.e., 1/75ths of a second) from the table of contents (TOC) data, or on the content of the recording itself. The identifier may simply be the concatenated track lengths, used by the fuzzy comparison algorithm and, if one or more possible matches are found, for more accurate matching.
[0028]
The following is an example of a fuzzy comparison algorithm used in the present invention. For each of the two audio CDs to be compared, the lengths of all tracks on the recording medium are determined in milliseconds. Each track length is then shifted right by 8 bits, in effect performing a truncating division by 2⁸ = 256. The algorithm then proceeds track by track through both recordings, accumulating two numbers as it goes: a match total and a match error. Both numbers are initialized to zero at the start of the comparison. For each track, the match total is incremented by the shifted track length on the first CD being compared, and the match error is incremented by the absolute value of the difference between the shifted track lengths on the two CDs. If one CD has fewer tracks than the other, then once the last track of the CD with fewer tracks has been reached, the calculation continues over the remaining tracks of the other CD, incrementing both the match total and the match error by the shifted length of each remaining track. After stepping through the tracks in this way, the algorithm divides the match error by the match total, subtracts the resulting quotient from 1, and converts the difference to a percentage indicating how well the two CDs match.
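The track-length comparison described above can be sketched as follows. This is a minimal Python sketch, not code from the specification. The function name `fuzzy_toc_match` is hypothetical, and returning 0.0 when the match total is zero is an assumption, since the text does not address that case.

```python
from typing import Sequence


def fuzzy_toc_match(first_ms: Sequence[int], second_ms: Sequence[int]) -> float:
    """Return a match percentage for two CDs given track lengths in ms."""
    # Shift each length right by 8 bits (truncating division by 256).
    first = [length >> 8 for length in first_ms]
    second = [length >> 8 for length in second_ms]

    match_total = 0
    match_error = 0
    shared = min(len(first), len(second))

    # Tracks present on both CDs.
    for a, b in zip(first, second):
        match_total += a              # shifted length on the first CD
        match_error += abs(a - b)     # absolute shifted-length difference

    # Remaining tracks on whichever CD has more tracks count fully
    # toward both the match total and the match error.
    longer = first if len(first) > len(second) else second
    for remaining in longer[shared:]:
        match_total += remaining
        match_error += remaining

    if match_total == 0:
        return 0.0  # assumption: nothing to compare means no match
    return (1 - match_error / match_total) * 100.0
```

For example, two CDs with identical track lengths score 100, while a CD with one 256,000 ms track compared against a CD with two such tracks scores 50, because the extra track is added to both the match total and the match error.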
[0029]
The use of track lengths to generate an identifier for a recording is best suited to media with multiple tracks, preferably media such as CDs and DVDs that have a table of contents, i.e., a track for storing TOC information. Further, track length or TOC data has been found to work best for fuzzy matching, which may find one or more possible matches. An alternative or supplement to TOC data is to use the content on the media. However, it is desirable to use relatively small content-based identifiers to minimize storage space and bandwidth requirements.
[0030]
Embodiments of the present invention use an amplitude signature, a content-based identifier generated from, for example, 5-second sample segments taken from the beginning, middle, and end of each track (if the recording has more than one track). An example of such a sample segment is illustrated by waveform 410 in FIG. 4A. (The term "sample segment" is used to distinguish the segments used to generate an identifier from the identified segments, i.e., the segments identified in the TOC, commonly referred to as "tracks" on a CD.) According to the present invention, a plurality of amplitude bands or intervals is defined, and the number of occurrences of the waveform within each interval is counted over all of the sample segments. Red Book CD audio is digital audio sampled in 16-bit stereo at 44.1K samples per second per channel, with 75 frames of data per second. A single 5-second sample segment therefore contains at most 220,500 occurrences (75 frames/second × 588 samples/frame × 5 seconds = 220,500 samples in 5 seconds of data). To help ensure uniqueness, about 2000 slots (for example, 2¹¹, or 2048) are used for this type of data, although other sizes, numbers, and types of samples and a different number of slots may be used depending on the characteristics of the waveforms being compared. To simplify the description of the present invention, a simplified example is shown in FIGS. 4A and 4B.
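The sample-count arithmetic above, together with one possible placement of the three sample segments within a track, can be checked with a short Python sketch. The helper names are hypothetical. Centering the middle segment and measuring positions in 1/75-second frames are assumptions made for illustration; the text says only that segments are taken from the beginning, middle, and end of each track.

```python
from typing import List

FRAMES_PER_SECOND = 75   # CD frames per second
SAMPLES_PER_FRAME = 588  # 44,100 samples per second / 75 frames per second


def samples_in_segment(seconds: int = 5) -> int:
    """Number of samples per channel in one sample segment."""
    return FRAMES_PER_SECOND * SAMPLES_PER_FRAME * seconds


def segment_start_frames(track_frames: int, seconds: int = 5) -> List[int]:
    """Start frames of the beginning, middle, and end sample segments."""
    segment_frames = seconds * FRAMES_PER_SECOND
    if track_frames < segment_frames:
        raise ValueError("track is shorter than one sample segment")
    middle = (track_frames - segment_frames) // 2  # assumption: centered
    end = track_frames - segment_frames
    return [0, middle, end]
```

For a one-minute track (4,500 frames) this gives start frames 0, 2062, and 4125. For tracks shorter than 15 seconds the three segments overlap, a case the text does not discuss.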
[0031]
In a preferred embodiment, the first step is to normalize the waveform so that the first and last slots each contain at least one occurrence of the waveform. The waveform 410 of FIG. 4A is normalized over the seven slots 420 illustrated in FIG. 4A, producing the waveform 410b of FIG. 4B, in which the slots 420 are shown individually as slots 421-427. In the simplified example of FIG. 4B, 16 time samples are taken, one at each vertical line. There is one sample of the waveform 410b in slot 421, three in slot 422, two in slot 423, one in slot 424, two in slot 425, three in slot 426, and four in slot 427. This is represented by the linear array A1 [1, 3, 2, 1, 2, 3, 4].
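A minimal Python sketch of this counting step follows. It assumes equal-width slots spanning the lowest to the highest sample value, which is one way to guarantee that the first and last slots are occupied. The function name `amplitude_signature` and the sample values in the usage note are hypothetical and are not read from FIG. 4B.

```python
from typing import List, Sequence


def amplitude_signature(samples: Sequence[int], n_slots: int) -> List[int]:
    """Count how many samples fall into each of n_slots amplitude slots."""
    if not samples or n_slots < 1:
        raise ValueError("need at least one sample and one slot")
    lowest, highest = min(samples), max(samples)
    span = highest - lowest
    signature = [0] * n_slots
    for value in samples:
        if span == 0:
            index = 0  # assumption: a flat waveform falls into slot 0
        else:
            # Normalize so the lowest value maps to the first slot and
            # the highest value is clamped into the last slot.
            index = min(int((value - lowest) * n_slots // span), n_slots - 1)
        signature[index] += 1
    return signature
```

With the 16 made-up sample values `[0, 10, 12, 15, 20, 25, 30, 40, 45, 50, 55, 59, 60, 65, 69, 70]` and seven slots, the function returns `[1, 3, 2, 1, 2, 3, 4]`, the same counts as array A1. For a multi-segment signature, the samples of all sample segments would be counted into the same array.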
[0032]
If the array A1 is the identification signature array representing a selected recording for which a match in the database is sought, the fuzzy match may be performed by computing the average of the differences between the elements of A1 and the elements of each existing signature array. For example, one of the records in the database might have the signature array A2 [2, 3, 4, 1, 1, 3, 3], giving a difference array of [1, 0, 2, 0, 1, 0, 1] and an average difference of 5/7, or 0.714. "Fuzzy matching" based on average differences tolerates errors in the waveform and imprecision in locating the starting point used for signature generation. However, the average difference accepted as a match must be chosen so as to minimize false positives. Alternatively, the number or length of the sample segments may be increased to reduce false positives, but this increases the time taken to read the recording medium and calculate the signature array. For the waveforms tested, using CD waveforms and three 5-second sample segments with 2048 slots, an average difference of 10 was found to eliminate many false positives while still finding virtually all possible matches. Under these conditions, 256 intervals (slots) were found to produce too many matches between dissimilar waveforms, while 4000 intervals (slots) spread the occurrences so far apart that there were too many near misses. The exact number of slots depends on the size of the sample segments and the type of waveform being sampled.
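The average-difference computation in the A1/A2 example can be sketched in Python as follows. The names `average_difference` and `fuzzy_candidates` are hypothetical. Treating an average difference at or below the threshold as a possible match is an assumption drawn from the worked example, in which a smaller difference indicates a closer match.

```python
from typing import Dict, List, Sequence


def average_difference(a: Sequence[int], b: Sequence[int]) -> float:
    """Mean absolute element-wise difference between two signature arrays."""
    if len(a) != len(b) or not a:
        raise ValueError("signature arrays must be non-empty and equal length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def fuzzy_candidates(query: Sequence[int],
                     database: Dict[str, Sequence[int]],
                     threshold: float) -> List[str]:
    """Keys of stored signatures within the threshold of the query."""
    # Assumption: "within" means average difference <= threshold.
    return [key for key, signature in database.items()
            if average_difference(query, signature) <= threshold]
```

For A1 = [1, 3, 2, 1, 2, 3, 4] and A2 = [2, 3, 4, 1, 1, 3, 3], `average_difference` gives 5/7, in agreement with the text.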
[0033]
If one or more possible matches are found, a more accurate comparison of the identification signature array and the existing signature arrays is performed. Either the number of slots that match exactly or the number of slots whose counts are within one occurrence of each other may be used. In the above example, three of seven, or 43%, of the elements match exactly, and if a difference of one is tolerated, six of seven, or 86%, of the elements of arrays A1 and A2 match. It has been found that a match is likely when more than 80% of the elements match with a tolerance of one, or more than 70% match with no tolerance. The tolerance may be increased beyond one to allow more leeway in matching the waveforms.
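The finer comparison can be sketched in Python as a match percentage with an optional per-slot tolerance. The function names are hypothetical, and the default cutoffs in `is_likely_match` simply restate the 70% and 80% figures mentioned above.

```python
from typing import Sequence


def match_percentage(a: Sequence[int], b: Sequence[int],
                     tolerance: int = 0) -> float:
    """Percentage of slots whose counts differ by at most tolerance."""
    if len(a) != len(b) or not a:
        raise ValueError("signature arrays must be non-empty and equal length")
    hits = sum(1 for x, y in zip(a, b) if abs(x - y) <= tolerance)
    return 100.0 * hits / len(a)


def is_likely_match(a: Sequence[int], b: Sequence[int]) -> bool:
    """Apply the cutoffs quoted in the text: >70% exact or >80% within one."""
    return match_percentage(a, b, 0) > 70.0 or match_percentage(a, b, 1) > 80.0
```

For arrays A1 and A2 of the example, the exact match percentage is about 43% and the percentage with a tolerance of one is about 86%, so `is_likely_match` accepts the pair on the strength of the second test.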
[0034]
The unique identifier for the music recording is used as a database key. A site may maintain a database of information about CDs; for example, information about all CDs released by a particular record company may be maintained on that record company's site. There are various ways for the user to navigate this information. For example, a user may use a web page containing a number of hyperlinks as a table of contents, or may use a conventional search engine. A third method of searching, enabled by the unique identifier of the present invention, is a web page that invites the user to place the CD about which information is sought in the computer's CD drive. Upon detecting the presence of a CD in the drive, a script in the web page calculates the unique identifier corresponding to the CD and sends it to the server. The server then displays information about the CD retrieved from the database based on the unique identifier. This information may include web addresses (URLs) associated with the audio CD (e.g., the artist's home page), simple data such as song titles, and potentially complementary entertainment including pictures (e.g., of the band), artwork, videos, and video clips. It is also possible to arrange matters so that when the user inserts an audio CD into the computer, (i) the browser is started if it is not already running, (ii) the browser calculates the unique identifier of the CD and derives a URL from the identifier, and (iii) the browser performs an HTTP get transaction on that URL.
[0035]
Another application of the unique identifier for music recordings is to use an audio CD as a key for entering a premium area of the web. Currently, there are premium areas of the web to which access is granted by subscription. A simple form of access control based on the unique identifier is to require that, in order to access a particular area of the web, the user place in the CD drive a particular CD, a CD released by a particular company, or a CD containing the music of a particular band or artist. This is easily accomplished by a script that uses the functionality provided by the command plug-in to calculate the unique identifier.
[0036]
A third aspect of the present invention is the linking of chat rooms with music recordings. Its purpose is to let all participants in a chat room hear the same music at approximately the same time.
[0037]
A common network protocol for chat services is Internet Relay Chat (IRC), described in J. Oikarinen & D. Reed, Internet Relay Chat Protocol (Internet Request for Comments No. 1459, 1993). In this protocol, on becoming a client of a chat server, a client sends the name of a chat room. The chat server receives messages from all of its clients and relays each message sent by one client to all other clients connected to the same room as that client. The messages sent by a client are typically typed by the user running the client, and the messages received by a client are typically displayed to be read by the user running the client.
[0038]
In a preferred embodiment of the present invention, the chat client is customized by a plug-in, which we call the chat plug-in. The chat client is started by the browser as follows (see FIG. 3). The user connects by browser to a central web page (box 300) which, once downloaded, asks the user to insert the CD into his or her player (box 305). The unique identifier of the CD is calculated and sent back to the server using the command plug-in described above, under the direction of a script in the central web page (box 310). The server then uses the unique identifier to determine whether a chat room focused on that CD exists (box 315). This step may be performed by looking up the unique identifier in a database using techniques known in the art. There is extensive literature on connecting web pages to databases; see, for example, December & Ginsberg, supra, chapter 21. If a chat room focused on the CD exists or can be created, the server responds with the name of the chat room, and the browser starts the chat client on the user's computer as a client of that chat room (box 320).
[0039]
The name of the chat room is set by the server so that it contains information about the track that the CD is playing on the terminals of the other clients of the chat room, the time at which the track started playing, and the volume at which the CD is playing. The chat plug-in uses this information to direct the command plug-in to set the CD on the user's computer to play approximately in sync with the CD playing on the terminals of the other clients of the chat room (box 320).
[0040]
Each user in the chat room can control the CD being played on his or her terminal. Each control action causes the chat plug-in to send a message to the chat server describing the control action being performed (box 325). For example, such a message may indicate a change in the playing position of the CD, a change in volume, or the removal of the CD for replacement with another CD. On seeing such a message, the chat plug-in running on each other user's terminal repeats the action (as far as possible) on that terminal using the command plug-in described above (box 330).
[0041]
In a further aspect of the invention, a chat room focused on a particular music recording can select a particular track through a voting procedure. In a simple voting procedure, each chat plug-in acts on a change message of the type described above only when it sees two consecutive identical change messages. This means that, in order to change the track being played, two users must change to that track. The number two may be replaced by a higher number.
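The simple voting rule can be sketched in Python as a small filter that each chat plug-in applies to incoming change messages. The class name `VoteFilter` and the use of plain strings as messages are hypothetical. Resetting the count after a change is acted upon is also an assumption, since the text does not say what happens to later repeats of the same message.

```python
from typing import Optional


class VoteFilter:
    """Act on a change only after `required` consecutive identical messages."""

    def __init__(self, required: int = 2) -> None:
        self.required = required
        self.last_message: Optional[str] = None
        self.count = 0

    def observe(self, message: str) -> bool:
        """Record one change message; return True if it should be acted on."""
        if message == self.last_message:
            self.count += 1
        else:
            self.last_message = message
            self.count = 1
        if self.count >= self.required:
            # Assumption: start counting afresh once the change is applied.
            self.last_message = None
            self.count = 0
            return True
        return False
```

Constructing the filter with `required=3` gives the variant in which the number two is replaced by a higher number.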
[0042]
In a further aspect of the invention, the messages delivered to chat users are driven from a text file rather than typed by hand. A pre-recorded experience can thereby be played back for a group of chat users. Such a technique may be used to create a pre-recorded, narrated tour of an audio CD.
[0043]
An important advantage of the preferred embodiment described above is that it requires only the minimal functionality supported by Internet Relay Chat or a similar protocol that provides a minimal chat service. It may therefore be used with any chat server software. The additional software required resides in the chat client plug-in and in the central web page, which connects to a database of CD information.
[0044]
Numerous features and advantages of the invention are apparent from the detailed description, and the appended claims are therefore intended to cover all such features and advantages of systems that fall within the spirit and scope of the invention. In addition, many modifications and variations will readily occur to those skilled in the art from the present disclosure. It is therefore not desired to limit the invention to the exact configuration and operation shown and described, and accordingly all suitable modifications and equivalents may be resorted to as falling within the scope and spirit of the invention.
[Brief description of the drawings]
FIG. 1
FIG. 1 is a block diagram of an environment in which the preferred embodiment operates.
FIG. 2
FIG. 2 is a flowchart of the synchronization code of the present invention.
FIG. 3
FIG. 3 is a flowchart of an operation sequence for connecting to a chat room focused on music recording.
FIG. 4
FIGS. 4A and 4B are explanatory diagrams of the waveform analysis according to the present invention.

Claims

A method of searching for a match in a database of a plurality of records, wherein the records in the database correspond to recordings including waveforms, the method comprising:
generating an amplitude signature for at least one segment of a selected recording; and
determining at least one matching record in the database for the selected recording based on the amplitude signature.

The method of claim 1, further comprising calculating approximate length information for the records in the database and the selected recording,
wherein the determining step is also based on the approximate length information.

The recording has at least one track,
The calculating step calculates the length of each track of each recording in the database and represented for the selected recording;
The method of claim 2, wherein the determining step is also based on the number and length of tracks of the recording represented in the database and the selected recording.

The waveform is represented by sampled digital data of the recording and the selected recording;
The method further comprises storing an existing signature array for each of the recordings represented in the database, wherein each element of the existing signature array includes at least one of the recordings represented in the database. In one segment, corresponding to the number of occurrences of said sampled digital data in an amplitude band;
The generating step generates an identification signature array, wherein each element of the identification signature array corresponds to a number of occurrences of the sampled digital data in an amplitude band in the at least one segment of the selected recording. 4. The method of claim 3, wherein:

The determining step includes:
For the recording represented in the database, calculating an average difference between the elements of the identification signature array and the existing signature array,
Identifying any recording represented in the database where the average difference is greater than a predetermined value as a possible match.

The determining step includes:
Calculating, within a predetermined number of each other, matching percentages of the identification signature array and corresponding elements in the existing signature array;
Indicating as a possible match any recording represented in the database, wherein the match percentage is greater than a predetermined percentage.

The method of claim 6, wherein the predetermined number is zero and the predetermined percentage is about 70%.

The method of claim 6, wherein the predetermined number is one and the predetermined percentage is about 80%.

The method of claim 4, wherein the recording is stored on a removable storage medium owned by a user.

The method of claim 4, wherein the recording is a digital file stored on a mass storage device accessible by a viewer of the selected recording.

The method according to claim 3, further comprising receiving a query to search for a match between the selected recording and the records in the database, the query including, for the selected recording, the number of tracks and the length information.

The waveform is represented by digital data sampled in the recording and the selected recording;
The method further comprises storing an existing signature array for each of the recordings represented in the database, wherein each element of the existing signature array includes at least one of the recordings represented in the database. In one segment, corresponding to the number of occurrences of said sampled digital data in an amplitude band;
The generating step generates an identification signature array, wherein each element of the identification signature array corresponds to a number of occurrences of the sampled digital data in an amplitude band in the at least one segment of the selected recording. The method of claim 1, wherein

The determining step includes:
For the recording represented in the database, calculating an average difference between the elements of the identification signature array and the existing signature array,
Identifying any recording represented in the database as a possible match, wherein the average difference is greater than a predetermined value.

The determining step includes:
Calculating, within a predetermined number of each other, a matching percentage of the identification signature array and corresponding elements in the existing signature array;
Indicating as a possible match any recording represented in the database where the match percentage is greater than a predetermined percentage.

The method of claim 14, wherein the predetermined number is zero and the predetermined percentage is about 70%.

The method of claim 14, wherein the predetermined number is 1 and the predetermined percentage is about 80%.

The method of claim 12, wherein the recording is stored on a removable storage medium owned by the user.

The method of claim 17, wherein the recording is a digital file stored on a mass storage medium accessible by a listener of the selected recording.

Playing the selected recording at a first location on a device owned by the user;
The method further comprises:
Generating a query by the device at the first location;
Sending the query to a server at a second location where the database is stored to retrieve at least one matching record.

The method according to claim 19, further comprising transmitting, from the server to the device at the first location, additional information stored in the at least one approximate match record and not included in the selected recording.

For a recording corresponding to the record, a storage unit storing a database of records including existing signatures,
Coupled to the storage unit to generate an identification amplitude signature for the selected recording, and to compare the identification amplitude signature to the existing amplitude signature in the database for the selected recording. A processing unit programmed to determine at least one matching record in said database.

The database system according to claim 21, wherein the storage unit further stores information indicating the length and number of identified segments of the recordings, and
the processing unit calculates approximate length information for the selected recording and determines the at least one matching record in the database further based on the approximate length information and the number of identified segments in the selected recording and in the recordings corresponding to the records in the database.

The recording includes sampled digital data;
The storage unit stores the existing signature array corresponding to a number of occurrences of the sampled digital data in an amplitude band in at least one segment of the recording where each element is represented in the database. ,
The processing unit generates the identification signature array, each element corresponding to a number of occurrences of the sampled digital data in an amplitude band in at least one segment of the selected recording, and displaying the identification signature array in the database. Calculating the average difference between the elements of the identification signature array and the existing signature array for the recording being performed, and any recording represented in the database where the average difference is greater than a predetermined value. 22. The database system of claim 21, wherein the at least one match record is determined by identifying the match as a good match.

The recording includes sampled digital data;
The storage unit stores the existing signature array corresponding to a number of occurrences of the sampled digital data in an amplitude band in at least one segment of the recording where each element is represented in the database. ,
Said processing unit generating said identification signature array, each element corresponding to a number of occurrences of said sampled digital data in an amplitude band in at least one segment of said selected recording, wherein a predetermined number of each Calculating a matching percentage of the identification signature array and corresponding elements in the existing signature array, and matching any of the recordings represented in the database where the matching percentage is greater than a predetermined percentage. 22. The database system of claim 21, wherein the at least one matching record is determined by indicating

Further, a communication unit coupled to the storage unit for receiving a query for searching for a match between the selected recording and the record in the database, wherein the query is related to the selected recording. 22. The database system according to claim 21, further comprising a communication unit including the number of segments and the length information.

The database system according to claim 25, wherein the recordings corresponding to the records in the database and the selected recording each include at least one audio portion, and wherein the number of segments is the number of tracks in the audio portion.

The database system according to claim 26, wherein the recording is stored in a removable storage medium owned by the user.

The database system of claim 26, wherein the recording is a digital file stored on a mass storage device accessible by a listener of the selected recording.

The database system according to claim 25, wherein the processing unit, the storage unit, and the communication unit are at a first location,
the database system further comprising:
a device, owned by a user at a second location remote from the first location, for generating the query and playing the selected recording; and
a communication network for at least temporarily connecting the device with the communication unit and for sending the query from the device to the communication unit.

The database system according to claim 29, wherein the communication unit sends to the device, over the communication network, additional information stored in the at least one approximate matching record and not included in the selected recording.

At least one computer program stored on a computer-readable medium for performing a method of searching for a match in a database of a plurality of records, wherein the records in the database correspond to recordings including waveforms, the method comprising:
generating an amplitude signature for at least one segment of a selected recording; and
determining at least one matching record in the database for the selected recording based on the amplitude signature.

Further comprising calculating approximate length information for the record and the selected recording in the database,
The at least one computer program of claim 31, wherein the determining step is also based on the approximate length information.

The recording has at least one track,
The calculating step calculates the length of each track of each recording represented in the database and for the selected recording;
The at least one computer program of claim 32, wherein the determining step is also based on the number and length of tracks of the recordings represented in the database and of the selected recording.

The waveform is represented by the sampled digital data in the recording and the selected recording;
The method further comprises storing an existing signature array for each of the recordings represented in the database, wherein each element of the existing signature array represents the recording represented in the database. Corresponding to the number of occurrences of said sampled digital data in an amplitude band in at least one segment of
The generating step generates an identification signature array, wherein each element of the identification signature array corresponds to a number of occurrences of the sampled digital data in an amplitude band in the at least one segment of the selected recording. 34. At least one computer program according to claim 33, characterized in that:

The determining step includes:
For the recording represented in the database, calculating an average difference between the elements of the identification signature array and the existing signature array,
Identifying at least one recording represented in the database as a possible match, wherein the average difference is greater than a predetermined value.

The determining step includes:
Calculating, within a predetermined number of each other, a matching percentage of the identification signature array and corresponding elements in the existing signature array;
At least one computer program as claimed in claim 34, wherein the match percentage indicates any recording represented in the database that is greater than a predetermined percentage as a possible match.

The at least one computer program of claim 34, wherein the recording is stored on a removable storage medium owned by the user.

The at least one computer program of claim 34, wherein the recording is a digital file stored on a mass storage device accessible by a listener of the selected recording.

The at least one computer program according to claim 33, wherein the method further comprises receiving a query to search for a match between the selected recording and the records in the database, the query including, for the selected recording, the number of tracks and the length information.

The waveform is represented by the sampled digital data in the recording and the selected recording;
The method further comprises storing an existing signature array for each of the recordings represented in the database, wherein each element of the existing signature array includes at least one of the recordings represented in the database. In one segment, corresponding to the number of occurrences of said sampled digital data in an amplitude band;
The generating step generates an identification signature array, wherein each element of the identification signature array corresponds to a number of occurrences of the sampled digital data in an amplitude band in the at least one segment of the selected recording. The at least one computer program according to claim 31, characterized in that:

The determining step includes:
For the recording represented in the database, calculating an average difference between the elements of the identification signature array and the existing signature array,
At least one computer program as claimed in claim 40, characterized in that any recording represented in the database whose average difference is greater than a predetermined value is identified as a possible match.

The determining step includes:
Calculating, within a predetermined number of each other, a matching percentage of the identification signature array and corresponding elements in the existing signature array;
Indicating as a possible match any recording represented in the database, wherein the match percentage is greater than a predetermined percentage.

The at least one computer program according to claim 40, wherein the recording is stored on a removable storage medium owned by the user.

The at least one computer program of claim 43, wherein the recording is a digital file stored on a mass storage device accessible by a listener of the selected recording.

Playing the selected recording at a first location on a device owned by the user;
The method further comprises:
Generating an inquiry by the device at the first location;
Sending said query to a server at a second location where said database is stored to retrieve at least one matching record. Computer programs.

The at least one computer program according to claim 45, wherein the method further comprises sending, from the server to the device at the first location, additional information stored in the at least one approximate matching record and not included in the selected recording.