JP4736201B2

JP4736201B2 - Information retrieval apparatus and method, and storage medium

Info

Publication number: JP4736201B2
Application number: JP2001041517A
Authority: JP
Inventors: 純平井
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2001-02-19
Filing date: 2001-02-19
Publication date: 2011-07-27
Anticipated expiration: 2021-02-19
Also published as: JP2002245066A

Description

【０００１】
【発明の属する技術分野】
本発明は、情報検索技術に係り、特に、映像との音楽の各データからなるＡＶコンテンツを検索する情報検索技術に関する。
【０００２】
更に詳しくは、本発明は、元データが持つ特徴量に基づくＡＶコンテンツのための情報検索技術に係り、特に、映像と音楽の両方の特徴量を利用してＡＶコンテンツを検索する情報検索技術に関する。
【０００３】
【従来の技術】
昨今、コンピュータ上での演算処理をベースとした情報処理技術が飛躍的に向上している。これに伴い、単なるコンピュータ・データ以外に、映像や音声などの各種のデータ・コンテンツもデジタル化され、コンピュータ・ファイルとして取り扱われるようになってきている。したがって、情報資源の有効活用という観点からも、各種のデータ・コンテンツに関する検索技術は重要度を増している。
【０００４】
あるいは、コンテンツの配信や放送の分野において、コンテンツ使用量に関して従量課金制を導入した場合や、著作権保護の観点からコンテンツの使用を取り締まる場合など、各コンテンツに対して配信又は放送された回数を計数したり監視しなければならない。例えば、いま配信されたコンテンツが何であったかを特定するために、情報検索技術が適用される。
【０００５】
例えば、テキスト形式の比較的小サイズのデータ検索であれば、コンピュータの演算能力に頼ってテキストの一致を基調とする全数検索を行うこともできる。これに対し、映像や音楽データからなるＡＶコンテンツの場合、一般にデータ量が膨大であるため、コンテンツの初めから終わりまで全数検索を行うことは技術的に不可能である。
【０００６】
音楽情報検索として、楽曲データを採譜して検索対象となる楽曲データとの照合（全数検索）を行うものがある。この方法では、楽曲の初めから終わりまでパターン・マッチングを行う。しかしながら、インデックスを使用しない全数検索は、検索速度が遅いという欠点がある。
【０００７】
例えば音楽放送においては、音譜ベースでの比較照合ではなく、信号の特徴抽出による監視が既に実施されている。すなわち、ＩＳＲＣ（International Standard Recording Code）のような楽曲のユニーク・コードと対応させた楽曲の特徴量をあらかじめデータベースに登録しておく。ＩＳＲＣや楽曲の採取には、ＣＤ（Compact Disc）などの既存の記録メディアを利用する。そして、放送を受信してその特徴量とデータベースに登録済みの特徴量とを比較照合して、受信した楽曲のＩＳＲＣを特定する（図６を参照のこと）。
【０００８】
音楽放送で実施されている同一検索を、ＡＶコンテンツにまでそのまま適用範囲を拡張する場合、音声で行うと無音部分では抽出できないという問題がある。例えば、音楽のみのコンテンツに比べ、映画などでは無音部分がかなりあるので、音楽データの特徴抽出に基づく検索には限界がある。
【０００９】
一方、映像だけでＡＶコンテンツを検索することを考えた場合、登場人物がしゃべっているシーンなど映像の内容の変化が少ない部分が多いので、映像データの特徴抽出に基づく検索には限界がある。
【００１０】
また、映画業界においては、同じ映画でも販売先となるそれぞれの国や地域の公用語などに応じた言語対応、すなわち音声の吹き替え処理が行われる。吹き替え処理の結果、音声ベースでのコンテンツ検索を行うと、同じ映画であっても一致しなくなる。
【００１１】
【発明が解決しようとする課題】
本発明の目的は、映像との音楽の各データからなるＡＶコンテンツを高速且つ確実に検索することができる、優れた情報検索装置及び方法を提供することにある。
【００１２】
【課題を解決するための手段及び作用】
本発明は、上記課題を参酌してなされたものであり、その第１の側面は、映像データと音声データの組み合わせからなるＡＶコンテンツを検索する情報検索装置又は方法であって、
音声の特徴量と音声識別コードを連結した音声データベースと、
映像の特徴量と映像識別コードを連結した映像データベースと、
音声識別コードと映像識別コードを連結したＡＶコンテンツ・テーブルと、
音声データベースを用いて音声の検索を行う音声データ検索手段又はステップと、
映像データベースを用いて映像の検索を行う映像データ検索手段又はステップと、
前記音声データ検索手段又はステップ、及び、前記映像データ検索手段又はステップにより抽出された音声識別コード及び映像識別コードに基づいて、前記ＡＶコンテンツ・テーブルから該当するＡＶコンテンツを決定するＡＶコンテンツ決定手段又はステップと、
を具備することを特徴とする情報検索装置又は方法である。
【００１３】
本発明に係る情報検索装置又は方法は、音声の特徴量と音声識別コードを連結したデータベースと、映像の特徴量と映像識別コードを連結したデータベースと、音声識別コードと映像識別コードを連結したＡＶコンテンツ・テーブルを用意する。そして、映像の検索と音声の検索を並行して実行する。
【００１４】
例えば、映像の検索で見つかった場合にその映像識別コードをＡＶコンテンツ・テーブルで参照して、該当する音声識別コードを引き出す。このコードを音声の検索に与えて、照合がとれれば検索は完了する。同様に、音声の検索で見つかった場合にその音声識別コードをＡＶコンテンツ・テーブルで参照して、該当する映像識別コードを引き出す。このコードを映像の検索に与えて、照合がとれれば検索は完了する。
【００１５】
放送系になどでこのような検索を行う場合、マルチパスの影響や、送信・受信回路の歪や圧縮による歪など、系としてある程度の歪を許容しなければならない場合がある。この場合、完全な同一検索では検索不能になるので、特徴量の照合時に、ある程度の判定の許容度を与える必要がある。逆に許容度を与えることによって、ある１つの検索対象に対して複数の検索結果が出現する可能性がある。
【００１６】
検索の高速化を狙った場合、複数の検索結果が出てしまう可能性は必然的に高まる。このような事態を解消するためには、映像と音声の両方を検索対象にして、信頼性を上げるようにすればよい。
【００１７】
映像で検索して先に見つかった場合には、この映像識別コードをＡＶコンテンツ・テーブルを参照して、音声の検索に音声識別コードを与えて照合を図る。また、その逆に、音声で検索して先に見つかった場合に、この音声識別コードをＡＶコンテンツ・テーブルを参照して、映像の検索に映像識別コードを与えて照合を図る。そして、映像及び音声の両方の結果が一致することを見て判断する。映像、音声の検索結果がそれぞれ複数出現する場合には、すべての組み合わせについて判断し、誤差が少ない結果を選択するようにする。
【００１８】
本発明の第１の側面に係る情報検索装置又は方法は、さらに検索対象となるＡＶコンテンツのコンテンツ識別コード取得手段又はステップを備えて、前記音声データ検索手段又はステップ、及び／又は、映像データ検索手段又はステップは、該コンテンツ識別コードに対応するＡＶコンテンツに対して優先的に照合を行うようにしてもよい。
【００１９】
コンテンツを識別する識別コードなどの情報を入手するメカニズムが用意されている場合であって、しかも改竄のおそれがなく入手できる場合には、信号の照合も必要がない。しかし、コンテンツの著作権者は、再生機や記録機をユーザが改竄したりして偽りの識別コードが生成されることを恐れる。そこで、改竄を確認するための信号の照合を行う必要がある。
【００２０】
多くの場合は改竄されていないので、コンテンツ識別コードに対応するコンテンツを優先して照合処理することによって、システム全体の情報検索速度を高速化することができる。
【００２１】
また、前記音声データ検索手段又はステップ、及び／又は、映像データ検索手段又はステップは、明かに相違するＡＶコンテンツに対する照合処理を省略するようにしてもよい。
【００２２】
明かに相違することが容易に分かるＡＶコンテンツに関しては、音声データ検索並びに映像データ検索を行う必要がなく、かかる工程を省略することにより計算機負荷を軽減することができるとともに、より類似するコンテンツに関して誤差を小さくして高精度な照合処理を行うことができる。また、明かに相違するコンテンツに対する余分な照合処理を省くので、システム全体の情報検索速度を高速化することができる。
【００２３】
本発明の第２の側面は、映像データと音声データの組み合わせからなるＡＶコンテンツを検索する情報検索処理をコンピュータ・システム上で実行するように記述されたコンピュータ・ソフトウェアをコンピュータ可読形式で物理的に格納した記憶媒体であって、前記コンピュータ・ソフトウェアは、
音声の特徴量と音声識別コードを連結した音声データベースを用いて音声の検索を行う音声データ検索ステップと、
映像の特徴量と映像識別コードを連結した映像データベースを用いて映像の検索を行う映像データ検索ステップと、
前記音声データ検索ステップ及び前記映像データ検索ステップにより抽出された音声識別コード及び映像識別コードに基づいて、音声識別コードと映像識別コードを連結したＡＶコンテンツ・テーブルから該当するＡＶコンテンツを決定するＡＶコンテンツ決定ステップと、
を具備することを特徴とする記憶媒体である。
【００２４】
本発明の第２の側面に係る記憶媒体は、例えば、様々なプログラム・コードを実行可能な汎用コンピュータ・システムに対して、コンピュータ・ソフトウェアをコンピュータ可読な形式で提供する媒体である。このような媒体は、例えば、ＣＤ（Compact Disc）やＦＤ（Floppy Disk）、ＭＯ（Magneto-Optical disc）などの着脱自在で可搬性の記憶媒体である。あるいは、ネットワーク（ネットワークは無線、有線の区別を問わない）などの伝送媒体などを経由してコンピュータ・ソフトウェアを特定のコンピュータ・システムに提供することも技術的に可能である。
【００２５】
このような記憶媒体は、コンピュータ・システム上で所定のコンピュータ・ソフトウェアの機能を実現するための、コンピュータ・ソフトウェアと記憶媒体との構造上又は機能上の協働的関係を定義したものである。換言すれば、本発明の第２の側面に係る記憶媒体を介して所定のコンピュータ・ソフトウェアをコンピュータ・システムにインストールすることによって、コンピュータ・システム上では協働的作用が発揮され、本発明の第１の側面に係る情報検索装置及び方法と同様の作用効果を得ることができる。
【００２６】
本発明のさらに他の目的、特徴や利点は、後述する本発明の実施例や添付する図面に基づくより詳細な説明によって明らかになるであろう。
【００２７】
【発明の実施の形態】
以下、図面を参照しながら本発明の実施例を詳解する。
【００２８】
図１には、本発明の一実施形態に係る情報検索装置１の構成を模式的に示している。以下、同図を参照しながら、情報検索装置１の各コンポーネントについて説明する。
【００２９】
ＣＰＵ（Central Processing Unit）１１は、情報検索装置１全体の動作を統括的に制御する中央コントローラであり、オペレーティング・システム（ＯＳ）が提供するプログラム実行環境下で、メモリ１２に格納された（又はロードされた）制御プログラムや他のメモリ１３に格納された（又はロードされた）一致判別プログラムを実行するようになっている。
【００３０】
ここで言う一致判別プログラムには、映像の特徴値による映像データベースの照合処理、音声の特徴値による音声データーベースの照合処理、映像並びに音声データベースの照合結果に基づくＡＶコンテンツ・テーブルの照合処理などが含まれる。また、制御プログラムには、一致判別結果に基づくＡＶコンテンツの特定処理などのＡＶコンテンツに関する情報検索処理プログラムが含まれる。
【００３１】
ＣＰＵ１１は、バス２２を介して装置１内の各モジュールと相互接続されている。バス２２は、アドレス・バス、データ・バス、コントロール・バスを含んだ共通信号伝送路である。
【００３２】
映像・音声特徴抽出部１４は、ＡＶコンテンツから映像及び音声データへのデマルチプレクス処理、並びに、映像データの特徴抽出処理、音声データの特徴抽出処理を行う。映像・音声特徴抽出部１４によって抽出された映像並びに音声の特徴値は、一致判別プログラムに投入される。
【００３３】
特徴データベース１５は、ＡＶコンテンツに関する照合・情報検索処理に利用されるデータベースで構成される。より具体的には、特徴データベース１５は、音声の特徴量と音声識別コードを連結した音声データベースと、映像の特徴量と映像識別コードを連結した映像データベースと、音声識別コードと映像識別コードを連結したＡＶコンテンツ・テーブルとで構成される。このうち映画に関するＡＶコンテンツ・テーブルの構成例を以下に示しておく。
【００３４】
【表１】

【００３５】
上記に示したＡＶコンテンツ・テーブルでは、各レコードは、映画名と、映像識別コードと、音声識別コードと、ＡＶコンテンツ識別コードの組み合わせで構成される。したがって、映像識別コードが判明するとこれにヒットする音声識別コードを絞り込むことができるし、逆に、音声識別コードが判明するとこれにヒットする映像識別コードを絞り込むことができる。また、映像識別コードと音声識別コードの双方を特定すると、このＡＶコンテンツ・テーブルを参照することによってＡＶコンテンツ識別コードすなわちＡＶコンテンツが一意に定まる。
【００３６】
優先照合リスト１６、並びに映像・音声照合リスト１７は、情報検索すなわちＡＶコンテンツの照合処理を行うときの一部の作業データを一時的に保管するために使用される。
【００３７】
優先照合リスト１６は、優先的に照合処理を行うべきＡＶコンテンツをリストするための記憶モジュールである。例えば、外部からコンテンツ識別コードが渡されたＡＶコンテンツに関しては、この識別コードにより照合・検索範囲を限定して処理時間を高速化することができるまで、優先照合リスト１６に登録する。
【００３８】
映像・音声照合リスト１７は、各映像データ及び音声データについてのデータベース照合結果を一時的に保管するために使用される。例えば、映像データが一致するが対応する音声データが一致しない場合は、映像データの照合処理を行う必要がないので、映像照合リストにその旨を記入する。また、逆に音声データが一致するが対応する映像データが一致しない場合は、音声データの照合処理を行う必要がないので、音声照合リストにその旨を記入する。また、音声が明らかに違う場合には、音声照合リストにその旨を記入して高精度の照合処理をスキップさせるようにするし、映像が明らかに違う場合には、映像照合リストにその旨を記入して照合処理をスキップさせるようにする。
【００３９】
情報入力部１８は、外部機器から情報を入力する機能モジュールである。ここで入力される情報には、映像データと音声データがマルチプレクスされたＡＶコンテンツや、そのコンテンツ識別コードなどが含まれる。また、情報を入出力する媒体として、ＬＡＮ（Local Area Network）やインターネットなどのコンピュータ・ネットワーク、ＣＤ（Compact Disc）やＤＶＤ（Digital Versatile Disc）などの可搬型記録メディアなどを挙げることができる。情報入力部１８を介して外部の装置から供給された情報は、例えば、バッファ・メモリ１９上に一時格納される。
【００４０】
表示部２０は、ＣＰＵ１１による演算結果を画面出力してユーザに対して情報の視覚的なフィードバックを行う。また、ユーザ入力部２１は、例えばキーボードやマウス（いずれも図示しない）などの入力装置で構成され、ユーザからのコマンドやデータの入力を受容する。
【００４１】
なお、情報検索装置１は、例えば一般的な計算機システムを利用して構成することができる。かかる計算機システムの例は、米ＩＢＭ社のＰＣ／ＡＴ（Personal Computer/Advanced Technology）互換機又はその後継機である。
【００４２】
図２には、情報検索装置１上で実行されるＡＶコンテンツの検索処理についての概略的な手順を、フローチャートの形式で示している。以下、このフローチャートを参照しながら、本実施形態に係るＡＶコンテンツの検索処理の流れを概略的に説明する。
【００４３】
まず、検索対象となるＡＶコンテンツをデマルチプレクスして、映像信号と音声信号に分離する（ステップＳ１）。なお、図示しないが、必要に応じて映像信号と音声信号をそれぞれ圧縮を解凍する。
【００４４】
次いで、映像信号と音声信号からそれぞれ特徴量を抽出して、特徴データベース１５にあらかじめ登録された各データと照合して、コンテンツ識別コードを抽出する。ステップＳ２Ａの音声照合とステップＳ２Ｂの映像照合は並行して行う。コンテンツ識別コードは、特徴データベース１５中のＡＶコンテンツ・テーブル（前述）を参照して、音声であればそれに組み合わされる映像の識別コードを抽出し（ステップＳ３Ａ）、映像であればそれに組み合わされる音声の識別コードを抽出する（ステップＳ３Ｂ）。
【００４５】
抽出された音声識別コードは音声照合２（ステップＳ４Ａ）に渡され、そのコンテンツ識別コードに該当する特徴量を音声データベースと照合することで、該当する音声のコンテンツ識別コードを確認あるいは選別する。同様に、抽出された映像識別コードは映像照合２（ステップＳ４Ｂ）に渡され、そのコンテンツ識別コードに該当する特徴量を映像データベースと照合することで、該当する映像のコンテンツ識別コードを確認あるいは選別する。
【００４６】
音声照合２あるいは映像照合２で得られたコンテンツ識別コードのうち早期に得られたものを採用するように決定論理（ステップＳ５）を組むことによって、ＡＶコンテンツの識別の高速化を図ることができる。
【００４７】
［表１］に示したＡＶコンテンツ・テーブルを使用した場合、例えば、映像照合により映像識別コード（４）を抽出してＡＶコンテンツ・テーブルを参照すると、対応する音声識別コードとして（６，７，８）を取り出すことができる。これら音声識別コードを音声照合２に投入して、そこで照合が合致することによってＡＶコンテンツ識別コード（例えば、４−７）が判断される。
【００４８】
伝送系の歪、伝送時のマルチパスや圧縮や記録再生時の歪など、完全な同一検索ではなく、歪に対応するため、ある程度の許容度を照合時に認める場合には、ただ１つの識別コードに対して照合時に２つのコンテンツが対応してしまうことがある。このような場合には、音声照合２と映像照合２が一致した識別コードを採用するように、決定論理を構成して、信頼性の向上を図ることができる。
【００４９】
図２に示したフローチャートにおいて、点線で記したパスを設けて、音声照合２又は映像照合２から識別コードが得られなかった場合には、映像照合又は音声照合からの識別コードを直接利用して、「音声が正常ではないようですが映像から識別コードはＸＸＸＸＸと読めます」、あるいは「映像が正常ではないようですが音声から識別コードはＸＸＸＸＸと読めます」などのユーザ・フィードバックを、表示部２０などを介して行うようにしてもよい。
【００５０】
図３には、本実施形態に係る情報検索装置１上で実行されるＡＶコンテンツの検索処理の具体的な手順の一例を、フローチャートの形式で示している。同図に示す処理手順は、照合に誤差を許容しない場合の例である。
【００５１】
まず、検索対象となるＡＶコンテンツをデマルチプレクスして、映像信号と音声信号に分離する（ステップＳ１１）。なお、図示しないが、必要に応じて映像信号と音声信号をそれぞれ圧縮を解凍する。
【００５２】
次いで、映像信号と音声信号からそれぞれ特徴量を抽出する（ステップＳ１２）。そして、映像並びに音声それぞれの特徴値を特徴データベース１５にあらかじめ登録された各データと照合して（ステップＳ１３）、一致するか否かを判断する（ステップＳ１４）。映像及び音声の照合の速度はコンテンツによって区々である。本実施形態では、映像及び音声の照合処理をそれぞれベストエフォートで同時並行的に実行する。
【００５３】
映像並びに音声のいずれについても一致を見出せなかった場合には（ステップＳ１４）、特徴データベース中の次のコンテンツに進んで（ステップＳ２０）、同様に映像・音声の特徴量の照合を繰り返し実行する。そして、特徴データベース中で一致するコンテンツが見つからなかった場合には、該当コンテンツがない旨の表示を表示部２０から出力してユーザに通知し（ステップＳ２１）、本処理ルーチン全体を終了する。
【００５４】
映像について一致するデータが見つかった場合（ステップＳ１５）、音声照合リストを参照して、この映像データに対応する音声について、一致するものがあったか否かを判別する（ステップＳ１８）。対応音声について一致するものがない場合には、映像照合リスト中に当該映像は一致しない旨を記録した後（ステップＳ１９）、ステップＳ１３に復帰して、対応映像のコンテンツにジャンプして、映像の特徴データベースと比較照合する。
【００５５】
また、音声について一致するデータが見つかった場合（ステップＳ１５）、映像照合リストを参照して、この音声データに対応する映像について、一致するものがあったか否かを判別する（ステップＳ１６）。対応映像については一致するものがない場合には、音声照合リスト中に当該音声は一致しない旨を記録した後（ステップＳ１７）、ステップＳ１３に復帰して、対応音声のコンテンツにジャンプして、音声の特徴データベースと比較照合する。
【００５６】
判断ブロックＳ１６又はＳ１８の結果、映像及び音声について一致するＡＶコンテンツが見つかった場合、そのコンテンツ識別コードを検索結果として出力して、本処理ルーチン全体を終了する。
【００５７】
また、図４には、本実施形態に係る情報検索装置１上で実行されるＡＶコンテンツの検索処理の具体的な手順の他の例を、フローチャートの形式で示している。同図に示す処理手順は、何らかの手段により与えられたコンテンツ識別コードを利用して検索し、照合に誤差を許容しない場合の例である。
【００５８】
まず、何らかの手段により、検索対象となるＡＶコンテンツのコンテンツ識別コードを取得して（ステップＳ３１）、これを優先照合リスト１６に登録する（ステップＳ３２）。本実施形態に係る情報検索装置１は、例えば検索対象としてのＡＶコンテンツを絶え間なく入力し続けるような場合、優先照合リスト１６に登録されたＡＶコンテンツを優先的に照合処理する。
【００５９】
次いで、優先照合リスト１６に登録された最優先のコンテンツ識別コードに対応するＡＶコンテンツをデマルチプレクスして、映像信号と音声信号に分離する（ステップＳ３３）。なお、図示しないが、必要に応じて映像信号と音声信号をそれぞれ圧縮を解凍する。
【００６０】
次いで、映像信号と音声信号からそれぞれ特徴量を抽出する（ステップＳ３４）。そして、映像並びに音声それぞれの特徴値を特徴データベース１５にあらかじめ登録された各データと照合して（ステップＳ３５）、一致するか否かを判断する（ステップＳ３６）。映像及び音声の照合の速度はコンテンツによって区々である。本実施形態では、映像及び音声の照合処理をそれぞれベストエフォートで同時並行的に実行する。
【００６１】
映像並びに音声のいずれについても一致を見出せなかった場合には（ステップＳ３６）、特徴データベース中の次のコンテンツに進んで（ステップＳ４２）、同様に映像・音声の特徴量の照合を繰り返し実行する。そして、特徴データベース中で一致するコンテンツが見つからなかった場合には、該当コンテンツがない旨の表示を表示部２０から出力してユーザに通知し（ステップＳ４３）、本処理ルーチン全体を終了する。
【００６２】
映像について一致するデータが見つかった場合（ステップＳ３７）、音声照合リストを参照して、この映像データに対応する音声について、一致するものがあったか否かを判別する（ステップＳ４０）。対応音声について一致するものがない場合には、映像照合リスト中に当該映像は一致しない旨を記録した後（ステップＳ４１）、ステップＳ３５に復帰して、対応映像のコンテンツにジャンプして、映像の特徴データベースと比較照合する。
【００６３】
また、音声について一致するデータが見つかった場合（ステップＳ３７）、映像照合リストを参照して、この音声データに対応する映像について、一致するものがあったか否かを判別する（ステップＳ３８）。対応映像については一致するものがない場合には、音声照合リスト中に当該音声は一致しない旨を記録した後（ステップＳ３９）、ステップＳ３５に復帰して、対応音声のコンテンツにジャンプして、音声の特徴データベースと比較照合する。
【００６４】
判断ブロックＳ３８又はＳ４０の結果、映像及び音声について一致するＡＶコンテンツが見つかった場合、そのコンテンツ識別コードを検索結果として出力して、本処理ルーチン全体を終了する。
【００６５】
検索対象となるＡＶコンテンツのコンテンツＩＤを改竄のおそれなく入手できる場合、信号の照合も必要ない。しかしながら、コンテンツの著作権者は、再生機や記録機を改造したりして偽りのＩＤが作られることをおそれる。そこで、その改竄を確認するため、信号の照合を行う必要がある。多くの場合は改竄されていないので、上述したように優先照合リスト１６中のコンテンツを優先して照合して、一致していればそれでコンテンツ検索を終了することができるので、検索速度を大幅に高速化することができる。
【００６６】
また、図５には、本実施形態に係る情報検索装置１上で実行されるＡＶコンテンツの検索処理の具体的な手順の他の例を、フローチャートの形式で示している。同図に示す処理手順は、特徴データベースに登録されたＡＶコンテンツのうち検索対象となるＡＶコンテンツとは明らかに相違するものを除外して照合処理することにより、高速化を図った例である。
【００６７】
まず、検索対象となるＡＶコンテンツをデマルチプレクスして、映像信号と音声信号に分離する（ステップＳ５１）。なお、図示しないが、必要に応じて映像信号と音声信号をそれぞれ圧縮を解凍する。
【００６８】
次いで、映像信号と音声信号からそれぞれ特徴量を抽出する（ステップＳ５２）。そして、映像並びに音声それぞれの特徴値を特徴データベース１５にあらかじめ登録された各データと照合して（ステップＳ５３）、検索対象となるＡＶコンテンツとは明らかに相違するか否かを判断する（ステップＳ５４）。映像及び音声の照合の速度はコンテンツによって区々である。本実施形態では、映像及び音声の照合処理をそれぞれベストエフォートで同時並行的に実行する。ここでは、コンテンツの相違を検証することのみを目的とするので、ステップＳ５３では比較的大きな誤差を認めた照合処理を行えばよい。
【００６９】
ステップＳ５４において、明らかに相違すると判断された場合、映像又は音声のうちいずれが明らかに相違するかをさらに判別する（ステップＳ５５）。
【００７０】
映像に関して明らかに相違する場合には、映像照合リスト中に当該映像は一致しない旨を記録した後（ステップＳ５６）、ステップＳ５３に復帰して、対応映像の照合処理をスキップする。
【００７１】
また、音声に関して明らかに相違する場合には、音声照合リスト中に当該音声は一致しない旨を記録した後（ステップＳ５７）、ステップＳ５３に復帰して、対応音声の照合処理をスキップする。
【００７２】
検索対象となるＡＶコンテンツと明らかに相違する訳ではない場合には、特徴データベース中の次のコンテンツに進んで（ステップＳ５８）、同様に検索対象となるＡＶコンテンツと明らかに相違しないか否かを判別する処理を繰り返し実行する。
【００７３】
特徴データベース中のすべてのデータについて、検索対象となるＡＶコンテンツとの相違を検証した後、映像並びに音声それぞれの特徴値を特徴データベース１５にあらかじめ登録された各データと照合して（ステップＳ５９）、検索対象となるＡＶコンテンツとは相違するか否かを判断する（ステップＳ６０）。映像及び音声の照合の速度はコンテンツによって区々である。本実施形態では、映像及び音声の照合処理をそれぞれベストエフォートで同時並行的に実行する。ここでは、データの絞込みを目的とするので、ステップＳ５９では先行ステップＳ５３よりも誤差を小さくした照合処理を行う。
【００７４】
ステップＳ６０において、相違すると判断された場合、映像又は音声のうちいずれが相違するかをさらに判別する（ステップＳ６１）。
【００７５】
映像に関して相違する場合には、映像照合リスト中に当該映像は一致しない旨を記録した後（ステップＳ６２）、ステップＳ５９に復帰して、対応映像の照合処理をスキップする。
【００７６】
また、音声に関して相違する場合には、音声照合リスト中に当該音声は一致しない旨を記録した後（ステップＳ６３）、ステップＳ５９に復帰して、対応音声の照合処理をスキップする。
【００７７】
特徴データベース中のすべてのデータについて、検索対象となるＡＶコンテンツとの相違を検証した後、本処理ルーチン全体を終了する。
【００７８】
このようにして照合の誤差を徐々に小さくしていくことで、最終的に特徴が近いコンテンツを高速に探索することができる。ステップＳ５８までの工程では、明らかに相違する訳でないコンテンツがただ１つに絞り込まれた場合には、ステップＳ５９以降の工程を敢えて実行する必要はない。
【００７９】
また、被検信号に歪が多いことが想定される場合には、ステップＳ５３〜Ｓ５８の工程において、違う程度を徐々に小さくして何回も繰り返すことで、誤差の最も小さいコンテンツを詳細に探索することができる。
【００８０】
以上、検索性能の改善について述べてきたが、最後に本発明に係る情報検索の応用例について説明しておく。
【００８１】
（１）まず第１の応用例は、放送の監視である。放送番組のスポンサと放送事業者との間では、一般に、広告の挿入に応じてスポンサから放送事業者に相当額（スポンサ料）を支払うという事業モデルが構築されている。
【００８２】
しかしながら、約束通りに広告が挿入されない可能性があるので、スポンサは実際の放送を監視する必要がある。このような場合、あらかじめ広告の特徴量をデータベースに登録しておき、放送を受信し放送コンテンツから特徴抽出してデータベースと照合する。
【００８３】
（２）さらに、不当に放送コンテンツが複製され放送に利用されていないかを、番組を制作した著作権者が監視するというアプリケーションを挙げることができる。
【００８４】
テープなどの記録メディアで複製又はその一部を持っている場合、アーカイブの中からその本編を探し出すという目的に、本発明に係る情報検索手法を使用することができる。
【００８５】
番組制作のための撮影時に、特徴抽出を同時に行い、特徴量のデータベースを作っておくことも賢明な方法である。この場合、カメラや撮影者の識別コード、日時、ＧＰＳ（Global Positioning System）などを用いた緯度・経度などのデータも対応させて登録しておけば、さらに便利である。
【００８６】
（３）家庭においても、ＶＴＲ（Video Tape Recorder）やディスク・レコーダで記録されたコンテンツから特徴抽出を行い、視聴履歴から関連する番組をお薦め版としてユーザに通知したり、自動録画したりするサービスも考えられる。特徴抽出は、複数のＶＴＲ、ＤＶＤ（Digital Versatile Disc）プレーヤやその他の記録再生機器、チューナなどすべてのコンテンツが集まる受像機間で行うことで、合理化を図ることができる。
【００８７】
ユーザの視聴履歴から関連する番組をお薦め版としてユーザに教えるようなサービスでは、視聴者別のサービスなので、視聴者を特定するために受信機器の識別コードが必要となる。
【００８８】
さらに、家庭内の個人を特定するには、リモート・コマンダを個人毎に占有し、その識別コードを使用する。これらの識別コードも、サービス供給者に送る必要がある。受信時には、時刻やチャネルが判ると検索が容易になる。また、信号の品質（受信時、再生時、再生のモード（例えば標準と３倍））が判ると、これをデータベース側に送り、照合時の許容度を適応的に制御することによって、検索速度又は検索の確実度を向上させることができる。
【００８９】
［追補］
以上、特定の実施例を参照しながら、本発明について詳解してきた。しかしながら、本発明の要旨を逸脱しない範囲で当業者が該実施例の修正や代用を成し得ることは自明である。すなわち、例示という形態で本発明を開示してきたのであり、限定的に解釈されるべきではない。本発明の要旨を判断するためには、冒頭に記載した特許請求の範囲の欄を参酌すべきである。
【００９０】
【発明の効果】
以上詳記したように、本発明によれば、映像と音楽の各データからなるＡＶコンテンツを高速且つ確実に検索することができる、優れた情報検索装置及び方法を提供することができる。
【００９１】
本発明に係る情報検索装置及び方法によれば、映像だけ、又は音声だけでＡＶコンテンツを検索するよりも迅速に検索することができる。また、系における歪を許容して検索する場合においても、より確実にＡＶコンテンツを検索することができる。
【図面の簡単な説明】
【図１】本発明の一実施形態に係る情報検索装置１の構成を模式的に示した図である。
【図２】情報検索装置１上で実行されるＡＶコンテンツの検索処理についての概略的な手順を示したフローチャートである。
【図３】情報検索装置１上で実行されるＡＶコンテンツの検索処理の具体的な手順の一例を示したフローチャートである。
【図４】情報検索装置１上で実行されるＡＶコンテンツの検索処理の具体的な手順の他の例を示したフローチャートである。
【図５】情報検索装置１上で実行されるＡＶコンテンツの検索処理の具体的な手順の他の例を示したフローチャートである。
【図６】ＩＳＲＣのような楽曲のユニーク・コードを利用した音楽検索のメカニズムを示した図（従来例）である。
【符号の説明】
１…情報検索装置
１１…ＣＰＵ
１２…メモリ（制御プログラム）
１３…メモリ（一致判別プログラム）
１４…特徴抽出部
１５…特徴データベース
１６…優先照合リスト
１７…映像・音声照合リスト
１８…情報入力部
１９…バッファ
２０…表示部
２１…ユーザ入力部
[0001]
[Technical field of the invention]
The present invention relates to an information search technique, and more particularly, to an information search technique for searching AV content including music and video data.
[0002]
More specifically, the present invention relates to an information search technique for AV content based on feature quantities of the original data, and in particular to one that searches for AV content using the feature quantities of both video and music.
[0003]
[Prior art]
In recent years, information processing technology based on computation on computers has improved dramatically. As a result, in addition to ordinary computer data, various kinds of data content such as video and audio are now digitized and handled as computer files. From the viewpoint of making effective use of information resources, search techniques for these various kinds of data content are therefore becoming increasingly important.
[0004]
In the field of content distribution and broadcasting, the number of times each piece of content has been distributed or broadcast must be counted or monitored, for example when a pay-per-use charging system is introduced for content usage or when content usage is policed for copyright protection. Information search techniques are applied, for example, to identify what content has just been distributed.
[0005]
For example, for a search over relatively small data in text format, an exhaustive search based on text matching can be performed by relying on the computing power of the computer. In contrast, AV content composed of video and music data is generally enormous in volume, so an exhaustive search from the beginning to the end of the content is technically impossible.
[0006]
One approach to music information retrieval transcribes the music data into a score and collates it against the music data to be searched (exhaustive search). In this method, pattern matching is performed from the beginning to the end of the piece. However, an exhaustive search that does not use an index has the drawback of being slow.
[0007]
For example, in music broadcasting, monitoring based on signal feature extraction, rather than comparison based on musical notes, is already in practice. That is, the feature quantities of each piece of music are registered in a database in advance, in association with a unique code for the piece such as its ISRC (International Standard Recording Code). Existing recording media such as CDs (Compact Discs) are used to collect the ISRCs and the music. A broadcast is then received, its feature quantities are compared against those registered in the database, and the ISRC of the received piece is identified (see FIG. 6).
[0008]
If the identity search used in music broadcasting is extended as is to AV content, features cannot be extracted from silent portions when the search is performed on audio. For example, movies contain considerably more silence than music-only content, so search based on feature extraction from music data has its limits.
[0009]
On the other hand, when searching for AV content using video alone, there are many portions in which the picture changes little, such as scenes in which characters are talking, so search based on feature extraction from video data also has its limits.
[0010]
Also, in the movie industry, the same movie is localized for the official language of each country or region where it is sold; that is, the audio is dubbed. As a result of dubbing, an audio-based content search will fail to match even for the same movie.
[0011]
[Problems to be solved by the invention]
An object of the present invention is to provide an excellent information retrieval apparatus and method capable of quickly and reliably retrieving AV content including video and music data.
[0012]
[Means and Actions for Solving the Problems]
The present invention has been made in view of the above problems. A first aspect of the present invention is an information search apparatus or method for searching AV content consisting of a combination of video data and audio data, comprising:
an audio database in which audio feature quantities are linked with audio identification codes;
a video database in which video feature quantities are linked with video identification codes;
an AV content table in which audio identification codes are linked with video identification codes;
audio data search means or step for searching for audio using the audio database;
video data search means or step for searching for video using the video database; and
AV content determination means or step for determining the corresponding AV content from the AV content table, based on the audio identification code and the video identification code extracted by the audio data search means or step and the video data search means or step.
[0013]
An information search apparatus or method according to the present invention prepares a database in which audio feature quantities are linked with audio identification codes, a database in which video feature quantities are linked with video identification codes, and an AV content table in which audio identification codes are linked with video identification codes. The video search and the audio search are then executed in parallel.
[0014]
For example, when the video search finds a match, its video identification code is looked up in the AV content table to retrieve the corresponding audio identification code. This code is passed to the audio search, and the search is complete if the collation succeeds. Similarly, when the audio search finds a match, its audio identification code is looked up in the AV content table to retrieve the corresponding video identification code. This code is passed to the video search, and the search is complete if the collation succeeds.
[0015]
When such a search is performed in a broadcasting system or the like, a certain amount of distortion in the system may have to be tolerated, such as the influence of multipath, distortion in the transmission and reception circuits, and distortion due to compression. In that case an exact-identity search will fail, so a certain tolerance must be allowed when collating feature quantities. Allowing a tolerance, in turn, means that multiple search results may appear for a single search target.
[0016]
Aiming for a faster search inevitably raises the likelihood of multiple search results. This can be resolved by searching on both video and audio to raise reliability.
[0017]
If the video search finds a match first, the video identification code is looked up in the AV content table and the resulting audio identification code is passed to the audio search for collation. Conversely, if the audio search finds a match first, the audio identification code is looked up in the AV content table and the resulting video identification code is passed to the video search for collation. The decision is made by checking that the video and audio results agree. When the video and audio searches each return multiple results, all combinations are evaluated and the result with the smallest error is selected.
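The selection among multiple tolerant matches can be sketched as follows. This is only an illustration under stated assumptions: the text does not specify the feature representation or the error measure, so feature quantities are modeled here as short numeric tuples, the error is taken to be Euclidean distance, and the combined error of a pair is assumed to be the sum of the two distances. The names `candidates` and `best_pair` are hypothetical.

```python
import math

def candidates(query, database, tolerance):
    """Return {code: distance} for every entry within `tolerance` of the query."""
    hits = {}
    for code, feature in database.items():
        d = math.dist(query, feature)  # Euclidean distance (assumed error measure)
        if d <= tolerance:
            hits[code] = d
    return hits

def best_pair(video_hits, audio_hits, av_pairs):
    """Evaluate all video/audio combinations registered in the AV content
    table and return the pair with the smallest combined error, or None."""
    best = None
    for v, dv in video_hits.items():
        for a, da in audio_hits.items():
            if (v, a) not in av_pairs:
                continue  # combination is not a registered AV content
            err = dv + da  # combined error (assumed to be a simple sum)
            if best is None or err < best[0]:
                best = (err, v, a)
    return None if best is None else (best[1], best[2])

# Hypothetical data: codes and feature values are made up for illustration.
VIDEO_DB = {4: (0.0, 0.0), 5: (1.0, 0.0)}
AUDIO_DB = {6: (0.0, 1.0), 7: (0.0, 2.0), 9: (5.0, 5.0)}
AV_PAIRS = {(4, 6), (4, 7), (5, 9)}

video_hits = candidates((0.2, 0.0), VIDEO_DB, tolerance=1.0)
audio_hits = candidates((0.0, 1.8), AUDIO_DB, tolerance=1.0)
result = best_pair(video_hits, audio_hits, AV_PAIRS)
```

With a tolerance this loose, both searches return more than one candidate, and the AV content table is what removes combinations that were never registered together.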
[0018]
The information search apparatus or method according to the first aspect of the present invention may further comprise content identification code acquisition means or step for the AV content to be searched, and the audio data search means or step and/or the video data search means or step may preferentially collate the AV content corresponding to that content identification code.
[0019]
If a mechanism is provided for obtaining information such as an identification code that identifies the content, and the code can be obtained without risk of tampering, signal collation is unnecessary. However, content copyright holders fear that users may tamper with playback or recording devices to generate false identification codes. Signal collation is therefore needed to check for tampering.
[0020]
In most cases no tampering has occurred, so collating the content corresponding to the content identification code first speeds up information retrieval for the system as a whole.
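A minimal sketch of this priority ordering is given below, assuming a flat database keyed by content identification code and a caller-supplied `matches` predicate; both are hypothetical stand-ins for the feature collation described in the text. The function also returns how many entries were collated, to make the saving visible.

```python
def identify(query_feature, claimed_id, database, matches):
    """Collate the claimed content first, then fall back to the rest.

    Returns (content_id or None, number of entries collated).
    """
    order = []
    if claimed_id in database:
        order.append(claimed_id)  # priority entry: the code supplied with the content
    order.extend(cid for cid in database if cid != claimed_id)
    for checked, cid in enumerate(order, start=1):
        if matches(query_feature, database[cid]):
            return cid, checked
    return None, len(order)

# Hypothetical database; integers stand in for feature quantities.
DB = {"A": 1, "B": 2, "C": 3}

def exact(query, feature):
    return query == feature

honest = identify(3, "C", DB, exact)    # claimed code is correct
tampered = identify(3, "A", DB, exact)  # claimed code is false
```

When the supplied code is correct, one collation suffices; when it is false, the signal collation still finds the true content at the cost of a full scan.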
[0021]
Further, the audio data search means or step and / or the video data search means or step may omit the collation processing for clearly different AV contents.
[0022]
For AV content that is readily seen to be clearly different, neither the audio data search nor the video data search needs to be performed. Omitting these steps reduces the computational load and allows highly accurate collation, with a smaller error tolerance, to be performed on the more similar content. In addition, because unnecessary collation against clearly different content is skipped, information retrieval for the system as a whole becomes faster.
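The two-stage idea can be sketched as a cheap screening pass followed by a precise pass over the survivors. The distance functions below are assumptions chosen for illustration, not measures named in the text: the coarse check compares only the mean of each feature vector, and the fine check uses the mean absolute difference. Because the coarse value never exceeds the fine value, a coarse tolerance at least as large as the fine tolerance cannot discard an entry that the fine pass would have accepted.

```python
from statistics import fmean

def coarse_distance(a, b):
    """Cheap check: difference of the vector means only."""
    return abs(fmean(a) - fmean(b))

def fine_distance(a, b):
    """Precise check: mean absolute element-wise difference."""
    return fmean([abs(x - y) for x, y in zip(a, b)])

def coarse_to_fine(query, database, coarse_tol, fine_tol):
    """Return (matching codes, codes skipped as clearly different)."""
    skipped = set()
    for cid, feature in database.items():
        if coarse_distance(query, feature) > coarse_tol:
            skipped.add(cid)  # clearly different: no precise collation needed
    survivors = [cid for cid in database if cid not in skipped]
    matches = [cid for cid in survivors
               if fine_distance(query, database[cid]) <= fine_tol]
    return matches, skipped

# Hypothetical feature vectors.
DB = {
    "x": [0.0, 0.0, 0.0, 0.0],
    "y": [0.1, 0.0, 0.1, 0.0],
    "z": [5.0, 5.0, 5.0, 5.0],
}
result = coarse_to_fine([0.0, 0.0, 0.0, 0.0], DB, coarse_tol=1.0, fine_tol=0.01)
```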
[0023]
A second aspect of the present invention is a storage medium that physically stores, in a computer-readable format, computer software written to execute on a computer system an information retrieval process for retrieving AV content consisting of a combination of video data and audio data, wherein the computer software comprises:
an audio data search step of searching for audio using an audio database in which audio feature quantities are linked with audio identification codes;
a video data search step of searching for video using a video database in which video feature quantities are linked with video identification codes; and
an AV content determination step of determining the corresponding AV content from an AV content table in which audio identification codes are linked with video identification codes, based on the audio identification code and the video identification code extracted by the audio data search step and the video data search step.
[0024]
The storage medium according to the second aspect of the present invention is, for example, a medium that provides computer software in a computer-readable format to a general-purpose computer system that can execute various program codes. Such a medium is a detachable and portable storage medium such as a CD (Compact Disc), an FD (Floppy Disk), or an MO (Magneto-Optical disc). Alternatively, it is technically possible to provide computer software to a specific computer system via a transmission medium such as a network (whether the network is wireless or wired).
[0025]
Such a storage medium defines a structural or functional cooperative relationship between the computer software and the storage medium for realizing the functions of predetermined computer software on a computer system. In other words, by installing the predetermined computer software on a computer system via the storage medium according to the second aspect of the present invention, a cooperative action is exhibited on the computer system, and the same effects as those of the information search apparatus and method according to the first aspect of the present invention can be obtained.
[0026]
Other objects, features, and advantages of the present invention will become apparent from a more detailed description based on embodiments of the present invention described later and the accompanying drawings.
[0027]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
[0028]
FIG. 1 schematically shows the configuration of an information search apparatus 1 according to an embodiment of the present invention. Each component of the information search apparatus 1 is described below with reference to that figure.
[0029]
A CPU (Central Processing Unit) 11 is a central controller that controls the overall operation of the information search apparatus 1. Under the program execution environment provided by an operating system (OS), it executes the control program stored in (or loaded into) the memory 12 and the match determination program stored in (or loaded into) another memory 13.
[0030]
The match determination program mentioned here includes collation of the video database based on video feature values, collation of the audio database based on audio feature values, and collation of the AV content table based on the collation results of the video and audio databases. The control program includes an information search processing program for AV content, such as processing that identifies AV content based on the match determination result.
[0031]
The CPU 11 is interconnected with each module in the apparatus 1 via the bus 22. The bus 22 is a common signal transmission path including an address bus, a data bus, and a control bus.
[0032]
The video / audio feature extraction unit 14 performs demultiplex processing from AV content to video and audio data, video data feature extraction processing, and audio data feature extraction processing. The video and audio feature values extracted by the video / audio feature extraction unit 14 are input to the match determination program.
[0033]
The feature database 15 consists of the databases used for collation and information search processing of AV content. More specifically, the feature database 15 consists of an audio database in which audio feature quantities are linked with audio identification codes, a video database in which video feature quantities are linked with video identification codes, and an AV content table in which audio identification codes are linked with video identification codes. A configuration example of the AV content table for movies is shown below.
[0034]
[Table 1]

[0035]
In the AV content table shown above, each record consists of a combination of a movie name, a video identification code, an audio identification code, and an AV content identification code. Accordingly, once a video identification code is known, the audio identification codes that match it can be narrowed down; conversely, once an audio identification code is known, the video identification codes that match it can be narrowed down. When both the video identification code and the audio identification code are specified, the AV content identification code, that is, the AV content, is uniquely determined by referring to the AV content table.
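The three lookups described for the AV content table can be sketched with plain dictionaries. The rows below are hypothetical placeholders, not the contents of Table 1; only the record layout (movie name, video identification code, audio identification code, AV content identification code) comes from the text. The names `AVRecord` and `build_indexes` are likewise assumptions.

```python
from collections import defaultdict
from typing import NamedTuple

class AVRecord(NamedTuple):
    title: str
    video_id: int
    audio_id: int
    av_content_id: str

# Hypothetical rows: one picture with three audio tracks, plus a second movie.
AV_TABLE = [
    AVRecord("Movie A (audio track 1)", 4, 6, "4-6"),
    AVRecord("Movie A (audio track 2)", 4, 7, "4-7"),
    AVRecord("Movie A (audio track 3)", 4, 8, "4-8"),
    AVRecord("Movie B", 5, 9, "5-9"),
]

def build_indexes(table):
    """Index the table by video code, by audio code, and by the pair."""
    by_video = defaultdict(set)  # video code -> candidate audio codes
    by_audio = defaultdict(set)  # audio code -> candidate video codes
    by_pair = {}                 # (video code, audio code) -> AV content code
    for rec in table:
        by_video[rec.video_id].add(rec.audio_id)
        by_audio[rec.audio_id].add(rec.video_id)
        by_pair[(rec.video_id, rec.audio_id)] = rec.av_content_id
    return by_video, by_audio, by_pair

by_video, by_audio, by_pair = build_indexes(AV_TABLE)
```

A known video code narrows the audio candidates (`by_video[4]`), a known audio code narrows the video candidates (`by_audio[9]`), and the pair selects a single AV content code (`by_pair[(4, 7)]`).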
[0036]
The priority collation list 16 and the video / audio collation list 17 are used for temporarily storing a part of work data when performing information retrieval, that is, collation processing of AV contents.
[0037]
The priority collation list 16 is a storage module for listing AV content that should be collated preferentially. For example, AV content for which a content identification code has been passed in from outside is registered in the priority collation list 16 so that processing can be sped up by limiting the collation and search range with this identification code.
[0038]
The video / audio collation list 17 is used to temporarily store the database collation results for each item of video data and audio data. For example, if the video data matches but the corresponding audio data does not, further collation of that video data is unnecessary, so a note to that effect is entered in the video collation list. Conversely, if the audio data matches but the corresponding video data does not, further collation of that audio data is unnecessary, so a note to that effect is entered in the audio collation list. Likewise, if the audio is clearly different, a note to that effect is entered in the audio collation list so that high-precision collation is skipped, and if the video is clearly different, a note to that effect is entered in the video collation list so that collation is skipped.
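One way to picture the collation lists is as two sets of codes that later passes consult before doing any work. The sketch below is an assumption about a possible representation; the text does not state how the lists are stored, and the class and method names are hypothetical.

```python
class CollationLists:
    """Working notes on which video and audio entries need no further collation."""

    def __init__(self):
        self.video_excluded = set()  # video codes noted as not matching
        self.audio_excluded = set()  # audio codes noted as not matching

    def exclude_video(self, video_id):
        self.video_excluded.add(video_id)

    def exclude_audio(self, audio_id):
        self.audio_excluded.add(audio_id)

    def pending(self, pairs):
        """Keep only the (video, audio) pairs in which neither side is excluded."""
        return [(v, a) for v, a in pairs
                if v not in self.video_excluded
                and a not in self.audio_excluded]

# Hypothetical (video code, audio code) pairs from an AV content table.
PAIRS = [(4, 6), (4, 7), (5, 9)]

lists = CollationLists()
lists.exclude_audio(6)  # audio 6 was found to be clearly different
remaining = lists.pending(PAIRS)
```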
[0039]
The information input unit 18 is a functional module that inputs information from an external device. The information input here includes AV content in which video data and audio data are multiplexed, the content identification code, and the like. Examples of the medium for inputting and outputting information include a computer network such as a LAN (Local Area Network) and the Internet, and a portable recording medium such as a CD (Compact Disc) and a DVD (Digital Versatile Disc). Information supplied from an external device via the information input unit 18 is temporarily stored in, for example, the buffer memory 19.
[0040]
The display unit 20 outputs the calculation result by the CPU 11 to the screen and provides visual feedback of information to the user. The user input unit 21 is configured by an input device such as a keyboard and a mouse (both not shown), for example, and accepts input of commands and data from the user.
[0041]
The information retrieval apparatus 1 can be configured using, for example, a general computer system. An example of such a computer system is IBM's PC / AT (Personal Computer / Advanced Technology) compatible machine or its successor.
[0042]
FIG. 2 shows a schematic procedure of AV content search processing executed on the information search apparatus 1 in the form of a flowchart. The flow of AV content search processing according to this embodiment will be schematically described below with reference to this flowchart.
[0043]
First, AV content to be searched is demultiplexed and separated into a video signal and an audio signal (step S1). Although not shown, the video signal and the audio signal are compressed and decompressed as necessary.
[0044]
Next, feature amounts are extracted from the video signal and the audio signal, respectively, and collated with the data registered in advance in the feature database 15 to extract identification codes. The voice collation in step S2A and the video collation in step S2B are performed in parallel. For each extracted identification code, the AV content table (described above) in the feature database 15 is referred to: for an audio identification code, the video identification code linked to it is extracted (step S3A), and for a video identification code, the audio identification code linked to it is extracted (step S3B).
[0045]
The extracted voice identification code is passed to voice collation 2 (step S4A), where the feature quantity corresponding to the identification code is collated with the voice database to confirm or select the content identification code of the corresponding voice. Similarly, the extracted video identification code is passed to video collation 2 (step S4B), where the feature quantity corresponding to the identification code is collated with the video database to confirm or select the content identification code of the corresponding video.
[0046]
By incorporating decision logic (step S5) that adopts whichever content identification code is obtained first, from voice collation 2 or from video collation 2, the identification of AV content can be sped up.
[0047]
When the AV content table shown in [Table 1] is used and, for example, the video identification code (4) is extracted by video collation, referring to the AV content table yields the corresponding audio identification codes (6, 7, 8). These voice identification codes are input to voice collation 2, and the code that matches there determines the AV content identification code (for example, 4-7).
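The lookup in this example can be sketched as follows. This is a minimal illustration only: the table rows other than those for video code 4, the string "features", and the function names are hypothetical, and exact equality stands in for real feature collation.

```python
# Hypothetical AV content table as (video_code, audio_code) pairs.
# Only the rows for video code 4 follow the example in the text.
AV_CONTENT_TABLE = [(1, 1), (2, 2), (4, 6), (4, 7), (4, 8)]


def audio_codes_for_video(video_code):
    """Audio identification codes linked to a video identification code."""
    return [a for v, a in AV_CONTENT_TABLE if v == video_code]


def voice_collation_2(candidates, audio_db, query_feature):
    """Return the first candidate audio code whose stored feature equals the query."""
    for code in candidates:
        if audio_db.get(code) == query_feature:
            return code
    return None


# Toy audio database: audio identification code -> stored feature.
audio_db = {1: "fa1", 2: "fa2", 6: "fa6", 7: "fa7", 8: "fa8"}

candidates = audio_codes_for_video(4)                      # codes linked to video 4
matched = voice_collation_2(candidates, audio_db, "fa7")   # query audio feature
content_code = f"4-{matched}" if matched is not None else None
```

Because voice collation 2 only examines the candidates returned by the table, the audio database is searched over three entries rather than in full.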
[0048]
To cope with distortions such as transmission-system distortion, multipath during transmission, and distortion during compression and recording / playback, a certain degree of tolerance must be allowed at matching time. As a result, two contents may correspond to a single identification code at the time of collation. In such a case, reliability can be improved by configuring the decision logic to adopt only an identification code on which voice collation 2 and video collation 2 agree.
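The two configurations of the decision logic (adopt the earliest result, or adopt only a result on which both collations agree) might be sketched as below. The function names and the stand-in collators are assumptions for illustration; the text does not specify how the decision logic is implemented.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def decide_first(collators):
    """Run the collators concurrently and return the first non-None content code."""
    with ThreadPoolExecutor(max_workers=len(collators)) as pool:
        futures = [pool.submit(fn) for fn in collators]
        for future in as_completed(futures):
            code = future.result()
            if code is not None:
                # The executor still waits for the slower collator on exit.
                return code
    return None


def decide_agreeing(voice_codes, video_codes):
    """Keep only content codes confirmed by both second-stage collations."""
    return sorted(set(voice_codes) & set(video_codes))


def slow_voice_collation_2():
    time.sleep(0.2)  # stand-in for a slower collation
    return "4-7"


def fast_video_collation_2():
    return "4-7"


first = decide_first([slow_voice_collation_2, fast_video_collation_2])
agreed = decide_agreeing(["4-7", "4-8"], ["4-7"])
```

The first form favors speed; the second trades speed for reliability when tolerant matching yields more than one candidate.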
[0049]
In the flowchart shown in FIG. 2, a path indicated by a dotted line may also be provided so that, when no identification code is obtained from voice collation 2 or video collation 2, the identification code from the video collation or the voice collation is used directly. In that case, a message such as "The sound does not seem normal, but the identification code can be read from the video as XXXXXX" or "The video does not seem normal, but the identification code can be read from the sound as XXXXXX" may be output on the display unit 20.
[0050]
FIG. 3 shows an example of a specific procedure of AV content search processing executed on the information search apparatus 1 according to the present embodiment in the form of a flowchart. The processing procedure shown in the figure is an example in the case where no error is allowed in collation.
[0051]
First, AV content to be searched is demultiplexed and separated into a video signal and an audio signal (step S11). Although not shown, the video signal and the audio signal are compressed and decompressed as necessary.
[0052]
Next, feature quantities are extracted from the video signal and the audio signal, respectively (step S12). Then, the feature values of the video and audio are compared with each data registered in advance in the feature database 15 (step S13), and it is determined whether or not they match (step S14). Video and audio collation speeds vary depending on the content. In the present embodiment, the video and audio collation processes are executed concurrently at best effort.
[0053]
If no match is found for both video and audio (step S14), the process proceeds to the next content in the feature database (step S20), and video and audio feature values are similarly collated. If no matching content is found in the feature database, a display indicating that there is no corresponding content is output from the display unit 20 to notify the user (step S21), and the entire processing routine is terminated.
[0054]
If matching data is found for the video (step S15), it is determined, with reference to the audio collation list, whether or not there is a match for the audio corresponding to the video data (step S18). If there is no match for the corresponding audio, after recording that the video does not match in the video collation list (step S19), the process returns to step S13, jumps to the content of the corresponding video, and the video is compared with the feature database.
[0055]
If matching data is found for the audio (step S15), it is determined, with reference to the video collation list, whether or not there is a match for the video corresponding to the audio data (step S16). If there is no match for the corresponding video, after recording that the voice does not match in the voice collation list (step S17), the process returns to step S13, jumps to the content of the corresponding voice, and the voice is compared with the feature database.
[0056]
As a result of the determination block S16 or S18, when a matching AV content is found for video and audio, the content identification code is output as a search result, and the entire processing routine is terminated.
[0057]
FIG. 4 shows another example of a specific procedure of AV content search processing executed on the information search apparatus 1 according to the present embodiment in the form of a flowchart. The processing procedure shown in the figure is an example in the case where a search is performed using a content identification code given by some means and no error is allowed in collation.
[0058]
First, the content identification code of the AV content to be searched is acquired by some means (step S31) and registered in the priority collation list 16 (step S32). The information search apparatus 1 according to the present embodiment preferentially collates the AV content registered in the priority collation list 16 when, for example, the AV content as a search target is continuously input.
[0059]
Then, the AV content corresponding to the highest-priority content identification code registered in the priority collation list 16 is demultiplexed and separated into a video signal and an audio signal (step S33). Although not shown, the video signal and the audio signal are compressed and decompressed as necessary.
[0060]
Next, feature amounts are extracted from the video signal and the audio signal, respectively (step S34). Then, the feature values of the video and audio are compared with each data registered in advance in the feature database 15 (step S35), and it is determined whether or not they match (step S36). Video and audio collation speeds vary depending on the content. In the present embodiment, the video and audio collation processes are executed concurrently at best effort.
[0061]
If no match is found for both video and audio (step S36), the process proceeds to the next content in the feature database (step S42), and video / audio feature values are similarly collated. If no matching content is found in the feature database, a display indicating that there is no corresponding content is output from the display unit 20 to notify the user (step S43), and the entire processing routine is terminated.
[0062]
If matching data is found for the video (step S37), it is determined, with reference to the audio collation list, whether or not there is a match for the audio corresponding to the video data (step S40). If there is no match for the corresponding audio, after recording that the video does not match in the video collation list (step S41), the process returns to step S35, jumps to the content of the corresponding video, and the video is compared with the feature database.
[0063]
If matching data is found for the audio (step S37), it is determined, with reference to the video collation list, whether or not there is a match for the video corresponding to the audio data (step S38). If there is no match for the corresponding video, after recording that the voice does not match in the voice collation list (step S39), the process returns to step S35, jumps to the content of the corresponding voice, and the voice is compared with the feature database.
[0064]
As a result of the determination block S38 or S40, if a matching AV content is found for video and audio, the content identification code is output as a search result, and the entire processing routine is terminated.
[0065]
If the content ID of the AV content to be searched can be obtained without fear of falsification, signal verification is not necessary. However, the copyright holder of the content may fear that a false ID is created by modifying the player or the recorder. Signal verification is therefore necessary in order to detect tampering. In many cases, however, the content has not been falsified. Therefore, by preferentially collating the contents in the priority collation list 16 as described above and terminating the content search as soon as they match, the search can be sped up.
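The benefit of the priority collation list can be illustrated with a small sketch that counts how many collations are needed. The database layout, the string features, and the function name `search_with_priority` are hypothetical; exact equality again stands in for real signal collation.

```python
def search_with_priority(query, database, priority_list):
    """Collate priority-listed contents first; return (content_code, collations_used)."""
    ordered = [code for code in priority_list if code in database]
    ordered += [code for code in database if code not in priority_list]
    video_q, audio_q = query
    for count, code in enumerate(ordered, start=1):
        video_s, audio_s = database[code]
        if video_q == video_s and audio_q == audio_s:
            return code, count
    return None, len(ordered)


# Toy feature database: content code -> (video feature, audio feature).
database = {
    "A": ("va", "aa"),
    "B": ("vb", "ab"),
    "C": ("vc", "ac"),
}
query = ("vc", "ac")  # the signal actually received is content C

honest = search_with_priority(query, database, priority_list=["C"])
no_hint = search_with_priority(query, database, priority_list=[])
false_id = search_with_priority(query, database, priority_list=["A"])
```

When the supplied identification code is genuine, one collation suffices; when it is false, the signal still decides the result and the search simply falls back to the rest of the database.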
[0066]
FIG. 5 shows another example of a specific procedure of AV content search processing executed on the information search apparatus 1 according to the present embodiment in the form of a flowchart. The processing procedure shown in the figure is an example in which speed-up is performed by excluding the AV content registered in the feature database that is clearly different from the AV content to be searched, and performing collation processing.
[0067]
First, AV content to be searched is demultiplexed and separated into a video signal and an audio signal (step S51). Although not shown, the video signal and the audio signal are compressed and decompressed as necessary.
[0068]
Next, feature quantities are extracted from the video signal and the audio signal, respectively (step S52). Then, the feature values of the video and audio are collated with each data registered in advance in the feature database 15 (step S53), and it is determined whether or not the data is clearly different from the AV content to be searched (step S54). Video and audio collation speeds vary depending on the content. In the present embodiment, the video and audio collation processes are executed concurrently at best effort. Here, since the purpose is only to verify that the content differs, the collation in step S53 may tolerate a relatively large error.
[0069]
If it is determined in step S54 that there is a clear difference, it is further determined which of the video and audio is clearly different (step S55).
[0070]
If there is a clear difference regarding the video, after recording that the video does not match in the video collation list (step S56), the process returns to step S53 to skip the matching process of the corresponding video.
[0071]
If the voice is clearly different, after recording that the voice does not match in the voice collation list (step S57), the process returns to step S53, and the corresponding voice collation processing is skipped.
[0072]
If it is not clearly different from the AV content to be searched, the process proceeds to the next content in the feature database (step S58), and the process of determining whether or not that content is clearly different from the AV content to be searched is repeated in the same way.
[0073]
After the difference from the AV content to be searched has been verified for all the data in the feature database, the feature values of video and audio are collated with each data registered in advance in the feature database 15 (step S59), and it is determined whether or not the data is the AV content to be searched (step S60). Video and audio collation speeds vary depending on the content. In the present embodiment, the video and audio collation processes are executed concurrently at best effort. Here, since the purpose is to narrow down the data, the collation in step S59 tolerates a smaller error than the preceding step S53.
[0074]
If it is determined in step S60 that they are different, it is further determined which of video and audio is different (step S61).
[0075]
If there is a difference between the videos, after recording that the videos do not match in the video collation list (step S62), the process returns to step S59, and the corresponding video collation processing is skipped.
[0076]
If the voices are different, after recording that the voices do not match in the voice collation list (step S63), the process returns to step S59, and the corresponding voice collation process is skipped.
[0077]
After all the data in the feature database are verified for differences from the AV content to be searched, the entire processing routine is terminated.
[0078]
In this way, by gradually reducing the error tolerated in collation, content having similar features can be searched at high speed. If the candidates that are not clearly different have been narrowed down to a single content by the end of step S58, the processing from step S59 onward need not be executed.
[0079]
If the signal under test is assumed to be distorted, steps S53 to S58 may be repeated many times while gradually reducing the tolerated degree of difference, so that the content with the smallest error can be searched for in detail.
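The idea of narrowing candidates with a progressively smaller tolerance can be sketched as follows. The feature vectors, the distance measure (maximum absolute difference), and the tolerance values are all assumptions chosen for illustration; the text does not prescribe them.

```python
def distance(a, b):
    """Largest absolute component difference between two feature vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


def coarse_to_fine(query, database, tolerances):
    """Narrow the candidate set with tolerances given in decreasing order."""
    video_q, audio_q = query
    candidates = list(database)
    for tol in tolerances:
        candidates = [
            code for code in candidates
            if distance(video_q, database[code][0]) <= tol
            and distance(audio_q, database[code][1]) <= tol
        ]
        if len(candidates) <= 1:
            break  # a single candidate (or none) needs no finer pass
    return candidates


# Toy database: content code -> (video feature, audio feature).
database = {
    "A": ((0.10, 0.20), (0.50, 0.60)),
    "B": ((0.12, 0.22), (0.52, 0.61)),
    "C": ((0.90, 0.80), (0.10, 0.05)),
}
query = ((0.11, 0.20), (0.50, 0.60))  # slightly distorted copy of A

after_coarse = coarse_to_fine(query, database, [0.5])
final = coarse_to_fine(query, database, [0.5, 0.05, 0.015])
```

The coarse pass only removes content that is clearly different (C); the finer passes then separate the remaining near neighbors (A and B).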
[0080]
The improvement of search performance has been described above. Finally, an application example of information search according to the present invention will be described.
[0081]
(1) A first application example is broadcast monitoring. In general, a business model is established between the sponsor of a broadcast program and the broadcaster in which the sponsor pays the broadcaster consideration (a sponsor fee) in return for the insertion of advertisements.
[0082]
However, the sponsor needs to monitor the actual broadcast because there is a possibility that the advertisement will not be inserted as promised. In such a case, the feature amount of the advertisement is registered in the database in advance, the broadcast is received, the feature is extracted from the broadcast content, and collated with the database.
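Such monitoring could be sketched as counting how often each registered advertisement is found in the received broadcast. The advertisement codes, the string features, and the function name `monitor` are hypothetical, and a dictionary lookup stands in for feature collation.

```python
from collections import Counter


def monitor(broadcast_segments, ad_database):
    """Count how many times each registered advertisement was broadcast."""
    aired = Counter()
    for feature in broadcast_segments:
        code = ad_database.get(feature)  # stand-in for collation with the database
        if code is not None:
            aired[code] += 1
    return aired


# Hypothetical database: advertisement feature -> advertisement code.
ad_database = {"f_ad1": "AD-1", "f_ad2": "AD-2", "f_ad3": "AD-3"}
received = ["f_news", "f_ad1", "f_ad2", "f_drama", "f_ad1"]

aired = monitor(received, ad_database)
not_aired = [code for code in ad_database.values() if aired[code] == 0]
```

Comparing the counts with the agreed number of insertions shows the sponsor which advertisements were not broadcast as promised.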
[0083]
(2) Further, there can be mentioned an application in which the copyright holder who produced the program monitors whether the broadcast content is illegally copied and used for broadcasting.
[0084]
When a recording medium such as a tape has a copy or a part thereof, the information search method according to the present invention can be used for the purpose of searching the main part from the archive.
[0085]
It is also advisable to create the feature database by extracting features at the same time as shooting for program production. In this case, it is more convenient if data such as the identification codes of the camera and the photographer, the date and time, and the latitude / longitude obtained by GPS (Global Positioning System) are also registered.
[0086]
(3) It is also possible to provide a service that extracts features from content recorded by a VTR (Video Tape Recorder) or a disc recorder and, based on the viewing history, notifies the user of related programs as recommendations or automatically records them at home. Feature extraction can be performed efficiently in the receiver, where the content from a plurality of VTRs, DVD (Digital Versatile Disc) players, other recording / reproducing devices, and tuners all converges.
[0087]
A service that presents recommended programs to the user on the basis of the user's viewing history is a per-viewer service, and therefore the identification code of the receiving device is required to identify the viewer.
[0088]
Further, in order to identify an individual within the home, each individual is assigned his or her own remote commander, and its identification code is used. These identification codes also need to be sent to the service provider. For received broadcasts, the search is easy if the time and channel are known. Also, if the signal quality (reception, playback, playback mode (for example, 3 times the standard)) is known, it can be sent to the database side so that the tolerance at the time of collation is controlled adaptively, thereby adjusting the search speed or improving the certainty of the search.
[0089]
[Supplement]
The present invention has been described in detail above with reference to specific embodiments. However, it is obvious that those skilled in the art can make modifications and substitutions of the embodiments without departing from the gist of the present invention. In other words, the present invention has been disclosed in the form of exemplification, and should not be interpreted in a limited manner. In order to determine the gist of the present invention, the claims section described at the beginning should be considered.
[0090]
[Effects of the invention]
As described above in detail, according to the present invention, it is possible to provide an excellent information search apparatus and method capable of searching AV content composed of video and music data at high speed and with certainty.
[0091]
According to the information search apparatus and method of the present invention, it is possible to search more quickly than searching AV contents only with video or audio. Also, AV content can be searched more reliably even when searching while allowing distortion in the system.
[Brief description of the drawings]
FIG. 1 is a diagram schematically showing a configuration of an information search apparatus 1 according to an embodiment of the present invention.
FIG. 2 is a flowchart showing a schematic procedure for AV content search processing executed on the information search apparatus 1;
FIG. 3 is a flowchart showing an example of a specific procedure of AV content search processing executed on the information search apparatus 1;
FIG. 4 is a flowchart showing another example of a specific procedure of AV content search processing executed on the information search apparatus 1;
FIG. 5 is a flowchart showing another example of a specific procedure of AV content search processing executed on the information search apparatus 1;
FIG. 6 is a diagram (conventional example) showing a music search mechanism using a unique code of music such as ISRC.
[Explanation of symbols]
1 ... Information retrieval apparatus
11 ... CPU
12 ... Memory (control program)
13 ... Memory (coincidence determination program)
14 ... Feature extraction unit
15 ... Feature database
16 ... Priority collation list
17 ... Video / audio collation list
18 ... Information input unit
19 ... Buffer memory
20 ... Display unit
21 ... User input unit

Claims

An information search device for searching AV content comprising a combination of video data and audio data,
An audio database in which audio features of each AV content and audio identification codes are linked;
A video database in which video feature quantities and video identification codes of each AV content are linked;
An AV content table in which the audio identification code and the video identification code of each AV content are linked;
Separating means for separating AV content to be searched into a video signal and an audio signal;
First voice data collating means for collating a feature amount extracted from the separated voice signal with the voice database and extracting a corresponding voice identification code;
First video data collating means for collating, in parallel with the first voice data collating means, the feature quantity extracted from the separated video signal with the video database and extracting a corresponding video identification code;
AV content table reference means for referring to the AV content table to extract a video identification code linked to the voice identification code extracted by the first voice data collating means, and to extract a voice identification code linked to the video identification code extracted by the first video data collating means;
Second voice data collating means for confirming whether the voice identification code extracted by the AV content table reference means matches the voice identification code that the first voice data collating means extracted by collating with the voice database;
Second video data collating means for confirming whether the video identification code extracted by the AV content table reference means matches the video identification code that the first video data collating means extracted by collating with the video database;
AV content determining means for determining a corresponding AV content from the AV content table based on the confirmation result obtained earlier from the second voice data collating means or the second video data collating means;
An information retrieval apparatus comprising:

It further comprises a content identification code acquisition means for AV content to be searched,
The AV content corresponding to the content identification code is preferentially verified.
The information retrieval apparatus according to claim 1.

An information retrieval method for retrieving AV content comprising a combination of video data and audio data using a computer, wherein the computer includes an audio database in which audio feature amounts and audio identification codes of each AV content are linked, and each AV content A video database in which video feature quantities and video identification codes are linked, and an AV content table in which audio identification codes and video identification codes of each AV content are linked.
A separation step in which separating means provided in the computer separates the AV content to be searched into a video signal and an audio signal;
A first voice data collation step in which first voice data collating means provided in the computer collates the feature amount extracted from the separated voice signal with the voice database and extracts a corresponding voice identification code;
A first video data collation step in which, in parallel with the first voice data collation step, first video data collating means provided in the computer collates the feature amount extracted from the separated video signal with the video database and extracts a corresponding video identification code;
An AV content table reference step in which AV content table reference means provided in the computer refers to the AV content table to extract a video identification code linked to the voice identification code extracted in the first voice data collation step, and to extract a voice identification code linked to the video identification code extracted in the first video data collation step;
A second voice data collation step in which second voice data collating means provided in the computer confirms whether or not the voice identification code extracted in the AV content table reference step matches the voice identification code extracted by collating with the voice database in the first voice data collation step;
A second video data collation step in which second video data collating means provided in the computer confirms whether or not the video identification code extracted in the AV content table reference step matches the video identification code extracted by collating with the video database in the first video data collation step;
An AV content determination step in which AV content determination means provided in the computer determines a corresponding AV content from the AV content table based on the confirmation result obtained earlier in the second voice data collation step or the second video data collation step;
A method for retrieving information, comprising:

The content identification code acquisition means provided in the computer further includes a content identification code acquisition step of AV content to be searched,
The AV content corresponding to the content identification code is preferentially verified.
The information search method according to claim 3.

A storage medium that physically stores in a computer-readable format computer software written to execute an information search process for searching AV content comprising a combination of video data and audio data on a computer,
The computer includes an audio database in which audio feature values and audio identification codes of each AV content are linked, a video database in which video feature values and video identification codes of each AV content are linked, and an AV content table in which the audio identification code and the video identification code of each AV content are linked;
The computer software causes the computer to execute:
A separation step of separating the AV content to be searched into a video signal and an audio signal;
A first voice data collation step of collating the feature amount extracted from the separated voice signal with the voice database and extracting a corresponding voice identification code;
A first video data collation step of collating, in parallel with the first voice data collation step, the feature amount extracted from the separated video signal with the video database and extracting a corresponding video identification code;
An AV content table reference step of referring to the AV content table to extract a video identification code linked to the voice identification code extracted in the first voice data collation step, and to extract a voice identification code linked to the video identification code extracted in the first video data collation step;
A second voice data collation step of confirming whether or not the voice identification code extracted in the AV content table reference step matches the voice identification code extracted by collating with the voice database in the first voice data collation step;
A second video data collation step of confirming whether or not the video identification code extracted in the AV content table reference step matches the video identification code extracted by collating with the video database in the first video data collation step;
An AV content determination step of determining a corresponding AV content from the AV content table based on the confirmation result obtained earlier in the second voice data collation step or the second video data collation step;
A storage medium characterized in that