JP2004080094A - Information-processing apparatus, information-processing method and program, and computer-readable recording medium

Info

Publication number: JP2004080094A
Application number: JP2002233839A
Authority: JP
Inventors: Mitsuru Maeda; 前田　充
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2002-08-09
Filing date: 2002-08-09
Publication date: 2004-03-11

Abstract

PROBLEM TO BE SOLVED: To provide an information-processing apparatus, an information-processing method, a program, and a computer-readable recording medium capable of preventing falsification that destroys the identity between video data and audio data, without the need for a specific file format.

SOLUTION: A video encoder 109 encodes video data, and an audio encoder 108 encodes audio data. A watermark generator 110 generates predetermined watermark data. A watermark-embedding device 112 embeds the watermark data in the encoded video data as an electronic watermark, and a watermark-embedding device 111 embeds the watermark data in the encoded audio data as an electronic watermark. A multiplexer 113 then forms multiplexed data in which the video data and the audio data, each carrying the embedded watermark data, are multiplexed.

COPYRIGHT: (C)2004,JPO

Description

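[0001]
[Technical field of the invention]
The present invention relates to an information processing apparatus, an information processing method, a program, and a computer-readable recording medium for detecting tampering with video data and its associated audio data.
[0002]
[Prior art]
Known moving-image coding schemes include intra-frame schemes such as Motion JPEG (Joint Photographic Experts Group) and Digital Video, and schemes that use inter-frame predictive coding such as H.261, H.263, MPEG (Moving Picture Experts Group)-1, MPEG-2, and MPEG-4. These coding schemes have been internationally standardized by ISO (International Organization for Standardization) and ITU (International Telecommunication Union).
[0003]
As these digital coding standards have spread, the content industries (video, music, and so on) have strongly raised the issue of copyright protection. In response, standardization of content protection has also progressed: the MPEG-4 coding scheme standardized methods for describing security information in the file system using IPMP OD and for restricting playback according to that security information. Digital watermarking technology has also been developed for security-related information and encryption. Digital watermarking is a technique for embedding data in such a way that the host data does not change on playback, or changes only at a level that cannot be perceived.
[0004]
Techniques for embedding a digital watermark in video data are disclosed in, for example, Japanese Patent Application Laid-Open No. H10-243398, "Recording medium recording a moving-image encoding program, and moving-image encoding apparatus," and Japanese Patent Application Laid-Open No. H11-341450, "Digital watermark embedding apparatus and extraction apparatus." Similarly, techniques for embedding a digital watermark in audio data are disclosed in Japanese Patent Application Laid-Open No. 2001-202089, "Method for embedding watermark information in audio data, watermark information embedding apparatus, watermark information detecting apparatus, recording medium in which watermark information is embedded, and recording medium recording a method for embedding watermark information," and Japanese Patent Application Laid-Open No. H11-316599, "Digital watermark embedding apparatus, audio encoding apparatus, and recording medium."
[0005]
Further, a method for detecting that part of a still image has been altered by image processing or the like is disclosed in, for example, Japanese Patent Application Laid-Open No. 2001-78070, "Digital camera and image tampering detection system."
[0006]
Digital watermarks such as these are generally used to prevent tampering with video data, audio data, and the like, and to protect copyright.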
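[0007]
[Problems to be solved by the invention]
However, when part or all of the audio data in an original combination of video data and audio data is replaced using editing software or the like, conventional copyright protection systems cannot detect the replacement as tampering. For example, if a scene is shot, then shot again in the same way with the same camera but with different sound, and the audio data is swapped between the two recordings, it is impossible to determine whether the result is the original data.
[0008]
As for file formats, a wide variety of methods are used for frame synchronization and frame control. For example, even for the same Motion JPEG content, the AVI file format and the QuickTime file format use different methods. Consequently, even if copyright protection is applied in one file format, for example the file format of the MPEG-4 coding scheme, the protection information is lost when the data is converted to another file format that does not support it.
[0009]
The present invention has been made in view of these circumstances, and its object is to provide an information processing apparatus, an information processing method, a program, and a computer-readable recording medium that do not require a specific file format and that can prevent tampering that destroys the identity between video data and audio data.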
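[0010]
[Means for solving the problems]
To solve the above problems, the present invention is an information processing apparatus that encodes video data and audio data synchronized with the video data, comprising: first encoding means for encoding the video data; second encoding means for encoding the audio data; watermark data generation means for generating predetermined watermark data; first watermark embedding means for embedding the watermark data in the encoded video data as a digital watermark; second watermark embedding means for embedding the watermark data in the encoded audio data as a digital watermark; and multiplexing means for generating multiplexed data in which the video data and the audio data carrying the embedded watermark data are multiplexed.
[0011]
In the information processing apparatus according to the present invention, the watermark data generation means generates, based on the video data and the audio data synchronized with the video data, common watermark data to be embedded as a digital watermark in both the video data and the audio data.
[0012]
Further, an information processing apparatus according to the present invention comprises: first input means for inputting video data; second input means for inputting audio data synchronized with the video data; identity data generation means for generating identity data indicating that the video data and the audio data are synchronized; watermark data generation means for generating predetermined watermark data from the identity data; first watermark embedding means for embedding the watermark data in the video data as a digital watermark; second watermark embedding means for embedding the watermark data in the audio data as a digital watermark; and multiplexing means for generating multiplexed data in which the video data carrying the watermark data and the audio data carrying the watermark data are multiplexed.
[0013]
The information processing apparatus according to the present invention further comprises first encoding means for encoding the video data, and the first watermark embedding means embeds the watermark data as a digital watermark in the encoded video data.
[0014]
The information processing apparatus according to the present invention further comprises second encoding means for encoding the audio data, and the second watermark embedding means embeds the watermark data as a digital watermark in the encoded audio data.
[0015]
The information processing apparatus according to the present invention further comprises recording means for recording the multiplexed data on a portable recording medium.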
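[0016]
Further, an information processing apparatus according to the present invention comprises: first input means for inputting video data in which first watermark data is embedded as a digital watermark; second input means for inputting audio data in which second watermark data is embedded as a digital watermark; first watermark extraction means for extracting the first watermark data embedded in the video data; second watermark extraction means for extracting the second watermark data embedded in the audio data; comparison means for comparing the first watermark data and the second watermark data for identity; and determination means for determining, based on the result of the identity comparison by the comparison means, whether the video data and the audio data are synchronized.
[0017]
Further, an information processing apparatus according to the present invention comprises: multiplexed input means for inputting multiplexed data in which video data carrying first watermark data and audio data carrying second watermark data are multiplexed; separation means for separating the video data and the audio data from the multiplexed data; first watermark extraction means for extracting the first watermark data embedded in the video data; second watermark extraction means for extracting the second watermark data embedded in the audio data; comparison means for comparing the first watermark data and the second watermark data for identity; and determination means for determining, based on the result of the identity comparison by the comparison means, whether the video data and the audio data are synchronized.
[0018]
In the information processing apparatus according to the present invention, the multiplexed input means reads and inputs the multiplexed data recorded on a portable recording medium.
[0019]
The information processing apparatus according to the present invention further comprises first output means for outputting the video data and second output means for outputting the audio data.
[0020]
The information processing apparatus according to the present invention further comprises first reproduction means for reproducing the video data.
[0021]
The information processing apparatus according to the present invention further comprises second reproduction means for reproducing the audio data.
[0022]
The information processing apparatus according to the present invention further comprises control means that, when the determination means determines that the video data and the audio data are not synchronized, causes information indicating that the video data is not synchronized with the audio data to be reproduced when the first reproduction means reproduces the video data.
[0023]
In the information processing apparatus according to the present invention, the video data and the audio data are encoded, and the apparatus further comprises first decoding means for decoding the encoded video data and second decoding means for decoding the encoded audio data.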
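[0024]
[Embodiments of the invention]
Preferred embodiments of the present invention are described below with reference to the drawings.
[0025]
<First embodiment>
FIG. 1 is a block diagram showing the configuration of the encoding system in the first embodiment of the present invention. This embodiment is described using an example in which the encoding system of FIG. 1 encodes and records video data and the audio data synchronized with it.
[0026]
As shown in FIG. 1, the encoding system according to this embodiment comprises an information processing apparatus 1 that generates encoded data and, connected to it, a microphone 2 that inputs audio data, a camera 4 that continuously inputs video data frame by frame, and a storage device 10 that records the generated encoded data.
[0027]
The information processing apparatus 1 first includes an audio encoder 3 that encodes, frame by frame, the audio data input from the microphone 2. The MPEG-1 Layer 3 coding scheme is used here as an example, but the invention is not limited to this scheme. To simplify the description, the audio frame size is taken to be the interval of one video frame.
[0028]
The information processing apparatus 1 also includes a video encoder 5 that encodes, frame by frame, the video data input from the camera 4. This embodiment uses the MPEG-4 coding scheme as an example, but the invention is not limited to this scheme. In addition, the information processing apparatus 1 includes a watermark generator 6 that generates a value unique to each recording.
[0029]
A watermark embedder 7 is connected to the audio encoder 3 and the watermark generator 6, and embeds the watermark data generated by the watermark generator 6 (hereinafter "audio watermark data") in the encoded audio data (hereinafter "audio encoded data"). A watermark embedder 8 is connected to the video encoder 5 and the watermark generator 6, and embeds the watermark data generated by the watermark generator 6 (hereinafter "video watermark data") in the encoded video data (hereinafter "video encoded data"). A multiplexer 9 is connected to the watermark embedders 7 and 8 and multiplexes the audio encoded data carrying the audio watermark data and the video encoded data carrying the video watermark data into a single data stream. The storage device 10 records and stores this stream.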
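[0030]
Next, the flow of processing in this encoding system, from encoding of the video and audio data through to storage, is described in detail. FIG. 9 is a flowchart explaining the encoding procedure in the encoding system according to the first embodiment of the present invention.
[0031]
First, before processing begins, each unit is initialized and an ID number is set (step S201). The ID number is a number unique to the current processing session. In this embodiment, for example, a random number represented in 4 bytes is used as the ID number. The ID number is not limited to such a value; any number may be used. Header data for multiplexing is also generated, and preparations are made to write the output to a free area of the storage device 10. Recording of the moving image (which in this embodiment includes both picture and sound) then begins.
[0032]
Next, it is determined whether the encoding process has ended (step S202). If it has not, the watermark generator 6 generates watermark data based on the ID number (step S203). In this embodiment, the watermark data for the audio data and for the video data are each an encrypted form of the 4-byte data. The generated audio watermark data is input to the watermark embedder 7, and the video watermark data to the watermark embedder 8.
[0033]
Next, each frame of video data read from the camera 4 is encoded (step S204). That is, the video data captured by the camera 4 is input to the video encoder 5 one frame at a time. The video encoder 5 encodes the input video data with the MPEG-4 coding scheme and holds the resulting encoded data (video encoded data).
[0034]
The video watermark data generated in step S203 is then embedded in the video encoded data (step S205). That is, the watermark embedder 8 embeds the generated video watermark data in the video encoded data read from the video encoder 5. One possible embedding method is to increase or decrease the highest-frequency coefficient of each block within a range of ±1 so that its parity (odd or even) deliberately follows the watermark data. That is, if the bit to embed is 0 the last coefficient is made even, and if it is 1 the last coefficient is made odd. In practice, the code immediately before the EOB of the macroblock to be embedded is read out and changed if necessary.
[0035]
For example, suppose the immediately preceding code has a zero-run length of 8 and a value of 3. If the bit to embed is 0, it is replaced by the code with run length 8 and value 4; in the actual bitstream, "111111111110110110" is replaced by "111111111110110111". If the bit to embed is 1, nothing is done. The invention is not limited to this method; an existing method such as the one described in Japanese Patent Application Laid-Open No. H11-341452, "Moving-image digital watermark system," may also be used.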
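The parity rule described above can be sketched as follows. This is only an illustrative sketch on integer coefficient values, not the actual MPEG-4 variable-length code replacement; the function names are hypothetical, and moving the value away from zero when a change is needed is an assumption chosen so that it agrees with the 3 to 4 example in the text.

```python
def embed_bit_in_level(level: int, bit: int) -> int:
    """Return `level` changed by at most 1 so that its parity equals `bit`.

    bit 0 -> even magnitude, bit 1 -> odd magnitude.
    """
    if bit not in (0, 1):
        raise ValueError("bit must be 0 or 1")
    if abs(level) % 2 == bit:
        return level  # parity already matches, nothing to do
    # Assumption: step away from zero so a non-zero coefficient never vanishes.
    return level + 1 if level >= 0 else level - 1


def extract_bit_from_level(level: int) -> int:
    """Read the embedded bit back from the parity of the coefficient."""
    return abs(level) % 2


# Example from the text: value 3, embedding bit 0 gives value 4.
print(embed_bit_in_level(3, 0))
```

In a real encoder the changed value would then be re-encoded with the matching run/level code, as in the bit-string replacement shown in the paragraph above.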
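[0036]
Meanwhile, each frame of audio data read from the microphone 2 is encoded (step S206). That is, the audio data input from the microphone 2 is input to the audio encoder 3 frame by frame. The audio encoder 3 encodes the input audio data with the MPEG-1 Layer 3 coding scheme and holds the resulting encoded data (audio encoded data). The audio watermark data generated in step S203 is then embedded in the audio encoded data (step S207). That is, the watermark embedder 7 embeds the generated audio watermark data in the audio encoded data read from the audio encoder 3. As an embedding method, when a 4-byte value is to be embedded, for example, the LSBs of 32 samples can be assigned to it.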
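The LSB approach mentioned above can be sketched as follows. This is a minimal sketch that treats the host data as a plain list of integer samples; which values of the encoded audio actually carry the bits is not specified in the text, and the function names and the MSB-first bit order are assumptions.

```python
def embed_lsb(samples: list[int], value: int, nbits: int = 32) -> list[int]:
    """Write `value` (nbits wide, MSB first) into the LSBs of the first nbits samples."""
    if len(samples) < nbits:
        raise ValueError("not enough samples to carry the watermark")
    if not 0 <= value < (1 << nbits):
        raise ValueError("value does not fit in nbits")
    out = list(samples)
    for i in range(nbits):
        bit = (value >> (nbits - 1 - i)) & 1
        out[i] = (out[i] & ~1) | bit  # clear the LSB, then set it to the watermark bit
    return out


def extract_lsb(samples: list[int], nbits: int = 32) -> int:
    """Rebuild the embedded value from the LSBs of the first nbits samples."""
    value = 0
    for s in samples[:nbits]:
        value = (value << 1) | (s & 1)
    return value


marked = embed_lsb(list(range(-16, 24)), 0xDEADBEEF)
print(hex(extract_lsb(marked)))
```

Each sample changes by at most 1, which is what keeps the embedded data imperceptible in this kind of scheme.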
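[0037]
The two sets of encoded data carrying the watermark data are then multiplexed by the multiplexer 9 and stored at a predetermined position in the storage device 10 (step S208). The multiplexed data is then output (step S209), and processing returns to step S202 and repeats until all data has been input. Through this series of operations, the same information is embedded as a digital watermark in both the audio data and the video data, which makes tampering easy to detect on the decoding side.
[0038]
As described above, the information processing apparatus 1 according to the present invention encodes video data and the audio data synchronized with it. Specifically, the video encoder 5 encodes the video data and the audio encoder 3 encodes the audio data. The watermark generator 6 generates predetermined watermark data. The watermark embedder 8 embeds the watermark data in the encoded video data as a digital watermark, and the watermark embedder 7 embeds the watermark data in the encoded audio data as a digital watermark. The multiplexer 9 then generates multiplexed data in which the video data and the audio data carrying the watermark data are multiplexed.
[0039]
The information processing apparatus 1 according to the present invention can also be described as follows: identity data indicating that the video data and the audio data are synchronized is generated, and predetermined watermark data is generated from the identity data. The watermark data is embedded as a digital watermark in the video data and also in the audio data. Multiplexed data is then generated in which the video data carrying the watermark data and the audio data carrying the watermark data are multiplexed.
[0040]
Further, the information processing apparatus 1 has the video encoder 5 for encoding the video data, and the watermark embedder 8 embeds the watermark data as a digital watermark in the encoded video data.
[0041]
Likewise, the information processing apparatus 1 has the audio encoder 3 for encoding the audio data, and the watermark embedder 7 embeds the watermark data as a digital watermark in the encoded audio data.
[0042]
Although this embodiment uses MPEG-4 as the video coding scheme, other schemes such as H.261, MPEG-1, or MPEG-2 may be used. Likewise, the audio coding scheme is not limited to the one described; AAC coding or ADPCM coding may be used.
[0043]
Some or all of the functions of the encoding system according to this embodiment may be written in software and executed by a processing unit such as a CPU. Also, although this embodiment embeds the watermark data in the encoded data after encoding, the invention is not limited to this; the watermark data may be embedded during encoding, as described, for example, in Japanese Patent Application Laid-Open No. H11-284516, "Data processing apparatus, data processing method, and recording medium."
[0044]
The audio watermark data and the video watermark data need not be identical. For example, a 4-byte code may be used as the video watermark data and its complement as the audio watermark data.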
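[0045]
<Second embodiment>
FIG. 2 is a block diagram showing the configuration of the decoding system in the second embodiment of the present invention. This embodiment uses the MPEG-4 coding scheme for video data and the MPEG-1 Layer 3 coding scheme for audio data as examples, but the invention is not limited to these schemes.
[0046]
As shown in FIG. 2, the decoding system according to this embodiment comprises an information processing apparatus 11 that decodes a moving image (which in this embodiment includes both picture and sound) and, connected to it, a storage device 12 that stores encoded data, a speaker 20 that plays the decoded audio data, and a monitor 21 that displays the decoded video data. The storage device 12 is assumed to hold the multiplexed encoded data generated in the first embodiment (hereinafter "multiplexed encoded data").
[0047]
The information processing apparatus 11 includes a demultiplexer 13 that separates the multiplexed encoded data produced by the multiplexer 9 of FIG. 1 in the first embodiment into video encoded data and audio encoded data. Watermark extractors 14 and 15 are connected to the demultiplexer 13. The watermark extractor 14 extracts the audio watermark data from the audio encoded data, and the watermark extractor 15 extracts the video watermark data from the video encoded data.
[0048]
Both watermark extractors 14 and 15 are connected to a comparator 16 that compares the two extracted sets of watermark data.
[0049]
The watermark extractor 14 is also connected to an audio decoder 17 that decodes the audio encoded data and reproduces the audio data. The watermark extractor 15 is connected to a video decoder 18 that decodes the video encoded data and reproduces the video data. A display controller 19 is connected to the comparator 16 and the video decoder 18 and controls the video data frame by frame. The monitor 21 displays the reproduced video data (picture) under the control of the display controller 19.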
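[0050]
Next, the flow of decoding moving-image data in this decoding system is described. FIG. 10 is a flowchart explaining the decoding procedure in the decoding system according to the second embodiment of the present invention.
[0051]
First, as in the first embodiment, each unit is initialized before processing begins, and the storage device 12 sets the read position to the start of the stored multiplexed encoded data (step S301).
[0052]
Next, it is determined whether the decoding process has ended (step S302). If it has not, multiplexed encoded data is read from the predetermined position in the storage device 12 and input to the demultiplexer 13 (step S303). The demultiplexer 13 separates the input multiplexed encoded data into audio encoded data and video encoded data one frame at a time (step S304). Of the two kinds of encoded data separated by the demultiplexer 13, the video encoded data is output frame by frame to the watermark extractor 15 and the audio encoded data to the watermark extractor 14.
[0053]
The watermark extractor 15 extracts the video watermark data embedded in the video encoded data, decrypts it, outputs the result to the comparator 16, and outputs the encoded data to the video decoder 18 (step S305). The video decoder 18 decodes the encoded data and reproduces one frame of video data (step S306).
[0054]
At the same time, the watermark extractor 14 extracts the audio watermark data embedded in the audio encoded data, decrypts it, outputs the result to the comparator 16, and outputs the encoded data to the audio decoder 17 (step S307). The audio decoder 17 decodes the encoded data and reproduces one frame of audio data (step S308).
[0055]
The comparator 16 compares the audio watermark data extracted by the watermark extractor 14 with the video watermark data extracted by the watermark extractor 15 (step S309). If the two do not match, processing proceeds to step S310, where the mismatch is reported to the display controller 19 and a warning is superimposed on the reproduced video frame. That is, when the comparator 16 finds that the watermark data do not match, an indication that the audio data and the video data are inconsistent is superimposed on the output of the video decoder 18, output to the monitor 21, and shown visually on the monitor 21.
[0056]
If, on the other hand, the comparator 16 determines that the watermark data match, the display controller 19 passes the output of the video decoder 18 to the monitor 21 unchanged, and the picture is displayed on the monitor 21. At this time, the decoded audio data synchronized with the video data is output to the speaker 20 (step S311). After one frame of reproduced video data has been displayed and the synchronized audio data played in this way, processing returns to step S302 and the next frame is processed. When step S302 finds that all encoded data has been processed, the decoding system ends its processing.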
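The per-frame decision made by the comparator and the display controller can be sketched as follows. This is a toy model under stated assumptions: pictures and sound are stand-in strings, the watermark values are already extracted and decrypted, and the names `Frame`, `WARNING`, and `present` are hypothetical.

```python
from typing import Iterable, NamedTuple, Optional


class Frame(NamedTuple):
    picture: str   # stand-in for one decoded video frame
    sound: str     # stand-in for one decoded audio frame
    video_wm: int  # watermark value extracted from the video encoded data
    audio_wm: int  # watermark value extracted from the audio encoded data


WARNING = "audio and video do not match"


def present(frames: Iterable[Frame]) -> list[tuple[str, str, Optional[str]]]:
    """Return (picture, sound, overlay) per frame; overlay is a warning on mismatch."""
    shown = []
    for frame in frames:
        overlay = None if frame.video_wm == frame.audio_wm else WARNING
        # The audio is still output in both cases; only the picture gets the overlay.
        shown.append((frame.picture, frame.sound, overlay))
    return shown


result = present([Frame("v0", "a0", 7, 7), Frame("v1", "a1", 7, 9)])
print(result)
```

In this sketch a mismatching frame is still shown, only with the warning attached, which mirrors the superimposed indication described for step S310.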
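[0057]
FIG. 11 illustrates examples of what is displayed on the monitor 21 when the watermark data for the video data and the audio data match and when they do not. As shown in FIG. 11(A), when the two sets of watermark data match, an image 1000 is displayed on the basis that no tampering has occurred. When they do not match, as shown in FIG. 11(B), an image with a superimposed caption 1001 is displayed on the basis that tampering has occurred. In this case the decoded audio data is output to the speaker 20 at the same time as the image is displayed.
[0058]
Through this series of operations, comparing the watermark data embedded in the audio data and in the video data makes it possible to detect tampering with the data and to inform the user of it.
[0059]
That is, in the information processing apparatus 11 according to the present invention, the watermark extractor 15 extracts the first watermark data (video watermark data) from video data in which it is embedded as a digital watermark. The watermark extractor 14 extracts the second watermark data (audio watermark data) from audio data in which it is embedded as a digital watermark. The comparator 16 compares the first and second watermark data for identity, and the information processing apparatus 11 determines from the comparison result whether the video data and the audio data are synchronized.
[0060]
Also, in the information processing apparatus 11 according to the present invention, the demultiplexer 13 separates the video data and the audio data from multiplexed data in which video data carrying the first watermark data and audio data carrying the second watermark data are multiplexed. The watermark extractor 15 extracts the first watermark data embedded in the video data, and the watermark extractor 14 extracts the second watermark data embedded in the audio data. The comparator 16 compares the first and second watermark data for identity, and the information processing apparatus 11 determines from the comparison result whether the video data and the audio data are synchronized.
[0061]
Further, when the video data and the audio data are determined not to be synchronized, the information processing apparatus 11 performs control so that information indicating this (the caption 1001) is reproduced when the video data is reproduced on the monitor 21.
[0062]
The information processing apparatus 11 further has the video decoder 18, which decodes the video data when it is encoded, and the audio decoder 17, which decodes the audio data when it is encoded.
[0063]
The information processing apparatus 11 outputs the video data and outputs the audio data. It is connectable to the monitor 21, which reproduces the video data, and to the speaker 20, which reproduces the audio data.
[0064]
Although this embodiment uses MPEG-4 as the moving-image coding scheme, other schemes such as H.261, MPEG-1, MPEG-2, or Motion JPEG may be used. Likewise, the audio coding scheme is not limited to the one described; AAC coding or ADPCM coding may be used.
[0065]
Some or all of the functions of the decoding system according to this embodiment may be written in software and executed by a processing unit such as a CPU.
[0066]
As in the first embodiment, this embodiment assumes that the watermark data was embedded in the encoded data after encoding. The invention is not limited to this case; the watermark data may be extracted during decoding, as described, for example, in Japanese Patent Application Laid-Open No. H11-284516, "Data processing apparatus, data processing method, and recording medium."
[0067]
The audio watermark data and the video watermark data need not be identical. For example, a 4-byte code may be used as the video watermark data and its complement as the audio watermark data, with the comparator 16 performing a calculation to verify their identity.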
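The complement variant can be sketched as follows. The text only says "complement"; this sketch assumes a bitwise (ones') complement of a 32-bit code, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # a 4-byte code


def make_watermark_pair(code: int) -> tuple[int, int]:
    """Return (video watermark, audio watermark) where audio is the bitwise complement."""
    if not 0 <= code <= MASK32:
        raise ValueError("code must fit in 4 bytes")
    return code, (~code) & MASK32


def is_consistent(video_wm: int, audio_wm: int) -> bool:
    """Comparator check: a value XORed with its bitwise complement is all ones."""
    return (video_wm ^ audio_wm) == MASK32


video_wm, audio_wm = make_watermark_pair(0x12345678)
print(hex(audio_wm), is_consistent(video_wm, audio_wm))
```

One reason to prefer a derived value over an identical copy is that the two streams then never carry the same bit pattern, while the check on the decoding side stays a single operation.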
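[0068]
<Third embodiment>
Next, a third embodiment of the present invention is described in detail with reference to the drawings.
[0069]
FIG. 4 is a block diagram showing the configuration of a camcorder system according to the third embodiment of the present invention, which captures images and records them together with sound. In FIG. 4, reference numeral 101 denotes a microphone that inputs audio data, 102 denotes an optical system made up of lenses and the like, and 103 denotes a photoelectric converter, such as a CCD, that generates an electrical signal according to light intensity.
[0070]
Reference numerals 104 and 105 denote A/D converters that convert analog electrical signals to digital signals, and 106, 107, 121, and 126 denote frame memories that store digital signals frame by frame. Reference numeral 108 denotes an audio encoder that encodes audio frame by frame, and 109 denotes a video encoder that encodes video frame by frame. Reference numeral 110 denotes a watermark generator that generates watermark data from the outputs of the audio encoder 108 and the video encoder 109. For example, data shared between the audio data and the video data synchronized with it may be used as the watermark data.
[0071]
Reference numeral 111 denotes a watermark embedder that embeds watermark data in the audio encoded data, and 112 denotes a watermark embedder that embeds watermark data in the video encoded data. Reference numeral 113 denotes a multiplexer that multiplexes the audio encoded data and the video encoded data and shapes them into a stream. Reference numeral 114 denotes a recording medium controller that reads the stream from and writes it to a recording medium 115. The recording medium 115 is a magneto-optical disk or the like, but the invention is not limited to this.
[0072]
Reference numeral 116 denotes a demultiplexer that separates the audio encoded data and the video encoded data from the stream. Reference numeral 117 denotes a watermark extractor that extracts watermark data from the audio encoded data, and 118 denotes a watermark extractor that extracts watermark data from the video encoded data. Reference numeral 119 denotes a comparator that compares the extracted watermark data.
[0073]
Reference numeral 120 denotes an audio decoder that decodes the audio encoded data and reproduces the audio data. Reference numeral 122 denotes a D/A converter that converts the audio data in the frame memory 121 from digital to analog, and 123 denotes a speaker that plays the analog audio signal as sound. Reference numeral 124 denotes an indicator, such as an LED, that shows the result from the comparator 119.
[0074]
Reference numeral 125 denotes a video decoder that decodes the video encoded data and reproduces the image data. Reference numeral 127 denotes a D/A converter that converts the video data in the frame memory 126 from digital to analog, and 128 denotes a viewer, such as an LCD, that displays the analog video signal.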
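[0075]
Next, the moving-image recording operation of the camcorder according to this embodiment is described. This embodiment uses the MPEG-4 coding scheme for video and the AAC coding scheme for audio as examples. To simplify the description, the audio frame size is taken to be the interval of one video frame.
[0076]
The process of recording encoded data is described first. FIG. 12 is a flowchart explaining the moving-image recording process in the camcorder system according to the third embodiment. First, the apparatus is initialized, the various headers are generated and stored on the recording medium, and an ID number is set (step S401). That is, when the camcorder is powered on, each unit is initialized: variation between the sensors of the photoelectric converter 103 is corrected and the frame memories 106 and 107 are cleared. The recording medium controller 114 sets the write position to the start of a free area on the recording medium 115.
[0077]
Recording of the moving image starts when the user presses a record button (not shown). The multiplexer 113 generates the header data required for multiplexing and writes it to a predetermined area on the recording medium 115 via the recording medium controller 114. The audio encoder 108 generates header data in accordance with the AAC coding scheme, and the video encoder 109 generates header data in accordance with the MPEG-4 coding scheme. Each set of generated header data is stored at a predetermined position on the recording medium 115 via the multiplexer 113 and the recording medium controller 114. An ID number is also set, as in the first embodiment. In this embodiment, a 16-bit integer chosen at random is used as the ID number.
[0078]
Next, it is determined whether processing has ended (step S402). If not, one frame of input video data is encoded in accordance with the MPEG-4 coding scheme (step S403). That is, the photoelectric converter 103 converts the light entering through the optical system 102 into an electrical signal, which the A/D converter 105 converts to a digital signal and stores frame by frame in the frame memory 107. The video encoder 109 encodes the image data frame by frame in accordance with the MPEG-4 coding scheme. The generated encoded data is input to the watermark generator 110 and the watermark embedder 112.
[0079]
At the same time, one frame of input audio data is encoded in accordance with the AAC coding scheme (step S404). That is, the audio signal input from the microphone 101 is converted to a digital signal by the A/D converter 104 and stored frame by frame in the frame memory 106. The audio encoder 108 encodes the audio data frame by frame in accordance with the AAC coding scheme. The generated encoded data is input to the watermark generator 110 and the watermark embedder 111.
[0080]
Next, the watermark generator 110 generates the video watermark data from the ID number, part of the video encoded data, and part of the audio encoded data, and generates the audio watermark data from the ID number and part of the video encoded data (step S405). That is, the watermark generator 110 reads predetermined bits from the encoded data input from the audio encoder 108. For example, 16 predetermined bits, such as the 3rd, 9th, 17th, ... bits from the start, are read out and used as audio identification data.
[0081]
Likewise, predetermined bits are read from the encoded data input from the video encoder 109. For example, 16 predetermined bits, such as the 54th, 61st, 77th, ... bits from the start, are read out and used as video identification data. The bits of the ID number, the audio identification data, and the video identification data are arranged in that order from the most significant bit to form 48 bits of data, which is used as the video watermark data. The bits of the video identification data and the ID number are arranged in that order from the most significant bit to form 32 bits of data, which is used as the audio watermark data.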
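The construction of the two watermark values can be sketched as follows. The text names only the first few bit positions and elides the rest, so the full position list is left to the caller; the function names, the 1-based MSB-first bit numbering, and the sample values are assumptions for illustration.

```python
def pick_bits(data: bytes, positions: list[int]) -> int:
    """Collect the bits at the given 1-based positions (MSB first within each byte)."""
    value = 0
    for pos in positions:
        byte_index, bit_index = divmod(pos - 1, 8)
        bit = (data[byte_index] >> (7 - bit_index)) & 1
        value = (value << 1) | bit
    return value


def make_watermarks(id_number: int, audio_ident: int, video_ident: int) -> tuple[int, int]:
    """Pack three 16-bit fields into the 48-bit video and 32-bit audio watermark."""
    for field in (id_number, audio_ident, video_ident):
        if not 0 <= field <= 0xFFFF:
            raise ValueError("each field must fit in 16 bits")
    # Video watermark: ID number, audio identification, video identification (MSB first).
    video_wm = (id_number << 32) | (audio_ident << 16) | video_ident
    # Audio watermark: video identification, then ID number (MSB first).
    audio_wm = (video_ident << 16) | id_number
    return video_wm, audio_wm


video_wm, audio_wm = make_watermarks(0x1234, 0xABCD, 0x5678)
print(hex(video_wm), hex(audio_wm))
```

Because the identification data is drawn from the encoded frames themselves, the watermark differs from frame to frame, which is what later allows a per-frame check.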
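[0082]
The video watermark data generated in step S405 is then embedded in the video encoded data (step S406). That is, the watermark embedder 112 embeds the video watermark data in the video encoded data and inputs the result to the multiplexer 113.
[0083]
At the same time, the audio watermark data generated in step S405 is embedded in the audio encoded data (step S407). That is, the watermark embedder 111 embeds the audio watermark data in the audio encoded data and inputs the result to the multiplexer 113.
[0084]
The multiplexer 113 multiplexes these two sets of encoded data and stores them at a predetermined position on the recording medium 115 via the recording medium controller 114 (step S408). The multiplexed stream is then output and stored on the recording medium 115 (step S409), and processing returns to step S402 and repeats until shooting ends. Recording is judged to have ended, and processing ends, when the user releases the record button (not shown) or presses it again.
[0085]
Through this series of operations, data that changes from frame to frame is embedded as the watermark, so data that makes tampering easy to detect frame by frame can be generated.
[0086]
That is, in the present invention the watermark generator 110 generates, based on the video data and the audio data synchronized with it, common watermark data to be embedded as a digital watermark in the video data and the audio data. The present invention also records the multiplexed data on the portable recording medium 115.
[0087]
Although this embodiment uses MPEG-4 as the moving-image coding scheme, other schemes such as H.261, MPEG-1, MPEG-2, or Motion JPEG may be used. Likewise, the audio coding scheme is not limited to the one described; MPEG-1 Layer 2 coding or ADPCM coding may be used.
[0088]
Some or all of the functions of the camcorder system according to this embodiment may be written in software and executed by a processing unit such as a CPU.
[0089]
Although this embodiment embeds the watermark data in the encoded data after encoding, the invention is not limited to this; the watermark data may be embedded during encoding, as described, for example, in Japanese Patent Application Laid-Open No. H11-284516, "Data processing apparatus, data processing method, and recording medium."
[0090]
Although the audio frame size was taken to be the interval of one video frame, this is not a limitation; the audio and video frame sizes may differ. For example, even if audio frames are 20 ms apart and video frames 33 ms apart, whenever an audio frame boundary falls within the time span of a video frame, watermark data can be created for the encoded audio data starting at that boundary, and multiple sets of watermark data can be embedded in the video encoded data.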
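The 20 ms / 33 ms case can be worked through with integer arithmetic. This sketch only computes which audio frames begin inside a given video frame, that is, how many watermark sets that video frame would carry; the function name and the half-open interval convention are assumptions.

```python
def audio_frames_starting_in(video_index: int, video_ms: int = 33, audio_ms: int = 20) -> list[int]:
    """Indices of audio frames whose start time falls in [start, end) of the video frame."""
    start = video_index * video_ms
    end = start + video_ms
    first = -(-start // audio_ms)  # ceiling division: first boundary at or after start
    stop = -(-end // audio_ms)     # first boundary at or after end (excluded)
    return list(range(first, stop))


for v in range(3):
    print(v, audio_frames_starting_in(v))
```

With these intervals a video frame carries either one or two audio-derived watermark sets, depending on where the audio boundaries fall.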
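[0091]
<Fourth embodiment>
This embodiment describes the processing, up to playback of the moving image, in the camcorder system of the third embodiment. FIG. 13 is a flowchart explaining the moving-image playback process in the camcorder system according to the fourth embodiment.
[0092]
First, as in the second embodiment, each unit of the apparatus is initialized before operation begins (step S501). The recording medium controller 114 sets the read position to the start of the encoded data stored on the recording medium 115. Next, it is determined whether processing has ended (step S502). If not, encoded data is read from the predetermined position on the recording medium 115 and input to the demultiplexer 116 (step S503).
[0093]
The demultiplexer 116 then separates the input encoded data into audio encoded data and video encoded data and outputs them frame by frame, the audio encoded data to the watermark extractor 117 and the video encoded data to the watermark extractor 118 (step S504).
[0094]
The watermark extractor 117 extracts the audio watermark data embedded in the audio encoded data, outputs the result to the comparator 119, and outputs the encoded data to the comparator 119 and the audio decoder 120 (step S507). Similarly, the watermark extractor 118 extracts the video watermark data embedded in the video encoded data, outputs the result to the comparator 119, and outputs the encoded data to the video decoder 125 (step S505).
[0095]
The video decoder 125 decodes the video encoded data and reproduces one frame of image data (step S506). At the same time, the audio decoder 120 decodes the audio encoded data and reproduces one frame of audio data (step S508).
[0096]
The comparator 119 compares the ID numbers, the audio identification data, and the video identification data obtained from the audio watermark data and the video watermark data (step S509). If all of them match, the indicator 124 is turned on (step S510). If any of them does not match, the indicator 124 is turned off (step S511).
[0097]
That is, the audio watermark data extracted by the watermark extractor 117 and the video watermark data extracted by the watermark extractor 118 are input to the comparator 119. ID number A and video identification data A are recovered from the audio watermark data. Audio identification data A is generated from the input audio encoded data by the same method used to generate the audio identification data in the third embodiment. ID number V, audio identification data V, and video identification data V are recovered from the video watermark data.
[0098]
The ID numbers, the video identification data, and the audio identification data are then compared pairwise: ID number A with ID number V, video identification data A with video identification data V, and audio identification data A with audio identification data V. If all three pairs match, the indicator 124 is turned on; if even one pair does not match, it is turned off.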
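The three pairwise comparisons can be sketched as follows. The sketch assumes the 48-bit and 32-bit layouts described for the third embodiment (three and two 16-bit fields, most significant first) and takes the audio identification data A, regenerated from the audio encoded data, as an input; the function names are hypothetical.

```python
def unpack_video_wm(wm: int) -> tuple[int, int, int]:
    """Split the 48-bit video watermark into (ID V, audio ident V, video ident V)."""
    return (wm >> 32) & 0xFFFF, (wm >> 16) & 0xFFFF, wm & 0xFFFF


def unpack_audio_wm(wm: int) -> tuple[int, int]:
    """Split the 32-bit audio watermark into (video ident A, ID A)."""
    return (wm >> 16) & 0xFFFF, wm & 0xFFFF


def all_match(video_wm: int, audio_wm: int, audio_ident_a: int) -> bool:
    """True only if the ID numbers, video identifiers, and audio identifiers all agree."""
    id_v, audio_ident_v, video_ident_v = unpack_video_wm(video_wm)
    video_ident_a, id_a = unpack_audio_wm(audio_wm)
    return (
        id_a == id_v
        and video_ident_a == video_ident_v
        and audio_ident_a == audio_ident_v
    )


print(all_match(0x1234ABCD5678, 0x56781234, 0xABCD))
```

The boolean result is what would drive the indicator; replacing the audio track changes the regenerated audio identification data, so the third comparison fails even when the ID numbers still agree.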
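[0099]
One frame of the reproduced video data is then displayed on the viewer and the audio data is played (step S512). Processing then returns to step S502 and the next frame is processed. When step S502 finds that all encoded data has been processed, the playback process ends.
[0100]
That is, the video encoded data is input to the video decoder 125 and the audio encoded data to the audio decoder 120; each is decoded and stored in the frame memory 126 and the frame memory 121, respectively. The audio data stored in the frame memory 121 is converted to an analog signal by the D/A converter 122 and played through the speaker 123. The video data stored in the frame memory 126 is likewise converted to an analog signal by the D/A converter 127 and displayed on the viewer 128.
[0101]
Through this series of operations, comparing the frame-by-frame changing data embedded in the audio data and the video data makes it possible to detect tampering in fine units and to inform the user of it.
[0102]
As described above, the present invention reads multiplexed data from the portable recording medium 115, which stores multiplexed data in which video data carrying video watermark data and audio data carrying audio watermark data are multiplexed.
[0103]
Although this embodiment uses MPEG-4 as the moving-image coding scheme, other schemes such as H.261, MPEG-1, or MPEG-2 may be used. Likewise, the audio coding scheme is not limited to the one described; MPEG-1 Layer 3 coding or ADPCM coding may be used.
[0104]
Some or all of the functions of the camcorder system according to this embodiment may be written in software and executed by a processing unit such as a CPU.
[0105]
Although this embodiment assumes that the watermark data was embedded in the encoded data after encoding, the invention is not limited to this. For example, the watermark data may be extracted during decoding, as described in Japanese Patent Application Laid-Open No. H11-284516, "Data processing apparatus, data processing method, and recording medium."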
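[0106]
<Fifth embodiment>
FIG. 3 is a block diagram showing the configuration of an information processing apparatus according to the fifth embodiment of the present invention. In FIG. 3, reference numeral 300 denotes a central processing unit (CPU) that controls the whole apparatus and performs various processing, and 301 denotes a memory that provides the operating system (OS) and software needed to control the apparatus and the storage area needed for computation. Reference numeral 302 denotes a bus that connects the various devices and carries data and control signals.
[0107]
Reference numeral 303 denotes a terminal used to start the apparatus, set various conditions, and issue playback instructions. Reference numeral 304 denotes a storage device that stores software, and 305 denotes a storage device that stores streams. The storage devices 304 and 305 may be media that can be detached from the system and carried (portable recording media). Reference numeral 306 denotes a camera that captures images, and 307 denotes an audio capture unit that takes in audio data. Reference numeral 308 denotes a monitor that displays images, and 309 denotes a speaker that plays audio data. Reference numeral 311 denotes a communication line, such as a LAN, a public line, a wireless line, or broadcast waves, and 310 denotes a communication interface that sends and receives streams over the communication line 311.
[0108]
The memory 301 stores the OS, which controls the whole apparatus and runs the various software, together with the software to be run. It also contains an image area for image data, an audio area for audio data, a code area for the generated encoded data, and a working area for parameters used in computation and encoding, watermark-related data, and the like.
[0109]
Next, the encoding of image data in an information processing apparatus with this configuration is described. FIG. 5 is a flowchart explaining the procedure by which the CPU 300 records moving-image data on the storage device 305.
[0110]
First, before processing begins, a start instruction for the whole apparatus is issued from the terminal 303 and each unit of the apparatus is initialized. An ID number is then set and stored in the working area of the memory 301 (step S1). In this embodiment the ID number is determined by a random number.
[0111]
Next, the software stored in the storage device 304 is loaded into the memory 301 via the bus 302 and started. The headers for multiplexing, audio encoding, and video encoding are generated, multiplexed, and written to a predetermined position in the storage device 305 (step S2). For video encoding, the code data of the sequence layer is generated as the header in accordance with the MPEG-1 coding scheme.
[0112]
FIG. 7 is a schematic diagram explaining how the memory 301 is used during encoding. As shown in FIG. 7, the memory 301 holds the OS, which controls the whole apparatus and runs the various software, video encoding software that encodes image data, audio encoding software that encodes audio data, multiplexing software that multiplexes encoded data, and watermark embedding software that generates and embeds watermark data.
[0113]
In this embodiment the video encoding software is described as software that encodes with the MPEG-1 coding scheme, but the invention is not limited to this. Likewise, the audio encoding software is described as software that encodes with the AAC coding scheme, but the invention is not limited to this either.
[0114]
The memory also contains an image area that holds images during encoding, an audio area that holds audio data, a code area that holds the generated codes and the encoded data in which the watermark has been embedded, and a working area that holds parameters for the various computations. In this configuration, on instruction from the terminal 303, moving images are input from the camera 306 and audio is input from the audio capture unit 307.
[0115]
Next, it is determined whether processing has ended (step S3). If it has not, one frame of video data is read from the camera 306 into the image area of the memory 301 (step S4).
[0116]
Next, the video data stored in the image area of the memory 301 is encoded with the MPEG-1 coding scheme using the video encoding software in the memory 301, the result is stored in the code area of the memory 301, and the region of the image area holding the video frame data is released (step S5). One frame of audio data is then read from the audio capture unit 307 and stored in the audio area of the memory 301 (step S6). The audio data stored in the audio area is encoded with the AAC coding scheme using the audio encoding software in the memory 301, the result is stored in the code area of the memory 301, and the region of the audio area holding the audio frame data is released (step S7).
[0117]
Watermark data is then generated from the ID number stored in the working area of the memory 301, the recording date and time at which the frame is encoded, and the time stamp of the video frame, and the data in the working area of the memory 301 is updated (step S8). For example, if the ID number takes 16 bits, the recording date and time takes 14 bits for the year, 2 bits for the month, 5 bits for the day, and 6 bits for the seconds, and the time stamp takes 10 bits, the total is 53 bits of data, which is encrypted and used as the watermark data. After the watermark data has been generated, processing proceeds to step S9.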
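The 53-bit layout can be sketched as a simple bit-packing routine. The field widths are taken from the text as written (16 + 14 + 2 + 5 + 6 + 10 = 53); note that a 2-bit field cannot hold a calendar month from 1 to 12, so the sample value below is chosen to fit. The field names and functions are hypothetical, and the encryption step mentioned in the text is not shown.

```python
# (name, width in bits), most significant field first; widths as given in the text.
FIELDS = (
    ("id", 16),
    ("year", 14),
    ("month", 2),
    ("day", 5),
    ("second", 6),
    ("timestamp", 10),
)


def pack_fields(values: dict[str, int]) -> int:
    """Concatenate the fields into one integer, rejecting values that do not fit."""
    packed = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        packed = (packed << width) | value
    return packed


def unpack_fields(packed: int) -> dict[str, int]:
    """Inverse of pack_fields."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = packed & ((1 << width) - 1)
        packed >>= width
    return values


sample = {"id": 0xBEEF, "year": 2002, "month": 3, "day": 9, "second": 59, "timestamp": 1023}
print(sum(width for _, width in FIELDS), unpack_fields(pack_fields(sample)) == sample)
```

Keeping the widths in one table makes it easy to try a different layout, for example a wider month field, without touching the packing logic.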
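[0118]
In step S9, the watermark embedding software in the memory 301 embeds the watermark data in the working area into the video encoded data in the code area, stores the result in the code area, and releases the region holding the encoded data from before embedding; processing then proceeds to step S10. In step S10, the watermark data in the working area is embedded in the audio encoded data in the code area, the result is stored in the code area, and the region holding the encoded data from before embedding is released; processing then proceeds to step S11.
[0119]
In step S11, the audio encoded data and video encoded data that were generated and stored in the code area of the memory 301 are multiplexed using the multiplexing software and written to a predetermined area of the storage device 305, the data region in the code area is released, and processing returns to step S3 to process the next frame.
[0120]
When step S3 finds that all frame data has been processed, the recording process ends.
[0121]
Through this series of operations, data that changes continuously from frame to frame is embedded in the audio data and the video data; comparing that data later makes it possible to detect tampering in fine units and to inform the user of it.
[0122]
Although this embodiment uses MPEG-1 as the moving-image coding scheme, other schemes such as H.261, MPEG-2, or MPEG-4 may of course be used. Likewise, the audio coding scheme is not limited to the one described; MPEG-1 Layer 3 coding or ADPCM coding may be used.
[0123]
Some or all of the functions of the information processing apparatus according to this embodiment may be implemented in hardware.
[0124]
Although this embodiment embeds the watermark data in the encoded data after encoding, the invention is not limited to this. For example, the watermark data may be embedded during encoding, as described in Japanese Patent Application Laid-Open No. H11-284516, "Data processing apparatus, data processing method, and recording medium."
[0125]
The watermark data is not limited to the kinds described above; any other information that can identify a frame may be used. The generated stream may also be output to the communication line 311 via the communication interface 310 rather than only being stored in the storage device 305.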
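[0126]
<Sixth embodiment>
This embodiment describes the decoding of image data. The apparatus used is the information processing apparatus shown in FIG. 3, as used in the fifth embodiment. This embodiment also uses the MPEG-1 coding scheme for video and AAC coding for audio as examples, but the invention is not limited to these. The description takes as its example the decoding of the encoded data generated in the fifth embodiment and stored in the storage device 305.
[0127]
In this embodiment, in the information processing apparatus configured as in FIG. 3, the terminal 303 is used before processing to select the encoded data to be decoded from the moving-image encoded data stored in the storage device 305, and to instruct the apparatus to start. The software stored in the storage device 304 is then loaded into the memory 301 via the bus 302 and started.
[0128]
The memory 301 stores the OS, which controls the whole apparatus and runs the various software, together with the software to be run. It also contains an image area for image data, an audio area for audio data, a code area for the input encoded data, and a working area for parameters used in computation and decoding, watermark-related data, and the like.
[0129]
The decoding of image data in an information processing apparatus with this configuration is now described. FIG. 6 is a flowchart explaining the procedure by which the CPU 300 reads moving-image data from the storage device 305 and plays it back.
[0130]
First, before processing begins, a start instruction for the whole apparatus is issued from the terminal 303 and each unit of the apparatus is initialized (step S51).
[0131]
FIG. 8 is a schematic diagram explaining how the memory 301 is used during decoding. As shown in FIG. 8, the memory 301 holds the OS, which controls the whole apparatus and runs the various software, video decoding software that decodes image data, audio decoding software that decodes audio data, demultiplexing software that undoes the multiplexing and separates the individual sets of encoded data, and watermark extraction software that extracts and analyzes watermark data.
[0132]
The memory also contains an image area that holds images during decoding, an audio area that holds audio data, a code area that holds the input encoded data, and a working area that holds parameters for the various computations.
[0133]
The headers for multiplexing, audio encoding, and video encoding are then interpreted and the corresponding initial settings are made (step S52). Next, it is determined whether processing has ended (step S53). If it has not, processing proceeds to step S54.
[0134]
In step S54, encoded data is read from the predetermined position in the storage device 305 and separated one frame at a time into audio encoded data and video encoded data using the demultiplexing software in the memory 301, and each is stored in the code area of the memory 301. The video watermark data is then extracted from the video encoded data in the code area using the watermark extraction software in the memory 301 (step S55).
[0135]
Using the video decoding software in the memory 301, the video encoded data in the code area is decoded in accordance with the MPEG-1 coding scheme, the reproduced image data is stored in the image area of the memory 301, and the corresponding region of the code area is released (step S56). Then, using the watermark extraction software in the memory 301, the audio watermark data is extracted from the audio encoded data in the code area and its content is stored in the working area (step S57).
[0136]
Using the audio decoding software in the memory 301, the audio encoded data in the code area is decoded in accordance with the AAC coding scheme to reproduce one frame of audio data, which is stored in the audio area of the memory 301, and the corresponding region of the code area is released (step S58). Then, using the watermark extraction software in the memory 301, the ID number, date and time, time stamp, and other information are parsed from the audio watermark data and the video watermark data in the working area, and the results are stored in the working area by category (step S59).
[0137]
The ID numbers, dates and times, and other information stored in the working area of the memory 301 are then compared (step S60). If all of them match, the image data in the image area of the memory 301 is displayed on the monitor 308 and the audio data in the audio area is output from the speaker 309 in accordance with the time stamp, and the regions of the respective areas are released (step S61). Processing then returns to step S53 and the next frame is processed. When step S53 finds that all encoded data has been processed, the playback process ends.
[0138]
If, on the other hand, step S60 finds a mismatch, processing returns to step S53 and the next frame is processed without the content being played.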
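The playback control described above, in which mismatching frames are skipped rather than flagged, can be sketched as a small generator. The tuple layout and the name `playable_frames` are assumptions; the watermark information is treated as an already parsed tuple of fields.

```python
from typing import Iterable, Iterator


def playable_frames(
    frames: Iterable[tuple[str, str, tuple, tuple]]
) -> Iterator[tuple[str, str]]:
    """Yield (picture, sound) only for frames whose parsed watermark fields all agree.

    Each input item is (picture, sound, video_info, audio_info), where the two
    info tuples hold the parsed ID number, date/time, and time stamp.
    """
    for picture, sound, video_info, audio_info in frames:
        if video_info == audio_info:
            yield picture, sound
        # On a mismatch the frame is skipped and the next frame is processed.


frames = [
    ("v0", "a0", (7, 2002, 0), (7, 2002, 0)),
    ("v1", "a1", (7, 2002, 1), (9, 2002, 1)),  # ID differs: skipped
    ("v2", "a2", (7, 2002, 2), (7, 2002, 2)),
]
print(list(playable_frames(frames)))
```

Compared with the warning overlay of the second embodiment, this variant withholds both picture and sound for any frame that fails the check.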
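[0139]
Through this series of operations, comparing the watermark data embedded in the audio data and the video data, which changes continuously from frame to frame, makes it possible to detect tampering in fine units and to control how the content is played back.
[0140]
Although this embodiment uses MPEG-1 as the moving-image coding scheme, other schemes such as H.261, MPEG-2, or MPEG-4 may of course be used. Likewise, the audio coding scheme is not limited to the one described; MPEG-1 Layer 3 coding or ADPCM coding may be used.
[0141]
Some or all of the functions of this embodiment may be implemented in hardware.
[0142]
Although this embodiment assumes that the watermark data was embedded in the encoded data after encoding, the invention is not limited to this. For example, the watermark data may be extracted during decoding, as described in Japanese Patent Application Laid-Open No. H11-284516, "Data processing apparatus, data processing method, and recording medium."
[0143]
Beyond the embodiments described above, the present invention may be applied to a system made up of multiple devices (for example, a host computer, an interface device, a camcorder, and a video camera) or to an apparatus consisting of a single device (for example, a VTR or a television set).
[0144]
Needless to say, the object of the present invention is also achieved by supplying a system or apparatus with a recording medium (or storage medium) on which the program code of software implementing the functions of the above embodiments is recorded, and having the computer (or CPU or MPU) of that system or apparatus read and execute the program code stored on the recording medium. In that case, the program code read from the recording medium itself implements the functions of the above embodiments, and the recording medium on which the program code is recorded constitutes the present invention. It also goes without saying that this includes not only the case where the functions of the above embodiments are implemented by the computer executing the program code it has read, but also the case where an operating system (OS) or the like running on the computer performs part or all of the actual processing according to the instructions of the program code and the functions of the above embodiments are implemented by that processing.
[0145]
It further goes without saying that this includes the case where the program code read from the recording medium is written to a memory on a function expansion card inserted in the computer or in a function expansion unit connected to the computer, after which a CPU or the like on the function expansion card or function expansion unit performs part or all of the actual processing according to the instructions of the program code, and the functions of the above embodiments are implemented by that processing.
[0146]
When the present invention is applied to such a recording medium, the recording medium stores program code corresponding to the flowcharts described above.
[0147]
[Effects of the invention]
As described above, the present invention makes it possible to prevent tampering that destroys the identity between video data and audio data without requiring a specific file format, and to detect such tampering appropriately.
[0148]
Also, according to the present invention, embedding data shared between the audio data and the video data synchronized with it in the encoded data as watermark data makes it easy to determine the identity of the audio and video, that is, whether they form the correct combination.
[0149]
Further, according to the present invention, because the information is embedded as watermark data, it does not depend on a specific file format, and playback is unaffected even when the data is described in various file formats.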
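[Brief description of the drawings]
[FIG. 1] A block diagram showing the configuration of the encoding system in the first embodiment of the present invention.
[FIG. 2] A block diagram showing the configuration of the decoding system in the second embodiment of the present invention.
[FIG. 3] A block diagram showing the configuration of the information processing apparatus according to the fifth embodiment of the present invention.
[FIG. 4] A block diagram showing the configuration of the camcorder system according to the third embodiment of the present invention, which captures images and records them together with sound.
[FIG. 5] A flowchart explaining the procedure by which the CPU 300 records moving-image data on the storage device 305.
[FIG. 6] A flowchart explaining the procedure by which the CPU 300 reads moving-image data from the storage device 305 and plays it back.
[FIG. 7] A schematic diagram explaining how the memory 301 is used during encoding.
[FIG. 8] A schematic diagram explaining how the memory 301 is used during decoding.
[FIG. 9] A flowchart explaining the encoding procedure in the encoding system according to the first embodiment of the present invention.
[FIG. 10] A flowchart explaining the decoding procedure in the decoding system according to the second embodiment of the present invention.
[FIG. 11] A diagram illustrating examples of what is displayed on the monitor 21 when the watermark data for the video data and the audio data match (A) and when they do not (B).
[FIG. 12] A flowchart explaining the moving-image recording process in the camcorder system according to the third embodiment.
[FIG. 13] A flowchart explaining the moving-image playback process in the camcorder system according to the fourth embodiment.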
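[Explanation of reference numerals]
1, 11 Information processing apparatus
2, 101 Microphone
3, 108 Audio encoder
4, 306 Camera
5, 109 Video encoder
6, 110 Watermark generator
7, 8, 111, 112 Watermark embedder
9, 113 Multiplexer
10, 12 Storage device
13, 116 Demultiplexer
14, 15, 117, 118 Watermark extractor
16, 119 Comparator
17, 120 Audio decoder
18, 125 Video decoder
19 Display controller
20, 123, 309 Speaker
21, 308 Monitor
102 Optical system
103 Photoelectric converter
104, 105 A/D converter
106, 107, 121, 126 Frame memory
114 Recording medium controller
115 Recording medium
122, 127 D/A converter
124 Indicator
128 Viewer
300 CPU
301 Memory
302 Bus
303 Terminal
304, 305 Storage device
307 Audio capture
310 Communication interface
311 Communication line
[0001]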
[Technical field of the invention]
The present invention relates to an information processing apparatus and an information processing method for detecting falsification of video data and audio data related thereto, a program, and a computer-readable recording medium.
[0002]
[Prior art]
Known moving-image coding schemes include intra-frame schemes such as Motion JPEG (Joint Photographic Experts Group) and Digital Video, and schemes that use inter-frame predictive coding such as H.261, H.263, MPEG (Moving Picture Experts Group)-1, MPEG-2, and MPEG-4. These coding schemes are internationally standardized by ISO (International Organization for Standardization) and ITU (International Telecommunication Union).
[0003]
With the widespread use of the digital encoding standards as described above, the content industry, such as video and music, has strongly raised the problem of copyright protection. In response to this, standardization regarding content protection has been advanced, and in the MPEG-4 encoding method, a method of describing security information in a file system using IPMP OD and restricting reproduction by the security information has been standardized. Digital watermark technology has been developed for security-related information and encryption. The digital watermarking technique is a technique for embedding data at a level where data does not change during data reproduction or a change is not perceivable.
[0004]
Techniques for embedding a digital watermark in video data are disclosed in, for example, Japanese Patent Application Laid-Open No. H10-243398, "Recording medium recording a moving-image encoding program, and moving-image encoding apparatus," and Japanese Patent Application Laid-Open No. H11-341450, "Digital watermark embedding apparatus and extraction apparatus." Similarly, techniques for embedding a digital watermark in audio data are disclosed in Japanese Patent Application Laid-Open No. 2001-202089, "Method for embedding watermark information in audio data, watermark information embedding apparatus, watermark information detecting apparatus, recording medium in which watermark information is embedded, and recording medium recording a method for embedding watermark information," and Japanese Patent Application Laid-Open No. H11-316599, "Digital watermark embedding apparatus, audio encoding apparatus, and recording medium."
[0005]
Further, when a part of a still image is falsified by image processing or the like, a method of detecting this is disclosed in Japanese Patent Application Laid-Open No. 2001-78070, “Digital Camera and Image Falsification Detection System”.
[0006]
The digital watermark as described above is generally used to prevent falsification of video data and audio data and protect copyright.
[0007]
[Problems to be solved by the invention]
However, when part or all of the audio data in an original combination of video data and audio data is replaced using editing software or the like, a conventional copyright protection system cannot detect the replacement as tampering. For example, if a scene is shot, then shot again in the same way with the same camera but with different sound, and the audio data is swapped between the two recordings, it cannot be determined whether the result is the original data.
[0008]
As for the file format, various methods are adopted for frame synchronization and frame control. For example, even for the same Motion JPEG data, the format differs between the AVI file format and the QuickTime file format. Therefore, even if copyright protection is applied in one file format, such as the file format of the MPEG-4 encoding method, the protection information is lost when the data is converted to another file format that does not support that copyright protection.
[0009]
The present invention has been made in view of such circumstances. It is an object of the present invention to provide an information processing apparatus, an information processing method, a program, and a computer-readable recording medium that do not require the use of a specific file format and that are capable of preventing tampering that destroys the identity of video data and audio data.
[0010]
[Means for Solving the Problems]
In order to solve the above problems, the present invention provides an information processing apparatus for encoding video data and audio data synchronized with the video data, comprising: first encoding means for encoding the video data; second encoding means for encoding the audio data; watermark data generation means for generating predetermined watermark data; first watermark embedding means for embedding the watermark data in the encoded video data by a digital watermark; second watermark embedding means for embedding the watermark data in the encoded audio data by a digital watermark; and multiplexing means for generating multiplexed data obtained by multiplexing the video data and the audio data in each of which the watermark data is embedded.
[0011]
Also, in the information processing apparatus according to the present invention, the watermark data generation means generates, based on the video data and the audio data synchronized with the video data, the watermark data to be embedded in the video data and the audio data by a digital watermark.
[0012]
Further, the information processing apparatus according to the present invention comprises: first input means for inputting video data; second input means for inputting audio data synchronized with the video data; identity data generating means for generating identity data indicating that the video data and the audio data are synchronized; watermark data generating means for generating predetermined watermark data from the identity data; first watermark embedding means for embedding the watermark data in the video data by a digital watermark; second watermark embedding means for embedding the watermark data in the audio data by a digital watermark; and multiplexing means for generating multiplexed data obtained by multiplexing the video data in which the watermark data is embedded and the audio data in which the watermark data is embedded.
[0013]
Still further, the information processing apparatus according to the present invention further includes first encoding means for encoding the video data, wherein the first watermark embedding means embeds the watermark data in the encoded video data by a digital watermark.
[0014]
Still further, the information processing apparatus according to the present invention further includes second encoding means for encoding the audio data, wherein the second watermark embedding means embeds the watermark data in the encoded audio data by a digital watermark.
[0015]
Still further, the information processing apparatus according to the present invention is characterized by further comprising a recording unit for recording the multiplexed data on a portable recording medium.
[0016]
Still further, the information processing apparatus according to the present invention comprises: first input means for inputting video data in which first watermark data is embedded by a digital watermark; second input means for inputting audio data in which second watermark data is embedded by a digital watermark; first watermark extracting means for extracting the first watermark data embedded in the video data; second watermark extracting means for extracting the second watermark data embedded in the audio data; comparing means for comparing the first watermark data and the second watermark data for identity; and determining means for determining, based on the comparison result of the comparing means, whether the video data and the audio data are synchronized.
[0017]
Still further, the information processing apparatus according to the present invention comprises: multiplexing input means for inputting multiplexed data obtained by multiplexing video data in which first watermark data is embedded and audio data in which second watermark data is embedded; separating means for separating the video data and the audio data from the multiplexed data; first watermark extracting means for extracting the first watermark data embedded in the video data; second watermark extracting means for extracting the second watermark data embedded in the audio data; comparing means for comparing the first watermark data and the second watermark data for identity; and determining means for determining, based on the comparison result of the comparing means, whether the video data and the audio data are synchronized.
[0018]
Still further, the information processing apparatus according to the present invention is characterized in that the multiplexing input means reads and inputs the multiplexed data recorded on a portable recording medium.
[0019]
Still further, the information processing apparatus according to the present invention is characterized by further comprising first output means for outputting the video data, and second output means for outputting the audio data.
[0020]
Still further, the information processing apparatus according to the present invention is characterized by further comprising a first reproducing means for reproducing the video data.
[0021]
Still further, the information processing apparatus according to the present invention is characterized by further comprising a second reproducing means for reproducing the audio data.
[0022]
Still further, the information processing apparatus according to the present invention further comprises control means for causing, when the determining means determines that the video data and the audio data are not synchronized, information indicating that the video data is not synchronized with the audio data to be reproduced when the first reproducing means reproduces the video data.
[0023]
Still further, in the information processing apparatus according to the present invention, the video data and the audio data are encoded, and the apparatus further comprises first decoding means for decoding the encoded video data and second decoding means for decoding the encoded audio data.
[0024]
[Best Mode for Carrying Out the Invention]
Hereinafter, preferred embodiments of the present invention will be described with reference to the drawings.
[0025]
<First embodiment>
FIG. 1 is a block diagram illustrating a configuration of an encoding system according to the first embodiment of the present invention. In the present embodiment, an example will be described in which video data and audio data synchronized with it are encoded and recorded using this encoding system.
[0026]
As shown in FIG. 1, the encoding system according to the present embodiment includes an information processing apparatus 1 that generates encoded data, a microphone 2 that is connected to the information processing apparatus 1 and inputs audio data, a camera 4 that inputs video data continuously in frame units, and a storage device 10 that records the generated encoded data.
[0027]
The information processing apparatus 1 is provided with an audio encoder 3 that encodes audio data input from the microphone 2 in frame units. Here, the MPEG-1 Layer 3 encoding method will be described as an example, but the present invention is not limited to this method. Also, for ease of explanation, the length of one audio frame is set equal to the interval of one frame of video data.
[0028]
Further, the information processing apparatus 1 includes a video encoder 5 that encodes video data input from the camera 4 in frame units. In the present embodiment, the MPEG-4 encoding method will be described as an example of the encoding method, but the application of the present invention is not limited to this method. Furthermore, the information processing apparatus 1 includes a watermark generator 6 that generates a unique value for one recording.
[0029]
The watermark embedding unit 7 is connected to the audio encoder 3 and the watermark generator 6, and embeds the watermark data generated by the watermark generator 6 (hereinafter referred to as “audio watermark data”) in the encoded audio data (hereinafter referred to as “audio encoded data”). On the other hand, the watermark embedding unit 8 is connected to the video encoder 5 and the watermark generator 6, and embeds the watermark data generated by the watermark generator 6 (hereinafter referred to as “video watermark data”) in the encoded video data (hereinafter referred to as “video encoded data”). The multiplexer 9 is connected to the watermark embedding units 7 and 8, and multiplexes the audio encoded data in which the audio watermark data is embedded and the video encoded data in which the video watermark data is embedded into a single stream. The storage device 10 records and stores this stream data.
[0030]
Next, the flow of processing from encoding to storage of video data and audio data in the encoding system having the above configuration will be described in detail. FIG. 9 is a flowchart for explaining an encoding processing procedure in the encoding system according to the first embodiment of the present invention.
[0031]
First, prior to processing, each unit is initialized and an ID number is set (step S201). Here, the ID number is a number unique to the current process. For example, in the present embodiment, a random number represented by 4 bytes is used as the ID number. Note that the ID number is not limited to this value, and any number can be used. In addition, header data for multiplexing is generated, and preparations are made to write the output to a free area of the storage device 10. Thereafter, a recording operation of a moving image (in the present embodiment, including video and audio) is started.
[0032]
Next, it is determined whether the encoding process has ended (step S202). If the processing has not ended, the watermark generator 6 generates watermark data based on the ID number (step S203). In the present embodiment, data obtained by encrypting 4-byte data is used as the watermark data for each of the audio data and the video data. The generated audio watermark data is input to the watermark embedding unit 7, and the video watermark data is input to the watermark embedding unit 8.
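As a rough illustration of steps S201 and S203, the following Python sketch draws a 4-byte random ID number and derives 4-byte watermark data from it. The description does not say which cipher is used, so the XOR keystream taken from SHA-256, the key, the `label` argument, and all function names are assumptions made only for this example.

```python
import hashlib
import secrets


def new_id_number() -> bytes:
    """Return a random 4-byte ID number for one recording (step S201)."""
    return secrets.token_bytes(4)


def _keystream(key: bytes, label: bytes) -> bytes:
    # Hypothetical stand-in for the unspecified cipher:
    # the first 4 bytes of SHA-256(key || label).
    return hashlib.sha256(key + label).digest()[:4]


def encrypt_watermark(id_number: bytes, key: bytes, label: bytes) -> bytes:
    """XOR the 4-byte ID with a keystream to obtain 4-byte watermark data."""
    if len(id_number) != 4:
        raise ValueError("ID number must be 4 bytes")
    stream = _keystream(key, label)
    return bytes(a ^ b for a, b in zip(id_number, stream))


def decrypt_watermark(watermark: bytes, key: bytes, label: bytes) -> bytes:
    """Recover the ID number; XOR with the same keystream is its own inverse."""
    return encrypt_watermark(watermark, key, label)
```

With the same key and label on both sides, `decrypt_watermark(encrypt_watermark(i, k, l), k, l)` returns the original ID number, which is what the decoding side needs in order to compare the audio and video watermarks.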
[0033]
Next, each frame of the video data read from the camera 4 is encoded (step S204). That is, video data captured by the camera 4 is input to the video encoder 5 on a frame-by-frame basis. Then, the video encoder 5 encodes the input video data according to the MPEG-4 encoding method and holds the encoded data (video encoded data).
[0034]
Further, the video watermark data generated in step S203 is embedded in the encoded video data (step S205). That is, the watermark embedding unit 8 embeds the generated video watermark data in the video encoded data read from the video encoder 5. As the embedding method, for example, the highest-frequency coefficient of each block is changed within a range of ±1 so that its parity (odd or even) represents the watermark data. That is, if the bit to be embedded is 0, the last coefficient is made even, and if the bit is 1, the last coefficient is made odd. To do this, the code immediately before the EOB of the macroblock in which the bit is to be embedded is read out and changed if necessary.
[0035]
For example, when the immediately preceding code has a zero run length of 8 and a value of 3, and the value to be embedded is 0, the code is replaced with the code for a run length of 8 and a value of 4. In the actual code, “111111111110110110” is replaced with “111111111110110111”. If the value to be embedded is 1, the code is left unchanged. Note that the present invention is not limited to this, and an existing method such as the method described in “Moving picture digital watermarking system” in JP-A-11-341452 may be used.
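The parity rule described above can be sketched in Python as follows. This is a minimal illustration that works on decoded coefficient values rather than on the variable-length codes themselves; the function names are hypothetical, and the choice to step away from zero when the parity must change is an assumption (it keeps the coefficient non-zero and agrees with the 3 to 4 example in the text).

```python
def embed_bit(coeff: int, bit: int) -> int:
    """Return coeff changed by at most 1 so that its parity equals bit.

    bit 0 -> even coefficient, bit 1 -> odd coefficient.
    """
    if bit not in (0, 1):
        raise ValueError("bit must be 0 or 1")
    if coeff == 0:
        raise ValueError("expected the last non-zero coefficient of a block")
    if (abs(coeff) & 1) == bit:
        return coeff  # parity already matches: leave the code unchanged
    # Assumption: move away from zero so the coefficient never becomes 0.
    return coeff + 1 if coeff > 0 else coeff - 1


def extract_bit(coeff: int) -> int:
    """Read the embedded bit back from the coefficient's parity."""
    return abs(coeff) & 1


def embed_bits(last_coeffs, bits):
    """Embed one bit into the last coefficient of each block."""
    return [embed_bit(c, b) for c, b in zip(last_coeffs, bits)]
```

For the case in the text, `embed_bit(3, 0)` gives 4 and `embed_bit(3, 1)` leaves the value at 3.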
[0036]
On the other hand, a frame of the audio data read from the microphone 2 is encoded (step S206). That is, the audio data input from the microphone 2 is input to the audio encoder 3 in frame units. The audio encoder 3 encodes the input audio data according to the MPEG-1 Layer 3 encoding method and holds the encoded data (audio encoded data). Then, the audio watermark data generated in step S203 is embedded in the encoded audio data (step S207). That is, the watermark embedding unit 7 embeds the generated audio watermark data in the audio encoded data read from the audio encoder 3. As the embedding method, for example, a 4-byte value may be embedded by writing its 32 bits into the least significant bits (LSBs) of 32 samples.
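The LSB approach mentioned for the audio side can be sketched as below. The text does not state which values of the encoded audio frame serve as the 32 samples, so plain Python integers are assumed here, and the bit order (most significant bit first) and function names are likewise choices made for this example.

```python
def embed_lsb(samples, watermark: bytes):
    """Write the bits of watermark into the LSBs of the first samples.

    A 4-byte watermark uses 32 samples; each sample changes by at most 1.
    """
    nbits = len(watermark) * 8
    if len(samples) < nbits:
        raise ValueError("not enough samples for the watermark")
    value = int.from_bytes(watermark, "big")
    out = list(samples)
    for i in range(nbits):
        bit = (value >> (nbits - 1 - i)) & 1  # MSB first (assumed order)
        out[i] = (out[i] & ~1) | bit          # clear the LSB, then set it
    return out


def extract_lsb(samples, nbytes: int = 4) -> bytes:
    """Collect the LSBs of the first nbytes * 8 samples into bytes."""
    nbits = nbytes * 8
    if len(samples) < nbits:
        raise ValueError("not enough samples for the watermark")
    value = 0
    for sample in samples[:nbits]:
        value = (value << 1) | (sample & 1)
    return value.to_bytes(nbytes, "big")
```

Samples beyond the first 32 are left untouched, so the change to the audio frame is confined to one bit per carrying sample.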
[0037]
Then, the encoded data in which the respective watermark data is embedded is multiplexed by the multiplexer 9 and stored at a predetermined position in the storage device 10 (step S208). Thereafter, the multiplexed data is output (step S209), the process returns to step S202, and the process is repeated until all data has been input. Through this series of operations, the same information is embedded in the audio data and the video data by a digital watermark, which makes it possible to easily detect tampering on the decoding side.
[0038]
As described above, the information processing device 1 according to the present invention encodes video data and audio data synchronized with the video data. Specifically, the video encoder 5 encodes video data, and the audio encoder 3 encodes audio data. Further, the watermark generator 6 generates predetermined watermark data. Then, the watermark embedding unit 8 embeds the watermark data in the encoded video data by a digital watermark. On the other hand, the watermark embedding unit 7 embeds the watermark data in the encoded audio data by a digital watermark. Further, the multiplexer 9 generates multiplexed data in which the video data and the audio data, each having the watermark data embedded, are multiplexed.
[0039]
Further, the information processing apparatus 1 according to the present invention encodes video data and audio data synchronized with the video data. Specifically, the information processing apparatus 1 generates identity data indicating that the video data and the audio data are synchronized, and generates predetermined watermark data from the identity data. Then, the watermark data is embedded in the video data by a digital watermark. The watermark data is also embedded in the audio data by a digital watermark. Then, multiplexed data is generated by multiplexing the video data with the watermark data embedded therein and the audio data with the watermark data embedded therein.
[0040]
Further, the information processing apparatus 1 has a video encoder 5 for encoding video data, and the watermark embedding unit 8 embeds watermark data in the encoded video data by a digital watermark.
[0041]
Still further, the information processing apparatus 1 has an audio encoder 3 for encoding audio data, and the watermark embedding unit 7 embeds watermark data in the encoded audio data by a digital watermark.
[0042]
In the present embodiment, the encoding method of video data is MPEG-4, but other encoding methods, for example, H.261, MPEG-1, or MPEG-2, may be used. Similarly, the encoding method of audio data is not limited to the one described, and may be AAC encoding or ADPCM encoding.
[0043]
Further, some or all of the functions of the encoding system according to the present embodiment may be described in software and processed by an arithmetic device such as a CPU. In this embodiment, the watermark data is embedded in the encoded data after encoding. However, the present invention is not limited to this. For example, as described in “Data Processing Apparatus, Data Processing Method, and Recording Medium” in Japanese Patent Application Laid-Open No. H11-284516, the watermark data may be embedded during encoding.
[0044]
The audio watermark data and the video watermark data need not be identical. For example, a 4-byte code may be used as the video watermark data and its complement may be used as the audio watermark data.
[0045]
<Second embodiment>
FIG. 2 is a block diagram illustrating a configuration of a decoding system according to the second embodiment of the present invention. In the present embodiment, the MPEG-4 encoding method will be described as an example of a video data encoding method, and the MPEG-1 Layer 3 encoding method will be described as an example of an audio data encoding method, but the present invention is not limited to these.
[0046]
As illustrated in FIG. 2, the decoding system according to the present embodiment includes an information processing device 11 that decodes a moving image (including video and audio in the present embodiment), a storage device 12 that is connected to it and stores encoded data, a speaker 20 that outputs the decoded and reproduced audio data, and a monitor 21 that displays the reproduced video data. It is assumed that the storage device 12 stores the multiplexed encoded data generated in the first embodiment (hereinafter referred to as “multiplexed coded data”).
[0047]
Further, the information processing apparatus 11 is provided with a separator 13 (demultiplexer) that separates the multiplexed coded data multiplexed by the multiplexer 9 of FIG. 1 according to the first embodiment into video coded data and audio coded data. The watermark extractors 14 and 15 are connected to the separator 13. The watermark extractor 14 extracts audio watermark data from the encoded audio data. On the other hand, the watermark extractor 15 extracts video watermark data from the encoded video data.
[0048]
Further, both watermark extractors 14 and 15 are connected to a comparator 16 which compares the respective extracted watermark data.
[0049]
The watermark extractor 14 is connected to an audio decoder 17 that decodes the encoded audio data and reproduces the audio data. On the other hand, the watermark extractor 15 is connected to a video decoder 18 that decodes the encoded video data and reproduces the video data. The display controller 19 is connected to the comparator 16 and the video decoder 18 and controls the video data on a frame basis. Then, the monitor 21 displays the video data (video) reproduced under the control of the display controller 19.
[0050]
Next, the flow of the decoding process for moving image data in the decoding system having the above configuration will be described. FIG. 10 is a flowchart for explaining a decoding processing procedure in the decoding system according to the second embodiment of the present invention.
[0051]
First, similarly to the first embodiment, each unit is initialized prior to the processing operation, and the storage device 12 sets a read position at the head of the accumulated multiplexed coded data (step S301).
[0052]
Next, it is determined whether the decoding process has ended (step S302). If the processing has not ended, the multiplexed coded data is read from a predetermined position in the storage device 12, and the read multiplexed coded data is input to the separator 13 (step S303). Then, the separator 13 separates the audio coded data and the video coded data from the input multiplexed coded data in units of one frame (step S304). Of the two types of encoded data separated by the separator 13, the video coded data is output to the watermark extractor 15 and the audio coded data is output to the watermark extractor 14, in frame units.
[0053]
The watermark extractor 15 then extracts the video watermark data embedded in the encoded video data, decrypts it, outputs the result to the comparator 16, and outputs the encoded data to the video decoder 18 (step S305). The video decoder 18 decodes the encoded data and reproduces one frame of video data (step S306).
[0054]
At the same time, the watermark extractor 14 extracts the audio watermark data embedded in the encoded audio data, decrypts it, outputs the result to the comparator 16, and outputs the encoded data to the audio decoder 17 (step S307). The audio decoder 17 decodes the encoded data and reproduces one frame of audio data (step S308).
[0055]
The comparator 16 compares the audio watermark data extracted by the watermark extractor 14 with the video watermark data extracted by the watermark extractor 15 (step S309). If the two pieces of watermark data do not match, the process proceeds to step S310. In step S310, this mismatch is notified to the display controller 19, so that a warning is superimposed on the frame of the reproduced video data. That is, if the watermark data does not match in the comparator 16, a display indicating that the audio data and the video data do not match is superimposed on the output of the video decoder 18 and output to the monitor 21, where it is displayed so that it can be confirmed visually.
[0056]
On the other hand, if the comparator 16 determines that the watermark data matches, the display controller 19 outputs the output of the video decoder 18 to the monitor 21 as it is, and displays an image on the monitor 21. At this time, the decoding result of the audio data synchronized with the video data is simultaneously output to the speaker 20 (step S311). In this way, one frame of the reproduced video data is displayed, and at the same time, the audio data synchronized with the one frame is reproduced. Then, when the processing has been completed for all the encoded data in step S302, the processing of the entire decoding system ends.
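The comparison and display control of steps S309 to S311 can be sketched per frame as follows. This is only an illustrative outline: the `Presented` record, the `present_frame` function, and the warning text are hypothetical names introduced here, and frames are represented by plain strings instead of decoded image and sound data.

```python
from dataclasses import dataclass
from typing import Optional

WARNING_TEXT = "AUDIO/VIDEO MISMATCH"  # hypothetical wording for the warning


@dataclass
class Presented:
    video_frame: str        # frame sent to the monitor
    audio_frame: str        # frame sent to the speaker
    overlay: Optional[str]  # warning superimposed on the video, or None


def present_frame(video_wm: bytes, audio_wm: bytes,
                  video_frame: str, audio_frame: str) -> Presented:
    """Compare the two extracted watermarks and decide what to display.

    Matching watermarks: the video frame is shown as it is.
    Mismatch: a warning is superimposed on the video frame.
    The audio frame is output in both cases.
    """
    overlay = None if video_wm == audio_wm else WARNING_TEXT
    return Presented(video_frame, audio_frame, overlay)
```

A player loop would call `present_frame` once per separated frame pair and hand the result to whatever draws the monitor output.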
[0057]
FIG. 11 is a diagram for explaining examples of the image displayed on the monitor 21 when the watermark data for the video data and the audio data match and when they do not match. As shown in FIG. 11A, if the two pieces of watermark data match, the image 1000 is displayed as it is, indicating that there has been no tampering. If the two pieces of watermark data do not match, as shown in FIG. 11B, an image on which the telop 1001 is superimposed is displayed, indicating that there has been tampering. At this time, the decoding result of the audio data is output to the speaker 20 simultaneously with the image display.
[0058]
Through this series of operations, the watermark data embedded in the audio data and the video data is compared, which makes it possible to detect falsification of the data and to notify the user of the falsification.
[0059]
That is, in the information processing apparatus 11 according to the present invention, the watermark extractor 15 extracts the first watermark data from the video data in which the first watermark data (video watermark data) is embedded by the digital watermark. Further, the watermark extractor 14 extracts the second watermark data from the audio data in which the second watermark data (audio watermark data) is embedded by the digital watermark. Then, the comparator 16 compares the first watermark data and the second watermark data for identity. Then, the information processing device 11 determines whether or not the video data and the audio data are synchronized based on the result of that comparison.
[0060]
Further, in the information processing apparatus 11 according to the present invention, the separator 13 separates the video data and the audio data from multiplexed data obtained by multiplexing the video data in which the first watermark data is embedded and the audio data in which the second watermark data is embedded. Then, the watermark extractor 15 extracts the first watermark data embedded in the video data. Further, the watermark extractor 14 extracts the second watermark data embedded in the audio data. Then, the comparator 16 compares the first watermark data and the second watermark data for identity. Then, the information processing device 11 determines whether or not the video data and the audio data are synchronized based on the result of that comparison.
[0061]
Further, when it is determined that the video data and the audio data are not synchronized, the information processing device 11 performs control so that information (the telop 1001) indicating that the video data is not synchronized with the audio data is reproduced when the video data is reproduced on the monitor 21.
[0062]
Furthermore, the video data and the audio data are encoded, and the information processing apparatus 11 further includes a video decoder 18 that decodes the encoded video data and an audio decoder 17 that decodes the encoded audio data.
[0063]
Furthermore, the information processing device 11 outputs video data and outputs audio data. Furthermore, the information processing apparatus 11 is characterized in that it can be connected to a monitor 21 for reproducing video data. Furthermore, the information processing apparatus 11 is characterized in that it can be connected to a speaker 20 for reproducing audio data.
[0064]
In the present embodiment, the moving image encoding method is MPEG-4, but other encoding methods, for example, H.261, MPEG-1, MPEG-2, or Motion JPEG, may be used. Similarly, the audio encoding method is not limited to the one described, and may be AAC encoding or ADPCM encoding.
[0065]
Further, each part or all functions of the decoding system according to the present embodiment may be described in software, and may be processed by an arithmetic device such as a CPU.
[0066]
Further, the present embodiment has been described for the case where, as in the first embodiment, the watermark data was embedded in the encoded data after encoding. However, the present invention is not limited to this case. For example, as described in “Data Processing Apparatus, Data Processing Method, and Recording Medium” in JP-A-11-284516, the watermark data may be extracted during decoding.
[0067]
Also, the audio watermark data and the video watermark data need not be identical. For example, a 4-byte code may be used as the video watermark data and its complement as the audio watermark data, and the comparator 16 may perform an operation on the two to verify their identity.
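One way the comparator could verify such a complementary pair is sketched below. It assumes that "complement" means the bitwise (one's) complement of a 32-bit value; the function names and the XOR-based check are choices made for this illustration rather than details given in the description.

```python
MASK32 = 0xFFFFFFFF  # 4-byte watermark codes are treated as 32-bit integers


def audio_watermark_from_video(video_wm: int) -> int:
    """Derive the audio watermark as the bitwise complement of the video one."""
    return ~video_wm & MASK32


def watermarks_match(video_wm: int, audio_wm: int) -> bool:
    """True if audio_wm is exactly the 32-bit complement of video_wm.

    A value XORed with its complement has every bit set.
    """
    return (video_wm ^ audio_wm) & MASK32 == MASK32
```

Because the two embedded values differ in every bit, copying the video watermark directly into replaced audio data would fail this check.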
[0068]
<Third embodiment>
Next, a third embodiment of the present invention will be described in detail with reference to the drawings.
[0069]
FIG. 4 is a block diagram showing a configuration of a camcorder system according to a third embodiment of the present invention, which captures an image and records the captured image together with audio. In FIG. 4, reference numeral 101 denotes a microphone for inputting audio data, 102 denotes an optical system including a lens and the like, and 103 denotes a photoelectric converter including a CCD or the like that generates an electric signal according to the intensity of light.
[0070]
Reference numerals 104 and 105 denote A / D converters that convert analog electric signals into digital signals, and reference numerals 106, 107, 121, and 126 denote frame memories that store digital signals for each frame. Reference numeral 108 denotes an audio encoder that performs audio encoding on a frame basis. Furthermore, reference numeral 109 denotes a video encoder that performs video encoding on a frame basis. On the other hand, reference numeral 110 denotes a watermark generator for generating watermark data from the outputs of the audio encoder 108 and the video encoder 109. For example, data shared between audio data and video data synchronized therewith may be used as watermark data.
[0071]
Reference numeral 111 denotes a watermark embedding unit for embedding watermark data in audio encoded data, and reference numeral 112 denotes a watermark embedding unit for embedding watermark data in video encoded data. Reference numeral 113 denotes a multiplexer that multiplexes the audio encoded data and the video encoded data to form a stream. Further, reference numeral 114 denotes a recording medium controller which reads and writes a stream from and to the recording medium 115. Note that the recording medium 115 is constituted by a magneto-optical disk or the like, but the present invention is not limited to this.
[0072]
Further, reference numeral 116 denotes a separator for separating audio encoded data and video encoded data from the stream. Furthermore, reference numeral 117 denotes a watermark extractor for extracting watermark data from audio encoded data, and 118 denotes a watermark extractor for extracting watermark data from video encoded data. Further, reference numeral 119 denotes a comparator for comparing the extracted watermark data.
[0073]
Further, reference numeral 120 denotes an audio decoder that decodes the encoded audio data and reproduces the audio data. Reference numeral 122 denotes a D / A converter that converts audio data in the frame memory 121 from digital to analog signals, and reference numeral 123 denotes a speaker that reproduces analog audio signals and emits sound. Reference numeral 124 denotes an indicator for displaying the result of the watermark comparator 119, which is constituted by an LED or the like.
[0074]
Furthermore, reference numeral 125 denotes a video decoder that decodes encoded video data and reproduces image data. Reference numeral 127 denotes a D/A converter for converting video data in the frame memory 126 from digital to analog signals, and 128 denotes a viewer for displaying analog video signals, which is constituted by an LCD or the like.
[0075]
Next, a recording operation of a moving image in the camcorder according to the embodiment having the above-described configuration will be described. In the present embodiment, an MPEG-4 encoding method will be described as an example of a video encoding method, and an AAC encoding method will be described as an example of an audio encoding method. For ease of explanation, the frame size is set to the interval of one video frame.
[0076]
First, a process of recording encoded data will be described. FIG. 12 is a flowchart illustrating a moving image recording process in the camcorder system according to the third embodiment. First, the apparatus is initialized, various headers are generated and stored in the recording device, and an ID number is set (step S401). That is, when the camcorder is powered on, each unit is initialized. Then, variation between the sensors of the photoelectric converter 103 is corrected, and the frame memories 106 and 107 are cleared. Further, the recording medium controller 114 sets the writing position at the head of the empty area on the recording medium 115.
[0077]
When the user presses a recording button (not shown), recording of a moving image is started. The multiplexer 113 generates header data required for multiplexing, and writes the header data to a predetermined area on the recording medium 115 via the recording medium controller 114. The audio encoder 108 generates header data according to the AAC encoding method. The video encoder 109 generates header data according to the MPEG-4 encoding method. Each generated header data is stored at a predetermined position on the recording medium 115 via the multiplexer 113 and the recording medium controller 114. Also, an ID number is set as in the first embodiment. In this embodiment, a 16-bit integer selected from random numbers is used as an ID number.
[0078]
Next, the end of the process is determined (step S402). If it is not finished, one frame of the input video data is encoded according to the MPEG-4 encoding method (step S403). That is, the photoelectric converter 103 converts the light that has entered through the optical system 102 into an electric signal, which the A/D converter 105 converts into a digital signal and stores in the frame memory 107 in frame units. The video encoder 109 encodes the image data in input frame units according to the MPEG-4 encoding method. The generated encoded data is input to the watermark generator 110 and the watermark embedding unit 112.
[0079]
At the same time, one frame of the input audio data is encoded according to the AAC encoding method (step S404). That is, the audio signal input from the microphone 101 is converted into a digital signal by the A / D converter 104 and stored in the frame memory 106 on a frame basis. The audio encoder 108 encodes audio data in input frame units according to the AAC encoding method. The generated encoded data is input to the watermark generator 110 and the watermark embedding unit 111.
[0080]
Next, the watermark generator 110 generates video watermark data from the ID number, part of the video encoded data, and part of the audio encoded data, and generates audio watermark data from the ID number and part of the video encoded data (step S405). That is, the watermark generator 110 reads out predetermined bits from the encoded data input from the audio encoder 108. For example, predetermined bits such as the 3rd bit, the 9th bit, the 17th bit, ... from the head are read out as 16-bit audio identification data.
[0081]
In addition, predetermined bits are read from the encoded data input from the video encoder 109. For example, predetermined bits such as the 54th bit, the 61st bit, the 77th bit, ... from the head are read out as 16-bit video identification data. The bits of the ID number, the audio identification data, and the video identification data are arranged in order from the higher order to create 48-bit data, which is used as the video watermark data. Also, the bits of the video identification data and the ID number are arranged in order from the higher order to create 32-bit data, which is used as the audio watermark data.
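The construction of the two watermark values in step S405 can be sketched as follows. This is only an illustrative sketch under stated assumptions: bits are counted from 1 starting at the most significant bit of the first byte, only the first three bit positions in each list come from the description (the remaining thirteen are hypothetical placeholders), the audio watermark is assumed to place the video identification data above the ID number, and the function names are not taken from the specification.

```python
from typing import Sequence, Tuple


def read_bits(data: bytes, positions: Sequence[int]) -> int:
    """Collect the bits at the given 1-based, MSB-first positions into one integer."""
    value = 0
    for pos in positions:
        byte_index, bit_index = divmod(pos - 1, 8)
        bit = (data[byte_index] >> (7 - bit_index)) & 1
        value = (value << 1) | bit
    return value


# Only 3, 9, 17 and 54, 61, 77 appear in the description; the rest are assumed.
AUDIO_POSITIONS = [3, 9, 17] + list(range(20, 33))   # 16 positions in total
VIDEO_POSITIONS = [54, 61, 77] + list(range(80, 93))  # 16 positions in total


def make_watermarks(id_number: int, audio_code: bytes, video_code: bytes) -> Tuple[int, int]:
    """Return (video_watermark, audio_watermark) for one frame."""
    audio_id = read_bits(audio_code, AUDIO_POSITIONS)  # 16-bit audio identification data
    video_id = read_bits(video_code, VIDEO_POSITIONS)  # 16-bit video identification data
    # 48 bits: ID number, audio identification data, video identification data
    video_wm = (id_number << 32) | (audio_id << 16) | video_id
    # 32 bits: video identification data, ID number (order assumed)
    audio_wm = (video_id << 16) | id_number
    return video_wm, audio_wm
```

Because both identification values are taken from the encoded data of the current frame, the watermark values differ from frame to frame even though the ID number stays fixed.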
[0082]
Thereafter, the video watermark data generated in step S405 is embedded in the encoded video data (step S406). That is, the watermark embedding unit 112 embeds the video watermark data in the encoded video data and inputs the resulting encoded data to the multiplexer 113.
[0083]
At the same time, the audio watermark data generated in step S405 is embedded in the encoded audio data (step S407). That is, the watermark embedding unit 111 embeds the audio watermark data in the encoded audio data and inputs the resulting encoded data to the multiplexer 113.
[0084]
Then, the multiplexer 113 multiplexes these two sets of encoded data and stores the result at a predetermined position on the recording medium 115 via the recording medium controller 114 (step S408). The multiplexed stream is thus output and stored on the recording medium 115 (step S409), and the process returns to step S402 to repeat the processing until shooting is completed. When the user releases the recording button (not shown) or presses it again, it is determined that recording has been completed, and the process is terminated.
[0085]
By embedding data that changes for each frame as a watermark through the series of operations described above, data that facilitates frame-by-frame detection of tampering can be suitably generated.
[0086]
That is, the present invention is characterized in that the watermark generator 110 generates common watermark data to be embedded in the video data and the audio data as a digital watermark, based on the video data and the audio data synchronized with the video data. Further, the present invention is characterized in that the multiplexed data is recorded on the portable recording medium 115.
[0087]
In the present embodiment, the moving picture encoding method is MPEG-4, but other encoding methods, for example H.261, MPEG-1, MPEG-2, and Motion JPEG, may of course be used. Similarly, the audio encoding method is not limited to AAC, and may be MPEG-1 Layer 2 encoding or ADPCM encoding.
[0088]
Also, each part or all functions of the camcorder system according to the present embodiment may be described in software, and may be processed by an arithmetic device such as a CPU.
[0089]
In this embodiment, the watermark data is embedded in the encoded data after encoding. However, the present invention is not limited to this. For example, as described in "Data Processing Apparatus, Data Processing Method, and Recording Medium" of Japanese Patent Application Laid-Open No. 11-284516, the watermark data may be embedded while encoding.
[0090]
Further, the audio frame size is set to the interval of one video frame, but it is not limited thereto, and the audio frame size and the video frame size may differ. For example, even if the audio is at 20 ms intervals and the video is at 33 ms intervals, if an audio frame boundary falls within the video time interval, watermark data may be created for the encoded data starting from that boundary, and a plurality of watermark data may be embedded.
[0091]
<Fourth embodiment>
In the present embodiment, processing up to reproduction of a moving image in the camcorder system of the third embodiment described above will be described. FIG. 13 is a flowchart for explaining moving image reproduction processing in the camcorder system according to the fourth embodiment.
[0092]
First, similarly to the second embodiment, each unit of the apparatus is initialized prior to the operation (step S501). Then, the recording medium controller 114 sets a read position at the head of the encoded data stored in the recording medium 115. Next, the end of the process is determined (step S502). As a result, if the processing has not been completed, encoded data is read from a predetermined position on the recording medium 115, and the read encoded data is input to the separator 116 (step S503).
[0093]
Subsequently, the separator 116 separates the audio coded data and the video coded data from the input coded data. On a frame basis, the audio coded data is sent to the watermark extractor 117, and the video coded data is sent to the watermark extractor 118 (step S504).
[0094]
Then, the watermark extractor 117 extracts the audio watermark data embedded in the encoded audio data, outputs the result to the comparator 119, and outputs the encoded data to the comparator 119 and the audio decoder 120 (step S507). Similarly, the watermark extractor 118 extracts the video watermark data embedded in the encoded video data, outputs the result to the comparator 119, and outputs the encoded data to the video decoder 125 (step S505).
[0095]
Then, the video decoder 125 decodes the encoded image data and reproduces one frame of image data (step S506). At the same time, the audio decoder 120 decodes the encoded audio data and reproduces one frame of audio data (step S508).
[0096]
The comparator 119 obtains the ID number, the audio identification data, and the video identification data from the audio watermark data and the video watermark data, and compares them (step S509). As a result, if all match, the indicator 124 is turned on (step S510). On the other hand, if there is a mismatch, the indicator 124 is turned off (step S511).
[0097]
That is, the audio watermark data extracted by the watermark extractor 117 and the video watermark data extracted by the watermark extractor 118 are input to the comparator 119. Then, the ID number A and the video identification data A are reproduced from the audio watermark data. Further, the audio identification data A is generated from the input audio encoded data by the same method as the audio identification data generation method of the third embodiment. Further, an ID number V, audio identification data V, and video identification data V are reproduced from the video watermark data.
[0098]
Then, the ID numbers, the video identification data, and the audio identification data are compared with each other: ID number A with ID number V, video identification data A with video identification data V, and audio identification data A with audio identification data V. If at least one pair does not match, the indicator 124 is turned on. If all match, it is turned off.
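The three-way comparison performed by the comparator 119 can be sketched as follows. This is a minimal sketch, assuming the 48-bit video watermark holds the ID number, the audio identification data, and the video identification data from the high-order side, and the 32-bit audio watermark holds the video identification data above the ID number; the function names are illustrative, and how the result drives the indicator 124 is deliberately left out.

```python
from typing import Tuple


def split_video_watermark(wm: int) -> Tuple[int, int, int]:
    """Return (ID number V, audio identification data V, video identification data V)."""
    return (wm >> 32) & 0xFFFF, (wm >> 16) & 0xFFFF, wm & 0xFFFF


def split_audio_watermark(wm: int) -> Tuple[int, int]:
    """Return (ID number A, video identification data A); bit layout is assumed."""
    return wm & 0xFFFF, (wm >> 16) & 0xFFFF


def frames_match(video_wm: int, audio_wm: int, audio_id_a: int) -> bool:
    """Compare the A and V values pairwise.

    audio_id_a is the audio identification data regenerated from the
    received audio encoded data, as described for the comparator.
    """
    id_v, audio_id_v, video_id_v = split_video_watermark(video_wm)
    id_a, video_id_a = split_audio_watermark(audio_wm)
    return id_a == id_v and video_id_a == video_id_v and audio_id_a == audio_id_v
```

A frame passes only when all three pairs agree, so replacing either the audio or the video of a frame changes at least one of the compared values.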
[0099]
Then, an image of one frame of the reproduced video data is displayed on the viewer, and the audio data is reproduced (step S512). Then, the process proceeds to step S502 to perform processing of the next frame. If the processing is completed for all the encoded data in step S502, the reproduction processing ends.
[0100]
That is, the encoded video data is input to the video decoder 125, and the encoded audio data is input to the audio decoder 120, where they are decoded and stored in the frame memory 126 and the frame memory 121, respectively. The audio data stored in the frame memory 121 is converted into an analog signal by the D/A converter 122 and reproduced by the speaker 123. The video data stored in the frame memory 126 is likewise converted into an analog signal by the D/A converter 127 and displayed on the viewer 128.
[0101]
Through such a series of operations, by comparing the data that changes for each frame and is embedded in the audio data and the video data, tampering can be detected in small units and the user can be notified of it.
[0102]
As described above, the present invention is characterized in that the multiplexed data is read from the portable recording medium 115, which stores multiplexed data in which video data with embedded video watermark data and audio data with embedded audio watermark data are multiplexed.
[0103]
In the present embodiment, the moving picture encoding method is MPEG-4, but it is not limited to this; encoding methods such as H.261, MPEG-1, MPEG-2, and MPEG-4 may of course be used. Similarly, the encoding method of audio data is not limited to AAC, and may be MPEG-1 Layer 3 encoding or ADPCM encoding.
[0104]
Also, each part or all functions of the camcorder system according to the present embodiment may be described in software, and may be processed by an arithmetic device such as a CPU.
[0105]
In this embodiment, the watermark data is embedded in the encoded data later. However, the application of the present invention is not limited to this. In the present invention, for example, watermark data may be extracted while decoding, as described in “Data Processing Apparatus, Data Processing Method, and Recording Medium” in JP-A-11-284516.
[0106]
<Fifth embodiment>
FIG. 3 is a block diagram illustrating a configuration of an information processing apparatus according to a fifth embodiment of the present invention. In FIG. 3, reference numeral 300 denotes a central processing unit (CPU) that controls the entire apparatus and performs various processes, and 301 denotes a memory that provides the storage area required for the operating system (OS) and software needed to control the apparatus, and for computation. A bus 302 connects the various devices and carries data and control signals.
[0107]
Reference numeral 303 denotes a terminal for starting the apparatus, setting various conditions, and instructing reproduction. Reference numeral 304 denotes a storage device for storing software. Reference numeral 305 denotes a storage device that stores streams. Note that the storage devices 304 and 305 can be configured as a medium (portable recording medium) that can be detached from the system and moved. Reference numeral 306 denotes a camera that captures an image, and 307 denotes an audio capture that captures audio data. Further, reference numeral 308 denotes a monitor for displaying an image, and 309 denotes a speaker for reproducing audio data. Further, reference numeral 311 denotes a communication line, which comprises a LAN, a public line, a wireless line, a broadcast wave, or the like. Reference numeral 310 denotes a communication interface for transmitting and receiving a stream via the communication line 311.
[0108]
Here, the memory 301 stores an OS that controls the entire apparatus and runs various software, together with the software to be run. It also contains an image area for storing image data, an audio area for storing audio data, a code area for storing generated encoded data, and a working area for storing parameters for various operations and encoding, data related to the watermark, and the like.
[0109]
Next, encoding processing of image data in the information processing apparatus having such a configuration will be described. FIG. 5 is a flowchart for explaining an operation procedure of recording moving image data in the storage device 305 by the CPU 300.
[0110]
First, prior to the processing, the terminal 303 instructs the entire apparatus to start up, and each unit of the apparatus is initialized. Then, an ID number is set and stored in the working area on the memory 301 (step S1). In the present embodiment, the ID number is determined by a random number.
[0111]
Next, the software stored in the storage device 304 is expanded in the memory 301 via the bus 302, and the software is activated. As a result, headers for multiplexing, audio encoding, and video encoding are generated, multiplexed, and written in a predetermined position of the storage device 305 (step S2). In video coding, the coded data of the sequence layer is generated as a header according to the MPEG-1 coding method.
[0112]
FIG. 7 is a schematic diagram for explaining the use and storage state of the memory 301 at the time of encoding. As shown in FIG. 7, the memory 301 stores an OS for controlling the entire apparatus and running various software, video encoding software for encoding image data, audio encoding software for encoding audio data, multiplexing software for multiplexing encoded data, and watermark embedding software for generating and embedding watermark data.
[0113]
In the present embodiment, the video encoding software is described as software for encoding by the MPEG-1 encoding method, but the application of the present invention is not limited to this. Also, the audio encoding software will be described as software that encodes using the AAC encoding method, but is not limited to this.
[0114]
The memory 301 also contains an image area for storing images at the time of encoding, an audio area for storing audio data, a code area for storing generated codes and encoded data in which watermarks are embedded, and a working area for storing parameters for various operations and the like. In such a configuration, a moving image is input from the camera 306 according to an instruction from the terminal 303, and audio is input from the audio capture 307.
[0115]
Therefore, the end of the process is determined next (step S3). As a result, if the processing is not completed, one frame of video data is read from the camera 306 into the image area on the memory 301 (step S4).
[0116]
Next, the video data stored in the image area on the memory 301 is encoded by the MPEG-1 encoding method using the video encoding software on the memory 301 and stored in the code area on the memory 301, and the area holding that video frame data is released (step S5). Further, one frame of audio data is read from the audio capture 307 and stored in the audio area on the memory 301 (step S6). Furthermore, the audio data stored in the audio area on the memory 301 is encoded by the AAC encoding method using the audio encoding software on the memory 301 and stored in the code area on the memory 301, and the area holding that audio frame data is released (step S7).
[0117]
Then, watermark data is generated from the ID number stored in the working area on the memory 301, the recording date and time when encoding the frame, and the time stamp of the video frame, and the data in the working area on the memory 301 is updated (step S8). For example, if the ID number is 16 bits, the recording date and time is 14 bits in the Christian era, 2 bits in the month, 5 bits in the date and time, 6 bits in the second, and 10 bits in the time stamp, the data becomes 53 bits in total. This is encrypted to obtain watermark data. After generating the watermark data, the process proceeds to step S9.
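The 53-bit total in step S8 follows from adding the stated field widths: 16 + 14 + 2 + 5 + 6 + 10 = 53. A minimal sketch of packing such fields into one integer is shown below. The widths are taken as stated in the text; the field names, the high-to-low packing order, and the range check are assumptions, and the subsequent encryption step is omitted because the text does not specify the cipher.

```python
from typing import Dict

# (field name, width in bits); names are illustrative labels for the fields in the text
FIELD_WIDTHS = [
    ("id_number", 16),
    ("year", 14),
    ("month", 2),
    ("date", 5),
    ("second", 6),
    ("time_stamp", 10),
]

TOTAL_BITS = sum(width for _, width in FIELD_WIDTHS)  # 53


def pack_fields(values: Dict[str, int]) -> int:
    """Concatenate the fields from the high-order side into one integer."""
    packed = 0
    for name, width in FIELD_WIDTHS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        packed = (packed << width) | value
    return packed
```

With these widths the ID number occupies the top 16 bits and the time stamp the bottom 10 bits of the packed value.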
[0118]
In step S9, using the watermark embedding software on the memory 301, the watermark data in the working area is embedded in the video encoded data in the code area, the result is stored in the code area, and the area of the encoded data before embedding is released. The process then proceeds to step S10. In step S10, the watermark data in the working area is embedded in the audio encoded data in the code area, the result is stored in the code area, the area of the encoded data before embedding is released, and the process proceeds to step S11.
[0119]
In step S11, the audio encoded data and the video encoded data generated and stored in the code area on the memory 301 are multiplexed using the multiplexing software and written to a predetermined area of the storage device 305. The corresponding area of the code area is then released, and the process returns to step S3 to process the next frame data.
[0120]
On the other hand, if the processing has been completed for all the frame data in step S3, the recording processing ends.
[0121]
Through such a series of operations, data that changes continuously for each frame is embedded in the audio data and the video data; by comparing this data, tampering can be detected in small units and the user can be notified of it.
[0122]
In the present embodiment, the moving picture encoding method is MPEG-1, but it is not limited to this; encoding methods such as H.261, MPEG-1, MPEG-2, and MPEG-4 may of course be used. Similarly, the audio encoding method is not limited to AAC, and may be MPEG-1 Layer 3 encoding or ADPCM encoding.
[0123]
Further, each part or all of the functions of the information processing apparatus according to the present embodiment may be configured by hardware to perform processing.
[0124]
In this embodiment, the watermark data is embedded in the encoded data after encoding. However, the application of the present invention is not limited to this. For example, as described in "Data Processing Apparatus, Data Processing Method, and Recording Medium" of JP-A-11-284516, the watermark data may be embedded while encoding.
[0125]
Further, the watermark data is not limited to the above-described type, and other information that can specify a frame may be used. Further, the generated stream may not only be stored in the storage device 305 but also output to the communication line 311 via the communication interface 310.
[0126]
<Sixth embodiment>
In the present embodiment, a decoding process of image data will be described. The image data processing apparatus is the information processing apparatus shown in FIG. 3 and used in the fifth embodiment. In the present embodiment as well, MPEG-1 encoding for video and AAC encoding for audio will be described as examples, but the application of the present invention is not limited to these. Accordingly, the decoding of the encoded data generated in the fifth embodiment and stored in the storage device 305 will be described as an example.
[0127]
In the present embodiment, in the information processing apparatus having the configuration shown in FIG. 3, prior to the processing, the encoded data to be decoded is selected at the terminal 303 from the encoded moving image data stored in the storage device 305, and the start of decoding is instructed. The software stored in the storage device 304 is then expanded in the memory 301 via the bus 302 and started.
[0128]
The memory 301 stores an OS for controlling the entire apparatus and operating various software and operating software, and stores an image area for storing image data, an audio area for storing audio data, and input coded data. There are a code area to be stored, a working area to store parameters and the like at the time of various operations and decoding, data related to a watermark, and the like.
[0129]
A description will be given of a decoding process of image data in the information processing apparatus having such a configuration. FIG. 6 is a flowchart for explaining an operation procedure of reading and reproducing moving image data from the storage device 305 by the CPU 300.
[0130]
First, prior to the processing, the terminal 303 instructs the entire apparatus to be activated, and each unit of the apparatus is initialized (step S51).
[0131]
FIG. 8 is a schematic diagram for explaining the use and storage state of the memory 301 at the time of decoding. As shown in FIG. 8, the memory 301 stores an OS for controlling the entire apparatus and running various software, video decoding software for decoding image data, audio decoding software for decoding audio data, demultiplexing software for demultiplexing the encoded data and separating each set of encoded data, and watermark extraction software for extracting and analyzing watermark data.
[0132]
Further, there are an image area for storing an image at the time of decoding, an audio area for storing audio data, a code area for storing input encoded data, and a working area for storing parameters for various calculations.
[0133]
Then, the respective headers of the multiplexing, the audio encoding, and the video encoding are interpreted, and the initialization of each part is performed (step S52). Further, it is determined whether the processing is completed (step S53). As a result, when the process is not completed, the process proceeds to step S54.
[0134]
In step S54, the coded data is read from a predetermined position in the storage device 305 and, using the demultiplexing software on the memory 301, separated into coded audio data and coded video data in units of one frame, which are stored in the code area on the memory 301. Then, video watermark data is extracted from the encoded video data stored in the code area on the memory 301 by using the watermark extraction software on the memory 301 (step S55).
[0135]
Further, using the video decoding software on the memory 301, the video encoded data stored in the code area on the memory 301 is decoded according to the MPEG-1 encoding method; the reproduced image data is stored in the image area on the memory 301, and the corresponding area of the code area is released (step S56). Then, the audio watermark data is extracted from the encoded audio data stored in the code area on the memory 301 using the watermark extraction software on the memory 301, and its contents are stored in the working area (step S57).
[0136]
Also, using the audio decoding software on the memory 301, the audio encoded data stored in the code area on the memory 301 is decoded according to the AAC encoding method to reproduce one frame of audio data, which is stored in the audio area, and the corresponding area of the code area is released (step S58). Further, using the watermark extraction software on the memory 301, information such as the ID number, the date and time, and the time stamp is analyzed from the audio watermark data and the video watermark data in the working area on the memory 301, and the results are stored in the working area by type (step S59).
[0137]
Here, the information such as the ID number and the date and time stored in the working area on the memory 301 is compared (step S60). As a result, if all match, the image data in the image area on the memory 301 is displayed on the monitor 308 according to the time stamp, the audio data in the audio area is output from the speaker 309, and the respective areas are released (step S61). Then, the process returns to step S53 to process the next frame. When the processing has been completed for all the encoded data in step S53, the reproduction processing ends.
[0138]
On the other hand, if there is a mismatch in step S60, the process returns to step S53, and the processing of the next frame is performed without reproducing the content.
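The per-frame decision in steps S60 and S61 can be sketched as a simple filter: a frame is reproduced only when the information analyzed from its video watermark equals the information analyzed from its audio watermark. This is an illustrative sketch only; representing the analyzed information as dictionaries, and the names used below, are assumptions rather than part of the specification.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple

Frame = Tuple[Dict[str, int], Dict[str, int], Any]  # (video info, audio info, decoded content)


def frames_to_reproduce(frames: Iterable[Frame]) -> Iterator[Any]:
    """Yield the decoded content of every frame whose watermark information matches."""
    for video_info, audio_info, content in frames:
        if video_info == audio_info:  # step S60: ID number, date and time, time stamp
            yield content             # step S61: display and output this frame
        # on a mismatch the frame is skipped and the next frame is processed


# Hypothetical example: the second frame carries audio from another recording.
example = [
    ({"id": 7, "time_stamp": 0}, {"id": 7, "time_stamp": 0}, "frame 0"),
    ({"id": 7, "time_stamp": 1}, {"id": 9, "time_stamp": 1}, "frame 1"),
    ({"id": 7, "time_stamp": 2}, {"id": 7, "time_stamp": 2}, "frame 2"),
]
reproduced = list(frames_to_reproduce(example))
```

In the hypothetical example the frame with the differing ID number is withheld while the frames around it are still reproduced.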
[0139]
Through such a series of operations, by comparing the watermark data that changes continuously for each frame and is embedded in the audio data and the video data, tampering can be detected in small units and reproduction can be controlled accordingly.
[0140]
In the present embodiment, the moving picture encoding method is MPEG-1, but it is not limited to this; encoding methods such as H.261, MPEG-1, MPEG-2, and MPEG-4 may of course be used. Similarly, the audio encoding method is not limited to AAC, and may be MPEG-1 Layer 3 encoding or ADPCM encoding.
[0141]
In addition, each part or all of the functions of the present embodiment may be configured by hardware and processed.
[0142]
In this embodiment, the watermark data is embedded in the encoded data later. However, the application of the present invention is not limited to this. In the present invention, watermark data may be extracted while decoding, for example, as described in “Data Processing Apparatus, Data Processing Method, and Recording Medium” of JP-A-11-284516.
[0143]
In addition to the above-described embodiments, the present invention may be applied to a system including a plurality of devices (for example, a host computer, an interface device, a camcorder, a video camera, and the like) or to an apparatus comprising a single device (for example, a VTR, a television device, or the like).
[0144]
Needless to say, the object of the present invention can also be achieved by supplying a recording medium (or storage medium) on which the program code of software realizing the functions of the above-described embodiments is recorded to a system or an apparatus, and by having a computer (or a CPU or MPU) of the system or apparatus read and execute the program code stored in the recording medium. In this case, the program code itself read from the recording medium implements the functions of the above-described embodiments, and the recording medium on which the program code is recorded constitutes the present invention. The functions of the above-described embodiments are realized not only when the computer executes the read program code; it goes without saying that they are also realized when an operating system (OS) or the like running on the computer performs part or all of the actual processing based on the instructions of the program code.
[0145]
Further, after the program code read from the recording medium is written into a memory provided in a function expansion card inserted into the computer or a function expansion unit connected to the computer, the function expansion is performed based on the instruction of the program code. It goes without saying that the CPU or the like provided in the card or the function expansion unit performs part or all of the actual processing, and the processing realizes the functions of the above-described embodiments.
[0146]
When the present invention is applied to the recording medium, the recording medium stores program codes corresponding to the flowcharts described above.
[0147]
[Effects of the Invention]
As described above, according to the present invention, it is not necessary to use a specific file format, and it is possible to prevent tampering that destroys the identity between video data and audio data. Then, falsification that destroys the identity of the video data and the audio data can be suitably detected.
[0148]
Further, according to the present invention, by embedding data shared between the audio data and the video data synchronized with it into the encoded data, the identity of the audio and the video, that is, whether they form the correct combination, can be easily determined.
[0149]
Further, according to the present invention, because the data is embedded as watermark data, there is no dependence on a specific file format, and reproduction is unaffected even when the data is described in various file formats.
[Brief description of the drawings]
FIG. 1 is a block diagram illustrating a configuration of an encoding system according to a first embodiment of the present invention.
FIG. 2 is a block diagram illustrating a configuration of a decoding system according to a second embodiment of the present invention.
FIG. 3 is a block diagram illustrating a configuration of an information processing apparatus according to a fifth embodiment of the present invention.
FIG. 4 is a block diagram showing a configuration of a camcorder system according to a third embodiment of the present invention, which captures an image and records the image together with a sound.
FIG. 5 is a flowchart for explaining an operation procedure of recording moving image data in the storage device 305 by the CPU 300;
FIG. 6 is a flowchart for explaining an operation procedure of reading and reproducing moving image data from a storage device 305 by a CPU 300;
FIG. 7 is a schematic diagram for explaining a state of use and storage of the memory in the memory 301 at the time of encoding.
FIG. 8 is a schematic diagram for explaining the use and storage state of the memory in the memory 301 at the time of decoding.
FIG. 9 is a flowchart illustrating an encoding processing procedure in the encoding system according to the first embodiment of the present invention.
FIG. 10 is a flowchart illustrating a decoding processing procedure in the decoding system according to the second embodiment of the present invention.
FIG. 11 is a diagram for explaining an example of displaying an image on the monitor 21 when watermark data regarding video data and audio data match (A) and when the watermark data does not match (B).
FIG. 12 is a flowchart illustrating a moving image recording process in the camcorder system according to the third embodiment.
FIG. 13 is a flowchart for explaining moving image reproduction processing in the camcorder system according to the fourth embodiment.
[Explanation of symbols]
1, 11 information processing device
2,101,307 microphone
3,108 audio encoder
4,306 camera
5,109 video encoder
6,110 watermark generator
7, 8, 111, 112 watermark embedding device
9,113 Multiplexer
10, 12 storage device
13,116 separator
14, 15, 117, 118 Watermark extractor
16, 119 comparator
17,120 audio decoder
18,125 video decoder
19 Display controller
20, 123, 309 Speaker
21,308 monitor
102 Optical system
103 photoelectric converter
104, 105 A / D converter
106, 107, 121, 126 Frame memory
114 Recording medium controller
115 Recording medium
122, 127 D / A converter
124 indicator
128 viewer
300 CPU
301 memory
302 bus
303 terminal
304, 305 storage device
307 Audio capture
310 Communication interface
311 communication line

Claims

An information processing apparatus for encoding video data and audio data synchronized with the video data,
First encoding means for encoding the video data;
Second encoding means for encoding the audio data;
Watermark data generating means for generating predetermined watermark data,
First watermark embedding means for embedding the watermark data in the encoded video data by an electronic watermark;
Second watermark embedding means for embedding the watermark data in the encoded audio data by a digital watermark;
An information processing apparatus comprising: a multiplexing unit that generates multiplexed data obtained by multiplexing the video data and the audio data in which the watermark data is embedded.

The watermark data generating means generates common watermark data to be embedded in the video data and the audio data by a digital watermark, based on the video data and the audio data synchronized with the video data. The information processing device according to claim 1.

First input means for inputting video data;
Second input means for inputting audio data synchronized with the video data,
Identity data generating means for generating identity data indicating that the video data and the audio data are synchronized,
Watermark data generating means for generating predetermined watermark data from the identity data;
First watermark embedding means for embedding the watermark data in the video data with a digital watermark;
Second watermark embedding means for embedding the watermark data in the audio data by a digital watermark;
An information processing apparatus comprising: a multiplexing unit that generates multiplexed data by multiplexing the video data in which the watermark data is embedded and the audio data in which the watermark data is embedded.
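
As a reading aid only (not part of the claims), one way to derive watermark data from identity data is sketched below in Python. The contents of the identity data (a device identifier, a capture time, and a segment index) and the use of a truncated SHA-256 digest are hypothetical choices for this example; the claim does not state how the identity data is formed or how the watermark is derived from it.

```python
import hashlib

def make_identity_data(device_id, capture_time, segment):
    """Hypothetical identity data shared by a video segment and its audio segment."""
    return f"{device_id}|{capture_time}|{segment}".encode("utf-8")

def watermark_bits_from_identity(identity, n_bits=32):
    """Derive n_bits watermark bits (most significant bit first) from a SHA-256 digest."""
    if not 0 < n_bits <= 256:
        raise ValueError("n_bits must be between 1 and 256")
    digest = hashlib.sha256(identity).digest()
    bits = []
    for byte in digest:
        for shift in range(7, -1, -1):
            bits.append((byte >> shift) & 1)
    return bits[:n_bits]

identity = make_identity_data("camera-01", "2002-08-09T12:00:00", 0)
bits = watermark_bits_from_identity(identity)
```

Because both embedding means receive the same bits, video and audio taken from different recordings would, under these assumptions, carry different watermarks.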

The information processing apparatus according to claim 3, further comprising first encoding means for encoding the video data, wherein the first watermark embedding means embeds the watermark data in the encoded video data by a digital watermark.

The information processing apparatus according to claim 3, further comprising second encoding means for encoding the audio data, wherein the second watermark embedding means embeds the watermark data in the encoded audio data by a digital watermark.

The information processing apparatus according to claim 1, further comprising a recording unit that records the multiplexed data on a portable recording medium.

An information processing apparatus comprising:
First input means for inputting video data in which first watermark data is embedded by a digital watermark;
Second input means for inputting audio data in which second watermark data is embedded by a digital watermark;
First watermark extracting means for extracting the first watermark data embedded in the video data;
Second watermark extracting means for extracting the second watermark data embedded in the audio data;
Comparing means for comparing the identity of the first watermark data and the second watermark data;
Determining means for determining whether or not the video data and the audio data are synchronized based on a result of the comparison by the comparing means.

An information processing apparatus comprising:
Multiplexing input means for inputting multiplexed data in which video data in which first watermark data is embedded and audio data in which second watermark data is embedded are multiplexed;
Separating means for separating the video data and the audio data from the multiplexed data;
First watermark extracting means for extracting the first watermark data embedded in the video data;
Second watermark extracting means for extracting the second watermark data embedded in the audio data;
Comparing means for comparing the identity of the first watermark data and the second watermark data;
Determining means for determining whether or not the video data and the audio data are synchronized based on a result of the comparison by the comparing means.
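
As a reading aid only (not part of the claims), the separation, extraction, comparison, and determination recited above can be sketched in Python. The length-prefixed container, the least-significant-bit extraction, and the 4-bit watermark length are assumptions for this example and are not taken from the claims.

```python
import struct

def separate(muxed):
    """Split multiplexed data into (video, audio).

    Assumes a hypothetical container: each stream is preceded by a
    4-byte big-endian length field.
    """
    streams = []
    offset = 0
    for _ in range(2):
        (length,) = struct.unpack_from(">I", muxed, offset)
        offset += 4
        streams.append(muxed[offset:offset + length])
        offset += length
    return streams[0], streams[1]

def extract_lsb(stream, n_bits):
    """Read the watermark back from the least significant bits of the leading bytes."""
    return [b & 1 for b in stream[:n_bits]]

def is_synchronized(muxed, n_bits=4):
    """Judge the streams as belonging together only if both watermarks are identical."""
    video, audio = separate(muxed)
    first = extract_lsb(video, n_bits)    # first watermark extracting means
    second = extract_lsb(audio, n_bits)   # second watermark extracting means
    return first == second                # comparing and determining means

def pack(video, audio):
    """Example-only helper that builds the assumed container."""
    return (struct.pack(">I", len(video)) + video +
            struct.pack(">I", len(audio)) + audio)

good = pack(bytes([11, 20, 31, 41]), bytes([5, 6, 7, 9]))      # both carry bits 1,0,1,1
tampered = pack(bytes([11, 20, 31, 41]), bytes([4, 6, 7, 9]))  # audio carries 0,0,1,1
```

Here `is_synchronized(good)` evaluates to `True` and `is_synchronized(tampered)` to `False`, since replacing the audio changes its extracted watermark.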

The information processing apparatus according to claim 8, wherein the multiplexing input means reads and inputs the multiplexed data recorded on a portable recording medium.

The information processing apparatus according to claim 7, further comprising first output means for outputting the video data and second output means for outputting the audio data.

The information processing apparatus according to claim 10, further comprising first reproducing means for reproducing the video data.

The information processing apparatus according to claim 10, further comprising second reproducing means for reproducing the audio data.

The information processing apparatus according to claim 11, further comprising control means for causing, when the determining means determines that the video data and the audio data are not synchronized, information indicating that the audio data is not synchronized to be reproduced when the video data is reproduced by the first reproducing means.

The information processing apparatus according to any one of claims 7 to 13, wherein the video data and the audio data are encoded, further comprising first decoding means for decoding the encoded video data and second decoding means for decoding the encoded audio data.

An information processing method for encoding video data and audio data synchronized with the video data,
A first encoding step of encoding the video data;
A second encoding step of encoding the audio data;
A watermark data generating step of generating predetermined watermark data;
A first watermark embedding step of embedding the watermark data in the encoded video data by a digital watermark;
A second watermark embedding step of embedding the watermark data in the encoded audio data by a digital watermark;
A multiplexing step of multiplexing the video data and the audio data in which the watermark data is embedded to generate multiplexed data.

The information processing method according to claim 15, wherein the watermark data generating step generates, based on the video data and the audio data synchronized with the video data, common watermark data to be embedded in the video data and the audio data by a digital watermark.

An information processing method for encoding video data and audio data synchronized with the video data,
An identity data generating step of generating identity data indicating that the video data and the audio data are synchronized;
A watermark data generating step of generating predetermined watermark data from the identity data;
A first watermark embedding step of embedding the watermark data in the video data by a digital watermark;
A second watermark embedding step of embedding the watermark data in the audio data by a digital watermark;
A multiplexing step of multiplexing the video data in which the watermark data is embedded and the audio data in which the watermark data is embedded to generate multiplexed data.

The information processing method according to claim 17, further comprising a first encoding step of encoding the video data, wherein the first watermark embedding step embeds the watermark data in the encoded video data by a digital watermark.

The information processing method according to claim 17, further comprising a second encoding step of encoding the audio data, wherein the second watermark embedding step embeds the watermark data in the encoded audio data by a digital watermark.

The information processing method according to claim 15, further comprising a recording step of recording the multiplexed data on a portable recording medium.

An information processing method comprising:
A first watermark extraction step of extracting first watermark data from video data in which the first watermark data is embedded by a digital watermark;
A second watermark extraction step of extracting second watermark data from audio data in which the second watermark data is embedded by a digital watermark;
A comparing step of comparing the identity of the first watermark data and the second watermark data;
A determining step of determining whether or not the video data and the audio data are synchronized based on a result of the comparison of the identity in the comparing step.

An information processing method comprising:
A separation step of separating video data and audio data from multiplexed data in which the video data in which first watermark data is embedded and the audio data in which second watermark data is embedded are multiplexed;
A first watermark extraction step of extracting the first watermark data embedded in the video data;
A second watermark extraction step of extracting the second watermark data embedded in the audio data;
A comparing step of comparing the identity of the first watermark data and the second watermark data;
A determining step of determining whether or not the video data and the audio data are synchronized based on a result of the comparison of the identity in the comparing step.

The information processing method according to claim 22, further comprising a reading step of reading the multiplexed data from a portable recording medium storing the multiplexed data in which the video data in which the first watermark data is embedded and the audio data in which the second watermark data is embedded are multiplexed.

The information processing method according to claim 21, further comprising a first output step of outputting the video data and a second output step of outputting the audio data.

The information processing method according to claim 24, further comprising a first reproduction step of reproducing the video data.

The information processing method according to claim 24, further comprising a second reproduction step of reproducing the audio data.

The information processing method according to claim 25, further comprising a control step of causing, when the determining step determines that the video data and the audio data are not synchronized, information indicating that the audio data is not synchronized to be reproduced when the video data is reproduced in the first reproduction step.

The information processing method according to any one of claims 21 to 27, wherein the video data and the audio data are encoded, further comprising a first decoding step of decoding the encoded video data and a second decoding step of decoding the encoded audio data.

A program for causing a computer to encode video data and audio data synchronized with the video data,
A first encoding procedure for encoding the video data;
A second encoding procedure for encoding the audio data;
A watermark data generation procedure for generating predetermined watermark data;
A first watermark embedding procedure for embedding the watermark data in the encoded video data by a digital watermark;
A second watermark embedding procedure for embedding the watermark data in the encoded audio data by a digital watermark;
A multiplexing procedure for generating multiplexed data obtained by multiplexing the video data in which the watermark data is embedded and the audio data.

A program for causing a computer to encode video data and audio data synchronized with the video data,
An identity data generation procedure for generating identity data indicating that the video data and the audio data are synchronized,
A watermark data generating procedure for generating predetermined watermark data from the identity data,
A first watermark embedding procedure for embedding the watermark data in the video data by a digital watermark;
A second watermark embedding procedure for embedding the watermark data in the audio data by a digital watermark;
A multiplexing procedure for generating multiplexed data obtained by multiplexing the video data in which the watermark data is embedded and the audio data in which the watermark data is embedded.

A program for causing a computer to execute:
A first watermark extraction procedure for extracting the first watermark data from video data in which the first watermark data is embedded by a digital watermark;
A second watermark extraction procedure for extracting the second watermark data from the audio data in which the second watermark data is embedded by a digital watermark;
A comparing procedure for comparing the identity of the first watermark data and the second watermark data;
A determining procedure for determining whether or not the video data and the audio data are synchronized based on a result of the comparison in the comparing procedure.

A program for causing a computer to execute:
A separation procedure for separating video data and audio data from multiplexed data in which the video data in which first watermark data is embedded and the audio data in which second watermark data is embedded are multiplexed;
A first watermark extraction procedure for extracting the first watermark data embedded in the video data;
A second watermark extraction procedure for extracting the second watermark data embedded in the audio data;
A comparing procedure for comparing the identity of the first watermark data and the second watermark data;
A determining procedure for determining whether or not the video data and the audio data are synchronized based on a result of the comparison of the identity in the comparing procedure.

A computer-readable recording medium storing the program according to any one of claims 29 to 32.