JP2010268263A

JP2010268263A - Tampering detection system, watermark information embedding device, tampering detector, watermark information embedding method and tampering detection method

Info

Publication number: JP2010268263A
Application number: JP2009118329A
Authority: JP
Inventors: Yoshiyasu Takahashi; 由泰高橋; Takaaki Yamada; 隆亮山田; Yukinori Terahama; 幸徳寺濱; Shigeru Suzuki; 滋鈴木
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2009-05-15
Filing date: 2009-05-15
Publication date: 2010-11-25
Anticipated expiration: 2029-05-15
Also published as: JP5031793B2

Abstract

PROBLEM TO BE SOLVED: To provide a tampering detection system 10 that links audio data with video data while, for each of them, keeping the tampering detection accuracy high and limiting the quality deterioration of the original data.
SOLUTION: The tampering detection system 10 extracts a feature quantity from the audio data to create an audio feature quantity, and extracts a feature quantity from the video data to create a video feature quantity. It embeds part of the audio feature quantity and part of the video feature quantity in the audio data, and embeds the remaining part of the audio feature quantity and the remaining part of the video feature quantity in the video data.
COPYRIGHT: (C)2011,JPO&INPIT

Description

The present invention relates to a technique for detecting tampering with content that contains audio data and video data in units of frames.

Patent Document 1 below discloses a technique that applies to a pair of video data and audio data. It creates 32-bit audio watermark data from the audio data and embeds it in the video data, and creates 48-bit video watermark data from the video data and embeds it in the audio data. This prevents either the audio data alone or the video data alone from being replaced.

Patent Document 1: JP 2004-80094 A

To detect tampering with high accuracy, more information characterizing the data has to be embedded in that data as a watermark. If the amount of watermark information is too large relative to the amount of data being checked for tampering, however, the original data may be degraded. To keep tampering detection accurate while limiting this degradation, the ratio of watermark information to target data is therefore sometimes set to about 1%, for example.

When video data and audio data are paired and linked by embedding each other's watermark information, as in the technique of Patent Document 1, the two streams must be divided into blocks of the same time interval. For blocks of equal duration, the video data may be several hundred times larger than the audio data. To make the tampering detection accuracy for video comparable to that for audio, the video watermark information therefore needs to be several hundred times larger than the audio watermark information.

In the technique of Patent Document 1, however, the video watermark information is only about 1.5 times larger than the audio watermark information. If the watermark information characterizing the audio data amounts to, for example, about 1% of the audio data checked for tampering, then the watermark information characterizing the video data amounts to well under 1% of the video data checked for tampering, and tampering detection for the video data may be inaccurate.
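The following worked example makes the imbalance concrete. It assumes, purely for illustration, the one-second frame sizes used later in the embodiment (768,000 audio bits and 221,184,000 video bits per frame) together with the 1.5:1 watermark ratio of Patent Document 1. W_a and W_v are labels introduced here for the audio and video watermark sizes.

```latex
\begin{aligned}
W_a &= 0.01 \times 768{,}000 = 7{,}680 \text{ bits} \\
W_v &= 1.5 \times W_a = 11{,}520 \text{ bits} \\
\frac{W_v}{221{,}184{,}000} &\approx 5.2 \times 10^{-5} \approx 0.0052\,\%
\end{aligned}
```

Under these assumptions, the video watermark would be exactly 1/192 of the 1% target.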

Conversely, if the watermark information characterizing the video data amounts to 1% of the video data checked for tampering, then the watermark information characterizing the audio data exceeds, for example, 1% of the audio data checked for tampering, and the quality of the original audio data may be impaired.

Another simple approach would be to create watermark information amounting to 1% of the data for each stream and embed the audio watermark in the video data and the video watermark in the audio data. However, the video watermark information would then be much larger than the audio watermark information (for example, several hundred times larger), so embedding it in the audio data would impair the quality of the original audio data.

The present invention was made in view of these circumstances. Its object is to link audio data and video data while, for each of them, keeping the tampering detection accuracy high and the quality degradation of the original data low.

To solve the above problems, the present invention creates an audio feature quantity from the audio data and a video feature quantity from the video data. It embeds part of the audio feature quantity and part of the video feature quantity in the audio data, and embeds the remaining part of the audio feature quantity and the remaining part of the video feature quantity in the video data.
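The split-and-recombine idea can be sketched as follows. This is a minimal illustration, not the patent's method: the split points `ka` and `kv` and the helper names `split_features` / `join_features` are hypothetical, and the way the actual embodiment apportions the bits (FIG. 14) is not described in this section. Features are modeled as lists of 0/1 bits.

```python
from typing import List, Tuple

Bits = List[int]


def split_features(fa: Bits, fv: Bits, ka: int, kv: int) -> Tuple[Bits, Bits]:
    """Split audio feature fa and video feature fv into two payloads.

    ka and kv are hypothetical split points: the first ka bits of fa and the
    first kv bits of fv go into the audio watermark, and everything else goes
    into the video watermark.
    """
    if not (0 <= ka <= len(fa) and 0 <= kv <= len(fv)):
        raise ValueError("split point out of range")
    audio_payload = fa[:ka] + fv[:kv]  # part of fa + part of fv
    video_payload = fa[ka:] + fv[kv:]  # remaining fa + remaining fv
    return audio_payload, video_payload


def join_features(audio_payload: Bits, video_payload: Bits,
                  ka: int, kv: int, len_fa: int) -> Tuple[Bits, Bits]:
    """Inverse of split_features; the detector must know ka, kv and len(fa)."""
    rest_a = len_fa - ka  # number of fa bits carried by the video payload
    fa = audio_payload[:ka] + video_payload[:rest_a]
    fv = audio_payload[ka:ka + kv] + video_payload[rest_a:]
    return fa, fv


fa = [1, 0, 1, 1, 0, 0, 1, 0]
fv = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0]
a, v = split_features(fa, fv, ka=2, kv=3)
print(len(a), len(v))  # 5 15
```

With a small `ka + kv`, the audio watermark stays small, and the bulk of both features rides in the much larger video stream.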

For example, the present invention provides a tampering detection system that detects tampering with content containing audio data and video data in units of frames, the system comprising:
a watermark information embedding device that embeds watermark information in audio data and video data in units of frames; and
a tampering detection device that reads the watermark information from the audio data and video data in which it has been embedded in units of frames, and determines whether the audio data and the video data have been tampered with;
wherein the watermark information embedding device comprises:
an audio data creation unit that acquires audio from outside and converts it into audio data in units of frames;
an audio watermark embedding unit that, in units of frames, embeds audio watermark information in the audio data created by the audio data creation unit by replacing predetermined bits of the audio data with bits of the audio watermark information, and outputs the result;
a first audio feature quantity extraction unit that, in units of frames, extracts an audio feature quantity from those bits of the audio data watermarked by the audio watermark embedding unit in which no audio watermark information is embedded;
a video data creation unit that acquires video from outside and converts it into video data in units of frames;
a video watermark embedding unit that, in units of frames, embeds video watermark information in the video data created by the video data creation unit by replacing predetermined bits of the video data with bits of the video watermark information, and outputs the result;
a first video feature quantity extraction unit that, in units of frames, extracts a video feature quantity from those bits of the video data watermarked by the video watermark embedding unit in which no video watermark information is embedded; and
a watermark information creation unit that, in units of frames, creates audio watermark information containing part of the audio feature quantity and part of the video feature quantity and supplies it to the audio watermark embedding unit, and creates video watermark information containing the remaining part of the audio feature quantity and the remaining part of the video feature quantity and supplies it to the video watermark embedding unit;
and the tampering detection device comprises:
an audio watermark extraction unit that, in units of frames, extracts the audio watermark information from the bit positions of the audio data in which the audio watermark information is to be embedded;
a second audio feature quantity extraction unit that, in units of frames, extracts an audio feature quantity from the bit positions of the audio data in which no audio watermark information is embedded;
a video watermark extraction unit that, in units of frames, extracts the video watermark information from the bit positions of the video data in which the video watermark information is to be embedded;
a second video feature quantity extraction unit that, in units of frames, extracts a video feature quantity from the bit positions of the video data in which no video watermark information is embedded;
a feature quantity reconstruction unit that, in units of frames, extracts part of the audio feature quantity and part of the video feature quantity from the audio watermark information extracted by the audio watermark extraction unit, extracts the remaining part of the audio feature quantity and the remaining part of the video feature quantity from the video watermark information extracted by the video watermark extraction unit, and reconstructs the audio feature quantity and the video feature quantity from the extracted data;
an audio data tampering detection unit that, in units of frames, compares the audio feature quantity extracted by the second audio feature quantity extraction unit with the audio feature quantity reconstructed by the feature quantity reconstruction unit, and outputs information indicating whether the audio data has been tampered with; and
a video data tampering detection unit that, in units of frames, compares the video feature quantity extracted by the second video feature quantity extraction unit with the video feature quantity reconstructed by the feature quantity reconstruction unit, and outputs information indicating whether the video data has been tampered with.

The tampering detection system of the present invention links audio data and video data while, for each of them, keeping the tampering detection accuracy high and limiting the quality degradation of the original data.

FIG. 1 is a system configuration diagram showing an example configuration of a tampering detection system 10 according to an embodiment of the present invention.
FIG. 2 is a block diagram showing an example functional configuration of a watermark information embedding device 20.
FIG. 3 is a flowchart showing an example of the operation of the watermark information embedding device 20.
FIG. 4 is a flowchart showing an example of the watermark information embedding process (S200) for audio data in the first embodiment.
FIG. 5 is a conceptual diagram illustrating how audio watermark information dea[i] is embedded in audio data xa[i] in the first embodiment.
FIG. 6 is a flowchart showing an example of the watermark information embedding process (S200) for video data in the first embodiment.
FIG. 7 is a flowchart showing an example of the feature quantity extraction process (S300) for audio data.
FIG. 8 is a conceptual diagram illustrating how an audio feature quantity fa[i] is extracted from audio data xa[i].
FIG. 9 is a flowchart showing an example of the feature quantity extraction process (S300) for video data.
FIG. 10 is a flowchart showing an example of the feature quantity delay process (S400) for audio data.
FIG. 11 is a conceptual diagram illustrating how the audio feature quantity fa[i] is delayed.
FIG. 12 is a flowchart showing an example of the feature quantity delay process (S400) for video data.
FIG. 13 is a flowchart showing an example of the watermark information creation process (S500).
FIG. 14 is a conceptual diagram illustrating how audio watermark information dea[i] and video watermark information dev[i] are created.
FIG. 15 is a block diagram showing an example functional configuration of a tampering detection device 30.
FIG. 16 is a flowchart showing an example of the operation of the tampering detection device 30.
FIG. 17 is a flowchart showing an example of the watermark information extraction process (S600) for audio data in the first embodiment.
FIG. 18 is a conceptual diagram illustrating how read audio watermark information dda[i] is extracted from audio data xa[i] in the first embodiment.
FIG. 19 is a flowchart showing an example of the watermark information extraction process (S600) for video data in the first embodiment.
FIG. 20 is a flowchart showing an example of the feature quantity reconstruction process (S700).
FIG. 21 is a conceptual diagram illustrating how a read audio feature quantity fad[i] and a read video feature quantity fvd[i] are created.
FIG. 22 is a flowchart showing an example of the tampering determination process (S800) for audio data.
FIG. 23 is a conceptual diagram showing an example display of tampering information.
FIG. 24 is a flowchart showing an example of the tampering determination process (S800) for video data.
FIG. 25 is a flowchart showing an example of the watermark information embedding process (S200) for audio data in the third embodiment.
FIG. 26 is a conceptual diagram illustrating how audio watermark information dea[i] is embedded in audio data xa[i] in the third embodiment.
FIG. 27 is a flowchart showing an example of the watermark information embedding process (S200) for video data in the third embodiment.
FIG. 28 is a flowchart showing an example of the watermark information extraction process (S600) for audio data in the third embodiment.
FIG. 29 is a conceptual diagram illustrating how read audio watermark information dda[i] is extracted from audio data xa[i] in the third embodiment.
FIG. 30 is a flowchart showing an example of the watermark information extraction process (S600) for video data in the third embodiment.
FIG. 31 is a hardware configuration diagram showing an example configuration of a computer 60 that implements the functions of the watermark information embedding device 20 or the tampering detection device 30.

First, the first embodiment of the present invention will be described.

FIG. 1 is a system configuration diagram showing an example configuration of a tampering detection system 10 according to an embodiment of the present invention. The tampering detection system 10 includes a watermark information embedding device 20 and a tampering detection device 30.

The watermark information embedding device 20 captures audio through a microphone 11, creates audio data for each frame, and extracts an audio feature quantity from that audio data. It likewise captures video through a camera 12, creates video data for each frame, and extracts a video feature quantity from that video data. The watermark information embedding device 20 then embeds audio watermark information, containing part of the audio feature quantity and part of the video feature quantity, in the audio data. It also embeds video watermark information, containing the remaining part of the audio feature quantity and the remaining part of the video feature quantity, in the video data. Finally, the watermark information embedding device 20 records the watermarked audio data and video data on a recording medium 13 as content data, frame by frame.

The tampering detection device 30 reads the content data from the recording medium 13 frame by frame and plays it back through a speaker 14 and a display device 15. It also extracts an audio feature quantity and the audio watermark information from the audio data in the content data, and extracts a video feature quantity and the video watermark information from the video data in the content data. The tampering detection device 30 then restores the audio feature quantity and the video feature quantity from the extracted audio watermark information and video watermark information.

Next, for each frame, the tampering detection device 30 compares the audio feature quantity extracted from the audio data in the content data with the restored audio feature quantity to determine whether the audio data has been tampered with. If it has, the device indicates this on the display device 15. In the same way, for each frame, the tampering detection device 30 compares the video feature quantity extracted from the video data in the content data with the restored video feature quantity to determine whether the video data has been tampered with. If it has, the device indicates this on the display device 15.
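The per-frame decision can be sketched as a simple comparison. This is an illustration under stated assumptions, not the patent's comparison rule (S800 is described later). The sketch assumes an exact-match test, assumes the recomputed and restored features have already been aligned frame by frame (including any delay applied on the embedding side), and uses a hypothetical function name `tampered_frames`.

```python
from typing import List, Sequence


def tampered_frames(extracted: Sequence[Sequence[int]],
                    restored: Sequence[Sequence[int]]) -> List[int]:
    """Return the indices of frames whose feature recomputed from the content
    (extracted) differs from the feature recovered from the watermarks
    (restored). Exact equality is an assumption of this sketch."""
    if len(extracted) != len(restored):
        raise ValueError("frame counts differ")
    return [k for k, (e, r) in enumerate(zip(extracted, restored))
            if list(e) != list(r)]


print(tampered_frames([[1, 0], [1, 1], [0, 0]],
                      [[1, 0], [0, 1], [0, 0]]))  # [1]
```

The same function would be applied once to the audio features and once to the video features, giving a separate verdict for each stream.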

FIG. 2 is a block diagram showing an example functional configuration of the watermark information embedding device 20. The watermark information embedding device 20 includes an audio data creation unit 200, a video data creation unit 201, an audio watermark embedding unit 202, a watermark information creation unit 203, a video watermark embedding unit 204, an audio feature quantity extraction unit 205, an audio feature quantity delay unit 206, a video feature quantity delay unit 207, a video feature quantity extraction unit 208, and a content recording unit 209. The operation of each function in the watermark information embedding device 20 is described with reference to the flowcharts of FIG. 3 onward.

FIG. 3 is a flowchart showing an example of the operation of the watermark information embedding device 20. The watermark information embedding device 20 captures audio and video in frames of a predetermined time interval (1 second in this embodiment) and processes them as audio data and video data, respectively. It therefore executes the processing shown in the flowchart of FIG. 3 once per frame.

First, the audio data creation unit 200 captures one frame of audio through the microphone 11, creates audio data xa[i] (0 ≤ i < XA), and outputs it to the audio watermark embedding unit 202 (S100). The video data creation unit 201 likewise captures one frame of video through the camera 12, creates video data xv[i] (0 ≤ i < XV), and outputs it to the video watermark embedding unit 204 (S100).

In this embodiment, the audio data creation unit 200 converts the audio into 16-bit samples at a sampling frequency of 48 kHz, so one frame of audio data xa[i] is 768 kbit and XA is 768,000. The video data creation unit 201 converts the video into 640 × 480 × 24-bit images at a rate of 30 Hz (30 images per second), so one frame of video data xv[i] is 221,184,000 bits and XV is 221,184,000.
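The frame sizes above can be checked with a few lines of arithmetic. The calculation also puts a number on the "several hundred times" ratio mentioned in the background. The sketch assumes a single audio channel, which is consistent with the 768 kbit figure. The constant names are illustrative.

```python
AUDIO_RATE_HZ = 48_000      # audio samples per second
AUDIO_BITS = 16             # bits per sample (single channel assumed)
VIDEO_W, VIDEO_H = 640, 480
VIDEO_BPP = 24              # bits per pixel
VIDEO_FPS = 30              # images per second
FRAME_SECONDS = 1           # one "frame" in this document lasts 1 s

XA = AUDIO_RATE_HZ * AUDIO_BITS * FRAME_SECONDS
XV = VIDEO_W * VIDEO_H * VIDEO_BPP * VIDEO_FPS * FRAME_SECONDS
print(XA, XV, XV // XA)     # 768000 221184000 288
```

In this configuration, the video data is 288 times larger than the audio data per frame.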

Next, the audio watermark embedding unit 202 and the video watermark embedding unit 204 execute the watermark information embedding process described later. This embeds the audio watermark information dea[i] in predetermined bits of the audio data xa[i] and the video watermark information dev[i] in predetermined bits of the video data xv[i] (S200). The content recording unit 209 then records the watermarked audio data xa[i] and video data xv[i] on the recording medium 13 as content data (S101).

Next, the audio feature quantity extraction unit 205 and the video feature quantity extraction unit 208 execute the feature quantity extraction process described later. They extract the audio feature quantity fa[i] from the bits of the audio data xa[i] (watermarked in step S200) that carry no watermark information, and extract the video feature quantity fv[i] from the bits of the video data xv[i] (watermarked in step S200) that carry no watermark information (S300).
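The key property of S300 is that the feature depends only on bits the watermark never touches. As a result, writing a watermark cannot invalidate the feature that the watermark itself protects. The sketch below illustrates this property. Its feature function, a truncated SHA-256 digest, is a stand-in chosen for illustration only. The actual feature extraction of FIGS. 7 to 9 is not described in this section, and `extract_feature` is a hypothetical name.

```python
import hashlib
from typing import Iterable, List


def extract_feature(bits: List[int], watermark_positions: Iterable[int],
                    n_bytes: int = 4) -> bytes:
    """Hypothetical feature: a truncated SHA-256 digest over every bit of the
    frame that is NOT a watermark position. Because the watermark bits are
    excluded, rewriting them cannot change the feature."""
    skip = set(watermark_positions)
    kept = bytes(b for i, b in enumerate(bits) if i not in skip)
    return hashlib.sha256(kept).digest()[:n_bytes]


frame = [0, 1, 1, 0, 1, 0, 0, 1]
positions = [0, 2, 4, 6]           # even positions, as in the embodiment
f = extract_feature(frame, positions)

rewatermarked = list(frame)
rewatermarked[2] ^= 1              # flip a watermark bit: feature unchanged
assert extract_feature(rewatermarked, positions) == f

tampered = list(frame)
tampered[3] ^= 1                   # flip a content (odd) bit: feature changes
assert (extract_feature(tampered, positions, n_bytes=32)
        != extract_feature(frame, positions, n_bytes=32))
```

A cryptographic hash flags any single-bit change. A real system might prefer a feature that tolerates benign processing, but that design choice is outside this section.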

Next, the audio feature quantity delay unit 206 and the video feature quantity delay unit 207 execute the feature quantity delay process described later. This delays the audio feature quantity fa[i] and the video feature quantity fv[i] extracted in step S300 by a predetermined number of frames (S400).
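A fixed-length delay line is one simple way to realize S400. Each frame's feature is pushed in, and the feature from a predetermined number of frames earlier comes out. This is a sketch only: the class name `FeatureDelay` and the delay length are hypothetical, and the patent's own delay procedure (FIGS. 10 to 12) is not described in this section.

```python
from collections import deque
from typing import Deque, Generic, Optional, TypeVar

T = TypeVar("T")


class FeatureDelay(Generic[T]):
    """Fixed-length delay line: push the feature of frame k and get back the
    feature of frame k - delay (None until enough frames have arrived)."""

    def __init__(self, delay: int) -> None:
        if delay < 1:
            raise ValueError("delay must be >= 1")
        self._buf: Deque[T] = deque()
        self._delay = delay

    def push(self, feature: T) -> Optional[T]:
        self._buf.append(feature)
        if len(self._buf) > self._delay:
            return self._buf.popleft()  # oldest feature leaves the line
        return None


d = FeatureDelay(2)
print([d.push(k) for k in range(5)])  # [None, None, 0, 1, 2]
```

For the first `delay` frames, no earlier feature is available yet, so those frames carry no meaningful feature payload.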

Next, the watermark information creation unit 203 executes the watermark information creation process described later. It creates the audio watermark information dea[i] and the video watermark information dev[i] from the audio feature quantity fa[i] and the video feature quantity fv[i] delayed in step S400 (S500). The watermark information embedding device 20 then ends the operation shown in this flowchart.

FIG. 4 is a flowchart showing an example of the watermark information embedding process (S200) for audio data in the first embodiment. The audio watermark embedding unit 202 executes the processing shown in this flowchart on the audio data xa[i] of each frame.

First, the audio watermark embedding unit 202 prepares the audio watermark information embedding positions pwa[i] (0 ≤ i < DA) (S201). The audio watermark information bit length DA is the amount of audio watermark information embedded in one frame of audio data. This embodiment assumes that the audio watermark information is embedded at a rate of 1 bit per 64 bits of audio data, so DA = XA ÷ 64 = 12,000.

The audio watermark information embedding positions pwa[i] form a sequence that satisfies 0 ≤ pwa[i] < XA and pwa[i] mod 2 = 0, with pwa[i] ≠ pwa[j] for any (i, j) such that 0 ≤ i < DA, 0 ≤ j < DA, and i ≠ j. In other words, the positions are distinct even-numbered bit indices within the frame. In this embodiment, the audio watermark information embedding positions pwa[i] are set in advance by, for example, the administrator of the watermark information embedding device 20.
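The three conditions on pwa[i] are easy to check mechanically. The text only says the positions are preset by an administrator, so the generator below is a hypothetical alternative. It draws distinct even positions from a seeded pseudo-random generator, which would let the embedder and the detector derive the same table from a shared seed. The function names and the seed are illustrative assumptions.

```python
import random
from typing import List


def make_positions(total_bits: int, count: int, seed: int) -> List[int]:
    """Hypothetical way to pre-set pwa: `count` distinct even bit positions in
    [0, total_bits), drawn with a seeded PRNG so that the same table can be
    regenerated on the detection side."""
    evens = range(0, total_bits, 2)
    if count > len(evens):
        raise ValueError("not enough even positions")
    return random.Random(seed).sample(evens, count)


def is_valid_positions(pos: List[int], total_bits: int) -> bool:
    """Check the stated conditions: in range, even, and pairwise distinct."""
    return (all(0 <= p < total_bits and p % 2 == 0 for p in pos)
            and len(set(pos)) == len(pos))


pwa = make_positions(768_000, 12_000, seed=1)   # XA and DA from the text
print(len(pwa), is_valid_positions(pwa, 768_000))  # 12000 True
```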

Next, the audio watermark embedding unit 202 initializes a variable idea to 0 (S202). It then embeds the audio watermark bit dea[idea] in the audio data bit xa[pwa[idea]] (S203). To do this, it replaces the pwa[idea]-th bit of the audio data xa, created by the audio data creation unit 200 in step S100, with the idea-th bit of the audio watermark information dea, created by the watermark information creation unit 203 in step S500 for the previous frame.

Next, the audio watermark embedding unit 202 determines whether the variable idea has reached the last index, DA − 1, that is, whether all DA bits of the audio watermark information have been embedded (S204). If not (S204: No), the audio watermark embedding unit 202 increments idea by 1 (S205) and executes step S203 again. If so (S204: Yes), the audio watermark embedding unit 202 outputs the audio data xa[i], now carrying the audio watermark information dea[i], to the audio feature quantity extraction unit 205 and the content recording unit 209 (S206), and ends the processing shown in this flowchart.

FIG. 5 illustrates the processing of steps S203 to S205. Suppose the audio watermark information embedding positions pwa[i] are, for example, {2, 6, 0, 4, ...}. The audio watermark embedding unit 202 then embeds bit 400 of the audio watermark information, dea[0], in bit 402 of the audio data, xa[pwa[0]] = xa[2], and embeds bit 401, dea[1], in bit 403, xa[pwa[1]] = xa[6]. Because every pwa[i] is even, the audio watermark embedding unit 202 embeds the bits of the audio watermark information dea[i] only in even-numbered bits of the audio data xa[i].
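Steps S202 to S205 amount to a loop that visits exactly DA indices and overwrites one frame bit per index. The sketch below reproduces the FIG. 5 example with pwa = {2, 6, 0, 4}. The function names are hypothetical, and the extraction helper is only an assumed counterpart for the detection side (S600), which is described later. Frames are modeled as lists of 0/1 bits.

```python
from typing import List, Sequence


def embed_watermark(x: Sequence[int], dea: Sequence[int],
                    pwa: Sequence[int]) -> List[int]:
    """S202-S205: for idea = 0 .. DA-1, overwrite bit pwa[idea] of the frame
    with watermark bit dea[idea]. Returns a new list; x is left unchanged."""
    if len(dea) != len(pwa):
        raise ValueError("dea and pwa must both have DA entries")
    out = list(x)
    for idea in range(len(dea)):   # visits exactly DA indices
        out[pwa[idea]] = dea[idea]
    return out


def extract_watermark(x: Sequence[int], pwa: Sequence[int]) -> List[int]:
    """Assumed detection-side counterpart: read the bits back from pwa."""
    return [x[p] for p in pwa]


frame = [0] * 8
y = embed_watermark(frame, dea=[1, 1, 0, 1], pwa=[2, 6, 0, 4])
print(y)                                  # [0, 0, 1, 0, 1, 0, 1, 0]
print(extract_watermark(y, [2, 6, 0, 4]))  # [1, 1, 0, 1]
```

The odd-numbered bits of `y` are untouched. This is what lets a feature computed from them survive the embedding step.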

FIG. 6 is a flowchart showing an example of the watermark information embedding process (S200) for video data in the first embodiment. The video watermark embedding unit 204 executes the processing shown in this flowchart on the video data xv[i] of each frame.

First, the video watermark embedding unit 204 prepares the video watermark information embedding positions pwv[i] (0 ≤ i < DV) (S211). Here, the video watermark information bit length DV is the amount of video watermark information embedded in one frame of video data. This embodiment assumes that the watermark is embedded only in the luminance information, which makes up one third of the 24-bit video data, at a rate of one watermark bit per 64 bits of luminance information; hence DV = XV ÷ (3 × 64) = 1,152,000.

The video watermark information embedding positions pwv[i] form a sequence satisfying 0 ≤ pwv[i] < XV and pwv[i] mod 2 = 0, with pwv[i] ≠ pwv[j] for every pair (i, j) such that 0 ≤ i < DV, 0 ≤ j < DV, and i ≠ j. In this embodiment, pwv[i] is set in advance by, for example, the administrator of the watermark information embedding device 20.
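The patent only says that pwv[i] is preset by an administrator, so the generator below is an assumption: it draws DV distinct even positions with a seeded pseudorandom generator, so that the embedder and the detector can share the same list. The helper names are hypothetical. The value XV = 221,184,000 used in the check is simply what DV = 1,152,000 implies through DV = XV ÷ (3 × 64); it is not stated in this section.

```python
import random
from typing import List


def video_watermark_bit_length(xv: int) -> int:
    # DV = XV / (3 * 64): luminance is one third of the video bits, and one
    # watermark bit is embedded per 64 luminance bits.
    return xv // (3 * 64)


def make_even_positions(n_bits: int, count: int, seed: int) -> List[int]:
    """Pick `count` distinct even bit positions p with 0 <= p < n_bits."""
    evens = range(0, n_bits, 2)  # satisfies p mod 2 = 0 by construction
    if count > len(evens):
        raise ValueError("not enough even positions for the requested count")
    # sample() draws without replacement, so pwv[i] != pwv[j] for i != j
    return random.Random(seed).sample(evens, count)


positions = make_even_positions(100, 10, seed=1)
```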

Next, the video watermark embedding unit 204 initializes the variable idev to 0 (S212) and embeds the video watermark information dev[idev] into the video data xv[pwv[idev]] by replacing the pwv[idev]-th bit of the video data xv, generated by the audio data creation unit 200 in step S100, with the idev-th bit of the video watermark information dev that the watermark information creation unit 203 created in step S500 for the previous frame (S213). Because pwv[i] is also a set of even numbers, the video watermark embedding unit 204 embeds the bits of the video watermark information dev[i] into the even-numbered bits of the video data xv[i].

Next, the video watermark embedding unit 204 determines whether the value of the variable idev matches the video watermark information bit length DV (S214). If idev differs from DV (S214: No), the video watermark embedding unit 204 increments idev by 1 (S215) and executes the process of step S213 again. If idev matches DV (S214: Yes), the video watermark embedding unit 204 outputs the video data xv[i], in which the video watermark information has been embedded, to the video feature quantity extraction unit 208 and the content recording unit 209 (S216), and the processing shown in this flowchart ends.
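Real video frames are stored as bytes rather than as lists of bits, so the bit replacement of step S213 needs bit addressing inside a byte buffer. The sketch below assumes that bit position p refers to byte p // 8, counting from the most significant bit; the patent does not specify a bit order, and the helpers `get_bit`, `set_bit`, and `embed_into_bytes` are hypothetical.

```python
from typing import Sequence


def get_bit(buf: bytes, pos: int) -> int:
    """Read bit `pos`, numbering bits MSB-first within each byte (assumption)."""
    byte_index, offset = divmod(pos, 8)
    return (buf[byte_index] >> (7 - offset)) & 1


def set_bit(buf: bytearray, pos: int, value: int) -> None:
    """Overwrite bit `pos` in place with `value` (0 or 1)."""
    byte_index, offset = divmod(pos, 8)
    mask = 1 << (7 - offset)
    if value:
        buf[byte_index] |= mask
    else:
        buf[byte_index] &= ~mask & 0xFF


def embed_into_bytes(frame: bytes, wm_bits: Sequence[int],
                     positions: Sequence[int]) -> bytearray:
    """S212-S215: write dev[idev] at bit pwv[idev] of a copy of the frame."""
    out = bytearray(frame)
    for idev, pos in enumerate(positions):
        set_bit(out, pos, wm_bits[idev])
    return out


out = embed_into_bytes(bytes([0x00, 0xFF]), [1, 1, 0, 0], [0, 2, 8, 14])
```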

FIG. 7 is a flowchart showing an example of the feature quantity extraction process (S300) for audio data. The audio feature quantity extraction unit 205 executes the processing shown in this flowchart on the audio data xa[i] of each frame.

First, the audio feature quantity extraction unit 205 prepares the audio feature quantity extraction positions pfa[i] (0 ≤ i < FA) (S301). Here, the audio feature quantity bit length FA is the data length of the audio feature quantity fa[i], the set of bits extracted from the audio data xa[i] in order to verify the identity of xa[i]. In this embodiment, FA for one frame of audio data xa[i] is, for example, 11,000 bits: the audio watermark information bit length DA minus the header information length DAH described later.

The audio feature quantity extraction positions pfa[i] form a sequence satisfying 0 ≤ pfa[i] < XA and pfa[i] mod 2 = 1. They can be generated, for example, by using pseudorandom numbers to select FA numbers {xn} at random from the sequence {0, 1, 2, ..., XA/2 − 1} if XA is even, or from {0, 1, 2, ..., (XA − 3)/2} if XA is odd, and then computing {2xn + 1}; these ranges keep every position 2xn + 1 below XA. In this embodiment, pfa[i] is set in advance by, for example, the administrator of the watermark information embedding device 20.
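The {2xn + 1} construction can be sketched as below. A single expression, n_bits // 2, gives the number of odd positions below n_bits for both even and odd lengths, so the two cases collapse into one range. Drawing the xn without replacement (so that positions are distinct) is an assumption of this sketch, as is the hypothetical name `make_odd_positions`.

```python
import random
from typing import List


def make_odd_positions(n_bits: int, count: int, seed: int) -> List[int]:
    """Pick `count` distinct odd positions p with 0 <= p < n_bits, as p = 2*xn + 1."""
    n_odd = n_bits // 2  # odd values 1, 3, 5, ... below n_bits
    if count > n_odd:
        raise ValueError("not enough odd positions for the requested count")
    # xn is drawn from {0, 1, ..., n_bits // 2 - 1}, so 2*xn + 1 < n_bits always holds
    xs = random.Random(seed).sample(range(n_odd), count)
    return [2 * x + 1 for x in xs]


pfa_odd_len = make_odd_positions(7, 3, seed=0)   # XA odd
pfa_even_len = make_odd_positions(8, 4, seed=0)  # XA even
```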

Next, the audio feature quantity extraction unit 205 initializes the variable ipfa to 0 (S302), reads the pfa[ipfa]-th bit of the audio data xa output from the audio watermark embedding unit 202, and stores it in the ipfa-th bit of the audio feature quantity fa (S303). The audio feature quantity extraction unit 205 then determines whether the value of ipfa matches the audio feature quantity bit length FA (S304).

If ipfa differs from FA (S304: No), the audio feature quantity extraction unit 205 increments ipfa by 1 (S305) and executes the process of step S303 again. If ipfa matches FA (S304: Yes), the audio feature quantity extraction unit 205 outputs the audio feature quantity fa[i] to the audio feature quantity delay unit 206 (S306), and the processing shown in this flowchart ends.

The processing of steps S303 to S305 is illustrated in FIG. 8. When the audio feature quantity extraction positions pfa[i] are, for example, {3, 7, 1, 5, ...}, the audio feature quantity extraction unit 205 reads bit 404 of the audio data, xa[pfa[0]] = xa[3], and stores it in bit 406 of the audio feature quantity fa[0], then reads bit 405, xa[pfa[1]] = xa[7], and stores it in bit 407 of fa[1]. Because pfa[i] is a set of odd numbers, the audio feature quantity extraction unit 205 creates the audio feature quantity fa[i] by reading the odd-numbered bits of the audio data xa[i], into which no audio watermark information dea[i] has been embedded.
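The point of reading only odd-numbered bits is that embedding, which writes only even-numbered bits, cannot change the feature. The toy check below illustrates this with the FIG. 8 positions {3, 7, 1, 5} and the FIG. 5 embedding positions {2, 6, 0, 4}; the 8-bit frame and the name `extract_feature` are hypothetical.

```python
from typing import List, Sequence


def extract_feature(data_bits: Sequence[int], positions: Sequence[int]) -> List[int]:
    """S302-S305: bit i of the feature is data_bits[positions[i]]."""
    return [data_bits[p] for p in positions]


xa = [0, 1, 0, 0, 1, 1, 0, 0]  # toy 8-bit frame
pfa = [3, 7, 1, 5]             # odd extraction positions (FIG. 8 example)
fa_before = extract_feature(xa, pfa)

# Overwrite the even positions {2, 6, 0, 4} with watermark bits (FIG. 5 example)
marked = list(xa)
for pos, bit in zip([2, 6, 0, 4], [1, 1, 1, 1]):
    marked[pos] = bit
fa_after = extract_feature(marked, pfa)
```

Because the feature is read after embedding but from disjoint bit positions, the detector can recompute exactly the same feature from the watermarked content.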

FIG. 9 is a flowchart showing an example of the feature quantity extraction process (S300) for video data. The video feature quantity extraction unit 208 executes the processing shown in this flowchart on the video data xv[i] of each frame.

First, the video feature quantity extraction unit 208 prepares the video feature quantity extraction positions pfv[i] (0 ≤ i < FV) (S311). Here, the video feature quantity bit length FV is the data length of the video feature quantity fv[i], the set of bits extracted from the video data xv[i] in order to verify the identity of xv[i]. In this embodiment, FV for one frame of video data xv[i] is, for example, 1,151,000 bits: the video watermark information bit length DV minus the header information length DVH described later.

The video feature quantity extraction positions pfv[i] form a sequence satisfying 0 ≤ pfv[i] < XV and pfv[i] mod 2 = 1. They can be generated, for example, by using pseudorandom numbers to select FV numbers {xn} at random from the sequence {0, 1, 2, ..., XV/2 − 1} if XV is even, or from {0, 1, 2, ..., (XV − 3)/2} if XV is odd, and then computing {2xn + 1}; these ranges keep every position 2xn + 1 below XV. In this embodiment, pfv[i] is set in advance by, for example, the administrator of the watermark information embedding device 20.

Next, the video feature quantity extraction unit 208 initializes the variable ipfv to 0 (S312), reads the pfv[ipfv]-th bit of the video data xv output from the video watermark embedding unit 204, and stores it in the ipfv-th bit of the video feature quantity fv (S313). The video feature quantity extraction unit 208 then determines whether the value of ipfv matches the video feature quantity bit length FV (S314).

If ipfv differs from FV (S314: No), the video feature quantity extraction unit 208 increments ipfv by 1 (S315) and executes the process of step S313 again. If ipfv matches FV (S314: Yes), the video feature quantity extraction unit 208 outputs the video feature quantity fv[i] to the video feature quantity delay unit 207 (S316), and the processing shown in this flowchart ends.

FIG. 10 is a flowchart showing an example of the feature quantity delay process (S400) for audio data. The audio feature quantity delay unit 206 executes the processing shown in this flowchart on the audio feature quantity fa[i] output by the audio feature quantity extraction unit 205 for each frame.

First, the audio feature quantity delay unit 206 outputs the audio feature quantity fa[i] held in the audio feature quantity buffer bfa[BFA−1][j] (the buffers being bfa[i][j], 0 ≤ i < BFA, 0 ≤ j < FA) to the watermark information creation unit 203 (S401). In this embodiment, the audio feature quantity delay unit 206 has three audio feature quantity buffers bfa, so the number of audio feature quantity buffers BFA is 3.

Next, the audio feature quantity delay unit 206 sets the variable ibfa to BFA − 2 (BFA − 2 = 1 in this embodiment) (S402) and stores the data in the audio feature quantity buffer bfa[ibfa][j] into the buffer bfa[ibfa+1][j] (S403). It then determines whether the value of ibfa has reached 0 (S404).

If ibfa is not 0 (S404: No), the audio feature quantity delay unit 206 decrements ibfa by 1 (S405) and executes the process of step S403 again. If ibfa is 0 (S404: Yes), the audio feature quantity delay unit 206 stores the audio feature quantity fa[i] output by the audio feature quantity extraction unit 205 in the audio feature quantity buffer bfa[0][j] (S406), and the processing shown in this flowchart ends.

The processing of steps S401 to S406 is illustrated in FIG. 11. In step S401, the audio feature quantity delay unit 206 outputs the audio feature quantity fa[i] in the buffer bfa[BFA−1][j] to the watermark information creation unit 203; in steps S403 to S405, it shifts the audio feature quantities fa[i] in the buffers bfa along by one position; and in step S406, it stores the audio feature quantity fa[i] output by the audio feature quantity extraction unit 205 in the buffer bfa[0][j]. Because the processing of FIG. 10 is executed once per frame and the number of audio feature quantity buffers BFA is 3, the audio feature quantity delay unit 206 delays each audio feature quantity fa[i] output by the audio feature quantity extraction unit 205 by three frames before outputting it to the watermark information creation unit 203.
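Steps S401 to S406 amount to a shift register of BFA slots. The sketch below follows the flowchart literally: output the last slot, copy slots downward from index BFA − 2 to 0, then store the new feature in slot 0. What the unit outputs before the buffers have filled is not specified in this section; returning `None` for the first BFA frames is an assumption, and the class name `FeatureDelay` is hypothetical.

```python
from typing import List, Optional


class FeatureDelay:
    """BFA-slot delay line following steps S401-S406 of FIG. 10."""

    def __init__(self, n_buffers: int = 3) -> None:
        # buffers[0] ... buffers[BFA-1] correspond to bfa[0] ... bfa[BFA-1];
        # None marks a slot that has not been filled yet (assumption).
        self.buffers: List[Optional[list]] = [None] * n_buffers

    def push(self, feature: list) -> Optional[list]:
        out = self.buffers[-1]                            # S401: output bfa[BFA-1]
        for ibfa in range(len(self.buffers) - 2, -1, -1):  # S402-S405: BFA-2 down to 0
            self.buffers[ibfa + 1] = self.buffers[ibfa]    # S403
        self.buffers[0] = feature                         # S406
        return out


delay = FeatureDelay(3)
outputs = [delay.push([t]) for t in range(5)]  # feature of frame t is [t]
```

The feature pushed at frame 0 comes out at frame 3, which is the three-frame delay described above.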

FIG. 12 is a flowchart showing an example of the feature quantity delay process (S400) for video data. The video feature quantity delay unit 207 executes the processing shown in this flowchart on the video feature quantity fv[i] output by the video feature quantity extraction unit 208 for each frame.

First, the video feature quantity delay unit 207 outputs the video feature quantity fv[i] held in the video feature quantity buffer bfv[BFV−1][j] (the buffers being bfv[i][j], 0 ≤ i < BFV, 0 ≤ j < FV) to the watermark information creation unit 203 (S411). In this embodiment, the video feature quantity delay unit 207 has three video feature quantity buffers bfv, so the number of video feature quantity buffers BFV is 3.

Next, the video feature quantity delay unit 207 sets the variable ibfv to BFV − 2 (BFV − 2 = 1 in this embodiment) (S412) and stores the data in the video feature quantity buffer bfv[ibfv][j] into the buffer bfv[ibfv+1][j] (S413). It then determines whether the value of ibfv has reached 0 (S414).

If ibfv is not 0 (S414: No), the video feature quantity delay unit 207 decrements ibfv by 1 (S415) and executes the process of step S413 again. If ibfv is 0 (S414: Yes), the video feature quantity delay unit 207 stores the video feature quantity fv[i] output by the video feature quantity extraction unit 208 in the video feature quantity buffer bfv[0][j] (S416), and the processing shown in this flowchart ends.
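Copying every buffer on each frame costs O(BFV × FV) bit moves, which is significant for video features of about 1.15 million bits. An equivalent delay line can instead move only references, for example with a bounded `collections.deque`. This is an implementation alternative, not the patent's method; the helper names are hypothetical and the empty-slot value `None` is an assumption.

```python
from collections import deque
from typing import Deque, Optional


def make_delay_line(n_buffers: int) -> Deque[Optional[list]]:
    """Slots bfv[0] (newest) ... bfv[BFV-1] (oldest), initially empty."""
    return deque([None] * n_buffers, maxlen=n_buffers)


def push_feature(line: Deque[Optional[list]], feature: list) -> Optional[list]:
    oldest = line[-1]         # S411: the entry in bfv[BFV-1] is output
    line.appendleft(feature)  # S412-S416: with maxlen set, this discards the old
                              # rightmost entry and shifts the rest by one slot
    return oldest


line = make_delay_line(3)
outputs = [push_feature(line, [t]) for t in range(5)]
```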

FIG. 13 is a flowchart showing an example of the watermark information creation process (S500). The watermark information creation unit 203 executes the processing shown in this flowchart on the audio feature quantity fa[i] and the video feature quantity fv[i] output for each frame by the audio feature quantity delay unit 206 and the video feature quantity delay unit 207, respectively.

First, the watermark information creation unit 203 allocates memory areas for the audio watermark information dea[i] (0 ≤ i < DA) and the video watermark information dev[i] (0 ≤ i < DV), and writes header information into dea[i] (0 ≤ i < DAH) and dev[i] (0 ≤ i < DVH) (S501). This embodiment assumes a 1000-bit header area, so DAH = DVH = 1000. The header information includes the audio watermark information header length DAH, the video watermark information header length DVH, the division parameters FAA, FAV, FVA, and FVV (described later), time information, the number of audio feature quantity buffers BFA, and the number of video feature quantity buffers BFV.

Next, the watermark information creation unit 203 calculates the division parameters FAA, FAV, FVA, and FVV using formula (1).
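Formula (1) itself is not reproduced in this text. The worked values in the next paragraph (113 and 11,866) are consistent with the following reconstruction, which splits each feature in proportion to the watermark lengths and rounds to the nearest integer; it is an inference from those numbers rather than the verbatim formula:

```latex
\begin{aligned}
\mathit{FAA} &= \operatorname{round}\!\left(\frac{\mathit{FA}\cdot\mathit{DA}}{\mathit{DA}+\mathit{DV}}\right), &\quad \mathit{FAV} &= \mathit{FA}-\mathit{FAA},\\
\mathit{FVA} &= \operatorname{round}\!\left(\frac{\mathit{FV}\cdot\mathit{DA}}{\mathit{DA}+\mathit{DV}}\right), &\quad \mathit{FVV} &= \mathit{FV}-\mathit{FVA}.
\end{aligned}
```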

In this embodiment, the audio feature quantity bit length FA is 11,000, the audio watermark information bit length DA is 12,000, and the video watermark information bit length DV is 1,152,000, so the watermark information creation unit 203 calculates the division parameter FAA as 11,000 × 12,000 ÷ (12,000 + 1,152,000) ≈ 113 and the division parameter FAV as 11,000 − 113 = 10,887. Likewise, since the video feature quantity bit length FV is 1,151,000 in this embodiment, the watermark information creation unit 203 calculates the division parameter FVA as 1,151,000 × 12,000 ÷ (12,000 + 1,152,000) ≈ 11,866 and the division parameter FVV as 1,151,000 − 11,866 = 1,139,134.
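The arithmetic above can be reproduced with integer-only rounding. Round-half-up is an assumption (the source does not say how ties are handled); it matches both approximations given, 113.4 → 113 and 11,865.98 → 11,866. The function name `division_parameters` is hypothetical.

```python
from typing import Dict


def division_parameters(fa: int, fv: int, da: int, dv: int) -> Dict[str, int]:
    """Split FA and FV between the audio and video watermarks in the ratio DA : DV."""
    total = da + dv
    # (x + total // 2) // total rounds x / total to the nearest integer, halves up
    faa = (fa * da + total // 2) // total
    fva = (fv * da + total // 2) // total
    return {"FAA": faa, "FAV": fa - faa, "FVA": fva, "FVV": fv - fva}


p = division_parameters(fa=11_000, fv=1_151_000, da=12_000, dv=1_152_000)
audio_payload = p["FAA"] + p["FVA"]  # feature bits placed in dea after the header
video_payload = p["FAV"] + p["FVV"]  # feature bits placed in dev after the header
```

Taken literally, these figures put FAA + FVA = 11,979 feature bits into the audio watermark, more than the DA − DAH = 11,000 bits left after its header, while the video watermark uses 1,150,021 of its 1,151,000 payload bits. Splitting in proportion to the payload lengths DA − DAH and DV − DVH instead would make both sides fit exactly with these figures. This is an observation about the stated arithmetic; the source does not discuss it.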

Next, as shown in FIG. 14, the watermark information creation unit 203 stores the audio feature quantity fa[i] (0 ≤ i < FAA) in the audio watermark information dea[i] (DAH ≤ i < DAH + FAA) (S502). As also shown in FIG. 14, it then stores the video feature quantity fv[i] (0 ≤ i < FVA) in the audio watermark information dea[i] (DAH + FAA ≤ i < DAH + FAA + FVA) (S503).

Next, as shown in FIG. 14, the watermark information creation unit 203 stores the audio feature quantity fa[i] (FAA ≤ i < FA) in the video watermark information dev[i] (DVH ≤ i < DVH + FAV) (S504), and then stores the video feature quantity fv[i] (FVA ≤ i < FV) in the video watermark information dev[i] (DVH + FAV ≤ i < DVH + FAV + FVV) (S505). Finally, it outputs the audio watermark information dea[i] to the audio watermark embedding unit 202 and the video watermark information dev[i] to the video watermark embedding unit 204 (S506), and the processing shown in this flowchart ends.
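The FIG. 14 layout (steps S501 to S505) can be sketched with bit lists: header first, then the first FAA audio-feature bits and first FVA video-feature bits in dea, and the remaining bits of both features in dev. Padding any unused trailing bits with 0 and rejecting payloads that do not fit are assumptions of this sketch; the function name and the toy sizes are hypothetical.

```python
from typing import List, Sequence, Tuple


def build_watermarks(header_a: Sequence[int], header_v: Sequence[int],
                     fa: Sequence[int], fv: Sequence[int],
                     faa: int, fva: int, da: int, dv: int) -> Tuple[List[int], List[int]]:
    """Assemble dea (length DA) and dev (length DV) following FIG. 14."""
    payload_a = list(fa[:faa]) + list(fv[:fva])  # S502, S503
    payload_v = list(fa[faa:]) + list(fv[fva:])  # S504, S505
    dea = list(header_a) + payload_a             # header occupies dea[0 .. DAH-1]
    dev = list(header_v) + payload_v             # header occupies dev[0 .. DVH-1]
    if len(dea) > da or len(dev) > dv:
        raise ValueError("header plus feature bits exceed the watermark length")
    dea += [0] * (da - len(dea))                 # zero padding (assumption)
    dev += [0] * (dv - len(dev))
    return dea, dev


# Toy sizes: DAH = DVH = 2, FA = 3, FV = 4, FAA = FVA = 1, DA = 5, DV = 10
dea, dev = build_watermarks([1, 1], [1, 1], fa=[0, 1, 0], fv=[1, 0, 1, 1],
                            faa=1, fva=1, da=5, dv=10)
```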

In this way, the watermark information embedding device 20 embeds in the audio data xa[i] the audio watermark information dea[i], which contains some of the bits of the audio feature quantity fa[i] extracted from the audio data xa[i] and some of the bits of the video feature quantity fv[i] extracted from the video data xv[i], and embeds in the video data xv[i] the video watermark information dev[i], which contains the remaining bits of fa[i] and fv[i]. Tampering with the content can therefore be detected whichever of the audio data xa[i] and the video data xv[i] is replaced.

In addition, because the watermark information embedding device 20 embeds the audio watermark information dea[i] and the video watermark information dev[i] it creates into the audio data xa[i] and the video data xv[i], respectively, of a frame a predetermined number of frames later, tampering by deletion or insertion of audio data xa[i] or video data xv[i] can also be detected.

FIG. 15 is a block diagram showing an example of the functional configuration of the tampering detection device 30. The tampering detection device 30 includes a content playback unit 300, an audio feature quantity extraction unit 301, an audio watermark extraction unit 302, a video watermark extraction unit 303, a video feature quantity extraction unit 304, an audio feature quantity delay unit 305, a feature quantity reconstruction unit 306, a video feature quantity delay unit 307, an audio tampering detection unit 308, and a video tampering detection unit 309. The operation of each function in the tampering detection device 30 is described using the flowcharts from FIG. 16 onward.

FIG. 16 is a flowchart showing an example of the operation of the tampering detection device 30. The tampering detection device 30 executes the processing shown in the flowchart of FIG. 16 each time it reads the content of one frame, covering a predetermined time interval (one second in this embodiment), from the recording medium 13. The content playback unit 300 is a block that implements an ordinary content playback function, playing back content read from the recording medium 13 and outputting it through the speaker 14 and the display device 15, so its operation is not described here.

First, the audio feature quantity extraction unit 301 reads one frame of audio data xa[i] (0 ≤ i < XA) from the recording medium 13 and executes the feature quantity extraction process described with reference to FIG. 7 to extract the audio feature quantity fa[i] from the audio data xa[i] it has read (S300). Likewise, the video feature quantity extraction unit 304 reads one frame of video data xv[i] (0 ≤ i < XV) from the recording medium 13 and executes the feature quantity extraction process described with reference to FIG. 9 to extract the video feature quantity fv[i] from the video data xv[i] it has read (S300).

Next, the audio feature quantity delay unit 305 executes the feature quantity delay process described with reference to FIG. 10 to delay the audio feature quantity fa[i] extracted by the audio feature quantity extraction unit 301 by the same number of frames as the audio feature quantity delay unit 206 of the watermark information embedding device 20 (S400). Similarly, the video feature quantity delay unit 307 executes the feature quantity delay process described with reference to FIG. 12 to delay the video feature quantity fv[i] extracted by the video feature quantity extraction unit 304 by the same number of frames as the video feature quantity delay unit 207 of the watermark information embedding device 20 (S400).

Next, the audio watermark extraction unit 302 and the video watermark extraction unit 303 read one frame of audio data xa[i] (0 ≤ i < XA) and one frame of video data xv[i] (0 ≤ i < XV), respectively, from the recording medium 13 and execute the watermark information extraction process described later to extract the read audio watermark information dda[i] and the read video watermark information ddv[i] (S600).

Next, the feature quantity reconstruction unit 306 executes the feature quantity reconstruction process described later to restore the read audio feature quantity fad[i] and the read video feature quantity fvd[i] from the read audio watermark information dda[i] and the read video watermark information ddv[i] extracted in step S600 (S700).
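The reconstruction process (S700) is detailed later in the patent; the sketch below simply inverts the FIG. 14 layout of steps S502 to S505, which is one plausible reading rather than the patent's stated procedure. It assumes the detector knows DAH, DVH and the four division parameters, which the header written in step S501 is said to carry. The function name and toy values are hypothetical.

```python
from typing import List, Sequence, Tuple


def reconstruct_features(dda: Sequence[int], ddv: Sequence[int], dah: int, dvh: int,
                         faa: int, fav: int, fva: int, fvv: int) -> Tuple[List[int], List[int]]:
    """Rebuild (fad, fvd) by reading back the segments written in S502-S505."""
    fad = list(dda[dah:dah + faa]) + list(ddv[dvh:dvh + fav])
    fvd = list(dda[dah + faa:dah + faa + fva]) + list(ddv[dvh + fav:dvh + fav + fvv])
    return fad, fvd


# Toy watermarks built with DAH = DVH = 2, FAA = FVA = 1, FAV = 2, FVV = 3
fad, fvd = reconstruct_features(dda=[1, 1, 0, 1, 0],
                                ddv=[1, 1, 1, 0, 0, 1, 1, 0, 0, 0],
                                dah=2, dvh=2, faa=1, fav=2, fva=1, fvv=3)
```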

Next, the audio tampering detection unit 308 determines whether the audio data xa[i] has been tampered with by comparing the audio feature quantity fa[i] delayed in step S400 with the read audio feature quantity fad[i] restored in step S700 (S800). Likewise, the video tampering detection unit 309 determines whether the video data xv[i] has been tampered with by comparing the video feature quantity fv[i] delayed in step S400 with the read video feature quantity fvd[i] restored in step S700 (S800). The tampering detection device 30 then ends the processing shown in this flowchart.
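The comparison criterion of step S800 is described later in the patent. The simplest reading, that any bit mismatch between the delayed feature and the restored feature signals tampering, is sketched below; the optional `max_mismatches` tolerance is a hypothetical parameter, included only to show where a threshold would go if the detector had to tolerate bit errors.

```python
from typing import Sequence


def detect_tampering(expected: Sequence[int], recovered: Sequence[int],
                     max_mismatches: int = 0) -> bool:
    """Return True if the recovered feature disagrees with the recomputed one."""
    if len(expected) != len(recovered):
        return True  # a length mismatch cannot come from an untouched frame
    mismatches = sum(a != b for a, b in zip(expected, recovered))
    return mismatches > max_mismatches


untouched = detect_tampering([0, 1, 0], [0, 1, 0])
altered = detect_tampering([0, 1, 0], [0, 1, 1])
tolerated = detect_tampering([0, 1, 0], [0, 1, 1], max_mismatches=1)
```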

FIG. 17 is a flowchart showing an example of the watermark information extraction process (S600) for audio data in the first embodiment. The audio watermark extraction unit 302 executes the processing shown in this flowchart on the audio data xa[i] read from the recording medium 13 for each frame.

First, the audio watermark extraction unit 302 prepares the audio watermark information embedding positions pwa[i] (0 ≤ i < DA) (S601). These are the same sequence as the audio watermark information embedding positions pwa[i] described with reference to FIG. 4: the positions used by the audio watermark embedding unit 202 are registered in the tampering detection device 30 in advance by, for example, its administrator.

Next, the audio watermark extraction unit 302 initializes the variable iva to 0 (S602) and stores the pwa[iva]-th bit of the audio data xa in the iva-th bit of the read audio watermark information dda (S603). It then determines whether the value of iva matches the audio watermark information bit length DA (S604). If iva differs from DA (S604: No), the audio watermark extraction unit 302 increments iva by 1 (S605) and executes the process of step S603 again. If iva matches DA (S604: Yes), the audio watermark extraction unit 302 outputs the read audio watermark information dda[i] to the feature quantity reconstruction unit 306 (S606), and the processing shown in this flowchart ends.

The processing of steps S603 to S605 is illustrated in FIG. 18. When the audio watermark information embedding positions pwa[i] are, for example, {2, 6, 0, 4, ...}, the audio watermark extraction unit 302 stores bit 408 of the audio data, xa[pwa[0]] = xa[2], in bit 410 of the read audio watermark information dda[0], and stores bit 409, xa[pwa[1]] = xa[6], in bit 411 of dda[1]. In this way, the audio watermark extraction unit 302 extracts the bits of the audio watermark information dea[i] embedded in the even-numbered bits of the audio data xa[i] as the bits of the read audio watermark information dda[i].
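Extraction is the exact inverse of embedding at the same positions, so an untampered frame returns the watermark unchanged. The round trip below uses the FIG. 18 positions {2, 6, 0, 4}; the bit-list frame model and the helper names are hypothetical.

```python
from typing import List, Sequence


def embed_watermark(data_bits: Sequence[int], wm_bits: Sequence[int],
                    positions: Sequence[int]) -> List[int]:
    """Embedding side (S203): write wm_bits[i] at positions[i]."""
    out = list(data_bits)
    for bit, pos in zip(wm_bits, positions):
        out[pos] = bit
    return out


def extract_watermark(data_bits: Sequence[int], positions: Sequence[int]) -> List[int]:
    """Extraction side (S603): dda[iva] = xa[pwa[iva]] for iva = 0 .. DA-1."""
    return [data_bits[p] for p in positions]


pwa = [2, 6, 0, 4]
dea = [0, 0, 1, 0]
xa = embed_watermark([1] * 8, dea, pwa)
dda = extract_watermark(xa, pwa)
```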

FIG. 19 is a flowchart showing an example of the watermark information extraction process (S600) for video data in the first embodiment. The video watermark extraction unit 303 executes the processing shown in this flowchart on the video data xv[i] read from the recording medium 13 for each frame.

First, the video watermark extraction unit 303 prepares the video watermark information embedding positions pwv[i] (0 ≤ i < DV) (S611). These are the same sequence as the video watermark information embedding positions pwv[i] described with reference to FIG. 6: the sequence used by the video watermark embedding unit 204 is registered in the tampering detection device 30 in advance, for example by an administrator of the tampering detection device 30.

Next, the video watermark extraction unit 303 initializes the variable ivv to 0 (S612) and stores the pwv[ivv]-th bit of the video data xv in the ivv-th bit of the read video watermark information ddv (S613). The video watermark extraction unit 303 then determines whether the value of ivv equals the video watermark information bit length DV (S614). If it does not (S614: No), the unit increments ivv by 1 (S615) and executes step S613 again. If it does (S614: Yes), the unit outputs the read video watermark information ddv[i] to the feature quantity reconstruction unit 306 (S616) and ends the processing shown in this flowchart.

FIG. 20 is a flowchart illustrating an example of the feature quantity reconstruction processing (S700). For each frame, the feature quantity reconstruction unit 306 executes the processing shown in this flowchart on the read audio watermark information dda[i] and the read video watermark information ddv[i] output by the audio watermark extraction unit 302 and the video watermark extraction unit 303, respectively.

First, the feature quantity reconstruction unit 306 reads the header information of the read audio watermark information dda[i] (0 ≤ i < DAH) and of the read video watermark information ddv[i] (0 ≤ i < DVH), and acquires information such as the audio watermark information header length DAH, the video watermark information header length DVH, the division parameters FAA, FAV, FVA, and FVV, time information, the number of audio feature quantity buffers BFA, and the number of video feature quantity buffers BFV (S701).

Next, as shown in FIG. 21, the feature quantity reconstruction unit 306 stores the read audio watermark information dda[i] (DAH ≤ i < DAH + FAA) in the read audio feature quantity fad[i] (0 ≤ i < FAA) (S702), and stores the read audio watermark information dda[i] (DAH + FAA ≤ i < DAH + FAA + FVA) in the read video feature quantity fvd[i] (0 ≤ i < FVA) (S703).

Next, as shown in FIG. 21, the feature quantity reconstruction unit 306 stores the read video watermark information ddv[i] (DVH ≤ i < DVH + FAV) in the read audio feature quantity fad[i] (FAA ≤ i < FA) (S704), and stores the read video watermark information ddv[i] (DVH + FAV ≤ i < DVH + FAV + FVV) in the read video feature quantity fvd[i] (FVA ≤ i < FV) (S705). The feature quantity reconstruction unit 306 then outputs the read audio feature quantity fad[i] to the audio tampering detection unit 308 and the read video feature quantity fvd[i] to the video tampering detection unit 309 (S706), and ends the processing shown in this flowchart.
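Steps S702 to S705 are four slice copies. The sketch below is illustrative only: it assumes the watermarks are Python lists of bits, takes the header lengths and division parameters as plain integers (header parsing in S701 is not modelled), and uses the hypothetical name `reconstruct_features`. The index ranges imply FA = FAA + FAV and FV = FVA + FVV.

```python
from typing import List, Sequence, Tuple


def reconstruct_features(
    dda: Sequence[int], ddv: Sequence[int],
    dah: int, dvh: int,
    faa: int, fva: int, fav: int, fvv: int,
) -> Tuple[List[int], List[int]]:
    """Split the read watermarks into the read feature quantities fad and fvd.

    Layout taken from S702-S705: after its header, dda carries FAA audio-feature
    bits followed by FVA video-feature bits; after its header, ddv carries FAV
    audio-feature bits followed by FVV video-feature bits.
    """
    # S702: dda[DAH : DAH+FAA] -> fad[0 : FAA]
    fad = list(dda[dah:dah + faa])
    # S703: dda[DAH+FAA : DAH+FAA+FVA] -> fvd[0 : FVA]
    fvd = list(dda[dah + faa:dah + faa + fva])
    # S704: ddv[DVH : DVH+FAV] -> fad[FAA : FA], so FA = FAA + FAV
    fad.extend(ddv[dvh:dvh + fav])
    # S705: ddv[DVH+FAV : DVH+FAV+FVV] -> fvd[FVA : FV], so FV = FVA + FVV
    fvd.extend(ddv[dvh + fav:dvh + fav + fvv])
    return fad, fvd


# Toy sizes; the header bits (9, 9 and 8) are placeholders.
dda = [9, 9, 1, 0, 1, 1, 1]  # DAH=2, FAA=2, FVA=3
ddv = [8, 0, 1, 1, 0]        # DVH=1, FAV=3, FVV=1
print(reconstruct_features(dda, ddv, 2, 1, 2, 3, 3, 1))  # ([1, 0, 0, 1, 1], [1, 1, 1, 0])
```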

FIG. 22 is a flowchart illustrating an example of the tampering determination processing (S800) for audio data. The audio tampering detection unit 308 executes the processing shown in this flowchart when it has received the audio feature quantity fa[i] from the audio feature quantity delay unit 305 and the read audio feature quantity fad[i] from the feature quantity reconstruction unit 306.

First, the audio tampering detection unit 308 initializes the variables i and j to 0 (S801) and determines whether the value of the read audio feature quantity fad[i] matches the value of the audio feature quantity fa[i] (S802). If they match (S802: Yes), the audio tampering detection unit 308 proceeds to step S804.

If they differ (S802: No), the audio tampering detection unit 308 increments j by 1 (S803) and then determines whether the value of i equals the audio feature quantity bit length FA (S804). If i differs from FA (S804: No), the unit increments i by 1 (S805) and executes step S802 again.

If i equals FA (S804: Yes), the audio tampering detection unit 308 determines whether j is greater than 0 (S806). If j is 0 (S806: No), the unit ends the processing shown in this flowchart. If j is greater than 0 (S806: Yes), the unit displays a message indicating that audio tampering has been detected, for example in area 52 of the image 50 as shown in FIG. 23 (S807), and ends the processing shown in this flowchart.
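The counter j in S801 to S806 is simply the number of mismatching bit positions between the recomputed and the recovered feature quantities. A minimal Python sketch, assuming both are equal-length bit lists; the function names are hypothetical, and the video side (fv, fvd, FV) is identical.

```python
from typing import Sequence


def count_mismatches(f: Sequence[int], fd: Sequence[int]) -> int:
    """Return j: the number of bit positions where f[i] and fd[i] differ (S801-S805)."""
    if len(f) != len(fd):
        raise ValueError("feature quantities must have the same bit length")
    return sum(1 for a, b in zip(f, fd) if a != b)


def is_tampered(f: Sequence[int], fd: Sequence[int]) -> bool:
    """S806: report tampering when at least one bit mismatches (j > 0)."""
    return count_mismatches(f, fd) > 0


fa = [0, 1, 1, 0]   # recomputed from the reproduced audio data
fad = [0, 1, 0, 0]  # recovered from the watermark
print(count_mismatches(fa, fad), is_tampered(fa, fad))  # 1 True
```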

As shown in FIG. 23, area 51 of the image 50 displays the video reproduced by the content reproduction unit 300. When tampering is detected, area 52 displays a message to that effect, together with information such as the time information contained in the header information of the frame immediately preceding the frame in which the tampering was detected.

FIG. 24 is a flowchart illustrating an example of the tampering determination processing (S800) for video data. The video tampering detection unit 309 executes the processing shown in this flowchart when it has received the video feature quantity fv[i] from the video feature quantity delay unit 307 and the read video feature quantity fvd[i] from the feature quantity reconstruction unit 306.

First, the video tampering detection unit 309 initializes the variables i and j to 0 (S811) and determines whether the value of the read video feature quantity fvd[i] matches the value of the video feature quantity fv[i] (S812). If they match (S812: Yes), the video tampering detection unit 309 proceeds to step S814.

If they do not match (S812: No), the video tampering detection unit 309 increments j by 1 (S813) and then determines whether the value of i equals the video feature quantity bit length FV (S814). If i differs from FV (S814: No), the unit increments i by 1 (S815) and executes step S812 again.

If i equals FV (S814: Yes), the video tampering detection unit 309 determines whether j is greater than 0 (S816). If j is 0 (S816: No), the unit ends the processing shown in this flowchart. If j is greater than 0 (S816: Yes), the unit displays a message indicating that video tampering has been detected, for example in area 52 of the image 50 as shown in FIG. 23 (S817), and ends the processing shown in this flowchart.

This concludes the description of the first embodiment of the present invention.

As the above description shows, the tampering detection system 10 of this embodiment associates the audio data with the video data while, for each of them, keeping tampering detection accuracy high and quality degradation of the original data low.

Next, a second embodiment of the present invention will be described.

In this embodiment, the watermark information embedding device 20 uses pseudo-random numbers to determine the watermark information embedding positions and the feature quantity extraction positions, and, when creating watermark information from the feature quantities, reorders the feature quantities using pseudo-random numbers. This increases the confidentiality of the watermark information. The tampering detection device 30 uses the same pseudo-random numbers as those used by the watermark information embedding device 20 to determine the watermark information extraction positions and the feature quantity extraction positions, and, when restoring the feature quantities from the extracted watermark information, undoes the reordering performed by the watermark information embedding device 20.

For example, in step S201 of FIG. 4, the audio watermark embedding unit 202 generates pseudo-random numbers from a constant SEED1 set in advance by, for example, an administrator of the watermark information embedding device 20. Using the generated pseudo-random numbers, it randomly reorders the sequence {0, 2, 4, ..., XA-2} if XA is even, or {0, 2, 4, ..., XA-1} if XA is odd, and then selects the first DA elements to generate the audio watermark information embedding positions pwa[i].
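The position generation above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the patent's method: Python's `random.Random` stands in for the unspecified pseudo-random generator, the seed value is hypothetical, and `embedding_positions` is a made-up name. Python only guarantees that `random()` reproduces across versions for a given seed, so a deployment shared between embedder and detector would need a fully specified PRNG and shuffle.

```python
import random
from typing import List


def embedding_positions(x_len: int, d: int, seed: int) -> List[int]:
    """Pick d distinct even bit positions in [0, x_len), in pseudo-random order.

    Shuffle {0, 2, 4, ...} with a PRNG seeded by a shared constant
    (SEED1 for audio, SEED2 for video), then keep the first d entries.
    """
    # range(0, x_len, 2) ends at x_len-2 for even x_len and at x_len-1 for odd x_len.
    even = list(range(0, x_len, 2))
    if d > len(even):
        raise ValueError("not enough even positions for the watermark")
    rng = random.Random(seed)  # same seed -> same shuffle on the same Python version
    rng.shuffle(even)
    return even[:d]


pwa = embedding_positions(x_len=16, d=4, seed=1234)  # hypothetical SEED1 = 1234
print(pwa == embedding_positions(16, 4, 1234))       # True: the detector can regenerate it
```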

Similarly, in step S211 of FIG. 6, the video watermark embedding unit 204 generates pseudo-random numbers from a constant SEED2 set in advance by, for example, an administrator of the watermark information embedding device 20. Using the generated pseudo-random numbers, it randomly reorders the sequence {0, 2, 4, ..., XV-2} if XV is even, or {0, 2, 4, ..., XV-1} if XV is odd, and then selects the first DV elements to generate the video watermark information embedding positions pwv[i].

Similarly, in step S301 of FIG. 7, the audio feature quantity extraction unit 205 generates pseudo-random numbers from a constant SEED3 set in advance by, for example, an administrator of the watermark information embedding device 20. Using the generated pseudo-random numbers, it randomly selects FA numbers {xn} from the sequence {0, 1, 2, ..., XA/2-1} if XA is even, or from {0, 1, 2, ..., (XA-3)/2} if XA is odd, and computes {2xn+1} to generate the audio feature quantity extraction positions pfa[i]. These bounds ensure that every position 2xn+1 is an odd bit index less than XA.
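Because the watermark occupies even-numbered bits, drawing feature positions of the form 2xn+1 keeps the feature bits disjoint from the watermark bits. A minimal sketch, again assuming Python's `random.Random` as a stand-in PRNG and a hypothetical function name; `x_len // 2` covers both the even and odd cases of the bound.

```python
import random
from typing import List


def feature_positions(x_len: int, f: int, seed: int) -> List[int]:
    """Pick f distinct odd bit positions 2*xn + 1 < x_len using a seeded PRNG."""
    # xn ranges over 0 .. x_len//2 - 1, i.e. XA/2 - 1 (even XA) or (XA-3)/2 (odd XA).
    candidates = range(x_len // 2)
    rng = random.Random(seed)  # hypothetical stand-in for SEED3 / SEED4
    xn = rng.sample(candidates, f)  # raises ValueError if f > len(candidates)
    return [2 * n + 1 for n in xn]


print(sorted(feature_positions(9, 4, seed=1)))  # [1, 3, 5, 7]: every odd index below 9
```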

Similarly, in step S311 of FIG. 9, the video feature quantity extraction unit 208 generates pseudo-random numbers from a constant SEED4 set in advance by, for example, an administrator of the watermark information embedding device 20. Using the pseudo-random numbers, it randomly selects FV numbers {xn} from the sequence {0, 1, 2, ..., XV/2-1} if XV is even, or from {0, 1, 2, ..., (XV-3)/2} if XV is odd, and computes {2xn+1} to generate the video feature quantity extraction positions pfv[i].

In addition, before step S501 of FIG. 13, the watermark information creation unit 203 generates pseudo-random numbers from a constant SEED5 set in advance by, for example, an administrator of the watermark information embedding device 20. It uses them to randomly reorder the audio feature quantity fa[i] output from the audio feature quantity delay unit 206 and the video feature quantity fv[i] output from the video feature quantity delay unit 207, and then executes step S501 and the subsequent steps on the reordered fa[i] and fv[i].

For example, the watermark information creation unit 203 creates the sequence z[i] = {0, 1, 2, ..., FA-1} and randomly reorders it using the generated pseudo-random numbers to obtain the sequence z'[i]. It then copies each element fa[i] of the audio feature quantity to fa'[z'[i]] and copies each element of fa'[i] back to fa[i], which randomly reorders the elements of fa[i]. The video feature quantity fv[i] is handled in the same way: using a sequence z'[i] obtained by reordering {0, 1, 2, ..., FV-1}, the watermark information creation unit 203 copies each element fv[i] to fv'[z'[i]] and copies each element of fv'[i] back to fv[i], which randomly reorders the elements of fv[i].
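The reordering fa'[z'[i]] = fa[i] is a scatter through a permutation, and the detector's restoration (described for steps S701 and S706 of the second embodiment) is the matching gather. A minimal Python sketch, assuming `random.Random` as the seeded PRNG and hypothetical function names; the same functions apply to fv with a permutation of length FV and to the watermark information dea and dev.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def make_permutation(n: int, seed: int) -> List[int]:
    """z': a seeded shuffle of {0, 1, ..., n-1}."""
    z = list(range(n))
    random.Random(seed).shuffle(z)
    return z


def permute(f: Sequence[T], z: Sequence[int]) -> List[T]:
    """Embedder side: f'[z[i]] = f[i] for every i."""
    out: List[T] = list(f)  # placeholder contents; every slot is overwritten below
    for i, value in enumerate(f):
        out[z[i]] = value
    return out


def unpermute(f_perm: Sequence[T], z: Sequence[int]) -> List[T]:
    """Detector side (inverse): f[i] = f'[z[i]]."""
    return [f_perm[z[i]] for i in range(len(z))]


z = make_permutation(5, seed=99)  # hypothetical SEED5 = 99
fa = [1, 0, 1, 1, 0]
print(unpermute(permute(fa, z), z) == fa)  # True: the reordering is undone exactly
```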

Also, after step S505 of FIG. 13, the watermark information creation unit 203 generates pseudo-random numbers from the constant SEED5 set in advance by, for example, an administrator of the watermark information embedding device 20. It uses them to randomly reorder the audio watermark information dea[i] created in step S503 and the video watermark information dev[i] created in step S505, and then executes step S506 on the reordered dea[i] and dev[i].

Any pseudo-random number generation method may be used, as long as the same constant always yields the same pseudo-random numbers. Likewise, the reordering described above may be performed by any other method, as long as the same pseudo-random numbers always yield the same reordering result. For example, the C++ programming language provides a standard function that performs random reordering, and that function may be used.

SEED1 to SEED5, which the watermark information embedding device 20 uses to generate pseudo-random numbers, are provided to the tampering detection device 30 in advance, before it determines whether the content data has been tampered with. They are either recorded on the recording medium 13 together with the content data or delivered via another recording medium.

For example, in step S601 of FIG. 17, the audio watermark extraction unit 302 generates pseudo-random numbers from the previously acquired SEED1 and uses them to restore the audio watermark information embedding positions pwa[i]. Similarly, in step S611 of FIG. 19, the video watermark extraction unit 303 generates pseudo-random numbers from the previously acquired SEED2 and uses them to restore the video watermark information embedding positions pwv[i].

In addition, before step S701 of FIG. 20, the feature quantity reconstruction unit 306 generates pseudo-random numbers from the previously acquired SEED5. Using them, it applies to the read audio watermark information dda[i] and the read video watermark information ddv[i] the inverse of the reordering performed before step S506 of FIG. 13, thereby restoring dda[i] and ddv[i] to their original order.

Likewise, before step S706 of FIG. 20, the feature quantity reconstruction unit 306 generates pseudo-random numbers from the previously acquired SEED5. Using them, it applies to the read audio feature quantity fad[i] and the read video feature quantity fvd[i] the inverse of the reordering performed before step S501 of FIG. 13, thereby restoring fad[i] and fvd[i] to their original order.

This concludes the description of the second embodiment of the present invention.

As the above description shows, the tampering detection system 10 of this embodiment can increase the confidentiality of the watermark information.

Next, a third embodiment of the present invention will be described.

In this embodiment, for each frame, the watermark information embedding device 20 creates watermark information that uses the feature quantities extracted from the audio data and the video data multiple times, and embeds it in the audio data and the video data. For each frame, the tampering detection device 30 then determines the value of each feature quantity bit by majority vote over the corresponding bits of the multiple copies of the feature quantity.

As a result, when the data recorded on the recording medium 13 has degraded, or when the content data recorded on the recording medium 13 has undergone a conversion that changes the data slightly, the tampering detection device 30 can cancel out such changes, which are not tampering, and accurately determine whether the content has been tampered with. The following describes only the parts that differ from the tampering detection system 10 of the first embodiment.
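The majority vote described above can be sketched as a bitwise vote over an odd number of copies. This is a minimal illustration, assuming the copies are equal-length Python bit lists; the function name `majority_vote` is hypothetical, and the patent does not specify how ties would be handled, so the sketch simply requires an odd count.

```python
from typing import List, Sequence


def majority_vote(copies: Sequence[Sequence[int]]) -> List[int]:
    """Bitwise majority over an odd number of equal-length bit sequences."""
    n = len(copies)
    if n == 0 or n % 2 == 0:
        raise ValueError("need an odd, non-zero number of copies")
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("all copies must have the same length")
    # A bit is 1 when more than half of the copies have it set.
    return [1 if 2 * sum(c[i] for c in copies) > n else 0 for i in range(length)]


# True value [1, 0, 1]; two of the three copies carry one flipped bit each.
print(majority_vote([[1, 0, 1], [1, 1, 1], [0, 0, 1]]))  # [1, 0, 1]
```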

FIG. 25 is a flowchart illustrating an example of the watermark information embedding processing (S200) for audio data in the third embodiment. The audio watermark embedding unit 202 executes the processing shown in this flowchart on each frame of audio data xa[i].

First, the audio watermark embedding unit 202 prepares the audio watermark information embedding positions pwa[i] (0 ≤ i < WA) (S220). Here, the audio watermark information total bit length WA is the total amount of audio watermark information embedded in one frame of audio data. In this embodiment, audio watermark information is embedded at a rate of 1 bit per 64 bits of audio data, and the audio watermark information dea[i] (0 ≤ i < DA) is assumed to be embedded in the audio data xa[i] three times. In this embodiment, the audio watermark information total bit length WA is 12,000, and the audio watermark information bit length DA is WA ÷ 3 = 4,000.

The audio watermark information embedding positions pwa[i] form a sequence that satisfies 0 ≤ pwa[i] < XA and pwa[i] mod 2 = 0, and that satisfies pwa[i] ≠ pwa[j] for any (i, j) with 0 ≤ i < WA, 0 ≤ j < WA, and i ≠ j. In this embodiment, the audio watermark information embedding positions pwa[i] are set in advance by, for example, an administrator of the watermark information embedding device 20.
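The three conditions on pwa[i] (in range, even, pairwise distinct) can be checked mechanically before a position list is used. A minimal sketch with a hypothetical function name; it applies equally to pwv[i] with XV in place of XA.

```python
from typing import Sequence


def valid_positions(pw: Sequence[int], x_len: int) -> bool:
    """Check the pwa/pwv constraints: 0 <= p < x_len, p even, all p distinct."""
    in_range_and_even = all(0 <= p < x_len and p % 2 == 0 for p in pw)
    distinct = len(set(pw)) == len(pw)
    return in_range_and_even and distinct


print(valid_positions([2, 6, 0, 4], 8))  # True
print(valid_positions([2, 2], 8))        # False: duplicate position
```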

Next, the audio watermark embedding unit 202 initializes the variables idea and ipwa to 0 (S221). It then embeds the audio watermark information dea[idea] in the audio data xa[pwa[ipwa]] (S222) by replacing the pwa[ipwa]-th bit of the audio data xa, generated by the audio data creation unit 200 in step S100, with the idea-th bit of the audio watermark information dea created by the watermark information creation unit 203 in step S500 for the previous frame.

Next, the audio watermark embedding unit 202 determines whether the value of ipwa equals the audio watermark information total bit length WA (S223). If it does (S223: Yes), the unit outputs the audio data xa[i], in which the audio watermark information dea[i] is embedded, to the audio feature quantity extraction unit 205 and the content recording unit 209 (S228), and ends the processing shown in this flowchart.

If ipwa differs from WA (S223: No), the audio watermark embedding unit 202 determines whether the value of idea equals the audio watermark information bit length DA (S224). If it does (S224: Yes), the unit resets idea to 0 (S225) and proceeds to step S227. If idea differs from DA (S224: No), the unit increments idea by 1 (S226). In either case, the unit then increments ipwa by 1 (S227) and executes step S222 again.

The processing of steps S222 to S227 is illustrated in FIG. 26. Because the audio watermark information total bit length WA is three times the audio watermark information bit length DA, the audio watermark embedding unit 202 embeds each bit 412 of the audio watermark information dea[i] three times, into the even-numbered bits 413, 414, and 415 of the audio data xa[i] specified by the audio watermark information embedding positions pwa[i].
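Because idea wraps back to 0 every DA positions, position pwa[k] receives dea[k mod DA]. The sketch below shows that cyclic embedding on a toy frame and, as an assumption consistent with it (the third-embodiment extraction flow is only partly described here), regroups the WA read bits into WA ÷ DA copies that could then be majority-voted. Frames are modelled as flat bit lists and the function names are hypothetical.

```python
from typing import List, Sequence


def embed_repeated(x_bits: Sequence[int], de: Sequence[int], pw: Sequence[int]) -> List[int]:
    """Write de cyclically: bit pw[k] of the frame receives de[k mod DA] (S221-S227)."""
    d = len(de)
    if d == 0 or len(pw) % d != 0:
        raise ValueError("WA must be a positive multiple of DA")
    out = list(x_bits)
    for k, p in enumerate(pw):
        out[p] = de[k % d]
    return out


def extract_copies(x_bits: Sequence[int], pw: Sequence[int], d: int) -> List[List[int]]:
    """Regroup the WA embedded bits into WA // DA copies of the watermark."""
    return [[x_bits[pw[r * d + i]] for i in range(d)] for r in range(len(pw) // d)]


frame = [0] * 12
pw = [0, 2, 4, 6, 8, 10]                    # WA = 6 even positions (toy size)
marked = embed_repeated(frame, [1, 0], pw)  # DA = 2, so each bit is written 3 times
print(extract_copies(marked, pw, 2))        # [[1, 0], [1, 0], [1, 0]]
```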

FIG. 27 is a flowchart illustrating an example of the watermark information embedding processing (S200) for video data in the third embodiment. The video watermark embedding unit 204 executes the processing shown in this flowchart on each frame of video data xv[i].

First, the video watermark embedding unit 204 prepares the video watermark information embedding positions pwv[i] (0 ≤ i < WV) (S230). Here, the video watermark information total bit length WV is the total amount of video watermark information embedded in one frame of video data. In this embodiment, video watermark information is embedded at a rate of 1 bit per 64 bits of the luminance information contained in the 24-bit video data, and the video watermark information dev[i] (0 ≤ i < DV) is assumed to be embedded in the video data xv[i] three times. In this embodiment, the video watermark information total bit length WV is 1,152,000, and the video watermark information bit length DV is WV ÷ 3 = 384,000.

The video watermark information embedding positions pwv[i] form a sequence that satisfies 0 ≤ pwv[i] < XV and pwv[i] mod 2 = 0, and that satisfies pwv[i] ≠ pwv[j] for any (i, j) with 0 ≤ i < WV, 0 ≤ j < WV, and i ≠ j. In this embodiment, the video watermark information embedding positions pwv[i] are set in advance by, for example, an administrator of the watermark information embedding device 20.

Next, the video watermark embedding unit 204 initializes the variables idev and ipwv to 0 (S231). It then embeds the video watermark information dev[idev] in the video data xv[pwv[ipwv]] (S232) by replacing the pwv[ipwv]-th bit of the video data xv generated in step S100 with the idev-th bit of the video watermark information dev created by the watermark information creation unit 203 in step S500 for the previous frame.

Next, the video watermark embedding unit 204 determines whether the value of ipwv equals the video watermark information total bit length WV (S233). If it does (S233: Yes), the unit outputs the video data xv[i], in which the video watermark information dev[i] is embedded, to the video feature quantity extraction unit 208 and the content recording unit 209 (S238), and ends the processing shown in this flowchart.

If ipwv differs from WV (S233: No), the video watermark embedding unit 204 determines whether idev equals the video watermark information bit length DV (S234). If it does (S234: Yes), the unit resets idev to 0 (S235) and proceeds to step S237. Otherwise (S234: No), the unit increments idev by 1 (S236). In either case it then increments ipwv by 1 (S237) and returns to step S232.
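The loop of S231 to S238 walks through the WV embedding positions while cycling through the DV bits of dev, so each watermark bit is written several times. The sketch below assumes this intent, namely that position pwv[k] receives dev[k mod DV]. It uses Python's zero-based, half-open bounds instead of the flowchart's explicit counter tests. The frame is modelled as a list of bits, and `embed_watermark` is a hypothetical name.

```python
def embed_watermark(x: list[int], dev: list[int], pwv: list[int]) -> list[int]:
    """Replace bit x[pwv[k]] with dev[k mod DV] for every k in [0, WV)."""
    out = list(x)  # copy so the caller's frame is left unchanged
    dv = len(dev)
    for ipwv, pos in enumerate(pwv):  # ipwv runs over 0 .. WV-1
        idev = ipwv % dv  # idev wraps back to 0 after the last dev bit (S234/S235)
        out[pos] = dev[idev]  # bit replacement (S232)
    return out
```

With `dev = [1, 0, 1]` and six positions `[0, 2, 4, 6, 8, 10]`, each of the three watermark bits lands in two positions. The odd-indexed bits, which carry the content used for feature extraction, are left untouched.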

FIG. 28 is a flowchart showing an example of the watermark information extraction processing (S600) for audio data in the third embodiment. The audio watermark extraction unit 302 executes this processing on the audio data xa[i] read from the recording medium 13, one frame at a time.

First, the audio watermark extraction unit 302 prepares the audio watermark information embedding positions pwa[i] (0 ≤ i < WA) (S620). These form the same sequence as the positions pwa[i] described with reference to FIG. 25. The positions used by the audio watermark embedding unit 202 are registered in the tampering detection device 30 in advance, for example by the administrator of the tampering detection device 30.

Next, the audio watermark extraction unit 302 initializes the variables iva and ipwa to 0 (S621) and determines whether the pwa[ipwa]-th value of the audio data xa is 0 (S622). If xa[pwa[ipwa]] is 0 (S622: Yes), the unit decrements the iva-th entry of the audio voting buffer va by 1 (S623) and proceeds to step S625.

If xa[pwa[ipwa]] is not 0 (S622: No), the audio watermark extraction unit 302 increments the iva-th entry of va by 1 (S624). It then determines whether ipwa equals the total audio watermark information bit length WA (S625). If ipwa differs from WA (S625: No), the unit determines whether iva equals the audio watermark information bit length DA (S626).

If iva equals DA (S626: Yes), the audio watermark extraction unit 302 resets iva to 0 (S627) and proceeds to step S629. Otherwise (S626: No), the unit increments iva by 1 (S628). In either case it then increments ipwa by 1 (S629) and returns to step S622.
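Steps S621 to S629 add one vote per embedded copy: -1 for a 0 bit and +1 for any other value, collected in the slot of the watermark bit that the copy belongs to. The following is a minimal sketch. It assumes the same cyclic mapping as on the embedding side (copy k belongs to bit k mod DA), models the frame as a list of bits, and assumes that va starts at zero for every frame, which the flowchart does not show explicitly. `accumulate_votes` is a hypothetical name.

```python
def accumulate_votes(xa: list[int], pwa: list[int], da: int) -> list[int]:
    """Build the voting buffer va of length DA from the embedded bits (S621-S629)."""
    va = [0] * da  # assumed to be cleared for each frame
    for ipwa, pos in enumerate(pwa):
        iva = ipwa % da  # iva wraps back to 0 after DA, mirroring S626/S627
        if xa[pos] == 0:
            va[iva] -= 1  # S623: a 0 bit votes for 0
        else:
            va[iva] += 1  # S624: a non-zero bit votes for 1
    return va
```

For a frame carrying two copies of the pattern 1, 0, 1 at positions 0, 2, ..., 10, the buffer comes out as `[2, -2, 2]`.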

If, in step S625, ipwa equals WA (S625: Yes), the audio watermark extraction unit 302 initializes the variable idda to 0 (S630) and determines whether the idda-th entry of the audio voting buffer va is positive (S631). If va[idda] is positive (S631: Yes), the unit sets the idda-th bit of the read audio watermark information dda to 1 (S632) and proceeds to step S634.

If va[idda] is not positive (S631: No), the audio watermark extraction unit 302 sets the idda-th bit of dda to 0 (S633) and then increments idda by 1 (S634). In this embodiment, each bit of the audio watermark information dea[i] is embedded three times (an odd number of times) in the audio data xa[i]. Because each bit value is 0 or 1, every copy adds either +1 or -1 to the buffer, so va[idda] is never 0.

However, if every bit of dea[i] is embedded in xa[i] an even number of times, or if some of its bits are embedded an even number of times, va[idda] can be 0. In that case, the audio watermark extraction unit 302 sets the idda-th bit of dda to either 0 or 1. It is preferable to alternate between the two values so that the assignments are not biased toward either one.
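Why a zero total can arise only with an even number of copies follows from a parity argument. Here R (the number of copies of one watermark bit), b_r ∈ {0, 1} (the r-th copy as read), and v (the resulting buffer value) are notation introduced for this derivation:

$$v = \sum_{r=1}^{R} (2b_r - 1) = 2\sum_{r=1}^{R} b_r - R$$

Since $2\sum_{r} b_r$ is even, v has the same parity as R. With R = 3, v is odd and therefore non-zero. With an even R, for example R = 2 with one copy read as 1 and the other as 0, v = 2 - 2 = 0, and the tie rule described above applies.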

Next, the audio watermark extraction unit 302 determines whether idda equals DA (S635). If it does not (S635: No), the unit returns to step S631. If it does (S635: Yes), the unit outputs the read audio watermark information dda[i] to the feature amount reconstruction unit 306 (S636) and ends the processing shown in this flowchart.
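Steps S630 to S636 turn each buffer total into one bit by its sign. The following is a minimal sketch that also applies the alternating tie-break described for an even number of copies. Starting the alternation at 0 is an assumption, since the text only says alternation is preferable. `decide_bits` is a hypothetical name.

```python
def decide_bits(va: list[int]) -> list[int]:
    """Turn voting-buffer totals into the read watermark dda (S630-S636)."""
    dda = []
    next_tie_bit = 0  # alternates 0, 1, 0, ... so ties are not biased (assumed start)
    for vote in va:
        if vote > 0:
            dda.append(1)  # positive total -> bit 1
        elif vote < 0:
            dda.append(0)  # negative total -> bit 0
        else:
            dda.append(next_tie_bit)  # tie: only possible with an even number of copies
            next_tie_bit ^= 1
    return dda
```

With three copies per bit the `else` branch is never reached, so the result matches the plain "positive means 1, otherwise 0" rule of the flowchart.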

Steps S622 to S635 can be explained with reference to FIG. 29. The audio watermark extraction unit 302 reads the three even-numbered bits 416, 417, and 418 of the audio data xa[i] indicated by the audio watermark information embedding positions pwa[i]. For each of these bits, it decrements the corresponding entry 419 of the audio voting buffer va[i] by 1 if the bit is 0 and increments it by 1 if the bit is 1. It then sets the corresponding bit 418 of the read audio watermark information dda[i] to 1 if the value of entry 419 is positive and to 0 if it is negative. In this way the value of that bit is decided by majority vote.

As a result, the tampering detection device 30 can cancel out small changes that do not amount to tampering, for example degradation of the data recorded on the recording medium 13 or a conversion that slightly alters the content data recorded there. It can therefore determine accurately whether the content has been tampered with.
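The robustness claim can be illustrated with a small round trip. The following is a minimal sketch, assuming a hypothetical 4-bit watermark, twelve even positions (three copies per bit), and the same cyclic bit-to-position mapping as in the earlier steps. `embed_bits` and `extract_bits` are hypothetical names. One corrupted copy is voted out, but corrupting two of the three copies of the same bit flips the majority.

```python
def embed_bits(frame: list[int], wm: list[int], positions: list[int]) -> list[int]:
    """Write wm cyclically into frame: wm[k mod len(wm)] goes to positions[k]."""
    out = list(frame)
    for k, p in enumerate(positions):
        out[p] = wm[k % len(wm)]
    return out


def extract_bits(frame: list[int], positions: list[int], length: int) -> list[int]:
    """Majority-vote each watermark bit over all of its embedded copies."""
    votes = [0] * length
    for k, p in enumerate(positions):
        votes[k % length] += 1 if frame[p] else -1
    # With an odd number of copies per bit a total of 0 cannot occur.
    return [1 if v > 0 else 0 for v in votes]


dea = [1, 0, 1, 1]                      # hypothetical 4-bit audio watermark (DA = 4)
pwa = list(range(0, 24, 2))             # 12 even positions, so WA = 3 * DA
clean = embed_bits([0] * 24, dea, pwa)

one_flip = list(clean)
one_flip[pwa[0]] ^= 1                   # corrupt one of the three copies of bit 0
two_flips = list(one_flip)
two_flips[pwa[4]] ^= 1                  # corrupt a second copy of bit 0

print(extract_bits(one_flip, pwa, 4))   # -> [1, 0, 1, 1]  (error voted out)
print(extract_bits(two_flips, pwa, 4))  # -> [0, 0, 1, 1]  (majority now wrong)
```

The second case shows the limit of the scheme. Enough damage to the same bit is indistinguishable from a different watermark, which the later feature comparison would then report as a mismatch.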

FIG. 30 is a flowchart showing an example of the watermark information extraction processing (S600) for video data in the third embodiment. The video watermark extraction unit 303 executes this processing on the video data xv[i] read from the recording medium 13, one frame at a time.

First, the video watermark extraction unit 303 prepares the video watermark information embedding positions pwv[i] (0 ≤ i < WV) (S640). These form the same sequence as the positions pwv[i] described with reference to FIG. 27. The positions used by the video watermark embedding unit 204 are registered in the tampering detection device 30 in advance, for example by the administrator of the tampering detection device 30.

Next, the video watermark extraction unit 303 initializes the variables ivv and ipwv to 0 (S641) and determines whether the pwv[ipwv]-th value of the video data xv is 0 (S642). If xv[pwv[ipwv]] is 0 (S642: Yes), the unit decrements the ivv-th entry of the video voting buffer vv by 1 (S643) and proceeds to step S645.

If xv[pwv[ipwv]] is not 0 (S642: No), the video watermark extraction unit 303 increments the ivv-th entry of vv by 1 (S644). It then determines whether ipwv equals the total video watermark information bit length WV (S645). If ipwv differs from WV (S645: No), the unit determines whether ivv equals the video watermark information bit length DV (S646).

If ivv equals DV (S646: Yes), the video watermark extraction unit 303 resets ivv to 0 (S647) and proceeds to step S649. Otherwise (S646: No), the unit increments ivv by 1 (S648). In either case it then increments ipwv by 1 (S649) and returns to step S642.

If, in step S645, ipwv equals WV (S645: Yes), the video watermark extraction unit 303 initializes the variable iddv to 0 (S650) and determines whether the iddv-th entry of the video voting buffer vv is positive (S651). If vv[iddv] is positive (S651: Yes), the unit sets the iddv-th bit of the read video watermark information ddv to 1 (S652) and proceeds to step S654.

If vv[iddv] is not positive (S651: No), the video watermark extraction unit 303 sets the iddv-th bit of ddv to 0 (S653) and then increments iddv by 1 (S654).

Next, the video watermark extraction unit 303 determines whether iddv equals DV (S655). If it does not (S655: No), the unit returns to step S651. If it does (S655: Yes), the unit outputs the read video watermark information ddv[i] to the feature amount reconstruction unit 306 (S656) and ends the processing shown in this flowchart.

This concludes the description of the third embodiment of the present invention.

The watermark information embedding device 20 or the tampering detection device 30 in the first to third embodiments described above is implemented by, for example, a computer 60 configured as shown in FIG. 31. The computer 60 includes a CPU (Central Processing Unit) 61, a RAM (Random Access Memory) 62, a ROM (Read Only Memory) 63, an HDD (Hard Disk Drive) 64, a communication interface (I/F) 65, an input/output interface (I/F) 66, and a media interface (I/F) 67.

The CPU 61 operates according to programs stored in the ROM 63 or the HDD 64 and controls each component. The ROM 63 stores a boot program that the CPU 61 executes when the computer 60 starts up, programs that depend on the hardware of the computer 60, and the like.

The HDD 64 stores the programs executed by the CPU 61, the data used by those programs, and the like. The communication interface 65 receives data from other devices over a communication line and passes it to the CPU 61. It also transmits data generated by the CPU 61 to other devices over the communication line.

Through the input/output interface 66, the CPU 61 controls output devices such as the speaker 14 and the display device 15, and input devices such as a keyboard, a mouse, the microphone 11, and the camera 12. The CPU 61 acquires data from the input devices via the input/output interface 66 and outputs the data it generates to the output devices via the same interface.

The media interface 67 reads a program or data stored on a recording medium 68 and provides it to the CPU 61 via the RAM 62. The CPU 61 loads the program from the recording medium 68 into the RAM 62 through the media interface 67 and executes it. The recording medium 68 is, for example, an optical recording medium such as a DVD (Digital Versatile Disc) or a PD (Phase-change rewritable Disk), a magneto-optical recording medium such as an MO (Magneto-Optical disk), a tape medium, a magnetic recording medium, or a semiconductor memory.

When the computer 60 functions as the watermark information embedding device 20, the CPU 61 of the computer 60 executes a program loaded into the RAM 62. This implements the functions of the audio data creation unit 200, the video data creation unit 201, the audio watermark embedding unit 202, the watermark information creation unit 203, the video watermark embedding unit 204, the audio feature amount extraction unit 205, the audio feature amount delay unit 206, the video feature amount delay unit 207, the video feature amount extraction unit 208, and the content recording unit 209.

Similarly, when the computer 60 functions as the tampering detection device 30, the CPU 61 of the computer 60 executes a program loaded into the RAM 62. This implements the functions of the content reproduction unit 300, the audio feature amount extraction unit 301, the audio watermark extraction unit 302, the video watermark extraction unit 303, the video feature amount extraction unit 304, the audio feature amount delay unit 305, the feature amount reconstruction unit 306, the video feature amount delay unit 307, the audio tampering detection unit 308, and the video tampering detection unit 309.

The CPU 61 of the computer 60 reads these programs from the recording medium 68 and executes them. Alternatively, it may acquire the programs from another device via a communication medium. Here, a communication medium means a communication line, or a digital signal or carrier wave propagating through such a line.

The present invention is not limited to the embodiments described above, and many modifications are possible within the scope of its gist.

For example, each embodiment above describes a configuration in which the watermark information embedding device 20 is implemented as a single device. The present invention is not limited to this. The functions of the watermark information embedding device 20 may instead be distributed across a plurality of computers that cooperate to provide them. The same applies to the tampering detection device 30 in each embodiment.

In addition, the components of the watermark information embedding device 20 and the tampering detection device 30 in each embodiment are divided by function according to their main processing, in order to make the embodiments easier to explain. The invention of the present application is not limited by the way the components are divided or by their names. The components can be divided into more components according to their processing, or grouped so that a single component performs more of the processing.

In each embodiment above, the watermark information creation unit 203 creates audio watermark information dea[i] containing part of the audio feature amount fa[i] extracted from the audio data xa[i] and part of the video feature amount fv[i] extracted from the video data xv[i]. It also creates video watermark information dev[i] containing the remaining part of fa[i] and the remaining part of fv[i]. The present invention is not limited to this. For example, the watermark information creation unit 203 may create audio watermark information dea[i] containing all of fa[i] and part of fv[i], and video watermark information dev[i] containing the remaining part of fv[i] but none of fa[i].
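Both allocations can be expressed as one split with adjustable sizes, and the feature amount reconstruction unit 306 only needs to know those sizes to invert it. The following is a minimal sketch, assuming features are bit lists and that each watermark takes a prefix of each feature, which the text does not specify. The split sizes `na` and `nv` and both function names are hypothetical. Setting `na` to the full audio feature length gives the modified allocation, in which dev carries no audio features.

```python
def split_features(fa: list[int], fv: list[int], na: int, nv: int) -> tuple[list[int], list[int]]:
    """Build (dea, dev): dea carries fa[:na] + fv[:nv], dev carries the remainders."""
    dea = fa[:na] + fv[:nv]
    dev = fa[na:] + fv[nv:]
    return dea, dev


def reconstruct_features(dda: list[int], ddv: list[int], na: int, len_fa: int) -> tuple[list[int], list[int]]:
    """Invert split_features, given the audio split size and the audio feature length."""
    rest_a = len_fa - na  # number of audio feature bits carried by the video watermark
    fa = dda[:na] + ddv[:rest_a]
    fv = dda[na:] + ddv[rest_a:]
    return fa, fv
```

For example, with a 4-bit fa and a 6-bit fv, `split_features(fa, fv, 2, 3)` spreads both features across the two watermarks, while `split_features(fa, fv, 4, 3)` puts all of fa into dea. `reconstruct_features` recovers the original pair in both cases.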

In each embodiment above, the audio feature amount delay unit 206 and the video feature amount delay unit 207 delay each feature amount by three frames, but the present invention is not limited to this. Suppose the watermark information embedding device 20 is a computer fast enough to finish three tasks within the time of one frame (one second in each embodiment above): the watermark embedding by the audio watermark embedding unit 202, the audio feature amount extraction by the audio feature amount extraction unit 205, and the watermark information creation by the watermark information creation unit 203. In that case, the audio feature amount delay unit 206 and the video feature amount delay unit 207 may be omitted from the watermark information embedding device 20.

In this case, every time the audio data creation unit 200 outputs one frame of audio data xa[i], the audio watermark embedding unit 202 embeds into it the audio watermark information dea[i] created from the audio data xa[i] and video data xv[i] of the previous frame. As a result, only the first frame lacks a watermark created from a preceding frame, which increases the number of frames in which tampering can be detected.
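The effect of the delay on coverage can be made concrete by tracking which earlier frame's features each frame carries. The following is a minimal sketch, assuming the features of frame t are embedded into frame t + lag. A lag of 1 corresponds to the "previous frame" case without delay units, and a longer lag, as when delay units are present, leaves more leading frames uncovered. The text does not state the exact lag with delay units, so `lag` is a free parameter here, and both function names are hypothetical.

```python
from typing import Optional


def watermark_sources(num_frames: int, lag: int) -> list[Optional[int]]:
    """For each frame t, return the frame whose features are embedded in t, or None."""
    return [t - lag if t >= lag else None for t in range(num_frames)]


def checkable_frames(num_frames: int, lag: int) -> int:
    """Number of frames carrying a watermark derived from an earlier frame."""
    return sum(src is not None for src in watermark_sources(num_frames, lag))
```

With five frames, a lag of 1 gives `[None, 0, 1, 2, 3]` (only the first frame is uncovered), whereas a lag of 3 gives `[None, None, None, 0, 1]`.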

DESCRIPTION OF SYMBOLS: 10 ... tampering detection system, 11 ... microphone, 12 ... camera, 13 ... recording medium, 14 ... speaker, 15 ... display device, 20 ... watermark information embedding device, 200 ... audio data creation unit, 201 ... video data creation unit, 202 ... audio watermark embedding unit, 203 ... watermark information creation unit, 204 ... video watermark embedding unit, 205 ... audio feature amount extraction unit, 206 ... audio feature amount delay unit, 207 ... video feature amount delay unit, 208 ... video feature amount extraction unit, 209 ... content recording unit, 30 ... tampering detection device, 300 ... content reproduction unit, 301 ... audio feature amount extraction unit, 302 ... audio watermark extraction unit, 303 ... video watermark extraction unit, 304 ... video feature amount extraction unit, 305 ... audio feature amount delay unit, 306 ... feature amount reconstruction unit, 307 ... video feature amount delay unit, 308 ... audio tampering detection unit, 309 ... video tampering detection unit, 50 ... image, 60 ... computer, 61 ... CPU, 62 ... RAM, 63 ... ROM, 64 ... HDD, 65 ... communication interface, 66 ... input/output interface, 67 ... media interface, 68 ... recording medium

Claims

A tampering detection system that detects, in units of frames, tampering of content including audio data and video data, the system comprising:
A watermark information embedding device that embeds watermark information into audio data and video data in units of frames; and
A tampering detection device that reads, in units of frames, the watermark information from the audio data and video data in which it is embedded, and determines whether the audio data and the video data have been tampered with,
The watermark information embedding device comprises:
An audio data creation unit that obtains audio from the outside and converts it into audio data in units of frames;
An audio watermark embedding unit that, in units of frames, embeds audio watermark information into the audio data created by the audio data creation unit by replacing predetermined bits of the audio data with bits of the audio watermark information, and outputs the result;
A first audio feature amount extraction unit that extracts an audio feature amount from those bits of the audio data, into which the audio watermark embedding unit has embedded the audio watermark information, that do not carry audio watermark information;
A video data creation unit that acquires video from outside and converts it into video data in units of frames;
A video watermark embedding unit that, in units of frames, embeds video watermark information into the video data created by the video data creation unit by replacing predetermined bits of the video data with bits of the video watermark information, and outputs the result;
A first video feature amount extraction unit that, in units of frames, extracts a video feature amount from those bits of the video data, into which the video watermark embedding unit has embedded the video watermark information, that do not carry video watermark information; and
A watermark information creating unit that, in units of frames, creates audio watermark information including part of the audio feature amount and part of the video feature amount and supplies it to the audio watermark embedding unit, and creates video watermark information including the remaining part of the audio feature amount and the remaining part of the video feature amount and supplies it to the video watermark embedding unit;
The tampering detection device includes:
An audio watermark extraction unit that, in units of frames, extracts audio watermark information from the bit positions of the audio data at which audio watermark information is to be embedded;
A second audio feature amount extraction unit that, in units of frames, extracts an audio feature amount from the bit positions of the audio data at which no audio watermark information is embedded;
A video watermark extraction unit that, in units of frames, extracts video watermark information from the bit positions of the video data at which video watermark information is to be embedded;
A second video feature amount extraction unit that, in units of frames, extracts a video feature amount from the bit positions of the video data at which no video watermark information is embedded;
A feature amount reconstruction unit that, in units of frames, extracts part of the audio feature amount and part of the video feature amount from the audio watermark information extracted by the audio watermark extraction unit, extracts the remaining part of the audio feature amount and the remaining part of the video feature amount from the video watermark information extracted by the video watermark extraction unit, and reconstructs the audio feature amount and the video feature amount from the extracted data;
An audio data alteration detection unit that, in units of frames, compares the audio feature amount extracted by the second audio feature amount extraction unit with the audio feature amount reconstructed by the feature amount reconstruction unit, and outputs information indicating whether the audio data has been tampered with; and
A video data alteration detection unit that, in units of frames, compares the video feature amount extracted by the second video feature amount extraction unit with the video feature amount reconstructed by the feature amount reconstruction unit, and outputs information indicating whether the video data has been tampered with.

The tampering detection system according to claim 1, wherein
the watermark information embedding device comprises:
A first audio feature amount delay unit that delays the audio feature amount extracted by the first audio feature amount extraction unit by a first number of frames and supplies the delayed audio feature amount to the watermark information creation unit; and
A first video feature amount delay unit that delays the video feature amount extracted by the first video feature amount extraction unit by the first number of frames and supplies the delayed video feature amount to the watermark information creation unit;
the watermark information creation unit,
in units of frames, creates audio watermark information including a part of the audio feature amount supplied from the first audio feature amount delay unit and a part of the video feature amount supplied from the first video feature amount delay unit and supplies the created audio watermark information to the audio watermark embedding unit, and, in units of frames, creates video watermark information including the remaining part of the audio feature amount and the remaining part of the video feature amount and supplies the created video watermark information to the video watermark embedding unit;
the tampering detection device comprises:
A second audio feature amount delay unit that delays the audio feature amount extracted by the second audio feature amount extraction unit by the first number of frames and supplies the delayed audio feature amount to the audio data tampering detection unit; and
A second video feature amount delay unit that delays the video feature amount extracted by the second video feature amount extraction unit by the first number of frames and supplies the delayed video feature amount to the video data tampering detection unit;
the audio data tampering detection unit,
in units of frames, compares the audio feature amount supplied from the second audio feature amount delay unit with the audio feature amount reconstructed by the feature amount reconstruction unit and outputs information indicating whether the audio data has been tampered with; and
the video data tampering detection unit,
in units of frames, compares the video feature amount supplied from the second video feature amount delay unit with the video feature amount reconstructed by the feature amount reconstruction unit and outputs information indicating whether the video data has been tampered with.
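Each delay unit in claim 2 behaves like a fixed-length FIFO. The sketch below uses a hypothetical `FrameDelay` helper, which does not appear in the source. It returns the value pushed `frames` frames earlier, or `None` while the line is still filling. When the same first number of frames D is used on both sides, the features the detector reconstructs from the watermarks in frame t refer to frame t minus D. The detector's own delayed features also refer to frame t minus D, so the comparison stays aligned.

```python
from collections import deque
from typing import Deque, Generic, Optional, TypeVar

T = TypeVar("T")


class FrameDelay(Generic[T]):
    """Hypothetical feature amount delay unit: a FIFO that delays a per-frame
    value by a fixed number of frames."""

    def __init__(self, frames: int) -> None:
        if frames < 0:
            raise ValueError("the delay must be a non-negative number of frames")
        self.frames = frames
        self._buffer: Deque[T] = deque()

    def push(self, value: T) -> Optional[T]:
        """Feed the value for frame t and return the value for frame t - frames,
        or None while fewer than frames + 1 values have been pushed."""
        self._buffer.append(value)
        if len(self._buffer) > self.frames:
            return self._buffer.popleft()
        return None


if __name__ == "__main__":
    delay = FrameDelay(2)
    print([delay.push(f) for f in ["f0", "f1", "f2", "f3"]])
    # prints [None, None, 'f0', 'f1']
```

The delay length of 2 is only an example. The claims leave the first number of frames open.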

A watermark information embedding device that embeds, in units of frames, watermark information for detecting tampering with content including audio data and video data, the device comprising:
An audio data creation unit that acquires audio from outside and converts it into audio data in units of frames;
An audio watermark embedding unit that, in units of frames, embeds audio watermark information in the audio data created by the audio data creation unit by replacing predetermined bits of the audio data with bits of the audio watermark information, and outputs the result;
An audio feature amount extraction unit that, in units of frames, extracts an audio feature amount from the bits in which no audio watermark information is embedded in the audio data output by the audio watermark embedding unit;
A video data creation unit that acquires video from outside and converts it into video data in units of frames;
A video watermark embedding unit that, in units of frames, embeds video watermark information in the video data created by the video data creation unit by replacing predetermined bits of the video data with bits of the video watermark information, and outputs the result;
A video feature amount extraction unit that, in units of frames, extracts a video feature amount from the bits in which no video watermark information is embedded in the video data output by the video watermark embedding unit; and
A watermark information creation unit that, in units of frames, creates audio watermark information including a part of the audio feature amount and a part of the video feature amount and supplies it to the audio watermark embedding unit, and creates video watermark information including the remaining part of the audio feature amount and the remaining part of the video feature amount and supplies it to the video watermark embedding unit.
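In claim 3, each watermark is embedded by replacing predetermined bits, and the feature amounts are extracted only from the remaining bits. Embedding therefore cannot change the features it is computed from. The sketch below makes two illustrative assumptions that the source does not specify. First, the predetermined bits are the least significant bits (LSBs) of 16-bit signed samples. Second, the feature amount is a truncated SHA-256 digest of the samples with their LSBs cleared. The function names are hypothetical.

```python
import hashlib
from typing import List

Bits = List[int]


def embed_lsb(samples: List[int], wm_bits: Bits) -> List[int]:
    """Watermark embedding unit (sketch): replace the least significant bit of
    the first len(wm_bits) samples with the watermark bits."""
    if len(wm_bits) > len(samples):
        raise ValueError("frame too short for the watermark payload")
    out = list(samples)
    for i, bit in enumerate(wm_bits):
        out[i] = (out[i] & ~1) | (bit & 1)
    return out


def extract_lsb(samples: List[int], n_bits: int) -> Bits:
    """Watermark extraction unit (sketch): read back the first n_bits LSBs."""
    return [s & 1 for s in samples[:n_bits]]


def feature_from_unmarked_bits(samples: List[int], n_bits: int = 64) -> Bits:
    """Feature amount extraction unit (sketch): hash the samples with every LSB
    cleared, so whatever is written into the LSBs cannot change the result."""
    masked = b"".join((s & ~1).to_bytes(2, "big", signed=True) for s in samples)
    digest = hashlib.sha256(masked).digest()
    # First n_bits bits of the digest, most significant bit first (n_bits <= 256).
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(n_bits)]


if __name__ == "__main__":
    frame = [1000, -2001, 3, 32767, -32768, 42, 7, 0]
    marked = embed_lsb(frame, [1, 0, 1, 1])
    print(extract_lsb(marked, 4))  # [1, 0, 1, 1]
    print(feature_from_unmarked_bits(marked) == feature_from_unmarked_bits(frame))  # True
```

The check printed last is the property the claim relies on. A feature computed before embedding equals one computed from the embedded output, so the embedder can create a frame's watermark without a circular dependency.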

The watermark information embedding device according to claim 3, further comprising:
An audio feature amount delay unit that delays the audio feature amount extracted by the audio feature amount extraction unit by a predetermined number of frames and supplies the delayed audio feature amount to the watermark information creation unit; and
A video feature amount delay unit that delays the video feature amount extracted by the video feature amount extraction unit by a predetermined number of frames and supplies the delayed video feature amount to the watermark information creation unit;
wherein the watermark information creation unit,
in units of frames, creates audio watermark information including a part of the audio feature amount supplied from the audio feature amount delay unit and a part of the video feature amount supplied from the video feature amount delay unit and supplies it to the audio watermark embedding unit, and creates video watermark information including the remaining part of the audio feature amount and the remaining part of the video feature amount and supplies it to the video watermark embedding unit.
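The watermark information creation unit of claims 3 and 4 splits the (possibly delayed) feature amounts across the two payloads. The following is a minimal sketch that assumes the features are bit lists. Two hypothetical parameters, `audio_share` and `video_share`, decide how many leading bits of each feature go into the audio watermark. All remaining bits go into the video watermark, which is the inverse of the reconstruction performed on the detection side.

```python
from typing import List, Tuple

Bits = List[int]


def create_watermarks(audio_feature: Bits, video_feature: Bits,
                      audio_share: int, video_share: int) -> Tuple[Bits, Bits]:
    """Hypothetical watermark information creation unit for one frame.

    audio_share: leading audio-feature bits placed in the audio watermark.
    video_share: leading video-feature bits placed in the audio watermark.
    All remaining bits of both features go into the video watermark.
    """
    if not 0 <= audio_share <= len(audio_feature):
        raise ValueError("audio_share out of range")
    if not 0 <= video_share <= len(video_feature):
        raise ValueError("video_share out of range")
    audio_wm = audio_feature[:audio_share] + video_feature[:video_share]
    video_wm = audio_feature[audio_share:] + video_feature[video_share:]
    return audio_wm, video_wm


if __name__ == "__main__":
    # With a delay unit upstream, these would be the features of frame t - D.
    a_feat = [1, 1, 1, 1, 1, 1]
    v_feat = [0] * 10
    audio_wm, video_wm = create_watermarks(a_feat, v_feat, 2, 2)
    print(audio_wm)       # [1, 1, 0, 0]
    print(len(video_wm))  # 12
```

Uneven shares are one plausible way to keep the audio payload smaller than the video payload. An audio frame usually offers far fewer samples than a video frame offers pixels. This is an assumption for illustration, not something the claims state.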

A tampering detection device that detects, in units of frames, tampering with content including audio data and video data, the device comprising:
An audio watermark extraction unit that, in units of frames, extracts audio watermark information from the bit positions of the audio data at which audio watermark information is to be embedded;
An audio feature amount extraction unit that, in units of frames, extracts an audio feature amount from the bit positions of the audio data at which no audio watermark information is embedded;
A video watermark extraction unit that, in units of frames, extracts video watermark information from the bit positions of the video data at which video watermark information is to be embedded;
A video feature amount extraction unit that, in units of frames, extracts a video feature amount from the bit positions of the video data at which no video watermark information is embedded;
A feature amount reconstruction unit that, in units of frames, extracts a part of the audio feature amount and a part of the video feature amount from the audio watermark information extracted by the audio watermark extraction unit, extracts the remaining part of the audio feature amount and the remaining part of the video feature amount from the video watermark information extracted by the video watermark extraction unit, and reconstructs the audio feature amount and the video feature amount from the extracted data;
An audio data tampering detection unit that, in units of frames, compares the audio feature amount extracted by the audio feature amount extraction unit with the audio feature amount reconstructed by the feature amount reconstruction unit and outputs information indicating whether the audio data has been tampered with; and
A video data tampering detection unit that, in units of frames, compares the video feature amount extracted by the video feature amount extraction unit with the video feature amount reconstructed by the feature amount reconstruction unit and outputs information indicating whether the video data has been tampered with.

The tampering detection device according to claim 5, further comprising:
An audio feature amount delay unit that delays the audio feature amount extracted by the audio feature amount extraction unit by a predetermined number of frames and supplies the delayed audio feature amount to the audio data tampering detection unit; and
A video feature amount delay unit that delays the video feature amount extracted by the video feature amount extraction unit by a predetermined number of frames and supplies the delayed video feature amount to the video data tampering detection unit;
wherein the audio data tampering detection unit,
in units of frames, compares the audio feature amount supplied from the audio feature amount delay unit with the audio feature amount reconstructed by the feature amount reconstruction unit and outputs information indicating whether the audio data has been tampered with; and
the video data tampering detection unit,
in units of frames, compares the video feature amount supplied from the video feature amount delay unit with the video feature amount reconstructed by the feature amount reconstruction unit and outputs information indicating whether the video data has been tampered with.

A watermark information embedding method in a watermark information embedding device that embeds, in units of frames, watermark information for detecting tampering with content including audio data and video data, wherein
the watermark information embedding device performs:
An audio data creation step of acquiring audio from outside and converting it into audio data in units of frames;
An audio watermark information embedding step of, in units of frames, embedding audio watermark information in the audio data created in the audio data creation step by replacing predetermined bits of the audio data with bits of the audio watermark information, and outputting the result;
An audio feature amount extraction step of, in units of frames, extracting an audio feature amount from the bits in which no audio watermark information is embedded in the audio data output by the audio watermark information embedding step;
A video data creation step of acquiring video from outside and converting it into video data in units of frames;
A video watermark information embedding step of, in units of frames, embedding video watermark information in the video data created in the video data creation step by replacing predetermined bits of the video data with bits of the video watermark information, and outputting the result;
A video feature amount extraction step of, in units of frames, extracting a video feature amount from the bits in which no video watermark information is embedded in the video data output by the video watermark information embedding step;
A step of creating, in units of frames, audio watermark information including a part of the audio feature amount and a part of the video feature amount; and
A step of creating, in units of frames, video watermark information including the remaining part of the audio feature amount and the remaining part of the video feature amount.

A tampering detection method in a tampering detection device that detects, in units of frames, tampering with content including audio data and video data, wherein
the tampering detection device performs:
An audio watermark information extraction step of, in units of frames, extracting audio watermark information from the bit positions of the audio data at which audio watermark information is to be embedded;
An audio feature amount extraction step of, in units of frames, extracting an audio feature amount from the bit positions of the audio data at which no audio watermark information is embedded;
A video watermark information extraction step of, in units of frames, extracting video watermark information from the bit positions of the video data at which video watermark information is to be embedded;
A video feature amount extraction step of, in units of frames, extracting a video feature amount from the bit positions of the video data at which no video watermark information is embedded;
A feature amount reconstruction step of, in units of frames, extracting a part of the audio feature amount and a part of the video feature amount from the audio watermark information extracted in the audio watermark information extraction step, extracting the remaining part of the audio feature amount and the remaining part of the video feature amount from the video watermark information extracted in the video watermark information extraction step, and reconstructing the audio feature amount and the video feature amount from the extracted data;
An audio data tampering detection step of, in units of frames, comparing the audio feature amount extracted in the audio feature amount extraction step with the audio feature amount reconstructed in the feature amount reconstruction step and outputting information indicating whether the audio data has been tampered with; and
A video data tampering detection step of, in units of frames, comparing the video feature amount extracted in the video feature amount extraction step with the video feature amount reconstructed in the feature amount reconstruction step and outputting information indicating whether the video data has been tampered with.
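The methods of claims 7 and 8 can be exercised end to end with a small, self-contained sketch. It makes the following illustrative assumptions, none of which the source specifies:
- the predetermined bits are sample LSBs;
- each feature amount is a 64-bit truncated SHA-256 digest of the non-LSB bits;
- each feature is split in half between the two watermarks.

The sketch computes each frame's features before embedding. Because the feature ignores the LSBs, this gives the same result as extracting them from the embedded output. All function names are hypothetical.

```python
import hashlib
from typing import List, Tuple

Bits = List[int]
FEATURE_BITS = 64          # hypothetical feature length per stream
HALF = FEATURE_BITS // 2   # hypothetical split point


def feature(samples: List[int]) -> Bits:
    """Feature amount from the non-LSB bits only (the LSBs hold the watermark)."""
    masked = b"".join((s & ~1).to_bytes(2, "big", signed=True) for s in samples)
    digest = hashlib.sha256(masked).digest()
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(FEATURE_BITS)]


def embed(samples: List[int], bits: Bits) -> List[int]:
    """Replace the LSBs of the first len(bits) samples with the watermark bits."""
    if len(bits) > len(samples):
        raise ValueError("frame too short for the watermark")
    return [(s & ~1) | b for s, b in zip(samples, bits)] + samples[len(bits):]


def extract(samples: List[int], n: int) -> Bits:
    """Read n watermark bits back from the LSBs."""
    return [s & 1 for s in samples[:n]]


def embed_frame(audio: List[int], video: List[int]) -> Tuple[List[int], List[int]]:
    """Claim 7 for one frame: features, cross-split payloads, LSB embedding."""
    a_feat, v_feat = feature(audio), feature(video)
    audio_wm = a_feat[:HALF] + v_feat[:HALF]
    video_wm = a_feat[HALF:] + v_feat[HALF:]
    return embed(audio, audio_wm), embed(video, video_wm)


def check_frame(audio: List[int], video: List[int]) -> Tuple[bool, bool]:
    """Claim 8 for one frame: returns (audio_tampered, video_tampered)."""
    audio_wm = extract(audio, FEATURE_BITS)
    video_wm = extract(video, FEATURE_BITS)
    rec_audio = audio_wm[:HALF] + video_wm[:HALF]
    rec_video = audio_wm[HALF:] + video_wm[HALF:]
    return feature(audio) != rec_audio, feature(video) != rec_video


if __name__ == "__main__":
    audio = list(range(-64, 64))                  # 128 toy 16-bit samples
    video = [(i * 7) % 256 for i in range(128)]   # 128 toy 8-bit pixels
    marked_a, marked_v = embed_frame(audio, video)
    print(check_frame(marked_a, marked_v))        # (False, False)

    # Audio taken from a different marked pair and combined with this video.
    other_a, _ = embed_frame([3 * x for x in range(-64, 64)],
                             [(i * 11) % 256 for i in range(128)])
    print(check_frame(other_a, marked_v))
```

Each watermark carries half of the other stream's feature. As a result, swapping in audio that was marked for a different video also breaks the video check in this sketch. The mismatch shows that the audio and video no longer belong together, which is the association between the two streams that the abstract describes.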