
JP2011039776A - Moving image content detection device

Info

Publication number: JP2011039776A
Application number: JP2009186566A
Authority: JP
Inventors: Yusuke Uchida; 祐介内田; Masayuki Hashimoto; 真幸橋本; Akio Yoneyama; 暁夫米山
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2009-08-11
Filing date: 2009-08-11
Publication date: 2011-02-24
Anticipated expiration: 2029-08-11
Also published as: JP5297297B2

Abstract

PROBLEM TO BE SOLVED: To provide a technique that can accurately and quickly detect illegally distributed content, even when the content has been edited in the time-axis direction (for example, by cutting out a part of content whose free distribution is not permitted by the copyright holder or the like) or has been degraded as a whole by compression noise or the like.

SOLUTION: A moving image content detection device 1 includes: a shot boundary detection part 10 for detecting a shot boundary of a moving image content; a shot boundary feature value extraction part 20 for extracting feature values from the frames before and after the shot boundary detected by the shot boundary detection part 10; and a feature value collation part 40 for collating the shot boundary feature value that relates to one moving image content and was extracted by the shot boundary feature value extraction part 20 with shot boundary feature values relating to a plurality of moving image contents stored in a database 50.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a moving image content detection apparatus.

With the recent spread of broadband and the growing capacity of storage such as HDDs (Hard Disk Drives), DVDs (Digital Versatile Discs), and Blu-ray discs, it has become easy to share and publish digital content over a network without permission from copyright holders or content providers (hereinafter, "copyright holders or the like"), and such unauthorized sharing and publication has become a problem. To address this problem, techniques have been proposed that use a fingerprint (feature amount) of digital content to automatically detect, from among a plurality of digital contents, specific content whose free distribution is not permitted by the copyright holder or the like (see Patent Documents 1 and 2 and Non-Patent Document 1).

In Patent Document 1, the feature amount of a content is described using three-dimensional frequency analysis and principal component analysis, and specific content is detected. This method performs a three-dimensional frequency analysis by applying frequency analysis in the time-axis direction (FFT) to the coefficients obtained by spatial frequency analysis (DCT), and then extracts the feature amount from the resulting coefficients by principal component analysis. In Patent Document 2, the feature amount used in Patent Document 1 is used to narrow down the specific contents that are similar to the distributed content. When the candidates cannot be narrowed down, the specific content most similar to the distributed content is determined using the phase-only correlation method, and a threshold is used to decide whether the two are the same content.

In Non-Patent Document 1, the mean absolute error of the luminance values of adjacent frames (motion intensity) is first obtained from the video, and the frames at which this mean absolute error takes an extreme value are used as key frames. Next, feature points called corners are detected in each key frame by a Harris detector, and feature amounts are extracted from their neighborhoods using Gaussian derivatives. Each feature amount is then matched against a database and votes are cast, and content that receives many votes is detected as illegally distributed content. This method is designed to detect illegally distributed content even when the video has been edited temporally.

Patent Document 1: JP 2005-18675 A
Patent Document 2: JP 2006-285907 A

Non-Patent Document 1: J. Law-To et al., "Video Copy Detection: A Comparative Study," in Proc. ACM CIVR '07, pp. 371-378, 2007.

However, the methods disclosed in Patent Documents 1 and 2 extract a single feature amount from each moving image content, so detection fails when the content is edited in the time-axis direction, for example by splitting it. The method disclosed in Non-Patent Document 1, on the other hand, has the following problems. First, key frames are selected by motion intensity, but the extreme values of the motion intensity are unstable with respect to noise, and accuracy drops when the key frames shift. Second, the number of key frames extracted by motion intensity varies with the scene, and processing time increases when redundant key frames are extracted. Third, the extracted Gaussian derivative feature amounts are relatively sensitive to compression noise and the like, so accuracy drops when such noise is added.

The present invention has been made in view of these circumstances. Its object is to provide a technique that can accurately and quickly detect illegally distributed content even when the content has been edited in the time-axis direction, for example by cutting out a part of content whose free distribution is not permitted by the copyright holder or the like, or when the content as a whole has been degraded by compression noise or the like.

To solve the above problem, a moving image content detection apparatus according to one aspect of the present invention includes: a shot boundary detection unit that detects a shot boundary of a moving image content; a shot boundary feature amount extraction unit that extracts a feature amount from the frames before and after the shot boundary detected by the shot boundary detection unit; and a feature amount matching unit that matches the shot boundary feature amount that relates to one moving image content and was extracted by the shot boundary feature amount extraction unit against shot boundary feature amounts relating to a plurality of moving image contents stored in a storage unit.

In the above moving image content detection apparatus, the shot boundary detection unit may detect the shot boundaries of the moving image content as follows: using information on predetermined frames that make up the moving image content and occur at fixed intervals, it determines whether a shot boundary exists within each fixed interval, and then, for each interval determined to contain a shot boundary, it determines whether each inter-frame gap within that interval is a shot boundary.

In the above moving image content detection apparatus, the predetermined frame may be a frame, among the frames constituting the compressed moving image content, that can be decoded without referring to any other frame.

In the above moving image content detection apparatus, the shot boundary feature amount extraction unit may extract the shot boundary feature amount based on the correlation between the frames before and after the shot boundary. The shot boundary feature amount extraction unit may also divide each of the frames before and after the shot boundary into a plurality of blocks, form sets of a fixed number of blocks from these blocks, and extract the shot boundary feature amount based on the correlation within those sets of blocks.

In the above moving image content detection apparatus, the shot boundary feature amount extraction unit may divide each of the frames before and after the shot boundary into a plurality of blocks, form sets of a fixed number of blocks from these blocks, and extract the shot boundary feature amount based on at least one of the magnitude relationships of the average luminance, the motion intensity, and the edge amount within each set of blocks.

In the above moving image content detection apparatus, the feature amount matching unit may calculate the distance between the shot boundary feature amount relating to the one moving image content and a shot boundary feature amount relating to a moving image content stored in the storage unit, and match the one moving image content against the stored moving image content based on that distance. In the distance calculation, the magnitude relationships of sets of blocks whose average luminance, motion intensity, or edge amount are close to each other may be left unused.

According to the present invention, illegally distributed content can be detected accurately and quickly even when it has been edited in the time-axis direction, for example by cutting out a part of content whose free distribution is not permitted by the copyright holder or the like, or when it has been degraded as a whole by compression noise or the like.

FIG. 1 is a functional block diagram of a moving image content detection apparatus 1 according to an embodiment of the present invention.
FIG. 2 is a flowchart showing an example of the operation of the shot boundary detection unit 10.
FIG. 3 is an explanatory diagram of the operation of the shot boundary detection unit 10.
FIG. 4 is an explanatory diagram of the operation of the shot boundary feature amount extraction unit 20.
FIG. 5 is an explanatory diagram of the operation of the feature amount matching unit 40.

An embodiment of the present invention is described below with reference to the drawings. The moving image content detection apparatus 1 according to this embodiment detects query content that is presumed to be illegally distributed content, using the feature amount of the moving image content under inspection (query content) and the feature amount of specific content whose free distribution is not permitted by the copyright holder or the like (hereinafter, "reference content"). As shown in FIG. 1(a), the moving image content detection apparatus 1 includes a shot boundary detection unit 10, a shot boundary feature amount extraction unit 20, a feature amount registration unit 30, a feature amount matching unit 40, and a database (storage unit) 50.

The shot boundary detection unit 10 detects shot boundaries in moving image content (reference content and query content). Specifically, the shot boundary detection unit 10 first determines whether a shot boundary exists within each fixed interval, using information on predetermined frames that make up the moving image content and occur at fixed intervals. A predetermined frame is, for example, a frame of the compressed moving image content that can be decoded without referring to any other frame, that is, a frame that can be decoded on its own. An I frame (intra picture) in a GOP (Group of Pictures) corresponds to such a predetermined frame.

Next, when the shot boundary detection unit 10 has determined that a shot boundary exists within a given interval, it further determines, for each inter-frame gap within that interval, whether the gap is a shot boundary. For intervals determined not to contain a shot boundary, it performs no such per-gap determination. In other words, the shot boundary detection unit 10 uses the information on the predetermined frames to decide whether each interval contains a shot boundary, and searches for shot boundaries only within the intervals determined to contain one.

Having detected a shot boundary, the shot boundary detection unit 10 extracts the two frames immediately before and after the shot boundary as key frames. These two frames are hereinafter also called a key frame pair. The shot boundary detection unit 10 supplies the extracted key frame pair to the shot boundary feature amount extraction unit 20.

The operation of the shot boundary detection unit 10 is described in detail below with reference to FIGS. 2 to 4. The feature amounts used by the shot boundary detection unit 10 may be, for example, those described in Reference 1 below. For speed, however, this embodiment uses only the chi-square value between color histograms, which is regarded as the most accurate single feature.
(Reference 1) K. Matsumoto, M. Naito, K. Hoashi, and F. Sugaya, "SVM-Based Shot Boundary Detection With a Novel Feature," in Proc. of ICME '06, pp. 1837-1840, 2006.

As shown in the flowchart of FIG. 2, the shot boundary detection unit 10 first extracts the I frame of a GOP in the moving image content (reference content or query content) (step S10). For example, each time this step is executed, the shot boundary detection unit 10 extracts the next I frame in order from the beginning of the moving image content.

The shot boundary detection unit 10 determines whether a shot boundary exists in the GOP containing that I frame (step S20). The process of determining whether a shot boundary exists in a given GOP is hereinafter called the GOP-level shot boundary detection process.

FIG. 3(a) is a conceptual diagram showing how the feature amount used in the GOP-level shot boundary determination process is extracted. The shot boundary detection unit 10 extracts the feature amount from, for example, the N I frames before and the N I frames after the GOP under determination, that is, the GOP containing the I frame in question.

Specifically, the shot boundary detection unit 10 first divides each I frame equally into X × Y regions and extracts a color histogram from each region. Reference 1 above uses histograms in the Ohta color space, but this embodiment uses histograms in the YCbCr color space to avoid the computation needed for color space conversion. Next, the shot boundary detection unit 10 uses equation (1) below to calculate the chi-square value d_X as the inter-histogram distance between the same region of adjacent I frames.
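Equation (1) is not reproduced in this text. As a rough illustration only, the following sketch computes one commonly used form of the chi-square distance between two histograms; the exact normalization used in the patent may differ, and the function name `chi_square_distance` is hypothetical.

```python
from typing import Sequence


def chi_square_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Chi-square distance between two histograms with the same bins.

    Assumed form: sum over bins of (a - b)^2 / (a + b). This is a common
    definition, not necessarily the patent's equation (1).
    """
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    total = 0.0
    for a, b in zip(h1, h2):
        s = a + b
        if s > 0:  # bins that are empty in both histograms contribute nothing
            total += (a - b) ** 2 / s
    return total
```

In the setting described above, `h1` and `h2` would be the color histograms of the same region in two adjacent I frames.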

Next, as the feature amount used for SVM classification, the shot boundary detection unit 10 uses equation (2) below to calculate V_inter, which consists of the chi-square values d_X for all regions between all pairs of adjacent I frames.

Using this feature amount and content whose shot boundaries have been labeled in advance, the shot boundary detection unit 10 is trained beforehand, with features for which a shot boundary exists in the GOP containing the I frame as positive examples and all other features as negative examples. This training realizes the GOP-level shot boundary determination process.

When the GOP-level shot boundary detection process determines that a shot boundary exists in the GOP containing the I frame (step S20: Yes), the shot boundary detection unit 10 determines whether a given inter-frame gap in that GOP is a shot boundary (step S30). For example, each time this step is executed, the shot boundary detection unit 10 examines the next inter-frame gap, in order starting from the first gap of the GOP. The process of determining whether a given inter-frame gap is a shot boundary is hereinafter called the frame-level shot boundary detection process. In other words, the shot boundary detection unit 10 executes the frame-level shot boundary detection process only for GOPs that the GOP-level shot boundary detection process has determined to contain a shot boundary.

FIG. 3(b) is a conceptual diagram showing how the feature amount used in the frame-level shot boundary determination process is extracted. As shown in FIG. 3(b), this feature amount is almost the same as the one used in the GOP-level shot boundary detection process. However, when a frame from which the feature amount would be extracted lies outside the GOP under determination, the chi-square value d_X with respect to that frame is not actually computed, in order to avoid unnecessary decoding. Instead, the average of the chi-square values between frames that are not shot boundaries is used.
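The substitution rule for frames outside the GOP can be sketched as follows. This is an illustration under assumptions: the window of n frames on each side of the candidate gap, the argument `default_distance` (the precomputed average chi-square value for non-boundary frame pairs), and the function names are all hypothetical.

```python
def chi_square(h1, h2):
    """Assumed chi-square form: sum of (a - b)^2 / (a + b) over non-empty bins."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b > 0)


def frame_level_feature(gop_frames, k, n, default_distance):
    """Feature for the gap between gop_frames[k] and gop_frames[k + 1].

    gop_frames: decoded frames of one GOP, each a list of region histograms.
    The window covers the 2n - 1 adjacent frame pairs around the gap. Pairs
    that would need a frame outside this GOP are not computed; the
    precomputed default_distance is used for each of their regions instead.
    """
    regions = len(gop_frames[0])
    feature = []
    for t in range(k - n + 1, k + n):  # pair (t, t + 1)
        inside = 0 <= t and t + 1 < len(gop_frames)
        for r in range(regions):
            if inside:
                feature.append(chi_square(gop_frames[t][r], gop_frames[t + 1][r]))
            else:
                feature.append(default_distance)  # avoids decoding another GOP
    return feature
```

The design point is that the feature vector keeps a fixed length regardless of where the gap sits in the GOP, so the same classifier can be applied to every gap.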

When the frame-level shot boundary detection process determines that the inter-frame gap in the GOP is a shot boundary (step S30: Yes), the shot boundary detection unit 10 extracts the key frame pair before and after that gap (step S40). The shot boundary detection unit 10 temporarily stores the key frame pair in association with the time of the gap determined to be a shot boundary (for example, the time from the beginning of the content; hereinafter, "shot boundary time").

When the shot boundary detection unit 10 has determined that the inter-frame gap in the GOP is not a shot boundary (step S30: No), or has extracted a key frame pair (step S40), it checks whether the frame-level shot boundary determination process has been performed for all inter-frame gaps in the GOP (step S50). If it has not (step S50: No), the process returns to step S30, and the shot boundary detection unit 10 determines whether the next inter-frame gap in the GOP is a shot boundary (step S30).

When the shot boundary detection unit 10 has determined that no shot boundary exists in the GOP containing the I frame (step S20: No), or has determined that the frame-level shot boundary determination process has been performed for all inter-frame gaps in the GOP (step S50: Yes), it checks whether the GOP-level shot boundary determination process has been performed for all GOPs in the content (step S60). If it has not (step S60: No), the process returns to step S10, and the shot boundary detection unit 10 extracts the I frame of the next GOP in the content (step S10).

When the shot boundary detection unit 10 determines that the GOP-level shot boundary determination process has been performed for all GOPs in the content (step S60: Yes), it supplies the temporarily stored shot boundary times and key frame pairs, together with a content ID identifying the content, to the shot boundary feature amount extraction unit 20, and the flowchart ends.

As described above, shot boundary detection focuses on the GOP, the basic structure of compressed content, and executes the GOP-level shot boundary detection process before the frame-level shot boundary determination process. This reduces the time spent on decoding and other processing, and so speeds up shot boundary detection. Methods exist that detect shot boundaries quickly without decoding every frame by using the encoding information of the moving image content. However, because they rely on encoding information specific to a particular codec, they apply only to content compressed with that codec and are not general-purpose.
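The control flow of steps S10 to S60 can be sketched as a two-stage loop. This is a simplified illustration, not the patented implementation: `gop_has_boundary` and `is_boundary` are hypothetical callables standing in for the trained GOP-level and frame-level classifiers, and the sketch assumes that only gaps between frames of the same GOP are examined.

```python
def detect_shot_boundaries(gops, gop_has_boundary, is_boundary):
    """Two-stage shot boundary detection.

    gops: list of GOPs, each a list of frames.
    gop_has_boundary(g): True if GOP g is judged to contain a boundary,
        based only on I frames (GOP-level stage).
    is_boundary(g, k): True if the gap between frames k and k + 1 of GOP g
        is a boundary (frame-level stage, requires decoding GOP g).
    Returns a list of (gop_index, gap_index) pairs.
    """
    boundaries = []
    for g, gop in enumerate(gops):        # S10 / S60: visit every GOP in order
        if not gop_has_boundary(g):       # S20: cheap GOP-level check
            continue                      # skip decoding this GOP entirely
        for k in range(len(gop) - 1):     # S30 / S50: every gap in the GOP
            if is_boundary(g, k):
                boundaries.append((g, k))  # S40: key frame pair is (k, k + 1)
    return boundaries
```

The speed-up comes from the `continue`: GOPs rejected at the GOP-level stage never reach the frame-level classifier, so their non-I frames need not be decoded.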

The shot boundary feature amount extraction unit 20 acquires the content ID, the shot boundary times, and the key frame pairs from the shot boundary detection unit 10. It then extracts a feature amount (hereinafter, "shot boundary feature amount") from each acquired key frame pair.

Specifically, the shot boundary feature amount extraction unit 20 extracts the shot boundary feature amount based on the correlation between the two key frames. For example, it divides each key frame into a plurality of blocks, forms sets of a fixed number of blocks from these blocks, and extracts the shot boundary feature amount based on the correlation within those sets of blocks. For example, it extracts the shot boundary feature amount based on at least one of the magnitude relationships of the average luminance, the motion intensity, and the edge amount within each set of blocks.

After extracting the shot boundary feature amounts of a reference content, the shot boundary feature amount extraction unit 20 supplies the shot information for that reference content (content ID, shot boundary time, shot boundary feature amount) to the feature amount registration unit 30. After extracting the shot boundary feature amounts of a query content, it supplies the shot information for that query content (content ID, shot boundary time, shot boundary feature amount) to the feature amount matching unit 40. The shot boundary feature amount extraction unit 20 may decide whether a content is reference content or query content based on, for example, input from the user. For example, the moving image content detection apparatus may include a mode selection receiving unit (not shown) that accepts a mode selection from the user. When a registration mode, in which the shot boundary feature amounts of reference content are registered in the database 50, is accepted through the mode selection receiving unit, the shot boundary feature amount extraction unit 20 treats the content as reference content. When a matching mode, in which query content is matched against reference content, is accepted, it treats the content as query content.

The following describes an example in which the shot boundary feature amount extraction unit 20 extracts the shot boundary feature amount based on the magnitude relationship of the edge amounts within sets of blocks. First, the shot boundary feature amount extraction unit 20 divides each key frame into N × M regions. Next, it calculates the edge amount E(i, j) of each region using equation (3) or equation (4) below.

Next, as shown in FIG. 4, the shot boundary feature amount extraction unit 20 calculates the N × M-bit shot boundary feature amount B(i, j) given by equations (5) and (6) below. Here, E⁻(i, j) is the edge amount of the key frame before the shot boundary, and E⁺(i, j) is the edge amount of the key frame after the shot boundary.
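Equations (5) and (6) are not reproduced in this text, so the comparison rule in the following sketch is an assumption: each bit is set to 1 when the edge amount of a region in the preceding key frame exceeds that of the same region in the following key frame. The edge amounts themselves are taken as given inputs, and the function name `shot_boundary_bits` is hypothetical.

```python
def shot_boundary_bits(edge_before, edge_after):
    """Pack an N x M comparison of edge amounts into a single integer.

    edge_before[i][j]: edge amount of region (i, j) in the key frame before
    the shot boundary; edge_after[i][j]: the same for the key frame after it.
    Assumed rule: bit = 1 if edge_before > edge_after, else 0. Regions are
    scanned row by row, with the first region ending up in the highest bit.
    """
    bits = 0
    for row_before, row_after in zip(edge_before, edge_after):
        for e_before, e_after in zip(row_before, row_after):
            bits = (bits << 1) | (1 if e_before > e_after else 0)
    return bits
```

Storing the feature as one integer keeps it compact and lets two features be compared with a single XOR.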

As described above, the extraction step generates a shot boundary feature amount expressed as bits, so the distance between feature amounts can be computed quickly with XOR. It also reduces the cost of registering (storing) feature amounts in the database 50. Several methods exist for representing an image as a bit string, and their main challenge is robustness, that is, the property that the feature amount changes as little as possible when the image is modified in some way. With conventional methods, the feature amount changes substantially when a pattern such as a logo or caption is inserted. The method of this embodiment does not extract the feature amount from a single frame. Instead, it uses the correlation between the two key frames of a key frame pair, so it can extract feature amounts that are robust to a variety of modifications, including pattern insertion.
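As a small illustration of the XOR-based distance mentioned above, the following sketch computes the Hamming distance between two bit features stored as Python integers. The integer representation and the function name `hamming_distance` are assumptions made for the example.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two bit features differ."""
    return bin(a ^ b).count("1")  # XOR marks differing bits; count them
```

For example, `hamming_distance(0b1001, 0b1100)` is 2, because the two features differ in exactly two positions.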

なお、具体例として、ショット境界特徴量抽出部２０がブロックの組のエッジ量の大小関係を基にショット境界特徴量を抽出する例を説明したが、平均輝度、動き強度の大小関係を基に境界特徴量を抽出する場合も同様である。 As a specific example, the example in which the shot boundary feature amount extraction unit 20 extracts the shot boundary feature amount based on the magnitude relationship between the edge amounts of the block sets has been described, but based on the magnitude relationship between the average luminance and the motion intensity. The same applies to the case where the boundary feature amount is extracted.

特徴量登録部３０は、ショット境界特徴量抽出部２０から、リファレンスコンテンツに係るショット情報（コンテンツＩＤ、ショット境界時刻、ショット境界特徴量）を取得する。リファレンスコンテンツに係るショット情報を取得した特徴量登録部３０は、当該リファレンスコンテンツに係るショット情報をデータベース５０に登録（記憶）する。なお、特徴量登録部３０は、ショット境界特徴量のハッシュ値を算出し、当該ハッシュ値をハッシュキーとして、各ショット情報を複数のハッシュテーブルに記憶する。なお、ハッシングは、例えば、下記参考文献２に記載のLocal Sensitive Hashingを利用してもよい。 The feature amount registration unit 30 acquires shot information (content ID, shot boundary time, shot boundary feature amount) related to the reference content from the shot boundary feature amount extraction unit 20. Having acquired this shot information, the feature amount registration unit 30 registers (stores) it in the database 50. The feature amount registration unit 30 calculates a hash value of the shot boundary feature amount and, using that hash value as a hash key, stores each piece of shot information in a plurality of hash tables. For the hashing, for example, the Locality-Sensitive Hashing described in Reference 2 below may be used.
(Reference 2) Datar, M., N. Immorlica, P. Indyk and V. Mirrokni, “Locality-Sensitive Hashing Scheme Based on p-Stable Distributions,” Proceedings of the 20th Symposium on Computational Geometry, pp. 253-262, 2004.

以上説明した様に、特徴量の登録においては、各ショット情報は、ショット境界特徴量を基に、例えばLocal Sensitive Hashingによって複数のハッシュテーブルに登録されるため、探索処理が高速化する。 As described above, in registering feature amounts, each piece of shot information is registered in a plurality of hash tables based on its shot boundary feature amount, for example by Locality-Sensitive Hashing, which speeds up the search process.
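As a rough illustration of registering shot information in several hash tables, the sketch below uses bit sampling, a simple locality-sensitive scheme for Hamming space. This is a swapped-in technique chosen for brevity, not the p-stable scheme of Reference 2, and the class name `BitSamplingIndex` and its parameters (`n_tables`, `bits_per_key`, `seed`) are hypothetical.

```python
import random
from collections import defaultdict


class BitSamplingIndex:
    """Several hash tables, each keyed by a random subset of feature bits."""

    def __init__(self, n_bits, n_tables=4, bits_per_key=8, seed=0):
        rng = random.Random(seed)
        # One fixed random set of bit positions per table (assumed parameters).
        self.positions = [rng.sample(range(n_bits), bits_per_key)
                          for _ in range(n_tables)]
        self.tables = [defaultdict(list) for _ in range(n_tables)]

    def _key(self, feature, positions):
        return tuple((feature >> p) & 1 for p in positions)

    def register(self, feature, shot_info):
        """Store (feature, shot_info) under one key in every table."""
        for positions, table in zip(self.positions, self.tables):
            table[self._key(feature, positions)].append((feature, shot_info))

    def candidates(self, feature):
        """Return shot_info -> feature for entries sharing a key in any table."""
        found = {}
        for positions, table in zip(self.positions, self.tables):
            for ref_feature, shot_info in table.get(self._key(feature, positions), []):
                found[shot_info] = ref_feature
        return found


index = BitSamplingIndex(n_bits=16)
index.register(0b1010101010101010, ("A", 12.0))  # (content ID, boundary time)
hits = index.candidates(0b1010101010101010)
```

Features that differ in only a few bits are likely to share a key in at least one table, so only those candidates need an exact distance computation.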

データベース５０は、リファレンスコンテンツに係るショット情報（コンテンツＩＤ、ショット境界の時刻、ショット境界特徴量）が登録（記憶）されている。なお、上述の如く、各ショット情報は、ショット境界特徴量をハッシュキーとして、複数のハッシュテーブルに登録されている。 The database 50 registers (stores) shot information (content ID, shot boundary time, shot boundary feature amount) related to the reference content. As described above, each piece of shot information is registered in a plurality of hash tables using the shot boundary feature quantity as a hash key.

特徴量照合部４０は、ショット境界特徴量抽出部２０から、クエリコンテンツに係るショット情報（コンテンツＩＤ、ショット境界時刻、ショット境界特徴量）を取得する。クエリコンテンツに係るショット情報を取得した特徴量照合部４０は、当該クエリコンテンツに係るショット境界特徴量を、データベース５０に記憶されている複数のリファレンスコンテンツに係るショット境界特徴量と照合する。即ち、特徴量照合部４０は、クエリコンテンツのショット境界特徴量を取得した場合、リファレンスコンテンツのショット境界特徴量を予め登録しているデータベース５０を参照し、クエリコンテンツがリファレンスコンテンツの少なくとも一部をコピーしたものに該当するか否かを照合する。 The feature amount matching unit 40 acquires shot information (content ID, shot boundary time, shot boundary feature amount) related to the query content from the shot boundary feature amount extraction unit 20. The feature amount collation unit 40 that has acquired the shot information related to the query content collates the shot boundary feature amount related to the query content with the shot boundary feature amount related to a plurality of reference contents stored in the database 50. That is, when acquiring the shot boundary feature amount of the query content, the feature amount matching unit 40 refers to the database 50 in which the shot boundary feature amount of the reference content is registered in advance, and the query content includes at least a part of the reference content. Check whether it corresponds to the copied one.

具体的には、まず、特徴量照合部４０は、クエリコンテンツの各ショット境界のショット情報（ショット境界の時刻、ショット境界特徴量）を基に、クエリコンテンツと最も類似するリファレンスコンテンツの区間を推定し、当該区間の類似度と閾値とを利用してクエリコンテンツがリファレンスコンテンツのコピーであるか否かを判定する。類似するリファレンス区間の推定は、クエリコンテンツとリファレンスコンテンツのショット境界同士のマッチングを投票によって纏めることで実現する。また、当該区間の類似度は、当該区間の投票数に基づいて算出する。 Specifically, the feature amount matching unit 40 first estimates the reference content section most similar to the query content based on the shot information (shot boundary time, shot boundary feature amount) of each shot boundary of the query content. Then, it is determined whether the query content is a copy of the reference content using the similarity and threshold value of the section. The estimation of the similar reference section is realized by collecting matching between query content and reference content shot boundaries by voting. Further, the similarity of the section is calculated based on the number of votes in the section.

以下、図５を用いて特徴量照合部４０の動作を具体的に説明する。なお、データベース５０には、複数のリファレンスコンテンツに係るショット情報（コンテンツＩＤ、ショット境界時刻、ショット境界特徴量）が登録されているものとする。 Hereinafter, the operation of the feature amount matching unit 40 will be specifically described with reference to FIG. It is assumed that shot information (content ID, shot boundary time, shot boundary feature value) related to a plurality of reference contents is registered in the database 50.

特徴量照合部４０は、ショット境界特徴量抽出部２０から、図５（ａ）に示すクエリコンテンツＱに係るショット情報（コンテンツＩＤ、ショット境界時刻、ショット境界特徴量）を取得する。なお、ショット境界時刻ｔ_１はショット境界Ｑ_１のショット境界時刻、ショット境界時刻ｔ_２はショット境界Ｑ_２のショット境界時刻、ショット境界時刻ｔ_３はショット境界Ｑ_３のショット境界時刻であるものとする。 The feature amount matching unit 40 acquires shot information (content ID, shot boundary time, shot boundary feature amount) related to the query content Q shown in FIG. 5A from the shot boundary feature amount extraction unit 20. Here, the shot boundary time t₁ is that of the shot boundary Q₁, the shot boundary time t₂ is that of the shot boundary Q₂, and the shot boundary time t₃ is that of the shot boundary Q₃.

クエリコンテンツＱに係るショット情報を取得した特徴量照合部４０は、クエリコンテンツＱの各ショット境界Ｑ_１、Ｑ_２、Ｑ_３の各ショット特徴量と類似するショット特徴量を有するリファレンスコンテンツに係るショット境界をマッチングによってデータベース５０から検索する。具体的には、特徴量照合部４０は、クエリコンテンツとリファレンスコンテンツのショット特徴量同士の距離（類似度）が一定以下になるショット境界、または、当該ショット特徴量同士の距離が近い方から所定の個数のショット境界の何れかを類似するショット境界として検索する。 Having acquired the shot information related to the query content Q, the feature amount matching unit 40 searches the database 50 by matching for shot boundaries of reference content whose shot feature amounts are similar to those of the shot boundaries Q₁, Q₂, and Q₃ of the query content Q. Specifically, the feature amount matching unit 40 retrieves as similar shot boundaries either the shot boundaries for which the distance (similarity) between the shot feature amounts of the query content and the reference content is at or below a certain value, or a predetermined number of shot boundaries taken in ascending order of that distance.

具体的には、特徴量照合部４０は、上述のショット特徴量同士の距離は、例えば、クエリコンテンツに係るショット境界特徴量であるＮ×Ｍビット、リファレンスコンテンツに係るショット境界特徴量であるＮ×Ｍビットの同士のハミング距離として単純なビット操作によって高速に算出可能である。例えば、Ｎ×Ｍビットのクエリコンテンツに係るショット境界特徴量Ｂが下記式（７）、Ｎ×Ｍビットのリファレンスコンテンツに係るショット境界特徴量Ｂ’が下記式（８）によって表される場合、特徴量照合部４０は、Ｎ×Ｍビットのビット列（ＢＸＯＲＢ’）を生成し、当該ビット列（ＢＸＯＲＢ’）に含まれる１の個数をＢとＢ’のハミング距離として算出する。 Specifically, the feature amount matching unit 40 can calculate the distance between the above shot feature amounts at high speed by simple bit operations, for example as the Hamming distance between the N × M bits of the shot boundary feature amount of the query content and the N × M bits of the shot boundary feature amount of the reference content. For example, when the N × M-bit shot boundary feature amount B of the query content is expressed by the following equation (7) and the N × M-bit shot boundary feature amount B′ of the reference content by the following equation (8), the feature amount matching unit 40 generates the N × M-bit bit string (B XOR B′) and takes the number of 1s in that bit string as the Hamming distance between B and B′.
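The XOR-based distance and the two selection rules described above (a distance threshold, or a fixed number of nearest boundaries) can be sketched as follows. Features are assumed to be packed into Python integers, and the names `hamming_distance`, `similar_boundaries`, `max_distance`, and `top_k` are hypothetical.

```python
def hamming_distance(b, b_ref):
    """Number of 1 bits in (B XOR B')."""
    return bin(b ^ b_ref).count("1")


def similar_boundaries(query_feature, references, max_distance=None, top_k=None):
    """Select similar reference shot boundaries.

    references: list of (shot_info, feature) pairs.
    If max_distance is given, keep every boundary within that distance;
    otherwise keep the top_k closest boundaries.
    """
    scored = sorted((hamming_distance(query_feature, feature), info)
                    for info, feature in references)
    if max_distance is not None:
        return [(d, info) for d, info in scored if d <= max_distance]
    return scored[:top_k]


# Toy 4-bit features; shot_info is (content ID, shot boundary time).
refs = [(("A", 10.0), 0b1111), (("A", 25.0), 0b0000), (("B", 7.0), 0b1101)]
near = similar_boundaries(0b1110, refs, max_distance=2)
nearest = similar_boundaries(0b1110, refs, top_k=1)
```

This brute-force scan compares the query against every reference boundary; in practice the comparison would be limited to candidates returned by the hash tables.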

但し、特徴量照合部４０は、平均輝度、動き強度、エッジ量が近いブロックの組の大小関係については、上記距離の算出対象から除外してもよい。例えば、エッジ量が近いブロックの組の大小関係を上記距離の算出対象から除外する場合、クエリコンテンツのショット境界の前後の各キーフレームのブロック（ｉ，ｊ）のエッジ量をＥ⁻（ｉ，ｊ）およびＥ⁺（ｉ，ｊ）とするとき、｜Ｅ⁺（ｉ，ｊ）−Ｅ⁻（ｉ，ｊ）｜が小さい方から一定個のブロックに相当するビットに関しては距離算出に利用しないようにしてもよい。カイ二乗値ｄ_Ｘ（ｉ，ｊ）が小さいブロックは、Ｅ⁻（ｉ，ｊ）とＥ⁺（ｉ，ｊ）の大小関係が変わりやすいためビットの値の信頼性が低いためである。 However, the feature amount matching unit 40 may exclude from the distance calculation the magnitude relationships of block sets whose average luminance, motion intensity, or edge amounts are close. For example, when excluding the magnitude relationships of block sets with close edge amounts, let E⁻(i, j) and E⁺(i, j) be the edge amounts of block (i, j) in the key frames before and after a shot boundary of the query content; the bits corresponding to a fixed number of blocks, taken in ascending order of |E⁺(i, j) − E⁻(i, j)|, may then be left out of the distance calculation. This is because, in a block with a small chi-square value d_X(i, j), the magnitude relationship between E⁻(i, j) and E⁺(i, j) changes easily, so the value of its bit has low reliability.

なお、距離の算出に｜Ｅ⁺（ｉ，ｊ）−Ｅ⁻（ｉ，ｊ）｜が小さい方から一定個のブロックに相当するビットを利用しないときは、まず、特徴量照合部４０は、下記式（９）によって表されるマスク特徴量Ｈ（ｉ，ｊ）を作成する。そして、特徴量照合部４０は、上述のビット列（ＢＸＯＲＢ’）に代えて、下記式（１０）によって表されるＮ×Ｍビットのビット列Ｈを利用して、ビット列（（ＢＸＯＲＢ’）ＡＮＤＨ）を生成し、生成したビット列（（ＢＸＯＲＢ’）ＡＮＤＨ）に含まれる１の個数をＢとＢ’のハミング距離として算出する。 When the bits corresponding to a fixed number of blocks, taken in ascending order of |E⁺(i, j) − E⁻(i, j)|, are not used in the distance calculation, the feature amount matching unit 40 first creates the mask feature amount H(i, j) represented by the following equation (9). Then, instead of the bit string (B XOR B′) described above, the feature amount matching unit 40 uses the N × M-bit bit string H represented by the following equation (10) to generate the bit string ((B XOR B′) AND H), and takes the number of 1s in the generated bit string as the Hamming distance between B and B′.
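Equations (9) and (10) are not reproduced here, so the sketch below assumes that the mask H has a 0 for each of the blocks with the smallest |E⁺ − E⁻| and a 1 everywhere else, with bits stored in row-major order starting at bit 0. The names `reliability_mask`, `masked_hamming`, and `n_excluded` are hypothetical.

```python
def reliability_mask(edge_before, edge_after, n_excluded):
    """Build H: clear the bits of the n_excluded blocks with the smallest |E+ - E-|."""
    diffs = []
    for row_before, row_after in zip(edge_before, edge_after):
        for e_minus, e_plus in zip(row_before, row_after):
            diffs.append(abs(e_plus - e_minus))
    # Block indices sorted from least to most reliable.
    order = sorted(range(len(diffs)), key=lambda k: diffs[k])
    mask = (1 << len(diffs)) - 1  # start with every bit enabled
    for k in order[:n_excluded]:
        mask &= ~(1 << k)
    return mask


def masked_hamming(b, b_ref, mask):
    """Number of 1 bits in ((B XOR B') AND H)."""
    return bin((b ^ b_ref) & mask).count("1")


# Toy 2 x 2 query key frames; block 2 has identical edge amounts before and after.
edge_before = [[0.2, 0.9], [0.5, 0.1]]
edge_after = [[0.7, 0.3], [0.5, 0.4]]
mask = reliability_mask(edge_before, edge_after, n_excluded=1)
distance = masked_hamming(0b1001, 0b1101, mask)
```

In this toy case the two features differ only in the bit of the unreliable block, so the masked distance drops to zero while the unmasked Hamming distance would be 1.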

以上にようにして、特徴量照合部４０は、例えば、図５（ｂ）に示すように、リファレンスコンテンツＡの各ショット境界Ａ_１、Ａ_２、Ａ_３、Ａ_４、Ａ_５、リファレンスコンテンツＢの各ショット境界Ｂ_１、Ｂ_２、Ｂ_３を検索する。 In this way, the feature amount matching unit 40 retrieves, for example, the shot boundaries A₁, A₂, A₃, A₄, and A₅ of reference content A and the shot boundaries B₁, B₂, and B₃ of reference content B, as shown in FIG. 5B.

リファレンスコンテンツに係るショット境界を検索した特徴量照合部４０は、マッチングを行った全てのショット境界のペアに関して、（リファレンスコンテンツに係るショット境界の時刻−クエリコンテンツに係るショット境界の時刻）に投票を行う。当該投票は、コピー区間の先頭の推定である。即ち、当該投票は、図５（ｃ）に示すように、マッチングが正しければ、実際のコピー区間の先頭と推定される時刻に集中し、マッチングが正しくなければ、分散する。従って、特徴量照合部４０は、最も多くの投票が集中する時刻への投票数が閾値以上であるか否かを判定し、閾値以上であれば、当該時刻はコピー区間の先頭であると推定する。なお、最も多くの投票が集中する時刻はコピー区間の先頭であると推定した特徴量照合部４０は、当該クエリコンテンツは不正流通コンテンツであると推測した旨の情報を外部に出力する。また、特徴量照合部４０は、当該クエリコンテンツは不正流通コンテンツであると推測した旨の情報に代えてまたは加えて、例えば、クエリコンテンツおよびリファレンスコンテンツに係るショット情報、並びに、コピー区間の先頭位置を示す情報などを外部に出力してもよい。 Having retrieved the shot boundaries of the reference content, the feature amount matching unit 40 casts a vote for (time of the shot boundary of the reference content − time of the shot boundary of the query content) for every matched pair of shot boundaries. Each vote is an estimate of the start of the copied section. That is, as shown in FIG. 5C, the votes concentrate at the time estimated to be the actual start of the copied section when the matches are correct, and scatter when they are not. The feature amount matching unit 40 therefore determines whether the number of votes at the time where the most votes concentrate is at or above a threshold and, if so, estimates that this time is the start of the copied section. Having made that estimate, the feature amount matching unit 40 outputs to the outside information indicating that the query content is presumed to be illegally distributed content. Instead of or in addition to that information, the feature amount matching unit 40 may output, for example, the shot information of the query content and the reference content and information indicating the start position of the copied section.
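The voting step can be sketched as below. Two details are assumptions not stated in the text: votes are tallied separately per reference content ID, and time offsets are quantized into bins of width `bin_width` so that nearly equal offsets fall into the same bin. The function name `estimate_copy_start` is hypothetical.

```python
from collections import Counter


def estimate_copy_start(matches, vote_threshold, bin_width=1.0):
    """Vote on (reference time - query time) for each matched boundary pair.

    matches: iterable of (content_id, reference_time, query_time).
    Returns (content_id, estimated_start_time, votes) for the strongest peak,
    or None when the peak has fewer votes than vote_threshold.
    """
    votes = Counter()
    for content_id, ref_time, query_time in matches:
        offset_bin = round((ref_time - query_time) / bin_width)  # assumed quantization
        votes[(content_id, offset_bin)] += 1
    if not votes:
        return None
    (content_id, offset_bin), count = votes.most_common(1)[0]
    if count < vote_threshold:
        return None
    return content_id, offset_bin * bin_width, count


# Toy matches: three pairs agree on an offset of 60 s into reference content A,
# while the two pairs against reference content B disagree with each other.
matches = [
    ("A", 65.0, 5.0), ("A", 72.0, 12.0), ("A", 80.5, 20.5),
    ("B", 30.0, 12.0), ("B", 3.0, 5.0),
]
result = estimate_copy_start(matches, vote_threshold=3)
```

Correct matches all yield the same offset and so pile up in one bin, whereas false matches spread across many bins and rarely reach the threshold.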

以上説明したように、動画コンテンツ検出装置１は、クエリコンテンツの特徴量とリファレンスコンテンツの特徴量とを用いて、不正流通コンテンツと推測されるクエリコンテンツを検出する。動画コンテンツ検出装置１は、クエリコンテンツがリファレンスコンテンツの少なくとも一部を含む不正流通コンテンツであるか否かを判定する。 As described above, the moving image content detection apparatus 1 detects query content that is presumed to be illegally distributed content using the feature amount of the query content and the feature amount of the reference content. The moving image content detection apparatus 1 determines whether the query content is unauthorized distribution content including at least a part of the reference content.

動画コンテンツ検出装置１では、ショット境界からキーフレームを抽出するため、ノイズにロバストかつ冗長でないキーフレームが抽出される。また、ショット境界の前後２枚のキーフレームの相関を基に算出した特徴量を利用するため、コンテンツの変容（例えば、編集、改変、ノイズ）に対してロバストなマッチングが可能になる。また、コンテンツ内の全フレームのデコードを要しないため、高速に検出することができる。即ち、動画コンテンツ検出装置１によれば、著作権者等が自由配布を許諾していないコンテンツの一部分を切り出すなど時間軸方向の編集が行われた不正流通コンテンツや、全体が圧縮ノイズなどによって劣化した不正流通コンテンツであっても、精度よくかつ高速に検出することができるようになる。 In the moving image content detection apparatus 1, since key frames are extracted from shot boundaries, key frames that are robust against noise and not redundant are extracted. In addition, since the feature amount calculated based on the correlation between the two key frames before and after the shot boundary is used, robust matching can be performed with respect to the content change (for example, editing, modification, noise). In addition, since it is not necessary to decode all frames in the content, it can be detected at high speed. That is, according to the moving image content detection apparatus 1, illegally distributed content that has been edited in the time axis direction, such as cutting out a part of the content that the copyright holder or the like has not permitted to distribute freely, or the entire content is deteriorated due to compression noise or the like Even illegally distributed content can be detected accurately and at high speed.

なお、動画コンテンツ検出装置１は、図１（ａ）に示すように、ショット境界検出部１０、ショット境界特徴量抽出部２０および特徴量照合部４０に加え、特徴量登録部３０およびデータベース５０を備える例を説明したが、当該構成に限定されない。例えば、動画コンテンツ検出装置１は、図１（ｂ）に示すように、ショット境界検出部１０、ショット境界特徴量抽出部２０および特徴量照合部４０のみを備えてもよい。動画コンテンツ検出装置１は、図１（ｂ）に示す構成の場合、上述のショット境界検出部１０、ショット境界特徴量抽出部２０および特徴量登録部３０を備える外部の装置によってリファレンスコンテンツに係るショット情報が登録されている外部のデータベース５０を参照し、不正流通コンテンツと推測されるクエリコンテンツを検出する。 As shown in FIG. 1A, the moving image content detection apparatus 1 includes a feature amount registration unit 30 and a database 50 in addition to the shot boundary detection unit 10, the shot boundary feature amount extraction unit 20, and the feature amount comparison unit 40. Although the example provided is demonstrated, it is not limited to the said structure. For example, the moving image content detection apparatus 1 may include only the shot boundary detection unit 10, the shot boundary feature amount extraction unit 20, and the feature amount comparison unit 40 as illustrated in FIG. In the case of the configuration shown in FIG. 1B, the moving image content detection device 1 is a shot related to reference content by an external device including the above-described shot boundary detection unit 10, shot boundary feature amount extraction unit 20, and feature amount registration unit 30. Referring to the external database 50 in which information is registered, query content that is presumed to be illegally distributed content is detected.

なお、本発明の一実施形態による動画コンテンツ検出装置１の各処理を実行するためのプログラムをコンピュータ読み取り可能な記録媒体に記録して、当該記録媒体に記録されたプログラムをコンピュータシステムに読み込ませ、実行することにより、本発明の一実施形態による動画コンテンツ検出装置１の各処理に係る上述した種々の処理を行ってもよい。なお、ここでいう「コンピュータシステム」とは、ＯＳや周辺機器等のハードウェアを含むものであってもよい。また、「コンピュータシステム」は、ＷＷＷシステムを利用している場合であれば、ホームページ提供環境（あるいは表示環境）も含むものとする。また、「コンピュータ読み取り可能な記録媒体」とは、フレキシブルディスク、光磁気ディスク、ＲＯＭ、フラッシュメモリ等の書き込み可能な不揮発性メモリ、ＣＤ−ＲＯＭ等の可搬媒体、コンピュータシステムに内蔵されるハードディスク等の記憶装置のことをいう。 Note that a program for executing each process of the moving image content detection apparatus 1 according to an embodiment of the present invention is recorded on a computer-readable recording medium, and the program recorded on the recording medium is read by a computer system. By executing, the above-described various processes related to each process of the moving image content detection apparatus 1 according to the embodiment of the present invention may be performed. Here, the “computer system” may include an OS and hardware such as peripheral devices. Further, the “computer system” includes a homepage providing environment (or display environment) if a WWW system is used. The “computer-readable recording medium” means a flexible disk, a magneto-optical disk, a ROM, a writable nonvolatile memory such as a flash memory, a portable medium such as a CD-ROM, a hard disk built in a computer system, etc. This is a storage device.

さらに「コンピュータ読み取り可能な記録媒体」とは、インターネット等のネットワークや電話回線等の通信回線を介してプログラムが送信された場合のサーバやクライアントとなるコンピュータシステム内部の揮発性メモリ（例えばＤＲＡＭ（Dynamic Random Access Memory））のように、一定時間プログラムを保持しているものも含むものとする。また、上記プログラムは、このプログラムを記憶装置等に格納したコンピュータシステムから、伝送媒体を介して、あるいは、伝送媒体中の伝送波により他のコンピュータシステムに伝送されてもよい。ここで、プログラムを伝送する「伝送媒体」は、インターネット等のネットワーク（通信網）や電話回線等の通信回線（通信線）のように情報を伝送する機能を有する媒体のことをいう。また、上記プログラムは、前述した機能の一部を実現するためのものであっても良い。さらに、前述した機能をコンピュータシステムにすでに記録されているプログラムとの組み合わせで実現できるもの、いわゆる差分ファイル（差分プログラム）であっても良い。 Further, the “computer-readable recording medium” also includes media that hold a program for a certain period of time, such as a volatile memory (for example, DRAM (Dynamic Random Access Memory)) inside a computer system that serves as a server or a client when the program is transmitted through a network such as the Internet or a communication line such as a telephone line. The program may be transmitted from a computer system that stores it in a storage device or the like to another computer system via a transmission medium or by a transmission wave in the transmission medium. Here, the “transmission medium” that transmits the program refers to a medium having a function of transmitting information, such as a network (communication network) like the Internet or a communication line (communication wire) like a telephone line. The program may realize only a part of the functions described above. Furthermore, the program may be a so-called difference file (difference program), which realizes the functions described above in combination with a program already recorded in the computer system.

以上、この発明の実施形態について図面を参照して詳述してきたが、具体的な構成はこの実施形態に限られるものではなく、この発明の要旨を逸脱しない範囲の設計等も含まれる。 The embodiment of the present invention has been described in detail with reference to the drawings. However, the specific configuration is not limited to this embodiment, and includes designs and the like that do not depart from the gist of the present invention.

１動画コンテンツ検出装置１０ショット境界検出部２０ショット境界特徴量抽出部３０特徴量登録部４０特徴量照合部５０データベース（記憶部） DESCRIPTION OF SYMBOLS: 1 moving image content detection apparatus; 10 shot boundary detection unit; 20 shot boundary feature amount extraction unit; 30 feature amount registration unit; 40 feature amount matching unit; 50 database (storage unit)

Claims

A moving image content detection apparatus comprising: a shot boundary detection unit that detects shot boundaries of moving image content;
a shot boundary feature amount extraction unit that extracts feature amounts from the frames before and after each shot boundary detected by the shot boundary detection unit; and
a feature amount matching unit that matches the shot boundary feature amount, which is a feature amount related to one moving image content and is extracted by the shot boundary feature amount extraction unit, against the shot boundary feature amounts related to a plurality of moving image contents stored in a storage unit.

The moving image content detection apparatus according to claim 1, wherein the shot boundary detection unit
determines, using information of predetermined frames that are among the frames constituting the moving image content and exist at fixed intervals, whether a shot boundary exists within each fixed interval, and detects the shot boundaries of the moving image content by determining, for each frame within a fixed interval in which a shot boundary is determined to exist, whether that frame is a shot boundary.

The predetermined frame is:
3. The designated moving image content detection apparatus according to claim 2, wherein the specified moving image content detection apparatus is a frame that can be decoded without referring to other frames among the frames constituting the compressed moving image content.

The shot boundary feature amount extraction unit
4. The designated moving image content detection apparatus according to claim 2, wherein the shot boundary feature amount is extracted based on a correlation between frames before and after the shot boundary.

The shot boundary feature amount extraction unit
The frame before and after the shot boundary is divided into a plurality of blocks, a set of a predetermined number of blocks is created from the plurality of blocks, and the shot boundary feature amount is extracted based on the correlation of the block sets. The designated moving image content detection apparatus according to claim 2, wherein the moving image content detection apparatus is a specified moving image content detection apparatus.

The shot boundary feature amount extraction unit
Each of the frames before and after the shot boundary is divided into a plurality of blocks, a set of a certain number of blocks is created from the plurality of blocks, and at least one of the magnitude relationships of the average luminance, motion intensity, and edge amount of the block set The specified moving image content detection apparatus according to claim 2, wherein the shot boundary feature amount is extracted based on.

The feature amount matching unit
A distance between the shot boundary feature amount related to the one moving image content and the shot boundary feature amount related to the moving image content stored in the storage unit is calculated, and the one moving image content and the storage are calculated based on the distance Check the video content stored in the
4. The designated moving image content detection apparatus according to claim 2, wherein the distance calculation does not use a magnitude relationship between a set of blocks having close average brightness, motion intensity, and edge amount.