JP2010033532A

JP2010033532A - Electronic apparatus, motion vector detection method and program thereof

Info

Publication number: JP2010033532A
Application number: JP2008260784A
Authority: JP
Inventors: Noboru Murabayashi; Hiroshige Okamoto
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2008-06-26
Filing date: 2008-10-07
Publication date: 2010-02-12

Abstract

PROBLEM TO BE SOLVED: To prevent false detection of motion vectors as much as possible.

SOLUTION: A video feature detection unit 4 of a recording/reproducing device 100 sequentially stores input thumbnail images at 2-, 4-, 8-, and 16-field intervals. It detects the motion vector of each block by block matching between a base field and a reference field at each field interval, determines the validity of each detected motion vector using a discriminant function, normalizes each motion vector relative to the motion vector in the 2-field interval, and combines the normalized valid motion vectors for each block to generate a composite motion vector. The composite motion vector is used for camera feature detection processing.

COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to an electronic device capable of detecting motion vectors in video content, a motion vector detection method for the electronic device, and a program therefor.

In recent years, the amount of video content stored in electronic devices such as HDD (Hard Disk Drive)/BD (Blu-ray Disc) recorders has grown steadily, driven by larger recording media and more diverse content. To let users view such large amounts of video content efficiently, digest (highlight) scenes are extracted from the content and played back, or chapters are added to the content so that it can be played back chapter by chapter. Digest scene extraction and chapter generation are performed, for example, by detecting the motion features of the camera that captured the video content (hereinafter, camera features), such as pan, tilt, and zoom, and extracting characteristic sections of the video content based on those features.

Such camera features can be detected, for instance, from motion vectors in the video content. Patent Document 1, for example, discloses a technique for detecting motion vectors by block matching, in which the sum of absolute differences is calculated between a base block cut out of the current frame and a search block cut out of a reference frame one frame after the current frame.

Patent Document 1: JP 2003-299040 A (FIG. 3, etc.)

However, when motion vectors are detected between frames (fields), fast motion in the video content, for example, can cause the displacement to exceed the search range, so that the detection process cannot track it and the motion vector is not detected correctly. When such an erroneous motion vector is used in camera feature detection, camera features are falsely detected.

In view of the circumstances described above, an object of the present invention is to provide an electronic device, a motion vector detection method, and a program that can prevent false detection of motion vectors as much as possible.

In order to solve the above-described problem, an electronic device according to an embodiment of the present invention includes input means, detection means, storage means, discrimination means, normalization means, and generation means.
The input means receives, for each predetermined block, the following pieces of image data out of the plurality of pieces of image data constituting video data: first image data; second image data separated from the first image data by a first time length; and third image data separated from the first image data by a second time length longer than the first time length.
The detection means detects a first motion vector between a predetermined base block in the first image data and a first reference block in the second image data, and a second motion vector between the base block and a second reference block in the third image data.
The storage means stores discrimination data for determining whether each of the detected first and second motion vectors is valid.
The discrimination means uses the stored discrimination data to determine, for each block, whether each of the detected first and second motion vectors is valid.
The normalization means normalizes the detected first and second motion vectors with respect to time length, for each block.
The generation means generates a combined motion vector by combining, for each block, the first or second motion vector that has been determined to be valid and normalized.
Here, the electronic device is an electrical appliance such as a recording/reproducing device (for example, an HDD/DVD/BD recorder), a television set, a PC (Personal Computer), a game console, a mobile phone, or a portable audio/visual device. The first time length is, for example, 2, 4, or 8 field times (or frame times; the same applies hereinafter), and the second time length is, for example, 4, 8, or 16 field times, although they are not limited to these values. The above configuration is not limited to detecting and combining two kinds of motion vectors at two different time lengths; it also covers detecting and combining three or more kinds of motion vectors at three or more different time lengths. The input means may be a single piece of hardware to which all of the first to third image data are input, or several different pieces of hardware to each of which one of the first to third image data is input. The predetermined block is a region obtained by dividing each piece of image data into a matrix, for example 8 × 8.
With this configuration, the validity of the first and second motion vectors detected between the first image data and the second and third image data is determined, and the motion vectors are normalized and then combined to generate a combined motion vector. Because the electronic device generates the combined motion vector using only valid motion vectors detected between fields (frames) separated by different time lengths, it can improve the detection accuracy of motion vectors in video data. That is, fast and slow motion in the video data is appropriately detected between fields (frames) separated by different time lengths and its validity is determined, so false detection can be minimized. The combined motion vector can be used, for example, to determine camera features such as pan, tilt, and zoom in the video data, or for object coding of video data.

The normalization means may normalize the first motion vector to match the second time length.
In this case, the second motion vector is left as it is, and the first motion vector is scaled according to the ratio between the first and second time lengths. This widens the dynamic range of the detected first motion vector and improves its resolution. It also avoids the loss of resolution that normalizing the second motion vector down to the first time length would cause. The accuracy of the combined motion vector generated from the individual motion vectors is therefore improved.
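As a minimal sketch of this normalization, assuming constant-velocity motion so that displacement scales linearly with the field interval, a vector measured over 2 fields is scaled to an 8-field interval by the factor 8/2 = 4. The `MotionVector` type and `normalize_to_interval` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotionVector:
    dx: float      # horizontal displacement in pixels
    dy: float      # vertical displacement in pixels
    interval: int  # number of fields over which the displacement was measured


def normalize_to_interval(mv: MotionVector, target_interval: int) -> MotionVector:
    """Rescale a vector measured over mv.interval fields so that it expresses
    the displacement over target_interval fields (assumes constant velocity)."""
    scale = target_interval / mv.interval
    return MotionVector(mv.dx * scale, mv.dy * scale, target_interval)
```

For example, `normalize_to_interval(MotionVector(3, -1, 2), 8)` yields a displacement of (12.0, -4.0) over 8 fields, while a vector already measured over 8 fields is returned unchanged in magnitude.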

When both the first and second motion vectors of a block are determined to be valid, the generation means may use the first motion vector to generate the combined motion vector.
The first motion vector is preferred over the second because the longer the detection time length, the larger the displacement produced by fast motion in the video data, and the more likely it is that the motion leaves the search range and cannot be tracked. That is, since the first motion vector is more reliable than the second for detecting fast motion, the electronic device adopts the first motion vector when both are determined to be valid. This improves detection accuracy, particularly for fast motion.
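The per-block selection rule can be sketched as follows, assuming both vectors have already been normalized to a common time length and that "combining" means assembling the per-block choices into one vector field. The function name and the "first"/"second" tags are hypothetical; the tags simply record which interval each block's vector came from.

```python
from typing import List, Optional, Sequence, Tuple

Vector = Tuple[float, float]
Tagged = Optional[Tuple[str, Vector]]  # ("first" or "second", vector), or None


def combine_block_vectors(
    first: Sequence[Vector],
    first_valid: Sequence[bool],
    second: Sequence[Vector],
    second_valid: Sequence[bool],
) -> List[Tagged]:
    """Use the normalized first motion vector whenever it is valid, fall back
    to the second one when only it is valid, and mark the block as empty
    (None) when neither is valid."""
    combined: List[Tagged] = []
    for v1, ok1, v2, ok2 in zip(first, first_valid, second, second_valid):
        if ok1:
            combined.append(("first", v1))
        elif ok2:
            combined.append(("second", v2))
        else:
            combined.append(None)
    return combined
```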

The storage means may store a plurality of pieces of learning video data, together with a plurality of pieces of learning motion vector data, corresponding to the first and second time lengths respectively, that should be detected from that learning video data.
In this case, the electronic device may further include learning means that detects, from the stored learning video data, motion vectors between the base block and the reference block at each of the first and second time lengths, learns whether each detected motion vector is valid by comparing it with the stored learning motion vector data, and generates the discrimination data.
In this way, the discrimination data can be generated by learning, from the learning video data, whether detected motion vectors are valid.
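One way to turn this comparison into training labels is sketched below, assuming a detection counts as valid when it matches the stored learning motion vector within a small per-axis tolerance. The tolerance parameter and function names are hypothetical; the patent does not state the matching criterion.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[int, int]


def is_correct_detection(detected: Vector, truth: Vector, tolerance: int = 0) -> bool:
    """A detection is labelled valid if it matches the ground-truth learning
    vector within `tolerance` pixels on both axes (tolerance is assumed)."""
    return (abs(detected[0] - truth[0]) <= tolerance
            and abs(detected[1] - truth[1]) <= tolerance)


def make_training_labels(detected: Sequence[Vector], truth: Sequence[Vector],
                         tolerance: int = 0) -> List[bool]:
    """Label each detected vector as valid (True) or invalid (False)."""
    return [is_correct_detection(d, t, tolerance) for d, t in zip(detected, truth)]
```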

The learning means may generate a discriminant function as the discrimination data by performing discriminant analysis using the motion vector detected from the learning video data, the residual signal between the base block and the reference block, the mean and variance of the pixel data of the base block, and the mean and variance of the pixel data of the reference block.
In this configuration, the discriminant function is generated from the means and variances of the base and reference blocks, in addition to the motion vector detected from the learning video data and the residual signal between the base and reference blocks. Because the validity of the first and second motion vectors is then judged by a discriminant function learned from the learning video data, it can be determined more accurately.
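The seven inputs listed above can be gathered into one feature vector per detection, as in this sketch. It assumes the residual signal is summarized as the sum of absolute differences (the measure used in the prior-art block matching) and uses population variance; both choices, and the function names, are assumptions.

```python
from statistics import fmean, pvariance
from typing import List, Sequence

Block = Sequence[Sequence[float]]  # 2-D block of pixel values, e.g. 8x8


def flatten(block: Block) -> List[float]:
    return [p for row in block for p in row]


def block_features(base: Block, ref: Block, mv: Sequence[float]) -> List[float]:
    """Feature vector: motion vector (x, y), residual (SAD between blocks),
    then mean and variance of the base block and of the reference block."""
    b, r = flatten(base), flatten(ref)
    residual = sum(abs(x - y) for x, y in zip(b, r))
    return [mv[0], mv[1], residual, fmean(b), pvariance(b), fmean(r), pvariance(r)]
```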

In this case, a discriminant function for the first motion vector detected at the first time length and a discriminant function for the second motion vector detected at the second time length may be generated separately. For example, the first and second motion vectors may have the same amount of motion but different motion speeds. This produces differences in the residual signal, the means, and the variances, yet a discriminant function shared by the first and second motion vectors may fail to reflect this difference in motion speed in its result. Generating the discriminant functions separately, as described above, allows the validity of motion vectors to be determined more accurately, reflecting differences in the motion speed of the video.
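The patent does not name a particular discriminant analysis, so the sketch below swaps in Fisher's linear discriminant as one plausible choice, trained separately for each field interval as the paragraph above suggests. The ridge term, the midpoint threshold, and all function names are assumptions; both classes must contain at least one sample.

```python
from statistics import fmean
from typing import List, Sequence, Tuple

Features = Sequence[float]
Model = Tuple[List[float], float]  # (weights, threshold)


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _project(w: Sequence[float], v: Features) -> float:
    return sum(wi * vi for wi, vi in zip(w, v))


def fit_fisher_lda(X: Sequence[Features], y: Sequence[bool], ridge: float = 1e-6) -> Model:
    """w = Sw^-1 (mean_valid - mean_invalid); threshold midway between the
    projected class means, so valid samples project above it."""
    pos = [x for x, label in zip(X, y) if label]
    neg = [x for x, label in zip(X, y) if not label]
    d = len(X[0])
    mu_p = [fmean(col) for col in zip(*pos)]
    mu_n = [fmean(col) for col in zip(*neg)]
    # Within-class scatter matrix plus a small ridge to keep it invertible.
    sw = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for group, mu in ((pos, mu_p), (neg, mu_n)):
        for x in group:
            diff = [x[i] - mu[i] for i in range(d)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j]
    w = _solve(sw, [p - q for p, q in zip(mu_p, mu_n)])
    return w, (_project(w, mu_p) + _project(w, mu_n)) / 2


def is_valid(model: Model, x: Features) -> bool:
    w, threshold = model
    return _project(w, x) > threshold
```

Keeping one model per interval, for example `{2: fit_fisher_lda(X2, y2), 8: fit_fisher_lda(X8, y8)}`, lets each time length learn its own decision boundary instead of sharing one.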

When the detected first motion vector is determined to be valid, the detection means may use the first reference block at which the first motion vector was detected as the base block, detect a third motion vector between it and the second reference block, and obtain the second motion vector by adding the detected third motion vector to the first motion vector.
Because the search distance is shorter than when detecting a motion vector directly between the base block in the first image data and the second reference block, the second reference block can be found more quickly, which improves detection efficiency.
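The chained search can be sketched with exhaustive SAD block matching. This assumes grayscale frames stored as lists of rows and a square search window of ±`search` pixels; the validity check on the first vector that gates this step is assumed to have already passed, and all names are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[int]]


def sad(base: Frame, ref: Frame, bx: int, by: int, rx: int, ry: int, size: int) -> int:
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(base[by + j][bx + i] - ref[ry + j][rx + i])
               for j in range(size) for i in range(size))


def block_match(base: Frame, ref: Frame, bx: int, by: int, size: int,
                search: int) -> Tuple[int, int]:
    """Return the (dx, dy) within +/-search that minimizes SAD for the block
    at (bx, by) in `base`, skipping candidates that fall outside `ref`."""
    h, w = len(ref), len(ref[0])
    best, best_cost = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = bx + dx, by + dy
            if 0 <= rx <= w - size and 0 <= ry <= h - size:
                cost = sad(base, ref, bx, by, rx, ry, size)
                if best_cost is None or cost < best_cost:
                    best, best_cost = (dx, dy), cost
    return best


def chained_second_vector(f0: Frame, f_short: Frame, f_long: Frame,
                          bx: int, by: int, size: int, search: int):
    """Detect v1 (f0 to f_short), then v3 from the matched block in f_short
    to f_long, and return (v1, v2) with v2 = v1 + v3."""
    v1 = block_match(f0, f_short, bx, by, size, search)
    v3 = block_match(f_short, f_long, bx + v1[0], by + v1[1], size, search)
    return v1, (v1[0] + v3[0], v1[1] + v3[1])
```

With a ±3 search range, a direct match from `f0` to `f_long` cannot reach a displacement of (4, 2), but the chained search finds it as (2, 1) + (2, 1).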

The electronic device may further include determination means for determining, based on the generated combined motion vector, camera features in the video data that result from camera operation.
Because it is based on a more accurate combined motion vector, camera features in the video data can be determined more accurately. Camera features here are, for example, pan, tilt, and zoom.
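The patent does not specify at this point how camera features are derived from the combined motion vector. Purely as an illustration, the sketch assumes a common simple model in which each block's vector is v = (pan + zoom·x, tilt + zoom·y), with (x, y) the block centre relative to the frame centre, fitted by least squares over the blocks that have a valid vector. The thresholds and labels are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def estimate_camera_motion(centers: Sequence[Point],
                           vectors: Sequence[Optional[Point]]) -> Tuple[float, float, float]:
    """Least-squares fit of v = (pan + zoom*x, tilt + zoom*y); blocks whose
    vector is None (no valid vector) are ignored."""
    pts = [(c, v) for c, v in zip(centers, vectors) if v is not None]
    if not pts:
        return 0.0, 0.0, 0.0
    n = len(pts)
    mx = sum(c[0] for c, _ in pts) / n
    my = sum(c[1] for c, _ in pts) / n
    mvx = sum(v[0] for _, v in pts) / n
    mvy = sum(v[1] for _, v in pts) / n
    num = sum((c[0] - mx) * v[0] + (c[1] - my) * v[1] for c, v in pts)
    den = sum((c[0] - mx) ** 2 + (c[1] - my) ** 2 for c, _ in pts)
    zoom = num / den if den else 0.0
    return mvx - zoom * mx, mvy - zoom * my, zoom


def classify(pan: float, tilt: float, zoom: float,
             move_thr: float = 1.0, zoom_thr: float = 0.05) -> str:
    """Vectors pointing outward from the centre (zoom > 0) read as zoom-in."""
    if abs(zoom) > zoom_thr:
        return "zoom in" if zoom > 0 else "zoom out"
    if max(abs(pan), abs(tilt)) > move_thr:
        return "pan" if abs(pan) >= abs(tilt) else "tilt"
    return "static"
```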

When the combined motion vector is composed of the first motion vectors of the blocks, the determination means may continue the camera feature determination for a first determination time; when it is composed of the second motion vectors of the blocks, the determination means may continue the determination for a second determination time longer than the first.
With this configuration, the determination time is varied according to the time length over which the motion vectors underlying the combined motion vector were detected, which improves the detection accuracy of camera features. Because the first motion vector has a shorter detection time length than the second, it is considered to come from relatively fast-moving video, whereas the second motion vector is considered to come from relatively slow-moving video. Camera features can be detected quickly in fast-moving video, so the determination continues only for the first determination time; they cannot be detected quickly in slow-moving video, so the determination continues for the second determination time. This reduces false detection of camera features.

Further, when the combined motion vector contains both first and second motion vectors of the blocks, the determination means may continue the camera feature determination for the first or the second determination time, depending on which kind of motion vector makes up more of the combined motion vector.
Thus, even when the combined motion vector mixes first and second motion vectors, the determination time can be set adaptively according to which kind is more numerous.
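The duration rule in the two preceding paragraphs can be sketched as a majority vote over the per-block source tags, followed by a persistence check over the most recent field-by-field judgments. The tie-breaking toward the first determination time, the concrete durations, and the function names are assumptions.

```python
from collections import Counter
from typing import Optional, Sequence


def judgment_duration(sources: Sequence[Optional[str]],
                      first_duration: int, second_duration: int) -> int:
    """Choose the determination time (in fields) from which kind of vector
    ("first" or "second") dominates the combined motion vector; blocks
    without a valid vector (None) are ignored and ties go to "first"."""
    counts = Counter(s for s in sources if s is not None)
    return first_duration if counts["first"] >= counts["second"] else second_duration


def feature_confirmed(history: Sequence[str], feature: str, duration: int) -> bool:
    """True if `feature` was judged in each of the last `duration` fields."""
    return len(history) >= duration and all(h == feature for h in history[-duration:])
```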

The electronic device may further include first learning video data generation means.
The first learning video data generation means moves a rectangular region within predetermined learning image data over the first and second time lengths and extracts the image data inside the rectangular region over each of those time lengths, thereby generating the learning video data needed to learn whether each movement detected as the first and second motion vectors is valid.
Thus, by moving the rectangular region within the learning image data, the electronic device can synthetically create learning video data for learning the validity of motion vectors that represent vertical and horizontal movement. Compared with creating such learning video data by hand, learning video data with less variation in characteristics can be collected efficiently. The electronic device can therefore reduce the effort of collection and improve the accuracy of the learning process.
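A minimal sketch of this synthetic translation data, assuming integer per-field steps and a grayscale learning image stored as a list of rows. Moving the crop window by +s pixels shifts the content inside the crop by -s, so the ground-truth vector for an interval of T fields is (-sx·T, -sy·T). The function name is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Image = List[List[int]]


def crop(image: Image, x: int, y: int, w: int, h: int) -> Image:
    return [row[x:x + w] for row in image[y:y + h]]


def make_translation_clip(image: Image, x0: int, y0: int, w: int, h: int,
                          step: Tuple[int, int], intervals: Sequence[int]
                          ) -> Tuple[List[Image], Dict[int, Tuple[int, int]]]:
    """Crop a w x h window that moves `step` pixels per field, returning one
    crop per field and the ground-truth motion vector for each interval."""
    sx, sy = step
    last = max(intervals)
    for t in (0, last):  # the path is linear, so checking its ends suffices
        x, y = x0 + sx * t, y0 + sy * t
        if x < 0 or y < 0 or x + w > len(image[0]) or y + h > len(image):
            raise ValueError("window leaves the learning image")
    clip = [crop(image, x0 + sx * t, y0 + sy * t, w, h) for t in range(last + 1)]
    truth = {t: (-sx * t, -sy * t) for t in intervals}
    return clip, truth
```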

The electronic device may further include second learning video data generation means.
The second learning video data generation means changes the image within a first rectangular region having a first area in predetermined learning image data into the image within a second rectangular region having a second area different from the first, over each of the first and second time lengths, and extracts the result, thereby generating the learning video data needed to learn whether each enlargement or reduction detected as the first and second motion vectors is valid.
Thus, by gradually changing the image in the first rectangular region into the image in the second rectangular region, the electronic device can synthetically create learning video data for learning the validity of motion vectors that represent enlargement or reduction. Compared with creating such learning video data by hand, learning video data with less variation in characteristics can be collected efficiently. The electronic device can therefore reduce the effort of collection and improve the accuracy of the learning process.
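The zoom case can be sketched by shrinking or growing a crop window around a fixed centre and resampling every crop to a common output size. This simplifies the two rectangles to squares, interpolates the side length linearly, and uses nearest-neighbour resampling to stay dependency-free; all of these, and the function names, are assumptions.

```python
from typing import List, Tuple

Image = List[List[int]]


def resize_nearest(block: Image, out_w: int, out_h: int) -> Image:
    """Nearest-neighbour resampling of a 2-D block."""
    in_h, in_w = len(block), len(block[0])
    return [[block[j * in_h // out_h][i * in_w // out_w] for i in range(out_w)]
            for j in range(out_h)]


def make_zoom_clip(image: Image, cx: int, cy: int, side_start: int, side_end: int,
                   n_fields: int, out_size: int) -> Tuple[List[Image], List[float]]:
    """Crop a square window centred on (cx, cy) whose side changes linearly
    from side_start to side_end over n_fields fields, resizing each crop to
    out_size x out_size. Also returns each field's magnification relative to
    field 0 (greater than 1 reads as zoom-in)."""
    clip, scales = [], []
    for t in range(n_fields + 1):
        side = round(side_start + (side_end - side_start) * t / n_fields)
        x, y = cx - side // 2, cy - side // 2
        if x < 0 or y < 0 or x + side > len(image[0]) or y + side > len(image):
            raise ValueError("window leaves the learning image")
        window = [row[x:x + side] for row in image[y:y + side]]
        clip.append(resize_nearest(window, out_size, out_size))
        scales.append(side_start / side)
    return clip, scales
```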

The first or second learning video data generation means may calculate the sum of the per-block residual signals between consecutive first and second sample image data in predetermined sample video data, and extract the second sample image data as the learning image data when that sum is equal to or greater than a predetermined threshold.
Second sample image data for which the sum of the residual signals is equal to or greater than the threshold is a so-called cut point in the sample video data. The electronic device can therefore extract a variety of image data from different scenes of a single piece of sample video data as learning image data, further reducing the effort of collecting suitable learning image data.
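Cut-point extraction can be sketched as follows, assuming the per-block residual is the SAD, so that summing it over a full tiling of blocks equals the SAD of the whole frame. The threshold value and function names are hypothetical.

```python
from typing import List, Sequence

Frame = Sequence[Sequence[int]]


def residual_sum(prev: Frame, cur: Frame) -> int:
    """Sum of per-block SAD residuals; over a full block tiling this equals
    the SAD of the whole frame."""
    return sum(abs(a - b) for ra, rb in zip(prev, cur) for a, b in zip(ra, rb))


def extract_cut_frames(frames: Sequence[Frame], threshold: int) -> List[int]:
    """Indices of frames whose residual against the previous frame reaches
    the threshold; these frames become learning image data."""
    return [i for i in range(1, len(frames))
            if residual_sum(frames[i - 1], frames[i]) >= threshold]
```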

In this case, the first or second learning video data generation means may extract the learning image data using, as the sample video data, the very video data from which the first and second motion vectors are to be detected.
By using the video data from which the first and second motion vectors are actually detected as the sample video data, the electronic device can generate learning video data that yields motion vectors closer to those first and second motion vectors, and can therefore judge the validity of the first and second motion vectors more reliably. For example, when the electronic device creates a digest video based on motion vectors detected from the video data, it may, upon receiving a user instruction to create the digest video, first extract learning image data from that video data and execute the learning process.

A motion vector detection method according to another aspect of the present invention includes inputting, for each predetermined block, the following pieces of image data out of the plurality of pieces of image data constituting video data: first image data; second image data separated from the first image data by a first time length; and third image data separated from the first image data by a second time length longer than the first time length.
A first motion vector is detected between a predetermined base block in the input first image data and a first reference block in the input second image data.
A second motion vector is detected between the base block and a second reference block in the input third image data.
Discrimination data for determining whether each of the detected first and second motion vectors is valid is stored.
Using the stored discrimination data, whether each of the detected first and second motion vectors is valid is determined for each block.
The detected first and second motion vectors are each normalized with respect to time length, for each block.
The first or second motion vectors of the blocks that have been determined to be valid and normalized are combined to generate a combined motion vector.
With this configuration, a combined motion vector can be generated using only valid motion vectors detected between fields separated by different time lengths, which improves the detection accuracy of motion vectors in video data.

In this motion vector detection method, when the combined motion vector is composed of the first motion vectors of the blocks, camera features in the video data that result from camera operation are determined based on the combined motion vector, continuously for a first determination time.
When the combined motion vector is composed of the second motion vectors of the blocks, the camera features are determined based on the combined motion vector, continuously for a second determination time longer than the first determination time.
With this configuration, varying the determination time according to the time length over which the underlying motion vectors were detected improves the detection accuracy of camera features.

A program according to still another aspect of the present invention causes an electronic device to execute an input step, a detection step, a storage step, a discrimination step, a normalization step, and a generation step.
The input step inputs, for each predetermined block, the following pieces of image data out of the plurality of pieces of image data constituting video data: first image data; second image data separated from the first image data by a first time length; and third image data separated from the first image data by a second time length longer than the first time length.
The detection step detects a first motion vector between a predetermined base block in the first image data and a first reference block in the second image data, and a second motion vector between the base block and a second reference block in the third image data.
The storage step stores discrimination data for determining whether each of the detected first and second motion vectors is valid.
The discrimination step uses the stored discrimination data to determine, for each block, whether each of the detected first and second motion vectors is valid.
The normalization step normalizes the detected first and second motion vectors with respect to time length, for each block.
The generation step generates a combined motion vector by combining the first or second motion vectors of the blocks that have been determined to be valid and normalized.

The program may further cause the electronic device to execute a determination step.
In the determination step, when the combined motion vector is composed of the first motion vectors of the blocks, camera features in the video data that result from camera operation are determined based on the combined motion vector, continuously for a first determination time.
In the determination step, when the combined motion vector is composed of the second motion vectors of the blocks, the camera features are determined based on the combined motion vector, continuously for a second determination time longer than the first determination time.

As described above, the present invention makes it possible to prevent false detection of motion vectors as much as possible.

Embodiments of the present invention are described below with reference to the drawings.

[Hardware configuration of the recording/reproducing apparatus]
FIG. 1 shows the configuration of a recording/reproducing apparatus 100 according to an embodiment of the present invention.
As shown in FIG. 1, the recording/reproducing apparatus 100 includes a CPU (Central Processing Unit) 1, a RAM (Random Access Memory) 2, an operation input unit 3, a video feature detection unit 4, a digital tuner 5, an IEEE1394 interface 6, an Ethernet (registered trademark)/wireless LAN (Local Area Network) interface 7, a USB (Universal Serial Bus) interface 8, a memory card interface 9, an HDD 10, an optical disc drive 11, a buffer controller 13, a selector 14, a demultiplexer 15, an AV (Audio/Video) decoder 16, an OSD (On Screen Display) 17, a video D/A (Digital/Analog) converter 18, and an audio D/A converter 19.

The CPU 1 accesses the RAM 2 and other components as needed and controls all the blocks of the recording/reproducing apparatus 100. The RAM 2 is a memory used as a work area of the CPU 1; it temporarily holds the OS (Operating System), programs, processing data, and the like.

The operation input unit 3 consists of buttons, switches, keys, a touch panel, a receiver for infrared signals sent from a remote controller (not shown), and the like. It receives the setting values and commands entered by user operations and outputs them to the CPU 1.

Under the control of the CPU 1, the digital tuner 5 receives the broadcast signals of digital broadcast programs via an antenna (not shown) and selects and demodulates the broadcast signal of a specific channel. This broadcast signal is output via the selector 14 to the demultiplexer 15 for playback, or is recorded via the buffer controller 13 on the HDD 10 or on an optical disk 12 inserted in the optical disk drive 11.

The IEEE 1394 interface 6 can be connected to an external device such as a digital video camera. Video content shot and recorded with a digital video camera, for example, can be played back or recorded on the HDD 10 or the optical disk 12 in the same way as the video content of a broadcast program received by the digital tuner 5.

The Ethernet (registered trademark)/wireless LAN interface 7 receives, via Ethernet or a wireless LAN, video content recorded on, for example, a PC or another recording/reproducing apparatus. This video content can also be played back or recorded on the HDD 10 or the optical disk 12.

The USB interface 8 receives video content via USB from a device such as a digital camera or from an external storage device such as a USB memory. This video content can also be played back or recorded on the HDD 10 or the optical disk 12.

The memory card interface 9 connects to, for example, a memory card with built-in flash memory and receives the video content recorded on that card. This video content can also be played back or recorded on the HDD 10 or the optical disk 12.

The HDD 10 records on its built-in hard disk the video content received as broadcast signals or input from external devices, as well as various programs and data, and during playback reads them from the hard disk and outputs them to the buffer controller 13. The HDD 10 also stores the OS; a program for detecting motion vectors and camera features from video content, described later; programs and data for executing the motion vector learning processing and discriminant analysis; and various other programs and data.

The recording/reproducing apparatus 100 may store the OS, programs, and data on another recording medium such as a flash memory (not shown) instead of the HDD 10. These programs may also be recorded on a recording medium such as the optical disk 12 or a memory card and installed in the recording/reproducing apparatus 100 by means of the optical disk drive 11.

The optical disk drive 11 records the video content and other data on the optical disk 12 and, during playback, reads them out and outputs them to the buffer controller 13. The optical disk 12 is, for example, a DVD, a BD, or a CD.

The buffer controller 13 controls the timing and amount of data when video content supplied continuously from, for example, the digital tuner 5 or the various interfaces is written to the HDD 10 or the optical disk 12, and writes that content intermittently. The buffer controller 13 also controls the timing and amount of data when video content recorded on the HDD 10 or the optical disk 12 is read, and supplies the intermittently read content continuously to the demultiplexer 15.

Based on a control signal from the CPU 1, the selector 14 selects the video content input from one of the digital tuner 5, the various interfaces, the HDD 10, and the optical disk drive 11.

The demultiplexer 15 separates the multiplexed video content input from the buffer controller 13 into a video signal and an audio signal and outputs them to the AV decoder 16.

The AV decoder 16 decodes the video and audio signals, which are encoded in a format such as MPEG (Moving Picture Experts Group)-2 or MPEG-4, and outputs the video signal to the OSD 17 and the audio signal to the audio D/A converter 19.

The OSD 17 generates graphics and the like for display on a display (not shown), superimposes them on or switches them with the video signal, and outputs the processed video signal to the video D/A converter 18. The video D/A converter 18 converts the graphics-processed video signal from the OSD 17 into an analog NTSC (National Television System Committee) signal and outputs it for display on the display (not shown).

The audio D/A converter 19 converts the audio signal input from the AV decoder 16 from digital to analog and outputs it to a speaker (not shown) for playback.

The video feature detection unit 4 detects video features from the video signal either before or after decoding by the AV decoder 16.

[Description of video features]
FIGS. 2 and 3 illustrate these video features.
FIG. 2(A) shows video captured while the camera is moved (panned) to the left or right as the sequence progresses from scene S1 to scene S6. FIG. 2(B) shows video captured while the camera zooms in as the sequence progresses from scene S1 to scene S6. In this embodiment, video features produced by camera work such as panning, tilting (not shown), and zooming are called camera features.

FIG. 3(A) shows the scene switching at a cut point fa between scenes S3 and S4. FIG. 3(B) shows one scene gradually fading out from scene S1 to scene S3 and another scene gradually fading in from scene S4 to scene S6. In this embodiment, video features produced by video editing, such as cut and fade effects, are called video editing features.

The video feature detection unit 4 detects both these camera features and these video editing features with a common signal processing system, described later. The recording/reproducing apparatus 100 uses the detected video features for processing such as highlight (digest) scene generation and chapter generation.

[Configuration of the video feature detection unit]
FIG. 4 is a block diagram showing the specific configuration of the video feature detection unit 4. Each block described below may be implemented in hardware or in software.
As shown in FIG. 4, the video feature detection unit 4 includes a block processing unit 21; a 2-field interval memory unit 22, a 4-field interval memory unit 23, an 8-field interval memory unit 24, and a 16-field interval memory unit 25; matching processing units 26, 28, 30, and 32, one for each memory unit; fade/cut processing units 27, 29, 31, and 33; a motion vector processing unit 34; a camera feature determination unit 36; and a fade/cut determination unit 35.

The block processing unit 21 receives the image data of the video content in field-number order. This image data is, for example, the image data of a thumbnail video generated by reducing the image size of the original video content. The thumbnail image size is, for example, 96×128, although it is not limited to this. Using thumbnails reduces the load of the video feature detection processing. Based on this image data, the block processing unit 21 sets the search region in which motion vectors are to be detected and extracts the blocks that serve as the basis of the block matching processing (hereinafter, base blocks).

The thumbnail image data input to the block processing unit 21 is, for example, the baseband image data (specifically, the luminance signal Y and the color difference signals Cb and Cr) of each field of the thumbnail video content decoded by the AV decoder 16. The video content played back by the recording/reproducing apparatus 100 may mix data in various formats, such as MPEG, digitally recorded DV (Digital Video), and analog-recorded VHS (Video Home System) or 8 mm. By processing in the baseband, the recording/reproducing apparatus 100 can extract video features from all of this content with a signal processing system that is shared as far as possible.

The 2/4/8/16-field interval memory units 22 to 25 store the image data of the fields separated by 2, 4, 8, and 16 fields, respectively, from the field from which the base blocks are extracted (hereinafter, the base field). The field intervals are, of course, not limited to these values. The memory units 22 to 25 may be provided as separate memory elements, one per interval, or as separate storage areas within a single memory element.

The matching processing units 26, 28, 30, and 32 each perform block matching between the search region of the base field input from the block processing unit 21 and that of the field input from the corresponding memory unit 22 to 25 (hereinafter, the reference field). The results of the block matching are output to the motion vector processing unit 34. Each matching processing unit moves a block with the same shape as the base block (hereinafter, the reference block) within the reference field and searches for the position at which the similarity between the base block and the reference block is highest. It then outputs to the motion vector processing unit 34 the motion vector from the base position to the position found, that is, the amount and direction of movement in the x (horizontal) and y (vertical) directions. Each matching processing unit also outputs the Y, Cb, and Cr residual signals between the base block and the reference block to the corresponding fade/cut processing unit 27, 29, 31, or 33. These processes are described in detail later.

The fade/cut processing units 27, 29, 31, and 33 generate fade/cut evaluation values from the post-matching residual signals input from the matching processing units 26, 28, 30, and 32, respectively, and output them to the fade/cut determination unit 35. This processing is also described in detail later.

Alternatively, the fade/cut processing units 27, 29, 31, and 33 may calculate on their own the residual between the base block input from the block processing unit 21 and each reference block used in the block matching, as input from the memory units 22 to 25.

The motion vector processing unit 34 determines whether each motion vector obtained by block matching in the matching processing units 26, 28, 30, and 32 is valid (plausible) and generates a combined motion vector from the valid motion vectors. The combined motion vector is output to the camera feature determination unit 36. This processing is also described in detail later.

Based on the combined motion vector input from the motion vector processing unit 34, the camera feature determination unit 36 determines each camera feature in the video content by multiple regression analysis using an affine transformation model, described later, and outputs the results to the CPU 1. This processing is also described in detail later.

Based on the fade/cut evaluation values input from the fade/cut processing units 27, 29, 31, and 33, the fade/cut determination unit 35 determines the fade and cut video editing features in the video content and outputs the results to the CPU 1.

[Operation of the recording/reproducing apparatus]
Next, the operation of the recording/reproducing apparatus 100 configured as above is described. All of the operations described below are executed under the control of the CPU 1, regardless of which component performs them. Each operation may be performed by hardware, or by software (a program) working together with the hardware.

The recording/reproducing apparatus 100 of this embodiment detects two kinds of video features: camera features and video editing features. Motion vectors are used to detect the camera features. As described above, the motion vector processing unit 34 determines whether the motion vectors detected by the matching processing units 26, 28, 30, and 32 are valid, and it makes this determination with a discriminant function. The processing that generates this discriminant function is described below.

(Discriminant function generation processing)
FIG. 5 is a flowchart showing the flow of the discriminant function generation processing.
As shown in FIG. 5, the CPU 1 of the recording/reproducing apparatus 100 first inputs various learning contents stored on a storage medium such as the HDD 10 to the video feature detection unit 4 (step 51). The learning contents include both video content from which motion vectors can be detected reliably and video content from which they cannot.

Video content from which motion vectors can be detected reliably is content in which motion feature quantities are easy to obtain when motion vectors are detected from the subject. Specifically, the motion feature quantities are feature data (hereinafter, discrimination parameters) such as the residual between the base block and the reference block, the mean and variance of the base block, and the mean and variance of the reference block. An example of such content is video of a crisply defined subject, such as a building or a person.
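As an illustration, the five discrimination parameters listed above could be computed from one base/reference block pair as in the following Python sketch. It is a sketch under stated assumptions, not the patent's implementation: blocks are taken to be lists of rows of luminance values, the residual is assumed to be the sum of absolute differences (the text does not name the measure), and the function name `discrimination_parameters` is hypothetical.

```python
from statistics import fmean, pvariance


def discrimination_parameters(base_block, ref_block):
    """Return (x1, x2, x3, x4, x5) for one base/reference block pair.

    Assumptions (not from the source): blocks are lists of rows of
    luminance values, and the residual x1 is the sum of absolute
    differences between corresponding pixels.
    x2, x3: mean and population variance of the base block
    x4, x5: mean and population variance of the reference block
    """
    base = [p for row in base_block for p in row]
    ref = [p for row in ref_block for p in row]
    if len(base) != len(ref):
        raise ValueError("blocks must have the same shape")
    residual = sum(abs(b - r) for b, r in zip(base, ref))
    return (residual, fmean(base), pvariance(base), fmean(ref), pvariance(ref))


# Usage: a small 2x2 example
print(discrimination_parameters([[10, 20], [10, 20]], [[12, 20], [10, 18]]))
```

A flat, blurred block yields a small variance for both blocks, which is one way the "hard to detect" content described next would show up in these parameters.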

Video content from which motion vectors cannot be detected reliably is content in which these motion feature quantities (discrimination parameters) are hard to obtain when motion vectors are detected from the subject. An example is blurry video in which the subject lacks definition.

For both kinds of content, the learning contents also include video with different motion speeds, such as fast motion and slow motion.

The learning contents are generated, for example, by the designer of the recording/reproducing apparatus 100 actually shooting subjects with a video camera while moving it horizontally (pan) or vertically (tilt), or zooming in or out, over a predetermined time.

For example, to generate learning content with a motion time of 2 fields (1/30 second) and a motion amount of 10 pixels, the designer can measure with a ruler or similar tool how many centimeters on the video camera's monitor correspond to how many pixels. From this measurement, the designer can generate learning content with the desired amount of motion. Because a motion of 10 pixels over 2 fields (1/30 second) is hard to measure, the designer may instead enlarge the time scale and measure, for example, a motion of 300 pixels over 60 fields (1 second).
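The scaling in this example is proportional arithmetic at constant speed: 10 pixels in 2 fields is 5 pixels per field, so 60 fields (1 second) corresponds to 5 × 60 = 300 pixels. The small sketch below assumes 60 fields per second, as implied by "2 fields (1/30 second)"; the helper names are hypothetical.

```python
from fractions import Fraction

FIELDS_PER_SECOND = 60  # assumption implied by "2 fields (1/30 second)"


def fields_to_seconds(fields):
    """Convert a field count to seconds as an exact fraction."""
    return Fraction(fields, FIELDS_PER_SECOND)


def equivalent_displacement(pixels, fields, target_fields):
    """Displacement over target_fields at the same constant speed."""
    speed = Fraction(pixels, fields)  # pixels per field
    return speed * target_fields


# Usage
print(fields_to_seconds(2))                # 1/30
print(equivalent_displacement(10, 2, 60))  # 300
```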

Of the learning contents shot in this way, those with a crisply defined subject are content from which motion vectors can be detected reliably, and those without are content from which they cannot.

Next, the CPU 1 executes learning processing using the learning contents generated as described above (step 52). Specifically, the CPU 1 first has the matching processing units 26, 28, 30, and 32 perform block matching on the learning contents to detect motion vectors, and then has the motion vector processing unit 34 obtain the discrimination parameters.

The designer then compares the motion amount of each motion vector detected from the learning content with the motion amount that should have been detected, which is known in advance from when the content was generated, judges whether the motion vector is valid or invalid, and inputs the judgment to the recording/reproducing apparatus 100. As a result, motion vectors detected from learning content with a crisply defined subject are judged valid, and those detected from content without one are judged invalid.
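In the text this judgment is made by the designer. Where the intended camera motion of each learning clip is known, the same comparison could be scripted roughly as below. The Euclidean distance, the default 1-pixel tolerance, and the name `label_motion_vector` are all assumptions, not details from the source.

```python
import math


def label_motion_vector(detected, expected, tolerance=1.0):
    """Label a detected (dx, dy) vector valid if it lies within
    `tolerance` pixels (Euclidean distance, an assumption) of the
    motion known from when the learning clip was shot."""
    return math.dist(detected, expected) <= tolerance


# Usage
print(label_motion_vector((10, 0), (10, 0)))  # True
print(label_motion_vector((13, 4), (10, 0)))  # False: 5 pixels off
```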

Next, the CPU 1 has the motion vector processing unit 34 perform discriminant analysis, using as learning data the input valid/invalid judgments and the discrimination parameters obtained when each motion vector was detected (step 53). The discriminant analysis is, for example, linear discriminant analysis.

Based on the result of the discriminant analysis, the CPU 1 then determines a discriminant function for judging whether a motion vector is valid or invalid (step 54). For linear discriminant analysis, this discriminant function is expressed, for example, as follows:
$$y = a_1 x_1 + a_2 x_2 + a_3 x_3 + a_4 x_4 + a_5 x_5$$
In this discriminant function, the variables $x_1$ to $x_5$ correspond respectively to the discrimination parameters: the residual between the base block and the reference block, the mean and variance of the base block, and the mean and variance of the reference block. The discriminant analysis is thus the process of using these discrimination parameters to obtain the coefficients $a_1$ to $a_5$ of the variables $x_1$ to $x_5$.
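Linear discriminant analysis can be carried out in several ways. One common choice is Fisher's method, in which the coefficient vector is $a = S_W^{-1}(m_1 - m_0)$, where $S_W$ is the within-class scatter matrix and $m_1$, $m_0$ are the mean parameter vectors of the valid and invalid samples. The pure-Python sketch below follows that method; it is not necessarily the analysis the patent uses. Because the function above has no constant term, the decision threshold (here the midpoint of the projected class means) is kept separate. The threshold rule, the small ridge term, and all function names are assumptions.

```python
from statistics import fmean


def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def score(coeffs, x):
    """Discriminant value y = a_1 x_1 + ... + a_d x_d."""
    return sum(a * xi for a, xi in zip(coeffs, x))


def fisher_discriminant(valid, invalid, ridge=1e-6):
    """Return (coefficients, threshold) from lists of parameter tuples."""
    d = len(valid[0])
    m1 = [fmean(s[i] for s in valid) for i in range(d)]
    m0 = [fmean(s[i] for s in invalid) for i in range(d)]
    sw = [[0.0] * d for _ in range(d)]  # within-class scatter matrix
    for samples, mean in ((valid, m1), (invalid, m0)):
        for s in samples:
            diff = [s[i] - mean[i] for i in range(d)]
            for i in range(d):
                for j in range(d):
                    sw[i][j] += diff[i] * diff[j]
    for i in range(d):
        sw[i][i] += ridge  # keeps Sw invertible for degenerate data
    coeffs = solve_linear(sw, [a - b for a, b in zip(m1, m0)])
    # Threshold halfway between the projected class means (assumption).
    threshold = (score(coeffs, m1) + score(coeffs, m0)) / 2
    return coeffs, threshold


def is_valid(coeffs, threshold, x):
    """Judge a motion vector valid when its discriminant value exceeds the threshold."""
    return score(coeffs, x) > threshold
```

In the apparatus, each training sample would be the five-element tuple $(x_1, \dots, x_5)$ recorded for one motion vector, grouped by the designer's valid/invalid label; the two-element data in a quick check simply keeps the arithmetic easy to follow.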

For example, a single discriminant function common to the 2/4/8/16-field matching processing units 26, 28, 30, and 32 is generated. Alternatively, four different discriminant functions may be generated, one suited to the matching processing of each matching processing unit 26, 28, 30, and 32.

In that case, learning contents with different field times (motion times) are generated so that each of the matching processing units 26, 28, 30, and 32 can detect motion vectors. The discrimination parameters are likewise obtained separately for the motion vectors detected by each matching processing unit and used in the discriminant analysis that yields the corresponding discriminant function.

Providing a discriminant function for each of the matching processing units 26, 28, 30, and 32 allows the CPU 1 to judge motion vector validity appropriately for video content with different motion speeds, even when the amount of motion is the same. For the same amount of motion, fast motion (that is, a short detection field time) tends to blur the image, so the discrimination parameters differ from those for slow motion. Giving each matching processing unit, with its own detection field time, a separate discriminant function lets these differences be reflected in the validity judgment. The generated discriminant functions are stored in, for example, the HDD 10 or the RAM 2 (step 55).

(Overview of video feature detection processing)
Next, the recording/reproducing apparatus 100 proceeds to detecting video features from video content stored on the HDD 10 or the like. The video features are detected from the valid motion vectors identified in each video content with the discriminant function determined above.

FIG. 6 is a flowchart showing the general flow of the video feature detection processing performed by the video feature detection unit 4.
As shown in FIG. 6, the video feature detection unit 4 first sequentially inputs the images that make up the thumbnail video of video content stored on the HDD 10 or the like (hereinafter, thumbnail images) (step 61). Next, it performs block matching between the input thumbnail images to detect motion vectors (step 62). It then estimates the camera features in the video content from the detected motion vectors, and estimates the video editing features from fade/cut evaluation values described later (step 63). Finally, it outputs the estimated video features (step 64).
The processing of each step is described in detail below, beginning with the motion vector detection processing.

(Motion vector detection processing)
FIG. 7 is a flowchart showing the flow of the motion vector detection processing in this embodiment. FIG. 8 is a functional block diagram of the motion vector detection processing. FIG. 9 conceptually illustrates how the combined motion vector is generated.
As shown in FIGS. 7 and 8, when the video feature detection unit 4 receives the thumbnail image data from the CPU 1 (step 71), the block processing unit 21 performs block processing on it (step 72). Specifically, the block processing unit 21 first sets one piece of the input thumbnail image data as the base field. In the base field, it then sets a base image region, which defines the image range subject to motion vector detection, and a search region, which defines the range searched for motion vectors (the reference range). It then divides the base image region into, for example, 8 × 8 = 64 blocks. Each of these blocks is extracted as a base block in the subsequent block matching.
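The division into an 8 × 8 grid can be sketched as follows. For simplicity the sketch assumes the whole 96×128 thumbnail (read here as 96 rows by 128 columns, an assumption) is the base image region, which gives 64 blocks of 12 × 16 pixels; a real implementation would likely leave a margin for the search region. The function name `split_into_blocks` is hypothetical.

```python
def split_into_blocks(image, grid_rows=8, grid_cols=8):
    """Split a 2-D image (list of rows) into grid_rows x grid_cols equal blocks.

    Returns a dict mapping (block_row, block_col) -> block (list of rows).
    Assumes the image dimensions are divisible by the grid size.
    """
    height, width = len(image), len(image[0])
    if height % grid_rows or width % grid_cols:
        raise ValueError("image size must be divisible by the grid")
    bh, bw = height // grid_rows, width // grid_cols
    blocks = {}
    for br in range(grid_rows):
        for bc in range(grid_cols):
            blocks[(br, bc)] = [row[bc * bw:(bc + 1) * bw]
                                for row in image[br * bh:(br + 1) * bh]]
    return blocks


# Usage: a 96x128 test image whose pixel value encodes its position
image = [[r * 128 + c for c in range(128)] for r in range(96)]
blocks = split_into_blocks(image)
print(len(blocks), len(blocks[(0, 0)]), len(blocks[(0, 0)][0]))  # 64 12 16
```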

Next, as shown in FIG. 7, the video feature detection unit 4 stores the sequentially input, block-processed thumbnail images as reference fields in the 2/4/8/16-field interval memory units 22 to 25 (step 73). As described above, the memory units 22 to 25 may be implemented with a single memory element. In that case, as shown in FIG. 9, copies of the thumbnail images buffered in the single memory element are delayed by 2, 4, 8, and 16 fields, respectively, and supplied to the matching processing units 26, 28, 30, and 32. This provides the same function as implementing the memory units 22 to 25 as separate memory elements.
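A single buffer serving several delays can be sketched with a bounded deque: keeping the last 17 fields is enough to hand out copies delayed by 2, 4, 8, and 16 fields. The class name and the pairing of each new field with older ones are illustrative assumptions, not the patent's memory layout.

```python
from collections import deque


class MultiIntervalFieldBuffer:
    """One buffer that supplies fields delayed by several intervals."""

    def __init__(self, intervals=(2, 4, 8, 16)):
        self.intervals = tuple(intervals)
        # Keep just enough history for the longest delay plus the newest field.
        self._fields = deque(maxlen=max(self.intervals) + 1)

    def push(self, field):
        """Store the newest field and return {interval: field delayed by interval}
        for every interval whose delayed field is already available."""
        self._fields.append(field)
        newest = len(self._fields) - 1
        return {k: self._fields[newest - k]
                for k in self.intervals if newest - k >= 0}


# Usage: feed field numbers 0..19
buf = MultiIntervalFieldBuffer()
for n in range(20):
    delayed = buf.push(n)
print(delayed)  # {2: 17, 4: 15, 8: 11, 16: 3}
```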

Next, the video feature detection unit 4 has the matching processing units 26, 28, 30, and 32 perform multistage block matching between the base field and the reference fields input from the memory units 22 to 25 (step 74). Specifically, the matching processing units detect by pattern matching where each base block extracted from the base field has moved to within each reference field stored in the memory units 22 to 25.

In this process, the matching processing units 26, 28, 30 and 32 shift the reference block set in the reference field one pixel at a time within the search region of the reference field. They detect the position at which the vector distance over the data components (Y, Cb and Cr) of the base block is smallest. The amount and direction of movement from the base block to that detected position (the reference block) are taken as the motion vector. This block matching is executed for every block (for example, 64 blocks) of the base field. Along with the motion vector, the matching processing units 26, 28, 30 and 32 also calculate the residual remaining after matching the base block against the reference block. The residuals are output to the fade/cut processing units 27, 29, 31 and 33.
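The full search described above can be sketched as a plain-Python block matcher. The text does not define its "vector distance", so this sketch assumes the L1 (sum of absolute differences) distance over the Y, Cb and Cr components. The function name and the field layout (2-D lists of `(Y, Cb, Cr)` tuples) are hypothetical.

```python
def block_match(base, ref, top, left, size, radius):
    """Full-search block matching for one base block (hypothetical sketch).

    base, ref: fields as 2-D lists of (Y, Cb, Cr) tuples, indexed [row][col].
    The base block covers rows top..top+size-1 and cols left..left+size-1.
    Returns ((dx, dy), residual) for the candidate with the smallest distance.
    """
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            r0, c0 = top + dy, left + dx
            # Skip candidate positions whose block would leave the reference field.
            if r0 < 0 or c0 < 0 or r0 + size > h or c0 + size > w:
                continue
            dist = 0
            for i in range(size):
                for j in range(size):
                    a = base[top + i][left + j]
                    b = ref[r0 + i][c0 + j]
                    # Assumed distance: L1 over the Y, Cb and Cr components.
                    dist += sum(abs(p - q) for p, q in zip(a, b))
            if best is None or dist < best[1]:
                best = ((dx, dy), dist)
    return best
```

The returned minimum distance doubles as the per-block residual that the text forwards to the fade/cut processing units.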

Next, the video feature detection unit 4 performs discriminant analysis on the detected motion vectors, using the discriminant function generated earlier (step 76). As shown in FIG. 9, this analysis is applied to each of the motion vectors detected per block at the 2/4/8/16-field intervals: 1a to 64a, 1b to 64b, 1c to 64c and 1d to 64d (not shown). Five discrimination parameters associated with each motion vector are input as the variables of the discriminant function:

- the residual between the base block and the reference block;
- the mean and variance of the base block;
- the mean and variance of the reference block.

For convenience of explanation, FIG. 9 does not show the processing of the 16-field-interval motion vectors.

Next, the video feature detection unit 4 checks whether the output of the discriminant function is positive or negative (step 77). If the output is positive, the unit judges the motion vector to be valid (step 78); if it is negative, it judges the motion vector to be invalid (step 79). The sign convention may of course be reversed.
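The text names the inputs of the discriminant function but not its form. The sketch below makes three assumptions:

- It is a linear discriminant over the five parameters.
- The `weights` and `bias` values are hypothetical.
- A zero output counts as invalid, a case the text does not specify.

```python
from statistics import fmean, pvariance


def block_stats(pixels):
    """Mean and population variance of a block's (luma) values."""
    return fmean(pixels), pvariance(pixels)


def is_valid_vector(residual, base_pixels, ref_pixels, weights, bias):
    """Evaluate an assumed linear discriminant on the five parameters.

    weights/bias are hypothetical; the actual function is generated beforehand.
    """
    base_mean, base_var = block_stats(base_pixels)
    ref_mean, ref_var = block_stats(ref_pixels)
    features = (residual, base_mean, base_var, ref_mean, ref_var)
    score = bias + sum(w * f for w, f in zip(weights, features))
    # Positive -> valid (steps 77-78); zero or negative -> invalid (step 79).
    return score > 0
```

With weights that penalise the residual and reward block variance, a flat, badly matched block is rejected and a textured, well-matched block is accepted.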

Next, the video feature detection unit 4 normalizes the motion vectors detected in multiple stages (step 80). For example, it multiplies the motion vectors detected at the 2-, 4- and 8-field intervals by 8, 4 and 2, respectively, which normalizes them to the scale of the 16-field-interval motion vectors. This widens the dynamic range of the 2-, 4- and 8-field-interval motion vectors and improves their resolution, which in turn raises the accuracy of the composite motion vector generated afterwards. Alternatively, the unit may normalize by scaling the motion vectors detected at the 4-, 8- and 16-field intervals down to match the 2-field-interval motion vectors. In that case, however, the longer the field interval at which a motion vector was detected, the more resolution it loses.
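The normalization is a per-interval scale factor of target/interval. The minimal sketch below shows both directions mentioned in the text; the function name is hypothetical.

```python
def normalize_to(vector, interval, target=16):
    """Scale a motion vector measured over `interval` fields to `target` fields."""
    dx, dy = vector
    scale = target / interval  # e.g. 16 / 2 = 8 for the 2-field interval
    return (dx * scale, dy * scale)
```

Scaling up keeps every bit of the short-interval measurement. Scaling down, as in `normalize_to(v, 16, target=2)`, divides the long-interval vector by 8. If the result is then stored at the original integer precision, fine detail is lost, which is the resolution penalty the text describes.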

Next, the motion vector processing unit 34 of the video feature detection unit 4 combines the normalized motion vectors that the discriminant analysis judged valid, generating a composite motion vector (step 81). As shown in FIGS. 8 and 9, the unit works block by block: from the normalized data of each field interval, it selects the motion vector judged valid and combines the selections. In the example of FIG. 9, the 64 blocks fall into four quarters:

- Upper-left quarter of the field: the 2-field-interval motion vectors (1a to 4a, 9a to 12a, 17a to 20a, 25a to 28a) are judged valid.
- Upper-right and lower-left quarters: the 4-field-interval motion vectors (5b to 8b, 13b to 16b, 21b to 24b, 29b to 36b, 41b to 44b, 49b to 52b, 57b to 60b) are judged valid.
- Lower-right quarter: the 8-field-interval motion vectors (37c to 40c, 45c to 48c, 53c to 56c, 61c to 64c) are judged valid.

The motion vector processing unit 34 therefore merges the motion vectors judged valid at the various field intervals into a single field, producing the composite motion vector.

When the motion vectors at several field intervals are all valid, the motion vector processing unit 34 uses the one from the shortest field interval to build the composite motion vector. Thus, if the 2/4/8/16-field-interval motion vectors are all valid, the 2-field-interval motion vector is used. Shorter intervals are preferred because a fast motion in the video content may move a block outside the search range before a long interval has elapsed. The matcher then cannot follow the large displacement, and the longer the interval, the more likely this is. By giving priority to shorter field intervals, the motion vector processing unit 34 improves the detection accuracy for fast motion.
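Block-wise composition with shortest-valid-interval priority can be sketched as follows. The data layout (per-interval lists indexed by block) and the function name are hypothetical. Blocks with no valid vector at any interval are marked `None`, a case the text does not address.

```python
INTERVALS = (2, 4, 8, 16)


def composite_vectors(per_interval, valid, target=16):
    """Build a composite motion-vector field, one vector per block (sketch).

    per_interval: {interval: [raw (dx, dy) per block]}
    valid:        {interval: [bool per block]} from the discriminant step
    The shortest valid interval wins for each block, and the chosen vector is
    normalized to `target` fields. Also returns the interval used per block.
    """
    n_blocks = len(next(iter(per_interval.values())))
    vectors, used = [], []
    for b in range(n_blocks):
        chosen = None
        for k in INTERVALS:  # ascending order: shorter intervals take priority
            if k in per_interval and valid[k][b]:
                dx, dy = per_interval[k][b]
                chosen = (k, (dx * target / k, dy * target / k))
                break
        used.append(chosen[0] if chosen else None)
        vectors.append(chosen[1] if chosen else None)
    return vectors, used
```

The `used` list records which interval supplied each block. The text later relies on exactly this information to choose the camera feature determination duration.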

The video feature detection unit 4 then has the motion vector processing unit 34 output the composite motion vector to the camera feature determination unit 36 as the motion data of the video content (step 82). This composite motion vector is used in the multiple regression analysis of the subsequent video feature estimation process.

Although FIGS. 8 and 9 depict the discriminant analysis and normalization as executed separately for each field interval, this does not mean that they run on separate hardware or software. The discriminant analysis and normalization for the different field intervals may run on a single hardware and software unit or on separate ones.

The block matching is performed in order of increasing field interval, that is, by the matching processing units 26, 28, 30 and 32 in that order. In this embodiment, each matching processing unit can use the motion vector detected at the preceding field interval when detecting the motion vector at the next one. FIG. 10 illustrates this block matching conceptually.

As shown in FIG. 10, the 2-field matching processing unit 26 uses block matching to find the position (x2, y2) in the reference field two fields ahead to which the base block has moved from its initial position (x0, y0) in the base field. The displacement over four fields can be expected to exceed the displacement over two fields. The 4-field matching processing unit 28 therefore starts its search from the 2-field position (x2, y2) rather than from the base block's initial position (x0, y0). It searches for the moved position (x4, y4) in the reference field four fields ahead of the base field.

The 4-field matching processing unit 28 then adds two motion vectors: the one detected at the 2-field interval, and the one detected starting from (x2, y2). It outputs the sum as the motion vector from (x0, y0) to (x4, y4).

This processing is executed only when the discriminant analysis judges the 2-field-interval motion vector to be valid. In other words, the 4-field matching processing unit 28 chooses the starting position of its block matching according to the discriminant result for the 2-field-interval motion vector.

By using the position found by block matching at a shorter field interval as the starting position for block matching at the next interval, the matching position can be found faster than by searching from (x0, y0).

Although FIG. 10 describes the processing for detecting the 2-field and 4-field-interval motion vectors, the same approach naturally applies to the 8-field and 16-field intervals. For example, suppose the 2-field and 4-field-interval motion vectors are both valid and the 8-field-interval motion vector is invalid. The 16-field matching processing unit 32 then starts its motion vector search from the position found at the 4-field interval.
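The benefit of the cascaded search can be shown with a toy model. A block moves at constant velocity, and an oracle stands in for real block matching: it "finds" the true position only if that position lies within ±`radius` of the search start. All names are hypothetical. Because each stage searches only the remaining displacement, the cascade can reach displacements that exceed the search radius when measured from the block's original position.

```python
def cascaded_search(origin, velocity, radius, intervals=(2, 4, 8, 16), cascade=True):
    """Toy model of the multi-interval search of FIG. 10 (hypothetical).

    A block at `origin` moves `velocity` pixels per field. The oracle succeeds
    only when the true position is within +/-radius of the search start.
    Returns {interval: motion vector from origin, or None if invalid}.
    """
    start = origin
    result = {}
    for k in intervals:
        true_pos = (origin[0] + velocity[0] * k, origin[1] + velocity[1] * k)
        local = (true_pos[0] - start[0], true_pos[1] - start[1])
        if abs(local[0]) <= radius and abs(local[1]) <= radius:
            # Vector from origin = offset of the start point + local vector.
            result[k] = (start[0] - origin[0] + local[0],
                         start[1] - origin[1] + local[1])
            if cascade:
                start = true_pos  # the next interval starts from this valid match
        else:
            result[k] = None  # invalid: keep the last valid start position
    return result
```

With a velocity of (2, 1) pixels per field and a radius of 16, the cascaded search recovers the 16-field displacement (32, 16). Searching every interval from the origin loses it.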

(Video feature estimation processing)
Next, the video feature estimation process, that is, the estimation of camera features and video editing features, is described in detail.

FIG. 11 is a flowchart showing the flow of the video feature estimation process performed by the recording/reproducing device 100.
As shown in the figure, the camera feature determination unit 36 of the video feature detection unit 4 first initializes a detection flag for each video feature (step 101). The detection flags indicate whether each feature has been detected in the video content: the pan, tilt and zoom camera features, and the fade and cut video editing features. The flags are denoted Dpan, Dtilt, Dzoom, Dfade and Dcut, and each is initialized by setting its value to 0.

Next, the camera feature determination unit 36 performs multiple regression analysis, for example by the least squares method, on the composite motion vector data output from the motion vector processing unit 34 (step 102), and calculates the affine coefficients (step 103). The affine transformation model from which the multiple regression derives the affine coefficients is described below.

FIG. 12 shows the affine transformation model. The model describes the translation, scaling and rotation of a three-dimensional object as a coordinate transformation expressed with a matrix. Camera features such as pan, tilt and zoom can be regarded as translation and scaling of the objects in the base field. They can therefore be described with the affine transformation model.

Here, when the field interval in the video content is not large, the rotation angle θ can be assumed to be small, and the following approximations can be applied to the rotation component:
$$\sin\theta \approx \theta, \qquad \cos\theta \approx 1$$

The affine transformation model can therefore be simplified as shown in FIG. 12. Camera features can then be detected by using this model to compute each coefficient from the composite motion vector. Specifically, predetermined thresholds Pth, Tth and Zth are set for pan, tilt and zoom. Each camera feature is detected by comparing the corresponding affine coefficient, computed from the composite motion vector, with its threshold.

FIG. 13 shows the process of obtaining the affine coefficients by multiple regression analysis. As shown in the figure, the camera feature determination unit 36 uses two sets of variables:

- Explanatory variables: the x and y coordinates (xn, yn) of the initial position in the base field.
- Dependent (objective) variables: the x and y coordinates (xm, ym) of the motion vector's detected position (search position) in the reference field.

Using these variables, the camera feature determination unit 36 performs multiple regression analysis on the composite motion vector and obtains the pan, tilt and zoom coefficients Px, Py and Zx (step 103).
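FIG. 12 itself is not reproduced here, so the sketch below assumes a general six-parameter affine model fitted per axis by least squares:

- x_m = Zx·x_n + Rx·y_n + Px
- y_m = Ry·x_n + Zy·y_n + Py

In this sketch, Zx and Zy come out as scale factors (1 means no zoom), and the cross terms Rx and Ry absorb small rotations. For Px and Py to read as pure pan and tilt, coordinates are best measured from the field centre; that is also an assumption. The normal equations are solved by Gaussian elimination so that the example has no dependencies.

```python
def _solve3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [list(row) + [v] for row, v in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0] * 3
    for r in range(2, -1, -1):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_affine(src, dst):
    """Least-squares affine fit from block start points to matched points.

    src: [(xn, yn)] explanatory variables; dst: [(xm, ym)] dependent variables.
    """
    ata = [[0.0] * 3 for _ in range(3)]
    atx, aty = [0.0] * 3, [0.0] * 3
    for (x, y), (xm, ym) in zip(src, dst):
        row = (x, y, 1.0)  # design-matrix row [x, y, 1]
        for i in range(3):
            atx[i] += row[i] * xm
            aty[i] += row[i] * ym
            for j in range(3):
                ata[i][j] += row[i] * row[j]
    zx, rx, px = _solve3(ata, atx)
    ry, zy, py = _solve3(ata, aty)
    return {"Px": px, "Py": py, "Zx": zx, "Zy": zy, "Rx": rx, "Ry": ry}
```

With 64 block centres, one per composite-vector entry, the fit recovers an exact synthetic pan, tilt and zoom to machine precision.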

Returning to FIG. 11, the camera feature determination unit 36 takes the pan coefficient Px from the calculated affine coefficients (step 104). It then determines whether Px is greater than the threshold Pth (step 105). If Px is greater than Pth (Yes), it sets the pan detection flag Dpan = 1 (step 106); if Px is less than or equal to Pth (No), it sets Dpan = 0 (step 107).

Next, the camera feature determination unit 36 takes the tilt coefficient Py from the calculated affine coefficients (step 108). It then determines whether Py is greater than the threshold Tth (step 109). If Py is greater than Tth (Yes), it sets the tilt detection flag Dtilt = 1 (step 110); if Py is less than or equal to Tth (No), it sets Dtilt = 0 (step 111).

Next, the camera feature determination unit 36 takes the zoom coefficients Zx and Zy from the calculated affine coefficients (step 112). It then determines whether Zx or Zy is greater than the threshold Zth (step 113). If at least one of them is greater than Zth (Yes), it sets the zoom detection flag Dzoom = 1 (step 114); if both are less than or equal to Zth (No), it sets Dzoom = 0 (step 115).

The camera feature determination unit 36 may also distinguish the direction of each camera feature: left pan from right pan, upward tilt from downward tilt, and zoom-in from zoom-out. This distinction is easily made from the sign of the corresponding affine coefficient. Note that when the camera moves, the subject moves in the opposite direction in the image. The pan and tilt features detected by the camera feature determination unit 36 therefore reflect the motion of the subject, not the motion of the camera. By inverting the sign of the affine coefficients, however, the unit can detect camera features that match the camera's direction of motion.
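Steps 104 to 115, together with the sign inversion for camera direction, can be sketched as below. Several assumptions are made:

- Magnitudes are compared with the thresholds, since the sign carries direction.
- The zoom coefficients are assumed to be expressed as deviations from unit scale.
- Image x is assumed to grow rightward and y downward.
- The function name is hypothetical.

```python
def camera_flags(coef, pth, tth, zth):
    """Threshold the affine coefficients (steps 104-115) and infer direction.

    coef: {"Px", "Py", "Zx", "Zy"}; Zx/Zy assumed to be 0 when there is no zoom.
    Returns (flags, camera pan direction, camera tilt direction).
    """
    flags = {
        "Dpan": 1 if abs(coef["Px"]) > pth else 0,
        "Dtilt": 1 if abs(coef["Py"]) > tth else 0,
        # Zoom is detected if either axis exceeds the threshold (step 113).
        "Dzoom": 1 if abs(coef["Zx"]) > zth or abs(coef["Zy"]) > zth else 0,
    }
    # The camera moves opposite to the subject, so the coefficient sign is inverted.
    pan_dir = None
    if flags["Dpan"]:
        pan_dir = "left" if coef["Px"] > 0 else "right"
    tilt_dir = None
    if flags["Dtilt"]:
        tilt_dir = "up" if coef["Py"] > 0 else "down"  # y grows downward
    return flags, pan_dir, tilt_dir
```

A subject drifting right (positive Px) is reported as a left pan of the camera.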

Whether a camera feature is present in the video content is determined by computing the affine coefficients through the multiple regression analysis above and comparing them with the thresholds. In this embodiment, the length of time over which the camera feature determination continues is varied. It depends on the field interval at which the per-block motion vectors underlying the composite motion vector were detected. This variable-duration processing is described below.

FIG. 14 is a flowchart of the processing that varies the camera feature determination duration. The figure shows this processing as common to the pan, tilt and zoom affine coefficients and their camera features.

As shown in the figure, the camera feature determination unit 36 first takes each affine coefficient (step 141). It then looks at the per-block motion vectors that make up the composite motion vector from which the coefficients were calculated. It determines whether the vectors finally output by the 2-field matching processing unit 26 are the most numerous among them (step 142). If the 2-field-interval motion vectors are the most numerous (Yes), it sets the determination duration for each camera feature to t2 (step 143).

If the 2-field-interval motion vectors are not the most numerous (No), the camera feature determination unit 36 determines whether the vectors finally output by the 4-field matching processing unit 28 are the most numerous in the composite motion vector (step 144). If so (Yes), it sets the determination duration for each camera feature to t4 (step 145).

If the 4-field-interval motion vectors are not the most numerous (No), the camera feature determination unit 36 determines whether the vectors finally output by the 8-field matching processing unit 30 are the most numerous in the composite motion vector (step 146). If so (Yes), it sets the determination duration for each camera feature to t8 (step 147).

If the 8-field-interval motion vectors are not the most numerous (No), the camera feature determination unit 36 sets the determination duration for each camera feature to t16 (step 148). Here, t2 < t4 < t8 < t16.
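Steps 141 to 148 amount to choosing a duration from the most frequent source interval in the composite motion vector. The following is a minimal sketch under stated assumptions:

- The duration values are hypothetical.
- Ties go to the shorter interval, mirroring the order of the flowchart's checks.
- The function name is hypothetical.

```python
from collections import Counter


def determination_duration(used_intervals, durations):
    """Pick the camera feature determination duration (steps 141-148).

    used_intervals: interval (2/4/8/16) that supplied each block's vector,
                    or None for blocks without a valid vector.
    durations: {2: t2, 4: t4, 8: t8, 16: t16} with t2 < t4 < t8 < t16.
    """
    counts = Counter(k for k in used_intervals if k is not None)
    top = max(counts.values(), default=0)
    for k in (2, 4, 8):  # checked in flowchart order; ties favour shorter intervals
        if top > 0 and counts[k] == top:
            return durations[k]
    return durations[16]  # step 148: none of 2/4/8 is the most frequent
```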

Next, the camera feature determination unit 36 detects the presence or absence of each camera feature for each block of the composite motion vector, based on the predetermined thresholds Pth, Tth and Zth. It continues this detection for the determination duration set above.

The camera feature determination unit 36 then determines whether each camera feature was detected for at least a predetermined proportion of its determination duration (step 149). That is, it determines whether affine coefficients exceeding the thresholds Pth, Tth and Zth were detected for at least the predetermined proportion of the time.

If a camera feature was detected for at least the predetermined proportion of its determination duration, the camera feature determination unit 36 judges that the feature is present (step 150); otherwise, it judges that the feature is absent (step 151). The result is recorded in the corresponding detection flag.
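The proportion test of steps 149 to 151 can be sketched as a sliding-window vote over per-field threshold hits. Three assumptions apply:

- The hit sequence is taken per field.
- The `min_ratio` value is hypothetical.
- With less history than the duration, the sketch reports "absent", a case the text does not cover.

```python
def feature_present(per_field_hits, duration, min_ratio):
    """Decide presence from the most recent `duration` fields (steps 149-151).

    per_field_hits: booleans, True where the coefficient exceeded its threshold.
    min_ratio: hypothetical 'predetermined proportion' in the range 0..1.
    """
    window = per_field_hits[-duration:]
    if len(window) < duration:
        return False  # not enough history yet (assumed behaviour)
    return sum(window) / duration >= min_ratio
```

Combined with a duration chosen from the dominant field interval, slow motion is judged over a longer window and fast motion over a shorter one.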

In this case, the dependent (objective) variable of the multiple regression analysis is also chosen according to the field interval at which the motion vectors composing the composite motion vector were detected. That is, the dependent variables are the x and y coordinates (xm, ym) of the motion vector's detected position (search position) in a specific reference field: the one separated from the base field by the field interval that supplied the most motion vectors.

The camera feature determination duration is varied according to the field interval that supplied the most motion vectors underlying the composite motion vector. This allows the process to handle both fast and slow motion in the video content. FIG. 15 illustrates the displacement of the motion vectors detected at the 2-field and 16-field intervals, for fast motion and for slow motion.

A motion vector validly detected at a short field interval can be regarded as indicating fast motion, and one validly detected at a long field interval as indicating slow motion (FIG. 15). As shown in FIG. 15(A), when the motion is slow, the matching processing unit 32 can detect a valid motion vector over the 16-field interval. Over the 2-field interval, however, the matching processing unit 26 may fail to detect a valid motion vector because the per-block displacement is too small. Even if the matching processing unit 26 does detect a valid motion vector, the per-block motion vectors of the composite motion vector may be too small for the camera feature determination unit 36 to detect a camera feature.

Conversely, as shown in FIG. 15(B), when the motion is fast, the matching processing unit 26 can detect a valid motion vector over the 2-field interval. Over the 16-field interval, however, the matching processing unit 32 may fail to detect a valid motion vector because the displacement is too large to follow.

Accordingly, the camera feature determination unit 36 adjusts the determination duration to the field interval at which the most motion vectors were detected:

- When that interval is long, it lengthens the camera feature determination duration, so that slow motion is detected reliably.
- When that interval is short, it shortens the determination duration. This allows fast motion to be detected reliably and avoids unnecessary determination processing. It also prevents false determinations caused by noise that could creep in during a long determination period.

FIG. 16 is a graph of this variation of the determination duration according to the field interval at which the most motion vectors were detected. As the figure shows, the camera feature determination duration grows as the field interval increases from 2 to 4, 8 and 16 fields.

(Fade/cut detection processing)
Returning to FIG. 11, the video feature detection unit 4 next performs fade and cut detection.
First, the processing performed by the fade/cut processing units 27, 29, 31 and 33 is described.

The fade/cut processing units 27, 29, 31 and 33 receive the post-matching residuals from the matching processing units 26, 28, 30 and 32, respectively. From these residuals, they generate fade/cut evaluation values and output them to the fade/cut determination unit 35 (step 116).

Here, with the residuals denoted En (n = 0 to 63), the fade/cut evaluation value H is obtained by the following equation:
$$H = \sum_{n=0}^{63} E_n$$

Each of the fade/cut processing units 27, 29, 31 and 33 therefore keeps receiving residuals from its matching processing unit 26, 28, 30 or 32 until n = 63. At that point, the residuals for all base blocks of the base field have arrived, and the unit computes their sum.

FIGS. 17 and 18 are graphs showing, for each field interval, the calculated fade/cut evaluation value as fields elapse. FIG. 17 shows the case where a cut point is present, and FIG. 18 the case where a fade is present.

The fade/cut determination unit 35 detects fades and cuts from the fade/cut evaluation values shown in FIGS. 17 and 18 (step 117 in FIG. 11). It distinguishes three cases:

- If the evaluation value changes steeply as fields elapse (Yes in step 118), it judges a cut and sets the cut detection flag Dcut = 1 (step 120).
- If the evaluation value changes gradually (Yes in step 119), it judges a fade and sets the fade detection flag Dfade = 1 (step 121).
- If neither can be determined (No in step 119), it sets Dcut = 0 and Dfade = 0 (step 122).

Specifically, the fade/cut determination unit 35 analyzes the change in the fade/cut evaluation value for the 2-field interval. When it detects a peak characteristic like that of graph a in FIG. 17, it determines the peak point to be a cut point.
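The peak test can be sketched as follows. The document says only that a peak characteristic like graph a in FIG. 17 is detected and gives no rule for it. The criterion used here is an illustrative assumption: a sample counts as a peak when it exceeds both neighbors and is at least `min_ratio` times the larger neighbor.

```python
def detect_cut_points(values, min_ratio=3.0):
    """Indices where the 2-field evaluation value forms an isolated sharp peak (hypothetical rule)."""
    cuts = []
    for i in range(1, len(values) - 1):
        prev_v, cur, next_v = values[i - 1], values[i], values[i + 1]
        baseline = max(prev_v, next_v, 1e-9)  # guard against all-zero neighbors
        # A cut shows up as a single spike that is much larger than both neighbors.
        if cur > prev_v and cur > next_v and cur >= min_ratio * baseline:
            cuts.append(i)
    return cuts
```

A gradual ramp such as `[1, 2, 3, 2, 1]` is not reported, because its maximum is never `min_ratio` times its neighbors.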

When no such peak characteristic is detected, the fade/cut determination unit 35 calculates three differences at a predetermined time t, as shown in FIG. 18:
Va, between the fade evaluation value for the 2-field interval (graph a) and that for the 4-field interval (graph b);
Vb, between the fade evaluation value for the 4-field interval and that for the 8-field interval (graph c); and
Vc, between the fade evaluation value for the 8-field interval and that for the 16-field interval (graph d).

As shown in FIG. 18, the video changes gradually during a fade, so the fade/cut evaluation value changes by a different amount for each field interval. As a result, Va, Vb and Vc all appear distinctly as positive values that are relatively close to one another. In a cut, by contrast, Va, Vb and Vc differ greatly, as shown in FIG. 17, and may even be negative. The fade/cut determination unit 35 can therefore determine whether a fade has occurred by analyzing Va, Vb and Vc.
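The following sketch covers the Va/Vb/Vc test and the flag assignment in steps 118 to 122. It rests on two assumptions, both hypothetical. First, each difference is taken as the longer-interval value minus the shorter-interval value, so that a fade yields positive differences. Second, "relatively close" is approximated by requiring the largest difference to be at most `max_spread` times the smallest.

```python
def is_fade(e2, e4, e8, e16, max_spread=4.0):
    """Fade test on the evaluation values at time t for the 2/4/8/16-field intervals.

    Assumed sign convention: each difference is longer interval minus shorter interval.
    """
    va, vb, vc = e4 - e2, e8 - e4, e16 - e8
    diffs = (va, vb, vc)
    if min(diffs) <= 0:
        return False  # zero or negative differences are treated as "not a fade"
    # "Relatively close": largest difference at most max_spread times the smallest.
    return max(diffs) / min(diffs) <= max_spread


def fade_cut_flags(cut_peak_found, e2, e4, e8, e16):
    """Return (Dcut, Dfade) following steps 118 to 122."""
    if cut_peak_found:
        return 1, 0  # steep change -> cut (step 120)
    if is_fade(e2, e4, e8, e16):
        return 0, 1  # gradual change -> fade (step 121)
    return 0, 0      # neither (step 122)
```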

The video feature detection unit 4 then outputs the camera features and video editing features detected by the above processing (step 123).

FIG. 19 is a table showing the video features determined by the camera feature determination unit 36 and the fade/cut determination unit 35. The CPU 1 controls the video feature detection unit 4 so that data equivalent to this table is stored in, for example, the RAM 2 or the HDD 10.

Using the detected video features, the recording/reproducing apparatus 100 can, for example, create a digest (highlight) video from the video content or divide the video content into chapters.

[Summary]
As described above, in the present embodiment the recording/reproducing apparatus 100 determines the validity of the motion vectors detected between image data separated by different time intervals. It then generates a composite motion vector from the valid motion vectors, which improves the motion vector detection accuracy.

In addition, the recording/reproducing apparatus varies how long it takes to determine a camera feature. The duration depends on the field intervals over which the motion vectors underlying the composite motion vector were detected. As a result, the apparatus can reliably detect camera features for both fast and slow motion.

[Modification]
The present invention is not limited to the embodiment described above, and various modifications can of course be made without departing from the gist of the present invention.

In FIG. 4, the video feature detection unit 4 is configured by connecting the inter-field memory units 22 to 25 in series. Alternatively, it may be configured by connecting the inter-field memory units 22 to 25 in parallel; FIG. 20 shows the configuration of the video feature detection unit 4 in this case. With this configuration as well, the same processing as with the series connection is executed and the same effects are obtained.

In the above embodiment, the video feature detection unit 4 processes the video content in units of fields, but it may of course process the content in units of frames.

In the above embodiment, the video feature detection unit 4 uses the thumbnail images that make up the thumbnail video of the original video content. It may instead use the unreduced images of the original video content itself.

In the above embodiment, the video feature detection unit 4 uses a linear discriminant function, but a non-linear discriminant function may be used instead. Possible methods for non-linear discriminant analysis include discrimination by calculating the Mahalanobis distance, a support vector machine, a neural network, or another machine learning technique.

The Mahalanobis distance method does not use a discriminant function. As in the above embodiment, however, it uses learning content for which the validity or invalidity of the motion vectors is known in advance. First, the variance-covariance matrix and the mean of the valid data group and of the invalid data group are calculated. These are computed from the discrimination parameters of the motion vectors detected from several pieces of learning content.

Next, the same calculation is applied to the video content subject to video feature detection. Between its fields, a motion vector is detected from each block, and the Mahalanobis distance is calculated between that motion vector's discrimination parameter and each of the valid and invalid data groups. The discrimination parameter is then assigned to whichever group is at the smaller distance. This determines whether the detected motion vector is valid or invalid.

Here, let $H_y$ and $a_y$ be the variance-covariance matrix and the mean of the valid discrimination parameters in the learning data. Let $H_n$ and $a_n$ be the variance-covariance matrix and the mean of the invalid discrimination parameters. Let $v$ be the discrimination parameter of the detected motion vector to be classified. The Mahalanobis distance $D_y$ between $v$ and the valid data group is then calculated by the following equation:
$$D_y^2 = (v - a_y)^T H_y^{-1} (v - a_y)$$
Here, $(v - a_y)^T$ denotes the transpose and $H_y^{-1}$ denotes the inverse matrix. Because the discrimination parameter $v$ is multivariate data, it is treated as a (feature) vector.

Similarly, the Mahalanobis distance $D_n$ between $v$ and the invalid data group is calculated by the following equation:
$$D_n^2 = (v - a_n)^T H_n^{-1} (v - a_n)$$

Accordingly, the detected motion vector is judged valid or invalid as follows:
$D_y^2 < D_n^2$: valid
$D_y^2 \geq D_n^2$: invalid
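The two-group Mahalanobis rule above can be sketched with NumPy. The synthetic training samples stand in for discrimination parameter vectors taken from learning content; their actual components are defined elsewhere in the document, and the helper names are hypothetical. The sketch solves H x = (v - a) rather than forming the inverse explicitly, which gives the same D^2.

```python
import numpy as np


def fit_group(samples):
    """Mean vector a and variance-covariance matrix H of one group (one sample per row)."""
    x = np.asarray(samples, dtype=float)
    return x.mean(axis=0), np.cov(x, rowvar=False)


def mahalanobis_sq(v, mean, cov):
    """D^2 = (v - a)^T H^-1 (v - a), computed by solving H x = (v - a)."""
    d = np.asarray(v, dtype=float) - mean
    return float(d @ np.linalg.solve(cov, d))


def is_valid_motion_vector(v, valid_stats, invalid_stats):
    """Valid if D_y^2 < D_n^2; invalid if D_y^2 >= D_n^2."""
    d_y2 = mahalanobis_sq(v, *valid_stats)
    d_n2 = mahalanobis_sq(v, *invalid_stats)
    return d_y2 < d_n2


# Usage with synthetic 2-D parameters (illustrative only).
rng = np.random.default_rng(0)
valid_stats = fit_group(rng.normal([0.0, 0.0], 1.0, size=(200, 2)))
invalid_stats = fit_group(rng.normal([5.0, 5.0], 1.0, size=(200, 2)))
```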

(Learning content generation processing)
Next, another embodiment is described. It concerns how the learning data used in the Mahalanobis-distance discrimination processing is generated.

As described above, the discrimination processing based on the Mahalanobis distance needs learning data. To generate it, learning content (learning video data) is used for which the validity or invalidity of the motion vectors is known in advance. Motion vectors are detected from the learning content, discrimination parameters are acquired for both valid and invalid motion vectors, and those parameters are then used in the discrimination processing. However, creating such learning content by hand, for example by camera shooting with the amount and direction of motion adjusted for each time interval, is very laborious. Moreover, discrimination parameters acquired from manually created learning content may vary in their characteristics, which can lower the accuracy of the discrimination processing.
Therefore, in this embodiment the recording/reproducing apparatus 100 creates the learning content automatically.

FIG. 21 is a block diagram showing the configuration of the video feature detection unit 4 in this embodiment. Parts identical to those of the above embodiment are given the same reference numerals, and their description is omitted or simplified.

As shown in the figure, the video feature detection unit 4 in this embodiment additionally includes a learning image extraction unit 211, a window processing unit 212 and a learning processing unit 213. The following units have the same configurations and functions as in the above embodiment:
the block processing unit 21;
the 2/4/8/16-field interval memory units 22 to 25;
the matching processing units 26, 28, 30 and 32;
the fade/cut processing units 27, 29, 31 and 33;
the discrimination processing unit 214 and the normalization unit 215;
the motion vector processing unit 34; and
the fade/cut determination unit 35 and the camera feature determination unit 36.

The learning image extraction unit 211 receives the input used to generate the learning content: either the video signal of sample video content or sample still image data. It detects cut points (scene changes) in the input sample video content and extracts the image at each cut point as learning image data, which is supplied to the window processing unit 212. Specifically, the unit compares the above-mentioned fields of the sample video content. For each block, it detects the residual signals of the Y, Cb and Cr histograms, and it calculates the sum of these residual signals over all blocks as a cut evaluation value. It then detects cut points by analyzing how the cut evaluation value changes and extracts the field at each cut point as learning image data. When the cut evaluation value changes abruptly (for example, by a predetermined threshold or more), the later of the two fields on either side of the change is extracted.
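The sketch below illustrates histogram-residual cut detection under several assumptions, none of which are values given in the document: 8 × 8 = 64 blocks per field, 16-bin histograms per channel, comparison of consecutive fields, and a fixed threshold. Fields are H × W × 3 arrays whose channels are Y, Cb and Cr. The function names are hypothetical.

```python
import numpy as np


def block_histograms(field, blocks=(8, 8), bins=16):
    """Per-block histograms of each channel (Y, Cb, Cr) of an H x W x 3 uint8 field."""
    h, w, _ = field.shape
    bh, bw = h // blocks[0], w // blocks[1]
    hists = []
    for by in range(blocks[0]):
        for bx in range(blocks[1]):
            blk = field[by * bh:(by + 1) * bh, bx * bw:(bx + 1) * bw]
            hists.append([np.histogram(blk[..., c], bins=bins, range=(0, 256))[0]
                          for c in range(3)])
    return np.asarray(hists, dtype=float)  # shape: (num_blocks, 3, bins)


def cut_evaluation_value(prev_field, cur_field):
    """Sum over all blocks and channels of the absolute histogram residuals."""
    return float(np.abs(block_histograms(cur_field) - block_histograms(prev_field)).sum())


def extract_cut_images(fields, threshold):
    """Return the later field of every consecutive pair whose evaluation value reaches threshold."""
    return [fields[i] for i in range(1, len(fields))
            if cut_evaluation_value(fields[i - 1], fields[i]) >= threshold]
```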

When still image data rather than video content is used as the sample, the learning image extraction unit 211 supplies the still image data directly to the window processing unit 212 as learning image data.

The window processing unit 212 generates learning content (learning video data) from the learning image data supplied by the learning image extraction unit 211. It does so by moving a window (rectangular frame) within the image data, or by changing the image within the window (window processing). Learning content is generated for each of the 2/4/8/16-field intervals and for each of the pan, tilt and zoom camera features detected from the motion vectors, so that the validity of detecting each camera feature can be learned. The generated learning content is supplied to the learning processing unit 213. The window processing is described in detail later.

The learning processing unit 213 uses the learning content supplied from the window processing unit 212 to learn which motion vectors are valid or invalid. Specifically, it detects motion vectors from the learning content over the different field intervals. It then supplies learning data to the discrimination processing unit 214: the valid/invalid judgment for each motion vector, together with the discrimination parameters obtained when that vector was detected.

The discrimination processing unit 214 judges whether each motion vector detected by the matching processing units 26, 28, 30 and 32 is valid or invalid. It does so by discriminant analysis based on the Mahalanobis distance, using the learning data supplied from the learning processing unit 213. The result (valid or invalid) is output to the normalization unit 215.

The sample video content may be the same content whose video features are to be detected, that is, the content input to the block processing unit 21. In this case, suppose a digest video is requested for a certain video content, for example by a user operation. The video feature detection unit 4 then inputs that content both to the block processing unit 21 and to the learning image extraction unit 211. The learning content generation processing and the learning processing thus run on the same video content as the digest target, in parallel with the motion vector detection performed by the matching processing units.

When the sample video content differs from the video content whose video features are to be detected, the learning content generation processing and the learning processing are executed before the motion vector detection processing. They may run, for example, periodically or whenever new video content is stored in the HDD 10 or the like.

The block processing unit 21 receives the video signal of the video content whose video features are to be detected. The video signal is stored in the memory units 22 to 25. The matching processing units 26, 28, 30 and 32 detect motion vectors from it and output them to the discrimination processing unit 214.

The residual signals produced by matching in the matching processing units 26, 28, 30 and 32 are supplied to the fade/cut processing units 27, 29, 31 and 33, respectively. As described above, each fade/cut processing unit generates a fade/cut evaluation value from these residual signals and outputs it to the fade/cut determination unit 35. The fade/cut determination unit 35 analyzes how the fade/cut evaluation values change, detects fades and cuts in the video signal, and outputs the results. In this embodiment, the fade/cut determination unit 35 can also supply the image data at each detected cut point to the window processing unit 212 as learning image data. This applies when, as described above, the sample video content is the same as the video content whose video features are to be detected. In that case, the cut-point images detected from that content are supplied to the window processing unit 212.

The functions of the normalization unit 215, the motion vector processing unit 34 and the camera feature determination unit 36 are the same as in the above embodiment, so their description is omitted.

Next, the learning content generation processing performed by the video feature detection unit 4 configured as above is described.
FIG. 22 is a flowchart showing the general flow of the learning content generation processing in this embodiment. It covers the case where the video feature detection unit 4 generates learning content from learning image data extracted from sample video content.

As shown in FIG. 22, the video feature detection unit 4 first inputs the video contents stored in the HDD 10 or the like to the learning image extraction unit 211 as sample video content (step 221). The learning image extraction unit 211 extracts cut points from the input sample video content by the method described above (step 222). When a cut point (a scene change) is detected (Yes), the unit extracts the image at the cut point as learning image data and stores it in the RAM 2 or the like (step 224). The image extracted is the later of the two images on either side of the cut. The video feature detection unit 4 repeats steps 221 to 224 for every piece of sample video content stored in the HDD 10 or the like (step 225).

FIG. 25 shows how cut-point images are extracted from sample video content as learning image data.
As shown in the figure, images f0, f2 and f4 are extracted from the sample video content as cut points 1 to 3. The learning image extraction unit 211 handles these cut-point images V1, V2 and V3 as learning image data and stores them in the RAM 2 or the like.

When the sample video content is the same as the video content whose video features are to be detected, the cut-point images come from the fade/cut determination unit 35 instead. That unit extracts the images at all cut points from the target video content and outputs them to the learning image extraction unit 211. They are then stored in the RAM 2 or the like as learning image data.

When still image data is used to generate the learning content, as shown in FIG. 24, all the still image data V1, V2, V3, ... stored in, for example, the HDD 10 is input to the learning image extraction unit 211 as learning image data and stored in the RAM 2 or the like.

Next, the window processing unit 212 sets the flag w, which selects the window processing to be executed, to 0 (step 226).
The window processing flag w takes values from 0 to 2:
w = 0: the window is moved horizontally;
w = 1: the window is moved vertically;
w = 2: the window is enlarged and reduced.
The initial value of the flag is 0.

The window processing unit 212 then reads the learning image data stored in the RAM 2 or the like (step 227) and executes window processing on it (step 228). This window processing is described in detail below.

FIG. 23 is a flowchart showing the detailed flow of the window processing.
As shown in the figure, the window processing unit 212 first determines whether the window processing flag w is 0, that is, whether it has been set to move the window horizontally (left and right) (step 241).

Next, the window processing unit 212 sets n = 1 (step 242), where n is an integer with 1 ≤ n ≤ 4. The learning content it generates is used to learn the validity of motion vectors detected over a 2^n-field interval. Learning content is therefore generated separately for each field interval, so that the validity of the motion vectors detected by the matching processing units over the 2-, 4-, 8- and 16-field intervals can each be learned. The initial value of n in this per-interval generation is 1.

Next, the window processing unit 212 sets the horizontal moving speed of the window used to generate learning content for motion vector detection over a 2^n-field interval. The speed is set to the minimum at which the matching processing units can just detect the motion over a 2^n-field interval. The speed for generating learning content for the 2-field interval is set first.

Next, the window processing unit 212 moves the window horizontally within the read learning image data at the set speed. This generates learning content for learning pans (left pan and right pan) over the 2^n-field interval (step 244).

FIG. 26 shows how the learning content for pans is generated.
As shown in FIG. 26(A), the window S is set to a predetermined area smaller than the learning image data V (for example, 720 × 480 pixels) and is moved horizontally. For example, the window processing unit 212 can gradually move the window S shown in FIG. 26(B) to the right, as shown in FIGS. 26(C) to 26(E), cutting out the image within the window at each position. The result is learning content that looks as if it had actually been shot while panning the camera to the right. In this content the subject (a building) gradually moves to the left, so leftward movement is detected as the motion vector. Similarly, moving the window S to the left produces learning content that looks as if the camera had been panned to the left, and rightward movement of the subject is detected as the motion vector. For a given piece of learning image data V, the window processing unit 212 may first move the window S to the right to simulate a right pan and then move it back to the left to simulate a left pan. Because the moving speed of the window S is set in advance, the window processing unit 212 can also easily determine how far the subject moves in this learning content.

Next, the window processing unit 212 increments n by 1 (n = n + 1) (step 245) and determines whether the incremented value of n exceeds 4 (step 246).

If n does not exceed 4 (No), the window processing unit 212 returns to step 243 and generates pan learning content for the motion vectors of the next field interval. In this way, pan learning content is generated for the motion vectors of each of the 2/4/8/16-field intervals. When n exceeds 4 (Yes), the window processing unit 212 ends the pan learning content generation.

If the window processing unit 212 determines in step 241 that the window processing flag w is not 0 (No), it determines whether w is 1, that is, whether it has been set to move the window vertically (up and down) (step 247).

The window processing unit 212 then moves the window S vertically within the learning image data V, in the same way as for the pan learning content. This generates tilt learning content corresponding to the vertical motion vectors of each field interval (steps 248 to 252), content that looks as if it had actually been shot while tilting the camera vertically. For example, the window processing unit 212 may first move the window S upward within the learning image data V to simulate an upward camera tilt. It may then move the window S back down to simulate a downward tilt.

If the window processing unit 212 determines in step 247 that the window processing flag w is not 1 (that is, it is 2) (No), it generates zoom learning content (steps 253 to 256). By enlarging and reducing the window S within the learning image data V, it produces zoom learning content corresponding to the enlargement and reduction motion vectors of each field interval. The general flow (setting of n, setting of the speed and so on) is the same as for the pan and tilt learning content. The difference is that the window S is enlarged and reduced rather than moved. The zoom learning content generation is described below, focusing on this enlargement and reduction.
FIG. 27 conceptually illustrates the zoom learning content generation.
As shown in the figure, to generate learning content for reduction (zoom-out) learning, the window processing unit 212 defines two windows within the learning image data V: a window Sa and a larger window Sb. Neither window may be larger than the learning image data V. Over a time ta (seconds), the window processing unit 212 gradually changes the image data in window Sa into the image data in window Sb, which captures a wider area. This is equivalent to gradually enlarging window Sa to window Sb and normalizing each image cut out by the enlarging window to the size of window Sa.

When generating learning content for enlargement (zoom-in) learning, the window processing unit 212 gradually returns the image data of the window Sb, fitted into the window Sa, to the image data of the window Sa over a time tb (seconds). This is equivalent to gradually reducing the window Sb down to the window Sa and normalizing each image cut out by the shrinking window to the size of the window Sa.

FIG. 28 is a diagram illustrating the generation processing of learning content for zoom-out (reduction) learning. For simplicity, the figure shows processing of 10 × 10 pixel data, but pixel data of other sizes is processed in the same manner.
As shown in FIG. 28(A), the window processing unit 212 sets a window enclosing 6 × 6 pixels of pixel data as the window Sa and a window enclosing 10 × 10 pixels of image data as the window Sb. The window processing unit 212 then takes the 8 × 8 pixels of pixel data and changes the 8 pixels in the horizontal direction into 6 pixels to fit the window Sa, and likewise changes the 8 pixels in the vertical direction into 6 pixels. This processing is executed t1 (seconds) after the start of the processing. As a result, as shown in FIG. 28(B), the image in the window Sa is reduced in area to (6 × 6)/(8 × 8) ≈ 0.56 times.

Finally, as shown in FIG. 28(A), the window processing unit 212 takes the 10 × 10 pixels of pixel data and changes the 10 pixels in the horizontal direction into 6 pixels to fit the window Sa, and likewise changes the 10 pixels in the vertical direction into 6 pixels. This processing is executed ta (seconds) after the start of the processing. As a result, as shown in FIG. 28(B), the image in the window Sa is reduced in area to (6 × 6)/(10 × 10) = 0.36 times.
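The size schedule of FIG. 28 can be checked numerically. This hypothetical sketch assumes square windows and a source region that grows linearly from Sa to Sb while staying centred in Sb (FIG. 27 does not require centring). It reports the ratios as areas, which is how the 0.56 and 0.36 above are computed; the corresponding linear factors are 6/8 = 0.75 and 6/10 = 0.6.

```python
from typing import List, Tuple


def zoom_out_schedule(sa: int, sb: int, steps: int) -> List[Tuple[int, int, float]]:
    """Source-region sizes for a zoom-out from window Sa (side sa) to window Sb (side sb).

    For each of steps + 1 frames, returns (side, offset, area_scale):
      side:       side of the square region cut from Sb, grown linearly from sa to sb
      offset:     top-left offset of that region inside Sb (kept centred)
      area_scale: (sa * sa) / (side * side), the area ratio after resizing to sa x sa
    """
    schedule = []
    for i in range(steps + 1):
        side = round(sa + (sb - sa) * i / steps)
        offset = (sb - side) // 2
        schedule.append((side, offset, (sa * sa) / (side * side)))
    return schedule
```

With `zoom_out_schedule(6, 10, 2)` the three frames use regions of side 6, 8, and 10 with area ratios 1.0, 0.5625 (about 0.56), and 0.36, matching the values given for FIG. 28(B).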

FIG. 29 is a diagram for explaining the calculation method used when changing the number of pixels in the horizontal and vertical directions.
As shown in FIGS. 29(A) and 29(B), when, for example, 8 pixels of data are changed into 6 pixels of data, each post-change pixel value e_k (k = 0 to 5) is generated as a weighted sum of the pre-change pixel values d_n (n = 0 to 7), that is, e_k = Σ_n w_kn d_n. As shown in FIG. 29(B), the weighting coefficient w_kn varies with the distance between the pre-change pixel data d_n (d0 to d7) and the post-change pixel data e_k (e0 to e5): the closer the two pixel positions, the larger the coefficient, and the farther apart, the smaller. For example, as shown in FIG. 29(A), when the pixel data e0 is generated, the weighting coefficient w_01 for the pixel data d1, which is closest to e0, is the largest, and the weighting coefficient w_07 for the pixel data d7, which is farthest from e0, is the smallest. The window processing unit 212 executes this processing in both the horizontal and the vertical direction.
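The weighted addition above can be sketched as a separable resampling. The tent-shaped weights, the pixel-centre alignment, and the function names are assumptions chosen only to satisfy the stated rule that nearer source pixels receive larger weights. The exact weights and geometry of FIG. 29 may differ; in this sketch, for instance, e0 lies closest to d0 rather than d1.

```python
from typing import List, Sequence


def resample_weights(n_in: int, n_out: int) -> List[List[float]]:
    """Weights w[k][n] for building e_k from d_n; larger when d_n lies closer to e_k.

    Assumes pixel-centre alignment and a tent (triangle) kernel whose half-width is
    the input/output pixel ratio; each row is normalised to sum to 1.
    """
    scale = n_in / n_out
    radius = max(scale, 1.0)
    weights = []
    for k in range(n_out):
        x = (k + 0.5) * scale - 0.5  # position of e_k in input pixel coordinates
        row = [max(0.0, 1.0 - abs(x - n) / radius) for n in range(n_in)]
        total = sum(row)
        weights.append([w / total for w in row])
    return weights


def resample_line(d: Sequence[float], n_out: int) -> List[float]:
    """e_k = sum_n w_kn * d_n along one row or column."""
    w = resample_weights(len(d), n_out)
    return [sum(wk[n] * d[n] for n in range(len(d))) for wk in w]


def resize(image: List[List[float]], out_h: int, out_w: int) -> List[List[float]]:
    """Apply the 1-D resampling horizontally, then vertically."""
    rows = [resample_line(row, out_w) for row in image]
    cols = [resample_line(col, out_h) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```

Running `resample_line` along every row and then every column, as `resize` does, corresponds to the horizontal and vertical passes performed by the window processing unit 212.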

FIG. 30 is a diagram illustrating, in the same manner as FIG. 28, the generation processing of learning content for zoom-in (enlargement) learning.
As shown in the figure, the window processing unit 212 generates learning content for zoom-in (enlargement) learning by restoring the data generated in the generation processing of learning content for zoom-out (reduction) learning. That is, as shown in FIG. 30(A), the window processing unit 212 starts from the pixel data generated by the processing of FIG. 28, in which 10 × 10 pixels were changed into 6 × 6 pixels (0.36 times). After t2 seconds, the window processing unit 212 changes this pixel data into the pixel data in which 8 × 8 pixels were changed into 6 × 6 pixels. As a result, as shown in FIG. 30(B), the image in the window Sa is enlarged from 0.36 times to (6 × 6)/(8 × 8) ≈ 0.56 times.

Finally, after tb seconds, the window processing unit 212 returns the 6 × 6 pixel data in the window Sa to the original pixel data, as shown in FIG. 30(A). As a result, as shown in FIG. 30(B), the image in the window Sa is enlarged from 0.36 times back to its original size (1 times).

The window processing unit 212 can also generate the zoom-in learning content separately, without using the data generated in the zoom-out learning content generation processing. However, reusing the data generated by the zoom-out processing is more efficient because it saves the work of generating new data.
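Reusing the zoom-out data, as described above, amounts to playing the zoom-out frames backwards. In this hypothetical sketch the frames can be any objects. `step_scales` shows why the reversal turns a zoom-out into a zoom-in: the content scale between consecutive frames inverts from below 1 to above 1. Both function names are illustrative.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def zoom_in_from_zoom_out(frames: Sequence[T]) -> List[T]:
    """Reuse zoom-out frames in reverse order as zoom-in learning content."""
    return list(reversed(frames))


def step_scales(sides: Sequence[int]) -> List[float]:
    """Linear magnification of scene content between consecutive frames.

    A region of side s resized to a fixed output size shows content at scale out/s,
    so moving from side s_i to side s_(i+1) scales content by s_i / s_(i+1).
    """
    return [a / b for a, b in zip(sides, sides[1:])]
```

For the FIG. 28 regions of side 6, 8, and 10, the zoom-out steps scale content by 0.75 and 0.8, while the reversed sequence (10, 8, 6) gives 1.25 and about 1.33.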

FIG. 31 is a diagram showing how the image in the window Sa changes in the learning content generation processing for zoom-out (reduction).
As shown in FIGS. 31(A) and 31(B), the window processing unit 212 defines the windows Sa and Sb within the learning image data V. Then, as shown in FIGS. 31(C) to 31(F), the window processing unit 212 gradually changes the image in the window Sa into the image in the window Sb, thereby generating learning content that looks as if it had actually been shot while zooming out with a camera. The reverse processing generates learning content that looks as if it had actually been shot while zooming in.

When learning content has been generated by the above processing, the window processing unit 212 stores it in the RAM 2, the HDD 10, or the like. Data on the movement amount and movement direction of the window S is also stored in association with the learning content.

Returning to FIG. 22, the learning processing unit 213 of the video feature detection unit 4 executes learning processing using the learning content generated as described above (step 229). That is, the learning processing unit 213 reads the generated learning content from the RAM 2 or the HDD 10, detects motion vectors from it, and determines whether each motion vector is valid or invalid. Because the learning processing unit 213 can also obtain, from the RAM 2, the HDD 10, or the like, the data on the movement direction and movement amount of the window S used when the learning content was generated, it can easily determine whether a detected motion vector is valid. Specifically, a motion vector is judged valid or invalid according to whether its actually detected movement amount and direction match the movement amount and direction of the window S stored when the learning content was generated. This learning processing is executed for each of the 2-, 4-, 8-, and 16-field intervals.
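The validity check during learning compares each detected vector with the window movement stored at generation time. The sketch below is hypothetical: the per-field motion, the pixel tolerance `tol`, and the assumption that the expected vector grows linearly with the field interval are illustrative choices, not values taken from this document.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[int, int]  # (dy, dx) in pixels


def label_vectors(detected: Sequence[Vector], per_field_motion: Vector,
                  field_interval: int, tol: int = 1) -> List[bool]:
    """Mark each detected vector valid if it matches the stored window motion.

    per_field_motion is the scene displacement per field recorded when the learning
    content was generated; over an n-field interval the expected vector is n times it.
    tol is a hypothetical tolerance in pixels.
    """
    ey = per_field_motion[0] * field_interval
    ex = per_field_motion[1] * field_interval
    return [abs(dy - ey) <= tol and abs(dx - ex) <= tol for dy, dx in detected]
```

Calling it once per interval, with `field_interval` set to 2, 4, 8, and 16, mirrors the per-interval learning described above.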

The learning processing unit 213 then stores, as learning data in the RAM 2 or the like, the valid/invalid data of the motion vectors together with the discrimination parameters: the residual between the base block and the reference block, and the mean and variance of the base block and of the reference block (step 230).
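The five discrimination parameters listed above can be computed for each block pair as follows. This is a minimal sketch; the document does not specify the residual measure, so the sum of absolute differences used here is an assumption, as are the function names.

```python
from typing import Sequence, Tuple

Block = Sequence[Sequence[float]]


def block_stats(block: Block) -> Tuple[float, float]:
    """Mean and (population) variance of the pixel values in a block."""
    vals = [p for row in block for p in row]
    mean = sum(vals) / len(vals)
    var = sum((p - mean) ** 2 for p in vals) / len(vals)
    return mean, var


def discrimination_params(base: Block, ref: Block) -> Tuple[float, ...]:
    """(residual, base mean, base variance, reference mean, reference variance).

    The residual is taken here as the sum of absolute differences (SAD); the exact
    residual measure is an assumption.
    """
    residual = sum(abs(a - b) for ra, rb in zip(base, ref) for a, b in zip(ra, rb))
    mb, vb = block_stats(base)
    mr, vr = block_stats(ref)
    return (residual, mb, vb, mr, vr)
```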

Next, the window processing unit 212 increments the window processing flag w by 1 (step 232) and determines whether the value of w exceeds 2 (step 233). That is, the window processing unit 212 determines whether learning content for each of pan, tilt, and zoom learning has been generated from one piece of learning image data. If w is 2 or less (No), the window processing unit 212 returns to step 227 and executes the learning content generation processing corresponding to the incremented flag w.

If w exceeds 2 (Yes), the discrimination processing unit 214 reads the learning data from the RAM 2 or the like (step 234). When it has finished reading all of the learning data (Yes in step 235), the discrimination processing unit 214 receives the discrimination parameters of the detected motion vectors from the matching processing units 26, 28, 30, and 32 (step 236). Then, as described above, the discrimination processing unit 214 determines whether each motion vector is valid or invalid by calculating the Mahalanobis distance from the input discrimination parameters to the valid discrimination parameter group and to the invalid discrimination parameter group in the learning data (step 237).
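The Mahalanobis-distance decision in step 237 can be sketched in plain Python. It assumes that each group of learning data is summarised by its mean vector and population covariance, with a small hypothetical `ridge` term so the matrix stays invertible. A vector is judged valid when it is nearer to the valid group than to the invalid group. The feature vectors would be tuples such as (residual, base mean, base variance, reference mean, reference variance); all names are illustrative.

```python
from typing import List, Sequence

Vec = Sequence[float]


def mean_vec(samples: Sequence[Vec]) -> List[float]:
    n = len(samples)
    return [sum(s[i] for s in samples) / n for i in range(len(samples[0]))]


def covariance(samples: Sequence[Vec], mu: Vec, ridge: float = 1e-6) -> List[List[float]]:
    """Population covariance; a small ridge on the diagonal keeps it invertible."""
    n, dim = len(samples), len(mu)
    cov = [[0.0] * dim for _ in range(dim)]
    for s in samples:
        for i in range(dim):
            for j in range(dim):
                cov[i][j] += (s[i] - mu[i]) * (s[j] - mu[j]) / n
    for i in range(dim):
        cov[i][i] += ridge
    return cov


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def mahalanobis_sq(x: Vec, samples: Sequence[Vec]) -> float:
    """Squared Mahalanobis distance from x to the distribution of `samples`."""
    mu = mean_vec(samples)
    diff = [xi - mi for xi, mi in zip(x, mu)]
    y = solve(covariance(samples, mu), diff)  # y = cov^-1 (x - mu)
    return sum(d * yi for d, yi in zip(diff, y))


def is_valid(x: Vec, valid_group: Sequence[Vec], invalid_group: Sequence[Vec]) -> bool:
    """Valid if x is closer, in Mahalanobis distance, to the valid group."""
    return mahalanobis_sq(x, valid_group) < mahalanobis_sq(x, invalid_group)
```

This sketch recomputes each group's mean and covariance for every query; precomputing them once from the stored learning data would be the natural refinement.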

As described above, according to this embodiment, the video feature detection unit 4 executes window processing within the learning image data and can thus efficiently generate learning content with little variation in characteristics. This allows the video feature detection unit 4 to accurately execute both the learning processing using that learning content and the discrimination processing using the results of the learning.

In addition, by also using the video content subject to video feature detection as the learning video content, the video feature detection unit 4 can artificially generate the motion vectors likely to be detected from that content and learn whether they are valid. The video feature detection unit 4 can therefore learn the validity of motion vectors similar to those actually detected by the matching processing units. In this case, the learning content generated by the window processing unit 212 is based on the video content subject to video feature detection but is itself different video content. Because the video feature detection unit 4 can thus comprehensively learn the motion vectors that may be detected from that video content, it can determine the validity of the motion vectors actually detected with higher accuracy.

In this embodiment, the recording/reproducing apparatus 100 determines whether motion vectors are valid or invalid by discriminant analysis based on the Mahalanobis distance. However, the recording/reproducing apparatus 100 may instead perform the discriminant analysis using another machine learning method, such as an SVM (Support Vector Machine) or a neural network. The learning content generation processing described above can likewise be applied when the discriminant function is used to determine whether a motion vector is valid.

In the embodiments described above, the recording/reproducing apparatus 100 itself executes the generation of the discriminant function and the generation of the learning data used for discrimination based on the Mahalanobis distance. However, the discriminant function and the learning data may instead be generated by another device and then input to and stored in the recording/reproducing apparatus 100 via various interfaces.

FIG. 1 is a diagram showing the configuration of a recording/reproducing apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram showing camera features in an embodiment of the present invention.
FIG. 3 is a diagram showing video editing features in an embodiment of the present invention.
FIG. 4 is a block diagram showing a specific configuration of a video feature detection unit in an embodiment of the present invention.
FIG. 5 is a flowchart showing the flow of discriminant function generation processing in an embodiment of the present invention.
FIG. 6 is a flowchart showing the general flow of video feature detection processing by the video feature detection unit in an embodiment of the present invention.
FIG. 7 is a flowchart showing the flow of motion vector detection processing in an embodiment of the present invention.
FIG. 8 is a functional block diagram of motion vector detection processing in an embodiment of the present invention.
FIG. 9 is a conceptual diagram of composite motion vector generation processing in an embodiment of the present invention.
FIG. 10 is a conceptual diagram of block matching processing in an embodiment of the present invention.
FIG. 11 is a flowchart showing the flow of video feature estimation processing by the recording/reproducing apparatus according to an embodiment of the present invention.
FIG. 12 is a diagram showing the affine transformation model used in an embodiment of the present invention.
FIG. 13 is a diagram showing processing for obtaining affine coefficients by multiple regression analysis in an embodiment of the present invention.
FIG. 14 is a flowchart showing the flow of processing for varying the determination duration of camera features in an embodiment of the present invention.
FIG. 15 is a diagram explaining the movement amounts of motion vectors detected over 2-field and 16-field intervals, for fast motion and for slow motion, in an embodiment of the present invention.
FIG. 16 is a graph showing the processing for varying the determination duration of camera features according to the field interval over which the motion vector was detected, in an embodiment of the present invention.
FIG. 17 is a graph showing, for each field interval, the relationship between the calculated fade/cut evaluation values and the passage of fields in an embodiment of the present invention.
FIG. 18 is a graph showing, for each field interval, the relationship between the calculated fade/cut evaluation values and the passage of fields in an embodiment of the present invention.
FIG. 19 is a table showing the determination results for each video feature determined in an embodiment of the present invention.
FIG. 20 is a block diagram showing a specific configuration of a video feature detection unit in another embodiment of the present invention.
FIG. 21 is a block diagram showing the configuration of a video feature detection unit in another embodiment of the present invention.
FIG. 22 is a flowchart showing the general flow of learning content generation processing in another embodiment of the present invention.
FIG. 23 is a flowchart showing the detailed flow of window processing in another embodiment of the present invention.
FIG. 24 is a diagram showing how a still image is extracted as learning image data in another embodiment of the present invention.
FIG. 25 is a diagram showing how an image at a cut point is extracted from sample video content as learning image data in another embodiment of the present invention.
FIG. 26 is a diagram showing how learning content for pan learning is generated in another embodiment of the present invention.
FIG. 27 is a diagram conceptually showing the generation processing of learning content for zoom learning in another embodiment of the present invention.
FIG. 28 is a diagram showing the generation processing of learning content for zoom-out (reduction) learning in another embodiment of the present invention.
FIG. 29 is a diagram for explaining the calculation method used when changing the number of pixels in the horizontal and vertical directions in another embodiment of the present invention.
FIG. 30 is a diagram showing, in the same manner as FIG. 28, the generation processing of learning content for zoom-in (enlargement) learning in another embodiment of the present invention.
FIG. 31 is a diagram showing how the image in the window Sa changes in the learning content generation processing for zoom-out (reduction) in another embodiment of the present invention.

Explanation of symbols

1 … CPU
2 … RAM
3 … Operation input unit
4 … Video feature detection unit
10 … HDD
11 … Optical disc drive
16 … AV decoder
21 … Block processing unit
22 … 2-field-interval memory unit
23 … 4-field-interval memory unit
24 … 8-field-interval memory unit
25 … 16-field-interval memory unit
26, 28, 30, 32 … Matching processing units
27, 29, 31, 33 … Fade/cut processing units
34 … Motion vector processing unit
35 … Fade/cut determination unit
36 … Camera feature determination unit
100 … Recording/reproducing apparatus
211 … Learning image extraction unit
212 … Window processing unit
213 … Learning processing unit
214 … Discrimination processing unit
V … Learning image data
Sa … Window
Sb … Window

Claims

An electronic apparatus comprising:
input means for inputting, for each predetermined block, first image data, second image data separated from the first image data by a first time length, and third image data separated from the first image data by a second time length longer than the first time length, among a plurality of pieces of image data constituting video data;
detection means for detecting a first motion vector between a predetermined base block in the first image data and a first reference block in the second image data, and a second motion vector between the base block and a second reference block in the third image data;
storage means for storing discrimination data for determining whether each of the detected first and second motion vectors is valid;
discrimination means for determining, for each block, whether the detected first and second motion vectors are valid, using the stored discrimination data;
normalization means for normalizing the detected first and second motion vectors of each block with respect to time length; and
generation means for generating a combined motion vector by combining, for each block, the first and second motion vectors that have been determined to be valid and normalized.

The electronic apparatus according to claim 1,
wherein the normalization means normalizes the first motion vector according to the second time length.

The electronic apparatus according to claim 2,
wherein, when both the first motion vector and the second motion vector of a corresponding block are determined to be valid, the generation means generates the combined motion vector using the first motion vector.

The electronic apparatus according to claim 3,
wherein the storage means stores a plurality of pieces of learning video data and a plurality of pieces of learning motion vector data that correspond to the first and second time lengths and are to be detected from the learning video data, and
the electronic apparatus further comprises learning means for detecting, from the stored learning video data, motion vectors between the base block and the reference block at the first and second time lengths, learning whether each detected motion vector is valid using the detected motion vector and the stored learning motion vector data, and thereby generating the discrimination data.

The electronic apparatus according to claim 4,
wherein the learning means generates a discriminant function as the discrimination data by performing discriminant analysis using the motion vectors detected from the learning video data, the residual signal between the base block and the reference block, the mean and variance of the pixel data of the base block, and the mean and variance of the pixel data of the reference block.

The electronic apparatus according to claim 3,
wherein, when the detected first motion vector is determined to be valid, the detection means detects a third motion vector between the first reference block used in detecting the first motion vector, taken as a base block, and the second reference block, and detects the second motion vector based on the detected third motion vector and the first motion vector.

The electronic apparatus according to claim 3,
further comprising determination means for determining, based on the generated combined motion vector, a camera feature in the video data produced by a camera operation.

The electronic apparatus according to claim 7,
wherein the determination means continues the determination of the camera feature for a first determination time when the combined motion vector is a combination of the first motion vectors of the blocks, and continues the determination of the camera feature for a second determination time longer than the first determination time when the combined motion vector is a combination of the second motion vectors of the blocks.

The electronic apparatus according to claim 8,
wherein, when the combined motion vector is a combination of both the first and second motion vectors of the blocks, the determination means continues the determination of the camera feature for the first or the second determination time depending on which of the first and second motion vectors has been combined.

The electronic apparatus according to claim 4,
further comprising learning video data generation means for generating the learning video data needed to learn whether each movement detected as the first and second motion vectors is appropriate, by moving a rectangular area within predetermined learning image data and extracting the image data in the rectangular area at intervals of the first and second time lengths.

The electronic apparatus according to claim 4,
further comprising learning video data generation means for generating the learning video data needed to learn whether each enlargement or reduction detected as the first and second motion vectors is appropriate, by changing, within predetermined learning image data, the image in a first rectangular area having a first area into the image in a second rectangular area having a second area different from the first area, and extracting the images at intervals of the first and second time lengths.

The electronic apparatus according to claim 10 or 11,
wherein the learning video data generation means calculates the sum, over the blocks, of the residual signals between consecutive first sample image data and second sample image data in predetermined sample video data, and extracts the second sample image data as the learning image data when the sum is equal to or greater than a predetermined threshold.

The electronic apparatus according to claim 12,
wherein the learning video data generation means extracts the learning image data using, as the sample video data, the video data from which the first and second motion vectors are detected.

A motion vector detection method comprising:
inputting, for each predetermined block, first image data, second image data separated from the first image data by a first time length, and third image data separated from the first image data by a second time length longer than the first time length, among a plurality of pieces of image data constituting video data;
detecting a first motion vector between a predetermined base block in the first image data and a first reference block in the second image data, and a second motion vector between the base block and a second reference block in the third image data;
storing discrimination data for determining whether each of the detected first and second motion vectors is valid;
determining, for each block, whether the detected first and second motion vectors are valid, using the stored discrimination data;
normalizing the detected first and second motion vectors of each block with respect to time length; and
generating a combined motion vector by combining, for each block, the first or second motion vectors that have been determined to be valid and normalized.

The motion vector detection method according to claim 14, further comprising:
when the combined motion vector is a combination of the first motion vectors of the blocks, continuing, for a first determination time, to determine a camera feature in the video data produced by a camera operation based on the combined motion vector; and
when the combined motion vector is a combination of the second motion vectors of the blocks, continuing to determine the camera feature based on the combined motion vector for a second determination time longer than the first determination time.

A program causing an electronic apparatus to execute the steps of:
inputting, for each predetermined block, first image data, second image data separated from the first image data by a first time length, and third image data separated from the first image data by a second time length longer than the first time length, among a plurality of pieces of image data constituting video data;
detecting a first motion vector between a predetermined base block in the first image data and a first reference block in the second image data, and a second motion vector between the base block and a second reference block in the third image data;
storing discrimination data for determining whether each of the detected first and second motion vectors is valid;
determining, for each block, whether the detected first and second motion vectors are valid, using the stored discrimination data;
normalizing the detected first and second motion vectors of each block with respect to time length; and
generating a combined motion vector by combining, for each block, the first or second motion vectors that have been determined to be valid and normalized.

The program according to claim 16, further causing the electronic apparatus to execute the steps of:
when the combined motion vector is a combination of the first motion vectors of the blocks, continuing, for a first determination time, to determine a camera feature in the video data produced by a camera operation based on the combined motion vector; and
when the combined motion vector is a combination of the second motion vectors of the blocks, continuing to determine the camera feature based on the combined motion vector for a second determination time longer than the first determination time.