JP2012160011A

JP2012160011A - Thumbnail extraction program and thumbnail extraction method

Info

Publication number: JP2012160011A
Application number: JP2011019158A
Authority: JP
Inventors: Susumu Endo (遠藤進); Masaki Ishihara (石原正樹); Takayuki Baba (馬場孝之); Yusuke Uehara (上原祐介); Daiki Masumoto (増本大器); Shugo Nakamura (中村秋吾); Masahiko Sugimura (杉村昌彦); Shigemi Osada (長田茂美)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2011-01-31
Filing date: 2011-01-31
Publication date: 2012-08-23
Anticipated expiration: 2031-01-31
Also published as: JP5516444B2

Abstract

PROBLEM TO BE SOLVED: To provide a thumbnail extraction program and a thumbnail extraction method capable of extracting thumbnails suitable for three-dimensional video. SOLUTION: A thumbnail extraction program causes a computer to execute processing for: extracting, from a frame image pair constituting one frame of 3D video, feature points that correspond between the two images as feature point pairs; calculating the depth of each feature point pair from the distance between its two points, the inter-lens distance, and the focal length, and clustering the feature point pairs by depth; evaluating the stereoscopic visibility of one or more frame image pairs from their clustering result information, on the basis of a predetermined evaluation condition for stereoscopic visibility expressed in terms of clustering result information; and, on the basis of those evaluation results, extracting the frame image pair with the best evaluation result as the frame image pair for the thumbnail.

Description

The present invention relates to a thumbnail extraction program and a thumbnail extraction method for extracting thumbnails from video.

In recent years, large amounts of video have come to be stored on devices such as mobile phones and video cameras, and users of such devices want to be able to search for the videos they need.

Conventionally, videos are searched using, for example, metadata attached to them. However, metadata-based search often cannot narrow down the candidates sufficiently, so in many cases users still end up playing back videos one by one to check their content.

Another conventional approach is to display thumbnails extracted from each video so that the user can grasp its content. A thumbnail is an image that lets the user grasp the content of a video. Because the number of thumbnails that can be shown per video during a search cannot be increased, a better thumbnail extraction method has been needed (see, for example, Patent Document 1).

Other thumbnail extraction methods include: extracting the thumbnail from the first frame of the video, or from the frame a predetermined number of seconds after the start; combining extraction with commercial (CM) detection so that thumbnails are taken from portions other than commercials; detecting cuts (scene changes) and using the cut images as thumbnails; and combining extraction with face detection to use a close-up image of a face as the thumbnail.

Patent Document 1: JP 2005-294904 A

Products capable of handling three-dimensional (3D) video have begun to appear among mobile phones, video cameras, and similar devices. Large amounts of 3D video are therefore expected to accumulate on such devices, and demand for searching them for a needed 3D video is expected to grow accordingly.

The problems described above for searching two-dimensional (2D) video are expected to arise for 3D video as well. In other words, 3D video search is also likely to rely on displaying thumbnails extracted from the 3D video so that the user can grasp its content.

However, conventional thumbnail extraction methods designed for 2D video cannot extract thumbnails that are well suited to 3D video.

An object of the present embodiment is to provide a thumbnail extraction program and a thumbnail extraction method capable of extracting thumbnails suitable for 3D video.

To solve the above problem, the present embodiment is a thumbnail extraction program that causes a computer to execute processing of: extracting, from a frame image pair constituting one frame of 3D video, feature points that correspond between the two images as feature point pairs; calculating the depth of each feature point pair on the basis of the distance between the points of the pair and a predetermined inter-lens distance and focal length, and clustering the feature point pairs by depth; evaluating the stereoscopic visibility of one or more frame image pairs from their clustering result information, on the basis of a predetermined evaluation condition for the stereoscopic visibility of a frame image pair expressed in terms of clustering result information; and, on the basis of the evaluation results, extracting the frame image pair with the best evaluation result as the frame image pair for the thumbnail.

Applying the components or expressions of the present embodiment, or any combination of those components, to a method, an apparatus, a system, a computer program, a recording medium, a data structure, or the like is also effective as an aspect of the present invention.

According to the present embodiment, it is possible to provide a thumbnail extraction program and a thumbnail extraction method capable of extracting thumbnails suitable for 3D video.

FIG. 1 is a hardware configuration diagram of an example of a PC.
FIG. 2 is a functional block diagram of an example of the thumbnail extraction apparatus of this embodiment.
FIG. 3 is a flowchart of an example of the processing of the thumbnail extraction apparatus of this embodiment.
FIG. 4 is a configuration diagram of an example of a video information table.
FIG. 5 is a configuration diagram of an example of a frame information table.
FIG. 6 is an image diagram of an example of the two images that make up one frame of 3D video.
FIG. 7 is a configuration diagram of an example of a local feature table.
FIG. 8 is an image diagram of an example of frame images on which the extracted feature points are shown visually.
FIG. 9 is a configuration diagram of an example of a corresponding point table.
FIG. 10 is a configuration diagram of another example of the corresponding point table.
FIG. 11 is a configuration diagram of an example of a Z value histogram table.
FIG. 12 is a configuration diagram of an example of a corresponding point histogram table.
FIG. 13 is a configuration diagram of an example of a Z value cluster table.
FIG. 14 is a configuration diagram of an example of a corresponding point cluster table.
FIG. 15 is a configuration diagram of an example of a frame evaluation table.
FIG. 16 is a schematic diagram of an example of a 3D camera viewed from above.
FIG. 17 is an image diagram of an example of a method of calculating Z values from parallax.
FIG. 18 is a flowchart of an example of the processing of step S3.
FIG. 19 is an example histogram showing the calculated value of each bin.
FIG. 20 is an image diagram of an example of frame images on which the corresponding points in each cluster are shown visually.
FIG. 21 is a flowchart of an example of the processing of step S4.
FIG. 22 is a flowchart of an example of the processing of step S5.
FIG. 23 is a flowchart of an example of the processing of step S28.
FIG. 24 is a functional block diagram of another example of the thumbnail extraction apparatus of this embodiment.
FIG. 25 is a flowchart of another example of the processing of step S5.

Next, modes for carrying out the present invention will be described on the basis of the following embodiments with reference to the drawings. The thumbnail extraction program of this embodiment can be executed not only on a personal computer (PC) but also on various devices that handle 3D video, such as mobile phones and video cameras. This embodiment describes an example in which the thumbnail extraction program is executed on a PC. The PC is configured, for example, with the hardware shown in FIG. 1, which is a hardware configuration diagram of an example of a PC.

The PC 10 in FIG. 1 has an input device 21, a display device 22, and a PC main body 23. The PC main body 23 has a main storage device 31, an arithmetic processing device 32, an interface device 33, a recording medium reading device 34, and an auxiliary storage device 35, which are interconnected via a bus 37. The input device 21 and the display device 22 are also connected to the bus 37.

The input device 21, display device 22, main storage device 31, arithmetic processing device 32, interface device 33, recording medium reading device 34, and auxiliary storage device 35, all connected via the bus 37, can exchange data with one another under the management of the arithmetic processing device 32. The arithmetic processing device 32 is a central processing unit that controls the operation of the entire PC 10.

The interface device 33 receives data from a network or the like and passes the content of the data to the arithmetic processing device 32. It also transmits data to the network or the like in response to instructions from the arithmetic processing device 32.

The auxiliary storage device 35 stores, as part of a program that causes the PC 10 to provide the same functions as a thumbnail extraction apparatus, a thumbnail extraction program that causes the PC 10 to execute at least the processing of the thumbnail extraction apparatus. When the arithmetic processing device 32 reads the thumbnail extraction program from the auxiliary storage device 35 and executes it, the PC 10 functions as a thumbnail extraction apparatus. The thumbnail extraction program may instead be stored in the main storage device 31, which the arithmetic processing device 32 can access. The input device 21 accepts data input under the control of the arithmetic processing device 32. The thumbnail extraction program can also be recorded on a recording medium 36 readable by the PC 10.

Examples of the recording medium 36 include magnetic recording media, optical discs, magneto-optical recording media, and semiconductor memories. Magnetic recording media include HDDs, flexible disks (FD), and magnetic tapes (MT). Optical discs include DVD (Digital Versatile Disc), DVD-RAM, CD-ROM (Compact Disc Read-Only Memory), and CD-R (Recordable)/RW (ReWritable). Magneto-optical recording media include MO (Magneto-Optical disk).

To distribute the thumbnail extraction program, one could, for example, sell portable recording media 36 such as DVDs or CD-ROMs on which the program is recorded. In the PC 10 that executes the program, the recording medium reading device 34 reads the thumbnail extraction program from the recording medium 36 on which it is recorded, and the arithmetic processing device 32 stores the program in the main storage device 31 or the auxiliary storage device 35.

The PC 10 reads the thumbnail extraction program from its own storage, namely the main storage device 31 or the auxiliary storage device 35, and executes processing according to the program. Following the program, the arithmetic processing device 32 carries out the various processes described below.

The following description uses as an example a thumbnail extraction apparatus 40, which is a PC 10 on which at least the thumbnail extraction program is installed.

FIG. 2 is a functional block diagram of an example of the thumbnail extraction apparatus of this embodiment. The thumbnail extraction apparatus 40 in FIG. 2 has a frame image acquisition unit 41, a feature point extraction unit 42, a local feature extraction unit 43, a feature point pair extraction unit 44, a depth clustering unit 45, a thumbnail evaluation unit 46, a thumbnail extraction unit 47, a video information table 51, a frame information table 52, a local feature table 53, a corresponding point table 54, a Z value histogram table 55, a corresponding point histogram table 56, a Z value cluster table 57, a corresponding point cluster table 58, and a frame evaluation table 59.

An outline of the functional blocks in FIG. 2 is given here. The frame image acquisition unit 41 acquires the two images that make up one frame of 3D video. The feature point extraction unit 42 extracts feature points from an image. The local feature extraction unit 43 extracts the local feature values of the feature points from the image. The feature point pair extraction unit 44 extracts pairs of feature points between the two images that make up one frame of 3D video.

The depth clustering unit 45 clusters the feature points by the depth distance calculated from the feature point pairs. On the basis of the clustering result, the thumbnail evaluation unit 46 calculates a value expressing how three-dimensional the frame appears (its evaluation value). The thumbnail extraction unit 47 extracts the frame with the highest such value as the frame for the thumbnail.
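The final selection by the thumbnail extraction unit 47 reduces to an argmax over the frame evaluation table. The sketch below assumes that table is held in memory as a dict mapping frame number to evaluation value; the evaluation condition itself is not given in this section, so the scores are placeholders, and breaking ties in favour of the earliest frame is our own assumption.

```python
from typing import Dict


def select_thumbnail_frame(frame_scores: Dict[int, float]) -> int:
    """Return the frame number with the highest evaluation value.

    frame_scores is a hypothetical in-memory form of the frame evaluation
    table: {frame_number: evaluation_value}. Ties go to the earliest frame
    (an assumption; tie handling is not specified in the text).
    """
    if not frame_scores:
        raise ValueError("no frames were evaluated")
    # Highest score first; among equal scores, the smallest frame number.
    return min(frame_scores, key=lambda f: (-frame_scores[f], f))


scores = {1: 0.42, 2: 0.87, 3: 0.87, 4: 0.15}  # placeholder evaluation values
print(select_thumbnail_frame(scores))  # 2
```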

The video information table 51 stores information about the 3D video. The frame information table 52 stores the two images that make up each frame of 3D video in association with each other. The local feature table 53 stores the local feature values of the feature points extracted from the two images that make up one frame of 3D video.

The corresponding point table 54 stores, as corresponding points, the pairs of feature points extracted from the two images that make up one frame of 3D video, together with a distance value for each corresponding point. The distance value is the distance between the two feature points of a pair when the two images are superimposed. In addition to the corresponding points and their distance values, the corresponding point table 54 may also store the depth distance (Z value) of each corresponding point.

The Z value histogram table 55 stores the number of corresponding points falling in each of several predefined Z value ranges. The corresponding point histogram table 56 stores each corresponding point in association with the Z value range into which it falls.

The Z value cluster table 57 stores clusters, each consisting of a peak of the Z value histogram together with the Z value ranges before and after that peak. The corresponding point cluster table 58 stores each corresponding point in association with the cluster to which it belongs. The frame evaluation table 59 stores the evaluation value (how three-dimensional the frame appears) of each frame.

FIG. 3 is a flowchart of an example of the processing of the thumbnail extraction apparatus of this embodiment. In step S1, the frame image acquisition unit 41 acquires images (frame images) of the 3D video. Each frame of 3D video consists of two images (frame images) viewed from points at slightly different positions. In most cases the positional offset between the two frame images is small, so the two frame images are very similar.

There are no particular restrictions on the source 3D video. It may be TV video, video shot live with a 3D camera, or video stored in a file such as Blu-ray 3D. Nor are there particular restrictions on how the 3D video is stored: formats such as Side-by-Side (the left and right images placed next to each other and stored as a single image) or Frame alternative (left and right images stored alternately) can be used.
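The two storage formats named above can be turned into left/right frame image pairs as sketched below. The sketch assumes an image is a list of pixel rows, that the left-eye image occupies the left half of a Side-by-Side frame, and that a Frame alternative stream starts with a left image; real Side-by-Side video is usually squeezed horizontally, and the rescaling step is omitted.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]  # rows of pixel values (illustrative representation)


def split_side_by_side(frame: Image) -> Tuple[Image, Image]:
    """Split one Side-by-Side frame into (left, right) halves.

    Assumes the left-eye image is the left half. Horizontal rescaling of
    the squeezed halves is omitted.
    """
    half = len(frame[0]) // 2
    left = [row[:half] for row in frame]
    right = [row[half:2 * half] for row in frame]
    return left, right


def pair_frame_alternative(frames: Sequence[Image]) -> List[Tuple[Image, Image]]:
    """Group a Frame alternative stream (L, R, L, R, ...) into (left, right) pairs.

    Assumes the stream starts with a left image; a trailing unpaired image is dropped.
    """
    return [(frames[i], frames[i + 1]) for i in range(0, len(frames) - 1, 2)]


left, right = split_side_by_side([[1, 2, 3, 4],
                                  [5, 6, 7, 8]])
print(left)   # [[1, 2], [5, 6]]
print(right)  # [[3, 4], [7, 8]]
```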

For the 3D video, video information such as its width, height, and number of frames is made available. This information can be stored in a form such as the video information table 51. FIG. 4 is a configuration diagram of an example of the video information table. The video information table 51 in FIG. 4 has a video number, a width, a height, and a number of frames as data items.

The video number is an identifier of the 3D video. The width and height are the width and height of the 3D video, and the number of frames is the number of frames in the 3D video. The number of frames may not be available, for example for live video. In that case the processing can be designed not to use it: for example, if the number of frames cannot be obtained, thumbnails can be extracted from frames within a specified number of frames from the start, and processing proceeds without problems.

Because the subsequent processing is independent for each 3D video, the processing for a single 3D video is described. The frame image acquisition unit 41 does not need to extract every frame of the 3D video; it may extract only some of them. Possible approaches include extracting images in combination with a cut detection method, extracting images at regular intervals, and extracting images in combination with other analysis methods such as CM detection or face detection.
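A minimal sketch of the regular-interval sampling option, combined with the fallback described for an unknown frame count. The class mirrors the data items of the video information table in FIG. 4; the names `VideoInfo`, `candidate_frames`, `interval`, and `fallback_limit` are hypothetical, and frame indices are 0-based here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VideoInfo:
    """One row of the video information table (FIG. 4)."""
    video_number: int
    width: int
    height: int
    frame_count: Optional[int]  # None when unknown, e.g. for live video


def candidate_frames(info: VideoInfo, interval: int, fallback_limit: int) -> List[int]:
    """Frame indices (0-based) sampled every `interval` frames.

    If the frame count is unknown, only the first `fallback_limit` frames
    are considered, as the text suggests for live video.
    """
    limit = info.frame_count if info.frame_count is not None else fallback_limit
    return list(range(0, limit, interval))


print(candidate_frames(VideoInfo(1, 1920, 1080, 300), 100, 90))   # [0, 100, 200]
print(candidate_frames(VideoInfo(2, 1920, 1080, None), 30, 90))   # [0, 30, 60]
```

Cut detection, CM detection, or face detection could replace the fixed interval without changing how the candidate frames are consumed later.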

The frame image acquisition unit 41 stores the acquired frame images in the frame information table 52. FIG. 5 is a configuration diagram of an example of the frame information table. The frame information table 52 in FIG. 5 has a frame number, a left image, and a right image as data items.

The frame number is an identifier of the frame. The left image is an identifier of the left image (left frame image) of the frame, and the right image is an identifier of the right image (right frame image) of the frame.

FIG. 6 is an image diagram of an example of the two images that make up one frame of 3D video. As shown in FIG. 6, the left frame image 100 and the right frame image 101 are very similar.

In step S2 of FIG. 3, the feature point extraction unit 42 extracts feature points from each of the left frame image 100 and the right frame image 101 of one frame in the frame information table 52 populated in step S1. The local feature extraction unit 43 then extracts the feature values of the extracted feature points as local feature values.

The feature point extraction method is not particularly limited. Simple methods include edge detection methods such as Sobel and Canny edge extraction, and corner detection methods such as the Harris corner detector. Sobel edge extraction is described in, for example, Mikio Takagi and Haruhisa Shimoda, New Image Analysis Handbook, University of Tokyo Press, 2004. Canny edge extraction is described in, for example, J. Canny, "A Computational Approach to Edge Detection," IEEE Trans. on Pattern Analysis and Machine Intelligence, 8(6), pp. 679-698, 1986. The Harris corner method is described in, for example, C. Harris and M. Stephens, "A Combined Corner and Edge Detector," Proc. of the 4th Alvey Vision Conference, pp. 147-151, 1988.
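As an illustration of the simplest option listed, the sketch below treats pixels with a large Sobel gradient magnitude as feature points. It assumes a grayscale image given as a list of rows; the threshold is a hypothetical tuning parameter, border pixels are skipped, and no non-maximum suppression is applied, so edges yield runs of adjacent points.

```python
import math
from typing import List, Tuple

# 3x3 Sobel kernels for the horizontal (GX) and vertical (GY) gradients.
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_feature_points(gray: List[List[float]], threshold: float) -> List[Tuple[int, int]]:
    """Return (x, y) of interior pixels whose Sobel gradient magnitude exceeds threshold."""
    h, w = len(gray), len(gray[0])
    points = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    v = gray[y + dy - 1][x + dx - 1]
                    gx += GX[dy][dx] * v
                    gy += GY[dy][dx] * v
            if math.hypot(gx, gy) > threshold:
                points.append((x, y))
    return points


# A vertical step edge between columns 1 and 2.
img = [[0, 0, 255, 255, 255] for _ in range(5)]
print(sobel_feature_points(img, 100.0))
# [(1, 1), (2, 1), (1, 2), (2, 2), (1, 3), (2, 3)]
```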

The local feature extraction method is likewise not particularly limited. The local feature value can be extracted, for example, from the image region around the feature point, using a method such as a color histogram. Color histograms are described in, for example, Mikio Takagi and Haruhisa Shimoda, New Image Analysis Handbook, University of Tokyo Press, 2004.
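A color-histogram local feature can be sketched as follows. The patch size (`radius`) and quantization (`bins_per_channel`) are hypothetical parameters, and the image is assumed to be a list of rows of 8-bit (R, G, B) tuples; the result is a real-valued vector, matching the vector form of local feature values described for SIFT below.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def color_histogram(image: List[List[RGB]], x: int, y: int,
                    radius: int = 2, bins_per_channel: int = 4) -> List[float]:
    """Normalised RGB histogram of the square patch centred on (x, y).

    Each 8-bit channel is quantised into bins_per_channel levels, so the
    result has bins_per_channel**3 entries summing to 1. Parts of the
    patch outside the image are ignored.
    """
    h, w = len(image), len(image[0])
    step = 256 / bins_per_channel

    def q(c: int) -> int:
        # Quantised level of one channel value, clamped to the last bin.
        return min(int(c / step), bins_per_channel - 1)

    hist = [0.0] * bins_per_channel ** 3
    n = 0
    for yy in range(max(0, y - radius), min(h, y + radius + 1)):
        for xx in range(max(0, x - radius), min(w, x + radius + 1)):
            r, g, b = image[yy][xx]
            hist[(q(r) * bins_per_channel + q(g)) * bins_per_channel + q(b)] += 1
            n += 1
    return [v / n for v in hist]


img = [[(255, 0, 0)] * 3 for _ in range(3)]  # a uniformly red 3x3 patch
feature = color_histogram(img, 1, 1, radius=1)
print(len(feature), feature[48])  # 64 1.0
```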

A method using SIFT (US Patent 6,711,293) or the like can also serve as both the feature point extraction method and the local feature extraction method. SIFT is described in, for example, D. G. Lowe, "Object recognition from local scale-invariant features," Proc. of the IEEE Int. Conf. on Computer Vision (ICCV), pp. 1150-1157, 1999.

SIFT computes the difference between Gaussian-blurred versions of the image (Difference of Gaussians), takes its local maxima as feature points, and from each feature point extracts a local feature based on gradient orientations. SURF and similar local feature extraction methods can also be used. SURF is described in, for example, H. Bay, "SURF: Speeded Up Robust Features," Proc. 9th ECCV, Graz, Austria, May 2006, Part 1, pp. 404-417. The extracted feature values (local feature values) take the form of arrays (vectors) of real numbers.

The feature point extraction unit 42 and the local feature extraction unit 43 store the extracted feature points and local feature values in the local feature table 53. FIG. 7 is a configuration diagram of an example of the local feature table. The local feature table 53 in FIG. 7 has a frame number, an image number, a feature number, an X coordinate, a Y coordinate, and a feature value as data items.

The frame number is an identifier of the frame. The image number identifies whether the image is the right frame image or the left frame image. The feature number is an identifier of the feature point. The X and Y coordinates are the coordinates of the feature point, and the feature value is the local feature value of the feature point.

FIG. 8 is an image diagram of an example of frame images on which the extracted feature points are shown visually, each drawn as a circle ("○"). As shown in FIG. 8, multiple feature points are extracted from each of the left frame image 100 and the right frame image 101. The feature points shown on the left frame image 100 and the right frame image 101 correspond to the entries of the local feature table 53 in FIG. 7.

In step S3 of FIG. 3, the feature point pair extraction unit 44 refers to the local feature table 53 and, as described later, extracts pairs of feature points from the right and left frame images of the same frame. The feature point pair extraction unit 44 stores the extracted pairs in the corresponding point table 54 as corresponding points.
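The pairing procedure for step S3 is described later in the specification and is not reproduced here. As a stand-in, the sketch below uses a common technique: nearest-neighbour matching of local feature vectors with a distance-ratio test, often used with SIFT-like descriptors. The `ratio` threshold is an assumption, and this should not be read as the patent's own step S3 procedure.

```python
import math
from typing import Dict, List, Sequence, Tuple


def match_feature_points(left: Dict[int, Sequence[float]],
                         right: Dict[int, Sequence[float]],
                         ratio: float = 0.8) -> List[Tuple[int, int]]:
    """Return (right_feature, left_feature) pairs.

    left/right map feature numbers to local feature vectors. A right-image
    feature is paired with its nearest left-image feature only if that
    distance is below `ratio` times the second-nearest distance, which
    rejects ambiguous matches.
    """
    pairs = []
    for rid, rvec in right.items():
        dists = sorted((math.dist(rvec, lvec), lid) for lid, lvec in left.items())
        if not dists:
            continue
        if len(dists) == 1 or dists[0][0] < ratio * dists[1][0]:
            pairs.append((rid, dists[0][1]))
    return pairs


left = {1: [0.0, 0.0], 2: [10.0, 10.0]}
right = {7: [0.5, 0.0], 8: [5.0, 5.0]}  # feature 8 is equidistant, so it is rejected
print(match_feature_points(left, right))  # [(7, 1)]
```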

FIG. 9 is a configuration diagram of an example of the corresponding point table. The corresponding point table 54 in FIG. 9 has a frame number, a corresponding point number, a feature number in the right frame image 101, a feature number in the left frame image 100, and a distance value as data items. The frame number is an identifier of the frame. The corresponding point number is an identifier of the corresponding point. The feature number in the right frame image 101 identifies a feature point of the right frame image 101, and the feature number in the left frame image 100 identifies a feature point of the left frame image 100. The distance value is the distance, when the left frame image 100 and the right frame image 101 are superimposed, between the feature point identified by the right-image feature number and the feature point identified by the left-image feature number.
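A row of the corresponding point table can be modelled as below. Superimposing the two images means reading both feature positions in one shared coordinate system, so the distance value is the Euclidean distance between the two positions. Carrying the coordinates inside the row (rather than looking them up in the local feature table) is a simplification for illustration, and the field names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CorrespondingPoint:
    """One row of the corresponding point table (FIG. 9), plus coordinates.

    The coordinates would normally be looked up in the local feature table;
    storing them here is a simplification.
    """
    frame_number: int
    point_number: int
    right_feature: int
    left_feature: int
    right_xy: Tuple[float, float]
    left_xy: Tuple[float, float]

    @property
    def distance_value(self) -> float:
        # Distance between the paired points with the two images superimposed.
        return math.dist(self.right_xy, self.left_xy)


cp = CorrespondingPoint(1, 1, 12, 15, right_xy=(100.0, 40.0), left_xy=(106.0, 48.0))
print(cp.distance_value)  # 10.0
```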

FIG. 10 is a configuration diagram of another example of the corresponding point table. The corresponding point table 54 of FIG. 10 adds a Z value to the data items of the corresponding point table 54 of FIG. 9. The Z value is calculated, as described later, from the X coordinates stored in the local feature table 53 and the distance values stored in the corresponding point table 54, and represents the depth distance of the corresponding point.
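The exact Z value calculation is given later with FIGS. 16 and 17. For orientation only, the sketch below applies the textbook relation for a parallel-axis stereo camera, Z = f·B/d, where B is the inter-lens distance, f is the focal length in pixels, and d is the horizontal disparity in pixels. For rectified images the distance value above reduces to this horizontal disparity. Treating non-positive disparity as infinitely far is an assumption of the sketch.

```python
def depth_from_disparity(disparity_px: float, baseline: float, focal_px: float) -> float:
    """Depth Z = focal_px * baseline / disparity_px for a parallel stereo rig.

    disparity_px: horizontal offset of the corresponding point in pixels.
    baseline: inter-lens distance; Z is returned in the same unit.
    focal_px: focal length expressed in pixels.
    """
    if disparity_px <= 0:
        # Non-positive disparity is treated as a point at infinity (assumption).
        return float("inf")
    return focal_px * baseline / disparity_px


# Hypothetical rig: 65 mm inter-lens distance, 1000 px focal length.
print(depth_from_disparity(13.0, 65.0, 1000.0))  # 5000.0 (mm)
```

Because Z is inversely proportional to disparity, nearby objects produce large, well-resolved disparities, while distant ones crowd into a narrow range of small disparities.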

In step S4 of FIG. 3, the depth clustering unit 45 performs the clustering processing described later, using the distance values stored in the corresponding point table 54 of FIG. 9 or the Z values stored in the corresponding point table 54 of FIG. 10, and creates groups of corresponding points at similar depths. The depth clustering unit 45 stores the results of the clustering in the Z value histogram table 55 shown in FIG. 11, the corresponding point histogram table 56 shown in FIG. 12, the Z value cluster table 57 shown in FIG. 13, and the corresponding point cluster table 58 shown in FIG. 14.

FIG. 11 is a configuration diagram of an example of the Z value histogram table. The Z value histogram table 55 of FIG. 11 has, as data items, a frame number, a bin number, a start Z value, an end Z value, and a count. The frame number is an identifier of the frame. The bin number is an identifier of a range of Z values (a histogram bin). The start Z value is the Z value at which the bin begins, and the end Z value is the Z value at which it ends. The count is the number of corresponding points in the bin.

FIG. 12 is a configuration diagram of an example of the corresponding point histogram table. The corresponding point histogram table 56 of FIG. 12 has, as data items, a frame number, a corresponding point number, and a bin number. The frame number is an identifier of the frame, the corresponding point number is an identifier of the corresponding point, and the bin number is an identifier of the bin.

FIG. 13 is a configuration diagram of an example of the Z value cluster table. The Z value cluster table 57 of FIG. 13 has, as data items, a frame number, a cluster number, a start Z value, an end Z value, and a count. The frame number is an identifier of the frame. The cluster number is an identifier of the cluster. The start Z value is the Z value at which the cluster begins, and the end Z value is the Z value at which it ends. The count is the number of corresponding points in the cluster.

FIG. 14 is a configuration diagram of an example of the corresponding point cluster table. The corresponding point cluster table 58 of FIG. 14 has, as data items, a frame number, a corresponding point number, and a cluster number. The frame number is an identifier of the frame, the corresponding point number is an identifier of the corresponding point, and the cluster number is an identifier of the cluster.

In step S5 of FIG. 3, the thumbnail evaluation unit 46 calculates, as described later, a three-dimensionality value (evaluation value) for the frame based on the clustering result. In other words, the thumbnail evaluation unit 46 calculates an evaluation value indicating how suitable the current frame is as a thumbnail.

In step S6, the thumbnail evaluation unit 46 stores the calculated three-dimensionality value (evaluation value) of the frame in the frame evaluation table 59 shown in FIG. 15. FIG. 15 is a configuration diagram of an example of the frame evaluation table. The frame evaluation table 59 of FIG. 15 has a frame number and an evaluation value as data items. The frame number is an identifier of the frame, and the evaluation value is the three-dimensionality value of the frame.

In step S7, the thumbnail evaluation unit 46 determines whether there is another frame to be analyzed (that is, a frame whose three-dimensionality value has yet to be calculated). If there is, the thumbnail evaluation unit 46 requests the frame image acquisition unit 41 to acquire the next frame image. Steps S1 to S6 are thus repeated for every frame to be analyzed.

If it determines that no frames remain to be analyzed, the thumbnail evaluation unit 46 requests the thumbnail extraction unit 47 to extract a frame for the thumbnail. In step S8, the thumbnail extraction unit 47 extracts, as the thumbnail frame, the frame with the highest three-dimensionality value (evaluation value) in the frame evaluation table 59. The thumbnail extraction unit 47 may also weight the values according to the position of the frame in the 3D video (for example, near the beginning or the end), depending, for example, on the type of 3D video.
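The selection in step S8 is an argmax over the frame evaluation table 59. The sketch below is a hedged illustration, not the patent's implementation: the table is assumed to be a dictionary from frame number to evaluation value, and the optional position weighting is modelled by a hypothetical `position_weight` function whose result multiplies the score (the text does not say how the weight is applied).

```python
from typing import Callable, Dict, Optional


def select_thumbnail_frame(
    evaluations: Dict[int, float],
    position_weight: Optional[Callable[[int], float]] = None,
) -> int:
    """Return the frame number with the highest (optionally weighted) evaluation value.

    `evaluations` stands in for the frame evaluation table 59
    (frame number -> evaluation value). `position_weight` is a hypothetical
    hook mapping a frame number to a multiplicative weight.
    """
    if not evaluations:
        raise ValueError("no frames have been evaluated")

    def weighted(frame: int) -> float:
        value = evaluations[frame]
        if position_weight is not None:
            value *= position_weight(frame)
        return value

    # On a tie, max() returns the first frame encountered.
    return max(evaluations, key=weighted)


table = {0: 0.4, 1: 0.9, 2: 0.7}
print(select_thumbnail_frame(table))                                    # 1
print(select_thumbnail_frame(table, lambda k: 2.0 if k == 2 else 1.0))  # 2
```

Multiplying rather than adding the weight keeps an unweighted frame (weight 1.0) unchanged; either choice would fit the description.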

The details of step S3 in FIG. 3 are as follows. The feature point pair extraction unit 44 associates the feature amounts extracted from the right frame image with those extracted from the left frame image of the same frame, as stored in the local feature table 53.

Letting R be the set of feature numbers of the right frame image and L the set of feature numbers of the left frame image, the task is to find i and j that satisfy equation (1) below. Here, f_R(i) is the i-th feature amount of the right frame image, f_L(j) is the j-th feature amount of the left frame image, and dist is a distance function between two feature amounts.

That is, the feature point pair extraction unit 44 calculates the distance between feature amounts for every pairing of a right-frame feature with a left-frame feature, and finds the pair (i, j) with the minimum distance. At the same time, it retains the distance between the feature amounts of that minimum pair (the return value of the dist function). If this distance is less than or equal to a predetermined distance threshold, the pair is stored in the corresponding point table 54 as a corresponding point.
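A minimal sketch of the exhaustive matching described above, assuming (as in the flowchart of FIG. 18) that the minimum is taken separately for each left feature point, and assuming Euclidean distance as dist via Python's `math.dist`. The function name and list-of-vectors input format are illustrative only.

```python
import math
from typing import List, Sequence, Tuple


def match_features(
    right: List[Sequence[float]],
    left: List[Sequence[float]],
    threshold: float,
) -> List[Tuple[int, int, float]]:
    """Exhaustive nearest-neighbour matching in the spirit of equation (1).

    For each left feature f_L(j), find the right feature f_R(i) minimising
    dist(f_R(i), f_L(j)); keep (i, j, distance) only if distance <= threshold.
    """
    matches = []
    for j, f_l in enumerate(left):
        best_i, best_d = -1, math.inf
        for i, f_r in enumerate(right):
            d = math.dist(f_r, f_l)  # Euclidean distance (assumed choice of dist)
            if d < best_d:
                best_i, best_d = i, d
        if best_i >= 0 and best_d <= threshold:
            matches.append((best_i, j, best_d))
    return matches


right = [[0.0, 0.0], [10.0, 10.0]]
left = [[0.5, 0.0], [9.0, 10.0], [50.0, 50.0]]
# Left features 0 and 1 find matches; left feature 2 is too far from everything.
print(match_features(right, left, threshold=2.0))
```

This costs one distance evaluation per left/right pair, which is why the text goes on to restrict the search.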

Although equation (1) matches every feature amount of the left frame image against every feature amount of the right frame image, the search may, for speed, be limited to pairs that lie within a certain distance of each other.

For example, let x_R(i) and y_R(i) be the X and Y coordinates of the i-th feature point of the right frame image, let x_L(j) and y_L(j) be the X and Y coordinates of the j-th feature point of the left frame image, and set thresholds T_x and T_y. Then only pairs that satisfy equation (2) below are considered.

In general, the lenses of a 3D camera are arranged side by side horizontally. Because the differences between the left and right images are therefore larger in the X coordinate, T_x is set to a larger value than T_y. A common measure such as the Euclidean distance can be used as the distance function, or a distance function suited to each type of local feature amount may be used.
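Equation (2) itself is not reproduced in this text. Assuming it bounds the absolute coordinate differences, |x_R(i) − x_L(j)| ≤ T_x and |y_R(i) − y_L(j)| ≤ T_y, the window test and the resulting candidate list can be sketched as follows (function names are hypothetical):

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def within_window(p_r: Point, p_l: Point, t_x: float, t_y: float) -> bool:
    """Assumed form of equation (2): |x_R - x_L| <= T_x and |y_R - y_L| <= T_y."""
    return abs(p_r[0] - p_l[0]) <= t_x and abs(p_r[1] - p_l[1]) <= t_y


def candidate_pairs(
    right: Sequence[Point], left: Sequence[Point], t_x: float, t_y: float
) -> List[Tuple[int, int]]:
    """Return the (i, j) index pairs worth comparing by feature distance."""
    return [
        (i, j)
        for j, p_l in enumerate(left)
        for i, p_r in enumerate(right)
        if within_window(p_r, p_l, t_x, t_y)
    ]


# T_x is wider than T_y because the two lenses are offset horizontally.
print(candidate_pairs([(100, 50), (300, 50), (118, 60)], [(120, 51)], t_x=40, t_y=4))
# [(0, 0)]
```

The third right point is only 2 pixels away horizontally but 9 pixels away vertically, so the narrow T_y rejects it.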

When the Euclidean distance is used, it is given by equation (3) below. Equation (3) is for two n-dimensional vectors v_1 and v_2 whose i-th components are v_1(i) and v_2(i), respectively. The feature point pair extraction unit 44 stores, for each extracted corresponding point, the feature numbers (i, j) in the right and left frame images together with its distance value in the corresponding point table 54.
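The Euclidean distance between two n-dimensional vectors is d(v_1, v_2) = sqrt(Σ_{i=1}^{n} (v_1(i) − v_2(i))^2). A direct transcription, written out explicitly rather than via `math.dist` so that the sum in the formula is visible (the function name is illustrative):

```python
import math
from typing import Sequence


def euclidean_distance(v1: Sequence[float], v2: Sequence[float]) -> float:
    """sqrt(sum over i = 1..n of (v1(i) - v2(i))^2) for two n-dimensional vectors."""
    if len(v1) != len(v2):
        raise ValueError("feature vectors must have the same number of dimensions")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```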

When the corresponding point table of FIG. 10 is used, the feature point pair extraction unit 44 calculates the Z value (depth distance) of each feature point stored as a corresponding point from the X coordinates stored in the local feature table 53 and the distance value stored in the corresponding point table 54. The Z value can be calculated using equation (4) below.

Here, f_C is the focal length, a parameter of the camera at the time the 3D image was captured; X_C is the distance between the two lenses of the camera that captured the 3D image; x_R is the X coordinate of the corresponding point in the right frame image; and x_L is the X coordinate of the corresponding point in the left frame image. Both lenses are assumed to have the same focal length. The parameters of equation (4) are illustrated, for example, in FIG. 16, which is a schematic view of an example of a 3D camera seen from above.
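Equation (4) is not reproduced here. For two parallel pinhole cameras with equal focal length, the standard relation is Z = f_C · X_C / (x_L − x_R), and the sketch below assumes equation (4) has this form. The sign convention (x_L > x_R for a point in front of the cameras) is also an assumption. Defaults of 1.0 reflect the later remark that only relative Z values are needed.

```python
def depth_from_disparity(
    x_r: float, x_l: float, focal_length: float = 1.0, baseline: float = 1.0
) -> float:
    """Relative depth Z = f_C * X_C / (x_L - x_R) for a parallel stereo rig (assumed form)."""
    disparity = x_l - x_r
    if disparity <= 0:
        # Under the assumed sign convention, a point in front of the cameras has x_L > x_R.
        raise ValueError("expected x_L > x_R")
    return focal_length * baseline / disparity


near = depth_from_disparity(x_r=80.0, x_l=100.0)  # disparity 20
far = depth_from_disparity(x_r=95.0, x_l=100.0)   # disparity 5
print(near < far)  # True: a larger horizontal shift means the point is closer
```

Because Z depends on the disparity only through 1 / (x_L − x_R), any positive f_C and X_C preserve the depth ordering of the points.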

FIG. 17 is a conceptual diagram of an example of a method of calculating Z values from parallax. FIG. 17(A) shows the parallax between the two lenses. FIG. 17(B) shows corresponding points (feature point pairs) extracted from the right and left frame images of the same frame; feature points are drawn as circles, and paired feature points are joined by lines. In FIG. 17(B), pairs with small Z values are joined by thin lines and pairs with large Z values by thick lines.

As FIG. 17 shows, the closer an object is to the lenses, the larger the horizontal shift of its feature point between the left and right frame images. The depth distance can therefore be calculated from this horizontal shift.

The processing from step S3 of FIG. 3 onward uses relative rather than absolute Z values, so the focal length and the distance between the two lenses may simply be set to fixed positive values. Moreover, as equation (4) shows, the Z value is inversely proportional to the distance value of the corresponding point. If the distance values of the corresponding points are used in step S4, the Z values need not be calculated in step S3. When Z values are calculated, the feature point pair extraction unit 44 stores them in the corresponding point table 54.

The processing of step S3 in FIG. 3 can be performed, for example, according to the flowchart of FIG. 18, which shows an example of the processing in step S3.

In step S11, the feature point pair extraction unit 44 selects one feature point of the left frame image stored in the local feature table 53 and acquires its feature amount. In step S12, it selects a feature point of the right frame image whose coordinates are close to those of the left-image feature point selected in step S11, and acquires its feature amount.

In step S13, the feature point pair extraction unit 44 calculates the distance value between the feature amounts of the two feature points selected in steps S11 and S12. In step S14, if the calculated distance value is the smallest so far, it stores the two feature points selected in steps S11 and S12 together with the distance value calculated in step S13.

Proceeding to step S15, the feature point pair extraction unit 44 determines whether there are other feature points in the right frame image. If there are, it repeats steps S12 to S14 for those feature points. When no right-image feature points remain, in step S16 it calculates the difference between the X coordinates of the two feature points with the minimum feature distance value, and their Z value.

In step S17, if the minimum feature distance value is less than or equal to the threshold, the feature point pair extraction unit 44 stores the two feature points with that minimum distance as a corresponding point in the corresponding point table 54. In step S18, it determines whether there are other feature points in the left frame image.

If there are, the feature point pair extraction unit 44 repeats steps S11 to S17 for the other feature points of the left frame image. When no left-image feature points remain, it ends the processing shown in FIG. 18.
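The loop of FIG. 18 can be sketched as follows. This is an illustration under stated assumptions, not the patent's code: "coordinates are close" in step S12 is modelled with the T_x/T_y window, dist is assumed to be Euclidean (`math.dist`), depth uses the assumed parallel-stereo form Z = f_C · X_C / (x_L − x_R), and the `Feature`/`CorrespondingPoint` classes are hypothetical stand-ins for the local feature table 53 and corresponding point table 54. Step S16 is evaluated only for pairs that pass the S17 threshold, which yields the same table contents.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Feature:
    x: float
    y: float
    descriptor: Sequence[float]


@dataclass
class CorrespondingPoint:
    right_index: int
    left_index: int
    distance: float      # feature distance of the best match (S13/S14)
    x_difference: float  # x_L - x_R (S16)
    z: Optional[float]   # relative depth (S16); None if the disparity is not positive


def extract_corresponding_points(
    left: List[Feature], right: List[Feature],
    t_x: float, t_y: float, threshold: float,
    focal_length: float = 1.0, baseline: float = 1.0,
) -> List[CorrespondingPoint]:
    table = []
    for j, f_l in enumerate(left):                          # S11, repeated via S18
        best_i: Optional[int] = None
        best_d = math.inf
        for i, f_r in enumerate(right):                     # S12, repeated via S15
            if abs(f_r.x - f_l.x) > t_x or abs(f_r.y - f_l.y) > t_y:
                continue                                     # coordinates not close enough
            d = math.dist(f_r.descriptor, f_l.descriptor)   # S13
            if d < best_d:                                   # S14: keep the minimum
                best_i, best_d = i, d
        if best_i is None or best_d > threshold:             # S17: threshold test
            continue
        dx = f_l.x - right[best_i].x                         # S16: X difference
        z = focal_length * baseline / dx if dx > 0 else None
        table.append(CorrespondingPoint(best_i, j, best_d, dx, z))
    return table


left = [Feature(110, 50, [1.0, 0.0]), Feature(205, 80, [0.0, 1.0])]
right = [Feature(100, 50, [1.0, 0.1]), Feature(200, 81, [0.0, 1.0]),
         Feature(400, 50, [1.0, 0.0])]  # same descriptor as left[0], but outside the window
for cp in extract_corresponding_points(left, right, t_x=30, t_y=3, threshold=0.5):
    print(cp.right_index, cp.left_index, cp.x_difference, cp.z)
```

The third right feature shows why the window matters: its descriptor is identical to that of the first left feature, but it lies 290 pixels away and is never compared.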

The details of step S4 in FIG. 3 are as follows. The depth clustering unit 45 performs clustering based on the Z values or distance values obtained in step S3 and creates groups of corresponding points with similar depth distances.

A histogram-based method can be used for the clustering. The depth clustering unit 45 divides the Z values into ranges (histogram bins) at specific intervals and first initializes the value of each bin to 0. Referring to the corresponding point table 54, it then counts the corresponding points whose Z values fall within each bin's range and uses that count as the bin's value. The depth clustering unit 45 stores the calculated bin values in the Z value histogram table 55.
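With fixed-width bins, the counting in steps S21 to S24 reduces to one integer division per corresponding point. In this sketch the bins are assumed to be half-open ranges [start, end) and Z values outside the histogram range are assumed to be ignored; the function name and parameters are hypothetical.

```python
from typing import List, Sequence


def z_histogram(z_values: Sequence[float], z_min: float, bin_width: float, num_bins: int) -> List[int]:
    counts = [0] * num_bins                  # S21: every bin starts at 0
    for z in z_values:                       # S22/S24: visit each corresponding point
        k = int((z - z_min) // bin_width)    # bin whose range [start, end) contains z
        if 0 <= k < num_bins:                # S23: add 1 to that bin
            counts[k] += 1
    return counts


print(z_histogram([0.5, 1.2, 1.9, 3.1], z_min=0.0, bin_width=1.0, num_bins=4))  # [1, 2, 0, 1]
```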

FIG. 19 is an example histogram of the calculated bin values; it corresponds to the conceptual diagram of FIG. 17. The Z-value range of each bin may be a fixed interval or may vary with the magnitude of the Z value. According to equation (4), the Z value increases sharply as the distance value of a corresponding point approaches 0, so the bins may be made wider for larger Z values. If necessary, the depth clustering unit 45 records which corresponding points fall into each bin in the corresponding point histogram table 56.
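One way to make the bins wider at large Z is to space the bin edges geometrically. This is a swapped-in choice; the text only says the large-Z bins may be wider. The sketch assumes positive Z values and half-open bins [edges[k], edges[k+1]), and uses the standard-library `bisect` module to locate a value's bin.

```python
import bisect
from typing import List, Optional


def geometric_bin_edges(z_min: float, z_max: float, num_bins: int) -> List[float]:
    """Bin edges whose widths grow geometrically, so bins are wider at large Z."""
    if not 0 < z_min < z_max:
        raise ValueError("expected 0 < z_min < z_max")
    ratio = (z_max / z_min) ** (1.0 / num_bins)
    return [z_min * ratio ** k for k in range(num_bins + 1)]


def bin_index(edges: List[float], z: float) -> Optional[int]:
    """Index k with edges[k] <= z < edges[k + 1], or None if z is out of range."""
    k = bisect.bisect_right(edges, z) - 1
    return k if 0 <= k < len(edges) - 1 else None


edges = geometric_bin_edges(1.0, 16.0, 4)
print([round(e, 6) for e in edges])  # [1.0, 2.0, 4.0, 8.0, 16.0]
print(bin_index(edges, 3.0))         # 1
```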

The depth clustering unit 45 finds the peaks (bins with locally large values) of the histogram stored in the Z value histogram table 55, and forms one cluster from each peak bin together with the bins immediately before and after it.

Letting h(i) be the count of histogram bin i, the depth clustering unit 45 treats bin i as a peak if h(i) > h(i−1) and h(i) > h(i+1). It combines bin i with the neighboring bins i−1 and i+1 into one cluster and sets the number of corresponding points in the cluster to c = h(i−1) + h(i) + h(i+1). The depth clustering unit 45 stores this number in the Z value cluster table 57. If necessary, it also records, based on the corresponding point histogram table 56, which corresponding points belong to each cluster in the corresponding point cluster table 58.
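The peak rule above translates directly into a scan over the interior bins. Two assumptions are made in this sketch: the first and last bins, which lack a neighbour on one side, are never treated as peaks, and a bin lying between two peaks is counted in both clusters (the text does not say how such overlaps are handled).

```python
from typing import List, Sequence, Tuple


def histogram_peak_clusters(h: Sequence[int]) -> List[Tuple[int, int, int, int]]:
    """Return (peak_bin, first_bin, last_bin, c) for every local peak of h.

    Bin i is a peak if h(i) > h(i-1) and h(i) > h(i+1); its cluster spans
    bins i-1..i+1 and holds c = h(i-1) + h(i) + h(i+1) corresponding points.
    """
    clusters = []
    for i in range(1, len(h) - 1):  # end bins have only one neighbour
        if h[i] > h[i - 1] and h[i] > h[i + 1]:
            clusters.append((i, i - 1, i + 1, h[i - 1] + h[i] + h[i + 1]))
    return clusters


print(histogram_peak_clusters([0, 3, 1, 0, 5, 2, 0]))
# [(1, 0, 2, 4), (4, 3, 5, 7)]
```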

FIG. 20 is a conceptual diagram of an example frame image that visually shows the corresponding points included in each cluster; corresponding points in the same cluster are enclosed by a line. FIG. 20 corresponds to the histogram shown in FIG. 19.

To improve the accuracy of the histogram, one approach is to compare each feature point's Z value with those of nearby feature points and remove feature points that have no nearby point with a similar Z value. For example, the depth clustering unit 45 acquires the X and Y coordinates of the feature points from the corresponding point cluster table 58, the corresponding point table 54, and the local feature table 53.

Coordinates from either the right or the left frame image may be used, but the same frame image must be used for all feature points. The depth clustering unit 45 computes the differences in X and Y coordinates between feature points, and if no other feature point lies within a predetermined distance threshold of a given feature point, it excludes that feature point from the cluster.
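A minimal sketch of this isolated-point filter, assuming the "distance from the X and Y differences" is the Euclidean distance. A single pass against the original cluster gives the same result as deleting points one at a time, because a deleted point has no neighbour within the threshold and so was never anyone else's neighbour either.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def remove_isolated_points(cluster: List[Point], threshold: float) -> List[Point]:
    """Keep a feature point only if another point of the same cluster lies within `threshold`.

    All (x, y) coordinates must come from the same frame image (left or right).
    """
    kept = []
    for a, p in enumerate(cluster):                    # pick each point in turn
        has_neighbour = any(
            math.dist(p, q) <= threshold               # distance test
            for b, q in enumerate(cluster) if b != a   # against every other point
        )
        if has_neighbour:
            kept.append(p)                             # isolated points are dropped
    return kept


print(remove_isolated_points([(0, 0), (1, 0), (50, 50)], threshold=2.0))
# [(0, 0), (1, 0)]
```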

In addition, if a cluster consists of only a few feature points, the depth clustering unit 45 deletes the cluster itself. For example, given a predetermined threshold, it deletes any cluster whose number of feature points is less than or equal to that threshold.
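Deleting small clusters is a single filter. In this sketch, clusters are assumed to be stored as a dictionary from cluster number to the list of its points (a hypothetical stand-in for the corresponding point cluster table 58). A cluster survives only if it has strictly more points than the threshold.

```python
from typing import Dict, List, TypeVar

T = TypeVar("T")


def drop_small_clusters(clusters: Dict[int, List[T]], threshold: int) -> Dict[int, List[T]]:
    """Delete every cluster whose number of feature points is <= threshold."""
    return {cid: pts for cid, pts in clusters.items() if len(pts) > threshold}


print(drop_small_clusters({0: ["p1", "p2", "p3"], 1: ["p4"]}, threshold=1))
# {0: ['p1', 'p2', 'p3']}
```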

The processing of step S4 in FIG. 3 can be performed, for example, according to the flowchart of FIG. 21, which shows an example of the processing in step S4.

In step S21, the depth clustering unit 45 initializes the value of each bin in the Z value histogram table 55 to 0. In step S22, it refers to the corresponding point table 54 and acquires a corresponding point and its Z value. In step S23, it adds 1 to the value of the bin in the Z value histogram table 55 whose range contains that Z value.

In step S24, the depth clustering unit 45 refers to the corresponding point table 54 and determines whether there are other corresponding points. If there are, it repeats steps S22 and S23. Otherwise, in step S25, it refers to the Z value histogram table 55 and finds a peak of the histogram.

In step S26, the depth clustering unit 45 groups the corresponding points near the peak into a cluster and stores it in the corresponding point cluster table 58. In step S27, it determines whether there are other histogram peaks; if there are, it repeats steps S25 and S26. Otherwise, in step S28, it removes from the cluster any feature point that has no nearby feature point in the same cluster, as described later.

In step S29, the depth clustering unit 45 deletes the cluster if its number of feature points is less than or equal to the threshold. In step S30, it determines whether there are other clusters; if there are, it repeats steps S28 and S29. When no clusters remain, the depth clustering unit 45 ends the processing shown in FIG. 21.

The details of step S5 in FIG. 3 are as follows. Based on the clustering result, the thumbnail evaluation unit 46 calculates an evaluation value indicating whether the current frame is suitable as a thumbnail.

The following indicators are used for the frame evaluation value. The first indicator is whether the frame contains more than one cluster. A frame containing only one cluster naturally does not look three-dimensional. A frame looks more three-dimensional when it contains about three clusters. Conversely, if a frame contains too many clusters, it looks cluttered overall and its depth becomes hard to perceive.

The second indicator is whether the corresponding points in each cluster are grouped closely together. If the corresponding points of a cluster are scattered across the image, the frame is harder to perceive three-dimensionally. The third indicator is whether the Z-value distance between clusters is large. A frame looks more three-dimensional when the Z-value distance between its clusters is large.

The fourth indicator treats the rearmost cluster as the background and asks whether it contains many feature points. A frame looks more three-dimensional the more feature points its background contains. In terms of composition, a frame also looks more three-dimensional when the background feature points are concentrated toward the top of the screen.

The thumbnail evaluation unit 46 scores the frame on the above indicators to obtain its three-dimensionality value. For the number of clusters, a weight w_c and an evaluation function f_c that takes the number of clusters as its argument are used. The weight w_c is a predetermined constant. Equation (5) below, for example, can be used as the evaluation function f_c.

The grouping of the corresponding points in each cluster can be evaluated using a measure such as an average distance. Letting w and h be the image width and height, C the set of clusters, n the number of corresponding points in a cluster, and x(i) and y(i) the X and Y coordinates of corresponding point i, the evaluation function f_s for the grouping of corresponding points can be expressed as equation (6) below.

For the size of the Z-value distance between clusters, letting z_b be the Z value of the rearmost cluster and z_f the Z value of the frontmost cluster, the evaluation function f_z can be expressed as equation (7) below.

For the background evaluation, letting C_b be the rearmost cluster, n the number of corresponding points in that cluster, h the image height, and y(i) the Y coordinate of corresponding point i, the background evaluation function f_b can be expressed as equation (8) below.

The frame evaluation function f is obtained by combining the above four evaluation functions with weights, as expressed in equation (9).
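Equations (5) to (9) are not reproduced in this text, so the sketch below covers only the final combination step. It assumes equation (9) is a linear weighted sum, f = w_c·f_c + w_s·f_s + w_z·f_z + w_b·f_b; only w_c is named in the text, and the other weight names are hypothetical. The component scores are taken as given inputs rather than guessed.

```python
from typing import Mapping


def frame_evaluation(scores: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Assumed linear combination f = sum over k of w_k * f_k of the indicator scores."""
    missing = set(scores) - set(weights)
    if missing:
        raise ValueError(f"no weight given for: {sorted(missing)}")
    return sum(weights[name] * value for name, value in scores.items())


scores = {"f_c": 1.0, "f_s": 0.5, "f_z": 2.0, "f_b": 0.25}
weights = {"f_c": 1.0, "f_s": 2.0, "f_z": 0.5, "f_b": 4.0}
print(frame_evaluation(scores, weights))  # 4.0
```

Because the function sums over whatever scores it is given, the same code also covers using only a subset of the indicators, or adding a fifth term for a cross-frame background score.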

The processing of step S5 in FIG. 3 can be performed, for example, according to the flowchart of FIG. 22, which shows an example of the processing in step S5.

In step S41, the thumbnail evaluation unit 46 calculates an evaluation value for the number of clusters. In step S42, it calculates an evaluation value for how closely the points within each cluster are grouped. In step S43, it calculates an evaluation value for the Z-value difference between clusters. In step S44, it calculates an evaluation value for the background cluster. Then, in step S45, it combines all the evaluation values calculated in steps S41 to S44 with weights to obtain the frame's evaluation value.

The processing of step S5 shown in FIG. 22 is only an example; any processing that calculates the evaluation value using one or more of steps S41 to S44 may be used.

The details of step S28 in FIG. 21 are as follows. FIG. 23 is a flowchart of an example of the processing in step S28. In step S51, the depth clustering unit 45 refers to the corresponding point table 54 and the corresponding point cluster table 58, selects one feature point in the cluster, and acquires its position.

In step S52, the depth clustering unit 45 selects another feature point in the cluster and acquires its position. In step S53, it calculates the difference (distance) between the positions of the feature points selected in steps S51 and S52.

Proceeding to step S54, the depth clustering unit 45 determines whether the distance calculated in step S53 is less than or equal to a threshold. If it is not, in step S55 the unit determines whether there are other feature points in the cluster. If there are, it repeats steps S52 to S54. If there are none, in step S56 it deletes the feature point selected in step S51 from the cluster.

If the distance is less than or equal to the threshold in step S54, or after the processing of step S56, the depth clustering unit 45 determines in step S57 whether there are other feature points in the cluster. If there are, it repeats steps S51 to S56. If there are none, it ends the processing shown in FIG. 23.

FIG. 24 is a functional block diagram of another example of the thumbnail extraction device of this embodiment. The thumbnail extraction device 40A of FIG. 24 adds background evaluation, as a further evaluation axis, to the thumbnail extraction device 40 of FIG. 2. Specifically, the thumbnail extraction device 40A of FIG. 24 adds a background evaluation unit 48 to the configuration of the thumbnail extraction device 40 of FIG. 2. Particularly for home video, such as footage shot at a travel destination, it is effective to extract as the thumbnail a portion whose background is distinctive.

The thumbnail extraction device 40A of FIG. 24 is described below, focusing on its differences from the thumbnail extraction device 40 of FIG. 2. The flowchart of the thumbnail extraction device 40A differs from that of the thumbnail extraction device 40 shown in FIG. 3 in the processing of step S5.

The processing of step S5 in the thumbnail extraction device 40A can be performed, for example, by the procedure of the flowchart shown in FIG. 25. FIG. 25 is a flowchart of another example of the processing of step S5.

Steps S61 to S64 are the same as steps S41 to S44 in FIG. 22, so their description is omitted. In step S65, the background evaluation unit 48 evaluates the background common to the frames.

Because the background corresponds to the farthest (deepest) cluster, only the farthest cluster is used. Let F be the number of frames, Cb(i) the background cluster in frame i, n(i) the number of corresponding points in Cb(i), f(i, j) the feature amount of corresponding point j, and T a predetermined distance threshold. The evaluation function f_f for the background common to the frames can then be expressed by Equation (10).
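Equation (10) itself does not appear in this text. Purely as an illustration of how the quantities defined above could be combined, the sketch below scores frame i by the fraction of its n(i) background features that lie within distance T of some background feature in another frame. This scoring rule and the function name are assumptions, not the patent's f_f.

```python
import math

Feature = tuple[float, ...]


def common_background_score(background: list[list[Feature]], i: int, T: float) -> float:
    """Hypothetical stand-in for f_f.

    background[k] holds the feature amounts of the background cluster Cb(k),
    so len(background) == F and len(background[i]) == n(i).
    Returns the fraction of frame i's background features that match
    (distance <= T) a background feature of at least one other frame.
    """
    own = background[i]
    if not own:
        return 0.0
    matched = 0
    for feat in own:
        if any(
            math.dist(feat, other) <= T
            for k, frame in enumerate(background) if k != i
            for other in frame
        ):
            matched += 1
    return matched / len(own)


bg = [[(0.0, 0.0), (5.0, 5.0)], [(0.1, 0.0)], [(9.0, 9.0)]]
print(common_background_score(bg, 0, 0.5))  # 0.5
```

A frame whose background recurs across many frames scores high under this assumed rule; the patent's actual Equation (10) may weight or normalize differently.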

In step S66, the thumbnail evaluation unit 46 combines all the evaluation values calculated in steps S61 to S65 with weights to obtain an overall evaluation value. That is, the thumbnail evaluation unit 46 obtains the frame evaluation function f as a weighted combination of the five evaluation functions above, as expressed by Equation (11).

The processing of step S5 shown in FIG. 25 is only an example; any processing that calculates the evaluation value from a combination of one or more of steps S61 to S65 may be used.
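The weighted combination of step S66 (Equation (11)) can be sketched generically as f = Σ_k w_k f_k, followed by choosing the frame with the largest f. The criterion names and weights below are hypothetical; omitting a criterion or giving it weight zero corresponds to using only a subset of steps S61 to S65, as the paragraph above allows.

```python
def combined_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum f = sum_k w_k * f_k; criteria without a weight count as w_k = 0."""
    return sum(weights.get(name, 0.0) * value for name, value in scores.items())


def pick_thumbnail(frames: dict[int, dict[str, float]], weights: dict[str, float]) -> int:
    """Index of the frame with the highest combined score (ties: first in dict order)."""
    return max(frames, key=lambda idx: combined_score(frames[idx], weights))


# Hypothetical per-frame scores for two criteria.
frames = {
    0: {"cluster_count": 0.2, "background": 0.9},
    1: {"cluster_count": 0.8, "background": 0.1},
}
print(pick_thumbnail(frames, {"cluster_count": 1.0, "background": 1.0}))  # 0
print(pick_thumbnail(frames, {"cluster_count": 1.0}))                     # 1
```

Changing the weights changes which frame wins, so in practice the weights act as a tuning knob for how much each criterion matters.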

(Summary)
When 3D video is the target, the thumbnail extraction devices 40 and 40A of the first and second embodiments need to extract thumbnails that look more stereoscopic, since 3D video contains many scenes that do not look very three-dimensional. By extracting thumbnails from scenes that do appear stereoscopic, the thumbnail extraction devices 40 and 40A can select thumbnails that are more striking and visually appealing.

With the thumbnail extraction devices 40 and 40A of the first and second embodiments, the depth of an image is obtained from the characteristics of the 3D video, and a thumbnail that can be displayed stereoscopically is selected based on the resulting composition.
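The depth step can be illustrated with the standard triangulation relation for a rectified, parallel stereo rig, Z = f·B/d, where f is the focal length in pixels, B the inter-lens distance (baseline), and d the horizontal disparity of a feature point pair. Whether the patent uses exactly this relation is an assumption, and the gap-based 1-D split below is a simple substitute for its Z-value histogram clustering, not the patent's method. Function names are hypothetical.

```python
def depth_from_disparity(x_left: float, x_right: float,
                         baseline: float, focal_length: float) -> float:
    """Depth Z = focal_length * baseline / disparity (rectified parallel rig assumed)."""
    disparity = x_left - x_right          # horizontal offset of the pair, in pixels
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    return focal_length * baseline / disparity   # same length unit as baseline


def cluster_depths(depths: list[float], gap: float) -> list[list[float]]:
    """Group sorted depths, starting a new cluster when the jump exceeds `gap`."""
    clusters: list[list[float]] = []
    for z in sorted(depths):
        if clusters and z - clusters[-1][-1] <= gap:
            clusters[-1].append(z)
        else:
            clusters.append([z])
    return clusters


print(depth_from_disparity(110.0, 100.0, 0.5, 1000.0))      # 50.0
print(cluster_depths([5.1, 1.0, 20.0, 1.2, 5.0], 1.0))      # [[1.0, 1.2], [5.0, 5.1], [20.0]]
```

Larger disparities give smaller depths, so near objects and the distant background separate into different clusters; the last cluster returned here corresponds to the farthest (background) layer.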

The recommendation program of this embodiment can be provided not only as packaged software but also as a web service or the like.

The present invention may take the configurations described in the following appendices.
(Appendix 1)
A thumbnail extraction program that causes a computer to execute processing comprising:
extracting, from a frame image pair constituting one frame of 3D video, feature points that correspond between the two images of the pair as feature point pairs;
calculating a depth amount of each feature point pair based on the distance between the points of the pair and a predetermined inter-lens distance and focal length, and clustering the feature point pairs by depth amount;
evaluating the stereoscopic visibility of one or more frame image pairs from their clustering result information, based on a predetermined evaluation condition for the stereoscopic visibility of a frame image pair that is based on clustering result information; and
extracting, based on the evaluation results of the stereoscopic visibility of the one or more frame image pairs, the frame image pair with the best evaluation result as the frame image pair for a thumbnail.
(Appendix 2)
The thumbnail extraction program according to appendix 1, wherein the process of evaluating the stereoscopic visibility of the one or more frame image pairs evaluates the stereoscopic visibility of the frame image pair based on the number of clusters.
(Appendix 3)
The thumbnail extraction program according to Appendix 1 or 2, wherein the processing of evaluating the stereoscopic visibility of the one or more frame image pairs evaluates the stereoscopic visibility of each frame image pair based on the degree of positional concentration of the feature points within a cluster.
(Appendix 4)
The thumbnail extraction program according to any one of Appendices 1 to 3, wherein the processing of evaluating the stereoscopic visibility of the one or more frame image pairs evaluates the stereoscopic visibility of each frame image pair based on the difference in depth amount between clusters.
(Appendix 5)
The thumbnail extraction program according to any one of Appendices 1 to 3, wherein the processing of evaluating the stereoscopic visibility of the one or more frame image pairs evaluates the stereoscopic visibility of each frame image pair based on the number and arrangement of the farthest clusters.
(Appendix 6)
The thumbnail extraction program according to any one of Appendices 1 to 5, wherein the processing of evaluating the stereoscopic visibility of the one or more frame image pairs evaluates the stereoscopic visibility of each frame image pair based on a background common to a plurality of frames.
(Appendix 7)
The thumbnail extraction program according to any one of Appendices 1 to 6, wherein the processing of extracting the feature points from the frame image pair as feature point pairs includes:
extracting the frame image pair constituting one frame of the 3D video;
extracting feature points from the extracted frame image pair;
extracting a local feature amount of each extracted feature point; and
extracting, using the extracted local feature amounts, the feature points that correspond between the images of the frame image pair as the feature point pairs.
(Appendix 8)
A thumbnail extraction method executed by a computer,
extracting, from a frame image pair constituting one frame of 3D video, feature points that correspond between the two images of the pair as feature point pairs;
calculating a depth amount of each feature point pair based on the distance between the points of the pair and a predetermined inter-lens distance and focal length, and clustering the feature point pairs by depth amount;
evaluating the stereoscopic visibility of one or more frame image pairs from their clustering result information, based on a predetermined evaluation condition for the stereoscopic visibility of a frame image pair that is based on clustering result information; and
extracting, based on the evaluation results of the stereoscopic visibility of the one or more frame image pairs, the frame image pair with the best evaluation result as the frame image pair for a thumbnail.
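Appendices 2 to 4 name evaluation criteria (number of clusters, positional concentration of feature points within a cluster, difference in depth between clusters) without giving formulas in this text. The sketch below computes one plausible proxy for each; the `Cluster` type, the centroid-distance spread measure, and the depth-range measure are all assumptions, and the text here does not say whether larger or smaller values are preferred.

```python
import math
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Cluster:
    depth: float                                                      # representative depth (assumed)
    points: list[tuple[float, float]] = field(default_factory=list)  # image (x, y) of members


def spread(c: Cluster) -> float:
    """Mean distance of a cluster's points from their centroid (smaller = more concentrated)."""
    if not c.points:
        return 0.0
    cx = mean(x for x, _ in c.points)
    cy = mean(y for _, y in c.points)
    return mean(math.dist(p, (cx, cy)) for p in c.points)


def frame_metrics(clusters: list[Cluster]) -> dict[str, float]:
    """Hypothetical per-frame proxies for the criteria of Appendices 2 to 4."""
    depths = [c.depth for c in clusters]
    return {
        "cluster_count": float(len(clusters)),                                  # Appendix 2
        "mean_spread": mean(spread(c) for c in clusters) if clusters else 0.0,  # Appendix 3
        "depth_range": max(depths) - min(depths) if depths else 0.0,            # Appendix 4
    }


print(frame_metrics([Cluster(1.0, [(0.0, 0.0), (2.0, 0.0)]), Cluster(4.0, [(10.0, 10.0)])]))
# {'cluster_count': 2.0, 'mean_spread': 0.5, 'depth_range': 3.0}
```

Each value could then be normalized and fed into a weighted combination such as the one used in step S66.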

DESCRIPTION OF SYMBOLS
10 PC
21 Input device
22 Display device
23 PC main body
31 Main storage device
32 Arithmetic processing unit
33 Interface device
34 Recording medium reader
35 Auxiliary storage device
36 Recording medium
37 Bus
40, 40A Thumbnail extraction device
41 Frame image acquisition unit
42 Feature point extraction unit
43 Local feature extraction unit
44 Feature point pair extraction unit
45 Depth clustering unit
46 Thumbnail evaluation unit
47 Thumbnail extraction unit
48 Background evaluation unit
51 Video information table
52 Frame information table
53 Local feature table
54 Corresponding point table
55 Z-value histogram table
56 Corresponding point histogram table
57 Z-value cluster table
58 Corresponding point cluster table
59 Frame evaluation table
100 Left frame image
101 Right frame image

Claims

A thumbnail extraction program that causes a computer to execute processing comprising:
extracting, from a frame image pair constituting one frame of 3D video, feature points that correspond between the two images of the pair as feature point pairs;
calculating a depth amount of each feature point pair based on the distance between the points of the pair and a predetermined inter-lens distance and focal length, and clustering the feature point pairs by depth amount;
evaluating the stereoscopic visibility of one or more frame image pairs from their clustering result information, based on a predetermined evaluation condition for the stereoscopic visibility of a frame image pair that is based on clustering result information; and
extracting, based on the evaluation results of the stereoscopic visibility of the one or more frame image pairs, the frame image pair with the best evaluation result as the frame image pair for a thumbnail.

The thumbnail extraction program according to claim 1, wherein the process of evaluating the stereoscopic visibility of the one or more frame image pairs evaluates the stereoscopic visibility of the frame image pair based on the number of clusters.

The thumbnail extraction program according to claim 1 or 2, wherein the process of evaluating the stereoscopic visibility of the one or more frame image pairs evaluates the stereoscopic visibility of each frame image pair based on the degree of positional concentration of the feature points within a cluster.

The thumbnail extraction program according to any one of claims 1 to 3, wherein the process of evaluating the stereoscopic visibility of the one or more frame image pairs evaluates the stereoscopic visibility of each frame image pair based on the difference in depth amount between clusters.

A thumbnail extraction method executed by a computer,
extracting, from a frame image pair constituting one frame of 3D video, feature points that correspond between the two images of the pair as feature point pairs;
calculating a depth amount of each feature point pair based on the distance between the points of the pair and a predetermined inter-lens distance and focal length, and clustering the feature point pairs by depth amount;
evaluating the stereoscopic visibility of one or more frame image pairs from their clustering result information, based on a predetermined evaluation condition for the stereoscopic visibility of a frame image pair that is based on clustering result information; and
extracting, based on the evaluation results of the stereoscopic visibility of the one or more frame image pairs, the frame image pair with the best evaluation result as the frame image pair for a thumbnail.