JP2013533669A

JP2013533669A - Video summary instruction metadata storage

Info

Publication number: JP2013533669A
Application number: JP2013512654A
Authority: JP
Inventors: Aaron Thomas Deever
Original assignee: Intellectual Ventures Fund 83 LLC
Priority date: 2010-05-25
Filing date: 2011-05-17
Publication date: 2013-08-22
Also published as: US20110292244A1; US8520088B2; EP2577664B1; CN102906818B; CN102906818A; US9124860B2; WO2011149698A1; US20130336633A1; EP2577664A1

Abstract

Provided is a method for storing a video summary for a digital video sequence captured with a digital video capture device, the method comprising: capturing a digital video sequence having a plurality of video frames with the digital video capture device; storing the digital video sequence in a processor-accessible memory; identifying, from the stored digital video sequence, one or more key video snippets, each corresponding to a set of video frames; combining the key video snippets to form a video summary; and storing, in association with the stored digital video sequence, metadata indicating the video frames that correspond to the video summary, thereby specifying where the video summary is stored in the processor-accessible memory.

Description

The present invention relates to digital video processing, and more particularly to a method for forming a digital video summary.

Many digital capture devices can capture video as well as still images, but managing digital video content tends to be tedious. This is because a thumbnail of the first frame is generally used as the visual representation of a video. A thumbnail gives little insight into the content, so a user may have to watch an entire video to find out what events it contains. Users generally prefer to watch a concise summary rather than a lengthy video in full.

Digital video also poses practical problems for sharing. Many digital capture devices record at 30 to 60 frames per second with a spatial resolution of 1920 × 1080 pixels or more, so even after compression the amount of data is large enough that sharing even a short video is impractical.
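To give a feel for the data volume behind the figures above, the following minimal sketch computes the uncompressed data rate at the stated frame rates and resolution. The 3 bytes per pixel (8-bit RGB) is an assumption made for illustration; the text does not specify a pixel format or a compression ratio.

```python
def raw_bytes_per_second(width, height, fps, bytes_per_pixel=3):
    """Uncompressed video data rate in bytes per second.

    bytes_per_pixel=3 (8-bit RGB) is an illustrative assumption,
    not a value taken from the patent text.
    """
    return width * height * bytes_per_pixel * fps


# Frame rates and resolution quoted in the text.
rate_30 = raw_bytes_per_second(1920, 1080, 30)
rate_60 = raw_bytes_per_second(1920, 1080, 60)

print(f"30 fps: {rate_30 / 1e6:.1f} MB/s uncompressed")
print(f"60 fps: {rate_60 / 1e6:.1f} MB/s uncompressed")
```

Under this assumption, one second of video amounts to hundreds of megabytes before compression, which is why even a heavily compressed short clip remains large.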

A shortened version (summary) that is easier to share can be created by manually editing the video with video editing software. Manual video editing, however, is often lengthy and laborious, and most users find it burdensome. Automatic video summarization algorithms, which analyze a captured video to produce a summary, also exist. Because such algorithms must decode the video in order to analyze it, they are very complex. As a result, they cannot be run on a digital capture device so that a summary of a just-captured video is available for viewing immediately. This shortcoming hinders quick review and sharing of captured videos.

Patent Document 1: International Publication No. WO 2007/122541 pamphlet
Patent Document 2: European Patent No. 2063635
Patent Document 3: U.S. Patent No. 5,818,439
Patent Document 4: U.S. Patent Application Publication No. 2007/0182861
Patent Document 5: U.S. Patent Application Publication No. 2007/0237225
Patent Document 6: U.S. Patent No. 3,971,065
Patent Document 7: U.S. Patent No. 4,642,678
Patent Document 8: U.S. Patent No. 4,775,574
Patent Document 9: U.S. Patent No. 5,189,511
Patent Document 10: U.S. Patent No. 5,493,335
Patent Document 11: U.S. Patent No. 5,652,621
Patent Document 12: U.S. Patent No. 5,668,597
Patent Document 13: U.S. Patent No. 5,995,095
Patent Document 14: U.S. Patent No. 6,192,162
Patent Document 15: U.S. Patent No. 6,292,218
Patent Document 16: U.S. Patent No. 6,833,865
Patent Document 17: U.S. Patent No. 6,934,056
Patent Document 18: U.S. Patent No. 7,035,435
Patent Document 19: U.S. Patent No. 7,046,731
Patent Document 20: U.S. Patent No. 7,403,224
Patent Document 21: U.S. Patent No. 7,409,144
Patent Document 22: U.S. Patent No. 7,483,618
Patent Document 23: U.S. Patent No. 7,542,077
Patent Document 24: U.S. Patent Application Publication No. 2004/0052505
Patent Document 25: U.S. Patent Application Publication No. 2005/0191729
Patent Document 26: U.S. Patent Application Publication No. 2007/0183497
Patent Document 27: U.S. Patent Application Publication No. 2009/0007202

Ma, Y-F. et al., "A Generic Framework of User Attention Model and its Application in Video Summarization," IEEE Transactions on Multimedia, IEEE Service Center, Piscataway, NJ, US, vol. 7, no. 5, 1 October 2005, pages 907-919, XP01113970, ISSN: 1520-9210, DOI: 10.1109/TMM.2005.854410
Ma, Y-F. et al., "A User Attention Model for Video Summarization," Proceedings 10th ACM International Conference on Multimedia, Juan-les-Pins, France, Dec. 1-6, 2002, vol. Conf. 10, 1 December 2002, pages 533-543, XP001175055, DOI: 10.1145/641107.641116, ISBN: 978-1-58113-620-3
Divakaran, A. et al., "Video Browsing System for Personal Video Recorders," Proceedings of SPIE, The International Society for Optical Engineering, SPIE, USA, vol. 4861, 1 January 2002, pages 22-25, XP009092815, ISSN: 0277-786X, DOI: 10.1117/12.470201
Zhang, Tong, "Intelligent Keyframe Extraction for Video Printing," Proceedings of SPIE, The International Society for Optical Engineering, SPIE, USA, vol. 5601, 1 January 2004, pages 25-35, XP009093166, ISSN: 0277-786X, DOI: 10.1117/12.572474

It is therefore desirable to provide systems and methods that can form a video summary within a digital capture device, and in particular techniques that shorten the time between the end of video capture and the generation of the video summary on the device.

The method according to the present invention is a method for storing a video summary for a digital video sequence captured with a digital video capture device, comprising:
capturing a digital video sequence having a plurality of video frames with the digital video capture device;
storing the digital video sequence in a processor-accessible memory;
identifying, from the stored digital video sequence, one or more key video snippets, each corresponding to a set of video frames;
combining the key video snippets to form a video summary; and
storing, in association with the stored digital video sequence, metadata indicating the video frames that correspond to the video summary, thereby specifying where the video summary is stored in the processor-accessible memory.
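The steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `StoredVideo` class, the `video_summary` metadata key, and the representation of a key video snippet as an inclusive `(start_frame, end_frame)` pair are all hypothetical, and the snippets are taken as given because this passage does not say how they are identified.

```python
from dataclasses import dataclass, field


@dataclass
class StoredVideo:
    """Hypothetical stand-in for a stored digital video sequence."""
    frames: list                                   # one entry per video frame
    metadata: dict = field(default_factory=dict)   # stored with the sequence


def merge_snippets(snippets):
    """Combine inclusive (start, end) frame ranges into ordered, disjoint ranges."""
    merged = []
    for start, end in sorted(snippets):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def store_summary_metadata(video, key_snippets):
    """Record the summary as frame ranges; no frames are copied or re-encoded."""
    video.metadata["video_summary"] = merge_snippets(key_snippets)


def summary_frames(video):
    """Resolve the metadata back to frames; fall back to the full sequence."""
    ranges = video.metadata.get("video_summary")
    if ranges is None:
        return list(video.frames)
    selected = []
    for start, end in ranges:
        selected.extend(video.frames[start:end + 1])
    return selected


video = StoredVideo(frames=list(range(100)))
store_summary_metadata(video, [(40, 49), (10, 19), (15, 24)])
print(video.metadata["video_summary"])   # frame ranges that make up the summary
print(len(summary_frames(video)))        # number of frames in the summary
```

The point of the design is that the summary exists only as references into the stored sequence, so the original frames stay untouched and a reader that does not look for the metadata still sees a normal video.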

An advantage of the present invention is that, because the video summary is stored as metadata within the digital video file, the summary need not be encoded as a separate file. A smart video player that understands the metadata indicating the video summary can readily use the summary, while other video players simply ignore it.

A further advantage of storing the video summary in the same digital video file, in association with the original digital video sequence, is that the summary is copied or shared along with the sequence whenever the sequence is copied or shared.

A high-level diagram showing the components of a system for forming a video summary according to an embodiment of the present invention.
A flowchart of a method for forming a video summary according to an embodiment of the present invention.
A flowchart of a method for forming a video summary using user feedback according to an embodiment of the present invention.
A flowchart of a method for forming a video summary in which the summary is stored as metadata according to an embodiment of the present invention.
A flowchart of a method for displaying a video summary according to an embodiment of the present invention.

Preferred embodiments of the present invention that would ordinarily be implemented as software programs are described in detail below. As those skilled in the art will readily recognize, the equivalent of such software can also be constructed in hardware. Because image manipulation algorithms and systems are well known, the description below concentrates on algorithms and systems that form part of, or cooperate directly with, the system and method of the present invention. Other aspects of such algorithms and systems, and the hardware or software for producing or otherwise processing the associated image signals, can be selected from systems, algorithms, components, and elements known in the art and are not specifically shown or described. Given the following description of the system of the present invention, software that is useful for implementing the invention but is not specifically shown, suggested, or described here can be realized in accordance with the prior art and the ordinary skill in the art.

A computer program for performing the method of the present invention can be stored on a computer-readable storage medium, for example: magnetic storage media such as a magnetic disk (e.g., a hard disk or floppy disk) or magnetic tape; optical storage media such as an optical disc, optical tape, or machine-readable bar code; solid-state electronic storage devices such as random access memory (RAM) or read-only memory (ROM); or any other tangible device or medium that can store a computer program for controlling one or more computers to practice the method of the present invention.

The present invention includes combinations of the embodiments described herein. A reference to "a particular example" or the like refers to a feature that can be adopted in at least one embodiment of the invention. Separate references to "an embodiment" or "a particular example" do not necessarily refer to the same embodiment; however, such embodiments are not mutually exclusive unless so indicated or unless this is readily apparent to one skilled in the art. The use of the singular or plural in referring to "method" or "methods" and the like is not limiting. Note that the word "or" is used in a non-exclusive sense unless explicitly indicated otherwise or required by context.

Because digital cameras that include imaging devices, associated signal capture and processing circuitry, displays, and the like are well known, the description below concentrates on elements that form part of, or cooperate directly with, the method and apparatus of the present invention. Elements not specifically shown or described herein can be selected from those known in the art. Some of the embodiments described are implemented in software. Given the following description of the system of the present invention, software that is useful for implementing the invention but is not specifically shown, suggested, or described here can be realized in accordance with the prior art and the ordinary skill in the art.

The following description of a digital camera will be familiar to those skilled in the art. Many variations of the configuration described here are clearly possible, chosen to reduce cost, add features, or improve camera performance.

FIG. 1 is a block diagram of a digital photography system including a digital camera 10 capable of capturing video in accordance with an embodiment of the present invention. The camera 10 is preferably a portable, battery-operated device small enough to be easily held in the hand while capturing and reviewing images. Digital images captured by the camera 10 are stored as files in an image memory 30. In this application, terms such as "digital image" and "digital image file" refer to any digital image or digital image file, whether a still image or a video.

The digital camera 10 of this embodiment has both video capture and still capture functions. The present invention can also be practiced as a digital video camera that captures only video, or as a device that also provides other functions, such as those of a digital music player (e.g., an MP3 player), a mobile phone, a GPS receiver, or a personal digital assistant (PDA).

The digital camera 10 includes a lens 4 with an associated adjustable aperture and adjustable shutter 6. In this embodiment the lens 4 is a zoom lens controlled by a zoom and focus motor driver 8. The lens 4 focuses light from a scene (not shown) onto an image sensor 14, for example a single-chip color CCD or CMOS image sensor. The lens 4 is one type of optical system that can form an image of the scene on the image sensor 14. The invention can also be practiced with an optical system that uses a fixed focal length lens with either variable or fixed focus.

The output of the image sensor 14 is converted to digital data by an analog signal processor (ASP) and analog-to-digital (A/D) converter 16 and temporarily stored in a buffer memory 18. The image data held in the buffer memory 18 is then manipulated by a processor 20 under the control of embedded software programs (e.g., firmware) stored in a firmware memory 28. In this embodiment the firmware memory 28 is a read-only memory (ROM) that holds the software program permanently, but the invention can also be practiced with a modifiable memory, such as flash EPROM, as the firmware memory 28. In the latter case, an external device can be connected through a wired interface 38 or a wireless modem 50 to update the software program in the firmware memory 28, and the firmware memory 28 can also hold data that must be retained when the camera is powered off, such as image sensor calibration data and user setting data. Although not shown, in this embodiment the processor 20 has an associated program memory; the software program in the firmware memory 28 is copied into it before being executed by the processor 20.

It will be understood that the processor 20 provides a variety of functions. These can be implemented with one or more programmable processors such as a digital signal processor (DSP), with one or more custom circuits such as custom integrated circuits (ICs) designed for digital cameras, or with a combination of programmable processors and custom circuits. It will also be understood that some or all of the components shown in FIG. 1 can be connected to the processor 20 over a common data bus. For example, the processor 20, the buffer memory 18, the image memory 30, and the firmware memory 28 can be connected by a common data bus.

The processed image data is stored in the image memory 30. The image memory 30 can take any form known to those skilled in the art, including a removable flash memory card, internal flash memory chips, magnetic memory, or optical memory. The image memory 30 can also combine internal flash memory chips with a standard interface to a removable flash memory card. Usable memory cards include Secure Digital (SD (registered trademark)) cards, microSD (registered trademark) cards, CompactFlash (CF (registered trademark)) cards, MultiMedia Cards (MMC), xD (registered trademark) cards, and Memory Stick.

The image sensor 14 is controlled by a timing generator 12, which produces various clock signals, such as row select and pixel select signals, and synchronizes the sensor with the operation of the ASP and A/D converter 16. In this example the image sensor 14 has 12.4 megapixels (4088 × 3040 pixels), so it can produce still image data of approximately 4000 × 3000 pixels. As is customary, a color filter array is overlaid on the image sensor 14, forming an array of pixels of different colors, so that the sensor can provide a color image. The pixel colors can be arranged in many different patterns. One example is the well-known Bayer color filter array, described in commonly assigned Patent Document 6 (inventor: Bayer; title: "Color imaging array"), the contents of which are incorporated herein by reference. Another example is the pattern described in commonly assigned Patent Document 25 (inventors: Compton and Hamilton; filed July 28, 2007; title: "Image Sensor with Improved Light Sensitivity"), the contents of which are incorporated herein by reference. These are only examples; many other pixel color patterns can be used.
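As an illustration of a pixel color arrangement of the Bayer type mentioned above, the sketch below maps a pixel position to its filter color by tiling a 2 × 2 cell. The particular cell orientation (green/red over blue/green) is an assumption; the passage does not say which variant this sensor uses. What holds for any Bayer variant is that half the pixels are green and a quarter each are red and blue.

```python
# One 2x2 cell of a Bayer-type color filter array.
# The G/R over B/G orientation is an illustrative assumption.
BAYER_TILE = (("G", "R"),
              ("B", "G"))


def bayer_color(row, col, tile=BAYER_TILE):
    """Return the filter color covering the pixel at (row, col)."""
    return tile[row % 2][col % 2]


def count_colors(rows, cols):
    """Count how many pixels of each color a rows x cols mosaic contains."""
    counts = {"R": 0, "G": 0, "B": 0}
    for r in range(rows):
        for c in range(cols):
            counts[bayer_color(r, c)] += 1
    return counts


for r in range(4):
    print(" ".join(bayer_color(r, c) for c in range(4)))
print(count_colors(4, 4))
```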

It will be understood that the image sensor 14, the timing generator 12, and the ASP and A/D converter 16 can be fabricated as separate ICs or, as is common with CMOS image sensors, as a single IC. Such an IC can also perform some of the other functions shown in FIG. 1, including some of the functions provided by the processor 20.

In a first mode, the timing generator 12 drives the image sensor 14 to produce a motion sequence of lower-resolution image data. This mode is used when capturing video and when previewing a still image to compose the shot. The sensor image data produced in this mode has far fewer columns and rows than the full resolution of the image sensor 14, for example HD (registered trademark) resolution image data of 1280 × 720 pixels or VGA resolution image data of 640 × 480 pixels.

The preview sensor image data can be produced by combining the values of adjacent pixels of the same color, by discarding some pixel values, or by combining pixel values of some colors while discarding those of other colors. The processing described in commonly assigned Patent Document 15 (inventors: Parulski et al.; title: "Electronic Camera for Initiating Capture of Still Images while Previewing Motion Images"), the contents of which are incorporated herein by reference, can also be applied.
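The first option, combining the values of adjacent same-color pixels, can be sketched as below. The sketch assumes a mosaic with a repeating 2 × 2 color cell (as in a Bayer-type array), so that same-color neighbors sit two pixels apart, and it assumes that "combining" means averaging; neither detail is specified in the passage. The function name `bin_same_color` is hypothetical.

```python
def bin_same_color(mosaic):
    """Average each 2x2 group of same-color pixels in a color mosaic.

    Assumes a repeating 2x2 color cell, so same-color neighbors are two
    pixels apart. The output has half the rows and columns of the input
    and keeps the same 2x2 color layout.
    """
    rows, cols = len(mosaic), len(mosaic[0])
    if rows % 4 or cols % 4:
        raise ValueError("mosaic dimensions must be multiples of 4")
    out = [[0.0] * (cols // 2) for _ in range(rows // 2)]
    for r in range(rows // 2):
        for c in range(cols // 2):
            # Top-left source pixel of this color inside its 4x4 block.
            br = (r // 2) * 4 + (r % 2)
            bc = (c // 2) * 4 + (c % 2)
            total = (mosaic[br][bc] + mosaic[br][bc + 2]
                     + mosaic[br + 2][bc] + mosaic[br + 2][bc + 2])
            out[r][c] = total / 4
    return out


mosaic = [[r * 4 + c for c in range(4)] for r in range(4)]
for row in bin_same_color(mosaic):
    print(row)
```

Summing instead of averaging would raise the signal level rather than preserve it, which is the effect usually wanted when binning is used in low light.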

In a second mode, the timing generator 12 drives the image sensor 14 to produce high-resolution still image data. The resulting final sensor image data is high-resolution image data, for example 12 megapixels at 4000 × 3000 pixels. At high scene brightness the values of the individual sensor pixels are used as the final image data; at low scene brightness the values of like-colored pixels are binned (combined) to increase the signal level and hence the ISO (registered trademark) speed of the sensor.

The processor 20 controls the zoom and focus motor driver 8 by issuing control signals that set an appropriate focal length and focus the light from the scene onto the image sensor 14. The exposure level of the image sensor 14 is controlled by setting the f-number and exposure time of the adjustable aperture and adjustable shutter 6, the exposure period of the image sensor 14 via the timing generator 12, and the gain (i.e., ISO (registered trademark) speed) setting of the ASP and A/D converter 16. The processor 20 also controls a flash 2, which can illuminate the scene.

In the first mode described above, the lens 4 of the digital camera 10 can be focused automatically using through-the-lens autofocus, as described in commonly assigned Patent Document 12 (inventors: Parulski et al.; title: "Electronic Camera with Rapid Automatic Focus of an Image upon a Progressive Scan Image Sensor"), the contents of which are incorporated herein by reference. The zoom and focus motor driver 8 moves the focus position of the lens 4 through a range of positions between near focus and infinity focus while the processor 20 determines the best focus position, that is, the position giving the peak sharpness value in the central region of the image captured by the image sensor 14. The focus distance corresponding to the best focus position can be used later for several purposes, such as automatically setting an appropriate scene mode, and is therefore stored as metadata in the image file together with other lens and camera settings.
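The focus search described above amounts to sweeping the focus position and keeping the one whose central image region is sharpest. A minimal sketch follows. The sharpness measure (sum of squared horizontal pixel differences over the middle half of the image) and the `capture` callback are assumptions made for illustration; the passage does not specify how sharpness is computed.

```python
def sharpness(image):
    """Sum of squared horizontal differences over the central region.

    This particular measure is an illustrative assumption; the text only
    says that a sharpness value is evaluated in the central image region.
    """
    rows, cols = len(image), len(image[0])
    r0, r1 = rows // 4, rows - rows // 4
    c0, c1 = cols // 4, cols - cols // 4
    total = 0
    for r in range(r0, r1):
        for c in range(c0, c1 - 1):
            diff = image[r][c + 1] - image[r][c]
            total += diff * diff
    return total


def best_focus_position(positions, capture):
    """Return the focus position whose captured image is sharpest.

    `capture` is a hypothetical callback that moves the lens to a focus
    position and returns the image captured there as a 2D list.
    """
    return max(positions, key=lambda p: sharpness(capture(p)))


def fake_capture(position, true_focus=7):
    """Simulated sensor: stripe contrast falls off away from true focus."""
    amplitude = 10 - abs(position - true_focus)
    return [[amplitude if c % 2 else 0 for c in range(8)] for _ in range(8)]


print(best_focus_position(range(15), fake_capture))
```

A real camera would typically stop the sweep once sharpness starts to fall, rather than visiting every position; the exhaustive search here just keeps the example short.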

The processor 20 causes an image display 32 to show menus that it has generated and low-resolution color images temporarily stored in a display memory 36. The image display 32 is an active-matrix color liquid crystal display (LCD), but other display types, such as organic light-emitting diode (OLED) displays, can be used. A video output signal from the digital camera 10 is supplied through a video interface 44 to a video display 46, for example a flat-panel HDTV display. In video capture mode and preview mode, the digital image data read from the buffer memory 18 is manipulated by the processor 20 to form a series of motion preview images that are displayed, typically in color, on the image display 32. In image display mode, images are displayed on the image display 32 using the image data from digital image files stored in the image memory 30.

A graphical user interface shown on the image display 32 is operated through user input provided by user controls 34. The user controls 34 are used to select camera modes such as video capture mode, still capture mode, and image display mode, and to initiate still capture and video recording. In this embodiment, partially depressing the shutter button, which is one of the user controls 34, initiates the first mode described above (still preview), and fully depressing it initiates the second mode (still capture). The user controls 34 are also used to turn on the camera, control the lens 4, and initiate the capture process. The user controls 34 can take the form of buttons, rocker switches, joysticks, rotary dials, or any combination of these, or a touch screen overlaid on the image display 32. Additional status displays or image displays can also be provided.

ユーザ用コントローラ３４を用いカメラをタイマーモードに設定することもできる。タイマーモードの許では、ユーザがシャッタボタンを全押しした後、若干の遅延時間例えば１０ｓｅｃを経た後プロセッサ２０によるスチル撮影が開始される。 The camera can also be set to a timer mode using the user controller 34. In timer mode, after the user fully presses the shutter button, the processor 20 starts still shooting after a short delay, for example 10 seconds.

プロセッサ２０には、更に、マイクロホン２４から音声信号を受け取りスピーカ２６に音声信号を供給するオーディオコーデック２２が接続されている。これらの部材は、オーディオトラックの記録・再生時だけでなく、ビデオ画像時系列、スチル画像等の記録・再生にも使用可能である。ディジタルカメラ１０をカメラ付携帯電話等の多機能デバイスとして構成し、マイクロホン２４及びスピーカ２６を通話手段として使用することも可能である。 The processor 20 is further connected to an audio codec 22 that receives an audio signal from the microphone 24 and supplies an audio signal to the speaker 26. These members can be used not only for recording / reproduction of audio tracks, but also for recording / reproduction of video image time series, still images, and the like. It is also possible to configure the digital camera 10 as a multi-function device such as a camera-equipped mobile phone, and use the microphone 24 and the speaker 26 as call means.

本実施形態ではスピーカ２６がユーザインタフェースの一部としても使用される。具体的には、ユーザ用コントローラ３４が操作されたことや、特定のモードが指定されたことが、スピーカ２６に発する種々の可聴信号で通知される。本実施形態では、更にマイクロホン２４、オーディオコーデック２２及びプロセッサ２０を用い音声認識が実行される。従って、ユーザは、コントローラ３４の操作ではなく音声コマンドによってプロセッサ２０に入力することができる。スピーカ２６は、更に、電話コールの到来をユーザに通知する手段等としても使用される。この通知には、ファームウェアメモリ２８内に格納されている標準的なリングトーンが使用される。ワイヤレスネットワーク５８経由で画像メモリ３０内にカスタムリングトーンをダウンロード済であれば、そのカスタムリングトーンを使用することもできる。更に、図示しないが、電話コールの到来をサイレントモード即ち非可聴モードで通知できるよう振動デバイスを設けてもよい。 In the present embodiment, the speaker 26 is also used as a part of the user interface. Specifically, the operation of the user controller 34 or the designation of a specific mode is notified by various audible signals emitted from the speaker 26. In the present embodiment, voice recognition is further performed using the microphone 24, the audio codec 22, and the processor 20. Therefore, the user can input to the processor 20 by a voice command instead of operating the controller 34. The speaker 26 is also used as a means for notifying the user of the arrival of a telephone call. For this notification, a standard ring tone stored in the firmware memory 28 is used. If a custom ring tone has been downloaded into the image memory 30 via the wireless network 58, the custom ring tone can be used. Further, although not shown, a vibration device may be provided so that the arrival of a telephone call can be notified in a silent mode, that is, a non-audible mode.

本実施形態のディジタルカメラ１０は加速度計２７を備えているので、カメラモーションに関する情報をそこから得ることができる。好ましいことに、この加速度計２７は、直交三軸それぞれについて線加速度及び角加速度を検知できるものであるので、合計６次元分の情報を取得することができる。 Since the digital camera 10 of the present embodiment includes the accelerometer 27, information regarding camera motion can be obtained therefrom. Preferably, since this accelerometer 27 can detect linear acceleration and angular acceleration for each of the three orthogonal axes, it can acquire information for a total of six dimensions.

プロセッサ２０は、また、イメージセンサ１４から得られる画像データに更なる処理を施してｓＲＧＢ（登録商標）画像データに変換し、それを圧縮して最終的な画像ファイル、例えば周知のＥｘｉｆ（登録商標）−ＪＰＥＧ形式による画像ファイルを生成し、そのファイルを画像メモリ３０内に格納する。 The processor 20 also performs further processing on the image data obtained from the image sensor 14 to convert it into sRGB (registered trademark) image data, compresses that data to generate a final image file, for example an image file in the well-known Exif (registered trademark)-JPEG format, and stores the file in the image memory 30.

ディジタルカメラ１０は、有線インタフェース３８を介しインタフェース／充電器４８、ひいては家庭内又は事務所内のデスクトップ乃至ポータブルコンピュータ４０に接続することが可能である。この例では、そのインタフェース３８として周知のＵＳＢ２．０インタフェース仕様に適合するものが使用されている。そのため、インタフェース／充電器４８からインタフェース３８を介し図示しないカメラ１０内二次電池群へと電力を供給することができる。 The digital camera 10 can be connected via a wired interface 38 to an interface / charger 48 and thus to a desktop or portable computer 40 in the home or office. In this example, the interface 38 that conforms to the well-known USB 2.0 interface specification is used. Therefore, power can be supplied from the interface / charger 48 to the secondary battery group in the camera 10 (not shown) via the interface 38.

ディジタルカメラ１０は、また、ワイヤレスモデム５０を介し無線周波数帯５２経由でワイヤレスネットワーク５８に接続することが可能である。モデム５０が準拠する無線インタフェースプロトコルは、例えば、周知のＢｌｕｅｔｏｏｔｈ（登録商標）無線インタフェース、周知のＩＥＥＥ８０２．１１無線インタフェース等である。コンピュータ４０に届いた画像は、そこからインターネット７０経由でフォトサービスプロバイダ７２、例えばＫｏｄａｋ（登録商標）ＥａｓｙＳｈａｒｅ（登録商標）ギャラリに登録することができる。プロバイダ７２に登録された画像には、図示しない他種装置からもアクセスすることができる。 The digital camera 10 can also be connected to the wireless network 58 via the wireless frequency band 52 via the wireless modem 50. The radio interface protocol to which the modem 50 conforms is, for example, a well-known Bluetooth (registered trademark) radio interface, a well-known IEEE 802.11 radio interface, or the like. Images that have arrived at the computer 40 can then be registered with the photo service provider 72, for example, Kodak (registered trademark) EasyShare (registered trademark) gallery, via the Internet 70. The image registered in the provider 72 can also be accessed from other types of devices (not shown).

本発明は、ワイヤレスモデム５０がワイヤレスリンク等の無線周波数リンクを介し図示しない携帯電話網例えば３ＧＳＭ（登録商標）網に接続し、ディジタルカメラ１０内のディジタル画像ファイルをインターネット７０上に送出する形態でも実施することができる。送出されたディジタル画像ファイルはコンピュータ４０やフォトサービスプロバイダ７２で受信される。 The present invention can also be implemented in a form in which the wireless modem 50 connects, via a radio frequency link such as a wireless link, to a mobile phone network (not shown) such as a 3GSM (registered trademark) network and transmits the digital image files in the digital camera 10 onto the Internet 70. The transmitted digital image files are received by the computer 40 or the photo service provider 72.

次に、図２を参照しつつ本発明の一実施形態に係る方法について説明する。本方法では、まず、ディジタルビデオ撮影装置例えばディジタルカメラ１０を用い、複数個のビデオフレームを有するディジタルビデオ時系列がディジタルビデオ時系列撮影ステップ２１０にて撮影される。 Next, a method according to an embodiment of the present invention will be described with reference to FIG. In this method, first, a digital video time series having a plurality of video frames is photographed in a digital video time series photographing step 210 using a digital video photographing apparatus such as a digital camera 10.

ディジタルビデオ時系列撮影時には特徴量判別ステップ２２０、即ちビデオフレーム群又はその一部の解析を通じ一通り又は複数通りの特徴量を判別するステップも実行される。その判別で求まる特徴量としては、まず、ビデオフレームの色特性やビデオフレーム内顔存否をはじめ、ビデオ属性に関連する特徴量がある。連続ビデオフレーム間大域モーション量や、連続ビデオフレーム内対応要素間局所モーション量をはじめ、モーションに関連する特徴量も求まる。大域モーションが一般に撮影装置の動きに対応するのに対し、局所モーションは光景内被写体の動きに対応している。いわゆる当業者にはご理解頂けるように、上掲の特徴量は一例であり、ビデオフレームに対する解析を通じて他種特徴量を判別することもできる。 At the time of digital video time series shooting, a feature amount determination step 220, that is, a step of determining one or more feature amounts through analysis of a video frame group or a part thereof is also executed. As feature amounts obtained by the discrimination, first, there are feature amounts related to video attributes, including color characteristics of video frames and presence / absence of faces in video frames. In addition to the amount of global motion between successive video frames and the amount of local motion between corresponding elements in successive video frames, the feature quantities related to motion can also be obtained. The global motion generally corresponds to the movement of the photographing apparatus, whereas the local motion corresponds to the movement of the subject in the scene. As can be understood by a so-called person skilled in the art, the above-described feature amounts are examples, and other types of feature amounts can be determined through analysis of video frames.

同ステップ２２０で判別可能な特徴量としてはオーディオ関連の特徴量もある。例えば、時間領域における信号強度、特定周波数帯域における信号強度等といった特徴量は、ディジタルビデオ時系列撮影時にマイクロホン２４を介し録音され、オーディオコーデック２２で処理された１個又は複数個のオーディオサンプルを、解析に供することで判別することができる。 The feature quantities that can be determined in step 220 also include audio-related feature quantities. For example, feature quantities such as the signal strength in the time domain or the signal strength in a specific frequency band can be determined by analyzing one or more audio samples that were recorded through the microphone 24 during digital video time series shooting and processed by the audio codec 22.
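The two audio feature quantities named above can be illustrated with a minimal Python sketch. It assumes the audio samples are already available as a list of floats at a known sample rate; the function names `rms_level` and `band_energy` are hypothetical, and the naive DFT is used only for illustration, not as the method of the specification.

```python
import cmath
import math

def rms_level(samples):
    """Time-domain signal strength: root mean square of the samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def band_energy(samples, sample_rate, f_lo, f_hi):
    """Signal strength in the band [f_lo, f_hi] Hz using a naive O(n^2) DFT."""
    n = len(samples)
    energy = 0.0
    for k in range(n // 2 + 1):
        freq = k * sample_rate / n  # centre frequency of DFT bin k
        if f_lo <= freq <= f_hi:
            coeff = sum(s * cmath.exp(-2j * math.pi * k * i / n)
                        for i, s in enumerate(samples))
            energy += abs(coeff) ** 2
    return energy / n

# Hypothetical test signal: a 440 Hz tone sampled at 8 kHz for 0.1 s.
tone = [math.sin(2 * math.pi * 440 * i / 8000) for i in range(800)]
```

For the test tone, `band_energy(tone, 8000, 400, 500)` is large while `band_energy(tone, 8000, 1000, 2000)` is close to zero, which is the kind of contrast a per-band feature is meant to expose.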

同ステップ２２０で判別可能な特徴量としては装置設定関連の特徴量もある。例えば、ズーム／合焦モータドライバ８の制御によるズームレンズ４のポジション調整のため、ユーザ用コントローラ３４経由でユーザから与えられた指令、といった特徴量である。この種の特徴量は、ディジタルビデオ時系列撮影時におけるディジタルビデオ撮影装置の設定を解析することで判別できる。ディジタルズームでも像の倍率が変わるので、ディジタルズームを別途特徴量として判別するようにしてもよい。 The feature quantity that can be discriminated in step 220 includes a feature quantity related to the apparatus setting. For example, the feature amount is a command given by the user via the user controller 34 for adjusting the position of the zoom lens 4 under the control of the zoom / focusing motor driver 8. This type of feature quantity can be determined by analyzing the settings of the digital video photographing apparatus at the time of digital video time series photographing. Since the magnification of an image changes even with digital zoom, the digital zoom may be separately determined as a feature amount.

同ステップ２２０で判別可能な特徴量としては、ディジタルカメラ１０の動きに関する計測結果を示す特徴量もある。この種の特徴量は、例えば、ディジタルビデオ時系列撮影時に加速度計２７から得られた加速度計データを解析することで判別でき、ビデオフレームデータに基づき導出されるモーション関連特徴量の補強乃至代替として使用することができる。 The feature quantities that can be determined in step 220 also include feature quantities indicating measurements of the movement of the digital camera 10. This type of feature quantity can be determined, for example, by analyzing accelerometer data obtained from the accelerometer 27 during digital video time series shooting, and can be used to reinforce or replace the motion-related feature quantities derived from the video frame data.

同ステップ２２０で判別可能な特徴量としては、プロセッサ２０でのビデオ符号化処理適用によって生じるデータの解析結果もある。解析対象データの例としては、ビデオ符号化処理中に実行されるモーション推定処理にて生じるモーションベクトル情報等がある。大抵のビデオ符号化処理では、そうしたモーション推定処理が通常処理鎖の一部としてルーチン的に実行される。 The feature quantity that can be discriminated in step 220 includes an analysis result of data generated by applying the video encoding process in the processor 20. Examples of the analysis target data include motion vector information generated in a motion estimation process executed during the video encoding process. In most video encoding processes, such motion estimation processing is routinely performed as part of the normal processing chain.

ビデオフレーム毎の特徴量判別が済んだ後は、ディジタルビデオ時系列圧縮ステップ２３０にてビデオフレームが圧縮される。使用するビデオ圧縮アルゴリズムは、ＭＰＥＧ規格、Ｈ．２６３規格その他、いわゆる当業者にとり周知の規格に準拠したもの等である。圧縮が済んだビデオフレームはコンテナ、具体的にはＡｐｐｌｅ（登録商標）ＱｕｉｃｋＴｉｍｅ（登録商標）で提供されるビデオファイル用のファイルフォーマットラッパによって収容される。 After the feature quantities have been determined for each video frame, the video frames are compressed in the digital video time series compression step 230. The video compression algorithm used conforms to, for example, the MPEG standard, the H.263 standard, or another standard well known to those skilled in the art. The compressed video frames are placed in a container, specifically a file format wrapper for video files such as the one provided by Apple (registered trademark) QuickTime (registered trademark).

圧縮版ディジタルビデオ時系列格納ステップ２４０では、圧縮が済んだディジタルビデオ時系列がプロセッサ可アクセスメモリ内例えば画像メモリ３０内に格納される。格納される圧縮版ディジタルビデオ時系列はビデオ情報やオーディオ情報を含むものである。 In the compressed version digital video time series storage step 240, the compressed digital video time series is stored in the processor-accessible memory, for example, the image memory 30. The stored compressed digital video time series includes video information and audio information.

特徴量判別ステップ２２０で判別された特徴量は、例えば、格納される圧縮版ディジタルビデオ時系列に係るメタデータとして格納される。そのメタデータの格納には、例えば、Ａｐｐｌｅ（登録商標）ＱｕｉｃｋＴｉｍｅ（登録商標）ファイルフォーマット仕様で規定されているユーザデータアトム等を使用することができる。 The feature quantity discriminated in the feature quantity discrimination step 220 is stored as, for example, metadata related to the stored compressed digital video time series. The metadata can be stored using, for example, a user data atom defined in the Apple (registered trademark) QuickTime (registered trademark) file format specification.

これに代え、圧縮状態で格納されるディジタルビデオ時系列に関連付けられた別のファイル内に、特徴量判別ステップ２２０で判別された特徴量を格納するようにしてもよい。 Alternatively, the feature quantity determined in the feature quantity determination step 220 may be stored in another file associated with the digital video time series stored in the compressed state.

また、ディジタルビデオ時系列を圧縮状態で格納する際に、特徴量判別ステップ２２０で判別された特徴量が恒久格納型メモリに格納されないようにしてもよい。この場合、その特徴量はビデオサマリ生成アルゴリズム終了時点で破棄される。 Further, when the digital video time series is stored in a compressed state, the feature amount determined in the feature amount determination step 220 may not be stored in the permanent storage memory. In this case, the feature amount is discarded at the end of the video summary generation algorithm.

こうしてビデオ撮影動作及び圧縮版ディジタルビデオ時系列の格納が済んだ後、キービデオ断片特定ステップ２５０では、そのディジタルビデオ時系列を代表するキービデオ断片が特定される。即ち、プロセッサを用い諸特徴量を自動解析することで、格納した圧縮版ディジタルビデオ時系列を伸張することなく、幾つかのディジタルビデオ時系列内ビデオフレームを含むキービデオ断片が１個又は複数個特定される。キービデオ断片は、原則として、そのディジタルビデオ時系列内で連なっているビデオフレーム複数個の集まりであるので、始点フレーム番号と、終点フレーム番号又はキービデオ断片長との組合せで、個々別々に特定することができる。 After the video shooting operation and the storage of the compressed digital video time series have been completed in this way, key video fragments representing the digital video time series are identified in the key video fragment specifying step 250. That is, a processor automatically analyzes the feature quantities to identify, without decompressing the stored compressed digital video time series, one or more key video fragments, each containing several video frames of the digital video time series. Because a key video fragment is, in principle, a set of consecutive video frames in the digital video time series, each fragment can be specified individually by its start frame number combined with either its end frame number or its length.
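The two equivalent ways of specifying a fragment (start frame plus end frame, or start frame plus length) can be sketched as a small Python data type. The class name `KeyVideoFragment`, its field names, and the choice of an inclusive end frame are assumptions made for illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KeyVideoFragment:
    """A run of consecutive frames, stored as inclusive start and end frame numbers."""
    start_frame: int
    end_frame: int  # inclusive (an assumption; the text does not fix this)

    @classmethod
    def from_length(cls, start_frame, length):
        """Build the same fragment from a start frame and a fragment length."""
        return cls(start_frame, start_frame + length - 1)

    @property
    def length(self):
        """Number of frames in the fragment."""
        return self.end_frame - self.start_frame + 1
```

With this convention, `KeyVideoFragment.from_length(120, 60)` and `KeyVideoFragment(120, 179)` describe the same fragment.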

関連する諸特徴量に基づきビデオ時系列内キービデオ断片を特定する手法としては、例えば、本件技術分野既知の諸手法が使用される。その一例は、まずキービデオフレームを幾つか特定し、個々のキービデオフレームを包含するようビデオ時系列の一部を選択することによって、個々のキービデオ断片を生成する手法である。特許文献２６（発明者：Ｌｕｏ ｅｔ ａｌ．，この参照を以て本願に繰り入れる）に記載の如く、ディジタルモーション推定で算出されたビデオ内モーションに基づきキービデオフレームを選択する手法や、特許文献４（発明者：Ｌｕｏ ｅｔ ａｌ．，この参照を以て本願に繰り入れる）に記載の如く、そのビデオ撮影装置に付随する加速度計からデータとして得られるビデオ内モーションの特徴に基づきキービデオフレームを選択する手法は、本発明の実施に当たり、特徴量判別結果に基づくキービデオ断片の特定に利用可能である。 Key video fragments in the video time series can be identified from the relevant feature quantities using, for example, techniques known in this technical field. One such technique first identifies several key video frames and then generates each key video fragment by selecting a portion of the video time series that contains the corresponding key video frame. In carrying out the present invention, key video fragments can be identified from the feature determination results using, for example, the technique of selecting key video frames based on motion in the video computed by digital motion estimation, as described in Patent Document 26 (inventors: Luo et al., incorporated herein by reference), or the technique of selecting key video frames based on the characteristics of motion in the video obtained as data from an accelerometer associated with the video photographing apparatus, as described in Patent Document 4 (inventors: Luo et al., incorporated herein by reference).

格納済の圧縮版ディジタルビデオ時系列に関連付けられた別のファイル内に特徴量が格納されている場合は、キービデオ断片特定ステップ２５０で解析すべき特徴量がそのファイルから読み込まれる。 If the feature quantity is stored in another file associated with the stored compressed digital video time series, the feature quantity to be analyzed in the key video fragment specifying step 250 is read from the file.

格納済の圧縮版ディジタルビデオ時系列に係るメタデータとして特徴量が格納されている場合は、キービデオ断片特定ステップ２５０にて、格納済の圧縮版ディジタルビデオ時系列に係るビデオフレーム群を伸張することなく、解析すべき特徴量がその圧縮版ディジタルビデオ時系列に係るファイルから抽出される。格納済の圧縮版ディジタルビデオ時系列に係るメタデータとして格納された特徴量の抽出が、その圧縮版ディジタルビデオ時系列の伸張と見なされるべきではないことに留意されたい。格納済の圧縮版ディジタルビデオ時系列を伸張することに該当するのは、寧ろ、圧縮版ディジタルビデオ時系列に係る一連のビデオフレームを再構築する際に使用される圧縮済ビットストリーム内データ、例えばビデオ情報やヘッダ情報を復号することである。 When the feature quantities are stored as metadata of the stored compressed digital video time series, the feature quantities to be analyzed in the key video fragment specifying step 250 are extracted from the file of the compressed digital video time series without decompressing its video frames. Note that extracting feature quantities stored as metadata of the stored compressed digital video time series should not be regarded as decompressing that time series. Rather, decompressing the stored compressed digital video time series means decoding the data in the compressed bit stream, such as video information and header information, that is used to reconstruct the series of video frames of the compressed digital video time series.

本発明に備わる利点の一つは、格納済の圧縮版ディジタルビデオ時系列を伸張することなくキービデオ断片を特定できることである。そのため、キービデオ断片の特定やそれに後続するビデオサマリの生成が、撮影動作の終了からあまり間をおかずに終了する。格納済の圧縮版ディジタルビデオ時系列から個別のビデオフレームを抽出する手段として伸張が使用される従来の手法では、伸張タスクの実行に必要な時間の長さが桎梏となっていた。 One advantage of the present invention is that key video fragments can be identified without decompressing the stored compressed digital video time series. For this reason, the identification of the key video fragment and the generation of the video summary that follows the key video fragment are completed without much time from the end of the shooting operation. In conventional techniques where decompression is used as a means of extracting individual video frames from a stored compressed digital video time series, the length of time required to perform the decompression task has become a problem.

キービデオ断片の特定は、そのディジタルビデオ時系列の撮影時に判別された特徴量に関する解析の結果に全面的に依拠して実行することも、格納済の圧縮版ディジタルビデオ時系列から抽出された情報を解析で得た特徴量と併用して実行することも可能である。後者の場合、格納済の圧縮版ディジタルビデオ時系列に含まれる情報を部分的に復号しなければならなくなることもあり得る。 The identification of the key video fragment can be performed entirely depending on the result of the analysis on the feature amount determined at the time of shooting the digital video time series, or the information extracted from the stored compressed digital video time series. Can be executed in combination with the feature value obtained by the analysis. In the latter case, the information contained in the stored compressed digital video time series may have to be partially decoded.

例えば、格納済の圧縮版ディジタルビデオ時系列から抽出されたオーディオ情報を解析で得た特徴量と併用してキービデオ断片を特定する場合である。ただ、オーディオ情報は、圧縮版ディジタルビデオファイル全体に占める比率が小さいのが普通であり、ビデオフレームを構成する画素データの伸張に比べ迅速に抽出することができる。また、オーディオ属性関連の特徴量を撮影時に生成できなかった場合や、オーディオ属性関連その他の特徴量の判別並びに撮影したビデオの処理及び符号化に利用可能な期間がプロセッサの情報処理サイクル内になかった場合でも、格納済の圧縮版ディジタルビデオ時系列からオーディオ情報を抽出すれば、そのオーディオ情報をキービデオ断片特定に役立てることができる。これは、速度・性能間の折衷を表している。即ち、オーディオ情報を利用することで、キービデオ断片特定に要する処理時間の全体的長期化と引替に、キービデオ断片特定ステップ２５０の性能を高めることができる。 One example is the case where audio information extracted from the stored compressed digital video time series is used together with the feature quantities obtained by analysis to identify key video fragments. Audio information usually accounts for only a small share of the whole compressed digital video file and can be extracted quickly compared with decompressing the pixel data of the video frames. Moreover, even when audio-related feature quantities could not be generated at the time of shooting, or when the processor had no processing cycles to spare for determining audio-related and other feature quantities in addition to processing and encoding the captured video, the audio information can still be extracted from the stored compressed digital video time series and used for key video fragment identification. This represents a trade-off between speed and performance: using the audio information improves the performance of the key video fragment specifying step 250 at the cost of a longer overall processing time for identifying key video fragments.

格納済の圧縮版ディジタルビデオ時系列から抽出されたビデオ情報を解析で得た特徴量と併用してキービデオ断片を特定する場合も同様である。ビデオ時系列全体を伸張してしまうと、ビデオ属性関連特徴量を撮影時に判別するメリットが概ね打ち消されてしまうので、格納済の圧縮版ディジタルビデオ時系列を構成するビデオフレームのうち復号されるものの個数を可能な限り少数にするのが望ましい。いわゆる当業者にはご理解頂けるように、他フレームに対し独立に符号化されているフレームであれば、そのフレームを圧縮版ディジタルビデオ時系列から効率的に復号することができる。これも、速度・性能間の折衷を表している。即ち、ビデオ情報を利用することで、キービデオ断片特定に要する処理時間の全体的長期化と引替に、キービデオ断片特定ステップ２５０の性能を高めることができる。 The same applies when video information extracted from the stored compressed digital video time series is used together with the feature quantities obtained by analysis to identify key video fragments. Decompressing the entire video time series would largely cancel the benefit of determining the video-related feature quantities at the time of shooting, so it is desirable to decode as few of the video frames of the stored compressed digital video time series as possible. As those skilled in the art will appreciate, a frame that is encoded independently of other frames can be decoded efficiently from the compressed digital video time series. This also represents a trade-off between speed and performance: using the video information improves the performance of the key video fragment specifying step 250 at the cost of a longer overall processing time for identifying key video fragments.

キービデオ断片特定ステップ２５０の実行に当たり、ユーザ用コントローラ３４を介したユーザ入力を受け取り、それに応じビデオサマリの諸属性を制御するようにしてもよい。例えば、ビデオサマリの長さ、個別のキービデオ断片の最短時間長、キービデオ断片の総数等に関しユーザから指定を受ける形態である。 In executing the key video fragment identification step 250, user input via the user controller 34 may be received and various attributes of the video summary may be controlled accordingly. For example, the user receives designation from the user regarding the length of the video summary, the minimum time length of individual key video fragments, the total number of key video fragments, and the like.

キービデオ断片特定ステップ２５０にて、本願出願人を譲受人とする係属中の米国特許出願第１２／７８６４７１号（発明者：Ｄｅｅｖｅｒ，名称：キービデオフレーム判別方法(Method for Determining Key Video Frames)）に記載の手法を用いるようにしてもよい。この手法は、ディジタルビデオ時系列を解析することで重要度の時間変化を導出し、その結果に基づきそのディジタルビデオ時系列の時間歪曲表現を生成し、その時間歪曲表現を複数個の歪曲等長期間へと分割し、各歪曲等長期間内のビデオフレームを解析することで当該歪曲等長期間毎にキービデオフレームを選択する、というものである。重要度はそのディジタルビデオ時系列に備わる大域モーション、局所モーション等の特性を反映した情報、特に特徴量判別ステップ２２０における特徴量判別の結果に基づき導出される情報である。時間歪曲表現は、ディジタルビデオ時系列内ビデオフレームのうちあるものを長め、他のあるものを短めにする、といった具合に加重した表現である。キービデオフレーム選択は、特徴量判別ステップ２２０にて判別された特徴量のうち対応する歪曲等長期間内のビデオフレーム群に係るものを解析することで行うのが望ましい。例えば、ズームイン動作終了から間もない、その中央領域における局所モーションの程度が中庸である等といった条件を満たすビデオフレームがキーフレームとして選択される。 In the key video fragment specifying step 250, the technique described in pending US patent application Ser. No. 12/786,471, assigned to the assignee of the present application (inventor: Deever, title: Method for Determining Key Video Frames), may be used. This technique analyzes the digital video time series to derive the importance as a function of time, generates a time-warped representation of the digital video time series based on the result, divides the time-warped representation into a plurality of equal-length warped intervals, and analyzes the video frames in each warped interval to select a key video frame for that interval. The importance is information reflecting characteristics of the digital video time series such as global motion and local motion, in particular information derived from the results of the feature determination in the feature quantity determination step 220. The time-warped representation weights the video frames of the digital video time series so that some are given a longer duration and others a shorter one. The key video frame for each warped interval is preferably selected by analyzing those feature quantities determined in step 220 that relate to the video frames in that interval. For example, a video frame that satisfies conditions such as occurring shortly after the end of a zoom-in operation, or having a moderate degree of local motion in its central region, is selected as a key frame.
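The time-warping idea can be illustrated with a short Python sketch. It is not taken from the referenced application: it assumes a non-negative per-frame importance value, treats accumulated importance as warped time, and uses a hypothetical per-frame `score` in place of the zoom and local-motion criteria mentioned above. The function names are also hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate

def warped_intervals(importance, n_intervals):
    """Split frames into half-open ranges of roughly equal accumulated importance."""
    cumulative = list(accumulate(importance))
    total = cumulative[-1]
    bounds = [0]
    for j in range(1, n_intervals):
        target = total * j / n_intervals
        # The frame whose cumulative importance first reaches the target
        # closes the current interval.
        bounds.append(bisect_left(cumulative, target) + 1)
    bounds.append(len(importance))
    return [(bounds[i], bounds[i + 1]) for i in range(n_intervals)]

def select_key_frames(importance, score, n_intervals):
    """Pick the best-scoring frame inside each non-empty warped interval."""
    return [max(range(lo, hi), key=lambda f: score[f])
            for lo, hi in warped_intervals(importance, n_intervals)
            if hi > lo]

# Six low-importance frames followed by two high-importance frames.
importance = [1, 1, 1, 1, 1, 1, 3, 3]
score = [0.1, 0.9, 0.2, 0.3, 0.1, 0.2, 0.5, 0.7]
```

In this example the six low-importance frames share one warped interval and the two high-importance frames share the other, so important frames occupy more of the warped timeline and are more likely to yield a key frame.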

キービデオフレーム判別後は、個々のキービデオフレームの前後にある一群のビデオフレームを選択することで個々のキービデオ断片を特定すればよい。具体的には、キービデオフレームの前２ｓｅｃから後２ｓｅｃまで、合計４ｓｅｃの期間に属するビデオフレーム群を選択することでキービデオ断片を特定すればよい。 After discriminating key video frames, individual key video fragments may be specified by selecting a group of video frames before and after each key video frame. Specifically, a key video fragment may be specified by selecting a video frame group belonging to a total period of 4 seconds from 2 seconds before and 2 seconds after the key video frame.
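As a small illustration of the 2-seconds-before to 2-seconds-after rule, the following Python sketch converts a key frame index into an inclusive frame range. The frame rate, the clamping to the bounds of the sequence, and the function name `fragment_around` are assumptions added for the example.

```python
def fragment_around(key_frame, fps, total_frames, before_s=2.0, after_s=2.0):
    """Return (start, end) inclusive frame indices around a key frame."""
    start = max(0, key_frame - round(before_s * fps))
    end = min(total_frames - 1, key_frame + round(after_s * fps))
    return start, end
```

For a 30 frames-per-second sequence of 1800 frames, a key frame at index 300 gives the range 240 to 360, while a key frame at index 10 is clamped to start at frame 0.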

キービデオ断片は、また、キービデオフレームに対しランク付けを行い、最高ランクキービデオフレームに関連する一群のキービデオフレームのみでキービデオ断片を生成する、といった手法でも特定することができる。キービデオフレームに対するランク付けは、ディジタルビデオ時系列を解析することでカメラ移動パターンを判別し、そのカメラ移動パターンに基づきキービデオフレームのランクを決めることで行うことができる。例えば、特徴量判別ステップ２２０で判別された特徴量のうち大域モーションに関連するものを解析することで、そのディジタルビデオ時系列におけるカメラ固定領域の変遷を示す大域モーション軌跡を得ることができる。ビデオ撮影の全過程長に比し高い比率でカメラが固定されていた領域、即ち高頻度固定領域に対応するビデオフレームには、高いランクが付される。このランク付け処理は各周回毎に最高ランクキービデオフレームを選択する反復的な処理として実行すればよく、またその処理の個々の周回では既選択キービデオフレームのそれと同じ固定領域を表すキービデオフレームに比しそれ以外の固定領域を表すキービデオフレームを優先的に選択するようにすればよい。キービデオフレームのランク付け後は、最高ランクキービデオフレームが包含されるようにキービデオ断片を特定すればよい。 Key video fragments can also be identified by ranking the key video frames and generating key video fragments only for the group of highest-ranked key video frames. The key video frames can be ranked by analyzing the digital video time series to determine a camera movement pattern and ranking the key video frames based on that pattern. For example, analyzing the global-motion-related feature quantities determined in the feature quantity determination step 220 yields a global motion trajectory that shows how the region on which the camera was fixed changed over the digital video time series. Video frames corresponding to regions on which the camera was fixed for a high proportion of the total shooting time, that is, high-frequency fixed regions, are given high ranks. This ranking can be carried out as an iterative process that selects the highest-ranked key video frame in each iteration, and in each iteration preference can be given to key video frames that represent a fixed region other than those represented by the key video frames already selected. After the key video frames have been ranked, the key video fragments are identified so that they contain the highest-ranked key video frames.
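The iterative ranking can be sketched in Python as follows. This is only one possible reading of the paragraph above: it assumes each key frame has already been labelled with a fixed-region identifier and that the share of shooting time spent on each region is known. The names `rank_key_frames` and `dwell_fraction` are hypothetical.

```python
def rank_key_frames(key_frames, dwell_fraction):
    """Rank key frames, preferring regions not yet represented.

    key_frames: list of (frame_index, region_id) pairs.
    dwell_fraction: region_id -> share of shooting time spent on that region.
    """
    remaining = list(key_frames)
    ranked, seen_regions = [], set()
    while remaining:
        # First prefer regions that are not yet represented, then regions
        # on which the camera was fixed for the largest share of time.
        best = max(remaining,
                   key=lambda kf: (kf[1] not in seen_regions,
                                   dwell_fraction[kf[1]]))
        ranked.append(best)
        seen_regions.add(best[1])
        remaining.remove(best)
    return ranked

key_frames = [(30, "A"), (200, "A"), (450, "B"), (700, "C")]
dwell_fraction = {"A": 0.5, "B": 0.3, "C": 0.2}
```

With these inputs the second key frame from region A is ranked last, even though region A has the largest dwell share, because regions B and C are still unrepresented when it is considered.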

上掲の通り、キービデオ断片は、各キービデオフレームの前から後にかけて一群のビデオフレームを選択することで特定することができる。これに代え、ビデオサマリの総時間長やキービデオ断片の最短許容時間長に関する条件を設定し、それらの条件を満たすように選択することでもキービデオ断片を特定することができる。更なる条件を課すこと、例えば話者音声がそのキービデオ断片の開始部分や終了部分で途切れない、といった条件を課すこともできる As described above, a key video fragment can be identified by selecting a group of video frames from before to after each key video frame. Alternatively, the key video fragment can be identified by setting conditions relating to the total time length of the video summary and the minimum allowable time length of the key video fragment, and selecting to satisfy these conditions. It is possible to impose further conditions, for example that the speaker's voice is not interrupted at the beginning or end of the key video fragment.

キービデオ断片特定後、ビデオサマリ生成ステップ２６０ではビデオサマリが生成される。即ち、キービデオ断片同士を結合させてひとまとまりにすることでビデオサマリが生成される。本実施形態では、ディジタルビデオ時系列における登場順序に合致する順序でキービデオ断片同士が結合される。 After identifying the key video fragment, a video summary generation step 260 generates a video summary. That is, a video summary is generated by joining key video fragments together. In this embodiment, the key video fragments are combined in an order that matches the appearance order in the digital video time series.

ビデオサマリ表現子格納ステップ２７０では、そのビデオサマリの表現子がプロセッサ可アクセスメモリ内に格納される。プロセッサ可アクセスメモリ内に格納されるビデオサマリ表現子は、例えば、ビデオサマリを組成するディジタルビデオ時系列内ビデオフレーム群を指し示すフレーム指示メタデータである。フレーム指示メタデータは格納済の圧縮版ディジタルビデオ時系列と関連付けつつ格納することが可能なデータであり、ビデオサマリを組成する諸キービデオ断片の始点及び終点フレーム等を指し示している。この形態であれば、ビデオサマリ表現子の格納に必要な物理メモリ量を、フレーム指示メタデータの格納に必要なそれに抑えることができる。 In the video summary representation storage step 270, the video summary representation is stored in the processor accessible memory. The video summary expression stored in the processor-accessible memory is, for example, frame indication metadata indicating a digital video time-series video frame group composing the video summary. The frame indication metadata is data that can be stored in association with the stored compressed digital video time series, and indicates the start and end frames of the key video fragments that compose the video summary. In this form, the amount of physical memory required for storing the video summary expression can be reduced to that required for storing the frame instruction metadata.
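A minimal Python sketch of frame indication metadata is shown below. The JSON layout and the function names are hypothetical; the text only states that the metadata indicates the start and end frames of the key video fragments, and it does not prescribe an encoding.

```python
import json

def build_summary_metadata(fragments):
    """Serialize (start_frame, end_frame) pairs in order of appearance."""
    ordered = sorted(fragments)  # chronological order within the sequence
    return json.dumps({
        "video_summary": [{"start_frame": s, "end_frame": e} for s, e in ordered]
    })

def summary_frames(metadata):
    """Yield the frame numbers a player would show for the video summary."""
    for frag in json.loads(metadata)["video_summary"]:
        yield from range(frag["start_frame"], frag["end_frame"] + 1)
```

Because only a handful of integers is stored per fragment, the summary costs almost no additional memory; a player that understands the metadata reads the listed frame ranges from the stored sequence.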

プロセッサ可アクセスメモリ内に格納されるビデオサマリ表現子は、或いは、そのビデオサマリに対応する融合版ビデオ時系列である。これは、特定されたキービデオ断片を組成するビデオフレーム群を格納済の圧縮版ディジタルビデオ時系列から抽出し、それらビデオフレーム同士を融合させることで、新規に生成することができる。その際には、ときとして、圧縮版ディジタルビデオ時系列を部分的に復号することや、融合版ビデオ時系列を圧縮して圧縮版ビデオ時系列を生成することが必要になる。融合版ビデオ時系列にビデオデータだけでなくオーディオデータをも含めるには、オーディオデータを圧縮版ディジタルビデオ時系列から抽出する必要もある。 The video summary representation stored in the processor-accessible memory is alternatively a fused video time series corresponding to the video summary. This can be newly generated by extracting a video frame group composing the specified key video fragment from the stored compressed digital video time series and fusing the video frames together. In that case, it is sometimes necessary to partially decode the compressed version of the digital video time series or to compress the fused version of the video time series to generate a compressed version of the video time series. In order to include not only video data but also audio data in the fused video time series, it is also necessary to extract the audio data from the compressed digital video time series.

生成された圧縮版ビデオサマリをプロセッサ可アクセスメモリ内に格納する際には、対応する圧縮版ディジタルビデオ時系列のそれとは別のディジタルビデオファイル内にその圧縮版ビデオサマリを格納する形態を採ることができる。この形態ではそのディジタルビデオファイルがビデオサマリ表現子となるので、ビデオサマリ表現子を元々の圧縮版ディジタルビデオ時系列とは独立に視聴乃至共有することが可能である。ビデオサマリ表現子たるディジタルビデオファイルのフォーマットは、標準的なビデオプレーヤで再生可能なフォーマットにするのが望ましい。 When the generated compressed video summary is stored in the processor-accessible memory, it can be stored in a digital video file separate from that of the corresponding compressed digital video time series. In this form, the digital video file serves as the video summary expression, so the video summary expression can be viewed or shared independently of the original compressed digital video time series. The digital video file serving as the video summary expression is preferably in a format that can be played by a standard video player.

抽出されたビデオフレーム群を圧縮して圧縮版ビデオサマリを生成する際には、例えば、そのビデオフレーム群を再サンプリングすることで空間解像度を従前の値から新たな値へと変化させ、新たな空間解像度に係るビデオフレーム群を圧縮して圧縮版ビデオサマリを生成するのが望ましい。この再サンプリングは、高空間解像度で撮影されたビデオを共有する際に有益である。何故なら、含まれるビデオフレームの個数が少なくそのビデオフレームの空間解像度も低い圧縮版ビデオサマリ、即ち圧縮版ディジタルビデオ時系列よりも小サイズで共有しやすいビデオサマリが得られるからである。低空間解像度ビデオサマリは高解像度ビデオサマリ全体を伸張することなく生成することができる。伸張が必要なのは、ビデオサマリの生成に必要なビデオフレームのみである。 When the extracted video frames are compressed to generate the compressed video summary, it is desirable, for example, to resample the video frames so that the spatial resolution changes from its original value to a new value, and then to compress the video frames at the new spatial resolution. This resampling is useful when sharing video captured at high spatial resolution, because it yields a compressed video summary that contains fewer video frames at a lower spatial resolution, that is, a video summary that is smaller and easier to share than the compressed digital video time series. The low-spatial-resolution video summary can be generated without decompressing the entire high-resolution video summary; only the video frames needed to generate the video summary have to be decompressed.

同様に、抽出されたビデオフレーム群を圧縮して圧縮版ビデオサマリを生成する際に、そのビデオサマリを組成するビデオフレームを時間軸沿いに再サンプリングして、時間解像度を従前の値から新たな値へと変化させることもできる。 Similarly, when the extracted video frames are compressed to generate a compressed video summary, the video frames that compose the video summary are resampled along the time axis, and the time resolution is changed from the previous value to a new value. It can also be changed to a value.
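Temporal resampling to a lower frame rate can be sketched in Python as a choice of source frame indices. Simple frame dropping is an assumption made here; the text does not specify the resampling method, and the function name `resample_indices` is hypothetical.

```python
def resample_indices(n_frames, src_fps, dst_fps):
    """Indices of source frames kept when reducing src_fps to dst_fps."""
    n_out = int(n_frames * dst_fps / src_fps)
    return [int(i * src_fps / dst_fps) for i in range(n_out)]
```

For example, halving the frame rate of an 8-frame fragment from 30 to 15 frames per second keeps frames 0, 2, 4, and 6.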

FIG. 3 shows a video summary generation method according to another embodiment of the present invention, in which the user can preview the generated video summary and can request, by adjusting settings, that the video summary be regenerated. The digital video time series capture step 210, feature value determination step 220, digital video time series compression step 230, compressed digital video time series storage step 240, key video fragment identification step 250, video summary generation step 260, and video summary representation storage step 270 are all performed in the same manner as described with reference to FIG. 2. In this embodiment, however, the video summary generated in the video summary generation step 260 is displayed to the user in a video summary display step 262 before it is stored, and a user approval determination step 264 checks whether the user accepts it. If the user indicates satisfaction, the process proceeds to the video summary representation storage step 270 and continues as in FIG. 2. If the user indicates dissatisfaction, the user adjusts one or more settings in a user setting adjustment step 266, and a new video summary is generated. User settings can be adjusted via the user controller 34. The adjustable parameters can include the duration of the video summary, the minimum duration of a key video fragment, and the number of key video fragments included in the video summary. Once the user has adjusted the settings, the key video fragment identification step 250 and the video summary generation step 260 produce a new video summary based on the new settings. As those skilled in the art will appreciate, the user can repeat the preview and adjustment cycle until a satisfactory video summary is obtained.
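
The preview and adjustment cycle of FIG. 3 can be read as a simple loop. The following Python sketch is illustrative only: the names `SummarySettings`, `refine_summary`, `build`, and `review`, the representation of a summary as frame ranges, and the `max_rounds` cap are all assumptions introduced here and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Assumed representation: a summary is a list of (start_frame, end_frame) pairs.
Summary = List[Tuple[int, int]]


@dataclass(frozen=True)
class SummarySettings:
    summary_seconds: float       # target duration of the whole video summary
    min_fragment_seconds: float  # shortest allowed key video fragment
    max_fragments: int           # number of key video fragments to include


def refine_summary(
    build: Callable[[SummarySettings], Summary],
    review: Callable[[Summary, SummarySettings], Tuple[bool, SummarySettings]],
    settings: SummarySettings,
    max_rounds: int = 10,  # hypothetical safety cap, not part of the specification
) -> Summary:
    """Regenerate the summary until the reviewer accepts it or the cap is hit."""
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    for _ in range(max_rounds):
        summary = build(settings)                       # steps 250 and 260
        accepted, settings = review(summary, settings)  # steps 262, 264, 266
        if accepted:
            break
    return summary
```

Here `build` stands in for key video fragment identification and summary generation, and `review` stands in for the display, approval, and setting adjustment steps; it returns whether the user accepted the summary together with the (possibly adjusted) settings.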

FIG. 4 shows a video summary generation method according to another embodiment of the present invention, in which data specifying the video summary is stored as metadata associated with the stored digital video time series. First, in a digital video time series capture step 410, a digital video time series comprising a plurality of video frames is captured with a digital video capture device. In a digital video time series storage step 420, the digital video time series is stored in a processor-accessible memory. Feature values such as those determined in the feature value determination step 220 of FIG. 2 may or may not be stored together with the digital video time series.

Next, in a key video fragment identification step 430, one or more key video fragments, each made up of one or more video frames of the stored digital video time series, are identified. Step 430 can, for example, follow the key video fragment identification step 250 described with reference to FIG. 2 and operate on stored feature values. Alternatively, step 430 can include directly analyzing the frames of the stored digital video time series with a video analysis algorithm. In that case, step 430 will need to decompress the digital video time series to the extent the analysis requires. Any method known in the art, including those described above with reference to FIG. 2, can be used to identify the key video fragments.

In a video summary generation step 440, as in the video summary generation step 260 of FIG. 2, the identified key video fragments are combined to form the video summary. In a video summary indication metadata storage step 450, metadata indicating the video frames that correspond to the video summary is stored in association with the stored digital video time series, thereby specifying where the video summary is located in the processor-accessible memory.

The key video fragment identification step 430, video summary generation step 440, and video summary indication metadata storage step 450 can also be performed on a digital video capture device or processor 20 different from the one that performs the digital video time series capture step 410 and the digital video time series storage step 420. For example, the image memory 30 in which the digital video time series is stored in step 420 may be a removable memory card; the card is used to carry the digital video time series to another device, which then performs steps 430 to 450. The digital video time series may be carried to another digital video capture device similar to the one shown in FIG. 1, or it may be loaded onto another device or system, such as the computer 40 shown in FIG. 1 or a video editing system, and processed there to generate the video summary.

The key video fragment identification step 430 can include extracting one or more video frames from the stored digital video time series and analyzing the extracted video frames to determine feature values associated with them. Step 430 can further include extracting one or more audio samples from the stored digital video time series and analyzing them to determine feature values associated with the audio samples. Analysis of these feature values helps identify the key video fragments.

The video summary indication metadata storage step 450 can store frame indication metadata identifying the video frames of the digital video time series that make up the video summary. Storing the video summary as frame indication metadata is advantageous because the physical memory needed to store the video summary is limited to the minimal amount needed for the metadata itself. Preferably, the frame indication metadata is stored in association with the stored compressed digital video time series. For example, a video summary representation indicating the start frame and end frame of each key video fragment in the video summary can be stored as metadata within the file containing the stored compressed digital video time series. Alternatively, the frame indication metadata can be stored in a file separate from the file containing the stored compressed digital video time series.
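
As an illustration of how little data such frame indication metadata requires, the following Python sketch stores one start frame and one end frame per key video fragment and serializes them for the separate-file variant. The specification does not define a storage format; the JSON encoding, the class names `KeyFragment` and `SummaryMetadata`, and all field names are assumptions made for this example.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class KeyFragment:
    start_frame: int  # first frame of the fragment (inclusive)
    end_frame: int    # last frame of the fragment (inclusive)


@dataclass
class SummaryMetadata:
    video_file: str               # hypothetical name of the stored video file
    fragments: List[KeyFragment]  # key video fragments in playback order

    def to_json(self) -> str:
        """Serialize the metadata, e.g. for a separate metadata file."""
        return json.dumps(asdict(self))

    @staticmethod
    def from_json(text: str) -> "SummaryMetadata":
        raw = json.loads(text)
        fragments = [KeyFragment(**f) for f in raw["fragments"]]
        return SummaryMetadata(raw["video_file"], fragments)

    def frame_count(self) -> int:
        """Number of video frames the summary refers to."""
        return sum(f.end_frame - f.start_frame + 1 for f in self.fragments)
```

A summary with two fragments then costs four integers plus a file name, regardless of how many frames it covers.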

In the video summary indication metadata storage step 450, metadata may additionally be stored that indicates the audio samples corresponding to the key video fragments that make up the video summary.

In the video summary indication metadata storage step 450, metadata may additionally be stored that indicates video transition effects to be applied at the boundaries between the identified key video fragments. Video transition effects can include, for example, fading from one key video fragment into the next (a cross dissolve) or fading from one key video fragment to a white or black screen and then into the next key video fragment. Other transition effects, such as cross wipes, growing or shrinking circles, horizontal or vertical blinds, and checkerboard transitions, can also be specified. As those skilled in the art will appreciate, these are only examples, and many other transition effects can be used in practicing the present invention. Using video transition effects avoids abrupt transitions between segments of the digital video time series and improves the overall visual quality of the video summary.

In the video summary indication metadata storage step 450, metadata may additionally be stored that indicates audio transition effects to be applied at the boundaries between the key video fragments. Audio transition effects can include, for example, fading from sound to silence and fading from silence back to sound. Using such audio transition effects avoids abrupt transitions between audio segments of the digital video time series and improves the overall audio quality of the video summary.

FIG. 5 shows the flow of a video summary display method performed on a video playback system according to one embodiment of the present invention. The illustrated method is suited to displaying a video summary whose location is specified by metadata, rather than one stored as a directly playable digital video file.

First, in a data reading step 510, data associated with the stored video summary is read, specifically data indicating which video frames of the stored digital video time series make up the video summary. For example, this data can be extracted from metadata stored in the same digital video file as the digital video time series. Alternatively, it can be extracted from a separate file associated with the stored digital video time series.

Next, in a video frame extraction step 520, the video frames that make up the key video fragments, and thus the video summary, are extracted from the stored digital video time series. When the digital video time series is stored in compressed form, as is usually the case, step 520 also decompresses it.

In a video summary generation step 530, the video summary is formed from the video frames extracted for each key video fragment. In its simplest form, step 530 simply concatenates the extracted video frames into a continuous video clip.
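
The simplest form of step 530 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the decoded video is modeled as an in-memory sequence of frames, each key video fragment is an inclusive `(start, end)` frame range, and the function name `assemble_summary` is hypothetical.

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def assemble_summary(
    frames: Sequence[Frame],
    fragments: List[Tuple[int, int]],
) -> List[Frame]:
    """Concatenate the frames of each key video fragment in playback order.

    `fragments` holds inclusive (start_frame, end_frame) index pairs,
    which is an assumed encoding of the frame indication metadata.
    """
    summary: List[Frame] = []
    for start, end in fragments:
        if not 0 <= start <= end < len(frames):
            raise ValueError(f"fragment ({start}, {end}) is outside the video")
        summary.extend(frames[start : end + 1])
    return summary
```

A real player would decode only the needed frames from the compressed stream rather than hold the whole video in memory; the list here only keeps the sketch short.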

Once the video summary has been formed, a video summary display step 540 displays it on a softcopy display. The softcopy display can be, for example, a review screen on the digital video capture device, a display connected to a computer, or a television screen.

The data reading step 510 can also obtain an indication of the audio samples that correspond to the video summary. In this case, the video frame extraction step 520 additionally extracts, from the audio samples of the stored digital video time series, those that correspond to the video summary.

The data reading step 510 can further obtain an indication of the video transition effects used in the video summary, an indication of the audio transition effects used in the video summary, and the like. If an indication of a video transition effect is obtained in the data reading step 510, the video summary generation step 530 modifies the extracted video frames according to the indicated transition effect, producing a video summary edited to exhibit the desired transitions. For example, suppose the data read indicates that a fade-to-black video transition effect is to be applied to the last 15 frames of a key video fragment in the video summary. The video frames to which the effect applies are first extracted from the stored digital video time series, and their data is then modified so that they fade gradually to black. Likewise, the first 15 frames of the next key video fragment are extracted, and their data is modified so that they fade gradually from black back to normal video.
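
The 15-frame fade described above can be sketched as follows. The specification says only that the fade is gradual; the linear gain ramp, the modeling of a frame as a flat list of pixel intensities, and the function names `fade_to_black` and `fade_from_black` are assumptions made for this Python example.

```python
from typing import List

Frame = List[float]  # one frame as a flat list of pixel intensities (assumed layout)


def fade_to_black(fragment: List[Frame], n: int = 15) -> List[Frame]:
    """Scale the last n frames so the fragment ends on a fully black frame."""
    n = min(n, len(fragment))
    if n == 0:
        return list(fragment)
    keep, tail = fragment[:-n], fragment[-n:]
    # Gains run (n-1)/n, ..., 1/n, 0: a linear ramp, which is an assumption.
    faded = [[p * (n - 1 - i) / n for p in frame] for i, frame in enumerate(tail)]
    return keep + faded


def fade_from_black(fragment: List[Frame], n: int = 15) -> List[Frame]:
    """Scale the first n frames so the fragment starts on a fully black frame."""
    n = min(n, len(fragment))
    head, rest = fragment[:n], fragment[n:]
    # Gains run 0, 1/n, ..., (n-1)/n, mirroring fade_to_black.
    faded = [[p * i / n for p in frame] for i, frame in enumerate(head)]
    return faded + rest
```

Because the gains are computed from the metadata at playback time, the stored video frames themselves remain unmodified.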

When the softcopy display has one or more speakers for audio output, audio transition effects can similarly be applied before the audio is output through the speakers. For example, suppose the data read indicates that a fade-to-silence audio transition effect is to be applied to the last 8000 audio samples of a key video fragment. The audio samples to which the effect applies are first extracted from the stored digital video time series, and their data is then modified so that they fade gradually to silence. Likewise, the first 8000 audio samples of the next key video fragment are extracted, and their data is modified so that they fade gradually from silence back to normal audio.

The present invention can use a video summary generation method in which metadata indicating the video frames and audio samples included in the video summary, and metadata indicating the video and audio transition effects applied to them, are stored in the same digital video file as the digital video time series. The present invention can also use a video summary display system that reads the corresponding metadata to determine which video frames and audio samples make up the video summary and must therefore be extracted from the stored digital video time series, and which video and audio transition effects must be applied to the data before the video is displayed and the audio is output through the speakers. Such methods and systems have the advantage that the video summary can be displayed without storing it in a separate video file.

The video display device can be configured so that the user can choose between viewing the original video and viewing the video summary. For example, a digital video camcorder can have separate play buttons, one for playing the original video time series and one for playing the video summary. The video summary can also be played in response to the fast-forward button. That is, video summary playback can replace the conventional fast-forward function as a means of moving quickly through a video time series. Whereas conventional fast-forward merely samples the frames of the video time series in time, video summary playback is more useful because it presents only those portions of the video time series that the user is likely to want to see.

The video playback system described with reference to FIG. 5 is a "smart" video player in the sense that it can extract the video summary metadata from the stored digital video time series and process that metadata to determine how to display the video summary version of the digital video time series. It can also offer the user a choice between viewing the entire digital video time series and viewing the video summary version. A conventional video player, by contrast, cannot recognize the relationship between the video summary and its corresponding metadata. It can nevertheless read and display the stored original digital video time series: the metadata accompanying the stored digital video time series is simply ignored, and the entire original digital video time series can be played.

As shown in FIGS. 1 and 2, one embodiment of the present invention is a digital video capture system (e.g., the digital camera 10) that captures a digital video time series and generates a corresponding video summary. The system of this embodiment comprises an image sensor 14 for capturing video frames; an optical system (e.g., the lens 4 with its associated adjustable aperture and adjustable shutter 6) that forms an image of the scene on the image sensor; a processor 20 that generates the video summary and stores it in a processor-accessible memory; an image display 32 for viewing the digital video time series; and means (e.g., the user controller 34) for letting the user select whether the stored digital video time series or the stored video summary representation is displayed on the softcopy display. The system can take the form of, for example, a digital video camera, a digital camera having both still and video capture modes, or a laptop or desktop computer with a webcam.

The processor 20 in the digital video capture system executes software that performs the steps of the method shown in FIG. 2. Specifically, in response to an instruction from the user, the processor 20 captures a digital video time series comprising a plurality of video frames (digital video time series capture step 210); automatically analyzes some or all of the video frames during capture to determine one or more feature values (feature value determination step 220); compresses the digital video time series (digital video time series compression step 230); stores the resulting compressed digital video time series in the processor-accessible memory (compressed digital video time series storage step 240); automatically analyzes the feature values, without decompressing the stored compressed digital video time series, to identify one or more key video fragments made up of video frames of the digital video time series (key video fragment identification step 250); combines the key video fragments to form the video summary (video summary generation step 260); and stores a representation of the video summary in the processor-accessible memory (video summary representation storage step 270).

The processor 20 in the digital video capture system can also automatically analyze some or all of the audio samples of the digital video time series during capture to determine one or more feature values.

The processor 20 in the digital video capture system can also store the determined feature values in the processor-accessible memory, in particular as metadata associated with the stored compressed digital video time series.

Alternatively, the processor 20 in the digital video capture system can store the determined feature values in the processor-accessible memory in a separate file associated with the stored compressed digital video time series.

The processor 20 in the digital video capture system can also automatically analyze the feature values together with user input to identify the one or more key video fragments. The user input can be obtained via the user controller 34 and can include constraints on the duration of the video summary, the minimum duration of a key video fragment in the video summary, and the number of key video fragments in the video summary.

The processor 20 in the digital video capture system can store the representation of the video summary in the processor-accessible memory. For example, it can generate metadata indicating which video frames of the digital video time series make up the video summary and store that metadata in association with the stored compressed digital video time series.

The video summary is displayed on the softcopy image display 32 or on an external video display 46 in response to the user's operation of the user controller 34 of the digital video capture system. When the video summary is stored as metadata indicating which video frames of the digital video time series make up the video summary, it is extracted and displayed in the manner shown in FIG. 5. Specifically, the digital video capture system acts as a smart video player and extracts the video and audio data indicated by the metadata.

The user interface can also be configured so that the user can view the video summary and decide whether it is acceptable. Because the user may not like the video summary, the digital video capture system preferably provides means, via the user controller 34, for the user to adjust the video summary settings. Adjustable user settings include the duration of the video summary, the minimum duration of a key video fragment in the video summary, and the number of key video fragments in the video summary. The system can also be configured to automatically generate several candidate video summaries by varying the processing or the settings used in the key video fragment identification step 250. Because the user can then choose a suitable video summary from several candidates, the likelihood that no satisfactory video summary is produced is reduced, which increases the overall benefit of the invention.

The digital video capture system described above has the advantage that a set of video summaries can be generated and displayed quickly, without decompressing the stored compressed digital video time series and without encoding the video summary as a new file.

A configuration in which the video summary is stored in a separate file is better suited to sharing the video summary over a network and to viewing it on devices that lack a smart video display capable of interpreting the metadata in the compressed digital video time series and extracting the video summary. In this case, the video summary representation generated by the processor 20 in the digital video capture system is preferably stored as a digital video file in a format playable by conventional video players. The video summary can always be created as a separate file, or the user can specify the storage form through the user controller 34; for example, the video summary can be stored as a separate file when the user presses a share button on the user interface.

In a digital video capture system of the type that stores the video summary as a separate file, the processor 20 can store the video summary representation in the processor-accessible memory as a digital video file. For example, the processor 20 executes software that decompresses some or all of the stored compressed digital video time series to extract the video frames corresponding to the video summary, compresses those video frames to form a compressed video summary, and stores the compressed video summary in the processor-accessible memory. The processor 20 can additionally decompress some or all of the stored compressed digital video time series to extract the audio samples corresponding to the video summary, compress those audio samples, and include them in the compressed video summary.

When sharing video files over a computer network, it is beneficial to reduce the spatial or temporal resolution of the digital video time series from its original value before sharing. Video frames with a spatial or temporal resolution different from the original are obtained by having the processor 20 in the digital video capture system resample the extracted video frames. One advantage of the present invention is that a lower-resolution video summary can thus be generated without decompressing the entire original compressed digital video time series. Because only the compressed digital video data relevant to the video summary needs to be decompressed, transcoding from the original compressed digital video time series to the video summary is faster.
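
Temporal resampling of an extracted key video fragment can be illustrated by choosing which source frames to keep. The Python sketch below is one simple possibility, not the method of the specification: it assumes nearest-earlier-frame selection, inclusive frame ranges, and a hypothetical function name `resample_fragment`, and it ignores codec details such as the reference frames a decoder may need.

```python
from typing import List


def resample_fragment(start: int, end: int, src_fps: int, dst_fps: int) -> List[int]:
    """Return the source frame indices to keep so the fragment plays at dst_fps.

    `start` and `end` are inclusive frame indices of one key video fragment.
    Lowering the rate drops frames evenly; raising it repeats frames.
    """
    if src_fps <= 0 or dst_fps <= 0:
        raise ValueError("frame rates must be positive")
    if end < start:
        raise ValueError("end must not precede start")
    n_src = end - start + 1
    n_dst = max(1, round(n_src * dst_fps / src_fps))
    # Integer arithmetic spreads the kept frames evenly over the fragment.
    return [start + (k * n_src) // n_dst for k in range(n_dst)]
```

For instance, a fragment covering frames 30 to 59 captured at 30 frames per second and resampled to 15 frames per second keeps every second frame, so only the frames inside the fragment are ever touched.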

When sharing video files over a network, the file size can also be reduced by compressing more aggressively. Having the processor 20 in the digital video capture system compress the extracted video frames with more aggressive compression settings than those used for the stored compressed digital video time series yields a more highly compressed video summary. One advantage of the present invention is that a more highly compressed video summary can thus be generated without decompressing the entire original compressed digital video time series.

The digital video capture system can also be configured as a digital video camera that includes a user interface in addition to the image sensor 14, the optical system (lens 4), the processor 20, and a soft copy display (image display 32). In particular, the user interface can offer the user a choice between displaying the stored compressed digital video time series and displaying the stored video summary on the soft copy display. In this case, immediately after shooting a video with the digital video camera, the user can view a summary version of the video, optionally request modifications to the video summary, and save the video summary as a separate file for sharing.

The digital video camera can also be connected to an external soft copy display, on which the stored compressed digital video time series or the stored video summary is displayed for the user to view.

The soft copy display can also be a component of a separate video viewing system that has access to the stored digital video time series and the video summary, and a user interface can be provided on the digital camera that offers the user a choice of whether the stored digital video time series or the stored video summary is displayed on the soft copy display.

The processor 20 in the digital video capture system may also be configured to generate the video summary while applying video transition effects between key video fragments. The result of a video transition effect can be computed at display time, which is advantageous in configurations where metadata associated with the stored compressed digital video time series identifies the location of the video summary.
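As an illustration of computing a transition at display time rather than storing rendered frames, the following sketch blends the last frames of a key video fragment toward a solid level. It is an assumed example, not the method of the specification: the linear blend, the function names `fade_frame` and `render_fade_out`, and the representation of a frame as a flat list of pixel values are all hypothetical.

```python
def fade_frame(pixels, target, alpha):
    """Blend one frame toward a solid target level.

    alpha = 0 leaves the frame unchanged; alpha = 1 yields the target level.
    """
    return [(1 - alpha) * p + alpha * target for p in pixels]


def render_fade_out(frames, target=0.0):
    """Apply a linear fade over the given frames at display time.

    frames: frames at the end of a key video fragment (flat pixel lists).
    target: 0.0 for a fade to black, 255.0 for a fade to white (assumed scale).
    """
    count = len(frames)
    faded = []
    for index, frame in enumerate(frames):
        alpha = (index + 1) / count  # reaches 1.0 on the final frame
        faded.append(fade_frame(frame, target, alpha))
    return faded
```

With this arrangement the stored metadata only needs to name the effect and its position; the stored video data itself is left untouched.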

The processor 20 in the digital video capture system may also analyze the key video fragments and automatically select from among a plurality of available video transition effects. Whether a fade to white, a fade to black, or some other video transition effect is appropriate depends on the video content, so information obtained by automatically analyzing the key video fragments is useful for identifying the transition effects between key video fragments that give the best visual result.
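The specification does not state which analysis drives the automatic selection. Purely as a hypothetical sketch, the choice could be based on the average brightness of the frames on either side of a fragment boundary. The thresholds, the effect names, and the function `choose_transition` below are assumptions introduced for illustration.

```python
def mean_level(frame):
    """Average pixel value of a frame given as a flat list (0 to 255 assumed)."""
    return sum(frame) / len(frame)


def choose_transition(last_frame, first_frame, bright=170, dark=85):
    """Pick a transition for the boundary between two key video fragments.

    last_frame: final frame of the outgoing fragment.
    first_frame: first frame of the incoming fragment.
    bright, dark: assumed brightness thresholds, not taken from the source.
    """
    level = (mean_level(last_frame) + mean_level(first_frame)) / 2
    if level >= bright:
        return "fade_to_white"
    if level <= dark:
        return "fade_to_black"
    return "cross_dissolve"
```

Any other content measure, such as motion or color, could replace brightness here; the point is only that the decision is derived from the key video fragments themselves.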

Audio transition effects can be handled in the same way. The processor 20 in the digital video capture system can generate the video summary while applying audio transition effects between key video fragments. The result of an audio transition effect can be computed at display time.

The invention can also be practiced in a form in which the processor 20 in the digital video capture system executes a variant of the key video fragment identification step 250, for example one that automatically analyzes feature values and uses the results together with information contained in the stored compressed digital video time series to identify the key video fragments. At some cost in speed, this form allows the key video fragments that make up the video summary to be identified from more information. In many cases, moreover, the processor 20 cannot compute the desired feature values at capture time because no processing time is available. Even then, partially decompressing the stored compressed digital video time series makes it possible to extract information useful for determining the key video fragments. For example, audio information or video information contained in the stored compressed digital video time series can be extracted.

The invention also has the feature of assisting manual trimming. Manual trimming is an editing function available on many digital video capture devices; it lets the user trim the start and end points of a video as desired after reviewing the captured video. Under the present invention, a video summary consisting of a single key video fragment may be generated, in which case the start and end points of that video summary are presented as suggested points for manual trimming.
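The trimming suggestion can be pictured with a short sketch. It assumes, hypothetically, that the video summary is represented as a list of (start_frame, end_frame) pairs, one per key video fragment; the function name and the returned dictionary keys are illustrative only.

```python
def suggest_trim_points(key_fragments):
    """Return suggested manual-trim points, or None when not applicable.

    key_fragments: list of (start_frame, end_frame) tuples describing the
    video summary (an assumed representation).
    """
    if len(key_fragments) != 1:
        # A suggestion is only offered for a single-fragment summary.
        return None
    start_frame, end_frame = key_fragments[0]
    return {"trim_start": start_frame, "trim_end": end_frame}
```

For example, a summary made of the single fragment (120, 480) would yield frame 120 and frame 480 as the suggested trim points, which the user remains free to adjust.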

Note that the procedure of determining feature values at capture time and using the results after capture is also suited to applications other than video summary generation. Configurations that take other approaches, including determining key video frames from the feature values, are also considered to be within the technical scope of the present invention.

Many digital video compression algorithms divide a digital video time series into multiple groups of video frames for encoding. Each group of video frames contains one independently encoded video frame (I frame) and several predictively encoded video frames (P frames), which require information about one or more other video frames in order to be decoded. Within a group, the single I frame comes first and is followed by several P frames, each generated by prediction based on that I frame. One group of encoded video frames ends, and the next begins, at the I frame of the next group. With such a compression scheme, I frames serve as access points into the compressed digital video time series, and a run of frames starting at an I frame can be extracted. Specifically, an entire group of encoded video frames can be extracted from the compressed digital video time series and transcoded into the video summary by decoding only the header information that indicates the location and number of compressed bytes making up that group. When generating a video summary, it is therefore beneficial to require that the first frame of each key video fragment be an I frame and that each fragment contain a whole number of groups of encoded video frames. Imposing these constraints makes it possible to generate the video summary with minimal decompression of the original compressed digital video time series.
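The two constraints above (start on an I frame, contain a whole number of frame groups) can be sketched as a boundary adjustment. The sketch makes assumptions the source does not state: every group has the same fixed length `gop_size`, the first frame of each group is its I frame, and a fragment is widened rather than narrowed to reach group boundaries. The name `align_to_groups` is hypothetical.

```python
def align_to_groups(start, end, gop_size, total_frames):
    """Widen the frame range [start, end) to whole groups of frames.

    start, end: proposed key video fragment, end exclusive.
    gop_size: assumed fixed number of frames per group (I frame first).
    total_frames: length of the stored video, used to clamp the end.
    """
    # Move the start back to the I frame that opens its group.
    aligned_start = (start // gop_size) * gop_size
    # Move the end forward to the next group boundary (ceiling division).
    aligned_end = -(-end // gop_size) * gop_size
    return aligned_start, min(aligned_end, total_frames)
```

With 15-frame groups, a proposed fragment covering frames 17 to 50 becomes frames 15 to 60, so that each group it touches can be copied as a unit using only its header information.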

A computer program product for carrying out the method of the present invention can be stored on one or more storage media, for example: magnetic storage media such as magnetic disks (e.g., floppy disks) and magnetic tape; optical storage media such as optical discs, optical tape, and machine-readable bar codes; solid-state electronic storage devices such as RAM and ROM; or any other tangible device or medium that can be used to store a computer program that controls one or more computers to carry out the method of the present invention.

1 flash
4 lens
6 adjustable aperture and adjustable shutter
8 zoom and focus motor drivers
10 digital camera
12 timing generator
14 image sensor
16 ASP and A/D converter
18 buffer memory
20 processor
22 audio codec
24 microphone
25 pressure sensor
26 speaker
27 accelerometer
28 firmware memory
30 image memory
32 image display
34 user controls
36 display memory
38 wired interface
40 computer
42 tilt sensor
44 video interface
46 video display
48 interface/charger
50 wireless modem
52 radio frequency band
58 wireless network
70 Internet
72 photo service provider
210, 410 digital video time series capture step
220 feature value determination step
230 digital video time series compression step
240 compressed digital video time series storage step
250, 430 key video fragment identification step
260, 440, 530 video summary generation step
262, 540 video summary display step
264 user approval determination step
266 user setting adjustment step
270 video summary representation storage step
420 digital video time series storage step
450 video summary indication metadata storage step
510 data reading step
520 video frame extraction step

Claims

A method for storing a video summary of a digital video time series captured with a digital video photographing device, the method comprising:
capturing a digital video time series having a plurality of video frames with the digital video photographing device;
storing the digital video time series in a processor-accessible memory;
identifying, from the stored digital video time series, one or more key video fragments each corresponding to a group of video frames;
generating a video summary by combining the key video fragments; and
specifying the storage location of the video memory in the processor-accessible memory by storing metadata indicating a video frame group corresponding to the video summary in association with the stored digital video time series.

The method of claim 1, wherein the metadata includes metadata indicating a group of audio samples corresponding to a video summary.

The method of claim 1, comprising storing metadata indicative of a video transition effect applied to a boundary between key video fragments.

The method of claim 1, comprising storing metadata indicative of an audio transition effect applied to the boundary between key video fragments.

A system for displaying video frames corresponding to a video summary, comprising:
a soft copy display used to display video frames; and
a processor,
wherein the processor executes:
a reading step of reading, from among the data associated with a stored digital video time series, data indicating a video frame group corresponding to the video summary;
an extraction step of extracting the video frame group corresponding to the video summary from the stored digital video time series; and
a display step of displaying the video frame group corresponding to the video summary on the soft copy display.

The system according to claim 5, wherein in the reading step, data indicating an audio sample group corresponding to a video summary is also read.

The system according to claim 6, wherein a group of audio samples corresponding to the video summary is extracted from the stored digital video time series.

The system according to claim 5, wherein in the reading step, data indicating a video transition effect used in a video summary is also read.

The system according to claim 8, wherein, in the display step, a video transition effect is applied to a video frame group prior to display.

The system according to claim 5, wherein the listed data associated with the stored digital video time series is metadata relating to the stored digital video time series.

The system of claim 5, wherein the listed data associated with the stored digital video time series is stored in a separate file.