JP2010245856A

JP2010245856A - Video editing device

Info

Publication number: JP2010245856A
Application number: JP2009092620A
Authority: JP
Inventors: Yoshihiro Morioka; 芳宏森岡; Naoya Kawashita; 直哉川下; Kei Chikaishi; 圭近石; Takeshi Hasegawa; 武志長谷川; Kazushi Shintani; 和司新谷; Junichi Nakahara; 淳一中原; Shinji Mikami; 進児三上; Katsuhiko Yoshida; 勝彦吉田; Makoto Yamashita; 誠山下; Kenji Matsuura; 賢司松浦
Original assignee: Panasonic Corp
Current assignee: Panasonic Corp
Priority date: 2009-04-07
Filing date: 2009-04-07
Publication date: 2010-10-28

Abstract

PROBLEM TO BE SOLVED: To provide a video editing device capable of digest playback that gathers comparatively important scenes while preventing scenes that share the same features and differ little from one another from appearing in succession.

SOLUTION: When generating a digest, the video editing device evaluates each scene according to attribute information added when the video was generated and attribute information obtained by analyzing the content of the video. At the same time, it determines the relevance between scenes and selects the scenes to be extracted according to that relevance.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a video editing apparatus, and more particularly to a video editing apparatus that enables digest (summary) playback of video.

Patent Document 1 discloses an electronic camera that enables digest playback of a captured moving image with various edits applied, without requiring burdensome editing input. The electronic camera generates a random number and, based on that number, applies random editing to the captured moving image. The camera can thus automatically apply various random edits and play back a digest of the moving image without any editing input from the user.

Patent Document 2 discloses a video shooting apparatus that evaluates scenes based on metadata (attribute information) and, based on the evaluation results, easily generates a digest (summary video) that narrows down the number of scenes and clips in the captured video.

Patent Document 1: JP 2005-260749 A
Patent Document 2: JP 2008-227860 A

However, Patent Document 1 generates a digest (summary) video by randomly editing the captured video, so the digest is generated without regard to the video content of the extracted scenes.

Patent Document 2 generates a digest based on the video content of each scene, but it does not consider the relevance between the extracted scenes. A digest consisting solely of scenes of the same kind, such as zoom-ups, may therefore be generated.

The present invention solves the above problems and enables more suitable digest generation by also considering the relevance between the extracted scenes when generating a digest.

The video editing apparatus of the present invention comprises: an attribute information generation unit that divides video into scenes and generates attribute information for each scene; a scene analysis unit that extracts scenes to be played back based on an evaluation of each scene derived from the attribute information and on the relevance between a plurality of scenes; and a reproduction information generation unit that generates reproduction information recording information about the scenes to be played back.

This makes it possible to extract scenes according to how the contents (attributes) of the scenes relate to one another, enabling more suitable digest generation.

Furthermore, when two candidate scenes share a common attribute, the scene analysis unit may extract only one of them. This prevents scenes with the same attribute from being extracted redundantly into the digest.

Furthermore, even when two candidate scenes have different attributes, the scene analysis unit may decide whether both scenes need to be extracted based on the relevance between those different attributes.

This makes it possible to judge the relevance between scenes according to the type of attribute when extracting scenes, enabling more suitable digest generation.
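The two selection rules described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the `Scene` fields, the `RELATION` table values, the 0.5 threshold, and the greedy ordering are all assumptions introduced here for illustration. A shared attribute is treated as full relevance, and different attributes fall back to a hypothetical relation coefficient.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Scene:
    scene_id: int    # position of the scene in shooting order (hypothetical)
    score: float     # per-scene evaluation (hypothetical scale)
    attribute: str   # dominant attribute label, e.g. "zoom_up" (hypothetical)

# Hypothetical relation coefficients between *different* attributes (0 to 1).
RELATION = {
    frozenset({"zoom_up", "zoom_down"}): 0.8,
    frozenset({"zoom_up", "face"}): 0.2,
}

def relevance(a: Scene, b: Scene) -> float:
    """Return 1.0 for a shared attribute, else the table value (default 0.0)."""
    if a.attribute == b.attribute:
        return 1.0
    return RELATION.get(frozenset({a.attribute, b.attribute}), 0.0)

def select_scenes(scenes, max_scenes, threshold=0.5):
    """Pick scenes greedily by score, skipping any candidate whose relevance
    to an already chosen scene reaches the threshold."""
    chosen = []
    for cand in sorted(scenes, key=lambda s: s.score, reverse=True):
        if len(chosen) >= max_scenes:
            break
        if all(relevance(cand, c) < threshold for c in chosen):
            chosen.append(cand)
    return sorted(chosen, key=lambda s: s.scene_id)  # restore shooting order
```

With these assumed values, a second zoom-up scene and a closely related zoom-down scene are both skipped once a higher-scoring zoom-up scene has been chosen, while a face scene is still admitted.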

According to the present invention, more suitable digest generation becomes possible by also considering the relevance between the extracted scenes when generating a digest.

External view of the video camera
Hardware configuration diagram of the video camera
Functional configuration diagram of the video camera
Diagram showing the relationship between clips, scenes, and frames of captured video
Diagram showing information that identifies a scene
Diagram showing the relationship between attribute information and evaluation
Diagram showing the result of assigning evaluations to scenes
Diagram showing extraction starting from the most highly evaluated scenes
Diagram showing the relationship between zoom-up/zoom-down and scenes
Example flowchart for determining the relevance between scenes
Diagram showing the relationship between scene identification information and the attributes each scene holds
Diagram showing the relationship between pairs of attributes and relation coefficients

(First Embodiment)
<1. Configuration of the video editing apparatus>
This embodiment is described using a video editing apparatus that edits video. Examples of video editing apparatuses include a TV recorder that records TV programs and plays back digests of the recorded programs, and a video camera (camcorder) that shoots video and can play back a digest of the captured video. FIG. 1 is an external view of the video camera 100. In this embodiment, the video camera 100 is described as the video editing apparatus.

FIG. 2 outlines the hardware configuration inside the video camera of FIG. 1. The video camera 100 includes a lens group 200, an image sensor 201, a video ADC (Analog-to-Digital Converter) 202, a video signal conversion IC 203, a CPU 204, a clock 205, a lens control module 206, an attitude detection sensor 207, an input button 208, a display 209, a speaker 210, an output I/F (Interface) 211, a compression/decompression IC 212, a ROM (Read Only Memory) 213, a RAM (Random Access Memory) 214, an HDD (Hard Disk Drive) 215, an audio ADC (Analog-to-Digital Converter) 216, and a microphone 217.

The lens group 200 adjusts the light incident from the subject in order to form a subject image on the image sensor 201. Specifically, the focal length and zoom (image magnification) are adjusted by changing the distances between a plurality of lenses with various characteristics. These adjustments may be made manually by the photographer or automatically under control of the CPU 204 or the like, described later.

The image sensor 201 converts light incident through the lens group 200 into an electrical signal. A CCD, a CMOS sensor, or the like can be used as the image sensor.

The video ADC 202 converts the analog electrical signal output by the image sensor 201 into a digital signal.

The video signal conversion IC 203 converts the digital signal output by the video ADC 202 into a predetermined video signal such as NTSC or PAL.

The CPU 204 controls the video camera 100 as a whole. Examples of this control include lens control, in which the focal length and zoom described above are controlled through the lens control module 206 to regulate the light incident on the image sensor 201; input control for external inputs from the input button 208, the attitude detection sensor 207, and the like; and operation control of the compression/decompression IC 212. The CPU 204 executes these control algorithms in software or the like.

The clock 205 outputs a clock signal that serves as the timing reference for processing in circuits operating within the video camera 100, such as the CPU 204. A single clock or a plurality of clocks may be used, depending on the integrated circuits used and the data handled. The clock signal of a single oscillator may also be multiplied by an arbitrary factor.

The lens control module 206 detects the state of the lens group 200 and operates the lenses under control of the CPU 204. The lens control module 206 includes a lens control motor and a lens position sensor. The lens position sensor detects the distances or positional relationships between the lenses that make up the lens group 200, and the detected position information is sent to the CPU 204. Based on information from the lens position sensor and from other components such as the image sensor 201, the CPU 204 sends the lens control motor a signal for positioning the lenses appropriately. Based on the control signal from the CPU 204, the lens control motor drives the motor that moves the lenses. As a result, the distances between the lenses of the lens group 200 change, adjusting the focal length and zoom so that the incident light passing through the lens group 200 forms the intended subject image on the image sensor 201.
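The sensor, CPU, and motor interaction described above amounts to a feedback loop. The following Python sketch shows one simple form such a loop could take, assuming a single scalar lens position, a bounded motor displacement per cycle, and a known target position. The function names, units, and the bounded-step rule are illustrative assumptions and are not taken from the patent.

```python
def motor_step(current_pos: float, target_pos: float, max_step: float = 1.0) -> float:
    """Return the signed displacement to command for one control cycle,
    limited to what the (hypothetical) motor can move per cycle."""
    error = target_pos - current_pos
    return max(-max_step, min(max_step, error))

def drive_lens(current_pos, target_pos, max_step=1.0, tolerance=1e-6, max_cycles=1000):
    """Repeat: read position, compute command, move, until the target is reached.
    Returns the final position and the number of cycles used."""
    cycles = 0
    while abs(target_pos - current_pos) > tolerance and cycles < max_cycles:
        current_pos += motor_step(current_pos, target_pos, max_step)
        cycles += 1
    return current_pos, cycles
```

In a real module the position would come from the lens position sensor on every cycle rather than from the loop's own bookkeeping; the sketch folds the two together for brevity.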

In addition, the CPU 204 can have the lens control module 206 perform image stabilization: it detects camera shake during shooting using the lens position sensor, the attitude detection sensor described later, and the like, and controls the lens control motor accordingly.

The attitude detection sensor 207 detects the attitude of the video camera 100. It includes an acceleration sensor, an angular velocity sensor, an elevation/depression angle sensor, and the like. Using these sensors, the CPU 204 detects the state in which the video camera 100 is shooting. To detect the attitude of the video camera 100 in detail, each of these sensors should preferably be able to detect along three axes (vertical, horizontal, and so on).

The input button 208 is one of the input interfaces used by the photographer operating the video camera 100. It lets the photographer convey various requests to the video camera 100, such as starting or ending shooting, or inserting a marker into the video being shot.

The display 209 is provided for viewing video captured by the video camera 100, for use as a viewfinder during shooting, and so on. It lets the photographer check captured video on the spot. It can also display various information about the video camera 100, conveying more detailed information such as shooting information and device information to the photographer.

The speaker 210 is used for audio output when captured video is played back. The video camera 100 can also use the speaker 210 to output sounds that convey various information (for example, information about shooting) to the photographer.

The output I/F 211 is used to output video captured by the video camera 100 to an external device. Specifically, it may be a cable interface for connecting to an external device by cable, or a memory card interface for recording captured video on a memory card. This makes it possible, for example, to view captured video on an external display larger than the display 209 built into the video camera 100.

The compression/decompression IC 212 converts captured video or audio into a predetermined digital data format (that is, encodes it). Specifically, it encodes the captured video and audio data using a scheme such as MPEG (Moving Picture Experts Group) or H.264, converting (compressing) the data into a predetermined format. When captured data is played back, the compression/decompression IC 212 decompresses the video data in that format for display on the display 209 or the like.

The ROM 213 stores the software programs executed by the CPU 204 and the various data needed to run those programs.

The RAM 214 is used, for example, as a memory area during execution of the software programs run by the CPU 204. The RAM 214 may also be shared with the compression/decompression IC 212.

The HDD 215 is used for purposes such as storing the video data encoded by the compression/decompression IC 212. Other data, such as the reproduction information described later, can also be recorded on it.

The audio ADC 216 converts the analog electrical audio signal output by the microphone 217 into a digital signal.

The microphone 217 converts sound from outside the video camera 100 into an electrical signal and outputs it.

An example hardware configuration of the video camera 100 has been described above, but the present invention is not limited to this configuration. For example, the video ADC 202, the video signal conversion IC 203, and other components may be implemented as a single integrated circuit, and part of the software executed by the CPU 204 may instead be implemented in hardware using an FPGA (Field Programmable Gate Array).

FIG. 3 shows the functional configuration of the video camera 100. As its functional configuration, the video camera 100 includes a lens unit 300, an imaging unit 301, a video AD conversion unit 302, a signal processing unit 303, a video signal compression unit 304, an imaging control unit 305, a video analysis unit 306, a lens control unit 307, an attitude detection unit 308, an attribute information generation unit 309, a multiplexing unit 310, a storage unit 311, a scene analysis unit 312, a reproduction information generation unit 313, an audio analysis unit 314, an audio signal compression unit 315, a digest playback unit 316, a video signal decompression unit 317, a video display unit 318, an audio signal decompression unit 319, an audio output unit 320, an audio AD conversion unit 321, a microphone unit 322, and an external input unit 323.

The lens unit 300 adjusts the focal length and zoom magnification (image magnification) for light incident from the subject. These adjustments are made under control of the lens control unit 307. The lens unit 300 corresponds to the lens group 200 in FIG. 2.

The imaging unit 301 converts light that has passed through the lens unit 300 into an electrical signal. Under control of the imaging control unit 305, it outputs data for an arbitrary region of the image sensor. Besides video data, it can also output information such as chromaticity space information of the three primary color points, white coordinates, gain information for at least two of the three primary colors, color temperature information, Δuv (delta uv), and gamma information for the three primary colors or the luminance signal. In that case, this information is output to the attribute information generation unit 309. The imaging unit 301 corresponds to the image sensor 201 in FIG. 2.

The video AD conversion unit 302 converts the analog electrical signal from the imaging unit 301 into a digital signal according to predetermined processing. The video AD conversion unit 302 corresponds to the video ADC 202 in FIG. 2.

The signal processing unit 303 converts the digital signal output by the video AD conversion unit 302 into a predetermined video signal format, for example a video signal conforming to the number of horizontal lines, number of scanning lines, and frame rate specified by NTSC (National Television System Committee). The signal processing unit 303 corresponds to the video signal conversion IC 203 in FIG. 2.

The video signal compression unit 304 applies a predetermined encoding to the digital video signal, performing processing such as compressing the data volume and encoding the signal in a form suited to video. Specific encoding schemes include MPEG-2, MPEG-4, and H.264. The video signal compression unit 304 corresponds to the compression function of the compression/decompression IC 212 in FIG. 2.

The imaging control unit 305 controls the operation of the imaging unit 301, including the exposure, shooting (shutter) speed, and sensitivity used during shooting. It also outputs this control information to the attribute information generation unit 309. The imaging control unit 305 is one of the control algorithms executed by the CPU 204 in FIG. 2.

The video analysis unit 306 extracts features of the video from the captured video signal. In this embodiment, it extracts features by analyzing the video signal, for example by detecting color information (such as the distribution of colors in the video) and white balance, and by performing face detection when the video contains a person's face. Color distribution can be detected by examining the color information contained in the data that forms the video signal. Face detection can be implemented using pattern matching or the like. The video analysis unit 306 is one of the algorithms executed in software by the CPU 204 in FIG. 2.
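As an illustration of detecting a color distribution from the color information in the video data, the following Python sketch computes a coarse color histogram for one frame. It assumes the frame is already available as a list of 8-bit `(r, g, b)` tuples and that `levels` is a power of two; the function name and the quantization scheme are assumptions made here, not details from the patent.

```python
from collections import Counter

def color_distribution(pixels, levels=4):
    """Quantize each 8-bit RGB pixel to `levels` steps per channel and return
    the fraction of pixels falling into each quantized color bin.
    `levels` is assumed to be a power of two so that 256 divides evenly."""
    step = 256 // levels
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {color_bin: n / total for color_bin, n in counts.items()}
```

A frame dominated by one bin (for example a large share of near-red pixels) could then be recorded as a color attribute of the scene.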

The lens control unit 307 controls the operation of the lens unit 300. Lens control includes zoom control, focus control, image stabilization control, and the like. The lens control unit 307 controls the lens unit 300 and also outputs the control information to the attribute information generation unit 309. The lens control unit 307 corresponds to the lens control module 206 in FIG. 2.

The attitude detection unit 308 detects the acceleration, angular velocity, elevation/depression angle, and the like of the video camera 100. The detected information is output to the attribute information generation unit 309 as attribute information describing the attitude of the video camera 100 and how it changes. Acceleration and angular velocity should preferably be detectable in three directions: the vertical direction and two horizontal directions. The attitude detection unit 308 corresponds to the attitude detection sensor 207 in FIG. 2.

The attribute information generation unit 309 treats shooting information at the time of capture, external input information, and other information as attribute information (metadata). Examples of attribute information include the following.

・Shooting start date and time (shooting start time)
・Shooting end date and time (shooting end time)
・Shooting duration (playback duration)
・Focal length
・Zoom magnification
・Exposure
・Shooting speed
・Light sensitivity
・Color space information for the three primary color points
・White balance
・Gain information for at least two of the three primary colors
・Color temperature information
・Δuv (delta uv)
・Gamma information for the three primary colors or the luminance signal
・Color distribution
・Face recognition information
・Camera attitude (acceleration, angular velocity, elevation/depression angle, etc.)
・Shooting time (shooting start time, end time)
・Shooting index information
・User input
・Frame rate
・Sampling frequency
・Characteristic audio (input of a specific sound)
・Scene switching
・Shooting state (fixed shooting using a tripod or the like, handheld shooting by the photographer, etc.)
・Camera shake state during shooting
・Face detection
・Face recognition
Information other than the above also qualifies as attribute information as long as it relates to the video.
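For illustration, a subset of the attribute information listed above could be held per scene in a record such as the following Python sketch. The field names, types, units, and the choice of which items to include are assumptions made here; the source does not specify a storage layout.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class SceneAttributes:
    """Hypothetical per-scene attribute record (metadata)."""
    start_time_s: float                       # shooting start time, in seconds
    end_time_s: float                         # shooting end time, in seconds
    zoom_ratio: Optional[float] = None        # zoom magnification, if known
    focal_length_mm: Optional[float] = None   # focal length, if known
    face_detected: bool = False               # result of face detection
    handheld: Optional[bool] = None           # shooting state: handheld or fixed
    extra: dict = field(default_factory=dict) # any other video-related information

    @property
    def duration_s(self) -> float:
        """Shooting duration derived from the start and end times."""
        return self.end_time_s - self.start_time_s
```

The open-ended `extra` dictionary reflects the statement that any other video-related information may also serve as attribute information.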

Information calculated from the above (secondary information) is also included in the attribute information. For example, camera work such as "pan" and "tilt" performed by the video camera 100 during shooting can be derived from camera attitude information (acceleration, angular velocity, elevation/depression angle, and so on) and used as attribute information. Focal length and zoom magnification information can also be used as attribute information as is. In addition, information newly generated by combining or analyzing the above information also counts as attribute information. The attribute information generation unit 309 generates attribute information by extracting or calculating information useful for scene evaluation from the various information available at the time of shooting.
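As one example of deriving secondary information, the following Python sketch labels camera work as "pan" or "tilt" from angular velocity readings. The axis naming (yaw for horizontal rotation, pitch for vertical rotation), the units of degrees per second, and the 5 deg/s threshold are assumptions chosen for illustration; the source only states that such camera work can be derived from attitude information.

```python
def classify_camera_work(yaw_rate_dps, pitch_rate_dps, threshold_dps=5.0):
    """Label camera work from angular velocity in degrees per second.
    Rotation below the threshold on both axes is treated as a static shot;
    otherwise the axis with the larger rotation rate decides the label."""
    yaw, pitch = abs(yaw_rate_dps), abs(pitch_rate_dps)
    if max(yaw, pitch) < threshold_dps:
        return "static"
    return "pan" if yaw >= pitch else "tilt"
```

In practice the rates would likely be averaged over a short window before classification so that brief hand tremor is not mistaken for deliberate camera work.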

The multiplexing unit 310 multiplexes and outputs the encoded video data from the video signal compression unit 304, the encoded audio data from the audio signal compression unit 315, and the attribute information from the attribute information generation unit 309. The multiplexing unit 310 may be software executed by the CPU 204 in FIG. 2, or its processing may be performed by the compression/decompression IC 212.
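The multiplexing described above can be pictured as interleaving three time-ordered streams into one. The Python sketch below assumes each stream is a list of `(timestamp, payload)` packets already sorted by timestamp. The packet layout and the `"video"`, `"audio"`, and `"attr"` tags are hypothetical, and a real container format such as an MPEG transport stream is considerably more involved.

```python
import heapq

def multiplex(video, audio, attributes):
    """Merge three timestamp-sorted packet lists into one stream of
    (timestamp, kind, payload) tuples, ordered by timestamp."""
    tagged = (
        [(ts, kind, payload) for ts, payload in stream]
        for kind, stream in (("video", video), ("audio", audio), ("attr", attributes))
    )
    # heapq.merge keeps the output sorted by key; packets with equal
    # timestamps come out in the order video, audio, attr.
    return list(heapq.merge(*tagged, key=lambda packet: packet[0]))
```

Keeping the attribute packets in the same stream as the audio and video means the metadata stays aligned with the frames it describes when the data is later read back.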

The storage unit 311 holds, temporarily or long term, the data output by the multiplexing unit 310, in which the encoded video data, encoded audio data, and attribute information are multiplexed. The storage unit 311 also holds the reproduction information generated by the reproduction information generation unit 313. The storage unit 311 corresponds to the HDD 215 and the RAM 214 in FIG. 2.

The scene analysis unit 312 evaluates each scene based on the attribute information generated by the attribute information generation unit 309 and selects the scenes to be played back based on the results. Scene evaluation and selection are described in detail later.
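Pending the detailed description, the basic idea of evaluating scenes from their attribute information and selecting the best ones can be sketched in Python as follows. The attribute names, the weight values in `WEIGHTS`, and the additive scoring rule are all assumptions introduced for illustration and should not be read as the evaluation method of the embodiment.

```python
# Hypothetical weights: positive for desirable attributes, negative for flaws.
WEIGHTS = {"face_detected": 50, "zoom_up": 30, "cheer": 40, "camera_shake": -60}

def evaluate(attributes):
    """Score one scene as the sum of the weights of its attributes.
    Attributes without an entry in WEIGHTS contribute nothing."""
    return sum(WEIGHTS.get(name, 0) for name in attributes)

def top_scenes(scene_attrs, count):
    """Return the ids of the `count` highest-scoring scenes, in id order.
    `scene_attrs` maps a scene id to a list of attribute names."""
    ranked = sorted(scene_attrs, key=lambda sid: evaluate(scene_attrs[sid]), reverse=True)
    return sorted(ranked[:count])
```

This sketch scores each scene in isolation; it does not yet account for relevance between scenes.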

The reproduction information generation unit 313 generates reproduction information describing the scenes selected for playback by the scene analysis unit 312. This is also described later.

The attribute information generation unit 309, the scene analysis unit 312, and the reproduction information generation unit 313 are implemented as software executed by the CPU 204 in FIG. 2.

The audio analysis unit 314 extracts characteristic sounds from the audio data. Characteristic sounds include, for example, the photographer's voice, the utterance of specific words, cheers, and gunshots. Such sounds can be identified by, for example, registering their characteristic frequencies in advance and comparing the input against them. The unit also detects other features, such as the input level of the sound picked up by the microphone. The audio analysis unit 314 is one of the algorithms executed in software by the CPU 204 in FIG. 2.
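The frequency comparison described above can be sketched in Python as follows. The sketch assumes a short block of mono samples, uses a plain discrete Fourier transform to find the strongest frequency, and matches it against a table of registered frequencies. The labels, the 20 Hz tolerance, and the use of a single dominant frequency as the signature are simplifying assumptions; real sound identification would typically compare richer spectral features.

```python
import cmath
import math

def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest DFT bin, ignoring the DC bin.
    Uses a direct O(n^2) DFT, which is adequate for short blocks."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(samples))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n

def match_sound(samples, sample_rate, registered, tolerance_hz=20.0):
    """Return the label of the first registered frequency within the tolerance
    of the block's dominant frequency, or None if nothing matches."""
    freq = dominant_frequency(samples, sample_rate)
    for label, reference_hz in registered.items():
        if abs(freq - reference_hz) <= tolerance_hz:
            return label
    return None
```

A match would then be recorded as a "characteristic audio" attribute for the scene in which the block occurs.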

The audio signal compression unit 315 converts the audio data output by the audio AD conversion unit 321 using a predetermined encoding algorithm. Encoding methods include MP3 (MPEG Audio Layer-3) and AAC (Advanced Audio Coding). The audio signal compression unit 315 is one of the compression functions of the compression/decompression IC 212 in FIG. 2.

Based on the reproduction information recorded in the storage unit 311, the digest playback unit 316 has the video signal decompression unit 317 and the audio signal decompression unit 319 decode the video data and audio data of the multiplexed data, which is also recorded in the storage unit 311, and outputs the results from the video display unit 318 and the audio output unit 320. The digest playback unit 316 is a software algorithm executed by the CPU 204 in FIG. 2.
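Digest playback driven by reproduction information can be sketched as a loop over the selected ranges. The Python sketch below assumes the reproduction information is a list of inclusive `(start_frame, end_frame)` pairs and that a `decode_frame` callable stands in for the decompression units. Both are hypothetical; the source does not specify the format of the reproduction information.

```python
def digest_frames(reproduction_info, decode_frame):
    """Yield decoded frames for each selected range, in the listed order.
    `reproduction_info` is a list of inclusive (start_frame, end_frame) pairs;
    `decode_frame` maps a frame index to a decoded frame."""
    for start, end in reproduction_info:
        for index in range(start, end + 1):
            yield decode_frame(index)
```

Because only the listed ranges are decoded, the recorded data itself is left untouched and the same recording can be played back in full or as a digest.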

The audio AD conversion unit 321 converts the analog audio signal output by the microphone unit 322 into digital audio data. The audio AD conversion unit 321 corresponds to the audio ADC 216 in FIG. 2.

The microphone unit 322 converts ambient sound into an electrical signal and outputs it as an audio signal. The microphone unit 322 corresponds to the microphone 217 in FIG. 2.

外部入力部３２３は、映像撮影時に外部から受信した各種の情報、例えば、撮影者によるボタン入力、外部から通信経由で受信した撮影インデックス情報等を出力するものである。なお、撮影インデックス情報とは、例えば、映画撮影時における、撮影場面を識別する番号や、撮影回数を示す番号等のそれぞれの撮影を識別するために用いられる識別番号などである。外部入力部３２３は、図２の入力ボタン２０８等に該当する。 The external input unit 323 outputs various types of information received from the outside during video shooting, such as button input by a photographer, shooting index information received from the outside via communication, and the like. Note that the shooting index information is, for example, an identification number used for identifying each shooting such as a number for identifying a shooting scene or a number indicating the number of shooting times at the time of shooting a movie. The external input unit 323 corresponds to the input button 208 in FIG.

With the above configuration, the video camera 100 can automatically extract preferable scenes from the captured video based on the attribute information and reproduce only those portions.

Note that the hardware configuration of FIG. 2 and the functional diagram of FIG. 3 are one aspect of the present embodiment, and the invention is not limited to them. For example, in FIG. 3 the scene analysis unit 312 and the reproduction information generation unit 313 read and process data recorded in the storage unit 311; however, scene analysis and reproduction information generation may instead be performed before the multiplexed data is recorded in the storage unit 311, based on the compressed video signal, the compressed audio signal, and the attribute information at the time of shooting.

Likewise, the correspondence between the hardware configuration of FIG. 2 and the functional configuration of FIG. 3 described above is one aspect of the present embodiment and is not limiting. For example, the processing of the imaging control unit 305, which is handled by a software algorithm on the CPU 204, may instead be implemented by an independent hardware IC or the like.

<2. Analysis of captured scenes and generation of reproduction information>
FIG. 4 is a diagram showing the structure of video captured by the video camera 100. The unit of video captured from when the photographer instructs the start of shooting until the photographer instructs the end or a pause of shooting is called a "clip". When the photographer repeatedly starts, ends, or pauses shooting, a plurality of clips are generated (FIG. 4(A)).

A clip consists of one or more "scenes". A "scene" is a temporally continuous stretch of video. Scenes can be defined arbitrarily. For example, an entire clip may be treated as one scene, so that one clip equals one scene. A scene boundary may also be set where the picture changes greatly; in this case, the video analysis unit 306 calculates motion vectors between frames, and a point where the magnitude (change) of motion exceeds a predetermined value may be treated as a scene boundary. Alternatively, a logical unit of video content may be set as one scene, for example by having the photographer enter logical breaks with an input button or the like. This lets the photographer structure the scenes within a clip according to a clear intention. Scenes may also simply be delimited at fixed time intervals (FIG. 4(B)).
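The motion-based scene division described above can be sketched as follows. This is a minimal illustration, not the method of the embodiment: the function name `split_scenes`, the per-frame motion magnitudes, and the threshold value are all assumptions, and the calculation of the motion vectors themselves is outside the sketch.

```python
from typing import List, Tuple


def split_scenes(motion: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Split a clip into scenes, returned as inclusive (first_frame, last_frame) pairs.

    motion[i] is a hypothetical magnitude of motion between frame i-1 and frame i.
    motion[0] is ignored because frame 0 has no preceding frame.
    """
    if not motion:
        return []
    scenes = []
    start = 0
    for i in range(1, len(motion)):
        # A large change between consecutive frames starts a new scene at frame i.
        if motion[i] > threshold:
            scenes.append((start, i - 1))
            start = i
    scenes.append((start, len(motion) - 1))
    return scenes
```

With a threshold of 5, motion values of `[0, 1, 9, 1, 1, 8, 2]` yield three scenes, because the values 9 and 8 each mark a boundary.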

A "scene" consists of one or more "frames". A "frame" is an individual image making up the video (FIG. 4(C)).

FIG. 5 shows an example in which the scene analysis unit 312 has divided a clip into a plurality of scenes. As described above, the scene analysis unit 312 divides the clip based on the attribute information and the like. In FIG. 5, each scene is defined by a "start time" and an "end time", but the start and end of a scene may instead be defined by frame numbers or the like.

FIG. 6 shows an example of the relationship between the attribute information that the scene analysis unit 312 uses to evaluate each scene and the corresponding evaluation. For example, when the clip-in (the portion where shooting starts) and the clip-out (the portion just before shooting ends) are regarded as the introduction or an important part of the video, the captured video there is inferred to carry high logical significance. In this example, clip-in (A) and clip-out (F) each have an evaluation of 100. Similarly, zoom-up (D) and zoom-down (E) camera work during shooting are each given an evaluation of 30, on the grounds that they raise the degree of attention to a specific subject. In this way, the scene analysis unit 312 holds a numerical evaluation for each kind of attribute information in advance. In the example of FIG. 6, a higher score represents a higher (more preferable) evaluation. The scene analysis unit 312 evaluates each scene based on this relationship between attribute information and evaluation. When one scene has several pieces of attribute information, the evaluation points assigned to each may be added together. Alternatively, the evaluation point of the highest-rated attribute may be used as the evaluation point of the scene. If the various attributes in the scene are all to be taken into account, the average of their evaluation points may be used. For a more detailed evaluation, each frame in the scene may be evaluated. Evaluation need not be limited to preferable scenes. For example, camera shake during shooting may make the video hard to watch, so a scene with such an attribute may be given a deduction (negative points).
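The three ways of combining evaluation points described above (sum, maximum, average) can be sketched as follows. The positive scores are the values the description gives for FIG. 6; the attribute key names, the function `score_scene`, and the camera-shake penalty of -20 are assumptions made for illustration only.

```python
from statistics import mean

# Scores as stated in the description of FIG. 6. The key names are hypothetical.
ATTRIBUTE_SCORES = {
    "clip_in": 100,
    "clip_out": 100,
    "face_detected": 80,
    "specific_sound": 70,
    "still_after_camera_work": 40,
    "zoom_up": 30,
    "zoom_down": 30,
    "pan_tilt": 25,
    "camera_shake": -20,  # assumption: the source gives no value for this deduction
}


def score_scene(attributes, method="sum"):
    """Combine the scores of a scene's attributes by 'sum', 'max', or 'mean'."""
    points = [ATTRIBUTE_SCORES[name] for name in attributes]
    if not points:
        return 0
    if method == "sum":
        return sum(points)
    if method == "max":
        return max(points)
    if method == "mean":
        return mean(points)
    raise ValueError(f"unknown method: {method}")
```

For a scene with both zoom-up and pan/tilt, the three methods give 55, 30, and 27.5 respectively, so the choice of method changes how strongly a scene with many modest attributes competes with a scene that has one strong attribute.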

Note that the relationship between attribute information and evaluation in FIG. 6 need not be limited to a single table. For example, several tables of attribute information and evaluations may be held and switched according to the shooting mode selected by the photographer of the video camera 100 (for example, landscape shooting, portrait shooting, or still-life shooting). Alternatively, several tables may be prepared in advance and combined according to the shooting mode, for example by adding their evaluation values at a fixed ratio. In this case, the combined table of attribute information and evaluations can be changed dynamically by changing the combination ratio.
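Combining two evaluation tables at a fixed ratio could look like the sketch below. The function name `blend_tables`, the treatment of a missing attribute as a score of 0, and the example tables are assumptions; the description only states that evaluation values are added at a certain ratio.

```python
def blend_tables(table_a, table_b, ratio_a):
    """Blend two attribute-score tables: ratio_a of table_a plus (1 - ratio_a) of table_b.

    An attribute missing from one table is treated as scoring 0 there (assumption).
    """
    if not 0.0 <= ratio_a <= 1.0:
        raise ValueError("ratio_a must be between 0 and 1")
    names = set(table_a) | set(table_b)
    return {
        name: ratio_a * table_a.get(name, 0) + (1.0 - ratio_a) * table_b.get(name, 0)
        for name in names
    }
```

Changing `ratio_a` at run time shifts the blended table smoothly between the two modes without storing a separate table for every intermediate setting.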

FIG. 7 shows the result of the scene analysis unit 312 assigning an evaluation (priority) according to FIG. 6 to each scene of the video divided as in FIG. 5. In FIG. 7, the horizontal axis is time (scene) and the vertical axis is the evaluation (priority) of each scene.

The scene labeled A near time 0 in FIG. 7 has the "clip-in" attribute because it immediately follows the start of shooting. According to FIG. 6, the "clip-in" attribute has an evaluation (priority) of 100.

The scenes labeled B have the "extraction of specific sound" attribute. Specific sounds are extracted by the audio analysis unit 314 described above and the like. According to FIG. 6, this attribute has an evaluation (priority) of 70.

The scenes labeled C have the attribute indicating that the photographer shot while moving the body of the video camera 100, for example by panning or tilting, and then shot while holding it still. Such shooting can be rated highly because the photographer is presumably very interested in the subject being shot while the camera is still. According to FIG. 6, this attribute has an evaluation (priority) of 40.

The scenes labeled D are shot while the video camera zooms up. In FIG. 6, zoom-up has an evaluation (priority) of 30.

The scenes labeled E are shot while the video camera zooms down. In FIG. 6, zoom-down has the same evaluation (priority) of 30 as zoom-up.

Zoom-up and zoom-down may also be given different evaluations. For example, by rating zoom-up higher than zoom-down, a higher evaluation (priority) can be assigned to scenes shot while zooming up, that is, scenes in which the magnification increases and a subject is shot enlarged. Conversely, a relatively low evaluation (priority) can be assigned to scenes in which the magnification decreases.

The scene labeled F has the "clip-out" attribute because it immediately precedes the end of shooting. According to FIG. 6, the "clip-out" attribute has an evaluation (priority) of 100.

The scenes labeled G are shot with camera work involving movement of the video camera, such as panning or tilting. In this case an evaluation (priority) of 25 is assigned.

The scenes labeled Z are scenes with face detection, meaning that a person's face has been detected in the subject image from the video signal. According to FIG. 6, face detection has an evaluation (priority) of 80. For scene evaluation, "face recognition", a technically more advanced form of face detection, may also be used. Face recognition is a technique for identifying a specific face when the subject image contains the faces of several people.

In this way, the scene analysis unit 312 assigns an evaluation (priority) to each scene. In the example of FIG. 7 the evaluation is assigned per scene, but the scene analysis unit 312 may instead assign it per clip or per frame.

The scene analysis unit 312 then extracts only the generally preferable scenes based on the evaluations assigned to each scene. A simple approach is to take the highest evaluation within each scene as its representative value and extract only the scenes whose representative value is high. In the example of FIG. 7, only scenes #1, #5, and #8 would be extracted.

Scenes can be extracted according to various criteria, for example that the total playback time of the extracted scenes falls within a predetermined time, or that the evaluation of each scene is at or above a certain level.
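The two extraction criteria mentioned above can be sketched as follows. The `Scene` record, both function names, and the greedy highest-score-first strategy for the time limit are assumptions; the description names the criteria but not a procedure for meeting them.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Scene:
    number: int
    start: float  # seconds
    end: float    # seconds
    score: float


def select_by_threshold(scenes: List[Scene], min_score: float) -> List[Scene]:
    """Keep every scene whose evaluation is at or above min_score."""
    return [s for s in scenes if s.score >= min_score]


def select_within_duration(scenes: List[Scene], max_total: float) -> List[Scene]:
    """Greedily add scenes from highest score down while the total time fits."""
    chosen = []
    total = 0.0
    for scene in sorted(scenes, key=lambda s: s.score, reverse=True):
        duration = scene.end - scene.start
        if total + duration <= max_total:
            chosen.append(scene)
            total += duration
    # Return the digest in the original time order.
    return sorted(chosen, key=lambda s: s.start)
```

The greedy choice is simple but not optimal in general: a long, highly rated scene can crowd out several shorter scenes whose combined value is higher.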

The reproduction information generation unit 313 generates reproduction information describing the procedure and method of video playback according to the scenes extracted by the scene analysis unit 312. This reproduction information may, for example, consist of the start time and end time of each scene to be played, as shown in FIG. 8. In this case, separately recording a representative picture for each scene (such as the most highly evaluated picture in the scene) is also useful when searching for a reference picture.

The form of the reproduction information is not limited to that of FIG. 8. For example, scenes may be specified by frame number. When the reproduction information generated by the reproduction information generation unit 313 is multiplexed by the multiplexing unit 310 with the encoded video and audio signals as a TS (Transport Stream) such as that of MPEG, the reproduction information may be recorded using the time information used for multiplexing (for example, PTS or DTS time information). In the case of H.264, the corresponding time information used for multiplexing may be used in the same way.
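As an illustration of recording reproduction information with multiplexing time information, the sketch below converts start and end times in seconds into PTS values. In MPEG transport streams, PTS and DTS are 33-bit counters in units of a 90 kHz clock. The function names and the assumption that the stream's first picture has PTS 0 are hypothetical; a real stream usually starts at an arbitrary offset that would have to be added.

```python
PTS_CLOCK_HZ = 90_000   # MPEG system time stamps tick at 90 kHz
PTS_MODULO = 1 << 33    # PTS and DTS are 33-bit values and wrap around


def seconds_to_pts(seconds: float) -> int:
    """Convert a time offset in seconds to a PTS value (assumes the stream starts at PTS 0)."""
    return round(seconds * PTS_CLOCK_HZ) % PTS_MODULO


def to_pts_ranges(time_ranges):
    """Convert (start_seconds, end_seconds) pairs into (start_pts, end_pts) pairs."""
    return [(seconds_to_pts(start), seconds_to_pts(end)) for start, end in time_ranges]
```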

Furthermore, when video data is recorded using a standard such as AVCHD (Advanced Video Codec High Definition), which some video cameras use as their data recording format, the reproduction information may be recorded in a PlayList file or the like.

As described above, a digest video (summarized video) can be generated automatically and more suitably from the captured video.

<3. Scene extraction based on attribute information>
A simple way for the scene analysis unit 312 to select and extract scenes from the captured video is to extract them in descending order of evaluation (priority). However, a digest built only from scenes extracted this way is likely to consist of similar scenes and may be hard for viewers to watch.

Suppose that, in the example of FIG. 7, up to five scenes may be extracted. Selecting the five scenes with the highest representative values yields scenes #1, #4, #5, #6, and #8. Now consider the captured video of FIG. 7 in terms of zoom-up and zoom-down. FIG. 9 shows only the zoom-up (D) and zoom-down (E) attribute information extracted from FIG. 7. It shows that scenes #2 through #6 are shot with zoom-up. Consequently, in a digest extracted by the above method, the consecutive scenes #4, #5, and #6 present an unbroken run of zoomed-up video. Because all of these scenes are zoom-up footage, the digest may be very hard to watch.

Therefore, the scene analysis unit 312 of the present embodiment extracts scenes while also considering the relationship between one scene and another. Specifically, it extracts scenes according to the flowchart shown in FIG. 10. The scene extraction method of FIG. 10 is described below.

In S1010, it is determined whether extraction of the digest candidate scenes from the captured video is complete. In the above example, if five scenes have been extracted, extraction is judged complete and processing proceeds to S1030. If fewer than five scenes have been extracted, processing proceeds to S1020.

In S1020, a digest candidate scene is extracted from the captured video. Specifically, a highly evaluated scene is extracted from among the scenes not yet extracted. Repeating S1010 and S1020 as many times as necessary completes the extraction of the scenes needed for the digest.

In S1030, if the relationships among the extracted scenes have all been processed, the processing of this flowchart ends. Otherwise, the processing from S1040 onward is performed.

In S1040, a first scene and a second scene are acquired from the extracted scenes. The first scene is preferentially the most highly evaluated scene among those not yet acquired as the first scene in the processing of S1040 to S1060. The second scene is a scene, among those acquired as digest candidates, that is temporally adjacent to the first scene.

In the example of FIG. 7, scenes #1, #4, #5, #6, and #8 have been extracted as digest candidates. Of these, scenes #1 and #8 have the highest evaluation. Here, scene #1 is taken as the first scene. The digest candidate scene that follows the first scene is then scene #3.

In S1050, the attribute information of the first scene (here, scene #1) and the second scene (here, scene #3) is acquired. Specifically, the clip-in (A) attribute information is acquired from scene #1, and the zoom-up (D) and pan/tilt (G) attribute information is acquired from scene #3.

In S1060, it is determined whether the acquired attribute information is the same. In the above example, the clip-in (A) of scene #1 differs from both the zoom-up (D) and the pan/tilt (G) of scene #3, so the attribute information is judged not to be the same. In this case, processing returns to S1030.

Processing returns to S1030 and, proceeding as before, the first and second scenes are acquired in S1040. Scene #1 has already been acquired once as the first scene and is therefore no longer eligible. The first scene this time is scene #8, the most highly evaluated scene other than scene #1. The second scene is scene #6, which is temporally adjacent to the first scene (here, scene #8). Performing S1050 and S1060 on scenes #8 and #6 shows that they share no attribute information, so S1070 is again not performed and processing returns to S1030.

On returning to S1030 and continuing, S1040 acquires scene #5 as the first scene and scenes #4 and #6 as second scenes. In S1050, the following attribute information is acquired: still shooting (C) and face detection/face recognition (Z) for scene #5; extraction of specific sound (B), still shooting (C), and pan/tilt (G) for scene #4; and extraction of specific sound (B), zoom-down (E), and pan/tilt (G) for scene #6.

ここでズーム制御についてシーン＃４、シーン＃５、シーン＃６について見てみると、図９に示すように、ズームアップはシーン＃の後半からシーン＃６の後半まで継続していることが分かる。そのため、シーン＃４、シーン＃５、シーン＃６はすべてズームアップ（Ｄ）の属性を持っていることとなる。これを整理すると図１１に示す内容となる。 Here, looking at the scene # 4, the scene # 5, and the scene # 6 regarding the zoom control, as shown in FIG. 9, it can be seen that the zoom-up continues from the second half of the scene # to the second half of the scene # 6. . Therefore, scene # 4, scene # 5, and scene # 6 all have the zoom-up (D) attribute. When this is arranged, the contents shown in FIG. 11 are obtained.

In this case, S1060 finds that the first scene (scene #5) and the second scene (scene #4) share the attributes still shooting (C) and zoom-up (D). The first scene (scene #5) and the other second scene (scene #6) share the attribute zoom-up (D). Therefore, in S1070, scenes #4 and #6 are deleted from the digest candidates.

When processing returns to S1030 again, the attribute information of the remaining digest candidates, scenes #1, #5, and #8, has already been checked, so the processing of this flowchart ends.

As described above, the scene analysis unit 312 extracts highly evaluated scenes from the video as digest candidates and then, based on the attribute information of the extracted scenes relative to one another, selects the scenes better suited to the digest.
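The flow of FIG. 10 as described above can be sketched as follows. This is one reading of the flowchart under stated assumptions, not a definitive implementation: the function name `extract_digest` is hypothetical, scene numbers are assumed to increase with time, ties in evaluation are broken in favor of the earlier scene, and "temporally adjacent" is interpreted as the nearest remaining candidates before and after the first scene. Under that interpretation the sketch compares scene #1 with scene #4, whereas the walkthrough above names scene #3; in both cases no attribute is shared with clip-in (A).

```python
from typing import Dict, List, Set


def extract_digest(scores: Dict[int, float],
                   attributes: Dict[int, Set[str]],
                   max_scenes: int) -> List[int]:
    """Return the scene numbers kept for the digest, in time order."""
    # S1010 / S1020: take the best unextracted scene until enough candidates exist.
    remaining = dict(scores)
    candidates: List[int] = []
    while len(candidates) < max_scenes and remaining:
        best = max(remaining, key=lambda n: (remaining[n], -n))
        candidates.append(best)
        del remaining[best]
    candidates.sort()

    checked: Set[int] = set()
    while True:
        # S1030: stop when every remaining candidate has served as the first scene.
        pending = [n for n in candidates if n not in checked]
        if not pending:
            return candidates
        # S1040: first scene = best candidate not yet used as a first scene.
        first = max(pending, key=lambda n: (scores[n], -n))
        checked.add(first)
        i = candidates.index(first)
        seconds = [candidates[j] for j in (i - 1, i + 1) if 0 <= j < len(candidates)]
        for second in seconds:
            # S1050 / S1060: compare attributes; S1070: drop the second scene if any match.
            if attributes[first] & attributes[second]:
                candidates.remove(second)
```

Running the sketch on data modeled on the example reproduces the result of the walkthrough, scenes #1, #5, and #8. The attributes of scenes #4, #5, and #6 follow the description; the scores of scenes #2, #3, and #7 and the attributes of scenes #2, #7, and #8 are assumed values, since the description does not state them.

```python
scores = {1: 100, 2: 30, 3: 30, 4: 70, 5: 80, 6: 70, 7: 25, 8: 100}
attributes = {
    1: {"A"}, 2: {"D"}, 3: {"D", "G"}, 4: {"B", "C", "D", "G"},
    5: {"C", "D", "Z"}, 6: {"B", "D", "E", "G"}, 7: {"G"}, 8: {"F"},
}
print(extract_digest(scores, attributes, 5))
```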

<4. Generation of reproduction information based on the new evaluation>
The reproduction information generation unit 313 specifies the scenes to be played, which were extracted taking into account both the evaluation that the scene analysis unit 312 performed for each scene and the relationships between scenes. FIG. 8 shows the scenes that the scene analysis unit 312 identified for digest playback as a result of the above example.

The reproduction information generated by the reproduction information generation unit 313 need not play only the selected scenes. For example, selected scenes may be played at normal speed while scenes that were not selected are played at high speed. That is, the playback method may differ between selected scenes (highly evaluated scenes) and scenes that were not selected (poorly evaluated scenes).
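Reproduction information that varies playback speed rather than skipping scenes could be represented as in the sketch below. The function name `build_playback_plan`, the tuple layout, and the fast-forward factor of 4.0 are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple


def build_playback_plan(scene_times: Dict[int, Tuple[float, float]],
                        selected: Set[int],
                        fast_speed: float = 4.0) -> List[Tuple[float, float, float]]:
    """Return (start, end, speed) entries: 1.0 for selected scenes, fast_speed otherwise."""
    plan = []
    for number in sorted(scene_times):
        start, end = scene_times[number]
        speed = 1.0 if number in selected else fast_speed
        plan.append((start, end, speed))
    return plan
```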

As described above, with the configuration of the present embodiment, the video camera 100 also considers the relationships among the extracted scenes when generating a digest from captured video, which makes more suitable digest generation possible.

In the present embodiment, the scene analysis unit 312 considers the relationships between scenes by first extracting scenes in descending order of evaluation and then examining the relationships among the extracted scenes, but the method is not limited to this. For example, the relationship with other scenes may be considered at the stage of extracting scenes from the video. In this case, extraction starts from the most highly evaluated scene, and when the second and subsequent scenes are extracted, the identity of attribute information is checked against any already extracted scene that would be temporally adjacent.

In the present embodiment, the judgment is based on whether the attribute information is identical, but it is not limited to this. For example, relation coefficients between different kinds of attribute information, such as those shown in FIG. 12, may be defined in advance and used to judge the relationship between adjacent scenes. The example of FIG. 12 shows how strongly each kind of attribute information is related to the same or other kinds of attribute information. A larger coefficient means a stronger relationship. In the example of FIG. 12, when both of two adjacent scenes include zoom-up (D), the coefficient is 1.0, indicating a very strong relationship. In contrast, zoom-up (D) and face detection/face recognition (Z) are regarded as relatively weakly related and are given a coefficient of 0.1. When judging the relationship between scenes, these coefficients can be used to quantify it, and a scene whose relationship is at or above a certain value can, for example, be excluded from the digest.

The present embodiment has been described using adjacent scenes as an example, but the present invention is not limited to this. For example, the digest may be generated in consideration of the relationship with all the other scenes making up the digest. In this case, scenes are extracted with the digest as a whole in view, so a digest that is coherent as a whole can be generated.
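A judgment based on relation coefficients, as described for FIG. 12, might be sketched as follows. Only the coefficients for zoom-up with zoom-up (1.0) and zoom-up with face detection/face recognition (0.1) come from the description. The coefficient for zoom-up with zoom-down, the defaults for unlisted pairs, the use of the maximum pairwise coefficient as the relatedness of two scenes, and the limit of 0.5 are all assumptions.

```python
from itertools import product

# Coefficients keyed by an unordered pair of attribute labels.
RELATION = {
    frozenset(["D"]): 1.0,       # zoom-up with zoom-up (from the description)
    frozenset(["D", "Z"]): 0.1,  # zoom-up with face detection (from the description)
    frozenset(["D", "E"]): 0.8,  # zoom-up with zoom-down (assumed value)
}


def coefficient(a: str, b: str) -> float:
    """Look up the relation coefficient; unlisted pairs default to 1.0 if equal, else 0.0."""
    key = frozenset([a, b])
    if key in RELATION:
        return RELATION[key]
    return 1.0 if a == b else 0.0


def relatedness(attrs_a, attrs_b) -> float:
    """Strongest coefficient over all attribute pairs of two scenes (assumed aggregation)."""
    return max((coefficient(a, b) for a, b in product(attrs_a, attrs_b)), default=0.0)


def too_similar(attrs_a, attrs_b, limit: float = 0.5) -> bool:
    """True if the second scene should not be adopted next to the first."""
    return relatedness(attrs_a, attrs_b) >= limit
```

Compared with the identity test, this allows two different attributes, such as zoom-up and zoom-down, to count as similar, while an unrelated pair does not cause a scene to be dropped.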

In addition, in the present embodiment the order of scenes in the digest follows the order of the scenes in the original video, but this is not a requirement. In the description of the above embodiment, scene #1 is always placed first when the digest is generated, and scene #5 is never played before scene #1. However, scene #5 may be placed first and scene #1 played after it. In this way, when adjacent scenes have identical attribute information, for example, the scenes can be reordered to make the digest easier to watch.

When a digest is generated from a plurality of clips, the scene analysis unit 312 may determine the number of scenes to extract from each clip according to the length of that clip's shooting time (playback time). Because the scenes making up the digest are then extracted in proportion to the shooting time (playback time) of each clip, a digest can be generated whose scenes are drawn roughly evenly from the clips. In this case, performing the extraction while also considering the relationships between scenes that straddle clip boundaries may make the generated digest easier to watch.
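Distributing the number of extracted scenes across clips in proportion to their durations can be sketched as below. The description does not specify a rounding rule, so the largest-remainder rule used here and the function name `allocate_scene_counts` are assumptions.

```python
from typing import List


def allocate_scene_counts(clip_durations: List[float], total_scenes: int) -> List[int]:
    """Split total_scenes across clips in proportion to clip duration.

    Fractions are rounded with the largest-remainder rule (assumption), so the
    counts always add up to total_scenes when the total duration is positive.
    """
    total = sum(clip_durations)
    if total <= 0:
        return [0] * len(clip_durations)
    exact = [total_scenes * d / total for d in clip_durations]
    counts = [int(x) for x in exact]
    leftover = total_scenes - sum(counts)
    # Hand the remaining scenes to the clips with the largest fractional parts.
    order = sorted(range(len(exact)), key=lambda i: exact[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts
```

For clips of 60, 30, and 10 seconds and a digest of five scenes, the exact shares are 3.0, 1.5, and 0.5. The single leftover scene goes to the earlier of the two clips tied on a fractional part of 0.5.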

以上より、本実施の形態に示したように、ダイジェスト生成時にシーン相互間の属性情報を考慮することで、より視聴しやすいダイジェストの生成が可能となる。 As described above, as shown in the present embodiment, it is possible to generate a digest that is easier to view by considering attribute information between scenes when generating a digest.

なお、本実施の形態では、ダイジェスト（要約）映像の生成における場合を例に説明したが、撮影した映像から、映像内容として盛り上がるシーンを集めたハイライト映像の場合にも同様に適用できる。この場合は、属性情報と評価の関係については、ハイライト映像生成用のものを用いることになるが、本出願でしめした他のシーンとの関連性も考慮して抽出するシーンを決定する点については同様に実施可能である。 In this embodiment, the case of generating a digest (summary) video has been described as an example. However, the present invention can be similarly applied to a highlight video obtained by collecting scenes that rise as video content from a shot video. In this case, for the relationship between the attribute information and the evaluation, the one for highlight video generation is used, but the scene to be extracted is determined in consideration of the relationship with other scenes shown in this application. Can be implemented in the same manner.

In terms of industrial applicability, the present invention can be used in product fields such as video cameras and cameras that generate video, video players and televisions for viewing such video, and video editing machines for editing such video.

Description of symbols

100 Video camera
200 Lens group
201 Image pickup element
202 Video ADC
203 Video signal conversion IC
204 CPU
205 Clock
206 Lens control module
207 Posture detection sensor
208 Input button
209 Display
210 Speaker
211 Output I/F
212 Compression/decompression IC
213 ROM
214 RAM
215 HDD
216 Audio ADC
217 Microphone
300 Lens unit
301 Imaging unit
302 Video AD conversion unit
303 Signal processing unit
304 Video signal compression unit
305 Imaging control unit
306 Video analysis unit
307 Lens control unit
308 Posture detection unit
309 Attribute information generation unit
310 Multiplexing unit
311 Storage unit
312 Scene analysis unit
313 Playback information generation unit
314 Audio analysis unit
315 Audio signal compression unit
316 Digest reproduction unit
317 Video signal expansion unit
318 Video display unit
319 Audio signal expansion unit
320 Audio output unit
321 Audio AD conversion unit
322 Microphone unit

Claims

A video editing apparatus for editing video, comprising:
an attribute information generation unit that divides the video into scenes and generates attribute information for each scene;
a scene analysis unit that extracts scenes to be reproduced based on an evaluation of each scene derived from the attribute information and on the relationship between a plurality of scenes; and
a reproduction information generation unit that generates reproduction information recording information about the scenes to be reproduced.

The video editing apparatus according to claim 1, wherein, when two scenes have attribute information in common as the relationship between the scenes, the scene analysis unit extracts only one of the two scenes.

The video editing apparatus according to claim 1, wherein, when two scenes have different attribute information as the relationship between the scenes, the scene analysis unit determines whether both scenes need to be extracted based on the relationship between the differing attribute information.