JP2008227860A

JP2008227860A - Device for photographing content

Info

Publication number: JP2008227860A
Application number: JP2007062446A
Authority: JP
Inventors: Yoshihiro Morioka; 芳宏森岡; Kenji Matsuura; 賢司松浦; Takashi Inoue; 尚井上; Masaaki Kobayashi; 正明小林
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2007-03-12
Filing date: 2007-03-12
Publication date: 2008-09-25
Anticipated expiration: 2027-03-12
Also published as: JP4960121B2

Abstract

PROBLEM TO BE SOLVED: To provide a content photographing device that can easily create a digest by narrowing the number of metadata items, in order of priority, to no more than a predetermined number, thereby reducing the number of scenes and clips.

SOLUTION: A scene information creating section detects a characteristic scene with reference to parameters contained in either the video and sound being photographed and recorded or the operational information of the photographing device, and creates scene information. A supplementary information adding section adds any one of the type, priority, start time, end time, or representative time as supplementary information, based on a predetermined rule. A scene list describing section describes the scene information and its supplementary information in a scene list. After the recording operation onto an information storage medium ends, a scene sorting section selects, based on priority, no more than a predetermined number of pieces of scene information from the plurality of pieces described in the scene list.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a content photographing apparatus, and more particularly to a content photographing apparatus that, in movie shooting, can easily create content from which unnecessary portions have been deleted, or a digest composed of scenes presumed to be important, from the footage recorded between the start and end of clip recording.

Conventionally, movies, TV dramas, and the like have been shot based on scenarios (scripts) created from storyboards and similar materials. A scenario has a title (name) and is composed of a plurality of scenes, and each scene is composed of a plurality of cuts. The director directs according to the scenario, and performers such as actors, actresses, and extras act according to the descriptions in the scenario. In live stage performances, concerts, and the like, the scenes are performed in the order given in the scenario.

On the other hand, in movies, dramas, and the like, shooting is rarely performed in the order of the scenes given in the scenario.

The creation and editing of metadata in the prior art will now be described. The metadata input method and editing system described in Patent Document 1 is known. Specifically, when metadata related to content is created or tagged, keywords extracted in advance from the scenario or the like of the produced content are input by voice. Dictionary fields are then set and the keywords are prioritized based on the scenario, and the metadata is created by voice recognition means. With this method, metadata can be assigned efficiently through voice recognition even at intervals of a few seconds, which is difficult with key input. Scenes can also be searched by using the metadata as search keywords.

An apparatus for analyzing scenario information is described in Patent Document 2. The apparatus includes a scenario information storage unit, a reference information storage unit, a calculation unit, and an output unit. The scenario information storage unit stores scenario information divided scene by scene. The reference information storage unit stores reference information about the data contained in the scenario information. The calculation unit extracts data from the scenario information, calculates output information based on the extracted data and the reference information, and passes the result to the output unit.

The scenario information analysis apparatus configured in this way can analyze scenario information and automatically calculate and output information such as an appropriate shooting schedule, based on the data contained in the scenario and the reference information associated with that data. As a result, the time needed to plan the shooting schedule is shortened, and shooting according to the output schedule allows shooting to be completed sooner, which reduces shooting costs.
Japanese Patent No. 3781715
JP 2004-362610 A

In the conventional apparatuses and methods represented by Patent Documents 1 and 2 above, metadata is generated at characteristic scenes and compiled into a list, between the start and end of clip recording in movie shooting, based on camera work patterns such as panning and zooming, the recorded sound, user operations, and the like.

After recording ends, a digest composed of the related clips can be created using the metadata in the list. Although it depends on the content, a digest should generally be no more than 1/3 of the whole (the amount originally shot). With the above method, however, the number of metadata items can become enormous for some content, and more than half of the shot content may remain in the resulting digest. The result is too long to be called a digest. This is because the conventional configuration cannot narrow down the number of scenes and clips that make up a digest when generating one from shot content.

Users also frequently ask for digests created from various viewpoints. Examples include a digest that emphasizes camera work, a digest that emphasizes sound, a digest that emphasizes the user's button input, a digest that fits within a desired length of time, and a digest made by selecting, from a plurality of shot clips, only the clips that contain many characteristic scenes (for example, many high-priority clips). The conventional technology, however, cannot automatically generate a digest that matches such user preferences.

In view of the above problems, an object of the present invention is to provide a content photographing apparatus that can easily generate a digest with a reduced number of scenes and clips by narrowing the number of metadata items to no more than a preset number, in descending or ascending order of priority, or that can automatically generate a digest matching the user's preferences.

In order to achieve the above object, the present invention provides a content photographing apparatus that records content including any of video, audio, and data on an information storage medium in combination with scene information about the content, and accesses a specific part of the content by referring to the scene information, the apparatus comprising:
scene information generating means for detecting a characteristic scene with reference to a parameter contained in either the video and audio being shot and recorded or the operation information of the photographing apparatus, and generating scene information;
means for adding any of a type, a priority, a start time, an end time, and a representative time to the scene information as auxiliary information according to a predetermined rule;
scene list description means for describing the scene information and its auxiliary information in a scene list; and
scene selection means for selecting, after the recording operation onto the information storage medium has ended, no more than a predetermined number of pieces of scene information, based on the priority, from the plurality of pieces of scene information described in the scene list.
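
As an illustration only, the scene list and the selection step described above can be sketched in Python. The class name `SceneInfo`, the function `select_scenes`, the scene type labels, and the convention that a larger number means a higher priority are assumptions made for this sketch; the specification does not prescribe any particular data layout.

```python
from dataclasses import dataclass


@dataclass
class SceneInfo:
    """One scene list entry with its auxiliary information (hypothetical layout)."""
    scene_type: str        # e.g. "zoom", "cheer" (labels invented for this sketch)
    priority: int          # assumed convention: larger value = more important
    start: float           # start time, seconds from the start of the clip
    end: float             # end time
    representative: float  # representative time


def select_scenes(scene_list, max_count):
    """Keep at most max_count entries, highest priority first, in time order."""
    ranked = sorted(scene_list, key=lambda s: s.priority, reverse=True)
    return sorted(ranked[:max_count], key=lambda s: s.start)


scene_list = [
    SceneInfo("pan", 40, 0.0, 4.0, 2.0),
    SceneInfo("zoom", 90, 10.0, 15.0, 12.0),
    SceneInfo("cheer", 70, 30.0, 36.0, 33.0),
    SceneInfo("fix", 55, 50.0, 58.0, 54.0),
]
selected = select_scenes(scene_list, max_count=2)
```

Returning the selected entries in time order is a design choice of the sketch, made so that the result can be played back as a digest without further sorting.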

The scene selection means preferably includes at least one of high-priority scene selection means for selecting scene information in descending order of priority and low-priority scene selection means for selecting scene information in ascending order of priority.

The high-priority scene selection means preferably selects high-priority scenes from the shot content excluding the scenes selected by the low-priority scene selection means.
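
A minimal sketch of this two-stage selection follows, assuming that both selectors work on the same list and the same numeric priority scale (larger is more important). The function names, the dictionary keys, and the sample values are hypothetical.

```python
def select_low(scenes, count):
    """Pick the `count` lowest-priority scenes (candidates for exclusion)."""
    return sorted(scenes, key=lambda s: s["priority"])[:count]


def select_high_excluding_low(scenes, low_count, high_count):
    """Drop the low-priority picks first, then take the top of what remains."""
    excluded = {s["name"] for s in select_low(scenes, low_count)}
    remaining = [s for s in scenes if s["name"] not in excluded]
    ranked = sorted(remaining, key=lambda s: s["priority"], reverse=True)
    return ranked[:high_count]


scenes = [
    {"name": "a", "priority": 10},
    {"name": "b", "priority": 80},
    {"name": "c", "priority": 30},
    {"name": "d", "priority": 60},
]
high = select_high_excluding_low(scenes, low_count=2, high_count=3)
```

With these sample values the two lowest-priority scenes are excluded first, so fewer than `high_count` scenes can be returned.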

The content photographing apparatus preferably further includes:
setting means for setting the method of selecting the scene information;
second priority means for computing a combination of the plurality of pieces of scene information described in the scene list description memory, using the calculation method determined by the setting means, and adding a second priority to the scene information;
auxiliary information adding means for adding at least one of a start time, an end time, and a representative time of the characteristic scene having the second priority as auxiliary information of the scene information; and
second priority list generating means for generating a second priority list by selecting, in descending order of the second priority, no more than a predetermined number of pieces of scene information from the plurality of pieces of scene information held as a list in the scene list description memory.

The apparatus preferably further includes second priority list description means for describing the second priority list in a reproduction-time reference file for the shot content.

The content photographing apparatus preferably further includes skip means for skipping, during reproduction, to a point determined with reference to the start point, representative point, or end point of a characteristic scene, by referring to the second priority list.
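
One way such a skip operation could be realized is sketched below, assuming the second priority list is available as (start, representative, end) tuples in seconds. The list contents and the function `next_skip_point` are hypothetical; the sketch skips to start points, and the same lookup would work for representative or end points.

```python
import bisect


def next_skip_point(points, current_time):
    """Return the first skip point strictly after current_time, or None."""
    i = bisect.bisect_right(points, current_time)
    return points[i] if i < len(points) else None


# Hypothetical second priority list: (start, representative, end) per scene.
second_priority_list = [(10.0, 12.0, 15.0), (30.0, 33.0, 36.0)]
start_points = sorted(start for start, _, _ in second_priority_list)

target = next_skip_point(start_points, current_time=0.0)
```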

The content photographing apparatus preferably further includes scene reproducing means for reproducing characteristic scene sections in a specified order by referring to the second priority list.

The content photographing apparatus preferably further includes telop display means that, when the characteristic scene sections are reproduced in the specified order, superimposes a description of each characteristic scene on the reproduced video as a telop (on-screen caption) while that scene section is reproduced.

The content photographing apparatus preferably further includes:
digest method designation means for inputting at least one of a priority designation for the characteristic scenes to be digested, a type designation for the characteristic scenes to be digested, a digest time length designation, and a designation of the reduction ratio from the content to the digest;
means for generating a reproduction-time reference file according to the digest generation method designated by the digest method designation means, with reference to the second priority list, which is auxiliary data of the reproduction-time reference file; and
registration means for registering the reproduction-time reference file in a reproduction object list.
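
The digest time length and reduction ratio designations can be illustrated with a small sketch. It assumes a greedy rule (add scenes in descending priority while they still fit), which is only one possible digest generation method and is not stated in the specification; `build_digest`, `target_from_ratio`, and the sample scenes are hypothetical.

```python
def target_from_ratio(content_seconds, ratio):
    """Convert a reduction ratio (e.g. 0.2) into a digest length in seconds."""
    return content_seconds * ratio


def build_digest(scenes, target_seconds):
    """Greedily add (start, end, priority) scenes by priority while they fit."""
    chosen, total = [], 0.0
    for start, end, priority in sorted(scenes, key=lambda s: s[2], reverse=True):
        length = end - start
        if total + length <= target_seconds:
            chosen.append((start, end, priority))
            total += length
    # Return the entries in time order, like a playlist of sections.
    return sorted(chosen), total


scenes = [(0.0, 4.0, 40), (10.0, 15.0, 90), (30.0, 36.0, 70), (50.0, 58.0, 55)]
playlist, total = build_digest(scenes, target_from_ratio(60.0, 0.2))
```

The resulting list of sections plays the role of the reproduction-time reference file in this sketch: it refers to sections of the recorded content rather than copying them.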

The content photographing apparatus preferably further includes telop display means that, when reproduction is performed with reference to the reproduction-time reference file, superimposes a description of each characteristic scene on the reproduced video as a telop (on-screen caption) while that scene section is reproduced.

The content photographing apparatus preferably further includes:
digest method designation means for inputting at least one of a priority designation for the characteristic scenes to be digested, a type designation for the characteristic scenes to be digested, a digest time length designation, and a designation of the reduction ratio from the content to the digest; and
file generating means for generating a file as a collection of the designated characteristic scenes according to the digest generation method designated by the digest method designation means, with reference to the second priority list, which is auxiliary data of the reproduction-time reference file.

The content photographing apparatus preferably further includes:
digest method designation means for inputting at least one of a priority designation for the characteristic scenes to be digested, a type designation for the characteristic scenes to be digested, a digest time length designation, and a designation of the reduction ratio from the content to the digest; and
reproducing means for joining together and reproducing the characteristic scene sections that are not designated, according to the digest generation method designated by the digest method designation means, with reference to the second priority list, which is auxiliary data of the reproduction-time reference file.

The content photographing apparatus preferably further includes:
digest method designation means for inputting at least one of a priority designation for the characteristic scenes to be digested, a type designation for the characteristic scenes to be digested, a digest time length designation, and a designation of the reduction ratio from the content to the digest; and
file generating means for generating a file that collects the shot sections not designated for reproduction, according to the digest generation method designated by the digest method designation means, with reference to the second priority list, which is auxiliary data of the reproduction-time reference file.

The content photographing apparatus preferably further includes reproducing means that, by referring to the second priority list, which is auxiliary data of the reproduction-time reference file, performs normal reproduction for sections designated for reproduction and performs either "reproduction at a speed changed from normal" or "reproduction of video obtained by processing the reproduced video" for shot sections not designated for reproduction.
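
The split between normally reproduced sections and the remaining sections can be sketched as a list of (start, end, rate) segments. The function `playback_plan`, the 4x rate for undesignated sections, and the assumption that designated sections do not overlap are all choices made for this illustration.

```python
def playback_plan(clip_length, designated, other_rate=4.0):
    """Split [0, clip_length] into (start, end, rate) segments.

    Designated sections play at rate 1.0; everything else plays at other_rate.
    Assumes the designated (start, end) sections do not overlap.
    """
    plan, cursor = [], 0.0
    for start, end in sorted(designated):
        if start > cursor:
            plan.append((cursor, start, other_rate))
        plan.append((start, end, 1.0))
        cursor = end
    if cursor < clip_length:
        plan.append((cursor, clip_length, other_rate))
    return plan


plan = playback_plan(60.0, [(10.0, 15.0), (30.0, 36.0)])
viewing_seconds = sum((end - start) / rate for start, end, rate in plan)
```

In this example a 60-second clip is viewed in 23.25 seconds, because the 49 seconds outside the designated sections are reproduced at four times normal speed.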

The content photographing apparatus preferably further includes reproduction display means for performing slow reproduction, high-speed reproduction, skip reproduction, or still image display of the shot video, and thereby performs the "reproduction at a speed changed from normal".

The content photographing apparatus preferably further includes video generating means for generating video from which the horizontal and vertical shake of the shot video caused by camera work has been removed, and thereby performs the "reproduction of video obtained by processing the reproduced video".

The content photographing apparatus preferably further includes:
file generating means for generating, from the reproduction-time reference files registered in the reproduction object list, a recommended reproduction-time reference file composed of scenes whose priority is at or above a predetermined value or scenes having specific camera work; and
registration means for registering the recommended reproduction-time reference file in a recommended reproduction object list.
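
The recommendation rule (priority at or above a threshold, or a specific kind of camera work) amounts to a simple filter. In the sketch below the dictionary keys, the type labels, and the threshold of 60 are hypothetical.

```python
def recommend(scenes, min_priority, camera_work_types):
    """Keep scenes at or above min_priority, or with a listed camera work type."""
    return [
        s for s in scenes
        if s["priority"] >= min_priority or s["type"] in camera_work_types
    ]


# Hypothetical entries taken from a registered reproduction-time reference file.
registered = [
    {"type": "zoom", "priority": 90},
    {"type": "pan", "priority": 40},
    {"type": "cheer", "priority": 70},
    {"type": "fix", "priority": 30},
]
recommended = recommend(registered, min_priority=60, camera_work_types={"pan"})
```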

The content photographing apparatus preferably further includes:
means for generating BGM (background music) when a reproduction-time reference file registered in the reproduction object list is reproduced; and
changing means for changing at least one of the melody, timbre, and tempo of the BGM near transitions between characteristic scenes in the reproduction-time reference file.

According to the content photographing apparatus of the present invention, deletion of unnecessary portions from shot content and digest generation can be realized easily, particularly in video shooting.

(First Embodiment)
With reference to FIG. 1, the operation of the content photographing apparatus according to the first embodiment of the present invention will be described. Specifically, the following describes an example of a system model in which video data, audio data, and metadata are generated on a recording medium (or buffer memory) inside the camera 101, which is the content photographing apparatus, and a digest reproduction function and the like are provided by referring to the metadata.

In FIG. 1, reference numeral 101 indicates a camera, reference numeral 102 indicates the lens unit of the camera 101, reference numeral 103 indicates a microphone of the camera 101, and reference numeral 104 indicates the photographing target of the camera 101. The photographing target 104 is, for example, a landscape, a person, an animal such as a pet, a car, or a building.

Reference numeral 114 indicates a metadata input button, reference numeral 105 indicates data shot by the camera 101, and reference numeral 106 indicates an AV stream data file that includes metadata. Reference numeral 107 indicates metadata such as shooting scene information IS (scene number, cut number, take number, and whether the recorded take is accepted, rejected, or on hold). Reference numeral 109 indicates a remote controller for the camera 101. The user operates the metadata input button 114 and the remote controller 109 to input the metadata 107 to the camera 101. The image sensor used in the camera 101 is preferably a CCD, a CMOS sensor, or the like.

Reference numeral 108 indicates a data sequence shot by the camera 101. In the data sequence 108, video data, audio data, and the metadata 107 are arranged on the time axis. The metadata 107 is handled as character data in text format, but may also be data in binary format. The data sequence 108 includes clip #1 to clip #5 in a specific scene.

Reference numeral 110 indicates a data sequence in which clip #1 to clip #5 have been joined together by editing. Reference numeral 111 indicates a television that can be connected to the camera 101. Reference numeral 112 indicates a connection cable that carries signals from the camera 101 to the television 111. Reference numeral 113 indicates a connection cable that carries signals from the television 111 to the camera 101. By operating the remote controller 109 at a location away from the camera 101, the user can display content from which unnecessary portions have been deleted, or a digest (or summary content), on the television 111 via the connection cable 112.

Reference numeral 115 indicates a microphone that, like the microphone 103, detects sound and inputs it to the camera 101 as an audio signal. Reference numeral 117 indicates a microphone built into the camera 101. Whereas the microphones 103 and 117 are attached directly to the camera 101 and record sound near the camera 101, the microphone 115 is connected to the camera 101 by a cable or the like and is used to record sound far from the camera 101. As described later, an optical sensor can be used in place of the microphone 115.

The list display on the television 111 will be briefly described. On the screen of the television 111, the horizontal axis represents the passage of time, and the valid portions (valid scenes) and invalid portions (invalid scenes) of each clip are displayed.

A valid portion consists of, for example:
・fixed (stationary) scenes after panning or zooming, and
・scenes characterized by sounds such as cheers or applause.
An invalid portion, on the other hand, consists of, for example:
・scenes with heavy image shake caused by camera shake (commonly described as "wobbly"),
・out-of-focus scenes and scenes in which panning, tilting, or zooming is too fast,
・scenes in which the screen is blacked out by backlight,
・scenes with howling (audio feedback),
・scenes in which the ground is being shot, and
・scenes in which the camera's cap is closed.
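
The two categories listed above can be pictured as a lookup from a detected scene type to "valid" or "invalid". The type labels and the function `classify` below are invented for this sketch; the specification lists the kinds of scenes but not how they are encoded.

```python
# Hypothetical labels for the scene kinds listed above.
VALID_TYPES = {"fix_after_pan", "fix_after_zoom", "cheer", "applause"}
INVALID_TYPES = {
    "camera_shake",
    "out_of_focus",
    "fast_pan_tilt_zoom",
    "backlit_black",
    "howling",
    "ground",
    "cap_closed",
}


def classify(scene_type):
    """Map a detected scene type to 'valid', 'invalid', or 'unclassified'."""
    if scene_type in VALID_TYPES:
        return "valid"
    if scene_type in INVALID_TYPES:
        return "invalid"
    return "unclassified"


labels = [classify(t) for t in ("cheer", "howling", "sunset")]
```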

In the example of FIG. 1, the list display on the television 111 shows a representative clip for each of the three valid portions as a representative thumbnail on the screen. The representative clip may be the first frame of the valid portion or a representative frame partway through it. Each valid portion and invalid portion is given a priority, so a digest can also be generated by selecting only scenes of a specific priority.

The metadata input button 114 described above preferably consists of three buttons. By operating the metadata input button 114 at an important moment during shooting, the user can mark that important shot (clip); this is called the "marking function". The mark indicating an important clip is also metadata 107. By using this metadata 107, a marked clip (the video of the first or representative frame of the clip, or its thumbnail) can be called up quickly by mark search after shooting. Of the three buttons, for example, the first is used to register important clips, the second to switch modes, such as enabling button operation or switching to character input mode, and the third to cancel a registration.

It is also possible to switch to a mode in which the period during which the first button is held down is registered as an important clip. It is further possible to switch to a mode that registers, as an important clip, the 5 seconds before and after the moment the first button is pressed, or a total of 15 seconds consisting of the 5 seconds before and the 10 seconds after. With three buttons, many functions can be realized by combining which button is pressed, when, and for how long.
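
The time window registered around a button press can be computed as follows. The function `marked_interval` and its clamping to the clip boundaries are assumptions of this sketch; the 5-second and 10-second values are the ones given in the text.

```python
def marked_interval(press_time, before=5.0, after=10.0, clip_length=None):
    """Return the (start, end) window registered around a button press.

    Times are seconds from the start of the clip. The window is clamped to
    the clip boundaries, which is an assumption of this sketch.
    """
    start = max(0.0, press_time - before)
    end = press_time + after
    if clip_length is not None:
        end = min(end, clip_length)
    return start, end


window = marked_interval(20.0)                              # 5 s before, 10 s after
symmetric = marked_interval(20.0, before=5.0, after=5.0)    # 5 s on each side
```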

The shooting scene information IS input as the metadata 107 is associated with the time code of the clip (for example, a time code expressed in 32 bits at a clock frequency of 27 MHz). The metadata 107 associated with the time code is then electronically associated, inside the camera 101, with the clapperboard sound and the recorded content, and is generated as new metadata 107. This makes it easy not only to jump immediately to the time the clapperboard was struck, but also to delete unnecessary recorded data from before that time and to rearrange the scenes, cuts, and so on whose takes were accepted. For example, when shooting a school sports day, the frame images at the start of a footrace (sprint), a long-distance race such as a relay, a tug of war, a ball-toss game, and so on can be called up immediately.
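
For orientation, the arithmetic behind a 32-bit count of a 27 MHz clock is sketched below. The helper names are hypothetical, and the text does not say how the apparatus handles durations longer than one counter period; the sketch simply lets the counter wrap.

```python
CLOCK_HZ = 27_000_000   # 27 MHz clock, as in the example in the text
COUNTER_BITS = 32       # width of the time code counter


def ticks_to_seconds(ticks):
    """Convert a tick count to seconds."""
    return ticks / CLOCK_HZ


def seconds_to_ticks(seconds):
    """Convert seconds to a tick count, wrapping at the counter width."""
    return round(seconds * CLOCK_HZ) % (1 << COUNTER_BITS)


# A 32-bit counter at 27 MHz wraps after roughly 159 seconds.
wrap_seconds = (1 << COUNTER_BITS) / CLOCK_HZ
```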

In this specification, content shot during the period from the start of shooting to the end of shooting, or from the start of shooting to a pause in shooting, is defined as a clip. The user can rearrange clips by specifying the start position (time) and end position (time), or the length, of each clip, based on the sequence of material shot by the camera. When a clip is displayed on a TV monitor or the like, the most characteristic frame of that clip can be designated as the video representing the clip, such as the first frame (or field), any frame (or field) from the beginning to the end of the clip, or a fixed image before or after panning or zooming.

In addition, movie button operations such as record, pause, and stop, and the photographer's voice detected by the microphone 115, can be registered as metadata associated with (marked at) a specific time code of the clip. One example of what the photographer's voice can provide is meta-information about the shooting target. Specific examples are information about shooting the content, such as the shooting date and time (date; morning, noon, evening, or night), the shooting method (lens, camera, shot, light source, and so on), the event participants (line of sight, movement, facial expression, level of excitement, makeup, costume condition, and so on), dialogue (keywords such as ad-libs), and other points of interest such as sounds.

Next, the internal configuration and operation of the camera 101 will be described with reference to FIG. 2. The camera 101 contains a zoom control unit 201, a focus control unit 202, an exposure control unit 203, an image sensor 204, a shutter speed control unit 205, a camera microcomputer 206, an absolute tilt sensor 207, an angular velocity sensor 208, a front-rear/left-right/vertical acceleration sensor 209, a user input system 210, a camera signal processing unit 211, an audio processing system 212, an H.264 encoder 213, a recording medium 214, and an output interface 215.

The camera microcomputer 206 includes scene information generating means (index generating means) that detects unnecessary scenes and important scenes. Unnecessary and important scenes are detected by performing a specific calculation on each kind of data, such as the pan, tilt, zoom, focus, and audio input level of the imaging apparatus.
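
The text does not spell out the calculations, so the following is only a sketch of one plausible form: flagging runs of samples whose magnitude stays above a threshold, as might be done with angular velocity readings to find shaky footage. The function `detect_segments`, the threshold, the minimum run length, and the sample values are all hypothetical.

```python
def detect_segments(samples, threshold, min_len):
    """Return (start_index, end_index) runs where |sample| > threshold.

    Only runs of at least min_len consecutive samples are reported;
    end_index is exclusive.
    """
    segments, run_start = [], None
    for i, value in enumerate(samples):
        if abs(value) > threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_len:
                segments.append((run_start, i))
            run_start = None
    # Close a run that is still open at the end of the data.
    if run_start is not None and len(samples) - run_start >= min_len:
        segments.append((run_start, len(samples)))
    return segments


# Hypothetical angular velocity samples (arbitrary units).
angular_velocity = [0.1, 0.2, 5.0, 6.1, -5.5, 0.3, 4.9, 0.1, 7.0, 7.2]
shaky = detect_segments(angular_velocity, threshold=3.0, min_len=2)
```

The minimum run length keeps a single spike (index 6 in the sample data) from being reported as an unnecessary scene.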

The zoom control unit 201 controls the zoom operation of the lens unit 102. The focus control unit 202 controls the focus operation of the lens unit 102. The exposure control unit 203 controls the exposure adjustment operation of the lens unit 102. The shutter speed control unit 205 controls the shutter speed adjustment operation of the image sensor 204. The absolute tilt sensor 207 detects the absolute tilt of the camera 101 in the horizontal/vertical directions. The angular velocity sensor 208 detects the angular velocity of the camera 101 in the horizontal/vertical directions. The acceleration sensor 209 detects the front-rear/left-right/vertical acceleration of the camera 101.

The user input system 210 accepts user operations through buttons and the like and generates instruction signals. The audio processing system 212 accepts input from the built-in microphone 117, the external microphone 103, or the microphone 115. The H.264 encoder 213 detects the sound of a clapperboard in the audio input to the audio processing system 212 and generates clapperboard sound detection metadata.

The operating parameters of the image sensor 204 consist of at least one item of image sensor operation data from among the following: chromaticity space information of the three primary color points, the coordinates of white, gain information for at least two of the three primary colors, color temperature information, Δuv (delta uv), and gamma information for the three primary colors or the luminance signal. In this embodiment, as an example, the chromaticity space information of the three primary color points, the gain information for R (red) and B (blue) among the three primary colors, and the gamma curve information for G (green) are handled as metadata. Knowing the chromaticity space information of the three primary color points gives the range of colors that can be reproduced in the color space. Knowing the R (red) and B (blue) gain information gives the color temperature. Knowing the G (green) gamma curve information gives the gradation (tone reproduction) characteristics. A dedicated color temperature sensor may also be provided so that color temperature information is received from that sensor.

The following are handled in the camera microcomputer 206 as metadata 107 (referred to as camera meta): lens zoom information, lens focus information, lens exposure information, shutter speed information of the image sensor, absolute tilt information in the horizontal/vertical directions, angular velocity information in the horizontal/vertical directions, front-rear/left-right/vertical acceleration information, button information entered by the user, the scene number, cut number, and take number, information on whether the recorded take is adopted, rejected, or on hold, chromaticity space information of the three primary color points, gain information for R (red) and B (blue) among the three primary colors, and gamma curve information for G (green).

The information captured by the image sensor 204 (image data) undergoes processing such as pixel defect correction and gamma correction in the camera signal processing unit 211, either pixel by pixel or in blocks of multiple pixels, is compressed by the H.264 encoder 213, and is then stored on the recording medium 214 together with the camera meta described above. The AV output of the H.264 encoder 213 and the camera meta output of the camera microcomputer 206 are each output through the output interface 215.

Next, the metadata created in the movie camera will be described with reference to FIG. 3. Examples of real-time metadata mapped to the SEI of the H.264 stream include:
・Metadata associated with AV content shot with a camera such as a movie camera
・Metadata obtained, in general, by converting data into metadata
・Metadata obtained from the SI (Service Information) of digital broadcasting
・Metadata such as EPG information obtained from an EPG provider
・Metadata such as EPG obtained from the Internet

Examples of metadata associated with AV content shot with a camera include:
・Information on buttons pressed by the user at important scenes (identification information such as a number can be attached)
・Shooting data. The shooting data includes the operating mode of the image sensor, backlight compensation, aperture/exposure information, focus, shutter speed information, color temperature, white balance, zoom, elevation angle, shooting of the ground, unsteady shaking, wobble, the pan/tilt/zoom (abbreviated PTZ) state, the howling state, the closed state of the camera cap, and the camera attitude (absolute tilt information in the horizontal/vertical directions, angular velocity information in the horizontal/vertical directions, front-rear/left-right/vertical acceleration information, and so on).
・Time code (video frames, audio frames)
・Video and audio format information such as the shooting frame rate and the recording frame rate

Examples of non-real-time metadata include:
・Menu information
・Title list (representative events, events registered by the user)
・Scene number, cut number, take number, and information on whether the recorded take is adopted, rejected, or on hold
・Luminance and color information of video blocks
・Image recognition data (detection and recognition of faces, people, pets, and so on)
・Audio input level (maximum input level of a specified channel over a given period)
・Speech recognition data
・Operation data of the imaging system, such as chromaticity space information of the three primary color points of the image sensor, the coordinates of white, gain information for at least two of the three primary colors, color temperature information, and Δuv (delta uv)
・Files input through external communication (text such as a scenario, input through an external interface as a file in XML or binary data format)
・Gamma information of the three primary colors or the luminance signal
・Still images
・Thumbnails

The necessary items are selected from the metadata above and used. Description formats for metadata include the properties and attributes introduced in UPnP and UPnP-AV. These are published at http://upnp.org, and designing with the use of text and XML (Extensible Markup Language) in mind allows efficient operation.

The photographer of a movie, the content creator, or the copyright holder of the content may assign a value to each piece of metadata so that usage fees can be collected according to how, and how often, users make use of the content. For this purpose, metadata that assigns a value can be associated with each piece of metadata. This value-assigning metadata may be given as an attribute of the metadata concerned or as an independent property.

An example of information about the recording device and the recording conditions is described below. Examples are the device manufacturer ID and model ID of the movie camera. When the photographer of a movie, the content creator, or the copyright holder considers that metadata they create and register is of high value and requires a license, efficient operation can be achieved by incorporating into the present invention a configuration that executes an authentication-based licensing process for use of that metadata.
In this case, the photographer may create an encrypted file of the captured video content and upload it to a server on the Internet for publication, or may upload and announce the encrypted file so that anyone who likes it can purchase it. When newsworthy content such as footage of an unexpected incident has been recorded, it can also be put up for auction among multiple broadcasting stations. Using metadata makes it possible to classify and search the growing volume of content efficiently.

Next, with reference to FIG. 4, the following four methods for the video compression scheme (H.264/AVC) and the audio compression scheme (AAC) are described as one embodiment.
・A method of mapping real-time metadata
・Detection of unnecessary scenes from real-time metadata
・A method of detecting important scenes and mapping their scene information (also called a scene index, tag, or metadata)

FIG. 4 is a more detailed illustration of the video and audio compression engine, and its peripheral processing means, in the AV signal compression/recording control unit inside the camera 101 of FIG. 1. In FIG. 4, reference numeral 401 denotes a video encoding unit, reference numeral 402 denotes a VCL (Video Coding Layer)-NAL (Network Abstraction Layer) unit buffer, and reference numeral 403 denotes an AAC audio encoding unit.

Reference numeral 404 denotes a PS (Parameter Set) buffer, reference numeral 405 denotes a VUI (Video Usability Information) buffer, reference numeral 406 denotes an SEI (Supplemental Enhancement Information) buffer, reference numeral 407 denotes a non-VCL-NAL unit buffer, and reference numeral 408 denotes an MPEG-PES packet generator. Reference numeral 409 denotes an MPEG-TS (MPEG Transport Packet) generator, reference numeral 410 denotes an ATS (Arrival Time Stamp) packet generator, and reference numeral 411 denotes an EP-map generator.

As shown in FIG. 4, the video signal is converted into VCL-NAL unit format data by the video encoding unit 401 and then held temporarily in the VCL-NAL unit buffer 402. The audio signal, the externally input PS data, and the externally input VUI data are converted into non-VCL-NAL unit format data by the audio encoding unit 403, the PS buffer 404, and the VUI buffer 405, respectively, and then held temporarily in the non-VCL-NAL unit buffer 407. Similarly, real-time metadata such as the pan, tilt, zoom, focus, and audio input level (maximum input level of a specified channel over a given period) of the imaging apparatus is mapped to the user data unregistered SEI message of the H.264/AVC SEI, converted into non-VCL-NAL unit format data by the SEI buffer 406, and held temporarily in the non-VCL-NAL unit buffer 407.

The video signal (4a) is input to the video encoding unit 401 and to the face/person detection means 408. The face/person detection means 408 detects the position, size, and number of human faces and outputs the detection data (4q) to the scene information metadata generation means 409. The audio signal (4b) is input to the audio encoding unit 403 and to the scene information metadata generation means 409. The following are also each input to the scene information metadata generation means 409: externally input PS data (4c), externally input VUI data (4d), time code (4e), shooting/recording frame rate (4f), data on the time at which the user pressed an operation button (4g), backlight compensation/aperture data (4h), color temperature/white balance data (4i), focus data (4j), zoom data (4k), yaw/roll/pitch data from the gyro sensor (4m), elevation angle/ground shot detection data (4n), and state data on how far the camera lens cap is closed (4p).

The scene information metadata generation means 409 internally includes howling detection means 410, unnecessary scene detection means 411, important scene detection means 412, and real-time data selection/mapping means 413. The howling detection means 410 detects howling on the basis of the audio signal (4b).

An example of unnecessary scene detection is described here. The unnecessary scene detection means 411 detects the unnecessary scene metadata (UA, UB, UC, UD, UE, UF, UG) shown in FIG. 7, where:
UA indicates that the camera cap is closed,
UB indicates that the camera is shaking or wobbling heavily,
UC indicates that the image is out of focus,
UD indicates that pan/tilt/zoom is too fast,
UE indicates that the scene is backlit,
UF indicates that howling is present, and
UG indicates that the ground is being shot.

Images in the states represented by the metadata above are recognized as unnecessary scenes almost universally by anyone with ordinary sensibilities and emotions. This is thought to be because the sense of what is unacceptable rests largely on human physiological perception and cognitive mechanisms, and is therefore nearly the same everywhere.

Specifically, the metadata UA, UB, UC, UD, UE, UF, and UG are weighted with the following priorities (points): UA (50 points), UB (40 points), UC (25 points), UD (20 points), UE (40 points), UF (35 points), and UG (25 points).

In FIG. 7, the unnecessary scene selection algorithm does not only handle each unnecessary scene individually. When the interval between multiple unnecessary scenes is 60 video frames or fewer, it treats them together as one continuous unnecessary scene. In other words, the "definition of an unnecessary scene section" can be specified as (single unnecessary scene sections) + (multiple unnecessary scene sections separated by 60 frames or fewer).

The reason multiple unnecessary scenes spaced 60 frames or fewer apart are treated as one unnecessary scene is that joining fragments of video shorter than 60 frames produces hectic, unsettled footage. As scene information for an unnecessary scene, the type of the highest-priority metadata making up the scene, together with the time and duration (length) of the scene, serves as the detailed description of the scene information and is used for playlist marker information (usable for marker skipping), auxiliary data associated with the marker information, and so on.
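The merging rule can be sketched as follows. This is a minimal illustration and not the patent's implementation: the `Scene` class, its half-open frame ranges, and the function name are assumptions, while the point values and the 60-frame threshold come from the text.

```python
from dataclasses import dataclass

# Priority points for unnecessary-scene metadata, as listed in the text.
UNNECESSARY_POINTS = {
    "UA": 50, "UB": 40, "UC": 25, "UD": 20, "UE": 40, "UF": 35, "UG": 25,
}
MAX_GAP_FRAMES = 60  # scenes separated by 60 frames or fewer are merged


@dataclass
class Scene:
    kind: str   # metadata type, e.g. "UB"
    start: int  # first frame (inclusive)
    end: int    # end frame (exclusive)


def merge_unnecessary(scenes):
    """Merge unnecessary scenes whose gap is at most MAX_GAP_FRAMES frames."""
    merged = []
    for s in sorted(scenes, key=lambda s: s.start):
        if merged and s.start - merged[-1].end <= MAX_GAP_FRAMES:
            last = merged[-1]
            # The merged scene is labeled with the higher-priority type.
            kind = max(last.kind, s.kind, key=UNNECESSARY_POINTS.__getitem__)
            merged[-1] = Scene(kind, last.start, max(last.end, s.end))
        else:
            merged.append(Scene(s.kind, s.start, s.end))
    return merged
```

Each merged scene carries the type of its highest-priority member plus its start and length, which are the items the text says are recorded as scene information.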

The number of unnecessary scenes can also be fixed in advance according to the length of the content. For example, it can be limited to no more than 5 per minute and no more than 100 in total. The types of unnecessary scene to detect can be specified, and a reduction ratio can be specified for each piece of content. When stable footage is being shot with a tripod or similar support, the unnecessary scene detection function can be switched off manually.
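The example limit can be written as a one-line calculation. The function name and the choice to truncate partial minutes are assumptions; only the figures of 5 per minute and 100 in total come from the text.

```python
def max_unnecessary_scenes(content_seconds, per_minute=5, hard_cap=100):
    """Upper bound on unnecessary scenes for content of the given length."""
    # Assumption: the per-minute allowance scales with length and is truncated.
    allowed_by_length = int(per_minute * content_seconds / 60)
    return min(allowed_by_length, hard_cap)
```

Under these assumptions a 10-minute clip allows up to 50 unnecessary scenes, and anything longer than 20 minutes is held at the cap of 100.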

As an example of unnecessary scene deletion, FIG. 5 shows a procedure in which two sections (scenes) with heavy shake are deleted from the captured content and the three remaining shake-free sections (scenes) are combined into a single shake-free video sequence. Unnecessary sections caused by something other than shake can be deleted in the same way.
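The deletion step amounts to taking the complement of the unnecessary sections within the clip. A minimal sketch, assuming sections are given as half-open `(start, end)` frame ranges (a representation chosen here for illustration):

```python
def kept_sections(total_frames, unnecessary):
    """Return the (start, end) frame ranges left after cutting unnecessary ones."""
    kept, cursor = [], 0
    for start, end in sorted(unnecessary):
        if start > cursor:
            kept.append((cursor, start))  # footage before this unnecessary section
        cursor = max(cursor, end)
    if cursor < total_frames:
        kept.append((cursor, total_frames))  # footage after the last cut
    return kept
```

For a 1000-frame clip with shaky sections at frames 200 to 300 and 600 to 700, this leaves three sections to be joined, mirroring the situation described for FIG. 5.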

Next, an example of important scene detection is described. The important scene detection means 412 detects the important scene metadata (A, B, C, D, E, F) shown in FIG. 8 as follows.
For metadata (A), an algorithm can be created that extracts a fixed (stationary) portion of 1 to 3 seconds at the beginning of a clip (CLIP-IN) or at the end of a clip (CLIP-OUT) as an important portion. It is assigned, for example, 100 points.

For metadata (B), an algorithm can be created that extracts the sound detection metadata 107 (the portions of metadata in which input such as audio from the main or sub microphone, a clapperboard sound, a starting pistol shot, or cheering was detected) as important portions. It is assigned, for example, 70 points.

For metadata (C), a fixed (stationary) scene following a pan or tilt of the camera can be detected from the gyro output. A fixed scene (a portion in which the camera operator holds the shot steady out of interest in the subject, the Interest portion) is assigned, for example, 40 points.

For metadata (D), the portions where the camera zoomed in or out can be detected by monitoring how the zoom value changes over time. More specifically, an algorithm can be created that extracts a fixed portion of 1 to 3 seconds before or after a zoom-in or zoom-out as an important portion. It is assigned, for example, 30 points.

For metadata (E), camera pans and tilts can be detected from the gyro output. It is assigned, for example, 25 points.

For metadata (F), the image obtained from the image sensor is divided into multiple blocks, and information can be detected on whether the hue and chromaticity of each block fall within predetermined ranges. For example, a human face can be detected from the size and shape of the detected blocks and from their skin-tone chromaticity. A face can be detected with higher accuracy from the shape and skin-tone chromaticity of the detected blocks in the fixed image that follows a pan, tilt, or zoom. It is assigned, for example, 50 points.

One point to note is that, even among people with ordinary sensibilities and emotions, the level at which a scene is recognized as important can differ somewhat between a novice and someone accustomed to camera work. Experienced shooters shoot with camera work that follows established video shooting techniques in mind, whereas novices, lacking that knowledge, often shoot the subject without thinking about camera work. Even so, novices usually come to master generally accepted camera work over time as they keep shooting, through advice from others and their own observations.

As an example of extracting important scenes to generate a digest, FIG. 6 shows a procedure in which two fixed sections (scenes), one following a zoom-in and one following a zoom-out, are extracted from the captured content to form a single digest. Important scenes arising from factors other than zoom-in and zoom-out can be extracted, and a digest generated, in the same way.

A, B, C, D, E, and F each carry a priority (each has its own points) and are weighted accordingly. In FIG. 8, an important scene is represented either by a single important scene expressed by one of the metadata A, B, C, D, E, and F, or, in terms of video frames, by the highest-priority metadata among the multiple important scenes present within a window of N frames, where N is an integer.

For example, when a window 300 frames wide is used, the "priority point calculation formula" for the multiple important scenes present in the window can be defined as (priority of the highest-priority scene within the 300-frame window) + (priorities of the other priority scenes, each weighted by a fixed coefficient), and the "priority of multiple important scenes" can be calculated from it. The weight applied to the priorities of the scenes other than the highest-priority scene is, for example, 0.2.

In the example shown in FIG. 8, the important scene detection window contains D (30 points), C (40 points), E (25 points), and E (25 points). The total priority is therefore
0.2 × 30 + 40 + 0.2 × 25 + 0.2 × 25 = 6 + 40 + 5 + 5 = 56.
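The priority point calculation for a window can be checked with a few lines of code. The function name is an assumption; the rule (highest priority counted in full, every other scene weighted by 0.2) and the point values are those given in the text.

```python
def window_priority(points, other_weight=0.2):
    """Highest priority in the window plus the weighted sum of all the others."""
    top = max(points)
    others = sum(points) - top  # every scene except one instance of the maximum
    return top + other_weight * others
```

Calling `window_priority([30, 40, 25, 25])` for the scenes D, C, E, and E gives 40 + 0.2 × 80 = 56, in agreement with the total worked out for FIG. 8.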

As scene information for an important scene, the type of the highest-priority metadata (A, B, C, D, E, F) making up the scene, together with the time and duration (length) of the scene, serves as the detailed description of the scene information and is used for playlist marker information (usable for marker skipping), auxiliary data associated with the marker information, and so on. The number of important scenes can be fixed in advance according to the length of the content. The ratio by which the content is reduced to its important scenes can likewise be fixed in advance.

For example, in FIG. 8, the metadata with the highest priority within the window W1 of a given duration is set as the second metadata used for digest generation. In this case, the Interest metadata C, which represents the fixed state and is located at about frame 800, is selected. At this point, according to a predetermined rule, the event type, the event priority, and the event start time, end time, and representative time are added to this metadata as attributes. By referring to the second metadata and its attributes (the event start time, end time, and representative time), a 5-second shot running, for example, from zoom-in (3 seconds) to fix (2 seconds) can be expressed by a single piece of metadata.
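One way to picture the second metadata is as a small record holding the attributes named above, chosen as the highest-priority entry inside a window. The class, field, and function names below are hypothetical, and frame numbers stand in for times.

```python
from dataclasses import dataclass


@dataclass
class SceneMeta:
    kind: str            # event type, e.g. "C" for a fixed (Interest) scene
    priority: int        # event priority in points
    start: int           # event start, in frames
    end: int             # event end, in frames
    representative: int  # representative position, in frames


def pick_representative(events, window_start, window_end):
    """Return the highest-priority event whose representative frame is in the window."""
    in_window = [
        e for e in events if window_start <= e.representative < window_end
    ]
    return max(in_window, key=lambda e: e.priority) if in_window else None
```

With events D (30), C (40), E (25), and E (25) placed in one window, the function returns the C record, and its `start` and `end` delimit the shot that this single piece of metadata stands for.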

A digest video can therefore be generated by working through this metadata in priority order or by specifying the event type of the metadata.
Likewise, by specifying the priority order or the event type of the metadata (for example, zoomed-in portions) and referring to the metadata positions, playback can skip to the meaningful portions of the captured content (for example, the zoomed-in portions); this is skip playback by metadata.

In FIG. 8, a mode can also be added in which the second metadata is selected only when the total value of the prioritized metadata within the window of a given duration exceeds a preset value (for example, 250).

High-priority scenes can be selected more reliably by first excluding the low-priority scenes from the content and then selecting high-priority scenes from what remains. For example, a fixed portion following a zoom-in may qualify as a high-priority scene and still be out of focus; such low-priority scenes can be excluded first.

Similarly, a fixed portion following a zoom-in may qualify as a high-priority scene and yet be backlit so that the picture is completely dark; such low-priority scenes can be excluded first. Also, even when the starting pistol shot of a footrace at a school sports day has been detected and the scene qualifies as high priority, scenes that are unwatchable because the zoom-in or the pan was too fast can be excluded first as low-priority scenes.

As described above, in the first stage a playlist is generated that consists of stably shot scenes with the low-priority scenes excluded. In the second stage, high-priority scenes are selected from the stable scenes. For example, high-priority scenes are added to the playlist created in the first stage, so that important scenes are selected from content that contains no unnecessary portions. Going through these stages makes it possible to select important scenes that are visually more stable and free of shake, defocus, and similar defects.
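The two stages can be sketched as below. This is a simplification under stated assumptions: scenes are plain tuples, an important scene is dropped entirely if it overlaps any unnecessary section, and the second stage merely orders the survivors by points. The source does not specify these details.

```python
def overlaps(a, b):
    """True if half-open frame ranges a and b share at least one frame."""
    return a[0] < b[1] and b[0] < a[1]


def select_important(important, unnecessary):
    """important: list of (start, end, points); unnecessary: list of (start, end)."""
    # Stage 1: keep only the scenes that do not touch an unnecessary section.
    stable = [
        s for s in important
        if not any(overlaps(s[:2], u) for u in unnecessary)
    ]
    # Stage 2: rank the stable scenes by priority, highest first.
    return sorted(stable, key=lambda s: s[2], reverse=True)
```

A zoom-in fix scene that coincides with an out-of-focus section is removed in stage 1 and never reaches the ranking, which is the behavior the text describes.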

Because the unnecessary scene metadata (UA, UB, UC, UD, UE, UF, UG) carries the negative meaning of being unnecessary, it may be processed with a minus sign attached to its value. Even in that case, the unnecessary scene calculation algorithm described above (which bundles multiple unnecessary scenes into one) and the important scene calculation algorithm (which chooses a representative scene from multiple important scenes and calculates its priority) are different processes and cannot be implemented by the same algorithm.

A scene whose value lies between that of a high-priority scene (important scene) and that of a low-priority scene (unnecessary scene) can be treated as an ordinary scene.

The real-time data selection/mapping means 413 maps the real-time metadata described above to the SEI. This allows another playback device, such as a personal computer, to recalculate the scene information of unnecessary and important scenes from that data later on. This is useful when the algorithm for recalculating the scene information of unnecessary and important scenes has been updated.

Based on the VCL-NA unit format data output from the VCL-NA unit buffer 402 and the Non-VCL-NA unit format data output from the Non-VCL-NA unit buffer 407, an MPEG-PES packet (see FIG. 10(C)) is generated, and 188-byte MPEG-TS packets (see FIG. 10(D)) are generated from it. Next, a 4-byte header containing a time stamp is added to each MPEG-TS packet to generate a 192-byte ATS packet (see FIG. 10(E)).

This time stamp indicates, for example, the time at which each MPEG-TS packet arrived at the processing block that generates ATS packets. In general, the time stamp clock is 27 MHz. In some cases all 4 bytes are used as the time stamp; in other cases 30 of the 32 bits are used as the time stamp and the remaining 2 bits are used for content-protection flags or the like.
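A minimal sketch of the 30-bit variant is shown below. The specification does not state where the 2 flag bits sit or the byte order, so placing the flags in the top 2 bits of a big-endian word is an assumption, and the function names are hypothetical.

```python
import struct

TS_PACKET_SIZE = 188
ATS_PACKET_SIZE = 192

def make_ats_packet(ts_packet: bytes, arrival_ticks: int, flags: int = 0) -> bytes:
    """Prepend a 4-byte header (2 flag bits + 30-bit time stamp) to a TS packet."""
    if len(ts_packet) != TS_PACKET_SIZE:
        raise ValueError("expected a 188-byte MPEG-TS packet")
    if not 0 <= flags <= 0b11:
        raise ValueError("flags must fit in 2 bits")
    # Keep only the low 30 bits of the 27 MHz counter; the counter wraps around.
    stamp = arrival_ticks & 0x3FFFFFFF
    # Assumed layout: flags in the top 2 bits, big-endian byte order.
    header = struct.pack(">I", (flags << 30) | stamp)
    return header + ts_packet

def parse_ats_header(ats_packet: bytes):
    """Return (flags, time stamp) read from the 4-byte header."""
    (word,) = struct.unpack(">I", ats_packet[:4])
    return word >> 30, word & 0x3FFFFFFF

ts = bytes([0x47]) + bytes(187)   # dummy TS packet beginning with sync byte 0x47
ats = make_ats_packet(ts, arrival_ticks=27_000_000, flags=0b01)
```

With a 27 MHz clock, 27,000,000 ticks corresponds to one second after the counter's zero point.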

In addition, the PTS (Presentation Time Stamp) of the first picture of each GOP (Group of Pictures) contained in the stream and the serial number of the first ATS packet of that picture are output as a pair, forming the EP-MAP. PTS and DTS (Decode Time Stamp) are contained in the PES packet header, so they are easy to extract.

The serial number of the first ATS packet in the first picture of each GOP is obtained by counting ATS packets from the head of the stream, with the ATS packet at the head of the stream numbered 1. The EP-MAP, defined as pairs of the PTS of each GOP's first picture and the corresponding ATS serial number, is used for playlist-based playback and for stream editing.
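The lookup that the EP-MAP enables can be sketched as follows. The list-of-tuples representation, the function names, and the sample values are assumptions for illustration. The sketch relies on the map being sorted by PTS and on each ATS packet occupying 192 bytes in the stream file.

```python
from bisect import bisect_right

ATS_PACKET_SIZE = 192

def lookup_entry(ep_map, target_pts):
    """Return the (pts, ats_serial) entry of the last GOP starting at or before target_pts."""
    pts_values = [pts for pts, _ in ep_map]
    index = bisect_right(pts_values, target_pts) - 1
    if index < 0:
        raise ValueError("target PTS precedes the first GOP")
    return ep_map[index]

def byte_offset(ats_serial):
    """Serial numbers start at 1, so packet 1 sits at byte offset 0."""
    return (ats_serial - 1) * ATS_PACKET_SIZE

# Hypothetical EP map: (PTS in 90 kHz ticks, serial number of the GOP's first ATS packet)
ep_map = [(0, 1), (45000, 120), (90000, 251)]
pts, serial = lookup_entry(ep_map, 60000)
```

A player seeking to PTS 60000 would start decoding at the GOP that begins at PTS 45000, reading the stream file from `byte_offset(serial)`.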

The H.264/AVC scheme is described in detail in books such as "H.264/AVC Textbook", supervised by Sakae Okubo, published by Impress Corporation. The MPEG-TS signal is specified in IEC 61883-4.

References on TS, such as the PAT and PMT of MPEG-TS, include, for example, CQ Publishing, TECH I Vol. 4, "All About Image & Audio Compression Technology (Essential Technology for the Era of the Internet, Digital Television, and Mobile Communications)", supervised by Hiroshi Fujiwara, Chapter 6, "The MPEG System for Multiplexing Images and Audio".

The hierarchical structure of PSI and SI, examples of processing procedures, and examples of channel-selection processing are also explained in "Channel Selection Technology in Digital Broadcast Receivers", Miyake et al., Sanyo Electric Technical Report, Vol. 36, June 2004, No. 74, pp. 31-44.

Next, an example of the H.264/AVC file structure is described with reference to FIG. 9. The left box shows the directory structure and the right box shows the file structure. Both are built on an information recording medium such as an SD card, DVD-R, DVD-RAM, or BD-RE.

In FIG. 9, the left box shows the directory structure. Under root there are a reference file (ref.file) and the "PlayList", "CLIP", and "STREAM" directories. The "PlayList" directory holds "*.pls" files, which are playlists (files). The "CLIP" (clip) directory holds "*.clp" files, which are clip files. The "STREAM" directory holds "*.ats" files, which are stream files composed of ATS packets (192 bytes each).

In FIG. 9, the right box shows the file structure. The index file manages the title information of the content and manages multiple pieces of chapter information. A playlist manages multiple pieces of playback part information (Part#(n), where n is a natural number). A clip file has an EP map. The EP map is a cross-reference table between PTS values and the ATS serial numbers of the ATS packets that make up the stream file. It converts between time code and data position, and it is indispensable for playlist playback and stream file editing.

As described above, a title is associated with a playlist file, a playlist file with a clip file, and a clip file with a stream file composed of ATS packets.

The real-time metadata and non-real-time metadata described above are mapped to SEI and then converted into an ATS stream file. In addition, metadata computed from the real-time or non-real-time metadata is mapped, as scene information of the clip, into the part information or auxiliary area of the playlist as additional information. In other words, the prioritized list is mapped as auxiliary data of the playlist file that is referenced when the shot content is played back. A major feature of this arrangement is that the scene information metadata of a clip can be consulted simply by looking at the data in the playlist file.

A content playback device can therefore refer to the additional information of the playlist and immediately access (skip to) the start point or representative point of an event in the shot content, such as an unnecessary scene or an important scene. Furthermore, by referring to the metadata list that is the auxiliary data of the playlist file referenced during playback, designated event sections (sections generated with reference to the start point and end point of each event) can be played back in order.
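The skip operation can be sketched as a lookup in the scene list carried by the playlist. The dictionary layout (`type`, `start`, `end`, in seconds) and the function name are assumptions for illustration, not the format defined in the specification.

```python
def next_event_start(scene_list, current_time, wanted_type=None):
    """Return the start time of the first event after current_time, or None."""
    candidates = [
        s["start"] for s in scene_list
        if s["start"] > current_time
        and (wanted_type is None or s["type"] == wanted_type)
    ]
    return min(candidates) if candidates else None

# Hypothetical auxiliary data of a playlist file
scene_list = [
    {"type": "unnecessary", "start": 4.0, "end": 9.0},
    {"type": "important", "start": 15.0, "end": 22.0},
    {"type": "important", "start": 40.0, "end": 48.0},
]
```

For example, `next_event_start(scene_list, 10.0)` yields the start of the important scene at 15.0 seconds, and a player would then convert that time to a data position through the EP map.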

When a digest composed of important scenes is to be generated, a new playlist file can also be generated by referring to the scene list that is auxiliary data of the playlist, in combination with digest method designation means for inputting a scene priority, a scene type, a digest time length, or a reduction ratio for the digest.
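As one possible reading of the "digest time length" option, the sketch below fills a time budget greedily in descending priority order. The greedy strategy, the field names, and the sample values are assumptions; the specification does not prescribe a particular selection procedure.

```python
def build_digest(scene_list, max_seconds):
    """Greedily add scenes in descending priority until the time budget is used."""
    chosen, used = [], 0.0
    for scene in sorted(scene_list, key=lambda s: s["priority"], reverse=True):
        length = scene["end"] - scene["start"]
        if used + length <= max_seconds:
            chosen.append(scene)
            used += length
    # Chronological order, so the result can be written out as a new playlist.
    return sorted(chosen, key=lambda s: s["start"])

# Hypothetical scene list
scene_list = [
    {"start": 0.0, "end": 10.0, "priority": 50},
    {"start": 20.0, "end": 26.0, "priority": 90},
    {"start": 30.0, "end": 38.0, "priority": 70},
]
digest = build_digest(scene_list, max_seconds=15.0)
```

A reduction-ratio designation could be handled the same way by setting `max_seconds` to the content length multiplied by the ratio.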

Furthermore, by referring to the playlist, sections designated as important scenes can be played back normally while all other scenes are played back at high speed. Alternatively, sections designated as unnecessary scenes can be played back at high speed while all other scenes are played back at normal speed. As a further option, in sections designated as unnecessary scenes, detected representative scenes or still images shot and registered in advance can be displayed for 3 seconds each, while all other scenes are played back at normal speed.
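The first of these playback styles can be sketched as a speed plan derived from the important-scene sections. The function name, the 4x fast speed, and the assumption that the sections do not overlap are all illustrative choices.

```python
def playback_plan(duration, important_sections, fast_speed=4.0):
    """Split [0, duration) into (start, end, speed) segments.

    Assumes the important sections do not overlap.
    """
    plan, cursor = [], 0.0
    for start, end in sorted(important_sections):
        if start > cursor:
            plan.append((cursor, start, fast_speed))   # outside important scenes
        plan.append((start, end, 1.0))                  # important scene: normal speed
        cursor = end
    if cursor < duration:
        plan.append((cursor, duration, fast_speed))
    return plan

plan = playback_plan(60.0, [(10.0, 20.0), (35.0, 40.0)])
```

Swapping the roles of the two speeds gives the second style, in which unnecessary sections are played fast and everything else at normal speed.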

In particular, for shot content such as a children's concert, a viewer may not want to see unnecessary video that is shaky or out of focus, yet may still want to hear the piano or chorus performance without interruption. To meet this need, the displayed video can be switched to a representative scene or to video shot and registered in advance (blue sky, buildings, flowers, a child's face, and so on) while only the audio is played back continuously.

Means may also be provided for generating video from which the horizontal and vertical shake of the shot video caused by camera work has been removed. By referring to the playlist, sections designated as unnecessary scenes are then played back as video whose shake has been removed by image processing, while all other scenes are played back at normal speed.

By referring to the playlist, a new playlist composed of scenes whose priority is at or above a predetermined value, or of scenes having a specific camera work, may be generated and registered in the title.

By referring to the playlist, BGM suited to the type of each scene can be generated and played, and the melody, timbre, and tempo of the BGM can be changed near scene transitions, so that content is played back with greater artistic and cultural appeal.

The structure of an H.264/AVC stream is described with reference to FIG. 10. FIG. 10(A) shows the GOP structure of the H.264/AVC stream. FIG. 10(B) shows that each picture is composed of VCL and Non-VCL NAL units. NAL (video) is a video NAL unit, NAL (Audio) is an audio NAL unit, and NAL (SEI) is an SEI NAL unit. The real-time metadata described above can be inserted into NAL (SEI).

Experimental results have shown that the pan, tilt, lens zoom information, lens focus information, and similar data of the imaging device need not be inserted into every picture of the GOP structure. Even when this data is thinned out to every second frame, camera work such as pan, tilt, zoom, and focus at speeds suitable for normal viewing can be restored, provided the time code can be restored.
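The idea of thinning and later restoring camera-work metadata can be illustrated as below. The specification does not say how the missing frames are reconstructed, so linear interpolation is an assumption chosen purely for illustration, as are the function names and sample zoom values.

```python
def thin(samples, step=2):
    """Keep every `step`-th (frame_index, value) sample."""
    return samples[::step]

def restore(thinned, frame_count):
    """Rebuild one value per frame by linear interpolation between kept samples.

    Assumes frame indices start at 0 and that frame 0 is among the kept samples.
    """
    restored = []
    for frame in range(frame_count):
        # Nearest kept samples on either side of this frame.
        before = max((s for s in thinned if s[0] <= frame), key=lambda s: s[0])
        later = [s for s in thinned if s[0] >= frame]
        after = min(later, key=lambda s: s[0]) if later else before
        if after[0] == before[0]:
            restored.append(before[1])
        else:
            ratio = (frame - before[0]) / (after[0] - before[0])
            restored.append(before[1] + ratio * (after[1] - before[1]))
    return restored

# Hypothetical zoom values recorded for frames 0..4
zoom = [(0, 1.0), (1, 1.5), (2, 2.0), (3, 2.5), (4, 3.0)]
kept = thin(zoom)             # frames 0, 2, 4 remain
rebuilt = restore(kept, 5)
```

The frame index plays the role of the time code here: without it, the kept samples could not be placed back at the right positions.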

FIG. 10(C) shows the structure of a PES packet, in which a PES packet header is added to the picture data shown in FIG. 10(B). The PES packet header can include the MPEG PTS/DTS as a header option. From the H.264 point of view, a PES packet is treated as one AU (Access Unit). In this example, as shown in FIG. 10(D), the PES packet is divided into 188-byte units to generate MPEG-TS packets. FIG. 10(E) shows that an ATS packet is formed by adding a 4-byte header containing a time code to each MPEG-TS packet.

Next, an example of editing shot content is described with reference to FIG. 11. There are four file handling modes:
Mode A: Original shot content
Mode B: Content of unnecessary scenes (scenes scoring below 30 points)
Mode C: Content without unnecessary scenes (scenes scoring 30 points or more)
Mode D: Content of important scenes (scenes scoring 50 points or more)
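The four modes listed above amount to threshold tests on each scene's score. In the sketch below, the 30-point and 50-point thresholds come from the text, while the `score` field, the function name, and the sample scenes are hypothetical.

```python
def scenes_for_mode(scenes, mode):
    """Return the scenes that belong to file handling mode 'A', 'B', 'C' or 'D'."""
    if mode == "A":
        return list(scenes)                              # original content
    if mode == "B":
        return [s for s in scenes if s["score"] < 30]    # unnecessary scenes
    if mode == "C":
        return [s for s in scenes if s["score"] >= 30]   # no unnecessary scenes
    if mode == "D":
        return [s for s in scenes if s["score"] >= 50]   # important scenes
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical scored scenes
scenes = [{"id": 1, "score": 10}, {"id": 2, "score": 35}, {"id": 3, "score": 70}]
```

Note that mode D is a subset of mode C, and modes B and C together cover every scene of mode A.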

Editing consists of the following three steps:
Step 1) Extract only the unnecessary-scene content from the original shot content file and check it. If it is acceptable, delete the unnecessary scenes.
Step 2) Generate content without unnecessary scenes.
Step 3) From the content without unnecessary scenes, generate content made up of the scenes that the metadata indicates to be important.

Steps 1, 2, and 3 can all be realized solely by manipulating the playlist file, without modifying the ats file in any way.
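Playlist-only editing can be sketched as an operation on lists of time ranges that leaves the stream data untouched. Representing a playlist as `(start, end)` pairs in seconds is an assumption for illustration, as is the function name.

```python
def remove_sections(playlist, unwanted):
    """Return a new playlist with the unwanted (start, end) sections cut out."""
    result = []
    for part_start, part_end in playlist:
        cursor = part_start
        for cut_start, cut_end in sorted(unwanted):
            if cut_end <= cursor or cut_start >= part_end:
                continue                      # no overlap with what remains of this part
            if cut_start > cursor:
                result.append((cursor, cut_start))
            cursor = max(cursor, cut_end)
        if cursor < part_end:
            result.append((cursor, part_end))
    return result

original = [(0.0, 60.0)]                      # one part covering the whole clip
mode_c = remove_sections(original, [(5.0, 12.0), (30.0, 33.0)])
```

The result is simply a new list of parts that references the same stream, so the original content can be recovered at any time by reverting to the original playlist.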

When unnecessary-scene content is played back in mode B, a telop (on-screen caption) stating the condition under which each scene was judged unnecessary can be inserted into the video to inform the user.

Furthermore, the mode C content without unnecessary scenes and the mode D content of important scenes can each be created as a single complete-package file.

FIG. 12 illustrates the case where the operation of FIG. 11 is carried out by a movie camera that generates the scene information, and by a personal computer (PC), DVD recorder, and television that use the scene information. For example, the movie camera uses mode A of FIG. 11, the personal computer uses modes B, C, and D, and the DVD recorder and television use modes C and D.

Because the present invention makes it easy to generate digests and content without unnecessary parts, it is applicable to content photographing apparatuses typified by home video cameras.

FIG. 1 is a model diagram of a content photographing apparatus according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram of the internal configuration of the camera shown in FIG. 1.
FIG. 3 is a diagram showing an example of metadata classification.
FIG. 4 is an explanatory diagram of the block that generates scene information from metadata.
FIG. 5 is an explanatory diagram of a procedure that deletes two sections with large shake from the shot content and combines the three shake-free sections into one shake-free video sequence.
FIG. 6 is an explanatory diagram of a procedure that extracts the two fixed sections following a zoom-up and a zoom-down from the shot content and composes one digest.
FIG. 7 is an explanatory diagram of the generation of unnecessary scene information.
FIG. 8 is an explanatory diagram of the generation of important scene information.
FIG. 9 is an explanatory diagram of the directory structure and file structure.
FIG. 10 is an explanatory diagram of the H.264 picture structure and the method of conversion to MPEG-TS.
FIG. 11 is an explanatory diagram of the content editing concept.
FIG. 12 is a diagram showing an example of content editing across devices.

Explanation of symbols

101 Camera
102 Camera lens unit
103 Camera microphone
104 Shooting target of the camera
105 Data shot by the camera
106 AV stream file data
107 Metadata
108 Data sequence shot by the camera
109 Remote control
110 Data sequence in which scenes #1 to #5 are joined by editing
111 Television (TV)
112 Signal connection cable
113 Signal connection cable
114 Metadata input button (important scene registration button, still image shooting button)
115, 117 Microphone
201 Zoom control unit
202 Focus control unit
203 Exposure control unit
204 Image sensor
205 Shutter speed control unit
206 Camera microcomputer
207 Absolute tilt sensor
208 Angular velocity sensor
209 Acceleration sensor
210 User input system
211 Camera signal processing unit
212 Audio processing system
213 H.264 encoder
214 Recording medium
215 Output interface
401 Video encoding unit
402 VCL-NA unit buffer
403 Audio encoding unit
404 PS buffer
405 VUI buffer
406 SEI buffer
407 Non-VCL-NA unit buffer
408 Face/person detection means
409 Scene information metadata generation means
410 Howling detection means
411 Unnecessary scene detection means
412 Important scene detection means
413 Real-time data selection/mapping means

Claims

1. A content photographing apparatus that records content including any of video, audio, or data on an information storage medium in combination with scene information, and accesses a specific part of the content with reference to the scene information, the apparatus comprising:
scene information generating means for generating scene information by detecting a characteristic scene with reference to a parameter contained in either the video and audio being shot and recorded or the operation information of the photographing apparatus;
auxiliary information adding means for adding, according to a predetermined rule, any one of a type, a priority, a start time, an end time, and a representative time to the scene information as auxiliary information;
scene list description means for describing the scene information and its auxiliary information in a scene list; and
scene selection means for selecting, after the recording operation to the information storage medium has ended, scene information within a predetermined number based on the priority from among the plural pieces of scene information described in the scene list.

2. The content photographing apparatus according to claim 1, wherein the scene selection means includes at least one of high-priority scene selection means that selects scene information in descending order of priority and low-priority scene selection means that selects scene information in ascending order of priority.

3. The content photographing apparatus according to claim 2, wherein the high-priority scene selection means selects high-priority scenes from the shot content excluding the scenes selected by the low-priority scene selection means.

4. The content photographing apparatus according to claim 1, further comprising:
setting means for setting the method of selecting the scene information;
second priority means for performing, by a calculation method determined by the setting means, a calculation on a combination of plural pieces of scene information described in the scene list description memory, and adding a second priority to the scene information;
auxiliary information adding means for adding at least one of a start time, an end time, and a representative time of the characteristic scene having the second priority as auxiliary information of the scene information; and
second priority list generation means for generating a second priority list by selecting scene information within a predetermined number, in descending order of the second priority, from among the plural pieces of scene information held as a list in the scene list description memory.

5. The content photographing apparatus according to claim 4, further comprising second priority list description means for describing the second priority list in a playback-time reference file of the shot content.

6. The content photographing apparatus according to claim 4, further comprising skip means for skipping, during playback, to the start point, representative point, or end point of a characteristic scene by referring to the second priority list.

7. The content photographing apparatus according to claim 5, further comprising scene playback means for playing back characteristic scene sections in a designated order by referring to the second priority list.

8. The content photographing apparatus according to claim 5, further comprising telop display means for superimposing a description of each characteristic scene on the playback video as a telop when the characteristic scene sections are played back in a designated order.

9. The content photographing apparatus according to claim 5, further comprising:
digest method designation means for inputting at least one of a designation of the priority of characteristic scenes to be digested, a designation of the type of characteristic scenes to be digested, a designation of a digest time length, and a designation of a reduction ratio of the content to the digest;
means for generating a playback-time reference file according to the digest generation method designated by the digest method designation means, with reference to the second priority list that is auxiliary data of the playback-time reference file; and
registration means for registering the playback-time reference file in a playback object list.

10. The content photographing apparatus according to claim 5, further comprising telop display means for superimposing, when playback is performed with reference to the playback-time reference file, a description of each characteristic scene on the playback video as a telop while that characteristic scene section is played back.

11. The content photographing apparatus according to claim 5, further comprising:
digest method designation means for inputting at least one of a designation of the priority of characteristic scenes to be digested, a designation of the type of characteristic scenes to be digested, a designation of a digest time length, and a designation of a reduction ratio of the content to the digest; and
file generation means for generating a file that collects the designated characteristic scenes according to the digest generation method designated by the digest method designation means, with reference to the second priority list that is auxiliary data of the playback-time reference file.

12. The content photographing apparatus according to claim 5, further comprising:
digest method designation means for inputting at least one of a designation of the priority of characteristic scenes to be digested, a designation of the type of characteristic scenes to be digested, a designation of a digest time length, and a designation of a reduction ratio of the content to the digest; and
playback means for joining and playing back characteristic scene sections that are not designated, according to the digest generation method designated by the digest method designation means, with reference to the second priority list that is auxiliary data of the playback-time reference file.

13. The content photographing apparatus according to claim 5, further comprising:
digest method designation means for inputting at least one of a designation of the priority of characteristic scenes to be digested, a designation of the type of characteristic scenes to be digested, a designation of a digest time length, and a designation of a reduction ratio of the content to the digest; and
file generation means for generating a file that collects the shooting sections not designated at playback, according to the digest generation method designated by the digest method designation means, with reference to the second priority list that is auxiliary data of the playback-time reference file.

14. The content photographing apparatus according to claim 5, further comprising playback means for performing, with reference to the second priority list that is auxiliary data of the playback-time reference file, normal playback in sections designated at playback, and "playback at a speed changed from normal" or "playback of video obtained by processing the playback video" in shooting sections not designated at playback.

15. The content photographing apparatus according to claim 14, further comprising playback display means for performing slow playback, high-speed playback, skip playback, or still-image display of the shot video as the "playback at a speed changed from normal".

16. The content photographing apparatus according to claim 14, further comprising video generation means for generating video from which the horizontal and vertical shake of the shot video caused by camera work has been removed, as the "playback of video obtained by processing the playback video".

17. The content photographing apparatus according to claim 9, further comprising:
file generation means for generating, from the playback-time reference files registered in the playback object list, a recommended playback-time reference file composed of scenes having a predetermined priority or scenes having a specific camera work; and
registration means for registering the recommended playback-time reference file in a recommended playback object list.

18. The content photographing apparatus according to claim 9, further comprising:
means for generating BGM when a playback-time reference file registered in the playback object list is played back; and
changing means for changing at least one of the melody, timbre, and tempo of the BGM near a change of characteristic scene in the playback-time reference file.