JP2008236729A

JP2008236729A - Method and apparatus for generating digest

Info

Publication number: JP2008236729A
Application number: JP2008012411A
Authority: JP
Inventors: Shin Nakade; 慎中手
Original assignee: Victor Company of Japan Ltd
Current assignee: Victor Company of Japan Ltd
Priority date: 2007-02-19
Filing date: 2008-01-23
Publication date: 2008-10-02
Anticipated expiration: 2028-01-23
Also published as: JP4893641B2

Abstract

PROBLEM TO BE SOLVED: To optimally classify a plurality of moving-image files obtained by photographing into scenes, and to generate a digest extracting an optimal portion for each genre.
SOLUTION: A scene classifying section 42 classifies a plurality of moving-image files, classified as the same event by an event classifying section 41 on the basis of tags 22, into scenes in accordance with a threshold predetermined for the genre of the event. A digest generating section 43 extracts a part of each moving-image file classified to the same event on the basis of the tags 22, in accordance with a digest generation technique corresponding to the genre of the event, and generates an image linking the extracted portions as a digest.
COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a digest generation apparatus and a digest generation method for generating a digest, that is, a summary of moving image files shot with a moving image capturing device such as a video camera, a digital camera, or a mobile phone having a moving image capturing function.

There is a known method in which captured images are divided into a plurality of groups on the basis of the intervals between their shooting dates and times, and a representative image is determined within each group (see, for example, Patent Document 1).

JP 2004-295231 A

However, because the above method divides the captured images into groups on the basis of the shooting date and time interval regardless of the genre of the captured images, it cannot extract an optimum representative image for each genre.

In particular, for moving image files shot with a moving image capturing device such as a video camera, there is the further problem that the representative or characteristic portion contained in a moving image file differs depending on the genre.

Accordingly, an object of the present invention is to provide a digest generation apparatus and a digest generation method that can optimally classify a plurality of moving image files obtained by shooting into scenes and can generate a digest in which an optimal portion is extracted for each genre.

To achieve the above object, the digest generation apparatus or digest generation method of the present invention, when generating a digest from a plurality of moving image files obtained by shooting, classifies the plurality of moving image files by event on the basis of event identification information in tags attached to the moving image files, each tag describing genre information indicating the genre of the event of the moving image file and the event identification information for distinguishing events from one another; classifies the moving image files thus classified by event into scenes on the basis of the shooting intervals between the moving image files; and extracts portions from the moving image files classified by scene, using a digest generation method based on the genre information described in the tags, to generate a digest.

Here, when the moving image files classified by event are classified into scenes on the basis of the shooting intervals between them, a threshold may be set in advance for each genre; the threshold corresponding to the genre of the moving image files is then set on the basis of the genre information described in the tags of the files classified by event, and the files are classified into scenes on the basis of the set threshold and the shooting intervals between the files. Alternatively, a maximum or minimum value of the threshold may be set in advance for each genre, and the threshold itself may be determined on the basis of attribute information of the moving image files constituting each event. In that case, the maximum or minimum value corresponding to the genre, identified from the genre information described in the tags, is compared with the threshold determined from the attribute information. If the threshold determined from the attribute information is equal to or greater than the maximum value, or equal to or less than the minimum value, that maximum or minimum value is selected; if it is smaller than the maximum value or larger than the minimum value, the threshold determined from the attribute information is selected. The files are then classified into scenes on the basis of the selected threshold or maximum value and the shooting intervals between the files. Further, the threshold may be determined on the basis of the attribute information of the moving image files constituting each event and, only when the genre of the event is a specific genre, adjusted by a predetermined coefficient, without applying the per-genre maximum or minimum limit. The attribute information may be the length of the event, the average shooting interval of the moving image files constituting the event, position information indicating the shooting positions of those files, or a combination of these.
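By way of a non-limiting sketch, the limiting of an attribute-derived threshold by a per-genre maximum or minimum could look as follows in Python. The function name `clamp_threshold` and its parameters are hypothetical and do not appear in the specification; how the threshold is derived from the attribute information is left outside the sketch.

```python
from datetime import timedelta
from typing import Optional


def clamp_threshold(derived: timedelta,
                    minimum: Optional[timedelta] = None,
                    maximum: Optional[timedelta] = None) -> timedelta:
    """Limit an attribute-derived threshold by the genre's bounds (hypothetical helper).

    `derived` is assumed to have been computed elsewhere from attribute
    information such as the event length or the average shooting interval.
    """
    if maximum is not None and derived >= maximum:
        return maximum   # at or above the genre's maximum: use the maximum
    if minimum is not None and derived <= minimum:
        return minimum   # at or below the genre's minimum: use the minimum
    return derived       # within bounds: use the attribute-derived value
```

For instance, with an assumed maximum of 30 minutes, a derived threshold of 45 minutes would be replaced by 30 minutes, while a derived threshold of 20 minutes would be used unchanged.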

According to the digest generation apparatus and method of the present invention, a plurality of moving image files are classified by event on the basis of the event identification information in the tags attached to the files, each tag describing genre information indicating the genre of the event of the file and the event identification information for distinguishing events from one another; the files classified by event are classified into scenes on the basis of the shooting intervals between the files; and portions are extracted from the files classified by scene, using a digest generation method based on the genre information in the tags, to generate a digest. As a result, the plurality of moving image files can be optimally classified into scenes, and a digest in which an optimal portion is extracted for each genre can be generated.

Embodiment 1.
Next, the best mode for carrying out the present invention will be described in detail with reference to the drawings.

FIG. 1 is a block diagram of an embodiment of the digest generation apparatus according to the present invention. The digest generation apparatus of this embodiment is applied to a video camera and, as shown in FIG. 1, includes a content holding unit 11, a photographing unit 31, a tag creation unit 32, an event classification unit 41, a scene classification unit 42, a digest generation unit 43, and an input instruction unit 44.

The content holding unit 11 holds a plurality of moving image files 21 shot with the video camera and the tags 22 attached to the respective moving image files. That is, the content holding unit 11 stores a plurality of moving image files 21, each shot by the photographing unit 31 and given a shooting date and time or a recording date and time. Each moving image file 21 is given a tag 22 created by the tag creation unit 32 on the basis of a user operation or the like. The tag 22 describes genre information indicating the genre of the event recorded in the file, that is, the type or content of the event, such as an athletic meet or a trip, and event identification information for distinguishing events from one another, for example non-overlapping numerical information that allows separate athletic meets within the same genre to be distinguished as athletic meet 1, athletic meet 2, and so on. The tags may also be attached to the moving image files outside the apparatus, in which case the tag creation unit 32 can be omitted from the apparatus. Similarly, the content holding unit 11, the photographing unit 31, and the input instruction unit 44 may of course be provided outside the apparatus.

Here, an "event" in the broad sense refers to what was shot, such as a trip or an athletic meet, but in the present invention and this embodiment it refers to the unit of moving image files that share a common tag 22 assigned to each file by the tag creation unit 32. A "genre" indicates the content of an event, such as "athletic meet" or "travel". Accordingly, one genre corresponds to each event, the genre of a moving image file is specified by the genre information given to the file, and events of the same genre are distinguished by event identification information such as non-overlapping numerical information. The event identification information may of course be numerical information or the like that does not overlap not only among events of the same genre but also among events of different types.

FIG. 2 shows examples of the event genres described above. The genres include, for example, travel/leisure, athletic meets, sports, children/pets, and weddings/parties, and the tag creation unit 32 assigns one to each event as event information.

In this embodiment, a scene classification threshold and a digest generation method are defined for each of these genres. Here, for example, scene classification threshold a and digest generation method A are set for the genre "travel/leisure", scene classification threshold b and digest generation method B for the genre "athletic meet/sports", scene classification threshold c and digest generation method C for the genre "children/pets", and scene classification threshold d and digest generation method D for the genre "wedding/party". The scene classification thresholds a to d and the digest generation methods A to D are described later.

The description now returns to FIG. 1. The event classification unit 41 uses the genre information, the event identification information, and the like described in the tags 22 to classify the moving image files into groups of files that captured the same event. For example, a plurality of moving image files shot at different athletic meets and trips are classified by event using the genre information and event identification information described in the tags 22, such as athletic meet 1, athletic meet 2, and travel 1. When an event can be identified solely by the event identification information, such as the "1" or "2" that follows the genre information, that is, when the event identification information is assigned so as not to overlap even among events of different genres, the event can be identified by the event identification information alone.
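As a non-limiting illustration, the grouping performed by the event classification unit 41 might be sketched in Python as follows. The `Tag` and `MovieFile` classes and the function `classify_by_event` are hypothetical names introduced only for this sketch; the specification does not define a data layout for the tags 22.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Tag:
    genre: str      # genre information, e.g. "athletic meet" (assumed encoding)
    event_id: int   # event identification information, e.g. 1 or 2


@dataclass
class MovieFile:
    name: str
    start: datetime  # shooting start time
    tag: Tag


def classify_by_event(files):
    """Group files by (genre, event_id) and sort each group by start time."""
    events = defaultdict(list)
    for f in files:
        events[(f.tag.genre, f.tag.event_id)].append(f)
    for group in events.values():
        group.sort(key=lambda f: f.start)
    return dict(events)
```

Keying on the pair (genre, event ID) covers the case where event IDs are unique only within a genre; if the IDs are unique across genres, the ID alone would suffice as the key.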

Next, the scene classification unit 42 classifies the moving image files that the event classification unit 41 classified as the same event into scenes, using, for example, the per-genre thresholds a to d shown in FIG. 2, so that the files of the same event are grouped by scene.

FIG. 3 is a flowchart showing the flow of processing in the scene classification unit 42 in Embodiment 1.

First, the scene classification unit 42 calculates, from the moving image files 21 that the event classification unit 41 classified by event on the basis of the tags 22, the shooting intervals between consecutively shot scenes (S101). This is because, when a shooting interval is equal to or greater than a certain threshold, the files before and after that interval are classified as separate scenes. In the subsequent steps, this threshold is determined for each genre. Hereinafter, a moving image file may be referred to simply as a file.

For example, when the genre is "travel/leisure", the user often moves between several places, so it is desirable to divide the files into scenes by place. In general, users often do not shoot while moving, and travel often takes a fairly long time. In the genre "travel/leisure", therefore, the scene is divided when the interval between the recording times of the files is equal to or greater than a fairly long threshold.

FIG. 4 shows an example of scene classification for 13 moving image files shot at the same event, for example a certain trip.

A shooting start time and a shooting end time are recorded for each of the 13 moving image files, and these files are used here to explain the threshold for a "travel/leisure" event. Assuming that moving between places on a typical trip takes 30 minutes or more, the scene classification threshold a for a "travel/leisure" event is set to, for example, 30 minutes. This value is of course only an example; it may be one hour, two hours, or even a unit of days.

The scene classification unit 42 calculates, for each of file 1 to file 13 in FIG. 4, the shooting interval from the preceding file and splits the scene wherever that interval is equal to or greater than the threshold a of 30 minutes, thereby classifying the files into six scenes as shown in the rightmost column of FIG. 4 (S102 "Yes", S103).

Next, regarding the threshold b for the genre "athletic meet/sports": in this genre it is desirable to classify each competition or match as a scene. The interval between competitions or matches is generally shorter than the travel time on a trip. Scene classification is therefore performed with the threshold b set to, for example, 10 minutes (S104 "Yes", S105).

Regarding the threshold c for the genre "children/pets": the groups of files classified into this genre can contain two kinds of footage, namely footage of the subject shot over a long period, and footage of an event such as a trip shot with the focus on the children or pets.

The threshold c is therefore changed according to the length of the event. When the difference between the shooting start time of the first file of the event and the shooting end time of the last file is three days or more, the scene classification unit 42 splits the scenes at each change of date (S106 "Yes", S107 "Yes", S109). That is, files with the same date belong to the same scene, and files with different dates belong to different scenes.

On the other hand, when the difference between the shooting start time of the first file of the event and the shooting end time of the last file is two days or less, the scene classification threshold c is set to, for example, 10 minutes (S106 "Yes", S107 "No", S108).

Regarding the threshold d for the genre "wedding/party": in this genre it is desirable to classify scenes by changes of venue and the like. Assuming that moving between venues generally takes 30 minutes or more, the scene classification unit 42 sets the scene classification threshold d to, for example, 30 minutes (S110 "Yes", S111).

When no genre is set, the event classification unit 41 sets the threshold to, for example, 20 minutes (S112).
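The per-genre threshold selection of steps S102 to S112 can be sketched, purely as an illustration, as follows. The genre strings, the names `FIXED_THRESHOLDS`, `DEFAULT_THRESHOLD`, and `select_threshold`, and the use of `None` to signal "split at each change of date" are assumptions of this sketch; the numeric values are the example values given in the description.

```python
from datetime import timedelta
from typing import Optional

# Example values from the description; the genre labels are assumed encodings.
FIXED_THRESHOLDS = {
    "travel/leisure": timedelta(minutes=30),        # threshold a
    "athletic meet/sports": timedelta(minutes=10),  # threshold b
    "wedding/party": timedelta(minutes=30),         # threshold d
}
DEFAULT_THRESHOLD = timedelta(minutes=20)           # no genre set (S112)


def select_threshold(genre: Optional[str],
                     event_length: timedelta) -> Optional[timedelta]:
    """Return the scene threshold, or None meaning 'split at each change of date'."""
    if genre == "children/pets":
        # Threshold c depends on the event length. The text gives
        # "three days or more" and "two days or less"; treating every
        # length below three days as the short case is an assumption.
        if event_length >= timedelta(days=3):
            return None
        return timedelta(minutes=10)
    return FIXED_THRESHOLDS.get(genre, DEFAULT_THRESHOLD)
```

Here `event_length` stands for the difference between the shooting start time of the first file of the event and the shooting end time of the last file.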

In this way, the event classification unit 41 of this embodiment sets a different threshold for each event genre, that is, an optimum threshold for each genre, and classifies the moving image files of each event into scenes.

The event classification unit 41 then compares the threshold determined by the above method with the shooting intervals between the moving image files calculated in S101. When a shooting interval is less than the threshold, the files are classified as the same scene; when a shooting interval is equal to or greater than the threshold, the files before and after it are classified as separate scenes (S113).
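A minimal Python sketch of the comparison in S113 is given below. It assumes that each file is represented as a (shooting start, shooting end) pair already sorted by start time, and that the shooting interval is measured from the end of the preceding file to the start of the next; the specification does not state the endpoints explicitly, and the function name `split_into_scenes` is hypothetical.

```python
from datetime import datetime, timedelta


def split_into_scenes(files, threshold):
    """Split time-ordered (start, end) pairs into scenes.

    A new scene starts whenever the gap from the previous file's end to the
    current file's start is equal to or greater than `threshold`.
    """
    scenes = []
    for start, end in files:
        if scenes and start - scenes[-1][-1][1] < threshold:
            scenes[-1].append((start, end))   # interval below threshold: same scene
        else:
            scenes.append([(start, end)])     # first file, or interval >= threshold
    return scenes
```

With a 30-minute threshold, files separated by a 5-minute gap would fall into one scene, while a gap of exactly 30 minutes would start a new scene, matching the "equal to or greater than" wording above.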

Returning again to FIG. 1, the digest generation unit 43 generates a digest from the set of moving image files classified into scenes by the event classification unit 41.

FIG. 5 is a flowchart showing the processing procedure of digest generation by the digest generation unit 43.

First, the digest generation unit 43 determines the length of the digest to be generated (S301). The length may be determined automatically, for example in proportion to the total shooting time of all files or to the number of files, or it may be entered through the input instruction unit 44. In Embodiment 1, a 90-second digest is generated as an example. When the digest length is set automatically, the input instruction unit 44 can be omitted.

Next, because the digest generation unit 43 generates a digest for every scene and concatenates them to form the overall digest, it determines how many seconds of digest to generate for each scene (S302). The digest length for each scene may be obtained by dividing the digest time determined in S301 equally by the number of scenes, or the time may be distributed in proportion to, for example, the total shooting time or the number of files of each scene. Here, 90 seconds is divided equally by 3, the number of scenes, so that 30 seconds is extracted from each scene as its digest.
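The two allocation options of S302 can be illustrated by the following sketch. The function names `allocate_equally` and `allocate_proportionally` are hypothetical; the weights may be, for example, each scene's total shooting time or its number of files.

```python
from typing import List, Sequence


def allocate_equally(total_seconds: float, scene_count: int) -> List[float]:
    """Divide the overall digest length equally among the scenes."""
    return [total_seconds / scene_count] * scene_count


def allocate_proportionally(total_seconds: float,
                            weights: Sequence[float]) -> List[float]:
    """Divide the overall digest length in proportion to per-scene weights."""
    total_weight = sum(weights)
    return [total_seconds * w / total_weight for w in weights]
```

For the example in the description, `allocate_equally(90, 3)` yields 30 seconds for each of the three scenes.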

Next, the digest generation unit 43 generates the digest of each scene (S303). In this embodiment, as shown in FIG. 2, the digest generation method is changed according to the genre of the moving image files, so that an optimal digest suited to the genre is generated.

That is, in Embodiment 1, as shown in FIG. 2, the threshold used to classify the moving image files of each event into scenes can be changed for each event genre, and the digest generation method can also be changed for each event genre.

Next, the digest generation method for each genre is described in detail.

First, when the genre is "travel/leisure", that is, when shooting with a video camera on a sightseeing trip or the like, the user often decides on the subject in advance and points the video camera at it before starting to shoot. The footage that should be extracted is therefore found immediately after the start of shooting.

The digest generation unit 43 therefore uses the digest generation method A shown in FIG. 6(A) to extract the start portion of each moving image file constituting the scene, as shown in FIG. 7(A).

That is, when the genre is "travel/leisure", the digest generation unit 43 uses the digest generation method A shown in FIG. 6(A) to extract the first 5 seconds from every file of the same scene, for example the six moving image files 1 to 6 in the example of FIG. 7(A) (S1), and then concatenates the extracted first-5-second sections of all the moving image files 1 to 6 in order of shooting time (S2). A 30-second digest is thereby generated. If extracting the first 5 seconds of all the scenes yields less than 30 seconds, for example because the scene contains fewer than six files, a 30-second digest is generated by making the extraction length 5 seconds or longer, or by extracting two or more sections, for example from the center of a file. Conversely, if extracting the first 5 seconds of all the scenes would exceed 30 seconds, for example because the scene contains more than six files, a 30-second digest is generated by extracting preferentially from files with a long shooting time or files with a long shooting interval from the preceding file.
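As an illustrative sketch only, the basic extraction of digest generation method A (steps S1 and S2) could be written as follows. The function name `digest_method_a` and the representation of a clip as a (file index, start second, end second) tuple are assumptions; the adjustments for too few or too many files described above are not implemented here.

```python
from typing import List, Sequence, Tuple

Clip = Tuple[int, float, float]  # (file index, start second, end second)


def digest_method_a(file_durations: Sequence[float],
                    clip_seconds: float = 5.0) -> List[Clip]:
    """Take the first `clip_seconds` of each file, in shooting-time order.

    `file_durations` is assumed to be ordered by shooting time already;
    a file shorter than `clip_seconds` contributes its whole length.
    """
    return [(i, 0.0, min(clip_seconds, d)) for i, d in enumerate(file_durations)]
```

With six files of at least 5 seconds each, the returned clips total 30 seconds, as in the example of FIG. 7(A).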

Next, in the genre "athletic meet/sports", it is difficult to know in advance when the play the user wants to capture will occur, such as a child passing in front of the video camera in a footrace or a batter getting a hit in baseball, so the user often waits with the video camera already recording. The digest is therefore generated by extracting the portion from the middle of each moving image file onward.

That is, when the genre is "athletic meet/sports", the digest generation unit 43 uses the digest generation method B shown in FIG. 6(B) to first extract, from every file of the same scene, the 5 seconds at approximately the center of the file, indicated by hatching for the six files in the example of FIG. 7(B) (S11), and then concatenates the extracted center-5-second sections of all the files in order of shooting time (S12). In this way, the digest generation unit 43 concatenates the 5-second sections of the six files of the same scene shown in FIG. 7(B) in order of shooting time to generate a 30-second digest. If extracting the center 5 seconds of all the scenes yields less than 30 seconds, a 30-second digest is generated by making the extraction length 5 seconds or longer, or by extracting two sections located one third and two thirds of the way through the file. Conversely, if extracting the first 5 seconds of all the scenes would exceed 30 seconds, a 30-second digest is generated by extracting preferentially from files with a long shooting time or files with a long shooting interval from the preceding file.
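The center extraction of digest generation method B (S11 and S12) can be sketched in the same illustrative spirit. The name `digest_method_b` and the clip tuple layout are hypothetical, and "approximately the center" is interpreted here as a window centered exactly on the midpoint of the file.

```python
from typing import List, Sequence, Tuple

Clip = Tuple[int, float, float]  # (file index, start second, end second)


def digest_method_b(file_durations: Sequence[float],
                    clip_seconds: float = 5.0) -> List[Clip]:
    """Take a window of `clip_seconds` centered on the midpoint of each file.

    `file_durations` is assumed to be ordered by shooting time already.
    """
    clips = []
    for i, duration in enumerate(file_durations):
        length = min(clip_seconds, duration)   # short files contribute everything
        start = (duration - length) / 2
        clips.append((i, start, start + length))
    return clips
```

For a 60-second file, this selects the section from 27.5 to 32.5 seconds.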

Next, in the genre "children/pets", as in the genre "travel/leisure", the user often starts shooting after framing the subject, so the footage that should be extracted is found immediately after the start of shooting. In this genre, moreover, the subject generally moves and is located at the center of the screen. Therefore, when the central part of the screen moves differently from its surroundings or has a different color, the subject is likely to be at the center of the image. The digest is therefore generated by extracting, near the start of each moving image file, a portion in which the subject is likely to be at the center of the image.

That is, when the genre is "children/pets", the digest generation unit 43 uses the digest generation method C shown in FIG. 6(C) to first detect, in each file, the sections in which the subject is present in the central part of the image (S21). As a method of detecting such sections, when the subject is a child, for example, detection is possible from the presence of a face at the center of the screen; the face is detected simply from whether skin color is present at the center of the screen, or from the positional relationship between the skin color and the colors of the hair, eyes, and mouth. When the subject is a pet, or when no face can be detected, the motion vectors of the image are extracted, and if the motion near the center differs from the surrounding motion, the subject is detected as being there.

Next, for each file, the digest generation unit 43 extracts the first 5 seconds of the detected section that is closest to the start of the file and in which the subject is present in the central part of the image (S22), and then concatenates the 5-second sections extracted from the files in order of shooting time (S23).

In FIG. 7(C), the portions of each file in which the subject is present at the center of the image are shown in gray. From among these portions, the one that is closest to the start of the file and in which the subject is present for at least a fixed time is extracted. Here, the 5-second sections indicated by hatching in FIG. 7(C) are extracted and concatenated to generate a 30-second digest. In this way, the digest generation unit 43 concatenates the 5-second sections of the six files of the same scene shown in FIG. 7(C) in order of shooting time to generate a 30-second digest. In this case as well, if extracting a 5-second section from all the scenes yields less than 30 seconds, a 30-second digest is generated by making the extraction length 5 seconds or longer, or by extracting two or more sections in which the subject is present from a single file. Conversely, if extracting the first 5 seconds of all the scenes would exceed 30 seconds, a 30-second digest is generated by extracting preferentially from files with a long shooting time, files with a long shooting interval from the preceding file, or files in which the section containing the subject is long.
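The selection step of digest generation method C (S22) might be sketched as follows, on the assumption that the detection of S21 has already produced, for one file, a list of (start, end) intervals in seconds during which the subject is at the center of the image. The detection itself (skin color or motion vectors) is not shown, and the function name `pick_subject_clip` is hypothetical.

```python
from typing import List, Optional, Tuple

Interval = Tuple[float, float]  # (start second, end second) within one file


def pick_subject_clip(intervals: List[Interval],
                      clip_seconds: float = 5.0) -> Optional[Interval]:
    """Return the first `clip_seconds` of the earliest sufficiently long interval.

    Intervals shorter than `clip_seconds` are skipped; None is returned when
    no interval is long enough.
    """
    for start, end in sorted(intervals):
        if end - start >= clip_seconds:
            return (start, start + clip_seconds)
    return None
```

Applying this to each file of a scene and concatenating the results in order of shooting time would correspond to S23.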

Next, when the genre is “party”, the video camera tends to keep recording throughout the event, so a single moving image file 21 is often long. In such recordings, important scenes are often accompanied by applause or cheering, or by flashes from other cameras. The digest is therefore generated by detecting these in the audio and image information of the moving image file and extracting the portions immediately before and after them.

That is, in this case, following digest generation method D shown in FIG. 6(D), the digest generation unit 43 first detects the times at which a flash or applause occurred in the captured moving image file (S31). It then extracts a total of 10 seconds around each detected time, namely the 5 seconds immediately before and the 5 seconds immediately after (S32), and concatenates the extracted 10-second sections in order of shooting time (S33).

FIG. 7(D) shows the digest generation method for the genre “party”. For this genre, a fixed length of time is extracted before and after each portion where the sound level exceeds a certain value or a flash goes off. As indicated by hatching in FIG. 7(D), there are three points at which a sound peak or a flash occurs; the 5 seconds before and the 5 seconds after each of these points, 10 seconds per point, are extracted to generate a 30-second digest. When more than three flashes or bursts of applause are detected, the extraction sections may be shortened so that all of them are covered, or the detection points may be narrowed down to three by assigning priorities according to, for example, the loudness of the applause or the length of the period over which flashes occur.
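A minimal Python sketch of steps S31 to S33 follows. It assumes the flash and applause detector already returns a list of times, optionally paired with a strength score; the function names, the strength score, and the clipping of windows to the file boundaries are illustrative assumptions rather than details taken from the specification.

```python
def strongest(events, k=3):
    """Keep the k detections with the highest strength and return their times in order.

    `events` is a list of (time_seconds, strength) pairs; the strength score is hypothetical.
    """
    kept = sorted(events, key=lambda e: e[1], reverse=True)[:k]
    return sorted(t for t, _ in kept)


def windows_around(times, duration, before=5.0, after=5.0):
    """Return one (start, end) window per detected time, clipped to the file (assumption)."""
    windows = []
    for t in sorted(times):
        start = max(0.0, t - before)
        end = min(duration, t + after)
        windows.append((start, end))
    return windows


def total_length(windows):
    """Total length in seconds of the extracted windows (overlaps are not merged)."""
    return sum(end - start for start, end in windows)
```

With three detections well inside a long file, `windows_around` yields three 10-second windows, which matches the 30-second digest in the example above.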

In the same manner, the digest generation unit 43 also creates digests for scenes 2 and 3 according to the genre of each scene.

FIG. 8 shows an example in which an event consisting of 15 moving image files 1 to 15 is divided into three scenes and a digest is generated for each scene by the digest generation method corresponding to the genre.

Specifically, FIG. 8 shows the case where the genre of the event consisting of the 15 moving image files 1 to 15 is, for example, “travel / leisure”. That is, it shows an example in which, using digest generation method A shown in FIG. 6(A), a 5-second portion from the beginning of each moving image file is extracted for each of scenes 1 to 3, as shown in FIG. 7(A). Since scene 2 shown in FIG. 7(B) has only four moving image files and scene 3 in FIG. 7(C) has only five, portions other than the first 5 seconds of the files in each scene are also extracted so that the digests of scenes 2 and 3 are each 30 seconds long.

Finally, the digest generation unit 43 generates a digest of the whole event by concatenating the digests extracted from the moving image files for each scene of the event, for example as shown in FIG. 9 (S304).

FIG. 9 shows an example in which the three 30-second digests extracted scene by scene from the moving image files of the event consisting of the 15 moving image files 1 to 15 shown in FIG. 8 are concatenated to generate the digest of that event.
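The concatenation in S304 is simple enough to sketch in a few lines of Python. The representation of a segment as a `(file_id, start, end)` tuple and both function names are assumptions made for illustration only.

```python
def event_digest(scene_digests):
    """Concatenate the per-scene segment lists in scene order (S304)."""
    return [segment for scene in scene_digests for segment in scene]


def duration(segments):
    """Total playing time in seconds of a list of (file_id, start, end) segments."""
    return sum(end - start for _, start, end in segments)
```

For three scenes that each contribute a 30-second digest, as in FIG. 9, the resulting event digest runs for 90 seconds.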

Thus, according to the digest generation method and device of Embodiment 1, a plurality of moving image files shot with a video camera are classified by event on the basis of the tags 22, and the files classified as the same event are further classified into scenes using a threshold predetermined for the genre of that event. A portion is then extracted from each of the moving image files classified into the same scene by the digest generation method corresponding to the genre of the event, and the extracted portions are joined to form the digest. An optimal digest can therefore be generated automatically for each event genre.

In Embodiment 1, as shown in FIG. 2, the threshold and the digest generation method are both set individually for each event genre. The present invention is not limited to this: only the threshold may be set individually for each genre while the digest generation method is shared, or conversely the threshold may be shared across genres while only the digest generation method is set individually for each genre.

Embodiment 2.
Embodiment 1 described a method of classifying scenes by changing the threshold for each event genre. Embodiment 2 describes a method of classifying scenes using thresholds, coefficients, and the like determined from the attribute information of the moving image files that make up the event. Attribute information here means information attached to each file, such as shooting time, shooting duration, shooting position, and genre information, or information obtained from multiple files, such as the total shooting duration of all files or the number of files per event. Since the configuration is the same as that of Embodiment 1 shown in FIG. 1, the operations specific to Embodiment 2 are described with reference to that configuration.

FIG. 10 is a flowchart showing the scene classification procedure in Embodiment 2.

In Embodiment 2, as in the threshold determination method of Embodiment 1 shown in FIG. 3, the shooting intervals between moving image files are first calculated for each event (S201). The threshold for scene classification of the event is then determined from the length of the event, that is, the difference between the shooting start time of the first file and the shooting end time of the last file of the event (S202). For example, if the threshold is 5% of the event length, an event lasting 10 hours gives a scene classification threshold of 30 minutes, and an event lasting 5 hours gives a threshold of 15 minutes.
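The arithmetic of S202 can be checked with a short Python sketch. The function name and the `ratio` parameter are hypothetical; the 5% figure is the example value given above, and the dates are arbitrary.

```python
from datetime import datetime, timedelta


def event_length_threshold(first_start: datetime, last_end: datetime,
                           ratio: float = 0.05) -> timedelta:
    """Scene classification threshold as a fixed fraction of the event length (S202)."""
    return (last_end - first_start) * ratio


# A 10-hour event and a 5-hour event, as in the example above.
ten_hours = event_length_threshold(datetime(2008, 1, 23, 8, 0), datetime(2008, 1, 23, 18, 0))
five_hours = event_length_threshold(datetime(2008, 1, 23, 8, 0), datetime(2008, 1, 23, 13, 0))
```

Here `ten_hours` comes out as 30 minutes and `five_hours` as 15 minutes, in line with the figures in the text.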

In Embodiment 2, moreover, when the event genre is “travel / leisure” or “wedding / party”, the threshold determined from the event length in S202 is used as is, whereas when the genre is “athletic meet / sports” or “children / pets”, one third of the threshold determined in S202 is used (S203 “Yes”, S204). This is because many moving image files shot at short intervals are expected when the event genre is “athletic meet / sports” or “children / pets”. The genres and coefficient given here are only examples, and any coefficient may be applied to any genre. The adjustment need not be a multiplication; addition, exponentiation, or the like may be used instead. The processing of S203 and S204 that adjusts the threshold for a particular event may of course be omitted.

The scene classification unit 42 then compares the threshold determined as described above with a maximum threshold set in advance for each genre, and uses the shorter of the two as the scene classification threshold (S205 to S215).

Specifically, if the event genre is “travel / leisure” and the threshold determined in S202 is 30 minutes or more, the threshold is set to 30 minutes (S205 “Yes”, S206). If the event genre is “athletic meet / sports” and the threshold determined in S204 is 10 minutes or more, the threshold is set to 10 minutes (S207 “Yes”, S208). If the event genre is “children / pets”, the shooting period of the event is less than 3 days, and the threshold determined in S204 is 10 minutes or more, the threshold is set to 10 minutes (S209 “Yes”, S210 “No”, S212 “Yes”, S213).

If, on the other hand, the threshold determined from the event length in S202 or S204 does not exceed the maximum for the genre, that threshold is used as the scene classification threshold. The moving image files are then classified into scenes using the threshold determined in this way (S216).
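The flow from S202 to S216 can be summarized in a minimal Python sketch. Several points are assumptions rather than statements from the specification: the genre keys are hypothetical identifiers, genres and cases for which no maximum is given in this passage are left uncapped, and a new scene is started when a shooting interval is strictly greater than the threshold. All times are in minutes.

```python
GENRE_DIVISOR = {"sports": 3.0, "kids_pets": 3.0}   # one third of the S202 threshold
THREE_DAYS_MIN = 3 * 24 * 60


def genre_cap(genre, event_minutes):
    """Maximum threshold per genre (S205 to S215); None means no cap is applied here."""
    if genre == "travel":
        return 30.0
    if genre == "sports":
        return 10.0
    if genre == "kids_pets" and event_minutes < THREE_DAYS_MIN:
        return 10.0
    return None  # assumption: other cases are not capped in this sketch


def scene_threshold(event_minutes, genre, ratio=0.05):
    threshold = event_minutes * ratio              # S202: fraction of the event length
    threshold /= GENRE_DIVISOR.get(genre, 1.0)     # S203, S204: genre-specific adjustment
    cap = genre_cap(genre, event_minutes)
    return threshold if cap is None else min(threshold, cap)


def split_scenes(gaps_minutes, threshold):
    """Group file indices into scenes (S216); a gap above the threshold starts a new scene."""
    scenes = [[0]]
    for index, gap in enumerate(gaps_minutes, start=1):
        if gap > threshold:
            scenes.append([index])
        else:
            scenes[-1].append(index)
    return scenes
```

For a 20-hour “travel / leisure” event the raw threshold of 60 minutes is capped at 30 minutes, so five files separated by gaps of 5, 40, 10, and 35 minutes fall into three scenes.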

The method of creating the digest for each genre after the event has been classified into scenes is the same as in Embodiment 1.

Thus, according to the digest generation method and device of Embodiment 2, a plurality of moving image files shot with a video camera are classified by event on the basis of the tags 22, and a maximum threshold is provided for each genre. If the threshold determined from the event length exceeds the maximum for the genre, the maximum is used as the threshold; otherwise the threshold determined from the event length is used to classify the files of the same event into scenes. The digest is then generated as in Embodiment 1. Consequently, although the threshold is capped by a per-genre maximum, the files can be classified into scenes with a threshold that reflects the event length, and an optimal digest can be generated automatically for each event genre.

Although Embodiment 2 sets a maximum scene classification threshold for each genre, a minimum may be set instead, and when the threshold determined from the event length is smaller than the minimum, the minimum may be used as the scene classification threshold. For example, in the “travel / leisure” genre, a short event can make the threshold extremely short, yet in this genre it is desirable that scenes be separated by intervals at least long enough for a move between locations to take place. For the “travel / leisure” genre, therefore, the minimum scene classification threshold is set to the shortest time in which a move is likely to occur, for example 10 minutes, and when the threshold is at or below this value, 10 minutes is used as the threshold. The maximum and minimum values given above are of course only examples, and other values may be used.
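The minimum-threshold variant amounts to a lower clamp, sketched below in Python. The genre key and the dictionary are hypothetical; the 10-minute value is the example given above, and genres without an entry are left unchanged as an assumption.

```python
GENRE_MIN_MINUTES = {"travel": 10.0}   # example value from the text; other genres unset


def apply_minimum(threshold_minutes, genre):
    """Raise the threshold to the per-genre minimum when it falls at or below that minimum."""
    floor = GENRE_MIN_MINUTES.get(genre)
    if floor is None:
        return threshold_minutes
    return max(threshold_minutes, floor)
```

A one-hour “travel / leisure” event would give a raw threshold of 3 minutes at 5%, which this clamp raises to 10 minutes.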

Also, although Embodiment 2 determines the scene classification threshold from the event length, the present invention is not limited to this; for example, the average of the shooting intervals between the moving image files that make up the event may be used as the threshold.

Embodiment 2 has also been described, as shown in FIG. 10, as adjusting the threshold determined from attribute information such as the event length or the average shooting interval between files by multiplying it by a coefficient or the like for specific genres only (S203, S204), and as limiting it by a per-genre maximum or minimum. The present invention is not limited to this. Steps S203 to S215 of FIG. 10 may be omitted so that neither the per-genre maximum or minimum limit nor the genre-specific adjustment is applied to the threshold determined from the attribute information of the event. Alternatively, steps S205 to S215 of FIG. 10 may be omitted so that the genre-specific adjustment is applied but no per-genre maximum or minimum limit is imposed. The same applies to Embodiments 3 and 4 described later.

Embodiment 3.
Embodiment 3 modifies the threshold determination method of S202 in Embodiment 2 shown in FIG. 10, and only that part is described.

That is, whereas Embodiment 2 determines the threshold for scene classification from the event length as the attribute information, Embodiment 3 additionally uses the average of the shooting intervals between all files of the event. An example of obtaining a threshold from the average shooting interval follows.

For example, if five moving image files are shot in an event and the four shooting intervals between them are 5, 10, 30, and 3 minutes, their average, 12 minutes, is taken as the threshold based on the average shooting interval.

In Embodiment 3, of the two thresholds obtained from the two pieces of attribute information, namely the event length and the average shooting interval between all moving image files of the event, either the larger or the smaller is used as the scene classification threshold. The average of the two thresholds may of course be used instead.

For example, if an event whose moving image files were shot at short intervals contains just one file preceded by a long shooting interval, the threshold based on the average shooting interval becomes long. As a result, moving image files that belong to a different scene but whose shooting interval is shorter than the average may be classified into the same scene. In this embodiment, therefore, when an event contains a moving image file with a shooting interval equal to or greater than a predetermined value, the threshold determined from the event length is used.

Conversely, when many moving image files are shot at roughly constant intervals, the event length grows in proportion to the number of files, so the threshold based on the event length becomes long. Therefore, when the number of moving image files in the event is equal to or greater than a predetermined value, the threshold based on the average shooting interval is used.
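The choice between the two thresholds of Embodiment 3 might be sketched as follows in Python. The specification leaves the "predetermined values" open, so `long_gap` (60 minutes) and `many_files` (20) are placeholder assumptions, as are the order in which the two rules are tested and the fallback to the smaller threshold when neither rule applies. All times are in minutes.

```python
from statistics import mean


def choose_threshold(gaps, event_minutes, long_gap=60.0, many_files=20, ratio=0.05):
    """Pick between the event-length threshold and the average-interval threshold."""
    by_length = event_minutes * ratio   # threshold from the event length (Embodiment 2)
    by_average = mean(gaps)             # threshold from the average shooting interval

    if any(gap >= long_gap for gap in gaps):
        return by_length                # an outlier gap would inflate the average
    if len(gaps) + 1 >= many_files:
        return by_average               # many files would inflate the event length
    return min(by_length, by_average)   # assumption: fall back to the smaller one
```

With the intervals 5, 10, 30, and 3 minutes from the example above, `by_average` is 12 minutes.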

Thus, according to the digest generation method and device of Embodiment 3, a plurality of moving image files shot with a video camera are classified by event on the basis of the tags 22, and the files classified as the same event are classified into scenes using either the threshold determined from the event length or the threshold based on the average shooting interval between all moving image files of the event. The digest is then generated as in Embodiments 1 and 2. Scene classification suited to the event length or to the shooting intervals of the files can thus be performed, and an optimal digest can be generated automatically for each event genre.

Embodiment 4.
Like Embodiment 3, Embodiment 4 modifies the threshold determination method of S202 in Embodiment 2 shown in FIG. 10, and only that part is described.

That is, whereas Embodiment 2 determines the threshold for scene classification from the event length, Embodiment 4 determines it from the position information of the shooting location of each file of the event. In Embodiment 4, therefore, the video camera that shoots the moving image files has a GPS function and attaches the position information of the shooting location, as attribute information, to each moving image file it records.

In Embodiment 4, for example, the shooting position of each moving image file is obtained from the GPS information in the attribute information attached to the file, and the total distance traveled between the moving image files is calculated. A coefficient is then applied to this total distance to obtain the threshold for scene classification. For example, if the distance traveled is 60000 (m), multiplying by 1/2000 gives 30 (minutes) as the scene classification threshold.
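A minimal Python sketch of this distance-based threshold is given below. The specification does not say how the distance between two GPS positions is computed; the haversine great-circle formula and the mean Earth radius used here are assumptions, as are the function names. The coefficient 1/2000 is expressed as a division by 2000 metres per minute.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000.0):
    """Great-circle distance in metres between two points given in degrees (assumed method)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def total_distance_m(points):
    """Sum of the distances between consecutive shooting positions, in shooting order."""
    return sum(haversine_m(*p, *q) for p, q in zip(points, points[1:]))


def threshold_from_distance(total_m, metres_per_minute=2000.0):
    """Scene classification threshold in minutes: total distance times 1/2000."""
    return total_m / metres_per_minute
```

For a total distance of 60000 m, `threshold_from_distance` returns 30 minutes, as in the example above.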

Thus, according to the digest generation method and device of Embodiment 4, a plurality of moving image files shot with a video camera are classified by event on the basis of the tags 22, and the files classified as the same event are classified into scenes using a threshold based on their position information. The digest is then generated as in Embodiments 1 to 3. Scene classification suited to the position information of the moving image files can thus be performed, and an optimal digest can be generated automatically for each event genre.

Although Embodiments 1 to 4 have been described using a video camera as the moving image shooting device, this is only an example; the device and method of the present invention can be applied to any device with a moving image shooting function, such as a digital camera or a mobile phone with a moving image shooting function.

Embodiments 2 to 4 gave the event length, the average shooting interval between all moving image files of the event, and the position information of each moving image file as examples of attribute information used for scene classification. The attribute information is not limited to these; the scene classification threshold may be obtained from other attribute information, or several pieces of attribute information may be combined as in Embodiment 3.

Embodiments 1 to 4 have been described as classifying scenes and then generating a digest by extracting a portion of the moving image files of each scene with the digest generation method corresponding to the genre. The invention described in this specification is not limited to this. It may also be applied to a device and method that set the threshold and classify scenes by the methods of Embodiments 1 to 4 without generating a digest, or to a device and method that set the threshold and classify scenes by the methods of Embodiments 1 to 4 but generate the digest by a method other than those of Embodiments 1 to 4.

FIG. 1 is a block diagram showing a configuration example of Embodiment 1 of the digest generation device of the present invention.
FIG. 2 is a diagram showing examples of event genres in Embodiment 1 and example settings of the scene classification threshold and digest generation method for each genre.
FIG. 3 is a flowchart showing the procedure of scene classification processing in Embodiment 1.
FIG. 4 is a diagram showing an example in which captured moving image files are classified into scenes.
FIG. 5 is a flowchart showing the procedure of digest generation processing in Embodiment 1.
FIG. 6(A) to (D) are flowcharts each showing an example of digest generation methods A to D according to genre.
FIG. 7(A) to (D) are diagrams each showing an example of the cuts extracted from each scene by the digest generation methods A to D according to genre shown in FIG. 6(A) to (D).
FIG. 8 is a diagram showing an example of digest extraction for each scene of an event according to Embodiment 1.
FIG. 9 is a diagram showing an example of generating the digest of a whole event by joining the digests extracted for each scene in FIG. 8.
FIG. 10 is a flowchart showing the procedure of scene classification in Embodiment 2.

Explanation of symbols

11 Content holding unit
21 Moving image file
22 Tag
31 Shooting unit
32 Tag creation unit
41 Event classification unit
42 Scene classification unit
43 Digest generation unit
44 Input instruction unit

Claims

A digest generation device that generates a digest from a plurality of video files obtained by shooting, comprising:
tag creation means for creating, for each of the plurality of video files, a tag describing genre information indicating the genre of the event of that video file and event identification information for distinguishing events from one another;
event classification means for classifying the plurality of video files by event based on the event identification information described in the tags;
scene classification means for classifying the plurality of video files classified for each event into scenes based on shooting intervals between the video files; and
digest generation means for extracting a portion from each of the video files classified for each scene, by a digest generation method based on the genre information described in the tag, and generating a digest from the extracted portions.

The digest generation device according to claim 1, wherein
the scene classification means
has a threshold set in advance for each genre, and,
when classifying the plurality of video files classified for each event into scenes based on the shooting intervals between the video files,
sets the threshold corresponding to the genre of the plurality of video files based on the genre information described in the tags of the plurality of video files classified for each event, and classifies the video files into scenes based on the set threshold and the shooting intervals between the video files.

The digest generation device according to claim 1, wherein
the scene classification means
has a maximum threshold value set in advance for each genre, and,
when classifying the plurality of video files classified for each event into scenes based on the shooting intervals between the video files,
determines a threshold based on attribute information of the video files constituting each event,
compares the maximum value corresponding to the genre of the plurality of video files, identified from the genre information described in the tags of the plurality of video files classified for each event, with the threshold determined based on the attribute information,
selects the maximum value when the threshold determined based on the attribute information is greater than or equal to the maximum value, and selects the threshold determined based on the attribute information when that threshold is smaller than the maximum value, and
classifies the video files into scenes based on the selected threshold or maximum value and the shooting intervals between the video files.

The digest generation device according to claim 1, wherein
the scene classification means
has a minimum threshold value set in advance for each genre, and,
when classifying the plurality of video files classified for each event into scenes based on the shooting intervals between the video files,
determines a threshold based on attribute information of the video files constituting each event,
compares the minimum value corresponding to the genre of the plurality of video files, identified from the genre information described in the tags of the plurality of video files classified for each event, with the threshold determined based on the attribute information,
selects the minimum value when the threshold determined based on the attribute information is less than or equal to the minimum value, and selects the threshold determined based on the attribute information when that threshold is greater than the minimum value, and
classifies the video files into scenes based on the selected threshold or minimum value and the shooting intervals between the video files.

The digest generation device according to claim 1, wherein
the scene classification means,
when classifying the plurality of video files classified for each event into scenes based on the shooting intervals between the video files,
determines a threshold based on attribute information of the video files constituting each event, and adjusts the threshold determined based on the attribute information using a predetermined coefficient only when the genre of the event is a specific genre.

The digest generation device according to any one of claims 3 to 5, wherein
the attribute information is the length of the event.

The digest generation device according to any one of claims 3 to 5, wherein
the attribute information is the average of the shooting intervals between the video files constituting the event.

The digest generation device according to any one of claims 3 to 5, wherein
the attribute information is position information indicating the shooting positions of the video files constituting the event.

A digest generation method for generating a digest from a plurality of video files obtained by shooting, comprising:
classifying the plurality of video files by event based on event identification information in tags assigned to the plurality of video files, each tag describing genre information indicating the genre of the event of the video file and event identification information for distinguishing events from one another;
classifying the plurality of video files classified for each event into scenes based on shooting intervals between the video files; and
extracting a portion from each of the video files classified for each scene, by a digest generation method based on the genre information described in the tag, and generating a digest from the extracted portions.