JP2018206292A

JP2018206292A - Video summary creation device and program

Info

Publication number: JP2018206292A
Application number: JP2017114206A
Authority: JP
Inventors: Takahiro Mochizuki; Atsushi Matsui; Yoshihiko Kawai; Rei Endo
Original assignee: Nippon Hoso Kyokai NHK; Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2017-06-09
Filing date: 2017-06-09
Publication date: 2018-12-27
Anticipated expiration: 2037-06-09
Also published as: JP6917788B2

Abstract

To create a video summary composed only of video from important sections by taking into account the production effects that appear in the video. SOLUTION: A cut division unit 10 of a video summary creation device 1 divides a program video into cut videos V_C[i], and a scene generation unit 11 creates scene videos V_S[j] by merging the cut videos V_C[i] that belong to the same scene. An element score calculation unit 12 calculates element scores S₁[i] to S₄[i] for each cut video V_C[i], one for each production effect such as a "telop". A total score calculation unit 13 calculates a total score S[i] for each cut video V_C[i] on the basis of the element scores S₁[i] to S₄[i] and weight coefficients W₁ to W₄. A video summary creation unit 14 refers to the total scores S[i] and the scene videos V_S[j], selects cut videos V_C[i] until the overall length of the video summary exceeds a prescribed value, and creates the video summary by sorting the selected cut videos V_C[i] in time series and concatenating them. With this configuration, a video summary that takes production effects such as the "telop" into account can be created. SELECTED DRAWING: Figure 1

Description

The present invention relates to an apparatus and a program for generating a summary video in the field of video processing using a computer and a hard disk.

Broadcasting stations have an increasing need for online content, such as program web pages, as a medium for raising viewers' interest in programs. For online content, it is desirable to distribute a digest video (summary video) composed only of the important video sections of a program video.

However, producing a summary video manually is very costly in both labor and expense. A technique for generating summary videos automatically is therefore desired. The methods of Patent Documents 1 to 5, for example, have been proposed as such techniques.

The method of Patent Document 1 divides a video into a plurality of video sections for each modality of the video, obtains the similarity between pairs of video sections, and clusters the video sections based on the similarity. It then extracts a representative video section from each of the clusters and generates a summary video by combining the representative video sections.

The method of Patent Document 2 generates corresponding sections between videos based on the similarity of each video, extracts common video sections and individual video sections from the corresponding sections, selects common summary sections from the common video sections, and selects individual summary sections from the individual video sections. It then integrates the common summary sections and the individual summary sections to generate a summary video.

The method of Patent Document 3 selects one or more video sections from a plurality of video sections based on metadata and feature amounts, obtains the set of video sections that maximizes the value of a predetermined evaluation function, and combines that set of video sections to generate a summary video.

The method of Patent Document 4 obtains image feature amounts and audio feature amounts for a video, calculates the importance of each processing unit of the video based on these feature amounts, and generates a summary video based on the importance.

The method of Patent Document 5 generates short divided videos from the original video, treats block regions as visual words for each divided video, calculates a score based on the feature amounts of the visual words, and generates a summary video by selecting divided videos in descending order of score.

Patent Document 1: JP 2014-179906 A
Patent Document 2: JP 2013-126233 A
Patent Document 3: JP 2012-19305 A
Patent Document 4: JP 2014-33417 A
Patent Document 5: JP 2012-10265 A

The methods of Patent Documents 1 and 2 generate a summary video based on the similarity between video sections or on the presence or absence of common sections. However, because these methods presuppose that similar video sections appear repeatedly, they are difficult to apply to general broadcast program videos, in which similar video sections do not necessarily recur.

The method of Patent Document 3 generates a summary video based on basic image feature amounts, audio feature amounts, and assigned metadata. However, detailed metadata must be assigned to the video in advance, so the processing load is high.

The methods of Patent Documents 4 and 5 generate a summary video based on basic image feature amounts and audio feature amounts without using metadata. Because they do not use metadata, their processing load is lower than that of the method of Patent Document 3.

All of the methods of Patent Documents 1 to 5 generate a summary video, but none of them considers the production effects that appear in the video. The resulting summary video may therefore omit scenes that are important from a production standpoint. Here, "production" means presenting material effectively, based on a script or the like, so as to achieve a given intention. In broadcast program video, for example, production effects include elements such as telop (on-screen caption) display, the appearance of a main performer or guest, camera zoom-in or panning, and explanatory CG video. These elements tend to be used in the important scenes of a video.

In general, a summary video should consist only of video from important sections. It has therefore been desired to determine the degree of impact of the production effects in a video, identify high-impact sections as important sections, and combine the video of the important sections to generate a summary video.

The present invention has been made to solve the above problem, and its object is to provide a summary video generation apparatus and a program capable of generating a summary video composed only of video from important sections by taking into account the production effects that appear in the video.

To solve the above problem, the summary video generation apparatus of claim 1 is a summary video generation apparatus that generates a summary video from a video, comprising: a cut division unit that divides the video into a plurality of cut videos in units of cuts; a score calculation unit that calculates, for each of the plurality of cut videos divided by the cut division unit and for each of a predetermined number of different production effects, a score representing the importance of that effect; a total score calculation unit that calculates, for each of the plurality of cut videos divided by the cut division unit, a total score based on the per-effect scores calculated by the score calculation unit; and a summary video generation unit that selects, from the plurality of cut videos, the cut videos constituting the summary video based on the total scores calculated by the total score calculation unit, and generates the summary video.

The summary video generation apparatus of claim 2 is the summary video generation apparatus according to claim 1, wherein the total score calculation unit calculates the total score for each of the plurality of cut videos based on the per-effect scores calculated by the score calculation unit and on preset per-effect weight coefficients.

The summary video generation apparatus of claim 3 is the summary video generation apparatus according to claim 1 or 2, further comprising a scene generation unit that generates, as a scene video, the cut videos of the same scene from the plurality of cut videos divided by the cut division unit, wherein the summary video generation unit selects the cut videos constituting the summary video such that the number of cut videos selected from within a scene video generated by the scene generation unit does not exceed a predetermined value.

The summary video generation apparatus of claim 4 is the summary video generation apparatus according to any one of claims 1 to 3, wherein the score calculation unit calculates the score based on the area in which an object related to the production effect appears in the video, the amount of motion of an object related to the effect, or the probability that an object related to the effect appears.

The summary video generation apparatus of claim 5 is the summary video generation apparatus according to claim 3, wherein the summary video generation unit comprises: a summary video selection unit that selects the cut videos constituting the summary video from the plurality of cut videos divided by the cut division unit, in descending order of importance according to the total scores calculated by the total score calculation unit, such that the number of cut videos selected from within a scene video generated by the scene generation unit does not exceed a predetermined value, until the overall length of the summary video exceeds a predetermined value; and a summary video output unit that concatenates the cut videos selected by the summary video selection unit in time series to generate the summary video.

Furthermore, the program of claim 6 causes a computer to function as the summary video generation apparatus according to any one of claims 1 to 5.

As described above, according to the present invention, a summary video composed only of video from important sections can be generated by taking into account the production effects that appear in the video.

FIG. 1 is a block diagram showing a configuration example of a summary video generation apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram explaining the cut sequence V_C[1], ..., V_C[N_C] and the scene sequence V_S[1], ..., V_S[N_S].
FIG. 3 is a block diagram showing a configuration example and input/output data example of the element score calculation unit.
FIG. 4 is a block diagram showing a configuration example and input/output data example of the total score calculation unit.
FIG. 5 is a block diagram showing a configuration example and input/output data example of the summary video generation unit.
FIG. 6 is a flowchart showing a processing example of the summary video generation unit.
FIG. 7 is a diagram explaining the flowchart of FIG. 6.

Embodiments of the present invention are described in detail below with reference to the drawings. The present invention is characterized in that it obtains, as an importance, the degree of impact of the production effects appearing in a video (for example, telops, performers, the amount of camera-work motion, and explanatory CG video), identifies high-impact sections as important sections, and combines the video of the important sections to generate a summary video. A summary video composed only of video from important sections is thereby generated.

[Overall Configuration]
First, the overall configuration of a summary video generation apparatus according to an embodiment of the present invention will be described. FIG. 1 is a block diagram showing a configuration example of the summary video generation apparatus according to the embodiment. The summary video generation apparatus 1 includes a cut division unit 10, a scene generation unit 11, an element score calculation unit 12, a total score calculation unit 13, and a summary video generation unit 14.

The cut division unit 10 receives a program video and performs a "cut video division process" that divides the program video into cut videos V_C[i] (i = 1, ..., N_C) in units of cuts, generating the cut sequence V_C[1], ..., V_C[N_C] composed of the cut videos V_C[i]. The cut division unit 10 then outputs the cut sequence V_C[1], ..., V_C[N_C] to the scene generation unit 11 and the element score calculation unit 12.

The parameter i = 1, ..., N_C denotes the number of a cut video V_C[i] (the cut number), and N_C denotes the number of cut videos V_C[i]. A cut video V_C[i] is video shot continuously, without a break, until the camera switches.

The "cut video division process" is known; for details see, for example, JP 2008-33749 A.

The scene generation unit 11 receives the cut sequence V_C[1], ..., V_C[N_C] composed of the cut videos V_C[i] from the cut division unit 10. It then performs a "scene video generation process" that detects scene boundaries by merging the cut videos V_C[i] of the same scene and generates scene videos V_S[j] (j = 1, ..., N_S) in units of scenes, producing the scene sequence V_S[1], ..., V_S[N_S] composed of the scene videos V_S[j]. The scene generation unit 11 outputs the scene sequence V_S[1], ..., V_S[N_S] to the summary video generation unit 14.

The parameter j = 1, ..., N_S denotes the number of a scene video V_S[j] (the scene number), and N_S denotes the number of scene videos V_S[j]. A scene video V_S[j] is video that groups together a series of cut videos V_C[i] from a given scene.

The "scene video generation process" is known; for details see, for example, JP 2014-225118 A and JP 2014-33355 A.

FIG. 2 is a diagram explaining the cut sequence V_C[1], ..., V_C[N_C] and the scene sequence V_S[1], ..., V_S[N_S]. The cut sequence V_C[1], ..., V_C[N_C] is generated by dividing the program video into cuts, and the scene sequence V_S[1], ..., V_S[N_S] is generated by merging the cut sequence V_C[1], ..., V_C[N_C] scene by scene.

In the example of FIG. 2, the scene video V_S[1] merges the cut videos V_C[1], V_C[2], and V_C[3], and the scene video V_S[2] merges the cut videos V_C[4] and V_C[5]. The scene video V_S[N_S] merges the cut videos V_C[N_C-1] and V_C[N_C]. Each cut video V_C[i] thus belongs to one of the scene videos V_S[j].
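The cut-to-scene relationship of FIG. 2 can be pictured as a simple lookup table. The following is only an illustrative sketch: the function name `build_scene_index` and the representation of a scene as a list of cut numbers are assumptions made for this example and do not appear in the source.

```python
from typing import Dict, List


def build_scene_index(scenes: List[List[int]]) -> Dict[int, int]:
    """Map each cut number i to the scene number j that contains it (1-based).

    `scenes[j - 1]` is assumed to list the cut numbers merged into V_S[j].
    """
    scene_of: Dict[int, int] = {}
    for j, cut_numbers in enumerate(scenes, start=1):
        for i in cut_numbers:
            if i in scene_of:
                raise ValueError(f"cut {i} is assigned to more than one scene")
            scene_of[i] = j
    return scene_of


# First two scenes of the FIG. 2 example:
# V_S[1] merges cuts 1 to 3, V_S[2] merges cuts 4 and 5.
scene_of = build_scene_index([[1, 2, 3], [4, 5]])
```

A table of this kind lets a later stage check which scene a given cut belongs to without rescanning the scene sequence.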

Returning to FIG. 1, the element score calculation unit 12 receives the cut sequence V_C[1], ..., V_C[N_C] composed of the cut videos V_C[i] from the cut division unit 10. For each cut video V_C[i], and for each of four production-effect elements, namely "telop", "face recognition", "camera work", and "CG video likeness", the element score calculation unit 12 calculates from that cut video V_C[i] element scores S₁[i] to S₄[i] representing importance.

An element score representing importance reflects the degree of impact of the production effect appearing in the cut video V_C[i]. The element scores of the cut video V_C[i] for "telop", "face recognition", "camera work", and "CG video likeness" are called the telop score S₁[i], the face recognition score S₂[i], the camera work score S₃[i], and the CG video likeness score S₄[i], respectively.

The element score calculation unit 12 generates the element score sequences S₁[1], ..., S₁[N_C]; S₂[1], ..., S₂[N_C]; S₃[1], ..., S₃[N_C]; and S₄[1], ..., S₄[N_C], composed of the element scores S₁[i] to S₄[i] (the telop score S₁[i], the face recognition score S₂[i], the camera work score S₃[i], and the CG video likeness score S₄[i]).

The element score calculation unit 12 outputs the cut sequence V_C[1], ..., V_C[N_C] and the element score sequences S₁[1], ..., S₁[N_C]; S₂[1], ..., S₂[N_C]; S₃[1], ..., S₃[N_C]; and S₄[1], ..., S₄[N_C] to the total score calculation unit 13. Details of the element score calculation unit 12 are described later.

The total score calculation unit 13 receives, from the element score calculation unit 12, the cut sequence V_C[1], ..., V_C[N_C] composed of the cut videos V_C[i] and the element score sequences S₁[1], ..., S₁[N_C]; S₂[1], ..., S₂[N_C]; S₃[1], ..., S₃[N_C]; and S₄[1], ..., S₄[N_C] composed of the element scores S₁[i] to S₄[i].

For each cut video V_C[i], the total score calculation unit 13 integrates the element scores S₁[i] to S₄[i] using preset weight coefficients W₁ to W₄ to calculate a total score S[i], and generates the total score sequence S[1], ..., S[N_C] composed of the total scores S[i]. The total score calculation unit 13 then outputs the cut sequence V_C[1], ..., V_C[N_C] and the total score sequence S[1], ..., S[N_C] to the summary video generation unit 14. Details of the total score calculation unit 13 are described later.
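This section states only that the element scores are integrated using the weight coefficients W₁ to W₄; the exact formula is deferred to the detailed description of the total score calculation unit 13. As one possible reading, the sketch below assumes a plain weighted linear sum. The function name `total_score` and the example weights are hypothetical.

```python
from typing import Sequence


def total_score(element_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Combine the element scores S_1[i]..S_4[i] of one cut into S[i].

    Assumption for illustration: S[i] = sum_k W_k * S_k[i] (weighted linear sum).
    """
    if len(element_scores) != len(weights):
        raise ValueError("need exactly one weight per element score")
    return sum(w * s for w, s in zip(weights, element_scores))


# Hypothetical weights W_1..W_4 and element scores for a single cut.
example = total_score([1.0, 0.5, 0.0, 0.25], [0.4, 0.3, 0.2, 0.1])
```

Under this assumption, raising a weight makes the corresponding production effect count more heavily when cuts are ranked.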

The summary video generation unit 14 receives, from the total score calculation unit 13, the cut sequence V_C[1], ..., V_C[N_C] composed of the cut videos V_C[i] and the total score sequence S[1], ..., S[N_C] composed of the total scores S[i]. It also receives, from the scene generation unit 11, the scene sequence V_S[1], ..., V_S[N_S] composed of the scene videos V_S[j].

The summary video generation unit 14 refers to the total scores S[i] and the scene videos V_S[j] and selects the cut videos V_C[i] constituting the summary video until the overall length of the summary video exceeds a predetermined value. It then generates the summary video by sorting the selected cut videos V_C[i] in time series (in ascending order of frame number) and concatenating them, and outputs the summary video. Details of the summary video generation unit 14 are described later.
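The selection described here, combined with the per-scene limit recited in claim 5, can be sketched as a greedy loop. This is a minimal illustration rather than the procedure of the FIG. 6 flowchart, which lies outside this section; the function name `select_cuts`, the dictionary-based inputs, and the parameter names `target_length` and `max_per_scene` are assumptions.

```python
from collections import Counter
from typing import Dict, List


def select_cuts(
    scores: Dict[int, float],     # total score S[i] per cut number i
    durations: Dict[int, float],  # length of each cut, e.g. in seconds
    scene_of: Dict[int, int],     # scene number j containing cut i
    target_length: float,         # stop once the summary exceeds this length
    max_per_scene: int,           # cap on cuts taken from one scene
) -> List[int]:
    """Pick cuts in descending score order, then return them in time order."""
    picked: List[int] = []
    taken_from_scene: Counter = Counter()
    total = 0.0
    for i in sorted(scores, key=lambda c: scores[c], reverse=True):
        if total > target_length:
            break  # the summary is already longer than the target
        j = scene_of[i]
        if taken_from_scene[j] >= max_per_scene:
            continue  # this scene has reached its cap
        picked.append(i)
        taken_from_scene[j] += 1
        total += durations[i]
    return sorted(picked)  # cut numbers increase with time


scores = {1: 0.9, 2: 0.8, 3: 0.2, 4: 0.7, 5: 0.1}
durations = {i: 10.0 for i in scores}
scene_of = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2}
one_per_scene = select_cuts(scores, durations, scene_of, 25.0, 1)
two_per_scene = select_cuts(scores, durations, scene_of, 25.0, 2)
```

The per-scene cap keeps a single high-scoring scene from filling the whole summary, and the final sort restores broadcast order before concatenation.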

[Element Score Calculation Unit 12]
Next, the element score calculation unit 12 shown in FIG. 1 will be described in detail. As described above, for each cut video V_C[i], and for each of the four production-effect elements "telop", "face recognition", "camera work", and "CG video likeness", the element score calculation unit 12 calculates from that cut video V_C[i] the telop score S₁[i], the face recognition score S₂[i], the camera work score S₃[i], and the CG video likeness score S₄[i].

FIG. 3 is a block diagram showing a configuration example and input/output data example of the element score calculation unit 12. The element score calculation unit 12 includes a telop area detection unit 20, a face recognition processing unit 21, a camera work calculation unit 22, a CG video likeness calculation unit 23, a telop score calculation unit 24, a face recognition score calculation unit 25, a camera work score calculation unit 26, and a CG video likeness score calculation unit 27.

The element score calculation unit 12 receives the cut sequence V_C[1], ..., V_C[N_C] composed of the cut videos V_C[i] from the cut division unit 10 and outputs it to the total score calculation unit 13.

The telop area detection unit 20 receives the cut sequence V_C[1], ..., V_C[N_C] composed of the cut videos V_C[i] from the cut division unit 10, samples a frame image P[i,n] from each cut video V_C[i] every T frames, and generates the image sequence P[i,1], ..., P[i,N_P] composed of the frame images P[i,n]. The telop area detection unit 20 then outputs the image sequence P[i,1], ..., P[i,N_P] to the face recognition processing unit 21 and the CG video likeness calculation unit 23.

The parameter n = 1, ..., N_P denotes the number of a frame image P[i,n], and N_P denotes the number of sampled frame images P[i,n].
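Sampling every T frames can be illustrated as follows. The sketch assumes that sampling starts at the first frame of the cut and uses 0-based frame offsets within the cut; the source specifies neither, and the function name `sample_frame_indices` is hypothetical.

```python
from typing import List


def sample_frame_indices(num_frames: int, T: int) -> List[int]:
    """Return the 0-based offsets of the frames sampled from one cut.

    Assumption for illustration: the first frame is always sampled,
    then every T-th frame after it.
    """
    if T <= 0:
        raise ValueError("the sampling interval T must be positive")
    return list(range(0, num_frames, T))


indices = sample_frame_indices(10, 4)  # a 10-frame cut sampled every 4 frames
N_P = len(indices)                     # number of sampled frame images
```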

For each frame image P[i,n], the telop area detection unit 20 performs a "telop area detection process" that detects the area in which a telop is displayed, calculates the area of the telop area, and calculates the area ratio r_TL[i,n] of the telop area to the frame image P[i,n]. The telop area detection unit 20 then generates the area ratio sequence r_TL[i,1], ..., r_TL[i,N_P] composed of the area ratios r_TL[i,n] and outputs it to the telop score calculation unit 24. The area of the telop area is the area in which the "telop", the object related to this production effect, appears in the video.

The "telop area detection process" is known; for details see, for example, JP 2013-30963 A.

The telop score calculation unit 24 receives, from the telop area detection unit 20, the area ratio sequence r_TL[i,1], ..., r_TL[i,N_P] composed of the area ratios r_TL[i,n] of the telop area to the frame image P[i,n]. For each cut video V_C[i], the telop score calculation unit 24 then calculates the telop score S₁[i] of that cut video V_C[i] from the area ratios r_TL[i,n] by the following equation.

[Equation not reproduced in the source text]

C_TL is a normalization constant and is set in advance.

The telop score S₁[i] lies in the range 0 ≤ S₁[i] ≤ 1. The larger the telop area within the frame image P[i,n], the larger the telop score S₁[i]; the smaller the area, the smaller the score. In other words, the telop score S₁[i] increases as frame images P[i,n] with a wider telop display area appear in the cut video V_C[i]. Specifically, the telop score S₁[i] is the area ratio r_TL[i,n] of the frame image P[i,n] with the widest telop display area in the cut video V_C[i], normalized to the range 0 to 1.
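The telop score equation itself is not reproduced in this text, so the following sketch shows only one form consistent with the description: the largest area ratio in the cut, scaled by the normalization constant and clipped to the range 0 to 1. Dividing by C_TL and clipping with `min` are assumptions, as is the function name `telop_score`.

```python
from typing import Sequence


def telop_score(area_ratios: Sequence[float], c_tl: float) -> float:
    """One plausible telop score S_1[i] for a single cut.

    Assumed form (not taken from the source):
        S_1[i] = min(1, max_n r_TL[i, n] / C_TL)
    """
    if c_tl <= 0:
        raise ValueError("the normalization constant C_TL must be positive")
    if not area_ratios:
        return 0.0  # no sampled frames, so no telop evidence
    return min(1.0, max(area_ratios) / c_tl)


# Hypothetical area ratios for three sampled frames, with C_TL = 0.4.
s1 = telop_score([0.05, 0.2, 0.1], 0.4)
```

Whatever the exact equation, the properties stated in the text hold for this form: the score stays within 0 to 1 and grows with the widest telop area found in the cut.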

The telop score calculation unit 24 generates the telop score sequence S₁[1], ..., S₁[N_C] composed of the telop scores S₁[i] and outputs it to the total score calculation unit 13.

The face recognition processing unit 21 receives the cut sequence V_C[1], ..., V_C[N_C] composed of the cut videos V_C[i] from the cut division unit 10, and receives the image sequence P[i,1], ..., P[i,N_P] composed of the frame images P[i,n] from the telop area detection unit 20.

For each frame image P[i,n], the face recognition processing unit 21 performs a "face recognition process" that recognizes the faces of the main performers ID[m] of the target program (the M main performers ID[1], ..., ID[M]) and detects face regions F[i,n,k]. Suppose that K face regions F[i,n,k] are detected from the frame image P[i,n]. The face recognition processing unit 21 generates the face region sequence F[i,n,1], ..., F[i,n,K] composed of these K face regions.

The parameter m = 1, ..., M denotes the number of a main performer ID[m], and M denotes the number of main performers ID[m]. The parameter k = 1, ..., K denotes the number of a face region F[i,n,k], and K denotes the number of face regions F[i,n,k] detected from the frame image P[i,n].

The "face recognition process" is known; for details see, for example, JP 2017-33372 A.

The face recognition processing unit 21 calculates the area ratio r_FC[i,n,k] of each face area F[i,n,k] to the frame image P[i,n], and generates an area ratio series r_FC[i,n,1],...,r_FC[i,n,K] composed of the K area ratios.

The face recognition processing unit 21 performs "face probability calculation processing" that calculates the probability p_FC[i,n,k,m] that the face area F[i,n,k] is the face of the performer ID[m], and generates a probability series p_FC[i,n,k,1],...,p_FC[i,n,k,M] composed of the M probabilities. This probability p_FC[i,n,k,m] is the probability that the "face" of the performer ID[m], the object associated with this production effect, appears.

The "face probability calculation processing" is also known; as with the "face recognition processing", see, for example, Japanese Patent Application Laid-Open No. 2017-33372 for details.

The face recognition processing unit 21 outputs to the face recognition score calculation unit 25 the area ratio series r_FC[i,n,1],...,r_FC[i,n,K] and the probability series p_FC[i,n,k,1],...,p_FC[i,n,k,M].

The face recognition score calculation unit 25 receives the area ratio series r_FC[i,n,1],...,r_FC[i,n,K] and the probability series p_FC[i,n,k,1],...,p_FC[i,n,k,M] from the face recognition processing unit 21.

The face recognition score calculation unit 25 calculates a parameter w_FC[i,n,k] by formula (2), based on the probabilities p_FC[i,n,k,m] that the face area F[i,n,k] in the frame image P[i,n] is the face of the performer ID[m].

C'_FC is a preset value that determines the minimum value of the parameter w_FC[i,n,k].

The parameter w_FC[i,n,k] becomes larger as the maximum probability p_FC[i,n,k,m] for the face area F[i,n,k] in the frame image P[i,n] becomes lower, and smaller as that maximum probability becomes higher.

The face recognition score calculation unit 25 calculates a parameter R_FC[i,n] by the following formula, based on the area ratio r_FC[i,n,k] of the face area F[i,n,k] to the frame image P[i,n] and on the parameter w_FC[i,n,k] calculated by formula (2).

R_FC[i,n] = Σ_{k=1}^{K} w_FC[i,n,k] · r_FC[i,n,k]   (3)

That is, for the frame image P[i,n], the parameter R_FC[i,n] is obtained by multiplying the parameter w_FC[i,n,k] of each face area F[i,n,k] by the area ratio r_FC[i,n,k] of that face area, and summing the products over all face areas F[i,n,k].

The parameter R_FC[i,n] becomes larger as the parameter w_FC[i,n,k] becomes larger (as the probability p_FC[i,n,k,m] becomes lower), and smaller as w_FC[i,n,k] becomes smaller (as the probability p_FC[i,n,k,m] becomes higher). The parameter R_FC[i,n] also becomes larger as the area ratio r_FC[i,n,k] becomes higher, and smaller as the area ratio becomes lower.

For each cut video V_C[i], the face recognition score calculation unit 25 calculates the face recognition score S₂[i] of that cut video by a formula based on the parameter R_FC[i,n] calculated by formula (3).

C_FC is a normalization constant and is set in advance.

The face recognition score S₂[i] lies in the range 0 ≤ S₂[i] ≤ 1. It becomes larger as the parameter R_FC[i,n] becomes larger, and smaller as R_FC[i,n] becomes smaller. In other words, S₂[i] becomes larger as the probability p_FC[i,n,k,m] becomes lower and smaller as that probability becomes higher; it becomes larger as the area ratio r_FC[i,n,k] becomes higher and smaller as the area ratio becomes lower.

The probability p_FC[i,n,k,m] is the probability that the face area F[i,n,k] is the face of the main performer ID[m]. A low probability p_FC[i,n,k,m] therefore means that the face is likely to belong to someone other than a main performer (a guest), and a high probability means that it is unlikely to be a guest's face.

Consequently, the face recognition score S₂[i] becomes larger as the total area occupied by the faces of the main performers ID[m] in the frame images P[i,n] of the cut video V_C[i] becomes wider, and smaller as that total area becomes narrower. It also becomes larger as the probability that a guest's face appears in those frame images becomes higher, and smaller as that probability becomes lower. In other words, S₂[i] is large when the cut video V_C[i] contains a frame image P[i,n] in which the faces of the main performers occupy a wide total area, or a frame image in which a guest's face is likely to appear.
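The face score pipeline described above can be sketched in Python. Only the weighted sum for R_FC[i,n] follows formula (3) as stated in the text; the forms chosen here for w_FC (one minus the best main-performer probability, floored at C'_FC) and for S₂ (largest R_FC in the cut, divided by C_FC and clipped to 1) are assumptions that merely match the described behavior. The function names and default constants are hypothetical.

```python
def face_weight(probs, c_min=0.1):
    """Assumed form of w_FC: large when the best main-performer match is weak.

    probs: probabilities p_FC[i,n,k,1..M] for one face area.
    c_min: stands in for C'_FC, the minimum value of the weight.
    """
    return max(c_min, 1.0 - max(probs))


def frame_face_param(area_ratios, prob_rows, c_min=0.1):
    """R_FC[i,n]: sum over face areas of w_FC * r_FC, as described for formula (3)."""
    return sum(face_weight(p, c_min) * r for r, p in zip(area_ratios, prob_rows))


def face_score(frames, c_norm=0.5, c_min=0.1):
    """Assumed form of S2[i]: largest R_FC in the cut, normalized and clipped to [0, 1].

    frames: list of (area_ratios, prob_rows) tuples, one per frame image.
    c_norm: stands in for the normalization constant C_FC.
    """
    if not frames:
        return 0.0
    best = max(frame_face_param(r, p, c_min) for r, p in frames)
    return min(1.0, best / c_norm)


# One frame with a likely main performer, one with a likely guest.
main_frame = ([0.2], [[0.9, 0.05]])
guest_frame = ([0.2], [[0.1, 0.2]])
print(face_score([main_frame]), face_score([guest_frame]))
```

With equal face area, the guest frame scores higher than the main-performer frame, while the floor c_min keeps main-performer faces contributing in proportion to their area.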

The face recognition score calculation unit 25 generates a face recognition score series S₂[1],...,S₂[N_C] composed of the face recognition scores S₂[i] and outputs it to the total score calculation unit 13.

The camera work calculation unit 22 receives the cut series V_C[1],...,V_C[N_C] composed of the cut videos V_C[i] from the cut dividing unit 10 and performs "camera work detection processing". For each cut video V_C[i], the camera work calculation unit 22 obtains the video sections V_CW[i,q] in which predetermined camera work such as zooming or panning occurs, and generates a video section series V_CW[i,1],...,V_CW[i,N_CW] composed of these video sections.

The parameter q = 1,...,N_CW is the number of a video section V_CW[i,q], and N_CW is the number of video sections V_CW[i,q] detected in the cut video V_C[i].

The "camera work detection processing" is known; for details, see, for example, Japanese Patent Application Laid-Open No. H10-243340.

For each video section V_CW[i,q], the camera work calculation unit 22 calculates the amount of camera motion and then the motion rate r_CW[i,q], which is the amount of camera motion normalized by (divided by) the length of the image diagonal. It generates a motion rate series r_CW[i,1],...,r_CW[i,N_CW] composed of the motion rates r_CW[i,q] and outputs this series to the camera work score calculation unit 26. The motion rate r_CW[i,q] is the motion rate of the "camera work", the object associated with this production effect.

The camera work score calculation unit 26 receives the motion rate series r_CW[i,1],...,r_CW[i,N_CW] composed of the motion rates r_CW[i,q] from the camera work calculation unit 22 and, for each cut video V_C[i], calculates the camera work score S₃[i] of that cut video by a formula based on the motion rates r_CW[i,q].

C_CW is a normalization constant and is set in advance.

The camera work score S₃[i] lies in the range 0 ≤ S₃[i] ≤ 1. It becomes larger as the amount of camera motion in the cut video V_C[i] becomes larger, and smaller as the amount of camera motion becomes smaller. In other words, S₃[i] is large when the cut video V_C[i] contains video sections V_CW[i,q] with a large amount of camera motion.
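As a rough illustration of the camera work score, the sketch below computes the motion rate exactly as described (motion divided by the image diagonal) and then combines the rates of one cut. The combining rule, the sum of the rates divided by C_CW and clipped to 1, is an assumption chosen only to satisfy the stated range and monotonic behavior; the function names and the pixel units are likewise hypothetical.

```python
import math


def motion_rate(motion_px, width, height):
    """r_CW[i,q]: camera motion normalized by the length of the image diagonal."""
    return motion_px / math.hypot(width, height)


def camera_work_score(motion_rates, c_norm=1.0):
    """Assumed form of S3[i]: summed motion rates, normalized by c_norm (C_CW), clipped to 1."""
    return min(1.0, sum(motion_rates) / c_norm)


# Two detected camera-work sections in a 1920x1080 cut.
rates = [motion_rate(400, 1920, 1080), motion_rate(250, 1920, 1080)]
print(camera_work_score(rates))
```

A cut with no detected camera work receives a score of 0 under this reading, and long pans or zooms saturate at 1.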

The camera work score calculation unit 26 generates a camera work score series S₃[1],...,S₃[N_C] composed of the camera work scores S₃[i] and outputs it to the total score calculation unit 13.

The CG video likelihood calculation unit 23 receives the cut series V_C[1],...,V_C[N_C] composed of the cut videos V_C[i] from the cut dividing unit 10, and receives the image series P[i,1],...,P[i,N_P] composed of the frame images P[i,n] from the telop area detection unit 20.

For each frame image P[i,n], the CG video likelihood calculation unit 23 performs "discrimination processing using a recognition model of a support vector machine (SVM)" and calculates the probability p_CG[i,n] that the frame is CG video. It then generates a probability series p_CG[i,1],...,p_CG[i,N_P] composed of these probabilities. The probability p_CG[i,n] lies in the range 0 ≤ p_CG[i,n] ≤ 1.

The support vector machine is trained in advance on the difference between the image features of CG video and those of non-CG images. CG video tends to have high saturation and artificial texture features. Accordingly, the image features used include a histogram of the S value (saturation) in the HSV color space, an edge direction histogram, or fractal features, which discriminate well between natural and artificial objects.
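One of the features named above, the histogram of the HSV saturation value, can be computed with the Python standard library alone. The sketch below is only an illustration of that single feature: the bin count, the normalization to relative frequencies, and the pixel input format (8-bit RGB tuples) are assumptions, and the SVM training itself is not shown.

```python
import colorsys


def saturation_histogram(pixels, bins=8):
    """Normalized histogram of the HSV S value for an iterable of (r, g, b) pixels in 0..255."""
    counts = [0] * bins
    total = 0
    for r, g, b in pixels:
        _, s, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # s is in [0, 1]; the last bin also takes s == 1.0.
        counts[min(int(s * bins), bins - 1)] += 1
        total += 1
    if total == 0:
        return [0.0] * bins
    return [c / total for c in counts]


# A fully saturated red pixel and a neutral gray pixel.
print(saturation_histogram([(255, 0, 0), (128, 128, 128)]))
```

Frames dominated by vivid, flat colors concentrate their mass in the upper bins, which is the tendency the text attributes to CG video.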

The "discrimination processing using a recognition model of a support vector machine (SVM)" is known, so a detailed description is omitted here. For details of the fractal features, see, for example, Japanese Patent Application Laid-Open No. 2001-56820.

The CG video likelihood calculation unit 23 outputs the probability series p_CG[i,1],...,p_CG[i,N_P] composed of the probabilities p_CG[i,n] to the CG video likelihood score calculation unit 27.

The CG video likelihood score calculation unit 27 receives the probability series p_CG[i,1],...,p_CG[i,N_P] composed of the probabilities p_CG[i,n] from the CG video likelihood calculation unit 23. For each cut video V_C[i], it then calculates the CG video likelihood score S₄[i] by a formula based on the probabilities p_CG[i,n].

The CG video likelihood score S₄[i] lies in the range 0 ≤ S₄[i] ≤ 1. It becomes larger as the maximum of the probabilities p_CG[i,n] over the frame images P[i,n] of the cut video V_C[i] becomes larger, and smaller as that maximum becomes smaller. In other words, S₄[i] is large when the cut video V_C[i] contains a frame image P[i,n] with a high probability p_CG[i,n] of being CG video.
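Since the score is described as tracking the maximum frame probability within the cut, a minimal sketch is to take that maximum directly. This is an assumed reading of the formula, which is not reproduced in the text, and the function name is hypothetical.

```python
def cg_score(frame_probs):
    """Assumed form of S4[i]: the largest per-frame CG probability p_CG[i,n] in the cut."""
    return max(frame_probs, default=0.0)


print(cg_score([0.1, 0.85, 0.3]))
```

Because each p_CG[i,n] is already in [0, 1], the maximum needs no further normalization to stay in the stated range.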

The CG video likelihood score calculation unit 27 generates a CG video likelihood score series S₄[1],...,S₄[N_C] composed of the CG video likelihood scores S₄[i] and outputs it to the total score calculation unit 13.

[Total Score Calculation Unit 13]
Next, the total score calculation unit 13 shown in FIG. 1 is described in detail. As described above, for each cut video V_C[i], the total score calculation unit 13 integrates the telop score S₁[i], the face recognition score S₂[i], the camera work score S₃[i], and the CG video likelihood score S₄[i] to calculate the total score S[i].

FIG. 4 is a block diagram showing a configuration example and an input/output data example of the total score calculation unit 13. The total score calculation unit 13 includes a weight coefficient setting unit 30 and a score calculation unit 31.

The total score calculation unit 13 receives the cut series V_C[1],...,V_C[N_C] composed of the cut videos V_C[i] from the element score calculation unit 12 and outputs it to the summary video generation unit 14.

For each of the elements, namely the telop score S₁[i], the face recognition score S₂[i], the camera work score S₃[i], and the CG video likelihood score S₄[i], the weight coefficient setting unit 30 sets a weight coefficient W_r (r = 1,...,4) that determines how strongly that element is reflected in the total score S[i]. The weight coefficient setting unit 30 then outputs the weight coefficients W₁ to W₄ to the score calculation unit 31. The weight coefficients W₁ to W₄ can be chosen freely through operation by the user (the producer of the summary video) and are set in advance.

The score calculation unit 31 receives the telop score series S₁[1],...,S₁[N_C] from the telop score calculation unit 24 of the element score calculation unit 12, and the face recognition score series S₂[1],...,S₂[N_C] from the face recognition score calculation unit 25. The score calculation unit 31 also receives the camera work score series S₃[1],...,S₃[N_C] from the camera work score calculation unit 26, and the CG video likelihood score series S₄[1],...,S₄[N_C] from the CG video likelihood score calculation unit 27. In addition, the score calculation unit 31 receives the weight coefficients W₁ to W₄ from the weight coefficient setting unit 30.

The score calculation unit 31 calculates the total score S[i] of the cut video V_C[i] by the following formula, multiplying the telop score S₁[i], the face recognition score S₂[i], the camera work score S₃[i], and the CG video likelihood score S₄[i] by their respective weight coefficients W_r and summing the products.

S[i] = Σ_{r=1}^{4} W_r · S_r[i]

The score calculation unit 31 generates a total score series S[1],...,S[N_C] composed of the total scores S[i] of the cut videos V_C[i] and outputs it to the summary video generation unit 14.

In this way, the total score S[i] reflects the score of each element according to the corresponding weight coefficient W_r. The score of an element whose weight coefficient W_r is set high is strongly reflected in the total score S[i], whereas the score of an element whose weight coefficient W_r is set low is hardly reflected.

For example, when a large value is set for the weight coefficient W₂ of the face recognition score S₂[i] and small values are set for the other weight coefficients W₁, W₃, and W₄, a total score S[i] that strongly reflects the face recognition score S₂[i] is generated. The summary video generation unit 14 described later then generates a summary video in which the faces of the main performers ID[m] or of guests appear frequently. Similarly, when a large value is set for the weight coefficient W₄ of the CG video likelihood score S₄[i] and small values are set for the other weight coefficients W₁, W₂, and W₃, a total score S[i] that strongly reflects the CG video likelihood score S₄[i] is generated, and the summary video generation unit 14 generates a summary video in which CG video appears frequently.
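The weighted integration described above amounts to a per-cut weighted sum of the four element scores. The following sketch shows it with made-up scores and a weight vector that emphasizes the face recognition score; the function name and the sample values are illustrative only.

```python
def total_scores(element_scores, weights):
    """S[i] = sum over r of W_r * S_r[i].

    element_scores: four series [S1, S2, S3, S4], each of length N_C.
    weights: the four weight coefficients [W1, W2, W3, W4].
    """
    n_cuts = len(element_scores[0])
    return [
        sum(w * series[i] for w, series in zip(weights, element_scores))
        for i in range(n_cuts)
    ]


# Two cuts; cut 0 has a strong face score, cut 1 a strong telop score.
scores = [
    [0.2, 0.8],  # S1: telop
    [0.9, 0.1],  # S2: face recognition
    [0.0, 0.5],  # S3: camera work
    [0.1, 0.1],  # S4: CG video likelihood
]
face_heavy = [0.1, 1.0, 0.1, 0.1]
print(total_scores(scores, face_heavy))
```

With the face-heavy weights, cut 0 receives the higher total score even though cut 1 leads on telop and camera work.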

[Summary Video Generation Unit 14]
Next, the summary video generation unit 14 shown in FIG. 1 is described in detail. As described above, the summary video generation unit 14 refers to the total scores S[i] and the scene videos V_S[j], selects cut videos V_C[i] until the length of the entire summary video exceeds a predetermined value, and concatenates the selected cut videos V_C[i] to generate the summary video.

FIG. 5 is a block diagram showing a configuration example and an input/output data example of the summary video generation unit 14. The summary video generation unit 14 includes a summary video selection unit 40 and a summary video output unit 41.

The summary video selection unit 40 receives, from the total score calculation unit 13, the cut series V_C[1],...,V_C[N_C] composed of the cut videos V_C[i] and the total score series S[1],...,S[N_C] composed of the total scores S[i]. It also receives, from the scene generation unit 11, the scene series V_S[1],...,V_S[N_S] composed of the scene videos V_S[j].

The summary video selection unit 40 sorts all the cut videos V_C[i] in descending order of importance based on the total score S[i], and selects the sorted cut videos V_C[I[i]] in order until the total length of the selected cut videos (the length of the entire summary video) exceeds a predetermined value. Descending order of importance means descending order of the combined production effect of "telop", "face recognition", "camera work", and "CG video likelihood".

When selecting the cut videos V_C[I[i]], the summary video selection unit 40 ensures that the number of cut videos selected within any one scene video V_S[j] does not exceed a predetermined value. It then outputs the selected cut videos V_C[I[i]] to the summary video output unit 41.

The summary video output unit 41 receives the selected cut videos V_C[I[i]] from the summary video selection unit 40, concatenates them in chronological order to generate the summary video V_C[i'₁]...V_C[i'_L], and outputs it. L is the number of selected cut videos V_C[i], that is, the number of cut videos V_C[i'₁],...,V_C[i'_L] that make up the summary video.
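The selection and output steps just described can be sketched as a greedy loop. The sketch assumes cut lengths measured in frames and reads the per-scene limit as "at most n_max cuts per scene"; that reading, the function name, and the argument layout are assumptions, not the flowchart of FIG. 6 verbatim.

```python
def select_summary(scores, lengths, scene_of, t_max, n_max):
    """Pick cuts by descending total score, then return them in chronological order.

    scores:   total score S[i] per cut (list index = chronological cut index).
    lengths:  length of each cut in frames.
    scene_of: scene number of each cut.
    t_max:    stop once the selected length exceeds this many frames.
    n_max:    assumed cap on the number of cuts taken from one scene.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    selected = []
    per_scene = {}
    total = 0
    for i in order:
        scene = scene_of[i]
        if per_scene.get(scene, 0) >= n_max:
            continue  # this scene has already contributed its share
        selected.append(i)
        per_scene[scene] = per_scene.get(scene, 0) + 1
        total += lengths[i]
        if total > t_max:
            break  # the summary is long enough
    return sorted(selected)  # chronological order for concatenation


cuts = select_summary(
    scores=[0.9, 0.8, 0.7, 0.6, 0.5],
    lengths=[100, 100, 100, 100, 100],
    scene_of=[0, 0, 0, 1, 1],
    t_max=250,
    n_max=2,
)
print(cuts)
```

In this example the third-best cut is skipped because its scene has already supplied two cuts, so the next cut comes from a different scene. The cap spreads the summary across scenes instead of letting one high-scoring scene dominate.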

[Operation of Summary Video Generation Unit 14]
FIG. 6 is a flowchart showing a processing example of the summary video generation unit 14, and FIG. 7 is a diagram explaining the flowchart of FIG. 6. The summary video selection unit 40 of the summary video generation unit 14 sorts the cut series V_C[1],...,V_C[N_C] in descending order of the scores in the total score series S[1],...,S[N_C], and generates the cut series V_C[I[1]],...,V_C[I[N_C]] (step S601).

For example, as shown in FIG. 7, when the scores in the total score series S[1],...,S[N_C] satisfy S[8] > S[4] > S[1] > S[10] > S[3] > ... > S[20], the processing of step S601 yields the sorted cut series V_C[I[1]] = V_C[8], V_C[I[2]] = V_C[4], V_C[I[3]] = V_C[1], V_C[I[4]] = V_C[10], V_C[I[5]] = V_C[3], ..., V_C[I[N_C]] = V_C[20].
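The index mapping I produced by step S601 is simply the list of cut numbers ordered by descending total score. A small sketch, with made-up score values that respect the ordering of the FIG. 7 example (only six of the cuts are shown, and the function name is hypothetical):

```python
def sort_cuts_by_score(scores):
    """Return I: the cut numbers ordered by descending total score S[i].

    scores: dict mapping cut number i to its total score S[i].
    """
    return sorted(scores, key=lambda i: scores[i], reverse=True)


# Illustrative values only; they follow S[8] > S[4] > S[1] > S[10] > S[3] > S[20].
example = {1: 0.7, 3: 0.5, 4: 0.8, 8: 0.9, 10: 0.6, 20: 0.1}
print(sort_cuts_by_score(example))
```

The first element of the result plays the role of I[1], the second of I[2], and so on.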

Returning to FIG. 6, as initial settings the summary video selection unit 40 sets all flags Select[i] for the parameter i = 1,...,N_C to "false" (Select[i] = false (i = 1,...,N_C)) and sets all counts Count[j] for the parameter j = 1,...,N_S to 0 (Count[j] = 0 (j = 1,...,N_S)) (step S602). Note that this parameter i is used for the description of FIG. 6 and FIG. 7 and differs from the parameter i of the cut video V_C[i], the total score S[i], and so on.

The flag Select[i] is set to "true" when the cut video V_C[I[i]] is selected as part of the summary video in steps S604 and S605 described later, and keeps its initial value "false" when the cut video V_C[I[i]] is not selected. The count Count[j] indicates, for the scene video V_S[j] with scene number j, how many of the cut videos V_C[I[i]] belonging to that scene video have been selected as part of the summary video.

In the following, the parameter i = 1,...,N_C is associated with each element of the sorted cut series V_C[I[1]],...,V_C[I[N_C]], and the summary video selection processing is performed for each sorted cut video V_C[I[i]]. As described later for step S606, the selection processing continues until the total number of frames of the cut videos V_C[I[i]] selected for the summary video exceeds a predetermined value T_MAX.

The summary video selection unit 40 sets the parameter i to 1 (i = 1, step S603) and determines whether the following condition is satisfied: the flag Select[i] is set to "true", or the count Count[J[i]] is larger than a predetermined value N_MAX (step S604). That is, the summary video selection unit 40 determines whether the cut video V_C[I[i]] for the parameter i has already been selected as part of the summary video, or whether the number of cut videos already selected as part of the summary video from the scene video V_S[J[i]] with scene number J[i], to which that cut video belongs, is larger than the predetermined value N_MAX.

J[i] is the scene number of the scene to which the cut video V_C[I[i]] belongs. N_MAX is the preset maximum number of cut videos V_C[I[i]] selected from one scene video V_S[J[i]].

When the summary video selection unit 40 determines in step S604 that the condition is not satisfied (step S604: N), it selects the cut video V_C[I[i]] as part of the summary video, sets the flag Select[i] to "true" (Select[i] = true), and increments the count Count[J[i]] (Count[J[i]] = Count[J[i]] + 1) (step S605). That is, the processing of step S605 is performed when the flag Select[i] is not "true" (it is "false") and the count Count[J[i]] is not larger than the predetermined value N_MAX. In other words, step S605 is performed when the cut video V_C[I[i]] has not yet been selected as part of the summary video and the number of cut videos already selected from the scene video V_S[J[i]] to which it belongs is not larger than N_MAX.

For example, as shown in FIG. 7, step S605 changes the flags Select[1],...,Select[5] and so on to "true" among the flags Select[1],...,Select[N_C] that were initially set to "false".

図６に戻って、要約映像選択部４０は、パラメータi=1から現在のパラメータiまでにおいて、フラグSelect[i]が「true」（Select[i]=true）に設定されている全てのカット映像V_C[I[i]]の合計フレーム数を算出し、合計フレーム数が所定値T_MAXよりも大きいか否かを判定する（ステップＳ６０６）。すなわち、要約映像選択部４０は、要約映像の一部として選択された全てのカット映像V_C[i]の全フレーム数が所定値T_MAXよりも大きいか否か、つまり、要約映像全体の長さが所定値T_MAXよりも大きいか否かを判定する。T_MAXは、利用者が生成したい要約映像の長さ（フレーム数）を示し、予め設定される。 Returning to FIG. 6, the summary video selection unit 40 performs all the cuts in which the flag Select [i] is set to “true” (Select [i] = true) from the parameter i = 1 to the current parameter i. The total number of frames of the video V _C [I [i]] is calculated, and it is determined whether or not the total number of frames is greater than a predetermined value T _MAX (step S606). That is, the summary video selection unit 40 determines whether or not the total number of frames of all cut videos V _C [i] selected as a part of the summary video is greater than the predetermined value T _MAX , that is, the length of the entire summary video. Whether or not is greater than a predetermined value T _MAX is determined. T _MAX indicates the length (number of frames) of the summary video that the user wants to generate and is preset.

If the summary video selection unit 40 determines in step S606 that the total number of frames of the cut videos V_C[i] selected as part of the summary video is not greater than T_MAX (step S606: N), that is, that the length of the entire summary video is not greater than T_MAX, the process returns to step S604. In this case, because the flag Select[i] for the current parameter i was set to "true" in step S605, the process moves from step S604 to step S607, and the next cut video V_C[I[i+1]] is processed.

On the other hand, if the summary video selection unit 40 determines in step S606 that the total number of frames of the cut videos V_C[i] selected as part of the summary video is greater than T_MAX (step S606: Y), that is, that the length of the entire summary video is greater than T_MAX, the process proceeds to step S611.

On the other hand, if the summary video selection unit 40 determines in step S604 that the condition "the flag Select[i] is set to 'true', or the count Count[J[i]] is greater than the predetermined value N_MAX" is satisfied (step S604: Y), it increments the parameter i (i = i + 1, step S607). That is, the summary video selection unit 40 performs step S607 when the flag Select[i] is set to "true", or when the count Count[J[i]] is greater than N_MAX.

In other words, when the cut video V_C[I[i]] has already been selected as part of the summary video, the summary video selection unit 40 proceeds to step S607 in order to process the next cut video V_C[I[i+1]]. Likewise, when the number of cut videos already selected as part of the summary video in the scene video V_S[J[i]] with scene number J[i], to which the cut video V_C[I[i]] belongs, is greater than N_MAX, the unit performs step S607 in order to move on to the remaining cut videos. In this case, the remaining cut videos V_C[I[i]] of that scene video V_S[J[i]] are not selected for the summary video.

The summary video selection unit 40 determines whether the parameter i is greater than the predetermined value N_C (i > N_C, step S608). That is, it determines whether all cut videos V_C[I[i]] have been processed.

If the summary video selection unit 40 determines in step S608 that the parameter i is not greater than N_C (step S608: N), the process returns to step S604 and the cut video V_C[I[i]] is processed.

On the other hand, if the summary video selection unit 40 determines in step S608 that the parameter i is greater than N_C (step S608: Y), that is, that all cut videos V_C[I[i]] have been processed, it increments the predetermined value N_MAX (N_MAX = N_MAX + 1, step S609).

As a result, the predetermined value N_MAX used in step S604 (the maximum number of cut videos V_C[I[i]] selected from one scene video V_S[J[i]]) is incremented. When the process then moves from step S610, described below, to step S603, the cut videos V_C[I[i]] are processed again in order, starting from the first parameter i = 1. Cut videos V_C[I[i]] therefore continue to be added to the summary video until the condition of step S606 is satisfied.

The summary video selection unit 40 determines whether the predetermined value N_MAX is greater than the predetermined value N_C (step S610). If it determines that N_MAX is not greater than N_C (step S610: N), the process proceeds to step S603. If it determines that N_MAX is greater than N_C (step S610: Y), the process proceeds to step S611.

Following step S606 (Y) or step S610 (Y), the summary video output unit 41 sorts all cut videos V_C[I[i]] selected in step S605 (all cut videos V_C[I[i]] with flag Select[i] = true) into time-series order and concatenates them. The summary video output unit 41 then generates and outputs the concatenated video V_C[i'₁] ... V_C[i'_L] as the summary video (step S611). L denotes the number of selected cut videos V_C[I[i]].

For example, as shown in FIG. 7, in step S611 all cut videos with flag Select[i] = true, such as V_C[I[1]] = V_C[8], V_C[I[2]] = V_C[4], V_C[I[3]] = V_C[1], V_C[I[4]] = V_C[10], and V_C[I[5]] = V_C[3], are sorted into time-series order and concatenated. The concatenated video V_C[i'₁] = V_C[1], V_C[i'₂] = V_C[3], V_C[i'₃] = V_C[4], ..., V_C[i'_L] = V_C[25] is generated as the summary video.
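The selection loop of steps S603 to S611 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function name `select_cuts`, its argument names, and the default initial value of `n_max` are hypothetical, and the sketch assumes that cut numbers increase in chronological order, so that sorting by cut number gives the time-series order. The comparison in step S604 is written exactly as the description states it (count greater than N_MAX).

```python
def select_cuts(ranked, scene_of, frames, t_max, n_max=1):
    """Pick cuts for the summary, then return them in time-series order.

    ranked   -- cut numbers in descending total-score order (I[1..N_C])
    scene_of -- dict: cut number -> scene number (gives J[i])
    frames   -- dict: cut number -> length in frames
    t_max    -- target summary length in frames (T_MAX)
    n_max    -- initial per-scene limit (N_MAX); the default is an assumption
    """
    n_c = len(ranked)
    selected = [False] * n_c          # Select[i], initialised to false
    count = {}                        # Count[j], implicitly 0
    total_frames = 0

    def result():
        # Step S611: sort the selected cuts chronologically.
        return sorted(c for pos, c in enumerate(ranked) if selected[pos])

    while n_max <= n_c:               # step S610: stop once N_MAX > N_C
        for pos, cut in enumerate(ranked):           # steps S603, S607, S608
            scene = scene_of[cut]
            # Step S604: skip cuts already chosen or from a saturated scene.
            if selected[pos] or count.get(scene, 0) > n_max:
                continue
            # Step S605: select the cut and update the scene counter.
            selected[pos] = True
            count[scene] = count.get(scene, 0) + 1
            total_frames += frames[cut]
            # Step S606: finish as soon as the summary exceeds T_MAX.
            if total_frames > t_max:
                return result()
        n_max += 1                    # step S609: relax the per-scene limit
    return result()
```

Because the per-scene limit is relaxed only after a full pass over the ranked cuts, highly scored cuts from other scenes are considered before a single scene is allowed to contribute further cuts.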

As described above, in the summary video generation device 1 according to the embodiment of the present invention, the cut division unit 10 divides the program video into cut videos V_C[i] (i = 1, ..., N_C), and the scene generation unit 11 integrates the cut videos V_C[i] of the same scene to generate scene videos V_S[j] (j = 1, ..., N_S).

The element score calculation unit 12 calculates, for each cut video V_C[i] and for each of the four effect elements "telop", "face recognition", "camera work", and "CG video likelihood", the element scores S₁[i] to S₄[i] (telop score S₁[i], face recognition score S₂[i], camera work score S₃[i], and CG video likelihood score S₄[i]).

The total score calculation unit 13 calculates the total score S[i] of each cut video V_C[i] based on the element scores S₁[i] to S₄[i] and the weighting coefficients W₁ to W₄.

The summary video generation unit 14 refers to the total scores S[i] and the scene videos V_S[j], selects cut videos V_C[i] until the length of the entire summary video exceeds a predetermined value, and generates the summary video by sorting the selected cut videos V_C[i] into time-series order and concatenating them.

This makes it possible to generate a summary video that takes into account elements which tend to occur in scenes that are important to the program's production, such as "telop", "face recognition", "camera work", and "CG video likelihood" in the program video.

In other words, by taking into account the effects that appear in the program video, it becomes possible to generate a summary video composed only of important sections in which those effects have a strong impact. Distributing such a summary video as Internet content allows it to serve as a medium for increasing viewers' interest in the program.

Further, in the summary video generation device 1 according to the embodiment of the present invention, the total score calculation unit 13 uses the weighting coefficients W₁ to W₄, one set for each of the element scores S₁[i] to S₄[i], when calculating the total score S[i] of the cut video V_C[i].

As a result, the score of each element is reflected in the total score S[i] in proportion to its weighting coefficient W₁ to W₄. That is, by freely setting the weighting coefficients W₁ to W₄ to decide which elements to emphasize, summary videos in many variations can be generated in line with the user's intention.
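The effect of the weighting coefficients can be illustrated with a short Python sketch. This passage states only that S[i] is calculated from the element scores and the weights; the weighted sum used below is an assumption, and the function names `total_score` and `rank_cuts` are hypothetical.

```python
def total_score(element_scores, weights):
    """Combine element scores S_1[i]..S_4[i] with weights W_1..W_4.

    A weighted sum is assumed here for illustration only.
    """
    if len(element_scores) != len(weights):
        raise ValueError("one weight is required per element score")
    return sum(w * s for w, s in zip(weights, element_scores))


def rank_cuts(scores_per_cut, weights):
    """Return cut numbers (1-based) in descending order of total score."""
    totals = [total_score(s, weights) for s in scores_per_cut]
    order = sorted(range(len(totals)), key=lambda k: totals[k], reverse=True)
    return [k + 1 for k in order]
```

For instance, with element scores ordered as (telop, face recognition, camera work, CG video likelihood), raising the first weight moves cuts with prominent telops toward the front of the ranking, which changes the cuts that end up in the summary.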

Further, in the summary video generation device 1 according to the embodiment of the present invention, when selecting the cut videos V_C[i] for the summary video, the summary video generation unit 14 ensures that the number of cut videos V_C[i] selected within a scene video V_S[j] does not exceed the predetermined value N_MAX. Because cut videos V_C[i] are therefore not selected disproportionately from a particular scene, the summary video is not biased toward that scene. That is, a summary video that takes all scenes of the program video into account is generated.

The present invention has been described above with reference to an embodiment, but it is not limited to that embodiment and can be modified in various ways without departing from its technical idea. In the embodiment, the summary video is generated using "telop", "face recognition", "camera work", and "CG video likelihood" as the types of effect, but the types of effect are not limited to these four elements, and other elements may be used. For example, the summary video may be generated using sound volume, music, conversation, a specific object, or the like as effect elements. The summary video may also be generated using any predetermined number of these effect elements.

As described above, an element score represents a degree of importance that reflects how strong the impact of the effect is. When the effect is "sound volume", the element score is therefore set, for example, to a value proportional to the volume. When the effect is "music", the element score is set according to the proportion of time during which music is playing in the cut video V_C[i]. When the effect is "specific object", the element score is set, as in the "face recognition" case described above, based on the area ratio of the region in which the object is detected and the probability that the detected region is that object.
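Two of these alternative element scores can be sketched in Python. The description gives only the quantities each score depends on, so the concrete formulas below are assumptions: the music score is taken to be the time ratio itself, and the object score is taken to be the mean of area ratio times probability over the detections. The function names and argument layouts are hypothetical.

```python
def music_score(music_frames, total_frames):
    """Fraction of the cut during which music is playing (assumed formula)."""
    if total_frames <= 0:
        raise ValueError("total_frames must be positive")
    return music_frames / total_frames


def object_score(detections, frame_area):
    """Score for a 'specific object' effect (assumed formula).

    detections -- list of (region_area, probability) pairs, one per detection
    frame_area -- area of a full frame, in the same unit as region_area
    """
    if frame_area <= 0:
        raise ValueError("frame_area must be positive")
    if not detections:
        return 0.0
    weighted = [(area / frame_area) * prob for area, prob in detections]
    return sum(weighted) / len(weighted)
```

Both functions return values between 0 and 1 when their inputs are consistent, which keeps them on a comparable scale before weighting.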

An ordinary computer can be used as the hardware configuration of the summary video generation device 1 according to the embodiment of the present invention. The summary video generation device 1 is constituted by a computer including a CPU, a volatile storage medium such as RAM, a nonvolatile storage medium such as ROM, interfaces, and the like. The functions of the cut division unit 10, the scene generation unit 11, the element score calculation unit 12, the total score calculation unit 13, and the summary video generation unit 14 of the summary video generation device 1 are each realized by causing the CPU to execute a program describing that function. These programs are stored in the storage medium and are read out and executed by the CPU. The programs can also be stored and distributed on a storage medium such as a magnetic disk (floppy (registered trademark) disk, hard disk, etc.), an optical disk (CD-ROM, DVD, etc.), or a semiconductor memory, and can be transmitted and received over a network.

DESCRIPTION OF SYMBOLS
1 Summary video generation device
10 Cut division unit
11 Scene generation unit
12 Element score calculation unit
13 Total score calculation unit
14 Summary video generation unit
20 Telop region detection unit
21 Face recognition processing unit
22 Camera work calculation unit
23 CG video likelihood calculation unit
24 Telop score calculation unit
25 Face recognition score calculation unit
26 Camera work score calculation unit
27 CG video likelihood score calculation unit
30 Weighting coefficient setting unit
31 Score calculation unit
40 Summary video selection unit
41 Summary video output unit

Claims

A summary video generation device that generates a summary video from a video, comprising:
a cut division unit that divides the video into a plurality of cut videos, cut by cut;
a score calculation unit that calculates, for each of the plurality of cut videos divided by the cut division unit and for each of a predetermined number of different effects, a score representing the importance of the effect;
a total score calculation unit that calculates, for each of the plurality of cut videos divided by the cut division unit, a total score based on the scores for the respective effects calculated by the score calculation unit; and
a summary video generation unit that selects, from the plurality of cut videos, the cut videos constituting the summary video based on the total scores calculated by the total score calculation unit, and generates the summary video.

The summary video generation device according to claim 1, wherein
the total score calculation unit
calculates, for each of the plurality of cut videos, the total score based on the scores for the respective effects calculated by the score calculation unit and a weighting coefficient set in advance for each effect.

The summary video generation device according to claim 1 or 2,
further comprising a scene generation unit that generates, from the plurality of cut videos divided by the cut division unit, a scene video consisting of the cut videos of the same scene, wherein
the summary video generation unit
selects the cut videos constituting the summary video such that the number of cut videos selected from a scene video generated by the scene generation unit does not exceed a predetermined value.

The summary video generation device according to any one of claims 1 to 3, wherein
the score calculation unit
calculates the score based on the area in which an object related to the effect appears in the video, the amount of movement of an object related to the effect, or the probability that an object related to the effect appears.

The summary video generation device according to claim 3, wherein
the summary video generation unit comprises:
a summary video selection unit that selects, from the plurality of cut videos divided by the cut division unit, the cut videos constituting the summary video in descending order of the importance of the effect according to the total score calculated by the total score calculation unit, such that the number of cut videos selected from a scene video generated by the scene generation unit does not exceed a predetermined value, until the total length of the summary video exceeds a predetermined value; and
a summary video output unit that generates the summary video by concatenating, in time-series order, the cut videos selected by the summary video selection unit.

A program for causing a computer to function as the summary video generation device according to any one of claims 1 to 5.