JP6321945B2

JP6321945B2 - Digest video generation device, digest video generation method, and digest video generation program

Info

Publication number: JP6321945B2
Application number: JP2013237734A
Authority: JP
Inventors: 大我吉田; 新井　啓之; 啓之新井; 行信谷口
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2013-11-18
Filing date: 2013-11-18
Publication date: 2018-05-09
Anticipated expiration: 2033-11-18
Also published as: JP2015099958A

Description

The present invention relates to a digest video generation device, a digest video generation method, and a digest video generation program that edit a video to generate a digest video.

Watching an entire video to decide whether its content is of interest can take a very long time, so it is sometimes difficult for a user to make that decision only after viewing the whole video. To grasp the content of a video in a short time, a digest video of the video is generated and viewed instead.

One technique for generating a digest video is described, for example, in Patent Document 1. This technique analyzes the operation status of other users, calculates the attention level of each scene based on the other users' interest information, selects from the video the scenes that attract high attention from other users, and generates and presents a digest video composed of the selected scenes.

Patent Document 1: JP 2012-248987 A

However, a digest video built only from high-attention scenes may ignore the context of the content and the relationships between scenes, which can make the content hard to follow. In a soccer match video, for example, attention is expected to peak at the moment a goal is scored. Yet watching only the moment of the goal does not reveal the context of that scene, such as what play led to the goal, so the user may not fully grasp the content. Furthermore, for video that is broadcast live over a long period, the number of viewers varies with the time of day; when there are few viewers, a scene cannot be added to the digest video even if it resembles other scenes with high attention. As a result, a digest video generated as described above does not by itself provide sufficient information about the content, and the user sometimes has to watch the video itself to understand it.

The present invention is intended to solve the above problems. Its object is to provide a digest video generation device, a digest video generation method, and a digest video generation program that generate a digest video capable of increasing the user's understanding of the video content.

One aspect of the present invention is a digest video generation device comprising: a scene dividing unit that divides a designated video into a plurality of scenes; a user interest level calculation unit that calculates an interest level for each of the plurality of scenes; an inter-scene similarity calculation unit that calculates a similarity between scenes for each combination of two scenes among the plurality of scenes; a scene score calculation unit that calculates a score for each scene based on the interest level of each scene calculated by the user interest level calculation unit and the similarity between scenes calculated by the inter-scene similarity calculation unit; and a digest video generation unit that selects scenes from the plurality of scenes based on the scores calculated for the respective scenes and generates a digest video of the video that includes the selected scenes, wherein the scene score calculation unit generates a graph in which each of the plurality of scenes is a node and the similarity between scenes is an edge between nodes, and calculates, as the score, the existence probability of each node after moving randomly between nodes, with the interest level of a node taken as the existence probability of that node in the initial state and the similarity assigned to the edge between nodes taken as the transition probability.

In another aspect of the present invention, in the digest video generation device described above, the inter-scene similarity calculation unit calculates the similarity between scenes using at least one of the start time of each of the plurality of scenes in the video, keywords related to the scenes, and video features of the scenes.

Another aspect of the present invention is a digest video generation method performed by a digest video generation device that generates a digest video of a designated video, the method including: a scene dividing step of dividing the video into a plurality of scenes; a user interest level calculation step of calculating an interest level for each of the plurality of scenes; an inter-scene similarity calculation step of calculating a similarity between scenes for each combination of two scenes among the plurality of scenes; a scene score calculation step of calculating a score for each scene based on the interest level of each scene calculated in the user interest level calculation step and the similarity between scenes calculated in the inter-scene similarity calculation step; and a digest video generation step of selecting scenes from the plurality of scenes based on the scores calculated for the respective scenes and generating a digest video of the video that includes the selected scenes, wherein the scene score calculation step generates a graph in which each of the plurality of scenes is a node and the similarity between scenes is an edge between nodes, and calculates, as the score, the existence probability of each node after moving randomly between nodes, with the interest level of a node taken as the existence probability of that node in the initial state and the similarity assigned to the edge between nodes taken as the transition probability.

In another aspect of the present invention, in the digest video generation method described above, the inter-scene similarity calculation step calculates the similarity between scenes using at least one of the start time of each of the plurality of scenes in the video, keywords related to the scenes, and video features of the scenes.

Another aspect of the present invention is a digest video generation program for causing a computer to function as the digest video generation device described above.

According to the present invention, a score is calculated based on the interest level calculated for each scene and the similarity between scenes, and the scenes to include in the digest video are selected based on the calculated scores. This makes it possible to generate a digest video that includes both the scenes in the video that are likely to attract interest and the scenes related to them, which increases the user's understanding of the video content.

FIG. 1 is a block diagram showing a configuration example of the digest video generation device 1 in this embodiment.
FIG. 2 is a diagram showing an example of the video information management table stored in the video information storage unit 11 in this embodiment.
FIG. 3 is a diagram showing an example of the video-scene relationship management table stored in the video information storage unit 11 in this embodiment.
FIG. 4 is a diagram showing an example of the keyword information management table stored in the video information storage unit 11 in this embodiment.
FIG. 5 is a diagram showing an example of the video-keyword relationship management table stored in the video information storage unit 11 in this embodiment.
FIG. 6 is a diagram showing an example of the scene interest level management table stored in the video information storage unit 11 in this embodiment.
FIG. 7 is a diagram showing an example of the scenes for which the scene score calculation unit 16 in this embodiment generates a graph.
FIG. 8 is a diagram showing an example of the graph generated by the scene score calculation unit 16 in this embodiment.
FIG. 9 is a diagram showing an example of the vector v (the score of each scene) calculated by the scene score calculation unit 16 in this embodiment.
FIG. 10 is a flowchart showing the digest video generation processing performed by the digest video generation device 1 in this embodiment.

Hereinafter, a digest video generation device, a digest video generation method, and a digest video generation program according to an embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a block diagram showing a configuration example of the digest video generation device 1 in this embodiment. As shown in FIG. 1, the digest video generation device 1 includes a video information storage unit 11, a video designation unit 12, a scene dividing unit 13, a user interest level calculation unit 14, an inter-scene similarity calculation unit 15, a scene score calculation unit 16, a digest video generation unit 17, and a digest video output unit 18.

The video information storage unit 11 stores videos and information accompanying the videos. The accompanying information includes, for example, a video information management table, a video-scene relationship management table, a keyword information management table, a video-keyword relationship management table, and a scene interest level management table. An example of each table is described below.

FIG. 2 is a diagram showing an example of the video information management table stored in the video information storage unit 11 in this embodiment. As shown in FIG. 2, the video information management table has columns for the video ID, the video title, and the content description. The table has one row per video. The video ID is an identifier for a video stored in the video information storage unit 11. The video title is the title of the stored video. The content description is text information describing the content of the stored video. A video stored in the video information storage unit 11 can be read out by designating its video ID or a keyword contained in its video title or content description.

FIG. 3 is a diagram showing an example of the video-scene relationship management table stored in the video information storage unit 11 in this embodiment. As shown in FIG. 3, the video-scene relationship management table has columns for the video ID, the scene ID, and the time stamp. The table has one row per scene contained in the videos stored in the video information storage unit 11. The scene ID is an identifier for each scene contained in the video identified by the video ID. The time stamp is the elapsed time from the start of the video identified by the video ID to the start of the scene (the start time). Each scene contained in a stored video can be read out by designating the video ID and the scene ID.

FIG. 4 is a diagram showing an example of the keyword information management table stored in the video information storage unit 11 in this embodiment. As shown in FIG. 4, the keyword information management table has columns for the keyword ID and the keyword name. The table has one row per keyword. The keyword ID is an identifier for a keyword. The keyword name is text information representing the keyword.

FIG. 5 is a diagram showing an example of the video-keyword relationship management table stored in the video information storage unit 11 in this embodiment. As shown in FIG. 5, the video-keyword relationship management table has columns for the video ID, the scene ID, and the keyword ID. The table has one row per keyword associated with a scene contained in a video. Designating a video ID and a scene ID yields the keyword IDs of the keywords associated with the scene (a part of the video) identified by that pair.

FIG. 6 is a diagram showing an example of the scene interest level management table stored in the video information storage unit 11 in this embodiment. As shown in FIG. 6, the scene interest level management table has columns for the video ID, the scene ID, and the interest level. The interest level is a numerical value representing users' interest in the scene identified by the video ID and the scene ID. Designating a video ID and a scene ID yields the interest level associated with the scene (a part of the video) identified by that pair.
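The five tables above can be pictured as simple keyed lookups. The following is a minimal sketch, assuming an in-memory store; the class name `VideoStore`, its field names, and the sample values are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class VideoStore:
    """Hypothetical in-memory stand-in for the tables in storage unit 11."""
    videos: dict = field(default_factory=dict)            # video_id -> (title, description)
    scene_timestamps: dict = field(default_factory=dict)  # (video_id, scene_id) -> start time in seconds
    keywords: dict = field(default_factory=dict)          # keyword_id -> keyword name
    scene_keywords: dict = field(default_factory=dict)    # (video_id, scene_id) -> set of keyword_ids
    scene_interest: dict = field(default_factory=dict)    # (video_id, scene_id) -> interest level

    def keyword_names(self, video_id, scene_id):
        """Return the names of the keywords associated with one scene."""
        ids = self.scene_keywords.get((video_id, scene_id), set())
        return {self.keywords[k] for k in ids}

    def interest(self, video_id, scene_id):
        """Return the stored interest level for one scene (0.0 if none is stored)."""
        return self.scene_interest.get((video_id, scene_id), 0.0)


# Illustrative data only.
store = VideoStore()
store.videos["V1"] = ("Soccer match", "Full match recording")
store.scene_timestamps[("V1", "S1")] = 0.0
store.keywords["K1"] = "goal"
store.scene_keywords[("V1", "S1")] = {"K1"}
store.scene_interest[("V1", "S1")] = 0.4
```

Keying the scene-level tables on the (video ID, scene ID) pair mirrors the lookups described above, where both identifiers are designated together.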

Returning to FIG. 1, the description of the configuration of the digest video generation device 1 continues.
The video designation unit 12 accepts a user operation and receives as input information that specifies the video for which a digest video is to be generated. The video designation unit 12 outputs the input information to the scene dividing unit 13. The information input by the video designation unit 12 is, for example, the video title, wording that describes the content of the video, a keyword related to the video, or the video ID assigned to the video.

The scene dividing unit 13 identifies the video for which a digest video is to be generated, based on the information output from the video designation unit 12 and the tables stored in the video information storage unit 11. The scene dividing unit 13 reads out the identified video and divides it into a plurality of scenes. To divide the video into scenes, it uses a known method, for example the cut point detection method described in Reference 1. The scene dividing unit 13 assigns a scene ID to each of the resulting scenes and uses the elapsed time in the video at which the scene starts as its time stamp. The scene dividing unit 13 stores the video ID, the scene ID, and the time stamp in association with one another for each scene.
[Reference 1]: Japanese Patent No. 2869398
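Once cut points are available, assigning scene IDs and time stamps is mechanical. The sketch below assumes the cut points have already been detected by some method (the detection itself is not shown) and are given in seconds from the start of the video; the function name and the `S1`, `S2`, ... ID scheme are hypothetical.

```python
def scenes_from_cut_points(video_id, cut_points):
    """Turn cut-point times (seconds from the start of the video) into scene rows.

    Each row is (video_id, scene_id, timestamp), where the timestamp is the
    elapsed time at which the scene starts. The first scene starts at 0.
    """
    starts = sorted({0.0, *map(float, cut_points)})
    return [(video_id, f"S{n}", start) for n, start in enumerate(starts, start=1)]


# Cut points may arrive unsorted; the rows come back in time order.
rows = scenes_from_cut_points("V1", [95.0, 42.5])
```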

The user interest level calculation unit 14 calculates the interest level for each scene obtained by the scene dividing unit 13 and stores the calculated interest level in the scene interest level management table in association with the scene ID and the video ID. The audience rating of each scene of the video, for example, can be used as the interest level. The interest level can be calculated by any method. For example, when the video is provided through a system that can collect viewer feedback, the number of comments entered or the number of presses of a button indicating empathy or the like may be used, so that the interest level increases as the frequency of feedback increases.
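As one possible reading of the feedback-based option, the sketch below converts per-scene feedback counts into interest levels that sum to 1, which matches the normalization applied later when scores are computed. The function name and the equal-interest fallback for videos without feedback are assumptions, not part of the patent text.

```python
def interest_from_feedback(feedback_counts):
    """Map per-scene feedback counts to interest levels that sum to 1.

    feedback_counts: dict of scene_id -> number of comments plus button presses.
    More feedback yields a proportionally higher interest level.
    """
    total = sum(feedback_counts.values())
    if total == 0:
        # Assumption: with no feedback at all, treat every scene as equally interesting.
        return {s: 1.0 / len(feedback_counts) for s in feedback_counts}
    return {s: c / total for s, c in feedback_counts.items()}
```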

The inter-scene similarity calculation unit 15 calculates the similarity between scenes of the video for which a digest video is to be generated. It calculates the similarity based on, for example, the co-occurrence of keywords associated with the scenes, the closeness of the scenes' time stamps, or the video features of the scenes.

When the similarity is calculated from the co-occurrence of keywords associated with the scenes, the inter-scene similarity calculation unit 15 calculates the similarity R(A, B) between scene A and scene B using equation (1).

In equation (1), |K_A| is the number of keywords associated with scene A, |K_B| is the number of keywords associated with scene B, and |K_A ∩ K_B| is the number of keywords associated with both scene A and scene B. Keywords associated with a scene can be keywords extracted from text attached to the video as subtitles, or keywords contained in comments that viewers entered for the scene. The inter-scene similarity calculation unit 15 stores each extracted keyword in the keyword information management table in association with the keyword ID that identifies it. It also stores the keyword ID, the video ID, and the scene ID in association with one another in the video-keyword relationship management table.
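Equation (1) itself is not reproduced in this text, so its exact form is unknown. As an illustration only, the sketch below uses the Jaccard coefficient, which is one common co-occurrence measure built from exactly the three quantities named above; it should not be read as the patent's formula.

```python
def keyword_similarity(keywords_a, keywords_b):
    """Jaccard-style keyword overlap between two scenes (an assumed form).

    Uses |K_A|, |K_B| and |K_A ∩ K_B|: the number of shared keywords divided
    by the number of distinct keywords across both scenes.
    """
    a, b = set(keywords_a), set(keywords_b)
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0
```

The result is 1 when both scenes carry the same keywords and 0 when they share none, so it can be used directly as an edge weight.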

When the similarity is calculated from the closeness of the scenes' time stamps, the inter-scene similarity calculation unit 15 calculates the similarity R(A, B) between scene A and scene B using equation (2). Here, T_A is the elapsed time from the start of the video to the start of scene A, and T_B is the elapsed time from the start of the video to the start of scene B.
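Equation (2) is likewise not reproduced here. The sketch below shows one plausible shape for a time-stamp similarity, assuming only that it should decrease as |T_A - T_B| grows; the reciprocal form and the `scale` parameter are hypothetical choices.

```python
def time_similarity(t_a, t_b, scale=60.0):
    """Similarity that decays with the gap between two scene start times.

    t_a, t_b: start times in seconds. scale: gap (in seconds) at which the
    similarity drops to 0.5. Both the form and the default are assumptions.
    """
    return 1.0 / (1.0 + abs(t_a - t_b) / scale)
```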

When the similarity is calculated from the video features of the scenes, the inter-scene similarity calculation unit 15 calculates the similarity R(A, B) by quantifying how similar the scenes are in color, sound, and motion.

The inter-scene similarity calculation unit 15 may also calculate the similarity R(A, B) by combining similarities based on a plurality of indices, including those described above. In this case, the similarity R(A, B) is calculated using equation (3). Here, R_i(A, B) is the similarity between scene A and scene B calculated from an arbitrary index, and w_i is the weighting coefficient for the similarity R_i(A, B).
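Given the description of w_i as a weighting coefficient for each R_i(A, B), equation (3) is presumably a weighted sum over the indices. The sketch below implements that reading; the function name is hypothetical.

```python
def combined_similarity(similarities, weights):
    """Weighted sum of per-index similarities: sum over i of w_i * R_i(A, B).

    similarities: the values R_i(A, B), one per index (keywords, time, ...).
    weights: the matching coefficients w_i.
    """
    if len(similarities) != len(weights):
        raise ValueError("one weight is required per similarity value")
    return sum(w * r for w, r in zip(weights, similarities))
```

For example, weighting a keyword similarity of 0.5 and a time similarity of 1.0 equally (w = 0.5 each) gives 0.75.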

The scene score calculation unit 16 calculates the score of each scene using the inter-scene similarities calculated by the inter-scene similarity calculation unit 15, as follows. It first generates an undirected graph (hereinafter simply "graph") in which each scene is a node and the similarity between two scenes is the edge between the corresponding nodes.

FIG. 7 is a diagram showing an example of the scenes for which the scene score calculation unit 16 in this embodiment generates a graph. FIG. 8 is a diagram showing an example of the graph generated by the scene score calculation unit 16 in this embodiment. The example shows a graph created for a soccer match video using similarities based on time stamps. FIG. 7 shows the information (time stamp and interest level) for scenes 1 to 5, which become the nodes of the graph. FIG. 8 shows the graph generated by the scene score calculation unit 16.

The scene score calculation unit 16 calculates a probability matrix P that has a row and a column for each scene, and whose element values are computed from the interest level of the scene corresponding to the row and the similarity between the scenes corresponding to the row and the column. The element p_ij in row i, column j of the probability matrix P is calculated, for example, using equation (4).

In equation (4), α is a weighting coefficient indicating whether the interest level or the similarity is emphasized, I(S_i) is the interest level of the i-th scene S_i in the video, and R(S_i, S_j) is the similarity between the i-th scene S_i and the j-th scene S_j in the video. The scene score calculation unit 16 normalizes the values so that the sum of the interest levels of the scenes in the video, Σ_i I(S_i), and the sum of the similarities between scenes, Σ_i R(S_i, S_j), each equal 1.
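The equation image for (4) is missing from this text, but the description states its form when it explains the vector v. Written out, together with the consequence of the two normalizations (read here as applying to each column j separately, which agrees with the statement that each column of P sums to 1):

```latex
% Equation (4), in the form quoted in the description of the vector v:
p_{ij} = \alpha\, I(S_i) + (1 - \alpha)\, R(S_i, S_j)

% With \sum_i I(S_i) = 1 and \sum_i R(S_i, S_j) = 1 for each j,
% every column of P sums to 1:
\sum_i p_{ij}
  = \alpha \sum_i I(S_i) + (1 - \alpha) \sum_i R(S_i, S_j)
  = \alpha + (1 - \alpha)
  = 1
```

Because each column sums to 1, P can be treated as the transition matrix of a random walk in which column j holds the probabilities of leaving scene S_j.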

The scene score calculation unit 16 calculates a vector v that satisfies equation (5) for the calculated probability matrix P.

The calculated vector v gives the existence probability at each scene (node in the graph) when, starting from an arbitrary node in the graph, the move from scene S_j to scene S_i with probability p_ij (= αI(S_i) + (1 − α)R(S_i, S_j)) is repeated infinitely many times. The vector v corresponds to the eigenvector of the probability matrix P associated with its largest eigenvalue. The value of each dimension of v, after v is normalized to length 1, is used as the score of the scene corresponding to that dimension.
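The score computation can be sketched with power iteration. This is an illustrative sketch under stated assumptions: equation (5) is taken to be the fixed-point condition v = Pv (consistent with v being the eigenvector for the largest eigenvalue of a column-normalized P), "length 1" is interpreted as the entries summing to 1, every column of the similarity matrix is assumed to have a nonzero sum, and the function names are hypothetical.

```python
def probability_matrix(interest, similarity, alpha=0.5):
    """Build P with p[i][j] = alpha * I(S_i) + (1 - alpha) * R(S_i, S_j).

    interest: list of interest levels, rescaled here to sum to 1.
    similarity: square matrix; each column is rescaled here to sum to 1,
    so every column of P also sums to 1.
    """
    n = len(interest)
    total = sum(interest)
    levels = [x / total for x in interest]
    col_sums = [sum(similarity[i][j] for i in range(n)) for j in range(n)]
    return [[alpha * levels[i] + (1 - alpha) * similarity[i][j] / col_sums[j]
             for j in range(n)] for i in range(n)]


def stationary_scores(P, iterations=200):
    """Approximate v with v = P v by repeated multiplication (power iteration)."""
    n = len(P)
    v = [1.0 / n] * n  # start from a uniform distribution
    for _ in range(iterations):
        v = [sum(P[i][j] * v[j] for j in range(n)) for i in range(n)]
        s = sum(v)
        v = [x / s for x in v]  # keep the entries summing to 1
    return v


# Two-scene toy input (made-up values, not those of FIG. 7 to FIG. 9).
P = probability_matrix([1, 3], [[1.0, 0.5], [0.5, 1.0]], alpha=0.5)
v = stationary_scores(P)
```

With α > 0 and positive interest levels every entry of P is positive, so the iteration converges to a unique v regardless of the starting vector.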

For example, calculating the scores of the five scenes 1 to 5 shown in FIGS. 7 and 8 with α = 0.5 gives the values shown in FIG. 9. FIG. 9 is a diagram showing an example of the vector v (the score of each scene) calculated by the scene score calculation unit 16 in this embodiment. In FIG. 9, for example, the value of the element in row 5, column 3 is based on the sum of the interest level of scene 5 multiplied by α (= 0.5) and the similarity between scene 3 and scene 5 multiplied by (1 − α). The elements of the probability matrix P are normalized so that the elements of each column sum to 1. However, because the illustrated example shows values only to the second decimal place, rounding errors appear in the second and fourth columns.

The digest video generation unit 17 generates a digest video from the video based on the score of each scene calculated by the scene score calculation unit 16. Scenes to be added to the digest video are selected in descending order of scene score so that the number of scenes or the total duration of the scenes stays within a predetermined threshold. For example, when selecting two scenes from scenes 1 to 5 illustrated in FIGS. 7 to 9, the digest video generation unit 17 selects scene 3 (the goal scene), which has the highest score, and scene 2 (the attack scene), which has the next highest score. The digest video generation unit 17 arranges the two selected scenes in ascending order of time stamp and outputs them as the digest video.
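The selection rule can be sketched as follows for the scene-count threshold (the total-duration variant is not shown). The function name and the score and time-stamp values in the usage note are made up for illustration; they are not the values of FIG. 9.

```python
def select_scenes(scores, timestamps, max_scenes):
    """Pick the highest-scoring scenes and return them in playback order.

    scores: dict of scene_id -> score.
    timestamps: dict of scene_id -> start time.
    max_scenes: upper limit on the number of scenes in the digest.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)  # best score first
    chosen = ranked[:max_scenes]
    return sorted(chosen, key=timestamps.get)              # ascending time stamp
```

With hypothetical scores `{1: 0.10, 2: 0.25, 3: 0.40, 4: 0.15, 5: 0.10}` and increasing time stamps, a limit of two scenes yields scenes 2 and 3 in that order, matching the attack-then-goal ordering described above.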

The digest video output unit 18 presents the digest video generated by the digest video generation unit 17 to the user by displaying it. When the digest video generation device 1 and the user are in different locations, the digest video output unit 18 transmits the digest video to the device the user is operating. When the digest video is distributed through a server device on a network, the digest video output unit 18 transmits the digest video to the server device and has it stored there.

FIG. 10 is a flowchart showing the digest video generation processing performed by the digest video generation device 1 in this embodiment. When the digest video generation processing starts, the video designation unit 12 accepts a user operation and receives as input information designating the video for which a digest video is to be presented (step S11).

The scene dividing unit 13 identifies the video based on the information input by the video designation unit 12 and the tables stored in the video information storage unit 11, and divides the identified video into a plurality of scenes (step S12).

The user interest level calculation unit 14 repeats, for each of the scenes obtained by the scene dividing unit 13, the processing of calculating the users' interest level for that scene (step S13).

For every combination of two scenes among the scenes obtained by the scene dividing unit 13, the inter-scene similarity calculation unit 15 repeats the process of calculating the similarity between the two scenes (step S14).

The scene score calculation unit 16 calculates the score of each scene based on the user interest level of each scene calculated in step S13 and the similarity between each pair of scenes calculated in step S14 (step S15).

The digest video generation unit 17 selects the scenes to include in the digest video based on the score of each scene calculated by the scene score calculation unit 16, extracts the selected scenes from the video stored in the video information storage unit 11, and concatenates the extracted scenes in time stamp order to generate the digest video (step S16).

The digest video output unit 18 presents the digest video generated in step S16 to the user (step S17) and ends the digest video generation processing.
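Step S14 visits every unordered pair of scenes. The sketch below shows one way to enumerate those pairs and fill a symmetric similarity matrix. It is only an illustration: the function names are hypothetical, and the Jaccard overlap of keyword sets is a stand-in measure chosen for brevity, not the similarity defined in this specification.

```python
from itertools import combinations


def keyword_similarity(a, b):
    """Jaccard overlap of two keyword sets (hypothetical stand-in measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity_matrix(keywords_per_scene):
    """Compute the similarity for every pair of scenes exactly once
    and mirror it so the matrix is symmetric."""
    n = len(keywords_per_scene)
    sim = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        value = keyword_similarity(keywords_per_scene[i], keywords_per_scene[j])
        sim[i][j] = value
        sim[j][i] = value
    return sim


# Illustrative keywords for three scenes.
keywords = [{"kickoff"}, {"attack", "player A"}, {"goal", "player A"}]
sim = similarity_matrix(keywords)
```

For n scenes this loop evaluates n(n-1)/2 pairs, so the cost of step S14 grows quadratically with the number of scenes.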

The digest video generation device 1 of the present embodiment calculates and assigns, for each scene obtained by dividing the video specified by the user, an interest level based on the behavior of other users (viewers). It then calculates the similarity between scenes based on information such as time, keywords, and video features. In addition to scenes with a high interest level, it selects scenes related to them on the basis of inter-scene similarity, that is, scenes in which the user is likely to have a high interest. By generating and presenting a digest video that includes scenes similar to the scenes that attracted interest, the device can improve the user's grasp of the video content.

Furthermore, by analyzing the comments obtained as viewer reactions to the video and the similarity of video features, the digest video generation device 1 of the present embodiment can include in the digest video scenes that are similar to scenes likely to attract interest. As a result, even a scene that had few viewers can be included in the generated and presented digest video if it is likely to attract interest.

The digest video generation device 1 of the present embodiment also generates a graph in which each scene is a node and the similarity between scenes is an edge. Using the interest level of each scene as the existence probability of the corresponding node in the initial state and the similarity between scenes as the transition probability, it takes the existence probability at each node after moving randomly between nodes as the score of the scene. The value of each element of the probability matrix P used to calculate the score can be adjusted by the parameter α, which indicates whether the level of interest or the similarity between scenes is to be emphasized, so digests can be generated flexibly.
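The random-walk scoring described above can be sketched as follows. The way α enters the matrix P is not restated in this passage, so the mixing rule used here (α weights the row-normalized similarity, 1 - α weights a jump back to the interest distribution) is an assumption made for illustration, as are the function names, the default values, and the fixed number of steps.

```python
def normalize(values):
    """Scale non-negative values so they sum to 1 (uniform if all are zero)."""
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def scene_scores(interest, similarity, alpha=0.5, steps=10):
    """Existence probability of each node after `steps` random moves.

    interest:   interest level per scene (initial existence probabilities)
    similarity: symmetric matrix of inter-scene similarities (edge weights)
    alpha:      assumed mixing weight; larger values emphasize similarity
    """
    n = len(interest)
    p0 = normalize(interest)
    # Assumed form of the probability matrix P: every row sums to 1.
    P = []
    for i in range(n):
        row = normalize(similarity[i])
        P.append([alpha * row[j] + (1 - alpha) * p0[j] for j in range(n)])
    p = p0
    for _ in range(steps):
        # One step of the walk: p <- p P
        p = [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]
    return p


# Scenes 0 and 1 have equally low interest, but scene 1 is strongly
# similar to the high-interest scene 2 (all values are illustrative).
interest = [0.05, 0.05, 0.9]
similarity = [
    [0.0, 0.1, 0.1],
    [0.1, 0.0, 0.9],
    [0.1, 0.9, 0.0],
]
scores = scene_scores(interest, similarity, alpha=0.5)
```

In this example scene 1 ends up with a higher score than scene 0 even though both start with the same interest level, because probability mass flows to it from the similar high-interest scene 2. With `alpha=0` the scores reduce to the normalized interest levels.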

Because the digest video generation device 1 of the present embodiment has the features described above, for a soccer match video, for example, it can include in the digest video presented to the user not only the scene at the moment a goal is scored but also the attacking scene leading up to the goal and scenes of other plays by the player who scored.

In the present embodiment, a configuration in which the digest video generation device 1 includes the video information storage unit 11 has been described. However, the video information storage unit 11 may instead be a server device that the digest video generation device 1 can access via a network or the like. In this case, each functional unit of the digest video generation device 1 accesses the server device via the network to perform the processing described above.

The digest video generation device 1 in the embodiment described above may be implemented by a computer. In that case, a program for implementing each functional unit of the digest video generation device 1 may be recorded on a computer-readable recording medium, and the program recorded on this recording medium may be read into and executed by a computer system. The "computer system" here includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. The "computer-readable recording medium" may further include media that hold a program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or a communication line such as a telephone line, and media that hold the program for a certain period, such as volatile memory inside the computer system that serves as the server or client in that case. The program may implement only some of the functions described above, may implement those functions in combination with a program already recorded in the computer system, or may be implemented using hardware such as a PLD (Programmable Logic Device) or an FPGA (Field Programmable Gate Array).

An embodiment of the present invention has been described in detail above with reference to the drawings, but the specific configuration is not limited to this embodiment and also includes designs and the like within a scope that does not depart from the gist of the invention.

The present invention can also be applied to uses in which it is essential to improve the user's grasp of video content by means of a digest video that combines partial videos included in the video.

Description of symbols
1 … digest video generation device
11 … video information storage unit
12 … video designating unit
13 … scene dividing unit
14 … user interest level calculation unit
15 … inter-scene similarity calculation unit
16 … scene score calculation unit
17 … digest video generation unit
18 … digest video output unit

Claims

A digest video generation device comprising:
a scene dividing unit that divides a specified video into a plurality of scenes;
a user interest level calculation unit that calculates an interest level for each of the plurality of scenes;
an inter-scene similarity calculation unit that calculates a similarity between scenes for each combination of two scenes among the plurality of scenes;
a scene score calculation unit that calculates a score for each scene based on the interest level of each scene calculated by the user interest level calculation unit and the similarity between scenes calculated by the inter-scene similarity calculation unit; and
a digest video generation unit that selects scenes from the plurality of scenes based on the score calculated for each of the plurality of scenes and generates a digest video of the video that includes the selected scenes,
wherein the scene score calculation unit generates a graph in which each of the plurality of scenes is a node and the similarity between scenes is an edge between nodes, and calculates, as the score, the existence probability of each node after moving randomly between the nodes, using the interest level of each node as the existence probability of that node in the initial state and the similarity assigned to the edge between nodes as the transition probability.

The digest video generation device according to claim 1,
wherein the inter-scene similarity calculation unit calculates the similarity between scenes using at least one of the start time of each of the plurality of scenes in the video, a keyword related to the scene, and a video feature of the scene.

A digest video generation method performed by a digest video generation device that generates a digest video of a specified video, the method comprising:
a scene dividing step of dividing the video into a plurality of scenes;
a user interest level calculation step of calculating an interest level for each of the plurality of scenes;
an inter-scene similarity calculation step of calculating a similarity between scenes for each combination of two scenes among the plurality of scenes;
a scene score calculation step of calculating a score for each scene based on the interest level of each scene calculated in the user interest level calculation step and the similarity between scenes calculated in the inter-scene similarity calculation step; and
a digest video generation step of selecting scenes from the plurality of scenes based on the score calculated for each of the plurality of scenes and generating a digest video of the video that includes the selected scenes,
wherein, in the scene score calculation step, a graph is generated in which each of the plurality of scenes is a node and the similarity between scenes is an edge between nodes, and the existence probability of each node after moving randomly between the nodes is calculated as the score, using the interest level of each node as the existence probability of that node in the initial state and the similarity assigned to the edge between nodes as the transition probability.

The digest video generation method according to claim 3,
wherein, in the inter-scene similarity calculation step, the similarity between scenes is calculated using at least one of the start time of each of the plurality of scenes in the video, a keyword related to the scene, and a video feature of the scene.

A digest video generation program for causing a computer to function as the digest video generation device according to claim 1.