JP2008103802A

JP2008103802A - Image compositing device

Info

Publication number: JP2008103802A
Application number: JP2006282371A
Authority: JP
Inventors: Takeaki Suenaga (末永健明); Yoshiaki Ogisawa (荻澤義昭); Shuichi Watabe (渡部秀一)
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2006-10-17
Filing date: 2006-10-17
Publication date: 2008-05-01

Abstract

PROBLEM TO BE SOLVED: To provide an image compositing device that presents information on other videos without diminishing the user's desire to watch, or understanding of, the video the user is mainly viewing. It does this by selecting a video related to the current scene and compositing it for display while the user is viewing a scene of little importance to the user.

SOLUTION: A video feature detection section 102 acquires feature information from each scene of the video the user is currently viewing. An importance calculation section 103 uses this feature information to successively calculate an importance index value for each scene, so that an unimportant scene identification section 104 can detect a certain section (scene) of low importance. A recorded video selection section 106 selects, from a recording section 105, a sub-video related to the video being viewed. A video compositing section 109 composites this sub-video into the unimportant scene of the mainly viewed video.

COPYRIGHT: (C)2008,JPO&INPIT

Description

The present invention relates to a video composition device that combines a plurality of videos and displays them.

In recent years, networks have become faster and channels more diverse, and the amount of video content available to individuals has increased dramatically. Recording media such as DVDs (Digital Versatile Discs) and hard disk drives (HDDs) have also grown in capacity. It has therefore become common to record large amounts of video on these media and watch programs without being tied to their broadcast times.

When an individual handles this much video, it is no longer easy to know whether it contains any video the user might want to watch. Patent Document 1 proposes one technique for discovering such videos within this enormous collection. While the user is watching an arbitrary main video, the method automatically searches for related videos on the basis of additional information attached to the video. It then multiplexes either information about those videos or the videos themselves into the output.
Patent Document 1: JP 2003-134428 A

However, Patent Document 1 does not consider when the video should be inserted. It therefore multiplexes other video even into scenes the user considers important, which significantly reduces the user's desire to watch the video.

Moreover, Patent Document 1 uses information attached to the video to search, at the start of a program, for other videos related to that program. A video, however, generally consists of a combination of scenes that differ in both signal characteristics and meaning. This method is therefore insufficient for selecting an appropriate related video for each of these constantly changing scenes.

A further problem arises when the total playback time of the inserted video exceeds the section of the original video into which it is multiplexed. If the inserted video is simply played from its beginning, that section of the original video ends before the inserted video has finished. The inserted video is then played only partway, and the user has difficulty grasping its outline.

Accordingly, an object of the present invention is to provide a video composition device that works as follows. While the user is watching a video scene the user does not consider important, the device selects a video related to that scene and composites it for display. Information on other videos can thus be presented without diminishing the user's desire to watch, or understanding of, the video the user is mainly viewing.

In view of these circumstances, a video composition device according to a first invention composites and displays at least two videos that are broadcast or recorded. The device comprises:
- a feature detection unit that detects feature information corresponding to each scene in the videos;
- a recording unit that records the videos and/or the feature information;
- a calculation unit that calculates, on the basis of the feature information, an importance index value for each scene of a first video;
- a video selection unit that selects a second video;
- a scene specifying unit that specifies, on the basis of the index values, the scene in the first video with which the second video is to be composited; and
- a video compositing unit that composites the second video with the scene in the first video specified by the scene specifying unit.

In a video composition device according to a second invention, the video selection unit detects and selects, on the basis of the feature information, a video similar or related to the first video as the second video.

In a video composition device according to a third invention, the video selection unit selects the second video on the basis of the feature information of each scene of the first video, or of each group of consecutive scenes.

In a video composition device according to a fourth invention, the video selection unit selects the second video using specific feature information. That information is taken from the scene or scene group with a high index value that most recently precedes the scene in the first video specified by the scene specifying unit.

In a video composition device according to a fifth invention, the first video is a video being broadcast and the second video is a video recorded in the device.

In a video composition device according to a sixth invention, the first video and the second video are both videos recorded in the device.

In a video composition device according to a seventh invention, the feature information includes image, audio, and text information that changes from moment to moment within the video.

In a video composition device according to an eighth invention, the scene specifying unit applies threshold processing to the index values. This specifies the scene in the first video with which the second video is to be composited.

In a video composition device according to a ninth invention, the calculation unit sets the threshold and outputs it, together with the index values, to the scene specifying unit.

In a video composition device according to a tenth invention, the calculation unit inverts the region selected by the threshold according to the type of scene in the first video.

In a video composition device according to an eleventh invention, the video compositing unit edits the selected second video into a digest. The digest's length matches the continuous duration of the scene in the first video specified by the scene specifying unit.

In the present invention, video information is acquired from each scene of the video the user is currently mainly viewing. It is used to successively calculate an importance index value for each scene, which allows a certain section (scene) of low importance to be detected. Another video related to the mainly viewed video is multiplexed in step with this unimportant scene. Information on other videos can thus be provided without diminishing the user's desire to watch, or understanding of, the mainly viewed video.

The selected video can also be chosen by taking into account features from the scene immediately preceding the scene judged unimportant, that is, the scene most recently judged important. This allows the device to choose appropriate content that reflects an estimate of the user's interest at that moment.

Furthermore, the selected video can be edited into a digest matching the length of the unimportant scene and played back, so the user can grasp its outline in a short time.
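As a rough illustration of fitting a digest to the length of an unimportant section, the sketch below assumes the sub-video has already been split into scenes. Each scene is a hypothetical `(start, duration, importance)` tuple. The sketch greedily keeps the highest-importance scenes until the section length is filled, then restores playback order. The name `make_digest` and the greedy rule are illustrative assumptions, not the digest method specified in this document.

```python
from typing import List, Tuple

# Hypothetical scene record: (start time in s, duration in s, importance score)
Scene = Tuple[float, float, float]


def make_digest(scenes: List[Scene], target_len: float) -> List[Scene]:
    """Greedy digest sketch: keep the most important scenes whose total
    duration fits within target_len, then return them in playback order."""
    chosen: List[Scene] = []
    total = 0.0
    # Visit scenes from most to least important
    for scene in sorted(scenes, key=lambda s: s[2], reverse=True):
        if total + scene[1] <= target_len:
            chosen.append(scene)
            total += scene[1]
    # Play the kept scenes in their original order
    return sorted(chosen, key=lambda s: s[0])


sub_video = [(0, 10, 0.2), (10, 20, 0.9), (30, 15, 0.5), (45, 5, 0.8)]
print(make_digest(sub_video, target_len=30))  # [(10, 20, 0.9), (45, 5, 0.8)]
```

Greedy selection is simple but not optimal: it can leave part of the section unused when a slightly less important scene would have fit better.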

Embodiments of the present invention are described below with reference to the drawings.
FIG. 1 is a functional block diagram showing the schematic configuration of a video composition device according to an embodiment of the present invention.

The video composition device 100 of FIG. 1 comprises the following units:
- a video feature detection unit 102, which detects features of the video obtained from a video input unit 101 using the video and the accompanying information attached to it;
- an importance calculation unit 103, which calculates the importance of each scene from the obtained video features;
- a non-important scene specifying unit 104, which specifies scene sections of low importance from the calculated importance;
- a recorded video selection unit 106, which calculates the feature correlation between the input video and the videos already recorded in a recording unit 105 (with their video features), and selects from the recorded videos a video correlated with the input video;
- a digest creation unit 107, which receives the selected recorded video, its video features, and the non-important scene section information from the non-important scene specifying unit 104, and creates a digest; and
- a video compositing unit 109, which switches the screen display on a moving image output unit 108 between the broadcast content and the digest of the recorded content.

In general, a user watching a video is not always concentrating on it. During commercials, monotonous scenes, and other scenes generally considered unimportant, the user temporarily loses interest in the video being watched.

The present invention proceeds as follows:
1. It calculates the importance of each scene of the video the user is mainly viewing (hereinafter the "main video").
2. For scenes judged to be of low importance using a threshold, it selects from the stored videos an appropriate video that reflects an estimate of the user's current interest (hereinafter the "sub-video").
3. It plays back the sub-video itself, or a digest re-edited from it.

Here, importance in the present invention is an index of how worth watching each scene in the video is. The main video and the sub-video are played simultaneously, so the user never completely loses track of the main video's content while watching the sub-video.
Furthermore, because the sub-video is one estimated to be of high interest to the user, the user can keep watching without getting bored.

Each part of FIG. 1 is described in detail below.
First, the operation of the video feature detection unit 102 is described.

The video feature detection unit 102 is a block that acquires various kinds of information from the video and detects features of the whole video or of each scene. The video features obtained here are passed to the recorded video selection unit 106 and the importance calculation unit 103. These units use them to search for videos similar or related to the video being viewed and to calculate the importance of the video. The recorded video selection unit 106 and the importance calculation unit 103 are described in detail later.

To detect video features, the video feature detection unit 102 can use information obtained from the video itself or information attached to the video.
Information obtained from the video itself includes image information, audio information, and caption information. Information attached to the video includes the EPG (Electronic Program Guide) and tag information attached to the video.

Examples of video features obtained from this information are:
- from image information: scene change positions, color histogram information, edge information, telop (on-screen caption) information, and motion vector information;
- from audio information: audio level, speech positions, and background music (BGM) information;
- from the EPG and tag information: important keywords such as the title, genre, performers, and recording date and time.
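The recorded video selection unit 106 is said to compute a "feature correlation" between the input video and the recorded videos. One very simple stand-in, assuming only EPG/tag keywords are compared, is the Jaccard similarity of keyword sets. The function names, the keyword-only comparison, and the rule of returning `None` when nothing overlaps are all illustrative assumptions. Under the fourth invention, `current` would be the keywords of the most recent high-importance scene.

```python
from typing import Dict, Iterable, Optional


def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard similarity: size of intersection / size of union of two keyword sets."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def select_sub_video(current: Iterable[str],
                     recorded: Dict[str, Iterable[str]]) -> Optional[str]:
    """Return the recorded title whose keywords best match `current`,
    or None if no recorded video shares any keyword."""
    current = set(current)
    best_title, best_score = None, 0.0
    for title, keywords in recorded.items():
        score = jaccard(current, keywords)
        if score > best_score:
            best_title, best_score = title, score
    return best_title


library = {"A": ["soccer", "Japan", "highlights"], "B": ["cooking", "Italy"]}
print(select_sub_video(["soccer", "Japan"], library))  # A
```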

Video features are obtained for each scene. When the input video from the video input unit 101 is a video being broadcast, the features must be computed in real time. When the input video is a recorded video, the whole video is known before the user starts watching, so the features can be computed in advance. In that case, the video features may be detected beforehand and recorded in the recording unit 105.

The importance calculation unit 103 calculates the importance of each scene using the video features obtained from the video feature detection unit 102.

As mentioned above, users generally do not concentrate on a video the whole time, even one they chose because it interests them.
In some scenes the user temporarily loses interest in the video and stops concentrating on it. Examples include scenes that lack excitement, such as monotonous passing in a soccer match or the change between innings in a baseball game. Another example is a scene of a variety or music program in which the user's favorite celebrity or singer does not appear.

In the present invention, several video features obtained from each scene of the constantly changing video are used, singly or in combination, to determine the importance of each scene. This importance is then used to estimate the scenes in which the user loses interest in the video.

For example, some scenes contain many objects from which the user can obtain useful information, such as scenes in which a person or a telop appears. Such scenes tend to contain many regions of high edge strength, which suggests an importance measure based on edge strength.

First, the edge strength e is obtained by the following equation:

$$ e = f(p) \tag{1} $$

Here, p is the luminance of each pixel of a given image, and f(p) is an edge-extraction filter, typically a Sobel or Laplacian filter.
Pixels whose edge strength e exceeds a preset threshold t are detected as regions of high edge strength. Taking the number of occurrences N of such regions as the importance w gives:

$$ w = N(e) \tag{2} $$
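A minimal sketch of equations (1) and (2), assuming f is a 3x3 Sobel filter and N counts individual pixels rather than connected regions. It uses |Gx| + |Gy| as a common cheap approximation of the gradient magnitude. Pure Python keeps it self-contained; a real implementation would use an image library. The function names are illustrative.

```python
from typing import List

Image = List[List[float]]  # grayscale luminance p, row-major

KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal Sobel kernel
KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical Sobel kernel


def sobel_strength(img: Image) -> Image:
    """Eq. (1), e = f(p), with f taken as a Sobel filter; |Gx| + |Gy| is used
    as an approximation of the gradient magnitude. Border pixels stay 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(KX[j][i] * img[y + j - 1][x + i - 1] for j in range(3) for i in range(3))
            gy = sum(KY[j][i] * img[y + j - 1][x + i - 1] for j in range(3) for i in range(3))
            out[y][x] = abs(gx) + abs(gy)
    return out


def edge_importance(img: Image, t: float) -> int:
    """Eq. (2), w = N(e): number of pixels whose edge strength exceeds t."""
    return sum(1 for row in sobel_strength(img) for e in row if e > t)


flat = [[50] * 5 for _ in range(5)]
step = [[0, 0, 100, 100, 100] for _ in range(5)]   # one vertical edge
print(edge_importance(flat, 200), edge_importance(step, 200))  # 0 6
```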

Alternatively, a Hough transform may be applied to the set of pixels with high edge strength e to extract artificial geometric regularities such as circles and straight lines. Their frequency of occurrence may then be used as the importance.

Importance can also be calculated from audio information.
To detect only exciting scenes, the audio level obtained from the audio information may be used as the importance. To detect scenes in which people are talking, the device may detect speech sections from the audio information and identify utterance sections from the frequency characteristics of the human voice. The frequency of these utterances may then be used as the importance. Some programs have the main part in monaural audio and the commercials in stereo. For such programs, commercial sections may be detected from the audio format, and the importance set so that commercial sections score low.

In the examples above, the importance is calculated from information after the video has been decoded. It may of course also be calculated from information before decoding. For example, the video may be compressed with an inter-frame predictive coding scheme such as MPEG (Moving Picture Experts Group). In that case, motion vector information from the undecoded video data can reveal sections with intense motion, that is, sections that visually stimulate the user, and this can serve as the importance.
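As a sketch of a motion-based importance, assume the motion vectors of a frame have already been extracted from the compressed bitstream as `(dx, dy)` pairs. The parsing step is not shown, and no particular demuxer or decoder API is implied. The mean vector magnitude is then one possible "intensity of motion" statistic.

```python
import math
from typing import Iterable, Tuple


def motion_importance(vectors: Iterable[Tuple[float, float]]) -> float:
    """Mean magnitude of a frame's motion vectors (dx, dy); larger values
    indicate more intense, more visually stimulating motion. Frames without
    vectors (e.g. intra-coded frames) score 0."""
    mags = [math.hypot(dx, dy) for dx, dy in vectors]
    return sum(mags) / len(mags) if mags else 0.0


print(motion_importance([(3, 4), (0, 0)]))  # 2.5
```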

Importance can also be calculated by combining several video features.
FIG. 2 is a diagram showing an example of importance calculation.
FIG. 2 is used here to describe the importance calculation when a sports video is being watched.
When watching sports such as soccer, users are usually interested in scenes where decisive events occur that affect the outcome of the match, such as shots, decisive passes, and fouls. They often lose interest in the other scenes.
The calculation is therefore designed to give high importance to video information that characterizes such decisive scenes.

In general, the following video features are observed in scenes where such important events occur:
I. The audio level rises because the crowd cheers louder.
II. The camera switching frequency is high, because the broadcast cuts between camera angles in quick succession to capture the important scene from the best angle.
III. In scenes where a goal is scored, text information is displayed on the screen.

Based on these features, as shown in FIG. 2(a), the video feature detection unit 102 supplies three video features: the audio level, the camera switching frequency, and the text appearance frequency. Let x be the audio level at an arbitrary time t, y the camera switching frequency in the unit time around t, and z the text appearance frequency in the unit time around t. With coefficients a, b, and c, the video importance w can be calculated as:

$$ w = ax + by + cz \tag{3} $$

FIG. 2(b) shows an example calculated with equation (3).
Here, the coefficients a, b, and c are arbitrary values preset in the device. Hereafter, unless otherwise stated, a "preset value" means an arbitrary value set in advance in the device.

In videos such as dramas, on the other hand, the events that interest the user often occur in scenes where a performer appears in close-up or is speaking. The number of scene changes and the length and number of performers' utterances can therefore be detected from the video information as video features and used as an index of the video's importance.
Alternatively, images and voices of the user's favorite actors may be registered in advance. A technique such as pattern matching then determines whether information in the current scene matches the registered images or voices. Whether the scene features these favorite actors can then serve as the importance index.

In news and information programs, an announcer is usually placed in the center of the screen. Useful information tends to appear in scenes where the announcer is speaking and in scenes containing telops. The importance of each scene can therefore be calculated by extracting telop appearance frequency, speech positions, and similar features from the video information and combining them.

In the description of FIG. 2, the coefficients a, b, and c were preset values. They may instead be coefficients that the user can set freely according to the purpose of viewing.

For example, the user may place weight on scenes where a goal is scored. Such scenes have loud cheering and inserted text indicating the score. Increasing the audio level coefficient a and the text appearance coefficient c therefore raises the importance of scoring scenes.

Likewise, the user may place weight on all decisive scenes that affect the match, whether or not a goal is scored. Such scenes have loud cheering and many replays. Increasing the audio level coefficient a and the camera switching coefficient b therefore gives the user an importance measure under which decisive scenes in general rank as important.

FIG. 3 is a diagram showing an example of an interface that accepts input from the user.
The user's input is obtained from an operation input unit 110. As shown in FIG. 3(a), the interface may use combo boxes in which the user chooses from several prepared importance elements and enters their coefficients.

Alternatively, the importance calculation unit 103 may hold a table that sets values of the coefficients a, b, and c for each video genre, and calculate the importance by referring to this table.
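The genre table can be sketched as a dictionary lookup, with user-entered coefficients (as in the FIG. 3 interface) taking priority. The genre names, the coefficient values, the default, and the override rule are placeholders chosen for illustration, not values from this document.

```python
from typing import Dict, Optional, Tuple

Coeffs = Tuple[float, float, float]  # (a, b, c) for Eq. (3)

# Hypothetical per-genre table; the values are placeholders, not from the source.
GENRE_COEFFS: Dict[str, Coeffs] = {
    "sports": (1.0, 0.8, 0.5),
    "news": (0.2, 0.1, 1.0),
    "drama": (0.5, 0.3, 0.2),
}
DEFAULT_COEFFS: Coeffs = (1.0, 1.0, 1.0)


def coefficients(genre: str, user_override: Optional[Coeffs] = None) -> Coeffs:
    """User-entered coefficients win; otherwise look up the genre table,
    falling back to a default for unknown genres."""
    if user_override is not None:
        return user_override
    return GENRE_COEFFS.get(genre, DEFAULT_COEFFS)


def weighted_importance(genre: str, x: float, y: float, z: float,
                        user_override: Optional[Coeffs] = None) -> float:
    """Eq. (3) with genre-dependent (or user-set) coefficients."""
    a, b, c = coefficients(genre, user_override)
    return a * x + b * y + c * z
```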

Of course, any number of video features may be used to calculate the importance, and the importance need not come from video features alone. The history of past operations such as normal playback, fast-forward, rewind, and pause can also be used. Scenes that were paused, or rewound and watched again, can be given high importance, and scenes skipped by fast-forwarding low importance. This allows the scenes the user prefers to be estimated.
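A sketch of history-based importance, assuming the playback log is a list of hypothetical `(scene_index, operation)` records. Each operation adds a fixed amount to the scene's score. The operation names and weights are illustrative only. In practice this score would presumably be combined with the feature-based importance rather than used alone.

```python
from typing import Dict, List, Tuple

# Hypothetical per-operation adjustments; the weights are illustrative only.
OP_WEIGHTS: Dict[str, float] = {
    "pause": 1.0,          # user stopped to look: likely important
    "rewind": 1.0,         # watched again: likely important
    "fast_forward": -1.0,  # skipped: likely unimportant
    "play": 0.0,           # normal playback: no evidence either way
}


def history_importance(num_scenes: int,
                       history: List[Tuple[int, str]]) -> List[float]:
    """Accumulate an importance score per scene from (scene_index, operation)
    records in the playback history; unknown operations are ignored."""
    scores = [0.0] * num_scenes
    for scene, op in history:
        scores[scene] += OP_WEIGHTS.get(op, 0.0)
    return scores


log = [(0, "rewind"), (0, "pause"), (2, "fast_forward"), (1, "play")]
print(history_importance(3, log))  # [2.0, 0.0, -1.0]
```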

The importance may also reflect how closely the user was concentrating on the video, using information from a linked external device such as hardware that measures the user's gaze. Other means may also be used; the examples above do not limit how importance is calculated.

FIG. 4 is a diagram showing the importance when a recorded video is the input video.
The non-important scene specifying unit 104 is described with reference to FIG. 4.
The non-important scene specifying unit 104 specifies the scenes in which the sub-video is to be played. It does so using a threshold and the importance that the importance calculation unit 103 obtains for the video 401 currently being played back. Graph 403 plots this importance for each scene in an arbitrary scene section 402, with time on the horizontal axis and importance on the vertical axis.

If the video currently being watched is a recorded one, the entire video 401 is known. Feature detection by the video feature detection unit 102 and importance calculation by the importance calculation unit 103 can therefore be performed in advance and the results stored in the recording unit 105. Using this precomputed importance and a threshold 404 preset in the importance calculation unit 103, scene sections whose importance falls below the threshold 404 are specified as non-important scene sections 406.

FIG. 5 is a diagram showing methods for determining non-important scene sections.
If every section below the threshold 404 were simply treated as a non-important scene section, the result would be non-important scene sections like those shown in FIG. 5(a).

The non-important scene section information obtained here is passed to the video compositing unit 109 and the digest creation unit 107. These units use it to decide whether to composite and display the sub-video and to create a digest matching the section length.

For this reason, non-important scenes may need a certain minimum length. In that case, extremely short sections like those in FIG. 5(a) are considered unsuitable for playing the sub-video, and the following additional processing is applied.

To exclude extremely short sections from the unimportant scene sections, the present invention uses a preset minimum continuous scene-section time m, as shown in FIG. 5(b). Only sections that continue for m or longer are identified as unimportant scenes.
Alternatively, various filters may be applied to smooth the obtained importance. This softens abrupt changes in importance and removes scene sections that have been fragmented more than necessary.
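The threshold rule of FIG. 5(a), the minimum-duration rule of FIG. 5(b), and the optional smoothing can be sketched as follows. This is a minimal sketch that makes two assumptions. First, importance is sampled at a fixed rate, so m is expressed as a sample count. Second, a centered moving average stands in for the unspecified "various filters". The function names are hypothetical.

```python
from typing import List, Tuple


def moving_average(values: List[float], window: int) -> List[float]:
    """Centered moving average; the window is truncated at both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def unimportant_sections(importance: List[float], threshold: float,
                         min_len: int) -> List[Tuple[int, int]]:
    """Return half-open [start, end) index ranges in which the importance
    stays below `threshold` for at least `min_len` consecutive samples
    (the minimum continuous time m, expressed in samples)."""
    sections = []
    start = None
    # A trailing +inf sentinel closes a run that reaches the end of the video.
    for i, value in enumerate(importance + [float("inf")]):
        if value < threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                sections.append((start, i))
            start = None
    return sections


importance = [5, 1, 1, 5, 1, 1, 1, 1, 5]
print(unimportant_sections(importance, threshold=3, min_len=1))  # [(1, 3), (4, 8)]
print(unimportant_sections(importance, threshold=3, min_len=3))  # [(4, 8)]
```

With `min_len=1` every below-threshold run is kept, as in FIG. 5(a). With `min_len=3` the two-sample run is dropped, as in FIG. 5(b). Smoothing, if used, would be applied to `importance` before thresholding.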

On the other hand, the video currently being viewed may be a video being broadcast. Extremely short sections such as those described with reference to FIG. 5(a) may still need to be excluded in that case.
FIG. 6 is a diagram showing the importance when a video being broadcast is used as the input video.
As shown in FIG. 6, the entire video has not yet been received at the current time 603. Importance can naturally be calculated only for the portion of the video broadcast before the current time 603. Unlike the case of the recorded video described above with reference to FIG. 4, it is therefore impossible to know in advance how long an unimportant scene will continue.

When the input video is being broadcast, a preset minimum continuous scene-section time n is used, as shown in FIG. 5(c). The screen display is switched only when the important or unimportant scene section has continued for n or longer from the moment the importance crossed the threshold.
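The FIG. 5(c) rule for a live broadcast can be sketched as a small streaming state machine. This is a hypothetical sketch that assumes importance samples arrive one at a time at a fixed rate, with n counted in samples. It also assumes that values below the threshold mean "unimportant".

```python
from typing import Iterable, Iterator


def live_display_states(importance: Iterable[float], threshold: float,
                        n: int) -> Iterator[bool]:
    """For each incoming importance sample, yield True while the sub-video
    should be composited and False while only the main video is shown.

    The display switches only after the raw classification has stayed on
    the other side of the threshold for n consecutive samples."""
    showing_sub = False  # start with the main video only
    run = 0              # consecutive samples disagreeing with the display
    for value in importance:
        unimportant = value < threshold
        if unimportant != showing_sub:
            run += 1
            if run >= n:
                showing_sub = unimportant
                run = 0
        else:
            run = 0
        yield showing_sub


print(list(live_display_states([5, 1, 1, 5, 1, 1, 1, 5, 5, 5], threshold=3, n=3)))
# [False, False, False, False, False, False, True, True, True, False]
```

Future samples are not yet available, so the switch necessarily lags the threshold crossing by n samples. The time-shift playback described in the following paragraphs addresses this.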

Short scene sections, as determined from the importance, can also be excluded by time-shift playback. In this mode, the video being broadcast is first recorded in the recording unit for a fixed time and then played back.
FIG. 7 is a functional block diagram showing the schematic configuration of the video composition device when time-shift playback is performed. Parts denoted by the same reference numerals as in FIG. 1 are identical. The recording unit 701 stores the videos and video features recorded by the recording unit 105 in FIG. 1. It also stores the broadcast video being viewed as the main video, together with its video features.

As shown in FIG. 7, the recording unit 701 records the video being broadcast for a fixed time t1 and then outputs it to the video composition unit 109.
The recording unit 701 also passes broadcast video to the video feature detection unit 102 as soon as it is input, without waiting for the fixed time t1. The subsequent processing is the same as described with reference to FIG. 1: the video feature detection unit 102 detects features of the input video, and the importance calculation unit 103 calculates the importance.

The fixed time-shift delay t1 is set to be equal to or longer than the minimum continuous scene-section time m, shown in FIG. 5(b), used by the unimportant scene specifying unit 104. The unimportant scene specifying unit 104 receives the threshold and the importance of the input video from the importance calculation unit 103. It can therefore confirm whether an important or unimportant scene continues for at least m before the recording unit 701 outputs the corresponding moving image, as the main video, to the video composition unit 109.

FIG. 8 is a diagram showing a method for determining important scene sections using time shift.
FIG. 8 gives a concrete example. The received broadcast video is temporarily stored in the recording unit 701 and played back shifted by a time t1 that satisfies m ≤ t1. The broadcast video 801 is accumulated in the recording unit 701 and output to the video composition unit 109 as the video 802 with a delay of t1. Here, t1 is assumed to be equal to or longer than the minimum continuous scene-section time m used by the unimportant scene specifying unit 104.

Meanwhile, the unimportant scene specifying unit 104 processes the output of the importance calculation unit 103. For example, it identifies as unimportant scenes those sections in which the importance stays below the threshold for m or longer. At the current time 803, the broadcast video 801 has been accumulated in the recording unit 701 for t1, and each part of it is guaranteed to reach the video composition unit 109 only t1 later. At this point, the unimportant scene specifying unit 104 can check whether an important or unimportant scene continues for at least m and output the unimportant scene section information to the video composition unit 109. Extracting unimportant scenes from a video produces overly fragmented sections, such as the important scene section 804 and the unimportant scene sections 805 and 806. Even for a video being broadcast, these are eliminated, and the important and unimportant scene sections shown in 802 are finally determined.

As shown in FIG. 8, important and unimportant scene sections with extremely short sections excluded can be obtained even when the video currently being viewed is being broadcast.
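The time-shift idea can be sketched as a delay buffer. Each importance sample is released t1 samples after it arrives, as if read back from the recording unit 701. Because t1 ≥ m, the minimum-length decision made at release time is the same one that would be made with the whole video available. This sketch makes three assumptions: importance arrives one sample at a time, and m and t1 are sample counts. It applies only the "unimportant run of at least m" rule given as the example in the text. Absorbing a short important section such as 804 would need an extra rule that is not modelled here. All names are hypothetical.

```python
from typing import Iterable, Iterator, List, Tuple


def _in_long_run(flags: List[bool], i: int, m: int) -> bool:
    """True if flags[i] lies in a run of equal flags at least m long,
    looking only at the samples already present in `flags`."""
    lo = i
    while lo > 0 and i - lo + 1 < m and flags[lo - 1] == flags[i]:
        lo -= 1
    hi = i
    while hi + 1 < len(flags) and hi - lo + 1 < m and flags[hi + 1] == flags[i]:
        hi += 1
    return hi - lo + 1 >= m


def time_shifted_labels(importance: Iterable[float], threshold: float,
                        m: int, t1: int) -> Iterator[Tuple[int, bool]]:
    """Yield (sample index, is_unimportant), releasing each sample only
    after t1 newer samples have been buffered behind it."""
    if t1 < m:
        raise ValueError("the time shift t1 must be at least m")
    flags: List[bool] = []  # raw below-threshold flags received so far
    next_out = 0
    for value in importance:
        flags.append(value < threshold)
        # Release a sample once t1 newer samples are buffered behind it.
        while len(flags) - 1 - next_out >= t1:
            yield next_out, flags[next_out] and _in_long_run(flags, next_out, m)
            next_out += 1
    # End of the broadcast: flush whatever is still buffered.
    while next_out < len(flags):
        yield next_out, flags[next_out] and _in_long_run(flags, next_out, m)
        next_out += 1
```

Note that the `t1 < m` check runs when iteration starts, because the function is a generator.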

The description so far has shown one form of processing by the unimportant scene specifying unit 104: the video is divided into important and unimportant scenes according to the importance obtained from the importance calculation unit 103.

However, the importance derived from video features by the importance calculation unit 103 (hereinafter "video importance") may not match the importance as perceived by the user (hereinafter "user importance").

For example, suppose importance is calculated from camera work so that scenes with large (intense) movement are rated important.
As described above, when the video is a soccer broadcast, the scenes the user pays attention to, such as shots on goal, involve large (intense) camera movement. Video importance and user importance therefore agree.

When the video is a cooking program, however, the scenes the user pays attention to mainly involve little camera movement. Examples are close-ups of the cook's hands, scenes introducing ingredients on flip charts, and shots of the finished dish. Video importance and user importance therefore do not agree.

FIG. 9 is a diagram showing an example of a method for determining unimportant scene sections.
Suppose scenes at or above the threshold are simply treated as important, as in FIG. 9(a). In a soccer broadcast, this captures important scenes such as the shots on goal that users watch for. In a cooking program, however, only scenes the user does not consider very important, such as scene transitions, are judged important.

One way to solve this problem is to calculate each video importance with two different calculation methods. Alternatively, a single video importance can be computed, for example one that rates scenes with large motion as important, as above. When scenes with small motion should be treated as important, only the important/unimportant evaluation with respect to user importance is inverted, as shown in FIG. 9(b). In other words, the range designated by the threshold is inverted, and sections at or above the video-importance threshold are treated as unimportant for the user. Scenes with extremely large camera movement, i.e., sections whose video importance exceeds a certain level, are judged to be scenes the user does not consider important, and these become the final unimportant scenes.

As described above, for a given video importance calculated by the importance calculation unit 103, either the range at or above a threshold or the range at or below it may be regarded as important in terms of user importance.
The choice depends on how the importance calculation unit 103 calculates video importance, combined with the nature of the scenes the user considers important. Whether unimportant scene sections lie above or below the threshold is decided when the importance calculation unit 103 calculates the importance. This decision is passed to the unimportant scene specifying unit 104 together with the threshold.
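The inversion of FIG. 9(b) amounts to choosing which side of a single threshold counts as unimportant. In the sketch below, that choice is passed alongside the threshold, as the text describes. The per-genre table is a hypothetical illustration of how the importance calculation unit 103 might make the decision; it is not part of the disclosed device.

```python
from typing import Dict, List

# Hypothetical per-genre setting: does high video importance (e.g. large
# camera motion) mean the scene is important to the user?
HIGH_IS_IMPORTANT: Dict[str, bool] = {"soccer": True, "cooking": False}


def unimportant_mask(importance: List[float], threshold: float,
                     high_is_important: bool) -> List[bool]:
    """Per-sample unimportant flags; the threshold range is inverted when
    low video importance is what the user cares about (FIG. 9(b))."""
    if high_is_important:
        return [value < threshold for value in importance]
    return [value >= threshold for value in importance]


motion = [0.2, 0.9, 0.3]
print(unimportant_mask(motion, 0.5, HIGH_IS_IMPORTANT["soccer"]))   # [True, False, True]
print(unimportant_mask(motion, 0.5, HIGH_IS_IMPORTANT["cooking"]))  # [False, True, False]
```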

Of course, the user may instead decide the threshold range, for example through input from the operation input unit 110.

Although the description above assumes a single threshold, unimportant scenes may also be specified using multiple thresholds. For example, given thresholds s and t with s < t, unimportant scene sections may be determined by threshold processing that uses both, such as "at least s and less than t" or "less than s or at least t".
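The two-threshold variants can be written as a band test. This is a hypothetical sketch: `inside=True` gives "at least s and less than t", and `inside=False` gives "less than s or at least t".

```python
from typing import List


def band_unimportant(importance: List[float], s: float, t: float,
                     inside: bool = True) -> List[bool]:
    """Unimportant flags from two thresholds s < t.

    inside=True:  unimportant when s <= importance < t
    inside=False: unimportant when importance < s or importance >= t"""
    if not s < t:
        raise ValueError("thresholds must satisfy s < t")
    if inside:
        return [s <= value < t for value in importance]
    return [value < s or value >= t for value in importance]


print(band_unimportant([0, 2, 4], 1, 3))                # [False, True, False]
print(band_unimportant([0, 2, 4], 1, 3, inside=False))  # [True, False, True]
```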

These thresholds may also be received as user input via the operation input unit 110. The user may enter the threshold value directly, as shown in FIG. 3(b). Alternatively, as shown in FIG. 3(c), a graph of the previously calculated importance may be displayed so that the user selects the threshold visually on the graph.

Next, the operation of the recorded video selection unit 106 will be described.
The video features of the video currently being viewed are passed from the video feature detection unit 102 to the recorded video selection unit 106. The recorded video selection unit 106 then selects a suitable sub-video from among the recorded videos in the recording unit 105.

The user chose the main video currently being viewed, so the user is presumably interested right now in videos that are related to or correlated with it. Selecting a video that is highly correlated or related to the main video as the sub-video therefore makes it possible to choose a video that attracts the user's interest.

Furthermore, the conditions for detecting this sub-video can include the video features of the important scene the user was watching immediately before. Scenes that better reflect the user's current interest can then be detected.

In this case, the unimportant scene specifying unit 104 outputs the section information of the important scenes to the video feature detection unit 102. The video feature detection unit 102 then passes the feature information for those important scene sections to the recorded video selection unit 106.
For this purpose, the video feature detection unit 102 must store the feature information together with time information.

The correlation between the main video currently being viewed and a sub-video is obtained from the correlation between video features previously obtained by the video feature detection unit 102, or from the information accompanying them. The video features of the sub-video are assumed to have been acquired with the video feature detection unit 102 when or after the sub-video was recorded, and stored in the recording unit 105. Features usable for this correlation include audio level, music used, video histograms, motion vectors, genre information obtained from the EPG, performer information, title information, sponsor information, and recording date and time.

An example of obtaining a highly correlated video while the user is viewing the main video is described below with reference to FIG. 4.

As described above, FIG. 4 is a diagram showing the state of a video recorded in the past.
Graph 403 plots the importance over a certain section 402 within the previously recorded video 401. The aim is to select a sub-video to play during the unimportant scene section 406 determined by the set threshold 404.

The recorded video selection unit 106 uses information accompanying the video, such as EPG information. It also computes correlations from the video information in the important scene section 405 immediately preceding the unimportant scene section 406, and uses them to select a suitable sub-video.
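The scene-based selection can be sketched as follows. The sketch finds the important section that ends closest before the unimportant one and averages its per-frame feature vectors. It then ranks recorded videos by similarity to that average. Several parts are hypothetical: cosine similarity on simple color histograms stands in for the unspecified "correlation", and the histogram values and function names are illustrative only. The green-heavy query loosely mirrors the grass-dominated soccer scene described in the text.

```python
import math
from typing import Dict, List, Sequence, Tuple

Section = Tuple[int, int]  # half-open [start, end) frame indices


def preceding_important(important: List[Section], unimportant_start: int) -> Section:
    """The important section that ends closest before the unimportant one."""
    earlier = [sec for sec in important if sec[1] <= unimportant_start]
    if not earlier:
        raise ValueError("no important section precedes this unimportant section")
    return max(earlier, key=lambda sec: sec[1])


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def rank_sub_videos(frame_features: Sequence[Sequence[float]],
                    important: List[Section], unimportant_start: int,
                    library: Dict[str, Sequence[float]]) -> List[str]:
    """Rank recorded videos by similarity to the important scene that
    immediately precedes the unimportant section."""
    start, end = preceding_important(important, unimportant_start)
    query = mean_vector(frame_features[start:end])
    return sorted(library, key=lambda name: cosine(query, library[name]), reverse=True)


# Hypothetical 3-bin (red, green, blue) color histograms per frame.
frames = [[0.8, 0.1, 0.1]] * 2 + [[0.1, 0.8, 0.1]] * 2
library = {"X": [0.7, 0.2, 0.1], "F": [0.2, 0.7, 0.1]}
print(rank_sub_videos(frames, [(0, 2), (2, 4)], unimportant_start=4, library=library))
# ['F', 'X']
```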

FIG. 10 is a diagram showing an example of information accompanying videos.
For example, suppose the user has selected and is watching video B, a variety program, from the group of videos shown in FIG. 10. If a related sub-video is selected using only the information accompanying these videos, the result is video D, which is classified in the same genre, or video A, which features the same performer.
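For contrast, the whole-video metadata matching just described can be sketched as follows. The catalog entries are hypothetical and are not the actual contents of FIG. 10. As the next paragraph explains, this kind of match ignores which scene the user has just watched.

```python
from typing import Dict, List, TypedDict


class VideoInfo(TypedDict):
    genre: str
    performers: List[str]


def metadata_matches(current: str, catalog: Dict[str, VideoInfo]) -> List[str]:
    """Videos that share the genre or at least one performer with `current`.
    This is a whole-video match that ignores which scene is being watched."""
    ref = catalog[current]
    matches = []
    for name, info in catalog.items():
        if name == current:
            continue
        same_genre = info["genre"] == ref["genre"]
        shared_cast = set(info["performers"]) & set(ref["performers"])
        if same_genre or shared_cast:
            matches.append(name)
    return sorted(matches)


catalog: Dict[str, VideoInfo] = {
    "A": {"genre": "drama", "performers": ["P1"]},
    "B": {"genre": "variety", "performers": ["P1", "P2"]},
    "C": {"genre": "news", "performers": ["P3"]},
    "D": {"genre": "variety", "performers": ["P4"]},
}
print(metadata_matches("B", catalog))  # ['A', 'D']
```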

In general, however, a video consists of several different scenes. A correlation computed per video cannot select a video suited to each scene, i.e., one that accurately reflects the user's current interest.

In contrast, the present invention uses the features of the immediately preceding important scene. This makes it possible to select a video that accurately reflects the user's current interest.

FIG. 11 is a diagram showing the state after the input video has been classified into important and unimportant scenes.
Suppose, as shown in FIG. 11, that video B 1101 contains unimportant scenes 1104 and 1105. The important scene immediately preceding each of them is, respectively, an important scene 1102 in which a manzai (comic dialogue) act is performed and an important scene 1103 in which the performers compete in a soccer match. The video features immediately preceding the unimportant scene 1105 are then as follows:
I. Grass often occupies most of the screen, so the color distribution of the image is biased toward green.
II. There is a field marked out by white lines.
III. There are multiple objects (players) on the field, which fall mainly into two classes (one per team) by color.
IV. There is a round object (the ball) smaller than those objects, and it is often at the center of the screen.
Given these image features, the user is presumed to be currently interested in videos with similar features. Video F, which has similar features and a high correlation, is therefore selected from the group of videos shown in FIG. 10.

In this way, features are detected from the important scene immediately preceding an unimportant scene and used to compute correlations. An appropriate sub-video can thus be selected for each scene.

There are other ways of obtaining the correlation. Important keywords can be extracted from the subtitle information in the immediately preceding important scene, and videos that match them selected. Pattern matching on scenes in which a person appears large in the frame can be used to select videos featuring the same performer. Videos that use similar background music (BGM) may also be selected based on audio information.
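The subtitle-keyword variant can be sketched with frequency-based keywords and set overlap (Jaccard similarity). This sketch rests on several assumptions. It tokenizes English text with a regular expression, whereas Japanese subtitles would need a morphological analyzer. The stop-word list, `top_k`, and all function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Set

# Hypothetical minimal stop-word list for English subtitles.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def keywords(subtitles: str, top_k: int = 5) -> Set[str]:
    """The top_k most frequent non-stop-words in a subtitle text."""
    words = [w for w in re.findall(r"[a-z]+", subtitles.lower()) if w not in STOPWORDS]
    return {word for word, _ in Counter(words).most_common(top_k)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match_by_subtitles(scene_subtitles: str,
                            library_subtitles: Dict[str, str],
                            top_k: int = 5) -> str:
    """Recorded video whose subtitle keywords overlap most with the scene's."""
    query = keywords(scene_subtitles, top_k)
    return max(library_subtitles,
               key=lambda name: jaccard(query, keywords(library_subtitles[name], top_k)))


library = {"F": "The keeper saves a shoot and a goal",
           "G": "Recipe of the soup and the onion"}
print(best_match_by_subtitles("goal goal shoot keeper", library))  # F
```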

Next, an example of the operation of the digest creation unit 107 is described below with reference to FIG. 4.
The unimportant scene sections obtained by the unimportant scene specifying unit 104 and the video features recorded in the recording unit 105 are passed to the digest creation unit 107. For the video selected by the recorded video selection unit 106, the digest creation unit 107 creates a digest tailored to each scene.

The digest creation unit 107 receives information on how long the unimportant scene continues from the unimportant scene specifying unit 104, and creates a digest whose length matches that time.

When a recorded video is viewed as the main video, the entire video 401 is known. The feature detection performed by the video feature detection unit 102 and the importance calculation performed by the importance calculation unit 103 can therefore be carried out in advance, and the results recorded in the recording unit 105.
As described above, this precomputed importance lets the unimportant scene specifying unit 104 know in advance how long the unimportant scene section 406 continues. The digest creation unit 107 then creates a digest matched to the duration of that unimportant scene section.

To create a digest matched to a given time, a technique such as that disclosed in Patent No. 3640615, which can create a digest that fits a set digest playback time, may be used. Other techniques may also be used.
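One simple way to fit a digest to a known duration is a greedy selection. The most important highlight clips are kept while they fit the budget, then reordered chronologically for playback. This is a hypothetical stand-in and does not reproduce the technique of Patent No. 3640615. Greedy selection is also not guaranteed to fill the budget optimally, since the problem is a form of knapsack. Clip lengths and importances are assumed to be precomputed.

```python
from typing import List, NamedTuple


class Clip(NamedTuple):
    start: float       # position in the sub-video, seconds
    length: float      # seconds
    importance: float  # e.g. from the importance calculation unit 103


def fit_digest(clips: List[Clip], budget: float) -> List[Clip]:
    """Greedily keep the most important clips whose total length fits
    `budget` (the known unimportant-section duration), then restore
    chronological order for playback."""
    chosen: List[Clip] = []
    used = 0.0
    for clip in sorted(clips, key=lambda c: c.importance, reverse=True):
        if used + clip.length <= budget:
            chosen.append(clip)
            used += clip.length
    return sorted(chosen, key=lambda c: c.start)


clips = [Clip(0, 5, 0.9), Clip(10, 4, 0.5), Clip(20, 3, 0.8), Clip(30, 6, 0.7)]
print(fit_digest(clips, budget=10))
# [Clip(start=0, length=5, importance=0.9), Clip(start=20, length=3, importance=0.8)]
```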

On the other hand, when a video currently being broadcast is being viewed, the entire video cannot be known in advance. As described with reference to FIG. 4 in the explanation of the unimportant scene specifying unit, it is therefore impossible to know in advance when an unimportant scene will end (the digest playback end point) after it starts (the digest playback start point). Consequently, a sub-video digest fitted to the available continuous playback time cannot be created in real time.

A digest is therefore created as follows.
First, several highlight scenes, each cut to a few seconds, are selected from the sub-video in advance. A highlight scene is a scene whose importance, as previously calculated by the importance calculation unit 103, is particularly high. It mainly refers to a scene in which an event likely to interest the user occurs, such as a surge in audio excitement, a scene change, or a person appearing in the center of the frame. These highlight scenes are played until the end point of the unimportant scene is reached, either in descending order of the importance calculated by the importance calculation unit 103 or in order along the time axis of the original video; this realizes the digest. When the end point of the unimportant scene is reached, a digest clip still being played may be stopped partway. Alternatively, the end point of the unimportant scene may be extended until the clip finishes playing.
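The live digest procedure can be sketched as back-to-back playback of pre-cut highlight clips. The clips are assumed to be already ordered by importance or by original time. In a real system the end time is only learned when the unimportant scene ends. Here it is passed in up front to simulate that moment, and it is used only to stop playback. The `extend` flag selects between the two options in the text: cutting the clip in progress, or letting it finish. Names are hypothetical.

```python
from typing import List, Tuple


def live_digest_schedule(clip_lengths: List[float], start: float, end: float,
                         extend: bool = False) -> List[Tuple[int, float, float]]:
    """Play clips back to back from `start`. When the unimportant scene
    ends at `end`, either cut the clip in progress (extend=False) or let
    it finish (extend=True). Returns (clip index, play start, play stop)."""
    schedule = []
    t = start
    for i, length in enumerate(clip_lengths):
        if t >= end:          # unimportant scene already over: no new clip starts
            break
        stop = t + length
        if stop > end and not extend:
            stop = end        # truncate the clip in progress
        schedule.append((i, t, stop))
        t = stop
    return schedule


print(live_digest_schedule([4, 4, 4], start=0, end=10))               # [(0, 0, 4), (1, 4, 8), (2, 8, 10)]
print(live_digest_schedule([4, 4, 4], start=0, end=10, extend=True))  # [(0, 0, 4), (1, 4, 8), (2, 8, 12)]
```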

The video composition unit 109 switches what is displayed on the moving image output unit 108 in response to instructions from the unimportant scene specifying unit 104. For scenes judged important by the unimportant scene specifying unit 104, only the main video is displayed. For scenes judged unimportant, the sub-video created by the digest creation unit 107 is played together with the main video.

FIG. 12 is a diagram showing specific examples of multiplexing videos.
As shown in FIG. 12, one display method shrinks the main screen (FIG. 12(a)) and shows one or more digests in the freed space (FIG. 12(b)). Another displays the main screen and a digest simultaneously in PinP (Picture in Picture) format (FIG. 12(c)).
In addition to the digest, text data such as basic information (e.g., the title) and related keyword information may also be displayed (FIG. 12(d)).
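The two layouts of FIG. 12(b) and FIG. 12(c) can be expressed as screen rectangles. In this hypothetical sketch, the scale factors, margin, bottom-right PinP placement, and right-hand digest column are illustrative choices that the text does not specify.

```python
from typing import List, NamedTuple, Tuple


class Rect(NamedTuple):
    x: int
    y: int
    w: int
    h: int


def pinp_rect(screen_w: int, screen_h: int, scale: float = 0.25,
              margin: int = 16) -> Rect:
    """Inset window in the bottom-right corner (FIG. 12(c) style)."""
    w, h = int(screen_w * scale), int(screen_h * scale)
    return Rect(screen_w - w - margin, screen_h - h - margin, w, h)


def shrink_layout(screen_w: int, screen_h: int, main_scale: float = 0.75,
                  n_sub: int = 2) -> Tuple[Rect, List[Rect]]:
    """Shrunken main video at top left, digests stacked in the freed right
    column with the screen's aspect ratio (FIG. 12(b) style)."""
    main = Rect(0, 0, int(screen_w * main_scale), int(screen_h * main_scale))
    col_w = screen_w - main.w
    sub_h = col_w * screen_h // screen_w
    if n_sub * sub_h > screen_h:
        raise ValueError("too many sub-videos for the free column")
    subs = [Rect(main.w, i * sub_h, col_w, sub_h) for i in range(n_sub)]
    return main, subs


print(pinp_rect(1920, 1080, scale=0.25, margin=20))  # Rect(x=1420, y=790, w=480, h=270)
print(shrink_layout(1920, 1080))
```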

As for the output audio, while the main video and sub-video are played simultaneously, the audio of either one may be muted so that only the other is heard. Alternatively, while the importance of the main video is judged to be low, the main video's audio may be attenuated and the sub-video's audio mixed in.
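The audio options can be sketched as a pair of gains applied to the two sample streams, assuming normalized floating-point samples in [-1.0, 1.0]. The mode names and the default attenuation of 0.25 are hypothetical.

```python
from typing import List, Sequence, Tuple


def mix_gains(main_unimportant: bool, mode: str = "duck",
              duck_gain: float = 0.25) -> Tuple[float, float]:
    """(main gain, sub gain) for the current scene.

    mode: "main" keeps only the main audio, "sub" keeps only the
    sub-video audio, "duck" attenuates the main audio and mixes in the sub."""
    if not main_unimportant:
        return 1.0, 0.0  # important scene: main video only
    if mode == "main":
        return 1.0, 0.0
    if mode == "sub":
        return 0.0, 1.0
    if mode == "duck":
        return duck_gain, 1.0
    raise ValueError(f"unknown mode: {mode}")


def mix(main: Sequence[float], sub: Sequence[float],
        gains: Tuple[float, float]) -> List[float]:
    """Weighted sum of two sample streams, clipped to [-1.0, 1.0]."""
    g_main, g_sub = gains
    return [max(-1.0, min(1.0, g_main * a + g_sub * b)) for a, b in zip(main, sub)]


print(mix([0.5, -0.5], [0.75, -0.75], mix_gains(True, "duck", duck_gain=0.5)))  # [1.0, -1.0]
```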

The following describes special user operations during viewing.
During digest playback of the sub-video, the user can perform specific operations from an input device such as a remote control, depending on their interest in that video.
If the user is interested in the video played as a digest, the user may send a signal such as "interested" from the input device. A bookmark is then registered in the recording unit 105 so that the video becomes a candidate for later viewing.
Conversely, if the user is not interested in the video played as a digest, that video may be added directly to the deletion list in the recording unit 105.

The video currently being viewed and the video shown in the digest may also be swapped, so that the latter is viewed as the main video.
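The three user operations can be sketched as a small handler for remote-control signals. The class, the key names, and the use of in-memory lists for the bookmark and deletion lists are hypothetical. In the described device these lists are kept in the recording unit.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ViewingSession:
    main: str                       # video shown as the main video
    sub: str                        # video currently played as a digest
    bookmarks: List[str] = field(default_factory=list)
    delete_list: List[str] = field(default_factory=list)

    def on_key(self, key: str) -> None:
        """Handle a remote-control signal received during digest playback."""
        if key == "interested":          # keep as a candidate for later viewing
            if self.sub not in self.bookmarks:
                self.bookmarks.append(self.sub)
        elif key == "not_interested":    # queue the recording for deletion
            if self.sub not in self.delete_list:
                self.delete_list.append(self.sub)
        elif key == "swap":              # watch the digest video as the main video
            self.main, self.sub = self.sub, self.main
        else:
            raise ValueError(f"unknown key: {key}")
```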

The video composition device of the present invention is not limited to the illustrated examples described above. Various modifications can of course be made without departing from the gist of the present invention.

FIG. 1 is a functional block diagram showing the schematic configuration of a video composition device according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of importance calculation in the video composition device according to an embodiment of the present invention.
FIG. 3 is a diagram showing an example of an interface that accepts input from the user in the video composition device according to an embodiment of the present invention.
FIG. 4 is a diagram showing the importance when a recorded video is used as the input video in the video composition device according to an embodiment of the present invention.
FIG. 5 is a diagram showing a method for determining unimportant scene sections in the video composition device according to an embodiment of the present invention.
FIG. 6 is a diagram showing the importance when a video being broadcast is used as the input video in the video composition device according to an embodiment of the present invention.
FIG. 7 is a functional block diagram showing the schematic configuration of a video composition device according to an embodiment of the present invention.
FIG. 8 is a diagram showing a method for determining important scene sections using time shift in the video composition device according to an embodiment of the present invention.
FIG. 9 is a diagram showing an example of a method for determining unimportant scene sections in the video composition device according to an embodiment of the present invention.
FIG. 10 is a diagram showing an example of information accompanying videos in the video composition device according to an embodiment of the present invention.
FIG. 11 is a diagram showing the state after the input video has been classified into important and unimportant scenes in the video composition device according to an embodiment of the present invention.
FIG. 12 is a diagram showing specific examples of multiplexing videos in the video composition device according to an embodiment of the present invention.

Explanation of symbols

100, 700 Video composition device
101 Video input unit
102 Video feature detection unit
103 Importance calculation unit
104 Unimportant scene specifying unit
105, 701 Storage unit
106 Recorded video selection unit
107 Digest creation unit
108 Moving image output unit
109 Video composition unit
110 Operation input unit
401, 601 Entire section of video
402 Arbitrary section
404, 606 Threshold
405, 407, 607 Important scene section
406 Unimportant scene section
602 Broadcast section
603, 803 Current broadcast position
604 Not-yet-broadcast section
608 Digest playback start point
609 Digest playback end point
801 Broadcast video
802 Shifted output video
804, 1102, 1103 Important scene
1101 Video B
805, 806, 1104, 1105 Unimportant scene

Claims

A video composition device that composites and displays at least two videos that are being broadcast or have been recorded, the device comprising:
a feature detection unit that detects feature information corresponding to each scene in a video;
a recording unit that records the video and/or the feature information;
a calculation unit that calculates, based on the feature information, an index value representing the importance of each scene of a first video;
a video selection unit that selects a second video;
a scene specifying unit that specifies, based on the index value, a scene in the first video with which the second video is to be composited; and
a video composition unit that composites the second video with the scene in the first video specified by the scene specifying unit.

The video composition device according to claim 1, wherein the video selection unit detects and selects, as the second video, a video that is similar or related to the first video based on the feature information.

The video composition device according to claim 2, wherein the video selection unit selects the second video based on the feature information of each scene, or of each group of consecutive scenes, of the first video.
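Claims 2 and 3 leave the similarity measure open. As a sketch under stated assumptions, the feature information of a scene group could be a bag of keywords (for example from captions or program guide text), and the most similar recorded video could be chosen by cosine similarity. The keyword representation and the names `group_features` and `select_second_video` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two keyword-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def group_features(scene_keywords: List[List[str]]) -> Counter:
    """Merge the keywords of consecutive scenes into one feature vector."""
    total: Counter = Counter()
    for words in scene_keywords:
        total.update(words)
    return total

def select_second_video(query: Counter, recorded: Dict[str, Counter]) -> Optional[str]:
    """Return the id of the most similar recorded video, or None if none overlaps."""
    best_id, best_sim = None, 0.0
    for video_id, feats in recorded.items():
        sim = cosine(query, feats)
        if sim > best_sim:
            best_id, best_sim = video_id, sim
    return best_id

query = group_features([["soccer", "goal"], ["soccer", "japan"]])
recorded = {
    "news": Counter(["weather", "rain"]),
    "sports": Counter(["soccer", "goal", "highlights"]),
}
print(select_second_video(query, recorded))  # sports
```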

The video composition device according to claim 3, wherein the feature information that the video selection unit uses to select the second video is feature information obtained from the scene or scene group that has a high index value and is closest to the scene in the first video specified by the scene specifying unit.
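One way to read claim 4 is: for a non-important scene, look for the nearest scene whose index value reaches a threshold and borrow its features. A minimal sketch, assuming scenes are `(start, end, importance)` tuples and "closest" means the smallest time gap; `gap` and `nearest_important` are hypothetical names.

```python
from typing import Optional, Sequence, Tuple

Scene = Tuple[float, float, float]  # (start_s, end_s, importance); assumed layout

def gap(a: Scene, b: Scene) -> float:
    """Time gap in seconds between two scenes (0 if they touch or overlap)."""
    return max(0.0, max(a[0], b[0]) - min(a[1], b[1]))

def nearest_important(scenes: Sequence[Scene], target: int,
                      threshold: float) -> Optional[int]:
    """Index of the high-importance scene closest in time to scenes[target]."""
    best, best_gap = None, float("inf")
    for i, scene in enumerate(scenes):
        if i == target or scene[2] < threshold:
            continue  # skip the target itself and low-importance scenes
        g = gap(scenes[target], scene)
        if g < best_gap:  # strict '<': the earlier scene wins a tie
            best, best_gap = i, g
    return best

scenes = [(0, 30, 0.9), (30, 60, 0.1), (60, 90, 0.2), (90, 120, 0.8)]
print(nearest_important(scenes, target=2, threshold=0.5))  # 3
```

For a first video that is still being broadcast, only already-broadcast scenes would have index values, so a real implementation might restrict the search to scenes before the target.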

The video composition device according to claim 1, wherein the first video is a video being broadcast and the second video is a video recorded in the device.

The video composition device according to claim 1, wherein both the first video and the second video are videos recorded in the device.

The video composition device according to any one of claims 1 to 6, wherein the feature information includes video information, audio information, and text information that change from moment to moment in the video.

The video composition device according to any one of claims 1 to 7, wherein the scene specifying unit specifies the scene in the first video with which the second video is to be composited by applying threshold processing to the index value.
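Threshold processing of the kind claim 8 describes can be sketched as a run-length scan over an index sampled at a fixed interval: every maximal run below the threshold becomes one non-important section. The sampling interval `step`, the minimum section length `min_len`, and `non_important_sections` are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

def non_important_sections(values: Sequence[float], threshold: float,
                           step: float = 1.0,
                           min_len: float = 0.0) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) of each maximal run where the index < threshold.

    values[i] is assumed to be the index value sampled at time i * step.
    """
    sections = []
    start = None
    for i, v in enumerate(values):
        if v < threshold and start is None:
            start = i                      # a low-importance run begins
        elif v >= threshold and start is not None:
            sections.append((start * step, i * step))
            start = None                   # the run ends
    if start is not None:                  # run lasts to the end of the data
        sections.append((start * step, len(values) * step))
    return [(s, e) for s, e in sections if e - s >= min_len]

index = [0.9, 0.8, 0.2, 0.1, 0.3, 0.7, 0.1]
print(non_important_sections(index, threshold=0.5))  # [(2.0, 5.0), (6.0, 7.0)]
```

A `min_len` filter avoids compositing a second video over sections too short to be worth watching.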

The video composition device according to claim 8, wherein the calculation unit sets the threshold and outputs it to the scene specifying unit together with the index value.

The video composition device according to claim 9, wherein the calculation unit inverts the selection range defined by the threshold according to the type of scene in the first video.
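Claim 10 suggests that for some scene types a high index, rather than a low one, marks the scene as non-important. A minimal sketch, assuming a hypothetical set `INVERTED_TYPES`. The "commercial" entry is only an illustrative guess: loud, fast-cut advertising could score high on a loudness/motion index while being unimportant to the viewer.

```python
import math
from typing import Tuple

# Hypothetical scene types whose selection range is inverted (assumption).
INVERTED_TYPES = {"commercial"}

def selection_range(threshold: float, scene_type: str) -> Tuple[float, float]:
    """Half-open range [lo, hi) of index values treated as non-important."""
    if scene_type in INVERTED_TYPES:
        return (threshold, math.inf)   # inverted: high index = non-important
    return (-math.inf, threshold)      # default: low index = non-important

def is_non_important(value: float, threshold: float, scene_type: str) -> bool:
    lo, hi = selection_range(threshold, scene_type)
    return lo <= value < hi

print(is_non_important(0.8, 0.5, "sports"))      # False
print(is_non_important(0.8, 0.5, "commercial"))  # True
```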

The video composition device according to any one of claims 1 to 10, wherein the video composition unit edits the selected second video into a digest video whose length matches the continuous duration of the scene in the first video specified by the scene specifying unit.
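Claim 11 requires the digest to fit the duration of the non-important scene but does not fix the editing rule. The following sketch uses one simple greedy approach and is not presented as the patent's method. It assumes the second video is already split into `(start, end, importance)` segments. The most important segments are taken first, the last one is trimmed to fill the target exactly, and the result is put back in playback order. `make_digest` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[float, float, float]  # (start_s, end_s, importance); assumed layout

def make_digest(segments: Sequence[Segment],
                target: float) -> List[Tuple[float, float]]:
    """Pick the most important segments until their total length reaches
    `target` seconds, trimming the last one, then restore playback order."""
    chosen = []
    remaining = target
    for start, end, _ in sorted(segments, key=lambda s: s[2], reverse=True):
        if remaining <= 0:
            break
        length = min(end - start, remaining)  # trim to what is still needed
        chosen.append((start, start + length))
        remaining -= length
    return sorted(chosen)                     # chronological playback order

segments = [(0, 20, 0.3), (20, 50, 0.9), (50, 60, 0.6), (60, 100, 0.2)]
digest = make_digest(segments, target=45)
print(digest)  # [(0, 5), (20, 50), (50, 60)]
```

If the segments together are shorter than the target, the digest simply comes out shorter. A real device might then loop, pad, or pick another second video.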