JP2016116073A

JP2016116073A - Video processing method, video processing apparatus, and program

Info

Publication number: JP2016116073A
Application number: JP2014253212A
Authority: JP
Inventors: 和博嶋内; Kazuhiro Shimauchi; 広志池田; Hiroshi Ikeda; 伸穂池田; Nobuo Ikeda; 篤史木村; Atsushi Kimura
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2014-12-15
Filing date: 2014-12-15
Publication date: 2016-06-23

Abstract

PROBLEM TO BE SOLVED: To propose a video processing method, a video processing apparatus, and a program capable of generating, in an appropriate operation mode, a summary video in which videos are switched according to music.
SOLUTION: The video processing method includes: extracting a plurality of unit videos from an input video; generating editing information for switching, according to input music, the unit videos adopted from the extracted unit videos; and controlling, with a processor, an operation mode of the extraction processing that extracts the unit videos and of the adoption processing that adopts the unit videos.
SELECTED DRAWING: Figure 1

Description

The present disclosure relates to a video processing method, a video processing apparatus, and a program.

In recent years, cameras such as wearable cameras and action cameras have come into wide use in sports and other fields. Such cameras often shoot continuously for long periods, and their composition tends to be monotonous, so the footage as shot may not be enjoyable to watch. There is therefore demand for techniques that generate a summary video condensing the highlights of the captured video.

Regarding such techniques, for example as disclosed in Patent Documents 1 and 2 below, techniques have been developed that provide a plurality of operation modes for generating a summary video and generate the summary video in an appropriate operation mode. Specifically, Patent Document 1 below discloses a technique that makes it possible to create a plurality of summary videos of different lengths.

Patent Document 1: JP 2002-271739 A
Patent Document 2: JP 2005-11392 A

However, the technique disclosed in Patent Document 1 makes it possible to generate a plurality of summary videos of different lengths; it does not provide diversity in the content of the summary videos. The present disclosure therefore proposes a new and improved video processing method, video processing apparatus, and program capable of generating, in an appropriate operation mode, a summary video in which videos are switched according to music.

According to the present disclosure, there is provided a video processing method including: extracting a plurality of unit videos from an input video; generating editing information for switching, according to input music, the unit videos adopted from the extracted unit videos; and controlling, by a processor, an operation mode in the extraction process of extracting the unit videos and in the adoption process of adopting the unit videos.

Further, according to the present disclosure, there is provided a video processing apparatus including: an extraction unit that extracts a plurality of unit videos from an input video; an editing unit that generates editing information for switching, according to input music, the unit videos adopted from the unit videos extracted by the extraction unit; and an operation mode control unit that controls an operation mode in the extraction unit and the editing unit.

Further, according to the present disclosure, there is provided a program for causing a computer to function as: an extraction unit that extracts a plurality of unit videos from an input video; an editing unit that generates editing information for switching, according to input music, the unit videos adopted from the unit videos extracted by the extraction unit; and an operation mode control unit that controls an operation mode in the extraction unit and the editing unit.

As described above, according to the present disclosure, a summary video in which videos are switched according to music can be generated in an appropriate operation mode. Note that the above effect is not necessarily limiting; together with or in place of the above effect, any of the effects described in this specification, or other effects that can be understood from this specification, may be achieved.

FIG. 1 is a diagram for explaining an overview of the video processing apparatus according to the present embodiment.
FIG. 2 is a diagram for explaining an overview of the video analysis process executed in the video processing apparatus according to the present embodiment.
FIG. 3 is a diagram for explaining an overview of the editing information generation process and the summary video generation process executed in the video processing apparatus according to the present embodiment.
FIG. 4 is a block diagram showing an example of the logical configuration of the video processing apparatus according to the present embodiment.
FIG. 5 is a diagram for explaining the unit video extraction process according to the present embodiment.
FIG. 6 is a diagram for explaining the process of setting the unit video switching timing according to the present embodiment.
FIG. 7 is a diagram for explaining an example of the operation modes of the video processing apparatus according to the present embodiment.
FIG. 8 is a diagram for explaining the unit video selection process according to the present embodiment.
FIG. 9 is a diagram for explaining the unit video selection process according to the present embodiment.
FIG. 10 is a diagram for explaining the adoption section setting process according to the present embodiment.
FIG. 11 is a diagram for explaining the adoption section setting process according to the present embodiment.
FIG. 12 is a diagram for explaining the adoption section setting process according to the present embodiment.
FIG. 13 is a flowchart showing an example of the flow of the summary video generation process executed in the video processing apparatus according to the present embodiment.
FIG. 14 is a block diagram showing an example of the hardware configuration of the information processing apparatus according to the present embodiment.

Hereinafter, preferred embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. In this specification and the drawings, components having substantially the same functional configuration are given the same reference numerals, and redundant description of them is omitted.

In this specification and the drawings, elements having substantially the same functional configuration may also be distinguished by appending different letters to the same reference numeral. For example, a plurality of elements having substantially the same functional configuration are distinguished as necessary, as in video processing apparatuses 100A, 100B, and 100C. However, when there is no particular need to distinguish such elements from one another, only the common reference numeral is used. For example, when there is no particular need to distinguish the video processing apparatuses 100A, 100B, and 100C, they are simply referred to as the video processing apparatus 100.

The description will be given in the following order.
1. Overview
2. Basic configuration
3. Functional details
3.1. Unit video extraction process
3.2. Switching timing setting process
3.3. Operation mode determination process
3.4. Unit video selection process
3.5. Adoption section setting process
4. Operation processing
5. Hardware configuration example
6. Summary

<1. Overview>
First, an overview of the video processing apparatus according to the present embodiment will be described with reference to FIGS. 1 to 3.

FIG. 1 is a diagram for explaining an overview of the video processing apparatus 100 according to the present embodiment. FIG. 1 shows the actions of a user of the video processing apparatus 100 and the progression of the processing performed in the video processing apparatus 100, with time flowing from left to right. As shown in FIG. 1, the video processing apparatus 100 generates a summary video 50 from a video 10 shot by the user. The summary video 50 is a digest that summarizes the video shot by the user. The video processing apparatus 100 generates the summary video 50 by taking sections adopted from the captured video 10 under an arbitrary adoption criterion, and switching and concatenating them according to the input music 30. In this specification, a video includes image (still image / moving image) data and audio data. An overview of the process of generating the summary video 50 executed in the video processing apparatus 100 is given below.

First, while the user is shooting, the video processing apparatus 100 performs a recording process that records the captured video 10 and a video analysis process that analyzes the video 10. For example, as the video analysis process, the video processing apparatus 100 analyzes user operations made during shooting, performs image analysis such as smile detection, color detection, and motion vector detection, and analyzes the motion of the subject based on sensor information obtained during shooting.

Next, the video processing apparatus 100 performs an editing information generation process based on the video analysis result information 20, which indicates the result of the video analysis process, and on the input music 30. For example, the video processing apparatus 100 selects, from the video 10, the unit videos to be adopted in the summary video 50 by evaluating the video analysis result information 20 under an arbitrary adoption criterion. A unit video is one continuous stretch of video and is also called a shot. The video processing apparatus 100 then generates editing information 40 for switching the adopted unit videos according to the music 30. The editing information 40 specifies which section of which music 30 is used as BGM (background music), and which unit video is switched in at which timing. By analyzing the music 30 based on music theory, the video processing apparatus 100 generates the editing information 40 so that the unit videos switch at timings that match the melody, rhythm, beat, or climax of the music 30.
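As a rough illustration of what the editing information 40 described above might hold, the following Python sketch pairs adopted unit videos with consecutive switching timings. The names `Cut`, `EditingInfo`, and `build_editing_info`, and all field names, are hypothetical and are not taken from this disclosure; the disclosure only states which items the editing information specifies.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Cut:
    unit_video_id: int   # hypothetical identifier of an adopted unit video
    source_start: float  # start of the adopted section in the input video (s)
    music_start: float   # music time at which this unit video begins (s)
    music_end: float     # music time at which the next unit video takes over (s)


@dataclass
class EditingInfo:
    music_id: str        # which music is used as BGM
    bgm_start: float     # start of the BGM section within the music (s)
    bgm_end: float       # end of the BGM section within the music (s)
    cuts: List[Cut]


def build_editing_info(
    music_id: str,
    switch_times: Sequence[float],
    adopted: Sequence[Tuple[int, float]],
) -> EditingInfo:
    """Pair each adopted unit video with one interval between switching timings."""
    cuts = [
        Cut(unit_id, source_start, start, end)
        for (unit_id, source_start), start, end in zip(
            adopted, switch_times, switch_times[1:]
        )
    ]
    return EditingInfo(music_id, switch_times[0], switch_times[-1], cuts)


info = build_editing_info(
    "song-a",                       # assumed music identifier
    [0.0, 2.0, 4.0, 8.0],           # assumed switching timings from music analysis
    [(3, 120.0), (7, 615.5), (1, 12.0)],
)
for cut in info.cuts:
    print(cut)
```

Keeping the editing information separate from the rendered video mirrors the description: the same record could be passed to a summary video generation step or output to another device.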

The video processing apparatus 100 then performs a summary video generation process based on the editing information 40. For example, the video processing apparatus 100 generates the summary video 50 by using the music 30 specified by the editing information 40 as BGM and switching and concatenating the unit videos specified by the editing information 40 at the specified timings. The video processing apparatus 100 can also play back the summary video 50, record it, or transmit it to another device.

Note that the video analysis process shown in FIG. 1 may be performed in parallel with shooting by the user or after shooting. The video analysis process, the editing information generation process, and the summary video generation process may be performed consecutively or non-consecutively. The video processing apparatus 100 may also generate a summary video 50 from a plurality of videos with the music 30 as BGM.

The overview of the process of generating the summary video 50 has been described above. Next, the process of generating the summary video 50 will be described in more detail with reference to FIGS. 2 and 3.

FIG. 2 is a diagram for explaining an overview of the video analysis process executed in the video processing apparatus 100 according to the present embodiment. In the example shown in FIG. 2, the video 10 is footage of one day of the user, and the video analysis result information 20 includes information indicating attributes of the video, such as highlights 21 and scene segments 22. The video 10 includes footage of arriving at the sea, surfing, taking a break, lunch, the hotel, and the sunset. A highlight 21 is a section indicating a notable part of the video 10. Examples include specific actions such as jumps and turns, smiles, exciting scenes of an event where cheers arose, and important scenes of a specific event such as the cake cutting or ring exchange at a wedding. A scene segment 22 is a section obtained by dividing the video 10 under a predetermined condition. For example, a scene segment 22 may be a section, divided on the basis of color, in which colors of the same family continue. A scene segment 22 may also be a section, divided on the basis of camera work, in which the same camera work continues. A scene segment 22 may also be a section, divided on the basis of date and time, that was shot at close dates and times. A scene segment 22 may also be a section, divided on the basis of location, that was shot at the same or a nearby location. As an example, FIG. 2 shows scene segments 22 divided on the basis of color. The colors used for segmentation may be, for example, whitish, bluish, greenish, and reddish colors. Through the video analysis process, the video processing apparatus 100 analyzes video attributes such as the highlights 21 and the scene segments 22.
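The color-based division into scene segments can be sketched as grouping consecutive frames that share a color-family label. This is a minimal sketch that assumes each frame has already been assigned a label such as "blue" or "white" by some image analysis step; the function name `scene_segments` and the per-frame labels are hypothetical and not part of this disclosure.

```python
from itertools import groupby
from typing import List, Sequence, Tuple


def scene_segments(
    labels: Sequence[str], fps: float = 1.0
) -> List[Tuple[str, float, float]]:
    """Group consecutive frames with the same label into (label, start_s, end_s)."""
    segments = []
    index = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((label, index / fps, (index + length) / fps))
        index += length
    return segments


# Assumed per-frame color-family labels, one frame per second.
frame_colors = ["blue", "blue", "blue", "white", "white", "green", "blue"]
segments = scene_segments(frame_colors)
for segment in segments:
    print(segment)
```

The same grouping would apply to the other division criteria mentioned above (camera work, date and time, location) by swapping the label that is assigned to each frame.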

FIG. 3 is a diagram for explaining an overview of the editing information generation process and the summary video generation process executed in the video processing apparatus 100 according to the present embodiment. First, the video processing apparatus 100 extracts, as a unit video, a continuous stretch of video belonging to the same scene segment 22. The video processing apparatus 100 then adopts unit videos according to a predetermined policy while preferentially adopting highlights 21 from among the unit videos. For example, to reduce visual bias, the video processing apparatus 100 may adopt unit videos whose scene segments 22 are dispersed. The video processing apparatus 100 may also adopt unit videos in line with a theme specified by the user, such as surfing or snowboarding. Specifically, in the case of surfing, the video processing apparatus 100 may adopt unit videos so that highlights such as turns during surfing account for a larger share than meals, and so that scene segments that are bluish, close to the sea, or from time periods with high waves account for a larger share. In addition, the video processing apparatus 100 analyzes the music 30 (BGM) based on music theory and sets the timings at which the unit videos should be switched. Through these processes, the video processing apparatus 100 generates editing information 40 for switching the adopted unit videos at the set timings. The video processing apparatus 100 then generates the summary video 50 based on the editing information 40. Note that the unit videos included in the summary video 50 may or may not be in chronological order.
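One possible reading of "preferentially adopting highlights while dispersing scene segments" is the greedy selection sketched below. This is an illustrative assumption rather than the policy of this disclosure: the dictionary keys `id`, `segment`, and `highlight`, the function name `adopt_units`, and the tie-breaking order are all hypothetical.

```python
from collections import Counter
from typing import Dict, List


def adopt_units(units: List[Dict], count: int) -> List[Dict]:
    """Adopt `count` unit videos: highlights first, then spread over segments."""
    adopted = []
    used = Counter()  # how many adopted units each scene segment already has
    remaining = list(units)
    while remaining and len(adopted) < count:
        # Prefer units containing a highlight, then the scene segment used
        # least so far, then the earliest unit.
        best = min(
            remaining,
            key=lambda u: (not u["highlight"], used[u["segment"]], u["id"]),
        )
        adopted.append(best)
        used[best["segment"]] += 1
        remaining.remove(best)
    # Chronological ordering is optional; it is applied here for readability.
    return sorted(adopted, key=lambda u: u["id"])


units = [
    {"id": 0, "segment": "blue", "highlight": False},
    {"id": 1, "segment": "blue", "highlight": True},
    {"id": 2, "segment": "blue", "highlight": True},
    {"id": 3, "segment": "white", "highlight": False},
    {"id": 4, "segment": "green", "highlight": True},
]
print([u["id"] for u in adopt_units(units, 3)])
```

A theme such as surfing could be modeled in the same framework by adding a theme-dependent weight to the sort key, which is left out here to keep the sketch short.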

For example, the video processing apparatus 100 can be realized as a camera such as an action camera or a wearable camera. Such cameras often shoot continuously for long periods, and their composition tends to be monotonous. It is therefore desirable to edit the video shot by such a camera into a summary video that gathers its highlights. However, because such cameras are often small or have a simple UI, it can be difficult to edit manually while checking the video. It is therefore desirable that an appropriate summary video be generated even from video with monotonous composition shot continuously over a long period. In this respect, even for such video, the video processing apparatus 100 according to the present embodiment can generate a summary video in which shots that have dispersed attributes, follow the theme specified by the user, and include highlights are switched according to the BGM. Note that the video processing apparatus 100 may be realized as a general video camera or the like, or as an information processing apparatus separate from the camera, such as a PC (personal computer) or a server on a network.

The overview of the video processing apparatus 100 according to the present embodiment has been described above. Next, a basic configuration example of the video processing apparatus 100 according to the present embodiment will be described with reference to FIG. 4.

<2. Basic configuration>
FIG. 4 is a block diagram showing an example of the logical configuration of the video processing apparatus 100 according to the present embodiment. As shown in FIG. 4, the video processing apparatus 100 includes an input unit 110, a storage unit 120, an output unit 130, and a control unit 140.

(1) Input unit 110
The input unit 110 has a function of accepting input of various kinds of information from the outside. As shown in FIG. 4, the input unit 110 includes a sensor unit 111, an operation unit 112, a video acquisition unit 113, and a music acquisition unit 114.

(1.1) Sensor unit 111
The sensor unit 111 has a function of detecting the motion of the subject. For example, the sensor unit 111 may include a gyro sensor, an acceleration sensor, and a gravity sensor. The subject is the target being shot and also includes the photographer (user). The sensor unit 111 may include any other sensor, such as a GPS (Global Positioning System) receiver, an infrared sensor, a proximity sensor, or a touch sensor. The sensor unit 111 outputs sensor information indicating the sensing result to the control unit 140. Note that the sensor unit 111 need not be formed integrally with the video processing apparatus 100. For example, the sensor unit 111 may acquire sensor information from a sensor attached to the subject via wired or wireless communication.

(1.2) Operation unit 112
The operation unit 112 has a function of accepting user operations. For example, the operation unit 112 is realized by buttons, a touch pad, and the like. The operation unit 112 can accept operations such as zoom operations during shooting and shooting mode setting operations. Possible shooting modes include, for example, a normal mode that shoots a moving image and a simultaneous shooting mode that shoots a moving image and still images at the same time. The operation unit 112 can also accept, during or after shooting, an editing instruction that designates a section to be included in the summary video. The operation unit 112 outputs operation information indicating the content of the user operation to the control unit 140.

(1.3) Video acquisition unit 113
The video acquisition unit 113 has a function of acquiring video. For example, the video acquisition unit 113 is realized as an imaging device and outputs captured image (moving image / still image) data as a digital signal. The video acquisition unit 113 may further include a microphone that picks up ambient sound and acquires sound data converted into a digital signal via an amplifier and an ADC (analog-to-digital converter). In that case, the video acquisition unit 113 outputs video data accompanied by the ambient sound.

(1.4) Music acquisition unit 114
The music acquisition unit 114 has a function of acquiring the music data that serves as the BGM of the summary video. For example, the music acquisition unit 114 is realized as a wired or wireless interface and acquires music data from another device such as a PC or a server. An example of a wired interface is a connector compliant with a standard such as USB (Universal Serial Bus). An example of a wireless interface is a communication device compliant with a communication standard such as Bluetooth (registered trademark) or Wi-Fi (registered trademark). The music acquisition unit 114 outputs the acquired music data to the control unit 140.

(2) Storage unit 120
The storage unit 120 has a function of storing various kinds of information. For example, the storage unit 120 stores information output from the input unit 110 and information generated by the control unit 140.

(3) Output unit 130
The output unit 130 has a function of outputting various kinds of information. For example, the output unit 130 may have a function of playing back the summary video generated by the summary video generation unit 146 described later. In that case, the output unit 130 may include a display unit and a speaker. The output unit 130 may also have a function of outputting the editing information generated by the editing unit 144 described later. In that case, the output unit 130 may include a wired or wireless interface.

(4) Control unit 140
The control unit 140 functions as an arithmetic processing device and a control device, and controls overall operation within the video processing apparatus 100 according to various programs. As shown in FIG. 4, the control unit 140 includes a music analysis unit 141, a video analysis unit 142, an extraction unit 143, an editing unit 144, an operation mode control unit 145, and a summary video generation unit 146.

(4.1) Music analysis unit 141
The music analysis unit 141 has a function of analyzing the content of the input music. Specifically, the music analysis unit 141 performs analysis based on music theory on the music data acquired by the music acquisition unit 114.

The music analysis unit 141 may analyze the structure of the music. For example, by analyzing the structure of the music, the music analysis unit 141 identifies parts that satisfy a predetermined condition. For example, based on music theory, the music analysis unit 141 can identify components of the music such as the intro part, the melody (verse) part, the chorus part (also called "sabi"), the interlude part, the solo part, and the ending (outro) part. The melody part may be further divided into an A melody (Melody A) and a B melody (Melody B). Furthermore, the music analysis unit 141 may detect the chord progression in each identified component of the music, and may identify a particularly important part (section) within the chorus part based on the detected chord signal. The music analysis unit 141 may also identify, as particularly important parts within the chorus part, the section where the vocals begin, the section where the vocal pitch is highest, and the like.

The music analysis unit 141 may also analyze the rhythm of the music. For example, the music analysis unit 141 analyzes the beats and the bars (measures) of the music. In quadruple time, for example, one bar contains four equally spaced beats, the first of which coincides with the start of the bar. A beat that coincides with the start of a bar is hereinafter also referred to as a bar-head beat.
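The relationship between beats and bar-head beats in quadruple time can be illustrated with a short Python sketch. It assumes a constant tempo and that the first detected beat falls on the start of a bar; real music analysis would need to estimate both. The function names `beat_times` and `bar_head_beats` are hypothetical.

```python
from typing import List


def beat_times(bpm: float, n_beats: int, offset: float = 0.0) -> List[float]:
    """Return evenly spaced beat times in seconds for a constant tempo."""
    period = 60.0 / bpm
    return [offset + i * period for i in range(n_beats)]


def bar_head_beats(beats: List[float], beats_per_bar: int = 4) -> List[float]:
    """Return the beats that coincide with the start of a bar.

    Assumes the first beat in `beats` is itself a bar-head beat.
    """
    return beats[::beats_per_bar]


beats = beat_times(bpm=120, n_beats=12)  # assumed tempo: 120 beats per minute
heads = bar_head_beats(beats)
print(heads)
```

At 120 beats per minute the beats are 0.5 s apart, so with four beats per bar every fourth beat (every 2 s) is a bar-head beat.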

The music analysis unit 141 outputs music analysis result information indicating the analysis result to the editing unit 144. The music analysis result information includes, for example, information indicating the position of each component in the music data, the positions of particularly important parts, the position of each beat, and the position of each bar.

(4.2) Video analysis unit 142
The video analysis unit 142 has a function of analyzing the content of the input video. Specifically, the video analysis unit 142 analyzes the content of the video data acquired by the video acquisition unit 113. The video analysis unit 142 then outputs video analysis result information indicating the result of analyzing the video content to the extraction unit 143.

・Highlight detection
For example, the video analysis unit 142 detects highlights based on the information input through the input unit 110, and outputs information indicating the detected highlights as part of the video analysis result information. As an example, the following describes how the video analysis unit 142 detects highlights related to subject motion, user operations, and faces and smiles.

For example, the video analysis unit 142 detects predetermined motions of the subject based on the sensor information acquired by the sensor unit 111. Based on the sensor information, the video analysis unit 142 can detect subject motions such as a jump, a change of traveling direction (turn), running, acceleration, or deceleration. The video analysis unit 142 may also detect predetermined motions of the subject by performing image recognition processing on the video data acquired by the video acquisition unit 113. For the subject motion detection process, the video analysis result information may include information indicating the detected motion of the subject and information indicating the section of the video data in which that motion is detected.

例えば、映像解析部１４２は、操作部１１２により取得された操作情報に基づいて、ユーザ操作を検出する。例えば、映像解析部１４２は、撮影中に取得された操作情報に基づいて、ズーム操作や撮影モードの設定操作などの所定の操作などを検出する。ユーザ操作の検出処理に関して、映像解析結果情報は、検出されたユーザ操作を示す情報、及び映像データにおいて当該ユーザ操作が検出される区間を示す情報を含み得る。他にも、映像解析部１４２は、撮影中又は撮影後に取得された操作情報に基づいて、編集指示を検出する。この場合、映像解析結果情報は、ユーザにより要約映像に含めるべき区間として指定された区間を示す情報を含み得る。 For example, the video analysis unit 142 detects a user operation based on the operation information acquired by the operation unit 112. For example, the video analysis unit 142 detects a predetermined operation such as a zoom operation or a shooting mode setting operation based on operation information acquired during shooting. Regarding the user operation detection process, the video analysis result information may include information indicating the detected user operation and information indicating a section in which the user operation is detected in the video data. In addition, the video analysis unit 142 detects an editing instruction based on operation information acquired during or after shooting. In this case, the video analysis result information may include information indicating a section designated as a section to be included in the summary video by the user.

例えば、映像解析部１４２は、映像取得部１１３により取得された映像データを対象として画像認識処理を行うことで、被写体の顔及び笑顔を検出する。顔及び笑顔の検出処理に関して、映像解析結果情報は、映像データにおいて顔及び笑顔が検出される区間、領域、並びに顔及び笑顔の数を示す情報を含み得る。 For example, the video analysis unit 142 performs image recognition processing on the video data acquired by the video acquisition unit 113 to detect the face and smile of the subject. Regarding the face and smile detection process, the video analysis result information may include information indicating a section and a region where the face and smile are detected in the video data, and the number of faces and smiles.

例えば、映像解析部１４２は、映像取得部１１３により取得された映像データを対象として音声認識処理を行うことで、歓声が沸いた区間を検出する。歓声の検出処理に関して、映像解析結果情報は、映像データにおいて歓声が検出される区間、並びに音量を示す情報を含み得る。 For example, the video analysis unit 142 performs audio recognition processing on the video data acquired by the video acquisition unit 113 to detect sections in which cheering erupts. Regarding the cheer detection process, the video analysis result information may include information indicating the section of the video data in which cheering is detected and its volume.

例えば、映像解析部１４２は、映像取得部１１３により取得された映像データを対象として画像認識処理を行うことで、特定のイベントにおける重要シーンを検出する。重要シーンとしては、結婚式におけるケーキカットや指輪交換等が挙げられる。重要シーンの検出処理に関して、映像解析結果情報は、映像データにおいて重要シーンが検出される区間、並びに重要度を示す情報を含み得る。 For example, the video analysis unit 142 detects an important scene in a specific event by performing an image recognition process on the video data acquired by the video acquisition unit 113. Important scenes include wedding cake cuts and ring exchange. Regarding the important scene detection processing, the video analysis result information may include information indicating the section in which the important scene is detected in the video data and the importance.

・シーンセグメントのための情報の検出 Detection of information for scene segments
例えば、映像解析部１４２は、入力部１１０により入力された情報に基づいてシーンセグメントのための情報を検出し、検出したシーンセグメントのための情報を映像解析結果情報に含めて出力する。一例として、映像解析部１４２が、色、カメラワーク、日時及び場所に関するシーンセグメントのための情報を検出する例を説明する。 For example, the video analysis unit 142 detects information for scene segments based on the information input by the input unit 110, and includes the detected information for scene segments in the video analysis result information that it outputs. As an example, a case will be described in which the video analysis unit 142 detects information for scene segments relating to color, camerawork, date and time, and location.

例えば、映像解析部１４２は、映像取得部１１３により取得された映像データを対象として画像認識処理を行うことで、映像の色を検出し得る。詳しくは、映像解析部１４２は、映像のＹＵＶ又はＲＧＢ等を解析して、カラーヒストグラムを各フレーム又は複数のフレームごとに検出する。そして、映像解析部１４２は、各フレームにおいて支配的な色を当該フレームの色として検出する。なお、検出された色を識別するための識別情報を、色ＩＤとも称する。色の検出処理に関して、映像解析結果情報は、各区間の色ＩＤを示す情報を含み得る。 For example, the video analysis unit 142 can detect the color of the video by performing image recognition processing on the video data acquired by the video acquisition unit 113. Specifically, the video analysis unit 142 analyzes YUV or RGB of the video and detects a color histogram for each frame or a plurality of frames. Then, the video analysis unit 142 detects a dominant color in each frame as the color of the frame. The identification information for identifying the detected color is also referred to as a color ID. Regarding the color detection process, the video analysis result information may include information indicating the color ID of each section.

例えば、映像解析部１４２は、映像取得部１１３により取得された映像データを対象として画像認識処理を行うことで、カメラワークを検出し得る。例えば、映像解析部１４２は、各フレーム又は複数のフレームごとに動きベクトルを検出することで、静止、上下又は左右といったカメラワークを検出する。なお、検出されたカメラワークを識別するための識別情報を、カメラワークＩＤとも称する。カメラワークの検出処理に関して、映像解析結果情報は、各区間のカメラワークＩＤを示す情報を含み得る。 For example, the video analysis unit 142 can detect camerawork by performing image recognition processing on the video data acquired by the video acquisition unit 113. For example, the video analysis unit 142 detects camera work such as still, up and down, left and right by detecting a motion vector for each frame or for each of a plurality of frames. The identification information for identifying the detected camera work is also referred to as a camera work ID. Regarding the camera work detection process, the video analysis result information may include information indicating the camera work ID of each section.

例えば、映像解析部１４２は、センサ部１１１に含まれるＧＰＳ又は映像取得部１１３に含まれるカメラ等に内蔵された時計により取得された撮影日時を検出し得る。なお、検出された撮影日時を識別するための識別情報を、撮影日時ＩＤとも称する。同一又は近しい日時に撮影された区間には、同一の撮影日時ＩＤが付されるものとする。撮影日時の検出処理に関して、映像解析結果情報は、撮影日時セグメントの各々の撮影日時ＩＤ及び区間を示す情報を含み得る。 For example, the video analysis unit 142 can detect the shooting date and time acquired by a GPS included in the sensor unit 111 or a clock built in a camera or the like included in the video acquisition unit 113. Note that the identification information for identifying the detected shooting date / time is also referred to as a shooting date / time ID. It is assumed that the same shooting date / time ID is assigned to sections shot at the same or close date / time. Regarding the shooting date / time detection process, the video analysis result information may include information indicating the shooting date / time ID and the section of each shooting date / time segment.

例えば、映像解析部１４２は、センサ部１１１含まれるＧＰＳにより取得された位置情報に基づいて、撮影された場所を検出し得る。なお、検出された撮影場所を識別するための識別情報を、撮影場所ＩＤとも称する。同一又は近しい場所に撮影された区間には、同一の撮影場所ＩＤが付されるものとする。撮影場所の検出処理に関して、映像解析結果情報は、各区間の撮影場所ＩＤを示す情報を含み得る。 For example, the video analysis unit 142 can detect the location where the image was taken based on position information acquired by the GPS included in the sensor unit 111. The identification information for identifying the detected shooting location is also referred to as a shooting location ID. It is assumed that the same shooting location ID is attached to sections shot at the same or close locations. Regarding the shooting location detection process, the video analysis result information may include information indicating the shooting location ID of each section.

（４．３）抽出部１４３
抽出部１４３は、入力された映像から複数の単位映像を抽出する機能を有する。詳しくは、抽出部１４３は、映像解析部１４２による解析結果に基づいて、映像取得部１１３により取得された映像データから複数の単位映像を抽出する。詳しくは、抽出部１４３は、解析結果情報が示す映像の属性が同一である一続きの映像を、単位映像として抽出する。 (4.3) Extraction unit 143
The extraction unit 143 has a function of extracting a plurality of unit videos from the input video. Specifically, the extraction unit 143 extracts a plurality of unit videos from the video data acquired by the video acquisition unit 113 based on the analysis result by the video analysis unit 142. Specifically, the extraction unit 143 extracts a series of videos having the same video attributes indicated by the analysis result information as unit videos.

例えば、抽出部１４３は、シーンセグメントが同一である一続きの映像を単位映像として抽出してもよい。また、抽出部１４３は、ハイライトが検出された映像を単位映像として抽出してもよい。詳しくは、抽出部１４３は、被写体のジャンプ等の所定の動作が検出された区間を、ひとつの単位映像として抽出してもよい。また、抽出部１４３は、ズーム操作や撮影モードの設定操作等の所定の操作が検出された区間、又はユーザにより要約映像に含めるべき区間として指定された区間を、それぞれひとつの単位映像として抽出してもよい。その際、抽出部１４３は、ズーム操作であればズーム後の区間を単位映像として抽出してもよく、撮影モードの設定操作であれば同時撮影モードで撮影された区間を単位映像として抽出してもよい。また、抽出部１４３は、被写体の顔又は笑顔が検出された区間、即ち被写体の状態が笑顔である又はカメラに顔を向けている等の所定の状態であると検出された区間、又はその前後の区間を、ひとつの単位映像として抽出してもよい。また、抽出部１４３は、歓声が沸いている区間を、ひとつの単位映像として抽出してもよい。また、抽出部１４３は、特定のイベントにおける重要シーンが撮影された区間を、ひとつの単位映像として抽出してもよい。抽出部１４３は、これらの抽出基準を組み合わせて用いてもよい。 For example, the extraction unit 143 may extract a continuous run of video belonging to the same scene segment as a unit video. The extraction unit 143 may also extract video in which a highlight is detected as a unit video. Specifically, the extraction unit 143 may extract a section in which a predetermined motion, such as a jump by the subject, is detected as one unit video. The extraction unit 143 may also extract, each as one unit video, a section in which a predetermined operation such as a zoom operation or a shooting mode setting operation is detected, or a section designated by the user as a section to be included in the summary video. In that case, the extraction unit 143 may extract the section after zooming as a unit video for a zoom operation, and may extract the section shot in the simultaneous shooting mode as a unit video for a shooting mode setting operation. The extraction unit 143 may also extract, as one unit video, a section in which the face or smile of the subject is detected, that is, a section in which the subject is detected to be in a predetermined state such as smiling or facing the camera, or the sections before and after it. The extraction unit 143 may also extract a section in which cheering erupts as one unit video.
The extraction unit 143 may extract a section in which an important scene in a specific event is captured as one unit video. The extraction unit 143 may use a combination of these extraction criteria.

抽出部１４３は、映像解析部１４２による解析結果に基づいて、抽出した単位映像に注目度を設定してもよい。例えば、抽出部１４３は、ハイライトに相当する区間の単位映像に高い注目度を設定する。詳しくは、抽出部１４３は、映像解析部１４２により、単位映像の撮影区間における被写体の動作が所定の動作であると解析された場合、被写体の状態が所定の状態であると解析された場合又は所定の操作があったと解析された場合に、当該単位映像に高い注目度を設定する。他にも、抽出部１４３は、映像解析部１４２により、単位映像の撮影区間において歓声が沸いたと解析された場合、重要シーンであったと解析された場合に、当該単位映像に高い注目度を設定する。これにより、被写体のジャンプ等の所定の動作が検出された区間に該当する単位映像に、高い注目度が設定される。また、被写体の状態が笑顔である又はカメラに顔を向けている等の所定の状態であると検出された区間に該当する単位映像に、高い注目度が設定される。また、ズーム操作や撮影モードの設定操作等の所定の操作が検出された区間に該当する単位映像に、高い注目度が設定される。また、歓声が沸いた区間に該当する単位映像に高い注目度が設定される。また、結婚式のケーキカットや指輪交換等の特定のイベントにおける重要なシーンが検出された区間に該当する単位映像に、高い注目度が設定される。他にも、抽出部１４３は、ユーザにより要約映像に含めるべき区間として指定された区間に該当する単位映像に、高い注目度を設定してもよい。そして、抽出部１４３は、上述した以外の他の場合に低い注目度を設定する。以下では、注目度が高い単位映像を、ハイライトショットとも称する。また、注目度が低い単位映像を、サブショットとも称する。また、抽出されたハイライトショットの種類を識別するための識別情報を、ハイライトＩＤとも称する。例えば、ハイライトＩＤは、ジャンプ、ズーム操作、歓声、重要シーン、ユーザにより指定された等のハイライトの種類に応じて異なるＩＤが設定され得る。 The extraction unit 143 may set a degree of attention for each extracted unit video based on the analysis result by the video analysis unit 142. For example, the extraction unit 143 sets a high degree of attention for a unit video in a section corresponding to a highlight. Specifically, the extraction unit 143 sets a high degree of attention for a unit video when the video analysis unit 142 has determined that, in the shooting section of that unit video, the motion of the subject is a predetermined motion, the state of the subject is a predetermined state, or a predetermined operation was performed. The extraction unit 143 also sets a high degree of attention for a unit video when the video analysis unit 142 has determined that cheering erupted in its shooting section or that the section was an important scene. As a result, a high degree of attention is set for a unit video corresponding to a section in which a predetermined motion such as a jump by the subject is detected.
In addition, a high degree of attention is set for a unit video corresponding to a section in which the subject is detected to be in a predetermined state, such as smiling or facing the camera. A high degree of attention is likewise set for a unit video corresponding to a section in which a predetermined operation such as a zoom operation or a shooting mode setting operation is detected, for a unit video corresponding to a section in which cheering erupted, and for a unit video corresponding to a section in which an important scene of a specific event, such as a wedding cake cutting or ring exchange, is detected. In addition, the extraction unit 143 may set a high degree of attention for a unit video corresponding to a section designated by the user as a section to be included in the summary video. In all other cases, the extraction unit 143 sets a low degree of attention. Hereinafter, a unit video with a high degree of attention is also referred to as a highlight shot, and a unit video with a low degree of attention is also referred to as a sub-shot. The identification information for identifying the type of an extracted highlight shot is also referred to as a highlight ID. For example, different highlight IDs can be set according to the type of highlight, such as jump, zoom operation, cheer, important scene, or user-designated.
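The attention rule described above can be pictured with a minimal sketch. The type labels, the `UnitVideo` class, and the `classify` function are hypothetical names chosen for illustration; the description only states that a unit video in which some highlight was detected receives a high degree of attention (highlight shot, with a highlight ID per type) and every other unit video a low one (sub-shot).

```python
from dataclasses import dataclass, field

# Hypothetical labels for the highlight types named in the description.
HIGHLIGHT_TYPES = {"jump", "zoom", "smile", "cheer", "important_scene", "user_designated"}


@dataclass
class UnitVideo:
    start: float  # section start in seconds
    end: float    # section end in seconds
    detected: set = field(default_factory=set)  # highlight types detected in the section


def classify(unit):
    """Return ("highlight", highlight_ids) for high attention, else ("sub", [])."""
    ids = sorted(unit.detected & HIGHLIGHT_TYPES)
    return ("highlight", ids) if ids else ("sub", [])


print(classify(UnitVideo(0.0, 5.0, {"jump"})))  # a highlight shot
print(classify(UnitVideo(5.0, 9.0)))            # a sub-shot
```

A binary high/low split is used here because that is all the passage specifies; graded attention values would be a further assumption.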

（４．４）編集部１４４
編集部１４４は、抽出部１４３により抽出された単位映像を、入力された音楽に応じて切り替えるための編集情報を生成する機能を有する。例えば、編集部１４４は、入力されたどの音楽のどの区間をＢＧＭとするかを設定する。そして、編集部１４４は、ＢＧＭにする音楽を音楽解析部１４１による音楽解析結果により区切り、各区間に抽出部１４３により抽出された単位映像を割り当てる。これにより、要約映像において、音楽が区切られたタイミングで単位映像が切り替わることとなる。単位映像の割り当ての際、編集部１４４は、抽出部１４３により抽出された単位映像から全部又は一部を要約映像に採用する単位映像として決定し、採用した単位映像を各区間に割り当て得る。なお、編集部１４４は、原則撮影時刻の順に単位映像を割り当てるものとする。もちろん、編集部１４４は、撮影時刻に依存せずに単位映像を割り当ててもよい。このように、編集部１４４は、入力されたどの音楽のどの区間をＢＧＭとして、どの単位映像をどのタイミングで切り替えるかを設定することで、編集情報を生成する。編集部１４４による処理の詳細については、後に詳しく説明する。 (4.4) Editing unit 144
The editing unit 144 has a function of generating editing information for switching the unit videos extracted by the extraction unit 143 according to the input music. For example, the editing unit 144 sets which section of which input music is to be used as BGM. Then, the editing unit 144 divides the BGM music according to the music analysis result from the music analysis unit 141, and assigns the unit videos extracted by the extraction unit 143 to the resulting sections. As a result, in the summary video, the unit video switches at the timings at which the music is divided. When assigning unit videos, the editing unit 144 may select all or some of the unit videos extracted by the extraction unit 143 as the unit videos to be adopted in the summary video, and assign the adopted unit videos to the sections. Note that the editing unit 144 assigns unit videos in order of shooting time in principle. Of course, the editing unit 144 may also assign unit videos independently of shooting time. In this way, the editing unit 144 generates the editing information by setting which section of which input music is used as BGM and which unit video is switched to at which timing. The processing by the editing unit 144 will be described in detail later.

（４．５）動作モード制御部１４５
動作モード制御部１４５は、抽出部１４３及び編集部１４４における動作モードを制御する機能を有する。動作モード制御部１４５は、抽出部１４３による単位映像の抽出結果、及び編集部１４４による切り替えタイミングの設定結果に応じて、動作モードを制御する。動作モード制御部１４５による処理の詳細については、後に詳しく説明する。 (4.5) Operation mode control unit 145
The operation mode control unit 145 has a function of controlling operation modes in the extraction unit 143 and the editing unit 144. The operation mode control unit 145 controls the operation mode according to the unit video extraction result by the extraction unit 143 and the switching timing setting result by the editing unit 144. Details of processing by the operation mode control unit 145 will be described in detail later.

（４．６）要約映像生成部１４６
要約映像生成部１４６は、音楽と編集情報に基づいて切り替わる単位映像とから成る要約映像を生成する機能を有する。例えば、要約映像生成部１４６は、編集情報により指定された音楽をＢＧＭとして、編集情報により指定された単位映像を指定されたタイミングで切り替えて連結することで、要約映像を生成する。 (4.6) Summary video generation unit 146
The summary video generation unit 146 has a function of generating a summary video composed of music and unit videos that are switched based on editing information. For example, the summary video generation unit 146 generates a summary video by switching and connecting the unit video specified by the editing information at the specified timing with the music specified by the editing information as BGM.

＜３．機能詳細＞
以上、本実施形態に係る映像処理装置１００の基本構成を説明した。続いて、映像処理装置１００が有する機能を詳細に説明する。 <3. Function details>
The basic configuration of the video processing apparatus 100 according to the present embodiment has been described above. Next, functions of the video processing apparatus 100 will be described in detail.

［３．１．単位映像の抽出処理］
抽出部１４３は、映像解析部１４２による解析結果に基づいて、映像取得部１１３により取得された映像データから複数の単位映像を抽出する。具体的には、抽出部１４３は、映像解析部１４２により解析された映像の属性に応じて単位映像を抽出する。例えば、抽出部１４３は、シーンセグメントのための情報及びハイライトを示す情報に基づいて、映像データからハイライトショット及びサブショットを抽出する。以下、図５を参照して、映像解析結果に基づく単位映像の抽出処理を具体的に説明する。 [3.1. Unit video extraction processing]
The extraction unit 143 extracts a plurality of unit videos from the video data acquired by the video acquisition unit 113 based on the analysis result by the video analysis unit 142. Specifically, the extraction unit 143 extracts unit videos according to the video attributes analyzed by the video analysis unit 142. For example, the extraction unit 143 extracts highlight shots and sub-shots from the video data based on the information for scene segments and the information indicating highlights. Hereinafter, the unit video extraction process based on the video analysis result will be described in detail with reference to FIG. 5.

図５は、本実施形態に係る単位映像の抽出処理を説明するための図である。図５では、抽出部１４３がハイライトショット２６０Ａ〜２６０Ｅ及びサブショット２７０Ａ〜２７０Ｇを抽出する処理を概略的に示している。図５に示すように、抽出部１４３は、まずシーンセグメントのための情報に基づいて、シーンセグメント２１０を生成する。例えば、抽出部１４３は、色ＩＤが同一の区間をセグメント化することで、シーンセグメント２１０を生成する。抽出部１４３は、シーンセグメントのための情報を複数用いてもよく、例えば、色ＩＤ、カメラワークＩＤ、撮影場所ＩＤ及び撮影日時ＩＤが同一の区間をセグメント化することで、シーンセグメント２１０を生成してもよい。次いで、抽出部１４３は、シーンセグメント２１０とハイライト２２０との紐付けを行い、入力された映像２３０からハイライトショット２４０Ａ〜２４０Ｅを抽出する。そして、抽出部１４３は、入力された映像２３０のシーンセグメント２１０により区分される区間をサブショットとして抽出する。ただし、抽出部１４３は、ハイライトショット２４０と重なる、時間が短い（例えば、後述する最長の割り当て区間の長さより短い）、極端に明るい若しくは暗い、又はカメラワークが安定しない区間を除外することで、サブショット２５０を抽出してもよい。以下では、映像結果情報に基づいて抽出部１４３により抽出された単位映像、即ちハイライトショット及びサブショットの数を、抽出数とも称する。 FIG. 5 is a diagram for explaining the unit video extraction process according to the present embodiment. FIG. 5 schematically shows a process in which the extraction unit 143 extracts highlight shots 260A to 260E and sub-shots 270A to 270G. As shown in FIG. 5, the extraction unit 143 first generates scene segments 210 based on the information for scene segments. For example, the extraction unit 143 generates the scene segments 210 by segmenting sections having the same color ID. The extraction unit 143 may use multiple kinds of information for scene segments; for example, it may generate the scene segments 210 by segmenting sections in which the color ID, camerawork ID, shooting location ID, and shooting date/time ID are all the same. Next, the extraction unit 143 associates the scene segments 210 with the highlights 220 and extracts highlight shots 240A to 240E from the input video 230. Then, the extraction unit 143 extracts the sections of the input video 230 delimited by the scene segments 210 as sub-shots. However, the extraction unit 143 may extract the sub-shots 250 after excluding sections that overlap a highlight shot 240, are short (for example, shorter than the length of the longest allocation section described later), are extremely bright or dark, or have unstable camerawork.
Hereinafter, the number of unit videos extracted by the extraction unit 143 based on the video result information, that is, the number of highlight shots and sub-shots is also referred to as the number of extractions.
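As a rough illustration of the segmentation step, the sketch below groups consecutive frames whose attribute IDs are identical into scene segments, then keeps as sub-shot candidates only the segments that do not overlap a highlight and are long enough. The function names, the frame-index representation, and the sample IDs are assumptions made for this example; the brightness and camerawork-stability checks mentioned in the description are omitted.

```python
from itertools import groupby


def scene_segments(frame_ids):
    """Group consecutive frames with identical IDs into (start, end) ranges, end exclusive.

    Each element of frame_ids may be a single ID (e.g. a color ID) or a tuple
    such as (color_id, camerawork_id, location_id, datetime_id).
    """
    segments = []
    pos = 0
    for _, run in groupby(frame_ids):
        n = len(list(run))
        segments.append((pos, pos + n))
        pos += n
    return segments


def sub_shots(segments, highlights, min_len):
    """Keep segments of at least min_len frames that overlap no highlight range."""
    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]

    return [s for s in segments
            if s[1] - s[0] >= min_len
            and not any(overlaps(s, h) for h in highlights)]


ids = ["red"] * 4 + ["blue"] * 2 + ["red"] * 5
segs = scene_segments(ids)
print(segs)                          # [(0, 4), (4, 6), (6, 11)]
print(sub_shots(segs, [(1, 3)], 3))  # [(6, 11)]
```

In the description, `min_len` would correspond to the length of the longest allocation section, so that any surviving sub-shot can fill any section of the music.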

［３．２．切り替えタイミングの設定処理］
編集部１４４は、音楽解析部１４１から出力された音楽解析結果情報に基づいて、入力された音楽に応じて単位映像の切り替えタイミングを設定する。例えば、編集部１４４は、抽出部１４３により抽出された単位映像を、音楽解析部１４１により解析された構成要素に応じて、小節に応じて、又は拍に応じて切り替えるための編集情報を生成してもよい。具体的には、編集部１４４は、入力された音楽を、構成要素が切り替わるタイミング、小節が切り替わるタイミング、又は拍に応じたタイミングで区切り、その区切った位置に単位映像の切り替えタイミングを設定する。 [3.2. Switching timing setting process]
Based on the music analysis result information output from the music analysis unit 141, the editing unit 144 sets the switching timing of the unit videos according to the input music. For example, the editing unit 144 may generate editing information for switching the unit videos extracted by the extraction unit 143 according to the components analyzed by the music analysis unit 141, according to measures, or according to beats. Specifically, the editing unit 144 divides the input music at timings at which the component changes, timings at which the measure changes, or timings according to the beat, and sets the unit video switching timings at the division positions.

例えば、拍に応じたタイミングとして、編集部１４４は、単位映像を１拍ごとに切り替えるための編集情報を生成してもよい。その場合、単位映像がテンポ良くスピード感を持って切り替わることとなり、鑑賞者の感情を盛り上げることが可能となる。ただし、編集部１４４は、音楽の拍の速さが閾値を超える場合に、単位映像を複数拍ごとに切り替えるための編集情報を生成してもよい。例えば、２拍ごとに単位映像が切り替わってもよい。これにより、ＢＧＭがテンポの速い音楽である場合に、単位映像があまりに早く切り替わってしまうことが防止されるので、鑑賞者にせわしない印象を与えてしまうことを回避することができる。 For example, as a timing according to the beat, the editing unit 144 may generate editing information for switching the unit video on every beat. In that case, the unit videos switch briskly and with a sense of speed, which can heighten the viewer's emotions. However, when the tempo of the music exceeds a threshold, the editing unit 144 may generate editing information for switching the unit video every several beats. For example, the unit video may switch every two beats. This prevents the unit videos from switching too quickly when the BGM is fast-tempo music, and thus avoids giving the viewer a hectic impression.
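The tempo rule can be sketched as follows. The threshold of 140 beats per minute and the function names are assumptions for illustration only; the description says merely that switching moves from every beat to every several beats (for example, two) once the tempo exceeds some threshold.

```python
def beats_per_switch(bpm, threshold_bpm=140.0):
    """Switch on every beat, or on every second beat when the tempo exceeds the threshold."""
    return 2 if bpm > threshold_bpm else 1


def switch_times(beat_times, bpm, threshold_bpm=140.0):
    """Pick the beat timestamps (in seconds) at which the unit video switches."""
    return beat_times[::beats_per_switch(bpm, threshold_bpm)]


print(switch_times([0.0, 0.5, 1.0, 1.5], bpm=120))  # every beat
print(switch_times([0.0, 0.3, 0.6, 0.9], bpm=200))  # every second beat
```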

例えば、編集部１４４は、音楽解析部１４１により解析された音楽の構造の種類ごとに、拍に応じた単位映像の切り替えの実施回数を設定してもよい。具体的には、編集部１４４は、イントロ部分及びコーラス部分といった音楽の構成要素ごとに、拍に応じた単位映像の切り替えの実施回数を設定してもよい。さらに、編集部１４４は、前記音楽解析部により特定された所定の条件を満たす部分で、拍に応じた単位映像の切り替えを実施してもよい。具体的には、編集部１４４は、コーラス部分の中でも、ボーカルの歌い始めの部分、最もボーカルの音程が高い部分等の特に重要な部分で、拍に応じた単位映像の切り替えを実施してもよい。これにより、ＢＧＭの盛り上がりに合わせて拍に応じた単位映像の切り替えを実施することが可能となり、より効果的に鑑賞者の感情を盛り上げることが可能となる。 For example, the editing unit 144 may set the number of times beat-based unit video switching is performed for each type of music structure analyzed by the music analysis unit 141. Specifically, the editing unit 144 may set the number of times beat-based switching is performed for each component of the music, such as the intro part and the chorus part. Further, the editing unit 144 may perform beat-based unit video switching in a part that satisfies a predetermined condition specified by the music analysis unit 141. Specifically, the editing unit 144 may perform beat-based switching in particularly important parts of the chorus, such as the part where the vocal begins or the part where the vocal pitch is highest. This makes it possible to perform beat-based switching in step with the climax of the BGM, and to heighten the viewer's emotions more effectively.

例えば、編集部１４４は、音楽解析部１４１により解析された音楽の小節の単位で、拍に応じた単位映像の切り替えの実施有無を選択してもよい。この場合、小節の単位で拍に応じた単位映像の切り替えが行われることとなる。人は意識的又は無意識的にしろ、小節を意識しながら音楽を聴き、展開を予測するものであると考えられる。そのため、小節の単位で拍に応じた単位映像の切り替えは、鑑賞者に受け入れられやすいので、容易に鑑賞者の感情を盛り上げることが可能となる。さらに、小節の単位での拍に応じた単位映像の切り替えは、小節の単位での単位映像の切り替えと整合性が良い。また、編集部１４４は、拍に応じた単位映像の切り替えを実施する小節同士を離間させてもよい。これにより、拍に応じた単位映像の切り替えが連続する複数の小節で行われることがなくなり、過度の切り替えが防止される。 For example, the editing unit 144 may select, in units of the measures of the music analyzed by the music analysis unit 141, whether or not to perform beat-based unit video switching. In this case, beat-based switching is performed measure by measure. People are thought to listen to music while being aware of measures, consciously or unconsciously, and to anticipate how it will develop. Beat-based switching in units of measures is therefore readily accepted by viewers and can easily heighten their emotions. Furthermore, beat-based switching in units of measures fits well with switching the unit video in units of measures. The editing unit 144 may also space apart the measures in which beat-based switching is performed. This prevents beat-based switching from occurring in several consecutive measures, and thus prevents excessive switching.

なお、設定された切り替えタイミングにより音楽が区切られる区間を、以下では割り当て区間とも称する。つまり、切り替えタイミングを設定することは、各単位映像をどのくらいの長さ要約映像に割り当てるか、という割り当て区間を設定することに相当する。割り当て区間のうち最長の区間を、以下では最長の割り当て区間とも称する。 A section in which music is divided at the set switching timing is also referred to as an allocation section below. In other words, setting the switching timing is equivalent to setting an allocation interval of how long each unit video is allocated to the summary video. The longest section among the allocation sections is also referred to as the longest allocation section below.

上述した、単位映像の切り替えタイミングの設定は、例えば予め設定された確率テーブルに基づいて設定されてもよい。その際、編集部１４４は、音楽の構成要素の切り替わりのタイミングでは必ず単位映像を切り替えること、最長の割り当て区間の長さの設定等のルールに従ってもよい。 The unit video switching timing described above may be set based on, for example, a preset probability table. In doing so, the editing unit 144 may follow rules such as always switching the unit video at the timing at which the component of the music changes, and a setting for the length of the longest allocation section.

なお、ひとつの小節内で拍に応じて切り替わる前記単位映像は、互いに類似することが望ましい。これにより、鑑賞者に煩雑な印象を与えてしまうことを回避することが可能となる。互いに類似するとは、例えば被写体の動作、撮影日時、撮影場所、色又はカメラワークの少なくともいずれかが近いことを指す。例えば、色が同一でカメラワークが右から左へ移動する単位映像と左から右へ単位映像とは、互いに類似すると言える。また、被写体がジャンプしている単位映像同士は、互いに類似すると言える。また、互いに類似するとは、例えば単位映像に特定の被写体が含まれることを指していてもよい。例えば、同一人物や同一チームの人物が含まれる単位映像は類似すると言える。ここで、ひとつの小節内で拍に応じて切り替わる単位映像の少なくともひとつは、ひとつの小節内で２回以上採用されてもよい。例えば４拍子であれば、単位映像Ａ、単位映像Ｂ、単位映像Ａ、単位映像Ｂの順に採用されてもよいし、単位映像Ａ、単位映像Ａ、単位映像Ａ、単位映像Ａの順に採用されてもよい。これにより、鑑賞者に煩雑な印象を与えてしまうことを回避することがより容易になる。もちろん、ひとつの小節内で拍に応じて切り替わる単位映像は、それぞれ異なっていてもよい。例えば４拍子であれば、単位映像Ａ、単位映像Ｂ、単位映像Ｃ、単位映像Ｄの順に採用されてもよい。 Note that the unit videos that switch according to the beat within one measure are desirably similar to each other. This avoids giving the viewer a cluttered impression. Being similar to each other means, for example, that at least one of the motion of the subject, the shooting date and time, the shooting location, the color, or the camerawork is close. For example, a unit video in which the camerawork moves from right to left and a unit video in which it moves from left to right, both having the same color, can be said to be similar to each other. Unit videos in which the subject is jumping can also be said to be similar to each other. Being similar to each other may also mean, for example, that the unit videos contain a specific subject. For example, unit videos that contain the same person or people from the same team can be said to be similar. Here, at least one of the unit videos that switch according to the beat within one measure may be adopted two or more times within that measure. For example, in quadruple time, the unit videos may be adopted in the order unit video A, unit video B, unit video A, unit video B, or in the order unit video A, unit video A, unit video A, unit video A. This makes it easier to avoid giving the viewer a cluttered impression.
Of course, the unit videos that switch according to the beat within one measure may all be different. For example, in quadruple time, the unit videos may be adopted in the order unit video A, unit video B, unit video C, unit video D.

以下、図６を参照して、音楽解析結果に基づく単位映像の切り替えタイミングの設定処理を具体的に説明する。 Hereinafter, with reference to FIG. 6, the setting process of the switching timing of the unit video based on the music analysis result will be specifically described.

図６は、本実施形態に係る単位映像の切り替えタイミングの設定処理を説明するための図である。図６では、音楽のうちＢＧＭとして使用される区間３１０の構成要素３２０、及び設定される切り替えタイミング３３０を示している。切り替えタイミング３３０の区分け線が切り替えタイミングを示しており、区分け線により区分される区間が、割り当て区間を示している。図６に示すように、構成要素３２０としてメロディ部分、コーラス部分及びエンディング部分が含まれている。また、図６に示した音楽は、ひとつの小節３４３に１つの小節頭の拍３４２及び３つの拍３４１が含まれる４拍子の音楽である。図６に示した例では、編集部１４４は、構成要素３２０がメロディからコーラスに切り替わるタイミング、及びコーラスからエンディングに切り替わるタイミングで、単位映像の切り替えタイミングを設定している。また、編集部１４４は、１小節単位の割り当て区間３５１Ａ〜３５１Ｄを設定し、２小節単位の割り当て区間３５２を設定し、３小節単位の割り当て区間３５３を設定し、１拍単位の割り当て区間３５４を設定している。そのため、区間３５４において、１拍ごとに単位映像が切り替えられる。この場合、最長の割り当て区間３６０は３小節分である。 FIG. 6 is a diagram for explaining the unit video switching timing setting process according to the present embodiment. FIG. 6 shows the components 320 of the section 310 of the music used as BGM, and the switching timings 330 that are set. The dividing lines of the switching timings 330 indicate the switching timings, and the sections delimited by the dividing lines are the allocation sections. As shown in FIG. 6, the components 320 include a melody part, a chorus part, and an ending part. The music shown in FIG. 6 is in quadruple time, with each bar 343 containing one bar-head beat 342 and three beats 341. In the example shown in FIG. 6, the editing unit 144 sets unit video switching timings at the timing at which the component 320 switches from melody to chorus and at the timing at which it switches from chorus to ending. The editing unit 144 also sets allocation sections 351A to 351D in units of one bar, an allocation section 352 in units of two bars, an allocation section 353 in units of three bars, and an allocation section 354 in units of one beat. Accordingly, in the section 354, the unit video is switched on every beat. In this case, the longest allocation section 360 is three bars long.

下記の表１に、図６に示した例における、ＢＧＭ全体において及び各構成要素において採用される単位映像の個数を、切り替えタイミングの種類（割り当て区間の長さ）ごとに示した。 Table 1 below shows the number of unit videos employed in the entire BGM and in each component in the example shown in FIG. 6 for each switching timing type (allocation section length).

なお、１拍ごとに単位映像が切り替えられる場合、ひとつの単位映像が複数回採用される場合も考えられるので、選択される単位映像の個数は最大４個となる。表１を参照すると、図６に示した例では、全体で最大１０個の単位映像が要約映像に採用される。また、図６に示した例では、最長の割り当て区間は３小節分である。 Note that when the unit video is switched for each beat, one unit video may be adopted a plurality of times, and therefore, the maximum number of unit videos to be selected is four. Referring to Table 1, in the example shown in FIG. 6, a total of up to 10 unit videos are adopted as summary videos. Further, in the example shown in FIG. 6, the longest allocation section is for three bars.

このように、要約映像に採用される単位映像の数は、編集部１４４が音楽解析結果情報に基づいて設定した切り替えタイミングにより定まる割り当て区間の数、即ち音楽が区切られた数により定まる。以下では、音楽解析結果情報に基づいて編集部１４４により音楽が区切られた数を、採用数とも称する。例えば、図６に示した例では、採用数は最大１０個である。より詳しくは、拍に応じた切り替えの内容が、単位映像Ａ、単位映像Ｂ、単位映像Ｃ、単位映像Ｄであれば、採用数は１０個となる。また、拍に応じた切り替えの内容が、単位映像Ａ、単位映像Ｂ、単位映像Ａ、単位映像Ｂであれば、採用数は８個となる As described above, the number of unit videos adopted in the summary video is determined by the number of allocation sections fixed by the switching timings that the editing unit 144 sets based on the music analysis result information, that is, by the number of sections into which the music is divided. Hereinafter, the number of sections into which the editing unit 144 divides the music based on the music analysis result information is also referred to as the adopted number. For example, in the example shown in FIG. 6, the adopted number is at most 10. More specifically, if the beat-based switching consists of unit video A, unit video B, unit video C, and unit video D, the adopted number is 10. If the beat-based switching consists of unit video A, unit video B, unit video A, and unit video B, the adopted number is 8.
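The counts of 10 and 8 can be checked with a short sketch. It reads the FIG. 6 description as six bar-level allocation sections (351A to 351D, 352, and 353), each taking one distinct unit video, plus one bar (354) switched beat by beat, where a repeated unit video is counted once. The function name and the letter labels are illustrative assumptions.

```python
def adopted_number(bar_level_sections, beat_assignments):
    """Count distinct unit videos: one per bar-level section plus distinct beat-level ones."""
    return bar_level_sections + len(set(beat_assignments))


print(adopted_number(6, ["A", "B", "C", "D"]))  # 10
print(adopted_number(6, ["A", "B", "A", "B"]))  # 8
```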

編集部１４４は、切り替えタイミングの設定処理において設定した切り替えタイミングで、抽出部１４３により抽出された単位映像を切り替えてもよい。他にも、編集部１４４は、切り替えタイミングの設定処理において設定した切り替えタイミングを変更してもよい。例えば、編集部１４４は、切り替えタイミングの設定処理において設定した、割り当て区間の総数（採用数に相当）及び割り当て区間の長さごとの数を保持しつつ、割り当て区間の順序を入れ換えてもよい。そのような例については、後述する採用区間の設定処理において説明する。 The editing unit 144 may switch the unit videos extracted by the extraction unit 143 at the switching timings set in the switching timing setting process. Alternatively, the editing unit 144 may change the switching timings set in that process. For example, the editing unit 144 may reorder the allocation sections while preserving the total number of allocation sections (corresponding to the adopted number) and the number of allocation sections of each length set in the switching timing setting process. Such an example will be described later in the adopted section setting process.

[3.3. Operation mode decision processing]
The switching timing setting process and the unit video extraction process described above may be performed in either order.

When the switching timing setting process is performed first, the unit video extraction process is subject to restrictions arising from the switching timing setting process. For example, the extraction unit 143 may be required to extract at least as many unit videos as the number of adoptions. Under this restriction, the unit videos switch within the summary video without any of them being repeated. The extraction unit 143 may also be required to extract unit videos that are at least as long as the longest allocation section (three bars in the example shown in FIG. 6), so that each extracted unit video can be used at any timing. Under this restriction, any extracted unit video can be allocated to the longest allocation section.

When the unit video extraction process is performed first, the switching timing setting process is subject to restrictions arising from the unit video extraction process. For example, the editing unit 144 is required to set the switching timings so that fewer unit videos are allocated than the number of unit videos extracted by the extraction unit 143. Under this restriction, the unit videos switch within the summary video without any of them being repeated. The editing unit 144 may also be required to set the switching timings so that each allocation section has a length suited to the length of each unit video extracted by the extraction unit 143. Under this restriction, a suitable allocation section can be allocated to each unit video extracted by the extraction unit 143.

The operation mode control unit 145 can change the operation modes of the extraction unit 143 and the editing unit 144 in order to satisfy these restrictions. The following describes the case where the switching timing setting process is performed first.

First, the operation mode control unit 145 operates the extraction unit 143 and the editing unit 144 with the operation mode set to the normal processing mode (first operation mode). In the normal processing mode, the editing unit 144 sets the unit video switching timings using the music analysis result information, as described above. The extraction unit 143 extracts the unit videos using the video analysis result information, as described above.

The operation mode control unit 145 determines, according to the magnitude relationship between the number of extractions and the number of adoptions in the normal processing mode, whether to change the operation mode and have at least one of the extraction process by the extraction unit 143 and the adoption process by the editing unit 144 performed again. The extraction process here refers to the unit video extraction process described above, and the adoption process here refers to the switching timing setting process described above. As described above, the number of extractions is required to be equal to or greater than the number of adoptions. When this restriction is not satisfied, the operation mode control unit 145 makes it possible to satisfy it by changing the operation mode.

For example, the operation mode control unit 145 determines not to change the operation mode when, in the normal processing mode, the number of adoptions equals the number of extractions or the number of extractions is larger. In other words, the operation mode control unit 145 determines not to change the operation mode when the number of extractions is equal to or greater than the number of adoptions. This is because the restriction described above, that the number of extractions be equal to or greater than the number of adoptions, is already satisfied without changing the operation mode.

On the other hand, when the number of extractions in the normal processing mode is smaller than the number of adoptions, the operation mode control unit 145 can change the operation mode to another operation mode. For example, the operation mode control unit 145 can change the operation mode to the division processing mode (second operation mode) or the retry processing mode (fifth operation mode).

In the division processing mode, the extraction unit 143 divides at least one of the unit videos extracted in the normal processing mode into two or more unit videos. For example, the extraction unit 143 may target for division those unit videos, among the ones extracted in the normal processing mode, whose length exceeds a threshold. The extraction unit 143 may also determine the number of divisions so that each unit video after division is at least as long as the longest allocation section. Because the division processing mode increases the number of extractions, the restriction that the number of extractions be equal to or greater than the number of adoptions can be satisfied.
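The division described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_unit_videos`, the use of equal-length parts, and the representation of unit videos as plain durations (in bars) are hypothetical and are not specified in the disclosure.

```python
def split_unit_videos(durations, threshold, longest_section):
    """Split unit videos that are longer than `threshold` into equal parts.

    Hypothetical sketch: each part is kept at least `longest_section` long,
    so any part could still fill the longest allocation section.
    All lengths use the same unit (for example, bars).
    """
    result = []
    for duration in durations:
        if duration > threshold:
            # Largest number of parts that keeps every part >= longest_section.
            parts = max(int(duration // longest_section), 1)
        else:
            parts = 1  # Short unit videos are left undivided.
        result.extend([duration / parts] * parts)
    return result


# Three extracted unit videos become six after division.
divided = split_unit_videos([7, 3, 10], threshold=5, longest_section=3)
```

In this example the number of extractions rises from three to six while every resulting unit video remains at least three bars long.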

In the retry processing mode, the editing unit 144 sets the switching timings by dividing the music at predetermined intervals, and the extraction unit 143 extracts unit videos obtained by dividing the video at predetermined intervals. For example, the editing unit 144 divides the input music at equal intervals or at preset intervals and sets the division points as the switching timings. Likewise, the extraction unit 143 divides the input video at equal intervals or at preset intervals and extracts the resulting pieces as unit videos. In other words, the extraction unit 143 extracts unit videos without considering highlights. In the retry processing mode, the number of adoptions and the number of extractions can be adjusted freely by adjusting the division interval, so the restriction that the number of extractions be equal to or greater than the number of adoptions can be satisfied.
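As a rough sketch of the retry processing mode, the same fixed-interval division can be applied to both the music and the video. The function name, the choice to drop a trailing partial segment, and the example lengths in seconds are assumptions made for illustration only.

```python
def split_at_fixed_interval(total_length, interval):
    """Return (start, end) segments of length `interval`.

    Hypothetical sketch: a trailing segment shorter than `interval`
    is dropped. Lengths are in seconds.
    """
    count = int(total_length // interval)
    return [(i * interval, (i + 1) * interval) for i in range(count)]


# Music of 60 s divided every 4 s gives the allocation sections.
allocation_sections = split_at_fixed_interval(60, 4)
# Video of 600 s divided every 20 s gives the unit videos.
unit_videos = split_at_fixed_interval(600, 20)

number_of_adoptions = len(allocation_sections)    # 15
number_of_extractions = len(unit_videos)          # 30
```

Choosing the two intervals independently is what allows the number of extractions to be kept at or above the number of adoptions.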

The operation modes described above are compared with reference to FIG. 7. FIG. 7 is a diagram for explaining an example of the operation modes of the video processing apparatus 100 according to the present embodiment. As shown in FIG. 7, in the normal processing mode, the video analysis result information and the music analysis result information are used, and a summary video with a video quality of "high" is generated. In the division processing mode, the video analysis result information is modified before use. Specifically, as shown in FIG. 7, the unit video 410 extracted in the normal processing mode is divided into unit videos 411 and 412. Similarly, the unit video 420 is divided into unit videos 421, 422, and 423, and the unit video 430 is divided into unit videos 431, 432, and 433. In the division processing mode, what was originally a single unit video is divided into several, each of which can be adopted in the summary video. Because similar unit videos can therefore be adopted in the summary video, the video quality of the summary video is "medium". In the retry processing mode, the video analysis result information and the music analysis result information are ignored. Specifically, as shown in FIG. 7, the switching timings are equally spaced, and the unit videos are obtained by dividing the input video at equal intervals. The summary video generated in the retry processing mode is therefore monotonous, and its video quality is "low".

When the number of extractions in the normal processing mode is smaller than the number of adoptions, the operation mode control unit 145 may change the operation mode to an operation mode other than the division processing mode and the retry processing mode. For example, the operation mode control unit 145 can change the operation mode to the longest allocation section shortening processing mode (third operation mode) or the sub-shot condition relaxation processing mode (fourth operation mode).

In the longest allocation section shortening processing mode, the editing unit 144 makes the longest allocation section shorter than in the normal processing mode. As a result, the extraction unit 143 extracts unit videos that are at least as long as this longest allocation section, which is shorter than in the normal processing mode. In the example shown in FIG. 6, the extraction unit 143 extracts unit videos at least three bars long in the normal processing mode. In the longest allocation section shortening processing mode, by contrast, the extraction unit 143 extracts unit videos that are, for example, at least two bars long. The extraction unit 143 can thereby extract as sub-shots the video of sections that are only two bars long and were too short to be extracted as sub-shots in the normal processing mode. Because the number of extractions thus increases in the longest allocation section shortening processing mode, the restriction that the number of extractions be equal to or greater than the number of adoptions can be satisfied.

In the sub-shot condition relaxation processing mode, the extraction unit 143 relaxes, compared with the normal processing mode, the conditions on the analysis results from the video analysis unit 142 that are used to extract unit videos. For example, the extraction unit 143 extracts a section as a unit video even if it is short, even if it is extremely bright or dark, or even if the camera work in it is unstable. Because the number of extractions thus increases in the sub-shot condition relaxation processing mode, the restriction that the number of extractions be equal to or greater than the number of adoptions can be satisfied.

The operation modes listed above may be applied in any order. For example, after the normal processing mode, the operation mode control unit 145 may change the operation mode in the order of the division processing mode, the longest allocation section shortening processing mode, the sub-shot condition relaxation processing mode, and the retry processing mode. The operation mode control unit 145 may also use any combination of the operation modes described above. Furthermore, the operation mode control unit 145 may have processing that uses all or some of the operation modes described above performed in parallel, and may select the operation mode that yields the highest-quality result.
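One of the orderings mentioned above can be sketched as a simple fallback loop. The enum names, the `run_mode` callback, and the example counts are hypothetical; the sketch only illustrates trying modes in sequence until the number of extractions is equal to or greater than the number of adoptions.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = 1                    # first operation mode
    DIVISION = 2                  # second operation mode
    SHORTEN_LONGEST_SECTION = 3   # third operation mode
    RELAX_SUBSHOT_CONDITIONS = 4  # fourth operation mode
    RETRY = 5                     # fifth operation mode


# One possible order; the order is arbitrary according to the description.
FALLBACK_ORDER = [
    Mode.NORMAL,
    Mode.DIVISION,
    Mode.SHORTEN_LONGEST_SECTION,
    Mode.RELAX_SUBSHOT_CONDITIONS,
    Mode.RETRY,
]


def choose_mode(run_mode):
    """Return the first mode whose result satisfies extractions >= adoptions.

    `run_mode(mode)` is a hypothetical callback returning the pair
    (number_of_extractions, number_of_adoptions) obtained in that mode.
    Returns None if no mode satisfies the restriction.
    """
    for mode in FALLBACK_ORDER:
        extractions, adoptions = run_mode(mode)
        if extractions >= adoptions:
            return mode
    return None


# Made-up counts for illustration only.
example_counts = {
    Mode.NORMAL: (6, 10),
    Mode.DIVISION: (8, 10),
    Mode.SHORTEN_LONGEST_SECTION: (11, 10),
    Mode.RELAX_SUBSHOT_CONDITIONS: (14, 10),
    Mode.RETRY: (30, 15),
}
chosen = choose_mode(lambda mode: example_counts[mode])
```

With these made-up counts, the normal and division modes fall short, so the loop stops at the longest allocation section shortening mode.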

[3.4. Unit video selection process]
(Overview)
The editing unit 144 selects, from the unit videos extracted by the extraction unit 143, the unit videos to be adopted in the summary video. For example, the editing unit 144 selects as many unit videos as the number of adoptions, giving priority to highlights. The unit video selection process is described below with reference to FIGS. 8 and 9.

FIG. 8 is a diagram for explaining the unit video selection process according to the present embodiment. As shown in FIG. 8, the editing unit 144 first selects one or more sub-shots 510 as candidates for the unit videos to be adopted in the summary video. The selected shots 520 are the unit videos selected as candidates for adoption in the summary video. The editing unit 144 may select the sub-shots 510 so that, for example, the scene segments are spread out and/or the selection follows a theme specified by the user. For example, the editing unit 144 selects the sub-shots 510 in descending order of the evaluation value given by an evaluation function described later. The labels [1], [2], [3], [4], [5], [6], and [7] in the figure indicate the selection order obtained with the evaluation function. The number of adoptions is assumed to be seven. As shown in FIG. 8, the editing unit 144 arranges the selected unit videos in the selected shots 520 in order of shooting time.

FIG. 9 is a diagram for explaining the unit video selection process according to the present embodiment. As shown in FIG. 9, the editing unit 144 selects highlight shots 530 as candidates for the unit videos to be adopted in the summary video. The editing unit 144 may select the highlight shots 530 so that, for example, adjacent unit videos in the selected shots do not have the same highlight. For example, the editing unit 144 selects the highlight shots 530 in descending order of the evaluation value given by an evaluation function described later. In exchange for each highlight shot 530 it selects, the editing unit 144 removes a low-priority sub-shot 540 from the sub-shots already selected. An example of a low-priority sub-shot 540 is a sub-shot that came late in the selection order. The labels [1] and [2] in the figure indicate the selection order and the removal order obtained with the evaluation function.

(Sub-shot evaluation function)
An example of an evaluation function used for selecting sub-shots is described below. For example, the editing unit 144 can select sub-shots using the evaluation function shown in Equation 1 below.

W_si Si and W_ss Ss in Equation 1 above are terms related to the scene segment. The symbols W_si and W_ss are the weights of the respective terms and can be set arbitrarily by the editing unit 144. The symbol Si is a value (score) related to the segment ID of the scene segment. For example, Si is calculated based on the color ID, camera work ID, shooting date/time ID, and/or location ID used for scene segmentation. For example, to follow a preset theme, the score may be calculated so that the selection approaches the proportion of segment IDs that matches the preset theme. To reduce visual bias, the score may instead be calculated so that the segment IDs are selected evenly. The symbol Ss is a score related to the stability of the scene segment. Ss is calculated based on the stability (how little it changes over time) of the color and/or camera work used for scene segmentation. For example, the higher the stability, the higher the calculated score may be. In addition, the editing unit 144 may add a term related to the source video file to Equation 1 in order to spread the selection across source video files. The editing unit 144 may also add to Equation 1 a term related to the time to the preceding and following selected shots in order to spread out the shooting times.

Each time it selects one sub-shot, the editing unit 144 calculates the evaluation function shown in Equation 1 for each sub-shot not yet selected and selects the sub-shot with the highest evaluation value. Note that the score for each symbol may vary depending on the sub-shots already selected.
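The greedy selection described above can be sketched as follows. The form of the evaluation function is an assumption: the sketch uses a weighted sum W_si * Si + W_ss * Ss, and it assumes that Si decreases as more sub-shots with the same segment ID have already been selected (one simple way to make segment IDs be selected evenly). The field names and numeric values are hypothetical.

```python
from collections import Counter


def select_subshots(subshots, count, w_si=1.0, w_ss=1.0):
    """Greedily pick `count` sub-shots, re-scoring after every pick.

    Assumed evaluation value: w_si * Si + w_ss * Ss, where
    Si = 1 / (1 + number of already selected sub-shots with the same
    segment ID) and Ss is the given stability score.
    """
    remaining = list(subshots)
    selected = []
    picked_ids = Counter()

    def score(shot):
        si = 1.0 / (1 + picked_ids[shot["segment_id"]])
        return w_si * si + w_ss * shot["stability"]

    while remaining and len(selected) < count:
        best = max(remaining, key=score)  # Re-evaluated on every iteration.
        remaining.remove(best)
        selected.append(best)
        picked_ids[best["segment_id"]] += 1
    return selected


candidates = [
    {"name": "a", "segment_id": 1, "stability": 0.9},
    {"name": "b", "segment_id": 1, "stability": 0.8},
    {"name": "c", "segment_id": 2, "stability": 0.5},
    {"name": "d", "segment_id": 2, "stability": 0.4},
]
order = [shot["name"] for shot in select_subshots(candidates, count=3)]
```

Here sub-shot "c" is picked before the more stable "b", because selecting "a" first lowers the Si score of the other sub-shot in the same scene segment.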

(Highlight shot evaluation function)
An example of an evaluation function used for selecting highlight shots is described below. For example, the editing unit 144 can select highlight shots using the evaluation function shown in Equation 2 below.

W_hi Hi and W_hs Hs in Equation 2 above are terms related to the highlight. The symbols W_hi and W_hs are the weights of the respective terms and can be set arbitrarily by the editing unit 144. The symbol Hi is a score related to the highlight ID. For example, Hi is calculated based on the highlight ID. For example, to follow a preset theme, the score may be calculated so that the selection approaches the proportion of highlight IDs that matches the preset theme. To reduce visual bias, the score may instead be calculated so that the highlight IDs are selected evenly. The symbol Hs is a score related to the value of the highlight shot. For a snowboard jump, for example, a higher Hs score may be calculated the longer the airtime and the greater the amount of rotation. The other symbols are the same as in Equation 1 above.

Each time it selects one highlight shot, the editing unit 144 calculates the evaluation function shown in Equation 2 for each highlight shot not yet selected and selects the highlight shot with the highest evaluation value. The editing unit 144 then removes, from the sub-shots already selected, a sub-shot that came late in the selection order. Note that the score for each symbol may vary depending on the highlight shots already selected.

By using the symbol Hi, the editing unit 144 can avoid, for example, a run of consecutive jump highlight shots. Note that for highlight shots in sections that the user has designated for inclusion in the summary video, the score for the symbol Hi may be ignored. In that case, unit videos of jumps designated as highlights by the user may appear consecutively. By using the symbol Hs, the editing unit 144 can preferentially select highlight shots of high value.

Note that the editing unit 144 may limit the number of times highlight shots with the same highlight ID are selected to fewer than a preset number. For example, the editing unit 144 may select highlight shots that satisfy the following formula. According to this formula, even if a jump highlight shot could normally be selected up to two times, a jump with a high Hs score may be selected three or more times, and a jump with a low Hs score may be selected fewer than two times.
Highlight score Hs - attenuation coefficient × number of selections ≥ threshold ... (Equation 3)
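The effect of Equation 3 can be checked with a small sketch. The attenuation coefficient of 0.25, the threshold of 0.5, and the two Hs scores are made-up values chosen only to show how a high score permits more selections than a low one.

```python
def may_select(hs, times_selected, attenuation, threshold):
    """Equation 3: Hs - attenuation * number of selections >= threshold."""
    return hs - attenuation * times_selected >= threshold


def max_selections(hs, attenuation, threshold):
    """Count how many times a highlight with score `hs` can be selected.

    Assumes attenuation > 0 so that the loop terminates.
    """
    count = 0
    while may_select(hs, count, attenuation, threshold):
        count += 1
    return count


high_value_jump = max_selections(1.0, attenuation=0.25, threshold=0.5)
low_value_jump = max_selections(0.6, attenuation=0.25, threshold=0.5)
```

With these values the high-scoring jump can be selected three times and the low-scoring jump only once, which matches the behavior described for Equation 3.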

An example of the unit video selection process has been described above. The description above covered an example in which sub-shots are selected first and highlight shots are selected afterward, but the present technology is not limited to this example. For example, the editing unit 144 may select highlight shots first and sub-shots afterward. In that case, the editing unit 144 first selects highlight shots and then selects a number of sub-shots equal to the number of adoptions minus the number of selected highlight shots. Alternatively, the editing unit 144 may select highlight shots and sub-shots at the same time. In that case, the editing unit 144 can apply a common evaluation function to highlight shots and sub-shots. Because sub-shots have no scores for the highlight ID or the value of a highlight shot (symbols Hi and Hs), the common evaluation function can be applied by setting the corresponding terms to an arbitrary value (for example, 0).

[3.5. Adopted section setting process]
The editing unit 144 sets, in each unit video extracted by the extraction unit 143, an adopted section according to the content of that unit video, and generates editing information for adopting the adopted sections set for the respective unit videos. For example, the editing unit 144 sets the adopted section to be used in the summary video according to the content of the unit video, and generates editing information for joining the adopted sections that have been set. Note that an adopted section is the section of a unit video that is adopted in the summary video. An adopted section may be the whole of a unit video or a part of it.

For example, the editing unit 144 may set the position of the adopted section within a unit video according to the content of that unit video. For example, the editing unit 144 can set the position of the adopted section according to content such as whether the unit video is a highlight shot or a sub-shot and what its attributes are, such as highlight ID, color ID, and camera work ID. The position of the adopted section indicates where in the whole unit video the section set as the adopted section lies, for example the first part, the middle, or the last part of the unit video. As a result, a section better suited to, for example, stirring the viewer's emotions is set according to the content of the unit video and adopted in the summary video.

For example, the editing unit 144 may set the position of the adopted section within a unit video according to the motion of the subject of the video as analyzed by the video analysis unit 142. Consider, for example, a highlight shot of a snowboard jump. For a unit video in which the video analysis unit 142 has analyzed the subject's motion as a jump, the editing unit 144 may set the adopted section at any of the following positions: during the approach run, from the approach run to airborne, while airborne, from airborne to after landing, or from landing to after landing. In that case, the editing unit 144 can set adopted sections that focus on the various noteworthy moments of a jump. As another example, consider a highlight shot of a snowboard turn (a change in the direction of movement). For a unit video in which the video analysis unit 142 has analyzed the subject's motion as a change in the direction of movement, the editing unit 144 may set the adopted section at any of the following positions: from before the turn to during the turn, during the turn, or from during the turn to after the turn. In that case, the editing unit 144 can set adopted sections that focus on the various noteworthy moments of a turn.

For example, when setting adopted sections in two or more highlight shots of the same type (the same highlight ID), the editing unit 144 may vary the position of the adopted section across those highlight shots. For example, when the selected shots include several highlight shots of snowboard jumps, the editing unit 144 may vary the position of the adopted section among the approach run, from the approach run to airborne, while airborne, from airborne to after landing, and from landing to after landing. Similarly, when the selected shots include several highlight shots of snowboard turns, the editing unit 144 may vary the position of the adopted section among from before the turn to during the turn, during the turn, and from during the turn to after the turn. In that case, the adopted sections are set from different viewpoints even for highlight shots of the same type, so the viewer can watch the summary video without getting bored.
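One simple way to vary the positions, shown here only as a sketch, is to cycle through the candidate positions separately for each highlight ID. The position labels and the round-robin policy are assumptions for illustration; the description does not prescribe how the positions are varied.

```python
from itertools import cycle

# Hypothetical labels for the positions listed in the description.
POSITIONS_BY_HIGHLIGHT = {
    "jump": ["approach", "approach_to_air", "air",
             "air_to_after_landing", "landing_to_after_landing"],
    "turn": ["before_to_during", "during", "during_to_after"],
}


def assign_positions(highlight_ids, positions_by_highlight):
    """Give each highlight shot the next unused position for its highlight ID."""
    cyclers = {hid: cycle(positions)
               for hid, positions in positions_by_highlight.items()}
    return [next(cyclers[hid]) for hid in highlight_ids]


shots = ["jump", "turn", "jump", "jump"]
positions = assign_positions(shots, POSITIONS_BY_HIGHLIGHT)
```

In this example the three jumps receive three different positions, so no two jump shots show the same phase of the jump.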

For example, the editing unit 144 may generate the editing information so that a highlight shot is joined to a highlight shot of another type or to a sub-shot. For example, the editing unit 144 allocates highlight shots with the same highlight ID so that they are not consecutive, or, where they would be consecutive, allocates a sub-shot between them. This gives the summary video variation in pacing, so the viewer can watch it without getting bored.

For example, the editing unit 144 may set the adopted section of a highlight shot to be longer than the adopted section of a sub-shot. For example, the editing unit 144 preferentially allocates highlight shots to long allocation sections. This lets the viewer watch highlight shots for longer, which stirs the viewer's emotions more effectively.

The adopted section setting process is described in detail below with reference to FIGS. 10 to 12. FIGS. 10 to 12 are diagrams for explaining the adopted section setting process according to the present embodiment. In particular, FIG. 10 illustrates an example in which highlight shots are preferentially allocated to long allocation sections.

As shown in FIG. 10, assume that the allocation sections 710 set in the switching timing setting process consist of two one-bar allocation sections 711, four two-bar allocation sections 712, and one three-bar allocation section 713. For example, the editing unit 144 preferentially allocates highlight shots to long allocation sections according to the rules shown in Table 2 below. The rules shown in Table 2 may be further subdivided according to the type of highlight, the type of scene segment, and so on.

As shown in FIG. 10, assume that the selected shots 720 are, in order, sub-shot 721A, highlight shot 722A, sub-shot 721B, highlight shot 722B, sub-shot 721C, sub-shot 721D, and highlight shot 722C. By assigning an allocation section to each unit video as described below, the editing unit 144 generates editing information 730 that specifies which unit video is switched to at which timing.

First, the editing unit 144 assigns to sub-shot 721A, the first selected shot 720, the one-bar allocation section 711A, which has the highest priority among the remaining allocation sections. Next, it assigns to highlight shot 722A, the second selected shot 720, the three-bar allocation section 713, which has the highest priority among the remaining allocation sections. It then assigns to sub-shot 721B, the third selected shot 720, the one-bar allocation section 711B, which has the highest priority among the remaining allocation sections. Next, it assigns to highlight shot 722B, the fourth selected shot 720, the two-bar allocation section 712A, which has the highest priority among the remaining allocation sections. It then assigns to sub-shot 721C, the fifth selected shot 720, the two-bar allocation section 712B, which has the highest priority among the remaining allocation sections. Next, it assigns to sub-shot 721D, the sixth selected shot 720, the two-bar allocation section 712C, which has the highest priority among the remaining allocation sections. Finally, it assigns the one remaining two-bar allocation section 712D to highlight shot 722C, the seventh selected shot 720.
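The walkthrough above amounts to a greedy assignment in which each shot, in order, takes the most preferred allocation section still available. The following sketch reproduces the FIG. 10 example under stated assumptions. Table 2 is not reproduced in this text, so the preference orders in `PREFERENCE` (highlight shots prefer longer sections, sub-shots prefer shorter ones) are inferred from the walkthrough. The names `PREFERENCE` and `assign_sections` are hypothetical.

```python
from collections import Counter

# Assumed preference orders over section lengths (in bars), inferred from
# the walkthrough; the actual Table 2 rules may differ or be more detailed.
PREFERENCE = {
    "highlight": (3, 2, 1),
    "sub": (1, 2, 3),
}


def assign_sections(shot_kinds, section_lengths):
    """Greedily give each shot the most preferred section length still left."""
    remaining = Counter(section_lengths)
    assigned = []
    for kind in shot_kinds:
        for length in PREFERENCE[kind]:
            if remaining[length] > 0:
                remaining[length] -= 1
                assigned.append(length)
                break
        else:
            raise ValueError("no allocation section left for this shot")
    return assigned


# FIG. 10 example: two 1-bar, four 2-bar, and one 3-bar allocation sections.
sections = [1, 1, 2, 2, 2, 2, 3]
shots = ["sub", "highlight", "sub", "highlight", "sub", "sub", "highlight"]
print(assign_sections(shots, sections))  # [1, 3, 1, 2, 2, 2, 2]
```

The printed lengths correspond to sections 711A, 713, 711B, 712A, 712B, 712C, and 712D in the order described in the text.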

Note that the assignment described above is performed for each single component of the music, such as a melody section. In that case, whatever assignment is made within a component, the unit video is guaranteed to switch at the timing at which the component changes.

Next, an example of setting the adopted section within a single unit video is described with reference to FIGS. 11 and 12. For example, as shown in FIG. 11, the editing unit 144 basically sets the adopted section 750 in the central portion of the unit video 740. On the other hand, as shown in FIG. 12, for a highlight shot such as a turn, the editing unit 144 may set the adopted section 750 in the first half, the central portion, or the second half of the unit video 740. Here, the length of the adopted section 750 set by the editing unit 144 corresponds to the length of the allocation section assigned to each unit video, as described with reference to FIG. 10.
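As a rough illustration of the placement described above, the sketch below computes the start and end of an adopted section inside a unit video. The function name `place_adopted_section` and the position labels are hypothetical. Treating "first half" as aligned to the start of the unit video and "second half" as aligned to its end is an assumption, since the text only names the three regions.

```python
def place_adopted_section(unit_start, unit_end, section_length, position="center"):
    """Return (start, end) of the adopted section inside a unit video.

    Times are in seconds. The alignment chosen for each position label
    is an assumption made for this example.
    """
    unit_length = unit_end - unit_start
    if section_length > unit_length:
        raise ValueError("unit video is shorter than its allocation section")
    slack = unit_length - section_length
    if position == "first_half":
        offset = 0.0
    elif position == "center":
        offset = slack / 2
    elif position == "second_half":
        offset = slack
    else:
        raise ValueError(f"unknown position: {position}")
    start = unit_start + offset
    return start, start + section_length


print(place_adopted_section(10.0, 20.0, 4.0))                 # (13.0, 17.0)
print(place_adopted_section(10.0, 20.0, 4.0, "first_half"))   # (10.0, 14.0)
print(place_adopted_section(10.0, 20.0, 4.0, "second_half"))  # (16.0, 20.0)
```

The length check mirrors the requirement that unit videos are extracted with a length of at least the longest allocation section.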

The functions of the video processing apparatus 100 according to the present embodiment have been described in detail above. Next, an example of the operation processing of the video processing apparatus 100 according to the present embodiment is described with reference to FIG. 13.

<4. Operation processing>
FIG. 13 is a flowchart illustrating an example of the flow of the summary video generation process executed in the video processing apparatus 100 according to the present embodiment.

As shown in FIG. 13, first, in step S102, the music analysis unit 141 analyzes the input music. For example, based on music theory, the music analysis unit 141 analyzes the structure of the music, such as the intro and chorus parts, identifies particularly important portions within the chorus, and analyzes beats and bars.

Next, in step S104, the video analysis unit 142 analyzes the input video. For example, the video analysis unit 142 detects subject motion, user operations, faces and smiles, colors, and camera work.

Next, in step S106, the editing unit 144 sets the switching timings of the unit videos. For example, based on the music analysis result from step S102, the editing unit 144 sets a switching timing at every beat, every bar, or every several bars. In doing so, the editing unit 144 may arrange for beat-level switching in a particularly important portion of the chorus. This step determines the length of the longest allocation section.
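A minimal sketch of deriving switching timings from analyzed beats is shown below. It assumes that the music analysis yields a list of beat times in seconds whose first entry falls on a bar boundary, and that the meter is constant. The function name `switch_timings` and its parameters are hypothetical.

```python
def switch_timings(beat_times, beats_per_bar=4, bars_per_section=1, per_beat=False):
    """Pick switching timings at every beat, every bar, or every N bars.

    Assumes beat_times[0] is the first beat of a bar and the meter is constant.
    """
    step = 1 if per_beat else beats_per_bar * bars_per_section
    return beat_times[::step]


# 16 beats at 120 BPM (0.5 s per beat), i.e. four bars in 4/4 time.
beats = [0.5 * i for i in range(16)]
print(switch_timings(beats))                      # every bar: [0.0, 2.0, 4.0, 6.0]
print(switch_timings(beats, bars_per_section=2))  # every two bars: [0.0, 4.0]
print(len(switch_timings(beats, per_beat=True)))  # every beat: 16
```

Mixing granularities within one piece of music, for example beat-level switching only in an important part of the chorus, would require applying this selection separately to each segment.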

Next, in step S108, the editing unit 144 calculates the number of unit videos to be adopted in the summary video (the adoption number). For example, the editing unit 144 calculates the adoption number based on the number of allocation sections determined by the switching timings set in step S106. Specifically, if no unit video is used more than once, the adoption number equals the number of allocation sections; if unit videos are reused, the number of reuses is subtracted from the number of allocation sections.

Next, in step S110, the extraction unit 143 extracts unit videos. For example, the extraction unit 143 extracts highlight shots and sub-shots based on the video analysis result from step S104. In doing so, it extracts each unit video with a length equal to or greater than the longest of the allocation sections determined by the switching timings set in step S106. The extraction unit 143 also calculates the total number of extracted highlight shots and sub-shots as the extraction number.

Next, in step S112, the operation mode control unit 145 determines whether the extraction number is equal to or greater than the adoption number.

If the extraction number is determined to be less than the adoption number (S112/NO), the operation mode control unit 145 changes the operation mode in step S114. For example, if the mode before the change is the normal operation mode, the operation mode control unit 145 changes it to the division processing mode. The process then returns to step S106. In this way, the operation mode control unit 145 keeps changing the operation mode and returning the process to step S106 until the extraction number is equal to or greater than the adoption number. If the extraction number does not reach the adoption number in any operation mode, the video processing apparatus 100 may output an error and stop the process.
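The loop formed by steps S106 to S114 can be sketched as a search over operation modes. The sketch below is illustrative only: `plan_for_mode` is a hypothetical callback standing in for steps S106 to S110 and returns the extraction number and adoption number for a given mode. The mode names and the order in which modes are tried are assumptions for this example.

```python
def adoption_number(section_count, reuse_count=0):
    """Adoption number: allocation sections minus reused unit videos (step S108)."""
    return section_count - reuse_count


def choose_operation_mode(modes, plan_for_mode):
    """Try each mode in order until enough unit videos are extracted (S112/S114).

    plan_for_mode(mode) is a hypothetical stand-in for steps S106 to S110 and
    returns (extraction_number, adoption_number) for that mode.
    """
    for mode in modes:
        extracted, adopted = plan_for_mode(mode)
        if extracted >= adopted:
            return mode
    # No mode produced enough unit videos: report an error and stop.
    raise RuntimeError("no operation mode yields enough unit videos")


# Hypothetical counts: the normal mode extracts 5 unit videos for 7 sections,
# while the division processing mode splits them into 9.
counts = {
    "normal": (5, adoption_number(7)),
    "division": (9, adoption_number(7)),
}
print(choose_operation_mode(["normal", "division"], counts.__getitem__))  # division
```

Raising an exception corresponds to the optional behavior of outputting an error and stopping when no mode satisfies the condition.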

If the extraction number is determined to be equal to or greater than the adoption number (S112/YES), the editing unit 144 selects, in step S116, the unit videos to adopt in the summary video. For example, from the unit videos extracted by the extraction unit 143, the editing unit 144 adopts unit videos whose attributes are dispersed so as to reduce visual bias, or selects unit videos that fit a theme specified by the user. The editing unit 144 may adopt highlight shots in preference to sub-shots.

Next, in step S118, the editing unit 144 sets the adopted section of each unit video. For example, the editing unit 144 sets, within each unit video selected in step S116, the adopted section to be used in the summary video. In doing so, it places the adopted section at an appropriate position according to the content of the unit video so that, for example, a section particularly worth noting is adopted in the summary video. The editing unit 144 stores the results of the processing described above in the editing information.

Then, in step S120, the summary video generation unit 146 generates the summary video. For example, the summary video generation unit 146 uses the music specified by the editing information as BGM and joins the unit videos specified by the editing information, switching between them at the specified timings.

An example of the flow of the summary video generation process according to the present embodiment has been described above.

<5. Hardware configuration example>
Finally, the hardware configuration of the information processing apparatus according to the present embodiment is described with reference to FIG. 14. FIG. 14 is a block diagram illustrating an example of the hardware configuration of the information processing apparatus according to the present embodiment. The information processing apparatus 900 shown in FIG. 14 can realize, for example, the video processing apparatus 100 shown in FIG. 4. Information processing by the video processing apparatus 100 according to the present embodiment is realized by cooperation between software and the hardware described below.

As shown in FIG. 14, the information processing apparatus 900 includes a CPU (Central Processing Unit) 901, a ROM (Read Only Memory) 902, a RAM (Random Access Memory) 903, and a host bus 904a. The information processing apparatus 900 also includes a bridge 904, an external bus 904b, an interface 905, an input device 906, an output device 907, a storage device 908, a drive 909, a connection port 911, a communication device 913, and a sensor 915. The information processing apparatus 900 may include a processing circuit such as a DSP or an ASIC in place of, or in addition to, the CPU 901.

The CPU 901 functions as an arithmetic processing device and a control device, and controls overall operation within the information processing apparatus 900 according to various programs. The CPU 901 may be a microprocessor. The ROM 902 stores programs, calculation parameters, and the like used by the CPU 901. The RAM 903 temporarily stores programs used in execution by the CPU 901, parameters that change as appropriate during that execution, and the like. The CPU 901 can form, for example, the control unit 140 shown in FIG. 4.

The CPU 901, the ROM 902, and the RAM 903 are connected to one another by the host bus 904a, which includes a CPU bus and the like. The host bus 904a is connected via the bridge 904 to the external bus 904b, such as a PCI (Peripheral Component Interconnect/Interface) bus. The host bus 904a, the bridge 904, and the external bus 904b do not necessarily have to be configured separately, and their functions may be implemented on a single bus.

The input device 906 is realized by a device through which the user inputs information, such as a mouse, a keyboard, a touch panel, a button, a microphone, a switch, or a lever. The input device 906 may also be, for example, a remote control device using infrared rays or other radio waves, or an externally connected device such as a mobile phone or a PDA that supports operation of the information processing apparatus 900. Furthermore, the input device 906 may include, for example, an input control circuit that generates an input signal based on information input by the user with the above input means and outputs the signal to the CPU 901. By operating the input device 906, the user of the information processing apparatus 900 can input various data to the information processing apparatus 900 and instruct it to perform processing operations. The input device 906 can form, for example, the operation unit 112 shown in FIG. 4.

The output device 907 is formed of a device capable of notifying the user of acquired information visually or audibly. Such devices include display devices such as CRT display devices, liquid crystal display devices, plasma display devices, EL display devices, and lamps; audio output devices such as speakers and headphones; and printer devices. The output device 907 outputs, for example, results obtained by various processes performed by the information processing apparatus 900. Specifically, the display device visually displays the results obtained by various processes performed by the information processing apparatus 900 in various formats such as text, images, tables, and graphs. The audio output device, on the other hand, converts an audio signal composed of reproduced audio data, acoustic data, and the like into an analog signal and outputs it audibly. The display device and the audio output device can form, for example, the output unit 130 shown in FIG. 4.

The storage device 908 is a data storage device formed as an example of a storage unit of the information processing apparatus 900. The storage device 908 is realized by, for example, a magnetic storage device such as an HDD, a semiconductor storage device, an optical storage device, or a magneto-optical storage device. The storage device 908 may include a storage medium, a recording device that records data on the storage medium, a reading device that reads data from the storage medium, a deletion device that deletes data recorded on the storage medium, and the like. The storage device 908 stores programs executed by the CPU 901, various data, various data acquired from outside, and the like. The storage device 908 can form, for example, the storage unit 120 shown in FIG. 4.

The drive 909 is a reader/writer for storage media, and is built into or externally attached to the information processing apparatus 900. The drive 909 reads information recorded on a mounted removable storage medium such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, and outputs the information to the RAM 903. The drive 909 can also write information to the removable storage medium.

The connection port 911 is an interface for connecting to an external device, for example a port for connecting to an external device capable of data transmission via USB (Universal Serial Bus) or the like. The connection port 911 can form, for example, the music acquisition unit 114 shown in FIG. 4.

The communication device 913 is, for example, a communication interface formed of a communication device or the like for connecting to the network 920. The communication device 913 is, for example, a communication card for wired or wireless LAN (Local Area Network), LTE (Long Term Evolution), Bluetooth (registered trademark), or WUSB (Wireless USB). The communication device 913 may also be a router for optical communication, a router for ADSL (Asymmetric Digital Subscriber Line), a modem for various kinds of communication, or the like. The communication device 913 can transmit and receive signals and the like to and from, for example, the Internet and other communication devices in accordance with a predetermined protocol such as TCP/IP. The communication device 913 can form, for example, the music acquisition unit 114 shown in FIG. 4.

The network 920 is a wired or wireless transmission path for information transmitted from devices connected to the network 920. For example, the network 920 may include public networks such as the Internet, telephone networks, and satellite communication networks, as well as various LANs (Local Area Networks) including Ethernet (registered trademark) and WANs (Wide Area Networks). The network 920 may also include a dedicated network such as an IP-VPN (Internet Protocol-Virtual Private Network).

The sensor 915 is, for example, any of various sensors such as an acceleration sensor, a gyro sensor, a geomagnetic sensor, an optical sensor, a sound sensor, a distance measuring sensor, or a force sensor. The sensor 915 acquires information on the state of the information processing apparatus 900 itself, such as its posture and movement speed, and information on its surrounding environment, such as the ambient brightness and noise. The sensor 915 may also include a GPS sensor that receives GPS signals and measures the latitude, longitude, and altitude of the apparatus. The sensor 915 can form, for example, the sensor unit 111 shown in FIG. 4. In the present embodiment, the sensor 915 may be separate from the information processing apparatus 900. For example, the sensor 915 may be attached to a subject, and the information processing apparatus 900 may acquire information indicating the result of sensing the subject by wired or wireless communication.

The imaging device 917 includes a lens system composed of an imaging lens, a diaphragm, a zoom lens, a focus lens, and the like; a drive system that causes the lens system to perform focus and zoom operations; and a solid-state image sensor array that photoelectrically converts the imaging light obtained by the lens system to generate an imaging signal. The solid-state image sensor array may be realized by, for example, a CCD (Charge Coupled Device) sensor array or a CMOS (Complementary Metal Oxide Semiconductor) sensor array. The imaging device 917 outputs captured image data as a digital signal. The imaging device 917 can form, for example, the video acquisition unit 113 shown in FIG. 4.

An example of a hardware configuration capable of realizing the functions of the information processing apparatus 900 according to the present embodiment has been shown above. Each of the above components may be realized using general-purpose members, or by hardware specialized for the function of that component. The hardware configuration to be used can therefore be changed as appropriate according to the technical level at the time the present embodiment is carried out.

A computer program for realizing each function of the information processing apparatus 900 according to the present embodiment as described above can be created and installed on a PC or the like. A computer-readable recording medium storing such a computer program can also be provided. The recording medium is, for example, a magnetic disk, an optical disk, a magneto-optical disk, or a flash memory. The computer program may also be distributed, for example, via a network without using a recording medium.

<6. Summary>
An embodiment of the present disclosure has been described in detail above with reference to FIGS. 1 to 14. As described above, the video processing apparatus 100 according to the present embodiment can generate a summary video that heightens the viewer's emotions by switching between appropriate unit videos at appropriate timings matched to the music.

More specifically, the video processing apparatus 100 analyzes the beats of the input music, extracts a plurality of unit videos from the input video, and generates editing information for switching between the extracted unit videos in accordance with the beats. Because the unit videos switch at fast, beat-synchronized timings, the viewer's emotions can be heightened more effectively.

The video processing apparatus 100 also sets, in each extracted unit video, an adopted section corresponding to the content of that unit video, and generates editing information for adopting the adopted section set for each of the plurality of unit videos. Thus, for each section extracted as a candidate for the summary video, the video processing apparatus 100 can set the section actually adopted in the summary video to the part of the extracted section that is particularly worth watching. As a result, sections better suited to, for example, heightening the viewer's emotions are adopted in the summary video.

The video processing apparatus 100 also controls the operation mode of the process of extracting unit videos from the input video and of the process of setting the timings at which the unit videos are switched in accordance with the input music. This allows the video processing apparatus 100 to generate, in an appropriate operation mode, a summary video in which the video switches in accordance with the music. Specifically, by switching the operation mode so that the extraction number is equal to or greater than the adoption number, the video processing apparatus 100 can switch to a different unit video at each of the set switching timings.

The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings, but the technical scope of the present disclosure is not limited to these examples. It is clear that a person with ordinary knowledge in the technical field of the present disclosure could conceive of various changes or modifications within the scope of the technical idea described in the claims, and these are naturally understood to belong to the technical scope of the present disclosure.

Each device described in this specification may be realized as a single device, or some or all of it may be realized as separate devices. For example, in the functional configuration example of the video processing apparatus 100 shown in FIG. 4, the storage unit 120 and the control unit 140 may be provided in a device such as a server connected to the input unit 110 and the output unit 130 via a network or the like.

The processes described in this specification using flowcharts and sequence diagrams do not necessarily have to be executed in the order shown. Some processing steps may be executed in parallel. Additional processing steps may be adopted, and some processing steps may be omitted.

The effects described in this specification are merely explanatory or illustrative and are not limiting. That is, the technology according to the present disclosure may achieve, together with or in place of the above effects, other effects that are apparent to those skilled in the art from the description in this specification.

なお、以下のような構成も本開示の技術的範囲に属する。
（１）
入力された映像から複数の単位映像を抽出することと、
抽出された前記単位映像から採用した前記単位映像を、入力された音楽に応じて切り替えるための編集情報を生成することと、
前記単位映像を抽出する抽出処理及び前記単位映像を採用する採用処理における動作モードをプロセッサにより制御することと、
を含む映像処理方法。
（２）
前記動作モードを制御することにおいて、第１の動作モードにおける、抽出される前記単位映像の抽出数と採用される前記単位映像の採用数との大小関係に応じて、再度の前記抽出処理又は再度の前記採用処理の少なくともいずれかを、前記動作モードを変更して実施させる否かを判定する、前記（１）に記載の映像処理方法。
（３）
前記映像処理方法は、
前記音楽の内容を解析することと、
前記映像の内容を解析することと、
をさらに含み、
前記抽出数は、前記映像の解析結果に基づいて抽出された前記単位映像の数であり、
前記採用数は、前記音楽の解析結果に基づいて前記音楽が区切られた数である、前記（２）に記載の映像処理方法。
（４）
前記動作モードを制御することにおいて、前記第１の動作モードにおける前記採用数と前記抽出数とが等しい又は前記抽出数の方が多い場合に、前記動作モードの変更を行わないと判定する、前記（３）に記載の映像処理方法。
（５）
前記動作モードを制御することにおいて、前記第１の動作モードにおける前記抽出数が前記採用数よりも少ない場合に、前記動作モードを第２の動作モードとし、
前記抽出処理において、前記第２の動作モードの場合、前記第１の動作モードにおいて抽出された前記単位映像のうち少なくともいずれかを２以上の前記単位映像に分割する、前記（４）に記載の映像処理方法。
（６）
前記抽出処理において、前記音楽が区切られる最長の区間以上の長さの前記単位映像を抽出する、前記（４）又は（５）に記載の映像処理方法。
（７）
前記動作モードを制御することにおいて、前記第１の動作モードにおける前記抽出数が前記採用数よりも少ない場合に、前記動作モードを第３の動作モードとし、
前記採用処理は、前記第３の動作モードの場合、前記第１の動作モードと比較して前記音楽が区切られる最長の区間を短くする、前記（６）に記載の映像処理方法。
（８）
前記動作モードを制御することにおいて、前記第１の動作モードにおける前記抽出数が前記採用数よりも少ない場合に、前記動作モードを第４の動作モードとし、
前記抽出処理において、前記第４の動作モードの場合、前記第１の動作モードと比較して前記単位映像を抽出するための前記映像の解析結果に関する条件を緩和する、前記（４）〜（７）のいずれか一項に記載の映像処理方法。
（９）
前記動作モードを制御することにおいて、前記第１の動作モードにおける前記抽出数が前記採用数よりも少ない場合に、前記動作モードを第５の動作モードとし、
前記採用処理において、前記第５の動作モードの場合、所定の間隔で前記音楽を区切り
前記抽出処理において、前記第５の動作モードの場合、前記映像を前記所定の間隔で区切った前記単位映像を抽出する、前記（４）〜（８）のいずれか一項に記載の映像処理方法。
（１０）
入力された映像から複数の単位映像を抽出する抽出部と、
前記抽出部により抽出された前記単位映像から採用した前記単位映像を、入力された音楽に応じて切り替えるための編集情報を生成する編集部と、
前記抽出部及び前記編集部における動作モードを制御する動作モード制御部と、
を備える映像処理装置。
（１１）
前記動作モード制御部は、第１の動作モードにおける、前記抽出部により抽出される前記単位映像の抽出数と前記編集部により採用される前記単位映像の採用数との大小関係に応じて、前記抽出部による再度の抽出処理又は前記編集部による再度の採用処理の少なくともいずれかを、前記動作モードを変更して実施させる否かを判定する、前記（１０）に記載の映像処理装置。
（１２）
前記映像処理装置は、
前記音楽の内容を解析する音楽解析部と、
前記映像の内容を解析する映像解析部と、
をさらに備え、
前記抽出数は、前記映像解析部による解析結果に基づいて前記抽出部により抽出された前記単位映像の数であり、
前記採用数は、前記音楽解析部による解析結果に基づいて前記編集部により前記音楽が区切られた数である、前記（１１）に記載の映像処理装置。
（１３）
前記動作モード制御部は、前記第１の動作モードにおける前記採用数と前記抽出数とが等しい又は前記抽出数の方が多い場合に、前記動作モードの変更を行わないと判定する、前記（１２）に記載の映像処理装置。
（１４）
コンピュータを、
入力された映像から複数の単位映像を抽出する抽出部と、
前記抽出部により抽出された前記単位映像から採用した前記単位映像を、入力された音楽に応じて切り替えるための編集情報を生成する編集部と、
前記抽出部及び前記編集部における動作モードを制御する動作モード制御部と、
として機能させるためのプログラム。 The following configurations also belong to the technical scope of the present disclosure.
(1)
A video processing method including:
extracting a plurality of unit videos from an input video;
generating editing information for switching, according to input music, among unit videos adopted from the extracted unit videos; and
controlling, by a processor, an operation mode of an extraction process of extracting the unit videos and an adoption process of adopting the unit videos.
(2)
The video processing method according to (1), wherein controlling the operation mode includes determining, according to the magnitude relationship in a first operation mode between the extraction number of unit videos extracted and the adoption number of unit videos adopted, whether to perform at least one of the extraction process or the adoption process again with the operation mode changed.
(3)
The video processing method according to (2), further including:
analyzing the content of the music; and
analyzing the content of the video,
wherein the extraction number is the number of unit videos extracted based on the analysis result of the video, and
the adoption number is the number of sections into which the music is divided based on the analysis result of the music.
(4)
The video processing method according to (3), wherein controlling the operation mode includes determining not to change the operation mode when the adoption number and the extraction number in the first operation mode are equal or the extraction number is larger.
(5)
The video processing method according to (4), wherein controlling the operation mode includes setting the operation mode to a second operation mode when the extraction number in the first operation mode is smaller than the adoption number, and
in the extraction process in the second operation mode, at least one of the unit videos extracted in the first operation mode is divided into two or more unit videos.
(6)
The video processing method according to (4) or (5), wherein, in the extraction process, unit videos each having a length equal to or longer than the longest section into which the music is divided are extracted.
(7)
The video processing method according to (6), wherein controlling the operation mode includes setting the operation mode to a third operation mode when the extraction number in the first operation mode is smaller than the adoption number, and
in the third operation mode, the adoption process makes the longest section into which the music is divided shorter than in the first operation mode.
(8)
The video processing method according to any one of (4) to (7), wherein controlling the operation mode includes setting the operation mode to a fourth operation mode when the extraction number in the first operation mode is smaller than the adoption number, and
in the extraction process in the fourth operation mode, the conditions on the analysis result of the video used for extracting the unit videos are relaxed compared to the first operation mode.
(9)
The video processing method according to any one of (4) to (8), wherein controlling the operation mode includes setting the operation mode to a fifth operation mode when the extraction number in the first operation mode is smaller than the adoption number,
in the adoption process in the fifth operation mode, the music is divided at a predetermined interval, and
in the extraction process in the fifth operation mode, unit videos obtained by dividing the video at the predetermined interval are extracted.
(10)
A video processing apparatus including:
an extraction unit that extracts a plurality of unit videos from an input video;
an editing unit that generates editing information for switching, according to input music, among unit videos adopted from the unit videos extracted by the extraction unit; and
an operation mode control unit that controls an operation mode of the extraction unit and the editing unit.
(11)
The video processing apparatus according to (10), wherein the operation mode control unit determines, according to the magnitude relationship in a first operation mode between the extraction number of unit videos extracted by the extraction unit and the adoption number of unit videos adopted by the editing unit, whether to cause at least one of the extraction process by the extraction unit or the adoption process by the editing unit to be performed again with the operation mode changed.
(12)
The video processing apparatus according to (11), further including:
a music analysis unit that analyzes the content of the music; and
a video analysis unit that analyzes the content of the video,
wherein the extraction number is the number of unit videos extracted by the extraction unit based on the analysis result by the video analysis unit, and
the adoption number is the number of sections into which the music is divided by the editing unit based on the analysis result by the music analysis unit.
(13)
The video processing apparatus according to (12), wherein the operation mode control unit determines not to change the operation mode when the adoption number and the extraction number in the first operation mode are equal or the extraction number is larger.
(14)
A program for causing a computer to function as:
an extraction unit that extracts a plurality of unit videos from an input video;
an editing unit that generates editing information for switching, according to input music, among unit videos adopted from the unit videos extracted by the extraction unit; and
an operation mode control unit that controls an operation mode of the extraction unit and the editing unit.
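
As an informal illustration of configurations (4), (5), and (9), the following Python sketch shows one way the count comparison and two of the fallback modes might look. It is not the disclosed implementation. All function names are hypothetical, unit videos are assumed to be represented as (start, end) pairs in seconds, and the policy of repeatedly halving the longest unit video is an assumption, since configuration (5) only requires that at least one unit video be divided into two or more.

```python
def needs_mode_change(extraction_number, adoption_number):
    """Configurations (4) and (5): the mode changes only when fewer unit
    videos were extracted than the music has sections to fill."""
    return extraction_number < adoption_number


def split_units(units, adoption_number):
    """Second-mode sketch (assumed policy): halve the longest unit video
    until there are at least adoption_number unit videos."""
    units = list(units)
    if not units:
        return units  # nothing to split
    while len(units) < adoption_number:
        # Index of the longest (start, end) unit; ties pick the earliest.
        i = max(range(len(units)), key=lambda k: units[k][1] - units[k][0])
        start, end = units[i]
        mid = (start + end) / 2
        units[i:i + 1] = [(start, mid), (mid, end)]
    return units


def fixed_interval_sections(duration, interval):
    """Fifth-mode sketch: divide a timeline of the given duration at a
    predetermined interval; the last section may be shorter."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    sections = []
    start = 0.0
    while start < duration:
        end = min(start + interval, duration)
        sections.append((start, end))
        start = end
    return sections


if __name__ == "__main__":
    units = [(0.0, 8.0), (10.0, 12.0)]
    if needs_mode_change(len(units), 4):
        units = split_units(units, 4)
    print(units)
    print(fixed_interval_sections(10.0, 4.0))
```

In the fifth mode the same interval would be applied to both the music and the video, so the number of sections and the number of unit videos stay aligned when both timelines have the same duration.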

１０映像
２０映像解析結果情報
３０音楽
４０編集情報
５０要約映像
１００映像処理装置
１１０入力部
１１１センサ部
１１２操作部
１１３映像取得部
１１４音楽取得部
１２０記憶部
１３０出力部
１４０制御部
１４１音楽解析部
１４２映像解析部
１４３抽出部
１４４編集部
１４５動作モード制御部
１４６要約映像生成部
10 video
20 video analysis result information
30 music
40 editing information
50 summary video
100 video processing apparatus
110 input unit
111 sensor unit
112 operation unit
113 video acquisition unit
114 music acquisition unit
120 storage unit
130 output unit
140 control unit
141 music analysis unit
142 video analysis unit
143 extraction unit
144 editing unit
145 operation mode control unit
146 summary video generation unit

Claims

A video processing method comprising:
extracting a plurality of unit videos from an input video;
generating editing information for switching, according to input music, among unit videos adopted from the extracted unit videos; and
controlling, by a processor, an operation mode of an extraction process of extracting the unit videos and an adoption process of adopting the unit videos.

The video processing method according to claim 1, wherein controlling the operation mode includes determining, according to the magnitude relationship in a first operation mode between the extraction number of unit videos extracted and the adoption number of unit videos adopted, whether to perform at least one of the extraction process or the adoption process again with the operation mode changed.

The video processing method according to claim 2, further comprising:
analyzing the content of the music; and
analyzing the content of the video,
wherein the extraction number is the number of unit videos extracted based on the analysis result of the video, and
the adoption number is the number of sections into which the music is divided based on the analysis result of the music.

The video processing method according to claim 3, wherein controlling the operation mode includes determining not to change the operation mode when the adoption number and the extraction number in the first operation mode are equal or the extraction number is larger.

The video processing method according to claim 4, wherein controlling the operation mode includes setting the operation mode to a second operation mode when the extraction number in the first operation mode is smaller than the adoption number, and
in the extraction process in the second operation mode, at least one of the unit videos extracted in the first operation mode is divided into two or more unit videos.

The video processing method according to claim 4, wherein, in the extraction process, unit videos each having a length equal to or longer than the longest section into which the music is divided are extracted.

The video processing method according to claim 6, wherein controlling the operation mode includes setting the operation mode to a third operation mode when the extraction number in the first operation mode is smaller than the adoption number, and
in the third operation mode, the adoption process makes the longest section into which the music is divided shorter than in the first operation mode.

The video processing method according to claim 4, wherein controlling the operation mode includes setting the operation mode to a fourth operation mode when the extraction number in the first operation mode is smaller than the adoption number, and
in the extraction process in the fourth operation mode, the conditions on the analysis result of the video used for extracting the unit videos are relaxed compared to the first operation mode.

The video processing method according to claim 4, wherein controlling the operation mode includes setting the operation mode to a fifth operation mode when the extraction number in the first operation mode is smaller than the adoption number,
in the adoption process in the fifth operation mode, the music is divided at a predetermined interval, and
in the extraction process in the fifth operation mode, unit videos obtained by dividing the video at the predetermined interval are extracted.

A video processing apparatus comprising:
an extraction unit that extracts a plurality of unit videos from an input video;
an editing unit that generates editing information for switching, according to input music, among unit videos adopted from the unit videos extracted by the extraction unit; and
an operation mode control unit that controls an operation mode of the extraction unit and the editing unit.

The video processing apparatus according to claim 10, wherein the operation mode control unit determines, according to the magnitude relationship in a first operation mode between the extraction number of unit videos extracted by the extraction unit and the adoption number of unit videos adopted by the editing unit, whether to cause at least one of the extraction process by the extraction unit or the adoption process by the editing unit to be performed again with the operation mode changed.

The video processing apparatus according to claim 11, further comprising:
a music analysis unit that analyzes the content of the music; and
a video analysis unit that analyzes the content of the video,
wherein the extraction number is the number of unit videos extracted by the extraction unit based on the analysis result by the video analysis unit, and
the adoption number is the number of sections into which the music is divided by the editing unit based on the analysis result by the music analysis unit.

The video processing apparatus according to claim 12, wherein the operation mode control unit determines not to change the operation mode when the adoption number and the extraction number in the first operation mode are equal or the extraction number is larger.

A program for causing a computer to function as:
an extraction unit that extracts a plurality of unit videos from an input video;
an editing unit that generates editing information for switching, according to input music, among unit videos adopted from the unit videos extracted by the extraction unit; and
an operation mode control unit that controls an operation mode of the extraction unit and the editing unit.