JP2024024479A - Information processing device, information processing method, and program

Info

Publication number: JP2024024479A
Application number: JP2022127327A
Authority: JP
Inventors: 明久寺見; Akihisa Terami
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2022-08-09
Filing date: 2022-08-09
Publication date: 2024-02-22

Abstract

PROBLEM TO BE SOLVED: To facilitate metadata editing to make it easier for a viewer to focus on a particular portion of a video when viewing it.

SOLUTION: An image processing device acquires a position input that specifies a position on a video image. On the basis of the position input, the image processing device generates position information indicating the position on the video image and time information indicating the playback position of the video image. The image processing device generates at least one of the position information and the time information further on the basis of at least one of the content of the video image at the position on the video image specified by the position input and the sound associated with the video image corresponding to the playback position of the video image that is the subject of the position input. The image processing device records the position information and the time information.

SELECTED DRAWING: Figure 2

Description

The present disclosure relates to an information processing device, an information processing method, and a program, and in particular to a method of recording metadata related to distributed video.

In recent years, video distribution services have come into wide use. In education, for example, real-time and on-demand distribution of lecture videos has been increasingly adopted in response to the novel coronavirus.

Various techniques related to metadata attached to video have also been proposed. Such techniques contribute to more efficient video production and to a better learning effect for viewers of lecture videos. For example, Patent Document 1 discloses dividing video data based on, among other things, "detection of switching of presentation slides" and "audio analysis of the lecturer's utterances." Patent Document 1 also discloses recording information indicating each of the divided sections. Using the technique described in Patent Document 1, a viewer can quickly play back a section related to a desired topic.

Patent Document 1: Japanese Patent Application Publication No. 2009-147538

Viewers may watch a video again, for example to review a lecture. It is convenient for the viewer if the important parts are then easy to focus on. For example, when a lecture video is watched again, making it easier to focus on "important parts" and "parts not understood" can further improve the learning effect.

The present disclosure provides a technique that facilitates metadata editing for making it easier for a viewer to focus on a specific part when viewing a video.

An information processing device according to an embodiment of the present disclosure has the following configuration. That is, the information processing device comprises:
acquisition means for acquiring a position input specifying a position on a moving image;
generation means for generating, based on the position input, position information indicating a position on the moving image and time information indicating a playback position of the moving image, the generation means generating at least one of the position information and the time information further based on at least one of the content of the moving image at the position on the moving image specified by the position input and a sound related to the moving image corresponding to the playback position of the moving image targeted by the position input; and
recording means for recording the position information and the time information.

This makes it easier to edit metadata that helps a viewer focus on a specific part when viewing a video.

FIG. 1 is a diagram showing an example of a hardware configuration of an information processing device according to an embodiment.
FIG. 2 is a diagram showing an example of a functional configuration of an information processing device according to an embodiment.
FIG. 3 is a flowchart of an information processing method according to an embodiment.
FIG. 4 is a diagram illustrating an example of a position input and a method of generating position information.
FIG. 5 is a diagram illustrating a method of generating position information according to an embodiment.
FIG. 6 is a diagram illustrating an example of a position input and a method of generating position information.
FIG. 7 is a diagram illustrating a method of generating position information according to an embodiment.
FIG. 8 is a diagram illustrating an example of a position input and a method of generating time information.
FIG. 9 is a diagram illustrating a method of generating time information according to an embodiment.
FIG. 10 is a diagram illustrating a method of generating time information according to an embodiment.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings. The following embodiments do not limit the scope of the claims. Although the embodiments describe a plurality of features, not all of these features are essential, and the features may be combined arbitrarily. In the accompanying drawings, the same or similar components are given the same reference numerals, and redundant description is omitted.

FIG. 1 shows the overall configuration of an information processing device according to an embodiment of the present disclosure. The information processing device 101 is a computer device such as a PC (personal computer), a smartphone, or a tablet terminal. The information processing device 101 includes a CPU 102, a ROM 103, a RAM 104, a storage unit 105, a display unit 106, and an operation unit 107. These components are connected to one another by a bus 108.

The CPU 102 can control the operation of each unit in the information processing device 101 according to the contents of the ROM 103 or the RAM 104. The CPU 102 can also execute programs loaded into the RAM 104. The ROM 103 is a read-only memory. The ROM 103 can store a boot program, firmware, various processing programs for implementing the processes described below, and various data. The RAM 104 is a work memory. The RAM 104 can temporarily store programs and data so that the CPU 102 can perform processing. The CPU 102 loads various processing programs and data into the RAM 104. The storage unit 105 is a recording medium for storing a large amount of changeable data. The storage unit 105 may be, for example, a hard disk drive or a solid state drive.

In this embodiment, the storage unit 105 can store the metadata described later. The storage unit 105 can also store moving image data. A moving image is composed of a plurality of frames and can include a plurality of frame images. In this specification, a moving image may be referred to as a video. The storage unit 105 can also store data of sound related to the moving image. For example, the storage unit 105 may store a video file that includes moving image data and sound data.

The display unit 106 is a screen such as a liquid crystal screen or a touch panel screen. The display unit 106 can display the results of processing by the CPU 102 as images or text. The display unit 106 can also display moving images. A moving image displayed by the display unit 106 may be stored in the storage unit 105 or may be transmitted from an external device via a network. Furthermore, the display unit 106 can display information based on metadata, as described later. When the display unit 106 has a touch panel screen, the display unit 106 can notify the CPU 102 of operation inputs that the user makes by operating the touch panel screen.

The operation unit 107 is a user interface. The operation unit 107 may be, for example, a keyboard, a mouse, buttons, or a touch panel screen. By operating the operation unit 107, the user can input various instructions to the CPU 102. In this embodiment, a user who is a viewer of a moving image can operate the operation unit 107 to make a position input on the moving image.

FIG. 2 shows the logical configuration of the information processing device 101 according to an embodiment of the present disclosure. The information processing device 101 includes an acquisition unit 201, a position determination unit 202, a time determination unit 203, and a recording unit 204. Such an information processing device 101 can be realized by a computer including a processor and a memory, as shown in FIG. 1. That is, a processor such as the CPU 102 executes a program stored in a memory such as the ROM 103, the RAM 104, or the storage unit 105, thereby realizing the functions of the units shown in FIG. 2. However, some or all of the functions of the information processing device 101 may be realized by dedicated hardware. An image processing device according to an embodiment of the present disclosure may also be configured from a plurality of information processing devices connected, for example, via a network.

The acquisition unit 201 acquires a position input specifying a position on a moving image. While a specific frame image of the moving image is displayed on the display unit 106, the user can specify a specific position in that frame image. The position input by the user may specify a single coordinate or may specify a specific area. How an area is input is described later.

The position determination unit 202 generates position information indicating a position on the moving image based on the position input acquired by the acquisition unit 201. This position information may indicate the position on the moving image specified by the position input. Alternatively, the position information may indicate a different position that the position determination unit 202 determines based on the position specified by the position input. For example, as described later with reference to FIGS. 5 and 7, the position determination unit 202 can generate the position information based on the content of the moving image at the position on the moving image specified by the position input. The position determination unit 202 may also generate the position information based on a sound related to the moving image corresponding to the playback position of the moving image targeted by the position input. More specifically, the position determination unit 202 can correct the position on the moving image specified by the position input, based on at least one of the content of the moving image at that position and the sound related to the moving image corresponding to the playback position targeted by the position input.

The time determination unit 203 generates time information indicating a playback position of the moving image based on the position input acquired by the acquisition unit 201. This time information can indicate a specific time in the moving image. For example, the time information may represent the elapsed time from the start of the moving image to a specific point in time. The time information may also indicate the time of a specific frame in the moving image. For example, the time information may be information indicating the display timing of a frame image in the moving image. The information indicating the display timing may be the time from the start of the moving image until the frame image is displayed, or may be the number of the frame image.
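The two representations of time information mentioned above (elapsed time and frame number) can be converted into each other when the frame rate is known. The following is a minimal sketch under the assumption of a constant frame rate and zero-based frame numbers; the function names and the `fps` parameter are hypothetical and do not appear in the source.

```python
def elapsed_seconds_to_frame(elapsed_s: float, fps: float) -> int:
    """Return the zero-based number of the frame displayed at elapsed_s.

    Assumes a constant frame rate (an assumption of this sketch).
    """
    if elapsed_s < 0 or fps <= 0:
        raise ValueError("elapsed_s must be >= 0 and fps must be > 0")
    return int(elapsed_s * fps)


def frame_to_elapsed_seconds(frame_number: int, fps: float) -> float:
    """Return the time from the start of the video until the frame is shown."""
    if frame_number < 0 or fps <= 0:
        raise ValueError("frame_number must be >= 0 and fps must be > 0")
    return frame_number / fps
```

Either value could serve as the recorded time information; a frame number avoids rounding questions, while elapsed time is independent of the frame rate.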

The time information generated by the time determination unit 203 may indicate the playback position of the moving image targeted by the position input. For example, the time information may represent the elapsed time from the start of the moving image to the time at which the user made the position input. The time information may also indicate the time of the frame that was the target of the user's position input. Alternatively, the time information may indicate a different time that the time determination unit 203 determines based on the playback position of the moving image targeted by the position input. For example, as described later with reference to FIG. 9, the position determination unit 202 may generate the time information based on the content of the moving image at the position on the moving image specified by the position input. Also, as described later with reference to FIG. 10, the position determination unit 202 may generate the time information based on a sound related to the moving image corresponding to the playback position of the moving image targeted by the position input. More specifically, the time determination unit 203 can correct the playback position of the moving image targeted by the position input, based on at least one of the content of the moving image at the position specified by the position input and the sound related to the moving image corresponding to that playback position. FIGS. 9 and 10 show such examples.

In this way, the position determination unit 202 and the time determination unit 203 generate, based on the position input acquired by the acquisition unit 201, position information indicating a position on the moving image and time information indicating a playback position of the moving image. Here, at least one of the position information and the time information can be generated further based on at least one of the content of the moving image at the position on the moving image specified by the position input and the sound related to the moving image corresponding to the playback position of the moving image targeted by the position input. For example, the position determination unit 202 can generate the position information further based on at least one of the content of the moving image at the position specified by the position input and the sound related to the moving image corresponding to the playback position targeted by the position input. In this case, the time determination unit 203 may generate time information indicating the playback position of the moving image targeted by the position input. In another example, the time determination unit 203 can generate the time information further based on at least one of the content of the moving image at the position specified by the position input and the sound related to the moving image corresponding to the playback position targeted by the position input. In this case, the position determination unit 202 may generate position information indicating the position on the moving image specified by the position input.

The recording unit 204 records the position information generated by the position determination unit 202 and the time information generated by the time determination unit 203. The recording unit 204 can record this information as metadata for the moving image. The recording unit 204 can record this information in any storage medium, such as the storage unit 105.

The playback unit 205 plays back the moving image. For example, the playback unit 205 can display a moving image recorded in a storage medium such as the storage unit 105 on a screen such as the display unit 106. The user can make a position input on the moving image while viewing the screen displayed on the display unit 106. The playback unit 205 can also control the display of the moving image based on metadata, such as the position information and time information recorded by the recording unit 204. A specific display control method is described later.

However, the configuration of the information processing device 101 is not limited to the above. For example, the information processing device 101 may be connected to a network, and a moving image may be distributed to the information processing device 101 from a streaming server on the network. In this case, the recording unit 204 is not essential. The information processing device 101 can record metadata such as position information and time information in the streaming server via the network, and can acquire such metadata from the streaming server. The information processing device 101 can operate with such a configuration as well.

FIG. 3 shows a flowchart of an information processing method according to an embodiment. The processing shown in FIG. 3 can be performed by the information processing device 101, and can be performed while the playback unit 205 is playing back the moving image.

In S301, the acquisition unit 201 acquires a position input on the moving image from the user. As described above, the acquisition unit 201 may further acquire attribute information related to the position input (for example, information indicating the type of mark).

In S302, the position determination unit 202 generates position information based on the position input acquired in S301. This position information is stored as metadata. The method of generating the position information is described in detail later.

In S303, the time determination unit 203 generates time information based on the position input acquired in S301. This time information is stored as metadata. The method of generating the time information is described in detail later.

In S304, the recording unit 204 records the position information generated in S302 and the time information generated in S303 in the metadata storage area. The recording unit 204 can record this information as metadata of the moving image being played back.

In S305, the recording unit 204 increments by one the index n, which serves as the metadata identifier, in preparation for a further position input by the user.

In S306, the recording unit 204 determines whether the index n has reached its maximum value. This maximum value indicates the maximum number of metadata entries that can be stored. If the index n is at the maximum value, the processing according to the flowchart shown in FIG. 3 ends. Otherwise, the processing returns to S301.
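The loop of S301 to S306 can be sketched as follows. This is only an illustration under stated assumptions: the position inputs arrive as an iterable, the two generator callables stand in for the position determination unit 202 and the time determination unit 203, and all names (`record_marks`, `max_entries`, the dictionary keys) are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List


def record_marks(
    position_inputs: Iterable[Any],
    generate_position: Callable[[Any], Any],
    generate_time: Callable[[Any], Any],
    max_entries: int,
) -> List[Dict[str, Any]]:
    """Record one metadata entry per position input, up to max_entries."""
    if max_entries < 1:
        raise ValueError("max_entries must be at least 1")
    metadata: List[Dict[str, Any]] = []
    n = 0  # index used as the metadata identifier
    for position_input in position_inputs:              # S301: acquire input
        position = generate_position(position_input)    # S302: position info
        time_info = generate_time(position_input)       # S303: time info
        metadata.append(                                 # S304: record both
            {"index": n, "position": position, "time": time_info}
        )
        n += 1                                           # S305: next index
        if n >= max_entries:                             # S306: storage full?
            break
    return metadata
```

In this sketch the loop also ends when no more inputs arrive, which the flowchart does not cover explicitly.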

FIG. 4(D) shows an example of the metadata recorded by the recording unit 204. When the user makes a position input, the position information (for example, coordinates), the time information (for example, a time in the moving image), and the type of the input mark are stored in the storage area for each index, which is the metadata identifier.
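One possible in-memory shape for such a metadata table is sketched below. The field names, the use of seconds for time, and the example values are assumptions made for illustration; the source only states that position information, time information, and the mark type are stored per index.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class MarkRecord:
    """One metadata entry (hypothetical layout)."""
    region: Tuple[int, int, int, int]  # (x1, y1, x2, y2) of the marked area
    time_s: float                      # playback position in seconds
    mark_type: str                     # e.g. "important" or "not understood"


def store_record(table: Dict[int, MarkRecord], index: int, record: MarkRecord) -> None:
    """Store a record in the slot identified by the metadata index."""
    table[index] = record


# Illustrative values only; they are not taken from FIG. 4(D).
metadata_table: Dict[int, MarkRecord] = {}
store_record(metadata_table, 0, MarkRecord((120, 300, 420, 240), 95.0, "important"))
```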

(Method of generating position information based on area specification)
The following describes how position information is generated when the user's position input specifies a rectangular area. As already explained, in this case the position determination unit 202 may generate position information indicating the rectangular area specified by the user. In the following example, however, the position determination unit 202 generates the position information based on the content of the moving image at the position on the moving image specified by the position input. For example, the position determination unit 202 can generate the position information so that it indicates an area containing the information on the moving image located at the position specified by the position input. For example, the position input may specify an area that includes at least part of a piece of information in the moving image. In this case, the position determination unit 202 can generate position information indicating an area that contains the whole of that information (for example, an entire continuous description). The type of information on the moving image is not particularly limited. The information on the moving image may be, for example, a continuous description, an illustration, a figure, or a table. In the following example, the position determination unit 202 calculates coordinates for identifying such an area.

Such an embodiment will be described with reference to FIGS. 4(A) to 4(D). In this example, the position information is generated so as to indicate an area containing a continuous description on the moving image. As shown in FIG. 4(A), the user focuses on the description "ABCDE" written on a blackboard 402 in a distributed lecture video 401. As the position input, the user inputs a rectangular frame 403 that overlaps the description "ABCDE," and further inputs a mark 404 indicating that this description is important. In this way, the user can make a position input indicating a portion of interest in the moving image. This position input may cover only part of a continuous description in the moving image.

The user can also input, in association with the position input, attribute information indicating the type of the portion of interest (the mark 404 in this example). The type of attribute information is not particularly limited. For example, the attribute information may be selectable from a plurality of types. For example, it may be possible to input, as attribute information, a "not understood" mark indicating that the user does not understand the description.

In such an example, the position determination unit 202 can determine an area containing the information on the moving image (for example, a continuous description) by enlarging or reducing the area specified by the position input. An example of the processing by which the position determination unit 202 generates the position information in S302 is described below with reference to the flowchart shown in FIG. 5. In the processing shown in FIG. 5, when there is description information in an area adjacent to the area indicated by the position input, the position determination unit 202 enlarges the area so that it includes the adjacent area.
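The enlargement summarized above can be sketched as follows. This is a hedged illustration, not the patented procedure itself: it assumes that a side keeps moving outward by a fixed step `alpha` for as long as the strip just outside that side contains description information, that the y axis increases upward, and that growth stops at the frame boundary. The names `expand_region`, `bounds`, and `has_description` are hypothetical.

```python
from typing import Callable, Tuple

# (left, top, right, bottom) with y increasing upward, so top > bottom.
Rect = Tuple[int, int, int, int]
# (x_lo, y_lo, x_hi, y_hi) -> True if the strip contains description information.
StripTest = Callable[[int, int, int, int], bool]


def expand_region(rect: Rect, alpha: int, bounds: Rect, has_description: StripTest) -> Rect:
    """Grow rect side by side while the adjacent strip still holds content."""
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    left, top, right, bottom = rect
    min_x, max_y, max_x, min_y = bounds

    # Upper side: look at the strip of height alpha directly above the area.
    while top < max_y and has_description(left, top, right, min(top + alpha, max_y)):
        top = min(top + alpha, max_y)
    # Lower side: strip directly below the area.
    while bottom > min_y and has_description(left, max(bottom - alpha, min_y), right, bottom):
        bottom = max(bottom - alpha, min_y)
    # Right side: strip directly to the right, spanning the updated height.
    while right < max_x and has_description(right, bottom, min(right + alpha, max_x), top):
        right = min(right + alpha, max_x)
    # Left side: strip directly to the left.
    while left > min_x and has_description(max(left - alpha, min_x), bottom, left, top):
        left = max(left - alpha, min_x)
    return (left, top, right, bottom)
```

The strip test is passed in as a callable so that any detection method (pixel comparison, pattern recognition, and so on) can be plugged in without changing the loop.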

In S501, the acquisition unit 201 detects the vertex coordinates of the area indicated by the position input (the rectangular frame 403 in this example). In this example, as shown in FIG. 4(B), the left horizontal coordinate x1, the right horizontal coordinate x2, the upper vertical coordinate y1, and the lower vertical coordinate y2 are detected.

In S502, the position determination unit 202 analyzes the image area above the area indicated by the position input. Through this analysis, the position determination unit 202 can determine whether there is description information in this upper image area. In FIG. 4(B), this upper image area is shown as an area 405, which has a width α and lies above the area indicated by the position input. Specifically, the area 405 is the area enclosed by the coordinates (x1, y1), (x2, y1), (x1, y1+α), and (x2, y1+α). Here, α is an arbitrary constant.

The specific analysis method is not particularly limited. For example, the position determination unit 202 can scan the RGB values in the area 405. If an RGB value different from the RGB value of the background (for example, the blackboard 402) is detected during the scan, the position determination unit 202 can determine that description information exists. The position determination unit 202 may also perform the analysis using a method based on pattern recognition. Using pattern recognition makes it possible to detect description information with higher accuracy. For example, if a character is detected in the area 405, the position determination unit 202 can determine that description information exists in the area 405.
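The RGB scan described above can be sketched as follows. The sketch assumes the frame is held as a list of pixel rows indexed from the top of the image, that the background color is known in advance, and that a per-channel `tolerance` is used to absorb noise; the tolerance, the function name, and the data layout are assumptions not found in the source.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255


def region_has_description(
    frame: Sequence[Sequence[Pixel]],
    left: int,
    top: int,
    right: int,
    bottom: int,
    background: Pixel,
    tolerance: int = 30,
) -> bool:
    """Return True if any pixel in rows top..bottom-1 and columns
    left..right-1 differs from the background by more than tolerance
    in at least one channel."""
    for row in frame[top:bottom]:
        for pixel in row[left:right]:
            if any(abs(c - b) > tolerance for c, b in zip(pixel, background)):
                return True
    return False


# A 4x4 "blackboard" with a single chalk-colored pixel at row 1, column 2.
BOARD: Pixel = (0, 64, 0)
frame: List[List[Pixel]] = [[BOARD] * 4 for _ in range(4)]
frame[1][2] = (255, 255, 255)
```

With this frame, a scan of the whole image reports content, while a scan restricted to the bottom two rows does not.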

In S503, the position determination unit 202 determines whether there is description information in the upper image area. If there is description information, the processing proceeds to S505; otherwise, the processing proceeds to S504. In S504, the position determination unit 202 replaces the upper vertical coordinate y1 with y1+α, after which the processing returns to S502.

Ｓ５０５では、位置決定部２０２が、位置入力が示す領域の下側の画像領域を解析する。解析方法はＳ５０２と同様である。Ｓ５０６では、位置決定部２０２が、下側の画像領域に記述情報があるかどうかを判定する。記述情報があれば処理はＳ５０８へ進み、そうでない場合、処理はＳ５０７へ進む。Ｓ５０７では、位置決定部２０２が、下側の垂直座標ｙ２をｙ２－αで置き換え、その後処理はＳ５０５に戻る。 In S505, the position determination unit 202 analyzes the image area below the area indicated by the position input. The analysis method is the same as S502. In S506, the position determining unit 202 determines whether descriptive information is present in the lower image area. If there is descriptive information, the process advances to S508; otherwise, the process advances to S507. In S507, the position determining unit 202 replaces the lower vertical coordinate y2 with y2-α, and then the process returns to S505.

Ｓ５０８では、位置決定部２０２が、位置入力が示す領域の右側の画像領域を解析する。解析方法はＳ５０２と同様である。Ｓ５０９では、位置決定部２０２が、右側の画像領域に記述情報があるかどうかを判定する。記述情報があれば処理はＳ５１１へ進み、そうでない場合、処理はＳ５１０へ進む。Ｓ５１０では、位置決定部２０２が、右側の水平座標ｘ２をｘ２＋αで置き換え、その後処理はＳ５０８に戻る。 In S508, the position determination unit 202 analyzes the image area on the right side of the area indicated by the position input. The analysis method is the same as S502. In S509, the position determination unit 202 determines whether descriptive information is present in the right image area. If there is descriptive information, the process advances to S511; otherwise, the process advances to S510. In S510, the position determining unit 202 replaces the right horizontal coordinate x2 with x2+α, and then the process returns to S508.

Ｓ５１１では、位置決定部２０２が、位置入力が示す領域の左側の画像領域を解析する。解析方法はＳ５０２と同様である。Ｓ５１２では、位置決定部２０２が、左側の画像領域に記述情報があるかどうかを判定する。記述情報があれば処理はＳ５１４へ進み、そうでない場合、処理はＳ５１３へ進む。Ｓ５１３では、位置決定部２０２が、左側の水平座標ｘ１をｘ１－αで置き換え、その後処理はＳ５１１に戻る。 In S511, the position determination unit 202 analyzes the image area on the left side of the area indicated by the position input. The analysis method is the same as S502. In S512, the position determination unit 202 determines whether descriptive information is present in the left image area. If there is descriptive information, the process advances to S514; otherwise, the process advances to S513. In S513, the position determining unit 202 replaces the left horizontal coordinate x1 with x1-α, and then the process returns to S511.

In S514, the position determining unit 202 generates position information indicating y1 calculated in S504, y2 calculated in S507, x2 calculated in S510, and x1 calculated in S513. The processing according to the flowchart of FIG. 5 then ends.

Up to this point, the case where the position input indicates a rectangular area has been described. Such a method can be used, for example, when the user operates a mouse to specify a rectangular area that overlaps the area of interest. On the other hand, as described above, the position input may specify a single coordinate, that is, a point. A position input specifying a single coordinate can be performed easily, for example, by line-of-sight input or a tap operation on a smartphone. In this case, in S501, the horizontal coordinate of the specified point can be used as both x1 and x2, and its vertical coordinate can be used as both y1 and y2. The processing from S502 onward can be performed in the same manner.

Further, the shape of the area indicated by the position input is not particularly limited. For example, the position input may indicate a loop shape such as a circle. In this case, in S501, the horizontal coordinates x1, x2 and the vertical coordinates y1, y2 can be selected arbitrarily from the coordinates inside the shape or on its boundary line. For example, among the coordinate values of the points on the boundary line of the shape indicated by the position input, the smallest and largest horizontal coordinates can be set as x1 and x2, respectively, and the smallest and largest vertical coordinates can be set as y2 and y1, respectively. The processing from S502 onward can be performed in the same manner.
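The choice of x1, x2, y1, and y2 from a loop-shaped input can be illustrated with a short sketch. The function name `bounding_coordinates` and the representation of the boundary as a list of (x, y) tuples are assumptions made for this example. As in the text, the largest vertical coordinate is assigned to y1 (the upper side) and the smallest to y2.

```python
from typing import Iterable, Tuple


def bounding_coordinates(
    boundary: Iterable[Tuple[float, float]]
) -> Tuple[float, float, float, float]:
    """Return (x1, x2, y1, y2) for the points on a drawn boundary:
    x1/x2 are the smallest/largest horizontal coordinates,
    y2/y1 are the smallest/largest vertical coordinates."""
    points = list(boundary)
    if not points:
        raise ValueError("boundary must contain at least one point")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), max(xs), max(ys), min(ys)


# Usage: four sample points from a roughly circular stroke.
x1, x2, y1, y2 = bounding_coordinates([(3, 5), (7, 9), (5, 2), (1, 6)])
print(x1, x2, y1, y2)
```

A single tapped point is the degenerate case: passing one point yields x1 = x2 and y1 = y2, which matches the handling of point input described earlier.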

上記の方法によれば、動画像中の連続した記述の一部を指す位置入力がなされた場合に、連続した記述全体を包含する領域を示す位置情報を生成することができる。このような処理により得られた位置情報が示す位置は、位置入力が指定する位置を補正したものに相当する。このような方法によれば、ユーザによる注目部分の一部を指定するという簡単な操作で、注目部分の全体を示す位置情報を生成することができる。後述するように、このような位置情報に従う領域を動画像上に示すことにより、ユーザは注目部分を把握しやすくなる。 According to the above method, when a positional input indicating a part of continuous descriptions in a moving image is made, positional information indicating an area including the entire continuous descriptions can be generated. The position indicated by the position information obtained through such processing corresponds to the position specified by the position input, which has been corrected. According to such a method, position information indicating the entire portion of interest can be generated by the user's simple operation of specifying a part of the portion of interest. As will be described later, by showing an area according to such positional information on a moving image, it becomes easier for the user to grasp the part of interest.

Up to this point, the case has been described in which the area according to the position input is enlarged based on the moving image at the position indicated by the position input. On the other hand, the area according to the position input, or the area enlarged according to the above method, may be reduced based on the moving image at the position indicated by the position input. For example, if the area indicated by the position input is wider than the area of the continuous description, the area may be reduced. Specifically, the position determining unit 202 can determine appropriate coordinates for the area so as to eliminate the blank space between the boundary of the area and the description. This can be achieved by having the position determining unit 202 determine whether descriptive information is present in each edge portion of the area indicated by the position input, and reduce the area so as to exclude edge portions that contain no descriptive information. Specifically, by performing the processing according to FIG. 5 using a negative value of α, the coordinates x1, x2, y1, and y2 can each be moved toward the inside of the area.
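The shrinking variant can be sketched as below. This is only an illustration under stated assumptions: descriptive pixels are given as a precomputed boolean mask, the box uses array indices (row index increasing downward, half-open bounds) rather than the y-up coordinates of FIG. 4, and `step` plays the role of |α|. The names `strip_has_description` and `shrink_region` are hypothetical. Each side is processed in the order of FIG. 5 (top, bottom, right, left): while the strip just inside an edge contains no description, the edge moves inward by one step.

```python
from typing import List, Tuple

Mask = List[List[bool]]


def strip_has_description(mask: Mask, left: int, top: int,
                          right: int, bottom: int) -> bool:
    """True if any masked (descriptive) pixel lies in the half-open box."""
    return any(any(row[left:right]) for row in mask[top:bottom])


def shrink_region(mask: Mask, left: int, top: int, right: int,
                  bottom: int, step: int = 1) -> Tuple[int, int, int, int]:
    """Move each edge inward until the adjacent strip contains description.
    Returns a zero-size box if the region holds no description at all."""
    while top < bottom and not strip_has_description(
            mask, left, top, right, min(top + step, bottom)):
        top = min(top + step, bottom)
    while bottom > top and not strip_has_description(
            mask, left, max(bottom - step, top), right, bottom):
        bottom = max(bottom - step, top)
    while right > left and not strip_has_description(
            mask, max(right - step, left), top, right, bottom):
        right = max(right - step, left)
    while left < right and not strip_has_description(
            mask, left, top, min(left + step, right), bottom):
        left = min(left + step, right)
    return left, top, right, bottom


# Usage: description occupies rows 2-3 and columns 3-5 of an 8x6 mask.
mask = [[False] * 8 for _ in range(6)]
for r in (2, 3):
    for c in (3, 4, 5):
        mask[r][c] = True
print(shrink_region(mask, 0, 0, 8, 6))
```

The bounds checks (`top < bottom`, and so on) are an addition of this sketch; they guarantee termination when the input region is entirely blank, a case the flowchart description does not address.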

(Method for generating position information based on area division)
The position input may specify a dividing position of an area on the moving image. In this case, the position determining unit 202 can generate position information indicating one of the divided areas obtained by the division according to the position input. In such an example, the user's position input may specify a line. A position input specifying a line can be performed easily, for example, by line-of-sight input or a swipe operation on a smartphone.

以下では、動画像に映っている所定領域を２つの領域に分割する線が入力された場合の、位置情報の生成方法について説明する。所定領域の種類は特に限定されないが、例えば黒板又はホワイトボードの領域であってもよい。この場合、位置決定部２０２は、動画像から所定領域を検出し、位置入力に従って分割された所定領域の分割領域のうちの１つを示すように、位置情報を生成することができる。このように、位置入力によって指定された動画像上の位置にある所定領域に基づいて位置情報を生成することも、動画像の内容に基づいて位置情報を生成する方法の一例である。 In the following, a method for generating positional information when a line dividing a predetermined area shown in a moving image into two areas is input will be described. The type of the predetermined area is not particularly limited, but may be, for example, a blackboard or whiteboard area. In this case, the position determining unit 202 can detect a predetermined area from the moving image and generate position information so as to indicate one of the divided areas of the predetermined area divided according to the position input. Generating position information based on a predetermined area located at a position on a moving image specified by position input in this way is also an example of a method of generating position information based on the contents of a moving image.

FIG. 6(A) shows a case where the user focuses on the description "ABCDEVWXYZ" on the blackboard 602 in the distributed video 601. In this example, the user has drawn a line 603 in order to record, as metadata, the position of the area containing the description "ABCDEVWXYZ". As shown in FIG. 6(C), the position determining unit 202 determines an area 604 that contains the description "ABCDEVWXYZ" and generates position information indicating the position of this area 604.

In such an example, a processing example in which the position determining unit 202 generates position information in S302 will be described with reference to the flowchart shown in FIG. 7. In the process shown in FIG. 7, the position determining unit 202 selects one region from the plurality of regions on the moving image divided according to the position input, based on whether each region contains information, and generates position information indicating the selected region. In the following example, the position determining unit 202 selects, from the regions divided by the line, a region containing descriptive information. Note that the division is not limited to division into two regions, and the dividing method is not limited to drawing a line. In either case, a similar technique can be used to select a region containing descriptive information from among the divided regions. Generating position information based on the content of the moving image in each divided region specified by a position input that designates a dividing position is also an example of generating position information based on the content of the moving image.

Ｓ７０１で、位置決定部２０２は、図６（Ｂ）に示すように、黒板６０２の４つの頂点座標（ｘ１，ｘ２，ｙ１，ｙ２）を画像解析により算出する。なお、画像解析の方法は特に限定されないが、例えば領域分割手法を用いることができる。例えば、位置決定部２０２は、フレーム画像内の特徴量に基づいて黒板領域を判定し、判定された黒板領域の頂点座標を算出することができる。さらに、位置決定部２０２は、位置入力が示す線と黒板の上辺との交点座標ｘ３を画像解析により算出する。 In S701, the position determining unit 202 calculates four vertex coordinates (x1, x2, y1, y2) of the blackboard 602 by image analysis, as shown in FIG. 6(B). Note that the method of image analysis is not particularly limited, but for example, a region division method can be used. For example, the position determination unit 202 can determine a blackboard area based on the feature amount in the frame image, and calculate the vertex coordinates of the determined blackboard area. Furthermore, the position determining unit 202 calculates the coordinates x3 of the intersection between the line indicated by the position input and the upper side of the blackboard by image analysis.

Ｓ７０２で、位置決定部２０２は、黒板６０２の画像領域を解析することにより、黒板６０２上で記述情報が存在する部分を判定する。記述情報の検出は、Ｓ５０２と同様に行うことができる。 In S702, the position determination unit 202 determines a portion of the blackboard 602 where descriptive information exists by analyzing the image area of the blackboard 602. Detection of descriptive information can be performed in the same manner as in S502.

In S703, the position determining unit 202 determines whether there is a description on both the right side and the left side of the line indicated by the position input. If there are descriptions on both sides, the process advances to S707, where the user is asked to choose a side; otherwise, the process advances to S704.

Ｓ７０４で、位置決定部２０２は、位置入力が示す線の右側に記述が有るかどうかを判定する。記述がある場合、処理はＳ７０９へ進み、そうでない場合、処理はＳ７０５へ進む。 In S704, the position determination unit 202 determines whether there is a description on the right side of the line indicated by the position input. If there is a description, the process advances to S709; otherwise, the process advances to S705.

Ｓ７０５で、位置決定部２０２は、位置入力が示す線の左側に記述が有るかどうかを判定する。記述が有る場合、処理はＳ７１０へ進み、そうでない場合、処理はＳ７０６へ進む。 In S705, the position determination unit 202 determines whether there is a description on the left side of the line indicated by the position input. If there is a description, the process advances to S710; otherwise, the process advances to S706.

Ｓ７０６で、位置決定部２０２は、メタデータとして記録される位置情報は存在しないと判定する。そして、図７に従う処理は終了する。 In S706, the position determining unit 202 determines that there is no position information recorded as metadata. Then, the process according to FIG. 7 ends.

Ｓ７０７で、位置決定部２０２は、右側と左側のどちらの領域を示す位置情報をメタデータとして記録するのかを示すユーザ入力を取得する。位置決定部２０２は、ユーザ入力を促すプロンプトをユーザに対して出力してもよい。例えば、スマートフォン上で位置入力が行われている場合、ユーザは、右側と左側のどちらかをタップすることにより入力を行うことができる。また、視線入力を用いて位置入力が行われている場合、右側と左側のどちらかに一定時間以上視線を向けることにより入力を行うことができる。 In step S707, the position determining unit 202 obtains a user input indicating which area, the right side or the left side, is to be recorded as metadata. The position determining unit 202 may output a prompt to the user for user input. For example, when position input is performed on a smartphone, the user can input by tapping either the right or left side. Further, when position input is performed using line of sight input, input can be performed by directing the line of sight to either the right side or the left side for a certain period of time or more.

Ｓ７０８で、位置決定部２０２は、ユーザが右側を選択したかどうかを判定する。右側が選択されている場合、処理はＳ７０９へ進み、そうでない場合、処理はＳ７１０へ進む。 In S708, the position determining unit 202 determines whether the user has selected the right side. If the right side is selected, the process advances to S709; otherwise, the process advances to S710.

Ｓ７０９で、位置決定部２０２は、位置入力が示す線の右側の領域を示す位置情報をメタデータとして記録することを決定する。そして、位置決定部２０２は、Ｓ７０１で算出された座標に従い、このような位置情報を生成する。この位置情報は、例えば頂点座標（ｘ１，ｘ３，ｙ１，ｙ２）を示す情報である。そして、図７に従う処理は終了する。 In S709, the position determining unit 202 determines to record position information indicating the area on the right side of the line indicated by the position input as metadata. Then, the position determination unit 202 generates such position information according to the coordinates calculated in S701. This position information is, for example, information indicating vertex coordinates (x1, x3, y1, y2). Then, the process according to FIG. 7 ends.

Ｓ７１０で、位置決定部２０２は、位置入力が示す線の左側の領域を示す位置情報をメタデータとして記録することを決定する。そして、位置決定部２０２は、Ｓ７０１で算出された座標に従い、このような位置情報を生成する。この位置情報は、例えば頂点座標（ｘ２，ｘ３，ｙ１，ｙ２）を示す情報である。そして、図７に従う処理は終了する。 In S710, the position determination unit 202 determines to record position information indicating the area to the left of the line indicated by the position input as metadata. Then, the position determination unit 202 generates such position information according to the coordinates calculated in S701. This position information is, for example, information indicating vertex coordinates (x2, x3, y1, y2). Then, the process according to FIG. 7 ends.
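The branching of FIG. 7 can be summarised in a short sketch. It assumes that the user is consulted only when both sides of the line contain descriptions, that a side is chosen automatically when exactly one side does, and that no position information is recorded when neither does. The function name `select_side`, the string labels, and the `ask_user` callback are hypothetical names introduced for this example.

```python
from typing import Callable, Optional


def select_side(has_left: bool, has_right: bool,
                ask_user: Callable[[], str]) -> Optional[str]:
    """Return "right", "left", or None (no position information)."""
    if has_left and has_right:
        # Both sides contain descriptions: obtain the user's choice
        # (corresponds to S707 and S708).
        return "right" if ask_user() == "right" else "left"
    if has_right:
        return "right"   # corresponds to S704 -> S709
    if has_left:
        return "left"    # corresponds to S705 -> S710
    return None          # corresponds to S706


# Usage: only the left side of the line contains writing.
print(select_side(has_left=True, has_right=False, ask_user=lambda: "right"))
```

Passing the prompt as a callback keeps the decision logic independent of the input device, so the same function could serve a tap on a smartphone or a dwell-based line-of-sight selection.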

上記の例では、ユーザが縦方向に線を引く場合について説明した。一方で、領域の分割方法は特に限定されない。例えば、線の方向は特に限定されない。ユーザが横方向に線を引いた場合にも、位置決定部２０２は同様に位置情報を生成することができる。その場合、位置決定部２０２は、位置入力が示す線に従って上下に領域を分割し、上下それぞれの領域に記述が有るかどうかを判定することができる。 In the above example, the case where the user draws a line in the vertical direction has been described. On the other hand, the method of dividing the regions is not particularly limited. For example, the direction of the line is not particularly limited. Even when the user draws a line in the horizontal direction, the position determination unit 202 can similarly generate position information. In that case, the position determination unit 202 can divide the area into upper and lower areas according to the line indicated by the position input, and determine whether there is a description in each of the upper and lower areas.

このような例によれば、注目部分に合わせて領域を線などで分割するという簡単な入力操作に基づいて、注目部分を示す位置情報を生成することができる。後述するように、このような位置情報に従う領域を動画像上に示すことにより、ユーザは注目部分を把握しやすくなる。 According to such an example, position information indicating the portion of interest can be generated based on a simple input operation of dividing an area with lines or the like according to the portion of interest. As will be described later, by showing an area according to such positional information on a moving image, it becomes easier for the user to grasp the part of interest.

(Method for generating time information based on moving images)
Below, a method by which the time determination unit 203 generates time information will be described. As described above, the time determination unit 203 may generate time information indicating the playback position of the moving image that was the target of the position input. Alternatively, the time determination unit 203 may generate time information based on the content of the moving image at the position on the moving image specified by the position input.

この実施形態で、時刻決定部２０３は、位置入力が示す位置にある動画像上の情報の出現が開始した時刻を判定し、判定した時刻を示すように時刻情報を生成する。例えば、時刻決定部２０３は、位置入力が示す位置にある記述情報が出現したタイミングを特定する。そして、時刻決定部２０３は、このタイミングを示す時刻情報を生成する。以下の例で、時刻決定部２０３は、講義の配信映像において、講師が黒板に注目部分の板書を書き始めたタイミングを特定する。このような構成によれば、ユーザが復習のために動画像を再度視聴するときに、注目部分の板書の説明が開始するタイミングから動画像を再生することが容易になる。 In this embodiment, the time determination unit 203 determines the time when information on the moving image at the position indicated by the position input starts to appear, and generates time information to indicate the determined time. For example, the time determination unit 203 identifies the timing at which descriptive information at the position indicated by the position input appears. Then, the time determination unit 203 generates time information indicating this timing. In the example below, the time determining unit 203 identifies the timing when the lecturer starts writing on the blackboard the part of interest in the distributed video of the lecture. According to such a configuration, when the user views the video again for review, it becomes easy to play the video from the timing at which the explanation on the board of interest starts.

図８（Ａ）は、ユーザが動画像に対して位置入力を行う様子を示す。ユーザは、四角形の領域８０３を示す位置入力を行っている。この領域８０３は、配信映像８０１内の黒板８０２にある記述内容「ＡＢＣＤＥ」を包含している。ここで、ユーザが位置入力を行った際に表示されている（すなわち、位置入力の対象となった）フレームの時刻をｔ１とする。このとき時刻決定部２０３は、図８（Ｂ）に示すように、講師８０４が領域８０３で囲まれている記述内容を書き始めたときのフレームを特定する。そして、時刻決定部２０３は、特定したフレームの時刻ｔ０を示す時刻情報を生成する。こうして生成された時刻情報は、メタデータとして記憶される。 FIG. 8(A) shows how a user inputs a position on a moving image. The user is inputting a position indicating a rectangular area 803. This area 803 includes the written content "ABCDE" on the blackboard 802 in the distributed video 801. Here, the time of the frame being displayed when the user inputs the position (that is, the frame that is the target of the position input) is assumed to be t1. At this time, the time determination unit 203 identifies the frame when the lecturer 804 started writing the written content surrounded by the area 803, as shown in FIG. 8(B). Then, the time determination unit 203 generates time information indicating the time t0 of the specified frame. The time information generated in this way is stored as metadata.

このような例において、Ｓ３０３で時刻決定部２０３が時刻情報を生成するための処理例を、図９に示すフローチャートに従って説明する。Ｓ９０１で、時刻決定部２０３は、時刻ｔ０を探索するための一時変数ｔを用意する。また、時刻決定部２０３は、変数ｔに、位置入力の対象となったフレームの時刻ｔ１を設定する。 In such an example, a processing example for the time determination unit 203 to generate time information in S303 will be described with reference to the flowchart shown in FIG. In S901, the time determination unit 203 prepares a temporary variable t for searching for time t0. Further, the time determination unit 203 sets the time t1 of the frame whose position is input to the variable t.

Ｓ９０２で、時刻決定部２０３は、変数ｔが示す時刻のフレームの１つ前のフレームの時刻を、変数ｔに代入する。 In S902, the time determination unit 203 assigns to the variable t the time of the frame immediately before the frame at the time indicated by the variable t.

Ｓ９０３で、時刻決定部２０３は、変数ｔが示す時刻のフレームにおいて、フレーム画像から記述情報を抽出する。ここで、時刻決定部２０３は、フレーム画像のうち、Ｓ３０２で生成された位置情報に示される位置（例えば四角形の領域）から、記述情報を抽出することができる。このように、時刻決定部２０３は、位置入力が示す位置にある動画像上の情報として、位置情報が示す領域内の情報と連続している情報全体を取得することができる。別の実施形態において、時刻決定部２０３は、フレーム画像のうち、Ｓ３０１で取得された位置情報に示される位置（例えば四角形の領域）から、記述情報を抽出してもよい。記述情報の抽出は、Ｓ５０２と同様の方法で行うことができる。 In S903, the time determination unit 203 extracts descriptive information from the frame image in the frame at the time indicated by the variable t. Here, the time determination unit 203 can extract descriptive information from the position (for example, a rectangular area) indicated by the position information generated in S302 from the frame image. In this way, the time determining unit 203 can acquire, as information on the moving image at the position indicated by the position input, all of the information that is continuous with the information in the area indicated by the position information. In another embodiment, the time determination unit 203 may extract descriptive information from a position (for example, a rectangular area) indicated by the position information acquired in S301 in the frame image. Extraction of descriptive information can be performed in the same manner as in S502.

Ｓ９０４で、時刻決定部２０３は、記述情報を抽出できたかどうかを判定する。抽出された記述情報がない場合、処理はＳ９０５へ進み、そうでなければ、処理はＳ９０２へ戻る。Ｓ９０５で、時刻決定部２０３は、変数ｔを示す時刻情報を生成する。この時刻情報は、メタデータとして記録される。このような処理により得られた時刻情報が示す再生位置は、位置入力の対象となった動画像の再生位置を補正したものに相当する。 In S904, the time determination unit 203 determines whether descriptive information has been extracted. If there is no extracted descriptive information, the process advances to S905; otherwise, the process returns to S902. In S905, the time determination unit 203 generates time information indicating the variable t. This time information is recorded as metadata. The playback position indicated by the time information obtained through such processing corresponds to a corrected playback position of the moving image that is the subject of position input.
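The backward search of FIG. 9 can be sketched as follows. Frames are identified by integer indices instead of times, and the per-frame check is passed in as a callback; both choices, along with the names `find_writing_start` and `region_has_description`, are assumptions of this example. The fallback to frame 0 when the description is present from the very first frame is also an assumption, since the flowchart description does not cover that case.

```python
from typing import Callable


def find_writing_start(t1_index: int,
                       region_has_description: Callable[[int], bool]) -> int:
    """Step back one frame at a time from the frame of the position input
    and return the first frame in which the region holds no description."""
    t = t1_index                           # S901: start from t1
    while t > 0:
        t -= 1                             # S902: move to the previous frame
        if not region_has_description(t):  # S903, S904
            return t                       # S905: record this time
    return 0  # assumption: reached the start of the video


# Usage: the writing is visible from frame 40 onward; input made at frame 100.
print(find_writing_start(100, lambda i: i >= 40))
```

A frame-by-frame scan is linear in the distance between t1 and t0. If the description, once written, stays visible until t1, a binary search over the frame range would find the same boundary with far fewer frame analyses; that is a design option, not something the source states.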

このような実施形態によれば、注目部分の記述情報が出現した時刻を記録することができる。このため、ユーザが動画像を再び視聴する時に、動画像のうち注目部分に関係する区間を再生することが容易になる。 According to such an embodiment, it is possible to record the time when the descriptive information of the portion of interest appears. Therefore, when the user views the moving image again, it becomes easy to reproduce the section of the moving image that is related to the portion of interest.

(Method for generating time information based on sound)
The time determination unit 203 may generate the time information based on the sound associated with the moving image at the playback position that was the target of the position input. In particular, the time determination unit 203 can generate time information based on both the content of the moving image at the position specified by the position input and the sound associated with the moving image at that playback position. More specifically, the time determination unit 203 can generate time information based on the descriptive information on the moving image at the position specified by the position input and on the content indicated by the sound associated with the moving image at that playback position.

For example, in the example of FIG. 8(A), the lecturer may have started explaining the written content "ABCDE" in the rectangular area 803 at a time t00 earlier than time t0. In this embodiment, the time determination unit 203 generates time information indicating the time at which an utterance indicating descriptive information such as "ABCDE" starts. Specifically, the time determination unit 203 can identify time t00 based on the spoken language indicated by the sound associated with the moving image. The time determination unit 203 then generates time information indicating the identified time t00. With such a configuration, even if the lecturer starts explaining the part of interest before starting to write it on the board, it becomes easy to play the moving image from the point at which the explanation of that part starts.

このような例において、Ｓ３０３で時刻決定部２０３が時刻情報を生成するための処理例を、図１０に示すフローチャートに従って説明する。Ｓ１１０１で、時刻決定部２０３は、時刻ｔ１のフレームにおいて、フレーム画像から記述情報を抽出する。記述情報の抽出は、Ｓ９０３と同様に行うことができ、具体的には例えばパターン認識を用いることができる。Ｓ１１０２で、時刻決定部２０３は、図９に示す方法に従って、記述内容を書き始めたときの時刻ｔ０を特定する。さらに、時刻決定部２０３は、時刻ｔ００を探索するための一時変数ｔを用意する。また、時刻決定部２０３は、変数ｔに、特定された時刻ｔ０を設定する。 In such an example, a processing example for the time determination unit 203 to generate time information in S303 will be described with reference to the flowchart shown in FIG. In S1101, the time determination unit 203 extracts descriptive information from the frame image in the frame at time t1. Extraction of descriptive information can be performed in the same manner as in S903, and specifically, for example, pattern recognition can be used. In S1102, the time determination unit 203 identifies the time t0 when writing the description content starts, according to the method shown in FIG. Furthermore, the time determination unit 203 prepares a temporary variable t for searching for time t00. Further, the time determination unit 203 sets the specified time t0 to the variable t.

Ｓ１１０３で、時刻決定部２０３は、変数ｔが示す時刻から始まり、一定時間後までの音データを抽出する。また、時刻決定部２０３は、抽出された音データに示される発話言語を音声解析により抽出する。音声解析の方法は特に限定されない。 In S1103, the time determination unit 203 extracts sound data starting from the time indicated by the variable t and ending after a certain period of time. Furthermore, the time determination unit 203 extracts the spoken language indicated by the extracted sound data by voice analysis. The voice analysis method is not particularly limited.

Ｓ１１０４で、時刻決定部２０３は、Ｓ１１０１で抽出した記述情報と、Ｓ１１０３で抽出した発話言語とが一致するかどうかを判定する。なお、記述情報と発話言語とが一致するかどうかを判定する際に、時刻決定部２０３は、記述情報が発話言語に含まれるかどうかを判定してもよい。一致した場合、処理はＳ１１０５へ進み、そうでなければ、処理はＳ１１０６へ進む。 In S1104, the time determination unit 203 determines whether the descriptive information extracted in S1101 and the spoken language extracted in S1103 match. Note that when determining whether the descriptive information and the spoken language match, the time determination unit 203 may determine whether the descriptive information is included in the spoken language. If they match, the process advances to S1105; otherwise, the process advances to S1106.

Ｓ１１０５で、時刻決定部２０３は、変数ｔによって示される時刻を示す時刻情報を生成する。そして、図１０に従う処理は終了する。 In S1105, the time determination unit 203 generates time information indicating the time indicated by the variable t. Then, the process according to FIG. 10 ends.

Ｓ１１０６で、時刻決定部２０３は、動画像の終端の音データを抽出したどうかを判定する。なお、図１０に示す処理においては、時刻ｔ１から一定の範囲内の音データが処理対象として抽出されてもよい。この場合、時刻決定部２０３は、この範囲内における終端の音データを抽出したどうかを判定してもよい。動画像の終端の音データを抽出したと判定された場合、処理はＳ１１０７へ進む。そうではない場合、処理はＳ１１０３へ戻る。その後、Ｓ１１０３において、時刻決定部２０３は、変数ｔを、Ｓ１１０３で抽出された音の長さを加算することにより更新し、再び音データの抽出を行う。 In S1106, the time determination unit 203 determines whether sound data at the end of the moving image has been extracted. Note that in the process shown in FIG. 10, sound data within a certain range from time t1 may be extracted as the processing target. In this case, the time determination unit 203 may determine whether the last sound data within this range has been extracted. If it is determined that the sound data at the end of the moving image has been extracted, the process advances to S1107. If not, the process returns to S1103. After that, in S1103, the time determination unit 203 updates the variable t by adding the length of the sound extracted in S1103, and extracts the sound data again.

Ｓ１１０７で、時刻決定部２０３は、変数ｔを初期化するためにｔ０を代入する。Ｓ１１０８で、時刻決定部２０３は、変数ｔが示す時刻の一定時間前から始まり、時刻ｔまでの音データを抽出する。また、時刻決定部２０３は、抽出された音データに示される発話言語を、Ｓ１１０３と同様に抽出する。さらに、時刻決定部２０３は、変数ｔを、抽出された音の長さを減算することにより更新する。 In S1107, the time determination unit 203 assigns t0 to initialize the variable t. In S1108, the time determination unit 203 extracts sound data starting from a certain time before the time indicated by the variable t and ending at time t. Furthermore, the time determining unit 203 extracts the spoken language indicated by the extracted sound data in the same manner as in S1103. Further, the time determination unit 203 updates the variable t by subtracting the length of the extracted sound.

Ｓ１１０９で、時刻決定部２０３は、Ｓ１１０１で抽出した記述情報と、Ｓ１１０８で抽出した言語が一致するかどうかを判定する。一致した場合、処理はＳ１１０５へ進み、そうでなければ、処理はＳ１１１０へ進む。 In S1109, the time determination unit 203 determines whether the descriptive information extracted in S1101 and the language extracted in S1108 match. If they match, the process advances to S1105; otherwise, the process advances to S1110.

Ｓ１１１０で、時刻決定部２０３は、動画像の先頭の音データを抽出したどうかを判定する。動画像の先頭の音データを抽出したと判定された場合、処理はＳ１１１１へ進む。そうではない場合、処理はＳ１１０８へ戻る。 In S1110, the time determination unit 203 determines whether the sound data at the beginning of the moving image has been extracted. If it is determined that the sound data at the beginning of the moving image has been extracted, the process advances to S1111. If not, the process returns to S1108.

Ｓ１１１１で、時刻決定部２０３は、メタデータとして記録される時刻情報は存在しないと判定する。そして、図１０に従う処理は終了する。もっとも、このような場合に、時刻決定部２０３は、位置入力の対象となった動画像の再生位置を示す時刻情報を生成してもよい。 In S1111, the time determination unit 203 determines that there is no time information recorded as metadata. Then, the process according to FIG. 10 ends. However, in such a case, the time determination unit 203 may generate time information indicating the playback position of the moving image that is the subject of position input.

According to the above method, the time determination unit 203 can identify, based on the sound, the time (t00) at which an utterance indicating the descriptive information starts, in the vicinity of the playback position of the moving image that was the target of the position input (time t1) or of the playback position corrected according to FIG. 9 (time t0). The time determination unit 203 can then generate time information indicating this time (t00). The playback position indicated by the time information obtained in this way corresponds to a corrected version of the playback position of the moving image that was the target of the position input. Note that the processing of S1103 to S1106 may be performed after the processing of S1107 to S1110.
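The two-direction search of FIG. 10 can be sketched as below. Speech recognition is abstracted behind a `transcribe(start, end)` callback, which is a hypothetical stand-in rather than a real library call; the name `find_utterance_time`, the fixed `window` length, and the use of substring matching for the "is included in" check are likewise assumptions. The forward pass scans windows from t0 toward the end of the video, and the backward pass scans windows from t0 toward the beginning.

```python
from typing import Callable, List, Optional, Tuple


def find_utterance_time(text: str, t0: float, duration: float, window: float,
                        transcribe: Callable[[float, float], str]
                        ) -> Optional[float]:
    """Return the start of the first window whose transcript contains `text`,
    searching forward from t0 first and then backward; None if not found."""
    t = t0
    while t < duration:                    # S1103 to S1106
        if text in transcribe(t, min(t + window, duration)):
            return t                       # S1105
        t += window
    t = t0                                 # S1107
    while t > 0:                           # S1108 to S1110
        start = max(t - window, 0.0)
        if text in transcribe(start, t):
            return start                   # S1105
        t = start
    return None                            # S1111


# Usage with a fake recogniser built from (time, word) pairs.
words: List[Tuple[float, str]] = [(7.0, "ABCDE"), (41.0, "next topic")]


def fake_transcribe(start: float, end: float) -> str:
    return " ".join(w for ts, w in words if start <= ts < end)


print(find_utterance_time("ABCDE", 20.0, 60.0, 5.0, fake_transcribe))
```

The returned value has window-level resolution only. Locating the exact onset of the utterance inside the matching window would need word-level timestamps from the recogniser, which this sketch does not assume.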

上記の例においては、音データに示される発話言語と、動画像から抽出された記述情報とに基づいて、時刻情報が生成された。しかしながら、音データを用いた時刻情報の生成方法は、この方法に限定されない。時刻決定部２０３は、さまざまな方法で、動画像に関連する音に基づいて時刻情報を生成することができる。例えば、時刻決定部２０３は、音データに基づいて、継続した発話が途切れたタイミングを判定することができる。そして、時刻決定部２０３は、継続した発話が途切れたタイミングのうち、位置入力が行われた際の動画像中の時刻より前で、位置入力が行われた際の動画像中の時刻に最も近いタイミングを示す時刻情報を生成してもよい。別の例として、時刻決定部２０３は、継続した発話が途切れたタイミングのうち、図９に従う処理で判定された時刻ｔ０より前で、時刻ｔ０に最も近いタイミングを示す時刻情報を生成してもよい。 In the above example, time information was generated based on the spoken language indicated by the sound data and the descriptive information extracted from the moving image. However, the method of generating time information using sound data is not limited to this method. The time determination unit 203 can generate time information based on the sound related to the moving image in various ways. For example, the time determination unit 203 can determine, based on the sound data, the timings at which continuous speech is interrupted. The time determination unit 203 may then generate time information indicating, among those timings, the one that precedes the time in the moving image at which the position input was made and is closest to that time. As another example, the time determination unit 203 may generate time information indicating, among the timings at which continuous speech is interrupted, the one that precedes time t0 determined by the process according to FIG. 9 and is closest to time t0.
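Selecting the most recent speech pause before a reference time can be sketched as follows. The sketch assumes that pause detection has already produced a sorted list of pause timestamps in seconds. The function name `last_pause_before` is hypothetical.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def last_pause_before(pause_times: Sequence[float],
                      t_ref: float) -> Optional[float]:
    """Return the latest pause strictly before t_ref, or None if there is none.

    pause_times must be sorted in ascending order. t_ref may be the time of
    the position input or the corrected time t0 from FIG. 9.
    """
    i = bisect_left(pause_times, t_ref)  # index of first pause >= t_ref
    return pause_times[i - 1] if i > 0 else None
```

A binary search is used here only because the pause list is assumed to be sorted; a linear scan would give the same result.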

また、時刻決定部２０３は、さまざまな方法で、動画像に関連する音が示す内容に基づいて時刻情報を生成することができる。例えば、時刻決定部２０３は、動画像に関連する音が含む効果音の再生時刻に基づいて時刻情報を生成することができる。例えば、動画像に関連する音は、特定の効果音（例えばセクションの最初に再生される効果音、又は注目点を示す効果音など）を含んでいてもよい。この場合、時刻決定部２０３は、位置入力の対象となった動画像の再生位置に対応する時刻において、効果音を再生中かどうかを判定することができる。そして、この時刻において効果音を再生中の場合、時刻決定部２０３は、この効果音の再生開始時刻又は再生終了時刻を示す時刻情報を生成することができる。また、時刻決定部２０３は、位置入力の対象となった動画像の再生位置の近傍において再生される効果音を検出してもよい。この場合、時刻決定部２０３は、検出した効果音の再生開始時刻又は再生終了時刻を示す時刻情報を生成することができる。このような構成により生成された時刻情報を用いることにより、音の内容及びユーザの好みに合った適切な位置から動画像及びこれに関連する音を再生することが可能となる。以上のように、時刻情報を生成するために記述情報を参照することは必須ではない。 Further, the time determination unit 203 can generate time information based on the content indicated by the sound related to the moving image using various methods. For example, the time determination unit 203 can generate time information based on the playback time of a sound effect included in a sound related to a moving image. For example, sounds associated with moving images may include specific sound effects (eg, a sound effect played at the beginning of a section, a sound effect indicating a point of interest, etc.). In this case, the time determination unit 203 can determine whether the sound effect is being played back at the time corresponding to the playback position of the moving image that is the subject of the position input. If the sound effect is being played at this time, the time determining unit 203 can generate time information indicating the playback start time or playback end time of this sound effect. Furthermore, the time determining unit 203 may detect a sound effect that is played near the playback position of the moving image that is the target of the position input. In this case, the time determination unit 203 can generate time information indicating the playback start time or playback end time of the detected sound effect. 
By using the time information generated by such a configuration, it becomes possible to reproduce moving images and sounds related thereto from an appropriate position that matches the content of the sounds and the user's preferences. As described above, it is not essential to refer to descriptive information in order to generate time information.
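The sound-effect variant can be sketched in the same way. The sketch assumes that sound effects have already been detected and are given as (start, end) intervals in seconds. The `window` parameter that defines "near" and the `use_start` flag that selects start or end time are illustrative assumptions, and the name `effect_time` is hypothetical.

```python
from typing import Optional, Sequence, Tuple

Interval = Tuple[float, float]  # (playback start, playback end) in seconds


def effect_time(effects: Sequence[Interval],
                t_input: float,
                window: float = 5.0,
                use_start: bool = True) -> Optional[float]:
    """Return the start (or end) time of the sound effect that is playing at
    t_input, or otherwise of the closest effect within `window` seconds.
    Returns None when no effect qualifies."""
    def pick(iv: Interval) -> float:
        return iv[0] if use_start else iv[1]

    # Case 1: an effect is being played at the input playback position.
    for iv in effects:
        if iv[0] <= t_input <= iv[1]:
            return pick(iv)

    # Case 2: look for an effect played in the vicinity of the position.
    def gap(iv: Interval) -> float:
        return iv[0] - t_input if t_input < iv[0] else t_input - iv[1]

    nearby = [iv for iv in effects if gap(iv) <= window]
    if not nearby:
        return None
    return pick(min(nearby, key=gap))
```

Whether the start or the end of the effect is the better resume point depends on the content, which is why the choice is left as a parameter in this sketch.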

このような実施形態によれば、注目部分の記述情報に関する説明を開始した時刻を記録することができる。このため、ユーザが動画像を再び視聴する時に、動画像のうち注目部分に関係する区間を再生することが容易になる。 According to such an embodiment, it is possible to record the time when the explanation regarding the descriptive information of the portion of interest is started. Therefore, when the user views the moving image again, it becomes easy to reproduce the section related to the portion of interest in the moving image.

（メタデータの利用方法）
図４（Ｃ）は、記録部２０４が記録したメタデータである位置情報及び時刻情報の利用方法の一例を示す。図４（Ｃ）に示すように、再生部２０５は、動画像を表示部１０６に表示させることができる。このとき、再生部２０５は、時刻情報が示すフレームにおいて、位置情報に従う部分を示す情報を動画像上に表示することができる。例えば、図４（Ｃ）の例では、位置情報に従う領域４０６が動画像上で特定されている。さらに、再生部２０５は、位置情報に従う部分を示す情報に関連付けて、この部分の属性情報を示すことができる。図４（Ｃ）の例では、領域４０６に関連付けて、この部分が重要であることを示すマークが表示されている。 (How to use metadata)
FIG. 4C shows an example of how the position information and time information, which are the metadata recorded by the recording unit 204, can be used. As shown in FIG. 4C, the playback unit 205 can display the moving image on the display unit 106. At this time, the playback unit 205 can display, on the moving image, information indicating the portion given by the position information in the frame indicated by the time information. For example, in the example of FIG. 4C, an area 406 given by the position information is identified on the moving image. Furthermore, the playback unit 205 can show attribute information of this portion in association with the information indicating the portion. In the example of FIG. 4C, a mark indicating that this portion is important is displayed in association with the area 406.

また、再生部２０５は、動画像の再生位置を決めるために用いられるシークバー上に、時刻情報が示す再生位置を示すマーカを表示してもよい。このマーカをユーザが指定した際に、再生部２０５は、時刻情報が示す再生位置から動画像の再生を開始してもよい。さらなる例として、再生部２０５は、位置情報が示す動画像上の位置を示す情報を、位置入力の対象となった動画像の再生位置におけるフレーム画像に重畳することにより得られたサムネイル画像を表示してもよい。このサムネイル画像をユーザが指定した際に、再生部２０５は、時刻情報が示す再生位置から動画像の再生を開始してもよい。 Furthermore, the playback unit 205 may display, on a seek bar used to set the playback position of the moving image, a marker indicating the playback position indicated by the time information. When the user selects this marker, the playback unit 205 may start playing the moving image from the playback position indicated by the time information. As a further example, the playback unit 205 may display a thumbnail image obtained by superimposing information indicating the position on the moving image indicated by the position information onto the frame image at the playback position of the moving image that was the target of the position input. When the user selects this thumbnail image, the playback unit 205 may start playing the moving image from the playback position indicated by the time information.
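The use of the recorded metadata during playback can be illustrated with a minimal Python sketch. The record layout `Annotation`, the function names, and the `tolerance_s` value used to decide whether an annotation belongs to the current frame are all assumptions made for illustration. The disclosure does not specify a storage format or a tolerance.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Region = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


@dataclass(frozen=True)
class Annotation:
    time_s: float            # time information (playback position in seconds)
    region: Region           # position information (area on the moving image)
    label: str = "important"  # attribute information shown with the area


def marker_x(annotation: Annotation, duration_s: float, bar_width_px: int) -> int:
    """Horizontal pixel offset of the seek-bar marker for an annotation."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    fraction = min(max(annotation.time_s / duration_s, 0.0), 1.0)
    return round(fraction * bar_width_px)


def visible_regions(annotations: Sequence[Annotation],
                    t: float,
                    tolerance_s: float = 0.5) -> List[Region]:
    """Regions to overlay on the frame displayed at playback time t."""
    return [a.region for a in annotations if abs(a.time_s - t) <= tolerance_s]
```

Selecting a marker would then amount to seeking to `annotation.time_s`, which is the behavior described for the playback unit 205.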

（その他の実施例）
ここまで、位置入力によって指定された動画像上の位置における動画像の内容に基づいて位置情報を生成する方法を説明した（図５及び図７）。また、位置入力によって指定された動画像上の位置における動画像の内容に基づいて時刻情報を生成する方法も説明した（図９及び図１０）。さらに、位置入力の対象となった動画像の再生位置に対応する動画像に関連する音に基づいて時刻情報を生成する方法も説明した（図１０）。一方で、位置入力の対象となった動画像の再生位置に対応する動画像に関連する音に基づいて位置情報を生成してもよい。例えば、位置決定部２０２は、位置入力の対象となった動画像の再生位置に対応する動画像に関連する音が示す発話言語を検出することができる。そして、位置決定部２０２は、検出された発話言語に対応する記述情報を、例えばパターンマッチング等を用いて位置入力によって指定された動画像上の位置の近傍で検索することができる。このような記述情報が検索された場合、位置決定部２０２は、検索された記述情報を包含する領域を示すように位置情報を生成することができる。これらの方法は、位置情報と時刻情報との少なくとも一方を生成するために、任意に組み合わせて用いることができる。 (Other examples)
Up to this point, methods for generating position information based on the content of the moving image at the position on the moving image specified by the position input have been described (FIGS. 5 and 7). A method for generating time information based on the content of the moving image at the position on the moving image specified by the position input has also been described (FIGS. 9 and 10), as has a method for generating time information based on the sound related to the moving image corresponding to the playback position of the moving image that was the target of the position input (FIG. 10). In addition, position information may be generated based on the sound related to the moving image corresponding to the playback position of the moving image that was the target of the position input. For example, the position determination unit 202 can detect the spoken language indicated by the sound related to the moving image corresponding to that playback position. The position determination unit 202 can then search for descriptive information corresponding to the detected spoken language in the vicinity of the position on the moving image specified by the position input, using, for example, pattern matching. When such descriptive information is found, the position determination unit 202 can generate position information so as to indicate an area that includes the descriptive information found. These methods can be used in any combination to generate at least one of the position information and the time information.
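Generating position information from sound can be sketched as follows. The sketch assumes that text recognition on the frame has already produced word boxes and that speech recognition has produced a list of spoken words. Exact case-insensitive word equality stands in for the pattern matching mentioned above, and `radius` is an illustrative definition of "vicinity". The names `TextBox` and `region_from_speech` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass(frozen=True)
class TextBox:
    text: str  # recognized written word (assumed to be available already)
    box: Box   # bounding box of that word on the frame


def region_from_speech(boxes: Sequence[TextBox],
                       spoken_words: Sequence[str],
                       point: Tuple[int, int],
                       radius: int) -> Optional[Box]:
    """Return the smallest box enclosing every written word that was also
    spoken and lies within `radius` pixels of the input point."""
    spoken = {w.casefold() for w in spoken_words}
    px, py = point
    hits = []
    for tb in boxes:
        left, top, right, bottom = tb.box
        # Distance from the point to the box (zero when the point is inside).
        dx = max(left - px, 0, px - right)
        dy = max(top - py, 0, py - bottom)
        near = dx * dx + dy * dy <= radius * radius
        if near and tb.text.casefold() in spoken:
            hits.append(tb.box)
    if not hits:
        return None
    return (min(h[0] for h in hits), min(h[1] for h in hits),
            max(h[2] for h in hits), max(h[3] for h in hits))
```

Returning the enclosing box mirrors the statement that the position information indicates an area that includes the descriptive information found.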

本開示の内容は、上述の実施形態の１以上の機能を実現するプログラムを、ネットワーク又は記憶媒体を介してシステム又は装置に供給し、そのシステム又は装置のコンピュータにおける１つ以上のプロセッサがプログラムを読出し実行する処理でも実現可能である。また、１以上の機能を実現する回路（例えば、ＡＳＩＣ）によっても実現可能である。 The present disclosure can also be realized by a process in which a program that implements one or more functions of the above-described embodiments is supplied to a system or device via a network or a storage medium, and one or more processors in a computer of the system or device read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

本明細書の開示は、以下の情報処理装置、情報処理方法、及びプログラムを含む。 The disclosure of this specification includes the following information processing device, information processing method, and program.

（項目１）
動画像上の位置を指定する位置入力を取得する取得手段と、
前記位置入力に基づいて前記動画像上の位置を示す位置情報及び前記動画像の再生位置を示す時刻情報を生成する生成手段であって、前記位置情報と前記時刻情報との少なくとも一方を、前記位置入力によって指定された前記動画像上の位置における前記動画像の内容と、前記位置入力の対象となった前記動画像の再生位置に対応する前記動画像に関連する音と、の少なくとも一方にさらに基づいて生成する、生成手段と、
前記位置情報及び前記時刻情報を記録する記録手段と、
を備えることを特徴とする情報処理装置。 (Item 1)
An information processing device comprising:
an acquisition means for acquiring a position input specifying a position on a moving image;
a generating means for generating, based on the position input, position information indicating a position on the moving image and time information indicating a playback position of the moving image, wherein the generating means generates at least one of the position information and the time information further based on at least one of the content of the moving image at the position on the moving image specified by the position input and a sound related to the moving image corresponding to the playback position of the moving image that is the target of the position input; and
a recording means for recording the position information and the time information.

（項目２）
前記生成手段は、前記位置入力が指定する前記動画像上の位置を、前記位置入力によって指定された前記動画像上の位置における前記動画像の内容と、前記位置入力の対象となった前記動画像の再生位置に対応する前記動画像に関連する音と、の少なくとも一方に基づいて補正することにより、前記位置情報を生成し、及び／又は、
前記位置入力の対象となった前記動画像の再生位置を、前記位置入力によって指定された前記動画像上の位置における前記動画像の内容と、前記位置入力の対象となった前記動画像の再生位置に対応する前記動画像に関連する音と、の少なくとも一方に基づいて補正することにより、前記時刻情報を生成する
ことを特徴とする、項目１に記載の情報処理装置。 (Item 2)
The information processing device according to item 1, wherein the generating means generates the position information by correcting the position on the moving image specified by the position input, based on at least one of the content of the moving image at the position on the moving image specified by the position input and the sound related to the moving image corresponding to the playback position of the moving image that is the target of the position input, and/or
generates the time information by correcting the playback position of the moving image that is the target of the position input, based on at least one of the content of the moving image at the position on the moving image specified by the position input and the sound related to the moving image corresponding to the playback position of the moving image that is the target of the position input.

（項目３）
前記生成手段は、前記位置入力によって指定された前記動画像上の位置にある前記動画像上の情報を包含する領域を示すように、前記位置情報を生成することを特徴とする、項目１又は２に記載の情報処理装置。 (Item 3)
The information processing device according to item 1 or 2, wherein the generating means generates the position information so as to indicate an area that includes the information on the moving image located at the position on the moving image specified by the position input.

（項目４）
前記情報は連続した記述であることを特徴とする、項目３に記載の情報処理装置。 (Item 4)
The information processing device according to item 3, wherein the information is a continuous description.

（項目５）
前記位置入力は前記動画像上の情報の少なくとも一部を含む領域を指定し、
前記生成手段は、前記位置入力が指定する領域を拡大又は縮小することにより、前記動画像上の情報を包含する領域を判定することを特徴とする、項目２から４のいずれか１項目に記載の情報処理装置。 (Item 5)
The information processing device according to any one of items 2 to 4, wherein the position input specifies an area including at least part of the information on the moving image, and
the generating means determines the area that includes the information on the moving image by enlarging or reducing the area specified by the position input.

（項目６）
前記位置入力は、四角形、ループ形状、又は点を指定することを特徴とする、項目１から５のいずれか１項目に記載の情報処理装置。 (Item 6)
The information processing device according to any one of items 1 to 5, wherein the position input specifies a rectangle, a loop shape, or a point.

（項目７）
前記位置入力は、前記動画像上の所定領域の分割位置を指定し、
前記生成手段は、前記動画像から前記所定領域を検出し、前記位置入力に従って分割された前記所定領域の分割領域のうちの１つを示すように、前記位置情報を生成することを特徴とする、項目１に記載の情報処理装置。 (Item 7)
The information processing device according to item 1, wherein the position input specifies a division position of a predetermined area on the moving image, and
the generating means detects the predetermined area from the moving image and generates the position information so as to indicate one of the divided areas obtained by dividing the predetermined area according to the position input.

（項目８）
前記所定領域は黒板又はホワイトボードの領域であることを特徴とする、項目７に記載の情報処理装置。 (Item 8)
The information processing device according to item 7, wherein the predetermined area is an area of a blackboard or a whiteboard.

（項目９）
前記生成手段は、前記位置入力に従って分割された前記動画像上の複数の領域のそれぞれに情報が含まれるかどうかに基づいて、前記複数の領域から１つの領域を選択し、選択された領域を示すように前記位置情報を生成することを特徴とする、項目７又は８に記載の情報処理装置。 (Item 9)
The information processing device according to item 7 or 8, wherein the generating means selects one area from a plurality of areas on the moving image divided according to the position input, based on whether each of the plurality of areas includes information, and generates the position information so as to indicate the selected area.

（項目１０）
前記位置入力は線を指定することを特徴とする、項目７から９のいずれか１項目に記載の情報処理装置。 (Item 10)
The information processing device according to any one of items 7 to 9, wherein the position input specifies a line.

（項目１１）
前記生成手段は、前記位置入力が示す位置にある前記動画像上の情報の出現が開始した時刻を判定し、判定した時刻を示すように前記時刻情報を生成することを特徴とする、項目１から１０のいずれか１項目に記載の情報処理装置。 (Item 11)
The information processing device according to any one of items 1 to 10, wherein the generating means determines the time at which the information on the moving image at the position indicated by the position input started to appear, and generates the time information so as to indicate the determined time.

（項目１２）
前記生成手段は、前記位置情報が示す領域内の情報と連続している情報全体について出現が開始した時刻を判定し、判定した時刻を示すように前記時刻情報を生成することを特徴とする、項目１から１０のいずれか１項目に記載の情報処理装置。 (Item 12)
The information processing device according to any one of items 1 to 10, wherein the generating means determines, for the whole of the information that is continuous with the information in the area indicated by the position information, the time at which it started to appear, and generates the time information so as to indicate the determined time.

（項目１３）
前記生成手段は、前記位置入力によって指定された前記動画像上の位置における前記動画像の内容と、前記位置入力の対象となった前記動画像の再生位置に対応する前記動画像に関連する音と、の双方に基づいて前記時刻情報を生成することを特徴とする、項目１から１０のいずれか１項目に記載の情報処理装置。 (Item 13)
The information processing device according to any one of items 1 to 10, wherein the generating means generates the time information based on both the content of the moving image at the position on the moving image specified by the position input and the sound related to the moving image corresponding to the playback position of the moving image that is the target of the position input.

（項目１４）
前記生成手段は、前記位置入力によって指定された前記動画像上の位置における前記動画像上の記述情報と、前記位置入力の対象となった前記動画像の再生位置に対応する前記動画像に関連する音が示す内容と、に基づいて前記時刻情報を生成することを特徴とする、項目１から１０のいずれか１項目に記載の情報処理装置。 (Item 14)
The information processing device according to any one of items 1 to 10, wherein the generating means generates the time information based on descriptive information on the moving image at the position on the moving image specified by the position input and on the content indicated by the sound related to the moving image corresponding to the playback position of the moving image that is the target of the position input.

（項目１５）
前記生成手段は、前記記述情報を示す発話が開始する時刻を示す前記時刻情報を生成することを特徴とする、項目１４に記載の情報処理装置。 (Item 15)
The information processing device according to item 14, wherein the generating means generates the time information indicating a time at which an utterance indicating the descriptive information starts.

（項目１６）
前記生成手段は、前記音に基づいて、前記位置入力の対象となった前記動画像の再生位置の近傍において、前記記述情報を示す発話が開始する時刻を特定し、前記時刻を示す前記時刻情報を生成することを特徴とする、項目１４又は１５に記載の情報処理装置。 (Item 16)
The information processing device according to item 14 or 15, wherein the generating means identifies, based on the sound, a time at which an utterance indicating the descriptive information starts in the vicinity of the playback position of the moving image that is the target of the position input, and generates the time information indicating the identified time.

（項目１７）
情報処理装置が行う情報処理方法であって、
動画像上の位置を指定する位置入力を取得する工程と、
前記位置入力に基づいて前記動画像上の位置を示す位置情報及び前記動画像の再生位置を示す時刻情報を生成する工程であって、前記位置情報と前記時刻情報との少なくとも一方を、前記位置入力によって指定された前記動画像上の位置における前記動画像の内容と、前記位置入力の対象となった前記動画像の再生位置に対応する前記動画像に関連する音と、の少なくとも一方にさらに基づいて生成する工程と、
前記位置情報及び前記時刻情報を記録する工程と、
を含むことを特徴とする情報処理方法。 (Item 17)
An information processing method performed by an information processing device, the method comprising:
acquiring a position input specifying a position on a moving image;
generating, based on the position input, position information indicating a position on the moving image and time information indicating a playback position of the moving image, wherein at least one of the position information and the time information is generated further based on at least one of the content of the moving image at the position on the moving image specified by the position input and a sound related to the moving image corresponding to the playback position of the moving image that is the target of the position input; and
recording the position information and the time information.

（項目１８）
コンピュータを、項目１から１６のいずれか１項目に記載の情報処理装置として機能させるためのプログラム。 (Item 18)
A program for causing a computer to function as the information processing device according to any one of items 1 to 16.

本開示の範囲は上記実施形態に制限されるものではなく、その精神及び範囲から離脱することなく、様々な変更及び変形が可能である。従って、権利範囲を公にするために請求項を添付する。 The scope of the present disclosure is not limited to the above embodiments, and various changes and modifications can be made without departing from the spirit and scope thereof. Accordingly, the claims are appended hereto in order to disclose the scope of the rights.

１０１：情報処理装置、２０１：取得部、２０２：位置決定部、２０３：時刻決定部、２０４：記憶部、２０５：再生部 101: Information processing device, 201: Acquisition unit, 202: Position determination unit, 203: Time determination unit, 204: Storage unit, 205: Reproduction unit

Claims

An information processing device comprising:
an acquisition means for acquiring a position input specifying a position on a moving image;
a generating means for generating, based on the position input, position information indicating a position on the moving image and time information indicating a playback position of the moving image, wherein the generating means generates at least one of the position information and the time information further based on at least one of the content of the moving image at the position on the moving image specified by the position input and a sound related to the moving image corresponding to the playback position of the moving image that is the target of the position input; and
a recording means for recording the position information and the time information.

The information processing device according to claim 1, wherein the generating means generates the position information by correcting the position on the moving image specified by the position input, based on at least one of the content of the moving image at the position on the moving image specified by the position input and the sound related to the moving image corresponding to the playback position of the moving image that is the target of the position input, and/or
generates the time information by correcting the playback position of the moving image that is the target of the position input, based on at least one of the content of the moving image at the position on the moving image specified by the position input and the sound related to the moving image corresponding to the playback position of the moving image that is the target of the position input.

The information processing device according to claim 1 or 2, wherein the generating means generates the position information so as to indicate an area that includes the information on the moving image located at the position on the moving image specified by the position input.

The information processing device according to claim 3, wherein the information is a continuous description.

The information processing device according to claim 3, wherein the position input specifies an area including at least part of the information on the moving image, and
the generating means determines the area that includes the information on the moving image by enlarging or reducing the area specified by the position input.

The information processing device according to claim 3, wherein the position input specifies a rectangle, a loop shape, or a point.

The information processing device according to claim 1, wherein the position input specifies a division position of a predetermined area on the moving image, and
the generating means detects the predetermined area from the moving image and generates the position information so as to indicate one of the divided areas obtained by dividing the predetermined area according to the position input.

The information processing device according to claim 7, wherein the predetermined area is an area of a blackboard or a whiteboard.

The information processing device according to claim 7, wherein the generating means selects one area from a plurality of areas on the moving image divided according to the position input, based on whether each of the plurality of areas includes information, and generates the position information so as to indicate the selected area.

The information processing device according to claim 7, wherein the position input specifies a line.

The information processing device according to claim 1, wherein the generating means determines the time at which the information on the moving image at the position indicated by the position input started to appear, and generates the time information so as to indicate the determined time.

The information processing device according to claim 1, wherein the generating means determines, for the whole of the information that is continuous with the information in the area indicated by the position information, the time at which it started to appear, and generates the time information so as to indicate the determined time.

The information processing device according to claim 1, wherein the generating means generates the time information based on both the content of the moving image at the position on the moving image specified by the position input and the sound related to the moving image corresponding to the playback position of the moving image that is the target of the position input.

The information processing device according to claim 1, wherein the generating means generates the time information based on descriptive information on the moving image at the position on the moving image specified by the position input and on the content indicated by the sound related to the moving image corresponding to the playback position of the moving image that is the target of the position input.

The information processing device according to claim 14, wherein the generating means generates the time information indicating a time at which an utterance indicating the descriptive information starts.

The information processing device according to claim 14, wherein the generating means identifies, based on the sound, a time at which an utterance indicating the descriptive information starts in the vicinity of the playback position of the moving image that is the target of the position input, and generates the time information indicating the identified time.

An information processing method performed by an information processing device, the method comprising:
acquiring a position input specifying a position on a moving image;
generating, based on the position input, position information indicating a position on the moving image and time information indicating a playback position of the moving image, wherein at least one of the position information and the time information is generated further based on at least one of the content of the moving image at the position on the moving image specified by the position input and a sound related to the moving image corresponding to the playback position of the moving image that is the target of the position input; and
recording the position information and the time information.

A program for causing a computer to function as the information processing device according to any one of claims 1 to 16.