JP2021048506A

JP2021048506A - Video scene information management apparatus

Info

Publication number: JP2021048506A
Application number: JP2019170171A
Authority: JP
Inventors: Naoya Tojo (東條 直也); Osamu Araida (新井田 統)
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2019-09-19
Filing date: 2019-09-19
Publication date: 2021-03-25

Abstract

PROBLEM TO BE SOLVED: To provide a video scene management apparatus capable of selectively reproducing each scene in a video file in a form in which the scene identifier of each scene is displayed at an appropriate place on the screen.

SOLUTION: A scene information acquisition unit 101 acquires, for each scene of a video file, a data set including a scene identifier, a scene section, and position information on where the scene identifier is displayed during reproduction of the scene, as scene information. A scene information registration unit 102 registers the data set of each piece of scene information, per video file, in a scene information management server 3. A scene information search unit 104 selects scene information by keyword search on the scene identifier. A video file reproduction unit 103 reproduces each scene of the video file. A scene information reproduction unit 105 displays the scene identifier in the reproduced scene.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a video scene information management device that extracts scenes from a video file, generates and manages scene information, and selectively reproduces scenes based on that scene information. In particular, it relates to a video scene information management device that reproduces each scene in a video file in a form in which the scene's identifier is displayed at an appropriate place on the screen.

Patent Document 1 discloses a technique by which a user extracts programs, or scenes contained in programs, from subtitle information and displays them on a display.

Patent Document 1: JP-A-2015-52897

In the prior art, a viewing user can extract the scene in which a subtitle is displayed as a desired scene by running a keyword search over the subtitles, but cannot extract scenes from a video file that contains no subtitles.

Alternatively, by automatically detecting and identifying objects in a video file and registering the identification results as text, a scene showing a desired object can be extracted automatically by a keyword search over that text.

However, if the object detection/identification process is a black box, the viewing user cannot follow how objects were detected and identified. Consequently, even when a scene extracted on the basis of an object identification result is reproduced, the user may be unable to recognize its meaning or content, such as the grounds on which the scene was selected.

Furthermore, when a home video of a pet or the like is published on a network for viewing by many network users, general-purpose object detection/identification can identify that the pet is a dog or a cat, but cannot identify unique information such as its name. Network users therefore cannot be offered a scene search that uses a name or the like as the search keyword.

In addition, when the video file is a 360° video, blind spots make it difficult to search exhaustively for the playback positions to be extracted as scenes.

An object of the present invention is to solve the above technical problems and to provide a video scene management device that can reproduce each scene in a video file in a form in which the scene's identifier is displayed at an appropriate place on the screen.

To achieve the above object, the present invention provides a video scene management device that manages scenes of a video file, characterized by the following configurations.

(1) The device comprises scene information acquisition means for acquiring, for each scene of a video file, a data set including a scene identifier, a scene section, and position information of the scene identifier to be displayed in that scene, as scene information; and scene information registration means for registering the data set of each piece of scene information for each video file.

(2) The device comprises means for selecting scene information by keyword search on its scene identifier; means for reproducing each scene of the video file based on the selected scene information; and means for displaying the scene identifier in each reproduced scene based on the selected scene information.

(3) The scene information acquisition means comprises means for detecting an object from the video file and means for identifying the object, and the scene information registration means registers scene information in which the object identification result is the scene identifier, the object detection section is the scene section, and the object position is the position information of the scene identifier.

(4) The device further comprises input operation means for accepting input operations by a user, and the scene information acquisition means comprises means for acquiring a scene section input from the input operation means, means for acquiring a scene identifier input from the input operation means, and means for acquiring, as position information, the display position of the scene identifier input from the input operation means.

(5) When the video file is a 360° video, the position information of the scene identifier is a coordinate value in the coordinate system obtained by unfolding the 360° video into two dimensions.

According to the present invention, the following effects are achieved.

(1) For each scene of a video file, a data set including a scene identifier, a scene section, and position information of the scene identifier is acquired and registered as scene information. By referring to this scene information, the scene corresponding to a scene identifier of the video file can be reproduced in a form in which the scene identifier is displayed at an appropriate place.

(2) The scene information of a video file is searched, and each scene of the video file is reproduced based on that scene information, so the scene corresponding to a scene identifier can be reproduced together with the scene identifier. By referring to the scene identifier of each scene, the viewer can visually recognize the content of each scene and the meaning, name, and so on of the information shown on each screen of the scene.

(3) The scene information acquisition means detects and identifies objects in the video file and registers scene information in which the object identification result is the scene identifier, the object detection section is the scene section, and the object position is the position information of the scene identifier. Scenes can therefore be extracted objectively from the video file and reproduced, and in addition the scene identifier can be displayed at an appropriate place in the scene.

(4) Input operation means for accepting input operations by a user is provided, and the scene section, the scene identifier, and the position information on the display position of the scene identifier input from the input operation means are registered as scene information. Scenes can therefore be extracted manually from the video file, and the scene identifier and its display position can be registered freely. If many users are permitted to register scene information, scene information can be registered and displayed from a variety of viewpoints, reflecting each user's subjective view.

(5) When the video file is a 360° video, the position information of the scene identifier is expressed as coordinate values in the two-dimensional unfolding of the 360° video. Even for a 360° video, therefore, once scene information has been registered for the characteristic scenes, each scene can be reproduced with the coordinates of its scene identifier selectively displayed on the screen.

A functional block diagram of a video scene information management system to which the present invention is applied.
A diagram showing the registration format of scene information in the scene information management server.
A functional block diagram of a first embodiment of the scene information management device.
A diagram schematically showing the method of acquiring and registering scene information.
A flowchart showing the method of reproducing a desired scene of a video file.
A diagram showing an example of the scene search screen.
A diagram showing an example of scene reproduction when a keyword is input.
A diagram showing scene reproduction when the keyword "pet dog △△" is input.
A diagram showing scene reproduction when the keyword "pet cat 〇〇" is input.
A functional block diagram of a second embodiment of the scene information management device.
A diagram showing an example of the registration screen used when scene information is registered manually.
A diagram schematically showing the method of inputting and registering scene information.

Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a functional block diagram showing the configuration of a video scene information management system to which the present invention is applied. Its main components are a video scene information management device 1, a video file server 2, and a scene information management server 3.

The video file server 2 stores a large number of video files. Each video file contains a video signal and an audio signal, and may also contain subtitle information. The video scene information management device 1 sets scene information for each characteristic scene of a video file and registers it in the scene information management server 3. When the video file is reproduced, the device reads the corresponding scene information from the scene information management server 3 and selectively reproduces the scene.

In the video scene information management device 1, a scene information acquisition unit 101 acquires, as scene information, a data set that includes a text (text) serving as a scene identifier unique to each scene of the video file, a start time ts and an end time te of the scene, and position information specifying where on the screen the text is displayed during scene reproduction. A scene information registration unit 102 associates the scene information data set with the identifier of the video file and registers it in the scene information management server 3 in table form.

FIG. 2 shows the registration format of each piece of scene information in the scene information management server 3. In this embodiment, the registered items are the text (text) serving as the scene identifier, the start time ts and end time te of the scene, and the position information p1, p2 used when the text is superimposed during scene reproduction, together with a type indicating whether the scene information was acquired objectively (auto) or subjectively (user1, user2, ...). The scene information management server 3 manages one or more such data sets per video file in table form.
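As a non-limiting illustration, the registration format of FIG. 2 can be pictured as a small record type held in a per-file table. The sketch below is only one possible representation: the names SceneInfo and SceneInfoTable, the use of seconds for ts and te, and the (x, y) tuple form of p1 and p2 are assumptions made for the example and are not specified in the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y) on the screen; the unit is an assumption


@dataclass(frozen=True)
class SceneInfo:
    text: str   # scene identifier superimposed during reproduction
    ts: float   # scene start time (seconds assumed)
    te: float   # scene end time (seconds assumed)
    p1: Point   # first coordinate of the display position
    p2: Point   # second coordinate of the display position
    kind: str   # "auto" for objective registration, otherwise a user name


class SceneInfoTable:
    """Hypothetical stand-in for the table kept by the scene information server."""

    def __init__(self) -> None:
        self._rows: Dict[str, List[SceneInfo]] = {}

    def register(self, video_id: str, info: SceneInfo) -> None:
        # Reject records whose section runs backwards.
        if info.te < info.ts:
            raise ValueError("end time must not precede start time")
        self._rows.setdefault(video_id, []).append(info)

    def scenes(self, video_id: str) -> List[SceneInfo]:
        # Return a copy so callers cannot mutate the stored table.
        return list(self._rows.get(video_id, []))
```

Keeping one list of records per video file identifier mirrors the statement that one or more data sets are managed per video file.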

Returning to FIG. 1, a video file reproduction unit 103 reproduces, on a display panel 106, the video file that the user specifies via an input operation unit 107. A scene information search unit 104 searches the scene information management server 3 for the scene information corresponding to the search keyword that the user specifies via the input operation unit 107, and acquires it. A scene information reproduction unit 105 displays the text associated with a scene at the coordinate position specified by the position information p1, p2, in step with reproduction of the scene specified by that scene information.

Thus, according to this embodiment, when a scene of a video file is reproduced based on scene information, the text associated with the scene is displayed at a position related to the scene. This makes scene reproduction easy and lets the viewer readily recognize the key point, interpretation, intent, and so on of the scene.

FIG. 3 is a functional block diagram showing the configuration of a first embodiment of the scene information management device 1. Reference signs identical to those above denote identical or equivalent parts, and their description is omitted. This embodiment is characterized in that the scene information acquisition unit 101 includes an object detection unit 101a, an object identification unit 101b, and an object identification model 101c, and acquires scene information from the video file automatically, that is, objectively.

The object detection unit 101a detects objects such as people, things, and animals in the video file. When the video file is a 360° video, the video is unfolded into two dimensions and object detection is performed on the resulting two-dimensional video.
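The disclosure does not state which projection is used to unfold a 360° video into two dimensions. Purely as an example, and assuming the common equirectangular layout, a viewing direction can be mapped to two-dimensional coordinates as sketched below. The function name equirect_to_pixel and the yaw/pitch conventions are hypothetical.

```python
from typing import Tuple


def equirect_to_pixel(yaw_deg: float, pitch_deg: float,
                      width: int, height: int) -> Tuple[float, float]:
    """Map a viewing direction to (x, y) on an equirectangular frame.

    yaw_deg:   horizontal angle, -180 (left edge) to +180 (right edge)
    pitch_deg: vertical angle, +90 (top edge) to -90 (bottom edge)
    The angle conventions are assumptions made for this sketch.
    """
    if not -180.0 <= yaw_deg <= 180.0:
        raise ValueError("yaw must lie in [-180, 180]")
    if not -90.0 <= pitch_deg <= 90.0:
        raise ValueError("pitch must lie in [-90, 90]")
    x = (yaw_deg + 180.0) / 360.0 * width    # longitude spans the full width
    y = (90.0 - pitch_deg) / 180.0 * height  # latitude spans the full height
    return x, y
```

With such a mapping, the position information p1, p2 of an object anywhere on the sphere can be stored as ordinary two-dimensional coordinates.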

The object identification unit 101b identifies a detected object by applying it to the object identification model 101c, which has been trained in advance on image features for object identification, and outputs the identification result in text form.

The scene information registration unit 102 registers in the scene information management server 3, as scene information, a data set consisting of the object identification result (text), the period during which the object appears (start time ts and end time te), the position information p1, p2 of the object on the screen, and the type of this information (auto).

FIG. 4 schematically shows the method of acquiring and registering scene information in this embodiment. At playback time t1, when the object detection unit 101a detects a dog as an object as shown in FIG. 4(a), a rectangular frame k1 circumscribing the object is set, and its upper-right and lower-left coordinates are acquired as the position information p1_1 and p2_1 of the object, respectively. The object identification unit 101b uses the object identification model 101c to identify the object as "dog", and object tracking starts. If the object identification model 101c contains a more specific model, for example "pet dog △△", the object is identified as "pet dog △△" and object tracking starts.

At playback time t2, when the object detection unit 101a newly detects a cat as an object as shown in FIG. 4(b), a rectangular frame k2 circumscribing the object is set and its position information p1_2, p2_2 is acquired. The object identification unit 101b identifies the object as "cat", and object tracking starts. If the object identification model 101c contains a more specific model, for example "pet cat 〇〇", the object is identified as "pet cat 〇〇" and object tracking starts.

At time t3, when the dog leaves the frame as shown in FIG. 4(c), scene information is registered. Here, a data set in which the text is "pet dog △△", the start time ts is "t1", the end time te is "t3", the first coordinate is "p1_1", the second coordinate is "p2_1", and the type is "auto", indicating objective automatic registration based on object identification, is registered in the scene information management server 3 as one piece of scene information for the video file.

Then, at time t4, when the cat leaves the frame as shown in FIG. 4(d), scene information is registered again. Here, a data set in which the text is "pet cat 〇〇", the start time ts is "t2", the end time te is "t4", the first coordinate is "p1_2", the second coordinate is "p2_2", and the type is "auto" is additionally registered in the scene information management server 3 as another piece of scene information for the video file.
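The sequence of FIG. 4 (start tracking when an object first appears, register when it leaves the frame) can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name extract_scenes, the per-frame input format, and the dictionary keys are assumptions, and the detector and identification model are replaced by already-labelled detections.

```python
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float]
Box = Tuple[Point, Point]  # (upper-right p1, lower-left p2) of the bounding frame


def extract_scenes(frames: Iterable[Tuple[float, Dict[str, Box]]]) -> List[dict]:
    """Turn time-ordered per-frame detections into scene records.

    frames yields (time, {label: (p1, p2)}) pairs. A record is emitted
    when a tracked label is no longer detected (frame-out).
    """
    active: Dict[str, Tuple[float, Point, Point]] = {}
    records: List[dict] = []
    for t, detections in frames:
        # Start tracking labels seen for the first time; keep the first box.
        for label, (p1, p2) in detections.items():
            if label not in active:
                active[label] = (t, p1, p2)
        # Register every tracked label that has left the frame.
        for label in [name for name in active if name not in detections]:
            ts, p1, p2 = active.pop(label)
            records.append({"text": label, "ts": ts, "te": t,
                            "p1": p1, "p2": p2, "type": "auto"})
    # Objects still visible when the input ends are not registered here;
    # the text above does not describe that case.
    return records
```

Feeding in a dog visible from t1 to t3 and a cat visible from t2 to t4 yields the two "auto" records described above, in the order in which the objects leave the frame.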

FIG. 5 is a flowchart showing a method of reproducing a desired scene of a video file using the scene information registered in the scene information management server 3 as described above. It mainly shows the operation of the scene information search unit 104 and the scene information reproduction unit 105.

In step S1, a scene search screen is displayed on the display panel 106. FIG. 6 shows an example of the scene search screen, which provides a video playback frame 11 together with a search box 12 for entering a search keyword, a "Back" button 13 for returning to the previous scene, and a "Next" button 14 for advancing to the next scene. To the right of the video playback frame 11, the total number of search results, "4", and the ordinal of the search result currently being reproduced, "1", are displayed as a fraction, and a message display space 15 is reserved below it.

In step S2, when keyword input by the user into the search box 12 is detected, the process proceeds to step S3, where the scene information table of the scene information management server 3 is consulted. The user can enter, for example, "pet dog △△" or "pet cat 〇〇" as the keyword.

In step S4, it is determined whether a data set whose text matches the keyword is registered. If none is registered, the process proceeds to step S12, where a message stating that no scene matching the keyword is registered is displayed in the message display space 15, and the process ends.

If, on the other hand, a data set whose text matches the keyword is registered, the process proceeds to step S5, where it is determined whether more than one such data set exists. If there is only one, the process proceeds to step S13, and reproduction of the scene from the position specified by the start time ts of the data set to the position specified by its end time te begins. During reproduction, the text is superimposed at the position corresponding to the first and second coordinates p1, p2 of the scene.
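Steps S4, S5, S12, and S13 amount to a lookup followed by a branch on the number of hits. The sketch below illustrates this under stated assumptions: the function name search_scenes and the dictionary-based rows are hypothetical, and the keyword is compared by exact string equality, although the text only says that the text "matches" the keyword.

```python
from typing import Dict, List, Tuple

Scene = Dict[str, object]


def search_scenes(table: List[Scene], keyword: str) -> Tuple[List[Scene], str]:
    """Return the matching scenes (earliest first) and a message for the user.

    Exact equality between the keyword and the registered text is an
    assumption of this sketch.
    """
    hits = sorted((row for row in table if row["text"] == keyword),
                  key=lambda row: row["ts"])
    if not hits:
        # Corresponds to step S12: report that nothing is registered.
        return hits, f'No scene is registered for "{keyword}".'
    # One hit corresponds to step S13; several hits lead on to step S6.
    return hits, ""
```

The caller reproduces hits[0] from its ts to its te and, when more than one hit is returned, enables navigation between them.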

FIG. 7 shows an example of scene reproduction when "pet dog △△" is entered as the keyword. In this embodiment, a data set in which "pet dog △△" is registered as the text exists, so its start time ts_1, end time te_1, and first and second coordinates p1, p2 are acquired. Scene reproduction then starts from the start time ts_1 at which "pet dog △△" appears, and the text "pet dog △△" is displayed at or near the position defined by the first and second coordinates p1, p2.

FIG. 8 shows an example of scene reproduction when "pet cat 〇〇" is entered as the keyword. In this embodiment, a data set in which "pet cat 〇〇" is registered as the text exists, so its start time, end time, and first and second coordinates are acquired. Scene reproduction then starts from the start time ts_1 at which "pet cat 〇〇" appears, and the text "pet cat 〇〇" is displayed at or near the position defined by the first and second coordinates p1, p2.

When the logical AND of "pet dog △△" and "pet cat 〇〇" is entered as the keyword, reproduction of the scene containing those keywords starts, as shown in FIG. 9, from the time at which both "pet dog △△" and "pet cat 〇〇" appear. When the logical OR is entered, all scenes in which either "pet dog △△" or "pet cat 〇〇" appears are reproduced.
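One way to realize the logical AND and OR described above is to operate on the scene sections (ts, te) of each keyword. The sketch below is illustrative only: the function names are hypothetical, and ending an AND scene when either object leaves the frame is an assumption, since the text specifies only where reproduction starts.

```python
from typing import List, Tuple

Section = Tuple[float, float]  # (ts, te)


def and_sections(a: List[Section], b: List[Section]) -> List[Section]:
    """Sections during which both keywords are on screen at once."""
    out = []
    for ts_a, te_a in a:
        for ts_b, te_b in b:
            ts, te = max(ts_a, ts_b), min(te_a, te_b)
            if ts < te:  # keep only overlaps with positive length
                out.append((ts, te))
    return sorted(out)


def or_sections(a: List[Section], b: List[Section]) -> List[Section]:
    """All sections in which either keyword appears, earliest first."""
    return sorted(set(a) | set(b))
```

For a dog registered over (t1, t3) and a cat over (t2, t4), the AND search starts reproduction at t2, the first time both are visible, while the OR search lists both registered sections.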

Returning to FIG. 5, if it is determined in step S5 that more than one data set contains text matching the keyword, the process proceeds to step S6, where, based on the data set with the earliest start time ts, the scene from that start time ts to the end time te is reproduced together with the text. In step S7, it is determined whether the "Next" button has been operated; if such an operation is detected, the process proceeds to step S8. In step S8, based on the data set with the next start time ts, the scene, including the superimposed text, is reproduced from that start time to the end time.

In step S9, it is determined whether the "Back" button has been operated; if such an operation is detected, the process proceeds to step S10. In step S10, based on the data set with the immediately preceding start time, the scene, including the superimposed text, is reproduced from that start time to the end time. In step S11, it is determined whether an operation to stop scene reproduction has been performed; until such an operation is detected, the process returns to step S7 and the above steps are repeated.
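The navigation of steps S6 to S10 can be modelled as a cursor over the matching data sets, ordered by start time. The following sketch is a simplified illustration: the class name SceneCursor is hypothetical, and staying on the first or last scene when "Back" or "Next" is pressed at either end is an assumption, since the flowchart description does not cover that case.

```python
from typing import Dict, List

Scene = Dict[str, object]


class SceneCursor:
    """Steps through matching scenes in order of start time."""

    def __init__(self, hits: List[Scene]) -> None:
        if not hits:
            raise ValueError("at least one matching scene is required")
        self._hits = sorted(hits, key=lambda h: h["ts"])
        self._index = 0  # step S6: begin with the earliest start time

    @property
    def current(self) -> Scene:
        return self._hits[self._index]

    def next(self) -> Scene:
        # Step S8: move to the scene with the next start time, if any.
        if self._index < len(self._hits) - 1:
            self._index += 1
        return self.current

    def back(self) -> Scene:
        # Step S10: move to the scene with the preceding start time, if any.
        if self._index > 0:
            self._index -= 1
        return self.current

    def counter(self) -> str:
        # Fraction shown beside the playback frame, e.g. "1/4".
        return f"{self._index + 1}/{len(self._hits)}"
```

The counter method corresponds to the fraction display of FIG. 6, in which the ordinal of the current result is shown over the total number of results.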

According to this embodiment, scenes can be extracted objectively from the video file and reproduced, and in addition the scene identifier can be displayed at an appropriate place in each scene. The viewer can therefore visually recognize the content of each scene and the meaning, name, and so on of the information shown on each screen of the scene.

FIG. 10 is a functional block diagram showing the configuration of a second embodiment of the scene information management device 1. Reference signs identical to those above denote identical or equivalent parts, and their description is omitted. This embodiment is characterized in that the scene information acquisition unit 101 includes a text acquisition unit 101d, a scene section acquisition unit 101e, and a display position acquisition unit 101f, and acquires the scene information of the video file by manual operation, that is, subjectively.

The text acquisition unit 101d acquires the text that the user enters via the input operation unit 107 for each scene of the video file. The scene section acquisition unit 101e acquires the start time ts and end time te that the user enters via the input operation unit 107 as the scene section corresponding to the text. The display position acquisition unit 101f acquires the position information p1, p2 that the user enters via the input operation unit 107 as the display position of the text. The scene information registration unit 102 registers the data set of the entered text, start time ts, end time te, and position information p1, p2 in the scene information management server 3 as scene information.

FIG. 11 shows an example of the registration screen used to register scene information manually in the second embodiment. The screen is displayed on the display panel 106 and operated by the user.

A video playback frame 21 reproduces a video file or a part of its scenes. If pet dog △△ appears at time t1 as shown in FIG. 12(a) and this playback position is to be registered as a scene of pet dog △△, the user enters the text "pet dog △△" in a text box 22 and enters time t1 in the start time box.

The user then enters the position information of "pet dog △△", for example its upper-right coordinate p1 in a first coordinate box 25 and its lower-left coordinate p2 in a second coordinate box 26, and enters the user name in a type box 27.

After that, when "pet dog △△" leaves the frame at time t2 as shown in FIG. 12(b), the user enters time t2 in an end time box 24. Finally, the user selects an input button 28 to register the above data set in the server as scene information.
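The manual entry just described gathers the same fields as the automatic case, with the user name taking the place of "auto" in the type field. A minimal sketch of assembling such a record from the form values follows. The function name build_manual_record and the validation rules (non-empty text, end time later than start time) are assumptions added for the example and are not stated in the disclosure.

```python
from typing import Tuple

Point = Tuple[float, float]


def build_manual_record(text: str, ts: float, te: float,
                        p1: Point, p2: Point, user: str) -> dict:
    """Assemble a scene record from values typed into the registration screen."""
    if not text.strip():
        raise ValueError("the scene identifier text must not be empty")
    if te <= ts:
        raise ValueError("the end time must be later than the start time")
    if not user.strip():
        raise ValueError("the type field must contain a user name")
    # The user name is stored as the type, marking a subjective entry.
    return {"text": text, "ts": ts, "te": te,
            "p1": p1, "p2": p2, "type": user}
```

A record built this way has the same shape as an automatically registered one, so the same search and reproduction steps apply to both.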

According to the present embodiment, a scene can be extracted manually from a video file, and its scene identifier and display position can be registered freely. Therefore, if many users are permitted to register scene information, scene information can be registered and displayed from various viewpoints that reflect each user's subjective judgment.

Although the first embodiment was described as acquiring the text serving as the scene identifier by performing object detection and identification on the images of the video file, the present invention is not limited to this. Characteristic text may instead be acquired by speech recognition on the audio signal of the video file, or from subtitle information.

1 ... video scene information management device, 2 ... video file server, 3 ... scene information management server, 101 ... scene information acquisition unit, 101a ... object detection unit, 101b ... object identification unit, 101c ... object identification model, 101d ... text acquisition unit, 101e ... scene section acquisition unit, 101f ... display position acquisition unit, 102 ... scene information registration unit, 103 ... video file reproduction unit, 104 ... scene information retrieval unit, 105 ... scene information reproduction unit, 106 ... display panel, 107 ... input operation unit

Claims

A video scene information management device that manages scenes of a video file, comprising:
a scene information acquisition means for acquiring, as scene information for each scene of the video file, a data set including a scene identifier, a scene section, and position information of the scene identifier to be displayed in the scene; and
a scene information registration means for registering the data set of each scene information for each video file.

The video scene information management device according to claim 1, further comprising:
a means for selecting scene information by keyword search on the scene identifier;
a means for playing back each scene of the video file based on the selected scene information; and
a means for displaying, in each scene being played back, the scene identifier based on the selected scene information.

The video scene information management device according to claim 1 or 2, wherein the scene information acquisition means comprises:
a means for detecting an object from the video file; and
a means for identifying the detected object,
and wherein the scene information registration means registers scene information in which the object identification result is the scene identifier, the section in which the object is detected is the scene section, and the position of the object is the position information of the scene identifier.

The video scene information management device according to claim 3, wherein the video file includes a video signal and an audio signal, and
the means for detecting the object extracts the object from at least one of the video signal and the audio signal.

The video scene information management device according to claim 3, wherein the video file includes a video signal, an audio signal, and subtitle information, and
the means for detecting the object extracts the object from at least one of the video signal, the audio signal, and the subtitle information.

The video scene information management device according to any one of claims 3 to 5, wherein the data set includes type information indicating that the scene information is objectively generated.

The video scene information management device according to claim 1 or 2, further comprising an input operation means for accepting a user's input operation,
wherein the scene information acquisition means comprises:
a means for acquiring the scene section input from the input operation means;
a means for acquiring the scene identifier input from the input operation means; and
a means for acquiring the position information of the scene identifier input from the input operation means.

The video scene information management device according to claim 7, wherein the data set includes type information indicating that the scene information is subjectively generated by the user.

The video scene information management device according to any one of claims 1 to 8, wherein the video file is a 360° video, and the position information of the scene identifier is a coordinate value on the coordinates obtained by expanding the 360° video into two dimensions.