JP6856883B2

JP6856883B2 - Information processing device, control method and control program of information processing device

Info

Publication number: JP6856883B2
Application number: JP2020004073A
Authority: JP
Inventors: 健治笠原; 咲耶西山
Original assignee: Mixi Inc
Current assignee: Mixi Inc
Priority date: 2020-01-15
Filing date: 2020-01-15
Publication date: 2021-04-14
Anticipated expiration: 2036-12-28
Also published as: JP2020074563A

Description

The present invention relates to an information processing system that extracts a frame from a movie and outputs it.
In particular, the present invention relates to an information processing system that extracts a frame and audio from a movie and displays text corresponding to that audio superimposed on the frame.

A method is known in which a still image of interest is automatically extracted from a video with audio to create a photo album (see, for example, Patent Document 1).
Specifically, a portion of interest that matches a designated waveform or designated text is identified in the waveform data or text data of the audio data constituting the video-with-audio data; the still image corresponding to that portion is extracted, as the still image of interest, from the video data constituting the same video-with-audio data; and a photo album is created in which the text of the audio in the portion of interest is attached to the still image of interest (Patent Document 1, [Claim 4], [Claim 6], etc.).

Patent Document 1: Japanese Unexamined Patent Publication No. 2006-333065

When an object is superimposed on a still image, it is convenient if the arrangement mode of the object can be set according to what the still image displays. Patent Document 1, however, does not disclose any specific arrangement mode, that is, how the text is to be attached to the still image of interest.

The problem to be solved by the present invention is to superimpose text corresponding to audio extracted from a movie onto a frame extracted from the same movie in a highly appealing manner.

[A] To solve the above problem, an "information processing device" according to one aspect of the present invention comprises: a specifying means that specifies the position of the sound source of a designated voice included in a designated range of a movie, relative to a designated frame included in the same designated range; and a setting means that sets the arrangement mode of superimposed text, which corresponds to the designated voice and is superimposed on the designated frame, taking into account the position of the sound source specified by the specifying means.
[B] To solve the above problem, a "control method of an information processing device" according to one aspect of the present invention includes: a specifying step of specifying the position of the sound source of a designated voice included in a designated range of a movie, relative to a designated frame included in the same designated range; and a setting step of setting the arrangement mode of superimposed text, which corresponds to the designated voice and is superimposed on the designated frame, taking into account the position of the sound source specified in the specifying step.
[C] To solve the above problem, a "control program" according to one aspect of the present invention causes a computer of an information processing device to implement: a specifying function of specifying the position of the sound source of a designated voice included in a designated range of a movie, relative to a designated frame included in the same designated range; and a setting function of setting the arrangement mode of superimposed text, which corresponds to the designated voice and is superimposed on the designated frame, taking into account the position of the sound source specified by the specifying function.
[D] To solve the above problem, a control program recorded on a "computer-readable recording medium" according to one aspect of the present invention causes a computer of an information processing device to implement: a specifying function of specifying the position of the sound source of a designated voice included in a designated range of a movie, relative to a designated frame included in the same designated range; and a setting function of setting the arrangement mode of superimposed text, which corresponds to the designated voice and is superimposed on the designated frame, taking into account the position of the sound source specified by the specifying function.

The following technical limitations may be added to the "information processing device" of [A] above. The same limitations may also be added to the "control method" of [B], the "control program" of [C], and the control program recorded on the "recording medium" of [D].
- The setting means sets the arrangement mode according to the relationship between the position of the sound source specified by the specifying means and the display range of the designated frame.
- The setting means sets the arrangement mode so that the superimposed text is placed in association with the sound source when the position of the sound source is inside the display range, and so that the superimposed text is placed without association with the sound source when the position of the sound source is outside the display range.
- The specifying means specifies the position of the sound source according to a position designated by a user operation that serves both as a position designation and as an area designation, and the setting means sets the placement area of the superimposed text according to the area designated by that user operation.
- The user operation is an operation of drawing a closed figure; the specifying means specifies the position of the sound source according to the position of an end point of the user operation, and the setting means sets the placement area of the superimposed text according to the position of the closed region formed by the closed figure drawn by the user operation.
- The device further comprises a designation means that designates, as designated ranges, a designated number of ranges selected by the user from a candidate list containing at least a plurality of candidate ranges. For each designated range designated by the designation means, the specifying means specifies the position of the sound source of the designated voice included in that range, and the setting means sets the arrangement mode of the superimposed text superimposed on the designated frame included in that range, taking into account the position of the sound source of the designated voice included in that range as specified by the specifying means.
- The designation means presents the user with the candidate list, which includes a plurality of candidate ranges respectively associated with a plurality of candidate texts, each corresponding to a segment of audio included in the movie, and lets the user select the designated number of ranges.
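The inside/outside rule in the limitations above can be illustrated with a minimal Python sketch. It assumes screen coordinates in which the display range of the designated frame is an axis-aligned rectangle; the names `Rect`, `Placement`, and `choose_arrangement` are hypothetical and do not come from the source, and "associating" the text with the source is modeled simply as recording an anchor point (for example, where a balloon tail would point).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in screen coordinates (x, y = top-left corner)."""
    x: float
    y: float
    width: float
    height: float

    def contains(self, p: Point) -> bool:
        px, py = p
        return (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)


@dataclass(frozen=True)
class Placement:
    area: Rect               # where the superimposed text is drawn
    anchor: Optional[Point]  # sound-source position the text is tied to, or None


def choose_arrangement(source: Point, display_range: Rect,
                       text_area: Rect) -> Placement:
    """Pick an arrangement mode from the source position vs. the frame's display range."""
    if display_range.contains(source):
        # Source visible in the frame: tie the text to it (e.g. a balloon tail).
        return Placement(area=text_area, anchor=source)
    # Source off-frame: place the text without tying it to the source.
    return Placement(area=text_area, anchor=None)
```

For example, with `frame = Rect(0, 0, 640, 360)`, a source at `(100, 200)` yields a placement anchored to that point, while a source at `(700, 50)` yields `anchor=None`.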

In this specification, terms are used as follows.
- A "movie" includes at least a plurality of frames and audio data.
- A "designated range" is a time range designated on the timeline of the "movie".
- A "frame" is a still image constituting the "movie".
- The sound source of the "designated voice" is located either inside or outside the range displayed by the "designated frame".
- "Superimposed text" is text derived from the "designated voice", for example converted text obtained by converting the "designated voice", supplementary text supplied at the position where the "designated voice" was detected, or edited text obtained by editing the converted text or the supplementary text.
- "Arrangement mode" refers to the manner in which the "superimposed text" is arranged.
- A "user operation" is performed with a position input device that continuously inputs an indicated position on the screen (for example, a touch panel, a mouse, or a touchpad).
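Because a "designated range" is a time range on the movie's timeline, the frames it contains can be enumerated once a timing model is fixed. The source does not specify one; the following Python sketch assumes a constant frame rate with frame `i` shown at time `i / fps`, and `frames_in_range` is a hypothetical helper.

```python
import math
from typing import List


def frames_in_range(start_s: float, end_s: float, fps: float,
                    frame_count: int) -> List[int]:
    """Indices of frames whose timestamps (i / fps) lie in [start_s, end_s].

    Assumes a constant frame rate with frame 0 at t = 0.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    if end_s < start_s:
        raise ValueError("end_s must not precede start_s")
    first = max(0, math.ceil(start_s * fps))
    last = min(frame_count - 1, math.floor(end_s * fps))
    return list(range(first, last + 1))
```

A range of 1.0 s to 2.0 s at 10 fps covers frames 10 through 20. A production version would also need a small tolerance at the boundaries, since `start_s * fps` can land just off an integer through floating-point rounding, and variable-frame-rate movies would need per-frame timestamps instead.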

The present invention sets the arrangement mode of the superimposed text superimposed on the designated frame according to the position of the sound source of the designated voice, specified relative to that designated frame.
Therefore, according to the present invention, text corresponding to audio extracted from a movie can be superimposed on a frame extracted from the same movie in a highly appealing manner.

FIG. 1 is an explanatory diagram of an arrangement mode of superimposed text (Example).
FIG. 2 is an explanatory diagram of an arrangement mode of superimposed text (Example).
FIG. 3 is an explanatory diagram of an example network configuration (Example).
FIG. 4 is an explanatory diagram of an example hardware configuration of a user device (Example).
FIG. 5 is an explanatory diagram of an example hardware configuration of a server device (Example).
FIG. 6 is an explanatory diagram of an example functional configuration of the image generation system (Example).
FIG. 7 is an explanatory diagram of the image generation procedure (Example).
FIG. 8 is an explanatory diagram of example items of management data (Example).
FIG. 9 is an explanatory diagram of a display example of edit screen A (Example).
FIG. 10 is an explanatory diagram of example items of edit data (Example).
FIG. 11 is an explanatory diagram of a display example of edit screen B (Example).
FIG. 12 is an explanatory diagram of a display example of a thumbnail image (Example).
FIG. 13 is an explanatory diagram of a display example of a thumbnail image (Example).
FIG. 14 is an explanatory diagram of an arrangement mode of superimposed text (Modification).
FIG. 15 is an explanatory diagram of an arrangement mode of superimposed text (Modification).

[1. Embodiment]
[1-1. Overview]
The present embodiment relates to an information processing system that extracts a frame and audio from a movie, superimposes text corresponding to the audio on the frame, and outputs the result.
In the present embodiment, in order to superimpose text corresponding to audio extracted from a movie on a frame extracted from the same movie in a highly appealing manner, a configuration is adopted in which the arrangement mode of the text superimposed on the frame is set according to the position of the sound source of the designated voice.

[1-2. Information processing device]
The information processing device constituting the information processing system according to the present embodiment comprises: a specifying means that specifies the position of the sound source of a designated voice included in a designated range of a movie, relative to a designated frame included in the same designated range; and a setting means that sets the arrangement mode of the superimposed text, which corresponds to the designated voice and is superimposed on the designated frame, taking into account the position of the sound source specified by the specifying means.

[2. Example]
[2-1. Overview]
This example relates to an image generation system that provides an image generation service: the service extracts frames and audio from a movie, superimposes text corresponding to the audio on each frame, and lays out and composites the resulting images into a thumbnail image.
In the image generation service, the frames and audio to be extracted from the movie are each designated by the user of the service.

In the image generation service, in order to superimpose text corresponding to audio extracted from a movie on a frame extracted from that movie in a highly appealing manner, the arrangement mode of the superimposed text is set according to the relationship between the position of the sound source of the designated voice and the display range of the frame.
In addition, in the image generation system, the position of the sound source is specified according to the position designated by a user operation that serves as both a position designation and an area designation, and the placement area of the superimposed text is set according to the area designated by that same user operation.

An example of the arrangement mode of the superimposed text will be described with reference to FIG. 1. FIG. 1 assumes that the sound source of the designated voice is located inside the display range of the frame.
Frame 110 contains a subject 120. When the user performs an operation of drawing a closed figure on the screen containing frame 110, a trajectory 130 is formed. The position of the sound source is then specified according to the position of the start point 131 of trajectory 130, and the text placement area is set according to the position of the closed region 132 formed by trajectory 130.
When the sound source of the designated voice is located inside the display range of the frame, the superimposed text is placed inside a speech balloon 140, which is positioned in an area corresponding to closed region 132 and is associated with the sound source position corresponding to start point 131.

Another example of the arrangement mode of the superimposed text will be described with reference to FIG. 2. FIG. 2 assumes that the sound source of the designated voice is located outside the display range of the frame.
Frame 210 contains a subject 220. When the user performs an operation of drawing a closed figure on the screen containing frame 210, a trajectory 230 is formed. The position of the sound source is then specified according to the position of the start point 231 of trajectory 230, and the text placement area is set according to the position of the closed region 232 formed by trajectory 230.
When the sound source of the designated voice is located outside the display range of the frame, the superimposed text is placed inside an area 240, which is positioned in an area corresponding to closed region 232 without being associated with the sound source position corresponding to start point 231.
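The gesture in FIGS. 1 and 2 amounts to splitting one stroke into a start point (the sound source) and a closed region (the text area). The figures are not reproduced here, so the stroke model in the following Python sketch is an assumption: an optional tail drawn from the sound source, followed by a loop that returns near its own beginning. The function name, the closure tolerance `tol`, and the `min_area` threshold are hypothetical, and the closed region is approximated by the loop's bounding box.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # x, y, width, height


def _polygon_area(pts: List[Point]) -> float:
    """Shoelace formula for the area enclosed by a closed polyline."""
    s = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def parse_closed_stroke(stroke: List[Point], tol: float = 15.0,
                        min_area: float = 100.0) -> Tuple[Point, Box]:
    """Split one stroke into (sound-source position, text-placement box).

    Assumed stroke model: an optional tail drawn from the sound source,
    followed by a loop whose last point returns to within `tol` of the
    loop's first point.
    """
    if len(stroke) < 3:
        raise ValueError("a closed figure needs at least three points")
    end = stroke[-1]
    # Earliest point the stroke returns to; everything from there on is the loop.
    k = next((i for i in range(len(stroke) - 2)
              if math.dist(stroke[i], end) <= tol), None)
    if k is None:
        raise ValueError("stroke does not close on itself")
    loop = stroke[k:]
    # Reject near-degenerate "loops" such as the last few points of an open line.
    if _polygon_area(loop) < min_area:
        raise ValueError("closed region is too small")
    xs = [x for x, _ in loop]
    ys = [y for _, y in loop]
    box = (min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))
    return stroke[0], box
```

Testing the returned start point against the frame's display rectangle then decides between a balloon associated with the source (FIG. 1) and an unassociated area (FIG. 2). The area check matters because densely sampled open strokes always have a few final points within `tol` of their end.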

[2-2. Network configuration]
FIG. 3 shows an example of the network configuration of the system of this example.
The system of this example includes a user terminal 10 used by a user and an image generation system 20 that provides the image generation service.
The image generation system 20 includes a user management server 21, a data processing server 22, a data management server 23, a file management server 24, and a storage 25.

The user terminal 10 and the user management server 21 can exchange data with each other through a communication network 30. The user management server 21 and the data processing server 22 can each access data stored in the storage 25 via the data management server 23, and can also each access data stored in the storage 25 via the file management server 24.
The communication network 30 may include at least one existing network, for example the Internet, a mobile phone network, a wireless WAN (Wireless Wide Area Network), a wireless LAN (Wireless Local Area Network), or Ethernet (registered trademark).

[2-2-1. User terminal]
The user terminal 10 is a user device (computer) on which a predetermined Web browser program is installed.
In the system of this example, the user device may be a general-purpose portable device on which a Web browser program can be installed (for example, a mobile phone, a smartphone, a tablet terminal, a tablet PC (personal computer), or a wearable device) or a general-purpose processing device (for example, a PC (personal computer)).

[2-2-2. Image generation system]
The image generation system 20 includes a user management server 21, a data processing server 22, a data management server 23, a file management server 24, and a storage 25.
The user management server 21 is a server device (computer) on which a Web server program (also called an HTTP daemon (HyperText Transfer Protocol Daemon)) is installed.
In response to a request from the user terminal 10, the user management server 21 reads the necessary data from the storage 25 via the data management server 23 and provides it to the user terminal 10 (response). Also in response to a request from the user terminal 10, the user management server 21 writes data acquired from the user terminal 10 to the storage 25 via the data management server 23 and provides the processing result to the user terminal 10 (response).
A plurality of server devices may be linked to form a server system, so that the functions of the user management server 21 are shared among them or the load on the user management server 21 is distributed.

The data processing server 22 is a server device (computer) on which an application program is installed.
The data processing server 22 reads the necessary data from the storage 25 via the data management server 23, performs computation and processing on it, and writes the computed and processed data to the storage 25 via the data management server 23. Likewise, the data processing server 22 reads the necessary data from the storage 25 via the file management server 24, performs computation and processing on it, and writes the computed and processed data to the storage 25 via the file management server 24.
A plurality of server devices may be linked to form a server system, so that the functions of the data processing server 22 are shared among them or the load on the data processing server 22 is distributed.

The data management server 23 is a server device (computer) on which a DB (Database) server program is installed. Together with the built-in or externally connectable storage 25, the data management server 23 constitutes a DBMS (Database Management System).
The data management server 23 has, for example, a function of storing in the storage 25 data acquired from a requester in response to a data storage request, and a function of returning to the requester data extracted from the storage 25 in response to a data extraction request.
A plurality of server devices may be linked to form a server system, so that the functions of the data management server 23 are shared among them or the load on the data management server 23 is distributed.

The file management server 24 is a server device (computer) on which a file server program is installed.
The file management server 24 has, for example, a function of storing in the storage 25 data acquired from a requester in response to a data storage request, and a function of returning to the requester data extracted from the storage 25 in response to a data extraction request.
A plurality of server devices may be linked to form a server system, so that the functions of the file management server 24 are shared among them or the load on the file management server 24 is distributed.

The storage 25 is a storage device that stores management data and file data.
A plurality of storage devices may be provided so that the data held by the storage 25 are stored separately by data type. The data held by the storage 25 may also be distributed across a plurality of storage devices.

[2-3. Hardware configuration]
[2-3-1. Hardware configuration of user device]
FIG. 4 shows an example of the hardware configuration of the user device.
A typical user device has at least: an MPU (Micro-Processing Unit) 411 constituting a control processing unit; a RAM (Random Access Memory) 421 constituting a main storage unit; a ROM (Read Only Memory) 422 and an EEPROM (Electrically Erasable Programmable Read-Only Memory) 423 constituting an auxiliary storage unit; a touch panel display 431 constituting an input unit and a display unit; a speaker 432 constituting an audio output unit; and a NIC (Network Interface Controller) 441 and a wireless LAN (Local Area Network) chip 442 constituting a communication control unit.

The RAM 421, the ROM 422, the EEPROM 423, the touch panel display 431, the speaker 432, the NIC 441, and the wireless LAN chip 442 are connected to the MPU 411 via a bus line.
The MPU 411 (1) loads a program stored in the ROM 422 or the EEPROM 423 into the RAM 421, (2) acquires data from at least one of the touch panel display 431, the EEPROM 423, the NIC 441, and the wireless LAN chip 442 according to the program's instructions, (3) computes and processes the acquired data according to the procedure defined by the program, and (4) provides the computed and processed data to at least one of the EEPROM 423, the touch panel display 431, the speaker 432, the NIC 441, and the wireless LAN chip 442.

[2-3-2. Hardware configuration of server device]
FIG. 5 shows an example of the hardware configuration of the server device.
A typical server device has a control processing device 510 including an MPU and a ROM, a main storage device 520 including a RAM, an auxiliary storage device 530 including an HDD (Hard Disk Drive), an input device 540 including a mouse and a keyboard, an output device 550 including a display and a speaker, and a communication control device 560 including a network card (Network Interface Card).

The main storage device 520, the auxiliary storage device 530, the input device 540, the output device 550, and the communication control device 560 are each connected to the control processing device 510 via a bus line.
The control processing device 510 (1) loads a program stored in the auxiliary storage device 530 into the main storage device 520, (2) acquires data from at least one of the input device 540, the auxiliary storage device 530, and the communication control device 560 according to the program's instructions, (3) computes and processes the acquired data according to the procedure defined by the program, and (4) provides the computed and processed data to at least one of the auxiliary storage device 530, the output device 550, and the communication control device 560.

[2-4. Functional configuration]
FIG. 6 shows an example of the functional configuration of the image generation system.
As illustrated in FIG. 6, the user management server 21 includes a reception unit 611, a creation unit 612, and a provision unit 613. The data processing server 22 includes an extraction unit 621, a conversion unit 622, a designation unit 623, a specifying unit 624, a setting unit 625, and a generation unit 626.

The functions of the user management server 21 are realized by installing, on a server device, an OS (Operating System) for server devices and a Web server program that runs on that OS.
The functions of the data processing server 22 are realized by installing, on a server device, an OS for server devices and an application program that runs on that OS.
The programs to be installed on a server device may be distributed recorded on various recording media (for example, a CD (Compact Disc), a DVD (Digital Versatile Disc), an MO disk (Magneto-Optical disk), or flash memory) and read from the medium into the server device, or may be supplied to the server device superimposed on a carrier wave via a communication network.

The reception unit 611 receives requests from the user terminal 10.
The creation unit 612 creates a Web page according to the request received by the reception unit 611.
The provision unit 613 provides the Web page created by the creation unit 612 to the user terminal 10 as a response.

The extraction unit 621 extracts frames and audio from the designated movie. The extracted frames and audio are stored in the file management server 24.
The conversion unit 622 applies speech recognition to the audio extracted by the extraction unit 621 to generate converted text. When valid speech recognition is not possible, it instead generates supplementary text indicating that the speech could not be recognized. The generated text is stored in the data management server 23.
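The recognition-with-fallback behavior of the conversion unit can be sketched as follows. This is a minimal illustration, assuming the speech recognizer is passed in as a callable that returns `None` (or blank text) when no valid recognition is possible; the function name `to_text` and the placeholder string `SUPPLEMENTARY_TEXT` are hypothetical, since the source does not specify the wording of the supplementary text.

```python
from typing import Callable, Optional

# Hypothetical wording; the source only says the supplementary text indicates
# that the speech could not be recognized.
SUPPLEMENTARY_TEXT = "[unrecognizable speech]"


def to_text(audio_segment: bytes,
            recognize: Callable[[bytes], Optional[str]]) -> str:
    """Return converted text, or supplementary text if recognition fails."""
    result = recognize(audio_segment)
    # Treat "no result" and "blank result" alike as invalid recognition.
    if result is None or not result.strip():
        return SUPPLEMENTARY_TEXT
    return result
```

Keeping the recognizer injectable means the same fallback would apply whether recognition runs on the data processing server 22 or on the user terminal 10.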

The designation unit 623 presents the user with a candidate list containing multiple candidate ranges, each associated with one of multiple candidate texts (converted text or supplementary text) that correspond to the segments of audio extracted from the designated movie by the extraction unit 621. It has the user select one or more of these ranges and designates each selected range as a designated range.
It also designates the audio included in a designated range as the designated voice, and designates one of the frames included in that range as the designated frame. The designated frame may be chosen automatically or according to the user's selection.

The identification unit 624 identifies, relative to the designated frame included in the designated range, the position of the sound source of the designated voice included in that range, according to the position of the start point of a user operation that serves as both a position designation and an area designation. In this embodiment, the sound source position is identified for each designated range.
The setting unit 625 sets the arrangement mode so that the superimposed text is placed in association with the sound source when the sound source position identified by the identification unit 624 is inside the display range of the designated frame, and so that the superimposed text is placed without association with the sound source when the sound source position is outside the display range. In either case, the arrangement area of the superimposed text is set according to the position of the closed area enclosed by the closed figure drawn by that user operation.
The generation unit 626 generates a thumbnail image. The generated thumbnail image is stored in the storage 25 via the file management server 24 and published on the Web server.

[2-5. Image generation procedure]
FIG. 7 illustrates the image generation procedure. The image generation system 20 generates thumbnail images by the following procedure.
[S702]
The user management server 21 acquires movie designation data from the user terminal 10. The movie designation data is a file path identifying the designated movie file when that file is stored in the auxiliary storage unit of the user terminal 10, or a URL (Uniform Resource Locator) identifying the designated movie file when that file is managed by another server device.
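Because the movie designation data can take two forms, the receiving server has to tell them apart. A minimal sketch using the standard `urllib.parse` module, assuming only `http`/`https` URLs are accepted; the function name `classify_movie_designation` and the return labels are hypothetical.

```python
from urllib.parse import urlparse


def classify_movie_designation(designation: str) -> str:
    """Return "url" for an http(s) URL, otherwise "file_path"."""
    parsed = urlparse(designation)
    # Require both a web scheme and a host so that Windows paths such as
    # "C:\\movies\\a.mp4" (parsed with scheme "c") are not taken for URLs.
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    return "file_path"
```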

[S704]
The user management server 21 or the data processing server 22 acquires the designated movie file from the user terminal 10 or from the other server device.
The acquired designated movie file is stored in the storage 25 via the file management server 24. In addition, movie management information for managing the acquired designated movie file is stored in the storage 25 via the data management server 23.

FIG. 8A shows an example of the items of the movie management information.
As illustrated in FIG. 8A, the movie management information includes the key item "movie ID", a "file path" indicating where the designated movie file is stored in the storage 25, a "user ID" identifying the user who requested generation of a thumbnail image based on the designated movie, a "reception time" identifying when that thumbnail generation request was received, and the "playback time" of the designated movie.
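One way to hold a movie management information record in code is sketched below. The field names, the use of `datetime` for the reception time, and seconds as the unit of playback time are all assumptions; FIG. 8A only names the items.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MovieManagementInfo:
    movie_id: str            # key item "movie ID"
    file_path: str           # storage position of the movie file in storage 25
    user_id: str             # user who requested thumbnail generation
    received_at: datetime    # "reception time" of the request
    playback_time_s: float   # "playback time"; seconds is an assumption

    def __post_init__(self) -> None:
        # A negative duration cannot describe a real movie.
        if self.playback_time_s < 0:
            raise ValueError("playback time must be non-negative")
```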

[S706]
The data processing server 22 applies speech recognition to the audio data extracted from the designated movie to generate converted text. When valid speech recognition is not possible, supplementary text indicating that the speech could not be recognized is generated instead. Text management information for managing the generated text is stored in the data management server 23.

FIG. 8B shows an example of the items of the text management information.
As illustrated in FIG. 8B, the text management information includes the key item "movie ID", a "start position" identifying the beginning of a range, an "end position" identifying the end of the range, and the "converted text or supplementary text" that is the speech recognition result for the audio included in that range.
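A text management information record and a lookup by playback time might look like the following sketch. It assumes positions are expressed in seconds and that ranges are half-open, so two adjacent ranges never both claim the boundary instant; `TextManagementInfo` and `text_at` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class TextManagementInfo:
    movie_id: str
    start_s: float  # "start position" (seconds assumed)
    end_s: float    # "end position" (seconds assumed)
    text: str       # converted text or supplementary text

    def contains(self, t: float) -> bool:
        # Half-open interval [start, end).
        return self.start_s <= t < self.end_s


def text_at(entries: Sequence[TextManagementInfo],
            t: float) -> Optional[TextManagementInfo]:
    """Return the entry whose range contains playback time t, if any."""
    return next((e for e in entries if e.contains(t)), None)
```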

[S708]
The data processing server 22 extracts temporary frames from the designated movie. Specifically, for each of the candidate ranges associated with the candidate texts (converted text or supplementary text) corresponding to the segments of audio extracted from the designated movie, one of the frames included in that candidate range is extracted as a temporary frame.
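The source only says that "one of the frames" in each candidate range is taken as the temporary frame. The sketch below picks the frame on screen at the midpoint of the range; both the midpoint rule and the constant frame rate are assumptions, and `temporary_frame_index` is a hypothetical name.

```python
import math


def temporary_frame_index(start_s: float, end_s: float, fps: float) -> int:
    """Index of the frame shown at the midpoint of a candidate range."""
    if end_s < start_s or fps <= 0:
        raise ValueError("invalid range or frame rate")
    midpoint_s = (start_s + end_s) / 2
    # Frame k is on screen during [k / fps, (k + 1) / fps), hence the floor.
    return math.floor(midpoint_s * fps)
```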

[S710]
The user management server 21 presents edit screen A on the user terminal 10.
FIG. 9 is a display example of edit screen A.
As illustrated in FIG. 9, edit screen A900 arranges multiple candidate ranges 920 (920a to 920f) along the timeline 910 of the designated movie. Each candidate range 920 is displayed together with a text box 922 (922a to 922f), a play button 924 (924a to 924f), a temporary frame 926 (926a to 926f), and a check box 928 (928a to 928f).
On edit screen A900, the timeline 910 corresponds to the "playback time" in the movie management information (FIG. 8A).

Each candidate range 920 corresponds to the range from the "start position" to the "end position" in the text management information (FIG. 8B). The text displayed in the text box 922 corresponds to the "converted text or supplementary text" in the text management information (FIG. 8B).
When a play button 924 is tapped, the user terminal 10 requests the user management server 21 to play the audio of the candidate range from the "start position" to the "end position" in the text management information (FIG. 8B).
The user plays audio and edits the text in the text boxes 922 as needed, checks the check boxes 928 corresponding to one or more candidate ranges (four in the display example of FIG. 9), and taps the button 930, whereupon the user terminal 10 transmits edited data A to the user management server 21.

[S712]
The user management server 21 acquires edited data A from the user terminal 10. The acquired edited data A is accumulated in the storage 25 via the data management server 23.
FIG. 10A shows an example of the items of edited data A.
As illustrated in FIG. 10A, edited data A includes a "movie ID", a "start position" identifying the beginning of a candidate range, an "end position" identifying the end of the candidate range, the "edited text" corresponding to the candidate range, and a "selection flag" that is set when the candidate range has been selected.
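A record of edited data A, together with the filtering step that later turns flagged rows into designated ranges, could be sketched as follows. Field names, seconds as the unit, and sorting by start position are assumptions; `EditedDataA` and `designated_ranges` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class EditedDataA:
    movie_id: str
    start_s: float    # "start position" (seconds assumed)
    end_s: float      # "end position" (seconds assumed)
    edited_text: str  # "edited text"
    selected: bool    # "selection flag"


def designated_ranges(rows: Iterable[EditedDataA]) -> List[EditedDataA]:
    """Keep only the ranges whose selection flag is set, in timeline order."""
    return sorted((r for r in rows if r.selected), key=lambda r: r.start_s)
```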

[S714]
The user management server 21 presents edit screen B on the user terminal 10.
FIG. 11 is a display example of edit screen B.
As illustrated in FIG. 11, edit screen B1100 includes an area 1110, an area 1120, an area 1130, and a button 1140.

In the area 1110, a temporary frame 1112 (1112a to 1112d), the edited text 1114 (1114a to 1114d), and an area 1116 (1116a to 1116d) are displayed for each selected range.
When the arrow displayed above a temporary frame 1112 is tapped, the temporary frame is replaced with the frame immediately before it or another earlier frame. Similarly, when the arrow displayed below the temporary frame 1112 is tapped, it is replaced with the frame immediately after it or another later frame.
When the user performs an operation drawing a closed figure, as described with reference to FIGS. 1 and 2, in the area 1116 containing the temporary frame 1112, a locus is formed.

In the area 1120, the layout of the thumbnail image is selected. In the display example of FIG. 11, the user chooses between 2 rows by 2 columns and 4 rows by 1 column.
In the area 1130, the output form of the thumbnail image is selected. In the display example of FIG. 11, the user chooses between data for Web display in JPEG (Joint Photographic Experts Group) format and data for printing in PDF (Portable Document Format) format.
When the user selects frames as needed, performs the user operation on the selected frames, selects a layout and an output form, and taps the button 1140, the user terminal 10 transmits edited data B and the selection information (layout selection information and output form selection information) to the user management server 21.
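The two layouts offered in area 1120 amount to a grid of equally sized tiles. The sketch below computes the top-left pixel of each designated-frame image, filling row by row; the layout keys, equal tile sizes, and row-major order are assumptions, and `tile_origins` is a hypothetical name.

```python
from typing import List, Tuple

# The two layouts offered in area 1120, as (rows, columns). Keys are hypothetical.
LAYOUTS = {"2x2": (2, 2), "4x1": (4, 1)}


def tile_origins(layout: str, tile_w: int, tile_h: int,
                 count: int) -> List[Tuple[int, int]]:
    """Top-left pixel of each designated-frame image, filled row by row."""
    rows, cols = LAYOUTS[layout]
    if count > rows * cols:
        raise ValueError("more images than layout cells")
    return [((i % cols) * tile_w, (i // cols) * tile_h) for i in range(count)]
```

For example, four 320x180 tiles in the "2x2" layout start at (0, 0), (320, 0), (0, 180), and (320, 180).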

[S716]
The user management server 21 acquires edited data B from the user terminal 10. The acquired edited data B is accumulated in the storage 25 via the data management server 23.
FIG. 10B shows an example of the items of edited data B.
As illustrated in FIG. 10B, edited data B includes a "movie ID", a "start position" identifying the beginning of a candidate range, an "end position" identifying the end of the candidate range, the "selected frame" included in the candidate range, and the "trajectory information" corresponding to the candidate range.
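A possible in-memory form of edited data B is sketched below. The source does not define the encoding of the trajectory information; here it is assumed to be the sequence of (x, y) points of the stroke in drawing order, and the selected frame is assumed to be a frame index. `EditedDataB` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class EditedDataB:
    movie_id: str
    start_s: float                 # "start position" (seconds assumed)
    end_s: float                   # "end position" (seconds assumed)
    selected_frame: int            # "selected frame" as a frame index (assumed)
    trajectory: Tuple[Point, ...]  # "trajectory information" in drawing order

    @property
    def start_point(self) -> Point:
        # The first sampled point is the start point of the user's stroke.
        if not self.trajectory:
            raise ValueError("empty trajectory")
        return self.trajectory[0]
```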

[S718]
The data processing server 22 generates a thumbnail image. The generated thumbnail image is stored in the storage 25 via the file management server 24 and published on the Web server.
Specifically, the data processing server 22 generates the thumbnail image by the following procedure.
- Each range whose "selection flag" is set in edited data A (FIG. 10A) is designated as a designated range.
- The "edited text" in edited data A (FIG. 10A) corresponding to the designated range is designated as the superimposed text.
- The "selected frame" in edited data B (FIG. 10B) corresponding to the designated range is designated as the designated frame.
- According to the position of the start point of the locus specified by the "trajectory information" in edited data B (FIG. 10B) corresponding to the designated range, the position of the sound source of the designated voice included in the designated range is identified relative to the designated frame. The arrangement mode is set so that the superimposed text is placed in association with the sound source when the sound source position is inside the display range of the designated frame, and without association with the sound source when the position is outside the display range.
- The arrangement area of the superimposed text is set according to the position of the closed area formed by the locus specified by the "trajectory information" in edited data B (FIG. 10B) corresponding to the designated range.
- An image is generated in which the superimposed text is placed, in the set arrangement mode, within the set arrangement area of the designated frame.
- The images generated for each designated range by the above steps are arranged in the layout specified by the layout selection information and output in the format specified by the output form selection information.
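The per-range placement steps above can be sketched as follows, under several assumptions not stated in the source: the locus is a list of (x, y) points in the pixel coordinates of the designated frame (so it may extend past the frame edges), a point counts as inside the display range when 0 <= x < width and 0 <= y < height, and the position of the closed area is approximated by the bounding box of the locus. `place_text` and `Placement` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

Point = Tuple[float, float]              # (x, y) in frame pixel coordinates
Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


@dataclass(frozen=True)
class Placement:
    sound_source: Point
    associated: bool  # True: tie the text to the source (e.g. a balloon tail)
    area: Box         # arrangement area of the superimposed text


def place_text(locus: Sequence[Point], frame_w: float, frame_h: float) -> Placement:
    """Derive sound source, arrangement mode and area from a closed-figure locus."""
    if len(locus) < 3:
        raise ValueError("a closed figure needs at least three points")
    # Start point of the stroke -> sound source position. The stroke may begin
    # outside the frame, which marks an off-screen sound source.
    sx, sy = locus[0]
    associated = 0 <= sx < frame_w and 0 <= sy < frame_h
    # Approximate the enclosed region by the bounding box of the locus.
    xs = [x for x, _ in locus]
    ys = [y for _, y in locus]
    return Placement((sx, sy), associated, (min(xs), min(ys), max(xs), max(ys)))
```

A bounding box is only the simplest stand-in for the closed area; a renderer could instead fit the text to the drawn polygon itself.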

[S720]
The user management server 21 provides the thumbnail image to the user terminal 10.
FIG. 12 is a display example of a thumbnail image. The thumbnail image 1200 is generated when the 2-rows-by-2-columns layout and data for Web display are selected on edit screen B1100.
The superimposed texts are placed on the respective designated frames constituting the thumbnail image 1200. Each designated frame may carry a hyperlink that plays the movie of the corresponding designated range.

FIG. 13 is a display example of a thumbnail image. The thumbnail image 1300 is generated when the 4-rows-by-1-column layout and data for printing are selected on edit screen B1100.
The superimposed texts are placed on the respective designated frames constituting the thumbnail image 1300. A two-dimensional code encoding a URL that plays the movie of the corresponding designated range may be placed in a corner of each designated frame.

[2-6. Effects of the system of the embodiment]
SNS (Social Networking Service) is often used as a means of informing other users of information. Because SNS is used to spread information, it has high affinity with static information (for example, text and still images).
SNS can also be used to announce that a new movie (moving image) has become available for viewing on the Internet. However, because watching a movie takes a substantial amount of time, introducing it on SNS has required someone to watch it first and then write a post that conveys its content in static form.

For the superimposed text displayed on the designated frame included in the designated range of the designated movie, the system of this embodiment places the superimposed text in association with the sound source when the position of the sound source of the designated voice is inside the display range of the designated frame, and places it without association with the sound source when that position is outside the display range. The sound source position is identified according to the position of the end point (start point) of the user operation that draws a closed figure. The arrangement area of the superimposed text is set according to the position of the closed area formed by the closed figure.
These processes are performed for each designated range, and finally a thumbnail image is output in which the designated frames, each with its superimposed text, are laid out.
Therefore, the system of the embodiment makes it possible to superimpose text corresponding to a designated voice extracted from a movie onto a designated frame extracted from the same movie, in a more appealing manner and with simpler operation.

[3. Modification examples]
[3-1. Modification example of data linkage]
In the above embodiment, edited data A and edited data B are each acquired by synchronous communication. For the data transmission, for example, the POST method of the HTTP protocol is used.
Alternatively, the items of edited data A and edited data B may be acquired one by one by asynchronous communication. For this data transmission, for example, an XMLHttpRequest object may be used.

[3-2. Modification example of the processing entity]
In the above embodiment, the image generation system 20 (in particular the data processing server 22) executes the data processing related to thumbnail image generation, and the user terminal 10 serves as an input/output device of the image generation system 20.
Alternatively, the user terminal 10 may execute at least part of the data processing related to thumbnail image generation. For example, having a user terminal 10 such as a PC execute the speech recognition processing can reduce the processing load on the image generation system 20 (data processing server 22).

[3-3. Modification example of the decision-making entity]
In the above embodiment, designation of the designated range, designation of the designated frame, identification of the sound source position, designation of the arrangement area of the superimposed text, and so on are performed based on user operations.
Alternatively, the image generation system 20 (for example, the data processing server 22) may perform at least part of these processes without relying on user operations. For example, the designations and identifications may be made randomly based on pseudo-random numbers, or based on predetermined conditions.

[3-4. Modification example of the user operation]
In the above embodiment, the sound source position is identified according to the position designated by a user operation that serves as both a position designation and an area designation, and the arrangement area of the superimposed text is set according to the area designated by that operation. Specifically, the sound source position is identified according to the start point of the locus of a user operation that draws a closed figure on the screen, and the text arrangement area is set according to the position of the closed area formed by the locus.
Alternatively, the sound source position may be identified according to the start point of the locus of a user operation that draws a line on the screen, and the text arrangement area may be set to a predetermined area containing the end point of the locus. The text arrangement area is preferably set so that the text does not overlap other subjects.

An example of the arrangement mode of the superimposed text is described with reference to FIG. 14. FIG. 14 assumes that the sound source of the designated voice is inside the display range of the frame.
The frame 1410 contains a subject 1420. When the user performs an operation drawing a line on the screen containing the frame 1410, a locus 1430 is formed. The sound source position is identified according to the position of the start point 1431 of the locus 1430, and the text arrangement area is set according to the position of the end point 1432 of the locus 1430.
When the sound source of the designated voice is inside the display range of the frame, the superimposed text is placed inside a balloon 1440, which is positioned in the area corresponding to the end point 1432 and tied to the sound source position corresponding to the start point 1431.

Another example of the arrangement mode of the superimposed text is described with reference to FIG. 15. FIG. 15 assumes that the sound source of the designated voice is outside the display range of the frame.
The frame 1510 contains a subject 1520. When the user performs an operation drawing a line on the screen containing the frame 1510, a locus 1530 is formed. The sound source position is identified according to the position of the start point 1531 of the locus 1530, and the text arrangement area is set according to the position of the end point 1532 of the locus 1530.
When the sound source of the designated voice is outside the display range of the frame, the superimposed text is placed inside an area 1540, which is positioned in the area corresponding to the end point 1532 without being tied to the sound source position corresponding to the start point 1531.
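The line-stroke variant of FIGS. 14 and 15 can be sketched as below. Assumptions not in the source: coordinates are frame pixels, the "predetermined area" is a fixed-size box centered on the end point and shifted back inside the frame, and the preference for not overlapping other subjects is not modeled. `place_for_line` is a hypothetical name.

```python
from typing import Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def place_for_line(start: Point, end: Point, frame_w: float, frame_h: float,
                   box_w: float, box_h: float) -> Tuple[bool, Box]:
    """Return (use_balloon, text_box) for a straight-line stroke."""
    if box_w > frame_w or box_h > frame_h:
        raise ValueError("text box larger than frame")
    sx, sy = start
    # Start point -> sound source; a balloon tied to the source (FIG. 14) only
    # when the source lies inside the frame, else an untied area (FIG. 15).
    use_balloon = 0 <= sx < frame_w and 0 <= sy < frame_h
    # Center the box on the end point, then shift it back inside the frame.
    ex, ey = end
    left = min(max(ex - box_w / 2, 0), frame_w - box_w)
    top = min(max(ey - box_h / 2, 0), frame_h - box_h)
    return use_balloon, (left, top, left + box_w, top + box_h)
```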

10 User terminal
20 Image generation system
21 User management server
22 Data processing server (an example of an information processing device)
23 Data management server
24 File management server
25 Storage
30 Communication network

Claims

An information processing device that places text corresponding to audio associated with a designated frame of a movie at a position in the frame identified based on a locus drawn by a user operation.

The user operation is an operation of drawing a closed figure.
The text placement area is set according to the position of the end point of the locus and the position of the closed area formed by the closed figure.
The information processing device according to claim 1.

An information processing method including a step of placing text corresponding to audio associated with a designated frame of a movie at a position in the frame identified based on a locus drawn by a user operation.

A control program for causing a computer of an information processing device to realize a function of placing text corresponding to audio associated with a designated frame of a movie at a position in the frame identified based on a locus drawn by a user operation.