JP6648003B2

JP6648003B2 - Information processing apparatus, control method of information processing apparatus, and control program

Info

Publication number: JP6648003B2
Application number: JP2016257053A
Authority: JP
Inventors: 健治笠原; 咲耶西山
Original assignee: Mixi Inc
Current assignee: Mixi Inc
Priority date: 2016-12-28
Filing date: 2016-12-28
Publication date: 2020-02-14
Anticipated expiration: 2036-12-28
Also published as: JP2018110312A

Description

The present invention relates to an information processing system that extracts frames from a movie and outputs them.
In particular, the present invention relates to an information processing system that extracts a frame and audio from a movie and superimposes text corresponding to the audio on the frame.

A method is known for automatically extracting still images of interest from a moving image with sound and creating a photo album (see, for example, Patent Document 1).
Specifically, a portion of interest that matches a designated waveform or designated text is identified in the waveform data or text data of the audio data constituting the moving image data with sound. A still image corresponding to that portion of interest is extracted, as a still image of interest, from the moving image data constituting the moving image data with sound, and a photo album is created in which the text of the audio of the portion of interest is added to the still image of interest (see, for example, [Claim 4] and [Claim 6] of Patent Document 1).

Patent Document 1: JP 2006-333065 A

When an object is superimposed on a still image, it is convenient if the arrangement mode of the object can be set according to what the still image displays. Patent Document 1 does not disclose any specific arrangement mode for how the text is added to the still image of interest.

The problem to be solved by the present invention is to superimpose text corresponding to audio extracted from a movie on a frame extracted from that movie in a highly appealing manner.

[A] To solve the above problem, an "information processing apparatus" according to one aspect of the present invention comprises: specifying means for specifying, in relation to a designated frame included in a designated range of a movie, the position of the sound source of designated audio included in that designated range; and setting means for setting the arrangement mode of superimposed text, which corresponds to the designated audio and is superimposed on the designated frame, taking into account the sound source position specified by the specifying means.
[B] To solve the above problem, a "control method of an information processing apparatus" according to one aspect of the present invention includes: a specifying step of specifying, in relation to a designated frame included in a designated range of a movie, the position of the sound source of designated audio included in that designated range; and a setting step of setting the arrangement mode of superimposed text, which corresponds to the designated audio and is superimposed on the designated frame, taking into account the sound source position specified in the specifying step.
[C] To solve the above problem, a "control program" according to one aspect of the present invention causes a computer of an information processing apparatus to implement: a specifying function of specifying, in relation to a designated frame included in a designated range of a movie, the position of the sound source of designated audio included in that designated range; and a setting function of setting the arrangement mode of superimposed text, which corresponds to the designated audio and is superimposed on the designated frame, taking into account the sound source position specified by the specifying function.
[D] To solve the above problem, a control program recorded on a "computer-readable recording medium" according to one aspect of the present invention causes a computer of an information processing apparatus to implement: a specifying function of specifying, in relation to a designated frame included in a designated range of a movie, the position of the sound source of designated audio included in that designated range; and a setting function of setting the arrangement mode of superimposed text, which corresponds to the designated audio and is superimposed on the designated frame, taking into account the sound source position specified by the specifying function.

The following technical limitations may be added to the "information processing apparatus" of [A]. The same limitations may also be added to the "control method" of [B], the "control program" of [C], and the control program recorded on the "recording medium" of [D].
- The setting means sets the arrangement mode according to the relationship between the sound source position specified by the specifying means and the display range of the designated frame.
- The setting means sets the arrangement mode so that the superimposed text is arranged in association with the sound source when the sound source position is inside the display range, and so that the superimposed text is arranged without being associated with the sound source when the sound source position is outside the display range.
- The specifying means specifies the sound source position according to a position designated by a user operation that serves as both position designation and region designation, and the setting means sets the arrangement region of the superimposed text according to the region designated by that user operation.
- The user operation is an operation of drawing a closed figure. The specifying means specifies the sound source position according to the position of an end point of the user operation, and the setting means sets the arrangement region of the superimposed text according to the position of the closed region formed by the closed figure drawn by the user operation.
- The apparatus further comprises designating means for designating, as the designated ranges, a designated number of ranges selected by a user from a candidate list that includes at least a plurality of candidate ranges. For each designated range designated by the designating means, the specifying means specifies the position of the sound source of the designated audio included in that range, and the setting means sets the arrangement mode of the superimposed text superimposed on the designated frame included in that range, taking into account the sound source position specified by the specifying means for that range.
- The designating means presents to the user the candidate list, which includes a plurality of the candidate ranges respectively associated with a plurality of candidate texts that each correspond to a group of audio contained in the movie, and has the user select the designated number of ranges.

In this specification, terms are used as follows.
- A "movie" includes at least a plurality of frames and audio data.
- A "designated range" is a temporal range designated on the timeline of a "movie".
- A "frame" is a still image constituting a "movie".
- The position of the sound source of the "designated audio" lies inside or outside the range displayed by the "designated frame".
- "Superimposed text" is text derived from the "designated audio", for example converted text obtained by converting the "designated audio", supplementary text supplied at the position where the "designated audio" was detected, or edited text obtained by editing the converted text or the supplementary text.
- An "arrangement mode" is the manner in which the "superimposed text" is arranged.
- A "user operation" is performed using a position input device (for example, a touch panel, a mouse, or a touch pad) that continuously inputs designated positions on a screen.

The present invention sets the arrangement mode of the superimposed text placed on a designated frame according to the position of the sound source of the designated audio, which is specified in relation to that designated frame.
Therefore, according to the present invention, text corresponding to audio extracted from a movie can be superimposed on a frame extracted from that movie in a highly appealing manner.

The drawings are as follows, in order.
- Explanatory diagram of an arrangement mode of superimposed text. (Example)
- Explanatory diagram of an arrangement mode of superimposed text. (Example)
- Explanatory diagram of a network configuration example. (Example)
- Explanatory diagram of a hardware configuration example of a user device. (Example)
- Explanatory diagram of a hardware configuration example of a server device. (Example)
- Explanatory diagram of a functional configuration example of the image generation system. (Example)
- Explanatory diagram of an image generation procedure. (Example)
- Explanatory diagram of example items of management data. (Example)
- Explanatory diagram of a display example of edit screen A. (Example)
- Explanatory diagram of example items of edit data. (Example)
- Explanatory diagram of a display example of edit screen B. (Example)
- Explanatory diagram of a display example of a thumbnail image. (Example)
- Explanatory diagram of a display example of a thumbnail image. (Example)
- Explanatory diagram of an arrangement mode of superimposed text. (Modification)
- Explanatory diagram of an arrangement mode of superimposed text. (Modification)

[1. Embodiment]
[1-1. Overview]
This embodiment relates to an information processing system that extracts a frame and audio from a movie, superimposes text corresponding to the audio on the frame, and outputs the result.
In this embodiment, in order to superimpose text corresponding to audio extracted from a movie on a frame extracted from that movie in a highly appealing manner, a configuration is adopted in which the arrangement mode of the text superimposed on the frame is set according to the position of the sound source of the designated audio.

[1-2. Information processing apparatus]
An information processing apparatus constituting the information processing system according to this embodiment comprises: specifying means for specifying, in relation to a designated frame included in a designated range of a movie, the position of the sound source of designated audio included in that designated range; and setting means for setting the arrangement mode of superimposed text, which corresponds to the designated audio and is superimposed on the designated frame, taking into account the sound source position specified by the specifying means.

[2. Example]
[2-1. Overview]
This example relates to an image generation system that provides an image generation service. The service extracts frames and audio from a movie and generates a thumbnail image by laying out and compositing a plurality of images, in each of which text corresponding to the audio is superimposed on a frame.
In the image generation service, the frames and audio to be extracted from the movie are each designated by the user of the service.

In the image generation service, in order to superimpose text corresponding to audio extracted from a movie on a frame extracted from that movie in a highly appealing manner, the arrangement mode of the superimposed text on the frame is set according to the relationship between the position of the sound source of the designated audio and the display range of the frame.
In addition, in the image generation system, the sound source position is specified according to a position designated by a user operation that serves as both position designation and region designation, and the arrangement region of the superimposed text is set according to the region designated by that user operation.

An example of an arrangement mode of the superimposed text will be described with reference to FIG. 1. FIG. 1 assumes that the position of the sound source of the designated audio is inside the display range of the frame.
Frame 110 includes a subject 120. When a user operation of drawing a closed figure is performed on a screen containing frame 110, a trajectory 130 is formed. The sound source position is then specified according to the position of start point 131 of trajectory 130, and the text arrangement region is set according to the position of closed region 132 formed by trajectory 130.
When the sound source position is inside the display range of the frame, the superimposed text is placed inside a speech balloon 140, which is arranged in a region corresponding to closed region 132 and is associated with the sound source position corresponding to start point 131.

Another example of an arrangement mode of the superimposed text will be described with reference to FIG. 2. FIG. 2 assumes that the position of the sound source of the designated audio is outside the display range of the frame.
Frame 210 includes a subject 220. When a user operation of drawing a closed figure is performed on a screen containing frame 210, a trajectory 230 is formed. The sound source position is then specified according to the position of start point 231 of trajectory 230, and the text arrangement region is set according to the position of closed region 232 formed by trajectory 230.
When the sound source position is outside the display range of the frame, the superimposed text is placed inside a region 240, which is arranged in a region corresponding to closed region 232 without being associated with the sound source position corresponding to start point 231.
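The two cases of FIG. 1 and FIG. 2 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the names `Placement` and `place_text`, the use of the stroke's bounding box as the text region, and a frame whose display range spans (0, 0) to (frame_w, frame_h) in the same coordinates as the stroke are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class Placement:
    mode: str      # "balloon" (tied to the sound source) or "plain" (not tied)
    source: Point  # sound source position, taken from the stroke's start point
    region: Tuple[float, float, float, float]  # (left, top, right, bottom)


def place_text(trajectory: List[Point], frame_w: float, frame_h: float) -> Placement:
    """Derive an arrangement mode from a closed-figure stroke (hypothetical helper)."""
    if len(trajectory) < 3:
        raise ValueError("a closed figure needs at least three points")
    # The start point of the stroke stands in for the sound source position.
    start = trajectory[0]
    # Assumption: approximate the closed region by the stroke's bounding box.
    xs = [x for x, _ in trajectory]
    ys = [y for _, y in trajectory]
    region = (min(xs), min(ys), max(xs), max(ys))
    # Inside the display range: balloon tied to the source; otherwise a plain region.
    inside = 0 <= start[0] <= frame_w and 0 <= start[1] <= frame_h
    return Placement("balloon" if inside else "plain", start, region)
```

A stroke that starts on the subject yields a balloon, while a stroke that starts off-frame yields a plain text region. In both cases the text region comes from the drawn figure itself, so a single gesture supplies both the position and the region.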

[2-2. Network configuration]
FIG. 3 shows a network configuration example of the system of this example.
The system of this example includes a user terminal 10 used by a user and an image generation system 20 that provides the image generation service.
The image generation system 20 includes a user management server 21, a data processing server 22, a data management server 23, a file management server 24, and a storage 25.

The user terminal 10 and the user management server 21 can exchange data with each other through a communication network 30. The user management server 21 and the data processing server 22 can each access data stored in the storage 25 via the data management server 23. The user management server 21 and the data processing server 22 can also each access data stored in the storage 25 via the file management server 24.
The communication network 30 may include existing networks (for example, at least one of the Internet, a mobile phone network, a wireless WAN (Wireless Wide Area Network), a wireless LAN (Wireless Local Area Network), Ethernet (registered trademark), and the like).

[2-2-1. User terminal]
The user terminal 10 is a user device (computer) on which a predetermined Web browser program is installed.
In the system of this example, the user device may be a general-purpose portable device on which a Web browser program can be installed (for example, a mobile phone, a smartphone, a tablet terminal, a tablet PC (personal computer), or a wearable device) or a general-purpose processing device (for example, a PC).

[2-2-2. Image generation system]
The image generation system 20 includes the user management server 21, the data processing server 22, the data management server 23, the file management server 24, and the storage 25.
The user management server 21 is a server device (computer) on which a Web server program (also called an HTTP daemon (HyperText Transfer Protocol Daemon)) is installed.
In response to a request from the user terminal 10, the user management server 21 reads the necessary data from the storage 25 via the data management server 23 and provides it to the user terminal 10 as a response. Also in response to a request from the user terminal 10, the user management server 21 writes data acquired from the user terminal 10 to the storage 25 via the data management server 23 and provides the processing result to the user terminal 10 as a response.
Note that a server system may be configured by linking a plurality of server devices so that the functions of the user management server 21 are shared among them or the load on the user management server 21 is distributed.

The data processing server 22 is a server device (computer) on which an application program is installed.
The data processing server 22 reads the necessary data from the storage 25 via the data management server 23, performs computation and processing on it, and writes the processed data to the storage 25 via the data management server 23. The data processing server 22 also reads the necessary data from the storage 25 via the file management server 24, performs computation and processing on it, and writes the processed data to the storage 25 via the file management server 24.
Note that a server system may be configured by linking a plurality of server devices so that the functions of the data processing server 22 are shared among them or the load on the data processing server 22 is distributed.

The data management server 23 is a server device (computer) on which a DB (Database) server program is installed. The data management server 23 constitutes a DBMS (Database Management System) together with the storage 25, which is either built in or externally connectable.
The data management server 23 has, for example, a function of storing in the storage 25 data acquired from a requester in response to a data storage request, and a function of returning to the requester data extracted from the storage 25 in response to a data extraction request.
Note that a server system may be configured by linking a plurality of server devices so that the functions of the data management server 23 are shared among them or the load on the data management server 23 is distributed.

The file management server 24 is a server device (computer) on which a file server program is installed.
The file management server 24 has, for example, a function of storing in the storage 25 data acquired from a requester in response to a data storage request, and a function of returning to the requester data extracted from the storage 25 in response to a data extraction request.
Note that a server system may be configured by linking a plurality of server devices so that the functions of the file management server 24 are shared among them or the load on the file management server 24 is distributed.

The storage 25 is a storage device that stores management data and file data.
Note that a plurality of storage devices may be provided so that each type of data held by the storage 25 is stored separately. The data held by the storage 25 may also be distributed across a plurality of storage devices.

[2-3. Hardware configuration]
[2-3-1. Hardware configuration of user device]
FIG. 4 shows a hardware configuration example of the user device.
A typical user device has at least: an MPU (Micro-Processing Unit) 411 constituting a control processing unit; a RAM (Random Access Memory) 421 constituting a main storage unit; a ROM (Read Only Memory) 422 and an EEPROM (Electrically Erasable Programmable Read-Only Memory) 423 constituting an auxiliary storage unit; a touch panel display 431 constituting an input unit and a display unit; a speaker 432 constituting an audio output unit; and a NIC (Network Interface Controller) 441 and a wireless LAN (Local Area Network) chip 442 constituting a communication control unit.

The RAM 421, the ROM 422, the EEPROM 423, the touch panel display 431, the speaker 432, the NIC 441, and the wireless LAN chip 442 are connected to the MPU 411 via a bus line.
The MPU 411 (1) loads a program stored in the ROM 422 or the EEPROM 423 into the RAM 421, (2) acquires data from at least one of the touch panel display 431, the EEPROM 423, the NIC 441, and the wireless LAN chip 442 according to the instructions of the program, (3) computes and processes the acquired data by the procedure defined in the program, and (4) provides the processed data to at least one of the EEPROM 423, the touch panel display 431, the speaker 432, the NIC 441, and the wireless LAN chip 442.

[2-3-2. Hardware configuration of server device]
FIG. 5 shows a hardware configuration example of the server device.
A typical server device has a control processing device 510 including an MPU and a ROM, a main storage device 520 including a RAM, an auxiliary storage device 530 including an HDD (Hard Disc Drive), an input device 540 including a mouse and a keyboard, an output device 550 including a display and a speaker, and a communication control device 560 including a network card (Network Interface Card).

The main storage device 520, the auxiliary storage device 530, the input device 540, the output device 550, and the communication control device 560 are each connected to the control processing device 510 via a bus line.
The control processing device 510 (1) loads a program stored in the auxiliary storage device 530 into the main storage device 520, (2) acquires data from at least one of the input device 540, the auxiliary storage device 530, and the communication control device 560 according to the instructions of the program, (3) computes and processes the acquired data by the procedure defined in the program, and (4) provides the processed data to at least one of the auxiliary storage device 530, the output device 550, and the communication control device 560.

[2-4. Functional configuration]
FIG. 6 shows a functional configuration example of the image generation system.
As illustrated in FIG. 6, the user management server 21 includes a receiving unit 611, a creating unit 612, and a providing unit 613. The data processing server 22 includes an extraction unit 621, a conversion unit 622, a designation unit 623, a specifying unit 624, a setting unit 625, and a generation unit 626.

The functions of the user management server 21 are realized by installing, on a server device, an OS (Operating System) for server devices and a Web server program that runs on that OS.
The functions of the data processing server 22 are realized by installing, on a server device, an OS for server devices and an application program that runs on that OS.
A program to be installed on a server device may be distributed recorded on any of various recording media (for example, a CD (Compact Disc), a DVD (Digital Versatile Disk), an MO disk (Magneto-Optical disk), or a flash memory) and read into the server device from that medium, or it may be supplied to the server device superimposed on a carrier wave via a communication network.

The receiving unit 611 receives a request from the user terminal 10.
The creating unit 612 creates a Web page corresponding to the request received by the receiving unit 611.
The providing unit 613 provides the Web page created by the creating unit 612 to the user terminal 10 as a response.

The extraction unit 621 extracts frames and audio from the designated movie. The extracted frames and audio are stored in the file management server 24.
The conversion unit 622 applies speech recognition processing to the audio extracted by the extraction unit 621 to generate converted text. When valid speech recognition is not possible, it generates supplementary text indicating that the audio is unrecognizable. The generated text is stored in the data management server 23.
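The fallback from converted text to supplementary text can be sketched as follows. This is only an illustration under assumptions: `recognize` stands in for whatever speech recognizer is used, and the names `to_candidate_text` and `SUPPLEMENTARY_TEXT` as well as the placeholder wording are hypothetical, not taken from the specification.

```python
from typing import Callable, Optional

# Hypothetical placeholder; the description only says that the supplementary
# text indicates that the audio could not be recognized.
SUPPLEMENTARY_TEXT = "(unrecognizable audio)"


def to_candidate_text(audio: bytes,
                      recognize: Callable[[bytes], Optional[str]]) -> str:
    """Return converted text, or supplementary text when recognition fails."""
    result = recognize(audio)
    # Treat "no result" and "only whitespace" as failed recognition.
    if result is None or not result.strip():
        return SUPPLEMENTARY_TEXT
    return result.strip()
```

Either outcome yields a non-empty candidate text, so every group of audio can be listed on the editing screen.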

The designation unit 623 presents the user with a candidate list containing a plurality of candidate ranges, each associated with one of a plurality of candidate texts (converted text or supplementary text) that correspond to the respective groups of audio extracted from the designated movie by the extraction unit 621. It has the user select one or more ranges and designates each selected range as a designated range.
It also designates the audio included in a designated range as the designated audio, and designates one of the frames included in the designated range as the designated frame. The designated frame may be designated automatically or in response to a designation by the user.

The specifying unit 624 specifies the position of the sound source of the designated audio included in a designated range, in relation to the designated frame included in that range, according to the position of the start point of a user operation that serves as both a position designation and an area designation. In this embodiment, the position of the sound source is specified for each designated range.
The setting unit 625 sets the arrangement mode so that the superimposed text is arranged in association with the sound source when the position of the sound source specified by the specifying unit 624 is inside the display range of the designated frame, and so that the superimposed text is arranged without being associated with the sound source when the position of the sound source is outside the display range. In either case, the arrangement area of the superimposed text is set according to the position of the closed area formed by the closed figure drawn by the user operation that serves as both a position designation and an area designation.
The generation unit 626 generates a thumbnail image. The generated thumbnail image is stored in the storage 25 via the file management server 24 and is published on the Web server.
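The rule applied by the setting unit 625 amounts to a point-in-rectangle test. The sketch below assumes pixel coordinates with the origin at the top-left corner of the frame; the `Rect` type and the mode names `"balloon"` and `"plain"` are hypothetical labels chosen for illustration.

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Rect:
    left: float
    top: float
    width: float
    height: float

    def contains(self, p: Point) -> bool:
        """True when p lies inside the rectangle (right/bottom edges excluded)."""
        x, y = p
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)


def arrangement_mode(source: Point, display_range: Rect) -> str:
    """'balloon': text tied to the sound source; 'plain': text not tied to it."""
    return "balloon" if display_range.contains(source) else "plain"
```

A source position taken from a start point that lies outside the frame, for example in the margin of the editing area, therefore yields the unassociated mode.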

[2-5. Image generation procedure]
FIG. 7 illustrates an image generation procedure. The image generation system 20 generates a thumbnail image by the following procedure.
[S702]
The user management server 21 acquires movie designation data from the user terminal 10. The movie designation data is a file path that identifies the designated movie file when that file is stored in the auxiliary storage unit of the user terminal 10, or a URL (Uniform Resource Locator) that identifies the designated movie file when that file is managed by another server device.

[S704]
The user management server 21 or the data processing server 22 acquires the designated movie file from the user terminal 10 or from another server device.
The acquired designated movie file is stored in the storage 25 via the file management server 24. Movie management information for managing the acquired designated movie file is stored in the storage 25 via the data management server 23.

FIG. 8A shows an example of the items of the movie management information.
As illustrated in FIG. 8A, the movie management information includes the key item "movie ID", a "file path" indicating the storage location of the designated movie file in the storage 25, a "user ID" identifying the user who requested generation of a thumbnail image based on the designated movie, a "reception time" from which the time at which that generation request was received can be identified, and the "playback time" of the designated movie.

[S706]
The data processing server 22 applies speech recognition processing to the audio data extracted from the designated movie to generate converted text. When valid speech recognition is not possible, it generates supplementary text indicating that the audio is unrecognizable. Text management information for managing the generated text is stored in the data management server 23.

FIG. 8B shows an example of the items of the text management information.
As illustrated in FIG. 8B, the text management information includes the key item "movie ID", a "start position" from which the start of a range can be identified, an "end position" from which the end of the range can be identified, and the "converted text or supplementary text" that is the speech recognition result for the audio included in the range.
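The records of FIG. 8A and FIG. 8B might be modeled as below. The field names, the use of seconds for positions and playback time, and the helper `candidate_ranges` are assumptions made for illustration; the description names the items but not their types or units.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class MovieRecord:          # items of FIG. 8A
    movie_id: str           # key item
    file_path: str          # location of the movie file in the storage
    user_id: str            # user who requested the thumbnail image
    received_at: datetime   # when the generation request was received
    duration_s: float       # playback time, assumed to be in seconds


@dataclass
class TextRecord:           # items of FIG. 8B
    movie_id: str
    start_s: float          # start position, assumed to be in seconds
    end_s: float            # end position, assumed to be in seconds
    text: str               # converted text or supplementary text


def candidate_ranges(movie: MovieRecord,
                     texts: List[TextRecord]) -> List[TextRecord]:
    """Records of one movie in timeline order, keeping only ranges that
    lie within the movie's playback time."""
    rows = [t for t in texts
            if t.movie_id == movie.movie_id
            and 0 <= t.start_s < t.end_s <= movie.duration_s]
    return sorted(rows, key=lambda t: t.start_s)
```

Sharing the movie ID between the two records is what allows the candidate ranges to be laid out along the timeline of the corresponding movie.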

[S708]
The data processing server 22 extracts temporary frames from the designated movie. Specifically, for each of the plurality of candidate ranges, which are respectively associated with the plurality of candidate texts (converted text or supplementary text) corresponding to the respective groups of audio extracted from the designated movie, it extracts one of the frames included in that candidate range as a temporary frame.
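The description leaves open which frame of a candidate range becomes the temporary frame. One simple choice, shown here purely as an assumption, is the frame displayed at the midpoint of the range; the function name, the seconds-based ranges, and the constant frame rate `fps` are all hypothetical.

```python
from typing import List, Tuple


def temporary_frames(ranges: List[Tuple[float, float]], fps: float) -> List[int]:
    """For each (start_s, end_s) candidate range, return the index of the
    frame displayed at the midpoint of the range."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    indices = []
    for start_s, end_s in ranges:
        if not 0 <= start_s < end_s:
            raise ValueError(f"invalid range: {(start_s, end_s)}")
        midpoint_s = (start_s + end_s) / 2
        indices.append(int(midpoint_s * fps))  # frame shown at that time
    return indices
```

Because the user can later step to a neighboring frame on the editing screen B, the initial choice only needs to be a reasonable default.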

[S710]
The user management server 21 presents the editing screen A to the user terminal 10.
FIG. 9 is a display example of the editing screen A.
As illustrated in FIG. 9, the editing screen A900 arranges a plurality of candidate ranges 920 (920a to 920f) in association with the timeline 910 of the designated movie. A text box 922 (922a to 922f), a play button 924 (924a to 924f), a temporary frame 926 (926a to 926f), and a check box 928 (928a to 928f) are displayed in association with each of the candidate ranges 920.
On the editing screen A900, the timeline 910 corresponds to the "playback time" of the movie management information (FIG. 8A).

Each candidate range 920 corresponds to the range from the "start position" to the "end position" of the text management information (FIG. 8B). The text displayed in a text box 922 corresponds to the "converted text or supplementary text" of the text management information (FIG. 8B).
When a play button 924 is tapped, the user terminal 10 requests the user management server 21 to play back the audio of the candidate range from the "start position" to the "end position" of the text management information (FIG. 8B).
When the user, playing back audio as needed, edits the text in the text boxes 922 as needed, checks the check boxes 928 corresponding to one or more candidate ranges (four in the display example of FIG. 9), and taps the button 930, the user terminal 10 transmits the edit data A to the user management server 21.

[S712]
The user management server 21 acquires the edit data A from the user terminal 10. The acquired edit data A is accumulated in the storage 25 via the data management server 23.
FIG. 10A shows an example of the items of the edit data A.
As illustrated in FIG. 10A, the edit data A includes a "movie ID", a "start position" from which the start of a candidate range can be identified, an "end position" from which the end of the candidate range can be identified, the "edited text" corresponding to the candidate range, and a "selection flag" that is set (significant) when the candidate range is selected.

[S714]
The user management server 21 presents the editing screen B to the user terminal 10.
FIG. 11 is a display example of the editing screen B.
As illustrated in FIG. 11, the editing screen B1100 includes an area 1110, an area 1120, an area 1130, and a button 1140.

In the area 1110, a temporary frame 1112 (1112a to 1112d), an edited text 1114 (1114a to 1114d), and an area 1116 (1116a to 1116d) are displayed for each selected range.
When the arrow displayed above a temporary frame 1112 is tapped, the temporary frame is replaced with another frame immediately before it or earlier. Similarly, when the arrow displayed below a temporary frame 1112 is tapped, the temporary frame is replaced with another frame immediately after it or later.
When a user operation of drawing a closed figure is performed in the area 1116 containing a temporary frame 1112, as described with reference to FIGS. 1 and 2, a trajectory is formed.
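The arrow behavior can be sketched as a clamped step over frame indices. Keeping the result inside the candidate range is an assumption here (the edit data B later refers to a selected frame included in the candidate range); the function name and the `stride` parameter are hypothetical.

```python
def step_frame(current: int, direction: int,
               first: int, last: int, stride: int = 1) -> int:
    """Move the temporary frame earlier (direction=-1) or later (direction=+1)
    by `stride` frames, staying within the candidate range [first, last]."""
    if direction not in (-1, 1):
        raise ValueError("direction must be -1 or +1")
    if stride < 1 or first > last:
        raise ValueError("invalid stride or range")
    return max(first, min(last, current + direction * stride))
```

A stride larger than one corresponds to jumping to a frame "earlier" or "later" rather than to the immediately adjacent one.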

In the area 1120, the layout of the thumbnail image is selected. In the display example of FIG. 11, the user chooses between two rows by two columns and four rows by one column.
In the area 1130, the output form of the thumbnail image is selected. In the display example of FIG. 11, the user chooses between Web display data in JPEG (Joint Photographic Experts Group) format and print data in PDF (Portable Document Format) format.
When the user selects frames as needed, performs the user operation on the selected frames, selects a layout and an output form, and taps the button 1140, the user terminal 10 transmits the edit data B and selection information (layout selection information and output form selection information) to the user management server 21.
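The two layouts offered in the area 1120 can be expressed as a grid of cell origins. The sketch assumes equally sized cells in pixel units placed in row-major order; the layout keys `"2x2"` and `"4x1"` and the function name are hypothetical.

```python
from typing import List, Tuple

# (rows, columns) for the two layouts shown in FIG. 11.
LAYOUTS = {"2x2": (2, 2), "4x1": (4, 1)}


def cell_origins(layout: str, cell_w: int, cell_h: int) -> List[Tuple[int, int]]:
    """Top-left (x, y) of each cell, in row-major order."""
    rows, cols = LAYOUTS[layout]
    return [(c * cell_w, r * cell_h) for r in range(rows) for c in range(cols)]
```

With four selected ranges, both layouts provide exactly four cells, so each designated frame maps to one cell in selection order.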

[S716]
The user management server 21 acquires the edit data B from the user terminal 10. The acquired edit data B is accumulated in the storage 25 via the data management server 23.
FIG. 10B shows an example of the items of the edit data B.
As illustrated in FIG. 10B, the edit data B includes a "movie ID", a "start position" from which the start of a candidate range can be identified, an "end position" from which the end of the candidate range can be identified, the "selected frame" included in the candidate range, and "trajectory information" corresponding to the candidate range.
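The format of the "trajectory information" is not specified. Assuming it is an ordered list of (x, y) points sampled along the user's stroke, the start point and a rectangular approximation of the closed area could be derived as below; the bounding box is one simple stand-in for "the position of the closed area", not necessarily the method used in the embodiment.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def start_point(trajectory: List[Point]) -> Point:
    """First sampled point of the stroke."""
    if not trajectory:
        raise ValueError("empty trajectory")
    return trajectory[0]


def closed_area_box(trajectory: List[Point]) -> Box:
    """Axis-aligned bounding box of the figure traced by the stroke."""
    if len(trajectory) < 3:
        raise ValueError("a closed figure needs at least three points")
    xs = [x for x, _ in trajectory]
    ys = [y for _, y in trajectory]
    return (min(xs), min(ys), max(xs), max(ys))
```

A single stroke thus yields both inputs the later steps need: a point for the sound source and an area for the text.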

[S718]
The data processing server 22 generates a thumbnail image. The generated thumbnail image is stored in the storage 25 via the file management server 24 and is published on the Web server.
Specifically, the data processing server 22 generates the thumbnail image by the following procedure.
・Designate each range whose "selection flag" is set in the edit data A (FIG. 10A) as a designated range.
・Designate the "edited text" of the edit data A (FIG. 10A) corresponding to the designated range as the superimposed text.
・Designate the "selected frame" of the edit data B (FIG. 10B) corresponding to the designated range as the designated frame.
・Specify the position of the sound source of the designated audio included in the designated range, in relation to the designated frame, according to the position of the start point of the trajectory identified from the "trajectory information" of the edit data B (FIG. 10B) corresponding to the designated range. Set the arrangement mode so that the superimposed text is arranged in association with the sound source when the position of the sound source is inside the display range of the designated frame, and so that it is arranged without being associated with the sound source when the position of the sound source is outside the display range.
・Set the arrangement area of the superimposed text according to the position of the closed area formed by the trajectory identified from the "trajectory information" of the edit data B (FIG. 10B) corresponding to the designated range.
・Generate an image in which the superimposed text is superimposed, in the set arrangement mode, on the set arrangement area within the designated frame.
・Arrange the images generated for the respective designated ranges by the above procedure in the layout identified by the layout selection information, and output them in the format identified by the output form selection information.
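The first three steps of this procedure join the edit data A and the edit data B on their shared items. A minimal sketch follows, assuming the pair of records for one range carries identical "movie ID", start, and end values so that they can serve as a lookup key; the class and function names are hypothetical, and rendering and layout are left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Key = Tuple[str, float, float]  # (movie ID, start position, end position)


@dataclass
class EditA:                    # items of FIG. 10A
    movie_id: str
    start_s: float
    end_s: float
    edited_text: str
    selected: bool              # selection flag


@dataclass
class EditB:                    # items of FIG. 10B
    movie_id: str
    start_s: float
    end_s: float
    selected_frame: int
    trajectory: List[Tuple[float, float]]


@dataclass
class RenderJob:                # one designated range, ready for rendering
    frame: int                  # designated frame
    text: str                   # superimposed text
    trajectory: List[Tuple[float, float]]


def build_jobs(a_rows: List[EditA], b_rows: List[EditB]) -> List[RenderJob]:
    """Pair each selected range of edit data A with its edit data B record."""
    b_by_key: Dict[Key, EditB] = {
        (b.movie_id, b.start_s, b.end_s): b for b in b_rows
    }
    jobs = []
    for a in sorted(a_rows, key=lambda r: r.start_s):
        if not a.selected:
            continue  # only ranges with the selection flag set are designated
        b = b_by_key.get((a.movie_id, a.start_s, a.end_s))
        if b is None:
            continue  # no frame or trajectory was submitted for this range
        jobs.append(RenderJob(b.selected_frame, a.edited_text, b.trajectory))
    return jobs
```

Each resulting job carries what the remaining steps consume: the frame to draw on, the text to superimpose, and the trajectory from which the sound source position and the arrangement area are derived.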

[S720]
The user management server 21 provides the thumbnail image to the user terminal 10.
FIG. 12 is a display example of a thumbnail image. The thumbnail image 1200 is the data generated when the two-rows-by-two-columns layout and Web display data are selected on the editing screen B1100.
Superimposed text is superimposed on each of the designated frames constituting the thumbnail image 1200. Each designated frame may be given a hyperlink that plays back the movie in the corresponding designated range.

FIG. 13 is another display example of a thumbnail image. The thumbnail image 1300 is the data generated when the four-rows-by-one-column layout and print data are selected on the editing screen B1100.
Superimposed text is superimposed on each of the designated frames constituting the thumbnail image 1300. A two-dimensional code encoding a URL that plays back the movie in the corresponding designated range may be placed in a corner of each designated frame.

[2-6. Effects of the system of the embodiment]
An SNS (Social Networking Service) is often used as a means of informing other users of some piece of information. Because an SNS is used to spread information, it has a high affinity with static information (for example, text and still images).
When a new movie (moving image) becomes viewable on the Internet, an SNS can also be used to announce that fact. However, because viewing a movie takes a considerable amount of time, introducing it on an SNS has required someone to view it and then write a post that shows its content in static form.

For the superimposed text to be displayed on a designated frame included in a designated range of a designated movie, the system of this embodiment arranges the superimposed text in association with the sound source when the position of the sound source of the designated audio is inside the display range of the designated frame, and arranges it without associating it with the sound source when that position is outside the display range. The position of the sound source is specified according to the position of an end point (the start point) of the user operation that draws a closed figure. The arrangement area of the superimposed text is set according to the position of the closed area formed by the closed figure.
These processes are performed for each designated range, and a thumbnail image laying out the plurality of designated frames, each with its text superimposed, is finally output.
Therefore, according to the system of the embodiment, text corresponding to the designated audio extracted from a movie can be superimposed on a designated frame extracted from that movie with a simpler operation and in a more appealing manner.

[3. Modifications]
[3-1. Modification of data linkage]
The above embodiment adopts a configuration in which the edit data A and the edit data B are each acquired by synchronous communication. The data is transmitted using, for example, the POST method of the HTTP protocol.
Alternatively, a configuration may be adopted in which the items of the edit data A and the items of the edit data B are acquired sequentially by asynchronous communication. The data may be transmitted using, for example, an XMLHttpRequest object.
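As a rough illustration of transmitting edit data with the POST method, the sketch below builds (but does not send) an HTTP request with Python's standard library. The JSON body, the field names, and the `example.invalid` URL are assumptions; the specification states only that POST is used. In a browser, the asynchronous variant would instead issue the request through an XMLHttpRequest object.

```python
import json
from urllib.request import Request


def build_edit_request(url: str, edit_data: dict) -> Request:
    """Build a POST request carrying one edit data record as JSON."""
    body = json.dumps(edit_data).encode("utf-8")
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": "application/json"})


# Hypothetical endpoint and field names, for illustration only.
req = build_edit_request(
    "https://example.invalid/edit-a",
    {"movie_id": "m1", "start_s": 1.0, "end_s": 3.0,
     "edited_text": "hi", "selected": True},
)
```

Passing `req` to `urllib.request.urlopen` would perform the transfer as a blocking call, which corresponds to the synchronous configuration.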

[3-2. Modification of the processing entity]
The above embodiment adopts a configuration in which the image generation system 20 (in particular, the data processing server 22) executes the data processing for generating the thumbnail image. The user terminal 10 plays a role corresponding to an input/output device of the image generation system 20.
Alternatively, a configuration may be adopted in which the user terminal 10 executes at least part of the data processing for generating the thumbnail image. For example, having a user terminal 10 such as a PC execute the speech recognition processing reduces the processing load on the image generation system 20 (data processing server 22).

[3-3. Modification of the deciding entity]
The above embodiment adopts a configuration in which the designation of the designated range, the designation of the designated frame, the specification of the position of the sound source, the designation of the arrangement area of the superimposed text, and so on are performed based on user operations.
Alternatively, a configuration may be adopted in which the image generation system 20 (for example, the data processing server 22) performs at least part of these processes without relying on user operations. For example, the designation or specification may be made randomly based on pseudo-random numbers, or based on a predetermined condition.
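Both automatic options can be sketched for the designated frame. The midpoint rule below is only one example of a "predetermined condition", and the function name and `seed` parameter are hypothetical.

```python
import random
from typing import Optional


def auto_designate_frame(first: int, last: int,
                         seed: Optional[int] = None) -> int:
    """Pick a frame index in [first, last] without user input.

    With seed=None a fixed rule (the middle frame) is applied; otherwise the
    frame is drawn from a pseudo-random generator seeded with `seed`.
    """
    if first > last:
        raise ValueError("empty frame range")
    if seed is None:
        return (first + last) // 2
    return random.Random(seed).randint(first, last)
```

Seeding the generator makes the pseudo-random choice reproducible, which is convenient when the same thumbnail image has to be regenerated.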

[3-4. Modification of the user operation]
In the above embodiment, the position of the sound source is specified according to the position designated by a user operation that serves as both a position designation and an area designation, and the arrangement area of the superimposed text is set according to the area designated by that user operation. Specifically, the above embodiment adopts a configuration in which the position of the sound source is specified according to the position of the start point of the trajectory of a user operation that draws a closed figure on the screen, and the arrangement area of the text is set according to the position of the closed area formed by the trajectory.
Alternatively, a configuration may be adopted in which the position of the sound source is specified according to the position of the start point of the trajectory of a user operation that draws a line on the screen, and the arrangement area of the text is set to a fixed area containing the position of the end point of the trajectory. The arrangement area is preferably set so that the text does not overlap other subjects.

An example of the arrangement mode of the superimposed text will be described with reference to FIG. 14. FIG. 14 assumes the case where the position of the sound source of the designated audio is inside the display range of the frame.
The frame 1410 contains a subject 1420. When a user operation of drawing a line is performed on the screen containing the frame 1410, a trajectory 1430 is formed. The position of the sound source is then specified according to the position of the start point 1431 of the trajectory 1430, and the arrangement area of the text is set according to the position of the end point 1432 of the trajectory 1430.
When the position of the sound source of the designated audio is inside the display range of the frame, the superimposed text is placed inside a speech balloon 1440, which is arranged in the area corresponding to the position of the end point 1432 and is associated with the position of the sound source corresponding to the position of the start point 1431.

Another example of the arrangement mode of the superimposed text will be described with reference to FIG. 15. FIG. 15 assumes the case where the position of the sound source of the designated audio is outside the display range of the frame.
The frame 1510 contains a subject 1520. When a user operation of drawing a line is performed on the screen containing the frame 1510, a trajectory 1530 is formed. The position of the sound source is then specified according to the position of the start point 1531 of the trajectory 1530, and the arrangement area of the text is set according to the position of the end point 1532 of the trajectory 1530.
When the position of the sound source of the designated audio is outside the display range of the frame, the superimposed text is placed inside a region 1540, which is arranged in the area corresponding to the position of the end point 1532 without being associated with the position of the sound source corresponding to the position of the start point 1531.
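The line-drawing variant of FIGS. 14 and 15 can be sketched as a single function: the start point decides the arrangement mode and the end point anchors a fixed-size text area. Centering the area on the end point, clamping it into the frame, and the mode names `"balloon"` and `"plain"` are assumptions; avoiding overlap with other subjects is not attempted here.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def place_text(trajectory: List[Point], frame_w: float, frame_h: float,
               box_w: float, box_h: float) -> Tuple[str, Box]:
    """Return (arrangement mode, text area) for a line-shaped trajectory."""
    if len(trajectory) < 2:
        raise ValueError("a line needs a start point and an end point")
    if box_w > frame_w or box_h > frame_h:
        raise ValueError("text area does not fit in the frame")
    (sx, sy), (ex, ey) = trajectory[0], trajectory[-1]
    # Start point inside the frame: the sound source is visible, so the
    # text is tied to it (speech balloon); otherwise it stands alone.
    inside = 0 <= sx < frame_w and 0 <= sy < frame_h
    # Center the area on the end point, then shift it back into the frame.
    left = min(max(ex - box_w / 2, 0), frame_w - box_w)
    top = min(max(ey - box_h / 2, 0), frame_h - box_h)
    mode = "balloon" if inside else "plain"
    return mode, (left, top, left + box_w, top + box_h)
```

When the end point itself lies inside the frame, the clamped area still contains it, which matches the requirement that the fixed area include the end point of the trajectory.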

10 User terminal
20 Image generation system
21 User management server
22 Data processing server (an example of an information processing apparatus)
23 Data management server
24 File management server
25 Storage
30 Communication network

Claims

An information processing apparatus comprising:
specifying means for specifying, based on a user operation, a position of a sound source of a sound included in a designated range designated by a user in a movie, in relation to a frame included in the designated range; and
setting means for setting an arrangement mode of superimposed text that corresponds to the sound and is superimposed on the frame, based on the position of the sound source specified by the specifying means and a trajectory drawn by the user operation.

The information processing apparatus according to claim 1, wherein
the user operation is an operation of drawing a closed figure,
the specifying means specifies the position of the sound source according to a position of an end point of the trajectory, and
the setting means sets an arrangement area of the superimposed text according to a position of a closed area formed by the closed figure.

An information processing apparatus comprising:
designation means for designating, as a designated range, a range selected by a user from a candidate list including at least a plurality of candidate ranges of a movie that correspond to sounds extracted from the movie;
specifying means for specifying a position of a sound source of a sound included in the designated range, in relation to a frame included in the designated range; and
setting means for setting an arrangement mode of superimposed text that corresponds to the sound and is superimposed on the frame, based on the position of the sound source specified by the specifying means.

The information processing apparatus according to claim 3, wherein the setting means sets the arrangement mode according to the relationship between the position of the sound source specified by the specifying means and a display range of the frame.

The information processing apparatus according to claim 4, wherein the setting means sets the arrangement mode such that the superimposed text is arranged in association with the sound source when the position of the sound source is inside the display range, and sets the arrangement mode such that the superimposed text is arranged without being associated with the sound source when the position of the sound source is outside the display range.

The information processing apparatus according to claim 4, wherein
the specifying means specifies the position of the sound source according to a position designated by a user operation that serves as both a position designation and an area designation, and
the setting means sets an arrangement area of the superimposed text according to an area designated by the user operation.

The information processing apparatus according to claim 3, wherein
the specifying means specifies, for each designated range designated by the designation means, the position of the sound source of the sound included in that designated range, and
the setting means sets, for each designated range designated by the designation means, the arrangement mode of the superimposed text to be superimposed on the frame included in that designated range, based on the position of the sound source of the sound included in that designated range.

The information processing apparatus according to claim 7, wherein the designation means presents to the user the candidate list, which includes a plurality of candidate ranges respectively associated with a plurality of candidate texts that respectively correspond to groups of sounds included in the movie, and has the user select the designated ranges.

An information processing method comprising:
a specifying step of specifying, based on a user operation, a position of a sound source of a sound included in a designated range designated by a user in a movie, in relation to a frame included in the designated range; and
a setting step of setting an arrangement mode of superimposed text that corresponds to the sound and is superimposed on the frame, based on the position of the sound source specified in the specifying step and a trajectory drawn by the user operation.

An information processing method comprising:
a designation step of designating, as a designated range, a range selected by a user from a candidate list including at least a plurality of candidate ranges of a movie that correspond to sounds extracted from the movie;
a specifying step of specifying a position of a sound source of a sound included in the designated range, in relation to a frame included in the designated range; and
a setting step of setting an arrangement mode of superimposed text that corresponds to the sound and is superimposed on the frame, taking into account the position of the sound source specified in the specifying step.

A control program that causes a computer of an information processing apparatus to implement:
a specifying function of specifying, based on a user operation, a position of a sound source of a sound included in a designated range designated by a user in a movie, in relation to a frame included in the designated range; and
a setting function of setting an arrangement mode of superimposed text that corresponds to the sound and is superimposed on the frame, based on the position of the sound source specified by the specifying function and a trajectory drawn by the user operation.

A control program that causes a computer of an information processing apparatus to implement:
a designation function of designating, as a designated range, a range selected by a user from a candidate list including at least a plurality of candidate ranges of a movie that correspond to sounds extracted from the movie;
a specifying function of specifying a position of a sound source of a sound included in the designated range, in relation to a frame included in the designated range; and
a setting function of setting an arrangement mode of superimposed text that corresponds to the sound and is superimposed on the frame, taking into account the position of the sound source specified by the specifying function.