JP6773349B1

JP6773349B1 - Information processing device and program

Info

Publication number: JP6773349B1
Application number: JP2019222880A
Authority: JP
Inventors: 松尾 幸治
Original assignee: カクテルメイク株式会社
Priority date: 2019-12-10
Filing date: 2019-12-10
Publication date: 2020-10-21
Anticipated expiration: 2039-12-10
Also published as: JP2021093618A

Abstract

PROBLEM TO BE SOLVED: To provide an information processing device and a program that allow a user who edits AV data to easily superimpose the audio data contained in the AV data on the AV data as text data and display it. SOLUTION: In the information processing device, an editing unit 102 edits AV data to be processed, which includes at least audio data and image data, by processing the image data based on the audio data. A text generation unit 103 generates, based on the audio data, text data indicating the content of the audio. A target determination unit 104 divides the generated text data into character strings of a predetermined unit and determines one or more pieces of divided text data as editing targets. The editing unit 102 further edits the AV data to be processed based on the determined one or more pieces of divided text data. [Selected drawing] FIG. 7

Description

The present invention relates to an information processing device and a program.

Conventionally, as a technique for helping viewers understand AV (Audio Visual) data, there has been a technique for superimposing the audio data contained in AV data on the AV data as text data and displaying it (see, for example, Patent Document 1).

Japanese Unexamined Patent Publication No. 2012-105234

In recent years, however, as so-called video sharing services have become commonplace, a growing number of users upload AV data they have edited themselves to such services. These users want to be able to edit AV data with simple operations.

An object of the present invention is to enable a user who edits AV data to easily perform an operation of superimposing the audio data contained in the AV data on the AV data as text data and displaying it.

In order to achieve the above object, an information processing device according to one aspect of the present invention includes
editing means for editing data to be processed, which includes at least audio data and image data, by processing the image data based on the audio data.

A program according to one aspect of the present invention is a program corresponding to the information processing device according to the above aspect of the present invention.

According to the present invention, a user who edits AV data can easily perform an operation of superimposing the audio data contained in the AV data on the AV data as text data and displaying it.

FIG. 1 is a diagram showing an outline of an example of the present service, which can be realized by an information processing system including a server according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of a method of combining adjacent divided text data, among the methods of individually editing divided text data.
FIG. 3 is a diagram showing an example of a method of editing designated divided text data, among the methods of individually editing divided text data.
FIG. 4 is a diagram showing an example of methods for setting the time range in which a telop is displayed on the AV data and for setting the appearance of the telop, among the methods of individually editing divided text data.
FIG. 5 is a diagram showing an example of the configuration of an information processing system including a server according to an embodiment of the present invention.
FIG. 6 is a block diagram showing an example of the hardware configuration of the server in the information processing system of FIG. 5.
FIG. 7 is a functional block diagram showing an example of the functional configuration for executing the edit reception process, among the functional configurations of the server of FIG. 6.
FIG. 8 is a flowchart showing the flow of the edit reception process whose execution is controlled by the server having the functional configuration of FIG. 7.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.

In the following, the term "image" by itself includes both "moving images" and "still images".
Further, "moving images" include images displayed by each of the following first to third processes.
The first process is a process of displaying, for each motion of an object (for example, an animation character) in a planar image (2D image), a series of still images while switching them continuously over time. Specifically, for example, two-dimensional animation, that is, processing based on the so-called flip-book principle, corresponds to the first process.
The second process is a process in which a motion corresponding to each movement of an object (for example, an animation character) in a stereoscopic image (an image of a 3D model) is set in advance, and the motion is changed and displayed over time. Specifically, for example, three-dimensional animation corresponds to the second process.
The third process is a process in which a video (that is, a moving image) corresponding to each movement of an object (for example, an animation character) is prepared in advance and played over time.
Here, a "video (that is, a moving image)" is composed of a plurality of images such as frames or fields (hereinafter referred to as "unit images"). In the following examples, the unit image is assumed to be a frame.

First, with reference to FIGS. 1 to 3, an outline of a service that can be realized by the information processing system of FIG. 5 described later (hereinafter referred to as "the present service") will be described.

FIG. 1 is a diagram showing an outline of an example of the present service, which can be realized by an information processing system including a server according to an embodiment of the present invention.

The present service is an example of a service provided by a service provider G (see FIG. 5) to a user U (see FIG. 5) who edits AV data.
In the present service, the audio data in AV data D, which includes audio data and image data, is output as editable text data T (hereinafter referred to as "text data T").
Through a predetermined setting operation by the user U, the output text data T can be superimposed and displayed at any position and at any timing in the AV data D being played back.

Here, "audio data" includes data obtained by digitizing the voice of a person M who appears in the AV data D as a subject, or of a person M who appears by voice only, as in narration. Data obtained by digitizing sounds emitted by non-human creatures, objects, and the like is also included in "audio data".
Specifically, for example, the cries of dogs and cats, background music (BGM) in a restaurant, the engine sound of an automobile, and the like are all included in audio data.

Without needing any specialized knowledge, the user U can output the audio data contained in the AV data D to be edited as text data T simply by operating his or her own terminal 2, such as a smartphone (hereinafter referred to as "user terminal 2"). The user U can also edit the AV data D by editing the output text data T.

FIG. 1 shows an example of a UI (User Interface) displayed on the user terminal 2 of the user U who uses the present service. The UI shown in FIG. 1 is configured to include at least a display area F1 and a display area F2.
In the display area F1, the AV data D to be edited is displayed in a playable state. In the AV data D of FIG. 1, a person M appearing as the subject talks about various topics to the viewers of the AV data D.
In the display area F2, a part of the text data T, obtained by converting the audio data contained in the AV data D to be edited into text, is displayed. Specifically, of the text data t1 to tn divided into n character strings (n is an integer of 1 or more; hereinafter referred to as "divided text data t1 to tn"), the divided text data t1 to t7 are displayed in the display area F2.
That is, the display area F2 shows the divided text data t1 "はいどうも" ("Hi there"), the divided text data t2 "タケノコです" ("Takenoko here"), the divided text data t3 "今日はですねこの" ("Well, today, this"), the divided text data t4 "ｕｓｂｃハブを" ("USB-C hub"), the divided text data t5 "紹介させていただこうと" ("I would like to introduce"), the divided text data t6 "思います" ("I think"), and the divided text data t7 "見てください" ("Take a look").
In the example of FIG. 1, a button B1 labeled "書き出し" ("Export") is displayed so as to overlap the divided text data t6. Therefore, at the display timing shown in FIG. 1, the divided text data t6 cannot be seen.
Further, in the example of FIG. 1, only the divided text data t1 to t7 of the divided text data t1 to tn are displayed. However, the user U can display the divided text data t8 to tn in sequence by swiping the display area F2 upward.

Here, when the user U taps the button B1 labeled "書き出し" ("Export") in the display area F2, the full text of the text data T (not shown) can be output in a predetermined format.
Specifically, for example, when the user U taps the button B1, the text data T is output in the predetermined format with the following full text: "はいどうもタケノコです今日はですねこのｕｓｂｃハブを紹介させていただこうと思いますこれがですね見てくださいｈｄｍｉのケーブルもガッツリささですね僕も何度か使用しているんですけどねこれがね接続不良が全然起きないんですよ" (roughly: "Hi there, Takenoko here. Today I would like to introduce this USB-C hub. Here it is, take a look. The HDMI cable plugs in firmly too. I have used it several times myself, and it never has any connection problems.").
The format in which the user U outputs the full text of the text data T is not particularly limited. For example, the full text of the text data T can be displayed as-is on the user terminal 2, or it can be output as a data file.
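The export described above can be sketched in Python as follows. This is a minimal illustration only: the function name `export_full_text`, the list-of-strings representation of the divided text data, and the optional file path are assumptions introduced here, not details taken from the specification.

```python
from pathlib import Path
from typing import List, Optional


def export_full_text(segments: List[str], path: Optional[str] = None) -> str:
    """Join the divided text data t1..tn back into the full text data T.

    If `path` is given (hypothetical option), the full text is also
    written to that file as UTF-8; otherwise it is only returned.
    """
    full_text = "".join(segments)  # Japanese text needs no separator
    if path is not None:
        Path(path).write_text(full_text, encoding="utf-8")
    return full_text
```

Returning the string covers the "display as-is" case, while passing a path covers the "output as a data file" case.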

The user U can also designate any divided text data tk (k is any integer from 1 to n) among the divided text data t1 to tn and edit it individually.
Hereinafter, methods of individually editing the divided text data tk will be described with reference to FIGS. 2 and 3.

FIG. 2 is a diagram showing an example of a method of combining adjacent divided text data, among the methods of individually editing divided text data.

FIG. 2A shows an example of a method of combining the divided text data tk with the divided text data tk+1.
FIG. 2B shows an example of the state after the divided text data tk and the divided text data tk+1 have been combined.
In the method shown in FIG. 2A, the user U drags the editing object Jk+1, which represents the divided text data tk+1, onto the editing object Jk, which represents the divided text data tk. This combines the divided text data tk and the divided text data tk+1.
Specifically, as shown in FIG. 2A, the user U drags the editing object J2, which represents the divided text data t2, so that it overlaps the editing object J1, which represents the divided text data t1.
As a result, as shown in FIG. 2B, the divided text data t2 "タケノコです" is appended to the divided text data t1 "はいどうも", and the divided text data t1 is displayed as "はいどうもタケノコです" ("Hi there, Takenoko here").
Further, because the drag operation combines the divided text data t2 into the divided text data t1, the divided text data t3 to t8 below it each move up by one position. That is, the divided text data t3 to t8 shown in FIG. 2A are displayed as the divided text data t2 to t7 in FIG. 2B. In addition, the divided text data t9 "見てください" ("Take a look"), which was not displayed in the state shown in FIG. 2A (the state before the drag operation), moves up to become the divided text data t8 and is displayed in the state shown in FIG. 2B.
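The combining operation of FIG. 2 can be sketched in Python as follows. The `Segment` class, its `start` and `end` timestamps (in seconds), and the function name `merge_with_next` are assumptions made for illustration; the specification does not say how divided text data is stored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One piece of divided text data (hypothetical representation)."""
    text: str
    start: float  # assumed: time the utterance begins, in seconds
    end: float    # assumed: time the utterance ends, in seconds


def merge_with_next(segments: List[Segment], k: int) -> List[Segment]:
    """Combine segment k+1 into segment k (0-based) and return a new list.

    Later segments move up by one position, as in FIG. 2B.
    """
    if not 0 <= k < len(segments) - 1:
        raise IndexError("segment k must have a following segment")
    first, second = segments[k], segments[k + 1]
    merged = Segment(first.text + second.text, first.start, second.end)
    return segments[:k] + [merged] + segments[k + 2:]
```

Returning a new list rather than mutating in place keeps the pre-merge state available, which would make an undo operation straightforward if one were needed.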

As described above, the user U can designate any divided text data tk among the divided text data t1 to tn as an editing target and edit it individually.
Specifically, for example, as shown in FIG. 2B, the user U taps the button B2 labeled "..." at the right end of the editing object J1, which represents the divided text data t1. This allows the user U to designate the divided text data t1 as the editing target and edit it individually.

FIG. 3 is a diagram showing an example of a method of editing designated divided text data, among the methods of individually editing divided text data.

FIG. 3A shows the user U editing the divided text data tk. When the button B2 shown in FIG. 2B is tapped, buttons B4 for editing the divided text data tk are displayed in the display area F2, as shown in FIG. 3A.
Specifically, for example, as shown in FIG. 3A, when the divided text data t1 is designated as the editing target, the editing object J1 representing the divided text data t1 becomes active. At the same time, the buttons B4 for selecting input characters are displayed in the display area F2.
This allows the user U to edit the divided text data t1 freely. FIG. 3A shows an example in which the text "はいどうもタケノコです" has been edited to "はいどうもタケノコで" (the final character has been deleted).
When the editing work is complete, the user U taps the button B3, which indicates completion. This allows the user U to superimpose the telop P1 (an on-screen caption), which is the display object corresponding to the divided text data t1 designated as the editing target, on the AV data D and display it.

In FIG. 3B, a button B5 and a button B6 are displayed to the right of the editing object J1, which represents the divided text data t1. The button B5 is a button that the user U taps, for example, to set the time range in which the telop Pk corresponding to the edited divided text data tk is displayed on the AV data D. The button B6 is a button that the user U taps, for example, to set whether or not the telop Pk is displayed on the AV data D.
In the example of FIG. 3B, when the user U taps the button B5, a UI such as that shown in FIG. 4A is displayed on the user terminal 2. That is, the user terminal 2 displays a UI with which the user U sets the time range in which the telop P1 is displayed on the AV data D. A specific example of this UI will be described later with reference to FIG. 4A.
Further, when the user U taps the button B6, the telop P1 can be hidden so that it is not displayed on the AV data D.

FIG. 4 is a diagram showing an example of methods for setting the time range in which a telop is displayed on the AV data and for setting the appearance of the telop, among the methods of individually editing divided text data.

FIG. 4A shows an example of a UI for setting the time range in which the telop P1 corresponding to the divided text data t1 "はいどうもタケノコで" illustrated in FIG. 3 is superimposed and displayed on the AV data D.
As shown in FIG. 4A, the user U drags each of the setting bars R1 and R2 on the timeline L shown in the display area F2. This allows the user U to freely set the time range in which the telop P1 is superimposed and displayed on the AV data D.
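The time-range setting of FIG. 4A, together with the show/hide setting of button B6, can be sketched in Python as follows. The `Telop` class, the use of seconds as the unit, and the clamping behaviour are assumptions for illustration and are not stated in the specification.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Telop:
    """A caption overlaid on the AV data (hypothetical representation)."""
    text: str
    start: float          # assumed: position of setting bar R1, in seconds
    end: float            # assumed: position of setting bar R2, in seconds
    visible: bool = True  # assumed: toggled by button B6


def set_time_range(telop: Telop, start: float, end: float, duration: float) -> Telop:
    """Apply dragged bar positions, clamped to the length of the AV data."""
    start = max(0.0, min(start, duration))
    end = max(0.0, min(end, duration))
    if start > end:  # bars dragged past each other
        start, end = end, start
    telop.start, telop.end = start, end
    return telop


def active_telops(telops: List[Telop], t: float) -> List[Telop]:
    """Return the telops that should be drawn at playback time t."""
    return [p for p in telops if p.visible and p.start <= t < p.end]
```

A player would call `active_telops` once per rendered frame with the current playback time and draw the returned captions over the frame.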

The display area F2 of FIG. 4B shows a UI for setting the appearance of the telop P when it is displayed on the AV data D.
Specifically, the display area F2 of FIG. 4B shows, as the UI, a button B7 and a button B8 for setting the font of the telop P and the font color of the telop P, respectively.
This allows the user U to freely set the appearance of the telop P displayed on the AV data D. Specifically, for example, as illustrated in FIG. 4B, for the telop P "おはようございます" ("Good morning") superimposed on the AV data D, the font can be set to "Gothic 1" and the font color to "white".

Next, the configuration of the information processing system that realizes the present service will be described.
FIG. 5 is a diagram showing an example of the configuration of an information processing system including a server according to an embodiment of the present invention.

The information processing system shown in FIG. 5 is configured to include a server 1 and a user terminal 2.
The server 1 and the user terminal 2 are connected to each other via a predetermined network N such as the Internet.

The server 1 is an information processing device managed by the service provider G. The server 1 executes various processes for realizing the present service while communicating with the user terminal 2 as appropriate.

The user terminal 2 is an information processing device operated by the user U, and is, for example, a personal computer, a smartphone, or a tablet.

FIG. 6 is a block diagram showing an example of the hardware configuration of the server in the information processing system of FIG. 5.

The server 1 includes a CPU (Central Processing Unit) 11, a ROM (Read Only Memory) 12, a RAM (Random Access Memory) 13, a bus 14, an input/output interface 15, an input unit 16, an output unit 17, a storage unit 18, a communication unit 19, and a drive 20.

The CPU 11 executes various processes according to a program recorded in the ROM 12 or a program loaded from the storage unit 18 into the RAM 13.
The RAM 13 also stores, as appropriate, data and the like that the CPU 11 needs in order to execute the various processes.

The CPU 11, the ROM 12, and the RAM 13 are connected to each other via the bus 14. The input/output interface 15 is also connected to the bus 14. The input unit 16, the output unit 17, the storage unit 18, the communication unit 19, and the drive 20 are connected to the input/output interface 15.

The input unit 16 is composed of, for example, a keyboard, and inputs various information.
The output unit 17 is composed of a display such as a liquid crystal display, a speaker, and the like, and outputs various information as images and sound.
The storage unit 18 is composed of a DRAM (Dynamic Random Access Memory) or the like, and stores various data.
The communication unit 19 communicates with other devices (for example, the user terminal 2 of FIG. 5) via the network N, which includes the Internet.

A removable medium 30, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 20 as appropriate. A program read from the removable medium 30 by the drive 20 is installed in the storage unit 18 as needed.
The removable medium 30 can also store the various data stored in the storage unit 18, in the same manner as the storage unit 18.

Although not shown, the user terminal 2 of FIG. 5 can have the same hardware configuration as that shown in FIG. 6. A description of the hardware configuration of the user terminal 2 is therefore omitted.

The cooperation of the various hardware and software of the server 1 of FIG. 6 makes it possible to execute various processes on the server 1, including the edit reception process. As a result, the service provider G can provide the present service described above.

The "edit reception process" is the process that realizes the present service described above.
Hereinafter, the functional configuration for executing the edit reception process, whose execution is controlled on the server 1, will be described.

FIG. 7 is a functional block diagram showing an example of the functional configuration for executing the edit reception process, among the functional configurations of the server of FIG. 6.

As shown in FIG. 7, when execution of the edit reception process is controlled, an acquisition unit 101, an editing unit 102, a text generation unit 103, a target determination unit 104, and a display control unit 105 function in the CPU 11 of the server 1.
Further, an AV data DB 181 is provided in one area of the storage unit 18 of the server 1. The AV data DB 181 stores and manages one or more pieces of AV data D created or acquired by the user terminal 2.

The acquisition unit 101 acquires the AV data D to be processed, which includes at least audio data and image data. The AV data D acquired by the acquisition unit 101 is stored and managed in the AV data DB 181.

The editing unit 102 edits the AV data D to be processed, acquired by the acquisition unit 101, by processing the image data based on the audio data. Specifically, as the processing of the image data, the editing unit 102 performs editing such as that shown in FIGS. 2 to 4 described above.
Further, the editing unit 102 edits the AV data D to be processed based on the divided text data t1 to tn determined as editing targets by the target determination unit 104 described later.

The text generation unit 103 generates text data T indicating the content of the audio, based on the audio data contained in the AV data D to be processed acquired by the acquisition unit 101.
Specifically, the text generation unit 103 recognizes the audio data contained in the AV data D to be processed acquired by the acquisition unit 101, and generates text data T indicating the content of that audio data. For example, in the examples of FIGS. 1 to 4 described above, the text generation unit 103 generates text data T with the following content: "はいどうもタケノコです今日はですねこのｕｓｂｃハブを紹介させていただこうと思いますこれがですね見てくださいｈｄｍｉのケーブルもガッツリささですね僕も何度か使用しているんですけどねこれがね接続不良が全然起きないんですよ" (roughly: "Hi there, Takenoko here. Today I would like to introduce this USB-C hub. Here it is, take a look. The HDMI cable plugs in firmly too. I have used it several times myself, and it never has any connection problems.").
The method used by the text generation unit 103 to generate the text data T is not particularly limited. For example, a conventional transcription method can be adopted, or an automatic recognition method using AI (artificial intelligence) can be adopted.

The target determination unit 104 divides the text data T generated by the text generation unit 103 into the divided text data t1 to tn, and determines one or more pieces of divided text data t as editing targets.
Here, the editing targets may be determined automatically by the target determination unit 104, or may be selected by the user U. The specific method used when the editing targets are determined automatically is not particularly limited; for example, a predetermined algorithm or a technique such as machine learning by AI (artificial intelligence) is used.
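The specification leaves the unit of division open. As one possible "predetermined algorithm", the following Python sketch splits the recognized text wherever the speaker pauses. Everything here is an assumption for illustration: that the recognizer returns per-word timestamps, the `Word` tuple layout, the function name `split_on_pauses`, and the 0.4-second default threshold.

```python
from typing import List, Tuple

# Assumed recognizer output: (text, start_seconds, end_seconds) per word.
Word = Tuple[str, float, float]


def split_on_pauses(words: List[Word], max_gap: float = 0.4) -> List[Word]:
    """Group words into divided text data, starting a new group whenever
    the silence between two consecutive words exceeds `max_gap` seconds."""
    groups: List[List[Word]] = []
    current: List[Word] = []
    for word in words:
        # Gap between this word's start and the previous word's end.
        if current and word[1] - current[-1][2] > max_gap:
            groups.append(current)
            current = []
        current.append(word)
    if current:
        groups.append(current)
    # Each group becomes one segment: joined text, first start, last end.
    return [("".join(w[0] for w in g), g[0][1], g[-1][2]) for g in groups]
```

Other units, such as a fixed maximum character count or sentence boundaries, would fit the description equally well; pause-based splitting is used here only because it yields time ranges that could also serve as initial telop display ranges.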

The display control unit 105 executes control for displaying a predetermined UI that supports editing by the editing unit 102.
Specifically, for example, the display control unit 105 executes control for displaying the UI shown in FIGS. 1 to 4 on the user terminal 2 as the predetermined UI that supports editing by the editing unit 102.
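How the functional units described above hand data to one another can be sketched in Python as follows. This is only a schematic: the function name `run_edit_reception`, the injected `recognize` and `split` callables, and the returned dictionary are hypothetical, and every segment is treated as an editing target, which is just one of the options the description allows.

```python
from typing import Callable, Dict, List


def run_edit_reception(
    av_data: bytes,
    recognize: Callable[[bytes], str],
    split: Callable[[str], List[str]],
) -> Dict[str, object]:
    """Chain the roles of units 103 to 105 for one piece of AV data D."""
    text = recognize(av_data)             # text generation unit 103
    segments = split(text)                # target determination unit 104: divide
    targets = list(range(len(segments)))  # unit 104: here, all are targets
    # The result is what a display control step (unit 105) would render as a UI.
    return {"text": text, "segments": segments, "targets": targets}
```

Passing the recognizer and the splitter in as parameters mirrors the statement that neither method is particularly limited, so either can be swapped without changing the surrounding flow.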

次に、図８を参照して、図７の機能的構成を有するサーバ１により実行が制御される編集受付処理の流れについて説明する。
図８は、図７の機能的構成を有するサーバ１により実行が制御される編集受付処理の流れを示すフローチャートである。 Next, with reference to FIG. 8, the flow of the edit reception process whose execution is controlled by the server 1 having the functional configuration of FIG. 7 will be described.
FIG. 8 is a flowchart showing a flow of edit reception processing whose execution is controlled by the server 1 having the functional configuration of FIG. 7.

即ち、図７のサーバ１により編集受付処理の実行が制御される場合には、ステップＳ１において、サーバ１のテキスト生成部１０３は、処理対象となるＡＶデータＤが選択されたか否かを判定する。
所定のＡＶデータＤが処理対象として選択された場合には、ステップＳ１において「ＹＥＳ」と判定されて、処理はステップＳ２に進む。
これに対して、処理対象となるＡＶデータＤが選択されていない場合には、ステップＳ１において「ＮＯ」と判定されて、所定のＡＶデータＤが処理対象として選択されるまで、ステップＳ１の処理の制御が繰り返し実行される。 That is, when the execution of the edit acceptance process is controlled by the server 1 of FIG. 7, in step S1, the text generation unit 103 of the server 1 determines whether or not the AV data D to be processed has been selected.
When predetermined AV data D has been selected as the processing target, the determination in step S1 is "YES", and the process proceeds to step S2.
On the other hand, when no AV data D to be processed has been selected, the determination in step S1 is "NO", and the control of step S1 is repeated until predetermined AV data D is selected as the processing target.

ステップＳ２において、サーバ１のテキスト生成部１０３は、処理対象として選択されたＡＶデータＤに含まれる音声のデータを認識する。
ステップＳ３において、サーバ１のテキスト生成部１０３は、処理対象として選択されたＡＶデータＤに含まれる音声のデータに基づいて、音声のデータの音声の内容を示すテキストデータＴを生成する。
ステップＳ４において、サーバ１の対象決定部１０４は、ステップＳ３でテキスト生成部１０３により生成されたテキストデータＴを、区分テキストデータｔ１乃至ｔｎに区分する。
ステップＳ５において、サーバ１の対象決定部１０４は、ステップＳ４で区分した区分テキストデータｔ１乃至ｔｎのうち、１以上の区分テキストデータｔを編集対象として決定する。
ステップＳ６において、サーバ１の表示制御部１０５は、編集部１０２による編集を支援するためのＵＩとして、操作対象となる１以上の区分テキストデータｔを含むＵＩをユーザ端末２に表示する制御を実行する。 In step S2, the text generation unit 103 of the server 1 recognizes the voice data included in the AV data D selected as the processing target.
In step S3, the text generation unit 103 of the server 1 generates text data T indicating the content of the voice of the voice data based on the voice data included in the AV data D selected as the processing target.
In step S4, the target determination unit 104 of the server 1 divides the text data T generated by the text generation unit 103 in step S3 into the division text data t1 to tn.
In step S5, the target determination unit 104 of the server 1 determines, as the editing target, one or more pieces of the divided text data t1 to tn obtained in step S4.
In step S6, the display control unit 105 of the server 1 executes control to display, on the user terminal 2, a UI that includes the one or more pieces of divided text data t to be operated on, as the UI for supporting editing by the editing unit 102.

ステップＳ７において、サーバ１の編集部１０２は、区分テキストデータｔを含むＵＩを介して、処理対象となるＡＶデータＤの区分テキストデータが編集されたか否かを判定する。処理対象となるＡＶデータＤの区分テキストデータが編集された場合には、ステップＳ７において「ＹＥＳ」と判定されて、処理はステップＳ８に進む。
これに対して、処理対象となるＡＶデータＤの区分テキストデータが編集されていない場合には、ステップＳ７において「ＮＯ」と判定されて、処理対象となるＡＶデータＤの区分テキストデータが編集されるまでステップＳ７の処理の制御が繰り返し実行される。 In step S7, the editing unit 102 of the server 1 determines whether or not the divided text data of the AV data D to be processed has been edited via the UI including the divided text data t. When the divided text data of the AV data D to be processed has been edited, the determination in step S7 is "YES", and the process proceeds to step S8.
On the other hand, when the divided text data of the AV data D to be processed has not been edited, the determination in step S7 is "NO", and the control of step S7 is repeated until the divided text data of the AV data D to be processed is edited.

ステップＳ８において、サーバ１の編集部１０２は、区分テキストデータｔに対する編集を受付ける。
これにより、サーバ１により実行が制御される編集受付処理が終了する。 In step S8, the editing unit 102 of the server 1 accepts edits to the divided text data t.
As a result, the edit acceptance process whose execution is controlled by the server 1 ends.
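
The flow of steps S1 to S8 can be sketched as follows. This is a hedged illustration only: the function `edit_acceptance` and its parameters are hypothetical, speech recognition and UI display are passed in as stand-in callables, and the two waiting loops (steps S1 and S7) are modeled as iterators that yield `None` until the user acts.

```python
from typing import Callable, Iterator, List, Optional, Tuple


def edit_acceptance(
    selections: Iterator[Optional[bytes]],
    transcribe: Callable[[bytes], str],
    segment: Callable[[str], List[str]],
    display: Callable[[List[str]], None],
    edits: Iterator[Optional[Tuple[int, str]]],
) -> List[str]:
    # S1: repeat until AV data D has been selected ("NO" while None).
    av_data = next(s for s in selections if s is not None)
    # S2-S3: recognize the audio and generate text data T.
    text_data = transcribe(av_data)
    # S4: divide T into divided text data t1..tn.
    segments = segment(text_data)
    # S5: determine the editing targets (assumption: all segments).
    targets = list(segments)
    # S6: display a UI containing the targets on the user terminal.
    display(targets)
    # S7: repeat until a segment has been edited ("NO" while None).
    index, new_text = next(e for e in edits if e is not None)
    # S8: accept the edit to the divided text data.
    targets[index] = new_text
    return targets


shown: List[List[str]] = []
result = edit_acceptance(
    selections=iter([None, None, b"av-data"]),
    transcribe=lambda data: "a. b.",
    segment=lambda text: ["a.", "b."],
    display=shown.append,
    edits=iter([None, (1, "B!")]),
)
```

In this sketch a single edit ends the process, mirroring the flowchart, in which the edit acceptance process ends after step S8.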

以上、本発明の一実施形態について説明したが、本発明は、上述の実施形態に限定されるものではなく、本発明の目的を達成できる範囲での変形、改良等は本発明に含まれるものである。 Although one embodiment of the present invention has been described above, the present invention is not limited to the above-described embodiment, and modifications, improvements, and the like within the range in which the object of the present invention can be achieved are included in the present invention.

例えば、上述の実施形態におけるＡＶデータＤに含まれる音声のデータや画像のデータは例示に過ぎず、あらゆるＡＶデータＤを本サービスの対象とすることができる。 For example, the audio data and the image data included in the AV data D in the above-described embodiment are merely examples, and any AV data D can be the target of this service.

また例えば、図１乃至図４では、ＡＶデータＤに登場する人物Ｍが１人のみ描画されているが、これは例示に過ぎない。ＡＶデータＤに登場する人物Ｍは複数人存在してもよい。 Further, for example, in FIGS. 1 to 4, only one person M appearing in the AV data D is drawn, but this is only an example. There may be a plurality of persons M appearing in the AV data D.

また例えば、上述の実施形態におけるテキストデータＴ（即ち音声のデータ）の内容や、テキストデータＴを構成する区分テキストデータｔ１乃至ｔ７の夫々の区分単位や内容は例示に過ぎない。当然ながら上述の実施形態以外の区分単位や内容であってもよい。 Further, for example, the contents of the text data T (that is, voice data) in the above-described embodiment and the respective division units and contents of the division text data t1 to t7 constituting the text data T are merely examples. Of course, it may be a division unit or content other than the above-described embodiment.

また例えば、図２には、区分テキストデータｔ１と区分テキストデータｔ２とを結合させる手法として、区分テキストデータｔ１を示す編集用のオブジェクトＪ１の上に、区分テキストデータｔ２を示す編集用のオブジェクトＪ２を重ねるようにドラッグする手法が示されている。ただし、この手法以外にも、例えば区分テキストデータｔ２を示す編集用のオブジェクトＪ２の上に、区分テキストデータｔ１を示す編集用のオブジェクトＪ１を重ねるようにドラッグしてもよい。これにより、区分テキストデータｔ１と区分テキストデータｔ２とを結合させることができる。 Further, for example, FIG. 2 shows, as a method of combining the divided text data t1 and the divided text data t2, dragging the editing object J2 indicating the divided text data t2 so that it overlaps the editing object J1 indicating the divided text data t1. However, besides this method, the editing object J1 indicating the divided text data t1 may instead be dragged so that it overlaps the editing object J2 indicating the divided text data t2. In either case, the divided text data t1 and the divided text data t2 can be combined.
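
The combining operation can be sketched as follows. This is a minimal illustration under assumptions: the `EditObject` class, its `start` and `end` timestamps, the `sep` parameter, and the requirement that the two objects be adjacent are all hypothetical details not stated in the specification. The point shown is that the result is the same whichever object is dragged onto the other.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EditObject:
    text: str     # divided text data shown by the object
    start: float  # assumed start time in seconds
    end: float    # assumed end time in seconds


def merge_objects(
    objects: List[EditObject], dragged: int, target: int, sep: str = ""
) -> List[EditObject]:
    """Combine the object at index `dragged` with the one at `target`.

    The time-series order decides the order of the combined text, so
    dragging J2 onto J1 and dragging J1 onto J2 give the same result.
    """
    if abs(dragged - target) != 1:
        raise ValueError("only adjacent objects can be combined in this sketch")
    first, second = sorted((dragged, target))
    a, b = objects[first], objects[second]
    merged = EditObject(a.text + sep + b.text, a.start, b.end)
    return objects[:first] + [merged] + objects[second + 1:]


objs = [
    EditObject("Yes,", 0.0, 1.0),
    EditObject("look at this.", 1.0, 2.5),
    EditObject("It works.", 2.5, 4.0),
]
merged_a = merge_objects(objs, dragged=1, target=0, sep=" ")
merged_b = merge_objects(objs, dragged=0, target=1, sep=" ")
```

The function returns a new list rather than mutating its input, which makes it straightforward to keep the previous state for an undo operation.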

また、図５に示すシステム構成や、図６に示すサーバ１のハードウェア構成は、本発明の目的を達成するための例示に過ぎず、特に限定されない。 Further, the system configuration shown in FIG. 5 and the hardware configuration of the server 1 shown in FIG. 6 are merely examples for achieving the object of the present invention, and are not particularly limited.

また、図７に示す機能ブロック図は、例示に過ぎず、特に限定されない。即ち、上述した一連の処理を全体として実行できる機能が情報処理システムに備えられていれば足り、この機能を実現するためにどのような機能ブロックを用いるのかは、特に図７の例に限定されない。 Further, the functional block diagram shown in FIG. 7 is merely an example and is not particularly limited. That is, it suffices if the information processing system is provided with a function capable of executing the above-mentioned series of processes as a whole, and what kind of functional block is used to realize this function is not particularly limited to the example of FIG. 7.

また、機能ブロックの存在場所も、図７に限定されず、任意でよい。
例えば、図７の例において、編集受付処理の実行に必要となる機能ブロックは、サーバ１側が備える構成となっているが、これは例示に過ぎない。例えば本サービスの利用者専用のアプリケーションプログラムをユーザ端末２にインストールさせることにより、これらの機能ブロックの少なくとも一部をユーザ端末２側が備える構成としてもよい。
また、１つの機能ブロックは、ハードウェア単体で構成してもよいし、ソフトウェア単体で構成してもよいし、それらの組み合わせで構成してもよい。 Further, the location of the functional block is not limited to FIG. 7, and may be arbitrary.
For example, in the example of FIG. 7, the functional block required for executing the edit reception process is configured to be provided on the server 1 side, but this is only an example. For example, by installing an application program dedicated to the user of this service on the user terminal 2, at least a part of these functional blocks may be provided on the user terminal 2 side.
Further, one functional block may be configured by a single piece of hardware, a single piece of software, or a combination thereof.

各機能ブロックの処理をソフトウェアにより実行させる場合には、そのソフトウェアを構成するプログラムが、コンピュータ等にネットワークや記録媒体からインストールされる。
コンピュータは、専用のハードウェアに組み込まれているコンピュータであってもよい。また、コンピュータは、各種のプログラムをインストールすることで、各種の機能を実行することが可能なコンピュータ、例えばサーバの他汎用のスマートフォンやパーソナルコンピュータであってもよい。 When the processing of each functional block is executed by software, the programs constituting the software are installed on a computer or the like from a network or a recording medium.
The computer may be a computer embedded in dedicated hardware. Further, the computer may be a computer capable of executing various functions by installing various programs, for example, a general-purpose smartphone or a personal computer in addition to a server.

このようなプログラムを含む記録媒体は、各ユーザにプログラムを提供するために装置本体とは別に配布される、リムーバブルメディアにより構成されるだけではなく、装置本体に予め組み込まれた状態で各ユーザに提供される記録媒体等で構成される。 The recording medium containing such a program is constituted not only by removable media distributed separately from the device main body in order to provide the program to each user, but also by a recording medium or the like that is provided to each user in a state of being incorporated in the device main body in advance.

なお、本明細書において、記録媒体に記録されるプログラムを記述するステップは、その順序に添って時系列的に行われる処理はもちろん、必ずしも時系列的に処理されなくとも、並列的或いは個別に実行される処理をも含むものである。 In the present specification, the steps describing a program recorded on a recording medium include not only processing performed in time series in the stated order, but also processing that is executed in parallel or individually without necessarily being performed in time series.

また、本明細書において、システムの用語は、複数の装置や複数の手段等より構成される全体的な装置を意味するものである。 Further, in the present specification, the term "system" means an overall apparatus composed of a plurality of devices, a plurality of means, and the like.

以上まとめると、本発明が適用される情報処理システムは、次のような構成を取れば足り、各種各様な実施形態を取ることができる。
即ち、本発明が適用される情報処理装置（例えば図７のサーバ１）は、
音声のデータ（例えば音声のデータ）と画像のデータ（例えば画像のデータ）とを少なくとも含む処理対象のデータ（例えばＡＶデータＤ）のうち、前記音声のデータに基づいて、前記画像のデータを加工することで前記処理対象のデータを編集する編集手段（例えば図７の編集部１０２）を備える。 To summarize the above, it suffices for the information processing system to which the present invention is applied to have the following configuration, and it can take a wide variety of embodiments.
That is, the information processing device to which the present invention is applied (for example, the server 1 in FIG. 7) includes
an editing means (for example, the editing unit 102 in FIG. 7) for editing data to be processed (for example, AV data D) that includes at least audio data (for example, audio data) and image data (for example, image data), by processing the image data based on the audio data.

これにより、音声のデータと画像のデータとを含む処理対象のデータのうち、音声のデータに基づいて画像のデータを加工することで処理対象のデータを編集することができる。その結果、専門的な知識がない者であっても、動画共有サービスにアップロードするためのＡＶデータを簡単な操作で編集することが可能となる。 As a result, among the data to be processed including the audio data and the image data, the data to be processed can be edited by processing the image data based on the audio data. As a result, even a person without specialized knowledge can edit AV data for uploading to a video sharing service with a simple operation.

また、前記処理対象のデータに含まれる前記音声のデータに基づいて、当該音声の内容を示すテキストのデータを生成するテキスト生成手段（例えば図６のテキスト生成部１０３）と、
生成された前記テキストを所定単位の文字列に区分して、１以上の文字列を編集対象として決定する編集対象決定手段（例えば図６の対象決定部１０４）と、
をさらに備え、
前記編集手段は、編集対象として決定された前記１以上の文字列に基づいて、前記処理対象のデータを編集することができる。 Further, the information processing device may further include a text generation means (for example, the text generation unit 103 in FIG. 6) that generates text data indicating the content of the audio based on the audio data included in the data to be processed, and
an editing target determination means (for example, the target determination unit 104 in FIG. 6) that divides the generated text into character strings of a predetermined unit and determines one or more character strings as editing targets,
and the editing means can edit the data to be processed based on the one or more character strings determined as the editing targets.

これにより、処理対象のデータに含まれる音声のデータに基づいて、その音声の内容を示すテキストのデータが生成される。また、所定単位の文字列に区分されたテキストのデータに基づいて編集対象が決定される。その結果、専門的な知識がない者であっても、動画共有サービスにアップロードするためのＡＶデータを簡単な操作で編集することが可能となる。 As a result, text data indicating the content of the voice is generated based on the voice data included in the data to be processed. In addition, the editing target is determined based on the text data divided into character strings of a predetermined unit. As a result, even a person without specialized knowledge can edit AV data for uploading to a video sharing service with a simple operation.

また、前記編集を支援するための編集用画面を表示する制御を実行する表示制御手段（例えば図６の表示制御部１０５）をさらに備えることができる。 Further, a display control means (for example, the display control unit 105 in FIG. 6) for executing control for displaying the editing screen for supporting the editing can be further provided.

これにより、処理対象のデータを編集するための画面がスマートフォン等の端末に表示させることができる。その結果、専門的な知識がない者であっても、スマートフォン等の端末に対する簡単な操作で、例えば動画共有サービスにアップロードするためのＡＶデータを容易に編集することが可能となる。 As a result, a screen for editing the data to be processed can be displayed on a terminal such as a smartphone. As a result, even a person without specialized knowledge can easily edit AV data for uploading to a video sharing service, for example, by a simple operation on a terminal such as a smartphone.

また、前記表示制御手段は、
編集対象として決定された前記１以上の文字列の夫々を示すオブジェクトを、前記編集用画面に選択可能に表示させる制御を実行することができる。 Further, the display control means
can execute control for displaying, on the editing screen and in a selectable manner, objects each indicating one of the one or more character strings determined as the editing targets.

これにより、処理対象のデータを編集するための編集用画面に、編集対象として決定された１以上の文字列の夫々を示すオブジェクトが表示される。その結果、例えば動画共有サービスにアップロードするためのＡＶデータを簡単な操作で編集することが可能となる。 As a result, an object indicating each of the one or more character strings determined as the editing target is displayed on the editing screen for editing the data to be processed. As a result, for example, AV data for uploading to a video sharing service can be edited with a simple operation.

１・・・サーバ、２・・・ユーザ端末、１１・・・ＣＰＵ、１２・・・ＲＯＭ、１３・・・ＲＡＭ、１４・・・バス、１５・・・入出力インターフェース、１６・・・入力部、１７・・・出力部、１８・・・記憶部、１９・・・通信部、２０・・・ドライブ、３０・・・リムーバブルメディア、１０１・・・取得部、１０２・・・編集部、１０３・・・テキスト生成部、１０４・・・対象決定部、１０５・・・表示制御部、１８１・・・ＡＶデータＤＢ、Ｄ・・・ＡＶデータ、ｔ・・・区分テキストデータ、Ｂ・・・ボタン、Ｊ・・・編集用のオブジェクト、Ｆ・・・表示領域、Ｍ・・・人物、Ｌ・・・画像タイムライン、Ｒ・・・バー、Ｎ・・・ネットワーク
1 ... Server, 2 ... User terminal, 11 ... CPU, 12 ... ROM, 13 ... RAM, 14 ... Bus, 15 ... Input/output interface, 16 ... Input unit, 17 ... Output unit, 18 ... Storage unit, 19 ... Communication unit, 20 ... Drive, 30 ... Removable media, 101 ... Acquisition unit, 102 ... Editing unit, 103 ... Text generation unit, 104 ... Target determination unit, 105 ... Display control unit, 181 ... AV data DB, D ... AV data, t ... Divided text data, B ... Button, J ... Editing object, F ... Display area, M ... Person, L ... Image timeline, R ... Bar, N ... Network

Claims

An information processing device comprising:
an editing means for editing data to be processed, which includes at least audio data and image data, by processing the image data based on the audio data;
a text generation means for generating text data indicating the content of the audio based on the audio data included in the data to be processed;
an editing target determination means for dividing the text in the generated text data into character strings of a predetermined unit and determining one or more of the character strings as editing targets; and
a display control means for executing control to display a predetermined editing screen for supporting the editing,
wherein the editing means edits the data to be processed based on the one or more character strings determined as the editing targets,
the display control means executes control to display objects, each indicating one of the one or more character strings determined as the editing targets, on the editing screen in time series and in a selectable manner, and
when a first object and a second object that precede and follow each other among the one or more objects displayed in time series on the editing screen are overlapped to generate a single third object, the editing target determination means determines, as an editing target, a third character string obtained by combining the first character string indicated by the first object and the second character string indicated by the second object, as the character string indicated by the third object.

The information processing device according to claim 1, wherein the third object is generated when the first object and the second object are overlapped by a predetermined operation of the user performing the editing.

A program for causing a computer that controls an information processing device to execute control processing comprising:
an editing step of editing data to be processed, which includes at least audio data and image data, by processing the image data based on the audio data;
a text generation step of generating text data indicating the content of the audio based on the audio data included in the data to be processed;
an editing target determination step of dividing the text in the generated text data into character strings of a predetermined unit and determining one or more of the character strings as editing targets; and
a display control step of executing control to display a predetermined editing screen for supporting the editing,
wherein, in the editing step, the data to be processed is edited based on the one or more character strings determined as the editing targets,
in the display control step, control is executed to display objects, each indicating one of the one or more character strings determined as the editing targets, on the editing screen in time series and in a selectable manner, and
in the editing target determination step, when a first object and a second object that precede and follow each other among the one or more objects displayed in time series on the editing screen are overlapped to generate a single third object, a third character string obtained by combining the first character string indicated by the first object and the second character string indicated by the second object is determined as an editing target, as the character string indicated by the third object.