JP2022145617A

JP2022145617A - Method and system for generating video content based on voice synthesis for image

Info

Publication number: JP2022145617A
Application number: JP2022039998A
Authority: JP
Inventors: Jaemin Kim; Sumee Lee; Joo Hyun Lee; So Hyun Park; Hyein Jeong; Jeong Min Sohn; So Jeong Hwang
Original assignee: Line Corp; Naver Corp
Current assignee: Z Intermediate Global Corp; Naver Corp
Priority date: 2021-03-17
Filing date: 2022-03-15
Publication date: 2022-10-04
Anticipated expiration: 2042-03-15
Also published as: JP7277635B2; KR102465870B1; KR20220129868A

Abstract

PROBLEM TO BE SOLVED: To provide a method and system for generating video content based on voice synthesis for an image. SOLUTION: A video content generation method includes: a step of extracting a snapshot of an image uploaded through a content editing tool; a step of displaying the extracted snapshot along a timeline through the content editing tool; a step of providing a length adjustment function for adjusting the length of the snapshot displayed through the content editing tool; a step of adjusting a running time of the snapshot whose length has been adjusted by the length adjustment function, in accordance with the adjusted length; and a step of generating voice synthesis for text input through the content editing tool and adding it at a point of time selected on the timeline. SELECTED DRAWING: Figure 4

Description

The following description relates to a method and system for generating video content based on speech synthesis for images.

When a sound source (including text-to-speech (TTS) audio) is to be applied to material that contains images, the work is cumbersome: for material created in PowerPoint, for example, a sound source has to be added to each slide one at a time. Moreover, only one sound source can be added to each slide, and the playback start time cannot be edited freely.

Thus, in a market where demand for producing and consuming video content has grown, conventional video production techniques that use speech synthesis are cumbersome and offer only limited functionality.

Korean Laid-open Patent Publication No. 10-2014-0147401 (published December 30, 2014)

Provided are a video content generation method and system that can generate, in real time, the speech synthesis a user wants for a number of images, dub it at the playback start time the user wants, and generate and provide video content from the images onto which the generated speech synthesis has been dubbed.

Provided is a video content generation method for a computer device including at least one processor, the method comprising: extracting, by the at least one processor, a snapshot of an image uploaded to a content editing tool; displaying, by the at least one processor, the extracted snapshot along a timeline in the content editing tool; providing, by the at least one processor, a length adjustment function for adjusting the length of the snapshot displayed in the content editing tool; adjusting, by the at least one processor, the running time of the snapshot whose length has been adjusted by the length adjustment function in accordance with the adjusted length; and generating, by the at least one processor, speech synthesis for text input to the content editing tool and adding it at a selected time point on the timeline.

According to one aspect, the length of the displayed snapshot may be proportional to the running time, which is the time that the image corresponding to the displayed snapshot occupies on the timeline, and the displaying along the timeline may include displaying the extracted snapshot in the content editing tool at a length proportional to a default running time.

According to another aspect, the providing of the length adjustment function may include providing, for a first snapshot among the displayed snapshots, a function for increasing or decreasing the length of the first snapshot through a user's touch-and-drag or click-and-drag on a preset left region or right region.

According to yet another aspect, the providing of the length adjustment function may include displaying the time point on the timeline of the left end or the right end of the first snapshot while the user's touch or click on the left region or the right region of the first snapshot is maintained.

According to yet another aspect, the adjusting of the running time in accordance with the adjusted length may include increasing or decreasing the running time, which is the time that the image corresponding to the length-adjusted snapshot occupies on the timeline, in proportion to the amount by which the length was adjusted.

According to yet another aspect, the generating of the speech synthesis and adding it at the selected time point on the timeline may include generating the speech synthesis for the text according to a voice type selected in the content editing tool.

According to yet another aspect, the generating of the speech synthesis and adding it at the selected time point on the timeline may include adding the generated speech synthesis at a specific time point on the timeline that is selected by moving a time indicator, which indicates a specific time point on the timeline.

According to yet another aspect, the video content generation method may further include moving, by the at least one processor and based on user input, the position on the timeline of the speech synthesis added to the timeline.

According to yet another aspect, the video content generation method may further include: selecting, by the at least one processor, one sound effect from among a plurality of sound effects provided in the content editing tool; and adding, by the at least one processor, the selected sound effect at a time point selected on the timeline in the content editing tool.

According to yet another aspect, the video content generation method may further include providing, by the at least one processor, a function for changing the order of the displayed snapshots.

According to yet another aspect, the image may be uploaded in the form of a file that includes a plurality of pages that can be converted into images.

According to yet another aspect, the generating of the speech synthesis and adding it at the selected time point on the timeline may include, when at least part of the running time of a first speech synthesis to be added to the timeline overlaps that of a second speech synthesis already added to the timeline, adding the first speech synthesis to the timeline as an audio channel different from that of the second speech synthesis.
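The overlap rule in this aspect can be illustrated with a minimal Python sketch. The `Clip` class, the `add_clip` function, and the lowest-free-channel policy are assumptions made for illustration; the publication only states that an overlapping speech synthesis is added as a different audio channel.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Clip:
    """A speech-synthesis clip on the timeline (hypothetical structure)."""
    start: float      # start time on the timeline, in seconds
    duration: float   # running time, in seconds
    channel: int = 0  # audio channel index

    @property
    def end(self) -> float:
        return self.start + self.duration


def overlaps(a: Clip, b: Clip) -> bool:
    """True when the two running times share any part of the timeline."""
    return a.start < b.end and b.start < a.end


def add_clip(timeline: list[Clip], start: float, duration: float) -> Clip:
    """Add a clip, moving it to another channel if it overlaps existing clips.

    Assumption: the lowest channel without an overlapping clip is chosen.
    """
    candidate = Clip(start, duration)
    channel = 0
    while any(c.channel == channel and overlaps(c, candidate) for c in timeline):
        channel += 1
    candidate.channel = channel
    timeline.append(candidate)
    return candidate


timeline: list[Clip] = []
first = add_clip(timeline, start=0.0, duration=3.0)   # channel 0
second = add_clip(timeline, start=2.0, duration=2.0)  # overlaps first: channel 1
third = add_clip(timeline, start=3.0, duration=1.0)   # starts as first ends: channel 0
```

In this sketch a clip that begins exactly when another ends is not treated as overlapping; a real tool might choose a different boundary rule.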

According to yet another aspect, the generating of the speech synthesis and adding it at the selected time point on the timeline may include displaying, in the content editing tool, an indicator for the speech synthesis added at the selected time point on the timeline.

According to yet another aspect, at least part of the text may be displayed through the indicator.

According to yet another aspect, the length of the indicator may be proportional to the length of the speech synthesis.

According to yet another aspect, the generating of the speech synthesis and adding it at the selected time point on the timeline may include displaying, based on user input on the indicator, at least one of information on the voice type used to generate the speech synthesis, information on the length of the speech synthesis, and the text.
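The indicator behaviour described in these aspects (a label showing part of the text, a length proportional to the speech length, and details shown on user input) might be modelled as follows. This is a sketch only: the names `SpeechClip`, `indicator`, `details`, the 40-pixels-per-second scale, and the 12-character label limit are all assumptions, not values from the publication.

```python
from dataclasses import dataclass

PIXELS_PER_SECOND = 40.0  # assumed timeline zoom level


@dataclass
class SpeechClip:
    """Hypothetical record for one generated speech synthesis."""
    text: str
    voice_type: str
    duration: float  # length of the synthesized speech, in seconds


def indicator(clip: SpeechClip, max_chars: int = 12) -> dict:
    """Build the indicator drawn on the timeline for a speech clip."""
    label = clip.text if len(clip.text) <= max_chars else clip.text[:max_chars] + "..."
    return {
        "label": label,                                  # at least part of the text
        "width_px": clip.duration * PIXELS_PER_SECOND,   # proportional to speech length
    }


def details(clip: SpeechClip) -> dict:
    """Information shown when the user clicks or hovers over the indicator."""
    return {
        "voice_type": clip.voice_type,
        "duration": clip.duration,
        "text": clip.text,
    }


clip = SpeechClip("Hello and welcome to the show", "voice_a", 2.5)
ind = indicator(clip)
info = details(clip)
```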

Provided is a computer program for causing a computer device to execute the method.

Provided is a computer-readable recording medium on which a program for causing a computer device to execute the method is recorded.

Provided is a computer device including at least one processor implemented to execute computer-readable instructions, wherein the at least one processor extracts a snapshot of an image uploaded to a content editing tool, displays the extracted snapshot along a timeline in the content editing tool, provides a length adjustment function for adjusting the length of the snapshot displayed in the content editing tool, adjusts the running time of the snapshot whose length has been adjusted by the length adjustment function in accordance with the adjusted length, and generates speech synthesis for text input to the content editing tool and adds it at a selected time point on the timeline.

The speech synthesis a user wants can be generated in real time for a number of images and dubbed at the playback start time the user wants, and video content can be generated and provided using the images onto which the generated speech synthesis has been dubbed.

FIG. 1 is a diagram showing an example of a network environment in one embodiment of the present invention.

FIG. 2 is a block diagram showing an example of a computer device in one embodiment of the present invention.

FIG. 3 is a diagram showing an example of a video content generation system in one embodiment of the present invention.

FIGS. 4 to 19 are diagrams showing examples of screens of the content editing tool in one embodiment of the present invention.

FIG. 20 is a flowchart showing an example of a video content generation method in one embodiment of the present invention.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings.

A content generation system according to embodiments of the present invention may be implemented by at least one computer device, and a content generation method according to embodiments of the present invention may be executed by the at least one computer device that implements the content generation system. A computer program according to an embodiment of the present invention may be installed and executed on the computer device, and the computer device may execute the content generation method according to embodiments of the present invention under the control of the executed computer program. The computer program may be recorded on a computer-readable recording medium so that, in combination with the computer device, it causes the computer device to execute the content generation method.

FIG. 1 is a diagram showing an example of a network environment in one embodiment of the present invention. The network environment of FIG. 1 includes a plurality of electronic devices 110, 120, 130, 140, a plurality of servers 150, 160, and a network 170. FIG. 1 is merely an example for describing the invention, and the number of electronic devices and the number of servers are not limited to those shown in FIG. 1. Likewise, the network environment of FIG. 1 merely illustrates one environment applicable to the present embodiments, and the applicable environments are not limited to it.

The plurality of electronic devices 110, 120, 130, 140 may be fixed terminals or mobile terminals implemented by computer devices. Examples of the electronic devices 110, 120, 130, 140 include a smartphone, a mobile phone, a navigation device, a PC (personal computer), a notebook PC, a digital broadcasting terminal, a PDA (personal digital assistant), a PMP (portable multimedia player), and a tablet. Although FIG. 1 shows a smartphone as an example of the electronic device 110, in embodiments of the present invention the electronic device 110 may be any of a variety of physical computer devices capable of communicating with the other electronic devices 120, 130, 140 and/or the servers 150, 160 over the network 170 using a wireless or wired communication scheme.

The communication scheme is not limited, and may include not only schemes that use a communication network that the network 170 may include (for example, a mobile communication network, the wired Internet, the wireless Internet, a broadcast network, or a satellite network) but also short-range wireless communication between devices. For example, the network 170 may include any one or more of a PAN (personal area network), a LAN (local area network), a CAN (campus area network), a MAN (metropolitan area network), a WAN (wide area network), a BBN (broadband network), and the Internet. The network 170 may also include any one or more network topologies, including but not limited to a bus network, a star network, a ring network, a mesh network, a star-bus network, and a tree or hierarchical network.

Each of the servers 150, 160 may be implemented by one or more computer devices that communicate with the plurality of electronic devices 110, 120, 130, 140 over the network 170 to provide instructions, code, files, content, services, and the like. For example, the server 150 may be a system that provides a service (for example, a content provision service, a group call service (or voice conference service), a messaging service, a mail service, a social network service, a map service, a translation service, a financial service, a payment service, or a search service) to the plurality of electronic devices 110, 120, 130, 140 connected through the network 170.

FIG. 2 is a block diagram showing an example of a computer device in one embodiment of the present invention. Each of the electronic devices 110, 120, 130, 140 and each of the servers 150, 160 described above may be implemented by the computer device 200 shown in FIG. 2.

As shown in FIG. 2, the computer device 200 may include a memory 210, a processor 220, a communication interface 230, and an input/output interface 240. The memory 210 is a computer-readable recording medium and may include RAM (random access memory), ROM (read only memory), and a permanent mass storage device such as a disk drive. Here, a permanent mass storage device such as a ROM or a disk drive may instead be included in the computer device 200 as a separate permanent storage device distinct from the memory 210. An operating system and at least one program code may be stored in the memory 210. Such software components may be loaded into the memory 210 from a computer-readable recording medium separate from the memory 210. Such a separate computer-readable recording medium may include a floppy (registered trademark) drive, a disk, a tape, a DVD/CD-ROM drive, or a memory card. In other embodiments, software components may be loaded into the memory 210 through the communication interface 230 rather than from a computer-readable recording medium. For example, software components may be loaded into the memory 210 of the computer device 200 based on a computer program installed from files received over the network 170.

The processor 220 may be configured to process the instructions of a computer program by performing basic arithmetic, logic, and input/output operations. The instructions may be provided to the processor 220 by the memory 210 or the communication interface 230. For example, the processor 220 may be configured to execute instructions received according to program code stored in a storage device such as the memory 210.

The communication interface 230 may provide a function that allows the computer device 200 to communicate with other devices (for example, the storage devices described above) over the network 170. For example, requests, instructions, data, files, and the like that the processor 220 of the computer device 200 generates according to program code stored in a storage device such as the memory 210 may be transmitted to other devices over the network 170 under the control of the communication interface 230. Conversely, signals, instructions, data, files, and the like from other devices may be received by the computer device 200 through its communication interface 230 via the network 170. Signals, instructions, data, and the like received through the communication interface 230 may be passed to the processor 220 or the memory 210, and files and the like may be stored in a storage medium (the permanent storage device described above) that the computer device 200 may further include.

The input/output interface 240 may be a means for interfacing with an input/output device 250. For example, an input device may include a microphone, a keyboard, or a mouse, and an output device may include a display or a speaker. As another example, the input/output interface 240 may be a means for interfacing with a device, such as a touch screen, in which input and output functions are integrated. At least one input/output device 250 may be integrated with the computer device 200 into a single device.

In other embodiments, the computer device 200 may include fewer or more components than those shown in FIG. 2. However, most conventional components need not be shown explicitly. For example, the computer device 200 may be implemented to include at least some of the input/output devices 250 described above, or may further include other components such as a transceiver and a database.

FIG. 3 is a diagram showing an example of a video content generation system in one embodiment of the present invention. FIG. 3 shows a content generation server 300, a plurality of users 310, and a content editing tool 320.

The content generation server 300 may be implemented by at least one computer device 200. It may provide the content editing tool 320 to the plurality of users 310 and support each of the users 310 in using the content editing tool 320 to dub speech synthesis onto images and generate video content.

Here, the "image" may include a plurality of individual images, a bundle of images, or a bundle of images together with at least one individual image. A bundle of images may include the pages of a single file, such as a PDF file, converted into images.

The plurality of users 310 may be provided with the content editing tool 320 by the content generation server 300 and generate video content from images. Here, each of the users 310 may in practice be a physical electronic device that accesses the content generation server 300 over the network 170 and is provided with the content editing tool 320. Each such physical electronic device may also be implemented by the computer device 200 described with reference to FIG. 2.

The content editing tool 320 may be provided to the plurality of users 310 in a web-based or app-based manner. In the web-based manner, the users 310 visit a web page that implements the functions of the content editing tool 320 and is served by the content generation server 300, and are provided with the functions for generating video content through that web page. In the app-based manner, the users 310 connect to the content generation server 300 through an application installed and run on each of the corresponding physical electronic devices and are provided with the functions for generating video content. Depending on the embodiment, each physical electronic device corresponding to the users 310 may process the generation of video content on its own, using an application that includes the functions for generating video content.

In one embodiment, the content generation server 300 may display thumbnails of the images that a user has uploaded to the content editing tool 320 along a timeline in the content editing tool 320. When the user uploads a file made up of a plurality of pages, the content generation server 300 may convert the pages into images and display thumbnails of the resulting page images along the timeline in the content editing tool 320.

At this time, the content editing tool 320 may provide a function for the user to adjust the order of the images on the timeline. The user may use this function to determine the order of the uploaded images. The order of the images on the timeline may correspond to the order in which the images appear in the finally generated video content.
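How the order on the timeline translates into the order of appearance in the final video can be sketched in Python as follows. The function names `move` and `schedule` and the `(name, running_time)` tuple layout are hypothetical, chosen only to illustrate the idea.

```python
def move(items: list, src: int, dst: int) -> list:
    """Return a copy of items with the element at index src moved to index dst."""
    reordered = list(items)
    reordered.insert(dst, reordered.pop(src))
    return reordered


def schedule(items: list) -> list:
    """Turn an ordered list of (name, running_time) pairs into (name, start, end).

    Images are assumed to play back to back in timeline order.
    """
    t = 0.0
    result = []
    for name, running_time in items:
        result.append((name, t, t + running_time))
        t += running_time
    return result


images = [("page1", 4.0), ("page2", 4.0), ("page3", 2.0)]
reordered = move(images, src=2, dst=0)  # the user drags page3 to the front
plan = schedule(reordered)
# plan == [("page3", 0.0, 2.0), ("page1", 2.0, 6.0), ("page2", 6.0, 10.0)]
```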

The content editing tool 320 may also provide a function for the user to delete any image on the timeline. In other words, the user can use this function to delete unneeded images from among the uploaded images.

The content editing tool 320 may also provide a function for the user to adjust the time (or interval) that each image occupies on the timeline. The adjusted time may correspond to the time (or interval) for which the image appears in the finally generated video content. For example, the horizontal (or vertical) length of a thumbnail displayed in the content editing tool 320 may correspond to the time (or interval) that the image occupies on the timeline. As one example, the content editing tool 320 may initially display each thumbnail at a length corresponding to four seconds. The content editing tool 320 may then provide a function for increasing or decreasing the length of a thumbnail when the user clicks or touches its left and/or right end and drags. In this case, the time that the image occupies on the timeline may be increased or decreased according to the adjusted thumbnail length.
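The relationship between thumbnail length and running time can be sketched in Python. Only the four-second default comes from the description; the 30-pixels-per-second scale, the half-second minimum, and the function names are assumptions for illustration.

```python
DEFAULT_RUNNING_TIME = 4.0  # seconds; the initial value given in the description
PIXELS_PER_SECOND = 30.0    # assumed timeline scale
MIN_RUNNING_TIME = 0.5      # assumed lower bound so a thumbnail never vanishes


def thumbnail_width(running_time: float) -> float:
    """Thumbnail length on screen, proportional to the running time."""
    return running_time * PIXELS_PER_SECOND


def resize(running_time: float, drag_px: float) -> float:
    """Running time after the user drags an end of the thumbnail.

    A positive drag_px lengthens the thumbnail and a negative one shortens it;
    the running time changes in proportion to the dragged distance.
    """
    return max(MIN_RUNNING_TIME, running_time + drag_px / PIXELS_PER_SECOND)


initial_width = thumbnail_width(DEFAULT_RUNNING_TIME)  # 120.0 pixels
longer = resize(DEFAULT_RUNNING_TIME, drag_px=60.0)    # 6.0 seconds
shorter = resize(DEFAULT_RUNNING_TIME, drag_px=-30.0)  # 3.0 seconds
```

Dragging the left end would typically also shift the start time of the image; that bookkeeping is omitted here.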

また、コンテンツ編集ツール３２０は、ユーザがタイムライン上で希望する時点や区間を選択するための機能を提供してよく、選択された時点や区間に対してユーザが希望する任意のテキストを連係させるためのユーザインタフェースを提供してよい。選択された時点や区間に対して任意のテキストが連係されれば、コンテンツ生成サーバ３００は、連係されたテキストを自動で音声に変換し、選択された時点や区間に変換された音声を追加することにより、ユーザが希望する内容の音声を簡単かつ便利にイメージにダビングできるようにサポートすることができる。 In addition, the content editing tool 320 may provide a function for the user to select a desired time point or section on the timeline, and may provide a user interface for associating any text desired by the user with the selected time point or section. When text is associated with the selected time point or section, the content generation server 300 automatically converts the associated text into speech and adds the converted speech at the selected time point or section, so that the user can easily and conveniently dub speech with the desired content onto the images.

図４～１９は、本発明の一実施形態における、コンテンツ編集ツールの画面の例を示した図である。 FIGS. 4 to 19 are diagrams showing examples of content editing tool screens in one embodiment of the present invention.

図４は、図３を参照しながら説明したコンテンツ編集ツール３２０の第１画面例４００を示している。本実施形態に係るコンテンツ編集ツール３２０の構成は一例に過ぎず、前記構成は実施形態によって多様に変更されてよい。 FIG. 4 shows a first screen example 400 of the content editing tool 320 described with reference to FIG. 3. The configuration of the content editing tool 320 according to the present embodiment is only an example and may be changed in various ways depending on the embodiment.

ユーザは、自身の電子機器を利用してコンテンツ編集ツール３２０にアクセスしてよく、コンテンツ編集ツール３２０は、ユーザがイメージをアップロードするための機能４１０を提供してよい。図４の第１画面例４００では、動画やＰＤＦファイルをアップロードする例について説明しているが、コンテンツ編集ツール３２０は、個別の複数のイメージや複数のイメージが含まれた１つのファイル、または１つのファイルと複数のイメージの組み合わせをアップロードするための機能を提供してもよい。このとき、ユーザがアップロードするイメージは、ユーザがコンテンツ編集ツール３２０にアクセスするために使用した電子機器のローカル保存場所に保存されたイメージを含んでよい。実施形態によって、ユーザがアップロードするイメージは、電子機器のローカル保存場所ではなく、ウェブ上に位置するイメージであってもよい。 A user may access the content editing tool 320 using the user's own electronic device, and the content editing tool 320 may provide a function 410 for the user to upload images. Although the first screen example 400 of FIG. 4 describes an example of uploading a video or a PDF file, the content editing tool 320 may also provide a function for uploading multiple individual images, a single file containing multiple images, or a combination of a single file and multiple images. The images uploaded by the user may include images stored in local storage of the electronic device that the user used to access the content editing tool 320. Depending on the embodiment, the images uploaded by the user may be images located on the web rather than in local storage of the electronic device.

また、コンテンツ編集ツール３２０は、イメージにダビングを追加するための機能４２０を提供してよい。一例として、機能４２０は、音声選択機能４２１およびテキスト入力機能４２２を含んでよい。音声選択機能４２１は、多様な種類の予め定義された音声タイプのうちから１つを選択するための機能であってよく、テキスト入力機能４２２は、音声合成（ＴｅｘｔＴｏＳｐｅｅｃｈ：ＴＴＳ）を生成するためのテキストを入力するための機能であってよい。一例として、ユーザが、音声選択機能４２１で音声タイプ「音声１」を選択し、テキスト入力機能４２２にテキスト「こんにちは」を入力したとする。このとき、試し聞きボタン４２３やダビング追加ボタン４２４を選択（一例として、ＰＣ環境におけるクリックまたはタッチスクリーン環境におけるタッチによって選択）する場合、入力されたテキスト「こんにちは」と選択された音声タイプ「音声１」の識別子がコンテンツ編集ツール３２０からコンテンツ生成サーバ３００に伝達されてよい。この場合、コンテンツ生成サーバ３００は、音声タイプ「音声１」を使用してテキスト「こんにちは」に対する音声合成を生成してよく、生成された音声合成をコンテンツ編集ツール３２０からユーザの電子機器に伝達してよい。このとき、試し聞きボタン４２３の選択に応答して電子機器のスピーカから音声合成が出力されてよく、ダビング追加ボタン４２４の選択に応答して、機能４１０によってアップロードされたイメージと関連して音声合成がタイムラインに追加されてよい。より具体的に、コンテンツ編集ツール３２０は、最終的に生成される映像コンテンツに対するタイムラインを可視的に表現するためのタイムライン表示機能４４０を含んでよい。このとき、音声合成がタイムラインのどこに追加されるのかについては、以下でさらに詳しく説明する。 The content editing tool 320 may also provide a function 420 for adding dubbing to images. As an example, the function 420 may include a voice selection function 421 and a text input function 422. The voice selection function 421 may be a function for selecting one of various predefined voice types, and the text input function 422 may be a function for entering the text from which speech synthesis (text-to-speech, TTS) is generated. As an example, suppose the user selects the voice type "Voice 1" with the voice selection function 421 and enters the text "Hello" in the text input function 422. When the user then selects the preview button 423 or the add dubbing button 424 (for example, by clicking in a PC environment or by touching in a touch screen environment), the entered text "Hello" and the identifier of the selected voice type "Voice 1" may be transmitted from the content editing tool 320 to the content generation server 300. In this case, the content generation server 300 may generate speech synthesis for the text "Hello" using the voice type "Voice 1" and may deliver the generated speech synthesis to the user's electronic device through the content editing tool 320. The speech synthesis may be output from the speaker of the electronic device in response to selection of the preview button 423, and may be added to the timeline in association with the images uploaded by the function 410 in response to selection of the add dubbing button 424. More specifically, the content editing tool 320 may include a timeline display function 440 for visually representing the timeline of the video content that is finally generated. Where on the timeline the speech synthesis is added is described in more detail below.
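The exchange described above, in which the entered text and the identifier of the selected voice type are sent to the content generation server, could be modeled as a small request message. The sketch below is an assumption for illustration only: the patent does not specify a wire format, so the JSON encoding, the field names (`text`, `voice_type_id`, `action`), and the `build_request` helper are all hypothetical.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DubbingRequest:
    """Hypothetical message from the editing tool to the generation server."""
    text: str            # text entered in the text input function
    voice_type_id: int   # identifier of the selected voice type
    action: str          # "preview" (listen only) or "add" (place on timeline)


def build_request(text: str, voice_type_id: int, action: str) -> str:
    """Validate the inputs and serialize the request as JSON."""
    if action not in ("preview", "add"):
        raise ValueError(f"unknown action: {action!r}")
    if not text.strip():
        raise ValueError("text must not be empty")
    request = DubbingRequest(text=text, voice_type_id=voice_type_id, action=action)
    # ensure_ascii=False keeps Japanese text readable in the payload.
    return json.dumps(asdict(request), ensure_ascii=False)


if __name__ == "__main__":
    print(build_request("こんにちは", 1, "preview"))
```

Sending the same message for both buttons and distinguishing them with one field reflects the description, where both the preview and the add dubbing button trigger the same synthesis and differ only in what happens to the result.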

実施形態によって、音声選択機能４２１は、ユーザがお気に入り登録をした音声タイプのうちから１つを選択するように実現されてよい。このとき、全体の音声タイプのうちの特定の音声タイプをお気に入り登録するためのユーザインタフェースがユーザに提供されてよい。一例として、ユーザがダビング追加機能４２０に示された「全体表示」を選択する場合、ユーザに全体の音声タイプを表示するためのユーザインタフェースが提供されてよく、ユーザは、提供されたユーザインタフェースから、全体の音声タイプのうちの希望する少なくとも１つの音声タイプをお気に入り登録してよい。この場合、音声選択機能４２１は、ユーザがお気に入り登録した音声のうちの１つを選択するように実現されてよい。 Depending on the embodiment, the voice selection function 421 may be implemented to select one of the voice types that the user has favorited. At this time, the user may be provided with a user interface for bookmarking a specific voice type among all voice types. As an example, if the user selects "Display All" shown in the dubbing addition function 420, a user interface may be provided for displaying the overall audio type to the user, and the user may select from the provided user interface , at least one desired voice type out of all voice types may be favorited. In this case, voice selection function 421 may be implemented to select one of the voices that the user has favorited.

また、コンテンツ編集ツール３２０は、予め製作されている効果音をイメージと関連させてタイムラインに追加するための効果音追加機能４３０を提供してよい。効果音追加機能４３０は、予め製作されている多数の効果音のリストを表示し、効果音に対する試し聞きを実行するか、効果音をタイムラインの特定の時間に追加したりするための機能を含んでよい。必要によっては、ユーザが希望する効果音を外部ファイルから追加するか、直接生成したりしてもよい。 The content editing tool 320 may also provide a sound effect addition function 430 for adding pre-made sound effects to the timeline in association with the images. The sound effect addition function 430 may include functions for displaying a list of pre-made sound effects, previewing a sound effect, and adding a sound effect at a specific time on the timeline. If necessary, a sound effect desired by the user may be added from an external file or created directly.

また、コンテンツ編集ツール３２０は、タイムラインの特定の時点を示すタイムインジケータ４５０を表示してよい。図４では、タイムインジケータ４５０がデフォルトである００：００．００の時点にある例を示している。 Content editing tool 320 may also display a time indicator 450 that indicates a particular point in time on the timeline. FIG. 4 shows an example where the time indicator 450 is at the default time of 00:00.00.

また、図４のコンテンツ編集ツール３２０に示された保存ボタン４６０は、現在のプロジェクトの編集を保存するための機能を提供してよく、ダウンロードボタン４７０は、映像コンテンツを生成してユーザの電子機器にダウンロードするための機能を提供してよい。 The save button 460 shown in the content editing tool 320 of FIG. 4 may provide a function for saving the edits of the current project, and the download button 470 may provide a function for generating the video content and downloading it to the user's electronic device.

図５は、コンテンツ編集ツール３２０の第２画面例５００を示している。図５の第２画面例５００では、図４で説明した機能４１０によってイメージがアップロードされることにより、アップロードされたイメージのサムネイルのうちの一部がタイムライン表示機能４４０によって表示された例を示している。このとき、各サムネイルは、予め設定された時間間隔（図５の実施形態では４秒の時間間隔）に対応するようにタイムライン表示機能４４０に表示されている。また、タイムライン表示機能４４０の領域に対するクリック＆ドラッグ（または、タッチスクリーン環境のためのタッチ＆ドラッグやスワイプジェスチャ）によってタイムラインとサムネイルの探索が可能となる。 FIG. 5 shows a second screen example 500 of the content editing tool 320. The second screen example 500 of FIG. 5 shows an example in which, after images are uploaded with the function 410 described with reference to FIG. 4, some of the thumbnails of the uploaded images are displayed by the timeline display function 440. Each thumbnail is displayed in the timeline display function 440 so as to correspond to a preset time interval (4 seconds in the embodiment of FIG. 5). In addition, the timeline and thumbnails can be navigated by click-and-drag (or by touch-and-drag or a swipe gesture in a touch screen environment) on the area of the timeline display function 440.

図６は、コンテンツ編集ツール３２０の第３画面例６００として、タイムライン表示機能４４０の領域に対するクリック＆ドラッグによってタイムライン表示機能４４０の他の領域が表示される例を示している。第３画面例６００では、最後のサムネイルであるサムネイル１０により、ユーザが１０枚のイメージをアップロードしたことが分かる。上述したように、１０枚のイメージは、個別のイメージまたは１０枚のイメージにイメージ化が可能なページを含む１つのファイルの形態でアップロードされてもよいし、ｎ枚のイメージにイメージ化が可能なページを含むファイルとｍ枚の個別のイメージ（ここで、ｎとｍは自然数であり、ｎ＋ｍ＝１０）が結合された形態でアップロードされてもよい。２つ以上のファイルと個別イメージの組み合わせが使用可能であることは、容易に理解できるであろう。 FIG. 6 shows, as a third screen example 600 of the content editing tool 320, an example in which another area of the timeline display function 440 is displayed by click-and-drag on the area of the timeline display function 440. In the third screen example 600, the last thumbnail, thumbnail 10, shows that the user has uploaded 10 images. As described above, the 10 images may be uploaded as individual images or as a single file containing pages that can be converted into 10 images, or as a combination of a file containing pages that can be converted into n images and m individual images (where n and m are natural numbers and n + m = 10). It will be readily understood that a combination of two or more files and individual images can also be used.

図７は、コンテンツ編集ツール３２０の第４画面例７００として、サムネイルの時間間隔を調節した例を示している。例えば、図７の第４画面例７００において、タイムライン表示機能４４０の領域に表示されるサムネイルの横の長さは、イメージがタイムライン上で占有する時間（または、区間）に対応してよい。このとき、第４画面例７００では、ユーザがサムネイル２の右側終端部分をクリックした後に右側方向にドラッグしながらサムネイルの長さを伸ばした例を示している。この場合、伸びたサムネイル２の長さにしたがい、サムネイル２に対応するイメージがタイムライン上で占有する時間（以下、ランニングタイム）が増えてよい。このとき、第４画面例７００では、ユーザがサムネイル２の右側終端部分をクリックしている間、サムネイル２の右側終端部分に対応するタイムライン上の時点（９．９秒の時点）が表示されるユーザインタフェース７１０が示されている。したがって、ユーザは、このようなユーザインタフェース７１０に表示される時間に基づいてサムネイル２の長さを調節してよい。一方、サムネイル２の長さが増えた分だけ、サムネイル２の後ろのサムネイル（一例として、サムネイル３～１０）の開始時点が変更されてよい。図７の実施形態では、サムネイル２の長さを調節してサムネイルに対応するイメージのランニングタイムを調節する例について説明したが、このような説明がタイムライン表示機能４４０の各サムネイルにも同じように適用可能であることは、容易に理解できるであろう。 FIG. 7 shows, as a fourth screen example 700 of the content editing tool 320, an example in which the time interval of a thumbnail is adjusted. For example, in the fourth screen example 700 of FIG. 7, the horizontal length of a thumbnail displayed in the area of the timeline display function 440 may correspond to the time (or section) that the image occupies on the timeline. The fourth screen example 700 shows an example in which the user clicks the right end portion of thumbnail 2 and then drags it to the right to extend the thumbnail. In this case, the time that the image corresponding to thumbnail 2 occupies on the timeline (hereinafter, the running time) may increase in accordance with the extended length of thumbnail 2. The fourth screen example 700 also shows a user interface 710 that displays, while the user is clicking the right end portion of thumbnail 2, the time point on the timeline (9.9 seconds) corresponding to the right end portion of thumbnail 2. The user may therefore adjust the length of thumbnail 2 based on the time displayed in the user interface 710. Meanwhile, the start points of the thumbnails after thumbnail 2 (for example, thumbnails 3 to 10) may be shifted by the amount by which thumbnail 2 was lengthened. Although the embodiment of FIG. 7 describes adjusting the length of thumbnail 2 to adjust the running time of the corresponding image, it will be readily understood that the same description applies to each thumbnail of the timeline display function 440.

図８は、コンテンツ編集ツール３２０の第５画面例８００として、サムネイル４の時間間隔が減少した例を示している。第５画面例８００では、ユーザがサムネイル４の右側終端部分をクリックした後に左側方向にドラッグしてサムネイルの長さを縮めた例を示している。このとき、縮まったサムネイル４の長さにしたがい、サムネイル４に対応するイメージのランニングタイムが減ってよい。この場合、第５画面例８００では、ユーザがサムネイル４の右側終端部分をクリックしている間、サムネイル４の右側終端部分に対応するタイムライン上の時点（１７秒の時点）が表示されるユーザインタフェース８１０が示されている。一方、サムネイル４の長さが減った分だけ、サムネイル４の後ろのサムネイル（一例として、サムネイル５～１０）の開始時点が変更されてよい。 FIG. 8 shows an example of a fifth screen example 800 of the content editing tool 320 in which the time interval between thumbnails 4 is reduced. A fifth screen example 800 shows an example in which the user shortens the length of the thumbnail by clicking the right end portion of the thumbnail 4 and then dragging it leftward. At this time, the running time of the image corresponding to the thumbnail 4 may be reduced according to the shortened length of the thumbnail 4 . In this case, in the fifth screen example 800, while the user is clicking the right end portion of thumbnail 4, the time point (17 seconds) on the timeline corresponding to the right end portion of thumbnail 4 is displayed. An interface 810 is shown. On the other hand, the starting points of the thumbnails after the thumbnail 4 (thumbnails 5 to 10, for example) may be changed by the amount corresponding to the reduced length of the thumbnail 4 .
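The behavior described for FIGS. 7 and 8, where resizing one thumbnail shifts the start points of every later thumbnail, can be sketched by storing only the running times and deriving start points from the order. This is an illustrative model, not the patent's implementation; the `Clip` class and the function names are hypothetical. The numbers reuse the figures from the text: ten images at the default 4 seconds, thumbnail 2 dragged so that it ends at 9.9 seconds, then thumbnail 4 dragged so that it ends at 17 seconds.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Clip:
    """One image on the timeline (hypothetical model)."""
    name: str
    running_time: float  # seconds the image occupies on the timeline


def start_times(clips: List[Clip]) -> List[float]:
    """Start point of each clip, derived from the clips that precede it."""
    starts, cursor = [], 0.0
    for clip in clips:
        starts.append(cursor)
        cursor += clip.running_time
    return starts


def set_right_edge(clips: List[Clip], index: int, new_end: float) -> None:
    """Drag the right end of clips[index] to the timeline position new_end.

    Only the running time of that clip changes; later clips shift
    automatically because their start points are derived, not stored.
    """
    start = start_times(clips)[index]
    if new_end <= start:
        raise ValueError("the right end must stay after the clip's start")
    clips[index].running_time = new_end - start


if __name__ == "__main__":
    clips = [Clip(f"thumbnail {i}", 4.0) for i in range(1, 11)]
    set_right_edge(clips, 1, 9.9)   # FIG. 7: thumbnail 2 now ends at 9.9 s
    set_right_edge(clips, 3, 17.0)  # FIG. 8: thumbnail 4 now ends at 17 s
    for clip, start in zip(clips, start_times(clips)):
        print(f"{clip.name}: starts at {start:.2f} s, runs {clip.running_time:.2f} s")
```

Under this model, lengthening thumbnail 2 by 1.9 seconds moves thumbnail 4 to the range 13.9 to 17.9 seconds, so ending it at 17 seconds is indeed a reduction, which matches the description of FIG. 8.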

図７および図８の実施形態では、ユーザがサムネイルの右側終端部分をクリックした後に左右方向にドラッグしながらサムネイルの長さを増減することにより、サムネイルに対応するイメージのランニングタイムを増減する実施形態について説明した。このような説明により、実施形態によっては、コンテンツ編集ツール３２０がサムネイルの左側終端部分をクリックした後に左右方向にドラッグしてサムネイルの長さを増減することにより、サムネイルに対応するイメージのランニングタイムを増減する機能を提供することも可能であることは、容易に理解できるであろう。 The embodiments of FIGS. 7 and 8 describe increasing or decreasing the running time of the image corresponding to a thumbnail by having the user click the right end portion of the thumbnail and then drag it left or right to change the thumbnail's length. From this description it will be readily understood that, depending on the embodiment, the content editing tool 320 may also provide a function for increasing or decreasing the running time of the image corresponding to a thumbnail by clicking the left end portion of the thumbnail and then dragging it left or right to change the thumbnail's length.

図９は、コンテンツ編集ツール３２０の第６画面例９００として、サムネイルの順序が変更された例を示している。コンテンツ編集ツール３２０は、ユーザが特定のサムネイルをクリック後、ドラッグ（タッチスクリーン環境ではタッチ後にドラッグ）することによってサムネイルの順序を変更するための機能を提供してよい。一例として、ユーザは、第５画面例８００で、サムネイル１をクリックした後に右側方向にドラッグすることにより、サムネイル１とサムネイル２の順序を変更してよい。第６画面例９００は、サムネイル１とサムネイル２の順序が変更された様子を示している。 FIG. 9 shows an example in which the thumbnail order is changed as a sixth screen example 900 of the content editing tool 320 . Content editing tools 320 may provide functionality for a user to change the order of thumbnails by clicking on a particular thumbnail and then dragging (touch then drag in a touchscreen environment). As an example, the user may change the order of thumbnail 1 and thumbnail 2 in the fifth screen example 800 by clicking thumbnail 1 and then dragging it to the right. A sixth screen example 900 shows how the order of thumbnail 1 and thumbnail 2 has been changed.

図１０は、コンテンツ編集ツール３２０の第７画面例１０００として、特定のサムネイルが削除された例を示している。コンテンツ編集ツール３２０は、ユーザが特定のサムネイルを選択した後に削除するための機能を提供してよい。一例として、ユーザが特定のサムネイルに対してマウスオーバーイベントを発生させることによって該当のサムネイルを削除するためのユーザインタフェースが表示されてよく、ユーザは、表示されたユーザインタフェースを利用して該当のサムネイルを削除してよい。このようなサムネイル削除のための方法が多様に提供可能であることは、容易に理解できるであろう。一例として、ユーザは、特定のサムネイルをマウスでクリックして選択した後にキーボード上の「Ｄｅｌ」キーを押すことにより、選択されたサムネイルを削除してもよい。 FIG. 10 shows, as a seventh screen example 1000 of the content editing tool 320, an example in which a specific thumbnail has been deleted. The content editing tool 320 may provide a function for the user to select a specific thumbnail and then delete it. As an example, when the user generates a mouseover event on a specific thumbnail, a user interface for deleting that thumbnail may be displayed, and the user may delete the thumbnail using the displayed user interface. It will be readily understood that thumbnail deletion can be provided in various ways. As another example, the user may select a specific thumbnail with a mouse click and then press the "Del" key on the keyboard to delete the selected thumbnail.

図１１および図１２は、コンテンツ編集ツール３２０の第８画面例１１００および第９画面例１２００として、ダビングを追加する例を示している。上述したように、タイムインジケータ４５０は、タイムラインの特定の時点を示すものである。例えば、ユーザは、タイムインジケータ４５０をドラッグするか希望するタイムラインの位置をクリックする方式によってタイムインジケータ４５０を移動させてよい。第８画面例１１００で、タイムインジケータ４５０と関連して表示された時刻「００：０６．００」は、タイムラインで現在タイムインジケータ４５０が指示する時点を示してよい。 FIGS. 11 and 12 show examples of adding dubbing as an eighth screen example 1100 and a ninth screen example 1200 of the content editing tool 320. As described above, the time indicator 450 indicates a particular point in time on the timeline. For example, the user may move the time indicator 450 by dragging it or by clicking a desired position on the timeline. In the eighth screen example 1100, the time "00:06.00" displayed in association with the time indicator 450 may indicate the point on the timeline that the time indicator 450 currently points to.

また、第８画面例１１００には、ダビング追加機能４２０のテキスト入力機能４２２によってテキスト「こんにちは、私はＡＡＡです。」が入力された例を示している。このとき、ユーザがダビング追加ボタン４２４を選択する場合、第９画面例１２００のように、テキスト「こんにちは、私はＡＡＡです。」に対応する第１音声合成のための音声合成インジケータ１２１０がタイムライン表示機能４４０の領域にサムネイルと関連して表示されてよい。このとき、第１音声合成は、上述したように、コンテンツ生成サーバ３００で生成されてコンテンツ編集ツール３２０に伝達されてよい。一方、音声合成インジケータ１２１０には、対応するテキスト「こんにちは、私はＡＡＡです。」の少なくとも一部（第９画面例１２００の「こんにちは、私」）と、第１音声合成の生成に使用された音声タイプの識別子（一例として、音声タイプ「音声１」の識別子（１）１２２０）が表示されてよい。 The eighth screen example 1100 also shows an example in which the text "Hello, I am AAA." has been entered with the text input function 422 of the dubbing addition function 420. When the user then selects the add dubbing button 424, a speech synthesis indicator 1210 for the first speech synthesis corresponding to the text "Hello, I am AAA." may be displayed in the area of the timeline display function 440 in association with the thumbnails, as in the ninth screen example 1200. As described above, the first speech synthesis may be generated by the content generation server 300 and delivered to the content editing tool 320. The speech synthesis indicator 1210 may display at least part of the corresponding text "Hello, I am AAA." ("Hello, I" in the ninth screen example 1200) and the identifier of the voice type used to generate the first speech synthesis (for example, the identifier (1) 1220 of the voice type "Voice 1").

音声合成インジケータ１２１０の長さは、第１音声合成の長さに対応してよく、このような音声合成インジケータ１２１０の長さによって表示されるテキストの分量が異なってよい。このとき、第８画面例１１００に示されたタイムインジケータ４５０の時刻は「００：０６．００」であり、第９画面例１２００に示されたタイムインジケータ４５０の時刻は「００：０９．５６」である。言い換えれば、第１音声合成のための音声合成インジケータ１２１０の長さは、３．５６秒（００：０９．５６－００：０６．００＝００：０３．５６）であることが分かる。 The length of the speech synthesis indicator 1210 may correspond to the length of the first speech synthesis, and the amount of displayed text may vary according to the length of the speech synthesis indicator 1210 . At this time, the time of the time indicator 450 shown in the eighth screen example 1100 is "00:06.00", and the time of the time indicator 450 shown in the ninth screen example 1200 is "00:09.56". is. In other words, it can be seen that the length of the speech synthesis indicator 1210 for the first speech synthesis is 3.56 seconds (00:09.56-00:06.00=00:03.56).
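The subtraction above can be checked mechanically. The sketch below assumes the time indicator uses the format `MM:SS.cc` (minutes, seconds, hundredths of a second), which is how the values in the text read; the helper names are hypothetical. Working in integer hundredths avoids floating-point rounding.

```python
def parse_timestamp(stamp: str) -> int:
    """Convert an 'MM:SS.cc' string into hundredths of a second."""
    minutes, rest = stamp.split(":")
    seconds, hundredths = rest.split(".")
    return (int(minutes) * 60 + int(seconds)) * 100 + int(hundredths)


def format_timestamp(total_hundredths: int) -> str:
    """Convert hundredths of a second back into 'MM:SS.cc'."""
    minutes, remainder = divmod(total_hundredths, 6000)
    seconds, hundredths = divmod(remainder, 100)
    return f"{minutes:02d}:{seconds:02d}.{hundredths:02d}"


if __name__ == "__main__":
    start = parse_timestamp("00:06.00")  # indicator before adding the dubbing
    end = parse_timestamp("00:09.56")    # indicator after adding the dubbing
    print(format_timestamp(end - start))  # 00:03.56
```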

一方、ユーザが第８画面例１１００で試し聞きボタン４２３を選択する場合、テキスト「こんにちは、私はＡＡＡです。」に対応する第１音声合成がユーザの電子機器のスピーカから出力されてよい。言い換えれば、電子機器は、コンテンツ編集ツール３２０の制御にしたがい、第１音声合成をスピーカから出力してよい。 On the other hand, if the user selects listen button 423 in eighth example screen 1100, a first speech synthesis corresponding to the text "Hello, I am AAA" may be output from the speaker of the user's electronic device. In other words, the electronic device may output the first speech synthesis from the speaker under control of the content editing tool 320 .

図１３は、コンテンツ編集ツール３２０の第１０画面例１３００として、ユーザが音声合成インジケータ１２１０上にマウスオーバーのような入力を発生させる場合、マウスポインタの位置（タッチスクリーン環境では、音声合成インジケータ１２１０の位置をタッチしてタッチを位置させる間のタッチの位置）と関連して音声合成情報１３１０が表示される例を示している。音声合成情報１３１０は、音声合成の生成に利用された音声タイプ（音声１）、音声合成の長さ（３．５６秒（００：０３．５６））、入力されたテキスト（こんにちは、私はＡＡＡです。）を含んでよい。 FIG. 13 shows, as a tenth screen example 1300 of the content editing tool 320, an example in which speech synthesis information 1310 is displayed in association with the position of the mouse pointer when the user generates an input such as a mouseover on the speech synthesis indicator 1210 (in a touch screen environment, the position of the touch while the user touches and holds the speech synthesis indicator 1210). The speech synthesis information 1310 may include the voice type used to generate the speech synthesis (Voice 1), the length of the speech synthesis (3.56 seconds (00:03.56)), and the entered text ("Hello, I am AAA.").

図１４は、コンテンツ編集ツール３２０の第１１画面例１４００として、ユーザがサムネイル３の長さをタイムインジケータ４５０に合うように減らした場合の例を示している。この場合、サムネイル３の長さは、第１音声合成の長さが１．５６であり、映像コンテンツのためのタイムラインでサムネイル３に対応するイメージのランニングタイムが１．５６秒になることが分かる。 FIG. 14 shows, as an eleventh screen example 1400 of the content editing tool 320, an example in which the user has shortened thumbnail 3 so that it ends at the time indicator 450. In this case, the length of thumbnail 3 corresponds to 1.56 seconds, so it can be seen that the running time of the image corresponding to thumbnail 3 on the timeline for the video content becomes 1.56 seconds.

図１５は、コンテンツ編集ツール３２０の第１２画面例１５００として、ユーザが第１音声合成の開始時点を変更する例を示している。言い換えれば、第１２画面例１５００では、第１１画面例１４００と比べて音声合成インジケータ１２１０の位置が変更していることが分かる。一例として、ユーザは、コンテンツ編集ツール３２０で音声合成インジケータ１２１０をクリックした状態で左側または右側にドラッグすることによって音声合成インジケータ１２１０の位置を変更してよく、このような音声合成インジケータ１２１０の位置変更によって第１音声合成の開始時点が変更されてよい。一方、音声合成インジケータ１２１０の位置の変更は、該当の音声合成インジケータ１２１０が選択（一例として、クリック）された状態でキーボードの方向キー入力によってなされてもよい。このような位置の変更は、音声合成インジケータ１２１０だけでなく、コンテンツ編集ツール３２０で提供される多様なインジケータそれぞれに対しても共通の方法で適用することが可能である。また、多数のインジケータは、１つのグループから選択されてもよい。一例として、キーボードの「Ｓｈｉｆｔ」キーを押した状態で多数のインジケータを順に選択（一例として、クリック）することにより、多数のインジケータが１つのグループとして選択されてよい。この場合、ユーザは、ドラッグやキーボードの方向キーの入力などにより、該当のグループに属する多数のインジケータの位置を一度に変更してもよい。 FIG. 15 shows, as a twelfth screen example 1500 of the content editing tool 320, an example in which the user changes the start point of the first speech synthesis. In other words, it can be seen that the position of the speech synthesis indicator 1210 in the twelfth screen example 1500 has changed compared with the eleventh screen example 1400. As an example, the user may change the position of the speech synthesis indicator 1210 by clicking it in the content editing tool 320 and dragging it to the left or right, and this change of position may change the start point of the first speech synthesis. The position of the speech synthesis indicator 1210 may also be changed with the keyboard arrow keys while the indicator is selected (for example, clicked). Such position changes can be applied in the same way not only to the speech synthesis indicator 1210 but also to each of the various indicators provided by the content editing tool 320. Multiple indicators may also be selected as a single group. As an example, multiple indicators may be selected as one group by selecting (for example, clicking) them in turn while holding down the "Shift" key on the keyboard. In this case, the user may change the positions of all the indicators belonging to the group at once by dragging or by using the keyboard arrow keys.
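Moving a group of selected indicators at once, as described above, amounts to applying one time offset to every selected item. The sketch below is only an illustration: the `Indicator` class, the selection by label, and the rule that a group cannot be dragged before the start of the timeline are assumptions, not details given in the description.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Indicator:
    """A speech synthesis or sound effect indicator (hypothetical model)."""
    label: str
    start: float     # start point on the timeline, in seconds
    duration: float  # length of the audio, in seconds


def move_group(indicators: List[Indicator], selected: Iterable[str],
               delta: float) -> None:
    """Shift every selected indicator by the same offset in seconds.

    The offset is clamped so that the earliest selected indicator does not
    move before 0 seconds; this keeps the spacing inside the group intact.
    """
    wanted = set(selected)
    chosen = [item for item in indicators if item.label in wanted]
    if not chosen:
        return
    earliest = min(item.start for item in chosen)
    delta = max(delta, -earliest)
    for item in chosen:
        item.start += delta


if __name__ == "__main__":
    timeline = [
        Indicator("speech 1", 6.0, 3.56),
        Indicator("effect 2", 8.0, 2.46),
        Indicator("speech 2", 12.0, 2.24),
    ]
    move_group(timeline, ["speech 1", "effect 2"], 1.5)
    for item in timeline:
        print(item)
```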

図１６および図１７は、コンテンツ編集ツール３２０の第１３画面例１６００および第１４画面例１７００として、ダビングをさらに追加する例を示している。 FIGS. 16 and 17 show examples of adding further dubbing as a thirteenth screen example 1600 and a fourteenth screen example 1700 of the content editing tool 320.

第１３画面例１６００は、ユーザがタイムインジケータ４５０を「００：０５．７８」の位置に移動させた後、音声選択機能４２１によって音声タイプ「音声２」を選択し、テキスト入力機能４２２によってテキスト「はじめまして。」を入力した例を示している。このとき、ユーザがダビング追加ボタン４２４を選択する場合、第１４画面例１７００のように、テキスト「はじめまして。」に対応する第２音声合成のための音声合成インジケータ１７１０がタイムライン表示機能４４０の領域にサムネイルと関連して表示されてよい。上述したように、音声合成インジケータ１７１０には、対応するテキスト「はじめまして」の少なくとも一部（第１４画面例１７００の「はじめ」）と、第２音声合成の生成に使用された音声タイプの識別子（一例として、音声タイプ「音声２」の識別子（２）１７２０）が表示されてよい。 The thirteenth screen example 1600 shows an example in which the user moves the time indicator 450 to the position "00:05.78", selects the voice type "Voice 2" with the voice selection function 421, and enters the text "Nice to meet you." (「はじめまして。」) with the text input function 422. When the user then selects the add dubbing button 424, a speech synthesis indicator 1710 for the second speech synthesis corresponding to the text "Nice to meet you." may be displayed in the area of the timeline display function 440 in association with the thumbnails, as in the fourteenth screen example 1700. As described above, the speech synthesis indicator 1710 may display at least part of the corresponding text (「はじめ」, the beginning of 「はじめまして」, in the fourteenth screen example 1700) and the identifier of the voice type used to generate the second speech synthesis (for example, the identifier (2) 1720 of the voice type "Voice 2").

音声合成インジケータ１７１０の長さは、第２音声合成の長さに対応してよく、このような音声合成インジケータ１７１０の長さによって表示されるテキストの分量が異なってよい。このとき、第１３画面例１６００に示されたタイムインジケータ４５０の時刻は「００：０６．００」であり、第１４画面例１７００に示されたタイムインジケータ４５０の時刻は「００：０８．２４」である。言い換えれば、第２音声合成のための音声合成インジケータ１７１０の長さは、２．２４秒（００：０８．２４－００：０６．００＝００：０２．２４）であることが分かる。 The length of the speech synthesis indicator 1710 may correspond to the length of the second speech synthesis, and the amount of displayed text may vary according to the length of the speech synthesis indicator 1710 . At this time, the time of the time indicator 450 shown in the thirteenth screen example 1600 is "00:06.00", and the time of the time indicator 450 shown in the fourteenth screen example 1700 is "00:08.24". is. In other words, it can be seen that the length of the speech synthesis indicator 1710 for the second speech synthesis is 2.24 seconds (00:08.24-00:06.00=00:02.24).

一方、ユーザが第１３画面例１６００で試し聞きボタン４２３を選択する場合、テキスト「はじめまして」に対応する第２音声合成がユーザの電子機器のスピーカから出力されてよい。言い換えれば、電子機器は、コンテンツ編集ツール３２０の制御にしたがって第２音声合成をスピーカから出力してよい。 On the other hand, if the user selects the listen button 423 in the thirteenth example screen 1600, a second speech synthesis corresponding to the text "nice to meet you" may be output from the speaker of the user's electronic device. In other words, the electronic device may output the second speech synthesis from the speaker under control of the content editing tool 320 .

図１８は、コンテンツ編集ツール３２０の第１５画面例１８００として、効果音を追加する例を示している。第１５画面例１８００では、ユーザが効果音追加機能４３０によって効果音２を選択（一例として、点線枠１８１０内のプラスボタンをクリック）することにより、現在のタイムインジケータ４５０の時点を開始時点として効果音２のインジケータ１８２０が追加される例を示している。このとき、効果音２のインジケータ１８２０の長さは、点線枠１８１０に示したように２．４６秒であってよい。このようなインジケータ１８２０も、ユーザがクリック＆ドラッグによって他の時点に移動させることが可能である。 FIG. 18 shows an example of adding a sound effect as a fifteenth screen example 1800 of the content editing tool 320 . In the fifteenth screen example 1800, the user selects the sound effect 2 using the sound effect addition function 430 (for example, clicks the plus button in the dotted frame 1810), and the current time indicator 450 is set as the start time. An example is shown in which a sound 2 indicator 1820 is added. At this time, the duration of the sound effect 2 indicator 1820 may be 2.46 seconds as indicated by the dotted frame 1810 . Such an indicator 1820 can also be moved to another point in time by the user by clicking and dragging.

以上の実施形態では、サムネイルのための１つのチャンネルと音声合成のための１つのチャンネル、さらに効果音のための１つのチャンネルという合計３つのチャンネルによって、映像コンテンツを生成するための情報をタイムラインに沿って羅列する例について説明した。しかし、実施形態によっては、音声合成のための２つ以上のチャンネルおよび／または効果音のための２つ以上のチャンネルが使用されてもよい。 The above embodiments describe an example in which the information for generating the video content is arranged along the timeline using a total of three channels: one channel for thumbnails, one channel for speech synthesis, and one channel for sound effects. However, depending on the embodiment, two or more channels for speech synthesis and/or two or more channels for sound effects may be used.

図１９は、コンテンツ編集ツール３２０の第１６画面例１９００として、音声合成のための２つ以上のチャンネルを使用する例を示している。第１６画面例１９００では、２つの音声合成インジケータ１２１０、１７１０の一部分が重なって表示された例を示している。これは、少なくとも一部のタイムラインで２つの音声合成が同時に出力されることも可能であることを示している。図１９の実施形態では、音声合成のための２つのチャンネルが使用されることを示しているが、３つ以上のチャンネルも使用可能であることは容易に理解できるであろう。また、効果音のための２つ以上のチャンネルが使用可能であることも容易に理解できるであろう。 FIG. 19 shows an example sixteenth screen 1900 of the content editing tool 320 using two or more channels for speech synthesis. A sixteenth screen example 1900 shows an example in which two speech synthesis indicators 1210 and 1710 are partially overlapped and displayed. This indicates that it is also possible for two speech syntheses to be output simultaneously on at least some timelines. Although the embodiment of FIG. 19 shows that two channels are used for speech synthesis, it will be readily appreciated that more than two channels may be used. It will also be readily apparent that more than one channel for sound effects can be used.

図２０は、本発明の一実施形態における、映像コンテンツ生成方法の例を示したフローチャートである。本実施形態に係る映像コンテンツ生成方法は、コンテンツ編集ツール３２０によってコンテンツ編集支援のためのサービスを提供するコンピュータ装置２００で実行されてよい。このとき、コンピュータ装置２００のプロセッサ２２０は、メモリ２１０が含むオペレーティングシステムのコードと、少なくとも１つのコンピュータプログラムのコードとによる制御命令（ｉｎｓｔｒｕｃｔｉｏｎ）を実行するように実現されてよい。ここで、プロセッサ２２０は、コンピュータ装置２００に記録されたコードが提供する制御命令にしたがってコンピュータ装置２００が図２０の方法に含まれる段階２０１０～２０９０を実行するようにコンピュータ装置２００を制御してよい。 FIG. 20 is a flow chart illustrating an example of a video content generation method in one embodiment of the present invention. The video content generation method according to the present embodiment may be performed by the computer device 200 that provides services for content editing support by the content editing tool 320 . At this time, the processor 220 of the computing device 200 may be implemented to execute control instructions according to the operating system code and the at least one computer program code contained in the memory 210 . Here, processor 220 may control computing device 200 such that computing device 200 performs steps 2010-2090 included in the method of FIG. 20 according to control instructions provided by code recorded in computing device 200. .

段階２０１０で、コンピュータ装置２００は、コンテンツ編集ツールにアップロードされたイメージのスナップショットを抽出してよい。上述したように、イメージは、個別の複数のイメージや複数のイメージが含まれた１つのファイル、または１つのファイルと複数のイメージの組み合わせの形態でアップロードされてよい。特定の実施形態において、イメージは、イメージ化が可能な複数のページを含むファイルの形態でアップロードされてよい。一例として、ＰＤＦファイルがアップロードされる場合、コンピュータ装置２００は、ＰＤＦファイルからイメージを抽出して複数のイメージファイルとして保存してよく、複数のイメージファイルそれぞれに対するスナップショットを抽出してよい。 At step 2010, computing device 200 may extract a snapshot of the image uploaded to the content editing tool. As noted above, images may be uploaded in the form of individual images, a file containing images, or a combination of a file and images. In certain embodiments, images may be uploaded in the form of files containing multiple pages that can be imaged. As an example, if a PDF file is uploaded, the computing device 200 may extract images from the PDF file and store them as multiple image files, and may extract snapshots for each of the multiple image files.

段階２０２０で、コンピュータ装置２００は、抽出されたスナップショットをコンテンツ編集ツールでタイムラインに沿って表示してよい。ここで、表示されたスナップショットの長さは、表示されたスナップショットに対応するイメージが前記タイムライン上で占有する時間のランニングタイムに比例してよい。このとき、コンピュータ装置２００は、抽出されたスナップショットをデフォルトランニングタイムに比例する長さでコンテンツ編集ツールに表示してよい。図５では、４秒のデフォルトランニングタイムに比例する長さでスナップショットを表示する例について説明した。 At step 2020, the computing device 200 may display the extracted snapshots along a timeline with a content editing tool. Here, the length of the displayed snapshot may be proportional to the running time of the time occupied on the timeline by the image corresponding to the displayed snapshot. At this time, the computing device 200 may display the extracted snapshot on the content editing tool with a length proportional to the default running time. FIG. 5 illustrates an example of displaying a snapshot with a length proportional to the default running time of 4 seconds.

段階２０３０で、コンピュータ装置２００は、表示されたスナップショットの順序を変更するための機能を提供してよい。一例として、図８および図９では、サムネイル１とサムネイル２の位置を変更する例について説明した。実施形態によって、コンピュータ装置２００は、特定のサムネイルを削除するための機能をさらに提供してもよい。 At step 2030, computing device 200 may provide functionality for changing the order of the displayed snapshots. As an example, FIG. 8 and FIG. 9 describe an example of changing the positions of thumbnail 1 and thumbnail 2 . Depending on the embodiment, computing device 200 may further provide functionality for deleting a particular thumbnail.

段階２０４０で、コンピュータ装置２００は、コンテンツ編集ツールに表示されたスナップショットの長さを調節する長さ調節機能を提供してよい。一例として、コンピュータ装置２００は、表示されたスナップショットのうちの第１スナップショットに対し、予め設定された左側領域または右側領域に対するユーザのタッチ＆ドラッグまたはクリック＆ドラッグによって第１スナップショットの長さを増減させる機能を提供してよい。また、コンピュータ装置２００は、第１スナップショットの左側領域または右側領域に対するユーザのタッチまたはクリックが維持される間、第１スナップショットの左側終端部分または右側終端部分に対するタイムライン上の時点を表示してよい。一例として、図７および図８では、サムネイルの長さを増減することと、このときにタイムライン上の時点が該当のスナップショットの右側終端部分に表示される例について説明した。 At step 2040, the computing device 200 may provide a length adjustment function for adjusting the length of a snapshot displayed in the content editing tool. As an example, for a first snapshot among the displayed snapshots, the computing device 200 may provide a function for increasing or decreasing the length of the first snapshot by the user's touch-and-drag or click-and-drag on a preset left area or right area of the first snapshot. In addition, while the user's touch or click on the left area or right area of the first snapshot is maintained, the computing device 200 may display the time point on the timeline corresponding to the left end portion or right end portion of the first snapshot. As an example, FIGS. 7 and 8 describe increasing or decreasing the length of a thumbnail and displaying, at the right end portion of the corresponding snapshot, the time point on the timeline.

In step 2050, the computer device 200 may adjust the running time of the snapshot whose length was adjusted by the length adjustment function according to the adjusted length. As an example, the computer device 200 may increase or decrease the running time that the image corresponding to the length-adjusted snapshot occupies on the timeline in proportion to the amount by which the length of the snapshot was adjusted.
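Steps 2040 and 2050 can be sketched together as a conversion from a drag distance to a change in running time. The zoom factor, the lower bound `MIN_RUNNING_TIME_S`, and the function names are assumptions for illustration; the description states only that the change is proportional to the adjusted length.

```python
MIN_RUNNING_TIME_S = 0.5  # assumed lower bound; the description does not state one


def adjust_running_time(running_time_s, drag_px, pixels_per_second=30.0):
    """Running time after the right edge is dragged by drag_px (negative shortens)."""
    new_time = running_time_s + drag_px / pixels_per_second
    return max(MIN_RUNNING_TIME_S, new_time)


def right_edge_label(start_s, running_time_s):
    """Text shown at the right end of the snapshot while the drag is held."""
    return f"{start_s + running_time_s:.1f}s"
```

With a zoom factor of 30 pixels per second, dragging the right edge of a 4-second snapshot 60 pixels to the right yields a 6-second running time, and the label at the right end follows the new end point.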

In step 2060, the computer device 200 may generate speech synthesis for text entered into the content editing tool and add it at a selected point in time on the timeline. The computer device 200 may generate the speech synthesis for the text according to a voice type selected in the content editing tool. A number of voice types, generated in advance according to age, gender, language (Korean, English, Chinese, Japanese, Spanish, etc.), emotion (joy, sadness, etc.), and the like, may be offered to the user in the content editing tool, and the user may select a particular voice type to be used for the speech synthesis. In addition, the computer device 200 may add the generated speech synthesis at a specific point in time on the timeline that is selected by moving a time indicator that marks a specific point in time on the timeline. FIGS. 11 and 12 and FIGS. 16 and 17 showed examples of adding speech synthesis at the point in time selected with the time indicator 450.

Depending on the embodiment, when at least part of the running time of a first speech synthesis to be added to the timeline overlaps that of a second speech synthesis already added to the timeline, the computer device 200 may add the first speech synthesis to the timeline as an audio channel different from that of the second speech synthesis. In other words, dubbing may be performed so that two or more speech syntheses are output simultaneously in the generated video content. FIG. 19 showed an example in which two speech syntheses are added to the timeline as different channels.
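The channel rule can be illustrated with a minimal sketch that places a new clip on the lowest-numbered channel where it does not overlap an existing clip. The `SpeechClip` type, the integer channel numbering, and the "lowest free channel" policy are assumptions; the description requires only that overlapping speech syntheses end up on different channels.

```python
from dataclasses import dataclass


@dataclass
class SpeechClip:
    start_s: float      # point in time on the timeline where playback starts
    duration_s: float   # length of the speech synthesis in seconds
    channel: int = 0    # assigned when the clip is added to the timeline

    @property
    def end_s(self):
        return self.start_s + self.duration_s


def overlaps(a, b):
    """True when the running times of two clips share any interval."""
    return a.start_s < b.end_s and b.start_s < a.end_s


def add_clip(timeline, clip):
    """Add a clip on the lowest channel where it overlaps no existing clip."""
    channel = 0
    while any(c.channel == channel and overlaps(c, clip) for c in timeline):
        channel += 1
    clip.channel = channel
    timeline.append(clip)
    return clip
```

In this sketch, a clip that starts exactly when another ends is not treated as overlapping, so it can share the same channel.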

In addition, the computer device 200 may display, in the content editing tool, an indicator for the speech synthesis added at the selected point in time on the timeline. Depending on the embodiment, at least part of the text may be displayed through the indicator, and the length of the indicator may be proportional to the length of the speech synthesis. Here, the length of the speech synthesis may mean the time during which the speech synthesis is output.

Furthermore, based on user input on the indicator, the computer device 200 may output speech synthesis information that includes at least one of information on the voice type used to generate the speech synthesis, information on the length of the speech synthesis, and the text. The speech synthesis information may be displayed in association with the position at which the user input on the indicator occurs. As an example, FIG. 13 showed speech synthesis information 1310 being displayed for speech synthesis indicator 1210.
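A minimal sketch of the indicator and its information display follows. The truncation limit, the zoom factor, the dictionary layout, and the output format are all assumptions chosen for illustration; the description specifies only that part of the text may be shown and that the indicator length is proportional to the output time.

```python
def build_indicator(text, duration_s, pixels_per_second=30.0, max_chars=10):
    """Indicator showing part of the text, with width proportional to duration."""
    label = text if len(text) <= max_chars else text[:max_chars] + "..."
    return {"label": label, "width_px": duration_s * pixels_per_second}


def speech_synthesis_info(voice_type, duration_s, text):
    """Information shown when the user points at or taps the indicator."""
    return f"voice: {voice_type} | length: {duration_s:.1f}s | text: {text}"
```

For example, a 2.5-second speech synthesis produces an indicator 75 pixels wide under the assumed zoom factor, labelled with the first ten characters of its text.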

In step 2070, the computer device 200 may move the position on the timeline of the speech synthesis added to the timeline, based on user input. As an example, FIGS. 14 and 15 showed the position of speech synthesis being moved by user input such as click-and-drag or touch-and-drag.
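Moving a speech synthesis by dragging can be sketched as a conversion from drag distance to a new start time. The clamping to the start and end of the timeline, the zoom factor, and the name `move_speech` are assumptions of this sketch and are not stated in the description.

```python
def move_speech(start_s, drag_px, pixels_per_second=30.0, timeline_end_s=None):
    """New start time after a horizontal drag of drag_px pixels."""
    new_start = start_s + drag_px / pixels_per_second
    new_start = max(0.0, new_start)  # assumed: cannot move before the timeline start
    if timeline_end_s is not None:
        new_start = min(new_start, timeline_end_s)  # assumed upper bound
    return new_start
```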

In step 2080, the computer device 200 may receive a selection of one sound effect from among a plurality of sound effects provided in the content editing tool. As an example, FIG. 18 showed the sound effect addition function 430 offering a plurality of sound effects to the user and the user selecting one of them.

In step 2090, the computer device 200 may add the selected sound effect at a point in time selected on the timeline in the content editing tool. As an example, FIG. 18 showed sound effect 2 being added at the point in time selected with the time indicator 450.

Depending on the embodiment, at least some of steps 2010 to 2090 may be performed in parallel. As an example, steps 2040 and 2050 may be triggered by user input for length adjustment, steps 2060 and 2070 may be triggered by user input for adding speech synthesis, and steps 2080 and 2090 may be triggered by user input for adding a sound effect. Accordingly, the order of steps 2040 to 2090 may change depending on user input.

Thereafter, when the user requests generation of the video content, the computer device 200 may normalize the images to a size that fits the video content and then generate a moving image. Depending on the embodiment, the computer device 200 may insert a watermark and/or subtitles into the video content. The computer device 200 may then insert the speech synthesis and/or sound effects into the moving image in accordance with the timeline to generate the final video content.
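Two parts of this final step can be sketched: fitting each image to the video frame and deriving when each image is shown. The description does not say how the image size is normalized, so the aspect-preserving fit below, the 1920x1080 default frame, and both function names are assumptions for illustration only.

```python
def fit_size(image_w, image_h, frame_w=1920, frame_h=1080):
    """Largest size that fits the frame while keeping the image aspect ratio."""
    scale = min(frame_w / image_w, frame_h / image_h)
    return round(image_w * scale), round(image_h * scale)


def image_schedule(running_times_s):
    """(start_s, end_s) of each image in the video, in timeline order."""
    schedule = []
    start = 0.0
    for running_time in running_times_s:
        schedule.append((start, start + running_time))
        start += running_time
    return schedule
```

The resulting schedule gives the intervals during which each image is shown; speech synthesis and sound effects would then be mixed in at their own timeline positions.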

As described above, according to embodiments of the present invention, the speech synthesis that the user wants can be generated in real time for a plurality of images and dubbed at the playback start time that the user wants, and video content can be generated and provided from the plurality of images dubbed with the generated speech synthesis.

The system or device described above may be implemented by hardware components or by a combination of hardware components and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and one or more software applications that run on the OS. The processing device may also access, record, manipulate, process, and generate data in response to execution of the software. For ease of understanding, a single processing device is sometimes described as being used, but those skilled in the art will understand that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, a processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, virtual device, or computer recording medium or device in order to be interpreted by the processing device or to provide instructions or data to the processing device. Software may be distributed over computer systems connected by a network and may be recorded or executed in a distributed manner. Software and data may be recorded on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The medium may continuously record a computer-executable program or may record it temporarily for execution or download. The medium may be any of various recording or storage means in the form of a single piece of hardware or several pieces of hardware combined; it is not limited to a medium directly connected to a particular computer system and may be distributed over a network. Examples of the medium include magnetic media such as hard disks, floppy (registered trademark) disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and ROM, RAM, flash memory, and the like, configured to record program instructions. Other examples of the medium include recording or storage media managed by an application store that distributes applications, or by sites, servers, and the like that supply or distribute various other software. Examples of program instructions include not only machine code such as that generated by a compiler but also high-level language code executed by a computer using an interpreter or the like.

Although the embodiments have been described above with reference to limited embodiments and drawings, those skilled in the art will be able to make various modifications and variations from the above description. For example, appropriate results can be achieved even if the described techniques are performed in an order different from that of the described method, and/or if components such as the described systems, structures, devices, and circuits are coupled or combined in a form different from that of the described method, or are replaced or substituted by other components or equivalents.

Therefore, other embodiments that are equivalent to the claims also fall within the scope of the appended claims.

300: Content generation server
310: User
320: Content editing tool

Claims

1. A video content generation method for a computer device comprising at least one processor, the method comprising:
extracting, by the at least one processor, a snapshot of an image uploaded through a content editing tool;
displaying, by the at least one processor, the extracted snapshot along a timeline through the content editing tool;
providing, by the at least one processor, a length adjustment function for adjusting the length of the displayed snapshot through the content editing tool;
adjusting, by the at least one processor, the running time of the snapshot whose length has been adjusted by the length adjustment function according to the adjusted length; and
generating, by the at least one processor, speech synthesis for text input through the content editing tool and adding it at a selected point in time on the timeline.

2. The video content generation method of claim 1, wherein the length of the displayed snapshot is proportional to the running time that the image corresponding to the displayed snapshot occupies on the timeline, and
wherein displaying along the timeline includes displaying the extracted snapshot through the content editing tool with a length proportional to a default running time.

3. The video content generation method of claim 1 or 2, wherein providing the length adjustment function includes providing a function that increases or decreases the length of a first snapshot among the displayed snapshots in response to a user's touch-and-drag or click-and-drag on a preset left region or right region of the first snapshot.

4. The video content generation method of claim 3, wherein providing the length adjustment function includes displaying the point in time on the timeline that corresponds to the left end or the right end of the first snapshot while the user's touch or click on the left region or the right region of the first snapshot is maintained.

5. The video content generation method of any one of claims 1 to 4, wherein adjusting the running time according to the adjusted length includes increasing or decreasing the running time that the image corresponding to the length-adjusted snapshot occupies on the timeline in proportion to the amount by which the length was adjusted.

6. The video content generation method of any one of claims 1 to 5, wherein generating the speech synthesis and adding it at the selected point in time on the timeline includes generating the speech synthesis for the text according to a voice type selected through the content editing tool.

generating the speech synthesis and adding it to the timeline at the selected point in time;
wherein said generated speech synthesis is added at a specific point in time on said timeline selected by moving a time indicator representing a specific point in time on said timeline. The video content generation method according to any one of .

further comprising, by the at least one processor, moving a position on the timeline of the speech synthesis added to the timeline based on user input. The video content generation method according to the item.

9. The video content generation method of any one of claims 1 to 8, further comprising:
selecting, by the at least one processor, one sound effect from among a plurality of sound effects provided through the content editing tool; and
adding, by the at least one processor, the selected sound effect at a point in time selected on the timeline through the content editing tool.

10. The video content generation method of any one of claims 1 to 9, further comprising providing, by the at least one processor, a function for changing the order of the displayed snapshots.

11. The video content generation method of any one of claims 1 to 10, wherein the image is uploaded in the form of a file containing a plurality of pages that can be converted into images.

12. The video content generation method of any one of claims 1 to 5, wherein generating the speech synthesis and adding it at the selected point in time on the timeline includes, when at least part of the running time of a first speech synthesis to be added to the timeline overlaps that of a second speech synthesis already added to the timeline, adding the first speech synthesis to the timeline as an audio channel different from that of the second speech synthesis.

13. The video content generation method of any one of claims 1 to 5, wherein generating the speech synthesis and adding it at the selected point in time on the timeline includes displaying, through the content editing tool, an indicator for the speech synthesis added at the selected point in time on the timeline.

14. The video content generation method of claim 13, wherein at least part of the text is displayed through the indicator.

15. The video content generation method of claim 13, wherein the length of the indicator is proportional to the length of the speech synthesis.

16. The video content generation method of claim 13, wherein generating the speech synthesis and adding it at the selected point in time on the timeline includes displaying, based on user input on the indicator, at least one of information on the voice type used to generate the speech synthesis, information on the length of the speech synthesis, and the text.

17. A computer program for causing a computer device to perform the method of any one of claims 1 to 16.

18. A computer device comprising at least one processor implemented to execute computer-readable instructions, wherein the at least one processor:
extracts a snapshot of an image uploaded through a content editing tool;
displays the extracted snapshot along a timeline through the content editing tool;
provides a length adjustment function for adjusting the length of the displayed snapshot through the content editing tool;
adjusts the running time of the snapshot whose length has been adjusted by the length adjustment function according to the adjusted length; and
generates speech synthesis for text input through the content editing tool and adds it at a selected point in time on the timeline.

19. The computer device of claim 18, wherein, to provide the length adjustment function, the at least one processor provides a function that increases or decreases the length of a first snapshot among the displayed snapshots in response to a user's touch-and-drag or click-and-drag on a preset left region or right region of the first snapshot.