JP6641045B1 - Content generation system and content generation method

Info

Publication number: JP6641045B1
Application number: JP2019033926A
Authority: JP
Inventors: 鈴木　智也
Original assignee: みんとる合同会社
Priority date: 2019-02-27
Filing date: 2019-02-27
Publication date: 2020-02-05
Anticipated expiration: 2039-02-27
Also published as: JP2020140326A

Abstract

Problem: To generate content based on uttered speech. Solution: A content generation system comprises: a voice acquisition unit that acquires voice information of a speaker; a conversion unit that converts the voice information into text information; an analysis unit that analyzes the text information; a scenario information generation unit that generates scenario information based on the result of analyzing the text information; a material image database in which material images are stored; an association unit that divides the scenario information into predetermined editing units and generates correspondence information by associating the material images stored in the material image database with the editing units; and a moving image generation unit that generates a moving image by concatenating the material images associated with each editing unit. Selected drawing: FIG. 3

Description

The present invention relates to a content generation system and a content generation method capable of generating content based on, for example, uttered speech.

Various methods have conventionally been proposed for easily generating content such as moving images (including animation) and comics.

For example, Patent Document 1 describes that, when arbitrary moving image data is designated, images of important scenes are automatically extracted from the plurality of images included in the moving image data, the extracted images are rearranged and displayed in a comic-style layout, and screen effects such as speech balloons are automatically placed on the images.

As another example, Patent Document 2 describes extracting, based on an input video signal and its corresponding audio signal, video frames that mark changes in the video content as panel images, which are the constituent units of a comic; detecting the person region of a person appearing in each panel image; generating the person's lines, recognized from the audio signal by speech recognition, as character string information; and superimposing on the panel image, based on the person region, a speech balloon into which the character string information is inserted as the person's lines.

Patent Document 1: JP 2014-6912 A
Patent Document 2: JP 2003-85572 A

As described above, techniques exist for extracting images from a moving image to generate a comic. However, those techniques cannot generate content such as moving images or comics from uttered speech.

The present invention has been made in view of this situation, and its object is to make it possible to generate content based on uttered speech.

The present invention includes a plurality of means for solving at least part of the above problems; examples are as follows.

To solve the above problems, a content generation system according to one aspect of the present invention comprises: a voice acquisition unit that acquires voice information of a speaker; a conversion unit that converts the voice information into text information; an analysis unit that analyzes the text information; a scenario information generation unit that generates scenario information based on the result of analyzing the text information; a material image database in which material images are stored; an association unit that divides the scenario information into predetermined editing units and generates correspondence information by associating the material images stored in the material image database with each editing unit; and a moving image generation unit that generates a moving image by concatenating the material images associated with each editing unit.

The analysis unit can identify the 5W1H of the content as the result of analyzing the text information.

The analysis unit can identify the emotions of characters in the content.

The analysis unit can identify the emotions of characters in the content based on the voice information.

The analysis unit can identify metadata of the speaker based on the voice information.

The association unit can generate the correspondence information by associating, based on the metadata of the speaker, the material images stored in the material image database with each editing unit.

The association unit can generate the correspondence information by associating material images of a predetermined style, stored in the material image database, with each editing unit.

The scenario information generation unit can generate, based on the result of analyzing the text information, scenario information in which at least one of the utterances, behavior, situation, and emotions of characters in the content is arranged in time series.

The content generation system can include an operation unit that accepts operation input from a user, and a correction unit that corrects at least one of the scenario information and the correspondence information based on the operation input from the user.

The correction unit can learn from the corrections made to at least one of the scenario information and the correspondence information, and update the material image database based on the learning result.

The correction unit can correct the correspondence information by associating a material image input by the user with an editing unit of the scenario information.

The content generation system can include a collection unit that collects image data available as free material and registers it in the material image database as material images.

A content generation method according to another aspect of the present invention includes: a voice acquisition step of acquiring voice information of a speaker; a conversion step of converting the voice information into text information; an analysis step of analyzing the text information; a scenario information generation step of generating scenario information based on the result of analyzing the text information; an association step of dividing the scenario information into predetermined editing units and generating correspondence information by associating material images stored in a material image database with each editing unit; and a moving image generation step of generating a moving image by concatenating the material images associated with each editing unit.

According to one aspect of the present invention, content can be generated based on uttered speech.

Problems, configurations, and effects other than those described above will become apparent from the following description of the embodiment.

FIG. 1 is a diagram showing a configuration example of a content generation system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing a configuration example of the functional blocks of the terminal device.
FIG. 3 is a block diagram showing a configuration example of the functional blocks of the server device.
FIG. 4 is a diagram showing an example of the data structure of the scenario information.
FIG. 5 is a diagram showing an example of the data structure of the material image DB.
FIG. 6 is a diagram showing an example of the data structure of the correspondence information.
FIG. 7 is a flowchart illustrating an example of the content generation process.
FIG. 8 is a flowchart illustrating an example of the correction process.
FIG. 9 is a diagram showing a display example of the content display screen.
FIG. 10 is a block diagram showing a configuration example of a computer.

An embodiment of the present invention is described below with reference to the drawings. In all the drawings for describing the embodiment, the same members are in principle denoted by the same reference numerals, and repeated description of them is omitted. In the following embodiment, it goes without saying that the components (including element steps and the like) are not necessarily essential, except where explicitly stated or where they are clearly essential in principle. Likewise, expressions such as "consisting of A", "composed of A", "having A", and "including A" do not exclude other elements, except where it is explicitly stated that only that element is meant. Similarly, in the following embodiment, references to the shape, positional relationship, and the like of components include those that are substantially approximate or similar, except where explicitly stated or where this is clearly not the case in principle.

<Overview of Content Generation System According to One Embodiment of the Present Invention>
A content generation system according to one embodiment of the present invention takes speech uttered by a speaker as input and generates corresponding content such as a moving image or a comic.

The speaker utters speech by, for example, telling a story, a game plan, an experience, or a memory, or by reading aloud a book such as a novel. There may be one speaker or several.

FIG. 1 shows a configuration example of a content generation system 10 according to one embodiment of the present invention.

The content generation system 10 includes a terminal device 20 and a server device 30 connected via a network 11.

The network 11 consists of a LAN (Local Area Network), a WAN (Wide Area Network), the Internet, a public telephone network, or the like.

The terminal device 20 is a general-purpose computer such as a smartphone or a tablet computer.

The terminal device 20 acquires the uttered speech on which the content is based and transmits it as voice information to the server device 30 via the network 11. The terminal device 20 also receives the content transmitted from the server device 30 via the network 11, and plays back and displays it.

The server device 30 is located on a so-called cloud network. The server device 30 is a general-purpose computer such as a server computer. The server device 30 generates content based on the uttered speech transmitted from the terminal device 20.

<Configuration Example of Functional Blocks of Terminal Device 20>
FIG. 2 shows a configuration example of the functional blocks of the terminal device 20. The terminal device 20 includes the following functional blocks: a control unit 21, a communication unit 22, a voice acquisition unit 23, a voice output unit 24, an operation unit 25, and a display unit 26.

The control unit 21 is realized, for example, by a CPU (Central Processing Unit) built into a computer or the like executing a predetermined program. The control unit 21 controls the overall operation of the terminal device 20.

The communication unit 22 is realized, for example, by a communication device built into a computer or the like. The communication unit 22 connects to the server device 30 via the network 11 and exchanges various kinds of information. For example, the communication unit 22 transmits the speaker's voice information acquired by the voice acquisition unit 23 to the server device 30. The communication unit 22 also receives content (a moving image or a comic) transmitted from the server device 30.

The voice acquisition unit 23 is realized, for example, by a microphone included in an input device built into a computer or the like. The voice acquisition unit 23 acquires the speaker's voice and outputs the resulting voice information to the communication unit 22. The speaker's voice acquired by the voice acquisition unit 23 may be recorded and transmitted to the server device 30 after the utterance ends. Alternatively, voice information in which the speaker's voice was recorded in advance may be input to the terminal device 20 from outside and transmitted to the server device 30.

The voice output unit 24 is realized, for example, by a speaker included in an output device built into a computer or the like. The voice output unit 24 outputs, for example, the audio of content transmitted from the server device 30.

The display unit 26 is realized, for example, by a display included in an output device built into a computer or the like. The display unit 26 displays the screen of content supplied from the server device 30. Based on screen information supplied from the server device 30, the display unit 26 also displays a correction screen for correcting the scenario information 331 (FIG. 4) and the correspondence information 333 (FIG. 6).

<Configuration Example of Functional Blocks of Server Device 30>
FIG. 3 shows a configuration example of the functional blocks of the server device 30. The server device 30 includes the following functional blocks: a control unit 31, a communication unit 32, and a storage unit 33.

The control unit 31 is realized, for example, by a CPU built into a server computer executing a predetermined program. The control unit 31 controls the server device 30 as a whole. The control unit 31 has the following functional blocks: a conversion unit 311, an analysis unit 312, a scenario information generation unit 313, an association unit 314, a moving image generation unit 315, a correction unit 316, a collection unit 317, a comic generation unit 318, and a display control unit 319. At least some of the functional blocks of the control unit 31 may be realized using a learning model obtained by, for example, machine learning or deep learning.

The conversion unit 311 converts the voice information transmitted from the terminal device 20 into text information.

The analysis unit 312 analyzes the voice information and the text information and stores the analysis result. Specifically, for example, the analysis unit 312 detects the nuances of the speaker's uttered speech (intonation, stress, tone, speed, etc.) from the voice information and performs natural language processing on the text information. The analysis of the text information is not limited to natural language processing; any method may be applied. The analysis unit 312 then identifies the 5W1H of the content as the analysis result.

Specifically, for example, based on at least one of the nuances of the uttered speech and the result of natural language processing of the text information, the analysis unit 312 identifies the content title and the characters' (including the narrator's) emotions (joy, anger, sorrow, pleasure, etc.), utterances (lines), situation (date and time, season, era, place, etc.), behavior, and metadata (age group, gender, country of origin (language used), home region (accent of the language used), etc.).

The analysis unit 312 further identifies metadata of the speaker based on the nuances of the uttered speech. The user may also be allowed to input the content title, characters, and speaker metadata that the analysis unit 312 identifies.

Based on the analysis result from the analysis unit 312, the scenario information generation unit 313 generates scenario information 331, which corresponds to the script of the content generated by the server device 30, and stores it in the storage unit 33.

FIG. 4 shows an example of the data structure of the scenario information 331. In this example, the scenario information 331 is given a title, and the speaker's metadata is placed at time-series number 0. From time-series number 1 onward, at least one of a character's utterance (line) and behavior is placed in chronological order following the progress of the content. Each utterance or behavior is associated with the situation and emotion at the time the character spoke or acted. The characters' metadata is also placed in the scenario information 331. It is sufficient for the scenario information 331 to arrange at least one of the characters' utterances, behavior, situation, and emotions in time series.
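The layout described for FIG. 4 could be modeled roughly as follows. This is a minimal sketch, not the patent's actual format: the class names `ScenarioInfo` and `ScenarioEntry`, the field names, and the sample values are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ScenarioEntry:
    """One time-series slot (number 1 and later). Hypothetical structure."""
    index: int
    character: str
    utterance: Optional[str] = None   # spoken line, if any
    behavior: Optional[str] = None    # action, if any
    situation: dict = field(default_factory=dict)  # e.g. time, season, place
    emotion: Optional[str] = None

    def __post_init__(self):
        # The description asks for at least one of these four items per entry.
        if not (self.utterance or self.behavior or self.situation or self.emotion):
            raise ValueError("entry needs an utterance, behavior, situation, or emotion")


@dataclass
class ScenarioInfo:
    title: str
    speaker_metadata: dict            # conceptually stored at time-series number 0
    character_metadata: dict = field(default_factory=dict)
    entries: list = field(default_factory=list)

    def add(self, **kwargs) -> ScenarioEntry:
        # Entries are numbered from 1 in the order they are added.
        entry = ScenarioEntry(index=len(self.entries) + 1, **kwargs)
        self.entries.append(entry)
        return entry


scenario = ScenarioInfo(title="Summer trip", speaker_metadata={"language": "ja"})
scenario.add(character="narrator", utterance="We reached the beach at noon.",
             situation={"place": "beach", "time": "noon"}, emotion="joy")
scenario.add(character="Ken", behavior="runs toward the sea", emotion="joy")
```

Keeping the speaker metadata on the container rather than as a literal entry 0 is a design choice of this sketch; a flat list with a special first row would match the figure description just as well.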

Returning to FIG. 3, the association unit 314 divides the generated scenario information 331, according to its content, into predetermined editing units (for example, scenes), each containing at least one utterance or behavior of a character (including the narrator). For each editing unit of the scenario information 331, the association unit 314 then searches the material images stored in the material image DB (database) 332 for those that match the characters' emotions, situation, behavior, and metadata (not only exact matches but also similar ones). The association unit 314 then generates correspondence information 333, which associates a material image with each editing unit, and stores it in the storage unit 33.
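The source does not say how scene boundaries are chosen. As one possible illustration, the sketch below starts a new editing unit whenever the place changes; that rule, the function name `split_into_editing_units`, and the dictionary keys are assumptions, not part of the described system.

```python
def split_into_editing_units(entries):
    """Group consecutive scenario entries into scenes.

    Assumed rule: a new scene starts whenever the 'place' value changes.
    """
    units = []
    current_place = None
    for entry in entries:
        place = entry.get("place")
        if not units or place != current_place:
            units.append([])          # open a new editing unit
            current_place = place
        units[-1].append(entry)
    return units


entries = [
    {"index": 1, "place": "home", "utterance": "Let's go."},
    {"index": 2, "place": "home", "behavior": "opens the door"},
    {"index": 3, "place": "beach", "utterance": "We made it!"},
]
units = split_into_editing_units(entries)
```

Because a unit is only opened when an entry is about to be added, every editing unit contains at least one utterance or behavior, which is the one constraint the description does state.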

When searching for material images, the search may look for images in which the metadata of the characters matches the metadata of the speaker. For example, when the speaker's country of origin is Japan, material images in which Japanese characters appear and speak Japanese are retrieved. The search may also look for material images whose style matches the speaker's metadata. For example, when the speaker is a woman, material images whose style is aimed at women are retrieved. The user may also be allowed to select the style of the material images to be searched. When the search returns a plurality of material images, the association unit 314 may select the one to associate with each editing unit of the scenario information 331, or the user may be asked to select it.

FIG. 5 shows an example of the data structure of the material image DB 332. The material image DB 332 is generated in advance and stored in the storage unit 33. In the material image DB 332, material image data, material content, parameters, metadata, and style are recorded in association with a material ID. The material ID is an identifier for identifying a material image. The material image data is, for example, a moving image such as 3D motion data or 2D motion data, in any data format. The material image data may also be a still image, again in any data format. The material content records what the material image data shows (the characters, their behavior, and so on). The parameters record, in numerical form, the characters' emotions (joy, anger, sorrow, pleasure, etc.) and the situation (time, season, era, place) in the material image data. The metadata records the age group, gender, country of origin (language used), home region (accent of the language used), and so on of the characters in the material image data. The style records the style of the material image data (for general audiences, for men, for women, for children, for adults, period drama, Western style, Eastern style, live action, animation, etc.).
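Since the parameters are stored as numbers, one way to find a matching or similar material image is a nearest-neighbor search with an optional style filter. The sketch below is only an illustration under that assumption: the Euclidean distance, the record layout in `MATERIALS`, the parameter names, and the function `find_material` are hypothetical and not specified in the source.

```python
import math

# Hypothetical records; the real DB also holds image data, content, and metadata.
MATERIALS = [
    {"id": "M001", "style": "animation", "params": {"joy": 0.9, "anger": 0.0, "daytime": 1.0}},
    {"id": "M002", "style": "animation", "params": {"joy": 0.1, "anger": 0.8, "daytime": 1.0}},
    {"id": "M003", "style": "live action", "params": {"joy": 0.9, "anger": 0.0, "daytime": 1.0}},
]


def distance(a, b):
    """Euclidean distance over the union of parameter names (missing = 0.0)."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def find_material(unit_params, materials, style=None):
    """Return the closest material, optionally restricted to one style."""
    candidates = [m for m in materials if style is None or m["style"] == style]
    if not candidates:
        return None
    return min(candidates, key=lambda m: distance(unit_params, m["params"]))


best = find_material({"joy": 0.8, "anger": 0.1, "daytime": 1.0}, MATERIALS, style="animation")
```

Returning the closest record rather than requiring equality reflects the statement that similar images count as matches; a production system would more likely return the top few candidates so that the user can choose among them.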

FIG. 6 shows an example of the data structure of the correspondence information 333. In the correspondence information 333, a material ID is recorded in association with the time-series number (in FIG. 6, the scene number) of each editing unit of the scenario information 331.

Returning to FIG. 3, the moving image generation unit 315 acquires, based on the generated correspondence information 333, the material image data corresponding to each material ID from the material image DB 332 and concatenates the data in chronological order. The moving image generation unit 315 further refers to the scenario information 331 and either superimposes the characters' utterances (lines) as subtitles on the concatenated material image data or synthesizes speech corresponding to the utterances, thereby generating a moving image, as content, that includes subtitles or synthesized speech, and stores it in the storage unit 33.
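The concatenation and subtitle step can be pictured as building a timeline. The following sketch works on IDs and durations only and does no actual video encoding; the function `build_timeline`, the duration table, and the cue format `(start, end, text)` are assumptions for illustration.

```python
def build_timeline(correspondence, clip_durations, lines):
    """Order clips by scene number and compute subtitle cue times.

    correspondence: {scene_no: material_id}
    clip_durations: {material_id: seconds}   (hypothetical lookup)
    lines:          {scene_no: subtitle text}
    """
    sequence, cues, clock = [], [], 0.0
    for scene_no in sorted(correspondence):
        material_id = correspondence[scene_no]
        duration = clip_durations[material_id]
        sequence.append(material_id)
        if scene_no in lines:
            # Show the line for the whole duration of its scene's clip.
            cues.append((clock, clock + duration, lines[scene_no]))
        clock += duration
    return sequence, cues


correspondence = {2: "M007", 1: "M001"}
clip_durations = {"M001": 4.0, "M007": 2.5}
lines = {1: "We reached the beach at noon."}
sequence, cues = build_timeline(correspondence, clip_durations, lines)
```

The resulting sequence and cue list could then be handed to whatever video tool performs the real concatenation and subtitle overlay.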

The correction unit 316 corrects and supplements the scenario information 331 based on correction instructions from the user transmitted from the terminal device 20. The correction unit 316 likewise corrects the correspondence information 333 based on correction instructions from the user transmitted from the terminal device 20. When correcting the correspondence information 333, the user may additionally register arbitrary material image data (either a moving image or a still image) in the material image DB 332 and associate it with any editing unit of the scenario information 331. The correction unit 316 also learns from the corrections made to the scenario information 331 and the correspondence information 333, and updates the material content and parameters in the material image DB 332 based on the learning result.
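The source does not describe the learning method. Purely as an illustration, the sketch below treats a user's reassignment as a signal and moves the chosen material's numeric parameters a small step toward the parameters of the scene it was assigned to. The update rule, the `rate` value, and the function name `apply_correction` are assumptions.

```python
def apply_correction(correspondence, materials, scene_no, material_id, unit_params, rate=0.2):
    """Reassign a scene's material and nudge that material's parameters.

    Assumed rule: new = old + rate * (scene value - old), per parameter.
    """
    correspondence[scene_no] = material_id
    params = materials[material_id]["params"]
    for key, target in unit_params.items():
        current = params.get(key, 0.0)
        params[key] = current + rate * (target - current)
    return correspondence


correspondence = {1: "M001", 2: "M002"}
materials = {"M001": {"params": {"joy": 0.9}}, "M003": {"params": {"joy": 0.5}}}
apply_correction(correspondence, materials, scene_no=2, material_id="M003",
                 unit_params={"joy": 1.0})
```

With this rule, a material that users repeatedly choose for joyful scenes gradually scores higher on joy, so later searches are more likely to return it for similar scenes.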

The collection unit 317 collects image data (moving image data and still image data) published on the Internet as so-called free material, analyzes the collected image data to set its material content, parameters, metadata, and style, and registers it in the material image DB 332 as material image data.

The comic generation unit 318 generates a comic, as content, based on the moving image generated by the moving image generation unit 315 and stores it in the storage unit 33. Specifically, for example, it extracts a representative image from each scene of the moving image and superimposes the characters' utterances (lines) on the representative image as speech balloons, thereby generating a comic as content.
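The panel-building step might look like the sketch below. How the representative image is chosen is not stated in the source, so picking the middle frame of each scene is an assumption, as are the function name `make_comic_panels` and the use of strings as stand-ins for frames.

```python
def make_comic_panels(scenes):
    """Build one panel per scene: a representative frame plus a balloon text.

    Assumed rule: the middle frame of a scene is its representative image.
    """
    panels = []
    for scene in scenes:
        frames = scene["frames"]
        if not frames:
            continue                      # nothing to draw for an empty scene
        representative = frames[len(frames) // 2]
        panels.append({"image": representative, "balloon": scene.get("line", "")})
    return panels


scenes = [
    {"frames": ["f0", "f1", "f2"], "line": "We made it!"},
    {"frames": ["g0", "g1"]},
]
panels = make_comic_panels(scenes)
```

A real implementation would draw the balloon onto the image and position it near the speaking character; here the text is simply carried alongside the chosen frame.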

In response to a request from the terminal device 20, the display control unit 319 supplies the generated content to the terminal device 20 for playback and display. Also in response to a request from the terminal device 20, the display control unit 319 generates screen information for displaying a correction screen on which the user can correct the scenario information 331 and the correspondence information 333. The generated screen information is transmitted to the terminal device 20 by the communication unit 32.

The communication unit 32 is realized, for example, by a communication device built into the server computer. The communication unit 32 exchanges various kinds of information with the terminal device 20 connected via the network 11. For example, the communication unit 32 receives the voice information transmitted from the terminal device 20. The communication unit 32 also transmits content (a moving image or a comic) and screen information to the terminal device 20 via the network 11.

The storage unit 33 is realized, for example, by a storage device built into the server computer. The storage unit 33 stores the scenario information 331, the material image DB 332, the correspondence information 333, and content 334. The content 334 stored in the storage unit 33 may be supplied not only to the terminal device 20 that transmitted the uttered speech on which the content 334 is based but also to other predetermined terminal devices and the like.

Some or all of the functional blocks of the server device 30 shown in FIG. 3 may instead be provided in the terminal device 20.

<Content Generation Process by Content Generation System 10>
Next, the content generation process performed by the content generation system 10 is described. FIG. 7 is a flowchart illustrating an example of the content generation process performed by the content generation system 10.

該コンテンツ作成処理は、例えば、端末装置２０に対するユーザ（発話者等）からの所定の開始操作に応じて開始される。 The content creation process is started, for example, in response to a predetermined start operation on the terminal device 20 by a user (speaker or the like).

First, in the terminal device 20, the voice acquisition unit 23 acquires the speaker's uttered voice and outputs the resulting voice information to the communication unit 22, and the communication unit 22 transmits the voice information from the voice acquisition unit 23 to the server device 30 via the network 11. In the server device 30, the communication unit 32 outputs the voice information received from the terminal device 20 to the control unit 31 (step S1).

Next, the control unit 31 uses the conversion unit 311 to convert the speaker's voice information into text information (step S2).

Next, the control unit 31 uses the analysis unit 312 to analyze the voice information and the text information (step S3). The control unit 31 then uses the scenario information generation unit 313 to generate the scenario information 331 based on the analysis result of the analysis unit 312 and stores it in the storage unit 33 (step S4).

Next, the control unit 31 uses the association unit 314 to divide the scenario information 331 into predetermined editing units (step S5), refers to the material image DB 332, generates the correspondence information 333 in which a material image is associated with each editing unit of the scenario information 331, and stores it in the storage unit 33 (step S6).
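Steps S5 and S6 can be pictured with the following minimal Python sketch. It is illustrative only: the `EditUnit` and `MaterialImage` structures, the rule that one scenario line forms one editing unit, and the keyword/tag overlap score are all assumptions made for this example and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass
class EditUnit:
    """One editing unit of the scenario (hypothetical structure)."""
    index: int
    text: str
    keywords: set


@dataclass
class MaterialImage:
    """One entry of the material image DB (hypothetical structure)."""
    material_id: str
    tags: set


def split_into_edit_units(scenario_lines):
    """Step S5: treat each non-empty scenario line as one editing unit (an assumption)."""
    units = []
    for line in scenario_lines:
        line = line.strip()
        if line:
            units.append(EditUnit(len(units), line, set(line.lower().split())))
    return units


def associate(units, material_db):
    """Step S6: map each unit to the material whose tags overlap its keywords most."""
    correspondence = {}
    for unit in units:
        best = max(material_db, key=lambda m: len(m.tags & unit.keywords))
        correspondence[unit.index] = best.material_id
    return correspondence


material_db = [
    MaterialImage("m001", {"park", "walk"}),
    MaterialImage("m002", {"rain", "umbrella"}),
]
units = split_into_edit_units(["Taro takes a walk in the park", "", "Suddenly rain starts"])
correspondence = associate(units, material_db)
print(correspondence)
```

The resulting dictionary, keyed by editing-unit index, plays the role of the correspondence information 333 in this sketch. A real system would presumably use richer analysis results than whitespace-separated words.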

Next, the control unit 31 uses the moving image generation unit 315 to acquire, based on the correspondence information 333, the material image data corresponding to each material ID from the material image DB 332 and to concatenate the data in chronological order. The moving image generation unit 315 further refers to the scenario information 331 and either superimposes the characters' utterances (lines) as subtitles on the concatenated material image data or synthesizes voice corresponding to the utterances, thereby generating a moving image that includes subtitles or synthesized voice, and stores it in the storage unit 33 as the content 334 (step S7).
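The concatenation in step S7 can be sketched as building a timeline of clips, as below. This is a simplified illustration under assumptions: the `Clip` structure, the `durations` table, and the per-unit `lines` dictionary are hypothetical, and actual video encoding and speech synthesis are omitted.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    """A material image clip placed on the output timeline (hypothetical structure)."""
    material_id: str
    start: float      # seconds from the start of the moving image
    duration: float
    subtitle: str


def build_timeline(correspondence, durations, lines):
    """Concatenate clips in editing-unit order and attach each unit's line as a subtitle."""
    timeline = []
    cursor = 0.0
    for unit_index in sorted(correspondence):
        material_id = correspondence[unit_index]
        duration = durations[material_id]
        timeline.append(Clip(material_id, cursor, duration, lines.get(unit_index, "")))
        cursor += duration
    return timeline


correspondence = {1: "m002", 0: "m001"}
durations = {"m001": 3.0, "m002": 2.5}
lines = {0: "Nice weather today.", 1: "Oh no, it is raining!"}
timeline = build_timeline(correspondence, durations, lines)
for clip in timeline:
    print(f"{clip.start:4.1f}s {clip.material_id} {clip.subtitle}")
```

Sorting by editing-unit index stands in for the chronological ordering; each clip starts where the previous one ends.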

Next, the control unit 31 uses the comic generation unit 318 to generate a comic based on the moving image and stores it in the storage unit 33 as the content 334 (step S8).

This completes the content generation processing by the content generation system 10. The generated content 334 is supplied to the requester and played back in response to a request from the terminal device 20 or the like.

With the content generation processing described above, a moving image and a comic can be generated as content based on the speaker's uttered voice.
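The overall order of steps S2 to S8 can be summarized in a short Python sketch. The function `generate_content` and all of its stage parameters are hypothetical names introduced for illustration; the lambdas are toy stand-ins so the sketch runs, not real speech recognition or rendering.

```python
def generate_content(voice, to_text, analyze, make_scenario, associate, make_video, make_comic):
    """Run steps S2 to S8 in order; every stage is supplied by the caller."""
    text = to_text(voice)                          # S2: voice information -> text information
    analysis = analyze(voice, text)                # S3: analyze voice and text
    scenario = make_scenario(analysis)             # S4: scenario information
    correspondence = associate(scenario)           # S5-S6: editing units -> material images
    video = make_video(scenario, correspondence)   # S7: moving image
    comic = make_comic(video)                      # S8: comic derived from the moving image
    return {"scenario": scenario, "correspondence": correspondence,
            "video": video, "comic": comic}


# Toy stand-ins so the sketch runs end to end (assumptions, not a real recognizer).
result = generate_content(
    voice=b"pcm-bytes",
    to_text=lambda v: "Taro walks. Rain starts.",
    analyze=lambda v, t: [s.strip() for s in t.split(".") if s.strip()],
    make_scenario=lambda a: a,
    associate=lambda s: {i: f"m{i:03d}" for i, _ in enumerate(s)},
    make_video=lambda s, c: [c[i] for i in range(len(s))],
    make_comic=lambda video: [f"panel:{m}" for m in video],
)
print(result["comic"])
```

Passing the stages in as callables keeps the sketch independent of any particular recognizer or renderer, which matches the description's point that the functional blocks may be placed on either the server or the terminal.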

<Correction processing by the content generation system 10>
Next, correction processing by the content generation system 10 will be described. FIG. 8 is a flowchart illustrating an example of the correction processing performed by the content generation system 10.

The correction processing is started when the user inputs a correction request to the terminal device 20 and the correction request is transmitted to the server device 30 and input to the control unit 31.

First, the control unit 31 uses the correction unit 316 to determine whether the correction request from the user requests correction of the scenario information 331 (step S11).

If the correction unit 316 determines that the correction request requests correction of the scenario information 331 (YES in step S11), the processing proceeds to step S12. The control unit 31 then uses the display control unit 319 to generate screen information for a correction screen for correcting the scenario information 331, and the communication unit 32 transmits the generated screen information to the terminal device 20. The terminal device 20 displays a screen for correcting the scenario information 331 based on the received screen information. When the user inputs a correction instruction on that screen using the operation unit 25, the communication unit 22 transmits the correction instruction to the server device 30. In the server device 30, the control unit 31 then uses the correction unit 316 to correct and supplement the scenario information 331 stored in the storage unit 33 in accordance with the user's correction instruction (step S12).

If the correction unit 316 determines that the correction request does not request correction of the scenario information 331 (NO in step S11), step S12 is skipped and the processing proceeds to step S13.

Next, the control unit 31 uses the correction unit 316 to determine whether the correction request from the user requests correction of the correspondence information 333 (step S13).

If the correction unit 316 determines that the correction request requests correction of the correspondence information 333 (YES in step S13), the processing proceeds to step S14. The control unit 31 then uses the display control unit 319 to generate screen information for a correction screen for correcting the correspondence information 333, and the communication unit 32 transmits the generated screen information to the terminal device 20. The terminal device 20 displays a screen for correcting the correspondence information 333 based on the received screen information. When the user inputs a correction instruction on that screen using the operation unit 25, the communication unit 22 transmits the correction instruction to the server device 30. In the server device 30, the control unit 31 then uses the correction unit 316 to correct the correspondence information 333 stored in the storage unit 33 in accordance with the user's correction instruction (step S14).

If the correction unit 316 determines that the correction request does not request correction of the correspondence information 333 (NO in step S13), step S14 is skipped.

Next, the control unit 31 uses the moving image generation unit 315 to regenerate a moving image as content based on the corrected scenario information 331 and correspondence information 333 and stores it in the storage unit 33, and uses the comic generation unit 318 to regenerate a comic as content based on the regenerated moving image and stores it in the storage unit 33 (step S15).

Next, the control unit 31 uses the correction unit 316 to learn the results of the corrections made to the scenario information 331 and the correspondence information 333, and updates the material contents and parameters in the material image DB 332 based on the learning result (step S16). The scenario information generation unit 313 and the association unit 314 may likewise learn the results of the corrections to the scenario information 331 and the correspondence information 333 and apply them to subsequent processing. This completes the correction processing by the content generation system 10.

With the correction processing by the content generation system 10, the user can correct the scenario information 331 and the correspondence information 333 and obtain content that reflects the corrections. Furthermore, because the results of corrections based on the user's correction instructions are learned and reflected in the material image DB 332, more appropriate material images are retrieved when content is generated thereafter.
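The branching of steps S11 to S16 can be sketched as follows. The request format (a plain dictionary), the `weights` table, and the plus-one/minus-one update are assumptions chosen only to make the flow concrete; the specification does not state how the learning in step S16 is performed.

```python
def apply_correction(request, scenario, correspondence, weights, regenerate):
    """Sketch of steps S11 to S16 for one correction request (a plain dict here)."""
    replaced = []
    if "scenario" in request:                        # S11 YES -> S12
        scenario.update(request["scenario"])
    if "correspondence" in request:                  # S13 YES -> S14
        for unit, new_id in request["correspondence"].items():
            replaced.append((correspondence.get(unit), new_id))
            correspondence[unit] = new_id
    content = regenerate(scenario, correspondence)   # S15: regenerate the content
    for old_id, new_id in replaced:                  # S16: toy learning rule (assumption)
        weights[new_id] = weights.get(new_id, 0) + 1
        if old_id is not None and old_id != new_id:
            weights[old_id] = weights.get(old_id, 0) - 1
    return content


scenario = {0: "Taro walks", 1: "Rain starts"}
correspondence = {0: "m001", 1: "m002"}
weights = {}
content = apply_correction(
    {"correspondence": {1: "m007"}},
    scenario, correspondence, weights,
    regenerate=lambda s, c: [(c[i], s[i]) for i in sorted(s)],
)
print(content)
print(weights)
```

In this sketch a material chosen by the user gains weight and the one it replaced loses weight, so a later association step could prefer the higher-weighted material. That is one possible reading of "reflected in the material image DB 332", not the only one.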

<Content display screen>
Next, FIG. 9 shows a display example of the content display screen 50 on the terminal device 20 or the like.

The content display screen 50 is provided with a content playback area 51 in which the content is displayed. A subtitle 53 can be displayed in the content playback area 51. A title 52 of the content is displayed above the content playback area 51. Below the content playback area 51, there are provided an operation button 54 with which the user instructs the start of content playback, an operation button 55 for instructing fast rewind of the content, and an operation button 56 for instructing fast forward of the content. The display positions of the title 52, the subtitle 53, and the operation buttons 54 to 56 are not limited to the illustrated example and may be chosen freely.

To the right of the content playback area 51, style buttons 61 to 65 are provided, which indicate the current style of the content and allow the user to instruct a change of the content's style. In the illustrated example, the style (for women) button 62 among the style buttons 61 to 65 is highlighted, indicating that the current style of the content is for women.

In this state, if the user selects, for example, the style (for children) button 63, steps S6 to S8 of the content generation processing described above (FIG. 7) are executed again, and the content is regenerated using material image data for children and displayed in the content playback area 51.
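The effect of a style change on the association in step S6 can be illustrated with the sketch below. The material DB layout (a dictionary with `tags` and `style` fields), the single keyword per editing unit, and the function names are hypothetical; steps S7 and S8 would then run on the returned mapping.

```python
def pick_material(keyword, style, material_db):
    """Return the ID of the first material matching both keyword and style, else None."""
    for material_id, entry in material_db.items():
        if keyword in entry["tags"] and entry["style"] == style:
            return material_id
    return None


def regenerate_for_style(unit_keywords, style, material_db):
    """Redo the association (S6) for a new style; S7 and S8 would follow on this result."""
    return {unit: pick_material(kw, style, material_db) for unit, kw in unit_keywords.items()}


material_db = {
    "m001": {"tags": {"park"}, "style": "women"},
    "m002": {"tags": {"park"}, "style": "children"},
    "m003": {"tags": {"rain"}, "style": "children"},
}
unit_keywords = {0: "park", 1: "rain"}
print(regenerate_for_style(unit_keywords, "children", material_db))
```

Because the scenario itself is unchanged, only the association and the later rendering steps need to be repeated, which is consistent with the description re-executing steps S6 to S8 rather than the whole flow.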

The display positions and number of the style buttons 61 to 65 are not limited to the illustrated example and may be chosen freely. Alternatively, only a single style button may be displayed, and the style options may be shown in a pop-up when that style button is operated.

<Configuration example of a general computer>
As described above, the terminal device 20 and the server device 30 can each be realized by a general computer.

FIG. 10 shows a configuration example of a general computer that realizes the terminal device 20 and the server device 30.

In the computer 100, a CPU 101, a ROM (Read Only Memory) 102, and a RAM (Random Access Memory) 103 are interconnected by a bus 104.

An input/output interface 105 is also connected to the bus 104. An input device 106, an output device 107, a storage device 108, a communication device 109, and a drive device 110 are connected to the input/output interface 105.

The input device 106 includes a keyboard, a mouse, a touch panel, a microphone, and the like, and functions as, for example, the voice acquisition unit 23 and the operation unit 25 (FIG. 2) of the terminal device 20. The output device 107 includes a display, a speaker, and the like, and functions as, for example, the voice output unit 24 and the display unit 26 (FIG. 2) of the terminal device 20.

The storage device 108 includes an HDD (Hard Disk Drive), an SSD (Solid State Drive), or the like, and functions as, for example, the storage unit 33 (FIG. 3) of the server device 30. The communication device 109 includes a LAN interface or a Wi-Fi interface, and functions as, for example, the communication unit 22 (FIG. 2) of the terminal device 20 and the communication unit 32 (FIG. 3) of the server device 30. The drive device 110 drives a removable medium 111 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.

In the computer 100 configured as described above, the CPU 101 loads, for example, a program stored in the storage device 108 into the RAM 103 via the input/output interface 105 and the bus 104 and executes it, thereby realizing, for example, the control unit 21 (FIG. 2) of the terminal device 20 and the control unit 31 (FIG. 3) of the server device 30.

The program executed by the computer 100 (CPU 101) can be provided, for example, by being recorded on the removable medium 111 as a packaged medium or the like. The program can also be provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting.

In the computer 100, the program can be installed in the storage device 108 via the input/output interface 105 by mounting the removable medium 111 on the drive device 110. The program can also be received by the communication device 109 via a wired or wireless transmission medium and installed in the storage device 108. Alternatively, the program can be installed in advance in the ROM 102 or the storage device 108.

The program executed by the computer 100 may be a program in which processing is performed in time series in the order described in this specification, or a program in which processing is performed in parallel or at a necessary timing, such as when a call is made.

The effects described in this specification are merely examples and are not limiting; other effects may also be obtained.

The present invention is not limited to the embodiments described above and includes various modifications. For example, the above embodiments have been described in detail in order to explain the present invention clearly, and the present invention is not necessarily limited to configurations that include all of the described components. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. In addition, for part of the configuration of each embodiment, other configurations can be added, deleted, or substituted.

Some or all of the configurations, functions, processing units, processing means, and the like described above may be realized in hardware, for example by designing them as an integrated circuit. The configurations, functions, and the like described above may also be realized in software, with a processor interpreting and executing a program that implements each function. Information such as programs, tables, and files that implement each function can be placed in a storage device such as a semiconductor memory, an HDD, or an SSD, or on a recording medium such as an IC card, an SD card, or a DVD. The control lines and information lines shown are those considered necessary for the explanation, and not all control lines and information lines of a product are necessarily shown. In practice, almost all of the components may be considered to be interconnected.

10: content generation system, 11: network, 20: terminal device, 21: control unit, 22: communication unit, 23: voice acquisition unit, 24: voice output unit, 25: operation unit, 26: display unit, 30: server device, 31: control unit, 32: communication unit, 33: storage unit, 50: content display screen, 51: content playback area, 52: title, 53: subtitle, 54 to 56: operation buttons, 61 to 65: style buttons, 100: computer, 101: CPU, 102: ROM, 103: RAM, 104: bus, 105: input/output interface, 106: input device, 107: output device, 108: storage device, 109: communication device, 110: drive device, 111: removable medium, 311: conversion unit, 312: analysis unit, 313: scenario information generation unit, 314: association unit, 315: moving image generation unit, 316: correction unit, 317: collection unit, 318: comic generation unit, 319: display control unit, 331: scenario information, 332: material image DB, 333: correspondence information, 334: content

Claims

A content generation system comprising:
a voice acquisition unit that acquires voice information of a speaker;
a conversion unit that converts the voice information into text information;
an analysis unit that analyzes the text information;
a scenario information generation unit that generates scenario information based on an analysis result of the text information;
a material image database in which material images are stored;
an association unit that divides the scenario information into predetermined editing units and generates correspondence information by associating the material images stored in the material image database with each editing unit; and
a moving image generation unit that generates a moving image by concatenating the material images associated with each editing unit,
wherein the analysis unit specifies metadata of the speaker based on the voice information, and
the association unit generates the correspondence information by associating the material images stored in the material image database with each editing unit based on the metadata of the speaker.

The content generation system according to claim 1, wherein the analysis unit specifies 5W1H in the content as the analysis result of the text information.

The content generation system according to claim 1 or 2, wherein the analysis unit specifies an emotion of a character in the content.

The content generation system according to claim 3, wherein the analysis unit specifies the emotion of the character in the content based on the voice information.

The content generation system according to claim 1, wherein the analysis unit specifies, as the metadata of the speaker and based on the voice information, at least one of age, gender, country of origin, language used, region of origin, and accent of the language used.

The content generation system according to any one of claims 1 to 5, wherein the association unit generates the correspondence information by associating the material images of a predetermined style stored in the material image database with each editing unit based on the metadata of the speaker.

The content generation system according to claim 6, wherein the material image database stores the material images of at least one of the following styles: for all audiences, for men, for women, for children, for adults, period drama, Western style, Eastern style, live action, and animation.

The content generation system according to any one of claims 1 to 7, wherein the scenario information generation unit generates, based on the analysis result of the text information, the scenario information in which at least one of a statement, an action, a situation, and an emotion of a character in the content is arranged in time series.

The content generation system according to any one of claims 1 to 8, further comprising:
an operation unit that receives an operation input from a user; and
a correction unit that corrects at least one of the scenario information and the correspondence information based on the operation input from the user.

The content generation system according to claim 9, wherein the correction unit learns a correction result for at least one of the scenario information and the correspondence information, and updates the material image database based on the learning result.

The content generation system according to claim 9, wherein the correction unit corrects the correspondence information by associating a material image input by the user with the editing unit of the scenario information.

The content generation system according to any one of claims 1 to 11, further comprising a collection unit that collects image data available as free material and registers the image data in the material image database as the material images.

A content generation method performed by a content generation system, the method comprising:
a voice acquisition step of acquiring voice information of a speaker;
a conversion step of converting the voice information into text information;
an analysis step of analyzing the text information;
a scenario information generation step of generating scenario information based on an analysis result of the text information;
an association step of dividing the scenario information into predetermined editing units and generating correspondence information by associating material images stored in a material image database with each editing unit; and
a moving image generation step of generating a moving image by concatenating the material images associated with each editing unit,
wherein the analysis step specifies metadata of the speaker based on the voice information, and
the association step generates the correspondence information by associating the material images stored in the material image database with each editing unit based on the metadata of the speaker.