JP5777233B1 - Movie generation apparatus and movie generation method

Info

Publication number: JP5777233B1
Application number: JP2015051364A
Authority: JP
Inventors: 鉄兵前田
Original assignee: 株式会社アクアティカ
Priority date: 2015-03-13
Filing date: 2015-03-13
Publication date: 2015-09-09
Anticipated expiration: 2035-03-13
Also published as: JP2016171529A

Abstract

[Problem] To generate a moving image corresponding to a character in real time. [Solution] A moving image generating apparatus 1 includes: a voice receiving unit 141 that receives input of a character's voice from a voice operator; a voice output unit 142 that outputs the received voice to a terminal of a motion operator; a motion receiving unit 143 that, after the voice is output, receives a motion of the character corresponding to the voice from the motion operator terminal 2; a synthesizing unit 145 that synthesizes the voice with a moving image corresponding to the motion to generate a moving image of the character; and a moving image output unit 146 that outputs the generated moving image of the character. [Selected drawing] FIG. 2

Description

The present invention relates to a moving image generating apparatus and a moving image generating method.

Conventionally, in the production of moving images such as anime in which characters appear, voice actors record the voices of the characters appearing in the moving image (see, for example, Patent Document 1). In such production, a scenario is created first, and a silent moving image is generated based on the scenario. A voice actor then records the voice based on the scenario while watching the character's motion in the moving image.

Patent Document 1: Japanese Unexamined Patent Application Publication No. H5-232601 (JP-A-5-232601)

However, because conventional production first creates a silent moving image based on a scenario and only afterwards records voice over it, completing a moving image with voice takes time. As a result, it has not been possible, for example, to broadcast live with a character such as an anime character, or to have a character respond in real time to requests from viewers.

The present invention has been made in view of these points, and an object thereof is to provide a moving image generating apparatus and a moving image generating method capable of generating a moving image corresponding to a character in real time.

A moving image generating apparatus according to a first aspect of the present invention includes: a voice receiving unit that receives input of a character's voice from a voice operator; a voice output unit that outputs the received voice to a terminal of a motion operator; a motion receiving unit that, after the voice is output, receives a motion of the character corresponding to the voice from the terminal of the motion operator; a synthesizing unit that synthesizes the voice with a moving image corresponding to the motion to generate a moving image of the character; and a moving image output unit that outputs the generated moving image of the character.

The moving image generating apparatus may further include an emotion receiving unit that receives, from the voice operator, emotion information indicating an emotion of the character corresponding to the voice, and the motion receiving unit may display the emotion information received from the voice operator on the terminal of the motion operator and receive a motion of the character corresponding to the emotion information.

The motion receiving unit may receive the motion of the character by displaying information indicating a plurality of motions of the character on the terminal of the motion operator and receiving a selection of one motion from the plurality of motions.

The motion receiving unit may receive the motion of the character by displaying information indicating a plurality of motions of the character corresponding to the emotion on the terminal of the motion operator and receiving a selection of one motion from the plurality of motions.

The moving image generating apparatus may further include a storage unit that stores emotions of the character in association with background music, and a selection unit that refers to the storage unit and selects the background music corresponding to the emotion indicated by the emotion information, and the synthesizing unit may generate the moving image of the character by synthesizing the voice, the moving image corresponding to the motion, and the selected background music.
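
The emotion-to-background-music association held by the storage unit, and the lookup performed by the selection unit, might be sketched as follows. This is a minimal illustration only: the emotion labels, file names, and the function name `select_bgm` are assumptions and do not appear in the specification.

```python
from typing import Dict, Optional

# Hypothetical contents of the storage unit: emotion label -> BGM file.
# Both the labels and the file names are placeholders.
BGM_BY_EMOTION: Dict[str, str] = {
    "joy": "bgm_bright.mp3",
    "sadness": "bgm_slow.mp3",
    "anger": "bgm_tense.mp3",
}


def select_bgm(emotion: str, table: Dict[str, str] = BGM_BY_EMOTION) -> Optional[str]:
    """Return the BGM associated with the emotion, or None if nothing is stored."""
    return table.get(emotion)


print(select_bgm("joy"))
```

Returning None for an unregistered emotion is a design choice of this sketch; the specification does not state what happens in that case.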

The synthesizing unit may synthesize the voice with the moving image corresponding to the motion by synchronizing the time at which playback of the voice starts with the time at which the motion of the character corresponding to the voice starts.

The moving image generating apparatus may further include a storage unit that stores audio information representing the received voice, and the synthesizing unit may acquire the audio information stored in the storage unit and synthesize the voice with the moving image corresponding to the motion by synchronizing the time at which playback of the voice represented by the audio information starts with the time at which the motion of the character starts.

The moving image generating apparatus may further include a conversion unit that converts the received voice into text, and the motion receiving unit may display the text produced by the conversion unit on the terminal of the motion operator and receive the motion of the character from that terminal.

When output of the received voice ends, the moving image output unit may end output of the moving image corresponding to the received motion and output a moving image of the character corresponding to a state in which no voice is being input.

The voice output unit may output the received voice to a terminal of a staging operator, the moving image generating apparatus may further include a staging receiving unit that, after the voice is output, receives information on the staging of the moving image from the terminal of the staging operator, and the synthesizing unit may add an effect corresponding to the staging to the synthesized moving image of the character.

A moving image generating method according to a second aspect of the present invention is executed by a computer and includes the steps of: receiving input of a character's voice from a voice operator; outputting the received voice to a terminal of a motion operator; receiving, after the voice is output, a motion of the character corresponding to the voice from the terminal of the motion operator; synthesizing the voice with a moving image corresponding to the motion to generate a moving image of the character; and outputting the generated moving image of the character.

According to the present invention, a moving image corresponding to a character can be generated in real time.

FIG. 1 is a diagram showing an overview of the moving image generating system according to the first embodiment.
FIG. 2 is a diagram showing the configuration of the moving image generating apparatus according to the first embodiment.
FIG. 3 is a sequence diagram showing the flow of processing until the moving image generating apparatus outputs a moving image of a character.
FIG. 4 is a diagram showing an example of information on the motions of a character.
FIG. 5 is a diagram showing an example of information on background music.
FIG. 6 is a diagram showing an example of information on sound effects.
FIG. 7 is a diagram explaining the synthesis of voice with a moving image corresponding to a motion.
FIG. 8 is a diagram showing an example of a moving image of a character.
FIG. 9 is a diagram showing the configuration of the moving image generating apparatus according to the second embodiment.
FIG. 10 is a diagram showing the configuration of the moving image generating apparatus according to the third embodiment.

Embodiments of the present invention are described below.
<First Embodiment>
[Overview of Moving Image Generating System S]
FIG. 1 is a diagram showing an overview of a moving image generating system S according to the first embodiment.
The moving image generating system S includes a moving image generating apparatus 1, a motion operator terminal 2, and a staging operator terminal 3. The moving image generating apparatus 1, the motion operator terminal 2, and the staging operator terminal 3 are each a computer including a storage unit and a control unit. The moving image generating apparatus 1 is communicably connected to the motion operator terminal 2 and the staging operator terminal 3 via a communication network such as a LAN or the Internet.

The moving image generating apparatus 1 is a computer, such as a server, that generates moving images of a character appearing in anime or the like. Specifically, the moving image generating apparatus 1 receives input of the character's voice from a voice operator who performs the character's voice ((1) in FIG. 1). On receiving the character's voice, the moving image generating apparatus 1 outputs audio information representing the voice to the motion operator terminal 2 and the staging operator terminal 3 ((2) in FIG. 1).

The moving image generating apparatus 1 receives, from the motion operator terminal 2, motion information indicating a motion of the character corresponding to the voice, and receives, from the staging operator terminal 3, staging information indicating staging corresponding to the voice, such as background music (hereinafter, BGM) and lighting effects ((3) and (4) in FIG. 1).

The moving image generating apparatus 1 generates a moving image of the character by synthesizing the received voice with the moving image corresponding to the motion information and adding the staging effect indicated by the staging information to the synthesized moving image ((5) in FIG. 1). The moving image generating apparatus 1 then outputs the generated moving image of the character ((6) in FIG. 1).

In this way, the moving image generating apparatus 1 can output a moving image in which the character moves while speaking. Moreover, even if the voice operator improvises the character's voice without following a scenario, the moving image generating apparatus 1 can still generate a moving image of the character, because the motion operator decides the character's motion to match that voice. The moving image generating apparatus 1 can therefore generate, in real time, a moving image of the character with voice in response to the voice uttered by the voice operator.
The configuration and operation of the moving image generating apparatus 1 are described below.

[Configuration Example of Moving Image Generating Apparatus 1]
FIG. 2 is a diagram showing the configuration of the moving image generating apparatus 1 according to the first embodiment. The moving image generating apparatus 1 includes a communication unit 11, a microphone 12, a storage unit 13, and a control unit 14.

The communication unit 11 is a communication interface for connecting to the communication network and includes, for example, a LAN controller.
The microphone 12 receives voice input from a user of the moving image generating apparatus 1 and outputs the input voice to the control unit 14. In the present embodiment, the microphone 12 receives voice input from the voice operator who performs the character's voice.

The storage unit 13 includes, for example, a ROM and a RAM. The storage unit 13 stores various programs for operating the moving image generating apparatus 1. In particular, it stores a moving image generating program that causes the control unit 14 of the moving image generating apparatus 1 to function as a voice receiving unit 141, a voice output unit 142, a motion receiving unit 143, a staging receiving unit 144, a synthesizing unit 145, and a moving image output unit 146, which are described later. The storage unit 13 may read and store a program held on a storage medium such as an external memory, or may store a program downloaded from an external device via the communication network. The storage unit 13 also stores moving images corresponding to the character's motions, BGM to be played along with the character's motions, a moving image editing program for applying lighting effects to a moving image, and the like. In addition, the storage unit 13 includes an audio buffer 131 that temporarily stores audio information representing the voice received from the voice operator.

The control unit 14 includes, for example, a CPU. The control unit 14 controls the functions of the moving image generating apparatus 1 by executing the various programs stored in the storage unit 13. The control unit 14 includes the voice receiving unit 141, the voice output unit 142, the motion receiving unit 143, the staging receiving unit 144, the synthesizing unit 145, and the moving image output unit 146.

The functions of the control unit 14 are described below with reference to a sequence showing the flow of processing until the moving image generating apparatus 1 outputs a moving image of the character. FIG. 3 is a sequence diagram showing that flow of processing.

First, the voice receiving unit 141 receives input of the character's voice from the voice operator via the microphone 12 (S1) and stores audio information representing the input voice in the audio buffer 131 (S2). The audio information is, for example, digital data generated by digitally encoding an analog audio signal. The voice receiving unit 141 may instead receive the character's voice by receiving, from a voice operator terminal (not shown) used by the voice operator, audio information representing the voice input at that terminal.
Next, the voice output unit 142 outputs the audio information representing the character's voice received by the voice receiving unit 141 to the motion operator terminal 2 and the staging operator terminal 3.
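
Steps S1 and S2 can be illustrated with a minimal sketch of digital encoding followed by buffering. The specification does not name an encoding; 16-bit linear PCM, the 16 kHz sample rate, and the names `encode_pcm16` and `AudioBuffer` are all assumptions made for this example.

```python
import math
import struct
from typing import Iterable


def encode_pcm16(samples: Iterable[float]) -> bytes:
    """Quantize float samples in [-1.0, 1.0] to little-endian signed 16-bit PCM.

    Out-of-range samples are clipped before quantization.
    """
    out = bytearray()
    for s in samples:
        clipped = max(-1.0, min(1.0, s))
        out += struct.pack("<h", int(round(clipped * 32767)))
    return bytes(out)


class AudioBuffer:
    """Stands in for the audio buffer 131: appends encoded chunks as they arrive."""

    def __init__(self) -> None:
        self._data = bytearray()

    def store(self, chunk: bytes) -> None:
        self._data += chunk

    def read_all(self) -> bytes:
        return bytes(self._data)


# 10 ms of a 440 Hz tone at an assumed 16 kHz sample rate, used as stand-in input.
RATE = 16000
tone = [math.sin(2 * math.pi * 440 * n / RATE) for n in range(RATE // 100)]

buf = AudioBuffer()
buf.store(encode_pcm16(tone))   # S2: store the encoded audio information
print(len(buf.read_all()))      # 160 samples x 2 bytes each
```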

Next, after the voice output unit 142 has output the character's voice to the motion operator terminal 2, the motion receiving unit 143 receives the motion by receiving, from the motion operator terminal 2, motion information indicating a motion of the character corresponding to that voice (S3). Specifically, the motion receiving unit 143 displays information indicating a plurality of motions of the character on the motion operator terminal 2 and receives the character's motion by receiving a selection of one motion from the plurality of motions.

More specifically, as shown in FIG. 4, the storage unit 13 stores, as the information indicating the plurality of motions of the character, the character's emotions, the character's facial expressions corresponding to those emotions, the character's motions, and the moving images corresponding to each combination of emotion, expression, and motion, all in association with one another. In the example shown in FIG. 4, emotions and expressions are associated one to one, but a plurality of expressions may be associated with a single emotion.

The motion receiving unit 143 displays, in advance, information indicating a plurality of emotions of the character and images showing the character's expression for each emotion on the motion operator terminal 2. When the voice output unit 142 outputs the character's voice, the motion receiving unit 143 receives, from the motion operator terminal 2, a selection of the character's emotion corresponding to that voice.

When an emotion of the character is selected, the motion receiving unit 143 displays on the motion operator terminal 2 a motion selection screen for receiving a selection of one motion from the plurality of motions associated with that emotion. When one motion is selected on the motion selection screen, the motion operator terminal 2 transmits motion information indicating the selected motion to the moving image generating apparatus 1. By receiving this motion information, the motion receiving unit 143 receives the selection of one motion corresponding to the selected emotion.
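
The association described for FIG. 4 and the two-step selection (emotion first, then one motion for that emotion) might be modeled as in the following sketch. The record layout mirrors the four associated items named in the text; the concrete emotions, expressions, motions, file names, and function names are placeholders assumed for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MotionEntry:
    emotion: str      # character's emotion
    expression: str   # facial expression associated with the emotion
    motion: str       # character's motion
    video_file: str   # moving image for this emotion/expression/motion


# Hypothetical catalog standing in for the table of FIG. 4.
CATALOG: List[MotionEntry] = [
    MotionEntry("joy", "smile", "wave", "joy_wave.mp4"),
    MotionEntry("joy", "smile", "jump", "joy_jump.mp4"),
    MotionEntry("sadness", "tearful", "look_down", "sad_look_down.mp4"),
]


def list_emotions(catalog: List[MotionEntry]) -> List[str]:
    """Step 1: the emotions shown on the motion operator terminal in advance."""
    return sorted({entry.emotion for entry in catalog})


def motions_for(catalog: List[MotionEntry], emotion: str) -> List[MotionEntry]:
    """Step 2: entries offered on the motion selection screen for one emotion."""
    return [entry for entry in catalog if entry.emotion == emotion]


def select_motion(catalog: List[MotionEntry], emotion: str, motion: str) -> MotionEntry:
    """Resolve the operator's choice to the entry whose moving image will be used."""
    for entry in motions_for(catalog, emotion):
        if entry.motion == motion:
            return entry
    raise KeyError(f"no motion {motion!r} registered for emotion {emotion!r}")


print(list_emotions(CATALOG))
print(select_motion(CATALOG, "joy", "jump").video_file)
```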

The motion receiving unit 143 may display the information indicating the plurality of motions of the character on the motion operator terminal 2 in synchronization with the timing at which the voice output unit 142 outputs the audio information. For example, the motion receiving unit 143 displays this information on the motion operator terminal 2 at the same moment the voice output unit 142 outputs the audio information. This makes it easier for the motion operator to select the character's emotion and motion in step with the timing at which the voice operator inputs the voice.

The motion receiving unit 143 may also receive, from the motion operator terminal 2, an instruction controlling the timing at which the voice output unit 142 outputs the audio information and, based on that instruction, cause the voice output unit 142 to stop outputting the voice or to output already-output voice again. The voice output unit 142 can then read the audio information corresponding to the instruction from the audio buffer 131 and output it at the timing specified by the instruction, so a motion operator who has selected the wrong motion can still correct it to an appropriate one.
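
One way to picture this stop and re-output control is a buffer that keeps every received chunk and exposes a read cursor. The sketch below is an assumption-laden illustration: the chunk granularity, the instruction names "stop", "resume", and "replay", and the class name are not taken from the specification (which mentions only stopping and re-outputting).

```python
from typing import List, Optional


class ReplayableAudioBuffer:
    """Keeps all stored chunks so that already-output audio can be output again."""

    def __init__(self) -> None:
        self._chunks: List[bytes] = []
        self._cursor = 0          # index of the next chunk to output
        self._stopped = False

    def store(self, chunk: bytes) -> None:
        self._chunks.append(chunk)

    def handle(self, instruction: str, position: int = 0) -> None:
        """Apply an instruction received from the motion operator terminal."""
        if instruction == "stop":
            self._stopped = True
        elif instruction == "resume":   # assumed counterpart of "stop"
            self._stopped = False
        elif instruction == "replay":
            # Rewind to an earlier chunk and start outputting again.
            self._cursor = max(0, min(position, len(self._chunks)))
            self._stopped = False
        else:
            raise ValueError(f"unknown instruction: {instruction!r}")

    def next_chunk(self) -> Optional[bytes]:
        """Return the next chunk to output, or None if stopped or exhausted."""
        if self._stopped or self._cursor >= len(self._chunks):
            return None
        chunk = self._chunks[self._cursor]
        self._cursor += 1
        return chunk


audio = ReplayableAudioBuffer()
for part in (b"a", b"b", b"c"):
    audio.store(part)

first = audio.next_chunk()        # b"a" is output
audio.handle("stop")              # operator pauses to fix a wrong selection
paused = audio.next_chunk()       # nothing is output while stopped
audio.handle("replay", 0)         # output the voice again from the beginning
replayed = [audio.next_chunk(), audio.next_chunk()]
```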

A plurality of motion operator terminals 2 may be provided. For example, when two motion operator terminals 2 are provided, the motion receiving unit 143 may receive the selection of the emotion from one motion operator terminal 2 and the selection of the motion from the other motion operator terminal 2.

Next, after the voice output unit 142 has output the character's voice to the staging operator terminal 3, the staging receiving unit 144 receives, from the staging operator terminal 3, staging information on the staging of the character's moving image (S4).

Specifically, as shown in FIG. 5, the storage unit 13 stores the titles of pieces of BGM in association with the corresponding BGM files. The staging receiving unit 144 displays a plurality of BGM titles on the staging operator terminal 3 in advance. When the voice output unit 142 outputs the character's voice, the staging receiving unit 144 receives, from the staging operator terminal 3, a selection of the title of the BGM to be played together with the character's voice. When a BGM title is selected, the staging receiving unit 144 receives information indicating the title from the staging operator terminal 3 as staging information.

As shown in FIG. 6, the storage unit 13 also stores sound effect categories, information describing each sound effect, and the sound effect files in association with one another. The staging receiving unit 144 displays the categories corresponding to the sound effects on the staging operator terminal 3. When the voice output unit 142 outputs the character's voice, the staging receiving unit 144 receives a selection of one of the categories. When a category is selected, the staging receiving unit 144 displays information describing the sound effects in that category on the staging operator terminal 3 and receives a selection of a sound effect. When a sound effect is selected, the staging receiving unit 144 receives information indicating the sound effect from the staging operator terminal 3 as staging information.

The storage unit 13 further stores information indicating the types of lighting effects that can be applied to the character's moving image. The staging receiving unit 144 displays information indicating the plurality of lighting effect types on the staging operator terminal 3. When the voice output unit 142 outputs the character's voice, the staging receiving unit 144 receives a selection of a lighting effect type. When a lighting effect type is selected, the staging receiving unit 144 receives information indicating that type from the staging operator terminal 3 as staging information.

In this way, the moving image generating apparatus 1 can receive staging effects suited to the voice, selected by the staging operator using the staging operator terminal 3 while listening to the character's voice.
In the description above, the moving image generating apparatus 1 receives the motion information indicating the motion and then the staging information indicating the staging, but the two need not be received in this order. For example, the moving image generating apparatus 1 may receive the staging information before the motion information, or may receive the staging information and the motion information at the same time.

Next, the synthesizing unit 145 synthesizes the voice received by the voice receiving unit 141 with the moving image corresponding to the motion received by the motion receiving unit 143 to generate a moving image of the character with voice (S5). Specifically, the synthesizing unit 145 synthesizes the voice with the moving image corresponding to the motion by synchronizing the time at which playback of the voice starts with the time at which the motion of the character corresponding to the voice starts.

FIG. 7 is a diagram explaining the synthesis of the voice with the moving image corresponding to the motion.
As shown in FIG. 7, when the voice receiving unit 141 receives the voice at time t1 and storage of the audio information representing that voice in the audio buffer 131 begins, the synthesizing unit 145 sets time t3, at which a predetermined waiting time Wt has elapsed since t1, as the time at which output of the character's moving image starts.

Next, the synthesizing unit 145 determines the motion received by the motion receiving unit 143 by time t3 to be the motion corresponding to the voice. In the example shown in FIG. 7, the synthesizing unit 145 determines the motion received at time t2 to be the motion corresponding to the voice. The synthesizing unit 145 then sets the time at which playback of the moving image corresponding to that motion starts to time t3, the time at which playback of the character's moving image starts.

Next, the synthesizing unit 145 acquires the audio information stored in the audio buffer 131 and synthesizes the voice represented by that audio information with the moving image corresponding to the received motion to generate a moving image of the character with voice.

The synthesizing unit 145 also adds an effect corresponding to the staging to the synthesized moving image of the character. The synthesizing unit 145 determines the staging indicated by the staging information received by the staging receiving unit 144 by time t3 to be the staging corresponding to the voice. The synthesizing unit 145 then sets the time at which that staging starts to time t3, the time at which playback of the moving image starts, and adds the effect corresponding to the staging to the synthesized moving image of the character.
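
The timing rule of FIG. 7 (output starts at t3 = t1 + Wt, and the motion and staging received by t3 are the ones applied) can be sketched as below. Two points are assumptions of this sketch rather than statements of the specification: selections that arrived before t1 are ignored, and when several selections arrive by t3 the most recent one is used. The type and function names are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Selection:
    received_at: float   # time the selection reached the apparatus, in seconds
    name: str            # selected motion or staging effect


def output_start_time(t1: float, wait: float) -> float:
    """t3: the predetermined waiting time Wt after the voice was received at t1."""
    return t1 + wait


def choose_by_deadline(selections: List[Selection], t1: float, t3: float) -> Optional[Selection]:
    """Pick the selection that applies to the voice received at t1.

    Assumption: only selections received in [t1, t3] count, and the latest wins.
    Returns None if nothing arrived in time.
    """
    in_time = [s for s in selections if t1 <= s.received_at <= t3]
    return max(in_time, key=lambda s: s.received_at) if in_time else None


t1, wt = 10.0, 2.0
t3 = output_start_time(t1, wt)

motions = [Selection(10.5, "wave"), Selection(11.5, "bow"), Selection(12.5, "jump")]
staging = [Selection(11.0, "spotlight")]

chosen_motion = choose_by_deadline(motions, t1, t3)    # "bow": latest one by t3
chosen_staging = choose_by_deadline(staging, t1, t3)   # "spotlight"
# Both the motion's moving image and the staging effect would start at t3.
```

A longer Wt gives the operators more time to react but delays the output by the same amount, so the value trades selection accuracy against latency.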

The moving image output unit 146 outputs the generated moving image of the character at time t3 (S6). If the moving image corresponding to the motion received by the motion receiving unit 143 continued to be output even after input of the voice corresponding to the moving image whose output started at time t3 had ended, viewers of the character's moving image would find it unnatural. Therefore, when output of the voice received by the voice receiving unit 141 ends while the generated moving image of the character is being output, the moving image output unit 146 ends output of the moving image corresponding to the motion received by the motion receiving unit 143. The moving image output unit 146 then outputs a moving image of the character corresponding to a state in which no voice is being input.

In the example shown in FIG. 7, the output of the voice received by the voice reception unit 141 ends at time t4, so the video output unit 146 ends the output of the video corresponding to the motion received by the motion reception unit 143 at time t4. The video output unit 146 then outputs, as the video for the state in which no voice is being input, a video corresponding to a predetermined normal motion, for example. In this way, the video output unit 146 can output the video corresponding to the received motion in step with the output of the received voice. As a result, the motion operator can select a variety of motions without having to mind their start and end times, which increases the variety of character videos.
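The switching rule between t3 and t4 can be illustrated with a short Python sketch. The function name `motion_video_at` and the string identifiers are hypothetical, and showing the normal motion before t3 is an assumption made here for completeness rather than something the description states.

```python
def motion_video_at(t, t3, t4, selected_motion, normal_motion="normal"):
    """Return the motion video that is output at time t.

    t3 is when output of the synthesized video starts and t4 is when the
    voice output ends. Showing the normal motion before t3 is an assumption.
    """
    if t3 <= t < t4:
        return selected_motion   # voice is playing: show the chosen motion
    return normal_motion         # no voice: fall back to the normal motion
```

Because the end of the motion video is tied to t4 rather than to the length of the motion itself, the operator's choice never has to account for duration.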

[Usage example]
Next, usage examples of the character video are described. FIG. 8 shows an example in which a character video is displayed in a television news program. In this example, the character video is displayed in area A1 of the news program screen. When the content to be broadcast is decided in advance, as in a news program, the character needs only a few motions. The motion operator can therefore combine simple motions to match the words the voice operator speaks in real time, and the video generation apparatus 1 can generate the character video in real time. This makes it possible to produce a live broadcast that includes a character video.

As shown in FIG. 8, a message display area A2 may also be displayed on the news screen to show messages that viewers have posted to the program. In this case, the broadcaster assigns a responder for those messages. The video generation apparatus 1 receives messages from viewers and receives response messages to them from the responder. The broadcaster then displays the messages received by the video generation apparatus 1 on the news screen. A response message may be displayed as a remark by the character shown in FIG. 8. This makes viewers feel as if they are communicating with the character, fosters a sense of closeness to the character, and increases their motivation to watch the program.

The example of FIG. 8 shows a character video displayed in a news program, but the invention is not limited to this. For example, a character video may be displayed for each news item posted on a portal site. In this case, a team for generating the character video is set up for each news category. The video generation apparatus 1 receives voice and motions from each of the teams and generates a character video for each of the categories. In this way, whichever news category a viewer watches, the viewer sees a video of the same character, which gives the portal site as a whole a unified feel.

On a portal site or the like, a video may also be generated in which the character answers questions from viewers. In this case, the video generation apparatus 1, for example, receives a question message from a viewer as text information and displays it on its own display unit. The video generation apparatus 1 then receives, from the voice operator, an answer to the question message in the character's voice and receives, from the motion operator, a motion corresponding to that voice. By synthesizing the received voice with the video corresponding to the received motion, the video generation apparatus 1 generates a video in which the character answers the question message. This lets viewers communicate with the character in real time.

When receiving a question message from a viewer, the video generation apparatus 1 may receive the message as voice, convert the voice into text information, and display it on its own display unit. It may also receive information indicating the viewer's emotion together with the question message. This allows the voice operator to give an answer suited to the viewer's emotion and to communicate smoothly.

The producer of a news program may also generate a video for at least one of a plurality of characters depending on the situation. In this case, the producer sets up a team for generating the video of each character. When presenting an opinion on the news, the producer may select one of the characters according to the content of the opinion and generate a video in which the selected character states it. For example, when differing opinions exist on a single news item, a single announcer who presents them all at once may be criticized for having inconsistent values. By contrast, by having different characters present different opinions, the producer can flexibly present a range of opinions as the opinions of the individual characters.

[Effects of the first embodiment]
As described above, the video generation apparatus 1 according to the first embodiment outputs the voice received from the voice operator to the motion operator terminal 2. After the voice is output, it receives the motion of the character corresponding to that voice from the motion operator terminal 2, synthesizes the voice with the video corresponding to the motion to generate a character video, and outputs that video.

In this way, the voice operator can improvise the character's voice without relying on a scenario or the like, and the motion operator decides the character's motion while listening to that voice, so the video generation apparatus 1 can generate a character video with voice in real time.

In this embodiment, the video generation apparatus 1 receives the voice, motion, and staging for a single character, but the invention is not limited to this. For example, the video generation apparatus 1 may receive the voice, motion, and staging for each of a plurality of characters and generate a video that includes those characters together with their voices.

<Second Embodiment>
[Receiving a motion corresponding to the emotion presented by the voice operator]
Next, the second embodiment is described. In the first embodiment, the motion operator selects the character's motion while listening to the character's voice output to the motion operator terminal 2. However, if the motion operator selects the motion based on the voice alone, the selected motion may not match the emotion the voice operator wants to express. In the second embodiment, by contrast, the video generation apparatus 1 receives emotion information indicating an emotion from the voice operator and transmits the motions corresponding to that emotion information to the motion operator terminal 2, thereby receiving from the motion operator a selection of a motion corresponding to the emotion information. The video generation system S according to the second embodiment is described below with reference to the drawings. Components that are the same as in the first embodiment are given the same reference signs, and their detailed description is omitted.

FIG. 9 shows the configuration of the video generation apparatus 1 according to the second embodiment. As shown in FIG. 9, the video generation apparatus 1 further includes an emotion reception unit 147 and a BGM selection unit 148.
The emotion reception unit 147 receives, from the voice operator, emotion information indicating the emotion of the character corresponding to the voice. For example, the emotion reception unit 147 displays, on a display unit (not shown) of the video generation apparatus 1, an emotion reception screen for selecting from a plurality of emotions, and receives from the voice operator a selection of the emotion of the character that the voice operator is playing.

The motion reception unit 143 displays the emotion information received from the voice operator, which indicates the character's emotion, on the motion operator terminal 2 and receives the motion of the character corresponding to that emotion information. Here, the motion reception unit 143 may refer to the storage unit 13 and select information indicating a plurality of motions of the character that correspond to the emotion indicated by the received emotion information. It may then display the selected information on the motion operator terminal 2 and receive a selection of one motion from the plurality of motions. In this case, the motion reception unit 143 may display the information indicating the plurality of motions on the motion operator terminal 2 in synchronization with the timing at which voice is input after the emotion information has been received from the voice operator.
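The lookup of candidate motions by emotion can be sketched as below. This is a minimal Python illustration under stated assumptions: the table `MOTIONS_BY_EMOTION`, the emotion and motion names, and the functions `candidate_motions` and `accept_motion` are hypothetical stand-ins for the contents of the storage unit 13 and the behavior of the motion reception unit 143.

```python
# Hypothetical contents of storage unit 13: emotion -> candidate motions.
MOTIONS_BY_EMOTION = {
    "joy": ["jump", "clap", "smile"],
    "anger": ["stomp", "cross_arms"],
    "sadness": ["look_down"],
}


def candidate_motions(emotion, table=MOTIONS_BY_EMOTION):
    """Return the motions to display on the motion operator terminal."""
    return list(table.get(emotion, []))


def accept_motion(emotion, chosen, table=MOTIONS_BY_EMOTION):
    """Accept the operator's choice only if it was among the displayed candidates."""
    if chosen not in candidate_motions(emotion, table):
        raise ValueError(f"{chosen!r} is not a candidate for {emotion!r}")
    return chosen
```

Narrowing the list to motions that match the declared emotion is what keeps the operator's choice consistent with the voice operator's intent.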

In the second embodiment, the storage unit 13 stores character emotions, BGM titles, and BGM files in association with one another, and the BGM selection unit 148 refers to the storage unit 13 to select the BGM corresponding to the emotion indicated by the emotion information received by the emotion reception unit 147. When a plurality of BGMs correspond to that emotion, the BGM selection unit 148 may select one of them at random. Alternatively, in that case the staging reception unit 144 may transmit information indicating the plurality of BGMs and receive a selection of one BGM from the staging operator terminal 3.
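The BGM selection described above can be illustrated with the following Python sketch. The table rows, titles, and file names are invented for illustration, and `select_bgm` with its `chooser` callback (standing in for a selection made at the staging operator terminal 3) is a hypothetical interface, not one defined in the patent.

```python
import random

# Hypothetical rows of storage unit 13: emotion, BGM title, and BGM file.
BGM_TABLE = [
    {"emotion": "joy", "title": "Sunny Day", "file": "sunny_day.ogg"},
    {"emotion": "joy", "title": "Parade", "file": "parade.ogg"},
    {"emotion": "sadness", "title": "Rainy Window", "file": "rainy_window.ogg"},
]


def select_bgm(emotion, rng=random, chooser=None):
    """Select one BGM row for the given emotion, or None if there is none.

    With several matches, use chooser (the staging operator's pick) when
    given; otherwise pick one at random.
    """
    matches = [row for row in BGM_TABLE if row["emotion"] == emotion]
    if not matches:
        return None
    if len(matches) == 1:
        return matches[0]
    if chooser is not None:
        return chooser(matches)
    return rng.choice(matches)
```

Passing a seeded `random.Random` instance as `rng` makes the random branch reproducible, which is convenient when testing.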

The synthesis unit 145 synthesizes the voice received by the voice reception unit 141, the video corresponding to the motion received by the motion reception unit 143, and the BGM selected by the BGM selection unit 148 to generate the character video.

When the storage unit 13 stores only one motion corresponding to the emotion indicated by the emotion information, the motion reception unit 143 may identify that motion without receiving a selection of a motion from the motion operator terminal 2. In this case, the synthesis unit 145 synthesizes the voice received by the voice reception unit 141, the video corresponding to the motion identified by the motion reception unit 143, and the BGM selected by the BGM selection unit 148, so the character video can be generated automatically without the involvement of the motion operator or the staging operator.
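The automatic path can be sketched as follows. This Python fragment is an assumption-laden illustration: `auto_generate`, its dictionary arguments, and the returned dictionary are hypothetical, and returning `None` when an operator is still needed is a design choice of this sketch only.

```python
def auto_generate(voice, emotion, motions_by_emotion, bgm_by_emotion):
    """Build a clip without operators when exactly one motion is stored.

    Returns None when the emotion has zero or several stored motions,
    meaning an operator's selection would still be needed.
    """
    motions = motions_by_emotion.get(emotion, [])
    if len(motions) != 1:
        return None
    return {
        "voice": voice,
        "motion": motions[0],
        "bgm": bgm_by_emotion.get(emotion),
    }
```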

The emotion reception unit 147 may also receive information indicating a motion of the character together with, or instead of, the emotion information. In this case, the synthesis unit 145 synthesizes the voice received by the voice reception unit 141, the video corresponding to the motion received by the emotion reception unit 147, and the BGM selected by the BGM selection unit 148, so the character video can be generated automatically without the involvement of the motion operator or the staging operator.

[Effects of the second embodiment]
As described above, the video generation apparatus 1 according to the second embodiment displays the emotion information received from the voice operator, which indicates the character's emotion, on the motion operator terminal 2 and receives the motion of the character corresponding to that emotion information. This allows the motion operator to grasp the character's emotion as intended by the voice operator and to select a motion that matches it.

In the second embodiment, the voice operator selects the character's emotion corresponding to the voice they utter, but this is not a limitation; the voice operator may also select the character's facial expression. This allows the motion operator to select a motion that matches the emotion and facial expression the voice operator has in mind.

<Third Embodiment>
[Displaying text of the input voice on the motion operator terminal 2]
Next, the third embodiment is described. In the first embodiment, the motion operator selects a motion while listening to the voice received from the voice operator, but the motion operator may be unable to select a motion if the voice cannot be heard clearly. The third embodiment differs from the first in that the video generation apparatus 1 converts the voice received from the voice operator into text and displays the text on the motion operator terminal 2.

FIG. 10 shows the configuration of the video generation apparatus 1 according to the third embodiment. As shown in FIG. 10, the video generation apparatus 1 further includes a conversion unit 149.
The conversion unit 149 analyzes the voice received by the voice reception unit 141 and converts it into text information.
The motion reception unit 143 displays the text information produced by the conversion unit 149 on the motion operator terminal 2 and receives the character's motion from that terminal.

When the video generation apparatus 1 has an emotion reception unit 147 as in the second embodiment, the conversion unit 149 may display on the motion operator terminal 2 text information that includes text, an icon, or the like corresponding to the emotion information received by the emotion reception unit 147. This makes it easier for the motion operator to select a motion.

[Effects of the third embodiment]
As described above, the video generation apparatus 1 according to the third embodiment displays the text information on the motion operator terminal 2 and receives the character's motion from that terminal, so the motion operator can accurately grasp the content of the voice.

The voice output unit 142 may output the voice to the motion operator terminal 2 after the motion reception unit 143 has started displaying the text information on the motion operator terminal 2. In this way, the motion operator terminal 2 displays the text information corresponding to the voice before the voice is output. The motion operator can therefore check the content of the voice from the text information before hearing it, and so grasp the content more accurately.
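The ordering of text display and voice output can be sketched in Python as below. The function `deliver_to_motion_terminal`, the `transcribe` callable, and the list used to model events sent to the terminal are all hypothetical; in particular, `transcribe` is only a placeholder for whatever speech recognition the conversion unit 149 performs, which the description does not specify.

```python
def deliver_to_motion_terminal(voice, transcribe, terminal_events):
    """Send the transcript to the terminal first, then the voice itself.

    transcribe is a caller-supplied function (placeholder for conversion
    unit 149); terminal_events is a list modeling what the terminal receives.
    """
    text = transcribe(voice)
    terminal_events.append(("text", text))    # text display starts first
    terminal_events.append(("voice", voice))  # voice output follows
    return terminal_events
```

For example, calling `deliver_to_motion_terminal(b"hello", lambda v: v.decode(), [])` yields the text event followed by the voice event, mirroring the order described above.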

The above description covers the case where the voice operator is a different person from the motion operator and the staging operator, but the voice operator may be the same person as at least one of the motion operator and the staging operator.

The present invention has been described above using embodiments, but the technical scope of the invention is not limited to the scope described in those embodiments. It is apparent to those skilled in the art that various changes or improvements can be made to the embodiments. It is apparent from the claims that forms with such changes or improvements can also fall within the technical scope of the invention.

Description of symbols: 1 ... video generation apparatus, 11 ... communication unit, 12 ... microphone, 13 ... storage unit, 131 ... voice buffer, 14 ... control unit, 141 ... voice reception unit, 142 ... voice output unit, 143 ... motion reception unit, 144 ... staging reception unit, 145 ... synthesis unit, 146 ... video output unit, 147 ... emotion reception unit, 148 ... BGM selection unit, 149 ... conversion unit, 2 ... motion operator terminal, 3 ... staging operator terminal, S ... video generation system

Claims

A video generation apparatus comprising:
a voice reception unit that receives input of a character's voice from a voice operator;
an emotion reception unit that receives, from the voice operator, emotion information indicating the emotion of the character corresponding to the voice;
a voice output unit that outputs the received voice to a terminal of a motion operator;
a motion reception unit that, after the voice output unit starts outputting the voice, receives from the terminal of the motion operator a motion of the character corresponding to the emotion information displayed on that terminal and to the voice input to that terminal;
a synthesis unit that synthesizes the voice and the video corresponding to the motion to generate a video of the character; and
a video output unit that outputs the generated video of the character.

The video generation apparatus according to claim 1, wherein
the motion reception unit displays information indicating a plurality of motions of the character on the terminal of the motion operator and receives the motion of the character by receiving a selection of one motion from the plurality of motions.

The video generation apparatus according to claim 2, wherein
the motion reception unit displays, on the terminal of the motion operator, information indicating a plurality of motions of the character corresponding to the emotion and receives the motion of the character by receiving a selection of one motion from the plurality of motions.

The video generation apparatus according to any one of claims 1 to 3, further comprising:
a storage unit that stores emotions of the character and background music in association with each other; and
a selection unit that refers to the storage unit and selects the background music corresponding to the emotion indicated by the emotion information, wherein
the synthesis unit synthesizes the voice, the video corresponding to the motion, and the selected background music to generate the video of the character.

The video generation apparatus according to any one of claims 1 to 4, wherein
the synthesis unit synchronizes the time at which playback of the voice starts with the time at which the motion of the character corresponding to the voice starts, and synthesizes the voice and the video corresponding to the motion.

The video generation apparatus according to claim 5, further comprising
a storage unit that stores voice information indicating the received voice, wherein
the synthesis unit acquires the voice information stored in the storage unit, synchronizes the time at which playback of the voice indicated by the voice information starts with the time at which the motion of the character starts, and synthesizes the voice and the video corresponding to the motion.

The video generation apparatus according to claim 6, further comprising
a conversion unit that converts the received voice into text, wherein
the motion reception unit displays the text converted by the conversion unit on the terminal of the motion operator and receives the motion of the character from the terminal of the motion operator.

The video generation apparatus according to any one of claims 1 to 7, wherein
the video output unit, in response to the end of the output of the received voice, ends the output of the video corresponding to the received motion and outputs a video of the character corresponding to a state in which no voice is being input.

The video generation apparatus according to any one of claims 1 to 8, wherein
the voice output unit outputs the received voice to a terminal of a staging operator,
the apparatus further comprises a staging reception unit that, after the voice is output, receives information on the staging of the video from the terminal of the staging operator, and
the synthesis unit adds an effect corresponding to the staging to the synthesized video of the character.

A video generation method executed by a computer, comprising:
a step of receiving input of a character's voice from a voice operator;
a step of receiving, from the voice operator, emotion information indicating the emotion of the character corresponding to the voice;
a step of outputting the received voice to a terminal of a motion operator;
a step of receiving, after output of the voice has started, from the terminal of the motion operator a motion of the character corresponding to the emotion information displayed on that terminal and to the voice input to that terminal;
a step of synthesizing the voice and the video corresponding to the motion to generate a video of the character; and
a step of outputting the generated video of the character.