JP7496128B2

JP7496128B2 - Virtual person dialogue system, image generation method, and image generation program

Info

Publication number: JP7496128B2
Application number: JP2020179082A
Authority: JP
Inventors: 晴彦安田
Original assignee: 株式会社シルバコンパス
Priority date: 2019-11-28
Filing date: 2020-10-26
Publication date: 2024-06-06
Anticipated expiration: 2039-11-28
Also published as: JP2021086618A

Description

The present invention relates to a virtual person dialogue system, an image generation method using the virtual person dialogue system, and an image generation program for the virtual person dialogue system.

Patent Document 1 discloses an imaging device that corrects the face image data to be stored in a face-recognition memory, based on specified face image data and the face image data used in the correction process. This allows an individual's face to be detected even in images taken from angles or directions other than the front.

Patent Document 2 discloses a conversation sentence generation device. It selects, from conversation templates created in advance, a sentence that corresponds to an input sentence. It then processes the selected sentence based on the agent information of a virtual agent to generate a response sentence.

Patent Document 1: JP 2011-76457 A
Patent Document 2: JP 2015-69455 A

Some people are not actually present, such as deceased people or celebrities. Generating videos of such a specific person and realizing realistic dialogue with them requires a huge amount of information about the virtual person, such as images, voice, and personality traits. Integrating this information into a virtual person has also meant generating video with computer graphics or similar techniques. That requires large-scale equipment and purchased content, which makes it difficult for individuals to use. There is therefore a need for a system that can generate video of a virtual person speaking with a simple configuration.

One objective of the present invention is to generate video of a virtual person speaking with a simple configuration.

To achieve the above objective, a virtual person dialogue system according to one aspect of the present invention includes:
- a video model database that stores multiple types of video models of people in motion;
- a video model selection unit that selects, from the data in the video model database, a usage video model to be used to generate a virtual person;
- a video processing unit that extracts facial data of the virtual person from a registered information source;
- a face insertion unit that integrates the facial data into the usage video model;
- an audio processing unit that extracts audio from the information source and generates the voice of the virtual person; and
- a video display processing unit that generates video of the virtual person speaking, based on the usage video model into which the facial data has been integrated and the generated voice of the virtual person.

The system may further include:
- a personality model database that stores multiple personality models of people;
- a personality model selection unit that presents questions about the personality of the virtual person and, based on the answers, selects from the data in the personality model database a usage personality model to be used to generate the virtual person; and
- a dialogue processing unit that generates messages to be spoken by the virtual person based on the usage personality model.
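One way to picture questionnaire-based selection is as a nearest-profile match. The following is a minimal Python sketch under stated assumptions, not the patent's method. It assumes each personality model is summarized by hypothetical numeric trait scores in [0, 1]. It also assumes the user's answers have already been mapped onto the same traits. `PersonalityModel`, `select_personality`, and the trait names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class PersonalityModel:
    name: str
    # Hypothetical trait scores in [0, 1]; the patent does not define a scale.
    traits: dict


def select_personality(models, answers):
    """Return the model whose traits are closest (squared distance) to the answers.

    Traits missing from a model are treated as neutral (0.5), an assumption.
    """
    def distance(model):
        return sum((model.traits.get(k, 0.5) - v) ** 2 for k, v in answers.items())

    return min(models, key=distance)


MODELS = [
    PersonalityModel("cheerful", {"positivity": 0.9, "talkativeness": 0.8}),
    PersonalityModel("reserved", {"positivity": 0.5, "talkativeness": 0.2}),
    PersonalityModel("stern", {"positivity": 0.2, "talkativeness": 0.4}),
]

# Answers scored as "positive and talkative" select the cheerful model.
print(select_personality(MODELS, {"positivity": 0.8, "talkativeness": 0.9}).name)  # cheerful
```

A real system would first need some scoring step that turns free-form answers into trait values. That step is left open here.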

The personality model selection unit may select the usage personality model based on records created by the virtual person (that is, by the person it represents).

The system may further include an input unit through which a question to the virtual person is entered, and an output unit that outputs the virtual person's response. The dialogue processing unit may generate a response to the question and cause the output unit to output it.

The system may further include a personality model correction unit that corrects the usage personality model based on evaluations of the messages.
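The correction step could work in many ways. As one hedged illustration, assume the same hypothetical trait-score representation as a questionnaire-based selection would use, and assume a simple thumbs-up or thumbs-down rating per message. The sketch below nudges the model toward the traits of approved messages and away from rejected ones. `correct_personality` and its `rate` parameter are illustrative names, not from the source.

```python
def correct_personality(traits, message_traits, rating, rate=0.1):
    """Nudge trait scores after a user evaluates a generated message.

    rating is +1 if the message felt true to the person, -1 if it did not
    (a hypothetical two-level scale). Scores are clamped to [0, 1].
    """
    corrected = {}
    for name, value in traits.items():
        # Traits the message does not express are left unchanged.
        target = message_traits.get(name, value)
        # Approved messages pull the model toward their traits; rejected ones push it away.
        updated = value + rate * rating * (target - value)
        corrected[name] = min(1.0, max(0.0, updated))
    return corrected


print(correct_personality({"positivity": 0.5}, {"positivity": 0.9}, rating=1))
```

A small `rate` keeps a single evaluation from overriding the model chosen from the questionnaire.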

To achieve the above objective, an image generation method according to another aspect of the present invention is a method for generating video of a virtual person with a virtual person dialogue system. The system has a video model database that stores multiple types of video models of people in motion. The method includes:
- a video model selection step of selecting, from the data in the video model database, a usage video model to be used to generate the virtual person;
- a video processing step of extracting facial data of the virtual person to be generated from a registered information source;
- a face insertion step of integrating the facial data into the usage video model;
- an audio processing step of extracting audio from the information source and generating the voice of the virtual person; and
- a video display processing step of generating video of the virtual person speaking, based on the usage video model into which the facial data has been integrated and the generated voice of the virtual person.

To achieve the above objective, an image generation program according to yet another aspect of the present invention is a computer program for generating video of a virtual person with a virtual person dialogue system. The system has a video model database that stores multiple types of video models of people in motion. The program causes a computer to execute:
- a video model selection instruction for selecting, from the data in the video model database, a usage video model to be used to generate the virtual person;
- a video processing instruction for extracting facial data of the virtual person to be generated from a registered information source;
- a face insertion instruction for integrating the facial data into the usage video model;
- an audio processing instruction for extracting audio from the information source and generating the voice of the virtual person; and
- a video display processing instruction for generating video of the virtual person speaking, based on the usage video model into which the facial data has been integrated and the generated voice of the virtual person.

The computer program can be provided by download over a network such as the Internet, or recorded on various computer-readable recording media such as a CD-ROM.

The present invention makes it possible to generate video of a virtual person speaking with a simple configuration.

FIG. 1 is a schematic configuration diagram of a virtual person dialogue system according to the present invention.
FIG. 2 is a functional block diagram of the virtual person dialogue system.
FIG. 3 is a sequence diagram showing the process by which the virtual person dialogue system determines the usage video model used to generate a virtual person.
FIG. 4 is a sequence diagram showing the process by which the virtual person dialogue system generates the voice of a virtual person.
FIG. 5 is a sequence diagram showing the process by which the virtual person dialogue system determines the personality model of a virtual person.
FIG. 6 is a sequence diagram showing the process by which a user converses with a virtual person using the virtual person dialogue system.

Embodiments of the virtual person dialogue system, image generation method, and image generation program according to the present invention are described below with reference to the drawings.

● Overview of the Virtual Person Dialogue System
The virtual person dialogue system lets a user hold a simulated dialogue with a specific virtual person who is not actually present. It does this by playing back video and voice of that person and automatically generating what the person says. The person on whom the virtual person is based (hereinafter also called the "target person") may be anyone. Typically, though, it is someone with no or limited opportunities to speak because of constraints of place or time, such as a deceased person, a celebrity, or a storyteller such as a war veteran. The virtual person is generated from information about the target person registered by the user and from model data described later. The virtual person is played back on a user terminal 10 (see FIG. 1). There it moves, speaks, talks to the user, and answers the user's questions as if it were actually present.

As shown in FIG. 1, a user U converses with a virtual person K by communicating, via a user terminal 10, with a cloud computer C that includes part or all of the virtual person dialogue system. When the user U logs into the cloud computer C via the user terminal 10 (step s1), the cloud computer C transmits video of the virtual person K (step s2). When the user U speaks to the virtual person K (step s3), the cloud computer C analyzes the content of the input message. It then generates a reply based on the predetermined personality of the virtual person K and plays the reply back on the user terminal 10 together with the video (step s4).
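Steps s1 to s4 amount to a small request/response protocol. The minimal Python sketch below models only that ordering, with video and reply generation stubbed out. `DialogueServer`, its methods, and the `stream://` reference are hypothetical and stand in for the cloud computer C.

```python
class DialogueServer:
    """Hypothetical stand-in for cloud computer C in steps s1 to s4."""

    def __init__(self, video_ref, reply_fn):
        self.video_ref = video_ref  # e.g. a stream reference for virtual person K
        self.reply_fn = reply_fn    # personality-based reply generator (stubbed)
        self.sessions = set()

    def login(self, user_id):
        # s1: the user logs in; s2: the server returns the virtual person's video.
        self.sessions.add(user_id)
        return {"video": self.video_ref}

    def send_message(self, user_id, text):
        # s3: the user speaks; s4: the server returns a reply to play with the video.
        if user_id not in self.sessions:
            raise PermissionError("user must log in first (step s1)")
        return {"video": self.video_ref, "reply": self.reply_fn(text)}


server = DialogueServer("stream://virtual-person-K",
                        lambda t: f"Thank you for asking about {t}.")
server.login("U")
print(server.send_message("U", "the old house")["reply"])
```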

As shown in FIG. 2, the virtual person dialogue system 1 according to the present invention (hereinafter also "the system 1") consists of a storage device 20, a virtual person generation device 30, and a video generation device 40 connected via a network NW. The system 1 is connected via the network NW to user terminals 10 owned by customers and can exchange information with them.

The user terminal 10, the storage device 20, the virtual person generation device 30, and the video generation device 40 may be connected to each other wirelessly or by wire. The storage device 20, the virtual person generation device 30, and the video generation device 40 may also be configured as a single device. Some or all of their functions may be implemented on the cloud computer C.

The user terminal 10 is a computer used by a user who converses with the virtual person. It includes an input unit 11, an output unit 12, a display unit 13, an information source registration unit 14, and a communication processing unit 19. The user terminal 10 is, for example, a personal computer, but may also be a smartphone or a tablet. One or more user terminals 10 may be connected to the system 1.

The input unit 11 is a functional unit through which the user enters messages to the virtual person. It consists of a keyboard, a touch panel display, a microphone, or the like.

The output unit 12 is a functional unit that outputs messages from the virtual person. It consists of a display that shows the messages as text, a speaker that outputs them as audio, or the like.

The display unit 13 of the user terminal 10 may be a flat playback device such as an LCD screen. It may also be a device that reproduces the image of the virtual person in three dimensions, such as a head-mounted VR display or a hologram (three-dimensional image) display. When the user terminal 10 reproduces the virtual person in three dimensions, conversations with the virtual person feel more realistic. The display unit 13 may also be a projection device that lets multiple users view the image of one virtual person at the same time.

The display unit 13 may show a UI specific to the system 1. Alternatively, the system 1 may be linked to an existing chat tool such as SKYPE (registered trademark) so that messages and videos from the virtual person are displayed in that tool. This configuration gives the user the feeling of chatting with a real person, making the conversation with the virtual person feel realistic.

The information source registration unit 14 is a functional unit that acquires information about the target person, that is, the target person's information sources. These include, for example, videos, still images, and audio recordings featuring the target person. They also include written records such as diaries kept by the target person, documents describing the person's hobbies and preferences, and text data such as SNS posts. Information about possessions such as clothing is included as well. Information sources may be registered by the user or acquired via the Internet. The acquired information sources are transmitted to the virtual person generation device 30.

The communication processing unit 19 is a functional unit that exchanges information with the system 1 via the network NW; any communication format may be used.

When a user registers information about a target person through the user terminal 10, the virtual person generation device 30 processes the information and determines the virtual person's appearance, voice, personality, and so on. The resulting virtual person data is stored in the storage device 20 and retrieved by the video generation device 40 as needed. The video generation device 40 generates a video containing the virtual person's image, voice, and messages based on the virtual person data, and displays it on the user terminal 10.

● Configuration of the Storage Device
The storage device 20 includes a processing unit such as a CPU (Central Processing Unit) for executing information processing, and memory such as RAM (Random Access Memory) and ROM (Read Only Memory). It thereby has, as software resources, at least a video model DB 21, a personality model DB 22, a virtual person data storage unit 23, and a communication processing unit 29. In this specification, "DB" is an abbreviation for "database."

The video model DB 21 is a storage unit that stores multiple types of video models of people in motion. A video model is a video template used to generate the image of a virtual person. In particular, it is data that defines the shape and movements of the torso. The video model is also configured to integrate face data, described later, and to move the integrated face data together with the torso image.

The video models include the appearances of multiple types of people whose physiques differ by height, weight, age, and so on. They also include multiple types of clothing that each person can wear during playback. In addition, the video models contain a variety of motion data for each appearance, such as actions common in conversation like nodding, folding the arms, and raising a hand. A video model may be footage of a real person, footage modeled in CG, or a mix of both.
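Read as a data structure, each entry in the video model DB 21 combines a physique, wearable outfits, and a library of conversational motions. The Python sketch below is only an illustration. The field names, the `"live_action"`/`"cg"` source tag, and the clip identifiers are hypothetical; the patent does not specify a schema.

```python
from dataclasses import dataclass, field


@dataclass
class VideoModel:
    """Hypothetical record for one entry in the video model DB 21."""
    model_id: str
    height_cm: float
    weight_kg: float
    age: int
    source: str = "live_action"                   # or "cg"
    outfits: list = field(default_factory=list)   # wearable clothing ids
    motions: dict = field(default_factory=dict)   # motion name -> clip id

    def clip_for(self, motion):
        """Return the clip for a conversational motion, or None if absent."""
        return self.motions.get(motion)


elder = VideoModel("m-070", 165.0, 60.0, 75,
                   outfits=["cardigan", "suit"],
                   motions={"nod": "m-070/nod", "fold_arms": "m-070/fold_arms"})
print(elder.clip_for("nod"))         # m-070/nod
print(elder.clip_for("raise_hand"))  # None
```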

The personality model DB 22 is a storage unit that stores multiple types of personality models of people. A personality model includes, for example, the characteristics of answers to questions. It determines the answer policy, such as whether content is positive or negative, and the emotions (joy, anger, sorrow, pleasure, and so on) expressed in an answer. A personality model is not limited to answers to users' questions. It may also define the characteristics of messages that depend on the season, time of day, and so on. The personality model DB 22 may also store, for each personality model, prepared responses to anticipated questions. This reduces the computational load of generating personality-consistent responses to routine questions.
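The saving from prepared responses can be sketched as a lookup that runs before generation. In this minimal Python sketch, a dictionary keyed by (personality, normalized question) stands in for the prepared responses in the personality model DB 22. `generate` stands in for whatever personality-conditioned generator the system uses. Both, and the lowercase/strip normalization, are assumptions.

```python
def respond(question, personality, canned, generate):
    """Return a prestored reply for an anticipated question, else generate one.

    canned maps (personality, normalized question) -> reply; generate is any
    callable producing a personality-consistent reply (both hypothetical).
    """
    key = (personality, question.strip().lower())
    if key in canned:
        return canned[key]  # no generation cost for routine questions
    return generate(question, personality)


calls = []


def fake_generate(question, personality):
    calls.append(question)
    return f"[{personality}] generated reply"


CANNED = {("cheerful", "how are you?"): "Wonderful, thanks for asking!"}
print(respond("How are you? ", "cheerful", CANNED, fake_generate))
print(respond("What was school like?", "cheerful", CANNED, fake_generate))
print(len(calls))  # 1: only the unanticipated question reached the generator
```

Exact-match lookup is the simplest choice. Matching paraphrased questions would need a similarity search, which the source does not describe.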

The virtual person data storage unit 23 is a storage unit that stores the video model, personality model, and voice information determined for each virtual person. It also stores information the virtual person knows, such as episodes and personal experiences of the target person. Virtual person data is determined and stored by the virtual person generation device 30. It is retrieved by the video generation device 40 when a video of the virtual person is played.

● Configuration of the Virtual Person Generation Device
The virtual person generation device 30 includes a processing unit such as a CPU (Central Processing Unit) for executing information processing, and memory such as RAM (Random Access Memory) and ROM (Read Only Memory). It thereby has, as software resources, at least a video processing unit 31, an audio processing unit 32, a personality processing unit 33, and a communication processing unit 39.

The video processing unit 31 is a functional unit that extracts, from the target person's data, the appearance data used to generate the virtual person. Appearance data includes the target person's face, body, hairstyle, clothing, and so on. The video processing unit 31 also selects the video model used to generate the virtual person and determines the video data used for the virtual person's image.

The video processing unit 31 may extract appearance data from information sources registered via the information source registration unit 14 of the user terminal 10. It may also use information sources obtained from the Internet. In addition, it may extract the appearance data for a single virtual person from information sources registered by multiple user terminals 10. When many users converse with a common virtual person, such as a celebrity, each user registers information sources for that one virtual person. This allows the virtual person to be generated from more information sources, enabling more realistic dialogue.

映像処理部３１は、動画取得部３１１、静止画取得部３１２、トリミング部３１３、画像補正部３１４、映像モデル選択部３１５および顔挿入部３１６を有する。 The video processing unit 31 has a video acquisition unit 311, a still image acquisition unit 312, a trimming unit 313, an image correction unit 314, a video model selection unit 315, and a face insertion unit 316.

The video acquisition unit 311 is a functional unit that acquires video data. It acquires videos included in the information sources registered from the user terminal 10. It can also prompt the user, through the user terminal 10, to shoot videos. Shooting through the user terminal 10 is possible in several situations. One is when the target person is close to the user and the virtual person is to be displayed on another user terminal 10. Another is when a virtual person is created in advance so that the target person can still be conversed with after his or her death. In such cases, the video acquisition unit 311 may display a tutorial on the user terminal 10 that guides the user through shooting the video.

The still image acquisition unit 312 is a functional unit that acquires still image data. It acquires still images included in the information sources registered from the user terminal 10. It can also prompt the user, through the user terminal 10, to take still images, that is, photographs. In that case it may display a tutorial on the user terminal 10 that guides the user through taking them. The still image acquisition unit 312 also converts video data into still images. It extracts frames showing the target person from various angles and with various facial expressions, and acquires them as still images.

The trimming unit 313 is a functional unit that crops the target person out of still images. It may have a face recognition function that automatically extracts only the target person's face.

The image correction unit 314 corrects the color tone and resolution of the extracted images to make their quality uniform. It may also judge whether each extracted image is sharp and exclude unclear images from the extracted data set. It may likewise exclude images whose resolution is below a predetermined level.
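The patent does not say how sharpness is judged. A common heuristic, swapped in here as an assumption, is the variance of the Laplacian: blurry images have few strong edges, so the Laplacian varies little. The pure-Python sketch below applies that test together with a resolution floor to grayscale images stored as lists of rows. All thresholds are hypothetical. In practice a library such as OpenCV (`cv2.Laplacian`) would compute the filter far faster.

```python
def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian over a 2D grayscale image (list of rows).

    Low values mean few sharp edges, which is commonly read as blur.
    Assumes the image is at least 3x3.
    """
    h, w = len(gray), len(gray[0])
    values = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x]
        for y in range(1, h - 1)
        for x in range(1, w - 1)
    ]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def filter_images(images, min_width=64, min_height=64, min_sharpness=100.0):
    """Drop images below a resolution floor or judged unclear (thresholds are hypothetical)."""
    kept = []
    for img in images:
        if len(img) < min_height or len(img[0]) < min_width:
            continue  # below the predetermined resolution
        if laplacian_variance(img) < min_sharpness:
            continue  # too little edge detail: treated as unclear
        kept.append(img)
    return kept


flat = [[128] * 64 for _ in range(64)]                                   # no edges at all
checker = [[255 if (x + y) % 2 else 0 for x in range(64)] for y in range(64)]
print(len(filter_images([flat, checker])))  # 1: only the checkerboard survives
```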

The video model selection unit 315 is a functional unit that selects, from the data in the video model DB 21, the usage video model used to generate the virtual person. It may select the video model most similar to the target person based on the appearance data acquired by the video acquisition unit 311. Alternatively, it may present multiple video models on the user terminal 10 and let the user choose. With this configuration, a video of the virtual person can be composed from a video model even if too few information sources showing the person in motion have been registered.
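Selecting "the video model most similar to the target person" can be read as a nearest-neighbour search over physique features. The Python sketch below assumes height, weight, and age are the features and uses a weighted squared distance. The feature set, the dict-based schema, and `select_video_model` are illustrative assumptions.

```python
def select_video_model(models, appearance, weights=None):
    """Pick the model whose physique is closest to the target person's appearance data.

    models: list of dicts with the same keys as appearance (hypothetical schema).
    weights rescale features measured in different units (cm, kg, years).
    """
    if weights is None:
        weights = {key: 1.0 for key in appearance}

    def distance(model):
        return sum(w * (model[k] - appearance[k]) ** 2 for k, w in weights.items())

    return min(models, key=distance)


MODELS = [
    {"id": "A", "height_cm": 160, "weight_kg": 50, "age": 30},
    {"id": "B", "height_cm": 175, "weight_kg": 70, "age": 60},
]
print(select_video_model(MODELS, {"height_cm": 172, "weight_kg": 68, "age": 65})["id"])  # B
```

Equal weights treat 1 cm, 1 kg, and 1 year as equally important. That is rarely right, which is why the weights are exposed.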

The video model selection unit 315 may determine the clothing of the virtual person based on the appearance data, or on the possession information included in the information sources. It may also select the clothing from the video model DB 21. If an information source shows the target person wearing certain clothing, the virtual person's video can be generated from that source. Even without such a source, the video can be generated from possession information. Because clothing data can also be taken from the video model DB 21, the virtual person can be generated easily even when data on the target person's clothing is lacking. The video model selection unit 315 may also prepare video of the virtual person in several outfits, so that the clothing can change with the season, time of day, or user selection.
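The three clothing sources act as fallbacks for one another. The patent lists the sources but not an order, so the priority below is an assumption. It prefers clothing actually seen in the target person's images, then registered possessions, then generic outfits from the video model DB 21. `choose_outfit` is a hypothetical name.

```python
def choose_outfit(seen_outfits, owned_items, db_outfits):
    """Resolve the virtual person's outfit from the best available source.

    Priority (an assumption): clothing seen in the target person's images,
    then registered possessions, then generic outfits from the video model DB.
    Returns (outfit, source).
    """
    for source, candidates in (("appearance", seen_outfits),
                               ("possessions", owned_items),
                               ("video_model_db", db_outfits)):
        if candidates:
            return candidates[0], source
    raise ValueError("no clothing data available")


print(choose_outfit([], ["grey cardigan"], ["suit"]))  # ('grey cardigan', 'possessions')
```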

The video model selection unit 315 may determine the virtual person's hairstyle based on the appearance data, or select it from the video model DB 21. It may also prepare video of the virtual person with several hairstyles so that the hairstyle can be changed.

The description so far assumes that the video processing unit 31 extracts virtual person data from the target person's own information sources. However, new videos or still images of someone who resembles the target person may also be shot and used to generate the virtual person. Parts of that person's appearance data, such as hairstyle or clothing, may also be used. That is, the user may be able to choose which elements of the appearance data are used to generate the virtual person.

The face insertion unit 316 is a functional unit that integrates the face data into the usage video model. The face data is the data extracted by the video acquisition unit 311, the still image acquisition unit 312, the trimming unit 313, and the image correction unit 314. The face insertion unit 316 attaches the face data to the torso formed by the usage video model, producing a full-body image of the virtual person.
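At its simplest, integrating a face into a video model means compositing a face patch at the head position in every frame, so that the face follows the torso's motion. The Python sketch below assumes the video model supplies a per-frame (top, left) head position. It uses a plain paste with no scaling, rotation, or blending, so it is far cruder than a production face-replacement pipeline. Frames are grayscale lists of rows, and all names are hypothetical.

```python
def insert_face(frame, face, top, left):
    """Return a copy of frame with the face patch pasted at (top, left), clipped to bounds."""
    out = [row[:] for row in frame]
    for dy, face_row in enumerate(face):
        for dx, pixel in enumerate(face_row):
            y, x = top + dy, left + dx
            if 0 <= y < len(out) and 0 <= x < len(out[0]):
                out[y][x] = pixel
    return out


def insert_face_into_clip(frames, head_positions, face):
    """Paste the same face into every frame at that frame's head position,
    so the face moves together with the torso of the usage video model."""
    return [insert_face(f, face, top, left) for f, (top, left) in zip(frames, head_positions)]


frames = [[[0] * 4 for _ in range(4)] for _ in range(2)]
clip = insert_face_into_clip(frames, [(0, 0), (2, 3)], [[9]])
print(clip[0][0][0], clip[1][2][3])  # 9 9
```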

The audio processing unit 32 is a functional unit that artificially generates the speaking voice of the virtual person. It includes a voice extraction unit 321 and a voice generation unit 322.

The voice extraction unit 321 is a functional unit that extracts the target person's voice from an information source. For example, it may identify the target person's voice as the voice of whichever person speaks longest among the multiple voices in an information source.
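The "longest speaker" rule is easy to state once the audio has been split into speaker-labelled segments. That splitting is a speaker-diarization step, which the patent does not name and which is assumed here. The minimal Python sketch below totals each label's speaking time and returns the largest.

```python
from collections import defaultdict


def identify_target_speaker(segments):
    """Return the speaker label with the longest total speaking time.

    segments: iterable of (speaker_label, start_s, end_s), e.g. from a
    speaker-diarization step (assumed; the patent does not name one).
    """
    totals = defaultdict(float)
    for speaker, start, end in segments:
        totals[speaker] += max(0.0, end - start)
    if not totals:
        raise ValueError("no speech segments")
    return max(totals, key=totals.get)


segments = [("spk0", 0.0, 5.0), ("spk1", 5.0, 7.0), ("spk0", 7.0, 8.0), ("spk1", 8.0, 20.0)]
print(identify_target_speaker(segments))  # spk1 (14 s vs. 6 s)
```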

The voice generation unit 322 is a functional unit that generates the virtual person's voice based on the voice extracted by the voice extraction unit 321. The voice generation unit 322 may trim the target person's recorded voice and edit it into clips that can be played back as the virtual person's voice. Alternatively, it may select, from voice data prepared in advance, a voice similar to the target person's and adopt it as the virtual person's voice, or it may synthesize an artificial voice that resembles the target person's. If messages from the virtual person are displayed as text, no voice needs to be generated.
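One way to pick "a voice similar to the target person's" from prepared voice data is a nearest-neighbour search over voice feature vectors. The sketch below assumes each voice has already been reduced to a fixed-length embedding by some speaker-embedding model (not specified in the source); the library contents and the choice of cosine similarity are illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_similar_voice(target_embedding, voice_library):
    """Return the id of the prepared voice closest to the target person's voice.

    `voice_library` is a hypothetical dict mapping voice_id -> embedding vector.
    """
    return max(
        voice_library,
        key=lambda voice_id: cosine_similarity(target_embedding, voice_library[voice_id]),
    )
```

With a library `{"voice_low": [0.9, 0.1, 0.0], "voice_mid": [0.5, 0.5, 0.2], "voice_high": [0.0, 0.2, 0.9]}` and a target embedding `[0.8, 0.2, 0.1]`, `voice_low` is selected.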

The personality processing unit 33 is a functional unit that determines the personality model of the virtual person. The personality processing unit 33 includes a text data registration unit 331, a personality model selection unit 332, and a personality model correction unit 333.

The text data registration unit 331 is a functional unit that extracts text data from information sources and stores it in the virtual person data storage unit 23. The text data registration unit 331 extracts electronic text written by the target person, such as blog or social media (SNS) posts, and stores it in the virtual person data storage unit 23 according to a predetermined rule. It may also read handwritten documents by the target person, such as a diary, convert them into text data, and store that text in the virtual person data storage unit 23. Furthermore, it may transcribe the target person's speech contained in audio or video data into text data and store it in the virtual person data storage unit 23.

The personality model selection unit 332 is a functional unit that selects, from the personality model DB 22, the personality model to be used for generating the virtual person (hereinafter also called the "usage personality model"). The personality model selection unit 332 presents questions about the virtual person's personality via the user terminal 10. When answers to the questions are input from the user terminal 10, it selects the usage personality model from the data in the personality model DB 22 based on those answers.

A plurality of personality questions may be presented. The questions may also follow a chart in which each answer determines the next question. As the user answers, the virtual person is given a basic personality based on basic personality classifications prepared in advance. Deriving a personality from the target person's actual conversations would require a very large amount of conversation data. With this system 1, by contrast, the answers to the personality questions are used to classify the virtual person into one of the prepared personalities, so the virtual person's personality can be determined with a simple configuration even when little information is available.
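A chart in which each answer selects the next question is essentially a decision tree whose leaves are the prepared personality classes. The sketch below is a minimal illustration under that reading; the questions, the yes/no answer format, and the class names are all hypothetical and not taken from the source.

```python
# Hypothetical question chart: each question node maps an answer to the next
# node; any node id listed in PERSONALITY_CLASSES is a leaf (a prepared class).
CHART = {
    "q1": {"text": "Did the person enjoy large gatherings?", "yes": "q2", "no": "q3"},
    "q2": {"text": "Did they tend to lead the conversation?",
           "yes": "outgoing_leader", "no": "sociable_listener"},
    "q3": {"text": "Did they prefer to plan ahead?",
           "yes": "calm_planner", "no": "quiet_free_spirit"},
}
PERSONALITY_CLASSES = {"outgoing_leader", "sociable_listener", "calm_planner", "quiet_free_spirit"}


def classify_personality(answers, start="q1"):
    """Walk the chart using `answers` (question id -> "yes"/"no") and return the class."""
    node = start
    while node not in PERSONALITY_CLASSES:
        node = CHART[node][answers[node]]  # the answer decides the next question
    return node
```

Only the questions actually reached need answers: `classify_personality({"q1": "no", "q3": "yes"})` returns `"calm_planner"` without `q2` ever being asked.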

The virtual person's personality model may also be defined separately for each scenario pattern, according to the type of question the user asks. Examples of scenario patterns are everyday conversation and consultation about a personal problem. Once personality models have been determined for some of the scenario patterns, the system may be configured to allow dialogue within those scenario patterns. This is convenient, because dialogue becomes possible as soon as the personality models for the scenario patterns that are actually needed have been determined.
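The per-scenario arrangement can be pictured as a small registry that permits dialogue only for scenario patterns whose personality model has already been determined. This is a hypothetical sketch: the class name, scenario labels, and model ids are illustrative.

```python
class ScenarioPersonalities:
    """Hypothetical per-scenario personality registry for one virtual person."""

    def __init__(self):
        self._models = {}  # scenario pattern -> usage personality model id

    def set_model(self, scenario, model_id):
        self._models[scenario] = model_id

    def can_converse(self, scenario):
        # Dialogue is allowed only for scenarios whose model has been determined.
        return scenario in self._models

    def model_for(self, scenario):
        if scenario not in self._models:
            raise LookupError(f"no personality model determined for {scenario!r}")
        return self._models[scenario]
```

A virtual person whose model has been set only for `"daily_conversation"` can chat about everyday topics while `"problem_consultation"` stays unavailable until that model is also determined.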

The personality model correction unit 333 is a functional unit that corrects the usage personality model selected by the personality model selection unit 332. It receives, from the user terminal 10, an evaluation of a response made by the virtual person, and corrects the usage personality model based on that evaluation. For example, the user rates whether the response was appropriate as something the target person would have said. The user may also rate the virtual person's movements that accompany the response. The personality model correction unit 333 corrects the personality model through automatic learning using AI or similar techniques. This allows the virtual person's personality to be brought closer to that of the target person. When a plurality of user terminals 10 converse with the same virtual person, either simultaneously or at different times, the evaluations from all of those terminals may be used to correct that virtual person's personality model. Because the model then receives more feedback, its personality can be brought closer to the target person's and the accuracy of the dialogue can be improved.
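The source leaves the learning method open ("AI or similar techniques"). As a deliberately simple stand-in, the sketch below models the usage personality model as a dict of hypothetical response-style weights and nudges each weight toward 1.0 or 0.0 per rating, pooling ratings from every terminal. The `Evaluation` fields, style names, and learning rate are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    terminal_id: str   # which user terminal 10 sent the rating
    style: str         # hypothetical: response style used in the rated message
    appropriate: bool  # "would the target person have said this?"


def correct_model(weights, evaluations, learning_rate=0.1):
    """Return a new weight dict nudged toward 1.0 (approved) or 0.0 (rejected).

    Evaluations from all terminals are pooled, so a virtual person talked to
    by several users receives more feedback. The input dict is not modified.
    """
    updated = dict(weights)
    for ev in evaluations:
        target = 1.0 if ev.appropriate else 0.0
        current = updated.get(ev.style, 0.5)  # unseen styles start neutral
        updated[ev.style] = current + learning_rate * (target - current)
    return updated
```

Two approvals of a `"joking"` style from terminals `t1` and `t2` raise its weight from 0.5 to 0.595, while one rejection of `"formal"` lowers that weight to 0.45.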

Instead of an explicit user evaluation, the personality model correction unit 333 may judge whether a message from the virtual person was appropriate from the user's reply to that message, and correct the personality model accordingly. To do so, it may convert the user's reply into text data and analyze it, or it may infer the user's level of satisfaction from the tone of the user's voice.
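Analysing the user's reply as text could be as crude as a keyword count. The sketch below is only a placeholder for whatever text analysis is actually used; the word lists are hypothetical, and a real system would likely use a trained sentiment model instead.

```python
# Hypothetical cue words; a real system would use a proper sentiment model.
POSITIVE = {"thanks", "yes", "exactly", "haha", "right"}
NEGATIVE = {"no", "wrong", "strange", "weird"}


def infer_approval(reply_text):
    """Guess whether the user's reply approves of the virtual person's message.

    Returns True (approval), False (disapproval) or None (no clear signal).
    """
    words = [w.strip(".,!?").lower() for w in reply_text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return True
    if score < 0:
        return False
    return None
```

A reply such as "Haha, exactly what she would say!" is treated as approval, "That sounds strange." as disapproval, and "Okay." as no signal, in which case the model would be left unchanged.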

The communication processing unit 39 is a functional unit that communicates with the user terminal 10, the storage device 20, and the video generation device 40 via the network NW.

● Configuration of the video generation device
The video generation device 40 is a device that displays, on the user terminal 10, a video of the virtual person generated by the virtual person generation device 30. The video generation device 40 includes a video display processing unit 41, a dialogue processing unit 42, and a communication processing unit 49.

The video display processing unit 41 is a functional unit that generates a speech video in which the virtual person speaks. It applies modeling processing to the face data extracted from the appearance data and animates the face in step with the speech.

The dialogue processing unit 42 is a functional unit that generates the messages the virtual person speaks, based on the usage personality model. A message may be a response to a question from the user, or it may be generated from the date, the season, the time of day, or external information such as weather forecasts and news on the Internet. When responding to the user, the dialogue processing unit 42 may also take such date, season, time-of-day, or external information into account in addition to the usage personality model. The dialogue processing unit 42 uses AI to determine the most suitable response.
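The way the message content can depend on the personality model (including, per the claims, the emotion it expresses) and on context such as time of day and weather can be shown with a template-based stand-in. This replaces the AI the source refers to and is purely illustrative: the emotion mapping, greeting templates, and personality class names are hypothetical.

```python
from datetime import datetime

# Hypothetical mapping from personality class to the emotion its messages express.
EMOTION_BY_PERSONALITY = {"outgoing_leader": "cheerful", "calm_planner": "gentle"}
GREETINGS = {"morning": "Good morning", "afternoon": "Hello", "evening": "Good evening"}


def time_of_day(now: datetime) -> str:
    if 5 <= now.hour < 12:
        return "morning"
    if 12 <= now.hour < 18:
        return "afternoon"
    return "evening"


def generate_message(personality: str, now: datetime, weather=None) -> dict:
    """Build an unprompted message from the personality, the clock and optional weather."""
    emotion = EMOTION_BY_PERSONALITY.get(personality, "neutral")
    text = GREETINGS[time_of_day(now)]
    if weather:  # external information, e.g. an Internet weather forecast
        text += f", I hear it will be {weather} today"
    text += "!" if emotion == "cheerful" else "."  # the emotion shapes the wording
    return {"emotion": emotion, "text": text}
```

At 08:00 with a sunny forecast, an `"outgoing_leader"` personality yields the cheerful message "Good morning, I hear it will be sunny today!", whereas a `"calm_planner"` at 20:00 with no forecast says "Good evening." in a gentle tone.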

The message generated by the dialogue processing unit 42 is spoken in the voice generated by the voice processing unit 32 and played on the user terminal 10 together with the speech video generated by the video display processing unit 41. The virtual person's voice may be produced by playing back lines spoken by the target person that were extracted by the voice extraction unit 321, by playing back sound source data of a similar voice determined in advance, or by generating and playing an artificial voice.

The communication processing unit 49 is a functional unit that communicates with the user terminal 10, the storage device 20, and the virtual person generation device 30 via the network NW.

● Flow of determining the usage video model
The flow by which the virtual person generation device 30 determines the usage video model is described with reference to FIG. 3. As shown in the figure, an information source about the target person is first registered from the user terminal 10 and transmitted to the virtual person generation device 30 (step S11). The virtual person generation device 30 then extracts appearance data from the information source (step S12) and converts any videos in the appearance data into still images (step S13). Next, the registered still images and the still images converted from videos are trimmed to the target person, and their color tone and resolution are corrected (step S14). Trimming and correction may be performed in either order. If an image's quality is still below a predetermined level after correction, it may be excluded from the subsequent steps.

The virtual person generation device 30 then stores the trimmed and corrected images in the virtual person data storage unit 23 of the storage device 20 (step S15). Based mainly on the physique information in the stored images, the virtual person generation device 30 consults the video models stored in the video model DB 21 (step S16), selects the video model that most closely resembles the target person's appearance, and displays it on the user terminal 10 (step S17). Alternatively, several candidate video models may be displayed on the user terminal 10 so that the user can choose the usage video model there. The user terminal 10 may also allow the user to select a video model other than the ones presented.
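Selecting the most similar video model, or a short list of candidates for the user, can be sketched as a nearest-neighbour ranking over physique features. The feature names, the assumption that features are already normalised to comparable 0..1 scales, and the use of Euclidean distance are all illustrative choices not stated in the source.

```python
import math


def rank_video_models(target, models, k=3):
    """Return up to k video model ids, closest physique first.

    `target` and each value in `models` are hypothetical dicts of physique
    features (e.g. "height", "build"), assumed normalised to 0..1.
    """
    def distance(features):
        return math.sqrt(sum((features[key] - target[key]) ** 2 for key in target))

    return sorted(models, key=lambda model_id: distance(models[model_id]))[:k]
```

Calling it with `k=1` gives the single best match for step S17; a larger `k` gives the candidate list from which the user picks the usage video model.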

Next, the user terminal 10 accepts input for changing individual parts of the usage video model (step S18). The parts include the facial contour, eyes, nose, mouth, and so on. At this time, the user may also select the hairstyle and clothing of the model. Once the parts have been adjusted as needed and the virtual person's usage video model has been finalized, the face data extracted from the appearance data is integrated into the usage video model (step S19). The usage video model with the integrated face data is then stored in the virtual person data storage unit 23 of the storage device 20 (step S20).
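Steps S18 and S19 amount to overriding selected parts of the chosen model and then attaching the face data. The sketch below represents a model as a plain dict of named parts; the part names follow the paragraph above, but the data layout and the function name are hypothetical.

```python
# Parts the user may change in step S18 (taken from the description above).
ALLOWED_PARTS = {"contour", "eyes", "nose", "mouth", "hairstyle", "clothing"}


def finalize_model(base_parts, overrides, face_data):
    """Apply the user's part changes (S18), then integrate the face data (S19)."""
    unknown = set(overrides) - ALLOWED_PARTS
    if unknown:
        raise ValueError(f"unsupported parts: {sorted(unknown)}")
    parts = {**base_parts, **overrides}  # user choices win over the selected model
    return {"parts": parts, "face": face_data}
```

Parts the user leaves untouched keep the values of the selected video model, so only the changed features need to be entered.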

● Flow of generating the virtual person's voice
The flow by which the virtual person generation device 30 generates the virtual person's voice is described with reference to FIG. 4. First, when an information source is registered from the user terminal 10 (step S21), the virtual person generation device 30 extracts the target person's voice data from that information source (step S22). The virtual person generation device 30 then generates the virtual person's voice based on the voice data.

● Flow of determining the virtual person's personality model
The flow by which the virtual person generation device 30 determines the virtual person's personality model is described with reference to FIG. 5. When an information source is registered from the user terminal 10 (step S31), the virtual person generation device 30 extracts text data such as blog and social media posts from that information source (step S32). At this time, image data such as handwritten diaries is also extracted and converted into text data, and sound source data is extracted so that the target person's speech can be transcribed into text data. The extracted text data is stored in the virtual person data storage unit 23 according to a predetermined rule (step S33).

Next, the virtual person generation device 30 causes the user terminal 10 to display questions about the target person's personality (step S34). The content of these questions may be determined based on the registered information sources. Alternatively, the user may be asked to select the scenario pattern to be registered, and questions corresponding to that scenario pattern may be displayed. The user terminal 10 accepts input of answers to the questions (step S35). Several questions may be displayed at once, or steps S34 and S35 may be repeated.

Based on the answers to the personality questions, the virtual person generation device 30 consults the personality models stored in the personality model DB 22 (step S36) and determines the usage personality model (step S37). It then stores the determined usage personality model in the virtual person data storage unit 23 (step S38).

● Flow of dialogue with the virtual person
The flow by which a user converses with a virtual person using the virtual person dialogue system is described with reference to FIG. 6. When an ID and a password are entered on the user terminal 10 (step S41), the virtual person generation device 30 authenticates the user (step S42), and the user can then converse with the virtual person associated with that ID. At this point, the start of the conversation may be staged, for example as an incoming chat message, phone call, or e-mail from the virtual person. Next, the data of the virtual person to converse with is loaded from the virtual person data storage unit 23 of the storage device 20 and made available to the video generation device 40 (step S43); that is, an image of the virtual person is displayed on the user terminal 10. The virtual person may speak or move as soon as it is displayed.

When the user enters a question for the virtual person on the user terminal 10 (step S44), the video generation device 40 generates, based on the virtual person's data, a video in which the virtual person responds.

Specifically, the video generation device 40 first generates response text to the question based on the virtual person's personality model (step S45). It then generates response audio that speaks the response text in the virtual person's voice (step S46). The response audio may be stored sound source data of the target person or an artificially generated voice. The video generation device 40 further generates a response video to be shown while the response audio is played (step S47). The generated response audio and response video are transmitted to the user terminal 10 as the virtual person's reply (step S48). They may be combined into a single data file before transmission, or sent as separate data files. A video of the virtual person is then displayed on the user terminal 10 (step S49). In this way, the virtual person answers the user's question and a dialogue with the virtual person takes place. Steps S44 to S49 may be repeated any number of times, allowing a natural back-and-forth conversation with the virtual person.
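The sequence of steps S45 to S48 can be written as a small orchestration function with the three generators injected. The callables are hypothetical stand-ins for the personality-model dialogue, voice generation and video rendering, and the "single file" option is a plain byte concatenation used only as a placeholder, not a real media container.

```python
def respond(question, generate_text, synthesize, render, bundle=True):
    """Run steps S45-S48 for one question and return the file(s) to transmit.

    generate_text, synthesize and render are hypothetical callables standing
    in for the dialogue, voice and video generation described above.
    """
    text = generate_text(question)   # S45: response text from the personality model
    audio = synthesize(text)         # S46: response audio in the virtual person's voice
    video = render(audio)            # S47: response video timed to the audio
    if bundle:
        # Placeholder "single data file": simple concatenation, NOT a real media format.
        return [audio + video]
    return [audio, video]            # S48 alternative: separate data files
```

Repeating `respond` for each new question corresponds to looping over steps S44 to S49.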

FIG. 6 illustrates the case in which the virtual person's video is generated in response to a question entered on the user terminal 10 in step S44. Alternatively, a video of the virtual person may be generated when a specific date or time arrives and then displayed on the user terminal 10. The video may also be generated based on external information, for example from the Internet, or on instructions from an administrator of the virtual person dialogue system 1. The video may be displayed on the user terminal 10 as soon as it is generated, or it may be generated in advance and displayed when triggered by a user question, a date, a time, external information, an instruction, or the like.
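For the variant in which videos are generated in advance and released at a specific date or time, a minimal time-triggered queue is enough to convey the idea. The class name, video ids, and the polling-style `due` method are hypothetical; triggers from external information or administrator instructions would call into such a store in the same way.

```python
from datetime import datetime


class TriggeredVideos:
    """Holds pre-generated videos until their trigger time arrives (hypothetical)."""

    def __init__(self):
        self._pending = []  # (trigger_time, video_id), kept sorted by time

    def schedule(self, when: datetime, video_id: str):
        self._pending.append((when, video_id))
        self._pending.sort(key=lambda entry: entry[0])

    def due(self, now: datetime):
        """Return ids whose trigger time has passed and remove them from the queue."""
        ready = [vid for when, vid in self._pending if when <= now]
        self._pending = [(when, vid) for when, vid in self._pending if when > now]
        return ready
```

A birthday message scheduled for 09:00 on a given day is returned by the first `due` call made after that time and is not returned again.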

After step S49, when an evaluation of the video is entered on the user terminal 10 (step S50), the virtual person generation device 30 corrects the personality model and stores it in the virtual person data storage unit 23 of the storage device 20 (step S51).

In this way, the virtual person dialogue system according to the present invention can generate a speech video of a virtual person with a simple configuration.

1 Virtual person generation system
10 User terminal
20 Storage device
21 Video model DB
30 Virtual person generation device
31 Video processing unit
40 Video generation device
41 Video display processing unit

Claims

1. A virtual person dialogue system comprising:
a video model database that stores a plurality of video models of human movements;
a video model selection unit that selects, from the data in the video model database, a usage video model to be used for generating a virtual person;
a face insertion unit that integrates face data of the virtual person into the usage video model;
a voice processing unit that generates a voice of the virtual person;
a video display processing unit that generates a video of the virtual person speaking, based on the usage video model into which the face data has been integrated and the generated voice of the virtual person;
a personality model database that stores a plurality of personality models of persons;
a personality model selection unit that selects, from the data in the personality model database, a usage personality model to be used for generating the virtual person;
a dialogue processing unit that generates a message to be spoken by the virtual person based on the usage personality model; and
a personality model correction unit that corrects the usage personality model based on evaluations of the message transmitted from a plurality of terminals capable of receiving the message,
wherein the dialogue processing unit determines, based on the usage personality model, emotions to be expressed in the message to be spoken by the virtual person, and generates the message including the emotions.

2. The virtual person dialogue system according to claim 1, wherein the personality model selection unit selects the usage personality model from the data in the personality model database based on a record created by the person who is the target for generating the virtual person.

3. The virtual person dialogue system according to claim 1 or 2, wherein, as the user answers questions, the personality model selection unit assigns a basic personality to the virtual person based on basic personality classifications prepared in advance, and classifies the personality model to be used for generating the virtual person into one of the personalities included in the personality model database.

4. The virtual person dialogue system according to claim 1, further comprising:
an input unit for inputting a question to the virtual person; and
an output unit for outputting a response from the virtual person,
wherein the dialogue processing unit generates a response to the question and causes the output unit to output the response.

5. An image generation method for generating a video of a virtual person by a virtual person dialogue system that includes a video model database storing a plurality of types of video models of human movements and a personality model database storing a plurality of personality models of persons, the method comprising:
a video model selection step of selecting, from the data in the video model database, a usage video model to be used for generating the virtual person;
a face insertion step of integrating face data of the virtual person into the usage video model;
a voice processing step of generating a voice of the virtual person;
a video display processing step of generating a video of the virtual person speaking, based on the usage video model into which the face data has been integrated and the generated voice of the virtual person;
a personality model selection step of selecting, from the data in the personality model database, a usage personality model to be used for generating the virtual person;
a dialogue processing step of generating a message to be spoken by the virtual person based on the usage personality model; and
a personality model correction step of correcting the usage personality model based on evaluations of the message transmitted from a plurality of terminals capable of receiving the message,
wherein, in the dialogue processing step, emotions to be expressed in the message spoken by the virtual person are determined based on the usage personality model, and the message including the emotions is generated.

6. An image generation program for generating a video of a virtual person by a virtual person dialogue system that includes a video model database storing a plurality of types of video models of human movements and a personality model database storing a plurality of personality models of persons, the program causing a computer to execute:
a video model selection instruction for selecting, from the data in the video model database, a usage video model to be used for generating the virtual person;
a face insertion instruction for integrating face data of the virtual person into the usage video model;
a voice processing instruction for generating a voice of the virtual person;
a video display processing instruction for generating a video of the virtual person speaking, based on the usage video model into which the face data has been integrated and the generated voice of the virtual person;
a personality model selection instruction for selecting, from the data in the personality model database, a usage personality model to be used for generating the virtual person;
a dialogue processing instruction for generating a message to be spoken by the virtual person based on the usage personality model; and
a personality model correction instruction for correcting the usage personality model based on evaluations of the message transmitted from a plurality of terminals capable of receiving the message,
wherein the dialogue processing instruction determines, based on the usage personality model, emotions to be expressed in the message to be spoken by the virtual person, and generates the message including the emotions.