JP2021086618A

JP2021086618A - Virtual person interaction system, video generation method, and video generation program

Info

Publication number: JP2021086618A
Application number: JP2020179082A
Authority: JP
Inventors: 晴彦安田; Haruhiko Yasuda
Original assignee: Cro Magnon Corp
Current assignee: Cro Magnon Corp
Priority date: 2019-11-28
Filing date: 2020-10-26
Publication date: 2021-06-03
Anticipated expiration: 2039-11-28
Also published as: JP7496128B2

Abstract

PROBLEM TO BE SOLVED: To provide a virtual person interaction system for generating a video of a virtual person speaking, with a simple configuration, a video generation method using the virtual person interaction system, and a video generation program of the virtual person interaction system.

SOLUTION: A virtual person interaction system 1 includes: a video model database 21 which stores multiple kinds of video models for motion of a person; a video model selection unit 315 which selects a video model to be used for generating a virtual person from data in the video model database; a video processing unit 31 which extracts face data of a virtual person to be generated, from a registered information source; a face insertion unit 316 which integrates the face data with the video model; a voice processing unit 32 which extracts sound from the information source to generate voice of the virtual person; and a video display processing unit 41 which generates a video of the virtual person speaking, on the basis of the video model integrated with the face data and the generated voice of the virtual person.

SELECTED DRAWING: Figure 2

Description

The present invention relates to a virtual person dialogue system, a video generation method using the virtual person dialogue system, and a video generation program for the virtual person dialogue system.

Patent Document 1 discloses an imaging device that corrects the face image data to be stored in a face recognition data memory, based on designated specific face image data and on the face image data used in the correction process, and that detects an individual's face even in images taken at angles or orientations other than the front.

Patent Document 2 discloses a conversational sentence generation device that selects, from conversation templates created in advance, a sentence corresponding to an input sentence, and processes the selected sentence based on the agent information of a virtual agent to generate a response sentence.

Patent Document 1: Japanese Unexamined Patent Publication No. 2011-76457
Patent Document 2: Japanese Unexamined Patent Publication No. 2015-69455

Generating a moving image of a specific virtual person who is not actually present, such as a deceased person or a celebrity, and realizing a realistic dialogue requires a huge amount of information about the virtual person, such as video, voice, and personality traits. In addition, integrating this information to generate a virtual person involves generating video with computer graphics or the like, which requires purchasing large-scale equipment and content and has made use at the individual level difficult. There is therefore a need for a system that can generate an utterance video of a virtual person with a simple configuration.

One object of the present invention is to generate an utterance video of a virtual person with a simple configuration.

In order to achieve the above object, a virtual person dialogue system according to one aspect of the present invention includes: a video model database that stores a plurality of types of video models of a person in motion; a video model selection unit that selects, from the data in the video model database, a video model to be used for generating a virtual person; a video processing unit that extracts face data of the virtual person from a registered information source; a face insertion unit that integrates the face data into the video model to be used; a voice processing unit that extracts voice from the information source and generates the voice of the virtual person; and a video display processing unit that generates a video of the virtual person speaking, based on the video model to be used into which the face data has been integrated and on the generated voice of the virtual person.

The system may further include: a personality model database that stores a plurality of personality models of persons; a personality model selection unit that presents questions about the personality of the virtual person and, based on the answers to the questions, selects from the data in the personality model database a personality model to be used for generating the virtual person; and a dialogue processing unit that generates a message to be spoken by the virtual person based on the personality model to be used.

The personality model selection unit may select the personality model to be used based on records created by the virtual person.

The system may further include an input unit to which a question for the virtual person is input and an output unit that outputs the virtual person's response, and the dialogue processing unit may generate a response to the question and cause the output unit to output the response.

The system may further include a personality model correction unit that corrects the personality model to be used based on an evaluation of the message.

In order to achieve the above object, a video generation method according to another aspect of the present invention is a method of generating a video of a virtual person by a virtual person dialogue system including a video model database that stores a plurality of types of video models of a person in motion, the method including: a video model selection step of selecting, from the data in the video model database, a video model to be used for generating the virtual person; a video processing step of extracting face data of the virtual person to be generated from a registered information source; a face insertion step of integrating the face data into the video model to be used; a voice processing step of extracting voice from the information source and generating the voice of the virtual person; and a video display processing step of generating a video of the virtual person speaking, based on the video model to be used into which the face data has been integrated and on the generated voice of the virtual person.

In order to achieve the above object, a video generation program according to still another aspect of the present invention is a computer program for generating a video of a virtual person by a virtual person dialogue system including a video model database that stores a plurality of types of video models of a person in motion, the program causing a computer to execute: a video model selection instruction for selecting, from the data in the video model database, a video model to be used for generating the virtual person; a video processing instruction for extracting face data of the virtual person to be generated from a registered information source; a face insertion instruction for integrating the face data into the video model to be used; a voice processing instruction for extracting voice from the information source and generating the voice of the virtual person; and a video display processing instruction for generating a video of the virtual person speaking, based on the video model to be used into which the face data has been integrated and on the generated voice of the virtual person.
The computer program can be provided by download via a network such as the Internet, or can be provided recorded on various computer-readable recording media such as a CD-ROM.

According to the present invention, an utterance video of a virtual person can be generated with a simple configuration.

FIG. 1 is a schematic configuration diagram of the virtual person dialogue system according to the present invention.
FIG. 2 is a functional block diagram of the virtual person dialogue system.
FIG. 3 is a sequence diagram showing the process by which the virtual person dialogue system determines the video model to be used for generating a virtual person.
FIG. 4 is a sequence diagram showing the process by which the virtual person dialogue system generates the voice of a virtual person.
FIG. 5 is a sequence diagram showing the process by which the virtual person dialogue system determines the personality model of a virtual person.
FIG. 6 is a sequence diagram showing the process by which a user interacts with a virtual person using the virtual person dialogue system.

Hereinafter, embodiments of the virtual person dialogue system, the video generation method, and the video generation program according to the present invention will be described with reference to the drawings.

● Overview of the virtual person dialogue system
The virtual person dialogue system is a system that lets a user hold a simulated dialogue with a virtual person by playing back the moving image and voice of a specific virtual person who is not actually present and by automatically generating the content of the utterances. The person for whom a virtual person is generated (hereinafter also referred to as the "target person") is assumed to be someone with whom there is no or only limited opportunity to speak because of constraints of place or time, such as a deceased person, a celebrity, or a storyteller such as a person who experienced war, but may be any person. The virtual person is generated based on information about the target person registered by the user and on model data described later. The virtual person is played back on the user terminal 10 (see FIG. 1) and moves, speaks, talks to the user, and answers the user's questions as if actually present.

As shown in FIG. 1, the user U holds a dialogue with the virtual person K by communicating, via the user terminal 10, with the cloud computer C, which has part or all of the configuration of the virtual person dialogue system. When the user U logs in to the cloud computer C via the user terminal 10 (step s1), a video of the virtual person K is transmitted from the cloud computer C (step s2). When the user U speaks to the virtual person K (step s3), the cloud computer C analyzes the content of the input message, generates a response based on the predetermined personality of the virtual person K, and has the response played back together with the video on the user terminal 10 (step s4).

As shown in FIG. 2, the virtual person dialogue system 1 according to the present invention (hereinafter also referred to as "the present system 1") is configured by connecting a storage device 20, a virtual person generation device 30, and a moving image generation device 40 via a network NW. The present system 1 is connected via the network NW to a user terminal 10 owned by a customer, and information can be transmitted and received between them.

The connections among the user terminal 10, the storage device 20, the virtual person generation device 30, and the moving image generation device 40 may each be wireless or wired. The storage device 20, the virtual person generation device 30, and the moving image generation device 40 may be configured as a single device. Further, some or all of the functions of the storage device 20, the virtual person generation device 30, and the moving image generation device 40 may be realized on the cloud computer C.

The user terminal 10 is a computer used by a user who interacts with a virtual person, and includes an input unit 11, an output unit 12, a display unit 13, an information source registration unit 14, and a communication processing unit 19. The user terminal 10 is, for example, a personal computer, and may also be a smartphone or a tablet. One or more user terminals 10 may be connected to the present system 1.

The input unit 11 is a functional unit through which the user inputs a message to the virtual person, and is composed of a keyboard, a touch panel display, a microphone, and the like.

The output unit 12 is a functional unit that outputs the virtual person's message. The output unit 12 is composed of a display that shows the message as text, a speaker that outputs the message as voice, or the like.

The display unit 13 of the user terminal 10 may be a flat playback device such as a liquid crystal screen, or a device that reproduces the image of the virtual person three-dimensionally, such as a head-mounted-display-type VR display device or a hologram (stereoscopic image) display device. When the user terminal 10 is a device that reproduces the image of the virtual person three-dimensionally, the dialogue with the virtual person can be made more realistic. The display unit 13 may also be a projection device that allows a plurality of users to view the image of one virtual person at the same time.

The display unit 13 may present content through a UI unique to the present system 1, or the present system 1 may work in conjunction with an existing chat tool such as SKYPE (registered trademark) so that messages and moving images from the virtual person are displayed in the existing tool. According to this configuration, the user can feel as if chatting with an actual person, which makes the dialogue with the virtual person more realistic.

The information source registration unit 14 is a functional unit that acquires information about the target person, that is, the information sources of the target person. The information sources include, for example, moving images, still images, and sound recordings in which the target person appears, as well as text data such as diaries and other records written by the target person, documents expressing hobbies and preferences, and SNS posts. The information sources also include information about possessions such as clothing. The information sources may be registered by the user or acquired through the Internet. The acquired information sources are transmitted to the virtual person generation device 30.

The communication processing unit 19 is a functional unit that exchanges information with the present system 1 via the network NW, and the communication format is arbitrary.

When the user registers information about the target person through the user terminal 10, the virtual person generation device 30 processes the information and determines the video, voice, personality, and the like of the virtual person. The determined virtual person data is stored in the storage device 20 and is called up by the moving image generation device 40 as needed. The moving image generation device 40 generates a moving image including the video, voice, and message of the virtual person based on the virtual person data, and displays it on the user terminal 10.

● Configuration of the storage device
The storage device 20 includes an arithmetic unit such as a CPU (Central Processing Unit) for executing information processing and memory such as RAM (Random Access Memory) and ROM (Read Only Memory), and thereby has, as software resources, at least a video model DB 21, a personality model DB 22, a virtual person data storage unit 23, and a communication processing unit 29. In this specification, "DB" is an abbreviation for "database".

The video model DB 21 is a storage unit that stores a plurality of types of video models of a person in motion. A video model is a video template used to generate the image of a virtual person. In particular, a video model is data that constitutes the shape and movement of the torso. A video model is also configured so that the face data described later can be integrated into it and the integrated face data moves together with the image of the torso.

The video models include the appearances of a plurality of types of persons whose physiques differ according to height, weight, age, and the like. The video models include a plurality of types of clothing that each person can be shown wearing during playback. The video models further include various data of a person of each appearance in motion, for example movements that are common during dialogue, such as nodding, crossing the arms, and raising a hand. A video model may be footage of an actual person or video modeled with CG, and both may be included.

The personality model DB 22 is a storage unit that stores a plurality of types of personality models of persons. A personality model includes, for example, the characteristics of answers to questions, and determines the answering policy, such as whether the content is positive or negative, and the emotions that appear in the answers. A personality model is not limited to answers to the user's questions, and may also cover the characteristics of messages that depend on the season, the time of day, and the like. The personality model DB 22 may also store, for each personality model, responses to questions that are anticipated in advance. According to this configuration, the computational load of generating a response suited to the personality model can be reduced for routine questions.
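As a minimal sketch of how pre-stored responses could reduce the computational load for routine questions, the following Python looks up a stored reply keyed by personality model and question, and falls back to a generator only on a miss. The table contents, the `normalize` helper, and the `generate` callback are all hypothetical names introduced for illustration; the specification does not say how the lookup is keyed or how responses are generated.

```python
from typing import Callable

# Hypothetical pre-stored replies, keyed by (personality model id, normalized question).
CANNED_REPLIES: dict[tuple[str, str], str] = {
    ("cheerful", "how are you?"): "I'm doing great! How about you?",
    ("reserved", "how are you?"): "I'm fine, thank you.",
}


def normalize(question: str) -> str:
    """Lowercase and collapse whitespace so trivially different inputs match."""
    return " ".join(question.lower().split())


def reply(personality_id: str, question: str,
          generate: Callable[[str, str], str]) -> str:
    """Return a stored reply if one exists; otherwise call the generator."""
    canned = CANNED_REPLIES.get((personality_id, normalize(question)))
    if canned is not None:
        return canned  # routine question: no generation cost
    return generate(personality_id, question)
```

The generator is passed in as a parameter so that the expensive path runs only when no stored reply matches.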

The virtual person data storage unit 23 is a storage unit that stores the video model, personality model, and voice information determined for each virtual person. The virtual person data storage unit 23 also stores information that the virtual person knows, for example episodes from the target person's life and accounts of their experiences. The virtual person data is determined and stored by the virtual person generation device 30, and is called up by the moving image generation device 40 when the moving image of the virtual person is played back.
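The per-person record described above might be sketched as follows. This is an illustrative in-memory layout only: the class names, field names, and the use of string identifiers are assumptions, and the specification does not define a storage schema.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualPersonData:
    """One record per virtual person (field names are hypothetical)."""
    person_id: str
    video_model_id: str        # video model chosen for this person
    personality_model_id: str  # personality model chosen for this person
    voice_id: str              # reference to the generated voice
    episodes: list[str] = field(default_factory=list)  # things the person "knows"


class VirtualPersonStore:
    """In-memory stand-in for the virtual person data storage unit."""

    def __init__(self) -> None:
        self._records: dict[str, VirtualPersonData] = {}

    def save(self, record: VirtualPersonData) -> None:
        """Called by the generating side once the models are determined."""
        self._records[record.person_id] = record

    def load(self, person_id: str) -> VirtualPersonData:
        """Called by the playback side; raises KeyError for unknown ids."""
        return self._records[person_id]
```

Separating `save` and `load` mirrors the division of roles in the text: one device writes the record, another reads it at playback time.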

● Configuration of the virtual person generation device
The virtual person generation device 30 includes an arithmetic unit such as a CPU (Central Processing Unit) for executing information processing and memory such as RAM (Random Access Memory) and ROM (Read Only Memory), and thereby has, as software resources, at least a video processing unit 31, a voice processing unit 32, a personality processing unit 33, and a communication processing unit 39.

The video processing unit 31 is a functional unit that extracts, from the data of the target person, the appearance data used to generate the virtual person. The appearance data includes the face, body, hairstyle, clothing, and the like of the target person. The video processing unit 31 also selects the video model to be used for generating the virtual person and determines the video data to be used for the video of the virtual person. The video processing unit 31 may extract the appearance data of the virtual person based not only on information sources registered via the information source registration unit 14 of the user terminal 10 but also on information sources acquired from the Internet. The video processing unit 31 may also extract the appearance data used to generate one virtual person based on information sources registered from a plurality of user terminals 10. When many users interact with a common virtual person, such as a celebrity, each user registers information sources for that one virtual person. According to this configuration, the virtual person can be generated from a larger number of information sources, enabling a more realistic dialogue.

The video processing unit 31 includes a moving image acquisition unit 311, a still image acquisition unit 312, a trimming unit 313, an image correction unit 314, a video model selection unit 315, and a face insertion unit 316.

The moving image acquisition unit 311 is a functional unit that acquires moving image data. The moving image acquisition unit 311 acquires the moving images included in the information sources registered on the user terminal 10. The moving image acquisition unit 311 can also prompt the user to shoot a moving image through the user terminal 10. Situations in which a moving image can be shot through the user terminal 10 include, for example, a case where the target person is someone close to the user and the virtual person is to be displayed on another user terminal 10, and a case where the virtual person is generated in advance so that dialogue remains possible after the target person has died. In such cases, the moving image acquisition unit 311 may display on the user terminal 10 a tutorial that guides the user through shooting the moving image.

The still image acquisition unit 312 is a functional unit that acquires still image data. The still image acquisition unit 312 acquires the still images included in the information sources registered on the user terminal 10. The still image acquisition unit 312 can also prompt the user to take a still image through the user terminal 10. In this case, the still image acquisition unit 312 may display on the user terminal 10 a tutorial that guides the user through taking a still image, that is, a photograph. The still image acquisition unit 312 also converts moving image data into still images and acquires them: it extracts images of the target person at various angles and with various facial expressions and converts them into still images.

The trimming unit 313 is a functional unit that trims the still images to extract the data of the target person. The trimming unit 313 may have a face recognition function and be able to automatically extract only the face of the target person.

The image correction unit 314 performs color tone correction and resolution correction on the extracted images to make their quality uniform. The image correction unit 314 may also determine whether each extracted image is sharp and exclude blurred images from the extracted data set. The image correction unit 314 may further exclude images whose resolution is at or below a predetermined value from the extracted data set.
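The two exclusion rules above can be sketched as a simple filter. The `FaceImage` type, the `sharpness` score in [0, 1], and both threshold values are hypothetical; the specification does not state how sharpness is judged or what the predetermined resolution is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaceImage:
    name: str
    width: int
    height: int
    sharpness: float  # hypothetical score in [0, 1]; higher means sharper


def filter_images(images: list[FaceImage],
                  resolution_limit: int = 128 * 128,
                  min_sharpness: float = 0.5) -> list[FaceImage]:
    """Drop images at or below the resolution limit, and blurred images."""
    kept = []
    for img in images:
        if img.width * img.height <= resolution_limit:
            continue  # resolution at or below the predetermined value
        if img.sharpness < min_sharpness:
            continue  # judged not sharp
        kept.append(img)
    return kept
```

In a real pipeline the sharpness score would come from an image analysis step; here it is supplied as data so the filtering logic can be shown on its own.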

The video model selection unit 315 is a functional unit that selects, from the data in the video model DB 21, the video model to be used for generating the virtual person. The video model selection unit 315 may select the video model most similar to the target person based on the appearance data acquired by the moving image acquisition unit 311, or may present a plurality of video models on the user terminal 10 and have the user select the video model to be used. According to this configuration, a moving image of the virtual person can be composed from the video model even if the user has not registered enough information sources showing the virtual person in motion.
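One way to realize "most similar to the target person" is a nearest-neighbor search over a few body attributes. The sketch below is an assumption throughout: the specification does not define the similarity measure, and the `BodyProfile` fields (taken from the height, weight, and age attributes by which the video models are said to differ), the scaling constants, and the use of Euclidean distance are illustrative choices.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BodyProfile:
    height_cm: float
    weight_kg: float
    age: float


# Hypothetical scales so that no single attribute dominates the distance.
SCALE = BodyProfile(height_cm=10.0, weight_kg=10.0, age=10.0)


def distance(a: BodyProfile, b: BodyProfile) -> float:
    """Scaled Euclidean distance between two body profiles."""
    return math.sqrt(
        ((a.height_cm - b.height_cm) / SCALE.height_cm) ** 2
        + ((a.weight_kg - b.weight_kg) / SCALE.weight_kg) ** 2
        + ((a.age - b.age) / SCALE.age) ** 2
    )


def select_video_model(target: BodyProfile,
                       models: dict[str, BodyProfile]) -> str:
    """Return the id of the stored model closest to the target profile."""
    if not models:
        raise ValueError("video model database is empty")
    return min(models, key=lambda model_id: distance(target, models[model_id]))
```

The alternative path in the text, where the user picks from several presented models, would bypass this function entirely.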

The video model selection unit 315 may determine the clothing of the virtual person to be generated based on the appearance data, or based on the possession information included in the information sources. The video model selection unit 315 may also select the clothing of the virtual person from the video model DB 21. That is, if there is an information source in which the target person is wearing the clothing, the video of the virtual person can be generated based on that information source, and even without such an information source, the video can be generated based on the possession information. Because clothing data can also be selected from the video model DB 21, the virtual person can be generated easily even when data on the target person's clothing is insufficient. The video model selection unit 315 may compose videos of the virtual person in a plurality of types of clothing so that the clothing can be changed according to the season, the time of day, or the user's selection.

The video model selection unit 315 may determine the hairstyle of the virtual person to be generated based on the appearance data, or may select the hairstyle of the virtual person from the video model DB 21. The video model selection unit 315 may further compose videos of the virtual person with a plurality of types of hairstyles so that the hairstyle can be changed.

The description so far has assumed that the video processing unit 31 extracts the data of the virtual person from the target person's own information sources, but moving images or still images of a person who resembles the target person may be newly shot and used to generate the virtual person. Appearance data of a similar person, such as hairstyle or clothing, may also be used in part to generate the virtual person. That is, the user may be able to select which elements of the appearance data are used to generate the virtual person.

The face insertion unit 316 is a functional unit that integrates the face data extracted by the moving image acquisition unit 311, the still image acquisition unit 312, the trimming unit 313, and the image correction unit 314 into the video model to be used. The face insertion unit 316 integrates the face data with the body formed by the video model to be used, thereby forming a full-body image of the virtual person.

The voice processing unit 32 is a functional unit that artificially generates the speaking voice of the virtual person. The voice processing unit 32 includes a voice extraction unit 321 and a voice generation unit 322.

The voice extraction unit 321 is a functional unit that extracts the voice of the target person from the information source. For example, among the multiple voices contained in the information source, the voice extraction unit 321 may identify the voice that is present for the longest total time as the voice of the target person.
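
The longest-duration rule above can be sketched as follows, assuming the audio has already been split into speaker-labelled segments. The segment format `(speaker, start_sec, end_sec)` and the function name `dominant_speaker` are hypothetical; the specification does not say how the voices are separated.

```python
from collections import defaultdict
from typing import Iterable, Optional, Tuple


def dominant_speaker(
    segments: Iterable[Tuple[str, float, float]]
) -> Optional[str]:
    """Return the speaker label with the longest total speaking time."""
    totals = defaultdict(float)
    for speaker, start, end in segments:
        # Ignore malformed segments whose end precedes their start.
        totals[speaker] += max(0.0, end - start)
    if not totals:
        return None
    return max(totals, key=totals.get)
```

For example, with segments `[("A", 0, 5), ("B", 5, 7), ("A", 7, 10)]`, speaker `"A"` accumulates 8 seconds against 2 seconds for `"B"` and is treated as the target person.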

The voice generation unit 322 is a functional unit that generates the voice of the virtual person based on the voice extracted by the voice extraction unit 321. The voice generation unit 322 may trim the recorded voice of the target person and edit it so that it can be played back as the voice of the virtual person. The voice generation unit 322 can also select, from voice data prepared in advance, a voice that resembles the voice of the target person and adopt it as the voice of the virtual person. Furthermore, the voice generation unit 322 may generate an artificial voice that resembles the voice of the target person. When messages from the virtual person are displayed as text, no voice needs to be generated.

The personality processing unit 33 is a functional unit that determines the personality model of the virtual person. The personality processing unit 33 includes a text data registration unit 331, a personality model selection unit 332, and a personality model correction unit 333.

The text data registration unit 331 is a functional unit that extracts text data from the information source and stores it in the virtual person data storage unit 23. The text data registration unit 331 extracts electronic text data such as the target person's blog or SNS posts and stores it in the virtual person data storage unit 23 according to a predetermined rule. The text data registration unit 331 may also read a document handwritten by the target person, such as a diary, convert it into text data, and store it in the virtual person data storage unit 23. Furthermore, the text data registration unit 331 may convert the target person's voice contained in audio or video data into text data and store it in the virtual person data storage unit 23.

The personality model selection unit 332 is a functional unit that selects, from the personality model DB 22, the personality model used for generating the virtual person (hereinafter also referred to as the "personality model to be used"). The personality model selection unit 332 presents questions about the personality of the virtual person through the user terminal 10. When an answer to a question is input from the user terminal 10, the personality model to be used for generating the virtual person is selected from the data in the personality model DB 22 based on that answer.

A plurality of questions about personality may be presented. The questions may also be presented along a chart in which each input answer is linked to the next question. As the user answers the questions, the virtual person is assigned a basic personality according to a basic personality classification prepared in advance. Determining the personality from information on the target person's actual conversations would require an enormous amount of conversation data. With the present system 1, the personality can be classified into one of the personalities prepared in advance based on the answers to the personality questions, so the personality of the virtual person can be determined with a simple configuration even when information is scarce.
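
A chart that links each answer to the next question can be represented as a small decision tree. The following is a minimal sketch under stated assumptions: the questions, the yes/no answer format, and the four personality labels are invented for illustration and are not taken from the specification.

```python
from typing import Dict, Iterable

# Hypothetical chart: each question node maps an answer to the next node.
# Any node name that is not a key of the chart is a personality class.
CHART: Dict[str, Dict[str, str]] = {
    "q1": {"text": "Did they talk a lot?", "yes": "q2", "no": "q3"},
    "q2": {"text": "Did they joke often?", "yes": "cheerful", "no": "earnest"},
    "q3": {"text": "Were they strict?", "yes": "stern", "no": "gentle"},
}


def classify(answers: Iterable[str],
             chart: Dict[str, Dict[str, str]] = CHART,
             start: str = "q1") -> str:
    """Follow the chart with the given answers and return a personality class."""
    node = start
    for answer in answers:
        if node not in chart:
            break  # already reached a class; ignore surplus answers
        node = chart[node][answer]
    if node in chart:
        raise ValueError("not enough answers to reach a personality class")
    return node
```

Because every path ends in one of the prepared classes, a handful of answers is enough to pick a basic personality without analysing conversation logs.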

The personality model of the virtual person may be defined for each scenario pattern corresponding to the type of question from the user. A scenario pattern is, for example, everyday conversation or consultation about personal worries. Once the personality model has been determined for some of the scenario patterns, the system may be configured to allow dialogue in line with those scenario patterns. This configuration is convenient because dialogue becomes possible as soon as the personality models for only the required scenario patterns have been determined.

The personality model correction unit 333 is a functional unit that corrects the personality model to be used that was selected by the personality model selection unit 332. The personality model correction unit 333 receives, from the user terminal 10, an evaluation of a response given by the virtual person and corrects the personality model to be used based on that evaluation. For example, the user inputs, as the evaluation, whether the content of the response was appropriate as a response from the target person. The user may also evaluate the motion of the virtual person that accompanies the response. The personality model correction unit 333 corrects the personality model through automatic learning by AI or the like. With this configuration, the personality of the virtual person can be corrected to be closer to that of the target person. When a plurality of user terminals 10 converse with a single virtual person, either simultaneously or at different times, the evaluations from the plurality of user terminals 10 may be used to correct the personality model of that single virtual person. With this configuration, more feedback can be given to the personality model of the virtual person, so the personality model can be brought closer to the personality of the target person and the accuracy of the dialogue can be improved.

Instead of relying on an evaluation from the user, the personality model correction unit 333 may also determine whether a message from the virtual person was appropriate based on the user's reply to that message, and correct the personality model accordingly. The personality model correction unit 333 may convert the content of the user's reply into text data and analyze it, or may infer the user's level of satisfaction from the user's tone of voice.

The communication processing unit 39 is a functional unit that communicates with the user terminal 10, the storage device 20, and the video generation device 40 through the network NW.

● Configuration of the video generation device
The video generation device 40 is a device that displays, on the user terminal 10, the video of the virtual person generated by the virtual person generation device 30. The video generation device 40 includes a video display processing unit 41, a dialogue processing unit 42, and a communication processing unit 49.

The video display processing unit 41 is a functional unit that generates a speaking video in which the virtual person speaks. The video display processing unit 41 applies modeling to the face data extracted from the appearance data and animates it in time with the speech.

The dialogue processing unit 42 is a functional unit that generates the messages spoken by the virtual person based on the personality model to be used. The content of a message may be a response to a question from the user, or may be words generated according to the date, the season, the time of day, or external information such as weather forecasts and news on the Internet. When responding to the user, the response may also be generated based on the date, the season, the time of day, or external information such as weather forecasts and news on the Internet, in addition to the personality model to be used. The dialogue processing unit 42 determines the optimum response using AI.
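
As a rough illustration of combining the personality model with the season, the time of day, and external information, the sketch below builds an opening message from a lookup table. It does not implement the AI-based response selection mentioned above; the personality labels, the hour boundaries, the northern-hemisphere season mapping, and the greeting texts are all assumptions.

```python
from datetime import datetime
from typing import Dict, Optional

# Assumed northern-hemisphere mapping: Dec-Feb winter, Mar-May spring, ...
SEASONS = ("winter", "spring", "summer", "autumn")

# Hypothetical greeting table keyed by (personality, time of day).
GREETINGS = {
    ("cheerful", "morning"): "Good morning! Up already?",
    ("cheerful", "daytime"): "Hi there! How is your day going?",
    ("cheerful", "evening"): "Good evening! How was today?",
    ("gentle", "morning"): "Good morning. Did you sleep well?",
    ("gentle", "daytime"): "Hello. I hope your day is going well.",
    ("gentle", "evening"): "Good evening. You must be tired.",
}


def context_from(now: datetime) -> Dict[str, str]:
    """Derive the season and the time of day from a timestamp."""
    if 5 <= now.hour < 11:
        part = "morning"
    elif 11 <= now.hour < 18:
        part = "daytime"
    else:
        part = "evening"
    return {"season": SEASONS[(now.month % 12) // 3], "time_of_day": part}


def opening_message(personality: str, now: datetime,
                    weather: Optional[str] = None) -> str:
    """Build a message from the personality, the time of day, and weather."""
    ctx = context_from(now)
    message = GREETINGS[(personality, ctx["time_of_day"])]
    if weather:
        # External information, e.g. from an Internet weather forecast.
        message += f" I hear it will be {weather} today."
    return message
```

In a real system the table lookup would be replaced by whatever model chooses the response; the point here is only how the contextual inputs feed into message generation.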

The message generated by the dialogue processing unit 42 is spoken in the voice generated by the voice processing unit 32 and is played back on the user terminal 10 together with the speaking video generated by the video display processing unit 41. The voice of the virtual person may be a playback of lines spoken by the target person that were extracted by the voice extraction unit 321. It may also be played back based on sound source data of a similar voice determined in advance. Alternatively, an artificial voice may be generated and played back.

The communication processing unit 49 is a functional unit that communicates with the user terminal 10, the storage device 20, and the virtual person generation device 30 through the network NW.

● Flow of determining the video model to be used
The flow by which the virtual person generation device 30 determines the video model to be used will be described with reference to FIG. 3. As shown in the figure, first, an information source of the target person is registered from the user terminal 10 and transmitted to the virtual person generation device 30 (step S11). Next, the virtual person generation device 30 extracts appearance data from the information source (step S12). Videos included in the appearance data are converted into still images (step S13). Next, for the registered still images and the still images converted from videos, the image of the target person is trimmed and the color tone and resolution of the image are corrected (step S14). Trimming and image correction may be performed in either order. If the quality of the data is still at or below a predetermined level even after correction, it may be decided not to use that image in the subsequent steps.
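
The quality check at the end of step S14 can be sketched as a simple filter. The `Still` record, the 0-to-1 quality score, and the default threshold are assumptions; the specification only says that images whose quality remains at or below a predetermined level may be excluded.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Still:
    name: str
    quality: float  # hypothetical score in [0, 1], measured after correction


def select_usable(stills: Iterable[Still],
                  threshold: float = 0.5) -> List[Still]:
    """Keep only stills whose post-correction quality exceeds the threshold."""
    # Images at or below the threshold are dropped from later steps.
    return [s for s in stills if s.quality > threshold]
```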

Next, the virtual person generation device 30 stores the trimmed and corrected images in the virtual person data storage unit 23 of the storage device 20 (step S15). Based mainly on the physique-related information in the stored images, the virtual person generation device 30 refers to the video models stored in the video model DB 21 (step S16), selects the video model that most closely resembles the appearance of the target person, and displays it on the user terminal 10 (step S17). At this time, a plurality of candidate video models may be displayed on the user terminal 10 so that the video model to be used can be selected on the user terminal 10. A video model other than the presented ones may also be selectable on the user terminal 10.
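
Steps S16 and S17 amount to a nearest-match search over physique features. The sketch below ranks video models by unscaled Euclidean distance, which is a stand-in choice: the specification does not state the similarity measure, and the feature names `height` and `shoulder` are hypothetical.

```python
import math
from typing import Dict, List


def rank_video_models(target: Dict[str, float],
                      models: Dict[str, Dict[str, float]],
                      top_k: int = 3) -> List[str]:
    """Return up to top_k model names, closest physique first."""
    def distance(features: Dict[str, float]) -> float:
        # Compare only the features that were measured for the target.
        return math.sqrt(sum((features[k] - target[k]) ** 2 for k in target))

    return sorted(models, key=lambda name: distance(models[name]))[:top_k]
```

Returning several candidates rather than a single best match corresponds to the option of showing multiple video models on the user terminal 10 and letting the user choose.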

Next, the user terminal 10 accepts input for individually changing the parts of the video model to be used (step S18). The parts are, for example, the contour and each of the eyes, nose, and mouth. At this time, a selection of the hairstyle or clothing of the virtual model may also be input. Once the parts of the video model to be used have been changed as appropriate and the video model to be used for the virtual person has been finalized, the face data extracted from the appearance data is integrated into that video model (step S19). Next, the video model to be used, now integrated with the face data, is stored in the virtual person data storage unit 23 of the storage device 20 (step S20).

● Flow of generating the voice of the virtual person
The flow by which the virtual person generation device 30 generates the voice of the virtual person will be described with reference to FIG. 4. First, when an information source is registered from the user terminal 10 (step S21), the virtual person generation device 30 extracts voice data of the target person from that information source (step S22). The virtual person generation device 30 then generates the voice of the virtual person based on that voice data.

● Flow of determining the personality model of the virtual person
The flow by which the virtual person generation device 30 determines the personality model of the virtual person will be described with reference to FIG. 5. When an information source is registered from the user terminal 10 (step S31), the virtual person generation device 30 extracts text data such as blog or SNS posts from that information source (step S32). At this time, image data such as a handwritten diary is also extracted and converted into text data. In addition, sound source data is extracted and the voice of the target person is converted into text data. The extracted text data is stored in the virtual person data storage unit 23 according to a predetermined rule (step S33).

Next, the virtual person generation device 30 displays questions about the personality of the target person on the user terminal 10 (step S34). The content of the questions may be determined based on the registered information source. Alternatively, the user may be asked to select the scenario pattern to be registered, and questions corresponding to that scenario pattern may be displayed. The user terminal 10 accepts input of answers to the questions (step S35). A plurality of questions may be displayed at once, or steps S34 and S35 may be repeated.

Based on the answers to the personality questions, the virtual person generation device 30 refers to the personality models stored in the personality model DB 22 (step S36) and determines the personality model to be used (step S37). The determined personality model is then stored in the virtual person data storage unit 23 (step S38).

● Flow of conversing with the virtual person
The flow by which a user converses with the virtual person using the virtual person dialogue system will be described with reference to FIG. 6. When an ID and password are entered on the user terminal 10 (step S41), they are authenticated by the virtual person generation device 30 (step S42), and dialogue with the virtual person linked to the ID becomes possible. At this time, the dialogue may be staged as, for example, an incoming chat message, a phone call, or an e-mail from the virtual person. Next, the data of the virtual person to converse with is retrieved from the virtual person data storage unit 23 of the storage device 20 and made available for reference by the video generation device 40 (step S43). That is, an image of the virtual person is displayed on the user terminal 10. The virtual person may speak or move at the moment it is displayed.

When a question for the virtual person is input from the user terminal 10 (step S44), the video generation device 40 generates a video of the virtual person responding, based on the data of the virtual person.

Specifically, the video generation device 40 first generates a response text to the question based on the personality model of the virtual person (step S45). The video generation device 40 then generates a response voice that reads out the response text in the voice of the virtual person (step S46). The response voice may be stored sound source data of the target person or an artificially generated voice. Furthermore, the video generation device 40 generates a response video to be played while the response voice is played back (step S47). The generated response voice and response video are transmitted to the user terminal 10 as the response video clip (step S48). The response voice and the response video may be combined and transmitted to the user terminal 10 as a single data file, or may be transmitted to the user terminal 10 as separate data files. Next, the video of the virtual person is displayed on the user terminal 10 (step S49). That is, the virtual person responds to the question from the user, and a dialogue with the virtual person is established. Steps S44 to S49 may be repeated multiple times. This configuration enables natural dialogue with the virtual person.
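
Steps S45 to S48 can be sketched as a three-stage pipeline followed by packaging. The generator functions are passed in as parameters because the specification does not define them; `Reply`, `generate_reply`, and `package` are hypothetical names, and the byte concatenation merely stands in for real audio and video multiplexing.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Reply:
    text: str
    audio: bytes
    video: bytes


def generate_reply(question: str, personality: str,
                   make_text: Callable[[str, str], str],
                   make_audio: Callable[[str], bytes],
                   make_video: Callable[[bytes], bytes]) -> Reply:
    """Run the text, voice, and video stages in order."""
    text = make_text(question, personality)  # step S45: response text
    audio = make_audio(text)                 # step S46: response voice
    video = make_video(audio)                # step S47: response video
    return Reply(text, audio, video)


def package(reply: Reply, combined: bool = True) -> List[Tuple[str, bytes]]:
    """Prepare the payloads to send to the user terminal (step S48)."""
    if combined:
        # Placeholder only: concatenation is not real multiplexing.
        return [("av", reply.video + reply.audio)]
    return [("video", reply.video), ("audio", reply.audio)]
```

The `combined` flag mirrors the two delivery options in the text: one integrated data file, or separate files for the voice and the video.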

FIG. 6 has been described as a flow in which the video of the virtual person is generated in response to the input of a question on the user terminal 10 shown in step S44. However, the video of the virtual person may instead be generated and displayed on the user terminal 10 when a predetermined date or time arrives. The video may also be generated based on external information from the Internet or elsewhere, or based on a command from the administrator of the virtual person dialogue system 1. The video may be displayed on the user terminal 10 as soon as it is generated, or it may be generated in advance and displayed on the user terminal 10 when triggered by a question from the user, a date, a time, external information, a command, or the like.

Following step S49, when an evaluation of the video is input from the user terminal 10 (step S50), the virtual person generation device 30 corrects the personality model and stores it in the virtual person data storage unit 23 of the storage device 20 (step S51).

As described above, the virtual person dialogue system according to the present invention can generate a speaking video of a virtual person with a simple configuration.

1 Virtual person generation system
10 User terminal
20 Storage device
21 Video model DB
30 Virtual person generation device
31 Video processing unit
40 Video generation device
41 Video display processing unit

Claims

A virtual person dialogue system comprising:
a video model database that stores a plurality of types of video models of a person in motion;
a video model selection unit that selects, from the data in the video model database, a video model to be used for generating a virtual person;
a video processing unit that extracts face data of the virtual person from a registered information source;
a face insertion unit that integrates the face data into the video model to be used;
a voice processing unit that extracts voice from the information source and generates a voice of the virtual person;
a video display processing unit that generates a video of the virtual person speaking, based on the video model to be used into which the face data has been integrated and on the generated voice of the virtual person;
a personality model database that stores a plurality of personality models of persons;
a personality model selection unit that selects, from the data in the personality model database, a personality model to be used for generating the virtual person, based on a record created by the person for whom the virtual person is to be generated; and
a dialogue processing unit that generates a message to be spoken by the virtual person based on the personality model to be used.

The virtual person dialogue system according to claim 1,
wherein the personality model selection unit presents a question regarding the personality of the virtual person and selects, from the data in the personality model database, the personality model to be used for generating the virtual person based on an answer to the question.

The virtual person dialogue system according to claim 1 or 2, further comprising:
an input unit for inputting a question to the virtual person and an output unit for outputting a response of the virtual person,
wherein the dialogue processing unit generates a response to the question and outputs the response from the output unit.

The virtual person dialogue system according to any one of claims 1 to 3, further comprising:
a personality model correction unit that corrects the personality model to be used based on an evaluation of the message.

A video generation method for generating a video of a virtual person by a virtual person dialogue system that includes a video model database storing a plurality of types of video models of a person in motion and a personality model database storing a plurality of personality models of persons, the method including:
a video model selection step of selecting, from the data in the video model database, a video model to be used for generating the virtual person;
a video processing step of extracting face data of the virtual person to be generated from a registered information source;
a face insertion step of integrating the face data into the video model to be used;
a voice processing step of extracting voice from the information source and generating a voice of the virtual person;
a video display processing step of generating a video of the virtual person speaking, based on the video model to be used into which the face data has been integrated and on the generated voice of the virtual person;
a personality model selection step of selecting, from the data in the personality model database, a personality model to be used for generating the virtual person, based on a record created by the person for whom the virtual person is to be generated; and
a dialogue processing step of generating a message to be spoken by the virtual person based on the personality model to be used.

A video generation program, being a computer program for generating a video of a virtual person by a virtual person dialogue system that includes a video model database storing a plurality of types of video models of a person in motion and a personality model database storing a plurality of personality models of persons, the program causing a computer to execute:
a video model selection instruction for selecting, from the data in the video model database, a video model to be used for generating the virtual person;
a video processing instruction for extracting face data of the virtual person to be generated from a registered information source;
a face insertion instruction for integrating the face data into the video model to be used;
a voice processing instruction for extracting voice from the information source and generating a voice of the virtual person;
a video display processing instruction for generating a video of the virtual person speaking, based on the video model to be used into which the face data has been integrated and on the generated voice of the virtual person;
a personality model selection instruction for selecting, from the data in the personality model database, a personality model to be used for generating the virtual person, based on a record created by the person for whom the virtual person is to be generated; and
a dialogue processing instruction for generating a message to be spoken by the virtual person based on the personality model to be used.