JP2021043841A

JP2021043841A - Virtual character generation apparatus and program

Info

Publication number: JP2021043841A
Application number: JP2019166912A
Authority: JP
Inventors: Seiji Matsumoto (松本 征二); Hideyuki Masui (増井 秀行)
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2019-09-13
Filing date: 2019-09-13
Publication date: 2021-03-18
Anticipated expiration: 2039-09-13
Also published as: JP7415387B2

Abstract

To provide a virtual character generation apparatus, and a program, that can easily generate a virtual character that does not feel unnatural.

SOLUTION: A virtual character generation apparatus 1 includes: an input data receiving unit 11 which receives input data; an image impression specifying unit 15 or a voice impression specifying unit 19 which specifies the impression of the input data received by the input data receiving unit 11 by referring to a first storage unit storing first data composed of image data or voice data and an impression discriminator used for specifying the impression of the first data; a voice selection unit 16 or an image selection unit 20 which selects second data associated with the specified impression by referring to a second storage unit storing multiple pieces of second data that are of a kind different from the first data and are each associated with an impression; and a virtual character generation unit 21 which generates a virtual character composed of the input data received by the input data receiving unit 11 and the second data selected by the voice selection unit 16 or the image selection unit 20.

SELECTED DRAWING: Figure 1

Description

The present invention relates to a virtual character generation device and a program.

Conventionally, efforts have been made to install a display and a microphone at, for example, a company reception desk so that a displayed virtual character attends to users in place of a person. The character providing such service may be one generated by a device that generates a virtual character operating in a three-dimensional virtual space (for example, Patent Document 1).

Japanese Unexamined Patent Publication No. 2011-113135

The virtual character used for customer service as described above can be adapted to the preferences of the person being served. Doing so makes it easier for that person to form a favorable impression of the virtual character. However, when serving many people, a large number of virtual characters must be prepared to suit each person's preferences, which is laborious. Furthermore, when virtual characters are made to speak, using the same voice for all of them causes a mismatch between a character's appearance and its voice, which feels unnatural.

Therefore, an object of the present invention is to provide a virtual character generation device and a program capable of easily generating a virtual character that does not feel unnatural.

The present invention solves the above problem by the following means.
The first invention is a virtual character generation device that generates a virtual character composed of a combination of image data and voice data, the device comprising: input data receiving means for receiving input data; a first storage unit that stores first data consisting of image data or voice data, and an impression classifier used to specify the impression of the first data; a second storage unit that stores a plurality of pieces of second data that are of a type different from the first data and are each associated with an impression; impression specifying means that refers to the first storage unit and specifies the impression of the input data received by the input data receiving means; selection means that refers to the second storage unit and selects the second data associated with the impression specified by the impression specifying means; and virtual character generation means that generates a virtual character from the input data received by the input data receiving means and the second data selected by the selection means.
The second invention is the virtual character generation device of the first invention, in which the input data receiving means receives image data of one virtual character as the input data, and the second storage unit is a voice data storage unit in which voice data is associated in advance with the impression that suits the voice data. The device includes feature extraction means that analyzes the image data of the one virtual character received by the input data receiving means and extracts image feature data indicating features of that image data. The impression specifying means specifies the impression of the image data of the one virtual character based on the image feature data extracted by the feature extraction means, and the selection means refers to the voice data storage unit and selects the voice data based on the impression specified by the impression specifying means.
The third invention is the virtual character generation device of the second invention, in which the voice data storage unit associates the voice data with the impression for each gender and age group. The device includes estimation means that estimates a gender and an age group based on the image feature data extracted by the feature extraction means, and the selection means refers to the voice data storage unit and selects the voice data further based on the gender and age group estimated by the estimation means.
The fourth invention is the virtual character generation device of the second or third invention, in which the first storage unit is a character feature storage unit that associates the image feature data of each virtual character with the impression, and the impression specifying means specifies the impression based on the degree of matching between the image feature data extracted by the feature extraction means and the image feature data stored in the character feature storage unit.
The fifth invention is the virtual character generation device of the first invention, in which the input data receiving means receives one piece of voice data as the input data, and the second storage unit is an image data storage unit in which image data representing at least a part of each virtual character is associated in advance with the impression that suits the image data. The device includes feature amount calculation means that analyzes the one piece of voice data received by the input data receiving means and calculates its feature amount. The impression specifying means specifies the impression of the one piece of voice data based on the feature amount calculated by the feature amount calculation means, and the selection means refers to the image data storage unit and selects the image data based on the impression specified by the impression specifying means.
The sixth invention is the virtual character generation device of the fifth invention, in which the first storage unit is a voice feature storage unit that associates the feature amount of voice data with the impression, and the impression specifying means specifies the impression based on the degree of matching between the feature amount of the one piece of voice data calculated by the feature amount calculation means and the feature amounts of the voice data stored in the voice feature storage unit.
The seventh invention is the virtual character generation device of the fifth or sixth invention, in which the image data storage unit associates the image data with the impression for each part constituting each virtual character. The selection means refers to the image data storage unit and, for each part, selects the image data based on the impression specified by the impression specifying means. The virtual character generation means combines the image data selected for each part by the selection means to generate the image data of one virtual character, and generates the virtual character from the generated image data and the one piece of voice data.
The eighth invention is a program for causing a computer to function as the virtual character generation device of any one of the first to seventh inventions.

According to the present invention, it is possible to provide a virtual character generation device and a program capable of easily generating a virtual character that does not feel unnatural.

A functional block diagram of the virtual character generation device according to the present embodiment.
A diagram showing an example of the storage unit of the virtual character generation device according to the present embodiment.
A diagram showing an example of the impression map corresponding to the impressions used in the virtual character generation device according to the present embodiment.
A flowchart showing the voice DB construction process that constructs the voice DB of the virtual character generation device according to the present embodiment.
A flowchart showing the image DB construction process that constructs the image DB of the virtual character generation device according to the present embodiment.
A flowchart showing the virtual character generation process of the virtual character generation device according to the present embodiment.
A flowchart showing the voice selection process of the virtual character generation device according to the present embodiment.
A diagram for explaining the voice selection process of the virtual character generation device according to the present embodiment.
A flowchart showing the image selection process of the virtual character generation device according to the present embodiment.
A diagram for explaining the image selection process of the virtual character generation device according to the present embodiment.

Hereinafter, embodiments for carrying out the present invention will be described with reference to the drawings. Note that this is only an example, and the technical scope of the present invention is not limited to it.
(Embodiment)
<Virtual character generation device 1>
FIG. 1 is a functional block diagram of the virtual character generation device 1 according to the present embodiment.
FIG. 2 is a diagram showing an example of the storage unit 30 of the virtual character generation device 1 according to the present embodiment.
FIG. 3 is a diagram showing an example of the impression map 50 corresponding to the impressions used in the virtual character generation device 1 according to the present embodiment.

The virtual character generation device 1 shown in FIG. 1 is a device that, based on input data, generates a virtual character whose appearance and voice do not feel mismatched. More specifically, when the input data is voice data for the voice in which the virtual character speaks, the virtual character generation device 1 selects image data of a virtual character that matches the voice data. When the input data is image data of a virtual character, the virtual character generation device 1 selects voice data of a virtual character that matches the image data.
The virtual character generation device 1 is, for example, a server. The virtual character generation device 1 may instead be a personal computer (PC) or the like.

As shown in FIG. 1, the virtual character generation device 1 includes a control unit 10, a storage unit 30, an input unit 37, a display unit 38, and a communication unit 39.
The control unit 10 is a central processing unit (CPU) that controls the entire virtual character generation device 1. The control unit 10 reads and executes, as appropriate, the operating system (OS) and application programs stored in the storage unit 30, thereby cooperating with the hardware described above to execute various functions.

The control unit 10 includes an input data receiving unit 11 (input data receiving means), an image processing unit 12, a voice processing unit 17, and a virtual character generation unit 21 (virtual character generation means).
The input data receiving unit 11 is a control unit that receives input data from the input unit 37. The input data receiving unit 11 receives voice data or image data as the input data. Here, the voice data is, for example, data of a human voice. The image data is, for example, image data including a face image of a virtual character. The image data need not show the whole body; it may be an image of a virtual character showing the upper body.

The image processing unit 12 is a control unit that performs processing when the input data received by the input data receiving unit 11 is image data. The image data is an image of a virtual character; it may be an image that depicts a person in, for example, an animation style, or a realistic image such as a photograph of a person. The image of the virtual character may be either a 2D image or a 3D image.

The image processing unit 12 includes an image feature extraction unit 13 (feature extraction means), an attribute estimation unit 14 (estimation means), an image impression specifying unit 15 (impression specifying means), and a voice selection unit 16 (selection means).
The image feature extraction unit 13 analyzes the image data received as the input data and extracts image feature data indicating features of the image. The image feature extraction unit 13 may extract the image feature data from the image data by using a technique such as deep learning. The image feature data includes, for example, the virtual character's hair color and hairstyle, the proportions of the eyes, nose, and mouth, the complexion, and the clothing and its color.

The attribute estimation unit 14 estimates the gender and age group of the virtual character shown in the image data. The attribute estimation unit 14 may, for example, estimate the gender and age group based on the image feature data extracted by the image feature extraction unit 13. More specifically, the attribute estimation unit 14 may estimate the gender and age group based on, for example, the hairstyle, the hair color, and the proportions of the eyes, nose, and mouth. The attribute estimation unit 14 may also use a known image recognition method to, for example, recognize the face and then estimate the gender and age group from the features of each part constituting the face.

The image impression specifying unit 15 specifies the impression of the virtual character given by the image data. Here, the impression concerns how the virtual character shown in the image data looks. The impression can be obtained as a classification such as "cute", "gorgeous", or "traditional".
The image impression specifying unit 15 specifies the impression based on, for example, the image feature data extracted by the image feature extraction unit 13. More specifically, the image impression specifying unit 15 compares the image feature data extracted by the image feature extraction unit 13 with each piece of image feature data (impression classifier) in the image DB (database) 33 (described later) (first storage unit), and specifies the impression associated with the image feature data in the image DB 33 that has a high degree of matching as the impression of the input image data.
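The matching step described above can be illustrated with a minimal Python sketch. The text does not say how the degree of matching is computed, so the Jaccard overlap between tag sets used here is an assumption, as are the names `ImageRecord`, `match_degree`, and `identify_image_impression` and the sample records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRecord:
    """Hypothetical row of the image DB 33: ID, tag-style features, impression."""
    image_id: str
    features: frozenset
    impression: str


def match_degree(a: frozenset, b: frozenset) -> float:
    """Assumed matching measure: Jaccard overlap of two tag sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def identify_image_impression(input_features: frozenset, image_db: list) -> str:
    """Return the impression of the stored record that best matches the input."""
    best = max(image_db, key=lambda r: match_degree(input_features, r.features))
    return best.impression


# Illustrative data only; these tags and impressions are not from the source.
image_db = [
    ImageRecord("I001", frozenset({"hair:brown", "hairstyle:long", "clothes:dress"}), "cute"),
    ImageRecord("I002", frozenset({"hair:black", "hairstyle:short", "clothes:suit"}), "traditional"),
]
query = frozenset({"hair:black", "hairstyle:short", "clothes:jacket"})
result = identify_image_impression(query, image_db)
```

A numeric feature vector (for example, one produced by a deep learning model) would call for a different similarity measure; the tag-set form is used here only because FIG. 2(B) is described as storing tag information.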

The voice selection unit 16 selects voice data that suits the virtual character shown in the image data. Based on the impression specified by the image impression specifying unit 15, the voice selection unit 16 selects the voice data (second data) corresponding to that impression from the voice data stored in the voice DB 32 (described later) (second storage unit). In doing so, the voice selection unit 16 selects voice data that corresponds to the gender and age group estimated by the attribute estimation unit 14.
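As a rough illustration of this selection, the sketch below filters a small in-memory table by impression, gender, and age group. The `VoiceRecord` fields loosely mirror the items the voice DB 32 is said to hold, but the field names, the exact-match rule, and the sample rows are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceRecord:
    """Hypothetical row of the voice DB 32 (feature amounts omitted)."""
    voice_id: str
    gender: str
    age_group: str
    impression: str


def select_voices(voice_db, impression, gender, age_group):
    """Return IDs of voices whose impression, gender, and age group all match."""
    return [
        v.voice_id
        for v in voice_db
        if v.impression == impression
        and v.gender == gender
        and v.age_group == age_group
    ]


# Illustrative rows only.
voice_db = [
    VoiceRecord("V001", "female", "20s", "cute"),
    VoiceRecord("V002", "male", "20s", "boyish"),
    VoiceRecord("V003", "female", "40s", "cute"),
]
candidates = select_voices(voice_db, impression="cute", gender="female", age_group="20s")
```

If no row matches all three conditions the list is empty; how the device handles that case is not stated in this passage.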

The voice processing unit 17 is a control unit that performs processing when the input data received by the input data receiving unit 11 is voice data. The voice data is data used as the voice of the virtual character. The voice data may be a recording of a human voice that includes utterance content, or a recording of a human voice without utterance content, such as "ah". The voice data may also be, for example, data generated by generative adversarial networks (GANs), or data generated artificially by a computer or the like, for example by speech synthesis.

The voice processing unit 17 includes a voice feature amount calculation unit 18 (feature amount calculation means), a voice impression specifying unit 19 (impression specifying means), and an image selection unit 20 (selection means).
The voice feature amount calculation unit 18 calculates the feature amount of the voice data received as the input data. As the feature amount of the voice data, it calculates, for example, feature amounts relating to the pitch, loudness, and timbre of the voice. The voice feature amount calculation unit 18 calculates the feature amounts for pitch, loudness, and timbre as the fundamental frequency, the power, and the mel-frequency cepstral coefficients (MFCC), respectively. The MFCC is a voice feature amount expressed as numerical values in n dimensions (n is, for example, 20).
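To make two of these feature amounts concrete, here is a minimal pure-Python sketch that estimates power and fundamental frequency from raw samples. The text does not specify the estimation method; the autocorrelation peak search used for the fundamental frequency is a common technique swapped in for illustration, and the function names, search range, and test tone are assumptions. MFCC extraction is omitted because it normally relies on a signal-processing library.

```python
import math


def power(samples):
    """Mean squared amplitude, used here as the loudness feature."""
    return sum(s * s for s in samples) / len(samples)


def fundamental_frequency(samples, sample_rate, f_min=50.0, f_max=500.0):
    """Estimate pitch from the lag that maximizes the raw autocorrelation.

    The 50-500 Hz search range is an assumed range for speech.
    """
    min_lag = int(sample_rate / f_max)
    max_lag = int(sample_rate / f_min)

    def autocorr(lag):
        return sum(samples[n] * samples[n + lag] for n in range(len(samples) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    return sample_rate / best_lag


# Synthetic 200 Hz sine tone, 50 ms at 8 kHz, standing in for recorded voice data.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 200 * n / SAMPLE_RATE) for n in range(400)]
f0 = fundamental_frequency(tone, SAMPLE_RATE)
p = power(tone)
```

Real speech is noisier than a pure tone, so a practical implementation would typically window the signal and normalize the autocorrelation before picking a peak.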

The voice impression specifying unit 19 specifies the impression of the virtual character given by the voice data. Here, the impression concerns how the voice represented by the voice data sounds and, like the impression of image data, can be obtained as a classification such as "cute", "gorgeous", or "traditional".
The voice impression specifying unit 19 specifies the impression based on, for example, the voice feature amount calculated by the voice feature amount calculation unit 18. More specifically, the voice impression specifying unit 19 compares the feature amount calculated by the voice feature amount calculation unit 18 with each feature amount (impression classifier) in the voice DB 32 (first storage unit) to calculate the similarity between feature amounts, classifies the entries whose similarity is at or above a certain threshold, and, treating them as having a high degree of matching, specifies the classified impression as the impression of the input voice data.
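The threshold-based comparison can be sketched as follows. The similarity measure and the way the above-threshold entries are turned into a single impression are not given in the text, so cosine similarity, the 0.9 threshold, and the majority vote used here are all assumptions, as are the function names and the three-dimensional toy vectors.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify_voice_impression(features, voice_db, threshold=0.9):
    """Keep stored entries at or above the threshold and return the most
    common impression among them, or None if nothing qualifies."""
    hits = [
        impression
        for vector, impression in voice_db
        if cosine_similarity(features, vector) >= threshold
    ]
    if not hits:
        return None
    return Counter(hits).most_common(1)[0][0]


# Toy (feature vector, impression) pairs; real entries would hold
# fundamental frequency, power, and MFCC values.
voice_db = [
    ([1.0, 0.0, 0.2], "cute"),
    ([0.9, 0.1, 0.3], "cute"),
    ([0.0, 1.0, 0.1], "gorgeous"),
]
impression = identify_voice_impression([1.0, 0.05, 0.25], voice_db)
```

Because pitch, power, and MFCC values live on very different scales, a real comparison would likely normalize each feature before computing similarity.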

The image selection unit 20 selects image data that suits the virtual character indicated by the voice data. Based on the impression specified by the voice impression specifying unit 19, the image selection unit 20 selects the image data (second data) corresponding to that impression from the image data stored in the image DB 33 (second storage unit). When the image DB 33 stores image data for individual parts, the image selection unit 20 selects the image data corresponding to the impression for each part.
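The per-part selection can be pictured with the short sketch below, which picks one image per part type for a given impression. The tuple layout, the "first match wins" rule, and the sample rows are assumptions for illustration; the passage does not say how a single image is chosen when several match.

```python
def select_parts(image_db, impression):
    """Pick the first stored image of each part type that has the given impression.

    image_db is a list of (image_id, part_type, impression) tuples, a
    simplified stand-in for the image DB 33.
    """
    selected = {}
    for image_id, part_type, part_impression in image_db:
        if part_impression == impression and part_type not in selected:
            selected[part_type] = image_id
    return selected


# Illustrative rows only.
image_db = [
    ("I010", "face", "cute"),
    ("I011", "hairstyle", "cute"),
    ("I012", "hairstyle", "gorgeous"),
    ("I013", "clothes", "cute"),
    ("I014", "face", "cute"),
]
parts = select_parts(image_db, "cute")
```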

When the input data received by the input data receiving unit 11 is image data, the virtual character generation unit 21 generates a virtual character composed of the voice data selected by the image processing unit 12 and the received image data.
When the input data received by the input data receiving unit 11 is voice data, the virtual character generation unit 21 generates a virtual character composed of the image data selected by the voice processing unit 17 and the received voice data.
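The two cases above amount to a dispatch on the kind of input data. A minimal sketch, assuming the character is represented by identifiers only and that the selection steps are supplied as callables (`select_voice` and `select_images` are hypothetical stand-ins for the image processing unit 12 and the voice processing unit 17):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualCharacter:
    """A generated character: one voice paired with one or more images."""
    voice_id: str
    image_ids: tuple


def generate_character(kind, data_id, select_voice, select_images):
    """Pair the received data with complementary data of the other type."""
    if kind == "image":
        # Image input: keep the image, select a matching voice.
        return VirtualCharacter(voice_id=select_voice(data_id), image_ids=(data_id,))
    if kind == "voice":
        # Voice input: keep the voice, select matching image(s).
        return VirtualCharacter(voice_id=data_id, image_ids=tuple(select_images(data_id)))
    raise ValueError(f"unsupported input kind: {kind!r}")


# Stub selectors that return fixed IDs, for demonstration only.
from_image = generate_character("image", "I002", lambda _id: "V001", lambda _id: [])
from_voice = generate_character("voice", "V002", lambda _id: "", lambda _id: ["I010", "I011"])
```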

The storage unit 30 is a storage area, such as a hard disk or a semiconductor memory element, for storing the programs, data, and the like that the control unit 10 needs in order to execute various processes.
The storage unit 30 includes a program storage unit 31, a voice DB 32 (voice data storage unit, voice feature storage unit), an image DB 33 (image data storage unit, character feature storage unit), a generated character storage unit 34, and an utterance content storage unit 35.
The program storage unit 31 is a storage area for storing various programs. The program storage unit 31 stores a virtual character generation program 31a. The virtual character generation program 31a is a program for performing the various functions executed by the control unit 10 of the virtual character generation device 1.

The voice DB 32 is a database that stores voice data together with information on the features and impressions of the voices.
FIG. 2(A) shows an example of the items and data held in the voice DB 32.
The voice DB 32 shown in FIG. 2(A) stores, with a voice ID (identification) as the key, data corresponding to the items gender, age group, voice data, feature amount, and impression data.
The voice ID is identification information for uniquely identifying the voice data.
The gender and age group are those judged from the voice data. Gender is classified as male or female, and age group as teens, 20s, and so on, but this is only an example.
The voice data is the data of the voice itself.
The feature amount is the feature amount of the voice data calculated by analyzing the voice data. As described above, the feature amount is expressed numerically as the fundamental frequency, the power, and the MFCC. For example, a 20-dimensional MFCC is expressed as a sequence of 20 numerical values:
MFCC: -14.77, -8.78, -6.70, ..., 2.80, 22, 29
The impression data is data for identifying the impression judged from the voice data. In this example, the impression data is coordinate data on the impression map 50.

Here, the impression map 50 will be described with reference to FIG. 3.
The impression map 50 shown in FIG. 3 is used for both image data and voice data, and serves to express the impression of each.
The impression map 50 represents impressions in three dimensions. The X-axis direction of the impression map 50 represents a feminine-to-masculine tendency, the Y-axis direction represents a young-to-mature tendency, and the Z-axis direction represents an animation-to-realistic tendency. The X-axis and Y-axis together are associated with impression classifications such as "cute" and "boyish". The impressions indicated by the X-axis and Y-axis are only examples.
The impression data for voice ID V001 shown in FIG. 2(A) is the classifier indicated by position 51, and the impression data for voice ID V002 is the classifier indicated by position 52.
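One way to picture the impression map is as a three-dimensional coordinate whose X and Y components determine a label. In the sketch below, the sign conventions of the axes and the assignment of labels to quadrants are assumptions made for illustration; the text only names the axes and gives "cute" and "boyish" as example classifications.

```python
from typing import NamedTuple


class ImpressionPoint(NamedTuple):
    """A position on the impression map 50 (sign conventions are assumed)."""
    x: float  # assumed: negative = feminine, positive = masculine
    y: float  # assumed: negative = young, positive = mature
    z: float  # assumed: negative = animation-like, positive = realistic


# Hypothetical quadrant labels keyed by (x >= 0, y >= 0).
QUADRANT_LABELS = {
    (False, False): "cute",
    (True, False): "boyish",
    (False, True): "gorgeous",
    (True, True): "traditional",
}


def impression_label(point: ImpressionPoint) -> str:
    """Map the X/Y position to an impression label; Z does not affect the label."""
    return QUADRANT_LABELS[(point.x >= 0, point.y >= 0)]


label_a = impression_label(ImpressionPoint(x=-0.6, y=-0.4, z=-0.8))
label_b = impression_label(ImpressionPoint(x=0.5, y=-0.2, z=0.3))
```

Storing coordinates rather than labels means voice and image records can be compared on the same map, which is consistent with the map being shared by both data types.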

図１の画像ＤＢ３３は、画像データと、画像の特徴及び印象に関する情報とを記憶したデータベースである。
図２（Ｂ）に、画像ＤＢ３３が有する項目及びデータの例を示す。
図２（Ｂ）に示す画像ＤＢ３３は、画像ＩＤをキーとして、パーツ種類、画像データ、画像特徴データ、印象データの各項目に対応したデータを記憶する。
画像ＩＤは、画像データを一意に識別するための識別情報である。
パーツ種類は、画像データが示すパーツの種類である。パーツの種類としては、全身、上半身、顔、髪型、アクセサリ、服等がある。なお、全身や上半身といった、仮想キャラクタの根幹を示すデータについては、パーツ種類をブランクにしてもよい。
画像データは、画像そのもののデータである。
画像特徴データは、画像データを分析して付加されたタグ情報である。画像特徴データは、例えば、髪の毛の色や髪型等である。図２（Ｂ）の例では、色を黒、茶等の文字で示しているが、例えば、ＲＧＢ値等の数値によって表してもよい。
印象データは、画像データから判断された印象を識別するためのデータである。印象データは、この例では、上述した印象マップ５０（図３）の座標データである。例えば、図２（Ｂ）に示す画像ＩＤがＩ００２の印象データは、位置５３（図３参照）によって示される識別器である。 The image DB 33 of FIG. 1 is a database that stores image data and information on image features and impressions.
FIG. 2B shows an example of items and data included in the image DB 33.
The image DB 33 shown in FIG. 2B stores data corresponding to each item of part type, image data, image feature data, and impression data using the image ID as a key.
The image ID is identification information for uniquely identifying the image data.
The part type is the type of the part indicated by the image data. Types of parts include whole body, upper body, face, hairstyle, accessories, clothes, etc. The part type may be blank for the data indicating the basis of the virtual character such as the whole body and the upper body.
The image data is the data of the image itself.
The image feature data is tag information added by analyzing the image data. The image feature data is, for example, a hair color, a hairstyle, or the like. In the example of FIG. 2B, the color is indicated by characters such as black and brown, but for example, it may be represented by a numerical value such as an RGB value.
Impression data is data for identifying an impression determined from image data. In this example, the impression data is the coordinate data of the impression map 50 (FIG. 3) described above. For example, the impression data with the image ID I002 shown in FIG. 2B is the classifier indicated by the position 53 (see FIG. 3).
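The items of the image DB 33 listed above could be modeled, for example, as one record per image ID. This is a minimal sketch under assumptions: the field names, the in-memory dictionary, and the `register_image` helper are hypothetical and are not described in the source.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class ImageRecord:
    """One row of the image DB, keyed by image ID (field names are assumed)."""
    image_id: str
    part_type: Optional[str]  # None models a blank part type (base image)
    image_data: bytes
    features: Dict[str, str] = field(default_factory=dict)  # tag information
    impression: Tuple[float, float, float] = (0.0, 0.0, 0.0)  # map coordinates


image_db: Dict[str, ImageRecord] = {}


def register_image(record: ImageRecord) -> None:
    """Store a record, rejecting a duplicate image ID so the key stays unique."""
    if record.image_id in image_db:
        raise ValueError(f"duplicate image ID: {record.image_id}")
    image_db[record.image_id] = record


register_image(ImageRecord(
    image_id="I002",
    part_type="hairstyle",
    image_data=b"",  # placeholder for the actual image bytes
    features={"hair_color": "brown", "hairstyle": "long"},
    impression=(-0.5, -0.3, -0.7),  # hypothetical coordinates
))
```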

The generated character storage unit 34 of FIG. 1 is a storage area for storing the virtual character generated by the virtual character generation device 1. The generated character storage unit 34 stores, for example, one voice ID and one or more image IDs in association with each other. This makes it clear that the generated virtual character is composed of image data and voice data.
The utterance content storage unit 35 is a storage area for storing the utterance content of the virtual character. The utterance content storage unit 35 stores the utterance content as, for example, a text.

The input unit 37 is, for example, an input device such as a keyboard or a mouse.
The display unit 38 is, for example, a display device such as an LCD (Liquid Crystal Display).
When the virtual character generation device 1 is a server, the input unit 37 and the display unit 38 may be various devices externally attached to the server.
The communication unit 39 is an interface for performing communication with an external device via the communication network N.
Here, a computer means an information processing device provided with a control unit, a storage device, and the like. The virtual character generation device 1 is an information processing device provided with a control unit, a storage unit, and the like, and is therefore included in the concept of a computer.

<Preparation process of virtual character generation device 1>
Next, the processing in the virtual character generation device 1 will be described.
First, a process of preparing the information necessary for generating a virtual character in the virtual character generation device 1 will be described.
FIG. 4 is a flowchart showing a voice DB construction process for constructing the voice DB 32 of the virtual character generation device 1 according to the present embodiment.
FIG. 5 is a flowchart showing an image DB construction process for constructing the image DB 33 of the virtual character generation device 1 according to the present embodiment.

First, before the virtual character generation device 1 can be operated, the voice DB 32 and the image DB 33 must be prepared in advance. The user who is setting up the virtual character generation device 1 therefore prepares in advance a plurality of pieces of voice data to be used as voices of virtual characters, and a plurality of pieces of image data to be used as images of virtual characters. The image data includes images that form the core of a virtual character, such as the whole body or the upper body, as well as a plurality of images for each part used to decorate the virtual character.

<Construction of voice DB 32>
When constructing the voice DB 32, the control unit 10 of the virtual character generation device 1 causes the display unit 38 to display an administrator screen (not shown) by the user's operation. Then, the user displays a screen (not shown) for constructing the voice DB 32 from the administrator screen, and inputs data according to the screen.

In step S (hereinafter simply "S") 11 of FIG. 4, the user inputs one of the prepared pieces of voice data via the input unit 37, and the control unit 10 of the virtual character generation device 1 registers the input voice data in the voice DB 32. At that time, the control unit 10 determines a unique voice ID, for example at random, and associates it with the voice data.
In S12, the user inputs the gender and age of the voice based on the registered voice data via the input unit 37, and the control unit 10 registers the gender and age in the voice DB 32 in association with the voice data. Here, the gender and age of the voice data to be registered are determined by the user.

In S13, the user plots the impression conveyed by the voice data on, for example, the impression map 50 (see FIG. 3) via the input unit 37. For example, the control unit 10 displays the impression map 50 on the screen, and the user operates the input unit 37 to plot a position on the impression map 50 that corresponds to the impression. The impression of the voice data to be registered is thus decided by the user.

In S14, the control unit 10 registers the three-dimensional coordinates of the plotted position in the voice DB 32 as impression data, in association with the voice data. In this example, the impression data is stored as a position on the impression map 50, but a text label such as "cute" shown at the plotted position may be registered instead.
In S15, the control unit 10 (voice feature amount calculation unit 18) calculates the feature amount of the voice data.
In S16, the control unit 10 registers the calculated feature amount of the voice data in the voice DB 32 in association with the voice data.

In S17, the control unit 10 determines whether to end the process depending on whether the registration process has been performed for all the voice data prepared in advance by the user. When all the voice data has been registered and the user closes the screen used for constructing the voice DB 32, the control unit 10 receives a signal corresponding to the closing operation and determines that the process is to end. When the registration process has been performed for all the prepared voice data (S17: YES), the control unit 10 ends this process. Otherwise (S17: NO), the control unit 10 returns the process to S11 and performs the same processing on the unprocessed voice data.
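The registration loop of S11 to S17 can be sketched as follows. This is not the patented implementation: the feature amount used here (mean absolute amplitude and zero-crossing rate) is a stand-in chosen for the sketch because the section does not say which features the voice feature amount calculation unit 18 computes, and the function names and the dictionary layout are hypothetical.

```python
import uuid
from typing import Dict, Iterable, Sequence, Tuple

Impression = Tuple[float, float, float]


def voice_features(samples: Sequence[float]) -> Tuple[float, float]:
    """Stand-in feature amount: (mean absolute amplitude, zero-crossing rate)."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    mean_abs = sum(abs(s) for s in samples) / len(samples)
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return mean_abs, crossings / (len(samples) - 1)


def build_voice_db(
    entries: Iterable[Tuple[Sequence[float], str, str, Impression]]
) -> Dict[str, dict]:
    """Register every prepared voice; the loop itself plays the role of S17."""
    db: Dict[str, dict] = {}
    for samples, gender, age, impression in entries:
        voice_id = "V-" + uuid.uuid4().hex          # S11: unique, random ID
        db[voice_id] = {
            "samples": list(samples),               # S11: the voice data
            "gender": gender,                       # S12: entered by the user
            "age": age,                             # S12: entered by the user
            "impression": impression,               # S13-S14: plotted position
            "features": voice_features(samples),    # S15-S16: feature amount
        }
    return db


# Hypothetical sample data, far shorter than real audio.
voice_db = build_voice_db([
    ([1.0, -1.0, 1.0, -1.0], "female", "20s", (-0.6, -0.4, -0.8)),
    ([0.5, 0.5, -0.5, -0.5], "male", "30s", (0.7, 0.5, 0.3)),
])
```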

<Construction of image DB 33>
When constructing the image DB 33, the control unit 10 of the virtual character generation device 1 causes the display unit 38 to display an administrator screen (not shown) by the user's operation. Then, the user displays a screen (not shown) for constructing the image DB 33 from the administrator screen, and inputs data according to the screen.

In S21 of FIG. 5, the user inputs one of the prepared pieces of image data, together with the part type that the image data represents, via the input unit 37, and the control unit 10 of the virtual character generation device 1 registers the image data and the part type in the image DB 33. At that time, the control unit 10 determines a unique image ID, for example at random, and associates it with the image data.
Here, the image data may be a 3D image or a 2D image. Further, the image data includes a whole image showing the whole body of the virtual character and the like, and a partial image which is an image of a part of the virtual character such as a hairstyle, an accessory, and clothes.

In S22, the control unit 10 (image feature extraction unit 13) analyzes the image data registered in the image DB 33 and extracts image feature data indicating the features of the image. The control unit 10 then registers the extracted image feature data in the image DB 33.
In S23, the user plots the impression of the image data on, for example, the impression map 50 (see FIG. 3) via the input unit 37. Here, the impression of the image data to be registered is determined by the user.
In S24, the control unit 10 registers the three-dimensional coordinates, which are the plotted positions, as impression data in the image DB 33 in association with the image data.

In S25, the control unit 10 determines whether to end the process depending on whether the registration process has been performed for all the image data prepared in advance by the user. When all the image data has been registered and the user closes the screen used for constructing the image DB 33, the control unit 10 receives a signal corresponding to the closing operation and determines that the process is to end. When the registration process has been performed for all the prepared image data (S25: YES), the control unit 10 ends this process. Otherwise (S25: NO), the control unit 10 returns the process to S21 and performs the same processing on the unprocessed image data.

The voice DB 32 can be constructed by the voice DB construction process described above, and the image DB 33 can be constructed by the image DB construction process described above. Therefore, when the virtual character generation device 1 receives input data that is voice data or image data, the control unit 10 can perform processing using the constructed voice DB 32 and image DB 33.

<Virtual character generation process>
Next, the generation of a virtual character using the input data will be described.
FIG. 6 is a flowchart showing a virtual character generation process of the virtual character generation device 1 according to the present embodiment.
FIG. 7 is a flowchart showing a voice selection process of the virtual character generation device 1 according to the present embodiment.
FIG. 8 is a diagram for explaining the voice selection process of the virtual character generation device 1 according to the present embodiment.
FIG. 9 is a flowchart showing an image selection process of the virtual character generation device 1 according to the present embodiment.
FIG. 10 is a diagram for explaining an image selection process of the virtual character generation device 1 according to the present embodiment.

For example, when the user inputs input data via the input unit 37, the control unit 10 (input data reception unit 11) of the virtual character generation device 1 receives the input data in S31 of FIG. 6.
In S32, the control unit 10 determines whether or not the received input data is image data. In the case of image data (S32: YES), the control unit 10 shifts the processing to S33. On the other hand, when it is not image data, that is, when it is audio data (S32: NO), the control unit 10 shifts the processing to S34.
In S33, the control unit 10 (image processing unit 12) performs the voice selection process.
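The branch at S32 can be sketched as a small dispatch function. The source does not say how the control unit 10 decides whether the input is image data, so the use of the file name and Python's standard `mimetypes` module here is purely an assumption, as are the function name and the returned labels.

```python
import mimetypes


def route_input(filename: str) -> str:
    """Return the selection process that handles the given input file (S32)."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is not None and mime.startswith("image/"):
        return "voice_selection"   # S32: YES, go to S33
    if mime is not None and mime.startswith("audio/"):
        return "image_selection"   # S32: NO, go to S34
    raise ValueError(f"unsupported input data: {filename!r}")


image_route = route_input("avatar.png")
voice_route = route_input("greeting.wav")
```

Image input leads to the voice selection process and voice input leads to the image selection process, since each process selects the data of the other kind.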

Here, the voice selection process will be described with reference to FIG. 7.
In S41 of FIG. 7, the control unit 10 (image feature extraction unit 13) analyzes the image data and extracts the image feature data indicating the features related to the image.
For example, in the case of the image data 60 shown in FIG. 8, the image feature data extracted from the image data 60 by the control unit 10 is shown in Table 61. The image feature data shown in Table 61 represent features of each portion (part) constituting the image data 60.

In S42 of FIG. 7, the control unit 10 (attribute estimation unit 14) estimates the gender and age of the virtual character indicated by the image data, based on the image feature data extracted in S41. As described above, the control unit 10 may, for example, recognize the face using a known image recognition method and then estimate the gender and age from the features of each part of the face.
In S43, the control unit 10 (image impression specifying unit 15) refers to the image DB 33 and specifies the impression of the virtual character indicated by the image data. For example, the control unit 10 collates the image feature data extracted in S41 with each piece of image feature data registered in the image DB 33, and specifies, as the impression of this image data, the impression indicated by the position of the impression data associated with the image feature data in the image DB 33 that has a high degree of matching.
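One way to read the collation in S43 is as a best-match lookup over tag-style feature data. The sketch below assumes that image feature data is a set of key/value tags and that the degree of matching is the fraction of tags that agree; neither the scoring rule nor the function names come from the source.

```python
from typing import Dict, List, Tuple

Impression = Tuple[float, float, float]
Features = Dict[str, str]


def match_score(a: Features, b: Features) -> float:
    """Assumed degree of matching: share of tag keys whose values agree."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    agree = sum(1 for k in keys if a.get(k) is not None and a.get(k) == b.get(k))
    return agree / len(keys)


def identify_image_impression(
    features: Features, image_db: List[Tuple[Features, Impression]]
) -> Impression:
    """Return the impression of the registered image whose tags match best."""
    if not image_db:
        raise ValueError("image DB is empty")
    _, impression = max(image_db, key=lambda rec: match_score(features, rec[0]))
    return impression


# Hypothetical registered entries: (image feature data, impression data).
registered = [
    ({"hair_color": "black", "hairstyle": "short"}, (0.5, 0.2, 0.0)),
    ({"hair_color": "brown", "hairstyle": "long"}, (-0.5, -0.3, -0.6)),
]
query = {"hair_color": "brown", "hairstyle": "long", "eyes": "large"}
impression = identify_image_impression(query, registered)
```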

In S44, the control unit 10 registers the image data received in S31 of FIG. 6 in the image DB 33. At that time, the control unit 10 may register the part type as whole body, upper body, or the like, based on the image feature data. The control unit 10 also determines and assigns a unique image ID.
Through the process of S44, various image data is added to the image DB 33 as appropriate and accumulated as a database. The virtual character generation device 1 can therefore register more image data in the image DB 33. The process of S44 is optional: if the input data should not be registered, the control unit 10 simply skips S44.

In S45, the control unit 10 (voice selection unit 16) selects one piece of voice data from the plurality of pieces of voice data stored in the voice DB 32, based on the impression specified in S43. For both image data and voice data, the impression is specified by the position of the impression data on the same impression map 50 (FIG. 3). Therefore, by selecting the voice data whose impression data lies at a position close to the coordinate position indicated by the impression data of the image data, the control unit 10 can select voice data that suits the image data. After that, the control unit 10 shifts the process to S35 of FIG. 6.
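The selection in S45 amounts to a nearest-neighbor search on the shared impression map. The following sketch assumes Euclidean distance as the measure of closeness and optionally narrows the candidates by the gender and age estimated in S42; the fallback to all voices when no candidate matches, and all names used, are assumptions for illustration.

```python
import math
from typing import List, Optional, Tuple

Impression = Tuple[float, float, float]


def select_voice(
    target: Impression,
    voice_db: List[dict],
    gender: Optional[str] = None,
    age: Optional[str] = None,
) -> str:
    """Return the ID of the voice whose impression lies closest to the target."""
    candidates = [
        v for v in voice_db
        if (gender is None or v["gender"] == gender)
        and (age is None or v["age"] == age)
    ]
    if not candidates:
        candidates = list(voice_db)  # assumed fallback: ignore the attributes
    if not candidates:
        raise ValueError("voice DB is empty")
    best = min(candidates, key=lambda v: math.dist(target, v["impression"]))
    return best["voice_id"]


# Hypothetical entries; the coordinates are not taken from FIG. 2A.
voices = [
    {"voice_id": "V001", "gender": "female", "age": "20s",
     "impression": (-0.6, -0.4, -0.8)},
    {"voice_id": "V002", "gender": "male", "age": "30s",
     "impression": (0.7, 0.5, 0.3)},
]
chosen = select_voice((-0.5, -0.3, -0.7), voices)
```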

On the other hand, in S34 of FIG. 6, the control unit 10 (voice processing unit 17) performs the image selection process.
Here, the image selection process will be described with reference to FIG. 9.
In S51 of FIG. 9, the control unit 10 (voice feature amount calculation unit 18) calculates the feature amount of the voice data.
In S52, the control unit 10 (voice impression specifying unit 19) refers to the voice DB 32 and specifies the impression of the virtual character indicated by the voice data. For example, the control unit 10 collates the feature amount of the voice data calculated in S51 with the feature amounts registered in the voice DB 32, and specifies, as the impression of this voice data, the impression indicated by the position of the impression data associated with the feature amount in the voice DB 32 that has a high degree of matching.
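For numeric feature amounts, the collation in S52 can be sketched as picking the registered voice with the closest feature vector and reusing its impression data. The two-component feature vector (a pitch-like value and a rate-like value) and the use of unscaled Euclidean distance are assumptions; in practice the components would likely need normalizing so that one does not dominate.

```python
import math
from typing import List, Sequence, Tuple

Impression = Tuple[float, float, float]


def identify_voice_impression(
    features: Sequence[float],
    voice_db: List[Tuple[Sequence[float], Impression]],
) -> Impression:
    """Return the impression of the registered voice with the closest features."""
    if not voice_db:
        raise ValueError("voice DB is empty")
    _, impression = min(voice_db, key=lambda rec: math.dist(features, rec[0]))
    return impression


# Hypothetical entries: (feature amount, impression data).
registered_voices = [
    ((120.0, 0.2), (0.7, 0.5, 0.3)),
    ((240.0, 0.6), (-0.6, -0.4, -0.8)),
]
voice_impression = identify_voice_impression((230.0, 0.5), registered_voices)
```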

In S53, the control unit 10 registers the voice data received in S31 of FIG. 6 in the voice DB 32. At that time, the control unit 10 may register the gender and age of the voice data in the voice DB 32 whose feature amount was found to be similar in S52 as the gender and age of this voice data. The control unit 10 also determines and assigns a unique voice ID.
Through the process of S53, various voice data is added to the voice DB 32 as appropriate and accumulated as a database. More voice data can therefore be registered in the voice DB 32. The process of S53 is optional: if the input data should not be registered, the control unit 10 simply skips S53.

In S54, the control unit 10 (image selection unit 20) selects one piece of image data from the plurality of pieces of image data stored in the image DB 33, based on the impression specified in S52. When the image DB 33 stores image data for each part, the control unit 10 may select one piece of image data for each part from the plurality of pieces of image data.

An image whose part type is the main (core) image of the virtual character, such as the whole body or the upper body, is mandatory. For such an image, the control unit 10 may therefore select the image data whose impression data lies at the position closest to the coordinate position indicated by the impression data of the voice data.
In addition, hairstyles and accessories are not essential parts. Therefore, the control unit 10 may select the image data only when the impression data of the image data is located in a predetermined range from the coordinate position indicated by the impression data of the audio data. FIG. 10A shows a set of hairstyle image data 71, and FIG. 10B shows a set of accessory image data 72.

Further, the requirement for images whose part type is clothes depends on the part type of the main image. When the part type of the main image is the whole body, a clothes image is mandatory, whereas when it is the upper body, for example, only an image of upper-body clothes is mandatory. The control unit 10 may therefore select image data in accordance with the main image. FIG. 10C shows a set 73 of image data of clothes.
After that, the control unit 10 shifts the process to S35 of FIG. 6.
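The part-by-part rules of S54 can be sketched as follows: the main image is always taken as the nearest candidate, optional parts are taken only when they fall within a predetermined range, and clothes are treated as mandatory. The part type labels, the threshold value of 0.5, and the simplification that clothes are always required (without distinguishing whole-body from upper-body clothes) are assumptions of this sketch.

```python
import math
from typing import Dict, List, Tuple

Impression = Tuple[float, float, float]
BASE_TYPES = ("whole_body", "upper_body")  # assumed labels for main images


def select_images(
    target: Impression, image_db: List[dict], max_distance: float = 0.5
) -> Dict[str, str]:
    """Pick one image ID per part type according to the S54 rules."""
    by_type: Dict[str, List[dict]] = {}
    for rec in image_db:
        by_type.setdefault(rec["part_type"], []).append(rec)

    def gap(rec: dict) -> float:
        return math.dist(target, rec["impression"])

    bases = [r for t in BASE_TYPES for r in by_type.get(t, [])]
    if not bases:
        raise ValueError("no main image registered")
    base = min(bases, key=gap)              # mandatory: always the nearest
    chosen = {base["part_type"]: base["image_id"]}

    for part_type, records in by_type.items():
        if part_type in BASE_TYPES:
            continue
        nearest = min(records, key=gap)
        mandatory = part_type == "clothes"  # simplification, see lead-in
        if mandatory or gap(nearest) <= max_distance:
            chosen[part_type] = nearest["image_id"]
    return chosen


# Hypothetical image DB entries.
images = [
    {"image_id": "I001", "part_type": "whole_body", "impression": (0.0, 0.0, 0.0)},
    {"image_id": "I002", "part_type": "upper_body", "impression": (1.0, 1.0, 1.0)},
    {"image_id": "I003", "part_type": "hairstyle", "impression": (0.1, 0.0, 0.0)},
    {"image_id": "I004", "part_type": "accessory", "impression": (0.9, 0.9, 0.9)},
    {"image_id": "I005", "part_type": "clothes", "impression": (0.8, 0.8, 0.8)},
]
selection = select_images((0.0, 0.0, 0.0), images)
```

In this example the accessory is left out because its impression lies outside the assumed range, while the clothes image is kept even though it is equally far away.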

In S35 of FIG. 6, the control unit 10 (virtual character generation unit 21) generates a virtual character using the input data and the data selected in either S33 or S34.
In S36, the control unit 10 registers the generated virtual character in the generated character storage unit 34. The control unit 10 may store the voice ID of one piece of voice data and the image IDs of one or more pieces of image data in association with each other in the generated character storage unit 34. After that, the control unit 10 ends this process.
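The association stored in S36 can be sketched as a record that pairs one voice ID with one or more image IDs. The character ID used as the key, the class name, and the in-memory dictionary are hypothetical; the source only states that the IDs are stored in association with each other.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class GeneratedCharacter:
    """One generated virtual character: one voice ID, one or more image IDs."""
    voice_id: str
    image_ids: Tuple[str, ...]

    def __post_init__(self) -> None:
        if not self.image_ids:
            raise ValueError("at least one image ID is required")


generated_characters: Dict[str, GeneratedCharacter] = {}


def register_character(
    character_id: str, voice_id: str, image_ids: Iterable[str]
) -> GeneratedCharacter:
    """Store the association under a (hypothetical) character ID."""
    character = GeneratedCharacter(voice_id, tuple(image_ids))
    generated_characters[character_id] = character
    return character


character = register_character("C001", "V001", ["I001", "I003"])
```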

Through this process, a virtual character is generated using image data that suits the voice data, or using voice data that suits the image data, so the generated virtual character is unlikely to cause a sense of mismatch between its appearance and its voice.
When the virtual character is to be displayed on, for example, a terminal connected via the communication unit 39, the control unit 10 of the virtual character generation device 1 may output to the terminal utterance data obtained by speaking the utterance content data stored in the utterance content storage unit 35 with the voice data. The control unit 10 of the virtual character generation device 1 may also add motion to the image data when outputting it.

As described above, the virtual character generation device 1 of the present embodiment has the following effects.
(1) Voice data that suits the image data of a character is selected to generate a virtual character. Likewise, image data that suits the voice data of a voice is selected to generate a virtual character. The appearance and voice of the virtual character can therefore be made highly compatible, which makes a sense of mismatch unlikely.

(2) Image feature data is extracted from the image data of a character to specify the impression. In the voice DB 32, an impression is associated with each piece of voice data, and voice data is selected using the impression specified from the image data. The image data and the voice data constituting the virtual character can therefore be given similar impressions. As a result, the appearance and voice of the virtual character are unlikely to cause a sense of mismatch.
(3) Image feature data is extracted from the image data related to the character, and the gender and age of the character are estimated. Further, in the voice DB 32, a gender and an age are associated with the voice data, and the voice data is selected using the gender and the age estimated from the image data. Therefore, the image data constituting the virtual character and the voice data can indicate the same gender and age. As a result, it is possible to make the appearance and voice of the virtual character less likely to feel uncomfortable.
(4) The impression of the image data of a character is specified based on the degree of matching between the image feature data of that image data and the image feature data stored in the image DB 33. The impression of the image data can therefore be specified using the image data registered in the image DB 33, so that it is consistent with the accumulated data.

(5) A feature amount is calculated from the voice data of a character to specify the impression. In the image DB 33, an impression is associated with each piece of image data, and image data is selected using the impression specified from the voice data. The image data and the voice data constituting the virtual character can therefore be given similar impressions. As a result, the appearance and voice of the virtual character are unlikely to cause a sense of mismatch.
(6) The impression of the voice data of a character is specified based on the degree of matching between the feature amount of that voice data and the feature amounts stored in the voice DB 32. The impression of the voice data can therefore be specified using the voice data registered in the voice DB 32, so that it is consistent with the accumulated data.
(7) In the image DB 33, image data is associated with a part type, and image data is selected for each part using the impression specified from the audio data. Then, the image data is generated by combining the image data for each selected part. Therefore, each part of the image data constituting the virtual character can have a unified impression.

Although an embodiment of the present invention has been described above, the present invention is not limited to that embodiment. The effects described in the embodiment are merely a list of the most preferable effects arising from the present invention, and the effects of the present invention are not limited to those described in the embodiment. The embodiment described above and the modifications described below may be combined as appropriate, but a detailed description of such combinations is omitted.

(Modifications)
(1) In the present embodiment, an example of receiving input data from an input unit has been described, but the present embodiment is not limited to this. For example, the control unit of the virtual character generation device may accept the input data by transmitting the input data from the terminal connected via the communication unit to the virtual character generation device. Further, the data before processing may be stored in the storage unit, and the data before processing stored in the storage unit may be accepted as input data.

(2) In the present embodiment, impressions were described using a three-dimensional impression map, but this is only an example. The map may have further dimensions. The items assigned to each axis are also examples, and other items may be used.

(3) In the present embodiment, no particular designation is made for the image data of the virtual character generated when voice data is received, but the voice data may be received together with a designation. A designation here means, for example, specifying the range of the image to be displayed as the virtual character, such as the whole body or the upper body. In that case, the image data selected based on the impression of the voice data can be made to follow the designation. Although the main image of the virtual character was described using the whole body and the upper body as examples, it may be a face image only.

1 Virtual character generation device
10 Control unit
11 Input data reception unit
13 Image feature extraction unit
14 Attribute estimation unit
15 Image impression specifying unit
16 Voice selection unit
18 Voice feature amount calculation unit
19 Voice impression specifying unit
20 Image selection unit
21 Virtual character generation unit
30 Storage unit
31a Virtual character generation program
32 Voice DB
33 Image DB
37 Input unit
38 Display unit
39 Communication unit

Claims

A virtual character generation device that generates a virtual character composed of a combination of image data and voice data, comprising:
input data receiving means for receiving input data;
a first storage unit that stores first data consisting of image data or voice data, and an impression discriminator used to specify the impression of the first data;
a second storage unit that stores a plurality of pieces of second data, which are of a different type from the first data and are each associated with an impression;
impression specifying means for specifying, with reference to the first storage unit, the impression of the input data received by the input data receiving means;
selection means for selecting, with reference to the second storage unit, the second data associated with the impression specified by the impression specifying means; and
virtual character generating means for generating a virtual character from the input data received by the input data receiving means and the second data selected by the selection means.

The virtual character generation device according to claim 1, wherein
the input data receiving means receives image data of one virtual character as the input data,
the second storage unit is a voice data storage unit in which voice data and the impression matching that voice data are associated in advance,
the device further comprises feature extraction means for analyzing the image data of the one virtual character received by the input data receiving means and extracting image feature data indicating features of that image data,
the impression specifying means specifies the impression of the image data of the one virtual character based on the image feature data extracted by the feature extraction means, and
the selection means refers to the voice data storage unit and selects the voice data based on the impression specified by the impression specifying means.

The virtual character generation device according to claim 2, wherein
in the voice data storage unit, the voice data and the impression are associated with each other for each gender and age,
the device further comprises estimation means for estimating gender and age based on the image feature data extracted by the feature extraction means, and
the selection means refers to the voice data storage unit and selects the voice data further based on the gender and the age estimated by the estimation means.

The virtual character generation device according to claim 2 or 3, wherein
the first storage unit is a character feature storage unit that associates the image feature data of each virtual character with the impression, and
the impression specifying means specifies the impression based on the degree of coincidence between the image feature data extracted by the feature extraction means and the image feature data stored in the character feature storage unit.
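
For illustration only, specifying an impression from the degree of coincidence between extracted and stored image feature data can be sketched as a best-match lookup. The claim does not state how the degree of coincidence is computed; cosine similarity is an assumption here, and the function names, feature vectors, and impression labels are hypothetical.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Degree of coincidence between two feature vectors (cosine similarity, assumed)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def specify_impression(extracted, character_features):
    """Return the impression of the stored feature vector that best matches `extracted`.

    character_features: list of (feature_vector, impression) pairs, standing in
    for the character feature storage unit.
    """
    _, impression = max(
        character_features,
        key=lambda entry: cosine_similarity(extracted, entry[0]),
    )
    return impression


stored = [
    ((1.0, 0.0, 0.2), "gentle"),
    ((0.1, 1.0, 0.9), "energetic"),
]
print(specify_impression((0.9, 0.1, 0.3), stored))  # prints gentle
```

Any other similarity or distance measure could be substituted in `cosine_similarity` without changing the lookup.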

The virtual character generation device according to claim 1, wherein
the input data receiving means receives one voice data as the input data,
the second storage unit is an image data storage unit in which image data representing at least a part of each virtual character and the impression matching that image data are associated in advance,
the device further comprises feature amount calculating means for analyzing the one voice data received by the input data receiving means and calculating the feature amount of the one voice data,
the impression specifying means specifies the impression of the one voice data based on the feature amount of the voice data calculated by the feature amount calculating means, and
the selection means refers to the image data storage unit and selects the image data based on the impression specified by the impression specifying means.

The virtual character generation device according to claim 5, wherein
the first storage unit is a voice feature storage unit that associates the feature amount of the voice data with the impression, and
the impression specifying means specifies the impression based on the degree of coincidence between the feature amount of the one voice data calculated by the feature amount calculating means and the feature amount of the voice data stored in the voice feature storage unit.

The virtual character generation device according to claim 5 or 6, wherein
in the image data storage unit, the image data and the impression are associated for each part constituting each virtual character,
the selection means refers to the image data storage unit and selects, for each part, the image data based on the impression specified by the impression specifying means, and
the virtual character generating means generates image data of one virtual character by combining the image data selected for each part by the selection means, and generates a virtual character from the generated image data and the one voice data.
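
The per-part selection and combination described above can be sketched as follows. This is a minimal sketch under assumptions: the part names ("hair", "eyes", "mouth"), the nested-dictionary layout standing in for the image data storage unit, the identifiers, and both function names are hypothetical, and "combining" is reduced to collecting the selected image identifiers rather than compositing pixels.

```python
def select_parts(image_db, impression):
    """Select one image per part for the given impression.

    image_db: dict mapping part name -> dict mapping impression -> image id.
    """
    selected = {}
    for part, by_impression in image_db.items():
        if impression not in by_impression:
            raise KeyError(f"no {part!r} image registered for impression {impression!r}")
        selected[part] = by_impression[impression]
    return selected


def generate_virtual_character(voice_id, impression, image_db):
    """Pair the received voice data with the images selected for each part."""
    return {"voice": voice_id, "image_parts": select_parts(image_db, impression)}


image_db = {
    "hair": {"gentle": "hair_03", "energetic": "hair_11"},
    "eyes": {"gentle": "eyes_07", "energetic": "eyes_02"},
    "mouth": {"gentle": "mouth_01", "energetic": "mouth_09"},
}

character = generate_virtual_character("voice_in_001", "gentle", image_db)
print(character["image_parts"])
```

Raising an error when a part has no image for the specified impression is a design choice of this sketch; a fallback to the nearest impression would be an equally plausible alternative.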

A program for causing a computer to function as the virtual character generation device according to any one of claims 1 to 7.