JP2021043723A

JP2021043723A - Information processing apparatus, information processing method, and program

Info

Publication number: JP2021043723A
Application number: JP2019165579A
Authority: JP
Inventors: Kazunori Okutomi (奥冨 一則); Takeshi Yamazaki (山崎 健史)
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2019-09-11
Filing date: 2019-09-11
Publication date: 2021-03-18
Anticipated expiration: 2039-09-11
Also published as: JP7418106B2

Abstract

PROBLEM TO BE SOLVED: To generate a response model for reproducing the responses of a specific person at a predetermined age. SOLUTION: An information processing apparatus is characterized by having: acquiring means for acquiring information on a response of a specific person together with age information of the specific person at the time of the response; setting means for setting a predetermined age; extracting means for extracting, from the acquired response information, information on a response corresponding to the predetermined age, based on the predetermined age and the age information; and generating means for generating a learning model for the response of the specific person from the extracted response information. SELECTED DRAWING: Figure 2

Description

The present invention relates to a technique for realizing a response imitating a specific person.

In recent years, as the processing performance of personal computers and smartphones has improved, technology has developed in which chatbots and avatars generated on personal computers and smartphones respond automatically to users' questions and inquiries. As a technique related to chatbots, Patent Document 1 discloses dividing a long response message into messages of appropriate length in a chatbot system that responds automatically to question sentences from the terminals of a plurality of users. In communicating with chatbots and avatars, users have become able not only to obtain necessary information quickly but also to enjoy the communication itself. Therefore, to realize more human-like communication with artificial intelligence (AI), computers are also trained on conversations from a variety of cases. A machine-learned avatar can respond automatically as if the real specific person were responding.

Patent Document 1: Japanese Unexamined Patent Publication No. 2009-3533

However, communication with a chatbot system or an avatar in the prior art may fail to give the user sufficient satisfaction. For example, when communicating with an avatar that imitates a celebrity such as an actor, the user may want to communicate not with an avatar imitating the celebrity as they are now, but with an avatar reproducing the celebrity at the peak of their past career. Likewise, when a deceased person is reproduced as an avatar, the user may want to communicate with the deceased at a desired age rather than as they were immediately before death.

The present invention has been made in view of the above problem, and its object is to generate a learning model for realizing a response imitating a specific person at a predetermined age.

The present invention is characterized by comprising: acquisition means for acquiring information on responses of a specific person together with age information of the specific person at the time of each response; setting means for setting a predetermined age; extraction means for extracting, from the response information, the response information corresponding to the predetermined age, based on the predetermined age and the age information; and generation means for generating a learning model of the responses of the specific person from the extracted response information.

According to the present invention, a response imitating a specific person at a predetermined age can be realized.

FIG. 1: Device and system configuration diagram in the first embodiment
FIG. 2: Functional configuration diagram in the first embodiment
FIG. 3: Diagram showing Q&A data and age information
FIG. 4: Conceptual diagram of the response model in the first embodiment
FIG. 5: Diagram showing the concept of a neural network
FIG. 6: Flowchart of response model generation in the first embodiment
FIG. 7: Flowchart of communication using the response model in the first embodiment
FIG. 8: Conceptual diagram of the response model in the second embodiment
FIG. 9: Display example of avatars using age-specific response models
FIG. 10: Conceptual diagram of a model that selects a response model by age

(First Embodiment)
The first embodiment will be described below with reference to the drawings. FIG. 1 is a diagram showing the device and system configuration in the present embodiment. The information processing device 1100 is, for example, a general-purpose computer, and generates a response model (learning model) for realizing a response imitating a specific person.

The information processing device 1100 comprises a CPU 1101, a DRAM 1102, a secondary storage device 1103, a network interface (IF) 1104, and an I/O controller 1105. The CPU 1101 is a central processing unit and performs various kinds of processing in accordance with the instructions of a computer program. The DRAM 1102 is a volatile memory and temporarily stores computer programs and various data. The secondary storage device 1103 is a non-volatile storage device such as a hard disk drive (HDD) or a solid state drive (SSD), and stores computer programs and various data. The network 1400 is a wired or wireless network, through which various data are exchanged with external devices. Computer programs and various data are acquired from the secondary storage device 1103 or the network 1400 and temporarily stored in the DRAM 1102. The CPU 1101 performs information processing based on the computer programs and data stored in the DRAM 1102. The I/O controller 1105 controls input and output between the information processing device 1100 and the externally connected input device 1106, monitor 1107, and speaker 1108.

The input device 1106 is, for example, a keyboard, a mouse, or a microphone, and accepts user input. Using the input device 1106, the user can ask questions of the virtual specific person reproduced on the information processing device 1100 by text or voice input. The monitor 1107 is, for example, an ordinary liquid crystal display, and displays the processing results of the information processing device 1100. Computer graphics of the specific person reproduced on the information processing device 1100 may be displayed on the monitor 1107, and the responses of the specific person may also be displayed there as text. The speaker 1108 may output the responses of the reproduced specific person as audio.

While viewing the computer graphics of the specific person displayed on the monitor 1107, the user can ask questions with the input device 1106 and listen to the responses output as audio from the speaker 1108. Communication with the specific person reproduced on the information processing device 1100 may also be carried out without the input device 1106, the monitor 1107, and the speaker 1108. For example, similar communication is possible using the client PC 1200 or the smartphone 1300 connected to the information processing device 1100 via the network 1400.

FIG. 2 is a functional configuration diagram of the present embodiment. Each function shown in FIG. 2 is realized by the information processing device 1100 performing processing based on the computer programs and data stored in the DRAM 1102. The storage means 2001 is realized by the DRAM 1102 and the secondary storage device 1103, and holds Q&A data and age information. The Q&A data is a collection of responses to questions put to the specific person. It may be in text form or in audio form. It may also include responses to silence, which represent spontaneous remarks by the specific person. The Q&A data may be collected from conversation logs of mobile phones or smartphones, or from chat logs recorded on a personal computer. The age information indicates how old the specific person was when giving each response included in the Q&A data, and is stored in association with each of those responses.

FIG. 3 shows the Q&A data and age information in the present embodiment. As shown in the figure, the Q&A data 3001 contains answers to many questions, and each Q&A pair is associated with age information 3002 indicating the age at which it was given. The storage means 2001 may store the Q&A data 3001 and age information 3002 not only of one specific person but of a plurality of specific persons. The age information 3002 may also be set as an age range, such as 25 to 30 years old.

The acquisition means 2002 acquires the Q&A data and age information stored in the storage means 2001. The setting means 2003 sets a predetermined age based on a user instruction or the like. The user may choose any desired age as the predetermined age. For example, to communicate with a virtual person reproducing the specific person at age 25, the age is set to 25. An age range, such as 25 to 30, may also be set. The extraction means 2004 extracts the corresponding response information from the Q&A data based on the predetermined age set by the setting means 2003. For example, when 25 is set by the setting means 2003, the Q&A data 3001 shown in FIG. 3 contains response information for ages 25, 50, and 51, so the extraction means 2004 refers to the age information 3002 and extracts only the response information for age 25. The generation means 2005 generates a response model (learning model) of the specific person using the response information extracted by the extraction means 2004.

In its simplest form, the response model may be the Q&A data extracted by the extraction means 2004 as is. In that case, the model can respond only to questions included in the Q&A data stored in the storage means 2001. The determination means 2006 determines whether to carry out communication using the generated response model and whether to stop the communication. When there has been no question from the user for a predetermined time, or when the user gives an instruction to stop, the determination means 2006 stops the communication using the response model generated by learning. The input means 2007 accepts a question from the user and inputs the question data to the response model. When a plurality of response models have been generated, the desired response model is selected based on input from the input means 2007. The output means 2008 outputs the response of the response model to the user's question.
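The extraction by age described above might be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the record layout `QARecord`, the function `extract_by_age`, the rule that a record matches when its age range overlaps the set range, and all sample questions and responses are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class QARecord:
    """One question/response pair plus the speaker's age at the time (hypothetical layout)."""
    question: str
    response: str
    age_range: Tuple[int, int]  # inclusive (min_age, max_age); a single age is (a, a)


def extract_by_age(records: List[QARecord], target: Tuple[int, int]) -> List[QARecord]:
    """Keep the records whose age range overlaps the target range (assumed matching rule)."""
    lo, hi = target
    return [r for r in records if r.age_range[0] <= hi and lo <= r.age_range[1]]


# Invented sample data with responses given at ages 25, 50, and 51.
records = [
    QARecord("What do you do on holidays?", "I go surfing.", (25, 25)),
    QARecord("What do you do on holidays?", "I tend my garden.", (50, 50)),
    QARecord("What is your favorite food?", "Grilled fish.", (51, 51)),
]

# Setting the predetermined age to 25 keeps only the response given at age 25.
at_25 = extract_by_age(records, (25, 25))
print([r.response for r in at_25])
```

Storing every age as a range lets a single age and a range such as 25 to 30 be handled by the same comparison.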

FIG. 4 is a conceptual diagram of an example of the response model in the present embodiment. As described above, when the Q&A data extracted by the extraction means 2004 is itself used as the response model, the model can respond only to questions included in the Q&A data stored in the storage means 2001. A semantic analysis model 4102 is therefore introduced so that the model can also respond to other questions. The semantic analysis model 4102 is created in advance by machine learning, such as a neural network or a support vector machine. It takes question data in text or audio form as input and outputs a feature amount 4103 that identifies the question. By performing machine learning with a large set of question data as teacher data, in which similar questions are given the same label, a semantic analysis model 4102 that outputs the same feature amount for similar questions can be created. When the Q&A data is relatively small, the feature amount 4103 may be a scalar value; when the Q&A data is large, the feature amount 4103 is preferably a vector. The concept of a neural network, which is one form of machine learning, is described next.

Since the principle of neural networks is well known, it is described only briefly. FIG. 5 illustrates a neural network. Although FIG. 5 shows a single intermediate layer, it is desirable to configure the intermediate layer with two or more layers. In the neural network shown in FIG. 5, the input layer has Mi nodes (n11, n12, ..., n1Mi), the intermediate layer has Mh nodes (n21, n22, ..., n2Mh), and the output layer (final layer) has Mo nodes (n31, n32, ..., n3Mo). The nodes of each layer are connected to all nodes of the adjacent layer, forming a three-layer hierarchical neural network that passes information between layers.

When an image is input to the input layer, the input layer is provided with as many nodes as there are pixels, so that pixels and nodes correspond one to one. The output layer likewise has as many nodes as there are pixels to be output. That is, in the present embodiment, a block image of 16 × 16 pixels is input and pixel values for 16 × 16 pixels are output, so the input layer and the output layer each have 256 nodes. Data is passed from left to right in FIG. 5, that is, in the order of input layer, intermediate layer, and output layer. Each node of the input layer is connected to all nodes of the intermediate layer, and each connection between nodes has a weight. The value transmitted from one node to another through a connection is amplified or attenuated by the weight of that connection. The set of weight coefficients and bias values defined for these connections constitutes the parameters of the learning model. The activation function is not particularly limited; a logistic sigmoid function, a Rectified Linear Unit (ReLU) function, or the like may be used. Any of the various neural network learning methods that have been proposed may be applied. For example, the difference between the output obtained from the output layer when student data is input to the input layer and the teacher data associated in advance with that student data is calculated, and the weight coefficients and bias values are adjusted so as to minimize that difference.
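A three-layer network of this kind, and the adjustment of weights and biases to reduce the difference from the teacher data, might be sketched as follows. The sketch makes several assumptions not found in the text: the layer sizes (2, 3, and 1 nodes), the flat parameter list `theta`, the logical-OR sample data, and the learning rate are all chosen here for illustration, and the gradient is estimated by finite differences purely to keep the code short, where a practical implementation would normally use backpropagation.

```python
import math
import random

M_I, M_H, M_O = 2, 3, 1  # node counts: input, intermediate, output layer (assumed sizes)


def sigmoid(x):
    """Logistic sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(theta, x):
    """Pass x through input -> intermediate -> output; theta is a flat parameter list."""
    k = 0
    hidden = []
    for _ in range(M_H):
        weights, bias = theta[k:k + M_I], theta[k + M_I]
        hidden.append(sigmoid(sum(w * v for w, v in zip(weights, x)) + bias))
        k += M_I + 1
    output = []
    for _ in range(M_O):
        weights, bias = theta[k:k + M_H], theta[k + M_H]
        output.append(sigmoid(sum(w * h for w, h in zip(weights, hidden)) + bias))
        k += M_H + 1
    return output


def loss(theta, samples):
    """Sum of squared differences between the network output and the teacher data."""
    return sum((o - t) ** 2 for x, target in samples
               for o, t in zip(forward(theta, x), target))


def train_step(theta, samples, lr=0.1, eps=1e-6):
    """One gradient-descent update; the gradient is estimated by finite differences."""
    base = loss(theta, samples)
    grad = []
    for k in range(len(theta)):
        bumped = theta[:k] + [theta[k] + eps] + theta[k + 1:]
        grad.append((loss(bumped, samples) - base) / eps)
    return [p - lr * g for p, g in zip(theta, grad)]


# Student data (inputs) paired with teacher data (targets): logical OR, invented for the demo.
samples = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]), ([1.0, 0.0], [1.0]), ([1.0, 1.0], [1.0])]
rng = random.Random(0)
theta = [rng.uniform(-1.0, 1.0) for _ in range(M_H * (M_I + 1) + M_O * (M_H + 1))]

loss_before = loss(theta, samples)
for _ in range(500):
    theta = train_step(theta, samples)
loss_after = loss(theta, samples)
print(f"loss before: {loss_before:.4f}, after: {loss_after:.4f}")
```

The weights and biases of both connections are kept in one flat list so that a single loop can perturb and update every parameter.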

As shown in FIG. 4, the response model 4100 can be created by connecting in series the semantic analysis model 4102 created as described above and the Q&A model 4101 created from the response information extracted by the extraction means 2004. The Q&A model 4101 is a set of Q&A data, but so that it can be connected to the semantic analysis model 4102, its input questions are converted in advance into feature amounts of the semantic analysis model 4102. The response model 4100 operates as follows. When a question is input, the semantic analysis model 4102 outputs the feature amount 4103 that identifies the question. The Q&A model 4101 takes that feature amount 4103 as input and outputs the corresponding response. This processing realizes a response model 4100 that can respond to questions other than those stored in advance. In the above embodiment, the semantic analysis model 4102 is generated by machine learning separately from the Q&A model 4101, but the response model 4100 may instead be created with the Q&A model 4101 included at the machine learning stage.
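The series connection of the semantic analysis model and the Q&A model might be sketched as follows. The semantic analysis model in the text is obtained by machine learning; here a hand-written keyword table stands in for it so that the data flow can be run, and that table, the scalar feature values, the function names, and the sample Q&A pairs are all assumptions introduced for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for the semantic analysis model 4102: keyword -> scalar feature amount.
# Similar questions ("holiday", "weekend", "day off") share the same feature amount.
KEYWORD_FEATURES: Dict[str, int] = {
    "holiday": 0, "weekend": 0, "day off": 0,
    "food": 1, "eat": 1,
}


def semantic_analysis(question: str) -> Optional[int]:
    """Return the feature amount identifying the question, or None if it is not recognized."""
    text = question.lower()
    for keyword, feature in KEYWORD_FEATURES.items():
        if keyword in text:
            return feature
    return None


def build_qa_model(extracted: List[Tuple[str, str]],
                   analyze: Callable[[str], Optional[int]]) -> Dict[int, str]:
    """Re-key the extracted Q&A pairs by feature amount instead of by question text."""
    model: Dict[int, str] = {}
    for question, response in extracted:
        feature = analyze(question)
        if feature is not None:
            model[feature] = response
    return model


def respond(question: str, qa_model: Dict[int, str]) -> Optional[str]:
    """Response model 4100: semantic analysis followed by the Q&A model, in series."""
    return qa_model.get(semantic_analysis(question))


# Invented Q&A pairs standing in for the response information extracted for one age.
extracted = [
    ("What do you do on holidays?", "I go surfing."),
    ("What is your favorite food?", "Grilled fish."),
]
qa_model = build_qa_model(extracted, semantic_analysis)

# A question that was never stored still reaches the stored response via its feature amount.
print(respond("How do you spend your weekend?", qa_model))
```

An unrecognized question yields `None` in this sketch; how such a case should be answered is not specified for the first embodiment.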

FIG. 6 is a flowchart showing the generation of the response model in the present embodiment. Each step is described below. Each of the following steps is performed by at least one of the information processing device 1100, the client PC 1200, and the smartphone 1300.

In S601, the acquisition means 2002 acquires the Q&A data of the specific person and the corresponding age information from the storage means 2001. The specific person may be set in advance or may be set based on a user instruction.

In S602, the setting means 2003 sets the age desired by the user. The age set here need not be a single age and may be an age range, such as 25 to 30. The user may also specify a Western calendar year or a Japanese era year, from which the age is calculated based on the date of birth of the specific person.
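Calculating the age from a specified year and the date of birth might be sketched as follows. The function names are assumptions introduced here, as is the choice to return the range of ages the person has during the year, since a person is generally two different ages within one calendar year. The era table covers only three eras and ignores the exact day on which each era began.

```python
from datetime import date
from typing import Tuple

# First Western calendar year of each era (only three eras are listed in this sketch).
ERA_FIRST_YEAR = {"Showa": 1926, "Heisei": 1989, "Reiwa": 2019}


def era_to_western_year(era: str, era_year: int) -> int:
    """Convert a Japanese era year such as Heisei 7 to a Western calendar year."""
    return ERA_FIRST_YEAR[era] + era_year - 1


def age_on(birth_date: date, on: date) -> int:
    """Age in completed years on a given day."""
    had_birthday = (on.month, on.day) >= (birth_date.month, birth_date.day)
    return on.year - birth_date.year - (0 if had_birthday else 1)


def age_range_in_year(birth_date: date, year: int) -> Tuple[int, int]:
    """Inclusive range of ages the person has during the given Western calendar year."""
    return age_on(birth_date, date(year, 1, 1)), age_on(birth_date, date(year, 12, 31))


# Hypothetical date of birth; Heisei 7 corresponds to 1995.
birth = date(1970, 6, 15)
print(age_range_in_year(birth, era_to_western_year("Heisei", 7)))
```

The resulting range can be used directly as the age range set in S602.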

In S603, the extraction means 2004 extracts the corresponding response information from the Q&A data of the specific person based on the predetermined age set in S602. For example, when 25 is set in S602, only the response information for age 25 is extracted from the Q&A data of the specific person by referring to the age information acquired in S601.

In S604, a response model is generated using the response information extracted in S603. The details of response model generation have been described above and are omitted here.

The above processing generates a response model capable of reproducing the responses of the specific person at the age desired by the user.

Next, a method of communicating by means of the generated response model, using at least one of the information processing device 1100, the client PC 1200, and the smartphone 1300, is described.

FIG. 7 is a flowchart of communication using the response model in the first embodiment. Each step is described below.

In S701, the input means 2007 selects, based on an input instruction from the user, the response model of the specific person at the age with which the user wishes to communicate. The selection here assumes that response models for a plurality of specific persons or a plurality of ages have already been generated. When only the one desired response model has been generated, the selection in this step may be omitted.

In S702, the determination means 2006 determines whether to stop communication with the response model selected in S701. The determination is based on, for example, an input instruction from the user or the elapse of a preset time. When it is determined that communication is to be stopped, the processing of this flow ends. When it is determined that communication is not to be stopped, that is, that communication is to continue, the processing proceeds to S703.

In S703, the determination means 2006 determines whether there is a question from the user. Questions from the user are input as text or audio data using the input device 1106, the client PC 1200, or the smartphone 1300. When spontaneous remarks from the specific person are desired, a spontaneous mode is set in advance so that processing proceeds as if a question had been input even when the user has asked none. When a question has been input, it is determined that there is a question, and the processing proceeds to S704. When it is determined that there is no question, the processing returns to S702 to determine whether to stop communication.

In S704, the output means 2008 outputs a response to the question input in S703. The question input in S703 is fed to the response model, and the response is obtained from the response model. The response is output from the speaker 1108 or the monitor 1107, or from the client PC 1200 or the smartphone 1300 via the network 1400. After the output, the processing returns to S703 to accept the user's next question.
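The loop of S702 to S704 might be sketched as follows, replaying a scripted list of timed inputs so that the flow can be run without real devices. The `STOP` marker, the event format, and the function `run_session` are assumptions introduced here. As a simplification, the stop check of S702 is run before every event, and the spontaneous mode is omitted.

```python
from typing import Callable, Iterable, List, Optional, Tuple

STOP = "<stop>"  # hypothetical marker for the user's instruction to stop communicating


def run_session(
    response_model: Callable[[str], str],
    events: Iterable[Tuple[float, Optional[str]]],
    timeout: float,
) -> List[str]:
    """Replay (time, input) events through the S702 -> S703 -> S704 loop.

    An input of None means that no question was present at that time.
    """
    outputs: List[str] = []
    last_question_time = 0.0
    for now, user_input in events:
        # S702: stop on the user's instruction or after `timeout` without a question.
        if user_input == STOP or now - last_question_time >= timeout:
            break
        # S703: with no question, go back to the stop check.
        if user_input is None:
            continue
        # S704: feed the question to the response model and output its response.
        last_question_time = now
        outputs.append(response_model(user_input))
    return outputs


# A trivial stand-in for the selected response model (S701 is assumed to be done already).
def echo_model(question: str) -> str:
    return f"echo: {question}"


events = [(1.0, "Hello"), (2.0, None), (3.0, "How are you?"), (20.0, "Still there?")]
session_outputs = run_session(echo_model, events, timeout=10.0)
print(session_outputs)
```

In this run the last question arrives 17 seconds after the previous one, so the 10-second timeout ends the session before it is answered.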

As described above, according to the present embodiment, a response model of the specific person at the age desired by the user is created, and communication using the generated response model becomes possible.

(Second Embodiment)
In the first embodiment, the response model 4100 was generated by connecting the semantic analysis model 4102 and the Q&A model 4101 in series. However, to respond appropriately to a user's varied questions in a manner characteristic of the target specific person, it may be better to create the response model by a different method.

In general, whether a response seems characteristic of a specific person is thought to be judged from the content of the response and from the tone, such as habitual sentence endings or the presence of a dialect. In the present embodiment, therefore, a response model incorporating a tone model is generated. Everything other than the method of generating the response model is the same as in the first embodiment, so only the method of generating the response model in the present embodiment is described below.

FIG. 8 is a conceptual diagram of the response model 8100 in the present embodiment. In FIG. 8, the semantic analysis model 8102 is the same as the semantic analysis model 4102 in the first embodiment, and the feature amount 8103 is the same as the feature amount 4103 in the first embodiment. The response model 8100 in the present embodiment is realized by connecting the semantic analysis model 8102, the retrained standard response model 8101, and the tone model 8104 in series.

The retrained standard response model 8101 is generated by retraining a standard response model using the Q&A data 3001 and the age information 3002. The standard response model is a model capable of giving general, standard responses, and can be generated by using standard conversation logs or the like as teacher data. Responses unique to a specific person cannot be learned from standard conversation logs, but an enormous amount of Q&A data can be collected from them. A response model capable of responding to a wide variety of questions can therefore be generated. In the present embodiment, this standard response model is retrained using the Q&A data 3001 and the age information 3002. That is, retraining is performed using the Q&A data 3001 corresponding to the age set by the user, so that the model gives the response included in the Q&A data 3001 to a question included in the Q&A data 3001 and gives the standard response to a question not included in the Q&A data 3001.

The tone model 8104 is obtained by taking a model that outputs the input text or audio unchanged and training it on tone using the Q&A data 3001 corresponding to the age set by the user. That is, it is trained to produce output whose content is the same as the input text or audio but which carries the characteristic tone of the specific person, such as habitual sentence endings or dialect.

In the response model 8100 of the present embodiment, as in the first embodiment, when a question is input, the semantic analysis model 8102 outputs the feature amount 8103 that identifies the question. The retrained standard response model 8101 takes the feature amount 8103 output by the semantic analysis model 8102 as input and outputs the corresponding response. This response is characteristic of the specific person for a question that was included in the Q&A data 3001, and is the standard response for a question that was not. The output of the retrained standard response model 8101 is input to the tone model 8104 and is output as a response in a tone characteristic of the specific person. Consequently, even if the output of the retrained standard response model 8101 is a standard response, the tone model 8104 delivers it in a tone characteristic of the specific person, so the user still perceives the specific person's character. When the output of the retrained standard response model 8101 is a response to a question that was included in the Q&A data 3001, both the content and the tone of the response can be made characteristic of the specific person.
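The data flow from the retrained standard response model to the tone model might be sketched as follows. Both models are machine-learned in the text; here lookup tables and a string function stand in for them. The override rule used as "retraining", the tone function, the integer feature amounts, and all sample responses are assumptions introduced for illustration, and the semantic analysis step is omitted by passing the feature amount in directly.

```python
from typing import Callable, Dict

# Invented sample data; the integer keys stand for the feature amount 8103.
STANDARD_RESPONSES: Dict[int, str] = {0: "I rest at home.", 1: "I like many foods."}
PERSON_RESPONSES_AT_25: Dict[int, str] = {0: "I go surfing."}


def retrain(standard: Dict[int, str], person: Dict[int, str]) -> Dict[int, str]:
    """Stand-in for retraining: person-specific responses take priority over standard ones."""
    return {**standard, **person}


def tone_model_25(text: str) -> str:
    """Hypothetical tone model: same content, with a speech habit added at the end."""
    return text.rstrip(".") + ", you know."


def respond(feature: int, retrained: Dict[int, str], tone: Callable[[str], str]) -> str:
    """Retrained standard response model 8101 followed by the tone model 8104."""
    return tone(retrained[feature])


retrained_25 = retrain(STANDARD_RESPONSES, PERSON_RESPONSES_AT_25)

# Feature 0 was in the person's Q&A data: person-specific content and tone.
print(respond(0, retrained_25, tone_model_25))
# Feature 1 was not: standard content, but still delivered in the person's tone.
print(respond(1, retrained_25, tone_model_25))
```

Keeping content and tone in separate stages is what allows a standard response to still sound like the specific person.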

The models in this embodiment can be trained using machine learning techniques such as the neural networks and support vector machines described above. As described above, according to this embodiment, it is possible to generate a response model that responds appropriately, in a manner characteristic of the target specific person, to a wide variety of user questions.

FIG. 9 shows a display example of avatars that use age-specific response models. By displaying the age-specific avatar images shown in FIG. 9 on any of the monitor 1107, the client PC 1200, or the smartphone 1300, the user can feel as if facing avatars of different ages. If an age-specific response model created in advance is assigned to the avatar of each age, the avatar can give a response specific to that age to the user's questions and remarks, so the user can get the sense of conversing with avatars of different ages.

FIG. 10 is an example of a response model for realizing the responses of the avatars in FIG. 9, and is a conceptual diagram of a model in which the response model is selected by age. As shown in the figure, the response model 10100 includes a standard response model 10101 retrained for age 25, a standard response model 10201 retrained for age 50, and a standard response model 10301 retrained for age 70. A tone model 10104 for age 25 is connected to the standard response model 10101 retrained for age 25. Likewise, a tone model 10204 for age 50 is connected to the standard response model 10201 retrained for age 50, and a tone model 10304 for age 70 is connected to the standard response model 10301 retrained for age 70. The function of each model is similar to that of the response model 8100 shown in FIG. 8. The main difference is that the semantic analysis model and age selection 10102 performs age selection in addition to semantic analysis of the input (question), and outputs the feature amount 10103 to the model corresponding to the selected age. The age selection may be set in advance or may be made in accordance with a setting instruction from the user. The selection may also be made as an age range rather than a single specific age. With this configuration, the response model corresponding to each of the age-specific avatars shown in FIG. 9 is selected, and an appropriate output (response) is produced.
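The age selection step of FIG. 10 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the method of the embodiment: `make_model`, `select_age`, `select_ages_in_range`, and the nearest-age rule are hypothetical, and each age-specific model is a placeholder that stands for a retrained standard response model together with its tone model.

```python
from typing import Callable, Dict, Iterable, List, Tuple

ResponseModel = Callable[[str], str]


def make_model(age: int) -> ResponseModel:
    """Hypothetical factory: stands for a retrained standard response model
    for the given age with its tone model attached."""
    def model(feature: str) -> str:
        return f"[age {age}] reply to '{feature}'"
    return model


# Ages 25, 50 and 70 follow FIG. 10; the registry itself is an assumption.
MODELS: Dict[int, ResponseModel] = {age: make_model(age) for age in (25, 50, 70)}


def select_age(requested: int, available: Iterable[int]) -> int:
    """Assumed rule: choose the available model age closest to the request.
    On a tie, the younger age is chosen because candidates are sorted first."""
    return min(sorted(available), key=lambda a: abs(a - requested))


def select_ages_in_range(age_range: Tuple[int, int],
                         available: Iterable[int]) -> List[int]:
    """Selection by age range instead of a single age (bounds inclusive)."""
    low, high = age_range
    return [a for a in sorted(available) if low <= a <= high]


def respond(question: str, age: int) -> str:
    feature = question.strip().lower()  # placeholder for semantic analysis
    return MODELS[select_age(age, MODELS)](feature)
```

The selected age could equally come from a preset value or from a user setting instruction; in both cases only the routing key changes, while the per-age models stay independent of one another.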

(Other Embodiments)
The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

2002 Acquisition means
2003 Setting means
2004 Extraction means
2005 Generation means

Claims

An information processing apparatus comprising:
acquisition means for acquiring response information of a specific person together with age information of the specific person at the time of the response;
setting means for setting a predetermined age;
extraction means for extracting, from the response information, response information corresponding to the predetermined age based on the predetermined age and the age information; and
generation means for generating a learning model of a response of the specific person from the extracted response information.

The information processing apparatus according to claim 1, wherein the setting means sets the predetermined age as an age range.

The information processing apparatus according to claim 1 or 2, wherein each of the responses included in the response information is associated with the age information of the specific person.

The information processing apparatus according to any one of claims 1 to 3, wherein the learning model of the response is a model that outputs a response corresponding to the predetermined age in response to an input from a user.

The information processing apparatus according to any one of claims 1 to 4, wherein the generation means generates a learning model of the response that includes an analysis model for analyzing the meaning of an input from a user.

The information processing apparatus according to any one of claims 1 to 5, wherein the generation means generates a learning model of the response that includes a tone model for reproducing the tone of the specific person at the predetermined age.

The information processing apparatus according to any one of claims 1 to 6, wherein the generation means generates the learning model of the response based on the extracted response information and a standard response model that gives a standard response.

The information processing apparatus according to any one of claims 1 to 7, wherein the generation means uses a neural network to generate a learning model of the response.

The information processing apparatus according to any one of claims 1 to 8, further comprising a display means for displaying an avatar associated with a learning model of the response of the specific person.

An information processing method comprising:
an acquisition step in which acquisition means acquires response information of a specific person together with age information of the specific person at the time of the response;
a setting step in which setting means sets a predetermined age;
an extraction step in which extraction means extracts, from the response information, response information corresponding to the predetermined age based on the predetermined age and the age information; and
a generation step in which generation means generates a learning model of a response of the specific person from the extracted response information.

A program for causing a computer to function as an information processing apparatus comprising:
acquisition means for acquiring response information of a specific person together with age information of the specific person at the time of the response;
setting means for setting a predetermined age;
extraction means for extracting, from the response information, response information corresponding to the predetermined age based on the predetermined age and the age information; and
generation means for generating a learning model of a response of the specific person from the extracted response information.