JP2022159632A

JP2022159632A - Learning method and content reproduction device

Info

Publication number: JP2022159632A
Application number: JP2021063937A
Authority: JP
Inventors: 継河合; Kei Kawai
Original assignee: Crystal Method Co Ltd
Current assignee: Crystal Method Co Ltd
Priority date: 2021-04-05
Filing date: 2021-04-05
Publication date: 2022-10-18
Anticipated expiration: 2041-04-05
Also published as: JP6930781B1

Abstract

To provide a learning method and a content reproduction device capable of representing characters that reflect the multifaceted emotions of a user.
SOLUTION: A learning method includes an input data acquisition step, an output data acquisition step, a first internal representation database generation step, and a second internal representation database generation step. The input data acquisition step acquires user data of a user. The output data acquisition step acquires internal representation data indicating an internal representation of the user. The first internal representation database generation step generates a first internal representation database with the user data as first input data and one or more kinds of data included in the internal representation data as first output data. The second internal representation database generation step generates a second internal representation database with the user data as second input data and, as second output data, one or more kinds of data that are included in the internal representation data and differ in kind from the first output data of the first internal representation database generation step.
SELECTED DRAWING: Figure 14

Description

The present invention relates to a learning method and a content reproduction device.

In recent years, attention has focused on techniques that use AI (artificial intelligence) to generate a character resembling a user in appearance, voice, and tastes. For example, just as photographs are taken to record childbirth, Shichi-Go-San, coming-of-age ceremonies, and weddings, turning a user's record into a character allows the knowledge, skills, and even memories of that time to live on digitally. At the same time, techniques for reproducing character expressions so that conversing with the character feels as natural as conversing with the user are attracting attention; one known example is the speaker conversion technique of Patent Document 1.

The technology described in Patent Document 1 relates to a speaker conversion device. The device stores a trained neural network structure that converts audio-visual data representing an utterance of a source speaker into audio-visual data representing a target speaker uttering that utterance in accordance with the source speaker's emotion. By converting, via the neural network, the video data and audio data representing the source speaker's utterance into video data and audio data representing the target speaker's utterance, the device performs speaker conversion that makes mutual use of video data and audio data, without requiring conversion work by experts in video and audio processing.

Patent Document 1: Japanese Patent Application Laid-Open No. 2020-91338

In Patent Document 1, training data consisting of pairs of audio-visual data representing a user's utterance and audio-visual data representing a character's utterance corresponding to the user's emotion is input to the neural network structure, which then outputs the character's audio-visual data. Because the training data consists only of these pairs, however, the user's complex emotions cannot be reflected in the audio-visual data, and the resulting character audio-visual data feels unnatural compared with talking to the user. For example, if the user's emotion is anger but the facial expression is a smile, Patent Document 1 selects the smile as the user's emotion; the anger cannot be reflected in the character's audio-visual data, and the result feels unnatural. There is therefore a demand for a technique for reproducing character expressions that reflect the user's multifaceted emotions.

The present invention has been devised in view of the above problems, and its object is to provide a learning method and a content reproduction device capable of reproducing character expressions that reflect the user's multifaceted emotions.

A learning method according to a first invention is a learning method for generating databases used to generate expression data representing an expression of a character, the method comprising: an input data acquisition step of acquiring user data including one or more of text data describing information about a user, image data including an image of the user, and audio data relating to the user's voice; an output data acquisition step of acquiring internal representation data representing the user's internal representation, the internal representation data including two or more types of data among self-recognition data indicating the user's self-recognition, priority data indicating the user's priorities regarding events, emotional expression data indicating the user's emotional expression toward events, and causal relationship data indicating the user's estimation of causal relationships regarding events; a first internal representation database generation step of taking the user data acquired in the input data acquisition step as first input data and first internal representation data, which is one or more types of data included in the internal representation data, as first output data, treating each pair of first input data and first output data as one set of first internal representation learning data, and generating a first internal representation database by machine learning using a plurality of sets of the first internal representation learning data; and a second internal representation database generation step of taking the user data acquired in the input data acquisition step as second input data and second internal representation data, which is one or more types of data included in the internal representation data and of a different type from the first output data in the first internal representation database generation step, as second output data, treating each pair of second input data and second output data as one set of second internal representation learning data, and generating a second internal representation database by machine learning using a plurality of sets of the second internal representation learning data.
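The pairing of learning data in the first invention can be illustrated with a minimal Python sketch. It assumes hypothetical names (`Record`, `build_learning_sets`, `KINDS`) and a simplified layout in which the user data is already a numeric tuple and each output is a single kind of internal representation data; the claim itself allows one or more kinds per output. The point it shows is that the same user data feeds both learning sets, while the two output kinds must differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# The four kinds of internal representation data listed in the first invention.
KINDS = ("self_recognition", "priority", "emotional_expression", "causal_relationship")

Pair = Tuple[Tuple[float, ...], str]


@dataclass
class Record:
    """One observation of a user (hypothetical layout, not from the source)."""
    user_data: Tuple[float, ...]   # stand-in for text/image/audio user data
    internal: Dict[str, str]       # kind of internal representation -> label


def build_learning_sets(records: List[Record], first_kind: str,
                        second_kind: str) -> Tuple[List[Pair], List[Pair]]:
    """Build first and second internal representation learning data from the same user data."""
    for kind in (first_kind, second_kind):
        if kind not in KINDS:
            raise ValueError(f"unknown kind: {kind}")
    if first_kind == second_kind:
        raise ValueError("second output data must differ in kind from the first output data")
    first: List[Pair] = []
    second: List[Pair] = []
    for record in records:
        if len(record.internal) < 2:
            raise ValueError("internal representation data must contain two or more kinds")
        # The same user data becomes both the first and the second input data.
        if first_kind in record.internal:
            first.append((record.user_data, record.internal[first_kind]))
        if second_kind in record.internal:
            second.append((record.user_data, record.internal[second_kind]))
    return first, second


records = [
    Record((0.9, 0.8), {"emotional_expression": "anger", "priority": "family"}),
    Record((0.1, 0.2), {"emotional_expression": "calm", "priority": "work"}),
]
first_set, second_set = build_learning_sets(records, "emotional_expression", "priority")
print(first_set)   # [((0.9, 0.8), 'anger'), ((0.1, 0.2), 'calm')]
print(second_set)  # [((0.9, 0.8), 'family'), ((0.1, 0.2), 'work')]
```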

A learning method according to a second invention is the learning method of the first invention, further comprising an expression database generation step of generating an expression database that takes as input the first internal representation data generated using the first internal representation database and the second internal representation data generated using the second internal representation database, and outputs expression data representing an expression of the character.

A learning method according to a third invention is the learning method of the first or second invention, wherein the input data acquisition step acquires the user data including one or more of: the text data, including text-format data on the content of the user's answers to questions; the image data, including image-format data on the content of the user's answers to the questions; and audio-format data on the content of the user's answers to the questions.

A learning method according to a fourth invention is the learning method of any one of the first to third inventions, wherein the user data includes text feature data indicating features of the text data, and the input data acquisition step includes a text feature data acquisition step of acquiring the text feature data extracted from the acquired text data.

A learning method according to a fifth invention is the learning method of any one of the first to fourth inventions, wherein the user data includes image feature data indicating features of the image data, and the input data acquisition step includes an image feature data acquisition step of acquiring the image feature data extracted from the acquired image data.

A learning method according to a sixth invention is the learning method of any one of the first to fifth inventions, wherein the user data includes audio feature data indicating features of the audio data, and the input data acquisition step includes an audio feature data acquisition step of acquiring the audio feature data extracted from the acquired audio data.

A content reproduction device according to a seventh invention refers to the first internal representation database, the second internal representation database, and the expression database generated by the learning method of the second invention, and outputs expression data of the character. The device comprises: an acquisition unit that acquires stimulus data including one or more of arbitrary text data, image data, and audio data; a first internal representation processing unit that refers to the first internal representation database and acquires the first internal representation data corresponding to the stimulus data acquired by the acquisition unit; a second internal representation processing unit that refers to the second internal representation database and acquires the second internal representation data corresponding to the stimulus data acquired by the acquisition unit; and an expression processing unit that refers to the expression database and outputs the expression data corresponding to the first internal representation data generated using the first internal representation database and the second internal representation data generated using the second internal representation database.
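The four units of the seventh invention can be wired together as in the following minimal Python sketch. The class name, method names, and the modelling of each database as a plain callable are assumptions for illustration only; the toy lambdas stand in for trained databases, and the stimulus is assumed to be already converted into a numeric feature tuple.

```python
from typing import Callable, Dict, Sequence

# Hypothetical signatures: each "database" is modelled as a plain callable.
InternalDB = Callable[[Sequence[float]], str]
ExpressionDB = Callable[[str, str], Dict[str, str]]


class ContentReproductionDevice:
    """Acquisition unit plus three processing units, each wrapping one database."""

    def __init__(self, first_db: InternalDB, second_db: InternalDB,
                 expression_db: ExpressionDB) -> None:
        self.first_db = first_db
        self.second_db = second_db
        self.expression_db = expression_db

    def acquire(self, stimulus: Sequence[float]) -> Sequence[float]:
        """Acquisition unit: here it only checks that the stimulus data is non-empty."""
        if len(stimulus) == 0:
            raise ValueError("stimulus data is empty")
        return stimulus

    def reproduce(self, stimulus: Sequence[float]) -> Dict[str, str]:
        data = self.acquire(stimulus)
        first_ir = self.first_db(data)    # first internal representation processing unit
        second_ir = self.second_db(data)  # second internal representation processing unit
        return self.expression_db(first_ir, second_ir)  # expression processing unit


device = ContentReproductionDevice(
    first_db=lambda f: "anger" if f[0] > 0.5 else "calm",
    second_db=lambda f: "family" if f[1] > 0.5 else "work",
    expression_db=lambda a, b: {"text": f"[{a} / {b}]"},
)
print(device.reproduce((0.9, 0.8)))  # {'text': '[anger / family]'}
```

Keeping the two internal representation lookups independent mirrors the claim: both processing units see the same stimulus data, and only the expression processing unit combines their results.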

According to the first to seventh inventions, the first internal representation database generation step generates the first internal representation database by machine learning using a plurality of sets of first internal representation learning data, and the second internal representation database generation step generates the second internal representation database by machine learning using a plurality of sets of second internal representation learning data, each set pairing second input data with second output data. Internal representation data containing different kinds of data can therefore be generated from a single piece of user data, which makes it possible to learn the user's emotions from multiple facets, for example a user whose emotion is anger while their facial expression is a laugh. This enables character expressions that reflect the user's multifaceted emotions.

In particular, according to the second invention, an expression database is generated that takes internal representation data as input and outputs expression data representing the character's expression. Because the character's expression can thus be obtained, character expressions matched to the user's multifaceted emotions can be learned.

In particular, according to the third invention, the input data acquisition step acquires user data including one or more of text data containing text-format data on the content of the user's answers to questions, image data containing image-format data on that content, and audio-format data on that content. This allows, for example, answers to questions about the user's preferences and values to be reflected in the learning, so that the learning better suits the user's character and the user's multifaceted emotions can be learned.

In particular, according to the fourth invention, the input data acquisition step acquires text feature data extracted from the acquired text data. Because tendencies in the user's disposition can be learned from the acquired text data, more accurate learning becomes possible.

In particular, according to the fifth invention, the input data acquisition step acquires image feature data extracted from the acquired image data. Because tendencies in the user's disposition can be learned from the acquired image data, more accurate learning becomes possible.

In particular, according to the sixth invention, the input data acquisition step acquires audio feature data extracted from the acquired audio data. Because tendencies in the user's disposition can be learned from the acquired audio data, more accurate learning becomes possible.

In particular, according to the seventh invention, the first internal representation database, the second internal representation database, and the expression database are referred to, and the character's expression data for the stimulus data is output. Because the output expression data reflects the user's internal representation in response to the stimulus data, the character can reproduce the user's multifaceted emotions.

FIG. 1 is a schematic diagram showing an example of a content reproduction system according to the first embodiment.
FIG. 2 is a schematic diagram showing an example of the operation of the content reproduction system according to the first embodiment.
FIG. 3(a) is a schematic diagram showing an example of the learning method for the first internal representation database, and FIG. 3(b) is a schematic diagram showing an example of the learning method for the second internal representation database.
FIG. 4(a) is a schematic diagram showing an example of the learning method for the expression database, and FIG. 4(b) is a schematic diagram showing an example of the learning method for the sound learning model.
FIG. 5(a) is a schematic diagram showing an example of the learning method for the visual learning model, and FIG. 5(b) is a schematic diagram showing an example of the learning method for the text learning model.
FIG. 6 is a schematic diagram showing an example of the first internal representation database.
FIG. 7 is a schematic diagram showing an example of the second internal representation database.
FIG. 8 is a schematic diagram showing an example of the expression database.
FIG. 9 is a schematic diagram showing an example of the sound learning model.
FIG. 10 is a schematic diagram showing an example of the visual learning model.
FIG. 11 is a schematic diagram showing an example of the text learning model.
FIG. 12(a) is a schematic diagram showing an example of the configuration of the content reproduction device according to the embodiment, FIG. 12(b) is a schematic diagram showing an example of the functions of the content reproduction device according to the embodiment, and FIG. 12(c) is a schematic diagram showing an example of a DB generation unit.
FIG. 13 is a schematic diagram showing an example of a processing unit.
FIG. 14 is a flowchart showing an example of the learning method according to the embodiment.
FIG. 15 is a flowchart showing an example of the operation of the content reproduction system according to the embodiment.

Examples of a learning method, a content reproduction device, and a content reproduction system according to embodiments of the present invention are described below with reference to the drawings.

(First embodiment)
An example of a content reproduction system 100, a content reproduction device 1, and a learning method according to the first embodiment will be described with reference to FIGS. 1 to 5. FIG. 1 is a schematic diagram showing an example of the content reproduction system 100 according to this embodiment. FIG. 2 is a schematic diagram showing an example of the operation of the content reproduction system 100 according to this embodiment. FIGS. 3 to 5 are schematic diagrams showing an example of the learning method in this embodiment.

<Content reproduction system 100>
The content reproduction system 100 is used to generate expression data representing a character's expression in response to input stimulus data including one or more of arbitrary text data, image data, and audio data. The content reproduction system 100 refers, for example, to a database generated by machine learning using learning data, and generates, for the input stimulus data, expression data including one or more of the character's audio data, image data, and text data.

As shown in FIG. 1, for example, the content reproduction system 100 includes the content reproduction device 1. The content reproduction system 100 may also include at least one of a terminal 2 and a server 3. The content reproduction device 1 is connected to the terminal 2 and the server 3 via, for example, a communication network 4.

In the content reproduction system 100, as shown in FIG. 2 for example, the content reproduction device 1 first acquires stimulus data. The content reproduction device 1 then refers to the sound learning model to obtain audio feature data for the audio data included in the stimulus data, to the visual learning model to obtain image feature data for the image data included in the stimulus data, and to the text learning model to obtain text feature data for the text data included in the stimulus data. Next, the content reproduction device 1 refers to the first internal representation database to obtain first internal representation data corresponding to one or more of the audio feature data, the image feature data, and the text feature data, and refers to the second internal representation database to obtain second internal representation data corresponding to one or more of the same feature data. Then, based on the obtained first and second internal representation data, the content reproduction device 1 refers to the expression database and generates expression data including one or more of the character's audio data, image data, and text data. By outputting the generated expression data, the content reproduction system 100 can reproduce optimal expression data for stimulus data containing one or more of arbitrary input audio data, image data, and text data.
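Because the stimulus data may contain any one or more of the three modalities, the feature data must somehow be combined into a single input before the internal representation databases are consulted. The source does not say how; the following Python sketch assumes one common approach, concatenation with zero-filling of missing modalities, and the per-modality lengths in `FEATURE_SIZES` are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical fixed feature lengths for each modality; real models would differ.
FEATURE_SIZES = {"audio": 2, "image": 3, "text": 4}


def fuse_features(features: Dict[str, Optional[Sequence[float]]]) -> Tuple[float, ...]:
    """Concatenate audio, image and text feature data into one input vector.

    Missing modalities are zero-filled so the vector length stays fixed,
    which lets stimulus data carry any one or more of the three modalities.
    """
    if all(features.get(m) is None for m in FEATURE_SIZES):
        raise ValueError("stimulus data must contain at least one modality")
    fused = []
    for modality, size in FEATURE_SIZES.items():
        vector = features.get(modality)
        if vector is None:
            fused.extend([0.0] * size)
        elif len(vector) != size:
            raise ValueError(f"{modality} features must have length {size}")
        else:
            fused.extend(float(x) for x in vector)
    return tuple(fused)


print(fuse_features({"image": [0.2, 0.4, 0.6]}))
# (0.0, 0.0, 0.2, 0.4, 0.6, 0.0, 0.0, 0.0, 0.0)
```

Zero-filling keeps every input the same length, which suits fixed-size models; an alternative would be to append presence flags so a model can distinguish "absent" from "all zeros".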

The sound learning model is a model that outputs audio feature data from input audio data. The sound learning model may be generated, for example, by machine learning. As the sound learning model, for example, a trained model may be used that is constructed by machine learning using a plurality of sets of learning data (audio feature learning data), each set consisting of previously acquired past audio data and the audio feature data associated with it.
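The source's sound learning model is a trained model; as a rough illustration of what "audio feature data" can look like, the following Python sketch computes two classic hand-crafted features, RMS energy and zero-crossing rate. It is a stand-in, not the patent's model, and it assumes PCM samples already scaled to [-1, 1].

```python
import math
from typing import Sequence, Tuple


def audio_features(samples: Sequence[float]) -> Tuple[float, float]:
    """Return (RMS energy, zero-crossing rate) for PCM samples scaled to [-1, 1]."""
    if not samples:
        raise ValueError("no samples")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # Count sign changes between consecutive samples.
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (len(samples) - 1) if len(samples) > 1 else 0.0
    return rms, zcr


print(audio_features([0.5, -0.5, 0.5, -0.5]))  # (0.5, 1.0)
```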

The visual learning model is a model that outputs image feature data from input image data. The visual learning model may be generated, for example, by machine learning. As the visual learning model, for example, a trained model may be used that is constructed by machine learning using a plurality of sets of learning data (image feature learning data), each set consisting of previously acquired past image data and the image feature data associated with it.
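Similarly, the following Python sketch shows a simple hand-crafted stand-in for image feature data: a normalized grayscale histogram followed by mean brightness. It is not the patent's visual learning model; it assumes an image given as rows of 8-bit grayscale pixel values, and the bin count is an arbitrary choice.

```python
from typing import Sequence, Tuple


def image_features(pixels: Sequence[Sequence[int]], bins: int = 4) -> Tuple[float, ...]:
    """Normalized grayscale histogram followed by mean brightness (all in [0, 1])."""
    flat = [p for row in pixels for p in row]
    if not flat:
        raise ValueError("empty image")
    histogram = [0] * bins
    for p in flat:
        if not 0 <= p <= 255:
            raise ValueError("pixel values must be 0-255")
        histogram[p * bins // 256] += 1
    mean_brightness = sum(flat) / len(flat) / 255
    return tuple(count / len(flat) for count in histogram) + (mean_brightness,)


print(image_features([[0, 255], [0, 255]]))  # (0.5, 0.0, 0.0, 0.5, 0.5)
```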

The text learning model is a model that outputs text feature data from input text data. The text learning model may be generated, for example, by machine learning. As the text learning model, for example, a trained model may be used that is constructed by machine learning using a plurality of sets of learning data (text feature learning data), each set consisting of previously acquired past text data and the text feature data associated with it.
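For text, a minimal stand-in for text feature data is the relative frequency of words from a fixed vocabulary, sketched below in Python. This bag-of-words approach is an assumption for illustration, not the patent's text learning model. Note that the `\w+` tokenizer only splits on non-word characters, so unspaced Japanese text would need a morphological analyzer instead.

```python
import re
from collections import Counter
from typing import Sequence, Tuple


def text_features(text: str, vocabulary: Sequence[str]) -> Tuple[float, ...]:
    """Relative frequency of each vocabulary word among the tokens of `text`."""
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return tuple(0.0 for _ in vocabulary)
    counts = Counter(tokens)
    return tuple(counts[word] / len(tokens) for word in vocabulary)


features = text_features("I love my family, family first", ["family", "work"])
print(features)  # "family" is 2 of 6 tokens; "work" does not occur
```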

The first internal representation database is generated by machine learning. As the first internal representation database, a trained model for generating the first output data from the first input data is used. It is constructed by machine learning using learning data in which, for example, user data serves as the first input data, one or more kinds of data included in the internal representation data (first internal representation data) serve as the first output data, and each pair of first input data and first output data forms one set of learning data (first internal representation learning data). The first output data is the first internal representation data used as first internal representation learning data. The first internal representation data also includes internal representation data generated using the first internal representation database.

The second internal representation database differs from the first internal representation database in that its second output data is of a different kind from the first output data used for the first internal representation database. The second internal representation database is generated by machine learning. As the second internal representation database, a trained model for generating the second output data from the second input data is used. It is constructed by machine learning using learning data in which, for example, user data serves as the second input data, one or more kinds of data included in the internal representation data (second internal representation data) serve as the second output data, and each pair of second input data and second output data forms one set of learning data (second internal representation learning data). The second output data is the second internal representation data used as second internal representation learning data. The second internal representation data also includes internal representation data generated using the second internal representation database.
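The relationship between the two databases can be sketched in Python with a deliberately simple learner. The source does not specify the machine learning method; this sketch assumes a 1-nearest-neighbour model (`NearestNeighbourDB` is a hypothetical name) trained twice on the same user data, once with emotional expression labels and once with priority labels, so that one input yields two different kinds of internal representation data.

```python
import math
from typing import List, Sequence, Tuple

Pair = Tuple[Tuple[float, ...], str]


class NearestNeighbourDB:
    """Toy stand-in for a trained model: 1-nearest-neighbour over the learning data."""

    def __init__(self, learning_data: Sequence[Pair]) -> None:
        if not learning_data:
            raise ValueError("learning data is empty")
        self.learning_data: List[Pair] = list(learning_data)

    def predict(self, user_data: Sequence[float]) -> str:
        # Return the output label of the closest stored input (Euclidean distance).
        nearest = min(self.learning_data, key=lambda pair: math.dist(pair[0], user_data))
        return nearest[1]


user_data = [(0.9, 0.8), (0.1, 0.2), (0.8, 0.1)]
# Same inputs, different kinds of output data for the two databases.
first_db = NearestNeighbourDB(list(zip(user_data, ["anger", "calm", "anger"])))    # emotional expression
second_db = NearestNeighbourDB(list(zip(user_data, ["family", "work", "work"])))  # priority

query = (0.85, 0.75)
print(first_db.predict(query), second_db.predict(query))  # anger family
```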

The expression database outputs expression data based on the input first internal representation data and second internal representation data. The expression database may be generated, for example, by machine learning. As the expression database, for example, a trained model for generating third output data from third input data may be used. It is constructed by machine learning using learning data in which a previously acquired pair of first internal representation data and second internal representation data serves as the third input data, expression data serves as the third output data, and each pair of third input data and third output data forms one set of learning data (expression learning data).
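A minimal Python sketch of the expression database follows, assuming the simplest possible learner: a lookup table that keeps the most frequent expression seen for each (first, second) internal representation pair. The class name, the string labels, and the `"neutral"` fallback are hypothetical; in the source, expression data may be audio, image, or text data of the character.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence, Tuple

Key = Tuple[str, str]  # (first internal representation, second internal representation)


class ExpressionDB:
    """Toy expression database: majority vote over expression learning data."""

    def __init__(self, learning_data: Sequence[Tuple[Key, str]]) -> None:
        votes: Dict[Key, Counter] = defaultdict(Counter)
        for third_input, third_output in learning_data:
            votes[third_input][third_output] += 1
        # Keep the most frequent expression seen for each input pair.
        self.table = {key: counter.most_common(1)[0][0] for key, counter in votes.items()}

    def output(self, first_ir: str, second_ir: str, default: str = "neutral") -> str:
        return self.table.get((first_ir, second_ir), default)


db = ExpressionDB([
    (("anger", "family"), "strained smile"),
    (("anger", "family"), "strained smile"),
    (("anger", "family"), "frown"),
    (("calm", "work"), "relaxed voice"),
])
print(db.output("anger", "family"))  # strained smile
print(db.output("calm", "family"))   # neutral (pair not seen in the learning data)
```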

Stimulus data is used, for example, when generating the internal representation data output by the content reproduction system 100. Stimulus data includes one or more of arbitrary text data, image data, and audio data; it may be, for example, image data alone, or image data together with audio data. Stimulus data may also include one or more of text feature data extracted from arbitrary text data, image feature data extracted from arbitrary image data, and audio feature data extracted from arbitrary audio data.
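The "one or more of" constraint on stimulus data maps naturally onto a container with optional fields and a validity check. The following Python sketch assumes a hypothetical `StimulusData` dataclass holding raw data only; a variant holding extracted feature data instead is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class StimulusData:
    """Hypothetical container: any one or more of text, image and audio data."""
    text: Optional[str] = None
    image: Optional[Sequence[Sequence[int]]] = None   # rows of grayscale pixels
    audio: Optional[Sequence[float]] = None           # PCM samples scaled to [-1, 1]

    def __post_init__(self) -> None:
        if self.text is None and self.image is None and self.audio is None:
            raise ValueError("stimulus data needs at least one of text, image, audio")

    def modalities(self) -> Tuple[str, ...]:
        """Names of the modalities actually present, in a fixed order."""
        return tuple(name for name in ("text", "image", "audio")
                     if getattr(self, name) is not None)


stimulus = StimulusData(image=[[0, 255]], audio=[0.1, -0.1])
print(stimulus.modalities())  # ('image', 'audio')
```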

Text data is used, for example, when generating the internal representation data output by the content reproduction system 100. Text data is data consisting of characters represented by character codes. Text data includes, for example, control characters, which are data for controlling devices such as monitors and printers. Control characters include, for example, the line feed character representing a line break and the (horizontal) tab.
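Control characters such as line feed and tab can be told apart from printable characters by their Unicode general category, "Cc". The following Python sketch (the function name is hypothetical) separates them before text is passed to feature extraction, reporting each control character by code point.

```python
import unicodedata
from typing import List, Tuple


def split_control_characters(text: str) -> Tuple[str, List[str]]:
    """Separate printable characters from control characters (Unicode category "Cc")."""
    printable: List[str] = []
    controls: List[str] = []
    for ch in text:
        if unicodedata.category(ch) == "Cc":
            controls.append(f"U+{ord(ch):04X}")
        else:
            printable.append(ch)
    return "".join(printable), controls


print(split_control_characters("line 1\nline 2\tend"))
# ('line 1line 2end', ['U+000A', 'U+0009'])
```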

Text data includes, for example, data stored on a server such as an SNS and retrieved via the communication network 4, whether posted by the user or describing information about the user. Text data may also be extracted from audio data by speech recognition, or entered by the user or another person via, for example, the content reproduction device 1.

Audio data is used, for example, when generating the internal representation data output by the content reproduction system 100. Audio data is encoded sound. Audio encoding methods include, for example, pulse code modulation (PCM), in which audio is represented as a bit string whose length is determined by the number of quantization bits, the sampling frequency, and the duration, and pulse density modulation (PDM), in which the density of the sound wave is represented with 1 bit per sample, sampled at regular intervals.
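
The PCM bit-length rule and the 1-bit idea behind PDM can be illustrated with a short Python sketch. The function names are hypothetical, and PDM is produced here with a first-order sigma-delta modulator, which is one common way to generate a PDM stream but is not something the text specifies.

```python
def pcm_bit_length(bits_per_sample: int, sample_rate_hz: int,
                   seconds: float, channels: int = 1) -> int:
    """PCM: bit-string length = quantization bits x sampling frequency x time (x channels)."""
    return int(bits_per_sample * sample_rate_hz * seconds * channels)


def pdm_encode(samples: list[float]) -> list[int]:
    """PDM via a first-order sigma-delta modulator (an assumed encoder).

    Each input sample in [-1.0, 1.0] becomes a single bit; the local density
    of 1s tracks the signal amplitude.
    """
    bits = []
    integrator = 0.0
    feedback = 0.0  # previous output bit mapped back to -1.0 / +1.0
    for x in samples:
        integrator += x - feedback
        bit = 1 if integrator >= 0 else 0
        bits.append(bit)
        feedback = 1.0 if bit else -1.0
    return bits


print(pcm_bit_length(16, 44100, 1.0))  # one second of 16-bit mono audio at 44.1 kHz
print(sum(pdm_encode([0.5] * 1000)))   # a constant 0.5 signal yields 75% ones
```

A constant amplitude of 0.5 settles into the repeating pattern 1, 1, 1, 0, so three quarters of the bits are 1, which is how PDM encodes amplitude as pulse density rather than as a multi-bit value.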

Audio data may be based, for example, on audio extracted from video data. It may represent sound captured with a known sound collection device or the like, or pseudo-audio generated by a known technique. Audio data includes, for example, data stored on a server such as an SNS and retrieved via the communication network 4, whether posted by the user or describing information about the user. Audio data may also be entered by the user or another person via, for example, the content reproduction device 1.

Image data is used, for example, when generating the internal representation data output by the content reproduction system 100. Image data is data containing a set of pixels. It may be extracted from a moving image, or may itself be moving image data.

Image data may be acquired, for example, via the communication network 4. It may represent an image captured with a known imaging device or the like, or a pseudo-image generated by a known technique. Image data includes, for example, data stored on a server such as an SNS and retrieved via the communication network 4, whether posted by the user or describing information about the user. Image data may also be entered by the user or another person via, for example, the content reproduction device 1.

Text feature data is data indicating the features contained in text data. A text feature may be, for example, the occurrence tendency of similar words and word meanings, calculated from the meanings of the words and sentences obtained by morphological analysis of the text. It may also be a vector, a function graph, or the like derived from the meanings of words or sentences. Text feature data may include word meanings inferred from the content of a conversation, and may be acquired using known techniques.
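
As a minimal illustration of a vector-valued text feature, the sketch below turns a text into relative word frequencies over a fixed vocabulary and compares two such vectors by cosine similarity. The function names are hypothetical, and whitespace tokenization is an assumed stand-in for the morphological analysis the text describes (Japanese text would need a real morphological analyzer).

```python
import math
from collections import Counter


def text_features(text: str, vocabulary: list[str]) -> list[float]:
    """Relative word frequencies over a fixed vocabulary.

    Assumption: lowercase whitespace tokenization stands in for
    morphological analysis.
    """
    counts = Counter(text.lower().split())
    total = sum(counts.values()) or 1  # avoid dividing by zero for empty text
    return [counts[word] / total for word in vocabulary]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


vocab = ["cat", "dog", "fish"]
print(text_features("cat dog cat", vocab))  # cat 2/3, dog 1/3, fish 0
```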

Audio feature data is data indicating the features of the speech contained in audio data. Speech features include acoustic features, which characterize how the sound sounds, and semantic features, which derive from the linguistic meaning of the speech and are preserved even when the speech is converted to text. Acoustic features include, for example, the fundamental frequency, spectral envelope, aperiodicity index, spectrogram, loudness, cepstrum, word pronunciation, intonation, time delay of sound waves, and changes in the speech over time. Semantic features indicate, for example, tendencies in the words uttered and in word usage, and may be the same as text features. Audio feature data may also include semantic features obtained from acoustic features; for example, homonyms may be disambiguated by deriving word-level semantic features from the word accents contained in the acoustic features. Audio feature data may be acquired using known techniques.
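
Two of the acoustic features listed above, loudness and fundamental frequency, can be estimated with very little code. The following is a sketch only: RMS amplitude stands in for loudness, and fundamental frequency is estimated by plain time-domain autocorrelation, one textbook method among many; the function names and the 50 to 500 Hz search range are assumptions.

```python
import math


def rms_loudness(samples: list[float]) -> float:
    """Root-mean-square amplitude, a simple loudness measure."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_f0(samples: list[float], sample_rate: int,
                fmin: float = 50.0, fmax: float = 500.0) -> float:
    """Estimate the fundamental frequency by autocorrelation.

    Searches the lags corresponding to fmax..fmin and returns the frequency of
    the lag with the largest (unnormalized) autocorrelation. Leaving it
    unnormalized biases the search toward shorter lags, which helps avoid
    picking a multiple of the true period.
    """
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(samples) - 1)
    best_lag, best_r = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        r = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag


sr = 8000
tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(800)]  # 0.1 s of a 200 Hz tone
print(estimate_f0(tone, sr), rms_loudness(tone))
```

Real speech is far less clean than a pure tone, so practical systems usually rely on dedicated pitch trackers and spectral analysis; the point here is only how a frame of samples maps to a numeric feature.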

Image feature data is data indicating the features of an image. An image feature may be, for example, an imaging target recognized by image recognition, or data that appears in common across multiple images: if the user's smile appears frequently in videos of the user, for example, that smile may be treated as commonly appearing data. An image feature may also be, for example, a characteristic of a person's eye movements. Image feature data may also be point cloud data based on the imaging target. Point cloud data represents the three-dimensional structure of the imaging target and may be acquired by image analysis using SIFT (Scale-Invariant Feature Transform) or by known imaging devices and processing techniques such as a 3D camera. Point cloud data may include, for example, curvature information based on the structure of the imaging target and position information, both of which may be acquired by known imaging devices or processing techniques. Image feature data may be acquired using known techniques.
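
The "data that appears in common across multiple images" idea can be sketched as a frequency count over per-frame recognition results. This assumes a hypothetical image recognizer has already produced a set of labels for each frame; the function name and the 50% default ratio are illustrative assumptions.

```python
from collections import Counter


def common_features(frame_labels: list[set[str]], min_ratio: float = 0.5) -> set[str]:
    """Labels that appear in at least `min_ratio` of the frames.

    `frame_labels` holds, for each frame, the set of labels a (hypothetical)
    image recognizer found in it, e.g. {"smile", "face"}.
    """
    if not frame_labels:
        return set()
    counts = Counter(label for labels in frame_labels for label in set(labels))
    n = len(frame_labels)
    return {label for label, c in counts.items() if c / n >= min_ratio}


frames = [{"smile", "face"}, {"smile"}, {"face", "hand"}, {"smile", "face"}]
print(sorted(common_features(frames)))  # smile and face appear in 3 of 4 frames; hand in 1
```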

User data is data including any one or more of text data describing information about the user, image data containing an image of the user, and audio data of the user's voice. User data may also include any one or more of text feature data extracted from text data describing information about the user, image feature data extracted from image data about the user, and audio feature data extracted from audio data about the user.

Text data describing information about the user includes, for example, text describing personal information such as the user's address and name, text describing the user's tastes such as preferences and memories, and text written by the user. It may also include text-format data containing the user's answers to questions about the user.

Image data containing an image of the user is image data showing the user's whole body or part of the body. It may also be image-format data containing the user's answers to questions about the user.

Audio data of the user's voice is audio data in which the user's voice is recorded. It may also be audio-format data containing the user's answers to questions about the user.

Internal representation data is data including any one or more kinds of the following: self-recognition data indicating the user's self-recognition, priority data indicating the user's priorities with respect to an event, emotional expression data indicating the user's emotional expression toward an event, and causal relationship data indicating the user's estimation of causal relationships for an event.
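
One way to hold a record of internal representation data, in which any subset of the four kinds may be present, is a small dataclass. This is a sketch: the class name, field names, and value types are assumptions, not a format defined in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InternalRepresentation:
    """One record of internal representation data (illustrative field names).

    Any subset of the four kinds may be present; absent kinds are None.
    """
    self_recognition: Optional[str] = None                 # e.g. "joy"
    priority: Optional[list[str]] = None                   # kinds ordered from highest priority
    emotional_expression: Optional[str] = None             # e.g. "sorrow"
    causal_relationship: Optional[tuple[str, str]] = None  # (event, associated event)

    def kinds(self) -> set[str]:
        """Names of the kinds of data actually present in this record."""
        return {name for name, value in vars(self).items() if value is not None}


rec = InternalRepresentation(
    priority=["self_recognition", "emotional_expression"],
    causal_relationship=("accident", "traffic jam"),
)
print(sorted(rec.kinds()))
```

Storing priority data as an ordered list mirrors the description of priority as a ranking: a user whose list puts self-recognition first tends to express themselves based on self-recognition.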

Self-recognition data is data indicating the user's self-recognition. Self-recognition refers to feelings arising from one's mission, role, or position in society. For example, for a person in a leadership position in a group, self-recognition is "the expression one should adopt as a leader". Self-recognition also includes, for example, the feeling of not wanting to disrupt the mood of a group. Examples of self-recognition data include joy, anger, sorrow, and fun.

Priority data is data indicating the user's priorities with respect to an event, that is, a ranking of what the user gives precedence to when responding to the event. For example, if the user ranks self-recognition above emotional expression, the user tends to express themselves based on self-recognition. Items ranked in the priority data include, for example, self-recognition, modality, emotional expression, and causal relationship.

Emotional expression data is data indicating the user's emotional expression toward an event, that is, what the user felt about the event. Examples include joy, anger, sorrow, and fun.

Causal relationship data is data indicating the user's estimation of causal relationships for an event, that is, which other event the user associates with it. For example, the user may associate the event "an accident occurred" with the event "a traffic jam".

Expression data is data indicating a character's expression, composed of images containing the character and the character's voice. Expressions include, for example, visual, audio, and physical expressions. Visual expressions appeal to sight and include gestures and facial expressions. Audio expressions appeal to hearing and include words, remarks, and songs. Physical expressions appeal to touch and include body contact. Expression data may include simulated (pseudo) data.

Note that the "user" described above may be a real person or animal, or a simulated person or animal such as an animated one.

Likewise, the "character" described above may be a simulated person or animal modeled on the user, a simulated person or animal modeled on a real person or animal, or another simulated person or animal such as an animated one.

<Learning method>
The learning method according to the present embodiment is used to generate the databases or learning models that, in turn, are used to generate expression data indicating a character's expression in response to input stimulus data. The databases include, for example, a first internal representation database, a second internal representation database, and an expression database. The learning models include, for example, a sound learning model, a visual learning model, and a text learning model.

As shown in FIG. 3(a), for example, the learning method generates the first internal representation database. User data serves as first input data, and one or more kinds of first internal representation data contained in the internal representation data serve as first output data. Each pair of first input data and first output data forms one set of first internal representation learning data, and machine learning on this learning data generates the first internal representation database, which produces first output data from first input data.

As shown in FIG. 3(b), for example, the learning method also generates the second internal representation database. User data serves as second input data, and one or more kinds of second internal representation data contained in the internal representation data serve as second output data. Each pair of second input data and second output data forms one set of second internal representation learning data, and machine learning on this learning data generates the second internal representation database, which produces second output data from second input data. This differs from the learning of the first internal representation database in that the second output data is of a different kind from the first output data used for the first internal representation database.
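
The key difference between the two databases, namely that both take user data as input but are trained on different kinds of output, can be sketched as a split of the learning data. The function name and the dict-based record format are assumptions; the check that the two sets of kinds do not overlap reflects the requirement that the second output data be of a different kind from the first.

```python
def build_training_sets(samples, first_kinds, second_kinds):
    """Split (user data, internal representation) samples into two learning sets.

    samples: list of (user_data, dict mapping kind -> value).
    The second set's output kinds must differ from the first set's.
    """
    if set(first_kinds) & set(second_kinds):
        raise ValueError("second output kinds must differ from first output kinds")
    first = [(user, {k: rep[k] for k in first_kinds if k in rep}) for user, rep in samples]
    second = [(user, {k: rep[k] for k in second_kinds if k in rep}) for user, rep in samples]
    return first, second


# Hypothetical sample: one user-data item with three kinds of internal representation
samples = [("user text 1", {"emotional_expression": "joy",
                            "self_recognition": "leader",
                            "causal_relationship": ("accident", "traffic jam")})]
first_set, second_set = build_training_sets(samples, ["emotional_expression"], ["self_recognition"])
print(first_set)
print(second_set)
```

Each resulting set would then be fed to its own training run, giving two databases that map the same user data to complementary kinds of internal representation.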

As shown in FIG. 4(a), for example, the learning method may also generate the expression database. Here, internal representation data containing two or more kinds of data, generated using the first and second internal representation databases, is the input, and expression data indicating the character's expression is the output; each such input-output pair forms one set of expression learning data. Machine learning on multiple sets of expression learning data generates the expression database, which outputs expression data indicating the character's expression from internal representation data containing two or more kinds of data.
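
The text does not say how the outputs of the two internal representation databases are combined into a single input, nor which model the expression database uses. The sketch below assumes the two outputs are numeric vectors that are simply concatenated, and uses a 1-nearest-neighbour lookup as a stand-in for a trained model; all names and encodings are hypothetical.

```python
import math


def third_input(first_output: list[float], second_output: list[float]) -> list[float]:
    """Assumption: the third input is the concatenation of the two databases' outputs."""
    return list(first_output) + list(second_output)


class NearestNeighbourExpressionModel:
    """Stand-in for a trained expression model: returns the expression whose
    training input is closest (Euclidean distance) to the query."""

    def __init__(self):
        self.pairs: list[tuple[list[float], str]] = []

    def fit(self, learning_data):
        self.pairs = [(list(x), y) for x, y in learning_data]
        return self

    def predict(self, x: list[float]) -> str:
        return min(self.pairs, key=lambda pair: math.dist(pair[0], x))[1]


# Hypothetical encodings: first output = (joy, sorrow) scores,
# second output = two self-recognition scores
learning_data = [
    (third_input([1.0, 0.0], [1.0, 0.0]), "confident smile"),
    (third_input([0.0, 1.0], [0.0, 1.0]), "quiet, downcast look"),
]
model = NearestNeighbourExpressionModel().fit(learning_data)
print(model.predict(third_input([0.9, 0.1], [0.8, 0.2])))
```

A neural network trained on the same pairs would replace `fit` and `predict` without changing how the third input is assembled.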

As shown in FIG. 4(a), for example, the learning method may also generate the sound learning model. Each pair of previously acquired past audio data and the audio feature data linked to it forms one set of audio feature learning data, and machine learning on multiple such sets generates the sound learning model, which takes audio data as input and outputs audio feature data.

As shown in FIG. 5(a), for example, the learning method may also generate the visual learning model. Each pair of previously acquired past image data and the image feature data linked to it forms one set of image feature learning data, and machine learning on multiple such sets generates the visual learning model, which takes image data as input and outputs image feature data.

As shown in FIG. 5(b), for example, the learning method may also generate the text learning model. Each pair of previously acquired past text data and the text feature data linked to it forms one set of text feature learning data, and machine learning on multiple such sets generates the text learning model, which takes text data as input and outputs text feature data.
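
The sound, visual, and text learning models share one pattern: pairs of past raw data and the feature data linked to it are used to learn a mapping from raw data to features. As a deliberately small sketch of that pattern, the code below fits a linear map by full-batch gradient descent on mean squared error. The linear model, the learning rate, and the example pairs are assumptions chosen for illustration; the text leaves the model type open.

```python
def train_feature_model(learning_data, lr=0.1, epochs=2000):
    """Fit a linear map from raw-data vectors to a feature value by full-batch
    gradient descent on mean squared error.

    learning_data: list of (input_vector, target_feature) pairs, i.e. past
    data paired with the feature data linked to it.
    """
    dim = len(learning_data[0][0])
    n = len(learning_data)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in learning_data:
            err = sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for i in range(dim):
                grad_w[i] += 2 * err * x[i] / n
            grad_b += 2 * err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b


# Hypothetical pairs: a normalized frame energy paired with a labeled loudness score
pairs = [([0.0], 1.0), ([0.5], 2.0), ([1.0], 3.0)]
model = train_feature_model(pairs)
print(round(model([0.25]), 3))
```

Swapping in a CNN or another model changes only the body of the training function; the learning-data format of (past data, linked feature data) pairs stays the same.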

In the learning method, the databases described above are generated using, for example, machine learning with a neural network as the model, such as a CNN (Convolutional Neural Network); any other model may also be used.

The first internal representation database stores, for example, a first association having degrees of association between user data (first input data) and one or more kinds of first internal representation data contained in the internal representation data (first output data). A degree of association indicates how strongly first input data and first output data are connected; a higher degree of association indicates a stronger connection. Degrees of association may be expressed in three or more levels, such as percentages, or in two levels (binary).

For example, the first association is built from the degrees of connection among many-to-many information (multiple first input data versus multiple first output data). The first association is updated as appropriate during machine learning and represents, for example, a function (classifier) optimized on multiple first input data and multiple first output data. The first association may have, for example, multiple degrees of association, each indicating the connection between a pair of data items. When the database is built as a neural network, for example, degrees of association can correspond to weight variables.

The content reproduction system 100 therefore selects first output data suited to the first input data using, for example, the first association, which reflects all of the classifier's decisions. As a result, first output data suited to the first input data can be selected quantitatively not only when the first input data is identical or similar to the first output data, but also when it is dissimilar.

As shown in FIG. 6, for example, the first association may indicate the degrees of connection between multiple first output data and multiple first input data. In this case, the first association allows the degree of relationship with each of multiple first input data ("first input data A" to "first input data C" in FIG. 6) to be linked to and stored for each of multiple first output data ("first output data A" to "first output data C" in FIG. 6). Multiple first input data can therefore be linked to a single first output data via the first association, which enables multifaceted selection of first output data for given first input data.

The first association has, for example, multiple degrees of association, each linking one first output data item to one first input data item. Degrees of association are expressed in three or more levels, such as percentages, ten levels, or five levels, and may be drawn, for example, as lines with differing characteristics (such as thickness). For example, "first output data A" among the first output data has a degree of association AA of "73%" with "first input data A" among the first input data, and a degree of association AB of "12%" with "first input data B". In other words, a degree of association indicates how strongly two data items are connected: the higher the degree, the stronger the connection.
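
A table of degrees of association like the one in FIG. 6 can be modelled as a mapping from (output item, input item) pairs to percentages. In this sketch the 73% and 12% entries follow the example above, while the third entry and both helper functions are hypothetical additions that show the two lookups the text describes: all inputs linked to one output, and the best output for a given input.

```python
# Degrees of association (%) keyed by (output item, input item).
# The 73 and 12 values follow the FIG. 6 example; the third entry is hypothetical.
association = {
    ("first output data A", "first input data A"): 73,
    ("first output data A", "first input data B"): 12,
    ("first output data B", "first input data B"): 40,  # hypothetical
}


def linked_inputs(assoc, output_item, min_degree=0):
    """All input items linked to one output item, strongest first."""
    pairs = [(inp, deg) for (out, inp), deg in assoc.items()
             if out == output_item and deg >= min_degree]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def best_output(assoc, input_item):
    """Output item with the highest degree of association to the given input, or None."""
    candidates = {out: deg for (out, inp), deg in assoc.items() if inp == input_item}
    return max(candidates, key=candidates.get) if candidates else None


print(linked_inputs(association, "first output data A"))
print(best_output(association, "first input data B"))  # 40 beats 12
```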

The first internal representation database may also be trained with at least one hidden layer between the first input data and the first output data. The degrees of association described above are set on the first input data, the hidden layer data, or both, and serve as weights for each data item; outputs are selected based on these weights. An output may be selected, for example, when its degree of association exceeds a threshold.
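
The hidden-layer variant with threshold-based selection can be sketched as a tiny feed-forward pass. Here each weight plays the role of a degree of association, a sigmoid squashes each unit's weighted sum to a score between 0 and 1, and an output is kept only if its score exceeds the threshold. The weights, the sigmoid activation, the 0.5 threshold, and the omission of bias terms are all illustrative assumptions.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """One hidden layer; each row of w_hidden / w_out holds the weights
    (degrees of association) into one unit. Bias terms are omitted."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_out]


def select_outputs(scores, labels, threshold=0.5):
    """Keep only outputs whose score exceeds the threshold."""
    return [label for label, s in zip(labels, scores) if s > threshold]


# Hypothetical weights for a 2-input, 2-hidden, 2-output network
w_hidden = [[2.0, -2.0], [-2.0, 2.0]]
w_out = [[3.0, -3.0], [-3.0, 3.0]]
scores = forward([1.0, 0.0], w_hidden, w_out)
print(select_outputs(scores, ["output A", "output B"]))
```

In a trained database these weights would come from machine learning on the learning data rather than being set by hand.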

The second internal representation database stores, for example, a second association having degrees of association between user data (second input data) and one or more kinds of second internal representation data contained in the internal representation data (second output data). A degree of association indicates how strongly second input data and second output data are connected; a higher degree of association indicates a stronger connection. Degrees of association may be expressed in three or more levels, such as percentages, or in two levels (binary).

For example, the second association is built from the degrees of connection among many-to-many information (multiple second input data versus multiple second output data). The second association is updated as appropriate during machine learning and represents, for example, a function (classifier) optimized on multiple second input data and multiple second output data. The second association may have, for example, multiple degrees of association, each indicating the connection between a pair of data items. When the database is built as a neural network, for example, degrees of association can correspond to weight variables.

The content reproduction system 100 therefore selects second output data suited to the second input data using, for example, the second association, which reflects all of the classifier's decisions. As a result, second output data suited to the second input data can be selected quantitatively not only when the second input data is identical or similar to the second output data, but also when it is dissimilar.

As shown in FIG. 7, for example, the second association may indicate the degrees of connection between multiple second output data and multiple second input data. In this case, the second association allows the degree of relationship with each of multiple second input data ("second input data A" to "second input data C" in FIG. 7) to be linked to and stored for each of multiple second output data ("second output data A" to "second output data C" in FIG. 7). Multiple second input data can therefore be linked to a single second output data via the second association, which enables multifaceted selection of second output data for given second input data.

The second association has, for example, multiple degrees of association, each linking one second output data item to one second input data item. Degrees of association are expressed in three or more levels, such as percentages, ten levels, or five levels, and may be drawn, for example, as lines with differing characteristics (such as thickness). For example, "second output data A" among the second output data has a degree of association AA of "73%" with "second input data A" among the second input data, and a degree of association AB of "12%" with "second input data B". In other words, a degree of association indicates how strongly two data items are connected: the higher the degree, the stronger the connection.

The second internal representation database may also be trained with at least one hidden layer between the second input data and the second output data. The degrees of association described above are set on the second input data, the hidden layer data, or both, and serve as weights for each data item; outputs are selected based on these weights. An output may be selected, for example, when its degree of association exceeds a threshold.

The expression database stores, for example, a third association having degrees of association between internal representation data containing two or more kinds of data (third input data) and expression data (third output data). A degree of association indicates how strongly third input data and third output data are connected; a higher degree of association indicates a stronger connection. Degrees of association may be expressed in three or more levels, such as percentages, or in two levels (binary).

For example, the third association is built from the degrees of connection among many-to-many information (multiple third input data versus multiple third output data). The third association is updated as appropriate during machine learning and represents, for example, a function (classifier) optimized on multiple third input data and multiple third output data. The third association may have, for example, multiple degrees of association, each indicating the connection between a pair of data items. When the database is built as a neural network, for example, degrees of association can correspond to weight variables.

The content reproduction system 100 therefore selects third output data suited to the third input data using, for example, the third association, which reflects all of the classifier's decisions. As a result, third output data suited to the third input data can be selected quantitatively not only when the third input data is identical or similar to the third output data, but also when it is dissimilar.

As shown in FIG. 8, for example, the third association may indicate the degrees of connection between multiple third output data and multiple third input data. In this case, the third association allows the degree of relationship with each of multiple third input data ("third input data A" to "third input data C" in FIG. 8) to be linked to and stored for each of multiple third output data ("third output data A" to "third output data C" in FIG. 8). Multiple third input data can therefore be linked to a single third output data via the third association, which enables multifaceted selection of third output data for given third input data.

The third association has, for example, multiple degrees of association, each linking one third output data to one third input data. A degree of association is expressed in three or more levels, for example as a percentage, on a 10-level scale, or on a 5-level scale. It may also be drawn with a line attribute such as thickness. For example, "third output data A" has a degree of association AA of "73%" with "third input data A" and a degree of association AB of "12%" with "third input data B". In other words, a degree of association indicates how strongly two pieces of data are connected: the higher the degree, the stronger the connection.
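
The following is a minimal Python sketch of the third association as a lookup table. It assumes percent degrees keyed by (third output data, third input data) pairs. Only the 73% and 12% values come from the example above; the other entries and labels such as "input A" are hypothetical. The sketch covers both directions described above: listing the inputs linked to one output, and picking the output most strongly associated with a given input.

```python
# Illustrative degrees of association (percent) for the third association.
# Keys are (third output data, third input data). Only 73 and 12 come from
# the text; the remaining values are made up for this sketch.
THIRD_ASSOCIATION = {
    ("output A", "input A"): 73,
    ("output A", "input B"): 12,
    ("output B", "input A"): 19,
    ("output B", "input C"): 64,
    ("output C", "input B"): 58,
}


def inputs_linked_to(output, table):
    """Return (input, degree) pairs linked to one output, strongest first."""
    links = [(inp, deg) for (out, inp), deg in table.items() if out == output]
    return sorted(links, key=lambda pair: pair[1], reverse=True)


def best_output_for(input_data, table):
    """Return the output with the highest degree of association to an input."""
    candidates = [(out, deg) for (out, inp), deg in table.items() if inp == input_data]
    if not candidates:
        return None  # no stored link for this input
    return max(candidates, key=lambda pair: pair[1])[0]


print(inputs_linked_to("output A", THIRD_ASSOCIATION))  # [('input A', 73), ('input B', 12)]
print(best_output_for("input B", THIRD_ASSOCIATION))    # output C
```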

The representation database may also be trained with at least one hidden layer between the third input data and the third output data. The degrees of association described above are set on the third input data, on the hidden-layer data, or on both. They act as weights for each piece of data, and the output is selected on that basis. An output may then be selected when its degree of association exceeds a given threshold.
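
The hidden-layer variant can be sketched as a small feed-forward pass. The weights play the role of the degrees of association, and an output is kept only when its score exceeds a threshold. The sigmoid activation, the weight values, and the labels below are assumptions made for illustration; the text does not specify the network.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, w_hidden, w_out):
    """Propagate inputs through one hidden layer; weights act as degrees of association."""
    hidden = [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in w_hidden]
    return [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_out]


def select_outputs(scores, labels, threshold):
    """Keep every output whose score exceeds the threshold."""
    return [label for label, score in zip(labels, scores) if score > threshold]


# Hypothetical weights: two inputs, two hidden units, three candidate outputs.
W_HIDDEN = [[1.0, 0.0], [0.0, 1.0]]
W_OUT = [[4.0, 4.0], [-4.0, -4.0], [0.0, 0.0]]
LABELS = ["expression A", "expression B", "expression C"]

scores = forward([1.0, 1.0], W_HIDDEN, W_OUT)
print(select_outputs(scores, LABELS, threshold=0.6))  # ['expression A']
```

Lowering the threshold admits more candidates. With `threshold=0.4`, "expression C" (score 0.5) is also selected.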

The sound learning model stores, for example, an audio association that takes audio data as input data and audio feature data as output data. The audio association holds the degrees of association between the input data and the output data. A degree of association indicates how strongly input data and output data are connected; for example, the higher the degree, the stronger the connection. Degrees of association may be expressed with three or more values (three or more levels), such as percentages, or with two values (two levels).

For example, the audio association is built from the degree of connection between many-to-many information (multiple input data versus multiple output data). The audio association is updated as appropriate during machine learning. It represents, for example, a function (classifier) optimized on the basis of multiple input data and multiple output data. The audio association may also have multiple degrees of association, each indicating the degree of connection between individual pieces of data. When the database is built as a neural network, for example, the degrees of association can correspond to weight variables.

Accordingly, the content reproduction system 100 selects output data suited to the input data by using, for example, the audio association, which reflects all the results determined by the classifier. Output data suited to the input data can thus be selected quantitatively in every case: when the input data is identical to the output data, when it is similar, and when it is dissimilar.

As shown in FIG. 9, for example, the audio association may indicate the degree of connection between multiple output data and multiple input data. In this case, the audio association can store a relationship degree for each of the output data ("audio feature data A" to "audio feature data C" in FIG. 9) with respect to each of the input data ("audio data A" to "audio data C" in FIG. 9). Through the audio association, one output data can therefore be linked to multiple input data. This enables multifaceted selection of output data for given input data.

The audio association has, for example, multiple degrees of association, each linking one output data to one input data. A degree of association is expressed in three or more levels, for example as a percentage, on a 10-level scale, or on a 5-level scale. It may also be drawn with a line attribute such as thickness. For example, "audio feature data A" in the output data has a degree of association AA of "73%" with "audio data A" in the input data, and a degree of association AB of "12%" with "audio data B" in the input data. In other words, a degree of association indicates how strongly two pieces of data are connected: the higher the degree, the stronger the connection.
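
The text allows a degree of association to be expressed as a percentage, on a 10-level or 5-level scale, or as two values. The sketch below shows one possible mapping between these forms. It assumes equal-width bins over 0 to 100 percent; the source does not state a binning rule, and the function name is hypothetical.

```python
def to_levels(percent, levels):
    """Map a 0-100 percent degree of association onto levels 1..levels.

    levels=10 or levels=5 gives the multi-level scales mentioned in the text;
    levels=2 gives the binary (two-level) case. Equal-width bins are assumed.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    if levels < 2:
        raise ValueError("at least two levels are required")
    step = 100 / levels
    # 100 percent would land one bin past the top, so clamp it to the highest level.
    return min(int(percent // step) + 1, levels)


for scale in (10, 5, 2):
    print(scale, to_levels(73, scale), to_levels(12, scale))
# 10 8 2
# 5 4 1
# 2 2 1
```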

The sound learning model may also be trained with at least one hidden layer between the input data and the output data. The degrees of association described above are set on the input data, on the hidden-layer data, or on both. They act as weights for each piece of data, and the output is selected on that basis. An output may then be selected when its degree of association exceeds a given threshold.

The visual learning model stores, for example, an image association that takes image data as input data and image feature data as output data. The image association holds the degrees of association between the input data and the output data. A degree of association indicates how strongly input data and output data are connected; for example, the higher the degree, the stronger the connection. Degrees of association may be expressed with three or more values (three or more levels), such as percentages, or with two values (two levels).

For example, the image association is built from the degree of connection between many-to-many information (multiple input data versus multiple output data). The image association is updated as appropriate during machine learning. It represents, for example, a function (classifier) optimized on the basis of multiple input data and multiple output data. The image association may also have multiple degrees of association, each indicating the degree of connection between individual pieces of data. When the database is built as a neural network, for example, the degrees of association can correspond to weight variables.
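
The text states that a degree of association can correspond to a weight variable but does not say how the two are converted. As one hedged illustration, the raw weights linking one image data to several image feature data can be normalized into percentages with a softmax. This choice, the function name, and the weight values are assumptions, not part of the described system.

```python
import math


def weights_to_percent(weights):
    """Convert raw weights for one input into percent degrees via softmax.

    Assumption: the source does not specify how weights map to percentages;
    softmax is used here only because it yields positive values summing to 100.
    """
    peak = max(weights.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(w - peak) for label, w in weights.items()}
    total = sum(exps.values())
    return {label: 100.0 * e / total for label, e in exps.items()}


# Hypothetical weights linking "image data A" to three image feature data.
degrees = weights_to_percent(
    {"image feature data A": 2.0, "image feature data B": 0.5, "image feature data C": -1.0}
)
for label, pct in degrees.items():
    print(f"{label}: {pct:.1f}%")
```

A larger weight always yields a larger percentage, so the ordering of the associations is preserved.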

Accordingly, the content reproduction system 100 selects output data suited to the input data by using, for example, the image association, which reflects all the results determined by the classifier. Output data suited to the input data can thus be selected quantitatively in every case: when the input data is identical to the output data, when it is similar, and when it is dissimilar.

As shown in FIG. 10, for example, the image association may indicate the degree of connection between multiple output data and multiple input data. In this case, the image association can store a relationship degree for each of the output data ("image feature data A" to "image feature data C" in FIG. 10) with respect to each of the input data ("image data A" to "image data C" in FIG. 10). Through the image association, one output data can therefore be linked to multiple input data. This enables multifaceted selection of output data for given input data.

The image association has, for example, multiple degrees of association, each linking one output data to one input data. A degree of association is expressed in three or more levels, for example as a percentage, on a 10-level scale, or on a 5-level scale. It may also be drawn with a line attribute such as thickness. For example, "image feature data A" in the output data has a degree of association AA of "73%" with "image data A" in the input data, and a degree of association AB of "12%" with "image data B" in the input data. In other words, a degree of association indicates how strongly two pieces of data are connected: the higher the degree, the stronger the connection.

The visual learning model may also be trained with at least one hidden layer between the input data and the output data. The degrees of association described above are set on the input data, on the hidden-layer data, or on both. They act as weights for each piece of data, and the output is selected on that basis. An output may then be selected when its degree of association exceeds a given threshold.

The text learning model stores, for example, a text association that takes text data as input data and text feature data as output data. The text association holds the degrees of association between the input data and the output data. A degree of association indicates how strongly input data and output data are connected; for example, the higher the degree, the stronger the connection. Degrees of association may be expressed with three or more values (three or more levels), such as percentages, or with two values (two levels).

For example, the text association is built from the degree of connection between many-to-many information (multiple input data versus multiple output data). The text association is updated as appropriate during machine learning. It represents, for example, a function (classifier) optimized on the basis of multiple input data and multiple output data. The text association may also have multiple degrees of association, each indicating the degree of connection between individual pieces of data. When the database is built as a neural network, for example, the degrees of association can correspond to weight variables.

Accordingly, the content reproduction system 100 selects output data suited to the input data by using, for example, the text association, which reflects all the results determined by the classifier. Output data suited to the input data can thus be selected quantitatively in every case: when the input data is identical to the output data, when it is similar, and when it is dissimilar.

As shown in FIG. 11, for example, the text association may indicate the degree of connection between multiple output data and multiple input data. In this case, the text association can store a relationship degree for each of the output data ("text feature data A" to "text feature data C" in FIG. 11) with respect to each of the input data ("text data A" to "text data C" in FIG. 11). Through the text association, one output data can therefore be linked to multiple input data. This enables multifaceted selection of output data for given input data.

The text association has, for example, multiple degrees of association, each linking one output data to one input data. A degree of association is expressed in three or more levels, for example as a percentage, on a 10-level scale, or on a 5-level scale. It may also be drawn with a line attribute such as thickness. For example, "text feature data A" in the output data has a degree of association AA of "73%" with "text data A" in the input data, and a degree of association AB of "12%" with "text data B" in the input data. In other words, a degree of association indicates how strongly two pieces of data are connected: the higher the degree, the stronger the connection.

The text learning model may also be trained with at least one hidden layer between the input data and the output data. The degrees of association described above are set on the input data, on the hidden-layer data, or on both. They act as weights for each piece of data, and the output is selected on that basis. An output may then be selected when its degree of association exceeds a given threshold.

<Content Reproduction Device 1>
Next, an example of the content reproduction device 1 according to the present embodiment is described with reference to FIGS. 12 and 13. FIG. 12(a) is a schematic diagram showing an example of the configuration of the content reproduction device 1 according to this embodiment. FIG. 12(b) is a schematic diagram showing an example of its functions. FIG. 12(c) is a schematic diagram showing an example of the DB generation unit 16. FIG. 13 is a schematic diagram showing an example of the processing unit 12.

An electronic device such as a laptop (notebook) PC or a desktop PC, for example, is used as the content reproduction device 1. As shown in FIG. 12(a), for example, the content reproduction device 1 includes a housing 10, a CPU (Central Processing Unit) 101, a ROM (Read Only Memory) 102, a RAM (Random Access Memory) 103, a storage unit 104, and I/Fs 105 to 107. The components 101 to 107 are connected by an internal bus 110.

The CPU 101 controls the entire content reproduction device 1. The ROM 102 stores the operation code of the CPU 101. The RAM 103 is a work area used while the CPU 101 operates. The storage unit 104 stores various information such as databases and data to be learned. A data storage device such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive), for example, is used as the storage unit 104. The content reproduction device 1 may also include, for example, a GPU (Graphics Processing Unit), which is not shown.

The I/F 105 is an interface for exchanging various information with the terminal 2, the server 3, websites, and the like via the communication network 4 as needed. The I/F 106 is an interface for exchanging information with the input unit 108. A keyboard, for example, is used as the input unit 108. A user of the content reproduction device 1 enters various information, control commands for the content reproduction device 1, and the like via the input unit 108. The I/F 107 is an interface for exchanging various information with the display unit 109. The display unit 109 displays various information, content, and the like stored in the storage unit 104. A display is used as the display unit 109; in the case of a touch panel, for example, it is integrated with the input unit 108. A speaker may also be used as the display unit 109.

FIG. 12(b) is a schematic diagram showing an example of the functions of the content reproduction device 1. The content reproduction device 1 includes an acquisition unit 11, a processing unit 12, a generation unit 13, an output unit 14, and a memory unit 15. It may further include, for example, a DB generation unit 16. As shown in FIG. 12(c), for example, the DB generation unit 16 includes a first internal representation database generation unit 161, a second internal representation database generation unit 162, and a representation database generation unit 163. The functions shown in FIGS. 12(b), 12(c), and 13 are implemented by the CPU 101 executing a program stored in the storage unit 104 or the like, using the RAM 103 as a work area. These functions may be controlled by, for example, artificial intelligence.

<<Acquisition Unit 11>>
The acquisition unit 11 acquires stimulus data. The acquired data is used to generate the expression data described above. The acquisition unit 11 acquires, for example, text data, image data, and audio data entered via the input unit 108. It may also acquire text data, image data, and audio data from the terminal 2 or the like via the communication network 4.

The acquisition unit 11 may also acquire, for example, the learning data used to generate the various databases described above. It may acquire this learning data either as input via the input unit 108 or from the terminal 2 or the like via the communication network 4.

For example, the first internal representation learning data used to generate the first internal representation database includes past user data and internal representation data. Likewise, the learning data used to generate the representation database (representation learning data) includes, for example, expression data.

<<Processing Unit 12>>
The processing unit 12 acquires expression data corresponding to the stimulus data. To do so, it refers to, for example, the sound learning model, the visual learning model, the text learning model, the first internal representation database, the second internal representation database, and the representation database.

As shown in FIG. 13, the processing unit 12 includes an audio processing unit 121, an image processing unit 122, and a text processing unit 123, which are connected to the acquisition unit 11. The processing unit 12 also includes a first internal representation processing unit 124 and a second internal representation processing unit 125. These are connected to the audio processing unit 121, the image processing unit 122, and the text processing unit 123. The processing unit 12 further includes a representation processing unit 126 connected to the first internal representation processing unit 124 and the second internal representation processing unit 125.
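
The connections of FIG. 13 amount to a three-stage data flow. The sketch below wires hypothetical callables in that order: units 121 to 123, then 124 and 125, then 126. The class name, field names, and toy lambdas are invented for illustration; they stand in for the learned models and databases.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class ProcessingUnit:
    """Hypothetical wiring of processing unit 12 following FIG. 13."""

    audio: Callable[[str], str]                       # audio processing unit 121
    image: Callable[[str], str]                       # image processing unit 122
    text: Callable[[str], str]                        # text processing unit 123
    first_internal: Callable[[Dict[str, str]], str]   # first internal representation unit 124
    second_internal: Callable[[Dict[str, str]], str]  # second internal representation unit 125
    representation: Callable[[str, str], str]         # representation processing unit 126

    def run(self, audio: Optional[str] = None, image: Optional[str] = None,
            text: Optional[str] = None) -> str:
        # Stage 1: each modality present in the stimulus data goes to its own unit.
        features: Dict[str, str] = {}
        if audio is not None:
            features["audio"] = self.audio(audio)
        if image is not None:
            features["image"] = self.image(image)
        if text is not None:
            features["text"] = self.text(text)
        if not features:
            raise ValueError("at least one of audio, image, or text is required")
        # Stage 2: both internal representation units receive the same features.
        first = self.first_internal(features)
        second = self.second_internal(features)
        # Stage 3: the representation unit combines the two internal results.
        return self.representation(first, second)


# Toy stand-ins for the learned models, only to show the data flow.
unit = ProcessingUnit(
    audio=lambda a: "audio-features(" + a + ")",
    image=lambda i: "image-features(" + i + ")",
    text=lambda t: "text-features(" + t + ")",
    first_internal=lambda f: "self-recognition[" + ",".join(sorted(f)) + "]",
    second_internal=lambda f: "emotion[" + ",".join(sorted(f)) + "]",
    representation=lambda s, e: s + " + " + e,
)
print(unit.run(text="hello"))  # self-recognition[text] + emotion[text]
```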

The audio processing unit 121 refers to, for example, the sound learning model and acquires audio feature data corresponding to the audio data. For example, it selects the audio feature data with the highest audio association to the audio data as the first audio feature data. Alternatively, it may select multiple audio feature data whose degree of association is equal to or greater than a preset threshold as the first audio feature data. The number of audio feature data to be selected can be set arbitrarily.

The image processing unit 122 refers to, for example, the visual learning model and acquires image feature data corresponding to the image data. For example, it selects the image feature data with the highest image association to the image data as the first image feature data. Alternatively, it may select multiple image feature data whose degree of association is equal to or greater than a preset threshold as the first image feature data. The number of image feature data to be selected can be set arbitrarily.

The text processing unit 123 refers to, for example, the text learning model and acquires text feature data corresponding to the text data. For example, it selects the text feature data with the highest text association to the text data as the first text feature data. Alternatively, it may select multiple text feature data whose degree of association is equal to or greater than a preset threshold as the first text feature data. The number of text feature data to be selected can be set arbitrarily.
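
Units 121 to 123 share one selection rule: take the highest degree only, or take every candidate at or above a preset threshold, optionally capped at an arbitrary count. That rule can be sketched as follows. The function name, parameters, and example degrees are hypothetical.

```python
def select_features(degrees, threshold=None, max_count=None):
    """Select feature data from a {label: degree of association} mapping.

    threshold=None: return only the single highest-degree label.
    threshold=t:    return every label with degree >= t, strongest first.
    max_count:      optional cap on how many labels are returned.
    """
    ranked = sorted(degrees.items(), key=lambda item: item[1], reverse=True)
    if threshold is None:
        chosen = ranked[:1]
    else:
        chosen = [item for item in ranked if item[1] >= threshold]
    if max_count is not None:
        chosen = chosen[:max_count]
    return [label for label, _ in chosen]


# Hypothetical degrees between one piece of text data and three text feature data.
degrees = {"text feature data A": 73, "text feature data B": 12, "text feature data C": 40}
print(select_features(degrees))                             # ['text feature data A']
print(select_features(degrees, threshold=30))               # ['text feature data A', 'text feature data C']
print(select_features(degrees, threshold=30, max_count=1))  # ['text feature data A']
```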

The first internal representation processing unit 124 refers to, for example, the first internal representation database. Its input is any one or more of the following: audio data including audio feature data, image data including image feature data, and text data including text feature data. From that input, it acquires one or more types of data included in the corresponding internal representation data (first internal representation data). For example, the first internal representation processing unit 124 takes text data as input data. It then acquires, as the first internal representation data, the output data computed by referring to the first internal representation database.

The first internal representation processing unit 124 selects, for example, the first internal representation data with the highest first association to the text data. Alternatively, it may select first internal representation data whose degree of association is equal to or greater than a preset threshold. The number of first internal representation data to be selected can be set arbitrarily.

The second internal representation processing unit 125 refers to, for example, the second internal representation database. Its input is any one or more of the following: audio data including audio feature data, image data including image feature data, and text data including text feature data. From that input, it acquires one or more types of data included in the corresponding internal representation data (second internal representation data). For example, the second internal representation processing unit 125 takes text data as input data. It then acquires, as the second internal representation data, the output data computed by referring to the second internal representation database.

The second internal representation processing unit 125 selects, for example, the second internal representation data with the highest second association to the text data. Alternatively, it may select second internal representation data whose degree of association is equal to or greater than a preset threshold. The number of second internal representation data to be selected can be set arbitrarily.

The representation processing unit 126 refers to, for example, the representation database. It takes the first internal representation data and the second internal representation data as input and acquires expression data corresponding to that input. For example, the representation processing unit 126 takes as input data the self-recognition data included in the first internal representation data and the emotional expression data included in the second internal representation data. It then acquires, as the expression data, the output data computed by referring to the representation database.

The representation processing unit 126 selects, for example, the expression data with the highest third association to the self-recognition data and the emotional expression data. Alternatively, it may select expression data whose degree of association is equal to or greater than a preset threshold. The number of expression data to be selected can be set arbitrarily.
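
The input to the representation processing unit 126 is a pair of self-recognition data and emotional expression data. The minimal sketch below assumes the third association is stored as a table keyed by that pair. Every label and percentage is hypothetical.

```python
# Hypothetical third-association table for the representation database:
# (self-recognition data, emotional expression data) -> {expression data: percent}.
EXPRESSION_ASSOCIATION = {
    ("confident", "joy"): {"broad smile, fast speech": 81, "nod": 35},
    ("confident", "anger"): {"raised voice": 66, "frown": 52},
    ("uncertain", "joy"): {"shy smile": 70, "nod": 44},
}


def select_expression(self_recognition, emotion, threshold=None):
    """Pick expression data for one pair of internal representation data."""
    degrees = EXPRESSION_ASSOCIATION.get((self_recognition, emotion), {})
    if not degrees:
        return []  # no stored association for this combination
    if threshold is None:
        return [max(degrees, key=degrees.get)]
    above = [label for label, deg in degrees.items() if deg >= threshold]
    return sorted(above, key=degrees.get, reverse=True)


print(select_expression("confident", "joy"))                  # ['broad smile, fast speech']
print(select_expression("confident", "anger", threshold=50))  # ['raised voice', 'frown']
```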

<<Generation Unit 13>>
The generation unit 13 generates at least one piece of pseudo data based on the expression data acquired by the processing unit 12. For example, the generation unit 13 generates pseudo data including audio and images based on the expression data acquired by the representation processing unit 126. Generating pseudo data makes it possible to output character expressions that are not stored in the memory unit 15. The generation unit 13 may use known techniques to generate the pseudo data.

<<Output Unit 14>>
The output unit 14 outputs the expression data. It may also output, for example, the pseudo data generated by the generation unit 13. The output unit 14 outputs the expression data to the display unit 109 via the I/F 107, and also outputs it, for example, to the terminal 2 or the like via the I/F 105.

<<Memory Unit 15>>
The memory unit 15 retrieves, as needed, various data such as the databases saved in the storage unit 104. It also saves, as needed, various data acquired or generated by the components 11 to 14 and 16 in the storage unit 104.

<<DB Generation Unit 16>>
The DB generation unit 16 generates databases by machine learning using multiple learning data. The machine learning uses, for example, the neural network described above.

The DB generation unit 16 includes, for example, a first internal representation database generation unit 161, a second internal representation database generation unit 162, and a representation database generation unit 163.

The first internal representation database generation unit 161 uses, for example, a pair of user data and first internal representation data as first internal representation learning data. It generates the first internal representation database by machine learning using multiple first internal representation learning data.

The second internal representation database generation unit 162 generates a second internal representation database by machine learning using a plurality of pieces of second internal representation learning data, each of which is, for example, a pair of user data and second internal representation data.

The expression database generation unit 163 generates an expression database by machine learning using a plurality of pieces of expression learning data, each of which is, for example, a set of first internal representation data, second internal representation data, and expression data.
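To make the three kinds of learning data concrete, here is a minimal Python sketch of how one annotated record could be split into the training sets for units 161, 162 and 163. The record layout, the class names and the restriction to self-recognition and emotional-expression labels are assumptions chosen for illustration; the source fixes none of them.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple

# Hypothetical record types; the field names are illustrative, not from the source.
@dataclass(frozen=True)
class FirstRepresentationSample:
    user_features: Tuple[float, ...]  # input: user data (e.g. text/image/audio features)
    self_recognition: str             # output: first internal representation data

@dataclass(frozen=True)
class SecondRepresentationSample:
    user_features: Tuple[float, ...]  # input: the same user data
    emotional_expression: str         # output: a different kind of internal representation data

@dataclass(frozen=True)
class ExpressionSample:
    self_recognition: str             # input 1: first internal representation data
    emotional_expression: str         # input 2: second internal representation data
    expression: str                   # output: expression data of the character

# (user features, self-recognition label, emotion label, expression label)
Record = Tuple[Sequence[float], str, str, str]

def split_learning_data(records: Iterable[Record]):
    """Split annotated records into the learning data for units 161, 162 and 163."""
    first: List[FirstRepresentationSample] = []
    second: List[SecondRepresentationSample] = []
    expression: List[ExpressionSample] = []
    for features, self_rec, emotion, expr in records:
        frozen = tuple(features)  # tuples keep the samples immutable and hashable
        first.append(FirstRepresentationSample(frozen, self_rec))
        second.append(SecondRepresentationSample(frozen, emotion))
        expression.append(ExpressionSample(self_rec, emotion, expr))
    return first, second, expression

first, second, expr = split_learning_data(
    [([0.9, 0.1], "laughter", "anger", "strained smile")]
)
print(first[0], second[0], expr[0], sep="\n")
```

The same user features appear in both the first and the second training set, paired with different kinds of output data, which is the distinction the two generation units rely on.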

<Terminal 2>
The terminal 2 is owned by, for example, a user of a service that uses the content reproduction system 100, and is connected to the content reproduction device 1 via the communication network 4. The terminal 2 may, for example, be an electronic device that generates a database. An electronic device such as a personal computer or a tablet terminal is used as the terminal 2, for example. The terminal 2 may also have at least some of the functions of the content reproduction device 1.

<Server 3>
The server 3 is connected to the content reproduction device 1 via the communication network 4. The server 3 stores various past data and the like, and the content reproduction device 1 sends various data to it as necessary. The server 3 may have at least some of the functions of the content reproduction device 1 and may, for example, perform at least part of the processing in place of the content reproduction device 1. The server 3 may also store at least part of the various data stored in the storage unit 104 of the content reproduction device 1 and be used in place of the storage unit 104.

<Communication network 4>
The communication network 4 is, for example, the Internet or a similar network to which the content reproduction device 1 is connected via a communication circuit. The communication network 4 may be an optical fiber communication network. It may also be realized with known communication technology, either wired or wireless.

(Embodiment: Learning method)
Next, an example of the learning method according to the embodiment is described. FIG. 14 is a flowchart showing an example of the learning method according to this embodiment.

The learning method includes an acquisition step S110, a first internal representation database generation step S120, a second internal representation database generation step S130, and an expression database generation step S140.

<Acquisition step S110>
In the acquisition step S110, user data, first internal representation data, second internal representation data, and expression data are acquired. As the user data, the acquisition step S110 may acquire, for example, an interview video in which the user is asked questions about himself or herself. The acquisition step S110 may also acquire, as the user data, audio feature amount data for the audio data included in the user data by referring to a sound learning model, image feature amount data for the image data included in the user data by referring to a visual learning model, and text feature amount data for the text data included in the user data by referring to a text learning model. Alternatively, the acquisition step S110 may acquire the text feature amount data without a text learning model, using a known technique such as principal component analysis, morphological analysis, or random-forest classification; the audio feature amount data without a sound learning model, using a known technique such as MFCC (Mel-Frequency Cepstrum Coefficients); and the image feature amount data without a visual learning model, using a known technique such as SIFT (Scale-Invariant Feature Transform).

In the acquisition step S110, any of the image data, audio data, and text data that the user has posted on a social networking service or the like may also be acquired as the user data. In the acquisition step S110, for example, the acquisition unit 11 acquires each of the data described above. The acquisition unit 11 acquires the user data, two or more types of data included in the internal representation data, and the expression data from, for example, the terminal 2, or may acquire them from the storage unit 104 via, for example, the storage unit 15. Only text data describing information about the user may be acquired as the user data; however, acquiring text data describing information about the user together with image data including images of the user and audio data of the user's voice makes it possible to learn, for example, the user's visual and auditory expressions, which enables more accurate learning.
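As an illustration of a model-free text feature of the kind mentioned above, the sketch below computes a relative term-frequency vector over a fixed vocabulary. It is a simplification, not the patent's method: whitespace tokenisation stands in for morphological analysis, and the function name `text_features` and its parameters are hypothetical.

```python
from collections import Counter
from typing import List, Sequence

def text_features(text: str, vocabulary: Sequence[str]) -> List[float]:
    """Relative frequency of each (lower-case) vocabulary word in `text`.

    Whitespace tokenisation is a stand-in for morphological analysis; Japanese
    text would need a proper morphological analyser instead.
    """
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty text
    return [counts[word] / total for word in vocabulary]

# Seven tokens: "my" occurs twice, "dog" once, "angry" never.
print(text_features("I love my dog and my cat", ["my", "dog", "angry"]))
```

A fixed vocabulary keeps every vector the same length, which is what a downstream database trained on these vectors would need.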

<First internal representation database generation step S120>
Next, in the first internal representation database generation step S120, a first internal representation database is generated by machine learning that uses pairs of user data and first internal representation data as the first internal representation learning data. For example, the first internal representation database generation unit 161 generates the first internal representation database by known machine learning and stores it in the storage unit 104 via, for example, the storage unit 15. The generated first internal representation database may also be transmitted to, for example, the server 3 or another content reproduction device 1. The first internal representation learning data may include multiple (for example, about 1,000) pairs of user data and one or more types of data included in the internal representation data.
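The source only says that "known machine learning" (for example, a neural network) is used. As a minimal stand-in, the sketch below trains a nearest-centroid classifier on (user feature vector, first internal representation label) pairs; the resulting centroids play the role of the first internal representation database. The function names and the choice of nearest-centroid are assumptions.

```python
from math import dist
from typing import Dict, Iterable, List, Sequence, Tuple

Pair = Tuple[Sequence[float], str]  # (user feature vector, first internal representation label)

def train_first_database(pairs: Iterable[Pair]) -> Dict[str, List[float]]:
    """Nearest-centroid 'training': average the feature vectors seen for each label."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for features, label in pairs:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [v / counts[label] for v in total] for label, total in sums.items()}

def query(database: Dict[str, List[float]], features: Sequence[float]) -> str:
    """Return the label whose centroid is nearest (Euclidean distance) to `features`."""
    return min(database, key=lambda label: dist(database[label], features))

db = train_first_database(
    [((1.0, 0.0), "laughter"), ((0.8, 0.2), "laughter"), ((0.0, 1.0), "tension")]
)
print(query(db, (0.7, 0.1)))
```

A neural network would replace both functions in practice; the sketch only shows the input/output contract of the step.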

<Second internal representation database generation step S130>
Next, in the second internal representation database generation step S130, a second internal representation database is generated by machine learning that uses pairs of user data and second internal representation data as the second internal representation learning data. For example, the second internal representation database generation unit 162 generates the second internal representation database by known machine learning and stores it in the storage unit 104 via, for example, the storage unit 15. The generated second internal representation database may also be transmitted to, for example, the server 3 or another content reproduction device 1. The second internal representation learning data may include multiple (for example, about 1,000) pairs of user data and one or more types of data included in the internal representation data. Referring to the first internal representation database and the second internal representation database to acquire, independently of each other, internal representation data containing different types of data makes it possible to obtain internal representation data that reflects the user's multifaceted emotions. For example, by acquiring "laughter" as the self-recognition data included in the internal representation data using the first internal representation database, and "anger" as the emotional expression data included in the internal representation data using the second internal representation database, internal representation data representing the user's multifaceted emotions can be learned.
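The point of steps S120 and S130 is that the same user data is paired with different kinds of output data and learned independently. The sketch below illustrates this with a hypothetical 1-nearest-neighbour "database" (any classifier would do); the labels and feature values are made up for illustration.

```python
from math import dist
from typing import Iterable, List, Sequence, Tuple

Example = Tuple[Sequence[float], str]

def build_database(examples: Iterable[Example]) -> List[Example]:
    """1-nearest-neighbour 'database': training simply stores the labelled examples."""
    return list(examples)

def lookup(database: List[Example], features: Sequence[float]) -> str:
    """Label of the stored example closest to `features` (Euclidean distance)."""
    _, label = min(database, key=lambda example: dist(example[0], features))
    return label

# The same user data is paired with two different kinds of output data.
user_features = [(0.9, 0.1), (0.1, 0.9)]
self_recognition = ["laughter", "composure"]   # first internal representation data
emotional_expression = ["anger", "calm"]       # second internal representation data

first_db = build_database(zip(user_features, self_recognition))
second_db = build_database(zip(user_features, emotional_expression))

stimulus = (0.8, 0.2)
print(lookup(first_db, stimulus), lookup(second_db, stimulus))  # laughter anger
```

Because the two databases are trained separately, they can return outwardly conflicting labels ("laughter" and "anger") for the same input, which is the multifaceted result described above.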

<Expression database generation step S140>
Next, in the expression database generation step S140, an expression database is generated by machine learning using a plurality of pieces of expression learning data, each of which is a set of first internal representation data, second internal representation data, and expression data. For example, the expression database generation unit 163 generates the expression database by known machine learning and stores it in the storage unit 104 via, for example, the storage unit 15. The generated expression database may also be transmitted to, for example, the server 3 or another content reproduction device 1. The expression learning data may include multiple (for example, about 1,000) sets of first internal representation data, second internal representation data, and expression data. Using both the first internal representation data and the second internal representation data as input data makes it possible to obtain expression data based on multifaceted emotions.
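One simple way to picture the expression database is a table learned from (first internal representation, second internal representation, expression) triples by majority vote. This is an assumed stand-in for the machine learning described above, and the labels are illustrative only.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Tuple

Triple = Tuple[str, str, str]  # (first internal rep., second internal rep., expression)

def train_expression_table(samples: Iterable[Triple]) -> Dict[Tuple[str, str], str]:
    """Map each (first, second) pair to the expression seen most often with it."""
    votes: Dict[Tuple[str, str], Counter] = defaultdict(Counter)
    for first, second, expression in samples:
        votes[(first, second)][expression] += 1
    return {pair: counter.most_common(1)[0][0] for pair, counter in votes.items()}

def express(table: Dict[Tuple[str, str], str], first: str, second: str) -> Optional[str]:
    """Look up the expression for a pair; None if the pair was never seen in training."""
    return table.get((first, second))

table = train_expression_table([
    ("laughter", "anger", "strained smile"),
    ("laughter", "anger", "strained smile"),
    ("laughter", "anger", "frown"),
    ("laughter", "joy", "broad smile"),
])
print(express(table, "laughter", "anger"))
```

Unlike a trained model, this table cannot generalise to unseen pairs; it only makes the two-input, one-output shape of the expression database explicit.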

The learning method described above is merely an example; the timing of learning, the order of the learning steps, and the like may be chosen freely. In addition to the first internal representation database and the second internal representation database, one or more further databases for outputting internal representation data may be generated by a learning method that differs from theirs in using a different type of data as output data. This makes it possible to evaluate the user's multifaceted emotions from more angles.

(First embodiment: Operation of the content reproduction system)
Next, an example of the operation of the content reproduction system 100 according to this embodiment is described. FIG. 15 is a flowchart showing an example of the operation of the content reproduction system 100 according to this embodiment.

<Acquisition means S210>
The acquisition means S210 acquires stimulus data input by a user or the like. In the acquisition means S210, for example, the acquisition unit 11 acquires the stimulus data, either from the terminal 2 or the like or from the storage unit 104 via, for example, the storage unit 15. The acquisition means S210 may acquire only audio data as the stimulus data, for example, or may acquire multiple types of data linked to a single item. For example, image data such as a video and the audio data linked to that image data may be acquired as the stimulus data.
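Stimulus data may bundle several linked modalities, such as the frames of a video and its audio track. A hypothetical container for that could look like the sketch below; the class name, the field names and the at-least-one-modality check are assumptions, the last loosely following the claim wording "any one or more of" text, image and audio data.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class StimulusData:
    """Hypothetical container for one stimulus whose modalities are linked to the same event."""
    text: Optional[str] = None
    image_frames: List[bytes] = field(default_factory=list)  # e.g. frames of a video
    audio: Optional[bytes] = None                             # audio track linked to the frames

    def __post_init__(self) -> None:
        if self.text is None and not self.image_frames and self.audio is None:
            raise ValueError("stimulus data needs at least one of text, image or audio")

    def modalities(self) -> List[str]:
        """Names of the modalities actually present, in a fixed order."""
        present = []
        if self.text is not None:
            present.append("text")
        if self.image_frames:
            present.append("image")
        if self.audio is not None:
            present.append("audio")
        return present

video_clip = StimulusData(image_frames=[b"frame0", b"frame1"], audio=b"pcm")
print(video_clip.modalities())
```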

<Feature amount processing means S220>
The feature amount processing means S220 refers to, for example, the sound learning model, the visual learning model, and the text learning model, and acquires the audio feature amount data, image feature amount data, and text feature amount data corresponding to the stimulus data acquired by the acquisition means S210. Specifically, the feature amount processing means S220 may acquire audio feature amount data for the audio data included in the stimulus data by referring to the sound learning model, image feature amount data for the image data included in the stimulus data by referring to the visual learning model, and text feature amount data for the text data included in the stimulus data by referring to the text learning model. Alternatively, it may acquire the text feature amount data without a text learning model, using a known technique such as principal component analysis, morphological analysis, or random-forest classification; the audio feature amount data without a sound learning model, using a known technique such as MFCC (Mel-Frequency Cepstrum Coefficients); and the image feature amount data without a visual learning model, using a known technique such as SIFT (Scale-Invariant Feature Transform).

The feature amount processing means S220 may also store the acquired audio data and feature point data in the storage unit 104 via, for example, the storage unit 15. Each piece of acquired data may also be transmitted to, for example, the server 3 or another content reproduction device 1. The acquired data may be pseudo-generated data, and multiple pieces of data may be acquired for a single piece of text data, for example. Using multiple types of data linked to a single item as input data makes it possible, for example, to calculate feature amounts from the modalities in combination, so that expression data can be acquired more accurately.
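How feature amounts are calculated "in combination" is not specified further in the source. One common reading is early fusion: concatenating the per-modality feature vectors in a fixed order. The sketch below assumes that reading and zero-fills missing modalities so that every fused vector has the same length; `fuse_features` and its arguments are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

def fuse_features(per_modality: Dict[str, Optional[Sequence[float]]],
                  dims: Dict[str, int]) -> List[float]:
    """Concatenate text, image and audio feature vectors in a fixed order.

    A missing modality is filled with zeros so that every fused vector has the
    same length and can be fed to the same database.
    """
    fused: List[float] = []
    for name in ("text", "image", "audio"):
        vector = per_modality.get(name)
        if vector is None:
            fused.extend([0.0] * dims[name])
        else:
            if len(vector) != dims[name]:
                raise ValueError(f"{name} features must have length {dims[name]}")
            fused.extend(float(v) for v in vector)
    return fused

# Text and audio present, image missing.
print(fuse_features({"text": [0.2, 0.8], "audio": [0.5]}, {"text": 2, "image": 3, "audio": 1}))
```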

<Internal representation processing means S230>
The internal representation processing means S230 refers to, for example, the first internal representation database and the second internal representation database, and acquires the first internal representation data and the second internal representation data corresponding to the text data including the text feature amount data, the image data including the image feature amount data, and the audio data including the audio feature amount data acquired by the feature amount processing means S220. In the internal representation processing means S230, for example, the first internal representation processing unit 124 refers to the first internal representation database and acquires self-recognition data corresponding to the text feature amount data, and the second internal representation processing unit 125 refers to the second internal representation database and acquires emotional expression data corresponding to the text feature amount data. The internal representation processing means S230 may store the acquired internal representation data in the storage unit 104 via, for example, the storage unit 15, and the acquired internal representation data may also be transmitted to, for example, the server 3 or another content reproduction device 1. Multiple pieces of data may be acquired for a single piece of input data. In addition, the internal representation processing means S230 may acquire internal representation data containing more types of data by also using one or more further databases for outputting internal representation data, generated by a learning method that differs from those of the first and second internal representation databases in using a different type of output data. This allows the user's emotions to be judged from more angles.
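The sketch below shows S230 querying several internal representation databases independently with the same features, including one optional extra database with a further output type (here, causal relationship data). The 1-nearest-neighbour lookup, the dictionary keys and all values are illustrative assumptions.

```python
from math import dist
from typing import Dict, List, Sequence, Tuple

Database = List[Tuple[Sequence[float], str]]  # hypothetical 1-nearest-neighbour database

def lookup(database: Database, features: Sequence[float]) -> str:
    """Label of the stored example closest to `features`."""
    return min(database, key=lambda example: dist(example[0], features))[1]

def acquire_internal_representations(databases: Dict[str, Database],
                                     features: Sequence[float]) -> Dict[str, str]:
    """Query each internal representation database independently with the same features."""
    return {kind: lookup(db, features) for kind, db in databases.items()}

databases = {
    "self_recognition": [((0.9, 0.1), "laughter"), ((0.1, 0.9), "composure")],
    "emotional_expression": [((0.9, 0.1), "anger"), ((0.1, 0.9), "calm")],
    # Optional further database with yet another output type.
    "causal_relationship": [((0.9, 0.1), "teased by a friend"), ((0.1, 0.9), "nothing happened")],
}
print(acquire_internal_representations(databases, (0.8, 0.2)))
```

Adding a database only means adding a dictionary entry; each one is consulted on its own, so none depends on another's output.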

<Expression processing means S240>
The expression processing means S240 refers to, for example, the expression database, takes as input the first internal representation data and the second internal representation data acquired by the internal representation processing means S230, and acquires the expression data corresponding to that input. In the expression processing means S240, the expression processing unit 126 takes, for example, the self-recognition data included in the first internal representation data and the emotional expression data included in the second internal representation data as input data, and acquires the output data computed by referring to the expression database as the expression data. For example, if "laughter" is input as the self-recognition data included in the first internal representation data, "anger" as the emotional expression data included in the second internal representation data, and "self-recognition" as priority data, expression data based on multifaceted emotions can be obtained, such as that of a person who feels angry inside but laughs because they put the mood of the situation first.
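In the source, the combination of self-recognition, emotion and priority is resolved by the learned expression database. Purely to illustrate the "angry inside, laughing outside" example, the sketch below hard-codes a hypothetical rule: when priority is on self-recognition, the outward expression follows the self-recognition data while the emotion is kept as an inner state. This rule is an assumption, not part of the patent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Expression:
    outward: str  # what the character visibly shows
    inner: str    # what the character feels but does not show

def select_expression(self_recognition: str, emotion: str, priority: str) -> Expression:
    """Hypothetical rule standing in for the learned expression database."""
    if priority == "self-recognition":
        # Keep up appearances: show the self-recognised behaviour, hide the emotion.
        return Expression(outward=self_recognition, inner=emotion)
    if priority == "emotion":
        # Let the emotion show through directly.
        return Expression(outward=emotion, inner=emotion)
    raise ValueError(f"unknown priority: {priority!r}")

print(select_expression("laughter", "anger", "self-recognition"))
# Expression(outward='laughter', inner='anger')
```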

The expression processing means S240 may store the acquired expression data in the storage unit 104 via, for example, the storage unit 15, and the acquired expression data may also be transmitted to, for example, the server 3 or another content reproduction device 1. The acquired data may be pseudo data, and multiple pieces of data may be acquired for a single piece of input data.

<Output means S250>
In the output means S250, for example, the output unit 14 outputs the expression data acquired by the expression processing means S240 to the display unit 109, the terminal 2, or the like.

Performing each of the means described above completes the operation of the content reproduction system 100 according to this embodiment.
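Putting S220 to S250 together, the flow can be sketched as a single function that takes the databases as callables. The function and parameter names are hypothetical, and the toy lambdas in the usage example merely stand in for trained databases.

```python
from typing import Callable, List, Sequence

def reproduce(stimulus: str,
              extract: Callable[[str], Sequence[float]],
              first_db: Callable[[Sequence[float]], str],
              second_db: Callable[[Sequence[float]], str],
              expression_db: Callable[[str, str], str],
              output: Callable[[str], None]) -> str:
    """Run S220 to S250 once for a stimulus already acquired in S210."""
    features = extract(stimulus)                 # S220: feature amount processing
    first = first_db(features)                   # S230: first internal representation
    second = second_db(features)                 # S230: second internal representation
    expression = expression_db(first, second)    # S240: expression processing
    output(expression)                           # S250: output
    return expression

shown: List[str] = []
reproduce("you look happy today",
          extract=lambda text: [float(len(text.split()))],
          first_db=lambda f: "laughter" if f[0] > 2 else "composure",
          second_db=lambda f: "anger",
          expression_db=lambda a, b: f"{a} (hiding {b})",
          output=shown.append)
print(shown)
```

Passing the databases as callables keeps the flow independent of how each database was trained.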

While embodiments of the invention have been described, they have been presented by way of example and are not intended to limit the scope of the invention. These novel embodiments can be implemented in various other forms, and various omissions, replacements, and modifications can be made without departing from the gist of the invention. These embodiments and their modifications are included in the scope and gist of the invention, and in the invention described in the claims and its equivalents.

Reference Signs List
1: Content reproduction device
2: Terminal
3: Server
4: Communication network
10: Housing
11: Acquisition unit
12: Processing unit
13: Generation unit
14: Output unit
15: Storage unit
16: DB generation unit
100: Content reproduction system
101: CPU
102: ROM
103: RAM
104: Storage unit
105: I/F
106: I/F
107: I/F
108: Input unit
109: Display unit
110: Internal bus
121: Audio processing unit
122: Image processing unit
123: Text processing unit
124: First internal representation processing unit
125: Second internal representation processing unit
126: Expression processing unit
161: First internal representation database generation unit
162: Second internal representation database generation unit
163: Expression database generation unit
S110: Acquisition step
S120: First internal representation database generation step
S130: Second internal representation database generation step
S140: Expression database generation step
S210: Acquisition means
S220: Feature amount processing means
S230: Internal representation processing means
S240: Expression processing means
S250: Output means

A learning method according to a first invention is a learning method for generating a database used to generate expression data indicating an expression of a character, the method causing a computer to execute: an input data acquisition step of acquiring user data including one or more of text data describing information about a user, image data including an image of the user, and audio data relating to the user's voice; an output data acquisition step of acquiring internal representation data indicating the user's internal representation, the internal representation data including two or more types of data among self-recognition data indicating the user's self-recognition, priority data indicating the user's priorities regarding events, emotional expression data indicating the user's emotional expression regarding events, and causal relationship data indicating the user's estimation of causal relationships regarding events; a first internal representation database generation step of taking the user data acquired in the input data acquisition step as first input data and first internal representation data, which is one or more types of data included in the internal representation data, as first output data, and generating a first internal representation database by machine learning using a plurality of pieces of first internal representation learning data, each being a set of the first input data and the first output data; and a second internal representation database generation step of taking the user data acquired in the input data acquisition step as second input data and second internal representation data, which is one or more types of data included in the internal representation data and of a different type from the first output data in the first internal representation database generation step, as second output data, and generating a second internal representation database by machine learning using a plurality of pieces of second internal representation learning data, each being a set of the second input data and the second output data.

A learning method according to a second invention is the learning method according to the first invention, further causing the computer to execute an expression database generation step of generating an expression database that takes as input the first internal representation data generated using the first internal representation database and the second internal representation data generated using the second internal representation database, and outputs expression data indicating the expression of the character.

Claims

A learning method for generating a database used to generate expression data indicating an expression of a character, the method comprising:
an input data acquisition step of acquiring user data including one or more of text data describing information about a user, image data including an image of the user, and audio data relating to the user's voice;
an output data acquisition step of acquiring internal representation data indicating the user's internal representation, the internal representation data including two or more types of data among self-recognition data indicating the user's self-recognition, priority data indicating the user's priorities regarding events, emotional expression data indicating the user's emotional expression regarding events, and causal relationship data indicating the user's estimation of causal relationships regarding events;
a first internal representation database generation step of taking the user data acquired in the input data acquisition step as first input data and first internal representation data, which is one or more types of data included in the internal representation data, as first output data, and generating a first internal representation database by machine learning using a plurality of pieces of first internal representation learning data, each being a set of the first input data and the first output data; and
a second internal representation database generation step of taking the user data acquired in the input data acquisition step as second input data and second internal representation data, which is one or more types of data included in the internal representation data and of a different type from the first output data in the first internal representation database generation step, as second output data, and generating a second internal representation database by machine learning using a plurality of pieces of second internal representation learning data, each being a set of the second input data and the second output data.

The learning method according to claim 1, further comprising an expression database generation step of generating an expression database that takes as input the first internal representation data generated using the first internal representation database and the second internal representation data generated using the second internal representation database, and outputs expression data indicating the expression of the character.

The learning method according to claim 1 or 2, wherein the user data acquired in the input data acquisition step includes at least one of: the text data including text-format data on the content of the user's answers to questions; the image data including image-format data on the content of the user's answers to questions; and the audio data including audio-format data on the content of the user's answers to questions.

The learning method according to any one of claims 1 to 3, wherein the user data includes text feature amount data indicating features of the text data, and
the input data acquisition step includes a text feature amount data acquisition step of acquiring the text feature amount data extracted based on the acquired text data.

The learning method according to any one of claims 1 to 4, wherein the user data includes image feature amount data indicating features of the image data, and
the input data acquisition step includes an image feature amount data acquisition step of acquiring the image feature amount data extracted based on the acquired image data.

The learning method according to any one of claims 1 to 5, wherein the user data includes audio feature amount data indicating features of the audio data, and
the input data acquisition step includes an audio feature amount data acquisition step of acquiring the audio feature amount data extracted based on the acquired audio data.

A content reproduction device that refers to the first internal representation database, the second internal representation database, and the expression database generated by the learning method according to claim 2, and outputs expression data of the character, the device comprising:
an acquisition unit that acquires stimulus data including any one or more of arbitrary text data, image data, and audio data;
a first internal representation processing unit that refers to the first internal representation database and acquires the first internal representation data corresponding to the stimulus data acquired by the acquisition unit;
a second internal representation processing unit that refers to the second internal representation database and acquires the second internal representation data corresponding to the stimulus data acquired by the acquisition unit;
and an expression processing unit that refers to the expression database and outputs the expression data corresponding to the first internal representation data generated using the first internal representation database and the second internal representation data generated using the second internal representation database.