JP2020149196A

JP2020149196A - Personality prediction device and training data collection device

Info

Publication number: JP2020149196A
Application number: JP2019044659A
Authority: JP
Inventors: 雅彦春野; Masahiko Haruno; 数馬森; Kazuma Mori; 秀紀柏岡; Hidenori Kashioka
Original assignee: National Institute of Information and Communications Technology
Current assignee: National Institute of Information and Communications Technology
Priority date: 2019-03-12
Filing date: 2019-03-12
Publication date: 2020-09-17

Abstract

To refine personality prediction using data collectable via a wide area network.

SOLUTION: A personality prediction device of the present invention includes a storage unit and a control unit. The storage unit stores at least one of models RM1 to RMT that map explanatory variables to an objective variable. The control unit receives the explanatory variables and outputs the objective variable using the at least one of models RM1 to RMT. The objective variable includes a personality value pa. The explanatory variables include a first feature na, which is obtained from the type of operation that a user of a device connected to a wide area network performed on the device and from information on the time at which the operation was performed.

SELECTED DRAWING: Figure 8

Description

The present invention relates to a personality prediction device that predicts personality from data that can be acquired via a wide area network, and to a learning data collection device that collects the learning data needed for machine learning of personality.

Personality prediction devices that predict personality from data that can be acquired via a wide area network are known. For example, Non-Patent Document 1 discloses a configuration in which data on the "likes" that experiment participants made on Facebook (registered trademark) is machine-learned using a LASSO (Least Absolute Shrinkage and Selection Operator) linear regression model, and the Big Five, which consists of the five traits of openness, conscientiousness, extraversion, agreeableness, and neuroticism, is determined for each participant.

Non-Patent Document 1: Wu Youyou, Michal Kosinski, and David Stillwell, "Computer-based personality judgments are more accurate than those made by humans", Proceedings of the National Academy of Sciences (PNAS), 2015

An individual's personality is reflected in that individual's various behaviors. With the spread of SNS (Social Network Service) and IoT (Internet of Things) devices, it becomes easier to collect data that reflects an individual's personality via a wide area network. Data on the "likes" that an individual has made on Facebook is only one example. The personality prediction configuration disclosed in Non-Patent Document 1 leaves room for improvement toward more detailed personality prediction.

The present invention has been made to solve the problem described above, and its object is to refine personality prediction that uses data that can be acquired via a wide area network.

A personality prediction device according to the present invention includes a storage unit and a control unit. The storage unit stores at least one model that maps explanatory variables to an objective variable. The control unit receives the explanatory variables and outputs the objective variable using the at least one model. The objective variable includes a personality value. The explanatory variables include a first feature amount obtained from the type of operation that a user of a device connected to a wide area network performed on the device and from time information indicating when the operation was performed.

A learning data collection device according to the present invention collects, via a wide area network, learning data for generating a model by machine learning. The objective variable of the model includes a personality value. The explanatory variables of the model include a feature amount obtained from the type of operation that a user of a device connected to the wide area network performed on the device and from time information indicating when the operation was performed. The learning data collection device includes a control unit and a storage unit. The control unit administers a personality diagnostic test to a user of a terminal device connected to the wide area network to acquire a measured value of that user's personality, and stores the measured value in the storage unit as learning data in association with the feature amount.

According to the personality prediction device and the learning data collection device of the present invention, using a feature amount obtained from the type of operation that a user of a device connected to a wide area network performed on the device and from time information indicating when the operation was performed makes it possible to refine personality prediction that uses data that can be acquired via the wide area network.

FIG. 1 is a diagram showing the personality prediction device according to the embodiment connected to the Internet.
FIG. 2 is a diagram showing an example of a personality diagnostic test.
FIG. 3 is a diagram showing a time series of a user's actions on an SNS.
FIG. 4 is a diagram showing an example of a word vector.
FIG. 5 is a functional block diagram of the personality prediction device of FIG. 1.
FIG. 6 is a diagram showing an outline of the learning process performed by the control unit of FIG. 5.
FIG. 7 is a flowchart showing the flow of each learning process of FIG. 6.
FIG. 8 is a diagram showing an outline of the personality prediction process performed by the control unit of FIG. 5.
FIG. 9 is a diagram showing the correspondence between the verbal IQ predicted by the personality prediction device and the measured verbal IQ.
FIG. 10 is a diagram showing IoT devices connected to the Internet.
FIG. 11 is a diagram showing a time series of a user's operations on the microwave oven, washing machine, and television of FIG. 10.

Embodiments are described in detail below with reference to the drawings. Identical or corresponding parts in the drawings are given the same reference signs, and their description is in principle not repeated.

FIG. 1 is a diagram showing the personality prediction device 100 according to the embodiment connected to the Internet 800 (wide area network). As shown in FIG. 1, the personality prediction device 100, an SNS (Social Network Service) server 200, a desktop computer 51 (terminal device), a smartphone 52 (terminal device), and a notebook personal computer 53 (terminal device) are connected to the Internet 800. Users Sb1 to Sb3 are the users of the desktop computer 51, the smartphone 52, and the notebook personal computer 53, respectively.

The personality prediction device 100 also serves as the learning data collection device. The personality prediction device and the learning data collection device may instead be separate devices. The personality prediction device 100 collects a plurality of learning data from a plurality of subjects including the users Sb1 to Sb3. Via the Internet 800, the personality prediction device 100 transmits a personality diagnostic test such as the one shown in FIG. 2 to the terminals connected to the Internet 800. Each of the desktop computer 51, the smartphone 52, and the notebook personal computer 53 displays on its screen the personality diagnostic test received from the personality prediction device 100. The users Sb1 to Sb3 answer each item of the personality diagnostic test on the desktop computer 51, the smartphone 52, and the notebook personal computer 53, and send their answers to the personality prediction device 100.

The personality prediction device 100 measures each subject's personality from the answers to the personality diagnostic test and calculates it as a normalized personality value. A personality value calculated from the test answers is hereinafter called a measured value. A personality value may be a continuous value or a categorical value. The personality diagnostic test measures 86 types of personality selected from the categories "Positive or Negative personality" (for example, Big5), "Intelligence" (for example, Raven IQ), "Clinical personality" (for example, PDI), and "Socioeconomic" (for example, risk aversion).

Each of the plurality of subjects has an account on an SNS. Using an API (Application Programming Interface) published by the SNS, the personality prediction device 100 acquires, for each subject, the text information that the subject posted on the SNS and information on the subject's actions on the SNS, and creates learning data from them together with the measured personality value of that subject. The information on actions on the SNS includes the type of each action (operation) the subject performed and time information indicating when the action was performed. Examples of the SNS include Facebook (registered trademark), Twitter (registered trademark), Instagram (registered trademark), and LINE (registered trademark).

FIG. 3 is a diagram showing a time series of the actions of the user Sb1 on the SNS. As shown in FIG. 3, at times t1, t2, and t3 the user Sb1 performs some action on the SNS from a smartphone 60 (terminal device), for example selecting "like" or posting text or an image. The learning data includes network information (first feature amount) obtained from the types of actions that each subject performed on the SNS via a terminal device connected to the Internet 800 and from the time information indicating when those actions were performed. As information representing each subject's behavior pattern, the network information includes the time of day at which the subject is active on the SNS (for example, early morning, daytime, evening, or late night) and the time intervals between actions on the SNS (for example, their mean and variability).
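The network information described above can be illustrated with a small sketch. This is not the patented implementation: the hour boundaries of the time-of-day bands, the feature names, and the function names `band_of` and `network_features` are all assumptions introduced here for illustration, since the text names the bands and the interval statistics but does not define them.

```python
from datetime import datetime
from statistics import mean, pstdev

# Hypothetical hour boundaries; the source names the bands but not their limits.
BANDS = {
    "early_morning": range(4, 10),
    "daytime": range(10, 16),
    "evening": range(16, 22),
}  # every other hour is treated as "late_night"


def band_of(ts: datetime) -> str:
    """Return the time-of-day band that a timestamp falls into."""
    for name, hours in BANDS.items():
        if ts.hour in hours:
            return name
    return "late_night"


def network_features(events: list[tuple[str, datetime]]) -> dict[str, float]:
    """Build a feature dict from (action_type, timestamp) pairs."""
    if not events:
        raise ValueError("at least one event is required")
    events = sorted(events, key=lambda e: e[1])
    times = [ts for _, ts in events]
    feats: dict[str, float] = {}
    # Share of actions in each time-of-day band.
    for name in (*BANDS, "late_night"):
        feats[f"share_{name}"] = sum(band_of(t) == name for t in times) / len(times)
    # Number of actions of each type (for example "like" or "post").
    for action, _ in events:
        feats[f"count_{action}"] = feats.get(f"count_{action}", 0.0) + 1.0
    # Mean and spread of the gaps between consecutive actions, in seconds.
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    feats["gap_mean_s"] = mean(gaps) if gaps else 0.0
    feats["gap_std_s"] = pstdev(gaps) if gaps else 0.0
    return feats
```

A caller would pass one subject's action log, for example `network_features([("like", t1), ("post", t2), ("like", t3)])`, and use the resulting values as one row of explanatory variables.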

The order of a subject's actions strongly reflects that subject's behavior pattern. Because the personality prediction device 100 can derive the order of the actions a subject performed on the SNS from the time information of those actions, it can predict personality in more detail than when personality is predicted from the types of actions alone.
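One way to turn the order of actions into numbers is to sort the actions by time and count which action type follows which. The sketch below is only an illustration of that idea; the source does not say how the order is encoded, and `transition_counts` is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime


def transition_counts(events: list[tuple[str, datetime]]) -> Counter:
    """Count consecutive (previous_action, next_action) pairs in time order."""
    ordered = [action for action, _ in sorted(events, key=lambda e: e[1])]
    return Counter(zip(ordered, ordered[1:]))
```

Two subjects with identical action counts can still differ in these transition counts, which is the extra information that time stamps provide over action types alone.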

The personality prediction device 100 performs morphological analysis on the text information acquired from the SNS server and extracts linguistic information (second feature amount). The linguistic information includes a word vector. FIG. 4 shows a word vector whose dimensions include the morphemes "今日" (today), "は" (a topic particle), "暑い" (hot), "日" (day), and "だった" (was). The linguistic information also includes values calculated from the word vector: the mean and the variance of the word counts across its dimensions, and the number of emotional words (positive or negative words) it contains.

The terms that form the dimensions of the word vector are preferably terms whose use tends to differ with the subject's personality. It is therefore desirable to select, as the terms for the dimensions of the word vector, terms whose number of occurrences in the combined text information of all subjects falls within a predetermined range. The terms in the dimensions of the word vector may be limited to nouns, adjectives, verbs, and adverbs, and bigrams such as "今日は" ("today" plus particle) and "暑い日" ("hot day") may also be used.
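The vocabulary selection and word vector described above can be sketched as follows. The sketch assumes that the text has already been split into tokens by a morphological analyzer, which is outside its scope; the function names and the inclusive range `[lo, hi]` are assumptions, since the source only says that the occurrence count should fall within a predetermined range.

```python
from collections import Counter


def select_vocabulary(corpus: list[list[str]], lo: int, hi: int) -> list[str]:
    """Keep terms whose total count over all subjects lies in [lo, hi]."""
    totals = Counter(tok for doc in corpus for tok in doc)
    return sorted(t for t, c in totals.items() if lo <= c <= hi)


def with_bigrams(tokens: list[str]) -> list[str]:
    """Append adjacent-token bigrams (joined without a space) to the unigrams."""
    return tokens + [a + b for a, b in zip(tokens, tokens[1:])]


def word_vector(tokens: list[str], vocab: list[str]) -> list[int]:
    """Count how often each vocabulary term occurs in one subject's tokens."""
    counts = Counter(tokens)
    return [counts[t] for t in vocab]
```

Dropping terms that are too frequent removes words that nearly everyone uses, and dropping terms that are too rare removes words that appear for too few subjects to be learned from.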

FIG. 5 is a functional block diagram of the personality prediction device 100 of FIG. 1. As shown in FIG. 5, the personality prediction device 100 includes a storage unit 101, a control unit 102, a RAM (Random Access Memory) 103, a network interface 104, and a display unit 105.

The storage unit 101 stores, for example, an operating system program, a personality prediction application program, a personality learning program, a plurality of learning data, and a plurality of models. Each of the models maps explanatory variables that include the network information and the linguistic information to an objective variable that includes a personality value. The storage unit 101 is, for example, a hard disk or an external storage medium.

The control unit 102 realizes the functions of the personality prediction device 100 by executing the programs stored in the storage unit 101. The control unit 102 includes a computer such as a CPU (Central Processing Unit). The RAM 103 functions as working memory and temporarily holds the data needed to execute the programs. Via the network interface 104, the control unit 102 transmits data to external devices connected to the Internet and receives data from those devices.

The display unit 105 includes, for example, a liquid crystal display, an organic EL (Electro Luminescence) display, or another display device. The display unit 105 may be combined with a touch sensor (not shown) to form a touch panel. The display unit 105 displays the GUI (Graphical User Interface) of various applications.

FIG. 6 is a diagram showing an outline of the learning process performed by the control unit 102 of FIG. 5. As shown in FIG. 6, the control unit 102 performs ensemble learning, in which learning is performed on the plurality of learning data a plurality of times and a model is generated in each learning. Specifically, as an example of boosting, which is one type of ensemble learning, the control unit 102 performs AdaBoost (Adaptive Boosting), in which T (> 1) learning processes are performed sequentially and a model is generated in each. The control unit 102 may instead perform bagging or a random forest as the ensemble learning. The control unit 102 may also perform learning other than ensemble learning, and only one model may be generated.

In each learning process, K learning data D_1 to D_K are used, and linear regression analysis with LASSO as the regularization term is performed. Each of the models RM_1 to RM_T generated by the T learning processes is a LASSO linear regression model. Using LASSO as the regularization term suppresses overfitting and reduces the dimensionality of the explanatory variables of the models RM_1 to RM_T (makes them sparse). Each learning process may use any supervised learning method that minimizes the error between the teacher data and the objective variable (predicted data) output by the model; for example, it may be deep learning using a deep neural network.
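To make the sparsifying effect of the LASSO term concrete, the following is a minimal sketch of sample-weighted LASSO linear regression solved by coordinate descent. The solver, the penalty strength `lam`, the iteration count, and the function names are all assumptions; the source states only that LASSO-regularized linear regression is used, not how it is solved.

```python
def soft_threshold(value: float, lam: float) -> float:
    """Shrink value toward zero by lam (the proximal step of the L1 penalty)."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def weighted_lasso(X, y, pi, lam=0.01, n_iter=200):
    """Minimise 0.5 * sum_i pi[i] * (y[i] - b - w.X[i])**2 + lam * sum_k |w[k]|.

    X is a list of feature rows, y the measured values, and pi the sample
    weights, assumed to be normalised so that they sum to 1. Returns (w, b).
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = sum(pi[i] * y[i] for i in range(n))
    for _ in range(n_iter):
        for k in range(p):
            rho = z = 0.0
            for i in range(n):
                # Residual with feature k's current contribution removed.
                r = y[i] - b - sum(w[j] * X[i][j] for j in range(p)) + w[k] * X[i][k]
                rho += pi[i] * X[i][k] * r
                z += pi[i] * X[i][k] ** 2
            w[k] = soft_threshold(rho, lam) / z if z > 0 else 0.0
        # Re-fit the unpenalised intercept to the weighted mean residual.
        b = sum(pi[i] * (y[i] - sum(w[j] * X[i][j] for j in range(p))) for i in range(n))
    return w, b
```

A coefficient whose weighted correlation with the residual stays below `lam` is set exactly to zero, which is how the penalty removes explanatory variables rather than merely shrinking them.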

The learning data D_i contains the network information n_i, the linguistic information s_i, and the measured personality value m_i of subject Sb_i. In odd-numbered learning processes, the correspondence between the network information n_i and the measured value m_i is learned. In even-numbered learning processes, the correspondence between the linguistic information s_i and the measured value m_i is learned.

In AdaBoost, each learning data is given a weight (importance) w_j(i). The weight w_j(i) is the weight of the learning data D_i in the j-th learning. The initial weight w_1(i) is set to 1/K.

FIG. 7 is a flowchart showing the flow of each learning process of FIG. 6. The process shown in FIG. 7 is the j-th learning process and is called from a main routine (not shown) that performs AdaBoost. In the following, a step is abbreviated as S.

As shown in FIG. 7, in S101 the control unit 102 normalizes the weight of each learning data according to equation (1) and advances the process to S102.
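The formula for equation (1) is not reproduced in this text. A reconstruction consistent with the surrounding description (weights w_j(i) over K learning data, normalized weights written π_j(i)) would be the following; it is an assumption, not the formula as published.

```latex
\pi_j(i) = \frac{w_j(i)}{\sum_{k=1}^{K} w_j(k)} \qquad (1)
```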

In S102, the control unit 102 generates the model RM_j by performing linear regression analysis that minimizes the weighted error rate ε_j given by equation (2). The error rate ε_j is less than one half. The control unit 102 stores the model RM_j in the storage unit 101 and advances the process to S103. Equation (3) gives the squared error d_j between the objective variable (predicted value) that the model RM_j outputs for the explanatory variable x_i and the measured value. The explanatory variable x_i in equation (3) is the network information n_i in odd-numbered learning processes (j odd) and the linguistic information s_i in even-numbered learning processes (j even).
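The formulas for equations (2) and (3) are not reproduced in this text. Based on the description (a weighted error rate built from the normalized weights π_j(i) and a squared error between the model output and the measured value m_i), they can be reconstructed as below. The notation d_j(i) for the squared error of learning data D_i and RM_j(x_i) for the model output are assumptions, as is the requirement that values are normalized so that d_j(i) lies in [0, 1].

```latex
\varepsilon_j = \sum_{i=1}^{K} \pi_j(i)\, d_j(i) \qquad (2)

d_j(i) = \bigl( RM_j(x_i) - m_i \bigr)^2 \qquad (3)
```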

As equation (2) shows, the larger the normalized weight π_j(i) is relative to the others, the more a reduction of the corresponding squared error d_j contributes to minimizing the error rate ε_j. In the linear regression analysis, the model RM_j is therefore generated so that the squared error d_j becomes smaller for learning data whose weight π_j(i) is relatively large. In other words, the j-th learning process concentrates on the learning data whose weight w_j(i) is relatively large.

In S103, the control unit 102 calculates the reliability α_j of the model RM_j according to equation (4), stores the model RM_j in the storage unit 101 in association with the reliability α_j, and advances the process to S104. The smaller the error rate ε_j, the larger the reliability α_j. The variable β_j given by equation (5) is smaller than 1 because the error rate ε_j is smaller than one half.
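The formulas for equations (4) and (5) are not reproduced in this text. One reconstruction that satisfies both stated properties (α_j grows as ε_j shrinks, and β_j < 1 whenever ε_j < 1/2) is the form commonly used in regression variants of AdaBoost. It is offered as an assumption, not as the published formulas.

```latex
\alpha_j = \ln \frac{1}{\beta_j} \qquad (4)

\beta_j = \frac{\varepsilon_j}{1 - \varepsilon_j} \qquad (5)
```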

In S104, the control unit 102 updates the weight w_j(i) of each learning data in turn according to equation (6) and then returns the process to the main routine. The (j+1)-th learning process uses the weights w_{j+1}(i).
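The formula for equation (6) is not reproduced in this text. A reconstruction consistent with the stated behavior (with β_j < 1, a smaller squared error yields a smaller weight) is the following, where d_j(i) denotes the squared error of learning data D_i and is assumed to lie in [0, 1]. This is an assumption, not the formula as published.

```latex
w_{j+1}(i) = w_j(i)\, \beta_j^{\,1 - d_j(i)} \qquad (6)
```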

Because the variable β_j is smaller than 1, the smaller the squared error d_j, the smaller the weight of the learning data D_i becomes. As a result, learning data D_i with a relatively large squared error d_j has a relatively large weight in the next, (j+1)-th, learning. In other words, the (j+1)-th learning process concentrates on the learning data for which the squared error of the model RM_j is relatively large.
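Steps S101 to S104 and the alternation between network and linguistic information can be put together in a short sketch. Several simplifications are assumptions made here for brevity: each feature is a single scalar per subject, the weak learner is a weighted least-squares line rather than the LASSO regression of the source, the squared error is clipped to 1, and the expressions for β_j, α_j, and the weight update follow the form commonly used in regression variants of AdaBoost because the published formulas are not reproduced in this text.

```python
import math


def fit_weighted_line(xs, ys, pi):
    """Stand-in weak learner: weighted least-squares line on one scalar feature."""
    mx = sum(p * x for p, x in zip(pi, xs))
    my = sum(p * y for p, y in zip(pi, ys))
    var = sum(p * (x - mx) ** 2 for p, x in zip(pi, xs))
    cov = sum(p * (x - mx) * (y - my) for p, x, y in zip(pi, xs, ys))
    slope = cov / var if var > 0 else 0.0
    return lambda x: my + slope * (x - mx)


def train(network, language, measured, rounds):
    """Return [(model, alpha, which_feature), ...] after up to `rounds` steps."""
    k = len(measured)
    w = [1.0 / k] * k                                    # initial weights 1/K
    ensemble = []
    for j in range(1, rounds + 1):
        total = sum(w)
        pi = [wi / total for wi in w]                    # S101: normalise
        which = "network" if j % 2 == 1 else "language"  # odd / even rounds
        xs = network if which == "network" else language
        model = fit_weighted_line(xs, measured, pi)      # S102: fit RM_j
        # Squared error per learning data, clipped to [0, 1] (assumption).
        d = [min((model(x) - m) ** 2, 1.0) for x, m in zip(xs, measured)]
        eps = sum(p * di for p, di in zip(pi, d))        # weighted error rate
        if eps <= 0.0 or eps >= 0.5:                     # perfect or too weak
            break
        beta = eps / (1.0 - eps)                         # S103: beta_j < 1
        alpha = math.log(1.0 / beta)                     # reliability alpha_j
        ensemble.append((model, alpha, which))
        w = [wi * beta ** (1.0 - di) for wi, di in zip(w, d)]  # S104: update
    return ensemble


def predict(ensemble, n_a, s_a):
    """Reliability-weighted average of the stored models' outputs."""
    num = sum(a * m(n_a if which == "network" else s_a) for m, a, which in ensemble)
    return num / sum(a for _, a, _ in ensemble)
```

Calling `train(network, language, measured, rounds=4)` on normalized values yields two models fitted on network information and two on linguistic information, and `predict` then combines them into one personality value.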

FIG. 8 is a diagram showing an outline of the personality prediction process performed by the control unit 102 of FIG. 5. As shown in FIG. 8, the control unit 102 receives the network information n_a and the linguistic information s_a as explanatory variables and outputs the personality value p_a as the objective variable. As equation (7) shows, the personality value p_a is the weighted average of the outputs of the models RM_1 to RM_T, weighted according to the reliability of each model.
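The formula for equation (7) is not reproduced in this text. A reliability-weighted average consistent with the description can be written as below. The symbol x_a^{(j)}, standing for n_a when j is odd and s_a when j is even, is notation introduced here, and the formula as a whole is an assumption rather than the published one.

```latex
p_a = \frac{\sum_{j=1}^{T} \alpha_j \, RM_j\bigl(x_a^{(j)}\bigr)}{\sum_{j=1}^{T} \alpha_j},
\qquad
x_a^{(j)} =
\begin{cases}
n_a & (j \text{ odd}) \\
s_a & (j \text{ even})
\end{cases}
\qquad (7)
```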

FIG. 9 is a diagram showing the correspondence between the verbal IQ (Intelligence Quotient) predicted by the personality prediction device 100 and the measured verbal IQ. In FIG. 9, the predicted value and the measured value for each subject are plotted as one point. As shown in FIG. 9, the correlation coefficient R between the predicted and measured verbal IQ is 0.385, and the p-value is smaller than 0.01. In general, when the correlation coefficient is larger than 0.2, the two quantities are regarded as correlated, and the measured value is regarded as predictable from the predicted value. It can therefore be said that the personality prediction device 100 can predict verbal IQ.
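The correlation coefficient R reported above is assumed here to be the Pearson correlation between predicted and measured values; the source does not name the coefficient. Under that assumption it can be computed as in this sketch, where `pearson_r` is a hypothetical helper.

```python
import math


def pearson_r(predicted, measured):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    mp = sum(predicted) / n
    mm = sum(measured) / n
    cov = sum((p - mp) * (m - mm) for p, m in zip(predicted, measured))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    sm = math.sqrt(sum((m - mm) ** 2 for m in measured))
    return cov / (sp * sm)
```

Each plotted point in a figure such as FIG. 9 corresponds to one pair of elements from `predicted` and `measured`.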

The personality that the personality prediction device 100 can predict is not limited to verbal IQ. Besides the Big5, the personality prediction device 100 can also predict, for example, autism, empathy, and anxiety tendency.

The description so far has covered the case where the personality prediction device 100 uses the time information of actions performed on an SNS as the time information of a subject's actions. The time information of a subject's actions is not limited to actions performed on an SNS and may be, for example, the time information of operations on IoT devices.

FIG. 10 is a diagram showing IoT devices connected to the Internet 800. As shown in FIG. 10, a microwave oven 61 (device), a washing machine 62 (device), and a television 63 (device) are connected to the Internet 800. The microwave oven 61, the washing machine 62, and the television 63 are IoT devices.

FIG. 11 is a diagram showing a time series of the operations of the user Sb2 on the microwave oven 61, the washing machine 62, and the television 63 of FIG. 10. As shown in FIG. 11, the user Sb2 starts heating with the microwave oven 61 at time t11, starts a wash with the washing machine 62 at time t12, and turns on the television 63 with a remote control at time t13. The learning data may include a feature amount (first feature amount) obtained from the operations that the user Sb2 performed on the microwave oven 61, the washing machine 62, and the television 63 and from the time information indicating when those operations were performed. That is, the personality prediction device 100 may use, for the personality prediction of a target person, a feature amount (first feature amount) obtained from the operations that the target person performed on IoT devices.

As described above, the personality prediction device according to the embodiment makes it possible to refine personality prediction.

The embodiments disclosed here should be regarded as illustrative in all respects and not restrictive. The scope of the present invention is defined by the claims rather than by the description above, and is intended to include all modifications within the meaning and scope equivalent to the claims.

51: desktop computer; 52, 60: smartphone; 53: notebook personal computer; 61: microwave oven; 62: washing machine; 63: television; 100: personality prediction device; 101: storage unit; 102: control unit; 103: RAM; 104: network interface; 105: display unit; 200: SNS server; 800: Internet.

Claims

A personality prediction device comprising:
a storage unit that stores at least one model that maps explanatory variables to an objective variable; and
a control unit that receives the explanatory variables and outputs the objective variable using the at least one model, wherein
the objective variable includes a personality value, and
the explanatory variables include a first feature amount obtained from the type of operation that a user of a device connected to a wide area network performed on the device and from time information indicating when the operation was performed.

The personality prediction device according to claim 1, wherein
the explanatory variables include a second feature amount obtained by morphological analysis of text that the user transmitted from a terminal device connected to the wide area network to a server of an SNS (Social Network Service) connected to the wide area network,
the first feature amount is generated from the type and time information of actions that the user performed on the SNS, and
the control unit acquires the first feature amount and the second feature amount from the server.

The personality prediction device according to claim 2, wherein
a plurality of learning data are stored in the storage unit,
each of the plurality of learning data associates a measured value of the personality of one of a plurality of persons with the first feature amount and the second feature amount corresponding to that person, and
the control unit generates the at least one model by performing ensemble learning, in which learning is performed on the plurality of learning data a plurality of times and a model is generated in each learning, and outputs a weighted average of the outputs of the at least one model as the personality value.

The personality prediction device according to claim 3, wherein
the ensemble learning is AdaBoost (Adaptive Boosting), in which, for each of the plurality of learning data in each learning, the weight of the learning data is made relatively larger as the error between the personality value predicted from the learning data and the measured value of the learning data is larger, and the weight is reflected in the minimization of the error in the next learning,
the control unit alternately performs learning that minimizes the error between the measured value and the personality value predicted from the first feature amount, and learning that minimizes the error between the measured value and the personality value predicted from the second feature amount, and
the control unit outputs, as the personality value, the average of the outputs of the at least one model weighted according to the reliability of each of the at least one model.

The personality prediction device according to any one of claims 1 to 4, wherein the at least one model is a linear regression model that uses LASSO (Least Absolute Shrinkage and Selection Operator) as a regularization term.

A learning data collection device that collects, via a wide area network, learning data for generating a model by machine learning, wherein
the objective variable of the model includes a personality value,
the explanatory variables of the model include a feature amount obtained from the type of operation that a user of a device connected to the wide area network performed on the device and from time information indicating when the operation was performed,
the learning data collection device comprises a control unit and a storage unit, and
the control unit administers a personality diagnostic test to a user of a terminal device connected to the wide area network to acquire a measured value of the personality of that user, and stores the measured value in the storage unit as learning data in association with the feature amount.