JP2018004813A - Communication ability estimation device, method, and program

Info

Publication number: JP2018004813A
Application number: JP2016128904A
Authority: JP
Inventors: Yukiko Nakano; Shiyougo Okada; Yoshihiro Matsugi; Yuki Hayashi; Hung-Hsuan Huang; Yutaka Takase; Katsumi Nitta
Original assignee: Seikei Gakuen
Current assignee: Seikei Gakuen
Priority date: 2016-06-29
Filing date: 2016-06-29
Publication date: 2018-01-11

Abstract

PROBLEM TO BE SOLVED: To provide a communication ability estimation device that uses a machine learning technique.
SOLUTION: A communication ability estimation device according to the present invention comprises: means for extracting utterance-related feature amounts from the speech audio information of each subject participating in a group discussion; means for generating teacher data whose input is the utterance-related feature amounts of each subject and whose output is a rating value of that subject's communication ability; means for causing a learner to learn the generated teacher data to generate an estimation model for estimating communication ability; means for inputting to the estimation model, when a subject to be evaluated participates in a group discussion, the utterance-related feature amounts that the extracting means extracts from that subject's speech audio information; and means for acquiring an estimated value of the subject's communication ability as the output of the estimation model.
SELECTED DRAWING: Figure 1

Description

The present invention relates to a technique for estimating communication ability.

In recent years, the field of artificial intelligence has seen active research (for example, Non-Patent Document 2) on modeling social intelligence (Non-Patent Document 1), which is defined as "the ability of an agent to solve problems by building and using social relationships with others through interaction".

Meanwhile, in recent human resource development, improving the communication ability of young people has become an important issue, and there is strong demand for an education and training infrastructure for that purpose.

Toyoaki Nishida, Yasuyuki Sumi, Naohiro Matsumura: Social Intelligence Design, Knowledge Science, Ohmsha (2009)

Nishida, T.: Conversational informatics: an engineering approach, Vol. 9, Wiley.com (2008)

The present invention has been made in view of the above, and its object is to provide a communication ability estimation device that uses a machine learning technique.

As a result of intensive study of the configuration of a communication ability estimation device using a machine learning technique, the present inventors conceived the following configuration and arrived at the present invention.

That is, according to the present invention, there is provided a communication ability estimation device comprising: means for extracting utterance-related feature amounts from the speech audio information of each subject participating in a group discussion; means for generating teacher data whose input is the utterance-related feature amounts of each subject and whose output is a rating value of that subject's communication ability; means for causing a learner to learn the generated teacher data to generate an estimation model for estimating communication ability; means for inputting to the estimation model, when a subject to be evaluated participates in a group discussion, the utterance-related feature amounts that the extracting means extracted from that subject's speech audio information; and means for acquiring an estimated value of that subject's communication ability as the output of the estimation model, wherein the means for extracting the utterance-related feature amounts analyzes the speech audio information to acquire linguistic information, extracts predetermined parts of speech from the linguistic information, and extracts a first feature amount relating to the number of occurrences of the predetermined parts of speech.

As described above, the present invention provides a communication ability estimation device that uses a machine learning technique.

A functional block diagram of the communication ability estimation device of the present embodiment.
A flowchart of the teacher data generation process.
A functional block diagram of the communication ability estimation device of the present embodiment.
A flowchart of the communication ability estimation process.
A bar graph showing the coefficient of determination R² of the regression models generated in the present example.
A bar graph showing the accuracy of the classification models generated in the present example.
A diagram showing the ROC curves of the classification models generated in the present example.

Hereinafter, the present invention will be described with reference to the embodiment shown in the drawings, but the present invention is not limited to that embodiment. In the drawings referred to below, common elements are given the same reference numerals, and their description is omitted as appropriate.

FIG. 1 shows a functional block diagram of a communication ability estimation device 100 according to an embodiment of the present invention. The functional configuration of the communication ability estimation device 100 of the present embodiment is described below with reference to FIG. 1.

As shown in FIG. 1, the communication ability estimation device 100 of the present embodiment includes a corpus collection unit 101, a rating value input unit 102, a speech recognition unit 103, a morphological analysis unit 104, an utterance speech analysis unit 105, a motion amount detection unit 106, a feature amount extraction unit 107, a teacher data generation unit 108, a learning execution unit 109, a learner 110, an estimation execution unit 112, and an estimated value output unit 113.

The corpus collection unit 101 is means for collecting a corpus from subjects participating in a group discussion.

The rating value input unit 102 is means for accepting input of the subjects' rating values from experts who rate communication ability.

The speech recognition unit 103 is means for analyzing a subject's speech audio information to acquire linguistic information (text data).

The morphological analysis unit 104 is means for performing morphological analysis on the linguistic information acquired by the speech recognition unit 103 to extract predetermined parts of speech.

The utterance speech analysis unit 105 is means for analyzing a subject's speech audio information to extract utterances.

The motion amount detection unit 106 is means for detecting a motion amount from a subject's motion information.

The feature amount extraction unit 107 is means for extracting feature amounts relating to a subject's communication, based on the predetermined parts of speech extracted by the morphological analysis unit 104, the utterances extracted by the utterance speech analysis unit 105, and the motion amount detected by the motion amount detection unit 106.

The teacher data generation unit 108 is means for generating teacher data used to generate an estimation model of communication ability.

The learning execution unit 109 is means for causing the learner to learn the teacher data generated by the teacher data generation unit 108.

The learner 110 is means for learning the teacher data to generate an estimation model of communication ability.

The estimation execution unit 112 is means for inputting the feature amounts of a subject to be evaluated to the estimation model.

The estimated value output unit 113 is means for acquiring an estimated value of the subject's communication ability as the output of the estimation model.

In the present embodiment, the computer constituting the communication ability estimation device 100 executes a predetermined program, whereby the communication ability estimation device 100 functions as each of the means described above.

The functional configuration of the communication ability estimation device 100 has been described above; next, the processing it executes is described. The communication ability estimation device 100 has a "learning mode", in which machine learning is performed to generate an estimation model of communication ability, and an "estimation mode", in which the communication ability of a subject is estimated using the generated estimation model (learner). The "learning mode" is described first, with reference to FIG. 1.

In the "learning mode", the communication ability estimation device 100 first collects a corpus. The procedure for collecting the corpus is described below.

A plurality of subjects are gathered and divided into groups, and each group is given a common task and made to hold a group discussion. The members of each group are preferably meeting one another for the first time, and each group is preferably given a plurality of tasks of differing character.

During the group discussion, each subject wears a directional microphone so that the speech audio information of each subject is acquired individually, and an arbitrary motion sensor (for example, a three-axis acceleration sensor) is attached to each subject's body (preferably the head) to acquire each subject's motion information (for example, acceleration data). In the present embodiment, "motion information" means raw data from which the subject's motion amount can be extracted; as long as it serves that purpose, the output of a non-contact sensor (for example, an infrared sensor) may be acquired as motion information, or video of the subject captured by a digital video camera may be acquired as motion information.

The speech audio information and motion information acquired from each subject are associated by the corpus collection unit 101 with that subject's identification information (hereinafter, subject ID) and stored in the storage area 120.

After corpus collection is complete, video footage of the group discussion is shown to a plurality of raters selected in advance, and each rater rates the communication ability of each subject. In the present embodiment, experts with experience conducting recruitment interviews at companies or the like are preferably selected as raters.

Meanwhile, the rating value input unit 102 provides the raters with a predetermined UI for inputting rating values. Each rater inputs the rating value of each subject into the communication ability estimation device 100 via the provided UI. The rating value input unit 102 then associates each input rating value with the subject ID and stores it in the storage area 120.

Once the corpus (speech audio information and motion information) and the rating values of each subject have both been stored in the storage area 120, the communication ability estimation device 100 executes the "teacher data generation process". This process is described below with reference to the flowchart shown in FIG. 2.

First, in step 101, the speech recognition unit 103, the morphological analysis unit 104, the utterance speech analysis unit 105, and the feature amount extraction unit 107 cooperate to extract "utterance-related feature amounts" from each subject's speech audio information. In the present embodiment, three feature amount groups, "utterance turn (S)", "prosody (A)", and "language (L)", are extracted as the "utterance-related feature amounts". Table 1 below summarizes the feature amounts belonging to each feature amount group.

The feature amounts belonging to each feature amount group shown in Table 1 and how they are calculated are described below in order.

<Utterance turn (S)>
The utterance turn (S) group consists of feature amounts relating to the amount of a subject's speech (duration and count), and comprises four feature amounts: "total utterance length", "total utterance count", "total utterance length (1 second or more)", and "total utterance count (1 second or more)". In the present embodiment, the utterance speech analysis unit 105 and the feature amount extraction unit 107 cooperate to calculate each feature amount by the following procedure.

(Total utterance length)
The utterance speech analysis unit 105 divides the speech audio data of each subject, acquired during the period in which a group discussion is held on one task (hereinafter, one session), into predetermined short time units (for example, 0.01-second units), analyzes the sound pressure of each section, and detects sections whose sound pressure is equal to or greater than a predetermined threshold as "utterance sections". The feature amount extraction unit 107 then takes the value obtained by multiplying the number N of detected utterance sections by the section length (for example, 0.01 seconds), that is, N × 0.01, as the "total utterance length".
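
The section-level detection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the device's actual implementation: the function names, the use of mean absolute amplitude as the per-section "sound pressure", and the threshold value are all hypothetical choices made for the example.

```python
from typing import List, Sequence


def detect_utterance_sections(samples: Sequence[float], sample_rate: int,
                              threshold: float, frame_sec: float = 0.01) -> List[bool]:
    """Split the signal into frame_sec-long sections and flag the loud ones.

    The per-section "sound pressure" is approximated by the mean absolute
    amplitude (an assumption; the source does not fix the measure).
    A trailing partial section is dropped.
    """
    frame_len = max(1, int(round(sample_rate * frame_sec)))
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        level = sum(abs(s) for s in frame) / frame_len
        flags.append(level >= threshold)
    return flags


def total_utterance_length(flags: Sequence[bool], frame_sec: float = 0.01) -> float:
    """Total utterance length = N x section length, N = number of utterance sections."""
    return sum(flags) * frame_sec
```

Each `True` flag corresponds to one utterance section, so the total utterance length is simply the flag count multiplied by the section length.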

(Total utterance count)
The feature amount extraction unit 107 extracts each run of one or more consecutive utterance sections as an "utterance fragment" and takes the total number of extracted utterance fragments as the "total utterance count".

(Total utterance length (1 second or more))
From the extracted utterance fragments, the feature amount extraction unit 107 extracts those with a duration of 1 second or more and takes the sum of their durations as the "total utterance length (1 second or more)".

(Total utterance count (1 second or more))
The feature amount extraction unit 107 takes the total number of extracted utterance fragments with a duration of 1 second or more as the "total utterance count (1 second or more)".
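
The four utterance turn (S) feature amounts can be derived from a per-section speech flag sequence roughly as follows. This is a sketch under assumptions: the input is taken to be a list of booleans (one per short section, `True` for an utterance section), and the function and dictionary key names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def utterance_fragments(flags: Sequence[bool]) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for each run of True flags."""
    fragments = []
    start = None
    for i, is_speech in enumerate(flags):
        if is_speech and start is None:
            start = i                      # a new fragment begins
        elif not is_speech and start is not None:
            fragments.append((start, i))   # the current fragment ends
            start = None
    if start is not None:
        fragments.append((start, len(flags)))
    return fragments


def turn_features(flags: Sequence[bool], frame_sec: float = 0.01,
                  min_sec: float = 1.0) -> Dict[str, float]:
    """Compute the four utterance turn (S) feature amounts from section flags."""
    lengths = [end - start for start, end in utterance_fragments(flags)]
    # Compare in whole sections to avoid floating-point edge cases at min_sec.
    min_frames = round(min_sec / frame_sec)
    long_lengths = [n for n in lengths if n >= min_frames]
    return {
        "total_utterance_length": sum(lengths) * frame_sec,
        "total_utterance_count": len(lengths),
        "total_utterance_length_1s": sum(long_lengths) * frame_sec,
        "total_utterance_count_1s": len(long_lengths),
    }
```

Comparing fragment lengths in whole sections rather than in seconds is a design choice of this sketch; it keeps a fragment of exactly 1 second from being excluded by rounding error.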

<Prosody (A)>
The prosody (A) group consists of feature amounts relating to the prosody of a subject's speech, and comprises eight feature amounts: "maximum pitch", "minimum pitch", "pitch average", "maximum intensity", "minimum intensity", "sound pressure range", "intonation", and "speech rate". In the present embodiment, the utterance speech analysis unit 105 and the feature amount extraction unit 107 cooperate to calculate each feature amount by the following procedure.

(Maximum pitch)
The utterance speech analysis unit 105 detects the highest pitch in each utterance fragment detected from each subject's speech audio data recorded in one session. The feature amount extraction unit 107 then takes the average of the highest pitches detected in the utterance fragments as the "maximum pitch".

(Minimum pitch)
The utterance speech analysis unit 105 detects the lowest pitch in each detected utterance fragment. The feature amount extraction unit 107 then takes the average of the lowest pitches detected in the utterance fragments as the "minimum pitch".

(Pitch average)
The utterance speech analysis unit 105 divides each detected utterance fragment into predetermined short time units (for example, 0.1-second units) and detects the pitch of each section. The feature amount extraction unit 107 then calculates, for each utterance fragment, the average of the pitches detected in its sections, and takes the average of these per-fragment averages as the "pitch average".

(Maximum intensity)
The utterance speech analysis unit 105 detects the maximum sound pressure in each utterance fragment detected by the feature amount extraction unit 107. The feature amount extraction unit 107 then takes the average of the maximum sound pressures detected in the utterance fragments as the "maximum intensity".

(Minimum intensity)
The utterance speech analysis unit 105 detects the minimum sound pressure in each utterance fragment detected by the feature amount extraction unit 107. The feature amount extraction unit 107 then takes the average of the minimum sound pressures detected in the utterance fragments as the "minimum intensity".

(Sound pressure range)
The feature amount extraction unit 107 takes the difference between the maximum intensity and the minimum intensity as the "sound pressure range".

(Intonation)
The feature amount extraction unit 107 takes the difference between the maximum pitch and the minimum pitch as the "intonation".

(Speech rate)
The utterance speech analysis unit 105 detects the syllables contained in all the utterance fragments detected by the feature amount extraction unit 107. The feature amount extraction unit 107 then takes the value obtained by dividing the total number of detected syllables by the "total utterance length" as the "speech rate".
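
The eight prosody (A) feature amounts are all simple aggregates over utterance fragments, which the following sketch makes explicit. It assumes that per-section pitch values, per-section sound pressure values, and a syllable count have already been obtained for each fragment by some external tool; the `Fragment` class and all names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Sequence


@dataclass
class Fragment:
    pitches: Sequence[float]      # pitch per short section, assumed pre-extracted
    intensities: Sequence[float]  # sound pressure per short section, assumed pre-extracted
    syllables: int                # syllable count, assumed pre-detected


def prosody_features(fragments: List[Fragment],
                     total_utterance_length: float) -> Dict[str, float]:
    """Aggregate per-fragment extremes and averages into the prosody (A) group."""
    max_pitch = mean([max(f.pitches) for f in fragments])
    min_pitch = mean([min(f.pitches) for f in fragments])
    max_intensity = mean([max(f.intensities) for f in fragments])
    min_intensity = mean([min(f.intensities) for f in fragments])
    return {
        "max_pitch": max_pitch,
        "min_pitch": min_pitch,
        # average of the per-fragment pitch averages
        "pitch_average": mean([mean(f.pitches) for f in fragments]),
        "max_intensity": max_intensity,
        "min_intensity": min_intensity,
        "sound_pressure_range": max_intensity - min_intensity,
        "intonation": max_pitch - min_pitch,
        # syllables per second of total utterance length
        "speech_rate": sum(f.syllables for f in fragments) / total_utterance_length,
    }
```

Note that "maximum pitch" and "maximum intensity" here are averages of per-fragment maxima, not global maxima over the session, following the description above.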

<Language (L)>
The language (L) group consists of feature amounts relating to the number of occurrences of predetermined parts of speech ("noun", "verb", "interjection", and "filler") in a subject's speech, and comprises seven feature amounts: "number of nouns", "number of verbs", "number of interjections", "number of fillers", "number of new nouns", "number of existing nouns", and "new noun frequency". In the present embodiment, the speech recognition unit 103, the morphological analysis unit 104, and the feature amount extraction unit 107 cooperate to calculate each feature amount by the following procedure.

(Number of nouns, number of verbs, number of interjections, number of fillers)
The speech recognition unit 103 analyzes each subject's speech audio data recorded in one session to acquire linguistic information (text data representing the content of the speech). The morphological analysis unit 104 then performs morphological analysis on the acquired linguistic information and extracts four parts of speech ("noun", "verb", "interjection", and "filler"). Here, an "interjection" is a part of speech often observed when a speaker gives a backchannel or reacts to another person's remark, such as "aa" (ah), "naruhodo" (I see), or "un" (yeah), and a "filler" is a part of speech often observed when a speaker tries to fill a pause in speech, such as "eetto" (um) or "nanka" (like). Once the four parts of speech have been extracted, the feature amount extraction unit 107 takes the total number of "nouns", the total number of "verbs", the total number of "interjections", and the total number of "fillers" as the "number of nouns", "number of verbs", "number of interjections", and "number of fillers", respectively.

(Number of new nouns)
In one session, the feature amount extraction unit 107 detects each "noun" that the subject of interest was the first among all subjects to utter as a "new noun" of that subject, and takes the total number of detected new nouns as the "number of new nouns".

(Number of existing nouns)
The feature amount extraction unit 107 takes the difference between the "number of nouns" and the "number of new nouns" as the "number of existing nouns".

(New noun frequency)
The feature amount extraction unit 107 takes the value obtained by dividing the "number of new nouns" by the "total utterance count" as the "new noun frequency".
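
A possible way to compute the language (L) feature amounts from morphological analysis output is sketched below. The assumptions are significant and are not stated in the source: tokens for the whole session are given in chronological order as (subject ID, part of speech, surface form) tuples, the part-of-speech labels are the hypothetical English strings shown, and a noun counts as "new" only the first time anyone in the session utters it.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

Token = Tuple[str, str, str]  # (subject ID, part of speech, surface form)


def language_features(tokens: Iterable[Token],
                      utterance_counts: Dict[str, int]) -> Dict[str, Dict[str, float]]:
    """Compute the seven language (L) feature amounts per subject for one session."""
    pos_counts: Dict[str, Counter] = {}
    new_nouns: Counter = Counter()
    seen_nouns: Set[str] = set()
    for subject, pos, surface in tokens:          # tokens must be in time order
        pos_counts.setdefault(subject, Counter())[pos] += 1
        if pos == "noun" and surface not in seen_nouns:
            seen_nouns.add(surface)               # first mention in the session
            new_nouns[subject] += 1
    features = {}
    for subject, counts in pos_counts.items():
        n_new = new_nouns[subject]
        n_utterances = utterance_counts.get(subject, 0)
        features[subject] = {
            "noun_count": counts["noun"],
            "verb_count": counts["verb"],
            "interjection_count": counts["interjection"],
            "filler_count": counts["filler"],
            "new_noun_count": n_new,
            "existing_noun_count": counts["noun"] - n_new,
            "new_noun_frequency": n_new / n_utterances if n_utterances else 0.0,
        }
    return features
```

The `utterance_counts` argument stands for each subject's "total utterance count" from the utterance turn (S) group, which is the denominator of the new noun frequency.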

The feature amounts belonging to each feature amount group shown in Table 1 and how they are calculated have been described above. In the present embodiment, the feature amount extraction unit 107 associates each subject's "utterance turn (S)", "prosody (A)", and "language (L)" feature amounts with the subject ID and stores them in the storage area 130.

Returning to FIG. 2, the description continues.

In the following step 102, the motion amount detection unit 106 and the feature amount extraction unit 107 cooperate to extract "motion-related feature amounts" from each subject's motion information. In the present embodiment, a feature amount group called "motion (M)" is extracted as the "motion-related feature amounts". Table 2 below summarizes the feature amounts belonging to "motion (M)".

The feature amounts belonging to "motion (M)" shown in Table 2 and how they are calculated are described below in order.

<Motion (M)>
The motion (M) group comprises four feature amounts: "average motion amount", "standard deviation of motion amount", "average motion amount during utterance", and "standard deviation of motion amount during utterance". In this experiment, the motion amount detection unit 106 and the feature amount extraction unit 107 cooperate to calculate each feature amount by the following procedure. The following description takes as an example the case where acceleration data from a three-axis acceleration sensor is acquired as the motion information.

(Average motion amount, standard deviation of motion amount)
The motion amount detection unit 106 analyzes each subject's acceleration data recorded in one session and calculates the norm |a_t| of the three-dimensional acceleration vector a_t = {x_t, y_t, z_t} at time t. The feature amount extraction unit 107 then takes the average and standard deviation of the calculated |a_t| as the "average motion amount" and the "standard deviation of motion amount", respectively.
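
The session-level motion feature amounts can be sketched as follows. The function names are hypothetical, and the use of the population standard deviation is an assumption, since the source does not say whether the population or sample form is intended.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Accel = Tuple[float, float, float]  # one (x, y, z) acceleration sample


def motion_amounts(accel: Sequence[Accel]) -> List[float]:
    """Euclidean norm of each three-axis acceleration sample."""
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in accel]


def motion_features(accel: Sequence[Accel]) -> Tuple[float, float]:
    """Return (average motion amount, standard deviation of motion amount)."""
    norms = motion_amounts(accel)
    return mean(norms), pstdev(norms)  # population std is an assumption
```

Swapping `pstdev` for `statistics.stdev` would give the sample standard deviation instead, if that is what an implementation requires.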

(Average motion amount during utterance, standard deviation of motion amount during utterance)
The motion amount detection unit 106 analyzes the subject's acceleration data recorded during the period corresponding to each utterance fragment detected from that subject's speech audio data recorded in one session, and calculates the motion amount (norm |a_t|) in that period. The feature amount extraction unit 107 then calculates the average and standard deviation of the motion amount (norm |a_t|) for each utterance fragment, and takes the mean of the per-fragment averages as the "average motion amount during utterance" and the mean of the per-fragment standard deviations as the "standard deviation of motion amount during utterance".
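
The per-fragment aggregation described above can be sketched as follows. This is an illustration under assumptions: motion samples are given as (timestamp in seconds, norm) pairs, utterance fragments as (start, end) times with the end exclusive, fragments containing no samples are skipped, and the population standard deviation is used. None of these details are fixed by the source, and the function name is hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def motion_during_utterance(samples: Sequence[Tuple[float, float]],
                            fragments: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (mean of per-fragment averages, mean of per-fragment std devs)."""
    per_fragment_means: List[float] = []
    per_fragment_stds: List[float] = []
    for start, end in fragments:
        norms = [norm for t, norm in samples if start <= t < end]
        if not norms:
            continue  # no sensor samples fell inside this fragment
        per_fragment_means.append(mean(norms))
        per_fragment_stds.append(pstdev(norms))
    if not per_fragment_means:
        raise ValueError("no motion samples overlap any utterance fragment")
    return mean(per_fragment_means), mean(per_fragment_stds)
```

Because the statistics are computed per fragment and then averaged, a long fragment and a short fragment contribute equally to the result, which differs from pooling all in-utterance samples together.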

The feature amounts belonging to "motion (M)" shown in Table 2 and how they are calculated have been described above. In the present embodiment, the feature amount extraction unit 107 associates each subject's extracted "motion (M)" feature amounts with the subject ID and stores them in the storage area 130.

In the following step 103, the teacher data generation unit 108 generates teacher data whose input (X) is each subject's feature amounts and whose output (Y) is the rating value of that subject's communication ability. Specifically, from the four feature amount groups (utterance turn (S), prosody (A), language (L), and motion (M)) associated with each subject's subject ID, a set of a plurality of feature amount groups that includes "language (L)" (hereinafter, a feature amount set) is first read from the storage area 130. The feature amount set preferably consists of three or more feature amount groups including "language (L)". Then, a feature vector whose elements are the feature amounts (real numbers) making up the subject's feature amount set is taken as the input (X), the rating value (a real number) associated with that subject's subject ID is read from the storage area 120 and taken as the output (Y), and the pair of input (X) and output (Y) is generated as teacher data.
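
The assembly of teacher data in step 103 can be sketched as follows. The nested dictionary layout, the group keys "S", "A", "L", and "M", and the function name are hypothetical stand-ins for the contents of the storage areas 130 and 120; the only constraint taken from the description is that the selected feature amount set must include the language (L) group.

```python
from typing import Dict, List, Sequence, Tuple

# subject ID -> group key -> feature name -> value (hypothetical layout)
FeatureStore = Dict[str, Dict[str, Dict[str, float]]]


def build_teacher_data(features: FeatureStore, ratings: Dict[str, float],
                       groups: Sequence[str] = ("S", "A", "L")
                       ) -> Tuple[List[List[float]], List[float]]:
    """Build inputs X (feature vectors) and outputs Y (rating values)."""
    if "L" not in groups:
        raise ValueError('the feature amount set must include the language group "L"')
    xs: List[List[float]] = []
    ys: List[float] = []
    for subject_id in sorted(features):
        if subject_id not in ratings:
            continue  # skip subjects without a rating value
        vector: List[float] = []
        for group in groups:
            group_features = features[subject_id][group]
            # sort names so every subject's vector has the same element order
            vector.extend(group_features[name] for name in sorted(group_features))
        xs.append(vector)
        ys.append(ratings[subject_id])
    return xs, ys
```

The returned lists pair each feature vector with its rating value by position, which is the form most learners, including SVM and SVR implementations, expect as training input.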

Finally, in step 104, the learning execution unit 109 causes the learner 110 to learn the teacher data generated in step 103. The learner 110 thereby generates an estimation model for estimating communication ability. Examples of the learner 110 include a support vector machine (SVM) and support vector regression (SVR).

The processing that the communication ability estimation device 100 executes in the "learning mode" has been described above. Next, the processing that the communication ability estimation device 100 executes in the "estimation mode" is described. The following description refers to the functional block diagram shown in FIG. 3 as appropriate.

In the "estimation mode", the subject to be evaluated participates in a group discussion, and that subject's corpus (utterance voice information and motion information) is acquired during the discussion. Specifically, following the same procedure as in the "learning mode" described above, a group that includes the subject whose communication ability is to be evaluated is given a predetermined task and holds a group discussion, and the communication ability estimation device 100 acquires the utterance voice information and the motion information from the microphone and the motion sensor worn by the subject. The corpus collection unit 101 then stores the utterance voice information and motion information acquired from the subject in the storage area 120 in association with the subject's subject ID.

Once the subject's utterance voice information and motion information have been stored in the storage area 120, the communication ability estimation device 100 executes the "communication ability estimation processing". This processing is described below with reference to the flowchart shown in FIG. 4.

First, in step 201, the speech recognition unit 103, the morphological analysis unit 104, the utterance voice analysis unit 105, and the feature amount extraction unit 107 cooperate to extract "utterance turn (S)", "prosody (A)", and "language (L)" from the subject's utterance voice information by the same procedure as in the "learning mode" described above, and store them in the storage area 130 in association with the subject's subject ID.

In the subsequent step 202, the motion amount detection unit 106 and the feature amount extraction unit 107 cooperate to extract "motion (M)" from the subject's motion information by the same procedure as in the "learning mode" described above, and store it in the storage area 130 in association with the subject's subject ID.

Finally, in step 203, the estimation execution unit 112 reads from the storage area 130 a feature amount set that consists of some or all of the four feature groups (utterance turn (S), prosody (A), language (L), and motion (M)) associated with the subject's subject ID and that has the same combination of feature groups as the feature amount set used to generate the estimation model (learning device 110), and inputs it to the estimation model. The estimation model then outputs an estimated value of the subject's communication ability, which the estimated value output unit 113 acquires. The estimated value output unit 113 stores the acquired estimated value in the storage area 140 in association with the subject's subject ID and outputs it to a predetermined output destination in response to a request from a user or an application.

As described above, the present embodiment provides a novel communication ability estimation device that uses a machine learning technique. Part of the communication ability estimation device of the present embodiment can also be regarded as a device that generates an estimation model of communication ability, or as a device that generates teacher data for generating that estimation model. The present embodiment also provides a method for generating an estimation model of communication ability. Part of that method can likewise be regarded as a method for generating teacher data for generating an estimation model of communication ability.

Each function of the embodiment described above can be realized by a device-executable program written in C, C++, C#, Java (registered trademark), or the like. The program of the present embodiment can be stored and distributed on a device-readable recording medium such as a hard disk device, CD-ROM, MO, DVD, flexible disk, EEPROM, or EPROM, and can be transmitted over a network in a format usable by other devices.

The present invention has been described above with reference to an embodiment, but the present invention is not limited to that embodiment. Other embodiments that a person skilled in the art could conceive are included in the scope of the present invention as long as they exhibit the functions and effects of the present invention.

The method for generating an estimation model of communication ability according to the present invention is described below more specifically by way of an example, but the present invention is not limited to the example described below.

(1) Collection of corpus
Forty subjects (university students) were divided into 10 groups of four such that the members of each group were meeting one another for the first time, and every group was given the same tasks and asked to hold group discussions. To avoid over- or underestimating a subject's communication ability because of that subject's aptitude for a particular task, three tasks, (a) to (c) below, were set in this experiment, and each discussion was held within a fixed time.
(a) Ranking celebrities to invite to a school festival (15 minutes)
(b) Planning a booth for a school festival (20 minutes)
(c) Planning how to host a foreign friend visiting Japan (20 minutes)

In this experiment, each subject took part in the discussions wearing a headset microphone (audio-technica: HYP-190H) with an acceleration sensor (ATR-Promotions: WAA-010) attached to the back of the head, and the discussions were recorded with a video camera. During each discussion, utterance voice data was recorded through each subject's headset microphone, and acceleration data along three axes (x, y, z) was acquired at 30 fps through each subject's acceleration sensor.

(2) Rating of the subjects' communication ability
Twenty-one former human resources recruiters, with experience of conducting job interviews at companies in different industries (retail, staffing, IT, and so on) and of different sizes (from small and medium-sized enterprises to large corporations), rated the subjects' communication ability. Each rater watched the video recordings of every group's discussions and rated each subject on the six items described below.

According to the Japan Vocational Ability Development Association (JAVADA), the communication ability that young people should acquire consists of three items: "mutual understanding", meaning conducting two-way communication smoothly and conveying information appropriately and accurately; "cooperativeness", meaning adapting to an organization; and "self-expression", which concerns presentation ability. Of these, "mutual understanding", the ability in which individual differences are most apparent in group discussion, is said to consist of five element items: "attitude of attentive listening", "smooth two-way communication", "ability to consolidate opinions", "ability to convey information", and "logical and clear assertion". Accordingly, in this experiment each rater rated the five element items on a five-point scale and also rated, on a ten-point scale, an item called "overall communication ability", an overall evaluation that takes the five element items into account. Table 3 below summarizes the six evaluation items and their content.

Once the rating values of the 21 raters were available for a total of 120 subject instances (40 subjects × 3 tasks), Cronbach's α reliability coefficient was calculated for each evaluation item. Only evaluation item (0) (α = 0.66) fell below 0.8, the guideline value for reliability, while evaluation items (1) to (5) all exceeded 0.8. Accordingly, only the results for evaluation items (1) to (5) were adopted in this experiment, and for each evaluation item the mean of the rating values given by the 21 raters was used as the rating value for that item.

(3) Extraction of feature amounts
The utterance voice data of each subject recorded in one session was analyzed with the speech analysis software Praat, and the "utterance turn (S)", "prosody (A)", and "language (L)" feature groups described above were extracted from it. Likewise, the "motion (M)" feature group described above was extracted from the acceleration data of each subject recorded in one session.

(4) Generation of communication ability estimation models
Fifteen feature amount sets were prepared by combining the four feature groups (S, A, M, L) extracted from each subject's corpus. Table 4 below summarizes the content of the prepared feature amount sets (1) to (15).
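The count of 15 follows from taking every non-empty combination of the four feature groups (2^4 − 1 = 15). A short sketch of that enumeration is given below; the ordering it produces is an assumption and need not match the numbering (1) to (15) used in Table 4.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def feature_sets(groups: Sequence[str] = ("S", "A", "L", "M")) -> List[Tuple[str, ...]]:
    """Enumerate every non-empty combination of the given feature groups."""
    return [
        combo
        for size in range(1, len(groups) + 1)  # 1 group, 2 groups, ..., all groups
        for combo in combinations(groups, size)
    ]
```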

In this experiment, each feature amount set shown in Table 4 below was used to perform machine learning (regression learning and classification learning) by the following procedures, generating estimation models for estimating communication ability.

(Regression learning)
For each subject, a feature vector whose elements are the feature amounts (real numbers) constituting each feature amount set shown in Table 4 above was taken as the input (X), the subject's rating value (for each evaluation item) was taken as the output (Y), and teacher data consisting of pairs of input (X) and output (Y) was generated. Regression learning was then performed on the generated teacher data using linear support vector regression (SVR). The ε-insensitive error was used as the error function of the SVR. A search range of [0, 0.01, 0.1, 1] was set for ε, which adjusts the error tolerance, and a search range of [0.01, 0.1, 1, 5, 10] was set for the parameter C, which adjusts the trade-off between the loss function and the size of the margin. The optimal parameters were determined by grid search through cross-validation.
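The two ingredients named here, the ε-insensitive error and the grid search over ε and C, can be sketched as follows. The sketch does not train an SVR; it only shows the loss for a single prediction and the loop over the 4 × 5 = 20 parameter pairs. The callback cv_score, which is assumed to fit a model for one (ε, C) pair and return its cross-validation score (higher is better), is hypothetical.

```python
from itertools import product
from typing import Callable, Tuple

EPSILONS = [0, 0.01, 0.1, 1]      # search range for epsilon given in the text
CS = [0.01, 0.1, 1, 5, 10]        # search range for C given in the text


def eps_insensitive_loss(y_true: float, y_pred: float, eps: float) -> float:
    """Errors smaller than eps cost nothing; larger errors cost the excess."""
    return max(0.0, abs(y_true - y_pred) - eps)


def grid_search(cv_score: Callable[[float, float], float]) -> Tuple[float, float]:
    """Return the (epsilon, C) pair with the best cross-validation score."""
    return max(product(EPSILONS, CS), key=lambda pair: cv_score(*pair))
```

In practice cv_score would wrap whichever SVR implementation is used; any library providing a linear SVR with ε and C parameters would fit this loop.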

(Classification learning)
For the teacher data described above, the mean m and standard deviation σ of the rating values (output (Y)) were calculated. Rating values of m + 0.1σ or more were classified into the high group and rating values of m − 0.1σ or less into the low group, and the rating values (output (Y)) were converted into binary high/low labels. Pairs whose output (Y) was a rating value near the mean, belonging to neither the high group nor the low group, were excluded from the teacher data. Classification learning was performed on the teacher data modified in this way using a linear support vector machine (SVM). The parameter C, which adjusts the trade-off between the loss and the size of the margin in the SVM, was searched over the range [0.01, 0.1, 1, 5, 10] and used for testing.
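The conversion to binary labels can be sketched as below. The text does not say whether σ is the population or the sample standard deviation; this example assumes the population form (statistics.pstdev). The function name and the encoding 1 = high group, 0 = low group are also assumptions.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def binarize(
    teacher_data: Sequence[Tuple[List[float], float]],
    margin: float = 0.1,
) -> List[Tuple[List[float], int]]:
    """Label ratings >= m + margin*sigma as 1 (high) and <= m - margin*sigma as 0 (low).

    Pairs whose rating lies strictly between the two thresholds are dropped.
    """
    ratings = [y for _, y in teacher_data]
    m, sigma = mean(ratings), pstdev(ratings)
    labelled = []
    for x, y in teacher_data:
        if y >= m + margin * sigma:
            labelled.append((x, 1))
        elif y <= m - margin * sigma:
            labelled.append((x, 0))
        # otherwise: near the mean, excluded from the teacher data
    return labelled
```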

(4) Evaluation of regression and classification models
In this experiment, the estimation performance of each model was evaluated by cross-validation. Specifically, a model was generated by machine learning using the teacher data of the subjects belonging to nine of the 10 groups that took part in the experiment, and the teacher data of the subjects belonging to the remaining group was then used as test data to evaluate the estimation performance of the generated model.
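The nine-versus-one split described here amounts to leave-one-group-out cross-validation. A minimal sketch of the fold generation follows; it assumes that every group is held out once in turn, which the text implies but does not state, and the mapping group_of from sample ID to discussion group is a hypothetical input.

```python
from typing import Dict, Hashable, Iterator, List, Tuple


def leave_one_group_out(
    group_of: Dict[str, Hashable],
) -> Iterator[Tuple[Hashable, List[str], List[str]]]:
    """Yield (held_out_group, training_ids, test_ids), holding out each group once."""
    for held_out in sorted(set(group_of.values())):
        train = [s for s, g in group_of.items() if g != held_out]
        test = [s for s, g in group_of.items() if g == held_out]
        yield held_out, train, test
```

Splitting by group rather than by individual sample keeps all members of a discussion group on the same side of the split, so a model is never tested on a group it saw during training.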

(Evaluation of the estimation performance of the regression models)
The coefficient of determination R² on the test data was used as the index of the estimation performance of the regression models. FIG. 5 shows bar graphs of the coefficient of determination R² for each of the 15 feature amount sets, for each of evaluation items (1) to (5) shown in Table 3 above. In the bar graphs of FIG. 5, the number attached to each bar is the number of the corresponding feature amount set in Table 4 above (the same applies to FIG. 6).
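For reference, the coefficient of determination can be computed as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses this common definition, taking the mean of the test ratings as the baseline; the text does not specify which variant of R² was used, so that choice is an assumption.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot, with SS_tot taken about the mean of y_true."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

A model that always predicts the mean rating scores 0 under this definition, and a perfect model scores 1.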

As shown in FIG. 5, for evaluation items (1) to (4), the regression model generated using the "utterance turn (S) + motion (M)" feature amount set gave the highest value in every case (0.47, 0.54, 0.56, and 0.52). For "evaluation item (5): overall communication ability", on the other hand, the regression model generated using the "utterance turn (S) + language (L) + motion (M)" feature amount set gave the highest value (0.62).

(Evaluation of the estimation performance of the classification models)
The accuracy (the proportion of correct classifications) was used as the index of the estimation performance of the classification models. FIG. 6 shows bar graphs of the accuracy for each of the 15 feature amount sets, for each of evaluation items (1) to (5) shown in Table 3 above.

As shown in FIG. 6, for "evaluation item (1): smooth two-way communication" and "evaluation item (2): ability to consolidate opinions", the classification model generated using the "prosody (A) + language (L) + motion (M)" feature amount set gave the highest value in both cases (0.85 and 0.83). For "evaluation item (3): ability to convey information" and "evaluation item (4): logical and clear assertion", the classification model generated using the "prosody (A) + utterance turn (S) + language (L)" feature amount set gave the highest value in both cases (0.87 and 0.85). For "evaluation item (5): overall communication ability", on the other hand, the classification model generated using the "prosody (A) + language (L) + motion (M)" feature amount set gave the highest value (0.93).

(Evaluation of the robustness of the classification models)
FIG. 7 shows ROC curves plotted from the SVM outputs for the estimation of "evaluation item (5): overall communication ability" by classification models generated using different feature amount sets (A, S, L, M, and A + L + M). In FIG. 7, the vertical axis shows the true positive rate (TP: the proportion of low-group data classified correctly), and the horizontal axis shows the false positive rate (FP: the proportion of high-group data incorrectly classified into the low group). As shown in FIG. 7, the AUC (area under the curve) of the classification model using the "prosody (A) + language (L) + motion (M)" feature amount set was 0.97, an improvement over the AUC of the models using A, S, L, or M alone, and TP was 0.91 or more at FP = 0.05. This result shows that the classification model generated using the "prosody (A) + language (L) + motion (M)" feature amount set can classify low-group data correctly at a rate of 91% or more while keeping the misclassification rate for high-group data low.
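An ROC curve and its AUC can be obtained from classifier scores by sweeping a decision threshold, as sketched below. This is a generic illustration rather than the procedure used to produce FIG. 7: it assumes that label 1 marks the positive class (the low group in the text), that a higher score means "more likely positive", and that both classes are present. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(labels: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points, from (0, 0) to (1, 1)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))
```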

DESCRIPTION OF SYMBOLS
100 … Communication ability estimation device
101 … Corpus collection unit
102 … Rating value input unit
103 … Speech recognition unit
104 … Morphological analysis unit
105 … Utterance voice analysis unit
106 … Motion amount detection unit
107 … Feature amount extraction unit
108 … Teacher data generation unit
109 … Learning execution unit
110 … Learning device
112 … Estimation execution unit
113 … Estimated value output unit
120, 130, 140 … Storage area

Claims

A method for generating teacher data for generating an estimation model of communication ability,
Extracting a feature amount related to the utterance of the subject from the utterance voice information of each subject participating in the group discussion;
Generating teacher data having, as input, the feature amount related to the utterance of each subject and, as output, a rating value of the subject's communication ability;
Including
The step of extracting a feature amount related to the utterance includes:
Analyzing the speech information, obtaining linguistic information, extracting a predetermined part of speech from the linguistic information, and extracting a first feature quantity relating to the number of appearances of the predetermined part of speech;
Method.

The step of extracting a feature amount related to the utterance includes:
Analyzing the utterance voice information, extracting a subject's utterance, and extracting a second feature amount relating to the amount of the utterance;
The method of claim 1.

The step of extracting a feature amount related to the utterance includes:
Analyzing the utterance voice information, extracting a subject's utterance, and extracting a third feature amount related to the prosody of the utterance;
The method according to claim 1 or 2.

Furthermore, obtaining the motion information of each subject,
Extracting a feature amount related to a motion amount from the motion information of each subject,
The step of generating the teacher data includes:
Generating teacher data having, as input, the feature amount related to the utterance and the feature amount related to the motion amount of each subject and, as output, a rating value of the subject's communication ability,
The method as described in any one of Claims 1-3.

A method for generating an estimated model of communication ability,
Generating teacher data by the method according to any one of claims 1 to 4,
Learning the generated teacher data with a learning device to generate the estimation model;
Including methods.

The estimation model is a regression model or a classification model;
The method of claim 5.

The program for making a computer perform each step of the method as described in any one of Claims 1-6.

An apparatus for generating teacher data for generating an estimation model of communication ability,
Means for extracting feature values related to utterance from utterance voice information of each subject participating in the group discussion;
Means for generating teacher data having, as input, the feature amount related to the utterance of each subject and, as output, a rating value of the subject's communication ability;
Including
The means for extracting the feature amount related to the utterance is:
Analyzing the speech information, obtaining linguistic information, extracting a predetermined part of speech from the linguistic information, and extracting a first feature quantity relating to the number of appearances of the predetermined part of speech;
apparatus.

An apparatus for generating an estimation model of communication ability,
Means for extracting feature values related to utterance from utterance voice information of each subject participating in the group discussion;
Means for generating teacher data having, as input, the feature amount related to the utterance of each subject and, as output, a rating value of the subject's communication ability;
Means for causing the learning device to learn the teacher data and generating the estimation model;
Including
The means for extracting the feature amount related to the utterance is:
Analyzing the speech information, obtaining linguistic information, extracting a predetermined part of speech from the linguistic information, and extracting a first feature quantity relating to the number of appearances of the predetermined part of speech;
apparatus.

A device for estimating communication ability,
Means for extracting feature values related to utterance from utterance voice information of each subject participating in the group discussion;
Means for generating teacher data having, as input, the feature amount related to the utterance of each subject and, as output, a rating value of the subject's communication ability;
Means for causing a learning device to learn the generated teacher data and generating an estimation model for estimating communication ability;
Means for inputting, to the estimation model, a feature amount related to an utterance extracted by the means for extracting the feature amount from the utterance voice information of the subject when the subject to be evaluated participates in the group discussion;
Means for obtaining an estimate of the communication ability of the subject as an output of the estimation model;
Including
The means for extracting the feature amount related to the utterance is:
Analyzing the speech information, obtaining linguistic information, extracting a predetermined part of speech from the linguistic information, and extracting a first feature quantity relating to the number of appearances of the predetermined part of speech;
Communication ability estimation device.