JP2023005289A - Dialogue support server, dialogue support system, dialogue support method, and program

Info

Publication number: JP2023005289A
Application number: JP2021107082A
Authority: JP
Inventors: 文博高山; Fumihiro Takayama; 佳史相本; Yoshifumi Aimoto; 俊之吉田; Toshiyuki Yoshida; 夢希子小松; Yukiko Komatsu
Original assignee: NTT Data Institute of Management Consulting Inc
Current assignee: NTT Data Institute of Management Consulting Inc
Priority date: 2021-06-28
Filing date: 2021-06-28
Publication date: 2023-01-18

Abstract

To enable a person in charge to get close to a client's feelings in a dialogue between the client and the person in charge. SOLUTION: An acquisition unit is provided to acquire first image data including the face of a client imaged during a dialogue between the client and a person in charge and second image data including the face of the person in charge imaged during the dialogue. An image processing unit is provided to estimate a first emotion expressed by the face indicated by the first image data, estimate a second emotion expressed by the face indicated by the second image data, and calculate a degree of similarity between the first emotion and the second emotion as a degree of sympathy during the dialogue. A communication unit is provided to transmit the degree of sympathy calculated by the image processing unit to a terminal device of the person in charge. SELECTED DRAWING: Figure 2

Description

The present invention relates to a dialogue support server, a dialogue support system, a dialogue support method, and a program.

Technology that detects a customer's emotional state from voice and images has been in use for some time. For example, Patent Literature 1 discloses a technique that can determine more accurately whether or not a customer is interested in a product by including the results of day-to-day emotion analysis in addition to the analysis made at the time of a proposal.

JP 2020-184216 A

However, even if a customer's emotional state can be detected, it does not follow that the person in charge can stay close to those feelings. In particular, an inexperienced person in charge may notice that a customer is worried and still not know how to respond. As a result, the person in charge may fail to earn the customer's trust, which can lead to lost opportunities.

The present invention has been made in view of these circumstances, and its object is to provide a dialogue support server, a dialogue support system, a dialogue support method, and a program that enable the person in charge to stay close to the customer's feelings in a dialogue between the customer and the person in charge.

To solve the problem described above, a dialogue support server according to the present invention includes: an acquisition unit that acquires first image data in which the face of a customer is captured during a dialogue between the customer and a person in charge, and second image data in which the face of the person in charge is captured during the dialogue; an image processing unit that estimates a first emotion expressed by the face shown in the first image data, estimates a second emotion expressed by the face shown in the second image data, and calculates the degree of similarity between the first emotion and the second emotion as a degree of empathy during the dialogue; and a communication unit that transmits the degree of empathy calculated by the image processing unit to a terminal device of the person in charge.

Further, in the dialogue support server described above, the image processing unit estimates, as the first emotion, a first degree to which the customer's face expresses a specific target emotion, estimates, as the second emotion, a second degree to which the face of the person in charge expresses the target emotion, and calculates the degree of empathy based on the absolute value of the difference between the first degree and the second degree.
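The single-emotion variant above can be illustrated with a minimal Python sketch. The text only states that the degree of empathy is "based on" the absolute difference; the mapping `1 - |difference|` and the assumption that both degrees lie in the range 0 to 1 are illustrative choices, and the function name is hypothetical.

```python
def empathy_from_target_emotion(first_degree: float, second_degree: float) -> float:
    """Return a degree of empathy in [0, 1] for one target emotion.

    first_degree:  degree to which the customer's face shows the target emotion.
    second_degree: degree to which the face of the person in charge shows it.
    Both are assumed (not stated in the source) to lie in [0, 1].
    """
    for value in (first_degree, second_degree):
        if not 0.0 <= value <= 1.0:
            raise ValueError("degrees are assumed to lie in [0, 1]")
    # Identical degrees give 1.0; maximally different degrees give 0.0.
    return 1.0 - abs(first_degree - second_degree)


print(empathy_from_target_emotion(0.75, 0.25))  # 0.5
```

Any monotonically decreasing function of the absolute difference would fit the wording equally well; the linear form is used here only because it is the simplest.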

Further, in the dialogue support server described above, the image processing unit estimates, as a worry level, the degree to which the customer is worried, based on feature points extracted from the first image data and a worry level estimation model, and the communication unit transmits the degree of empathy and the worry level calculated by the image processing unit to the terminal device of the person in charge. The worry level estimation model is a model that has learned, by machine learning, the correspondence between face images and worried faces, using a training data set in which each training face image is associated with a label indicating whether or not it shows a worried face, and that estimates the degree to which an input face image shows a worried face.

Further, the dialogue support server described above further includes a speech processing unit. The acquisition unit acquires audio data collected by a microphone that picks up the speech during the dialogue. The speech processing unit converts the speech contained in the audio data into text and extracts, from the text, keywords whose frequency of occurrence is equal to or greater than a threshold. The communication unit transmits the keywords extracted by the speech processing unit to the terminal device of the person in charge.

Further, in the dialogue support server described above, the speech processing unit refers, based on the keywords, to a dialogue database in which dialogues are stored, and extracts from the dialogue database, as similar cases, dialogues in which the frequency of occurrence of the keywords is equal to or greater than a threshold. The communication unit transmits the similar cases extracted by the speech processing unit to the terminal device of the person in charge.
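The keyword-frequency search over the dialogue database can be sketched as follows. This is a minimal illustration, assuming that each stored dialogue is already available as a list of words; the function name, the dictionary layout, and the sample data are hypothetical and not taken from the source.

```python
def find_similar_cases(keywords, dialogue_db, threshold):
    """Return ids of past dialogues in which the keywords occur often enough.

    keywords:    set of frequent keywords from the current dialogue.
    dialogue_db: hypothetical mapping of case id -> list of words of a past dialogue.
    threshold:   minimum number of keyword occurrences for a dialogue to qualify.
    """
    similar = []
    for case_id, words in dialogue_db.items():
        hits = sum(1 for word in words if word in keywords)
        if hits >= threshold:
            similar.append(case_id)
    return similar


past_dialogues = {
    "case-1": ["care", "family", "care", "cost"],
    "case-2": ["tax", "pension", "family"],
}
print(find_similar_cases({"care", "family"}, past_dialogues, threshold=2))  # ['case-1']
```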

Further, a dialogue support system according to the present invention includes: a first camera that captures the face of a customer during a dialogue between the customer and a person in charge; a second camera that captures the face of the person in charge during the dialogue; a terminal device of the person in charge; and the dialogue support server described above, which acquires first image data captured by the first camera and second image data captured by the second camera and transmits the degree of empathy to the terminal device.

Further, a dialogue support method according to the present invention is a dialogue support method performed by a computer device, in which: an acquisition unit acquires first image data in which the face of a customer is captured during a dialogue between the customer and a person in charge, and second image data in which the face of the person in charge is captured during the dialogue; an image processing unit estimates a first emotion expressed by the face shown in the first image data, estimates a second emotion expressed by the face shown in the second image data, and calculates the degree of similarity between the first emotion and the second emotion as a degree of empathy during the dialogue; and a communication unit transmits the degree of empathy calculated by the image processing unit to a terminal device of the person in charge.

Further, to solve the problem described above, the present invention provides a program for causing a computer to operate as the dialogue support server described above, the program causing the computer to function as each unit included in the dialogue support server.

According to the present invention, the degree of empathy can be transmitted to the terminal device of the person in charge. The person in charge can therefore know the degree of empathy in the dialogue with the customer and, by conducting the dialogue in accordance with that degree of empathy, can stay close to the customer's feelings.

A diagram showing an application example of the dialogue support system 1 according to the embodiment.
A block diagram showing a configuration example of the dialogue support system 1 according to the embodiment.
A diagram showing an example of the assessment data 120 according to the embodiment.
A diagram showing an example of the emotion data 121A of the customer U according to the embodiment.
A diagram showing an example of the emotion data 121B of the person in charge T according to the embodiment.
A diagram showing an example of the worry level data 122 according to the embodiment.
A diagram showing an example of the empathy data 123 according to the embodiment.
A diagram showing an example of the keyword data 124 according to the embodiment.
A diagram showing an example of the case text data 125 according to the embodiment.
A diagram showing an example of an image displayed on the terminal device 20 according to the embodiment.
A diagram showing an example of an image displayed on the terminal device 20 according to the embodiment.
A flowchart showing the flow of processing performed by the server device 10 according to the embodiment.
A flowchart showing the flow of processing performed by the server device 10 according to the embodiment.

An embodiment of the present invention will be described below with reference to the drawings.

(Regarding the dialogue support system 1)
FIG. 1 is a diagram showing an application example of the dialogue support system 1 according to the embodiment. As shown in FIG. 1, the dialogue support system 1 is applied when a customer U and a person in charge T hold a dialogue.

In the dialogue support system 1, during the dialogue, a camera C1 captures the face of the customer U and transmits the captured image to the server device 10 as image data D1. A camera C2 captures the face of the person in charge T and transmits the captured image to the server device 10 as image data D2. In addition, a microphone M picks up the speech of the dialogue between the customer U and the person in charge T and transmits it to the server device 10 as audio data V. The server device 10 analyzes the dialogue using the image data D1, the image data D2, and the audio data V, and transmits a processing result K of the analysis to the terminal device 20.

The method by which the server device 10 analyzes the dialogue is described next. As one example of dialogue analysis, the server device 10 estimates the emotion of the customer U during the dialogue. From the features of the face image of the customer U shown in the image data D1, the server device 10 estimates the degree of each of a plurality of emotions, such as joy, sadness, surprise, anger, fear, and neutrality, and transmits the estimation result to the terminal device 20. This allows the person in charge T to grasp the emotion of the customer U while talking.

The server device 10 also calculates a degree of empathy. The degree of empathy is the degree to which the customer U and the person in charge T empathize with each other in the dialogue. In this embodiment, when the emotion of the customer U and the emotion of the person in charge T are similar, the two are regarded as empathizing. Conversely, when their emotions are not similar, the two are regarded as not empathizing. For example, the server device 10 estimates the emotion of the customer U from the features of the face image of the customer U shown in the image data D1, and estimates the emotion of the person in charge T from the features of the face image of the person in charge T shown in the image data D2. The server device 10 then calculates the degree of similarity between the estimated emotion of the customer U and that of the person in charge T as the degree of empathy. This allows the person in charge T to recognize, while talking, whether he or she is sharing the feelings of the customer U, that is, whether he or she is staying close to the customer U.

The server device 10 also estimates the degree to which the customer U is worried during the dialogue (the worry level). In this embodiment, the worry level of the customer U is estimated using a machine learning technique. Specifically, a worry level estimation model is generated in advance. The worry level estimation model is, for example, a model that estimates the worry level of the face shown in a face image based on feature points extracted from that image. In this case, the worry level estimation model is a model that has learned the correspondence between the feature points of an image and the worry level by training on a data set in which each training face image and its feature points are associated with a label indicating whether or not the image shows a worried face. Because the server device 10 estimates the worry level of the customer U, the person in charge T can grasp, while talking, whether the customer U is worried.
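A worry level estimation model of the kind described above could be sketched as below. The source does not name a model type; logistic regression is used here purely as an assumption because it turns binary "worried / not worried" labels into a continuous degree between 0 and 1. The two feature values per face and the toy training data are hypothetical.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_worry_model(features, labels, epochs=500, lr=0.5):
    """Fit logistic-regression weights by plain stochastic gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            pred = _sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
            error = pred - y  # gradient of the log loss with respect to z
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def estimate_worry_level(model, x):
    """Return the estimated degree (0 to 1) to which the face looks worried."""
    weights, bias = model
    return _sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


# Hypothetical feature vectors (e.g. brow slope, mouth-corner drop), label 1 = worried.
train_x = [[0.9, 0.8], [0.1, 0.2], [0.8, 0.9], [0.2, 0.1], [0.7, 0.7], [0.3, 0.2]]
train_y = [1, 0, 1, 0, 1, 0]
model = train_worry_model(train_x, train_y)
print(estimate_worry_level(model, [0.85, 0.85]) > 0.5)  # True
```

In practice the feature vector would come from a facial landmark detector and the training set would be far larger; both are outside the scope of this sketch.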

The server device 10 also transmits the degree of empathy and the worry level to the terminal device 20 in association with the elapsed time of the dialogue. This allows the person in charge T to grasp how the worry level of the customer U and the degree of empathy change as the dialogue progresses.
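One way to associate the two values with elapsed time is sketched below. The record layout, field names, and the use of seconds are assumptions made for illustration; the source does not specify a data format.

```python
from dataclasses import dataclass


@dataclass
class DialogueSample:
    elapsed_seconds: float  # time since the dialogue started
    empathy: float          # degree of empathy at that moment
    worry_level: float      # worry level of the customer at that moment


def build_time_series(start_time, observations):
    """Convert (timestamp, empathy, worry_level) tuples into time-ordered samples."""
    return [
        DialogueSample(timestamp - start_time, empathy, worry)
        for timestamp, empathy, worry in sorted(observations)
    ]


series = build_time_series(100.0, [(110.0, 0.8, 0.3), (105.0, 0.6, 0.5)])
print([sample.elapsed_seconds for sample in series])  # [5.0, 10.0]
```

A terminal device could plot such a series directly, with elapsed time on the horizontal axis.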

The server device 10 also extracts keywords that appear frequently in the dialogue (frequent keywords). The server device 10 converts the speech collected as the audio data V into text using speech recognition technology, extracts as frequent keywords those nouns in the resulting text that occur, for example, at least a threshold number of times, and transmits the extracted frequent keywords to the terminal device 20. This allows the person in charge T to grasp the keywords that come up often in the dialogue.
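The threshold-based extraction of frequent keywords can be sketched as follows. The sketch assumes that speech recognition and noun extraction have already been done upstream (for Japanese text this would typically require a morphological analyzer), so the input is simply a list of nouns; the function name is hypothetical.

```python
from collections import Counter


def extract_frequent_keywords(nouns, threshold):
    """Return the nouns that occur at least `threshold` times, in sorted order."""
    counts = Counter(nouns)
    return sorted(word for word, count in counts.items() if count >= threshold)


transcript_nouns = ["care", "family", "care", "cost", "care", "family"]
print(extract_frequent_keywords(transcript_nouns, threshold=2))  # ['care', 'family']
```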

The server device 10 also extracts similar cases. A similar case is a dialogue that resembles the dialogue currently in progress, and is extracted, for example, from a database of past dialogues. Using the frequent keywords extracted from the current dialogue, the server device 10 searches the database of past dialogues and extracts, as similar cases, dialogues associated with keywords equivalent to the frequent keywords. The server device 10 transmits the extracted similar cases to the terminal device 20. This allows the person in charge T to refer to the content of a past dialogue when a similar one has taken place.

The server device 10 also transmits assessment data of the customer U to the terminal device 20. The assessment data is the result of a preliminary survey of the customer U, for example, the result of a questionnaire answered by the customer U before the dialogue. This allows the person in charge T to check the assessment data of the customer U before or during the dialogue.

FIG. 2 is a block diagram showing a configuration example of the dialogue support system 1 according to the embodiment. The dialogue support system 1 includes, for example, two cameras C (the camera C1 and the camera C2), the microphone M, the server device 10, and the terminal device 20. These components of the dialogue support system 1 (the cameras C, the microphone M, the server device 10, and the terminal device 20) are communicably connected via a communication network NW.

The server device 10 includes, for example, a communication unit 11, a storage unit 12, and a control unit 13. The communication unit 11 communicates with the cameras C, the microphone M, and the terminal device 20. For example, the communication unit 11 acquires image data D from the cameras C and audio data V from the microphone M, and transmits the analysis result of the dialogue to the terminal device 20 as the processing result K.

The storage unit 12 is composed of a storage medium, for example, an HDD (Hard Disk Drive), flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), RAM (Random Access read/write Memory), ROM (Read Only Memory), or any combination of these storage media. The storage unit 12 stores programs for executing the various processes of the server device 10 and temporary data used when performing those processes.

The storage unit 12 stores, for example, assessment data 120, emotion data 121, worry level data 122, empathy data 123, keyword data 124, case text data 125, and a learned model 126. The assessment data 120 is information indicating the result of the questionnaire given to the customer U in advance. The emotion data 121 is information indicating the emotion of the customer U in the dialogue. The worry level data 122 is information indicating the worry level of the customer U in the dialogue. The empathy data 123 is information indicating the degree of empathy between the customer U and the person in charge T in the dialogue. The keyword data 124 is information indicating the frequent keywords in the dialogue. The case text data 125 is information indicating similar cases. The learned model 126 is information indicating the configuration of the worry level estimation model.

The control unit 13 is implemented by causing a CPU (Central Processing Unit), which the server device 10 has as hardware, to execute a program. The control unit 13 performs overall control of the server device 10. The control unit 13 includes, for example, an acquisition unit 130, a questionnaire processing unit 131, an image processing unit 132, a speech processing unit 133, and a device control unit 134.

The acquisition unit 130 acquires the questionnaire answered by the customer U in advance. For example, the customer U enters answers to a questionnaire displayed on a terminal device (not shown) such as his or her own smartphone and presses a button labeled "Send answers" or the like, whereupon the questionnaire is transmitted from that terminal device to the server device 10. The acquisition unit 130 outputs the acquired questionnaire to the questionnaire processing unit 131.

The questionnaire processing unit 131 generates the assessment data 120 based on the answers obtained from the questionnaire. For example, the questionnaire processing unit 131 acquires the gender, age, family relationships, and similar items of the customer U obtained from the questionnaire and performs an error check. The questionnaire processing unit 131 flags as an error, for example, an age that is implausibly young or an age over 100. The questionnaire processing unit 131 stores the questionnaire result after the error check in the storage unit 12 as the assessment data 120.
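The age check described above might look like the following sketch. The upper bound of 100 comes from the text; the lower bound of 18, the dictionary layout, and the error messages are hypothetical, since the source only says "extremely young".

```python
def check_questionnaire(answer, min_age=18, max_age=100):
    """Return a list of error messages for one questionnaire answer.

    answer:  hypothetical dict with at least an integer "age" entry.
    min_age: assumed lower bound (not given in the source).
    max_age: upper bound; ages above 100 are treated as errors per the text.
    """
    errors = []
    age = answer.get("age")
    if not isinstance(age, int):
        errors.append("age is missing or not an integer")
    elif age < min_age:
        errors.append("age is implausibly low")
    elif age > max_age:
        errors.append("age exceeds the upper bound")
    return errors


print(check_questionnaire({"age": 45, "gender": "F"}))  # []
print(check_questionnaire({"age": 130}))  # ['age exceeds the upper bound']
```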

The acquisition unit 130 acquires the image data D1 and D2 and outputs them to the image processing unit 132.

The image processing unit 132 generates the emotion data 121 based on the image data D1. The image processing unit 132 extracts feature points from the face image shown in the image data D1. A feature point is a point that can serve as a characteristic when estimating an emotion or the worry level. The feature points are, for example, a group of points indicating the shapes of the eyebrows, eyes, and lips and the positions of the outer corners of the eyes and the corners of the mouth. The image processing unit 132 estimates the emotion of the customer U based on the extracted feature points. For example, when the eyebrows slope downward toward their outer ends and the corners of the mouth are below the center of the lips, the image processing unit 132 estimates that sadness is high and joy is low. Conversely, when the eyebrows do not slope downward toward their outer ends and the corners of the mouth are above the center of the lips, the image processing unit 132 estimates that sadness is low and joy is high. The image processing unit 132 estimates other emotions, such as surprise, anger, fear, and neutrality, in a similar manner. For example, the image processing unit 132 estimates the degrees of joy, sadness, surprise, anger, fear, neutrality, and so on as proportions. The image processing unit 132 stores the estimated emotions in the storage unit 12 as the emotion data 121.
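The eyebrow and mouth-corner rule, together with the output as proportions, can be illustrated with a deliberately simplified sketch. The two scalar inputs, the scoring rule, and the constant baseline for "neutral" are all assumptions; an actual implementation would work from a full set of facial landmarks and cover every emotion.

```python
EMOTIONS = ("joy", "sadness", "surprise", "anger", "fear", "neutral")


def estimate_emotions(brow_outer_drop, mouth_corner_lift):
    """Return emotion proportions that sum to 1.

    brow_outer_drop   > 0: the eyebrow falls toward its outer end.
    mouth_corner_lift > 0: mouth corners above the lip centre; < 0: below it.
    Only joy and sadness are scored in this toy rule; the rest stay at 0.
    """
    raw = {emotion: 0.0 for emotion in EMOTIONS}
    raw["neutral"] = 1.0  # baseline so the total is never zero
    if brow_outer_drop > 0 and mouth_corner_lift < 0:
        raw["sadness"] = brow_outer_drop - mouth_corner_lift
    elif brow_outer_drop <= 0 and mouth_corner_lift > 0:
        raw["joy"] = mouth_corner_lift - brow_outer_drop
    total = sum(raw.values())
    return {emotion: score / total for emotion, score in raw.items()}


sad_face = estimate_emotions(brow_outer_drop=1.0, mouth_corner_lift=-1.0)
print(sad_face["sadness"] > sad_face["joy"])  # True
```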

The image processing unit 132 generates the worry level data 122 based on the image data D1. The image processing unit 132 extracts feature points from the face image shown in the image data D1 and inputs information indicating the extracted feature points to the worry level estimation model. The worry level estimation model outputs the worry level estimated from the input feature points. The image processing unit 132 takes the worry level output by the model as the worry level of that face image and stores it in the storage unit 12 as the worry level data 122.

The image processing unit 132 generates the empathy data 123 based on the image data D1 and D2. The image processing unit 132 estimates the emotions of the customer U and the person in charge T from the face images shown in the image data D1 and D2, respectively. The method of estimating each of these emotions is the same as the method the image processing unit 132 uses to generate the emotion data 121, so its description is omitted.

The image processing unit 132 generates the empathy data 123 based on how close the emotions of the customer U and the person in charge T are. For example, the image processing unit 132 calculates the position of the customer U and the position of the person in charge T in a multidimensional emotion vector space in which each emotion is treated as a vector. The image processing unit 132 calculates the distance from the position of the customer U to the position of the person in charge T and takes a value corresponding to that distance as the degree of empathy. For example, the image processing unit 132 calculates the degree of empathy using formula (1). In formula (1), a_i denotes the estimated value of the i-th type of emotion, emotion (i), for the customer U, and b_i denotes the estimated value of emotion (i) for the person in charge T. Here i is a natural number from 1 to n, and n is the number of emotions estimated by the image processing unit 132. Which type of emotion is assigned to each emotion (i) may be set arbitrarily; for example, emotion (1) is set to joy and emotion (2) to sadness.

As formula (1) indicates, when the estimated values of emotion (i) for the customer U and the person in charge T are close and their difference is near 0 (zero), the degree of empathy approaches 1. Conversely, when the difference between the estimated values of emotion (i) is large, the degree of empathy approaches 0. A higher degree of empathy therefore indicates that the customer U and the person in charge T empathize more, and a lower degree of empathy indicates that they empathize less. The image processing unit 132 stores the calculated degree of empathy in the storage unit 12 as the empathy data 123.
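Formula (1) itself is not reproduced in this text. As an illustration only, one formula consistent with the description (value 1 when all differences are zero, value 0 when the differences are as large as possible) is empathy = 1 - (1/2) * sum over i of |a_i - b_i|, under the assumption that each person's emotion values are proportions summing to 1. The sketch below implements that assumed formula; it should not be read as the formula used in the patent.

```python
def empathy_degree(customer, staff):
    """Return a degree of empathy in [0, 1] from two emotion-proportion dicts.

    customer: emotion name -> estimated value a_i for the customer U.
    staff:    emotion name -> estimated value b_i for the person in charge T.
    Assumes both dicts use the same emotion names and each sums to 1,
    so the sum of absolute differences lies between 0 and 2.
    """
    if customer.keys() != staff.keys():
        raise ValueError("both inputs must cover the same emotions")
    total_difference = sum(abs(customer[name] - staff[name]) for name in customer)
    return 1.0 - 0.5 * total_difference


customer_u = {"joy": 0.75, "sadness": 0.25}
staff_t = {"joy": 0.25, "sadness": 0.75}
print(empathy_degree(customer_u, staff_t))  # 0.5
```

A Euclidean distance in the same emotion space, rescaled to the range 0 to 1, would match the "distance in a vector space" wording equally well.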

The acquisition unit 130 acquires the audio data V and outputs it to the speech processing unit 133.

The speech processing unit 133 generates the keyword data 124 based on the audio data V. The speech processing unit 133 converts the speech in the audio data V into text using speech recognition technology. Any speech recognition technology may be used. For example, the speech processing unit 133 performs speech recognition by converting the speech into phonemes using an acoustic model for each phoneme, converting the resulting sequence of phonemes into words, and outputting them, and then converts the recognition result into text. The speech processing unit 133 searches the words contained in the text and extracts the words that have appeared at least a predetermined number of times between the start of the dialogue and the present. The speech processing unit 133 stores the extracted words in the storage unit 12 as the keyword data 124.

The voice processing unit 133 also generates the case text data 125 based on the voice data V. For example, the voice processing unit 133 expresses the sentence obtained by converting the voice data V into text (hereinafter, the first sentence) as a vector based on its content, for instance by using Doc2Vec. When sentences are expressed as vectors with Doc2Vec, dialogues with similar content, such as talk about nursing care or family, are mapped to vectors with similar values.

The voice processing unit 133 refers to a database holding vector representations of sentences obtained by converting past dialogues into text (hereinafter, second sentences). It extracts, as similar cases, those second sentences whose position in the vector space (hereinafter, the second position) lies within a predetermined distance of the position of the first sentence in the same vector space (hereinafter, the first position). The voice processing unit 133 stores the extracted similar cases in the storage unit 12 as the case text data 125.
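The distance-based extraction described above can be sketched as follows. The sketch assumes that the vectors have already been produced by a document-embedding model such as Doc2Vec, that Euclidean distance is the distance measure, and that `1 / (1 + distance)` is an acceptable way to express closeness as a score; none of these choices, nor the name `similar_cases`, is given in the specification.

```python
import math


def similar_cases(query, past_cases, max_distance):
    """Return (case_id, distance, score) for past cases near the query vector.

    Assumptions: vectors were produced beforehand by a document-embedding
    model such as Doc2Vec; Euclidean distance is used; the score
    1 / (1 + distance) is a hypothetical way to express closeness.
    """
    hits = []
    for case_id, vector in past_cases.items():
        distance = math.dist(query, vector)
        if distance <= max_distance:
            hits.append((case_id, distance, 1.0 / (1.0 + distance)))
    return sorted(hits, key=lambda hit: hit[1])  # nearest case first


past = {
    "case1": [0.0, 1.0],
    "case2": [3.0, 4.0],
    "case3": [0.0, 0.5],
}
for hit in similar_cases([0.0, 0.0], past, max_distance=1.0):
    print(hit)
```

A linear scan is enough for a small case database; a larger one would likely call for an approximate nearest-neighbour index.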

装置制御部１３４は、サーバ装置１０を統括的に制御する。装置制御部１３４は、例えば、通信部１１がアンケートの回答を受信した場合に、そのアンケートの回答を取得部１３０に出力する。装置制御部１３４は、画像処理部１３２が感情データ１２１などのデータを生成すると、そのデータを通信部１１に出力して端末装置２０に通知する。装置制御部１３４は、音声処理部１３３がキーワードデータ１２４などのデータを生成すると、そのデータを通信部１１に出力して端末装置２０に通知する。 The device control unit 134 controls the server device 10 as a whole. For example, when the communication unit 11 receives a questionnaire answer, the device control unit 134 outputs the questionnaire answer to the acquisition unit 130 . When the image processing unit 132 generates data such as the emotion data 121 , the device control unit 134 outputs the data to the communication unit 11 and notifies the terminal device 20 of the data. When the voice processing unit 133 generates data such as the keyword data 124, the device control unit 134 outputs the data to the communication unit 11 and notifies the terminal device 20 of it.

FIG. 3 is a diagram showing an example of the assessment data 120 according to the embodiment. The assessment data 120 includes, for example, items such as assessment No., customer U, and prospective resident. The example assumes that the dialogue is a consultation about a nursing care facility into which a family member of the customer U would move. The questionnaire therefore contains items about that family member, that is, the prospective resident, and the answers to those items are stored in the prospective-resident item of the assessment data 120.

The assessment No. is identification information, such as a number, that uniquely identifies the assessment data 120. The customer U item stores information about the customer obtained from the questionnaire answers, for example the customer's name and address. The prospective-resident item stores information about the prospective resident obtained from the questionnaire answers, for example that person's name and address.

FIG. 4 is a diagram showing an example of the emotion data 121A of the customer U according to the embodiment. The emotion data 121A includes, for example, items such as customer U face image No., time, joy, sadness, surprise, and anger. The customer U face image No. is identification information, such as a number, that uniquely identifies a face image of the customer U obtained from the image data D1. The time item indicates when the face image was captured and is set, for example, relative to the start time of the dialogue. The joy, sadness, surprise, anger, and similar items each store the ratio, estimated by the image processing unit 132, to which the face shows that emotion.

FIG. 5 is a diagram showing an example of the emotion data 121B of the person in charge T according to the embodiment. The structure of the emotion data 121B is the same as that of the emotion data 121A, so its description is omitted. Emotions estimated from the face image of the person in charge T may be stored in the storage unit 12 in the same manner as the emotion data 121A.

FIG. 6 is a diagram showing an example of the worry level data 122 according to the embodiment. The worry level data 122 includes, for example, items such as customer U face image No., time, and worry level. The customer U face image No. and the time are the same as in the emotion data 121A, so their description is omitted. The worry level item stores the worry level estimated by the image processing unit 132.

FIG. 7 is a diagram showing an example of the empathy data 123 according to the embodiment. The empathy data 123 includes, for example, items such as time and degree of empathy. The time is the same as the time in the emotion data 121A, so its description is omitted. The degree-of-empathy item stores the degree of empathy calculated by the image processing unit 132.

FIG. 8 is a diagram showing an example of the keyword data 124 according to the embodiment. The keyword data 124 includes, for example, items such as text data and frequent keywords. The text data is the voice data V converted into text and is updated, for example, as the dialogue proceeds. A frequent keyword is a word that appears a predetermined number of times or more in the dialogue; a plurality of words, such as keyword 1 and keyword 2, may be stored.

FIG. 9 is a diagram showing an example of the case text data 125 according to the embodiment. The case text data 125 includes, for example, items such as text data and similar cases. The text data is the same as the text data in the keyword data 124, so its description is omitted. The similar-case item holds the similar cases extracted by the voice processing unit 133; a plurality of similar cases, such as case 1 and case 2, may be stored. The case text data 125 may also store a "similarity score" indicating how similar each similar case is. The similarity score is calculated, for example, according to how close the sentences are to each other in the vector space.

Note that the learned model 126 stores the information needed to construct the worry-level estimation model, specifically the model configuration and the values of the parameters used. For example, when the model is a CNN (Convolutional Neural Network), the model configuration is information indicating the number of units in each of the input, intermediate, and output layers, the number of intermediate layers, the activation function, and so on. The parameters are information indicating the coupling coefficients and weights that connect the nodes of each layer.

ここで、端末装置２０に表示される画像の例について説明する。図１０及び図１１は、実施形態に係る端末装置２０に表示される画像の例を示す図である。 Here, examples of images displayed on the terminal device 20 will be described. 10 and 11 are diagrams showing examples of images displayed on the terminal device 20 according to the embodiment.

図１０に示すように、端末装置２０のディスプレイには、例えば、画像２００～２０７が表示される。画像２００～２０２は、担当者Ｔが操作するボタン画像の例を示している。例えば、画像２００は、対話においてリアルタイムに進行する状況を表示させるためのボタン画像である。画像２０１は、顧客Ｕのアセスメントデータを表示させるためのボタン画像である。画像２０２は、画像の表示を終了させる場合に選択されるボタン画像である。なお、図１０の例では、画像２００が選択され、対話においてリアルタイムに進行する状況が表示された場合の例が示されている。 As shown in FIG. 10, images 200 to 207 are displayed on the display of the terminal device 20, for example. Images 200 to 202 show examples of button images that the person in charge T operates. For example, the image 200 is a button image for displaying the progress of dialogue in real time. An image 201 is a button image for displaying customer U's assessment data. An image 202 is a button image that is selected when ending the display of the image. Note that the example of FIG. 10 shows an example in which the image 200 is selected and the situation in which the dialogue progresses in real time is displayed.

The image 203 shows the face images of the customer U and the person in charge T. It changes in real time, for example, as the dialogue progresses. Feature points may be shown on each of the face images of the customer U and the person in charge T in the image 203.

画像２０４は、頻出キーワードを示している。画像２０４は、サーバ装置１０から通知されるキーワードデータ１２４に基づいて表示され、例えば、対話の進行に応じて、リアルタイムに変化する。 The image 204 shows frequent keywords. The image 204 is displayed based on the keyword data 124 notified from the server device 10, and changes in real time according to the progress of the dialogue, for example.

画像２０５は、顧客Ｕの感情分析の結果を示している。画像２０５は、サーバ装置１０から通知される感情データ１２１に基づいて表示され、例えば、対話の進行に応じて、リアルタイムに変化する。 Image 205 shows the result of customer U's sentiment analysis. The image 205 is displayed based on the emotion data 121 notified from the server device 10, and changes in real time according to the progress of the dialogue, for example.

画像２０６は、類似ケースを示している。画像２０６は、サーバ装置１０から通知されるケーステキストデータ１２５に基づいて表示され、例えば、対話の進行に応じて、リアルタイムに変化する。 Image 206 shows a similar case. The image 206 is displayed based on the case text data 125 notified from the server device 10, and changes in real time according to the progress of the dialogue, for example.

The image 207 shows the closeness state, that is, the degree to which the person in charge T is staying close to the feelings of the customer U. FIG. 11 shows an enlarged example of the image 207. As shown in FIG. 11, the image 207 plots the worry level in its upper part and the degree of empathy in its lower part, each as a time series. In this example, a low worry level indicates a relaxed state and a high worry level indicates a serious state. A low degree of empathy indicates dissonance, while a high degree of empathy indicates that the person in charge T is keeping pace with the customer U and staying close to the customer U's feelings.
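The reading of the two graphs can be expressed as a small Python helper. The sketch assumes that both values lie between 0 and 1 and that a single cut-off separates "low" from "high"; the specification gives no such threshold, so the value 0.5 and the name `closeness_labels` are purely hypothetical.

```python
def closeness_labels(worry: float, empathy: float, threshold: float = 0.5) -> tuple[str, str]:
    """Map a worry level and a degree of empathy to the labels used in FIG. 11.

    Assumption: both inputs lie in [0, 1] and one hypothetical threshold
    separates "low" from "high"; the specification gives no cut-off value.
    """
    worry_label = "serious" if worry >= threshold else "relaxed"
    empathy_label = "accompanying" if empathy >= threshold else "dissonant"
    return worry_label, empathy_label


print(closeness_labels(worry=0.8, empathy=0.3))
print(closeness_labels(worry=0.2, empathy=0.9))
```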

ここで、サーバ装置１０が行う処理の流れについて説明する。図１２及び図１３は、実施形態に係るサーバ装置１０が行う処理の流れを示すフローチャートである。 Here, the flow of processing performed by the server device 10 will be described. 12 and 13 are flowcharts showing the flow of processing performed by the server device 10 according to the embodiment.

FIG. 12 shows the flow of processing performed using the image data D1 and D2. FIG. 13 shows the flow of processing performed using the voice data V.

As shown in FIG. 12, the server device 10 acquires the image data D1 at time t (step S10). The server device 10 extracts facial feature points from the acquired image data D1 (step S11). Based on the extracted feature points, the server device 10 estimates the emotion of the customer U and stores the estimated emotion as the emotion data 121A (step S12). The server device 10 also estimates the worry level of the customer U based on the feature points extracted in step S11 and the worry-level estimation model, and stores the estimated worry level as the worry level data 122 (step S13).

一方、サーバ装置１０は、時刻ｔにおける画像データＤ２を取得する（ステップＳ１４）。サーバ装置１０は、取得した画像データＤ２から、顔の特徴点を抽出する（ステップＳ１５）。サーバ装置１０は、抽出した特徴点に基づいて、担当者Ｔの感情を推定し、推定した感情を感情データ１２１Ｂとして記憶させる（ステップＳ１６）。 On the other hand, the server device 10 acquires the image data D2 at time t (step S14). The server device 10 extracts facial feature points from the acquired image data D2 (step S15). Server device 10 estimates the emotion of person in charge T based on the extracted feature points, and stores the estimated emotion as emotion data 121B (step S16).

Based on the emotion of the customer U estimated in step S12 and the emotion of the person in charge T estimated in step S16, the server device 10 estimates the degree of empathy between the customer U and the person in charge T and stores it as the empathy data 123 (step S17).

サーバ装置１０は、時刻をｔから、ｔ＋Δｔに進め（ステップＳ１８）、対話が終了するなどして、画像データＤ１、Ｄ２の取得が終了するまで、ステップＳ１０～ステップＳ１７に示す処理を繰り返し行う。なお、感情を推定する処理については、顧客Ｕと担当者Ｔのどちらを先に推定してもよい。具体的には、ステップＳ１０～Ｓ１２に示す処理を行う前に、ステップＳ１４～Ｓ１６に示す処理が行われてもよい。 The server device 10 advances the time from t to t+Δt (step S18), and repeats the processing shown in steps S10 to S17 until the acquisition of the image data D1 and D2 ends, such as when the dialogue ends. Regarding the process of estimating emotion, either the customer U or the person in charge T may be estimated first. Specifically, the processes shown in steps S14 to S16 may be performed before performing the processes shown in steps S10 to S12.
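The loop of FIG. 12 can be sketched in Python as follows. This is only an outline under stated assumptions: the feature extractor, the emotion estimator, the worry-level model, and the empathy formula are passed in as hypothetical callables, and the function name `run_image_flow` and the record layout are illustrative rather than taken from the specification.

```python
from typing import Callable, Iterable


def run_image_flow(
    frames: Iterable[tuple[float, object, object]],
    extract: Callable[[object], list[float]],
    estimate_emotion: Callable[[list[float]], dict[str, float]],
    estimate_worry: Callable[[list[float]], float],
    empathy: Callable[[dict[str, float], dict[str, float]], float],
) -> list[dict]:
    """Process (time, D1, D2) frames in the order of steps S10 to S18.

    All callables are hypothetical placeholders for the feature extractor,
    the emotion estimator, the worry-level model, and the empathy formula.
    """
    records = []
    for t, d1, d2 in frames:  # S10 and S14: image data D1, D2 at time t
        customer_points = extract(d1)                         # S11
        customer_emotion = estimate_emotion(customer_points)  # S12
        worry = estimate_worry(customer_points)               # S13
        staff_points = extract(d2)                            # S15
        staff_emotion = estimate_emotion(staff_points)        # S16
        records.append({
            "time": t,
            "customer": customer_emotion,
            "staff": staff_emotion,
            "worry": worry,
            "empathy": empathy(customer_emotion, staff_emotion),  # S17
        })
    return records  # moving on to the next frame corresponds to S18


# Toy stand-ins so that the sketch runs end to end.
demo = run_image_flow(
    frames=[(0.0, [0.8], [0.6]), (1.0, [0.2], [0.2])],
    extract=lambda image: list(image),
    estimate_emotion=lambda pts: {"joy": pts[0]},
    estimate_worry=lambda pts: 1.0 - pts[0],
    empathy=lambda a, b: 1.0 - abs(a["joy"] - b["joy"]),
)
print([round(r["empathy"], 2) for r in demo])
```

Because steps S12 and S13 both depend only on the feature points from step S11, and the customer and staff branches are independent of each other, the order of those calls inside the loop could be changed without affecting the result.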

図１３に示すように、サーバ装置１０は、時刻ｔにおける音声データＶを取得する（ステップＳ２０）。サーバ装置１０は、取得した音声データＶを、音声認識技術を用いてテキスト化し、キーワードデータ１２４のテキストデータとして記憶させる（ステップＳ２１）。サーバ装置１０は、テキストデータから、頻出キーワードを抽出し、抽出した頻出キーワードをキーワードデータ１２４に記憶させる（ステップＳ２２）。 As shown in FIG. 13, the server device 10 acquires voice data V at time t (step S20). The server device 10 converts the obtained voice data V into text using voice recognition technology and stores it as text data of the keyword data 124 (step S21). The server device 10 extracts frequently appearing keywords from the text data, and stores the extracted frequently appearing keywords in the keyword data 124 (step S22).

また、サーバ装置１０は、類似ケースを抽出し、抽出した類似ケースをケーステキストデータ１２５として記憶させる（ステップＳ２３）。サーバ装置１０は、ステップＳ２１でテキスト化した文章を、Ｄｏｃ２Ｖｅｃなどを用いてベクトル表現する。サーバ装置１０は、過去の対話などのデータベースを参照し、ベクトル空間において、今回の対話と近いベクトル値をもつ対話を、類似ケースとして抽出する。サーバ装置１０は、時刻をｔから、ｔ＋Δｔに進め（ステップＳ２４）、対話が終了するなどして、音声データＶの取得が終了するまで、ステップＳ２０～ステップＳ２３に示す処理を繰り返し行う。 The server device 10 also extracts similar cases and stores the extracted similar cases as case text data 125 (step S23). The server device 10 vector-expresses the sentence converted to text in step S21 using Doc2Vec or the like. The server device 10 refers to a database of past conversations, etc., and extracts, as similar cases, conversations having vector values close to the current conversation in the vector space. The server device 10 advances the time from t to t+Δt (step S24), and repeats the processing shown in steps S20 to S23 until acquisition of the voice data V ends, such as when the dialogue ends.

The server device 10 periodically transmits the emotion data 121 through the case text data 125 stored in the above flowcharts (hereinafter, the emotion data 121 and the like) to the terminal device 20. The server device 10 may transmit the emotion data 121 and the like to the terminal device 20 each time it generates them, or may transmit them to the terminal device 20 in batches.

As described above, the server device 10 of the embodiment includes the acquisition unit 130, the image processing unit 132, and the communication unit 11. The acquisition unit 130 acquires the image data D1 and the image data D2. The image processing unit 132 estimates the emotion shown by the face of the customer U in the image data D1 and the emotion shown by the face of the person in charge T in the image data D2. The image processing unit 132 calculates the degree to which the estimated emotion of the customer U and the estimated emotion of the person in charge T are similar as the degree of empathy in the dialogue. The communication unit 11 transmits the degree of empathy calculated by the image processing unit 132 to the terminal device 20.

ここで、サーバ装置１０は、「対話支援サーバ」の一例である。カメラＣ１は、「第１カメラ」の一例である。カメラＣ２は、「第２カメラ」の一例である。画像データＤ１は、「第１画像データ」の一例である。画像データＤ２は、「第２画像データ」の一例である。画像データＤ１が示す顧客Ｕの顔の感情は、「第１感情」の一例である。画像データＤ２が示す担当者Ｔの顔の感情は、「第２感情」の一例である。 Here, the server device 10 is an example of a "dialogue support server." The camera C1 is an example of a "first camera". Camera C2 is an example of a “second camera”. The image data D1 is an example of "first image data". The image data D2 is an example of "second image data". The emotion of the customer U's face indicated by the image data D1 is an example of the "first emotion". The emotion of the person in charge T's face indicated by the image data D2 is an example of the "second emotion".

As a result, the server device 10 of the embodiment can estimate the degree of empathy in the dialogue and present it to the person in charge T. The degree of empathy is the degree to which the emotion of the customer U and the emotion of the person in charge T are similar. The person in charge T can therefore check, during the dialogue, whether he or she is empathizing with the customer U. If the degree of empathy is low, the person in charge T can respond so as to raise it, for example by changing his or her own facial expression to bring it closer to the emotion of the customer U. In this way, the person in charge can stay close to the customer's feelings.

It has been pointed out that being emotionally accepted and empathized with is valuable for building a relationship of trust with the customer U (see, for example, 今井、雄西、坂東, "納得の概念分析", 日本看護研究学会雑誌, Vol. 39, No. 2, 2016; and 今井、雄西、坂東, "転移のある高齢がん患者の治療に対する納得の要素", 日本がん看護学会誌, Vol. 30 (2016), No. 3, pp. 19-28). It has also been suggested that mirroring is a useful technique for building trust. In this embodiment, the degree to which the emotions of the customer U and the person in charge T resemble each other (the synchronization rate) is calculated from their face images as the degree of empathy and presented to the person in charge. As a result, even an inexperienced person in charge can conduct the dialogue in a way that accepts the customer's feelings and earns the customer's empathy, making it possible to build a relationship of trust with the customer U.

また、実施形態のサーバ装置１０では、画像処理部１３２は、顧客Ｕの顔が特定の対象感情（例えば、喜びや悲しみなどの感情）を示す度合（第１度合）を推定する。画像処理部１３２は、担当者Ｔの顔についても、その特定の対象感情を示す度合（第２度合）を推定する。画像処理部１３２は、例えば、（１）式に示すように、第１度合と第２度合の差分の絶対値に基づいて共感度を算出する。これにより、実施形態のサーバ装置１０では、定量的に、共感度を算出することができる。 Further, in the server device 10 of the embodiment, the image processing unit 132 estimates the degree (first degree) to which the face of the customer U expresses a specific target emotion (for example, an emotion such as joy or sadness). The image processing unit 132 also estimates the degree (second degree) of the person in charge T's face indicating the specific target emotion. For example, the image processing unit 132 calculates the degree of empathy based on the absolute value of the difference between the first degree and the second degree, as shown in Equation (1). As a result, the server device 10 of the embodiment can quantitatively calculate the degree of empathy.

In the server device 10 of the embodiment, the image processing unit 132 also estimates the worry level of the customer U. The image processing unit 132 inputs the feature points of the face of the customer U shown in the image data D1 into the worry-level estimation model and takes the resulting output as the worry level. The worry-level estimation model is a model that has learned, by machine learning on a training data set, the correspondence between a face image and whether the face is a worried one. The training data set consists of pairs in which each training face image is associated with a label indicating whether it shows a worried face. The worry-level estimation model estimates the degree to which an input face image shows a worried face. The server device 10 of the embodiment can thus calculate the worry level using a machine learning technique.
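The idea of learning a worry level from labelled examples can be illustrated with a very small Python sketch. Note the substitution: the specification mentions a CNN as one possible model, whereas this sketch uses plain logistic regression trained by gradient descent, chosen only because it fits in a few lines. The function names, the single toy feature, and the training values are all hypothetical.

```python
import math


def train_worry_model(samples, labels, epochs=2000, lr=0.5):
    """Fit a logistic-regression model by stochastic gradient descent.

    Assumption: this simple model stands in for the CNN mentioned in the
    specification purely for illustration. `samples` are feature vectors
    derived from facial feature points; `labels` are 1 (worried) or 0.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = bias + sum(w * v for w, v in zip(weights, x))
            error = 1.0 / (1.0 + math.exp(-z)) - y  # prediction minus label
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def worry_level(model, features):
    """Return the estimated degree (0 to 1) to which the face looks worried."""
    weights, bias = model
    z = bias + sum(w * v for w, v in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Toy data: one hypothetical feature, e.g. a normalized brow-furrow measure.
model = train_worry_model([[0.1], [0.2], [0.8], [0.9]], [0, 0, 1, 1])
print(worry_level(model, [0.95]) > 0.5, worry_level(model, [0.05]) < 0.5)
```

The output of `worry_level` is a value between 0 and 1, which matches the description of the model as estimating a degree rather than a yes/no decision.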

The server device 10 of the embodiment further includes the voice processing unit 133. The acquisition unit 130 acquires the voice data V. The voice processing unit 133 extracts frequent keywords from the voice data V: it converts the speech contained in the voice data V into text and extracts, as frequent keywords, the keywords whose frequency of appearance in that text is equal to or greater than a threshold. The communication unit 11 transmits the frequent keywords extracted by the voice processing unit 133 to the terminal device 20. The server device 10 of the embodiment can thus present words that appear frequently in the dialogue to the person in charge T. The person in charge T can therefore recognize the words that the customer U says repeatedly and can steer the dialogue toward the matters in which the customer U is interested.

In the server device 10 of the embodiment, the voice processing unit 133 also extracts similar cases based on the frequent keywords extracted from the voice data V. The voice processing unit 133 takes the sentence obtained by converting the speech contained in the voice data V into text, calculates its position in a vector space according to its content using Doc2Vec or the like, and thereby expresses it as a vector. The voice processing unit 133 extracts dialogues located nearby in the vector space as similar cases. The communication unit 11 transmits the similar cases extracted by the voice processing unit 133 to the terminal device 20. The server device 10 of the embodiment can thus present dialogues similar to the current one to the person in charge T. The person in charge T can therefore recognize cases resembling the current dialogue and can proceed with the dialogue while referring to them.

なお、上述した実施例においては、顧客Ｕと担当者Ｔとが、対面にて、対話を行う場合を例示して説明した。しかしながら、これに限定されない。顧客Ｕと担当者Ｔとが、リモートにて、対話を行う場合にも、対話支援システム１を適用することができる。この場合、対話支援システム１は、２つのマイクＭ（以下、マイクＭ１、Ｍ２という）を備える。マイクＭ１は、顧客Ｕの音声を集音し、集音した音声である第１音声データをサーバ装置１０に送信する。マイクＭ２は、担当者Ｔの音声を集音し、集音した音声である第２音声データをサーバ装置１０に送信する。サーバ装置１０は、マイクＭ１から通知された第１音声データ、マイクＭ２から通知された第２音声データに基づいて、キーワードデータ１２４を生成する。また、サーバ装置１０は、第１音声データ、第２音声データに基づいて、ケーステキストデータ１２５を生成する。 In the above-described embodiment, the case where the customer U and the person in charge T have a face-to-face conversation has been exemplified and explained. However, it is not limited to this. The dialogue support system 1 can also be applied when the customer U and the person in charge T have a dialogue remotely. In this case, the dialogue support system 1 includes two microphones M (hereinafter referred to as microphones M1 and M2). The microphone M1 collects the voice of the customer U and transmits first voice data, which is the collected voice, to the server device 10 . The microphone M2 collects the voice of the person in charge T and transmits second voice data, which is the collected voice, to the server device 10 . The server device 10 generates the keyword data 124 based on the first voice data notified from the microphone M1 and the second voice data notified from the microphone M2. The server device 10 also generates the case text data 125 based on the first voice data and the second voice data.
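For the remote case, one way to combine the two microphone streams before keyword and similar-case extraction is to merge their transcripts by time. The sketch below assumes that each stream has already been transcribed into time-stamped utterances; the tuple layout, the speaker labels, and the name `merge_utterances` are hypothetical and not part of the specification.

```python
import heapq


def merge_utterances(customer, staff):
    """Merge two time-ordered utterance lists into one dialogue transcript.

    Assumption: each microphone stream has already been transcribed into
    (start_time_seconds, text) tuples sorted by time; the speaker labels
    "U" and "T" are hypothetical.
    """
    tagged_customer = ((t, "U", text) for t, text in customer)
    tagged_staff = ((t, "T", text) for t, text in staff)
    return list(heapq.merge(tagged_customer, tagged_staff))


m1 = [(0.0, "I am looking for a care home"), (7.5, "Cost worries me")]
m2 = [(3.2, "What matters most to you?")]
for time, speaker, text in merge_utterances(m1, m2):
    print(f"{time:5.1f} {speaker}: {text}")
```

Keeping the speaker label also makes it possible to restrict keyword counting to the customer's utterances if that turns out to be more useful.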

また、上述した実施例においては、図１２のフローにおいて、ステップＳ１１～Ｓ１３を順に実行する場合を例示して説明した。しかしながら、これに限定されない。ステップＳ１２、Ｓ１３は、共に、ステップＳ１１にて抽出された顔の特徴点を用いた処理である。このため、ステップＳ１２、Ｓ１３のうち何れを先に実行してもよい。すなわち、ステップＳ１１の次に、ステップＳ１３を実行してもよい。 Further, in the above-described embodiment, the case in which steps S11 to S13 are sequentially executed in the flow of FIG. 12 has been exemplified and explained. However, it is not limited to this. Both steps S12 and S13 are processes using the facial feature points extracted in step S11. Therefore, either of steps S12 and S13 may be executed first. That is, step S13 may be executed after step S11.

また、述した実施例においては、（１）式を用いて共感度を算出する場合を例示して説明した。しかしながら、これに限定されない。共感度を算出する数式は、少なくとも顧客Ｕと担当者Ｔの感情が類似する度合を算出することができれば、任意の数式であってよい。また、共感度を算出するためのテーブルや、計算モデル、学習モデルなどが用いられてもよい。 Moreover, in the above-described embodiment, the case where the degree of empathy is calculated using the formula (1) has been exemplified and explained. However, it is not limited to this. The formula for calculating the degree of empathy may be any formula as long as it can at least calculate the degree of similarity between the emotions of the customer U and the person in charge T. Also, a table for calculating empathy, a calculation model, a learning model, or the like may be used.

In addition to formula (1), formula (2) below, for example, may be used to calculate the degree of empathy. The symbols a_i and b_i in formula (2) are the same as a_i and b_i in formula (1).

（２）式を用いた場合、所定の範囲（ここでは、０～１の範囲）に収まるように、共感度を算出することができる。これにより、例えば、図１１のように、端末装置２０のディスプレイに、共感度を時系列のグラフにて表示させる場合に、極端に低い値や極端に大きな値が表示されてしまうことがない。このため、担当者Ｔが画面を見た場合に、共感度のグラフの表示分解能が粗なっていて共感度の変化が把握し難くなってしまうような事態を回避することができる。 When using formula (2), the degree of empathy can be calculated so as to fall within a predetermined range (here, the range of 0 to 1). As a result, for example, as shown in FIG. 11, when the degree of empathy is displayed as a time-series graph on the display of the terminal device 20, extremely low values and extremely large values are not displayed. Therefore, when the person in charge T looks at the screen, it is possible to avoid a situation in which the display resolution of the empathy level graph is coarse and it becomes difficult to grasp the change in the empathy level.

All or part of the dialogue support system 1 and the server device 10 in the embodiment described above may be implemented by a computer. In that case, a program for realizing these functions may be recorded on a computer-readable recording medium, and the functions may be realized by causing a computer system to read and execute the program recorded on that medium. The "computer system" here includes an OS and hardware such as peripheral devices. The "computer-readable recording medium" refers to portable media such as flexible disks, magneto-optical disks, ROMs, and CD-ROMs, and to storage devices such as hard disks built into a computer system. The "computer-readable recording medium" may further include a medium that holds the program dynamically for a short time, such as a communication line used when the program is transmitted over a network such as the Internet or over a communication line such as a telephone line, and a medium that holds the program for a certain period, such as volatile memory inside a computer system serving as the server or client in that case. The program may realize only part of the functions described above, may realize them in combination with a program already recorded in the computer system, or may be implemented using a programmable logic device such as an FPGA (Field Programmable Gate Array).

Although an embodiment of the present invention has been described in detail above with reference to the drawings, the specific configuration is not limited to this embodiment, and designs and the like that do not depart from the gist of the present invention are also included.

DESCRIPTION OF SYMBOLS: 1: dialogue support system; 10: server device (dialogue support server); 12: storage unit; 120: assessment data; 121: emotion data; 122: worry level data; 123: empathy level data; 124: keyword data; 125: case text data; 13: control unit; 130: acquisition unit; 131: questionnaire processing unit; 132: image processing unit; 133: audio processing unit; 20: terminal device

Claims

1. A dialogue support server comprising:
an acquisition unit that acquires first image data capturing the face of a customer during a dialogue between the customer and a person in charge, and second image data capturing the face of the person in charge during the dialogue;
an image processing unit that estimates a first emotion expressed by the face shown in the first image data, estimates a second emotion expressed by the face shown in the second image data, and calculates a degree of similarity between the first emotion and the second emotion as an empathy level during the dialogue; and
a communication unit that transmits the empathy level calculated by the image processing unit to a terminal device of the person in charge.

2. The dialogue support server according to claim 1, wherein the image processing unit estimates, as the first emotion, a first degree to which the face of the customer expresses a specific target emotion, estimates, as the second emotion, a second degree to which the face of the person in charge expresses the target emotion, and calculates the empathy level based on the absolute value of the difference between the first degree and the second degree.

3. The dialogue support server according to claim 1 or 2, wherein
the image processing unit estimates, as a worry level, the degree to which the customer is worried, based on feature points extracted from the first image data and on a worry level estimation model,
the communication unit transmits the empathy level and the worry level calculated by the image processing unit to the terminal device of the person in charge, and
the worry level estimation model is a model that has learned, by machine learning on a training data set in which each training face image is associated with a label indicating whether or not that image shows a worried face, the correspondence between face images and worried faces, and that estimates the degree to which an input face image shows a worried face.

4. The dialogue support server according to any one of claims 1 to 3, further comprising an audio processing unit, wherein
the acquisition unit acquires audio data collected by a microphone that picks up audio during the dialogue,
the audio processing unit converts the speech contained in the audio data into text and extracts, from the resulting text, a keyword whose frequency of appearance is equal to or higher than a threshold, and
the communication unit transmits the keyword extracted by the audio processing unit to the terminal device of the person in charge.

5. The dialogue support server according to claim 4, wherein
the audio processing unit refers, based on the keyword, to a dialogue database in which dialogues are stored, and extracts from the dialogue database, as similar cases, dialogues in which the keyword appears with a frequency equal to or higher than a threshold, and
the communication unit transmits the similar cases extracted by the audio processing unit to the terminal device of the person in charge.

6. A dialogue support system comprising:
a first camera that captures the face of a customer during a dialogue between the customer and a person in charge;
a second camera that captures the face of the person in charge during the dialogue;
a terminal device of the person in charge; and
the dialogue support server according to any one of claims 1 to 5, which acquires first image data captured by the first camera and second image data captured by the second camera and transmits the empathy level to the terminal device.

7. A dialogue support method performed by a computer device, the method comprising:
acquiring, by an acquisition unit, first image data capturing the face of a customer during a dialogue between the customer and a person in charge, and second image data capturing the face of the person in charge during the dialogue;
estimating, by an image processing unit, a first emotion expressed by the face shown in the first image data and a second emotion expressed by the face shown in the second image data, and calculating a degree of similarity between the first emotion and the second emotion as an empathy level during the dialogue; and
transmitting, by a communication unit, the empathy level calculated by the image processing unit to a terminal device of the person in charge.

8. A program for causing a computer to operate as the dialogue support server according to any one of claims 1 to 5, the program causing the computer to function as each unit included in the dialogue support server.