JP6746963B2

JP6746963B2 - Conversation evaluation device, program, and conversation evaluation method

Info

Publication number: JP6746963B2
Application number: JP2016042271A
Authority: JP
Inventors: 英樹阪梨; 嘉山　啓; 啓嘉山
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2016-03-04
Filing date: 2016-03-04
Publication date: 2020-08-26
Anticipated expiration: 2036-03-04
Also published as: JP2017156688A

Description

The present invention relates to a technique for evaluating spoken conversation.

Techniques for evaluating various matters by analyzing uttered speech have been proposed. For example, Patent Document 1 discloses a technique for estimating the psychological or physiological state of a speaker from the intervals between fundamental tones in the pitch sequence of the uttered speech.

Patent Document 1: Japanese Patent No. 4495907

However, the technique of Patent Document 1 merely estimates the state of a specific speaker. It cannot objectively evaluate, for example, a spoken conversation between a plurality of speakers (for example, the impression given by the voice of a response to an utterance). In view of the above circumstances, an object of the present invention is to evaluate a spoken conversation objectively.

To solve the above problem, a conversation evaluation device according to a first aspect of the present invention includes a feature acquisition unit that acquires a feature amount of the voices constituting a conversation, an information generation unit that generates, for the conversation, related information of a kind different from the feature amount, and a conversation evaluation unit that evaluates the conversation according to the feature amount and the related information. In this aspect, the conversation can be evaluated objectively according to the feature amount of the voices that constitute it. In addition, because related information of a kind different from the feature amount is taken into account together with the feature amount, the conversation can be evaluated more appropriately than in a configuration in which only the feature amount is reflected in the evaluation.

In a preferred aspect of the present invention, the feature acquisition unit acquires the feature amount under a condition that depends on the related information. In this aspect, the feature amount can be acquired more appropriately than in a configuration that does not use the related information for acquiring the feature amount.

A conversation evaluation device according to a second aspect of the present invention includes a feature acquisition unit that acquires a feature amount of the voices constituting a conversation, an information generation unit that generates, for the conversation, related information of a kind different from the feature amount, and a conversation evaluation unit that evaluates the conversation according to the feature amount, wherein the feature acquisition unit acquires the feature amount under a condition that depends on the related information. In this aspect, the conversation can be evaluated objectively according to the feature amount of the voices that constitute it. In addition, because the feature amount is acquired under a condition that depends on the related information, the feature amount can be acquired more appropriately than in a configuration that does not use the related information for acquiring the feature amount.

In a preferred example of the conversation evaluation device according to each of the above aspects, the feature acquisition unit acquires, as the feature amount, the pitch of each of a first voice and a second voice that constitute the conversation, and the conversation evaluation unit evaluates the conversation according to the pitch difference between the first voice and the second voice. In this aspect, because the conversation is evaluated according to the pitch difference between the first voice and the second voice, the quality of the impression given by a response voice can be evaluated objectively in terms of the relationship between the pitch of the uttered voice and the pitch of the response voice.

In each of the above aspects, the related information is, for example, information indicating at least one of the temporal situation of the conversation, the history of past conversations between the speakers of the conversation, the relationship between the speakers of the conversation, and the attributes of each speaker of the conversation.

FIG. 1 is a configuration diagram of the conversation evaluation device of the first embodiment.
FIG. 2 is a flowchart of the conversation evaluation process.
FIG. 3 is a configuration diagram of the conversation evaluation device of the second embodiment.
FIG. 4 is a configuration diagram of the conversation evaluation device of the third embodiment.
FIG. 5 is a configuration diagram of the conversation evaluation device of the fourth embodiment.

<First Embodiment>
FIG. 1 is a configuration diagram of a conversation evaluation device 100 according to the first embodiment of the present invention. The conversation evaluation device 100 of the first embodiment is an analysis device that evaluates a conversation between a user U1 and a user U2, and is suited, for example, to training users to hold conversations that leave a good impression. The conversation consists of a voice V1 uttered by the user U1 (an example of the first voice) and a voice V2 uttered by the user U2 (an example of the second voice).

In the first embodiment, it is assumed that the user U1 utters the voice V1 of an utterance containing, for example, a question or a remark addressed to the user U2, and that the user U2 utters the voice V2 of a response containing an answer to the question or a reply to the remark. The voice V2 uttered by the user U2 is, for example, an interjection. Examples of interjections include backchannels such as "un" and "ee" (roughly "uh-huh" and "yeah"); fillers that signal hesitation in responding, such as "eeto" and "anoo" ("um", "er"); answers that affirm or deny a question, such as "hai" ("yes") and "iie" ("no"); exclamations that express the speaker's emotion, such as "aa" and "oo" ("ah", "oh"); and expressions that ask the other speaker to repeat the utterance, such as "e?" and "nani?" ("huh?", "what?").

As illustrated in FIG. 1, the conversation evaluation device 100 of the first embodiment is realized by a computer system that includes a control device 12, a storage device 14, a display device 16, an input device 18, a sound collecting device 22, and a sound collecting device 24. For example, the conversation evaluation device 100 can be realized by a portable information processing device such as a mobile phone or a smartphone, or by an information processing device such as a personal computer. The conversation evaluation device 100 can also be realized by a plurality of devices configured separately from one another.

The sound collecting device 22 and the sound collecting device 24 are audio input devices that pick up ambient sound. The sound collecting device 22 generates a voice signal X1 representing the voice V1 uttered by the user U1, and the sound collecting device 24 generates a voice signal X2 representing the voice V2 uttered by the user U2. The A/D converters that convert the voice signal X1 and the voice signal X2 from analog to digital are omitted from the figure for convenience.

The control device 12 includes a processing circuit such as a CPU (Central Processing Unit) and performs overall control of each element of the conversation evaluation device 100. Specifically, the control device 12 evaluates the conversation between the user U1 and the user U2 by analyzing the voice signal X1 generated by the sound collecting device 22 and the voice signal X2 generated by the sound collecting device 24. The control device 12 of the first embodiment calculates an index S (hereinafter referred to as the "evaluation value") of how good an impression the response of the user U2 to the utterance of the user U1 gives.

The display device 16 (for example, a liquid crystal display panel) displays various images under the control of the control device 12. For example, the evaluation result (evaluation value S) of the conversation between the user U1 and the user U2 is displayed on the display device 16. The input device 18 receives instructions to the conversation evaluation device 100 from a user U (for example, the user U1 or the user U2). For example, a plurality of controls operated by the user U (U1, U2), or a touch panel that detects contact with the display surface of the display device 16, is suitably used as the input device 18.

The storage device 14 stores the program executed by the control device 12 and various data used by the control device 12. Any known recording medium, such as a semiconductor recording medium or a magnetic recording medium, or a combination of a plurality of recording media, may be adopted as the storage device 14. By executing the program stored in the storage device 14, the control device 12 of the first embodiment realizes a plurality of functions for evaluating the conversation between the user U1 and the user U2 (a feature acquisition unit 32, an information generation unit 34, and a conversation evaluation unit 36). A configuration in which the functions of the control device 12 are distributed across a plurality of devices, or a configuration in which a dedicated electronic circuit realizes some or all of the functions of the control device 12, may also be adopted.

The feature acquisition unit 32 acquires a feature amount of the voice V1 of the user U1 and a feature amount of the voice V2 of the user U2. The feature acquisition unit 32 of the first embodiment extracts the feature amount of the voice V1 by analyzing the voice signal X1 and extracts the feature amount of the voice V2 by analyzing the voice signal X2. Specifically, a feature amount relating to prosody is extracted for each of the voice V1 and the voice V2. Prosody refers to linguistic and phonetic characteristics that a listener can perceive and that cannot be grasped from the ordinary written form of the language alone.

The feature acquisition unit 32 of the first embodiment extracts, as feature amounts, the pitch P1 of the voice V1 of the user U1 and the pitch P2 of the voice V2 of the user U2. For example, the feature acquisition unit 32 extracts the average pitch P1 within an utterance section of the voice signal X1 and the average pitch P2 within an utterance section of the voice signal X2. An utterance section is a section over which an utterance continues (the section from the start point to the end point of a series of utterances). Any known voice analysis technique may be used to extract the pitch P1 and the pitch P2.

When the user U2 responds with a voice V2 whose pitch P2 stands in a specific relationship to the pitch P1 of the voice V1 uttered by the user U1, the user U1 tends to perceive the voice V2 as a pleasant, reassuring response that gives a good impression. Specifically, when the user U2 utters a pitch P2 that is in a consonant relationship with the pitch P1 of the user U1, the response of the user U2 is perceived favorably. Moreover, the part of the voice V1 that most strongly influences the impression of the response is the tail of its utterance section, that is, the part closest to the start point of the utterance section of the voice V2. For this reason, the feature acquisition unit 32 of the first embodiment determines the pitch P1 over a section of predetermined length (for example, 180 msec) located at the end of the utterance section of the voice V1 of the user U1.
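The tail-section pitch described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that a pitch tracker (not shown) has already produced one pitch estimate in Hz per frame, that unvoiced frames are marked with 0 or None, and that frames are 10 ms apart. The function name `tail_pitch` and its parameters are hypothetical.

```python
from statistics import fmean


def tail_pitch(f0_hz, frame_ms=10.0, tail_ms=180.0):
    """Mean pitch (Hz) over the last `tail_ms` of one utterance section.

    f0_hz: per-frame pitch estimates in Hz for the utterance section;
           0 or None marks an unvoiced frame (assumed convention).
    """
    # Number of frames that cover the tail section (at least one frame).
    n_tail = max(1, round(tail_ms / frame_ms))
    # Keep only voiced frames from the tail of the section.
    voiced = [f for f in f0_hz[-n_tail:] if f]
    if not voiced:
        raise ValueError("no voiced frames in the tail section")
    return fmean(voiced)
```

With 10 ms frames the 180 msec tail corresponds to the last 18 frames, so earlier frames of the utterance do not affect the result.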

The information generation unit 34 in FIG. 1 generates, for the conversation between the user U1 and the user U2, information R of a kind different from the feature amount extracted by the feature acquisition unit 32 (hereinafter referred to as "related information"). The related information R is information relating to the conversation between the user U1 and the user U2. The first embodiment uses related information R that indicates the temporal situation of the conversation. Specifically, the information generation unit 34 generates related information R that indicates, as the conversation situation, the conversation date and time (for example, the date and the time of day) and the conversation duration (for example, the time elapsed since the start of the conversation). For example, the information generation unit 34 determines the conversation date and time and the conversation duration by referring to the time kept by a clock circuit (not shown). That is, the date and time of the start point of the most recent utterance section in the voice V1 or the voice V2 is taken as the conversation date and time, and the time elapsed from the start of the earliest utterance section in the voice V1 or the voice V2 to the current time is taken as the conversation duration.
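A minimal sketch of how the conversation date and time and the conversation duration could be derived from utterance start times is given below. The `RelatedInfo` container and the `make_related_info` function are hypothetical names introduced only for illustration; the text does not prescribe a data layout.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RelatedInfo:
    conversation_datetime: datetime   # start of the most recent utterance section
    conversation_duration: timedelta  # time since the earliest utterance section began


def make_related_info(section_starts, now):
    """Build related information R from utterance-section start times.

    section_starts: start times of every utterance section observed so far
                    in voice V1 or voice V2.
    now: current time as read from the clock.
    """
    if not section_starts:
        raise ValueError("no utterance sections observed yet")
    return RelatedInfo(
        conversation_datetime=max(section_starts),
        conversation_duration=now - min(section_starts),
    )
```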

The conversation evaluation unit 36 evaluates the conversation between the user U1 and the user U2 according to the feature amounts extracted by the feature acquisition unit 32 (pitch P1 and pitch P2) and the related information R generated by the information generation unit 34. That is, the conversation evaluation unit 36 calculates an evaluation value S that depends on the pitch P1, the pitch P2, and the related information R. As understood from the above description, in the first embodiment the evaluation of the conversation takes into account not only the feature amounts of the voices (V1, V2) that constitute the conversation but also the related information R, which is distinct from those feature amounts. The evaluation value S calculated by the conversation evaluation unit 36 is displayed on the display device 16.

As described above, the user U1 tends to receive a good impression when the user U2 responds with a voice V2 whose pitch P2 is in a consonant relationship with the pitch P1 of the voice V1 of the user U1. In view of this tendency, the conversation evaluation unit 36 of the first embodiment calculates the evaluation value S according to the pitch difference ΔP between the pitch P1 and the pitch P2 (ΔP = |P1 − P2|). Specifically, the conversation evaluation unit 36 calculates the evaluation value S so that it becomes larger as the pitch difference ΔP comes closer to a consonant relationship. The consonant relationship in the first embodiment is, for example, an interval whose frequency ratio is close to a ratio of integers (for example, a perfect unison, a perfect octave, a perfect fifth, or a perfect fourth).
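One possible way to turn "closeness to a consonant relationship" into a number is sketched below. Several choices here are assumptions that the text does not specify: pitches are given in Hz, the pitch difference is measured in semitones and folded into one octave, the consonant set is limited to the four intervals named above, and the score falls off linearly within a tolerance of one semitone.

```python
import math

# Consonant intervals in semitones: unison, perfect fourth, perfect fifth, octave.
CONSONANT_SEMITONES = (0.0, 5.0, 7.0, 12.0)


def interval_semitones(p1_hz, p2_hz):
    """Absolute pitch difference between two pitches, in semitones."""
    return abs(12.0 * math.log2(p2_hz / p1_hz))


def consonance_score(p1_hz, p2_hz, max_score=100.0, tolerance=1.0):
    """Score that grows as the interval approaches a consonant interval.

    The interval is folded into one octave, and the score decreases linearly
    from max_score to 0 as the distance to the nearest consonant interval
    grows from 0 to `tolerance` semitones (assumed scoring rule).
    """
    interval = interval_semitones(p1_hz, p2_hz) % 12.0
    distance = min(abs(interval - c) for c in CONSONANT_SEMITONES)
    return max_score * max(0.0, 1.0 - distance / tolerance)
```

For example, a response at 300 Hz to an utterance ending at 200 Hz has a frequency ratio of 3:2, which is very close to a perfect fifth and therefore scores near the maximum, whereas a response a tritone away scores near zero.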

As illustrated above, the conversation evaluation unit 36 of the first embodiment reflects the pitch P1 and the pitch P2 in the evaluation value S and also takes into account the related information R on the conversation between the user U1 and the user U2. Specific relationships between the related information R (conversation date and time, conversation duration) and the evaluation value S are illustrated below.

A conversation held at night or on a holiday is likely to be a conversation between close friends, so the user U1 is more likely to receive a good impression of the user U2 than in, for example, a daytime conversation on a weekday (typically a business conversation). In view of this tendency, the conversation evaluation unit 36 calculates the evaluation value S so that it is larger when the conversation date and time specified by the related information R falls at night or on a holiday than when it falls in the daytime on a weekday. For example, the conversation evaluation unit 36 adds a predetermined value to the evaluation value S when the conversation date and time falls at night or on a holiday.
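The date-and-time rule above can be sketched as a simple bonus function. The night hours (19:00 to 06:00), the treatment of Saturday and Sunday as holidays, the optional set of extra holiday dates, and the bonus of 10 points are all illustrative assumptions; the text only states that a predetermined value is added.

```python
from datetime import date, datetime


def datetime_bonus(when, holidays=frozenset(), bonus=10.0,
                   night_start=19, night_end=6):
    """Bonus added to the evaluation value S for night or holiday conversations.

    when: conversation date and time taken from the related information R.
    holidays: extra holiday dates (datetime.date) beyond weekends.
    """
    is_weekend = when.weekday() >= 5          # Saturday=5, Sunday=6
    is_holiday = when.date() in holidays
    is_night = when.hour >= night_start or when.hour < night_end
    return bonus if (is_weekend or is_holiday or is_night) else 0.0
```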

When a conversation has continued for a long time, it is likely that the conversation between the user U1 and the user U2 is lively and that each is receiving a good impression of the other. In view of this tendency, the conversation evaluation unit 36 calculates the evaluation value S so that it becomes larger as the conversation duration specified by the related information R becomes longer. For example, the conversation evaluation unit 36 adds a predetermined value to the evaluation value S when the conversation duration exceeds a predetermined threshold. On the other hand, when a conversation lasts excessively long, fatigue of the user U1 and the user U2 may worsen their impressions of each other. In view of this tendency, points may instead be deducted from the evaluation value S when the conversation duration specified by the related information R exceeds a predetermined threshold. A plurality of different thresholds may also be used. For example, a first threshold and a second threshold may be set (first threshold < second threshold), with points added to the evaluation value S when the conversation duration lies between the first threshold and the second threshold and points deducted when the conversation duration is below the first threshold or above the second threshold. The value added to or deducted from the evaluation value S may also be varied stepwise for each range bounded by the thresholds.
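The two-threshold variant for the conversation duration can be sketched as follows. The threshold values (5 and 60 minutes), the bonus and penalty sizes, and the inclusive treatment of the boundaries are assumptions chosen only to make the example concrete.

```python
from datetime import timedelta


def duration_adjustment(duration,
                        first=timedelta(minutes=5),
                        second=timedelta(minutes=60),
                        bonus=10.0, penalty=5.0):
    """Adjustment to the evaluation value S based on the conversation duration.

    Adds `bonus` when first <= duration <= second; otherwise (too short or
    excessively long) returns a deduction of `penalty`.
    """
    if first <= duration <= second:
        return bonus
    return -penalty
```

A 20-minute conversation therefore receives the bonus, while a 2-minute or a 3-hour conversation receives the deduction.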

FIG. 2 is a flowchart of the process by which the control device 12 of the first embodiment evaluates the conversation between the user U1 and the user U2 (hereinafter referred to as the "conversation evaluation process"). The conversation evaluation process is started, for example, in response to an instruction given by a user U (U1, U2) to the input device 18 or in response to the start of an utterance by a user U.

When the conversation evaluation process of FIG. 2 starts, the feature acquisition unit 32 analyzes the voice signal X1 and the voice signal X2 and extracts, in turn, the pitch P1 of the voice V1 of the user U1 and the pitch P2 of the voice V2 of the user U2 (SA1). The information generation unit 34 generates the related information R (in the first embodiment, the conversation date and time and the conversation duration), for example by referring to the time kept by the clock circuit (SA2). The conversation evaluation unit 36 calculates the evaluation value S according to the feature amounts extracted by the feature acquisition unit 32 (pitch P1 and pitch P2) and the related information R generated by the information generation unit 34 (SA3). The order of the feature extraction by the feature acquisition unit 32 (SA1) and the generation of the related information R by the information generation unit 34 (SA2) may be reversed.
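The three steps of the flowchart can be sketched as a small driver function. The three callables stand in for the feature acquisition unit 32, the information generation unit 34, and the conversation evaluation unit 36; their names and signatures are hypothetical and chosen only to show the data flow.

```python
def conversation_evaluation(x1, x2, extract_pitches, generate_related_info, score):
    """Run steps SA1 to SA3 and return the evaluation value S.

    x1, x2: voice signals X1 and X2 (any representation the callables accept).
    extract_pitches(x1, x2) -> (p1, p2)      # feature acquisition unit 32
    generate_related_info() -> related info  # information generation unit 34
    score(p1, p2, related) -> S              # conversation evaluation unit 36
    """
    p1, p2 = extract_pitches(x1, x2)   # SA1: extract pitch P1 and pitch P2
    related = generate_related_info()  # SA2: generate related information R
    return score(p1, p2, related)      # SA3: calculate evaluation value S
```

Because SA1 and SA2 do not depend on each other in this sketch, swapping the first two statements would not change the result, which matches the remark that their order may be reversed.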

As illustrated above, in the first embodiment the conversation between the user U1 and the user U2 can be evaluated objectively according to the feature amounts of the voice V1 and the voice V2 that constitute the conversation. In addition, because the conversation is evaluated by taking into account the related information R as well as the feature amounts of the voice V1 and the voice V2, the conversation can be evaluated more appropriately than in a configuration in which only the feature amounts are reflected in the evaluation result. In the first embodiment in particular, the conversation is evaluated according to the pitch difference ΔP between the pitch P1 of the voice V1 and the pitch P2 of the voice V2, so the quality of the impression given by the response of the user U2 can be evaluated objectively in terms of the interval (that is, the pitch difference) of the voice V2 of the user U2 relative to the voice V1 of the user U1.

<Second Embodiment>
A second embodiment of the present invention will now be described. In each of the embodiments illustrated below, elements whose operation or function is the same as in the first embodiment are denoted by the reference signs used in the description of the first embodiment, and detailed description of them is omitted as appropriate.

FIG. 3 is a configuration diagram of the conversation evaluation device 100 of the second embodiment. As illustrated in FIG. 3, the storage device 14 of the conversation evaluation device 100 of the second embodiment stores history information H for each combination of users U. The history information H is information on the history of past conversations between the users U (conversation history). Specifically, the history information H of the second embodiment specifies the frequency of conversations held in the past between the users U (hereinafter referred to as the "conversation frequency") and the time elapsed since the first conversation between the users U (hereinafter referred to as the "relationship period"). The conversation frequency is the number of conversations within a period of predetermined length (for example, one month). The conversation frequency of the related information R is updated each time the users U hold a conversation. The conversation frequency and the relationship period can also be regarded as indices of the degree of intimacy between the users U.

The information generation unit 34 of the second embodiment generates the related information R by referring to the history information H stored in the storage device 14. For example, the user U1 and the user U2 each specify their own identification information to the conversation evaluation device 100 by operating the input device 18 as appropriate. The information generation unit 34 retrieves from the storage device 14 the history information H for the pair consisting of the user U1 and the user U2 indicated by the identification information, and generates related information R that includes the conversation frequency and the relationship period specified by that history information H. The operation by which the feature acquisition unit 32 extracts the pitch P1 of the voice V1 and the pitch P2 of the voice V2 is the same as in the first embodiment.
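A minimal sketch of the history lookup is shown below. It assumes that the history information H is held in a dictionary keyed by the unordered pair of user identifiers, that each record keeps the time of the first conversation and the times of all past conversations, and that "one month" is approximated as 30 days. The names `History`, `pair_key`, and `related_info_from_history` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class History:
    first_conversation: datetime  # time of the first conversation of the pair
    conversation_times: list      # datetimes of past conversations of the pair


def pair_key(user_a, user_b):
    """Order-independent key for a combination of two users."""
    return frozenset((user_a, user_b))


def related_info_from_history(store, user_a, user_b, now,
                              window=timedelta(days=30)):
    """Generate related information R (frequency, relationship period) from H."""
    history = store[pair_key(user_a, user_b)]
    # Conversation frequency: number of conversations within the window.
    frequency = sum(1 for t in history.conversation_times
                    if now - window <= t <= now)
    return {
        "conversation_frequency": frequency,
        "relationship_period": now - history.first_conversation,
    }
```

Using a `frozenset` as the key means that looking up (U1, U2) and (U2, U1) returns the same record.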

When the conversation frequency is high, the user U1 and the user U2 are likely to be on good terms and to be conversing while maintaining a good impression of each other. In view of this tendency, the conversation evaluation unit 36 calculates the evaluation value S so that it becomes larger as the conversation frequency specified by the related information R becomes higher. For example, the conversation evaluation unit 36 adds a predetermined value to the evaluation value S when the conversation frequency exceeds a predetermined threshold. A plurality of different thresholds may also be used. For example, points may be added to the evaluation value S when the conversation frequency lies between a first threshold and a second threshold (first threshold < second threshold) and deducted when the conversation frequency is below the first threshold or above the second threshold. The value added to or deducted from the evaluation value S may also be varied stepwise for each range bounded by the thresholds.

Similarly, when the relationship period since the first conversation is long, the user U1 and the user U2 are likely to be on good terms and to be conversing while maintaining a good impression of each other. In view of this tendency, the conversation evaluation unit 36 calculates the evaluation value S so that it becomes larger as the relationship period specified by the related information R becomes longer. For example, the conversation evaluation unit 36 adds a predetermined value to the evaluation value S when the relationship period exceeds a predetermined threshold. A plurality of different thresholds may also be used. For example, points may be added to the evaluation value S when the relationship period lies between a first threshold and a second threshold (first threshold < second threshold) and deducted when the relationship period is below the first threshold or above the second threshold. The value added to or deducted from the evaluation value S may also be varied stepwise for each range bounded by the thresholds.
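The stepwise variation per threshold range, which applies equally to the conversation frequency and the relationship period, can be sketched with a sorted threshold list. The function name and the example values are illustrative assumptions; the text does not give concrete thresholds or point values.

```python
from bisect import bisect_right


def stepwise_adjustment(value, thresholds, adjustments):
    """Pick the adjustment for the range that `value` falls into.

    thresholds: ascending boundaries, e.g. [2, 10, 30] conversations per month.
    adjustments: one entry per range, so len(adjustments) == len(thresholds) + 1;
                 negative entries are deductions, positive entries are bonuses.
    A value equal to a threshold is assigned to the range above it.
    """
    if len(adjustments) != len(thresholds) + 1:
        raise ValueError("need exactly one adjustment per range")
    return adjustments[bisect_right(thresholds, value)]
```

For instance, with thresholds [2, 10, 30] and adjustments [-5, 0, 5, 10], a pair that conversed 15 times in the period falls into the third range and receives 5 points.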

As illustrated above, in the second embodiment as well, the conversation is evaluated taking into account the related information R of the conversation in addition to the feature amounts of the voice V1 and the voice V2 that constitute it. Therefore, as in the first embodiment, the conversation can be evaluated more appropriately than in a configuration in which only the feature amounts are reflected in the evaluation. In the second embodiment in particular, the conversation history between the users U (for example, the conversation frequency and the relationship period) is used as the related information R, so an appropriate evaluation that takes into account the tendency of past conversations between the users U is achieved.

Although the above description uses identification information specified by an operation on the input device 18, the user U1 and the user U2 may be identified by any method. For example, the user U1 may be identified by speaker identification on the audio signal X1 and the user U2 by speaker identification on the audio signal X2, and the history information H between the user U1 and the user U2 may then be retrieved. Any known recognition technique may be adopted for the speaker identification of the user U1 and the user U2.

<Third Embodiment>
FIG. 4 is a configuration diagram of the conversation evaluation device 100 according to the third embodiment. As illustrated in FIG. 4, the storage device 14 of the conversation evaluation device 100 of the third embodiment stores speaker information Q for each combination of users U. The speaker information Q is information indicating the relationship between the users U. Specifically, the speaker information Q of the third embodiment specifies the mutual relationship between the users U (friend, family member, acquaintance, colleague, and so on) and the degree of intimacy between the users U. The mutual relationship and the degree of intimacy can be specified by an operation on the input device 18 by a user U, but they can also be populated in the speaker information Q from, for example, information registered with an SNS (Social Networking Service).

The information generation unit 34 of the third embodiment generates the related information R by referring to the speaker information Q stored in the storage device 14. For example, the information generation unit 34 retrieves from the storage device 14 the speaker information Q between the user U1 and the user U2, who are identified by input of identification information or by speaker identification as in the second embodiment, and generates related information R that includes the mutual relationship and the degree of intimacy specified by that speaker information Q. The operation by which the feature acquisition unit 32 extracts the pitch P1 of the voice V1 and the pitch P2 of the voice V2 is the same as in the first embodiment.
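
One way to picture the per-combination storage and lookup of the speaker information Q is a table keyed by an unordered pair of user identifiers. The following is a minimal sketch under that assumption; the names `SpeakerInfo`, `SpeakerInfoStore`, and the dictionary layout of the related information R are hypothetical and are not taken from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeakerInfo:
    """Speaker information Q for one pair of users (hypothetical layout)."""
    relationship: str   # e.g. "friend", "family", "acquaintance", "colleague"
    intimacy: float     # degree of intimacy; the scale is an assumption


class SpeakerInfoStore:
    """Stands in for the storage device 14 in this sketch."""

    def __init__(self):
        self._by_pair = {}

    def put(self, user_a, user_b, info):
        # frozenset makes the key independent of the order of the two users.
        self._by_pair[frozenset((user_a, user_b))] = info

    def related_info(self, user_a, user_b):
        """Build related information R for the pair, or None if nothing is stored."""
        info = self._by_pair.get(frozenset((user_a, user_b)))
        if info is None:
            return None
        return {"relationship": info.relationship, "intimacy": info.intimacy}


store = SpeakerInfoStore()
store.put("U1", "U2", SpeakerInfo(relationship="friend", intimacy=0.8))
r = store.related_info("U2", "U1")  # same pair, opposite order
```

Keying on an unordered pair reflects that Q is stored per combination of users rather than per speaker.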

When the mutual relationship is that of friends, the user U1 and the user U2 are likely to be on good terms and to be conversing while maintaining a good impression of each other. In view of this tendency, the conversation evaluation unit 36 calculates the evaluation value S so that it is larger when the mutual relationship specified by the related information R is that of friends than when it is any other relationship. Specifically, the conversation evaluation unit 36 adds a predetermined value to the evaluation value S when the mutual relationship between the user U1 and the user U2 is that of friends.

Similarly, when the degree of intimacy is high, the user U1 and the user U2 are likely to be on good terms and to be conversing while maintaining a good impression of each other. In view of this tendency, the conversation evaluation unit 36 calculates the evaluation value S so that it becomes larger as the degree of intimacy specified by the related information R becomes higher. For example, the conversation evaluation unit 36 adds a predetermined value to the evaluation value S when the degree of intimacy exceeds a predetermined threshold. A plurality of different thresholds may also be used. For example, a configuration is conceivable in which points are added to the evaluation value S when the degree of intimacy lies between a first threshold and a second threshold (first threshold < second threshold), whereas points are deducted from the evaluation value S when the degree of intimacy is below the first threshold or above the second threshold. The number of points added or deducted may also be varied stepwise for each range bounded by the thresholds.
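
The stepwise variant, in which each range bounded by the thresholds has its own number of points, could look like the sketch below. The function name `stepwise_adjustment`, the threshold values, and the point values are all assumptions for illustration; a value must strictly exceed a threshold to fall into the next range, which is also an assumption.

```python
from bisect import bisect_left


def stepwise_adjustment(value, thresholds, points):
    """Return the points for the range that `value` falls into.

    `thresholds` must be sorted in ascending order and `points` must have
    exactly one more entry than `thresholds` (one entry per range).
    Negative entries represent deductions.
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be sorted in ascending order")
    if len(points) != len(thresholds) + 1:
        raise ValueError("need one point value per range")
    return points[bisect_left(thresholds, value)]


# Hypothetical setting: intimacy on a 0..1 scale with three ranges.
thresholds = [0.3, 0.7]
points = [-5.0, 0.0, 10.0]
adjustment = stepwise_adjustment(0.9, thresholds, points)
```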

As illustrated above, in the third embodiment as well, the conversation is evaluated taking into account the related information R of the conversation in addition to the feature amounts of the voice V1 and the voice V2 that constitute it. Therefore, as in the first embodiment, the conversation can be evaluated more appropriately than in a configuration in which only the feature amounts are reflected in the evaluation. In the third embodiment in particular, the speaker information Q between the users U (for example, the mutual relationship and the degree of intimacy) is used as the related information R, so an appropriate evaluation that takes into account the actual relationship between the users U is achieved.

<Fourth Embodiment>
FIG. 5 is a configuration diagram of the conversation evaluation device 100 according to the fourth embodiment. As illustrated in FIG. 5, the storage device 14 of the conversation evaluation device 100 of the fourth embodiment stores attribute information A for each user U. The attribute information A is information indicating an attribute (a feature or property) of the user U. Information that depends on the voice produced by the user U is particularly suitable as the attribute information A. The attribute information A of the fourth embodiment specifies the speaking frequency of the user U. The speaking frequency is the average pitch of the voice produced by the user U.

The information generation unit 34 of the fourth embodiment retrieves from the storage device 14 the attribute information A of each of the user U1 and the user U2, who are identified by input of identification information or by speaker identification as in the second embodiment, and generates related information R that includes the speaking frequency specified by each piece of attribute information A. That is, the related information R of the fourth embodiment is information about each user U taking part in the conversation to be evaluated and, like the related information R illustrated in the first to third embodiments, is an example of information of a different kind from the feature amounts extracted by the feature acquisition unit 32.

The first to third embodiments illustrate configurations in which the related information R generated by the information generation unit 34 is reflected in the evaluation of the conversation by the conversation evaluation unit 36. In the fourth embodiment, by contrast, the related information R is reflected in the extraction of the feature amounts by the feature acquisition unit 32. That is, the feature acquisition unit 32 of the fourth embodiment extracts the feature amounts under conditions that depend on the related information R generated by the information generation unit 34.

Specifically, from the audio signal X1 generated by the sound collection device 22, the feature acquisition unit 32 extracts the acoustic component within a predetermined band that includes the speaking frequency of the user U1 specified by the related information R, and determines the pitch P1 from the extracted acoustic component. That is, the pitch P1 is determined only within the range in which the user U1 normally speaks. Similarly, the feature acquisition unit 32 determines the pitch P2 from the acoustic component, within a predetermined band that includes the speaking frequency of the user U2 specified by the related information R, of the audio signal X2 generated by the sound collection device 24. Attribute information A that specifies the vocal range of the user U may also be used as the related information R.
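
The effect of limiting pitch determination to a band around the speaking frequency can be illustrated with a simple autocorrelation search. Note that this sketch does not band-pass filter the signal as the description does; it substitutes a simpler technique that only considers candidate periods inside the band. The function name `estimate_pitch`, the `band_ratio` parameter, and the band width are assumptions.

```python
import math


def estimate_pitch(samples, sample_rate, center_hz, band_ratio=1.5):
    """Estimate pitch in Hz, searching only near `center_hz`.

    Candidate frequencies are limited to [center_hz / band_ratio,
    center_hz * band_ratio]; the lag with the highest mean autocorrelation
    inside that range is taken as the pitch period.
    """
    low_hz = center_hz / band_ratio
    high_hz = center_hz * band_ratio
    min_lag = max(1, int(sample_rate / high_hz))
    max_lag = min(len(samples) - 1, int(sample_rate / low_hz))
    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        count = len(samples) - lag
        score = sum(samples[i] * samples[i + lag] for i in range(count)) / count
        if score > best_score:
            best_lag, best_score = lag, score
    if best_lag is None:
        raise ValueError("signal too short for the requested band")
    return sample_rate / best_lag


# Synthetic 200 Hz tone standing in for the audio signal X1 (test data only).
rate = 8000
tone = [math.sin(2 * math.pi * 200 * n / rate) for n in range(800)]
p1 = estimate_pitch(tone, rate, center_hz=180)
```

Restricting the search range also avoids octave errors such as picking a period twice as long as the true one, which is one plausible reading of why a band limit improves accuracy.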

The conversation evaluation unit 36 evaluates the conversation between the user U1 and the user U2 according to the pitch P1 and the pitch P2 that the feature acquisition unit 32 has determined using the related information R as illustrated above. Specifically, the conversation evaluation unit 36 calculates the evaluation value S according to the pitch difference ΔP between the pitch P1 and the pitch P2. In the fourth embodiment, the related information R is not taken into account in the evaluation by the conversation evaluation unit 36. However, as in the first to third embodiments, the related information R may also be taken into account in the evaluation by the conversation evaluation unit 36 in the fourth embodiment.

As illustrated above, in the fourth embodiment, as in the first embodiment, the conversation between the user U1 and the user U2 can be evaluated objectively according to the feature amounts of the voice V1 and the voice V2 that constitute it. In addition, because the feature amounts (pitch P1, pitch P2) are extracted under conditions that depend on the related information R about the conversation, the fourth embodiment has the advantage that the feature amounts can be extracted more appropriately than in a configuration that does not use the related information R for extraction. For example, in the fourth embodiment, the feature amounts can be extracted with high accuracy by limiting extraction to the frequency band corresponding to the speaking frequency specified by the related information R.

Although the above example uses the speaking frequency of the user U as the attribute information A, the content of the attribute information A is not limited to this example. For example, attribute information A that specifies the gender of the user U may be used. In that case, the feature acquisition unit 32 determines the pitch P within the frequency band assumed for the gender specified by the related information R. For example, when the gender of the user U1 specified by the related information R is female, the feature acquisition unit 32 extracts the pitch P1 from the acoustic component of the audio signal X1 in the high range assumed for women, and when the gender of the user U1 is male, it extracts the pitch P1 from the acoustic component of the audio signal X1 in the low range assumed for men.

When the pitch difference between the voice V1 of the user U1 and the voice V2 of the user U2 exceeds one octave, a configuration is preferable in which one of the pitch P1 and the pitch P2 is moved toward the other by an integer multiple of one octave so that the pitch difference between them is corrected to within one octave (hereinafter referred to as "pitch correction"). When the user U1 and the user U2 differ in gender (that is, when the pitch difference is large), the need for pitch correction is presumed to be high. In view of this tendency, a configuration is also preferable in which the feature acquisition unit 32 performs pitch correction when the gender specified by the related information R differs between the user U1 and the user U2, and omits pitch correction when the user U1 and the user U2 are of the same gender.
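
Pitch correction as described above can be sketched with pitches expressed in hertz, where one octave corresponds to a factor of two. The function names and the choice to shift P2 rather than P1 are assumptions; the description allows either pitch to be moved. When several whole-octave shifts would satisfy the one-octave limit, this sketch removes all whole octaves of the gap, which is also an assumption.

```python
import math


def correct_pitch(p1_hz, p2_hz):
    """Move p2 toward p1 by whole octaves when they are more than an octave apart."""
    gap_octaves = math.log2(p2_hz / p1_hz)
    if abs(gap_octaves) <= 1.0:
        return p2_hz  # already within one octave, nothing to correct
    whole_octaves = math.trunc(gap_octaves)  # integer part, rounded toward zero
    return p2_hz / (2 ** whole_octaves)


def corrected_p2(p1_hz, p2_hz, gender1, gender2):
    """Apply pitch correction only when the two users differ in gender."""
    if gender1 != gender2:
        return correct_pitch(p1_hz, p2_hz)
    return p2_hz


# 110 Hz versus 330 Hz is about 1.58 octaves apart; one octave is removed.
p2 = corrected_p2(110.0, 330.0, "male", "female")
```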

<Fifth Embodiment>
As in the third embodiment, the information generation unit 34 of the fifth embodiment refers to the speaker information Q stored in the storage device 14 to generate related information R that specifies the degree of intimacy between the user U1 and the user U2. As in the fourth embodiment, the feature acquisition unit 32 extracts the feature amounts (pitch P1, pitch P2) of the voice V1 of the user U1 and the voice V2 of the user U2 under conditions that depend on the related information R generated by the information generation unit 34. Specifically, the feature acquisition unit 32 extracts the feature amounts at a frequency that depends on the degree of intimacy specified by the related information R.

For example, when the degree of intimacy is high, the user U1 and the user U2 are on good terms, so the evaluation value S is expected to be relatively large. When the degree of intimacy is low, it is difficult to predict whether the evaluation value S will be large or small. It can therefore be assumed that there is little need to evaluate the conversation frequently when the degree of intimacy is high, whereas the conversation needs to be evaluated frequently when the degree of intimacy is low. In view of this tendency, the feature acquisition unit 32 of the fifth embodiment lowers the frequency of extracting the feature amounts (pitch P1, pitch P2) as the degree of intimacy specified by the related information R becomes higher.

Specifically, when the degree of intimacy exceeds a predetermined threshold, the feature acquisition unit 32 extracts the feature amounts less frequently than when the degree of intimacy is below the threshold. For example, when the degree of intimacy is below the threshold, the pitch P1 and the pitch P2 are extracted for every pair of consecutive utterance sections of the voice V1 and the voice V2 (that is, once for every pair consisting of an utterance by the user U1 and a response by the user U2). On the other hand, when the degree of intimacy exceeds the threshold, the pitch P1 and the pitch P2 are extracted once for every several pairs of utterance sections of the voice V1 and the voice V2 (that is, once for every several exchanges of an utterance by the user U1 and a response by the user U2). Because the conversation evaluation unit 36 evaluates the conversation each time the feature acquisition unit 32 extracts the feature amounts, the higher the degree of intimacy specified by the related information R, the lower the frequency of evaluation by the conversation evaluation unit 36 (and hence the frequency with which the evaluation value S displayed on the display device 16 is updated). A plurality of different thresholds may also be used. For example, a configuration is conceivable in which a frequency is set for each range bounded by the thresholds and the feature acquisition unit 32 extracts the feature amounts at the frequency corresponding to the range to which the degree of intimacy belongs.
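
The frequency control described above can be sketched by mapping the degree of intimacy to an extraction interval counted in utterance-response pairs. The names `extraction_interval` and `pairs_to_evaluate`, the threshold of 0.5, and the interval of 3 are hypothetical values chosen for illustration.

```python
from bisect import bisect_left


def extraction_interval(intimacy, thresholds=(0.5,), intervals=(1, 3)):
    """Return N such that features are extracted once every N pairs.

    `intervals` has one entry per range bounded by `thresholds`; a value
    must strictly exceed a threshold to fall into the next range.
    """
    if len(intervals) != len(thresholds) + 1:
        raise ValueError("need one interval per range")
    return intervals[bisect_left(thresholds, intimacy)]


def pairs_to_evaluate(num_pairs, interval):
    """Indices of the utterance-response pairs on which extraction runs."""
    return [i for i in range(num_pairs) if i % interval == 0]


# Low intimacy: every pair is evaluated. High intimacy: every third pair.
low = pairs_to_evaluate(7, extraction_interval(0.2))
high = pairs_to_evaluate(7, extraction_interval(0.9))
```

With seven exchanges, the high-intimacy setting triggers three extractions instead of seven, which is where the reduction in computation comes from.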

As illustrated above, in the fifth embodiment, as in the first embodiment, the conversation between the user U1 and the user U2 can be evaluated objectively according to the feature amounts of the voice V1 and the voice V2 that constitute it. In addition, because the feature amounts (pitch P1, pitch P2) are extracted under conditions that depend on the related information R, the fifth embodiment, like the fourth embodiment, has the advantage that the feature amounts can be extracted more appropriately than in a configuration that does not use the related information R for extraction. For example, in the fifth embodiment, the frequency of extracting the feature amounts is controlled according to the related information R, so the amount of computation required for the extraction of the feature amounts by the feature acquisition unit 32 and the evaluation of the conversation by the conversation evaluation unit 36 can be reduced compared with a configuration that does not use the related information R for extraction.

Although the fifth embodiment controls the extraction condition for the feature amounts (specifically, the frequency) according to the degree of intimacy between the users U, the content of the related information R reflected in the extraction condition is not limited to this example. For example, any of the related information R illustrated in the first to third embodiments may be applied to control of the extraction condition. For instance, a configuration is also conceivable in which the extraction condition is controlled according to the conversation frequency or the relationship period illustrated in the second embodiment (for example, a configuration in which the frequency of extracting the feature amounts is lowered as the conversation frequency becomes higher or as the relationship period becomes longer).

The configurations of the first to third embodiments, in which the related information R is taken into account in the evaluation of the conversation by the conversation evaluation unit 36, may also be combined with the configurations of the fourth and fifth embodiments, in which the extraction condition used by the feature acquisition unit 32 is controlled according to the related information R. Different kinds of related information R can suitably be applied to the evaluation of the conversation by the conversation evaluation unit 36 and to the control of the extraction condition, but the same related information R may also be applied to both. For example, in a configuration in which the related information R includes the conversation frequency, the frequency at which the feature acquisition unit 32 extracts the feature amounts can be controlled according to the conversation frequency, and the conversation frequency can also be used in the evaluation of the conversation by the conversation evaluation unit 36 as illustrated in the second embodiment.

<Modifications>
Each of the aspects illustrated above can be modified in various ways. Specific modifications are illustrated below. Two or more aspects freely selected from the following examples may be combined as appropriate to the extent that they do not contradict each other.

(1) In each of the above embodiments, a conversation composed of the voice V1 produced by the user U1 and the voice V2 produced by the user U2 is evaluated, but the voices evaluated by the conversation evaluation device 100 are not limited to sounds uttered by a user U (that is, natural human voices). Specifically, one of the voice V1 and the voice V2 may be a synthesized voice generated by a known speech synthesis technique. For example, the same configuration as in each of the above embodiments may be adopted to evaluate a conversation composed of the voice V1 produced by the user U1 and a voice V2 generated by speech synthesis. In that case, the utterance content is analyzed by speech recognition on the voice V1 of the user U1, and a voice V2 that gives an appropriate response to the utterance of the user U1 is generated. One of a plurality of prerecorded voices may also be selected as the voice V2. In addition, a configuration that evaluates a conversation composed of a voice V1 generated by speech synthesis and the voice V2 produced by the user U2, or a configuration that evaluates a conversation composed of a voice V1 and a voice V2 that are both generated by speech synthesis, may also be adopted.

In a configuration in which the voice V1 and the voice V2 are generated by speech synthesis as described above, the sound collection device 22 and the sound collection device 24 are omitted. In a configuration that uses synthesized voices, the feature acquisition unit 32 may also acquire, as the feature amount of the voice V1 or the voice V2, a speech synthesis parameter that specifies an acoustic characteristic of the voice (for example, pitch or volume). In this configuration, the process of extracting the feature amount of the voice V1 by analyzing the audio signal X1 and the process of extracting the feature amount of the voice V2 by analyzing the audio signal X2 can be omitted. As understood from the above description, the feature acquisition unit 32 is expressed comprehensively as an element that acquires the feature amounts of the voices (V1, V2) constituting a conversation, and it encompasses not only an element that extracts the feature amounts from an audio signal by analysis processing but also an element that acquires the feature amounts by any method other than extraction. That is, "extraction" of a feature amount is one example of "acquisition" of a feature amount.

(2) In each of the above embodiments, the evaluation value S calculated by the conversation evaluation unit 36 is displayed on the display device 16, but the form of the evaluation result produced by the conversation evaluation unit 36 is not limited to the evaluation value S. For example, an evaluation comment corresponding to the evaluation value S may be displayed on the display device 16 (with or without the evaluation value S itself). The method of outputting the evaluation result is also not limited to display. For example, the evaluation value S or the evaluation comment may be output as sound.

(3) The method of calculating the evaluation value S according to the feature amounts (pitch P1, pitch P2) and the related information R is not limited to the examples in the above embodiments. For example, the evaluation value S may be calculated by an operation (for example, a weighted sum) applied to a numerical value obtained by evaluating the conversation according to the feature amounts and a numerical value calculated according to the related information R. An evaluation value S that depends on both the feature amounts and the related information R can also be calculated by a configuration in which, for example, the relationship between the feature amounts and the evaluation value S (for example, the type or coefficients of the arithmetic expression that defines that relationship) is changed according to the related information R.
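
The weighted-sum variant mentioned in (3) can be written in a few lines. The function name `combined_score`, the weight of 0.75, and the input scores are assumed values; the description does not specify how either input score is computed.

```python
def combined_score(feature_score, related_score, feature_weight=0.75):
    """Weighted sum of a feature-based score and a related-information score."""
    if not 0.0 <= feature_weight <= 1.0:
        raise ValueError("feature_weight must lie in [0, 1]")
    return feature_weight * feature_score + (1.0 - feature_weight) * related_score


# Hypothetical inputs: 80 from the pitch analysis, 40 from the related information R.
s = combined_score(80.0, 40.0)
```

Setting `feature_weight` to 1.0 reduces this to an evaluation based on the feature amounts alone.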

(4) The feature amounts extracted by the feature acquisition unit 32 are not limited to pitch (P1, P2). For example, the feature acquisition unit 32 may extract the volume of each of the voice V1 and the voice V2 as a feature amount. The conversation evaluation unit 36 then evaluates the conversation according to, for example, the volume difference between the voice V1 and the voice V2. For example, the conversation evaluation unit 36 calculates the evaluation value S so that it becomes larger as the volume difference between the voice V1 and the voice V2 becomes closer to a predetermined value.

The feature acquisition unit 32 may also extract, as a feature amount, the interval between an utterance section of the voice V1 and an utterance section of the voice V2 (hereinafter referred to as the "utterance interval"). When the utterance interval in a conversation is appropriate, the voice of the conversation partner tends to be perceived as a reassuring utterance that makes a good impression. In view of this tendency, a configuration is preferable in which the conversation evaluation unit 36 calculates the evaluation value S so that it becomes larger as the utterance interval becomes closer to a predetermined value.
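
A score that grows as the utterance interval approaches a predetermined value can be sketched as a linear falloff around a target. The function name `interval_score`, the target of 0.5 seconds, the tolerance, and the maximum of 10 points are assumptions; the description only requires that the score increase as the interval nears the predetermined value.

```python
def interval_score(interval_s, target_s=0.5, tolerance_s=1.0, max_points=10.0):
    """Return more points the closer the utterance interval is to the target.

    The score falls linearly from `max_points` at the target to zero once
    the deviation reaches `tolerance_s` (all values in seconds).
    """
    if tolerance_s <= 0:
        raise ValueError("tolerance_s must be positive")
    deviation = abs(interval_s - target_s)
    return max(0.0, max_points * (1.0 - deviation / tolerance_s))


# A response that starts one second after the utterance ends.
points = interval_score(1.0)
```

The same shape could be applied to the volume difference described in (4) by substituting a target volume difference.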

(5) The method by which the information generation unit 34 generates the related information R is not limited to the examples in the above embodiments. Specifically, the information generation unit 34 may generate the related information R from the result of analyzing the audio signal X1 and the audio signal X2. For example, the information generation unit 34 may estimate the gender of each of the user U1 and the user U2 using the pitch P1 of the voice V1 and the pitch P2 of the voice V2 determined by the feature acquisition unit 32, and generate related information R that specifies the genders of the user U1 and the user U2 as in the fourth embodiment.

(6) The conversation evaluation device 100 may also be implemented as a server device (a single device or a server system made up of a plurality of devices) that communicates with a terminal device such as a mobile phone or a smartphone. For example, the conversation evaluation device 100 receives the audio signal X1 and the audio signal X2 from the terminal device, evaluates the conversation between the user U1 and the user U2 by the same method as in each of the above embodiments, and transmits the result (for example, the evaluation value S) to the terminal device.

（７）前述の各形態で例示した会話評価装置１００は、前述の通り、制御装置１２とプログラムとの協働で実現され得る。例えば第１実施形態から第３実施形態に対応する第１態様のプログラムは、制御装置１２等のコンピュータ（例えば単数または複数の処理回路）を、会話を構成する音声の特徴量を取得する特徴取得部３２、会話について特徴量とは別種の関連情報Ｒを生成する情報生成部３４、および、特徴量と関連情報Ｒとに応じて会話を評価する会話評価部３６として機能させる。 (7) As described above, the conversation evaluation device 100 exemplified in each of the above-described embodiments can be realized by the cooperation of the control device 12 and a program. For example, the program of the first aspect, which corresponds to the first to third embodiments, causes a computer such as the control device 12 (for example, one or more processing circuits) to function as a feature acquisition unit 32 that acquires a feature amount of the voice constituting a conversation, an information generation unit 34 that generates related information R of a type different from the feature amount regarding the conversation, and a conversation evaluation unit 36 that evaluates the conversation according to the feature amount and the related information R.

また、第４実施形態または第５実施形態に対応する第２態様のプログラムは、制御装置１２等のコンピュータ（例えば単数または複数の処理回路）を、会話を構成する音声の特徴量を取得する特徴取得部３２、会話について特徴量とは別種の関連情報Ｒを生成する情報生成部３４、および、特徴量に応じて会話を評価する会話評価部３６として機能させるプログラムであり、特徴取得部３２は、関連情報Ｒに応じた条件で特徴量を取得する。 The program of the second aspect, which corresponds to the fourth or fifth embodiment, causes a computer such as the control device 12 (for example, one or more processing circuits) to function as a feature acquisition unit 32 that acquires a feature amount of the voice constituting a conversation, an information generation unit 34 that generates related information R of a type different from the feature amount regarding the conversation, and a conversation evaluation unit 36 that evaluates the conversation according to the feature amount, wherein the feature acquisition unit 32 acquires the feature amount under a condition according to the related information R.
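The distinguishing point of the second aspect, acquiring the feature amount under a condition that depends on the related information R, can be illustrated with a minimal sketch. It assumes that R carries an estimated gender and that the "condition" is the frequency range searched when estimating pitch; the range values, the plain autocorrelation estimator, and the names `SEARCH_RANGE_HZ`, `DEFAULT_RANGE_HZ`, and `acquire_pitch` are hypothetical and are not taken from the description.

```python
import math

# Hypothetical pitch search ranges in Hz, selected by the gender given in R.
SEARCH_RANGE_HZ = {"male": (70.0, 200.0), "female": (150.0, 400.0)}
DEFAULT_RANGE_HZ = (70.0, 400.0)  # used when R gives no usable gender


def acquire_pitch(samples, sample_rate, gender):
    """Estimate pitch by autocorrelation, searching only the range chosen from R."""
    f_min, f_max = SEARCH_RANGE_HZ.get(gender, DEFAULT_RANGE_HZ)
    lag_lo = int(sample_rate / f_max)  # shortest period considered
    lag_hi = min(int(sample_rate / f_min), len(samples) - 1)  # longest period
    best_lag, best_corr = lag_lo, float("-inf")
    for lag in range(lag_lo, lag_hi + 1):
        # Unnormalized autocorrelation of the signal with itself shifted by lag.
        corr = sum(samples[n] * samples[n + lag] for n in range(len(samples) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag


# Example: a 200 Hz test tone sampled at 8 kHz for 0.1 s.
tone = [math.sin(2 * math.pi * 200 * n / 8000) for n in range(800)]
pitch = acquire_pitch(tone, 8000, "female")
```

Restricting the search range in this way is one common reason to condition pitch extraction on speaker information, since it reduces octave errors; whether the embodiments use this particular condition is not stated in this section.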

以上に例示した各態様のプログラムは、コンピュータが読取可能な記録媒体に格納された形態で提供されてコンピュータにインストールされ得る。記録媒体は、例えば非一過性（non-transitory）の記録媒体であり、ＣＤ-ＲＯＭ等の光学式記録媒体（光ディスク）が好例であるが、半導体記録媒体や磁気記録媒体等の公知の任意の形式の記録媒体を包含し得る。また、通信網を介した配信の形態でプログラムをコンピュータに配信することも可能である。 The program of each aspect exemplified above can be provided in a form stored in a computer-readable recording medium and installed in a computer. The recording medium is, for example, a non-transitory recording medium; an optical recording medium (optical disc) such as a CD-ROM is a good example, but the recording medium may be of any known format, such as a semiconductor recording medium or a magnetic recording medium. The program can also be provided to a computer by distribution via a communication network.

（８）本発明の好適な態様は、前述の各形態で例示した会話評価装置１００の動作方法（会話評価方法）としても特定され得る。第１態様に係る会話評価方法は、コンピュータ（単体のコンピュータまたは複数のコンピュータで構成されるシステム）が、会話を構成する音声の特徴量を取得し、会話について特徴量とは別種の関連情報Ｒを生成し、特徴量と関連情報Ｒとに応じて会話を評価する。第２態様に係る会話評価方法は、コンピュータが、会話を構成する音声の特徴量を取得し、会話について特徴量とは別種の関連情報Ｒを生成し、特徴量と関連情報Ｒとに応じて会話を評価する方法であって、特徴量の取得においては、関連情報Ｒに応じた条件で特徴量を取得する。 (8) A preferred aspect of the present invention can also be specified as a method of operating the conversation evaluation device 100 exemplified in each of the above-described embodiments (a conversation evaluation method). In the conversation evaluation method according to the first aspect, a computer (a single computer or a system composed of a plurality of computers) acquires a feature amount of the voice constituting a conversation, generates related information R of a type different from the feature amount regarding the conversation, and evaluates the conversation according to the feature amount and the related information R. In the conversation evaluation method according to the second aspect, a computer acquires a feature amount of the voice constituting a conversation, generates related information R of a type different from the feature amount regarding the conversation, and evaluates the conversation according to the feature amount and the related information R, wherein, in acquiring the feature amount, the feature amount is acquired under a condition according to the related information R.

１００…会話評価装置、１２…制御装置、１４…記憶装置、１６…表示装置、１８…入力装置、２２…収音装置、２４…収音装置、３２…特徴取得部、３４…情報生成部、３６…会話評価部。

100: Conversation evaluation device, 12: Control device, 14: Storage device, 16: Display device, 18: Input device, 22: Sound collection device, 24: Sound collection device, 32: Feature acquisition unit, 34: Information generation unit, 36: Conversation evaluation unit.

Claims

A conversation evaluation device comprising:
a feature acquisition unit that acquires the pitch of a voice constituting a conversation between a first user and a second user;
an information generation unit that generates related information including at least one of the frequency of conversations conducted in the past between the first user and the second user, the number of conversations conducted in the past between the first user and the second user, and a relationship period that is the elapsed time since the first conversation; and
a conversation evaluation unit that evaluates the conversation according to the pitch and the related information.

The conversation evaluation device according to claim 1, wherein the feature acquisition unit acquires the pitch under a condition according to the related information.

The conversation evaluation device according to claim 1 or claim 2, wherein the feature acquisition unit acquires the pitch of each of a first voice and a second voice constituting the conversation, and
the conversation evaluation unit evaluates the conversation according to the pitch difference between the first voice and the second voice.

A program that causes a computer to function as:
a feature acquisition unit that acquires the pitch of a voice constituting a conversation between a first user and a second user;
an information generation unit that generates related information including at least one of the frequency of conversations conducted in the past between the first user and the second user, the number of conversations conducted in the past between the first user and the second user, and a relationship period that is the elapsed time since the first conversation; and
a conversation evaluation unit that evaluates the conversation according to the pitch and the related information.

A conversation evaluation method implemented by a computer, the method comprising:
acquiring the pitch of a voice constituting a conversation between a first user and a second user;
generating related information including at least one of the frequency of conversations conducted in the past between the first user and the second user, the number of conversations conducted in the past between the first user and the second user, and a relationship period that is the elapsed time since the first conversation; and
evaluating the conversation according to the pitch and the related information.