JP7143591B2

JP7143591B2 - speaker estimation device

Info

Publication number: JP7143591B2
Application number: JP2018005622A
Authority: JP
Inventors: 勇太落合
Original assignee: Toyota Motor Corp
Current assignee: Toyota Motor Corp
Priority date: 2018-01-17
Filing date: 2018-01-17
Publication date: 2022-09-29
Anticipated expiration: 2038-01-17
Also published as: JP2019124835A

Description

本発明は、発話者を推定する発話者推定装置の技術分野に関する。 The present invention relates to the technical field of a speaker estimation device for estimating a speaker.

この種の装置として、音声認識によって発話者を推定するものが知られている。例えば特許文献１では、音声認識と音声認証とを並列に行うことで、発話内容の認識と発話者の特定を並列に行うという技術が開示されている。特許文献２では、発話内容に特定のキーワードが含まれているか否かによって発話者の本人性を確認するという技術が開示されている。 As a device of this type, a device that estimates a speaker by voice recognition is known. For example, Patent Literature 1 discloses a technique of performing speech recognition and speech authentication in parallel, thereby performing speech recognition and speaker identification in parallel. Patent Literature 2 discloses a technique of confirming the identity of a speaker based on whether or not a specific keyword is included in the content of the speech.

その他の関連技術として、特許文献３では、スピーカから出力された発話内容に対する応答時間に基づいて、発話内容への興味の有無を判定するという技術が開示されている。特許文献４では、会話から抽出されたキーワードと、会話内容が入力された時のユーザの精神状態とに基づいて、ユーザの興味を判定するという技術が開示されている。特許文献５では、車両における着座位置、発話者及び会話内容に基づいて、乗員構成を推定するという技術が開示されている。 As another related technique, Patent Literature 3 discloses a technique of determining whether or not there is an interest in speech content based on the response time to the speech content output from a speaker. Patent Document 4 discloses a technique of determining a user's interest based on a keyword extracted from a conversation and the mental state of the user when the content of the conversation was input. Patent Literature 5 discloses a technique of estimating the occupant composition based on the seating position in the vehicle, the speaker, and the content of the conversation.

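Patent Literature 1: 特開２０１６－０７１０５０号公報 JP 2016-071050 A
Patent Literature 2: 特開２０１０－１０９６１８号公報 JP 2010-109618 A
Patent Literature 3: 特開２０１７－１１１４９３号公報 JP 2017-111493 A
Patent Literature 4: 特開２００９－２９４７９０号公報 JP 2009-294790 A
Patent Literature 5: 特開２０１２－１３３５３０号公報 JP 2012-133530 A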

上述した特許文献１に記載されている技術では、音声認証（例えば、声紋データを利用した認証）を利用して発話者を特定している。しかしながら、音声認証のみでは、発話者を正確に特定することが難しい場合がある。即ち、上述した特許文献１を含む従来技術には、発話者を正確に特定するという点で精度向上の余地が十分に残されている。 In the technique described in the above-mentioned Patent Document 1, the speaker is specified using voice authentication (for example, authentication using voiceprint data). However, it may be difficult to accurately identify the speaker only with voice authentication. In other words, the related art including the above-mentioned Patent Document 1 leaves ample room for improvement in accuracy in terms of accurately identifying the speaker.

本発明は、例えば上記問題点に鑑みてなされたものであり、発話者を精度良く推定することが可能な発話者推定装置を提供することを課題とする。 The present invention has been made in view of, for example, the above problem, and an object thereof is to provide a speaker estimation device capable of accurately estimating a speaker.

本発明に係る発話者推定装置の一態様では、ユーザに話題を提供する提供手段と、前記話題に対する前記ユーザの発話の内容及び音声の特徴の少なくとも一方を取得する取得手段と、前記発話の内容及び音声の特徴の少なくとも一方に基づいて、前記ユーザの特徴量を解析する解析手段と、前記特徴量に基づいて、前記ユーザの個人属性を推定する推定手段とを備える。 In one aspect, the speaker estimation device according to the present invention includes: providing means for providing a topic to a user; acquisition means for acquiring at least one of the content of the user's utterance on the topic and features of the user's voice; analysis means for analyzing a feature amount of the user based on at least one of the content of the utterance and the features of the voice; and estimation means for estimating a personal attribute of the user based on the feature amount.

本実施形態に係る発話者推定装置の構成を示すブロック図である。 FIG. 1 is a block diagram showing the configuration of the speaker estimation device according to this embodiment.
本実施形態に係る発話者推定装置の動作の流れを示すフローチャートである。 FIG. 2 is a flowchart showing the operation flow of the speaker estimation device according to this embodiment.
ユーザの語尾のパターンを解析するためのルールの一例を示す表である。 FIG. 3 is a table showing an example of rules for analyzing the user's sentence-ending patterns.
ユーザの会話の長さのパターンを解析するためのルールの一例を示す表である。 FIG. 4 is a table showing an example of rules for analyzing the user's conversation length patterns.
ユーザの言いよどみのパターンを解析するためのルールの一例を示す表である。 FIG. 5 is a table showing an example of rules for analyzing the user's hesitation patterns.
ユーザの単語の繰り返しのパターンを解析するためのルールの一例を示す表である。 FIG. 6 is a table showing an example of rules for analyzing the user's word repetition patterns.
特徴的な趣味と単語の分類パターンを解析するためのルールの一例を示す表である。 FIG. 7 is a table showing an example of rules for analyzing classification patterns of characteristic hobbies and words.
ＰＯＩと単語の分類パターンを解析するためのルールの一例を示す表である。 FIG. 8 is a table showing an example of rules for analyzing classification patterns of POIs and words.
レストランと単語の分類パターンを解析するためのルールの一例を示す表である。 FIG. 9 is a table showing an example of rules for analyzing classification patterns of restaurants and words.
ユーザの感情表現を表す単語とスコアとの関係の一例を示す表である。 FIG. 10 is a table showing an example of the relationship between words expressing the user's emotions and scores.
ユーザの発話した文章とスコアとの関係の一例を示す表である。 FIG. 11 is a table showing an example of the relationship between sentences uttered by the user and scores.
ユーザ照合処理の具体的な方法の一例を示す表である。 FIG. 12 is a table showing an example of a specific method of user verification processing.

以下、図面を参照して発話者推定装置の実施形態について説明する。 An embodiment of a speaker estimation device will be described below with reference to the drawings.

＜装置構成＞ <Device configuration>
まず、本実施形態に係る発話者推定装置の構成について、図１を参照して説明する。図１は、本実施形態に係る発話者推定装置の構成を示すブロック図である。 First, the configuration of the speaker estimation device according to this embodiment will be described with reference to FIG. 1. FIG. 1 is a block diagram showing the configuration of the speaker estimation device according to this embodiment.

図１に示すように、本実施形態に係る発話者推定装置は、ＥＣＵ（ＥｌｅｃｔｉｒｃＣｏｎｔｒｏｌＵｎｉｔ）１００と、スピーカ２００と、マイク３００とを備えて構成されている。なお、発話者推定装置は、図示せぬ車両に搭載されており、車両のドライバを推定（特定）するための処理を実行する。 As shown in FIG. 1 , the speaker estimation device according to the present embodiment includes an ECU (Electric Control Unit) 100 , a speaker 200 and a microphone 300 . The speaker estimation device is mounted on a vehicle (not shown) and executes processing for estimating (identifying) the driver of the vehicle.

ＥＣＵ１００は、その機能を実現するための処理ブロックとして、話題提供部１１０、発話取得部１２０、特徴量解析部１３０、及びユーザ照合部１４０を備えている。 The ECU 100 includes a topic provision unit 110, an utterance acquisition unit 120, a feature amount analysis unit 130, and a user verification unit 140 as processing blocks for realizing the functions.

話題提供部１１０は、スピーカ２００を介して、車両のドライバに対して話題を提供することが可能に構成されている。即ち、話題提供部１１０は、車両のドライバと会話する機能を有している。話題提供部１１０は、ドライバを推定できていない段階では、無作為に話題を提供すればよい。一方、話題提供部１１０は、ドライバを推定した後では、ユーザが興味のある話題を提供してもよい。話題提供部１１０は、後述する付記における「提供手段」の一具体例である。 The topic providing unit 110 is configured to be able to provide topics to the driver of the vehicle via the speaker 200. That is, the topic providing unit 110 has a function of conversing with the driver of the vehicle. While the driver has not yet been estimated, the topic providing unit 110 may simply provide topics at random. After the driver has been estimated, on the other hand, the topic providing unit 110 may provide topics in which the user is interested. The topic providing unit 110 is a specific example of the "providing means" in the appendix described later.

発話取得部１２０は、マイク３００を介して、車両のドライバの発話を取得することが可能に構成されている。より具体的には、発話取得部１２０は、車両のドライバの発話内容（即ち、どんな内容の話をしているのか）、及び音声の特徴（例えば、音声の速さ、音声の高さ、抑揚等）を取得する。発話取得部１２０で取得された発話内容及び音声の特徴は、特徴量解析部に出力される構成となっている。発話取得部１２０は、後述する付記における「取得手段」の一具体例である。 The utterance acquisition unit 120 is configured to be able to acquire the utterances of the driver of the vehicle via the microphone 300. More specifically, the utterance acquisition unit 120 acquires the utterance content of the driver of the vehicle (that is, what the driver is talking about) and voice features (for example, speech rate, voice pitch, and intonation). The utterance content and voice features acquired by the utterance acquisition unit 120 are output to the feature amount analysis unit 130. The utterance acquisition unit 120 is a specific example of the "acquisition means" in the appendix described later.

特徴量解析部１３０は、発話取得部１２０で取得された発話内容及び音声の特徴に基づいて、ユーザを推定するための特徴量を解析することが可能に構成されている。なお、特徴量解析部１３０が解析する特徴量及び具体的な解析方法については、後に詳述する。特徴量解析部１３０で解析された特徴量は、ユーザ照合部１４０に出力される構成となっている。特徴量解析部１３０は、後述する付記における「解析手段」の一具体例である。 The feature amount analysis unit 130 is configured to be able to analyze feature amounts for estimating the user based on the utterance contents and voice features acquired by the utterance acquisition unit 120 . Note that the feature amount analyzed by the feature amount analysis unit 130 and a specific analysis method will be described in detail later. The feature amount analyzed by the feature amount analysis unit 130 is configured to be output to the user verification unit 140 . The feature quantity analysis unit 130 is a specific example of the “analyzing means” in the appendix described later.

ユーザ照合部１４０は、特徴量解析部１３０で解析された特徴量に基づいて、ユーザ照合処理を実行することが可能に構成されている。即ち、ユーザ照合部１４０は、現在の車両のドライバが誰であるのかの推定（特定）する機能を有している。ユーザ照合部１４０による照合結果は、その後のドライバに対する車両内サービスの提供（例えば、提供する話題の内容等）を決定するために用いられる。ユーザ照合部１４０は、後述する付記における「推定手段」の一具体例である。 The user verification unit 140 is configured to be able to execute user verification processing based on the feature quantity analyzed by the feature quantity analysis unit 130 . That is, the user verification unit 140 has a function of estimating (identifying) who is the current driver of the vehicle. The result of matching by the user matching unit 140 is used to determine the subsequent provision of in-vehicle services to the driver (for example, the content of topics to be provided). The user collation unit 140 is a specific example of the “estimating means” in the appendix described later.

＜動作の流れ＞ <Flow of operation>
次に、本実施形態に係る発話者推定装置の動作の流れについて、図２を参照して説明する。図２は、本実施形態に係る発話者推定装置の動作の流れを示すフローチャートである。 Next, the operation flow of the speaker estimation device according to this embodiment will be described with reference to FIG. 2. FIG. 2 is a flowchart showing the operation flow of the speaker estimation device according to this embodiment.

図２に示すように、本実施形態に係る発話者推定装置の動作時には、まずドライバが乗車したか否かを判定する（ステップＳ１１）。ドライバが乗車していないと判定された場合（ステップＳ１１：ＮＯ）、所定期間後に再びステップＳ１１の処理が実行される。 As shown in FIG. 2, when the speaker estimation device according to the present embodiment operates, it is first determined whether or not the driver has gotten into the vehicle (step S11). When it is determined that the driver is not in the vehicle (step S11: NO), the process of step S11 is executed again after a predetermined period of time.

ドライバが乗車していると判定された場合（ステップＳ１１：ＹＥＳ）、話題提供部１１０によるドライバとの会話をスタートする（ステップＳ１２）。会話がスタートとすると、話題提供部１１０は、ドライバに対して無作為に話題を提供する（ステップＳ１３）。 If it is determined that the driver is in the vehicle (step S11: YES), the topic providing unit 110 starts talking with the driver (step S12). When the conversation starts, topic providing unit 110 randomly provides topics to the driver (step S13).

続いて、発話取得部１２０が、提供された話題に対するドライバの発話内容を取得する（ステップＳ１４）。その後、発話取得部１２０は、解析に十分な量の発話内容を取得したか否かを判定する（ステップＳ１５）。解析に十分な量の発話内容を取得していないと判定された場合（ステップＳ１５：ＮＯ）、発話取得部１２０は、ドライバの発話内容を取得する処理を続行する。或いは、新たな発話内容を取得するために、話題提供部１１０が、ドライバに対して別の話題を提供するようにしてもよい。 Subsequently, the speech acquisition unit 120 acquires the content of the driver's speech on the provided topic (step S14). After that, the speech acquisition unit 120 determines whether or not a sufficient amount of speech content for analysis has been acquired (step S15). If it is determined that a sufficient amount of utterance content for analysis has not been acquired (step S15: NO), the utterance acquisition unit 120 continues the process of acquiring the utterance content of the driver. Alternatively, the topic providing unit 110 may provide another topic to the driver in order to acquire new utterance content.

解析に十分な量の発話内容を取得していると判定された場合（ステップＳ１５：ＹＥＳ）、特徴量解析部１３０が、取得した発話内容に基づいて、ドライバを推定するための特徴量を解析する。具体的には、特徴量解析部１３０は、ドライバが応答に使うフレーズの解析（ステップＳ１６）、ドライバ応答に使う単語の解析（ステップＳ１７）、話題に対するドライバの感情の解析（ステップＳ１８）をそれぞれ実行する。なお、上記特徴量の具体的な解析方法については、後に詳述する。 If it is determined that a sufficient amount of utterance content for analysis has been acquired (step S15: YES), the feature amount analysis unit 130 analyzes the feature amounts for estimating the driver based on the acquired utterance content. Specifically, the feature amount analysis unit 130 performs an analysis of the phrases the driver uses in responses (step S16), an analysis of the words the driver uses in responses (step S17), and an analysis of the driver's emotion toward the topic (step S18). The specific methods of analyzing these feature amounts will be described in detail later.

上述したドライバの発話内容に基づく処理（即ち、ステップＳ１４～ステップＳ１８）を行う一方で、発話取得部１２０は、提供された話題に対するドライバの音声の特徴も取得する（ステップＳ１９）。その後、発話取得部１２０は、解析に十分な量の音声の特徴を取得したか否かを判定する（ステップＳ２０）。解析に十分な量の音声の特徴を取得していないと判定された場合（ステップＳ２０：ＮＯ）、発話取得部１２０は、ドライバの音声の特徴を取得する処理を続行する。或いは、新たな発話内容を取得するために、話題提供部１１０が、ドライバに対して別の話題を提供するようにしてもよい。 While performing the above-described processing based on the content of the driver's utterance (that is, steps S14 to S18), the utterance acquisition unit 120 also acquires the features of the driver's speech for the provided topic (step S19). After that, the speech acquisition unit 120 determines whether or not a sufficient amount of speech features for analysis has been acquired (step S20). If it is determined that a sufficient amount of voice features for analysis has not been acquired (step S20: NO), the speech acquisition unit 120 continues the process of acquiring the driver's voice features. Alternatively, the topic providing unit 110 may provide another topic to the driver in order to acquire new utterance content.

解析に十分な量の音声の特徴を取得していると判定された場合（ステップＳ２０：ＹＥＳ）、特徴量解析部１３０が、取得した音声の特徴に基づいて、ドライバを推定するための特徴量を解析する。具体的には、特徴量解析部１３０は、提供した話題に対するユーザの声のトーンを解析する（ステップＳ２１）。なお、上記特徴量の具体的な解析方法については、後に詳述する。 If it is determined that a sufficient amount of voice features for analysis has been acquired (step S20: YES), the feature amount analysis unit 130 analyzes the feature amounts for estimating the driver based on the acquired voice features. Specifically, the feature amount analysis unit 130 analyzes the tone of the user's voice in response to the provided topic (step S21). The specific method of analyzing this feature amount will be described in detail later.

特徴量解析部１３０が特徴量の解析を終了した後は、ユーザ照合部１４０が、解析結果として得られる特徴量に基づいて、ユーザ照合処理を実行する（ステプＳ２２）。即ち、現在の車両のドライバが誰なのかを推定するための処理を実行する。ユーザ照合処理の具体的な内容については、後に詳述する。 After the feature amount analysis unit 130 finishes analyzing the feature amount, the user verification unit 140 executes user verification processing based on the feature amount obtained as the analysis result (step S22). That is, a process for estimating who the current driver of the vehicle is is executed. Specific contents of the user collation processing will be described in detail later.

＜応答に使うフレーズの解析＞ <Analysis of phrases used in responses>
次に、ユーザが応答によく使うフレーズの解析（即ち、図２のステップＳ１６の処理）について、図３から図６を参照して具体的に説明する。図３は、ユーザの語尾のパターンを解析するためのルールの一例を示す表である。図４は、ユーザの会話の長さのパターンを解析するためのルールの一例を示す表である。図５は、ユーザの言いよどみのパターンを解析するためのルールの一例を示す表である。図６は、ユーザの単語の繰り返しのパターンを解析するためのルールの一例を示す表である。 Next, the analysis of phrases that the user frequently uses in responses (that is, the process of step S16 in FIG. 2) will be specifically described with reference to FIGS. 3 to 6. FIG. 3 is a table showing an example of rules for analyzing the user's sentence-ending patterns. FIG. 4 is a table showing an example of rules for analyzing the user's conversation length patterns. FIG. 5 is a table showing an example of rules for analyzing the user's hesitation patterns. FIG. 6 is a table showing an example of rules for analyzing the user's word repetition patterns.

なお、応答に使うフレーズを解析するためのルールは、所定の特徴を示すパターン毎に事前に作成されている。また、作成したルールを機械学習することでモデルを作成してもよい。例えば、サポートベクターマシンを使用して分類モデルを作成して、正解データとして所定のパターンを予め分類し、決まった分類の中から特徴量を自動的に判定するようにしてもよい。或いは、学習データをＤＮＮ（ＤｅｅｐＮｅｕｒａｌＮｅｔｗｏｒｋ）に入力して自動的に特徴量を出力するようにしてもよい。 Note that rules for analyzing phrases used in responses are created in advance for each pattern indicating a predetermined characteristic. A model may also be created by performing machine learning on the created rules. For example, a support vector machine may be used to create a classification model, pre-classify predetermined patterns as correct data, and automatically determine feature quantities from the determined classification. Alternatively, the learning data may be input to a DNN (Deep Neural Network) to automatically output the feature amount.

図３に示すように、応答に使うフレーズの解析では、語尾のパターンを特徴量として解析してもよい。具体的には、「～です」、「～ます」等の丁寧語を１分以内に３回以上使うか否か、「～じゃん」「～だべ」等の方言を１分以内に３回以上使うか否か、「～でしょ」、「～だろ」等の簡易的な表現を１分以内に３回以上使うか否かを判定すればよい。 As shown in FIG. 3, in the analysis of phrases used in responses, sentence-ending patterns may be analyzed as feature amounts. Specifically, it may be determined whether the user uses polite endings such as "-desu" and "-masu" three or more times within one minute, whether the user uses dialect endings such as "-jan" and "-dabe" three or more times within one minute, and whether the user uses casual endings such as "-desho" and "-daro" three or more times within one minute.

図４に示すように、応答に使うフレーズの解析では、会話の長さのパターンを特徴量として解析してもよい。具体的には、一人で同じ話題をひたすら話すか否か（１時間以内に１回５分の話を２回以上するか否か）、一人で番う話題をひたすら話すか否か（１時間以内に１回５分の話を２回以上するか否か）、簡潔に話すか否か（１回の話が２０秒程度か否か）を判定すればよい。なお、上記条件以外の場合には、一般平均であると判定すればよい。 As shown in FIG. 4, in the analysis of phrases used in responses, conversation length patterns may be analyzed as feature amounts. Specifically, it may be determined whether the user talks at length alone about the same topic (whether the user talks for five minutes at a time, two or more times within one hour), whether the user talks at length alone about different topics (whether the user talks for five minutes at a time, two or more times within one hour), and whether the user speaks concisely (whether each turn lasts about 20 seconds). In cases other than the above conditions, the user may be judged to be a general average.

図５に示すように、応答に使うフレーズの解析では、言いよどみのパターンを特徴量として解析してもよい。具体的には、「え～」、を１分以内に３回以上使うか否か、「あの～」を１分以内に３回以上使うか否か、どもることが１分以内に３回以上あるか否かを判定すればよい。 As shown in FIG. 5, in the analysis of phrases used in responses, hesitation patterns may be analyzed as feature amounts. Specifically, it may be determined whether the user says the filler "e~" ("uh") three or more times within one minute, whether the user says the filler "ano~" ("um") three or more times within one minute, and whether the user stutters three or more times within one minute.

図６に示すように、応答に使うフレーズの解析では、単語の繰り返しのパターンを特徴量として解析してもよい。具体的には、「あれ」、「それ」等の指示語を１分以内に３回以上使うか否か、固有名詞（例えば、野球選手の名前や場所の名前等）を１分以内に３回以上使うか否か、「あいつ」、「そいつ」等の代名詞を１分以内に３回以上使うか否かを判定すればよい。なお、上記条件以外の場合には、一般平均であると判定すればよい。 As shown in FIG. 6, in the analysis of phrases used in responses, word repetition patterns may be analyzed as feature amounts. Specifically, it may be determined whether the user uses demonstratives such as "are" and "sore" (both roughly "that") three or more times within one minute, whether the user uses proper nouns (for example, the names of baseball players or places) three or more times within one minute, and whether the user uses pronouns such as "aitsu" and "soitsu" (both roughly "that guy") three or more times within one minute. In cases other than the above conditions, the user may be judged to be a general average.
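The "three or more times within one minute" rules of FIGS. 3, 5, and 6 can be sketched as a sliding-window count. The following Python sketch is illustrative only and is not taken from the patent: it assumes that an upstream speech recognizer has already tagged each detected expression with a timestamp in seconds and a pattern label, and the function names and labels (`phrase_features`, `"polite"`, `"filler"`) are hypothetical. The window is treated as inclusive at both ends, which is also an assumption.

```python
from bisect import bisect_left


def max_count_in_window(timestamps, window_sec):
    """Return the largest number of events found in any window of window_sec seconds."""
    times = sorted(timestamps)
    best = 0
    for i, t in enumerate(times):
        # Index of the first event that is no earlier than (t - window_sec).
        start = bisect_left(times, t - window_sec)
        best = max(best, i - start + 1)
    return best


def uses_often(timestamps, window_sec=60.0, min_count=3):
    """Rule from the text: the pattern occurs min_count or more times within the window."""
    return max_count_in_window(timestamps, window_sec) >= min_count


def phrase_features(events):
    """events: iterable of (time_sec, pattern_label) pairs; returns {label: bool}."""
    by_label = {}
    for time_sec, label in events:
        by_label.setdefault(label, []).append(time_sec)
    return {label: uses_often(times) for label, times in by_label.items()}


if __name__ == "__main__":
    events = [(0, "polite"), (20, "polite"), (50, "polite"),
              (10, "filler"), (100, "filler"), (200, "filler")]
    print(phrase_features(events))
```

Here the three "polite" events fall within 50 seconds of each other and satisfy the rule, while the three "filler" events are spread too far apart to do so.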

＜応答に使う単語の解析＞ <Analysis of words used in responses>
次に、ユーザが応答によく使う単語の解析（即ち、図２のステップＳ１７の処理）について、図７から図９を参照して具体的に説明する。図７は、特徴的な趣味と単語の分類パターンを解析するためのルールの一例を示す表である。図８は、ＰＯＩと単語の分類パターンを解析するためのルールの一例を示す表である。図９は、レストランと単語の分類パターンを解析するためのルールの一例を示す表である。 Next, the analysis of words that the user frequently uses in responses (that is, the process of step S17 in FIG. 2) will be specifically described with reference to FIGS. 7 to 9. FIG. 7 is a table showing an example of rules for analyzing classification patterns of characteristic hobbies and words. FIG. 8 is a table showing an example of rules for analyzing classification patterns of POIs and words. FIG. 9 is a table showing an example of rules for analyzing classification patterns of restaurants and words.

なお、応答に使う単語を解析するためのルールは、所定ジャンルの単語ごとに事前に作成されている。また、作成したルールを機械学習することでモデルを作成してもよい。例えば、サポートベクターマシンを使用して分類モデルを作成して、正解データとして所定のパターンを予め分類し、決まった分類の中から特徴量を自動的に判定するようにしてもよい。 Note that rules for analyzing words used in responses are created in advance for each word of a predetermined genre. A model may also be created by performing machine learning on the created rules. For example, a support vector machine may be used to create a classification model, pre-classify predetermined patterns as correct data, and automatically determine feature quantities from the determined classification.

図７に示すように、応答に使う単語の解析では、趣味に関する単語の使用回数から特徴量である趣味を判定してもよい。具体的には、野球選手の名前やチームの名前を１時間以内に５回以上使っている場合には、野球が趣味であると判定すればよい。サッカー選手の名前やチームの名前を１時間以内に５回以上使っている場合には、サッカーが趣味であると判定すればよい。ハイキングによく使われる場所の名前を１時間以内に５回以上使っている場合には、ハイキングが趣味であると判定すればよい。本の作品名１時間以内に５回以上使っている場合には、読書が趣味であると判定すればよい。旅行によく使われる場所の名前や観光名所を１時間以内に５回以上使っている場合には、旅行が趣味であると判定すればよい。 As shown in FIG. 7, in the analysis of words used in responses, a hobby may be determined as a feature amount from the number of times hobby-related words are used. Specifically, if the names of baseball players or teams are used five or more times within one hour, it may be determined that baseball is a hobby. If the names of soccer players or teams are used five or more times within one hour, it may be determined that soccer is a hobby. If the names of places commonly visited for hiking are used five or more times within one hour, it may be determined that hiking is a hobby. If book titles are used five or more times within one hour, it may be determined that reading is a hobby. If the names of common travel destinations or tourist attractions are used five or more times within one hour, it may be determined that travel is a hobby.

図８に示すように、応答に使う単語の解析では、ＰＯＩ（ＰｏｉｎｔＯｆＩｎｔｅｒｅｓｔ）に関する単語の使用回数から特徴量であるＰＯＩを判定してもよい。具体的には、映画館の名前を１時間以内に５回以上使っている場合には、映画館がＰＯＩであると判定すればよい。博物館に展示されている作品の名前や、展覧会のジャンル名を１時間以内に５回以上使っている場合には、博物館がＰＯＩであると判定すればよい。 As shown in FIG. 8, in the analysis of words used in responses, POIs, which are feature amounts, may be determined from the number of times words are used regarding POIs (Point Of Interest). Specifically, if the name of the movie theater is used five times or more within one hour, it may be determined that the movie theater is the POI. If the name of a work exhibited in the museum or the name of the genre of the exhibition is used five times or more within one hour, the museum may be determined to be the POI.

図９に示すように、応答に使う単語の解析では、レストランに関する単語の使用回数から特徴量であるレストランを判定してもよい。具体的には、イタリアンのレストランの名前や料理名を１時間以内に５回以上使っている場合には、イタリアンのレストランを特徴量として判定すればよい。和食のレストランの名前や料理名を１時間以内に５回以上使っている場合には、和食のレストランを特徴量として判定すればよい。 As shown in FIG. 9, in analyzing the words used in the response, the restaurant, which is a feature amount, may be determined from the number of times the words related to the restaurant are used. Specifically, when the name of an Italian restaurant or the name of a dish is used five times or more within one hour, the Italian restaurant may be determined as the feature quantity. If the name of a Japanese restaurant or the name of a dish is used five times or more within one hour, the Japanese restaurant may be determined as the feature quantity.
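The word-count rules of FIGS. 7 to 9 share one shape: map each recognized word to a category, then keep the categories that are mentioned five or more times within one hour. The Python sketch below illustrates this under stated assumptions and is not the patent's implementation: the lexicon `WORD_TO_CATEGORY` and its placeholder entries are hypothetical, and the input is assumed to be the list of words recognized during a single one-hour period.

```python
from collections import Counter

# Hypothetical lexicon mapping recognized words to a category (hobby, POI, or restaurant type).
WORD_TO_CATEGORY = {
    "BaseballPlayerA": "baseball",
    "BaseballTeamB": "baseball",
    "SoccerPlayerC": "soccer",
    "BookTitleD": "reading",
    "ItalianDishE": "italian restaurant",
}


def estimate_categories(words, lexicon=WORD_TO_CATEGORY, min_count=5):
    """Return the categories mentioned min_count or more times in the given words.

    `words` is assumed to hold only the words uttered within one hour.
    """
    counts = Counter(lexicon[w] for w in words if w in lexicon)
    return sorted(cat for cat, n in counts.items() if n >= min_count)


if __name__ == "__main__":
    words = (["BaseballPlayerA"] * 3 + ["BaseballTeamB"] * 2
             + ["SoccerPlayerC"] * 4 + ["hello"])
    print(estimate_categories(words))
```

In this example baseball-related words occur five times in total and pass the threshold, while soccer-related words occur only four times and do not.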

＜話題に対する感情の解析＞ <Analysis of emotion toward topics>
次に、話題に対するユーザの感情の解析（即ち、図２のステップＳ１８の処理）について、図１０及び図１１を参照して具体的に説明する。図１０は、ユーザの感情表現を表す単語とスコアとの関係の一例を示す表である。図１１は、ユーザの発話した文章とスコアとの関係の一例を示す表である。 Next, the analysis of the user's emotion toward a topic (that is, the process of step S18 in FIG. 2) will be specifically described with reference to FIGS. 10 and 11. FIG. 10 is a table showing an example of the relationship between words expressing the user's emotions and scores. FIG. 11 is a table showing an example of the relationship between sentences uttered by the user and scores.

図１０に示すように、話題に対する感情を判定する際には、感情を表す単語にスコアを付け、そのスコアの合計値または平均値を用いて判定すればよい。具体的には、「すばらしい」という単語を使った場合には“＋２”、「最悪だ」という単語を使った場合には“－２”、「まあまあだね」という単語を使った場合には“＋１”というスコアをつければよい。なお、“＋”はポジティブな感情に対するスコア、“－”はネガティブな感情に対するスコアである。 As shown in FIG. 10, when determining the emotion toward a topic, scores may be assigned to words that express emotion, and the determination may be made using the total or average of those scores. Specifically, a score of "+2" may be assigned when the word "subarashii" ("wonderful") is used, "-2" when the word "saiaku da" ("the worst") is used, and "+1" when the word "maamaa da ne" ("so-so") is used. Here, "+" denotes a score for a positive emotion, and "-" denotes a score for a negative emotion.
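The word-score approach of FIG. 10 amounts to a dictionary lookup followed by a sum or an average. The Python sketch below illustrates this: the three scores are the ones given in the text, keyed by English glosses for readability, while the function names and the "neutral" outcome for a total of zero are assumptions added for illustration.

```python
# Scores from the example in the text, keyed by English glosses of the Japanese words.
EMOTION_SCORES = {"wonderful": 2, "worst": -2, "so-so": 1}


def emotion_score(words, scores=EMOTION_SCORES):
    """Return (total, average) of the scores of the emotion words found in `words`."""
    hits = [scores[w] for w in words if w in scores]
    if not hits:
        return 0, 0.0
    total = sum(hits)
    return total, total / len(hits)


def emotion_label(words, scores=EMOTION_SCORES):
    """Map the total score to a label; the 'neutral' case is an assumption."""
    total, _ = emotion_score(words, scores)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    words = ["wonderful", "so-so", "the", "worst"]
    print(emotion_score(words), emotion_label(words))
```

Whether the total or the average is the better summary depends on how much the user speaks; the text allows either.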

図１１に示すように、話題に対する感情を判定する際には、文章の特徴を機械学習により学習して判定してもよい。例えば、文章に対して感情の正解値を付け、その結果をもとに機械学習を行い、モデルを作成すればよい。この時の学習ロジックは、例えばサポートベクターマシンやＤＮＮを用いればよい。このようなモデルによれば、例えば「昨日の○○○はすごかったね」という文章を使った場合に、“＋２”のスコアが付けられる。「昨日食べたパスタは最悪だったわ」という文章を使った場合に、“－２”のスコアが付けられる。「今日の□□□はなかなかいいね」という文章を使った場合に、“＋１”のスコアが付けられる。 As shown in FIG. 11, when judging an emotion about a topic, it may be determined by learning features of sentences by machine learning. For example, it is possible to assign a correct emotion value to a sentence, perform machine learning based on the result, and create a model. For the learning logic at this time, for example, a support vector machine or DNN may be used. According to such a model, a score of "+2" is assigned to a sentence such as "Yesterday's XX was amazing." A score of "-2" is given when using the sentence "The pasta I ate yesterday was the worst." A score of “+1” is given when the sentence “Today's □□□ is quite good” is used.

＜話題に対するトーンの解析＞ <Analysis of tone toward topics>
次に、話題に対するユーザの声のトーンの解析（即ち、図２のステップＳ２１の処理）について具体的に説明する。 Next, the analysis of the tone of the user's voice toward a topic (that is, the process of step S21 in FIG. 2) will be specifically described.

なお、声のトーンを解析するためのルールは、音声の特徴に基づいて事前に作成されている。また、作成したルールを機械学習することでモデルを作成してもよい。例えば、サポートベクターマシンを使用して分類モデルを作成して、正解データとして所定のパターンを予め分類し、決まった分類の中から特徴量を自動的に判定するようにしてもよい。 Note that the rules for analyzing the tone of voice are created in advance based on the features of the voice. A model may also be created by performing machine learning on the created rules. For example, a support vector machine may be used to create a classification model, pre-classify predetermined patterns as correct data, and automatically determine feature quantities from the determined classification.

より具体的には、話題に対するトーンの解析では、音声の速さ、音声の高さ（周波数）、抑揚等に基づいて、声のトーンがポジティブであるか又はネガティブであるかを判定すればよい。例えば、音声が速く、且つ高い場合には、ポジティブなトーンであると判定すればよい。また、音声が遅く、且つ低い場合には、ネガティブなトーンであると判定すればよい。 More specifically, in tone analysis for a topic, it is sufficient to determine whether the tone of voice is positive or negative based on the speed of speech, pitch (frequency) of speech, intonation, etc. . For example, if the voice is fast and high, it may be determined that the tone is positive. Also, if the voice is slow and low, it may be determined that the tone is negative.
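The tone rule above (fast and high is positive, slow and low is negative) can be sketched as a comparison against reference values. The text does not say what counts as "fast" or "high", so the Python sketch below assumes hypothetical reference values, for example a per-user baseline, that are passed in by the caller. Returning "undetermined" for mixed cases is also an assumption.

```python
def classify_tone(rate, pitch_hz, ref_rate, ref_pitch_hz):
    """Classify voice tone from speech rate and pitch relative to reference values.

    rate and ref_rate share one unit (for example syllables per second);
    pitch_hz and ref_pitch_hz are in hertz. The reference values are assumed inputs.
    """
    fast, slow = rate > ref_rate, rate < ref_rate
    high, low = pitch_hz > ref_pitch_hz, pitch_hz < ref_pitch_hz
    if fast and high:
        return "positive"
    if slow and low:
        return "negative"
    # Mixed cases (e.g. fast but low) are not covered by the rule in the text.
    return "undetermined"


if __name__ == "__main__":
    print(classify_tone(7.0, 220.0, ref_rate=5.5, ref_pitch_hz=180.0))
    print(classify_tone(4.0, 150.0, ref_rate=5.5, ref_pitch_hz=180.0))
```

Intonation, which the text also lists as an input, is left out of this sketch because the text gives no rule for it.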

＜ユーザ照合処理＞ <User verification process>
次に、ユーザ照合処理（即ち、図２のステップＳ２２の処理）について、図１２を参照して具体的に説明する。図１２は、ユーザ照合処理の具体的な方法の一例を示す表である。 Next, the user verification process (that is, the process of step S22 in FIG. 2) will be specifically described with reference to FIG. 12. FIG. 12 is a table showing an example of a specific method of user verification processing.

ユーザ照合処理は、上述した解析処理の結果として得られた特徴量の組み合わせを利用して、判定器を作成して行えばよい。なお、判定器を作成する場合には機械学習を行ってもよい。 The user collation processing may be performed by creating a determiner using a combination of feature amounts obtained as a result of the analysis processing described above. Note that machine learning may be performed when creating the determiner.

図１２に示す判定器を用いる場合には、よく使うフレーズのパターンが「丁寧語をよく使う」であり、よく話す話題が「野球」であり、よく話す単語が「野球選手名」であり、よく話す話題の感情が「ポジティブ」であり、話題のトーンが「ポジティブ」である場合に、ユーザパターンは“Ａ”であると判定される。よく話す話題が「サッカー」であり、よく話す単語が「サッカー選手名」であり、よく話す話題の感情が「ネガティブ」であり、話題のトーンが「ポジティブ」である場合には、ユーザパターンは“Ｂ”であると判定される。よく使うフレーズのパターンが「簡易表現が多い」であり、よく話す話題が「読書」であり、よく話す単語が「作品名」であり、よく話す話題の感情が「ポジティブ」であり、話題のトーンが「暗い（ネガティブ）」である場合には、ユーザパターンは“Ｃ”であると判定される。 When the determiner shown in FIG. 12 is used, the user pattern is determined to be "A" if the frequently used phrase pattern is "often uses polite language", the frequently discussed topic is "baseball", the frequently used words are "baseball player names", the emotion toward the frequently discussed topic is "positive", and the tone toward the topic is "positive". The user pattern is determined to be "B" if the frequently discussed topic is "soccer", the frequently used words are "soccer player names", the emotion toward the frequently discussed topic is "negative", and the tone toward the topic is "positive". The user pattern is determined to be "C" if the frequently used phrase pattern is "many casual expressions", the frequently discussed topic is "reading", the frequently used words are "titles of works", the emotion toward the frequently discussed topic is "positive", and the tone toward the topic is "dark (negative)".
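The rule table described above can be sketched as a lookup that returns the first user pattern whose conditions all match the analyzed feature amounts. The Python sketch below is illustrative only: the keys and label strings are hypothetical encodings of the table entries, and pattern "B" carries no phrase condition because the text gives none for it. The text also allows the determiner to be built by machine learning instead of fixed rules.

```python
# Hypothetical encoding of the three user patterns described in the text.
USER_PATTERNS = {
    "A": {"phrase": "polite", "topic": "baseball", "word": "baseball player name",
          "emotion": "positive", "tone": "positive"},
    "B": {"topic": "soccer", "word": "soccer player name",
          "emotion": "negative", "tone": "positive"},
    "C": {"phrase": "casual", "topic": "reading", "word": "title of work",
          "emotion": "positive", "tone": "negative"},
}


def match_user(features, patterns=USER_PATTERNS):
    """Return the first pattern name whose conditions all hold, or None if none match."""
    for name, required in patterns.items():
        if all(features.get(key) == value for key, value in required.items()):
            return name
    return None


if __name__ == "__main__":
    observed = {"phrase": "polite", "topic": "baseball", "word": "baseball player name",
                "emotion": "positive", "tone": "positive"}
    print(match_user(observed))
```

Exact matching on every feature is the simplest reading of the table; a practical system would likely need a tolerance for partial matches, which this sketch does not attempt.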

以上説明したように、本実施形態に係る発話者推定装置によれば、ドライバの発話内容及び音声の特徴から解析された複数の特徴量に基づいて、ユーザパターン（即ち、ドライバが誰であるのか）が推定される。よって、現在の車両のドライバ（即ち、発話者）が誰であるのかを、極めて高い精度で推定することが可能である。 As described above, according to the speaker estimation device of this embodiment, the user pattern (that is, who the driver is) is estimated based on a plurality of feature amounts analyzed from the driver's utterance content and voice features. It is therefore possible to estimate who the current driver of the vehicle (that is, the speaker) is with extremely high accuracy.

＜付記＞ <Appendix>
以上説明した実施形態から導き出される発明の各種態様を以下に説明する。 Various aspects of the invention derived from the embodiment described above will be described below.

（付記１） (Appendix 1)
付記１に記載の発話者推定装置は、ユーザに話題を提供する提供手段と、前記話題に対する前記ユーザの発話の内容及び音声の特徴の少なくとも一方を取得する取得手段と、前記発話の内容及び音声の特徴の少なくとも一方に基づいて、前記ユーザの特徴量を解析する解析手段と、前記特徴量に基づいて、前記ユーザの個人属性を推定する推定手段とを備える。 The speaker estimation device described in Appendix 1 includes: providing means for providing a topic to a user; acquisition means for acquiring at least one of the content of the user's utterance on the topic and features of the user's voice; analysis means for analyzing a feature amount of the user based on at least one of the content of the utterance and the features of the voice; and estimation means for estimating a personal attribute of the user based on the feature amount.

付記１に記載の発話者推定装置によれば、提供した話題に対するユーザの発話の内容及び音声の特徴の少なくとも一方に基づいて、ユーザの特徴量が解析される。そして、解析された特徴量に基づいて、ユーザの個人属性が推定される。なお、「特徴量」とは、ユーザの個人属性を推定するためのパラメータであり、例えばユーザがよく使うフレーズや単語に関する情報、提供された話題に対するユーザの感情や声のトーン等を含んでいる。「個人属性」とは、ユーザ個人を特定するための属性情報であり、例えばユーザの本人性（本人らしさ）を示す情報である。 According to the speaker estimation device described in Appendix 1, the feature amount of the user is analyzed based on at least one of the content of the user's utterance on the provided topic and the features of the user's voice. The personal attribute of the user is then estimated based on the analyzed feature amount. Here, the "feature amount" is a parameter for estimating the user's personal attribute and includes, for example, information on phrases and words that the user frequently uses and the user's emotion and tone of voice toward the provided topic. The "personal attribute" is attribute information for identifying an individual user, for example information indicating the user's identity (what is characteristic of that person).

上述した構成によれば、ユーザ（即ち、発話者）の発話から解析される特徴量に基づいて個人属性が推定されるため、単に音声認証等でユーザの個人属性を推定する場合と比べると、より高い精度でユーザを推定（言い換えれば、特定）することが可能である。 According to the above-described configuration, the personal attribute is estimated based on the feature amount analyzed from the utterance of the user (that is, the speaker). It is possible to estimate (in other words, identify) the user with higher accuracy.

本発明は、上述した実施形態に限られるものではなく、請求の範囲及び明細書全体から読み取れる発明の要旨或いは思想に反しない範囲で適宜変更可能であり、そのような変更を伴う発話者推定装置もまた本発明の技術的範囲に含まれるものである。 The present invention is not limited to the embodiment described above and can be modified as appropriate within a scope that does not contradict the gist or idea of the invention that can be read from the claims and the specification as a whole. A speaker estimation device incorporating such modifications is also included in the technical scope of the present invention.

１００ ＥＣＵ 100 ECU
１１０ 話題提供部 110 Topic providing unit
１２０ 発話取得部 120 Utterance acquisition unit
１３０ 特徴量解析部 130 Feature amount analysis unit
１４０ ユーザ照合部 140 User verification unit
２００ スピーカ 200 Speaker
３００ マイク 300 Microphone

Claims

A speaker estimation device comprising:
providing means for providing a topic to a user;
acquisition means for acquiring the content of the user's utterance on the topic and features of the user's voice;
analysis means for analyzing an emotion toward the topic based on the content of the utterance and analyzing a tone of voice toward the topic based on the features of the voice; and
estimation means for estimating a personal attribute of the user based on the topic, the emotion toward the topic, and the tone of voice toward the topic.