JP4380978B2 - COMMUNICATION SYSTEM, COMMUNICATION TERMINAL DEVICE, COMMUNICATION METHOD

Info

Publication number: JP4380978B2
Application number: JP2002318737A
Authority: JP
Inventors: 淳富士本
Original assignee: Aruze Corp
Current assignee: Universal Entertainment Corp
Priority date: 2002-10-31
Filing date: 2002-10-31
Publication date: 2009-12-09
Anticipated expiration: 2022-10-31
Also published as: JP2004153707A

Description

【０００１】
【発明の属する技術分野】
本発明は、ユーザが不在時において、所定の処理を行う通信システム、通信端末装置、通信方法に関する。
【０００２】
【従来の技術】
従来、例えば、ユーザが自宅にいないときにおいて、相手の電話装置から、ユーザの電話装置に呼び出し信号が送られた場合、以下の処理が行われる。留守番モードに設定されたユーザの電話装置は、自動的にオフフック状態になる。そして、上記電話装置は、所定のメッセージを相手の電話装置に送信する。
【０００３】
そして、従来では、以下のような技術があった。留守番モードに設定された電話装置は、相手の電話装置から送られた所定のキーワードの音声信号に基づいて、話者を特定する。そして、留守番モードに設定された電話装置は、話者に対応する応答メッセージを上記相手の電話装置に送る（例えば、特許文献１参照）。
【０００４】
また、ユーザが自宅にいないときに、自宅に訪問者が訪れた場合、所定の装置は、訪問者を検出すると、以下の処理を行う。上記所定の装置は、不在である旨を示すメッセージを取得し、上記訪問者に出力する。
【０００５】
【特許文献１】
特開平９−３１２６８６号公報
【０００６】
【発明が解決しようとする課題】
上述した従来技術では、以下のような問題があった。例えば、ユーザが自宅にいないときに、相手の電話装置から呼び出し信号が、自宅の電話装置に送られてきた場合、ユーザは、相手によっては、直接通話したいと考えることもある。一方、ユーザは、相手によっては、直接通話しないで、所定の応答文を相手の電話装置から出力させればよいと考えることもある。
【０００７】
同じく、ユーザが自宅にいないときに、訪問者が自宅に来た場合、訪問者によっては、ユーザは、直接通話したいと考えることもある。一方、訪問者によっては、ユーザは、直接通話しないで、所定の応答文が訪問者に出力されればよいと考えることもある。
【０００８】
このため、例えば、ユーザが自宅にいない場合に、訪問者や電話をかけてきた相手（以下、訪問者等という）に応じて、ユーザが訪問者等と通話できるようになったり、応答文が訪問者等に出力されるようになる通信システムの開発が望まれていた。
【０００９】
本発明の目的は、例えば、ユーザが自宅にいない場合に、訪問者等に応じて、ユーザが訪問者等と通話できるようになったり、応答文が訪問者等に出力されるようになる通信システム、通信端末装置、通信方法を提供することである。
【００１０】
【課題を解決するための手段】
以上の問題点を解決するために、本発明は、所定場所に配置された通信端末と、前記所定場所から離れている前記通信端末のユーザが携帯する携帯端末とを有する通信システムであって、
前記通信端末は、
話者により予め入力された所定入力情報の音声信号の特徴を示す第１特徴データと対応づけられた、前記ユーザが予め選定した話者を特定する情報である話者特定情報を予め複数記憶する第１記憶手段と、
話者により入力された所定入力情報に基づいて、前記所定入力情報の音声信号の特徴を示す第２特徴データを抽出する特徴抽出手段と、
前記第２特徴データと各第１特徴データとの間の類似度を計算し、計算された各類似度のうち、最も高い類似度が所定値を超える場合、前記最も高い類似度に対応する第１特徴データに関連する話者特定情報を取得する第１取得手段と、
各話者特定情報と、前記携帯端末との間で通信可能な状態に設定するように指示する指令である第１指令、又は、話者に対して応答文を出力するように指示する指令である第２指令とが対応づけられている対応データを参照して、前記第１取得手段により取得された話者特定情報に対応する指令を取得する第２取得手段と、
前記第２取得手段により取得された指令が第１指令の場合、前記話者と前記ユーザとの間で通話が行えるように、前記携帯端末との間で、データ通信可能な状態に設定する通信手段と、
前記第２取得手段により取得された指令が第２指令の場合、前記話者に対して、前記応答文を出力する出力手段とを有し、
前記第１取得手段における所定値は、特定の言葉に対し、前記第１取得手段が計算した前記第１特徴データと前記第２特徴データの類似度が、一定値以上であるか否かの実験を行い、該実験の結果、前記類似度が前記一定値以上である場合に話者を特定できたと判断したときの前記一定値であり、
さらに、前記通信端末は、予め設定された特定の言葉である所定入力情報を音声入力させるメッセージ情報を出力する手段と、
取得した前記所定入力情報を示す音声信号を音声入力手段を介して、前記特徴抽出手段に送る手段とを有し、
前記特徴抽出手段は、送られてきた所定入力情報を示す音声信号の音声区間の検出を行い、検出した前記音声区間内において、第２特徴データの抽出の処理を行うとともに、
前記特徴抽出手段で前記第２特徴データが抽出ができなかった場合に制御手段は所定の応答文を出力する通信システムである。
【００１１】
また、本発明は、所定場所に配置され、前記所定場所から離れているユーザにより所有されている通信端末装置であって、
話者により予め入力された所定入力情報の音声信号の特徴を示す第１特徴データと対応づけられた、前記ユーザが予め選定した話者を特定する情報である話者特定情報を予め複数記憶する第１記憶手段と、
話者により入力された所定入力情報に基づいて、前記所定入力情報の音声信号の特徴を示す第２特徴データを抽出する特徴抽出手段と、
前記第２特徴データと各第１特徴データとの間の類似度を計算し、計算された各類似度のうち、最も高い類似度が所定値を超える場合に、前記最も高い類似度に対応する第１特徴データに関連する話者特定情報を取得する第１取得手段と、
各話者特定情報と、前記ユーザが携帯する携帯端末と通信を行うように指示する指令である第１指令、又は、前記話者に対して応答文を出力するように指示する指令である第２指令と、が対応づけられている対応データを参照して、前記第１取得手段により取得された話者特定情報に対応する指令を取得する第２取得手段と、
前記第２取得手段により取得された指令が第１指令の場合、前記話者と前記ユーザとの間で通話が行えるように、前記携帯端末との間で、データ通信可能な状態に設定する通信手段と、
前記第２取得手段により取得された指令が第２指令の場合、前記話者に対して、前記応答文を出力する出力手段とを有し、
前記第１取得手段における所定値は、特定の言葉に対し、前記第１取得手段が計算した前記第１特徴データと前記第２特徴データの類似度が、一定値以上であるか否かの実験を行い、該実験の結果、前記類似度が前記一定値以上である場合に話者を特定できたと判断したときの前記一定値であり、
さらに、前記通信端末装置は、予め設定された特定の言葉である所定入力情報を音声入力させるメッセージ情報を出力する手段と、
取得した前記所定入力情報を示す音声信号を音声入力手段を介して、前記特徴抽出手段に送る手段とを有し、
前記特徴抽出手段は、送られてきた所定入力情報を示す音声信号の音声区間の検出を行い、検出した前記音声区間内において、第２特徴データの抽出の処理を行うとともに、
前記特徴抽出手段で前記第２特徴データが抽出ができなかった場合に制御手段は所定の応答文を出力する通信端末装置である。
【００１２】
また、本発明は、所定場所に配置された通信端末と、前記所定場所から離れている前記通信端末のユーザが携帯する携帯端末と、を用いた通信方法であって、
話者により予め入力された所定入力情報の音声信号の特徴を示す第１特徴データと対応づけられた、前記ユーザが予め選定した話者を特定する情報である話者特定情報が、第１記憶手段に、予め複数記憶されており、
話者により入力された所定入力情報に基づいて、前記所定入力情報の音声信号の特徴を示す第２特徴データを抽出するステップと、
前記第２特徴データと各第１特徴データとの間の類似度を計算し、計算された各類似度のうち、最も高い類似度が所定値を超える場合、前記最も高い類似度に対応する第１特徴データに関連する話者特定情報を取得する第１取得ステップと、
各話者特定情報と、前記携帯端末との間で通信可能な状態に設定するように指示する指令である第１指令、又は、話者に対して応答文を出力するように指示する指令である第２指令とが対応づけられている対応データを参照して、前記第１取得ステップにより取得された話者特定情報に対応する指令を取得する第２取得ステップと、
前記第２取得ステップにより取得された指令が第１指令の場合、前記話者と前記ユーザとの間で通話が行えるように、前記通信端末が、前記携帯端末との間で、データ通信可能な状態に設定するステップと、
前記第２取得ステップにより取得された指令が第２指令の場合、前記通信端末が、前記話者に対して、前記応答文を出力するステップとを有し、
前記第１取得手段における所定値は、特定の言葉に対し、前記第１取得手段が計算した前記第１特徴データと前記第２特徴データの類似度が、一定値以上であるか否かの実験を行い、該実験の結果、前記類似度が前記一定値以上である場合に話者を特定できたと判断したときの前記一定値であり、
さらに、前記通信方法は、予め設定された特定の言葉である所定入力情報を音声入力させるメッセージ情報を出力するステップと、
取得した前記所定入力情報を示す音声信号を音声入力手段を介して、特徴抽出手段に送るステップとを有し、
前記特徴抽出手段は、送られてきた所定入力情報を示す音声信号の音声区間の検出を行い、検出した前記音声区間内において、第２特徴データの抽出の処理を行うとともに、
前記特徴抽出手段で前記第２特徴データが抽出ができなかった場合に制御手段は所定の応答文を出力する通信方法である。
【００１４】
【発明の実施の形態】
（構成）
図１は、本実施の形態の通信システムの構成を示す図である。通信システムは、所定の場所に配置された通信端末装置１を有する。また、通信システムは、通信端末装置１のユーザが携帯する携帯端末装置３を有する。通信ネットワーク４（例えば、電話網、インターネット）には、各装置が接続されている。通信端末装置１のユーザは、上記所定の場所から離れている。具体的には、例えば、通信端末装置１は、ユーザの自宅に配置されている。そして、ユーザは、自宅から外出している。この際、ユーザは、携帯端末装置３を保持している。
【００１５】
（通信端末装置の構成）
通信端末装置１は、例えば、ユーザの自宅に配置されている。そして、図１に示すように、通信端末装置１は、入力部１１と、音声入力部１２と、特徴抽出部１３と、第１取得部１４と、第１記憶部１５と、第２取得部１６と、第２記憶部１７と、出力部１８と、通信部１９と、各部を制御する制御部２０とを有する。
【００１６】
入力部１１には、例えば、ユーザの自宅を訪れる訪問者により、各種の入力情報が入力される。入力部１１とは、例えば、マイクロホンやキーボードである。
【００１７】
入力情報とは、キーボード等を介して、入力された文字情報などでもよい。また、入力情報とは、訪問者から発話された音声情報でもよい。以下、本実施の形態においては、訪問者を話者という。
【００１８】
入力部１１により、入力された入力情報は、制御部２０に送られる。制御部２０は、送られてきた入力情報のうち、音声で入力された所定入力情報を判断する。そして、制御部２０は、所定入力情報の音声信号を、音声入力部１２に送る。
【００１９】
音声入力部１２には、上記所定入力情報の音声信号が入力される。この所定入力情報を示す音声信号とは、例えば、『こんにちは』を示す音声信号である。音声入力部１２は、上記所定入力情報を示す音声信号について、Ａ／Ｄ変換処理を行う。そして、Ａ／Ｄ変換処理された音声信号は、特徴抽出部１３に送られる。
【００２０】
特徴抽出部１３は、送られてきた所定入力情報を示す音声信号の音声区間の検出を行う。具体的には、特徴抽出部１３は、入力された所定入力情報を示す音声信号のパワーに基づいて、音声区間の検出を行う。そして、特徴抽出部１３は、検出した音声区間内において、所定入力情報を示す音声信号について、短時間のスペクトル分析を行う。そして、特徴抽出部１３は、短時間のスペクトル分析の結果に基づいて、複数の特徴ベクトルを抽出する。そして、特徴抽出部１３は、抽出した各特徴ベクトルに基づいて、特徴データを抽出する。この特徴データとは、所定入力情報を示す音声信号の特徴を示すデータである。そして、特徴抽出部１３は、抽出した特徴データを第２特徴データとして、第１取得部１４に送る。
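The power-based voice section detection described in this paragraph can be sketched roughly as follows. This is a minimal illustration, not the patent's implementation: the function names, the non-overlapping framing, and the values of `frame_len` and `power_threshold` are all assumptions made for the example.

```python
def frame_powers(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        powers.append(sum(x * x for x in frame) / frame_len)
    return powers


def detect_voice_section(samples, frame_len=4, power_threshold=0.01):
    """Return (start, end) sample indices of the voiced span, or None.

    frame_len and power_threshold are illustrative values only.
    The span runs from the first frame whose power exceeds the
    threshold to the end of the last such frame.
    """
    powers = frame_powers(samples, frame_len)
    voiced = [i for i, p in enumerate(powers) if p > power_threshold]
    if not voiced:
        # No voice section found, so no feature data can be extracted.
        return None
    return voiced[0] * frame_len, (voiced[-1] + 1) * frame_len
```

Returning `None` when no frame exceeds the threshold gives the caller a simple way to detect the case in which feature data cannot be extracted, so that a fixed response can be output instead.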
【００２１】
なお、特徴抽出部１３は、以下のような処理を行っても良い。特徴抽出部１３は、所定入力情報を示す音声信号について、フーリエ変換処理を行う。特徴抽出部１３は、フーリエ変換処理されたスペクトルの絶対値の対数をとり、逆フーリエ変換処理を行い、ケプストラム係数を求める。そして、特徴抽出部１３は、ケプストラム係数に基づいて、特徴データを抽出してもよい。
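The cepstrum computation described in this paragraph (Fourier transform, logarithm of the absolute value of the spectrum, inverse Fourier transform) can be sketched as below. This is an illustrative sketch only: it uses a naive DFT for clarity rather than an FFT, and the `floor` parameter, which avoids taking the logarithm of zero, is an assumption that does not appear in the source.

```python
import cmath
import math


def dft(values):
    """Naive discrete Fourier transform (O(n^2), for illustration)."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(values):
    """Naive inverse discrete Fourier transform."""
    n = len(values)
    return [
        sum(values[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def real_cepstrum(frame, floor=1e-12):
    """Cepstrum coefficients of one frame of a voice signal.

    floor is a hypothetical guard against log(0).
    """
    spectrum = dft(frame)
    log_magnitude = [math.log(max(abs(c), floor)) for c in spectrum]
    # The log-magnitude spectrum of a real frame is symmetric, so the
    # inverse transform is real up to rounding error.
    return [c.real for c in inverse_dft(log_magnitude)]
```

In practice only the first few coefficients would typically be kept as the feature vector, but how many to keep is not stated in the source.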
【００２２】
また、ユーザが予め選定した話者を特定する情報である話者特定情報は、上記話者により予め入力された所定入力情報の音声信号の特徴を示す特徴データ（以下、第１特徴データという）と対応づけられている。そして、第１記憶部１５には、上記話者特定情報が予め複数記憶されている。図２は、第１記憶部１５の記憶内容の一例を示す図である。図２に示すように、各話者特定情報には、それぞれ、第１特徴データが対応づけられている。また、話者特定情報には、応答文が対応づけられている。そして、第２記憶部１７には、上記話者特定情報が予め複数記憶されている。
【００２３】
第１取得部１４は、特徴抽出部１３から送られた第２特徴データを取得する。そして、第１取得部１４は、第１記憶部１５にアクセスする。そして、第１取得部１４は、上記第２特徴データと、各第１特徴データとの間の類似度を計算する。例えば、第１取得部１４は、マハラノビス距離の数式を用いて、上記第２特徴データと、各第１特徴データとの間の類似度を計算することができる。
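As a rough illustration of a Mahalanobis-distance-based similarity, the sketch below assumes that each piece of first feature data is stored as a mean vector with per-dimension variances (a diagonal covariance matrix). Both that storage format and the mapping from distance to similarity, `1 / (1 + d)`, are assumptions for the example; the source only states that the Mahalanobis distance formula can be used.

```python
import math


def mahalanobis_distance(x, mean, variances):
    """Mahalanobis distance under an assumed diagonal covariance."""
    if not (len(x) == len(mean) == len(variances)):
        raise ValueError("x, mean and variances must have the same length")
    return math.sqrt(
        sum((a - m) ** 2 / v for a, m, v in zip(x, mean, variances))
    )


def similarity(x, mean, variances):
    """Map distance to a similarity in (0, 1]; higher means more alike.

    The 1 / (1 + d) mapping is a hypothetical choice for this sketch.
    """
    return 1.0 / (1.0 + mahalanobis_distance(x, mean, variances))
```

Any monotonically decreasing function of the distance would serve equally well here, as long as the predetermined value used in the later comparison is chosen on the same scale.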
【００２４】
そして、第１取得部１４は、計算した各類似度のうち、最も高い類似度が、所定値を超えているか否かを判断する。この所定値は、例えば、以下のようにして決められる。通信端末装置１の開発者が実験を行う。そして、上記開発者は、実験の結果、第１取得部１４が計算した類似度が一定値以上である場合、計算された類似度に基づいて、話者を特定できると判断する。上記一定値が、上記所定値となる。
【００２５】
第１取得部１４は、最も高い類似度が、所定値を超えていると判断した場合、最も高い類似度に対応する第１特徴データに関連する話者特定情報を取得する。第１取得部１４は、取得した話者特定情報を第２取得部１６に送る。一方、第１取得部１４は、最も高い類似度が、所定値を超えていないと判断した場合、その旨を制御部２０に送る。
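The decision made by the first acquisition unit 14 can be sketched as follows. The function name and the use of a dictionary keyed by speaker specifying information are assumptions for illustration; the behaviour (take the highest similarity, and accept it only if it exceeds the predetermined value) follows the description above.

```python
def acquire_speaker_id(similarities, threshold):
    """Return the speaker ID with the highest similarity, or None.

    similarities: dict mapping speaker specifying information (an ID)
                  to the similarity computed for that speaker's
                  first feature data.
    threshold:    the predetermined value; the best similarity must
                  strictly exceed it.
    """
    if not similarities:
        return None
    best_id = max(similarities, key=similarities.get)
    if similarities[best_id] > threshold:
        return best_id
    # No registered speaker matched; the control unit is notified.
    return None
```

A return value of `None` corresponds to the case in which the control unit 20 is notified that no similarity exceeded the predetermined value.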
【００２６】
第２取得部１６は、対応テーブルを保持している。この対応テーブルにおいては、各話者特定情報と、第１指令又は第２指令とが、対応づけられている。図３は、対応テーブルの一例を示す図である。第１指令とは、携帯端末装置３との間で、通信可能な状態に設定するように指示する指令である。第２指令とは、話者に対して、応答文を出力するように指示する指令である。
【００２７】
そして、第２取得部１６は、上記対応テーブルを参照して、第１取得部１４により取得された話者特定情報に対応する指令を取得する。第２取得部１６により取得された指令は、制御部２０に送られる。この際、話者特定情報も制御部２０に送られる。
【００２８】
制御部２０は、第２取得部１６から送られてきた指令に基づいて、以下のような処理を行う。制御部２０は、第２取得部１６により取得された指令が第１指令の場合には、通信部１９に対して、『携帯端末装置３との間で、通信可能な状態に設定するように指示する旨』を送る。
【００２９】
一方、制御部２０は、第２取得部１６により取得された指令が第２指令の場合には、第２記憶部１７にアクセスする。そして、制御部２０は、第１取得部１４により取得された話者特定情報に対応する応答文を取得する。そして、制御部２０は、出力部１８に対して、上記応答文を送る。この際、制御部２０は、『話者に対して、上記応答文を出力するように指示する旨』を送る。
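The lookup in the correspondence table and the branching by the control unit 20 can be sketched as below. The enum, the table layouts, the returned tuples, and the text of `DEFAULT_MESSAGE` are hypothetical names and values introduced for this example; handling an unidentified speaker with a fixed message follows the behaviour described elsewhere in this embodiment.

```python
from enum import Enum


class Command(Enum):
    CONNECT_TO_MOBILE = 1  # first command: set up communication with the mobile terminal
    OUTPUT_RESPONSE = 2    # second command: output a response sentence to the speaker


# Hypothetical fixed answering message for unidentified speakers.
DEFAULT_MESSAGE = "The user is currently out."


def handle_speaker(speaker_id, command_table, response_table):
    """Decide what the control unit should do for an identified speaker.

    command_table:  speaker ID -> Command (the correspondence table)
    response_table: speaker ID -> response sentence (second storage unit)
    Returns ("connect", None) or ("output", sentence).
    """
    if speaker_id is None:
        return ("output", DEFAULT_MESSAGE)
    command = command_table[speaker_id]
    if command is Command.CONNECT_TO_MOBILE:
        return ("connect", None)
    return ("output", response_table[speaker_id])
```

Keeping the response sentences in a table separate from the command table mirrors the split between the second storage unit 17 and the correspondence table held by the second acquisition unit 16.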
【００３０】
通信部１９は、制御部２０から送られてきた指示に基づいて、話者とユーザとの間で通話が行えるように、携帯端末装置３との間で、データ通信可能な状態に設定する。
【００３１】
具体的には、通信部１９は、制御部２０から送られてきた指示に基づいて、接続要求信号を、通信ネットワーク４を介して、携帯端末装置３の制御部３０に送る。携帯端末装置３の制御部３０は、出力部３１に、通信端末装置１から接続要求信号が送られた旨を出力させる。そして、携帯端末装置３の通信部２９は、通信ネットワーク４を介して、応答信号を通信端末装置１に送る。これにより、通信端末装置１は、携帯端末装置３との間で、データ通信が可能な状態に設定される。通信端末装置１の制御部２０は、ユーザと通話が可能な状態である旨を出力部１８に出力させる。
【００３２】
そして、話者が入力部１１を用いて、所定の通話情報を入力する。所定の通話情報は、制御部２０及び通信部１９などを介して、携帯端末装置３の出力部３１に出力される。そして、ユーザが入力部３１を用いて、所定の通話情報を入力する。所定の通話情報は、制御部３０及び通信部２９などを介して、通信端末装置１の出力部１８に出力される。これにより、話者とユーザとの間の通話が行えるようになる。
【００３３】
出力部１８は、制御部２０から送られてきた指示に基づいて、送られてきた応答文を出力する。なお、第１取得部１４は、最も高い類似度が、所定値を超えていないと判断した場合、その旨を制御部２０に送る。そして、制御部２０は、所定の応答文（例えば、定型的な留守番メッセージ）を出力部１８に出力させる。
【００３４】
図１に示すように、携帯端末装置３は、通信端末装置１とデータ通信を行う通信部２９と、入出力部３１と、各部を制御する制御部３０とを有する。
【００３５】
（通信方法）
上述した通信システムを用いた通信方法の説明は、以下のとおりである。
【００３６】
（１）第１記憶部１５に、各話者特定情報に対応する第１特徴データを、記憶させる処理
ユーザは、予め、自分が選定した話者に、所定入力情報（例えば、『こんにちは』などの所定の言葉）を示す音声を入力部１１に入力させる。ユーザが選定した話者とは、例えば、ユーザの家族や友達や会社の後輩などである。この際、入力部１１には、所定入力情報である旨と、話者特定情報とが入力される。
【００３７】
制御部２０は、所定入力情報である旨に基づいて、入力された所定入力情報を示す音声信号を音声入力部１２を介して、特徴抽出部１３に送る。また、制御部２０は、入力された話者特定情報を保持する。そして、特徴抽出部１３は、送られてきた所定入力情報を示す音声信号に基づいて、特徴データを抽出する。抽出された特徴データは、制御部２０に送られる。
【００３８】
制御部２０は、抽出された特徴データと、保持していた話者特定情報とを対応づける。そして、制御部２０は、抽出された特徴データと、保持していた話者特定情報とを第１記憶部１５に記憶させる。ここで、第１記憶部１５に記憶されている特徴データは、第１特徴データとして、第１記憶部１５に記憶される。
【００３９】
（２）第２取得部１６が、対応テーブルを保持する処理
ユーザは、入力部１１を用いて、各話者特定情報を入力する。この際、ユーザは、入力部１１を用いて、話者特定情報に対応させて、第１指令を特定する情報、又は、第２指令を特定する情報を入力する。ここでいう話者特定情報は、第１記憶部１５に記憶されている話者特定情報と同じである。
【００４０】
入力された情報は、制御部２０に送られる。制御部２０は、入力された情報に基づいて、対応テーブルを生成する。そして、制御部２０は、生成した対応テーブルを第２取得部１６に送る。第２取得部１６は、送られてきた対応テーブルを保持する。
【００４１】
（３）第２記憶部１７に、各話者特定情報に対応する応答文を、記憶させる処理
ユーザは、入力部１１を用いて、各話者特定情報と、各話者特定情報に対応する応答文とを入力する。この各話者特定情報とは、第２指令と対応づけられている話者特定情報である。そして、入力された情報は、制御部２０に送られる。制御部２０は、各応答文を、話者特定情報と対応づけて第２記憶部１７に記憶させる。
【００４２】
（４）話者（訪問者）が通信端末装置１に入力情報を入力した場合に行われる処理
図４は、本処理を説明するためのフローチャート図である。ここでは、通信端末装置１が、例えば、ユーザの自宅に設置されているとする。そして、ユーザは、自宅から外出しているとする。そして、ユーザは携帯端末装置３を保持しているとする。但し、本発明は、通信端末装置１が、ユーザの自宅に設置されている場合に、限定されない。
【００４３】
先ず、ある話者（訪問者）が、『ユーザに用がある旨』を入力部１１により入力する。入力された情報は、制御部２０に送られる。制御部２０は、例えば、『現在、外出しております。『こんにちは』という言葉を音声入力してください』というメッセージ情報を出力部１８に出力させる。
【００４４】
上記話者は、入力部１１を用いて、所定入力情報を示す音声（『こんにちは』）を入力する（Ｓ１０）。すると、制御部２０は、上記所定入力情報を示す音声信号を音声入力部１２を介して、特徴抽出部１３に送る。
【００４５】
特徴抽出部１３は、所定入力情報を示す音声信号に基づいて、上記所定入力情報を示す音声信号の特徴を示す第２特徴データを抽出する（Ｓ２０）。そして、特徴抽出部１３は、第２特徴データを、第１取得部１４に送る。
【００４６】
第１取得部１４は、特徴抽出部１３から送られた第２特徴データを取得する。そして、第１取得部１４は、第１記憶部１５にアクセスする。そして、第１取得部１４は、上記第２特徴データと、各第１特徴データとの間の類似度を計算する（Ｓ３０）。
【００４７】
そして、第１取得部１４は、計算した各類似度のうち、最も高い類似度が、所定値を超えているか否かを判断する（Ｓ４０）。
【００４８】
第１取得部１４は、最も高い類似度が、所定値を超えていると判断した場合、最も高い類似度に対応する第１特徴データに関連する話者特定情報を取得する（Ｓ５０）。第１取得部１４は、取得した話者特定情報を第２取得部１６に送る。その後、処理は、ステップＳ６０の処理へ移行する。
【００４９】
第１取得部１４は、最も高い類似度が、所定値を超えていないと判断した場合、その旨が制御部２０に送られる。そして、制御部２０は、所定の応答文（例えば、定型的な留守番メッセージ）を出力部１８に出力させる（Ｓ５５）。
【００５０】
第２取得部１６は、上記対応テーブルを参照して、第１取得部１４により取得された話者特定情報に対応する指令を取得する（Ｓ６０）。そして、第２取得部１６は、取得した指令を制御部２０に送る。この際、話者特定情報も制御部２０に送られる。
【００５１】
制御部２０は、送られてきた指令が第１指令であるか第２指令であるか判断する（Ｓ７０）。制御部２０は、送られてきた指令が第１指令と判断した場合、制御部２０は、以下の処理を行う。制御部２０は、『通信部１９に対して、携帯端末装置３との間で、通信可能な状態に設定するように指示する旨』を通信部１９に送る（Ｓ８０）。その後、処理は、Ｓ１００の処理へ移行する。
【００５２】
一方、制御部２０は、送られてきた指令が第２指令と判断した場合、制御部２０は、以下の処理を行う。制御部２０は、第２記憶部１７にアクセスする。そして、制御部２０は、話者特定情報に対応する応答文を取得する（Ｓ８５）。そして、制御部２０は、取得した応答文と、『話者に対して上記応答文を出力するように指示する旨』と、を出力部１８に送る（Ｓ９０）。その後、処理は、Ｓ１２０の処理へ移行する。
【００５３】
通信部１９は、制御部２０から送られてきた指示に基づいて、話者とユーザとの間で通話が行えるように、携帯端末装置３との間で、データ通信可能な状態に設定する（Ｓ１００）。通信部１９は、携帯端末装置３との間で、データ通信可能な状態に設定した場合、その旨を制御部２０に送る。制御部２０は、ユーザと通話が可能な状態である旨を出力部１８に出力させる。
【００５４】
そして、ユーザと話者との間で、通話が行われる（Ｓ１１０）。この処理の具体的な説明は、以下のとおりである。話者が入力部１１を用いて、所定の通話情報を入力する。所定の通話情報は、制御部２０及び通信部１９などを介して、携帯端末装置３の出力部３１に出力される。
【００５５】
そして、ユーザが入力部３１を用いて、所定の通話情報を入力する。所定の通話情報は、制御部３０及び通信部２９などを介して、通信端末装置１の出力部１８に出力される。これにより、話者とユーザとの間の通話が行える。
【００５６】
ステップＳ１２０では、出力部１８は、制御部２０から送られてきた指示に基づいて、応答文を出力する。
【００５７】
（作用効果）
本実施の形態によれば、特徴抽出部１３は、話者（例えば、訪問者）により入力された所定入力情報に基づいて、所定入力情報の音声信号の特徴を示す第２特徴データを抽出する。そして、第１取得部１４は、第２特徴データと各第１特徴データとの間の類似度を計算する。そして、第１取得部１４は、計算された各類似度のうち、最も高い類似度が所定値を超える場合に、最も高い類似度に対応する第１特徴データに関連する話者特定情報を取得することができる。
【００５８】
このため、例えば、ユーザが自宅から外出している（所定場所から離れている）ときに、上記自宅に訪問してくる訪問者が、所定の話者である場合には、本通信システムは、上記訪問者がだれであるのか正確に特定できる。この所定の話者とは、第１記憶部１５に記憶されている話者特定情報に対応する話者である。
【００５９】
また、第２取得部１６は、対応データを参照して、第１取得部１４により取得された話者特定情報に対応する指令を取得することができる。そして、取得された指令が第１指令の場合、通信部１９は、話者とユーザとの間で通話が行えるように、携帯端末装置３との間で、データ通信可能な状態に設定する。また、取得された指令が第２指令の場合、出力部１８は、話者に対して、所定の応答文を出力する。
【００６０】
このため、本通信システムは、例えば、ユーザが自宅から外出しているときに、自宅に訪問してくる話者（訪問者）に応じて、携帯端末装置３と通信端末装置１とが通信できる状態に設定する処理と、所定の応答文を上記話者に出力する処理と、のうち、いずれかの処理を行うことができる。
【００６１】
この結果、例えば、親しい友人が自宅に訪問してきたときには、ユーザは、外出先から上記友人と通話を行うことができる。また、例えば、それほど親しくない友人が自宅に訪問してきたときには、通信端末装置１が上記友人に対して、所定の応答文を出力することができる。従って、ユーザにとって便利な通信システムの実現が可能となる。
【００６２】
また、訪問者が、第１記憶部１５に記憶された話者特定情報に対応する話者でない場合、例えば、定型的なメッセージ情報が上記訪問者に出力される。このため、第２記憶部１７の各応答文に、ユーザが第三者には知られたくない情報が含まれているような場合、これらの情報が、第三者に知られてしまうことがない。ここでいう第三者とは、第１記憶部１５に記憶された話者特定情報に対応する話者以外の者である。
【００６３】
また、本実施の形態では、第２記憶部１７には、各話者特定情報が記憶されている。そして、各話者特定情報には、それぞれ、応答文が対応づけられている。そして、出力部１８は、例えば、ユーザの自宅に訪問してくる訪問者に対して、上記訪問者に対応する応答文を出力することができる。
【００６４】
このため、例えば、ユーザの会社の上司や先輩が自宅に訪問してきたときには、出力部１８は、上司や先輩用の応答文を、上司や先輩に対して出力することができる。また、例えば、ユーザの部下や後輩が自宅に訪問してきたときには、出力部１８は、部下や後輩用の応答文を、部下や後輩に対して出力することができる。従って、ユーザにとって一層便利な通信システムの実現が可能となる。
【００６５】
（変形例１）
（通信システムの構成）
本実施の形態の通信システムの第１の変形例は、以下のとおりである。図５は、本変形例の通信システムの構成を示す図である。通信システムは、所定の場所に配置された第１通信端末装置１を有する。また、通信システムは、第１通信端末装置１のユーザ（以下、第１ユーザという）が携帯する携帯端末装置３を有する。この第１ユーザは、上記所定の場所から離れている。そして、通信システムは、複数の第２通信端末装置５を有する。通信ネットワーク４（例えば、電話網、インターネット）には、各装置が接続されている。
【００６６】
なお、本変形例の通信システムにおいて、実施の形態の通信システムで示した構成と同一構成の説明は、省略される。また、本変形例の通信システムにおいて、実施の形態の通信システムで示した構成と同一又は類似の構成には、同一符号が付される。
【００６７】
具体的には、例えば、第１通信端末装置１は、第１ユーザの自宅（所定の場所）に配置されている。そして、第１ユーザは、自宅から外出している。この際、第１ユーザは、携帯端末装置３を保持している。そして、第２通信端末装置５のユーザを第２ユーザという。
【００６８】
本変形例の第１通信端末装置１において、実施の形態の通信端末装置１と異なる点は、以下のとおりである。
【００６９】
第１記憶部１５には、ユーザ特定情報が複数記憶されている。このユーザ特定情報とは、第２ユーザを特定する情報である。各ユーザ特定情報は、第２ユーザにより予め入力された所定入力情報の音声信号の特徴を示す第１特徴データと対応づけられている。
【００７０】
また、第２記憶部１７には、ユーザ特定情報が複数記憶されている。各ユーザ特定情報には、それぞれ、応答文が対応づけられている。
【００７１】
特徴抽出部１３は、第２通信端末装置５から送られた第２ユーザの所定入力情報に基づいて、所定入力情報の音声信号の特徴を示す第２特徴データを抽出する。
【００７２】
第１取得部１４は、第２特徴データと、各第１特徴データとの間の類似度を計算し、計算された各類似度のうち、最も高い類似度が所定値を超える場合に、最も高い類似度に対応する第１特徴データに関連するユーザ特定情報を取得する。
【００７３】
第２取得部１６は、各ユーザ特定情報と、第１指令又は第２指令と、が対応づけられている対応テーブルを参照して、第１取得部１４により取得されたユーザ特定情報に対応する指令を取得する。第１指令とは、第２通信端末装置５が携帯端末装置３との間で通信可能な状態に設定するように指示する指令である。第２指令とは、第２ユーザに対して応答文を出力するように指示する指令である。第２取得部１６は、取得した指令と、第１取得部１４により取得されたユーザ特定情報と、を制御部２０に送る。
【００７４】
制御部２０は、第２取得部１６により取得された指令が第１指令の場合には、以下の処理を行う。制御部２０は、通信部１９に、上記第１指令を、第２通信端末装置５に送信するように指示する。一方、制御部２０は、第２取得部１６により取得された指令が第２指令の場合には、以下の処理を行う。制御部２０は、第２記憶部１７にアクセスする。そして、制御部２０は、ユーザ特定情報に対応する応答文を取得する。そして、制御部２０は、通信部１９に、『上記第２指令と、取得した応答文とを第２通信端末装置５に送信するように指示する旨』を送る。
【００７５】
第２通信端末装置５は、通信部４９と、入出力部５１と、制御部５０とを有する。第２通信端末装置５の制御部５０は、第１指令を取得した場合、以下の処理を行う。制御部５０は、通信部４９に対して、『第１ユーザと第２ユーザとの間で通話が行えるように、携帯端末装置３との間でデータ通信可能な状態に設定するように指示する旨』を送る。また、第２通信端末装置５の制御部５０は、第２指令及び応答文を取得した場合、出力部５１に、上記応答文を出力させる。
【００７６】
なお、第１通信端末装置１の第１取得部１４は、最も高い類似度が、所定値を超えていないと判断した場合、その旨を制御部２０に送る。そして、制御部２０は、所定の応答文（例えば、定型的な留守番メッセージ）と、上記所定の応答文を出力するように指示する旨とを、通信部１９を介して、第２通信端末装置５に送る。第２通信端末装置５の制御部５０は、上記指示及び応答文を取得した場合、出力部５１に、上記所定の応答文を出力させる。
【００７７】
（通信方法）
上述した通信システムを用いた通信方法の説明は、以下のとおりである。本変形例においても、実施の形態の（１）、（２）、（３）と同じ処理が行われる。
【００７８】
但し、実施の形態の（１）、（２）、（３）の説明においては、ユーザは、第１ユーザと置き換えられる。また、話者は、第２ユーザに置き換えられる。また、話者特定情報は、ユーザ特定情報に置き換えられる。また、第１指令は、本変形例の第１指令であり、第２指令は、本変形例の第２指令である。
【００７９】
図６は、第２ユーザが第２通信端末装置５に入力情報を入力した場合に行われる処理を説明するためのフローチャート図である。ここでは、一例として、第１通信端末装置１は、第１ユーザの自宅に配置されているとする。そして、第１ユーザは、自宅から外出しているとする。そして、第１ユーザは、携帯端末装置３を保持しているとする。
【００８０】
また、本処理では、一例として、各端末装置は、電話装置である場合の説明が行われる。本処理では、第１通信端末装置１を第１電話装置１といい、第２通信端末装置５を第２電話装置５といい、携帯端末装置３を携帯電話装置という。但し、各端末装置が電話装置以外の場合にも、本変形例は適用できる。
【００８１】
先ず、第１ユーザは、自宅から外出する際、第１電話装置１の入力部１１を用いて、以下のような情報を入力する。第１ユーザは、『第２電話装置５から呼び出し信号を受信した場合、所定のメッセージ情報を上記第２電話装置５に送信するように指示する旨』を入力部１１により入力する。すると、上記指示する旨は制御部２０に送られる。制御部２０は、上記指示を実行できるように、各部に対して、それぞれ、所定の指示を送る。
【００８２】
そして、ある第２ユーザは、自己の第２電話装置５の入力部５１を用いて、第１電話装置１に割り当てられた電話番号を入力する。上記電話番号は、制御部５０に送られる。制御部５０は、上記電話番号に基づいて、呼び出し信号を第１電話装置１に送る。
【００８３】
上記呼び出し信号は、第１電話装置１の通信部１９を介して、第１電話装置１の制御部２０に送られる。制御部２０は、例えば、『現在、外出しております。『こんにちは』という言葉を音声入力してください』というメッセージ情報を、通信部１９を介して、第２電話装置５に送る。
【００８４】
上記メッセージ情報は、第２電話装置５の通信部４９と、制御部５０とを介して、出力部５１に送られる。出力部５１は、上記メッセージ情報を出力する。
【００８５】
第２ユーザは、入力部５１を用いて、所定入力情報を示す音声（例えば、『こんにちは』）を入力する（Ｓ２００）。すると、制御部５０は、上記所定入力情報を示す音声信号を、通信部４９などを介して、第１電話装置１の制御部２０に送る。制御部２０は、上記所定入力情報を示す音声信号を、音声入力部１２を介して、特徴抽出部１３に送る。
【００８６】
そして、ステップＳ２０、Ｓ３０、Ｓ４０の処理が行われる。そして、第１取得部１４は、最も高い類似度が、所定値を超えていると判断した場合、最も高い類似度に対応する第１特徴データに関連するユーザ特定情報を取得する（Ｓ２１０）。第１取得部１４は、取得したユーザ特定情報を第２取得部１６に送る。その後、処理は、ステップＳ２４０の処理へ移行する。
【００８７】
第１取得部１４は、最も高い類似度が、所定値を超えていないと判断した場合、その旨が制御部２０に送られる。制御部２０は、所定の応答文を、通信部１９を介して、第２電話装置５に送る（Ｓ２２０）。そして、第２電話装置５の制御部５０に所定の応答文が送られる。制御部５０は、出力部５１に、所定の応答文（例えば、定型的な留守番メッセージ）を出力させる（Ｓ２３０）。
【００８８】
第２取得部１６は、上記対応テーブルを参照して、第１取得部１４により取得されたユーザ特定情報に対応する指令を取得する（Ｓ２４０）。そして、第２取得部１６は、取得した指令を制御部２０に送る。この際、ユーザ特定情報も制御部２０に送られる。
【００８９】
制御部２０は、送られてきた指令が第１指令であるか第２指令であるか判断する（Ｓ２５０）。制御部２０は、送られてきた指令が第１指令と判断した場合、制御部２０は、第１指令を通信部１９に送る。通信部１９は、上記第１指令を第２電話装置５に送る（Ｓ２６０）。その後、処理は、Ｓ２８０の処理へ移行する。
【００９０】
一方、制御部２０は、送られてきた指令が第２指令と判断した場合、制御部２０は、以下の処理を行う。制御部２０は、第２記憶部１７にアクセスする。そして、制御部２０は、ユーザ特定情報に対応する応答文を取得する（Ｓ２６５）。そして、制御部２０は、第２指令と、上記取得した応答文とを通信部１９に送る。通信部１９は、第２指令と、上記応答文とを第２電話装置５に送る（Ｓ２７０）。その後、処理は、Ｓ３００の処理へ移行する。
【００９１】
第２電話装置５の制御部５０に、第１指令が送られてきた場合、以下の処理が行われる。制御部５０は、第１ユーザと第２ユーザとの間で通話が行えるように、携帯電話装置３との間で、データ通信可能な状態に設定するように、通信部４９に指示する（Ｓ２８０）。通信部４９は、制御部５０から送られてきた指示に基づいて、第１ユーザと第２ユーザとの間で通話が行えるように、携帯電話装置３との間で、データ通信可能な状態に設定する（Ｓ２９０）。
【００９２】
具体的には、通信部４９は、制御部５０から送られてきた指示に基づいて、接続要求信号を、携帯電話装置３の制御部３０に送る。携帯電話装置３の制御部３０は、出力部３１に、第２電話装置５から接続要求信号が送られた旨を出力させる。そして、携帯電話装置３の通信部２９は、第２電話装置５に対して、応答信号を送る。これにより、第２電話装置５は、携帯電話装置３との間で、データ通信が可能な状態に設定される。通信部４９は、携帯電話装置３との間で、データ通信可能な状態に設定した場合、その旨を制御部５０に送る。制御部５０は、第１ユーザと通話が可能な状態である旨を出力部５１に出力させる。
【００９３】
そして、第１ユーザと第２ユーザとの間で、通話が行われる（Ｓ２９５）。この処理の具体的な説明は、以下のとおりである。第２ユーザが入力部５１を用いて、所定の通話情報を入力する。所定の通話情報は、制御部５０、通信部４９などを介して、携帯電話装置３の出力部３１に出力される。第１ユーザが入力部３１を用いて、所定の通話情報を入力する。所定の通話情報は、制御部３０、通信部２９などを介して、第２電話装置５の出力部５１に出力される。これにより、第１ユーザと第２ユーザとの間の通話が行われる。
【００９４】
ステップＳ３００では、第２電話装置５の制御部５０に、第２指令及び応答文が送られる。制御部５０の指示により、出力部５１は、上記応答文を出力する。
【００９５】
（作用効果）
本変形例の通信システムも、例えば、第１ユーザが自宅から外出している（所定場所から離れている）ときに、上記自宅に電話をかけてくる第２ユーザが、所定の話者である場合には、上記第２ユーザがだれであるのか正確に特定できる。
【００９６】
また、本通信システムは、例えば、第１ユーザが自宅から外出しているときに、自宅に電話をかけてくる第２ユーザに応じて、携帯電話装置３（携帯端末装置３）と第２電話装置５（第２通信端末装置５）とが通信できる状態に設定する処理と、所定の応答文を上記第２ユーザに出力する処理と、のうち、いずれかの処理を行うことができる。
【００９７】
この結果、例えば、親しい友人の第２電話装置５から、自宅の第１電話装置１に呼び出し信号が送られたようなときには、第１ユーザは、外出先から上記友人と通話を行うことができる。また、例えば、それほど親しくない友人の第２電話装置５から、自宅の第１電話装置１に呼び出し信号が送られたようなときには、自宅にある第１電話装置１が上記友人の第２電話装置５に所定の応答文を送ることができる。そして、上記第２電話装置５において、上記友人に対して、所定の応答文が出力される。従って、第１ユーザにとって便利な通信システムの実現が可能となる。
【００９８】
また、第２ユーザが、第１記憶部１５に記憶されたユーザ特定情報に対応するユーザでない場合、例えば、定型的なメッセージ情報が上記第２ユーザに出力される。このため、第２記憶部１７の各応答文に、第１ユーザが第三者には知られたくない情報が含まれているような場合、これらの情報が、第三者に知られてしまうことがない。
【００９９】
また、本変形例では、第２記憶部１７には、各ユーザ特定情報が記憶されている。そして、各ユーザ特定情報には、それぞれ、応答文が対応づけられている。そして、例えば、第１電話装置１に呼び出し信号を送った第２電話装置５の第２ユーザに対して、上記第２ユーザに対応する応答文が出力される。
【０１００】
このため、例えば、第１ユーザの会社の上司（又は先輩）の第２電話装置５から呼び出し信号が第１電話装置１に送られたような場合、上記第２電話装置５の出力部５１は、上司（又は先輩）用の応答文を、上司（又は先輩）に対して出力することができる。また、例えば、第１ユーザの会社の部下（又は後輩）の第２電話装置５から呼び出し信号が第１電話装置１に送られたような場合、上記第２電話装置５の出力部５１は、部下（又は後輩）用の応答文を、部下（又は後輩）に対して出力することができる。従って、第１ユーザにとって一層便利な通信システムの実現が可能となる。
【０１０１】
（変形例２）
また、特徴抽出部１３が特徴データを抽出できなかった場合、その旨が制御部２０に送られるようにしてもよい。そして、制御部２０は、所定の応答文（例えば、定型的な留守番メッセージ）を出力部１８に出力させるようにしてもよい。
【０１０２】
また、通信システムは、話者モデル生成部（図示せず）を有するようにしてもよい。そして、話者モデル生成部は、特徴抽出部１３により、抽出された第２特徴データに基づいて、第２話者モデルを生成するようにしてもよい。この際、話者モデル生成部は、隠れマルコフモデル法に基づいて、話者モデルを生成してもよい。
【０１０３】
そして、第１記憶部１５においては、各話者特定情報は、第１話者モデルと対応づけられている。そして、第１取得部１４は、第２話者モデルと、各第１話者モデルとの間の類似度を計算するようにしてもよい。第１取得部１４は、例えば、『確率モデルによる音声認識』（中川著、昭和６３年、電子通信学会発行、コロナ社）に記載のビタビアルゴリズムに基づいて、シンボル生起確率を類似度として、計算することができる。
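A Viterbi-based score of the kind mentioned in this paragraph can be sketched for a discrete hidden Markov model as follows. This is a generic textbook formulation, not taken from the cited reference: it assumes the feature vectors have already been quantised into integer symbols, and the parameter layout (`initial`, `transition`, `emission` as nested lists) is a choice made for this example.

```python
def viterbi_score(observations, initial, transition, emission):
    """Probability of the single most likely state path.

    observations: list of integer symbol indices
    initial[s]:         probability of starting in state s
    transition[p][s]:   probability of moving from state p to state s
    emission[s][o]:     probability of state s emitting symbol o
    """
    if not observations:
        raise ValueError("observations must not be empty")
    n_states = len(initial)
    # Best path probability ending in each state after the first symbol.
    best = [initial[s] * emission[s][observations[0]] for s in range(n_states)]
    for symbol in observations[1:]:
        best = [
            max(best[prev] * transition[prev][s] for prev in range(n_states))
            * emission[s][symbol]
            for s in range(n_states)
        ]
    return max(best)
```

Scoring the observed symbol sequence against each registered speaker's model and taking the highest score yields a similarity that can be compared with the predetermined value in the same way as before. For long sequences, log probabilities would normally be used to avoid numerical underflow.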
【０１０４】
なお、本変形例で説明した話者特定情報を、ユーザ特定情報に置き換えることで、変形例１の通信システムは、本変形例のように変形されることができる。
【０１０５】
（変形例３）
第２記憶部１７の記憶内容は、以下のように変形されてもよい。第２指令と対応づけられた話者特定情報には、ランク情報が関連づけられている。そして、ランク情報には、それぞれ応答文が対応づけられている。そして、第２記憶部１７には、話者特定情報が複数記憶されている。図７は、変形例３の第２記憶部１７の記憶内容の一例を示す図である。
【０１０６】
そして、ステップＳ９０において、制御部２０は、第２取得部１６により取得された指令が第２指令の場合には、第２記憶部１７にアクセスした後、以下の処理を行うようにしてもよい。制御部２０は、話者特定情報に対応するランク情報を取得し、取得したランク情報に対応する応答文を取得するようにしてもよい。
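The two-step lookup of this modification (speaker specifying information to rank information, then rank information to response sentence) can be sketched as below. The table names and the example ranks in the usage note are hypothetical.

```python
def response_for_speaker(speaker_id, rank_by_speaker, response_by_rank):
    """Look up the response sentence via the speaker's rank.

    rank_by_speaker:  speaker ID -> rank information
    response_by_rank: rank information -> response sentence
    """
    rank = rank_by_speaker[speaker_id]
    return response_by_rank[rank]
```

For example, with `rank_by_speaker = {"boss": "A", "junior": "B"}` and one sentence stored per rank, several speakers of the same rank share a single response sentence, so the user only has to maintain one sentence per rank rather than one per speaker.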
【０１０７】
なお、本変形例で説明した話者特定情報を、ユーザ特定情報に置き換えることで、変形例１の通信システムは、本変形例のように変形されることができる。
【０１０８】
また、上述した実施の形態や、各変形例において、音声入力部１２は、ノイズキャンセル機能を有するようにしてもよい。
【０１０９】
【発明の効果】
本発明は、例えば、ユーザが自宅から外出しているときに、自宅に訪問してくる話者に応じて、携帯端末装置３と通信端末装置１とが通信できる状態に設定する処理と、所定の応答文を上記話者に出力する処理と、のうち、いずれかの処理を行うことができる。
【０１１０】
この結果、例えば、ユーザは、親しい友人が自宅に訪問してきたときには、ユーザは、外出先から上記友人と通話を行うことができる。また、例えば、ユーザは、それほど親しくない友人が自宅に訪問してきたときには、通信端末装置１が上記友人に対して、所定の応答文を出力することができる。従って、ユーザにとって便利な通信システムの実現が可能となる。
【図面の簡単な説明】
【図１】本実施の形態の通信システムの構成を示す図である。
【図２】本実施の形態の第１記憶部の記憶内容の一例を示す図である。
【図３】本実施の形態の対応テーブルの一例を示す図である。
【図４】本実施の形態の通信方法を説明するためのフローチャート図である。
【図５】変形例１の通信システムの構成を示す図である。
【図６】変形例１の通信方法を説明するためのフローチャート図である。
【図７】変形例３の第２記憶部の記憶内容の一例を示す図である。
【符号の説明】
１…第１通信端末装置、第１電話装置、通信端末装置、３…携帯端末装置、携帯電話装置、４…通信ネットワーク、５…第２通信端末装置、第２電話装置、１１…入力部、１２…音声入力部、１３…特徴抽出部、１４…第１取得部、１５…第１記憶部、１６…第２取得部、１７…第２記憶部、１８…出力部、１９…通信部、２０…制御部、２９…通信部、３０…制御部、３１…入出力部、４９…通信部、５０…制御部、５１…入出力部。

[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a communication system, a communication terminal device, and a communication method that perform predetermined processing when a user is absent.
[0002]
[Prior art]
Conventionally, for example, when the user is not at home and the call signal is sent from the other party's telephone device to the user's telephone device, the following processing is performed. The user's telephone device set to the answering machine mode automatically goes off-hook. Then, the telephone device transmits a predetermined message to the other telephone device.
[0003]
Conventionally, there are the following techniques. The telephone device set to the answering machine mode identifies the speaker based on a predetermined keyword voice signal sent from the other telephone device. Then, the telephone device set to the answering machine mode sends a response message corresponding to the speaker to the telephone device of the other party (see, for example, Patent Document 1).
[0004]
In addition, when a visitor comes to the home while the user is not at home, a predetermined device performs the following processing upon detecting the visitor. The predetermined device acquires a message indicating that the user is absent and outputs it to the visitor.
[0005]
[Patent Document 1]
Japanese Patent Laid-Open No. 9-312686
[0006]
[Problems to be solved by the invention]
The prior art described above has the following problems. For example, when a call signal is sent from another party's telephone device to the home telephone device while the user is not at home, the user may, depending on the other party, want to talk with that party directly. On the other hand, depending on the other party, the user may consider it sufficient for a predetermined response sentence to be output from the other party's telephone device, without talking directly.
[0007]
Similarly, when a visitor comes to the home while the user is not at home, the user may, depending on the visitor, want to talk with the visitor directly. On the other hand, depending on the visitor, the user may consider it sufficient for a predetermined response sentence to be output to the visitor, without talking directly.
[0008]
For this reason, it has been desired to develop a communication system in which, for example when the user is not at home, either the user can talk with a visitor or a calling party (hereinafter referred to as a visitor or the like) or a response sentence is output to the visitor or the like, depending on who the visitor or the like is.
[0009]
An object of the present invention is to provide a communication system, a communication terminal device, and a communication method in which, for example when the user is not at home, either the user can talk with a visitor or the like or a response sentence is output to the visitor or the like, depending on who the visitor or the like is.
[0010]
[Means for Solving the Problems]
  In order to solve the above problems, the present invention is a communication system having a communication terminal arranged at a predetermined location and a portable terminal carried by a user of the communication terminal who is away from the predetermined location,
  wherein the communication terminal has:
  first storage means for storing in advance a plurality of pieces of speaker specifying information, each being information that specifies a speaker selected in advance by the user and each associated with first feature data indicating features of a voice signal of predetermined input information input in advance by that speaker;
  feature extraction means for extracting, based on predetermined input information input by a speaker, second feature data indicating features of a voice signal of the predetermined input information;
  first acquisition means for calculating a similarity between the second feature data and each piece of first feature data and, when the highest of the calculated similarities exceeds a predetermined value, acquiring the speaker specifying information related to the first feature data corresponding to the highest similarity;
  second acquisition means for acquiring a command corresponding to the speaker specifying information acquired by the first acquisition means, by referring to correspondence data in which each piece of speaker specifying information is associated with either a first command, which instructs that a communicable state be set up with the portable terminal, or a second command, which instructs that a response sentence be output to the speaker;
  communication means for setting up, when the command acquired by the second acquisition means is the first command, a state in which data communication with the portable terminal is possible so that the speaker and the user can talk; and
  output means for outputting the response sentence to the speaker when the command acquired by the second acquisition means is the second command,
  wherein the predetermined value in the first acquisition means is a certain value obtained by an experiment that tests, for a specific word, whether the similarity between the first feature data and the second feature data calculated by the first acquisition means is equal to or greater than the certain value, namely the certain value at which, as a result of the experiment, it was judged that the speaker could be specified when the similarity was equal to or greater than that value,
  the communication terminal further has means for outputting message information that prompts voice input of the predetermined input information, which is a specific word set in advance, and
  means for sending a voice signal indicating the acquired predetermined input information to the feature extraction means via voice input means,
  the feature extraction means detects a voice section of the voice signal indicating the predetermined input information sent to it and extracts the second feature data within the detected voice section, and
  control means outputs a predetermined response sentence when the feature extraction means has failed to extract the second feature data.
[0011]
  Further, the present invention is a communication terminal device that is arranged at a predetermined location and is owned by a user who is away from the predetermined location, the communication terminal device having:
  first storage means for storing in advance a plurality of pieces of speaker specifying information, each being information that specifies a speaker selected in advance by the user and each associated with first feature data indicating features of a voice signal of predetermined input information input in advance by that speaker;
  feature extraction means for extracting, based on predetermined input information input by a speaker, second feature data indicating features of a voice signal of the predetermined input information;
  first acquisition means for calculating a similarity between the second feature data and each piece of first feature data and, when the highest of the calculated similarities exceeds a predetermined value, acquiring the speaker specifying information related to the first feature data corresponding to the highest similarity;
  second acquisition means for acquiring a command corresponding to the speaker specifying information acquired by the first acquisition means, by referring to correspondence data in which each piece of speaker specifying information is associated with either a first command, which instructs communication with a portable terminal carried by the user, or a second command, which instructs that a response sentence be output to the speaker;
  communication means for setting up, when the command acquired by the second acquisition means is the first command, a state in which data communication with the portable terminal is possible so that the speaker and the user can talk; and
  output means for outputting the response sentence to the speaker when the command acquired by the second acquisition means is the second command,
  wherein the predetermined value in the first acquisition means is a certain value obtained by an experiment that tests, for a specific word, whether the similarity between the first feature data and the second feature data calculated by the first acquisition means is equal to or greater than the certain value, namely the certain value at which, as a result of the experiment, it was judged that the speaker could be specified when the similarity was equal to or greater than that value,
  the communication terminal device further has means for outputting message information that prompts voice input of the predetermined input information, which is a specific word set in advance, and
  means for sending a voice signal indicating the acquired predetermined input information to the feature extraction means via voice input means,
  the feature extraction means detects a voice section of the voice signal indicating the predetermined input information sent to it and extracts the second feature data within the detected voice section, and
  control means outputs a predetermined response sentence when the feature extraction means has failed to extract the second feature data.
[0012]
  Further, the present invention is a communication method using a communication terminal arranged at a predetermined location and a portable terminal carried by a user of the communication terminal who is away from the predetermined location,
  wherein a plurality of pieces of speaker specifying information, each being information that specifies a speaker selected in advance by the user and each associated with first feature data indicating features of a voice signal of predetermined input information input in advance by that speaker, are stored in advance in first storage means, the method having:
  a step of extracting, based on predetermined input information input by a speaker, second feature data indicating features of a voice signal of the predetermined input information;
  a first acquisition step of calculating a similarity between the second feature data and each piece of first feature data and, when the highest of the calculated similarities exceeds a predetermined value, acquiring the speaker specifying information related to the first feature data corresponding to the highest similarity;
  a second acquisition step of acquiring a command corresponding to the speaker specifying information acquired in the first acquisition step, by referring to correspondence data in which each piece of speaker specifying information is associated with either a first command, which instructs that a communicable state be set up with the portable terminal, or a second command, which instructs that a response sentence be output to the speaker;
  a step in which, when the command acquired in the second acquisition step is the first command, the communication terminal sets up a state in which data communication with the portable terminal is possible so that the speaker and the user can talk; and
  a step in which, when the command acquired in the second acquisition step is the second command, the communication terminal outputs the response sentence to the speaker,
  wherein the predetermined value in the first acquisition means is a certain value obtained by an experiment that tests, for a specific word, whether the similarity between the first feature data and the second feature data calculated by the first acquisition means is equal to or greater than the certain value, namely the certain value at which, as a result of the experiment, it was judged that the speaker could be specified when the similarity was equal to or greater than that value,
  the communication method further has a step of outputting message information that prompts voice input of the predetermined input information, which is a specific word set in advance, and
  a step of sending a voice signal indicating the acquired predetermined input information to feature extraction means via voice input means,
  the feature extraction means detects a voice section of the voice signal indicating the predetermined input information sent to it and extracts the second feature data within the detected voice section, and
  control means outputs a predetermined response sentence when the feature extraction means has failed to extract the second feature data.
[0014]
DETAILED DESCRIPTION OF THE INVENTION
(Constitution)
FIG. 1 is a diagram illustrating a configuration of a communication system according to the present embodiment. The communication system includes a communication terminal device 1 arranged at a predetermined location. Further, the communication system includes a mobile terminal device 3 that is carried by a user of the communication terminal device 1. Each device is connected to a communication network 4 (for example, a telephone network, the Internet). The user of the communication terminal device 1 is away from the predetermined location. Specifically, for example, the communication terminal device 1 is disposed at the user's home. And the user is going out from home. At this time, the user holds the mobile terminal device 3.
[0015]
(Configuration of communication terminal device)
The communication terminal device 1 is disposed at a user's home, for example. As illustrated in FIG. 1, the communication terminal device 1 includes an input unit 11, a voice input unit 12, a feature extraction unit 13, a first acquisition unit 14, a first storage unit 15, a second acquisition unit 16, a second storage unit 17, an output unit 18, a communication unit 19, and a control unit 20 that controls each unit.
[0016]
Various input information is input to the input unit 11 by, for example, a visitor visiting the user's home. The input unit 11 is, for example, a microphone or a keyboard.
[0017]
The input information may be character information input via a keyboard or the like. The input information may be voice information uttered by a visitor. Hereinafter, in this embodiment, a visitor is referred to as a speaker.
[0018]
The input information input via the input unit 11 is sent to the control unit 20. The control unit 20 identifies, among the input information sent to it, the predetermined input information that was input by voice. Then, the control unit 20 sends the voice signal of the predetermined input information to the voice input unit 12.
[0019]
The voice input unit 12 receives the voice signal of the predetermined input information. The voice signal indicating the predetermined input information is, for example, a voice signal indicating "Hello". The voice input unit 12 performs A/D conversion processing on the voice signal indicating the predetermined input information. Then, the voice signal subjected to the A/D conversion processing is sent to the feature extraction unit 13.
[0020]
The feature extraction unit 13 detects a voice section of the voice signal indicating the predetermined input information that has been sent. Specifically, the feature extraction unit 13 detects the voice section based on the power of the voice signal indicating the predetermined input information. The feature extraction unit 13 then performs a short-time spectrum analysis on the voice signal within the detected voice section. Then, the feature extraction unit 13 extracts a plurality of feature vectors based on the result of the short-time spectrum analysis. Then, the feature extraction unit 13 extracts feature data based on the extracted feature vectors. This feature data is data indicating the characteristics of the voice signal indicating the predetermined input information. Then, the feature extraction unit 13 sends the extracted feature data to the first acquisition unit 14 as second feature data.
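The power-based detection of the voice section described above might be sketched as follows. This is only an illustration under stated assumptions: the function names, the non-overlapping frames, the frame length, and the power threshold are all hypothetical and are not specified in this description.

```python
def frame_powers(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        powers.append(sum(x * x for x in frame) / frame_len)
    return powers


def detect_voice_section(samples, frame_len=4, power_threshold=0.01):
    """Return (start, end) sample indices of the voice section, or None.

    The section runs from the first frame whose power exceeds the
    threshold to the end of the last such frame. frame_len and
    power_threshold are illustrative values (assumptions).
    """
    powers = frame_powers(samples, frame_len)
    voiced = [i for i, p in enumerate(powers) if p > power_threshold]
    if not voiced:
        return None
    return voiced[0] * frame_len, (voiced[-1] + 1) * frame_len


# Toy signal: silence, a short burst of sound, then silence again.
signal = [0.0] * 8 + [0.5, -0.5, 0.4, -0.4, 0.3, -0.3, 0.5, -0.5] + [0.0] * 8
section = detect_voice_section(signal)
```

Returning None when no frame exceeds the threshold corresponds to the case in which no voice section, and therefore no second feature data, can be obtained.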
[0021]
Note that the feature extraction unit 13 may perform the following processing. The feature extraction unit 13 performs a Fourier transform process on the audio signal indicating the predetermined input information. The feature extraction unit 13 takes the logarithm of the absolute value of the spectrum subjected to Fourier transform processing, performs inverse Fourier transform processing, and obtains a cepstrum coefficient. The feature extraction unit 13 may extract feature data based on the cepstrum coefficient.
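The cepstrum computation described above (Fourier transform, logarithm of the absolute value of the spectrum, inverse Fourier transform) can be sketched as below. This is a minimal illustration using a direct discrete Fourier transform; the function names and the small floor applied before taking the logarithm are assumptions, and a practical device would more likely use a fast Fourier transform on windowed frames.

```python
import cmath
import math


def dft(values):
    """Direct discrete Fourier transform (O(n^2), for illustration only)."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft(values):
    """Direct inverse discrete Fourier transform."""
    n = len(values)
    return [
        sum(values[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def real_cepstrum(frame, floor=1e-12):
    """Cepstrum coefficients: inverse DFT of the log magnitude spectrum.

    floor (an assumption) avoids taking log(0) for empty spectral bins.
    """
    spectrum = dft(frame)
    log_magnitude = [math.log(max(abs(c), floor)) for c in spectrum]
    return [c.real for c in inverse_dft(log_magnitude)]


# An impulse of height 2 has a flat spectrum of magnitude 2, so only the
# zeroth cepstrum coefficient is non-zero (it equals log 2).
coefficients = real_cepstrum([2.0, 0.0, 0.0, 0.0])
```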
[0022]
Further, speaker specifying information, which is information for specifying a speaker selected in advance by the user, is associated with feature data (hereinafter referred to as first feature data) indicating the characteristics of the voice signal of the predetermined input information input in advance by that speaker. The first storage unit 15 stores a plurality of pieces of the speaker specifying information in advance. FIG. 2 is a diagram showing an example of the contents stored in the first storage unit 15. As shown in FIG. 2, first feature data is associated with each piece of speaker specifying information. In addition, a response sentence is associated with each piece of speaker specifying information, and the second storage unit 17 stores a plurality of pieces of the speaker specifying information in advance together with these response sentences.
[0023]
The first acquisition unit 14 acquires the second feature data sent from the feature extraction unit 13. Then, the first acquisition unit 14 accesses the first storage unit 15. Then, the first acquisition unit 14 calculates the similarity between the second feature data and each first feature data. For example, the first acquisition unit 14 can calculate the similarity between the second feature data and each of the first feature data using a mathematical formula of Mahalanobis distance.
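As a rough illustration of a similarity based on the Mahalanobis distance, the sketch below assumes a diagonal covariance matrix (one variance per feature dimension) and maps the distance d to a similarity 1 / (1 + d). Both the diagonal covariance and the mapping are assumptions made for this example; the description only states that a Mahalanobis distance formula can be used.

```python
import math


def mahalanobis_distance(x, mean, variances):
    """Mahalanobis distance assuming a diagonal covariance matrix."""
    return math.sqrt(
        sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mean, variances))
    )


def similarity(x, mean, variances):
    """Map the distance to (0, 1]; identical vectors give 1.0.

    The 1 / (1 + d) mapping is an assumed convention, chosen so that a
    smaller distance yields a higher similarity.
    """
    return 1.0 / (1.0 + mahalanobis_distance(x, mean, variances))


# Second feature data compared with one stored first feature data entry.
score = similarity([2.0, 0.0], [0.0, 0.0], [4.0, 4.0])
```

Here the distance is 1.0, so the similarity is 0.5; a second feature vector equal to the stored mean would give a similarity of 1.0.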
[0024]
Then, the first acquisition unit 14 determines whether or not the highest similarity among the calculated similarities exceeds a predetermined value. This predetermined value is determined as follows, for example. The developer of the communication terminal device 1 performs an experiment. Then, as a result of the experiment, the developer determines that the speaker can be identified based on the calculated similarity when the similarity calculated by the first acquisition unit 14 is a certain value or more. The constant value is the predetermined value.
[0025]
If the first acquisition unit 14 determines that the highest similarity exceeds a predetermined value, the first acquisition unit 14 acquires speaker specifying information related to the first feature data corresponding to the highest similarity. The first acquisition unit 14 sends the acquired speaker identification information to the second acquisition unit 16. On the other hand, if the first acquisition unit 14 determines that the highest similarity does not exceed the predetermined value, the first acquisition unit 14 sends a message to that effect to the control unit 20.
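The selection performed by the first acquisition unit 14 can be sketched as follows: compute a similarity against every stored first feature data entry, keep the highest, and return the speaker specifying information only if that highest similarity exceeds the predetermined value. The dictionary layout, the function names, and the toy similarity function are hypothetical.

```python
def acquire_speaker(second_feature, first_storage, similarity, threshold):
    """Return the best-matching speaker id, or None if below the threshold.

    first_storage maps speaker specifying information (here a string id)
    to first feature data; this layout is an assumption.
    """
    best_id, best_score = None, float("-inf")
    for speaker_id, first_feature in first_storage.items():
        score = similarity(second_feature, first_feature)
        if score > best_score:
            best_id, best_score = speaker_id, score
    if best_score > threshold:
        return best_id
    return None  # the control unit would then output a default response


def toy_similarity(a, b):
    """Stand-in similarity based on Euclidean distance (illustrative only)."""
    distance = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return 1.0 / (1.0 + distance)


first_storage = {"family": [1.0, 0.0], "friend": [0.0, 1.0]}
matched = acquire_speaker([0.9, 0.1], first_storage, toy_similarity, 0.8)
unmatched = acquire_speaker([0.5, 0.5], first_storage, toy_similarity, 0.8)
```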
[0026]
The second acquisition unit 16 holds a correspondence table. In this correspondence table, each speaker specifying information is associated with the first command or the second command. FIG. 3 is a diagram illustrating an example of the correspondence table. The first command is a command for instructing to set a communicable state with the mobile terminal device 3. The second command is a command for instructing the speaker to output a response sentence.
[0027]
Then, the second acquisition unit 16 refers to the correspondence table and acquires the command corresponding to the speaker specifying information acquired by the first acquisition unit 14. The command acquired by the second acquisition unit 16 is sent to the control unit 20. At this time, the speaker specifying information is also sent to the control unit 20.
[0028]
The control unit 20 performs the following process based on the command sent from the second acquisition unit 16. When the command acquired by the second acquisition unit 16 is the first command, the control unit 20 sends the communication unit 19 an instruction to set a state in which communication with the mobile terminal device 3 is possible.
[0029]
On the other hand, when the command acquired by the second acquisition unit 16 is the second command, the control unit 20 accesses the second storage unit 17. Then, the control unit 20 acquires the response sentence corresponding to the speaker specifying information acquired by the first acquisition unit 14. Then, the control unit 20 sends the response sentence to the output unit 18. At this time, the control unit 20 also sends the output unit 18 an instruction to output the response sentence to the speaker.
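The handling of the two commands by the control unit 20 might be sketched like this. The Command enumeration, the table contents, the response text, and the returned (action, payload) tuples are all hypothetical names introduced for illustration.

```python
from enum import Enum


class Command(Enum):
    FIRST = "set_communicable_state"    # connect to the mobile terminal device
    SECOND = "output_response_sentence"  # output a stored response sentence


# Correspondence table held by the second acquisition unit (example data).
correspondence_table = {"family": Command.FIRST, "colleague": Command.SECOND}

# Response sentences held in the second storage unit (example data).
second_storage = {"colleague": "I will be back at six o'clock."}


def dispatch(speaker_id):
    """Return the action the control unit would request for this speaker."""
    command = correspondence_table[speaker_id]
    if command is Command.FIRST:
        # Instruction for the communication unit.
        return ("connect_to_mobile_terminal", None)
    # Instruction for the output unit, together with the response sentence.
    return ("output_response", second_storage[speaker_id])


family_action = dispatch("family")
colleague_action = dispatch("colleague")
```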
[0030]
Based on the instruction sent from the control unit 20, the communication unit 19 sets a state in which data communication is possible with the mobile terminal device 3 so that a call can be made between the speaker and the user.
[0031]
Specifically, the communication unit 19 sends a connection request signal to the control unit 30 of the mobile terminal device 3 via the communication network 4 based on the instruction sent from the control unit 20. The control unit 30 of the mobile terminal device 3 causes the output unit 31 to output a notification that a connection request signal has been sent from the communication terminal device 1. Then, the communication unit 29 of the mobile terminal device 3 sends a response signal to the communication terminal device 1 via the communication network 4. Thereby, the communication terminal device 1 is set to a state in which data communication is possible with the mobile terminal device 3. The control unit 20 of the communication terminal device 1 causes the output unit 18 to output a message indicating that a call with the user is possible.
[0032]
Then, the speaker inputs predetermined call information using the input unit 11. The predetermined call information is output to the output unit 31 of the mobile terminal device 3 via the control unit 20 and the communication unit 19. Then, the user inputs predetermined call information using the input unit 31. The predetermined call information is output to the output unit 18 of the communication terminal device 1 via the control unit 30, the communication unit 29, and the like. As a result, a call between the speaker and the user can be performed.
[0033]
The output unit 18 outputs the sent response text based on the instruction sent from the control unit 20. When the first acquisition unit 14 determines that the highest similarity does not exceed the predetermined value, the first acquisition unit 14 sends a message to that effect to the control unit 20. Then, the control unit 20 causes the output unit 18 to output a predetermined response sentence (for example, a typical answering machine message).
[0034]
As illustrated in FIG. 1, the mobile terminal device 3 includes a communication unit 29 that performs data communication with the communication terminal device 1, an input/output unit 31, and a control unit 30 that controls each unit.
[0035]
(Communication method)
The description of the communication method using the communication system described above is as follows.
[0036]
(1) Processing for storing first feature data corresponding to each speaker specifying information in the first storage unit 15
In advance, the user has each speaker he or she has selected input, by voice, the predetermined input information (for example, a predetermined word such as "Hello") to the input unit 11. The speakers selected by the user are, for example, the user's family, friends, or junior colleagues at the user's company. At this time, both the predetermined input information and the speaker specifying information are input to the input unit 11.
[0037]
Upon recognizing that the input is the predetermined input information, the control unit 20 sends the voice signal indicating the predetermined input information to the feature extraction unit 13 via the voice input unit 12. Further, the control unit 20 holds the input speaker specifying information. The feature extraction unit 13 then extracts feature data based on the voice signal indicating the predetermined input information that has been sent. The extracted feature data is sent to the control unit 20.
[0038]
The control unit 20 associates the extracted feature data with the held speaker specifying information. Then, the control unit 20 stores the extracted feature data and the held speaker specifying information in the first storage unit 15. Here, the feature data is stored in the first storage unit 15 as the first feature data.
[0039]
(2) Processing in which the second acquisition unit 16 holds the correspondence table
The user inputs each speaker specifying information using the input unit 11. At this time, the user uses the input unit 11 to input information specifying the first command or information specifying the second command in correspondence with the speaker specifying information. The speaker specifying information here is the same as the speaker specifying information stored in the first storage unit 15.
[0040]
The input information is sent to the control unit 20. The control unit 20 generates a correspondence table based on the input information. Then, the control unit 20 sends the generated correspondence table to the second acquisition unit 16. The second acquisition unit 16 holds the correspondence table that has been sent.
[0041]
(3) Processing for storing the response sentence corresponding to each speaker specifying information in the second storage unit 17
The user uses the input unit 11 to input each speaker specifying information and the response sentence corresponding to it. Each speaker specifying information here is speaker specifying information associated with the second command. Then, the input information is sent to the control unit 20. The control unit 20 stores each response sentence in the second storage unit 17 in association with the speaker specifying information.
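The three preparatory processes (1) to (3) can be pictured as filling three simple tables. The sketch below uses plain dictionaries and hypothetical function names; the checks it performs (a command can be set only for a registered speaker, and a response sentence only for a speaker associated with the second command) follow the relationships stated above, but the error handling itself is an assumption.

```python
def register_speaker(first_storage, speaker_id, first_feature):
    """(1) Store first feature data for a speaker selected by the user."""
    first_storage[speaker_id] = list(first_feature)


def set_command(correspondence_table, first_storage, speaker_id, command):
    """(2) Associate a registered speaker with the first or second command."""
    if speaker_id not in first_storage:
        raise KeyError("speaker is not registered in the first storage unit")
    if command not in ("first", "second"):
        raise ValueError("command must be 'first' or 'second'")
    correspondence_table[speaker_id] = command


def set_response(second_storage, correspondence_table, speaker_id, sentence):
    """(3) Store a response sentence for a second-command speaker."""
    if correspondence_table.get(speaker_id) != "second":
        raise ValueError("response sentences apply to second-command speakers")
    second_storage[speaker_id] = sentence


first_storage, correspondence_table, second_storage = {}, {}, {}
register_speaker(first_storage, "family", [1.0, 0.0])
register_speaker(first_storage, "colleague", [0.0, 1.0])
set_command(correspondence_table, first_storage, "family", "first")
set_command(correspondence_table, first_storage, "colleague", "second")
set_response(second_storage, correspondence_table, "colleague", "Back at six.")
```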
[0042]
(4) Processing performed when a speaker (visitor) inputs input information to the communication terminal device 1
FIG. 4 is a flowchart for explaining this processing. Here, it is assumed that the communication terminal device 1 is installed at the user's home, for example. It is assumed that the user is out of the house. The user is assumed to hold the mobile terminal device 3. However, the present invention is not limited to the case where the communication terminal device 1 is installed at the user's home.
[0043]
First, a certain speaker (visitor) uses the input unit 11 to input information indicating that he or she has business with the user. The input information is sent to the control unit 20. The control unit 20 causes the output unit 18 to output message information such as "I am currently out. Please input the word 'Hello' by voice."
[0044]
The speaker uses the input unit 11 to input a voice ("Hello") indicating the predetermined input information (S10). Then, the control unit 20 sends the voice signal indicating the predetermined input information to the feature extraction unit 13 via the voice input unit 12.
[0045]
The feature extraction unit 13 extracts second feature data indicating the feature of the audio signal indicating the predetermined input information based on the audio signal indicating the predetermined input information (S20). Then, the feature extraction unit 13 sends the second feature data to the first acquisition unit 14.
[0046]
The first acquisition unit 14 acquires the second feature data sent from the feature extraction unit 13. Then, the first acquisition unit 14 accesses the first storage unit 15. Then, the first acquisition unit 14 calculates the similarity between the second feature data and each first feature data (S30).
[0047]
Then, the first acquisition unit 14 determines whether or not the highest similarity among the calculated similarities exceeds a predetermined value (S40).
[0048]
If the first acquisition unit 14 determines that the highest similarity exceeds a predetermined value, the first acquisition unit 14 acquires speaker specifying information related to the first feature data corresponding to the highest similarity (S50). The first acquisition unit 14 sends the acquired speaker identification information to the second acquisition unit 16. Thereafter, the process proceeds to the process of step S60.
[0049]
When the first acquisition unit 14 determines that the highest degree of similarity does not exceed the predetermined value, a message to that effect is sent to the control unit 20. Then, the control unit 20 causes the output unit 18 to output a predetermined response sentence (for example, a typical answering machine message) (S55).
[0050]
The second acquisition unit 16 refers to the correspondence table and acquires a command corresponding to the speaker identification information acquired by the first acquisition unit 14 (S60). Then, the second acquisition unit 16 sends the acquired command to the control unit 20. At this time, speaker identification information is also sent to the control unit 20.
[0051]
The control unit 20 determines whether the sent command is the first command or the second command (S70). When the control unit 20 determines that the sent command is the first command, the control unit 20 performs the following processing. The control unit 20 sends the communication unit 19 an instruction to set a state in which communication with the mobile terminal device 3 is possible (S80). Thereafter, the process proceeds to S100.
[0052]
On the other hand, when the control unit 20 determines that the sent command is the second command, the control unit 20 performs the following processing. The control unit 20 accesses the second storage unit 17. Then, the control unit 20 acquires the response sentence corresponding to the speaker specifying information (S85). Then, the control unit 20 sends the acquired response sentence, together with an instruction to output the response sentence to the speaker, to the output unit 18 (S90). Thereafter, the process proceeds to S120.
[0053]
Based on the instruction sent from the control unit 20, the communication unit 19 sets a state in which data communication is possible with the mobile terminal device 3 so that a call can be made between the speaker and the user (S100). When the communication unit 19 has set the state in which data communication can be performed with the mobile terminal device 3, the communication unit 19 sends a message to that effect to the control unit 20. The control unit 20 causes the output unit 18 to output a message indicating that a call with the user is possible.
[0054]
Then, a call is performed between the user and the speaker (S110). A specific description of this process is as follows. The speaker uses the input unit 11 to input predetermined call information. The predetermined call information is output to the output unit 31 of the mobile terminal device 3 via the control unit 20 and the communication unit 19.
[0055]
Then, the user inputs predetermined call information using the input unit 31. The predetermined call information is output to the output unit 18 of the communication terminal device 1 via the control unit 30, the communication unit 29, and the like. Thereby, a call between the speaker and the user can be performed.
[0056]
In step S120, the output unit 18 outputs a response sentence based on the instruction sent from the control unit 20.
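The branching of the flowchart in FIG. 4 can be traced with a short sketch that records the step labels visited. The function name, its arguments, and the use of the strings "first" and "second" for the two commands are assumptions made for illustration; the step labels are those used in the description above.

```python
def trace_visitor_flow(best_similarity, speaker_id, threshold,
                       correspondence_table, second_storage, default_response):
    """Return (steps visited, response sentence output or None)."""
    steps = ["S10", "S20", "S30", "S40"]
    if best_similarity <= threshold:
        # Speaker could not be specified: output the predetermined response.
        steps.append("S55")
        return steps, default_response
    steps += ["S50", "S60", "S70"]
    if correspondence_table[speaker_id] == "first":
        # First command: set up communication and hold the call.
        steps += ["S80", "S100", "S110"]
        return steps, None
    # Second command: fetch and output the speaker's response sentence.
    steps += ["S85", "S90", "S120"]
    return steps, second_storage[speaker_id]


table = {"family": "first", "colleague": "second"}
responses = {"colleague": "Back at six."}
default = "I am out right now."

call_trace = trace_visitor_flow(0.9, "family", 0.8, table, responses, default)
reply_trace = trace_visitor_flow(0.9, "colleague", 0.8, table, responses, default)
unknown_trace = trace_visitor_flow(0.3, None, 0.8, table, responses, default)
```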
[0057]
(Function and effect)
According to the present embodiment, the feature extraction unit 13 extracts the second feature data indicating the feature of the voice signal of the predetermined input information based on the predetermined input information input by the speaker (for example, a visitor). Then, the first acquisition unit 14 calculates the similarity between the second feature data and each first feature data. When the highest of the calculated similarities exceeds the predetermined value, the first acquisition unit 14 can acquire the speaker specifying information related to the first feature data corresponding to the highest similarity.
[0058]
For this reason, for example, when the user is away from home (away from the predetermined location) and a visitor to the home is a predetermined speaker, the communication system can accurately identify who the visitor is. The predetermined speaker is a speaker corresponding to the speaker specifying information stored in the first storage unit 15.
[0059]
Further, the second acquisition unit 16 can acquire a command corresponding to the speaker specifying information acquired by the first acquisition unit 14 with reference to the correspondence data. When the acquired command is the first command, the communication unit 19 sets a state in which data communication with the mobile terminal device 3 is possible so that a call can be made between the speaker and the user. When the acquired command is the second command, the output unit 18 outputs a predetermined response sentence to the speaker.
[0060]
For this reason, when the user is away from home, for example, this communication system can perform, according to the speaker (visitor) visiting the home, either the process of setting a state in which communication between the mobile terminal device 3 and the communication terminal device 1 is possible or the process of outputting a predetermined response sentence to the speaker.
[0061]
As a result, for example, when a close friend visits the home, the user can make a call with the friend from the outside. Further, for example, when a friend who is not so close visits home, the communication terminal device 1 can output a predetermined response sentence to the friend. Therefore, a communication system convenient for the user can be realized.
[0062]
If the visitor is not a speaker corresponding to the speaker specifying information stored in the first storage unit 15, typical message information, for example, is output to the visitor. For this reason, even when a response sentence in the first storage unit 15 includes information that the user does not want a third party to know, such information does not become known to the third party. The third party referred to here is a person other than the speakers corresponding to the speaker specifying information stored in the first storage unit 15.
[0063]
In the present embodiment, each speaker specifying information is stored in the second storage unit 17. Each speaker specifying information is associated with a response sentence. And the output part 18 can output the response sentence corresponding to the said visitor with respect to the visitor who visits a user's home, for example.
[0064]
For this reason, for example, when the boss or senior of the user's company visits the home, the output unit 18 can output a response message for the boss or senior to the boss or senior. For example, when a user's subordinate or a junior visits the home, the output unit 18 can output a response message for the subordinate or the junior to the subordinate or the junior. Therefore, a communication system more convenient for the user can be realized.
[0065]
(Modification 1)
(Configuration of communication system)
A first modification of the communication system of the present embodiment is as follows. FIG. 5 is a diagram illustrating a configuration of a communication system according to the present modification. The communication system includes a first communication terminal device 1 arranged at a predetermined location. Further, the communication system includes a mobile terminal device 3 carried by a user of the first communication terminal device 1 (hereinafter referred to as a first user). The first user is away from the predetermined location. The communication system includes a plurality of second communication terminal devices 5. Each device is connected to a communication network 4 (for example, a telephone network, the Internet).
[0066]
In the communication system of the present modification, the description of the same configuration as that shown in the communication system of the embodiment is omitted. In the communication system according to the present modification, the same reference numerals are given to the same or similar components as those shown in the communication system according to the embodiment.
[0067]
Specifically, for example, the first communication terminal device 1 is disposed at the home (predetermined place) of the first user, and the first user has gone out from home. At this time, the first user holds the mobile terminal device 3. The user of the second communication terminal device 5 is referred to as the second user.
[0068]
The first communication terminal device 1 of the present modification is different from the communication terminal device 1 of the embodiment as follows.
[0069]
The first storage unit 15 stores a plurality of pieces of user specifying information. This user specifying information is information for specifying the second user. Each user specifying information is associated with the first feature data indicating the feature of the audio signal of the predetermined input information input in advance by the second user.
[0070]
The second storage unit 17 stores a plurality of pieces of user specifying information. A response sentence is associated with each user specifying information.
[0071]
The feature extraction unit 13 extracts second feature data indicating the feature of the audio signal of the predetermined input information based on the predetermined input information of the second user sent from the second communication terminal device 5.
[0072]
The first acquisition unit 14 calculates the similarity between the second feature data and each of the first feature data and, when the highest of the calculated similarities exceeds the predetermined value, acquires the user specifying information related to the first feature data corresponding to the highest similarity.
[0073]
The second acquisition unit 16 refers to a correspondence table in which each user specifying information is associated with the first command or the second command, and acquires the command corresponding to the user specifying information acquired by the first acquisition unit 14. The first command is a command for instructing the second communication terminal device 5 to set a state in which communication with the mobile terminal device 3 is possible. The second command is a command for instructing that a response sentence be output to the second user. The second acquisition unit 16 sends the acquired command and the user specifying information acquired by the first acquisition unit 14 to the control unit 20.
[0074]
When the command acquired by the second acquisition unit 16 is the first command, the control unit 20 performs the following processing. The control unit 20 instructs the communication unit 19 to transmit the first command to the second communication terminal device 5. On the other hand, when the command acquired by the second acquisition unit 16 is the second command, the control unit 20 performs the following processing. The control unit 20 accesses the second storage unit 17. Then, the control unit 20 acquires the response sentence corresponding to the user specifying information. Then, the control unit 20 sends the communication unit 19 an instruction to transmit the second command and the acquired response sentence to the second communication terminal device 5.
[0075]
The second communication terminal device 5 includes a communication unit 49, an input/output unit 51, and a control unit 50. When the control unit 50 of the second communication terminal device 5 acquires the first command, it performs the following processing. The control unit 50 sends the communication unit 49 an instruction to set a state in which data communication with the mobile terminal device 3 is possible so that a call can be made between the first user and the second user. When the control unit 50 of the second communication terminal device 5 acquires the second command and a response sentence, it causes the output unit 51 to output the response sentence.
[0076]
In addition, when the first acquisition unit 14 of the first communication terminal device 1 determines that the highest similarity does not exceed the predetermined value, it notifies the control unit 20 to that effect. The control unit 20 then sends a predetermined response sentence (for example, a typical answering machine message) and an instruction to output the predetermined response sentence to the second communication terminal device 5 via the communication unit 19. When the control unit 50 of the second communication terminal device 5 acquires the instruction and the response sentence, the control unit 50 causes the output unit 51 to output the predetermined response sentence.
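In this modification the first communication terminal device 1 decides what to send, and the second communication terminal device 5 acts on what it receives. A minimal sketch of that exchange is given below; the message dictionaries, their keys, and the function names are hypothetical and only illustrate the three cases described above (first command, second command with a response sentence, and the predetermined response sentence for an unidentified caller).

```python
def build_message(user_id, correspondence_table, second_storage, default_response):
    """Message the first terminal's control unit asks its communication
    unit to send to the second terminal (format is an assumption)."""
    if user_id is None:
        # Highest similarity did not exceed the predetermined value.
        return {"kind": "output_instruction", "response": default_response}
    if correspondence_table[user_id] == "first":
        return {"kind": "first_command"}
    return {"kind": "second_command", "response": second_storage[user_id]}


def handle_message(message):
    """Action taken by the control unit of the second terminal."""
    if message["kind"] == "first_command":
        return "set data communication with mobile terminal device 3"
    # Second command or plain output instruction: output the sentence.
    return "output: " + message["response"]


table = {"friend": "first", "junior": "second"}
responses = {"junior": "Please call again tomorrow."}
default = "I am out right now."

friend_action = handle_message(build_message("friend", table, responses, default))
junior_action = handle_message(build_message("junior", table, responses, default))
unknown_action = handle_message(build_message(None, table, responses, default))
```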
[0077]
(Communication method)
The description of the communication method using the communication system described above is as follows. Also in this modification, the same processing as (1), (2), and (3) of the embodiment is performed.
[0078]
However, in the description of (1), (2), and (3) of the embodiment, the user is replaced with the first user. The speaker is replaced with the second user. Also, the speaker identification information is replaced with user identification information. The first command is the first command of the present modification, and the second command is the second command of the present modification.
[0079]
FIG. 6 is a flowchart for explaining processing performed when the second user inputs input information to the second communication terminal device 5. Here, as an example, it is assumed that the first communication terminal device 1 is placed at the home of the first user. The first user is assumed to be away from home. The first user is assumed to hold the mobile terminal device 3.
[0080]
Further, in this process, as an example, a case where each terminal device is a telephone device is described. In this process, the first communication terminal device 1 is referred to as the first telephone device 1, the second communication terminal device 5 is referred to as the second telephone device 5, and the mobile terminal device 3 is referred to as the mobile phone device. However, this modification can also be applied when each terminal device is other than a telephone device.
[0081]
First, when going out from home, the first user inputs the following information using the input unit 11 of the first telephone device 1. The first user uses the input unit 11 to input an instruction to send predetermined message information to the second telephone device 5 when a call signal is received from the second telephone device 5. Then, the instruction is sent to the control unit 20. The control unit 20 sends a predetermined instruction to each unit so that the above instruction can be executed.
[0082]
Then, a certain second user inputs the telephone number assigned to the first telephone device 1 by using the input unit 51 of his or her second telephone device 5. The telephone number is sent to the control unit 50. The control unit 50 sends a call signal to the first telephone device 1 based on the telephone number.
[0083]
The call signal is sent to the control unit 20 of the first telephone device 1 via the communication unit 19 of the first telephone device 1. The control unit 20 sends message information such as "I am currently out. Please input the word 'Hello' by voice." to the second telephone device 5 via the communication unit 19.
[0084]
The message information is sent to the output unit 51 via the communication unit 49 and the control unit 50 of the second telephone device 5. The output unit 51 outputs the message information.
[0085]
The second user uses the input unit 51, a voice indicating a predetermined input information (e.g., "Hello") to enter (S200). Then, the control unit 50 sends an audio signal indicating the predetermined input information to the control unit 20 of the first telephone device 1 via the communication unit 49 or the like. The control unit 20 sends an audio signal indicating the predetermined input information to the feature extraction unit 13 via the audio input unit 12.
[0086]
Then, the processes of steps S20, S30, and S40 are performed. When it is determined that the highest similarity exceeds the predetermined value, the first acquisition unit 14 acquires the user identification information associated with the first feature data corresponding to the highest similarity (S210). The first acquisition unit 14 sends the acquired user identification information to the second acquisition unit 16. Thereafter, the process proceeds to step S240.
[0087]
When the first acquisition unit 14 determines that the highest similarity does not exceed the predetermined value, a notification to that effect is sent to the control unit 20. The control unit 20 sends a predetermined response sentence to the second telephone device 5 via the communication unit 19 (S220). The predetermined response sentence is then sent to the control unit 50 of the second telephone device 5. The control unit 50 causes the output unit 51 to output the predetermined response sentence (for example, a typical answering machine message) (S230).
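The decision made in steps S210 and S220 can be illustrated with a minimal sketch. The specification does not fix the similarity measure in this passage, so cosine similarity is used here purely as an assumption; the function names, the feature vectors, and the threshold value 0.8 are likewise hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors (an assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify_user(second_feature, first_storage, threshold):
    """Return the user identification information whose stored first feature
    data is most similar to the second feature data, or None when the highest
    similarity does not exceed the threshold (the S220 branch)."""
    best_id, best_score = None, float("-inf")
    for user_id, first_feature in first_storage.items():
        score = cosine_similarity(second_feature, first_feature)
        if score > best_score:
            best_id, best_score = user_id, score
    return best_id if best_score > threshold else None


# Hypothetical contents of the first storage unit 15.
first_storage = {
    "friend_A": [1.0, 0.0, 0.2],
    "boss_B": [0.0, 1.0, 0.5],
}

known = identify_user([0.9, 0.1, 0.2], first_storage, threshold=0.8)
unknown = identify_user([-1.0, -1.0, 0.0], first_storage, threshold=0.8)
```

In this sketch, a result of None corresponds to the case in which the control unit 20 sends the typical answering machine message.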
[0088]
The second acquisition unit 16 refers to the correspondence table and acquires a command corresponding to the user identification information acquired by the first acquisition unit 14 (S240). Then, the second acquisition unit 16 sends the acquired command to the control unit 20. At this time, user identification information is also sent to the control unit 20.
[0089]
The control unit 20 determines whether the sent command is the first command or the second command (S250). When the control unit 20 determines that the sent command is the first command, the control unit 20 sends the first command to the communication unit 19. The communication unit 19 sends the first command to the second telephone device 5 (S260). Thereafter, the process proceeds to S280.
[0090]
On the other hand, when the control unit 20 determines that the sent command is the second command, the control unit 20 performs the following processing. The control unit 20 accesses the second storage unit 17 and acquires the response sentence corresponding to the user identification information (S265). Then, the control unit 20 sends the second command and the acquired response sentence to the communication unit 19. The communication unit 19 sends the second command and the response sentence to the second telephone device 5 (S270). Thereafter, the process proceeds to S300.
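Steps S240 to S270 amount to a table lookup followed by a two-way branch. The following is a minimal sketch under stated assumptions: the dictionary layouts, the command constants, and the function name are hypothetical and are not taken from the correspondence table of the embodiment.

```python
FIRST_COMMAND = "first"    # set a communicable state with the mobile phone device 3
SECOND_COMMAND = "second"  # output a response sentence to the second user


def build_message(user_id, correspondence_table, second_storage):
    """Return the data that would be sent to the second telephone device 5."""
    command = correspondence_table[user_id]        # S240: look up the command
    if command == FIRST_COMMAND:                   # S250: which command is it?
        return {"command": FIRST_COMMAND}          # S260: send the first command
    response = second_storage[user_id]             # S265: look up the response sentence
    return {"command": SECOND_COMMAND, "response": response}  # S270


# Hypothetical table contents.
correspondence_table = {"friend_A": FIRST_COMMAND, "boss_B": SECOND_COMMAND}
second_storage = {"boss_B": "I am out today and will return tomorrow."}

to_friend = build_message("friend_A", correspondence_table, second_storage)
to_boss = build_message("boss_B", correspondence_table, second_storage)
```

Keeping the response sentences in a separate mapping mirrors the separation between the correspondence table and the second storage unit 17 described above.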
[0091]
When the first command is sent to the control unit 50 of the second telephone device 5, the following processing is performed. The control unit 50 instructs the communication unit 49 to set a state in which data communication with the mobile phone device 3 is possible, so that a call can be made between the first user and the second user (S280). Based on the instruction sent from the control unit 50, the communication unit 49 sets a state in which data communication with the mobile phone device 3 is possible, so that a call can be made between the first user and the second user (S290).
[0092]
Specifically, the communication unit 49 sends a connection request signal to the control unit 30 of the mobile phone device 3 based on the instruction sent from the control unit 50. The control unit 30 of the mobile phone device 3 causes the output unit 31 to indicate that a connection request signal has been sent from the second telephone device 5. Then, the communication unit 29 of the mobile phone device 3 sends a response signal to the second telephone device 5. As a result, the second telephone device 5 is set to a state in which data communication with the mobile phone device 3 is possible. When this state has been set, the communication unit 49 notifies the control unit 50 to that effect. The control unit 50 causes the output unit 51 to output a message indicating that a call with the first user is possible.
[0093]
Then, a call is performed between the first user and the second user (S295). A specific description of this process is as follows. The second user uses the input unit 51 to input predetermined call information. The predetermined call information is output to the output unit 31 of the mobile phone device 3 via the control unit 50, the communication unit 49, and the like. The first user uses the input unit 31 to input predetermined call information. The predetermined call information is output to the output unit 51 of the second telephone device 5 via the control unit 30, the communication unit 29, and the like. Thereby, a telephone call is performed between the first user and the second user.
[0094]
In step S300, the second command and the response sentence are sent to the control unit 50 of the second telephone device 5. In response to an instruction from the control unit 50, the output unit 51 outputs the response sentence.
[0095]
(Function and effect)
In the communication system of this modification, for example, when the first user is away from home (away from the predetermined place) and the second user who calls the home is a predetermined speaker, it is possible to accurately identify who the second user is.
[0096]
In addition, for example, when the first user is away from home, this communication system can perform, according to the second user who calls the home, either the process of setting the mobile phone device 3 (mobile terminal device 3) and the second telephone device 5 (second communication terminal device 5) in a communicable state or the process of outputting a predetermined response sentence to the second user.
[0097]
As a result, for example, when a call signal is sent from the second telephone device 5 of a close friend to the first telephone device 1 at home, the first user can talk with the friend while away from home. Further, for example, when a call signal is sent from the second telephone device 5 of a friend who is not so close to the first telephone device 1 at home, the first telephone device 1 at home can send a predetermined response sentence to the second telephone device 5 of that friend. Then, the second telephone device 5 outputs the predetermined response sentence to the friend. Therefore, a communication system convenient for the first user can be realized.
[0098]
In addition, when the second user is not a user corresponding to the user identification information stored in the first storage unit 15, typical message information, for example, is output to the second user. For this reason, even when a response sentence in the first storage unit 15 includes information that the first user does not want a third party to know, that information is not disclosed to the third party.
[0099]
In the present modification, the second storage unit 17 stores user identification information. Each user specifying information is associated with a response sentence. Then, for example, a response sentence corresponding to the second user is output to the second user of the second telephone device 5 that has sent a call signal to the first telephone device 1.
[0100]
For this reason, for example, when a call signal is sent to the first telephone device 1 from the second telephone device 5 of a boss (or senior colleague) at the first user's company, the output unit 51 of that second telephone device 5 can output the response sentence intended for the boss (or senior colleague). Also, for example, when a call signal is sent to the first telephone device 1 from the second telephone device 5 of a subordinate (or junior colleague) at the first user's company, the output unit 51 of that second telephone device 5 can output the response sentence intended for the subordinate (or junior colleague). Accordingly, it is possible to realize a communication system that is more convenient for the first user.
[0101]
(Modification 2)
Further, when the feature extraction unit 13 cannot extract feature data, a message to that effect may be sent to the control unit 20. Then, the control unit 20 may cause the output unit 18 to output a predetermined response sentence (for example, a typical answering machine message).
[0102]
Further, the communication system may include a speaker model generation unit (not shown). Then, the speaker model generation unit may generate a second speaker model based on the second feature data extracted by the feature extraction unit 13. At this time, the speaker model generation unit may generate a speaker model based on the hidden Markov model method.
[0103]
In the first storage unit 15, each piece of speaker identification information is associated with a first speaker model. The first acquisition unit 14 may then calculate the similarity between the second speaker model and each first speaker model. The first acquisition unit 14 can calculate, for example, the symbol occurrence probability as the similarity based on the Viterbi algorithm described in "Voice Recognition Using a Probability Model" (Nakagawa, 1988, IEICE, Corona).
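As a rough illustration of scoring with the Viterbi algorithm, the sketch below computes the log probability of the single most likely state path of a discrete hidden Markov model for a given symbol sequence. It is a simplification under stated assumptions: it scores an observed symbol sequence against each stored model instead of comparing two speaker models directly, and all model parameters and names are hypothetical.

```python
import math


def _log(p):
    """Natural log that maps zero probability to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi_log_score(observations, start_p, trans_p, emit_p):
    """Log probability of the best state path for a discrete HMM.

    start_p[s]    : probability of starting in state s
    trans_p[r][s] : probability of moving from state r to state s
    emit_p[s][o]  : probability that state s emits symbol o
    """
    n_states = len(start_p)
    scores = [_log(start_p[s]) + _log(emit_p[s][observations[0]])
              for s in range(n_states)]
    for symbol in observations[1:]:
        scores = [
            max(scores[r] + _log(trans_p[r][s]) for r in range(n_states))
            + _log(emit_p[s][symbol])
            for s in range(n_states)
        ]
    return max(scores)


# Two hypothetical first speaker models sharing the same topology.
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
model_a = [[0.9, 0.1], [0.6, 0.4]]  # tends to emit symbol 0
model_b = [[0.1, 0.9], [0.4, 0.6]]  # tends to emit symbol 1

observed = [0, 0, 1, 0]  # quantized symbols from the input speech (assumed)
score_a = viterbi_log_score(observed, start, trans, model_a)
score_b = viterbi_log_score(observed, start, trans, model_b)
```

The model with the higher score would be treated as the closer match, and its score would then be compared with the predetermined value.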
[0104]
Note that the communication system of the first modification can be modified as in this modification by replacing the speaker identification information described in this modification with the user identification information.
[0105]
(Modification 3)
The stored contents of the second storage unit 17 may be modified as follows. Rank information is associated with the speaker identification information associated with the second command. Each rank information is associated with a response sentence. The second storage unit 17 stores a plurality of speaker specifying information. FIG. 7 is a diagram illustrating an example of the contents stored in the second storage unit 17 of the third modification.
[0106]
In step S90, when the command acquired by the second acquisition unit 16 is the second command, the control unit 20 may perform the following processing after accessing the second storage unit 17. The control unit 20 may acquire the rank information corresponding to the speaker identification information and then acquire the response sentence corresponding to the acquired rank information.
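The two-stage lookup of this modification can be sketched as follows. The rank labels, speaker identifiers, and response sentences are hypothetical examples and do not reproduce the contents of FIG. 7.

```python
def response_for_speaker(speaker_id, rank_by_speaker, response_by_rank):
    """Look up the rank of a speaker, then the response sentence for that rank."""
    rank = rank_by_speaker[speaker_id]
    return response_by_rank[rank]


# Hypothetical contents of the second storage unit 17.
rank_by_speaker = {"boss_B": "A", "colleague_C": "B", "junior_D": "B"}
response_by_rank = {
    "A": "I am very sorry, I am out at the moment. I will call you back.",
    "B": "I am out now. Please call again later.",
}

reply = response_for_speaker("junior_D", rank_by_speaker, response_by_rank)
```

One consequence of this layout is that several speakers can share one response sentence, so changing the sentence for a rank updates it for every speaker of that rank.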
[0107]
Note that the communication system of the first modification can be modified as in this modification by replacing the speaker identification information described in this modification with the user identification information.
[0108]
In the above-described embodiment and each modification, the voice input unit 12 may have a noise canceling function.
[0109]
[Effect of the invention]
According to the present invention, for example, when the user is away from home, either the process of setting the mobile terminal device 3 and the communication terminal device 1 in a communicable state or the process of outputting a response sentence to the speaker can be performed according to the speaker visiting the home.
[0110]
As a result, for example, when a close friend visits the home, the user can make a call with the friend from the outside. Further, for example, when a friend who is not so close visits home, the communication terminal device 1 can output a predetermined response sentence to the friend. Therefore, a communication system convenient for the user can be realized.
[Brief description of the drawings]
FIG. 1 is a diagram showing a configuration of a communication system according to an embodiment.
FIG. 2 is a diagram showing an example of storage contents of a first storage unit according to the present embodiment.
FIG. 3 is a diagram showing an example of a correspondence table according to the present embodiment.
FIG. 4 is a flowchart for explaining a communication method of the present embodiment.
FIG. 5 is a diagram illustrating a configuration of a communication system according to a first modification.
FIG. 6 is a flowchart for explaining a communication method of a first modification.
FIG. 7 is a diagram illustrating an example of storage contents of a second storage unit according to a third modification.
[Explanation of symbols]
1 ... first communication terminal device, first telephone device, communication terminal device, 3 ... mobile terminal device, mobile phone device, 4 ... communication network, 5 ... second communication terminal device, second telephone device, 11 ... input unit, 12 ... voice input unit, 13 ... feature extraction unit, 14 ... first acquisition unit, 15 ... first storage unit, 16 ... second acquisition unit, 17 ... second storage unit, 18 ... output unit, 19 ... communication unit, 20 ... control unit, 29 ... communication unit, 30 ... control unit, 31 ... input/output unit, 49 ... communication unit, 50 ... control unit, 51 ... input/output unit

Claims

A communication system having a communication terminal arranged at a predetermined location and a portable terminal carried by a user of the communication terminal away from the predetermined location,
wherein the communication terminal comprises:
first storage means for storing in advance a plurality of pieces of speaker identification information, each being information that identifies a speaker selected in advance by the user and each being associated with first feature data indicating features of a voice signal of predetermined input information input in advance by the speaker;
feature extraction means for extracting, based on predetermined input information input by a speaker, second feature data indicating features of a voice signal of the predetermined input information;
first acquisition means for calculating a similarity between the second feature data and each piece of first feature data and, when the highest of the calculated similarities exceeds a predetermined value, acquiring the speaker identification information associated with the first feature data corresponding to the highest similarity;
second acquisition means for acquiring a command corresponding to the speaker identification information acquired by the first acquisition means, by referring to correspondence data in which each piece of speaker identification information is associated with either a first command, which instructs that a state in which communication with the portable terminal is possible be set, or a second command, which instructs that a response sentence be output to the speaker;
communication means for setting, when the command acquired by the second acquisition means is the first command, a state in which data communication with the portable terminal is possible so that a call can be made between the speaker and the user; and
output means for outputting the response sentence to the speaker when the command acquired by the second acquisition means is the second command,
wherein the predetermined value in the first acquisition means is a value obtained by experimenting, for a specific word, on what value the similarity between the first feature data and the second feature data calculated by the first acquisition means must equal or exceed, and is the value for which the experiment showed that the speaker can be identified when the similarity is equal to or higher than that value,
the communication terminal further comprises means for outputting message information prompting voice input of the predetermined input information, which is a specific word set in advance, and
means for sending a voice signal indicating the acquired predetermined input information to the feature extraction means via voice input means,
the feature extraction means detects a voice section of the voice signal indicating the predetermined input information that has been sent, and extracts the second feature data in the detected voice section, and
the control means outputs a predetermined response sentence when the feature extraction means cannot extract the second feature data.
A communication system characterized by the above.

The communication system according to claim 1, wherein the communication terminal further comprises:
second storage means for storing in advance a plurality of pieces of the speaker identification information, each associated with a response sentence; and
third acquisition means for acquiring the response sentence corresponding to the speaker identification information when the command acquired by the second acquisition means is the second command,
wherein the output means outputs the response sentence acquired by the third acquisition means to the speaker.

A communication terminal device that is located at a predetermined location and is owned by a user away from the predetermined location,
the communication terminal device comprising:
first storage means for storing in advance a plurality of pieces of speaker identification information, each being information that identifies a speaker selected in advance by the user and each being associated with first feature data indicating features of a voice signal of predetermined input information input in advance by the speaker;
feature extraction means for extracting, based on predetermined input information input by a speaker, second feature data indicating features of a voice signal of the predetermined input information;
first acquisition means for calculating a similarity between the second feature data and each piece of first feature data and, when the highest of the calculated similarities exceeds a predetermined value, acquiring the speaker identification information associated with the first feature data corresponding to the highest similarity;
second acquisition means for acquiring a command corresponding to the speaker identification information acquired by the first acquisition means, by referring to correspondence data in which each piece of speaker identification information is associated with either a first command, which instructs that a state in which communication with a portable terminal carried by the user is possible be set, or a second command, which instructs that a response sentence be output to the speaker;
communication means for setting, when the command acquired by the second acquisition means is the first command, a state in which data communication with the portable terminal is possible so that a call can be made between the speaker and the user; and
output means for outputting the response sentence to the speaker when the command acquired by the second acquisition means is the second command,
wherein the predetermined value in the first acquisition means is a value obtained by experimenting, for a specific word, on what value the similarity between the first feature data and the second feature data calculated by the first acquisition means must equal or exceed, and is the value for which the experiment showed that the speaker can be identified when the similarity is equal to or higher than that value,
the communication terminal device further comprises means for outputting message information prompting voice input of the predetermined input information, which is a specific word set in advance, and
means for sending a voice signal indicating the acquired predetermined input information to the feature extraction means via voice input means,
the feature extraction means detects a voice section of the voice signal indicating the predetermined input information that has been sent, and extracts the second feature data in the detected voice section, and
the control means outputs a predetermined response sentence when the feature extraction means cannot extract the second feature data.
A communication terminal device.

A communication method using a communication terminal arranged at a predetermined location and a portable terminal carried by a user of the communication terminal away from the predetermined location,
the communication method comprising:
a step of storing in advance, in storage means, a plurality of pieces of speaker identification information, each being information that identifies a speaker selected in advance by the user and each being associated with first feature data indicating features of a voice signal of predetermined input information input in advance by the speaker;
a step of extracting, based on predetermined input information input by a speaker, second feature data indicating features of a voice signal of the predetermined input information;
a first acquisition step of calculating a similarity between the second feature data and each piece of first feature data and, when the highest of the calculated similarities exceeds a predetermined value, acquiring the speaker identification information associated with the first feature data corresponding to the highest similarity;
a second acquisition step of acquiring a command corresponding to the speaker identification information acquired in the first acquisition step, by referring to correspondence data in which each piece of speaker identification information is associated with either a first command, which instructs that a state in which communication with the portable terminal is possible be set, or a second command, which instructs that a response sentence be output to the speaker;
a step in which, when the command acquired in the second acquisition step is the first command, the communication terminal sets a state in which data communication with the portable terminal is possible so that a call can be made between the speaker and the user; and
a step in which, when the command acquired in the second acquisition step is the second command, the communication terminal outputs the response sentence to the speaker,
wherein the predetermined value in the first acquisition step is a value obtained by experimenting, for a specific word, on what value the similarity between the first feature data and the second feature data calculated by the first acquisition means must equal or exceed, and is the value for which the experiment showed that the speaker can be identified when the similarity is equal to or higher than that value,
the communication method further comprises a step of outputting message information prompting voice input of the predetermined input information, which is a specific word set in advance, and
a step of sending a voice signal indicating the acquired predetermined input information to the feature extraction means via voice input means,
the feature extraction means detects a voice section of the voice signal indicating the predetermined input information that has been sent, and extracts the second feature data in the detected voice section, and
the control means outputs a predetermined response sentence when the feature extraction means cannot extract the second feature data.
A communication method characterized by the above.

The communication method according to claim 4, further comprising:
a step of storing in advance, in second storage means, a plurality of pieces of the speaker identification information, each associated with a response sentence;
a third acquisition step of acquiring the response sentence corresponding to the speaker identification information when the command acquired in the second acquisition step is the second command; and
a step in which the communication terminal outputs the response sentence acquired in the third acquisition step to the speaker.