JP2004104757A

JP2004104757A - Voice input device

Info

Publication number: JP2004104757A
Application number: JP2003129795A
Authority: JP
Inventors: Kazufumi Matsumoto; 松本　一文; Toshihiro Shiren; 枝連　俊弘; Daisuke Asai; 浅井　大輔
Original assignee: Advanced Media Inc
Current assignee: Advanced Media Inc
Priority date: 2002-07-16
Filing date: 2003-05-08
Publication date: 2004-04-02

Abstract

PROBLEM TO BE SOLVED: To provide a voice input device that is externally attached to an electronic device having a wireless communication function, and that has a function for extracting the features of input voice needed for voice recognition and a communication control function for using communication lines efficiently.

SOLUTION: The voice input device is connected to the external device connection unit of an electronic device that has a wireless communication function and an external device connection unit. It comprises a voice input means for inputting a user's voice; a feature extraction means for extracting voice features from the voice input by the voice input means; a conversion means for converting the voice features extracted by the feature extraction means into an information format corresponding to the wireless communication function of the electronic device and outputting the converted data as transmission information; and a control means for controlling the wireless communication function of the electronic device through its external device connection unit and instructing the electronic device to send the transmission information output from the conversion means to an external voice recognition device.

COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、無線通信機能を備えた電子機器に外部接続して利用する音声入力装置に関する。
【０００２】
【従来の技術】
従来、携帯電話機等の無線通信端末を音声入力機器として利用し、音声認識機能をセンター設備側に持たせて音声認識処理や認証処理を行う場合、無線通信端末からセンター設備に音声が送信される通信処理の過程で音声品質が劣化し、音声認識率及び認証率を上げることは難しかった。
【０００３】
例えば、音声入力機器として携帯電話機を利用した場合、電話回線の周波数帯域幅の制限により音声品質が劣化し、音声認識ソフトウェアが実装されたパーソナルコンピュータに接続されたマイクから音声入力する場合等に比べて音声品質の劣化が顕著であり、音声認識率を低下させる一因となっている。
【０００４】
このような通信による音質劣化を防ぐ方法としてＤＳＲ（Ｄｉｓｔｒｉｂｕｔｅｄ　Ｓｐｅｅｃｈ　Ｒｅｃｏｇｎｉｔｉｏｎ）方式が提案されている。このＤＳＲ方式は、音声認識処理のうち前段階の音響処理部分と後段階の音声認識処理部分とを分離して、音声入力機器側に音響処理部分を受け持たせ、センター設備側に音声認識処理部分を受け持たせて、入力音声を通信処理で劣化しないデータに加工してからセンター設備に送信するものである。
【０００５】
このＤＳＲ方式に対応する音響処理機能を無線通信端末に実装する方法として、１）ソフトウェアとしてパーソナルコンピュータや携帯情報端末に実装する、２）ＬＳＩ（Ｌａｒｇｅ　Ｓｃａｌｅ　Ｉｎｔｅｇｒａｔｅｄ　Ｃｉｒｃｕｉｔ）化して無線通信端末に実装する、というものが実現されている。
【０００６】
【発明が解決しようとする課題】
しかしながら、従来のＤＳＲ方式に対応する音響処理機能の無線通信端末への実装方法では、無線通信端末として最も普及している現行の携帯電話機に対しては実装することができず、ＤＳＲ方式対応のＬＳＩを組み込んだ新規の携帯電話機の登場を待たなければならない。
【０００７】
この場合、既存の携帯電話機ユーザーに対して、ＤＳＲ方式の音声認識機能を普及させることは難しくなり、携帯電話機ユーザーにとって最も簡便な入力環境の提供と、サービス提供者にとっても優位な音声認識サービスの普及を遅らせることになる。
【０００８】
また、従来のＤＳＲ方式の音声認識サービスでは、センター設備が音声認識した後、応答を音声で携帯電話機に返すようにしていたため、返信するデータ量が多い割に実質的な情報量が少ないため、無駄な音声データの授受が多くなって、電話回線の利用効率を低下させるとともに、通信コストも高くなるという問題があった。
【０００９】
また、近時、インターネット接続機能を備えた携帯電話機が普及しているが、インターネットから提供されるサービスと、音声認識サービスとを利用する場合、データと音声を別々のネットワーク（インターネットと電話回線）で授受しなければならず、通信コストが高くなるという問題があった。
【００１０】
更に、ＤＳＲ方式対応のＬＳＩを携帯電話機に組み込んだ場合であっても、携帯電話機に内蔵されたマイクを利用することになり、元々周波数帯域が制限された電話回線用に搭載されたものなので、音声認識用のマイクとしては収音周波数帯域が不十分であり、その性能の個体差も大きいため、入力音声を音声認識に必要な周波数帯域で収音することが困難である。
【００１１】
本発明の課題は、無線通信機能を備えた電子機器に外付けして、音声認識に必要な入力音声の特徴抽出機能と、通信回線を効率よく利用する通信制御機能とを有する音声入力装置を提供することである。
【００１２】
【課題を解決するための手段】
上記課題を解決するため、請求項１記載の発明は、
無線通信機能及び外部機器用接続部を備えた電子機器の該外部機器用接続部に接続される音声入力装置であって、
利用者の音声を入力する音声入力手段と、
前記音声入力手段により入力された音声から音声特徴量を抽出する特徴量抽出手段と、
前記特徴量抽出手段により抽出された音声特徴量を前記電子機器の無線通信機能に対応する情報形態に変換して送信情報として出力する変換手段と、
前記電子機器の外部機器用接続部を介して当該電子機器の無線通信機能を制御し、前記変換手段から出力された送信情報を当該電子機器から外部の音声認識装置に送信させる制御手段と、
を備えたことを特徴としている。
【００１３】
請求項１記載の発明によれば、
無線通信機能及び外部機器用接続部を備えた電子機器の該外部機器用接続部に接続される音声入力装置であって、特徴量抽出手段が、音声入力手段により入力された利用者の音声から音声特徴量を抽出し、前記特徴量抽出手段により抽出された音声特徴量を変換手段で前記電子機器の無線通信機能に対応する情報形態に変換して送信情報として出力し、制御手段が、前記電子機器の外部機器用接続部を介して当該電子機器の無線通信機能を制御して、前記変換手段から出力された送信情報を当該電子機器から外部の音声認識装置に送信させる。
【００１４】
したがって、既存の無線通信機能及び外部機器用接続部を備えた電子機器に音声認識機能を付加することができ、既存の電子機器のユーザーに対して音声入力環境を容易に提供できる。請求項２に記載したように、本発明の音声入力機器を無線通信機能及び外部機器用接続部を備えた電子機器と一体に構成することもできる。
【００１５】
また、請求項３に記載する発明のように、請求項１あるいは２記載の音声入力装置において、前記利用者の生体特徴量を検出する生体特徴検出手段を備え、前記変換手段は、前記生体特徴検出手段により検出された生体特徴量及び前記抽出された音声特徴量を前記電子機器の無線通信機能に対応する情報形態に変換して送信情報として出力するようにしてもよい。
【００１６】
請求項３記載の発明によれば、
前記利用者の生体特徴を検出する生体特徴検出手段を備え、前記変換手段は、前記生体特徴検出手段により検出された生体特徴量及び前記抽出された音声特徴量を前記電子機器の無線通信機能に対応する情報形態に変換して送信情報として出力することにより、音声認識に必要な音声特徴量のみを効率よく送信でき、無線通信資源を有効利用して音声認識にかかる通信コストを低減できるとともに、音声特徴と生体特徴とを組み合わせたユーザー認証サービスも容易に提供できる。
【００１７】
また、請求項４に記載する発明のように、請求項１〜３記載の音声入力装置において、前記電子機器は、前記外部の音声認識装置との間の無線通信内容を表示する表示部を備え、前記音声認識装置は、前記送信情報に含まれた音声特徴量及び生体特徴量により前記利用者を認証するとともに音声内容を認識し、その認証結果及び音声認識結果を前記電子機器に応答する認証・認識機能を有し、前記制御手段は、前記電子機器の外部機器用接続部を介して前記音声認識装置から送信された認証結果及び音声認識結果を受信すると、この認証結果及び音声認識結果を前記電子機器の表示部に表示させるようにしてもよい。
【００１８】
請求項４記載の発明によれば、
前記電子機器は、前記外部の音声認識装置との間の無線通信内容を表示する表示部を備え、前記外部の音声認識装置は、前記送信情報に含まれた音声特徴量及び生体特徴量により前記利用者を認証するとともに音声内容を認識し、その認証結果及び音声認識結果を前記電子機器に応答する認証・認識装置であり、前記制御手段は、前記電子機器の外部機器用接続部を介して前記認証・認識装置から送信された認証結果及び音声認識結果を受信すると、この認証結果及び音声認識結果を前記電子機器の表示部に表示させることにより、電子機器のユーザーは、応答結果を見ながら音声入力を行うことができ、その応答レスポンスも高速化でき、使い勝手の良い音声入力環境を提供できる。
【００１９】
また、請求項５に記載する発明のように、請求項１〜４の何れか一項に記載の音声入力装置において、無線通信機能を備えた被制御機器との間で通信手順を実行する無線通信手段と、前記特徴量抽出手段により抽出された音声特徴量に基づいて入力された音声内容を認識して音声認識情報を出力する音声認識手段と、を備え、前記音声入力手段は、前記利用者の被制御機器に対する指示音声を入力し、前記特徴量抽出手段は、前記音声入力手段により入力された指示音声から指示音声特徴量を抽出し、前記音声認識手段は、前記特徴量抽出手段により抽出された指示音声特徴量に基づいて入力された指示音声内容を認識して指示情報を出力し、前記変換手段は、前記音声認識手段から出力された指示情報を前記被制御機器の無線通信機能に対応する情報形態に変換して送信情報として出力し、前記制御手段は、前記無線通信手段を制御して、前記変換手段から出力された送信情報を前記被制御機器に送信させるようにしてもよい。
【００２０】
請求項５記載の発明によれば、無線通信機能を備えた被制御機器との間で通信手順を実行する無線通信手段と、前記特徴量抽出手段により抽出された音声特徴量に基づいて入力された音声内容を認識して音声認識情報を出力する音声認識手段と、を備え、前記音声入力手段は、前記利用者の被制御機器に対する指示音声を入力し、前記特徴量抽出手段は、前記音声入力手段により入力された指示音声から指示音声特徴量を抽出し、前記音声認識手段は、前記特徴量抽出手段により抽出された指示音声特徴量に基づいて入力された指示音声内容を認識して指示情報を出力し、前記変換手段は、前記音声認識手段から出力された指示情報を前記被制御機器の無線通信機能に対応する情報形態に変換して送信情報として出力し、前記制御手段は、前記無線通信手段を制御して、前記変換手段から出力された送信情報を前記被制御機器に送信させることにより、音声入力装置の利用形態を拡大でき、その利便性を向上できる。
【００２１】
【発明の実施の形態】
以下、図を参照して本発明の実施の形態を詳細に説明する。
図１〜図５は、本発明を適用した音声認識システムの一実施の形態を示す図である。
まず、構成を説明する。
図１は、本実施の形態における音声認識システム１００全体の概略構成を示す図である。この音声認識システム１００は、アプリケーションサーバ１０、認証・認識サーバ２０、無線基地局３０、携帯電話機４０、音声入力ユニット５０及び被制御機器６０により構成され、認証・認識サーバ２０、無線基地局３０及び被制御機器６０は、通信ネットワークＮに接続されている。
【００２２】
通信ネットワークＮは、公衆電話回線網、ＩＳＤＮ網、インターネット、ＬＡＮ（Ｌｏｃａｌ　Ａｒｅａ　Ｎｅｔｗｏｒｋ）又はＷＡＮ（Ｗｉｄｅ　Ａｒｅａ　Ｎｅｔｗｏｒｋ）等を含んで構成される。音声認識システム１００では、無線基地局３０は公衆電話回線網に接続され、認証・認識サーバ２０はＬＡＮ又はＷＡＮに接続されるものとする。
【００２３】
また、通信ネットワークＮは、ＬＡＮ又はＷＡＮと、公衆電話回線網、ＩＳＤＮ網及びインターネットとの間に、セキュリティ機能を搭載したネットワーク・サーバ等が接続されており、認証・認識サーバ２０に対するハッキング行為や違法メールなどが送信されないように構成されているものとする。
【００２４】
アプリケーションサーバ１０は、予め登録された携帯電話機４０のユーザーに対して、各種アプリケーションサービスを提供するためのサーバであり、携帯電話機４０から送信されるアプリケーションサービスの要求内容に応じて、携帯電話機４０で実行可能なアプリケーションプログラムを、認証・認識サーバ２０、通信ネットワークＮ及び無線基地局３０を介して携帯電話機４０に送信する。
【００２５】
認証・認識サーバ２０は、携帯電話機４０から送信される圧縮データからユーザーの音声特徴量及び生体特徴量を抽出し、その音声特徴量及び生体特徴量により当該ユーザーを認証するとともに、その音声特徴量により音声内容を認識し、その認証結果及び音声認識結果を応答情報として通信ネットワークＮ及び無線基地局３０を介して携帯電話機４０に送信する。
【００２６】
また、認証・認識サーバ２０は、携帯電話機４０のユーザーを認証した後、携帯電話機４０から送信されるアプリケーションに関する要求内容を含む音声特徴量を認識し、その音声認識結果に基づくアプリケーション要求指示をアプリケーションサーバ１０に送信し、そのアプリケーション要求指示に応じてアプリケーションサーバ１０から応答送信されるアプリケーションプログラムを通信ネットワークＮ及び無線基地局３０を介して携帯電話機４０に送信する。
なお、認証・認識サーバ２０は、請求項に記載の外部の音声認識装置に相当する。
【００２７】
無線基地局３０は、自己の設置場所から通信可能範囲に存在する携帯電話機４０から音声通話要求先の電話番号を受信すると、ネットワークＮを介して接続された電話交換機（図示せず）に送信して、携帯電話機４０と音声通話要求先の携帯電話機あるいは宅内固定電話機との間で音声通話処理に必要な通信プロトコルを実行する。
【００２８】
また、無線基地局３０は、自己の設置場所から通信可能範囲に存在する携帯電話機４０からアプリケーションサービス要求を受信すると、ネットワークＮを介して携帯電話機４０の識別情報を認証・認識サーバ２０に送信して、携帯電話機４０と認証・認識サーバ２０及びアプリケーションサーバ１０との間でアプリケーションサービスに必要な通信プロトコルを実行する。
【００２９】
携帯電話機４０は、通常の音声通話機能と、アプリケーションサーバ１０がネットワークＮを介してインターネット上で開設するアプリケーションサービス用ホームページにアクセスするインターネットアクセス機能とを備える。
【００３０】
また、携帯電話機４０は、図２に示すように、外部機器と接続する外部機器接続部４０１、テンキーや各種機能キーを含むキー入力部４０２及びＬＣＤにより構成された表示部４０３を備える。本実施の形態では、外部機器接続部４０１に音声入力ユニット５０が接続される。なお、携帯電話機４０の通信方式は、ＰＤＣ方式、ＣＤＭＡ方式、ＧＳＭ方式あるいはＰＨＳ方式などで利用される従来の携帯電話機である。なお、携帯電話機４０は、請求項に記載の電子機器に相当する。
【００３１】
音声入力ユニット５０は、図２に示すように、携帯電話機４０の外部機器接続部４０１に接続可能な携帯電話接続部５０９を備えており、携帯電話機４０に外付けすることにより、主に入力音声の音声認識機能を追加するものである。また、音声入力ユニット５０は、携帯電話機４０に接続されない場合、指示音声を指示信号として外部機器通信部５０８により被制御機器６０に無線送信して、被制御機器６０をリモート操作するリモコンユニットとしての機能も有する。
【００３２】
音声入力ユニット５０は、その内部の機能的構成を図３に示すように、音声入力部５０１、生体センサ部５０２、Ａ／Ｄ変換部５０３、特徴量抽出部５０４、音声認識部５０５、データ圧縮部５０６、携帯電話制御部５０７、外部機器通信部５０８及び携帯電話接続部５０９により構成される。
【００３３】
音声入力部５０１は、収音指向性が狭く、音声認識に必要な収音周波数帯域を持つマイクを有し、ユーザーが発声する音声を収音し、アナログ音声信号としてＡ／Ｄ変換部５０３に出力する。なお、音声入力部５０１は、請求項に記載の音声入力手段としての機能を有する。
【００３４】
生体センサ部５０２は、ユーザーの指紋、顔、又はアイリス等の生体特徴を検出する生体センサを有し、ユーザーの指紋、顔、又はアイリス等の生体特徴を検出し、その生体特徴量をデータ圧縮部５０６に出力する。なお、生体センサ部５０２は、請求項に記載の生体特徴検出手段としての機能を有する。
【００３５】
Ａ／Ｄ変換部５０３は、音声入力部５０１から入力されたアナログ音声信号を所定のサンプリング周波数でサンプリングし、そのサンプリングしたアナログ音声信号を所定の量子化数でデジタル変換してデジタル音声データとして特徴量抽出部５０４に出力する。
【００３６】
特徴量抽出部５０４は、Ａ／Ｄ変換部５０３から入力されたデジタル音声データから音声認識に必要な音声特徴量を抽出し、その音声特徴量を音声認識部５０５及びデータ圧縮部５０６に出力する。なお、特徴量抽出部５０４は、請求項に記載の特徴量抽出手段としての機能を有する。
【００３７】
音声認識部５０５は、音声入力ユニット５０が携帯電話機４０に接続されていない場合に動作し、特徴量抽出部５０４から入力された指示音声特徴量から指示音声内容を認識し、その指示情報をデータ圧縮部５０６に出力する。なお、音声認識部５０５は、請求項に記載の音声認識手段としての機能を有する。
【００３８】
データ圧縮部５０６は、特徴量抽出部５０４から入力された音声特徴量と、生体センサ部５０２から入力された生体特徴量とを携帯電話機４０のデータ通信方式に適したデータ圧縮方式で圧縮し、また、音声認識部５０５から入力された指示情報を外部機器通信部５０８の通信方式に適したデータ圧縮方式で圧縮し、各圧縮データ（送信情報）を携帯電話制御部５０７に出力する。なお、データ圧縮部５０６は、請求項に記載の変換手段としての機能を有する。
【００３９】
携帯電話制御部５０７は、音声入力ユニット５０が携帯電話機４０に接続されていた場合、データ圧縮部５０６から入力された圧縮データを携帯電話接続部５０９を介して携帯電話機４０のデータ通信機能を制御して、認証・認識サーバ２０に送信させる。そして、携帯電話制御部５０７は、送信した圧縮データに対する認証・認識サーバ２０の応答信号（認証結果、音声認識結果、コンテンツ等）を携帯電話機４０から携帯電話接続部５０９を介して受信すると、携帯電話機４０の表示部４０３を制御して受信内容を表示させる。
【００４０】
また、携帯電話制御部５０７は、音声入力ユニット５０が携帯電話機４０に接続されていない場合、データ圧縮部５０６から入力された圧縮データを被制御機器６０を制御する制御信号として外部機器通信部５０８に出力し、外部機器通信部５０８を制御して制御信号を被制御機器６０に送信させる。なお、携帯電話制御部５０７は、請求項に記載の制御手段としての機能を有する。
【００４１】
外部機器通信部５０８は、ＩｒＤＡ（Ｉｎｆｒａｒｅｄ　Ｄａｔａ　Ａｓｓｏｃｉａｔｉｏｎ）、無線ＬＡＮ、又はブルートゥース（Ｂｌｕｅｔｏｏｔｈ）等の被制御機器６０に対応した無線通信機能を有し、携帯電話制御部５０７から入力された制御信号を被制御機器６０に無線送信する。なお、外部機器通信部５０８は、請求項に記載の無線通信手段としての機能を有する。電池部５１０は、充電可能な蓄電池と、音声入力ユニット５０内の各部に必要な駆動電圧を生成する電圧生成回路とを内蔵し、その生成した駆動電圧を音声入力ユニット５０内の各部に供給する。
【００４２】
被制御機器６０は、ＩｒＤＡ（Ｉｎｆｒａｒｅｄ　Ｄａｔａ　Ａｓｓｏｃｉａｔｉｏｎ）、無線ＬＡＮ、又はブルートゥース（Ｂｌｕｅｔｏｏｔｈ）等の無線通信機能と、通信ネットワークＮに接続する有線通信機能とを有する家電製品等であり、音声入力ユニット５０から無線送信される圧縮データを受信して、その圧縮データに含まれる指示情報に応じた動作を行うとともに、通信ネットワークＮを介して認証・認識サーバ２０から送信される指示信号を受信して、指示信号に応じた動作を行う。
【００４３】
次に、本実施の形態の動作を説明する。
まず、音声認識システム１００を携帯電話機４０のユーザーが利用する際の前提としてユーザー認証のための音声特徴量と生体特徴量の登録が必要であり、音声入力ユニット５０を携帯電話機４０に接続し、ユーザーの音声特徴量情報と生体特徴量情報を認証・認識サーバ２０に送信して登録しておくものとする。
【００４４】
音声入力ユニット５０が携帯電話機４０に接続された場合の音声認識システム１００の動作について、図４に示すフローチャートを参照して説明する。
音声入力ユニット５０が携帯電話機４０に接続されている場合、ユーザーの音声（例えば、名前等）が音声入力ユニット５０の音声入力部５０１に向かって発声されると（ステップＳ１０１）、音声入力部５０１により収音されてアナログ音声信号としてＡ／Ｄ変換部５０３に出力される（ステップＳ１０２）。また、この時、生体センサ部５０２では、ユーザーの指紋、顔、又はアイリス等の生体特徴量が検出され（ステップＳ２０１）、その検出された生体特徴量がデータ圧縮部５０６に出力される（ステップＳ２０２）。
【００４５】
次いで、Ａ／Ｄ変換部５０３では、音声入力部５０１から入力されたアナログ音声信号がサンプリングされ、デジタル音声データに変換されて特徴量抽出部５０４に出力される（ステップＳ１０３）。特徴量抽出部５０４では、Ａ／Ｄ変換部５０３から入力されたデジタル音声データから音声認識に必要な音声特徴量が抽出され、その音声特徴量がデータ圧縮部５０６に出力される（ステップＳ１０４）。
【００４６】
データ圧縮部５０６では、特徴量抽出部５０４から入力された音声特徴量及び生体センサ部５０２から入力された生体特徴量が携帯電話機４０のデータ通信方式に適したデータ圧縮方式で圧縮され、その圧縮データが携帯電話制御部５０７に出力される（ステップＳ１０５）。携帯電話制御部５０７では、データ圧縮部５０６から圧縮データが入力されると、携帯電話接続部５０９を介して携帯電話機４０のデータ通信機能が制御されて、その圧縮データが無線基地局３０及び通信ネットワークＮを介して認証・認識サーバ２０に送信される（ステップＳ１０６）。
【００４７】
認証・認識サーバ２０では、携帯電話機４０から受信した圧縮データが伸長されて音声特徴量と生体特徴量が抽出され、この音声特徴量及び生体特徴量が、予め登録された該当ユーザーの音声特徴量及び生体特徴量と照合されて携帯電話機４０のユーザー認証が行われるとともに、音声特徴量に基づいて音声認識が行われ、その認証結果及び音声認識結果が通信ネットワークＮ及び無線基地局３０を介して携帯電話機に送信される。
【００４８】
携帯電話制御部５０７では、認証・認識サーバ２０から送信された認証結果及び音声認識結果が携帯電話機４０により受信されると、携帯電話接続部５０９を介して携帯電話機４０の表示部４０３が制御されて受信内容が表示される（ステップＳ１０７）。
【００４９】
また、認証・認識サーバ２０では、音声認識した音声内容にアプリケーションサーバ１０へのサービス要求が含まれているか否かが判別され、アプリケーションサーバ１０へのサービス要求が含まれていると、そのサービス要求内容がアプリケーションサーバ１０に送信される。
【００５０】
アプリケーションサーバ１０では、認証・認識サーバ２０から送信されたサービス要求内容に応じて、対応するコンテンツデータやアプリケーションプログラムが認証・認識サーバ２０に送信される。認証・認識サーバ２０では、アプリケーションサーバ１０から送信されたコンテンツデータやアプリケーションプログラムが通信ネットワークＮ及び無線基地局３０を介して携帯電話機４０に送信される。
【００５１】
携帯電話制御部５０７では、認証・認識サーバ２０から送信されたコンテンツデータやアプリケーションプログラムが携帯電話機４０により受信されると、携帯電話接続部５０９を介して携帯電話機４０のアプリケーション実行部（図示せず）及び表示部４０３が制御されて、コンテンツや実行中のアプリケーション内容が表示される（ステップＳ１０７）。
【００５２】
以後、続いて音声入力部５０１に入力された音声の認識は、音声入力ユニット５０、携帯電話機４０及び認証・認識サーバ２０において実行される上記ステップＳ１０１〜ステップＳ１０７の処理手順により繰り返し行われ、その音声認識結果に基づく応答内容が携帯電話機４０の表示部４０３に表示される。
【００５３】
また、音声入力部５０１から入力された音声内容が被制御機器６０に対する指示音声であった場合、認証・認識サーバ２０は、その指示音声内容を認識し、対応する制御信号を通信ネットワークＮを介して被制御機器６０に制御信号を送信する。
【００５４】
したがって、携帯電話機４０に音声入力ユニット５０を接続することにより、ユーザーは、携帯電話機４０の表示部４０３を自分の正面に向けた状態で、音声入力しながら音声認識結果を表示部４０３で確認することができ、携帯電話機４０を用いて音声入力に対応するアプリケーションサービスを確実かつ容易に利用することができる。
【００５５】
次に、音声入力ユニット５０が携帯電話機４０に接続されていない場合の音声認識システム１００の動作について、図５に示すフローチャートを参照して説明する。
音声入力ユニット５０が携帯電話機４０に接続されていない場合、ユーザーの指示音声（例えば、電源オン等）が音声入力ユニット５０の音声入力部５０１に向かって発声されると（ステップＳ３０１）、音声入力部５０１により収音されてアナログ指示音声信号としてＡ／Ｄ変換部５０３に出力される（ステップＳ３０２）。
【００５６】
次いで、Ａ／Ｄ変換部５０３では、音声入力部５０１から入力されたアナログ指示音声信号がサンプリングされ、デジタル指示音声データに変換されて特徴量抽出部５０４に出力される（ステップＳ３０３）。特徴量抽出部５０４では、Ａ／Ｄ変換部５０３から入力されたデジタル指示音声データから音声認識に必要な指示音声特徴量が抽出され、その指示音声特徴量が音声認識部５０５に出力される（ステップＳ３０４）。
【００５７】
音声認識部５０５では、特徴量抽出部５０４から入力された音声特徴量から指示音声内容が認識され、その指示情報がデータ圧縮部５０６に出力される（ステップＳ３０５）。データ圧縮部５０６では、音声認識部５０５から入力された指示情報が外部機器通信部５０８のデータ通信方式に適したデータ圧縮方式で圧縮されその圧縮データが携帯電話制御部５０７に出力される（ステップＳ３０６）。
携帯電話制御部５０７では、外部機器通信部５０８が制御されてデータ圧縮部５０６から入力された圧縮データが被制御機器６０に送信される（ステップＳ３０７）。
【００５８】
以後、続いて音声入力部５０１に入力された指示音声の認識は、音声入力ユニット５０内において実行される上記ステップＳ３０１〜ステップＳ３０７の処理手順により繰り返し行われ、その音声認識結果に基づく指示情報が被制御機器６０に送信されて、被制御機器６０がリモート操作される。
【００５９】
以上のように、本実施の形態の音声認識システム１００では、従来の携帯電話機４０の外部機器接続部４０１に音声入力ユニット５０を接続し、音声入力ユニット５０が、ユーザーの音声を収音して音声認識に必要な音声特徴量を抽出し、その音声特徴量を携帯電話機４０のデータ通信方式に適した圧縮データに変換し、携帯電話機４０のデータ通信機能を制御して認証・認識サーバ２０に圧縮データを送信させ、認証・認識サーバ２０から送信される音声認識結果を携帯電話機４０の表示部４０３に表示させるようにした。
【００６０】
したがって、従来は無線通信端末として最も普及している現行の携帯電話機に対して音声認識機能を実装することができなかったが、本実施の形態の音声入力ユニット５０を現行の携帯電話機に接続することにより、現行の携帯電話機にも音声認識機能を付加することができ、従来の携帯電話機ユーザーに対して音声認識サービスを提供できる。
【００６１】
このため、既存の携帯電話機ユーザーに対して、ＤＳＲ方式の音声認識機能を普及させることが容易になり、携帯電話機ユーザーにとって最も簡便な入力環境の提供と、サービス提供者にとっても優位な音声認識サービスの普及を促進することができる。
【００６２】
また、従来のＤＳＲ方式の音声認識サービスでは、センター設備が音声認識した後、応答を音声で携帯電話機に返すようにしていたが、本実施の形態の音声認識システム１００では、入力音声を携帯電話機４０のデータ通信方式に適した圧縮データに変換して認証・認識サーバ２０に送信し、認証・認識サーバ２０から送信される音声認識結果を携帯電話機４０の表示部４０３に表示させるようにしたため、携帯電話機ユーザーは、音声認識結果及び応答内容を見ながら音声入力を継続することができ、その応答レスポンスも高速化でき、使い勝手の良い音声入力環境を提供できる。その結果、無駄な音声データの授受を無くして、通信ネットワークの利用効率を向上させるとともに通信コストも抑えることができる。
【００６３】
また、本実施の形態の音声認識システム１００では、現行の携帯電話機の通信方式のみを利用して音声認識サービス及びアプリケーションサービスを提供できるため、従来のように、インターネットから提供されるサービスと音声認識サービスとを利用する場合、データと音声を別々のネットワーク（インターネットと電話回線）で授受する必要が無くなり、通信コストを更に抑えることができる。
【００６４】
更に、本実施の形態の音声入力ユニット５０では、指向性の良いマイクなどを搭載した音声入力部５０１としたため、従来の携帯電話機に内蔵されたマイクを利用する場合に比べて、ユーザーが発声する音声の周波数帯域を十分カバーして確実に収音することができる。
【００６５】
また、本実施の形態の音声入力ユニット５０では、音声認識部５０５と被制御機器６０に搭載された無線通信機能に対応した外部機器通信部５０８とを有するため、音声入力ユニット５０が携帯電話機４０に接続されず単体の場合は、指示音声を入力して被制御機器６０をリモート操作することもでき、音声入力ユニット５０の利用形態を拡大でき、利便性を向上できる。
【００６６】
また、本実施の形態の音声入力ユニット５０では、ユーザーの指紋、顔又はアイリス等の生体特徴量を検出する生体センサ部５０２を有し、検出された生体特徴量も認証・認識サーバ２０に送信し、音声特徴量及び生体特徴量をユーザー認証時に利用するようにしたため、正規登録ユーザーか否かの認証を確実に行うことができ、音声特徴と生体特徴とを組み合わせたユーザー認証サービスを容易に提供できる。その結果、音声認識サービス及びアプリケーションサービスの不正使用を防止できる。
【００６７】
なお、上記実施の形態では、音声入力ユニット５０を携帯電話機４０に接続するものとして説明したが、例えば、無線通信機能を備えた携帯型情報端末の外部機器接続部の仕様に合わせるようにしても良い。
【００６８】
【発明の効果】
請求項１、２記載の発明によれば、既存の無線通信機能及び外部機器用接続部を備えた電子機器に音声認識機能を付加することができ、既存の電子機器のユーザーに対して音声入力環境を容易に提供できる。
【００６９】
請求項３記載の発明によれば、音声認識に必要な音声特徴量のみを効率よく送信でき、無線通信資源を有効利用して音声認識にかかる通信コストを低減できるとともに、音声特徴と生体特徴とを組み合わせたユーザー認証サービスも容易に提供できる。
【００７０】
請求項４記載の発明によれば、電子機器のユーザーは、応答結果を見ながら音声入力を行うことができ、その応答レスポンスも高速化でき、使い勝手の良い音声入力環境を提供できる。
【００７１】
請求項５記載の発明によれば、音声入力装置の利用形態を拡大でき、その利便性を向上できる。
【図面の簡単な説明】
【図１】本発明を適用した一実施の形態における音声認識システム１００の全体構成を示す図である。
【図２】本実施の形態における携帯電話機４０と音声入力ユニット５０の外観を示す図である。
【図３】図２の音声入力ユニット５０内部の機能的構成を示すブロック図である。
【図４】図２の音声入力ユニット５０が携帯電話機４０に接続された場合に、音声入力ユニット５０により実行される動作内容を示すフローチャートである。
【図５】図２の音声入力ユニット５０が携帯電話機４０に接続されない場合に、音声入力ユニット５０により実行される動作内容を示すフローチャートである。
【符号の説明】
１０　　アプリケーションサーバ
２０　　認証・認識サーバ
３０　　無線基地局
４０　　携帯電話機
５０　　音声入力ユニット
６０　　被制御機器
１００　音声認識システム
４０１　外部機器接続部
４０２　キー入力部
４０３　表示部
５０１　音声入力部
５０２　生体センサ部
５０３　Ａ／Ｄ変換部
５０４　特徴量抽出部
５０５　音声認識部
５０６　データ圧縮部
５０７　携帯電話制御部
５０８　外部機器通信部
５０９　携帯電話接続部
[0001]
[Technical field of the invention]
The present invention relates to a voice input device used by being externally connected to an electronic device having a wireless communication function.
[0002]
[Prior art]
Conventionally, when a wireless communication terminal such as a mobile phone is used as a voice input device and the voice recognition function is placed in a center facility that performs the voice recognition and authentication processing, voice quality deteriorates in the communication processing that carries the voice from the wireless communication terminal to the center facility, and it has been difficult to raise the voice recognition rate and the authentication rate.
[0003]
For example, when a mobile phone is used as the voice input device, voice quality is degraded by the limited frequency bandwidth of the telephone line. The degradation is marked compared with, for example, voice input from a microphone connected to a personal computer on which voice recognition software is installed, and it is one cause of a lower voice recognition rate.
[0004]
The DSR (Distributed Speech Recognition) method has been proposed as a way to prevent such communication-induced degradation of sound quality. The DSR method separates voice recognition processing into an acoustic processing part at the front stage and a voice recognition processing part at the rear stage. The voice input device handles the acoustic processing part and the center facility handles the voice recognition processing part, so that the input voice is processed into data that is not degraded by communication processing before it is transmitted to the center facility.
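The terminal-side half of this split can be sketched as follows. This is only an illustration: the patent does not say which acoustic features are computed, so the log energy and zero-crossing count used here are placeholder features, and the function names, the frame length, the hop size, and the 8 kHz test tone are all assumptions.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a sample list into overlapping frames (an incomplete tail is dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def frame_features(frame):
    """Return (log energy, zero-crossing count) for one frame."""
    energy = sum(s * s for s in frame)
    log_energy = math.log(energy + 1e-10)  # small offset avoids log(0) on silence
    zero_crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return (log_energy, zero_crossings)

def front_end(samples, frame_len=200, hop=80):
    """Terminal-side stage: turn raw samples into a compact feature sequence."""
    return [frame_features(f) for f in frame_signal(samples, frame_len, hop)]

# One second of a 440 Hz tone sampled at 8 kHz (illustrative input only).
rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
features = front_end(tone)
```

Only the short `features` list would be sent to the center facility, which then runs the recognition stage on it. A practical front end would presumably use richer features than these two values.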
[0005]
Two ways of implementing the acoustic processing function for the DSR method in a wireless communication terminal have been realized: 1) implementing it as software on a personal computer or a portable information terminal, and 2) implementing it as an LSI (Large Scale Integrated Circuit) mounted in the wireless communication terminal.
[0006]
[Problems to be solved by the invention]
However, the conventional ways of implementing the DSR acoustic processing function in a wireless communication terminal cannot be applied to current mobile phones, which are the most widely used wireless communication terminals, and one must wait for the arrival of new mobile phones that incorporate a DSR-compatible LSI.
[0007]
In this case, it becomes difficult to spread DSR-based voice recognition among existing mobile phone users. This delays both the provision of the simplest input environment for mobile phone users and the spread of voice recognition services that are also advantageous to service providers.
[0008]
Also, in the conventional DSR-based voice recognition service, the center facility returns its response to the mobile phone as voice after performing voice recognition. The amount of data returned is therefore large relative to the actual amount of information it carries, so much voice data is exchanged to little purpose, which lowers the utilization efficiency of the telephone line and raises communication costs.
[0009]
In addition, mobile phones with an Internet connection function have recently become widespread. When a user uses both a service provided over the Internet and a voice recognition service, however, data and voice must be exchanged over separate networks (the Internet and the telephone line), which raises communication costs.
[0010]
Furthermore, even when a DSR-compatible LSI is built into a mobile phone, the microphone built into the phone must be used. Because that microphone was originally installed for telephone lines with a limited frequency band, its sound-collecting frequency band is insufficient for voice recognition and its performance varies widely between individual units, so it is difficult to collect input voice in the frequency band needed for voice recognition.
[0011]
An object of the present invention is to provide a voice input device that is externally attached to an electronic device having a wireless communication function, and that has a function for extracting the features of input voice needed for voice recognition and a communication control function for using the communication line efficiently.
[0012]
[Means for Solving the Problems]
To solve the above problems, the invention according to claim 1 is
a voice input device connected to the external device connection unit of an electronic device having a wireless communication function and an external device connection unit, comprising:
voice input means for inputting a user's voice;
feature amount extraction means for extracting a voice feature amount from the voice input by the voice input means;
conversion means for converting the voice feature amount extracted by the feature amount extraction means into an information form corresponding to the wireless communication function of the electronic device and outputting it as transmission information; and
control means for controlling the wireless communication function of the electronic device via the external device connection unit of the electronic device and causing the electronic device to transmit the transmission information output from the conversion means to an external voice recognition device.
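The conversion and control means of claim 1 can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the claim does not fix an "information form", so packing the features as little-endian float32 values and compressing them with zlib is one possible choice, and `FakePhone`, `send_data`, and the destination name are hypothetical stand-ins for the phone reached through the external device connection unit.

```python
import struct
import zlib

def convert_features(features):
    """Conversion means (sketch): pack feature vectors as float32 and compress them."""
    dim = len(features[0]) if features else 0
    header = struct.pack("<HH", len(features), dim)  # vector count, vector size
    body = b"".join(struct.pack(f"<{dim}f", *vec) for vec in features)
    return zlib.compress(header + body)

def restore_features(payload):
    """Inverse of convert_features, as a recognition server might apply it."""
    raw = zlib.decompress(payload)
    count, dim = struct.unpack_from("<HH", raw, 0)
    return [list(struct.unpack_from(f"<{dim}f", raw, 4 + i * dim * 4))
            for i in range(count)]

class FakePhone:
    """Hypothetical stand-in for the phone's data link; it only records what is sent."""
    def __init__(self):
        self.sent = []

    def send_data(self, destination, payload):
        self.sent.append((destination, payload))

def control_send(phone, payload, destination="recognition-server"):
    """Control means (sketch): instruct the phone to transmit the payload."""
    phone.send_data(destination, payload)

# Example feature vectors (values chosen to be exactly representable as float32).
features = [[0.5, -1.25, 2.0], [0.0, 3.5, -0.75]]
phone = FakePhone()
control_send(phone, convert_features(features))
```

The point of the separation is that the feature extraction stage never needs to know how the phone transmits data: only `convert_features` depends on the data format, and only `control_send` depends on the phone interface.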
[0013]
According to the invention of claim 1,
in a voice input device connected to the external device connection unit of an electronic device having a wireless communication function and an external device connection unit, the feature amount extraction means extracts a voice feature amount from the user's voice input by the voice input means; the conversion means converts the extracted voice feature amount into an information form corresponding to the wireless communication function of the electronic device and outputs it as transmission information; and the control means controls the wireless communication function of the electronic device via the external device connection unit of the electronic device and causes the electronic device to transmit the transmission information output from the conversion means to an external voice recognition device.
[0014]
Therefore, a voice recognition function can be added to an existing electronic device that has a wireless communication function and an external device connection unit, and a voice input environment can easily be provided to users of existing electronic devices. As described in claim 2, the voice input device of the present invention can also be formed integrally with an electronic device having a wireless communication function and an external device connection unit.
[0015]
As in the invention described in claim 3, the voice input device according to claim 1 or 2 may further comprise biometric feature detection means for detecting a biometric feature amount of the user, and the conversion means may convert the biometric feature amount detected by the biometric feature detection means and the extracted voice feature amount into an information form corresponding to the wireless communication function of the electronic device and output them as transmission information.
[0016]
According to the invention of claim 3,
the device comprises biometric feature detection means for detecting a biometric feature of the user, and the conversion means converts the biometric feature amount detected by the biometric feature detection means and the extracted voice feature amount into an information form corresponding to the wireless communication function of the electronic device and outputs them as transmission information. As a result, only the voice feature amounts needed for voice recognition are transmitted efficiently, wireless communication resources are used effectively so that the communication cost of voice recognition is reduced, and a user authentication service that combines voice features and biometric features can easily be provided.
[0017]
As in the invention described in claim 4, in the voice input device according to any of claims 1 to 3, the electronic device may include a display unit that displays the content of wireless communication with the external voice recognition device; the voice recognition device may have an authentication and recognition function that authenticates the user based on the voice feature amount and biometric feature amount included in the transmission information, recognizes the voice content, and returns the authentication result and the voice recognition result to the electronic device; and the control means, on receiving the authentication result and the voice recognition result transmitted from the voice recognition device via the external device connection unit of the electronic device, may cause them to be displayed on the display unit of the electronic device.
[0018]
According to the invention of claim 4,
the electronic device includes a display unit that displays the content of wireless communication with the external voice recognition device; the external voice recognition device is an authentication / recognition device that authenticates the user based on the voice feature amount and biometric feature amount included in the transmission information, recognizes the voice content, and returns the authentication result and the voice recognition result to the electronic device; and the control means, on receiving the authentication result and the voice recognition result transmitted from the authentication / recognition device via the external device connection unit of the electronic device, causes them to be displayed on the display unit of the electronic device. As a result, the user of the electronic device can perform voice input while viewing the response, the response is returned more quickly, and a user-friendly voice input environment can be provided.
[0019]
As in the invention described in claim 5, the voice input device according to any one of claims 1 to 4 may further comprise wireless communication means for executing a communication procedure with a controlled device having a wireless communication function, and voice recognition means for recognizing the input voice content based on the voice feature amount extracted by the feature amount extraction means and outputting voice recognition information. In this case, the voice input means inputs the user's instruction voice for the controlled device; the feature amount extraction means extracts an instruction voice feature amount from the instruction voice input by the voice input means; the voice recognition means recognizes the content of the instruction voice based on the instruction voice feature amount extracted by the feature amount extraction means and outputs instruction information; the conversion means converts the instruction information output from the voice recognition means into an information form corresponding to the wireless communication function of the controlled device and outputs it as transmission information; and the control means controls the wireless communication means so that the transmission information output from the conversion means is transmitted to the controlled device.
[0020]
According to the invention of claim 5, the device comprises wireless communication means for executing a communication procedure with a controlled device having a wireless communication function, and voice recognition means for recognizing the input voice content based on the voice feature amount extracted by the feature amount extraction means and outputting voice recognition information. The voice input means inputs the user's instruction voice for the controlled device; the feature amount extraction means extracts an instruction voice feature amount from the instruction voice input by the voice input means; the voice recognition means recognizes the content of the instruction voice based on the instruction voice feature amount extracted by the feature amount extraction means and outputs instruction information; the conversion means converts the instruction information output from the voice recognition means into an information form corresponding to the wireless communication function of the controlled device and outputs it as transmission information; and the control means controls the wireless communication means so that the transmission information output from the conversion means is transmitted to the controlled device. This broadens the ways in which the voice input device can be used and improves its convenience.
[0021]
[Embodiments of the invention]
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
FIGS. 1 to 5 are diagrams showing an embodiment of a voice recognition system to which the present invention is applied.
First, the configuration will be described.
FIG. 1 is a diagram showing the schematic configuration of the entire voice recognition system 100 according to the present embodiment. The voice recognition system 100 comprises an application server 10, an authentication / recognition server 20, a wireless base station 30, a mobile phone 40, a voice input unit 50, and a controlled device 60. The authentication / recognition server 20, the wireless base station 30, and the controlled device 60 are connected to the communication network N.
[0022]
The communication network N includes a public telephone line network, an ISDN network, the Internet, a LAN (Local Area Network), a WAN (Wide Area Network), and the like. In the voice recognition system 100, the wireless base station 30 is connected to a public telephone network, and the authentication / recognition server 20 is connected to a LAN or WAN.
[0023]
In the communication network N, a network server or the like equipped with a security function is connected between the LAN or WAN on one side and the public telephone line network, the ISDN network, and the Internet on the other, so that hacking attempts against the authentication / recognition server 20, illegal e-mail, and the like are not passed through.
[0024]
The application server 10 is a server that provides various application services to pre-registered users of the mobile phone 40. In accordance with the content of an application service request transmitted from the mobile phone 40, it transmits an application program executable on the mobile phone 40 to the mobile phone 40 via the authentication / recognition server 20, the communication network N, and the wireless base station 30.
[0025]
The authentication / recognition server 20 extracts the user's voice feature amount and biometric feature amount from the compressed data transmitted from the mobile phone 40, authenticates the user based on the voice feature amount and the biometric feature amount, and recognizes the voice content from the voice feature amount. It then transmits the authentication result and the voice recognition result as response information to the mobile phone 40 via the communication network N and the wireless base station 30.
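The server-side sequence (authenticate on both feature types, then recognize, then respond) can be sketched as a toy example. The patent does not describe the matching algorithms, so the Euclidean-distance threshold check and the nearest-template lookup below are simple stand-ins, and all names, the threshold value, and the sample data are assumptions.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def authenticate(enrolled, user_id, voice, biometric, threshold=1.0):
    """Accept only if both the voice and the biometric features match enrollment."""
    record = enrolled.get(user_id)
    if record is None:
        return False
    return (distance(record["voice"], voice) <= threshold
            and distance(record["biometric"], biometric) <= threshold)

def recognize(vocabulary, voice):
    """Return the vocabulary entry whose reference features are closest."""
    return min(vocabulary, key=lambda word: distance(vocabulary[word], voice))

def handle_request(enrolled, vocabulary, user_id, voice, biometric):
    """Build the response information: authentication result plus recognition result."""
    ok = authenticate(enrolled, user_id, voice, biometric)
    return {"authenticated": ok,
            "recognized": recognize(vocabulary, voice) if ok else None}

# Illustrative enrollment data and vocabulary (made-up values).
enrolled = {"user-1": {"voice": [1.0, 2.0], "biometric": [0.0, 1.0]}}
vocabulary = {"power on": [1.0, 2.0], "power off": [5.0, 5.0]}
response = handle_request(enrolled, vocabulary, "user-1", [1.1, 2.0], [0.0, 1.2])
```

In this sketch the recognition result is withheld when authentication fails, which is one possible reading of the flow; the source states only that both results are returned as response information.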
[0026]
After authenticating the user of the mobile phone 40, the authentication / recognition server 20 recognizes voice feature amounts, transmitted from the mobile phone 40, that contain a request concerning an application, and transmits an application request instruction based on the voice recognition result to the application server 10. It then transmits the application program that the application server 10 returns in response to that instruction to the mobile phone 40 via the communication network N and the wireless base station 30.
Note that the authentication / recognition server 20 corresponds to an external voice recognition device described in the claims.
[0027]
When the wireless base station 30 receives the telephone number of a voice call destination from a mobile phone 40 located within communication range of its installation site, it forwards the number to a telephone exchange (not shown) connected via the network N, and executes the communication protocol needed for voice call processing between the mobile phone 40 and the destination mobile phone or fixed home telephone.
[0028]
Further, upon receiving an application service request from the mobile phone 40 existing within the communicable range from its own location, the wireless base station 30 transmits the identification information of the mobile phone 40 to the authentication / recognition server 20 via the network N. Then, the communication protocol required for the application service is executed between the mobile phone 40 and the authentication / recognition server 20 and the application server 10.
[0029]
The mobile phone 40 has a normal voice call function and an Internet access function for accessing an application service homepage established on the Internet by the application server 10 via the network N.
[0030]
Further, as shown in FIG. 2, the mobile phone 40 includes an external device connection unit 401 for connecting to an external device, a key input unit 402 including numeric keys and various function keys, and a display unit 403 including an LCD. In the present embodiment, the voice input unit 50 is connected to the external device connection unit 401. The mobile phone 40 is a conventional mobile phone that uses a communication system such as PDC, CDMA, GSM, or PHS. Note that the mobile phone 40 corresponds to an electronic device described in the claims.
[0031]
As shown in FIG. 2, the voice input unit 50 includes a mobile phone connection unit 509 that can be connected to the external device connection unit 401 of the mobile phone 40, and adds a voice recognition function to the mobile phone 40 when connected. When the voice input unit 50 is not connected to the mobile phone 40, it wirelessly transmits an instruction voice as an instruction signal to the controlled device 60 through the external device communication unit 508, and thus also functions as a remote control unit for remotely operating the controlled device 60.
[0032]
As shown in FIG. 3, the voice input unit 50 includes a voice input unit 501, a biometric sensor unit 502, an A / D conversion unit 503, a feature amount extraction unit 504, a voice recognition unit 505, a data compression unit 506, a mobile phone control unit 507, an external device communication unit 508, and a mobile phone connection unit 509.
[0033]
The voice input unit 501 has a microphone with narrow sound collection directivity and the sound collection frequency band necessary for voice recognition, collects the voice uttered by the user, and outputs it to the A / D conversion unit 503 as an analog audio signal. Note that the voice input unit 501 has a function as a voice input unit described in the claims.
[0034]
The biometric sensor unit 502 includes a biometric sensor that detects a biometric feature such as the user's fingerprint, face, or iris, and outputs the detected biometric feature to the data compression unit 506. The biometric sensor unit 502 has a function as a biometric feature detection unit described in the claims.
[0035]
The A / D conversion unit 503 samples the analog audio signal input from the voice input unit 501 at a predetermined sampling frequency, quantizes the sampled signal with a predetermined quantization number to convert it into digital audio data, and outputs the digital audio data to the feature amount extraction unit 504.
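As an illustration only, the sampling and quantization performed by an A / D converter of this kind can be sketched as follows. The specification does not state the sampling frequency or the quantization number, so the 16 kHz rate, the 16-bit depth, and the names `sample_and_quantize` and `tone` are assumptions chosen for the example.

```python
import math

def sample_and_quantize(signal, duration_s, sampling_hz, bits):
    """Sample a continuous signal (a function of time returning values in
    [-1.0, 1.0]) and quantize each sample to a signed integer code."""
    max_code = 2 ** (bits - 1) - 1             # largest positive integer code
    n_samples = round(duration_s * sampling_hz)
    codes = []
    for n in range(n_samples):
        t = n / sampling_hz                    # sampling instant in seconds
        x = max(-1.0, min(1.0, signal(t)))     # clip to the converter input range
        codes.append(round(x * max_code))      # uniform quantization
    return codes

def tone(t):
    """Hypothetical test input: a 440 Hz sine wave standing in for speech."""
    return math.sin(2 * math.pi * 440.0 * t)

# Assumed parameters: 10 ms of signal, 16 kHz sampling, 16-bit quantization.
pcm = sample_and_quantize(tone, duration_s=0.01, sampling_hz=16000, bits=16)
print(len(pcm), min(pcm), max(pcm))
```

A higher sampling frequency widens the captured frequency band, and a larger quantization number reduces quantization error, at the cost of more data for the later stages to process.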
[0036]
The feature amount extraction unit 504 extracts the voice feature amount necessary for voice recognition from the digital audio data input from the A / D conversion unit 503, and outputs the voice feature amount to the voice recognition unit 505 and the data compression unit 506. Note that the feature amount extraction unit 504 has a function as a feature amount extraction unit described in the claims.
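The specification does not say which voice feature amount is extracted. Purely as a sketch of the idea of reducing digital audio data to a compact per-frame representation, the following computes a log energy value for each short frame. The feature type, the frame length, the hop size, and the name `frame_log_energy` are all assumptions; practical recognizers commonly use richer features such as cepstral coefficients.

```python
import math

def frame_log_energy(samples, frame_len=4, hop=2):
    """Split samples into overlapping frames and return one log-energy
    value per frame (a stand-in for a real voice feature amount)."""
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        features.append(math.log(energy + 1e-10))  # small offset avoids log(0)
    return features

# Hypothetical input: a quiet stretch followed by a louder stretch.
features = frame_log_energy([1, -1, 1, -1, 100, -100, 100, -100])
print(features)
```

The point of the sketch is the data reduction: eight samples become three feature values, which is why sending features rather than raw audio uses the communication line more efficiently.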
[0037]
The voice recognition unit 505 operates when the voice input unit 50 is not connected to the mobile phone 40, recognizes the instruction voice content from the instruction voice feature amount input from the feature amount extraction unit 504, and outputs the resulting instruction information to the data compression unit 506. Note that the voice recognition unit 505 has a function as a voice recognition unit described in the claims.
[0038]
The data compression unit 506 compresses the voice feature input from the feature amount extraction unit 504 and the biometric feature input from the biometric sensor unit 502 using a data compression method suitable for the data communication system of the mobile phone 40. It compresses the instruction information input from the voice recognition unit 505 using a data compression method suitable for the communication method of the external device communication unit 508. Each piece of compressed data (transmission information) is output to the mobile phone control unit 507. Note that the data compression unit 506 has a function as a conversion unit described in the claims.
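A minimal sketch of packing a voice feature and a biometric feature into a single piece of transmission information is shown below. The specification only requires "a data compression method suitable for the data communication system"; the use of JSON for serialization and zlib for compression, and the names `build_transmission_info` and `parse_transmission_info`, are assumptions made for illustration.

```python
import json
import zlib

def build_transmission_info(voice_features, biometric_feature):
    """Serialize both features and compress them into one byte string."""
    payload = json.dumps(
        {"voice": voice_features, "biometric": biometric_feature},
        separators=(",", ":"),          # compact encoding before compression
    ).encode("utf-8")
    return zlib.compress(payload)

def parse_transmission_info(data):
    """Receiver side: decompress and recover both features."""
    payload = json.loads(zlib.decompress(data).decode("utf-8"))
    return payload["voice"], payload["biometric"]

# Hypothetical feature values.
voice = [[0.5, 1.25], [0.75, 1.5]]
biometric = [3, 1, 4, 1, 5]
packet = build_transmission_info(voice, biometric)
restored_voice, restored_biometric = parse_transmission_info(packet)
print(len(packet), restored_voice == voice, restored_biometric == biometric)
```

The receiving side of this sketch corresponds to the decompression step that the authentication / recognition server 20 performs before extracting the two features.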
[0039]
When the voice input unit 50 is connected to the mobile phone 40, the mobile phone control unit 507 controls the data communication function of the mobile phone 40 via the mobile phone connection unit 509 and causes the compressed data input from the data compression unit 506 to be transmitted to the authentication / recognition server 20. Upon receiving, from the mobile phone 40 via the mobile phone connection unit 509, the response signal (authentication result, voice recognition result, content, etc.) returned by the authentication / recognition server 20 for the transmitted compressed data, the mobile phone control unit 507 controls the display unit 403 of the mobile phone 40 to display the received content.
[0040]
In addition, when the voice input unit 50 is not connected to the mobile phone 40, the mobile phone control unit 507 outputs the compressed data input from the data compression unit 506 to the external device communication unit 508 as a control signal for controlling the controlled device 60, and controls the external device communication unit 508 to transmit the control signal to the controlled device 60. Note that the mobile phone control unit 507 has a function as a control unit described in the claims.
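The two operating modes of the mobile phone control unit 507 amount to choosing a transmit path according to the connection state. The following is a rough sketch of that dispatch; the classes `RecordingLink` and `MobilePhoneControlUnit` and the method `dispatch` are hypothetical names, and the links only record what would be sent rather than driving real hardware.

```python
from dataclasses import dataclass, field

@dataclass
class RecordingLink:
    """Hypothetical stand-in for a transmit path; it only records data."""
    sent: list = field(default_factory=list)

    def send(self, data: bytes) -> None:
        self.sent.append(data)

@dataclass
class MobilePhoneControlUnit:
    phone_link: RecordingLink      # stands in for mobile phone connection unit 509
    external_link: RecordingLink   # stands in for external device communication unit 508

    def dispatch(self, compressed: bytes, phone_connected: bool) -> str:
        """Route compressed data according to the connection state."""
        if phone_connected:
            # Connected: use the phone's data communication function.
            self.phone_link.send(compressed)
            return "authentication / recognition server 20"
        # Standalone: send the data as a control signal over the local link.
        self.external_link.send(compressed)
        return "controlled device 60"

unit = MobilePhoneControlUnit(RecordingLink(), RecordingLink())
print(unit.dispatch(b"features", phone_connected=True))
print(unit.dispatch(b"power-on", phone_connected=False))
```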
[0041]
The external device communication unit 508 has a wireless communication function compatible with the controlled device 60, such as IrDA (Infrared Data Association), wireless LAN, or Bluetooth, and wirelessly transmits the control signal to the controlled device 60. Note that the external device communication unit 508 has a function as a wireless communication unit described in the claims. The battery unit 510 has a built-in rechargeable storage battery and a voltage generation circuit that generates the drive voltage required by each unit in the voice input unit 50, and supplies the generated drive voltage to each unit in the voice input unit 50.
[0042]
The controlled device 60 is a home appliance or the like having a wireless communication function such as IrDA (Infrared Data Association), wireless LAN, or Bluetooth, and a wired communication function for connecting to the communication network N. It receives the compressed data wirelessly transmitted from the voice input unit 50 and performs an operation according to the instruction information included in the compressed data. It also receives the instruction signal transmitted from the authentication / recognition server 20 via the communication network N and performs an operation according to that instruction signal.
[0043]
Next, the operation of the present embodiment will be described.
First, as a premise for using the voice recognition system 100, the user of the mobile phone 40 must register a voice feature and a biometric feature for user authentication. It is assumed here that the user has connected the voice input unit 50 to the mobile phone 40 and that the user's voice feature information and biometric feature information have been transmitted to, and registered in, the authentication / recognition server 20.
[0044]
The operation of the voice recognition system 100 when the voice input unit 50 is connected to the mobile phone 40 will be described with reference to the flowchart shown in FIG.
When the voice input unit 50 is connected to the mobile phone 40 and the user utters a voice (for example, a name) toward the voice input unit 501 of the voice input unit 50 (step S101), the voice is collected by the voice input unit 501 and output to the A / D conversion unit 503 as an analog audio signal (step S102). At this time, the biometric sensor unit 502 detects a biometric feature such as the user's fingerprint, face, or iris (step S201), and outputs the detected biometric feature to the data compression unit 506 (step S202).
[0045]
Next, in the A / D conversion unit 503, the analog audio signal input from the voice input unit 501 is sampled, converted into digital audio data, and output to the feature amount extraction unit 504 (step S103). The feature amount extraction unit 504 extracts the voice feature required for voice recognition from the digital audio data input from the A / D conversion unit 503, and outputs the voice feature to the data compression unit 506 (step S104).
[0046]
In the data compression unit 506, the voice feature input from the feature amount extraction unit 504 and the biometric feature input from the biometric sensor unit 502 are compressed by a data compression method suitable for the data communication system of the mobile phone 40, and the compressed data is output to the mobile phone control unit 507 (step S105). When the compressed data is input from the data compression unit 506, the mobile phone control unit 507 controls the data communication function of the mobile phone 40 via the mobile phone connection unit 509, and the compressed data is transmitted to the authentication / recognition server 20 via the wireless base station 30 and the communication network N (step S106).
[0047]
In the authentication / recognition server 20, the compressed data received from the mobile phone 40 is decompressed to extract the voice feature and the biometric feature. The server authenticates the user of the mobile phone 40 by collating these with the voice feature and biometric feature stored in advance for the corresponding user, performs voice recognition based on the voice feature, and transmits the authentication result and the voice recognition result to the mobile phone 40 via the communication network N and the wireless base station 30.
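The specification does not describe how the received features are collated with the registered ones. As one possible sketch, the following accepts a user only when both the voice feature and the biometric feature are sufficiently similar to the enrolled values. Cosine similarity, the 0.9 threshold, and the names `cosine_similarity` and `authenticate` are assumptions for illustration, not the method of the server described here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def authenticate(voice, biometric, enrolled, threshold=0.9):
    """Accept only if BOTH features match the enrolled user's features."""
    return (
        cosine_similarity(voice, enrolled["voice"]) >= threshold
        and cosine_similarity(biometric, enrolled["biometric"]) >= threshold
    )

# Hypothetical enrolled record and two login attempts.
enrolled = {"voice": [1.0, 2.0, 3.0], "biometric": [0.0, 1.0, 0.0]}
print(authenticate([1.0, 2.0, 3.0], [0.0, 1.0, 0.0], enrolled))
print(authenticate([1.0, 2.0, 3.0], [1.0, 0.0, 0.0], enrolled))
```

Requiring both checks to pass reflects the combined use of the two features: a matching voice alone is not enough in this sketch.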
[0048]
In the mobile phone control unit 507, when the authentication result and the voice recognition result transmitted from the authentication / recognition server 20 are received by the mobile phone 40, the display unit 403 of the mobile phone 40 is controlled via the mobile phone connection unit 509. The received contents are displayed (step S107).
[0049]
Further, the authentication / recognition server 20 determines whether or not the recognized voice content includes a service request to the application server 10. If it does, the content of the service request is transmitted to the application server 10.
[0050]
In the application server 10, the corresponding content data and application program are transmitted to the authentication / recognition server 20 according to the service request content transmitted from the authentication / recognition server 20. In the authentication / recognition server 20, the content data and application program transmitted from the application server 10 are transmitted to the mobile phone 40 via the communication network N and the wireless base station 30.
[0051]
When the mobile phone 40 receives the content data and the application program transmitted from the authentication / recognition server 20, the mobile phone control unit 507 controls an application execution unit (not shown) and the display unit 403 of the mobile phone 40 via the mobile phone connection unit 509 to display the content and the details of the application being executed (step S107).
[0052]
Thereafter, the recognition of the voice input to the voice input unit 501 is repeatedly performed according to the processing procedure of steps S101 to S107 executed in the voice input unit 50, the mobile phone 40, and the authentication / recognition server 20. The content of the response based on the voice recognition result is displayed on the display unit 403 of the mobile phone 40.
[0053]
When the voice content input from the voice input unit 501 is an instruction voice for the controlled device 60, the authentication / recognition server 20 recognizes the instruction voice content and transmits a corresponding control signal to the controlled device 60 via the communication network N.
[0054]
Therefore, by connecting the voice input unit 50 to the mobile phone 40, the user can input voice with the display unit 403 of the mobile phone 40 facing him or her while checking the voice recognition result on the display unit 403, and can thus reliably and easily use the application service corresponding to the voice input using the mobile phone 40.
[0055]
Next, an operation of the voice recognition system 100 when the voice input unit 50 is not connected to the mobile phone 40 will be described with reference to a flowchart shown in FIG.
When the voice input unit 50 is not connected to the mobile phone 40 and the user utters an instruction voice (for example, "power on") toward the voice input unit 501 of the voice input unit 50 (step S301), the voice is collected by the voice input unit 501 and output to the A / D conversion unit 503 as an analog instruction audio signal (step S302).
[0056]
Next, in the A / D conversion unit 503, the analog instruction audio signal input from the voice input unit 501 is sampled, converted into digital instruction audio data, and output to the feature amount extraction unit 504 (step S303). The feature amount extraction unit 504 extracts the instruction voice feature amount necessary for voice recognition from the digital instruction audio data input from the A / D conversion unit 503, and outputs the instruction voice feature amount to the voice recognition unit 505 (step S304).
[0057]
The voice recognition unit 505 recognizes the instruction voice content from the voice feature amount input from the feature amount extraction unit 504, and outputs the instruction information to the data compression unit 506 (step S305). In the data compression unit 506, the instruction information input from the voice recognition unit 505 is compressed by a data compression method suitable for the data communication method of the external device communication unit 508, and the compressed data is output to the mobile phone control unit 507 (step S306).
In the mobile phone control unit 507, the external device communication unit 508 is controlled, and the compressed data input from the data compression unit 506 is transmitted to the controlled device 60 (Step S307).
[0058]
Thereafter, recognition of the instruction voice input to the voice input unit 501 is repeated by the processing procedure of steps S301 to S307 executed in the voice input unit 50, the instruction information based on the voice recognition result is sent to the controlled device 60, and the controlled device 60 is remotely operated.
[0059]
As described above, in the voice recognition system 100 according to the present embodiment, the voice input unit 50 is connected to the external device connection unit 401 of the conventional mobile phone 40. The voice input unit 50 collects the user's voice, extracts the voice feature required for voice recognition, converts the voice feature into compressed data suitable for the data communication method of the mobile phone 40, and controls the data communication function of the mobile phone 40 to transmit the compressed data to the authentication / recognition server 20. The voice recognition result transmitted from the authentication / recognition server 20 is displayed on the display unit 403 of the mobile phone 40.
[0060]
Therefore, although a voice recognition function could not previously be mounted on current mobile phones, which are the most widely used wireless communication terminals, connecting the voice input unit 50 of the present embodiment to a current mobile phone adds a voice recognition function to that phone, and a voice recognition service can be provided to users of conventional mobile phones.
[0061]
As a result, it becomes easy to spread the DSR type voice recognition function among existing mobile phone users. Mobile phone users gain the simplest input environment, and the provision of advantageous voice recognition services by service providers can be promoted.
[0062]
Also, in the conventional DSR-based voice recognition service, after the center facility performs voice recognition, a response is returned to the mobile phone by voice. In the voice recognition system 100 of the present embodiment, however, the input voice is converted into compressed data suitable for the data communication system of the mobile phone 40 and transmitted to the authentication / recognition server 20, and the voice recognition result transmitted from the authentication / recognition server 20 is displayed on the display unit 403 of the mobile phone 40. The user of the mobile phone can therefore continue voice input while viewing the voice recognition result and the response content, the response can be speeded up, and a user-friendly voice input environment can be provided. As a result, it is possible to eliminate useless transmission and reception of voice data, improve the use efficiency of the communication network, and suppress communication costs.
[0063]
Further, in the voice recognition system 100 of the present embodiment, the voice recognition service and the application service can be provided using only the communication method of the current mobile phone. When services provided over the Internet and the voice recognition service are used together, there is therefore no need to exchange data and voice over separate networks (the Internet and a telephone line), and the communication cost can be further reduced.
[0064]
Further, in the voice input unit 50 of the present embodiment, the voice input unit 501 is provided with a microphone having good directivity. Compared with the case where the microphone built into a conventional mobile phone is used, the voice uttered by the user can therefore be reliably collected with its frequency band sufficiently covered.
[0065]
Further, since the voice input unit 50 of the present embodiment includes the voice recognition unit 505 and the external device communication unit 508 corresponding to the wireless communication function mounted on the controlled device 60, the voice input unit 50 can accept an instruction voice and remotely operate the controlled device 60 even when used alone without being connected to the mobile phone 40. The ways in which the voice input unit 50 can be used are thus expanded, and convenience is improved.
[0066]
In addition, the voice input unit 50 of the present embodiment has a biometric sensor unit 502 for detecting a biometric feature such as the user's fingerprint, face, or iris, and transmits the detected biometric feature to the authentication / recognition server 20. Because both the voice feature and the biometric feature are used for user authentication, whether the user is a legitimately registered user can be reliably determined, and a user authentication service combining the voice feature and the biometric feature can be easily provided. As a result, unauthorized use of the voice recognition service and the application service can be prevented.
[0067]
In the above-described embodiment, the voice input unit 50 is described as being connected to the mobile phone 40. However, the voice input unit 50 may instead be adapted, for example, to the specifications of the external device connection unit of a portable information terminal having a wireless communication function.
[0068]
[Effects of the invention]
According to the first and second aspects of the present invention, a voice recognition function can be added to an existing electronic device having a wireless communication function and a connection unit for an external device, and a voice input environment can be easily provided to users of the existing electronic device.
[0069]
According to the third aspect of the present invention, only the voice feature amount necessary for voice recognition is transmitted efficiently, wireless communication resources are used effectively so that the communication cost for voice recognition is reduced, and a user authentication service combining the voice feature and the biometric feature can be easily provided.
[0070]
According to the fourth aspect of the present invention, the user of the electronic device can perform voice input while viewing the response result, the response can be speeded up, and a user-friendly voice input environment can be provided.
[0071]
According to the fifth aspect of the invention, the usage of the voice input device can be expanded, and the convenience can be improved.
[Brief description of the drawings]
FIG. 1 is a diagram showing an overall configuration of a speech recognition system 100 according to an embodiment of the present invention.
FIG. 2 is a diagram showing the appearance of a mobile phone 40 and a voice input unit 50 in the present embodiment.
FIG. 3 is a block diagram showing the internal functional configuration of the voice input unit 50 of FIG. 2.
FIG. 4 is a flowchart showing the operation performed by the voice input unit 50 when the voice input unit 50 of FIG. 2 is connected to the mobile phone 40.
FIG. 5 is a flowchart showing the operation performed by the voice input unit 50 when the voice input unit 50 of FIG. 2 is not connected to the mobile phone 40.
[Explanation of symbols]
10 Application server
20 Authentication / recognition server
30 Wireless base station
40 Mobile phone
50 Voice input unit
60 Controlled device
100 Voice recognition system
401 External device connection unit
402 Key input unit
403 Display unit
501 Voice input unit
502 Biometric sensor unit
503 A / D conversion unit
504 Feature amount extraction unit
505 Voice recognition unit
506 Data compression unit
507 Mobile phone control unit
508 External device communication unit
509 Mobile phone connection unit

Claims

1. A voice input device connected to an external device connection unit of an electronic device having a wireless communication function and the external device connection unit, the voice input device comprising:
a voice input unit that inputs a user's voice;
a feature amount extraction unit that extracts a voice feature amount from the voice input by the voice input unit;
a conversion unit that converts the voice feature amount extracted by the feature amount extraction unit into an information format corresponding to the wireless communication function of the electronic device and outputs it as transmission information; and
a control unit that controls the wireless communication function of the electronic device via the external device connection unit of the electronic device and causes the electronic device to transmit the transmission information output from the conversion unit to an external voice recognition device.

2. The voice input device according to claim 1, wherein the voice input device is integrated with an electronic device having a wireless communication function and a connection portion for an external device.

3. The voice input device according to claim 1, further comprising a biometric feature detection unit configured to detect a biometric feature of the user, wherein the conversion unit converts the biometric feature detected by the biometric feature detection unit and the extracted voice feature amount into an information format corresponding to the wireless communication function of the electronic device and outputs it as transmission information.

4. The voice input device according to claim 1, wherein
the electronic device includes a display unit that displays the content of wireless communication with the external voice recognition device,
the voice recognition device has an authentication / recognition function of authenticating the user based on the voice feature amount and the biometric feature included in the transmission information, recognizing the voice content, and transmitting the authentication result and the voice recognition result to the electronic device, and
upon receiving, via the external device connection unit of the electronic device, the authentication result and the voice recognition result transmitted from the voice recognition device, the control unit displays the authentication result and the voice recognition result on the display unit of the electronic device.

5. The voice input device according to any one of claims 1 to 4, further comprising:
a wireless communication unit that executes a communication procedure with a controlled device having a wireless communication function; and
a voice recognition unit that recognizes input voice content based on the voice feature amount extracted by the feature amount extraction unit and outputs voice recognition information, wherein
the voice input unit inputs the user's instruction voice for the controlled device,
the feature amount extraction unit extracts an instruction voice feature amount from the instruction voice input by the voice input unit,
the voice recognition unit recognizes the input instruction voice content based on the instruction voice feature amount extracted by the feature amount extraction unit and outputs instruction information,
the conversion unit converts the instruction information output from the voice recognition unit into an information format corresponding to the wireless communication function of the controlled device and outputs it as transmission information, and
the control unit controls the wireless communication unit to transmit the transmission information output from the conversion unit to the controlled device.