JP2004309779A

JP2004309779A - Voice authentication device

Info

Publication number: JP2004309779A
Application number: JP2003102976A
Authority: JP
Inventors: Hiroyasu Ide; 博康井手
Original assignee: Casio Computer Co Ltd
Current assignee: Casio Computer Co Ltd
Priority date: 2003-04-07
Filing date: 2003-04-07
Publication date: 2004-11-04

Abstract

PROBLEM TO BE SOLVED: To realize voice authentication that adapts to the change over time of the voice data on which authentication is based, without degrading either the authentication rate for the genuine speaker or the false acceptance rate.
SOLUTION: A plurality of voices are registered in a registered voice group 24. When voice authentication is performed, a use registered voice selection unit 25 selects the voice most suitable for authentication from among the registered voices, and a voice collator 22 collates a voice input through a microphone 20 with the selected voice; authentication completes when they match. The input voice that passes the collation is then stored in the registered voice group 24 as a new registered voice by a registration update control unit 23.
COPYRIGHT: (C)2005, JPO&NCIPI

Description

【０００１】
【発明の属する技術の分野】
本発明は、音声を認証するための装置、及び認証方法に関する。
【０００２】
【従来の技術】
音声認証は、あらかじめシステムに個人の音声を登録しておき、新たに入力された音声が、登録音声と同一人物かどうかを判断する。そのため、音声からフーリエ変換（ＦＦＴ：ＦａｓｔＦｏｕｒｉｅｒＴｒａｎｓｆｏｒｍ）やケプストラム変換などの特徴抽出を行い、二つの音声（登録音声と新規入力音声）の特徴量を比較し、それがどの程度似ているかによって同一人物の発生した音声であるかどうかを判断している。図１はその例である。マイク１０より音声が入力されると、音声入力器１１でマイク１０から入力された音声をアナログ信号からデジタル信号に変換する。音声入力器１１でデジタル信号化された音声データは、音声照合器１２でフーリエ変換、ケプストラム変換の手法により音声特徴が抽出され、あらかじめ登録しておいた登録音声１３の音声特徴とＤＰマッチング等の手法を用いて比較を行い、マイク１０からの入力された音声情報と登録音声１３に登録されている音声情報が一致するかを判別し、その結果を出力する。このような技術に関しては、特許文献に開示されている（例えば、特許文献１参照）。
【０００３】
【特許文献１】特開２００２−２４４６９７
【０００４】
【発明が解決しようとする課題】
しかし、音声には、経時変化があるため、登録音声を登録してから日が経つにつれ、登録者本人の声が変成し、認証に通りづらくなってくるという問題がある。一方、音声の経時変化に対応するために、認証に成功した音声を常に登録音声として登録或は更新し、新たに登録或は更新された登録音声を次回の認証に利用することも考えられるが、登録音声の登録或は更新の際にたまたま調子の悪いときの声や、ノイズの多い音声が登録されてしまう事があり、やはり認証性能に悪影響を及ぼし易い。
【０００５】
以上のような問題は、音声認証において特に顕著に生じるが、指紋認証や網膜認証等、その他の情報認証においてもほぼ同様に生じることが考えられる。
本発明は、上記の問題点に鑑みてなされたものであり、情報認証（特には音声認証）時に、その認証の対象となる入力情報（音声認証の場合は入力音声）の経時変化に対応でき、かつ経時変化に対応することによって認証率を高め、一方で他人受入率を低く抑えた認証を可能にすることを目的とする。
【０００６】
【課題を解決するための手段】
請求項１記載の発明は、複数の登録音声を登録音声群として記憶する登録音声記憶手段と、該登録音声記憶手段によって記憶された登録音声の中から認証に適している音声を選択する選択手段と、該選択手段により選択された登録音声と認証対象となる入力音声とを照合する照合手段と、該照合手段による照合の結果に応じて前記入力音声を新たな登録音声として前記登録音声記憶手段に記憶させる登録手段と、を有することを特徴とする音声認証装置である。
【０００７】
請求項２記載の発明は、前記選択手段は、前記登録音声記憶手段に登録音声群として記憶された複数の登録音声のそれぞれについて、該登録音声群に含まれる他の全ての登録音声との類似度を算出する類似度算出手段と、該類似度算出手段により算出された類似度に基づいて、前記登録音声群の複数の登録音声のなかから前記照合手段における照合で用いられる登録音声を選択する登録音声選択手段と、を有することを特徴とする請求項１記載の音声認証装置である。
【０００８】
請求項３記載の発明は、前記登録手段は、前記登録音声記憶手段に登録音声群として記憶された複数の登録音声のそれぞれについて、ノイズ部分を検出してそのノイズ部分のエネルギーを算出するノイズエネルギー算出手段と、該ノイズエネルギー算出手段により算出されたノイズ部分のエネルギーの値に基づいて、前記登録音声群の複数の登録音声のなかから前記照合手段における照合で用いられる登録音声を選択する登録音声選択手段と、を有することを特徴とする請求項１記載の音声認識装置である。
【０００９】
請求項４記載の発明は、複数の登録音声を登録音声群として記憶する登録音声記憶ステップと、該登録音声記憶ステップによって記憶された登録音声の中から認証に適している音声を選択する選択ステップと、該選択ステップにより選択された登録音声と認証対象となる入力音声とを照合する照合ステップと、該照合ステップによる照合の結果に応じて前記入力音声を前記登録音声群の新たな登録音声として記憶させる登録ステップと、を有することを特徴とする音声認証方法である。
【００１０】
請求項５記載の発明は、前記選択ステップは、前記登録音声記憶ステップにおいて登録音声群として記憶された複数の登録音声のそれぞれについて、該登録音声群に含まれる他の全ての登録音声との類似度を算出する類似度算出ステップと、該類似度算出ステップにより算出された類似度に基づいて、前記登録音声群の複数の登録音声のなかから前記照合ステップにおける照合で用いられる登録音声を選択する登録音声選択ステップと、を有することを特徴とする請求項４記載の音声認証方法である。
【００１１】
請求項６記載の発明は、前記登録ステップは、前記登録音声記憶ステップに登録音声群として記憶された複数の登録音声のそれぞれについて、ノイズ部分を検出してそのノイズ部分のエネルギーを算出するノイズエネルギー算出ステップと、該ノイズエネルギー算出ステップにより算出されたノイズ部分のエネルギーの値に基づいて、前記登録音声群の複数の登録音声のなかから前記照合ステップにおける照合で用いられる登録音声を選択する登録音声選択ステップと、を有することを特徴とする請求項４記載の音声認識方法である。
【００１２】
以上請求項１から請求項６記載の発明は、音声の経時変化に対応するため、入力情報を新たな登録情報として登録すると、ノイズ等を含んだ新たな入力情報も登録されてしまうという問題を解決するために、入力情報に基づいて登録された複数の情報の中から、認証に適している登録情報を、照合に利用する情報として選択する手段及び手法を設けたものである。これにより、新たな入力情報を登録することにより、音声の経時変化等に対応しかつ入力情報のノイズ等の影響を受けないようにすることができる。
【００１３】
請求項７記載の発明は、複数の登録情報を登録情報群として記憶する登録情報記憶手段と、該登録情報記憶手段によって記憶された登録情報の中から認証に適している情報を選択する選択手段と、該選択手段により選択された登録情報と認証対象となる入力情報とを照合する照合手段と、該照合手段による照合の結果に応じて前記入力情報を新たな登録情報として前記登録情報記憶手段に記憶させる登録手段と、を有することを特徴とする情報認証装置である。
【００１４】
請求項８記載の発明は、複数の登録情報を登録情報群として記憶する登録情報記憶ステップと、該登録情報記憶ステップによって記憶された登録情報の中から認証に適している情報を選択する選択ステップと、該選択ステップにより選択された登録情報と認証対象となる入力情報とを照合する照合ステップと、該照合ステップによる照合の結果に応じて前記入力情報を前記登録情報群の新たな登録情報として記憶させる登録ステップと、を有することを特徴とする情報認証方法である。
【００１５】
請求項７及び請求項８記載の登録情報には、音声の情報の他に指紋の情報、網膜の情報、静脈の情報、手相の情報が考えられ、前記請求項１から請求項６と同様に、これら音声等の情報の経時変化等に対応し、かつ入力情報のノイズ等の影響を受けない認証装置を実現することができる。
【００１６】
【発明の実施の形態】
以下、図を参照しながら本発明の実施形態を詳細に説明する。
図２は、本発明の音声認証装置の構成を示す機能ブロック図である。
音声認証を受けようとする場合、マイク２０より音声を入力する。マイク２０に入力された音声は、音声入力器２１により音声データとして情報処理装置に入力される。一方、利用登録音声選出部２５において、あらかじめ登録しておいた登録音声群２４より最も素性の良い登録音声データ（素性の良い登録音声については後述）を選出する。音声入力器２１より入力された音声データと、利用登録音声選出部２５より選出された登録音声データを音声照合器２２により照合する。
【００１７】
認証が成功した入力音声は、登録更新制御部２３により登録音声群２４に登録される。登録更新制御部２３、及び利用登録音声選出部２５の動作について以下に詳細に説明をする。
利用登録音声選出部２５には、例えば以下の２つの方法が考えられる。
（１）登録音声の相互類似度を利用する方法。
（２）ノイズのエネルギーを用いる方法。
【００１８】
まず、上記の（１）の方法を用いた場合の実施例について説明をする。例えば、二つの音声の類似度は、各音声の特徴ベクトル群をＤＰマッチングした場合の距離Ｄ（Ｆ）で表され、類似度が高いほど値が小さくなる。これを利用した場合、各登録音声について、その音声と他の全ての登録音声との類似度を算出する。この値が最も小さいものが、最も平均的な（素性の良い）音声データであると考えられるので、その音声データを認証時の登録音声として選択する。
【００１９】
図３は、図２の利用登録音声選出部２５に登録音声の類似度を利用して、認証のために最も平均的な音声データを登録音声群２４の中から選出する処理を示すフローチャートである。登録音声群２４に登録可能な最大登録音声数をＮとする。また、登録音声群２４に登録されている１つの登録音声ｉと、登録音声群２４に登録されている他の１つの登録音声ｊに対する相対類似度をｓｉｊとする。さらに、登録音声群２４に登録されている１つの登録音声ｉと、登録音声群２４に登録されている他の全ての登録音声ｊ（ｊ＝１．．Ｎ、ただしｊ≠ｉ）に対する相対類似度の総和を登録音声ｉの類似度ｓ［ｉ］とする。
【００２０】
登録音声データをｉ（ｉ＝０．．Ｎ―１）とした時、ステップＳ３０１において、各登録音声の類似度を保存するためのメモリ領域ｓ［０．．Ｎ−１］を確保し、それぞれ０に初期化する。ステップＳ３０２でｉ＝０とし、まず登録音声群２４の登録音声データを１つ決める。ステップＳ３０２では、ｊを「ｉ＋１」に設定する。ステップＳ３０４では、ステップＳ３０２とステップＳ３０３で決めた登録音声データの相対類似度ｓｉｊを計算する。ステップＳ３０５では、ステップＳ３０４で計算した相対類似度を順次類似度ｓ［ｉ］に加える。なお、登録音声データｉと登録音声データｊの相対類似度ｓｉｊ、および登録音声データｊと登録音声ｉの相対類似度ｓｊｉは同じ値（ｓｉｊ＝ｓｊｉ）であることから、図３の計算では相対類似度ｓ［ｉ］と相対類似度ｓ［ｊ］を同時に計算し、無駄な計算を省略している。ステップＳ３０６では、登録音声ｉと登録音声群２４に登録された他の登録音声データの相対類似度を計算するために、ｊに１を加算したものを次の登録音声データとする。ステップＳ３０７にて登録音声群２４に登録されている他の登録音声データ全てに対して相対類似度ｓｉｊを計算したかチェックする。まだ相対類似度を計算していない登録音声データがある場合には、ステップＳ３０４に戻り相対類似度を計算する。全ての登録音声データと相対類似度を計算し、登録音声データの類似度ｓ［ｉ］の計算が終了した場合には、ステップＳ３０８で次に類似度を計算する登録音声データを決め、ステップＳ３０３に戻り、類似度を計算する。ステップＳ３０９で、登録音声群２４に登録されている登録音声データの類似度が全て計算されたかを判断する。全ての類似度が計算されると、ステップＳ３１０で、計算した類似度ｓ［ｉ］の中で最小の値となる登録音声データを選出し、認証に利用する音声とする。
【００２１】
図４は、図２の登録更新制御部２３の処理を示すフローチャートである。ステップＳ４１では、入力音声データが認証に成功したかを判別する。入力音声データが認証に通った場合（認証に成功した場合）は、登録音声データが登録音声群２４に登録可能な登録音声データの最大件数であるかをステップＳ４２で判別する。登録音声群２４に登録可能な最大件数分の登録音声データが登録されている場合には、ステップＳ４３にて、登録音声群２４に登録されている登録音声データのうち、最も過去に登録された登録音声データを削除する。登録音声群２４に登録登録されている登録音声データの件数が最大登録件数未満の場合、あるいはステップＳ４３により最も過去に登録された登録音声データが削除された後に、ステップＳ４４にて入力音声データが登録音声群２４に登録される。以上のようにして、図２の登録更新制御部２３を実行する。
【００２２】
次に、登録音声群２４から認証に利用する音声データを１つを選択する際に、ノイズのエネルギーを用いる場合について説明する。
登録音声群２４に登録されている登録音声データにも登録時の背景ノイズなどは含まれている。利用登録音声選出の方法（２）は、この登録音声群２４への音声データ登録時に入ってしまった背景ノイズを算出し、最もノイズの少ない登録音声を選出して音声認証に利用することを特徴とする音声認識装置である。例えば特許文献によれば、入力音声データから音声部分だけを切り出すことが出来るので、切り出されなかった残りの部分には、ノイズだけが録音されていると考えられる。そのノイズ部分のエネルギーを算出し、各登録音声ごとに比較する事で背景ノイズが最小の登録音声を発見できる。
【００２３】
図５は、図２の利用登録音声選出部２５に登録音声のノイズ部分のエネルギーを計算し、最もノイズ部分のエネルギーの小さい登録音声データを選出する処理を示すフローチャートである。
登録音声群２４に登録可能な最大登録音声数をＮとする。また、登録音声群２４に登録されている１つの登録音声ｉのノイズ部分のエネルギーの合計をノイズのサンプル数で割った値をｓ［ｉ］とする。ステップＳ５０１で、ノイズエネルギーｓ［ｉ］を保存するためのメモリ領域ｓ［０．．Ｎ−１］を確保し、それぞれ０に初期化する。ステップＳ５０３で、ｉ番目の登録音声データのノイズ部分を切り出す。ステップＳ５０４では、ステップＳ５０３で切り出したノイズ部分のエネルギーの合計をノイズ部分のサンプル数で割った値をｓ［ｉ］として代入する。ステップＳ５０５で、次の登録音声データを選択し、ステップＳ５０６で、全ての登録音声データに対してノイズ部分のエネルギーを計算したかを判別する。まだ、全ての登録音声データについてノイズ部分のエネルギーを計算していない場合には、ステップＳ５０３に戻り計算を続ける。全ての登録音声データについてノイズ部分のエネルギーの計算を終了した場合には、ステップＳ５０７で、ノイズ部分のエネルギーｓ［０．．Ｎ−１］のうち最小の値の登録音声データを選択し、音声認証に利用する。これにより、登録時の背景ノイズが最も少ない音声データを選択することが可能となる。
【００２４】
図６は、本発明の音声認証装置に用いる情報処理装置の構成図である。本実施の形態において音声認証装置は、ＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）６１、メモリ６２、外部記憶装置６３、媒体駆動装置６４、可搬記憶媒体６５、入力装置６６、出力装置６７、ネットワーク接続装置６８によって構成されており、可搬記憶媒体６５以外の各装置はバス６９で接続されている。ＣＰＵ６１は、外部記憶装置６３や可搬記憶媒体６５内に格納されている音声認証を実現するために必要なプログラムをメモリ６２にロードする。さらに、ＣＰＵ６１は、入力装置６６から入力された音声データに対して、メモリ６２に格納されたプログラム命令に従って各種の処理を実行し、音声認証処理を実施する。実行された処理の結果は、外部記憶装置６３に保存され、あるいは出力装置６７から出力される。
【００２５】
なお、上記各実施形態の説明では、利用登録音声選出部２５において、登録音声群２４から一つの音声のみを選択する場合を例示して説明したが、例えば複数の登録音声を選択してこれらの平均を求めて音声照合器２２での照合に用いるといった構成をとることも可能である。この場合、利用登録音声選出部２５において、例えば、登録音声の相互類似度の値の小さいものから所定個数の登録音声を選択したり、あるいはノイズのエネルギーの小さいものから所定個数の登録音声を選択して、その選択された複数の音声の特徴ベクトルの要素毎に平均値を求めて、実際の照合に用いるデータを求めるようにすることが可能である。
【００２６】
また、以上の各実施形態の説明は、音声認証における本発明の実施例であるが、指紋認証や網膜認証等、その他の情報認証においても同様に実施可能である。また、上述した本発明の各実施形態は、コンピュータに実行させることのできるプログラムとして、例えば磁気ディスク（フレキシブルディスク、ハードディスク等）、光ディスク（ＣＤ−ＲＯＭ、ＤＶＤ等）、半導体メモリなどの記録媒体に書き込んで各種装置に適用することも可能である。本装置を実現するコンピュータは、記録媒体に記録されたプログラムを読み込み、このプログラムによって動作が制御されることにより、上述した処理を実行する。
【００２７】
また、図２中に記載されている各構成要素２０から２５は、全てが１つの装置内に収納されていなくても良い。したがって、例えば、サーバと複数の音声認証用端末とがネットワークで接続されているようなシステムにおいて、各端末内に２０と２１（及び、認証結果を表示する表示部等）が設けられ、サーバ内に２２から２５が設けられているような構成であっても良い。
【００２８】
また、本発明の各実施形態に示した記憶手段としては、上記で既に説明したＣＤ−ＲＯＭやＤＶＤ−ＲＯＭ等の記憶手段の例の他にも、例えば、Ｂｌｕｅ−ｒａｙＤｉｓｃ（Ｒ）やＡＯＤ（ＡｄｖａｎｃｅｄＯｐｔｉｃａｌＤｉｓｃ）などの青色レーザを用いた次世代光ディスク記憶媒体、赤色レーザを用いるＨＤ−ＤＶＤ９、青紫色レーザーを用いるＢｌｕｅＬａｓｅｒＤＶＤなど、今後開発される種々の大容量記憶媒体を用いて本発明を実施することが可能である。
【００２９】
【発明の効果】
請求項１から請求項２、及び請求項４から請求項５記載の発明によれば、登録音声を複数用意し、時系列にそって認証されたデータを入れ換えて行くことによって、入力音声が経時変化した場合に対応する事が可能となった。また、照合時には登録音声の中から認証に適したものを選択することで、経時変化に対応するために逐次登録音声データを更新することによる認証精度への悪影響を抑えることが出来るようになった。さらに、請求項３、請求項６記載の発明によれば、登録音声群から認証に利用する１つの登録音声データを選択する際に、相互類似度やノイズエネルギーを利用することで、最も素性の良い音声を選ぶことが可能となり、経時変化に対応し、経時変化に対応するために逐次登録音声データを更新することによる認証精度への悪影響も抑えることができるようになった。さらに、本発明を音声以外の他の認証技術に適用することにより、音声以外の情報認証においても上記と同様の効果を得ることが可能となった。
【図面の簡単な説明】
【図１】音声認証システムの一般的な構成図である。
【図２】本発明の音声認証システムの構成図である。
【図３】登録音声の相互類似度を計算し、最小の類似度となる登録音声を選出する処理を示すフローチャートである。
【図４】本発明の音声登録処理を示すフローチャートである。
【図５】登録音声のノイズの平均を計算し、最小のノイズとなる登録音声を選出する処理を示すフローチャートである。
【図６】情報処理装置の構成図である。
【符号の説明】
２０…マイク
２１…音声入力器
２２…音声照合器
２３…登録更新制御部
２４…登録音声群
２５…利用登録音声選出部
６１…ＣＰＵ
６２…メモリ
６３…外部記憶装置
６４…媒体駆動装置
６５…可搬記憶媒体
６６…入力装置
６７…出力装置
６８…ネットワーク接続装置
６９…バス
[0001]
[Technical field of the invention]
The present invention relates to an apparatus for authenticating voice and an authentication method.
[0002]
[Prior art]
In voice authentication, an individual's voice is registered in the system in advance, and the system determines whether a newly input voice comes from the same person as the registered voice. To do so, features are extracted from the voice using a Fourier transform (FFT: Fast Fourier Transform), a cepstrum transform, or the like; the features of the two voices (the registered voice and the newly input voice) are compared; and whether both were uttered by the same person is judged from how similar they are. FIG. 1 shows an example. When voice is input from the microphone 10, the voice input device 11 converts it from an analog signal to a digital signal. The voice collator 12 extracts voice features from the digitized voice data by Fourier transform and cepstrum transform, compares them with the voice features of the registered voice 13 registered in advance using a technique such as DP matching, determines whether the voice information input from the microphone 10 matches the voice information registered as the registered voice 13, and outputs the result. Such a technique is disclosed in the patent literature (for example, see Patent Document 1).
[0003]
[Patent Document 1] JP-A-2002-244697
[0004]
[Problems to be solved by the invention]
However, because a voice changes over time, the registrant's own voice drifts as days pass after the registered voice was registered, and authentication becomes harder to pass. One conceivable way to cope with this temporal change is to always register or update a successfully authenticated voice as the registered voice and to use the newly registered or updated voice for the next authentication. In that case, however, a voice that happened to be recorded when the speaker was in poor condition, or a noisy voice, may be registered, which again tends to degrade authentication performance.
[0005]
The above problem is particularly pronounced in voice authentication, but largely the same problem is likely to arise in other kinds of information authentication such as fingerprint authentication and retinal authentication.
The present invention has been made in view of the above problem. Its object is to enable authentication that, at the time of information authentication (in particular, voice authentication), can cope with the temporal change of the input information to be authenticated (the input voice, in the case of voice authentication), raises the authentication rate by coping with that change, and at the same time keeps the false acceptance rate low.
[0006]
[Means for Solving the Problems]
The invention according to claim 1 is a voice authentication device comprising: registered voice storage means for storing a plurality of registered voices as a registered voice group; selecting means for selecting a voice suitable for authentication from among the registered voices stored by the registered voice storage means; collating means for collating the registered voice selected by the selecting means with an input voice to be authenticated; and registration means for storing the input voice in the registered voice storage means as a new registered voice in accordance with the result of the collation by the collating means.
[0007]
The invention according to claim 2 is the voice authentication device according to claim 1, wherein the selecting means comprises: similarity calculating means for calculating, for each of the plurality of registered voices stored as the registered voice group in the registered voice storage means, its similarity to all other registered voices included in the registered voice group; and registered voice selecting means for selecting, from among the plurality of registered voices of the registered voice group, the registered voice used in the collation by the collating means, based on the similarity calculated by the similarity calculating means.
[0008]
The invention according to claim 3 is the voice recognition device according to claim 1, wherein the registration means comprises: noise energy calculating means for detecting, for each of the plurality of registered voices stored as the registered voice group in the registered voice storage means, a noise portion and calculating the energy of that noise portion; and registered voice selecting means for selecting, from among the plurality of registered voices of the registered voice group, the registered voice used in the collation by the collating means, based on the energy value of the noise portion calculated by the noise energy calculating means.
[0009]
The invention according to claim 4 is a voice authentication method comprising: a registered voice storing step of storing a plurality of registered voices as a registered voice group; a selecting step of selecting a voice suitable for authentication from among the registered voices stored in the registered voice storing step; a collating step of collating the registered voice selected in the selecting step with an input voice to be authenticated; and a registration step of storing the input voice as a new registered voice of the registered voice group in accordance with the result of the collation in the collating step.
[0010]
The invention according to claim 5 is the voice authentication method according to claim 4, wherein the selecting step comprises: a similarity calculating step of calculating, for each of the plurality of registered voices stored as the registered voice group in the registered voice storing step, its similarity to all other registered voices included in the registered voice group; and a registered voice selecting step of selecting, from among the plurality of registered voices of the registered voice group, the registered voice used in the collation in the collating step, based on the similarity calculated in the similarity calculating step.
[0011]
The invention according to claim 6 is the voice recognition method according to claim 4, wherein the registration step comprises: a noise energy calculating step of detecting, for each of the plurality of registered voices stored as the registered voice group in the registered voice storing step, a noise portion and calculating the energy of that noise portion; and a registered voice selecting step of selecting, from among the plurality of registered voices of the registered voice group, the registered voice used in the collation in the collating step, based on the energy value of the noise portion calculated in the noise energy calculating step.
[0012]
The inventions according to claims 1 to 6 address the problem that, when input information is registered as new registered information in order to cope with the temporal change of a voice, new input information containing noise or the like is registered as well. To solve this, they provide means and methods for selecting, from among the plurality of pieces of information registered on the basis of input information, the registered information suitable for authentication as the information used for collation. As a result, registering new input information makes it possible to cope with the temporal change of a voice and the like while remaining unaffected by noise or the like in the input information.
[0013]
The invention according to claim 7 is an information authentication device comprising: registered information storage means for storing a plurality of pieces of registered information as a registered information group; selecting means for selecting information suitable for authentication from among the registered information stored by the registered information storage means; collating means for collating the registered information selected by the selecting means with input information to be authenticated; and registration means for storing the input information in the registered information storage means as new registered information in accordance with the result of the collation by the collating means.
[0014]
The invention according to claim 8 is an information authentication method comprising: a registered information storing step of storing a plurality of pieces of registered information as a registered information group; a selecting step of selecting information suitable for authentication from among the registered information stored in the registered information storing step; a collating step of collating the registered information selected in the selecting step with input information to be authenticated; and a registration step of storing the input information as new registered information of the registered information group in accordance with the result of the collation in the collating step.
[0015]
The registered information in claims 7 and 8 may be, in addition to voice information, fingerprint information, retinal information, vein information, or palm print information. As with claims 1 to 6, this makes it possible to realize an authentication device that copes with the temporal change of such information and is not affected by noise or the like in the input information.
[0016]
[Embodiments of the invention]
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
FIG. 2 is a functional block diagram showing the configuration of the voice authentication device of the present invention.
To receive voice authentication, the user inputs voice through the microphone 20. The voice input to the microphone 20 is fed into the information processing apparatus as voice data by the voice input device 21. Meanwhile, the use registered voice selection unit 25 selects the most suitable registered voice data (what makes a registered voice suitable is described later) from the registered voice group 24 registered in advance. The voice collator 22 then collates the voice data input from the voice input device 21 with the registered voice data selected by the use registered voice selection unit 25.
[0017]
An input voice that has been successfully authenticated is registered in the registered voice group 24 by the registration update control unit 23. The operations of the registration update control unit 23 and the use registered voice selection unit 25 are described in detail below.
For the use registered voice selection unit 25, for example, the following two methods are conceivable.
(1) A method using mutual similarity of registered voices.
(2) A method using energy of noise.
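The flow of FIG. 2 can be sketched as follows. This is only an illustration under stated assumptions: the function name `authenticate`, the callables `select` and `distance`, and the acceptance `threshold` are hypothetical and do not appear in the patent, which says only that authentication succeeds when the voices match.

```python
from typing import Callable, List, TypeVar

V = TypeVar("V")  # whatever representation a voice has (assumption)


def authenticate(
    input_voice: V,
    registered: List[V],
    select: Callable[[List[V]], int],
    distance: Callable[[V, V], float],
    threshold: float,
    max_count: int,
) -> bool:
    """One authentication attempt against the registered voice group."""
    # Use registered voice selection unit 25: pick the reference voice.
    reference = registered[select(registered)]
    # Voice collator 22: a smaller distance means a closer match.
    # Comparing against a fixed threshold is an assumption of this sketch.
    accepted = distance(input_voice, reference) <= threshold
    # Registration update control unit 23: store only accepted voices,
    # dropping the oldest entry when the group is already full.
    if accepted:
        if len(registered) >= max_count:
            registered.pop(0)
        registered.append(input_voice)
    return accepted
```

Passing the selection method as a callable keeps the sketch neutral between methods (1) and (2).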
[0018]
First, an embodiment using method (1) above is described. The similarity between two voices is represented, for example, by the distance D(F) obtained when the feature vector groups of the two voices are DP-matched; the higher the similarity, the smaller the value. Using this measure, the similarity between each registered voice and all other registered voices is calculated. The voice data for which this value is smallest is considered to be the most average (most suitable) voice data, so it is selected as the registered voice used at authentication.
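A minimal sketch of a DP-matching distance of the kind described above is given here. The patent does not specify the local cost or the path constraints, so the Euclidean frame distance and the three-way step pattern below are assumptions, and the names `frame_distance` and `dp_matching_distance` are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def frame_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors (assumed local cost)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dp_matching_distance(u: Sequence[Vector], v: Sequence[Vector]) -> float:
    """Accumulated cost of the cheapest monotonic alignment of u and v."""
    if not u or not v:
        raise ValueError("both utterances need at least one frame")
    # cost[i][j] is the cheapest alignment of u[:i] with v[:j].
    cost = [[math.inf] * (len(v) + 1) for _ in range(len(u) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(u) + 1):
        for j in range(1, len(v) + 1):
            step = frame_distance(u[i - 1], v[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # u advances, v frame is repeated
                cost[i][j - 1],      # v advances, u frame is repeated
                cost[i - 1][j - 1],  # both advance
            )
    return cost[len(u)][len(v)]
```

Because frames may be repeated, two utterances that differ only in speaking rate can still reach a distance of zero.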
[0019]
FIG. 3 is a flowchart showing the process by which the use registered voice selection unit 25 of FIG. 2 uses the similarity of the registered voices to select, from the registered voice group 24, the most average voice data for authentication. Let N be the maximum number of registered voices that can be registered in the registered voice group 24. Let sij be the relative similarity between one registered voice i registered in the registered voice group 24 and one other registered voice j registered in the registered voice group 24. Further, let the similarity s[i] of registered voice i be the sum of the relative similarities between registered voice i and all other registered voices j (j = 1..N, where j ≠ i) registered in the registered voice group 24.
[0020]
With the registered voice data indexed by i (i = 0..N-1), step S301 allocates a memory area s[0..N-1] for storing the similarity of each registered voice and initializes each element to 0. Step S302 sets i = 0, which fixes the first registered voice data of the registered voice group 24. Step S303 sets j to "i + 1". Step S304 calculates the relative similarity sij of the registered voice data determined in steps S302 and S303. Step S305 adds the relative similarity calculated in step S304 to the similarity s[i]. Because the relative similarity sij between registered voice data i and registered voice data j and the relative similarity sji between registered voice data j and registered voice data i have the same value (sij = sji), the calculation in FIG. 3 updates the similarity s[i] and the similarity s[j] at the same time, which avoids redundant computation. In step S306, j is incremented by 1 to designate the next registered voice data, so that the relative similarity between registered voice i and the other registered voice data in the registered voice group 24 can be calculated. Step S307 checks whether the relative similarity sij has been calculated for all other registered voice data registered in the registered voice group 24. If registered voice data remains for which the relative similarity has not yet been calculated, the process returns to step S304 and calculates it. When the relative similarities with all registered voice data have been calculated and the similarity s[i] is complete, step S308 determines the registered voice data whose similarity is calculated next, and the process returns to step S303 to calculate it. Step S309 determines whether the similarities of all registered voice data registered in the registered voice group 24 have been calculated. When all similarities have been calculated, step S310 selects the registered voice data with the smallest value among the calculated similarities s[i] and uses it as the voice for authentication.
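The loop structure of FIG. 3 might be sketched as below. The function name `select_most_average` and the `distance` callable are hypothetical; any symmetric distance in which a smaller value means higher similarity could be plugged in, and the step labels in the comments are a best-effort mapping to the description above.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def select_most_average(
    voices: Sequence[T], distance: Callable[[T, T], float]
) -> int:
    """Return the index of the voice with the smallest summed distance."""
    if not voices:
        raise ValueError("the registered voice group is empty")
    n = len(voices)
    s = [0.0] * n                       # S301: one accumulator per voice
    for i in range(n):                  # S302, S308, S309: outer loop over i
        for j in range(i + 1, n):       # S303, S306, S307: only j > i
            d = distance(voices[i], voices[j])  # S304: relative similarity
            s[i] += d                   # S305: sij contributes to s[i]
            s[j] += d                   #       and, since sij = sji, to s[j]
    return min(range(n), key=s.__getitem__)     # S310: smallest total wins
```

Visiting only pairs with j > i computes each of the N(N-1)/2 distances once instead of twice.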
[0021]
FIG. 4 is a flowchart showing the processing of the registration update control unit 23 of FIG. 2. Step S41 determines whether the input voice data has been successfully authenticated. If the input voice data has passed authentication, step S42 determines whether the number of registered voice data items has reached the maximum number that can be registered in the registered voice group 24. If the maximum number of registered voice data items is already registered, step S43 deletes the oldest registered voice data from the registered voice group 24. If the number of registered voice data items in the registered voice group 24 is below the maximum, or after the oldest registered voice data has been deleted in step S43, step S44 registers the input voice data in the registered voice group 24. This is how the registration update control unit 23 of FIG. 2 operates.
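The update rule of FIG. 4 amounts to a bounded first-in, first-out store. A sketch, assuming a hypothetical class `RegisteredVoiceGroup` whose `update` method receives the authentication result as a flag:

```python
from collections import deque
from typing import Deque, Generic, TypeVar

T = TypeVar("T")


class RegisteredVoiceGroup(Generic[T]):
    """Holds at most max_count voices, oldest first."""

    def __init__(self, max_count: int) -> None:
        if max_count < 1:
            raise ValueError("max_count must be at least 1")
        self.max_count = max_count
        self.voices: Deque[T] = deque()

    def update(self, input_voice: T, authenticated: bool) -> bool:
        """Register input_voice if it passed authentication."""
        if not authenticated:                    # S41: rejected voices are ignored
            return False
        if len(self.voices) >= self.max_count:   # S42: is the group full?
            self.voices.popleft()                # S43: delete the oldest entry
        self.voices.append(input_voice)          # S44: register the new voice
        return True
```

`deque(maxlen=max_count)` would give the same eviction behavior implicitly; the explicit form is used here only to mirror steps S42 and S43.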
[0022]
Next, the case where noise energy is used to select the one piece of voice data used for authentication from the registered voice group 24 is described.
The registered voice data in the registered voice group 24 also contains background noise and the like from the time of registration. Method (2) of selecting the registered voice for use is characterized by calculating the background noise that entered when the voice data was registered in the registered voice group 24 and selecting the registered voice with the least noise for use in voice authentication. According to the patent literature, for example, the speech portion alone can be cut out of input voice data, so the remaining portion that was not cut out can be considered to contain only noise. By calculating the energy of that noise portion and comparing it across the registered voices, the registered voice with the least background noise can be found.
[0023]
FIG. 5 is a flowchart showing the process by which the use registered voice selection unit 25 of FIG. 2 calculates the energy of the noise portion of each registered voice and selects the registered voice data whose noise portion has the smallest energy.
Let N be the maximum number of registered voices that can be registered in the registered voice group 24. Let s[i] be the total energy of the noise portion of one registered voice i registered in the registered voice group 24 divided by the number of noise samples. Step S501 allocates a memory area s[0..N-1] for storing the noise energy s[i] and initializes each element to 0. Step S503 cuts out the noise portion of the i-th registered voice data. Step S504 divides the total energy of the noise portion cut out in step S503 by the number of samples in the noise portion and assigns the result to s[i]. Step S505 selects the next registered voice data, and step S506 determines whether the energy of the noise portion has been calculated for all registered voice data. If it has not yet been calculated for all registered voice data, the process returns to step S503 and continues the calculation. When the calculation is complete for all registered voice data, step S507 selects the registered voice data with the smallest value among the noise energies s[0..N-1] and uses it for voice authentication. This makes it possible to select the voice data with the least background noise at the time of registration.
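The computation of FIG. 5 can be sketched as follows, assuming the noise-only samples of each registered voice have already been cut out (step S503) and that energy means the sum of squared sample amplitudes; the patent does not define either detail. The names `mean_noise_energy` and `select_least_noisy` are hypothetical.

```python
from typing import Sequence


def mean_noise_energy(noise_samples: Sequence[float]) -> float:
    """Total energy of the noise portion divided by its sample count (S504)."""
    if not noise_samples:
        raise ValueError("the noise portion is empty")
    return sum(x * x for x in noise_samples) / len(noise_samples)


def select_least_noisy(noise_parts: Sequence[Sequence[float]]) -> int:
    """Return the index of the registered voice with the least noise."""
    if not noise_parts:
        raise ValueError("the registered voice group is empty")
    # S503 to S506: one averaged noise energy s[i] per registered voice.
    s = [mean_noise_energy(part) for part in noise_parts]
    # S507: the smallest averaged energy identifies the cleanest recording.
    return min(range(len(s)), key=s.__getitem__)
```

Dividing by the sample count keeps recordings with long silent stretches from being penalized merely for containing more noise samples.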
[0024]
FIG. 6 is a configuration diagram of an information processing device used for the voice authentication device of the present invention. In the present embodiment, the voice authentication device includes a CPU (Central Processing Unit) 61, a memory 62, an external storage device 63, a medium drive device 64, a portable storage medium 65, an input device 66, an output device 67, and a network connection device 68. Each device other than the portable storage medium 65 is connected by a bus 69. The CPU 61 loads a program necessary for implementing voice authentication stored in the external storage device 63 or the portable storage medium 65 into the memory 62. Further, the CPU 61 executes various processes on the voice data input from the input device 66 in accordance with the program instructions stored in the memory 62, and performs the voice authentication process. The result of the executed processing is stored in the external storage device 63 or output from the output device 67.
[0025]
In the description of the above embodiments, the case where the use registered voice selection unit 25 selects only one voice from the registered voice group 24 has been described as an example. It is also possible, for example, to select a plurality of registered voices, take their average, and use that average for the collation in the voice collator 22. In this case the use registered voice selection unit 25 may, for example, select a predetermined number of registered voices in ascending order of mutual similarity value, or a predetermined number of registered voices in ascending order of noise energy, and obtain the data used for the actual collation by averaging the feature vectors of the selected voices element by element.
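The averaging variant might look like the sketch below. It assumes each registered voice is represented by a single fixed-length feature vector, because element-wise averaging is only well defined when the vectors have the same length; how utterances of different lengths would be aligned is not stated in the patent. The name `average_of_best` and the parameter `k` are hypothetical.

```python
from typing import List, Sequence


def average_of_best(
    features: Sequence[Sequence[float]], scores: Sequence[float], k: int
) -> List[float]:
    """Average, element by element, the k voices with the smallest scores."""
    if len(features) != len(scores):
        raise ValueError("one score is needed per registered voice")
    if not 1 <= k <= len(features):
        raise ValueError("k must be between 1 and the number of voices")
    # Smaller score means better: summed distance or noise energy.
    best = sorted(range(len(scores)), key=scores.__getitem__)[:k]
    dim = len(features[0])
    return [sum(features[i][d] for i in best) / k for d in range(dim)]
```

With `k = 1` this reduces to the single-voice selection described in the earlier embodiments.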
[0026]
The above embodiments describe the present invention as applied to voice authentication, but it can be applied in the same way to other kinds of information authentication such as fingerprint authentication and retinal authentication. Each of the embodiments described above can also be written, as a program executable by a computer, to a recording medium such as a magnetic disk (flexible disk, hard disk, etc.), an optical disc (CD-ROM, DVD, etc.), or a semiconductor memory, and applied to various devices. The computer that realizes the present device reads the program recorded on the recording medium and executes the processing described above with its operation controlled by that program.
[0027]
Further, the components 20 to 25 shown in FIG. 2 need not all be housed in a single device. For example, in a system in which a server and a plurality of voice authentication terminals are connected via a network, components 20 and 21 (together with a display unit for showing the authentication result and the like) may be provided in each terminal, and components 22 to 25 may be provided in the server.
[0028]
As the storage means described in each embodiment of the present invention, in addition to the examples already described above such as CD-ROM and DVD-ROM, the present invention can be carried out using various large-capacity storage media developed in the future, for example next-generation optical disc storage media using a blue laser such as Blue-ray Disc (R) and AOD (Advanced Optical Disc), HD-DVD9 using a red laser, and Blue Laser DVD using a blue-violet laser.
[0029]
[Effects of the invention]
According to the inventions of claims 1 and 2 and claims 4 and 5, a plurality of registered voices are prepared and the authenticated data is swapped in chronologically, which makes it possible to cope with temporal change of the input voice. In addition, selecting from the registered voices one that is suitable for authentication at the time of collation suppresses the adverse effect on authentication accuracy caused by successively updating the registered voice data to cope with temporal change. Further, according to the inventions of claims 3 and 6, using mutual similarity or noise energy when selecting the one registered voice data used for authentication from the registered voice group makes it possible to choose the most suitable voice, so that temporal change can be handled while the adverse effect on authentication accuracy of successively updating the registered voice data is also suppressed. Furthermore, applying the present invention to authentication technologies other than voice yields the same effects in information authentication other than voice.
[Brief description of the drawings]
FIG. 1 is a general configuration diagram of a voice authentication system.
FIG. 2 is a configuration diagram of a voice authentication system of the present invention.
FIG. 3 is a flowchart illustrating a process of calculating a mutual similarity between registered voices and selecting a registered voice having a minimum similarity.
FIG. 4 is a flowchart showing a voice registration process according to the present invention.
FIG. 5 is a flowchart showing a process of calculating an average of noise of a registered voice and selecting a registered voice having the minimum noise.
FIG. 6 is a configuration diagram of an information processing apparatus.
[Explanation of symbols]
20 microphone
21 voice input device
22 voice collator
23 registration update control unit
24 registered voice group
25 use registered voice selection unit
61 CPU
62 memory
63 external storage device
64 medium drive device
65 portable storage medium
66 input device
67 output device
68 network connection device
69 bus

Claims

A voice authentication device comprising:
registered voice storage means for storing a plurality of registered voices as a registered voice group;
selecting means for selecting a voice suitable for authentication from among the registered voices stored by the registered voice storage means;
collating means for collating the registered voice selected by the selecting means with an input voice to be authenticated; and
registration means for storing the input voice as a new registered voice in the registered voice storage means in accordance with the result of the collation by the collating means.

The voice authentication device according to claim 1, wherein the selecting means comprises:
similarity calculating means for calculating, for each of the plurality of registered voices stored as the registered voice group in the registered voice storage means, a similarity to all the other registered voices included in the registered voice group; and
registered voice selecting means for selecting, based on the similarity calculated by the similarity calculating means, the registered voice to be used in the collation by the collating means from among the plurality of registered voices of the registered voice group.
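Selection by mutual similarity can be sketched as follows. The sketch assumes that the similarity score is a distance (smaller means more alike), that each voice's scores against all the others are combined by summation, and that the voice with the smallest total is chosen. These are assumptions for illustration only, and `distance` and `select_by_mutual_similarity` are hypothetical names.

```python
from typing import List, Sequence

Feature = Sequence[float]  # hypothetical feature vector for one utterance


def distance(a: Feature, b: Feature) -> float:
    """Placeholder similarity score (Euclidean distance); smaller means closer."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def select_by_mutual_similarity(voices: List[Feature]) -> int:
    """Return the index of the voice whose summed distance to all
    other registered voices is smallest (assumed selection rule)."""
    if not voices:
        raise ValueError("registered voice group is empty")
    totals = [
        sum(distance(v, other) for j, other in enumerate(voices) if j != i)
        for i, v in enumerate(voices)
    ]
    return min(range(len(voices)), key=totals.__getitem__)
```

Under these assumptions, an outlier such as a recording made on a day when the speaker's voice was unusual has a large total distance to the rest of the group and is therefore not selected.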

The voice authentication device according to claim 1, wherein the registration means comprises:
noise energy calculating means for detecting, for each of the plurality of registered voices stored as the registered voice group in the registered voice storage means, a noise portion and calculating the energy of the noise portion; and
registered voice selecting means for selecting, based on the energy value of the noise portion calculated by the noise energy calculating means, the registered voice to be used in the collation by the collating means from among the plurality of registered voices of the registered voice group.
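Selection by noise energy can be sketched in the same way. How the noise portion is detected is not restated here, so the sketch simply assumes that the first `noise_len` samples of each recording precede the utterance and contain only background noise. That detection rule, the use of mean squared amplitude as the energy, and the names `noise_energy` and `select_by_noise_energy` are assumptions for illustration.

```python
from typing import List, Sequence


def noise_energy(samples: Sequence[float], noise_len: int = 4) -> float:
    """Mean squared amplitude of the assumed noise-only leading portion."""
    head = samples[:noise_len]
    if not head:
        raise ValueError("recording has no samples")
    return sum(s * s for s in head) / len(head)


def select_by_noise_energy(
    recordings: List[Sequence[float]], noise_len: int = 4
) -> int:
    """Return the index of the registered voice with the least noise energy."""
    if not recordings:
        raise ValueError("registered voice group is empty")
    return min(
        range(len(recordings)),
        key=lambda i: noise_energy(recordings[i], noise_len),
    )
```

With this rule, a voice registered in a noisy environment remains in the group but is passed over for collation as long as a quieter registered voice is available.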

A voice authentication method comprising:
a registered voice storing step of storing a plurality of registered voices as a registered voice group;
a selecting step of selecting a voice suitable for authentication from among the registered voices stored in the registered voice storing step;
a matching step of matching the registered voice selected in the selecting step with an input voice to be authenticated; and
a registration step of storing the input voice as a new registered voice of the registered voice group in accordance with the result of the matching in the matching step.

The voice authentication method according to claim 4, wherein the selecting step comprises:
a similarity calculating step of calculating, for each of the plurality of registered voices stored as the registered voice group in the registered voice storing step, a similarity to all the other registered voices included in the registered voice group; and
a registered voice selecting step of selecting, based on the similarity calculated in the similarity calculating step, the registered voice to be used in the matching in the matching step from among the plurality of registered voices of the registered voice group.

The voice authentication method according to claim 4, wherein the registration step comprises:
a noise energy calculating step of detecting, for each of the plurality of registered voices stored as the registered voice group in the registered voice storing step, a noise portion and calculating the energy of the noise portion; and
a registered voice selecting step of selecting, based on the energy value of the noise portion calculated in the noise energy calculating step, the registered voice to be used in the matching in the matching step from among the plurality of registered voices of the registered voice group.

An information authentication device comprising:
registration information storage means for storing a plurality of items of registration information as a registration information group;
selecting means for selecting information suitable for authentication from among the registration information stored by the registration information storage means;
collation means for collating the registration information selected by the selecting means with input information to be authenticated; and
registration means for storing the input information as new registration information in the registration information storage means in accordance with the result of the collation by the collation means.

An information authentication method comprising:
a registration information storage step of storing a plurality of items of registration information as a registration information group;
a selection step of selecting information suitable for authentication from among the registration information stored in the registration information storage step;
a collation step of collating the registration information selected in the selection step with input information to be authenticated; and
a registration step of storing the input information as new registration information of the registration information group in accordance with the result of the collation in the collation step.