JP3631020B2

JP3631020B2 - Speaker recognition method

Info

Publication number: JP3631020B2
Application number: JP35157998A
Authority: JP
Inventors: 正樹松平
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1998-12-10
Filing date: 1998-12-10
Publication date: 2005-03-23
Anticipated expiration: 2018-12-10
Also published as: JP2000181489A

Description

【０００１】
【発明の属する技術分野】
この発明は、発話された音声が特定の個人のものかどうかを判定する話者認識方法に関するものである。
【０００２】
【従来の技術】
発話された音声が特定の個人のものかどうかを判定する話者認識装置では、指定された利用者の登録音声特徴データと発話された音声の特徴データから尤度を計算し、あらかじめ設定された閾値と比較することによって判定を行なっている。尤度が閾値より大きい場合は指定された本人として受理し、小さい場合は詐称者として棄却するのである。
【０００３】
しかし、閾値を厳しく（すなわち大きく）設定すれば正しい本人が棄却される誤り（ＦＲ：ＦａｌｓｅＲｅｊｅｃｔｉｏｎ）が多くなり、閾値を緩く（すなわち小さく）すれば詐称者が受理されてしまう誤り（ＦＡ：ＦａｌｓｅＡｃｃｅｐｔａｎｃｅ）が多くなるという問題があり、閾値を適切な値に設定することが重要になってくる。文献：特開平９−１９８０８６「話者認識用しきい値設定方法及びこの方法を用いた話者認識装置」では、話者モデルの学習過程で閾値を設定する方式が述べられている。
【０００４】
【発明が解決しようとする課題】
しかしながら、上記の方法に代表されるように１つの閾値で受理・棄却を決定する方式では、いかに閾値を適切に設定してもその精度をさらに高めようとすると登録時や認識時の発声時間を長くしたり、発声回数を多くしたりする必要があり、すべての利用者に対する負担が増大するという問題点があった。
【０００５】
例えば、『Ｊ．Ｌ．Ｇａｕｖａｉｎ， ”ＥｘｐｅｒｉｍｅｎｔｓｗｉｔｈＳｐｅａｋｅｒＶｅｒｉｆｉｃａｔｉｏｎｏｖｅｒｔｈｅＴｅｌｅｐｈｏｎｅ”，ＥＵＲＯＳＰＥＥＣＨ ’９５，１９９５』では、認識時の発声を長くするほど、また、発声回数を多くするほど精度が向上する（すなわち、上述したＦＲとＦＡが等しくなる誤り率ＥＥＲ：ＥｑｕａｌＥｒｒｏｒＲａｔｅが小さくなる）ことが報告されている。
【０００６】
本発明は、前記従来の問題点を解決し、話者認識のための閾値設定時に利用者の負担を軽減することのできる話者認識方法を提供することを目的とする。
【０００７】
そのために、第１発明の話者認識方法においては、第１の認識ステップとして、利用者に対して負担の軽い第１の課題として発話を実行させ、利用者ごとにTA＞TBとなるような２つの閾値TA、TBを設け、発音された音声の特徴データと指定された利用者のデータベースに格納された特徴データから計算された尤度が閾値TAより大きい場合はその利用者を本人として受理し、前記尤度が閾値TBより小さい場合は詐称者として棄却し、閾値TAとTBの間にある場合に、より精緻な認識のために第１の課題よりも利用者に対して負担の重い第２の課題としての発話を実行させ、閾値Tを用いて本人であるか否かを検証する第２の認識ステップを有する話者認識方法であって、閾値TAを第１の課題を実施したときの詐称者受理率が第２の認識ステップにおける詐称者受理率と等しくなるように設定すると共に、閾値TBを第１の課題を実施したときの本人棄却率が第２の認識ステップにおける本人棄却率と等しくなるように設定したことを特徴とする。
【０００９】
【発明の実施の形態】
以下、本発明の実施の形態について、図面を参照しながら詳細に説明する。
【００１０】
［第１の実施形態の説明］
［構成の説明］
図１に第１の実施形態における話者認識方法を実現する装置のブロック図を示す。
【００１１】
特徴量分析部１１は、学習・認識制御部１２と接続されており、入力された音声データの分析結果を出力する。学習・認識制御部１２は、特徴量保存部１３あるいは尤度計算部１４に選択的に接続され、特徴量分析部１１から得られたデータをいずれかに出力する。特徴量保存部１３は、尤度計算部１４および閾値設定部１５に接続されており、学習・認識制御部１２から出力されたＩＤに応じた特徴データを出力する。尤度計算部１４は、尤度計算部（１）、（２）から成り、各が第１判定部１６、第２判定部１７に接続されており、学習・認識制御部１２から出力されたデータおよび特徴量保存部１３から出力されたデータをもとに尤度を計算し、いずれかに出力する。閾値設定部１５は、第１判定部１６および第２判定部１７に接続されており、必要に応じて閾値を出力する。第１判定部１６は、結果出力および学習・認識制御部１２に接続されており、尤度計算部１４および閾値設定部１５から出力されたデータをもとに結果を判定し、出力するか、あるいは学習・認識制御部１２に処理を戻す。第２判定部１７は、結果出力に接続されており、尤度計算部１４および閾値設定部１５から出力されたデータをもとに結果を判定し、出力する。
【００１２】
［動作の説明］
まず、利用者を登録（学習）する時の処理の流れを図２を用いて説明する。
登録（学習）時は、学習・認識制御部１２のスイッチは特徴量保存部１３に接続される（Ｓ１００）。利用者のＩＤと音声が入力されると、特徴量分析部１１は、入力された音声を分析し、個人性の特徴データを抽出して、学習・認識制御部１２に出力する（Ｓ１０１，１０２）。学習・認識制御部１２は、入力された特徴データとＩＤを特徴量保存部１３に出力する。特徴量保存部１３は入力されたＩＤと特徴データを対応させて格納する（Ｓ１０３）。これで、１つのＩＤに対応する利用者の登録処理は完了であり、利用者人数に応じて上記登録処理を繰り返す。
【００１３】
次に、利用者が本人かどうかを認識する時の処理の流れを図３を用いて説明する。
【００１４】
利用者は負担の小さい（例えば、発声時間が短い、あるいは発声回数が少ない）課題（課題１）を実行する。この時、学習・認識制御部１２のスイッチは、尤度計算部（１）１４に接続される（Ｓ２００）。利用者のＩＤと課題１の音声が入力されると特徴量分析部１１は、入力された音声を分析し、個人性の特徴データを抽出して、学習・認識制御部１２に出力する（Ｓ２０１，２０２）。学習・認識制御部１２は、入力されたＩＤおよび特徴データを尤度計算部（１）１４に出力する。尤度計算部（１）１４は、特徴量保存部から入力されたＩＤに対応する特徴データを検索し、学習・認識制御部１２から出力された特徴データと比較して尤度を計算し、第１判定部１６に出力する（Ｓ２０３，２０４）。ここで、尤度は値が大きいほど特徴データが類似していることを表わすものとする。第１判定部は、出力された尤度と閾値設定部１５で設定された閾値ＴＡ、ＴＢ（ＴＡ＞ＴＢ：設定方法については後述する）を比較し、尤度がＴＡより大きい場合は利用者をＩＤの本人として受理し、処理を終了する（Ｓ２０５，２１３）。尤度がＴＢより小さい場合は、利用者をＩＤの詐称者として棄却し、処理を終了する（Ｓ２０６、２１４）。それ以外の場合は学習・認識制御部１２に通知し、利用者の負担が大きい（例えば、発声時間が長い、あるいは発声回数が多い）課題（課題２）を実行する。
【００１５】
課題２では、学習・認識制御部１２のスイッチは尤度計算部（２）１４に接続される（Ｓ２０７）。課題２の音声が入力されると、課題１の場合と同様に特徴量抽出部１１で抽出された特徴データは学習・認識制御部１２を経て、ＩＤとともに今度は尤度計算部（２）１４に出力される（Ｓ２０８，２０９）。尤度計算部（２）１４は、入力されたＩＤに対する特徴データを検索して尤度を計算し、第２判定部１７に出力する（Ｓ２１０，２１１）。第２判定部１７は、出力された尤度と閾値設定部１５で設定された閾値Ｔ（設定方法については後述する）を比較し、尤度がＴより大きい場合は利用者をＩＤの本人として受理、それ以外の場合は利用者をＩＤの詐称者として棄却し、処理を終了する（Ｓ２１２、２１４）。
【００１６】
次に、閾値設定部１５での閾値の設定方法の例を説明する。
【００１７】
閾値設定部１５は、特徴量保存部１３に格納されている利用者の特徴データをもとに利用者のＩＤ毎に３つの閾値Ｔ、ＴＡ、ＴＢを設定する。Ｔは、前記文献：特開平９−１９８０８６で示されているように学習に用いた音声データ（特徴データ）を利用して求めることができる。ただし、ここでは課題２に対応した音声データ（特徴データ）を使用する。また、この文献では本人棄却率と詐称者受理率が等しくなるように閾値Ｔを設定しているが、どちらかが小さく（良く）なるように設定してもよい。
【００１８】
ＴＡとＴＢは、本人棄却率、詐称者受理率、課題１での判定率（正誤にかかわらず課題１で受理・棄却の判定ができる比率）に依存して設定する。ここでは、課題１の本人棄却率、詐称者受理率を課題２の本人棄却率、詐称者受理率と等しくする場合の設定方法を図４を用いて説明する。課題２において閾値をＴとした時の詐称者受理率をＥＡ、本人棄却率をＥＢとする。一般に、利用者負担の小さい（発声時間が短い、発声回数が少ない）課題ほど、尤度の信頼性が低くなり、認識率は悪くなる。すなわち、本人棄却率、詐称者受理率ともに課題１の曲線（点線）は課題２の曲線（実線）より上に（認識率の悪いほうに）あることになり、閾値をＴとした時の詐称者受理率、本人棄却率はそれぞれＥＡ、ＥＢより大きく（悪く）なる。ここで、課題１に対して、閾値をＴより大きくあるいは小さくしていくと、詐称者受理率がＥＡになる点、本人棄却率がＥＢになる点が存在しうる。この時の閾値をそれぞれＴＡおよび、ＴＢとして設定する。ただし、課題１の認識率によっては、詐称者受理率がＥＡになる閾値あるいは本人棄却率がＥＢになる閾値が存在しないことがある。その場合は、存在するほうの閾値だけを有効にするか、課題を変更する。
【００１９】
以上説明したように、第１の実施形態によれば、利用者負担の小さい課題１に対して２つの閾値ＴＡ、ＴＢ（ＴＡ＞ＴＢ）を用意し、尤度がＴＡより大きい場合は利用者をＩＤの本人として受理し、ＴＢより小さい場合は利用者をＩＤの詐称者として棄却することによって、利用者の多くは負担の小さい課題で受理・棄却の判定を行なうことができる。この時、課題および閾値によっては、利用者負担の大きい課題２と同等の本人棄却率、詐称者受理率を維持できる。
【００２０】
［第２の実施形態の説明］
［構成の説明］
前述の実施形態１では、課題１、課題２ともに音声の特徴データで認証する方式を説明したが、課題２は音声の特徴データを用いない手段も考えられる。ここでは、課題２としてパスワードを入力する場合の例を説明する。
【００２１】
図５に第２の実施形態における話者認識方法を実現する装置のブロック図を示す。
【００２２】
課題１の処理を行なう特徴量分析部１１、学習・認識制御部１２、特徴量保存部１３、尤度計算部１４、閾値設定部１５、第１判定部１６は、次の点を除いて実施形態１と同様である。学習・認識制御部１２は、特徴量保存部１３、尤度計算部１４の他に第２制御部１９に接続されており、課題１で判定できなかった場合にＩＤを出力し、制御を渡す。また、第２制御部１９は、パスワード保存部１８および第２判定部１７に選択的に接続され、入力されたパスワードおよびＩＤを出力する。パスワード保存部１８は、第２判定部１７と接続されており、必要に応じて第２制御部１９から出力されたＩＤに対応したパスワードを出力する。第２判定部１７は、結果出力に接続されており、第２制御部１７およびパスワード保存部１８から出力されたＩＤおよびパスワードから結果を判定し、出力する。
【００２３】
［動作の説明］
音声の特徴データの学習は実施形態１と同様である。
パスワードの学習時は、第２制御部１９のスイッチはパスワード保存部１８に接続される。ＩＤおよびパスワードが入力されると、第２制御部１９はそれらをパスワード保存部１８に出力する。パスワード保存部１８はＩＤとパスワードを対応させて格納する。これで１つのＩＤに対する学習処理は終了である。
【００２４】
次に、認識時の処理の流れを説明する。
課題１までは実施形態１と同様である。
課題２では、学習・認識制御部１２は第２制御部１９にＩＤを出力し、制御を渡す。課題２としてパスワードが入力されると、第２制御部１９はそのパスワードと学習・認識制御部１２から出力されたＩＤを第２判定部１７に出力する。第２判定部１７は、第２制御部１９から出力されたＩＤに対応するパスワードをパスワード保存部１８から検索し、第２制御部１９から出力されたパスワードと比較し、同じ場合は利用者をＩＤの本人として受理、それ以外の場合は利用者をＩＤの詐称者として棄却し、処理を終了する。
【００２５】
以上説明したように、第２の実施形態によれば、課題２として音声の特徴データを用いない手段をとることができ、パスワードなどの他の認証手段と容易に融合することができる。
【００２６】
【発明の効果】
以上、詳細に説明したように、第１の発明によれば、第１の認識ステップとして、利用者に対して負担の軽い第１の課題として発話を実行させ、利用者ごとにTA＞TBとなるような２つの閾値TA、TBを設け、発音された音声の特徴データと指定された利用者のデータベースに格納された特徴データから計算された尤度が閾値TAより大きい場合はその利用者を本人として受理し、閾値TBより小さい場合は詐称者として棄却し、閾値TAとTBの間にある場合に、より精緻な認識のために第１の課題よりも利用者に対して負担の重い第２の課題としての発話を実行させ、閾値Tを用いて本人であるか否かを検証する第２の認識ステップを有する話者認識方法であって、閾値TAを第１の課題を実施したときの詐称者受理率が第２の認識ステップにおける詐称者受理率と等しくなるように設定すると共に、閾値TBを第１の課題を実施したときの本人棄却率が前記第２の認識ステップにおける本人棄却率と等しくなるように設定した構成としたので、利用者の多くは負担の小さい課題で受理・棄却の判定を行なうことができる。
【図面の簡単な説明】
【図１】第１の実施形態における話者認識方法を実現するための装置のブロック図である。
【図２】利用者登録時の処理の流れ図である。
【図３】第１の実施形態における認識時の処理の流れ図である。
【図４】閾値設定のための説明図である。
【図５】第２の実施形態における話者認識方法を実現するための装置のブロック図である。
【符号の説明】
１１特徴量分析部
１２学習・認識制御部
１３特徴量保存部
１４尤度計算部
１５閾値設定部
１６第１判定部
１７第２判定部
１８パスワード保存部
１９第２制御部
[0001]
[Technical field of the invention]
The present invention relates to a speaker recognition method for determining whether a spoken voice belongs to a specific individual.
[0002]
[Prior art]
A speaker recognition device that determines whether a spoken voice belongs to a specific individual calculates a likelihood from the registered voice feature data of the designated user and the feature data of the spoken voice, and makes its determination by comparing that likelihood with a preset threshold. If the likelihood is greater than the threshold, the speaker is accepted as the designated person; if it is smaller, the speaker is rejected as an impostor.
[0003]
However, if the threshold is set strictly (that is, high), errors in which the genuine person is rejected (FR: False Rejection) increase, and if the threshold is set loosely (that is, low), errors in which an impostor is accepted (FA: False Acceptance) increase. Setting the threshold to an appropriate value is therefore important. The document Japanese Patent Laid-Open No. 9-198086, "Speaker recognition threshold setting method and speaker recognition apparatus using this method", describes a method of setting the threshold during the learning process of a speaker model.
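The trade-off between FR and FA can be illustrated with a minimal sketch. The function names and the likelihood scores below are hypothetical and chosen only for illustration; the source does not specify how a likelihood exactly equal to the threshold is handled, so treating it as a rejection is an assumption.

```python
def false_rejection_rate(genuine_scores, threshold):
    """Fraction of genuine-person likelihoods that are not above the threshold."""
    return sum(s <= threshold for s in genuine_scores) / len(genuine_scores)


def false_acceptance_rate(impostor_scores, threshold):
    """Fraction of impostor likelihoods that are above the threshold."""
    return sum(s > threshold for s in impostor_scores) / len(impostor_scores)


# Hypothetical likelihood scores, made up for illustration only.
genuine = [0.9, 0.8, 0.7, 0.6, 0.4]
impostor = [0.5, 0.3, 0.2, 0.2, 0.1]

# A loose threshold (0.25) and a strict threshold (0.65).
for threshold in (0.25, 0.65):
    fr = false_rejection_rate(genuine, threshold)
    fa = false_acceptance_rate(impostor, threshold)
    print(f"threshold={threshold}: FR={fr:.1f}, FA={fa:.1f}")
```

With these made-up scores, the loose threshold produces no false rejections but accepts some impostors, while the strict threshold does the reverse.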
[0004]
[Problems to be solved by the invention]
However, in schemes that decide acceptance or rejection with a single threshold, as typified by the above method, no matter how appropriately the threshold is set, any further improvement in accuracy requires a longer utterance time or a larger number of utterances at registration or recognition, which increases the burden on all users.
[0005]
For example, J. L. Gauvain, "Experiments with Speaker Verification over the Telephone", EUROSPEECH '95, 1995, reports that accuracy improves as the utterance at recognition time becomes longer and as the number of utterances increases (that is, the EER: Equal Error Rate, the error rate at which the FR and FA described above are equal, becomes smaller).
[0006]
An object of the present invention is to solve the above-mentioned conventional problems and to provide a speaker recognition method capable of reducing a user's burden when setting a threshold for speaker recognition.
[0007]
To that end, the speaker recognition method of the first invention has a first recognition step and a second recognition step. In the first recognition step, the user is made to utter speech as a first task that places a light burden on the user; two thresholds TA and TB satisfying TA > TB are provided for each user; a likelihood is calculated from the feature data of the uttered speech and the feature data stored in the database for the designated user; if the likelihood is greater than the threshold TA, the user is accepted as the genuine person, and if the likelihood is smaller than the threshold TB, the user is rejected as an impostor. In the second recognition step, performed when the likelihood lies between the thresholds TA and TB, the user is made to utter speech as a second task that places a heavier burden on the user than the first task, for more precise recognition, and a threshold T is used to verify whether the user is the genuine person. The method is characterized in that the threshold TA is set so that the impostor acceptance rate when the first task is performed equals the impostor acceptance rate in the second recognition step, and the threshold TB is set so that the genuine-person rejection rate when the first task is performed equals the genuine-person rejection rate in the second recognition step.
[0009]
[Embodiments of the invention]
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
[0010]
[Description of First Embodiment]
[Description of configuration]
FIG. 1 shows a block diagram of an apparatus for realizing the speaker recognition method according to the first embodiment.
[0011]
The feature quantity analysis unit 11 is connected to the learning / recognition control unit 12 and outputs the analysis result of the input voice data. The learning / recognition control unit 12 is selectively connected to either the feature quantity storage unit 13 or the likelihood calculation unit 14 and outputs the data obtained from the feature quantity analysis unit 11 to one of them. The feature quantity storage unit 13 is connected to the likelihood calculation unit 14 and the threshold setting unit 15 and outputs the feature data corresponding to the ID output from the learning / recognition control unit 12. The likelihood calculation unit 14 consists of likelihood calculation units (1) and (2), which are connected to the first determination unit 16 and the second determination unit 17 respectively; it calculates a likelihood from the data output by the learning / recognition control unit 12 and the data output by the feature quantity storage unit 13 and outputs it to one of the determination units. The threshold setting unit 15 is connected to the first determination unit 16 and the second determination unit 17 and outputs thresholds as required. The first determination unit 16 is connected to the result output and to the learning / recognition control unit 12; it determines the result from the data output by the likelihood calculation unit 14 and the threshold setting unit 15 and either outputs it or returns processing to the learning / recognition control unit 12. The second determination unit 17 is connected to the result output; it determines the result from the data output by the likelihood calculation unit 14 and the threshold setting unit 15 and outputs it.
[0012]
[Description of operation]
First, the flow of processing when registering (learning) a user will be described with reference to FIG. 2.
At the time of registration (learning), the switch of the learning / recognition control unit 12 is connected to the feature quantity storage unit 13 (S100). When the user's ID and voice are input, the feature quantity analysis unit 11 analyzes the input voice, extracts speaker-specific feature data, and outputs it to the learning / recognition control unit 12 (S101, 102). The learning / recognition control unit 12 outputs the input feature data and ID to the feature quantity storage unit 13. The feature quantity storage unit 13 stores the input ID and feature data in association with each other (S103). This completes the registration process for the user corresponding to one ID; the registration process is repeated for as many users as there are.
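The registration flow can be sketched as follows. This is a minimal illustration, not the patent's implementation: the source does not say which features are extracted, so `extract_features` is a hypothetical placeholder, and the dictionary `store` stands in for the feature quantity storage unit 13.

```python
from typing import Dict, List

# Stand-in for the feature quantity storage unit 13: user ID -> feature data.
FeatureStore = Dict[str, List[float]]


def extract_features(samples: List[float]) -> List[float]:
    """Placeholder for unit 11: mean and peak magnitude of the samples.

    A real system would extract speaker-specific features; this stand-in
    exists only so that the sketch runs.
    """
    mean = sum(samples) / len(samples)
    peak = max(abs(x) for x in samples)
    return [mean, peak]


def enroll(store: FeatureStore, user_id: str, samples: List[float]) -> None:
    """S101 to S103: analyze the voice and store the features under the ID."""
    store[user_id] = extract_features(samples)


# Registration is repeated once per user.
store: FeatureStore = {}
enroll(store, "user-001", [0.5, -1.0, 0.5])
enroll(store, "user-002", [1.0, 1.0, 2.0])
```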
[0013]
Next, the flow of processing when recognizing whether a user is the genuine person will be described with reference to FIG. 3.
[0014]
The user performs a task with a small burden (for example, a short utterance time or a small number of utterances) (task 1). At this time, the switch of the learning / recognition control unit 12 is connected to the likelihood calculation unit (1) 14 (S200). When the user's ID and the voice for task 1 are input, the feature quantity analysis unit 11 analyzes the input voice, extracts speaker-specific feature data, and outputs it to the learning / recognition control unit 12 (S201, 202). The learning / recognition control unit 12 outputs the input ID and feature data to the likelihood calculation unit (1) 14. The likelihood calculation unit (1) 14 retrieves the feature data corresponding to the input ID from the feature quantity storage unit, compares it with the feature data output from the learning / recognition control unit 12 to calculate the likelihood, and outputs the likelihood to the first determination unit 16 (S203, 204). Here, a larger likelihood value indicates that the feature data are more similar. The first determination unit compares the output likelihood with the thresholds TA and TB set by the threshold setting unit 15 (TA > TB; the setting method is described later). If the likelihood is greater than TA, the user is accepted as the genuine owner of the ID and the process ends (S205, 213). If the likelihood is smaller than TB, the user is rejected as an impostor for the ID and the process ends (S206, 214). Otherwise, the learning / recognition control unit 12 is notified, and the user performs a task with a large burden (for example, a long utterance time or a large number of utterances) (task 2).
[0015]
In task 2, the switch of the learning / recognition control unit 12 is connected to the likelihood calculation unit (2) 14 (S207). When the voice for task 2 is input, the feature data extracted by the feature quantity analysis unit 11 passes through the learning / recognition control unit 12, as in task 1, and is this time output together with the ID to the likelihood calculation unit (2) 14 (S208, 209). The likelihood calculation unit (2) 14 retrieves the feature data for the input ID, calculates the likelihood, and outputs it to the second determination unit 17 (S210, 211). The second determination unit 17 compares the output likelihood with the threshold T set by the threshold setting unit 15 (the setting method is described later). If the likelihood is greater than T, the user is accepted as the genuine owner of the ID; otherwise, the user is rejected as an impostor for the ID, and the process ends (S212, 214).
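The two-stage decision described in paragraphs [0014] and [0015] can be sketched as below. The function name `verify_two_stage` and the callback `run_task2` are hypothetical; the callback stands for everything involved in collecting the task 2 utterance and computing its likelihood, so the heavier task is requested only when task 1 is inconclusive.

```python
from typing import Callable


def verify_two_stage(
    score_task1: float,
    run_task2: Callable[[], float],
    ta: float,
    tb: float,
    t: float,
) -> str:
    """Return "accept" or "reject" following steps S205 to S214."""
    if ta <= tb:
        raise ValueError("TA must be greater than TB")
    if score_task1 > ta:  # S205: accept on the light task alone
        return "accept"
    if score_task1 < tb:  # S206: reject on the light task alone
        return "reject"
    # Inconclusive: only now ask for the heavier task 2 (S207 to S211).
    score_task2 = run_task2()
    return "accept" if score_task2 > t else "reject"  # S212


# Hypothetical thresholds and likelihoods, for illustration only.
print(verify_two_stage(0.95, lambda: 0.0, ta=0.9, tb=0.3, t=0.6))
print(verify_two_stage(0.50, lambda: 0.7, ta=0.9, tb=0.3, t=0.6))
```

In the first call the task 2 callback is never invoked, which is the case where the user is spared the heavier task.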
[0016]
Next, an example of a threshold setting method in the threshold setting unit 15 will be described.
[0017]
The threshold setting unit 15 sets three thresholds T, TA, and TB for each user ID based on the user feature data stored in the feature quantity storage unit 13. T can be obtained from the voice data (feature data) used for learning, as shown in the above-mentioned document, Japanese Patent Laid-Open No. 9-198086. Here, however, voice data (feature data) corresponding to task 2 is used. In that document the threshold T is set so that the genuine-person rejection rate and the impostor acceptance rate are equal, but T may instead be set so that either one of them becomes smaller (better).
[0018]
TA and TB are set depending on the genuine-person rejection rate, the impostor acceptance rate, and the decision rate in task 1 (the proportion of cases in which acceptance or rejection can be decided in task 1, whether correctly or not). Here, a setting method for the case where the genuine-person rejection rate and impostor acceptance rate of task 1 are made equal to those of task 2 will be described with reference to FIG. 4. Let EA be the impostor acceptance rate and EB the genuine-person rejection rate in task 2 when the threshold is T. In general, the smaller the user burden of a task (the shorter the utterance time, the fewer the utterances), the less reliable the likelihood and the worse the recognition rate. That is, for both the genuine-person rejection rate and the impostor acceptance rate, the curve for task 1 (dotted line) lies above the curve for task 2 (solid line), on the side of worse recognition, so that with the threshold at T the impostor acceptance rate and genuine-person rejection rate of task 1 are larger (worse) than EA and EB respectively. If the threshold for task 1 is now raised above T or lowered below T, there may be a point at which the impostor acceptance rate becomes EA and a point at which the genuine-person rejection rate becomes EB. The thresholds at these points are set as TA and TB respectively. Depending on the recognition rate of task 1, however, a threshold at which the impostor acceptance rate becomes EA, or one at which the genuine-person rejection rate becomes EB, may not exist. In that case, either only the threshold that does exist is enabled, or the task is changed.
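One way to sketch this threshold search on sampled task 1 likelihoods is shown below. All names and scores are hypothetical. Because empirical error rates are step functions, "equal to EA" and "equal to EB" are approximated here by "the lowest TA whose impostor acceptance rate does not exceed EA" and "the highest TB whose genuine-person rejection rate does not exceed EB"; that approximation is an assumption of this sketch, not something stated in the source. With finite samples this search always returns some value, so the case where no suitable threshold exists is not modelled.

```python
def impostor_acceptance_rate(impostor_scores, threshold):
    """Fraction of impostor likelihoods accepted in task 1 (score > threshold)."""
    return sum(s > threshold for s in impostor_scores) / len(impostor_scores)


def genuine_rejection_rate(genuine_scores, threshold):
    """Fraction of genuine likelihoods rejected in task 1 (score < threshold)."""
    return sum(s < threshold for s in genuine_scores) / len(genuine_scores)


def choose_ta_tb(genuine1, impostor1, ea, eb):
    """Pick TA and TB for task 1 so its error rates do not exceed EA and EB."""
    # Candidate thresholds: every observed task 1 score.
    candidates = sorted(set(genuine1) | set(impostor1))
    ta = min(c for c in candidates
             if impostor_acceptance_rate(impostor1, c) <= ea)
    tb = max(c for c in candidates
             if genuine_rejection_rate(genuine1, c) <= eb)
    return ta, tb


# Hypothetical task 1 likelihoods and task 2 error rates (EA, EB).
genuine1 = [0.9, 0.8, 0.7, 0.6, 0.5, 0.5, 0.4, 0.4, 0.3, 0.2]
impostor1 = [0.6, 0.5, 0.4, 0.4, 0.3, 0.3, 0.2, 0.2, 0.1, 0.1]
ta, tb = choose_ta_tb(genuine1, impostor1, ea=0.1, eb=0.1)
print(ta, tb)
```

A caller would still need to check that the resulting TA is greater than TB before using the pair, and could measure the task 1 decision rate as the fraction of scores falling outside the interval between TB and TA.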
[0019]
As described above, according to the first embodiment, two thresholds TA and TB (TA > TB) are prepared for task 1, which places a small burden on the user; the user is accepted as the genuine owner of the ID if the likelihood is greater than TA and rejected as an impostor for the ID if it is smaller than TB. As a result, acceptance or rejection can be decided for most users with the low-burden task. Depending on the task and the thresholds, the same genuine-person rejection rate and impostor acceptance rate as task 2, which places a large burden on the user, can be maintained.
[0020]
[Description of Second Embodiment]
[Description of configuration]
The first embodiment described above authenticates the user with voice feature data in both task 1 and task 2, but task 2 may also use a means other than voice feature data. Here, an example in which a password is entered as task 2 will be described.
[0021]
FIG. 5 shows a block diagram of an apparatus for realizing the speaker recognition method according to the second embodiment.
[0022]
The feature quantity analysis unit 11, the learning / recognition control unit 12, the feature quantity storage unit 13, the likelihood calculation unit 14, the threshold setting unit 15, and the first determination unit 16, which perform the processing of task 1, are the same as in the first embodiment except for the following points. The learning / recognition control unit 12 is connected to the second control unit 19 in addition to the feature quantity storage unit 13 and the likelihood calculation unit 14; when a decision cannot be made in task 1, it outputs the ID and passes control to the second control unit 19. The second control unit 19 is selectively connected to the password storage unit 18 and the second determination unit 17 and outputs the entered password and the ID. The password storage unit 18 is connected to the second determination unit 17 and outputs, as required, the password corresponding to the ID output from the second control unit 19. The second determination unit 17 is connected to the result output; it determines the result from the ID and passwords output from the second control unit 19 and the password storage unit 18 and outputs it.
[0023]
[Description of operation]
The learning of voice feature data is the same as in the first embodiment.
When the password is learned, the switch of the second control unit 19 is connected to the password storage unit 18. When the ID and password are input, the second control unit 19 outputs them to the password storage unit 18. The password storage unit 18 stores the ID and the password in association with each other. This completes the learning process for one ID.
[0024]
Next, the flow of processing during recognition will be described.
The processing up to and including task 1 is the same as in the first embodiment.
In task 2, the learning / recognition control unit 12 outputs the ID to the second control unit 19 and passes control to it. When a password is entered as task 2, the second control unit 19 outputs that password and the ID output from the learning / recognition control unit 12 to the second determination unit 17. The second determination unit 17 retrieves the password corresponding to the ID output from the second control unit 19 from the password storage unit 18 and compares it with the password output from the second control unit 19. If they are the same, the user is accepted as the genuine owner of the ID; otherwise, the user is rejected as an impostor for the ID, and the process ends.
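The password check used as task 2 can be sketched as below. The names are hypothetical, and the dictionary stands in for the password storage unit 18. The source only says the two passwords are compared for equality; storing the password in plain text and using the standard library's `hmac.compare_digest` for the comparison are choices made for this sketch.

```python
import hmac
from typing import Dict


def verify_password_task(
    password_store: Dict[str, str], user_id: str, entered: str
) -> str:
    """Task 2 of the second embodiment: compare with the stored password."""
    stored = password_store.get(user_id)
    if stored is None:
        # No password registered for this ID, so the claim cannot be verified.
        return "reject"
    # Constant-time comparison of the stored and entered passwords.
    same = hmac.compare_digest(stored.encode("utf-8"), entered.encode("utf-8"))
    return "accept" if same else "reject"


# Hypothetical registered password for one ID.
passwords = {"user-001": "open-sesame"}
print(verify_password_task(passwords, "user-001", "open-sesame"))
print(verify_password_task(passwords, "user-001", "guess"))
```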
[0025]
As described above, according to the second embodiment, a means that does not use voice feature data can be adopted as task 2, so the method can easily be combined with other authentication means such as passwords.
[0026]
[Effects of the invention]
As described above in detail, the first invention is a speaker recognition method having a first recognition step and a second recognition step. In the first recognition step, the user is made to utter speech as a first task that places a light burden on the user; two thresholds TA and TB satisfying TA > TB are provided for each user; if the likelihood calculated from the feature data of the uttered speech and the feature data stored in the database for the designated user is greater than the threshold TA, the user is accepted as the genuine person, and if it is smaller than the threshold TB, the user is rejected as an impostor. In the second recognition step, performed when the likelihood lies between the thresholds TA and TB, the user is made to utter speech as a second task that places a heavier burden on the user than the first task, for more precise recognition, and a threshold T is used to verify whether the user is the genuine person. The threshold TA is set so that the impostor acceptance rate when the first task is performed equals the impostor acceptance rate in the second recognition step, and the threshold TB is set so that the genuine-person rejection rate when the first task is performed equals the genuine-person rejection rate in the second recognition step. With this configuration, acceptance or rejection can be decided for most users with the low-burden task.
[Brief description of the drawings]
FIG. 1 is a block diagram of an apparatus for realizing a speaker recognition method according to a first embodiment.
FIG. 2 is a flowchart of processing at the time of user registration.
FIG. 3 is a flowchart of processing during recognition in the first embodiment.
FIG. 4 is an explanatory diagram for setting a threshold value.
FIG. 5 is a block diagram of an apparatus for realizing a speaker recognition method according to a second embodiment.
[Explanation of symbols]
11 Feature quantity analysis unit
12 Learning / recognition control unit
13 Feature quantity storage unit
14 Likelihood calculation unit
15 Threshold setting unit
16 First determination unit
17 Second determination unit
18 Password storage unit
19 Second control unit

Claims

A first recognition step in which the user is made to utter speech as a first task that places a light burden on the user, two thresholds TA and TB satisfying TA > TB are provided for each user, a likelihood is calculated from the feature data of the uttered speech and the feature data stored in the database for the designated user, the user is accepted as the genuine person if the likelihood is greater than the threshold TA, and the user is rejected as an impostor if the likelihood is smaller than the threshold TB; and
a second recognition step in which, when the likelihood lies between the thresholds TA and TB in the first recognition step, the user is made to utter speech as a second task that places a heavier burden on the user than the first task, for more precise recognition, and a threshold T is used to verify whether the user is the genuine person; a speaker recognition method having these steps,
wherein the threshold TA is set so that the impostor acceptance rate when the first task is performed equals the impostor acceptance rate in the second recognition step, and the threshold TB is set so that the genuine-person rejection rate when the first task is performed equals the genuine-person rejection rate in the second recognition step.