JP2005084613A - Speaker matching device - Google Patents

Speaker matching device

Info

Publication number
JP2005084613A
Authority
JP
Japan
Prior art keywords
pattern
unit
user
person
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2003319651A
Other languages
Japanese (ja)
Other versions
JP4232961B2 (en)
Inventor
Tsuneo Kato
恒夫 加藤
Toru Shimizu
徹 清水
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
KDDI Corp
Original Assignee
KDDI Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by KDDI Corp filed Critical KDDI Corp
Priority to JP2003319651A priority Critical patent/JP4232961B2/en
Publication of JP2005084613A publication Critical patent/JP2005084613A/en
Application granted granted Critical
Publication of JP4232961B2 publication Critical patent/JP4232961B2/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Abstract

PROBLEM TO BE SOLVED: To provide a speaker verification device with an improved rate (verification rate) of correctly determining whether a speaker is the registered person or an impostor.
SOLUTION: A verification pattern creation unit 1 creates a connected-digit verification pattern by discarding digits for which it is difficult to tell the registered person from an impostor, based on the evaluation results of a per-digit score evaluation unit 10, replacing them with new digits, and presents the pattern to the user. A pattern-dependent accuracy prediction / threshold setting unit 9 predicts accuracy and sets a threshold based on the digits composing the verification pattern. The per-digit score evaluation unit 10 evaluates the similarities computed digit by digit between the speech uttered by the user and previously registered speech, using per-digit accuracy evaluation functions. A speaker verification decision unit 8 judges the similarity for the verification pattern using the threshold set by the pattern-dependent accuracy prediction / threshold setting unit 9. Replacement in units of digit chains is also possible.
COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to a speaker verification device, and more particularly to a text-prompted speaker verification device that uses connected-digit patterns.

Speaker verification techniques for deciding from speech whether a claimant is the registered person or an impostor include text-prompted speaker verification, which uses verification patterns made of connected digits, and speaker verification that uses multiple verification patterns.

In text-prompted speaker verification using connected-digit verification patterns, a verification pattern consisting of connected digits is presented to the user, and speaker verification is performed by computing the similarity between the speech uttered by the user and previously registered speech. Because the system specifies a different verification pattern for every verification attempt, impersonation with a recording of the registered person's voice can be prevented.

In speaker verification using multiple verification patterns, when it is difficult to decide from a single utterance whether the speaker is the registered person or an impostor, a different verification pattern is presented to prompt a further utterance. Since the decision can then be based on several utterances (more speech information), the rate at which the registered person and impostors are correctly identified (the verification rate) can be raised.

FIG. 8 is a block diagram of a conventional speaker verification device. This device uses verification patterns made of connected digits and, to raise the rate at which the registered person and impostors are correctly identified (the verification rate), uses multiple verification patterns. The verification pattern creation unit 1 creates a verification pattern consisting of connected digits, presents it to the user, and prompts the user to utter it. When the user speaks (inputs speech) according to the presented verification pattern, the speech recognition unit 2 recognizes the utterance content using the speech recognition model 3.

The recognition result collation unit 4 determines whether the result recognized by the speech recognition unit 2 matches the verification pattern. If the user mistakenly utters something different from the verification pattern, the recognition result is rejected. Likewise, even if a recording of the registered person's voice is played back to impersonate that person, the recognition result almost never matches the verification pattern, so it is rejected.

If the recognition result matches the verification pattern, the genuine-speaker score calculation unit 5 calculates a score (similarity) between the verification utterance and the registered person's speech previously enrolled as the registered speaker model 6. When the score is calculated, the background speaker model 7 is used to remove the influence of background speakers. Typical scoring methods for the genuine-speaker score calculation unit 5 include the HMM (Hidden Markov Model) method, the GMM (Gaussian Mixture Model) method, and DP matching.
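
As a concrete illustration of the GMM-based scoring with background-model normalization mentioned above, the following sketch computes the score of one digit segment as the average frame log-likelihood under a registered-speaker GMM minus that under a background-speaker GMM. It is not taken from the patent: the function names, the diagonal-covariance parameterization, and the random demo data are assumptions made for illustration.

import numpy as np

def gmm_loglik(frames, weights, means, variances):
    # Average per-frame log-likelihood of a diagonal-covariance GMM.
    # frames: (T, D) feature vectors; weights: (M,); means, variances: (M, D).
    T, D = frames.shape
    diff = frames[:, None, :] - means[None, :, :]                 # (T, M, D)
    maha = np.sum(diff ** 2 / variances[None, :, :], axis=2)      # (T, M)
    log_norm = -0.5 * (D * np.log(2 * np.pi) + np.sum(np.log(variances), axis=1))
    log_comp = np.log(weights)[None, :] + log_norm[None, :] - 0.5 * maha
    m = log_comp.max(axis=1, keepdims=True)                       # log-sum-exp over mixtures
    ll = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    return ll.mean()

def digit_score(frames, speaker_gmm, background_gmm):
    # Per-digit score S_d: log-likelihood ratio of the registered speaker
    # model against the background speaker model for one digit segment.
    return gmm_loglik(frames, *speaker_gmm) - gmm_loglik(frames, *background_gmm)

# Toy demo with random 12-dimensional features and 4-mixture models.
rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 12))
spk = (np.full(4, 0.25), rng.normal(size=(4, 12)), np.ones((4, 12)))
bg = (np.full(4, 0.25), rng.normal(size=(4, 12)), np.ones((4, 12)))
print(digit_score(frames, spk, bg))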

The speaker verification decision unit 8 evaluates the score obtained by the genuine-speaker score calculation unit 5 against two thresholds, one for accepting the registered person and one for rejecting impostors. Writing the score obtained by the genuine-speaker score calculation unit 5 as S, the acceptance threshold as S_acc, and the rejection threshold as S_rej: if S >= S_acc, the user is accepted as the registered person; if S < S_rej, the user is rejected as not being the registered person; and if S_rej <= S < S_acc, it is judged difficult to tell the registered person from an impostor, and the verification pattern creation unit 1 is instructed to present the user with a verification pattern for re-verification. For re-verification, the enrolled digits are concatenated at random to create a new verification pattern.

FIG. 9 shows the accept / reject / re-utterance regions of the decision made with the two thresholds S_acc and S_rej, with the score S on the horizontal axis and the score distribution (frequency) on the vertical axis. As the figure shows, when S < S_rej the user can be regarded as an impostor, and when S >= S_acc the user can be regarded as the registered person. When S_rej <= S < S_acc, it cannot be decided with high confidence whether the user is the registered person or an impostor, so the user is prompted to speak again.
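
In code, the two-threshold decision described above amounts to the following (a minimal sketch; the function and label names are not from the patent):

def decide(score, s_acc, s_rej):
    # Accept as the registered person, reject as an impostor, or request
    # another utterance when the score falls between the two thresholds.
    if score >= s_acc:
        return "accept"
    if score < s_rej:
        return "reject"
    return "reprompt"  # S_rej <= S < S_acc: present a new verification pattern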

Text-prompted speaker verification and speaker verification using multiple verification patterns are described in Non-Patent Documents 1 to 4 below.
[Non-Patent Document 1] Matsui, Furui: "Text-prompted speaker recognition," IEICE Transactions D-II, Vol. J79-D-II, No. 5, pp. 647-656 (1996)
[Non-Patent Document 2] 内部, Kuroiwa, Higuchi: "A study of a speaker verification method using digits," IEICE Technical Report SP98-68, pp. 1-8 (1998)
[Non-Patent Document 3] Kato, Shimizu: "Improvement of speaker verification accuracy by a connected-digit pattern prompting method," Proc. Autumn Meeting of the Acoustical Society of Japan, pp. 129-130 (2002)
[Non-Patent Document 4] Kato, Shimizu: "Improvement of speaker verification accuracy on time-separated data by a connected-digit pattern prompting method," Proc. Spring Meeting of the Acoustical Society of Japan, pp. 105-106 (2003)

As described above, when the speaker verification decision unit of the conventional speaker verification device finds that S_rej <= S < S_acc, it instructs the verification pattern creation unit 1 to present the user with a verification pattern for re-verification, and the verification pattern creation unit 1 again creates a connected-digit pattern and prompts the user for a second utterance. However, because the verification pattern created for the second utterance is obtained merely by concatenating digits at random, presenting such a pattern and prompting the user for a second or third utterance cannot be expected to raise the rate of correctly identifying the registered person and impostors (the verification rate) sufficiently.

An object of the present invention is to automatically create, according to the characteristics of the user's utterances, verification patterns for which a sufficiently high rate of correctly identifying the registered person and impostors (verification rate) can be expected, and thereby to improve that verification rate.

To solve the above problem, the present invention provides a speaker verification device that presents a verification pattern consisting of connected digits to a user and performs speaker verification by computing the similarity between speech uttered by the user and previously registered speech, the device comprising: a pattern-dependent accuracy prediction / threshold setting unit that predicts accuracy and sets a threshold based on the digits (or digit chains) composing the verification pattern; a per-digit score evaluation unit that evaluates the similarity computed for each digit (or digit chain) between the speech uttered by the user and the previously registered speech, using an accuracy evaluation function for each digit (or digit chain); a verification pattern creation unit that, based on the evaluation results of the per-digit score evaluation unit, creates a new connected-digit verification pattern by replacing digits (or digit chains) for which it is difficult to tell the registered person from an impostor with other digits (or digit chains); and a decision unit that uses the threshold set by the pattern-dependent accuracy prediction / threshold setting unit to judge the similarity computed for the verification pattern between the speech uttered by the user and the previously registered speech.

Here, the pattern-dependent accuracy prediction / threshold setting unit may perform the accuracy prediction and threshold setting based on score distributions of the genuine-speaker and impostor similarities obtained in advance for each digit (or digit chain) composing the verification pattern.

According to the present invention, the similarity between the speech uttered by the user and the previously registered speech is computed for each digit (or digit chain), and based on the evaluation of these similarities a new verification pattern is created by replacing the digits (or digit chains) for which it is difficult to tell the registered person from an impostor with other digits (or digit chains). When the user speaks again following the specified verification pattern, a sufficiently high rate of correctly identifying the registered person and impostors (verification rate) can therefore be expected.

The present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing an embodiment of the speaker verification device according to the present invention; parts identical or equivalent to those in FIG. 8 carry the same reference numerals.

FIG. 1 differs from FIG. 8 in that a pattern-dependent accuracy prediction / threshold setting unit 9 and a per-digit score evaluation unit 10 are newly added, the genuine-speaker score calculation unit 5 computes a score (similarity) for each digit and outputs it to the per-digit score evaluation unit 10, and the verification pattern creation unit 1 creates verification patterns according to the evaluation results of the per-digit score evaluation unit 10. The remaining parts are the same as in FIG. 8, so the description below focuses on the verification pattern creation unit 1, the genuine-speaker score calculation unit 5, the pattern-dependent accuracy prediction / threshold setting unit 9, and the per-digit score evaluation unit 10.

The verification pattern creation unit 1 creates the verification pattern (connected-digit pattern) that the user is asked to utter. When prompting the user to speak again, it creates a new verification pattern by replacing the digits judged, from the genuineness/impostor evaluation obtained by the per-digit score evaluation unit 10 described later, to be difficult to classify as the registered person or an impostor, with other digits.

The genuine-speaker score calculation unit 5 outputs the score computed for each digit to the per-digit score evaluation unit 10. The per-digit scores can be obtained by performing the matching digit by digit using the registered speaker model 6 and the background speaker model 7; they differ from the score for the whole verification pattern only in the unit over which the matching is performed.

The pattern-dependent accuracy prediction / threshold setting unit 9 outputs to the per-digit score evaluation unit 10 the parameters of the genuine-speaker and impostor score distributions for each digit composing the verification pattern. These per-digit genuine-speaker and impostor score distributions can be determined in advance.

FIG. 2 is a characteristic diagram showing an example of the genuine-speaker and impostor score distributions tabulated for each digit. μ_i,d and σ_i,d are the mean and variance of the genuine speaker's score distribution, μ_c,d and σ_c,d are the mean and variance of the impostor score distribution, and S_EER,d is the threshold at which the false acceptance rate FAR_d and the false rejection rate FRR_d are equal, giving the equal error rate EER_d. Because these parameters (mean and variance), the equal error rate, and the threshold that yields it differ from digit to digit, they carry the subscript d.

If the genuine-speaker and impostor score distributions are each approximated by a normal distribution, the threshold S_EER,d and the equal error rate EER_d are given by equations (1) and (2) below.

[Equation (1)]

[Equation (2)]
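
The images for equations (1) and (2) are not reproduced in this text. Under the stated normal approximation, treating σ_i,d and σ_c,d as standard deviations and using the subscript assignment of the FIG. 2 paragraph (i for the registered speaker, c for the impostor), the standard two-Gaussian equal-error-rate expressions would be the following; this is a reconstruction consistent with the surrounding text, not a transcription of the patent's formulas:

S_{EER,d} = \frac{\sigma_{c,d}\,\mu_{i,d} + \sigma_{i,d}\,\mu_{c,d}}{\sigma_{i,d} + \sigma_{c,d}} \quad (1)

EER_d = \Phi\!\left(-\,\frac{\mu_{i,d} - \mu_{c,d}}{\sigma_{i,d} + \sigma_{c,d}}\right) \quad (2)

where \Phi denotes the standard normal cumulative distribution function.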

The pattern-dependent accuracy prediction / threshold setting unit 9 also combines the per-digit genuine-speaker and impostor score distributions obtained in advance to estimate the genuine-speaker and impostor score distributions for the verification pattern as a whole, and from these it estimates the pattern-dependent threshold S_EER,z and equal error rate EER_z.

The per-digit score evaluation unit 10 takes as input the per-digit scores and the parameters of the per-digit score distributions (μ_c,d, μ_i,d, σ_c,d, σ_i,d), and evaluates, for each digit composing the verification pattern, how genuine-like or impostor-like the utterance is. As described above, this evaluation result is used when the verification pattern creation unit 1 creates a verification pattern.

The speaker verification decision unit 8 makes its decision using the two thresholds S_acc (for accepting the registered person) and S_rej (for rejecting impostors); these thresholds also vary with the digits used in the verification pattern. S_acc and S_rej are set according to the score distributions for the verification pattern estimated by the pattern-dependent accuracy prediction / threshold setting unit 9.

Next, specific configurations of the verification pattern creation unit 1, the pattern-dependent accuracy prediction / threshold setting unit 9, and the per-digit score evaluation unit 10 are described.

FIG. 3 is a block diagram showing an example of the verification pattern creation unit 1. The verification pattern creation unit 1 receives from the per-digit score evaluation unit 10 the genuineness evaluation result for each digit. The digit (chain) selection unit 11 instructs the verification pattern generation unit 12 to discard digits whose evaluation result is close to 0.5 and to replace them with other, new digits. Following this instruction, the verification pattern generation unit 12 picks new digits from the registered utterance patterns and appends them to the remaining digits to form the verification pattern for re-verification. The digits picked here can be any digits other than the discarded ones, in any order.
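
A minimal sketch of the behaviour attributed to the digit (chain) selection unit 11 and the verification pattern generation unit 12 follows. It is illustrative only: the +/-0.08 band around 0.5 and the deterministic choice of the lowest unused digits are assumptions, since the patent only says that digits whose evaluation is close to 0.5 are discarded and that any non-discarded digits may be picked in any order.

def rebuild_pattern(pattern, genuineness, enrolled_digits, band=0.08):
    # Digits whose genuineness evaluation Q_d lies close to 0.5 are hard to
    # classify, so they are discarded; the kept digits are topped up with other
    # enrolled digits to restore the pattern length.
    kept = [d for d in pattern if abs(genuineness[d] - 0.5) >= band]
    discarded = set(pattern) - set(kept)
    fresh = [d for d in enrolled_digits if d not in discarded and d not in kept]
    return kept + fresh[:len(pattern) - len(kept)]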

FIG. 4 illustrates an example of digit (chain) replacement in the digit (chain) selection unit 11. First, the verification pattern creation unit 1 presents the user with the verification pattern for the first utterance, for example "1 2 3 4", and the user speaks according to this pattern.

Suppose the score S for the first-utterance verification pattern satisfies S_rej <= S < S_acc (see FIG. 9), and the genuineness evaluation results obtained by the per-digit score evaluation unit 10 for the individual digits are 0.45, 0.55, 0.65 and 0.60. The digit (chain) selection unit 11 then builds the verification pattern for re-verification by discarding the digits whose genuineness evaluation is close to 0.50, that is, the digits for which it is difficult to tell the registered person from an impostor, and replacing them with new digits to form a new connected-digit pattern.

For example, the digits "1 2", whose genuineness evaluations are 0.45 and 0.55, are discarded and "5 6" is newly added, creating "3 4 5 6" as the verification pattern for the second utterance. The verification pattern creation unit 1 therefore presents "3 4 5 6" to the user as the second-utterance verification pattern. The same procedure applies to the third and subsequent re-utterances.
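
Applying the rebuild_pattern sketch shown above to this example reproduces the second-utterance pattern, under the hypothetical assumption that the digits 1 to 9 are enrolled:

q = {1: 0.45, 2: 0.55, 3: 0.65, 4: 0.60}
print(rebuild_pattern([1, 2, 3, 4], q, enrolled_digits=list(range(1, 10))))
# -> [3, 4, 5, 6]: "1 2" (evaluations near 0.5) are dropped and "5 6" are added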

FIG. 5 is a block diagram showing an example of the pattern-dependent accuracy prediction / threshold setting unit 9, which receives the verification pattern created by the verification pattern creation unit 1. The genuine/impostor score distribution prediction unit 91 uses the per-digit score distribution parameters (μ_c,d, μ_i,d, σ_c,d, σ_i,d) 92, determined in advance, to estimate the genuine-speaker and impostor score distributions for the verification pattern. The accuracy prediction / threshold setting unit 93 derives a threshold for the verification pattern from the genuine-speaker and impostor score distributions estimated by the genuine/impostor score distribution prediction unit 91, and outputs this threshold to the speaker verification decision unit 8. Although not shown, the pattern-dependent accuracy prediction / threshold setting unit 9 also outputs the per-digit genuine-speaker and impostor score distribution parameters (μ_c,d, μ_i,d, σ_c,d, σ_i,d) to the per-digit score evaluation unit 10.

The score distribution for a verification pattern is obtained from the reproductive property of the normal distribution. For example, when the verification pattern z consists of four variables s, t, u and v (a four-digit number), the score distribution s_z for the verification pattern z is approximated by the arithmetic mean of its components s_s, s_t, s_u and s_v and is given by equation (3) below.

[Equation (3)]

The mean μ_c,z and variance V_c,z of the genuine-speaker score distribution s_z for the verification pattern z are then given by equations (4) and (5) below.

[Equation (4)]

[Equation (5)]

The mean μ_i,z and variance V_i,z of the impostor score distribution s_z for the verification pattern z are given by equations (6) and (7) below.

[Equation (6)]

[Equation (7)]
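
The images for equations (3) to (7) are not reproduced. Reading the surrounding text literally — the pattern score as the arithmetic mean of the four per-digit scores, combined through the reproductive property of the normal distribution — and assuming independence between digits and that each σ denotes a standard deviation, the natural forms are the following reconstructions, with the subscripts c and i used exactly as in the two preceding paragraphs:

s_z = \tfrac{1}{4}\,(s_s + s_t + s_u + s_v) \quad (3)

\mu_{c,z} = \tfrac{1}{4}\,(\mu_{c,s} + \mu_{c,t} + \mu_{c,u} + \mu_{c,v}) \quad (4)

V_{c,z} = \tfrac{1}{16}\,(\sigma_{c,s}^{2} + \sigma_{c,t}^{2} + \sigma_{c,u}^{2} + \sigma_{c,v}^{2}) \quad (5)

\mu_{i,z} = \tfrac{1}{4}\,(\mu_{i,s} + \mu_{i,t} + \mu_{i,u} + \mu_{i,v}) \quad (6)

V_{i,z} = \tfrac{1}{16}\,(\sigma_{i,s}^{2} + \sigma_{i,t}^{2} + \sigma_{i,u}^{2} + \sigma_{i,v}^{2}) \quad (7)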

Using these quantities, the equal error rate EER_z of the score distribution s_z for the verification pattern and the threshold S_EER,z that yields this equal error rate are given by equations (8) and (9) below.

[Equation (8)]

[Equation (9)]
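
The images for equations (8) and (9) are likewise not reproduced. The sketch below shows how the pattern-dependent accuracy prediction / threshold setting unit 9 could combine the per-digit parameters into a pattern-level threshold and equal error rate under the same normal approximation; the closed forms are the standard two-Gaussian equal-error-rate expressions, and the function and variable names are illustrative rather than taken from the patent.

import math

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def pattern_threshold_and_eer(mu_true, var_true, mu_imp, var_imp):
    # Per-digit means and variances of the registered speaker's (true) and the
    # impostor's (imp) score distributions for the digits of one pattern.
    n = len(mu_true)
    m_t, m_f = sum(mu_true) / n, sum(mu_imp) / n          # means of the averaged score
    s_t, s_f = math.sqrt(sum(var_true)) / n, math.sqrt(sum(var_imp)) / n
    s_eer = (s_f * m_t + s_t * m_f) / (s_t + s_f)         # threshold where FAR = FRR
    eer = normal_cdf(-(m_t - m_f) / (s_t + s_f))          # predicted equal error rate
    return s_eer, eer

# Example: a 4-digit pattern with assumed per-digit statistics.
print(pattern_threshold_and_eer([1.0, 1.2, 0.9, 1.1], [0.04, 0.05, 0.04, 0.06],
                                [0.2, 0.3, 0.1, 0.25], [0.05, 0.06, 0.05, 0.07]))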

FIG. 6 is a block diagram showing an example of the per-digit score evaluation unit 10. The per-digit score evaluation unit 10 takes as input the per-digit scores S from the genuine-speaker score calculation unit 5 and the per-digit score distribution parameters (μ_c,d, μ_i,d, σ_c,d, σ_i,d) 92 from the pattern-dependent accuracy prediction / threshold setting unit 9, and evaluates, for each digit composing the verification pattern, how genuine-like or impostor-like the utterance is.

Because the range (mean and variance) of the distribution of the score S differs from digit to digit, a genuineness measure that normalizes this digit-to-digit variation in the score distributions is defined as follows. Writing the probability density function of the normal distribution with mean μ and variance σ² as equation (10) below, a new genuineness measure Q_d can be defined as in equation (11).

[Equation (10)]

[Equation (11)]
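
Equation (10), stated in the text to be the probability density function of the normal distribution with mean μ and variance σ², has the standard form below. The image for equation (11) is not reproduced; one definition consistent with the stated properties of Q_d (values in (0,1), near 1 for genuine-like scores, near 0 for impostor-like scores, and around 0.5 where the two classes overlap) is the posterior-style ratio of the two per-digit densities shown here as (11'); this is an assumption, not the patent's exact formula.

f(x; \mu, \sigma^{2}) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right) \quad (10)

Q_d = \frac{f(S_d;\,\mu_{i,d},\,\sigma_{i,d}^{2})}{f(S_d;\,\mu_{i,d},\,\sigma_{i,d}^{2}) + f(S_d;\,\mu_{c,d},\,\sigma_{c,d}^{2})} \quad (11')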

Here Q_d satisfies 0 < Q_d < 1; the closer it is to 0, the more impostor-like the utterance, and the closer it is to 1, the more genuine-like. Around 0.5 it is difficult to tell the registered person from an impostor. The per-digit score evaluation unit 10 computes Q_d for each digit and passes the results to the verification pattern creation unit 1.

In the embodiment described above, genuineness is evaluated digit by digit and a new verification pattern is created by replacement in units of single digits. It is also possible, however, to compute scores in units of digit chains, for example two-digit chains, to evaluate genuineness from the chain-level score distributions calculated from the per-digit score distributions obtained in advance, and to create the new verification pattern by replacement in units of digit chains.

Also, although genuineness is evaluated with equation (11) in the above embodiment, it can instead be evaluated with other evaluation functions, or directly with the per-digit (or per-chain) scores obtained by the genuine-speaker score calculation unit 5.

FIG. 7 compares the results of speaker verification experiments performed with the conventional method (second-utterance verification patterns that do not depend on Q_d) and with the present invention (second-utterance verification patterns that depend on Q_d).

Mobile phone speech from 81 male and 74 female speakers was used. The acceptance threshold S_acc and the rejection threshold S_rej were set to give a false rejection rate FRR of 1.0% and a false acceptance rate FAR of 1.0%; six-digit verification patterns were used, and four digits were replaced while a two-digit chain was preserved. n_a,(k) and n_i,(k) denote the numbers of k-th utterances by genuine speakers and impostors, and n_a,(k) and n_r,(k) denote the numbers of genuine speakers accepted and of impostors rejected at the k-th utterance.

FIG. 7 shows that, of the 11.5% of genuine speakers and 12.0% of impostors who were not decided at the first utterance and spoke again, the proportions correctly decided at the second utterance were 5.8% (genuine speakers) and 3.6% (impostors) with the conventional method, compared with 6.5% and 4.3% with the present invention, an increase of 0.7 percentage points in each case.

By creating verification patterns that take genuineness into account, the present invention can raise the rate at which the registered person and impostors are correctly identified (the verification rate). It can therefore be applied to voice authentication systems for telephone services such as telephone banking, and to voice authentication systems on the Internet and the mobile Internet.

FIG. 1 is a block diagram showing an embodiment of the speaker verification device according to the present invention.
FIG. 2 is a characteristic diagram showing an example of the genuine-speaker and impostor score distributions tabulated for each digit.
FIG. 3 is a block diagram showing an example of the verification pattern creation unit.
FIG. 4 is an explanatory diagram of an example of digit (chain) replacement according to the present invention.
FIG. 5 is a block diagram showing an example of the pattern-dependent accuracy prediction / threshold setting unit.
FIG. 6 is a block diagram showing an example of the per-digit score evaluation unit.
FIG. 7 is a diagram comparing the results of speaker verification experiments performed with the conventional method and with the present invention.
FIG. 8 is a block diagram of a conventional speaker verification device.
FIG. 9 is a characteristic diagram showing the accept/reject/re-utterance relationship in a decision using two types of thresholds.

Explanation of reference numerals

1: verification pattern creation unit
2: speech recognition unit
3: speech recognition model
4: recognition result collation unit
5: genuine-speaker score calculation unit
6: registered speaker model
7: background speaker model
8: speaker verification decision unit
9: pattern-dependent accuracy prediction / threshold setting unit
10: per-digit score evaluation unit
11: digit (chain) selection unit
12: verification pattern generation unit
13: registered utterance patterns
91: genuine/impostor score distribution prediction unit
92: per-digit score distribution parameters
93: accuracy prediction / threshold setting unit
DESCRIPTION OF SYMBOLS 1 ... Collation pattern creation part, 2 ... Speech recognition part, 3 ... Speech recognition model, 4 ... Recognition result collation part, 5 ... Personal score calculation part, 6 ... Registered speaker Model: 7 ... Background speaker model, 8 ... Speaker collation determination unit, 9 ... Pattern dependence / accuracy prediction threshold setting unit, 10 ... Number unit score evaluation unit, 11 ... Number ( Chain) selection selection unit, 12 ... collation pattern generation unit, 13 ... registered utterance pattern, 91 ... person / spoofer score distribution estimation unit, 92 ... parameter of score distribution for each number, 93 ..Accuracy prediction / threshold setting section

Claims (3)

[Claim 1] A speaker verification device that presents a verification pattern consisting of connected digits to a user and performs speaker verification by computing the similarity between speech uttered by the user and previously registered speech, the device comprising:
a pattern-dependent accuracy prediction / threshold setting unit that predicts accuracy and sets a threshold based on the digits composing the verification pattern;
a per-digit score evaluation unit that evaluates the similarity computed for each digit between the speech uttered by the user and the previously registered speech, using an accuracy evaluation function for each digit;
a verification pattern creation unit that, based on the evaluation results of the per-digit score evaluation unit, creates a new connected-digit verification pattern by replacing digits for which it is difficult to tell the registered person from an impostor with other digits; and
a decision unit that uses the threshold set by the pattern-dependent accuracy prediction / threshold setting unit to judge the similarity computed for the verification pattern between the speech uttered by the user and the previously registered speech.
[Claim 2] A speaker verification device that presents a verification pattern consisting of connected digits to a user and performs speaker verification by computing the similarity between speech uttered by the user and previously registered speech, the device comprising:
a pattern-dependent accuracy prediction / threshold setting unit that predicts accuracy and sets a threshold based on the digit chains composing the verification pattern;
a per-digit score evaluation unit that evaluates the similarity computed for each digit chain between the speech uttered by the user and the previously registered speech, using an accuracy evaluation function for each digit chain;
a verification pattern creation unit that, based on the evaluation results of the per-digit score evaluation unit, creates a new connected-digit verification pattern by replacing digit chains for which it is difficult to tell the registered person from an impostor with other digit chains; and
a decision unit that uses the threshold set by the pattern-dependent accuracy prediction / threshold setting unit to judge the similarity computed for the verification pattern between the speech uttered by the user and the previously registered speech.
[Claim 3] The speaker verification device according to claim 1 or 2, wherein the pattern-dependent accuracy prediction / threshold setting unit performs the accuracy prediction and threshold setting based on score distributions of the genuine-speaker and impostor similarities obtained in advance for each digit or digit chain composing the verification pattern.
JP2003319651A 2003-09-11 2003-09-11 Speaker verification device Expired - Fee Related JP4232961B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2003319651A JP4232961B2 (en) 2003-09-11 2003-09-11 Speaker verification device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2003319651A JP4232961B2 (en) 2003-09-11 2003-09-11 Speaker verification device

Publications (2)

Publication Number Publication Date
JP2005084613A true JP2005084613A (en) 2005-03-31
JP4232961B2 JP4232961B2 (en) 2009-03-04

Family

ID=34418543

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2003319651A Expired - Fee Related JP4232961B2 (en) 2003-09-11 2003-09-11 Speaker verification device

Country Status (1)

Country Link
JP (1) JP4232961B2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007057714A (en) * 2005-08-23 2007-03-08 Nec Corp Generating apparatus of speaker identification device updating data, method and program, and updating apparatus of speaker identificaion device, method and program
JP2009157050A (en) * 2007-12-26 2009-07-16 Hitachi Omron Terminal Solutions Corp Uttering verification device and uttering verification method

Also Published As

Publication number Publication date
JP4232961B2 (en) 2009-03-04


Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20050831

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20080212

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20080430

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20080627

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20081203

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20081204

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20111219

Year of fee payment: 3

R150 Certificate of patent or registration of utility model

Ref document number: 4232961

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20141219

Year of fee payment: 6

LAPS Cancellation because of no payment of annual fees