JP2005084613A - Speaker matching device - Google Patents

Speaker matching device

Info

Publication number
JP2005084613A
Authority
JP
Japan
Prior art keywords
pattern
unit
user
person
voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP2003319651A
Other languages
Japanese (ja)
Other versions
JP4232961B2 (en)
Inventor
Tsuneo Kato
恒夫 加藤
Toru Shimizu
徹 清水
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
KDDI Corp
Original Assignee
KDDI Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by KDDI Corp filed Critical KDDI Corp
Priority to JP2003319651A priority Critical patent/JP4232961B2/en
Publication of JP2005084613A publication Critical patent/JP2005084613A/en
Application granted granted Critical
Publication of JP4232961B2 publication Critical patent/JP4232961B2/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Abstract

PROBLEM TO BE SOLVED: To provide a speaker verification device with an improved rate (verification rate) of correctly determining whether a speaker is the registered person or an impostor.
SOLUTION: A verification pattern creation unit 1 creates a connected-digit verification pattern by discarding digits for which it is difficult to tell the registered person from an impostor, based on the evaluation results of a per-digit score evaluation unit 10, replacing them with new digits, and presents the pattern to the user. A pattern-dependent accuracy prediction / threshold setting unit 9 predicts accuracy and sets a threshold based on the digits composing the verification pattern. The per-digit score evaluation unit 10 evaluates the similarities computed digit by digit between the speech uttered by the user and previously registered speech, using per-digit accuracy evaluation functions. A speaker verification decision unit 8 judges the similarity for the verification pattern using the threshold set by the pattern-dependent accuracy prediction / threshold setting unit 9. Replacement in units of digit chains is also possible.
COPYRIGHT: (C)2005,JPO&NCIPI

Description

The present invention relates to a speaker verification device, and more particularly to a text-prompted speaker verification device that uses connected-digit patterns.

Speaker verification techniques for deciding from speech whether a claimant is the registered person or an impostor include text-prompted speaker verification, which uses verification patterns made of connected digits, and speaker verification that uses multiple verification patterns.

In text-prompted speaker verification using connected-digit verification patterns, a verification pattern consisting of connected digits is presented to the user, and speaker verification is performed by computing the similarity between the speech uttered by the user and previously registered speech. Because the system specifies a different verification pattern for every verification attempt, impersonation with a recording of the registered person's voice can be prevented.

In speaker verification using multiple verification patterns, when it is difficult to decide from a single utterance whether the speaker is the registered person or an impostor, a different verification pattern is presented to prompt a further utterance. Since the decision can then be based on several utterances (more speech information), the rate at which the registered person and impostors are correctly identified (the verification rate) can be raised.

FIG. 8 is a block diagram of a conventional speaker verification device. This device uses verification patterns made of connected digits and, to raise the rate at which the registered person and impostors are correctly identified (the verification rate), uses multiple verification patterns. The verification pattern creation unit 1 creates a verification pattern consisting of connected digits, presents it to the user, and prompts the user to utter it. When the user speaks (inputs speech) according to the presented verification pattern, the speech recognition unit 2 recognizes the utterance content using the speech recognition model 3.

The recognition result collation unit 4 determines whether the result recognized by the speech recognition unit 2 matches the verification pattern. If the user mistakenly utters something different from the verification pattern, the recognition result is rejected. Likewise, even if a recording of the registered person's voice is played back to impersonate that person, the recognition result almost never matches the verification pattern, so it is rejected.

If the recognition result matches the verification pattern, the genuine-speaker score calculation unit 5 calculates a score (similarity) between the verification utterance and the registered person's speech previously enrolled as the registered speaker model 6. When the score is calculated, the background speaker model 7 is used to remove the influence of background speakers. Typical scoring methods for the genuine-speaker score calculation unit 5 include the HMM (Hidden Markov Model) method, the GMM (Gaussian Mixture Model) method, and DP matching.
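
As a concrete illustration of the GMM-based scoring with background-model normalization mentioned above, the following sketch computes the score of one digit segment as the average frame log-likelihood under a registered-speaker GMM minus that under a background-speaker GMM. It is not taken from the patent: the function names, the diagonal-covariance parameterization, and the random demo data are assumptions made for illustration.

import numpy as np

def gmm_loglik(frames, weights, means, variances):
    # Average per-frame log-likelihood of a diagonal-covariance GMM.
    # frames: (T, D) feature vectors; weights: (M,); means, variances: (M, D).
    T, D = frames.shape
    diff = frames[:, None, :] - means[None, :, :]                 # (T, M, D)
    maha = np.sum(diff ** 2 / variances[None, :, :], axis=2)      # (T, M)
    log_norm = -0.5 * (D * np.log(2 * np.pi) + np.sum(np.log(variances), axis=1))
    log_comp = np.log(weights)[None, :] + log_norm[None, :] - 0.5 * maha
    m = log_comp.max(axis=1, keepdims=True)                       # log-sum-exp over mixtures
    ll = m[:, 0] + np.log(np.exp(log_comp - m).sum(axis=1))
    return ll.mean()

def digit_score(frames, speaker_gmm, background_gmm):
    # Per-digit score S_d: log-likelihood ratio of the registered speaker
    # model against the background speaker model for one digit segment.
    return gmm_loglik(frames, *speaker_gmm) - gmm_loglik(frames, *background_gmm)

# Toy demo with random 12-dimensional features and 4-mixture models.
rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 12))
spk = (np.full(4, 0.25), rng.normal(size=(4, 12)), np.ones((4, 12)))
bg = (np.full(4, 0.25), rng.normal(size=(4, 12)), np.ones((4, 12)))
print(digit_score(frames, spk, bg))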

The speaker verification decision unit 8 evaluates the score obtained by the genuine-speaker score calculation unit 5 against two thresholds, one for accepting the registered person and one for rejecting impostors. Writing the score obtained by the genuine-speaker score calculation unit 5 as S, the acceptance threshold as S_acc, and the rejection threshold as S_rej: if S >= S_acc, the user is accepted as the registered person; if S < S_rej, the user is rejected as not being the registered person; and if S_rej <= S < S_acc, it is judged difficult to tell the registered person from an impostor, and the verification pattern creation unit 1 is instructed to present the user with a verification pattern for re-verification. For re-verification, the enrolled digits are concatenated at random to create a new verification pattern.

FIG. 9 shows the accept / reject / re-utterance regions of the decision made with the two thresholds S_acc and S_rej, with the score S on the horizontal axis and the score distribution (frequency) on the vertical axis. As the figure shows, when S < S_rej the user can be regarded as an impostor, and when S >= S_acc the user can be regarded as the registered person. When S_rej <= S < S_acc, it cannot be decided with high confidence whether the user is the registered person or an impostor, so the user is prompted to speak again.
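
In code, the two-threshold decision described above amounts to the following (a minimal sketch; the function and label names are not from the patent):

def decide(score, s_acc, s_rej):
    # Accept as the registered person, reject as an impostor, or request
    # another utterance when the score falls between the two thresholds.
    if score >= s_acc:
        return "accept"
    if score < s_rej:
        return "reject"
    return "reprompt"  # S_rej <= S < S_acc: present a new verification pattern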

Text-prompted speaker verification and speaker verification using multiple verification patterns are described in Non-Patent Documents 1 to 4 below.
[Non-Patent Document 1] Matsui, Furui: "Text-prompted speaker recognition," IEICE Transactions D-II, Vol. J79-D-II, No. 5, pp. 647-656 (1996)
[Non-Patent Document 2] 内部, Kuroiwa, Higuchi: "A study of a speaker verification method using digits," IEICE Technical Report SP98-68, pp. 1-8 (1998)
[Non-Patent Document 3] Kato, Shimizu: "Improvement of speaker verification accuracy by a connected-digit pattern prompting method," Proc. Autumn Meeting of the Acoustical Society of Japan, pp. 129-130 (2002)
[Non-Patent Document 4] Kato, Shimizu: "Improvement of speaker verification accuracy on time-separated data by a connected-digit pattern prompting method," Proc. Spring Meeting of the Acoustical Society of Japan, pp. 105-106 (2003)

As described above, when the speaker verification decision unit of the conventional speaker verification device finds that S_rej <= S < S_acc, it instructs the verification pattern creation unit 1 to present the user with a verification pattern for re-verification, and the verification pattern creation unit 1 again creates a connected-digit pattern and prompts the user for a second utterance. However, because the verification pattern created for the second utterance is obtained merely by concatenating digits at random, presenting such a pattern and prompting the user for a second or third utterance cannot be expected to raise the rate of correctly identifying the registered person and impostors (the verification rate) sufficiently.

An object of the present invention is to automatically create, according to the characteristics of the user's utterances, verification patterns for which a sufficiently high rate of correctly identifying the registered person and impostors (verification rate) can be expected, and thereby to improve that verification rate.

To solve the above problem, the present invention provides a speaker verification device that presents a verification pattern consisting of connected digits to a user and performs speaker verification by computing the similarity between speech uttered by the user and previously registered speech, the device comprising: a pattern-dependent accuracy prediction / threshold setting unit that predicts accuracy and sets a threshold based on the digits (or digit chains) composing the verification pattern; a per-digit score evaluation unit that evaluates the similarity computed for each digit (or digit chain) between the speech uttered by the user and the previously registered speech, using an accuracy evaluation function for each digit (or digit chain); a verification pattern creation unit that, based on the evaluation results of the per-digit score evaluation unit, creates a new connected-digit verification pattern by replacing digits (or digit chains) for which it is difficult to tell the registered person from an impostor with other digits (or digit chains); and a decision unit that uses the threshold set by the pattern-dependent accuracy prediction / threshold setting unit to judge the similarity computed for the verification pattern between the speech uttered by the user and the previously registered speech.

Here, the pattern-dependent accuracy prediction / threshold setting unit may perform the accuracy prediction and threshold setting based on score distributions of the genuine-speaker and impostor similarities obtained in advance for each digit (or digit chain) composing the verification pattern.

According to the present invention, the similarity between the speech uttered by the user and the previously registered speech is computed for each digit (or digit chain), and based on the evaluation of these similarities a new verification pattern is created by replacing the digits (or digit chains) for which it is difficult to tell the registered person from an impostor with other digits (or digit chains). When the user speaks again following the specified verification pattern, a sufficiently high rate of correctly identifying the registered person and impostors (verification rate) can therefore be expected.

The present invention is described below with reference to the drawings. FIG. 1 is a block diagram showing an embodiment of the speaker verification device according to the present invention; parts identical or equivalent to those in FIG. 8 carry the same reference numerals.

FIG. 1 differs from FIG. 8 in that a pattern-dependent accuracy prediction / threshold setting unit 9 and a per-digit score evaluation unit 10 are newly added, the genuine-speaker score calculation unit 5 computes a score (similarity) for each digit and outputs it to the per-digit score evaluation unit 10, and the verification pattern creation unit 1 creates verification patterns according to the evaluation results of the per-digit score evaluation unit 10. The remaining parts are the same as in FIG. 8, so the description below focuses on the verification pattern creation unit 1, the genuine-speaker score calculation unit 5, the pattern-dependent accuracy prediction / threshold setting unit 9, and the per-digit score evaluation unit 10.

The verification pattern creation unit 1 creates the verification pattern (connected-digit pattern) that the user is asked to utter. When prompting the user to speak again, it creates a new verification pattern by replacing the digits judged, from the genuineness/impostor evaluation obtained by the per-digit score evaluation unit 10 described later, to be difficult to classify as the registered person or an impostor, with other digits.

The genuine-speaker score calculation unit 5 outputs the score computed for each digit to the per-digit score evaluation unit 10. The per-digit scores can be obtained by performing the matching digit by digit using the registered speaker model 6 and the background speaker model 7; they differ from the score for the whole verification pattern only in the unit over which the matching is performed.

The pattern-dependent accuracy prediction / threshold setting unit 9 outputs to the per-digit score evaluation unit 10 the parameters of the genuine-speaker and impostor score distributions for each digit composing the verification pattern. These per-digit genuine-speaker and impostor score distributions can be determined in advance.

FIG. 2 is a characteristic diagram showing an example of the genuine-speaker and impostor score distributions tabulated for each digit. μ_i,d and σ_i,d are the mean and variance of the genuine speaker's score distribution, μ_c,d and σ_c,d are the mean and variance of the impostor score distribution, and S_EER,d is the threshold at which the false acceptance rate FAR_d and the false rejection rate FRR_d are equal, giving the equal error rate EER_d. Because these parameters (mean and variance), the equal error rate, and the threshold that yields it differ from digit to digit, they carry the subscript d.

If the genuine-speaker and impostor score distributions are each approximated by a normal distribution, the threshold S_EER,d and the equal error rate EER_d are given by equations (1) and (2) below.

[Equation (1)]

[Equation (2)]
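
The images for equations (1) and (2) are not reproduced in this text. Under the stated normal approximation, treating σ_i,d and σ_c,d as standard deviations and using the subscript assignment of the FIG. 2 paragraph (i for the registered speaker, c for the impostor), the standard two-Gaussian equal-error-rate expressions would be the following; this is a reconstruction consistent with the surrounding text, not a transcription of the patent's formulas:

S_{EER,d} = \frac{\sigma_{c,d}\,\mu_{i,d} + \sigma_{i,d}\,\mu_{c,d}}{\sigma_{i,d} + \sigma_{c,d}} \quad (1)

EER_d = \Phi\!\left(-\,\frac{\mu_{i,d} - \mu_{c,d}}{\sigma_{i,d} + \sigma_{c,d}}\right) \quad (2)

where \Phi denotes the standard normal cumulative distribution function.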

The pattern-dependent accuracy prediction / threshold setting unit 9 also combines the per-digit genuine-speaker and impostor score distributions obtained in advance to estimate the genuine-speaker and impostor score distributions for the verification pattern as a whole, and from these it estimates the pattern-dependent threshold S_EER,z and equal error rate EER_z.

The per-digit score evaluation unit 10 takes as input the per-digit scores and the parameters of the per-digit score distributions (μ_c,d, μ_i,d, σ_c,d, σ_i,d), and evaluates, for each digit composing the verification pattern, how genuine-like or impostor-like the utterance is. As described above, this evaluation result is used when the verification pattern creation unit 1 creates a verification pattern.

The speaker verification decision unit 8 makes its decision using the two thresholds S_acc (for accepting the registered person) and S_rej (for rejecting impostors); these thresholds also vary with the digits used in the verification pattern. S_acc and S_rej are set according to the score distributions for the verification pattern estimated by the pattern-dependent accuracy prediction / threshold setting unit 9.

Next, specific configurations of the verification pattern creation unit 1, the pattern-dependent accuracy prediction / threshold setting unit 9, and the per-digit score evaluation unit 10 are described.

FIG. 3 is a block diagram showing an example of the verification pattern creation unit 1. The verification pattern creation unit 1 receives from the per-digit score evaluation unit 10 the genuineness evaluation result for each digit. The digit (chain) selection unit 11 instructs the verification pattern generation unit 12 to discard digits whose evaluation result is close to 0.5 and to replace them with other, new digits. Following this instruction, the verification pattern generation unit 12 picks new digits from the registered utterance patterns and appends them to the remaining digits to form the verification pattern for re-verification. The digits picked here can be any digits other than the discarded ones, in any order.
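
A minimal sketch of the behaviour attributed to the digit (chain) selection unit 11 and the verification pattern generation unit 12 follows. It is illustrative only: the +/-0.08 band around 0.5 and the deterministic choice of the lowest unused digits are assumptions, since the patent only says that digits whose evaluation is close to 0.5 are discarded and that any non-discarded digits may be picked in any order.

def rebuild_pattern(pattern, genuineness, enrolled_digits, band=0.08):
    # Digits whose genuineness evaluation Q_d lies close to 0.5 are hard to
    # classify, so they are discarded; the kept digits are topped up with other
    # enrolled digits to restore the pattern length.
    kept = [d for d in pattern if abs(genuineness[d] - 0.5) >= band]
    discarded = set(pattern) - set(kept)
    fresh = [d for d in enrolled_digits if d not in discarded and d not in kept]
    return kept + fresh[:len(pattern) - len(kept)]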

FIG. 4 illustrates an example of digit (chain) replacement in the digit (chain) selection unit 11. First, the verification pattern creation unit 1 presents the user with the verification pattern for the first utterance, for example "1 2 3 4", and the user speaks according to this pattern.

Suppose the score S for the first-utterance verification pattern satisfies S_rej <= S < S_acc (see FIG. 9), and the genuineness evaluation results obtained by the per-digit score evaluation unit 10 for the individual digits are 0.45, 0.55, 0.65 and 0.60. The digit (chain) selection unit 11 then builds the verification pattern for re-verification by discarding the digits whose genuineness evaluation is close to 0.50, that is, the digits for which it is difficult to tell the registered person from an impostor, and replacing them with new digits to form a new connected-digit pattern.

For example, the digits "1 2", whose genuineness evaluations are 0.45 and 0.55, are discarded and "5 6" is newly added, creating "3 4 5 6" as the verification pattern for the second utterance. The verification pattern creation unit 1 therefore presents "3 4 5 6" to the user as the second-utterance verification pattern. The same procedure applies to the third and subsequent re-utterances.
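
Applying the rebuild_pattern sketch shown above to this example reproduces the second-utterance pattern, under the hypothetical assumption that the digits 1 to 9 are enrolled:

q = {1: 0.45, 2: 0.55, 3: 0.65, 4: 0.60}
print(rebuild_pattern([1, 2, 3, 4], q, enrolled_digits=list(range(1, 10))))
# -> [3, 4, 5, 6]: "1 2" (evaluations near 0.5) are dropped and "5 6" are added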

FIG. 5 is a block diagram showing an example of the pattern-dependent accuracy prediction / threshold setting unit 9, which receives the verification pattern created by the verification pattern creation unit 1. The genuine/impostor score distribution prediction unit 91 uses the per-digit score distribution parameters (μ_c,d, μ_i,d, σ_c,d, σ_i,d) 92, determined in advance, to estimate the genuine-speaker and impostor score distributions for the verification pattern. The accuracy prediction / threshold setting unit 93 derives a threshold for the verification pattern from the genuine-speaker and impostor score distributions estimated by the genuine/impostor score distribution prediction unit 91, and outputs this threshold to the speaker verification decision unit 8. Although not shown, the pattern-dependent accuracy prediction / threshold setting unit 9 also outputs the per-digit genuine-speaker and impostor score distribution parameters (μ_c,d, μ_i,d, σ_c,d, σ_i,d) to the per-digit score evaluation unit 10.

The score distribution for a verification pattern is obtained from the reproductive property of the normal distribution. For example, when the verification pattern z consists of four variables s, t, u and v (a four-digit number), the score distribution s_z for the verification pattern z is approximated by the arithmetic mean of its components s_s, s_t, s_u and s_v and is given by equation (3) below.

[Equation (3)]

The mean μ_c,z and variance V_c,z of the genuine-speaker score distribution s_z for the verification pattern z are then given by equations (4) and (5) below.

[Equation (4)]

[Equation (5)]

The mean μ_i,z and variance V_i,z of the impostor score distribution s_z for the verification pattern z are given by equations (6) and (7) below.

[Equation (6)]

[Equation (7)]
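
The images for equations (3) to (7) are not reproduced. Reading the surrounding text literally — the pattern score as the arithmetic mean of the four per-digit scores, combined through the reproductive property of the normal distribution — and assuming independence between digits and that each σ denotes a standard deviation, the natural forms are the following reconstructions, with the subscripts c and i used exactly as in the two preceding paragraphs:

s_z = \tfrac{1}{4}\,(s_s + s_t + s_u + s_v) \quad (3)

\mu_{c,z} = \tfrac{1}{4}\,(\mu_{c,s} + \mu_{c,t} + \mu_{c,u} + \mu_{c,v}) \quad (4)

V_{c,z} = \tfrac{1}{16}\,(\sigma_{c,s}^{2} + \sigma_{c,t}^{2} + \sigma_{c,u}^{2} + \sigma_{c,v}^{2}) \quad (5)

\mu_{i,z} = \tfrac{1}{4}\,(\mu_{i,s} + \mu_{i,t} + \mu_{i,u} + \mu_{i,v}) \quad (6)

V_{i,z} = \tfrac{1}{16}\,(\sigma_{i,s}^{2} + \sigma_{i,t}^{2} + \sigma_{i,u}^{2} + \sigma_{i,v}^{2}) \quad (7)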

Using these quantities, the equal error rate EER_z of the score distribution s_z for the verification pattern and the threshold S_EER,z that yields this equal error rate are given by equations (8) and (9) below.

[Equation (8)]

[Equation (9)]
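
The images for equations (8) and (9) are likewise not reproduced. The sketch below shows how the pattern-dependent accuracy prediction / threshold setting unit 9 could combine the per-digit parameters into a pattern-level threshold and equal error rate under the same normal approximation; the closed forms are the standard two-Gaussian equal-error-rate expressions, and the function and variable names are illustrative rather than taken from the patent.

import math

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def pattern_threshold_and_eer(mu_true, var_true, mu_imp, var_imp):
    # Per-digit means and variances of the registered speaker's (true) and the
    # impostor's (imp) score distributions for the digits of one pattern.
    n = len(mu_true)
    m_t, m_f = sum(mu_true) / n, sum(mu_imp) / n          # means of the averaged score
    s_t, s_f = math.sqrt(sum(var_true)) / n, math.sqrt(sum(var_imp)) / n
    s_eer = (s_f * m_t + s_t * m_f) / (s_t + s_f)         # threshold where FAR = FRR
    eer = normal_cdf(-(m_t - m_f) / (s_t + s_f))          # predicted equal error rate
    return s_eer, eer

# Example: a 4-digit pattern with assumed per-digit statistics.
print(pattern_threshold_and_eer([1.0, 1.2, 0.9, 1.1], [0.04, 0.05, 0.04, 0.06],
                                [0.2, 0.3, 0.1, 0.25], [0.05, 0.06, 0.05, 0.07]))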

FIG. 6 is a block diagram showing an example of the per-digit score evaluation unit 10. The per-digit score evaluation unit 10 takes as input the per-digit scores S from the genuine-speaker score calculation unit 5 and the per-digit score distribution parameters (μ_c,d, μ_i,d, σ_c,d, σ_i,d) 92 from the pattern-dependent accuracy prediction / threshold setting unit 9, and evaluates, for each digit composing the verification pattern, how genuine-like or impostor-like the utterance is.

Because the range (mean and variance) of the distribution of the score S differs from digit to digit, a genuineness measure that normalizes this digit-to-digit variation in the score distributions is defined as follows. Writing the probability density function of the normal distribution with mean μ and variance σ² as equation (10) below, a new genuineness measure Q_d can be defined as in equation (11).

[Equation (10)]

[Equation (11)]
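
Equation (10), stated in the text to be the probability density function of the normal distribution with mean μ and variance σ², has the standard form below. The image for equation (11) is not reproduced; one definition consistent with the stated properties of Q_d (values in (0,1), near 1 for genuine-like scores, near 0 for impostor-like scores, and around 0.5 where the two classes overlap) is the posterior-style ratio of the two per-digit densities shown here as (11'); this is an assumption, not the patent's exact formula.

f(x; \mu, \sigma^{2}) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right) \quad (10)

Q_d = \frac{f(S_d;\,\mu_{i,d},\,\sigma_{i,d}^{2})}{f(S_d;\,\mu_{i,d},\,\sigma_{i,d}^{2}) + f(S_d;\,\mu_{c,d},\,\sigma_{c,d}^{2})} \quad (11')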

Here Q_d satisfies 0 < Q_d < 1; the closer it is to 0, the more impostor-like the utterance, and the closer it is to 1, the more genuine-like. Around 0.5 it is difficult to tell the registered person from an impostor. The per-digit score evaluation unit 10 computes Q_d for each digit and passes the results to the verification pattern creation unit 1.

In the embodiment described above, genuineness is evaluated digit by digit and a new verification pattern is created by replacement in units of single digits. It is also possible, however, to compute scores in units of digit chains, for example two-digit chains, to evaluate genuineness from the chain-level score distributions calculated from the per-digit score distributions obtained in advance, and to create the new verification pattern by replacement in units of digit chains.

Also, although genuineness is evaluated with equation (11) in the above embodiment, it can instead be evaluated with other evaluation functions, or directly with the per-digit (or per-chain) scores obtained by the genuine-speaker score calculation unit 5.

FIG. 7 compares the results of speaker verification experiments performed with the conventional method (second-utterance verification patterns that do not depend on Q_d) and with the present invention (second-utterance verification patterns that depend on Q_d).

Mobile phone speech from 81 male and 74 female speakers was used. The acceptance threshold S_acc and the rejection threshold S_rej were set to give a false rejection rate FRR of 1.0% and a false acceptance rate FAR of 1.0%; six-digit verification patterns were used, and four digits were replaced while a two-digit chain was preserved. n_a,(k) and n_i,(k) denote the numbers of k-th utterances by genuine speakers and impostors, and n_a,(k) and n_r,(k) denote the numbers of genuine speakers accepted and of impostors rejected at the k-th utterance.

FIG. 7 shows that, of the 11.5% of genuine speakers and 12.0% of impostors who were not decided at the first utterance and spoke again, the proportions correctly decided at the second utterance were 5.8% (genuine speakers) and 3.6% (impostors) with the conventional method, compared with 6.5% and 4.3% with the present invention, an increase of 0.7 percentage points in each case.

By creating verification patterns that take genuineness into account, the present invention can raise the rate at which the registered person and impostors are correctly identified (the verification rate). It can therefore be applied to voice authentication systems for telephone services such as telephone banking, and to voice authentication systems on the Internet and the mobile Internet.

FIG. 1 is a block diagram showing an embodiment of the speaker verification device according to the present invention.
FIG. 2 is a characteristic diagram showing an example of the genuine-speaker and impostor score distributions tabulated for each digit.
FIG. 3 is a block diagram showing an example of the verification pattern creation unit.
FIG. 4 is an explanatory diagram of an example of digit (chain) replacement according to the present invention.
FIG. 5 is a block diagram showing an example of the pattern-dependent accuracy prediction / threshold setting unit.
FIG. 6 is a block diagram showing an example of the per-digit score evaluation unit.
FIG. 7 is a diagram comparing the results of speaker verification experiments performed with the conventional method and with the present invention.
FIG. 8 is a block diagram of a conventional speaker verification device.
FIG. 9 is a characteristic diagram showing the accept/reject/re-utterance relationship in a decision using two types of thresholds.

Explanation of reference numerals

1: verification pattern creation unit
2: speech recognition unit
3: speech recognition model
4: recognition result collation unit
5: genuine-speaker score calculation unit
6: registered speaker model
7: background speaker model
8: speaker verification decision unit
9: pattern-dependent accuracy prediction / threshold setting unit
10: per-digit score evaluation unit
11: digit (chain) selection unit
12: verification pattern generation unit
13: registered utterance patterns
91: genuine/impostor score distribution prediction unit
92: per-digit score distribution parameters
93: accuracy prediction / threshold setting unit
DESCRIPTION OF SYMBOLS 1 ... Collation pattern creation part, 2 ... Speech recognition part, 3 ... Speech recognition model, 4 ... Recognition result collation part, 5 ... Personal score calculation part, 6 ... Registered speaker Model: 7 ... Background speaker model, 8 ... Speaker collation determination unit, 9 ... Pattern dependence / accuracy prediction threshold setting unit, 10 ... Number unit score evaluation unit, 11 ... Number ( Chain) selection selection unit, 12 ... collation pattern generation unit, 13 ... registered utterance pattern, 91 ... person / spoofer score distribution estimation unit, 92 ... parameter of score distribution for each number, 93 ..Accuracy prediction / threshold setting section

Claims (3)

[Claim 1] A speaker verification device that presents a verification pattern consisting of connected digits to a user and performs speaker verification by computing the similarity between speech uttered by the user and previously registered speech, the device comprising:
a pattern-dependent accuracy prediction / threshold setting unit that predicts accuracy and sets a threshold based on the digits composing the verification pattern;
a per-digit score evaluation unit that evaluates the similarity computed for each digit between the speech uttered by the user and the previously registered speech, using an accuracy evaluation function for each digit;
a verification pattern creation unit that, based on the evaluation results of the per-digit score evaluation unit, creates a new connected-digit verification pattern by replacing digits for which it is difficult to tell the registered person from an impostor with other digits; and
a decision unit that uses the threshold set by the pattern-dependent accuracy prediction / threshold setting unit to judge the similarity computed for the verification pattern between the speech uttered by the user and the previously registered speech.
[Claim 2] A speaker verification device that presents a verification pattern consisting of connected digits to a user and performs speaker verification by computing the similarity between speech uttered by the user and previously registered speech, the device comprising:
a pattern-dependent accuracy prediction / threshold setting unit that predicts accuracy and sets a threshold based on the digit chains composing the verification pattern;
a per-digit score evaluation unit that evaluates the similarity computed for each digit chain between the speech uttered by the user and the previously registered speech, using an accuracy evaluation function for each digit chain;
a verification pattern creation unit that, based on the evaluation results of the per-digit score evaluation unit, creates a new connected-digit verification pattern by replacing digit chains for which it is difficult to tell the registered person from an impostor with other digit chains; and
a decision unit that uses the threshold set by the pattern-dependent accuracy prediction / threshold setting unit to judge the similarity computed for the verification pattern between the speech uttered by the user and the previously registered speech.
[Claim 3] The speaker verification device according to claim 1 or 2, wherein the pattern-dependent accuracy prediction / threshold setting unit performs the accuracy prediction and threshold setting based on score distributions of the genuine-speaker and impostor similarities obtained in advance for each digit or digit chain composing the verification pattern.
JP2003319651A 2003-09-11 2003-09-11 Speaker verification device Expired - Fee Related JP4232961B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2003319651A JP4232961B2 (en) 2003-09-11 2003-09-11 Speaker verification device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP2003319651A JP4232961B2 (en) 2003-09-11 2003-09-11 Speaker verification device

Publications (2)

Publication Number Publication Date
JP2005084613A true JP2005084613A (en) 2005-03-31
JP4232961B2 JP4232961B2 (en) 2009-03-04

Family

ID=34418543

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2003319651A Expired - Fee Related JP4232961B2 (en) 2003-09-11 2003-09-11 Speaker verification device

Country Status (1)

Country Link
JP (1) JP4232961B2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007057714A (en) * 2005-08-23 2007-03-08 Nec Corp Generating apparatus of speaker identification device updating data, method and program, and updating apparatus of speaker identificaion device, method and program
JP2009157050A (en) * 2007-12-26 2009-07-16 Hitachi Omron Terminal Solutions Corp Uttering verification device and uttering verification method

Also Published As

Publication number Publication date
JP4232961B2 (en) 2009-03-04


Legal Events

Date Code Title Description
A621 Written request for application examination

Free format text: JAPANESE INTERMEDIATE CODE: A621

Effective date: 20050831

A977 Report on retrieval

Free format text: JAPANESE INTERMEDIATE CODE: A971007

Effective date: 20080212

A131 Notification of reasons for refusal

Free format text: JAPANESE INTERMEDIATE CODE: A131

Effective date: 20080430

A521 Request for written amendment filed

Free format text: JAPANESE INTERMEDIATE CODE: A523

Effective date: 20080627

TRDD Decision of grant or rejection written
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 20081203

A61 First payment of annual fees (during grant procedure)

Free format text: JAPANESE INTERMEDIATE CODE: A61

Effective date: 20081204

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20111219

Year of fee payment: 3

R150 Certificate of patent or registration of utility model

Ref document number: 4232961

Country of ref document: JP

Free format text: JAPANESE INTERMEDIATE CODE: R150

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20141219

Year of fee payment: 6

LAPS Cancellation because of no payment of annual fees