JP2007052496A

JP2007052496A - User authentication system and user authentication method

Info

Publication number: JP2007052496A
Application number: JP2005235428A
Authority: JP
Inventors: Kazufumi Matsumoto (松本 一文); Toshiro Kodama (児玉 敏朗)
Original assignee: Advanced Media Inc
Current assignee: Advanced Media Inc
Priority date: 2005-08-15
Filing date: 2005-08-15
Publication date: 2007-03-01

Abstract

PROBLEM TO BE SOLVED: To provide user authentication that achieves the robustness of two-factor authentication by combining password authentication and voiceprint authentication, shortens user authentication time, requires no complicated work, and offers high operability.

SOLUTION: A user call terminal 1 connects by line to an interactive voice response device 3, and an utterance acquisition unit 16 acquires, via a telephone network 2, the utterance voice of a password input into the user call terminal 1. The acquired password utterance voice is input into a voice recognition server 5, which determines by voice recognition whether it matches a previously registered password. After password authentication succeeds, the password utterance voice is input into a voiceprint authentication unit 27, which determines whether the voiceprint data of the password utterance voice matches the registered voiceprint data in the user profile of the user. The voiceprint authentication result is reported from an authentication result notification unit 28 to a user management server 7.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a user authentication system and a user authentication method applicable to a voice authentication station that receives, via a communication network, utterance voice captured at a user terminal and performs user authentication in real time.

Currently, on the Internet, user authentication (confirmation that a user is who they claim to be) is performed in various situations such as commercial transactions and site access. For example, a user ID and password are chosen by or assigned to the user in advance; when authentication is required, the user enters the user ID and password on a WWW screen or dialog displayed on the user terminal, and these are checked against the user ID and password registered in advance in an authentication server. However, authentication by user ID and password has the problem that, if the user ID and password are leaked, impersonation is easily permitted.

Therefore, an authentication method combining password authentication and voiceprint authentication has been proposed as a way to perform user authentication easily and inexpensively while improving security (see, for example, Patent Document 1). In the authentication method disclosed in that patent document, a password is received via a network and authenticated; the user's voice is then acquired over a telephone line, which is a communication line different from the network, and the user is authenticated based on voiceprint information of the user's voice.
Patent Document 1: JP 2004-13274 A

However, in the above method, in which a password is transmitted from the user terminal via a network for password authentication and the user's voice is then sent over a telephone line, a different communication line, for voiceprint authentication, two separate operations exist: transmitting the password via the network and placing a call on the telephone line. As a result, it takes a long time to complete user authentication, and the authentication work itself is duplicated and cumbersome.

The present invention has been made in view of the above circumstances. Its object is to provide a user authentication system and a user authentication method that maintain the robustness of combining password authentication and voiceprint authentication, shorten user authentication time to achieve fast response, and improve operability by eliminating the need to switch communication lines.

The user authentication system of the present invention comprises: utterance acquisition means for acquiring, via a communication network, the utterance voice of a password input into a user call terminal; voice recognition means for determining by voice recognition whether the acquired password utterance voice matches a registered password registered in advance; and voiceprint authentication means for determining whether the voiceprint data of the password utterance voice matches registered voiceprint data in the user profile of the user.

According to the user authentication system configured in this way, password authentication and voiceprint authentication are both performed using the utterance voice of the password input into the user call terminal. The robustness of combining password authentication and voiceprint authentication is thus achieved, while user authentication time is shortened and fast response is realized. In addition, since no switching of communication lines is required, operability is improved.
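By way of illustration only, the single-utterance flow can be sketched as follows. The two callables are hypothetical stand-ins for the voice recognition means and the voiceprint authentication means, and the names `AuthResult` and `authenticate` are assumptions made for this sketch. Skipping the voiceprint check after a failed password check follows the embodiment described later in this document.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AuthResult:
    password_ok: bool
    voiceprint_ok: bool

    @property
    def authenticated(self) -> bool:
        # Both factors must pass for the user to be authenticated.
        return self.password_ok and self.voiceprint_ok


def authenticate(
    utterance: bytes,
    recognize_password: Callable[[bytes], bool],  # hypothetical recognizer
    verify_voiceprint: Callable[[bytes], bool],   # hypothetical verifier
) -> AuthResult:
    """Apply both checks to the same password utterance."""
    if not recognize_password(utterance):
        # Password authentication failed, so the voiceprint check is skipped.
        return AuthResult(password_ok=False, voiceprint_ok=False)
    return AuthResult(password_ok=True, voiceprint_ok=verify_voiceprint(utterance))
```

Because one recording feeds both checks, the caller speaks only once and no second communication line is involved.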

In the above user authentication system, the user profile has a plurality of patterns of voiceprint data, each extracted from password utterance voices with different utterance speeds, and the voiceprint authentication means determines whether the voiceprint data of the password utterance voice acquired at the time of user authentication matches any of the plurality of patterns of voiceprint data in the user profile.

As a result, voiceprint authentication is performed by collation against a plurality of patterns of voiceprint data extracted from password utterance voices with different utterance speeds, so erroneous determinations in voiceprint authentication can be effectively prevented. The utterance speed, vocal cord tension, and the like change with the user's condition when uttering the password, such as psychological state and health, causing the voiceprint data to fluctuate; registering a plurality of patterns with different utterance speeds makes it possible to cope with this flexibly.
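A minimal sketch of matching against any one of several registered patterns is given below. It assumes that voiceprint data can be represented as fixed-length feature vectors and compared by cosine similarity; neither the representation nor the similarity measure is specified in this document, and the threshold of 0.9 is purely illustrative.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def matches_any_pattern(
    probe: Sequence[float],
    registered_patterns: Sequence[Sequence[float]],
    threshold: float = 0.9,  # illustrative value, not from the source
) -> bool:
    """True if the probe is close enough to at least one registered pattern."""
    return any(
        cosine_similarity(probe, pattern) > threshold
        for pattern in registered_patterns
    )
```

For instance, a profile might hold one pattern recorded at a slow speaking rate and one at a fast rate; an authentication attempt passes if it is close to either.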

Further, in the above user authentication system, the present invention comprises: user information acquisition means for acquiring the registered password of the user from user management means that manages registered passwords, based on identification information of the user call terminal that requested user authentication or identification information of the user; and keyword dictionary creation means for converting the registered password acquired by the user information acquisition means into a keyword dictionary that the voice recognition means uses for voice recognition.

As a result, password authentication succeeds when the voice recognition means can recognize the password utterance voice using the keyword dictionary, and fails when it cannot. Since only the password registered by the user is converted into the keyword dictionary, recognition time is markedly shortened and recognition accuracy markedly improved compared with voice recognition using a large number of keyword dictionaries.

Further, in the above user authentication system, the present invention comprises user profile reconstruction means for registering the voiceprint data of the password utterance voice in the user profile as voiceprint data when the voiceprint authentication means succeeds in voiceprint authentication.

As a result, the user profile is reconstructed using voiceprint data that has passed voiceprint authentication. Even if the user's voiceprint data changes over time owing to changes in the user's physique or vocal cords, the user profile is flexibly updated with the latest voiceprint data obtained at the time of user authentication, and erroneous determinations in voiceprint authentication can be prevented.
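The profile update can be sketched as below. The class name `UserProfile`, the `rebuild` method, and in particular the cap on stored patterns (with the oldest dropped first) are assumptions for illustration; this document only states that successfully authenticated voiceprint data is registered in the profile.

```python
from collections import deque
from typing import Deque, Iterable, List


class UserProfile:
    """Holds the registered voiceprint patterns for one user (sketch)."""

    def __init__(self, voiceprints: Iterable[List[float]], max_patterns: int = 5):
        # max_patterns is a hypothetical cap; the oldest pattern is dropped
        # automatically once the deque is full.
        self.voiceprints: Deque[List[float]] = deque(voiceprints, maxlen=max_patterns)

    def rebuild(self, new_voiceprint: List[float], auth_succeeded: bool) -> bool:
        """Register the new voiceprint only if voiceprint authentication passed."""
        if not auth_succeeded:
            return False
        self.voiceprints.append(new_voiceprint)
        return True
```

Gating the update on a successful authentication keeps rejected recordings out of the profile.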

Further, in the above user authentication system, the present invention comprises: noise determination means which, when voiceprint data is newly registered, provides a period in which no utterance is input to the user call terminal and determines, based on the signal received during that period, whether the environment on the user side has an acceptable noise level; and user profile creation means which, after the noise determination means determines that the noise level is acceptable, extracts voiceprint data from the password utterance voice data for voiceprint registration and registers it in the user profile.

As a result, the password utterance for new registration of voiceprint data is made only after confirming that the environment on the user side has an acceptable noise level, so erroneous determinations caused by low-accuracy registered voiceprint data can be effectively prevented.
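One possible form of this noise check is sketched below. It assumes the signal received during the silent period is available as PCM sample values and that the noise level is measured as root-mean-square (RMS) amplitude; the measure and the threshold value are assumptions, since this document does not specify how the noise level is computed.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of a block of samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def noise_acceptable(
    silent_samples: Sequence[float],
    max_noise_rms: float = 500.0,  # illustrative threshold, not from the source
) -> bool:
    """Judge the user's environment from the period with no utterance input."""
    if not silent_samples:
        # Nothing was received, so the environment cannot be judged.
        return False
    return rms(silent_samples) <= max_noise_rms
```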

Further, in the above user authentication system, the present invention comprises sound cracking determination means for determining, from the utterance period of the password utterance voice data for voiceprint registration, whether the input voice exceeds an allowable input level, and the voiceprint data is extracted after the sound cracking determination means determines that the input voice is within the allowable input level.

As a result, input voice that exceeds the allowable input level is excluded from voiceprint data extraction, so the accuracy of voiceprint authentication can be improved.
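A simple way to picture the sound cracking (clipping) check is a peak test over the utterance period, as sketched below. The sketch assumes 16-bit PCM samples and an allowable peak slightly below full scale; both are assumptions, as this document does not define the allowable input level numerically.

```python
from typing import Sequence


def exceeds_input_level(
    utterance_samples: Sequence[int],
    allowable_peak: int = 32000,  # hypothetical limit just under 16-bit full scale
) -> bool:
    """True if any sample in the utterance period is above the allowable level."""
    return any(abs(sample) > allowable_peak for sample in utterance_samples)


def usable_for_voiceprint(utterance_samples: Sequence[int]) -> bool:
    """Voiceprint data is extracted only from utterances that stay within the limit."""
    return not exceeds_input_level(utterance_samples)
```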

Further, in the above user authentication system, the present invention comprises: noise determination means which, when voiceprint data is newly registered, determines whether the environment on the user side has an acceptable noise level from the state of a predetermined section at least one of before and after the utterance period in the password utterance voice data for voiceprint registration; and user profile creation means for extracting voiceprint data from the utterance period of the password utterance voice data for voiceprint registration and registering it in the user profile.

As a result, whether the environment on the user side has an acceptable noise level is determined from the state of a predetermined section of the password utterance voice data for voiceprint registration. There is therefore no need to provide a period without utterance input for noise determination, and the time required for new voiceprint registration can be shortened.
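The variant that judges noise from the sections adjacent to the utterance can be sketched as follows. It assumes the start and end of the utterance period are already known as sample indices and that noise is measured as RMS amplitude; the margin length, the threshold, and the function names are hypothetical.

```python
import math
from typing import Sequence


def _rms(samples: Sequence[float]) -> float:
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def noise_ok_from_margins(
    samples: Sequence[float],
    speech_start: int,
    speech_end: int,
    margin: int = 800,             # hypothetical section length in samples
    max_noise_rms: float = 500.0,  # illustrative threshold
) -> bool:
    """Judge ambient noise from the sections just before and after the utterance."""
    before = samples[max(0, speech_start - margin):speech_start]
    after = samples[speech_end:speech_end + margin]
    # Use whichever sections exist; at least one is needed to make a judgment.
    sections = [section for section in (before, after) if len(section) > 0]
    if not sections:
        return False
    return all(_rms(section) <= max_noise_rms for section in sections)
```

Since the check reuses the registration recording itself, no separate silent period has to be requested from the user.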

As a result, voiceprint data based on input voice in which sound cracking has occurred because the input voice exceeds the allowable input level can be prevented from being registered, and erroneous authentication based on cracked audio data can be prevented.

Further, in the above user authentication system, when voiceprint data is newly registered, the password utterance for voiceprint registration is made at least three times. An identity confidence is calculated by comparing the first and second password utterance voices, another by comparing the first and third, and another by comparing the second and third. If all of the calculated identity confidences exceed a predetermined value, the password utterance voices are adopted.

As a result, even if the user spoke twice and another person spoke once, only the other person's utterance can be judged problematic and a repeat utterance requested. Registration actions that would confuse voiceprint authentication can thus be eliminated, and the number of utterance requests can be kept to a minimum.
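The pairwise comparison can be sketched as below. The `score` callable is a hypothetical stand-in for whatever computes the identity confidence between two utterances. The rule used to single out one suspect utterance (every pair containing it fails while every other pair passes) and the fallback of repeating all utterances when no single one can be isolated are assumptions made for this sketch.

```python
from itertools import combinations
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def utterances_to_repeat(
    utterances: Sequence[T],
    score: Callable[[T, T], float],  # hypothetical identity-confidence function
    threshold: float,
) -> List[int]:
    """Return indices of utterances to request again ([] means adopt all)."""
    n = len(utterances)
    passed = {
        (i, j): score(utterances[i], utterances[j]) > threshold
        for i, j in combinations(range(n), 2)
    }
    if all(passed.values()):
        return []
    for k in range(n):
        with_k = [ok for pair, ok in passed.items() if k in pair]
        without_k = [ok for pair, ok in passed.items() if k not in pair]
        # Utterance k is suspect if it disagrees with every other utterance
        # while the remaining utterances all agree with each other.
        if not any(with_k) and all(without_k):
            return [k]
    # No single outlier can be isolated: ask for every utterance again.
    return list(range(n))
```

With three utterances where the third comes from a different speaker, only pair (first, second) passes, so only the third utterance needs to be requested again.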

The user authentication method of the present invention comprises the steps of: acquiring, via a communication network, the utterance voice of a password input into a user call terminal; determining by voice recognition whether the acquired password utterance voice matches a registered password registered in advance; and determining whether the voiceprint data of the password utterance voice matches registered voiceprint data in the user profile of the user.

According to the present invention, it is possible to provide a user authentication system and a user authentication method that achieve the robustness of two-factor authentication combining password authentication and voiceprint authentication, shorten user authentication time, require no complicated work, and offer excellent operability.

Hereinafter, an embodiment of the present invention will be described specifically with reference to the drawings.
FIG. 1 is a block diagram showing the configuration of a system to which a user authentication system according to an embodiment of the present invention is applied. The user call terminal 1 has a function for inputting the user's utterance voice and a communication function for connecting to the telephone network 2 by wire or wirelessly to enable calls. A mobile phone terminal, a fixed-line telephone terminal, a PDA equipped with a call function, or the like can be used as the user call terminal 1; the present embodiment describes an example using a mobile phone terminal.

The voice authentication station, meanwhile, performs voiceprint registration and user authentication in response to requests from the user call terminal 1. Its main components are: an interactive voice response device 3 that gives voice guidance to the user call terminal 1 connected via the telephone network 2; a voiceprint authentication server 4 that performs voiceprint authentication based on user profiles; a voice recognition server 5 that performs voice recognition of the utterance content (the password); and a user management server 7 having a user database 6 in which the user IDs and passwords of registered users are registered. In the present embodiment, the voice authentication station is built by connecting the components (3 to 7) via a network such as a LAN or WAN. However, the components may instead be built on a single server without being distributed, or only specific components (for example, the interactive voice response device 3, the voiceprint authentication server 4, and the voice recognition server 5) may be built on the same server.

FIG. 2 is a block diagram showing the functions of the interactive voice response device 3. The voice guidance support unit 11 provides voice guidance for voiceprint registration and voice guidance for user authentication to the user call terminal 1 connected by line. The voice guidance data is stored in a memory (not shown) and is read out according to a predetermined sequence.

The caller number acquisition unit 12 acquires the caller number of the user call terminal 1 from the caller number notification signal of the user call terminal 1 that has requested connection to the interactive voice response device 3. It notifies the voiceprint authentication server 4 of the acquired caller number and inquires whether the user is registered. In this example, the inquiry about user registration is made to the user management server 7 via the voiceprint authentication server 4, but if only the presence or absence of registration needs to be confirmed, the inquiry may be made directly to the user management server 7.

The user information creation unit 13 creates user information and notifies the voiceprint authentication server 4 of it when registration has been confirmed as a result of the registration inquiry by the caller number acquisition unit 12. In the case of new voiceprint registration, the notification of user information from the user information creation unit 13 to the voiceprint authentication server 4 serves as the trigger for acquiring the user profile data other than the voiceprint data and for creating the keyword dictionary. In the case of voiceprint authentication, the notification serves as the trigger for creating the keyword dictionary. In this example the caller number is used as the user information, but the user ID acquired when user registration was confirmed may be used instead.

The noise determination unit 14 and the utterance voice level determination unit 15 are functional blocks activated for new voiceprint registration. The noise determination unit 14 determines whether the noise conditions in the surroundings of the user performing voiceprint registration permit registration. If the noise level exceeds the allowable value, it causes the voice guidance support unit 11 to output guidance to the user call terminal 1 stating that voiceprint registration is not possible. The utterance voice level determination unit 15 has the user utter predetermined content and determines the utterance voice level at that time. This is done before the keyword utterance so that the user learns the appropriate utterance voice level, guiding the keyword utterance voice to be registered as a voiceprint to an appropriate level. The utterance voice level determination unit 15 determines whether the user's voice (utterance voice level) is too quiet or too loud. If the utterance voice level is inappropriate, it causes the voice guidance support unit 11 to output voice guidance such as "speak louder" or "speak more quietly" to the user call terminal 1.
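The level check performed before the keyword utterance might look like the following sketch. Measuring the level as RMS amplitude of PCM samples and the two bounds are assumptions for illustration; the returned strings are shorthand for the two kinds of guidance mentioned above.

```python
import math
from typing import Optional, Sequence


def level_guidance(
    samples: Sequence[float],
    min_rms: float = 1000.0,   # illustrative lower bound
    max_rms: float = 12000.0,  # illustrative upper bound
) -> Optional[str]:
    """Return 'louder', 'quieter', or None when the level is appropriate."""
    if not samples:
        return "louder"  # nothing audible was captured
    level = math.sqrt(sum(s * s for s in samples) / len(samples))
    if level < min_rms:
        return "louder"
    if level > max_rms:
        return "quieter"
    return None
```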

The utterance acquisition unit 16 acquires the user's keyword utterance voice from the voice guidance support unit 11. The acquired keyword utterance voice data is sent to the profile creation request unit 17 and the voice recognition server 5 at the time of new voiceprint registration, and to the authentication request unit 18 and the voice recognition server 5 at the time of user authentication.

The profile creation request unit 17 is a functional block activated at the time of new voiceprint registration. It transmits the keyword utterance voice data acquired by the utterance acquisition unit 16 to the voiceprint authentication server 4 and requests the voiceprint authentication server 4 to create voiceprint data from that keyword utterance voice data and register it in a profile.

The authentication request unit 18 is a functional block activated at the time of voiceprint authentication. It transmits the keyword utterance voice data acquired by the utterance acquisition unit 16 to the voiceprint authentication server 4 and requests voiceprint authentication.

FIG. 3 is a block diagram showing the functions of the voiceprint authentication server 4. The user registration confirmation unit 21 notifies the user management server 7 of the caller number received from the caller number acquisition unit 12 and inquires whether a user assigned that caller number is registered. It then notifies the caller number acquisition unit 12 of the information on the presence or absence of user registration returned from the user management server 7.

The user information acquisition unit 22 uses the caller number received from the user information creation unit 13 to issue, to the user management server 7, a request to acquire the user information (user ID and password) registered in association with that caller number. If the user ID was acquired when user registration was confirmed, the user ID can be used instead of the caller number. In the case of new voiceprint registration, the user information contained in the response from the user management server 7 is input to the user profile creation unit 25, and the password is input to the keyword dictionary creation unit 23. In the case of user authentication, no user information is input to the user profile creation unit 25; only the password is input to the keyword dictionary creation unit 23.

The keyword dictionary creation unit 23 receives password data from the user information acquisition unit 22 and creates from it the keyword dictionary that the voice recognition server 5 uses for voice recognition. In the present embodiment, the registered password of the user who is the target of voiceprint registration or voiceprint authentication is retrieved and expanded into keyword dictionary form, so only the keyword dictionary corresponding to that registered password is prepared. Consequently, when the voice recognition engine performs recognition with reference to this keyword dictionary, recognition succeeds if the utterance voice matches the registered password and fails for any other utterance. In other words, in voice recognition using the keyword dictionary corresponding to the registered password, successful recognition means that the password matches and password authentication has succeeded, while failed recognition means that the password does not match and password authentication has failed. In the present embodiment, voice recognition is performed by the voice recognition server 5. If password authentication succeeds, user profile creation or voiceprint authentication is performed. If password authentication fails, processing ends without proceeding to user profile creation or voiceprint authentication.
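The effect of a one-entry keyword dictionary can be illustrated with the simplified sketch below. A real recognition engine matches acoustic features against the dictionary; here a text transcript stands in for the engine's hypothesis, which is an assumption made purely so that the example is runnable. All function names are hypothetical.

```python
from typing import FrozenSet, Optional


def build_keyword_dictionary(registered_password: str) -> FrozenSet[str]:
    """Expand the single registered password into a one-entry dictionary."""
    return frozenset({registered_password.strip().lower()})


def recognize(transcript: str, dictionary: FrozenSet[str]) -> Optional[str]:
    """Return the recognized keyword, or None if it is not in the dictionary."""
    word = transcript.strip().lower()
    return word if word in dictionary else None


def password_authenticated(transcript: str, registered_password: str) -> bool:
    """Recognition success is treated as password authentication success."""
    dictionary = build_keyword_dictionary(registered_password)
    return recognize(transcript, dictionary) is not None
```

Because the dictionary holds only the caller's own password, any other utterance simply fails to be recognized, and that failure doubles as the password mismatch result.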

The user profile creation unit 25 is activated on receiving a profile creation request from the profile creation request unit 17. Once activated, it extracts voiceprint data from the password utterance voice data passed from the profile creation request unit 17. It then creates a user profile that associates the extracted voiceprint data with the user information passed from the user information acquisition unit 22, and registers the profile in the user profile database 26. In the present embodiment the user information includes a telephone number, a user ID, and a password, but not all of these items are required.

The voiceprint authentication unit 27 is activated on receiving a voiceprint authentication request from the authentication request unit 18. Once activated, it extracts voiceprint data from the password utterance voice data passed from the authentication request unit 18 and retrieves the registered voiceprint data from the user profile database 26 using the telephone number (caller number) as the key. It then compares the voiceprint data extracted from the password utterance voice data with the registered voiceprint data retrieved from the user profile database 26, and outputs an authentication result: voiceprint authentication succeeds if the similarity score exceeds a predetermined value and fails if it does not.

The authentication result notification unit 28 notifies the user management server 7 of the authentication result output from the voiceprint authentication unit 27, together with the telephone number, which serves as the user identifying information. The destination to which the authentication result notification unit 28 sends the result is not limited to the user management server 7 and can be changed to suit the system configuration for a given use or purpose. For example, the result can be sent directly to the requesting user call terminal 1 or to a separate application server.

The user profile reconstruction unit 29 reconstructs the user profile by registering, in the user profile database 26, the voiceprint data that the voiceprint authentication unit 27 extracted when user authentication was performed.

The voice recognition server 5 includes a voice recognition engine. The voice recognition engine recognizes the password utterance voice data supplied from the utterance acquisition unit 16 with reference to the keyword dictionary created by the keyword dictionary creation unit 23. If the keyword dictionary created by the keyword dictionary creation unit 23 covers only one password, only that one password can be recognized. Therefore, successful voice recognition means successful password authentication, and failed voice recognition means failed password authentication. The voice recognition result is transmitted to the interactive voice response device 3.

次に、以上のように構成された本実施の形態の動作について、新規声紋登録動作とユーザ認証動作とに分けて具体的に説明する。音声認証局として機能する本システムから認証サービスを受けるために、ユーザ管理サーバ７に対して予めユーザＩＤ、パスワード、その他のユーザ情報を登録しているものとする。 Next, the operation of the present embodiment configured as described above will be specifically described by dividing it into a new voiceprint registration operation and a user authentication operation. It is assumed that a user ID, a password, and other user information are registered in advance in the user management server 7 in order to receive an authentication service from this system that functions as a voice certificate authority.

FIG. 4 is a flowchart of new voiceprint registration. The user dials the telephone number of the interactive voice response device 3 from the user call terminal 1 and connects the line. When the interactive voice response device 3 receives the incoming call, the caller number acquisition unit 12 acquires the caller number of the user call terminal 1. If the caller number is withheld, voice guidance asks the user to enable caller number notification and call again. The acquired caller number is sent to the user registration confirmation unit 21, which asks the user management server 7 whether a user with that caller number is registered. The registration status returned from the user management server 7 to the user registration confirmation unit 21 is passed on to the caller number acquisition unit 12. If the user is not registered, the voice guidance support unit 11 announces that the user is unregistered and the process ends.

If the user is registered, processing moves on to new voiceprint registration. The user information creation unit 13 notifies the user information acquisition unit 22 of the caller number so that preparation for creating a user profile begins, while the voice guidance support unit 11 issues voice guidance to the user call terminal 1 instructing the user not to speak because a noise determination is about to be made.

ユーザ情報取得部２２は、発信者番号に対応した登録ユーザのユーザ情報をユーザ管理サーバ７から取得する。取得したユーザＩＤ、パスワード及び電話番号をユーザプロファイル作成部２５へ入力すると共に、パスワード認証のためにパスワードデータをキーワード辞書作成部２３に入力する。 The user information acquisition unit 22 acquires user information of a registered user corresponding to the caller number from the user management server 7. The acquired user ID, password, and telephone number are input to the user profile creation unit 25, and password data is input to the keyword dictionary creation unit 23 for password authentication.

雑音判定部１４は、発話していない状態での受話音声データからユーザ側の雑音状況を判定する。雑音レベルが所定値を超えていれば、もう少し静かな環境から電話を掛け直すように音声ガイダンスして処理を終了する。雑音レベルが所定値を超えていない場合は、ユーザ本人に発話してもらうように音声ガイダンスする。 The noise determination unit 14 determines the noise situation on the user side from the received voice data when not speaking. If the noise level exceeds a predetermined value, the voice guidance is performed so as to make a call again from a slightly quieter environment, and the process is terminated. If the noise level does not exceed the predetermined value, voice guidance is given so that the user himself / herself speaks.
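The noise determination can be illustrated with a minimal sketch. It assumes the received audio is available as a list of PCM sample values and that the "noise level" is measured as a root-mean-square value; the source does not specify the measure, and the names `rms_level`, `noise_check`, and the return labels are hypothetical.

```python
import math

def rms_level(samples):
    """Root-mean-square level of a sequence of PCM samples (assumed noise measure)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def noise_check(silent_samples, noise_threshold):
    """Decide the next step from audio captured while the user is silent.

    Returns "call_again" when the background noise exceeds the threshold,
    otherwise "prompt_speech" so the guidance can ask the user to speak.
    """
    if rms_level(silent_samples) > noise_threshold:
        return "call_again"
    return "prompt_speech"
```

The threshold is left as a parameter because the description states that it is set according to use and purpose.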

このように、雑音レベルが所定値を超えないような静かな環境で声紋登録を行うことにより、声紋認証の精度を大幅に改善することができる。雑音レベルの閾値は用途・目的に応じて設定可能であり、閾値を上げることにより声紋認証の精度を上げることができる。 Thus, by performing voiceprint registration in a quiet environment where the noise level does not exceed a predetermined value, the accuracy of voiceprint authentication can be greatly improved. The threshold of the noise level can be set according to the use and purpose, and the accuracy of voiceprint authentication can be increased by increasing the threshold.

Next, voice guidance prompts the user to speak, and the user's utterance voice level is determined from what the user says. For example, the guidance asks the user to say their company name, department, and name. The utterance voice level determination unit 15 measures the user's utterance voice level from the received voice signal. If the level is lower than a predetermined value, voice guidance such as "Please speak a little louder" is output; if the level is higher than a predetermined value, voice guidance such as "Please speak a little more quietly" is output. Accurate voiceprint data cannot be collected if the voice is either too quiet or too loud at registration. The utterance voice level is therefore measured before the user speaks the password that will actually be registered, so that the user learns the appropriate speaking level.
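As a rough sketch of this level check, the function below maps a measured level to one of the guidance outcomes described above. The function name, the return labels, and the idea of a single numeric level with a lower and an upper bound are illustrative assumptions; the source does not state how the level is measured.

```python
def level_guidance(measured_level, lower_bound, upper_bound):
    """Choose the guidance to play from a measured utterance level.

    lower_bound and upper_bound are hypothetical names for the two
    predetermined values mentioned in the description.
    """
    if measured_level < lower_bound:
        return "speak_louder"    # "Please speak a little louder"
    if measured_level > upper_bound:
        return "speak_quieter"   # "Please speak a little more quietly"
    return "level_ok"
```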

ここで、会社名、所属、名前の発話が終了したら、ユーザ通話端末１の特定ボタン（例えば＃ボタン）を押下して発話が終了したことを知らせるように音声ガイダンスすることが望ましい。音声ガイダンス対応部１１は＃ボタンの押下を検出することにより次の処理へ移行できるので、対話型音声対応装置３での待ち時間を短縮することができ、ひいては声紋登録時間の短縮につながる。 Here, when the utterance of the company name, affiliation, and name is completed, it is desirable to provide voice guidance so as to notify the user that the utterance has ended by pressing a specific button (for example, # button) on the user call terminal 1. Since the voice guidance support unit 11 can move to the next processing by detecting the pressing of the # button, the waiting time in the interactive voice support device 3 can be shortened, leading to a reduction in voiceprint registration time.

次に、音声ガイダンス対応部１１は、ユーザがユーザ管理サーバ７に登録しているパスワードを複数回発話するように音声ガイダンスを行う。本実施の形態では、ユーザ管理サーバ７にユーザが使用している携帯電話機の携帯電話番号がパスワードとして登録されているものとする。例えば「１回目の声紋登録を行います。登録した携帯電話機の携帯電話番号をおっしゃってください」といった音声ガイダンスを出力する。ユーザはユーザ通話端末１に対して登録携帯電話番号を発話する。 Next, the voice guidance support unit 11 performs voice guidance so that the user speaks the password registered in the user management server 7 a plurality of times. In the present embodiment, it is assumed that the mobile phone number of the mobile phone used by the user is registered in the user management server 7 as a password. For example, a voice guidance such as “Perform first voiceprint registration. Please tell us the mobile phone number of the registered mobile phone” is output. The user speaks the registered mobile phone number to the user call terminal 1.

First, the spoken mobile phone number is recognized by the voice recognition server 5, and it is determined whether it matches the mobile phone number registered in the user management server 7. Specifically, the utterance acquisition unit 16 acquires the utterance of the mobile phone number and transmits it to the voice recognition server 5. Meanwhile, the keyword dictionary creation unit 23 creates a keyword dictionary from the registered mobile phone number that was obtained from the user management server 7 on the basis of the caller number. For example, a registered mobile phone number written in digits (03-1234...) is converted into pronunciation data giving its reading (zero-san-ichi-ni-san-yon...). This pronunciation data is used as the keyword dictionary.
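The digit-to-reading conversion can be sketched as follows. The romanized readings, the table `DIGIT_READINGS`, and the function `build_keyword_dictionary` are illustrative assumptions; a real recognizer would use its own phonetic representation, and digits such as 0, 4, 7, and 9 have alternative readings that are not modelled here.

```python
# Hypothetical romanized readings for each digit (one reading per digit).
DIGIT_READINGS = {
    "0": "zero", "1": "ichi", "2": "ni", "3": "san", "4": "yon",
    "5": "go", "6": "roku", "7": "nana", "8": "hachi", "9": "kyuu",
}

def build_keyword_dictionary(phone_number):
    """Return a one-entry keyword dictionary holding the reading of the number.

    Separators such as hyphens are ignored; only ASCII digits are converted.
    """
    reading = " ".join(DIGIT_READINGS[ch] for ch in phone_number if ch in DIGIT_READINGS)
    return [reading]
```

For the fragment used in the description, `build_keyword_dictionary("03-1234")` yields a single entry, "zero san ichi ni san yon".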

The voice recognition server 5 acoustically analyzes the utterance of the mobile phone number received from the utterance acquisition unit 16 and recognizes it using the keyword dictionary created by the keyword dictionary creation unit 23. The keyword dictionary created on this occasion relates only to the single mobile phone number supplied by the user information acquisition unit 22. In other words, the only speech that the voice recognition server 5 can recognize on this occasion is the registered mobile phone number prepared as the keyword dictionary. Consequently, if the voice recognition server 5 succeeds in recognizing the utterance, the mobile phone number received from the utterance acquisition unit 16 matches the registered mobile phone number received from the user information acquisition unit 22. Because this embodiment registers the mobile phone number as the password, successful recognition of the spoken number means that the password matches and password authentication has succeeded. Conversely, if the voice recognition server 5 cannot recognize the utterance, the password does not match and password authentication has failed.

Because the voice recognition server 5 only has to match against a single keyword dictionary, voice recognition completes in a short time. In addition, because there is only one candidate, recognition accuracy is greatly improved compared with a method that selects the most similar keyword from among many candidates (many keyword dictionaries).

なお、キーワード辞書作成部２３において、音声認識の度にパスワードをキーワード辞書に展開しているが、パスワードを予めキーワード辞書の形式に展開したものをユーザ管理サーバ７のデーターベースに格納しておき、指定されたパスワードに対応したキーワード辞書を当該パスワードの代わりに取り出して音声認識サーバ５から参照可能にするように構成しても良い。このように構成することで、毎回の辞書作成時に必要なリソースを省力化する事も可能である。 The keyword dictionary creating unit 23 expands the password into the keyword dictionary every time voice recognition is performed, but stores the password expanded in the keyword dictionary format in the database of the user management server 7 in advance. A keyword dictionary corresponding to the designated password may be taken out instead of the password and can be referred to from the voice recognition server 5. By configuring in this way, it is possible to save resources required for creating a dictionary every time.

音声認識サーバ５は携帯電話番号の発話音声に対する音声認識結果（認識成功／認識失敗）を発話取得部１６へ返信する。発話音声単語が登録携帯電話番号であった場合は「認識成功」が返信され、発話音声単語が登録携帯電話番号以外であった場合は「認識失敗」が返信される。 The voice recognition server 5 returns the voice recognition result (recognition success / recognition failure) to the utterance voice of the mobile phone number to the utterance acquisition unit 16. When the utterance voice word is a registered mobile phone number, “recognition success” is returned, and when the utterance voice word is other than the registration mobile phone number, “recognition failure” is returned.

発話取得部１６は、音声認識サーバ５から返信された認識結果に応じてユーザ通話端末１に対する音声ガイダンスの内容を切り替える。「認識失敗」が返信された場合、登録された携帯電話番号と一致しなかった旨の音声ガイダンスを出力して１回目の声紋登録処理の最初に戻る。「認識成功」が返信された場合、２回目の声紋登録を行うので登録携帯電話番号を発声して特定ボタン（例えば＃ボタン）を押下するように指示する音声ガイダンスを出力する。声紋登録時に音声認識サーバ５による認識失敗が所定回数（例えば３回）繰り返された場合は声紋登録が行われなかった旨の音声ガイダンスを出力して登録処理を終了する。 The utterance acquisition unit 16 switches the content of the voice guidance for the user call terminal 1 according to the recognition result returned from the voice recognition server 5. When “recognition failure” is returned, a voice guidance indicating that the mobile phone number does not match the registered mobile phone number is output, and the process returns to the beginning of the first voiceprint registration process. If “recognition success” is returned, the voiceprint registration is performed for the second time, so that the registered mobile phone number is uttered and a voice guidance instructing to press a specific button (for example, # button) is output. If recognition failure by the voice recognition server 5 is repeated a predetermined number of times (for example, three times) during voiceprint registration, voice guidance indicating that voiceprint registration has not been performed is output, and the registration process is terminated.
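The retry behaviour described here can be sketched as a small loop. The callables `get_utterance` and `recognize` are hypothetical stand-ins for the utterance acquisition unit and the voice recognition server, and the default of three failures follows the example count given in the text.

```python
def collect_recognized_utterance(get_utterance, recognize, max_failures=3):
    """Prompt for the password until it is recognized or failures run out.

    get_utterance: callable returning the next captured utterance.
    recognize: callable returning True on "recognition success".
    Returns the recognized utterance, or None when the failure limit is
    reached and registration should be abandoned.
    """
    failures = 0
    while failures < max_failures:
        audio = get_utterance()
        if recognize(audio):
            return audio
        failures += 1  # guidance would announce the mismatch here
    return None
```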

If "recognition success" is returned from the voice recognition server 5 for the first voiceprint registration, the voice guidance described above is output in preparation for the second voiceprint registration, and the utterance acquisition unit 16 passes the utterance voice data of the mobile phone number that was used for the voice recognition to the profile creation request unit 17. The profile creation request unit 17 issues a user profile creation request by supplying that utterance voice data to the user profile creation unit 25.

ユーザプロファイル作成部２５は、携帯電話番号の発話音声データを音響分析して声紋データを抽出する。ユーザプロファイル作成部２５は、ユーザ情報取得部２２が発信者番号に基づいて取得したユーザ情報（ユーザＩＤ、電話番号、パスワード）を登録したユーザプロファイルを作成してユーザプロファイルデータベース２６に登録する。さらに、ユーザプロファイル作成部２５は、今回抽出した声紋データを当該ユーザのユーザプロファイルに追加する。これにより、１回目の声紋データ、ユーザＩＤ、電話番号、パスワードからなるユーザプロファイルが登録されたことになる。これで１回目の声紋登録が完了する。 The user profile creation unit 25 acoustically analyzes the speech voice data of the mobile phone number and extracts voiceprint data. The user profile creation unit 25 creates a user profile in which user information (user ID, telephone number, password) acquired by the user information acquisition unit 22 based on the caller number is registered, and registers the user profile in the user profile database 26. Further, the user profile creation unit 25 adds the voiceprint data extracted this time to the user profile of the user. As a result, the user profile including the first voiceprint data, user ID, telephone number, and password is registered. This completes the first voiceprint registration.
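One possible shape for the user profile record is sketched below. The class and field names are hypothetical, the voiceprint is represented as a plain list of feature values for illustration only, and the registration timestamp reflects the registration date and time that the description later says is stored with each voiceprint.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Voiceprint:
    features: list            # extracted voiceprint data (illustrative form)
    registered_at: datetime   # registration date and time

@dataclass
class UserProfile:
    user_id: str
    phone_number: str
    password: str
    voiceprints: list = field(default_factory=list)

    def add_voiceprint(self, features, when=None):
        """Append one voiceprint, e.g. after each of the three registrations."""
        self.voiceprints.append(Voiceprint(features, when or datetime.now()))
```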

２回目の声紋登録及び３回目の声紋登録においても１回目の声紋登録と同じ処理を繰り返し、同じユーザプロファイルに２回目、３回目の声紋データを順次登録する。本例では３回目の声紋データ登録が終了したところで、認証結果通知部２８からユーザ管理サーバ７へ登録結果が通知される。また、ユーザ通話端末１に対して音声ガイダンス対応部１１から声紋が登録された旨の音声ガイダンスがなされる。 In the second voiceprint registration and the third voiceprint registration, the same processing as the first voiceprint registration is repeated, and the second and third voiceprint data are sequentially registered in the same user profile. In this example, when the third voiceprint data registration is completed, the registration result is notified from the authentication result notification unit 28 to the user management server 7. In addition, voice guidance to the effect that the voiceprint has been registered is made from the voice guidance corresponding unit 11 to the user call terminal 1.

It is desirable to provide clipping determination means that determines whether the input voice to be registered as a voiceprint is clipped (sound cracking). The utterance acquisition unit 16 decides whether to issue a profile creation request according to the result from the clipping determination means. If the input is clipped, voice guidance prompts the user to speak the password again. When the input voice exceeds the permissible input level, the input device can only digitize it at the maximum permissible value, so the sound reconstructed from the numerical data differs greatly from the original. For this reason, only input voice that the clipping determination means has judged not to be clipped is used for voiceprint data extraction. Alternatively, the user is first made aware of an input level that does not cause clipping and is then asked to speak the password for voiceprint registration, with voice guidance or similar means ensuring that the input voice does not exceed the permissible input level during registration.

Clipping can be determined either digitally or in an analog manner. In the digital determination, the sound pressure of the input voice is digitized, and the input is judged to be clipped if the value exceeds a certain level. In the analog determination, a judgment based on level is difficult, so the input voice is processed by FFT (fast Fourier transform) to obtain the sound pressure distribution along the frequency axis, and the input is judged to be clipped if that distribution extends across the entire voice band.
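The digital determination can be sketched in a few lines. The function name and the `limit` parameter are hypothetical; for 16-bit PCM, a limit slightly below the full-scale value of 32767 would be one plausible choice, which is an assumption and not a value from the source. The FFT-based analog determination is not shown.

```python
def is_clipped(samples, limit):
    """Digital clipping check: True if any sample magnitude exceeds the limit."""
    return any(abs(s) > limit for s in samples)
```

An utterance for which `is_clipped` returns True would be discarded and the password requested again, as described above.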

In the above description, a voiceprint is registered after each voice input. It is desirable, however, to perform a mutual reliability determination on the first to third voiceprint data before registering them in the user profile. The mutual reliability determination is performed by mutual reliability determination means (not shown).

As described above, the user speaks at least three times to register voiceprint data. Normally the user speaks all three times, but if a malicious user speaks twice and has another person speak once, later voiceprint authentication becomes unreliable. Such utterances are not caught by the keyword check, the noise check, the S/N ratio check, or the clipping check, so voiceprint data would be created as if all three utterances belonged to the user. Because one utterance actually belongs to someone else, leaving this unchecked would worsen the rate at which other people are rejected.

To prevent this, the mutual reliability determination means performs a mutual reliability check. The same-speaker score between the first and second utterances is calculated and denoted A. The same-speaker score between the second and third utterances is calculated and denoted B. The same-speaker score between the third and first utterances is calculated and denoted C. The voiceprint can be registered only if all of A, B, and C exceed a certain threshold.

If none of A, B, and C exceeds the threshold, all three password utterances are judged to be problematic, and all of them are requested again. On instruction from the mutual reliability determination means, the utterance acquisition unit 16 outputs voice guidance requesting three password utterances again, following the procedure described above.

Ａ，Ｂが悪く、Ｃが良い場合、１回目のパスワード発話と３回目のパスワード発話には問題が無く、２回目の発話に問題が在ると判断されて、２回目のパスワード発話を再度要求する。発話取得部１６は、相互信頼度判定手段からの指示を受けて前述した手順で再び２回目のパスワード発話を要求する音声ガイダンスを出力する。 If A and B are bad and C is good, it is judged that there is no problem in the first password utterance and the third password utterance, and there is a problem in the second utterance, and the second password utterance is requested again. To do. In response to the instruction from the mutual reliability determination means, the utterance acquisition unit 16 outputs voice guidance requesting the second password utterance again by the procedure described above.

If A is poor while B and C are good, something is wrong with two of the password utterances, but the problematic ones cannot be identified, so all password utterances are requested again. On instruction from the mutual reliability determination means, the utterance acquisition unit 16 outputs voice guidance requesting three password utterances again, following the procedure described above.
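The case analysis above can be summarized in a small decision function. This is a sketch, not the patented implementation: the function name and return labels are hypothetical, and the handling of "B and C poor" and "A and C poor" is an assumption made by symmetry with the stated "A and B poor" case, since the description does not spell those cases out.

```python
def mutual_reliability_decision(a, b, c, threshold):
    """Decide what to do from the pairwise same-speaker scores.

    a: score between utterances 1 and 2
    b: score between utterances 2 and 3
    c: score between utterances 3 and 1
    """
    ok_a, ok_b, ok_c = a > threshold, b > threshold, c > threshold
    if ok_a and ok_b and ok_c:
        return "register"
    poor = [ok_a, ok_b, ok_c].count(False)
    if poor == 2:
        # The utterance shared by both poor pairs is the suspect one.
        if ok_c:
            return "redo_2"   # A and B poor (case stated in the description)
        if ok_a:
            return "redo_3"   # B and C poor (assumed by symmetry)
        return "redo_1"       # A and C poor (assumed by symmetry)
    # One poor score, or all three poor: the culprit cannot be isolated.
    return "redo_all"
```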

本実施の形態は、新規声紋登録によって各ユーザのユーザプロファイルに同一パスワードに関する３つの声紋データを登録している。同一ユーザが同一パスワードを発話したとしても、全く同一の声紋データとはならない。そこで、声紋データのぶれを吸収して誤判定を防止するために複数回の発話音声から複数の声紋データを採集して登録しておくこととした。 In this embodiment, three voiceprint data relating to the same password are registered in the user profile of each user by new voiceprint registration. Even if the same user speaks the same password, the voice print data is not exactly the same. Therefore, in order to absorb fluctuations in voiceprint data and prevent erroneous determination, a plurality of voiceprint data is collected and registered from a plurality of utterances.

In the voice guidance that prompts the password utterance, the user is asked to vary the speaking rate between the first, second, and third utterances. For example, the guidance says "Please say your mobile phone number at a normal pace" for the first voiceprint registration, "Please say your mobile phone number quickly" for the second, and "Please say your mobile phone number slowly" for the third. By collecting and registering voiceprint data from passwords spoken according to this guidance, voiceprint data for normal, fast, and slow speech are registered in the user profile.

また、同一パスワードについて発話速度の異なる複数の声紋データを取得するために、発話音声データを計算機に入力し、発話速度を計算機上で変化させることにより同一発話音声から複数の声紋データを取得するように構成しても良い。このように構成した場合、ユーザは声紋登録のために１回だけ発話すればよいので、声紋登録に要する時間を短縮することができる。又は、上記しように発話速度を変えて複数回（３回）パスワードを発話させ、各発話音声データを計算機に入力してそれぞれ発話速度を変化させることにより、発話速度の異なる多数の声紋データを取得でき、声紋認証の精度を改善させることができる。 In addition, in order to acquire a plurality of voiceprint data having different utterance speeds for the same password, the utterance voice data is input to a computer, and a plurality of voiceprint data is acquired from the same utterance voice by changing the utterance speed on the computer. You may comprise. In such a configuration, the user needs to speak only once for voiceprint registration, so the time required for voiceprint registration can be shortened. Or, as described above, changing the utterance speed, uttering a password multiple times (three times), inputting each utterance voice data into the computer and changing the utterance speed, thereby obtaining a large number of voiceprint data with different utterance speeds This can improve the accuracy of voiceprint authentication.

FIG. 5 is a flowchart of user authentication. The user authentication operation is described below.
A user who wishes to receive the authentication service of this system dials the telephone number of the interactive voice response device 3 from the user call terminal 1 and connects the line. When the interactive voice response device 3 receives the incoming call, the caller number acquisition unit 12 acquires the caller number of the user call terminal 1. If the caller number is withheld, voice guidance asks the user to enable caller number notification and call again. The acquired caller number is sent to the user registration confirmation unit 21, which asks the user management server 7 whether a user with that caller number is registered. The registration status returned from the user management server 7 to the user registration confirmation unit 21 is passed on to the caller number acquisition unit 12. If the user is not registered, the voice guidance support unit 11 announces that the user is unregistered and the process ends.

ユーザ登録されていた場合、ユーザ情報作成部１３がユーザ情報取得部２２へ発信者番号を通知してキーワード辞書作成の準備を開始させる一方、音声ガイダンス対応部１１がユーザ通話端末１に対して登録携帯電話番号を発話して最後に特定ボタン（例えば＃ボタン）を押下することを指示する音声ガイダンスを出力する。ユーザは、ユーザ管理サーバ７に事前に登録している携帯電話番号を音声ガイダンスに従い発話する。 If the user is registered, the user information creation unit 13 notifies the user information acquisition unit 22 of the caller number and starts preparation for creating the keyword dictionary, while the voice guidance support unit 11 registers with the user call terminal 1. A voice guidance is output instructing the user to utter a mobile phone number and finally press a specific button (for example, # button). The user speaks the mobile phone number registered in advance in the user management server 7 according to the voice guidance.

発話取得部１６は、ユーザがユーザ通話端末１に対して発した携帯電話番号の発話音声データを取得する。最初に、パスワード認証のため携帯電話番号の発話音声データを音声認識サーバ５へ送信して認識可能な否か判定する。 The utterance acquisition unit 16 acquires utterance voice data of a mobile phone number uttered by the user to the user call terminal 1. First, the speech voice data of the mobile phone number is transmitted to the voice recognition server 5 for password authentication to determine whether or not it can be recognized.

ユーザ情報取得部２２は、着信時のユーザ登録確認に連動してユーザ管理サーバ７から登録パスワードである登録携帯電話番号を取得し、キーワード辞書作成部２３へ供給している。キーワード辞書作成部２３は、前述した新規声紋登録時と同様にして、登録携帯電話番号を当該携帯電話番号の読み方である発音データに変換してキーワード辞書として保持している。 The user information acquisition unit 22 acquires a registered mobile phone number, which is a registration password, from the user management server 7 in conjunction with user registration confirmation at the time of incoming call, and supplies it to the keyword dictionary creation unit 23. The keyword dictionary creation unit 23 converts the registered mobile phone number into pronunciation data that is a way of reading the mobile phone number and holds it as a keyword dictionary in the same way as when registering a new voiceprint.

The voice recognition server 5 acoustically analyzes the utterance voice data of the mobile phone number supplied from the utterance acquisition unit 16, matches the analysis result against the keyword dictionary held by the keyword dictionary creation unit 23, and performs voice recognition. If the mobile phone number in the utterance voice data supplied from the utterance acquisition unit 16 is the same as the registered mobile phone number supplied to the keyword dictionary creation unit 23, voice recognition succeeds; if they differ, the utterance cannot be recognized and voice recognition fails. In other words, the user registers a mobile phone number of their choice in the user management server 7 in advance, and if the number spoken at authentication time matches it, the password matches and password authentication succeeds. If the numbers do not match, password authentication fails. The voice recognition server 5 notifies the utterance acquisition unit 16 of the voice recognition result, which serves as the password authentication result.

発話取得部１６は、音声認識結果が音声認識成功であった場合は認証依頼部１８へ声紋認証依頼を発行する。また、音声認識失敗であった場合は、登録されている携帯電話番号と不一致であるので、再度携帯電話番号を発話して特定ボタンを押下するように音声ガイダンスを行う。音声認識失敗が所定回数になったら認証されなかった旨の音声ガイダンスを出力して認証処理を終了する。 The utterance acquisition unit 16 issues a voice print authentication request to the authentication request unit 18 when the voice recognition result is a voice recognition success. If the voice recognition is unsuccessful, it does not match the registered mobile phone number, so the voice guidance is performed so that the mobile phone number is spoken again and the specific button is pressed. If the voice recognition failure reaches a predetermined number of times, a voice guidance indicating that the authentication has not been performed is output, and the authentication process is terminated.

The voiceprint authentication unit 27 receives a voiceprint authentication request from the authentication request unit 18; the request includes the utterance voice data of the mobile phone number that passed password authentication. In this example, the caller number acquired by the caller number acquisition unit 12 is attached to the request, but the system may instead be configured to obtain the telephone number (the caller number) from the user information acquisition unit 22. The voiceprint authentication unit 27 extracts voiceprint data from the utterance voice data of the mobile phone number and, using the telephone number as a key, retrieves the user's registered voiceprint data from the user profile database 26. It then calculates the similarity between the voiceprint data extracted from the current utterance and the registered voiceprint data retrieved from the user profile database 26. If the similarity score exceeds a predetermined value, the voiceprints are regarded as matching and voiceprint authentication succeeds; if the score does not reach the predetermined value, voiceprint authentication fails.
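The threshold decision can be illustrated with a minimal sketch. The source does not say how similarity is computed; cosine similarity over feature vectors is used here purely as an assumed stand-in, and both function names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0

def voiceprint_matches(probe, enrolled, threshold):
    """Voiceprint authentication succeeds only if the score exceeds the threshold."""
    return cosine_similarity(probe, enrolled) > threshold
```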

In this embodiment, a plurality of voiceprint data are registered, as described above, to prevent erroneous voiceprint authentication decisions. It is desirable that the guidance prompting the password utterance at registration instruct the user to vary the speaking rate, for example "normal" the first time, "fast" the second time, and "slow" the third time, so that voiceprint data with different speaking rates are registered. The speaking rate, and also the tension of the vocal cords, vary greatly with the user's condition (including psychological state and health) at the time of the password utterance, so erroneous decisions are difficult to prevent completely if only one voiceprint is registered. If a plurality of voiceprint data with different speaking rates are registered in the user profile, as in this embodiment, the system can cope flexibly with variation in the voiceprint of the spoken password caused by the user's condition, and erroneous decisions can be prevented effectively. Lowering the similarity threshold can also accommodate some variation in voiceprint data, but doing so lowers the security level of voiceprint authentication itself. By contrast, using a plurality of voiceprint data with different speaking rates, as in this embodiment, effectively suppresses erroneous decisions without reducing the reliability of voiceprint authentication.

A voiceprint also changes with changes in the user's physique or with age-related changes in the vocal cords. To cope with such changes, the user profile is controlled so that it is flexibly updated with the latest voiceprint data.

FIG. 6 schematically shows the registration state of the voiceprint data that forms part of the user profile. At new voiceprint registration, the voiceprint data obtained from the password uttered at a normal speaking rate is registered as the first voiceprint pattern, the voiceprint data obtained from the password uttered fast is registered as the second voiceprint pattern, and the voiceprint data obtained from the password uttered slowly is registered as the third voiceprint pattern. The registration date and time of each registered voiceprint data is also recorded. The voiceprint authentication unit 27 first matches against the voiceprint data registered as the first voiceprint pattern; if the similarity is equal to or less than the predetermined value, it next matches against the voiceprint data registered as the second voiceprint pattern; and if the similarity is again equal to or less than the predetermined value, it further matches against the voiceprint data registered as the third voiceprint pattern. It then informs the user profile reconstruction unit 29 which of the first to third voiceprint patterns contains the voiceprint data whose similarity exceeded the predetermined value.
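The sequential matching against the first, second, and third voiceprint patterns can be sketched as follows. This is only an illustration under stated assumptions: each pattern is assumed to hold a single voiceprint vector, the similarity function is supplied by the caller because the specification does not define one, and the name `find_matching_pattern` is hypothetical.

```python
from typing import Callable, Optional, Sequence

Vector = Sequence[float]


def find_matching_pattern(extracted: Vector,
                          patterns: Sequence[Vector],
                          similarity: Callable[[Vector, Vector], float],
                          threshold: float) -> Optional[int]:
    """Try the registered patterns in order (first: normal, second: fast,
    third: slow) and return the 1-based number of the first pattern whose
    similarity exceeds the threshold, or None if every pattern fails."""
    for number, registered in enumerate(patterns, start=1):
        if similarity(extracted, registered) > threshold:
            return number  # reported to the profile reconstruction step
    return None
```

Returning the pattern number rather than a plain boolean reflects the text: the matched pattern is what the voiceprint authentication unit 27 passes on to the user profile reconstruction unit 29.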

The user profile reconstruction unit 29 additionally registers the voiceprint data that has just been successfully authenticated at the end of the voiceprint pattern reported by the voiceprint authentication unit 27, and also writes its registration date and time. The number of entries per voiceprint pattern is determined in advance, and when that number is exceeded, entries are deleted starting from the one with the oldest registration date and time. Through this reconstruction, the user profile holds the latest voiceprint data for each voiceprint pattern. Therefore, even if the voiceprint data of the password utterance changes because of a change in the user's physique or an age-related change in the vocal cords, voiceprint authentication is performed against the latest voiceprint data, and erroneous decisions can be effectively prevented.
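A minimal sketch of this reconstruction step is given below. The data layout is an assumption made for illustration: the classes `VoiceprintEntry` and `UserProfile`, the function `add_authenticated_voiceprint`, and the default limit of three entries per pattern are all hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Sequence


@dataclass
class VoiceprintEntry:
    data: Sequence[float]    # voiceprint feature data (assumed representation)
    registered_at: datetime  # registration date and time


@dataclass
class UserProfile:
    # pattern number (1, 2, 3) -> entries registered for that pattern
    patterns: Dict[int, List[VoiceprintEntry]] = field(default_factory=dict)


def add_authenticated_voiceprint(profile: UserProfile,
                                 pattern_number: int,
                                 data: Sequence[float],
                                 now: datetime,
                                 max_entries: int = 3) -> None:
    """Append the newly authenticated voiceprint to the matched pattern and,
    if the preset count is exceeded, delete the oldest entries first."""
    entries = profile.patterns.setdefault(pattern_number, [])
    entries.append(VoiceprintEntry(data, now))
    entries.sort(key=lambda entry: entry.registered_at)  # oldest first
    excess = max(0, len(entries) - max_entries)
    del entries[:excess]
```

Only the pattern that matched is updated, so each speaking-rate pattern drifts toward the user's current voice independently of the others.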

The result of voiceprint authentication by the voiceprint authentication unit 27 is sent from the authentication result notification unit 28 to the user management server 7, which is set as the notification destination. The voiceprint authentication unit 27 also notifies the voice guidance response unit 11 of the success or failure of voiceprint authentication, and the voice guidance response unit 11 informs the user call terminal 1 of the success or failure of user authentication.

As described above, according to this embodiment, the password utterance voice data input from the user call terminal 1 is used for both password authentication and voiceprint authentication. The user can therefore receive the user authentication service simply by connecting to the interactive voice response device 3 and uttering the password once, which simplifies the operation on the user side. In addition, because password authentication and voiceprint authentication are performed without switching communication lines, the time required for user authentication can also be shortened.
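The reuse of a single utterance for both checks can be sketched as a two-stage function. This is an illustration only: the recognizer and the voiceprint verifier are passed in as callables because the specification does not define their interfaces, and the name `authenticate_user` and the returned status strings are hypothetical.

```python
from typing import Callable, Sequence

Utterance = Sequence[float]  # assumed: raw audio samples of one password utterance


def authenticate_user(utterance: Utterance,
                      registered_password: str,
                      recognize_password: Callable[[Utterance], str],
                      verify_voiceprint: Callable[[Utterance], bool]) -> str:
    """Run password authentication, then voiceprint authentication,
    on the same utterance so the user speaks only once."""
    # Stage 1: speech recognition must yield the registered password.
    if recognize_password(utterance) != registered_password:
        return "password_failed"
    # Stage 2: the same audio is checked against the registered voiceprint.
    if not verify_voiceprint(utterance):
        return "voiceprint_failed"
    return "authenticated"
```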

Further, according to this embodiment, voiceprint data (the first to third voiceprint patterns) is extracted from keyword utterances spoken at different rates and registered in advance, and voiceprint authentication succeeds if the voiceprint data extracted from the password utterance voice data at the time of user authentication matches the voiceprint data of any one of the voiceprint patterns. The system can therefore respond flexibly to changes in the user's situation and effectively prevent erroneous voiceprint authentication decisions.

Further, according to this embodiment, the user profile reconstruction unit 29 registers voiceprint data that has passed voiceprint authentication in the user profile, so that the user profile is rebuilt with the latest voiceprint data. Even if the voiceprint data of the password utterance changes because of a change in the user's physique or an age-related change in the vocal cords, voiceprint authentication is performed against the latest voiceprint data, and erroneous decisions can be effectively prevented.

Further, according to this embodiment, the keyword dictionary is created solely from the registered password of the user to be authenticated, which improves the voice recognition accuracy for the password utterance and shortens the recognition time.

In the above embodiment, noise determination is performed before the password is uttered at new voiceprint registration, as shown in FIG. 4. Alternatively, noise determination may be performed after the password utterance voice has been acquired, using the silent sections that exist before and after the password utterance.

As shown in FIG. 7, the password utterance voice data acquired by the utterance acquisition unit 16 includes a predetermined section before the start of the utterance and a predetermined section after its end. The voice recognition engine of the voice recognition server 5 cuts out the utterance section for acoustic analysis, and can return the time tags of the silent section before the start of the utterance (T1 to T2) and of the silent section after its end (T3 to T4) to the interactive voice response device 3 together with the voice recognition result. The interactive voice response device 3 inputs the time tags sent from the voice recognition server 5 to the noise determination unit 14. The noise determination unit 14 takes in the password utterance voice data from the utterance acquisition unit 16, cuts out the silent sections (T1 to T2 and T3 to T4) by referring to the time tags, and performs noise determination.
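The time-tag-based noise determination can be sketched as below. The specification does not say how the noise level is measured or in what unit the time tags are expressed, so this sketch assumes time tags in milliseconds and compares the RMS level of the two silent sections with a threshold. The names `rms` and `noise_is_acceptable` and the parameter `max_noise_rms` are hypothetical.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a block of samples (0.0 for an empty block)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def noise_is_acceptable(samples: Sequence[float], sample_rate: int,
                        t1_ms: int, t2_ms: int, t3_ms: int, t4_ms: int,
                        max_noise_rms: float) -> bool:
    """Cut out the silent sections T1..T2 and T3..T4 using the time tags
    and judge the noise level from those sections only."""
    def cut(start_ms: int, end_ms: int) -> list:
        # Integer arithmetic avoids floating-point rounding of sample indices.
        start = start_ms * sample_rate // 1000
        end = end_ms * sample_rate // 1000
        return list(samples[start:end])

    silence = cut(t1_ms, t2_ms) + cut(t3_ms, t4_ms)
    return rms(silence) <= max_noise_rms
```

Because only the sections outside the utterance are measured, a loud but clean password utterance does not count against the noise check.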

Performing noise determination on the password utterance voice data in this way eliminates the time otherwise spent pausing the utterance solely to measure noise, and thus shortens the time required for voiceprint registration.

In the above description, the user call terminal 1 dials the interactive voice response device 3 to establish a line connection, but the form of communication is not limited as long as voice data can be exchanged between the user call terminal 1 and the interactive voice response device 3.

The present invention is applicable to a voice authentication station that receives, via a communication network, utterance voice captured from a user call terminal and performs user authentication in real time.

FIG. 1 is a system configuration diagram of an embodiment of the present invention.
FIG. 2 is a functional block diagram of the interactive voice response device shown in FIG. 1.
FIG. 3 is a functional block diagram of the voiceprint authentication server shown in FIG. 1.
FIG. 4 is a flow diagram of new voiceprint registration in this embodiment.
FIG. 5 is a flow diagram of voiceprint authentication in this embodiment.
FIG. 6 is a diagram showing the registration state of voiceprint data in the user profile in this embodiment.
FIG. 7 is a waveform diagram of utterance voice data including an utterance section and silent sections.

Explanation of symbols

1 User call terminal
2 Telephone network
3 Interactive voice response device
4 Voiceprint authentication server
5 Voice recognition server
6 User database
7 User management server
11 Voice guidance response unit
12 Caller number acquisition unit
13 User information creation unit
14 Noise determination unit
15 Utterance voice level determination unit
16 Utterance acquisition unit
17 Profile creation request unit
18 Authentication request unit
21 User registration confirmation unit
22 User information acquisition unit
23 Keyword dictionary creation unit
25 User profile creation unit
26 User profile database
27 Voiceprint authentication unit
28 Authentication result notification unit
29 User profile reconstruction unit

Claims

A user authentication system comprising: utterance acquisition means for acquiring, via a communication network, a password utterance voice input to a user call terminal; voice recognition means for determining by voice recognition whether the acquired password utterance voice matches a registered password registered in advance; and voiceprint authentication means for determining whether voiceprint data of the password utterance voice matches registered voiceprint data in a user profile of the user.

The user authentication system according to claim 1, wherein the user profile has a plurality of patterns of voiceprint data extracted from password utterance voices having different utterance speeds, and the voiceprint authentication means determines whether the voiceprint data of the password utterance voice acquired at the time of user authentication matches any one of the plurality of patterns of voiceprint data in the user profile.

The user authentication system according to claim 1, further comprising: user information acquisition means for acquiring the registered password of the user from user management means that manages registered passwords, based on identification information of the user call terminal that requested user authentication or identification information of the user; and keyword dictionary creation means for converting the registered password acquired by the user information acquisition means into a keyword dictionary used by the voice recognition means for voice recognition.

The user authentication system according to any one of the above claims, further comprising user profile reconstruction means for additionally registering the voiceprint data of the password utterance voice as voiceprint data in the user profile when the voiceprint authentication means succeeds in voiceprint authentication.

The user authentication system according to any one of claims 1 to 4, further comprising: noise determination means for, when voiceprint data is newly registered, providing a period during which no utterance is input to the user call terminal and determining, based on the signal received during that period, whether the environment on the user side has an acceptable noise level; and user profile creation means for extracting voiceprint data from the password utterance voice data for voiceprint registration and registering it in the user profile after the noise determination means has determined that the noise level is acceptable.

The user authentication system according to claim 1, further comprising: noise determination means for, when voiceprint data is newly registered, determining whether the environment on the user side has an acceptable noise level from the state of at least one predetermined section before or after the utterance period in the password utterance voice data for voiceprint registration; and user profile creation means for extracting voiceprint data from the utterance period of the password utterance voice data for voiceprint registration and registering the extracted voiceprint data in the user profile.

The user authentication system according to claim 6, further comprising sound clipping determination means for determining, from the utterance period of the password utterance voice data for voiceprint registration, whether the input voice exceeds an allowable input level, wherein the voiceprint data is extracted after the sound clipping determination means has determined that the input voice is within the allowable input level.

The user authentication system according to any one of claims 1 to 4, wherein, when voiceprint data is newly registered, the password utterance for voiceprint registration is performed at least three times; the first and second password utterance voices are compared to calculate an identity confidence score, the first and third password utterance voices are compared to calculate an identity confidence score, and the second and third password utterance voices are compared to calculate an identity confidence score; and each password utterance voice is adopted when all of the calculated identity confidence scores exceed a predetermined value.

A user authentication method comprising the steps of: acquiring, via a communication network, a password utterance voice input to a user call terminal; determining by voice recognition whether the acquired password utterance voice matches a registered password registered in advance; and determining whether voiceprint data of the password utterance voice matches registered voiceprint data in a user profile of the user.