JP2010109618A

JP2010109618A - Authentication device, authentication method, and program

Info

Publication number: JP2010109618A
Application number: JP2008278893A
Authority: JP
Inventors: Nobuya Yamaguchi; 伸弥山口
Original assignee: NTT Communications Corp
Current assignee: NTT Communications Corp
Priority date: 2008-10-29
Filing date: 2008-10-29
Publication date: 2010-05-13

Abstract

PROBLEM TO BE SOLVED: To improve the accuracy of identity confirmation by flexibly combining various authentication methods, in an authentication technique for confirming identity by telephone voice.

SOLUTION: An authentication device for confirming the identity of a speaker making a telephone call from a telephone terminal connected to a telephone network includes: a means for receiving the voice of the speaker from the telephone network; a means for performing voiceprint authentication by checking voiceprint information of the received voice against predetermined voiceprint information; a means for performing voice recognition authentication using a first keyword; and an authentication control means for performing voice recognition authentication using a second keyword when the result of the voiceprint authentication differs from the result of the voice recognition authentication using the first keyword. The authentication control means determines the identity of the speaker based on the result of the voice recognition authentication using the second keyword.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to an authentication technique for confirming a person's identity by voice.

Services in which a user telephones a call center or the like to carry out some commercial transaction, such as buying a product through TV shopping, have become widespread. In such services, it is important for the called party to confirm whether the caller is a person legitimately entitled to receive the service.

For this purpose, for example, the caller is authenticated by determining whether the caller's telephone number matches a telephone number registered in advance for a user. There is also a technique that performs authentication by having the user speak a keyword or the like registered in advance.
JP 2000-347683 A

However, with the conventional techniques described above, it is difficult to detect fraud when someone impersonates the legitimate user and places a call from that user's own telephone. Voiceprint authentication could be used to confirm identity, but voiceprint authentication alone may reject a caller who is in fact the legitimate user, so relying solely on voiceprint authentication is problematic.

The present invention has been made in view of the above points. Its object is to provide a technique that improves the accuracy of identity confirmation by flexibly combining various authentication methods, in an authentication technique for confirming identity by telephone voice.

To solve the above problem, the present invention is configured as an authentication device for confirming the identity of a speaker who makes a call using a telephone terminal connected to a telephone network, the authentication device comprising: storage means for storing predetermined voiceprint information, a first keyword, and a second keyword; receiving means for receiving, from the telephone network, the voice of the speaker sent from the telephone terminal; voiceprint authentication means for acquiring voiceprint information from the voice received by the receiving means and performing voiceprint authentication by comparing that voiceprint information with the predetermined voiceprint information stored in the storage means; voice recognition authentication means for transmitting to the telephone terminal a voice message prompting the speaker to utter the first keyword, and determining whether the content of the speaker's voice subsequently received by the receiving means corresponds to the first keyword stored in the storage means; and authentication control means for causing, when the authentication result of the voiceprint authentication means differs from the authentication result of the voice recognition authentication means using the first keyword, the voice recognition authentication means to execute a voice recognition authentication process in which a voice message prompting the speaker to utter the second keyword is transmitted to the telephone terminal and it is determined whether the content of the speaker's voice subsequently received by the receiving means corresponds to the second keyword stored in the storage means, wherein the authentication control means determines the identity of the speaker based on the result of the voice recognition authentication using the second keyword.

In the voice recognition authentication process using the second keyword, the authentication control means may further cause the voiceprint authentication means to execute voiceprint authentication on the speaker's voice uttering the second keyword, and may determine the identity of the speaker based on the result of that voiceprint authentication and the result of the voice recognition authentication process using the second keyword.

The authentication device may further comprise operator authentication determination means that determines the stability of the speaker's conversation based on the duration of silent sections, or the number of silent sections, in the speaker's voice received by the receiving means; extracts a predetermined feature amount from the speaker's voice and uses it to determine the speaker's degree of excitement; and determines, based on the stability, the degree of excitement, or both, whether to shift to operator authentication, in which the speaker's identity is confirmed through an operator.

The storage means may further store the telephone numbers of a plurality of registered users, and the authentication device may further comprise telephone number receiving means for receiving the telephone number of the telephone terminal from the telephone network, and telephone terminal authentication means for performing telephone terminal authentication by determining whether the telephone number received by the telephone number receiving means is stored in the storage means. In this case, the authentication device performs the authentication process using the voiceprint authentication means and the voice recognition authentication means only when the telephone terminal authentication succeeds.

According to the present invention, in an authentication technique for confirming identity by voice, the accuracy of identity confirmation can be improved by flexibly combining various authentication methods. It also becomes possible to determine from the speaker's voice whether operator authentication is necessary and to shift to operator authentication automatically.

Embodiments of the present invention will be described below with reference to the drawings.

(System configuration)
FIG. 1 is a configuration diagram of a system according to an embodiment of the present invention. As shown in FIG. 1, the system according to the present embodiment has a configuration in which a caller terminal 1, a callee terminal 2, an operator terminal 3, and an authentication device 4 are connected via a telephone network 5. The telephone network 5 includes a call control device 6.

The caller terminal 1, the callee terminal 2, and the operator terminal 3 are ordinary telephones used for making calls via the telephone network 5. The callee terminal 2 may instead be an automatic voice response system that automatically handles order taking and similar tasks. The telephone network 5 may be a fixed telephone network, a mobile telephone network, or a VoIP network, and is not limited to any particular type.

The call control device 6 is a device that transmits the caller's number and call voice to the authentication device 4 and performs call connection control and the like based on instructions from the authentication device 4. These call control operations of the call control device 6 can be realized with conventional techniques.

The authentication device 4 is a device that performs the authentication processing according to the present invention. FIG. 2 shows a functional configuration diagram of the authentication device 4.

As shown in FIG. 2, the authentication device 4 according to the present embodiment includes a transmission/reception unit 41, a terminal authentication unit 42, a voice recognition authentication unit 43, a voiceprint authentication unit 44, an operator authentication determination unit 45, an authentication control unit 46, and a user information storage unit 47.

The transmission/reception unit 41 is a functional unit that receives telephone numbers, voice, and the like from the call control device 6, and transmits to the call control device 6 control information for controlling call connections and the like. Instead of receiving voice directly from the call control device 6, the unit may receive it directly from the terminal via the telephone network 5 under call control by the call control device 6.

The terminal authentication unit 42 is a functional unit that receives the telephone number of the caller terminal 1 from the call control device 6 and searches the information stored in the user information storage unit 47 using that number, thereby determining whether the telephone number of the caller terminal 1 is a telephone number registered in advance.

The voice recognition authentication unit 43 is a functional unit that converts the voice received from the call control device 6 into text using voice recognition technology and performs voice recognition authentication by comparing that text with the information stored in the user information storage unit 47.

The voiceprint authentication unit 44 is a functional unit that extracts a voiceprint (the characteristics of the voice) from the voice received from the call control device 6 and performs voiceprint authentication by comparing that voiceprint with the voiceprint stored in the user information storage unit 47.

The operator authentication determination unit 45 is a functional unit that determines whether to shift from voice recognition authentication and voiceprint authentication to operator authentication. The authentication control unit 46 is a functional unit that issues processing instructions to each authentication unit and obtains their results, sends voice messages to the caller terminal 1 to prompt the caller to speak, and determines the caller's identity and whether to continue authentication. The processing shown in the flowchart described later is performed under the control of the authentication control unit 46.

The user information storage unit 47 is a functional unit that stores user information as a table. FIG. 3 shows an example of the table stored in the user information storage unit 47. As shown in FIG. 3, the user information storage unit 47 in the present embodiment stores, in association with one another, a telephone number, personal information such as name and address, keyword 1, keyword 2, and voiceprint information. Keyword 1 is the keyword used for the first voice recognition authentication. Keyword 2 is the keyword used for the next voice recognition authentication when the first round of authentication (voice recognition authentication and the like) does not succeed.
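The table of FIG. 3 can be pictured as a record keyed by telephone number. The following is a minimal sketch only: the class names, field names, and the choice of a list of feature vectors to hold voiceprint information are illustrative assumptions, not details given in this document.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class UserRecord:
    """One row of the user information table (field names are hypothetical)."""
    phone_number: str
    name: str
    address: str
    keyword1: str  # used for the first voice recognition authentication
    keyword2: str  # used for the follow-up voice recognition authentication
    # Assumed representation: each voiceprint is a list of feature values.
    voiceprints: List[List[float]] = field(default_factory=list)


class UserInfoStore:
    """In-memory stand-in for the user information storage unit 47."""

    def __init__(self) -> None:
        self._by_phone: Dict[str, UserRecord] = {}

    def register(self, record: UserRecord) -> None:
        self._by_phone[record.phone_number] = record

    def find(self, phone_number: str) -> Optional[UserRecord]:
        # Returns None when the number is not registered.
        return self._by_phone.get(phone_number)
```

Keying the table by telephone number reflects how the later steps look up keyword 1, keyword 2, and the voiceprint "corresponding to the caller's telephone number".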

These pieces of information are registered in advance, for example in a user registration process. The voiceprint information can be acquired and updated at any time when there is an opportunity to obtain the voice of the speaker concerned.

The authentication device 4 can be realized by installing, on a general-purpose computer having a CPU and storage such as memory, a program that performs the processing corresponding to the functional units described above. The program may be downloaded from a server on a network and installed on the computer, or installed from a recording medium (memory or the like) on which the program is recorded.

(System operation)
Next, the operation of the authentication device 4 will be described with reference to the flowchart shown in FIG. 4.

First, when a call is placed from the caller terminal 1 to the callee terminal 2, the call control device 6 obtains the telephone number of the caller terminal 1 via the telephone network 5 and transmits that number to the authentication device 4.

The terminal authentication unit 42 of the authentication device 4 receives the telephone number of the caller terminal 1 via the transmission/reception unit 41. The terminal authentication unit 42 performs terminal authentication by searching the user information storage unit 47 with the received telephone number and checking whether that number is stored (that is, registered) there (step 1).

If the received telephone number is not stored in the user information storage unit 47 (No in step 2), the connection from the caller terminal 1 to the callee terminal 2 is rejected (step 3). Specifically, for example, the authentication control unit 46 sends the caller terminal 1 a voice message stating that it cannot be connected to the callee terminal 2, and sends the call control device 6 an instruction to abort the call connection process between the caller terminal 1 and the callee terminal 2.
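Steps 1 to 3 amount to a membership check on the caller's number. The sketch below illustrates this under stated assumptions: the function names are hypothetical, and the digit-only normalization is an added convenience that this document does not describe.

```python
from typing import Iterable


def normalize_number(number: str) -> str:
    """Keep ASCII digits only, so '03-1234-5678' and '0312345678' compare equal.

    This normalization is an assumption made for the example.
    """
    return "".join(ch for ch in number if ch.isascii() and ch.isdigit())


def terminal_authentication(caller_number: str, registered_numbers: Iterable[str]) -> bool:
    """Step 1: True when the caller's number is registered.

    A False result corresponds to 'No in step 2', after which the
    connection is rejected (step 3).
    """
    registered = {normalize_number(n) for n in registered_numbers}
    return normalize_number(caller_number) in registered
```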

If the terminal authentication in step 1 determines that the received telephone number is stored in the user information storage unit 47 (Yes in step 2), the authentication control unit 46 sends the caller terminal 1, via the call control device 6, a voice message asking the caller to utter the pre-registered keyword 1 (the message of course does not contain keyword 1 itself), and instructs the voice recognition authentication unit 43 to perform voice recognition. Voice recognition authentication processing is thereby performed (step 4).

In the voice recognition authentication processing (step 4), the voice recognition authentication unit 43 receives, via the call control device 6, the voice uttered by the caller in response to the voice message. The voice recognition authentication unit 43 converts the received voice into text and compares that text with the keyword 1 registered for the caller's telephone number confirmed in step 1, thereby checking whether the caller's utterance corresponds to keyword 1.
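Once the speech recognizer has produced text, the keyword check is a string comparison. The sketch below assumes the recognized text is already available and treats the recognizer as out of scope. The normalization rules (Unicode NFKC folding, case folding, dropping spaces and punctuation) are assumptions for illustration; this document only says that the text is compared with the registered keyword.

```python
import unicodedata


def normalize_text(text: str) -> str:
    """Fold width and case differences and drop spaces and punctuation."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return "".join(ch for ch in folded if ch.isalnum())


def keyword_matches(recognized_text: str, registered_keyword: str) -> bool:
    """True when the recognized utterance corresponds to the registered keyword."""
    return normalize_text(recognized_text) == normalize_text(registered_keyword)
```

The same comparison would apply to keyword 2 in step 9, with only the registered keyword changed.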

If the voice recognition authentication unit 43 determines that the caller's utterance does not correspond to keyword 1 (that is, voice recognition authentication fails), the process moves to step 3 and the connection is rejected.

If the voice recognition authentication unit 43 determines that the caller's utterance corresponds to keyword 1 (that is, voice recognition authentication succeeds), voiceprint authentication processing is performed (step 6).

In the voiceprint authentication processing (step 6), the authentication control unit 46 instructs the voiceprint authentication unit 44 to perform voiceprint authentication. The authentication control unit 46 also prompts the caller to speak by sending the caller terminal 1 voice messages such as questions or a questionnaire related to the transaction. The authentication control unit 46 sends such voice messages only for a predetermined time from the start of the voiceprint authentication processing, which makes it possible to obtain the caller's speech for that predetermined time.

The voiceprint authentication unit 44 successively acquires, via the call control device 6, the voice uttered by the caller in response to the voice messages from the authentication control unit 46, successively extracts voiceprints from predetermined sections of that voice, and performs voiceprint authentication by comparing each extracted voiceprint with the voiceprint registered in the user information storage unit 47 for the caller's telephone number. Because voice is acquired continuously for a certain time, voiceprint extraction and comparison are performed multiple times, which improves the accuracy of voiceprint authentication compared with authentication based on a single utterance. The voiceprint information acquired here may also be stored as user information and used as a comparison target from the next call onward. This allows voiceprints from a variety of utterances to be accumulated, further improving the accuracy of voiceprint authentication.
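This document does not say how a voiceprint is represented or how the repeated comparisons are combined into a single result. Purely as an illustration, the sketch below assumes each voiceprint is a feature vector, compares vectors by cosine similarity, and accepts the caller when a sufficient share of the sections match. The threshold values and all names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def voiceprint_authentication(
    section_prints: Sequence[Sequence[float]],
    enrolled_print: Sequence[float],
    threshold: float = 0.8,        # assumed per-section similarity threshold
    min_match_ratio: float = 0.5,  # assumed share of sections that must match
) -> bool:
    """Compare the voiceprint of every acquired section with the enrolled one."""
    if not section_prints:
        return False
    matches = sum(
        1 for p in section_prints if cosine_similarity(p, enrolled_print) >= threshold
    )
    return matches / len(section_prints) >= min_match_ratio
```

Using several sections means one noisy section does not decide the outcome on its own, which is the benefit the paragraph above attributes to repeated extraction and comparison.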

If, as a result of the voiceprint authentication in step 6, the voiceprint authentication unit 44 determines that the voiceprint of the caller's voice matches the voiceprint registered in the user information storage unit 47 for the caller's telephone number (that is, voiceprint authentication succeeds; Yes in step 7), the authentication control unit 46 determines that authentication has succeeded (step 8) and notifies the call control device 6 that the call connection between the caller terminal 1 and the callee terminal 2 is permitted. On receiving this notification, the call control device 6 connects the call between the caller terminal 1 and the callee terminal 2, and the call between the two terminals takes place.

If, as a result of the voiceprint authentication in step 6, the voiceprint authentication unit 44 determines that the voiceprint of the caller's voice does not match the voiceprint registered in the user information storage unit 47 for the caller's telephone number (that is, voiceprint authentication fails), the process could move to step 3 and reject the connection. In the present embodiment, however, the process moves to step 9 and authentication continues.

The reason is that voiceprint authentication has a relatively high chance of failing even for the legitimate caller, depending on the condition of the voice and similar factors. At this stage, on the other hand, both terminal authentication and keyword authentication have succeeded, so the caller is likely to be the legitimate user. Rejecting the connection as soon as the voiceprint authentication in step 6 fails could therefore reject a legitimate caller. To avoid such cases as far as possible, the present embodiment continues authentication when voiceprint authentication is found to have failed in step 7. Of course, from the viewpoint of excluding impersonation as far as possible, it is also possible to reject the connection when the voiceprint authentication in step 6 fails.

In the processing example above, the process moves to step 9 when voice recognition authentication (keyword authentication) succeeds and voiceprint authentication fails. Alternatively, when voice recognition authentication fails, the process may proceed to voiceprint authentication without rejecting the connection; the connection is then rejected only if voiceprint authentication also fails, and the process moves to step 9 if voiceprint authentication succeeds.

Next, the processing of step 9 will be described.

In step 9, voice recognition authentication is performed using keyword 2, which is different from the keyword 1 used in step 4, and voiceprint authentication is also performed on the utterance of keyword 2.

Here, the authentication control unit 46 first sends the caller terminal 1, via the call control device 6, a voice message asking the caller to utter keyword 2 (the message of course does not contain keyword 2 itself), and instructs the voice recognition authentication unit 43 and the voiceprint authentication unit 44 to perform their respective authentication processes.

The voice recognition authentication unit 43 receives the voice uttered by the caller in response to the voice message requesting keyword 2, converts it into text, and performs voice recognition authentication by comparing that text with the keyword 2 registered in the user information storage unit 47 for the caller's telephone number and checking whether the two match.

The voiceprint authentication unit 44 likewise receives the voice uttered by the caller, extracts a voiceprint from it, and performs voiceprint authentication by comparing the extracted voiceprint with the voiceprint registered in the user information storage unit 47 for the caller's telephone number and checking whether the two match.

Of course, in this voiceprint authentication as well, the authentication control unit 46 may send messages that cause the caller to speak for a certain time, and voiceprint authentication may be performed on the voice uttered in response.

If both the voice recognition authentication and the voiceprint authentication in step 9 fail ("both failed" in step 10), the process moves to step 3 and the authentication control unit 46 performs the processing for rejecting the connection. If both succeed ("both successful" in step 10), the authentication control unit 46 determines that authentication has succeeded, moves to step 8, and performs the processing for connecting the call.

In the present embodiment, if exactly one of the voice recognition authentication and the voiceprint authentication fails ("one failed" in step 10), operator authentication is performed (step 11). In this case, the authentication control unit 46 sends the call control device 6 an instruction to connect the caller terminal 1 and the operator terminal 3, and the call control device 6, on receiving the instruction, performs the control needed to connect them. Once the caller terminal 1 and the operator terminal 3 are connected, the operator at the operator terminal 3 confirms the caller's identity by talking with the caller. During this conversation, the authentication device 4 may acquire the voice exchanged between the caller terminal 1 and the operator terminal 3 and perform further voiceprint authentication or record the voice. If the operator judges that the caller's identity is confirmed, the process moves to step 8 and the call between the caller terminal 1 and the callee terminal 2 takes place.
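The branching of FIG. 4 described in steps 1 to 11 can be summarized as a small decision function. This is a sketch of the basic flow only: it takes the individual authentication results as ready-made booleans, omits the variations mentioned in the text, and uses hypothetical names throughout.

```python
from enum import Enum


class Outcome(Enum):
    CONNECT = "connect"    # step 8: authentication succeeded
    REJECT = "reject"      # step 3: connection rejected
    OPERATOR = "operator"  # step 11: hand over to operator authentication


def decide(
    terminal_ok: bool,
    keyword1_ok: bool,
    voiceprint_ok: bool,
    keyword2_ok: bool = False,
    voiceprint2_ok: bool = False,
) -> Outcome:
    """Return the outcome of the basic flow for the given results."""
    if not terminal_ok:      # No in step 2
        return Outcome.REJECT
    if not keyword1_ok:      # keyword 1 authentication failed (step 4)
        return Outcome.REJECT
    if voiceprint_ok:        # Yes in step 7
        return Outcome.CONNECT
    # Step 9: keyword 2 authentication plus voiceprint authentication on that utterance.
    passed = int(keyword2_ok) + int(voiceprint2_ok)
    if passed == 2:          # "both successful" in step 10
        return Outcome.CONNECT
    if passed == 0:          # "both failed" in step 10
        return Outcome.REJECT
    return Outcome.OPERATOR  # "one failed" in step 10
```

An `OPERATOR` outcome is not final: the call is connected (step 8) only if the operator then confirms the caller's identity.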

(Processing example for determining the shift to operator authentication)
In the processing example above, the process shifts to operator authentication when exactly one of the voice recognition authentication and the voiceprint authentication fails in step 9. Alternatively, the process may shift to operator authentication at the stage of the first voiceprint authentication (step 6) when a predetermined condition is satisfied.

In this case, only the processing of step 6 and the determination of step 7 in the processing flow of FIG. 4 differ. The processing of steps 6 and 7 in this example is described below.

In step 6, the voiceprint authentication unit 44 performs voiceprint authentication processing in the same way as in FIG. 4. In this example, in parallel with the voiceprint authentication processing, the operator authentication determination unit 45 successively acquires the caller's voice and performs the following processing.

While the caller speaks during the predetermined time, the operator authentication determination unit 45 measures the duration of silent sections in the caller's voice. In the case of FIG. 5, for example, the duration of the silent section is 8 seconds. A long silent section within a given time indicates that the caller is not conversing steadily, so the caller's stability can be checked by examining the duration of silent sections. For example, a table associating silent-section durations with stability values is held in advance in the storage of the authentication device 4, and the operator authentication determination unit 45 obtains from the table the stability value corresponding to the measured duration. The stability may also be determined from the number of silent sections instead of their duration.
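The silence measurement and table lookup described above might look like the following sketch. Several points are assumptions made for the example: silence is detected as frame energy below a threshold, the table boundaries and scores are invented, and the score is taken to grow with silence length, which is one reading consistent with the step 7 rule that a larger total triggers operator authentication.

```python
from typing import List, Sequence, Tuple


def longest_silence_seconds(
    frame_energies: Sequence[float], frame_seconds: float, silence_threshold: float
) -> float:
    """Duration of the longest run of consecutive frames below the threshold."""
    longest = current = 0
    for energy in frame_energies:
        if energy < silence_threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest * frame_seconds


def count_silent_sections(frame_energies: Sequence[float], silence_threshold: float) -> int:
    """Number of separate silent sections (the alternative measure in the text)."""
    count = 0
    in_silence = False
    for energy in frame_energies:
        silent = energy < silence_threshold
        if silent and not in_silence:
            count += 1
        in_silence = silent
    return count


# Hypothetical table: (exclusive upper bound in seconds, stability value).
STABILITY_TABLE: List[Tuple[float, int]] = [(2.0, 0), (5.0, 1), (10.0, 2)]


def stability_value(
    silence_seconds: float,
    table: Sequence[Tuple[float, int]] = STABILITY_TABLE,
    overflow_value: int = 3,
) -> int:
    """Look up the stability value for a measured silence duration."""
    for upper_bound, value in table:
        if silence_seconds < upper_bound:
            return value
    return overflow_value
```

With this invented table, the 8-second silent section of the FIG. 5 example falls in the third band.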

Further, the operator authentication determination unit 45 extracts feature amounts indicating the caller's degree of excitement from the voice sequentially acquired during the predetermined time, and determines the caller's degree of excitement. The feature amounts indicating the degree of excitement include the frequency, rhythm, and intensity of the caller's voice. For example, a table that associates the values of these feature amounts with a numerical degree of excitement is stored in advance in the storage device of the authentication device 4, and the operator authentication determination unit 45 obtains from this table the degree of excitement corresponding to the actually measured feature amounts. Note that the technique of estimating a speaker's degree of excitement from the frequency, rhythm, intensity, and the like of the voice is itself an existing technique.

In step 7, the operator authentication determination unit 45 checks whether the voiceprint authentication result from the voiceprint authentication unit 44 is OK. If it is OK, authentication is regarded as successful and the process proceeds to step 8. If the voiceprint authentication result from the voiceprint authentication unit 44 is NG, the operator authentication determination unit 45 compares the sum of the stability and the degree of excitement obtained in step 6 with a predetermined threshold. If the sum is equal to or greater than the threshold, the process shifts to operator authentication (to step 11). Specifically, in this case, the operator authentication determination unit 45 notifies the authentication control unit 46 to shift to operator authentication, and the authentication control unit 46 instructs the call control device 6 to connect the caller terminal 1 and the operator terminal 3. If the sum of the stability and the degree of excitement is less than the threshold, the process proceeds to step 9 described above.
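The branching in step 7 can be summarized in a short sketch. The function and enum names are hypothetical and chosen only for illustration; the step numbers are those of the flow of FIG. 4, and the notification to the authentication control unit 46 and the connection instruction to the call control device 6 are left out.

```python
from enum import Enum


class NextStep(Enum):
    """Where the flow goes after the step 7 determination."""
    AUTH_SUCCESS = 8    # voiceprint authentication OK -> step 8
    CONTINUE_FLOW = 9   # voiceprint NG, sum below threshold -> step 9
    OPERATOR_AUTH = 11  # voiceprint NG, sum at or above threshold -> step 11


def decide_step7(voiceprint_ok, stability, excitement, threshold):
    """Return the next step according to the step 7 determination."""
    if voiceprint_ok:
        return NextStep.AUTH_SUCCESS
    if stability + excitement >= threshold:
        return NextStep.OPERATOR_AUTH
    return NextStep.CONTINUE_FLOW
```

For instance, with a threshold of 5, `decide_step7(False, 3, 2, 5)` selects operator authentication, while `decide_step7(False, 1, 2, 5)` continues to step 9. The scores are only consulted when voiceprint authentication fails, as in the description.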

In the above example, the shift to operator authentication is determined based on the sum of the stability and the degree of excitement. Alternatively, a separate threshold may be set for each of the stability and the degree of excitement, and the process may shift to operator authentication when either one or both are equal to or greater than their respective thresholds.
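The per-value threshold variant could look like the sketch below. The function name and the `require_both` flag are assumptions introduced here to express the two policies the text allows (either value reaching its threshold, or both); the description does not prescribe how the policy is selected.

```python
def should_shift_to_operator(stability, excitement,
                             stability_threshold, excitement_threshold,
                             require_both=False):
    """Decide on operator authentication using one threshold per value.

    require_both=False: shift when either value reaches its threshold.
    require_both=True:  shift only when both values reach their thresholds.
    """
    stability_hit = stability >= stability_threshold
    excitement_hit = excitement >= excitement_threshold
    if require_both:
        return stability_hit and excitement_hit
    return stability_hit or excitement_hit
```

Compared with the summed score, separate thresholds prevent a very high value on one axis from being masked or exaggerated by the other, at the cost of one more parameter to tune.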

The present invention is not limited to the above embodiment, and various modifications and applications are possible within the scope of the claims.

FIG. 1 is a configuration diagram of a system according to an embodiment of the present invention.
FIG. 2 is a functional configuration diagram of the authentication device 4.
FIG. 3 is a diagram showing an example of a table stored in the user information storage unit 47.
FIG. 4 is a flowchart for explaining the operation of the authentication device 4.
FIG. 5 is a diagram for explaining the duration of a silent section.

Explanation of symbols

1 Caller terminal
2 Callee terminal
3 Operator terminal
4 Authentication device
5 Telephone network
6 Call control device
41 Transmission/reception unit
42 Terminal authentication unit
43 Voice recognition authentication unit
44 Voiceprint authentication unit
45 Operator authentication determination unit
46 Authentication control unit
47 User information storage unit

Claims

An authentication device for confirming the identity of a speaker who makes a call using a telephone terminal connected to a telephone network, the authentication device comprising:
storage means for storing predetermined voiceprint information, a first keyword, and a second keyword;
receiving means for receiving, from the telephone network, the voice of the speaker transmitted from the telephone terminal;
voiceprint authentication means for acquiring voiceprint information from the voice received by the receiving means and performing voiceprint authentication by comparing that voiceprint information with the predetermined voiceprint information stored in the storage means;
voice recognition authentication means for transmitting, to the telephone terminal, a voice message prompting the speaker to utter the first keyword, and then determining whether the content of the speaker's voice received by the receiving means corresponds to the first keyword stored in the storage means; and
authentication control means for causing, when the authentication result by the voiceprint authentication means differs from the authentication result by the voice recognition authentication means using the first keyword, the voice recognition authentication means to execute voice recognition authentication processing in which a voice message prompting the speaker to utter the second keyword is transmitted to the telephone terminal and it is then determined whether the content of the speaker's voice received by the receiving means corresponds to the second keyword stored in the storage means,
wherein the authentication control means determines the identity of the speaker based on the result of the voice recognition authentication using the second keyword.

The authentication device according to claim 1, wherein the authentication control means causes the voiceprint authentication means to additionally execute voiceprint authentication on the speaker's voice uttering the second keyword in the voice recognition authentication processing using the second keyword, and determines the identity of the speaker based on the result of that voiceprint authentication and the result of the voice recognition authentication processing using the second keyword.

The authentication device according to claim 1, further comprising operator authentication determination means for determining the conversational stability of the speaker based on the duration of a silent section or the number of silent sections in the voice of the speaker received by the receiving means, extracting a predetermined feature amount from the voice of the speaker and determining the speaker's degree of excitement using the feature amount, and determining, based on one or both of the stability and the degree of excitement, whether to shift to confirmation of the speaker's identity by an operator.

4. The authentication device according to any one of the preceding claims, wherein the storage means further stores the telephone numbers of a plurality of registered users, the authentication device further comprising:
telephone number receiving means for receiving, from the telephone network, the telephone number of the telephone terminal; and
telephone terminal authentication means for performing telephone terminal authentication by determining whether the telephone number of the telephone terminal received by the telephone number receiving means is stored in the storage means,
wherein the authentication device performs the authentication processing using the voiceprint authentication means and the voice recognition authentication means only when the telephone terminal authentication by the telephone terminal authentication means is successful.

An authentication method executed by an authentication device for confirming the identity of a speaker who makes a call using a telephone terminal connected to a telephone network, the authentication device comprising storage means for storing predetermined voiceprint information, a first keyword, and a second keyword, the authentication method comprising:
a receiving step of receiving, from the telephone network, the voice of the speaker transmitted from the telephone terminal;
a voiceprint authentication step of acquiring voiceprint information from the voice received by the receiving means and performing voiceprint authentication by comparing that voiceprint information with the predetermined voiceprint information stored in the storage means;
a first voice recognition authentication step in which the voice recognition authentication means transmits, to the telephone terminal, a voice message prompting the speaker to utter the first keyword, and then determines whether the content of the speaker's voice received by the receiving means corresponds to the first keyword stored in the storage means;
a second voice recognition authentication step in which, when the authentication result in the voiceprint authentication step differs from the authentication result in the first voice recognition authentication step using the first keyword, the voice recognition authentication means transmits, to the telephone terminal, a voice message prompting the speaker to utter the second keyword, and then determines whether the content of the speaker's voice received by the receiving means corresponds to the second keyword stored in the storage means; and
a step of determining the identity of the speaker based on the voice recognition authentication result in the second voice recognition authentication step.

A program for causing a computer, which has storage means for storing predetermined voiceprint information, a first keyword, and a second keyword, to function as an authentication device for confirming the identity of a speaker who makes a call using a telephone terminal connected to a telephone network, the program causing the computer to function as:
receiving means for receiving, from the telephone network, the voice of the speaker transmitted from the telephone terminal;
voiceprint authentication means for acquiring voiceprint information from the voice received by the receiving means and performing voiceprint authentication by comparing that voiceprint information with the predetermined voiceprint information stored in the storage means;
voice recognition authentication means for transmitting, to the telephone terminal, a voice message prompting the speaker to utter the first keyword, and then determining whether the content of the speaker's voice received by the receiving means corresponds to the first keyword stored in the storage means; and
authentication control means for causing, when the authentication result by the voiceprint authentication means differs from the authentication result by the voice recognition authentication means using the first keyword, the voice recognition authentication means to execute voice recognition authentication processing in which a voice message prompting the speaker to utter the second keyword is transmitted to the telephone terminal and it is then determined whether the content of the speaker's voice received by the receiving means corresponds to the second keyword stored in the storage means, and for determining the identity of the speaker based on the result of the voice recognition authentication using the second keyword.