JP5436951B2

JP5436951B2 - User authentication device and user authentication method

Info

Publication number: JP5436951B2
Application number: JP2009151496A
Authority: JP
Inventors: 克佳長嶋
Original assignee: 株式会社クローバー・ネットワーク・コム
Priority date: 2009-06-25
Filing date: 2009-06-25
Publication date: 2014-03-05
Anticipated expiration: 2029-06-25
Also published as: JP2011008544A

Description

The present invention relates to a personal authentication device and a personal authentication method.

Conventionally, authentication devices that use passwords, voice, images, biometrics, and the like have been proposed.

An electronic device described in Patent Document 1 has been proposed as a device that uses image information for authentication. This electronic device extracts a human face from an image captured by an imaging unit and authenticates the user. An image forming apparatus described in Patent Document 2 has been proposed as a device that uses voice information for authentication. This image forming apparatus extracts a voiceprint from the user's voice and compares the extracted voiceprint with voiceprint information stored in a storage unit to authenticate the user.

[Patent Document 1] Japanese Unexamined Patent Application Publication No. 2009-81657
[Patent Document 2] Japanese Unexamined Patent Application Publication No. 2009-94671

However, because prior art such as that described in Patent Document 2 performs voice authentication by pattern-based collation analysis of analog data, the accuracy of the personal authentication result is low and the number of system processing steps is large.

The present invention has been made in view of the above problems, and its object is to provide a personal authentication device and a personal authentication method that can increase the accuracy of the personal authentication result with a small number of system processing steps.

To solve the above problem, the personal authentication device of the present invention comprises: a storage device that receives a user's first voice data for voice authentication and the user's first voice data for voice recognition, both input via a terminal and a communication line, and stores in advance a table containing first voice data and second voice data, where the first voice data is digital voice data for voice authentication (obtained by digitally converting the user's first voice data for voice authentication) to which voiceprint data created from that digital voice data has been added, and the second voice data is digital voice data for voice recognition obtained by digitally converting the user's first voice data for voice recognition; voice authentication means that compares first digital voice data, obtained by digitally converting second voice data for voice authentication input via the terminal and the communication line in response to the user's authentication request, with the corresponding first voice data in the table, authenticates the first digital voice data, and determines whether the user of the terminal is the genuine user; and voice recognition means that compares second digital voice data, obtained by digitally converting second voice data for voice recognition or the second voice data for voice authentication input via the terminal and the communication line in response to the user's authentication request, with the corresponding second voice data in the table, performs voice recognition on the second digital voice data, and checks whether the content of the user's utterance is correct. The device authenticates the person by combining the authentication of the first digital voice data by the voice authentication means with the recognition of the second digital voice data by the voice recognition means.

According to the present invention, voice authentication is performed using voice data after digital conversion, so the accuracy of the personal authentication result can be increased. Because digital data is used, the number of system processing steps can also be reduced. Furthermore, performing personal authentication with a combination of voice authentication technology and voice recognition technology allows more accurate identity verification.

The voice authentication device of the present invention also comprises: a storage device that receives a user's first voice data for voice authentication and the user's first voice data for voice recognition, both input via a terminal and a communication line, and stores in advance a table containing first voice data and second voice data, where the first voice data is digital voice data for voice authentication (obtained by digitally converting the user's first voice data for voice authentication) to which voiceprint data created from that digital voice data has been added, and the second voice data is digital voice data for voice recognition obtained by digitally converting the user's first voice data for voice recognition; voice authentication means that compares first digital voice data, obtained by digitally converting second voice data for voice authentication input via the terminal and the communication line in response to the user's authentication request, with the corresponding first voice data in the table, authenticates the first digital voice data, and determines whether the user of the terminal is the genuine user; voice recognition means that compares second digital voice data, obtained by digitally converting second voice data for voice recognition or the second voice data for voice authentication input via the terminal and the communication line in response to the user's authentication request, with the corresponding second voice data in the table, performs voice recognition on the second digital voice data, and checks whether the content of the user's utterance is correct; and notification means that notifies the personal authentication result for the voice data input from the terminal in accordance with the determination results of the voice authentication means and the voice recognition means. The device authenticates the person by combining the authentication of the first digital voice data by the voice authentication means with the recognition of the second digital voice data by the voice recognition means.

According to the present invention, performing personal authentication with a combination of voice authentication technology and voice recognition technology allows more accurate identity verification. In addition, the device includes notification means that notifies the personal authentication result for the voice data input from the terminal in accordance with the determination results of the voice authentication means and the voice recognition means.

The personal authentication method of the present invention includes: a step of receiving a user's first voice data for voice authentication and the user's first voice data for voice recognition, both input via a terminal and a communication line, and storing in advance a table containing first voice data and second voice data, where the first voice data is digital voice data for voice authentication (obtained by digitally converting the user's first voice data for voice authentication) to which voiceprint data created from that digital voice data has been added, and the second voice data is digital voice data for voice recognition obtained by digitally converting the user's first voice data for voice recognition; a voice authentication step of comparing first digital voice data, obtained by digitally converting second voice data for voice authentication input via the terminal and the communication line in response to the user's authentication request, with the corresponding first voice data in the table, authenticating the first digital voice data, and determining whether the user of the terminal is the genuine user; and a voice recognition step of comparing second digital voice data, obtained by digitally converting second voice data for voice recognition or the second voice data for voice authentication input via the terminal and the communication line in response to the user's authentication request, with the corresponding second voice data in the table, performing voice recognition on the second digital voice data, and checking whether the content of the user's utterance is correct. The method authenticates the person by combining the authentication of the first digital voice data in the voice authentication step with the recognition of the second digital voice data in the voice recognition step.

The personal authentication method of the present invention also includes: a storing step of receiving a user's first voice data for voice authentication and the user's first voice data for voice recognition, both input via a terminal and a communication line, and storing in advance a table containing first voice data and second voice data, where the first voice data is digital voice data for voice authentication (obtained by digitally converting the user's first voice data for voice authentication) to which voiceprint data created from that digital voice data has been added, and the second voice data is digital voice data for voice recognition obtained by digitally converting the user's first voice data for voice recognition; a voice authentication step of comparing first digital voice data, obtained by digitally converting second voice data for voice authentication input via the terminal and the communication line in response to the user's authentication request, with the corresponding first voice data in the table, authenticating the first digital voice data, and determining whether the user of the terminal is the genuine user; a voice recognition step of comparing second digital voice data, obtained by digitally converting second voice data for voice recognition or the second voice data for voice authentication input via the terminal and the communication line in response to the user's authentication request, with the corresponding second voice data in the table, performing voice recognition on the second digital voice data, and checking whether the content of the user's utterance is correct; and a notification step of notifying the personal authentication result for the voice data input from the terminal in accordance with the determination results of the voice authentication step and the voice recognition step. The method authenticates the person by combining the authentication of the first digital voice data in the voice authentication step with the recognition of the second digital voice data in the voice recognition step.

According to the present invention, it is possible to provide a personal authentication device and a personal authentication method that can increase the accuracy of the personal authentication result with a small number of system processing steps.

FIG. 1 is a configuration diagram of a system according to an embodiment of the present invention.
FIG. 2 is a processing flowchart at the time of registration in the embodiment of the present invention.
FIG. 3 is a processing flowchart at the time of personal authentication in the embodiment of the present invention.

Preferred embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 is a configuration diagram of a system according to an embodiment of the present invention. As shown in FIG. 1, the personal authentication system 100 includes a personal authentication device 1, a prefectural center 10, civic centers 20 to 40, ..., terminals 50 and 60, and the like. The civic centers 20 to 40, ... are provided in the layer below the prefectural center 10. The civic centers 20 to 40 are provided with branch offices 20A to 20C, ..., branch offices 30A to 30C, ..., and branch offices 40A to 40C, ..., respectively.

A user can register voice data for personal authentication in the personal authentication device 1 via the terminal 50, and can request personal authentication from the personal authentication device 1 via the terminal 60. The terminals 50 and 60 may be mobile phones or IP phones running on personal computers. Accordingly, a telephone line or an Internet line can be used as the communication line.

The personal authentication device 1 includes a control server 2, a voice authentication server 3, a voice recognition server 4, and a system linkage server 5, which are connected via a network 6. In this embodiment the personal authentication device 1 is realized by a combination of several servers, but its functions can also be realized by a single server. The control server 2, the voice authentication server 3, the voice recognition server 4, and the system linkage server 5 are each composed of, for example, a CPU (Central Processing Unit), a RAM (Random Access Memory), a ROM (Read Only Memory), and a storage device such as a hard disk drive.

The personal authentication device 1 performs personal authentication by combining digital voice authentication with digital voice recognition. Combining voice authentication technology with voice recognition technology allows more accurate identity verification. An advantage of voice authentication is that identity can be verified anytime and anywhere without special equipment.

The control server 2 controls the personal authentication device 1 as a whole. In accordance with the results from the voice authentication server 3 and the voice recognition server 4, the control server 2 notifies the terminal 60 of the personal authentication result for the voice data input from the terminal 60.

The voice authentication server 3 performs voice authentication using digital voice data, obtained by digitally converting voice data for voice authentication input via a terminal and a communication line, together with voiceprint data created from that digital voice data. As the phrase for voice authentication, the voice authentication server 3 registers fixed data such as personal information, for example a name, date of birth, telephone number, or personal number. The phrase can be, for example, any phrase of up to 10 characters. The user writes the meaning of the phrase (for example, a pet's name or a telephone number) on the application form, and the phrase is registered in the personal authentication device 1 from the terminal 50.

Specifically, the voice authentication server 3 digitally converts the user's voice data for voice authentication input from the terminal 50 via the communication line, and stores the resulting discretized digital voice data, in association with an identifier, in a table in the storage device of the voice authentication server 3.

The elements of a voiceprint that characterize an individual include the length and shape of the vocal tract, the length of the vocal cords, gender, speaking rate, pronunciation, and so on. After conversion to digital voice data, the voice authentication server 3 therefore extracts information such as the length and shape of the vocal tract, the length of the vocal cords, gender, speaking rate, and pronunciation from the uttered voice, scores it to create voiceprint data, and stores the created voiceprint data in a table in the storage device of the voice authentication server 3 in association with the digital voice data. The voice authentication server 3 passes the data through a classification filter so that data is stored separately for each sound element.

The voice authentication server 3 composes the digital data for voice authentication by adding voiceprint data, which represents the "voiceprint" that distinguishes the quality of an individual voice, to digital voice data that represents the three elements of sound: "sound quality", "pitch", and "volume". The digital data for voice authentication therefore contains both digital voice data and voiceprint data, which are stored in the table in association with each other. This digital data for voice authentication is the first voice data.
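The relationship between the first voice data (digital voice data plus voiceprint data) and the second voice data can be sketched as a small record type. This is only an illustration under stated assumptions: the class names, field names, and the in-memory dictionary standing in for the table in the storage device are all hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FirstVoiceData:
    """Voice-authentication record: digitized speech plus its voiceprint scores."""
    samples: List[int]            # digitally converted utterance (assumed PCM samples)
    voiceprint: Dict[str, float]  # scored features, e.g. speaking rate (assumed keys)


@dataclass
class EnrollmentTable:
    """Hypothetical in-memory stand-in for the table, keyed by a user identifier."""
    first: Dict[str, FirstVoiceData] = field(default_factory=dict)
    second: Dict[str, List[int]] = field(default_factory=dict)

    def register(self, user_id: str, auth_samples: List[int],
                 voiceprint: Dict[str, float], recog_samples: List[int]) -> None:
        # The voiceprint is stored in association with the samples it came from.
        self.first[user_id] = FirstVoiceData(list(auth_samples), dict(voiceprint))
        # Second voice data: digitized speech used for voice recognition.
        self.second[user_id] = list(recog_samples)
```

For example, `EnrollmentTable().register("u1", [1, 2], {"rate": 0.5}, [3])` would store one user's first and second voice data under the same identifier.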

The voice authentication server 3 digitally converts the user's authentication voice data input from the terminal 60 via the communication line, creates voiceprint data from the resulting digital data, and compares both with the corresponding digital data for voice authentication (first voice data) in the table in the storage device to authenticate the user's authentication voice data. Based on the digital voice data representing sound quality, pitch, and volume and on the voiceprint data, the voice authentication server 3 determines whether the user of the terminal is the genuine user.

Specifically, in the determination process the voice authentication server 3 uses correlation processing. It lays out the original data and the requested input data two-dimensionally in memory space and generates two-dimensional bit images in which, for example, the horizontal direction X is the time axis and the vertical direction Y is the magnitude (m). The voice authentication server 3 then cuts the two-dimensional bit images horizontally, extracts the magnitude edges, and has the CPU execute the determination process (that is, the correlation process) by comparing the positions of the extracted edges, thereby detecting whether the two match. Because the digitally converted voice data is stored discretely in the RAM, identity can be determined by identifying the positions of the edges in the horizontal slices. As a result, even if the time axis is thinned out, the identity of the voice data can be determined in a short time and with a reduced CPU load.
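The edge-position comparison described above can be illustrated with a short sketch. It is one possible reading, not the patented implementation: the function names, the integer magnitude levels, and the `tolerance` parameter are assumptions introduced for illustration.

```python
from typing import List, Sequence


def bit_image(samples: Sequence[int], height: int) -> List[List[int]]:
    """Rows are magnitude levels (Y), columns are time steps (X).
    A cell is 1 when the sample magnitude at that time exceeds the level."""
    return [[1 if abs(s) > level else 0 for s in samples] for level in range(height)]


def row_edges(row: Sequence[int]) -> List[int]:
    """Time indices where a horizontal slice switches between 0 and 1."""
    return [t for t in range(1, len(row)) if row[t] != row[t - 1]]


def edges_match(original: Sequence[int], candidate: Sequence[int], height: int,
                levels: Sequence[int], tolerance: int = 0) -> bool:
    """Compare edge positions of horizontal slices taken at the given levels."""
    img_a = bit_image(original, height)
    img_b = bit_image(candidate, height)
    for level in levels:
        edges_a = row_edges(img_a[level])
        edges_b = row_edges(img_b[level])
        if len(edges_a) != len(edges_b):
            return False
        if any(abs(a - b) > tolerance for a, b in zip(edges_a, edges_b)):
            return False
    return True
```

Only the edge positions of a few slices are compared rather than every sample, which is consistent with the reduced CPU load claimed in the text. How many slices to take and how much positional tolerance to allow are tuning choices the specification does not state.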

Conventional voice authentication methods generally use pattern-based collation analysis. In this embodiment, by contrast, the voice authentication server 3 performs voice authentication by comparing and collating digital data. Compared with conventional analog voice authentication based on pattern recognition, digital voice authentication can improve authentication accuracy and reduce the number of system processing steps.

The voice recognition server 4 uses voice recognition to check whether the content of the user's utterance is correct. The control server 2 asks several questions through an automatic voice response unit, and the voice recognition server 4 authenticates by making an overall judgment that includes whether the answers to the questions were correct and whether the content contained errors or inconsistencies.

Specifically, the voice recognition server 4 digitally converts the voice data for voice recognition input from the terminal 50 via the communication line, and stores the converted digital voice data (second voice data) in a table in the storage device of the voice recognition server 4. The voice recognition server 4 then digitally converts the user's authentication voice data input from the terminal 60 via the communication line, reads out the corresponding second voice data from the table in its storage device, compares the two, and performs voice recognition on the user's authentication voice data. A hidden Markov model or a statistical language model can be used for this voice recognition.

Combining voice authentication with voice recognition makes it possible to handle arbitrary phrases. The phrase in the voice data for voice recognition can be changed from the terminal 50 after voice authentication has been performed. Because authentication is performed by speaking aloud, the phrase may be overheard by others. Allowing the voice data for voice recognition to be changed at any time therefore prevents unauthorized use. This ability to change the phrase is an advantage over other biometric methods. Other biometric methods, such as iris or vein recognition, rely on fixed data only, whereas this method can use any combination of fixed and variable data.
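The rule that the recognition phrase may be changed only after voice authentication has succeeded can be sketched as a guarded update. The function name, the dictionary standing in for the recognition table, and the use of `PermissionError` are hypothetical choices made for this illustration.

```python
from typing import Dict, List


def change_recognition_phrase(second_voice_data: Dict[str, List[int]], user_id: str,
                              new_samples: List[int], voice_authenticated: bool) -> None:
    """Replace a user's variable (recognition) phrase, but only after the
    fixed-phrase voice authentication has already succeeded."""
    if not voice_authenticated:
        raise PermissionError("voice authentication is required before changing the phrase")
    second_voice_data[user_id] = list(new_samples)
```

The fixed data used for voice authentication is left untouched here; only the variable phrase is replaced, which mirrors the fixed-plus-variable combination described in the text.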

The system linkage server 5 coordinates the control server 2, the voice authentication server 3, and the voice recognition server 4 as a whole. The personal authentication device 1 may combine biometric authentication with existing authentication means such as a membership number, account number, PIN, or password. This improves security.

Next, processing at the time of registration will be described. FIG. 2 is a processing flowchart at the time of registration in the embodiment of the present invention. To use the voice authentication process, the user, after the application has been accepted, calls the telephone number for personal registration from the terminal 50 (step S11). Following instructions from the control server 2, the user enters from the terminal 50 the ID and password issued at the time of the registration application (step S12). The control server 2 collates the entered ID and password against the data from the application (step S13).

If the ID and password are verified successfully (Y in step S13), the control server 2 registers the voice data for voice authentication (fixed data) based on the voice input from the terminal 50 (step S14). Following instructions from the control server 2, the voice authentication server 3 acquires the voice data for voice authentication multiple times. In response to voice prompts from the control server 2, the user speaks each phrase from the terminal 50, for example three times, and the control server 2 saves the data.

The voice authentication server 3 digitally converts the voice authentication data input from the terminal 50, separates the discretized digital voice data into sound elements, and stores them by digital recording (step S14).

As authentication factors, the voice authentication server 3 first analyzes the converted digital voice data and extracts PARCOR coefficients (partial autocorrelation coefficients), which represent the characteristics of the voice. The voice authentication server 3 then derives sound source information such as pitch period, amplitude, and voiced/unvoiced decisions from the converted digital voice data, and stores it as voiceprint data in a table in the storage device of the voice authentication server 3. The digital voice data and the voiceprint data together form the digital data for voice authentication. If voice information were stored as analog data, it could be used for voice recognition, but confirming the original voice, that is, voice authentication, would become complicated. The voice authentication server 3 therefore stores the voice itself as digital data, so that data can be re-derived if the authentication method is changed and so that the actual voice can be listened to when necessary.
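As an illustration of the PARCOR analysis mentioned above, the following sketch computes reflection coefficients from a frame of samples with the standard Levinson-Durbin recursion. The specification does not say which algorithm the server uses, so this is a swapped-in textbook method; the function name and the sign convention (some texts define PARCOR coefficients as the negative of these values) are assumptions.

```python
from typing import List, Sequence


def parcor_coefficients(samples: Sequence[float], order: int) -> List[float]:
    """Reflection coefficients k_1..k_order via the Levinson-Durbin recursion."""
    n = len(samples)
    # Biased autocorrelation r[0..order]
    r = [sum(samples[i] * samples[i + k] for i in range(n - k)) for k in range(order + 1)]
    if r[0] == 0:
        return [0.0] * order          # silent frame: nothing to predict
    a = [1.0] + [0.0] * order         # prediction-error filter coefficients
    err = r[0]                        # prediction error energy
    ks: List[float] = []
    for m in range(1, order + 1):
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err
        ks.append(k)
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + k * a[m - i]
        new_a[m] = k
        a = new_a
        err *= 1.0 - k * k
        if err <= 0:                  # frame is perfectly predictable; stop early
            ks.extend([0.0] * (order - m))
            break
    return ks
```

In practice the coefficients would be computed per short frame of the digitized utterance and stored alongside the pitch, amplitude, and voiced/unvoiced information; frame length and model order are left open by the text.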

Next, the voice recognition server 4 registers an arbitrary phrase (step S15). Following an instruction issued from the control server 2 via the terminal 50, the user utters the arbitrary phrase written on the application form. The voice recognition server 4 digitally records the user's authentication voice data input via the terminal 60 and the communication line. The data is analyzed in the same way as the fixed voice data and stored together with the raw voice (step S16). The voice recognition server 4 digitally converts the user authentication voice data input from the terminal 50 via the communication line, and stores the discretized digital voice data in a table in the storage device of the voice recognition server 4.

Next, processing at the time of personal authentication will be described. FIG. 3 is a processing flowchart at the time of personal authentication in the embodiment of the present invention.

The control server 2 receives a call on the designated telephone number (step S21). The control server 2 accepts input, from the terminal 60, of the ID and password issued at the time of the registration application (step S22), and collates them against the registered ID and password data (step S23). If the ID and password are verified successfully, the voice authentication server 3 collates the user's authentication voice data in accordance with instructions from the control server 2 (step S24). Specifically, the voice authentication server 3 uses correlation processing: it lays out the original data and the requested input data two-dimensionally in memory space and generates two-dimensional bit images in which, for example, the horizontal direction X is the time axis and the vertical direction Y is the magnitude (m). The voice authentication server 3 then cuts the two-dimensional bit images horizontally, extracts the magnitude edges, and detects whether the two match by comparing the positions of the extracted edges. If the fixed data is verified successfully in the voice authentication process, the control server 2 proceeds to step S25.

音声認識サーバ４は、音声認証処理において本人確認係数が一定値以下の場合、あるいは機械（レコーダー等の機械発声音）からの発生が疑われる場合は、ユーザの認証用音声データ（任意の文言）について音声認証と同様、制御サーバ２からの指示に従い任意データの照合を行って音声認識処理を実行する（ステップＳ２５）。音声認識サーバ４は、音声認識処理において任意データの認証がＯＫの場合にステップＳ２６に進む。音声認証サーバ３と音声認識サーバ４は、固定データおよび任意データについて、決められた一定期間経過後（たとえば1年）である場合、複数保有するデータのうち最終のものを最新のものと入れ替える制御を行う。これによりユーザの声道の変化による経年劣化に対応することができる。 If the identity verification coefficient obtained in the voice authentication process is at or below a certain value, or if the utterance is suspected to come from a machine (machine-produced sound such as a recorder), the voice recognition server 4 executes voice recognition processing on the user's authentication voice data (the arbitrary phrase): as with voice authentication, it verifies the arbitrary data in accordance with an instruction from the control server 2 (step S25). If the arbitrary data is verified successfully in the voice recognition process, the voice recognition server 4 proceeds to step S26. For both the fixed data and the arbitrary data, once a predetermined period (for example, one year) has elapsed, the voice authentication server 3 and the voice recognition server 4 replace the last of the multiple stored data items with the most recent one. This makes it possible to cope with changes in the user's voice over the years caused by changes in the vocal tract.

制御サーバ２は、発声に対して音声認証機能および音声認識機能により総合判定を行って、本人認証結果をシステムに返す（ステップＳ２６）。これによって本人認証が終了する。 The control server 2 makes a comprehensive determination on the utterance using the voice authentication function and the voice recognition function, and returns the personal authentication result to the system (step S26). This completes the personal authentication.
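One possible reading of the decision flow in steps S23 to S26 is sketched below in Python. The text does not give concrete thresholds or a scoring scale, so the `AuthInputs` fields, the 0.0 to 1.0 score range, and the two thresholds (`reject_below`, `confident_above`) are hypothetical and serve only to show how the two checks might be combined.

```python
from dataclasses import dataclass


@dataclass
class AuthInputs:
    credentials_ok: bool     # ID/password check result (step S23)
    voiceprint_score: float  # assumed similarity score from step S24, 0.0 to 1.0
    machine_suspected: bool  # playback from a recorder or similar is suspected
    phrase_ok: bool          # arbitrary-phrase recognition result (step S25)


def authenticate(inputs: AuthInputs,
                 reject_below: float = 0.5,
                 confident_above: float = 0.8) -> bool:
    """Combine voice authentication and voice recognition into one result.

    Assumed policy (not stated in this form in the source):
      * failed credentials or a score under `reject_below` reject outright;
      * a score under `confident_above`, or suspected machine playback,
        additionally requires the arbitrary phrase to be recognized;
      * otherwise the voiceprint match alone is accepted.
    """
    if not inputs.credentials_ok:
        return False
    if inputs.voiceprint_score < reject_below:
        return False
    needs_phrase = (inputs.voiceprint_score < confident_above
                    or inputs.machine_suspected)
    if needs_phrase:
        return inputs.phrase_ok
    return True
```

Requiring the arbitrary phrase whenever playback is suspected reflects the motivation given in the text: a recording of the fixed phrase cannot answer a prompt for a different phrase.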

上述の実施形態では垂直方向のマグニチュード（ｍ）を例示したが、本発明は垂直方向のマグニチュード（ｍ）に限定されることはなく、例えば、水平方向に時間軸（ｔ）をメモリ（ＲＡＭ）に設定し、垂直方向に周波数帯域（ｆ）のスペクトラムを設定してもよい。この周波数帯域は第一フォルマント(約５００〜１０００Ｈｚ)と第二フォルマント(約１５００〜３０００Hz)によって母音が判別でき、子音には明確なフォルマントが確認することができない。 The embodiment above uses magnitude (m) on the vertical axis, but the present invention is not limited to this. For example, the time axis (t) may be set in the horizontal direction in memory (RAM) and the spectrum of the frequency band (f) in the vertical direction. In this frequency band, vowels can be distinguished by the first formant (about 500 to 1000 Hz) and the second formant (about 1500 to 3000 Hz), whereas consonants show no clear formants.
そして、記憶装置に記憶しているオリジナルの周波数帯域の音声データパターンをヒストグラムラムのエッジ画像で特定し、認証する音声入力データの周波数帯域の音声データパターンをヒストグラムラムのエッジ画像で特定し、両者を相関処理することでデータの一致もしくは不一致を判定することができる。 Then, the voice data pattern of the original frequency band stored in the storage device is identified from the edge image of its histogram, the voice data pattern of the frequency band of the voice input data to be authenticated is identified in the same way, and correlating the two determines whether the data match.
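A comparison of frequency-band patterns restricted to the formant ranges quoted above could look like the following Python sketch. It is a simplification under stated assumptions: the spectrogram is given as a list of rows (one per frequency bin, each a list of magnitudes over time), bin `k` is assumed to be centered at `k * bin_width_hz`, and the patterns are compared as thresholded pixels rather than as histogram edge images. The names `FORMANT_BANDS_HZ`, `formant_rows`, and `band_limited_match` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Approximate first and second formant ranges quoted in the text.
FORMANT_BANDS_HZ: List[Tuple[float, float]] = [(500.0, 1000.0), (1500.0, 3000.0)]


def formant_rows(bin_width_hz: float, n_bins: int) -> List[int]:
    """Indices of frequency bins whose assumed center lies in a formant band."""
    rows = []
    for k in range(n_bins):
        center = k * bin_width_hz
        if any(lo <= center <= hi for lo, hi in FORMANT_BANDS_HZ):
            rows.append(k)
    return rows


def band_limited_match(spec_a: Sequence[Sequence[float]],
                       spec_b: Sequence[Sequence[float]],
                       bin_width_hz: float,
                       threshold: float = 0.5) -> float:
    """Fraction of thresholded pixels that agree inside the formant bands.

    Rows outside the formant bands are never visited, which is where the
    saving in processing comes from. Returns 0.0 if no pixel was compared.
    """
    rows = formant_rows(bin_width_hz, min(len(spec_a), len(spec_b)))
    same = total = 0
    for k in rows:
        for a, b in zip(spec_a[k], spec_b[k]):
            total += 1
            if (a >= threshold) == (b >= threshold):
                same += 1
    return same / total if total else 0.0
```

With 250 Hz bins and 16 rows, only 10 of the 16 rows are examined, so differences outside the formant bands do not affect the result.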

この場合、第一フォルマンと第二フォルマンの画像が位置するか否かを判定するので、すべての周波数帯域の音声データを比較しないため、フォルマント周波数帯域（ｘ方向）以外の周波数帯域に現れるヒストグラム画像処理を省略することでＣＰＵの負荷を低減でき処理速度を向上することができる。例えば、ＣＰＵは１０秒間のサンプリングデータの相関処理を０．５秒以内に完了させることができる。つまりアナログ解析に比してデジタル処理が速度およびＣＰＵの負荷を軽減できるし、固体差によるフォルマント周波数の分布およびベクトルがそれぞれ相違するため、認証処理の精度が従来に比して向上させることができる。さらに、ヒストグラムのエッジ検出は所定の閾値に設定してもよく、０レベルを所定期間に何回通過したかを検出するゼロクロス検出を用いても良い。要はヒストグラム画像の相関処理ができるデータを用いることができる。 In this case, the determination is whether the first-formant and second-formant images are located where expected, so voice data is not compared across all frequency bands. Omitting histogram image processing for frequency bands other than the formant frequency bands (x direction) reduces the CPU load and improves processing speed. For example, the CPU can complete correlation processing of 10 seconds of sampled data within 0.5 seconds. In other words, compared with analog analysis, digital processing is faster and lightens the CPU load, and because the distribution and vectors of formant frequencies differ from one individual to another, authentication accuracy is improved over the conventional approach. Furthermore, histogram edge detection may use a predetermined threshold, or zero-cross detection, which counts how many times the signal passes the 0 level in a predetermined period, may be used. In short, any data on which histogram-image correlation processing can be performed may be used.
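Zero-cross detection, mentioned above as an alternative to threshold-based edge detection, can be sketched in Python as follows. The function names and the treatment of exact zeros are assumptions made for this illustration; the text only states that the number of passes through the 0 level in a predetermined period is counted.

```python
from typing import List, Sequence


def count_zero_crossings(samples: Sequence[float]) -> int:
    """Count how many times the signal passes through the 0 level.

    Exact zeros are skipped, so a sample sitting on 0 between a positive
    and a negative value contributes a single crossing.
    """
    crossings = 0
    last_sign = 0  # sign of the most recent non-zero sample
    for s in samples:
        sign = (s > 0) - (s < 0)
        if sign == 0:
            continue
        if last_sign != 0 and sign != last_sign:
            crossings += 1
        last_sign = sign
    return crossings


def crossing_profile(samples: Sequence[float], window: int) -> List[int]:
    """Zero-crossing count for each consecutive window of `window` samples.

    Crossings that fall exactly on a window boundary are not counted,
    which keeps each window independent.
    """
    return [count_zero_crossings(samples[i:i + window])
            for i in range(0, len(samples), window)]
```

The per-window counts form a compact sequence that two utterances can be compared on, in place of the thresholded edge positions.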

以上、本発明の好ましい実施例について詳述したが、本発明に係る実施例に限定されるものではなく、特許請求の範囲に記載された本発明の要旨の範囲内において、種々の変形、変更が可能である。 Preferred embodiments of the present invention have been described in detail above, but the present invention is not limited to these embodiments, and various modifications and changes are possible within the scope of the gist of the present invention as set forth in the claims.

DESCRIPTION OF SYMBOLS
１００本人認証システム 100 Personal authentication system
１本人認証装置 1 Personal authentication apparatus
２制御サーバ 2 Control server
３音声認証サーバ 3 Voice authentication server
４音声認識サーバ 4 Voice recognition server
５システム連携サーバ 5 System cooperation server

Claims

A personal authentication apparatus comprising:
a storage device that receives a user's first voice-authentication voice data and first voice-recognition voice data, input via a terminal and a communication line, and stores in advance a table including first voice data and second voice data, the first voice data being obtained by adding voiceprint data created from digital voice-authentication voice data to that digital voice-authentication voice data, which is obtained by digitally converting the user's first voice-authentication voice data, and the second voice data being digital voice-recognition voice data obtained by digitally converting the user's first voice-recognition voice data;
voice authentication means for comparing first digital voice data, obtained by digitally converting second voice-authentication voice data input via the terminal and the communication line in response to a user authentication request, with the corresponding first voice data in the table, thereby authenticating the first digital voice data and determining whether the user of the terminal is the genuine user; and
voice recognition means for comparing second digital voice data, obtained by digitally converting second voice-recognition voice data or the second voice-authentication voice data input via the terminal and the communication line in response to the user authentication request, with the corresponding second voice data in the table, thereby recognizing the second digital voice data and confirming whether the content of the user's utterance is correct,
wherein personal authentication is performed by combining the authentication of the first digital voice data by the voice authentication means with the recognition of the second digital voice data by the voice recognition means.

A personal authentication apparatus comprising:
a storage device that receives a user's first voice-authentication voice data and first voice-recognition voice data, input via a terminal and a communication line, and stores in advance a table including first voice data and second voice data, the first voice data being obtained by adding voiceprint data created from digital voice-authentication voice data to that digital voice-authentication voice data, which is obtained by digitally converting the user's first voice-authentication voice data, and the second voice data being digital voice-recognition voice data obtained by digitally converting the user's first voice-recognition voice data;
voice authentication means for comparing first digital voice data, obtained by digitally converting second voice-authentication voice data input via the terminal and the communication line in response to a user authentication request, with the corresponding first voice data in the table, thereby authenticating the first digital voice data and determining whether the user of the terminal is the genuine user;
voice recognition means for comparing second digital voice data, obtained by digitally converting second voice-recognition voice data or the second voice-authentication voice data input via the terminal and the communication line in response to the user authentication request, with the corresponding second voice data in the table, thereby recognizing the second digital voice data and confirming whether the content of the user's utterance is correct; and
notification means for notifying the personal authentication result for the voice data input from the terminal in accordance with the determination results of the voice authentication means and the voice recognition means,
wherein personal authentication is performed by combining the authentication of the first digital voice data by the voice authentication means with the recognition of the second digital voice data by the voice recognition means.

A personal authentication method comprising:
a storing step of receiving a user's first voice-authentication voice data and first voice-recognition voice data, input via a terminal and a communication line, and storing in advance a table including first voice data and second voice data, the first voice data being obtained by adding voiceprint data created from digital voice-authentication voice data to that digital voice-authentication voice data, which is obtained by digitally converting the user's first voice-authentication voice data, and the second voice data being digital voice-recognition voice data obtained by digitally converting the user's first voice-recognition voice data;
a voice authentication step of comparing first digital voice data, obtained by digitally converting second voice-authentication voice data input via the terminal and the communication line in response to a user authentication request, with the corresponding first voice data in the table, thereby authenticating the first digital voice data and determining whether the user of the terminal is the genuine user; and
a voice recognition step of comparing second digital voice data, obtained by digitally converting second voice-recognition voice data or the second voice-authentication voice data input via the terminal and the communication line in response to the user authentication request, with the corresponding second voice data in the table, thereby recognizing the second digital voice data and confirming whether the content of the user's utterance is correct,
wherein personal authentication is performed by combining the authentication of the first digital voice data in the voice authentication step with the recognition of the second digital voice data in the voice recognition step.

A personal authentication method comprising:
a storing step of receiving a user's first voice-authentication voice data and first voice-recognition voice data, input via a terminal and a communication line, and storing in advance a table including first voice data and second voice data, the first voice data being obtained by adding voiceprint data created from digital voice-authentication voice data to that digital voice-authentication voice data, which is obtained by digitally converting the user's first voice-authentication voice data, and the second voice data being digital voice-recognition voice data obtained by digitally converting the user's first voice-recognition voice data;
a voice authentication step of comparing first digital voice data, obtained by digitally converting second voice-authentication voice data input via the terminal and the communication line in response to a user authentication request, with the corresponding first voice data in the table, thereby authenticating the first digital voice data and determining whether the user of the terminal is the genuine user;
a voice recognition step of comparing second digital voice data, obtained by digitally converting second voice-recognition voice data or the second voice-authentication voice data input via the terminal and the communication line in response to the user authentication request, with the corresponding second voice data in the table, thereby recognizing the second digital voice data and confirming whether the content of the user's utterance is correct; and
a notification step of notifying the personal authentication result for the voice data input from the terminal in accordance with the determination results of the voice authentication step and the voice recognition step,
wherein personal authentication is performed by combining the authentication of the first digital voice data in the voice authentication step with the recognition of the second digital voice data in the voice recognition step.