JPH09127976A

JPH09127976A - Speaker recognition system and speaker recognition method

Info

Publication number: JPH09127976A
Application number: JP7306833A
Authority: JP
Inventors: Junichiro Fujimoto; 潤一郎藤本; Atsushi Shibata; 敦柴田
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1995-10-30
Filing date: 1995-10-30
Publication date: 1997-05-16
Anticipated expiration: 2015-10-30
Also published as: JP3506293B2

Abstract

PROBLEM TO BE SOLVED: To cope with cases in which ordinary speaker recognition cannot be used, for example because the speaker has a cold, by collating the feature of the speaker's input voice against the voice feature that corresponds to the entered specifying information among the speaker voice features stored in a speaker recognition information storage means. SOLUTION: A user enters specifying information, together with a voice sample, through a specifying information input means 2. The voice is converted into a feature quantity and passed to a registration unit 6, which registers the standard pattern of the user's voice in a speaker recognition information storage unit 5 in association with the specifying information entered from the specifying information input means 2. When ordinary speaker recognition cannot be used, the user enters the prescribed specifying information from the specifying information input means 2. The input is converted into a feature quantity and passed to a speaker recognition unit 7, which retrieves the registered standard pattern from the speaker recognition information storage unit 5 and collates it with the feature pattern from a feature extraction unit 4.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a speaker identification system and a speaker identification method for identifying speakers.

[0002]

[Prior Art] Conventionally, banks and similar institutions have users enter a personal identification number (PIN) or the like to confirm their identity. Computers likewise confirm identity by having the user enter a password, a secret character string that serves the same purpose as a PIN. However, confirmation by entering a PIN or password is easily defeated: anyone who learns the PIN or password can use it fraudulently. Moreover, many PINs and passwords are derived from the registrant's date of birth, an anniversary, a telephone number, or the spelling of the registrant's name, so they are not particularly hard for others to guess.

[0003] To avoid these drawbacks of PINs and passwords, speaker identification based on speaker recognition, which judges from the voice whether a person is who they claim to be, has attracted attention in recent years. Speaker identification based on speaker recognition determines whether a speaker is the genuine person by checking whether the feature pattern of the voice uttered by the speaker matches the standard voice pattern registered in advance for that speaker. Specifically, the similarity between the feature quantity (feature pattern) extracted from the speaker's voice and the speaker's standard voice pattern is calculated, and the decision is made according to whether the similarity is high or low. Because the method relies on a physical characteristic of the person, a voice is harder for others to imitate than a PIN or password, and fraudulent use by others can therefore be prevented more effectively.

[0004]

[Problems to be Solved by the Invention] However, conventional speaker identification systems such as the one described above have the problem that, if the user's voice changes suddenly because of a cold or the like, speaker identification based on speaker recognition can no longer be performed.

[0005] An object of the present invention is to provide a speaker identification system and a speaker identification method that can cope with such a situation even when the voice has changed suddenly because of a cold or the like and speaker recognition cannot be used at all.

[0006]

[Means for Solving the Problems] To achieve the above object, the invention according to claim 1 comprises: speaker identification information storage means in which speaker identification information is stored; specifying information input means for entering specifying information that specifies a user; voice input means for inputting a speaker's voice; collation means for collating whether the feature of the speaker's voice input from the voice input means is similar to the voice feature that, among the speaker voice features stored in the speaker identification information storage means, corresponds to the specifying information entered from the specifying information input means; and confirmation means for obtaining confirmation from the user when the collation determines that the features are not similar.

[0007] The invention according to claim 2 is the speaker identification system according to claim 1, wherein the confirmation means includes accompanying information storage means in which accompanying information is stored in association with the specifying information that specifies a user, and wherein, when the collation determines that the features are not similar, the confirmation means notifies the user for confirmation in accordance with the accompanying information stored in the accompanying information storage means in association with the specifying information entered from the specifying information input means, and obtains confirmation from the user as to whether the user is the authorized user.

[0008] The invention according to claim 3 is the speaker identification system according to claim 2, wherein the accompanying information storage means further stores, as accompanying information, second specifying information for specifying the user; wherein, when the collation determines that the features are not similar, the confirmation means instructs the user to enter the second specifying information; and wherein, when the user has entered the second specifying information, the confirmation means collates the second specifying information entered by the user against the second specifying information stored in the accompanying information storage means in association with the specifying information entered from the specifying information input means, thereby confirming whether the user is the authorized user.

[0009] The invention according to claim 4 is the speaker identification system according to claim 1 or claim 2, wherein, when the collation determines that the features are not similar, the confirmation means presents to the user a plurality of items of specifying information, consisting of the correct second specifying information and dummy items, and has the user select one of them.
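As a non-limiting illustration, the selection scheme of claim 4 might be sketched as follows. The function names `build_choices` and `is_correct_choice`, the sample values, and the shuffling of candidates are assumptions made for this sketch; the source does not state how the items are ordered or presented.

```python
import random

def build_choices(correct, decoys, rng=None):
    """Return the correct second specifying information mixed with dummy items.

    Assumption: the candidates are shuffled so that position does not reveal
    the correct entry; the source does not describe the ordering.
    """
    rng = rng or random.Random()
    choices = [correct] + [d for d in decoys if d != correct]
    rng.shuffle(choices)
    return choices

def is_correct_choice(choices, selected_index, correct):
    """True only if the user selected the entry matching the registered value."""
    return 0 <= selected_index < len(choices) and choices[selected_index] == correct

# Hypothetical values: one registered second PIN and three dummy PINs.
choices = build_choices("4821", ["1357", "9024", "6610"], random.Random(0))
```

Selecting any index other than that of the registered value, or an index outside the list, is treated as a failed confirmation.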

[0010] The invention according to claim 5 is the speaker identification system according to claim 2, wherein the user's telephone number is stored in the accompanying information storage means as accompanying information, and the confirmation means calls the user by telephone at that number.

[0011] The invention according to claim 6 is the speaker identification system according to claim 5, wherein, when the confirmation means calls the user by telephone and the line is busy, the user is judged to be the authorized user.
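The telephone-based confirmation of claims 5 and 6 could be sketched, purely for illustration, as a mapping from call outcome to decision. Only the busy-line rule is taken from claim 6; the `CallResult` enumeration, the function name `judge_by_call`, and the handling of every other outcome are assumptions of this sketch.

```python
from enum import Enum, auto

class CallResult(Enum):
    ANSWERED_CONFIRMED = auto()  # callee answered and confirmed the request
    ANSWERED_DENIED = auto()     # callee answered and denied the request
    BUSY = auto()                # the registered line was busy
    NO_ANSWER = auto()           # nobody picked up

def judge_by_call(result):
    """Map the outcome of the confirmation call to an authorization decision.

    Only the BUSY rule comes from claim 6; the remaining mappings are
    assumptions made for this sketch.
    """
    if result is CallResult.BUSY:
        return True  # claim 6: a busy line is taken to indicate the authorized user
    return result is CallResult.ANSWERED_CONFIRMED
```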

[0012] The invention according to claim 7 is the speaker identification system according to claim 1 or claim 2, further comprising voice storage means that, when the confirmation fails to establish that the user is the authorized user, stores the current user's voice in a reproducible form.

[0013] The invention according to claim 8 is the speaker identification system according to claim 1 or claim 2, further comprising video storage means that, when the confirmation fails to establish that the user is the authorized user, stores video of the current user in a reproducible form.
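The voice and video storage means of claims 7 and 8 might be sketched as below. This is an assumption-laden illustration: the class `EvidenceStore`, the function `finish_confirmation`, the in-memory list, and the timestamp field are all hypothetical, and the source says only that the recording is kept in a reproducible form.

```python
import time
from dataclasses import dataclass, field

@dataclass
class EvidenceStore:
    """Keeps replayable recordings of users who could not be confirmed."""
    records: list = field(default_factory=list)

    def save(self, specifying_info, voice=None, video=None):
        # Raw bytes are stored unchanged so that they can be played back later.
        self.records.append({
            "specifying_info": specifying_info,
            "voice": voice,
            "video": video,
            "saved_at": time.time(),
        })

def finish_confirmation(confirmed, store, specifying_info, voice, video=None):
    """Save the recording only when confirmation failed; return the decision."""
    if not confirmed:
        store.save(specifying_info, voice, video)
    return confirmed
```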

[0014] The invention according to claim 9 is a method wherein, when specifying information that specifies a user is entered and a speaker's voice is input, collation is performed as to whether the feature of the input voice is similar to the voice feature stored in advance in association with the entered specifying information, and, when the collation determines that the features are not similar, confirmation is obtained from the user.

[0015] The invention according to claim 10 is the speaker identification method according to claim 9, wherein, when the collation determines that the features are not similar, the user is instructed to enter second specifying information, and, when the user has entered the second specifying information, the second specifying information entered by the user is collated against the second specifying information stored in advance in association with the specifying information, thereby confirming whether the user is the authorized user.

[0016] The invention according to claim 11 is the speaker identification system according to claim 9, wherein, when the collation determines that the features are not similar, a plurality of items of specifying information, consisting of the correct second specifying information and dummy items, are presented to the user, and the user is made to select one of them.

[0017]

[Embodiments of the Invention] FIG. 1 is a diagram showing a configuration example of a speaker identification system according to the present invention. Referring to FIG. 1, this speaker identification system confirms a person's identity by speaker recognition, for example at a bank, and includes: voice input means (for example, a microphone) 1 for inputting the user's voice; specifying information input means (for example, a keyboard) 2 for entering specifying information that specifies the user; a voice section detection unit 3 that detects, as a voice section, only the portion of the signal input from the voice input means 1 that contains the speaker's voice; a feature extraction unit 4 that extracts a feature quantity (feature pattern) from the voice signal within the voice section detected by the voice section detection unit 3; a registration unit 6 that, prior to speaker recognition, registers a standard feature quantity (feature pattern) of the speaker's voice as a standard pattern in a speaker recognition information storage unit 5; a speaker recognition unit 7 that collates the feature quantity (feature pattern) of the user's (speaker's) voice against the standard pattern registered in the speaker recognition information storage unit 5 and performs speaker recognition based on the similarity; and a switching unit (for example, a switch) 8 that switches between a registration mode, in which standard patterns are registered, and a recognition mode, in which speaker recognition is performed.

[0018] The feature extraction unit 4 may convert the voice signal into a spectrum or into an LPC cepstrum as the feature quantity (feature pattern); the type of feature quantity is not particularly limited. An FFT is suitable for conversion into a spectrum, and LPC analysis or the like is suitable for conversion into an LPC cepstrum.
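As a non-limiting illustration of the spectral feature mentioned above, the following sketch computes the magnitude spectrum of a single frame. To remain dependency-free it uses a naive discrete Fourier transform rather than an FFT (an FFT yields the same values more quickly); the function name `magnitude_spectrum`, the 32-sample frame, and the test tone are assumptions of this sketch, and windowing and framing are omitted.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Return the magnitudes of DFT bins 0..N/2 for one frame of samples.

    A naive O(N^2) DFT is used to avoid external libraries; an FFT computes
    the same values faster.
    """
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc))
    return spectrum

# A pure tone with 4 cycles per 32-sample frame concentrates its energy in bin 4.
frame = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
spec = magnitude_spectrum(frame)
peak_bin = max(range(len(spec)), key=spec.__getitem__)
```

The resulting list of magnitudes plays the role of the feature pattern that is passed on to registration or recognition.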

[0019] When a standard pattern is registered (in the registration mode), the registration unit 6 registers the feature quantity (feature pattern) that the feature extraction unit 4 extracted from a speaker's utterance as a standard pattern in the speaker recognition information storage unit 5. In doing so, as shown in FIG. 2, it can register the standard pattern in association with the specifying information entered by that speaker from the specifying information input means 2 (for example, the speaker's name, date of birth, or PIN). In other words, the speaker recognition information storage unit 5 holds the speaker recognition information needed for speaker identification, and it can hold such information for a plurality of speakers (for example, users A, B, C, D, ...).
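The association between specifying information and standard patterns could be sketched as a keyed store. The class `SpeakerStore`, its method names, and the two-element sample patterns are hypothetical; the source describes the association (FIG. 2) but not a data structure.

```python
class SpeakerStore:
    """Holds one standard pattern per item of specifying information."""

    def __init__(self):
        self._patterns = {}

    def register(self, specifying_info, standard_pattern):
        # Assumption: re-registering the same specifying information
        # overwrites the earlier pattern.
        self._patterns[specifying_info] = list(standard_pattern)

    def lookup(self, specifying_info):
        # Returns None when nothing is registered under this information.
        return self._patterns.get(specifying_info)

store = SpeakerStore()
for user, pattern in [("A", [0.9, 0.1]), ("B", [0.2, 0.8]), ("D", [0.5, 0.5])]:
    store.register(user, pattern)
```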

[0020] Depending on how the speaker identification system is used, the standard voice pattern registered in the speaker recognition information storage unit 5 may be obtained by having every user (speaker) utter predetermined words, or by having each user freely utter words of his or her own choosing.

[0021] The speaker recognition unit 7 can use a speaker verification scheme: it retrieves the standard pattern corresponding to the current speaker from the standard patterns of the plurality of speakers registered in the speaker recognition information storage unit 5, collates that standard pattern against the current speaker's feature pattern, and determines whether the current speaker is the genuine registered speaker according to whether the similarity is higher or lower than a predetermined reference value (threshold).
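A minimal sketch of the verification step follows. The source does not name a similarity measure; cosine similarity, the default threshold of 0.9, the treatment of equality as acceptance, and the function names are all assumptions chosen for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def verify(standard_patterns, specifying_info, feature, threshold=0.9):
    """Accept only if the claimed speaker's standard pattern is similar enough."""
    standard = standard_patterns.get(specifying_info)
    if standard is None:
        return False  # nothing registered for the claimed identity
    return cosine_similarity(standard, feature) >= threshold

# Hypothetical registered pattern for user D.
patterns = {"D": [1.0, 2.0, 3.0]}
```

Because the claimed identity selects a single standard pattern, only one comparison is needed per attempt, which is what distinguishes verification from searching across all registered speakers.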

[0022] When the speaker recognition unit 7 performs speaker recognition by the speaker verification scheme, the user (speaker) must, at recognition time, enter from the specifying information input means 2 the same specifying information that was entered in the registration mode. This allows the speaker recognition unit 7 to retrieve the standard pattern corresponding to the current speaker from among the standard patterns of the plurality of speakers registered in the speaker recognition information storage unit 5, and to collate it against the feature pattern of the current speaker's voice.

[0023] Furthermore, the speaker recognition unit 7 can be configured to match the kind of standard pattern registered in the speaker recognition information storage unit 5: recognition suited to predetermined words when every user (speaker) uttered predetermined words, and recognition suited to free utterances when each user uttered words of his or her own choosing. When speaker recognition is performed on predetermined words, the similarity criterion (threshold) can be set to one constant value for all speakers; when it is performed on words chosen by each user, the criterion (threshold) can also be set differently for each speaker.
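The two threshold policies described above might be sketched as follows. The constant `DEFAULT_THRESHOLD`, its value, and the fallback to that value when no per-speaker threshold is stored are assumptions of this sketch.

```python
DEFAULT_THRESHOLD = 0.90  # hypothetical common value for predetermined words

def threshold_for(specifying_info, fixed_phrase, per_speaker=None):
    """Pick the similarity threshold for one speaker.

    fixed_phrase=True: every speaker utters the same predetermined words,
    so one constant applies. Otherwise a per-speaker value is used when one
    was stored, falling back to the common default (an assumption).
    """
    if fixed_phrase:
        return DEFAULT_THRESHOLD
    return (per_speaker or {}).get(specifying_info, DEFAULT_THRESHOLD)
```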

[0024] When a user (for example, D) uses a speaker identification system configured in this way for the first time, the user (speaker) D must first register his or her own voice as a standard pattern. To do so, user D operates the switching unit (for example, a switch) 8 to connect the feature extraction unit 4 to the registration unit 6, setting the system to the registration mode.

[0025] Next, user (speaker) D enters predetermined specifying information, for example (user D), from the specifying information input means 2, and at the same time utters predetermined specific words. The voice is input from the voice input means 1, converted into a feature quantity (feature pattern) by the voice section detection unit 3 and the feature extraction unit 4, and passed to the registration unit 6 as the standard pattern of this speaker's voice.

[0026] The registration unit 6 then registers the standard pattern of the voice of user (speaker) D in the speaker recognition information storage unit 5 in association with the specifying information entered from the specifying information input means 2. For example, if several other users A, B, and C have previously registered their own voices as standard patterns in the speaker recognition information storage unit 5 and the current user D registers his or her voice as described above, the standard pattern is stored (registered) in the speaker recognition information storage unit 5 as shown in FIG. 2.

[0027] Once the standard voice pattern has been stored in the speaker recognition information storage unit 5 in this way, user D can have the speaker identification system perform speaker recognition for user D. That is, user D can use the system to determine whether the person currently using it is user D.

[0028] Specifically, when user D subsequently uses the system, user D operates the switching unit 8 to connect the feature extraction unit 4 to the speaker recognition unit 7, setting the system to the recognition mode.

[0029] Next, user D enters predetermined specifying information, for example (user D), from the specifying information input means 2, and at the same time utters the predetermined specific words. The voice is input from the voice input means 1, converted into a feature quantity (feature pattern) by the voice section detection unit 3 and the feature extraction unit 4, and passed to the speaker recognition unit 7.

[0030] The speaker recognition unit 7 then retrieves from the speaker recognition information storage unit 5 the standard pattern registered in association with the specifying information (user D) entered from the specifying information input means 2, collates it against the feature pattern from the feature extraction unit 4, calculates the similarity, and determines whether the similarity is higher or lower than the predetermined reference value. If the similarity is judged to be low, the unit determines that the user is not the genuine speaker D and rejects use by this user. If the similarity is judged to be high, the unit determines that the user is the genuine speaker D and permits use; that is, the user is allowed to use the application (for example, processing such as deposits, withdrawals, and balance inquiries).

[0031] In such a speaker identification system, however, as noted above, if the user's (speaker's) voice changes suddenly because of a cold or the like, the system judges that the speaker is not the genuine person even though the voice is that person's own, and speaker identification can no longer be performed.

[0032] To resolve this problem, the speaker identification system of FIG. 1 is provided with confirmation means 11 that obtains further confirmation from the user when collation determines that the feature of the speaker's voice input from the voice input means 1 is not similar to the voice feature that, among the speaker voice features stored in the speaker identification information storage unit 5, corresponds to the specifying information entered from the specifying information input means 2. When the confirmation means 11 confirms that the user is the genuine speaker, the current user is identified as the genuine speaker even though speaker verification in the speaker recognition unit 7 judged the voices to be dissimilar.
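The overall decision flow with the confirmation means 11 could be sketched as below. The function name `identify`, the route labels, and the representation of the confirmation means as a zero-argument callable are assumptions made for illustration.

```python
def identify(voice_matches, confirm):
    """Return (accepted, route) for one identification attempt.

    voice_matches: result of speaker verification (bool).
    confirm: zero-argument callable standing in for the confirmation
             means 11; it is consulted only after a voice mismatch.
    """
    if voice_matches:
        return True, "voice"
    if confirm():
        return True, "confirmation"  # e.g. the genuine user has a cold
    return False, "rejected"
```

Note that the confirmation step is never reached when the voice matches, so the ordinary path is unchanged; it only adds a second chance after a mismatch.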

[0033] FIG. 3 is a diagram showing a configuration example of the confirmation means 11. In the example of FIG. 3, the confirmation means 11 includes: an accompanying information storage unit 12 in which accompanying information is stored in association with the specifying information that specifies a user; a notification unit 13 that, when speaker verification in the speaker recognition unit 7 judges the voices to be dissimilar, notifies the user for confirmation in accordance with the accompanying information stored in the accompanying information storage unit 12 in association with the specifying information entered from the specifying information input means 2; and a determination unit 14 that, when the user responds to the notification from the notification unit 13, determines from the response whether the user is the genuine speaker.

[0034] FIG. 4 is a diagram showing a configuration example of the accompanying information storage unit 12. In the example of FIG. 4, the accompanying information storage unit 12 stores, as accompanying information, how the user is to be notified: for example, the authorized user's telephone number, or an instruction to show guidance on, for example, a display device of the speaker identification system.

[0035] In the example of FIG. 4, the accompanying information storage unit 12 also stores, as accompanying information, second specifying information for specifying the user, in association with the specifying information entered from the specifying information input means 2.

[0036] The second specifying information can be something different from the specifying information. In what follows, the specifying information is called the first specifying information to distinguish it from the second specifying information. As described above, the first specifying information can be, for example, the user's name, date of birth, or PIN (hereinafter called the first PIN), and the second specifying information can be, for example, a second PIN that differs from the first PIN.
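One possible layout for an entry of the accompanying information storage unit 12 is sketched below. The field names, the `"phone"` and `"display"` labels, and all sample values (including the placeholder telephone number) are hypothetical; FIG. 4 itself is not reproduced in this text.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class AccompanyingInfo:
    """One entry of the accompanying information storage unit 12."""
    notify_method: str            # "phone" or "display" in this sketch
    phone_number: Optional[str]   # used only when notify_method == "phone"
    second_specifying_info: str   # e.g. a second PIN

# Keyed by the first specifying information (here a hypothetical first PIN).
accompanying_store = {
    "1234": AccompanyingInfo("phone", "03-0000-0000", "8765"),
    "5678": AccompanyingInfo("display", None, "4321"),
}

def second_info_differs(first_info, store):
    """Check that the stored second information differs from the first."""
    return store[first_info].second_specifying_info != first_info
```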

[0037] These various items of accompanying information can be entered from the specifying information input means 2 together with the specifying information, for example when user D newly registers the standard pattern of his or her voice. The accompanying information of user D is thereby registered in the accompanying information storage unit 12 in association with user D's specifying information.

[0038] The notification unit 13 can take various forms depending on the notification method stored as accompanying information in the accompanying information storage unit 12. For example, when the user is to be notified by telephone (that is, when a telephone number is set as accompanying information), a communication device (such as a telephone device or a terminal with a PC communication function) can be used as the notification unit 13. When the user is to be notified through guidance presented by this system, a display device, a speech synthesis output device, or the like provided in the speaker identification system can be used as the notification unit 13.
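The choice of notification route according to the stored method might be sketched as a simple dispatch table. The handler functions only return descriptive strings as stand-ins for real devices; their names, the method labels, and the message text are assumptions of this sketch.

```python
def notify_by_phone(info):
    # Placeholder for a communication device dialing the stored number.
    return f"calling {info['phone_number']}"

def notify_by_display(info):
    # Placeholder for guidance shown on a display or spoken by a synthesizer.
    return "showing guidance: please enter your second specifying information"

NOTIFIERS = {"phone": notify_by_phone, "display": notify_by_display}

def notify(info):
    """Dispatch on the notification method stored as accompanying information."""
    method = info.get("notify_method")
    handler = NOTIFIERS.get(method)
    if handler is None:
        raise ValueError(f"unknown notification method: {method!r}")
    return handler(info)
```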

[0039] When the notification unit 13 issues a notification for confirmation in this way, the user can respond by, for example, entering the second specifying information or replying by voice. The second specifying information can be entered using the specifying information input means 2 (shared for this purpose) or through input means other than the specifying information input means 2.

【００４０】また、判別部１４は、通知部１３からの通
知に対する利用者の応答として、第２の特定用情報が入
力されると、利用者によって入力された第２の特定用情
報と特定用情報入力手段２から入力された特定用情報に
対応させて付随情報記憶部１２に記憶されている第２の
特定用情報とを照合して、正規の利用者か否かの判別を
行なうようになっている。When the second specifying information is input as the user's response to the notification from the notification unit 13, the determination unit 14 collates the second specifying information input by the user with the second specifying information stored in the accompanying information storage unit 12 in association with the specifying information input from the specifying information input means 2, and thereby determines whether or not the user is the authorized user.
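The collation performed by the determination unit 14 can be pictured as a keyed lookup followed by a comparison. The following is a minimal sketch under stated assumptions: the dictionary `ACCOMPANYING_STORE`, the key `"user-D-id"`, and the value `"4821"` are hypothetical, and the constant-time comparison is an implementation choice of this sketch, not something the patent specifies.

```python
import hmac

# Hypothetical stand-in for the accompanying information storage unit 12:
# first specifying information -> registered second specifying information.
ACCOMPANYING_STORE = {"user-D-id": "4821"}


def judge(first_info: str, entered_second_info: str) -> bool:
    """Return True when the entered second specifying information matches the
    value registered for the given first specifying information."""
    registered = ACCOMPANYING_STORE.get(first_info)
    if registered is None:
        # No user is registered under this first specifying information.
        return False
    # Constant-time comparison (a choice of this sketch).
    return hmac.compare_digest(registered.encode(), entered_second_info.encode())
```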

【００４１】図５乃至図８は本発明の話者識別システム
の種々の使用形態例を示す図である。図５の使用形態例
は、図３の構成例において、音声入力手段１，特定用情
報入力手段２，音声区間検出部３，特徴抽出部４，話者
認識用情報記憶部５，登録部６，話者認識部７，切替部
８，付随情報記憶部１２，通知部１３，判別部１４，さ
らには表示装置１６が、例えば、話者認識装置ユニット
３０として、銀行の窓口などに設置されるものとなって
いる。FIGS. 5 to 8 show various usage examples of the speaker identification system of the present invention. In the usage example of FIG. 5, the voice input means 1, the specifying information input means 2, the voice section detection unit 3, the feature extraction unit 4, the speaker recognition information storage unit 5, the registration unit 6, the speaker recognition unit 7, the switching unit 8, the accompanying information storage unit 12, the notification unit 13, the determination unit 14, and additionally a display device 16 of the configuration example of FIG. 3 are installed as a speaker recognition device unit 30 at, for example, a bank counter.

【００４２】図５の使用形態例では、標準パターンの新
規登録，変更あるいは更新，話者認識を行なうために、
利用者は、例えば銀行の窓口などに設置されている話者
認識装置ユニット３０のところに出向き、この話者認識
装置ユニット３０によって、標準パターンの新規登録操
作，話者認識操作，標準パターンの変更あるいは更新操
作を、前述したようにして行なうことができる。なお、
この話者認識装置ユニット３０に、標準パターンの自動
更新機能が備わっているときには、利用者は、標準パタ
ーンの変更あるいは更新操作を行なうことなく、標準パ
ターンは自動更新される。In the usage example of FIG. 5, to newly register, change or update a standard pattern, or to perform speaker recognition, the user goes to the speaker recognition device unit 30 installed at, for example, a bank counter, and uses it to perform the new registration operation, the speaker recognition operation, or the change or update operation of the standard pattern as described above. When the speaker recognition device unit 30 has an automatic standard pattern updating function, the standard pattern is updated automatically without the user performing any change or update operation.

【００４３】また、図５の使用形態例では、この話者認
識ユニット３０によって利用者が例えば標準パターンの
新規登録操作を行なう際、利用者は、これとともに、付
随情報の入力を行ない、入力された付随情報を付随情報
記憶部１２に記憶させることができる。すなわち、この
場合、付随情報記憶部１２には、付随情報として、例え
ば、各利用者ごとの第２の特定用情報とともに、利用者
への通知の仕方として、例えば表示装置１６へガイダン
スを表示する旨などが記憶される。In the usage example of FIG. 5, when the user performs, for example, a new registration operation of the standard pattern on the speaker recognition unit 30, the user can also input accompanying information at that time and have it stored in the accompanying information storage unit 12. In this case, the accompanying information storage unit 12 stores as accompanying information, for example, the second specifying information of each user together with the way of notifying the user, such as an instruction to display guidance on the display device 16.

【００４４】このようにして、標準パターンの新規登録
あるいは、変更，更新がなされ、認識モード時におい
て、例えば利用者Ｄが話者認識を行なうために特定用情
報入力手段２から第１の特定用情報を入力し、音声入力
手段１から音声を入力するとき、話者認識部７は、音声
入力手段１から入力された音声の特徴パターンと特定用
情報入力手段２から入力された第１の特定用情報に対応
した標準パターン(例えば利用者Ｄの標準パターン)とを
照合し、これらが類似しているか否かを判別する。After the standard pattern has been newly registered, changed, or updated in this way, suppose that in the recognition mode the user D, for example, inputs the first specifying information from the specifying information input means 2 and inputs a voice from the voice input means 1 in order to perform speaker recognition. The speaker recognition unit 7 then collates the feature pattern of the voice input from the voice input means 1 with the standard pattern corresponding to the first specifying information input from the specifying information input means 2 (for example, the standard pattern of the user D) and determines whether or not they are similar.

【００４５】この結果、入力された音声の特徴パターン
と利用者Ｄの標準パターンとが類似していると判別され
たときには、利用者が正規の利用者Ｄ本人であると識別
し、この利用者に対して、例えば、利用者Ｄ用のアプリ
ケーション(入出金，残高照会等のアプリケーション)の
利用を許可する。When the feature pattern of the input voice and the standard pattern of the user D are determined to be similar, the user is identified as the authorized user D himself or herself and is permitted to use, for example, the applications for the user D (applications such as deposit and withdrawal or balance inquiry).

【００４６】これに対し、入力された音声の特徴パター
ンと利用者Ｄの標準パターンとが類似していないと判別
されたときには、正規の利用者Ｄか否かの確認をとる。
すなわち、通知部１３は、この利用者への通知の仕方を
付随情報記憶部１２から読出し、この通知の仕方が、例
えば表示装置１６へのガイダンス表示である場合、第２
の特定用情報を利用者に入力させる旨のガイダンス、例
えば「第２の特定用情報を入力して下さい」などのガイ
ダンスを、この話者認識装置ユニット３０の表示装置１
６に画面表示し、利用者に知らせる。利用者が、これに
応答して、例えば特定用情報入力手段２から第２の特定
用情報を入力するとき、判別部１４では、いま入力され
た第２の特定用情報と付随情報記憶部１２に記憶されて
いる利用者Ｄの第２の特定用情報とを照合する。この結
果、これらが一致したときには、利用者が正規の利用者
Ｄ本人であると識別し、この利用者に対して、例えば、
利用者Ｄ用のアプリケーション(入出金，残高照会等の
アプリケーション)の利用を許可する。On the other hand, when the feature pattern of the input voice and the standard pattern of the user D are determined not to be similar, a check is made as to whether the user is the authorized user D. Specifically, the notification unit 13 reads the way of notifying this user from the accompanying information storage unit 12. If that way is, for example, a guidance display on the display device 16, guidance prompting the user to input the second specifying information, such as "Please input the second specifying information", is displayed on the screen of the display device 16 of the speaker recognition device unit 30 to inform the user. When the user responds by inputting the second specifying information from, for example, the specifying information input means 2, the determination unit 14 collates the second specifying information just input with the second specifying information of the user D stored in the accompanying information storage unit 12. If they match, the user is identified as the authorized user D himself or herself and is permitted to use, for example, the applications for the user D (applications such as deposit and withdrawal or balance inquiry).

【００４７】これに対し、判別部１４における照合の結
果、これらが一致しないときには、この利用者に対し
て、例えば、利用者Ｄ用のアプリケーションの利用を禁
止する。On the other hand, as a result of the collation performed by the discriminating unit 14, if they do not match, the user is prohibited from using the application for the user D, for example.
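The two-stage decision of paragraphs [0044] to [0047] (voice collation first, then the second specifying information as a fallback) can be sketched as follows. The patent does not name a similarity measure or threshold; the cosine similarity, the threshold value 0.9, and the function names here are assumptions chosen only to make the sketch runnable.

```python
import math
from typing import Callable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed similarity measure; the patent does not specify one."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def authenticate(
    feature_pattern: Sequence[float],
    standard_pattern: Sequence[float],
    registered_second_info: str,
    ask_second_info: Callable[[], str],
    threshold: float = 0.9,  # hypothetical threshold
) -> bool:
    """Permit use if the voice is similar; otherwise fall back to the
    second specifying information obtained after notifying the user."""
    if cosine_similarity(feature_pattern, standard_pattern) >= threshold:
        return True  # speaker recognition unit 7: patterns are similar
    # Notification unit 13 prompts the user; determination unit 14 collates.
    return ask_second_info() == registered_second_info
```

For instance, a user whose voice no longer matches is still admitted when `ask_second_info` returns the registered value, and is refused otherwise.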

【００４８】このように、正規の利用者Ｄの音声が例え
ば風邪などによって突然変化し、入力された音声の特徴
パターンと利用者Ｄの標準パターンとが類似しないもの
となっても、この正規の利用者Ｄが第２の特定用情報を
正しく入力することで、利用者が利用者Ｄであると識別
され、この利用者Ｄに対するアプリケーションの利用を
許可することができる。また、利用者Ｄ以外の他人，例
えばＥが、利用者Ｄの第１の特定用情報を知得しても、
利用者Ｄの第２の特定用情報を知得しない限り、この他
人Ｅは、利用者Ｄ用のアプリケーションを利用すること
ができないので、悪意のある他人によって正規の利用者
用のアプリケーションが利用されてしまうという事態を
も、有効に防止することができる。Thus, even if the voice of the authorized user D changes suddenly because of, for example, a cold, so that the feature pattern of the input voice no longer resembles the standard pattern of the user D, the user is identified as the user D once the authorized user D correctly inputs the second specifying information, and the user D can be permitted to use the application. Moreover, even if a person other than the user D, for example E, learns the first specifying information of the user D, this other person E cannot use the applications for the user D unless he or she also learns the second specifying information of the user D. A situation in which a malicious third party uses an application intended for an authorized user can therefore also be effectively prevented.

【００４９】また、図６の使用形態例では、図５の使用
形態例において、利用者への通知を例えばオペレーショ
ンセンタ８０を介して行なうものとなっている。この場
合、通知部１３は、オペレーションセンタ８０に設置さ
れているアクセス受動部２４と、アクセス受動部２４に
アクセスするためのアクセス部２３とを有している。す
なわち、図６の使用形態例では、図３の構成例におい
て、音声入力手段１，特定用情報入力手段２，音声区間
検出部３，特徴抽出部４，話者認識用情報記憶部５，登
録部６，話者認識部７，切替部８，通知部１３のアクセ
ス部２３，判別部１４は、図５の使用形態例と同様に、
例えば話者認識装置ユニット３０として銀行の窓口など
に設置されているが、通知部１３のアクセス受動部２４
は、例えば電話装置としてオペレーションセンタ８０の
管理者によって管理され、アクセス受動部２４がアクセ
ス部２３によってアクセスされたとき、オペレーション
センタ８０の管理者が、別途、利用者の携帯電話などに
確認のための電話などを行なうように構成されている。
また、オペレーションセンタ８０から利用者へ確認のた
めの通知を行なうため、付随情報記憶部１２も、オペレ
ーションセンタ８０側に設けられている。In the usage example of FIG. 6, the notification to the user in the usage example of FIG. 5 is made via, for example, an operation center 80. In this case, the notification unit 13 has an access passive unit 24 installed in the operation center 80 and an access unit 23 for accessing the access passive unit 24. That is, in the usage example of FIG. 6, the voice input means 1, the specifying information input means 2, the voice section detection unit 3, the feature extraction unit 4, the speaker recognition information storage unit 5, the registration unit 6, the speaker recognition unit 7, the switching unit 8, the access unit 23 of the notification unit 13, and the determination unit 14 of the configuration example of FIG. 3 are installed as a speaker recognition device unit 30 at, for example, a bank counter, as in the usage example of FIG. 5. The access passive unit 24 of the notification unit 13, however, is managed as, for example, a telephone device by an administrator of the operation center 80, and the system is configured so that, when the access passive unit 24 is accessed by the access unit 23, the administrator of the operation center 80 separately places a confirmation call to, for example, the user's mobile phone. Because the confirmation notification to the user is issued from the operation center 80, the accompanying information storage unit 12 is also provided on the operation center 80 side.

【００５０】図６の使用形態例では、話者認識装置ユニ
ット３０において、利用者の入力された音声の特徴パタ
ーンと例えば利用者Ｄの音声の標準パターンとの照合の
結果、これらが類似していないと判別されたとき、話者
認識装置ユニット３０のアクセス部２３は、オペレーシ
ョンセンタ８０のアクセス受動部２４を例えば電話で呼
出し、例えば、「利用者Ｄに確認をとって下さい」など
の音声ガイドを流し、アクセス受動部２４の受話器から
オペレーションセンタ８０の管理者に伝える。これによ
り、オペレーションセンタ８０の管理者は、付随情報記
憶部１２から利用者Ｄに対応する付随情報，例えば利用
者Ｄの電話番号を検索し、利用者Ｄに例えば電話で連絡
する。この結果、利用者Ｄ本人が話者認識を行なってい
るとの確認が得られると、管理者は、アクセス受動部２
４の送話器から例えば「利用者Ｄである」旨のメッセー
ジを発声する。あるいは、「利用者Ｄである」旨をアク
セス受動部２４の所定の機能キー，例えば“＊”で通知
する。これにより、アクセス部２３はこれを受信して、
利用者に対し利用者Ｄ用のアプリケーションの利用を許
可する。In the usage example of FIG. 6, when the speaker recognition device unit 30 collates the feature pattern of the voice input by the user with, for example, the standard pattern of the voice of the user D and determines that they are not similar, the access unit 23 of the speaker recognition device unit 30 calls the access passive unit 24 of the operation center 80 by, for example, telephone and plays a voice guide such as "Please confirm with the user D", which reaches the administrator of the operation center 80 through the receiver of the access passive unit 24. The administrator of the operation center 80 then retrieves the accompanying information corresponding to the user D, for example the telephone number of the user D, from the accompanying information storage unit 12 and contacts the user D by, for example, telephone. If this confirms that the user D himself or herself is the one performing speaker recognition, the administrator utters a message to the effect of "This is the user D" into the transmitter of the access passive unit 24, or signals the same with a predetermined function key of the access passive unit 24, for example "*". The access unit 23 receives this and permits the user to use the applications for the user D.

【００５１】これに対し、利用者Ｄ本人が話者認識を行
なっているとの確認が得られない場合には、オペレーシ
ョンセンタ８０の管理者は、アクセス受動部２４の送話
器から例えば「利用者Ｄではない」旨のメッセージを発
声する。あるいは、「利用者Ｄではない」旨をアクセス
受動部２４の所定の機能キー，例えば“＃”で通知す
る。これにより、アクセス部２３はこれを受信して、利
用者に対し利用者Ｄ用のアプリケーションの利用を禁止
する。On the other hand, when it cannot be confirmed that the user D himself or herself is performing speaker recognition, the administrator of the operation center 80 utters a message to the effect of "This is not the user D" into the transmitter of the access passive unit 24, or signals the same with a predetermined function key of the access passive unit 24, for example "#". The access unit 23 receives this and prohibits the user from using the applications for the user D.
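The function-key convention in paragraphs [0050] and [0051] ("*" to confirm, "#" to reject) amounts to a small mapping on the access unit 23 side. A minimal sketch follows; the function name and the choice to return `None` for any other key are assumptions of this sketch.

```python
from typing import Optional


def decide_from_operator_key(key: str) -> Optional[bool]:
    """Interpret the function key sent from the access passive unit 24.

    Returns True to permit use, False to prohibit it, and None when the key
    is not one of the two example keys given in the text.
    """
    if key == "*":
        return True    # operator confirmed: "This is the user D"
    if key == "#":
        return False   # operator could not confirm: "This is not the user D"
    return None        # unrecognized key: no decision (assumed behavior)
```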

【００５２】このように、図６の使用形態例において
も、図５の使用形態例と同様に、正規の利用者Ｄの音声
が例えば風邪などによって突然変化し、入力された音声
の特徴パターンと利用者Ｄの標準パターンとが類似しな
いものとなっても、この正規の利用者Ｄが第２の特定用
情報を正しく入力することで、利用者が利用者Ｄである
と識別され、この利用者Ｄに対するアプリケーションの
利用を許可することができる。また、利用者Ｄ以外の他
人，例えばＥが、利用者Ｄの第１の特定用情報を知得し
ても、利用者Ｄの第２の特定用情報を知得しない限り、
この他人Ｅは、利用者Ｄ用のアプリケーションを利用す
ることができないので、悪意のある他人によって正規の
利用者用のアプリケーションが利用されてしまうという
事態をも、有効に防止することができる。Thus, in the usage example of FIG. 6 as in that of FIG. 5, even if the voice of the authorized user D changes suddenly because of, for example, a cold, so that the feature pattern of the input voice no longer resembles the standard pattern of the user D, the user is identified as the user D once the authorized user D correctly inputs the second specifying information, and the user D can be permitted to use the application. Moreover, even if a person other than the user D, for example E, learns the first specifying information of the user D, this other person E cannot use the applications for the user D unless he or she also learns the second specifying information of the user D. A situation in which a malicious third party uses an application intended for an authorized user can therefore also be effectively prevented.

【００５３】なお、図６の使用形態例では、オペレーシ
ョンセンタ８０の管理者が利用者Ｄ本人に直接問い合せ
することができるので、このときには、利用者から第２
の特定用情報を入力させずとも、利用者が利用者Ｄ本人
であるかを直接確認することができる。従って、この場
合には、判別部１４は設けずとも良い。但し、この場合
であっても、さらに、利用者から第２の特定用情報を入
力させることもでき、このときには、判別部１４は設け
る必要がある。また、この場合、話者認識ユニット３０
側にも付随情報記憶部１２を設けることができる。In the usage example of FIG. 6, the administrator of the operation center 80 can inquire of the user D directly, so in that case whether the user is the user D himself or herself can be confirmed directly without having the user input the second specifying information. The determination unit 14 therefore need not be provided in that case. Even so, the user can additionally be made to input the second specifying information, in which case the determination unit 14 must be provided. In that case, an accompanying information storage unit 12 can also be provided on the speaker recognition unit 30 side.

【００５４】また、図７の使用形態例は、図３の構成例
において、利用者が端末によって話者認識等の操作を行
なうものとなっている。すなわち、図７の例では、音声
入力手段１，特定用情報入力手段２，音声区間検出部
３，特徴抽出部４が、利用者の家庭や会社等に設置され
ている端末３１(例えばパソコンや電話装置など)で実現
されており、切替部８，話者認識用情報記憶部５，登録
部６，話者認識部７，付随情報記憶部１２，判別部１４
が、例えば、銀行の窓口などに設置されている話者認識
装置ユニット３２で実現されている。また、この場合、
通知部１３は、端末３１側に設けられているアクセス受
動部２４と、話者認識装置ユニット３２に設けられ、端
末３１のアクセス受動部２４にアクセスするアクセス部
２３とにより実現されている。In the usage example of FIG. 7, the user performs operations such as speaker recognition from a terminal in the configuration example of FIG. 3. That is, in the example of FIG. 7, the voice input means 1, the specifying information input means 2, the voice section detection unit 3, and the feature extraction unit 4 are implemented in a terminal 31 (for example, a personal computer or a telephone device) installed at the user's home, office, or the like, while the switching unit 8, the speaker recognition information storage unit 5, the registration unit 6, the speaker recognition unit 7, the accompanying information storage unit 12, and the determination unit 14 are implemented in a speaker recognition device unit 32 installed at, for example, a bank counter. In this case, the notification unit 13 is implemented by an access passive unit 24 provided on the terminal 31 side and an access unit 23 that is provided in the speaker recognition device unit 32 and accesses the access passive unit 24 of the terminal 31.

【００５５】この場合、付随情報記憶部１２には、各利
用者ごとのアクセス受動部２４の電話番号などが付随情
報(利用者への通知の仕方)として予め記憶されている。
また、利用者側の端末３１と銀行などに設置されている
話者認識装置ユニット３２とは、通信手段３３，例えば
通信回線(有線)あるいは無線によって、互いに情報の送
受信がなされるようになっている。In this case, the telephone number or the like of the access passive unit 24 of each user is stored in advance in the accompanying information storage unit 12 as accompanying information (the way of notifying the user). The terminal 31 on the user side and the speaker recognition device unit 32 installed at a bank or the like exchange information with each other through communication means 33, for example a (wired) communication line or a wireless link.

【００５６】なお、図７の例では、１つの端末３１が話
者認識装置ユニット３２に通信手段３３を介して接続さ
れている場合のみが示されているが、話者認識装置ユニ
ット３２には、１つのみならず、複数の端末を送受信可
能に接続することができる。また、図７では、音声入力
手段１，特定用情報入力手段２，アクセス受動部２４が
一体のユニット(端末)として構成されているが、これら
は別々の装置として設置されていても良い。Although the example of FIG. 7 shows only the case in which a single terminal 31 is connected to the speaker recognition device unit 32 via the communication means 33, not just one but a plurality of terminals can be connected to the speaker recognition device unit 32 so that they can transmit and receive. Also, although in FIG. 7 the voice input means 1, the specifying information input means 2, and the access passive unit 24 are configured as a single unit (terminal), they may be installed as separate devices.

【００５７】図７の使用形態例では、標準パターンの新
規登録，変更あるいは更新，話者認識を行なうために、
利用者は、利用者の家庭や会社等に設置されている端末
３１を操作することによって、例えば銀行の窓口などに
設置されている話者認識装置ユニット３２に対し、標準
パターンの新規登録操作，話者認識操作，標準パターン
の変更あるいは更新操作を、前述したと同様にして行な
うことができる。但し、図７の使用形態例では、登録モ
ードにするか認識モードにするかの切替指示は、例え
ば、端末の特定用情報入力手段２から与えることがで
き、端末の特定用情報入力手段２から登録モードにする
か認識モードにするかの指示が通信手段３３を介して伝
送されるとき、話者認識装置ユニット３２側では、この
指示に応じて、切替部８の切替制御を行なうようになっ
ている。また、この話者認識装置ユニット３２に、標準
パターンの自動更新機能が備わっているときには、利用
者は、標準パターンの変更あるいは更新操作を行なうこ
となく、標準パターンは自動更新される。In the usage example of FIG. 7, to newly register, change or update a standard pattern, or to perform speaker recognition, the user operates the terminal 31 installed at the user's home, office, or the like and can thereby perform, on the speaker recognition device unit 32 installed at, for example, a bank counter, the new registration operation, the speaker recognition operation, or the change or update operation of the standard pattern in the same manner as described above. In the usage example of FIG. 7, however, the instruction to switch between the registration mode and the recognition mode can be given from, for example, the specifying information input means 2 of the terminal. When this instruction is transmitted from the specifying information input means 2 of the terminal via the communication means 33, the speaker recognition device unit 32 controls the switching unit 8 accordingly. When the speaker recognition device unit 32 has an automatic standard pattern updating function, the standard pattern is updated automatically without the user performing any change or update operation.

【００５８】図７の使用形態例では、認識モード時に、
話者認識装置ユニット３２の話者認識部７において、入
力された利用者の音声の特徴パターンと正規の利用者Ｄ
の標準パターンとを照合した結果、これらが類似してい
ないときには、話者認識装置ユニット３２のアクセス部
２３は、利用者Ｄの付随情報(例えば電話番号)を、付随
情報記憶部１２から読出し、この利用者Ｄの付随情報
(電話番号)によって利用者Ｄのアクセス受動部２４を呼
出し、例えば、「確認のため、第２の特定用情報を入力
して下さい」などの音声ガイドを流し、アクセス受動部
２４の受話器から利用者Ｄに与える。利用者Ｄが、これ
に応答して、アクセス受動部２４の送話器から例えば、
第２の特定用情報を発声するとき、あるいは、第２の特
定用情報をアクセス受動部２４のキー操作によりプッシ
ュトーン等で通知し、アクセス部２３がこれを受信する
とき、判別部１４は、受信した第２の特定用情報を付随
情報記憶部１２に記憶されている利用者Ｄの第２の特定
用情報と照合する。この結果、これらが一致すると、利
用者に対し利用者Ｄ用のアプリケーションの利用を許可
する。In the usage example of FIG. 7, when, in the recognition mode, the speaker recognition unit 7 of the speaker recognition device unit 32 collates the feature pattern of the input voice of the user with the standard pattern of the authorized user D and finds that they are not similar, the access unit 23 of the speaker recognition device unit 32 reads the accompanying information of the user D (for example, a telephone number) from the accompanying information storage unit 12, calls the access passive unit 24 of the user D using that accompanying information (telephone number), and plays a voice guide such as "For confirmation, please input the second specifying information", which reaches the user D through the receiver of the access passive unit 24. When the user D responds by, for example, uttering the second specifying information into the transmitter of the access passive unit 24, or by sending it as push tones or the like through key operation of the access passive unit 24, and the access unit 23 receives it, the determination unit 14 collates the received second specifying information with the second specifying information of the user D stored in the accompanying information storage unit 12. If they match, the user is permitted to use the applications for the user D.

【００５９】これに対し、利用者から入力された第２の
特定用情報と付随情報記憶部１２に記憶されている利用
者Ｄの第２の特定用情報とが一致しないとき、この利用
者に対し利用者Ｄ用のアプリケーションの利用を禁止す
る。On the other hand, when the second specifying information input by the user does not match the second specifying information of the user D stored in the accompanying information storage unit 12, the user is prohibited from using the applications for the user D.

【００６０】これにより、図５，図６の使用形態例と同
様に、正規の利用者Ｄの音声が例えば風邪などによって
突然変化し、入力された音声の特徴パターンと利用者Ｄ
の標準パターンとが類似しないものとなっても、この正
規の利用者Ｄが第２の特定用情報を正しく入力すること
で、利用者が利用者Ｄであると識別され、この利用者Ｄ
に対するアプリケーションの利用を許可することができ
る。また、利用者Ｄ以外の他人，例えばＥが、利用者Ｄ
の第１の特定用情報を知得しても、利用者Ｄの第２の特
定用情報を知得しない限り、この他人Ｅは、利用者Ｄ用
のアプリケーションを利用することができないので、悪
意のある他人によって正規の利用者用のアプリケーショ
ンが利用されてしまうという事態をも、有効に防止する
ことができる。Thus, as in the usage examples of FIGS. 5 and 6, even if the voice of the authorized user D changes suddenly because of, for example, a cold, so that the feature pattern of the input voice no longer resembles the standard pattern of the user D, the user is identified as the user D once the authorized user D correctly inputs the second specifying information, and the user D can be permitted to use the application. Moreover, even if a person other than the user D, for example E, learns the first specifying information of the user D, this other person E cannot use the applications for the user D unless he or she also learns the second specifying information of the user D. A situation in which a malicious third party uses an application intended for an authorized user can therefore also be effectively prevented.

【００６１】なお、図７の使用形態例においては、話者
認識装置ユニット３０の管理者から利用者Ｄ本人に直接
確認のための電話等を行ない、利用者Ｄ本人に直接問い
合せることもできるので、このときには、利用者Ｄから
第２の特定用情報を入力させずとも、利用者が利用者Ｄ
本人であるか否かを直接確認することができる。従っ
て、この場合には、判別部１４は設けずとも良い。但
し、この場合であっても、さらに、利用者から第２の特
定用情報を入力させることもでき、このときには、判別
部１４は設ける必要がある。また、この場合、話者認識
ユニット３０側にも付随情報記憶部１２を設けることが
できる。In the usage example of FIG. 7, the administrator of the speaker recognition device unit 30 can also place a confirmation call or the like directly to the user D and inquire of the user D directly, so in that case whether or not the user is the user D himself or herself can be confirmed directly without having the user D input the second specifying information. The determination unit 14 therefore need not be provided in that case. Even so, the user can additionally be made to input the second specifying information, in which case the determination unit 14 must be provided. In that case, an accompanying information storage unit 12 can also be provided on the speaker recognition unit 30 side.

【００６２】また、図８の使用形態例は、図７の使用形
態例において、アクセス受動部２４が例えばオペレーシ
ョンセンタ８０に設置されたものとなっており、この場
合の操作，動作については、図６の使用形態例とほぼ同
様になされる。In the usage example of FIG. 8, the access passive unit 24 of the usage example of FIG. 7 is installed in, for example, the operation center 80, and the operation and behavior in this case are substantially the same as in the usage example of FIG. 6.

【００６３】また、例えば図７(あるいは図８)の使用形
態例において、音声入力手段１，特定用情報入力手段
２，アクセス受動部２４を例えば、図９に示すように、
１つの電話装置(あるいはパソコン通信装置)３５として
共用することもできる。すなわち、この電話装置(ある
いはパソコン通信装置)３５としては、利用者の家庭や
会社等にある既存のもの(例えばプッシュホン電話器)を
用いることができ、この場合、電話装置３５のハンドセ
ットの送話器を音声入力手段１として用い、また、ハン
ドセットの受話器をアクセス受動部２４において例えば
音声ガイドの受信部として用い、また、電話装置３５の
操作部(テンキー部)を特定用情報入力手段２として用い
ることができる。また、アクセス受動部２４において、
確認の発信を例えば音声メッセージで行なうようになっ
ている場合、上記ハンドセットの送話器をアクセス受動
部２４の確認発信部として用いることができ、また、ア
クセス受動部２４において第２の特定用情報の発信をプ
ッシュトーンで行なうようになっている場合、電話装置
３５の操作部(テンキー部)をアクセス受動部２４の確認
発信部としても用いることができる。In the usage example of FIG. 7 (or FIG. 8), the voice input means 1, the specifying information input means 2, and the access passive unit 24 can also be shared in a single telephone device (or personal computer communication device) 35, as shown for example in FIG. 9. That is, an existing device at the user's home, office, or the like (for example, a push-button telephone) can be used as the telephone device (or personal computer communication device) 35. In this case, the transmitter of the handset of the telephone device 35 can be used as the voice input means 1, the receiver of the handset can be used in the access passive unit 24 as, for example, the receiving part for the voice guide, and the operation part (numeric keypad) of the telephone device 35 can be used as the specifying information input means 2. Further, when the access passive unit 24 sends the confirmation as, for example, a voice message, the transmitter of the handset can be used as the confirmation sending part of the access passive unit 24, and when the access passive unit 24 sends the second specifying information as push tones, the operation part (numeric keypad) of the telephone device 35 can also be used as the confirmation sending part of the access passive unit 24.

【００６４】このように、例えば図７の使用形態例にお
いて、音声入力手段１，特定用情報入力手段２，アクセ
ス受動部２４は、１つの電話装置(あるいはパソコン通
信装置)３５で実現することが可能であり、この場合、
利用者は、別途、話者認識用の装置(音声入力手段１，
特定用情報入力手段２)を用意せずに済む。Thus, in the usage example of FIG. 7, for example, the voice input means 1, the specifying information input means 2, and the access passive unit 24 can be implemented in a single telephone device (or personal computer communication device) 35, in which case the user does not need to prepare separate devices for speaker recognition (the voice input means 1 and the specifying information input means 2).

【００６５】なお、音声入力手段１，アクセス受動部２
４をこのように１つの電話装置(あるいはパソコン通信
装置)３５で実現する場合、利用者が話者認識を行なう
ときには、この電話装置３５のハンドセットが持ち上げ
られ、この電話装置３５は、通話状態となっていること
から、話者の確認を行なうためアクセス部２３がアクセ
ス受動部２４をアクセスするとき、利用者が正規の利用
者(話者本人)である場合には、利用者先のアクセス受動
部すなわち電話装置３５は、通話中となっている。When the voice input means 1 and the access passive unit 24 are implemented in a single telephone device (or personal computer communication device) 35 in this way, the handset of the telephone device 35 is off the hook and the telephone device 35 is in a call state while the user performs speaker recognition. Consequently, when the access unit 23 accesses the access passive unit 24 to confirm the speaker, the access passive unit at the user's end, that is, the telephone device 35, is busy if the user is the authorized user (the speaker himself or herself).

【００６６】このことに着目し、アクセス部２３がアク
セス受動部２４をアクセスしたときに通話中である場合
に、いま話者認識を行なっている利用者が正規の話者本
人であると判定し、確認を行なうこともできる。Taking advantage of this, confirmation can also be performed by determining that the user currently performing speaker recognition is the authorized speaker himself or herself if the line is busy when the access unit 23 accesses the access passive unit 24.
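The busy-line check of paragraphs [0065] and [0066] can be sketched as a single test on the result of the callback attempt. The `dial` callable and the line-state strings (`"busy"`, `"ringing"`) are hypothetical placeholders for whatever the telephone interface of the access unit 23 actually reports.

```python
from typing import Callable


def confirm_by_busy_line(dial: Callable[[str], str], registered_number: str) -> bool:
    """Call the registered number and treat a busy line as confirmation.

    `dial` is an assumed interface that returns the observed line state
    as a string, such as "busy" or "ringing".
    """
    # A busy line suggests the registered telephone is the one currently
    # being used for speaker recognition.
    return dial(registered_number) == "busy"
```

As the text itself presents this only as one possible confirmation method, a deployment would presumably combine it with the other checks described above, since a line can be busy for unrelated reasons.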

【００６７】また、図７，図８の構成例では、アクセス
部２３，アクセス受動部２４が設けられているが、これ
らを設けずに、確認手段１１を実現することも可能であ
る。Further, in the configuration examples of FIGS. 7 and 8, the access unit 23 and the access passive unit 24 are provided, but it is also possible to realize the confirmation means 11 without providing them.

【００６８】すなわち、話者識別を行なうために、利用
者が自己の端末(例えば電話装置あるいはパソコン通信
装置)によって、例えば銀行等に設置されている話者認
識装置ユニットをアクセスするのに必要な電話番号を入
力し、この電話番号が自己の端末からデジタル信号で送
出されるとき、銀行等に設置されている話者認識装置ユ
ニットでは、利用者端末からデジタル信号で送出された
電話番号を例えば表示するように構成することもでき
る。That is, when the user, in order to perform speaker identification, uses his or her own terminal (for example, a telephone device or a personal computer communication device) to enter the telephone number needed to access the speaker recognition device unit installed at, for example, a bank, and the telephone number is sent from the terminal as a digital signal, the speaker recognition device unit installed at the bank or the like can be configured to, for example, display the telephone number sent from the user terminal as a digital signal.

【００６９】この場合、利用者が、銀行等に設置されて
いる話者認識装置ユニットをアクセスした後、端末の特
定用情報入力手段２から特定用情報を入力し、また、音
声入力手段１から音声を発生し、音声入力手段１から入
力された利用者の音声の特徴パターンと利用者Ｄの標準
パターンとの照合を行なわせた結果、これらが類似して
いないと判別されたときには、この時点で、話者認識装
置ユニット側のオペレータ(例えば銀行等の係員)は、上
記のように表示されている電話番号と上記のように入力
された特定用情報に対応させて付随情報記憶部１２に予
め登録されている正規の利用者の電話番号とを照合し、
この結果、一致したときには、利用者が正規の利用者で
あると確認することができる。これに対し、一致しない
ときには、利用者が正規の利用者ではないと判断するこ
とができる。In this case, after accessing the speaker recognition device unit installed at the bank or the like, the user inputs the specifying information from the specifying information input means 2 of the terminal and utters a voice into the voice input means 1, and the feature pattern of the user's voice input from the voice input means 1 is collated with the standard pattern of the user D. If they are determined not to be similar, an operator on the speaker recognition device unit side (for example, a clerk at the bank) at that point compares the telephone number displayed as described above with the telephone number of the authorized user registered in advance in the accompanying information storage unit 12 in association with the specifying information input as described above. If the numbers match, the user can be confirmed to be the authorized user; if they do not match, the user can be judged not to be the authorized user.
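The comparison the operator performs in paragraph [0069] could also be automated. The sketch below compares the displayed number with the registered one after stripping non-digit characters; that normalization step, the function names, and the `registry` dictionary are assumptions of this sketch and are not described in the patent.

```python
import re
from typing import Dict


def normalize(number: str) -> str:
    """Keep digits only so that formatting differences do not matter (assumed)."""
    return re.sub(r"\D", "", number)


def caller_matches(displayed_number: str, first_info: str,
                   registry: Dict[str, str]) -> bool:
    """Compare the displayed telephone number with the number registered for
    the user identified by the first specifying information."""
    registered = registry.get(first_info)
    if registered is None:
        return False  # no telephone number registered for this user
    return normalize(displayed_number) == normalize(registered)
```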

【００７０】このように、銀行等の話者認識装置ユニッ
トから利用者のアクセス受動部２４にアクセスせずと
も、確認を行なうことも可能である。As described above, the confirmation can be performed without accessing the access passive unit 24 of the user from the speaker recognition device unit such as a bank.

【００７１】また、上述の各構成例において、利用者が
正規の利用者Ｄではなく、利用者Ｄ以外の他人であると
確認されたときに、さらに、この他人が誰であったかが
履歴として残れば、より都合良い。話者認識(すなわ
ち、話者識別)を行なうための音声特徴パターンには、
利用者の声の情報が含まれていることからこれを履歴と
して保存することもできるが、通常、音声特徴パターン
は、元の音声信号に対し、データ量が圧縮されているた
め、これに基づいて誰であるかを判定することは難かし
い。In each of the configuration examples described above, when the user is confirmed to be not the authorized user D but someone else, it is even more convenient if a record of who that other person was is kept as a history. The voice feature pattern used for speaker recognition (that is, speaker identification) contains information about the user's voice and could itself be saved as a history, but because a voice feature pattern is normally compressed in data volume relative to the original voice signal, it is difficult to determine who the speaker was on that basis.

【００７２】そこで、確認手段１１による確認の結果、
正規の利用者でないと確認された場合、現話者の音声標
準パターンではなく、現話者の元の音声を再生可能に保
存するようにすることができる。Then, as a result of the confirmation by the confirmation means 11,
If it is confirmed that the user is not a legitimate user, the original voice of the current speaker can be reproducibly stored instead of the voice standard pattern of the current speaker.

【００７３】FIG. 10 is a diagram showing a configuration example of a speaker identification system having a function of storing the voice of the current speaker in reproducible form. Referring to FIG. 10, this speaker identification system is further provided with a voice storage means (memory) 50 that, in the recognition mode, reproducibly stores the voice signal input from the voice input means 1 or the voice signal after voice section detection (the voice signal within the voice section). When the confirmation means 11 confirms that the current speaker is the authorized speaker, the voice signal stored in the voice storage means 50 is erased, for example under control of the confirmation means 11. When the current speaker is judged not to be the authorized speaker, the voice signal stored in the voice storage means 50 is saved as a history.

【００７４】In a speaker identification system configured in this way, when the user speaks in the recognition mode, the voice signal input from the voice input means 1 is stored in the voice storage means 50. The confirmation means 11 then checks, in any of the ways described above, whether the current speaker is the authorized speaker. If the current speaker is finally judged not to be the authorized speaker, the voice signal just stored in the voice storage means 50 is saved as a history, and playing this voice back later makes it possible to work out who attempted to use the system by impersonating the authorized speaker.
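The store-then-settle behaviour of the voice storage means 50 can be sketched as follows. This is only an illustrative sketch: the class name `VoiceStore`, its methods, and the use of byte strings for voice signals are assumptions and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VoiceStore:
    """Hypothetical stand-in for voice storage means 50."""
    pending: Optional[bytes] = None                      # signal buffered for the current attempt
    history: List[bytes] = field(default_factory=list)   # recordings retained as a history

    def record(self, signal: bytes) -> None:
        # Recognition mode: buffer the input voice before any decision is made.
        self.pending = signal

    def settle(self, is_authorized: bool) -> None:
        # Erase on a confirmed match; otherwise move the recording into the history.
        if not is_authorized and self.pending is not None:
            self.history.append(self.pending)
        self.pending = None


store = VoiceStore()
store.record(b"attempt-1")
store.settle(is_authorized=True)    # confirmed speaker: the recording is erased
store.record(b"attempt-2")
store.settle(is_authorized=False)   # rejected speaker: the recording is retained
```

After these two attempts only the rejected speaker's recording remains in `store.history`, which is the recording an operator would later play back.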

【００７５】In this configuration example, the voice signal from the voice input means 1 may be stored directly in the voice storage means 50. To save capacity in the voice storage means 50, however, it is better to store the voice signal after voice section detection (the voice signal within the voice section). The amount of voice data depends on whether the stored signal is encoded as PCM or ADPCM and on how much bandwidth is retained, but the voice storage means 50 should store the speaker's voice at the best sound quality possible.
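As a rough illustration of how the coding choice and the retained bandwidth set the data volume, the sketch below computes the storage needed for a constant-bit-rate mono recording. The patent gives no figures; the 3-second utterance, the 8 kHz and 16 kHz sampling rates, 16-bit linear PCM, and 4-bit ADPCM are all assumed example values.

```python
def stored_bytes(seconds: float, sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bytes needed for a mono recording at a constant bit rate."""
    return int(seconds * sample_rate_hz * bits_per_sample / 8)


# Assumed example: a 3-second utterance inside the detected voice section.
pcm_narrow = stored_bytes(3.0, 8000, 16)    # 16-bit PCM, 8 kHz sampling   -> 48000 bytes
adpcm_narrow = stored_bytes(3.0, 8000, 4)   # 4-bit ADPCM, 8 kHz sampling  -> 12000 bytes
pcm_wide = stored_bytes(3.0, 16000, 16)     # 16-bit PCM, 16 kHz sampling  -> 96000 bytes
```

Under these assumptions ADPCM needs a quarter of the PCM storage, and doubling the sampling rate (and so the retained band) doubles it, which is the trade-off against sound quality that the paragraph describes.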

【００７６】In the example above, the voice signal accumulated in the voice storage means 50 is erased when the user is confirmed to be the authorized speaker, in order to save memory. Alternatively, the voice signal may be left in the voice storage means 50 even when the user is confirmed to be the authorized speaker, and overwritten, for example, the next time the authorized speaker uses the system. In that case, even if the device wrongly judges someone to be the authorized speaker, the voice signal accumulated in the voice storage means 50 makes it possible to work out who used the system in the authorized speaker's place.
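The overwrite-on-next-use variant can be sketched as below. The class `LastUseRecorder` and its method names are hypothetical and chosen only for illustration.

```python
from typing import Optional


class LastUseRecorder:
    """Hypothetical variant of voice storage means 50 that always keeps the latest recording."""

    def __init__(self) -> None:
        self.last_recording: Optional[bytes] = None

    def record(self, signal: bytes) -> None:
        # Each new attempt overwrites the previous one, so at most one recording is held.
        self.last_recording = signal

    def settle(self, is_authorized: bool) -> None:
        # Nothing is erased here, even on acceptance: a false accept stays
        # reviewable until the next use overwrites it.
        pass


recorder = LastUseRecorder()
recorder.record(b"impostor accepted by mistake")
recorder.settle(is_authorized=True)
kept_after_accept = recorder.last_recording   # still available for later playback
recorder.record(b"next use by the authorized speaker")
```

Memory use stays bounded at one recording, at the cost of keeping only the most recent use.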

【００７７】The configuration example of FIG. 10 saves the user's voice as a history, but the user's image can be kept as a history instead. That is, when the confirmation by the confirmation means 11 shows that the user is not an authorized user, the user's image can be saved.

【００７８】FIG. 11 is a diagram showing a configuration example of a speaker identification system having a function of saving the user's image. Referring to FIG. 11, this speaker identification system is further provided with an imaging means (for example, a camera) 52 that captures an image of the user, an A/D conversion unit 53 that A/D-converts the video signal from the imaging means 52, and a video storage means 54 that stores the video signal digitized by the A/D conversion unit 53. When the confirmation means 11 confirms that the current speaker is the authorized speaker, the video signal stored in the video storage means 54 is erased, for example under control of the confirmation means 11. When the current speaker is judged not to be the authorized speaker, the video signal stored in the video storage means 54 is saved as a history.

【００７９】In a speaker identification system configured in this way, when the user performs an operation for recognition, the video signal from the imaging means 52 is stored in the video storage means 54. The confirmation means 11 then checks, in any of the ways described above, whether the current speaker is the authorized speaker. If the current speaker is judged not to be the authorized speaker, the video signal just stored in the video storage means 54 is saved as a history, and playing this video back later makes it possible to work out who attempted to use the system by impersonating the authorized speaker.

【００８０】In the example above, the video signal accumulated in the video storage means 54 is erased when the user is confirmed to be the authorized speaker, in order to save memory. Alternatively, the video signal may be left in the video storage means 54 even when the user is confirmed to be the authorized speaker, and overwritten, for example, the next time the authorized speaker uses the system. In that case, even if the device wrongly judges someone to be the authorized speaker, the video signal accumulated in the video storage means 54 makes it possible to work out who used the system in the authorized speaker's place.

【００８１】In this configuration example, the imaging means 52 may capture either moving images or still images, and the image of the previous user can be viewed when necessary by viewing the image saved in the video storage means 54.

【００８２】Storing the user's voice or image in reproducible form in this way makes it possible to learn later who the other person was. The configuration examples of FIGS. 10 and 11 each keep either the voice or the image as a history, but FIG. 10 and FIG. 11 can also be combined so that both the voice and the image are kept as a history.

【００８３】In each of the above configuration examples, because changes in the user's voice due to a cold or the like do not occur often, a user who is asked to input the second identification information may not remember it accurately. To allow for this, when the feature pattern of the user's voice input from the voice input means 1 is judged not to be similar to the standard voice pattern of the user (for example, user D) corresponding to the identification information (first identification information) input from the identification information input means 2, the user can instead be presented with a plurality of items of dummy identification information that include the correct second identification information, and asked to select one of them.

【００８４】FIG. 12 is a diagram showing a configuration example of a speaker identification system having a function of presenting the user with a plurality of items of dummy identification information that include the correct second identification information and having the user select one of them. In the configuration example of FIG. 12, a dummy identification information generation unit 29 is further provided. When the speaker recognition unit 7 collates the feature pattern of the input voice with the standard pattern and judges them not to be similar, this unit retrieves the second identification information corresponding to the first identification information from the accompanying information storage unit, generates a plurality of (for example, about 20) items of dummy identification information that include the retrieved second identification information, and displays them, for example, on the display device 16.

【００８５】In this configuration, the user is asked to select one of the items of dummy identification information displayed, for example, on the display device 16. When the user selects one item, the determination unit 14 collates the selected identification information with the second identification information of user D stored in the accompanying information storage unit 12. If they match, the user is confirmed to be the authorized user and is permitted to use the application. A user who does not remember the second identification information accurately can thus find the correct second identification information among the presented items and select it.
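The check performed by the determination unit 14 can be sketched as a fallback that is consulted only when the voice collation fails. The function name `confirm_user` and its parameters are hypothetical, and the constant-time comparison is a design choice of this sketch, not something the patent specifies.

```python
import hmac


def confirm_user(voice_matches: bool, selected: str, registered_second_id: str) -> bool:
    """Accept on a voice match; otherwise fall back to the selected second identification information."""
    if voice_matches:
        return True
    # Compare the selected item with the registered one in constant time.
    return hmac.compare_digest(selected.encode(), registered_second_id.encode())


accepted = confirm_user(voice_matches=False, selected="4821", registered_second_id="4821")
rejected = confirm_user(voice_matches=False, selected="1357", registered_second_id="4821")
```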

【００８６】If, for example, about 20 items of dummy identification information are displayed, the probability that another person selects the correct second identification information is only about 5% (1 in 20), so in practice a third party is effectively prevented from selecting the second identification information. It is desirable that the dummy identification information generation unit 29 generate the presented items according to a fixed rule, so that the same items are always used. If the dummy identification information other than the correct second identification information were generated at random, those items would change every time. Another person could then, for example, have the dummy identification information displayed twice and recognize the item common to both displays as the correct second identification information.
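The difference between fixed and randomly regenerated dummy items can be sketched as follows. Everything specific here is an assumption made for illustration: the 4-digit code format, the candidate pool, the key `SERVER_SECRET`, and the HMAC-seeded generator standing in for the "fixed rule" of the dummy identification information generation unit 29, which the patent does not define.

```python
import hashlib
import hmac
import random
from typing import List

POOL = [f"{n:04d}" for n in range(10000)]   # assumed format: 4-digit codes
SERVER_SECRET = b"demo-only-secret"          # assumed key, kept on the device side


def presented_choices(first_id: str, correct: str, size: int = 20, fixed: bool = True) -> List[str]:
    """Return `size` items containing `correct`; the same list on every call when `fixed` is True."""
    if fixed:
        # Seed derived from the first identification information, so the list never changes.
        digest = hmac.new(SERVER_SECRET, first_id.encode(), hashlib.sha256).digest()
        rng = random.Random(int.from_bytes(digest[:8], "big"))
    else:
        rng = random.Random()                # fresh dummies on every display (the weak variant)
    decoys = rng.sample([c for c in POOL if c != correct], size - 1)
    choices = decoys + [correct]
    rng.shuffle(choices)
    return choices


fixed_a = presented_choices("user-D", "4821")
fixed_b = presented_choices("user-D", "4821")
random_a = presented_choices("user-D", "4821", fixed=False)
random_b = presented_choices("user-D", "4821", fixed=False)

common_fixed = set(fixed_a) & set(fixed_b)      # all 20 items: a second display reveals nothing
common_random = set(random_a) & set(random_b)   # always contains "4821", and usually little else
```

With the fixed rule the two displays are identical, so comparing them gives an observer no information. With random dummies the overlap shrinks to the correct item plus any chance collisions.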

【００８７】Thus, with the configuration example of FIG. 12, the system can be used correctly even when the user's voice has changed because of a cold or the like and the user does not remember the second identification information accurately.

【００８８】In each of the above configuration examples, the accompanying information storage unit 12 is provided separately from the speaker recognition information storage unit 5, but the function of the accompanying information storage unit 12 can also be given to the speaker recognition information storage unit 5, as shown for example in FIG. 13. In this case, the notification unit 13 and the determination unit 14 can read out and use the method of notifying the user, the second identification information, and so on from the speaker recognition information storage unit 5.

【００８９】In the above configuration examples, the feature extraction unit 4 is provided after the voice section detection unit 3, but it may instead be provided before the voice section detection unit 3.

【００９０】Furthermore, in the configuration examples of FIGS. 7 and 8, the voice section detection unit 3 and the feature extraction unit 4 are provided on the terminal side, but one or both of them can instead be provided on the side of the speaker recognition device unit installed at a bank or the like.

【００９１】Likewise, in the configuration examples of FIGS. 7 and 8, the speaker recognition unit 7 is provided on the speaker recognition device unit side, but it can instead be provided on the terminal side.

【００９２】[0092]

[Effects of the Invention] As described above, according to the inventions of claims 1 to 11, when the feature of the speaker's voice input from the voice input means is judged not to be similar to the voice feature that, among the speaker voice features stored in the speaker identification information storage means, corresponds to the identification information input from the identification information input means, confirmation is obtained from the user. Speaker identification can therefore be performed even if the user's voice changes suddenly because of a cold or the like.

[Brief description of the drawings]

FIG. 1 is a diagram showing a configuration example of a speaker identification system according to the present invention.

FIG. 2 is a diagram showing a configuration example of a speaker recognition information storage unit.

FIG. 3 is a diagram showing a configuration example of the confirmation means.

FIG. 4 is a diagram showing a configuration example of an accompanying information storage unit.

FIG. 5 is a diagram showing an example of a usage pattern of the speaker identification system of the present invention.

FIG. 6 is a diagram showing an example of a usage pattern of the speaker identification system of the present invention.

FIG. 7 is a diagram showing an example of a usage pattern of the speaker identification system of the present invention.

FIG. 8 is a diagram showing an example of a usage pattern of the speaker identification system of the present invention.

FIG. 9 is a diagram showing an example of a usage pattern of the speaker identification system of the present invention.

FIG. 10 is a diagram showing a configuration example of a speaker identification system having a function of storing the voice of the current speaker in reproducible form.

FIG. 11 is a diagram showing a configuration example of a speaker identification system having a function of saving the user's image.

FIG. 12 is a diagram showing another configuration example of the speaker identification system according to the present invention.

FIG. 13 is a diagram showing another configuration example of the speaker recognition information storage unit.

[Explanation of symbols]

1 Voice input means
2 Instruction means
3 Voice section detection unit
4 Feature extraction unit
5 Speaker recognition information storage unit
6 Registration unit
7 Speaker recognition unit
8 Switching unit
11 Confirmation means
12 Accompanying information storage unit
13 Notification unit
14 Determination unit
16 Display device
23 Access unit
24 Access passive unit
30 Speaker recognition device unit
31 Terminal
32 Speaker recognition device unit
33 Communication means
35 Telephone device (or personal computer communication device)
50 Voice storage means
52 Imaging means
53 A/D conversion unit
54 Video storage means
80 Operation center

Claims

1. A speaker identification system comprising: speaker identification information storage means for storing speaker identification information; identification information input means for inputting identification information for identifying a user; voice input means for inputting a speaker's voice; collating means for collating whether the feature of the speaker's voice input from the voice input means is similar to the voice feature that, among the speaker voice features stored in the speaker identification information storage means, corresponds to the identification information input from the identification information input means; and confirmation means for obtaining confirmation from the user when the collation determines that they are not similar.

2. The speaker identification system according to claim 1, wherein the confirmation means includes accompanying information storage means for storing accompanying information in association with identification information for identifying a user, and, when the collation determines that they are not similar, notifies the user in accordance with the accompanying information stored in the accompanying information storage means in association with the identification information input from the identification information input means, and confirms with the user whether the user is an authorized user.

3. The speaker identification system according to claim 2, wherein the accompanying information storage means further stores, as accompanying information, second identification information for identifying a user; when the collation determines that they are not similar, the confirmation means instructs the user to input the second identification information; and when the user inputs the second identification information, the confirmation means collates the second identification information input by the user with the second identification information stored in the accompanying information storage means in association with the identification information input from the identification information input means, and thereby confirms whether the user is an authorized user.

4. The speaker identification system according to claim 1 or 2, wherein, when the collation determines that they are not similar, the confirmation means presents the user with a plurality of items of dummy identification information including the correct second identification information and has the user select one of them.

5. The speaker identification system according to claim 2, wherein the accompanying information storage means stores the user's telephone number as accompanying information, and the confirmation means places a telephone call in accordance with the user's telephone number.

6. The speaker identification system according to claim 5, wherein, when the confirmation means places a telephone call to the user, the user is determined to be an authorized user if the user's line is busy.

7. The speaker identification system according to claim 1, further comprising voice storage means for reproducibly storing the voice of the current user when the confirmation does not establish that the user is an authorized user.

8. The speaker identification system according to claim 1 or 2, further comprising video storage means for reproducibly storing the image of the current user when the confirmation shows that the user is not an authorized user.

9. A speaker identification method comprising: when identification information for identifying a user is input and a speaker's voice is input, collating whether the feature of the input speaker's voice is similar to a voice feature stored in advance in association with the input identification information; and obtaining confirmation from the user when the collation determines that they are not similar.

10. The speaker identification method according to claim 9, wherein, when the collation determines that they are not similar, the user is instructed to input second identification information, and when the user inputs the second identification information, the second identification information input by the user is collated with the second identification information stored in advance in association with the identification information, thereby confirming whether the user is an authorized user.

11. The speaker identification method according to claim 9, wherein, when the collation determines that they are not similar, the user is presented with a plurality of items of dummy identification information including the correct second identification information and is asked to select one of them.