JPH04295898A

JPH04295898A - Speaker collating system

Info

Publication number: JPH04295898A
Application number: JP3061777A
Authority: JP
Inventors: Shingo Nishimura; 新吾西村
Original assignee: Sekisui Chemical Co Ltd
Current assignee: Sekisui Chemical Co Ltd
Priority date: 1991-03-26
Filing date: 1991-03-26
Publication date: 1992-10-20

Abstract

PURPOSE: To prevent only the registrants' categories from growing unnecessarily large, and to improve the verification rate, by newly generating non-registrant patterns by calculation and learning them at the time of additional learning. CONSTITUTION: In this speaker verification method, when a speaker is verified from input speech using a neural network 23 and only registrant patterns are additionally learned, the additional learning patterns of non-registrants are generated by calculation rather than extracted from actual speech samples. Candidate points for non-registrant patterns are set using random numbers in a multidimensional space obtained from the result of a principal component analysis performed on the registrant patterns, and whether each candidate point is adopted as an actual non-registrant pattern is decided by whether the distance between the candidate point and each registrant's category falls within a certain range.

Description

[Detailed description of the invention]

[0001]

[Field of Industrial Application] The present invention relates to a speaker verification method suitable for verifying a speaker from input speech at online terminals for electric locks, IC cards, and the like.

[0002]

[Prior Art] In general, patterns extracted from speech samples change over time. The verification rate is therefore high for speech samples taken immediately after learning, but it is known to deteriorate as the time gap between the speech samples used for learning and the input speech sample grows.

[0003] The present applicant therefore proposed, in Japanese Patent Application No. Hei 2-120865, a method for speaker verification using a neural network in which patterns extracted from more recent speech samples are added to the learning patterns and the already trained neural network is trained again, thereby preventing deterioration of the verification rate.

[0004] However, when additional learning is performed in speaker verification using a neural network, registrants generally use the system many times over a long period, so speech samples for additional learning are available for them. New speech samples from non-registrants, by contrast, are difficult to obtain, so in Japanese Patent Application No. Hei 2-120865 the speech samples used for the initial learning were learned repeatedly.

[0005] The present applicant has also already proposed, in Japanese Patent Application No. Hei 2-243407, a method for speaker verification using a neural network in which the learning patterns of non-registrants are created by calculation rather than extracted from actual speech samples.

[0006]

[Problems to be Solved by the Invention] In the prior art, however, as shown in FIG. 3, only the registrants' speech samples are renewed, so the category occupied by a registrant in the pattern space may grow unnecessarily large and overlap the non-registrant category (the hatched area in FIG. 3). As a result, a non-registrant's speech sample was sometimes falsely accepted as a registrant's. Preventing this requires preparing non-registrant speech samples for additional learning, but when, for example, the system is used in an ordinary household (with the family members as registrants), preparing speech samples of people outside the family is difficult in practice. Moreover, even if such samples could be prepared, there was no guarantee that they would prevent the unnecessary expansion of the registrants' categories in the pattern space.

[0007] An object of the present invention is to prevent only the registrants' categories from growing unnecessarily large, and thereby to improve the verification rate, by newly creating non-registrant patterns by calculation and learning them when additional learning is performed.

[0008]

[Means for Solving the Problems] The invention of claim 1 is a speaker verification method in which, when a speaker is verified from input speech using a neural network and only registrant patterns are additionally learned, the additional learning patterns of non-registrants are created by calculation rather than extracted from actual speech samples. Candidate points for non-registrant patterns are set using random numbers in a multidimensional space obtained from the result of an analysis, such as principal component analysis, performed on the registrant patterns, and whether each candidate point is adopted as an actual non-registrant pattern is determined by whether the distance between the candidate point and each registrant's category falls within a certain range.

[0009] The invention of claim 2 is the invention of claim 1 in which, further, the temporal change in the frequency characteristics of speech is used as the input to the neural network.

[0010] The invention of claim 3 is a speaker verification method in which, when a speaker is verified from input speech using a neural network and only registrant patterns are additionally learned, the additional learning patterns of non-registrants are created by calculation rather than extracted from actual speech samples. Candidate points for non-registrant patterns are set at the lattice points of a mesh in a multidimensional space obtained from the result of an analysis, such as principal component analysis, performed on the registrant patterns, and whether each candidate point is adopted as an actual non-registrant pattern is determined by whether the distance between the candidate point and each registrant's category falls within a certain range.

[0011] The invention of claim 4 is the invention of claim 3 in which, further, the temporal change in the frequency characteristics of speech is used as the input to the neural network.

[0012] In practicing the present invention, the patterns used for additional learning of registrants and non-registrants may consist of the new patterns only, may be formed by simply adding the new patterns to the old ones, or may be formed by deleting as many old patterns as new patterns are added. The same applies to the speech samples used for the principal component analysis of the registrants.
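The three options for maintaining the additional-learning pattern set can be sketched as follows. This is a minimal illustration and not part of the specification: the function name `update_patterns` and the policy labels are hypothetical, and the list of old patterns is assumed to be ordered oldest first.

```python
def update_patterns(old, new, policy):
    """Return the pattern set to use for additional learning.

    policy (hypothetical labels):
      "new_only" - use only the new patterns
      "append"   - simply add the new patterns to the old ones
      "replace"  - add the new patterns and delete as many old ones
    """
    if policy == "new_only":
        return list(new)
    if policy == "append":
        return list(old) + list(new)
    if policy == "replace":
        # Assumes `old` is ordered oldest first, so the oldest are dropped.
        kept = list(old)[len(new):]
        return kept + list(new)
    raise ValueError(f"unknown policy: {policy}")
```

With the "replace" policy the size of the pattern set stays constant as long as there are at least as many old patterns as new ones, which keeps retraining time roughly fixed from month to month.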

[0013]

[Operation] The inventions of claims 1 and 3 each provide the following effects ① and ②.

[0014] ① Newly adding patterns created by calculation to the non-registrant learning patterns prevents only the registrants' categories from growing unnecessarily large in the pattern space, and can improve the verification rate.

[0015] ② Speech samples for additional learning of non-registrants are no longer needed.

[0016] The inventions of claims 2 and 4 each provide the following effect ③.

[0017] ③ Because "the temporal change in the frequency characteristics of speech" is used as the input to the neural network, the preprocessing needed to obtain the input is simpler than conventional complex feature extraction and takes less time. Speaker verification can therefore easily be performed in real time without a complex processing device.

[0018]

[Embodiments] FIG. 1 is a schematic diagram showing a speaker verification system according to a first embodiment of the present invention, FIG. 2 is a schematic diagram showing a speaker verification system according to a second embodiment of the present invention, and FIG. 3 is a conceptual diagram of the pattern space.

[0019] (First Embodiment) (see FIG. 1) As shown in FIG. 1, the speaker verification system 10 comprises a learning system and a verification system.

[0020] The learning system comprises a speech input unit 11, a preprocessing unit 12, a principal component analysis unit 13, a random number generation unit 14, a distance calculation unit 15, a pattern selection unit 16, an inverse transformation unit 17, and a learning pattern storage unit 18.

[0021] The verification system comprises a speech input unit 21, a preprocessing unit 22, a neural network 23, and a judgment unit 24.

[0022] The procedures for creating registrant learning patterns, creating non-registrant learning patterns, learning, and verification are described below.

[0023] (A) Creation of registrant learning patterns: ① The registrant's speech is captured by the speech input unit 11 and preprocessed by the preprocessing unit 12.

[0024] This preprocessing can, for example, divide the input utterance "tadaima" into four blocks of equal duration and pass it through a multi-channel band-pass filter bank, yielding the frequency characteristics of each block, that is, of each fixed time interval. The band-pass filter outputs are averaged over each block by an averaging circuit.
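The preprocessing described here can be approximated in software as follows. This is a rough sketch under stated assumptions, not the circuit of the specification: a discrete Fourier transform over each block stands in for the band-pass filter bank and averaging circuit, the function names are hypothetical, and the choice of 16 bands is an assumption made only so that 4 blocks × 16 bands gives the 64-dimensional feature vector reported in the experiment.

```python
import cmath
import math


def band_averages(block, n_bands):
    """Average DFT magnitude of one time block in n_bands equal-width bands."""
    n = len(block)
    half = n // 2  # bins below the Nyquist frequency
    mags = []
    for k in range(half):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(block))
        mags.append(abs(acc))
    width = half // n_bands
    if width == 0:
        raise ValueError("block too short for the requested number of bands")
    # The DFT integrates over the whole block; bins are then pooled per band.
    return [sum(mags[b * width:(b + 1) * width]) / width
            for b in range(n_bands)]


def preprocess(signal, n_blocks=4, n_bands=16):
    """Split the utterance into equal time blocks and concatenate band averages."""
    size = len(signal) // n_blocks
    features = []
    for i in range(n_blocks):
        features.extend(band_averages(signal[i * size:(i + 1) * size], n_bands))
    return features


# Synthetic stand-in for a sampled utterance: a pure tone at DFT bin 8.
signal = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
features = preprocess(signal)
print(len(features))  # 64
```

The resulting vector lists the band values of the first block, then the second, and so on, so it encodes how the spectrum changes over time.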

[0025] ② The result of this preprocessing is stored in the learning pattern storage unit 18 as the registrant's learning pattern.

[0026] (B) Creation of non-registrant learning patterns: Non-registrant learning patterns are created according to the arrangement of registrant patterns and non-registrant patterns in the feature space. Because the dimension of the feature space (the input dimension) is generally high, the work is done in a feature space whose dimension has been reduced by principal component analysis or another transformation. In this embodiment, the non-registrant patterns were created by principal component analysis using the principal component analysis unit 13.

[0027] ① The principal component analysis unit 13 applies principal component analysis to the registrants' learning patterns stored in the learning pattern storage unit 18 and obtains a transformation matrix.
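One way to obtain such a transformation matrix is sketched below. The specification does not say how the principal component analysis is computed; this sketch assumes power iteration with deflation on the covariance matrix, and the names `pca_axes` and `to_pc_space` are hypothetical. The rows of `axes` play the role of the transformation matrix.

```python
import math


def pca_axes(patterns, n_axes, iters=500):
    """Return (mean, axes); axes holds the top n_axes principal directions."""
    n, d = len(patterns), len(patterns[0])
    mean = [sum(p[j] for p in patterns) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in patterns]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(d)]
           for a in range(d)]
    axes = []
    for i in range(n_axes):
        # Deterministic starting vector for the power iteration.
        v = [1.0 / (j + i + 1) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: the variance along this axis.
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
        axes.append(v)
        # Deflate so the next pass finds the next principal axis.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    return mean, axes


def to_pc_space(pattern, mean, axes):
    """Project one pattern onto the principal axes."""
    return [sum(ax[j] * (pattern[j] - mean[j]) for j in range(len(mean)))
            for ax in axes]


# Toy registrant patterns lying along the direction (1, 2, 0).
patterns = [[float(t), 2.0 * t, 0.0] for t in range(-2, 3)]
mean, axes = pca_axes(patterns, n_axes=1)
z = to_pc_space([1.0, 2.0, 0.0], mean, axes)
```

In the experiment reported later, the patterns are 64-dimensional and the first three axes are kept, which corresponds to `n_axes=3`.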

[0028] ② The random number generation unit 14 uses random numbers to place patterns in the multidimensional space spanned by the principal component axes; these serve as candidate points for non-registrant learning patterns.
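Candidate generation with random numbers might look like the sketch below. The specification does not state the distribution or the region from which candidates are drawn; this sketch assumes a uniform distribution over a box that encloses the projected registrant patterns with some margin. The function name, `margin`, and `seed` are hypothetical.

```python
import random


def random_candidates(projected, n_candidates, margin=0.5, seed=0):
    """Draw candidate points uniformly from a padded box around the
    registrant patterns in principal-component space."""
    rng = random.Random(seed)
    dims = len(projected[0])
    lo = [min(p[k] for p in projected) for k in range(dims)]
    hi = [max(p[k] for p in projected) for k in range(dims)]
    bounds = []
    for k in range(dims):
        pad = margin * (hi[k] - lo[k])  # widen the box on both sides
        bounds.append((lo[k] - pad, hi[k] + pad))
    return [[rng.uniform(a, b) for a, b in bounds]
            for _ in range(n_candidates)]


# Toy registrant patterns already projected onto three principal axes.
projected = [[0.0, 0.0, 0.0], [2.0, 4.0, 1.0], [1.0, 1.0, 3.0]]
candidates = random_candidates(projected, n_candidates=50)
```

Drawing from a box somewhat larger than the registrant distribution makes it likely that some candidates fall just outside the registrant categories, which is where non-registrant patterns are useful.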

[0029] ③ The distance calculation unit 15 and the pattern selection unit 16 select the data for non-registrant learning patterns according to whether the distance between each candidate point and each registrant's category falls within a certain range.
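The distance test can be sketched as follows. The specification defines neither how a category is represented nor what the range is, so everything specific here is an assumption: each registrant's category is summarized by its centroid, the distance is Euclidean, and a candidate is kept when its distance to the nearest centroid is at least `d_min` (so it lies outside every category) and at most `d_max` (so it stays near the category boundaries). All names are hypothetical.

```python
import math


def centroid(points):
    """Mean point of one registrant's patterns."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]


def distance(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_candidates(candidates, categories, d_min, d_max):
    """Keep candidates whose distance to the nearest category centroid
    lies in [d_min, d_max]."""
    centers = [centroid(pts) for pts in categories.values()]
    selected = []
    for c in candidates:
        nearest = min(distance(c, m) for m in centers)
        if d_min <= nearest <= d_max:
            selected.append(c)
    return selected


categories = {
    "registrant_a": [[0.0, 0.0], [0.0, 2.0]],
    "registrant_b": [[10.0, 0.0], [10.0, 2.0]],
}
candidates = [[0.0, 1.0], [3.0, 1.0], [5.0, 20.0]]
selected = select_candidates(candidates, categories, d_min=2.0, d_max=4.0)
print(selected)  # [[3.0, 1.0]]
```

The first candidate is rejected because it sits inside a registrant category, and the last because it is too far from any category to constrain the decision boundary.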

[0030] ④ The inverse transformation unit 17 multiplies the selected non-registrant learning-pattern data by the inverse of the transformation matrix obtained by the principal component analysis, yielding the non-registrant learning patterns.
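The inverse transformation can be illustrated as below. This sketch assumes that the principal axes are orthonormal and that only some of them are retained, in which case the transpose of the axis matrix serves as the inverse on the retained subspace; it also assumes that the mean removed before the analysis is added back. The function name is hypothetical.

```python
def inverse_transform(z, mean, axes):
    """Map a point from principal-component space back to the original
    feature space: x = mean + sum_k z[k] * axes[k]."""
    d = len(mean)
    return [mean[j] + sum(z[k] * axes[k][j] for k in range(len(axes)))
            for j in range(d)]


# Toy example: two retained axes in a three-dimensional feature space.
mean = [1.0, 1.0, 1.0]
axes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
pattern = inverse_transform([2.0, 3.0], mean, axes)
print(pattern)  # [3.0, 4.0, 1.0]
```

The returned vector has the same dimension as the registrant patterns, so it can be stored and learned alongside them.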

[0031] ⑤ The patterns created by this inverse transformation are stored in the learning pattern storage unit 18 as non-registrant learning patterns.

[0032] (C) Learning: The neural network 23 undergoes additional learning using, as learning patterns, the registrant patterns and non-registrant patterns of (A) and (B) above stored in the learning pattern storage unit 18.
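Additional learning, in the sense of continuing training from already learned weights, can be illustrated with the toy sketch below. The specification does not give the architecture or training rule of the neural network 23; a single logistic unit trained by gradient descent is used here purely as a stand-in, and all names and data are hypothetical.

```python
import math


def output(weights, bias, x):
    """Output of a single logistic unit for input pattern x."""
    a = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-a))


def train(weights, bias, patterns, labels, epochs=200, rate=0.5):
    """Continue gradient-descent training from the given weights."""
    w, b = list(weights), bias
    for _ in range(epochs):
        for x, y in zip(patterns, labels):
            err = output(w, b, x) - y
            w = [wi - rate * err * xi for wi, xi in zip(w, x)]
            b -= rate * err
    return w, b


# Initial learning (label 1 = registrant, 0 = non-registrant).
w, b = train([0.0, 0.0], 0.0,
             [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]],
             [1, 1, 0, 0])
# Additional learning: a new registrant pattern plus a computed
# non-registrant pattern, starting from the weights learned above.
w, b = train(w, b, [[0.8, 0.2], [0.3, 0.7]], [1, 0])
```

The point of the sketch is only that the second call starts from the weights returned by the first, so earlier learning is refined rather than discarded.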

[0033] (D) Verification: The speech of registrants and non-registrants is captured by the speech input unit 21 and preprocessed by the preprocessing unit 22, and the preprocessing result is input to the trained neural network 23. The judgment unit 24 then performs speaker verification from the output of the neural network 23.
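A possible judgment rule is sketched below. The specification does not describe the output layout of the neural network 23 or the rule used by the judgment unit 24; this sketch assumes one output per registrant and a fixed acceptance threshold, both hypothetical.

```python
def judge(outputs, threshold=0.5):
    """Return the index of the accepted registrant, or None when the
    speaker is judged to be a non-registrant."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return best if outputs[best] >= threshold else None


print(judge([0.1, 0.8, 0.3]))  # 1
print(judge([0.2, 0.1, 0.3]))  # None
```

Raising the threshold trades fewer false acceptances of non-registrants for more false rejections of registrants.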

[0034] The preprocessing by the preprocessing unit 22 can, for example, like that of the preprocessing unit 12 described in (A) above, obtain the temporal change in the average frequency characteristics of the speech within fixed time intervals.

[0035] Experimental results obtained with the speaker verification system 10 are described below.

[0036] ① Initial learning is performed using speech samples from 5 registrants (100 patterns) and 25 non-registrants (100 patterns).

[0037] ② Once a month, registrant speech samples (50 patterns) are prepared for additional learning.

[0038] ③ The additional learning patterns for non-registrants are created by the following method.

[0039] (1) The speech samples of the 5 registrants are preprocessed to obtain 64-dimensional feature vectors.

[0040] (2) Principal component analysis is performed on the feature vectors of the 5 registrants, and non-registrant patterns (50 patterns) are created in the space spanned by the first three principal component axes, taking the distribution of the registrant patterns into account.

[0041] ④ The patterns of the 5 registrants and the non-registrant patterns created by calculation are added to the learning patterns, and the trained neural network is trained again.

[0042] ⑤ When evaluation patterns of registrants and non-registrants were input to the network after learning and judged, the error rate improved by 20% compared with the case in which the non-registrants' initial speech samples were learned repeatedly.

[0043] According to this first embodiment, using recent registrant patterns together with non-registrant patterns created by calculation as the patterns for additional learning prevents only the registrants' categories from growing unnecessarily large in the pattern space, and can improve the verification rate. In addition, speech samples for additional learning of non-registrants are no longer needed.

[0044] (Second Embodiment) (see FIG. 2) The speaker verification system 30 differs from the speaker verification system 10 only in that, as shown in FIG. 2, the random number generation unit 14 is replaced by a lattice point generation unit 14A.

[0045] That is, in the speaker verification system 30, when candidate points for non-registrant patterns are set during creation of the non-registrant learning patterns (the candidate-setting step of (B) above in the speaker verification system 10), the lattice point generation unit 14A sets the lattice points of a mesh in the multidimensional space spanned by the principal component axes as the candidate points for non-registrant learning patterns.

[0046] Here, the lattice points of the mesh are defined as follows. Assume, for each principal component axis, a family of lattice planes spaced at a certain interval along that axis. The lattice points are the points formed where the family of lattice planes along each principal component axis intersects the families along the other principal component axes.
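Lattice points of this kind can be generated as in the sketch below. The extent of the mesh and the spacing are not given in the specification; here the bounds `lo` and `hi` and the number of planes per axis `steps` are hypothetical parameters, and the spacing is taken to be uniform along each axis.

```python
import itertools


def grid_candidates(lo, hi, steps):
    """Lattice points of a mesh with `steps` equally spaced planes per
    axis between lo[k] and hi[k] (steps must be at least 2)."""
    axes = []
    for a, b in zip(lo, hi):
        axes.append([a + (b - a) * i / (steps - 1) for i in range(steps)])
    # Every combination of one plane position per axis is a lattice point.
    return [list(p) for p in itertools.product(*axes)]


points = grid_candidates([0.0, 0.0], [1.0, 2.0], steps=3)
print(len(points))  # 9
```

Unlike random candidates, the lattice covers the space evenly and reproducibly, at the cost of a candidate count that grows as `steps` raised to the number of axes.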

[0047] Accordingly, in this second embodiment as well, using recent registrant patterns together with non-registrant patterns created by calculation as the patterns for additional learning prevents only the registrants' categories from growing unnecessarily large in the pattern space, and can improve the verification rate. In addition, speech samples for additional learning of non-registrants are no longer needed.

[0048]

[Effects of the Invention] As described above, according to the present invention, newly creating non-registrant patterns by calculation and learning them when additional learning is performed prevents only the registrants' categories from growing unnecessarily large, and improves the verification rate.

[Brief explanation of drawings]

[FIG. 1] A schematic diagram showing the speaker verification system according to the first embodiment of the present invention.

[FIG. 2] A schematic diagram showing the speaker verification system according to the second embodiment of the present invention.

[FIG. 3] A conceptual diagram of the pattern space.

[Explanation of symbols]

10, 30 Speaker verification system
11 Speech input unit
12 Preprocessing unit
13 Principal component analysis unit
14 Random number generation unit
14A Lattice point generation unit
15 Distance calculation unit
16 Pattern selection unit
17 Inverse transformation unit
18 Learning pattern storage unit
21 Speech input unit
22 Preprocessing unit
23 Neural network
24 Judgment unit

Claims

[Claims]

[Claim 1] A speaker verification method in which, when a speaker is verified from input speech using a neural network and only registrant patterns are additionally learned, the additional learning patterns of non-registrants are created by calculation rather than extracted from actual speech samples, characterized in that candidate points for non-registrant patterns are set using random numbers in a multidimensional space obtained from the result of an analysis, such as principal component analysis, performed on the registrant patterns, and whether each candidate point is adopted as an actual non-registrant pattern is determined by whether the distance between the candidate point and each registrant's category falls within a certain range.

[Claim 2] The speaker verification method according to claim 1, wherein the temporal change in the frequency characteristics of speech is used as the input to the neural network.

[Claim 3] A speaker verification method in which, when a speaker is verified from input speech using a neural network and only registrant patterns are additionally learned, the additional learning patterns of non-registrants are created by calculation rather than extracted from actual speech samples, characterized in that candidate points for non-registrant patterns are set at the lattice points of a mesh in a multidimensional space obtained from the result of an analysis, such as principal component analysis, performed on the registrant patterns, and whether each candidate point is adopted as an actual non-registrant pattern is determined by whether the distance between the candidate point and each registrant's category falls within a certain range.

[Claim 4] The speaker verification method according to claim 3, wherein the temporal change in the frequency characteristics of speech is used as the input to the neural network.