JPH04369699A - Unspecified speaker voice recognition system using neural network - Google Patents

Unspecified speaker voice recognition system using neural network

Info

Publication number
JPH04369699A
Authority
JP
Japan
Prior art keywords
network
neural network
speaker
speakers
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP3147224A
Other languages
Japanese (ja)
Other versions
JPH0752359B2 (en)
Inventor
Hidefumi Sawai
沢井 秀文
Satoru Nakamura
悟 中村
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
A T R JIDO HONYAKU DENWA KENKYUSHO KK
Original Assignee
A T R JIDO HONYAKU DENWA KENKYUSHO KK
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by A T R JIDO HONYAKU DENWA KENKYUSHO KK filed Critical A T R JIDO HONYAKU DENWA KENKYUSHO KK
Priority to JP3147224A priority Critical patent/JPH0752359B2/en
Publication of JPH04369699A publication Critical patent/JPH04369699A/en
Publication of JPH0752359B2 publication Critical patent/JPH0752359B2/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current


Abstract

PURPOSE: To provide a neural network that extends a neural network architecture proposed for the recognition of a specific speaker or a limited number of speakers, so that the speech of unspecified speakers can be recognized. CONSTITUTION: A speech feature quantity is input to an input layer 3 in the form of a feature-parameter time series and propagated in parallel and simultaneously to first hidden layers 40, 41, ..., 4n to extract the features characteristic of the respective speakers; at the same time, the first hidden layer 40 extracts feature quantities effective for discriminating between the speakers. The extracted features are propagated to second hidden layers 50, 51, 52, ..., 5n, the network is trained by the error backpropagation method, and the result is output to an output layer 6.

Description

[Detailed Description of the Invention]

[0001]

[Industrial Field of Application] The present invention relates to a speaker-independent speech recognition method using a neural network, and more particularly to a speaker-independent speech recognition method applied in the field of speech recognition technology in which a neural network is used to recognize the speech of unspecified speakers.

[0002]

[Prior Art and Problems to Be Solved by the Invention] In recent years, neural networks have been actively applied in the field of speech recognition. In particular, ever since the time-delay neural network (TDNN) demonstrated high performance in phoneme recognition of the voiced plosives /b, d, g/, many networks based on the TDNN structure have been proposed, including networks for recognizing 18 consonants, networks for recognizing 23 phonemes, and networks for multi-speaker phoneme recognition.

[0003] However, no system has yet appeared that can recognize the speech of unspecified speakers in earnest at the level of phoneme recognition. Systems that perform phoneme recognition for a limited number of speakers have been proposed, for example in Hampshire, J. and A. Waibel: "The Meta-Pi Network: Connectionist Rapid Adaptation for High Performance Multi-Speaker Phoneme Recognition", Proceedings of the 1990 IEEE International Conference on Acoustics, Speech and Signal Processing, S3.9, pp. 164-168, 1990. Even for these recognition systems, however, performance on the speech of unknown speakers different from the training speakers had not been verified.

[0004] Therefore, the principal object of the present invention is to provide a speaker-independent speech recognition method using a neural network that reduces the training time and the number of training samples while enabling highly accurate recognition.

[0005]

[Means for Solving the Problems] In the present invention, networks trained for each individual speaker and a speaker-identification network trained to discriminate between speakers are integrated to form a single network, and the entire network is then constructed by additional training.

[0006]

[Operation] In the speaker-independent speech recognition method using a neural network according to the present invention, a network trained for each speaker is integrated with a speaker-identification network trained to discriminate between speakers, and each network is trained individually; this reduces the training time and the number of training samples and enables highly accurate recognition.

[0007]

[Embodiments of the Invention] Fig. 1 is a schematic block diagram of an embodiment of the present invention. Referring to Fig. 1, a speech input signal is supplied to a feature analysis section 1, where FFT analysis or LPC analysis is performed; the resulting features are supplied to a neural network 2, which characterizes the present invention, where speech recognition is performed and the recognition result is output.

[0008] Fig. 2 is a detailed block diagram of the neural network shown in Fig. 1. Referring to Fig. 2, the neural network includes an input layer 3, first hidden layers 40, 41, 42, ..., 4n, second hidden layers 50, 51, 52, ..., 5n, and an output layer 6. The first hidden layer 41 is a subnetwork trained on training samples of speaker 1, and the second hidden layer 51 is a subnetwork trained on training samples of the same speaker 1; likewise, the first hidden layer 42 and the second hidden layer 52 are subnetworks trained on training samples of speaker 2, and the first hidden layer 4n and the second hidden layer 5n are subnetworks of speaker N. The first hidden layer 40, called the speaker-identification network, is a subnetwork that uses training samples of speakers 1 through N to determine which speaker a given phoneme belongs to. The output layer 6 finally determines the phoneme category C1, C2, ..., Ck, ..., CK from the values of its output units.
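The modular layout just described can be summarized in code. The following is a minimal sketch; the subnetwork sizes, the field names, and the helper `build_network` are illustrative assumptions, since the patent does not specify layer dimensions:

```python
# Sketch of the network of Fig. 2: one speaker-identification
# subnetwork (first hidden layer 40) plus N per-speaker subnetworks
# (first hidden layers 41..4n), each feeding its own second hidden
# layer (50..5n).  All sizes are assumptions for illustration.

def build_network(n_speakers, n_categories, hidden1=8, hidden2=4):
    """Describe the subnetworks and their roles as plain data."""
    net = {"input": "feature-parameter time series",
           "output_categories": n_categories,
           "subnets": []}
    # Subnetwork 0 is the speaker-identification network (layer 40).
    net["subnets"].append({"id": 0,
                           "role": "speaker-identification",
                           "hidden1_units": hidden1,
                           "hidden2_units": hidden2})
    # Subnetworks 1..N are each trained on one speaker's samples.
    for s in range(1, n_speakers + 1):
        net["subnets"].append({"id": s,
                               "role": f"speaker-{s}",
                               "hidden1_units": hidden1,
                               "hidden2_units": hidden2})
    return net

network = build_network(n_speakers=6, n_categories=18)
print(len(network["subnets"]))  # 7: the speaker-ID subnet plus 6 speaker subnets
```

Each entry corresponds to one independently trainable module, which is the property the modular architecture is designed to exploit.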

[0009] Next, the operation of this embodiment will be described. A speech feature quantity, input to the input layer 3 in the form of a feature-parameter time series, is propagated in parallel and simultaneously to the first hidden layers 41, 42, ..., 4n through the connections between the input layer 3 and those layers. Each subnetwork then extracts the features characteristic of its own speaker, while the first hidden layer 40 simultaneously extracts feature quantities that are effective for discriminating between the speakers.

[0010] Next, the outputs of the first hidden layers 40, 41, 42, ..., 4n are propagated to the second hidden layers 50, 51, 52, ..., 5n through the connections between the two sets of layers. As shown in Fig. 2, the connections from the second hidden layers 50, 51, 52, ..., 5n to the output layer 6 are arranged so that the k-th sublayer of each speaker's subnetwork is connected to the unit of the output layer 6 corresponding to the k-th category Ck. The speaker-identification network is connected in the same manner, except that its connections to the output layer 6 are full connections. To preserve modularity, the subnetworks are not connected to one another. This network can be trained by the error backpropagation method (McClelland, J. L., D. E. Rumelhart and the PDP Research Group: "Parallel Distributed Processing", Vol. 1, Chap. 8, MIT Press (1988)).
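The selective output wiring and the backpropagation step described above can be illustrated with a toy example. This is a sketch under assumed sizes, not the patent's implementation: the connection mask, the softmax output, and the learning rate are all assumptions introduced for the example.

```python
import math
import random

random.seed(0)
N, K, H = 3, 4, 5        # speakers, phoneme categories, ID-subnet units
M = N * K + H            # total second-hidden-layer units

# Activations of the second hidden layers (toy values).
h2 = [random.random() for _ in range(M)]

# mask[i][k] = 1 if unit i may connect to output category Ck:
# the k-th sublayer unit of a speaker subnetwork reaches only Ck,
# while the speaker-identification units are fully connected.
mask = [[0.0] * K for _ in range(M)]
for s in range(N):
    for k in range(K):
        mask[s * K + k][k] = 1.0
for i in range(N * K, M):
    for k in range(K):
        mask[i][k] = 1.0

# Output weights, zeroed wherever the mask forbids a connection.
W = [[random.gauss(0.0, 1.0) * mask[i][k] for k in range(K)]
     for i in range(M)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

# Forward pass, then one error-backpropagation step on the output
# weights; multiplying the gradient by the mask keeps every update
# inside the modular wiring.
target = [0.0, 1.0, 0.0, 0.0]        # true category C2, one-hot
y = softmax([sum(h2[i] * W[i][k] for i in range(M)) for k in range(K)])
for i in range(M):
    for k in range(K):
        W[i][k] -= 0.1 * h2[i] * (y[k] - target[k]) * mask[i][k]
```

Masking the gradient in this way is one simple reading of keeping the subnetworks unconnected while the integrated network is still trained jointly by backpropagation.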

[0011] Because the network that integrates the per-speaker networks and the speaker-identification network described above is highly modular, each subnetwork can be trained on its own. Compared with previously proposed networks, or with a simple four-layer network having a comparable number of degrees of freedom (network connections), it therefore has the advantage of greatly reducing the training time and the number of training samples. That the recognition rate is also stably high was demonstrated experimentally in Satoru Nakamura and Hidefumi Sawai: "A Study of Neural Network Architectures for Speaker-Independent Phoneme Recognition", IEICE Technical Report on Speech, SP90-61, December 20, 1990.
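A back-of-the-envelope connection count makes the modularity argument concrete. The layer sizes below are illustrative assumptions (the patent gives no dimensions); the point is only that training one subnetwork touches far fewer connections than retraining a monolithic four-layer network of comparable total width:

```python
# Assumed sizes: input dimension, per-subnet hidden sizes, phoneme
# categories, and number of speakers (all illustrative).
D, H1, H2, K, N = 100, 20, 10, 18, 6

# Modular network: N speaker subnets plus one speaker-ID subnet,
# each trainable in isolation, so one training run updates only:
per_subnet = D * H1 + H1 * H2 + H2 * K
print(per_subnet)        # connections touched when training one module

# A monolithic four-layer net of comparable total width must update
# every connection on every training run:
width1, width2 = H1 * (N + 1), H2 * (N + 1)
monolithic = D * width1 + width1 * width2 + width2 * K
print(monolithic)
```

Under these assumed sizes a single module has roughly a tenth of the connections of the monolithic net, which is the intuition behind the reduced training time and sample count.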

[0012]

[Effects of the Invention] As described above, according to the present invention, the neural network is composed of modules, namely a subnetwork for each speaker and a speaker-identification network, and each subnetwork can be trained individually. The training time and the number of samples can therefore be reduced, and highly accurate recognition becomes possible.

[Brief Description of the Drawings]

[Fig. 1] A schematic block diagram of an embodiment of the present invention.

[Fig. 2] A detailed block diagram of the neural network shown in Fig. 1.

[Explanation of Reference Numerals]

1  Feature analysis section
2  Neural network
3  Input layer
40, 41, 42, ..., 4n  First hidden layers
50, 51, 52, ..., 5n  Second hidden layers
6  Output layer

Claims (1)

[Claims]

[Claim 1] A speaker-independent speech recognition method using a neural network, characterized in that networks trained for each individual speaker and a speaker-identification network trained to discriminate between speakers are integrated to form a single network, the entire network being constructed by additional training.
JP3147224A 1991-06-19 1991-06-19 Speaker-Independent Speech Recognition Method Using Neural Network Expired - Fee Related JPH0752359B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP3147224A JPH0752359B2 (en) 1991-06-19 1991-06-19 Speaker-Independent Speech Recognition Method Using Neural Network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP3147224A JPH0752359B2 (en) 1991-06-19 1991-06-19 Speaker-Independent Speech Recognition Method Using Neural Network

Publications (2)

Publication Number Publication Date
JPH04369699A true JPH04369699A (en) 1992-12-22
JPH0752359B2 JPH0752359B2 (en) 1995-06-05

Family

ID=15425382

Family Applications (1)

Application Number Title Priority Date Filing Date
JP3147224A Expired - Fee Related JPH0752359B2 (en) 1991-06-19 1991-06-19 Speaker-Independent Speech Recognition Method Using Neural Network

Country Status (1)

Country Link
JP (1) JPH0752359B2 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS56119199A (en) * 1980-02-26 1981-09-18 Sanyo Electric Co Voice identifying device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06348675A (en) * 1993-06-07 1994-12-22 Ebara Corp Neuro-computer application equipment and machinery provided with the neuro-computer application equipment
CN104903954A (en) * 2013-01-10 2015-09-09 感官公司 Speaker verification and identification using artificial neural network-based sub-phonetic unit discrimination
JP2016509254A (en) * 2013-01-10 2016-03-24 センソリー・インコーポレイテッド Speaker verification and identification using artificial neural network based subphoneme discrimination

Also Published As

Publication number Publication date
JPH0752359B2 (en) 1995-06-05

Similar Documents

Publication Publication Date Title
FI117954B (en) System for verifying a speaker
EP0342630B1 (en) Speech recognition with speaker adaptation by learning
JP2764277B2 (en) Voice recognition device
KR100309205B1 (en) Voice processing apparatus and method
EP0549265A2 (en) Neural network-based speech token recognition system and method
CN112331216A (en) Speaker recognition system and method based on composite acoustic features and low-rank decomposition TDNN
Lippmann Speech perception by humans and machines
CN112331218B (en) Single-channel voice separation method and device for multiple speakers
US5758021A (en) Speech recognition combining dynamic programming and neural network techniques
Caminero et al. On-line garbage modeling with discriminant analysis for utterance verification
Matsuoka et al. Syllable recognition using integrated neural networks
CN114387997A (en) Speech emotion recognition method based on deep learning
Hansen et al. Robust speech recognition training via duration and spectral-based stress token generation
JPH04369699A (en) Unspecified speaker voice recognition system using neural network
Gammal et al. Combating reverberation in speaker verification
JPH04318900A (en) Multidirectional simultaneous sound collection type voice recognizing method
Webb et al. Speaker identification experiments using HMMs
Van Hout et al. Tackling unseen acoustic conditions in query-by-example search using time and frequency convolution for multilingual deep bottleneck features
International Neural Network Society (INNS), the IEEE Neural Network Council Cooperating Societies et al. Text-dependent speaker identification using learning vector quantization
Lohrenz et al. On temporal context information for hybrid BLSTM-based phoneme recognition
CN114420111B (en) One-dimensional hypothesis-based speech vector distance calculation method
JPH05204399A (en) Unspecified speaker's phoneme recognition method
Ting et al. Malay syllable recognition based on multilayer perceptron and dynamic time warping
Zhing-Xuan et al. A kind of fuzzy-neural networks for text-independent speaker identification
JPH0323920B2 (en)

Legal Events

Date Code Title Description
A01 Written decision to grant a patent or to grant a registration (utility model)

Free format text: JAPANESE INTERMEDIATE CODE: A01

Effective date: 19951219

LAPS Cancellation because of no payment of annual fees