JPH03230256A

JPH03230256A - Voice recognizing method

Info

Publication number: JPH03230256A
Application number: JP2026672A
Authority: JP
Inventors: Kazuhiko Okashita; 和彦岡下; Shingo Nishimura; 新吾西村; Masashi Miyagawa; 宮川　正志
Original assignee: Sekisui Chemical Co Ltd
Current assignee: Sekisui Chemical Co Ltd
Priority date: 1990-02-05
Filing date: 1990-02-05
Publication date: 1991-10-14

Abstract

PURPOSE: To enable real-time processing and secure a high recognition rate by calculating the average of the frequency characteristics in each of the speech blocks obtained by dividing the speech equally in time, and using as the input to a neural network the difference of those averages between the block having the maximum level and the other blocks. CONSTITUTION: The known speech waveform of each word to be recognized is passed through a 16-channel band-pass filter 11 to calculate the frequency characteristics of the input speech. In each band of the band-pass filter 11, each of the speech intervals obtained by dividing the speech waveform into eight equal parts in time is treated as one block, and an averaging circuit 12 calculates the average of the frequency characteristics within each block. A difference value H between the block having the maximum level and each of the other blocks is then calculated and used as the input to a neural network 20. Real-time processing is thereby enabled and a high recognition rate is secured.

Description

DETAILED DESCRIPTION OF THE INVENTION [Field of Industrial Application] The present invention relates to a speech recognition method suitable for recognizing a word from input speech at online terminals and the like, such as electric locks and IC cards.

[Prior Art] As a speech recognition method that can secure a high recognition rate despite the influence of noise and differences in input systems such as lines, the method described in Japanese Unexamined Patent Publication No. H1-260490 has been proposed. In this method, the speech is subjected to LPC analysis, the LPC cepstrum is calculated for each frame, and the inter-frame difference values of the LPC cepstrum are used as the input parameters. The standard pattern of each utterance is prepared in advance from the same kind of difference values. The similarity between the input parameters and each standard pattern is then calculated with a statistical measure, and the utterance corresponding to the standard pattern with the maximum similarity is taken as the recognition result.

[Problems to Be Solved by the Invention] However, the prior art has the following problems (1) to (3).

(1) LPC analysis is performed sequentially on each of the many frames into which the speech is divided in time, so the processing time is long.

(2) The similarity calculation that follows the calculation of the input parameters is also performed sequentially frame by frame, so the processing time is long.

(3) Because of (1) and (2), high-grade and complex processing is required to achieve real-time processing.

An object of the present invention is to provide a speech recognition method that can easily be processed in real time and can secure a high recognition rate.

[Means for Solving the Problems] The invention according to claim 1 is a word recognition method that recognizes a word from input speech using a neural network, wherein the frequency characteristics of the input speech are calculated; in each band, each of the speech intervals obtained by dividing the speech equally in time is treated as one block; the average of the frequency characteristics is calculated within each block; and the values obtained by taking the difference of those averages between the block having the maximum level in the corresponding band and the other blocks are used as the input to the neural network.

The invention according to claim 2 is such that the neural network is a hierarchical neural network.

[Function] The invention according to claim 1 has the following effects (1) to (5).

(1) Because "frequency characteristics" are used as the feature parameters input to the neural network, the preprocessing that produces the input is simpler than complex feature extraction such as LPC correlation or the LPC cepstrum, the frequency analysis can be done in parallel, and the preprocessing takes little time.

(2) In principle, the arithmetic performed by a neural network as a whole is simple and fast.

(3) In principle, the units that make up a neural network operate independently, so the arithmetic can be performed in parallel and is therefore fast.

(4) Because of (1) to (3), speech recognition can easily be performed in real time without relying on a complex processing device.

(5) The method is robust against stationary spectral distortion and maintains a high recognition rate. As analyzed below, this is because the averages of the frequency characteristics of the blocks of the input speech are differenced between the block having the maximum level within the same band and the other blocks, which cancels the spectral distortion. Let $i$ be the block number, $k$ the band number, $A_k$ the frequency transmission characteristic of band $k$, $S_{mik}$ the speech signal of block $i$ in band $k$ at the learning stage, and $S_{tik}$ the speech signal of block $i$ in band $k$ at the evaluation stage, whose spectrum has been distorted by the stationary frequency transmission characteristic $A_k$, as happens after passing through a telephone line. Then

$$S_{tik} = A_k \cdot S_{mik}.$$

Each speech signal $S_{tik}$ at the evaluation stage, normalized by the total power of the word, is

$$\frac{S_{tik}}{\sum_i \sum_k S_{tik}} \qquad (1)$$

Taking the logarithm of expression (1) and then the difference from, for example, block $j$, which has the maximum level within the same band, gives the difference value $H$:

$$
\begin{aligned}
H &= \log(S_{tik}) - \log\Bigl(\sum_i \sum_k S_{tik}\Bigr) - \log(S_{tjk}) + \log\Bigl(\sum_i \sum_k S_{tik}\Bigr) \\
  &= \log(S_{tik}) - \log(S_{tjk}) \qquad (2) \\
  &= \log(A_k \cdot S_{mik}) - \log(A_k \cdot S_{mjk}) \\
  &= \log(A_k) + \log(S_{mik}) - \log(A_k) - \log(S_{mjk}) \\
  &= \log(S_{mik}) - \log(S_{mjk}) \qquad (3)
\end{aligned}
$$

Equation (2) for the difference value $H$ represents the difference value of the speech signal at the evaluation stage, and equation (3) represents the difference value of the speech signal at the learning stage. In other words, the frequency transmission characteristic $A_k$ cancels out of the evaluation-stage difference value of equation (2), which becomes equal to the learning-stage difference value of equation (3). The spectral distortion is thereby eliminated.
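The cancellation of $A_k$ can be checked numerically. The following is a minimal Python sketch under stated assumptions: the block levels and per-band gains are made-up values, and the helper name `block_differences` is hypothetical, not taken from the patent.

```python
import math

def block_differences(levels):
    """levels[k][i] is the mean level of block i in band k; returns H[k][i]."""
    # Total word power over all bands and blocks, as in expression (1).
    word_power = sum(sum(band) for band in levels)
    result = []
    for band in levels:
        normalized = [s / word_power for s in band]
        peak = max(normalized)  # block with the maximum level in this band
        result.append([math.log(s) - math.log(peak) for s in normalized])
    return result

# Hypothetical learning-stage levels S_mik for two bands of four blocks each.
train = [[0.2, 1.5, 3.0, 0.7], [2.0, 0.4, 0.9, 1.1]]
# Hypothetical stationary channel gains A_k, one per band.
gains = [0.35, 1.8]
# Evaluation-stage levels S_tik = A_k * S_mik.
test = [[a * s for s in band] for a, band in zip(gains, train)]

h_train = block_differences(train)
h_test = block_differences(test)
```

Although the two bands are scaled by different gains and the word power changes, `h_train` and `h_test` agree to within floating-point rounding, which is the equality of equations (2) and (3).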

The invention according to claim 2 has the following effect (1).

(1) For hierarchical neural networks, a simple learning algorithm (backpropagation), described later, is now established, so a neural network that achieves a high recognition rate can be formed easily.

[Embodiment] Fig. 1 is a schematic diagram showing an example of a speech recognition system to which the present invention is applied, Fig. 2 is a schematic diagram showing neural networks, Fig. 3 is a schematic diagram showing a hierarchical neural network, and Fig. 4 is a schematic diagram showing the structure of a unit.

Before the specific embodiment of the present invention is described, the configuration of the neural network and its learning algorithm are explained.

(1) Neural networks can be broadly divided by structure into two types: the hierarchical network shown in Fig. 2(A) and the interconnected network shown in Fig. 2(B). The present invention may be configured with either type, but the hierarchical network is more useful because a simple learning algorithm, described later, has been established for it.

(2) Network structure. As shown in Fig. 3, a hierarchical network has a layered structure consisting of an input layer, an intermediate layer, and an output layer.

Each layer consists of one or more units. The only connections are forward connections from the input layer to the intermediate layer to the output layer; there are no connections within a layer.

(3) Unit structure. As shown in Fig. 4, a unit is a model of a neuron in the brain, and its structure is simple. It receives inputs from other units, takes their sum, transforms the sum according to a fixed rule (the transfer function), and outputs the result. Each connection to another unit carries a variable weight that represents the strength of the connection.
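The behaviour of a single unit can be sketched in a few lines of Python. This is only an illustration: the patent does not name the transfer function, so the logistic sigmoid and the optional bias term used here are assumptions, and `unit_output` is a hypothetical name.

```python
import math

def sigmoid(x):
    """Logistic transfer function (an assumed choice, not stated in the source)."""
    return 1.0 / (1.0 + math.exp(-x))

def unit_output(inputs, weights, bias=0.0, transfer=sigmoid):
    """Weight each incoming value, sum the results, and apply the transfer function."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return transfer(total)

# Two incoming connections whose weighted contributions cancel: 0.5*1.0 - 0.25*2.0 = 0.
example = unit_output([1.0, 2.0], [0.5, -0.25])
```

Passing a different `transfer` callable changes the "fixed rule" without touching the weighted sum.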

(4) Learning (backpropagation). Learning in a network means bringing the actual output closer to the target value (the desired output); in general, learning is carried out by changing the transfer function and the weights of each unit shown in Fig. 4.

As the learning algorithm, the backpropagation described in, for example, Rumelhart, D. E., McClelland, J. L. and the PDP Research Group, PARALLEL DISTRIBUTED PROCESSING, the MIT Press, 1986, can be used.

A specific embodiment of the present invention is described below.

The recognition system 1 consists of a 16-channel band-pass filter 11, an averaging circuit 12, a block difference circuit 13, a neural network 20, and a determination circuit 30 connected together (see Fig. 1).

In this recognition system 1, the words to be recognized were the names of the 47 prefectures, and there was one specific speaker. The learning operation and the evaluation operation of the recognition system 1 are described in detail below.

(Learning)

1. Input creation

(1) The known input speech waveform of each word to be recognized is passed through the 16-channel band-pass filter 11 to calculate the frequency characteristics of the input speech.

(2) In each band of the band-pass filter 11, each of the speech intervals obtained by dividing the speech waveform into eight equal parts in time is treated as one block, and the averaging circuit 12 calculates, within each block, the average of the frequency characteristics obtained in (1). The average of the frequency characteristics of the speech signal in block $i$ of band $k$ at this learning stage is denoted $S_{mik}$.

(3) The block having the maximum level is found for each band. The difference between that block and each of the other blocks is then obtained. Specifically, the block averages obtained for each band in (2) are normalized by dividing them by the word power $\sum\sum S_{mik}$, the logarithm is taken, and the difference from block $j$, which has the maximum level within the same band, is taken to give the difference value $H$ of equation (3).

(4) The values obtained in (3) are used as the input to the neural network 20. The number of inputs is 16 channels × 8 blocks = 128.
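The input-creation steps above can be sketched as follows. This is a minimal Python sketch, not the circuitry of the embodiment: it assumes the per-band output levels of the band-pass filter are already available as lists of positive numbers with at least eight samples each, and the names `block_means` and `make_features` are hypothetical.

```python
import math

NUM_BLOCKS = 8  # the embodiment divides each word into eight equal time blocks

def block_means(envelope, num_blocks=NUM_BLOCKS):
    """Split one band's level sequence into equal-length blocks and average each."""
    size = len(envelope) // num_blocks  # samples beyond num_blocks * size are dropped
    return [sum(envelope[b * size:(b + 1) * size]) / size for b in range(num_blocks)]

def make_features(band_envelopes, num_blocks=NUM_BLOCKS):
    """Return the difference values H for all bands, concatenated band by band."""
    means = [block_means(env, num_blocks) for env in band_envelopes]
    # Word power: sum of the block averages over all bands and blocks.
    word_power = sum(sum(row) for row in means)
    features = []
    for row in means:
        logs = [math.log(m / word_power) for m in row]  # normalize, then take the log
        peak = max(logs)                                # maximum-level block of the band
        features.extend(value - peak for value in logs)
    return features
```

With 16 bands the result has 16 × 8 = 128 values, every value is zero or negative, and the maximum-level block of each band maps to exactly zero. A real front end would also need a floor for silent blocks, since the logarithm of a zero level is undefined.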

2. Learning

(1) A neural network 20 having 128 input-layer units and 48 output-layer units is used.

(2) The 47 words to be recognized are numbered and assigned to 47 output-layer units. For each word, the network is trained 5000 times by backpropagation so that, for the input obtained in 1 above, the output unit corresponding to that word takes the value "1" and the other output units take the value "0" (the target values). This builds a neural network 20 that can guarantee a certain recognition rate.
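The training procedure can be illustrated with a small backpropagation sketch in Python. Everything beyond what the text states is an assumption: one intermediate layer, sigmoid units, a squared-error update, the learning rate, the weight initialization, and the names `forward` and `train` are all hypothetical. The embodiment would use 128 inputs and one output unit per word; the defaults here are only placeholders.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, w_hid, w_out):
    """Forward pass; a constant 1.0 is appended so the last weight acts as a bias."""
    h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hid]
    y = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in w_out]
    return h, y

def train(samples, labels, num_classes, hidden=6, epochs=5000, rate=0.5, seed=0):
    """Online backpropagation toward targets of 1 for the labelled word, 0 otherwise."""
    rng = random.Random(seed)
    n_in = len(samples[0])
    w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(hidden)]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)] for _ in range(num_classes)]
    for _ in range(epochs):
        for x, label in zip(samples, labels):
            target = [1.0 if c == label else 0.0 for c in range(num_classes)]
            h, y = forward(x, w_hid, w_out)
            # Error terms of the output layer, then of the intermediate layer.
            d_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, y)]
            d_hid = [hj * (1.0 - hj) * sum(d_out[c] * w_out[c][j] for c in range(num_classes))
                     for j, hj in enumerate(h)]
            # Weight updates, applied after both sets of error terms are known.
            for c in range(num_classes):
                for j, v in enumerate(h + [1.0]):
                    w_out[c][j] += rate * d_out[c] * v
            for j in range(hidden):
                for i, v in enumerate(x + [1.0]):
                    w_hid[j][i] += rate * d_hid[j] * v
    return w_hid, w_out
```

Each sample must be a plain list of floats, for example the 128 difference values of one word, and each label is the index of the word's output unit.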

(Evaluation)

1. Input creation

(1) The unknown input speech waveform of each word to be recognized is passed through the 16-channel band-pass filter 11 to calculate the frequency characteristics of the input speech.

(2) In each band of the band-pass filter 11, each of the speech intervals obtained by dividing the speech waveform into eight equal parts in time is treated as one block, and the averaging circuit 12 calculates, within each block, the average of the frequency characteristics obtained in (1). The average of the frequency characteristics of the speech signal in block $i$ of band $k$ at this evaluation stage is denoted $S_{tik}$.

The difference between the block having the maximum level and each of the other blocks is then obtained. Specifically, the block averages obtained for each band in (2) are normalized as in expression (1) by dividing them by the total power of the word $\sum\sum S_{tik}$, the logarithm is taken, and the difference from block $j$, which has the maximum level within the same band, is taken to give the difference value $H$ of equation (2).

2. Recognition

(1) The difference values $H$ obtained in 1 above are input to the neural network 20.

(2) The determination circuit 30 determines the input word from the values of the output layer of the neural network 20.
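The text does not say which rule the determination circuit 30 applies. A common choice, assumed here purely for illustration, is to pick the word whose output unit has the largest value; `decide_word` is a hypothetical name.

```python
def decide_word(outputs, words):
    """Return the word whose output unit is largest, with that unit's value."""
    best = max(range(len(outputs)), key=lambda c: outputs[c])
    return words[best], outputs[best]

# Hypothetical three-word vocabulary and output-layer values.
word, score = decide_word([0.1, 0.8, 0.05], ["Hokkaido", "Aomori", "Iwate"])
```

The returned score could be compared against a threshold to reject uncertain inputs, but that is an extension not described in the source.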

Experimental results for the present invention are described below.

(Experiment 1) As an example of the present invention, the averages of the frequency characteristics, differenced between the block having the maximum level within the same band and the other blocks, were used as the input to the neural network 20.

The words to be recognized were the names of the 47 prefectures, and there was one specific speaker.

As a result, the recognition rate was 94.0% and the processing time was within 1 second (average recognition time per word).

(Experiment 2) As a comparative example, the input parameters and the standard patterns were each created from the inter-frame difference values of the LPC correlation and the LPC cepstrum, and the similarity between the two was calculated with a statistical measure. The words to be recognized were the names of the 47 prefectures, and there was one specific speaker.

As a result, the recognition rate was 93.2% and the processing time was 1 second or more (average recognition time per word).

The operation of the above embodiment is described below.

(1) Because "frequency characteristics" are used as the feature parameters input to the neural network 20, the preprocessing that produces the input is simpler than complex feature extraction such as LPC correlation or the LPC cepstrum, the frequency analysis can be done in parallel, and the preprocessing takes little time.

(2) In principle, the arithmetic performed by the neural network 20 as a whole is simple and fast.

(3) In principle, the units that make up the neural network 20 operate independently, so the arithmetic can be performed in parallel and is therefore fast.

(4) The method is robust against stationary spectral distortion and can maintain a high recognition rate. As described above under [Function], this is because the frequency transmission characteristic $A_k$ cancels out of the difference value of equation (2) calculated at the evaluation stage, which becomes equal to the difference value of equation (3) calculated at the learning stage, so that spectral distortion caused by noise or by differences in input systems such as lines is eliminated.

(5) For the hierarchical neural network 20, a simple learning algorithm (backpropagation), as described above, is now established, so a neural network 20 that achieves a high recognition rate can be formed easily.

[Effects of the Invention] As described above, the present invention provides a speech recognition method that can easily be processed in real time and can secure a high recognition rate.

[Brief Description of the Drawings]

Fig. 1 is a schematic diagram showing an example of a speech recognition system to which the present invention is applied, Fig. 2 is a schematic diagram showing neural networks, Fig. 3 is a schematic diagram showing a hierarchical neural network, and Fig. 4 is a schematic diagram showing the structure of a unit.

Description of symbols: 1: recognition system; 10: band-pass filter; 12: averaging circuit; 13: block difference circuit; 20: neural network; 30: determination circuit.

Claims

[Claims]

(1) A speech recognition method that recognizes a word from input speech using a neural network, wherein the frequency characteristics of the input speech are calculated; in each band, each of the speech intervals obtained by dividing the speech equally in time is treated as one block; the average of the frequency characteristics is calculated within each block; and the values obtained by taking the difference of those averages between the block having the maximum level in the corresponding band and the other blocks are used as the input to the neural network.

(2) The speech recognition method according to claim 1, wherein the neural network is a hierarchical neural network.