JPS62283400A

JPS62283400A - Voice recognition

Info

Publication number: JPS62283400A
Application number: JP12680986A
Authority: JP
Inventors: 田部井　幸雄; 森戸　誠
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1986-05-31
Filing date: 1986-05-31
Publication date: 1987-12-09

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

(Field of Industrial Application) The present invention relates to a speech recognition method, and in particular to a speech recognition method that performs word speech recognition using local peaks.

(Prior Art) A conventional speech recognition method of this type is described in the Proceedings of the Acoustical Society of Japan, 1-4-12 (March 1985), pp. 23-24.

FIG. 3 is a flowchart explaining a conventional speech recognition method using local peaks. In the conventional method, the input speech is A/D-converted at 12 kHz with 12 bits (step 30; steps are hereinafter denoted by S). It is then frequency-analyzed every 10 msec by an N-channel digital bandpass filter (BPF) bank (channel index i = 1, ..., N), for example 22 channels, to obtain an input pattern (S31). Next, the local maxima along the frequency axis, i.e., the local peak features, are extracted from the obtained input pattern (S32).

As the criterion for local peak feature extraction, a least-squares approximation line is obtained with both the amplitude axis and the frequency axis expressed logarithmically. Channels corresponding to peaks that exceed this line are set to "1" (local peak present) and the remaining channels are set to "0", which binarizes the pattern.
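The extraction criterion above can be sketched in Python as follows. This is a minimal illustration, not the implementation from the cited paper: the function names, the use of channel center frequencies as the horizontal axis, and the treatment of the edge channels are all assumptions, and the amplitudes are assumed to be strictly positive so that the logarithm is defined.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares line y = a*x + b through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x

def local_peaks(amplitudes, center_freqs):
    """Return a 0/1 list: 1 where a channel is a local maximum above the fitted line."""
    xs = [math.log(f) for f in center_freqs]   # log frequency axis
    ys = [math.log(a) for a in amplitudes]     # log amplitude axis
    a, b = fit_line(xs, ys)
    bits = []
    for i, y in enumerate(ys):
        # Assumption: a missing neighbour at either edge never blocks a peak.
        left = ys[i - 1] if i > 0 else float("-inf")
        right = ys[i + 1] if i < len(ys) - 1 else float("-inf")
        is_maximum = y > left and y >= right
        bits.append(1 if is_maximum and y > a * xs[i] + b else 0)
    return bits
```

For example, `local_peaks([1, 8, 1, 1, 6, 1], [100, 200, 400, 800, 1600, 3200])` marks the second and fifth channels.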

It is then determined whether the input speech is for registration or for recognition (S33). If it is for registration, a standard pattern is created by adding up the binarized patterns described above (S34).
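The registration step can be sketched as an element-wise sum of binarized patterns. The function name is hypothetical, and the sketch assumes every registered utterance has already been brought to the same number of frames, which the text does not describe.

```python
def make_standard_pattern(utterances):
    """Sum binarized patterns element-wise.

    utterances: list of patterns, each a list of frames, each frame a 0/1 list.
    Assumption: all patterns have identical frame and channel counts.
    """
    n_frames = len(utterances[0])
    n_channels = len(utterances[0][0])
    pattern = [[0] * n_channels for _ in range(n_frames)]
    for utterance in utterances:
        for m, frame in enumerate(utterance):
            for i, bit in enumerate(frame):
                pattern[m][i] += bit
    return pattern
```

With three registrations, every element of the result lies between 0 and 3, which is why the standard pattern is multi-valued but small.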

If, on the other hand, the input speech is for recognition (hereinafter referred to as the input pattern), its similarity to each standard pattern is calculated, and the category name of the standard pattern giving the maximum similarity is output as the recognition result (S35).

The similarity between the l-th frame of the input pattern to be recognized and the m-th frame of the standard pattern, when the two are made to correspond in time, is defined by equation (1). Here l and m are arbitrary positive integers.

$$S_{l,m} = \frac{X_l^{t} Z_m}{\|X_l\| \, \|Z_m\|} \tag{1}$$

Here X_l = (X_{1l}, X_{2l}, ..., X_{Nl}) is the input pattern, Z_m is the standard pattern, and t denotes transposition. \|X_l\| and \|Z_m\| denote the norms of X_l and Z_m, respectively. That is,

$$\|X_l\| = \sqrt{X_l^{t} X_l}, \qquad \|Z_m\| = \sqrt{Z_m^{t} Z_m} \tag{2}$$
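For reference, the conventional similarity of equation (1) is the cosine similarity between two frame vectors. A minimal Python sketch follows; the function name and the handling of all-zero vectors are assumptions not stated in the source.

```python
import math

def cosine_similarity(x, z):
    """Similarity of equation (1): inner product divided by the product of norms."""
    dot = sum(xi * zi for xi, zi in zip(x, z))
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    norm_z = math.sqrt(sum(zi * zi for zi in z))
    if norm_x == 0 or norm_z == 0:
        return 0.0  # assumption: an empty frame matches nothing
    return dot / (norm_x * norm_z)
```

Each call needs multiplications, two square roots and a division, which is the cost the invention sets out to avoid.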

(Problems to Be Solved by the Invention) However, as can be understood from equations (1) and (2), the conventional method described above performs multiplication, square root operations and division in the similarity calculation. As a result, it operates slowly, and the circuitry needed for the arithmetic processing is complicated and large.

The object of the present invention is to eliminate the conventional problems described above and to provide a speech recognition method that achieves excellent operating speed using simple computation.

(Means for Solving the Problems) To achieve this object, the speech recognition method according to the present invention calculates the similarity used in local-peak-based speech recognition without performing square root operations or division. It relies on the following facts: the input pattern to be recognized is binary, and the standard pattern, although multi-valued, is obtained as the sum over registrations of the cases where the input pattern takes the value "1", so its values are small, ranging from 0 to the number of registrations.

Accordingly, in the local peak feature extraction process of the present invention, only a predetermined number of local peaks are extracted, counting from the low band. Here, "low band" refers to the lowest-frequency channel number, i.e., the first channel, when the channels used for frequency analysis are numbered from the low-frequency side. The "predetermined number" is a number that does not cause the recognition rate to drop in recognition processing, and is preferably seven, for example.
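Limiting the peaks to a fixed number from the low-frequency side can be sketched as below. The function name and the default of seven are illustrative; the sketch assumes channel index 0 is the lowest-frequency channel.

```python
def limit_peaks(bits, max_peaks=7):
    """Keep at most max_peaks ones, scanning from the lowest-frequency channel."""
    kept = 0
    limited = []
    for bit in bits:
        if bit == 1 and kept < max_peaks:
            limited.append(1)
            kept += 1
        else:
            limited.append(0)  # peaks beyond the limit are dropped
    return limited
```

For instance, `limit_peaks([1, 0, 1, 1, 0, 1], max_peaks=2)` keeps only the two lowest peaks.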

Next, according to the present invention, the following three counting processes are performed in the similarity calculation process. The first counting process counts the number of elements X_i (i = 1, 2, ..., N) of the input local peak vector of the pattern to be recognized, i.e., the input pattern, that are equal to "1".

The second counting process calculates the sum of the elements Z_i of the standard pattern vector that correspond to elements X_i of the input local peak vector equal to "1". The third counting process calculates the sum of squares of the elements Z_i of the standard pattern vector that are not "0".
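The three counting processes can be sketched in a few lines of Python. The function and variable names are illustrative; `x` is a binary input local peak vector and `z` is a multi-valued standard pattern vector of the same length.

```python
def three_counts(x, z):
    """Return (number of ones in x, sum of z where x is 1, sum of squares of nonzero z)."""
    n_ones = sum(1 for xi in x if xi == 1)                   # first counting process
    masked_sum = sum(zi for xi, zi in zip(x, z) if xi == 1)  # second counting process
    square_sum = sum(zi * zi for zi in z if zi != 0)         # third counting process
    return n_ones, masked_sum, square_sum
```

Because `x` is binary, the first value equals the squared norm of `x` and the second equals the inner product of `x` and `z`, so the three values together determine the similarity of equation (1).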

Furthermore, in this similarity calculation process, the values obtained by these three counting processes are used as an address to look up a table, and the similarity value is read from the table and output. The table is built in advance by storing the similarity value that corresponds to each address. The standard patterns described above are also stored in memory in advance.

(Operation) Thus, according to the present invention, the input local peak vector of the input pattern is binary and the standard pattern vector is multi-valued, with values ranging from 0 up to, at most, the total number of times the input local peak vector took the value "1". In addition, the number of local peaks extracted from the input pattern, counting from the low band, is kept to a small number that does not lower the recognition rate. As a result, the number of computations needed to calculate the similarity is significantly reduced.

Furthermore, the present invention performs no division or square root operation. Instead, the values obtained by multiplication and by adding the multiplication results are used as an address to read the corresponding similarity value from a table.

For these reasons, the arithmetic processing for calculating the similarity can be simplified, and the computation is therefore faster. In addition, because the arithmetic processing is simpler, the device for carrying out this speech recognition method can be made simpler and smaller.

(Embodiments) Embodiments of the present invention are described below with reference to the drawings.

The following embodiment is described for the case where the number of channels N is 22, but it should be understood that the invention is in no way limited to this.

FIG. 1 is a block diagram of the overall configuration of an embodiment of the present invention. In the figure, 10 is a speech input terminal, 12 is an A/D converter, 14 is a bandpass filter (BPF), 16 is a local peak extraction unit, 18 is a voice section detection unit, 20 is a memory, and 22 is a standard pattern memory. 24 is a control unit that controls the memory 20 and the standard pattern memory 22 according to information signals from the voice section detection unit 18. 26 is a similarity calculation unit comprising counting units 28, 30 and 32, a similarity table 34 formed of a table RAM or table ROM, and a matching unit 36. 38 is a decision unit and 40 is an output terminal. In terms of hardware, units 16, 18, 28 to 32 and 36 in FIG. 1 can preferably be implemented together in a general-purpose microprocessor 42.

Next, the speech recognition method of the present invention is described with reference to the device block diagram of FIG. 1 and the speech recognition flowchart of FIG. 2.

First, speech is input to the speech input terminal 10 (S20) and, as in the conventional method, converted into digital speech by the A/D converter 12 (S21). The BPF 14 then analyzes this speech into N frequency channels, 22 in this embodiment (S22). The output of the frequency analysis goes to the voice section detection unit 18, where the voice section is detected (S23). It also goes to the local peak extraction unit 16, where peaks exceeding the least-squares approximation line are set to "1" (local peak present) and the rest are set to "0", and the binarized local peak vector X = (X_1, X_2, ..., X_22) is extracted (S24). Local peak extraction is performed within the voice section determined by the voice section detection unit 18, and the result is stored in the memory 20. Here, the start-point information of the voice section is denoted I_S and the end-point information I_E.

The voice section detection unit 18 described above determines the voice section by classifying frames as speech or non-speech based on the BPF output value of a specific band. Because signal energy increases when speech is input, the voice section is determined by comparing the BPF output value with a threshold. Many other schemes exist, including ones that additionally use other information (for example, the zero-crossing count), but since they are not the object of this invention their explanation is omitted. In this embodiment, the information I_S and I_E from the voice section detection unit 18 is sent to the control unit 24. The control unit 24 selects the l-th frame of the input (l being a value within the voice section) and the m-th frame of the standard pattern (described later), and passes these selections to the memory 20 and the standard pattern memory 22, respectively.
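The threshold comparison described above can be sketched as follows. This is only one simple reading of the text: the function name is hypothetical, and taking the first and last frames above the threshold as I_S and I_E is an assumption, since the source does not state how gaps inside an utterance are handled.

```python
def detect_voice_section(band_energy, threshold):
    """Return (start, end) frame indices of the voice section, or None if no speech.

    band_energy: per-frame BPF output value of the band used for detection.
    Assumption: the section spans from the first to the last frame above threshold.
    """
    voiced = [t for t, energy in enumerate(band_energy) if energy > threshold]
    if not voiced:
        return None
    return voiced[0], voiced[-1]
```

For example, `detect_voice_section([0, 1, 5, 7, 6, 1, 0], 3)` gives frames 2 through 4 as the voice section.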

Next, the similarity calculation unit 26 calculates a similarity equivalent to that given by equation (1) above (S25). This is realized as follows. The counting unit 28 counts the number of elements X_{il} (i = 1, 2, ..., 22) equal to "1" in the input local peak vector X_l of the l-th frame (l in [I_S, I_E]) in the memory 20 (first counting process). Because the input local peak vector is binary and 1 × 1 = 1, this amounts to counting

$$\|X_l\|^2 = X_l^{t} X_l = \sum_{i=1}^{22} X_{il} \tag{3}$$

The counting unit 30 receives the binary input local peak vector X_l and the standard pattern vector Z_m of the corresponding m-th frame, and calculates the sum of the elements Z_{im} (i = 1, 2, ..., 22) of the standard pattern vector for which the element X_{il} of X_l is "1" (second counting process). That is, because each X_{il} is either "1" or "0",

$$X_l^{t} Z_m = \sum_{i=1}^{22} X_{il} Z_{im} \tag{4}$$

is counted.

The counting unit 32 calculates the sum of squares of the elements Z_{im} of the standard pattern vector Z_m that are not "0" (third counting process). That is,

$$\|Z_m\|^2 = Z_m^{t} Z_m = \sum_{i=1}^{22} Z_{im}^{2} \tag{5}$$

is counted.

In this embodiment, the local peak extraction unit 16 extracts, out of the local peaks of the 22 channels, only up to a predetermined number counting from the low band, for example a maximum of seven. Seven is used because limiting the number to seven causes no drop at all in the recognition rate while giving a large memory saving (described later), but any suitable number that does not lower the recognition rate may be chosen.

Because the maximum number of local peaks is set to 7 in this embodiment, equation (3) takes values from 1 to 7 and can be represented in 3 bits, equation (4) takes values from 0 to 21 (= 3 × 7) and can be represented in 5 bits, and equation (5) takes values from 1 to 63 (= 3^2 × 7) and can be represented in 6 bits.

Equations (3) to (5) together can therefore be represented in 14 bits (= 3 + 5 + 6).
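One way to form the 14-bit address from the three counts is to concatenate the bit fields. The field order chosen here (3-bit count in the high bits, then the 5-bit sum, then the 6-bit sum of squares) is an assumption; the source gives only the field widths.

```python
def pack_address(n_ones, masked_sum, square_sum):
    """Concatenate the 3-bit, 5-bit and 6-bit counts into one 14-bit table address."""
    if not (0 <= n_ones < 8 and 0 <= masked_sum < 32 and 0 <= square_sum < 64):
        raise ValueError("count does not fit its bit field")
    # Assumed layout: [n_ones:3][masked_sum:5][square_sum:6]
    return (n_ones << 11) | (masked_sum << 6) | square_sum
```

The largest values in the embodiment, `pack_address(7, 21, 63)`, give 15743, which is below 2^14 = 16384.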

The 14-bit value formed from the values obtained by these three counting processes is used as an address to read the similarity value S_{l,m} stored in the table 34 (S26). The table 34 is a conversion table for obtaining the value of equation (1) from equations (3) to (5), and in this embodiment it stores 8-bit similarity values in advance. The capacity of the table 34 corresponds to a 14-bit address space, i.e., only 16 Kbytes. The similarity value of equation (1) is stored in 8 bits because performance did not degrade at that precision.
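A conversion table of this kind could be precomputed as sketched below. The address layout, the scaling of the similarity to the range 0 to 255, and the treatment of unreachable addresses are all assumptions; the source states only that 8-bit similarity values are stored under a 14-bit address.

```python
import math

def build_similarity_table():
    """Precompute 8-bit similarities for every 14-bit address (16384 one-byte entries)."""
    table = bytearray(1 << 14)
    for n_ones in range(8):
        for masked_sum in range(32):
            for square_sum in range(64):
                if n_ones == 0 or square_sum == 0:
                    continue  # zero norm: leave the entry at 0
                addr = (n_ones << 11) | (masked_sum << 6) | square_sum
                s = masked_sum / math.sqrt(n_ones * square_sum)
                # Assumed scaling: 1.0 maps to 255; unreachable entries are clamped.
                table[addr] = min(255, round(s * 255))
    return table

def lookup(table, n_ones, masked_sum, square_sum):
    """Read a similarity with one table access, no square root or division."""
    return table[(n_ones << 11) | (masked_sum << 6) | square_sum]
```

The square roots and divisions are paid once when the table is built; at recognition time each frame pair costs three counts and one memory read.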

Based on the similarity values S_{l,m} obtained as described above, the matching unit 36 obtains the overall similarity S_k (S27).

In this case, l and m could also be made to correspond nonlinearly, but in this embodiment they are made to correspond linearly. The overall similarity S_k is then obtained by equation (6).

Here, k is a standard pattern number (k = 1, 2, ..., K), L_k is the length of the k-th standard pattern stored in the standard pattern memory 22, and K is the total number of standard patterns.

In this way, S_k (k = 1, 2, ..., K) is calculated for all K standard patterns and sent to the decision unit 38. The decision unit 38 selects the index k_0 that gives the maximum similarity, i.e.,

$$k_0 = \arg\max_{1 \le k \le K} S_k$$

and sends it to the output terminal 40 (S28).
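The decision step is a plain argmax over the overall similarities. A minimal sketch, with hypothetical names and with ties resolved in favor of the lowest index (an assumption, since the source does not discuss ties):

```python
def decide(total_similarities, category_names):
    """Return the category name of the standard pattern with the largest similarity."""
    k0 = max(range(len(total_similarities)), key=lambda k: total_similarities[k])
    return category_names[k0]
```

For example, `decide([0.2, 0.9, 0.5], ["zero", "one", "two"])` selects the second category.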

The present invention is not limited to the embodiment described above. For example, although the embodiment was described for 22 channels, the invention can be applied to any number of channels. Moreover, since the invention is a speech recognition method, the device configuration for carrying it out is in no way limited to the embodiment described above.

(Effects of the Invention) As is clear from the above description, the present invention exploits the properties of the local peak patterns of speech so that the similarity calculation can be realized with a small memory and simple counting units. The square root operations and division that were conventionally required become unnecessary, and the similarity can be calculated at high speed with a simpler configuration than the conventional method. A device implementing this invention can be realized as a compact speech recognition device.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing an example of the configuration of a speech recognition device for carrying out the present invention, used to explain the speech recognition method of the invention. FIG. 2 is a flowchart explaining the speech recognition method of the present invention. FIG. 3 is a flowchart explaining a conventional speech recognition method.

10 ... speech input terminal; 12 ... A/D converter; 14 ... bandpass filter bank; 16 ... local peak extraction unit; 18 ... voice section detection unit; 20 ... memory; 22 ... standard pattern memory; 24 ... control unit; 26 ... similarity calculation unit; 28 to 32 ... counting units; 34 ... table; 36 ... matching unit; 38 ... decision unit; 40 ... output terminal.

Patent applicant: Oki Electric Industry Co., Ltd.

Claims

[Claims]

(1) A speech recognition method including: a process of frequency-analyzing input speech to obtain an input pattern; a local peak feature extraction process of obtaining a least-squares approximation line from the input pattern and binarizing the pattern by setting peaks that exceed the line to "1" as local peaks and the others to "0"; a similarity calculation process of calculating the similarity between the binarized input pattern and a standard pattern; and a process of determining the input speech according to the similarity, characterized in that the local peak feature extraction process extracts only a predetermined number of local peaks counting from the low band, and the similarity calculation process includes a first counting process of counting the number of "1" elements in the input local peak vector, a second counting process of calculating the sum of the elements of the standard pattern that correspond to "1" elements of the input local peak vector, a third counting process of calculating the sum of squares of the elements of the standard pattern vector that are not "0", and a process of reading and outputting a similarity value from a conversion table using the values obtained by the first, second and third counting processes as an address.