JPS58102999A

JPS58102999A - Voice recognition equipment

Info

Publication number: JPS58102999A
Application number: JP56202013A
Authority: JP
Inventors: 松村　純孝; 正則鈴木
Original assignee: Pioneer Corp; Pioneer Electronic Corp
Current assignee: Pioneer Corp
Priority date: 1981-12-15
Filing date: 1981-12-15
Publication date: 1983-06-18

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] The present invention relates to a speech recognition device.

Some speech recognition devices use the so-called pattern matching method: the features of the input speech are extracted by frequency analysis or the like to create an input pattern, and the input speech is identified from the degree of similarity between that input pattern and pre-registered standard patterns.

In a speech recognition device of this type, the input speech is identified by calculating the degree of similarity between the input pattern and a standard pattern, that is, by performing a pattern matching calculation. This matching calculation is usually carried out by a microcomputer. However, when the speech of a given word is to be recognized, even utterances of the same word differ, because of variations in speaking rate, both in the pattern length, which corresponds to the word duration of the speech, and in the temporal variation of the pattern, which corresponds to the frequency distribution structure of the speech. For this reason, the matching calculation must apply a nonlinear or linear time transformation to the input pattern before comparing its similarity with the standard pattern. However, when the difference in word duration between the input speech and the registered speech, that is, the difference in pattern length between the input pattern and the standard pattern, is large, the processing time becomes long and the recognition rate is also poor.
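As an illustration of the linear time transformation mentioned above, the following is a minimal sketch, not the method of the patent. It assumes a pattern is a list of frames and stretches or compresses it to a target length by mapping each output index linearly onto the source time axis. The function name `linear_time_normalize` and the nearest-frame selection are assumptions made for this example.

```python
def linear_time_normalize(pattern, target_len):
    """Resample a list of frames to target_len frames by linear index mapping.

    Hypothetical helper: the patent only states that a linear or nonlinear
    time transformation is applied, not how.
    """
    if not pattern or target_len <= 0:
        raise ValueError("pattern must be non-empty and target_len positive")
    src_len = len(pattern)
    if target_len == 1:
        return [pattern[0]]
    out = []
    for i in range(target_len):
        # Map output index i onto the source axis, rounding half up
        # with integer arithmetic to pick the nearest source frame.
        num = 2 * i * (src_len - 1) + (target_len - 1)
        j = num // (2 * (target_len - 1))
        out.append(pattern[j])
    return out
```

The larger the gap between the two pattern lengths, the more frames must be repeated or dropped, which is one way to picture why matching patterns of very different lengths is costly and unreliable.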

Accordingly, an object of the present invention is to provide a speech recognition device that shortens the processing time of the pattern matching calculation and improves the recognition rate.

The speech recognition device according to the present invention is configured to perform matching only when the magnitude of the difference between the word duration of the input speech and the word duration of the speech registered as a standard pattern is within a predetermined time.

An embodiment of the present invention is described below with reference to the drawings. Fig. 1 is a block diagram of a speech recognition device according to the present invention. In Fig. 1, the speech to be registered or identified is input to microphone 1, and the output signal of microphone 1, that is, the speech signal, is amplified by amplifier 2 and supplied to feature extraction circuit 3. Feature extraction circuit 3 consists of filter circuit 4, which is made up of a plurality of BPFs (band-pass filters) each connected to the output terminal of amplifier 2, and multiplexer 5. To frequency-analyze the speech signal, the outputs of filter circuit 4 are sampled sequentially in a time-division manner by multiplexer 5. The output of multiplexer 5 is supplied to microcomputer 7 via A/D (analog/digital) converter 6. Microcomputer 7 consists of a processor, a clock generator, memory, input/output interfaces, and so on. At registration time, it creates and stores a standard pattern from the output signal of A/D converter 6, which is, for example, an 8-bit digital signal.

Next, the operation of microcomputer 7 during recognition is described with reference to the operation flow diagram of Fig. 2.

First, microcomputer 7 requests input of the speech to be recognized, for example by means of an indicator lamp provided on the device, and waits until that speech is input (11). When the speech is input from microphone 1, its features are extracted through frequency analysis by feature extraction circuit 3, and the result is supplied to microcomputer 7 as a digital signal via A/D converter 6, microcomputer 7 creates the input pattern from that digital signal (12).
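Step (12) can be pictured with the following minimal sketch. It assumes the A/D converter delivers one flat stream of samples in which the band-pass filter channels are interleaved in scan order, and that one full scan of the channels forms one frame of the input pattern. The name `build_input_pattern`, the interleaving order, and the handling of an incomplete final scan are assumptions made for this example; the description gives eight BPFs only as an example count.

```python
NUM_CHANNELS = 8  # example count of band-pass filters given in the description


def build_input_pattern(samples, num_channels=NUM_CHANNELS):
    """Group a channel-interleaved sample stream into frames.

    Hypothetical helper: each frame holds one value per filter channel,
    i.e. one complete scan by the multiplexer.
    """
    if num_channels <= 0:
        raise ValueError("num_channels must be positive")
    # Assumption: an incomplete scan at the end of the stream is discarded.
    usable = len(samples) - len(samples) % num_channels
    return [
        list(samples[i:i + num_channels])
        for i in range(0, usable, num_channels)
    ]
```

With this layout, the number of frames in the pattern is proportional to the length of the utterance, which is what lets the pattern length stand in for the word duration.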

Next, the word duration Tw of the input speech is calculated from the pattern length of the input pattern (13). This word duration Tw is determined from the power level of the input speech. Multiplexer 5 scans, at a predetermined period, the output signals of the predetermined number of BPFs (for example, eight) in filter circuit 4 and supplies them to A/D converter 6. The power level of the input speech is therefore obtained by adding, in input order, a predetermined number of the data values of the digital signal generated by A/D converter 6 that form the input pattern. When a speech level with the characteristic shown in Fig. 3 is obtained, the start of the word is normally taken to be time t1, which is a time Δt1 before the time tx at which the level during speech input exceeds a predetermined level Vr1, and the end of the word is taken to be time t2, which is a time Δt2 after the time ty at which the speech level falls below a predetermined level Vr2. The word duration Tw is therefore t2 − t1. Next, starting with n set to 1 (14), microcomputer 7 calculates the magnitude ΔTwn of the difference between the word duration Tw of the input speech obtained as above and the word duration Twn (n is a natural number) of the speech registered as a particular standard pattern (15). It then judges whether the magnitude ΔTwn is greater than a predetermined value Tr, for example 200 msec (16). If ΔTwn > Tr, the matching calculation with that standard pattern is not performed, and the degree of similarity with that standard pattern is set to 0 (17).
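The word-duration calculation of step (13) might look like the following minimal sketch. It assumes the power level of one frame is the sum of the channel values in one multiplexer scan, that all times are in milliseconds, and that the frame period is known. The function names, the parameter names, and the fallback used when the level never drops below Vr2 are assumptions made for this example.

```python
def frame_power(frame):
    """Power level of one frame: the sum of its channel values (assumption)."""
    return sum(frame)


def word_duration(levels, frame_period_ms, vr1, vr2, dt1_ms, dt2_ms):
    """Return (t1, t2, Tw) in milliseconds, or None if no word is found.

    levels: one power level per frame, in input order.
    tx: first time the level exceeds vr1; the word starts dt1_ms earlier.
    ty: first later time the level falls below vr2; the word ends dt2_ms later.
    """
    tx = next((i for i, v in enumerate(levels) if v > vr1), None)
    if tx is None:
        return None  # the level never exceeded Vr1
    ty = next(
        (i for i in range(tx + 1, len(levels)) if levels[i] < vr2),
        len(levels),  # assumption: treat the end of the recording as ty
    )
    t1 = max(0, tx * frame_period_ms - dt1_ms)
    t2 = ty * frame_period_ms + dt2_ms
    return t1, t2, t2 - t1
```

For example, with power levels `[0, 0, 5, 9, 9, 4, 0, 0]`, a 10 ms frame period, Vr1 = 3, Vr2 = 2 and Δt1 = Δt2 = 10 ms, the level first exceeds Vr1 at 20 ms and first falls below Vr2 at 60 ms, so t1 = 10 ms, t2 = 70 ms and Tw = 60 ms.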

If ΔTwn ≤ Tr, the pattern matching calculation between the input pattern and the standard pattern is performed and the degree of similarity Sn (n is a natural number) is calculated (18). Next, it is judged whether the natural number n is equal to the number np of standard patterns (19). If n ≠ np, 1 is added to n (20) and the process returns to step (15). If n = np, the maximum value SMAX is detected among the degrees of similarity Sn obtained by the matching calculations (21), and it is judged whether that maximum value SMAX is greater than a predetermined value Sr (22). If SMAX ≥ Sr, the registered speech is judged to include an utterance of the same word as the input speech, and the identification signal corresponding to the standard pattern that gave the maximum value SMAX is output to the recognition output port (23). If SMAX < Sr, however, the input speech is judged to be an utterance of a word different from the registered speech, and a non-identification signal is generated from the recognition output port (24).
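The flow of steps (14) to (24) can be summarized in the following minimal sketch. It is an illustration under stated assumptions, not the patented implementation: the matching calculation itself is passed in as a caller-supplied function `match`, each standard pattern is represented as a hypothetical `(label, pattern, twn_ms)` tuple, and the default value of `sr` is arbitrary because the description does not give the scale of the similarity. The 200 ms default for `tr_ms` is the example value from the description.

```python
def recognize(input_pattern, tw_ms, references, match, tr_ms=200, sr=0.5):
    """Return the label of the best-matching standard pattern, or None.

    references: list of (label, pattern, twn_ms) tuples (hypothetical layout).
    match: callable(input_pattern, pattern) -> similarity, supplied by caller.
    """
    similarities = []
    for _label, ref_pattern, twn_ms in references:      # n = 1 .. np
        if abs(tw_ms - twn_ms) > tr_ms:                  # steps (15), (16)
            similarities.append(0)                       # step (17): skip matching
        else:
            similarities.append(match(input_pattern, ref_pattern))  # step (18)
    if not similarities:
        return None                                      # nothing registered
    # Step (21): index of the maximum similarity SMAX.
    best = max(range(len(similarities)), key=similarities.__getitem__)
    if similarities[best] >= sr:                         # step (22)
        return references[best][0]                       # step (23)
    return None                                          # step (24)
```

Because `match` is only called for standard patterns whose word duration is within `tr_ms` of the input, the expensive time-transformed comparison is skipped entirely for words of very different length.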

As described above, the speech recognition device according to the present invention performs the pattern matching calculation only when the magnitude of the difference between the word duration of the input speech and the word duration of the speech registered as a standard pattern is within a predetermined time. As a result, processing becomes faster when the difference between the word duration of the input speech and that of the registered speech is large, and mismatching between words of extremely different lengths is eliminated, so the recognition rate is improved.

[Brief explanation of drawings]

Fig. 1 is a block diagram showing an embodiment of the present invention, Fig. 2 is an operation flow diagram of the microcomputer of Fig. 1, and Fig. 3 is a characteristic diagram showing the variation of the speech level over time.

Explanation of reference numerals for the main parts
1: Microphone
3: Feature extraction circuit
4: Filter circuit
5: Multiplexer
6: A/D converter
7: Microcomputer

Applicant: Pioneer Corporation
Agent: Patent attorney Motohiko Fujimura

Claims

[Claims]

(1) A speech recognition device that extracts the features of input speech to create an input pattern and matches the input pattern against a pre-registered standard pattern, characterized in that the matching is performed only when the magnitude of the difference between the word duration of the input speech and the word duration of the speech registered as the standard pattern is within a predetermined time.

(2) The speech recognition device according to claim 1, wherein the word duration of the input speech is determined from the level of the input speech.