JPH0431896A - Speech recognizing device - Google Patents

Speech recognizing device

Info

Publication number
JPH0431896A
Authority
JP
Japan
Prior art keywords
section
speech
voice
matching
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2138798A
Other languages
Japanese (ja)
Inventor
Toshiyuki Masumura
増村 利行
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp filed Critical NEC Corp
Priority to JP2138798A priority Critical patent/JPH0431896A/en
Publication of JPH0431896A publication Critical patent/JPH0431896A/en
Pending legal-status Critical Current

Abstract

PURPOSE: To reduce speech recognition errors caused by erroneous voice detection by performing matching on voice sections detected with different detection levels and comparing the results. CONSTITUTION: A voice detecting section 2 is set to regard only a level considerably above the noise level (for example, 10 times) as a voice section, while another voice detecting section 6 is set to regard even a level only slightly above the noise level (for example, 2 times) as a voice section. A matching section 4 performs matching on the voice section detected by section 2, and another matching section 7 performs matching on the voice section detected by section 6. A discriminating section 8 compares the recognition results of sections 4 and 7, determines which result is closer to a standard pattern, and outputs that result. The drop in recognition rate caused by erroneous voice detection is thereby reduced.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to a speech recognition device, and more particularly to a speech recognition device that reduces the drop in recognition rate caused by errors in the speech detection section.

[Conventional Technology]

A block diagram of a conventional speech recognition device is shown in FIG. 4. In the figure, 1 is a microphone, 2 is a voice detection section that determines the voice section, 3 is a voice analysis section that extracts feature vectors serving as the feature parameters of the speech, 4 is a matching section that matches feature vectors against standard patterns, and 5 is a standard pattern memory that stores the standard patterns. The voice entering the microphone 1 is input to the voice detection section 2 and the voice analysis section 3. The voice detection section 2 detects the level of the input voice and determines the effective voice section from its magnitude. The voice analysis section 3 converts the input voice to digital form and extracts a feature vector for each analysis frame. The matching section 4 matches the feature vectors of the analysis frames within the voice section output by the voice detection section 2 against the standard patterns stored in advance in the standard pattern memory 5, determines the standard pattern closest to the input voice, and outputs it as the recognition result.
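For illustration, a minimal Python sketch of this conventional single-threshold pipeline follows: frame energies drive the endpoint decision, and a plain dynamic-time-warping (DTW) distance serves as the matcher. The function names and the choice of DTW are assumptions for the sketch; the patent does not specify the analysis or matching algorithm.

```python
import numpy as np

def frame_energy(samples: np.ndarray, frame_len: int = 256) -> np.ndarray:
    """Mean squared amplitude of each analysis frame."""
    n = len(samples) // frame_len
    frames = samples[: n * frame_len].reshape(n, frame_len)
    return (frames ** 2).mean(axis=1)

def detect_speech(energies: np.ndarray, threshold: float) -> slice:
    """First-to-last frame whose energy exceeds the detection level."""
    above = np.flatnonzero(energies > threshold)
    if above.size == 0:
        return slice(0, 0)                      # nothing regarded as speech
    return slice(above[0], above[-1] + 1)

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Plain dynamic-time-warping distance between two feature sequences."""
    D = np.full((len(a) + 1, len(b) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[len(a), len(b)])

def recognize(features: np.ndarray, section: slice, patterns: dict) -> str:
    """Match the detected section against every standard pattern; return the closest word."""
    segment = features[section]
    return min(patterns, key=lambda word: dtw_distance(segment, patterns[word]))
```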

[Problem to be Solved by the Invention]

The conventional speech recognition device described above has the drawback that, when speech with a weak word-initial or word-final portion is input, the beginning or end of the word cannot be detected accurately under certain ambient-noise conditions, so the speech recognition result is wrong.

For example, in the word "san" (サン), the speech level of the initial "sa" portion is low, so when the ambient noise level is high the "sa" portion is not regarded as speech and only the "n" portion is detected as the voice section.

Conversely, if the voice detection threshold is set low in an attempt to detect such speech accurately, non-speech sounds, such as a tongue click made just before the utterance, are also detected as speech, and the speech recognition result again comes out wrong.

[Means for Solving the Problem]

The device of the present invention is a speech recognition device comprising a speech detection section that detects the speech section of input speech, a speech analysis section that obtains feature vectors of the input speech, a standard pattern memory section that stores standard patterns, and a matching section that recognizes the input speech by matching the feature vectors against the standard patterns. It is characterized by comprising: a plurality of speech detection sections that detect speech sections with mutually different speech detection levels; a plurality of matching sections arranged in correspondence with the plurality of speech detection sections, or a single matching section used in time division; and a judgment section that compares the recognition results of the plural matching operations, determines the one closest to a standard pattern, and outputs it.
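The following structural sketch mirrors this arrangement, reusing the illustrative detect_speech and dtw_distance helpers from the previous example; the 10x and 2x factors anticipate the embodiment below, and everything else is an assumption rather than the patent's own implementation.

```python
def recognize_dual(features, energies, noise_level, patterns,
                   high_factor=10.0, low_factor=2.0):
    """Two detection levels, one matching pass per detected section, then judgment."""
    candidates = []
    for factor in (high_factor, low_factor):        # one "voice detection section" each
        section = detect_speech(energies, factor * noise_level)
        segment = features[section]
        # matching section: distance from this segment to every standard pattern
        scores = {word: dtw_distance(segment, pat) for word, pat in patterns.items()}
        best = min(scores, key=scores.get)
        candidates.append((scores[best], best))
    # judgment section: output whichever candidate lies closest to its standard pattern
    return min(candidates)[1]
```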

[Embodiment]

Next, the present invention will be explained with reference to the drawings.

FIG. 1 is a block diagram of one embodiment of the speech recognition device of the present invention, taking as an example the case of two speech detection sections and two matching sections. In FIG. 1, 1 is a microphone, 2 and 6 are voice detection sections with mutually different detection levels, 3 is a voice analysis section, 4 and 7 are matching sections, 5 is a standard pattern memory, and 8 is a judgment section. The basic operation of each section in FIG. 1 is the same as in FIG. 4, except that the voice detection levels of the voice detection section 2 and the voice detection section 6 differ.

In this embodiment, the voice detection section 2 is set so that only a level considerably above the noise level (for example, 10 times) is regarded as a voice section, while the voice detection section 6 is set so that even a level only slightly above the noise level (for example, 2 times) is regarded as a voice section. The matching section 4 performs matching on the voice section detected by the voice detection section 2, and the matching section 7 performs matching on the voice section detected by the voice detection section 6. The judgment section 8 compares the recognition result output by the matching section 4 with the recognition result output by the matching section 7, determines which result is closer to a standard pattern, and outputs that result.
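As a rough sketch of how the two detection levels could be derived, the 10x and 2x factors follow the text above, while estimating the noise floor from a few leading frames is an added assumption:

```python
import numpy as np

def noise_floor(energies: np.ndarray, leading_frames: int = 10) -> float:
    # Assumption: the ambient noise level is estimated from frames preceding the utterance.
    return float(energies[:leading_frames].mean())

def detection_levels(energies: np.ndarray) -> tuple[float, float]:
    n = noise_floor(energies)
    level_strict = 10.0 * n     # voice detection section 2: about 10x the noise level
    level_sensitive = 2.0 * n   # voice detection section 6: about 2x the noise level
    return level_strict, level_sensitive
```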

The operation of this embodiment will now be explained using FIGS. 2 and 3. FIG. 2 shows the speech waveform of "san" under low noise, and FIG. 3 shows the speech waveform of "san" under high noise. In FIGS. 2 and 3, L1 is the voice detection level of the voice detection section 2 and L2 is the voice detection level of the voice detection section 6. Further, t1 and t1' are the voice sections detected by the voice detection section 2, and t2 and t2' are the voice sections detected by the voice detection section 6. As shown in FIGS. 2 and 3, the voice detection section 2 detects the voice section accurately under low noise, and the voice detection section 6 does so under high noise. In the case of FIG. 3, the voice detection section 2 detects only the "n" portion, while the voice detection section 6 detects the whole "san". The matching sections 4 and 7 each match the corresponding detected section against the standard patterns in the standard pattern memory 5. Since the standard pattern memory 5 holds a standard pattern for "san", the matching section 7 outputs "san" as its recognition result. The matching section 4 outputs the standard pattern closest to "n", for example "yon", as its recognition result. The judgment section 8 compares the similarity between the detected "san" and the standard pattern "san" with the similarity between "n" and "yon", and takes "san", which has the greater similarity, as the recognition result.
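A toy illustration of the judgment step for this high-noise example follows; the distance values are invented and only the decision rule reflects the text (a smaller distance corresponds to a greater similarity):

```python
# Candidate results from the two matching sections under high noise.
# (distance to the closest standard pattern, recognized word, origin)
candidates = [
    (7.4, "yon", "matching section 4: detector 2 saw only the 'n' portion"),
    (1.9, "san", "matching section 7: detector 6 saw the whole word"),
]
distance, word, origin = min(candidates)   # pick the candidate closest to its pattern
print(f"judgment section 8 outputs '{word}' ({origin}; distance {distance})")
```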

In this embodiment a plurality of matching sections are provided, which allows the matching to be performed in real time. When only one matching section is provided, that single matching section is used in time division.
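A sketch of this time-division alternative, in which a single matching routine is applied to each detected section in turn; the names and the distance argument are illustrative assumptions:

```python
def recognize_time_shared(sections, features, patterns, distance):
    """Run one matcher sequentially over every detected section, then judge."""
    candidates = []
    for section in sections:                      # sequential (time-shared) matching
        segment = features[section]
        scores = {word: distance(segment, pat) for word, pat in patterns.items()}
        best = min(scores, key=scores.get)
        candidates.append((scores[best], best))
    return min(candidates)[1]                     # the closest overall match wins
```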

[Effects of the Invention]

As explained above, the present invention provides a plurality of voice detection sections with different voice detection levels and, corresponding to them, a plurality of matching sections or a single matching section used in time division. By performing matching on voice sections detected at different levels and comparing the results, errors in speech recognition caused by errors in voice detection can be reduced significantly.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of one embodiment of the present invention, FIG. 2 shows the speech waveform of "san" under low noise, FIG. 3 shows the speech waveform of "san" under high noise, and FIG. 4 is a block diagram of a conventional speech recognition device.

1 ... microphone; 2, 6 ... voice detection sections; 3 ... voice analysis section; 4, 7 ... matching sections; 5 ... standard pattern memory section; 8 ... judgment section; L1 ... detection level of the voice detection section 2; L2 ... detection level of the voice detection section 6; t1, t1' ... voice sections detected by the voice detection section 2; t2, t2' ... voice sections detected by the voice detection section 6.

Claims (1)

[Claims]

1. In a speech recognition device comprising a speech detection section that detects the speech section of input speech, a speech analysis section that obtains feature vectors of the input speech, a standard pattern memory section that stores standard patterns, and a matching section that recognizes the input speech by matching the feature vectors against the standard patterns, a speech recognition device characterized by comprising: a plurality of speech detection sections that detect speech sections with mutually different speech detection levels; a plurality of matching sections arranged in correspondence with the plurality of speech detection sections, or a single matching section used in time division; and a judgment section that compares the recognition results of the plural matching operations performed by the matching section, determines the one closest to a standard pattern, and outputs it.
JP2138798A 1990-05-29 1990-05-29 Speech recognizing device Pending JPH0431896A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2138798A JPH0431896A (en) 1990-05-29 1990-05-29 Speech recognizing device

Publications (1)

Publication Number Publication Date
JPH0431896A true JPH0431896A (en) 1992-02-04

Family

ID=15230480

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2138798A Pending JPH0431896A (en) 1990-05-29 1990-05-29 Speech recognizing device

Country Status (1)

Country Link
JP (1) JPH0431896A (en)

Similar Documents

Publication Publication Date Title
US4811399A (en) Apparatus and method for automatic speech recognition
US5625747A (en) Speaker verification, speech recognition and channel normalization through dynamic time/frequency warping
CA1207907A (en) Speaker verification system
ATE250801T1 (en) METHOD AND DEVICE FOR DETECTING NOISE SIGNAL SAMPLES FROM A NOISE
KR20170073113A (en) Method and apparatus for recognizing emotion using tone and tempo of voice signal
US5159637A (en) Speech word recognizing apparatus using information indicative of the relative significance of speech features
KR100737358B1 (en) Method for verifying speech/non-speech and voice recognition apparatus using the same
JP2996019B2 (en) Voice recognition device
JPH0431896A (en) Speech recognizing device
JP3251460B2 (en) Speaker verification method and apparatus
Mishra et al. Speaker identification, differentiation and verification using deep learning for human machine interface
JP3114757B2 (en) Voice recognition device
JP2989231B2 (en) Voice recognition device
KR100246617B1 (en) Speech detection method using the continuous pitch information
JPH03160499A (en) Speech recognizing device
JPH01222299A (en) Voice recognizing device
JPH0316038B2 (en)
KR100281581B1 (en) Korean Continuous Number Speech Recognition Using Simultaneous Articulation Model
JP2712704B2 (en) Signal processing device
JPH04152397A (en) Voice recognizing device
JPH0395599A (en) Voice recognition system
KR100349656B1 (en) Apparatus and method for speech detection using multiple sub-detection system
JPH02272495A (en) Voice recognizing device
JPS5858598A (en) Voice recognition system
JPH0950292A (en) Voice recognition device