JPH0431896A - Speech recognizing device - Google Patents

Speech recognizing device

Info

Publication number
JPH0431896A
Authority
JP
Japan
Prior art keywords
section
speech
voice
matching
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2138798A
Other languages
Japanese (ja)
Inventor
Toshiyuki Masumura
増村 利行
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp filed Critical NEC Corp
Priority to JP2138798A priority Critical patent/JPH0431896A/en
Publication of JPH0431896A publication Critical patent/JPH0431896A/en
Pending legal-status Critical Current

Abstract

PURPOSE: To reduce speech recognition errors caused by erroneous voice detection by performing matching on voice sections detected with different detection levels and comparing the results. CONSTITUTION: A voice detecting section 2 is set to regard only a level considerably above the noise level (for example, 10 times) as a voice section, while another voice detecting section 6 is set to regard even a level only slightly above the noise level (for example, 2 times) as a voice section. A matching section 4 performs matching on the voice section detected by section 2, and another matching section 7 performs matching on the voice section detected by section 6. A discriminating section 8 compares the recognition results of sections 4 and 7, determines which result is closer to a standard pattern, and outputs that result. The drop in recognition rate caused by erroneous voice detection is thereby reduced.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Field of Industrial Application]

The present invention relates to a speech recognition device, and more particularly to a speech recognition device that reduces the drop in recognition rate caused by errors in the speech detection section.

[Conventional Technology]

A block diagram of a conventional speech recognition device is shown in FIG. 4. In the figure, 1 is a microphone, 2 is a voice detection section that determines the voice section, 3 is a voice analysis section that extracts feature vectors serving as the feature parameters of the speech, 4 is a matching section that matches feature vectors against standard patterns, and 5 is a standard pattern memory that stores the standard patterns. The voice entering the microphone 1 is input to the voice detection section 2 and the voice analysis section 3. The voice detection section 2 detects the level of the input voice and determines the effective voice section from its magnitude. The voice analysis section 3 converts the input voice to digital form and extracts a feature vector for each analysis frame. The matching section 4 matches the feature vectors of the analysis frames within the voice section output by the voice detection section 2 against the standard patterns stored in advance in the standard pattern memory 5, determines the standard pattern closest to the input voice, and outputs it as the recognition result.
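For illustration, a minimal Python sketch of this conventional single-threshold pipeline follows: frame energies drive the endpoint decision, and a plain dynamic-time-warping (DTW) distance serves as the matcher. The function names and the choice of DTW are assumptions for the sketch; the patent does not specify the analysis or matching algorithm.

```python
import numpy as np

def frame_energy(samples: np.ndarray, frame_len: int = 256) -> np.ndarray:
    """Mean squared amplitude of each analysis frame."""
    n = len(samples) // frame_len
    frames = samples[: n * frame_len].reshape(n, frame_len)
    return (frames ** 2).mean(axis=1)

def detect_speech(energies: np.ndarray, threshold: float) -> slice:
    """First-to-last frame whose energy exceeds the detection level."""
    above = np.flatnonzero(energies > threshold)
    if above.size == 0:
        return slice(0, 0)                      # nothing regarded as speech
    return slice(above[0], above[-1] + 1)

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Plain dynamic-time-warping distance between two feature sequences."""
    D = np.full((len(a) + 1, len(b) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[len(a), len(b)])

def recognize(features: np.ndarray, section: slice, patterns: dict) -> str:
    """Match the detected section against every standard pattern; return the closest word."""
    segment = features[section]
    return min(patterns, key=lambda word: dtw_distance(segment, patterns[word]))
```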

[Problem to be Solved by the Invention]

The conventional speech recognition device described above has the drawback that, when speech with a weak word-initial or word-final portion is input, the beginning or end of the word cannot be detected accurately under certain ambient-noise conditions, so the speech recognition result is wrong.

For example, in the word "san" (サン), the speech level of the initial "sa" portion is low, so when the ambient noise level is high the "sa" portion is not regarded as speech and only the "n" portion is detected as the voice section.

Conversely, if the voice detection threshold is set low in an attempt to detect such speech accurately, non-speech sounds, such as a tongue click made just before the utterance, are also detected as speech, and the speech recognition result again comes out wrong.

[Means for Solving the Problem]

The device of the present invention is a speech recognition device comprising a speech detection section that detects the speech section of input speech, a speech analysis section that obtains feature vectors of the input speech, a standard pattern memory section that stores standard patterns, and a matching section that recognizes the input speech by matching the feature vectors against the standard patterns. It is characterized by comprising: a plurality of speech detection sections that detect speech sections with mutually different speech detection levels; a plurality of matching sections arranged in correspondence with the plurality of speech detection sections, or a single matching section used in time division; and a judgment section that compares the recognition results of the plural matching operations, determines the one closest to a standard pattern, and outputs it.
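The following structural sketch mirrors this arrangement, reusing the illustrative detect_speech and dtw_distance helpers from the previous example; the 10x and 2x factors anticipate the embodiment below, and everything else is an assumption rather than the patent's own implementation.

```python
def recognize_dual(features, energies, noise_level, patterns,
                   high_factor=10.0, low_factor=2.0):
    """Two detection levels, one matching pass per detected section, then judgment."""
    candidates = []
    for factor in (high_factor, low_factor):        # one "voice detection section" each
        section = detect_speech(energies, factor * noise_level)
        segment = features[section]
        # matching section: distance from this segment to every standard pattern
        scores = {word: dtw_distance(segment, pat) for word, pat in patterns.items()}
        best = min(scores, key=scores.get)
        candidates.append((scores[best], best))
    # judgment section: output whichever candidate lies closest to its standard pattern
    return min(candidates)[1]
```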

[Embodiment]

Next, the present invention will be explained with reference to the drawings.

FIG. 1 is a block diagram of one embodiment of the speech recognition device of the present invention, taking as an example the case of two speech detection sections and two matching sections. In FIG. 1, 1 is a microphone, 2 and 6 are voice detection sections with mutually different detection levels, 3 is a voice analysis section, 4 and 7 are matching sections, 5 is a standard pattern memory, and 8 is a judgment section. The basic operation of each section in FIG. 1 is the same as in FIG. 4, except that the voice detection levels of the voice detection section 2 and the voice detection section 6 differ.

In this embodiment, the voice detection section 2 is set so that only a level considerably above the noise level (for example, 10 times) is regarded as a voice section, while the voice detection section 6 is set so that even a level only slightly above the noise level (for example, 2 times) is regarded as a voice section. The matching section 4 performs matching on the voice section detected by the voice detection section 2, and the matching section 7 performs matching on the voice section detected by the voice detection section 6. The judgment section 8 compares the recognition result output by the matching section 4 with the recognition result output by the matching section 7, determines which result is closer to a standard pattern, and outputs that result.
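As a rough sketch of how the two detection levels could be derived, the 10x and 2x factors follow the text above, while estimating the noise floor from a few leading frames is an added assumption:

```python
import numpy as np

def noise_floor(energies: np.ndarray, leading_frames: int = 10) -> float:
    # Assumption: the ambient noise level is estimated from frames preceding the utterance.
    return float(energies[:leading_frames].mean())

def detection_levels(energies: np.ndarray) -> tuple[float, float]:
    n = noise_floor(energies)
    level_strict = 10.0 * n     # voice detection section 2: about 10x the noise level
    level_sensitive = 2.0 * n   # voice detection section 6: about 2x the noise level
    return level_strict, level_sensitive
```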

The operation of this embodiment will now be explained using FIGS. 2 and 3. FIG. 2 shows the speech waveform of "san" under low noise, and FIG. 3 shows the speech waveform of "san" under high noise. In FIGS. 2 and 3, L1 is the voice detection level of the voice detection section 2 and L2 is the voice detection level of the voice detection section 6. Further, t1 and t1' are the voice sections detected by the voice detection section 2, and t2 and t2' are the voice sections detected by the voice detection section 6. As shown in FIGS. 2 and 3, the voice detection section 2 detects the voice section accurately under low noise, and the voice detection section 6 does so under high noise. In the case of FIG. 3, the voice detection section 2 detects only the "n" portion, while the voice detection section 6 detects the whole "san". The matching sections 4 and 7 each match the corresponding detected section against the standard patterns in the standard pattern memory 5. Since the standard pattern memory 5 holds a standard pattern for "san", the matching section 7 outputs "san" as its recognition result. The matching section 4 outputs the standard pattern closest to "n", for example "yon", as its recognition result. The judgment section 8 compares the similarity between the detected "san" and the standard pattern "san" with the similarity between "n" and "yon", and takes "san", which has the greater similarity, as the recognition result.
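A toy illustration of the judgment step for this high-noise example follows; the distance values are invented and only the decision rule reflects the text (a smaller distance corresponds to a greater similarity):

```python
# Candidate results from the two matching sections under high noise.
# (distance to the closest standard pattern, recognized word, origin)
candidates = [
    (7.4, "yon", "matching section 4: detector 2 saw only the 'n' portion"),
    (1.9, "san", "matching section 7: detector 6 saw the whole word"),
]
distance, word, origin = min(candidates)   # pick the candidate closest to its pattern
print(f"judgment section 8 outputs '{word}' ({origin}; distance {distance})")
```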

In this embodiment a plurality of matching sections are provided, which allows the matching to be performed in real time. When only one matching section is provided, that single matching section is used in time division.
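A sketch of this time-division alternative, in which a single matching routine is applied to each detected section in turn; the names and the distance argument are illustrative assumptions:

```python
def recognize_time_shared(sections, features, patterns, distance):
    """Run one matcher sequentially over every detected section, then judge."""
    candidates = []
    for section in sections:                      # sequential (time-shared) matching
        segment = features[section]
        scores = {word: distance(segment, pat) for word, pat in patterns.items()}
        best = min(scores, key=scores.get)
        candidates.append((scores[best], best))
    return min(candidates)[1]                     # the closest overall match wins
```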

[Effects of the Invention]

As explained above, the present invention provides a plurality of voice detection sections with different voice detection levels and, corresponding to them, a plurality of matching sections or a single matching section used in time division. By performing matching on voice sections detected at different levels and comparing the results, errors in speech recognition caused by errors in voice detection can be reduced significantly.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of one embodiment of the present invention, FIG. 2 shows the speech waveform of "san" under low noise, FIG. 3 shows the speech waveform of "san" under high noise, and FIG. 4 is a block diagram of a conventional speech recognition device.

1 ... microphone; 2, 6 ... voice detection sections; 3 ... voice analysis section; 4, 7 ... matching sections; 5 ... standard pattern memory section; 8 ... judgment section; L1 ... detection level of the voice detection section 2; L2 ... detection level of the voice detection section 6; t1, t1' ... voice sections detected by the voice detection section 2; t2, t2' ... voice sections detected by the voice detection section 6.

Claims (1)

[Claims]

1. In a speech recognition device comprising a speech detection section that detects the speech section of input speech, a speech analysis section that obtains feature vectors of the input speech, a standard pattern memory section that stores standard patterns, and a matching section that recognizes the input speech by matching the feature vectors against the standard patterns, a speech recognition device characterized by comprising: a plurality of speech detection sections that detect speech sections with mutually different speech detection levels; a plurality of matching sections arranged in correspondence with the plurality of speech detection sections, or a single matching section used in time division; and a judgment section that compares the recognition results of the plural matching operations performed by the matching section, determines the one closest to a standard pattern, and outputs it.
JP2138798A 1990-05-29 1990-05-29 Speech recognizing device Pending JPH0431896A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2138798A JPH0431896A (en) 1990-05-29 1990-05-29 Speech recognizing device

Publications (1)

Publication Number Publication Date
JPH0431896A true JPH0431896A (en) 1992-02-04

Family

ID=15230480

Family Applications (1)

Application Number Title Priority Date Filing Date
JP2138798A Pending JPH0431896A (en) 1990-05-29 1990-05-29 Speech recognizing device

Country Status (1)

Country Link
JP (1) JPH0431896A (en)

Similar Documents

Publication Publication Date Title
US4811399A (en) Apparatus and method for automatic speech recognition
US5625747A (en) Speaker verification, speech recognition and channel normalization through dynamic time/frequency warping
CA1207907A (en) Speaker verification system
ATE250801T1 (en) METHOD AND DEVICE FOR DETECTING NOISE SIGNAL SAMPLES FROM A NOISE
KR20170073113A (en) Method and apparatus for recognizing emotion using tone and tempo of voice signal
US5159637A (en) Speech word recognizing apparatus using information indicative of the relative significance of speech features
KR100737358B1 (en) Method for verifying speech/non-speech and voice recognition apparatus using the same
JP2996019B2 (en) Voice recognition device
JPH0431896A (en) Speech recognizing device
JP3251460B2 (en) Speaker verification method and apparatus
Mishra et al. Speaker identification, differentiation and verification using deep learning for human machine interface
JP3114757B2 (en) Voice recognition device
JP2989231B2 (en) Voice recognition device
KR100246617B1 (en) Speech detection method using the continuous pitch information
JPH03160499A (en) Speech recognizing device
JPH01222299A (en) Voice recognizing device
JPH0316038B2 (en)
KR100281581B1 (en) Korean Continuous Number Speech Recognition Using Simultaneous Articulation Model
JP2712704B2 (en) Signal processing device
JPH04152397A (en) Voice recognizing device
JPH0395599A (en) Voice recognition system
KR100349656B1 (en) Apparatus and method for speech detection using multiple sub-detection system
JPH02272495A (en) Voice recognizing device
JPS5858598A (en) Voice recognition system
JPH0950292A (en) Voice recognition device