JPS63303395A - Voice recognition equipment with multi- amplification function - Google Patents

Voice recognition equipment with multi- amplification function

Info

Publication number
JPS63303395A
JPS63303395A JP62138240A JP13824087A JPS63303395A JP S63303395 A JPS63303395 A JP S63303395A JP 62138240 A JP62138240 A JP 62138240A JP 13824087 A JP13824087 A JP 13824087A JP S63303395 A JPS63303395 A JP S63303395A
Authority
JP
Japan
Prior art keywords
amplification
level
speech
signal
speech recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP62138240A
Other languages
Japanese (ja)
Inventor
羽金 廣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NEC Corp
Original Assignee
NEC Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by NEC Corp
Priority to JP62138240A
Publication of JPS63303395A

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] [Field of Industrial Application] The present invention relates to a speech recognition device that recognizes uttered speech, and in particular to a speech recognition device with a multi-amplification function that reliably extracts phoneme-level features from portions of speech uttered at a low level.

[Prior Art]

Conventionally, in this type of speech recognition device, before registration (a process performed by speaker-dependent speech recognition devices) or recognition is carried out, the gain of the amplifier that amplifies the speech signal from the microphone is either set by the speaker with a volume control or the like, or set automatically by the device according to the level of a test utterance by the speaker (this process is hereinafter called level setting). Speech recognition is then performed on the speech signal amplified at that gain.

[Problems to Be Solved by the Invention]

In the conventional speech recognition device described above, registration and recognition are performed after level setting, and the gain determined by level setting stays fixed throughout the subsequent registration and recognition processing, so the speech signal from the microphone is amplified at one constant gain. As a result, in intervals where the utterance is quiet, particularly consonant intervals, recognition must be performed on a speech signal that is insufficiently amplified (that is, quantized more coarsely after A/D conversion than the louder portions).
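The coarse quantization of quiet intervals can be illustrated with a small numerical sketch. The patent gives no converter resolution or gain values, so the 8-bit uniform quantizer, the sine test signal, and the gains 1 and 16 below are all assumptions chosen for illustration.

```python
import math

def quantize(sample, bits=8):
    """Uniformly quantize a sample in [-1.0, 1.0] to a signed integer code,
    clipping at full scale (assumed model of an A/D converter)."""
    full_scale = 2 ** (bits - 1) - 1
    clipped = max(-1.0, min(1.0, sample))
    return round(clipped * full_scale)

def distinct_codes(amplitude, gain, bits=8, n=200):
    """Count how many different A/D codes one sine period occupies
    when amplified by `gain` before conversion."""
    codes = {
        quantize(gain * amplitude * math.sin(2 * math.pi * k / n), bits)
        for k in range(n)
    }
    return len(codes)

# A quiet (consonant-like) segment at a fixed low gain versus a higher gain.
quiet_fixed = distinct_codes(amplitude=0.02, gain=1.0)
quiet_boosted = distinct_codes(amplitude=0.02, gain=16.0)
print(quiet_fixed, quiet_boosted)
```

At the low gain the quiet segment spans only a handful of quantization codes, while the same segment at the higher gain spans many more, which is the loss of precision the text describes.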

[Means for Solving the Problems]

When extracting phoneme-level features from uttered speech for recognition, the speech recognition device with a multi-amplification function of the present invention reads from a storage unit the speech signal whose gain is optimal for the time interval from which the features are to be extracted, and recognizes the speech as a whole while extracting phoneme-level features from that signal.

[Operation]

Because the present invention can extract features from a speech signal amplified at the optimum level, the recognition rate can be improved.

[Embodiment]

FIG. 1 is a block diagram showing an embodiment of the speech recognition device with a multi-amplification function according to the present invention. In the figure, MC is a microphone; A1 to A5 are amplifiers with mutually different gains a1 to a5; and C1 to C5 are A/D converters that convert the incoming analog speech signals into digital speech signals and output them. SD and ED are a start detection unit and an end detection unit, which detect the start and end of the speech signal and output a start detection signal S1 and an end detection signal S2, respectively. M1 to M5 are storage units that begin storing on receiving the start detection signal S1, store the digital speech signals output from the A/D converters C1 to C5, and stop storing on receiving the end detection signal S2. SE is a gain selection unit that, for each frame interval, selects from the storage units M1 to M5 the speech signal stored at the optimum gain for that interval and outputs it as a speech signal V. RC is a speech recognition unit that outputs the volume level of each frame interval as a volume level signal L, divides the incoming speech signal V into frames of, for example, several milliseconds, extracts phoneme-level features from each frame, matches them against standard frame patterns, and outputs a speech recognition result signal T with phoneme labels attached. FIG. 2 shows the volume variation of the uttered word "asa" (あさ, "morning"), divided into frames F1 to F15.

As phoneme features, the speech recognition unit RC uses, for example, vowels, fricatives, nasals, and plosives. A standard frame pattern for each is registered in advance. At recognition time, each frame is matched against the standard frame patterns and given the phoneme label name of the most similar pattern.
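The frame labeling step can be sketched as a nearest-pattern lookup. The patent does not say which features or similarity measure are used, so the two-dimensional feature vectors, the pattern values, and the use of Euclidean distance (smallest distance standing in for highest similarity) are hypothetical.

```python
import math

# Hypothetical standard frame patterns, one per phoneme class named in the
# text. The feature values are invented for illustration.
STANDARD_PATTERNS = {
    "vowel": (0.9, 0.1),
    "fricative": (0.2, 0.9),
    "nasal": (0.5, 0.2),
    "plosive": (0.1, 0.4),
}

def label_frame(features, patterns=STANDARD_PATTERNS):
    """Return the label of the registered pattern closest to `features`.
    Euclidean distance is an assumed stand-in for the similarity measure."""
    return min(patterns, key=lambda name: math.dist(features, patterns[name]))

print(label_frame((0.85, 0.15)))
print(label_frame((0.25, 0.80)))
```

Any other feature set or similarity measure could replace the distance function without changing the overall flow of registering patterns first and labeling frames afterward.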

Next, the operation of the speech recognition device with the multi-amplification function configured as above is described. First, the speech signal input from the microphone MC is amplified by the amplifiers A1 to A5, which have mutually different gains a1 to a5, and then fed to the A/D converters C1 to C5. The A/D converters C1 to C5 convert the incoming analog speech signals into digital speech signals and output them to the storage units M1 to M5. Meanwhile, the start detection unit SD and the end detection unit ED output the start detection signal S1 and the end detection signal S2 of the speech signal to the storage units M1 to M5. The storage units M1 to M5 therefore store the digital speech signals output from the A/D converters C1 to C5 from the time the start detection signal S1 arrives until the end detection signal S2 arrives.
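The capture path described above (parallel gains, A/D conversion, storage gated by S1 and S2) might be sketched as follows. The gain values, the 8-bit converter, and the threshold-plus-hangover rule used for start and end detection are assumptions; the patent does not specify how SD and ED decide.

```python
GAINS = [1.0, 2.0, 4.0, 8.0, 16.0]  # assumed values for gains a1..a5

def adc(x, bits=8):
    """Assumed A/D converter: clip to [-1, 1] and round to a signed code."""
    full_scale = 2 ** (bits - 1) - 1
    return round(max(-1.0, min(1.0, x)) * full_scale)

def capture(samples, threshold=0.01, hangover=3):
    """Store every gain channel in parallel between start and end detection.

    Start (S1) fires on the first sample at or above `threshold`; end (S2)
    fires after `hangover` consecutive samples below it. Both rules are
    hypothetical. The trailing quiet samples are stored as well.
    """
    memories = [[] for _ in GAINS]  # plays the role of M1..M5
    recording = False
    quiet_run = 0
    for x in samples:
        loud = abs(x) >= threshold
        if not recording:
            if not loud:
                continue
            recording = True  # start detection signal S1
        for memory, gain in zip(memories, GAINS):
            memory.append(adc(gain * x))
        quiet_run = 0 if loud else quiet_run + 1
        if quiet_run >= hangover:  # end detection signal S2
            break
    return memories

memories = capture([0.0, 0.0, 0.05, -0.04, 0.5, 0.0, 0.0, 0.0, 0.2])
print([len(m) for m in memories])
```

Every channel holds the same time span, so a later stage can pick a different channel for each frame without any realignment.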

The speech recognition unit RC then reads the speech signal from the storage units M1 to M5, extracts phoneme-level features from it, and recognizes the speech from start to end. First, as shown in FIG. 2 for example, the uttered word "asa" (morning) is divided into frames of several milliseconds, phoneme-level features are extracted from each frame, and each frame is given a phoneme label. Specifically, the speech recognition unit RC outputs the volume level signal L, derived from the volume level of frame F1, to the gain selection unit SE. The gain selection unit SE then selects from the storage units M1 to M5 the speech signal stored at the optimum gain for the interval of frame F1 and outputs it as the speech signal V. The speech recognition unit RC extracts features from the incoming speech signal V, matches them against the standard frame patterns, and attaches a phoneme label. Frames F2 to F15 are labeled in the same way.
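The per-frame selection performed by the gain selection unit can be sketched as below. The patent does not define what makes a gain "optimum" for a frame, so this sketch assumes one plausible rule: take the highest-gain channel whose codes do not reach full scale within the frame, falling back to the lowest gain if every channel clips.

```python
def select_frames(memories, frame_len, full_scale=127):
    """For each frame, pick a stored channel and return (index, frame codes).

    `memories` is ordered from lowest to highest gain. The selection rule
    (highest gain without reaching full scale) is an assumption.
    """
    total = len(memories[0])
    chosen = []
    for start in range(0, total, frame_len):
        best = 0  # fall back to the lowest gain if everything clips
        for index, memory in enumerate(memories):
            frame = memory[start:start + frame_len]
            if max(abs(code) for code in frame) < full_scale:
                best = index
        chosen.append((best, memories[best][start:start + frame_len]))
    return chosen

# Three hypothetical channels (gains 1, 2, 4) holding a quiet frame followed
# by a loud frame that clips on the two higher gains.
stored = [
    [6, -5, 64, 0],
    [13, -10, 127, 0],
    [25, -20, 127, 0],
]
selection = select_frames(stored, frame_len=2)
print(selection)
```

The quiet frame is taken from the highest-gain channel and the loud frame from the lowest, so each frame reaches the recognizer with as much quantization precision as its level allows.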

The word dictionary holds the phoneme label string of each word to be recognized (hereinafter, a word phoneme label string). The phoneme label string obtained from the unknown input signal is matched against the word phoneme label strings, and the word whose label string has the highest similarity is taken as the recognition result. Word recognition is performed in this way.
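Word-level matching of label strings might look like the sketch below. The similarity measure is not given in the patent, so edit distance over run-collapsed label strings is an assumed choice, and the dictionary entries are hypothetical.

```python
def collapse(labels):
    """Merge consecutive identical frame labels into one label."""
    merged = []
    for label in labels:
        if not merged or merged[-1] != label:
            merged.append(label)
    return merged

def edit_distance(a, b):
    """Levenshtein distance between two label sequences."""
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        current = [i]
        for j, y in enumerate(b, 1):
            current.append(min(previous[j] + 1,               # deletion
                               current[j - 1] + 1,            # insertion
                               previous[j - 1] + (x != y)))   # substitution
        previous = current
    return previous[-1]

# Hypothetical word phoneme label strings.
WORD_DICTIONARY = {
    "asa": ["vowel", "fricative", "vowel"],
    "ama": ["vowel", "nasal", "vowel"],
    "ta": ["plosive", "vowel"],
}

def recognize(frame_labels, dictionary=WORD_DICTIONARY):
    """Return the word whose label string is closest to the observed labels."""
    observed = collapse(frame_labels)
    return min(dictionary, key=lambda word: edit_distance(observed, dictionary[word]))

print(recognize(["vowel", "vowel", "fricative", "fricative", "vowel"]))
```

Collapsing repeated labels makes the comparison independent of how many frames each phoneme lasts; a duration-aware matcher would be a different design.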

Although word recognition has been used as the example above, the invention is not limited to it: by adding a language-level dictionary, such as inter-word connection information, to the word dictionary, continuously uttered speech can of course be recognized in the same way. Likewise, although phoneme labels were used as the features and matching as the recognition method, the invention is of course not limited to these. And although the description assumed five each of the amplifiers, A/D converters, and storage units, the invention is not limited to this number; any number may be used.

[Effects of the Invention]

As described in detail above, the speech recognition device with a multi-amplification function according to the present invention extracts phoneme-level features from low-level portions of speech, such as word beginnings, word endings, and consonants, using the speech signal amplified at the optimum level for each interval. This has the effect of reducing recognition errors and rejections.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing an embodiment of the speech recognition device with a multi-amplification function according to the present invention. FIG. 2 is a diagram showing the change in volume of the speech signal from the microphone when the speaker utters "asa".

MC: microphone; A1 to A5: amplifiers; C1 to C5: A/D converters; M1 to M5: storage units; SE: gain selection unit; RC: speech recognition unit; SD: start detection unit; ED: end detection unit; S1: start detection signal; S2: end detection signal; L: volume level signal; T: speech recognition result.

Claims (1)

[Claims] A speech recognition device with a multi-amplification function, comprising: a plurality of amplifiers, each with a different gain, that amplify the output of a microphone; a storage unit that temporarily stores the speech output signal of each amplifier from its start to its end; and means for reading from the storage unit the speech output signal with the optimum gain according to the volume of the uttered speech and performing recognition processing on it.
JP62138240A 1987-06-03 1987-06-03 Voice recognition equipment with multi- amplification function Pending JPS63303395A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP62138240A JPS63303395A (en) 1987-06-03 1987-06-03 Voice recognition equipment with multi- amplification function

Publications (1)

Publication Number Publication Date
JPS63303395A true JPS63303395A (en) 1988-12-09

Family

ID=15217349

Family Applications (1)

Application Number Title Priority Date Filing Date
JP62138240A Pending JPS63303395A (en) 1987-06-03 1987-06-03 Voice recognition equipment with multi- amplification function

Country Status (1)

Country Link
JP (1) JPS63303395A (en)

Similar Documents

Publication Publication Date Title
KR910002198B1 (en) Method and device for voice awareness (detection)
JP2007233412A (en) Method and system for speaker-independent recognition of user-defined phrase
JPS58130393A (en) Voice recognition equipment
JPS5862699A (en) Voice recognition equipment
JPH0582599B2 (en)
JPS63303395A (en) Voice recognition equipment with multi- amplification function
JPS63149699A (en) Voice input/output device
JPH11231897A (en) Speech recognition device and method
JPH0283593A (en) Noise adaptive speech recognizing device
JPS6312000A (en) Voice recognition equipment
JPS59224900A (en) Voice recognition system
JPS6131480B2 (en)
KR940005044B1 (en) Voice recognizing apparatus and voice recording method
JPS59195300A (en) Voice recognition equipment
JPH09198078A (en) Speech recognition device
JPS58195895A (en) Word voice recognition equipment
JPS63306498A (en) Voice section detecting system
JPS6135498A (en) Voice recognition equipment
JPH01158499A (en) Standing noise eliminaton system
JPS6076800A (en) Voice recognition system
JPS59180598A (en) Voice input system
JPS62223798A (en) Voice recognition equipment
JPS59176791A (en) Voice registration system
JPS6070497A (en) Voice recognition equipment
JPH07302098A (en) Word voice recognition device