JPH01310399A - Speech recognition device - Google Patents

Speech recognition device

Info

Publication number
JPH01310399A
JPH01310399A JP63141069A JP14106988A
Authority
JP
Japan
Prior art keywords
section
distance
microphone
voice
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP63141069A
Other languages
Japanese (ja)
Inventor
Tsuneo Nitta
恒雄 新田
Akira Nakayama
昭 中山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toshiba Corp
Toshiba Computer Engineering Corp
Original Assignee
Toshiba Corp
Toshiba Computer Engineering Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Toshiba Corp, Toshiba Computer Engineering Corp filed Critical Toshiba Corp
Priority to JP63141069A priority Critical patent/JPH01310399A/en
Publication of JPH01310399A publication Critical patent/JPH01310399A/en
Pending legal-status Critical Current


Abstract

PURPOSE: To accurately detect a speech section even in an environment where external noise is loud, by detecting the distance between the lips of a speaker and a microphone and entering the section in which the lips move as the next candidate for the speech section.

CONSTITUTION: A distance sensor 3 provided on a microphone device 1 detects variation in the distance between the microphone 2 and the lips of the speaker 21, including the motion of the lips and their peripheral parts when the speaker 21 speaks, and outputs it to a distance calculation part 6. The calculation part 6 converts the detection signal into a digital value and outputs it to a voicing section detection part 7. The detection part 7 uses an adaptively determined threshold value to regard the section in which the variation in the distance between the microphone 2 and the lips is largest as the voicing section, and outputs it to a time normalization part 8 as the next candidate for the speech section signal output from a speech section detection part 5. When external noise is too loud for the speech section to be detected through the microphone 2, the voicing section is used to find the speech section.

Description

Detailed Description of the Invention

[Object of the Invention]

(Industrial Application Field)

The present invention relates to a speech recognition device.

(Prior Art)

A speech recognition device is a device that takes in the voice uttered by a speaker, detects a speech section from the voice signal, temporally normalizes the signal of this speech section to obtain feature quantities, performs a similarity calculation between these feature quantities and standard patterns, and outputs the category with the highest score as the recognition result. Such devices are used in a variety of apparatuses that operate automatically in response to human voice.

In speech recognition with such a device, it is necessary, as preprocessing, to detect the speech section within the voice uttered by the speaker. In a conventional speech recognition device, the speech section is detected by inputting the speaker's voice through a microphone and, based on the acoustic feature parameters obtained by analyzing the input voice, locating the start and end of the speech section with an appropriate threshold value.

(Problem to be Solved by the Invention)

With this conventional method of speech section detection, however, in an environment with loud external noise, external noise may enter the microphone together with the speaker's voice. In that case the feature quantities obtained at analysis time contain the external noise components superimposed on the original speech components, so accurate detection of the speech section may become difficult. This is especially noticeable when the noise level is high or when sudden noise occurs.

The present invention was made in view of these circumstances, and its object is to provide a speech recognition device that can accurately detect speech sections without being affected by external noise.

[Structure of the Invention]

(Means for Solving the Problem)

To achieve the above object, the speech recognition device of the present invention comprises: a distance sensor that detects the distance between the speaker's lips and the microphone; distance calculation means that converts the signal detected by this distance sensor into a digital quantity; and utterance section detection means that extracts a stable distance-variation quantity from the distance time series output by the distance calculation means and detects the utterance section of the voice from the movement of the lips and their vicinity. The utterance section detected by the utterance section detection means is used as the next candidate for the speech section.

(Operation)

That is, the distance sensor detects the distance between the speaker's lips and the microphone while the speaker is speaking, and the section in which the lips are moving is regarded as the section in which the speaker is producing speech and is entered as the next candidate for the speech section. As a result, even when it is difficult to detect the speech section from a voice signal containing loud noise input through the microphone in an environment with loud external noise, the utterance section can be used to detect the speech section accurately.

(Embodiment)

An embodiment of the present invention will be described below with reference to the drawings.

An embodiment of the configuration of the speech recognition device of the present invention will be described with reference to FIG. 1. In the figure, reference numeral 1 denotes a microphone device in which a microphone 2 and a distance sensor 3 are integrally mounted; as shown in FIG. 2, it is worn by the speaker 21. The microphone 2, which is the voice input section, is of the close-talking type and is located a fixed distance in front of the lips 22 of the speaker 21. The distance sensor 3 is a sensor that measures the distance between the microphone 2 and the lips 22 of the speaker 21; since this distance changes as the lips 22 and their vicinity move during utterance, the sensor detects the change in the distance. For this purpose, the distance sensor 3 is mounted at a position from which the distance between the microphone 2 and the lips 22 can be detected accurately. An infrared sensor, an ultrasonic sensor, or the like is used as the distance sensor 3; among these, an infrared sensor is preferable because of its low noise. The microphone 2 and the distance sensor 3 are integrally supported by a support member 1a, which the speaker 21 wears. The microphone 2 and the distance sensor 3 send their signals to the main body of the device via a signal line (not shown).

Reference numeral 4 denotes an acoustic analysis section that receives the voice signal input from the microphone 2, extracts its acoustic feature parameters, and outputs the resulting signal to the speech section detection section 5. The speech section detection section 5 receives the signal from the acoustic analysis section 4, detects the speech section, and outputs the result to the time normalization section 8.

Reference numeral 6 denotes a distance calculation section that receives the detection signal from the distance sensor 3, converts it into a digital quantity, and outputs the result to the utterance section detection section 7. The utterance section detection section 7 receives the signal from the distance calculation section 6, detects the utterance section, and outputs the result to the time normalization section 8.

Reference numeral 8 denotes a time normalization section that receives the speech section signal and the utterance section signal output from the speech section detection section 5 and the utterance section detection section 7 respectively, obtains temporally normalized feature quantities from each signal, and outputs them to the similarity calculation section 9. The similarity calculation section 9 receives the signals from the time normalization section 8 and performs a similarity calculation against the standard patterns 10.

The case in which speech recognition is performed by the speech recognition device configured in this manner will now be described. The voice uttered by the speaker 21 wearing the microphone device 1 is input to the microphone 2, and the acoustic analysis section 4 extracts acoustic feature parameters from the voice signal. The extracted feature signal is output to the speech section detection section 5, which uses part of the feature quantities (for example, the power sequence) to detect the start and end of the speech section with an adaptively determined threshold. This speech section signal is output to the time normalization section 8.

Meanwhile, the distance sensor 3 provided on the microphone device 1 detects the change in the distance between the microphone 2 and the lips 22 accompanying the movement of the lips 22 and their vicinity when the speaker 21 utters a voice, and outputs the detection signal to the distance calculation section 6. The distance calculation section 6 converts this detection signal into a digital quantity and outputs it to the utterance section detection section 7. The utterance section detection section 7, using an adaptively determined threshold, detects the section in which the distance variation between the microphone 2 and the lips 22 is largest and regards it as the utterance section. This utterance section signal is output to the time normalization section 8 as the next candidate for the speech section signal output from the speech section detection section 5.

The time normalization section 8 receives the speech section signal and the utterance section signal output from the speech section detection section 5 and the utterance section detection section 7 respectively, and from the feature quantities extracted by the acoustic analysis section 4 (for example, the band-pass filter outputs) obtains two sets of feature quantities, one temporally normalized within the speech section and one within the utterance section, and outputs these signals to the similarity calculation section 9. The similarity calculation section 9 performs a similarity calculation between these two sets of feature quantities and the standard patterns 10, and outputs the category with the highest score as the speech recognition result.

Here, a speech section is determined from the voice uttered by the speaker 21, and in addition an utterance section is determined by attending to the movement of the lips 22 and their vicinity during speech; this utterance section is nominated for use as the next candidate for the speech section. Therefore, when external noise is loud and enters the microphone 2 together with the voice, making it difficult to detect the speech section from the feature quantities at analysis time, the utterance section can be used to determine the speech section. The speech section so obtained accurately matches the original speech section.

Next, the operation of the utterance section detection section 7 and the speech section detection section 5 will be explained to bring out the features of the device of the present invention. FIG. 3 outlines the internal processing flow of the utterance section detection section 7, which searches for the utterance section from the microphone-to-lips distance output by the distance calculation section 6. First, a smoothing process is applied to the time series data D(n) of the distance between the microphone 2 and the lips 22 to extract a time series d(n) of stable distance variation (step SP1). Next, over the interval n = 1 to 10 of this time series d(n), which can be regarded as a non-utterance interval, the average distance d^ is calculated (SP2), and a threshold dTH (= d^ + dO), obtained by adding an adaptively determined offset dO, is set (SP3).

The search for the utterance section SA, EA proceeds as follows: when an interval with d(n) > dTH has continued for several consecutive frames, the position where d(n) first exceeded dTH is taken as the start; then, after the interval with d(n) > dTH has continued for ten-odd frames or more, when an interval with d(n) < dTH continues for several consecutive frames, the position where d(n) first fell below dTH is determined as the end. The interval SA, EA is sent to the time normalization section 8 as the utterance section (SP4).

The speech section detection section 5 searches for the speech section by a similar method. In this case, however, the result is determined by feature quantities extracted from the voice signal input from the microphone 2, and those feature quantities do not necessarily derive from speech alone; they may include external noise components from around the microphone.

FIGS. 4 and 5 are diagrams showing the processing of the utterance section detection section 7 and the speech section detection section 5 when large external noise is present before and after the speech section. Since the utterance section SA, EA detected by the utterance section detection section 7 shown in FIG. 4 is found from the varying distance between the microphone and the lips, the section in which speech is considered to have been produced can be detected from the movement of the speaker's mouth entirely without the influence of external noise. In the processing of the speech section detection section 5 shown in FIG. 5, on the other hand, even with an adaptively determined threshold PTH, accurate detection of the speech section ST, ET becomes impossible because of the noise components, and a section SF, EF containing noise components is erroneously detected. Erroneous detection of the speech section is a fatal error for a speech recognition device. To avoid such a situation, by entering the utterance section SA, EA, which is unaffected by external noise even in an environment with relatively loud external noise, as the next candidate for the speech section, a more stable speech recognition device with a higher recognition rate can be realized.

The present invention is not limited to the embodiment described above and can be carried out with various modifications without departing from its gist. The microphone for voice input is not limited to the commonly used close-talking type; it may be changed according to the voice-capturing apparatus to which the device of the present invention is applied. For example, a telephone handset can be used as the microphone. In this case, the handset is fixed in a set position and the distance sensor is attached integrally to the handset.

Effect of the Invention

As explained above, according to the speech recognition device of the present invention, a distance sensor measures the variation in the distance between the speaker's lips and the microphone during utterance, the section in which the lips are moving is detected as the utterance section, and this section is entered as the next candidate for the speech section. Even when it is difficult to detect the speech section from the voice input through the microphone in an environment with loud noise, the utterance section can therefore be used to detect the speech section accurately without being affected by external noise, and a high recognition rate and stable accuracy can be obtained without restriction by the usage environment.

Brief Description of the Drawings

FIG. 1 is a system configuration diagram showing an embodiment of the speech recognition device of the present invention; FIG. 2 is an explanatory diagram showing the microphone device in this embodiment; FIG. 3 is a flowchart showing the processing of the utterance section detection section in this embodiment; FIG. 4 is a diagram showing how the utterance section is detected by the utterance section detection section; and FIG. 5 is a diagram showing how the speech section is detected by the speech section detection section.

1: microphone device; 2: microphone; 3: distance sensor; 4: acoustic analysis section; 5: speech section detection section; 6: distance calculation section; 7: utterance section detection section; 8: time normalization section; 9: similarity calculation section.

Applicant's representative: Patent attorney Takehiko Suzue

Claims (1)

[Claims]

1. In an apparatus that performs speech recognition by inputting the voice uttered by a speaker through a microphone, detecting a speech section from the voice signal, and performing a similarity calculation between standard patterns and feature quantities obtained by temporally normalizing the signal of this speech section, a speech recognition device characterized by comprising: a distance sensor that detects the distance between the lips of the speaker and the microphone; distance calculation means that converts the signal detected by this distance sensor into a digital quantity; and utterance section detection means that extracts a stable distance quantity from the distance time series of the signal output by this distance calculation means and detects the utterance section of the voice from the movement of the lips and their vicinity; wherein the utterance section detected by the utterance section detection means is used as the next candidate for the speech section.
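The decision rule of the claim — normalize the features over both the microphone-derived speech section and the lip-motion utterance section, score each against every standard pattern, and output the best-scoring category — can be sketched as follows. The linear resampling and the negative-squared-distance similarity are stand-ins chosen for the example; the patent does not specify the normalization or similarity measure it uses.

```python
def normalize_length(frames, target_len=16):
    """Linearly resample a feature-vector sequence to a fixed frame count
    (a simple stand-in for the time normalization section 8)."""
    idx = [round(i * (len(frames) - 1) / (target_len - 1))
           for i in range(target_len)]
    return [frames[i] for i in idx]

def similarity(a, b):
    """Illustrative similarity score: negative summed squared distance."""
    return -sum((x - y) ** 2 for fa, fb in zip(a, b) for x, y in zip(fa, fb))

def recognize(features, speech_sec, utter_sec, patterns, target_len=16):
    """Score both section candidates against every standard pattern and
    return the category with the highest score."""
    best_cat, best_score = None, None
    for start, end in (speech_sec, utter_sec):   # primary and next candidate
        seg = normalize_length(features[start:end + 1], target_len)
        for category, pattern in patterns.items():
            s = similarity(seg, pattern)
            if best_score is None or s > best_score:
                best_cat, best_score = category, s
    return best_cat
```

When noise makes the microphone-derived section too wide, its normalized features match the standard patterns poorly and the lip-motion candidate supplies the better score, which is exactly how the "next candidate" improves robustness.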
JP63141069A 1988-06-08 1988-06-08 Speech recognition device Pending JPH01310399A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP63141069A JPH01310399A (en) 1988-06-08 1988-06-08 Speech recognition device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP63141069A JPH01310399A (en) 1988-06-08 1988-06-08 Speech recognition device

Publications (1)

Publication Number Publication Date
JPH01310399A (en) 1989-12-14

Family

ID=15283514

Family Applications (1)

Application Number Title Priority Date Filing Date
JP63141069A Pending JPH01310399A (en) 1988-06-08 1988-06-08 Speech recognition device

Country Status (1)

Country Link
JP (1) JPH01310399A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0676899A3 (en) * 1994-04-06 1997-11-19 AT&T Corp. Audio-visual communication system having integrated perceptual speech and video coding
JP2006139117A (en) * 2004-11-12 2006-06-01 Kenwood Corp Microphone apparatus, utterance detector, utterance detecting method, and voice outputting method
JP4568905B2 (en) * 2004-11-12 2010-10-27 株式会社ケンウッド Microphone device and speech detection device

Similar Documents

Publication Publication Date Title
EP1503368B1 (en) Head mounted multi-sensory audio input system
US20190385605A1 (en) Method and system for providing voice recognition trigger and non-transitory computer-readable recording medium
JP2019101385A (en) Audio processing apparatus, audio processing method, and audio processing program
US20040015357A1 (en) Method and apparatus for rejection of speech recognition results in accordance with confidence level
JP3838159B2 (en) Speech recognition dialogue apparatus and program
WO2020250828A1 (en) Utterance section detection device, utterance section detection method, and utterance section detection program
JP3798530B2 (en) Speech recognition apparatus and speech recognition method
JPH01310399A (en) Speech recognition device
Yoshinaga et al. Audio-visual speech recognition using new lip features extracted from side-face images
Mathew et al. Piezoelectric Throat Microphone Based Voice Analysis
JP2005010652A (en) Speech detecting device
JPS6338993A (en) Voice section detector
JPH04184495A (en) Voice recognition device
JPH03114100A (en) Voice section detecting device
JP2005107384A (en) Device and method for speech recognition, program, and recording medium
JPH04324499A (en) Speech recognition device
JPH02178699A (en) Voice recognition device
JP2021162685A (en) Utterance section detection device, voice recognition device, utterance section detection system, utterance section detection method, and utterance section detection program
JPH0316038B2 (en)
JPH0546196A (en) Speech recognition device
JP3125928B2 (en) Voice recognition device
JP2000206986A (en) Language information detector
Ishi et al. Real-time audio-visual voice activity detection for speech recognition in noisy environments
Suk et al. Voice/non-voice classification using reliable fundamental frequency estimator for voice activated powered wheelchair control
Sahu et al. Odia isolated word recognition using DTW