JPS62237498A - Voice section detecting method - Google Patents

Voice section detecting method

Info

Publication number
JPS62237498A
Authority
JP
Japan
Prior art keywords
value
level
detection
input
section
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP61079304A
Other languages
Japanese (ja)
Other versions
JPH0740200B2 (en)
Inventor
陽一 山田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Oki Electric Industry Co Ltd
Original Assignee
Oki Electric Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oki Electric Industry Co Ltd
Priority to JP61079304A
Publication of JPS62237498A
Publication of JPH0740200B2
Anticipated expiration
Expired - Lifetime

Abstract

(57) [Abstract] This publication contains application data filed before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION (Industrial Field of Application) The present invention relates to a voice section detection method for use in a speech recognition device.

(Prior Art) Conventionally, voice sections are detected as part of speech recognition processing. First, the conventional voice section detection method is explained with reference to FIG. 4.

In the conventional voice section detection method, the input level value of the voice signal is expressed as a function S(t) of time t, and a level threshold L is set from the noise level value N at the time the voice signal is input, the input level value S(t), and similar quantities.

When the state in which the input level value S(t) exceeds the threshold L (S(t) > L) continues for at least a fixed time, namely the minimum high-level input duration for start point determination TS, the start time of this duration TS is taken as the start point of the voice section. Thereafter, when the state in which the input level value S(t) is at or below the threshold L (S(t) ≤ L) continues for at least a fixed time, namely the low-level input duration for end point determination TE, the start time of this duration TE is taken as the end point of the voice section. This is how the voice section was determined.

In this case, the noise level value N is the average of the input level values S(t) over a temporally continuous noise measurement interval TN of predetermined length, beginning at a time t0 at which no voice signal is assumed to be input. The level threshold L is then generally set to the noise level value N plus a predetermined constant C0, that is, L = N + C0.

With this method, let t1, t2, t3, and t4 be the times in FIG. 4 at which the input level value S(t) coincides with the level threshold L. The start of a section in which the input level value S(t) stays above this level threshold L for at least the aforementioned duration TS, for example time t3, is determined as the start point of the voice section.

Next, the start of a section in which the input level value stays below the level threshold L for at least the aforementioned duration TE, for example time t4, is determined as the end point of that voice section.
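The conventional procedure described above can be sketched in a few lines. This is only an illustrative sketch under stated assumptions: the input level is taken as a list of per-frame values, the durations TS and TE are counted in frames, and the function name and parameters are hypothetical rather than taken from the patent.

```python
from typing import List, Optional, Tuple


def conventional_section(
    s: List[float], noise_frames: int, c0: float, ts: int, te: int
) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices, or None if no section is found.

    Assumption: s[:noise_frames] contains no speech (the interval TN).
    """
    if noise_frames <= 0 or noise_frames > len(s):
        raise ValueError("noise_frames must select a non-empty prefix of s")
    n = sum(s[:noise_frames]) / noise_frames  # noise level N
    level = n + c0                            # level threshold L = N + C0

    start = None
    run = 0
    for i in range(noise_frames, len(s)):
        if start is None:
            # Look for S(t) > L lasting at least TS frames.
            run = run + 1 if s[i] > level else 0
            if run >= ts:
                start = i - ts + 1  # first frame of the qualifying run
                run = 0
        else:
            # Look for S(t) <= L lasting at least TE frames.
            run = run + 1 if s[i] <= level else 0
            if run >= te:
                return start, i - te + 1
    return None
```

A single threshold derived from noise measured before the utterance is used for both ends, which is the property the following section identifies as the weakness of this approach.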

(Problems to Be Solved by the Invention) However, when a speech recognition device is actually used in an environment where noise fluctuates strongly, it is generally considered that the probability that the input level value some time after the noise level measurement differs from the input level value at the measurement time increases with the elapsed time. At a time shortly after the noise level measurement, for example just before an utterance begins, the probability that the input level value (magnitude) differs from the noise level value is therefore relatively small, so the start point is likely to be detected stably and accurately with the level threshold set in the conventional manner described above.

At a time long after the noise level measurement, however, for example just after the utterance ends, the probability that the input level value (magnitude) differs from the noise level value is large. By the time the end point is to be detected, the initially set level threshold is no longer suited to end point detection, so the end point is more likely to be detected inaccurately, and speech recognition performance deteriorates as a result.

The present invention was made to solve the problem described above.

Accordingly, an object of the present invention is to provide a voice section detection method that can detect the end point of a voice section stably and accurately.

(Means for Solving the Problems) To achieve this object, the voice section detection method according to the present invention performs the following processing (see FIG. 1).

According to the invention, a temporary end is determined in a first stage, and the true end is determined in a second stage.

In the first stage, a level value is obtained that is smaller than the start point detection level threshold LS, which is the noise level value N of the input voice signal plus a predetermined positive constant C1, and larger than the noise level value N. This level value is set as the temporary end detection level threshold LKE. Next, the start time t3 of a section in which the input level value S(t) stays below this temporary end detection level threshold LKE for at least the predetermined minimum low-level input duration for temporary end determination TKE is determined as the temporary end of the voice section.

Next, in the second stage, the magnitude of the input level value immediately after the temporary end is taken as the end detection noise level value NE, and this noise level value NE plus a predetermined positive constant C3 is set as the end detection level threshold LE. The time t2 at which the stored input level value S(t) between the start point and the temporary end crosses this level value LE is then detected as the true end. The end detection level threshold LE may be set to the same value as the start point detection level threshold LS, or to a value larger or smaller than LS.

(Operation) In this way, according to the invention, the level threshold LE used to detect the true end is set from the input level value S(t) immediately after the temporary end is detected. End point detection is therefore less susceptible to temporal fluctuations in the noise, and voice section detection becomes stable and accurate.

(Embodiment) An embodiment of the voice section detection method of the present invention is described below with reference to the drawings.

FIG. 1 is an explanatory diagram for the voice section detection method of the present invention. It shows an example of how the input level value changes, with time t on the horizontal axis and the level value on the vertical axis. FIG. 2 is a block diagram showing an example configuration of a voice section detection unit for carrying out the method. FIG. 3 is a flowchart of the processing, used to explain the invention in detail.

The voice section detection unit shown in FIG. 2 comprises a level extraction unit 11, a temporary voice section detection threshold setting unit 12, a temporary voice section detection unit 13, a level storage unit 14, an end detection threshold setting unit 15, an end detection unit 16, and a control unit 17. In the following description, the processing steps of the flowchart are denoted by S.

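First, processing is started (S1). The input voice signal a1 is fed to the level extraction unit 11, which extracts its level and converts it into the input level signal a2 (S2). The input level value of this input level signal a2 is denoted S(t) and is shown by the solid line in FIG. 1. The input level signal a2 is output to the temporary voice section detection threshold setting unit 12, the temporary voice section detection unit 13, and the level storage unit 14.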
1 is input to the level detection section 11, and its level is extracted and converted into an input level signal a2 (S2). The input level value of this input level signal a2 is defined as sB) and is shown by a solid line in FIG. This input level signal a2 is outputted to the provisional speech section detection threshold setting section (2), the provisional speech section detection section 13, and the level storage section 14.

At a time when no voice is assumed to be uttered, the control unit 17 outputs the temporary voice section detection threshold setting command signal r1 to the temporary voice section detection threshold setting unit 12.

The temporary voice section detection threshold setting unit 12 receives the input level signal a2 for a predetermined time period TSN beginning at the time t0 at which the threshold setting command signal r1 was input, and sets the average of the input level values S(t) over this period as the noise level value N. Next, the value obtained by adding a positive constant C1, determined in advance by learning, to this noise level value N is determined as the start point detection level threshold LS (S3).
The average value of (shi) is set as the noise level value N, that is. Next, a value obtained by adding a ninety-plus constant C1 learned and determined in advance to this noise level value N is determined as a level threshold LS for start edge detection (S3).

LS = N + C1

The signal b1 carrying this level threshold LS is sent to the temporary voice section detection unit 13.

Next, the temporary voice section detection threshold setting unit 12 adds to the noise level value N a positive constant C2, which is determined in advance by learning and is smaller than the constant C1 used to set the start point detection level threshold, and determines the result as the temporary end detection level threshold LKE (S4).

LKE = N + C2

Alternatively, the temporary end detection level threshold LKE may be determined by subtracting a predetermined positive constant C'2 from the start point detection level threshold LS (the noise level value N of the input voice signal plus the predetermined positive constant C1), so as to obtain a level value smaller than LS and larger than N, and setting this level value as LKE.
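As a rough illustration of steps S3 and S4, the two thresholds can be computed from noise-only frames as follows. The function names are hypothetical, and the range checks on C2 and C'2 are an assumption added here so that N < LKE < LS always holds, as the text requires.

```python
from typing import List, Tuple


def noise_level(noise_frames: List[float]) -> float:
    """Noise level N: average input level over the period TSN."""
    if not noise_frames:
        raise ValueError("at least one noise frame is required")
    return sum(noise_frames) / len(noise_frames)


def thresholds_by_addition(n: float, c1: float, c2: float) -> Tuple[float, float]:
    """LS = N + C1 and LKE = N + C2, with 0 < C2 < C1."""
    if not 0 < c2 < c1:
        raise ValueError("need 0 < C2 < C1 so that N < LKE < LS")
    return n + c1, n + c2


def thresholds_by_subtraction(n: float, c1: float, c2_prime: float) -> Tuple[float, float]:
    """LS = N + C1 and LKE = LS - C'2, with 0 < C'2 < C1."""
    if not 0 < c2_prime < c1:
        raise ValueError("need 0 < C'2 < C1 so that N < LKE < LS")
    ls = n + c1
    return ls, ls - c2_prime
```

The two variants give the same LKE whenever C'2 = C1 - C2, so the choice between them is only a matter of how the constants are specified.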

The temporary voice section detection threshold setting unit 12 sends the signal b2 carrying this level threshold LKE to the temporary voice section detection unit 13, and sends the temporary voice section detection threshold setting completion signal b3 to the control unit 17.

When the control unit 17 is supplied with this threshold setting completion signal b3, it outputs the temporary voice section detection command signal r2 to the temporary voice section detection unit 13.

After receiving the temporary voice section detection command signal r2, the temporary voice section detection unit 13 takes the input level signal a2, the start point detection level threshold signal b1, and the temporary end detection level threshold signal b2 as inputs, starts detecting the temporary voice section, and detects the start point and the temporary end.

In detecting the start time t1, the unit checks whether, from the time t1 at which the input level value S(t) comes to coincide with the start point detection level threshold LS, the input level value S(t) remains larger than LS for at least the minimum high-level input duration for start point determination TS, which is determined in advance by learning; that is, whether S(t) > LS holds for a period of at least TS. When this is detected, the start time t1 of this duration TS is determined as the start point of the voice section (S5).
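The start point test of step S5 amounts to finding the first run of frames above LS that lasts at least TS. A minimal sketch, assuming per-frame level values and TS counted in frames (the function name is hypothetical):

```python
from typing import List, Optional


def detect_start(s: List[float], ls: float, ts: int) -> Optional[int]:
    """Return the first index t1 such that S(t) > LS for at least ts
    consecutive frames starting at t1, or None if there is no such run."""
    run = 0
    for i, level in enumerate(s):
        run = run + 1 if level > ls else 0  # any frame at or below LS resets the run
        if run >= ts:
            return i - ts + 1
    return None
```

Resetting the counter on every frame at or below LS means short bursts above the threshold, shorter than TS, are ignored rather than reported as speech.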

In detecting the temporary end, the unit checks whether, from the time t3 at which the input level value S(t), falling after the start point has been detected, comes to coincide with the temporary end detection level threshold LKE, the input level value S(t) remains at or below LKE for at least the minimum low-level input duration for temporary end determination TKE, which is determined in advance by learning; that is, whether S(t) ≤ LKE holds for a period of at least TKE. When this is detected, the start time t3 of this duration TKE is determined as the temporary end of the temporary voice section (S6).
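Step S6 mirrors the start point test with the inequality reversed and the lower threshold LKE. The sketch below makes the same assumptions (per-frame levels, TKE in frames, hypothetical names) and begins its search at the already detected start index t1.

```python
from typing import List, Optional


def detect_temporary_end(
    s: List[float], t1: int, lke: float, tke: int
) -> Optional[int]:
    """Return the first index t3 >= t1 such that S(t) <= LKE for at least
    tke consecutive frames starting at t3, or None if there is no such run."""
    run = 0
    for i in range(t1, len(s)):
        run = run + 1 if s[i] <= lke else 0  # a frame above LKE resets the run
        if run >= tke:
            return i - tke + 1
    return None
```

Because LKE lies below LS, the temporary end tends to fall later in the decaying tail of the utterance than an end found with LS would, which leaves room for the second stage to move it back.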

The signal d1 carrying the start time t1 detected in this way is output to the level storage unit 14, the end detection unit 16, and the control unit 17 (S5), and the signal d2 carrying the detected temporary end time t3 is output to the level storage unit 14, the end detection threshold setting unit 15, the end detection unit 16, and the control unit 17 (S6).

The level storage unit 14 receives the signals d1 and d2 carrying the start time t1 and the temporary end time t3, respectively.

When the start time signal d1 is input, the unit begins storing the input level value S(t) of the input level signal a2 from the start time t1, and continues storing it until a predetermined time, determined in advance by learning, has elapsed after the temporary end time t3.

After receiving the temporary end time signal d2 from the temporary voice section detection unit 13, the control unit 17 outputs the end detection threshold setting command signal r3 to the end detection threshold setting unit 15.

After receiving the end detection threshold setting command signal r3 from the control unit 17, the end detection threshold setting unit 15 receives from the level storage unit 14, as the end detection noise level signal e1, the input level values S(t) for a predetermined end detection noise measurement time TEN, counted in the positive time direction from the time t3 given by the temporary end time signal d2 from the temporary voice section detection unit 13. The average of the input level values S(t) over this noise measurement time TEN is set as the end detection noise level value NE. A positive constant C3, determined in advance by learning, is then added to this noise level value NE, and the result is set as the end detection level threshold LE (S7). That is, LE = NE + C3. By choosing the constant C3 appropriately, the end detection level threshold LE can be set to the same value as the start point detection level threshold LS, or to a value larger or smaller than LS.
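Step S7 can be sketched as an average over the stored levels that follow the temporary end. This is an illustrative sketch only: the function name is hypothetical, TEN is counted in frames, and the window is assumed to start at the frame t3 itself.

```python
from typing import List


def end_detection_threshold(s: List[float], t3: int, ten: int, c3: float) -> float:
    """Return LE = NE + C3, where NE is the mean of the ten stored
    level values starting at the temporary end index t3."""
    if ten <= 0:
        raise ValueError("ten must be positive")
    window = s[t3 : t3 + ten]
    if len(window) < ten:
        raise ValueError("not enough stored frames after the temporary end")
    ne = sum(window) / ten  # end detection noise level NE
    return ne + c3
```

The length check reflects the requirement in the text that level storage continue for some time past t3; without those extra frames NE cannot be measured.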

The signal f1 carrying this level threshold LE is output to the end detection unit 16, and the end detection threshold setting completion signal f2 is output to the control unit 17.

When the control unit 17 receives the end detection threshold setting completion signal f2, it outputs the end detection command signal r4 to the end detection unit 16.

After receiving the end detection command signal r4, the end detection unit 16 receives from the level storage unit 14 the signal e2 carrying the input level values S(t) of the temporary voice section, which runs from the time t1 given by the start time signal d1 to the time t3 given by the temporary end time signal d2, both supplied by the temporary voice section detection unit 13. It also receives the end detection level threshold signal f1 from the end detection threshold setting unit 15.

The end detection unit 16 then compares the input level value S(t) of the temporary voice section, given by the signal e2, with the end detection level threshold LE, given by the signal f1, working from the temporary end time t3 in the negative time direction. The time at which the input level value S(t) of the temporary voice section first becomes larger than the end detection level value LE, for example t2, is detected as the true end time (S8).
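The backward comparison of step S8 can be sketched as a reverse scan over the stored levels. The function name is hypothetical, and returning the index of the last frame above LE is one possible reading of "the time at which the level first becomes larger than LE" when scanning backwards.

```python
from typing import List, Optional


def detect_true_end(s: List[float], t1: int, t3: int, le: float) -> Optional[int]:
    """Scan from the temporary end t3 back towards the start point t1 and
    return the first index whose level exceeds LE, or None if none does."""
    for i in range(t3, t1 - 1, -1):
        if s[i] > le:
            return i
    return None
```

Scanning backwards from t3 selects the crossing closest to the temporary end, so earlier dips below LE inside the utterance do not cut the section short.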

The signal g carrying the end time determined in this way is output to the control unit 17, and the voice section detection processing ends (S9).
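Steps S3 to S8 can be tied together in one compact sketch. It rests on the same assumptions as the description of discrete processing would suggest but which the text does not state: per-frame levels, all durations counted in frames, and a noise-only prefix of TSN frames. All function and parameter names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def first_run(
    s: List[float], begin: int, length: int, cond: Callable[[float], bool]
) -> Optional[int]:
    """Start index of the first run of `length` frames satisfying cond."""
    run = 0
    for i in range(begin, len(s)):
        run = run + 1 if cond(s[i]) else 0
        if run >= length:
            return i - length + 1
    return None


def detect_section(
    s: List[float], tsn: int, c1: float, c2: float, c3: float,
    ts: int, tke: int, ten: int,
) -> Optional[Tuple[int, int]]:
    """Return (start point, true end) as frame indices, or None."""
    n = sum(s[:tsn]) / tsn                           # S3: noise level N
    ls, lke = n + c1, n + c2                         # S3, S4: LS and LKE
    t1 = first_run(s, tsn, ts, lambda x: x > ls)     # S5: start point
    if t1 is None:
        return None
    t3 = first_run(s, t1, tke, lambda x: x <= lke)   # S6: temporary end
    if t3 is None or t3 + ten > len(s):
        return None
    le = sum(s[t3 : t3 + ten]) / ten + c3            # S7: LE = NE + C3
    for i in range(t3, t1 - 1, -1):                  # S8: scan back from t3
        if s[i] > le:
            return t1, i
    return None
```

For example, with the levels `[3, 3, 3, 3, 10, 10, 10, 8, 6, 4, 1, 1, 1, 1, 1, 1]`, where the background noise drops from 3 to 1 during the utterance, and `tsn=4, c1=4, c2=2, c3=3, ts=2, tke=3, ten=4`, the temporary end is frame 9, the re-measured threshold LE is 4.75, and the function returns `(4, 8)`.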

(Effects of the Invention) As is clear from the above description, in order to cope with noise fluctuations while the speech recognition device is in use, the invention sets the end detection level threshold with reference to the average of the input level values immediately after the temporary end. End point detection can therefore be performed without being affected by temporal fluctuations in the ambient noise level, so voice sections can be detected stably and accurately, and an improvement in the recognition performance of the speech recognition device can be expected.

[Brief Description of the Drawings]

FIG. 1 is an explanatory diagram showing how the input level value of the input voice signal changes over time, used to explain an embodiment of the voice section detection method according to the present invention.
FIG. 2 is a block diagram showing the voice section detection unit, used to explain the voice section detection method of the present invention.
FIG. 3 is a flowchart of the processing of the voice section detection method of the present invention.
FIG. 4 is an explanatory diagram of the conventional voice section detection method.

11: level extraction unit
12: temporary voice section detection threshold setting unit
13: temporary voice section detection unit
14: level storage unit
15: end detection threshold setting unit
16: end detection unit
17: control unit

Patent applicant: Oki Electric Industry Co., Ltd.

Claims (2)

[Claims]

(1) A voice section detection method in which the average of the input level values of an input voice signal over a predetermined time period preceding the voice section is taken as the noise level, and the voice section is detected by comparing the input level value with a level threshold set with reference to that noise level, the method being characterized by:
setting, as a temporary end detection level threshold, a level value that is smaller than a start point detection level threshold, obtained by adding a predetermined positive constant to the noise level value of the input voice signal, and larger than the noise level value;
determining, as the temporary end of the voice section, the start time of a section in which the input level value stays below the temporary end detection level threshold for at least a predetermined minimum low-level input duration for temporary end determination;
taking the magnitude of the input level value immediately after the temporary end as an end detection noise level value, and setting the value obtained by adding a predetermined positive constant to that noise level value as an end detection level threshold; and
detecting, as the true end, the time closest to the temporary end among the times at which the stored input level value between the start point and the temporary end crosses the end detection level threshold.
(2) The voice section detection method according to claim 1, characterized in that the end detection level threshold is set to the same value as the start point detection level threshold, or to a value larger or smaller than that level threshold.
JP61079304A 1986-04-08 1986-04-08 Voice section detection method Expired - Lifetime JPH0740200B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP61079304A JPH0740200B2 (en) 1986-04-08 1986-04-08 Voice section detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP61079304A JPH0740200B2 (en) 1986-04-08 1986-04-08 Voice section detection method

Publications (2)

Publication Number Publication Date
JPS62237498A true JPS62237498A (en) 1987-10-17
JPH0740200B2 JPH0740200B2 (en) 1995-05-01

Family

ID=13686101

Family Applications (1)

Application Number Title Priority Date Filing Date
JP61079304A Expired - Lifetime JPH0740200B2 (en) 1986-04-08 1986-04-08 Voice section detection method

Country Status (1)

Country Link
JP (1) JPH0740200B2 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4810044B2 (en) * 2000-01-27 2011-11-09 ニュアンス コミュニケーションズ オーストリア ゲーエムベーハー Voice detection device with two switch-off criteria
US8099277B2 (en) 2006-09-27 2012-01-17 Kabushiki Kaisha Toshiba Speech-duration detector and computer program product therefor
US8380500B2 (en) 2008-04-03 2013-02-19 Kabushiki Kaisha Toshiba Apparatus, method, and computer program product for judging speech/non-speech

Also Published As

Publication number Publication date
JPH0740200B2 (en) 1995-05-01

Similar Documents

Publication Publication Date Title
JPS5852695A (en) Voice detector for vehicle
JPS62237498A (en) Voice section detecting method
JPH1195785A (en) Voice segment detection system
JP2002073061A (en) Voice recognition device and its method
JPS61259296A (en) Voice section detection system
JP3360978B2 (en) Voice recognition device
JPH0311139B2 (en)
JPS61140999A (en) Voice section detection system
JPH0138320B2 (en)
JPS61156100A (en) Voice recognition equipment
JPS62141595A (en) Voice detection system
JPS6247319B2 (en)
JP2737109B2 (en) Voice section detection method
JP3484559B2 (en) Voice recognition device and voice recognition method
JP3031081B2 (en) Voice recognition device
JP2748383B2 (en) Voice recognition method
JPS61223796A (en) Voice section detection system
JPS6120880B2 (en)
JP3063856B2 (en) Finding the minimum value of matching distance value in speech recognition
JPS63155196A (en) Voiceless sound detection
JPS5834986B2 (en) Adaptive voice detection circuit
KR0128669B1 (en) Real time detecting method for voice signal
JPS61113100A (en) Voice parameter detector
JP2003271189A (en) Circuit for detecting speaker direction and detecting method thereof
JPS63291096A (en) Voice section detecting system