JPS62237498A

JPS62237498A - Voice section detecting method

Info

Publication number: JPS62237498A
Application number: JP61079304A
Authority: JP
Inventors: 陽一山田
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1986-04-08
Filing date: 1986-04-08
Publication date: 1987-10-17
Anticipated expiration: 2010-05-01
Also published as: JPH0740200B2

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] (Industrial Field of Application) This invention relates to a voice section detection method for a speech recognition device.

(Prior Art) Voice sections have conventionally been detected as part of speech recognition processing. The conventional voice section detection method is first described with reference to FIG. 4.

In the conventional voice section detection method, with the input level value of the voice signal written as a function S(t) of time t, a level threshold L is set from the noise level value N at the time the voice signal is input, the input level value S(t), and similar quantities.

When the state in which the input level value S(t) is greater than the threshold L (S(t) > L) continues for at least a fixed time, namely the minimum high-level input duration for start determination TS, the start time of that duration TS is taken as the start of the voice section. Thereafter, when the state in which the input level value S(t) is no greater than the threshold L (S(t) ≤ L) continues for at least a fixed time, namely the low-level input duration for end determination TE, the start time of that duration TE is taken as the end of the voice section. The voice section was determined by this decision rule.

Here, the noise level value N is the average of the input level value S(t) over a temporally continuous noise measurement interval TN of predetermined length, starting from a time t0 at which no voice signal is assumed to be being input. The level threshold L was generally set by adding a predetermined constant C0 to the noise level value N, that is, L = N + C0.

With this method, let t1, t2, t3 and t4 be the times in FIG. 4 at which the level threshold L and the input level value S(t) coincide. The start point of a section in which the input level value S(t) stays above the level threshold L for at least the aforementioned duration TS, for example time t3, is determined to be the start of the voice section.

Next, the start point of a section in which the input level value stays below the level threshold L for at least the aforementioned duration TE, for example time t4, is determined to be the end of that voice section.
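The conventional rule described above can be sketched as a small state machine. This is only an illustration under stated assumptions: the input is taken to be a list of per-frame level values, the durations TS and TE are counted in frames, and the function name `detect_conventional` and its parameters are hypothetical, not taken from the patent.

```python
def detect_conventional(levels, t0, tn, c0, ts, te):
    """Sketch of the conventional single-threshold detector.

    levels: per-frame input level values S(t) (assumed discrete frames)
    t0, tn: first frame and length of the noise measurement interval TN
    c0:     constant added to the noise level, L = N + C0
    ts, te: minimum run lengths (in frames) for start and end decisions
    Returns (start, end); either may be None if not found.
    """
    window = levels[t0:t0 + tn]
    n = sum(window) / len(window)      # noise level value N
    threshold = n + c0                 # level threshold L = N + C0

    start = end = None
    run_start, run_len = None, 0
    for i in range(t0 + tn, len(levels)):
        above = levels[i] > threshold
        # Before the start is found we look for a run above L,
        # afterwards for a run at or below L.
        wanted = above if start is None else not above
        if wanted:
            if run_len == 0:
                run_start = i
            run_len += 1
        else:
            run_len = 0
        needed = ts if start is None else te
        if run_len >= needed:
            if start is None:
                start = run_start      # start of the run of length TS
                run_len = 0
            else:
                end = run_start        # start of the run of length TE
                break
    return start, end
```

In this sketch a burst above L that is shorter than TS frames is ignored, and both decisions use the same threshold L fixed before the utterance, which is the property the following paragraphs criticize.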

(Problem to Be Solved by the Invention) However, when a speech recognition device is actually used in an environment where the noise fluctuates strongly, it is generally considered that the probability that the input level value at some time after the noise level measurement differs from the input level value at the measurement time increases with the elapsed time. Accordingly, at a time shortly after the noise level measurement, for example just before an utterance begins, the input level value is relatively unlikely to differ from the noise level value, so the start is likely to be detected stably and accurately with a level threshold set in the conventional manner described above.

At a time relatively long after the noise level measurement, however, for example just after an utterance ends, the input level value is likely to differ from the noise level value. By the time the end is to be detected, the initially set level threshold is therefore no longer suited to end detection, the end becomes more likely to be detected inaccurately, and speech recognition performance degrades as a result.

This invention was made to solve the problem described above.

Accordingly, an object of this invention is to provide a voice section detection method that can detect the end of a voice section stably and accurately.

(Means for Solving the Problem) To achieve this object, the voice section detection method of this invention performs the following processing (see FIG. 1).

According to this invention, a provisional end is determined in a first stage and the true end is determined in a second stage.

In the first stage, a level value is obtained that is smaller than the start detection level threshold LS (itself obtained by adding a predetermined positive constant C1 to the noise level value N of the input voice signal) and larger than the noise level value N, and this level value is set as the provisional end detection level threshold LKE. Next, the start time t3 of a section in which the input level value S(t) stays below this provisional end detection level threshold LKE for at least the predetermined minimum low-level input duration for provisional end determination TKE is determined to be the provisional end of the voice section.

Next, in the second stage, the magnitude of the input level value immediately after the provisional end is taken as the end detection noise level value NE, and the value obtained by adding a predetermined positive constant C3 to this noise level value NE is taken as the end detection level threshold LE. Then the time t2 at which the stored input level value S(t), recorded from the start to the provisional end, crosses this level value LE is detected as the true end. The end detection level threshold LE may be set equal to the start detection level threshold LS, or to a value larger or smaller than LS.

(Operation) Thus, according to this invention, the level threshold LE for true end detection is set from the input level value S(t) immediately after the provisional end is detected. End detection is therefore less affected by temporal fluctuation of the noise, and voice section detection becomes stable and accurate.

(Embodiment) An embodiment of the voice section detection method of this invention is described below with reference to the drawings.

FIG. 1 is an explanatory diagram for describing the voice section detection method of this invention. It shows an example of how the input level value changes, with time t on the horizontal axis and the level value on the vertical axis. FIG. 2 is a block diagram showing one configuration example of a voice section detection unit for carrying out the method. FIG. 3 is a flowchart of the processing, provided to explain this invention.

The voice section detection unit shown in FIG. 2 comprises a level extraction unit 11, a provisional voice section detection threshold setting unit 12, a provisional voice section detection unit 13, a level storage unit 14, an end detection threshold setting unit 15, an end detection unit 16 and a control unit 17. In the following description, processing steps of the flowchart are denoted by S.

First, processing is started (S1). The input voice signal a1 is input to the level extraction unit 11, which extracts its level and converts it into the input level signal a2 (S2). The input level value of this input level signal a2 is denoted S(t) and is shown by the solid line in FIG. 1. The input level signal a2 is output to the provisional voice section detection threshold setting unit 12, the provisional voice section detection unit 13 and the level storage unit 14.

At a time when no voice is assumed to be being uttered, the control unit 17 outputs the provisional voice section detection threshold setting command signal r1 to the provisional voice section detection threshold setting unit 12.

The provisional voice section detection threshold setting unit 12 takes in the input level signal a2 for a predetermined time period TSN, starting from the time t0 at which the provisional voice section detection threshold setting command signal r1 was input, and sets the average of the input level value S(t) over this period as the noise level value N. Next, the value obtained by adding to this noise level value N a positive constant C1, determined in advance by learning, is determined as the start detection level threshold LS (S3).

LS = N + C1

The signal b1 carrying this level threshold LS is sent to the provisional voice section detection unit 13.
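Step S3 amounts to a windowed mean followed by an offset. The sketch below assumes per-frame level values in a list and a measurement period TSN counted in frames; the names `mean_level` and `start_threshold` are hypothetical.

```python
def mean_level(levels, begin, length):
    """Average input level over `length` frames starting at frame `begin`."""
    window = levels[begin:begin + length]
    if not window:
        raise ValueError("empty measurement window")
    return sum(window) / len(window)


def start_threshold(levels, t0, tsn, c1):
    """Return (N, LS): noise level N over TSN frames from t0, and LS = N + C1."""
    n = mean_level(levels, t0, tsn)
    return n, n + c1
```

For example, with levels 2, 4 and 6 in the measurement window and C1 = 5, the sketch gives N = 4.0 and LS = 9.0.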

Next, the provisional voice section detection threshold setting unit 12 adds to this noise level value N a positive constant C2, determined in advance by learning and smaller than the constant C1 used to set the start detection level threshold, and determines the result as the provisional end detection level threshold LKE (S4).

LKE = N + C2

Alternatively, the provisional end detection level threshold LKE may be determined by subtracting a predetermined positive constant C'2 from the start detection level threshold LS (obtained by adding the predetermined positive constant C1 to the noise level value N of the input voice signal), so as to obtain a level value smaller than LS and larger than the noise level value N, and setting this level value as LKE.
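The two ways of choosing LKE can be written side by side. This is a sketch only: the function names are hypothetical, and the explicit range checks are an added assumption that simply enforces the stated condition N < LKE < LS.

```python
def provisional_end_threshold_from_noise(n, c1, c2):
    """LKE = N + C2, with 0 < C2 < C1 so that N < LKE < LS."""
    if not 0 < c2 < c1:
        raise ValueError("C2 must be positive and smaller than C1")
    return n + c2


def provisional_end_threshold_from_start(n, c1, c2_prime):
    """LKE = LS - C'2, where LS = N + C1, keeping N < LKE < LS."""
    ls = n + c1
    lke = ls - c2_prime
    if not n < lke < ls:
        raise ValueError("C'2 must leave LKE strictly between N and LS")
    return lke
```

With N = 10 and C1 = 6, choosing C2 = 2 or C'2 = 4 yields the same LKE = 12, so the two formulations are interchangeable when C'2 = C1 - C2.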

The provisional voice section detection threshold setting unit 12 sends the signal b2 carrying this level threshold LKE to the provisional voice section detection unit 13, and sends the provisional voice section detection threshold setting end signal b3 to the control unit 17.

When the control unit 17 is supplied with this provisional voice section detection threshold setting end signal b3, it outputs the provisional voice section detection command signal r2 to the provisional voice section detection unit 13.

After receiving the provisional voice section detection command signal r2, the provisional voice section detection unit 13 starts detecting the provisional voice section, taking as inputs the input level signal a2, the start detection level threshold signal b1 and the provisional end detection level threshold signal b2, and detects the start and the provisional end.

In detecting the start time t1, the unit checks whether, from the time t1 at which the input level value S(t) comes to coincide with the start detection level threshold LS, the input level value S(t) remains greater than LS for at least the minimum high-level input duration for start determination TS, predetermined by learning; that is, whether S(t) > LS holds for a period of at least TS. When this is detected, the start time t1 of this duration TS is determined to be the start of the voice section (S5).
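As a rough illustration of step S5, the start can be found by scanning for the first run of frames above LS that lasts at least TS frames. The frame-based representation and the name `find_start` are assumptions made for this sketch.

```python
def find_start(levels, ls, ts, begin=0):
    """Return the first frame of the first run of at least `ts` consecutive
    frames with level > ls, scanning from `begin`; None if there is none."""
    run_start = None
    for i in range(begin, len(levels)):
        if levels[i] > ls:
            if run_start is None:
                run_start = i          # candidate start time t1
            if i - run_start + 1 >= ts:
                return run_start       # run has lasted TS frames
        else:
            run_start = None           # run broken, discard candidate
    return None
```

The returned index is the beginning of the run, not the frame at which the run reaches TS, matching the rule that the start time of the duration TS is taken as the start of the voice section.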

In detecting the provisional end, the input level value S(t) falls after the start has been detected. The unit checks whether, from the time t3 at which S(t) comes to coincide with the provisional end detection level threshold LKE, the input level value S(t) remains at or below LKE for at least the minimum low-level input duration for provisional end determination TKE, predetermined by learning; that is, whether S(t) ≤ LKE holds for a period of at least TKE. When this is detected, the start time t3 of this duration TKE is determined to be the provisional end of the provisional voice section (S6).
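Step S6 mirrors the start search with the comparison reversed. The following sketch assumes per-frame levels and a TKE given in frames; `find_provisional_end` is a hypothetical name.

```python
def find_provisional_end(levels, start, lke, tke):
    """Return the first frame, at or after `start`, of a run of at least
    `tke` consecutive frames with level <= lke; None if there is none."""
    run_start = None
    for i in range(start, len(levels)):
        if levels[i] <= lke:
            if run_start is None:
                run_start = i          # candidate provisional end t3
            if i - run_start + 1 >= tke:
                return run_start
        else:
            run_start = None           # level rose again, keep searching
    return None
```

Because LKE lies below LS, a short dip inside the utterance that does not last TKE frames does not terminate the provisional voice section in this sketch.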

The signal d1 carrying the start time t1 detected in this way is output to the level storage unit 14, the end detection unit 16 and the control unit 17 (S5), and the signal d2 carrying the detected provisional end time t3 is output to the level storage unit 14, the end detection threshold setting unit 15, the end detection unit 16 and the control unit 17 (S6).

The signals d1 and d2, carrying the start time t1 and the provisional end time t3 respectively, are input to the level storage unit 14.

When the start time signal d1 is input, the level storage unit 14 begins storing the input level value S(t) of the input level signal a2 from the start time t1, and continues storing it until a predetermined time, determined in advance by learning, has elapsed after the provisional end time t3.

After receiving the provisional end time signal d2 from the provisional voice section detection unit 13, the control unit 17 outputs the end detection threshold setting command signal r3 to the end detection threshold setting unit 15.

After receiving the end detection threshold setting command signal r3 from the control unit 17, the end detection threshold setting unit 15 receives from the level storage unit 14, as the end detection noise level signal e1, the input level values S(t) covering a predetermined end detection noise measurement time TEN in the positive time direction from the time t3 given by the provisional end time signal d2 from the provisional voice section detection unit 13. The average of the input level value S(t) over this noise measurement time TEN is set as the end detection noise level value NE. Next, a positive constant C3, determined in advance by learning, is added to this noise level value NE to set the end detection level threshold LE (S7). That is,

LE = NE + C3

By choosing this constant C3, the end detection level threshold LE can be made equal to the start detection level threshold LS, or set to a value larger or smaller than LS.
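Step S7 re-estimates the noise from the frames that follow the provisional end. A minimal sketch, assuming the stored levels are a list indexed by frame and TEN is a frame count (`end_threshold` is a hypothetical name):

```python
def end_threshold(levels, t3, ten, c3):
    """Return (NE, LE): NE is the mean level over `ten` frames starting at
    the provisional end t3, and LE = NE + C3."""
    window = levels[t3:t3 + ten]
    if not window:
        raise ValueError("no stored levels after the provisional end")
    ne = sum(window) / len(window)
    return ne, ne + c3
```

Since NE is measured right after the utterance, LE follows whatever the background level is at that moment rather than the level measured before the utterance began.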

The signal f1 carrying this level threshold LE is output to the end detection unit 16, and the end detection threshold setting end signal f2 is output to the control unit 17.

When the control unit 17 receives the end detection threshold setting end signal f2, it outputs the end detection command signal r4 to the end detection unit 16.

After receiving the end detection command signal r4, the end detection unit 16 receives from the level storage unit 14 the signal e2 carrying the input level values S(t) of the provisional voice section, which runs from the time t1 given by the start time signal d1 to the time t3 given by the provisional end time signal d2, both supplied by the provisional voice section detection unit 13. It also receives the end detection level threshold signal f1 from the end detection threshold setting unit 15.

The end detection unit 16 then compares the input level value S(t) of the provisional voice section, given by signal e2, with the end detection level threshold LE, given by signal f1, proceeding from the provisional end time t3 in the negative time direction. The time at which the input level value S(t) of the provisional voice section first becomes greater than the end detection level value LE, for example t2, is detected as the true end time (S8).
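The backward comparison of step S8 can be sketched as a reverse scan over the stored levels. The list-based storage and the name `find_true_end` are assumptions for illustration.

```python
def find_true_end(levels, t1, t3, le):
    """Scan backward from the provisional end t3 toward the start t1 and
    return the first frame whose level exceeds LE; None if none does."""
    for t in range(t3, t1 - 1, -1):
        if levels[t] > le:
            return t                   # true end time t2
    return None
```

Scanning backward means the crossing nearest the provisional end is chosen, which agrees with the wording of claim 1.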

The signal g carrying the end time determined in this way is output to the control unit 17, and the voice section detection processing ends (S9).
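Putting steps S3 to S8 together gives the following end-to-end sketch. It is not the patented implementation: it assumes the whole signal is available as a list of per-frame levels, counts all durations in frames, and uses hypothetical names throughout.

```python
def _mean(values):
    return sum(values) / len(values)


def _first_run(levels, begin, min_len, cond):
    """First frame of the first run of >= min_len frames satisfying cond."""
    run_start = None
    for i in range(begin, len(levels)):
        if cond(levels[i]):
            if run_start is None:
                run_start = i
            if i - run_start + 1 >= min_len:
                return run_start
        else:
            run_start = None
    return None


def detect_voice_section(levels, t0, tsn, c1, c2, c3, ts, tke, ten):
    """Return (t1, t2), the start and true end frames, or None on failure."""
    n = _mean(levels[t0:t0 + tsn])                             # S3: noise N
    ls = n + c1                                                # S3: LS = N + C1
    lke = n + c2                                               # S4: LKE = N + C2
    t1 = _first_run(levels, t0 + tsn, ts, lambda s: s > ls)    # S5: start
    if t1 is None:
        return None
    t3 = _first_run(levels, t1, tke, lambda s: s <= lke)       # S6: provisional end
    if t3 is None:
        return None
    ne = _mean(levels[t3:t3 + ten])                            # S7: noise NE
    le = ne + c3                                               # S7: LE = NE + C3
    for t in range(t3, t1 - 1, -1):                            # S8: backward scan
        if levels[t] > le:
            return t1, t
    return None
```

As a usage example, with `levels = [2, 2, 2, 2, 20, 20, 20, 12, 9, 6, 4, 4, 4, 4, 4]`, a four-frame noise window, C1 = 6, C2 = 3, C3 = 4 and all durations set to three frames, the sketch finds the start at frame 4, the provisional end at frame 10, and the true end at frame 8, where the level last exceeds LE = 8.0.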

(Effects of the Invention) As is clear from the above description, to cope with noise fluctuation while the speech recognition device is in use, this invention sets the end detection level threshold on the basis of the average of the input level values immediately after the provisional end. End detection can therefore be performed without being affected by temporal fluctuation of the ambient noise level, so stable and accurate voice section detection is possible, and an improvement in the recognition performance of the speech recognition device can be expected.

[Brief Description of the Drawings]

FIG. 1 is an explanatory diagram showing how the input level value of an input voice signal changes over time, used to describe an embodiment of the voice section detection method of this invention. FIG. 2 is a block diagram of the voice section detection unit used to describe the method. FIG. 3 is a flowchart of the processing of the method. FIG. 4 is an explanatory diagram of the conventional voice section detection method.

11 ... level extraction unit, 12 ... provisional voice section detection threshold setting unit, 13 ... provisional voice section detection unit, 14 ... level storage unit, 15 ... end detection threshold setting unit, 16 ... end detection unit, 17 ... control unit.

Patent applicant: Oki Electric Industry Co., Ltd.

Claims

[Claims]

(1) A voice section detection method in which the average of the input level values of an input voice signal over a predetermined time period before the voice section is taken as a noise level, and the voice section is detected by comparing a level threshold set on the basis of that noise level with the input level value, the method being characterized by: setting, as a provisional end detection level threshold, a level value that is smaller than a start detection level threshold obtained by adding a predetermined positive constant to the noise level value of the input voice signal and larger than the noise level value; determining, as the provisional end of the voice section, the start time of a section in which the input level value stays below the provisional end detection level threshold for at least a predetermined minimum low-level input duration for provisional end determination; taking the magnitude of the input level value immediately after the provisional end as an end detection noise level value, and setting the value obtained by adding a predetermined positive constant to that noise level value as an end detection level threshold; and detecting, as the true end, the time closest to the provisional end among the times at which the stored input level value, recorded from the start to the provisional end, crosses the end detection level threshold.

(2) The voice section detection method according to claim 1, wherein the end detection level threshold is set to the same value as the start detection level threshold, to a value larger than that level threshold, or to a value smaller than that level threshold.