JPS61256394A

JPS61256394A - Voice section detection system

Info

Publication number: JPS61256394A
Application number: JP60099137A
Authority: JP
Inventors: 章次栗木; 河本　俊毅; 安田　晴剛; 中谷　奉文
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1985-05-10
Filing date: 1985-05-10
Publication date: 1986-11-13

Abstract

(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

Technical field: The present invention relates to a speech section detection method for a speech recognition device.

Prior art: In general, the power of a speech signal differs from speaker to speaker. For this reason, when a speech signal is AD-converted, an AGC circuit is placed in front of the AD converter to preserve the effective number of bits. The AGC circuit removes the differences in speech power between speakers, but its gain becomes large during silent sections and small during speech sections. As a result, in the signal after the AGC circuit, the difference in power between a silent section containing noise and a speech section is reduced. Speech sections are detected by measuring the power and comparing it with a threshold. When that threshold is varied according to the noise power in silent sections, the difference in AGC gain makes the threshold relatively high during speech sections, so low-power speech, such as a consonant following a geminate consonant (sokuon), is sometimes dropped. For example, in words such as "STOP", "F", and "Z", the consonant after the silent period inside the word may be lost.

FIG. 3 is a waveform diagram illustrating this shortcoming of the prior art. FIG. 3(a) shows the relationship between the speech power A and the speech section extraction threshold B, using "STOP" as an example (the hatched area is noise). In this case the "P" portion is dropped, and the speech section signal becomes as shown in FIG. 3(b). That is, a conventional speech recognition device detects speech sections from the power of the input signal, using a threshold determined from the noise power while no speech is being uttered. With this method, once the input signal has passed through the AGC circuit, the AGC gain differs between non-utterance and utterance, so the consonant after a geminate consonant can be lower in power than the non-utterance noise and may not be detected as speech.
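
The failure mode described above can be illustrated with a minimal Python sketch. The frame powers, the multiplier `margin`, and the function name `detect_fixed` are illustrative assumptions and do not come from the patent. A single threshold derived from the pre-utterance noise power (measured while the AGC gain is high) is applied to the whole word, so the weak consonant after the in-word silence stays below it.

```python
def detect_fixed(powers, noise_frames=3, margin=2.0):
    """Mark each frame as speech (True) using one fixed threshold.

    The threshold is margin * mean power of the first `noise_frames` frames,
    i.e. the pre-utterance noise measured while the AGC gain is high.
    Both `noise_frames` and `margin` are hypothetical parameters.
    """
    noise = sum(powers[:noise_frames]) / noise_frames
    threshold = margin * noise
    return [p > threshold for p in powers]


# Hypothetical post-AGC frame powers for a word like "STOP":
# pre-utterance noise (high gain), "STO", in-word silence (low gain), weak "P".
powers = [10, 10, 10, 80, 90, 85, 2, 2, 2, 15, 14, 2, 2]
flags = detect_fixed(powers)
```

With these made-up numbers the threshold is 20, so the "STO" frames are marked as speech while the two "P" frames (15 and 14) are not, even though they stand well above the in-word noise level of 2.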

Purpose: The present invention was made in view of the circumstances described above. In particular, its purpose is to enable a speech recognition device to detect speech sections stably even in a noisy environment.

Configuration:

To achieve the above purpose, the present invention concerns a speech recognition device having an AGC circuit that removes speaker-dependent differences in input speech power, means for obtaining a speech section by comparing the speech power that has passed through the AGC circuit with a threshold set based on the noise power during non-utterance, and means for generating a one-word signal that treats a silent section of a certain predetermined length as a word delimiter. The invention is characterized in that the threshold is changed based on the noise power in the silent section a fixed time after the end of a speech section within a word, and the next speech section is detected with the changed threshold. The invention is described below based on an embodiment.

FIG. 1 is a block diagram showing an example of an electric circuit suitable for implementing the speech section detection method according to the present invention, and FIG. 2 is a signal waveform diagram for explaining its operating principle. In FIG. 1, 1 is a microphone, 2 is an AGC (automatic gain control) circuit, 3 is a power detection unit, 4 is a threshold variable unit, and 5 is a section detection unit; a denotes detection information for a silent section caused by a geminate consonant, and b denotes threshold information.

Referring to FIG. 2 in detail, a speech section is first detected using a threshold B1 determined from the power of the noise before speech is input; in the illustrated example, the "STO" portion is detected. When there is a silent section inside a word, the AGC gain there is smaller than in a silent section outside the word, so the noise power inside the word is lower. In general, the time constant for raising the AGC gain is set to about several seconds so that word endings can be detected, which gives the gain shown in FIG. 2(b). Consequently, the AGC gain does not change even during a silent section inside a word. The threshold B2 is set with attention to this in-word silent section, for example by one of the following methods.
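
The gain behaviour described above can be sketched with a simple AGC model in Python. This is only an illustration under stated assumptions: the function name `agc_gain_trace`, the instantaneous gain reduction on loud input, the first-order rise, and every numeric value except the "several seconds" rise time constant are hypothetical and are not taken from the patent.

```python
import math


def agc_gain_trace(levels, frame_ms=10, target=1.0,
                   rise_tau_ms=3000.0, max_gain=10.0):
    """Return the AGC gain per frame for a sequence of input levels.

    Assumed model: the gain drops immediately when the input is loud and
    rises toward the desired value with time constant `rise_tau_ms`
    (several seconds) when the input is quiet.
    """
    gain = max_gain
    alpha = 1.0 - math.exp(-frame_ms / rise_tau_ms)  # per-frame rise step
    trace = []
    for level in levels:
        desired = min(max_gain, target / max(level, 1e-9))
        if desired < gain:
            gain = desired                      # loud input: cut gain at once
        else:
            gain += alpha * (desired - gain)    # quiet input: creep upward
        trace.append(gain)
    return trace


# Hypothetical levels: 50 ms of noise, 50 ms of speech, 100 ms in-word silence.
levels = [0.1] * 5 + [1.0] * 5 + [0.1] * 10
trace = agc_gain_trace(levels)
```

In this model the gain sits at its maximum before the utterance, falls to 1.0 during speech, and has risen only slightly after 100 ms of in-word silence, which is why the noise power inside the word stays well below the pre-utterance noise power.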

(1) Set it based on the noise power at time T, a fixed time after the end of the speech section within the word.

(2) Set it based on the average noise power over several tens of ms to about 100 ms starting from time T.

(3) Within the silent section, change the threshold B2 at fixed intervals, for example every 100 ms, until the next speech section is detected.
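
The three ways of setting B2 listed above can be sketched in Python as follows. The patent says only that the threshold is set "based on" the noise power; scaling the noise power by a multiplier `margin` is an assumption made here, and the function names and frame-indexed representation are likewise hypothetical.

```python
def threshold_at_time_t(noise, t_index, margin=2.0):
    """Method (1): use the noise power of the single frame at time T."""
    return margin * noise[t_index]


def threshold_from_average(noise, t_index, n_frames, margin=2.0):
    """Method (2): average the noise power over n_frames starting at T."""
    window = noise[t_index:t_index + n_frames]
    if not window:
        raise ValueError("averaging window is empty")
    return margin * sum(window) / len(window)


def thresholds_periodic(noise, t_index, step, margin=2.0):
    """Method (3): re-derive the threshold every `step` frames from T on."""
    return [margin * noise[i] for i in range(t_index, len(noise), step)]
```

With 10 ms frames, for instance, `n_frames=10` in method (2) would correspond to a 100 ms averaging window, and `step=10` in method (3) would correspond to an update every 100 ms.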

FIG. 2(c) shows the speech section signal obtained in this way.

The time between the end of one speech section within a word and the start of the next falls into roughly two types. One is the silent section that precedes consonants such as "T" and "K", which is generally 50 ms or shorter.

The other is caused by a geminate consonant and is a silent section of 100 ms or longer. To change the threshold only in silent sections caused by geminate consonants, the threshold is left unchanged if a new speech section is detected within a fixed time, roughly 100 ms, after the end of a speech section within the word. At the end of a word, a silent section that continues for a fixed time, for example about 300 ms to 400 ms, is detected as the end of the word, but the AGC circuit then raises its gain. In other words, the threshold must be changed periodically while the noise is growing. The interval for these changes is determined by the time constant with which the AGC circuit raises its gain.
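
The timing rules above can be combined into a small per-frame detector, sketched here in Python. The 100 ms wait, the 100 ms update interval, and the 300 ms word-end time are the example values given in the text; the frame length, the initial threshold `b1`, the multiplier `margin`, and the function name are assumptions. The sketch uses the single-frame noise power for each update and stops at the end of the word, so it does not model the threshold updates needed after the word has ended.

```python
def detect_with_adaptive_threshold(powers, frame_ms=10, b1=20.0, margin=2.0,
                                   wait_ms=100, update_ms=100,
                                   word_end_ms=300):
    """Return speech flags for the frames up to the detected end of the word."""
    threshold = b1          # B1: derived from the pre-utterance noise
    silence_ms = 0
    seen_speech = False
    flags = []
    for p in powers:
        is_speech = p > threshold
        flags.append(is_speech)
        if is_speech:
            seen_speech = True
            silence_ms = 0
            continue
        if not seen_speech:
            continue        # still before the word: keep B1
        silence_ms += frame_ms
        if silence_ms >= word_end_ms:
            break           # silence long enough to count as end of word
        # Gaps shorter than wait_ms (e.g. before "T" or "K") leave the
        # threshold alone; from wait_ms on, refresh it every update_ms.
        if silence_ms >= wait_ms and (silence_ms - wait_ms) % update_ms == 0:
            threshold = margin * p
    return flags


# Hypothetical post-AGC powers: noise, "STO", 120 ms silence, "P", trailing noise.
powers = [10] * 3 + [80, 90, 85] + [2] * 12 + [15, 14] + [2] * 35
flags = detect_with_adaptive_threshold(powers)
```

With these made-up values the threshold drops from 20 to 4 once the in-word silence reaches 100 ms, so the two "P" frames are detected, and processing stops after 300 ms of trailing silence.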

Effects: As is clear from the above description, the present invention makes it possible to detect speech sections stably, without dropping speech information, even in noise.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an example of an electric circuit used to carry out the present invention, FIG. 2 is a signal waveform diagram for explaining the operating principle of the present invention, and FIG. 3 is a signal waveform diagram for explaining an example of a conventional speech section detection method.

Description of symbols: 1: microphone, 2: AGC circuit, 3: power detection unit, 4: threshold variable unit, 5: section detection unit.

Claims

[Claims]

(1) A speech section detection method for a speech recognition device having an AGC circuit that removes speaker-dependent differences in input speech power, means for obtaining a speech section by comparing the speech power that has passed through the AGC circuit with a threshold set based on the noise power during non-utterance, and means for generating a one-word signal that treats a silent section of a certain predetermined length as a word delimiter, the method being characterized in that the next speech section is detected by changing the threshold based on the noise power in the silent section a fixed time after the end of a speech section within a word.

(2) The speech section detection method according to claim (1), characterized in that the next speech section is detected by changing the threshold based on a value obtained by averaging, over several tens of ms to 100 ms, the noise power in the silent section a fixed time after the end of a speech section within a word.

(3) The speech section detection method according to claim (1), characterized in that the noise power in the silent section is detected each time a fixed time elapses after the end of a speech section within a word, and the next speech section is detected by successively changing the threshold based on the power detected each time.