JPH0792672B2 - Voice section detection method

Info

Publication number: JPH0792672B2
Application number: JP61082808A
Authority: JP
Inventors: 晴剛安田
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1986-04-10
Filing date: 1986-04-10
Publication date: 1995-10-09
Anticipated expiration: 2010-10-09
Also published as: JPS62238599A

Description

Technical Field

The present invention relates to a voice section detection method for a voice recognition device.

Prior Art

Among the voice section detection methods used in voice recognition devices, there are many ways to detect the start point of speech, and a variety of effective methods have been reported. For detecting the voice section after the start point, however, a fixed threshold is generally used. In addition, voice section detection in a voice recognition device almost always has to run in real time. At present, therefore, the ambient noise is sampled at times other than when speech is being input, and the fixed threshold for the next word is determined from those samples. With a fixed threshold of this kind, the beginning of a word is likely to be dropped when the word starts with a consonant or a similar sound, even though the beginning of a word is an especially important element.
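
The noise-based fixed threshold described above can be illustrated with a minimal Python sketch. The source does not state how the threshold is computed from the noise samples, so the rule used here (mean noise power plus a margin) and the names `fixed_threshold`, `noise_power`, and `margin` are assumptions made only for illustration.

```python
from statistics import mean
from typing import Sequence


def fixed_threshold(noise_power: Sequence[float], margin: float) -> float:
    """Assumed rule (not from the source): mean ambient-noise power plus a margin."""
    return mean(noise_power) + margin


# Hypothetical noise-power samples taken while no speech is being input.
quiet = fixed_threshold([1.0, 2.0, 1.5, 1.5], margin=3.0)
noisy = fixed_threshold([5.0, 6.0, 5.5, 5.5], margin=3.0)
```

Under this assumed rule, a noisier environment gives a higher fixed threshold for the next word, which is the situation in which low-power consonants are most at risk.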

As described above, in the prior art the voice section within a word is detected with a fixed threshold once the start point of speech has been detected. Consequently, when the determined threshold is high (that is, when the noise level is relatively high), low-power portions of the speech, such as consonants, were likely to be dropped.

Object

The present invention has been made in view of the circumstances described above, and its particular object is to detect the voice section more accurately.

Configuration

To achieve the above object, the present invention provides a voice section detection method having detection means for detecting a section of voice using the voice power and the frequency-difference power spectrum of voice input from a microphone, characterized by the following:
(1) From the time the start point of speech is detected, the threshold for voice section detection is lowered for a fixed frame interval, and after that interval has elapsed the threshold is returned to its original value, improving detection accuracy at the beginning of a word.
(2) When a silent section exists within a voice section, the cut-out threshold is lowered for a fixed frame interval from the time the consonant start point of the voiced section following the silent section is detected.
(3) Voiced/unvoiced detection means is provided; the voice section detection threshold is lowered during an unvoiced section and is returned to its original value when a voiced signal occurs.

The invention is described below on the basis of embodiments.

FIG. 1 is an electrical block diagram for explaining one embodiment of the present invention. In the figure, 1 is a microphone, 2 is a preprocessing unit, 3 is a voice power generation unit, 4 is a start point detection unit, 5 is an n-frame check unit, 6 is a threshold determination unit, 7 is a unit that subtracts from the threshold, and 8 is a comparison unit.

FIGS. 2 and 3 are time charts for explaining the operation of the present invention. In both figures, (a) is the voice power signal, and (b) and (c) are voice section pulses: (b) is the voice section pulse obtained with the prior art, and (c) is the voice section pulse obtained with the present invention.

FIG. 2 shows an example in which the start point is detected at point S. In this case, although the start point was detected correctly at the head of the consonant, the detection threshold Pth applied to the voice power is high, so only the very beginning of the word remains in the voice section pulse, as shown in (b), and useful information is cut off. Therefore, if the threshold is lowered only for the n frames following the start point and restored to its original value once n frames have passed, the voice section pulse becomes as shown in (c), and no useful information is lost.
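
The behaviour described for FIG. 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the circuit of FIG. 1: the function name `section_pulse`, the parameter `drop` (how far the threshold is lowered), and the frame powers are hypothetical, and the start point is taken as given rather than detected.

```python
from typing import List


def section_pulse(power: List[float], start: int, p_th: float,
                  drop: float = 0.0, n: int = 0) -> List[int]:
    """Return a 0/1 voice-section pulse, one value per frame.

    Frames before `start` are always 0. For the n frames beginning at
    `start` the threshold is p_th - drop; afterwards it is p_th again.
    """
    pulse = []
    for i, p in enumerate(power):
        if i < start:
            pulse.append(0)
            continue
        threshold = p_th - drop if i < start + n else p_th
        pulse.append(1 if p >= threshold else 0)
    return pulse


# Hypothetical frame powers: a weak consonant (frames 2-4), then a vowel.
power = [1, 1, 4, 5, 4, 12, 14, 13, 2, 1]
fixed = section_pulse(power, start=2, p_th=8)                 # fixed threshold
lowered = section_pulse(power, start=2, p_th=8, drop=5, n=3)  # lowered for n frames
```

With these made-up values the fixed threshold marks only frames 5 to 7 as speech, while the temporarily lowered threshold also keeps the consonant frames 2 to 4.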

FIG. 3 shows an example in which a silent section occurs within a word. In this case, useful consonant information lies at the head of the next voiced section (for example, in words such as "ストップ" (sutoppu) and "復改" (fukkai)). Here too, the voice section pulse (c) can be obtained in the same manner as described above.
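
The handling of a silent section inside a word can be sketched in Python as below. The sketch assumes that the start point detector reports every onset, including the one after the silent gap, as a frame index; the names `pulse_with_onsets`, `onsets`, and `drop` and all numeric values are hypothetical.

```python
from typing import Iterable, List


def pulse_with_onsets(power: List[float], onsets: Iterable[int],
                      p_th: float, drop: float, n: int) -> List[int]:
    """Lower the threshold for n frames after every detected onset."""
    onset_list = sorted(onsets)
    if not onset_list:
        return [0] * len(power)
    first = onset_list[0]
    pulse = []
    for i, p in enumerate(power):
        if i < first:
            pulse.append(0)
            continue
        # The threshold is lowered while frame i lies within n frames of an onset.
        is_lowered = any(s <= i < s + n for s in onset_list)
        threshold = p_th - drop if is_lowered else p_th
        pulse.append(1 if p >= threshold else 0)
    return pulse


# Hypothetical powers: vowel, silent gap, weak consonant burst, vowel.
power = [1, 10, 12, 11, 1, 1, 4, 5, 12, 13, 1]
onsets = [1, 6]  # word start and the onset after the silent gap
with_drop = pulse_with_onsets(power, onsets, p_th=8, drop=5, n=2)
without_drop = pulse_with_onsets(power, onsets, p_th=8, drop=0, n=2)
```

In this example the consonant burst at frames 6 and 7 is kept only when the threshold is lowered again after the gap.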

FIG. 4 is an electrical block diagram for explaining another embodiment of the present invention. This embodiment adds a voiced/unvoiced detection circuit 9 to the embodiment shown in FIG. 1 so that the n-frame interval described above is controlled automatically.

FIG. 5 is a time chart for explaining the operation of the embodiment shown in FIG. 4, in which (a) is the voice power signal, (b) is the voiced/unvoiced signal, and (c) is the voice section signal. The main purpose of lowering the threshold is to rescue consonants. Not all consonants are unvoiced, but a comparatively large share of them are. The voiced/unvoiced signal of the speech is therefore used as follows: if a voiced sound occurs during the n frames, there is no longer any need to keep the threshold lowered, so the threshold is returned to its original value. (The normal threshold is naturally preferable, because a lowered threshold is likely to cut out noise as speech.) If the unvoiced sound continues, the threshold is simply returned to its original value once n frames have passed.
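
The control described for FIG. 5 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the circuit of FIG. 4: the per-frame voiced/unvoiced decisions are supplied as a list of booleans, and the names `pulse_with_voicing`, `voiced`, and `drop` and the numeric values are hypothetical.

```python
from typing import List, Tuple


def pulse_with_voicing(power: List[float], voiced: List[bool], start: int,
                       p_th: float, drop: float,
                       n: int) -> Tuple[List[int], List[float]]:
    """Return (pulse, thresholds): the threshold is lowered from `start`
    until a voiced frame occurs or n frames have passed, whichever is first."""
    pulse, thresholds = [], []
    lowered = False
    for i, (p, v) in enumerate(zip(power, voiced)):
        if i < start:
            pulse.append(0)
            thresholds.append(p_th)
            continue
        if i == start:
            lowered = True
        if lowered and (v or i >= start + n):
            lowered = False  # restore on a voiced frame or after n frames
        threshold = p_th - drop if lowered else p_th
        thresholds.append(threshold)
        pulse.append(1 if p >= threshold else 0)
    return pulse, thresholds


# Case 1: a voiced frame arrives at frame 3, before the n = 4 frames run out.
pulse1, thr1 = pulse_with_voicing(
    [1, 4, 5, 12, 13, 2], [False, False, False, True, True, False],
    start=1, p_th=8, drop=5, n=4)

# Case 2: unvoiced sound continues, so the threshold is restored after n = 2 frames.
pulse2, thr2 = pulse_with_voicing(
    [1, 4, 5, 4, 2], [False] * 5, start=1, p_th=8, drop=5, n=2)
```

In the first case the threshold returns to 8 at frame 3, earlier than the n-frame limit would require; in the second it returns only when the n frames have elapsed.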

Effects

As is clear from the above description, the present invention minimizes the loss of consonants caused by the threshold applied within a word.

[Brief description of drawings]

FIG. 1 is an electrical block diagram for explaining one embodiment of the present invention. FIGS. 2 and 3 are time charts for explaining the operation of the embodiment shown in FIG. 1. FIG. 4 is an electrical block diagram for explaining another embodiment of the present invention. FIG. 5 is a time chart for explaining the operation of the embodiment shown in FIG. 4.

1: microphone, 2: preprocessing unit, 3: voice power generation unit, 4: start point detection unit, 5: n-frame check unit, 6: threshold determination unit, 7: unit that subtracts from the threshold, 8: comparison unit, 9: voiced/unvoiced detection circuit.

Claims

1. A voice section detection method having detection means for detecting a section of voice using the voice power and the frequency-difference power spectrum of voice input from a microphone, characterized in that, from the time the start point of speech is detected, the threshold for voice section detection is lowered for a fixed frame interval, and after the fixed frame interval has elapsed, the threshold is returned to its original value, thereby improving detection accuracy at the beginning of a word.

2. The voice section detection method according to claim 1, wherein, when a silent section exists within a voice section, the cut-out threshold is lowered for a fixed frame interval from the time the consonant start point of the voiced section following the silent section is detected.

3. The voice section detection method according to claim 1 or 2, further having voiced/unvoiced detection means, wherein the voice section detection threshold is lowered during an unvoiced section and is returned to its original value when a voiced signal occurs.