JPH0792672B2 - Voice section detection method - Google Patents

Voice section detection method

Info

Publication number
JPH0792672B2
Authority
JP
Japan
Prior art keywords
voice
section
detection
threshold value
detection method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
JP61082808A
Other languages
Japanese (ja)
Other versions
JPS62238599A (en)
Inventor
晴剛 安田
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ricoh Co Ltd
Original Assignee
Ricoh Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
1986-04-10
Filing date
1986-04-10
Publication date
1995-10-09
Application filed by Ricoh Co Ltd
Priority to JP61082808A
Publication of JPS62238599A
Publication of JPH0792672B2
Anticipated expiration
Expired - Fee Related

Description

Technical Field
The present invention relates to a voice section detection method for a speech recognition device.

Prior Art
In voice section detection for speech recognition devices, there are many ways to detect the start point, and a number of effective methods have been reported. For detecting the voice section after the start point, however, a fixed threshold is generally used. Moreover, voice section detection in a speech recognition device is in most cases required to run in real time, so current practice is to sample the ambient noise at times other than during voice input and to determine the fixed threshold for the next word from that sample. With a fixed threshold of this kind, the beginning of a word is likely to be dropped when it is, for example, a consonant, even though the word beginning carries particularly important information.

As described above, in the prior art the voice section within a word is detected with a fixed threshold once the voice start point has been detected. Consequently, when the determined threshold is high (that is, when the noise level is relatively high), portions with low voice power, namely consonant portions and the like, are likely to be dropped.

Object
The present invention has been made in view of the circumstances described above, and its particular object is to detect voice sections more accurately.

Constitution
To achieve the above object, the present invention is characterized as follows.
(1) In a voice section detection method having detection means that detects voice sections using the voice power and the frequency difference power spectrum of voice input from a microphone, the threshold for voice section detection is lowered for a fixed frame interval from the time the voice start point is detected, and after the fixed frame interval has elapsed the threshold is returned to its original value, thereby improving the detection accuracy of the word-beginning portion.
(2) When a silent section exists within a voice section, the cut-out threshold is lowered for a fixed frame interval from the time the consonant start point of the non-silent section following the silent section is detected.
(3) Voiced/unvoiced detection means is provided; during an unvoiced section the voice section detection threshold is lowered, and when a voiced signal occurs the threshold is returned to its original value.
The invention is described below on the basis of embodiments.

FIG. 1 is an electrical block diagram for explaining one embodiment of the present invention. In the figure, 1 is a microphone, 2 is a preprocessing unit, 3 is a voice power generation unit, 4 is a start point detection unit, 5 is an n-frame check unit, 6 is a threshold determination unit, 7 is a unit that subtracts from the threshold, and 8 is a comparison unit.

FIGS. 2 and 3 are time charts for explaining the operation of the present invention. In both figures, (a) is the voice power signal, and (b) and (c) are voice section pulses: (b) is the voice section pulse obtained with the prior art, and (c) is the voice section pulse obtained with the present invention.

FIG. 2 shows an example in which the start point is detected at point S. Although the start point is detected correctly at the head of the consonant, the power-based detection threshold Pth is high, so the word-beginning portion is left outside the voice section pulse shown in (b) and useful information is cut off. Therefore, if the threshold is lowered only for an interval of n frames from the start point and returned to its original value once n frames have elapsed, the voice section pulse becomes as shown in (c), and no useful information is lost.
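The threshold control described for FIG. 2 can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name, the per-frame power list, the reduction amount `delta`, and the assumption that the start frame `start` has already been found by a separate start point detector are all hypothetical choices made for illustration.

```python
from typing import List


def detect_with_lowered_start(power: List[float], start: int, p_th: float,
                              delta: float, n: int) -> List[bool]:
    """Mark frames as speech, lowering the threshold for n frames after `start`.

    `start` is assumed to come from a separate start point detector.
    """
    flags = []
    for i, p in enumerate(power):
        if i < start:
            flags.append(False)  # before the detected start point
            continue
        # Lowered threshold inside the n-frame window, normal threshold afterwards.
        threshold = p_th - delta if i < start + n else p_th
        flags.append(p >= threshold)
    return flags


# A weak consonant (power 3) precedes a strong vowel (power 8).
power = [0.0, 3.0, 3.0, 3.0, 8.0, 8.0, 0.0]
fixed = detect_with_lowered_start(power, start=1, p_th=5.0, delta=0.0, n=3)
lowered = detect_with_lowered_start(power, start=1, p_th=5.0, delta=3.0, n=3)
```

With `delta=0.0` the sketch behaves like a fixed threshold and misses the three consonant frames; with `delta=3.0` those frames fall inside the pulse, which corresponds to the difference between pulses (b) and (c).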

FIG. 3 shows an example in which a silent section occurs within a word. In this case there is useful consonant information at the head of the next non-silent section (for example, in words such as ストップ "stop" and 復改 "carriage return"), and the voice section pulse (c) can again be detected in the same manner as described above.
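One way to picture the FIG. 3 case is to lower the threshold again at every onset, including the consonant that follows a silent gap inside the word. The Python sketch below assumes the onset frame indices are supplied by some onset detector and are valid non-negative frame indices; the function name, `delta`, and the sample values are illustrative assumptions, not details from the patent.

```python
from typing import Iterable, List


def thresholds_with_rearm(num_frames: int, onsets: Iterable[int], p_th: float,
                          delta: float, n: int) -> List[float]:
    """Per-frame threshold: lowered for n frames after every onset, p_th elsewhere."""
    thresholds = [p_th] * num_frames
    for onset in onsets:
        # Lower the threshold for the n frames starting at this onset.
        for i in range(onset, min(onset + n, num_frames)):
            thresholds[i] = p_th - delta
    return thresholds


# A word with a silent gap (frames 4-5) followed by a weak consonant (frame 6).
power = [3.0, 8.0, 8.0, 8.0, 0.0, 0.0, 3.0, 8.0, 8.0]
th = thresholds_with_rearm(len(power), onsets=[0, 6], p_th=5.0, delta=3.0, n=2)
pulse = [p >= t for p, t in zip(power, th)]
```

In this example the weak consonant at frame 6 is kept because the threshold is lowered again at the second onset, while the silent frames 4 and 5 stay outside the pulse.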

FIG. 4 is an electrical block diagram for explaining another embodiment of the present invention. This embodiment adds a voiced/unvoiced detection circuit 9 to the embodiment shown in FIG. 1 so that the n-frame interval is controlled automatically.

FIG. 5 is a time chart for explaining the operation of the embodiment shown in FIG. 4, in which (a) is the voice power signal, (b) is the voiced/unvoiced signal, and (c) is the voice section signal. The main purpose of lowering the threshold is to rescue consonants. Not all consonants are unvoiced, but a relatively large proportion of them are. The voiced/unvoiced signal of the speech is therefore used: if a voiced sound occurs within the n frames, there is no longer any need to keep the threshold lowered, so the threshold is returned to its original value. (The normal threshold is naturally preferable, because lowering the threshold makes it likely that noise will be cut out as speech.) Even if the unvoiced sound continues, the threshold is simply returned to its original value once n frames have elapsed.
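The voiced/unvoiced control can be sketched as a small state machine in Python. This is a hedged illustration only: the per-frame boolean `voiced` list stands in for the output of the voiced/unvoiced detection circuit, and the function name, `delta`, and the sample data are assumptions rather than details from the patent.

```python
from typing import List


def detect_with_voicing(power: List[float], voiced: List[bool], start: int,
                        p_th: float, delta: float, n: int) -> List[bool]:
    """Lower the threshold at `start`; restore it on the first voiced frame
    or after n frames, whichever comes first."""
    flags = []
    lowered = False
    for i, (p, v) in enumerate(zip(power, voiced)):
        if i == start:
            lowered = True   # start point detected: lower the threshold
        if lowered and (v or i >= start + n):
            lowered = False  # voiced sound or n-frame timeout: restore
        threshold = p_th - delta if lowered else p_th
        flags.append(i >= start and p >= threshold)
    return flags


# Frame 2 is voiced, so the threshold is restored before the n = 4 frames elapse.
early = detect_with_voicing([0.0, 3.0, 8.0, 3.0, 8.0, 0.0],
                            [False, False, True, False, True, False],
                            start=1, p_th=5.0, delta=3.0, n=4)
# No voiced frame at all: the threshold is restored by the n-frame timeout.
timeout = detect_with_voicing([3.0, 3.0, 3.0, 3.0], [False] * 4,
                              start=0, p_th=5.0, delta=3.0, n=2)
```

In the first call the low-power frame 3 is rejected even though it lies inside the n-frame window, because the voiced frame 2 has already restored the normal threshold; this shortens the time during which noise could be cut out as speech.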

Effect
As is clear from the above description, the present invention makes it possible to minimize the loss of consonants caused by the threshold within a word.

Brief Description of the Drawings

FIG. 1 is an electrical block diagram for explaining one embodiment of the present invention; FIGS. 2 and 3 are time charts for explaining the operation of the embodiment shown in FIG. 1; FIG. 4 is an electrical block diagram for explaining another embodiment of the present invention; and FIG. 5 is a time chart for explaining the operation of the embodiment shown in FIG. 4.
1: microphone, 2: preprocessing unit, 3: voice power generation unit, 4: start point detection unit, 5: n-frame check unit, 6: threshold determination unit, 7: unit that subtracts from the threshold, 8: comparison unit, 9: voiced/unvoiced detection circuit.

Claims (3)

1. A voice section detection method having detection means for detecting a voice section using the voice power and the frequency difference power spectrum of voice input from a microphone, characterized in that the threshold for voice section detection is lowered for a fixed frame interval from the time the voice start point is detected, and after the fixed frame interval has elapsed the threshold is returned to its original value, thereby improving the detection accuracy of the word-beginning portion.
2. The voice section detection method according to claim 1, characterized in that, when a silent section exists within a voice section, the cut-out threshold is lowered for a fixed frame interval from the time the consonant start point of the non-silent section following the silent section is detected.
3. The voice section detection method according to claim 1 or 2, characterized in that voiced/unvoiced detection means is provided, the voice section detection threshold is lowered during an unvoiced section, and the threshold is returned to its original value when a voiced signal occurs.
JP61082808A 1986-04-10 1986-04-10 Voice section detection method Expired - Fee Related JPH0792672B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP61082808A JPH0792672B2 (en) 1986-04-10 1986-04-10 Voice section detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP61082808A JPH0792672B2 (en) 1986-04-10 1986-04-10 Voice section detection method

Publications (2)

Publication Number Publication Date
JPS62238599A (en) 1987-10-19
JPH0792672B2 (en) 1995-10-09

Family

ID=13784710

Family Applications (1)

Application Number Title Priority Date Filing Date
JP61082808A Expired - Fee Related JPH0792672B2 (en) 1986-04-10 1986-04-10 Voice section detection method

Country Status (1)

Country Link
JP (1) JPH0792672B2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6202046B1 (en) 1997-01-23 2001-03-13 Kabushiki Kaisha Toshiba Background noise/speech classification method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS58190993A (en) * 1982-05-01 1983-11-08 日産自動車株式会社 Voice detector for vehicle
JPS60242500A (en) * 1984-05-17 1985-12-02 日本電気株式会社 Voice detection method and circuit

Also Published As

Publication number Publication date
JPS62238599A (en) 1987-10-19

Similar Documents

Publication Publication Date Title
US4821325A (en) Endpoint detector
JP4736632B2 (en) Vocal fly detection device and computer program
CA2483607A1 (en) Syllabic nuclei extracting apparatus and program product thereof
JP3413862B2 (en) Voice section detection method
JPH0792672B2 (en) Voice section detection method
JPH0950288A (en) Device and method for recognizing voice
JPH0430040B2 (en)
JPH03114100A (en) Voice section detecting device
JP3031081B2 (en) Voice recognition device
JP2666296B2 (en) Voice recognition device
JPH0567039B2 (en)
JPS5925237B2 (en) Speech segment determination method using speech analysis and synthesis method
JPS61292199A (en) Voice recognition equipment
JPS6239754B2 (en)
JPS6217800A (en) Voice section decision system
JP2578771B2 (en) Voice recognition device
JPS59228300A (en) Voice section detecting system
JPS61273599A (en) Voice recognition equipment
JPS58113992A (en) Voice signal compression system
JPS6267598A (en) Voice section detection system
JPH0652479B2 (en) Speech analysis method
JPS59170894A (en) Voice section starting system
JPH0632002B2 (en) Method of cutting out vowel part of monosyllabic voice
JPS6039691A (en) Voice recognition
JPS61177499A (en) Voice section detecting system

Legal Events

Date Code Title Description
LAPS Cancellation because of no payment of annual fees