JPS63306497A

JPS63306497A - Voice section detecting system

Info

Publication number: JPS63306497A
Application number: JP62143664A
Authority: JP
Inventors: 章次栗木
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1987-06-08
Filing date: 1987-06-08
Publication date: 1988-12-14

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

Technical Field

The present invention relates to a speech section detection method for a speech recognition device.

Prior Art

In conventional speech section detection, the end of the speech section at a word ending was taken to be the point where the speech power falls below a certain value.

However, with this method the error is small for sounds such as vowels, where the power at the word ending drops sharply, but for the "n" (ん) sound, where the power at the word ending decreases slowly, the section is often judged to end partway through the sound. In addition, because the "n" sound has low power, it was in some cases dropped altogether.
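The failure mode described above can be illustrated with a minimal sketch of power-threshold endpointing. The function name `threshold_section`, the per-frame power values, and the threshold of 25 are all hypothetical and chosen only for illustration; they do not come from the patent.

```python
def threshold_section(power, threshold):
    """Return (start, end) frame indices of the span at or above threshold.

    end is exclusive. Returns None if no frame reaches the threshold.
    """
    above = [i for i, p in enumerate(power) if p >= threshold]
    if not above:
        return None
    return above[0], above[-1] + 1


# Hypothetical per-frame power for a word like "han": silence, a stable
# vowel (frames 2-5), then an "n" (frames 6-11) with low, slowly decaying power.
power = [0, 0, 100, 100, 100, 100, 40, 30, 22, 16, 12, 8, 0, 0]

# A threshold high enough to reject noise ends the section partway through the "n".
section = threshold_section(power, threshold=25)
print(section)  # (2, 8)
```

In this toy input the "n" occupies frames 6 to 11, but the detected section stops at frame 8, so most of the "n" is cut off.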

Purpose

The present invention was made in view of the circumstances described above. In particular, it aims to detect the section correctly, in a speech section detection method, for words that end in "n".

Configuration

To achieve the above object, the present invention is a speech section detection method having means for extracting features from input speech at a certain sampling period, means for determining a speech section using power, and means for detecting the "n" portion of speech by comparing the power of the high-frequency components with the power of the low-frequency components. It is characterized in that whether the word ends in "n" is judged using the means for detecting the "n" portion; if the word ends in "n", the speech after the end of the section is examined; and where the "n" state continues and the power is greater than a predetermined value between 1/2 and 1/16 of the in-word vowel power, that portion is treated as speech section. The invention is described below based on an embodiment.

FIG. 1 is a time chart for explaining an embodiment of the present invention. For example, when the input speech is "han" (はん), its speech power, as shown in FIG. 1(a), is at a stable level in the vowel portion, while in the "n" portion the power is low and tends to decrease gradually. Consequently, if the portion whose power is at or above a certain value is taken as the speech section, the result is as shown in FIG. 1(b), and most of the "n" portion is lost. If instead the threshold is lowered so that the "n" portion is counted as part of the section, section extraction becomes impossible once noise is added. The present invention therefore extracts the section correctly by correcting the speech section only for words that end in "n". The "n" sound is detected by comparing the high-frequency component power H (about 800 Hz) with the low-frequency component power L (about 800 Hz or below).
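The band-power comparison can be sketched as follows. This is only an illustration under stated assumptions: the patent does not say how H and L are computed, so the naive DFT, the single split at 800 Hz (with H taken as everything at or above it), the 8 kHz sample rate, and the names `band_powers`, `is_n_frame`, and `tone` are all hypothetical.

```python
import cmath
import math


def band_powers(frame, sample_rate, split_hz=800.0):
    """Return (low, high) power of one frame, split at split_hz, via a naive DFT."""
    n = len(frame)
    low = high = 0.0
    for k in range(1, n // 2 + 1):  # skip the DC bin
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        p = abs(coeff) ** 2
        if k * sample_rate / n < split_hz:
            low += p   # assumed low band: below the split frequency
        else:
            high += p  # assumed high band: at or above the split frequency
    return low, high


def is_n_frame(frame, sample_rate, split_hz=800.0):
    """Flag a frame as "n" when low-band power L exceeds high-band power H."""
    low, high = band_powers(frame, sample_rate, split_hz)
    return low > high


def tone(freq_hz, sample_rate=8000, n=80):
    """Synthetic test frame: a pure sine, 10 ms long at 8 kHz."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


print(is_n_frame(tone(200), 8000))   # low-frequency energy dominates
print(is_n_frame(tone(2000), 8000))  # high-frequency energy dominates
```

A real front end would more likely use filter banks or an FFT; the naive DFT is used here only to keep the sketch dependency-free.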

FIG. 1(c) shows the respective powers for "han", and as shown in FIG. 1(d), the section where the low-frequency component power L is greater than the high-frequency component power H is considered to be the "n" section. Accordingly, the span from point C, about 100 ms before the end of the speech section in FIG. 1(b), up to that end is examined, and if 50% or more of it is "n" section, the section is corrected. In the figure the word ends in "n", so section correction is performed. If the word did not end in "n", this condition would not be met, and the speech section (1) shown in FIG. 1(b) would be the speech section. The correction is performed by appending the section in which the following two conditions are both satisfied.

(1) As shown in FIG. 1(d), the "n" section continues.

(2) As shown in FIG. 1(e), the vowel power is detected (detection ends at point E), and the speech power after that point is greater than a power of about 1/8 of the vowel level (the value is a fixed value chosen between 1/2 and 1/16).
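The word-ending check that gates this correction (look back about 100 ms from the end of the threshold-based section and require that at least half of that span be "n") could be sketched as below. The 10 ms frame length, the function name `ends_in_n`, and the flag lists are assumptions made for illustration; the patent gives only the look-back distance and the 50% criterion.

```python
def ends_in_n(n_flags, section_end, frame_ms=10, lookback_ms=100, min_ratio=0.5):
    """Decide whether the threshold-based section ends in "n".

    n_flags     -- per-frame booleans, True where L > H (hypothetical input)
    section_end -- exclusive end frame of the threshold-based section
    """
    lookback = max(1, lookback_ms // frame_ms)
    start = max(0, section_end - lookback)
    window = n_flags[start:section_end]
    if not window:
        return False
    return sum(window) / len(window) >= min_ratio


# Hypothetical word: 20 vowel frames followed by 12 "n" frames, with the
# threshold-based section ending 6 frames into the "n".
n_flags = [False] * 20 + [True] * 12
print(ends_in_n(n_flags, section_end=26))       # True: 6 of the last 10 frames are "n"
print(ends_in_n([False] * 32, section_end=26))  # False: no "n" frames at the ending
```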

Condition (1) serves to exclude sections where the features have become inaccurate because of noise or the like, and condition (2) is imposed because the features become inaccurate at very low levels. By appending the section that satisfies the AND of FIG. 1(d) and FIG. 1(e) to the section of FIG. 1(b), the corrected speech section (2) is created (FIG. 1(f)), in which the "n" section is detected correctly compared with the normal speech section (1) of FIG. 1(b).
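The correction step itself can be sketched as extending the section end frame by frame while both conditions hold. This is a sketch under assumptions: the patent does not say how the vowel power is measured, so `vowel_power` is passed in by the caller, the extension is assumed to be contiguous, and the name `correct_section` and all sample values are hypothetical. The caller is assumed to have already confirmed that the word ends in "n".

```python
def correct_section(power, n_flags, section, vowel_power, ratio=1 / 8):
    """Extend section (start, end-exclusive) while both conditions hold.

    Condition (1): the frame is still flagged as "n" (L > H).
    Condition (2): the frame power exceeds ratio * vowel_power, where ratio
                   is a fixed value between 1/2 and 1/16 (1/8 here).
    """
    start, end = section
    floor = vowel_power * ratio
    while end < len(power) and n_flags[end] and power[end] > floor:
        end += 1
    return start, end


# Hypothetical per-frame data: vowel in frames 2-5, decaying "n" in frames 6-11.
power = [0, 0, 100, 100, 100, 100, 40, 30, 22, 16, 12, 8, 0, 0]
n_flags = [False] * 6 + [True] * 6 + [False] * 2

# Threshold-based section (1) ends at frame 8; correction yields section (2).
print(correct_section(power, n_flags, (2, 8), vowel_power=100))  # (2, 10)
```

With a vowel power of 100 and a ratio of 1/8, frames above 12.5 are kept, so the section grows by two frames and stops where the power drops to 12.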

FIG. 2 is a block diagram for explaining an embodiment that implements the speech section detection method of the present invention described above.

In the figure, 1 is a microphone, 2 is an amplifier, 3 is a feature extraction unit, 4 is a speech power detection unit, 5 is a high-frequency component power detection unit, 6 is a low-frequency component power detection unit, 7 is a vowel power detection unit, 8 is a speech section threshold unit, 9 to 11 are comparators, 12 is a word-ending "n" detection unit, and 13 is a section correction unit.

Speech input from the microphone 1 is amplified by the amplifier 2 and fed to the feature extraction unit 3, the speech power detection unit 4, the high-frequency component power detection unit 5, the low-frequency component power detection unit 6, and the vowel power detection unit 7. The normal speech section (speech section (1)) is obtained by comparing the speech power with the speech section threshold in comparator 9. The "n" section (FIG. 1(d)) is obtained by comparing the high-frequency and low-frequency component powers in comparator 10. From this speech section signal and the "n" section signal, the word-ending "n" detection unit 12 detects whether the word ends in "n".

If it does, the speech section is corrected using the AND of the "n" section and the section whose power is at least 1/8 of the vowel power, which is the output of comparator 11, to obtain the speech section (2) shown in FIG. 1(f). If the word does not end in "n", the speech section is not corrected.
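As a rough signal-level view of the block diagram, the three comparators can be modelled as per-frame boolean signals. The function name `comparator_signals`, the input lists, and the numeric values are hypothetical; in particular, the vowel power is supplied directly because the patent does not describe how unit 7 derives it.

```python
def comparator_signals(P, H, L, section_threshold, vowel_power, ratio=1 / 8):
    """Model comparators 9, 10 and 11 as per-frame boolean signals.

    P, H, L -- per-frame speech power, high-band power, low-band power
    """
    speech = [p >= section_threshold for p in P]      # comparator 9: speech section (1)
    n_sec = [lo > hi for lo, hi in zip(L, H)]         # comparator 10: "n" section
    loud = [p > vowel_power * ratio for p in P]       # comparator 11: above 1/8 vowel power
    candidate = [a and b for a, b in zip(n_sec, loud)]  # AND used by the section correction
    return speech, n_sec, loud, candidate


# Hypothetical frames: silence, two vowel frames, three decaying "n" frames.
P = [0, 100, 100, 30, 16, 8]
H = [0, 60, 60, 5, 3, 2]
L = [0, 40, 40, 25, 13, 6]

speech, n_sec, loud, candidate = comparator_signals(P, H, L, 25, vowel_power=100)
print(speech)     # [False, True, True, True, False, False]
print(candidate)  # [False, False, False, True, True, False]
```

Here frame 4 lies outside the threshold-based section but is still a correction candidate, while frame 5 is rejected because its power is below 1/8 of the vowel power.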

Effect

As is clear from the above description, the present invention makes it possible to detect the correct speech section of a word that ends in "n".

[Brief Description of the Drawings]

FIG. 1 is a time chart for explaining an embodiment of the present invention, and FIG. 2 is a block diagram for explaining an embodiment that implements the present invention.

1: microphone, 2: amplifier, 3: feature extraction unit, 4: speech power detection unit, 5: high-frequency component power detection unit, 6: low-frequency component power detection unit, 7: vowel power detection unit, 8: speech section threshold unit, 9 to 11: comparators, 12: word-ending "n" detection unit, 13: section correction unit.

FIG. 1 (label: speech power)

Claims

[Claims]

A speech section detection method having means for extracting features from input speech at a certain sampling period, means for determining a speech section using power, and means for detecting the "n" portion of speech by comparing the power of high-frequency components with the power of low-frequency components, characterized in that whether a word ends in "n" is judged using said means for detecting the "n" portion, and, if the word ends in "n", the speech after the end of the section is examined, and where the "n" state continues and the power is greater than a predetermined value between 1/2 and 1/16 of the in-word vowel power, that portion is treated as speech section.