JPS6217800A

JPS6217800A - Voice section decision system

Info

Publication number: JPS6217800A
Application number: JP60159149A
Authority: JP
Inventors: 伸神谷
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1985-07-16
Filing date: 1985-07-16
Publication date: 1987-01-26
Also published as: JPH0456999B2

Abstract

(57) [Abstract] This publication contains application data that predates electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

<Technical Field>

The present invention relates to a speech section determination method for separating speech from noise in an input sound.

<Prior Art>

Conventionally, when separating speech from noise, noise has been reduced by detecting only certain specific types of noise, such as white noise or impulsive noise, and suppressing them.

However, the variety of noise is unlimited, so conventional methods that change the suppression technique for each type of noise cannot cope with all of it.

<Purpose>

The present invention has been made in view of these conventional problems. Rather than detecting and suppressing specific noise, it aims to provide a speech section determination method that separates speech from noise in the input sound and can thereby remove a very wide variety of noise with great ease. More specifically, it aims to provide a speech section determination method that determines speech sections on the basis of the presence or absence of vowels, so that the speech sections obtained can be separated from noise sections.

<Embodiment>

In Japanese speech, whose basic structure is a consonant-vowel pair, the following four conditions characterize a vowel-like section.

(1) A section in which the power is large.

(2) A section in which the spectral change is small (the stationary part of speech).

(3) A section in which the matching distance to the standard vowel patterns is small.

(4) A section in which the sum of the absolute values of the cepstrum coefficients is large.

Of these four conditions, the present invention detects vowel sections on the basis of the first and fourth in particular (large power and a large sum of absolute cepstrum coefficient values) and separates them from noise sections. Its distinguishing feature is that matching against standard vowel patterns is omitted, so that speech sections can be determined with a simpler hardware configuration.

Next, the method of the present invention is described in detail with reference to the drawings.

Fig. 1 is a block diagram of a speech section determination device that implements the method of the present invention.

In the figure, 1 is a speech analysis section, 2 is a cepstrum sum calculation section, and 3 is a determination section.

As shown in the block diagram of Fig. 2, the speech analysis section 1 consists of an autocorrelation coefficient calculation section 4, a linear prediction coefficient calculation section 5, a cepstrum coefficient calculation section 6, and a power calculation section 7.

The autocorrelation coefficient calculation section 4 takes the sampling values S(i) (1 ≤ i ≤ 256) and, with the analysis order np = 24, obtains the autocorrelation coefficients R(i) (1 ≤ i ≤ np + 1) according to the processing flow of Fig. 3.
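Fig. 3 is not reproduced in this text, so the following is only a sketch. It assumes the usual short-time autocorrelation definition, with lags 0 to np stored where the patent writes R(1) to R(np + 1). The function name and the 0-based list layout are illustrative choices, not taken from the patent.

```python
def autocorrelation(samples, order=24):
    """Return order + 1 autocorrelation values for lags 0..order.

    r[k] plays the role of R(k + 1) in the patent's 1-based notation.
    Assumed definition (Fig. 3 is unavailable): r[k] = sum_i s[i] * s[i + k].
    """
    n = len(samples)
    return [
        sum(samples[i] * samples[i + k] for i in range(n - k))
        for k in range(order + 1)
    ]


# Tiny example with a 4-sample frame and lags 0..2.
print(autocorrelation([1.0, 2.0, 3.0, 4.0], order=2))  # [30.0, 20.0, 11.0]
```

With a 256-sample frame and `order=24` the function returns 25 values, matching the range 1 ≤ i ≤ np + 1 given above.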

The linear prediction coefficient calculation section 5 takes the autocorrelation coefficients R(i) from calculation section 4 as input and, following the processing flow of Fig. 4, calculates the linear prediction coefficients A(i) (1 ≤ i ≤ np), the partial autocorrelation coefficients P(i), and the residual power E(i). The cepstrum coefficient calculation section 6 then obtains the cepstrum coefficients c(i) (1 ≤ i ≤ np) from the linear prediction coefficients A(i) (1 ≤ i ≤ np) found in the preceding stage, using an equation that is not reproduced in this text.
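Neither Fig. 4 nor the cepstrum equation survives in this text. As a sketch only, the code below assumes the Levinson-Durbin recursion for the linear prediction step and the standard LPC-to-cepstrum recursion, with the sign convention A(z) = 1 + Σ A(k) z^-k. The patent's own flow and sign convention may differ, and the function names are illustrative.

```python
def levinson_durbin(r, order):
    """Solve for LPC coefficients from autocorrelation lags r[0..order].

    Returns (a, parcor, errors): a[k-1] is A(k), parcor[m-1] is the
    reflection (partial autocorrelation) coefficient at step m, and
    errors[m-1] is the residual power after step m.
    """
    a = [1.0] + [0.0] * order
    parcor, errors = [], []
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + sum(a[j] * r[m - j] for j in range(1, m))
        k = -acc / err
        prev = a[:]
        for j in range(1, m):
            a[j] = prev[j] + k * prev[m - j]
        a[m] = k
        err *= 1.0 - k * k
        parcor.append(k)
        errors.append(err)
    return a[1:], parcor, errors


def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum of the all-pole model 1 / A(z), for orders 1..n_ceps."""
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        acc = sum(k * c[k - 1] * a[n - k - 1] for k in range(1, n) if n - k <= p)
        a_n = a[n - 1] if n <= p else 0.0
        c.append(-a_n - acc / n)
    return c


# First-order example: lags of an AR(1) process with coefficient 0.5.
a, parcor, errors = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(a, errors[-1])
print(lpc_to_cepstrum(a, 2))
```

In the setting described above, `order` and `n_ceps` would both be np = 24.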

In addition, the power calculation section 7 obtains the power P from the sampling values S(i) (1 ≤ i ≤ 256), using an equation that is not reproduced in this text.
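The power equation itself is missing from this text. A common choice, assumed here purely for illustration, is the sum of squared samples over the 256-sample frame; the patent may instead use a mean or a logarithmic scale, which would change the appropriate threshold but not the structure of the method.

```python
def frame_power(samples):
    """Frame power as the sum of squared samples (an assumed definition)."""
    return sum(s * s for s in samples)


print(frame_power([1.0, -2.0, 3.0]))  # 14.0
```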

Next, the operation is described.

First, the speech analysis section 1 samples the speech signal at 16 kHz (the sampling value at time t is denoted S(t)), applies a 16 ms Hanning window, and obtains the power P and the LPC cepstrum c at a frame period of 8 ms.
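At 16 kHz, a 16 ms window spans 256 samples and an 8 ms frame period is a shift of 128 samples, which agrees with the range 1 ≤ i ≤ 256 used for S(i). The framing step can be sketched as follows; the symmetric Hanning formula and the helper names are assumptions, since the patent does not spell them out.

```python
import math

SAMPLE_RATE_HZ = 16_000
FRAME_LEN = SAMPLE_RATE_HZ * 16 // 1000    # 16 ms window -> 256 samples
FRAME_SHIFT = SAMPLE_RATE_HZ * 8 // 1000   # 8 ms period  -> 128 samples


def hanning(n):
    """Symmetric Hanning window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def windowed_frames(signal):
    """Split a signal into overlapping, Hanning-windowed frames."""
    window = hanning(FRAME_LEN)
    frames = []
    for start in range(0, len(signal) - FRAME_LEN + 1, FRAME_SHIFT):
        chunk = signal[start:start + FRAME_LEN]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames


frames = windowed_frames([1.0] * 512)
print(len(frames), len(frames[0]))  # 3 256
```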

(The power and cepstrum of the t-th frame are denoted P(t) and c(t), respectively.) The LPC cepstrum c(t) thus obtained is input to the cepstrum sum calculation section 2 in the next stage, where the sum of the absolute values of the low-order cepstrum coefficients (up to the 24th order) is computed and output as the cepstrum sum C(t).
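The cepstrum sum is a single reduction over the coefficient vector. A minimal sketch, assuming the coefficients are held in a list ordered from the 1st order upward (the function name is illustrative):

```python
def cepstrum_sum(cepstrum, max_order=24):
    """Sum of absolute values of the cepstrum coefficients up to max_order."""
    return sum(abs(c) for c in cepstrum[:max_order])


print(cepstrum_sum([0.5, -0.25, 0.125]))  # 0.875
```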

The power P(t) and the cepstrum sum C(t) obtained in this way are sent to the determination section 3, where the following determination is made.

That is, if the power P(t) is greater than the threshold a1 (see Fig. 5) and the cepstrum sum C(t) is greater than the threshold a2 (see Fig. 6), the frame is determined to lie within a vowel section. Then, for an interval t1 < t < t2 (where t2 - t1 > 84 frames), if the sound-present (non-silent) section is 21 frames or longer and the number of frames determined to be vowel sections is at least 1/4 of the length of the sound-present section, the interval t1 < t < t2 is determined to be a speech section; if that number is less than 1/4, the interval is determined to be a noise section.
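The two-stage decision can be sketched as below. Several points are assumptions rather than statements from the patent: a frame is counted as sound-present when its power exceeds a1, intervals with fewer than 21 sound-present frames are reported as "undetermined" because the text does not say how they are handled, and the threshold values in the example are hypothetical.

```python
def is_vowel_frame(power, ceps_sum, a1, a2):
    """Frame-level test: both power and cepstrum sum exceed their thresholds."""
    return power > a1 and ceps_sum > a2


def classify_interval(powers, ceps_sums, a1, a2,
                      min_sound_frames=21, min_vowel_ratio=0.25):
    """Classify one interval t1 < t < t2 from its per-frame P(t) and C(t)."""
    # Assumption: "sound-present" means the frame power exceeds a1.
    sound = sum(1 for p in powers if p > a1)
    vowels = sum(
        1 for p, c in zip(powers, ceps_sums) if is_vowel_frame(p, c, a1, a2)
    )
    if sound < min_sound_frames:
        return "undetermined"  # case not covered by the text
    return "speech" if vowels >= min_vowel_ratio * sound else "noise"


# Hypothetical 90-frame interval: 30 sound-present frames, 10 of them vowel-like.
powers = [10.0] * 30 + [0.0] * 60
ceps_sums = [5.0] * 10 + [0.0] * 80
print(classify_interval(powers, ceps_sums, a1=1.0, a2=1.0))  # speech
```

Keeping the ratio as a parameter reflects the remark in the description that the ratio can be chosen as appropriate.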

In this way, speech sections and noise sections in the input sound can be determined and separated on the basis of how the ratio of the vowel section length to the input sound section length compares with a threshold. A particular feature of this method is that no matching against standard vowel patterns is performed when detecting vowel sections, so speech sections can be determined with a very simple hardware configuration.

The ratio of the vowel section length to the input sound section length is not limited to the value used in the embodiment and can be set as appropriate.

<Effect>

As described in detail above, the speech section determination method according to the present invention detects vowel sections in the input sound, obtains the ratio of the detected vowel section length to the input sound section length, and determines that the input sound is speech when that ratio is greater than a threshold. Speech alone can therefore be detected in the input sound. Moreover, because no matching against standard vowel patterns is performed when detecting vowel sections, the hardware configuration can be simplified considerably, which is a significant advantage.

[Brief Description of the Drawings]

Fig. 1 is a block diagram of a speech section determination device employing the method of the present invention; Fig. 2 is a block diagram of the speech analysis section of Fig. 1; Fig. 3 is a processing flow diagram of the autocorrelation coefficient calculation section; Fig. 4 is a processing flow diagram of the linear prediction coefficient calculation section; Fig. 5 is a diagram showing the relationship between the power threshold and noise and speech; and Fig. 6 is a diagram showing the relationship between the cepstrum sum threshold and noise and speech.

1: speech analysis section; 2: cepstrum sum calculation section; 3: determination section; S(t): sampling value; c(t): LPC cepstrum; P(t): power; C(t): cepstrum sum.

Agent: Patent attorney 福士愛彦 (and two others)

Claims

[Claims]

1. A voice section determination method that detects a vowel section in an input sound, calculates the ratio of the detected vowel section length to the input sound section length, and determines that the input sound is speech when the ratio is greater than a threshold value.