JP2564821B2

JP2564821B2 - Voice determination and detection device

Info

Publication number: JP2564821B2
Application number: JP62097779A
Authority: JP
Inventors: 敏雄吉川
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1987-04-20
Filing date: 1987-04-20
Publication date: 1996-12-18
Anticipated expiration: 2011-12-18
Also published as: JPS63262693A

Description

[Industrial Field of Application]

The present invention relates to an apparatus that determines and detects input speech, and more particularly to a voice determination and detection device that determines and detects the range over which input speech is present, for use in a speech recognition apparatus or the like.

[Prior Art]

As shown in FIG. 4, a conventional voice determination and detection device for determining and detecting a speech section comprises the following. A level extraction circuit 8 extracts the level from input signal 1, which corresponds to the input sound. A threshold setting circuit 10 sets a level threshold according to the noise level at the time of speech input, the input speech level, and similar factors, and compares this threshold with the input level signal 9 output by the level extraction circuit 8. When the input level signal 9 remains above the threshold for at least a predetermined time, circuit 10 judges this to be the start of a speech section; thereafter, when the input level signal 9 remains below the threshold for at least a predetermined time, it judges this to be the end of the speech section, and it outputs a start/end judgment signal 11 for the speech section. A speech section detection circuit 12 receives the start/end judgment signal 11 and outputs a speech detection result signal 7. The span from the start to the end determined in this way is detected as the speech section.
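The conventional level-threshold scheme can be sketched as a small state machine. This is a minimal Python sketch, not the circuit of FIG. 4: the list of per-frame levels stands in for input level signal 9, and the threshold and the minimum run lengths `min_on` and `min_off` are hypothetical parameters, since the text gives no values for them.

```python
from typing import List, Optional, Tuple


def detect_speech_section(levels: List[float], threshold: float,
                          min_on: int, min_off: int) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices of the first speech section, or None.

    Hypothetical sketch of a level-threshold endpoint detector: the start is
    declared when the level stays above `threshold` for `min_on` consecutive
    frames, and the end when it then stays at or below `threshold` for
    `min_off` consecutive frames.
    """
    start = None
    run = 0
    for i, level in enumerate(levels):
        if start is None:
            # Looking for the start: count consecutive frames above threshold.
            run = run + 1 if level > threshold else 0
            if run >= min_on:
                start = i - min_on + 1  # first frame of the qualifying run
                run = 0
        else:
            # Looking for the end: count consecutive frames at/below threshold.
            run = run + 1 if level <= threshold else 0
            if run >= min_off:
                return start, i - min_off  # last frame before the quiet run
    if start is not None:
        return start, len(levels) - 1  # input ended inside the section
    return None
```

For example, with levels `[0.1, 0.1, 0.9, 0.8, 0.9, 0.7, 0.1, 0.1, 0.1]`, a threshold of 0.5 and run lengths of 3, the section is frames 2 to 5. Because the decision rests only on level, a loud noise burst passes the same test, which is the weakness described in the next section.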

[Problem to Be Solved by the Invention]

Because the conventional voice determination and detection device described above relies on the power of the input speech, it is easily affected by ambient noise, and it has difficulty distinguishing the input speech from ambient noise.

[Means for Solving the Problem]

The voice determination and detection device of the present invention comprises: a conversion circuit that converts the input signal into line spectrum pair coefficients for each fixed extraction interval; an inter-coefficient distance determination circuit that determines whether the distance between adjacent coefficients of the converted line spectrum pair coefficients is greater or smaller than a threshold; and a voiced sound determination circuit that determines whether the result of the inter-coefficient distance determination circuit has persisted continuously for at least a fixed time, and thereby detects speech.

[Action]

Therefore, according to the present invention, the distances between line spectrum pair coefficients are used to determine whether the input signal is a speech signal, so speech can be determined and detected even in the presence of ambient noise.

[Embodiment]

An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a block diagram showing an embodiment of the present invention, and FIGS. 2 and 3 are graphs explaining that embodiment. Input signal 1, which corresponds to the input sound, usually contains ambient noise. The line spectrum pair conversion circuit 2 uses the line spectrum pair method (Line Spectrum Pair, hereinafter "LSP"), a type of linear predictive coding, to convert input signal 1 into signal 3, which carries the line spectrum pair (LSP) coefficients, a set of frequency-domain parameters.
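The conversion performed by circuit 2 can be approximated in software. The following is a minimal NumPy sketch under stated assumptions, not the circuit's actual implementation: it uses the autocorrelation method with a Hamming window (an assumption) and the Levinson-Durbin recursion to obtain the predictor polynomial A(z) = 1 + a₁z⁻¹ + … + a_P z⁻ᴾ, then takes the LSP coefficients as the angles of the unit-circle roots of the sum and difference polynomials A(z) ± z^-(P+1) A(z^-1), which is the standard LSP definition. The function names `lpc` and `lsp_from_lpc` are hypothetical.

```python
import numpy as np


def lpc(frame: np.ndarray, order: int) -> np.ndarray:
    """Predictor polynomial [1, a1, ..., a_order] via autocorrelation + Levinson-Durbin."""
    x = frame * np.hamming(len(frame))  # window choice is an assumption
    # Autocorrelation r[0..order]
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    if err <= 0.0:
        return a  # silent frame: A(z) = 1, i.e. a flat spectrum
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
        if err <= 0.0:
            break  # perfectly predictable frame; stop the recursion
    return a


def lsp_from_lpc(a: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """LSP angles (radians, ascending, in (0, pi)) of A(z) = sum a[j] z^-j."""
    ext = np.append(a, 0.0)         # coefficients of z^0 .. z^-(P+1)
    sum_poly = ext + ext[::-1]      # A(z) + z^-(P+1) A(1/z)
    diff_poly = ext - ext[::-1]     # A(z) - z^-(P+1) A(1/z)
    roots = np.concatenate([np.roots(sum_poly), np.roots(diff_poly)])
    angles = np.angle(roots)
    # Keep one root of each conjugate pair and drop the trivial roots at z = +1 and z = -1.
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])
```

With P = 8, a frame of speech yields the eight coefficients w₁ to w₈. Root finding is the simplest route to write down; fixed-point codecs usually search the unit circle instead, so treat this as a reference model rather than an efficient implementation.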

For example, when the LSP coefficients are computed with an analysis order of 8, eight coefficients w₁, w₂, w₃, …, w₈ are obtained, as shown in FIGS. 2 and 3. The analysis uses a sampling frequency of 8 kHz, a bandwidth equal to the telephone band of 0.4 to 3.4 kHz, and an analysis frame period of 10 to 20 ms.

LSP is explained in the article "Speech synthesis method using line spectrum frequencies as parameters and its LSI implementation," Nikkei Electronics, No. 257, February 2, 1981, pp. 128 to 158.

The LSP coefficients w₁ to w_P are frequency-domain parameters with the property of clustering near the formant frequencies F₁ to F_{P/2} of speech. They also satisfy the following ordering relation, where P is the analysis order:

$$0 < w_1 < w_2 < \cdots < w_{P-1} < w_P < \pi$$
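Because the LSP coefficients are angles in radians on the unit circle, w = π corresponds to half the sampling frequency, 4 kHz at 8 kHz, and a coefficient near a formant F satisfies w ≈ 2πF/f_s. The two helpers below are hypothetical illustrations: one converts coefficients to hertz, the other checks the ordering relation above.

```python
import numpy as np


def lsp_to_hz(w, fs_hz: float = 8000.0):
    """Convert LSP angles in radians (0..pi) to frequencies in Hz (0..fs/2)."""
    return np.asarray(w) * fs_hz / (2.0 * np.pi)


def is_valid_lsp(w) -> bool:
    """Check the ordering relation 0 < w_1 < w_2 < ... < w_P < pi."""
    w = np.asarray(w, dtype=float)
    return bool(w[0] > 0.0 and w[-1] < np.pi and np.all(np.diff(w) > 0.0))
```

For instance, an angle of π/9 maps to 8000/18 ≈ 444 Hz, and a sequence that is not strictly increasing fails the check.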

Exploiting this property, the inter-coefficient distance determination circuit 4 calculates, from the line spectrum pair coefficient signal 3, the distances between adjacent LSP coefficients, (w₂ − w₁) to (w₈ − w₇), as shown in FIG. 2.

An example of how the inter-coefficient distances (w_n − w_{n−1}) are evaluated is described below. When the LSP analysis order is P, the following inequalities are evaluated for n = 2, 3, …, P:

$$w_n - w_{n-1} < w_{TH1} \tag{1}$$
$$w_n - w_{n-1} < w_{TH2} \tag{2}$$

Here w_TH1 and w_TH2 are thresholds on the LSP inter-coefficient distance (w_n − w_{n−1}), set so that w_TH1 < w_TH2.
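Inequalities (1) and (2) can be evaluated for all n at once. In this hypothetical sketch, `distance_tests` returns one boolean mask per inequality; the threshold values in the usage example (0.06 and 0.10 rad) and the LSP values are illustrative assumptions, since the text does not specify them.

```python
from typing import Tuple

import numpy as np


def distance_tests(w, w_th1: float, w_th2: float) -> Tuple[np.ndarray, np.ndarray]:
    """Evaluate inequalities (1) and (2) for n = 2..P.

    Returns two boolean arrays of length P-1; element n-2 tells whether
    (w_n - w_(n-1)) < w_th1 (inequality 1) and < w_th2 (inequality 2).
    """
    if not w_th1 < w_th2:
        raise ValueError("the text requires w_TH1 < w_TH2")
    d = np.diff(np.asarray(w, dtype=float))  # (w_2 - w_1), ..., (w_P - w_(P-1))
    return d < w_th1, d < w_th2


# Illustrative values only (not taken from the patent).
w_example = [0.20, 0.25, 0.60, 0.68, 1.40, 1.90, 2.40, 2.90]
t1, t2 = distance_tests(w_example, 0.06, 0.10)
```

Here only (w₂ − w₁) ≈ 0.05 satisfies (1), while both it and (w₄ − w₃) ≈ 0.08 satisfy (2). Because w_TH1 < w_TH2, every distance that satisfies (1) also satisfies (2).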

If at least one pair of adjacent LSP coefficients satisfies expression (1) and at least two pairs satisfy expression (2), the inter-coefficient distance determination result signal 5 indicates a voiced sound. The voiced sound determination circuit 6 then outputs the speech detection result signal 7 when signal 5 indicates a voiced sound for, for example, three consecutive frames.
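The per-frame rule and the consecutive-frame check can be sketched together. This is a hypothetical illustration: `frame_is_voiced` plays the role of circuit 4 producing signal 5, and `detect_speech` plays the role of circuit 6, reporting speech for each frame once the voiced decision has held for `min_frames` consecutive frames (3 in the text's example). The thresholds remain caller-supplied assumptions.

```python
from typing import Iterable, List

import numpy as np


def frame_is_voiced(w, w_th1: float, w_th2: float) -> bool:
    """Per-frame rule: >= 1 adjacent distance below w_th1 and >= 2 below w_th2."""
    d = np.diff(np.asarray(w, dtype=float))
    return bool(np.count_nonzero(d < w_th1) >= 1 and np.count_nonzero(d < w_th2) >= 2)


def detect_speech(voiced_flags: Iterable[bool], min_frames: int = 3) -> List[bool]:
    """Per-frame detection output: True once the voiced flag has held
    for min_frames consecutive frames."""
    out = []
    run = 0
    for voiced in voiced_flags:
        run = run + 1 if voiced else 0  # length of the current voiced run
        out.append(run >= min_frames)
    return out
```

For the voiced-flag sequence `[True, True, False, True, True, True, True, False]`, detection starts at the sixth frame: the first two voiced frames are too short a run, and the third frame of the second run is the first to qualify.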

FIG. 2 shows the relationship between the frequency spectrum and the LSP coefficients for a voiced sound in the embodiment of FIG. 1, and FIG. 3 shows the same relationship for an unvoiced sound or ambient noise.

As FIG. 2 shows, for a voiced sound the LSP coefficients w₁ to w₈ cluster near the formant frequencies F₁ to F₄. Moreover, because the resonance at the first formant frequency F₁ generally has a high gain, w₁ and w₂ are packed even more tightly, so the distance (w₂ − w₁) falls below the threshold w_TH1, and the distance (w₄ − w₃) near the second formant frequency F₂ falls below the threshold w_TH2.

For an unvoiced sound or ambient noise, however, the frequency spectrum is flat, as shown in FIG. 3, and the LSP coefficients w₁ to w₈ show little clustering. Consequently, the distances (w_n − w_{n−1}) do not fall below the thresholds w_TH1 and w_TH2.
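The contrast between FIGS. 2 and 3 can be checked numerically. For a perfectly flat spectrum, A(z) = 1, and with P = 8 the sum and difference polynomials reduce to z⁹ + 1 and z⁹ − 1, so the LSP coefficients are exactly kπ/9 (k = 1 to 8), evenly spaced by π/9 ≈ 0.349 rad (about 444 Hz at 8 kHz). The sketch below compares this with a hypothetical "voiced-like" A(z) containing a single sharp resonance (pole radius 0.98 at 500 Hz), a toy stand-in for a formant that is an assumption of this example, not data from the patent. `lsp_from_lpc` computes LSPs from the unit-circle roots of A(z) ± z^-(P+1) A(z^-1).

```python
import numpy as np

FS_HZ = 8000.0
ORDER = 8


def lsp_from_lpc(a: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """LSP angles in (0, pi) for A(z) = sum a[j] z^-j, via unit-circle roots."""
    ext = np.append(a, 0.0)  # coefficients of z^0 .. z^-(P+1)
    roots = np.concatenate([np.roots(ext + ext[::-1]), np.roots(ext - ext[::-1])])
    angles = np.angle(roots)
    # Keep one root per conjugate pair; drop the trivial roots at z = +1 and z = -1.
    return np.sort(angles[(angles > eps) & (angles < np.pi - eps)])


def resonant_lpc(freq_hz: float, radius: float, order: int = ORDER) -> np.ndarray:
    """A(z) with one conjugate zero pair (one sharp spectral peak), zero-padded to `order`."""
    theta = 2.0 * np.pi * freq_hz / FS_HZ
    a = np.zeros(order + 1)
    a[:3] = [1.0, -2.0 * radius * np.cos(theta), radius ** 2]
    return a


flat = np.r_[1.0, np.zeros(ORDER)]       # A(z) = 1: flat spectrum
voiced_like = resonant_lpc(500.0, 0.98)  # hypothetical single formant

for name, a in [("flat", flat), ("resonant", voiced_like)]:
    w = lsp_from_lpc(a)
    print(name, "min adjacent distance (rad):", np.diff(w).min())
```

In the flat case every adjacent distance equals π/9, while the sharp resonance pulls the two coefficients on either side of 500 Hz much closer together. This is the effect the thresholds w_TH1 and w_TH2 are meant to detect.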

[Effect of the Invention]

As described above, to determine whether the input signal is a speech signal, the present invention uses the distances between line spectrum pair coefficients instead of the magnitude of the input signal level. Voiced speech can therefore be detected even when it is buried in ambient noise, which helps improve the recognition rate of speech recognition apparatuses.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing an embodiment of the present invention, FIGS. 2 and 3 are graphs explaining that embodiment, and FIG. 4 is a block diagram showing a conventional example.

2: line spectrum pair conversion circuit; 4: inter-coefficient distance determination circuit; 6: voiced sound determination circuit.

Claims

(57) [Claims]

1. A voice determination and detection device comprising: a conversion circuit that converts an input signal into line spectrum pair coefficients for each fixed extraction interval; an inter-coefficient distance determination circuit that determines whether the distance between adjacent coefficients of the converted line spectrum pair coefficients is larger or smaller than a threshold value; and a voiced sound determination circuit that determines whether the determination result of the inter-coefficient distance determination circuit has continued for a fixed time or longer, and thereby detects speech.