JPH03236100A - Voice detection system

Info

Publication number: JPH03236100A
Application number: JP2031424A
Authority: JP
Inventors: Masami Akamine; 政巳赤嶺
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1990-02-14
Filing date: 1990-02-14
Publication date: 1991-10-22

Abstract

PURPOSE: To improve the accuracy of speech/silence detection, and thereby reduce dropped speech and added noise, by calculating feature parameters from a speech signal, projecting them onto a specific principal component vector space, and deciding whether the signal is silence or speech from the position of the projected point. CONSTITUTION: The input speech signal is encoded by a speech encoder 41 and a noise encoder 42. One of the two encoder outputs is passed through a selector 45 to a cell generating circuit 46 and formed into cells. The selector 45 is switched according to the decision of a silence/speech discriminator 43, so that only the speech sections of the signal are formed into cells. A noise detector 44 detects only the silent sections, in which the background noise is encoded by the noise encoder 42. The selector 45 is switched to the noise encoder 42 side when the discriminator 43 detects silence. The discriminator 43 calculates the LPC cepstrum C of the input speech signal for principal component analysis, an inner product calculator 51 finds the projected point Q of C, and a discriminator 56 decides from Q whether the frame is speech or silence.

Description

[Detailed Description of the Invention] [Object of the Invention] (Industrial Field of Application) This invention relates to a method for detecting the speech and silent portions of a speech signal, which is a basic technology for speech recognition and for ATM communication in which the speech portions of a speech signal are formed into cells and transmitted.

(Prior Art) In a device that processes only the speech portions of a speech signal, if silence and speech are not discriminated accurately, the transmitted speech is interrupted or the speech recognition error rate rises. In an ATM communication system, moreover, the line cannot be used efficiently. Highly accurate silence/speech discrimination is therefore desired. A conventional silence/speech discriminator that does not depend on signal level fluctuations caused by changes in the usage environment, and that reduces missed detections of low-level speech signals such as word-initial consonants even when the ambient noise level is high, is described in a Japanese Unexamined Patent Publication (Kokai, Sho series; publication number illegible).

FIG. 1 is a block diagram of this conventional device. A speech signal input through a microphone or the like is supplied to an energy extraction circuit 5 and a spectrum extraction circuit 6.

The energy extraction circuit 5 consists of a rectifying and smoothing circuit and extracts the power (log power) of the speech signal as a feature parameter for each frame period of predetermined length, the frame being the time unit of silence/speech discrimination. The spectrum extraction circuit 6 consists of three band-pass filters, low (250-600 Hz), middle (600-1500 Hz), and high (1500-4000 Hz), and three rectifying and smoothing circuits connected to their outputs. It likewise extracts, for each frame period, the power (log power) in each band as feature parameters of the speech signal.

The energy extraction circuit 5 and the spectrum extraction circuit 6 together constitute a feature extraction section 13.

The outputs of the energy extraction circuit 5 and the spectrum extraction circuit 6 are supplied to a multiplexer 7, and the signal power from the energy extraction circuit 5 and the band powers from the spectrum extraction circuit 6 are supplied in time-division fashion to a silence/speech discriminator 8. The discriminator 8 decides whether each frame of the speech signal is silence or speech (speech frames include both voiced-sound frames and unvoiced-sound frames). A threshold memory 9 and a standard pattern memory 10 are connected to the discriminator 8. The threshold memory 9 stores two thresholds, E1 and E2, used to decide between silence and speech on the basis of power. The standard pattern memory 10 stores the coefficients of a linear discriminant function for detecting whether a frame is a silent frame or an unvoiced-sound frame, and the coefficients of a linear discriminant function for detecting whether a frame is a silent frame or a voiced-sound frame. These thresholds and coefficients are determined in advance from the statistical properties of speech signals containing silence, voiced sounds, and unvoiced sounds produced in the environment in which the discriminator is used, and are then stored.

The decision of the discriminator 8 is input to a speech period start/end candidate detector 11, which detects candidate start and end frames of a speech period from the per-frame decisions. The detection result is sent to a speech period detection section 12, where the start and end of the speech period are finally determined.

The operation of this conventional example is as follows. The speech signal of each frame is converted into the power LPW by the energy extraction circuit 5 and into the band powers LPi (i is the band index; here i = 1 to 3) by the spectrum extraction circuit 6. Using these four feature parameters, LPW and LPi, together with the thresholds E1 and E2 and the linear discriminant function coefficients stored in memories 9 and 10, the silence/speech discriminator 8 decides whether the frame is speech or silence.

The decision is first made by comparing the power LPW with the two thresholds E1 and E2, as follows.

If LPW > E1, the frame is judged to be speech; if LPW < E2, silence; and if E2 ≤ LPW ≤ E1, indeterminate.
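As a minimal sketch of this first-stage test (the function name and string labels are hypothetical, and the silence condition is taken to be LPW < E2, which is what the indeterminate band E2 ≤ LPW ≤ E1 implies):

```python
def classify_by_power(lpw, e1, e2):
    """Three-way decision on the frame log power LPW; assumes e2 < e1."""
    if lpw > e1:
        return "speech"
    if lpw < e2:
        return "silence"
    # E2 <= LPW <= E1: power alone cannot decide
    return "indeterminate"
```

Only frames that come back as indeterminate need the more expensive spectral test described next.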

When the result is indeterminate, a further decision is made from the discriminant function value FX, which is computed from the band powers LPi and the coefficients of the two linear discriminant functions as follows.

$$FX = \sum_{i=1}^{3} A_i \left( LP_i - \overline{LP}_i \right) \qquad (1)$$

Here, $A_i$ are the coefficients of the discriminant function stored in memory 10, and $\overline{LP}_i$ is the standard pattern, which is likewise obtained in advance by statistical processing of speech signals uttered in the usage environment.

The function value FX is defined so that it is negative for silence and positive for speech, whether unvoiced or voiced.

FX is calculated twice: once with the coefficients of the linear discriminant function for detecting silence versus unvoiced sound as Ai, and once with the coefficients of the linear discriminant function for detecting silence versus voiced sound. If either value is positive, the frame is classified as speech; otherwise it is classified as silence.
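The second-stage test can be sketched as below. This is an illustration under stated assumptions, not the cited publication's implementation: equation (1) is assumed to be a weighted sum of the deviations of the band powers from the standard pattern, a single standard pattern is shared by both discriminant functions, and all names are hypothetical.

```python
def discriminant(band_powers, standard_pattern, coeffs):
    """Assumed form of equation (1): FX = sum_i A_i * (LP_i - standard_i)."""
    return sum(a * (lp - ref)
               for a, lp, ref in zip(coeffs, band_powers, standard_pattern))

def is_speech(band_powers, standard_pattern, coeffs_unvoiced, coeffs_voiced):
    """Speech if either discriminant (silence vs. unvoiced, silence vs. voiced) is positive."""
    fx_unvoiced = discriminant(band_powers, standard_pattern, coeffs_unvoiced)
    fx_voiced = discriminant(band_powers, standard_pattern, coeffs_voiced)
    return fx_unvoiced > 0 or fx_voiced > 0
```

The coefficient vectors and the standard pattern would come from statistics gathered in the usage environment; the values used to exercise this sketch are arbitrary.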

Because this conventional example discriminates silence from speech on the basis of the difference between the spectral shape extracted by the spectrum extraction circuit 6 and the standard spectral shapes of silence, unvoiced sounds, and voiced sounds, it can prevent low-energy unvoiced consonants, voiced consonants, and the like from being missed. However, the feature parameters describing the spectral shape are the powers in three bands: low (250-600 Hz), middle (600-1500 Hz), and high (1500-4000 Hz). There is no theoretical basis for this choice, and the number of feature parameters is small, so silence/speech decisions can be wrong, and missed speech or added noise is sometimes unavoidable. For example, when the spectra of an unvoiced sound and of noise are as shown by the solid and broken lines in FIG. 2, the two shapes differ greatly, yet with Ai = 1 (i = 1 to 3) the function value FX of equation (1) is the same for the unvoiced sound and for the noise, and the decision is wrong. This happens because too few feature parameters define the spectral shape and because the parameters are not well chosen. Furthermore, since the selection has no theoretical basis, it must be made by trial and error, which takes a great deal of effort, and the resulting feature parameters are still not necessarily appropriate.
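A small numeric illustration of this failure case follows. The band powers are invented for the example, and equation (1) is again assumed to be a weighted sum of deviations from a standard pattern; with all weights equal to 1, only the total of the three band powers matters, so two mirror-image spectra become indistinguishable.

```python
# Hypothetical log band powers (low, middle, high); not measured data.
unvoiced = [2.0, 3.0, 7.0]   # energy concentrated in the high band
noise = [7.0, 3.0, 2.0]      # energy concentrated in the low band
standard = [4.0, 4.0, 4.0]   # assumed standard pattern

def fx_unit_weights(band_powers):
    """Assumed equation (1) with A_i = 1 for every band."""
    return sum(lp - ref for lp, ref in zip(band_powers, standard))

fx_unvoiced = fx_unit_weights(unvoiced)
fx_noise = fx_unit_weights(noise)
same_score = fx_unvoiced == fx_noise   # the two spectra cannot be told apart
```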

Increasing the number of parameters would reduce the erroneous decisions, but it would also increase the computation needed to evaluate the discriminant function of equation (1). The above publication also states that the Mahalanobis distance can be used in place of the discriminant function of equation (1), but doing so increases the computation still further.

(Problems to be Solved by the Invention) As described above, when the number of parameters is kept small to limit computation, the conventional speech/silence detection method can make wrong speech/silence decisions, so that dropped speech and added noise are sometimes unavoidable. The conventional method also has the problem that selecting the parameters takes considerable effort, because there is no theoretical selection criterion.

The present invention was made in view of these problems, and its object is to provide a speech/silence detection method with high detection accuracy and little dropped speech or added noise.

[Constitution of the Invention] (Means for Solving the Problems) In this invention, speech collected in advance in the environment in which the telephone or recognition device is used is labeled as speech or silence beforehand, by listening or by visual inspection of the waveform. Feature parameters are then obtained for the speech portions and for the silent portions, principal component analysis is performed on each, and the principal component vectors of the speech portions and of the silent portions are obtained in advance.

Next, the feature parameters of the frame under test are projected onto the principal component vector space of the speech feature parameters or onto that of the silence feature parameters. Speech or silence is then decided either from the position of the projected point together with its temporal change, or from its temporal change alone.

(Operation) When the speech detection method of the invention described above is used, feature parameters of an acoustic signal such as a speech signal are first obtained. Next, these parameters are transformed into other parameters, and the number of parameters is then reduced below that of the original feature parameters. FIG. 3 illustrates this concept. In FIG. 3, let the L original feature parameters be $x_i$ (i = 1, 2, ..., L), and let X be the vector whose elements are $x_i$. The transformation is an orthogonal transformation with transformation matrix A. Let the transformed feature parameters be $y_i$ (i = 1, 2, ..., L), let Y be the vector whose elements are $y_i$, and let Y' be the feature parameter vector in which N parameters $y_j$ (j = 1, 2, ..., N) are retained and the remaining (L - N) are set to zero (where N < L). The error vector e caused by reducing the number of parameters is then the difference between the original feature parameter vector X and the inverse transform of Y', as follows.

$$e = X - A^{-1}Y' = A^{-1}(Y - Y')$$

If the transformation is chosen to minimize the mean square value of this error, $\sigma^2 = E[e^{t}e]$, the error caused by reducing the number of feature parameters is minimized. Here E denotes the expected value. The transformation that minimizes $\sigma^2$ is known to be the transformation by the matrix A whose row vectors are the eigenvectors of the autocorrelation matrix of $x_i$, that is, the KL transform. These eigenvectors are identical to the principal component vectors obtained by principal component analysis of $x_i$, and the eigenvectors taken in descending order of eigenvalue correspond to the first, second, third, ... principal component vectors.
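The following short derivation is added as a supporting sketch of why the eigenvectors with the largest eigenvalues are the ones to keep. It assumes that A is orthonormal (so $A^{-1} = A^{t}$), that its rows are $a_j^{t}$, and that $R = E[XX^{t}]$ is the autocorrelation matrix with eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_L$; the symbols $a_j$ and $\lambda_j$ are introduced here and do not appear in the original text.

```latex
\begin{aligned}
\sigma^2 &= E\!\left[e^{t}e\right]
          = E\!\left[(Y-Y')^{t} A A^{t} (Y-Y')\right]
          = \sum_{j=N+1}^{L} E\!\left[y_j^{2}\right] \\
         &= \sum_{j=N+1}^{L} a_j^{t} R\, a_j
          = \sum_{j=N+1}^{L} \lambda_j
          \qquad \text{when } R\,a_j = \lambda_j a_j .
\end{aligned}
```

The residual error is therefore the sum of the eigenvalues of the discarded axes, which is smallest when the N retained axes are those with the N largest eigenvalues.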

Reducing the number of parameters after applying the KL transform to the L feature parameters X corresponds to projecting X onto the N-dimensional principal component vector space whose coordinate axes are the first to N-th principal component vectors. Projecting the feature parameters onto the principal component vector space therefore reduces the number of feature parameters while minimizing the error of representing the original feature parameters in fewer dimensions, in other words, while minimizing the loss of the information carried by the original feature parameters.

Because their characteristics differ, for example in spectral shape, the feature parameters of speech portions and of silent portions are distributed in specific regions of the principal component vector space. The speech/silence decision exploits this property: the point obtained by projecting the feature parameters onto the principal component vector space is compared with preset speech and silence regions.

The projected point of the feature parameters in the principal component vector space also changes over time in a characteristic way as the nature of the signal changes, from silence to speech or from speech to silence. Using this property, speech or silence can be decided by comparing a parameter representing the temporal change of the projected point with a preset threshold, or by matching the pattern of that parameter against a preset standard pattern. Speech detection is performed either by this decision based on the temporal change of the projected point alone, or by combining it with the decision based on the position of the projected point described above.

This method gives high speech/silence detection accuracy with little dropped speech or added noise.

(Embodiment) An embodiment of the silence/speech discriminator according to this invention is described below with reference to the drawings. FIG. 3 shows an outline of the invention.

First, feature parameters are obtained from the speech signal by a known method. Besides the spectrum used in the conventional example, possible feature parameters include the LPC cepstrum, signal power, zero-crossing count, linear prediction coefficients, autocorrelation function, and DFT coefficients, as well as combinations of these. At this stage the number and kinds of feature parameters need not be selected; it is desirable to obtain as many parameters of as many kinds as possible. Next, the number of feature parameters is reduced in a way that does not affect the silence/speech discrimination accuracy. To do this, the invention first transforms the feature parameters into other parameters and then reduces their number, choosing the transformation so that the error between the original parameters and the parameters recovered by inverse-transforming the reduced set is minimized.

Specifically, as shown in FIG. 3, let the L original feature parameters be $x_i$ (i = 1 to L), and let X be the vector whose elements are $x_i$. The transformation is an orthogonal transformation with transformation matrix A. Let the transformed feature parameters be $y_i$ (i = 1 to L), let Y be the vector whose elements are $y_i$, and let Y' be the vector of feature parameters in which N of the transformed parameters, $y_j$ (j = 1 to N), are retained and the remaining (L - N) parameters are set to 0, where N < L. The error vector e caused by reducing the number of parameters is then the difference between the original feature parameter vector X and the vector $A^{-1}Y'$ obtained by inverse-transforming the reduced vector Y', and is expressed by the following equation.

$$e = X - A^{-1}Y' = A^{-1}(Y - Y') \qquad (2)$$

If a transformation that minimizes the mean square value of this error, $\sigma^2 = E[e^{t}e]$, is applied, the error caused by reducing the number of feature parameters is minimized. Here t denotes matrix transposition and E[ ] the expected value.

The transformation that minimizes the mean square value given by equation (2) is known as the KL transform, that is, the transformation by the matrix A whose row vectors are the eigenvectors of the autocorrelation matrix of the parameters $x_i$. These eigenvectors are identical to the principal component vectors obtained by principal component analysis of $x_i$, and the eigenvectors taken in descending order of eigenvalue correspond to the first, second, ... principal component vectors.

Let the feature parameter vectors be $X_i$ (i = 1 to M). The transformation that minimizes the mean square value $\sigma^2$ of equation (2) over the vectors $X_i$ is defined by the eigenvectors obtained by principal component analysis of the autocorrelation matrix of the feature parameter vectors, given by the following equation.

$$R = \frac{1}{M}\sum_{i=1}^{M} X_i X_i^{t} \qquad (3)$$

Here, $x_{i1}, x_{i2}, \ldots, x_{iL}$ are the elements of the feature parameter vector $X_i$.

Equation (3) shows that the autocorrelation matrix R is the average, in an $L^2$-dimensional space, of the autocorrelation matrices $R_i$ of the individual feature parameter vectors $X_i$ given by the following equation; that is, R is their centroid.

$$R_i = X_i X_i^{t} \qquad (4)$$

If the autocorrelation matrices $R_i$ (i = 1, 2, ..., M) of the M feature parameter vectors $X_i$ are to be represented by a single autocorrelation matrix R, then R is the matrix that minimizes the mean square error E between $R_i$ and R defined by the following equation.

$$E = \frac{1}{M}\sum_{i=1}^{M}\sum_{k=1}^{L}\sum_{l=1}^{L}\bigl(R(k,l) - R_i(k,l)\bigr)^2$$

This is because partially differentiating the above equation with respect to $R(k,l)$ and setting the result to 0 gives the following equation, which is exactly R.

$$R(k,l) = \frac{1}{M}\sum_{i=1}^{M} R_i(k,l)$$

Here $R(k,l)$ and $R_i(k,l)$ are the (k, l) elements of the autocorrelation matrices R and $R_i$, respectively.
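The statement that the averaged matrix R minimizes the mean square error to the per-vector matrices can be checked numerically. The sketch below uses plain Python lists and three made-up two-dimensional vectors; the function names are hypothetical.

```python
def outer(x):
    """Per-vector autocorrelation matrix R_i = x x^T."""
    return [[a * b for b in x] for a in x]

def mean_autocorrelation(vectors):
    """R = (1/M) * sum_i R_i, the centroid of the R_i."""
    m, dim = len(vectors), len(vectors[0])
    r = [[0.0] * dim for _ in range(dim)]
    for x in vectors:
        ri = outer(x)
        for k in range(dim):
            for l in range(dim):
                r[k][l] += ri[k][l] / m
    return r

def mean_square_error(r, vectors):
    """E = (1/M) * sum_i sum_k sum_l (R(k,l) - R_i(k,l))^2."""
    total = 0.0
    for x in vectors:
        ri = outer(x)
        total += sum((r[k][l] - ri[k][l]) ** 2
                     for k in range(len(r)) for l in range(len(r)))
    return total / len(vectors)

vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # made-up example data
r_mean = mean_autocorrelation(vectors)
e_at_mean = mean_square_error(r_mean, vectors)

# Any perturbation of R should increase E.
r_shifted = [row[:] for row in r_mean]
r_shifted[0][1] += 0.1
e_shifted = mean_square_error(r_shifted, vectors)
```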

Principal component analysis can be carried out by applying the KL transform to the feature parameter vectors $X_i$ using the transformation obtained from this matrix R.

Reducing the number of parameters after applying the KL transform to the L feature parameters $x_i$ (i = 1 to L) corresponds to projecting the feature parameter vector X onto the N-dimensional principal component vector space whose coordinate axes are the first to N-th principal component vectors. Projecting onto the principal component vector space thus reduces the number of original feature parameters while keeping the error caused by the reduction to a minimum.
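The relation between projection, reconstruction, and the resulting error can be illustrated with a toy case. The sketch assumes an orthonormal two-axis basis chosen by hand (not learned from data) and keeps only the first axis (N = 1); the names are hypothetical.

```python
import math

def klt_project(x, axes):
    """Coordinates of x on each retained axis (rows of the truncated matrix A)."""
    return [sum(a * xi for a, xi in zip(axis, x)) for axis in axes]

def klt_reconstruct(y, axes):
    """Inverse transform A^T y using only the retained axes."""
    dim = len(axes[0])
    return [sum(y[j] * axes[j][i] for j in range(len(axes))) for i in range(dim)]

s = 1.0 / math.sqrt(2.0)
axes_all = [[s, s], [s, -s]]       # assumed orthonormal basis, L = 2
x = [3.0, 1.0]

y_full = klt_project(x, axes_all)                 # all L coefficients
y_kept = klt_project(x, axes_all[:1])             # keep N = 1 coefficient
x_hat = klt_reconstruct(y_kept, axes_all[:1])     # approximately [2.0, 2.0]

error_energy = sum((a - b) ** 2 for a, b in zip(x, x_hat))
discarded_energy = sum(c * c for c in y_full[1:])
```

The squared reconstruction error equals the energy of the discarded coefficient, which is why discarding the axes with the smallest expected energy loses the least information.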

FIG. 4 is a block diagram of a speech cell-forming apparatus to which an embodiment of the silence/speech discriminator of this invention, constructed on this principle, is applied. The input speech signal is supplied to a speech encoder 41 and a noise encoder 42, where it is encoded.

The encoders 41 and 42 operate at different coding rates, the speech encoder 41 having the higher rate. The coding system of the encoders 41 and 42 is ADPCM (Adaptive Differential Pulse Code Modulation) coding. One of the encoder outputs is supplied through a selector 45 to a cell generating circuit 46, where the speech signal is formed into cells. The selector 45 is switched according to the decision of the silence/speech discriminator 43, so that only the speech sections of the speech signal are formed into cells. In addition, a noise detector 44 detects only the silent sections, and in the silent sections the background noise is encoded by the noise encoder 42. The noise is encoded and transmitted to make the transmitted speech sound natural; because its bit rate is low, transmitting it does not greatly reduce line utilization. The selector 45 is therefore switched to the noise encoder 42 side when the silence/speech discriminator 43 detects silence.

Variations are also possible. For example, the noise (silent frames) alone may be transmitted to the receiving side at an early stage (for example, when the communication line is first connected) and not transmitted afterwards; the receiving side then reproduces this noise continuously, and the noise is sent again when the transmitting side detects that it has changed. Alternatively, only the speech may be transmitted and the noise not transmitted at all, disregarding the resulting unnaturalness. Even when the noise is not transmitted, white noise can be inserted on the receiving side.

FIG. 5 is a detailed block diagram of the silence/speech discriminator 43 of the first embodiment of this invention, used in the speech cell-forming apparatus of FIG. 4. Here the cepstrum is used as the feature parameter of the speech signal. An LPC cepstrum detector 51 is therefore connected to the input terminal, and the cepstrum $c_i$ (i = 1, 2, ..., L) of the speech signal is calculated for each frame of fixed duration. The number of parameters L is the analysis order, which is 16 in this embodiment; because the invention reduces the number of feature parameters after principal component analysis, L may be larger. A method of calculating the cepstrum is described in Alan V. Oppenheim & Ronald W. Schafer, "Digital Signal Processing", Prentice-Hall Inc., 1975. Let C be the vector whose elements are the cepstrum coefficients $c_1, c_2, \ldots, c_L$. The LPC cepstrum vector C thus obtained is input to an inner product calculator 53. A speech principal component vector memory 52 is connected to the inner product calculator 53 and stores the first to third principal component vectors $V_1, V_2, V_3$ obtained by principal component analysis of the LPC cepstra of the speech components of speech collected in the environment in which the detection device is used. The elements of the principal component vector $V_i$ (i = 1 to 3) are denoted $v_{ij}$ (j = 1, 2, ..., L).

FIG. 6 is a flowchart of the procedure for obtaining the speech principal component vectors stored in the speech principal component vector memory 52. In step #1, training speech data produced in the environment in which the device is used is collected in advance. In step #2, only the speech (non-silent) data is extracted from all the collected data. In step #3, the LPC cepstrum of the speech data is calculated. In step #4, principal component analysis is performed on the LPC cepstrum; specifically, the autocorrelation matrix of the LPC cepstrum is calculated. In step #5, the eigenvalues of that matrix are calculated, and in step #6 the eigenvectors are taken, in descending order of the absolute value of their eigenvalues, as the first to N-th principal component vectors. Here N = 3. This yields the speech principal component vectors $V_1, V_2, V_3$.
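Steps #4 to #6 of this procedure can be sketched as follows. This is a minimal pure-Python illustration, not the patent's implementation: the function names are hypothetical, the eigenvectors are found by power iteration with deflation (a technique chosen here for brevity; the source does not say how the eigenvalues are computed), and the input is assumed to be a list of already-computed cepstrum vectors for frames labeled as speech.

```python
import math

def autocorrelation_matrix(frames):
    """Step #4: average of the outer products x x^T (no mean subtraction)."""
    dim = len(frames[0])
    r = [[0.0] * dim for _ in range(dim)]
    for x in frames:
        for k in range(dim):
            for l in range(dim):
                r[k][l] += x[k] * x[l] / len(frames)
    return r

def dominant_eigenpair(r, iterations=500):
    """Step #5 (one eigenpair): power iteration on a symmetric matrix."""
    dim = len(r)
    # Asymmetric start vector; a start orthogonal to the target would fail.
    v = [float(k + 1) for k in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    v = [c / norm for c in v]
    for _ in range(iterations):
        w = [sum(r[k][l] * v[l] for l in range(dim)) for k in range(dim)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            break
        v = [c / norm for c in w]
    rv = [sum(r[k][l] * v[l] for l in range(dim)) for k in range(dim)]
    eigenvalue = sum(a * b for a, b in zip(v, rv))   # Rayleigh quotient
    return eigenvalue, v

def principal_components(frames, count):
    """Step #6: eigenvectors in descending eigenvalue order, via deflation."""
    r = autocorrelation_matrix(frames)
    components = []
    for _ in range(count):
        eigenvalue, v = dominant_eigenpair(r)
        components.append((eigenvalue, v))
        # Remove the component just found so the next one becomes dominant.
        r = [[r[k][l] - eigenvalue * v[k] * v[l] for l in range(len(v))]
             for k in range(len(v))]
    return components
```

With 16-dimensional cepstrum vectors, `principal_components(frames, 3)` would return the three (eigenvalue, vector) pairs that play the role of $V_1$ to $V_3$. A production system would more likely use a library eigen-solver, since power iteration converges slowly when eigenvalues are close together.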

Returning to FIG. 5, the inner product calculator 53 computes the inner products of the cepstrum vector C and the principal component vectors $V_i$, as in the following equation, to find the projected point Q of the LPC cepstrum vector $C = (c_1, c_2, \ldots, c_L)^{t}$ in the three-dimensional space whose coordinate axes are the first to third principal component vectors $V_1, V_2, V_3$.

$$q_i = V_i^{t} C = \sum_{j=1}^{L} v_{ij}\, c_j \qquad (i = 1, 2, 3)$$

Here, $q_i$ is the component of the projected point Q along the coordinate axis $V_i$.

Also connected is a speech region parameter memory 55, which stores the parameters defining the speech region in the principal component vector space. If the speech region is a rectangular parallelepiped as shown in FIG. 7, the region-defining parameters are the lower and upper limits on each coordinate axis: $v_{1L}, v_{1H}, v_{2L}, v_{2H}, v_{3L}, v_{3H}$. These parameters are also determined in advance by statistical processing of the LPC cepstra of speech components and silent components (including noise) collected in the environment in which the device is used.

The discriminator 56 decides whether the frame is silent or voiced according to whether the projection point Q lies inside the rectangular parallelepiped voiced region shown in FIG. 7. That is, the frame is judged voiced only when v1L ≤ q1 ≤ v1H, v2L ≤ q2 ≤ v2H, and v3L ≤ q3 ≤ v3H; otherwise it is judged silent. This decision procedure is shown as a flowchart in FIG. 8.
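The region test amounts to one interval check per axis. A minimal sketch, with hypothetical names and with the lower and upper limits passed as two sequences:

```python
def is_voiced(q, lower, upper):
    """True when every coordinate of Q lies within [lower, upper] on its axis."""
    return all(lo <= qi <= hi for qi, lo, hi in zip(q, lower, upper))
```

For example, `is_voiced([0.2, -0.1, 0.4], [0.0, -0.5, 0.0], [1.0, 0.5, 1.0])` is true, while moving any one coordinate outside its interval makes the frame silent.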

In this embodiment the decision depends on whether the projection point in the voiced principal component vector space falls inside the voiced region, but it can also be based on the distance between the centroid of the voiced region and the projection point Q. Let the coordinates of the centroid G of the voiced region be (g1, g2, g3). The distance D given by the following equation is compared with a predetermined threshold Th, and the frame is judged voiced if D ≤ Th and silent if D > Th.

$$D = \sum_{i=1}^{3} A_i (q_i - g_i)^2 \qquad (9)$$

where Ai is a weighting coefficient.
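The distance-based variant can be sketched as below, assuming equation (9) is the weighted sum of squared coordinate differences between Q and the centroid G. The function names are hypothetical.

```python
def weighted_distance(q, centroid, weights):
    """D = sum of Ai * (qi - gi)^2 over the coordinate axes."""
    return sum(a * (qi - gi) ** 2 for a, qi, gi in zip(weights, q, centroid))

def is_voiced_by_distance(q, centroid, weights, threshold):
    """Voiced when D <= Th, silent when D > Th."""
    return weighted_distance(q, centroid, weights) <= threshold
```

With all weights equal to 1 the accepted set is a ball around G; unequal weights stretch it into an axis-aligned ellipsoid.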

According to the silent/voiced discriminator of the first embodiment described above, the L feature parameters are projected into the space spanned by the first to M-th voiced principal component vectors, and the frame is judged voiced or not according to whether the projection point lies inside the voiced region. The number of feature parameters actually used for the decision is therefore reduced, which reduces the amount of computation and simplifies the detection circuit. Moreover, because the projection is onto the space of principal component vectors, the loss of discrimination accuracy caused by reducing the number of parameters is kept to a minimum. In addition, because the decision is region-based, silent/voiced decisions remain accurate even when the voiced or silent region occupies an unusual region of the principal component vector space. For example, with Ai = 1 (i = 1, 2, 3) in equation (9), the region where D ≤ Th is the interior of a sphere. With a distance-based decision, the shape of the voiced region is thus fixed by the definition of the distance and cannot be set freely, whereas with a region-based decision any shape can be set.

Note that although the feature parameters are projected onto the space of voiced principal component vectors, a space defined by silent principal component vectors may be used instead. Likewise, although a voiced region is used for the region decision, a silent region may be used. The LPC cepstrum is used as the feature parameter, but the spectrum used in the conventional example, signal power, zero-crossing count, linear prediction coefficients, autocorrelation function, DFT coefficients, and the like, or combinations of these, may also be used. Furthermore, specific values such as the number of feature parameters and the dimensionality of the principal component vector space can be set freely.

FIG. 9 is a block diagram of the silent/voiced discriminator according to a second embodiment of the invention. An LPC cepstrum calculator 62 is connected to the input terminal and, as in the first embodiment, calculates the cepstrum ci (i = 1, 2, ..., L) of the input speech signal for each frame. The cepstrum is input to inner product calculators 63 and 64, to which a voiced principal component vector memory 65 and a silent principal component vector memory 66 are respectively connected. The inner product calculators 63 and 64, the voiced principal component vector memory 65, and the silent principal component vector memory 66 constitute a feature parameter projection circuit 67. That is, in the second embodiment the feature parameters (cepstrum) are projected onto both the voiced principal component vector space and the silent principal component vector space. As in the first embodiment, the voiced principal component vector memory 65 stores the first to third principal component vectors obtained in advance by principal component analysis of the LPC cepstrum of the voiced portions of speech uttered in the environment where the detector is used. The silent principal component vector memory 66 stores the first to third principal component vectors obtained in advance by principal component analysis of the LPC cepstrum of the silent portions of speech uttered in that environment. Like the inner product calculator 53 of the first embodiment, the inner product calculators 63 and 64 obtain the projection point Q of the LPC cepstrum vector in the three-dimensional spaces whose coordinate axes are the first to third voiced and silent principal component vectors, respectively.

The outputs of the inner product calculators 63 and 64 are supplied to a voiced detector 68 and a silence detector 69, respectively. Connected to the detectors 68 and 69 are, respectively, a voiced region parameter memory 70, which stores parameters defining the voiced region in the voiced principal component vector space, and a silent region parameter memory 71, which stores parameters defining the silent region in the silent principal component vector space. The voiced detector 68 compares the coordinates of the projection point Q with these parameters and outputs a "1"-level detection signal when Q lies inside the voiced region. Similarly, the silence detector 69 outputs a "1"-level detection signal when Q lies inside the silent region. The outputs of both detectors 68 and 69 are supplied to a silent/voiced discriminator 72, which makes its decision as follows.

Voiced detector 68 output "1", silence detector 69 output "0": voiced.
Voiced detector 68 output "0", silence detector 69 output "1": silent.
Voiced detector 68 output "1", silence detector 69 output "1": voiced.
Voiced detector 68 output "0", silence detector 69 output "0": voiced.

That is, the frame is judged silent only when the voiced detector 68 and the silence detector 69 both indicate silence (the voiced detector outputs "0" and the silence detector outputs "1"); in all other cases it is judged voiced. This decision flow is shown in FIG. 10.
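The combination rule of the discriminator 72 reduces to a single condition. A minimal sketch, where the function name and the boolean encoding of the "1"/"0" detector levels are assumptions:

```python
def classify_frame(voiced_detector_hit, silence_detector_hit):
    """Silent only when the voiced detector misses and the silence detector fires."""
    if not voiced_detector_hit and silence_detector_hit:
        return "silent"
    return "voiced"
```

The rule is deliberately biased toward "voiced": a frame that neither detector recognises is still passed as speech, which trades some extra noise for fewer dropped speech frames.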

According to the second embodiment, the decision is based on the positions of two projection points, one in the voiced principal component vector space and one in the silent principal component vector space. Consequently, even when the input speech has an LPC cepstrum pattern different from that of the voiced speech collected in advance to obtain the voiced principal component vectors, so that the voiced detector 68 reports silence, the frame is still ultimately judged voiced unless the silence detector 69 also judges it silent. Missed detections of voiced speech are thereby prevented.

The second embodiment can be modified in the same ways as the first. Furthermore, the decision region in the space of voiced principal component vectors may be a silent region, and the decision region in the space of silent principal component vectors may be a voiced region.

In the first and second embodiments, silent/voiced discrimination depends on whether the projection point in the voiced/silent principal component vector space lies in the voiced/silent region. Because voiced speech comprises many categories, however, some voiced sounds may not be identifiable by the first and second embodiments.

For example, although we speak simply of "voiced" sound, its feature parameters differ between vowels and consonants, between male and female voices, and from phoneme to phoneme even among consonants. Accuracy therefore improves if principal component vectors are obtained for each of several categories, a voiced/silent decision is made for each category, and the final decision is based on those results.

The more categories there are, the higher the accuracy, but using as many as in speech recognition makes the device large, which is a problem. The feature parameters of the voiced portions are therefore classified into a predetermined number of categories, and an autocorrelation matrix representing each category is obtained. Specifically, the method known as the LBG algorithm is used.

A large number of feature parameter vectors Xi (i = 1 to M) of the voiced portions of speech collected in the environment where the device is used are obtained, and the autocorrelation matrix Ri of each Xi is calculated according to equation (4). The vector whose elements are the row vectors of the matrix Ri is taken as the training vector Ti, and the LBG algorithm is applied to obtain a predetermined number of representative vectors Aj (j = 1 to N) and the partition P(Aj). The feature parameter vector Xi that produced an autocorrelation matrix Ri belonging to the partition P(Aj) is made a member of the j-th category, and the matrix Rj formed from the elements of the representative vector is taken as the representative autocorrelation matrix of the j-th category.

The LBG algorithm is described in Y. Linde, A. Buzo, and R. M. Gray, "An Algorithm for Vector Quantizer Design," IEEE Trans. Commun., vol. COM-28, no. 1, pp. 84-95 (January 1980).

By the above method, the autocorrelation matrices of the feature parameter vectors are classified into a predetermined number of categories, and a representative autocorrelation matrix is obtained for each category. Because the LBG algorithm is used, the mean squared error incurred when the M autocorrelation matrices Ri (i = 1, 2, ..., M) are represented, or approximated, by the N (< M) representative autocorrelation matrices Rj (j = 1, 2, ..., N) is minimized.

Next, the principal component vectors of each category are obtained by principal component analysis of that category's representative autocorrelation matrix Rj. In addition, for each category, projecting the feature parameter vectors belonging to the category onto the principal component vector space whose coordinate axes are its principal component vectors makes it possible to define, in that space, a discrimination region for deciding whether an input belongs to the category.

The voiced/silent decision is made by projecting the feature parameter vector onto the principal component vector space of each category and comparing each projection point with the discrimination region defined in advance in that space.

A block diagram of a third embodiment that makes this kind of decision is shown in FIG. 11. As in the first and second embodiments, a cepstrum calculator 83 is connected to the input terminal and calculates the cepstrum ci (i = 1, 2, ..., L) as the feature parameter for each frame. The cepstrum is input to feature parameter projection circuits 84a to 84j, one per category. Here the number of categories is 10. Each feature parameter projection circuit 84 performs the principal component analysis for its category.

As an example, the feature parameter projection circuit 84a is shown in FIG. 12. The projection circuit 84a consists of a vector memory 92, which stores the principal component vectors of category #1, and an inner product calculator 94, which calculates the inner product of the category #1 principal component vectors and the feature parameter vector. In this way, the projection circuits 84a to 84j project the feature vector onto the principal component vector spaces of categories #1 to #10, respectively, and obtain the projection points.

The outputs of the projection circuits 84a to 84j are supplied to detection circuits 86a to 86j, respectively. Each of the detection circuits 86a to 86j detects whether the frame is voiced or silent from the coordinates of its projection point.

As an example, the detection circuit 86a is shown in FIG. 13. The detection circuit 86a consists of a category #1 region parameter memory 102, which stores parameters defining the voiced region in the principal component vector space of category #1, and a voiced detector 104.

If the principal component vector space is three-dimensional, this voiced region is also represented by a rectangular parallelepiped like that shown in FIG. 7, as described for the first embodiment, and its parameters are the lower and upper limits on each coordinate axis. The detector 104 outputs a "1"-level detection signal when the projection point Q lies inside the voiced region and a "0"-level detection signal otherwise. The outputs of the detection circuits 86a to 86j are supplied to a silent/voiced discriminator 88. The discriminator 88 judges the frame voiced when the output of at least one of the detection circuits 86a to 86j is at the "1" level. This decision procedure is shown in FIG. 14.
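The per-category detection and the "at least one detector fires" rule can be sketched together. The names are hypothetical, and each category's region is assumed to be given as a pair of lower and upper limit sequences:

```python
def in_box(q, lower, upper):
    """True when Q lies inside the axis-aligned region of one category."""
    return all(lo <= qi <= hi for qi, lo, hi in zip(q, lower, upper))

def classify_frame_multi(projections, regions):
    """projections[k] is Q in category k's space; regions[k] is (lower, upper)."""
    hits = [in_box(q, lower, upper)
            for q, (lower, upper) in zip(projections, regions)]
    return "voiced" if any(hits) else "silent"
```

Because the outputs are combined with a logical OR, adding a category can only enlarge the set of frames accepted as voiced.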

Next, the method of classifying the feature parameters into multiple categories, and of obtaining each category's principal component vectors and its reference region for silent/voiced decisions in the principal component vector space, is described.

First, a large number of LPC cepstrum vectors Ci (i = 1, 2, ..., M) of the voiced portions of speech collected in advance in the environment where the device is used are obtained, and the autocorrelation matrix Ri of each Ci is calculated according to the following equation. The p²-dimensional vector whose elements are the row vectors of the matrix Ri is taken as the training vector Ti.

$$T_i = \left(c_{i1}c_{i1},\ c_{i1}c_{i2},\ \ldots,\ c_{ip}c_{ip}\right)^{\top} \qquad (10)$$

Here ci1, ci2, ..., cip are the elements of the LPC cepstrum vector Ci. The LBG algorithm is applied to the training vectors Ti to obtain N representative vectors Yj (j = 1, 2, ..., N) and the partition P(A) as follows.

Step 1: Initialization. The number of representative vectors N, the threshold ε on the distortion (the mean squared error between a representative vector and the vectors it represents), the initial set of representative vectors A0, and the training vectors Ti (i = 1, 2, ..., M) are given; m is set to 0 and $D_{-1}$ is set to a large value.

Step 2: Computation of the minimum average distortion. For the given set of representative vectors Am = {Yj : j = 1, 2, ..., N}, the partition P(Am) = {Si : i = 1, 2, ..., N} that gives the minimum average distortion is determined from the training vectors. That is, every training vector Tk assigned to the cell Si satisfies d(Tk, Yi) ≤ d(Tk, Yj) for j = 1, 2, ..., N. Here d(Ti, Yi) is the distortion between Ti and Yi, which can be defined as the squared error as follows.

$$d(T_i, Y_i) = \sum_{k} \left(T_i(k) - Y_i(k)\right)^2$$

where Ti(k) and Yi(k) are the elements of the vectors Ti and Yi.

Next, the minimum average distortion for the partition P(Am) is calculated by the following equation.

$$D_m = D\left(\{A_m, P(A_m)\}\right)$$

In this embodiment, N = 10, ε = 0.01, and M = 10000.

The 10 partition cells Si (i = 1 to 10) obtained by the above processing become the 10 categories, and the LPC cepstrum vector Cj that produced a training vector Tj belonging to Si is a member of the i-th category. The matrix Ri obtained by rearranging the elements of the representative vector Yi is the representative autocorrelation matrix of the i-th category.

Step 3: Convergence check. If $(D_{m-1} - D_m)/D_m < \varepsilon$, processing stops and Am is taken as the final set of representative vectors.

Step 4: Iteration. The set of representative vectors obtained from the current partition is taken as $A_{m+1}$, m is set to m + 1, and processing returns to step 2.
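Steps 1 to 4 can be sketched as the following loop. It is an illustration under assumptions rather than the patent's implementation: the training vector is built as the flattened outer product of a cepstrum vector with itself, the representative vectors of step 4 are taken to be the centroids of their cells (as in the cited LBG paper), and the initial codebook is supplied by the caller. All function names are hypothetical.

```python
def training_vector(cepstrum):
    """Flatten the outer product c c^T row by row into a p*p-element vector."""
    return [a * b for a in cepstrum for b in cepstrum]

def squared_error(t, y):
    """Distortion d(T, Y) defined as the squared error between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(t, y))

def lbg(training, initial_codebook, epsilon=0.01, max_iterations=100):
    """Refine a codebook; returns (codebook, cell index of each training vector)."""
    codebook = [list(y) for y in initial_codebook]  # do not modify the caller's list
    previous = float("inf")  # step 1: the "large value" for the prior distortion
    assignments = []
    for _ in range(max_iterations):
        # Step 2: nearest-representative partition and its average distortion
        assignments = [min(range(len(codebook)),
                           key=lambda j: squared_error(t, codebook[j]))
                       for t in training]
        distortion = sum(squared_error(t, codebook[j])
                         for t, j in zip(training, assignments)) / len(training)
        # Step 3: stop when the relative improvement falls below epsilon
        if distortion == 0.0 or (previous - distortion) / distortion < epsilon:
            break
        previous = distortion
        # Step 4: move each representative to the centroid of its cell
        for j in range(len(codebook)):
            members = [t for t, a in zip(training, assignments) if a == j]
            if members:  # an empty cell keeps its old representative
                codebook[j] = [sum(col) / len(members) for col in zip(*members)]
    return codebook, assignments
```

The returned assignments give each training vector's category, and each codebook entry can be reshaped into a p-by-p matrix to serve as that category's representative autocorrelation matrix.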

The principal component vectors of each category can be obtained in advance by principal component analysis of its representative autocorrelation matrix. The parameters defining each category's region in its principal component vector space can be obtained in advance by projecting the LPC cepstrum vectors belonging to that category onto the category's principal component vector space.

As described above, the third embodiment, in addition to providing the effects of the first and second embodiments, improves detection accuracy, because the feature parameters are projected onto the principal component vector space of each voiced category and the per-category decisions are combined into the final decision.

Moreover, because the LBG algorithm is used for classifying the categories and obtaining the principal component vectors of each category, the autocorrelation matrices of the M feature parameter vectors can be classified into a smaller number of categories while accuracy is kept high. The third embodiment can also be modified in the same ways as the embodiments described above.

In the embodiments described above, when the projection point of the feature parameters lies outside the decision region in the principal component vector space, the opposite decision to that for a point inside the region is made immediately. Decision accuracy improves further if, in such cases, the decision is made again using another criterion. FIG. 15 is a block diagram of a fourth embodiment of the silent/voiced discriminator according to the invention, based on this principle. As in the embodiments described above, the speech signal is input to an LPC cepstrum calculator 122 [...]. An inner product calculator 124 and a voiced principal component vector memory 126 constitute a feature parameter projection circuit 128. That is, in this embodiment the feature parameters are projected into the voiced principal component vector space, and the inner product calculator 124 obtains the coordinates of the projection point Q.

The voiced principal component vector memory 126 stores the first to third principal component vectors obtained by principal component analysis of the LPC cepstrum of the voiced portions of speech collected in advance in the environment where the detector is used.

Also connected are a voiced region parameter memory 132, which stores parameters defining the voiced region, and a silent region parameter memory 134, which stores parameters defining the silent region. The detection circuit 130 judges the frame voiced when the projection point Q lies inside the voiced region, silent when Q lies inside the silent region, and indeterminate when Q lies in neither region.

[...] is output and is also stored in the discrimination result memory 140. The memory 140 stores the discrimination results of at least the past three frames. The output of the memory 140 is supplied to a conditional probability table 138. The table 138 stores the probability that the current frame is silent or voiced given the discrimination results of the past three frames, that is, the conditional probability. When the discrimination result of the current frame is Di and those of the past three frames are Di-1, Di-2, and Di-3, the conditional probability P is expressed as

$$P(D_i \mid D_{i-1}, D_{i-2}, D_{i-3}) = \frac{P(D_i, D_{i-1}, D_{i-2}, D_{i-3})}{P(D_{i-1}, D_{i-2}, D_{i-3})}$$

where Di is 1 if the i-th frame is voiced and 0 if it is silent. P(Di, Di-1, Di-2, Di-3) and P(Di-1, Di-2, Di-3) are obtained in advance by probability calculation on runs of four and three consecutive frames, respectively, taken from speech that was collected in the environment where the device is used and labeled frame by frame as voiced or silent by visual inspection of the waveform, spectrum, and so on.

The conditional probability that the current frame is voiced and the conditional probability that it is silent are supplied to a silent/voiced discriminator 136. When the detection result of the detection circuit 130 is indeterminate, the discriminator 136 compares the two probabilities and judges the frame silent or voiced according to whichever is higher. This decision flow is shown in FIG. 16.
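The second-stage decision can be sketched as a lookup table estimated from a labeled 0/1 frame sequence. This is a minimal illustration, assuming relative-frequency estimates of the conditional probability; the function names, the fallback value for unseen histories, and the tie rule (equal probabilities count as voiced) are assumptions not stated in the text.

```python
from collections import Counter

def build_table(labels, history=3):
    """Estimate P(current frame voiced | previous `history` labels) by counting."""
    joint = Counter()    # occurrences of (past labels..., current label)
    context = Counter()  # occurrences of (past labels...) followed by a frame
    for i in range(history, len(labels)):
        past = tuple(labels[i - history:i])
        context[past] += 1
        joint[past + (labels[i],)] += 1
    return {past: joint[past + (1,)] / n for past, n in context.items()}

def resolve(region_result, past, table, default=0.5):
    """Keep a definite region decision; otherwise pick the more probable label."""
    if region_result != "indeterminate":
        return region_result
    p_voiced = table.get(tuple(past), default)
    return "voiced" if p_voiced >= 0.5 else "silent"
```

Here `past` lists the three previous decisions, oldest first, with 1 for voiced and 0 for silent, matching the definition of Di.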

As described above, according to the fourth embodiment, when the region-based decision in the principal component vector space does not give a positive result, a negative result is not drawn immediately; instead, a second-stage decision is made from conditional probabilities obtained from training data. This two-stage decision in effect uses knowledge about speech, for example that a change of voiced, silent, voiced, silent in succession is very rare. Misjudgments of low-power voiced consonants, unvoiced consonants, and the like are reduced, so that fewer word beginnings and endings are dropped and less noise is added.

This embodiment can also be modified in the same ways as the embodiments described above.

The embodiments described above decide silent/voiced solely from the position of the projection point of the feature parameters, but accuracy improves further if the decision is based on the temporal change of the projection point. FIG. 17 shows a block diagram of a fifth embodiment of the silent/voiced discriminator according to the invention, based on this principle. An LPC cepstrum calculator 152 is connected to the input terminal and, as in the embodiments described above, calculates the cepstrum ci (i = 1, 2, ..., L) for each frame. The cepstrum is input to an inner product calculator 154. A voiced principal component vector memory 156 is connected to the inner product calculator 154, and the two together constitute a feature parameter projection circuit 158. The voiced principal component vector memory 156 stores the first to third principal components obtained in advance by principal component analysis of the LPC cepstrum of the voiced portions of speech uttered in the environment where the detector is used. The inner product calculator 154 obtains the projection point Q of the LPC cepstrum vector in the three-dimensional space whose coordinate axes are the first to third voiced principal component vectors.

[...] a change vector representing the change is extracted. Examples of the filter 160 are shown in FIGS. 18 and 19. FIG. 19 is a block diagram of a second-order FIR filter. Letting the projection point vector in the principal component vector space for the n-th frame be Q(n) = (q1(n), q2(n), q3(n)), the filter 160 filters Q(n) to obtain the change vector ΔQ(n) = (Δq1(n), Δq2(n), Δq3(n)). For the filter of FIG. 18, Δqi(n) is expressed as follows.

... (16)

Here, aj (j = 1 to p) are the filter coefficients and p is the order of the filter. σi is a coefficient for normalizing the variance of the filter output and is given by the standard deviation of qi(n) as follows.

... (17)

For the filter of FIG. 19, on the other hand, Δqi(n) is expressed as follows.

$$\Delta q_i(n) = \frac{1}{\sigma_i}\left(x(n) + b_1 x(n-1) + b_2 x(n-2)\right) \qquad (18)$$

$$x(n) = q_i(n) + a_1 x(n-1) + a_2 x(n-2) \qquad (19)$$

The transfer functions H1(z) and H2(z) of the filters of FIGS. 18 and 19 are respectively expressed as follows.

...... (20)
...... (21)

The filter coefficients aj and bj are set in advance so that the transfer functions H1(z) and H2(z) have a high-pass characteristic, but they may also be changed adaptively according to the signal power.
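As a rough illustration of the FIG. 19 filter, the following Python sketch assumes that equations (18) and (19) read x(n) = q(n) + a1·x(n−1) + a2·x(n−2) and Δq(n) = (x(n) + b1·x(n−1) + b2·x(n−2)) / σ. That reading, the function name `change_sequence`, the zero initial state, and the coefficient values are all assumptions made for illustration, not values given in the text.

```python
from typing import List, Sequence


def change_sequence(q: Sequence[float], a1: float, a2: float,
                    b1: float, b2: float, sigma: float) -> List[float]:
    """Filter one coordinate of the projection point, frame by frame."""
    x1 = x2 = 0.0          # x(n-1) and x(n-2), assumed zero before frame 0
    out = []
    for qn in q:
        x = qn + a1 * x1 + a2 * x2                    # recursive part
        out.append((x + b1 * x1 + b2 * x2) / sigma)   # normalized output
        x2, x1 = x1, x
    return out


# With a1 = a2 = b2 = 0 and b1 = -1 the filter is a plain first difference,
# the simplest high-pass choice.
print(change_sequence([1.0, 1.0, 3.0, 3.0], 0.0, 0.0, -1.0, 0.0, 1.0))
# [1.0, 0.0, 2.0, 0.0]
```

The same function would be applied independently to q1(n), q2(n), and q3(n) to build the change vector for each frame.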

The output of the filter 160 is supplied to matching circuits 162a to 162j. Each matching circuit 162 calculates a similarity expressed by the Euclidean distance given by the equation below.

As an example, the matching circuit 162a is shown in FIG. 20.

The matching circuit 162a consists of a reference (standard) pattern table 180a and a similarity calculation circuit 182a, and the similarity calculation circuit 182a performs the following calculation.

$$s_m = \sum_{i=1}^{3} \bigl(\Delta q_i - r_i^{(m)}\bigr)^2 \qquad (22)$$

Here, the m-th standard pattern R^(m) = (r1^(m), r2^(m), r3^(m)) is stored in advance in the standard pattern table 180a.

Note that various other known similarity measures may be used as the similarity.
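The Euclidean-distance similarity computed in each matching circuit can be sketched as follows. This is a minimal example assuming the squared-distance form s = Σ (Δq_i − r_i)²; the function name `similarity` and the sample vectors are hypothetical.

```python
from typing import Sequence


def similarity(dq: Sequence[float], r: Sequence[float]) -> float:
    """Squared Euclidean distance between a change vector and a pattern."""
    if len(dq) != len(r):
        raise ValueError("dimension mismatch")
    return sum((d - ri) ** 2 for d, ri in zip(dq, r))


# Hypothetical change vector compared with two hypothetical patterns.
dq = [1.0, 2.0, 2.0]
print(similarity(dq, [0.0, 0.0, 0.0]))  # 9.0
print(similarity(dq, [1.0, 2.0, 0.0]))  # 4.0
```

Note that with a distance-based measure a smaller value means a closer match.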

This standard pattern is obtained in advance by the procedure shown in FIG. 21.

First, in step #41, speech data from portions that can be regarded as voiced sections of speech uttered in the usage environment of the device are collected as training data. In step #42, the LPC cepstrum of the training data is obtained frame by frame. In step #43, the cepstrum is projected onto the voiced principal component vector space. In step #44, a plurality of change vectors representing the temporal change of the projection point are extracted. In step #45, their centroid is obtained and used as the standard pattern. A plurality of matching circuits 162 are provided so that the similarity between the change vector of the projection point and the standard patterns of many kinds of feature parameters can be calculated, which raises the determination accuracy.
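Step #45 above (taking the centroid of the extracted change vectors) can be sketched in a few lines of Python. The function name `centroid` and the two sample change vectors are hypothetical; real training data would contain many vectors per pattern.

```python
from typing import List, Sequence


def centroid(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    if not vectors:
        raise ValueError("no change vectors supplied")
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Two hypothetical 3-dimensional change vectors from training speech.
standard_pattern = centroid([[1.0, 0.0, 2.0],
                             [3.0, 2.0, 0.0]])
print(standard_pattern)  # [2.0, 1.0, 1.0]
```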

The outputs of the matching circuits 162a to 162j are supplied to a silence/speech discriminator 164. The discriminator 164 compares the minimum of the ten similarities with a predetermined threshold, determines that the frame is voiced if the minimum similarity is greater than or equal to the threshold, and determines that it is silent otherwise. The procedure for this determination is shown in FIG. 22.
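The decision rule of the discriminator can be written as a short Python sketch. It follows the rule exactly as stated in the text (voiced when the minimum similarity is at or above the threshold); the function name `classify` and the numeric values are hypothetical.

```python
from typing import Sequence


def classify(similarities: Sequence[float], threshold: float) -> str:
    """Apply the stated rule: minimum similarity >= threshold means voiced."""
    return "voiced" if min(similarities) >= threshold else "silent"


# Hypothetical similarity values from the matching circuits.
print(classify([9.0, 4.0, 6.5], 5.0))  # silent
print(classify([9.0, 6.0, 6.5], 5.0))  # voiced
```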

As described above, according to the fifth embodiment, the silence/speech determination is made on the basis of the temporal change of the projection point, which has the effect of preventing false detection caused by background noise. Noise may shift the position of the projection point of the feature parameters, but the temporal change of the projection point is a relative quantity and is therefore less affected by noise. The fifth embodiment can also be modified in the same way as the embodiments described above.

FIG. 23 is a block diagram of the sixth embodiment. The output of an LPC cepstrum calculator 202 is supplied to a feature parameter projection circuit 208 consisting of an inner product calculator 204 and a voiced principal component vector memory 206. The voiced principal component vector memory 206 stores the first to third principal component vectors obtained in advance by principal component analysis of the LPC cepstrum of the voiced part of speech uttered in the usage environment of this detection device. The inner product calculator 204 determines the projection point Q of the LPC cepstrum vector in the three-dimensional space whose coordinate axes are the first to third principal component vectors of the voiced part.

The output of the inner product calculator 204 is supplied to an FIR filter 210 and to a plurality of detection circuits 212a to 212j. Each detection circuit 212 determines whether or not the projection point lies within the voiced region of its category, outputs a "1"-level signal if it lies within the region, and outputs a "0"-level signal otherwise. An example of the detection circuit 212a is shown in FIG. 24. It consists of a voiced region parameter memory 224 for the category and a voiced detector 226.

The outputs of the detection circuits 212a to 212j are supplied to a provisional detector 214. The provisional detector 214 provisionally determines that the frame is voiced if the output of any one of the detection circuits is at the "1" level, and provisionally determines that it is silent otherwise.

The output of the provisional detector 214 is supplied to a mismatch detector 216, which determines whether the provisional determination result of the previous frame matches that of the current frame. If they do not match, it outputs a "1"-level signal FF; otherwise it outputs a "0"-level signal FF.
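The provisional determination and the mismatch flag FF described above can be sketched as follows. The function names `provisional` and `mismatch` are hypothetical, and the detection circuit outputs are supplied directly as 0/1 values because the region test itself is not specified here.

```python
from typing import Sequence


def provisional(detector_outputs: Sequence[int]) -> int:
    """Return 1 (voiced) if any detection circuit outputs 1, else 0."""
    return 1 if any(detector_outputs) else 0


def mismatch(previous: int, current: int) -> int:
    """Return FF = 1 if the two provisional results differ, else 0."""
    return 1 if previous != current else 0


# Hypothetical outputs of ten detection circuits for two successive frames.
prev = provisional([0] * 10)
cur = provisional([0, 0, 1, 0, 0, 0, 0, 0, 0, 0])
print(prev, cur, mismatch(prev, cur))  # 0 1 1
```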

Meanwhile, the output of the FIR filter 210 is supplied to a change detector 218. Using the per-frame change vector ΔQ = (Δq1, Δq2, Δq3) of the projection point Q output from the filter 210, the change detector 218 calculates the change amount Δ given by the following equation. If Δ is greater than or equal to a predetermined threshold, it outputs a "1"-level signal CF indicating that a change has occurred; if Δ is below the threshold, it outputs a "0"-level signal CF indicating no change.

$$\Delta = W_1 \Delta q_1^2 + W_2 \Delta q_2^2 + W_3 \Delta q_3^2 \qquad (23)$$

Here, W1, W2, and W3 are linear weighting coefficients; the eigenvalues of the autocorrelation matrix of the feature parameters of the voiced part are used as Wi.
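The weighted change amount and the flag CF can be illustrated with the sketch below, which assumes equation (23) has the form Δ = W1·Δq1² + W2·Δq2² + W3·Δq3². The function names and the weight, vector, and threshold values are hypothetical; in the described device the weights would be eigenvalues of the autocorrelation matrix of the voiced feature parameters.

```python
from typing import Sequence


def change_amount(dq: Sequence[float], w: Sequence[float]) -> float:
    """Weighted sum of squared change-vector components."""
    if len(dq) != len(w):
        raise ValueError("dimension mismatch")
    return sum(wi * d * d for wi, d in zip(w, dq))


def change_flag(dq: Sequence[float], w: Sequence[float],
                threshold: float) -> int:
    """Return CF = 1 when the change amount reaches the threshold, else 0."""
    return 1 if change_amount(dq, w) >= threshold else 0


weights = [4.0, 2.0, 1.0]                         # hypothetical eigenvalues
print(change_amount([1.0, 2.0, 0.0], weights))    # 12.0
print(change_flag([1.0, 2.0, 0.0], weights, 10.0))  # 1
```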

The output of the provisional detector 214, the output FF of the mismatch detector 216, and the output CF of the change detector 218 are supplied to a silence/speech discriminator 220. The discriminator 220 first compares the output FF of the mismatch detector 216 with the output CF of the change detector 218 and, if the two match, outputs the output of the provisional detector 214 as the final determination result. If the two do not match, the determination is made as follows.

If FF = 1 and CF = 0, the provisional detection result of the previous frame is output as the final determination result, and the provisional detection result of the current frame is rewritten to match the result of the previous frame.

If FF = 0 and CF = 1, the provisional detection result of the current frame is inverted and then output as the final determination result. The flow of this determination is shown in FIG. 25.
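The final determination rules can be collected into one small Python function. This is a sketch under stated assumptions: results are encoded as 1 (voiced) and 0 (silent), the function name `final_decision` is hypothetical, and in the FF = 0, CF = 1 case the stored provisional result is left unchanged because the text does not say whether the inverted value is carried forward.

```python
from typing import Tuple


def final_decision(prev_prov: int, cur_prov: int, cf: int) -> Tuple[int, int]:
    """Return (final result, provisional result to keep for the next frame)."""
    ff = 1 if prev_prov != cur_prov else 0
    if ff == cf:
        # Flags agree: the provisional result stands.
        return cur_prov, cur_prov
    if ff == 1 and cf == 0:
        # Provisional result flipped without any measured change:
        # keep the previous result and rewrite the current one.
        return prev_prov, prev_prov
    # ff == 0 and cf == 1: a change was measured but not reflected,
    # so the current provisional result is inverted for the output.
    return 1 - cur_prov, cur_prov


print(final_decision(0, 1, 1))  # (1, 1)
print(final_decision(0, 1, 0))  # (0, 0)
print(final_decision(1, 1, 1))  # (0, 1)
```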

As described above, according to the sixth embodiment, the silence/speech determination combines not only the position of the projection point of the feature parameters on the principal component vector space but also the amount of temporal change of the projection point. This reduces erroneous determinations caused by background noise and improves the determination accuracy.

As described above, according to this invention, a large number of feature parameters are calculated from the speech signal and projected into a predetermined principal component vector space (whose dimensionality is smaller than the number of parameters), and whether the speech signal is silent or voiced is discriminated based on the position of the projection point. By exploiting the statistical properties of the feature parameters, the number of parameters used for discrimination is reduced while the loss of discrimination accuracy that accompanies the reduction is kept to a minimum. A further advantage is that the number of feature parameters and their classification need not be optimized when the feature parameters are determined.

[Brief explanation of drawings]

FIG. 1 is a block diagram of a conventional silence/speech discrimination device.
FIG. 2 is a diagram showing spectral shapes for explaining the operation of the conventional example.
FIG. 3 is a diagram outlining the silence/speech discrimination according to this invention.
FIG. 4 is a block diagram of a speech cell assembly device to which the first embodiment of the silence/speech discrimination device according to this invention is applied.
FIG. 5 is a block diagram of the first embodiment of the silence/speech discrimination device according to this invention.
FIG. 6 is a flowchart showing the procedure for obtaining the data stored in the voiced principal component vector memory of the first embodiment.
FIG. 7 is a diagram showing the voiced region of the voiced principal component vector space, which is the criterion for silence/speech discrimination in the first embodiment.
FIG. 8 is a flowchart showing the silence/speech discrimination operation of the first embodiment.
FIG. 9 is a block diagram of the second embodiment of the silence/speech discrimination device according to this invention.
FIG. 10 is a flowchart showing the silence/speech discrimination operation of the second embodiment.
FIG. 11 is a block diagram of the third embodiment of the silence/speech discrimination device according to this invention.
FIG. 12 is a block diagram of the feature parameter projection circuit of the third embodiment.
FIG. 13 is a block diagram of the detection circuit of the third embodiment.
FIG. 14 is a flowchart showing the silence/speech discrimination operation of the third embodiment.
FIG. 15 is a block diagram of the fourth embodiment of the silence/speech discrimination device according to this invention.
FIG. 16 is a flowchart showing the silence/speech discrimination operation of the fourth embodiment.
FIG. 17 is a block diagram of the fifth embodiment of the silence/speech discrimination device according to this invention.
FIG. 18 is a block diagram of a first example of the FIR filter of the fifth embodiment.
FIG. 19 is a block diagram of a second example of the FIR filter of the fifth embodiment.
FIG. 20 is a block diagram of the matching circuit of the fifth embodiment.
FIG. 21 is a flowchart showing the procedure for obtaining the standard pattern stored in the standard pattern memory of the matching circuit of the fifth embodiment.
FIG. 22 is a flowchart showing the silence/speech discrimination operation of the fifth embodiment.
FIG. 23 is a block diagram of the sixth embodiment of the silence/speech discrimination device according to this invention.
FIG. 24 is a block diagram of the matching circuit of the sixth embodiment.
FIG. 25 is a flowchart showing the silence/speech discrimination operation of the sixth embodiment.

Agent: Patent attorney 則：丘憲ｊコ; same, 松山光之

Claims

[Claims]

(1) A voice detection system characterized in that principal component vectors of a voiced part and of a silent part are calculated in advance from speech data collected under a predetermined environment, the feature parameters of a frame to be detected are projected onto the principal component vector space of the voiced part or onto the principal component vector space of the silent part, and the voiced/silent determination is made based on the position of the projection point and its amount of temporal change.

(2) A voice detection system characterized in that principal component vectors of a voiced part and of a silent part are calculated in advance from speech data collected under a predetermined environment, the feature parameters of a frame to be detected are projected onto the principal component vector space of the voiced part or onto the principal component vector space of the silent part, and the voiced/silent determination is made based on the amount of temporal change of the projection point.