JP2599974B2

JP2599974B2 - Voice detection method

Info

Publication number: JP2599974B2
Application number: JP63227546A
Authority: JP
Inventors: 雅幸海野; 正志宮川; 恒彦小池
Original assignee: NTT Advanced Technology Corp; Sekisui Chemical Co Ltd
Current assignee: NTT Advanced Technology Corp; Sekisui Chemical Co Ltd
Priority date: 1988-09-13
Filing date: 1988-09-13
Publication date: 1997-04-16
Anticipated expiration: 2012-04-16
Also published as: JPH0277098A

Description

[Detailed Description of the Invention]

[Industrial Field of Application] The present invention relates to a voice detection method.

[Prior Art] Many methods have conventionally been used to detect the presence of speech in a noisy environment. They are used, for example, to detect speech intervals in communication, as described in Japanese Examined Patent Publication No. 57-12999, or as preprocessing for recognizing the linguistic content of speech. However, these methods are difficult to extend to general use under high noise. For example, a hands-free telephone could not be made to begin answering in response to a voice while its ringing tone was sounding.

As a simple method of detecting the presence of speech in a noisy environment, there has also been a method that counts the number of times the input signal crosses a reference axis within a fixed time interval.
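The crossing count used by this conventional method can be sketched as below. This is a minimal Python illustration under stated assumptions, not the procedure of any cited publication: the function name is hypothetical, and samples lying exactly on the reference level are assumed to be skipped rather than counted as crossings.

```python
from typing import Sequence


def count_reference_crossings(frame: Sequence[float], reference: float = 0.0) -> int:
    """Count sign changes of (sample - reference) within one analysis frame.

    Samples exactly equal to `reference` are skipped (an assumption), so a
    signal that only touches the reference level is not counted.
    """
    crossings = 0
    previous_side = 0  # +1 above the reference, -1 below, 0 = none seen yet
    for sample in frame:
        if sample == reference:
            continue
        side = 1 if sample > reference else -1
        if previous_side and side != previous_side:
            crossings += 1
        previous_side = side
    return crossings
```

For example, `count_reference_crossings([1, -1, 1, -1])` returns 3, and with `reference=0.5` the sequence `[0, 1, 0]` gives 2.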

[Problems to be Solved by the Invention] However, the conventional voice detection methods described above generally assume that the noise amplitude is smaller than the speech amplitude. When the noise amplitude is comparable to the speech amplitude, they cannot detect the presence of speech.

The object of the present invention is to detect the presence of speech in a noisy environment simply, even when the noise amplitude is large and strongly affects speech detection.

[Means for Solving the Problems] In the invention according to claims 1 and 2, the number of reference-axis crossings of an input signal and a value relating to the amplitude distribution of its waveform are calculated as feature parameters, and the calculated result is compared with dictionary data for voiced sound and a specific noise to determine whether the input signal contains voiced sound.

In the invention according to claim 1, a peak value P expressed, for example, by the following equation is used as the value relating to the amplitude distribution of the waveform:

$$P = 20\log_{10}\left(\frac{V_P}{V_{\mathrm{rms}}}\right)$$

where V_P is the maximum absolute value of the amplitude within a fixed time interval, and V_rms is the RMS (effective) value of the amplitude within the same interval.

In the invention according to claim 2, a peak value P expressed, for example, by the following equation is used as the value relating to the amplitude distribution of the waveform:

$$P = 20\log_{10}\left(\frac{V_P}{V_a}\right)$$

where V_P is the maximum absolute value of the amplitude within a fixed time interval, and V_a is the mean of the absolute value of the amplitude within the same interval.

[Operation] In the invention according to claims 1 and 2, speech in a noisy environment is detected as follows. In the present invention, voiced sound is regarded as speech. Voiced sound is sound accompanied by vibration of the vocal folds, such as vowels, semivowels, and nasals, and almost all speech uttered by humans contains voiced sound.
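Both peak values can be computed per frame as follows. The sketch assumes a frame is a non-empty sequence of samples and rejects an all-zero frame, for which the ratio is undefined; the function names are hypothetical.

```python
import math
from typing import Sequence


def peak_value_rms(frame: Sequence[float]) -> float:
    """P = 20*log10(V_P / V_rms): peak-to-RMS ratio in dB (claim 1 form)."""
    v_p = max(abs(x) for x in frame)
    if v_p == 0:
        raise ValueError("all-zero frame: peak value is undefined")
    v_rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20.0 * math.log10(v_p / v_rms)


def peak_value_mean_abs(frame: Sequence[float]) -> float:
    """P = 20*log10(V_P / V_a): peak-to-mean-absolute ratio in dB (claim 2 form).

    Needs no squaring or square root, unlike V_rms.
    """
    v_p = max(abs(x) for x in frame)
    if v_p == 0:
        raise ValueError("all-zero frame: peak value is undefined")
    v_a = sum(abs(x) for x in frame) / len(frame)
    return 20.0 * math.log10(v_p / v_a)
```

A full-cycle sinusoid gives approximately 3.01 dB with the first form (V_P/V_rms = √2) and 3.92 dB with the second (V_P/V_a = π/2). A pulse-like frame such as `[2, 0, 0, 0]` gives about 6.02 dB and 12.04 dB, which illustrates how a sharply peaked waveform raises P.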

(1) For voiced sound and a specific noise, prepare dictionary data whose feature parameters are the number of reference-axis crossings within a fixed time interval (the number of times the signal crosses a predetermined reference level, such as the zero level) and a value relating to the amplitude distribution of the waveform within the same interval.

The dictionary data may be, for example, the following (a), (b), and (c).

(a) A set of feature parameters for voiced sounds obtained from many utterances.

(b) A large number of feature-parameter sets obtained for a specific noise (for example, the ringing tone of a specific telephone).

(c) Feature-parameter sets obtained, for many utterances, from voiced sound with the specific noise added at a specific ratio.

The data (a), (b), and (c) can be prepared in various forms: numerical data obtained by converting the acoustic data into feature parameters; statistical data, such as means and variances, obtained by statistically processing the numerical data; or discriminant data, such as boundary equations, determined from the statistical data.
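As one illustration of the statistical form, the feature-parameter sets of a category can be reduced to a per-axis mean and standard deviation, the quantities written μ and σ in the embodiment. This minimal sketch assumes the population (not sample) standard deviation and a plain dictionary layout; both are choices made here for illustration only.

```python
import statistics
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]  # (X1, X2): crossing count and peak value


def summarize_category(points: Sequence[Point]) -> Dict[str, Point]:
    """Reduce one category's (X1, X2) feature sets to mean and std per axis."""
    x1 = [p[0] for p in points]
    x2 = [p[1] for p in points]
    return {
        "mean": (statistics.fmean(x1), statistics.fmean(x2)),
        "std": (statistics.pstdev(x1), statistics.pstdev(x2)),
    }
```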

(2) Sample the input signal and calculate, as feature parameters, the number of reference-axis crossings and the value relating to the amplitude distribution of the waveform within a fixed time interval of the input signal.

(3) Compare the feature parameters calculated in (2) with the standard patterns defined by the dictionary data prepared in (1) in the parameter space, and determine by pattern recognition whether the input signal contains voiced sound.

This pattern recognition using the dictionary data is performed, for example, as follows.

First, construct a parameter space divided into two parts: the category "voiced sound" defined by the dictionary data (the voiced sound of (a), or the voiced sound with the specific noise of (c) added at a specific ratio) and the category "other". Then determine which category the feature parameters of the input signal belong to.

Second, because the specific noise may have a large amplitude that strongly affects the detection of voiced sound, additionally define a boundary between the category "specific noise" and the category "voiced sound", and determine which of these categories the feature parameters of the input signal belong to.

From these two determinations, the input signal is judged to contain voiced sound on condition that it belongs to the category "voiced sound" in the first determination and does not belong to the category "specific noise" in the second.

Thus, in the invention according to claims 1 and 2, two feature parameters are used, the number of reference-axis crossings and the value relating to the amplitude distribution of the waveform, so the category "voiced sound" and the category "specific noise" can be clearly separated in the parameter space. Therefore, even when the specific noise has a large amplitude and strongly affects speech detection, the presence of speech in a noisy environment can be detected simply and with a high detection rate.

According to the invention of claim 1, the peak value described above (based on the RMS value) is used as the value relating to the amplitude distribution of the waveform. This parameter faithfully reflects the sharp waveform characteristic of voiced sound, which improves discrimination against noise.

According to the invention of claim 2, the peak value described above (based on the mean absolute value) is used as the value relating to the amplitude distribution of the waveform. This requires less computation than the invention of claim 1, while the parameter still reflects the sharp waveform characteristic of voiced sound relatively faithfully, which improves discrimination against noise. Less computation also means a faster response.

[Embodiment] FIG. 1 is a block diagram showing an example of a voice detection device used to carry out the present invention, and FIG. 2 is a schematic diagram showing the parameter space formed by the feature parameters of the present invention.

In FIG. 1, 11 is a microphone, 12 an amplifier, 13 a low-pass filter, 14 an A/D converter, 15 a parameter calculation unit, 16 a dictionary data storage unit, 17 a determination unit, and 18 a result output unit. In this embodiment, speech in a noisy environment is detected as follows.

(1) For voiced sound and the specific noise, prepare dictionary data whose feature parameters are the number of reference-axis crossings X₁ within 20 ms of the signal and the value X₂ relating to the amplitude distribution of the waveform, and store it in the dictionary data storage unit 16.

Either of the following two peak values can be used as the value X₂ relating to the amplitude distribution of the waveform.

The peak value P based on the RMS value:

$$P = 20\log_{10}\left(\frac{V_P}{V_{\mathrm{rms}}}\right)$$

where V_P is the maximum absolute value of the amplitude within a fixed time interval, and V_rms is the RMS (effective) value of the amplitude within the same interval.

The peak value P based on the mean absolute value:

$$P = 20\log_{10}\left(\frac{V_P}{V_a}\right)$$

where V_P is the maximum absolute value of the amplitude within a fixed time interval, and V_a is the mean of the absolute value of the amplitude within the same interval.

Using the RMS-based peak value gives a parameter that faithfully reflects the sharp waveform characteristic of voiced sound, which improves discrimination against noise.

Using the peak value based on the mean absolute value requires less computation than the RMS-based peak value, while the parameter still reflects the sharp waveform characteristic of voiced sound relatively faithfully, which likewise improves discrimination against noise.

As the dictionary data, for example, the following (a), (b), and (c) are created.

(a) A set of feature parameters for the voiced sound [a] obtained from many utterances.

(b) A large number of feature-parameter sets obtained for the specific noise (the ringing tone of a specific telephone).

(c) Feature-parameter sets obtained, for many utterances, from the voiced sound [a] with the specific noise added at voiced-sound-to-specific-noise ratios of 3, 0, −3, −6, and −10 dB, where the ratio is defined as

$$20\log_{10}\left(\frac{S_{\mathrm{rms}}}{N_{\mathrm{rms}}}\right)\ \mathrm{[dB]}$$

Here S_rms is the RMS value of the amplitude of the voiced sound [a], and N_rms is the RMS value of the amplitude of the specific noise.
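Preparing data (c) requires scaling the noise to a target ratio before adding it. The following is a minimal sketch, assuming both signals are equal-length sample sequences and that S_rms and N_rms are measured over the whole segment (the text does not state the span); the function names are hypothetical.

```python
import math
from typing import List, Sequence


def rms(signal: Sequence[float]) -> float:
    """Root-mean-square value of a non-empty signal."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_ratio(voiced: Sequence[float], noise: Sequence[float], ratio_db: float) -> List[float]:
    """Return voiced + g*noise, with gain g chosen so that
    20*log10(S_rms / (g*N_rms)) equals ratio_db."""
    if len(voiced) != len(noise):
        raise ValueError("voiced and noise segments must have equal length")
    target_noise_rms = rms(voiced) / 10.0 ** (ratio_db / 20.0)
    gain = target_noise_rms / rms(noise)
    return [s + gain * n for s, n in zip(voiced, noise)]
```

The five mixtures of the embodiment would then be `[mix_at_ratio(v, n, r) for r in (3, 0, -3, -6, -10)]` for hypothetical segments `v` and `n`.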

(2) The input signal is picked up by the microphone 11, amplified by the amplifier 12, and passed through the low-pass filter 13 to remove components above 4.2 kHz. The A/D converter 14 then converts it into a digital signal with a sampling frequency of 10 kHz and 16-bit resolution, which is sent to the parameter calculation unit 15. The parameter calculation unit 15 calculates, as feature parameters, the number of reference-axis crossings X₁ within 20 ms of the input signal and the value X₂ relating to the amplitude distribution of the waveform.
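The per-frame computation in the parameter calculation unit 15 might look like the following. The 10 kHz sampling frequency and 20 ms interval come from the embodiment and give 200 samples per frame. Non-overlapping frames, the zero reference level, the RMS form of X₂, and the function name are assumptions of this sketch.

```python
import math
from typing import Iterator, Sequence, Tuple

SAMPLE_RATE_HZ = 10_000                        # A/D sampling frequency (embodiment)
FRAME_MS = 20                                  # analysis interval (embodiment)
FRAME_LEN = SAMPLE_RATE_HZ * FRAME_MS // 1000  # 200 samples per frame


def frame_features(samples: Sequence[float], reference: float = 0.0) -> Iterator[Tuple[int, float]]:
    """Yield (X1, X2) for each complete, non-overlapping 20 ms frame.

    X1: number of crossings of `reference` (samples on the level are skipped).
    X2: peak value 20*log10(V_P / V_rms) in dB, NaN for an all-zero frame.
    """
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        frame = samples[start:start + FRAME_LEN]
        sides = [1 if x > reference else -1 for x in frame if x != reference]
        x1 = sum(1 for a, b in zip(sides, sides[1:]) if a != b)
        v_p = max(abs(x) for x in frame)
        v_rms = math.sqrt(sum(x * x for x in frame) / FRAME_LEN)
        x2 = 20.0 * math.log10(v_p / v_rms) if v_rms > 0 else math.nan
        yield x1, x2
```

A trailing partial frame is dropped here; an implementation could instead pad it or carry it over to the next block of samples.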

(3) The determination unit 17 compares the feature parameters calculated in (2) with the standard patterns defined by the dictionary data prepared in (1), determines whether the input signal contains voiced sound, and outputs the result from the result output unit 18.

Here, pattern recognition using the dictionary data is performed, for example, in the parameter space of FIG. 2 as follows.

FIG. 2 plots the two feature parameters, the number of zero crossings (the reference-axis level set to zero) and the peak value, on the X₁ and X₂ axes, respectively. In FIG. 2, μ₁, σ₁₁, and σ₁₂ denote, for the dictionary parameters of voiced sound (the voiced sound [a] of (a), or the voiced sound with the specific noise of (c) added at a specific voiced-sound-to-specific-noise ratio), the mean, the standard deviation of the X₁ component, and the standard deviation of the X₂ component, respectively. μ₂, σ₂₁, and σ₂₂ denote the corresponding values for the dictionary parameters of the specific noise.

First, define a boundary 1 that divides the parameter space into the category "voiced sound" defined by the dictionary data (the voiced sound [a] of (a), or the voiced sound with the specific noise of (c) added at a specific ratio) and the category "other". The side of boundary 1 containing the mean μ₁ of the voiced-sound dictionary data is the category "voiced sound". Boundary 1 is a concentration ellipse that expresses how densely the voiced-sound dictionary data cluster around the mean; changing the lengths of its axes changes the proportion of voiced-sound dictionary data that falls inside the ellipse. In this embodiment, the axis lengths were set so that 90% of the voiced-sound dictionary data falls inside the ellipse. The broken line represents the concept of the category "voiced sound" defined by μ and σ. In this step, therefore, it is determined on which side of boundary 1 the feature parameters of the input signal lie.
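Boundary 1 can be sketched as an axis-aligned ellipse scaled so that the requested share of the voiced-sound dictionary points lies inside. This assumes the ellipse axes are parallel to X₁ and X₂ with lengths proportional to σ₁₁ and σ₁₂ (the figure defines only per-axis deviations) and picks the radius empirically from the dictionary points; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def normalized_sq_distance(point: Point, mean: Point, std: Point) -> float:
    """((x1 - mu1)/sigma1)^2 + ((x2 - mu2)/sigma2)^2 for a 2-D feature point."""
    return sum(((x - m) / s) ** 2 for x, m, s in zip(point, mean, std))


def ellipse_threshold(points: Sequence[Point], mean: Point, std: Point, coverage: float = 0.9) -> float:
    """Squared ellipse size k2 chosen so that at least `coverage` of the
    dictionary points satisfy normalized_sq_distance <= k2."""
    d2 = sorted(normalized_sq_distance(p, mean, std) for p in points)
    index = max(math.ceil(coverage * len(d2)) - 1, 0)
    return d2[index]


def inside_boundary_1(point: Point, mean: Point, std: Point, k2: float) -> bool:
    """True if `point` lies on the mu_1 side of boundary 1 (inside the ellipse)."""
    return normalized_sq_distance(point, mean, std) <= k2
```

If the dictionary points were instead modeled as an uncorrelated Gaussian, 90% coverage would correspond to k2 = −2 ln(0.1) ≈ 4.61, the 90% quantile of a chi-square distribution with two degrees of freedom.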

Next, because the specific noise may have a large amplitude that strongly affects the detection of voiced sound, additionally define a boundary 2 between the category "specific noise" and the category "voiced sound". The side of boundary 2 containing the mean μ₂ of the specific noise is the category "specific noise". Boundary 2 is the set of points at which the likelihoods of the categories "voiced sound" and "specific noise" are equal. In this embodiment, the specific noise is the artificially produced ringing tone of a telephone, so its standard deviation is generally smaller than that of the dictionary data for voiced sound mixed with the specific noise at the specified voiced-sound-to-specific-noise ratios. As a result, the category "specific noise" forms a closed region. The broken line represents the concept of the category "specific noise" defined by μ and σ. In this step, therefore, it is determined on which side of boundary 2 the feature parameters of the input signal lie.
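Boundary 2 can be sketched by modeling each category as an axis-aligned Gaussian built from its mean and per-axis standard deviations and comparing log-likelihoods; the points where the two are equal form the boundary. The Gaussian model and the function names are assumptions of this sketch. Because the noise model is narrower, points far from μ₂ fall back to the voiced side, which is consistent with the noise region being closed.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Model = Tuple[Point, Point]  # (mean, per-axis standard deviation)


def gaussian_log_likelihood(point: Point, model: Model) -> float:
    """Log density of an axis-aligned 2-D Gaussian at `point`."""
    mean, std = model
    return sum(
        -0.5 * ((x - m) / s) ** 2 - math.log(s * math.sqrt(2.0 * math.pi))
        for x, m, s in zip(point, mean, std)
    )


def on_noise_side_of_boundary_2(point: Point, voiced: Model, noise: Model) -> bool:
    """True if `point` lies on the mu_2 side: the specific-noise model is
    more likely than the voiced-sound model."""
    return gaussian_log_likelihood(point, noise) > gaussian_log_likelihood(point, voiced)
```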

From these two determinations, when the feature parameters of the input signal lie, in the feature parameter space, on the μ₁ side of boundary 1 in the first determination and not on the μ₂ side of boundary 2 in the second, the input signal is judged to belong to the category "voiced sound"; that is, voiced sound is judged to be present in the input signal.
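The combined rule can be written as a per-frame filter. In this sketch the two boundary tests are passed in as functions, a hypothetical interface chosen so that any boundary construction (concentration ellipse, equal-likelihood curve, or another method) can be plugged in.

```python
from typing import Callable, Iterable, List, Tuple

Point = Tuple[float, float]  # (X1, X2) for one 20 ms frame


def voiced_frames(
    points: Iterable[Point],
    inside_boundary_1: Callable[[Point], bool],
    on_noise_side_of_boundary_2: Callable[[Point], bool],
) -> List[bool]:
    """Flag a frame as voiced when its point lies on the mu_1 side of
    boundary 1 and not on the mu_2 side of boundary 2."""
    return [inside_boundary_1(p) and not on_noise_side_of_boundary_2(p) for p in points]
```

For instance, with placeholder tests `lambda p: p[0] < 8` and `lambda p: p[1] > 4`, the points `[(1, 1), (5, 5), (9, 9)]` give `[True, False, False]`.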

Thus, in the above embodiment, two feature parameters are used, the number of reference-axis crossings and the value relating to the amplitude distribution of the waveform, so the category "voiced sound" and the category "specific noise" can be clearly separated in the parameter space. Therefore, even when the specific noise has a large amplitude and strongly affects speech detection, the presence of speech in a noisy environment can be detected simply and with a high detection rate. In particular, the above embodiment gave a high voiced-sound detection rate even at a voiced-sound-to-specific-noise ratio of −6 dB, and a detection rate close to 100% at −3 dB.

In the above embodiment, a concentration ellipse and the set of points at which the likelihoods of the two categories are equal were used as the boundaries defining the standard patterns in the feature parameter space. Other common pattern recognition techniques can, of course, be used to carry out the present invention. For example, instead of the set of points at which the likelihoods of the categories "voiced sound" and "specific noise" are equal, the set of points at which the Mahalanobis distances or the Euclidean distances to the two categories are equal can be used.
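The equal-distance alternatives can be sketched as a nearest-category test. Here the Mahalanobis distance is computed with a diagonal covariance built from the per-axis standard deviations, an assumption since the text gives no covariance terms; `math.dist` (Python 3.8+) gives the Euclidean distance. The function names are hypothetical.

```python
import math
from typing import Tuple

Point = Tuple[float, float]
Model = Tuple[Point, Point]  # (mean, per-axis standard deviation)


def diagonal_mahalanobis(point: Point, model: Model) -> float:
    """Mahalanobis distance assuming uncorrelated axes."""
    mean, std = model
    return math.sqrt(sum(((x - m) / s) ** 2 for x, m, s in zip(point, mean, std)))


def nearer_to_noise(point: Point, voiced: Model, noise: Model, metric: str = "mahalanobis") -> bool:
    """True if `point` is closer to the specific-noise category than to the
    voiced-sound category under the chosen metric."""
    if metric == "mahalanobis":
        return diagonal_mahalanobis(point, noise) < diagonal_mahalanobis(point, voiced)
    if metric == "euclidean":
        return math.dist(point, noise[0]) < math.dist(point, voiced[0])
    raise ValueError(f"unknown metric: {metric!r}")
```

With a wide voiced model `((0, 0), (5, 5))` and a narrow noise model `((10, 0), (0.5, 0.5))`, the point `(8, 0)` is nearer to the noise mean in Euclidean terms but nearer to the voiced category in Mahalanobis terms, so the choice of metric moves the boundary.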

[Effects of the Invention] As described above, according to the present invention, the presence of speech in a noisy environment can be detected simply, even when the noise amplitude is large and strongly affects speech detection.

[Brief description of the drawings]

FIG. 1 is a block diagram showing an example of a voice detection device used to carry out the present invention, and FIG. 2 is a schematic diagram showing the parameter space formed by the feature parameters of the present invention.

11: microphone, 15: parameter calculation unit, 16: dictionary data storage unit, 17: determination unit, 18: result output unit.

Continuation of the front page
(51) Int.Cl.⁶: G10L 9/18, identification symbol 301; FI: G10L 9/18 301A
(56) References cited: JP-A-58-7196; JP-A-60-200300; JP-A-61-292199; JP-B-38-11568; JP-T-63-500339

Claims

(57) [Claims]

1. A voice detection method in which the number of reference-axis crossings of an input signal and a value relating to the amplitude distribution of its waveform are calculated as feature parameters, and the calculated result is compared with dictionary data for voiced sound and a specific noise to determine whether the input signal contains voiced sound, wherein the value relating to the amplitude distribution of the waveform is a peak value expressed by the ratio of the maximum absolute value of the amplitude within a fixed time interval to the RMS (effective) value of the amplitude within the same interval.

2. A voice detection method in which the number of reference-axis crossings of an input signal and a value relating to the amplitude distribution of its waveform are calculated as feature parameters, and the calculated result is compared with dictionary data for voiced sound and a specific noise to determine whether the input signal contains voiced sound, wherein the value relating to the amplitude distribution of the waveform is a peak value expressed by the ratio of the maximum absolute value of the amplitude within a fixed time interval to the mean of the absolute value of the amplitude within the same interval.