JPS60101598A

JPS60101598A - Voice section detector

Info

Publication number: JPS60101598A
Application number: JP58208669A
Authority: JP
Inventors: 中谷　奉文
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1983-11-07
Filing date: 1983-11-07
Publication date: 1985-06-05

Abstract

(57) [Abstract] This publication contains application data that predates electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

Technical field: The present invention relates to a speech section detection device and, more particularly, to a word-onset detection device for stably extracting speech sections in a speech recognition device.

Prior art: In a speech recognition device, when the target speech has a good input signal-to-noise ratio, extracting the section in which speech is present is relatively easy.

However, in the environments where a speech recognition device is actually used, various kinds of noise are present, and speech is input superimposed on that noise. Because the noise changes from moment to moment, a method that extracts the speech section with a fixed threshold has difficulty extracting it stably, which is one cause of misrecognition. As a remedy, various methods of making the threshold variable have been proposed, but all of them have the drawback of being complicated and expensive.

Purpose: The present invention was made in view of the circumstances described above. In particular, its purpose is to provide a speech section detection device that can extract speech sections stably regardless of whether the ambient noise level is high or low, and can thereby secure a stable recognition rate.

Configuration: The configuration of the present invention is described below on the basis of embodiments.

FIG. 1 is a speech signal waveform diagram for explaining the operating principle of the present invention. FIG. 1(a) is a waveform diagram showing, for example, the change in power of the input signal; point A is the rough word onset detected with the predetermined threshold TH1 shown in the figure. The threshold TH1 is generally set to a value larger than the maximum value of the noise level. FIG. 1(b) is a waveform diagram of the signal of FIG. 1(a) delayed by a time Δt, so the point corresponding to point A is A′. The threshold TH2 is a value set in the vicinity of the time at which point A is detected; on the basis of this value, a more accurate rising point B is determined, and the correct speech section is determined. Consequently, the device only needs to begin acquiring the signal and detecting the section from the time at which point A is detected.
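
The two-threshold principle of FIG. 1 can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the patent does not say how TH2 is derived, so the code assumes it is a multiple (`margin`) of the mean power of the oldest frames in the Δt look-back window, and assumes that B is found by walking backward from A while the power stays above TH2. The names `detect_onset`, `delay_frames`, `noise_frames`, and `margin` are hypothetical.

```python
from typing import List, Optional, Tuple


def detect_onset(power: List[float], th1: float, delay_frames: int,
                 noise_frames: int = 3,
                 margin: float = 2.0) -> Optional[Tuple[int, int, float]]:
    """Return (A, B, TH2) as frame indices and a threshold, or None if no onset."""
    # Stage 1: rough onset A = first frame whose power exceeds the fixed TH1.
    a = next((i for i, p in enumerate(power) if p > th1), None)
    if a is None:
        return None

    # The delayed signal still holds the frames from (A - delay) up to A.
    start = max(0, a - delay_frames)
    window = power[start:a + 1]

    # Assumption: the oldest frames in the window contain noise only, and
    # TH2 is set to `margin` times their mean power.
    noise = window[:noise_frames]
    th2 = margin * sum(noise) / len(noise)

    # Stage 2: walk backward from A while the power stays above TH2;
    # the earliest such frame is taken as the refined rising point B.
    b = a
    while b > start and power[b - 1] > th2:
        b -= 1
    return a, b, th2


envelope = [1, 1, 1, 1, 1, 3, 6, 12, 20, 20]
print(detect_onset(envelope, th1=10, delay_frames=6))  # (7, 5, 2.0)
```

In this example the fixed threshold fires at frame 7, while the look-back over the delayed frames moves the onset to frame 5, where the power first rises above the locally derived TH2.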

As described above, the present invention first detects the word onset of the speech provisionally with a fixed threshold and then detects the accurate word onset in order to detect the speech section. FIG. 2 shows an electrical block diagram of one embodiment.

In FIG. 2, 1 is the input signal terminal, 2 is a delay device, 3 is a feature extraction unit, 4 is a quantization device, 5 is a speech section detector, and 6 is a preliminary word-onset detector. The input signal from input terminal 1 is applied to the delay device 2 and the preliminary word-onset detector 6. The output of the delay device 2 passes through the feature extraction unit 3, for example a group of band-pass filters (BPFs), and is converted into a digital signal by the quantization device 4, for example an A/D converter. The speech section detector 5 then detects the speech section and outputs it as a section signal at output 8. Meanwhile, the detection signal 7 from the preliminary word-onset detector 6 is applied to the quantization device 4 and the speech section detector 5, which begin quantization and section detection on the basis of this signal.

FIG. 3 is a detailed diagram of the preliminary word-onset detector 6 shown in FIG. 2. The input signal from input terminal 1 is applied to a word-onset detection parameter extractor 9, which extracts a parameter such as power or the number of zero crossings. The comparator 10 compares the extractor's output signal with the threshold TH1 set in the threshold setter 11 and thereby detects point A of FIG. 1(a).
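
As a rough software analogue of the preliminary word-onset detector in FIG. 3, the sketch below pairs a parameter extractor (block 9) with a comparator (block 10) and a preset threshold (block 11). The patent describes a circuit, so the frame-by-frame processing and the names `frame_power`, `zero_crossings`, and `preliminary_onset` are assumptions made for illustration.

```python
from typing import Callable, Optional, Sequence


def frame_power(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one frame (the 'power' parameter)."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossings(frame: Sequence[float]) -> int:
    """Number of sign changes between consecutive samples in one frame."""
    return sum(1 for prev, cur in zip(frame, frame[1:])
               if (prev < 0) != (cur < 0))


def preliminary_onset(frames: Sequence[Sequence[float]],
                      extractor: Callable[[Sequence[float]], float],
                      th1: float) -> Optional[int]:
    """Index of the first frame whose parameter exceeds TH1 (point A)."""
    for i, frame in enumerate(frames):
        # Comparator: extracted parameter against the preset threshold.
        if extractor(frame) > th1:
            return i
    return None


frames = [
    [0.1, -0.1, 0.1, -0.1],   # low-level noise
    [0.1, 0.1, -0.1, -0.1],   # low-level noise
    [2.0, -2.0, 2.0, -2.0],   # speech-like burst
]
print(preliminary_onset(frames, frame_power, th1=1.0))  # 2
```

Swapping `frame_power` for `zero_crossings` changes the parameter being compared, in which case `th1` has to be chosen on the scale of a crossing count rather than a power.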

The set threshold is a value appropriate to the chosen parameter, such as power or the number of zero crossings. The speech section detector 5 is the detection unit that detects the accurate word onset; it may use any of various already known detection means, such as detecting the rise from the frame-to-frame difference or detecting it from the power ratio between bands.
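
The patent leaves the accurate detection method open. One of the known means it names, detecting the rise from the frame-to-frame difference, might look like the following minimal sketch. The function name `rise_by_frame_difference` and the `min_rise` parameter are hypothetical, and the rule "first frame whose power increases by at least `min_rise`" is an assumed criterion, not one stated in the source.

```python
from typing import Optional, Sequence


def rise_by_frame_difference(power: Sequence[float],
                             min_rise: float) -> Optional[int]:
    """Index of the first frame whose power exceeds the previous
    frame's power by at least `min_rise`, or None if there is none."""
    for i in range(1, len(power)):
        if power[i] - power[i - 1] >= min_rise:
            return i
    return None


print(rise_by_frame_difference([1, 1, 1, 3, 6, 12], min_rise=2))  # 3
```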

FIG. 4 is a block diagram for explaining another embodiment of the present invention. In the figure, 12 is a shift register and 13 is a feature extractor; parts that act in the same way as in FIG. 2 bear the same reference numerals as in FIG. 2. In this embodiment the delay is applied not to the analog signal but to the quantized digital signal, so the delay device is implemented as a shift register. The input signal from input terminal 1 is applied to the quantization device 4 and the preliminary word-onset detector 6. The signal quantized by the quantization device 4 is delayed by Δt in the shift register 12, the feature extractor 13, for example a digital filter, extracts the feature parameters, and on the basis of these parameters the speech section detector 5 detects the accurate speech section and outputs it at output 8. Meanwhile, the detection signal 7 from the preliminary word-onset detector 6 is applied to the quantization device 4, the feature extractor 13, and the speech section detector 5, and quantization, feature extraction, and section detection are carried out on the basis of this signal.
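
The digital delay of FIG. 4 can be illustrated with a fixed-length queue standing in for the shift register 12. This is a simplified sketch, not the patent's circuit: the samples are assumed to be already quantized, the preliminary detector is reduced to an amplitude comparison against TH1, and only the downstream stages are gated. The names `ShiftRegisterDelay` and `gated_delayed_stream` are hypothetical.

```python
from collections import deque
from typing import Iterable, List


class ShiftRegisterDelay:
    """Delays a sample stream by `length` samples, like a shift register."""

    def __init__(self, length: int, fill: int = 0) -> None:
        # maxlen makes append() drop the oldest cell automatically.
        self._cells = deque([fill] * length, maxlen=length)

    def shift(self, sample: int) -> int:
        oldest = self._cells[0]      # value leaving the register
        self._cells.append(sample)   # new value enters, oldest is dropped
        return oldest


def gated_delayed_stream(samples: Iterable[int], th1: int,
                         delay: int) -> List[int]:
    """Pass delayed samples downstream only after the rough onset fires."""
    register = ShiftRegisterDelay(delay)
    triggered = False
    passed = []
    for s in samples:
        delayed = register.shift(s)
        # Preliminary detector works on the undelayed signal.
        if not triggered and abs(s) > th1:
            triggered = True
        # Downstream stages see the delayed signal once triggered.
        if triggered:
            passed.append(delayed)
    return passed


print(gated_delayed_stream([0, 1, 2, 9, 9, 9], th1=5, delay=2))  # [1, 2, 9]
```

Although the trigger fires only at the first sample above TH1, the downstream stages still receive the samples 1 and 2 that preceded it, which is what allows the accurate onset to be located earlier than point A.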

Effect: As is clear from the above description, according to the present invention the word onset of the speech is first detected roughly, and the accurate word onset and section of the speech are detected afterwards. Section detection free of false detections is therefore possible, and because the main body of the device can be restricted to operating only after the preliminary word onset has been detected, efficient operation is possible.

[Brief explanation of drawings]

FIG. 1 is a waveform diagram for explaining the operating principle of the present invention, FIG. 2 is an electrical block diagram for explaining one embodiment of the present invention, FIG. 3 is a detailed electrical circuit diagram of the preliminary word-onset detector 6 shown in FIG. 2, and FIG. 4 is an electrical block diagram for explaining another embodiment of the present invention.

2: delay device, 3: feature extraction unit, 4: quantization device, 5: speech section detector, 6: preliminary word-onset detector, 10: comparator, 11: threshold setter, 12: shift register, 13: feature extractor.

Claims

[Claims]

(1) A speech section detection device characterized by comprising detection means for provisionally detecting the word onset of a signal using a preset threshold, and speech section detection means for detecting the detailed speech section on the basis of the detection signal of said detection means.

(2) The speech section detection device according to claim (1), characterized in that a delay device that delays the signal by a fixed time is inserted before the speech section detection means.

(3) The speech section detection device according to claim (1), characterized in that the preliminary word-onset detection signal activates the quantization means and the speech section detection means.

(4) The speech section detection device according to claim (1), characterized in that a shift register that delays the signal by a fixed time is inserted before the speech section detection means.