JPS60216399A

JPS60216399A - Voice section detecting circuit for voice recognition equipment

Info

Publication number: JPS60216399A
Application number: JP59073536A
Authority: JP
Inventors: 安田　晴剛; 中谷　奉文; 河本　俊毅
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1984-04-12
Filing date: 1984-04-12
Publication date: 1985-10-29

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Technical field]

The present invention relates to a voice section detection circuit in a speech recognition device.

[Prior art]

In general, when the input speech has a good signal-to-noise ratio, extracting the section in which speech is present is relatively easy. In the environments where a speech recognition device is actually used, however, various noises are present and the speech is input superimposed on that noise. Because the noise changes from moment to moment, a method that cuts out the voice section with a fixed threshold cannot detect the section stably, and this is one cause of misrecognition. With such fixed-threshold extraction, low-power portions such as the beginning and end of a word and voiceless consonants are cut off. Under high noise, moreover, noise is attached before and after what should be the voice section and is cut out together with it.

[Object]

The present invention has been made in view of the circumstances described above. In particular, its object is to provide a voice section detection circuit that detects voice sections stably regardless of the level of the ambient steady noise and can thereby secure a stable recognition rate.

[Configuration]

The configuration of the present invention is described below on the basis of an embodiment.

The present invention is characterized as follows. Starting from a start-point signal at a preset high level that is not affected by noise, the signal is compared with past signals delayed at a fixed period, and it is detected whether the difference is at or above a certain level. If it is, a further comparison is made with the sample one step earlier. In this way the lowest threshold is found and taken as the optimum threshold, and the voice section is detected with the minimum number of delay samples.

FIG. 1 is a time chart for explaining the operating principle of the present invention. In (a), A is the speech power signal, B is the optimum threshold, and n, n-1, n-2, ... are sample points; (b) shows the clock pulses and (c) shows the voice section signal. Starting from the start-point value at the preset point n, the value is compared with the sample value at point (n-1); if the difference is at or above a set value, a further comparison is made with the sample value at point (n-2). If the difference is again at or above the set value, a comparison is made with point (n-3), and likewise with point (n-4). If, for example, the comparison between points (n-3) and (n-4) gives a difference at or below the set value, the level at point (n-3) is taken as the optimum threshold and section detection of the speech power signal is performed with it. In this way the optimum threshold is obtained together with the minimum number of delays.
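The backward search described for FIG. 1 can be sketched in software as follows. This is only an illustrative model of the principle, not the circuit itself: the function name `find_optimal_threshold`, the list-based signal, and the treatment of a difference exactly equal to the set value (counted here as "at or above") are assumptions, as is the reading that each comparison is made between adjacent sample points.

```python
from typing import Sequence, Tuple


def find_optimal_threshold(power: Sequence[float], start_index: int,
                           min_step: float) -> Tuple[int, float]:
    """Walk back from the start point n while adjacent samples still differ
    by at least `min_step`; return (index, level) where the walk stops.

    `start_index` plays the role of point n, and the returned index is the
    point whose level would be used as the optimum threshold.
    """
    k = start_index
    # Compare point k with point k-1; keep stepping back while the rise
    # between them is at or above the set value (assumed inclusive).
    while k > 0 and power[k] - power[k - 1] >= min_step:
        k -= 1
    return k, power[k]


# Hypothetical power samples: a noise floor followed by a speech onset.
power = [1.0, 1.1, 1.2, 2.0, 3.5, 5.0, 6.5]
index, threshold = find_optimal_threshold(power, start_index=6, min_step=1.0)
delay_samples = 6 - index
print(index, threshold, delay_samples)  # 3 2.0 3
```

With these made-up values the walk stops at point (n-3), mirroring the example in the text: the comparison between (n-3) and (n-4) falls below the set value, so the level at (n-3) becomes the threshold and three delay samples suffice.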

FIG. 2 shows an embodiment of the voice section detection circuit according to the present invention. In the figure, 1 is a band-pass filter (BPF), 2 is a rectifier circuit, 3 is a low-pass filter (LPF), 4n is the n-point comparator, 4n-1 is the (n-1)-point comparator, 4n-2 is the (n-2)-point comparator, 5 is a sample-and-hold circuit, 6 is a falling-edge detection circuit, 7 is a threshold sample-and-hold circuit, 8 is a section comparator, and 9 is a low-pass filter (LPF). The input speech signal passes through BPF 1, rectifier circuit 2, and LPF 3 and is converted into a speech power signal such as that shown at A in FIG. 1, which is input to the n-point comparator 4n and to the delay-block input changeover switch Sn. In the initial state the changeover switches Sn, Sn-1, ... are all ON, and the signal is delayed step by step in the sample-and-hold circuits 5 under control of the clock pulses. The n-point comparator 4n first compares the speech power signal at point n with an arbitrary level and detects the start point of the section signal. Next, the (n-1)-point comparator compares the values at points n and (n-1); if the difference is at or above a certain level, its output goes to the high level.
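As a rough software analogue of the front end in FIG. 2, the sketch below rectifies a signal, smooths it with a one-pole low-pass filter to obtain a power envelope, and then finds the first sample that reaches a start level, as the n-point comparator does. The band-pass stage is omitted (the input is assumed to be band-limited already), and the full-wave rectification, the filter form, the coefficient `alpha`, and all names are assumptions made for illustration; the source does not specify them.

```python
from typing import List, Optional, Sequence


def power_envelope(samples: Sequence[float], alpha: float = 0.1) -> List[float]:
    """Full-wave rectify and smooth with a one-pole low-pass filter."""
    envelope = []
    level = 0.0
    for x in samples:
        rectified = abs(x)                    # rectifier stage (assumed full-wave)
        level += alpha * (rectified - level)  # low-pass stage (assumed one-pole)
        envelope.append(level)
    return envelope


def first_start_point(envelope: Sequence[float],
                      start_level: float) -> Optional[int]:
    """Index of the first sample at or above `start_level`, or None."""
    for i, value in enumerate(envelope):
        if value >= start_level:
            return i
    return None


# A hypothetical unit-amplitude alternating signal standing in for speech.
tone = [1.0 if i % 2 == 0 else -1.0 for i in range(100)]
env = power_envelope(tone)
print(first_start_point(env, 0.5))  # 6
```

The start level is chosen high enough to stay clear of the noise floor, which is why the detected start point lags the true onset and the delayed samples are needed afterwards.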

FIG. 3 shows the configuration of the comparators for point (n-1) and below. Comparator 101 takes the level difference between the two input signals, and comparator 102 compares the resulting difference signal with an arbitrary reference level; if the difference is at or below the set level, the output goes to the low level. Comparators of this kind are used to compare point (n-1) with point (n-2), point (n-2) with point (n-3), and so on. The exclusive OR of each comparator output and the output of the next comparator in the chain is taken, and the sample point at which the output changes is detected. This change sample point is the sample point from which the optimum threshold is obtained.
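The comparator chain and exclusive-OR stage of FIG. 3 can be modelled as below. This is a sketch under stated assumptions rather than a description of the actual circuit: `taps[0]` is taken to hold the level at point n, `taps[1]` the level at point (n-1), and so on; a comparator output is high only when the difference exceeds the reference level (low when at or below it, as the text states); and the first change counted from point n is taken as the change sample point. All names are hypothetical.

```python
from typing import List, Optional, Sequence


def step_comparator(newer: float, older: float, level: float) -> bool:
    """Two-stage comparator: take the difference, then compare it with `level`.

    Returns True (high) when the difference exceeds `level`, and False (low)
    when it is at or below `level`.
    """
    difference = newer - older
    return difference > level


def change_point(taps: Sequence[float], level: float) -> Optional[int]:
    """Return the tap index (0 = point n) where adjacent comparator outputs differ."""
    outputs: List[bool] = [
        step_comparator(taps[i], taps[i + 1], level)
        for i in range(len(taps) - 1)
    ]
    for i in range(len(outputs) - 1):
        if outputs[i] != outputs[i + 1]:  # exclusive OR of neighbouring outputs
            return i + 1                  # the sample point shared by both comparators
    return None


# Hypothetical held levels at points n, n-1, n-2, n-3, n-4, n-5.
taps = [6.5, 5.0, 3.5, 2.0, 1.2, 1.1]
print(change_point(taps, level=1.0))  # 3
```

Here the outputs are high for the first three comparators and low afterwards, so the exclusive OR fires between the comparator for (n-2)/(n-3) and the one for (n-3)/(n-4), marking point (n-3) as the change sample point.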

At the change sample point obtained as described above, the changeover switch of the delay sample-and-hold is switched, and the signal is input to the threshold sample-and-hold and to the section comparator. In addition, the logical AND of all outputs of the exclusive-OR signals is taken and its falling edge is detected; with that pulse the value of the delayed sample is sampled and held as the optimum threshold. The section comparator then compares this threshold with the delayed speech power signal selected by the changeover switch, which by this time has a fixed delay amount, and the voice section signal is obtained. Because the delayed speech power signal takes discrete values, it can be smoothed by passing it through the LPF and then used as the delayed speech power signal. In this way the delayed speech power signal and the optimum voice section signal are obtained.
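The final stage, in which the section comparator compares the delayed speech power signal with the held threshold, might be sketched like this. The function name, the zero padding used for samples before the recording starts, and the inclusive comparison are assumptions for illustration only, and the power values are made up.

```python
from typing import List, Sequence


def section_signal(power: Sequence[float], delay: int,
                   threshold: float) -> List[bool]:
    """Compare the power signal delayed by `delay` samples with `threshold`.

    Output element i is True while the delayed signal is at or above the
    threshold, i.e. while the voice section signal would be active.
    """
    # Delay line: shift the signal right by `delay` samples, padding with 0.0.
    delayed = ([0.0] * delay + list(power))[:len(power)]
    return [value >= threshold for value in delayed]


# Hypothetical power samples: noise floor, speech onset, then decay.
power = [1.0, 1.1, 1.2, 2.0, 3.5, 5.0, 6.5, 6.0, 4.0, 1.5, 1.0]
section = section_signal(power, delay=3, threshold=2.0)
print(section.index(True))  # 6
```

With a delay of three samples and a threshold of 2.0, the section signal becomes active at index 6, the moment the undelayed signal crosses the high start level, yet the delayed signal at that moment is the low-level onset sample. The low-power beginning of the utterance is therefore kept inside the detected section.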

[Effect]

As is clear from the above description, according to the present invention the section signal can be detected with an optimum threshold, and the delay amount is automatically minimized.

[Brief description of the drawings]

FIG. 1 is a diagram for explaining the operating principle of the present invention, FIG. 2 is a diagram for explaining an embodiment of the voice section detection circuit according to the present invention, and FIG. 3 is a detailed diagram of the comparators for point (n-1) and below.

1: band-pass filter (BPF); 2: rectifier circuit; 3: low-pass filter (LPF); 4n: n-point comparator; 4n-1: (n-1)-point comparator; 4n-2: (n-2)-point comparator; 5: sample-and-hold circuit; 6: falling-edge detection circuit; 7: threshold sample-and-hold circuit; 8: section comparator; 9: low-pass filter (LPF); 101, 102: comparators.

Claims

(1) A voice section detection circuit for a speech recognition device having means for converting an input speech signal into a speech power signal, characterized in that the voice section detection unit has means for generating a section signal from a preset threshold level, an n-stage delay block, and means for comparing the threshold level with the level at (n-1), the level at (n-1) with the level at (n-2), the level at (n-2) with the level at (n-3), and so on in sequence.

(2) The voice section detection circuit according to claim (1), wherein the delay amount of the delay block is variable and is automatically minimized.

(3) A voice section detection circuit for a speech recognition device having means for converting an input speech signal into a speech power signal, characterized in that the voice section detection unit has a circuit that, from the point in time at which the speech power becomes larger than a preset threshold level, successively determines whether the level difference between that threshold level and the sample one step earlier is larger than a predetermined value and, if it is smaller, compares the level difference between the sample one step earlier and the sample two steps earlier, and thereafter repeats similar comparisons so that the optimum section detection level is detected automatically.