JPS60191300A

JPS60191300A - Voice section detecting circuit

Info

Publication number: JPS60191300A
Application number: JP59047940A
Authority: JP
Inventors: 中谷　奉公; 安田　晴剛; 河本　俊毅
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1984-03-13
Filing date: 1984-03-13
Publication date: 1985-09-28

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

[Field of the Invention] The present invention relates to a voice section detection circuit in a speech recognition device, and more particularly to stabilizing the extraction of voice sections.

[Prior Art] In a speech recognition device, extracting the section in which speech is present is relatively easy when the input speech has a good signal-to-noise ratio. In the environments where speech recognition devices are actually used, however, various kinds of noise are present, and the speech is input superimposed on that noise. Because this noise changes from moment to moment, a method that extracts the voice section with a fixed threshold has difficulty detecting voice sections stably, which contributes to misrecognition. Extraction with a fixed threshold also cuts off low-power portions of the speech, such as the beginnings and ends of words and voiceless consonants. Furthermore, under high noise, the noise before and after what should be the voice section is extracted together with it.

[Object] The present invention was made to overcome the above drawbacks of the prior art. In particular, its object is to provide a voice section detection circuit that detects voice sections stably regardless of the level of the ambient steady-state noise, and thereby maintains a stable recognition rate.

[Structure] The structure of the present invention is described below with reference to embodiments.

The present invention is based on the assumption that, when discrete syllables or words are uttered in succession, the noise level between them does not change significantly from one syllable or word to the next. Accordingly, the peak value of the preceding utterance is held and divided by the peak or average value of the noise level before the start of the next utterance, giving the S/N at that time; this S/N is then used as the threshold for detecting the start and end of the next utterance.
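As a rough illustration of this threshold rule, the Python sketch below divides the held peak of the preceding utterance by a noise estimate taken either as the peak or as the mean of the gap before the next utterance. The function and argument names and the arbitrary level units are assumptions for illustration; the patent describes analog circuits, not software.

```python
from statistics import mean


def adaptive_threshold(prev_speech_levels, noise_levels, noise_mode="peak"):
    """Hypothetical helper: S/N-based threshold following the text literally.

    prev_speech_levels: average-level samples of the preceding utterance
    noise_levels: average-level samples of the gap before the next utterance
    noise_mode: "peak" or "mean" estimate of the noise level
    """
    if not prev_speech_levels or not noise_levels:
        raise ValueError("need speech and noise samples")
    speech_peak = max(prev_speech_levels)  # held peak of the preceding utterance
    if noise_mode == "peak":
        noise_level = max(noise_levels)
    elif noise_mode == "mean":
        noise_level = mean(noise_levels)
    else:
        raise ValueError("noise_mode must be 'peak' or 'mean'")
    if noise_level <= 0:
        raise ValueError("noise level must be positive")
    # The quotient (S/N) itself is used as the threshold, as the text states.
    return speech_peak / noise_level


print(adaptive_threshold([20, 80, 50], [4, 10, 4]))          # 8.0
print(adaptive_threshold([20, 80, 50], [4, 10, 4], "mean"))  # 80 / 6
```

Because the quotient is compared directly with the average level, the result depends on the level units; the numbers above are arbitrary.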

The noise level is detected at a point delayed by a time T from the end of the preceding utterance. T is chosen on the basis that a word generally contains silent intervals of about 200 to 400 ms, such as the geminate consonant /tsu/ (sokuon), and with the intention that, if the next sound is input within this interval, the preceding and following sounds are processed as a single word (or unit).
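The grouping effect of T can be illustrated offline with a hypothetical post-processing step that treats detected segments separated by less than T as one word. This is not the patented circuit (which uses T to time the noise measurement); the function name and the 300 ms value, picked from the 200 to 400 ms range above, are assumptions.

```python
def merge_segments(segments, hangover):
    """Merge (start, end) segments separated by a gap shorter than `hangover`.

    A gap shorter than T (e.g. a sokuon pause inside a word) is treated as
    part of one word; a longer gap starts a new unit.
    """
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] < hangover:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Times in milliseconds; T = 300 ms is an illustrative value.
print(merge_segments([(0, 300), (450, 700), (1400, 1800)], hangover=300))
# -> [(0, 700), (1400, 1800)]
```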

FIG. 1 is a timing chart explaining the operating principle of the present invention. (a) shows an example of the average signal level of the input signal; T1 and T2 are the points at which the threshold TH is switched, and A, C and B, D mark the starts (A, C) and ends (B, D) of the utterances. (b) shows a peak-hold waveform that is reset at point D; the marks show where the noise level and the peak level are detected. (c) shows the other peak-hold waveform, which is similar to (b) but is reset at point B; its marks likewise show where the peak level and the noise level are detected. (d) is the voice section detection pulse obtained with the threshold of (a). (e) shows the output Q of a flip-flop that switches state at the falling edge of (d), and (f) shows its inverted output (Q-bar). The rising edges of the pulses in (e) and (f) reset the peak-hold circuits (see (b) and (c)). (g) shows waveform (d) delayed by the time T, and (h) shows the output of a flip-flop that switches state at the falling edge of (g); this signal switches between the peak-hold signals.
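The reset logic of FIG. 1(d) to (f) can be mimicked on a sampled 0/1 section pulse. The sketch below models the flip-flop as a toggle flip-flop, which is an assumption consistent with (b) and (c) being reset alternately at D and B; the function names and sampling are hypothetical.

```python
def toggle_on_falling_edge(pulse):
    """Q and Q-bar of a toggle flip-flop clocked by falling edges of a 0/1 pulse."""
    q = 0
    prev = 0
    q_out, q_bar = [], []
    for x in pulse:
        if prev == 1 and x == 0:  # falling edge: end of a voice section
            q ^= 1
        q_out.append(q)
        q_bar.append(1 - q)
        prev = x
    return q_out, q_bar


def rising_edges(seq):
    """Indices where a 0/1 sequence goes from 0 to 1 (used here as reset instants)."""
    return [i for i in range(1, len(seq)) if seq[i - 1] == 0 and seq[i] == 1]


section = [0, 1, 1, 0, 0, 1, 1, 0, 0]  # two utterances, as in FIG. 1(d)
q, q_bar = toggle_on_falling_edge(section)
print(rising_edges(q), rising_edges(q_bar))  # -> [3] [7]
```

Q rises at the first section end (B) and Q-bar at the second (D), so each peak-hold channel is reset at every other utterance end.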

FIG. 2 is an electrical block diagram of an embodiment of the present invention. In the figure, 1 is an input section, 2 a detection circuit, 3 a smoothing circuit, 4 and 5 peak-hold circuits, 6 a level comparison circuit, 7 and 8 switches, 9 a division circuit, 10 a delay circuit, 11 and 12 flip-flop circuits, and 13 an output section. The average signal level of the input signal from input section 1 is obtained through detection circuit 2 and smoothing circuit 3 and is fed to peak-hold circuits 4 and 5 and to level comparison circuit 6.
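A software stand-in for detection circuit 2 and smoothing circuit 3 might rectify each sample and pass it through a one-pole low-pass filter. The full-wave rectifier, the filter type, and the coefficient `alpha` are assumptions; the patent names only a detection circuit and a smoothing circuit.

```python
import math


def average_signal_level(samples, alpha=0.05):
    """Full-wave rectification followed by a one-pole low-pass filter."""
    level = 0.0
    out = []
    for x in samples:
        level += alpha * (abs(x) - level)  # leaky average of |x|
        out.append(level)
    return out


tone = [math.sin(2 * math.pi * n / 20) for n in range(2000)]
env = average_signal_level(tone)
print(round(env[-1], 2))  # close to 2/pi, the mean of |sin|, plus some ripple
```

A smaller `alpha` gives less ripple but a slower response to speech onsets.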

The output of each peak-hold circuit is applied to one of the input terminals of switch 7 and of switch 8; switch 7 outputs the peak-hold value and switch 8 outputs the noise level, in synchronism. From these signals, division circuit 9 computes the S/N, which determines the threshold; the threshold is applied to the reference terminal of level comparison circuit 6.

Division circuit 9 operates in conjunction with switches 7 and 8: it holds the quotient obtained when the switches change over, and thereafter repeats the reset and divide-and-hold operation at each falling edge of FIG. 1(g).

The section signal of FIG. 1(d) is output from comparator 6 and led to output terminal 13; at the same time it is applied to delay circuit 10, which has a delay time T, and to flip-flop circuit 12. From the output of delay circuit 10, flip-flop circuit 11 produces the output signal of FIG. 1(h), which controls switches 7 and 8. Meanwhile, the Q and Q-bar outputs of flip-flop circuit 12 (see FIG. 1(e) and (f)) serve as reset signals that control peak-hold circuits 4 and 5. In this way the threshold shown in FIG. 1(a) is set and the voice sections are detected (FIG. 1(a)).

The embodiment of FIG. 2 obtains the S/N from the signal peak value and the noise peak value held in the peak-hold circuits. Alternatively, the average value of the noise may be used instead of its peak value, so that the S/N is obtained from the peak value of the signal and the average value of the noise.
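The frame-by-frame Python sketch below is a simplified behavioral model of the FIG. 2 loop on an already-smoothed level sequence: comparator 6, flip-flop 12 resetting peak-hold circuits 4 and 5 alternately, delay T, and the divide-and-hold step of circuit 9 triggered T frames after each section end. Frame units, the initial threshold, and all names are assumptions, and it is not the patented circuit itself. Since the text compares the quotient directly with the level, the demo levels are arbitrary values chosen so that noise < S/N < speech.

```python
class PeakHold:
    """Peak-hold channel (like circuits 4 and 5): running maximum until reset."""

    def __init__(self):
        self.value = 0.0

    def reset(self):
        self.value = 0.0

    def update(self, level):
        self.value = max(self.value, level)


def detect_sections(levels, initial_threshold, delay):
    """Return (per-frame section flags, final threshold) for smoothed levels."""
    holds = [PeakHold(), PeakHold()]
    threshold = initial_threshold
    next_reset = 0    # toggle state of flip-flop 12
    noise_ch = None   # channel reset at the most recent section end
    switch_at = None  # frame where the delayed pulse (g) falls: end + T
    in_speech = False
    sections = []
    for n, level in enumerate(levels):
        active = level > threshold            # level comparison circuit 6
        if in_speech and not active:          # falling edge of (d): section end
            holds[next_reset].reset()         # this channel now measures noise
            noise_ch = next_reset
            next_reset ^= 1
            switch_at = n + delay
        for h in holds:
            h.update(level)
        if n == switch_at:                    # switches 7 and 8 change over
            noise = holds[noise_ch].value     # noise peak since the section end
            peak = holds[noise_ch ^ 1].value  # peak of the preceding utterance
            if noise > 0:
                threshold = peak / noise      # divide-and-hold (circuit 9)
        sections.append(active)
        in_speech = active
    return sections, threshold


levels = [2, 2, 20, 20, 2, 3, 2, 2, 16, 16, 2, 2, 2]
flags, th = detect_sections(levels, initial_threshold=5, delay=2)
print([n for n, f in enumerate(flags) if f], th)  # -> [2, 3, 8, 9] 8.0
```

After the first utterance the threshold becomes 20 / 3, and after the second it becomes 16 / 2, so each new threshold reflects the latest speech peak and the latest noise level.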

FIG. 3 shows an embodiment in which the S/N is obtained from the peak value of the signal and the average value of the noise; parts that act in the same way as in FIG. 2 carry the same reference numerals as in FIG. 2. This embodiment differs from that of FIG. 2 in that switch 8 of FIG. 2 is omitted and the average signal level is applied directly to division circuit 9. As in FIG. 2, division circuit 9 holds the quotient in conjunction with the switch. The remaining operation is the same as in FIG. 2 and is not described again. Compared with FIG. 2, this embodiment tends to yield a smaller threshold.

[Effect] As is clear from the above description, according to the present invention the noise level is detected between utterances and, on the assumption that the level of one utterance does not differ from that of the next, the peak value of the immediately preceding utterance is used to obtain the S/N for that interval. Using this S/N as the threshold for detecting the start and end of the next utterance, voice sections can be extracted in a way that follows changes in both the noise level and the speech level, so that section detection better suited to real conditions becomes possible.

[Brief explanation of drawings]

FIG. 1 is a timing chart for explaining the operation of the present invention, and FIGS. 2 and 3 are electrical block diagrams each illustrating an embodiment of the present invention.

Reference numerals: 1 ... input section, 2 ... detection circuit, 3 ... smoothing circuit, 4, 5 ... peak-hold circuits, 6 ... level comparison circuit, 7, 8 ... switches, 9 ... division circuit, 10 ... delay circuit, 11, 12 ... flip-flop circuits, 13 ... output section.

Written Amendment, April 23, 1984 (Showa 59)
Patent Application No. 47940 of 1984 (Showa 59)
2. Title of the invention: Voice section detection circuit
3. Person making the amendment
   Relationship to the case: patent applicant
   Address: 1-3-6 Nakamagome, Ota-ku, Tokyo
   Name: (674) Ricoh Co., Ltd., Representative: 浜田 広
4. Agent
   Address: シヤトレーイン横浜 No. 807, 1-2-7 Furocho, Naka-ku, Yokohama 231
7. Contents of the amendment
   (1) "9 and 10 are division circuits," on page 6, line 2 of the specification is corrected to "9 is a division circuit, and 10 is a delay circuit,".
   (2) "9, 10 ... division circuits," on page 9, line 2 of the specification is corrected to "9 ... division circuit, 10 ... delay circuit,".
   (3) FIG. 1 is corrected as shown in the attached sheet.

Claims

[Claims]

(1) A voice section detection circuit in a speech recognition device, comprising: means for detecting the average signal level of an input signal; means for detecting and holding, in parallel for each utterance unit, the peak value of the average signal level; means for switching between the peak value of the speech and the peak value of the noise; means for dividing the peak value of the signal by the peak value of the noise and holding the quotient; and means for comparing the average signal level with the quotient used as a threshold.

(2) The voice section detection circuit according to claim (1), wherein the threshold is set after a certain time delay from the end of the speech.

(3) A voice section detection circuit in a speech recognition device, comprising: means for detecting the average signal level of an input signal; means for detecting and holding, in parallel for each utterance unit, the peak value of the average signal level; means for obtaining the peak value of the speech from the two peak-value detection signals; means for dividing the peak value of the signal by the average signal level of the noise and holding the quotient; and means for comparing the average signal level with the quotient used as a threshold.

(4) The voice section detection circuit according to claim (3), wherein the threshold is set after a certain time delay from the end of the speech.