JPS60216400A

JPS60216400A - Voice section detecting circuit

Info

Publication number: JPS60216400A
Application number: JP59073538A
Authority: JP
Inventors: 河本　俊毅; 中谷　奉文; 安田　晴剛
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1984-04-12
Filing date: 1984-04-12
Publication date: 1985-10-29

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Technical Field] The present invention relates to a speech section detection circuit in a speech recognition device, and more particularly to optimizing the threshold used to cut out speech sections.

[Prior Art] In general, when the input speech has a good signal-to-noise ratio, extracting the sections in which speech is present is relatively easy. In the environments where a speech recognition device is actually used, however, the input contains various kinds of noise, and the speech arrives superimposed on that noise. Because this noise changes from moment to moment, a method that cuts out speech sections with a fixed threshold has difficulty detecting them stably, and this contributes to misrecognition. Cutting out with a fixed threshold removes low-power portions such as word onsets, word endings, and unvoiced consonants. Under high noise, moreover, noise is attached before and after what should be the speech section and is cut out along with it.

[Object] The present invention has been made in view of the circumstances described above. In particular, its object is to provide a speech section detection circuit that detects speech sections stably, and thereby ensures a stable recognition rate, regardless of the level of the ambient steady-state noise.

[Configuration] The configuration of the present invention is described below on the basis of embodiments.

A feature of the present invention is that the noise level at the moment a predetermined time t has elapsed after the end of a speech section is held and used as the threshold for speech section detection. The time t is based on the fact that words containing a geminate consonant (sokuon) have a silent interval of 200 to 400 ms; it is chosen so that, if the next sound is input within this time, the preceding and following speech are processed as a single word.
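The word-grouping consequence of t can be sketched as follows. This assumes a hypothetical 10 ms frame period, takes t = 350 ms (the example value the description gives for FIG. 1(c)), and merges any two sections separated by a silence shorter than t. The name `merge_short_gaps` and the strict `<` comparison are assumptions, not details stated in the text.

```python
from typing import List, Tuple

FRAME_MS = 10                    # hypothetical frame period
T_MS = 350                       # example value of t from the FIG. 1(c) description
T_FRAMES = T_MS // FRAME_MS      # 35 frames

def merge_short_gaps(segments: List[Tuple[int, int]],
                     max_gap_frames: int = T_FRAMES) -> List[Tuple[int, int]]:
    """Merge (start, end) sections, end exclusive, whose silent gap is shorter than max_gap_frames."""
    merged: List[Tuple[int, int]] = []
    for start, end in segments:
        if merged and start - merged[-1][1] < max_gap_frames:
            merged[-1] = (merged[-1][0], end)   # gap shorter than t: same word
        else:
            merged.append((start, end))         # gap of at least t: a new word
    return merged

# A 300 ms pause (30 frames) inside a word is bridged; a 1.2 s pause is not.
print(merge_short_gaps([(0, 20), (50, 80), (200, 230)]))  # [(0, 80), (200, 230)]
```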

FIG. 1 is a time chart explaining the operating principle of the present invention. FIG. 1(a) shows an example of the average signal level of the input signal, where A is the input speech signal and T1 and T2 are the points at which the threshold switches. FIG. 1(b) shows the speech section signal detected using the thresholds of FIG. 1(a). FIG. 1(c) shows the speech section end pulse, which is generated when a predetermined time t (for example, 350 ms) has elapsed since the falling edge of the speech section signal in FIG. 1(b), provided the section signal is low at that moment. As described later, this pulse operates the sample-and-hold circuit, and the threshold switches.

FIG. 2 is an electrical block diagram explaining one embodiment of the speech section detection circuit according to the present invention. In the figure, 1 is an input terminal, 2 is a detection circuit, 3 is a smoothing circuit, 4 is a sample-and-hold circuit, 5 is a level comparison circuit, 6 is a speech section detection circuit, and 7 is an output terminal. The average level of the input signal from input terminal 1 is detected through detection circuit 2 and smoothing circuit 3, and this average-level signal is fed to sample-and-hold circuit 4 and level comparison circuit 5. The output signal of level comparison circuit 5 is fed to speech section detection circuit 6, and the speech section end pulse generated there is fed to sample-and-hold circuit 4, which holds the signal level at that moment. Level comparison circuit 5 compares this held value with the average signal level, and the speech section signal is output at output terminal 7. In the scheme described so far, the signal level at the time the speech end pulse is generated is used directly as the next threshold. To cope with larger noise, a scheme is also conceivable in which a fixed value is added to this threshold and the sum is used as the threshold for speech section detection.

FIG. 3 shows an example of a speech section detection circuit built on this idea. In the figure, 8 is a reference voltage generation circuit that generates a constant-level signal Vs, and 9 is an adder circuit; the other parts, which have the same functions as in FIG. 2, carry the same reference numerals as in FIG. 2. In this embodiment, the constant-level signal Vs from reference voltage generation circuit 8 is added to the output signal of sample-and-hold circuit 4, and the sum becomes the new threshold. This makes it possible to cope with larger noise.
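The benefit of adder circuit 9 can be illustrated by counting how many noise-only frames would exceed the threshold with and without the constant level Vs. The function name `count_false_triggers` and the noise values below are hypothetical.

```python
from typing import List

def count_false_triggers(noise_levels: List[float], held: float, vs: float) -> int:
    """Count noise-only frames whose level exceeds the threshold held + vs."""
    threshold = held + vs  # adder circuit 9: held value plus the constant level Vs
    return sum(1 for level in noise_levels if level > threshold)

noise = [1.0, 1.4, 0.8, 1.6, 1.1]  # hypothetical noise fluctuating around the held level
for vs in (0.0, 0.5, 1.0):
    print(vs, count_false_triggers(noise, held=1.0, vs=vs))
# 0.0 3
# 0.5 1
# 1.0 0
```

The trade-off is that speech must then exceed the held noise level by more than Vs to be detected.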

[Effect] As is clear from the above description, the present invention makes it possible to detect the section signal using an optimum threshold, so speech recognition can be performed more accurately.

[Brief Description of the Drawings]

FIG. 1 is a time chart explaining the operating principle of the present invention, and FIGS. 2 and 3 are electrical block diagrams each explaining an embodiment of the present invention in detail.

1: input terminal; 2: detection circuit; 3: smoothing circuit; 4: sample-and-hold circuit; 5: level comparison circuit; 6: speech section detection circuit; 7: output terminal; 8: reference voltage source; 9: adder circuit.

Procedural Amendment (voluntary), June 6, 1984 (Showa 59)
1. Indication of the case: Patent Application No. 73538 of 1984 (Showa 59)
2. Title of the invention: Speech section detection circuit
3. Person making the amendment
Relationship to the case: Patent applicant
Address: 1-3-6 Nakamagome, Ota-ku, Tokyo
Name: (674) Ricoh Co., Ltd., Representative: Hiroshi Hamada
4. Agent
Address: Shatore-in Yokohama No. 807, 1-2-7 Furo-cho, Naka-ku, Yokohama 231
6. Subject of the amendment: the "Detailed Description of the Invention" section of the specification
7. Contents of the amendment
(1) On page 4, line 20 to page 5, line 1 of the specification, "the pulse level directly as the next" is corrected to "the signal level at the time of pulse generation directly as the next".
(2) On page 5, line 2 of the same, "somewhat larger noise" is corrected to "larger noise".
(3) On line 14 of the same page, "this makes it possible to cope with somewhat larger noise" is corrected to "this makes it possible to cope with larger noise".

Claims

[Claims]

(1) A speech section detection circuit that extracts the power of a speech signal and sets a threshold within a silent section to cut out speech sections, comprising: means for generating a pulse when a predetermined time has elapsed since the end of a speech section; and a sample-and-hold circuit that samples and holds the noise level at the time the pulse is generated; wherein the held value is used as the speech section extraction threshold to detect speech sections.

(2) The speech section detection circuit according to claim 1, further comprising means for adding a fixed value to the threshold, wherein the resulting value is used as the threshold for cutting out speech sections to detect speech sections.