JPH0823756B2

JPH0823756B2 - Voice section detection method

Info

Publication number: JPH0823756B2
Application number: JP63198162A
Authority: JP
Inventors: 敬三木
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1988-08-09
Filing date: 1988-08-09
Publication date: 1996-03-06
Anticipated expiration: 2011-03-06
Also published as: JPH0247698A

Description

(Industrial Field of Application) The present invention relates to a method of detecting voice sections in a speech recognition device.

(Prior Art) The processing in a typical speech recognition device can be broadly divided into two parts: detecting, from the input acoustic signal, the section in which speech is present (hereinafter, voice section detection processing), and determining the content of the detected speech (hereinafter, recognition processing).

To perform these operations, a speech recognition device usually analyzes the input acoustic signal at short intervals called acoustic frames and computes feature parameters for each frame. Typical feature parameters are acoustic power and the power spectrum.

Voice section detection exploits the fact that acoustic power is higher in voice sections than in other sections.

One such conventional voice section detection method is disclosed, for example, in Japanese Patent Laid-Open No. 60-114900. An example configuration of this conventional method is described with reference to FIG. 2.

An acoustic signal input from an external input unit 10, such as a microphone or a telephone, is sampled by an A/D conversion unit 12 and converted into a digital signal sequence. A power calculation unit 14 then computes, from this digital signal sequence (hereinafter simply the input signal), the acoustic power P_I of each frame (I denotes the frame number) and sends it to both a voice section detection unit 16 and a threshold setting unit 18. As described later, the threshold setting unit 18 computes an average noise level from the acoustic power P_I and sends it to the voice section detection unit 16, which detects and determines the voice section from the acoustic power P_I and the average noise level. A recognition unit 20 then performs recognition processing on the voice pattern formed by the acoustic power sequence of the voice section, and the recognition result is sent to an external device 22, such as a computer or another display device as required.
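The per-frame power computation performed by the power calculation unit 14 can be sketched as follows. The text does not specify how P_I is computed, so this minimal sketch assumes non-overlapping frames and a sum-of-squares energy, optionally expressed in dB; the function name and the 64-sample frame (8 ms at a hypothetical 8 kHz sampling rate) are illustrative assumptions.

```python
import math


def frame_powers(samples, frame_len=64, use_db=True):
    """Return one acoustic power value P_I per frame of a sampled signal.

    Assumptions (not taken from the patent): frames do not overlap, a trailing
    partial frame is dropped, and P_I is the sum of squared samples, optionally
    expressed in dB. frame_len=64 corresponds to an 8 ms frame at a
    hypothetical 8 kHz sampling rate.
    """
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        energy = sum(x * x for x in samples[start:start + frame_len])
        if use_db:
            # A tiny floor keeps log10 defined for all-zero (silent) frames.
            energy = 10.0 * math.log10(energy + 1e-12)
        powers.append(energy)
    return powers


# Usage: two full frames of a constant signal; the 10 leftover samples are dropped.
print(frame_powers([0.5] * 138, use_db=False))
```

Each full frame of amplitude 0.5 contributes 64 × 0.25 = 16.0, so the call returns two values of 16.0.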

Before recognition begins, a conventional speech recognition device with this configuration measures the background noise level in order to set the average noise level used for voice section detection, as mentioned above. The purpose is to measure the characteristics of the acoustic power when no speech is being input and to determine an appropriate threshold for voice section detection.

This processing is described below. From the acoustic power P_I that the power calculation unit 14 obtains from the acoustic signal input through the external input unit 10, the threshold setting unit 18 computes an average noise level N_L and an average noise variance N_D. With N denoting the number of measurement frames, N_L and N_D are given by equations (1) and (2), respectively.

The voice cut-out level V_L is then determined from the average noise level N_L and the average noise variance N_D according to equation (3):

$$V_L = N_L + N_1 \times N_D \qquad (3)$$

Here N_1 is a coefficient predetermined by the system, typically with a value of about 2 to 4. The voice cut-out level V_L computed in this way is subsequently used by the voice section detection unit 16.
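A minimal sketch of the conventional threshold of equation (3) follows. Because equations (1) and (2) are only referenced here, the sketch assumes that N_L is the arithmetic mean of the N background frames and that N_D is their population standard deviation, which keeps N_D in the same units as N_L; the function name and the default N_1 = 3 (within the stated range of 2 to 4) are hypothetical choices.

```python
import statistics


def conventional_cut_level(powers, n1=3.0):
    """Voice cut-out level V_L = N_L + N_1 * N_D from equation (3).

    powers: frame powers P_I measured with no speech input (N frames).
    Assumption: equations (1) and (2) are taken to be the arithmetic mean and
    the population standard deviation of P_I; the patent's own forms are not
    reproduced in this text.
    """
    n_l = statistics.fmean(powers)            # average noise level N_L
    n_d = statistics.pstdev(powers, mu=n_l)   # noise spread N_D (assumed form)
    return n_l + n1 * n_d


# Mean 10 and spread 2, so V_L = 10 + 3 * 2.
print(conventional_cut_level([8.0, 12.0, 8.0, 12.0]))
```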

Next, the conventional voice section detection operation is briefly described.

First, as usual, the acoustic signal input from the external input unit 10 is converted into the input signal by the A/D conversion unit 12, and the power calculation unit 14 computes the acoustic power P_I. FIG. 3 shows an example of this acoustic power P_I, with the acoustic power P_I on the vertical axis and the frame number I on the horizontal axis. The broken line represents the voice cut-out level V_L. I_S and I_E are the start and end of the voice section, and V_S and V_E are the voice start frame and the voice end frame. The frame period is usually about 8 milliseconds.

The voice section detection unit 16 cuts out the voice section described above. Conventionally, the voice section is determined from the acoustic power P_I by the following conditions; the start frame of the voice section is the first frame for which the start condition holds.

Start condition: when frames with P ≥ V_L continue from some frame I onward for N_2 or more frames, N_2 being a number determined empirically in advance, frame I is taken as the start frame V_S.

End condition: after the start frame V_S has been detected, the frame immediately preceding the first frame I at which the following condition holds is taken as the end frame V_E of the voice section:

frames with P < V_L continue from frame I onward for N_3 or more frames, N_3 being a number determined empirically in advance.

Exclusion condition: furthermore, if the voice section length V_LEN satisfies the following condition, the section is not regarded as a voice section:

$$V_{LEN} < N_4 \quad \text{or} \quad V_{LEN} > N_5, \qquad V_{LEN} = V_E - V_S + 1$$

where N_4 and N_5 are frame counts determined empirically in advance.
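The three conditions can be expressed as a single pass over the frame powers. In this hypothetical sketch frames are indexed from 0 rather than 1, and a section whose end condition is never met before the data ends is simply rejected; both choices are assumptions, not part of the described method.

```python
def detect_voice_section(powers, v_l, n2, n3, n4, n5):
    """Apply the start, end and exclusion conditions to frame powers P_I.

    Returns (V_S, V_E) as 0-based frame indices, or None when no valid voice
    section is found. Assumption: a section whose end condition is never met
    before the data runs out is treated as not found.
    """
    # Start condition: P >= V_L for at least N_2 consecutive frames from frame I.
    v_s, run = None, 0
    for i, p in enumerate(powers):
        run = run + 1 if p >= v_l else 0
        if run >= n2:
            v_s = i - run + 1          # first frame of the qualifying run
            break
    if v_s is None:
        return None

    # End condition: P < V_L for at least N_3 consecutive frames from frame I;
    # V_E is the frame just before I.
    v_e, run = None, 0
    for i in range(v_s, len(powers)):
        run = run + 1 if powers[i] < v_l else 0
        if run >= n3:
            v_e = i - run              # frame preceding the low-power run
            break
    if v_e is None:
        return None

    # Exclusion condition: reject sections shorter than N_4 or longer than N_5.
    v_len = v_e - v_s + 1
    if v_len < n4 or v_len > n5:
        return None
    return v_s, v_e


powers = [1, 1, 9, 1, 1, 5, 5, 5, 5, 1, 1, 1, 1]
print(detect_voice_section(powers, v_l=3, n2=2, n3=3, n4=2, n5=10))
```

The isolated high-power frame 2 does not start a section because it does not last N_2 = 2 frames; the section found runs from frame 5 to frame 8, i.e. the call returns (5, 8).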

(Problems to Be Solved by the Invention) The conventional calculation of the voice cut-out level V_L described above assumes that the acoustic power of the background noise is approximately normally distributed. In a quiet environment this approximation does hold well. However, in a high-noise environment, or when the input arrives over a line such as a telephone line, there is noise such as clicks that is short in duration but has extremely high peak acoustic power. The approximation then often breaks down, and, as shown in FIG. 4, the distribution gains weight at considerably high acoustic power levels.

Consequently, if such noise happens to occur while the background noise level is being measured, both the average noise level N_L and the average noise variance N_D are overestimated, which causes voice section detection errors. One way to mitigate this phenomenon is to lengthen the measurement time N of the average noise level, but this lengthens the preparation time before recognition can start and degrades the responsiveness of the speech recognition device itself, so a sufficiently long measurement time N could not be adopted.
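As a hypothetical numerical illustration of this effect (the values are invented, and N_D is taken as a standard deviation, which is an assumption since equation (2) is not reproduced here), consider N = 20 frames in which 19 frames have power 10 and a single click frame has power 100, with N_1 = 3:

```latex
N_L = \frac{19 \times 10 + 100}{20} = 14.5
N_D = \sqrt{\frac{19 \times (10 - 14.5)^2 + (100 - 14.5)^2}{20}}
    = \sqrt{\frac{384.75 + 7310.25}{20}} = \sqrt{384.75} \approx 19.6
V_L = N_L + N_1 \times N_D \approx 14.5 + 3 \times 19.6 \approx 73.3
```

Without the click every frame has power 10, so N_L = 10, N_D = 0 and V_L = 10; in this example a single click frame out of twenty raises the threshold more than sevenfold.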

An object of the present invention is to provide a voice section detection method that sets a voice cut-out level V_L which greatly reduces voice section detection errors even in noisy environments containing clicks and similar noise, as described above.

(Means for Solving the Problems) To achieve this object, the voice section detection method of the present invention is characterized in that the threshold calculation unit excludes, from the acoustic powers P_I, a first predetermined number N_max of acoustic powers taken in order starting from the largest value and a second predetermined number N_min of acoustic powers taken in order starting from the smallest value, computes an average noise level N_L′ and an average noise variance N_D′ over all the remaining acoustic powers P_I, and then computes the voice cut-out level V_L from the average noise level N_L′ and the average noise variance N_D′.
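The threshold computation of the invention can be sketched as a trimmed version of the conventional estimate. As with the conventional case, N_D′ is assumed here to be the population standard deviation of the retained frames, since equations (5) and (6) are not reproduced in this text; the function name is hypothetical. In the patent's 1-based notation, the slice keeps SMEM(N_min + 1) through SMEM(N − N_max).

```python
import statistics


def trimmed_cut_level(powers, n_max, n_min, n1=3.0):
    """Voice cut-out level V_L' from trimmed background-noise statistics.

    Drops the n_max largest and n_min smallest frame powers, then applies
    V_L' = N_L' + N_1 * N_D' to the rest. N_D' is taken here as the population
    standard deviation, an assumption since equations (5) and (6) are not
    reproduced in this text.
    """
    if n_max < 0 or n_min < 0 or n_max + n_min >= len(powers):
        raise ValueError("n_max and n_min must be >= 0 and leave at least one frame")
    ordered = sorted(powers)                       # SMEM(1) <= ... <= SMEM(N)
    kept = ordered[n_min:len(ordered) - n_max]     # discard both tails
    n_l = statistics.fmean(kept)                   # N_L'
    n_d = statistics.pstdev(kept, mu=n_l)          # N_D' (assumed std. dev.)
    return n_l + n1 * n_d


# 19 background frames at power 10 plus one click frame at power 100.
frames = [10.0] * 19 + [100.0]
print(trimmed_cut_level(frames, n_max=1, n_min=0))
```

Discarding the single highest frame removes the click entirely, so the call returns 10.0, whereas the untrimmed estimate of the same assumed form is about 73.3. Using `len(ordered) - n_max` rather than `-n_max` as the slice end keeps the `n_max = 0` case correct.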

(Operation) With this configuration, the voice cut-out level V_L is determined from the acoustic powers in the middle region of the distribution, where the acoustic power is inherently concentrated, after excluding from the no-speech acoustic power distribution the high-power side caused by noise such as clicks and the low-power side caused by other noise. An appropriate voice cut-out level V_L can therefore be determined very simply and is hardly affected by noise components with high peak power. As a result, voice section detection errors are reduced, giving a speech recognition device with excellent overall recognition performance.

(Embodiment) An embodiment of the voice section detection method of the present invention is described below with reference to the drawings.

FIG. 1 is a block diagram for explaining an embodiment of the voice section detection method of the present invention, and FIG. 5 is a flow chart of the processing in the threshold setting unit.

In FIG. 1, components identical to those shown in FIG. 2 are given the same reference numerals, and their detailed description is omitted.

In FIG. 1, reference numeral 24 denotes a threshold setting unit corresponding to the conventional threshold setting unit 18 of FIG. 2; it differs from the conventional threshold setting unit 18 in function and therefore in internal structure.

First, the threshold setting unit 24 of this embodiment is described with reference to FIG. 5.

In this embodiment, the power calculation unit 14 first computes, with no speech input, the acoustic power P(I) for each frame I (I = 1, ..., N) and sends it to the threshold setting unit 24 and the voice section detection unit 16.

In the threshold setting unit 24, under the control of a microprocessor 30, these acoustic powers P(I) are transferred from the power calculation unit 14 via a system bus 36 and temporarily stored in memory areas RMEM(1), RMEM(2), ..., RMEM(N) of a memory 32. Processing starts from the first frame, I = 1 (step S1). Next, it is determined whether I > N (step S2); if I ≤ N, the acoustic power P_1 of the first frame is temporarily stored in memory area RMEM(1) (step S3). The frame number I is then advanced to I = 2 (step S4), control returns to step S2, and steps S2 and S3 store the acoustic power P_2 of the second frame (I = 2) in memory area RMEM(2). In this way each acoustic power P_I is stored in turn in the corresponding memory area RMEM(I) until I = N.

When step S2 determines that I > N, the acoustic powers P_1 to P_N stored in memory areas RMEM(1) to RMEM(N) of the memory 32 are sorted in ascending order under the control of the microprocessor, and the result is sent via the system bus 36 to a work memory 34 and stored again, in order of magnitude, in memory areas SMEM(1), SMEM(2), ..., SMEM(N) of the work memory 34 (step S5). Thus, for example, memory area SMEM(1) holds the smallest of the acoustic powers P_I, and conversely memory area SMEM(N) holds the largest. That is, in this embodiment the magnitudes of the acoustic powers P_I stored in memory areas SMEM(J) (J = 1, ..., N) satisfy the following relationship:

$$\mathrm{SMEM}(1) \le \mathrm{SMEM}(2) \le \cdots \le \mathrm{SMEM}(N) \qquad (4)$$

Next, the microprocessor 30 calculates the average noise level N_L′.

For this purpose, a memory (not shown) of the microprocessor 30 stores in advance N_max, the empirically predetermined number of acoustic powers, counted downward from the largest, that are not used in calculating the average noise level, and likewise N_min, the empirically predetermined number of acoustic powers, counted upward from the smallest, that are not used in the calculation. The microprocessor 30 itself reads out the stored N_max and N_min and then reads from the work memory 34 all the acoustic powers P_I except those corresponding to these counts, that is, SMEM(N_min + 1) through SMEM(N − N_max) (step S6).

Next, the microprocessor 30 computes the average noise level N_L′ according to equation (5) and temporarily stores the result in its memory (step S7).

Next, the microprocessor 30 reads N_max, N_min and the average noise level N_L′ from the memory, computes the average noise variance N_D′ given by equation (6), and temporarily stores the result N_D′ in that memory (step S8).

Next, the average noise level N_L′, the average noise variance N_D′ and the coefficient N_1, which is determined empirically in advance and stored in the memory of the microprocessor 30, are read out, and the voice cut-out level V_L′ is obtained according to equation (7) (step S9):

$$V_L' = N_L' + N_1 \times N_D' \qquad (7)$$

When the threshold setting unit 24 has completed steps S1 to S9, the resulting voice cut-out level V_L′ is sent via the system bus 36 to the voice section detection unit 16 under the control of the microprocessor 30. A measurement time N of about 0.16 to 0.32 seconds is usually suitable; with a frame period of 8 milliseconds this gives N = 20 to 40. N_max and N_min must be set to appropriate values according to the occurrence probability and duration of peak noise. Typically N_max is preferably about 1/10 to 1/50 of the number of measurement frames N, and N_min about 1/10 to 1/50 of N, or 0.
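The parameter ranges above follow from simple arithmetic, sketched below with hypothetical helper names. Rounding N/50 and N/10 up to whole frames is an assumption made here so that at least one peak frame is always discarded; N_min would be chosen in the same way, down to 0.

```python
import math


def measurement_frames(measure_s, frame_ms=8.0):
    """Number of measurement frames N for a measurement time and frame period."""
    return round(measure_s * 1000.0 / frame_ms)


def n_max_bounds(n):
    """Rough N_max range of N/50 to N/10, rounded up to whole frames.

    Rounding up is an assumption, chosen so that at least one of the
    highest-power frames is always discarded.
    """
    return math.ceil(n / 50), math.ceil(n / 10)


for t in (0.16, 0.32):
    n = measurement_frames(t)
    print(t, n, n_max_bounds(n))
```

For 0.16 s and 0.32 s this gives N = 20 and N = 40, with N_max ranges of 1 to 2 and 1 to 4 frames respectively.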

The voice section detection processing and the recognition processing are the same as in the conventional example, so their description is omitted.

The embodiment described above is merely a preferred example of the present invention, and the invention is obviously not limited to it.

(Effects of the Invention) As is clear from the above description, the voice section detection method of the present invention computes the average noise level N_L′ and the average noise variance N_D′ over all the acoustic powers P_I sampled during background noise measurement except the N_max acoustic powers taken downward from the largest value and the N_min acoustic powers taken upward from the smallest value. An appropriate voice cut-out level can therefore be set without being affected by noise components with high peak power, even in environments where such components are frequent. Voice section detection errors consequently become very rare even under high noise, making it possible to realize a recognition device with excellent overall recognition performance.

[Brief description of drawings]

FIG. 1 is a block diagram for explaining the voice section detection method of the present invention; FIG. 2 is a block diagram for explaining a conventional voice section detection method; FIG. 3 shows an example of voice power for explaining the present invention and the conventional method; FIG. 4 shows an acoustic power distribution; and FIG. 5 is a flow chart of the voice cut-out level calculation process.

10 ... external input unit, 12 ... A/D conversion unit, 14 ... power calculation unit, 16 ... voice section detection unit, 20 ... recognition unit, 22 ... external device, 24 ... threshold setting unit, 30 ... microprocessor, 32 ... memory, 34 ... work memory, 36 ... system bus.

Claims


1. A voice section detection method for a recognition device in which a power calculation unit calculates, from an acoustic signal input from an external input unit, an acoustic power P_I for each short interval called a frame; a threshold setting unit calculates an average noise level based on the acoustic power P_I; a voice section is detected from the acoustic power P_I and the average noise level; and a recognition unit performs recognition processing on the voice pattern defined by the voice section and outputs the result to an external device, the method being characterized in that, in detecting the voice section, the power calculation unit measures the acoustic power P_I in a no-speech-input state for a predetermined time, and the threshold calculation unit computes an average noise level N_L′ and an average noise variance N_D′ over all the acoustic powers P_I that remain after excluding a first predetermined number N_max of acoustic powers taken in order from the largest value and a second predetermined number N_min of acoustic powers taken in order from the smallest value, and then computes the voice cut-out level V_L from the average noise level N_L′ and the average noise variance N_D′.