JPS62227200A

JPS62227200A - Zero crossing number detection

Info

Publication number: JPS62227200A
Application number: JP61070191A
Authority: JP
Inventors: 三木　敬
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1986-03-28
Filing date: 1986-03-28
Publication date: 1987-10-06

Abstract

(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to a method for detecting the number of zero crossings, a feature parameter used in speech recognition devices.

(Prior Art) A method has conventionally been proposed in which a speech power value and a frequency analysis result are calculated from a digital speech signal obtained by A/D conversion of an analog speech signal, the number of zero crossings between this digital speech signal and a predetermined reference value is detected as a feature parameter for each frame of predetermined length, and speech recognition processing is performed based on the speech power value, the frequency analysis result, and the number of zero crossings.

Before describing the present invention, this conventional method for detecting the number of zero crossings is explained.

FIG. 7 is a block diagram showing an example configuration of a conventional speech recognition device, in which 11 is a speech input terminal, 12 is an amplifier, and 13 is a low-pass filter (LPF). 14 is an A/D converter, which converts the analog input speech into a digital speech signal. 15 is a frequency analysis unit that performs frequency analysis of the input speech signal from the A/D converter 14. 16 is a zero-crossing number calculation unit that counts the number of times the digitized input speech signal crosses the zero level, which serves as the reference value; this count is the number of zero crossings. 17 is a speech power calculation unit that calculates the speech power. 18 is a speech section detection unit that uses the frequency analysis result and the speech power to detect the section of the input signal in which speech is present (hereinafter called the speech section). 19 is a recognition unit, which receives the frequency analysis result from the frequency analysis unit 15, the number of zero crossings from the zero-crossing number calculation unit 16, the speech power from the speech power calculation unit 17, and the speech section detection signal from the speech section detection unit 18, performs recognition processing on the input speech signal, and outputs the recognition result to the output terminal 20.

FIG. 8 is a block diagram showing an example configuration of the conventional zero-crossing number calculation unit 16 in the speech recognition device of FIG. 7. The conventional method for detecting the number of zero crossings is explained with reference to it.

The number of zero crossings is the number of times the speech signal crosses the zero level within a frame period. Whether the speech signal has crossed the zero level at a given sampling instant can be detected by calculating the value S_i (where i is the sample number) given by the following equation.

$S_i = x_i \oplus x_{i-1}$

where x_i = 0 if the value of X_i is positive or zero, and x_i = 1 if the value of X_i is negative. Here \oplus denotes the exclusive OR operation, which yields 0 if its two inputs match and 1 if they differ. The number of zero crossings is therefore obtained by counting the number of times the value S_i becomes 1 during a predetermined frame period.
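The sign-bit XOR rule above can be sketched in a few lines of Python. This is only an illustration of the counting principle, not the circuit itself; the function names and the `previous_sample` parameter (the sample preceding the frame) are assumptions introduced here.

```python
def sign_bit(sample: int) -> int:
    """Return x_i: 0 for a positive or zero sample, 1 for a negative sample."""
    return 1 if sample < 0 else 0


def count_zero_crossings(frame: list[int], previous_sample: int = 0) -> int:
    """Count how often S_i = x_i XOR x_(i-1) equals 1 within one frame.

    previous_sample is a hypothetical parameter: the last sample before
    the frame, used to form x_(i-1) for the first sample of the frame.
    """
    previous_bit = sign_bit(previous_sample)
    crossings = 0
    for sample in frame:
        current_bit = sign_bit(sample)
        crossings += current_bit ^ previous_bit  # S_i is 1 on a sign change
        previous_bit = current_bit
    return crossings


print(count_zero_crossings([3, -2, -5, 4, 0, -1]))  # prints 3
```

Note that a sample of exactly zero is treated as positive, as in the definition of x_i, so the step from 4 to 0 is not counted as a crossing.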

FIG. 8 is a block diagram showing an example in which this principle of calculating the number of zero crossings is applied to the zero-crossing number calculation unit of FIG. 7. In the figure, x_i is the sign bit of the sample value X_i of the digital speech signal from the A/D converter 14 (hereinafter sometimes simply called the speech signal X_i). C is the sampling clock of the speech signal X_i, and F is a frame clock for synchronizing with the predetermined frame length and frame period. 21 is a shift register, 22 is an anti-coincidence (exclusive OR) circuit, 23 is a counter, and 24 is a register.

In the configuration shown in FIG. 8, suppose that the counter 23 is reset by the frame clock F at the start of a frame. The output of the counter 23 immediately before that reset, which is the count for the preceding frame, is stored in the register 24. The anti-coincidence circuit 22 computes the exclusive OR S_i of the sign bit x_i and the sign bit x_{i-1} from the previous sampling instant, which is held in the shift register 21. The counter 23 counts the sampling clocks C during which S_i is 1, and its value immediately before the next frame clock F arrives is the number of zero crossings Z for that frame period. This value Z is stored in the register 24 and supplied to the recognition unit 19 of FIG. 7.
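The interplay of shift register 21, anti-coincidence circuit 22, counter 23, and register 24 can be modelled in software roughly as follows. This is a hedged sketch rather than a description of the actual hardware timing: the class name, the `frame_length` parameter, and the choice to derive the frame clock F by counting samples are all assumptions.

```python
class ZeroCrossingCounter:
    """Rough software model of FIG. 8 (names and timing are assumptions)."""

    def __init__(self, frame_length: int) -> None:
        self.frame_length = frame_length  # samples per frame (stands in for F)
        self.previous_bit = 0             # shift register 21
        self.count = 0                    # counter 23
        self.latched = 0                  # register 24
        self.samples_in_frame = 0

    def clock(self, sample: int) -> None:
        """Process one sample on the sampling clock C."""
        current_bit = 1 if sample < 0 else 0
        self.count += current_bit ^ self.previous_bit  # circuit 22 output S_i
        self.previous_bit = current_bit
        self.samples_in_frame += 1
        if self.samples_in_frame == self.frame_length:
            # Frame clock F: latch the count into register 24, then reset.
            self.latched = self.count
            self.count = 0
            self.samples_in_frame = 0


counter = ZeroCrossingCounter(frame_length=4)
for sample in [1, -1, 1, -1]:
    counter.clock(sample)
print(counter.latched)  # prints 3
```

The latched value stays available for the whole of the next frame, which mirrors the role of register 24 as the input to the recognition unit.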

FIGS. 9(A) and 9(B) and FIGS. 10(A) and 10(B) illustrate the number of zero crossings Z obtained by this conventional method for a silent section, the fricative S, and the vowel I, without noise (FIG. 9) and with low-level noise superimposed (FIG. 10). In each figure, (A) shows the variation of the speech signal X_i from the A/D converter 14 and (B) shows the corresponding number of zero crossings Z.

As FIGS. 9(A) and 9(B) show, the number of zero crossings Z is small in the silent section and in the vowel I section, but very large in the fricative S section.

However, in the example of FIGS. 10(A) and 10(B), where the input signal appears to carry some noise, the number of zero crossings Z is large even in the silent section. The conventional method for detecting the number of zero crossings Z is thus strongly affected even by low-level noise.

To remove the influence of this noise, a method has been proposed that counts the number of times the input speech signal crosses a specified reference value, rather than the zero level used as the reference value in the case above.

This detection method is explained with reference to the zero-crossing number calculation unit 16 shown in FIG. 11. In this example, a level obtained by subtracting a fixed constant from the zero level of the speech signal X_i from the A/D converter 14 is set as the reference level for zero crossing. This setting can be realized by subtracting an offset value from the speech signal X_i. The subtraction is performed by the subtraction circuit 25. Let the value of the speech signal after subtraction be X'_i. In this case, the level that serves as the reference for zero crossing is called the offset. Then, as in the conventional calculation method described above, the exclusive OR S_i of the sign bit x_i of the current value X'_i and the sign bit x_{i-1} at the previous sampling clock is obtained. In this case too, the counter counts the number of intervals in which S_i is 1, and the value immediately before the next frame clock arrives is taken as the number of zero crossings Z'.
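The fixed-offset variant can be illustrated with the sketch below. It assumes the offset is simply subtracted from every sample before the sign bit is taken, as the text describes; the function name, the example signals, and the offset value 3 are hypothetical, and the sign convention of the offset is an assumption.

```python
def count_offset_crossings(frame: list[int], offset: int,
                           previous_sample: int = 0) -> int:
    """Count sign changes of X'_i = X_i - offset within one frame."""
    previous_bit = 1 if previous_sample - offset < 0 else 0
    crossings = 0
    for sample in frame:
        current_bit = 1 if sample - offset < 0 else 0
        crossings += current_bit ^ previous_bit
        previous_bit = current_bit
    return crossings


noise = [1, -1, 1, -1, 2, -2]    # hypothetical low-level noise in silence
fricative = [10, -10, 10, -10]   # hypothetical larger-amplitude signal

print(count_offset_crossings(noise, offset=0))      # prints 5
print(count_offset_crossings(noise, offset=3))      # prints 0
print(count_offset_crossings(fricative, offset=3))  # prints 4
```

With a zero offset the low-level noise produces spurious crossings; a fixed offset larger than the noise amplitude suppresses them, but the same offset also hides any genuine signal components that stay below it, which is the loss of sensitivity discussed next.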

FIGS. 12(A) and 12(B) show, respectively, an example of this offset speech signal X'_i and of the number of zero crossings Z' detected by the zero-crossing number calculation unit 16 of FIG. 11. The broken line in the figure indicates the offset value that serves as the reference for crossing. As this example shows, the number of zero crossings Z' in the silent section is indeed free of the influence of noise, but the characteristically large value in the fricative S section is also reduced, so the sensitivity of Z' as a parameter expressing speech features has fallen.

(Problems to be Solved by the Invention) Thus, of the two prior-art detection methods, one is vulnerable to noise and the other reduces the sensitivity of the speech feature parameter.

An object of the present invention is to provide a method for detecting the number of zero crossings that removes the influence of noise seen in the conventional detection methods described above while retaining sufficient sensitivity as a speech feature parameter.

(Means for Solving the Problems) To achieve this object, the method for detecting the number of zero crossings according to the present invention is characterized in that a predicted speech power value for the current frame is obtained based on the speech power values up to one frame before, and the reference value for detecting the number of zero crossings is set based on the obtained predicted speech power value.

In carrying out the invention, it is preferable to store in advance, in a table ROM, the relationship between the predicted speech power value and the offset value from the zero level of the digital speech signal, to read the corresponding offset value from the table ROM using the predicted speech power value as the address, and to use the offset value read out as the reference value for detecting the number of zero crossings.

(Operation) Thus, the present invention adds to the calculation of the number of zero crossings a process that sets the reference value for crossing (also called the threshold value) in advance to an appropriate value according to the speech power level.

In this way, when the speech power is relatively small and the proportion of noise is large, the reference value is set high to remove the influence of noise; when the speech power is large and the proportion of noise is small, the reference value is set small. The number of zero crossings detected by the invention is therefore free of the influence of noise and sufficiently sensitive as a speech feature parameter.

(Embodiment) An embodiment of the method for detecting the number of zero crossings of the present invention is described below with reference to the drawings.

FIG. 1 is a block diagram showing an embodiment of the invention. Components identical to those shown in FIG. 7 are given the same reference numerals, and their detailed description is omitted.

The method of the present invention differs from the conventional methods in that, in the zero-crossing number calculation unit 16, the reference value (threshold) for calculating the number of zero crossings is set according to the speech power value produced by the speech power calculation unit 17. For example, a predicted speech power value for the current frame is obtained based on the speech power values up to one frame before, and the reference value for detecting the number of zero crossings is set based on that predicted value.

For this reason, as shown in FIG. 1, the speech recognition device for carrying out the invention is configured so that the zero-crossing number calculation unit 16 receives as inputs not only the speech signal from the A/D converter 14, the sampling clock, and the frame clock, but also the speech power signal produced by the speech power calculation unit 17.

FIG. 2 is a block diagram of an example configuration of the zero-crossing number calculation unit 16 for carrying out the invention. In FIG. 2, components identical to those shown in FIG. 11 are given the same reference numerals. The figure shows the speech signal X_i, the level value P_j of the speech power signal (called the speech power level), the sampling clock C, and the frame clock F. In this embodiment, the speech power level P_j (where j is the frame number) is the speech power level of the current frame predicted from the speech power levels up to one frame before, that is, the predicted speech power value.

Because speech power does not change abruptly, the speech power level P_j can in practice be obtained without difficulty by a simple method: the average of the speech power over the preceding several frames, as given by equation (1), is used as the predicted speech power value.

Here, P_k is the speech power value in the k-th frame.
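Equation (1) itself is not reproduced in this text. One form consistent with the description (an average of the speech power over the preceding several frames) would be the following, where the window length N is an assumed symbol that does not appear in the source:

```latex
P_j = \frac{1}{N} \sum_{k=j-N}^{j-1} P_k
```

The sum runs only over frames before frame j, so the prediction for the current frame never depends on the current frame's own power.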

Such a predicted value can be obtained by conventionally known methods, and the means for obtaining it may be provided either in the speech power calculation unit 17 or in the zero-crossing number calculation unit 16.
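A moving-average predictor of the kind described could be sketched as follows. This is a minimal illustration, assuming a fixed window of recent frames; the class name, the `window` parameter, and the `initial` value returned before any frame has been observed are hypothetical choices, not details from the source.

```python
from collections import deque


class PowerPredictor:
    """Predict the current frame's speech power as the mean of recent frames."""

    def __init__(self, window: int, initial: float = 0.0) -> None:
        self.history: deque[float] = deque(maxlen=window)  # oldest frames drop out
        self.initial = initial  # assumed prediction before any frame is seen

    def predict(self) -> float:
        """Return the predicted power P_j for the frame about to be processed."""
        if not self.history:
            return self.initial
        return sum(self.history) / len(self.history)

    def update(self, power: float) -> None:
        """Record the measured power of the frame that has just ended."""
        self.history.append(power)


predictor = PowerPredictor(window=3)
for measured in [2.0, 4.0, 6.0]:
    predictor.update(measured)
print(predictor.predict())  # prints 4.0
```

Calling `predict()` before `update()` for each frame keeps the prediction based only on earlier frames.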

Next, the method of obtaining the offset value from the zero level of the digital speech signal, given the predicted speech power value obtained in this way, is explained. The illustrated embodiment uses a table ROM (read-only memory, hereinafter called ROM) 26. The relationship between the predicted speech power value and the offset value is learned in advance and stored in this table ROM 26. The predicted speech power value P_j, obtained from the speech power of the speech power calculation unit 17, is then used as the address into the table ROM 26, the corresponding appropriate offset value θ_j is read out, and this offset value θ_j is output to the subtraction circuit 25, which is configured here as a subtractor.
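The table lookup can be sketched in software as an array indexed by a quantized power value. The quantization scheme, the `max_power` parameter, and the table contents below are all assumptions made for illustration; the source does not say how the predicted power is mapped to a ROM address.

```python
def offset_from_table(table: list[int], predicted_power: float,
                      max_power: float) -> int:
    """Read the offset stored at the address derived from the predicted power.

    The address is the predicted power scaled linearly onto the table
    indices and clamped to the valid range (an assumed quantization).
    """
    address = int(predicted_power / max_power * (len(table) - 1))
    address = max(0, min(len(table) - 1, address))
    return table[address]


# Hypothetical 8-entry table: larger offset at low power, smaller at high power.
table_rom = [8, 8, 8, 8, 2, 2, 2, 2]

print(offset_from_table(table_rom, predicted_power=10.0, max_power=100.0))  # prints 8
print(offset_from_table(table_rom, predicted_power=90.0, max_power=100.0))  # prints 2
```

A lookup table keeps the per-frame cost to a single memory read and lets the power-to-offset relationship take any shape that learning produces.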

In the subtraction circuit 25, the offset value θ_j is subtracted from the digital speech signal X_i from the A/D converter 14 of FIG. 1. The anti-coincidence circuit 22 therefore receives the sign bit x_i of (X_i - θ_j) and the sign bit x_{i-1} of (X_{i-1} - θ_j), the latter delayed by one period of the sampling clock C by the shift register 21. When the sign bits x_i and x_{i-1} differ, that is, when the digital speech signal X_i and the digital speech signal X_{i-1} one sample earlier lie on opposite sides of the offset value θ_j, the digital speech signal has crossed the reference offset value θ_j at that instant. The output S″_i of the anti-coincidence circuit 22 is 1 only in this case.

The counter 23 counts the number of times the value S″_i becomes 1 during the interval defined by the frame clock F. The register 24 holds the count of the counter 23 immediately before the frame clock arrives, that is, the number of zero crossings Z″ for that frame period, which is supplied to the recognition unit 19 of FIG. 1.
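Putting the pieces together, the per-frame procedure of the embodiment might look like the sketch below. Several details are assumptions rather than facts from the source: speech power is taken as the mean squared sample value, the predictor averages the last `window` frames, the offset applied at low predicted power is the larger one (following the rationale given in the text), and the sign bit is carried across frame boundaries as a shift register would carry it.

```python
def adaptive_zero_crossings(frames: list[list[int]], boundary_power: float,
                            offset_low_power: int, offset_high_power: int,
                            window: int = 1) -> list[int]:
    """Return one crossing count per frame using a power-dependent offset."""
    power_history: list[float] = []
    counts: list[int] = []
    previous_bit = None  # no sample seen yet
    for frame in frames:
        recent = power_history[-window:]
        predicted = sum(recent) / len(recent) if recent else 0.0
        # Two-stage choice of the reference value from the predicted power.
        offset = offset_low_power if predicted <= boundary_power else offset_high_power
        count = 0
        for sample in frame:
            bit = 1 if sample - offset < 0 else 0
            if previous_bit is not None:
                count += bit ^ previous_bit
            previous_bit = bit
        counts.append(count)
        # Measured power of this frame feeds the prediction for later frames.
        power_history.append(sum(s * s for s in frame) / len(frame))
    return counts


frames = [[1, -1, 1, -1], [1, -1, 1, -1],          # hypothetical noisy silence
          [20, -20, 20, -20], [20, -20, 20, -20]]  # hypothetical loud speech
print(adaptive_zero_crossings(frames, boundary_power=50.0,
                              offset_low_power=5, offset_high_power=0))
# prints [0, 0, 4, 4]
```

In this toy input the low-level noise frames yield no crossings, while the loud frames keep their full count once the predicted power rises above the boundary.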

In this embodiment, a simple two-stage threshold is used as the offset value θ_j stored as the reference value in the table ROM 26 shown in FIG. 2. FIG. 3 shows the contents of the table ROM 26, which stores the relationship between the predicted speech power value P_j and the reference value θ_j. FIG. 4 is a graph of the relationship between the predicted speech power value P_j and the reference value θ_j stored in the table ROM 26. Here, the reference value θ_LOW used when the predicted speech power value P_j is at or below the value P_0 is set lower than the reference value θ_HIGH used when the predicted speech power value P_j is greater than P_0.
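Written as a piecewise definition, using only the symbols of the text, the two-stage threshold stored in the table ROM 26 is:

```latex
\theta_j =
\begin{cases}
\theta_{\mathrm{LOW}},  & P_j \le P_0 \\
\theta_{\mathrm{HIGH}}, & P_j > P_0
\end{cases}
```

The subscripts LOW and HIGH refer to the range of the predicted speech power P_j in which each reference value applies.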

These values can each be set to appropriate values in advance by learning. They are set this way because, when the speech power is relatively small, the influence of noise on the input waveform is large and the reference value must be set high; conversely, when the speech power is large, the proportion of noise in the input waveform is relatively low, so the reference value can be set near the zero level, which is close to the ideal.

Next, examples of the numbers of zero crossings calculated according to this embodiment are shown in FIGS. 5(A) to 5(C) and FIGS. 6(A) to 6(C). FIGS. 5(A), 5(B), and 5(C) show, respectively, the predicted average power value P_j (solid line), the input speech sample values X_i (broken line), and the number of zero crossings Z″ (solid line) for an input utterance without noise. FIGS. 6(A), 6(B), and 6(C) show P_j (solid line), X_i (broken line), and Z″ (solid line), respectively, for another input utterance with low-level noise superimposed. In these figures, P_0, drawn as a dash-dot line, is the boundary value at which the reference value θ_j switches from θ_HIGH to θ_LOW. Z, drawn as a dotted line, is the number of zero crossings calculated by the conventional method rather than by the present invention.

As these experimental results show, in both cases the influence of noise is effectively suppressed in the silent section, while the characteristic number of zero crossings remains large in the speech sections of the fricative S and the vowel I, so the sensitivity as a parameter expressing speech features is high.

The invention is not limited to the embodiment described above, and many modifications and changes can be made without departing from its scope. For example, the embodiment obtains the crossing reference value θ_j from the speech power value P_j by means of the table ROM 26, but the table ROM may be replaced by a RAM (random access memory) into which arbitrary values can be written, with appropriate values written into this RAM at initial setup or when recognition parameters and the like are updated.

Also, although the embodiment uses a two-stage threshold as the offset value θ_j, the invention is not limited to this, and a threshold with three or more stages may evidently be used.
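A threshold with three or more stages can be sketched with a sorted list of power boundaries and a binary search. The boundary and offset values below are hypothetical, and the convention that a power exactly equal to a boundary falls into the lower stage follows the "at or below P_0" wording of the two-stage case.

```python
from bisect import bisect_left


def multistage_offset(predicted_power: float, boundaries: list[float],
                      offsets: list[int]) -> int:
    """Pick the offset for the stage that contains the predicted power.

    boundaries must be sorted in ascending order; offsets needs exactly
    one more entry than boundaries (one offset per stage).
    """
    if len(offsets) != len(boundaries) + 1:
        raise ValueError("offsets must have one more entry than boundaries")
    # bisect_left places a power equal to a boundary in the lower stage.
    return offsets[bisect_left(boundaries, predicted_power)]


boundaries = [10.0, 100.0]  # hypothetical stage boundaries
offsets = [6, 3, 0]         # hypothetical offsets, one per stage

print(multistage_offset(5.0, boundaries, offsets))    # prints 6
print(multistage_offset(50.0, boundaries, offsets))   # prints 3
print(multistage_offset(500.0, boundaries, offsets))  # prints 0
```

The two-stage threshold of the embodiment is the special case with a single boundary P_0 and two offsets.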

(Effects of the Invention) As is clear from the embodiment described above, the method for detecting the number of zero crossings of the present invention removes the influence of noise and detects a number of zero crossings with sufficiently high sensitivity as a feature parameter.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing a speech recognition device to which the method for detecting the number of zero crossings of the present invention is applied. FIG. 2 is a block diagram showing a zero-crossing number calculation unit used to explain an embodiment of the method. FIGS. 3 and 4 show, respectively, the contents of the table ROM and the offset values used to explain the method. FIGS. 5 and 6 are explanatory diagrams of the numbers of zero crossings obtained by the method. FIG. 7 is a block diagram showing a conventional speech recognition device to which the conventional method for detecting the number of zero crossings is applied. FIG. 8 is a block diagram showing the conventional zero-crossing number calculation unit. FIGS. 9 and 10 are explanatory diagrams of the numbers of zero crossings obtained using the conventional zero-crossing number calculation unit shown in FIG. 8. FIG. 11 is a block diagram showing another conventional zero-crossing number calculation unit. FIG. 12 is an explanatory diagram of the numbers of zero crossings obtained using the zero-crossing number calculation unit of FIG. 11.

11: input terminal; 12: amplifier; 13: low-pass filter; 14: A/D converter; 15: frequency analysis unit; 16: zero-crossing number calculation unit; 17: speech power calculation unit; 18: speech section detection unit; 19: recognition unit; 20: output terminal; 21: shift register; 22: anti-coincidence circuit; 23: counter; 24: register; 25: subtraction circuit; 26: table ROM.

[FIG. 4: relationship between the predicted speech power value and the reference value. Recoverable labels: θ_j, P_MAX, and the note "where θ_HIGH < θ_LOW".]

Claims

[Claims]

(1) A method for detecting the number of zero crossings, for use in speech recognition in which a speech power value is detected from a digital speech signal obtained by A/D conversion of an analog speech signal, the number of zero crossings between the digital speech signal and a reference value is detected for each frame of predetermined length of the digital speech signal, and speech recognition is performed based on the speech power value and the number of zero crossings, the method being characterized by obtaining a predicted speech power value for the current frame based on the speech power values up to one frame before, and setting the reference value based on the obtained predicted speech power value.

(2) The method for detecting the number of zero crossings according to claim 1, characterized in that the relationship between the predicted speech power value and the offset value from the zero level of the digital speech signal is stored in advance in a table ROM, the corresponding offset value is read from the table ROM using the predicted speech power value as the address, and the offset value read out is used as the reference value.