JP3375655B2

JP3375655B2 - Sound / silence determination method and device

Info

Publication number: JP3375655B2
Application number: JP02488992A
Authority: JP
Inventors: 規雄野村
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1992-02-12
Filing date: 1992-02-12
Publication date: 2003-02-10
Anticipated expiration: 2018-02-10
Also published as: JPH05224686A

Abstract

PURPOSE: To improve the accuracy of speech/silence determination even when low-accuracy determination rules are used. CONSTITUTION: First and second multi-valued logic determination units 2 and 3 evaluate feature parameters extracted from a frame-divided speech signal and output values that reflect the likelihood of speech. An inference unit 4 infers whether the speech signal is speech or silence from the outputs of the first and second multi-valued logic determination units 2 and 3 and of a determination result feedback unit 7. A variable hangover generation unit 5 varies the hangover time according to the result from the inference unit 4, and a binary determination unit 6 makes the final speech/silence decision.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech/silence determination method, and a device therefor, for use in digital voice communication and the like.

[0002]

[Prior Art] In recent years, mobile communication such as car phones and mobile phones has required a method of suspending transmission during silent intervals of speech in order to reduce power consumption, and the development of a highly accurate speech/silence determination method is desired.

[0003] A conventional speech/silence determination method is described below. FIG. 9 shows a conventional speech/silence determination device. In FIG. 9, 9 is a parameter extraction unit, 10 and 11 are first and second binary logic determination units, 12 is a third binary logic determination unit, 13 is a hangover generation unit, and 14 and 15 are one-frame delay units.

[0004] With this configuration, the parameter extraction unit 9 first extracts, from input speech divided into frames, several feature parameters useful for speech/silence determination, such as power and the zero-crossing count. Next, the first and second binary logic determination units 10 and 11 each make a binary speech/silence determination with a threshold according to their own determination rule. For example, the first binary logic determination unit 10 makes a binary determination based on the magnitude of the power, and the second binary logic determination unit 11 makes a binary determination based on the zero-crossing count. The third binary logic determination unit 12 determines speech or silence by a binary logic operation on the results of the first and second binary logic determination units 10 and 11 and the previous-frame determination results output from the one-frame delay units 14 and 15. The hangover generation unit 13 switches the final determination from speech to silence when the third binary logic determination unit 12 has determined silence for several consecutive frames. The determination feedback through the one-frame delay units 14 and 15 is used as needed.
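The conventional flow of FIG. 9 can be sketched roughly as follows. This is only an illustration: the threshold values, the hangover length, and the particular binary logic operation in unit 12 are assumptions, since the text above does not specify them.

```python
def conventional_vad(powers, zero_crossings,
                     power_thresh=0.01, zc_thresh=30, hangover_frames=3):
    """Binary-logic speech/silence decisions with a fixed hangover.

    All numeric defaults are hypothetical. The combination rule used for
    unit 12 (d1 OR (d2 AND previous)) is an assumed example of a binary
    logic operation, not one taken from the source.
    """
    decisions = []
    prev = False        # one-frame delayed output of unit 12
    silent_run = 0      # consecutive silence determinations
    final = False       # final determination after hangover
    for p, z in zip(powers, zero_crossings):
        d1 = p > power_thresh          # unit 10: power threshold
        d2 = z < zc_thresh             # unit 11: zero-crossing threshold
        d3 = d1 or (d2 and prev)       # unit 12: binary logic operation
        prev = d3
        if d3:
            silent_run = 0
            final = True
        else:
            silent_run += 1
            if silent_run >= hangover_frames:
                final = False          # unit 13: fixed hangover expired
        decisions.append(final)
    return decisions
```

With one loud frame followed by quiet frames, the final determination stays at speech until the fixed number of silent frames has elapsed, regardless of how certain each silence determination was.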

[0005]

[Problems to Be Solved by the Invention] In the conventional method described above, however, neither a highly accurate parameter extraction method nor a highly accurate determination rule exists. Determination by binary logic with a sharp decision threshold therefore tends to produce errors in the individual rule determinations, which in turn cause errors in the final speech/silence determination.

[0006] The present invention solves this conventional problem. Its object is to provide an excellent speech/silence determination method that yields a more reliable final determination even when it uses feature parameters and determination rules of limited accuracy, such as speech power and zero-crossing count.

[0007]

[Means for Solving the Problems] To achieve this object, the present invention uses multi-valued logic with continuous values in the range 0 to 1 in the intermediate determination steps. For example, 0 means "silence", 0.5 means "undecidable", and 1 means "speech". Inference is performed using the maximum and minimum of these values, and a binary speech/silence determination is made only at the final stage.

[0008] Each determination is provided with a data table recording its input/output relationship, so that a multi-valued logic output can be obtained with little processing even when the determination characteristic is nonlinear.

[0009] In addition, a long hangover time is generated for an uncertain silence determination, which reduces the error of determining speech as silence.

[0010]

[Operation] With the above configuration, the present invention does not apply a sharp threshold in each determination rule. Each rule instead outputs a value according to the certainty of its determination, and the final determination is processed according to that certainty. In other words, a more reliable result is obtained by applying several rules and using the most certain determination output among them.

[0011] Because a data table recording the input/output relationship is provided, processing requires little computation even when the relationship between a parameter and the determination output is nonlinear.

[0012] For a silence determination of low certainty, a longer hangover time is generated to delay the transition of the final determination from speech to silence, which reduces the rate at which speech is misjudged as silence.

[0013]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings.

[0014] FIG. 1 shows a configuration that implements the speech/silence determination method of the present invention. In FIG. 1, 1 is a parameter extraction unit that extracts one or more feature parameters effective for speech/silence determination from speech data divided into frames. 2 and 3 are first and second multi-valued logic determination units, each of which makes a speech/silence determination according to its own rule and outputs the result as a multi-valued logic value, a continuous value in the range 0 to 1. 4 is a multi-valued logic inference unit that infers a more reliable result from the plural determination results. 5 is a variable hangover generation unit that generates a hangover time that varies with the certainty of the determination result. 6 is a binary determination unit that makes the final binary speech/silence determination. 7 is a determination result feedback unit that delays the determination result by one frame and feeds it back to the inference unit 4.

[0015] Speech/silence determination with this configuration is described below. The following expressions are used in the steps of this embodiment.

[0016]

If power is large then speech ...... rule (1)

If power is small then silence ...... rule (2)

If (zero-crossing count is small and previous frame is speech) then speech ...... rule (3)

First, the parameter extraction unit 1 obtains the power $P_j$ and the zero-crossing count $Z_j$ as feature parameters from the speech of the $j$-th frame of frame length $n$ ($X_j(i)$; $0 \le i \le n-1$). Assume that this yields the $P_j$ and $Z_j$ shown in FIG. 2.
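As an illustration of the parameter extraction step, the two features could be computed per frame as in the sketch below. The text does not give the exact definitions, so taking power as the mean squared sample value and counting a zero crossing at each sign change between adjacent samples are assumptions, as is the function name `frame_features`.

```python
def frame_features(frame):
    """Return (power, zero_crossings) for one frame of samples.

    Assumed definitions: power is the mean of the squared samples, and a
    zero crossing is counted whenever adjacent samples differ in sign
    (a sample equal to 0 is treated as non-negative).
    """
    n = len(frame)
    power = sum(x * x for x in frame) / n
    zero_crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return power, zero_crossings
```

For a frame that alternates in sign on every sample, the zero-crossing count is one less than the frame length, which is the kind of value a noise-like frame tends to produce.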

[0017] FIG. 3(a) shows the internal configuration of the first multi-valued logic determination unit 2. 2a is a data table that defines the relationship between the input power $P_j$ and the output speech/silence determination value $d_{1j}$. 2b is a data table reading unit that reads the speech/silence determination value $d_{1j}$ corresponding to the input power $P_j$ from the data table. FIG. 3(b) illustrates the relationship between the power $P_j$ and the determination value $d_{1j}$ in the data table 2a. In the first multi-valued logic determination unit 2, the determination by rules (1) and (2) is carried out by the data table reading unit 2b, which reads the data table 2a according to the input power $P_j$, obtains the determination value $d_{1j}$ shown in FIG. 2, and outputs it to the inference unit 4.

[0018] FIG. 4(a) shows the internal configuration of the second multi-valued logic determination unit 3. 3a is a data table that defines the relationship between the input zero-crossing count $Z_j$ and the output speech/silence determination value $d_{20j}$. 3b is a data table reading unit that reads the determination value $d_{20j}$ corresponding to the input zero-crossing count $Z_j$ from the data table 3a. FIG. 4(b) illustrates the relationship between the zero-crossing count $Z_j$ and the determination value $d_{20j}$ in the data table 3a. In the second multi-valued logic determination unit 3, the "zero-crossing count is small" condition of rule (3) is evaluated by the data table reading unit 3b, which reads the data table 3a according to the input zero-crossing count $Z_j$, obtains the determination value $d_{20j}$ shown in FIG. 2, and outputs it to the inference unit 4.
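A data table of this kind can be sketched as a piecewise-constant lookup. The table contents below are entirely hypothetical, since the actual curves appear only in FIG. 3(b) and FIG. 4(b); the helper name `make_table` is likewise an assumption.

```python
from bisect import bisect_right

def make_table(breakpoints, values):
    """Build a lookup: values[i] applies when breakpoints[i-1] <= x < breakpoints[i]."""
    if len(values) != len(breakpoints) + 1:
        raise ValueError("need exactly one more value than breakpoints")

    def lookup(x):
        # bisect_right returns how many breakpoints are <= x,
        # which is the index of the table cell containing x.
        return values[bisect_right(breakpoints, x)]

    return lookup

# Hypothetical contents: larger power maps closer to 1 ("speech"),
# larger zero-crossing count maps further from 1.
power_table = make_table([0.001, 0.01, 0.1], [0.0, 0.3, 0.7, 1.0])
zero_cross_table = make_table([20, 40, 60], [1.0, 0.7, 0.4, 0.2])
```

Because the output is read rather than computed, a nonlinear input/output characteristic costs no more than a linear one; only the stored values change.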

[0019] FIG. 5(a) shows the internal configuration of the determination result feedback unit 7. 7a is a data table that defines the relationship between $d_{j-1}$, the output of the inference unit 4 for the previous frame obtained from the output $d_j$ of the inference unit 4 through the one-frame delay unit 7c, and the output speech/silence determination value $d_{21j}$. 7b is a data table reading unit that reads the determination value $d_{21j}$ corresponding to the output $d_{j-1}$ of the one-frame delay unit 7c from the data table 7a. In the determination result feedback unit 7, the "previous frame is speech" condition of rule (3) is evaluated by the data table reading unit 7b, which reads the data table 7a according to the previous-frame output $d_{j-1}$ of the inference unit 4, obtains the determination value $d_{21j}$ shown in FIG. 2, and outputs it to the inference unit 4.

[0020] FIG. 6 shows the internal configuration of the inference unit 4. In FIG. 6, 4a is a pre-calculation unit that evaluates equation (4) below from the output $d_{20j}$ of the second multi-valued logic determination unit 3 and the output $d_{21j}$ of the determination result feedback unit 7. 4b is a maximum value detection unit that outputs the maximum of the output $d_{1j}$ of the first multi-valued logic determination unit 2, the output $d_{2j}$ of the pre-calculation unit 4a, and 0.5. 4c is a minimum value detection unit that outputs the minimum of $d_{1j}$, $d_{2j}$, and 0.5.

[0021] In the inference unit 4 configured as above, the first step executes the "and" of rule (3) by equation (4) below, which yields the speech/silence determination value $d_{2j}$ shown in FIG. 2 as the result of rule (3).

[0022]

$$d_{2j} = d_{21j} \times (d_{20j} - 0.5) + 0.5 \qquad \text{...... (4)}$$

In the next step of the inference unit 4, $d_{1j}$, $d_{2j}$, and 0.5 are all supplied to both the maximum value detection unit 4b and the minimum value detection unit 4c, and the two output values are added. Subtracting 0.5 from this sum gives the speech/silence determination value $d_j$. Here the maximum value detection unit 4b and the minimum value detection unit 4c serve to find the most certain speech determination value and the most certain silence determination value, respectively.
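The two inference steps described above can be written compactly as in the following sketch. The function name `infer` and its argument names are assumptions chosen to mirror the symbols $d_{1j}$, $d_{20j}$, and $d_{21j}$; the arithmetic follows equation (4) and the max/min step as stated in the text.

```python
def infer(d1, d20, d21):
    """Combine multi-valued determinations into one value d in [0, 1].

    d1  : output of the power rules (1), (2)
    d20 : "zero-crossing count is small" value of rule (3)
    d21 : "previous frame is speech" value of rule (3)
    """
    # Equation (4): the "and" of rule (3). When d21 is 0 the result is
    # pulled to 0.5 ("undecidable"); when d21 is 1 it equals d20.
    d2 = d21 * (d20 - 0.5) + 0.5
    candidates = (d1, d2, 0.5)
    # The maximum is the strongest speech evidence, the minimum the
    # strongest silence evidence; 0.5 is the neutral reference.
    return max(candidates) + min(candidates) - 0.5
```

When one rule says speech with the same strength as another says silence, the two cancel and the result is 0.5; when all evidence lies on one side of 0.5, the result is simply the most extreme value on that side.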

[0023] FIG. 7 shows the internal configuration of the variable hangover generation unit 5. In FIG. 7, 7a is a calculation unit that generates the output $s$ by equations (5) and (6) below, using the output $d_j = x$ of the inference unit 4 and $s'$, the value of the output $s$ of the variable hangover generation unit 5 in the previous frame. 7b is a one-frame delay unit that supplies the previous-frame output of the calculation unit 7a (= the output $s$ of the variable hangover generation unit 5) to the calculation unit 7a. In the processing of equations (5) and (6), for example, with time constants $A_m = 0.1$ and $A_p = 0.9$, the output $s = f(x, s')$ of the variable hangover generation unit 5 is as shown in FIG. 8(a). The values in FIG. 8(a) indicate the output $s$.

[0024]

$$s = f(x, s') = s' + A_m \times (1 - x) \times (x - s') \quad (\text{where } x \le s') \qquad \text{...... (5)}$$

$$s = f(x, s') = s' + A_p \times x \times (x - s') \quad (\text{where } x > s') \qquad \text{...... (6)}$$

According to equations (5) and (6), even after a run of silent frames, the output readily moves to speech when a speech-like frame appears, whereas it moves from speech to silence only slowly. This is because, in actual voice communication, judging speech between words or at word endings as silence and cutting the speech off has a greater impact than judging silence as speech.
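A minimal sketch of the update in equations (5) and (6) follows, using the example time constants $A_m = 0.1$ and $A_p = 0.9$ from the text as defaults. The function and parameter names are assumptions.

```python
def hangover_update(x, s_prev, a_m=0.1, a_p=0.9):
    """One frame of the variable hangover generation unit.

    x      : current output d_j of the inference unit
    s_prev : previous-frame output s' of this unit
    """
    if x <= s_prev:
        # Equation (5): slow decay toward x. The step shrinks further
        # as x approaches 1 because of the (1 - x) factor.
        return s_prev + a_m * (1.0 - x) * (x - s_prev)
    # Equation (6): fast rise toward x, scaled by x itself.
    return s_prev + a_p * x * (x - s_prev)
```

Starting from $s' = 0$, a single frame with $x = 1$ lifts the output to 0.9, while starting from $s' = 1$, a single frame with $x = 0$ lowers it only to 0.9. This asymmetry is what makes the transition to speech quick and the transition to silence slow.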

[0025] When $x$ is given a constant value representing a silence determination ($x < 0.5$) and $s'$ is given an initial value representing a speech determination ($s' > 0.5$), the number of iterations until the output $s$ decreases to 0.55, which is close to the speech/silence boundary value, is as shown in FIG. 8(b); the hangover time thus varies. The numbers in FIG. 8(b) indicate for how many frames of continued silence the unit keeps outputting the information that speech is present ($s \ge 0.5$).

[0026] The variable hangover generation unit 5 therefore varies the reference count, that is, the number of consecutive frames determined as silence in the output of the inference unit 4 ($x < 0.5$) that must follow a frame determined as speech before the final result becomes silence. As FIG. 8(b) shows, when the previous-frame output value $s'$ is close to 1.00 (speech is almost certain), more frames are needed to decide silence than when $s'$ is close to 0.5. Consequently, when only a few silence-like frames occur after a run of speech frames, for example because of noise, the risk of judging what is actually speech as silence is reduced. Note that when $x$ is 0.5, $s$ approaches the speech/silence boundary value 0.5 without limit but never reaches it.
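The dependence of the hangover length on $x$ and on the initial $s'$ can be explored numerically with a sketch like the one below. It iterates equation (5) with $A_m = 0.1$ until the output falls to the 0.55 level mentioned in the text. The function name, the `max_frames` safety cap, and the sample inputs are assumptions, and the counts it produces are not claimed to reproduce the values in FIG. 8(b).

```python
def frames_until(x, s0, target=0.55, a_m=0.1, max_frames=10_000):
    """Count frames of constant input x until s drops to target or below.

    Assumes x < target < s0, so only equation (5) (the x <= s' branch)
    is ever exercised. max_frames is a safety cap for inputs that never
    reach the target.
    """
    s, n = s0, 0
    while s > target and n < max_frames:
        s = s + a_m * (1.0 - x) * (x - s)  # equation (5)
        n += 1
    return n

certain_silence = frames_until(0.0, 1.0)    # x = 0: clear silence
doubtful_silence = frames_until(0.4, 1.0)   # x = 0.4: doubtful silence
weak_speech_start = frames_until(0.0, 0.6)  # s' only just above 0.5
```

Under these assumptions, a doubtful silence input takes more frames to pull the output down than a certain one, and an output that starts near 1.00 takes more frames than one that starts near 0.5, in line with the behavior described above.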

[0027] Next, the binary determination unit 6 makes the final speech/silence determination on the output $s$ of the variable hangover generation unit 5 as follows, using 0.5 as the threshold.

[0028]

$s \ge 0.5$: speech

$s < 0.5$: silence

By determining speech or silence for each frame in this way, even when feature parameters of limited accuracy such as power and zero-crossing count are used, the first and second multi-valued logic determination units only output values according to the certainty of speech or silence, and the inference unit performs inference that takes these results and the previous-frame result into account. A more reliable final determination can thus be made.
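Putting the blocks of FIG. 1 together, the per-frame flow might look like the sketch below. It is a minimal illustration under stated assumptions: all three table contents are hypothetical, power is taken as the mean squared sample value, and the initial states of $d_{j-1}$ and $s'$ are assumed to be 0 (silence). Only equations (4) to (6), the max/min step, and the 0.5 threshold come from the text.

```python
from bisect import bisect_right

def lookup(breakpoints, values, x):
    """Piecewise-constant data table read."""
    return values[bisect_right(breakpoints, x)]

# Hypothetical data tables (2a, 3a, 7a).
POWER_TABLE = ([0.001, 0.01, 0.1], [0.0, 0.3, 0.7, 1.0])
ZC_TABLE = ([20, 40, 60], [1.0, 0.7, 0.4, 0.2])
FEEDBACK_TABLE = ([0.5], [0.0, 1.0])

def detect(frames, a_m=0.1, a_p=0.9):
    """Return one boolean per frame: True for speech, False for silence."""
    d_prev, s = 0.0, 0.0   # assumed initial state: silence
    out = []
    for frame in frames:
        # Unit 1: parameter extraction.
        power = sum(v * v for v in frame) / len(frame)
        zc = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        # Units 2, 3, 7: multi-valued determinations via table reads.
        d1 = lookup(*POWER_TABLE, power)
        d20 = lookup(*ZC_TABLE, zc)
        d21 = lookup(*FEEDBACK_TABLE, d_prev)
        # Unit 4: equation (4), then max + min - 0.5.
        d2 = d21 * (d20 - 0.5) + 0.5
        cand = (d1, d2, 0.5)
        d = max(cand) + min(cand) - 0.5
        # Unit 5: equations (5) and (6).
        if d <= s:
            s = s + a_m * (1.0 - d) * (d - s)
        else:
            s = s + a_p * d * (d - s)
        d_prev = d
        # Unit 6: binary determination with threshold 0.5.
        out.append(s >= 0.5)
    return out

loud = [0.5, -0.5] * 80        # high power
noise = [0.001, -0.001] * 80   # very low power, many zero crossings
result = detect([loud] + [noise] * 6)
```

In this example the single loud frame drives the output to speech immediately, and the following low-level noise frames are still reported as speech for several frames before the output falls below 0.5.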

[0029] In this embodiment, the power and zero-crossing count of each frame of the divided speech are used as parameters. Other parameters may also be used, such as the ratio of the power of the previous frame to that of the current frame, or the frame-to-frame change in the spectrum. Furthermore, multi-valued logic determination may be applied beforehand to three or more parameters.

[0030] In this embodiment, the output of the inference unit 4 for the previous frame is fed back. Alternatively, the output of the variable hangover generation unit 5 for the previous frame may be fed back as an input to the inference unit 4.

[0031]

[Effects of the Invention] As is clear from the above embodiment, the present invention converts determination values based on parameters extracted from speech into multi-valued logic with continuous values in the range 0 to 1, outputs a value that reflects the certainty of each determination, and performs inference based on the maximum and minimum of the plural determination results. As a result, a highly accurate final determination can be made even with low-accuracy determination rules.

[0032] In the determination from each parameter, a data table is provided and read to make the determination, so that the determination result can be converted to multi-valued logic with simple processing even when the parameter and the determination output have a nonlinear relationship.

[0033] Furthermore, because the hangover time can be lengthened for a doubtful silence determination, the rate at which pauses between words and word endings are determined as silence can be reduced.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of a speech/silence determination device according to an embodiment of the present invention.

FIG. 2 is a diagram showing the output of each block in this embodiment.

FIG. 3(a) is a block diagram showing the internal configuration of the first multi-valued logic determination unit in this embodiment, and FIG. 3(b) is a diagram showing the contents of the data table of the first multi-valued logic determination unit.

FIG. 4(a) is a block diagram showing the internal configuration of the second multi-valued logic determination unit in this embodiment, and FIG. 4(b) is a diagram showing the contents of the data table of the second multi-valued logic determination unit.

FIG. 5(a) is a block diagram showing the internal configuration of the determination result feedback unit in this embodiment, and FIG. 5(b) is a diagram showing the contents of the data table of the determination result feedback unit.

FIG. 6 is a block diagram showing the internal configuration of the inference unit in this embodiment.

FIG. 7 is a block diagram showing the internal configuration of the variable hangover generation unit in this embodiment.

FIG. 8(a) is a diagram showing the input/output relationship of the variable hangover generation unit in this embodiment, and FIG. 8(b) is a diagram showing the relationship between the input value and the hangover in the variable hangover generation unit.

FIG. 9 is a block diagram showing a conventional speech/silence determination device.

[Explanation of symbols]

1 Parameter extraction unit
2 First multi-valued logic determination unit
3 Second multi-valued logic determination unit
4 Inference unit
5 Variable hangover generation unit
6 Binary determination unit
7 Determination result feedback unit

(56) References: JP H02-203397 A; JP H02-204799 A; JP H03-236100 A; JP H04-42299 A; JP S60-209799 A; JP S63-13200 B2

Claims

(57) [Claims]

1. A speech/silence determination method comprising: determining speech or silence for each of a plurality of parameters extracted from input speech, using multi-valued logic with continuous values in the range of 0 to 1; and inferring speech or silence by multi-valued logic based on the maximum value and the minimum value of the plurality of determination results.

2. The speech/silence determination method according to claim 1, wherein speech or silence is inferred by multi-valued logic based on the maximum value and the minimum value among the multi-valued logic determination results for the plurality of parameters and the inference result of the previous frame.

3. The speech/silence determination method according to claim 1, wherein the multi-valued logic determination is performed using a data table that defines a determination value for a parameter extracted from speech.

4. The speech/silence determination method according to claim 1, wherein the hangover time is varied according to the inference result of the multi-valued logic.

5. A speech/silence determination device comprising: a parameter extraction unit that extracts a plurality of parameters from speech data divided into frames; a plurality of multi-valued logic determination units that each determine speech or silence, by multi-valued logic with continuous values in the range of 0 to 1, based on a respective one of the extracted parameters; an inference unit that infers speech or silence by multi-valued logic based on the outputs of the plurality of multi-valued logic determination units; a variable hangover generation unit that varies the hangover according to the output of the inference unit; and a binary determination unit that performs a binary determination on the output of the variable hangover generation unit, wherein the inference unit outputs a value obtained by adding the maximum value and the minimum value of the plurality of multi-valued logic determination values.

6. The speech/silence determination device according to claim 5, further comprising a determination result feedback unit that inputs the output of the inference unit for the previous frame to the inference unit, wherein inference is performed from the maximum value and the minimum value among the outputs of the plurality of parameter extraction units and the output of the determination result feedback unit.

7. The speech/silence determination device according to claim 5, wherein each multi-valued logic determination unit includes a data table that defines output values for the parameter.