JPH11133997A

JPH11133997A - Equipment for determining presence or absence of sound

Info

Publication number: JPH11133997A
Application number: JP9301489A
Authority: JP
Inventors: Koji Yoshida; 幸司吉田
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1997-11-04
Filing date: 1997-11-04
Publication date: 1999-05-21

Abstract

PROBLEM TO BE SOLVED: To provide a high-accuracy sound/silence determination device that can correctly judge silent sections as silent even under conditions such as a high noise level or a low S/N ratio (speech signal to noise power ratio). SOLUTION: A power calculator 301 calculates the power of each fixed section (frame) of an input speech signal. A silence power estimator 302 estimates the silence power. Individual multi-valued logic determiners 304-307 each use one of the resulting parameters to judge the likelihood of sound or silence individually. An overall determiner 308 makes a multi-valued determination of the degree of sound from the individual determination results. After hangover processing, the sound determination result is binarized into sound or silence and output as the final determination result.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a sound/silence determination device used to determine whether input speech is sound or silence at the stage preceding the encoder in a speech coding apparatus, which is essential to digital mobile communication terminals such as digital portable telephones.

[0002]

2. Description of the Related Art

A sound/silence determination device judges whether an input speech signal is sound (speech present) or silence (speech absent). In voice communication, transmitting the signal only in sections judged to be sound improves transmission efficiency, and not transmitting in silent sections reduces power consumption during transmission. A sound/silence determination device (for example, in a mobile phone or portable radio) that makes this judgment with high accuracy is therefore desired.

[0003] One conventional sound/silence determination device is the technique described in "A Study of a Sound/Silence Determination Method," B-373, 1992 IEICE Spring Conference. This conventional device is described below with reference to FIG. 5, a block diagram of its configuration. Reference numeral 501 denotes a power calculator that calculates the speech signal power of the input speech signal for each fixed section; 502, a silence power estimator that estimates the power of silent sections from the speech signal power obtained by the power calculator 501 and the values of past sound/silence determination results; 503 to 505, a plurality of individual multi-valued logic determiners that each make an individual determination according to the likelihood of sound or silence in the input speech signal; 506, an overall determiner that determines sound or silence by multi-valued logic from the results obtained by those determiners; 507, a hangover processor that delays the sound-to-silence transition in the result obtained by the overall determiner 506; and 508, a binarizer that converts the continuous value representing the degree of sound, obtained through the hangover processor 507, into either sound or silence. The operation of the conventional device configured in this way is described next.

[0004] In FIG. 5, the power calculator 501 calculates the power of each fixed section (hereinafter, "frame") of the input speech signal. The silence power estimator 502 updates the estimated silence power using the input speech signal power obtained by the power calculator 501 and past sound/silence determination results. The update moves the estimate closer to the current input signal power the more likely the target frame is to be silent. Using the input speech power from the power calculator 501 and the estimated silence power from the silence power estimator 502, the individual multi-valued logic determiners (the absolute determiner 503, the change determiner 504, and the relative determiner 505) each judge the likelihood of sound or silence. The absolute determiner 503 judges the degree of sound from the absolute value of the input signal power; the change determiner 504, from the change in input signal power (for example, the frame-to-frame difference or ratio); and the relative determiner 505, from the ratio or difference of the input signal power to the estimated silence power. The overall determiner 506 then makes a multi-valued determination of the degree of sound (the degree to which speech is present) from these individual results. The hangover processor 507 applies control such that the higher the degree of sound reported by the overall determiner 506, the longer the sound-to-silence decision is delayed. Finally, the post-hangover sound determination result is binarized into sound or silence and output as the final determination result.
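The silence power update and the hangover stage of the conventional device can be sketched as follows. This is a minimal illustration, not the method of the cited paper: the exponential-smoothing update, the names `update_silence_power` and `binarize_with_hangover`, the parameters `max_rate`, `threshold`, and `max_hang`, and the merging of hangover and binarization into one function are all assumptions made for the example.

```python
def update_silence_power(est, frame_power, silence_likelihood, max_rate=0.5):
    """Move the estimate toward frame_power; a higher silence likelihood gives a bigger step."""
    # Clamp the likelihood to [0, 1] so the step never overshoots (assumed scale)
    rate = max_rate * max(0.0, min(1.0, silence_likelihood))
    return est + rate * (frame_power - est)


def binarize_with_hangover(sound_degrees, threshold=0.5, max_hang=4):
    """Binarize sound degrees in [0, 1]; a higher degree earns a longer hangover.

    Assumes threshold < 1.0.
    """
    decisions = []
    hang = 0  # remaining frames that are still reported as sound
    for degree in sound_degrees:
        if degree >= threshold:
            # Hangover length grows with the margin above the threshold
            earned = round(max_hang * (degree - threshold) / (1.0 - threshold))
            hang = max(hang, earned)
            decisions.append(True)
        elif hang > 0:
            hang -= 1
            decisions.append(True)
        else:
            decisions.append(False)
    return decisions
```

With these assumed defaults, a frame with the maximum degree of sound holds the decision at sound for four further frames, while a frame only slightly above the threshold holds it for one.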

[0005] The conventional sound/silence determination device described above bases its determination solely on the power of the input speech signal. Because it needs none of the other parameters used in speech coding, it can make its determination independently of the speech coding, and the computation needed to obtain its parameters is very small. Although it uses only the input speech signal power, it makes good determinations under a variety of ambient noise conditions.

[0006]

PROBLEMS TO BE SOLVED BY THE INVENTION

However, because the conventional sound/silence determination device bases its determination on input signal power, it reaches its limits in high-level noise environments such as the interior of an automobile. When the ambient noise level is very high, even a silent section containing no speech may be judged as sound because of the high noise level, which can reduce transmission efficiency.

[0007] The present invention solves the above conventional problem. Its object is to provide a high-accuracy sound/silence determination device that can more correctly judge silent sections as silent even under conditions such as a high noise level or a low S/N ratio (the level ratio of the speech signal to the noise).

[0008]

SUMMARY OF THE INVENTION

To solve the above problem, the present invention comprises a power calculator that calculates the speech signal power of each frame of an input speech signal, a pitch parameter calculator that calculates a pitch parameter representing the degree of pitch periodicity of the input speech signal, and a determiner that determines from the speech signal power and the pitch parameter whether the input speech signal is sound (speech present) or silence (speech absent). When the input speech signal changes from sound to silence, the frame is judged to be silence if the value indicated by the pitch parameter is lower than a predetermined threshold.
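The decision rule can be sketched as a two-state loop in which the pitch parameter is consulted only while the previous frame was judged to be sound. This is a minimal sketch under stated assumptions: the function name `classify_frames`, the use of an absolute power threshold `abs_thr` inside sound sections, and the use of a frame-to-frame power rise `rise_thr` for onset detection are illustrative choices, not details given in the patent text.

```python
def classify_frames(powers, pitch_params, abs_thr, rise_thr, pitch_thr=0.4):
    """Return one decision per frame: True for sound, False for silence."""
    decisions = []
    prev_sound = False                          # assume the stream starts in silence
    prev_power = powers[0] if powers else 0.0
    for power, pitch in zip(powers, pitch_params):
        if prev_sound:
            # Inside a sound section: low power OR weak pitch periodicity ends it
            is_sound = power >= abs_thr and pitch >= pitch_thr
        else:
            # Inside a silent section: only a power-based parameter is used
            is_sound = (power - prev_power) >= rise_thr
        decisions.append(is_sound)
        prev_sound = is_sound
        prev_power = power
    return decisions
```

For example, with `powers = [10, 30, 30, 12, 11]`, `pitch_params = [0.1, 0.8, 0.7, 0.1, 0.1]`, `abs_thr=5`, and `rise_thr=10`, the last two frames stay above the absolute power threshold because of noise, yet they are judged to be silence because the pitch parameter falls below 0.4.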

[0009] The present invention also applies to a speech coding apparatus comprising a speech coder that performs speech coding based on the sound/silence determination result for each frame and a pitch parameter calculator that calculates a pitch parameter representing the degree of pitch periodicity of the input speech signal. It comprises a power calculator that calculates the speech signal power of each frame of the input speech signal, a delay unit that delays the pitch parameter obtained by the pitch parameter calculator in the speech coding apparatus by a fixed number of frames, and a determiner that determines sound or silence from the speech signal power and the pitch parameter delayed by the fixed number of frames. In the determiner, when the input speech signal changes from sound to silence, the frame is judged to be silence if the value indicated by the pitch parameter is lower than a predetermined threshold.

[0010] The present invention further comprises a power calculator that calculates the speech signal power of each frame of the input speech signal; a silence power estimator that estimates the power of silent sections from the speech signal power obtained by the power calculator and past sound/silence determination results; a delay unit that outputs the pitch parameter obtained by the downstream speech coding apparatus, delayed by one frame; a plurality of individual multi-valued logic determiners that each use some of the speech signal power, the estimated silence power, and the one-frame-delayed pitch parameter to make an individual determination according to the likelihood of sound or silence; and an overall determiner that determines sound or silence by multi-valued logic from the results obtained by those determiners. In a frame judged to be sound, the individual multi-valued logic determiner that uses the one-frame-delayed pitch parameter assigns a greater likelihood of sound the higher the degree of pitch periodicity indicated by the pitch parameter, and a greater likelihood of silence the lower it is. In a silent frame, it outputs a value indicating that sound or silence cannot be determined, or a value that approaches that value asymptotically.
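One way to picture the pitch-based individual determiner is as a function that returns a signed likelihood. The sketch below is hypothetical: the output scale (+1 for certain sound, -1 for certain silence, 0 for "cannot be determined"), the piecewise-linear mapping, and the names `pitch_determiner`, `in_sound_state`, and `decay` are assumptions introduced for illustration, and the pitch parameter is assumed to lie in the range 0.0 to 1.0.

```python
def pitch_determiner(pitch, in_sound_state, prev_output, thr=0.4, decay=0.5):
    """Return a likelihood in [-1, 1]: positive means sound, negative means silence.

    0.0 stands for "sound or silence cannot be determined" (assumed encoding).
    """
    pitch = max(0.0, min(1.0, pitch))
    if in_sound_state:
        if pitch >= thr:
            # Stronger periodicity gives a larger likelihood of sound
            return (pitch - thr) / (1.0 - thr)
        # Weaker periodicity gives a larger likelihood of silence
        return (pitch - thr) / thr
    # In a silent frame the output decays toward the neutral value 0.0
    return prev_output * decay
```

Calling the function repeatedly in the silent state halves the previous output each frame, which is one simple way to approach the neutral value asymptotically.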

[0011] As a result, silent sections can be judged as silent more correctly even under conditions such as a high noise level or a low S/N ratio.

[0012]

DESCRIPTION OF THE PREFERRED EMBODIMENTS

The first invention, recited in claims 1 and 2, comprises a power calculator that calculates the speech signal power of each frame of an input speech signal, a pitch parameter calculator that calculates a pitch parameter representing the degree of pitch periodicity of the input speech signal, and a determiner that determines sound (speech present) or silence (speech absent) from the speech signal power and the pitch parameter. In the determiner, when a frame changes from sound to silence, the frame is judged to be silence if the degree of pitch periodicity indicated by the pitch parameter is lower than a predetermined threshold. This has the effect that silent sections can be correctly judged as silent even under conditions such as a high noise level or a low S/N ratio. Claim 9 realizes the first invention as a method.

[0013] The second invention, recited in claims 3 and 4, applies to a speech coding apparatus comprising a speech coder that performs speech coding based on the sound/silence determination result for each frame and a pitch parameter calculator that calculates a pitch parameter representing the degree of pitch periodicity of the input speech signal. It comprises a power calculator that calculates the speech signal power of each frame of the input speech signal, a delay unit that delays the pitch parameter obtained by the pitch parameter calculator by one frame, and a determiner that determines sound or silence from the speech signal power and the one-frame-delayed pitch parameter output by the delay unit. In the determiner, when the input speech signal changes from sound to silence, the frame is judged to be silence if the degree of pitch periodicity indicated by the pitch parameter is lower than a predetermined threshold. This has the effect that silent sections can be correctly judged as silent even under conditions such as a high noise level or a low S/N ratio. In addition, because the pitch parameter of the preceding frame calculated in the speech coder is used, a high-accuracy determination can be made without the delay that waiting for the speech coding of the target frame would cause, and without any increase in the computation needed to calculate the pitch parameter.

[0014] The third invention, recited in claims 5 and 6, applies to a speech coding apparatus comprising a speech coder that performs speech coding based on the sound/silence determination result for each frame and a pitch parameter calculator with which the speech coder calculates a pitch parameter representing the degree of pitch periodicity of the input speech signal. It comprises a power calculator that calculates the speech signal power of each frame of the input speech signal; a silence power estimator that estimates the power of silent sections from the speech signal power obtained by the power calculator and past sound/silence determination results; a delay unit that delays the pitch parameter obtained by the downstream speech coding apparatus by one frame and outputs it; a plurality of individual multi-valued logic determiners that each use some of the speech signal power, the estimated silence power, and the pitch parameter to make an individual determination according to the likelihood of sound or silence; and an overall determiner that determines sound or silence by multi-valued logic from the results obtained by those determiners. In a frame judged to be sound, the individual multi-valued logic determiner that uses the pitch parameter assigns a greater likelihood of sound the higher the degree of pitch periodicity indicated by that frame's pitch parameter, and a greater likelihood of silence the lower it is. In a silent frame, it outputs a value indicating that sound or silence cannot be determined, or a value that approaches that value asymptotically. This has the effect that silent frames can be correctly judged as silent even under conditions such as a high noise level or a low S/N ratio. In addition, because the pitch parameter of the preceding frame is used, a high-accuracy determination can be made without the delay that waiting for the speech coding of the target frame would cause, and without any increase in the computation needed to calculate the pitch parameter.

[0015] The fourth invention, recited in claim 7, adds to the configuration of the third invention a delay unit that delays by one frame the speech spectrum parameters obtained by the downstream speech coding apparatus, and a spectrum change calculator that calculates the change in the spectrum parameters from before the target frame. In a frame judged to be sound, the individual multi-valued logic determiner that uses the pitch parameter assigns a greater likelihood of sound the higher the degree of pitch periodicity indicated by that frame's pitch parameter, and a greater likelihood of silence the lower the degree of pitch periodicity and the smaller the spectrum change. This has the effect of enabling a more accurate sound/silence determination.
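The combination of pitch periodicity and spectrum change can be sketched as follows for a frame in the sound state. Everything specific here is an assumption made for illustration: the mean squared difference as the change measure, the `change_scale` constant, the signed output scale (positive for sound, negative for silence), and the function names. The text above only fixes the direction of the relationship.

```python
def spectrum_change(prev_params, cur_params):
    """Mean squared difference between two spectrum parameter vectors (assumed measure)."""
    if len(prev_params) != len(cur_params) or not cur_params:
        raise ValueError("parameter vectors must be non-empty and of equal length")
    return sum((c - p) ** 2 for p, c in zip(prev_params, cur_params)) / len(cur_params)


def pitch_spectrum_determiner(pitch, change, pitch_thr=0.4, change_scale=0.1):
    """Return a likelihood in [-1, 1] for a frame in the sound state."""
    pitch = max(0.0, min(1.0, pitch))
    if pitch >= pitch_thr:
        # Higher periodicity gives a larger likelihood of sound
        return (pitch - pitch_thr) / (1.0 - pitch_thr)
    weak_pitch = (pitch_thr - pitch) / pitch_thr          # grows as periodicity drops
    stationarity = 1.0 / (1.0 + change / change_scale)    # grows as the spectrum changes less
    # Silence is most likely when periodicity is low AND the spectrum is steady
    return -weak_pitch * stationarity
```

The design intent of the sketch is that a weakly periodic frame with a rapidly changing spectrum (for example, an unvoiced consonant) is pushed less strongly toward silence than a weakly periodic frame with a steady spectrum (for example, stationary background noise).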

[0016] Embodiments of the present invention are described below with reference to FIGS. 1 to 4.

(Embodiment 1) FIG. 1 is a block diagram of the sound/silence determination device according to the first invention. In FIG. 1, reference numeral 101 denotes a power calculator that calculates the speech signal power of each frame of the input speech signal; 102, a pitch parameter calculator that calculates a pitch parameter representing the degree of pitch periodicity of the input speech signal; and 103, a determiner that determines sound or silence from the speech signal power and the pitch parameter. The operation of the sound/silence determination device configured in this way is described with reference to FIG. 1.

[0017] In FIG. 1, the power calculator 101 calculates the speech signal power of each frame of the input speech signal, and the pitch parameter calculator 102 calculates a parameter representing the degree of pitch periodicity of the input speech signal (the pitch parameter). Examples of such a parameter include the correlation value at the delay (pitch period) that maximizes the autocorrelation of the input speech signal, or of the linear prediction residual signal obtained by linear prediction filtering of the input speech signal (the normalized maximum pitch correlation value), and the prediction gain at the delay (pitch period) that minimizes the pitch prediction error (the pitch prediction gain). Next, the determiner 103 determines sound or silence using the speech signal power and the pitch parameter obtained by the power calculator 101 and the pitch parameter calculator 102. The pitch parameter is used only when detecting a change from sound to silence within a sound section. That is, it is used to decide whether the frame following a frame judged to be sound is itself sound or silence, and it indicates silence when the degree of pitch periodicity it represents is lower than a threshold. Parameters representing the degree of pitch periodicity, such as the normalized maximum pitch correlation value and the pitch gain, indicate whether speech is voiced or unvoiced. They are better suited to detecting the transition from sound to silence, that is, to distinguishing the continuation of a speech section (high pitch periodicity) from a change to a silent section (persistently low pitch periodicity), than to detecting the onset of speech accurately, which is why the configuration of this embodiment is effective. From the speech signal power input to the determiner, parameters such as the absolute power of the target frame, the change in power from preceding frames, and the ratio to the estimated silent-section power described in Embodiment 3 are calculated and used for the determination singly or in combination. Accordingly, the sound/silence determination within a sound section uses the one or more parameters derived from the input speech signal power together with the pitch parameter, compares each with its own threshold, and judges the frame to be silence when any or all of them indicate silence. When the normalized maximum pitch correlation value is used as the pitch parameter, its threshold is set to a suitable value between 0.0 and 1.0 (for example, 0.4). The sound/silence determination within a silent section, on the other hand, uses only the one or more parameters derived from the input speech signal power, compares each with its own threshold, and judges the frame to be sound when any or all of them indicate sound.
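A normalized maximum pitch correlation value can be computed along the following lines. The patent text gives no formula, so this is a sketch using one common normalization (cross-correlation divided by the geometric mean of the two segment energies); the function name, the lag search range arguments `min_lag` and `max_lag`, and the choice to report 0.0 when no positive correlation is found are assumptions.

```python
import math


def normalized_max_pitch_correlation(frame, min_lag, max_lag):
    """Return (best_lag, correlation) for the lag with the largest normalized autocorrelation.

    Assumes 0 < min_lag <= max_lag < len(frame). The correlation is in [0.0, 1.0].
    """
    best_lag, best_corr = min_lag, 0.0
    for lag in range(min_lag, max_lag + 1):
        shifted = frame[lag:]                   # samples from `lag` onward
        reference = frame[:len(frame) - lag]    # same-length segment from the start
        energy = math.sqrt(sum(a * a for a in shifted) * sum(b * b for b in reference))
        if energy == 0.0:
            continue                            # silent segment: correlation undefined, skip
        corr = sum(a * b for a, b in zip(shifted, reference)) / energy
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag, best_corr
```

A strongly periodic frame yields a value near 1.0 at its period, while a frame with no positive correlation in the search range yields 0.0, so the result can be compared directly with a threshold such as the 0.4 mentioned above.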

[0018] As described above, this embodiment provides a pitch parameter calculator that calculates a pitch parameter representing the degree of pitch periodicity of the input speech signal, and uses that pitch parameter in addition to the input speech signal power when judging a change of the input speech signal from sound to silence. Silent sections can therefore be correctly judged as silent even under conditions such as a high noise level or a low S/N ratio.

[0019] (Embodiment 2) FIG. 2 is a block diagram of the sound/silence determination device according to the second invention. In FIG. 2, reference numeral 201 denotes the sound/silence determination device of this embodiment; 202, a speech coder that performs speech coding based on the sound/silence determination result for each frame; 203, a pitch parameter calculator provided inside the speech coder 202 that calculates a parameter related to the pitch period needed for speech coding (the pitch parameter); 204, a power calculator in the sound/silence determination device 201 that calculates the speech signal power of each frame of the input speech signal; 205, a delay unit that delays the pitch parameter obtained by the pitch parameter calculator 203 in the speech coder 202 by a fixed number of frames; 206, a determiner that determines sound or silence from the speech signal power and the pitch parameter delayed by the fixed number of frames output by the delay unit 205; and 207, a hangover processor that delays the sound-to-silence transition in the determination result obtained by the determiner 206.

[0020] The operation of the voice/silence determination device configured as described above will be explained with reference to FIG. 2. The present embodiment presupposes a speech coding apparatus comprising a speech encoder that performs speech coding based on the voice/silence determination result for each frame, and a pitch parameter calculator with which the speech encoder calculates a pitch parameter representing the degree of pitch periodicity of the input speech signal. The pitch period and the parameters related to it are effective for representing a speech signal efficiently and are essential to low-bit-rate speech coding schemes such as CELP (Code Excited Linear Prediction) coding; the parameter representing the degree of pitch periodicity (the pitch parameter) is calculated in the course of that coding. In FIG. 2, the power calculator 204 first calculates the power of the input speech signal for each frame. The delay unit 205 delays the pitch parameter obtained by the pitch parameter calculator 203 by one frame, yielding the pitch parameter of the previous frame. Examples of parameters representing the degree of pitch periodicity are the correlation value at the delay (pitch period) that maximizes the autocorrelation of the input signal, or of the linear prediction residual signal obtained by applying linear prediction filtering to the input signal (the normalized pitch maximum correlation value), and the prediction gain at the delay (pitch period) that minimizes the pitch prediction error (the pitch prediction gain). Next, the determiner 206 determines voiced or silent using the speech signal power obtained by the power calculator 204 and the previous-frame pitch parameter obtained by the delay unit 205. Here the pitch parameter is used to detect a change from voiced to silent within a voiced section. That is, it is used to determine whether the frame following a section determined to be voiced is voiced or silent, and it indicates silence when the degree of pitch periodicity it represents is lower than a threshold. When the normalized pitch maximum correlation value is used as the pitch parameter, the threshold is set to an appropriate value between 0.0 and 1.0 (for example, 0.4). Because use of the pitch parameter is restricted to determinations within a section already determined to be voiced, the parameter obtained while the speech encoder was coding the pitch-period-related parameters of the previous frame can be used. The determination therefore incurs no delay from waiting for the speech coding of the frame under determination, and no additional computation to calculate the pitch parameter. From the speech signal power input to the determiner, parameters such as the absolute power of the frame under determination, the change in power from earlier frames, and the ratio to the estimated silent-section power described in Embodiment 3 are calculated and used for the determination either singly or in combination. Accordingly, within a section regarded as voiced, the one or more parameters derived from the input signal power and the pitch parameter are each compared with their own thresholds, and the frame is determined to be silent when any one of them, or all of them, indicates silence. Within a section regarded as silent, on the other hand, only the one or more parameters derived from the input signal power are compared with their thresholds, and the frame is determined to be voiced when any one of them, or all of them, indicates voiced.
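
The determination described in this paragraph can be illustrated with a short Python sketch. It is not the patented implementation: the function names, the lag search range (20 to 80 samples), the power threshold, and the choice of the "any one parameter indicates silence" rule are assumptions made for illustration. Only the example pitch threshold of 0.4 comes from the text.

```python
import math

def frame_power(frame):
    """Mean squared amplitude of one frame (role of power calculator 204)."""
    return sum(x * x for x in frame) / len(frame)

def normalized_pitch_max_correlation(frame, min_lag=20, max_lag=80):
    """Largest normalized autocorrelation over candidate pitch lags.

    The lag range is an assumed example; a real encoder would supply
    this value as a by-product of its pitch search.
    """
    best = 0.0
    for lag in range(min_lag, min(max_lag, len(frame) - 1) + 1):
        a, b = frame[lag:], frame[:-lag]
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        if den > 0.0:
            best = max(best, num / den)
    return best

def decide(prev_voiced, power, prev_pitch_param,
           power_threshold=1e-3, pitch_threshold=0.4):
    """Return True (voiced) or False (silent) for the current frame.

    prev_pitch_param is the pitch parameter of the previous frame,
    i.e. the output of the one-frame delay unit 205.
    """
    power_says_voiced = power > power_threshold
    if prev_voiced:
        # Inside a voiced section the pitch parameter is also consulted;
        # here any single "silent" indication ends the voiced section.
        pitch_says_silent = prev_pitch_param < pitch_threshold
        return power_says_voiced and not pitch_says_silent
    # Inside a silent section only power-derived parameters are used.
    return power_says_voiced
```

With these assumed thresholds, a loud but aperiodic frame that follows a voiced section is classified as silent, which is the behaviour the paragraph aims for under high noise.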

[0021] As described above, according to this embodiment of the present invention, a pitch parameter representing the degree of pitch periodicity is used as one of the determination parameters, and it is used in addition to the input speech signal power only when detecting a change from voiced to silent within a voiced section. As a result, silent sections can be correctly determined to be silent even under conditions such as a high noise level or a low S/N ratio. At the same time, because the pitch parameter is the one calculated by the speech encoder for the previous frame, a highly accurate determination can be made without the delay of waiting for the speech coding of the frame under determination and without additional computation to calculate the pitch parameter.

[0022] (Embodiment 3) FIG. 3 is a block diagram of the voice/silence determination device according to the third invention. In FIG. 3, 301 denotes a power calculator that calculates the speech signal power of the input speech signal for each frame; 302 denotes a silence power estimator that estimates the power of silent sections using the speech signal power obtained by the power calculator 301 and past voice/silence determination results; 303 denotes a delay unit that delays the pitch parameter obtained by the downstream speech encoder by one frame and outputs it as the pitch parameter of the previous frame; and 304 to 307 denote a plurality of individual multi-valued logic determiners, each of which makes an individual determination expressing the likelihood of voiced or silent. Of these, 304 is an absolute determiner that determines the voiced degree from the absolute value of the speech signal power; 305 is a change determiner that determines the voiced degree from the amount of change in the speech signal power; 306 is a relative determiner that determines the voiced degree from the speech signal power and the estimated silence power; and 307 is a pitch periodicity determiner that determines the voiced degree from the pitch parameter of the previous frame. Further, 308 denotes an overall determiner that determines voiced or silent (speech present or absent) by multi-valued logic based on the results obtained by the individual multi-valued logic determiners; 309 denotes a hangover processor that delays voiced-to-silent transitions in the result obtained by the overall determiner 308; and 310 denotes a binarizer that converts the continuous-valued voiced degree obtained through the hangover processor 309 into either voiced or silent.

[0023] The operation of the voice/silence determination device configured as described above will be explained with reference to FIG. 3. The present embodiment presupposes a speech coding apparatus comprising a speech encoder that performs speech coding based on the voice/silence determination result for each frame, and a pitch parameter calculator with which the speech encoder calculates a pitch parameter representing the degree of pitch periodicity of the input speech signal. The pitch period and the parameters related to it are effective for representing a speech signal efficiently and are essential to low-bit-rate speech coding schemes such as CELP (Code Excited Linear Prediction) coding; the parameter representing the degree of pitch periodicity (the pitch parameter) is calculated in the course of that coding. In FIG. 3, the power calculator 301 calculates the power of the input speech signal for each frame, and the silence power estimator 302 updates the estimated silence power using the input signal power obtained by the power calculator 301 and past voice/silence determination results. The estimated silence power is updated so that the more likely the current frame is to be silent, the closer the estimate moves to the input signal power at that time. The delay unit 303 delays the pitch parameter obtained by the downstream speech encoder by one frame, yielding the pitch parameter of the previous frame. Examples of parameters representing the degree of pitch periodicity are the correlation value at the delay (pitch period) that maximizes the autocorrelation of the input signal, or of the linear prediction residual signal obtained by applying linear prediction filtering to the input signal (the normalized pitch maximum correlation value), and the prediction gain at the delay (pitch period) that minimizes the pitch prediction error (the pitch prediction gain). Using the parameters obtained in this way individually, the individual multi-valued logic determiners 304 to 307 each determine the likelihood of voiced or silent. Each result is expressed as, for example, a continuous value from 0.0 to 1.0: the closer to 1.0, the more likely the frame is voiced; the closer to 0.0, the more likely it is silent; and the closer to 0.5, the less either can be determined (undeterminable). Among the individual determiners 304 to 307, the absolute determiner 304 outputs a result on this scale from the absolute value of the input speech signal power, such that the voiced degree rises as the absolute power increases. The change determiner 305 uses the amount of change in the input speech signal power (for example, the difference from or ratio to the previous frame, or the maximum change over the past several frames), such that the voiced degree rises as the change increases. The relative determiner 306 uses the ratio or difference of the input speech signal power to the estimated silence power, determining a higher voiced degree the larger it is and a higher silent degree the smaller it is. The pitch periodicity determiner 307 uses different methods within voiced sections and within silent sections. Within a voiced section, that is, in frames determined to be voiced, the higher the degree of pitch periodicity indicated by the pitch parameter, the greater the determined likelihood of voiced, and the lower it is, the greater the determined likelihood of silent. The pitch parameter obtained for each frame may be smoothed over time using past values. This gives a stable determination result that reflects the average over the section, and it prevents speech from being cut off at word endings, where the signal moves from a section of high pitch periodicity to one of low pitch periodicity. Within a silent section, that is, a section in which the frames up to the previous frame are silent, the output is set to the value indicating that voiced/silent cannot be determined, or is made to approach that value asymptotically. Because the method used within silent sections requires no new pitch parameter input, the pitch parameter obtained while the speech encoder was coding the pitch-period-related parameters of the previous frame can be used, and the determination using the pitch parameter incurs no delay from waiting for the speech coding of the frame under determination and no additional computation to calculate the pitch parameter. After the individual multi-valued logic determinations, the overall determiner 308 makes a multi-valued determination of the voiced degree from the individual results. It obtains a multi-valued (continuous) result from the most likely voiced result and the most likely silent result among them. The hangover processor 309 controls the result of the overall determiner 308 so that the higher the voiced degree, the longer the voiced-to-silent transition is delayed. Finally, the voiced degree after hangover is binarized into voiced or silent with a threshold (for example, 0.5) and output as the final determination result. Instead of the hangover processor 309, processing that delays the voiced-to-silent transition by a fixed interval may be applied after binarization.
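
The chain of determiners 304 to 310 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The text fixes only the 0.0 to 1.0 scale, the meaning of 0.5, and the example binarization threshold of 0.5. The piecewise-linear `ramp` membership function, all decibel breakpoints, the midpoint rule in `overall`, the linear decay in `hangover`, and the update rate in `update_noise` are hypothetical choices.

```python
def ramp(x, low, high):
    """Map x linearly onto 0.0..1.0 between low and high, clipping outside."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)

def pitch_score(pitch_param, prev_voiced, prev_score):
    """Pitch periodicity determiner 307 (breakpoints are assumed)."""
    if prev_voiced:
        return ramp(pitch_param, 0.2, 0.6)
    # In a silent section, move halfway toward the "undeterminable" value 0.5.
    return prev_score + 0.5 * (0.5 - prev_score)

def overall(scores):
    """Overall determiner 308: midpoint of the most voiced and most silent
    individual results (the combination rule is an assumption)."""
    return 0.5 * (max(scores) + min(scores))

def hangover(score, held, step=0.125):
    """Hangover 309: let the held value decay linearly, so a higher voiced
    degree takes more frames to fall below the binarization threshold."""
    return max(score, held - step)

def update_noise(noise_db, power_db, score, rate=0.5):
    """Silence power estimator 302: the more likely silent (score near 0.0),
    the closer the estimate moves to the current frame power."""
    return noise_db + rate * (1.0 - score) * (power_db - noise_db)

def vad_step(state, power_db, prev_pitch_param):
    """Process one frame; returns True for voiced, False for silent."""
    p_score = pitch_score(prev_pitch_param, state["voiced"], state["pitch_score"])
    scores = [
        ramp(power_db, -60.0, -30.0),                                         # 304
        0.5 + 0.5 * ramp(abs(power_db - state["prev_power_db"]), 3.0, 12.0),  # 305
        ramp(power_db - state["noise_db"], 0.0, 12.0),                        # 306
        p_score,                                                              # 307
    ]
    total = overall(scores)
    held = hangover(total, state["held"])
    state.update(
        prev_power_db=power_db,
        noise_db=update_noise(state["noise_db"], power_db, total),
        held=held,
        voiced=held > 0.5,  # binarizer 310 with the example threshold 0.5
        pitch_score=p_score,
    )
    return state["voiced"]
```

A caller would keep one `state` dictionary per stream, initialised for example with `prev_power_db` and `noise_db` at a low level, `held` at 0.0, `voiced` at False, and `pitch_score` at 0.5, and call `vad_step` once per frame with the frame power in dB and the encoder's pitch parameter from the previous frame.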

[0024] As described above, according to this embodiment of the present invention, the pitch parameter representing the degree of pitch periodicity is obtained from the speech encoder's coding of the previous frame and is used in addition to the input speech signal power and the estimated silence power, and voiced or silent is determined by an overall determination based on the resulting plurality of multi-valued determination results. As a result, silent sections can be correctly determined to be silent even under conditions such as a high noise level or a low S/N ratio. At the same time, because the pitch parameter is the one calculated by the speech encoder for the previous frame, before the determination, a highly accurate determination can be made without the delay of waiting for the speech coding of the frame under determination and without additional computation to calculate the pitch parameter.

[0025] (Embodiment 4) FIG. 4 is a block diagram of the voice/silence determination device according to the fourth invention. In FIG. 4, 301 to 310 are the same as in Embodiment 3 of the third invention shown in FIG. 3, and their description is omitted here. 401 denotes a delay unit that delays the speech spectral parameters obtained by the downstream speech encoder by one frame, and 402 denotes a spectral change calculator that receives the spectral parameters delayed by one frame by the delay unit 401 and calculates the amount of spectral change from the earlier spectral parameters.

[0026] The operation of the voice/silence determination device configured as described above will be explained with reference to FIG. 4. This embodiment is identical to the invention of Embodiment 3 except for the operation of the pitch periodicity determiner 307. Within a voiced section, the pitch periodicity determiner 307 operates so that the higher the degree of pitch periodicity indicated by the pitch parameter, the greater the determined likelihood of voiced, and the lower the degree of pitch periodicity and the smaller the amount of spectral change, the greater the determined likelihood of silent. Noise signals such as in-vehicle noise have a low degree of pitch periodicity and are also relatively stationary, with little spectral change, so using the amount of spectral change in the pitch periodicity determination makes the determination more accurate.
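
As a hedged illustration of this modified pitch periodicity determiner, the Python sketch below combines the pitch parameter with a spectral change measure. The Euclidean distance between successive spectral parameter vectors, the `ramp` membership function, and all breakpoints are assumptions. The text only requires that silence becomes more likely when periodicity is low and the spectrum is changing little.

```python
import math

def ramp(x, low, high):
    """Map x linearly onto 0.0..1.0 between low and high, clipping outside."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)

def spectral_change(params, prev_params):
    """Spectral change calculator 402, here an assumed Euclidean distance
    between two successive spectral parameter vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(params, prev_params)))

def pitch_score_with_spectrum(pitch_param, change):
    """Voiced degree (0.0 silent .. 1.0 voiced) inside a voiced section."""
    periodicity = ramp(pitch_param, 0.2, 0.6)       # 1.0 = strongly periodic
    stationarity = 1.0 - ramp(change, 0.05, 0.2)    # 1.0 = spectrum barely moving
    if periodicity >= 0.5:
        return periodicity
    # Low periodicity pulls the score toward silence only to the extent that
    # the spectrum is also stationary; otherwise the score stays near 0.5.
    return 0.5 - (0.5 - periodicity) * stationarity
```

Under these assumptions, stationary aperiodic input such as steady vehicle noise scores near 0.0, while an aperiodic frame with a rapidly changing spectrum (for instance an unvoiced consonant) stays near the undeterminable value 0.5.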

【００２７】[0027]

[Effects of the Invention] As described above, in the first invention, a pitch parameter calculator that calculates a pitch parameter representing the degree of pitch periodicity of the input speech signal is provided, and the pitch parameter is used in addition to the input speech signal power only when detecting a change from voiced to silent within a voiced section. This yields the effect that silent sections can be correctly determined to be silent even under conditions such as a high noise level or a low S/N ratio.

[0028] In the second invention, the pitch parameter representing the degree of pitch periodicity obtained by the pitch parameter calculator inside the speech coding apparatus is used for the determination, and it is used in addition to the input speech signal power when detecting a change from voiced to silent within a voiced section. As a result, silent sections can be correctly determined to be silent even under conditions such as a high noise level or a low S/N ratio. At the same time, because the pitch parameter is the one calculated by the speech encoder for the previous frame, this yields the effect that a highly accurate determination can be made without the delay of waiting for the speech coding of the frame under determination and without additional computation to calculate the pitch parameter.

[0029] In the third invention, the pitch parameter representing the degree of pitch periodicity is obtained from the coding of the previous frame and is used in addition to the input speech signal power and the estimated silence power, and voiced or silent is determined by an overall determination based on the resulting plurality of multi-valued determination results. As a result, silent sections can be correctly determined to be silent even under conditions such as a high noise level or a low S/N ratio. At the same time, because the pitch parameter is the one calculated by the speech encoder for the previous frame, this yields the effect that a highly accurate determination can be made without the delay of waiting for the speech coding of the frame under determination and without additional computation to calculate the pitch parameter.

[0030] In the fourth invention, the pitch periodicity determiner operates within a voiced section so that the higher the degree of pitch periodicity indicated by the pitch parameter, the greater the determined likelihood of voiced, and the lower the degree of pitch periodicity and the smaller the amount of spectral change, the greater the determined likelihood of silent. This exploits the fact that noise such as in-vehicle noise has a low degree of pitch periodicity and is relatively stationary; using the amount of spectral change in the pitch-periodicity-based determination of the voiced degree yields the effect that a more accurate determination can be made.

[Brief description of the drawings]

FIG. 1 is a block diagram of the voice/silence determination device according to Embodiment 1 of the present invention.

FIG. 2 is a block diagram of the voice/silence determination device according to Embodiment 2 of the present invention.

FIG. 3 is a block diagram of the voice/silence determination device according to Embodiment 3 of the present invention.

FIG. 4 is a block diagram of the voice/silence determination device according to Embodiment 4 of the present invention.

FIG. 5 is a block diagram of a conventional voice/silence determination device.

[Explanation of symbols]

101 Power calculator
102 Pitch parameter calculator
103 Determiner
201 Voice/silence determiner
202 Speech encoder
203 Pitch parameter calculator
204 Power calculator
205 Delay unit
206 Determiner
207 Hangover processor
301 Power calculator
302 Silence power estimator
303 Delay unit
304 Absolute determiner
305 Change determiner
306 Relative determiner
307 Pitch periodicity determiner
308 Overall determiner
309 Hangover processor
310 Binarizer
401 Delay unit
402 Spectral change calculator

Claims

[Claims]

1. A voice/silence determination device comprising: a power calculator that calculates the speech signal power for each fixed section of an input speech signal; a pitch parameter calculator that calculates a pitch parameter for each fixed section; and a determiner that determines a voiced or silent state for each fixed section based on the speech signal power and the pitch parameter.

2. The voice/silence determination device according to claim 1, wherein, when determining whether a second fixed section following a voiced first fixed section is voiced or silent, the determiner determines the second fixed section to be a silent fixed section if the output value of the pitch parameter calculator corresponding to the second fixed section is lower than a predetermined threshold.

3. A voice/silence determination device comprising: a speech encoder having a pitch parameter calculator that calculates a pitch parameter for each fixed section of an input speech signal; a power calculator that calculates the speech signal power for each fixed section of the input speech signal; a delay unit that delays the output of the pitch parameter calculator by a predetermined time; and a determiner that determines a voiced or silent state for each fixed section based on the speech signal power obtained by the power calculator and the output of the delay unit.

4. The voice/silence determination device according to claim 3, wherein, when determining whether a second fixed section following a voiced first fixed section is voiced or silent, the determiner determines the second fixed section to be a silent fixed section if the delay unit output, obtained by delaying the output value of the pitch parameter calculator corresponding to the first fixed section, is lower than a predetermined threshold.

5. A voice/silence determination device comprising: a speech encoder having a pitch parameter calculator that calculates a pitch parameter of an input speech signal for each fixed section; and a voice/silence determination unit comprising a power calculator that calculates the speech signal power of the input speech signal for each fixed section, a silence power estimator that estimates the power of silent sections using the speech signal power obtained by the power calculator and past voice/silence determination results, a delay unit that delays the pitch parameter obtained by the pitch parameter calculator by one frame and outputs it, a plurality of individual multi-valued logic determiners that each make an individual multi-valued logic determination using some of the input speech signal, the calculated speech signal power, the estimated silence power, and the pitch parameter delayed by one frame, and an overall determiner that determines by multi-valued logic whether the input speech signal is voiced or silent based on the results obtained by the plurality of individual multi-valued logic determiners, wherein the individual multi-valued logic determiners include a pitch periodicity determiner that determines pitch periodicity by individual multi-valued logic using the pitch parameter output by the delay unit.

6. The voice/silence determination device according to claim 5, wherein the individual multi-valued logic determiner that uses the pitch parameter operates so that, in a fixed section of the input speech signal determined to be voiced, the higher the degree of pitch periodicity indicated by the pitch parameter, the greater the determined likelihood of voiced, and the lower it is, the greater the determined likelihood of silent, and so that, in a silent fixed section, it outputs a value indicating that voiced/silent cannot be determined, or approaches that value.

7. The voice/silence determination device according to claim 5, comprising: a speech encoder that outputs speech spectral parameters of the input speech signal; a delay unit that delays the output speech spectral parameters by one frame and outputs them; and a spectral change calculator that calculates the amount of change in the speech spectral parameters from the output of the delay unit, wherein, in a voiced section, the individual multi-valued logic determiner that uses the pitch parameter determines a greater likelihood of voiced the higher the value of the pitch parameter, and a greater likelihood of silent the lower the value of the pitch parameter and the smaller the amount of spectral change.

8. The voice/silence determination device according to claim 1, wherein a normalized pitch maximum correlation value or a pitch prediction gain is used as the pitch parameter representing the degree of pitch periodicity of the input speech signal.

9. A voice/silence determination method comprising: calculating the speech signal power and a pitch parameter for each fixed section of an input speech signal; and determining a voiced or silent state for each fixed section based on the calculated speech signal power and pitch parameter, wherein, in particular, when the input speech signal changes from voiced to silent, the fixed section is determined to be silent if the value of the pitch parameter is lower than a predetermined threshold.

10. A recording medium, such as a magnetic disk, a magneto-optical disk, or a ROM cartridge, on which is recorded a program that implements the voice/silence determination device according to claim 1 in software, or a device that operates as a voice/silence determination device using the recording medium.