JP4230414B2

JP4230414B2 - Sound signal processing method and sound signal processing apparatus

Info

Publication number: JP4230414B2
Application number: JP2004158788A
Authority: JP
Inventors: 裕久田崎
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1997-12-08
Filing date: 2004-05-28
Publication date: 2009-02-25
Anticipated expiration: 2018-12-07
Also published as: JP2004272292A

Abstract

PROBLEM TO BE SOLVED: To provide a method and an apparatus for processing an input sound signal that contains degraded sound, such as quantization noise, so that the degraded sound becomes subjectively imperceptible.

SOLUTION: The spectrum of the decoded sound, which is the input sound signal, is computed after auditory weighting, and a transformation strength is calculated from the magnitude of the spectral amplitude and the continuity of the spectrum. A signal transformation section obtains the spectrum of the decoded sound, applies amplitude smoothing and phase disturbance according to the transformation strength, and returns the spectrum to the signal domain to produce a transformed decoded sound. A signal evaluation section analyzes the decoded sound to obtain its background-noise likeness and uses that value as an addition control value. When the addition control value indicates background-noise likeness, a weighted addition section reduces the weight applied to the decoded sound, increases the weight applied to the transformed decoded sound, and adds the two to obtain the output sound.

COPYRIGHT: (C)2004, JPO&NCIPI

Description

The present invention relates to a sound signal processing method and a sound signal processing apparatus that process subjectively undesirable components, such as quantization noise produced by encoding and decoding speech or musical sound and distortion produced by various signal processing operations such as noise suppression, so that they are less perceptible to the listener.

As the compression ratio of source coding for speech and musical sound is raised, quantization noise, the distortion introduced by coding, gradually increases and changes in character until it becomes subjectively intolerable. For example, in speech coding schemes that try to represent the signal itself faithfully, such as PCM (Pulse Code Modulation) and ADPCM (Adaptive Differential Pulse Code Modulation), the quantization noise is random and is not very objectionable subjectively. As the compression ratio rises and the coding scheme becomes more complex, however, spectral characteristics peculiar to the coding scheme appear in the quantization noise, and the result can be a large subjective degradation. In signal sections dominated by background noise in particular, the speech model used by high-compression speech coding schemes does not fit, and the decoded sound becomes very unpleasant to listen to.

Likewise, when noise suppression such as spectral subtraction is performed, the noise estimation error remains in the processed signal as distortion. Because this distortion has characteristics very different from those of the signal before processing, it can severely degrade subjective quality.

Conventional methods for suppressing the loss of subjective quality caused by such quantization noise and distortion are disclosed in JP-A-8-130513, JP-A-8-146998, JP-A-7-160296, JP-A-6-326670, JP-A-7-248793, and S. F. Boll, "Suppression of Acoustic Noise in Speech Using Spectral Subtraction" (IEEE Trans. ASSP-27, No. 2, pp. 113-120, April 1979) (hereinafter Document 1).

JP-A-8-130513 aims to improve quality in background noise sections. It determines whether a section contains only background noise and applies dedicated encoding or decoding to such sections. When decoding a background-noise-only section, it suppresses the characteristics of the synthesis filter to obtain a perceptually natural reproduced sound.

JP-A-8-146998 adds white noise or previously stored background noise to the decoded speech, with the aim of keeping white noise from acquiring a harsh timbre through encoding and decoding.

JP-A-7-160296 aims to reduce quantization noise perceptually. It derives an auditory masking threshold from the decoded speech or from the index of the spectral parameters received by the speech decoding unit, computes filter coefficients that reflect the threshold, and uses those coefficients in a postfilter.

JP-A-6-326670 addresses systems that stop code transmission during sections without speech, for example to control transmission power. In such systems the decoder generates and outputs pseudo background noise while no code is being transmitted, which creates an audible mismatch between the actual background noise contained in speech sections and the pseudo background noise in silent sections. To reduce this mismatch, the method superimposes the pseudo background noise on speech sections as well as on sections without speech.

JP-A-7-248793 aims to perceptually reduce the distorted sound produced by noise suppression. The encoder first decides whether each section is a noise section or a speech section; it transmits the noise spectrum in noise sections and the noise-suppressed spectrum in speech sections. In noise sections, the decoder generates and outputs a synthesized sound from the received noise spectrum. In speech sections, it outputs the synthesized sound generated from the received noise-suppressed spectrum, plus the synthesized sound generated from the noise spectrum received in noise sections multiplied by a superposition factor.

Document 1 aims to perceptually reduce the distorted sound produced by noise suppression. It smooths the amplitude spectrum of the noise-suppressed output speech with that of the temporally preceding and following sections, and additionally applies amplitude suppression only in background noise sections.

JP-A-8-130513; JP-A-8-146998; JP-A-7-160296; JP-A-6-326670; JP-A-7-248793; S. F. Boll, "Suppression of Acoustic Noise in Speech Using Spectral Subtraction" (IEEE Trans. ASSP-27, No. 2, pp. 113-120, April 1979)

The conventional methods described above have the following problems.

JP-A-8-130513 switches the encoding and decoding processing substantially according to the section decision, so the characteristics change abruptly at the boundary between noise and speech sections. In particular, if noise sections are frequently misjudged as speech sections, a noise section that is inherently fairly stationary fluctuates unstably and may end up sounding worse. If the noise section decision is transmitted, additional information must be sent, and an error in that information on the transmission path causes unnecessary degradation. Moreover, suppressing the characteristics of the synthesis filter alone does not reduce the quantization noise introduced by excitation coding, so for some noise types there is almost no improvement.

JP-A-8-146998 adds noise prepared in advance, so the characteristics of the background noise that was actually encoded are lost. In addition, making the degraded sound hard to hear requires adding noise at a level above that of the degraded sound, so the reproduced background noise becomes louder.

JP-A-7-160296 merely derives an auditory masking threshold from the spectral parameters and applies a spectral postfilter based on it. For background noise with a relatively flat spectrum, almost no components are masked and no improvement is obtained. Furthermore, the main components that are not masked cannot be altered substantially, so distortion contained in those components is not improved at all.

JP-A-6-326670 generates the pseudo background noise without regard to the actual background noise, so the characteristics of the actual background noise are lost.

JP-A-7-248793 switches the encoding and decoding processing substantially according to the section decision, so a wrong decision between noise section and speech section causes severe degradation. If part of a noise section is mistaken for a speech section, the sound quality within the noise section fluctuates discontinuously and becomes unpleasant. Conversely, if a speech section is mistaken for a noise section, speech components leak into both the synthesized sound of the noise section, which uses the average noise spectrum, and the noise-spectrum-based synthesized sound superimposed in speech sections, and overall sound quality deteriorates. Furthermore, making the degraded sound in speech sections inaudible requires superimposing noise that is by no means small.

Document 1 incurs a processing delay of half a section (about 10 ms to 20 ms) for the smoothing. In addition, if part of a noise section is misjudged as a speech section, the sound quality within the noise section fluctuates discontinuously and becomes unpleasant.

The present invention has been made to solve these problems. Its object is to provide a sound signal processing method and a sound signal processing apparatus that suffer little degradation from section decision errors, depend little on noise type or spectral shape, require no large delay, preserve the characteristics of the actual background noise, do not raise the background noise level excessively, require no additional transmitted information, and also suppress degradation components caused by excitation coding and the like effectively.

The sound signal processing method of the present invention is characterized in that an input sound signal is processed to generate a first processed signal, the input sound signal is analyzed to calculate a predetermined evaluation value, the input sound signal and the first processed signal are weighted and added based on the evaluation value to obtain a second processed signal, and the second processed signal is used as the output signal.
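The weighted addition described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `weighted_add`, the scaling of the evaluation value to the range [0, 1], and the complementary weights `1 - w` and `w` are all choices made for this example.

```python
def weighted_add(input_frame, processed_frame, evaluation):
    """Blend an input frame with its processed version.

    evaluation is assumed (hypothetically) to lie in [0, 1]:
    0 keeps the input unchanged, 1 outputs only the processed signal.
    """
    w_processed = min(max(evaluation, 0.0), 1.0)  # clamp to [0, 1]
    w_input = 1.0 - w_processed                   # complementary weight (assumption)
    return [w_input * x + w_processed * y
            for x, y in zip(input_frame, processed_frame)]


# Example: an evaluation value of 0.5 gives an equal blend of both signals.
mixed = weighted_add([1.0, 2.0], [3.0, 4.0], 0.5)
```

Lowering the input weight as the processed weight rises keeps the overall level roughly constant, which matches the remark later in the description that the input level can be reduced as the processed level is raised.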

Further, the first processed signal is generated by applying a Fourier transform to the input sound signal to calculate a spectral component for each frequency, applying a predetermined modification to those spectral components, and applying an inverse Fourier transform to the modified spectral components.
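The transform, modify, and inverse-transform sequence can be sketched as below. The names `dft`, `idft`, `process_frame`, and `modify` are hypothetical, and a plain O(n^2) discrete Fourier transform is used only to keep the example self-contained. Framing, windowing, and overlap handling are not specified in this passage and are omitted.

```python
import cmath


def dft(frame):
    """Discrete Fourier transform of a real frame (naive, for illustration)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT; the real part is returned, which assumes the modified
    spectrum still corresponds to a real signal (conjugate symmetry)."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def process_frame(frame, modify):
    """Generate the first processed signal: transform, modify, transform back."""
    spectrum = dft(frame)
    return idft(modify(spectrum))


# With an identity modification the frame is recovered (up to rounding).
restored = process_frame([1.0, 0.0, -1.0, 0.0], lambda s: s)
```

A practical implementation would normally use a fast Fourier transform; the structure of the three steps stays the same.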

Further, the weighted addition is performed in the spectral domain.

Further, the weighted addition is controlled independently for each frequency component.
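A spectral-domain weighted addition with one weight per frequency component might look like the following sketch. The function name `weighted_add_spectral` and the convention that each weight lies in [0, 1] and applies to the processed spectrum are assumptions made for illustration.

```python
def weighted_add_spectral(input_spectrum, processed_spectrum, weights):
    """Blend two complex spectra bin by bin.

    weights[k] is the (assumed) share of the processed spectrum in bin k;
    the input spectrum receives the complementary share 1 - weights[k].
    """
    return [(1.0 - w) * a + w * b
            for a, b, w in zip(input_spectrum, processed_spectrum, weights)]


# Bin 0 keeps the input component, bin 1 is replaced by the processed one.
out = weighted_add_spectral([1 + 0j, 2 + 0j], [0j, 0j], [0.0, 1.0])
```

Because each bin has its own weight, components judged to be clean can pass through untouched while degraded components are replaced.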

Further, the predetermined modification of the per-frequency spectral components includes smoothing of the amplitude spectral components.
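One possible form of amplitude smoothing is a first-order recursive average over time that leaves the phase untouched. The smoothing formula is not given in this passage, so the recursion, the function name `smooth_amplitude`, and the per-bin strength in [0, 1] are assumptions for this sketch.

```python
import cmath


def smooth_amplitude(spectrum, prev_amplitude, strength):
    """Smooth each bin's amplitude toward the previous frame's smoothed amplitude.

    strength[k] in [0, 1] (assumed scale): 0 leaves the bin unchanged,
    1 holds the previous amplitude. The phase of each bin is preserved.
    Returns the smoothed spectrum and the amplitudes to carry to the next frame.
    """
    smoothed, new_amplitude = [], []
    for c, prev, a in zip(spectrum, prev_amplitude, strength):
        amp = a * prev + (1.0 - a) * abs(c)
        new_amplitude.append(amp)
        smoothed.append(cmath.rect(amp, cmath.phase(c)))
    return smoothed, new_amplitude


# A bin of amplitude 2 smoothed halfway toward a previous amplitude of 0.
spec, amps = smooth_amplitude([2 + 0j], [0.0], [0.5])
```

Using only past frames in the recursion is consistent with the later statement that processing can rely on the input signal up to the present.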

Further, the predetermined modification of the per-frequency spectral components includes applying a disturbance to the phase spectral components.
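Phase disturbance can be illustrated by rotating each spectral component by a random angle whose range scales with a per-bin strength. The uniform distribution, the function name `disturb_phase`, and the strength scale of [0, 1] are assumptions; the passage only states that a disturbance is applied to the phase.

```python
import cmath
import math
import random


def disturb_phase(spectrum, strength, rng):
    """Rotate each bin by a random phase offset; amplitudes are unchanged.

    strength[k] in [0, 1] (assumed scale) limits the offset to
    +/- strength[k] * pi. rng is a random.Random instance.
    """
    disturbed = []
    for c, s in zip(spectrum, strength):
        offset = s * rng.uniform(-math.pi, math.pi)
        disturbed.append(c * cmath.exp(1j * offset))
    return disturbed


# Strength 0 leaves a bin alone, strength 1 allows any phase.
out = disturb_phase([1 + 0j, 0 + 2j], [0.0, 1.0], random.Random(0))
```

This sketch treats bins independently. To obtain a real time-domain signal after the inverse transform, an implementation would need to apply mirrored (negated) offsets to the conjugate-symmetric bins.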

Further, the smoothing strength in the smoothing process is controlled according to the magnitude of the amplitude spectral components of the input sound signal.

Further, the disturbance strength in the disturbance process is controlled according to the magnitude of the amplitude spectral components of the input sound signal.
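Controlling the strength by amplitude can be sketched as a mapping in which smaller components receive stronger processing, in line with the later explanation that small-amplitude components tend to be dominated by quantization noise. The linear mapping relative to the frame peak and the name `strength_from_amplitude` are assumptions for illustration.

```python
def strength_from_amplitude(amplitudes, max_strength=1.0):
    """Map per-bin amplitudes to a processing strength.

    The largest component gets strength 0 and a zero component gets
    max_strength; the linear mapping in between is an assumption.
    """
    peak = max(amplitudes)
    if peak <= 0.0:
        # Silent frame: nothing to protect, so allow full strength everywhere.
        return [max_strength] * len(amplitudes)
    return [max_strength * (1.0 - a / peak) for a in amplitudes]


# The dominant bin is protected; weaker bins are processed more strongly.
strengths = strength_from_amplitude([4.0, 2.0, 0.0])
```

The same strength values could drive both the amplitude smoothing and the phase disturbance.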

Further, the smoothing strength in the smoothing process is controlled according to the degree of continuity in the time direction of the spectral components of the input sound signal.

Further, the disturbance strength in the disturbance process is controlled according to the degree of continuity in the time direction of the spectral components of the input sound signal.
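Temporal continuity can be approximated, for illustration, by comparing each bin's amplitude in the current frame with the previous frame. The ratio-based measure and the names `continuity` and `strength_from_continuity` are assumptions; the passage does not define how continuity is measured.

```python
def continuity(prev_amplitude, cur_amplitude):
    """Per-bin continuity in [0, 1]: 1 when the amplitude is unchanged
    between frames, approaching 0 when it changes greatly (assumed measure)."""
    result = []
    for p, c in zip(prev_amplitude, cur_amplitude):
        hi = max(p, c)
        result.append(1.0 if hi == 0.0 else min(p, c) / hi)
    return result


def strength_from_continuity(prev_amplitude, cur_amplitude):
    """Low continuity gives high processing strength."""
    return [1.0 - r for r in continuity(prev_amplitude, cur_amplitude)]


# The second bin jumps from 1 to 4, so it is processed more strongly.
strengths = strength_from_continuity([1.0, 1.0, 0.0], [1.0, 4.0, 0.0])
```

Components that persist steadily from frame to frame are treated as reliable and left largely untouched.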

Further, an auditory-weighted input sound signal is used as the input sound signal.

Further, the smoothing strength in the smoothing process is controlled according to the magnitude of the temporal variability of the evaluation value.

Further, the disturbance strength in the disturbance process is controlled according to the magnitude of the temporal variability of the evaluation value.
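The following sketch shows one way the strength could be reduced when the evaluation value fluctuates over time. The variability measure (mean absolute difference between consecutive evaluation values), the `sensitivity` parameter, and the function name `limit_strength_by_variability` are all hypothetical.

```python
def limit_strength_by_variability(strength, evaluation_history, sensitivity=1.0):
    """Scale down a processing strength when recent evaluation values vary.

    Variability is taken (as an assumption) to be the mean absolute
    difference between consecutive evaluation values.
    """
    if len(evaluation_history) < 2:
        return strength  # not enough history to measure variability
    diffs = [abs(b - a) for a, b in zip(evaluation_history, evaluation_history[1:])]
    variability = sum(diffs) / len(diffs)
    factor = max(0.0, 1.0 - sensitivity * variability)
    return strength * factor


# A steady evaluation value leaves the strength unchanged.
steady = limit_strength_by_variability(0.8, [0.5, 0.5, 0.5])
```

Reducing the strength while the signal character is changing is intended to avoid the dulling and echo that heavy amplitude smoothing could introduce at such moments.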

Further, the degree of background-noise likeness calculated by analyzing the input sound signal is used as the predetermined evaluation value.
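A continuous background-noise likeness could, for example, be derived from how far the frame power exceeds an estimated background noise power. This passage does not give the calculation, so the decibel mapping, the `span_db` parameter, and the function name `noise_likeness` are assumptions made for this sketch.

```python
import math


def noise_likeness(frame, noise_power, span_db=20.0):
    """Return a value in [0, 1]; 1 means the frame looks like background noise.

    frame is assumed non-empty and noise_power positive. The likeness falls
    linearly from 1 to 0 as the frame power rises from the estimated noise
    power to span_db decibels above it (hypothetical mapping).
    """
    power = sum(x * x for x in frame) / len(frame)
    if power <= noise_power:
        return 1.0
    excess_db = 10.0 * math.log10(power / noise_power)
    return max(0.0, 1.0 - excess_db / span_db)


# A frame at the estimated noise level is rated fully noise-like.
likeness = noise_likeness([1.0, 1.0], 1.0)
```

Because the result is a continuous quantity rather than a yes/no decision, a small estimation error changes the output mix only slightly.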

Further, the degree of fricative likeness calculated by analyzing the input sound signal is used as the predetermined evaluation value.

Further, decoded speech obtained by decoding a speech code generated by speech encoding is used as the input sound signal.

A sound signal processing method of the present invention takes as the input sound signal a first decoded speech obtained by decoding a speech code generated by speech encoding, applies postfiltering to the first decoded speech to generate a second decoded speech, processes the first decoded speech to generate a first processed speech, analyzes either decoded speech to calculate a predetermined evaluation value, weights and adds the second decoded speech and the first processed speech based on the evaluation value to obtain a second processed speech, and outputs the second processed speech as the output speech.
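The signal routing in this variant can be sketched as follows. The function name `process_with_postfilter` and the callables `postfilter`, `process`, and `evaluate` are hypothetical placeholders; the evaluation is computed here from the first decoded speech, which is one of the two options the text allows, and the weight is assumed to lie in [0, 1].

```python
def process_with_postfilter(first_decoded, postfilter, process, evaluate):
    """Blend postfiltered speech with processed speech.

    The processed speech is derived from the decoded speech before
    postfiltering, so the postfilter does not affect it.
    """
    second_decoded = postfilter(first_decoded)        # second decoded speech
    first_processed = process(first_decoded)          # first processed speech
    w = min(max(evaluate(first_decoded), 0.0), 1.0)   # evaluation value, clamped
    return [(1.0 - w) * a + w * b                     # second processed speech
            for a, b in zip(second_decoded, first_processed)]


# Toy stand-ins for the three stages, for illustration only.
out = process_with_postfilter(
    [1.0, 2.0],
    postfilter=lambda f: [2.0 * x for x in f],
    process=lambda f: [0.0 for _ in f],
    evaluate=lambda f: 0.25,
)
```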

The sound signal processing apparatus of the present invention comprises a first processed signal generation unit that processes an input sound signal to generate a first processed signal, an evaluation value calculation unit that analyzes the input sound signal to calculate a predetermined evaluation value, and a second processed signal generation unit that weights and adds the input sound signal and the first processed signal based on the evaluation value from the evaluation value calculation unit and outputs the result as a second processed signal.

Further, the first processed signal generation unit applies a Fourier transform to the input sound signal to calculate a spectral component for each frequency, smooths the amplitude spectral components of the calculated spectral components, and applies an inverse Fourier transform to the smoothed spectral components to generate the first processed signal.

Further, the first processed signal generation unit applies a Fourier transform to the input sound signal to calculate a spectral component for each frequency, applies a disturbance to the phase spectral components of the calculated spectral components, and applies an inverse Fourier transform to the disturbed spectral components to generate the first processed signal.

As described above, the sound signal processing method and apparatus of the present invention apply predetermined signal processing to the input signal to generate a processed signal in which the degradation components contained in the input signal are subjectively unobtrusive, and control the addition weights of the input signal and the processed signal by a predetermined evaluation value. The proportion of the processed signal can therefore be increased mainly in sections that contain many degradation components, which improves subjective quality.

In addition, the conventional binary section decision is abandoned in favor of a continuous evaluation value, from which the weighted-addition coefficients of the input signal and the processed signal can be controlled continuously. Quality degradation due to section decision errors is thereby avoided.

In addition, the output signal is generated by processing the input signal, which carries much of the information about the background noise. This yields a stable quality improvement that depends little on noise type or spectral shape while preserving the characteristics of the actual background noise, and it also improves degradation components caused by excitation coding and the like.

In addition, the processing can be carried out using only the input signal up to the present, so no particularly large delay is needed, and depending on how the input signal and the processed signal are added, delay other than the processing time itself can be eliminated. If the level of the input signal is lowered as the level of the processed signal is raised, there is no need to superimpose large pseudo noise to mask the degradation components as in conventional methods; on the contrary, the background noise level can even be made somewhat lower or higher to suit the application. Naturally, even when removing degraded sound caused by speech encoding and decoding, no new transmitted information needs to be added, unlike conventional methods.

The sound signal processing method and apparatus of the present invention apply predetermined processing in the spectral domain to the input signal to generate a processed signal in which the degradation components contained in the input signal are subjectively unobtrusive, and control the addition weights of the input signal and the processed signal by a predetermined evaluation value. In addition to the effects of the signal processing method described above, fine-grained suppression of degradation components can therefore be performed in the spectral domain, which further improves subjective quality.

In the sound signal processing method of the present invention, the input signal and the processed signal are weighted and added in the spectral domain. In addition to the effects described above, when the method is connected after, for example, a noise suppression method that operates in the spectral domain, some or all of the Fourier transform and inverse Fourier transform processing that the method requires can be omitted, which simplifies the processing.

In the sound signal processing method of the present invention, the weighted addition is controlled independently for each frequency component. In addition to the effects described above, components dominated by quantization noise or degradation are preferentially replaced by the processed signal, while good components with little quantization noise or degradation are not replaced. Quantization noise and degradation components are thus suppressed subjectively while the characteristics of the input signal are well preserved, which improves subjective quality.

In the sound signal processing method of the present invention, the processing includes smoothing of the amplitude spectral components. In addition to the effects described above, unstable fluctuations of the amplitude spectral components caused by quantization noise and the like are suppressed well, which improves subjective quality.

In the sound signal processing method of the present invention, the processing includes applying a disturbance to the phase spectral components. Quantization noise and degradation components often acquire peculiar interrelations among their phase components and are then perceived as a characteristic degradation. In addition to the effects described above, the method disturbs the relations among the phase components of such noise, which improves subjective quality.

In the sound signal processing method of the present invention, the smoothing strength or the disturbance strength is controlled according to the magnitude of the amplitude spectral components of the input signal or of the auditory-weighted input signal. In addition to the effects described above, processing is concentrated on components whose amplitude is small and which are therefore dominated by quantization noise or degradation, while good components with little quantization noise or degradation are not processed. Quantization noise and degradation components are thus suppressed subjectively while the characteristics of the input signal are well preserved, which improves subjective quality.

In the sound signal processing method of the present invention, the smoothing strength or the disturbance strength is controlled according to the degree of continuity in the time direction of the spectral components of the input signal or of the auditory-weighted input signal. In addition to the effects described above, processing is concentrated on components whose spectral continuity is low and which therefore tend to contain more quantization noise or degradation, while good components with little quantization noise or degradation are not processed. Quantization noise and degradation components are thus suppressed subjectively while the characteristics of the input signal are well preserved, which improves subjective quality.

In the sound signal processing method of the present invention, the smoothing strength or the disturbance strength is controlled according to the magnitude of the temporal variability of the evaluation value. In addition to the effects described above, unnecessarily strong processing is suppressed in sections where the characteristics of the input signal are changing, which in particular prevents the dulling and echo that amplitude smoothing would otherwise cause.

In the sound signal processing method of the present invention, the degree of background-noise likeness is used as the predetermined evaluation value. In addition to the effects described above, processing is concentrated on background noise sections, where quantization noise and degradation components tend to be abundant, and processing suited to each section (no processing, low-level processing, and so on) is selected for sections other than background noise, which improves subjective quality.

In the sound signal processing method of the present invention, the degree of fricative likeness is used as the predetermined evaluation value. In addition to the effects described above, processing is concentrated on fricative sections, where quantization noise and degradation components tend to be abundant, and processing suited to each section (no processing, low-level processing, and so on) is selected for sections other than fricatives, which improves subjective quality.

The sound signal processing method of the present invention takes as input a speech code generated by speech encoding, decodes it to generate decoded speech, applies the signal processing of the above sound signal processing method to the decoded speech to generate processed speech, and outputs the processed speech as the output speech. Speech decoding that retains the subjective quality improvement and other effects of the above method is thereby realized.

In this sound signal processing method, a speech code generated by speech encoding is taken as input and decoded to generate decoded speech; predetermined signal processing is applied to the decoded speech to generate processed speech; post-filter processing is applied to the decoded speech; the decoded speech before or after the post filter is analyzed to calculate a predetermined evaluation value; and, based on this evaluation value, the post-filtered decoded speech and the processed speech are weighted, added, and output. In addition to realizing speech decoding that retains the subjective quality improvement and other effects of the above sound signal processing method, processed speech that is unaffected by the post filter can be generated, and accurate addition weight control becomes possible based on an accurate evaluation value calculated without influence from the post filter, so subjective quality is further improved.

Embodiments of the present invention are described below with reference to the drawings.

Embodiment 1.
FIG. 1 shows the overall configuration of a speech decoding method to which the sound signal processing method of this embodiment is applied. In the figure, 1 is a speech decoding apparatus, 2 is a signal processing unit that executes the signal processing method according to the present invention, 3 is a speech code, 4 is a speech decoding unit, 5 is decoded speech, and 6 is output speech. The signal processing unit 2 comprises a signal transformation unit 7, a signal evaluation unit 12, and a weighted addition unit 18. The signal transformation unit 7 comprises a Fourier transform unit 8, an amplitude smoothing unit 9, a phase disturbance unit 10, and an inverse Fourier transform unit 11. The signal evaluation unit 12 comprises an inverse filter unit 13, a power calculation unit 14, a background noise likelihood calculation unit 15, an estimated noise power update unit 16, and an estimated noise spectrum update unit 17.

The operation is described below with reference to the drawings.

First, the speech code 3 is input to the speech decoding unit 4 in the speech decoding apparatus 1. The speech code 3 is output by a separate speech encoding unit as the result of encoding a speech signal, and reaches the speech decoding unit 4 via a communication channel or a storage device.

The speech decoding unit 4 applies to the speech code 3 the decoding process that pairs with the speech encoding unit, and outputs the resulting signal of a predetermined length (one frame length) as the decoded speech 5. The decoded speech 5 is input to the signal transformation unit 7, the signal evaluation unit 12, and the weighted addition unit 18 in the signal processing unit 2.

The Fourier transform unit 8 in the signal transformation unit 7 applies a window to a signal formed from the input decoded speech 5 of the current frame and, where necessary, the most recent part of the decoded speech 5 of the previous frame. It then applies a Fourier transform to the windowed signal to calculate a spectral component for each frequency and outputs these components to the amplitude smoothing unit 9. Typical Fourier transform processing includes the discrete Fourier transform (DFT) and the fast Fourier transform (FFT). Various windows, such as trapezoidal, rectangular, and Hanning windows, can be used; here, a modified trapezoidal window is used, in which the sloped portions at both ends of a trapezoidal window are each replaced with half of a Hanning window. An actual example of the window shape and its time relationship to the decoded speech 5 and the output speech 6 are described later with reference to the drawings.

The amplitude smoothing unit 9 smooths the amplitude component of the spectrum for each frequency input from the Fourier transform unit 8 and outputs the smoothed spectrum to the phase disturbance unit 10. Smoothing in either the frequency-axis direction or the time-axis direction suppresses degraded sound such as quantization noise. However, if smoothing in the frequency-axis direction is made too strong, the spectrum becomes blunted and the characteristics of the original background noise are often lost. Likewise, if smoothing in the time-axis direction is made too strong, the same sound persists for a long time and produces a reverberant impression. After tuning on a variety of background noises, the quality of the output speech 6 was good when no smoothing was applied in the frequency-axis direction and the amplitude was smoothed in the logarithmic domain in the time-axis direction. The smoothing method in that case is expressed by the following equation.

$$ y_i = (1-\alpha)\,y_{i-1} + \alpha\,x_i \qquad \text{(Equation 1)} $$
Here, $x_i$ is the log amplitude spectrum value of the current frame (the $i$-th frame) before smoothing, $y_{i-1}$ is the log amplitude spectrum value of the previous frame (the $(i-1)$-th frame) after smoothing, $y_i$ is the log amplitude spectrum value of the current frame after smoothing, and $\alpha$ is a smoothing coefficient with a value between 0 and 1. The optimum value of $\alpha$ depends on the frame length, the level of the degraded sound to be removed, and so on, but is generally about 0.5.
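The recursion of Equation 1 can be illustrated with a minimal sketch that smooths one frame of amplitudes against the previous frame's smoothed values, bin by bin. The function name, the use of the natural logarithm, and the small `floor` that guards against log(0) are assumptions made for illustration; the source specifies only the recursion and a typical coefficient of about 0.5.

```python
import math

def smooth_log_amplitude(prev_smoothed, amplitudes, alpha=0.5, floor=1e-12):
    """Apply Equation 1 per frequency bin in the log-amplitude domain.

    prev_smoothed: smoothed log amplitudes y_{i-1} of the previous frame
    amplitudes:    linear amplitudes of the current frame (x_i is their log)
    floor:         hypothetical guard against log(0), not part of the source
    Returns the smoothed log amplitudes y_i of the current frame.
    """
    current_log = [math.log(max(a, floor)) for a in amplitudes]
    return [(1.0 - alpha) * y_prev + alpha * x
            for y_prev, x in zip(prev_smoothed, current_log)]

prev = [0.0, 0.0]                  # previous frame: log amplitude 0 in both bins
frame = [math.e ** 2, 1.0]         # current frame: log amplitudes 2 and 0
smoothed = smooth_log_amplitude(prev, frame, alpha=0.5)
```

With `alpha=0.5`, the first bin moves halfway from 0 toward 2 and the second bin stays at 0. The smoothed amplitude is recovered with `math.exp` before the spectrum is passed on.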

The phase disturbance unit 10 disturbs the phase components of the smoothed spectrum input from the amplitude smoothing unit 9 and outputs the disturbed spectrum to the inverse Fourier transform unit 11. To disturb each phase component, a phase angle within a predetermined range is generated from a random number and added to the original phase angle. If the range of generated phase angles is not restricted, each phase component may simply be replaced with a randomly generated phase angle. When degradation due to encoding or the like is large, the range of generated phase angles is left unrestricted.
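The two variants of phase disturbance described above (a bounded random offset, or outright replacement of the phase) might be sketched as follows. The function name, the uniform distribution, and the `max_angle` parameter are illustrative assumptions; the source says only that a random number generates the angle. The sketch also does not enforce the conjugate symmetry that a real-valued output signal requires, so in practice it would be applied to the positive-frequency bins only and mirrored.

```python
import cmath
import math
import random

def disturb_phase(spectrum, max_angle=None, rng=random):
    """Randomize the phase of each complex spectral component, keeping its amplitude.

    max_angle=None: unrestricted, the phase is replaced by a uniform random
                    angle between -pi and pi.
    max_angle=a:    a uniform random offset in [-a, a] is added to the phase.
    """
    out = []
    for c in spectrum:
        amplitude = abs(c)
        if max_angle is None:
            phase = rng.uniform(-math.pi, math.pi)
        else:
            phase = cmath.phase(c) + rng.uniform(-max_angle, max_angle)
        out.append(cmath.rect(amplitude, phase))
    return out

spectrum = [1 + 1j, 2j, -3 + 0j]
disturbed = disturb_phase(spectrum, rng=random.Random(0))
```

Only the phase changes; the amplitude of every component, which carries the smoothed spectrum, is left as it was.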

The inverse Fourier transform unit 11 applies an inverse Fourier transform to the disturbed spectrum input from the phase disturbance unit 10 to return it to the signal domain, joins the result to the preceding and following frames while applying a window for smooth connection, and outputs the resulting signal to the weighted addition unit 18 as the modified decoded speech 34.

The inverse filter unit 13 in the signal evaluation unit 12 applies inverse filtering to the decoded speech 5 input from the speech decoding unit 4, using the estimated noise spectrum parameters stored in the estimated noise spectrum update unit 17 described later, and outputs the inverse-filtered decoded speech to the power calculation unit 14. This inverse filtering suppresses the amplitude of components in which the background noise amplitude is large, that is, components in which speech and background noise are likely to be of comparable level, so that a larger signal power ratio between speech sections and background noise sections is obtained than without inverse filtering.

The estimated noise spectrum parameter is chosen with regard to compatibility with the speech encoding and decoding processes and to software sharing. At present, line spectral pairs (LSP) are used in many cases. Besides LSP, similar effects can be obtained with other spectral envelope parameters such as linear prediction coefficients (LPC) and the cepstrum, or with the amplitude spectrum itself. The update processing in the estimated noise spectrum update unit 17, described later, is simplest when it uses linear interpolation or averaging; among the spectral envelope parameters, LSP and the cepstrum are suitable because the filter is guaranteed to remain stable under linear interpolation or averaging. The cepstrum is superior in its ability to represent the spectrum of noise components, whereas LSP is superior in the ease of constructing the inverse filter unit. When the amplitude spectrum is used, either LPC having this amplitude spectrum characteristic are calculated and used in the inverse filter, or an amplitude transformation is applied to the Fourier transform of the decoded speech 5 (equal to the output of the Fourier transform unit 8) to achieve the same effect as the inverse filter.

The power calculation unit 14 obtains the power of the inverse-filtered decoded speech input from the inverse filter unit 13 and outputs the calculated power value to the background noise likelihood calculation unit 15.

The background noise likelihood calculation unit 15 uses the power input from the power calculation unit 14 and the estimated noise power stored in the estimated noise power update unit 16, described later, to calculate the background noise likelihood of the current decoded speech 5, and outputs it to the weighted addition unit 18 as the addition control value 35. It also outputs the calculated background noise likelihood to the estimated noise power update unit 16 and the estimated noise spectrum update unit 17, and outputs the power input from the power calculation unit 14 to the estimated noise power update unit 16. In the simplest case, the background noise likelihood can be calculated by the following equation.

$$ v = \log(p_N) - \log(p) \qquad \text{(Equation 2)} $$
Here, $p$ is the power input from the power calculation unit 14, $p_N$ is the estimated noise power stored in the estimated noise power update unit 16, and $v$ is the calculated background noise likelihood.

In this case, the larger the value of $v$ (or, if $v$ is negative, the smaller its absolute value), the more the frame resembles background noise. Various other calculation methods are conceivable, such as calculating $p_N / p$ and using it as $v$.
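As a small worked illustration of Equation 2, the sketch below computes the likelihood for a frame whose power matches the noise estimate and for a frame 100 times stronger. The function name, the natural logarithm, and the `floor` guard are assumptions for illustration only.

```python
import math

def background_noise_likelihood(power, noise_power, floor=1e-12):
    """Equation 2: v = log(p_N) - log(p). `floor` is a hypothetical guard against log(0)."""
    return math.log(max(noise_power, floor)) - math.log(max(power, floor))

v_noise = background_noise_likelihood(power=1.0, noise_power=1.0)     # frame at the noise level
v_speech = background_noise_likelihood(power=100.0, noise_power=1.0)  # frame well above it
```

The noise-level frame gives `v_noise == 0`, while the louder frame gives a clearly negative value, in line with the reading that larger v means more noise-like.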

The estimated noise power update unit 16 updates the estimated noise power stored internally, using the background noise likelihood and the power input from the background noise likelihood calculation unit 15. For example, when the input background noise likelihood is high (the value of $v$ is large), the update is performed by reflecting the input power in the estimated noise power according to the following equation.

$$ \log(p_N') = (1-\beta)\,\log(p_N) + \beta\,\log(p) \qquad \text{(Equation 3)} $$
Here, $\beta$ is an update rate constant with a value between 0 and 1, and is preferably set relatively close to 0. The update is performed by evaluating the right-hand side and taking $p_N'$ on the left-hand side as the new estimated noise power.
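A minimal sketch of the gated update of Equation 3 follows. The gate `v_threshold` and the default `beta` are hypothetical values chosen for illustration; the source states only that the update is applied when the likelihood is high and that the rate constant should be close to 0.

```python
import math

def update_noise_power(noise_power, power, v, beta=0.05, v_threshold=-0.5):
    """Update the estimated noise power in the log domain (Equation 3).

    The estimate is left unchanged unless the frame looks noise-like,
    i.e. v >= v_threshold (the threshold value is an assumption).
    """
    if v < v_threshold:
        return noise_power
    log_new = (1.0 - beta) * math.log(noise_power) + beta * math.log(power)
    return math.exp(log_new)

kept = update_noise_power(1.0, 100.0, v=-4.6)              # speech-like: no update
moved = update_noise_power(1.0, math.e, v=0.0, beta=0.1)   # noise-like: small step
```

Because the interpolation is done on log power, the second call moves the estimate from 1 to exp(0.1), a geometric rather than arithmetic step toward the new frame power.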

Various modifications and improvements to this method of updating the estimated noise power are possible. For example, to further improve estimation accuracy, the variability between frames may be taken into account, several past input power values may be stored and the noise power estimated by statistical analysis, or the minimum value of $p$ may be used directly as the estimated noise power.

The estimated noise spectrum update unit 17 first analyzes the input decoded speech 5 to calculate the spectral parameters of the current frame. The spectral parameters are as described for the inverse filter unit 13; LSP are used in many cases. It then updates the estimated noise spectrum stored internally, using the background noise likelihood input from the background noise likelihood calculation unit 15 and the spectral parameters calculated here. For example, when the input background noise likelihood is high (the value of $v$ is large), the update is performed by reflecting the calculated spectral parameters in the estimated noise spectrum according to the following equation.

$$ x_N' = (1-\gamma)\,x_N + \gamma\,x \qquad \text{(Equation 4)} $$
Here, $x$ is the spectral parameter of the current frame and $x_N$ is the estimated noise spectrum (parameter). $\gamma$ is an update rate constant with a value between 0 and 1, and is preferably set relatively close to 0. The update is performed by evaluating the right-hand side and taking $x_N'$ on the left-hand side as the new estimated noise spectrum (parameter).
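Equation 4 applied to an LSP vector might look like the sketch below. The gate `v_threshold`, the default `gamma`, and the example LSP values are hypothetical. The example also shows why linear interpolation suits LSP: a weighted average of two increasing sequences with weights between 0 and 1 is again increasing, so the ordering on which filter stability depends is preserved.

```python
def update_noise_spectrum(noise_lsp, frame_lsp, v, gamma=0.05, v_threshold=-0.5):
    """Element-wise update of the estimated noise spectrum parameters (Equation 4).

    No update is made unless v >= v_threshold (the threshold is an assumption).
    """
    if v < v_threshold:
        return list(noise_lsp)
    return [(1.0 - gamma) * xn + gamma * x for xn, x in zip(noise_lsp, frame_lsp)]

noise_lsp = [0.2, 0.5, 1.0]     # illustrative LSP frequencies in radians
frame_lsp = [0.4, 0.9, 1.4]
updated = update_noise_spectrum(noise_lsp, frame_lsp, v=0.0, gamma=0.5)
```

With `gamma=0.5` each parameter lands midway between the stored and current values, and the result remains in ascending order.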

As with the method of updating the estimated noise power, various improvements to this method of updating the estimated noise spectrum are possible.

As the final step, the weighted addition unit 18 weights and adds the decoded speech 5 input from the speech decoding unit 4 and the modified decoded speech 34 input from the signal transformation unit 7, based on the addition control value 35 input from the signal evaluation unit 12, and outputs the resulting output speech 6. The weighting is controlled so that, as the addition control value 35 becomes larger (the background noise likelihood becomes higher), the weight for the decoded speech 5 is decreased and the weight for the modified decoded speech 34 is increased. Conversely, as the addition control value 35 becomes smaller (the background noise likelihood becomes lower), the weight for the decoded speech 5 is increased and the weight for the modified decoded speech 34 is decreased.

To suppress degradation of the output speech 6 caused by abrupt changes in the weights between frames, it is desirable to smooth the addition control value 35 or the weighting coefficients so that they change gradually from sample to sample.
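One simple way to make a weight change gradually from sample to sample is a linear ramp from the previous frame's weight to the current one. This is only one possible smoothing; the source does not specify the form, and the function name is hypothetical.

```python
def ramp_weights(w_prev, w_curr, n):
    """Linearly interpolate a weight from w_prev to w_curr over the n samples of a frame."""
    return [w_prev + (w_curr - w_prev) * (k + 1) / n for k in range(n)]

per_sample = ramp_weights(0.0, 1.0, 4)
```

Here `per_sample` is `[0.25, 0.5, 0.75, 1.0]`, so the weight reaches its new value at the last sample of the frame instead of jumping at the frame boundary.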

FIG. 2 shows examples of weighted addition control based on the addition control value in the weighted addition unit 18.

FIG. 2(a) shows linear control using two thresholds $v_1$ and $v_2$ on the addition control value 35. When the addition control value 35 is less than $v_1$, the weighting coefficient $w_S$ for the decoded speech 5 is set to 1 and the weighting coefficient $w_N$ for the modified decoded speech 34 is set to 0. When the addition control value 35 is $v_2$ or more, $w_S$ is set to 0 and $w_N$ is set to $A_N$. When the addition control value 35 is $v_1$ or more and less than $v_2$, $w_S$ is computed linearly between 1 and 0 and $w_N$ linearly between 0 and $A_N$.

With this control, only the modified decoded speech 34 is output when the section can be reliably judged to be background noise ($v_2$ or more), the decoded speech 5 itself is output when the section can be reliably judged to be speech (less than $v_1$), and, when it cannot be determined whether the section is speech or background noise ($v_1$ or more and less than $v_2$), a mixture of the decoded speech 5 and the modified decoded speech 34 is output in a ratio that depends on which tendency is stronger.
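The piecewise-linear control of FIG. 2(a) can be sketched as below. The function names and the example thresholds are illustrative assumptions; only the three-region rule comes from the text.

```python
def mixing_weights(v, v1, v2, a_n=1.0):
    """Return (w_S, w_N) for addition control value v, following FIG. 2(a)."""
    if v < v1:                       # reliably speech: decoded speech only
        return 1.0, 0.0
    if v >= v2:                      # reliably background noise: modified speech only
        return 0.0, a_n
    t = (v - v1) / (v2 - v1)         # position between the two thresholds
    return 1.0 - t, t * a_n

def weighted_add(decoded, modified, w_s, w_n):
    """Weighted sum of decoded speech and modified decoded speech, sample by sample."""
    return [w_s * x + w_n * xm for x, xm in zip(decoded, modified)]

w_s, w_n = mixing_weights(-1.5, v1=-2.0, v2=-1.0)
output = weighted_add([1.0, -1.0], [0.0, 2.0], w_s, w_n)
```

A control value halfway between the thresholds gives equal weights of 0.5, so `output` is `[0.5, 0.5]` for the two example samples.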

If a value of 1 or less is given as the weighting coefficient $A_N$ that multiplies the modified decoded speech 34 when the section can be reliably judged to be background noise ($v_2$ or more), an amplitude suppression effect is obtained for background noise sections. Conversely, if a value of 1 or more is given, an amplitude enhancement effect is obtained for background noise sections. The amplitude of background noise sections is often reduced by speech encoding and decoding; in that case, enhancing the amplitude of the background noise sections improves the reproducibility of the background noise. Whether to suppress or enhance the amplitude depends on the application, the user's requirements, and so on.

FIG. 2(b) shows a case where a new threshold $v_3$ is added and the weighting coefficients are computed linearly between $v_1$ and $v_3$ and between $v_3$ and $v_2$. By adjusting the values of the weighting coefficients at $v_3$, the mixing ratio can be set more finely for the case where it cannot be determined whether the section is speech or background noise ($v_1$ or more and less than $v_2$). In general, when two signals with low phase correlation are added, the power of the resulting signal is smaller than the sum of the powers of the two signals before addition. Making the sum of the two weighting coefficients in the range from $v_1$ to less than $v_2$ larger than 1 or $w_N$ suppresses this power reduction. The same effect can also be obtained by taking the square roots of the weighting coefficients obtained in FIG. 2(a), multiplying them by a constant, and using the results as the new weighting coefficients.
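The power argument can be made concrete with a small worked example. It assumes, for illustration, that the two signals are uncorrelated and have equal power, so that the power of the weighted sum is the sum of the squared weights times that power, and that A_N is 1. The function names are hypothetical.

```python
import math

def sqrt_weights(w_s, w_n, scale=1.0):
    """Square roots of the FIG. 2(a) weights, multiplied by a constant."""
    return scale * math.sqrt(w_s), scale * math.sqrt(w_n)

def mixed_power(w_s, w_n, p_s=1.0, p_n=1.0):
    """Power of w_s*s + w_n*n when s and n are uncorrelated (an assumption)."""
    return w_s ** 2 * p_s + w_n ** 2 * p_n

linear = (0.5, 0.5)                   # midpoint of FIG. 2(a) with A_N = 1
rooted = sqrt_weights(*linear)

power_linear = mixed_power(*linear)   # 0.25 + 0.25
power_rooted = mixed_power(*rooted)   # about 0.5 + 0.5
```

Under these assumptions the linear weights halve the power at the midpoint, whereas the square-root weights keep it at the level of either input.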

In FIG. 2(c), a value $B_N$ greater than 0 is given as the weighting coefficient $w_N$ applied to the modified decoded speech 34 in the range below $v_1$ of FIG. 2(a), and $w_N$ in the range from $v_1$ to less than $v_2$ is modified accordingly. When quantization noise or degraded sound in speech sections is large, for example when the background noise level is high or the compression ratio of the encoding is very high, adding the modified decoded speech in this way even in the range known with certainty to be a speech section makes the degraded sound less audible.

FIG. 2(d) is a control example for the case where the background noise likelihood calculation unit 15 outputs the result of dividing the estimated noise power by the current power ($p_N / p$) as the background noise likelihood (addition control value 35). In this case, the addition control value 35 indicates the proportion of background noise contained in the decoded speech 5, so the weighting coefficients are calculated so that the signals are mixed in a ratio proportional to this value. Specifically, when the addition control value 35 is 1 or more, $w_N$ is 1 and $w_S$ is 0; when it is less than 1, $w_N$ is the addition control value 35 itself and $w_S$ is $(1 - w_N)$.
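The ratio-based rule of FIG. 2(d) reduces to a clamp, as the following minimal sketch shows (the function name is hypothetical, and the power is assumed to be positive).

```python
def ratio_weights(noise_power, power):
    """Return (w_S, w_N) when the addition control value is p_N / p, as in FIG. 2(d)."""
    ratio = noise_power / power
    w_n = min(ratio, 1.0)            # a ratio of 1 or more is clamped to 1
    return 1.0 - w_n, w_n

speechy = ratio_weights(noise_power=1.0, power=4.0)
noisy = ratio_weights(noise_power=2.0, power=1.0)
```

A frame whose power is four times the noise estimate gets `(0.75, 0.25)`, and a frame at or below the noise estimate gets `(0.0, 1.0)`.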

FIG. 3 is an explanatory diagram showing actual examples of the shape of the cut-out window in the Fourier transform unit 8 and of the connection window in the inverse Fourier transform unit 11, together with their time relationship to the decoded speech 5.

The decoded speech 5 is output from the speech decoding unit 4 at every predetermined time length (one frame length). Let this frame length be N samples. FIG. 3(a) shows an example of the decoded speech 5, where x(0) to x(N−1) is the decoded speech 5 of the current input frame. The Fourier transform unit 8 cuts out a signal of length (N + NX) by multiplying the decoded speech 5 shown in FIG. 3(a) by the modified trapezoidal window shown in FIG. 3(b). NX is the length of each of the sections at the two ends of the modified trapezoidal window whose values are less than 1. These end sections are equal to the first and second halves of a Hanning window of length 2NX. The inverse Fourier transform unit 11 multiplies the signal generated by the inverse Fourier transform by the modified trapezoidal window shown in FIG. 3(c) and adds it to the corresponding signals obtained for the preceding and following frames while preserving their time relationship (as shown by the broken lines in FIG. 3(c)), generating the continuous modified decoded speech 34 (FIG. 3(d)).
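The modified trapezoidal window described here can be built as in the sketch below. The exact Hanning formula (with a half-sample offset so that the two halves are mirror images) is an assumption; the source states only that the end sections are the two halves of a Hanning window of length 2NX.

```python
import math

def modified_trapezoid_window(n, nx):
    """Window of length n + nx: half-Hanning rise (nx), flat top of 1 (n - nx), half-Hanning fall (nx)."""
    hann = [0.5 - 0.5 * math.cos(2.0 * math.pi * (k + 0.5) / (2 * nx))
            for k in range(2 * nx)]
    return hann[:nx] + [1.0] * (n - nx) + hann[nx:]

N, NX = 8, 4
window = modified_trapezoid_window(N, NX)

# With a frame advance of N samples, the falling edge of one frame overlaps
# the rising edge of the next; the two edges sum to 1 at every sample.
overlap_sums = [window[k] + window[N + k] for k in range(NX)]
```

The edges sum to 1 when the window is applied once. When the same window is applied both before the Fourier transform and after the inverse transform, the squared edges sum to less than 1 in the overlap, which is one way to see the amplitude reduction at the connection portions that the description mentions for FIG. 3.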

For the section (length NX) used for connection with the signal of the next frame, the modified decoded speech 34 is not yet finalized at the time of the current frame. That is, the newly finalized modified decoded speech 34 is x′(−NX) to x′(N−NX−1). The output speech 6 obtained for the decoded speech 5 of the current frame is therefore given by the following equation.

$$ y(n) = x(n) + x'(n) \qquad (n = -NX, \ldots, N-NX-1) \qquad \text{(Equation 5)} $$
Here, $y(n)$ is the output speech 6. In this case, the signal processing unit 2 requires a processing delay of at least NX.

For applications in which this processing delay NX is not acceptable, the output speech 6 can instead be generated as in the following equation, tolerating a time offset between the decoded speech 5 and the modified decoded speech 34.

$$ y(n) = x(n) + x'(n-NX) \qquad (n = 0, \ldots, N-1) \qquad \text{(Equation 6)} $$
In this case, because the decoded speech 5 and the modified decoded speech 34 are offset in time, degradation may occur when the disturbance in the phase disturbance unit 10 is weak (that is, when the phase characteristics of the decoded speech remain to some extent) or when the spectrum or power changes abruptly within a frame. Degradation is especially likely when the weighting coefficients in the weighted addition unit 18 change greatly and when the two weighting coefficients are comparable. However, this degradation is relatively small, and the benefit of introducing the signal processing unit is sufficiently larger. The method can therefore also be used for applications in which the processing delay NX is not acceptable.
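The difference between Equations 5 and 6 is only in which decoded-speech samples are paired with the N newly finalized modified samples x′(−NX) to x′(N−NX−1). The sketch below assumes these are held in a list `modified_final` and that the last NX decoded samples of the previous frame are kept in `prev_tail`; both names, and the default weights of 1 that mirror the equations as written, are illustrative assumptions.

```python
def output_aligned(prev_tail, frame, modified_final, w_s=1.0, w_n=1.0):
    """Equation 5: time-aligned sum; the output lags the input by NX = len(prev_tail)."""
    nx = len(prev_tail)
    delayed = prev_tail + frame[:len(frame) - nx]     # x(-NX) ... x(N-NX-1)
    return [w_s * x + w_n * xm for x, xm in zip(delayed, modified_final)]

def output_low_delay(frame, modified_final, w_s=1.0, w_n=1.0):
    """Equation 6: no added delay; x(n) is paired with x'(n - NX)."""
    return [w_s * x + w_n * xm for x, xm in zip(frame, modified_final)]

prev_tail = [10, 20]                      # x(-2), x(-1), so NX = 2
frame = [1, 2, 3, 4]                      # x(0) ... x(3), so N = 4
modified_final = [100, 200, 300, 400]     # x'(-2) ... x'(1)

aligned = output_aligned(prev_tail, frame, modified_final)
low_delay = output_low_delay(frame, modified_final)
```

`aligned` pairs each modified sample with the decoded sample from the same instant and so emits samples for n = −2 to 1, while `low_delay` emits samples for n = 0 to 3 at the cost of the time offset discussed above.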

In the case of FIG. 3, the modified trapezoidal window is applied both before the Fourier transform and after the inverse Fourier transform, which may reduce the amplitude at the connection portions. This amplitude reduction is also likely to occur when the disturbance in the phase disturbance unit 10 is weak. In such a case, the amplitude reduction can be suppressed by changing the window before the Fourier transform to a rectangular window. Normally, the phase is altered so greatly by the phase disturbance unit 10 that the shape of the first modified trapezoidal window does not appear in the signal after the inverse Fourier transform, so the second windowing is needed for smooth connection with the modified decoded speech 34 of the preceding and following frames.

Here, the processing of the signal transformation unit 7, the signal evaluation unit 12, and the weighted addition unit 18 is all performed frame by frame, but this is not a limitation. For example, one frame may be divided into a plurality of subframes, with the processing of the signal evaluation unit 12 performed for each subframe to calculate an addition control value 35 per subframe, and the weighting control in the weighted addition unit 18 also performed per subframe. Because the signal transformation uses a Fourier transform, too short a frame length makes the analysis of the spectral characteristics unstable, and the modified decoded speech 34 tends to be unstable. The background noise likelihood, on the other hand, can be calculated relatively stably even over shorter sections, so calculating it per subframe and controlling the weighting finely improves quality at speech onsets and similar portions.

The processing of the signal evaluation unit 12 may also be performed for each subframe, and all the addition control values in a frame combined to calculate a small number of addition control values 35. To avoid misjudging a speech section as background noise, the minimum of all the addition control values (the minimum background noise likelihood) may be selected and output as the addition control value 35 representing the frame.

Furthermore, the frame length of the decoded speech 5 and the processing frame length of the signal transformation unit 7 need not be the same. For example, if the frame length of the decoded speech 5 is too short for the spectral analysis in the signal transformation unit 7, several frames of decoded speech 5 may be accumulated and the signal transformation applied to them together. In this case, however, a processing delay arises from accumulating several frames of decoded speech 5. Alternatively, the processing frame length of the signal transformation unit 7 or of the entire signal processing unit 2 may be set completely independently of the frame length of the decoded speech 5. Signal buffering then becomes more complicated, but the processing frame length best suited to the signal processing can be selected regardless of the various frame lengths of the decoded speech 5, so the quality obtained from the signal processing unit 2 is the highest.

Here, the inverse filter unit 13, the power calculation unit 14, the background noise likelihood calculation unit 15, the estimated background noise level update unit 16, and the estimated noise spectrum update unit 17 are used to calculate the background noise likelihood, but the configuration is not limited to this as long as it evaluates the background noise likelihood.

According to Embodiment 1, predetermined signal processing is applied to the input signal (decoded speech) to generate a processed signal (modified decoded speech) in which the degradation components contained in the input signal are subjectively less noticeable, and the addition weights of the input signal and the processed signal are controlled by a predetermined evaluation value (background noise likelihood). The proportion of the processed signal is therefore increased mainly in sections that contain many degradation components, which improves subjective quality.

In addition, because the signal processing is performed in the spectral domain, fine-grained suppression of degradation components is possible in the spectral domain, which further improves subjective quality.

In addition, because the processing consists of smoothing the amplitude spectrum components and applying a disturbance to the phase spectrum components, unstable fluctuations of the amplitude spectrum components caused by quantization noise and the like can be suppressed well. Furthermore, quantization noise often acquires a peculiar correlation among its phase components and is then perceived as a characteristic degradation; disturbing the relationship among the phase components counters this and improves subjective quality.

In addition, the conventional binary decision between speech section and background noise section is abandoned; instead a continuous quantity, the background noise likelihood, is calculated, and the weighted addition coefficients of the decoded speech and the modified decoded speech are controlled continuously on that basis. Quality degradation caused by section decision errors can therefore be avoided.

In addition, when the quantization noise or degraded sound in a speech section is large, the modified decoded speech can be added even in a section known with certainty to be a speech section, which makes the degraded sound harder to hear.

In addition, because the output speech is generated by processing the decoded speech, which contains much information about the background noise, a stable quality improvement that depends little on the noise type or spectral shape is obtained while the characteristics of the actual background noise are retained, and an improvement is also obtained for degradation components caused by excitation coding and the like.

In addition, because the processing uses only the decoded speech up to the present, no particularly large delay is required, and depending on how the decoded speech and the modified decoded speech are added, delay other than the processing time can be eliminated. Because the level of the decoded speech is lowered as the level of the modified decoded speech is raised, there is no need to superimpose large pseudo noise to mask the quantization noise as in the prior art; on the contrary, the background noise level can even be made somewhat lower or higher depending on the application. Naturally, because the processing is closed within the speech decoding apparatus or the signal processing unit, no new transmission information needs to be added, unlike the prior art.

Furthermore, in Embodiment 1 the speech decoding unit and the signal processing unit are clearly separated and little information is exchanged between them, so the method is easy to introduce into a variety of speech decoding apparatuses, including existing ones.

Embodiment 2.
FIG. 4 shows part of the configuration of a sound signal processing apparatus in which the sound signal processing method according to this embodiment is applied in combination with a noise suppression method. In the figure, 36 is an input signal, 8 is a Fourier transform unit, 19 is a noise suppression unit, 39 is a spectrum transformation unit, 12 is a signal evaluation unit, 18 is a weighted addition unit, 11 is an inverse Fourier transform unit, and 40 is an output signal. The spectrum transformation unit 39 comprises an amplitude smoothing unit 9 and a phase disturbance unit 10.
The operation is described below with reference to the figure.

First, the input signal 36 is input to the Fourier transform unit 8 and the signal evaluation unit 12.

The Fourier transform unit 8 applies a window to the signal formed from the input signal 36 of the current frame and, where necessary, the most recent portion of the input signal 36 of the previous frame, calculates the spectral component at each frequency by applying a Fourier transform to the windowed signal, and outputs these components to the noise suppression unit 19. The Fourier transform and windowing are the same as in Embodiment 1.

The noise suppression unit 19 subtracts the estimated noise spectrum stored inside the noise suppression unit 19 from the per-frequency spectral components input from the Fourier transform unit 8, and outputs the result as the noise suppression spectrum 37 to the weighted addition unit 18 and to the amplitude smoothing unit 9 in the spectrum transformation unit 39. This corresponds to the main part of so-called spectral subtraction. The noise suppression unit 19 then determines whether the current section is a background noise section and, if so, updates its internal estimated noise spectrum using the per-frequency spectral components input from the Fourier transform unit 8. This determination can be simplified by reusing the output of the signal evaluation unit 12, described later.
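
A minimal sketch of this step, under stated assumptions: the text only says that the estimated noise spectrum is subtracted and later updated, so the magnitude-domain subtraction with a floor, the reuse of the input phase, and the exponential-average update (with the hypothetical parameter `rate`) are common choices assumed here for illustration, not details taken from the description.

```python
import cmath


def spectral_subtract(spectrum, noise_mag, floor=0.0):
    """Subtract an estimated noise magnitude from each complex bin.

    Assumption: subtraction is done on magnitudes, clipped at `floor`,
    and the phase of the input bin is kept unchanged.
    """
    out = []
    for x, n in zip(spectrum, noise_mag):
        mag = max(abs(x) - n, floor)
        out.append(cmath.rect(mag, cmath.phase(x)))
    return out


def update_noise(noise_mag, spectrum, rate=0.1):
    """Update the noise estimate in a background noise section.

    Assumption: a simple exponential average toward the current magnitudes.
    """
    return [(1.0 - rate) * n + rate * abs(x) for n, x in zip(noise_mag, spectrum)]
```

In use, `update_noise` would be called only for frames judged to be background noise, for example by thresholding the background noise likelihood from the signal evaluation unit 12.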

The amplitude smoothing unit 9 in the spectrum transformation unit 39 smooths the amplitude components of the noise suppression spectrum 37 input from the noise suppression unit 19 and outputs the smoothed noise suppression spectrum to the phase disturbance unit 10. Smoothing along either the frequency axis or the time axis suppresses the degraded sound produced by the noise suppression unit. The specific smoothing methods of Embodiment 1 can be used.

The phase disturbance unit 10 in the spectrum transformation unit 39 applies a disturbance to the phase components of the smoothed noise suppression spectrum input from the amplitude smoothing unit 9 and outputs the disturbed spectrum as the modified noise suppression spectrum 38 to the weighted addition unit 18. The methods of Embodiment 1 can be used to disturb each phase component.

The signal evaluation unit 12 analyzes the input signal 36 to calculate the background noise likelihood and outputs it as the addition control value 35 to the weighted addition unit 18. The configuration and processing of the signal evaluation unit 12 can be the same as in Embodiment 1.

Based on the addition control value 35 input from the signal evaluation unit 12, the weighted addition unit 18 weights and adds the noise suppression spectrum 37 input from the noise suppression unit 19 and the modified noise suppression spectrum 38 input from the spectrum transformation unit 39, and outputs the resulting spectrum to the inverse Fourier transform unit 11. As in Embodiment 1, the weighted addition is controlled so that, as the addition control value 35 increases (higher background noise likelihood), the weight for the noise suppression spectrum 37 decreases and the weight for the modified noise suppression spectrum 38 increases. Conversely, as the addition control value 35 decreases (lower background noise likelihood), the weight for the noise suppression spectrum 37 increases and the weight for the modified noise suppression spectrum 38 decreases.
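
The weighted addition can be sketched as below. The text fixes only the direction of the control (more noise-like means more weight on the modified spectrum); the linear mapping, the clipping of the control value to the range 0 to 1, and the complementary weights are assumptions made for this illustration.

```python
def weighted_add(original, modified, control):
    """Blend two spectra bin by bin according to an addition control value.

    Assumption: `control` is a background noise likelihood mapped linearly
    to the weight of the modified spectrum, clipped to [0, 1]; the original
    spectrum receives the complementary weight.
    """
    w_mod = min(max(control, 0.0), 1.0)
    w_orig = 1.0 - w_mod
    return [w_orig * a + w_mod * b for a, b in zip(original, modified)]
```

With `control = 0.0` the output equals the noise suppression spectrum, and with `control = 1.0` it equals the modified noise suppression spectrum. A mapping that keeps some weight on the modified spectrum even at low control values would correspond to the case where degraded sound is also audible outside background noise sections.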

Finally, the inverse Fourier transform unit 11 applies an inverse Fourier transform to the spectrum input from the weighted addition unit 18 to return it to the signal domain, joins it to the preceding and following frames while applying a window for smooth concatenation, and outputs the resulting signal as the output signal 40. The windowing and concatenation are the same as in Embodiment 1.

According to Embodiment 2, predetermined processing is applied to a spectrum degraded by noise suppression or the like to generate a processed spectrum (modified noise suppression spectrum) in which the degradation components are subjectively less noticeable, and the addition weights of the unprocessed spectrum and the processed spectrum are controlled by a predetermined evaluation value (background noise likelihood). The proportion of the processed spectrum is therefore increased mainly in sections (background noise sections) where many degradation components are present and lower the subjective quality, which improves subjective quality.

In addition, because the weighted addition is performed in the spectral domain, the Fourier transform and inverse Fourier transform that Embodiment 1 needs for the processing become unnecessary, which simplifies the processing. The Fourier transform unit 8 and the inverse Fourier transform unit 11 in Embodiment 2 are required for the noise suppression unit 19 in any case.

In addition, because the processing consists of smoothing the amplitude spectrum components and applying a disturbance to the phase spectrum components, unstable fluctuations of the amplitude spectrum components caused by quantization noise and the like can be suppressed well. Furthermore, quantization noise and degradation components often acquire a peculiar correlation among their phase components and are then perceived as a characteristic degradation; disturbing the relationship among the phase components counters this and improves subjective quality.

In addition, instead of a binary decision on whether the section is a background noise section, a continuous quantity, the background noise likelihood, is calculated and the weighted addition coefficients are controlled continuously on that basis, so quality degradation caused by section decision errors can be avoided.

In addition, when the degraded sound outside background noise sections is large, performing weighted addition as in FIG. 2(c) adds the modified noise suppression spectrum even in sections known with certainty not to be background noise sections, which makes the degraded sound harder to hear.

In addition, because the modified noise suppression spectrum is generated by applying simple processing directly to the noise suppression spectrum, a stable quality improvement that depends little on the noise type or spectral shape is obtained.

In addition, because the processing uses only the noise suppression spectrum up to the present, no large delay is needed beyond the delay of the noise suppression unit 19. Because the addition level of the original noise suppression spectrum is lowered as the addition level of the modified noise suppression spectrum is raised, there is no need to superimpose relatively large noise to mask the quantization noise, and the background noise level can be kept low. Naturally, even when this processing is used as preprocessing for speech coding, it is closed within the coding unit, so no new transmission information needs to be added, unlike the prior art.

Embodiment 3.
FIG. 5, in which parts corresponding to those in FIG. 1 bear the same reference numerals, shows the overall configuration of a speech decoding apparatus to which the sound signal processing method according to this embodiment is applied. In the figure, 20 is a deformation strength control unit that outputs information for controlling the deformation strength of the signal transformation unit 7. The deformation strength control unit 20 comprises an auditory weighting unit 21, a Fourier transform unit 22, a level determination unit 23, a continuity determination unit 24, and a deformation strength calculation unit 25.

The decoded speech 5 output from the speech decoding unit 4 is input to the signal transformation unit 7, the deformation strength control unit 20, the signal evaluation unit 12, and the weighted addition unit 18 in the signal processing unit 2.

The auditory weighting unit 21 in the deformation strength control unit 20 applies auditory weighting to the decoded speech 5 input from the speech decoding unit 4 and outputs the resulting auditory-weighted speech to the Fourier transform unit 22. The auditory weighting is the same kind of processing as that used in the speech coding (the counterpart of the speech decoding performed in the speech decoding unit 4).

The auditory weighting commonly used in coding schemes such as CELP analyzes the speech to be coded to calculate linear prediction coefficients (LPC), multiplies them by constants to obtain two modified sets of LPC, builds an ARMA filter with the two modified LPC sets as its filter coefficients, and performs auditory weighting by filtering with this filter. To apply the same auditory weighting to the decoded speech 5 as in the coding, the two modified LPC sets are derived from the LPC obtained by decoding the received speech code 3, or from LPC calculated by re-analyzing the decoded speech 5, and used to build the auditory weighting filter.
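
The following sketch illustrates one common form of such a filter. It assumes the weighting filter W(z) = A(z/γ1) / A(z/γ2) with A(z) = 1 + Σ a_i z^-i, in which the i-th coefficient is scaled by γ^i; this form, the sign convention, and the default values of `gamma1` and `gamma2` are illustrative assumptions typical of CELP coders, not values given in this description.

```python
def bandwidth_expand(lpc, gamma):
    """Scale coefficient a_i by gamma**i (lpc holds a_1..a_p, without the leading 1)."""
    return [a * gamma ** (i + 1) for i, a in enumerate(lpc)]


def perceptual_weight(signal, lpc, gamma1=0.9, gamma2=0.6):
    """Filter `signal` with the ARMA filter A(z/gamma1) / A(z/gamma2).

    The numerator (moving-average part) and the denominator (autoregressive
    part) are both derived from the same LPC set, as the text describes.
    """
    num = bandwidth_expand(lpc, gamma1)
    den = bandwidth_expand(lpc, gamma2)
    out = []
    for n, x in enumerate(signal):
        y = x
        for i, b in enumerate(num, start=1):
            if n - i >= 0:
                y += b * signal[n - i]   # MA part uses past inputs
        for i, a in enumerate(den, start=1):
            if n - i >= 0:
                y -= a * out[n - i]      # AR part uses past outputs
        out.append(y)
    return out
```

When `gamma1` equals `gamma2` the numerator and denominator cancel and the filter passes the signal unchanged, which is a convenient sanity check on an implementation.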

In coding schemes such as CELP, coding minimizes the distortion of the auditory-weighted speech, so spectral components with large amplitude in the auditory-weighted speech carry little superimposed quantization noise. Accordingly, if speech close to the auditory-weighted speech at the time of coding can be generated in the decoding unit 1, it is useful as control information for the deformation strength in the signal transformation unit 7.

When the speech decoding in the speech decoding unit 4 includes processing such as a spectral postfilter (as almost all CELP decoders do), speech close to the auditory-weighted speech at the time of coding would strictly be obtained by first generating speech from which the influence of such processing has been removed from the decoded speech 5, or by extracting the speech immediately before such processing from within the speech decoding unit 4, and then applying auditory weighting to that speech. However, when the main purpose is to improve quality in background noise sections, the influence of processing such as a spectral postfilter in those sections is small, and leaving it in place makes little difference to the effect. Embodiment 3 therefore does not remove the influence of processing such as a spectral postfilter.

Naturally, the auditory weighting unit 21 is unnecessary when the coding uses no auditory weighting or when its effect is small enough to ignore. In that case the output of the Fourier transform unit 8 in the signal transformation unit 7 can be supplied to the level determination unit 23 and the continuity determination unit 24, described later, so the Fourier transform unit 22 can also be omitted.

Furthermore, there are methods that give an effect close to auditory weighting in the spectral domain, such as nonlinear amplitude conversion. When the difference from the auditory weighting used in the coding can be ignored, the output of the Fourier transform unit 8 in the signal transformation unit 7 may be used as the input to the auditory weighting unit 21, which then performs auditory weighting in the spectral domain on this input and outputs the auditory-weighted spectrum to the level determination unit 23 and the continuity determination unit 24, described later, with the Fourier transform unit 22 omitted.

The Fourier transform unit 22 in the deformation strength control unit 20 applies a window to the signal formed from the auditory-weighted speech input from the auditory weighting unit 21 and, where necessary, the most recent portion of the auditory-weighted speech of the previous frame, calculates the spectral component at each frequency by applying a Fourier transform to the windowed signal, and outputs these as the auditory-weighted spectrum to the level determination unit 23 and the continuity determination unit 24. The Fourier transform and windowing are the same as in the Fourier transform unit 8 of Embodiment 1.

The level determination unit 23 calculates a first deformation strength for each frequency based on the magnitude of each amplitude component of the auditory-weighted spectrum input from the Fourier transform unit 22, and outputs it to the deformation strength calculation unit 25. The smaller an amplitude component of the auditory-weighted spectrum, the larger the proportion of quantization noise, so the first deformation strength should be made stronger. In the simplest case, the average of all amplitude components is obtained and a predetermined threshold Th is added to it; the first deformation strength is set to 0 for components above this value and to 1 for components below it. FIG. 6 shows the relationship between the auditory-weighted spectrum and the first deformation strength when this threshold Th is used. The method of calculating the first deformation strength is not limited to this.
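
The simplest rule described above can be sketched as follows. The function name `level_strength` is hypothetical, and the handling of a component exactly equal to the limit (treated here as "below") is an assumption, since the text does not specify it.

```python
def level_strength(amplitudes, th=0.0):
    """First deformation strength per frequency from amplitude levels.

    The limit is the mean amplitude plus the threshold `th`; components
    above the limit get strength 0, all others get strength 1.
    """
    limit = sum(amplitudes) / len(amplitudes) + th
    return [0.0 if a > limit else 1.0 for a in amplitudes]
```

A negative `th` lowers the limit, so fewer components are marked for deformation; a positive `th` has the opposite effect.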

The continuity determination unit 24 evaluates the continuity in the time direction of each amplitude component or each phase component of the auditory-weighted spectrum input from the Fourier transform unit 22, calculates a second deformation strength for each frequency based on the result, and outputs it to the deformation strength calculation unit 25. For frequency components whose amplitude has low continuity in the time direction, or whose phase has low continuity (after compensating for the phase rotation caused by the time shift between frames), it is unlikely that good coding was achieved, so the second deformation strength is made stronger. In the simplest case, the second deformation strength can likewise be set to 0 or 1 by comparison with a predetermined threshold.
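
A possible sketch of the amplitude-based variant is shown below. The text does not define the continuity measure, so the relative frame-to-frame amplitude change and the threshold parameter `max_ratio` are assumptions chosen for illustration; the phase-based variant with rotation compensation is not covered.

```python
def continuity_strength(prev_amp, cur_amp, max_ratio=0.5):
    """Second deformation strength per frequency from amplitude continuity.

    Assumption: continuity is judged by the relative change between the
    previous and current frame; a change above `max_ratio` counts as
    discontinuous and yields strength 1, otherwise 0.
    """
    out = []
    for p, c in zip(prev_amp, cur_amp):
        ref = max(p, c)
        change = abs(c - p) / ref if ref > 0.0 else 0.0
        out.append(1.0 if change > max_ratio else 0.0)
    return out
```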

The deformation strength calculation unit 25 calculates the final deformation strength for each frequency from the first deformation strength input from the level determination unit 23 and the second deformation strength input from the continuity determination unit 24, and outputs it to the amplitude smoothing unit 9 and the phase disturbance unit 10 in the signal transformation unit 7. The final deformation strength can be, for example, the minimum, a weighted average, or the maximum of the first and second deformation strengths. This concludes the description of the operation of the deformation strength control unit 20 newly added in Embodiment 3.
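
The three combination rules named above can be sketched in one helper. The function name, the `mode` strings, and the weight parameter `w` are hypothetical and serve only to illustrate the options.

```python
def combine_strengths(first, second, mode="max", w=0.5):
    """Combine first and second deformation strengths per frequency.

    mode "min":  deform only where both criteria agree
    mode "max":  deform where either criterion fires
    mode "mean": weighted average, with weight `w` on the first strength
    """
    if mode == "min":
        return [min(a, b) for a, b in zip(first, second)]
    if mode == "max":
        return [max(a, b) for a, b in zip(first, second)]
    if mode == "mean":
        return [w * a + (1.0 - w) * b for a, b in zip(first, second)]
    raise ValueError("unknown mode: " + mode)
```

Choosing the minimum is the most conservative option, since a component is deformed only when it is both low in level and discontinuous; the maximum deforms the most components.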

Next, the components whose operation changes with the addition of the deformation strength control unit 20 are described.

The amplitude smoothing unit 9 smooths the amplitude components of the per-frequency spectrum input from the Fourier transform unit 8 in accordance with the deformation strength input from the deformation strength control unit 20, and outputs the smoothed spectrum to the phase disturbance unit 10. The smoothing is controlled so that it is stronger for frequency components with a higher deformation strength. The simplest way to control the degree of smoothing is to smooth only when the input deformation strength is large. Various other methods can also be used to strengthen the smoothing, such as reducing the smoothing coefficient α in the smoothing equation described in Embodiment 1, or generating the final spectrum as a weighted sum of the spectrum after fixed smoothing and the spectrum before smoothing and reducing the weight for the spectrum before smoothing.
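
The weighted-sum variant can be sketched as follows. The fixed smoother used here, a three-point moving average along the frequency axis, is an assumption for illustration; the smoothing equation of Embodiment 1 is not reproduced in this section.

```python
def smooth_amplitudes(amps, strength):
    """Blend each amplitude with a fixed-smoothed version of itself.

    Assumption: the fixed smoothing is a 3-point moving average over
    neighboring frequency bins (shorter at the edges). A strength of 0
    keeps the original amplitude; a strength of 1 uses the smoothed one.
    """
    n = len(amps)
    out = []
    for k in range(n):
        lo, hi = max(k - 1, 0), min(k + 1, n - 1)
        window = amps[lo:hi + 1]
        smoothed = sum(window) / len(window)
        s = min(max(strength[k], 0.0), 1.0)
        out.append((1.0 - s) * amps[k] + s * smoothed)
    return out
```

With 0-or-1 strengths this reduces to the simplest scheme in the text, where smoothing is applied only to the bins marked for deformation.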

The phase disturbance unit 10 applies a disturbance to the phase components of the smoothed spectrum input from the amplitude smoothing unit 9 in accordance with the deformation strength input from the deformation strength control unit 20, and outputs the disturbed spectrum to the inverse Fourier transform unit 11. The disturbance is controlled so that it is larger for frequency components with a higher deformation strength. The simplest way to control the magnitude of the disturbance is to apply it only when the input deformation strength is large. Various other methods can also be used, such as widening or narrowing the range of the phase angles generated from random numbers.
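
The range-scaling variant can be sketched as below. The uniform distribution, the maximum angle `max_angle`, and the linear scaling of the range by the deformation strength are assumptions for illustration.

```python
import cmath
import math
import random


def perturb_phase(spectrum, strength, max_angle=math.pi, rng=None):
    """Rotate each complex bin by a random angle scaled by its strength.

    Assumption: the angle is drawn uniformly from
    [-max_angle * s, +max_angle * s], where s is the strength clipped
    to [0, 1]. Magnitudes are left unchanged.
    """
    rng = rng or random.Random()
    out = []
    for x, s in zip(spectrum, strength):
        s = min(max(s, 0.0), 1.0)
        delta = rng.uniform(-max_angle * s, max_angle * s)
        out.append(x * cmath.exp(1j * delta))
    return out
```

Passing a seeded `random.Random` instance as `rng` makes the disturbance reproducible, which is useful when comparing processed outputs.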

The other components are the same as in Embodiment 1, so their description is omitted.

Although the outputs of both the level determination unit 23 and the continuity determination unit 24 are used here, a configuration that uses only one of them and omits the other is also possible. The deformation strength may also control only one of the amplitude smoothing unit 9 and the phase disturbance unit 10.

According to Embodiment 3, the deformation strength used when generating the processed signal (modified decoded speech) is controlled for each frequency based on the amplitude of each frequency component of the input signal (decoded speech) or of the auditory-weighted input signal (decoded speech), and on the continuity of the amplitude or phase at each frequency. In addition to the effects of Embodiment 1, processing is therefore concentrated on components in which quantization noise or degradation components dominate because the amplitude spectrum component is small, and on components in which quantization noise or degradation components tend to be large because the continuity of the spectral component is low. Good components with little quantization noise or degradation are no longer processed, so quantization noise and degradation components can be subjectively suppressed while the characteristics of the input signal and of the actual background noise are retained comparatively well, which improves subjective quality.

Embodiment 4.
FIG. 7, in which parts corresponding to those in FIG. 5 bear the same reference numerals, shows the overall configuration of a speech decoding apparatus to which the sound signal processing method according to this embodiment is applied. In the figure, 41 is an addition control value dividing unit, and the signal transformation unit 7 of FIG. 5 is replaced by a Fourier transform unit 8, a spectrum transformation unit 39, and an inverse Fourier transform unit 11.

音声復号部４から出力された復号音声５は、信号加工部２内のフーリエ変換部８、変形強度制御部２０、信号評価部１２に入力される。 The decoded speech 5 output from the speech decoding unit 4 is input to the Fourier transform unit 8, the deformation strength control unit 20, and the signal evaluation unit 12 in the signal processing unit 2.

フーリエ変換部８は、実施の形態２と同様にして、入力された現フレームの復号音声５と必要に応じ前フレームの復号音声５の最新部分を合わせた信号に対して、窓がけを行い、窓がけ後の信号に対してフーリエ変換処理を行うことで周波数毎のスペクトル成分を算出し、これを復号音声スペクトル４３として重み付き加算部１８とスペクトル変形部３９内の振幅平滑化部９に出力する。 In the same manner as in the second embodiment, the Fourier transform unit 8 performs windowing on a signal obtained by combining the input decoded speech 5 of the current frame and the latest part of the decoded speech 5 of the previous frame as necessary. A spectrum component for each frequency is calculated by performing Fourier transform processing on the signal after windowing, and this is output as a decoded speech spectrum 43 to the weighted addition unit 18 and the amplitude smoothing unit 9 in the spectrum transformation unit 39. To do.

スペクトル変形部３９は、実施の形態２と同様にして、入力された復号音声スペクトル４３に対して、振幅平滑化部９、位相擾乱部１０の処理を順に行い、得られたスペクトルを変形復号音声スペクトル４４として、重み付き加算部１８に出力する。 In the same manner as in the second embodiment, the spectrum modification unit 39 performs the processes of the amplitude smoothing unit 9 and the phase disturbance unit 10 on the input decoded speech spectrum 43 in order, and converts the obtained spectrum into the modified decoded speech. The spectrum 44 is output to the weighted addition unit 18.

変形強度制御部２０内では、実施の形態３と同様に、入力された復号音声５に対して、聴覚重み付け部２１、フーリエ変換部２２、レベル判定部２３、連続性判定部２４、変形強度算出部２５の処理を順次行い、得られた周波数毎の変形強度を加算制御値分割部４１に出力する。 In the deformation intensity control unit 20, the auditory weighting unit 21, the Fourier transform unit 22, the level determination unit 23, the continuity determination unit 24, and the deformation intensity calculation are performed on the input decoded speech 5 as in the third embodiment. The processing of the unit 25 is sequentially performed, and the obtained deformation strength for each frequency is output to the addition control value dividing unit 41.

なお、実施の形態３と同様に、符号化処理において聴覚重み付けを行っていない場合や、その効果が小さい場合には、聴覚重み付け部２１とフーリエ変換部２２は不要となる。その場合、フーリエ変換部８の出力を、レベル判定部２３と連続性判定部２４に与えればよい。 As in the third embodiment, the auditory weighting unit 21 and the Fourier transform unit 22 are unnecessary when auditory weighting is not performed in the encoding process or when its effect is small. In that case, the output of the Fourier transform unit 8 can simply be supplied to the level determination unit 23 and the continuity determination unit 24.

また、フーリエ変換部８の出力をこの聴覚重み付け部２１への入力とし、聴覚重み付け部２１がこの入力に対してスペクトル領域での聴覚重み付けを行い、フーリエ変換部２２を省略して、後述するレベル判定部２３と連続性判定部２４に聴覚重み付けされたスペクトルを出力するように構成することも可能である。この様に構成することで、処理の簡易化効果が得られる。 Alternatively, the output of the Fourier transform unit 8 may be used as the input to the auditory weighting unit 21, which then performs auditory weighting on this input in the spectral domain. The Fourier transform unit 22 is then omitted, and the auditory-weighted spectrum is output to the level determination unit 23 and the continuity determination unit 24. This configuration simplifies the processing.

信号評価部１２は、実施の形態１と同様に、入力された復号音声５に対して、背景雑音らしさを求めて、これを加算制御値３５として加算制御値分割部４１に出力する。 Similarly to the first embodiment, the signal evaluation unit 12 obtains the likelihood of background noise for the input decoded speech 5 and outputs this to the addition control value division unit 41 as the addition control value 35.

新たに加えられた加算制御値分割部４１は、変形強度制御部２０から入力された周波数毎の変形強度と、信号評価部１２から入力された加算制御値３５を用いて、周波数毎の加算制御値４２を生成し、これを重み付き加算部１８に出力する。変形強度が強い周波数については、その周波数の加算制御値４２の値を制御して、重み付き加算部１８における復号音声スペクトル４３の重みを弱く、変形復号音声スペクトル４４の重みを強くする。逆に変形強度が弱い周波数については、その周波数の加算制御値４２の値を制御して、重み付き加算部１８における復号音声スペクトル４３の重みを強く、変形復号音声スペクトル４４の重みを弱くする。つまり、変形強度が強い周波数については、背景雑音らしさが高いわけであるので、その周波数の加算制御値４２を大きくし、逆の場合には、小さくするわけである。 The newly added addition control value dividing unit 41 uses the deformation strength for each frequency input from the deformation strength control unit 20 and the addition control value 35 input from the signal evaluation unit 12 to perform addition control for each frequency. A value 42 is generated and output to the weighted addition unit 18. For a frequency having a high deformation strength, the value of the addition control value 42 of the frequency is controlled so that the weight of the decoded speech spectrum 43 in the weighted addition unit 18 is weakened and the weight of the modified decoded speech spectrum 44 is increased. On the other hand, for a frequency with a weak deformation strength, the value of the addition control value 42 for that frequency is controlled to increase the weight of the decoded speech spectrum 43 in the weighted addition unit 18 and weaken the weight of the modified decoded speech spectrum 44. In other words, since the frequency of strong deformation is high in background noise, the addition control value 42 for the frequency is increased, and in the opposite case, it is decreased.
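
The role of the addition control value dividing unit 41 can be illustrated with a minimal Python sketch. The text only requires that a stronger deformation strength yields a larger per-frequency addition control value 42. The product rule, the [0, 1] value range, and the name `split_addition_control` are assumptions made for illustration and are not taken from the specification.

```python
def split_addition_control(strengths, control_value):
    """Derive per-frequency addition control values (42).

    strengths     : per-frequency deformation strengths, assumed in [0, 1]
    control_value : frame-level addition control value (35), assumed in [0, 1]

    Assumption: the frame-level value is scaled by the strength of each
    frequency, so a stronger deformation gives a larger control value.
    """
    per_frequency = []
    for strength in strengths:
        value = control_value * strength
        # Clamp so that downstream weights stay in [0, 1].
        per_frequency.append(min(1.0, max(0.0, value)))
    return per_frequency
```

With this rule, a frequency whose deformation strength is 0 always keeps the decoded speech spectrum 43, whatever the frame-level background noise likelihood.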

重み付き加算部１８は、加算制御値分割部４１から入力された周波数毎の加算制御値４２に基づいて、フーリエ変換部８から入力された復号音声スペクトル４３とスペクトル変形部３９から入力された変形復号音声スペクトル４４を重み付けして加算し、得られたスペクトルを逆フーリエ変換部１１に出力する。重み付け加算の制御方法の動作としては、図２にて説明したのと同様に、周波数毎の加算制御値４２が大きい（背景雑音らしさが高い）周波数成分に対しては復号音声スペクトル４３に対する重みを小さく、変形復号音声スペクトル４４に対する重みを大きく制御する。逆に周波数毎の加算制御値４２が小さい（背景雑音らしさが低い）周波数成分に対しては復号音声スペクトル４３に対する重みを大きく、変形復号音声スペクトル４４に対する重みを小さく制御する。 The weighted addition unit 18 weights and adds the decoded speech spectrum 43 input from the Fourier transform unit 8 and the modified decoded speech spectrum 44 input from the spectrum transformation unit 39 on the basis of the per-frequency addition control value 42 input from the addition control value dividing unit 41, and outputs the resulting spectrum to the inverse Fourier transform unit 11. The weighted addition is controlled in the same way as described with reference to FIG. 2: for frequency components whose addition control value 42 is large (high background noise likelihood), the weight for the decoded speech spectrum 43 is made small and the weight for the modified decoded speech spectrum 44 is made large. Conversely, for frequency components whose addition control value 42 is small (low background noise likelihood), the weight for the decoded speech spectrum 43 is made large and the weight for the modified decoded speech spectrum 44 is made small.
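
The per-frequency weighted addition can be sketched as follows. The sketch assumes that each addition control value 42 lies in [0, 1] and is used directly as the weight of the modified decoded speech spectrum 44, with the complementary weight applied to the decoded speech spectrum 43. The actual weight curves are those of FIG. 2 and may differ; the function name is hypothetical.

```python
def weighted_add_spectra(decoded, modified, control):
    """Blend two complex spectra bin by bin.

    decoded  : decoded speech spectrum (43), list of complex values
    modified : modified decoded speech spectrum (44), same length
    control  : per-frequency addition control values (42), assumed in [0, 1]

    Assumption: the two weights are complementary and sum to 1.
    """
    if not (len(decoded) == len(modified) == len(control)):
        raise ValueError("all inputs must have the same number of bins")
    return [
        (1.0 - c) * d + c * m
        for d, m, c in zip(decoded, modified, control)
    ]
```

A control value of 0 passes the decoded bin through unchanged, and a control value of 1 replaces it with the modified bin.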

そして、最後の処理として、逆フーリエ変換部１１は、実施の形態２と同様にして、重み付き加算部１８から入力されたスペクトルに対して逆フーリエ変換処理を行うことで、信号領域に戻し、前後のフレームとの滑らかな連接のための窓がけを行いつつ連接していき、得られた信号を出力音声６として出力する。 As a final process, the inverse Fourier transform unit 11 returns to the signal region by performing an inverse Fourier transform process on the spectrum input from the weighted addition unit 18 in the same manner as in the second embodiment. The connection is performed while performing windowing for smooth connection with the front and rear frames, and the obtained signal is output as the output sound 6.

なお、加算制御値分割部４１を廃して、信号評価部１２の出力を重み付き加算部１８に与え、変形強度制御部２０の出力である変形強度を振幅平滑化部９と位相擾乱部１０に与える構成も可能である。この様にしたものは、実施の形態３の構成における重み付き加算処理をスペクトル領域で行うようにしたものに相当する。 The addition control value dividing unit 41 is abolished, the output of the signal evaluation unit 12 is given to the weighted addition unit 18, and the deformation strength that is the output of the deformation strength control unit 20 is sent to the amplitude smoothing unit 9 and the phase disturbance unit 10. A configuration is also possible. This is equivalent to the weighted addition process in the configuration of Embodiment 3 performed in the spectral domain.

更に、実施の形態３の場合と同様に、レベル判定部２３と連続性判定部２４の一方だけを使用するようにして、残るもう一方は省略する構成も可能である。 Furthermore, as in the third embodiment, a configuration that uses only one of the level determination unit 23 and the continuity determination unit 24 and omits the other is also possible.

この実施の形態４によれば、入力信号（復号音声）または聴覚重み付けされた入力信号（復号音声）の各周波数成分毎の振幅の大きさ、各周波数毎の振幅や位相の連続性の大きさに基づいて、入力信号のスペクトル（復号音声スペクトル）と加工スペクトル（変形復号音声スペクトル）の重み付け加算を周波数成分毎に独立に制御するようにしたので、実施の形態１が持つ効果に加えて、前記振幅スペクトル成分が小さいために量子化雑音や劣化成分が支配的になっている成分、スペクトル成分の連続性が低いために量子化雑音や劣化成分が多くなりがちな成分に対して重点的に加工スペクトルの重みを強め、量子化雑音や劣化成分が少ない良好な成分まで加工スペクトルの重みを強めてしまうことがなくなり、入力信号や実際の背景雑音の特性を比較的良好に残しつつ量子化雑音や劣化成分を主観的に抑圧でき、主観品質を改善できる効果がある。 According to the fourth embodiment, the weighted addition of the spectrum of the input signal (decoded speech spectrum) and the processed spectrum (modified decoded speech spectrum) is controlled independently for each frequency component on the basis of the amplitude of each frequency component of the input signal (decoded speech) or the auditory-weighted input signal (decoded speech), and of the continuity of the amplitude and phase at each frequency. Therefore, in addition to the effects of the first embodiment, the weight of the processed spectrum is increased selectively for components in which quantization noise and degradation components dominate because the amplitude spectrum component is small, and for components in which quantization noise and degradation components tend to be large because the continuity of the spectral components is low. The weight of the processed spectrum is no longer increased for good components containing little quantization noise or degradation, so quantization noise and degradation components can be subjectively suppressed while the characteristics of the input signal and the actual background noise are preserved comparatively well, which improves subjective quality.

実施の形態３と比較すると、平滑化と擾乱という２つの周波数毎の変形処理から、１つの周波数毎の変形処理に変わっており、処理が簡易化される効果がある。 Compared with the third embodiment, the two per-frequency deformation processes, smoothing and disturbance, are replaced by a single per-frequency process, which simplifies the processing.

実施の形態５． Embodiment 5.
図５との対応部分に同一符号を付けた図８は、本実施の形態による音信号加工方法を適用した音声復号装置の全体構成を示し、図中２６は背景雑音らしさ（加算制御値３５）の時間方向の変動性を判定する変動性判定部である。 FIG. 8, in which parts corresponding to those in FIG. 5 are assigned the same reference numerals, shows the overall configuration of a speech decoding apparatus to which the sound signal processing method according to this embodiment is applied. In the figure, 26 is a variability determination unit that determines the variability in the time direction of the background noise likelihood (addition control value 35).

音声復号部４から出力された復号音声５が、信号加工部２内の信号変形部７、変形強度制御部２０、信号評価部１２、重み付き加算部１８に入力される。信号評価部１２は、入力された復号音声５に対して、背景雑音らしさを評価し、評価結果を加算制御値３５として、変動性判定部２６と重み付き加算部１８に出力する。 The decoded speech 5 output from the speech decoding unit 4 is input to the signal transformation unit 7, the transformation strength control unit 20, the signal evaluation unit 12, and the weighted addition unit 18 in the signal processing unit 2. The signal evaluation unit 12 evaluates the likelihood of background noise for the input decoded speech 5 and outputs the evaluation result as the addition control value 35 to the variability determination unit 26 and the weighted addition unit 18.

変動性判定部２６は、信号評価部１２より入力された加算制御値３５を、その内部に格納している過去の加算制御値３５と比較し、値の時間方向の変動性が高いか否かを判定し、この判定結果に基づいて第三の変形強度を算出し、これを変形強度制御部２０内の変形強度算出部２５に出力する。そして、入力された加算制御値３５を用いて内部に格納している過去の加算制御値３５を更新する。 The variability determination unit 26 compares the addition control value 35 input from the signal evaluation unit 12 with the past addition control values 35 stored internally, determines whether the variability of the value in the time direction is high, calculates a third deformation strength based on the determination result, and outputs it to the deformation strength calculation unit 25 in the deformation strength control unit 20. It then updates the stored past addition control values 35 using the input addition control value 35.

加算制御値３５などのフレーム（またはサブフレーム）の特性を表すパラメータの時間方向の変動性が高い場合には、復号音声５のスペクトルが時間方向に大きく変化している場合が多く、必要以上に強い振幅平滑化や位相擾乱付与を行うと不自然な反響感が発生してしまう。そこで、この第三の変形強度は、加算制御値３５の時間方向の変動性が高い場合には、振幅平滑化部９における平滑化と位相擾乱部１０における擾乱付与が弱くなるように設定する。なお、フレーム（またはサブフレーム）の特性を表すパラメータであれば、復号音声のパワー、スペクトル包絡パラメータなど、加算制御値３５以外のパラメータを用いても同様の効果を得ることができる。 When a parameter representing the characteristics of a frame (or subframe), such as the addition control value 35, varies strongly in the time direction, the spectrum of the decoded speech 5 is often changing greatly over time, and applying stronger amplitude smoothing or phase disturbance than necessary produces an unnatural reverberant impression. The third deformation strength is therefore set so that, when the variability of the addition control value 35 in the time direction is high, the smoothing in the amplitude smoothing unit 9 and the disturbance applied in the phase disturbance unit 10 are weakened. The same effect can be obtained with parameters other than the addition control value 35, such as the power of the decoded speech or spectral envelope parameters, as long as they represent the characteristics of the frame (or subframe).

変動性の判定方法としては、最も単純には、前フレームの加算制御値３５との差分の絶対値を所定の閾値と比較して、閾値を上回っていれば変動性が高い、とすれば良い。この他、前フレームおよび前々フレームの加算制御値３５との差分の絶対値を各々算出して、その一方が所定の閾値を上回っているか否かで判定してもよい。また、信号評価部１２がサブフレーム毎に加算制御値３５を算出する場合には、現在のフレーム内または必要に応じて前フレーム内の全サブフレーム間の加算制御値３５の差分の絶対値を求めて、何れかが所定の閾値を上回っているか否かで判定することもできる。そして、具体的な処理例としては、閾値を上回っていれば第三の変形強度を０、閾値を下回っていれば第三の変形強度を１とする。 The simplest way to determine the variability is to compare the absolute value of the difference from the addition control value 35 of the previous frame with a predetermined threshold and to judge the variability to be high if the threshold is exceeded. Alternatively, the absolute values of the differences from the addition control values 35 of the previous frame and of the frame before it may each be calculated, and the determination made according to whether either of them exceeds a predetermined threshold. When the signal evaluation unit 12 calculates the addition control value 35 for each subframe, the absolute values of the differences in the addition control value 35 between all subframes in the current frame, and if necessary in the previous frame, may be obtained, and the determination made according to whether any of them exceeds a predetermined threshold. As a concrete processing example, the third deformation strength is set to 0 if the threshold is exceeded and to 1 if it is below the threshold.
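
The threshold test described above can be sketched in Python as follows. Passing one past value corresponds to the simplest method, and passing two corresponds to the variant that also looks at the frame before the previous one. The function name, the argument layout, and the treatment of a difference exactly equal to the threshold (counted here as not exceeding it) are assumptions for illustration.

```python
def third_deformation_strength(current, history, threshold):
    """Return the third deformation strength for the current frame.

    current   : addition control value (35) of the current frame
    history   : past addition control values, e.g. [previous] or
                [previous, one_before_previous]
    threshold : predetermined threshold (value chosen by the designer)

    Variability is judged high when any absolute difference exceeds the
    threshold; the strength is then 0, otherwise 1.
    """
    is_variable = any(abs(current - past) > threshold for past in history)
    return 0.0 if is_variable else 1.0
```

A caller would append `current` to its stored history after each frame, mirroring the update step of the variability determination unit 26.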

変形強度制御部２０内では、入力された復号音声５に対して、聴覚重み付け部２１、フーリエ変換部２２、レベル判定部２３、連続性判定部２４までは、実施の形態３と同様な処理を行う。 In the deformation intensity control unit 20, the auditory weighting unit 21, the Fourier transform unit 22, the level determination unit 23, and the continuity determination unit 24 perform the same processing as in the third embodiment on the input decoded speech 5. Do.

そして、変形強度算出部２５では、レベル判定部２３より入力された第一の変形強度、連続性判定部２４より入力された第二の変形強度、変動性判定部２６より入力された第三の変形強度に基づいて、各周波数毎の最終的な変形強度を算出し、これを信号変形部７内の振幅平滑化部９と位相擾乱部１０に出力する。この最終的な変形強度の算出方法としては、第三の変形強度を全周波数に対して一定値として与え、周波数毎にこの全周波数に拡張した第三の変形強度、第一の変形強度、第二の変形強度の最小値、重み付き平均値、最大値などを求めて最終的な変形強度とする、という方法を用いることができる。 The deformation strength calculation unit 25 then calculates the final deformation strength for each frequency based on the first deformation strength input from the level determination unit 23, the second deformation strength input from the continuity determination unit 24, and the third deformation strength input from the variability determination unit 26, and outputs it to the amplitude smoothing unit 9 and the phase disturbance unit 10 in the signal transformation unit 7. One way to calculate the final deformation strength is to apply the third deformation strength as a constant value across all frequencies and then, for each frequency, to take the minimum, weighted average, maximum, or the like of this extended third deformation strength, the first deformation strength, and the second deformation strength.
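
The combination of the three deformation strengths can be illustrated with the following Python sketch. It covers the minimum, weighted average, and maximum named in the text. The function name, the `mode` strings, and the default weights are hypothetical and are not taken from the specification.

```python
def final_deformation_strength(first, second, third, mode="min",
                               weights=(1.0, 1.0, 1.0)):
    """Combine the three deformation strengths per frequency.

    first, second : per-frequency strengths from the level and
                    continuity determinations (equal-length lists)
    third         : single strength from the variability determination,
                    applied as a constant across all frequencies
    mode          : "min", "max" or "mean" (weighted average)
    """
    combined = []
    for a, b in zip(first, second):
        values = (a, b, third)
        if mode == "min":
            combined.append(min(values))
        elif mode == "max":
            combined.append(max(values))
        elif mode == "mean":
            total = sum(w * v for w, v in zip(weights, values))
            combined.append(total / sum(weights))
        else:
            raise ValueError("unknown mode: " + mode)
    return combined
```

With `mode="min"` and a third strength of 0, every frequency ends up with strength 0, so smoothing and disturbance are switched off for the whole frame whenever the variability is judged high.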

以降の信号変形部７、重み付き加算部１８の動作は、実施の形態３と同様であり、説明を省略する。 Subsequent operations of the signal transformation unit 7 and the weighted addition unit 18 are the same as those in the third embodiment, and a description thereof will be omitted.

なお、ここでは、レベル判定部２３と連続性判定部２４の両方の出力結果を使用したが、一方だけを使用するようにしたり、両方とも使用しない構成も可能である。また、変形強度によって制御する対象を、振幅平滑化部９と位相擾乱部１０の一方のみとしたり、第三の変形強度については一方のみを制御対象とする構成でも構わない。 Although the output results of both the level determination unit 23 and the continuity determination unit 24 are used here, a configuration that uses only one of them, or neither, is also possible. The deformation strength may also be applied to only one of the amplitude smoothing unit 9 and the phase disturbance unit 10, or the third deformation strength alone may be restricted to controlling only one of them.

この実施の形態５によれば、実施の形態３の構成に加えて、平滑化強度または擾乱付与強度を、所定の評価値（背景雑音らしさ）の時間変動性（フレームまたはサブフレーム間の変動性）の大きさによって制御するようにしたので、実施の形態３が持つ効果に加えて、入力信号（復号音声）の特性が変動している区間において必要以上に強い加工処理を抑止でき、なまけ、エコー（反響感）の発生を防止できる効果がある。 According to the fifth embodiment, in addition to the configuration of the third embodiment, the smoothing strength or the disturbance imparting strength is set to the time variability (variability between frames or subframes) of a predetermined evaluation value (likeness of background noise) In addition to the effects of the third embodiment, it is possible to suppress processing that is stronger than necessary in a section where the characteristics of the input signal (decoded speech) fluctuate, This has the effect of preventing the occurrence of echoes.

実施の形態６． Embodiment 6.
図５との対応部分に同一符号を付けた図９は、本実施の形態による音信号加工方法を適用した音声復号装置の全体構成を示す。図中２７は摩擦音らしさ評価部、３１は背景雑音らしさ評価部、４５は加算制御値算出部である。摩擦音らしさ評価部２７は、低域カットフィルタ２８、零交差数カウント部２９、摩擦音らしさ算出部３０より構成される。背景雑音らしさ評価部３１は、図５における信号評価部１２と同じ構成であり、逆フィルタ部１３、パワー算出部１４、背景雑音らしさ算出部１５、推定雑音パワー更新部１６、推定雑音スペクトル更新部１７より構成される。信号評価部１２は、図５の場合と異なり、摩擦音らしさ評価部２７、背景雑音らしさ評価部３１、加算制御値算出部４５より構成される。 FIG. 9, in which parts corresponding to those in FIG. 5 are assigned the same reference numerals, shows the overall configuration of a speech decoding apparatus to which the sound signal processing method according to this embodiment is applied. In the figure, 27 is a frictional sound (fricative) likelihood evaluation unit, 31 is a background noise likelihood evaluation unit, and 45 is an addition control value calculation unit. The frictional sound likelihood evaluation unit 27 consists of a low-frequency cut filter 28, a zero-crossing number counting unit 29, and a frictional sound likelihood calculation unit 30. The background noise likelihood evaluation unit 31 has the same configuration as the signal evaluation unit 12 in FIG. 5 and consists of an inverse filter unit 13, a power calculation unit 14, a background noise likelihood calculation unit 15, an estimated noise power update unit 16, and an estimated noise spectrum update unit 17. Unlike in FIG. 5, the signal evaluation unit 12 consists of the frictional sound likelihood evaluation unit 27, the background noise likelihood evaluation unit 31, and the addition control value calculation unit 45.

音声復号部４から出力された復号音声５が、信号加工部２内の信号変形部７、変形強度制御部２０、信号評価部１２内の摩擦音らしさ評価部２７と背景雑音らしさ評価部３１、そして重み付き加算部１８に入力される。 The decoded speech 5 output from the speech decoding unit 4 is input to the signal transformation unit 7 and the deformation strength control unit 20 in the signal processing unit 2, to the frictional sound likelihood evaluation unit 27 and the background noise likelihood evaluation unit 31 in the signal evaluation unit 12, and to the weighted addition unit 18.

信号評価部１２内の背景雑音らしさ評価部３１は、実施の形態３における信号評価部１２と同様に、入力された復号音声５に対して、逆フィルタ部１３、パワー算出部１４、背景雑音らしさ算出部１５の処理を行って、得られた背景雑音らしさ４６を加算制御値算出部４５に出力する。また、推定雑音パワー更新部１６、推定雑音スペクトル更新部１７の処理を行って、各々に格納してある推定雑音パワーと推定雑音スペクトルの更新を行う。 Similar to the signal evaluation unit 12 in the third embodiment, the background noise likelihood evaluation unit 31 in the signal evaluation unit 12 applies an inverse filter unit 13, a power calculation unit 14, and background noise likelihood to the input decoded speech 5. The processing of the calculation unit 15 is performed, and the obtained background noise likelihood 46 is output to the addition control value calculation unit 45. Further, the estimated noise power update unit 16 and the estimated noise spectrum update unit 17 are processed to update the estimated noise power and the estimated noise spectrum stored therein.

摩擦音らしさ評価部２７内の低域カットフィルタ２８は、入力された復号音声５に対して低周波数成分を抑圧する低域カットフィルタリング処理を行い、フィルタリング後の復号音声を零交差数カウント部２９に出力する。この低域カットフィルタリング処理の目的は、復号音声に含まれる直流成分や低周波数の成分がオッフセットとなって、後述する零交差数カウント部２９のカウント結果が少なくなることを防止することである。従って、単純には、フレーム内の復号音声５の平均値を算出し、これを復号音声５の各サンプルから減算することでもよい。 The low frequency cut filter 28 in the frictional sound likelihood evaluation unit 27 performs low frequency cut filtering processing for suppressing low frequency components on the input decoded speech 5, and outputs the decoded speech after filtering to the zero-crossing number counting unit 29. Output. The purpose of this low-frequency cut filtering process is to prevent a direct current component or a low-frequency component included in the decoded speech from becoming an offset and reducing the count result of the zero-crossing number counting unit 29 described later. Therefore, simply, an average value of the decoded speech 5 in the frame may be calculated and subtracted from each sample of the decoded speech 5.

零交差数カウント部２９は、低域カットフィルタ２８より入力された音声を分析して、含まれる零交差数を数え上げ、得られた零交差数を摩擦音らしさ算出部３０に出力する。零交差数を数え上げる方法としては、隣接サンプルの正負を比較し、同一でなければ零を交差している、としてカウントする方法、隣接サンプルの値の積をとって、その結果が負または零であれば零を交差している、としてカウントする方法などがある。 The zero-crossing number counting unit 29 analyzes the voice input from the low-frequency cut filter 28, counts the number of zero-crossings included, and outputs the obtained number of zero-crossings to the friction sound likelihood calculation unit 30. To count the number of zero crossings, compare the positive and negative values of adjacent samples and count them as crossing zero if they are not the same. Take the product of the values of adjacent samples, and the result is negative or zero. If there is, there is a method of counting as crossing zero.
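
The mean removal and the two zero-crossing counting methods mentioned above can be sketched in Python as follows. The function names are hypothetical, and treating a sample equal to zero as non-negative in the sign comparison is an assumption, since the text does not say how exact zeros are handled.

```python
def remove_mean(frame):
    """Subtract the frame average from every sample (simple offset removal)."""
    mean = sum(frame) / len(frame)
    return [x - mean for x in frame]


def count_zero_crossings_sign(frame):
    """Count adjacent sample pairs whose signs differ.

    Assumption: a sample equal to zero is treated as non-negative.
    """
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def count_zero_crossings_product(frame):
    """Count adjacent sample pairs whose product is negative or zero."""
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b <= 0)
```

The two counters agree on signals without exact zeros but can differ when a sample is exactly zero, because the product test counts both pairs that contain the zero sample. For example, a frame of `[3.0, 1.0, 3.0, 1.0]` has no sign changes until its mean is removed.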

摩擦音らしさ算出部３０は、零交差数カウント部２９より入力された零交差数を、所定の閾値と比較し、この比較結果に基づいて摩擦音らしさ４７を求めて、これを加算制御値算出部４５に出力する。例えば、零交差数が閾値より大きい場合には、摩擦音らしいと判定して摩擦音らしさを１に設定する。逆に零交差数が閾値より小さい場合には、摩擦音らしくないと判定して摩擦音らしさを０に設定する。この他、閾値を２つ以上設けて、摩擦音らしさを段階的に設定したり、所定の関数を用意しておいて、零交差数から連続的な値の摩擦音らしさを算出するようにしても良い。 The frictional sound likelihood calculating unit 30 compares the number of zero crossings input from the zero crossing number counting unit 29 with a predetermined threshold value, obtains the frictional sound likelihood 47 based on the comparison result, and adds this to the addition control value calculating unit 45. Output to. For example, when the number of zero crossings is larger than the threshold value, it is determined that the noise is likely to be a friction noise, and the friction noise likelihood is set to 1. Conversely, when the number of zero crossings is smaller than the threshold value, it is determined that the frictional sound is not likely and the frictional noise likelihood is set to zero. In addition, two or more threshold values may be provided to set the frictional sound likelihood in a stepwise manner, or a predetermined function may be prepared to calculate a continuous value of the frictional sound likelihood from the number of zero crossings. .
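
A minimal Python sketch of the threshold comparison follows. A single threshold gives the binary 0 or 1 decision, and several thresholds give the stepwise setting mentioned in the text. Dividing the number of exceeded thresholds by the number of thresholds is one possible stepwise rule chosen for illustration; the function name and any threshold values are hypothetical.

```python
def frictional_sound_likelihood(zero_crossings, thresholds):
    """Map a zero-crossing count to a likelihood in [0, 1].

    zero_crossings : count from the zero-crossing number counting unit
    thresholds     : one or more predetermined thresholds

    Assumption: the likelihood is the fraction of thresholds exceeded,
    which reduces to a 0/1 decision when a single threshold is given.
    """
    if not thresholds:
        raise ValueError("at least one threshold is required")
    exceeded = sum(1 for t in thresholds if zero_crossings > t)
    return exceeded / len(thresholds)
```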

なお、この摩擦音らしさ評価部２７内の構成は、あくまでも一例にすぎず、スペクトル傾斜の分析結果に基づいて評価するようにしたり、パワーやスペクトルの定常性に基づいて評価するようにしたり、零交差数も含めて複数のパラメータを組み合わせて評価するようにしたりしても構わない。 Note that this configuration of the frictional sound likelihood evaluation unit 27 is merely an example. The evaluation may instead be based on an analysis of the spectral tilt, on the stationarity of the power or spectrum, or on a combination of several parameters including the number of zero crossings.

加算制御値算出部４５は、背景雑音らしさ評価部３１より入力された背景雑音らしさ４６と、摩擦音らしさ評価部２７より入力された摩擦音らしさ４７に基づいて、加算制御値３５を算出し、これを重み付き加算部１８に出力する。背景雑音らしい場合と摩擦音らしい場合のどちらにおいても、量子化雑音が聞き苦しくなってしまうことが多いので、背景雑音らしさ４６と摩擦音らしさ４７を適切に重み付き加算することで加算制御値３５を算出すればよい。 The addition control value calculation unit 45 calculates an addition control value 35 based on the background noise likelihood 46 input from the background noise likelihood evaluation unit 31 and the friction sound likelihood 47 input from the friction sound likelihood evaluation unit 27. The result is output to the weighted addition unit 18. In both cases of background noise and friction noise, quantization noise is often difficult to hear. Therefore, the addition control value 35 is calculated by appropriately weighted addition of background noise likelihood 46 and friction noise likelihood 47. do it.
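
The weighted addition of the two likelihoods can be sketched in Python as follows. The text does not give the weights, so the equal default weights, the clamping to [0, 1], and the function name are assumptions for illustration.

```python
def addition_control_value(noise_likelihood, frictional_likelihood,
                           w_noise=0.5, w_frictional=0.5):
    """Combine background noise likelihood (46) and frictional sound
    likelihood (47) into the addition control value (35).

    Assumption: both likelihoods lie in [0, 1] and the result is clamped
    to the same range.
    """
    value = w_noise * noise_likelihood + w_frictional * frictional_likelihood
    return min(1.0, max(0.0, value))
```

Choosing both weights as 1.0 instead would let either likelihood alone drive the control value to its maximum, which is a different but equally plausible reading of "appropriately weighted".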

以降の信号変形部７、変形強度制御部２０、重み付き加算部１８の動作は、実施の形態３と同様であり、説明を省略する。 The subsequent operations of the signal transformation unit 7, the transformation strength control unit 20, and the weighted addition unit 18 are the same as those in the third embodiment, and a description thereof will be omitted.

この実施の形態６によれば、入力信号（復号音声）の背景雑音らしさと摩擦音らしさが高い場合に、入力信号（復号音声）の代わりに加工信号（変形復号音声）をより大きく出力するようにしたので、実施の形態３が持つ効果に加えて、量子化雑音や劣化成分が多く発生しがちな摩擦音区間に対して重点的な加工が加えられ、摩擦音以外の区間についてもその区間に適切な加工（加工しない、低レベルの加工を行うなど）が選択されるので、主観品質を改善できる効果がある。なお、摩擦音らしさ以外にも、量子化雑音や劣化成分が多く発生しがちな部分がある程度特定できる場合には、その部分らしさを評価して、加算制御値に反映させることが可能である。その様に構成すれば、大きい量子化雑音や劣化成分を１つずつ抑圧していくことができるので、主観品質が一層改善できる効果がある。 According to the sixth embodiment, when the background noise and the frictional noise are high in the input signal (decoded voice), the processed signal (modified decoded voice) is output more greatly instead of the input signal (decoded voice). Therefore, in addition to the effects of the third embodiment, a focused processing is applied to a frictional sound section that tends to generate a lot of quantization noise and deterioration components, and sections other than the frictional sound are also appropriate for the section. Since processing (not processing, performing low level processing, etc.) is selected, there is an effect that subjective quality can be improved. In addition to the frictional sound, when a part that tends to generate a lot of quantization noise and degradation components can be specified to some extent, it is possible to evaluate the likelihood of the part and reflect it in the addition control value. With such a configuration, it is possible to suppress large quantization noises and degradation components one by one, so that there is an effect that the subjective quality can be further improved.

また、当然のことであるが、背景雑音らしさ評価部を削除した構成も可能である。 As a matter of course, a configuration in which the background noise likelihood evaluation unit is deleted is also possible.

実施の形態７． Embodiment 7.
図１との対応部分に同一符号を付けた図１０は、本実施の形態による信号加工方法を適用した音声復号装置の全体構成を示し、図中３２はポストフィルタ部である。 FIG. 10, in which parts corresponding to those in FIG. 1 are assigned the same reference numerals, shows the overall configuration of a speech decoding apparatus to which the signal processing method according to this embodiment is applied. In the figure, 32 is a post filter unit.

まず音声符号３が音声復号装置１内の音声復号部４に入力される。 First, the speech code 3 is input to the speech decoding unit 4 in the speech decoding device 1.

音声復号部４は、入力された音声符号３に対して復号処理を行い、得られた復号音声５をポストフィルタ部３２、信号変形部７、信号評価部１２に出力する。 The speech decoding unit 4 performs a decoding process on the input speech code 3 and outputs the obtained decoded speech 5 to the post filter unit 32, the signal transformation unit 7, and the signal evaluation unit 12.

ポストフィルタ部３２は、入力された復号音声５に対して、スペクトル強調処理、ピッチ周期性強調処理などを行い、得られた結果をポストフィルタ復号音声４８として重み付き加算部１８に出力する。このポストフィルタ処理は、ＣＥＬＰ復号処理の後処理として一般的に使用されているもので、符号化復号化によって発生した量子化雑音を抑圧することを目的として導入されている。スペクトル強度の弱い部分には量子化雑音が多く含まれているので、この成分の振幅を抑圧してしまうものである。なお、ピッチ周期性強調処理が行われず、スペクトル強調処理だけが行われている場合もある。 The post filter unit 32 performs spectrum enhancement processing, pitch periodicity enhancement processing, and the like on the input decoded speech 5 and outputs the obtained result to the weighted addition unit 18 as the post filter decoded speech 48. This post-filter process is generally used as a post-process of CELP decoding process, and is introduced for the purpose of suppressing quantization noise generated by encoding / decoding. Since the portion where the spectrum intensity is weak contains a lot of quantization noise, the amplitude of this component is suppressed. In some cases, the pitch periodicity enhancement process is not performed and only the spectrum enhancement process is performed.

なお、実施の形態１、実施の形態３ないし６は、このポストフィルタ処理を音声復号部４内に含まれるもの、もしくは存在しないものの何れにも適用可能なものについて説明したが、この実施の形態７では、音声復号部４内にポストフィルタ処理が含まれるものからポストフィルタ処理の全部もしくは一部をポストフィルタ部３２として独立させている。 The first embodiment and the third to sixth embodiments were described as applicable both when this post-filter processing is included in the speech decoding unit 4 and when it is absent. In the seventh embodiment, all or part of the post-filter processing is taken out of a speech decoding unit 4 that includes it and made independent as the post filter unit 32.

信号変形部７は、実施の形態１と同様に、入力された復号音声５に対して、フーリエ変換部８、振幅平滑化部９、位相擾乱部１０、逆フーリエ変換部１１の処理を行い、得られた変形復号音声３４を重み付き加算部１８に出力する。 As in the first embodiment, the signal transformation unit 7 performs the processes of the Fourier transform unit 8, the amplitude smoothing unit 9, the phase disturbance unit 10, and the inverse Fourier transform unit 11 on the input decoded speech 5, The obtained modified decoded speech 34 is output to the weighted addition unit 18.

信号評価部１２は、実施の形態１と同様に、入力された復号音声５に対して、背景雑音らしさを評価し、評価結果を加算制御値３５として重み付き加算部１８に出力する。 As in the first embodiment, the signal evaluation unit 12 evaluates the likelihood of background noise for the input decoded speech 5 and outputs the evaluation result as the addition control value 35 to the weighted addition unit 18.

そして、最後の処理として、重み付き加算部１８は、実施の形態１と同様に、信号評価部１２から入力された加算制御値３５に基づいて、ポストフィルタ部３２から入力されたポストフィルタ復号音声４８と信号変形部７から入力された変形復号音声３４を重み付け加算し、得られた出力音声６を出力する。 Then, as a final process, the weighted addition unit 18 performs post-filter decoded speech input from the post-filter unit 32 based on the addition control value 35 input from the signal evaluation unit 12 as in the first embodiment. 48 and the modified decoded speech 34 input from the signal transformation unit 7 are weighted and added, and the resulting output speech 6 is output.

この実施の形態７によれば、ポストフィルタによる加工前の復号音声に基づいて変形復号音声を生成し、更にポストフィルタによる加工前の復号音声を分析して背景雑音らしさを求め、これに基づいてポストフィルタ復号音声と変形復号音声の加算時の重みを制御するようにしたので、実施の形態１が持つ効果に加えて、ポストフィルタによる復号音声の変形を含まない変形復号音声が生成でき、ポストフィルタによる復号音声の変形に影響されずに算出した精度の高い背景雑音らしさに基づいて精度の高い加算重み制御ができるようになるので、更に主観品質が改善する効果がある。 According to the seventh embodiment, the modified decoded speech is generated based on the decoded speech before processing by the post filter, and the decoded speech before processing by the post filter is analyzed to obtain the likelihood of background noise, based on this Since the weight at the time of adding the post-filter decoded speech and the modified decoded speech is controlled, in addition to the effect of the first embodiment, the modified decoded speech that does not include the deformation of the decoded speech by the post-filter can be generated. Since the addition weight control with high accuracy can be performed based on the high accuracy of background noise calculated without being affected by the deformation of the decoded speech by the filter, the subjective quality is further improved.

背景雑音区間においては、ポストフィルタによって劣化音までも強調されて聞き苦しくなってしまっていることが多く、ポストフィルタによる加工前の復号音声を出発点として変形復号音声を生成した方が、歪み音は小さくなる。また、ポストフィルタの処理が複数のモードを持っており、しばしば処理を切り替える場合には、その切り替えが背景雑音らしさの評価に影響する危険性が高く、ポストフィルタによる加工前の復号音声に対して背景雑音らしさを評価した方が安定な評価結果が得られる。 In background noise sections, the post filter often emphasizes even the degraded sound and makes it unpleasant to hear, so the distortion is smaller when the modified decoded speech is generated starting from the decoded speech before processing by the post filter. In addition, when the post-filter processing has multiple modes and frequently switches between them, there is a high risk that the switching affects the evaluation of the background noise likelihood, so a more stable result is obtained by evaluating the background noise likelihood on the decoded speech before processing by the post filter.

In the configuration of Embodiment 3, if the post-filter unit is separated in the same way as in Embodiment 7, the output of the perceptual weighting unit 21 in FIG. 5 becomes closer to the perceptually weighted speech used in the encoding process. This raises the accuracy with which components containing much quantization noise are identified, gives better transformation strength control, and further improves subjective quality.

Likewise, in the configuration of Embodiment 6, if the post-filter unit is separated in the same way as in Embodiment 7, the evaluation accuracy of the fricative likelihood evaluation unit 27 in FIG. 9 increases, and subjective quality improves further.

Compared with the separated configuration of Embodiment 7, a configuration that does not separate the post-filter unit connects to the speech decoding unit (including the post filter) at only one point, the decoded speech, and therefore has the advantage of being easy to implement as an independent device or program. Embodiment 7 has the drawback that it is not easy to implement as a device or program independent of a speech decoding unit that has a post filter, but it provides the various effects described above.

Embodiment 8.
FIG. 11, in which parts corresponding to those in FIG. 10 bear the same reference numerals, shows the overall configuration of a speech decoding apparatus to which the sound signal processing method of this embodiment is applied. In the figure, 33 denotes a spectral parameter generated in the speech decoding unit 4. The configuration differs from FIG. 10 in that a transformation strength control unit 20 similar to that of Embodiment 3 is added, and in that the spectral parameter 33 is input from the speech decoding unit 4 to the signal evaluation unit 12 and the transformation strength control unit 20.

The speech decoding unit 4 decodes the input speech code 3 and outputs the resulting decoded speech 5 to the post-filter unit 32, the signal transformation unit 7, the transformation strength control unit 20, and the signal evaluation unit 12. It also outputs the spectral parameter 33 generated during decoding to the estimated noise spectrum update unit 17 in the signal evaluation unit 12 and to the perceptual weighting unit 21 in the transformation strength control unit 20. Linear prediction coefficients (LPC), line spectral pairs (LSP), and the like are commonly used as the spectral parameter 33.

The perceptual weighting unit 21 in the transformation strength control unit 20 applies perceptual weighting to the decoded speech 5 input from the speech decoding unit 4, using the spectral parameter 33 that is also input from the speech decoding unit 4, and outputs the resulting perceptually weighted speech to the Fourier transform unit 22. Specifically, if the spectral parameter 33 is a set of linear prediction coefficients (LPC), it is used as is; if it is a parameter other than LPC, it is first converted to LPC. The LPC are then multiplied by constants to obtain two modified LPC sets, an ARMA filter with these two modified LPC sets as its filter coefficients is constructed, and perceptual weighting is performed by filtering with this filter. This perceptual weighting is preferably the same as that used in the speech encoding process (the counterpart of the speech decoding process performed by the speech decoding unit 4).
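The ARMA weighting filter described above can be sketched as follows. This is an illustration under assumptions: the text says only that the LPC are multiplied by constants to obtain two modified LPC sets. The sketch assumes the form common in CELP coders, W(z) = A(z/γ1) / A(z/γ2), where coefficient k is scaled by γ^k. The function names, the coefficient convention `lpc = [1, a1, ..., ap]` for A(z) = 1 + Σ a_k z^-k, and the default values 0.9 and 0.6 are hypothetical choices, not values from the patent.

```python
def bandwidth_expand(lpc, gamma):
    """Return the modified LPC set whose k-th coefficient is a_k * gamma**k."""
    return [a * gamma ** k for k, a in enumerate(lpc)]


def perceptual_weighting(signal, lpc, gamma_num=0.9, gamma_den=0.6):
    """Filter `signal` with W(z) = A(z/gamma_num) / A(z/gamma_den).

    `lpc` must start with 1.0, so the denominator needs no normalization.
    """
    num = bandwidth_expand(lpc, gamma_num)  # moving-average (zero) part
    den = bandwidth_expand(lpc, gamma_den)  # autoregressive (pole) part
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, b in enumerate(num):
            if n - k >= 0:
                acc += b * signal[n - k]
        for k in range(1, len(den)):
            if n - k >= 0:
                acc -= den[k] * out[n - k]
        out.append(acc)
    return out
```

Because the spectral parameter 33 comes from the decoder, the same coefficients the encoder used are available, which is what allows the weighting to match the encoder side.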

Within the transformation strength control unit 20, following the processing of the perceptual weighting unit 21, the Fourier transform unit 22, the level determination unit 23, the continuity determination unit 24, and the transformation strength calculation unit 25 operate as in Embodiment 3, and the resulting transformation strength is output to the signal transformation unit 7.

As in Embodiment 3, the signal transformation unit 7 processes the input decoded speech 5 and transformation strength with the Fourier transform unit 8, the amplitude smoothing unit 9, the phase disturbance unit 10, and the inverse Fourier transform unit 11, and outputs the resulting transformed decoded speech 34 to the weighted addition unit 18.
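The middle two stages of this chain, amplitude smoothing and phase disturbance, can be sketched on a one-sided spectrum as below. The details are assumptions made for illustration: this section does not restate how Embodiment 3 applies the transformation strength, so the sketch blends each amplitude toward a 3-bin local mean and adds a uniformly random phase offset, both scaled by a per-bin strength in [0, 1]. The name `transform_spectrum` and the seeded generator are hypothetical.

```python
import cmath
import math
import random


def transform_spectrum(spectrum, strength, rng=None):
    """Smooth amplitudes along frequency and perturb phases, bin by bin.

    spectrum: list of complex bins (one-sided spectrum of a frame).
    strength: list of per-bin strengths, clamped to [0, 1] (assumed range).
    """
    rng = rng or random.Random(0)  # fixed seed keeps the sketch repeatable
    amps = [abs(c) for c in spectrum]
    phases = [cmath.phase(c) for c in spectrum]
    n = len(amps)
    out = []
    for i in range(n):
        lo, hi = max(0, i - 1), min(n, i + 2)
        local_mean = sum(amps[lo:hi]) / (hi - lo)
        s = min(1.0, max(0.0, strength[i]))
        amp = (1.0 - s) * amps[i] + s * local_mean            # amplitude smoothing
        phase = phases[i] + s * rng.uniform(-math.pi, math.pi)  # phase disturbance
        out.append(cmath.rect(amp, phase))
    return out
```

A full implementation would apply this between a forward and an inverse Fourier transform and would keep the spectrum conjugate-symmetric so that the reconstructed frame is real-valued.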

Within the signal evaluation unit 12, as in Embodiment 1, the input decoded speech 5 is first processed by the inverse filter unit 13, the power calculation unit 14, and the background noise likelihood calculation unit 15 to evaluate the background noise likelihood, and the evaluation result is output to the weighted addition unit 18 as the addition control value 35. The estimated noise power update unit 16 also runs and updates the internally held estimated noise power.

The estimated noise spectrum update unit 17 then updates the estimated noise spectrum it holds internally, using the spectral parameter 33 input from the speech decoding unit 4 and the background noise likelihood input from the background noise likelihood calculation unit 15. For example, when the input background noise likelihood is high, the update is performed by reflecting the spectral parameter 33 in the estimated noise spectrum according to the equation given in Embodiment 1.

The subsequent operations of the post-filter unit 32 and the weighted addition unit 18 are the same as in Embodiment 7, so their description is omitted.

According to Embodiment 8, the spectral parameter generated in the course of speech decoding is reused for the perceptual weighting and for updating the estimated noise spectrum, so in addition to the effects of Embodiments 3 and 7, the processing is simplified.

Furthermore, perceptual weighting identical to that of the encoding process is achieved, which raises the accuracy with which components containing much quantization noise are identified, gives better transformation strength control, and improves subjective quality.

In addition, the estimation accuracy of the estimated noise spectrum used to calculate the background noise likelihood improves (in the sense that it is closer to the spectrum of the speech that was input to the speech encoding process). The resulting stable, accurate background noise likelihood enables accurate addition-weight control and improves subjective quality.

In Embodiment 8 the post-filter unit 32 is separated from the speech decoding unit 4, but even in a configuration where it is not separated, the signal processing unit 2 can reuse the spectral parameter 33 output by the speech decoding unit 4 in the same way. The same effects as in Embodiment 8 are obtained in that case as well.

Embodiment 9.
In the configuration of Embodiment 4 shown in FIG. 7, the addition control value dividing unit 41 can also control the transformation strength it outputs so that the envelope of the spectrum obtained by multiplying the transformed decoded speech spectrum 44, which is added in the weighted addition unit 18, by its per-frequency weights matches the estimated spectral shape of the quantization noise.

FIG. 12 is a schematic diagram showing an example of the decoded speech spectrum 43 in this case and of the spectrum obtained by multiplying the transformed decoded speech spectrum 44 by the per-frequency weights.

Quantization noise whose spectral shape depends on the coding scheme is superimposed on the decoded speech spectrum 43. In CELP-type speech coding, the code search minimizes distortion in the perceptually weighted speech. The quantization noise therefore has a flat spectrum in the perceptually weighted domain, and the spectral shape of the final quantization noise is the inverse of the perceptual weighting characteristic. It is thus possible to obtain the spectral characteristic of the perceptual weighting, derive the spectral shape of its inverse, and control the output of the addition control value dividing unit 41 so that the spectral shape of the transformed decoded speech spectrum matches it.
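The inverse-characteristic argument can be made concrete with a small sketch that evaluates 1/|W(e^{jω})| on a frequency grid. It assumes, as an illustration only, a weighting filter of the form W(z) = A(z/γ1) / A(z/γ2) with `lpc = [1, a1, ..., ap]`; the function names and the default γ values are hypothetical, and the sketch assumes the scaled polynomials have no zeros on the unit circle.

```python
import cmath
import math


def poly_response(coeffs, omega):
    """Evaluate sum_k coeffs[k] * exp(-j*omega*k) at angular frequency omega."""
    return sum(c * cmath.exp(-1j * omega * k) for k, c in enumerate(coeffs))


def estimated_noise_shape(lpc, n_bins, gamma_num=0.9, gamma_den=0.6):
    """Return 1/|W| at n_bins (>= 2) frequencies from 0 to pi inclusive."""
    num = [a * gamma_num ** k for k, a in enumerate(lpc)]
    den = [a * gamma_den ** k for k, a in enumerate(lpc)]
    shape = []
    for i in range(n_bins):
        omega = math.pi * i / (n_bins - 1)
        w = poly_response(num, omega) / poly_response(den, omega)
        shape.append(1.0 / abs(w))  # noise is flat after W, so its shape is 1/|W|
    return shape
```

For a low-pass-like `lpc = [1.0, -0.9]` the returned shape is larger at low frequencies than at high ones, that is, the estimated quantization noise loosely follows the speech spectral envelope. Per-frequency weights for the transformed decoded speech could then be scaled toward this shape.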

According to Embodiment 9, the spectral shape of the transformed decoded speech component contained in the final output speech 6 is made to match the envelope of the estimated quantization noise spectrum. In addition to the effects of Embodiment 4, this makes unpleasant quantization noise in speech intervals harder to hear while adding transformed decoded speech with only the minimum necessary power.

Embodiment 10.
In the configurations of Embodiment 1 and Embodiments 3 to 8, the amplitude smoothing unit 9 can also process the signal so that the smoothed amplitude spectrum matches the amplitude spectral shape of the estimated quantization noise. The amplitude spectral shape of the estimated quantization noise may be calculated in the same way as in Embodiment 9.

According to Embodiment 10, the spectral shape of the transformed decoded speech is made to match the estimated spectral shape of the quantization noise. In addition to the effects of Embodiment 1 and Embodiments 3 to 8, this makes unpleasant quantization noise in speech intervals harder to hear while adding transformed decoded speech with only the minimum necessary power.

Embodiment 11.
In Embodiment 1 and Embodiments 3 to 10, the signal processing unit 2 is used to process the decoded speech 5. The signal processing unit 2 alone can also be taken out and used for other signal processing, for example by connecting it after an acoustic signal decoding unit (the decoder counterpart of acoustic signal encoding) or after noise suppression processing. In that case, the transformation processing in the signal transformation unit and the evaluation method in the signal evaluation unit must be changed or adjusted according to the characteristics of the degraded component to be removed.

According to Embodiment 11, signals other than decoded speech that contain degraded components can be processed so that subjectively undesirable components are less perceptible.

Embodiment 12.
In Embodiments 1 to 11, the signal is processed using only the signal up to the current frame, but a configuration that accepts some processing delay and also uses the signal of the next and later frames is possible.

According to Embodiment 12, the signal of the next and later frames can be referenced, which improves the smoothing characteristic of the amplitude spectrum, the accuracy of the continuity determination, and the accuracy of evaluations such as the noise likelihood.
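One way look-ahead can help amplitude smoothing is a symmetric weighted addition across the previous, current, and next frames, at the cost of one frame of delay. The following is a hedged sketch: the function name and the weights (0.25, 0.5, 0.25) are hypothetical and are not taken from the patent.

```python
def smooth_amplitude_over_time(prev_amp, cur_amp, next_amp,
                               weights=(0.25, 0.5, 0.25)):
    """Weighted addition of per-bin amplitudes from three consecutive frames.

    Using `next_amp` requires delaying the output by one frame.
    """
    if not (len(prev_amp) == len(cur_amp) == len(next_amp)):
        raise ValueError("frames must have the same number of bins")
    w_prev, w_cur, w_next = weights
    return [w_prev * p + w_cur * c + w_next * n
            for p, c, n in zip(prev_amp, cur_amp, next_amp)]
```

An isolated amplitude spike in the current frame is halved with these weights, while a steady amplitude passes through unchanged because the weights sum to 1.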

Embodiment 13.
In Embodiment 1, Embodiment 3, and Embodiments 5 to 12, spectral components are calculated by a Fourier transform, transformed, and returned to the signal domain by an inverse Fourier transform. Instead of the Fourier transform, a configuration is also possible in which the transformation is applied to each output of a bank of band-pass filters and the signal is reconstructed by summing the band signals.

According to Embodiment 13, the same effects are obtained even in a configuration that does not use a Fourier transform.

Embodiment 14.
Embodiments 1 to 13 include both the amplitude smoothing unit 9 and the phase disturbance unit 10, but a configuration that omits one of them is possible, as is a configuration that introduces a further, different transformation unit.

According to Embodiment 14, depending on the characteristics of the quantization noise or degraded sound to be removed, processing can be simplified by omitting a transformation unit that brings no benefit. Introducing an appropriate transformation unit can also be expected to remove quantization noise or degraded sound that the amplitude smoothing unit 9 and the phase disturbance unit 10 cannot.

FIG. 1 is a diagram showing the overall configuration of a speech decoding apparatus to which the speech decoding method according to Embodiment 1 of the present invention is applied.
FIG. 2 is a diagram showing an example of how weighted addition is controlled based on the addition control value in the weighted addition unit 18 of Embodiment 1.
FIG. 3 is an explanatory diagram showing examples of the actual shapes of the extraction window in the Fourier transform unit 8 and the concatenation window in the inverse Fourier transform unit 11 of Embodiment 1, and their time relationship with the decoded speech 5.
FIG. 4 is a diagram showing part of the configuration of a speech decoding apparatus in which the sound signal processing method of Embodiment 2 is applied in combination with a noise suppression method.
FIG. 5 is a diagram showing the overall configuration of a speech decoding apparatus to which the speech decoding method according to Embodiment 3 is applied.
FIG. 6 is a diagram showing the relationship between the perceptually weighted spectrum and the first transformation strength in Embodiment 3.
FIG. 7 is a diagram showing the overall configuration of a speech decoding apparatus to which the speech decoding method according to Embodiment 4 is applied.
FIG. 8 is a diagram showing the overall configuration of a speech decoding apparatus to which the speech decoding method according to Embodiment 5 is applied.
FIG. 9 is a diagram showing the overall configuration of a speech decoding apparatus to which the speech decoding method according to Embodiment 6 is applied.
FIG. 10 is a diagram showing the overall configuration of a speech decoding apparatus to which the speech decoding method according to Embodiment 7 is applied.
FIG. 11 is a diagram showing the overall configuration of a speech decoding apparatus to which the speech decoding method according to Embodiment 8 is applied.
FIG. 12 is a schematic diagram showing an example of the decoded speech spectrum 43 and of the spectrum obtained by multiplying the transformed decoded speech spectrum 44 by per-frequency weights, with Embodiment 9 applied.

Claims

1. A sound signal processing method comprising:
  a decoded speech generation step of generating a plurality of parameters from a speech code and generating decoded speech corresponding to the speech code using the plurality of parameters;
  a first processed speech generation step of generating first processed speech by smoothing the amplitude of the decoded speech generated in the decoded speech generation step in the frequency-axis direction;
  a mixing ratio acquisition step of acquiring, based on any one of the plurality of parameters generated in the decoded speech generation step, a mixing ratio used to generate second processed speech by mixing the decoded speech and the first processed speech, the mixing ratio being such that the proportion of the first processed speech in the second processed speech increases as the noise likelihood of the decoded speech increases; and
  a second processed speech generation step of generating the second processed speech by mixing the decoded speech and the first processed speech at the mixing ratio acquired in the mixing ratio acquisition step.

2. A sound signal processing method comprising:
  a decoded speech generation step of generating a plurality of parameters from a speech code and generating decoded speech corresponding to the speech code using the plurality of parameters;
  a first processed speech generation step of generating first processed speech by smoothing the amplitude of the decoded speech generated in the decoded speech generation step in the frequency-axis direction;
  a weighting coefficient acquisition step of acquiring, based on any one of the plurality of parameters generated in the decoded speech generation step, a weighting coefficient that increases as the noise likelihood of the decoded speech increases; and
  a second processed speech generation step of generating second processed speech by adding the decoded speech and the first processed speech weighted by the weighting coefficient acquired in the weighting coefficient acquisition step.

3. A sound signal processing method comprising:
  a decoded speech generation step of generating a plurality of parameters from a speech code and generating decoded speech corresponding to the speech code using the plurality of parameters;
  a first processed speech generation step of generating first processed speech by smoothing the amplitude of the decoded speech generated in the decoded speech generation step in the time-axis direction;
  a mixing ratio acquisition step of acquiring, based on any one of the plurality of parameters generated in the decoded speech generation step, a mixing ratio used to generate second processed speech by mixing the decoded speech and the first processed speech, the mixing ratio being such that the proportion of the first processed speech in the second processed speech increases as the noise likelihood of the decoded speech increases; and
  a second processed speech generation step of generating the second processed speech by mixing the decoded speech and the first processed speech at the mixing ratio acquired in the mixing ratio acquisition step.

4. A sound signal processing method comprising:
  a decoded speech generation step of generating a plurality of parameters from a speech code and generating decoded speech corresponding to the speech code using the plurality of parameters;
  a first processed speech generation step of generating first processed speech by smoothing the amplitude of the decoded speech generated in the decoded speech generation step in the time-axis direction;
  a weighting coefficient acquisition step of acquiring, based on any one of the plurality of parameters generated in the decoded speech generation step, a weighting coefficient that increases as the noise likelihood of the decoded speech increases; and
  a second processed speech generation step of generating second processed speech by adding the decoded speech and the first processed speech weighted by the weighting coefficient acquired in the weighting coefficient acquisition step.

5. A sound signal processing method comprising:
  a decoded speech generation step of generating a plurality of parameters from a speech code and generating decoded speech corresponding to the speech code using the plurality of parameters;
  a first processed speech generation step of generating first processed speech, in which fluctuation in the time-axis direction of the amplitude of the decoded speech generated in the decoded speech generation step is reduced, by performing weighted addition processing that adds, with weights, to the amplitude of the decoded speech at a given time point the amplitude of the decoded speech at a time point a predetermined time before the given time point and the amplitude of the decoded speech at a time point the predetermined time after the given time point;
  a mixing ratio acquisition step of acquiring, based on any one of the plurality of parameters generated in the decoded speech generation step, a mixing ratio used to generate second processed speech by mixing the decoded speech and the first processed speech, the mixing ratio being such that the proportion of the first processed speech in the second processed speech increases as the noise likelihood of the decoded speech increases; and
  a second processed speech generation step of generating the second processed speech by mixing the decoded speech and the first processed speech at the mixing ratio acquired in the mixing ratio acquisition step.

6. A sound signal processing method comprising:
  a decoded speech generation step of generating a plurality of parameters from a speech code and generating decoded speech corresponding to the speech code using the plurality of parameters;
  a first processed speech generation step of generating first processed speech, in which fluctuation in the time-axis direction of the amplitude of the decoded speech generated in the decoded speech generation step is reduced, by performing weighted addition processing that adds, with weights, to the amplitude of the decoded speech at a given time point the amplitude of the decoded speech at a time point a predetermined time before the given time point and the amplitude of the decoded speech at a time point the predetermined time after the given time point;
  a weighting coefficient acquisition step of acquiring, based on any one of the plurality of parameters generated in the decoded speech generation step, a weighting coefficient that increases as the noise likelihood of the decoded speech increases; and
  a second processed speech generation step of generating second processed speech by adding the decoded speech and the first processed speech weighted by the weighting coefficient acquired in the weighting coefficient acquisition step.

7. The sound signal processing method according to claim 5, wherein, in the first processed speech generation step, the weighted addition processing is performed repeatedly for a plurality of time points at equal time intervals to generate the first processed speech.

8. A sound signal processing apparatus comprising:
  decoded speech generating means for generating a plurality of parameters from a speech code and generating decoded speech corresponding to the speech code using the plurality of parameters;
  first processed speech generating means for generating first processed speech by smoothing the amplitude of the decoded speech generated by the decoded speech generating means in the frequency-axis direction;
  mixing ratio acquiring means for acquiring, based on any one of the plurality of parameters generated by the decoded speech generating means, a mixing ratio used to generate second processed speech by mixing the decoded speech and the first processed speech, the mixing ratio being such that the proportion of the first processed speech in the second processed speech increases as the noise likelihood of the decoded speech increases; and
  second processed speech generating means for generating the second processed speech by mixing the decoded speech and the first processed speech at the mixing ratio acquired by the mixing ratio acquiring means.

9. A sound signal processing apparatus comprising:
  decoded speech generating means for generating a plurality of parameters from a speech code and generating decoded speech corresponding to the speech code using the plurality of parameters;
  first processed speech generating means for generating first processed speech by smoothing the amplitude of the decoded speech generated by the decoded speech generating means in the frequency-axis direction;
  weighting coefficient acquiring means for acquiring, based on any one of the plurality of parameters generated by the decoded speech generating means, a weighting coefficient that increases as the noise likelihood of the decoded speech increases; and
  second processed speech generating means for generating second processed speech by adding the decoded speech and the first processed speech weighted by the weighting coefficient acquired by the weighting coefficient acquiring means.

10. A sound signal processing apparatus comprising:
  decoded speech generating means for generating a plurality of parameters from a speech code and generating decoded speech corresponding to the speech code using the plurality of parameters;
  first processed speech generating means for generating first processed speech by smoothing the amplitude of the decoded speech generated by the decoded speech generating means in the time-axis direction;
  mixing ratio acquiring means for acquiring, based on any one of the plurality of parameters generated by the decoded speech generating means, a mixing ratio used to generate second processed speech by mixing the decoded speech and the first processed speech, the mixing ratio being such that the proportion of the first processed speech in the second processed speech increases as the noise likelihood of the decoded speech increases; and
  second processed speech generating means for generating the second processed speech by mixing the decoded speech and the first processed speech at the mixing ratio acquired by the mixing ratio acquiring means.

11. A sound signal processing apparatus comprising:
  decoded speech generating means for generating a plurality of parameters from a speech code and generating decoded speech corresponding to the speech code using the plurality of parameters;
  first processed speech generating means for generating first processed speech by smoothing the amplitude of the decoded speech generated by the decoded speech generating means in the time-axis direction;
  weighting coefficient acquiring means for acquiring, based on any one of the plurality of parameters generated by the decoded speech generating means, a weighting coefficient that increases as the noise likelihood of the decoded speech increases; and
  second processed speech generating means for generating second processed speech by adding the decoded speech and the first processed speech weighted by the weighting coefficient acquired by the weighting coefficient acquiring means.

12. A sound signal processing apparatus comprising:
  decoded speech generating means for generating a plurality of parameters from a speech code and generating decoded speech corresponding to the speech code using the plurality of parameters;
  first processed speech generating means for generating first processed speech, in which fluctuation in the time-axis direction of the amplitude of the decoded speech generated by the decoded speech generating means is reduced, by performing weighted addition processing that adds, with weights, to the amplitude of the decoded speech at a given time point the amplitude of the decoded speech at a time point a predetermined time before the given time point and the amplitude of the decoded speech at a time point the predetermined time after the given time point;
  mixing ratio acquiring means for acquiring, based on any one of the plurality of parameters generated by the decoded speech generating means, a mixing ratio used to generate second processed speech by mixing the decoded speech and the first processed speech, the mixing ratio being such that the proportion of the first processed speech in the second processed speech increases as the noise likelihood of the decoded speech increases; and
  second processed speech generating means for generating the second processed speech by mixing the decoded speech and the first processed speech at the mixing ratio acquired by the mixing ratio acquiring means.

13. A sound signal processing apparatus comprising:
  decoded speech generating means for generating a plurality of parameters from a speech code and generating decoded speech corresponding to the speech code using the plurality of parameters;
  first processed speech generating means for generating first processed speech, in which fluctuation in the time-axis direction of the amplitude of the decoded speech generated by the decoded speech generating means is reduced, by performing weighted addition processing that adds, with weights, to the amplitude of the decoded speech at a given time point the amplitude of the decoded speech at a time point a predetermined time before the given time point and the amplitude of the decoded speech at a time point the predetermined time after the given time point;
  weighting coefficient acquiring means for acquiring, based on any one of the plurality of parameters generated by the decoded speech generating means, a weighting coefficient that increases as the noise likelihood of the decoded speech increases; and
  second processed speech generating means for generating second processed speech by adding the decoded speech and the first processed speech weighted by the weighting coefficient acquired by the weighting coefficient acquiring means.

14. The sound signal processing apparatus according to claim 12 or 13, wherein the first processed speech generating means generates the first processed speech by performing the weighted addition processing repeatedly for a plurality of time points at equal time intervals.