JP6451079B2

JP6451079B2 - Speech enhancement device and program, and speech decoding device and program

Info

Publication number: JP6451079B2
Application number: JP2014100856A
Authority: JP
Inventors: 青柳 弘美
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2014-05-14
Filing date: 2014-05-14
Publication date: 2019-01-16
Anticipated expiration: 2034-05-14
Also published as: JP2015219285A

Description

The present invention relates to a speech enhancement device and program and to a speech decoding device and program, and can be applied to, for example, a telephone call system.

In recent years, call systems have increasingly been digitized, and most call systems, such as mobile phone systems, are now digital. Such digital call systems use speech compression (speech coding) techniques, mainly to reduce the amount of data on the transmission path. In general, a speech signal that is encoded and transmitted on the sending side and then recovered by decoding on the receiving side is degraded in quality compared with the speech signal before encoding. In particular, fine changes in the speech level are averaged out, and intelligibility decreases. One conceivable countermeasure is to apply speech enhancement processing such as that described in Patent Document 1 to the speech signal. The speech enhancement technique described in Patent Document 1 improves intelligibility by emphasizing the level change in the rising portion of the speech signal.

Patent Document 1: Japanese Patent Laid-Open No. 07-104788

The speech signal obtained by decoding an encoded speech signal is a digital signal. The speech enhancement device described in Patent Document 1, on the other hand, uses a plurality of time-constant circuits, and a time-constant circuit is a circuit suited to implementation as an analog circuit.

Since the decoded speech signal is a digital signal, the time-constant circuits could conceivably be built as digital circuits. However, there is then no choice but to realize each time-constant circuit with a complex circuit such as a digital filter, so a speech enhancement device that enhances a digital speech signal becomes complicated and large. It is also conceivable to convert the digital decoded speech signal into an analog signal and input it to the speech enhancement device described in Patent Document 1. However, the decoded speech signal is often subjected to other processing besides speech enhancement, such as equalization, and building only part of the circuitry as analog circuits makes the configuration wasteful.

Today, processing that would be done by digital circuits may also be performed in software. With software processing, however, the same problems arise: the program corresponding to the time-constant circuits becomes complicated, or the speech enhancement part must be built from analog circuits even though the other processing is realized in software.

Therefore, there is a demand for a speech enhancement device and program suited to digital processing and software processing, and for a speech decoding device and program to which such a speech enhancement device or program is applied.

A first aspect of the present invention is a speech enhancement device in which gain enhancement means enhances a speech signal by multiplying it by a gain, the device comprising: (1) sample level calculation means for calculating a level for each sample of the speech signal; (2) representative value calculation means for calculating, based on the sample levels of a predetermined number of samples, a representative value of the sample levels over that predetermined number of samples; and (3) gain calculation means for calculating the gain for enhancing the speech signal based only on the representative value of the section being processed and the representative value of the immediately preceding section.

A second aspect of the present invention is a speech enhancement program for enhancing a speech signal, the program causing a computer to function as: (1) sample level calculation means for calculating a level for each sample of the speech signal; (2) representative value calculation means for calculating, based on the sample levels of a predetermined number of samples, a representative value of the sample levels over that predetermined number of samples; (3) gain calculation means for calculating the gain for enhancing the speech signal based only on the representative value of the section being processed and the representative value of the immediately preceding section; and (4) gain enhancement means for enhancing the speech signal by multiplying it by the gain.

A third aspect of the present invention is a speech decoding device having a speech decoding unit that decodes an encoded speech signal and a speech enhancement unit that enhances the decoded speech signal, wherein the speech enhancement device of the first aspect of the present invention is applied as the speech enhancement unit.

A fourth aspect of the present invention is a speech decoding program that causes a computer to function as a speech decoding unit that decodes an encoded speech signal and as a speech enhancement unit that enhances the decoded speech signal, wherein the speech enhancement program of the second aspect of the present invention is applied as the program portion that functions as the speech enhancement unit.

According to the present invention, it is possible to realize a speech enhancement device and program suited to digital processing and software processing, and a speech decoding device and program to which such a speech enhancement device or program is applied.

FIG. 1 is a block diagram showing the functional configuration of the speech decoding device according to the first embodiment.
FIG. 2 is a functional block diagram showing the internal configuration of the speech enhancement unit (speech enhancement device) in the speech decoding device of the first embodiment.
FIG. 3 is a functional block diagram showing the internal configuration of the speech enhancement unit (speech enhancement device) in the speech decoding device of the second embodiment.
FIG. 4 is an explanatory diagram of the first method of limiting the enhancement gain in the speech enhancement unit of the second embodiment.
FIG. 5 is an explanatory diagram of the second method of limiting the enhancement gain in the speech enhancement unit of the second embodiment.
FIG. 6 is an explanatory diagram of the third method of limiting the enhancement gain in the speech enhancement unit of the second embodiment.

(A) First Embodiment
Hereinafter, a first embodiment of the speech enhancement device and program and the speech decoding device and program according to the present invention will be described with reference to the drawings.

(A-1) Configuration of the First Embodiment
FIG. 1 is a block diagram showing the functional configuration of the speech decoding device according to the first embodiment. The speech decoding device of the first embodiment can be configured in hardware, or it can be realized by a CPU and software (a speech decoding program) executed by the CPU. Whichever implementation is adopted, its function can be represented as in FIG. 1.

In FIG. 1, the speech decoding device 1 of the first embodiment includes a speech decoding unit 2 and a speech enhancement unit 3.

The speech decoding unit 2 receives an encoded speech signal that a receiving unit (not shown) has obtained through transmission-path demodulation. The speech decoding unit 2 decodes the encoded speech signal and supplies the resulting decoded speech signal (a digital signal) to the speech enhancement unit 3. The speech coding scheme is not limited, but the effect of the first embodiment is most pronounced with high-efficiency coding schemes that have a high compression ratio, such as CELP (Code Excited Linear Prediction). The internal configuration of the speech decoding unit 2 is the same as that of existing decoders, so its detailed description is omitted.

The speech enhancement unit 3 corresponds to the speech enhancement device of the first embodiment. It converts the decoded speech signal into an enhanced speech signal such that, when the decoded speech signal level is increasing, the increase is made steeper (faster), and when the level is decreasing, the decrease is made steeper (faster). In other words, the speech enhancement unit 3 emphasizes the level changes in the rising and falling portions of the decoded speech signal.

FIG. 2 is a functional block diagram showing the internal configuration of the speech enhancement unit 3 of the first embodiment.

In FIG. 2, the speech enhancement unit 3 includes a level calculation circuit 10, a frame sum calculation circuit 11, a gain calculation circuit 12, a delay circuit 13, and an enhancement calculation circuit 14.

The level calculation circuit 10 calculates the level of each sample of the input decoded speech signal S (hereinafter called the input speech signal) and supplies the sample level Sa to the frame sum calculation circuit 11. The sample level may be calculated as the absolute value of the input speech signal sample or as its squared value.

The frame sum calculation circuit 11 calculates the sum of the sample levels Sa for each frame consisting of N samples (N is 1 or more) of the input speech signal S, and supplies the frame sum Sf to the gain calculation circuit 12. The number of samples N per frame may be any number that allows the trend of the sample level Sa (increasing or decreasing) to be captured from the frame sum Sf. Depending on the sampling rate, N may even be 1, for example. Successive frames may be completely separate, or the frame sections may overlap such that the last 1/m of the samples of the preceding frame also serve as the first 1/m of the samples of the following frame. The number of samples N and the amount of overlap affect the rate of the increase or decrease described above.

The frame sum calculation circuit 11 of the first embodiment calculates the frame sum as the representative value of the sample levels of a frame, but another value may be used as the representative value. For example, the average of the N sample levels Sa may be used as the frame representative value, or the square root of the frame sum may be used.
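As a minimal sketch of the level calculation and frame representative value described above, the following Python computes per-sample levels and then one representative value per frame. The function names, the `kind` parameter, and the choice to drop a trailing partial frame are assumptions made for illustration; the text does not specify them, and overlapping frames are not handled here.

```python
import math

def sample_levels(samples, squared=False):
    """Per-sample level Sa: absolute value, or squared value if squared=True."""
    return [s * s if squared else abs(s) for s in samples]

def frame_representatives(levels, n, kind="sum"):
    """Representative value for each non-overlapping frame of n sample levels.

    kind is "sum" (frame sum Sf), "mean", or "sqrt_sum".
    A trailing partial frame is dropped (an assumption of this sketch).
    """
    reps = []
    for start in range(0, len(levels) - n + 1, n):
        total = sum(levels[start:start + n])
        if kind == "sum":
            reps.append(total)
        elif kind == "mean":
            reps.append(total / n)
        elif kind == "sqrt_sum":
            reps.append(math.sqrt(total))
        else:
            raise ValueError(f"unknown kind: {kind}")
    return reps
```

For instance, `frame_representatives(sample_levels([1, -2, 3, -4]), 2)` gives one frame sum per two-sample frame.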

The gain calculation circuit 12 divides the frame sum Sf of the frame being processed by the frame sum of the immediately preceding frame (denoted Sfp) and supplies the result g (= Sf / Sfp) to the enhancement calculation circuit 14 as the gain. If at least one of the frame sum Sf of the frame being processed and the frame sum Sfp of the immediately preceding frame is 0 and the division cannot be performed, the gain g is set to 1, for example. Alternatively, when the division cannot be performed, the most recently calculated gain g may be used as the gain for the current frame. Instead of the quotient itself, the square root of the quotient, or the quotient multiplied by a predetermined value, may be used as the gain g. Whether the quotient or a processed version of it is used affects the rate of the increase or decrease described above.
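The gain calculation and its fallbacks can be sketched as follows. The function name `frame_gain` and the `previous_gain` parameter are hypothetical; they simply combine the two fallbacks mentioned in the text (use 1, or reuse the most recently calculated gain).

```python
def frame_gain(sf, sfp, previous_gain=None):
    """Gain g = Sf / Sfp for the frame being processed.

    If either frame sum is zero, fall back to previous_gain when one is
    supplied, otherwise to 1.0 (no enhancement).
    """
    if sf == 0 or sfp == 0:
        return 1.0 if previous_gain is None else previous_gain
    return sf / sfp
```

Passing `previous_gain` selects the "reuse the last gain" variant; omitting it selects the "gain of 1" variant.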

The delay circuit 13 delays the input speech signal S by the processing delay of the processing chain formed by the level calculation circuit 10, the frame sum calculation circuit 11, and the gain calculation circuit 12, and supplies the delayed input speech signal Sn to the enhancement calculation circuit 14. That is, the delay circuit 13 delays the input speech signal S so that the input speech signal Sn of the frame being processed reaches the enhancement calculation circuit 14 at the same time as the gain g for that frame.

The enhancement calculation circuit 14 enhances the input speech signal Sn based on the gain g. For example, it enhances the input speech signal Sn by simply multiplying it by the gain g.

The enhancement calculation circuit 14 may also vary the gain multiplied by each sample of the input speech signal Sn from sample to sample so that the gain changes smoothly. Equation (1) gives the gain when it is varied for each sample of the input speech signal. In equation (1), g is the gain calculated for the frame being processed, gp is the gain calculated for the frame immediately preceding it, i (i = 1 to N) is the index of an input speech signal sample counted from the head of the N samples of the frame being processed, and g(i) is the gain multiplied by the i-th input speech signal sample.
g(i) = gp * (N - i) / N + g * i / N   ... (1)
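Equation (1) is a linear interpolation from the previous frame's gain gp toward the current frame's gain g, reaching g at the last sample (i = N). A small sketch, with hypothetical function names:

```python
def interpolated_gains(gp, g, n):
    """Per-sample gains g(i) for i = 1..n, following equation (1)."""
    return [gp * (n - i) / n + g * i / n for i in range(1, n + 1)]

def apply_gains(frame, gains):
    """Multiply each sample of the frame by its per-sample gain."""
    return [s * gi for s, gi in zip(frame, gains)]
```

With gp = 1.0, g = 2.0, and N = 4, the per-sample gains step evenly from 1.25 up to 2.0, so there is no abrupt jump in gain at the frame boundary.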

(A-2) Operation of the First Embodiment
Next, the operation of the speech decoding device 1 of the first embodiment is described: first the overall operation, then the speech enhancement operation in the speech enhancement unit 3.

The encoded speech signal obtained by the receiving unit (not shown) is input to the speech decoding unit 2 and decoded there. The speech enhancement unit 3 emphasizes the level changes in the rising and falling portions of the resulting decoded speech signal S and outputs the enhanced speech signal Sg.

Inside the speech enhancement unit 3, the level changes in the rising and falling portions of the input speech signal (decoded speech signal) S are emphasized as follows.

The input speech signal S is supplied to the level calculation circuit 10 and the delay circuit 13 in the speech enhancement unit 3.

The level calculation circuit 10 calculates the level of each sample of the input speech signal S and supplies the resulting sample level Sa to the frame sum calculation circuit 11. The frame sum calculation circuit 11 calculates, for each frame, the sum of all N sample levels Sa and supplies the resulting frame sum Sf to the gain calculation circuit 12. The gain calculation circuit 12 divides the frame sum Sf of the frame being processed by the frame sum Sfp of the immediately preceding frame and supplies the quotient g (= Sf / Sfp) to the enhancement calculation circuit 14 as the gain.

Meanwhile, the input speech signal S input to the delay circuit 13 is delayed by the processing delay of the processing chain formed by the level calculation circuit 10, the frame sum calculation circuit 11, and the gain calculation circuit 12, and the delayed input speech signal Sn is supplied to the enhancement calculation circuit 14.

The enhancement calculation circuit 14 then performs the enhancement calculation on the input speech signal Sn based on the gain g and outputs the speech signal Sg, in which the level changes in the rising and falling portions of the input speech signal S are emphasized.

Suppose that the enhancement calculation in the enhancement calculation circuit 14 simply multiplies the input speech signal Sn by the gain g. If the frame sum Sf of the frame being processed is 1.2 times the frame sum Sfp of the immediately preceding frame, the gain g is 1.2, and each sample of the frame being processed is multiplied by 1.2. On average, the samples of the frame being processed were already 1.2 times the samples of the immediately preceding frame, and they are further multiplied by the gain g (= 1.2), so after enhancement each sample of the frame being processed is 1.44 (= 1.2 × 1.2) times the corresponding level of the immediately preceding frame. As this example shows, the level change in the rising portion of the input speech signal S is emphasized (from 1.2 to 1.44).

Likewise, if the frame sum Sf of the frame being processed is 0.9 times the frame sum Sfp of the immediately preceding frame, the gain g is 0.9, and each sample of the frame being processed is multiplied by 0.9. On average, the samples of the frame being processed were already 0.9 times the samples of the immediately preceding frame, and they are further multiplied by the gain g (= 0.9), so after enhancement each sample of the frame being processed is 0.81 (= 0.9 × 0.9) times the corresponding level of the immediately preceding frame. As this example shows, the level change in the falling portion of the input speech signal S is emphasized (from 0.9 to 0.81).
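The two numerical examples can be reproduced with a short end-to-end sketch. It assumes absolute-value sample levels, non-overlapping frames, simple multiplication by g, unity gain for the first frame, and a dropped trailing partial frame; the function name `enhance` and the test signal are illustrative only.

```python
def enhance(samples, n):
    """Multiply each n-sample frame by g = Sf / Sfp (Sf = sum of |sample|)."""
    out = []
    sfp = None  # frame sum of the immediately preceding (unenhanced) frame
    for start in range(0, len(samples) - n + 1, n):
        frame = samples[start:start + n]
        sf = sum(abs(s) for s in frame)
        # No previous frame yet, or a zero frame sum: fall back to unity gain.
        g = 1.0 if not sfp or not sf else sf / sfp
        out.extend(s * g for s in frame)
        sfp = sf
    return out

# Frame levels 1.0 -> 1.2 (a rise of x1.2) -> 1.08 (a fall of x0.9).
signal = [1.0, 1.0, 1.2, 1.2, 1.08, 1.08]
result = enhance(signal, n=2)
rise_ratio = result[2] / signal[0]   # enhanced frame 2 relative to input frame 1
fall_ratio = result[4] / signal[2]   # enhanced frame 3 relative to input frame 2
```

Here `rise_ratio` comes out at about 1.44 and `fall_ratio` at about 0.81, matching the figures in the text. Note that the gain is always computed from the frame sums of the input signal, not of the already enhanced output.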

(A-3) Effects of the First Embodiment
According to the first embodiment, level changes in the rising and falling portions of the decoded speech signal (input speech signal) can be emphasized using only simple operations such as absolute value (or squaring, i.e. multiplication), summation, division, and multiplication (or multiply-accumulate).

With this enhancement processing, even when the level changes of the decoded speech signal have been smoothed by the encoding and decoding processes, the level changes of the enhanced speech signal become distinct and intelligibility can be improved.

The speech enhancement described in Patent Document 1 can emphasize only the rising portion of the speech signal. According to the first embodiment, both the rising portion and the falling portion of the speech signal can be emphasized, which further improves intelligibility.

(B) Second Embodiment
Next, a second embodiment of the speech enhancement device and program and the speech decoding device and program according to the present invention will be described with reference to the drawings.

As in FIG. 1 described above, the speech decoding device 1A of the second embodiment also includes a speech decoding unit 2 and a speech enhancement unit 3A, but the speech enhancement unit 3A differs from that of the first embodiment.

FIG. 3 is a functional block diagram showing the internal configuration of the speech enhancement unit 3A of the second embodiment. Parts identical or corresponding to those in FIG. 2 are denoted by the same reference numerals.

In FIG. 3, the speech enhancement unit 3A of the second embodiment includes a gain limiting circuit 20 in addition to the level calculation circuit 10, the frame sum calculation circuit 11, the gain calculation circuit 12, the delay circuit 13, and the enhancement calculation circuit 14. The level calculation circuit 10, the frame sum calculation circuit 11, the gain calculation circuit 12, the delay circuit 13, and the enhancement calculation circuit 14 are the same as in the first embodiment, so the description of their functions is omitted.

In the second embodiment, the gain g output from the gain calculation circuit 12 is supplied to the gain limiting circuit 20.

When the input gain g lies within a predetermined range, the gain limiting circuit 20 converts (limits) it to a value associated with that range and outputs the result; when the input gain g lies outside the predetermined range, it outputs the gain unchanged. The gain limiting circuit 20 may be implemented as a conversion table, for example. It may also be implemented as a digital circuit or software that compares the input gain g with the boundary values (thresholds) of the range and outputs a preset value when the gain falls within the range. The gain limiting circuit 20 supplies the limited gain gl to the enhancement calculation circuit 14, which performs the same processing as in the first embodiment based on the gain gl.

FIGS. 4 to 6 are explanatory diagrams, each showing an example of the relationship between the input and output gains of the gain limiting circuit 20 (a gain limiting example). In FIGS. 4 to 6, the horizontal axis shows the value of the gain g input to the gain limiting circuit 20, and the vertical axis shows the value of the gain gl output from it. Different input/output gain relationships constitute different embodiments, but they are described together below.

FIG. 4 shows a case where 1 is output as the limited gain gl for input gains g in the range from 1/K to M (M > 1, K > 1), that is, for values close to 1. In other words, a dead zone is introduced into the enhancement processing. When the frame sum of the frame being processed and the frame sum of the immediately preceding frame are nearly equal and there is almost no level change, that is, when the input gain g is close to 1, the gain limit shown in FIG. 4 prevents the enhancement processing from being applied.

FIG. 5 shows a case where, in addition to the dead zone described with reference to FIG. 4, an upper limit P (P > 1) and a lower limit 1/Q (Q > 1) are introduced for the gain. When the gain g is greater than or equal to the upper limit P, the upper limit P is output as the limited gain gl. When the gain g is less than or equal to the lower limit 1/Q, the lower limit 1/Q is output as the limited gain gl. If the gain is too large or too small, the speech may be over-enhanced and the sound quality may actually deteriorate, which is why the upper limit P and the lower limit 1/Q are introduced. FIG. 5 shows the case where both the dead zone and the upper and lower limits are introduced, but the upper and lower limits alone may be used.

FIG. 6 shows a case where, in addition to the dead zone processing described with reference to FIG. 4, enhancement is not performed in the falling portion of the speech signal. The improvement in intelligibility from speech enhancement appears more readily in the rising portion, so such a limit may be introduced, for example, to reduce the processing load.
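The three limiting methods of FIGS. 4 to 6 can be sketched as simple functions mapping the input gain g to the limited gain gl. The function names are hypothetical, and treating the dead-zone boundaries 1/K and M as inclusive is an assumption, since the text does not say how the boundary values themselves are handled. The second function also assumes P > M and Q > K so that the limits lie outside the dead zone.

```python
def limit_dead_zone(g, m, k):
    """FIG. 4 style: gains in [1/k, m] are replaced by 1 (no enhancement)."""
    return 1.0 if 1.0 / k <= g <= m else g

def limit_dead_zone_and_bounds(g, m, k, p, q):
    """FIG. 5 style: dead zone plus upper limit p and lower limit 1/q."""
    if g >= p:
        return p
    if g <= 1.0 / q:
        return 1.0 / q
    return limit_dead_zone(g, m, k)

def limit_rising_only(g, m):
    """FIG. 6 style: dead zone, and falling portions (g < 1) are not enhanced."""
    return g if g > m else 1.0
```

A table lookup, as the text also suggests, would be an equally valid realization; the comparison-based form is used here only because it is short.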

The speech enhancement unit (speech enhancement device) of the second embodiment provides the same effects as the first embodiment and, because the gain used for speech enhancement is limited, also allows the designer to realize the intended characteristics and functions.

(C) Other Embodiments
Various modifications have already been mentioned in the description of the embodiments above. Further modifications, such as those exemplified below, are also possible.

In each of the above embodiments, speech enhancement (including enhancement with a gain of 1) is performed over the entire speech signal. Alternatively, voiced/silent determination may be performed and speech enhancement applied only to the speech signal in voiced sections. Depending on the voiced/silent determination method, part of the rising portion of the speech signal may be judged to be a silent section. To ensure that enhancement is applied properly in the rising and falling portions, an extension period of a predetermined number of samples may be added before and after each detected voiced section, and the extended section treated as the voiced section. Speech enhancement may also be performed only in the first half of a voiced section rather than over the whole section. This first-half section may have a fixed length or a variable length, such as half of the voiced section.
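The extension of voiced sections can be sketched as follows, assuming the voiced/silent decision is available as a per-sample boolean mask and that the same extension length is used before and after each section. Both the mask representation and the function name `extend_voiced` are assumptions for illustration; the voiced/silent determination itself is outside the scope of this sketch.

```python
def extend_voiced(mask, extension):
    """Extend every voiced run in a per-sample mask by `extension` samples
    on both sides, clipping at the ends of the signal."""
    n = len(mask)
    out = [False] * n
    for i, voiced in enumerate(mask):
        if voiced:
            lo = max(0, i - extension)
            hi = min(n, i + extension + 1)
            for j in range(lo, hi):
                out[j] = True
    return out
```

Enhancement would then be applied only where the extended mask is true, with a gain of 1 elsewhere.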

In each of the above embodiments, the processing target section, which supplies the numerator-side feature quantity (for example, the frame sum) used to calculate the gain, and the immediately preceding section, which supplies the denominator-side feature quantity, have the same length (one frame). However, these sections may differ in length. For example, the processing target section may be one frame period and the immediately preceding section two frame periods. In this case, if the frame sum is used as the feature quantity, the gain may be calculated using the average of the frame sums of the two frames of the immediately preceding section as the denominator. Alternatively, the average sample level, which is unaffected by a difference in length between the two sections, may be used as the feature quantity.
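The two variants above can be sketched in Python as follows. This is a minimal sketch, not the patented circuit itself: the sample level is taken as the absolute value (one of the options named in the claims), the optional square root follows the claimed alternative of using the square root of the quotient, and the function names and the small constant `eps` guarding against division by zero are assumptions introduced for the example.

```python
import math


def frame_sum(frame):
    """Sum of sample levels, with the level taken as the absolute value."""
    return sum(abs(x) for x in frame)


def gain_from_frame_sums(current, previous_frames, use_sqrt=False, eps=1e-12):
    """Gain = frame sum of the current frame / average frame sum of the
    preceding frames (e.g. two frames). `eps` is an assumed safeguard."""
    denom = sum(frame_sum(f) for f in previous_frames) / len(previous_frames)
    ratio = frame_sum(current) / max(denom, eps)
    return math.sqrt(ratio) if use_sqrt else ratio


def gain_from_mean_levels(current, previous, eps=1e-12):
    """Gain from average sample levels, which do not depend on section length."""
    mean_cur = frame_sum(current) / len(current)
    mean_prev = frame_sum(previous) / len(previous)
    return mean_cur / max(mean_prev, eps)


if __name__ == "__main__":
    cur = [2, -2, 2, -2]
    prev = [[1, -1, 1, -1], [3, -3, 3, -3]]
    print(gain_from_frame_sums(cur, prev))       # two-frame averaged denominator
    print(gain_from_mean_levels(cur, [1] * 8))   # sections of different lengths
```

In the second function the preceding section is passed as a single sequence of any length, since averaging per sample removes the dependence on section length.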

In each of the above embodiments, the speech enhancement units 3 and 3A operate at all times, but the user may be allowed to select whether or not to operate them.

Also, for example, in a speech decoding device that supports a plurality of speech coding schemes, a speech enhancement unit (or a part of it, such as the gain limiting circuit) may be provided for each speech coding scheme, and the speech enhancement unit corresponding to the currently selected speech coding scheme may be operated.

In the second embodiment, the gain limiting circuit operates at all times, but the user may be allowed to select whether or not to operate it.

Also, a plurality of gain limiting circuits with different limiting characteristics (see FIGS. 4 to 6) may be provided, and the user may be allowed to select which gain limiting circuit to apply.

DESCRIPTION OF SYMBOLS: 1, 1A: speech decoding device; 2: speech decoding unit; 3, 3A: speech enhancement unit (speech enhancement device); 10: level calculation circuit; 11: frame sum calculation circuit; 12: gain calculation circuit; 13: delay circuit; 14: enhancement calculation circuit; 20: gain limiting circuit.

Claims

A speech enhancement device in which gain enhancement means enhances a speech signal by multiplying the speech signal by a gain, the device comprising:
sample level calculation means for calculating a level for each sample of the speech signal;
representative value calculation means for calculating, based on the sample levels of a predetermined number of samples, a representative value of the sample level over the predetermined number of samples; and
gain calculation means for calculating the gain for enhancing the speech signal based only on the representative value of the processing target section and the representative value of the immediately preceding section.

The speech enhancement device according to claim 1, wherein the sample level calculation means calculates, as the level of each sample, the absolute value or the square of the sample value.

The speech enhancement device according to claim 1 or 2, wherein the representative value calculation means calculates, as the representative value, the sum of the sample levels of the predetermined number of samples.

The speech enhancement device wherein the gain calculation means calculates, as the gain, the quotient obtained by dividing the representative value of the processing target section by the representative value of the immediately preceding section, or the square root of that quotient.

The speech enhancement device according to claim 1, further comprising gain limiting means that, when the gain calculated by the gain calculation means is a value within a predetermined range, converts the gain into a value corresponding to that predetermined range.

The speech enhancement device according to claim 1, wherein the gain enhancement means gradually changes, sample by sample, the gain by which the sample values of the input speech signal in the processing target section are multiplied, based on the gain of the processing target section and the gain of the section that was the immediately preceding processing target section.

A speech enhancement program for enhancing a speech signal, the program causing a computer to function as:
sample level calculation means for calculating a level for each sample of the speech signal;
representative value calculation means for calculating, based on the sample levels of a predetermined number of samples, a representative value of the sample level over the predetermined number of samples;
gain calculation means for calculating a gain for enhancing the speech signal based only on the representative value of the processing target section and the representative value of the immediately preceding section; and
gain enhancement means for enhancing the speech signal by multiplying the speech signal by the gain.

A speech decoding device comprising a speech decoding unit that decodes an encoded speech signal and a speech enhancement unit that enhances the decoded speech signal,
wherein the speech enhancement device according to claim 1 is applied as the speech enhancement unit.

A speech decoding program that causes a computer to function as a speech decoding unit that decodes an encoded speech signal and a speech enhancement unit that enhances the decoded speech signal,
wherein the speech enhancement program according to claim 7 is applied as the program part that functions as the speech enhancement unit.