JP3750705B2

JP3750705B2 - Speech coding transmission method and speech coding transmission apparatus

Info

Publication number: JP3750705B2
Application number: JP15079297A
Authority: JP
Inventors: 正之三崎; 潤一田川; 宏嗣谷口; 美治男松本
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1997-06-09
Filing date: 1997-06-09
Publication date: 2006-03-01
Anticipated expiration: 2017-06-09
Also published as: JPH10341162A

Description

【０００１】
【発明の属する技術分野】
本発明は伝送路を用いて音声信号を効率よく伝送する音声符号化伝送方法及び音声符号化伝送装置に関するものである。
【０００２】
【従来の技術】
従来の音声符号化方法とその装置について説明する。図５は従来の音声符号化装置の基本構成を示すブロック図である。本図に示すように音声符号化装置は、周波数包絡演算手段１、マスキング閾値決定手段２Ａ、適応ビット割当手段３、第１〜第Ｎの帯域に帯域分割を行う第１〜第Ｎの帯域分割手段４、各帯域毎に量子化を行う第１〜第Ｎの量子化手段５、各帯域毎にエントロピー符号化を行う第１〜第Ｎのエントロピー符号化手段６、マルチプレクサ７を含んで構成される。
【０００３】
まず、周波数包絡演算手段１に入力された音声信号は、フレーム単位でスペクトル包絡が求められる。求められたスペクトル包絡をもとに、マスキング閾値決定手段２Ａは帯域分割されている帯域のマスキング閾値を決定する。このマスキング閾値は、臨界帯域幅を考慮した同時マスキング効果により決定される。適応ビット割当手段３は、得られたマスキング閾値を超える入力信号に対して、スペクトル包絡成分を各帯域毎に求める。そしてその比に応じて各帯域へのビット割当量を決定する。
【０００４】
一方、入力信号が第１〜第Ｎの帯域分割手段４に入力されると、第１〜第ＮのＮ帯域に分割される。そして、第１〜第Ｎの帯域分割手段４の出力信号は夫々第１〜第Ｎの量子化手段５に入力され、適応ビット割当手段３によって与えられたビット数で量子化される。そして量子化された各帯域分割信号は第１〜第Ｎのエントロピー符号化手段６に入力され、冗長性を削除するためのエントロピー符号化が行われる。そして各々の帯域の符号化データは適応ビット割当手段３で決定されたビット割当情報と共に、マルチプレクサ７でまとめられて伝送路に送出される。
【０００５】
【発明が解決しようとする課題】
しかしながら，上記のような方法では、符号化効率を良くするために同時マスキング効果を用いて符号化データを削減しているが、音声信号を受信する聴取者側の周囲騒音の影響や、聴取者個人の聴覚能力（聴覚特性）を考慮したものではない。特に受信側の環境において、騒音レベルが高かったり、全可聴帯域を聴くとのできない聴取者にとっては、一方的にこのような帯域分割信号を受信することは、冗長な情報を取得することになる。
【０００６】
本発明は、このような従来の問題点に鑑みてなされたものであって、音声信号を受信する聴取者側の周囲の騒音特性、及び聴取者の聴力特性を考慮することにより、符号化音声信号の再生品質又は符号化効率を向上させる音声符号化伝送方法及び音声符号化伝送装置を実現することを目的とするものである。
【０００７】
【課題を解決するための手段】
この課題を達成するために本願の請求項１記載の発明は、受信側の聴取条件に基づいて聴感上の再生品質を補償するために送信側で適用する音声符号化伝送方法であって、送信側から受信側へ受信側の周囲騒音特性に関する情報の転送を要求して受信側から周囲騒音スペクトル情報を入手し、送信側から受信側へ受信側聴取者の聴力特性に関する情報の転送を要求して受信側から聴取者聴力特性情報を入手し、符号化伝送すべき音声信号のスペクトル包絡を求め、同時マスキング効果によって前記スペクトル包絡に関するマスキング閾値を、受信側から得た周囲騒音スペクトル情報と受信側から得た聴取者聴力特性情報の少なくとも１つに基づいて補正し、得られた新しいマスキング閾値を基に各周波数帯域に対するビット割当量の配分を減少するよう調整し、調整されたビット割当量に基づいて各周波数帯域信号に対して所定の符号化アルゴリズムで符号化を行い、受信側へ伝送することを特徴とするものであり、本願の請求項５記載の発明は、受信側の聴取条件に基づいて聴感上の再生品質を補償するために送信側で適用する音声符号化伝送装置であって、受信側の周囲騒音スペクトル情報を送信側で入手する騒音特性参照手段と、受信側の聴取者聴力特性情報を送信側で入手する聴力特性参照手段と、符号化伝送すべき音声信号のスペクトル包絡を求める周波数包絡演算手段と、同時マスキング効果によって前記スペクトル包絡に関するマスキング閾値を、受信側から得た周囲騒音スペクトル情報と受信側から得た聴取者聴力特性情報の少なくとも１つに基づいて補正するマスキング閾値決定手段と、得られた新しいマスキング閾値を基に各周波数帯域に対するビット割当量の配分を減少するよう調整する適応ビット割当手段と、調整されたビット割当量に基づいて各周波数帯域信号に対して所定の符号化アルゴリズムで符号化を行う符号化手段と、を備えたことを特徴とするものである。
【０００８】
また本願の請求項２記載の発明は、受信側の聴取条件に基づいて聴感上の再生品質を補償するために送信側で適用する音声符号化伝送方法であって、送信側から受信側へ受信側の周囲騒音特性に関する情報の転送を要求して受信側から周囲騒音スペクトル情報を入手し、送信側から受信側へ受信側聴取者の聴力特性に関する情報の転送を要求して受信側から聴取者聴力特性情報を入手し、符号化伝送すべき音声信号のスペクトル包絡を求め、同時マスキング効果によって前記スペクトル包絡に関するマスキング閾値を、受信側から得た周囲騒音スペクトル情報と受信側から得た聴取者聴力特性情報の少なくとも１つに基づいて補正し、得られた新しいマスキング閾値を基に各周波数帯域に対するビット割当量を、音声信号のＳＮ値が所定値以上となるよう変更し、変更されたビット割当量に基づいて各周波数帯域信号に対して所定の符号化アルゴリズムで符号化を行い、受信側へ伝送することを特徴とするものであり、本願の請求項６記載の発明は、受信側の聴取条件に基づいて聴感上の再生品質を補償するために送信側で適用する音声符号化伝送装置であって、受信側の周囲騒音スペクトル情報を送信側で入手する騒音特性参照手段と、受信側の聴取者聴力特性情報を送信側で入手する聴力特性参照手段と、符号化伝送すべき音声信号のスペクトル包絡を求める周波数包絡演算手段と、同時マスキング効果によって前記スペクトル包絡に関するマスキング閾値を、受信側から得た周囲騒音スペクトル情報と受信側から得た聴取者聴力特性情報の少なくとも１つ以上に基づいて補正するマスキング閾値決定手段と、得られた新しいマスキング閾値を基に各周波数帯域に対するビット割当量を、音声信号のＳＮ値が所定値以上となるよう変更する適応ビット割当手段と、変更されたビット割当量に基づいて各周波数帯域信号に対して所定の符号化アルゴリズムで符号化を行う符号化手段と、を備えたことを特徴とするものである。
【０００９】
また本願の請求項３，７記載の発明は、上述した音声符号化伝送方法及び音声符号化伝送装置において、マスキング閾値の補正に際し、前記周囲騒音スペクトル情報から得られるマスキングノイズを基に、マスキング閾値を調整することを特徴とするものである。
【００１０】
また本願の請求項４，８記載の発明は、上述した音声符号化伝送方法及び音声符号化伝送装置において、マスキング閾値の補正に際し、前記聴取者聴力特性情報に基づいて得られる聴取者の周波数帯域毎の最小可聴値と臨界帯域幅を基に、マスキング閾値を調整することを特徴とするものである。
【００１１】
【発明の実施の形態】
（実施の形態１）
以下本発明の実施の形態１における音声符号化伝送方法について，図１〜図３を参照しつつ説明する。図１は本実施の形態の音声符号化装置の基本構成を示すブロック図であり、従来例と同一部分は同一符号をつけ、それらの説明は省略する。この音声符号化装置は、周波数包絡演算手段１、マスキング閾値決定手段２Ｂ、適応ビット割当手段３、Ｎ帯域に帯域分割を行う第１〜第Ｎの帯域分割手段４、各帯域毎に量子化を行う第１〜第Ｎの量子化手段５、各帯域毎にエントロピー符号化を行う第１〜第Ｎのエントロピー符号化手段６、マルチプレクサ７に加えて、騒音特性参照手段８、聴力特性参照手段９を含んで構成される。
【００１２】
騒音特性参照手段８は、伝送路を介して入力された受信側の周囲の騒音特性を入手し、マスキング閾値決定手段２Ｂに与える手段である。また聴力特性参照手段９は、伝送路を介して入力された聴取者の聴力特性を入手し、マスキング閾値決定手段２Ｂに与える手段である。マスキング閾値決定手段２Ｂは、入力音声信号の周波数包絡情報と、受信側の騒音特性及び聴力特性に基づき、マスキング閾値を決定する手段である。
【００１３】
このように構成された音声符号化装置の動作について図１〜図３を用いて説明する。図２，図３は本実施の形態における音声符号化伝送方法の信号処理の流れを示すフローチャートである。
【００１４】
ステップＳ１においてまず送信側は、受信側の周囲騒音環境を知るために、受信側の周囲騒音特性に関する騒音スペクトル情報の転送を要求する。これに対して図３のステップＳ１１では、受信側は送信側からの周囲騒音特性に関する情報の転送要求を受理する。そして次のステップＳ１２で、受信側は受信端末側の周囲騒音に関する騒音スペクトルを測定し、得られた騒音スペクトル情報を送信側の騒音特性参照手段８に転送する。こうして送信側は、周囲の騒音スペクトル情報を入手する。
【００１５】
図２のステップＳ２では、送信側は、受信側の聴取者の聴力特性を知るために、受信側の聴取者の聴力特性に関する聴力情報の転送を要求する。これに対して受信側は図３のステップＳ２１において、送信側からの受信側の聴取者の聴力情報の転送要求を受理する。そして次のステップＳ２２で、受信端末側の聴取者の聴力情報を収集し、得られた聴力情報を送信側の聴力特性参照手段９に転送する。なお、この受信端末側の聴取者の正確な聴覚特性の特性が既に得られていて、その情報を受信端末から転送できるものとする。なお受信側の周囲騒音特性や聴取者の聴力特性の情報が得られないときは、送信側がその情報を推定する。
【００１６】
次のステップＳ３では、送信側のマスキング閾値決定手段２Ｂは入力信号のスペクトル包絡をフレーム単位で求め、受信側から得られた周囲騒音スペクトルの包絡をこれに付加する。そしてステップＳ４では、同時マスキング効果によるマスキング閾値を決定する。これにより、受信側の周囲騒音環境を含めたマスキング閾値が得られることになる。
【００１７】
ステップＳ５に進むと、マスキング閾値決定手段２Ｂは聴力特性参照手段９を介して得られた聴取者の聴力特性である最小可聴値を基に、マスキング閾値を補正する。これにより聴取者が例えば高域周波数の感度が劣化している場合などに、可聴域外の無駄な符号化データの送信をなくすことができる。
【００１８】
ステップＳ６では、適応ビット割当て手段３は以上で求められたマスキング閾値と入力信号のスペクトル包絡とを用いて、適応的にビット割当量を変更する。なお、本実施の形態では、伝送する符号化音声のビットレートは上限が制限されているものとする。次に各帯域のマスキング閾値を越える成分の比を求め、その比に応じたビット配分を行う。全体でのビット数は所定値以下とするが、その割当量は先のビット配分に応じて適応的に変更される。
【００１９】
ビット割り当て以降の動作は従来例と同様である。即ち、ステップＳ７では、帯域分割手段４が入力信号を帯域分割する。そして量子化手段５は各帯域に割当てられたビット数で量子化し、エントロピー符号化手段６がエントロピー符号化を実施する。次のステップＳ８では、マルチプレクサ７は各帯域の符号化されたデータと、量子化に割り当てられたビット割当て数を多重化して伝送路に出力する。
【００２０】
（実施の形態２）
次に本発明の実施の形態２における音声符号化伝送方法について、図３及び図４を参照しつつ説明する。図４は本実施の形態における音声符号化伝送方法の信号処理の流れを示すフローチャートである。なお、音声符号化装置の基本構成は図１と同様であるので、図１の各手段の引用は省略する。
【００２１】
図４のステップＴ１においてまず送信側は、受信側の周囲騒音環境を知るために、受信側の周囲騒音特性に関する騒音スペクトル情報の転送を要求する。これに対して図３のステップＴ１１では、受信側は送信側からの周囲騒音特性に関する情報の転送要求を受理する。そして次のステップＴ１２で、受信側は受信端末側の周囲騒音に関する騒音スペクトルを測定し、得られた騒音スペクトル情報を送信側に転送する。こうして送信側は、周囲の騒音スペクトル情報を入手する。
【００２２】
次のステップＴ２では、送信側は、受信側の聴取者の聴力特性を知るために、聴取者の聴力特性に関する聴力情報の転送を要求する。これに対して図３のステップＴ２１では、受信側は送信側からの受信側の聴取者の聴力特性に関する情報の転送要求を受理する。そして次のステップＴ２２で、受信端末側の聴取者の聴力情報を収集し、得られた聴力情報を送信側に転送する。なお、この受信端末側の聴取者の正確な聴力の特性が既に得られていて、その情報を受信端末から転送できるものとする。
【００２３】
次のステップＴ３では、送信側は入力信号のスペクトル包絡をフレーム単位で求め、受信側から得られた周囲騒音スペクトルの包絡をこれに付加する。そしてステップＴ４では、同時マスキング効果によるマスキング閾値を決定する。これにより、受信側の周囲騒音環境を含めたマスキング閾値が得られることになる。
【００２４】
ステップＴ５に進むと、更に受信側の聴取者の聴力特性である最小可聴値を基に、マスキング閾値を補正する。これにより聴取者が例えば高域周波数の感度が劣化している場合などに、可聴域外の符号化データの無駄に送信を事前になくすことができる。
【００２５】
ステップＴ６では、以上で求められたマスキング閾値と入力信号のスペクトル包絡とを用いて、適応的にビット割当量を変更する。なお、本実施の形態では、伝送する符号化音声のビットレートは可変できるとする。まず各帯域のマスキング閾値を越える成分の絶対値から、音声のＳＮ値が所定の値になるようにビット数の決定を行う。このため、全体でのビット数は一定値ではなく、信号の状態などに応じて適応的に可変する。
【００２６】
ビット割り当て以降の動作は従来例と同様である。即ち、ステップＴ７では、入力信号を帯域分割し、各帯域に割当てられたビット数で量子化し、エントロピー符号化を実施する。次のステップＴ８では、各帯域の符号化されたデータと、量子化に割り当てられたビット割当て数を多重化して伝送路に出力する。
【００２７】
【発明の効果】
以上のように、請求項１〜８記載の発明によれば、受信側の周囲騒音の影響や受信側聴取者の聴力特性を考慮して全帯域の符号化データのビット数を制限することにより、符号化伝送する情報量を削減できる効果が得られる。
【００２８】
また請求項２，３，４，６，７，８記載の発明によれば、受信側の周囲騒音の影響や聴取者の聴力特性を考慮して各帯域へのビット配分を変更することにより、聴感上の符号化再生品質を改善できるという効果が得られる。
【図面の簡単な説明】
【図１】本発明の音声符号化伝送方法を実現するための音声符号化装置の基本構成図である。
【図２】本発明の実施の形態１における音声符号化伝送方法の信号処理を示すフローチャート（その１）である。
【図３】実施の形態１，２における音声符号化伝送方法の信号処理を示すフローチャート（その２）である。
【図４】本発明の実施の形態２における音声符号化伝送方法の信号処理を示すフローチャート（その１）である。
【図５】従来の音声符号化装置の構成図である。
【符号の説明】
１周波数包絡演算手段
２Ａ，２Ｂマスキング閾値決定手段
３適応ビット割当手段
４帯域分割手段
５量子化手段
６エントロピー符号化手段
７マルチプレクサ
８騒音特性参照手段
９聴力特性参照手段
[0001]
[Technical field to which the invention belongs]
The present invention relates to a voice encoding transmission method and a voice encoding transmission apparatus for efficiently transmitting a voice signal using a transmission line.
[0002]
[Prior art]
A conventional speech coding method and apparatus will be described. FIG. 5 is a block diagram showing the basic configuration of a conventional speech coding apparatus. As shown in the figure, the speech coding apparatus comprises frequency envelope calculation means 1, masking threshold determination means 2A, adaptive bit allocation means 3, first to Nth band division means 4 that divide the signal into first to Nth bands, first to Nth quantization means 5 that quantize each band, first to Nth entropy coding means 6 that entropy-code each band, and a multiplexer 7.
[0003]
First, the spectral envelope of the speech signal input to the frequency envelope calculation means 1 is obtained frame by frame. Based on the obtained spectral envelope, the masking threshold determination means 2A determines the masking threshold of each of the divided bands. This masking threshold is determined from the simultaneous masking effect, taking the critical bandwidth into account. The adaptive bit allocation means 3 obtains, for each band, the spectral envelope component of the input signal that exceeds the obtained masking threshold, and determines the bit allocation amount for each band according to the ratio of these components.
[0004]
On the other hand, when the input signal is input to the first to Nth band dividing means 4, it is divided into the first to Nth N bands. The output signals of the first to Nth band dividing means 4 are respectively input to the first to Nth quantizing means 5 and quantized with the number of bits given by the adaptive bit allocation means 3. Each quantized band division signal is input to the first to Nth entropy encoding means 6 and subjected to entropy encoding for eliminating redundancy. The encoded data of each band is collected by the multiplexer 7 together with the bit allocation information determined by the adaptive bit allocation means 3 and sent to the transmission line.
[0005]
[Problems to be solved by the invention]
However, although the above method reduces the coded data by using the simultaneous masking effect in order to improve coding efficiency, it does not take into account the influence of ambient noise on the side of the listener receiving the speech signal, nor the individual hearing ability (auditory characteristics) of the listener. In particular, when the noise level in the receiving-side environment is high, or when the listener cannot hear the entire audible band, unilaterally receiving such band-divided signals means acquiring redundant information.
[0006]
The present invention has been made in view of these problems of the prior art. Its object is to realize a speech coding and transmission method and a speech coding and transmission apparatus that improve the reproduction quality or the coding efficiency of the coded speech signal by taking into account the ambient noise characteristics on the side of the listener receiving the speech signal and the hearing characteristics of the listener.
[0007]
[Means for Solving the Problems]
To achieve this object, the invention according to claim 1 of the present application is a speech coding and transmission method applied on the transmitting side to compensate for perceived reproduction quality based on listening conditions on the receiving side, characterized in that: the transmitting side requests the receiving side to transfer information on the ambient noise characteristics of the receiving side and obtains ambient noise spectrum information from the receiving side; the transmitting side requests the receiving side to transfer information on the hearing characteristics of the receiving-side listener and obtains listener hearing characteristic information from the receiving side; the spectral envelope of the speech signal to be coded and transmitted is obtained; the masking threshold for the spectral envelope given by the simultaneous masking effect is corrected based on at least one of the ambient noise spectrum information obtained from the receiving side and the listener hearing characteristic information obtained from the receiving side; the distribution of the bit allocation amounts to the frequency bands is adjusted so as to decrease, based on the resulting new masking threshold; and each frequency band signal is coded with a predetermined coding algorithm based on the adjusted bit allocation amounts and transmitted to the receiving side. The invention according to claim 5 of the present application is a speech coding and transmission apparatus applied on the transmitting side to compensate for perceived reproduction quality based on listening conditions on the receiving side, characterized by comprising: noise characteristic reference means for obtaining, on the transmitting side, ambient noise spectrum information of the receiving side; hearing characteristic reference means for obtaining, on the transmitting side, listener hearing characteristic information of the receiving side; frequency envelope calculation means for obtaining the spectral envelope of the speech signal to be coded and transmitted; masking threshold determination means for correcting the masking threshold for the spectral envelope given by the simultaneous masking effect, based on at least one of the ambient noise spectrum information obtained from the receiving side and the listener hearing characteristic information obtained from the receiving side; adaptive bit allocation means for adjusting the distribution of the bit allocation amounts to the frequency bands so as to decrease, based on the resulting new masking threshold; and coding means for coding each frequency band signal with a predetermined coding algorithm based on the adjusted bit allocation amounts.
[0008]
The invention according to claim 2 of the present application is a speech coding and transmission method applied on the transmitting side to compensate for perceived reproduction quality based on listening conditions on the receiving side, characterized in that: the transmitting side requests the receiving side to transfer information on the ambient noise characteristics of the receiving side and obtains ambient noise spectrum information from the receiving side; the transmitting side requests the receiving side to transfer information on the hearing characteristics of the receiving-side listener and obtains listener hearing characteristic information from the receiving side; the spectral envelope of the speech signal to be coded and transmitted is obtained; the masking threshold for the spectral envelope given by the simultaneous masking effect is corrected based on at least one of the ambient noise spectrum information obtained from the receiving side and the listener hearing characteristic information obtained from the receiving side; the bit allocation amount for each frequency band is changed, based on the resulting new masking threshold, so that the SN value of the speech signal is equal to or greater than a predetermined value; and each frequency band signal is coded with a predetermined coding algorithm based on the changed bit allocation amounts and transmitted to the receiving side. The invention according to claim 6 of the present application is a speech coding and transmission apparatus applied on the transmitting side to compensate for perceived reproduction quality based on listening conditions on the receiving side, characterized by comprising: noise characteristic reference means for obtaining, on the transmitting side, ambient noise spectrum information of the receiving side; hearing characteristic reference means for obtaining, on the transmitting side, listener hearing characteristic information of the receiving side; frequency envelope calculation means for obtaining the spectral envelope of the speech signal to be coded and transmitted; masking threshold determination means for correcting the masking threshold for the spectral envelope given by the simultaneous masking effect, based on at least one of the ambient noise spectrum information obtained from the receiving side and the listener hearing characteristic information obtained from the receiving side; adaptive bit allocation means for changing the bit allocation amount for each frequency band, based on the resulting new masking threshold, so that the SN value of the speech signal is equal to or greater than a predetermined value; and coding means for coding each frequency band signal with a predetermined coding algorithm based on the changed bit allocation amounts.
[0009]
The inventions according to claims 3 and 7 of the present application are characterized in that, in the speech coding and transmission method and the speech coding and transmission apparatus described above, when the masking threshold is corrected, it is adjusted based on the masking noise obtained from the ambient noise spectrum information.
[0010]
The inventions according to claims 4 and 8 of the present application are characterized in that, in the speech coding and transmission method and the speech coding and transmission apparatus described above, when the masking threshold is corrected, it is adjusted based on the minimum audible value and the critical bandwidth of the listener for each frequency band, obtained from the listener hearing characteristic information.
[0011]
DETAILED DESCRIPTION OF THE INVENTION
(Embodiment 1)
Hereinafter, a speech coding and transmission method according to Embodiment 1 of the present invention will be described with reference to FIGS. 1 to 3. FIG. 1 is a block diagram showing the basic configuration of the speech coding apparatus according to the present embodiment. Parts identical to those in the conventional example are given the same reference numerals, and their description is omitted. This speech coding apparatus comprises frequency envelope calculation means 1, masking threshold determination means 2B, adaptive bit allocation means 3, first to Nth band division means 4 that divide the signal into N bands, first to Nth quantization means 5 that quantize each band, first to Nth entropy coding means 6 that entropy-code each band, and a multiplexer 7, and in addition noise characteristic reference means 8 and hearing characteristic reference means 9.
[0012]
The noise characteristic reference means 8 is means for obtaining the noise characteristic of the reception side input via the transmission path and giving it to the masking threshold value determination means 2B. The hearing characteristic reference means 9 is means for obtaining the hearing characteristic of the listener inputted via the transmission path and giving it to the masking threshold value determination means 2B. The masking threshold value determining means 2B is a means for determining a masking threshold value based on the frequency envelope information of the input audio signal and the noise characteristics and hearing characteristics on the receiving side.
[0013]
The operation of the speech coding apparatus configured as described above will be described with reference to FIGS. 1 to 3. FIGS. 2 and 3 are flowcharts showing the signal processing flow of the speech coding and transmission method according to the present embodiment.
[0014]
In step S1, the transmitting side first requests the transfer of noise spectrum information related to the ambient noise characteristics on the receiving side in order to know the ambient noise environment on the receiving side. On the other hand, in step S11 of FIG. 3, the receiving side accepts a transfer request for information on ambient noise characteristics from the transmitting side. In the next step S12, the receiving side measures the noise spectrum related to the ambient noise on the receiving terminal side, and transfers the obtained noise spectrum information to the noise characteristic reference means 8 on the transmitting side. In this way, the transmitting side obtains ambient noise spectrum information.
[0015]
In step S2 of FIG. 2, the transmitting side requests transfer of hearing information relating to the hearing characteristics of the receiving listener in order to know the hearing characteristics of the receiving listener. On the other hand, in step S21 in FIG. 3, the receiving side accepts a request for transferring the hearing information of the listener on the receiving side from the transmitting side. In the next step S22, the hearing information of the listener on the receiving terminal side is collected, and the obtained hearing information is transferred to the hearing characteristic reference means 9 on the transmitting side. It is assumed that the accurate auditory characteristics of the listener on the receiving terminal side have already been obtained, and that information can be transferred from the receiving terminal. If information on the ambient noise characteristics on the receiving side and the hearing characteristics of the listener cannot be obtained, the transmitting side estimates the information.
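The exchange in steps S1, S11, S12, S2, S21 and S22, including the fallback to a transmitter-side estimate, might be sketched as follows. This is only an illustration under stated assumptions: the names `ListeningConditions` and `acquire_listening_conditions`, the callback interface, and the default values used as the "estimate" are hypothetical and do not appear in the specification, which leaves the estimation method open.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ListeningConditions:
    noise_spectrum_db: List[float]     # ambient noise level per band (receiving side)
    hearing_threshold_db: List[float]  # listener's minimum audible value per band


def acquire_listening_conditions(
    request_noise: Callable[[], Optional[List[float]]],
    request_hearing: Callable[[], Optional[List[float]]],
    n_bands: int,
    default_noise_db: float = 30.0,      # assumed placeholder estimate
    default_threshold_db: float = 0.0,   # assumed placeholder estimate
) -> ListeningConditions:
    """Ask the receiver for its noise spectrum and hearing data (S1, S2).

    Each callback stands in for one request/response round trip and returns
    None when the receiver cannot supply the information, in which case the
    transmitting side substitutes its own estimate.
    """
    noise = request_noise()
    if noise is None:
        noise = [default_noise_db] * n_bands
    hearing = request_hearing()
    if hearing is None:
        hearing = [default_threshold_db] * n_bands
    return ListeningConditions(noise, hearing)
```

In this sketch a flat default spectrum plays the role of the estimate; a real transmitter could use any prior it considers reasonable.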
[0016]
In the next step S3, the masking threshold value determination means 2B on the transmission side obtains the spectrum envelope of the input signal in units of frames, and adds the envelope of the ambient noise spectrum obtained from the reception side to this. In step S4, a masking threshold value due to the simultaneous masking effect is determined. Thereby, the masking threshold value including the ambient noise environment on the receiving side is obtained.
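Steps S3 and S4 can be illustrated with a deliberately simplified model. The specification does not give formulas, so everything below is an assumption: envelopes are per-band levels in dB, "adding" the noise envelope is taken to mean power addition, and the simultaneous masking effect is approximated by a triangular spreading function with a hypothetical offset and slope.

```python
import math
from typing import List


def combine_envelopes_db(signal_db: List[float], noise_db: List[float]) -> List[float]:
    """Add the receiver's noise envelope to the signal envelope (power sum in dB)."""
    return [
        10.0 * math.log10(10 ** (s / 10.0) + 10 ** (n / 10.0))
        for s, n in zip(signal_db, noise_db)
    ]


def masking_threshold_db(
    envelope_db: List[float],
    offset_db: float = 10.0,          # assumed masker-to-threshold offset
    slope_db_per_band: float = 15.0,  # assumed decay of masking per band of distance
) -> List[float]:
    """Toy simultaneous-masking threshold: each band is masked by the strongest
    neighbour after applying the offset and a linear decay with band distance."""
    n = len(envelope_db)
    return [
        max(envelope_db[j] - offset_db - slope_db_per_band * abs(i - j) for j in range(n))
        for i in range(n)
    ]
```

Because the combined envelope is never lower than the signal envelope in any band, the threshold computed from it is never lower either, which is how the receiving-side noise ends up raising the masking threshold.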
[0017]
In step S5, the masking threshold value determining unit 2B corrects the masking threshold value based on the minimum audible value that is the hearing characteristic of the listener obtained through the hearing characteristic reference unit 9. As a result, it is possible to eliminate transmission of useless encoded data outside the audible range, for example, when the listener has deteriorated sensitivity at high frequencies.
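One simple way to read the correction in step S5 is to raise the masking threshold, band by band, to at least the listener's minimum audible value. The specification does not state the exact rule, so the element-wise maximum below and the function name `correct_for_hearing` are assumptions made for illustration.

```python
from typing import List


def correct_for_hearing(threshold_db: List[float], min_audible_db: List[float]) -> List[float]:
    """Raise the masking threshold to the listener's minimum audible value
    wherever the listener's hearing is the limiting factor."""
    if len(threshold_db) != len(min_audible_db):
        raise ValueError("both inputs must have one value per band")
    return [max(t, h) for t, h in zip(threshold_db, min_audible_db)]
```

For a listener with reduced high-frequency sensitivity, the last bands of `min_audible_db` are large, so signal components in those bands fall below the corrected threshold and receive no bits in the later allocation step.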
[0018]
In step S6, the adaptive bit allocation means 3 adaptively changes the bit allocation amount using the masking threshold obtained above and the spectrum envelope of the input signal. In the present embodiment, it is assumed that the upper limit of the bit rate of encoded audio to be transmitted is limited. Next, a ratio of components exceeding the masking threshold value of each band is obtained, and bit allocation is performed according to the ratio. Although the total number of bits is set to a predetermined value or less, the allocated amount is adaptively changed according to the previous bit allocation.
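The ratio-based allocation of step S6 under a capped bit budget might look like the sketch below. It assumes that the "component exceeding the masking threshold" is measured as a level difference in dB and that fractional shares are rounded down; both choices, and the name `allocate_bits_fixed_rate`, are hypothetical.

```python
from typing import List


def allocate_bits_fixed_rate(
    envelope_db: List[float], threshold_db: List[float], total_bits: int
) -> List[int]:
    """Distribute at most total_bits over the bands in proportion to how far
    each band's envelope rises above the masking threshold."""
    excess = [max(e - t, 0.0) for e, t in zip(envelope_db, threshold_db)]
    total_excess = sum(excess)
    if total_excess == 0.0:
        # Every band is masked: nothing needs to be transmitted.
        return [0] * len(excess)
    # Rounding down keeps the sum at or below the budget.
    return [int(total_bits * x / total_excess) for x in excess]
```

For example, with envelope `[60, 50, 30]` dB, threshold `[40, 40, 40]` dB and a budget of 32 bits, the excesses are 20, 10 and 0 dB, so the bands receive 21, 10 and 0 bits, one bit short of the cap because of rounding.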
[0019]
The operation after bit allocation is the same as in the conventional example. That is, in step S7, the band division means 4 divides the input signal into bands, the quantization means 5 quantizes each band with the number of bits allocated to it, and the entropy coding means 6 performs entropy coding. In the next step S8, the multiplexer 7 multiplexes the coded data of each band with the number of bits allocated for quantization, and outputs the result to the transmission line.
[0020]
(Embodiment 2)
Next, a speech coding and transmission method according to Embodiment 2 of the present invention will be described with reference to FIGS. 3 and 4. FIG. 4 is a flowchart showing the signal processing flow of the speech coding and transmission method according to the present embodiment. Since the basic configuration of the speech coding apparatus is the same as in FIG. 1, references to the individual means of FIG. 1 are omitted.
[0021]
In step T1 in FIG. 4, first, the transmitting side requests transfer of noise spectrum information related to the ambient noise characteristics on the receiving side in order to know the ambient noise environment on the receiving side. On the other hand, in step T11 of FIG. 3, the receiving side accepts a transfer request for information on ambient noise characteristics from the transmitting side. In the next step T12, the receiving side measures the noise spectrum related to the ambient noise on the receiving terminal side, and transfers the obtained noise spectrum information to the transmitting side. In this way, the transmitting side obtains ambient noise spectrum information.
[0022]
In the next step T2, the transmitting side requests transfer of hearing information relating to the hearing characteristics of the listener in order to know the hearing characteristics of the listener on the receiving side. On the other hand, in step T21 of FIG. 3, the receiving side accepts a transfer request for information related to the hearing characteristics of the listener on the receiving side from the transmitting side. In the next step T22, the hearing information of the listener on the receiving terminal side is collected, and the obtained hearing information is transferred to the transmitting side. Note that it is assumed that accurate hearing characteristics of the listener on the receiving terminal side have already been obtained, and that information can be transferred from the receiving terminal.
[0023]
In the next step T3, the transmitting side obtains the spectrum envelope of the input signal in units of frames, and adds the envelope of the ambient noise spectrum obtained from the receiving side to this. In step T4, a masking threshold value due to the simultaneous masking effect is determined. Thereby, the masking threshold value including the ambient noise environment on the receiving side is obtained.
[0024]
In step T5, the masking threshold is further corrected based on the minimum audible value, which is the hearing characteristic of the listener on the receiving side. As a result, wasteful transmission of coded data outside the audible range can be eliminated in advance, for example when the listener's sensitivity at high frequencies has deteriorated.
[0025]
In step T6, the bit allocation amount is adaptively changed using the masking threshold obtained above and the spectrum envelope of the input signal. In the present embodiment, it is assumed that the bit rate of transmitted encoded audio can be varied. First, the number of bits is determined from the absolute value of the component exceeding the masking threshold of each band so that the SN value of the voice becomes a predetermined value. For this reason, the total number of bits is not a constant value, but is adaptively varied according to the signal state and the like.
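For the variable-rate case of step T6, one possible reading is sketched below. It relies on the common rule of thumb that each bit of a uniform quantizer adds roughly 6.02 dB of signal-to-noise ratio, and it assumes that a band needs no more SNR than its excess over the masking threshold. Neither assumption comes from the specification, and `allocate_bits_for_snr` is a hypothetical name.

```python
import math
from typing import List

DB_PER_BIT = 6.02  # approximate SNR gain per quantizer bit (rule of thumb)


def allocate_bits_for_snr(
    envelope_db: List[float], threshold_db: List[float], target_snr_db: float
) -> List[int]:
    """Choose per-band bit counts so that quantization noise sits either
    target_snr_db below the signal or at the masking threshold, whichever
    needs fewer bits. The total is not fixed and follows the signal."""
    bits = []
    for e, t in zip(envelope_db, threshold_db):
        excess = max(e - t, 0.0)               # level above the masking threshold
        needed_db = min(excess, target_snr_db)
        bits.append(math.ceil(needed_db / DB_PER_BIT))
    return bits
```

Unlike the capped allocation of Embodiment 1, the sum of the returned list changes from frame to frame, which matches the statement that the total number of bits is not constant.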
[0026]
The operation after bit allocation is the same as in the conventional example. That is, in step T7, the input signal is divided into bands, quantized with the number of bits assigned to each band, and entropy coding is performed. In the next step T8, the encoded data of each band and the number of bits allocated for quantization are multiplexed and output to the transmission line.
[0027]
[Effects of the invention]
As described above, according to the inventions of claims 1 to 8, limiting the number of bits of coded data over all bands in consideration of the influence of ambient noise on the receiving side and the hearing characteristics of the receiving-side listener reduces the amount of information to be coded and transmitted.
[0028]
Further, according to the inventions of claims 2, 3, 4, 6, 7, and 8, changing the bit distribution to each band in consideration of the influence of ambient noise on the receiving side and the hearing characteristics of the listener improves the perceived quality of the coded and reproduced speech.
[Brief description of the drawings]
FIG. 1 is a basic configuration diagram of a speech coding apparatus for realizing a speech coding and transmission method of the present invention.
FIG. 2 is a flowchart (part 1) showing signal processing of the speech coding and transmission method according to Embodiment 1 of the present invention.
FIG. 3 is a flowchart (part 2) illustrating signal processing of the speech coding and transmission method according to the first and second embodiments.
FIG. 4 is a flowchart (part 1) showing signal processing of the speech coding and transmission method according to Embodiment 2 of the present invention.
FIG. 5 is a configuration diagram of a conventional speech encoding apparatus.
[Explanation of symbols]
1 Frequency envelope calculation means
2A, 2B Masking threshold value determination means
3 Adaptive bit allocation means
4 Band division means
5 Quantization means
6 Entropy encoding means
7 Multiplexer
8 Noise characteristic reference means
9 Hearing characteristic reference means

Claims

A speech coding and transmission method applied on the transmitting side to compensate for perceived reproduction quality based on listening conditions on the receiving side, comprising:
requesting, from the transmitting side to the receiving side, transfer of information on the ambient noise characteristics of the receiving side, and obtaining ambient noise spectrum information from the receiving side;
requesting, from the transmitting side to the receiving side, transfer of information on the hearing characteristics of the receiving-side listener, and obtaining listener hearing characteristic information from the receiving side;
obtaining the spectral envelope of the speech signal to be coded and transmitted;
correcting the masking threshold for the spectral envelope given by the simultaneous masking effect, based on at least one of the ambient noise spectrum information obtained from the receiving side and the listener hearing characteristic information obtained from the receiving side;
adjusting the distribution of the bit allocation amounts to the frequency bands so as to decrease, based on the resulting new masking threshold; and
coding each frequency band signal with a predetermined coding algorithm based on the adjusted bit allocation amounts and transmitting it to the receiving side.

A speech coding and transmission method applied on the transmitting side to compensate for perceived reproduction quality based on listening conditions on the receiving side, comprising:
requesting, from the transmitting side to the receiving side, transfer of information on the ambient noise characteristics of the receiving side, and obtaining ambient noise spectrum information from the receiving side;
requesting, from the transmitting side to the receiving side, transfer of information on the hearing characteristics of the receiving-side listener, and obtaining listener hearing characteristic information from the receiving side;
obtaining the spectral envelope of the speech signal to be coded and transmitted;
correcting the masking threshold for the spectral envelope given by the simultaneous masking effect, based on at least one of the ambient noise spectrum information obtained from the receiving side and the listener hearing characteristic information obtained from the receiving side;
changing the bit allocation amount for each frequency band, based on the resulting new masking threshold, so that the SN value of the speech signal is equal to or greater than a predetermined value; and
coding each frequency band signal with a predetermined coding algorithm based on the changed bit allocation amounts and transmitting it to the receiving side.

3. The speech encoding and transmitting method according to claim 1, wherein the masking threshold is adjusted based on the masking noise obtained from the ambient noise spectrum information when the masking threshold is corrected.

4. The speech coding and transmission method as described, wherein, when the masking threshold is corrected, the masking threshold is adjusted based on the minimum audible value and the critical bandwidth of the listener for each frequency band, obtained from the listener hearing characteristic information.

A speech coding and transmission apparatus applied on the transmitting side to compensate for perceived reproduction quality based on listening conditions on the receiving side, comprising:
noise characteristic reference means for obtaining, on the transmitting side, ambient noise spectrum information of the receiving side;
hearing characteristic reference means for obtaining, on the transmitting side, listener hearing characteristic information of the receiving side;
frequency envelope calculation means for obtaining the spectral envelope of the speech signal to be coded and transmitted;
masking threshold determination means for correcting the masking threshold for the spectral envelope given by the simultaneous masking effect, based on at least one of the ambient noise spectrum information obtained from the receiving side and the listener hearing characteristic information obtained from the receiving side;
adaptive bit allocation means for adjusting the distribution of the bit allocation amounts to the frequency bands so as to decrease, based on the resulting new masking threshold; and
coding means for coding each frequency band signal with a predetermined coding algorithm based on the adjusted bit allocation amounts.

A speech coding and transmission apparatus applied on the transmitting side to compensate for perceived reproduction quality based on listening conditions on the receiving side, comprising:
noise characteristic reference means for obtaining, on the transmitting side, ambient noise spectrum information of the receiving side;
hearing characteristic reference means for obtaining, on the transmitting side, listener hearing characteristic information of the receiving side;
frequency envelope calculation means for obtaining the spectral envelope of the speech signal to be coded and transmitted;
masking threshold determination means for correcting the masking threshold for the spectral envelope given by the simultaneous masking effect, based on at least one of the ambient noise spectrum information obtained from the receiving side and the listener hearing characteristic information obtained from the receiving side;
adaptive bit allocation means for changing the bit allocation amount for each frequency band, based on the resulting new masking threshold, so that the SN value of the speech signal is equal to or greater than a predetermined value; and
coding means for coding each frequency band signal with a predetermined coding algorithm based on the changed bit allocation amounts.

7. The speech coding and transmitting apparatus according to claim 5, wherein the masking threshold is adjusted based on the masking noise obtained from the ambient noise spectrum information when the masking threshold is corrected.

8. The speech coding and transmission apparatus, wherein, when the masking threshold is corrected, the masking threshold is adjusted based on the minimum audible value and the critical bandwidth of the listener for each frequency band, obtained from the listener hearing characteristic information.