JP2015227912A

JP2015227912A - Audio coding device and method

Info

Publication number: JP2015227912A
Application number: JP2014112478A
Authority: JP
Inventors: Shusaku Ito; Miyuki Shirakawa; Shunsuke Takeuchi; Yoshiteru Tsuchinaga; Shogo Nakamura; Maiko Hirahara; Takashi Makiuchi; Takeshi Mizuta
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2014-05-30
Filing date: 2014-05-30
Publication date: 2015-12-17

Abstract

PROBLEM TO BE SOLVED: To encode an audio signal having a certain amount of information into data that has a smaller amount of information and is less likely to be buried in noise. SOLUTION: An audio encoding device (232) encodes an audio signal using a masking threshold based on auditory characteristics so as to reduce the amount of information. The audio encoding device comprises: a first processing part (239, 240, 242) that corrects the masking threshold (302) according to an ambient noise level (304) received from a terminal and, in doing so, corrects at least one of the masking threshold (302) and the input signal (350) so that any input signal (350) whose difference from the masking threshold exceeds a certain threshold becomes larger than the corrected masking threshold (306, 310); and a second processing part (236, 246) that encodes the input signal (350, 351) that, after at least one of the masking threshold and the input signal has been corrected, is larger than the corrected masking threshold (306, 310).

Description

The present invention relates to the encoding of audio signals.

Known recording devices can receive and record the audio and video streams of broadcast programs in accordance with user operations. A known recording device with a copyright protection function can, in addition, respond to a transmission request sent by a user from a portable information terminal over a mobile communication network and the Internet by converting the recorded audio and video streams to a lower transmission rate and transmitting them to the portable information terminal. Such a recording device may conform, for example, to DTCP+ (Digital Transmission Content Protection Plus), a copyright protection standard. On the portable information terminal, the user can receive the low-rate audio and video streams from the recording device and play them back for viewing.

In one known signal encoding method, a transmission mode determination unit detects environmental noise contained in the background of a speech or musical-sound signal in the input signal and, according to the level of that noise, determines a transmission mode that controls the bit rate of the signal transmitted from the communication terminal apparatus at the other end of the link. A signal decoding unit decodes the encoded information transmitted from that communication terminal apparatus over the transmission path and outputs the resulting signal as the output signal. The signal decoding unit detects transmission errors by comparing the transmission mode information contained in the received encoded information with the transmission mode information obtained from the transmission mode determination unit, taking the transmission delay into account. By controlling the transmission-side bit rate in view of the usage environment on the receiving side, speech or musical-sound signals can be encoded efficiently while quality is maintained.

In one known communication terminal, a background noise detection unit detects the level of background noise, and a control unit determines a decoding level based on that noise level. The control unit then causes a compression/decompression processing unit to perform AMR (Adaptive Multi-Rate) decoding at the determined decoding level. This prevents sound from being output at needlessly high quality and thereby improves efficiency.

In one known communication terminal device, a lower-layer management unit that notifies the upper layer of lower-layer information, such as radio field strength and network congestion, is newly introduced into a conventional communication terminal device that was configured to hide such lower-layer information from applications. As a result, the connection control unit, which provides the service creation environment for applications, can directly recognize lower-layer information, so that various situation judgments and control changes based on that information can be made at the application level. This also provides a communication terminal that lets the user continue video and audio communication seamlessly even when communication conditions change in a mobile environment.

JP 2005-241761 A
JP 2003-218781 A
JP-T-2004-266330

E. Kurniawati, C. T. Lau, B. Premkumar, J. Absar, S. George, "New Implementation Techniques of an Efficient MPEG Advanced Audio Coder", [retrieved March 7, 2014], Internet <http://www.img.lx.it.pt/~fp/cav/ano2008_2009/Trabalhos_MEEC_2009/Artigo_MEEC_11/pagina%20web/pagina%20web/referencias/artigo3.pdf>

When audio stream data recorded on a recording device at a certain rate is re-encoded at a lower rate on the transmitting side, transmitted, and played back on the receiving side, the reproduced sound may be buried in the ambient noise around the receiver. It then becomes hard for the listener to hear, and the perceived sound quality may deteriorate.

The inventors recognized that the effective listening quality could be improved by converting audio stream data recorded at a certain rate into audio stream data in a form that is less likely to be buried in the ambient noise present during playback.

In one aspect, an object of the present invention is to encode an audio signal having a certain amount of information in a form that is less likely to be buried in noise and with a smaller amount of information.

According to an embodiment of the present invention, there is provided an audio encoding device that encodes an audio signal using a masking threshold based on auditory characteristics so as to reduce the amount of information. The audio encoding device includes: a first processing unit that corrects the masking threshold according to an ambient noise level received from a terminal and, in doing so, corrects at least one of the masking threshold and the input signal so that any input signal whose difference from the masking threshold exceeds a certain threshold becomes larger than the corrected masking threshold; and a second processing unit that encodes the input signal that, after at least one of these corrections, is larger than the corrected masking threshold.
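The following Python sketch illustrates the two processing units under stated assumptions. The rule used to correct the masking threshold (taking the per-subband maximum of the masking threshold and the ambient noise level), the margin `delta_db` that marks a subband as clearly audible, and the boost `margin_db` are hypothetical choices for illustration, not the correction functions defined in the embodiment. All levels are per-subband powers in dB.

```python
from typing import List, Sequence, Tuple


def correct_and_select(
    signal_db: Sequence[float],
    mask_db: Sequence[float],
    noise_db: Sequence[float],
    delta_db: float = 6.0,
    margin_db: float = 1.0,
) -> Tuple[List[float], List[float], List[int]]:
    """Hypothetical sketch of the first and second processing units.

    Returns (corrected_mask, corrected_signal, indices_to_encode);
    all values are per-subband power levels in dB.
    """
    # First unit, threshold correction (assumed rule): raise the masking
    # threshold to the ambient noise level wherever the noise is louder.
    corrected_mask = [max(m, n) for m, n in zip(mask_db, noise_db)]

    # First unit, signal correction: a subband that was clearly audible before
    # correction (more than delta_db above the original threshold) is boosted
    # just above the corrected threshold if it would otherwise fall under it.
    corrected_signal = list(signal_db)
    for k, (s, m, cm) in enumerate(zip(signal_db, mask_db, corrected_mask)):
        if s - m > delta_db and s <= cm:
            corrected_signal[k] = cm + margin_db

    # Second unit: only subbands above the corrected threshold are encoded.
    selected = [
        k for k, (s, cm) in enumerate(zip(corrected_signal, corrected_mask)) if s > cm
    ]
    return corrected_mask, corrected_signal, selected


mask, sig, sel = correct_and_select([60, 40, 50, 30], [40, 38, 35, 32], [30, 45, 55, 20])
print(mask, sig, sel)
```

In this example, subband 2 was 15 dB above its original threshold but lies below the noise-raised threshold of 55 dB, so it is boosted to 56 dB and kept; subbands 1 and 3 are not encoded.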

According to the embodiment, an audio signal having a certain amount of information can be encoded in a form that is less likely to be buried in noise and with a smaller amount of information.

FIG. 1 shows an example of a communication system according to an embodiment.
FIG. 2 shows an example of a schematic configuration of the transcoder device.
FIG. 3 shows an example of a schematic configuration of the portable information terminal.
FIGS. 4A to 4C show examples of frequency spectra of decoded audio stream data processed by the acoustic encoding unit.
FIG. 5 shows an example flowchart of processing, executed by the portable information terminal, for generating and transmitting noise power information about the surrounding sound, including ambient noise.
FIGS. 6A and 6B show an example flowchart of processing, executed by the acoustic encoding unit of the transcoder device, for perceptually encoding decoded audio stream data according to the noise power level at the portable information terminal. (FIG. 6B is described together with FIG. 6A.)
FIG. 7 shows a more detailed example flowchart of step 620 in FIG. 6B.
FIGS. 8A to 8C show examples of correction functions for three elements.
FIG. 9 is a diagram for explaining an example of the peakiness of the spectrum of an audio signal.
FIG. 10 shows another example of a schematic configuration of the portable information terminal according to a modification of the above embodiment.
FIG. 11, a modification of the flowchart of FIG. 5, shows an example flowchart of processing, executed by the portable information terminal, for generating and transmitting the noise power and the signal-to-noise ratio.
FIG. 12, a modification of the flowchart of FIG. 7, shows another example flowchart of processing for correcting the composite masking threshold using a further element in addition to the three elements.
FIG. 13, a modification of the transcoder device of FIG. 2, shows another example of a schematic configuration of the transcoder device.
FIGS. 14A and 14B show an example flowchart of processing, executed by the acoustic encoding unit, for perceptually encoding decoded audio stream data according to the noise power level and the signal-to-noise ratio at the portable information terminal. (FIG. 14B is described together with FIG. 14A.)

The objects and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not intended to limit the invention.

Non-limiting embodiments of the present invention are described with reference to the drawings, in which similar components and elements are given the same reference numerals.

When audio (acoustic) stream data recorded on a recording device at a certain rate is re-encoded at a lower rate by the transmitting apparatus, transmitted, and played back on the receiving terminal, the reproduced sound may be buried in the ambient noise around the terminal and become hard for the listener to hear. Conversely, if the receiving terminal raises the gain of the signal in subbands that are hard to hear because of ambient noise, the quantization error of the low-rate encoding increases as well, and the quality of the reproduced sound deteriorates.

Human hearing, that is, the way people perceive sound, exhibits a masking effect: a sound above a certain threshold power level at a given frequency makes other, quieter sounds at nearby frequencies inaudible, to an extent that depends on the level of the masking sound. The masking threshold represents this threshold power level as distributed over the audible frequency band; it is the upper limit of spectral power that a listener cannot perceive because of the masking effect. Perceptual coding (auditory coding), which is based on these auditory characteristics, exploits the masking effect to encode only those subbands of the audio signal that are perceptible given the signal's spectrum and level, and does not encode subbands at or below the masking threshold. Perceptual coding thereby increases the effective coding efficiency for audio signals.
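A common way to turn a masking threshold into a per-subband bit budget is sketched below. It is a generic textbook approximation, not the allocation rule of AAC, MP3, AC3 or this embodiment: each bit of a uniform quantizer buys about 6.02 dB of signal-to-noise ratio, so a subband receives roughly one bit per 6.02 dB of signal-to-mask ratio (SMR), and masked subbands receive none. The `max_bits` cap is a hypothetical parameter.

```python
import math
from typing import List, Sequence

# Each extra bit of a uniform quantizer improves SNR by about 20*log10(2) = 6.02 dB.
DB_PER_BIT = 20 * math.log10(2)


def allocate_bits(signal_db: Sequence[float], mask_db: Sequence[float],
                  max_bits: int = 16) -> List[int]:
    """Per-subband bit allocation driven by the signal-to-mask ratio (SMR)."""
    bits = []
    for s, m in zip(signal_db, mask_db):
        smr = s - m
        if smr <= 0:
            bits.append(0)  # at or below the masking threshold: not encoded
        else:
            # Enough bits to push quantization noise below the masking threshold.
            bits.append(min(max_bits, math.ceil(smr / DB_PER_BIT)))
    return bits


print(allocate_bits([60, 30, 45], [40, 35, 44]))
```

The middle subband lies 5 dB under its threshold and costs nothing, which is where perceptual coding saves bits.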

The inventors recognized that, if the audio signal to be encoded is selected for each subband according to both the masking threshold and the ambient noise, many bits can be allocated to the audible subbands alone, so that the audio signal can be encoded, even at a low rate, in a form that is less likely to be buried in ambient noise. The inventors further recognized that, if the audio signal in each subband is selected, level-corrected, and encoded according to the ambient noise level on the receiving side, the audio signal can be encoded into low-rate data in a form that, given human auditory characteristics, is less likely to be buried in ambient noise.

An object of the embodiment is to adaptively encode or convert an audio signal, according to the ambient noise level, into encoded data that is less likely to be buried in noise and has a small amount of information. This object is achieved by the embodiment.

FIG. 1 shows an example of a communication system 2 according to the embodiment.

In FIG. 1, the communication system 2 includes a portable information terminal 10 and an information processing device 20. In response to a request received from the portable information terminal 10 via a communication network 5, the information processing device 20 can decode the encoded audio and video streams in which a broadcast stream was recorded at a certain data rate, re-encode them at a lower rate, and transmit them to the portable information terminal 10. The communication network 5 may include a mobile communication network with radio base stations or access points 50, and an IP (Internet Protocol) network such as the Internet.

The portable information terminal 10 may be, for example, a mobile phone, a smartphone, a tablet terminal, or a mobile personal computer.

The information processing device 20 may be, for example, a broadcast recording device or broadcast recording/playback device such as an HDD (Hard Disk Drive) recorder, or a personal computer with a broadcast recording/playback function. In FIG. 1, the information processing device 20 includes, for example, a processor 22, a storage device 24, a network interface (NW/IF) 26, an input unit 27, a transcoder device or transcoder unit 200, a broadcast receiver 202, and a drive 28. The information processing device 20 as a whole may also be referred to as a transcoder device or transcoder equipment.

The processor 22 may be a computer CPU (Central Processing Unit). The storage device 24 includes a main storage device and an auxiliary storage device. The main storage device includes memory such as semiconductor memory. The auxiliary storage device includes, for example, a hard disk drive (HDD) and/or semiconductor memory such as flash memory. At least part of the auxiliary storage device of the storage device 24 may serve as a recording unit for stream data. The input unit 27 may include, for example, a set of keys, a touch pad, a numeric keypad, a keyboard, and/or a touch panel.

The drive 28 may read a recording medium 284, such as an optical or magnetic disk, on which software and audio and video stream data are recorded. The software may include, for example, an OS, a database management system (DBMS), and application programs. The application programs may include an application that, upon receiving an operation command or request from the portable information terminal 10, transmits the requested stream data to the portable information terminal 10.

The processor 22 may be, for example, a dedicated processor implemented as an integrated circuit. The processor 22 may also operate according to an application program stored in the storage device 24. The application program may be stored on the recording medium 284, read from it by the drive 28, and installed in the information processing device 20.

In the information processing device 20, the processor 22 can, in accordance with user operations, cause the broadcast receiver 202 to receive, for example, terrestrial and satellite broadcast signals and record their audio and video (AV) stream data in the storage device 24. The broadcast signals may carry audio and video stream data that is copyright-protected by, for example, DTCP (Digital Transmission Content Protection). The processor 22 can also receive, via the communication network 5, a stream transmission request issued from the portable information terminal 10 by a user operation. In response to that request, the processor 22 can read the audio and video stream data recorded in the storage device 24 and have the transcoder device 200 decode it and re-encode it into low-rate stream data. The processor 22 can then transmit the re-encoded stream data to the portable information terminal 10 via the network interface 26 and the communication network 5.

FIG. 2 shows an example of a schematic configuration of the transcoder device 200.

In FIG. 2, the transcoder device 200 includes, for example, an information input unit 210, a stream input unit 222, an acoustic decoding unit 226, an acoustic encoding unit 230, an image decoding unit 256, an image encoding unit 260, a multiplexing unit 270, and a stream output unit 280. The acoustic encoding unit 230 includes, for example, a signal conversion unit 236, a masking threshold generation unit 238, a masking threshold correction unit 239, an encoding determination unit 240, a level correction unit 242, a masking correction unit 244, a quantization unit 246, and a multiplexing unit 248. The acoustic encoding unit 230 may be an encoder conforming to an audio compression coding standard such as AAC (Advanced Audio Coding), MP3 (MPEG Audio Layer-3), or AC3 (Audio Code number 3). Accordingly, the low-rate audio stream data generated by the acoustic encoding unit 230 may be encoded according to such a standard, for example AAC, MP3, or AC3.

FIG. 3 shows an example of a schematic configuration of the portable information terminal 10.

For handling audio signals other than telephone calls, the portable information terminal 10 includes, for example, a receiving unit 130, a decoding unit 132, an acoustic reproduction unit 136, a speaker 138, a microphone 142, an acoustic input unit 144, a conversion unit 146, a measurement unit 148, and a transmission unit 150. The receiving unit 130 and the transmission unit 150 may be coupled to, or may include, a wireless transceiver for mobile communication (not shown). The other telephone communication units, information processing units, and so on of the portable information terminal 10 are not shown in FIG. 3. The decoding unit 132 may be a decoder conforming to an audio compression coding standard such as AAC, MP3, or AC3.

In the portable information terminal 10, the receiving unit 130 is coupled to the decoding unit 132, the decoding unit 132 to the acoustic reproduction unit 136, and the acoustic reproduction unit 136 to the speaker 138. Likewise, the microphone 142 is coupled to the acoustic input unit 144, the acoustic input unit 144 to the conversion unit 146, the conversion unit 146 to the measurement unit 148, and the measurement unit 148 to the transmission unit 150. The acoustic reproduction unit 136 may additionally be coupled to the acoustic input unit 144.

The microphone 142 picks up the sound around the portable information terminal 10, including noise, and generates a sound signal containing that noise; the acoustic input unit 144 receives, filters, and amplifies this sound signal. The conversion unit 146 converts the time-domain sound signal from the acoustic input unit 144 into a frequency-domain spectrum, for example by a fast Fourier transform (FFT). The measurement unit 148 divides the frequency spectrum into a plurality of subbands, detects the spectral power in each subband, and thereby produces the noise power of each subband. The transmission unit 150 transmits noise power information containing the noise power of each subband to the information processing device 20. The portable information terminal 10 may send this noise power information about the ambient noise together with, or as part of, a stream transmission request issued in response to a user operation. The portable information terminal 10 may also generate new noise power information periodically (for example, every 3 or 5 minutes) and transmit it to the information processing device 20.
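A minimal sketch of the measurement performed by the conversion unit 146 and the measurement unit 148 is shown below, assuming one frame of microphone samples and hypothetical subband edges given as DFT bin indices. A naive DFT is used so that the example needs only the Python standard library; an FFT yields the same bin values faster. The -120 dB floor is an assumption that keeps silent subbands out of `log10(0)`.

```python
import cmath
import math
from typing import List, Sequence


def dft_power(frame: Sequence[float]) -> List[float]:
    """|X[k]|^2 for bins 0..N/2 of a real frame (naive DFT; an FFT gives the same values)."""
    n = len(frame)
    powers = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        powers.append(abs(acc) ** 2)
    return powers


def subband_noise_db(frame: Sequence[float], band_edges: Sequence[int],
                     floor: float = 1e-12) -> List[float]:
    """Total power of bins [lo, hi) for each pair of consecutive edges, in dB."""
    p = dft_power(frame)
    return [
        10 * math.log10(max(sum(p[lo:hi]), floor))
        for lo, hi in zip(band_edges[:-1], band_edges[1:])
    ]


# Hypothetical test frame: a 16-sample cosine that falls exactly on bin 2.
frame = [math.cos(2 * math.pi * 2 * t / 16) for t in range(16)]
print(subband_noise_db(frame, [0, 2, 4, 9]))
```

For this frame, all of the power (64, or about 18.06 dB) lands in the subband covering bins 2 and 3, and the other two subbands sit at the floor.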

The receiving unit 130 receives encoded audio stream data from the information processing device 20, and the decoding unit 132 decodes it to generate an audio signal. The acoustic reproduction unit 136 amplifies and reproduces the decoded audio signal and supplies it to the speaker 138, which emits the sound. While the acoustic reproduction unit 136 is reproducing an audio signal and the speaker 138 is emitting sound, the acoustic input unit 144 may extract the noise component by subtracting the audio signal currently being reproduced by the acoustic reproduction unit 136 from the sound signal picked up by the microphone 142.
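In the idealized case where the acoustic path from the speaker 138 to the microphone 142 is a known gain and a whole-sample delay, the subtraction described above reduces to the sketch below. `gain` and `delay` are hypothetical parameters; a real terminal would typically have to estimate a time-varying acoustic path, for example with an adaptive echo canceller, which this sketch does not attempt.

```python
from typing import List, Sequence


def extract_noise(mic: Sequence[float], playback: Sequence[float],
                  gain: float = 1.0, delay: int = 0) -> List[float]:
    """Subtract the assumed echo, gain * playback[t - delay], from each microphone sample."""
    noise = []
    for t, m in enumerate(mic):
        idx = t - delay
        echo = gain * playback[idx] if 0 <= idx < len(playback) else 0.0
        noise.append(m - echo)
    return noise


playback = [1.0, 2.0, 3.0, 4.0]
true_noise = [0.1, -0.2, 0.3, 0.0]
# Simulated microphone signal: noise plus the playback attenuated by 0.5 and delayed by 1 sample.
mic = [n + (0.5 * playback[t - 1] if t >= 1 else 0.0) for t, n in enumerate(true_noise)]
print(extract_noise(mic, playback, gain=0.5, delay=1))
```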

In FIG. 2, in response to a stream transmission request from the user's portable information terminal 10, the processor 22 reads the audio and video stream data of the requested program from the storage device 24 and supplies it to the transcoder device 200 to be re-encoded at a lower rate. The processor 22 also receives the noise power information from the portable information terminal 10 and supplies it to the transcoder device 200.

In the transcoder device 200, the information input unit 210 receives the noise power information and supplies the noise power level of each subband (304 in FIG. 4A) to the acoustic encoding unit 230, specifically to the masking threshold correction unit 239 and the encoding determination unit 240. The stream input unit 222 separates the audio and video stream data retrieved from the storage device 24, supplying the audio stream data to the acoustic decoding unit 226 and the video stream data to the image decoding unit 256.

The image decoding unit 256 decodes the encoded video stream data and supplies the decoded video stream data to the image encoding unit 260. The image encoding unit 260 re-encodes the decoded video stream data at a lower rate and supplies the encoded video stream data to the multiplexing unit 270.

The acoustic decoding unit 226 decodes the encoded audio stream data and supplies the decoded audio stream data to the acoustic encoding unit 230. In the acoustic encoding unit 230, the signal conversion unit 236 converts the received time-domain decoded audio stream data into a frequency-domain spectrum, divides that spectrum into a plurality of subbands, and computes the spectral power of each subband. The signal conversion unit 236 then supplies the spectrum of each subband to the masking threshold generation unit 238, the encoding determination unit 240, and the level correction unit 242. The signal conversion unit 236 may be, for example, an MDCT (modified discrete cosine transform) converter.
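One possible form of the signal conversion unit 236 is sketched below: an MDCT of one 2N-sample frame with a sine window, followed by summing squared coefficients over subbands. The window choice and the subband edges are assumptions for illustration; standard codecs such as AAC define their own windows, block switching, and scale-factor bands.

```python
import math
from typing import List, Sequence


def mdct(frame: Sequence[float]) -> List[float]:
    """MDCT of one 2N-sample frame with a sine window, returning N coefficients."""
    two_n = len(frame)
    if two_n == 0 or two_n % 2:
        raise ValueError("frame length must be a positive even number (2N)")
    n_coef = two_n // 2
    n0 = 0.5 + n_coef / 2  # phase offset of the MDCT basis functions
    window = [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]
    return [
        sum(frame[n] * window[n] * math.cos(math.pi / n_coef * (n + n0) * (k + 0.5))
            for n in range(two_n))
        for k in range(n_coef)
    ]


def subband_power(coeffs: Sequence[float], band_edges: Sequence[int]) -> List[float]:
    """Sum of squared coefficients in [lo, hi) for each pair of consecutive edges."""
    return [sum(c * c for c in coeffs[lo:hi]) for lo, hi in zip(band_edges[:-1], band_edges[1:])]


# Hypothetical test input: the basis function of coefficient 5 for N = 16.
N = 16
frame = [math.cos(math.pi / N * (n + 0.5 + N / 2) * 5.5) for n in range(2 * N)]
coeffs = mdct(frame)
print(subband_power(coeffs, [0, 4, 8, 16]))
```

Because the test frame matches the basis function of coefficient 5, its power concentrates there, and the subband covering coefficients 4 to 7 dominates.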

Next, the way in which the acoustic encoding unit 230 re-encodes the decoded audio stream data according to the embodiment is outlined.

FIGS. 4A to 4C show examples of frequency spectra of the decoded audio stream data processed by the acoustic encoding unit 230. They illustrate how the initial masking threshold 302 is corrected, how the audio signal powers 350 and 351 to be encoded are selected using the corrected masking threshold 310, and how the powers 332, 334, 338 and 340 to be level-corrected are selected.

FIGS. 4A to 4C show, against frequency f, the power 330 of each partial band of the decoded audio signal (bar graph) and the initial masking threshold (profile) 302 for the powers 330 (thin broken line). The power 330 of each partial band in FIGS. 4A to 4C is the total power within that partial band, the partial bands being obtained by dividing the entire audible frequency band. FIG. 4A also shows, against frequency f, the noise power level 304 of each partial band of the ambient noise captured by the portable information terminal 10 (thin broken line). The audible frequency band may be, for example, the range from 20 Hz to 24 kHz.

For the distribution of the powers 330 of the decoded audio signal in a block covering a certain time interval, the masking threshold generation unit 238 generates the initial masking threshold 302 for each partial band on the basis of the masking characteristics of human hearing, and supplies it to the masking threshold correction unit 239. Meanwhile, the information input unit 210 supplies the noise power level 304 over the partial bands, contained in the noise power information received from the portable information terminal 10, to the masking threshold correction unit 239 and the encoding determination unit 240.

Next, for each partial band, the masking threshold correction unit 239 corrects the initial masking threshold 302 by combining it with the noise power level 304, forming the combined masking threshold 306 (thick dash-dot line). The combination is performed, for example, by comparing the masking threshold 302 with the noise power level 304 in each partial band and, where the noise power level 304 exceeds the masking threshold 302, replacing the masking threshold 302 with the noise power level 304. In other words, the combination may select, for each partial band, the larger of the masking threshold 302 and the noise power level 304 to form the combined masking threshold 306.

In FIG. 4A, comparing each power 330 of the decoded audio signal with the combined masking threshold 306 shows that, for example, the power 336 of one partial band is larger than the combined masking threshold 306. Suppose that the powers 336 of the partial bands larger than the initial masking threshold 302 are perceptually encoded and transmitted to the portable information terminal 10. Powers 336 well above the combined masking threshold 306 are then clearly audible at the portable information terminal 10. However, the power 331 of another partial band, slightly below the combined masking threshold 306, is barely audible, and its perceived reproduction quality is low. Moreover, if only the powers 336 above the combined masking threshold 306 are perceptually encoded, the unused remainder of the transmission rate or transmission capacity is wasted. The transcoder device 200 therefore selects the powers of other partial bands that contribute to better perceived reproduction quality, level-corrects (amplifies) them, and additionally encodes and transmits them within the transmission rate. In this way, the transcoder device 200 can improve the effective perceived reproduction quality of the audio signal at the portable information terminal 10.

Next, as shown in FIG. 4B, the masking threshold correction unit 239 lowers the combined masking threshold 306 uniformly across all partial bands, as indicated by the downward arrows, by an amount that depends on the transmission rate and stays within the estimated spare transmission capacity, with the masking threshold 302 as the lower limit. This yields the masking threshold 308 shown by the thick broken line. This is the correction by the first factor f₁, described later.

Next, for partial bands in which the audio signal power 350 is well above the masking threshold 302, and for their vicinity, the masking threshold correction unit 239 further lowers the masking threshold 308, either by a predetermined amount or to below each power 350, again with the masking threshold 302 as the lower limit. The partial bands to be corrected in this way may be selected, for example, as those in which the signal-to-masking-threshold gain ratio between the audio signal power 330 and the initial masking threshold 302 exceeds a gain ratio threshold. The predetermined amount may be determined according to that gain ratio. The gain ratio threshold may be a value of 1 or more, for example 1.0 to 1.2. This is the correction by the second factor f₂, described later.

The masking threshold correction unit 239 also lowers the masking threshold 308, by a predetermined amount or to below the power 351, for audio signal powers 351 that are locally strongly peaked relative to the adjacent partial bands, again with the masking threshold 302 as the lower limit. The partial bands to be corrected in this way are selected, for example, as those partial bands above the masking threshold 302 for which the average gain ratio between the audio signal power 330 of the band and the audio signal powers 330 of the adjacent partial bands exceeds a gain ratio threshold. The predetermined amount may be determined according to the degree of peakedness of the power 351. The gain ratio threshold here may be, for example, 2.0, 2.5 or 3. This is the correction by the third factor f₃, described later.

In this way, the masking threshold correction unit 239 generates the corrected masking threshold 310 over all partial bands (thick broken line 308 and thick solid line 310 in FIG. 4B; thick solid line 310 in FIG. 4C). The masking threshold correction unit 239 then supplies the corrected masking threshold 310 of each partial band to the encoding determination unit 240 and the masking correction unit 244.

Next, the encoding determination unit 240 compares the power 330 of the decoded audio signal with the corrected masking threshold 310 in each partial band of FIG. 4B. The encoding determination unit 240 selects the partial bands whose power exceeds the corrected masking threshold 310 and marks the audio signal powers 350 and 351 of the selected partial bands as encoding targets. Conversely, the encoding determination unit 240 marks the power 352 of partial bands whose power is at or below the corrected masking threshold 310 as not to be encoded.

The encoding determination unit 240 then determines, for those encoding-target powers 350 and 351 that are at or below the combined masking threshold 306, an amplification factor that level-corrects (amplifies) them to above the combined masking threshold 306. Among the encoding-target powers 350 and 351, the level of a power 336 that is already well above the combined masking threshold 306 may be kept as it is, that is, its amplification factor may be set to 1. The encoding determination unit 240 then notifies the level correction unit 242 of the partial bands of the encoding-target powers 350 and 351 and the amplification factor of each of those partial bands.

According to the partial bands of the encoding-target powers 350 and 351 and the amplification factors received from the encoding determination unit 240, the level correction unit 242 amplifies the powers 350 and 351 among the per-band audio signal powers 330 received from the signal conversion unit 236. FIG. 4C shows the audio signal powers 332, 334, 338 and 340 (thick broken-line bars) amplified to above the combined masking threshold 306; that is, the powers 350 and 351 of these partial bands are amplified to the powers 332, 334, 338 and 340. Meanwhile, the level correction unit 242 deletes the power 352 of the partial bands not to be encoded, or equivalently amplifies it with a factor of 0. This reduces the amount of information in the audio signal to be encoded or transmitted.

Even after level correction by the level correction unit 242, it may not be possible to quantize all of the audio signal powers 332 to 340 within the set transmission rate or transmission capacity on the basis of the corrected masking threshold 310. The masking correction unit 244 can therefore further correct the corrected masking threshold 310 so that the encoded audio signal fits within the number of bits allowed by the transmission rate while preserving sound quality as far as possible.
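As one illustration of the kind of adjustment the masking correction unit 244 might make, the following Python sketch raises every band threshold in fixed dB steps until an estimated bit count fits a budget. It is a minimal sketch under stated assumptions, not the method of this document: the bit estimate (about 0.5 × log2(band power / threshold) bits per bin in bands above threshold, in the spirit of perceptual entropy), the uniform step, and the names `estimate_bits`, `fit_to_budget`, `band_bins` and `bit_budget` are all hypothetical. A practical encoder would usually adjust bands non-uniformly to protect sound quality.

```python
import math


def estimate_bits(band_pow, band_bins, threshold):
    """Hypothetical bit estimate: each bin of a band whose power exceeds its
    threshold costs about 0.5 * log2(power / threshold) bits; other bands cost 0."""
    return sum(n * 0.5 * math.log2(p / t)
               for p, n, t in zip(band_pow, band_bins, threshold) if p > t)


def fit_to_budget(band_pow, band_bins, threshold, bit_budget, step_db=1.0, max_iter=60):
    """Raise all thresholds by step_db per iteration until the estimate fits.

    Returns the last thresholds tried if max_iter is reached without fitting.
    """
    thr = list(threshold)
    for _ in range(max_iter):
        if estimate_bits(band_pow, band_bins, thr) <= bit_budget:
            break
        thr = [t * 10 ** (step_db / 10) for t in thr]  # +step_db in the power domain
    return thr
```

With one band of power 1024 spread over 2 bins and an initial threshold of 1, the estimate is 10 bits; a 5-bit budget forces the threshold up to the first 1 dB step at or above 32.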

Next, the quantization unit 246 quantizes, for each partial band, the audio signal powers 332 to 340 corrected by the level correction unit 242. The multiplexing unit 248 multiplexes the quantized data of all partial bands and supplies the result to the multiplexing unit 270 as encoded audio stream data.

The multiplexing unit 270 multiplexes the encoded audio stream data with the video stream data from the image encoding unit 260 to generate audio and video stream data encoded at a low rate, and supplies it to the stream output unit 280.

The stream output unit 280 outputs the encoded audio and video stream data. The processor 20 transmits the audio and video stream data to the portable information terminal 10 through the network interface 26 and the communication network 5.

Next, the operation of the portable information terminal 10 is described in more detail with reference to a flowchart.

FIG. 5 shows an example flowchart of the processing executed by the portable information terminal 10 to generate and transmit noise power information about the surrounding sound, including ambient noise.

Referring to FIG. 5, in step 502 the acoustic input unit 144 of the portable information terminal 10 records, in a storage device (not shown), a sound signal representing the sound captured by the microphone 142. Here, the sound signal is captured before the audio signal is played back and is recorded as noise representing the ambient noise. Alternatively, while the acoustic reproduction unit 136 is playing back the received audio signal and the speaker 138 is emitting sound, the acoustic input unit 144 may obtain a noise component by subtracting the playback signal of the acoustic reproduction unit 136 from the sound signal captured by the microphone 142, and record it as noise representing the ambient noise.

In step 504, the transform unit 146 transforms the recorded time-domain sound signal x_in(n), which contains the noise, into a frequency-domain spectrum X(f), for example by a fast Fourier transform (FFT). Here, x_in(n) is the input signal at sample n, and X(f) is the input spectrum at frequency bin f.

The transform unit 146 further converts the frequency spectrum X(f) into a frequency power spectrum S(f), for example as follows:
S(f) = |X(f)|²

In step 506, the measurement unit 148 divides the frequency power spectrum S(f) into a plurality of partial bands sfb and obtains the noise power noise_pow(sfb) of each partial band sfb.

Here, the noise power noise_pow(sfb) is the sum of the powers grouped into each partial band sfb, where sfb is the partial band number and sfboffset(no) is the offset number no of the partial band. In this way, the measurement unit 148 generates noise power grouped by partial band sfb.
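Steps 504 and 506 can be sketched in Python as follows, assuming a single block of recorded noise samples. The sketch uses a naive DFT so that it needs only the standard library (an FFT gives the same values faster), and the band layout `sfb_offset`, where `sfb_offset[b]` is the first bin of band b, is a hypothetical stand-in for sfboffset(no).

```python
import cmath
import math


def power_spectrum(x):
    """Power spectrum S(f) = |X(f)|^2 of one block, for bins f = 0 .. len(x)//2.

    Uses a naive O(n^2) DFT to stay dependency-free.
    """
    n = len(x)
    spectrum = []
    for f in range(n // 2 + 1):
        X = sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
        spectrum.append(abs(X) ** 2)
    return spectrum


def band_powers(spectrum, sfb_offset):
    """noise_pow(sfb): sum of S(f) over the bins of each partial band.

    sfb_offset[b] is the first bin of band b; the last entry is one past the final bin.
    """
    return [sum(spectrum[sfb_offset[b]:sfb_offset[b + 1]])
            for b in range(len(sfb_offset) - 1)]


# A unit-amplitude sine at bin 4 of a 32-sample block: its one-sided power lands in band 1.
x = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
noise_pow = band_powers(power_spectrum(x), [0, 2, 8, 17])  # approximately [0.0, 256.0, 0.0]
```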

In step 510, the transmission unit 150 transmits the noise power information, including the noise power noise_pow(sfb) of each partial band sfb, to the information processing apparatus 20 over the communication network 5, either together with a stream transmission request or periodically. The information processing apparatus 20 receives the noise power information, and its transcoder device 200 corrects the initial masking threshold 302 on the basis of the noise power noise_pow(sfb).

Next, the operation of the transcoder device 200 is described in more detail with reference to flowcharts.

FIGS. 6A and 6B show an example flowchart of the processing executed by the acoustic encoding unit 230 of the transcoder device 200 to perceptually encode the decoded audio stream data according to the noise power level at the portable information terminal 10.

Referring to FIG. 6A, in step 606, the signal conversion unit 236 converts the time-domain decoded audio signal x_in, decoded by the acoustic decoding unit 226, into a frequency-domain spectrum X(f). For example, the signal conversion unit 236 applies an MDCT (modified discrete cosine transform) to the decoded audio signal x_in to obtain a frequency spectrum mdct(k).

Here, k is the frequency bin, N is the length of the MDCT window (for example, 2048 or 256 samples), and n₀ = (N/2 + 1)/2.

In step 608, the masking threshold generation unit 238 divides the frequency band spanned by the frequency bins k into a plurality of partial bands sfb and calculates, for each partial band sfb of the spectrum mdct(k), the spectral power mdct_pow(sfb).

Here, k may denote, for example, the 1024 bin numbers 0 to 1023, and sfb may denote, for example, the 52 partial band numbers 0 to 51.
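The MDCT of step 606 and the band powers of step 608 can be sketched as follows. This is a minimal direct-form illustration, assuming an already windowed input block and the commonly used MDCT definition with the same phase offset n₀ = (N/2 + 1)/2 given above; scaling conventions differ between references, and mdct_pow(sfb) is assumed here to be the sum of squared coefficients in each band. The band layout `sfb_offset` is hypothetical. A block of N samples yields N/2 coefficients, so N = 2048 gives the 1024 bins 0 to 1023 mentioned above.

```python
import math


def mdct(block):
    """Direct-form MDCT of one windowed block of even length N, returning N/2 values.

    mdct(k) = sum_n block[n] * cos(2*pi/N * (n + n0) * (k + 1/2)), n0 = (N/2 + 1)/2.
    Scaling conventions (e.g. an extra factor of 2) differ between references.
    """
    n_len = len(block)
    n0 = (n_len / 2 + 1) / 2
    return [sum(block[n] * math.cos(2 * math.pi / n_len * (n + n0) * (k + 0.5))
                for n in range(n_len))
            for k in range(n_len // 2)]


def mdct_band_powers(coeffs, sfb_offset):
    """mdct_pow(sfb): sum of squared MDCT coefficients over the bins of each band."""
    return [sum(c * c for c in coeffs[sfb_offset[b]:sfb_offset[b + 1]])
            for b in range(len(sfb_offset) - 1)]
```

The direct form costs O(N²) per block; production encoders compute the same transform with an FFT-based algorithm.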

In step 610, the masking threshold generation unit 238 calculates or otherwise obtains the initial masking threshold 302 over the audible frequency band according to the auditory characteristics, based on the features of the power distribution across the partial bands sfb. The masking threshold 302 is expressed, for example, as allowed_pow(sfb) for each partial band sfb.

Masking thresholds may be obtained for each input signal x_in, and the smaller or the larger of these thresholds in each partial band sfb may be selected as the masking threshold. As a simpler alternative, the power of the threshold in quiet (minimum audible level) of each partial band sfb may be used as the masking threshold of each input signal x_in. Known methods for calculating a more accurate masking threshold are described, for example, in "New Implementation Techniques of an Efficient MPEG Advanced Audio Coder" (Non-Patent Document 1).

In step 612, the masking threshold correction unit 239 combines the initial masking threshold 302 with the noise power level 304 (noise_pow(sfb)) generated by the portable information terminal 10 to form the combined masking threshold 306. For each partial band sfb, the combined masking threshold 306 is expressed, for example, as new_allowed_pow(sfb):
If allowed_pow(sfb) ≥ noise_pow(sfb): new_allowed_pow(sfb) = allowed_pow(sfb)
If allowed_pow(sfb) < noise_pow(sfb): new_allowed_pow(sfb) = noise_pow(sfb)
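The combination rule of step 612 amounts to taking the per-band maximum of the two levels. A minimal Python sketch, assuming both inputs are linear power values listed in the same band order (the function name is illustrative):

```python
def combine_threshold(allowed_pow, noise_pow):
    """Combined masking threshold 306: per band, allowed_pow if it is >= noise_pow, else noise_pow."""
    if len(allowed_pow) != len(noise_pow):
        raise ValueError("both inputs must cover the same partial bands")
    return [a if a >= n else n for a, n in zip(allowed_pow, noise_pow)]


new_allowed_pow = combine_threshold([1.0, 5.0, 2.0], [3.0, 1.0, 2.0])  # [3.0, 5.0, 2.0]
```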

The combined masking threshold 306 is the power-level threshold above which the audio signal contributes effectively to perceived reproduction quality at the portable information terminal 10 in the presence of ambient noise. In a partial band where the masking threshold 302 is at or above the noise power level 304, an audio signal whose power is at or below the combined masking threshold 306 is almost inaudible because of the characteristics of human hearing. In a partial band where the noise power level 304 is above the masking threshold 302, an audio signal whose power is at or below the combined masking threshold 306 is almost inaudible because of the ambient noise. An audio signal at or below the combined masking threshold 306 in a given partial band therefore need not be encoded and transmitted, whereas an audio signal above the combined masking threshold 306 in that band is audible. Consequently, if the audio signals of the partial bands that help maintain or improve sound quality are level-corrected to above the combined masking threshold 306 and then encoded and transmitted, the decoded and reproduced audio signal can be heard with high sound quality.

Referring to FIG. 6B, in step 620, the masking threshold correction unit 239 further corrects the combined masking threshold 306 with three elements (factors) f₁ to f₃ to generate the corrected masking threshold 310. The first factor f₁ is a correction amount applied to the combined masking threshold 306 according to the transmission data rate (bit rate). The second factor f₂ is a correction amount applied to the combined masking threshold 306 according to the signal-to-masking-threshold ratio (gain ratio) between the audio signal 330 and the initial masking threshold 302. The third factor f₃ is a correction amount applied to the combined masking threshold 306 according to the peakedness of the audio signal power. The correction by f₁ to f₃ lowers the combined masking threshold 306 in each partial band according to these factors to generate the corrected masking threshold 310.

Using the corrected masking threshold 310, which is the combined masking threshold 306 lowered by this correction, increases the number of partial bands of the audio signal that are encoded in perceptual coding and that contribute to better perceived reproduction quality.

The correction by the three factors f₁ to f₃, which increases the encoding targets that help maintain or improve perceived reproduction quality, is described in more detail below.

FIG. 7 shows an example of a more detailed flowchart of step 620 in FIG. 6B.

FIGS. 8A to 8C show examples of the correction functions f₁, f₂ and f₃ for the three factors.
FIG. 9 is a diagram for explaining an example of the peakedness of the spectrum of an audio signal.

Referring to FIG. 7, in step 626, the masking threshold correction unit 239 calculates the correction amount f₁ according to the transmission bit rate. The correction amount f₁ is expressed as a function f₁(bitrate) of the transmission bit rate bitrate, for example as shown in FIG. 8A. In general, f₁ increases with the transmission bit rate over a range of bit rates above the threshold BR_TH. This makes it possible to increase the number of partial bands to be encoded, according to the transmission bit rate, within the estimated number of spare bits.
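FIG. 8A is not reproduced here, so the exact shape of f₁ is unknown beyond the description above. Purely as a hypothetical stand-in, the following sketch uses a piecewise-linear ramp: zero at or below BR_TH, rising linearly to a maximum, then flat. The breakpoints, the maximum and the reading of f₁ as a threshold reduction in dB are all assumptions made for illustration.

```python
def f1_bitrate(bitrate, br_th=64_000, br_max=128_000, f1_max_db=6.0):
    """Hypothetical f1(bitrate), read as a threshold reduction in dB.

    0 at or below br_th (no spare capacity assumed), linear up to f1_max_db
    at br_max, and flat above br_max. All breakpoints and values are illustrative.
    """
    if bitrate <= br_th:
        return 0.0
    if bitrate >= br_max:
        return f1_max_db
    return f1_max_db * (bitrate - br_th) / (br_max - br_th)
```

For example, `f1_bitrate(96_000)` lies halfway along the ramp and returns 3.0.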

In step 628, the masking threshold correction unit 239 calculates, for each partial band sfb, the gain ratio R between the audio signal power (330) mdct_pow(sfb) and the initial masking threshold (302) allowed_pow(sfb):
R = mdct_pow(sfb) / allowed_pow(sfb)

In step 630, the masking threshold correction unit 239 calculates the correction amount f₂ from the ratio R. The correction amount f₂ is expressed as a function f₂(R) of the ratio R, for example as shown in FIG. 8B. In general, f₂ increases with the gain ratio R between the audio signal power mdct_pow(sfb) and the masking threshold allowed_pow(sfb) over a range of ratios above the threshold R_TH. As a result, partial bands with a large gain ratio between the audio signal power 330 and the masking threshold 302 are more likely to be selected as encoding targets.
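Step 628 and one possible form of f₂ are sketched below. The gain ratio follows the formula of step 628; the shape of f₂(R) is a hypothetical stand-in for FIG. 8B. It stays at 1 up to R_TH (1.2 here, borrowed from the example gain ratio threshold range of 1.0 to 1.2 given earlier), so that ordinary bands keep only the uniform f₁ correction, and then rises linearly to an assumed maximum.

```python
def gain_ratio(mdct_pow, allowed_pow):
    """R(sfb) = mdct_pow(sfb) / allowed_pow(sfb) for every partial band."""
    return [p / a for p, a in zip(mdct_pow, allowed_pow)]


def f2_ratio(r, r_th=1.2, r_max=10.0, f2_max=2.0):
    """Hypothetical f2(R): 1 up to r_th, linear up to f2_max at r_max, flat above.

    A baseline of 1 leaves the uniform f1 correction unchanged in ordinary bands.
    """
    if r <= r_th:
        return 1.0
    if r >= r_max:
        return f2_max
    return 1.0 + (f2_max - 1.0) * (r - r_th) / (r_max - r_th)
```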

In step 632, the masking threshold correction unit 239 calculates the peakedness of the audio signal power in each partial band sfb. The peakedness peak(sfb_x) of the audio signal power mdct_pow(sfb) in a partial band sfb_x can be calculated from the average of its gain ratios (or gain differences) relative to the powers of the adjacent partial bands. For example, in FIG. 9, the peakedness peak(sfb_x) of a partial band sfb_x relative to the adjacent partial bands sfb_x−1 and sfb_x+1 is given by the average of the gain ratios:
peak(sfb_x) = {mdct_pow(sfb_x) / mdct_pow(sfb_x−1) + mdct_pow(sfb_x) / mdct_pow(sfb_x+1)} / 2

In step 634, the masking threshold correction unit 239 calculates the correction amount f₃ from the peakedness peak(sfb_x). The correction amount f₃ is expressed as a function f₃(peak) of the peakedness, for example as shown in FIG. 8C. In general, f₃ increases with the peakedness over a range of values above the threshold P_TH. As a result, partial bands of the audio signal that are strongly peaked relative to their adjacent partial bands are more likely to be selected as encoding targets.
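The peakedness formula of step 632 can be written directly in Python. The formula covers an interior band with two neighbours; how the first and last partial bands are handled is not stated, so this sketch simply averages over the neighbours that exist, which is an assumption. It also assumes all band powers are nonzero. The result could then be mapped to f₃ with a ramp of the same kind as f₂, using a threshold P_TH such as the values 2.0 to 3 mentioned earlier.

```python
def peakedness(mdct_pow, x):
    """peak(sfb_x): average of mdct_pow[x] / mdct_pow[neighbour] over adjacent bands.

    Interior bands use both neighbours, as in the formula for step 632; edge bands
    use their single neighbour (an assumption). Band powers must be nonzero.
    """
    neighbours = [n for n in (x - 1, x + 1) if 0 <= n < len(mdct_pow)]
    ratios = [mdct_pow[x] / mdct_pow[n] for n in neighbours]
    return sum(ratios) / len(ratios)


powers = [1.0, 8.0, 2.0]
peaks = [peakedness(powers, x) for x in range(len(powers))]  # [0.125, 6.0, 0.25]
```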

In step 636, the masking threshold correction unit 239 corrects the combined masking threshold 306 based on the combined correction amount α(sfb). The corrected masking threshold 310 is expressed, for example, as:
corrected masking threshold 310 = combined masking threshold 306 × correction amount α(sfb)
Here, the combined correction amount α(sfb) from the three factors f₁ to f₃ is expressed, for example, as:
α(sfb) = −(f₁ × f₂ × f₃)

The corrected masking threshold 310 obtained by correcting the combined masking threshold 306, written again as new_allowed_pow(sfb), is therefore expressed, for example, as the update:
new_allowed_pow(sfb) ← new_allowed_pow(sfb) × α(sfb)
However, the corrected masking threshold 310 never falls below the initial masking threshold 302, so the updated value on the left-hand side satisfies:
new_allowed_pow(sfb) ≥ allowed_pow(sfb)
The masking threshold correction unit 239 then supplies the corrected masking threshold 310 of each partial band to the masking correction unit 244.
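Read literally, α(sfb) = −(f₁ × f₂ × f₃) is negative, so multiplying a linear power by it would not give a usable threshold. One plausible reading, which is an assumption made only for this sketch and is not stated in the text, is that α(sfb) is a dB offset, i.e. a multiplication by 10^(α/10) in the linear power domain. Under that assumption, step 636 with the lower limit at the initial masking threshold can be sketched as follows (names are illustrative):

```python
def corrected_threshold(combined_pow, initial_pow, f1_db, f2, f3):
    """Corrected masking threshold 310 per band, floored at the initial threshold 302.

    Assumption for this sketch: alpha(sfb) = -(f1 * f2[sfb] * f3[sfb]) is a dB offset,
    applied to a linear power threshold as a factor of 10 ** (alpha / 10).
    """
    corrected = []
    for comb, init, g2, g3 in zip(combined_pow, initial_pow, f2, f3):
        alpha_db = -(f1_db * g2 * g3)
        lowered = comb * 10 ** (alpha_db / 10)
        corrected.append(max(lowered, init))  # never below allowed_pow(sfb)
    return corrected


# 10 dB reduction in both bands; the second band is held at its initial threshold.
thr = corrected_threshold([100.0, 100.0], [1.0, 80.0], 10.0, [1.0, 1.0], [1.0, 1.0])
```

With f₂ and f₃ at a baseline of 1, every band is lowered by f₁ alone, which matches the uniform lowering to the threshold 308 in FIG. 4B; larger f₂ or f₃ deepen the reduction in selected bands.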

Referring again to FIG. 6B, in step 640, the encoding determination unit 240 compares, for each partial band sfb, the power 330 of the audio signal with the corrected masking threshold 310. The encoding determination unit 240 then determines that the audio signal powers 350 and 351 in partial bands whose power exceeds the corrected masking threshold 310 are to be encoded. That is, for each partial band sfb, the audio signal power mdct_pow(sfb) is selected for encoding when it is strictly greater than (>) the corrected masking threshold new_allowed_pow(sfb). On the other hand, the encoding determination unit 240 determines that the audio signal power 352 in partial bands whose power is equal to or less than the corrected masking threshold 310 is not to be encoded.
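A minimal sketch of the step 640 decision, assuming per-band powers are held in lists indexed by sfb (select_bands is a hypothetical name). Note the strict comparison: a band whose power equals the corrected threshold is not encoded.

```python
from typing import List, Sequence


def select_bands(mdct_pow: Sequence[float],
                 new_allowed_pow: Sequence[float]) -> List[int]:
    """Step 640 sketch: indices sfb whose power exceeds the corrected threshold."""
    # Strict '>': a band exactly at the threshold is treated as not to be encoded.
    return [sfb for sfb, (p, thr) in enumerate(zip(mdct_pow, new_allowed_pow))
            if p > thr]


print(select_bands([4.0, 1.0, 2.0], [2.0, 1.0, 3.0]))  # [0]
```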

In step 642, the encoding determination unit 240 determines, for each partial band whose audio signal power 350 or 351 exceeds the corrected masking threshold 310, an amplification factor γ used to level-correct (amplify) that power so that it exceeds the composite masking threshold 306. For the audio signal power 336 in partial bands that already sufficiently exceeds the composite masking threshold 306, the encoding determination unit 240 sets γ to 1 so that the power is not amplified. The level correction unit 242 then level-corrects and amplifies the partial-band powers 350 and 351 of the audio signal according to the respective amplification factors γ.

More specifically, the audio signal power mdct_pow(sfb) is amplified, for example according to the following equation, so that it is not buried in the ambient noise. For each partial band sfb,
$$\mathrm{mdct\_pow}(\mathrm{sfb}) \leftarrow \mathrm{mdct\_pow}(\mathrm{sfb}) \times \mathrm{gain}(\mathrm{sfb})$$
Here, the gain gain(sfb) may be expressed, for example, by the following equation:
$$\mathrm{gain}(\mathrm{sfb}) = \frac{\mathrm{noise\_pow}(\mathrm{sfb})}{\mathrm{mdct\_pow}(\mathrm{sfb})} \times \gamma$$
where, for example, the coefficient γ may be 1.2.
In this expression for gain(sfb), the ratio noise_pow(sfb)/mdct_pow(sfb) level-corrects the audio signal power to the same level as the noise power, and the coefficient γ further amplifies it above the noise power.
Likewise, the power of each bin k within the group of bins forming partial band sfb is amplified as follows:
$$\mathrm{mdct\_pow}(k) \leftarrow \mathrm{mdct\_pow}(k) \times \mathrm{gain}(\mathrm{sfb})$$
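The level correction above can be sketched as follows. This is an illustrative Python version, assuming the band power mdct_pow(sfb) is the sum of the bin powers mdct_pow(k) in that band; the function names are hypothetical. Substituting the gain shows that the corrected band power is γ × noise_pow(sfb), that is, 20% above the noise power when γ = 1.2.

```python
from typing import List, Sequence


def band_gain(noise_pow: float, mdct_pow: float, gamma: float = 1.2) -> float:
    """gain(sfb) = (noise_pow(sfb) / mdct_pow(sfb)) * gamma."""
    return (noise_pow / mdct_pow) * gamma


def amplify_band(bin_pow: Sequence[float], noise_pow: float,
                 gamma: float = 1.2) -> List[float]:
    """Apply gain(sfb) to every bin k of one partial band sfb."""
    mdct_pow = sum(bin_pow)  # assumption: band power = sum of bin powers
    g = band_gain(noise_pow, mdct_pow, gamma)
    return [p * g for p in bin_pow]


bins = amplify_band([0.5, 0.5], noise_pow=2.0)
print(bins, sum(bins))  # corrected band power equals gamma * noise_pow
```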

In step 644, the masking correction unit 244 operates in cooperation with the quantization unit 246 to further correct the corrected masking threshold 310 received from the masking threshold correction unit 239 so that it matches the transmission rate. The quantization unit 246, in turn, quantizes the level-corrected audio signal power of each partial band or frequency bin, produced by the level correction unit 242, based on the masking threshold corrected by the masking correction unit 244.

In step 646, the multiplexing unit 248 multiplexes the audio stream data of the plurality of partial bands generated by the quantization unit 246 to generate perceptually encoded audio stream data. In this way, the audio signal decoded by the acoustic decoding unit 226 is adaptively encoded, or converted, by the acoustic encoding unit 230 into low-rate data in a form that is hard to bury in noise, according to the ambient noise level of the portable information terminal 10.

Next, a method according to a variation of the above embodiment is described, in which the corrected masking threshold 310 is generated by correcting the composite masking threshold 306 based on a further factor f_SN relating to the signal-to-noise (SN) ratio, in addition to the three factors f_1 to f_3. In this case, the SN ratio between the reproduced audio signal and the ambient noise at the portable information terminal 10 is used to correct the composite masking threshold 306 and generate the corrected masking threshold 310.

FIG. 10 shows another example of a schematic configuration of the portable information terminal 10 according to a variation of the above-described embodiment.

In FIG. 10, the measurement unit 148 is coupled to the decoding unit 132 and receives from it the decoded audio signal or its power spectrum. The other couplings in the portable information terminal 10 are the same as those in FIG. 3.

In the portable information terminal 10 of FIG. 10, the receiving unit 130 receives the encoded audio stream data from the information processing apparatus 20, and the decoding unit 132 decodes the encoded audio stream data to generate an audio signal. The sound reproducing unit 136 amplifies and reproduces the audio signal and supplies it to the speaker 138 to produce sound.

Meanwhile, the conversion unit 146 converts the sound signal captured by the microphone 142 into a frequency spectrum. In this case, the captured sound signal contains both a noise signal component from the ambient noise and a reproduced audio signal component produced by the speaker 138. The measurement unit 148 calculates the frequency spectrum of each partial band of the decoded audio signal, and also calculates the spectrum of each partial band of the captured sound signal. The measurement unit 148 may then calculate the spectrum of the noise signal component for each partial band by subtracting the spectrum of the audio signal component from the spectrum of the captured sound signal. Next, the measurement unit 148 calculates the signal-to-noise ratio from the spectra of the audio signal component and the noise signal component. The transmission unit 150 then transmits noise power information, including the noise power and the signal-to-noise ratio, to the information processing apparatus 20, either together with a stream transmission request or periodically.

Alternatively, the portable information terminal 10 may refrain from playing the decoded audio signal through the speaker 138, and the measurement unit 148 may calculate the signal-to-noise ratio from the spectrum of the captured ambient noise signal and the calculated spectrum of the decoded audio signal. In this case, the spectrum of the decoded audio signal may be calculated by modifying the spectrum of the audio signal according to the acoustic characteristics that would apply if the decoded audio signal were reproduced through the speaker 138.

In this case, referring again to FIG. 2, in the transcoder device 200 the information input unit 210 receives the noise power information, including the noise power and the signal-to-noise ratio, and supplies it to the acoustic encoding unit 230, where it is passed to the masking threshold correction unit 239 and the encoding determination unit 240. The masking threshold correction unit 239 of the transcoder device 200 in the information processing apparatus 20 further corrects the composite masking threshold 306 based on the signal-to-noise ratio at the portable information terminal 10.

FIG. 11 is a variation of the flowchart of FIG. 5 and shows an example of the processing, executed by the portable information terminal 10, for generating and transmitting the noise power and the signal-to-noise ratio.

Referring to FIG. 11, in step 502, the acoustic input unit 144 of the portable information terminal 10 records the sound signal captured by the microphone 142 in a storage device. The sound signal captured before reproduction of the audio signal starts is recorded as noise, and the sound signal captured during reproduction of the decoded audio signal, which contains both the reproduced sound component and the ambient noise component, is recorded as a mixed sound signal.

Step 504 is the same as in FIG. 5. That is, the conversion unit 146 converts the time-domain sound signal x_in(n) into the frequency-domain spectrum X(f) and then into the power spectrum S(f).

In step 506, as in FIG. 5, the measurement unit 148 divides the power spectrum S(f) into a plurality of partial bands sfb and obtains, for each partial band sfb, the noise power noise_pow(sfb) and the reproduced sound signal power signal_pow(sfb). Here, signal_pow(sfb) represents the power of the decoded audio signal during reproduction. It may be obtained as an estimate calculated from the audio signal decoded by the decoding unit 132, optionally corrected to account for the acoustic characteristics of the portable information terminal 10, such as those of the speaker 138 and the microphone 142. Alternatively, signal_pow(sfb) may be obtained by subtracting the noise power noise_pow(sfb) captured at another time from the power sound_pow(sfb) of the mixed sound signal captured at one time. Conversely, noise_pow(sfb) may be obtained by subtracting the calculated signal_pow(sfb) from the power sound_pow(sfb) of the captured mixed sound signal. The noise power noise_pow(sfb) and the reproduced sound signal power signal_pow(sfb) of each partial band sfb are expressed, for example, by the following equations.
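One of the options in step 506, obtaining noise_pow(sfb) by subtracting the calculated signal_pow(sfb) from the mixed-signal power sound_pow(sfb), can be sketched in Python. The band edges, the function names, and the small positive floor used when the subtraction goes negative are assumptions for illustration only.

```python
from typing import List, Sequence


def band_powers(spectrum: Sequence[float], edges: Sequence[int]) -> List[float]:
    """Sum a power spectrum S(f) over partial bands [edges[i], edges[i+1])."""
    return [sum(spectrum[lo:hi]) for lo, hi in zip(edges[:-1], edges[1:])]


def noise_by_subtraction(sound_pow: Sequence[float], signal_pow: Sequence[float],
                         floor: float = 1e-12) -> List[float]:
    """noise_pow(sfb) = sound_pow(sfb) - signal_pow(sfb), kept positive."""
    # The subtraction can go negative when the signal estimate is too high,
    # so the result is clamped to a small positive floor (an assumption).
    return [max(s - g, floor) for s, g in zip(sound_pow, signal_pow)]


mixed = band_powers([1.0, 1.0, 2.0, 2.0, 3.0, 3.0], [0, 2, 4, 6])
print(noise_by_subtraction(mixed, [1.0, 3.0, 7.0]))  # [1.0, 1.0, 1e-12]
```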

In step 508, the measurement unit 148 calculates the signal-to-noise ratio SN_ratio(sfb) of each partial band from its noise power noise_pow(sfb) and reproduced sound signal power signal_pow(sfb). SN_ratio(sfb) is expressed, for example, by the following equation:
$$\mathrm{SN\_ratio}(\mathrm{sfb}) = \frac{\mathrm{signal\_pow}(\mathrm{sfb})}{\mathrm{noise\_pow}(\mathrm{sfb})}$$
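The per-band ratio of step 508 is a direct element-wise division; the sketch below adds a small guard value eps against bands with zero measured noise, which is an assumption not stated in the source (sn_ratio is a hypothetical name).

```python
from typing import List, Sequence


def sn_ratio(signal_pow: Sequence[float], noise_pow: Sequence[float],
             eps: float = 1e-12) -> List[float]:
    """SN_ratio(sfb) = signal_pow(sfb) / noise_pow(sfb) for each partial band."""
    # eps avoids division by zero in bands with no measured noise (assumption).
    return [s / max(n, eps) for s, n in zip(signal_pow, noise_pow)]


print(sn_ratio([2.0, 1.0], [1.0, 4.0]))  # [2.0, 0.25]
```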

In step 510, the transmission unit 150 transmits noise power information, including the noise power noise_pow(sfb) and the signal-to-noise (SN) ratio SN_ratio(sfb) of each partial band sfb, to the information processing apparatus 20, as in FIG. 5. The transcoder device 200 of the information processing apparatus 20 then corrects the initial masking threshold 302 based on noise_pow(sfb), and further corrects the composite masking threshold 306 based on SN_ratio(sfb).

Alternatively, the signal power signal_pow(sfb) and the signal-to-noise ratio of each partial band may be calculated by the masking threshold correction unit 239 of the transcoder device 200 from the noise power transmitted by the portable information terminal 10. In this case, the signal-to-noise ratio may be calculated from a delayed version of the audio signal encoded by the acoustic encoding unit 230 and the noise power received from the portable information terminal 10.

Corresponding to the flowchart of FIG. 11 executed by the portable information terminal 10, the acoustic encoding unit 230 of the transcoder device 200 executes the flowcharts of FIGS. 6A and 6B in the same way.

FIG. 12 is a variation of the flowchart of FIG. 7 and shows another example of the processing for correcting the composite masking threshold 306 with the factor f_SN and the three factors f_1 to f_3.

Referring to FIG. 12, in step 622, when the signal-to-noise (S/N) ratio (gain ratio) at the portable information terminal 10 is, for example, less than 1 (<1), the masking threshold correction unit 239 calculates a correction amount f_SN according to the S/N ratio. The correction amount f_SN, which raises the composite masking threshold 306 according to the S/N ratio, is expressed by the following equation:
$$f_{SN} = f_{SNM} = \frac{1}{\mathrm{SN\_ratio}(\mathrm{sfb})}$$
In this case, as described later, the masking threshold correction unit 239 corrects the threshold as follows:
$$\mathrm{new\_allowed\_pow}(\mathrm{sfb}) \leftarrow \mathrm{new\_allowed\_pow}(\mathrm{sfb}) \times f_{SNM} = \frac{\mathrm{new\_allowed\_pow}(\mathrm{sfb})}{\mathrm{SN\_ratio}(\mathrm{sfb})}$$

Alternatively, the correction amount f_SN may be an additive correction expressed by the following equation:
$$f_{SN} = f_{SNA} = \mathrm{new\_allowed\_pow}(\mathrm{sfb}) \times \left(\frac{1}{\mathrm{SN\_ratio}(\mathrm{sfb})} - 1\right)$$
In this case, as described later, the masking threshold correction unit 239 corrects the composite masking threshold 306, new_allowed_pow(sfb), for example as follows:
$$\mathrm{new\_allowed\_pow}(\mathrm{sfb}) \leftarrow \mathrm{new\_allowed\_pow}(\mathrm{sfb}) + f_{SNA} = \frac{\mathrm{new\_allowed\_pow}(\mathrm{sfb})}{\mathrm{SN\_ratio}(\mathrm{sfb})}$$
so both forms of f_SN yield the same corrected threshold.
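The two variants of f_SN in step 622 can be compared with a short Python sketch. The source defines f_SN only for SN_ratio(sfb) < 1; leaving the threshold unchanged otherwise is an assumption made here, and the function names are hypothetical. Adding f_SNA = thr × (1/SN − 1) to thr gives thr / SN_ratio(sfb), the same result as multiplying by f_SNM, which matches the final expression stated above.

```python
def raise_mult(thr: float, sn: float) -> float:
    """Multiplicative variant: thr * f_SNM, with f_SNM = 1 / SN_ratio(sfb)."""
    if sn >= 1.0:
        return thr  # assumption: no SN-based correction when SN_ratio >= 1
    f_snm = 1.0 / sn
    return thr * f_snm


def raise_add(thr: float, sn: float) -> float:
    """Additive variant: thr + f_SNA, with f_SNA = thr * (1 / SN_ratio(sfb) - 1)."""
    if sn >= 1.0:
        return thr  # same assumption as above
    f_sna = thr * (1.0 / sn - 1.0)
    return thr + f_sna


for sn in (0.5, 0.25, 2.0):
    print(sn, raise_mult(2.0, sn), raise_add(2.0, sn))
```

The multiplicative form is simpler to compute; the additive form expresses the same change as an offset, which is the shape used when it is later combined with α(sfb).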

Thus, raising the composite masking threshold 306 when the signal-to-noise ratio of the audio signal is less than 1 makes it possible to increase the level correction applied to the audio signal in partial bands that contribute to perceived reproduction quality, so that those bands are less likely to be buried in noise.

Steps 626 to 634 are the same as those in FIG. 7.
In step 637, the masking threshold correction unit 239 corrects the composite masking threshold 306 based on the correction amount f_SN and the combined correction amount α(sfb). The corrected masking threshold 310 is expressed, for example, by either of the following equations:
$$\text{corrected masking threshold } 310 = \text{composite masking threshold } 306 \times f_{SNM} \times \alpha(\mathrm{sfb})$$
$$\text{corrected masking threshold } 310 = \text{composite masking threshold } 306 + f_{SNA} \times \alpha(\mathrm{sfb})$$
Here, as described above, the combined correction amount is α(sfb) = −(f_1 × f_2 × f_3). In this case, the correction amount f_1, which depends on the transmission data rate, may tend to increase as the correction amount f_SN increases.

Accordingly, the corrected masking threshold 310 obtained by further correcting the composite masking threshold 306, that is, new_allowed_pow(sfb), is expressed, for example, by
$$\mathrm{new\_allowed\_pow}(\mathrm{sfb}) \leftarrow \mathrm{new\_allowed\_pow}(\mathrm{sfb}) \times f_{SNM} \times \alpha(\mathrm{sfb})$$
or
$$\mathrm{new\_allowed\_pow}(\mathrm{sfb}) \leftarrow \mathrm{new\_allowed\_pow}(\mathrm{sfb}) + f_{SNA} \times \alpha(\mathrm{sfb})$$
However, the corrected masking threshold 310 is never smaller than the initial masking threshold 302, so the updated value satisfies
$$\mathrm{new\_allowed\_pow}(\mathrm{sfb}) \ge \mathrm{allowed\_pow}(\mathrm{sfb})$$
Next, the masking threshold correction unit 239 supplies the corrected masking threshold 310 of each partial band to the masking correction unit 244.

Next, a method according to yet another variation of the above embodiment is described, in which the length of the time-domain analysis window applied to the audio signal before it is converted to the frequency domain in the transcoder device 200 is controlled according to the signal-to-noise ratio of the audio signal reproduced at the portable information terminal 10.

In audio signal encoding, the analysis length is shortened for signals that change sharply in the time domain, such as attack sounds, in order to suppress pre-echo noise. Because a shorter analysis length lowers the coding efficiency per unit time, the analysis length is adjusted according to the input sound. In AAC, for example, the encoder switches between two window types: a short window of 128 points and a long window of 1024 points.

In the following case, the length of the time-domain analysis window applied to the audio signal before conversion to the frequency domain in the transcoder device 200 is adjusted or switched according to the signal-to-noise ratio. When the signal-to-noise ratio is low, pre-echo noise is hard to perceive even with a long analysis window, so lengthening the window causes no problem. A longer analysis length increases the number of bits available per partial band per unit time, which improves coding efficiency; alternatively, the bits saved can be spent in another unit of time (frame) to improve sound quality.

FIG. 13 is a variation of the transcoder device 200 of FIG. 2 and shows another example of its schematic configuration.

In this case, the transcoder device 200 includes, for example, an acoustic encoding unit 232 instead of the acoustic encoding unit 230 of FIG. 2. Like the acoustic encoding unit 230 of FIG. 2, the acoustic encoding unit 232 includes a signal conversion unit 236, a masking threshold generation unit 238, a masking threshold correction unit 239, an encoding determination unit 240, a level correction unit 242, a masking correction unit 244, a quantization unit 246, and a multiplexing unit 248. In addition, the acoustic encoding unit 232 includes a block switching control unit 232 and a block switching unit 234.

The block switching control unit 232 may receive the signal-to-noise ratio of each partial band from the information input unit 210, or may calculate it from the noise power received from the information input unit 210 and the audio signal power from the signal conversion unit 236. Here, the signal-to-noise ratio is that of the audio signal reproduced at the portable information terminal 10.

According to the signal-to-noise (SN) ratio at the portable information terminal 10, the block switching control unit 232 causes the block switching unit 234 to switch the block length of the input data supplied from the acoustic decoding unit 226 to the signal conversion unit 236. For example, when the SN ratio is equal to or less than a threshold SN_th in x or more partial bands (SN_ratio(sfb) ≤ SN_th), the block switching control unit 232 forces the block length to be long (for example, 1024 points). As a result, when the SN ratio is below SN_th and the audio signal is hard to hear, the acoustic encoding unit 232 can use more bits of encoded audio data per unit time than it could with a short block. More bits can therefore be assigned to each partial band, in particular to partial bands of the audio signal that contribute to an effective improvement in perceived sound quality.

On the other hand, when the signal-to-noise ratio SN_ratio(sfb) is higher than the threshold SN_th (>), the signal conversion unit 236 converts the input data, as usual, into long-block or short-block frequency-domain data according to the characteristics of the input signal. In this case, for example, the window length may be 2048, a long block may be 1024 points, and a short block may be 128 points. The signal conversion unit 236 converts a signal that changes sharply in the time domain, such as an attack sound, into short-block frequency-domain data. This improves the temporal resolution and suppresses pre-echo.
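The block-length override described above can be sketched in Python. The names forced_block_length and LONG_BLOCK are hypothetical; returning None stands for "no override", leaving the usual transient-based long/short decision of the signal conversion unit 236 in place (that decision itself is not modelled here).

```python
from typing import Optional, Sequence

LONG_BLOCK = 1024  # long block length in points (example value from the text)


def forced_block_length(sn_ratios: Sequence[float], sn_th: float,
                        x: int) -> Optional[int]:
    """Return LONG_BLOCK when at least x partial bands have SN_ratio <= sn_th.

    None means no override: the signal conversion unit keeps its usual
    long/short (1024/128-point) decision based on the input signal.
    """
    low_sn_bands = sum(1 for sn in sn_ratios if sn <= sn_th)
    return LONG_BLOCK if low_sn_bands >= x else None


print(forced_block_length([0.5, 0.8, 2.0], sn_th=1.0, x=2))  # 1024
print(forced_block_length([0.5, 3.0, 2.0], sn_th=1.0, x=2))  # None
```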

FIGS. 14A and 14B show an example of the processing, executed by the acoustic encoding unit 232, for perceptually encoding the decoded audio stream data according to the noise power and signal-to-noise ratio at the portable information terminal 10. The flowchart of FIGS. 14A and 14B is a variation of the flowchart of FIGS. 6A and 6B.

Referring to FIG. 14A, in step 602, the block switching control unit 232 determines, for each partial band, whether the signal-to-noise ratio at the portable information terminal 10 is greater than the threshold (SN_ratio(sfb) > SN_th). In step 604, based on these per-band results, the block switching control unit 232 forces the block to be long when the signal-to-noise ratio is equal to or less than the threshold in x or more partial bands, x being a count threshold. Otherwise, the block switching control unit 232 lets the signal conversion unit 236 control the block length in the usual way.

Steps 606 to 612 and steps 620 to 644 are the same as those in FIGS. 6A and 6B.

In the embodiment described above, the partial-band powers 332, 334, 338, and 340 of the audio signal are corrected by the level correction unit 242 of the acoustic encoding units 230 and 232. Alternatively, the acoustic encoding units 230 and 232 may encode the audio signal without amplifying its power, and the portable information terminal 10 may, in response to the user's playback operation, amplify the received and decoded audio data so that it exceeds the ambient noise power level. In this case, however, the quantization error tends to increase.

Also, in the embodiment described above, the masking threshold 302 is corrected by the masking threshold correction unit 239 of the acoustic encoding units 230 and 232 into the masking threshold 306 and then into the masking threshold 310, and the masking threshold 310 is used by the masking correction unit 244. Alternatively, the encoding determination unit 240 may select the powers 350 and 351 of each partial band whose difference from the masking threshold 302 is larger than a certain gain threshold, and the level correction unit 242 may level-correct the powers 350 and 351 so that they exceed the noise power level 306. In this case, for quantization in the quantization unit 246, the masking correction unit 244 may correct the masking threshold 306 instead of the corrected masking threshold 310, and the quantization unit 246 may quantize the level-corrected power of each partial band according to the threshold corrected by the masking correction unit 244.

Thus, according to the embodiments, the acoustic encoding units 230 and 232 perform perceptual encoding by selecting and boosting the power of partial bands that contribute to maintaining or improving perceived sound quality within the allowable range of the transmission rate, which improves the perceived quality of the audio signal reproduced at the portable information terminal 10. In addition, because the acoustic encoding units 230 and 232 selectively encode information that is effective for securing perceived reproduction quality according to the ambient noise at the portable information terminal 10, the amount of transmitted information can be reduced in a way that favors effective reproduction quality.

All examples and conditional language recited herein are intended to help the reader understand the invention and the concepts contributed by the inventors to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions; nor does the organization of such examples in the specification relate to a showing of the superiority or inferiority of the invention. Although embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.

The following supplementary notes are further disclosed with respect to the embodiments, including the above examples.
(Supplementary note 1) An audio encoding device that encodes an audio signal using a masking threshold based on auditory characteristics so as to reduce the amount of information, the audio encoding device comprising:
a first processing unit that corrects the masking threshold according to an ambient noise level received from a terminal and, in doing so, corrects at least one of the masking threshold and an input signal so that an input signal whose difference from the masking threshold is larger than a certain threshold becomes larger than the corrected masking threshold; and
a second processing unit that encodes the input signal that, after the at least one correction has been performed, is larger than the corrected masking threshold.
(Supplementary note 2) The audio encoding device according to supplementary note 1, wherein the first processing unit corrects the masking threshold according to a signal-to-noise ratio at the terminal.
(Supplementary note 3) The audio encoding device according to supplementary note 1 or 2, wherein the first processing unit further corrects the masking threshold according to the transmission rate of the encoded audio signal to the terminal.
(Supplementary note 4) The audio encoding device according to any one of supplementary notes 1 to 3, wherein the input signal is larger than the masking threshold and the gain difference between the input signal and the masking threshold is larger than the certain threshold.
(Supplementary note 5) The audio encoding device according to any one of supplementary notes 1 to 4, wherein, when the gain difference between a given input signal larger than the masking threshold and another input signal in a partial band adjacent to the partial band of the given input signal is larger than a certain gain threshold, the first processing unit corrects the masking threshold so that it becomes smaller than the given input signal.
(Supplementary note 6) The audio encoding device according to any one of supplementary notes 1 to 5, wherein the second processing unit further changes the block length of the input signal encoded by the second processing unit according to the signal-to-noise ratio at the terminal.
(Supplementary note 7) The audio encoding device according to any one of supplementary notes 1 to 6, wherein the at least one correction is performed for each partial band of the input signal.
(Supplementary note 8) A method of encoding an audio signal using a masking threshold based on auditory characteristics so as to reduce the amount of information, the method comprising an information processing apparatus executing a process of:
correcting the masking threshold according to an ambient noise level received from a terminal and, in doing so, correcting at least one of the masking threshold and an input signal so that an input signal whose difference from the masking threshold is larger than a certain threshold becomes larger than the corrected masking threshold; and
encoding the input signal that, after the at least one correction has been performed, is larger than the corrected masking threshold.
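The first and second processing units of supplementary note 1, combined with the adjacent-band rule of supplementary note 5, can be illustrated with a short sketch in which the threshold (rather than the signal) is corrected. It assumes per-band powers in dB, a single scalar ambient noise level reported by the terminal, and that "correcting" means lifting the threshold to the noise level and then pulling it back a fixed margin below each protected band. The notes do not specify these details; the function names, parameters and margin are hypothetical.

```python
from typing import List


def correct_threshold(
    power_db: List[float],
    mask_db: List[float],
    noise_db: float,
    salience_db: float,
    neighbor_gap_db: float,
    margin_db: float = 1.0,
) -> List[float]:
    """Hypothetical first processing unit: adapt the threshold to terminal noise
    while keeping salient or peaky bands above it."""
    # Content below the terminal's ambient noise is assumed inaudible: lift the threshold.
    corrected = [max(m, noise_db) for m in mask_db]
    n = len(power_db)
    for k, p in enumerate(power_db):
        if p <= mask_db[k]:
            continue  # below the original masking threshold, inaudible anyway
        # Notes 1 and 4: exceeds the original threshold by more than salience_db.
        salient = p - mask_db[k] > salience_db
        # Note 5: exceeds an adjacent partial band by more than neighbor_gap_db.
        neighbors = [power_db[j] for j in (k - 1, k + 1) if 0 <= j < n]
        peak = any(p - q > neighbor_gap_db for q in neighbors)
        if (salient or peak) and corrected[k] >= p:
            # Pull the corrected threshold back just below this band.
            corrected[k] = p - margin_db
    return corrected


def bands_to_encode(power_db: List[float], corrected_db: List[float]) -> List[int]:
    """Hypothetical second processing unit input: bands above the corrected threshold."""
    return [k for k, (p, t) in enumerate(zip(power_db, corrected_db)) if p > t]
```

With band powers [60, 40, 55, 30, 52] dB, masking thresholds [45, 38, 40, 35, 50] dB and a 58 dB noise level, band 2 (15 dB above its mask) and band 4 (22 dB above its neighbour) receive thresholds of 54 dB and 51 dB and are encoded together with band 0, while bands 1 and 3 are dropped.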

DESCRIPTION OF SYMBOLS
10 Mobile communication terminal
20 Information processing apparatus
200 Transcoder apparatus
210 Information input unit
226 Acoustic decoding unit
230, 232 Acoustic encoding unit
232 Block switching control unit
234 Block switching unit
236 Signal conversion unit
238 Masking threshold
239 Masking threshold correction unit
240 Encoding determination unit
242 Level correction unit
244 Masking correction unit
246 Quantization unit
256 Image decoding unit
260 Image encoding unit
270 Multiplexing unit
280 Stream output unit

Claims

An audio encoding device that encodes an audio signal using a masking threshold based on auditory characteristics so as to reduce the amount of information, the audio encoding device comprising:
a first processing unit that corrects the masking threshold according to an ambient noise level received from a terminal and, in doing so, corrects at least one of the masking threshold and an input signal so that an input signal whose difference from the masking threshold is larger than a certain threshold becomes larger than the corrected masking threshold; and
a second processing unit that encodes the input signal that, after the at least one correction has been performed, is larger than the corrected masking threshold.

The audio encoding device according to claim 1, wherein the first processing unit corrects the masking threshold according to a signal-to-noise ratio at the terminal.

The audio encoding device according to claim 1 or 2, wherein the first processing unit further corrects the masking threshold according to the transmission rate of the encoded audio signal to the terminal.

The audio encoding device according to any one of claims 1 to 3, wherein the input signal is larger than the masking threshold and the gain difference between the input signal and the masking threshold is larger than the certain threshold.

The audio encoding device according to any one of claims 1 to 4, wherein, when the gain difference between a given input signal larger than the masking threshold and another input signal in a partial band adjacent to the partial band of the given input signal is larger than a certain gain threshold, the first processing unit corrects the masking threshold so that it becomes smaller than the given input signal.

The audio encoding device according to any one of claims 1 to 5, wherein the second processing unit further changes the block length of the input signal encoded by the second processing unit according to the signal-to-noise ratio at the terminal.

A method of encoding an audio signal using a masking threshold based on auditory characteristics so as to reduce the amount of information, the method comprising an information processing apparatus executing a process of:
correcting the masking threshold according to an ambient noise level received from a terminal and, in doing so, correcting at least one of the masking threshold and an input signal so that an input signal whose difference from the masking threshold is larger than a certain threshold becomes larger than the corrected masking threshold; and
encoding the input signal that, after the at least one correction has been performed, is larger than the corrected masking threshold.