JP5326714B2

JP5326714B2 - Band expanding apparatus, method and program, and quantization noise learning apparatus, method and program

Info

Publication number: JP5326714B2
Application number: JP2009070833A
Authority: JP
Inventors: 弘美青柳
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2009-03-23
Filing date: 2009-03-23
Publication date: 2013-10-30
Anticipated expiration: 2029-03-23
Also published as: JP2010224180A

Description

The present invention relates to a band extension apparatus, method and program, and to a quantization noise learning apparatus, method and program. It is applicable, for example, when the band of a band-limited speech signal is extended by generating and adding a signal above the upper limit of that band.

Telephony, the form of voice communication in widest use today, is limited in the audio frequencies it can transmit. Specifically, only speech signals from 300 Hz to 3.4 kHz can be transmitted, so call quality is not sufficient and intelligibility also suffers.

To address this problem, attempts have been made to extend the band of a band-limited speech signal and thereby improve speech quality and intelligibility (see Patent Document 1 and Patent Document 2). In the conventional extension methods, as shown in FIG. 5, band extension is achieved by generating, from the band-limited speech signal, a signal in a band beyond the original band and adding it to the original signal.

Patent Document 1: JP-A-9-90992
Patent Document 2: JP-A-8-123495

With the conventional speech band extension methods, a relatively good extended speech signal is obtained when the input speech signal is clean (for example, a signal immediately after analog-to-digital conversion).

However, when quantization noise from encoding and decoding is superimposed on the input speech signal, for example, that noise affects the generated extension band, and the resulting degradation of the extended speech signal cannot be ignored. Band extension is meant to improve sound quality; if the quality of the extension band is degraded, the extension loses its purpose.

There is therefore a need for a band extension apparatus, method and program that can obtain a high-quality band-extended signal even when the input signal contains quantization noise from encoding and decoding, and for new techniques applicable to such a band extension apparatus, method and program.

To solve this problem, a first aspect of the present invention is a band extension apparatus that extends the band of a band extension target signal output from a decoding circuit, comprising: (1) a band extension circuit body that performs band extension; (2) noise frequency characteristic storage means that stores information, formed by learning with a learning signal, on the frequency characteristics of the quantization noise specific to the coding scheme of the decoding circuit; and (3) quantization noise reduction means that reduces the quantization noise component mixed into the band extension target signal output from the decoding circuit, based on the quantization noise frequency characteristic information stored in the noise frequency characteristic storage means, and inputs the result to the band extension circuit body.

A second aspect of the present invention is a band extension method for extending the band of a band extension target signal output from a decoding circuit, wherein: (1) a band extension circuit body performs band extension; (2) noise frequency characteristic storage means stores information, formed by learning with a learning signal, on the frequency characteristics of the quantization noise specific to the coding scheme of the decoding circuit; and (3) quantization noise reduction means reduces the quantization noise component mixed into the band extension target signal output from the decoding circuit, based on the quantization noise frequency characteristic information stored in the noise frequency characteristic storage means, and inputs the result to the band extension circuit body.

A third aspect of the present invention is a band extension program for extending the band of a band extension target signal obtained by decoding processing, the program causing a computer to function as: (1) a band extension circuit body that performs band extension; (2) noise frequency characteristic storage means that stores information, formed by learning with a learning signal, on the frequency characteristics of the quantization noise specific to the coding scheme of the decoding processing; and (3) quantization noise reduction means that reduces the quantization noise component mixed into the band extension target signal obtained by the decoding processing, based on the quantization noise frequency characteristic information stored in the noise frequency characteristic storage means, and inputs the result to the band extension circuit body.

A fourth aspect of the present invention is a quantization noise learning apparatus that forms the quantization noise frequency characteristic information used by a band extension apparatus, the band extension apparatus having: a band extension circuit body that performs band extension on a band extension target signal output from a decoding circuit; noise frequency characteristic storage means that stores information on the frequency characteristics of the quantization noise specific to the coding scheme of the decoding circuit; and quantization noise reduction means that reduces the quantization noise component mixed into the band extension target signal output from the decoding circuit, based on the information stored in the noise frequency characteristic storage means, and inputs the result to the band extension circuit body. The quantization noise learning apparatus comprises: (1) learning signal generation means that outputs a learning signal; (2) quantization-noise-mixed learning signal forming means that encodes the learning signal according to the coding scheme and immediately decodes it; (3) subtraction means that subtracts the learning signal output from the quantization-noise-mixed learning signal forming means from the learning signal output from the learning signal generation means to obtain a quantization noise signal; and (4) noise learning means that analyzes the frequency characteristics of the obtained quantization noise signal and forms the information to be stored in the noise frequency characteristic storage means.

A fifth aspect of the present invention is a quantization noise learning method that forms the quantization noise frequency characteristic information used by a band extension apparatus, the band extension apparatus having: a band extension circuit body that performs band extension on a band extension target signal output from a decoding circuit; noise frequency characteristic storage means that stores information on the frequency characteristics of the quantization noise specific to the coding scheme of the decoding circuit; and quantization noise reduction means that reduces the quantization noise component mixed into the band extension target signal output from the decoding circuit, based on the information stored in the noise frequency characteristic storage means, and inputs the result to the band extension circuit body. In the method: (1) learning signal generation means outputs a learning signal; (2) quantization-noise-mixed learning signal forming means encodes the learning signal according to the coding scheme and immediately decodes it; (3) subtraction means subtracts the learning signal output from the quantization-noise-mixed learning signal forming means from the learning signal output from the learning signal generation means to obtain a quantization noise signal; and (4) noise learning means analyzes the frequency characteristics of the obtained quantization noise signal and forms the information to be stored in the noise frequency characteristic storage means.

A sixth aspect of the present invention is a quantization noise learning program that forms the quantization noise frequency characteristic information used by a band extension apparatus, the band extension apparatus having: a band extension circuit body that performs band extension on a band extension target signal output from a decoding circuit; noise frequency characteristic storage means that stores information on the frequency characteristics of the quantization noise specific to the coding scheme of the decoding circuit; and quantization noise reduction means that reduces the quantization noise component mixed into the band extension target signal output from the decoding circuit, based on the information stored in the noise frequency characteristic storage means, and inputs the result to the band extension circuit body. The program causes a computer to function as: (1) learning signal generation means that outputs a learning signal; (2) quantization-noise-mixed learning signal forming means that encodes the learning signal according to the coding scheme and immediately decodes it; (3) subtraction means that subtracts the learning signal output from the quantization-noise-mixed learning signal forming means from the learning signal output from the learning signal generation means to obtain a quantization noise signal; and (4) noise learning means that analyzes the frequency characteristics of the obtained quantization noise signal and forms the information to be stored in the noise frequency characteristic storage means.

According to the present invention, a high-quality band-extended signal can be obtained even when the extension target signal contains quantization noise from encoding and decoding.

FIG. 1 is a block diagram showing the overall configuration of the band extension apparatus according to the first embodiment.
FIG. 2 is a block diagram showing an example of the internal configuration of the codec noise reduction circuit in the first embodiment.
FIG. 3 is a block diagram showing the overall configuration of the band extension apparatus according to the second embodiment.
FIG. 4 is an explanatory diagram showing the configuration of the learning parameter storage unit in the second embodiment.
FIG. 5 is an explanatory diagram of band extension.

(A) First Embodiment
A first embodiment of the band extension apparatus, method and program, and of the quantization noise learning apparatus, method and program, according to the present invention is described in detail below with reference to the drawings. The band extension apparatus and quantization noise learning apparatus of the first embodiment are used in devices that handle speech signals (for example, telephone terminals and softphones). The first embodiment covers the case where the apparatus supports only one type of codec.

The first embodiment is characterized in that the characteristics of the quantization noise produced by encoding and decoding are acquired in advance, and the speech signal obtained by decoding is subjected to noise reduction according to these learned characteristics before the extension band portion is generated.

(A-1) Configuration of the First Embodiment
FIG. 1 is a block diagram showing the overall configuration of the band extension apparatus according to the first embodiment. Even when, for example, the device carrying the band extension apparatus of the first embodiment is a softphone and the apparatus is implemented by a CPU and a program that the CPU executes, it can still be represented functionally by the block diagram of FIG. 1.

In FIG. 1, the band extension apparatus 100 of the first embodiment broadly consists of a quantization noise learning unit 110, a learning parameter storage unit 120, and a band extension unit 130. The decoding circuit 101, which supplies decoded speech data to the band extension unit 130, represents the circuit that decodes encoded speech data in the device carrying the band extension apparatus 100 of the first embodiment.

The quantization noise learning unit 110 obtains the learning parameters that the band extension unit 130 uses in band extension processing and stores them in the learning parameter storage unit 120.

The quantization noise learning unit 110 has a learning data generation unit 111, an encoding circuit 112, a decoding circuit 113, a subtractor 114, and a codec noise learning circuit 115.

The learning data generation unit 111 generates and outputs a speech signal for learning (a digital signal, hereinafter called learning speech data). It may, for example, produce learning speech data by reading from a recording medium on which speech data is recorded. Alternatively, any speech signal captured by a microphone and converted from analog to digital may be used as the learning speech signal.

The encoding circuit 112 encodes the learning speech data according to the coding scheme (codec type) adopted by the device carrying the band extension apparatus 100 of the first embodiment. The decoding circuit 113 decodes the encoded learning speech data output from the encoding circuit 112. The learning speech data input to the encoding circuit 112 and the learning speech data output from the decoding circuit 113 differ by exactly the quantization noise introduced by encoding and decoding (hereinafter called codec noise where appropriate).

The subtractor 114 subtracts the learning speech data output from the decoding circuit 113 from the learning speech data input to the encoding circuit 112, yielding codec noise data that consists of the quantization noise component introduced by encoding and decoding.
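The encode, decode and subtract path can be illustrated with a small sketch. The source does not name a codec, so the mu-law style 8-bit quantizer below is only an assumed stand-in for the encoding circuit 112 and decoding circuit 113; the function names and the 440 Hz test tone are likewise hypothetical.

```python
import math

MU = 255.0  # companding constant of the toy codec (an assumption, not from the source)


def encode(samples):
    """Toy encoder: mu-law compand samples in [-1, 1] and quantize to 8-bit codes."""
    codes = []
    for x in samples:
        y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
        codes.append(round((y + 1.0) / 2.0 * 255))
    return codes


def decode(codes):
    """Toy decoder: map 8-bit codes back to samples by inverting the companding."""
    samples = []
    for c in codes:
        y = c / 255 * 2.0 - 1.0
        samples.append(math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y))
    return samples


def codec_noise(learning):
    """Role of subtractor 114: original minus encoded-then-decoded signal."""
    decoded = decode(encode(learning))
    return [x - d for x, d in zip(learning, decoded)]


# Hypothetical learning signal: 20 ms of a 440 Hz tone sampled at 8 kHz.
learning = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(160)]
noise = codec_noise(learning)
```

Because the same input feeds both the encoder and the subtractor, whatever remains in `noise` is attributable to the codec round trip alone.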

The codec noise learning circuit 115 obtains information representing the frequency characteristics of the codec noise data (the learning parameters) and stores it in the learning parameter storage unit 120. Any parameter that reflects the long-term average of the codec noise data will do. For example, the codec noise learning circuit 115 divides the codec noise data into frames of a fixed duration, applies an FFT to each frame, and stores the average of the per-frame FFT results as the learning parameters. Alternatively, several sets of learning speech data may be prepared, and the learning parameters obtained from each set averaged to produce the learning parameters stored in the learning parameter storage unit 120.
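A minimal sketch of the frame, FFT and average procedure follows. The source says only that per-frame FFT results are averaged; averaging the magnitude of each bin, the frame length of 64, non-overlapping frames and the function names are all assumptions made for illustration.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def learn_noise_spectrum(noise, frame_len=64):
    """Average the per-frame magnitude spectra of a codec noise signal."""
    frames = [noise[i:i + frame_len]
              for i in range(0, len(noise) - frame_len + 1, frame_len)]
    if not frames:
        raise ValueError("noise signal is shorter than one frame")
    acc = [0.0] * frame_len
    for frame in frames:
        for k, value in enumerate(fft(frame)):
            acc[k] += abs(value)  # assumption: magnitudes are what gets averaged
    return [a / len(frames) for a in acc]
```

Averaging over many frames is what makes the result reflect the long-term behaviour of the noise and not one frame's content; results from several learning signals could be averaged in the same way.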

The band extension unit 130 receives the speech data that the decoding circuit 101 obtains by decoding during voice communication, and extends the band of that speech data. Codec noise is mixed into the speech data output from the decoding circuit 101 as a result of encoding by the encoding circuit (not shown) of the opposing device and decoding by the decoding circuit 101. The encoding circuit 112 of the quantization noise learning unit 110 described above is equivalent to the encoding circuit (not shown) of the opposing device, and the decoding circuit 113 of the quantization noise learning unit 110 is equivalent to the decoding circuit 101.

The band extension unit 130 has a codec noise reduction circuit 131, a band extension circuit 132, a high-pass circuit 133, an upsampling circuit 134, and an adder 135.

The codec noise reduction circuit 131 reduces the codec noise in the speech data output from the decoding circuit 101, based on the learning parameters stored in the learning parameter storage unit 120. Frequency subtraction, for example, can be used to reduce the codec noise.

FIG. 2 is a block diagram showing an example of the internal configuration of the codec noise reduction circuit 131. In FIG. 2, the codec noise reduction circuit 131 has an FFT processing circuit 200, a frequency subtraction circuit 201, and an inverse FFT processing circuit 202.

The FFT processing circuit 200 applies an FFT to the speech data output from the decoding circuit 101. The frequency subtraction circuit 201 subtracts, frequency by frequency, the learning parameters stored in the learning parameter storage unit 120 (the FFT result for the codec noise) from the FFT result for the speech data. The inverse FFT processing circuit 202 applies an inverse FFT to the FFT data output from the frequency subtraction circuit 201.
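The three stages of FIG. 2 can be sketched for a single frame as below. The source does not say how the subtraction treats phase or negative results; this sketch assumes the common spectral subtraction choice of subtracting magnitudes, keeping the original phase, and clamping at zero. The function names are hypothetical.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(spec):
    """Inverse FFT via the conjugate identity ifft(X) = conj(fft(conj(X))) / n."""
    n = len(spec)
    return [v.conjugate() / n for v in fft([s.conjugate() for s in spec])]


def reduce_codec_noise(frame, noise_mag):
    """FFT (200), per-bin magnitude subtraction (201), inverse FFT (202)."""
    if len(frame) != len(noise_mag):
        raise ValueError("frame and learned spectrum must have the same length")
    cleaned = []
    for value, learned in zip(fft(frame), noise_mag):
        mag = max(abs(value) - learned, 0.0)  # assumption: clamp at zero
        cleaned.append(cmath.rect(mag, cmath.phase(value)))  # keep original phase
    return [v.real for v in ifft(cleaned)]
```

With an all-zero learned spectrum the frame passes through unchanged, which is a convenient sanity check before real learning parameters are plugged in.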

The band extension circuit 132 performs band extension processing on the speech data output from the codec noise reduction circuit 131. Any existing band extension method may be used. If the band extension circuit 132 is internally divided into a processing section for the extension band and a processing section for the existing band, it suffices to supply the codec-noise-reduced speech data at least to the former.

The high-pass circuit 133 extracts the frequency components of the extended portion from the band-extended speech data. FIG. 1 shows the case where an existing circuit that outputs wideband speech data containing both the existing band and the extension band is used as the band extension circuit 132, which is why the high-pass circuit 133 is provided. If a circuit that forms only the extension band portion is used in place of the band extension circuit 132, the high-pass circuit 133 can be omitted.

The upsampling circuit 134 raises the sampling rate of the speech data output from the decoding circuit 101 to match the desired wide band.

The adder 135 adds the upsampled existing-band speech data and the high-band speech data from the high-pass circuit 133 to obtain the band-extended signal.

(A-2) Operation of the First Embodiment
Next, the operation of the band extension apparatus 100 of the first embodiment is described, first the quantization noise learning operation (quantization noise learning method) and then the band extension operation (band extension method).

In the codec noise (quantization noise) learning mode, the learning data generation unit 111 outputs learning speech data, which is supplied to the encoding circuit 112 and the subtractor 114.

The learning speech data is encoded by the encoding circuit 112, immediately decoded by the decoding circuit 113, and supplied to the subtraction input terminal of the subtractor 114.

The learning speech data input to the encoding circuit 112 and the learning speech data output from the decoding circuit 113 differ only by the codec noise, so the subtraction in the subtractor 114 extracts the codec noise data. The codec noise learning circuit 115 obtains information representing the frequency characteristics of the extracted codec noise data (the learning parameters), and this information is stored in the learning parameter storage unit 120.

In the band extension mode for speech data, the speech data that the decoding circuit 101 obtains by decoding is supplied to the codec noise reduction circuit 131, which reduces the codec noise mixed into the speech data using the learning parameters stored in the learning parameter storage unit 120.

The band of the codec-noise-reduced speech data is extended by the band extension circuit 132, after which the high-pass circuit 133 extracts only the frequency components of the desired extended portion and supplies them to the adder 135. Meanwhile, the speech data that the decoding circuit 101 obtained by decoding has its sampling rate raised by the upsampling circuit 134 and is also supplied to the adder 135. The adder 135 then adds the upsampled existing-band speech data and the high-band speech data from the high-pass circuit 133, and the resulting band-extended signal is output.
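The signal flow of the band extension mode can be sketched as a function that wires the stages together. The source leaves the extension and filtering methods open, so every stage is passed in as a caller-supplied callable; `band_extend`, `hold` and the placeholder lambdas are hypothetical and only show the routing, not real signal processing.

```python
def band_extend(decoded, reduce_noise, extend, high_pass, upsample):
    """Route one block of decoded speech through the stages of band extension unit 130.

    All four callables are hypothetical stand-ins supplied by the caller.
    """
    denoised = reduce_noise(decoded)   # codec noise reduction circuit 131
    wideband = extend(denoised)        # band extension circuit 132
    high = high_pass(wideband)         # high-pass circuit 133
    low = upsample(decoded)            # upsampling circuit 134, fed from the decoder output
    if len(low) != len(high):
        raise ValueError("both paths must share one sampling rate and length")
    return [a + b for a, b in zip(low, high)]  # adder 135


def hold(samples):
    """Toy 2x upsampler: repeat every sample (zero-order hold)."""
    return [s for s in samples for _ in range(2)]


result = band_extend(
    decoded=[1.0, 2.0],
    reduce_noise=lambda x: x,             # no-op placeholder
    extend=hold,                          # placeholder wideband output
    high_pass=lambda x: [0.0] * len(x),   # placeholder that passes nothing
    upsample=hold,
)
```

Note that only the extension path sees the noise-reduced data here; the existing band is taken straight from the decoder output, matching the routing described above.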

(A-3) Effects of the First Embodiment
According to the first embodiment, the band extension component is formed after the quantization noise from encoding and decoding (codec noise) has been removed, so codec noise in the extension band becomes hard to perceive, and a band-extended signal with high speech quality and good intelligibility can be obtained.

Incidentally, codec noise has conventionally been made hard to perceive by masking that takes auditory characteristics into account. This works for the existing band, but accurate auditory characteristics are not available for the generated extension band, so effective masking could not be applied there and the codec noise was perceived.

(B) Second Embodiment
Next, a second embodiment of the band extension apparatus, method and program, and of the quantization noise learning apparatus, method and program, according to the present invention is described in detail with reference to the drawings. The second embodiment covers the case where the apparatus supports two or more types of codec.

FIG. 3 is a block diagram showing the overall configuration of the band extension apparatus according to the second embodiment. Parts identical or corresponding to those in FIG. 1 of the first embodiment are given identical or corresponding reference numerals.

In FIG. 3, the band extension apparatus 100A of the second embodiment broadly consists of a quantization noise learning unit 110A, a learning parameter storage unit 120A, and a band extension unit 130A.

The quantization noise learning unit 110A of the second embodiment has a learning data generation unit 111, an encoding circuit 112A, a decoding circuit 113A, a subtractor 114, a codec noise learning circuit 115, and a learning codec type setting unit 116.

In the second embodiment, the encoding circuit 112A and decoding circuit 113A are each variably configurable so that they can encode and decode according to the codec type (coding scheme) specified by the learning codec type setting unit 116. If the sampling rate of the learning data also differs by codec type, the learning data generation unit 111 must likewise be variably configurable so that it outputs learning data suited to the codec type specified by the learning codec type setting unit 116.

The learning codec type setting unit 116, newly provided in the second embodiment, controls the other units so that each codec type the apparatus supports is set in turn in the encoding circuit 112A and decoding circuit 113A, the learning parameters for each codec type are formed in turn, and those parameters are stored in the learning parameter storage unit 120A.

FIG. 4 is an explanatory diagram showing the configuration of the learning parameter storage unit 120A. The learning parameter storage unit 120A of the second embodiment stores learning parameters separately for each codec type.
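One way to picture the per-codec learning loop and the resulting store is a mapping from codec type to learned parameters. The sketch below is an assumption-laden illustration: the codec names, the rounding-based toy codecs, and the mean-absolute-error "learner" are all invented here to keep the example short, and they stand in for real codecs and for the frequency analysis of the codec noise learning circuit 115.

```python
def build_parameter_store(codecs, learning_signal, learn):
    """Learn one parameter set per codec type, as unit 116 is described as arranging.

    codecs maps a codec name to an (encode, decode) pair; learn turns a noise
    signal into learning parameters. All of these are hypothetical stand-ins.
    """
    store = {}
    for name, (encode, decode) in codecs.items():
        decoded = decode(encode(learning_signal))
        noise = [x - d for x, d in zip(learning_signal, decoded)]
        store[name] = learn(noise)
    return store


def identity(values):
    return list(values)


# Toy codecs that quantize by rounding to a fixed number of decimals.
codecs = {
    "coarse": (lambda s: [round(x, 1) for x in s], identity),
    "fine": (lambda s: [round(x, 3) for x in s], identity),
}


def mean_abs(noise):
    """Toy learner: a single number in place of a full noise spectrum."""
    return sum(abs(n) for n in noise) / len(noise)


store = build_parameter_store(codecs, [0.123, 0.456, -0.789], mean_abs)
```

Keying the store by codec type keeps each codec's noise profile separate, so learning a new codec never overwrites the parameters of another.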

The band extension unit 130A of the second embodiment has a codec noise reduction circuit 131A, a band extension circuit 132, a high-pass circuit 133, an upsampling circuit 134, an adder 135, and an applied codec type setting unit 136.

The applied codec type setting unit 136, newly provided in the second embodiment, sets in the variably configurable decoding circuit 101A the codec type that was agreed in the negotiation carried out before voice communication begins. In addition, it tells the codec noise reduction circuit 131A which codec type's learning parameters to retrieve from the learning parameter storage unit 120A. If the sampling rate of the speech data also differs by codec type, the applied codec type setting unit 136 also notifies or instructs the band extension circuit 132, high-pass circuit 133, upsampling circuit 134, adder 135 and so on of the codec type or sampling rate.
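The lookup that follows negotiation can be sketched as a keyed retrieval with an explicit failure for a codec that was never learned. The codec names, parameter values and the function name `select_parameters` are hypothetical; the source does not describe what happens when no parameters exist, so raising an error is an assumption.

```python
def select_parameters(store, negotiated_codec):
    """Return the learning parameters for the codec agreed during negotiation."""
    try:
        return store[negotiated_codec]
    except KeyError:
        # Assumption: an unlearned codec is treated as a configuration error.
        raise KeyError(
            f"no learning parameters stored for codec {negotiated_codec!r}"
        ) from None


# Hypothetical store contents keyed by codec type.
store = {"codec_a": [0.2, 0.1], "codec_b": [0.05, 0.02]}
active = select_parameters(store, "codec_b")
```

The selected parameters would then be handed to the noise reduction stage for the duration of the call.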

As described above, the second embodiment is configured to handle multiple codec types. Once the codec type to be learned has been set, the learning operation is the same as in the first embodiment, and once the codec type used for communication has been determined, the band extension operation is likewise the same as in the first embodiment. A detailed description of the operation is therefore omitted.

According to the second embodiment, the apparatus supports multiple codec types, and the effects of the first embodiment described above are obtained whichever codec is selected for communication.

(C) Other Embodiments
In each of the above embodiments, a single apparatus includes the quantization noise learning unit in addition to the learning parameter storage unit and the band extension unit. However, the apparatus may instead be built with only the learning parameter storage unit and the band extension unit, without a quantization noise learning unit. In that case, learning parameters obtained by an external device are set in the learning parameter storage unit.

In each of the above embodiments, the input data to the upsampling circuit is taken from the input side of the codec noise reduction circuit, but it may instead be taken from the output side of the codec noise reduction circuit.
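
The two tap points for the upsampling circuit can be compared with a small wiring sketch. The signal flow assumed here (noise reduction, then band extension, then high-pass filtering, with the upsampled low band added at the adder) is inferred from the list of components and is an assumption of this sketch. The toy stage functions are hypothetical placeholders that only make the wiring executable; they do not implement real filters.

```python
def run_band_extension(decoded, reduce_noise, band_extend, high_pass, upsample,
                       tap="input"):
    """Wire the stages together; `tap` selects where the upsampler reads from."""
    if tap not in ("input", "output"):
        raise ValueError("tap must be 'input' or 'output'")
    cleaned = reduce_noise(decoded)
    high_band = high_pass(band_extend(cleaned))
    # "input": upsample the decoder output as-is (the embodiments as described).
    # "output": upsample the noise-reduced signal (the variation described here).
    low_band = upsample(decoded if tap == "input" else cleaned)
    if len(low_band) != len(high_band):
        raise ValueError("low and high bands must have the same length")
    return [lo + hi for lo, hi in zip(low_band, high_band)]


# Toy placeholder stages (hypothetical, for illustration only).
def toy_reduce_noise(x):
    return [0.5 * s for s in x]


def toy_band_extend(x):
    return [v for s in x for v in (s, -s)]  # doubles the sample count


def toy_high_pass(x):
    return list(x)


def toy_upsample(x):
    return [v for s in x for v in (s, s)]  # doubles the sample count


decoded = [1.0, 2.0]
from_input = run_band_extension(decoded, toy_reduce_noise, toy_band_extend,
                                toy_high_pass, toy_upsample, tap="input")
from_output = run_band_extension(decoded, toy_reduce_noise, toy_band_extend,
                                 toy_high_pass, toy_upsample, tap="output")
```

With the output-side tap, the low band that reaches the adder has also passed through noise reduction, which is the only difference between the two results.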

In each of the above embodiments, the band extension target is speech data, but the present invention can also be applied when the band extension target is acoustic data.

100, 100A ... band extension device,
110, 110A ... quantization noise learning unit,
111 ... learning data generation unit, 112, 112A ... encoding circuit,
113, 113A ... decoding circuit, 114 ... subtractor,
115 ... codec noise learning circuit, 116 ... learning codec type setting unit,
120, 120A ... learning parameter storage unit,
130, 130A ... band extension unit,
131, 131A ... codec noise reduction circuit, 132 ... band extension circuit,
133 ... high-pass circuit, 134 ... upsampling circuit,
135 ... adder, 136 ... applied codec type setting unit.

Claims

A band extension device that extends the band of a band extension target signal output from a decoding circuit, comprising:
a band extension circuit body that performs band extension;
noise frequency characteristic storage means that stores information on the frequency characteristics of quantization noise specific to the encoding scheme associated with the decoding circuit, the information being formed by learning using a learning signal; and
quantization noise reduction means that reduces the quantization noise component mixed into the band extension target signal output from the decoding circuit, based on the information on the frequency characteristics of the quantization noise stored in the noise frequency characteristic storage means, and inputs the result to the band extension circuit body.

The band extension device according to claim 1, further comprising:
learning signal generating means that outputs the learning signal;
quantization-noise-mixed learning signal forming means that encodes the learning signal according to the encoding scheme and immediately decodes it;
subtracting means that subtracts the learning signal output from the quantization-noise-mixed learning signal forming means from the learning signal output from the learning signal generating means to obtain a quantization noise signal; and
noise learning means that analyzes the frequency characteristics of the obtained quantization noise signal and forms the information to be stored in the noise frequency characteristic storage means.

A band extension method for extending the band of a band extension target signal output from a decoding circuit, wherein:
a band extension circuit body performs band extension;
noise frequency characteristic storage means stores information on the frequency characteristics of quantization noise specific to the encoding scheme associated with the decoding circuit, the information being formed by learning using a learning signal; and
quantization noise reduction means reduces the quantization noise component mixed into the band extension target signal output from the decoding circuit, based on the information on the frequency characteristics of the quantization noise stored in the noise frequency characteristic storage means, and inputs the result to the band extension circuit body.

The band extension method according to claim 3, wherein:
learning signal generating means outputs the learning signal;
quantization-noise-mixed learning signal forming means encodes the learning signal according to the encoding scheme and immediately decodes it;
subtracting means subtracts the learning signal output from the quantization-noise-mixed learning signal forming means from the learning signal output from the learning signal generating means to obtain a quantization noise signal; and
noise learning means analyzes the frequency characteristics of the obtained quantization noise signal and forms the information to be stored in the noise frequency characteristic storage means.

A band extension program for extending the band of a band extension target signal obtained by decoding processing, the program causing a computer to function as:
a band extension circuit body that performs band extension;
noise frequency characteristic storage means that stores information on the frequency characteristics of quantization noise specific to the encoding scheme associated with the decoding processing, the information being formed by learning using a learning signal; and
quantization noise reduction means that reduces the quantization noise component mixed into the band extension target signal obtained by the decoding processing, based on the information on the frequency characteristics of the quantization noise stored in the noise frequency characteristic storage means, and inputs the result to the band extension circuit body.

The band extension program according to claim 5, further causing the computer to function as:
learning signal generating means that outputs the learning signal;
quantization-noise-mixed learning signal forming means that encodes the learning signal according to the encoding scheme and immediately decodes it;
subtracting means that subtracts the learning signal output from the quantization-noise-mixed learning signal forming means from the learning signal output from the learning signal generating means to obtain a quantization noise signal; and
noise learning means that analyzes the frequency characteristics of the obtained quantization noise signal and forms the information to be stored in the noise frequency characteristic storage means.

A quantization noise learning device that forms information on the frequency characteristics of quantization noise for use by a band extension device, the band extension device having a band extension circuit body that performs band extension on a band extension target signal output from a decoding circuit, noise frequency characteristic storage means that stores information on the frequency characteristics of quantization noise specific to the encoding scheme associated with the decoding circuit, and quantization noise reduction means that reduces the quantization noise component mixed into the band extension target signal output from the decoding circuit, based on the information stored in the noise frequency characteristic storage means, and inputs the result to the band extension circuit body, the quantization noise learning device comprising:
learning signal generating means that outputs a learning signal;
quantization-noise-mixed learning signal forming means that encodes the learning signal according to the encoding scheme and immediately decodes it;
subtracting means that subtracts the learning signal output from the quantization-noise-mixed learning signal forming means from the learning signal output from the learning signal generating means to obtain a quantization noise signal; and
noise learning means that analyzes the frequency characteristics of the obtained quantization noise signal and forms the information to be stored in the noise frequency characteristic storage means.

A quantization noise learning method for forming information on the frequency characteristics of quantization noise for use by a band extension device, the band extension device having a band extension circuit body that performs band extension on a band extension target signal output from a decoding circuit, noise frequency characteristic storage means that stores information on the frequency characteristics of quantization noise specific to the encoding scheme associated with the decoding circuit, and quantization noise reduction means that reduces the quantization noise component mixed into the band extension target signal output from the decoding circuit, based on the information stored in the noise frequency characteristic storage means, and inputs the result to the band extension circuit body, wherein:
learning signal generating means outputs a learning signal;
quantization-noise-mixed learning signal forming means encodes the learning signal according to the encoding scheme and immediately decodes it;
subtracting means subtracts the learning signal output from the quantization-noise-mixed learning signal forming means from the learning signal output from the learning signal generating means to obtain a quantization noise signal; and
noise learning means analyzes the frequency characteristics of the obtained quantization noise signal and forms the information to be stored in the noise frequency characteristic storage means.

A quantization noise learning program for forming information on the frequency characteristics of quantization noise for use by a band extension device, the band extension device having a band extension circuit body that performs band extension on a band extension target signal output from a decoding circuit, noise frequency characteristic storage means that stores information on the frequency characteristics of quantization noise specific to the encoding scheme associated with the decoding circuit, and quantization noise reduction means that reduces the quantization noise component mixed into the band extension target signal output from the decoding circuit, based on the information stored in the noise frequency characteristic storage means, and inputs the result to the band extension circuit body, the program causing a computer to function as:
learning signal generating means that outputs a learning signal;
quantization-noise-mixed learning signal forming means that encodes the learning signal according to the encoding scheme and immediately decodes it;
subtracting means that subtracts the learning signal output from the quantization-noise-mixed learning signal forming means from the learning signal output from the learning signal generating means to obtain a quantization noise signal; and
noise learning means that analyzes the frequency characteristics of the obtained quantization noise signal and forms the information to be stored in the noise frequency characteristic storage means.