JP2003316394A

JP2003316394A - System, method, and program for decoding sound

Info

Publication number: JP2003316394A
Application number: JP2002120751A
Authority: JP
Inventors: Yuichiro Takamizawa; 雄一郎高見沢; Toshiyuki Nomura; 俊之野村
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2002-04-23
Filing date: 2002-04-23
Publication date: 2003-11-07

Abstract

PROBLEM TO BE SOLVED: To provide a sound decoding system, method, and program that generate decoded sound with high sound quality.

SOLUTION: The system includes a flattening means (104), a copying means (105), and an envelope adding means (106). It generates the frequency-domain signal of a given frequency band by flattening the spectral outline of part of a frequency-domain signal decoded by a conventional decoding means, copying the flattened part to another frequency band, and adding an appropriate spectral outline to the copied flat signal. The code amount of the information describing which kind of spectral outline is added is generally smaller than the code amount that Huffman coding of that frequency band would generate, so coding efficiency improves.

COPYRIGHT: (C)2004, JPO

Description

Detailed Description of the Invention

[0001]

TECHNICAL FIELD The present invention relates to an audio decoding system, an audio decoding method, and an audio decoding program (hereinafter collectively referred to as an audio decoding system), and more particularly to an audio decoding system capable of generating decoded audio with high sound quality.

[0002]

DESCRIPTION OF THE RELATED ART A conventional audio decoding system is described here using MPEG-2 AAC, an international standard audio coding scheme, as an example. The operation of MPEG-2 AAC is specified in detail in its standard document, "Information technology - Generic coding of moving pictures and associated audio, Part 7: Advanced Audio Coding, AAC" (1997), and is widely known to those skilled in the art. An outline of the operation of the decoding system is given below with reference to FIG. 5.

[0003] A conventional audio decoding system, typified by MPEG-2 AAC, comprises bitstream demultiplexing means 500, Huffman decoding means 501, inverse quantization means 502, and inverse mapping transform means 503.

[0004] The bitstream demultiplexing means 500 separates the information multiplexed in the input bitstream and outputs, to the Huffman decoding means 501, the inverse quantization means 502, and the inverse mapping transform means 503, the information that each of them requires. For example, the Huffman decoding means 501 receives the Huffman-coded quantized value data and information such as which Huffman coding means was used. The inverse quantization means 502 receives parameters such as those representing the quantization precision, which are used when dequantizing each quantized value. The inverse mapping transform means 503 receives parameters such as those representing the transform function and window coefficients used in the inverse mapping transform.

[0005] The Huffman decoding means 501 Huffman-decodes the quantized value data input from the bitstream demultiplexing means 500 and generates a plurality of quantized values. These quantized values are frequency-domain signals and represent the signal strength at each frequency. The generated quantized values are output to the inverse quantization means 502.

[0006] The inverse quantization means 502 dequantizes the quantized values input from the Huffman decoding means 501, based on the parameter representing the quantization precision input from the bitstream demultiplexing means 500, and generates a plurality of frequency-domain audio signals.

[0007] The inverse mapping transform means 503 applies an inverse mapping transform to the frequency-domain audio signal input from the inverse quantization means 502, based on the information on the transform function and window coefficients input from the bitstream demultiplexing means 500, and generates and outputs a time-domain decoded audio signal.

[0008]

PROBLEMS TO BE SOLVED BY THE INVENTION However, the quantized values output by the Huffman decoding means 501 and the frequency-domain audio signal output by the inverse quantization means 502 are often zero, particularly in the high frequency band. This happens because, under the bit rate constraint, the audio encoding system cannot encode these signals, mainly those in the high frequency band, and multiplex them into the bitstream. When quantized values are compressed by Huffman coding, the limit on the achievable compression ratio prevents them from being compressed to a very small code amount, so many frequency bands, mostly high ones, cannot be encoded. The audio decoding system sets the frequency-domain signal in such bands to zero. When many values in the high frequency band are zero in this way, the decoded audio signal obtained by the audio decoding system has a narrower frequency bandwidth, and the perceived sound quality deteriorates.

[0009] The problem with the prior art is therefore that the quality of the decoded audio signal is poor. The reason is that the frequency-domain signal is often zero in the high frequency band.

[0010] An object of the present invention is to provide an audio decoding system and an audio decoding method capable of generating a decoded audio signal with high sound quality.

[0011]

MEANS FOR SOLVING THE PROBLEMS The audio decoding system of the present invention comprises at least flattening means, copying means, and envelope adding means, and operates so as to efficiently generate a wideband frequency-domain signal.

[0012] That is, the system is defined relative to a conventional audio decoding system in which Huffman decoding means generates quantized values of the audio signal in the frequency domain from the input bitstream, inverse quantization means dequantizes the quantized values to generate a frequency-domain audio signal, and inverse mapping transform means converts the frequency-domain audio signal into a time-domain decoded audio signal. To generate the frequency-domain signal of an extension band, which is all or part of the frequency band that the conventional audio decoding system does not decode, the system comprises: flattening means that flattens the amplitude of the frequency-domain audio signal to generate a flat signal; copying means that copies part of the flat signal to the extension band; and envelope adding means that adds a spectral envelope to the frequency-domain signal copied to the extension band to generate the frequency-domain signal of the extension band.

[0013] Alternatively, relative to a conventional audio decoding system in which Huffman decoding means generates quantized values of the audio signal in the frequency domain from the input bitstream, inverse quantization means dequantizes the quantized values to generate a frequency-domain audio signal, and inverse mapping transform means converts the frequency-domain audio signal into a time-domain decoded audio signal, the system generates the frequency-domain signal of an extension band, which is all or part of the frequency band that the conventional audio decoding system does not decode, and for that purpose comprises: flattening means that flattens the amplitude of the frequency-domain audio signal to generate a flat signal; copying means that copies part of the flat signal to the extension band; noise adding means that adds noise to the frequency-domain signal copied to the extension band; and envelope adding means that adds a spectral envelope to the noise-added frequency-domain signal to generate the frequency-domain signal of the extension band.

[0014]

DESCRIPTION OF THE EMBODIMENTS First, the concept of the present invention is described. The present invention is a method that is used as an addition to a conventional audio decoding system and efficiently generates the frequency-domain signal of a given frequency band. Part of the frequency-domain signal decoded by the conventional decoding means is copied to another frequency band after its spectral outline has been flattened, and an appropriate spectral outline is then added to the copied flat signal to generate the frequency-domain signal. Information on which frequency band is copied to where, and on what spectral outline is added, is included in the bitstream in advance. Because the code amount of this information is generally smaller than the code amount that Huffman coding of that frequency band would generate, coding efficiency can be improved.

[0015] (First Embodiment) Next, a first embodiment of the present invention is described in detail with reference to the drawings.

[0016] Referring to FIG. 1, the embodiment of the present invention comprises bitstream demultiplexing means 100, Huffman decoding means 101, inverse quantization means 102, inverse mapping transform means 103, flattening means 104, copying means 105, and envelope adding means 106.

[0017] Compared with FIG. 5, FIG. 1 adds the flattening means 104, the copying means 105, and the envelope adding means 106. In addition, the bitstream demultiplexing means 100 differs in function from the bitstream demultiplexing means 500 of FIG. 5 so that it can output the information these added processes require. As for the other processing, the Huffman decoding means 101, the inverse quantization means 102, and the inverse mapping transform means 103 of FIG. 1 are equivalent to the Huffman decoding means 501, the inverse quantization means 502, and the inverse mapping transform means 503 of FIG. 5, respectively. The following therefore describes in detail the operation of the modified bitstream demultiplexing means 100 and of the added flattening means 104, copying means 105, and envelope adding means 106. Because the description covers the case of adding the invention to a conventional audio decoding system that uses MPEG-2 AAC, the Huffman decoding means 101 is used as the means for generating quantized values. However, the Huffman decoding means 101 is not essential to the present invention, and other entropy decoding means, such as arithmetic decoding means, may be used instead.

[0018] The bitstream demultiplexing means 100 differs from the bitstream demultiplexing means 500 of FIG. 5 in that it also outputs the information required by the flattening means 104, the copying means 105, and the envelope adding means 106 newly added in FIG. 1. The content of this information is described in the explanation of each process.

[0019] The flattening means 104 receives the frequency-domain audio signal from the inverse quantization means 102. It may also receive, from the bitstream demultiplexing means 100, flattening information indicating the flattening method to be used; this is described later. The input frequency-domain audio signal represents a so-called frequency spectrum, and its rough shape is called the envelope. The flattening means 104 flattens the input frequency-domain audio signal by dividing it by its envelope. The resulting flat signal is output to the copying means 105. The flattening method is described in detail later.

[0020] For example, let the N frequency-domain audio signals output by the inverse quantization means 102 be MDCT(1 to N), and let F(1 to N) be the signal obtained by flattening MDCT(1 to N) with the flattening method described later. F(1 to N) is output to the copying means 105 as the flat signal.

[0021] The copying means 105 receives the flat signal from the flattening means 104, the frequency-domain audio signal from the inverse quantization means 102, and copy source frequency information and copy destination frequency information from the bitstream demultiplexing means 100. The copying means 105 copies a predetermined number of signals from the flat signal, starting at the frequency indicated by the copy source frequency information, into the frequency-domain audio signal, starting at the frequency indicated by the copy destination frequency information. The number of signals to copy need not be predetermined; it may instead be a variable supplied from the bitstream demultiplexing means 100. The frequency-domain audio signal that has undergone this copying is output to the envelope adding means 106.

[0022] That is, the copying means 105 receives the above MDCT(1 to N) and F(1 to N), together with the copy source frequency information and the copy destination frequency information. Let F1 be the frequency indicated by the copy source frequency information, F2 the frequency indicated by the copy destination frequency information, and F3 the number of signals to copy. The copying means 105 then assigns F(F1 + M) to MDCT(F2 + M) for M = 0 to (F3 - 1). If multiple sets of copy source and copy destination frequency information are recorded in the bitstream, this processing is performed for each set of F1, F2, and F3, which means that signals are copied from the flat signal for multiple frequency bands. The MDCT(1 to NN) generated in this way is output to the envelope adding means 106. Here NN may be the same as or different from the above N. For example, if NN is larger than N, a decoded audio signal with a wider bandwidth than that of the conventional decoding system is obtained. When NN differs from N in this way, the inverse mapping transform means 103 should use an inverse transform function and window coefficients for a transform block length of NN rather than N.
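The assignment of F(F1 + M) to MDCT(F2 + M) can be sketched as follows. This is only an illustrative sketch: the function name `copy_band`, its parameters, and the use of 0-based list indices (the text counts from 1) are assumptions, as is zero-padding the spectrum when NN exceeds N.

```python
def copy_band(mdct, flat, f1, f2, f3, nn=None):
    """Copy f3 coefficients of the flat signal into the spectrum.

    mdct : dequantized coefficients MDCT, length N
    flat : flattened coefficients F, length N
    f1   : start index of the copy source in `flat` (0-based, assumed)
    f2   : start index of the copy destination in the output (0-based)
    f3   : number of coefficients to copy
    nn   : output length NN; defaults to N (assumed zero-padded if larger)
    """
    nn = len(mdct) if nn is None else nn
    out = list(mdct) + [0.0] * max(0, nn - len(mdct))
    for m in range(f3):
        out[f2 + m] = flat[f1 + m]  # MDCT(F2 + M) <- F(F1 + M)
    return out


# Four decoded low-band coefficients, extended to NN = 8.
mdct = [4.0, 2.0, 1.0, 0.5]
flat = [1.0, 0.5, 1.0, 0.5]
extended = copy_band(mdct, flat, f1=0, f2=4, f3=4, nn=8)
print(extended)
```

When the bitstream carries several (F1, F2, F3) sets, the same function would simply be applied once per set.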

[0023] The envelope adding means 106 receives the frequency-domain audio signal from the copying means 105 and envelope information from the bitstream demultiplexing means 100. The envelope adding means 106 adds an envelope, based on the envelope information, to the signal that the copying means 105 copied from the flat signal. The resulting frequency-domain audio signal is output to the inverse mapping transform means 103.

[0024] That is, for the MDCT(1 to NN) input from the copying means 105, the envelope indicated by the envelope information is added to the frequency band indicated by the envelope information. For example, suppose the frequency band indicated by the envelope information runs from E1 to E2, the envelope coefficients are E(E1 to E2), and the envelope is added by multiplication with the envelope coefficients. The processing performed by the envelope adding means 106 is then to multiply MDCT(M) by E(M) for M = E1 to E2. The MDCT(1 to NN) generated in this way is output to the inverse mapping transform means 103.
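A minimal sketch of this multiplication is shown below. The function name `add_envelope` and the convention that `coeffs[0]` holds E(E1) are assumptions made for illustration, and indices are 0-based rather than counted from 1 as in the text.

```python
def add_envelope(mdct, e1, e2, coeffs):
    """Multiply MDCT(M) by E(M) for M = e1..e2 inclusive.

    coeffs[k] is taken to be E(e1 + k), so it needs e2 - e1 + 1 entries.
    Coefficients outside the band are returned unchanged.
    """
    if len(coeffs) != e2 - e1 + 1:
        raise ValueError("one envelope coefficient is needed per bin in the band")
    out = list(mdct)
    for m in range(e1, e2 + 1):
        out[m] *= coeffs[m - e1]
    return out


shaped = add_envelope([1.0, 1.0, 1.0, 1.0], e1=1, e2=2, coeffs=[0.5, 0.25])
print(shaped)
```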

[0025] Encoding all of the envelope coefficients E(E1 to E2) in the bitstream as envelope information would make the code amount large. Instead, only the values at the start and end points, that is, E(E1) and E(E2), may be encoded in the bitstream, and the decoding system may compute the remaining envelope coefficients from E(E1) and E(E2) by interpolation. That is, for E3 satisfying E1 < E3 < E2, E(E3) may be obtained as, for example, E(E3) = E(E1) × (E2 - E3) ÷ (E2 - E1) + E(E2) × (E3 - E1) ÷ (E2 - E1).
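The linear interpolation between the two transmitted end points can be sketched as below. The function name `interpolate_envelope` and its argument names are hypothetical; the arithmetic follows the expression for E(E3) given in the text, evaluated for every position from E1 to E2.

```python
def interpolate_envelope(e1, e2, start, end):
    """Return [E(e1), ..., E(e2)] given only start = E(e1) and end = E(e2).

    Each interior value is start*(e2 - m)/(e2 - e1) + end*(m - e1)/(e2 - e1).
    """
    if e2 <= e1:
        raise ValueError("e2 must be greater than e1")
    span = e2 - e1
    return [(start * (e2 - m) + end * (m - e1)) / span for m in range(e1, e2 + 1)]


coeffs = interpolate_envelope(4, 8, start=1.0, end=0.0)
print(coeffs)
```

Only two values per band then need to be read from the bitstream, which is the code-amount saving the paragraph describes.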

[0026] Furthermore, the envelope coefficients E(E1 to E2) may be generated from values obtained from the bitstream that represent them either as coefficients obtained by linear prediction analysis, treating the envelope coefficients as a time-series signal, or as curve approximation coefficients such as spline coefficients. In addition to such linear prediction analysis or curve approximation such as splines, difference information between these approximations and the envelope coefficients E(E1 to E2) may be obtained from the bitstream and added to generate the envelope coefficients E(E1 to E2).

[0027] Next, the flattening method used by the flattening means 104 is described in more detail. As stated above, the flattening means 104 flattens the input frequency-domain audio signal by dividing it by its envelope. Specifically, the following describes how to flatten the frequency-domain audio signal MDCT(1 to N) to obtain the flat signal F(1 to NN). For L = 1 to NN, the flat signal F(L) is obtained by the following procedure using a search range R.

[0028] (Procedure 1) A representative value is computed from the values of the (2R + 1) signals from MDCT(L - R) to MDCT(L + R). Suitable representative values include, for example, the maximum amplitude or the average amplitude of the signal; here the representative value is the maximum absolute value, MAX.
(Procedure 2) If the representative value MAX is zero, F(L) = 0.
(Procedure 3) If the representative value MAX is not zero, F(L) = MDCT(L) ÷ MAX.
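The three procedures can be sketched as below, using the maximum absolute value as the representative value. The function name `flatten` is hypothetical, indices are 0-based, and clipping the search window at the ends of the spectrum is an assumption, since the text does not say how the edges are handled.

```python
def flatten(mdct, r):
    """Divide each coefficient by the peak magnitude in its neighborhood.

    The neighborhood of bin l is mdct[l - r .. l + r], clipped to the valid
    index range (edge handling is an assumption, not taken from the text).
    """
    n = len(mdct)
    flat = []
    for l in range(n):
        lo, hi = max(0, l - r), min(n, l + r + 1)
        peak = max(abs(v) for v in mdct[lo:hi])   # Procedure 1: MAX
        if peak == 0:
            flat.append(0.0)                      # Procedure 2
        else:
            flat.append(mdct[l] / peak)           # Procedure 3
    return flat


flat = flatten([4.0, -2.0, 1.0, 0.0, 0.0, 0.0], r=1)
print(flat)
```

Because each value is divided by a local maximum that includes itself, every output magnitude is at most 1, which is what makes the result "flat" before a new envelope is applied.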

[0029] Here MAX, the representative value in the neighborhood of the frequency (in this case the maximum amplitude), is regarded as the spectral envelope, and the frequency-domain audio signal is divided by this envelope. The search range R, which defines the range over which the representative value is searched, may be a constant or a variable. If it is a variable, the bitstream demultiplexing means 100 must obtain information on its value from the bitstream and output it to the flattening means 104.

[0030] One way to make the search range R variable is to obtain the pitch of the frequency-domain audio signal and vary R according to the pitch value. The pitch is obtained either by computing it in the audio decoding system, or by computing it in the audio encoding system, encoding it in the bitstream, and having the audio decoding system decode it. The pitch can be computed by a generally known method, such as one that uses the autocorrelation of the frequency-domain audio signal MDCT(1 to N). The search range R is preferably equal to or wider than the pitch; R should be made larger as the pitch becomes larger and smaller as the pitch becomes smaller.
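One possible reading of this step is sketched below. It assumes that "pitch" means the spacing, in bins, between harmonic peaks of the spectrum, and estimates it as the lag that maximizes the autocorrelation of the mean-removed magnitudes. The function names, the `min_lag` and `scale` parameters, and this particular estimator are all assumptions; the text only says that a generally known autocorrelation method may be used.

```python
def estimate_pitch(mdct, min_lag=2, max_lag=None):
    """Return the lag (in bins) with the largest autocorrelation of |MDCT|."""
    mags = [abs(v) for v in mdct]
    n = len(mags)
    max_lag = n // 2 if max_lag is None else max_lag
    mean = sum(mags) / n
    centered = [v - mean for v in mags]
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(centered[i] * centered[i + lag] for i in range(n - lag))
        if score > best_score:  # strict '>' keeps the smallest lag on ties
            best_lag, best_score = lag, score
    return best_lag


def search_range_from_pitch(pitch, scale=1):
    """Pick R equal to or wider than the pitch (scale >= 1), at least 1."""
    return max(1, scale * pitch)


# Synthetic spectrum with a peak every 4 bins.
spectrum = [1.0, 0.0, 0.0, 0.0] * 8
pitch = estimate_pitch(spectrum)
r = search_range_from_pitch(pitch)
print(pitch, r)
```

Tying R to the pitch in this way keeps at least one harmonic peak inside every search window, so the local maximum tracks the envelope rather than the gaps between harmonics.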

[0031] Next, an example is given of the copy source frequency information, which indicates which frequency band of the flat signal is copied, and of the copy destination frequency information, which indicates which frequency band of the frequency-domain audio signal it is copied to. First, define P1 as the block length for the copy source frequency information and P2 as the block length for the copy destination frequency information. P1 and P2 may each be a predetermined constant, or may be set equal to the pitch value described above. The flat signal F(1 to N) is then divided, in order of frequency from the lowest, into blocks of P1 signals each, and the frequency-domain audio signal MDCT(1 to NN) is divided, in order of frequency from the lowest, into blocks of P2 signals each. That is, the flat signal F(1 to N) is partitioned so that the first block is F(1 to P1), the second block is F(P1 + 1 to 2P1), and so on. Likewise, the frequency-domain audio signal MDCT(1 to NN) is partitioned so that the first block is MDCT(1 to P2), the second block is MDCT(P2 + 1 to 2P2), and so on.

[0032] The copying means 105 uses the copy source frequency information BO and the copy destination frequency information BD input from the bitstream demultiplexing means 100 to copy signals from the BO-th block of the flat signal to the BD-th block of the frequency-domain audio signal. That is, signals starting at the (P1 × BO)-th signal from the low end of the flat signal are copied to positions starting at the (P2 × BD)-th signal from the low end of the frequency-domain audio signal. To account for the phase difference between the two, the copy source may start at P1 × BO + PH instead of P1 × BO. Here PH represents the phase difference; it may be supplied by the bitstream demultiplexing means 100, or it may be obtained as PH = (BD × P2) MOD P1, where A MOD B denotes the remainder of dividing A by B.
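The block-indexed copy with the optional phase offset can be sketched as below. The function name `copy_block`, the `use_phase` flag, treating block indices and offsets as 0-based, and copying exactly one destination block (P2 coefficients) are assumptions; the text does not fix these details.

```python
def copy_block(mdct, flat, bo, bd, p1, p2, use_phase=False):
    """Copy one block from the flat signal into the spectrum.

    Source starts at p1*bo (+ PH), destination at p2*bd.
    PH = (bd * p2) % p1 aligns the source to the destination position.
    """
    ph = (bd * p2) % p1 if use_phase else 0
    src = p1 * bo + ph
    dst = p2 * bd
    out = list(mdct)
    for m in range(p2):  # assumed: one destination block is filled
        out[dst + m] = flat[src + m]
    return out


flat = [float(i) for i in range(16)]   # stand-in flat signal
empty = [0.0] * 16
plain = copy_block(empty, flat, bo=1, bd=3, p1=4, p2=4)
shifted = copy_block(empty, flat, bo=1, bd=2, p1=3, p2=4, use_phase=True)
print(plain[12:16], shifted[8:12])
```

In the second call PH = (2 × 4) MOD 3 = 2, so the source starts at 3 × 1 + 2 = 5 instead of 3.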

[0033] Next, the overall operation of this embodiment is described with reference to FIG. 1 and the flowchart of FIG. 2.

[0034] First, the frequency-domain audio signal output by the inverse quantization means (102 in FIG. 1) is flattened to generate a flat signal (step 201 in FIG. 2; 104 in FIG. 1). Next, part of the flat signal is copied to generate a frequency-domain audio signal (step 202; 105 in FIG. 1). An envelope is then added to the copied audio signal to generate a frequency-domain audio signal (step 203; 106 in FIG. 1). The frequency-domain audio signal generated in this way becomes the input to the inverse mapping transform (103 in FIG. 1).
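Steps 201 to 203 can be chained in a single sketch, shown below. Everything here is illustrative: the function names, the 0-based indices, the clipped search window, and the choice of a linearly interpolated envelope between two end values are assumptions, and the inverse mapping transform that would consume the result is omitted.

```python
def flatten(mdct, r):
    """Step 201: divide each bin by the local peak magnitude (window clipped)."""
    n = len(mdct)
    flat = []
    for l in range(n):
        peak = max(abs(v) for v in mdct[max(0, l - r):min(n, l + r + 1)])
        flat.append(0.0 if peak == 0 else mdct[l] / peak)
    return flat


def extend_band(mdct, r, f1, f2, f3, e_start, e_end):
    """Flatten, copy f3 bins from f1 to f2, then shape the copied bins."""
    flat = flatten(mdct, r)                      # step 201
    out = list(mdct)
    for m in range(f3):                          # step 202
        out[f2 + m] = flat[f1 + m]
    for m in range(f3):                          # step 203
        if f3 > 1:
            gain = (e_start * (f3 - 1 - m) + e_end * m) / (f3 - 1)
        else:
            gain = e_start
        out[f2 + m] *= gain
    return out


# Low band decoded, high band zero (as in the problem the text describes).
decoded = [8.0, -4.0, 2.0, -1.0, 0.0, 0.0, 0.0, 0.0]
result = extend_band(decoded, r=1, f1=0, f2=4, f3=4, e_start=0.5, e_end=0.125)
print(result)
```

The low band passes through untouched, while the previously empty high band receives the fine structure of the low band scaled by the transmitted envelope.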

[0035] Next, the effects of the first embodiment of the present invention are described.

[0036] This embodiment has flattening means (104 in FIG. 1), copying means (105 in FIG. 1), and envelope adding means (106 in FIG. 1). It generates the frequency-domain signal of a given frequency band by flattening part of the frequency-domain signal decoded by the conventional decoding means with its spectral envelope, copying the result to another frequency band, and adding an appropriate spectral envelope to the copied flat signal. Because the code amount of the information on which frequency band is copied to where, and on what spectral envelope is added, is generally smaller than the code amount that Huffman coding of that frequency band would generate, coding efficiency can be improved.

[0037] (Second Embodiment) Next, a second embodiment of the present invention is described in detail with reference to the drawings.

[0038] Referring to FIG. 3, the embodiment of the present invention comprises bitstream demultiplexing means 300, Huffman decoding means 301, inverse quantization means 302, inverse mapping transform means 303, flattening means 304, copying means 305, envelope adding means 306, and noise adding means 307.

[0039] Compared with the first embodiment of the present invention shown in FIG. 1, the second embodiment shown in FIG. 3 differs only in that noise adding means 307 has been added. In addition, the bitstream demultiplexing means 300 differs in function from the bitstream demultiplexing means 100 of FIG. 1 so that it can output the information the noise adding means 307 requires. All other processing is equivalent to that of the first embodiment. The following therefore describes in detail the operation of the modified bitstream demultiplexing means 300 and the added noise adding means 307.

[0040] The bitstream demultiplexing means 300 differs from the bitstream demultiplexing means 100 of FIG. 1 in that it outputs the information required by the noise adding means 307 newly added in FIG. 3. That is, it obtains from the bitstream the noise information used by the noise adding means 307 and outputs it to the noise adding means 307.

[0041] The noise adding means 307 receives the frequency-domain audio signal from the copying means 305 and the noise information from the bitstream demultiplexing means 300. The noise information represents amplitude information, such as the average amplitude or the maximum amplitude of the noise to be added, and the band to which the noise is to be added. The noise adding means 307 generates random numbers based on this amplitude information and adds them to the frequency band of the frequency-domain audio signal specified by the noise information. The frequency-domain audio signal to which noise has been added in this way is output to the envelope adding means 306.
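The noise addition described above can be illustrated with a minimal sketch. The function name, the list-of-bins data layout, the `kind` flag, and the choice of uniformly distributed random numbers are all assumptions made for illustration; the source only states that random numbers based on the amplitude information are added to the specified band.

```python
import random


def add_noise(spectrum, band, amplitude, kind="max", seed=None):
    """Return a copy of `spectrum` with random noise added inside `band`.

    spectrum  : list of frequency-domain coefficients (hypothetical layout)
    band      : (start, stop) bin indices, stop exclusive (hypothetical format)
    amplitude : average or maximum noise amplitude taken from the noise information
    kind      : "max" or "avg", stating which of the two `amplitude` describes
    """
    rng = random.Random(seed)
    start, stop = band
    # Assumption: uniform noise in [-peak, peak]. Its mean absolute value is
    # peak / 2, so an average-amplitude target needs a peak of twice that value.
    peak = amplitude if kind == "max" else 2.0 * amplitude
    out = list(spectrum)
    for k in range(start, stop):
        out[k] += rng.uniform(-peak, peak)
    return out


# Example: add noise with maximum amplitude 0.5 to bins 4..7 only.
noisy = add_noise([1.0] * 8, band=(4, 8), amplitude=0.5, kind="max", seed=0)
```

Bins outside the specified band are returned unchanged, which mirrors the statement that the noise information also designates where the noise is added.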

[0042] Next, the overall operation of this embodiment will be described with reference to FIG. 3 and the flowchart of FIG. 4.

[0043] First, the frequency-domain audio signal output by the inverse quantization means (302 in FIG. 3) is flattened to generate a flat signal (step 401 in FIG. 4, 304 in FIG. 3). Next, a part of the flat signal is copied to generate a frequency-domain extended audio signal (step 402 in FIG. 4, 305 in FIG. 3). Then, noise of the specified amplitude is added to a part of the frequency-domain signal (step 403 in FIG. 4, 307 in FIG. 3). Further, an envelope is added to a part of the frequency-domain signal (step 404 in FIG. 4, 306 in FIG. 3). The frequency-domain audio signal generated in this way becomes the input to the inverse mapping transform (303 in FIG. 3).
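The four steps above can be sketched end to end as follows. This is a hedged illustration, not the patented implementation: flattening by dividing each bin by the maximum magnitude in its neighborhood follows the option named in the claims, while the neighborhood `radius`, the uniform noise, the per-bin `gains` standing in for the spectrum envelope, and the decision to leave the decoded low band unchanged are assumptions.

```python
import random


def flatten(spectrum, radius=2):
    """Step 401: divide each bin by a local envelope (max magnitude nearby)."""
    flat = []
    for k in range(len(spectrum)):
        lo, hi = max(0, k - radius), min(len(spectrum), k + radius + 1)
        envelope = max(abs(x) for x in spectrum[lo:hi]) or 1.0  # avoid dividing by zero
        flat.append(spectrum[k] / envelope)
    return flat


def extend_band(decoded, src, dst_start, noise_amp, gains, seed=0):
    """Steps 401 to 404: flatten, copy, add noise, add envelope.

    decoded   : decoded spectrum, with the extension band still empty (zeros)
    src       : (start, stop) bins of the flat signal to copy (hypothetical format)
    dst_start : first bin of the extension band
    noise_amp : maximum noise amplitude (assumed uniform noise)
    gains     : per-bin envelope values for the extension band (assumption)
    """
    rng = random.Random(seed)
    flat = flatten(decoded)                                  # step 401
    patch = flat[src[0]:src[1]]                              # step 402
    patch = [x + rng.uniform(-noise_amp, noise_amp) for x in patch]  # step 403
    out = list(decoded)
    for i, x in enumerate(patch):                            # step 404
        out[dst_start + i] = x * gains[i]
    return out  # would then be passed to the inverse mapping transform


# Example: copy bins 0..3 into the empty extension band 4..7.
extended = extend_band([4.0, 2.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0],
                       src=(0, 4), dst_start=4, noise_amp=0.0, gains=[0.1] * 4)
```

Keeping each step as its own line makes the ordering explicit: noise is added to the copied flat signal before the envelope is applied, so the envelope also shapes the noise.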

[0044] Next, the effect of the second embodiment of the present invention will be described.

[0045] This embodiment has flattening means (304 in FIG. 3), copying means (305 in FIG. 3), envelope adding means (306 in FIG. 3), and noise adding means (307 in FIG. 3). It has the function of generating the frequency signal of a certain frequency band by taking a part of the frequency-domain signal decoded by conventional decoding means, flattening its spectral outline, copying it to another frequency band, adding noise to a part of the band, and then adding an appropriate spectral outline to the copied flat signal. The code amount of the information specifying which frequency band is copied to where, and what noise and spectral outline are added, is generally smaller than the code amount produced by Huffman coding of that frequency band, so coding efficiency can be improved.

[0046] Finally, a method for efficiently generating a stereo signal is described for a speech decoding system that, as shown in the first and second embodiments of the present invention, efficiently generates the frequency signal of one band from the frequency signal of another band by means such as copying. A well-known conventional technique for efficiently generating a stereo signal is the intensity stereo means described in the MPEG-2 AAC standard document mentioned above. In generating a stereo signal consisting of two channels, left and right, the intensity stereo means generates a frequency-domain signal common to the left and right channels by the same technique as for a monaural signal, and decodes the left-channel spectrum envelope information and the right-channel spectrum envelope information contained in the bitstream. This spectrum envelope information represents a scaling factor: the left-channel frequency-domain signal is generated by multiplying the common frequency-domain signal by the scaling factor represented by the left-channel spectrum envelope information. Similarly, the right-channel frequency-domain signal is generated by multiplying the common frequency-domain signal by the scaling factor represented by the right-channel spectrum envelope information. When such intensity stereo means is used in a speech decoding system that, as in the present invention, efficiently generates the frequency signal of one band from the frequency signal of another band by means such as copying, the common frequency-domain signal may first be generated from the frequency signal of another band by means such as copying, and then multiplied by the scaling factors represented by the left-channel and right-channel spectrum envelope information, respectively, to generate the stereo signal. Alternatively, instead of generating a common frequency-domain signal, the same generation means may be used for both channels, that is, the same copy source frequency information and copy destination frequency information may be used for the left and right channels to generate the left-channel frequency-domain signal from the left-channel flat signal and the right-channel frequency-domain signal from the right-channel flat signal, and these frequency-domain signals may then be multiplied by the scaling factors represented by the left-channel and right-channel spectrum envelope information, respectively, to generate the stereo signal.
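The first stereo variant, in which one common frequency-domain signal is scaled separately for each channel, can be sketched as below. The function names and the representation of the spectrum envelope information as one scaling factor per band, given as `(start, stop)` bin ranges, are assumptions for illustration only.

```python
def apply_band_gains(spectrum, band_edges, gains):
    """Multiply each band of `spectrum` by its scaling factor.

    band_edges : list of (start, stop) bin ranges, stop exclusive (hypothetical format)
    gains      : one scaling factor per band, decoded from the envelope information
    """
    out = list(spectrum)
    for (start, stop), gain in zip(band_edges, gains):
        for k in range(start, stop):
            out[k] *= gain
    return out


def intensity_stereo(common, band_edges, left_gains, right_gains):
    """Derive left and right spectra from one common frequency-domain signal."""
    left = apply_band_gains(common, band_edges, left_gains)
    right = apply_band_gains(common, band_edges, right_gains)
    return left, right


# Example: two bands of two bins each, with different scaling per channel.
left, right = intensity_stereo([1.0, 1.0, 2.0, 2.0],
                               band_edges=[(0, 2), (2, 4)],
                               left_gains=[0.5, 1.0],
                               right_gains=[2.0, 0.25])
```

For the second variant, the same `apply_band_gains` step would instead be applied to two separately generated channel signals that share the same copy source and copy destination frequency information.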

[0047] The present invention can also be carried out by implementing the configuration or procedure shown in the first and second embodiments as a program and executing it on a CPU provided in a computer or the like.

[0048]

[Effect of the Invention] The effect of the present invention is that a decoded audio signal of higher sound quality can be generated. This is because the invention has means capable of decoding the frequency-domain audio signal from less information than was conventionally required, which improves coding efficiency.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of the first embodiment of the present invention.

FIG. 2 is a flowchart showing the operation of the first embodiment of the present invention.

FIG. 3 is a block diagram showing the configuration of the second embodiment of the present invention.

FIG. 4 is a flowchart showing the operation of the second embodiment of the present invention.

FIG. 5 is a block diagram showing the configuration of a conventional embodiment.

[Explanation of symbols]

100, 300, 500: Bitstream demultiplexing means
101, 301, 501: Huffman decoding means
102, 302, 502: Inverse quantization means
103, 303, 503: Inverse mapping transform means
104, 304: Flattening means
105, 305: Copying means
106, 306: Envelope adding means
307: Noise adding means

Continuation of front page

F-term (reference): 5D045 DA20; 5J064 AA01 AA02 BA09 BA16 BC16 BC21 BD04

Claims

1. A speech decoding system which, in contrast to a conventional speech decoding system that generates quantized values in the frequency domain of an audio signal from an input bitstream by means such as Huffman decoding, dequantizes the quantized values with inverse quantization means to generate a frequency-domain audio signal, and converts the frequency-domain audio signal into a time-domain decoded audio signal with inverse mapping transform means, comprises, in order to generate a frequency-domain signal in an extension band that is all or part of the frequency band not decoded by the conventional speech decoding system: flattening means for flattening the amplitude of the frequency-domain audio signal to generate a flat signal; copying means for copying a part of the flat signal to the extension band; and envelope adding means for adding a spectrum envelope to the frequency-domain signal copied to the extension band to generate the frequency-domain signal in the extension band.

2. A speech decoding system which, in contrast to a conventional speech decoding system that generates quantized values in the frequency domain of an audio signal from an input bitstream by means such as Huffman decoding, dequantizes the quantized values with inverse quantization means to generate a frequency-domain audio signal, and converts the frequency-domain audio signal into a time-domain decoded audio signal with inverse mapping transform means, comprises, in order to generate a frequency-domain signal in an extension band that is all or part of the frequency band not decoded by the conventional speech decoding system: flattening means for flattening the amplitude of the frequency-domain audio signal to generate a flat signal; copying means for copying a part of the flat signal to the extension band; noise adding means for adding noise to the frequency-domain signal copied to the extension band; and envelope adding means for adding a spectrum envelope to the noise-added frequency-domain signal to generate the frequency-domain signal in the extension band.

3. The speech decoding system according to claim 1, wherein the flattening means obtains the flat signal by dividing the frequency-domain audio signal by its spectrum envelope.

4. The speech decoding system according to claim 3, wherein the spectrum envelope at each frequency of the frequency-domain audio signal is a representative value of the frequency-domain audio signal in the vicinity of that frequency.

5. The speech decoding system according to claim 4, wherein the range of the vicinity of the frequency changes according to the pitch of the frequency-domain audio signal.

6. The speech decoding system according to claim 4, wherein the representative value is either maximum amplitude or average amplitude.

7. The speech decoding system according to any one of claims 1 to 6, wherein the envelope that the envelope adding means adds to the signal at each frequency of the frequency-domain audio signal is calculated by interpolation from two or more known envelopes in the vicinity of that frequency.

8. The speech decoding system according to any one of claims 1 to 7, wherein the copying means derives the copy source frequency from copy source frequency information recorded in the input bitstream and the pitch of the frequency-domain audio signal.

9. A speech decoding method which, in contrast to a conventional speech decoding method that generates quantized values in the frequency domain of an audio signal from an input bitstream, dequantizes the quantized values to generate a frequency-domain audio signal, and converts the frequency-domain audio signal into a time-domain decoded audio signal, comprises, in order to generate a frequency-domain signal in an extension band that is all or part of the frequency band not decoded by the conventional speech decoding method: flattening the amplitude of the frequency-domain audio signal to generate a flat signal; copying a part of the flat signal to the extension band; and adding a spectrum envelope to the frequency-domain signal copied to the extension band to generate the frequency-domain signal in the extension band.

10. A speech decoding method which, in contrast to a conventional speech decoding method that generates quantized values in the frequency domain of an audio signal from an input bitstream, dequantizes the quantized values to generate a frequency-domain audio signal, and converts the frequency-domain audio signal into a time-domain decoded audio signal, comprises, in order to generate a frequency-domain signal in an extension band that is all or part of the frequency band not decoded by the conventional speech decoding method: flattening the amplitude of the frequency-domain audio signal to generate a flat signal; copying a part of the flat signal to the extension band; adding noise to the frequency-domain signal copied to the extension band; and adding a spectrum envelope to the noise-added frequency-domain signal to generate the frequency-domain signal in the extension band.

11. The speech decoding method according to claim 9, wherein the flat signal is obtained by dividing the frequency-domain audio signal by its spectrum envelope.

12. The speech decoding method according to claim 11, wherein the spectrum envelope at each frequency of the frequency-domain audio signal is a representative value of the frequency-domain audio signal in the vicinity of that frequency.

13. The speech decoding method according to claim 11, wherein the range of the vicinity of the frequency changes according to the pitch of the frequency-domain audio signal.

14. The speech decoding method according to claim 12 or 13, wherein the representative value is either the maximum amplitude or the average amplitude.

15. The speech decoding method according to any one of claims 9 to 14, wherein the envelope to be added to the signal at each frequency of the frequency-domain audio signal is calculated by interpolation from two or more known envelopes in the vicinity of that frequency.

16. The speech decoding method according to any one of claims 9 to 15, wherein the copy source frequency is derived from copy source frequency information recorded in the input bitstream and the pitch of the frequency-domain audio signal.

17. A speech decoding program which, in contrast to a conventional speech decoding method that generates quantized values in the frequency domain of an audio signal from an input bitstream, dequantizes the quantized values to generate a frequency-domain audio signal, and converts the frequency-domain audio signal into a time-domain decoded audio signal, causes a computer to execute, in order to generate a frequency-domain signal in an extension band that is all or part of the frequency band not decoded by the conventional speech decoding method: processing of flattening the amplitude of the frequency-domain audio signal to generate a flat signal; processing of copying a part of the flat signal to the extension band; and processing of adding a spectrum envelope to the frequency-domain signal copied to the extension band to generate the frequency-domain signal in the extension band.

18. A speech decoding program which, in contrast to a conventional speech decoding method that generates quantized values in the frequency domain of an audio signal from an input bitstream, dequantizes the quantized values to generate a frequency-domain audio signal, and converts the frequency-domain audio signal into a time-domain decoded audio signal, causes a computer to execute, in order to generate a frequency-domain signal in an extension band that is all or part of the frequency band not decoded by the conventional speech decoding method: processing of flattening the amplitude of the frequency-domain audio signal to generate a flat signal; processing of copying a part of the flat signal to the extension band; processing of adding noise to the frequency-domain signal copied to the extension band; and processing of adding a spectrum envelope to the noise-added frequency-domain signal to generate the frequency-domain signal in the extension band.