CN1532808A - Method and device for coding and/or decoding audio frequency data using bandwidth expanding technology

Info

Publication number: CN1532808A
Application number: CNA031650201A
Authority: CN
Inventors: 金重会; 金尚煜
Original assignee: Samsung Electronics Co Ltd
Current assignee: Samsung Electronics Co Ltd
Priority date: 2003-03-22
Filing date: 2003-09-17
Publication date: 2004-09-29
Anticipated expiration: 2023-09-17
Also published as: KR20040086879A; KR100923301B1; CN1273955C

Abstract

An encoding device performs bandwidth extension encoding on audio data, outputs bandwidth-limited audio data, and generates bandwidth extension information. The encoding device Huffman-codes the bandwidth-limited audio data into a layered structure having a base layer and at least one enhancement layer so that the bit rate of the bandwidth-limited data can be controlled. The encoding device then multiplexes the Huffman-coded bandwidth-limited audio data and the bandwidth extension information.

Description

Method and apparatus for encoding and/or decoding audio data using bandwidth extension technology

This application claims foreign priority to Korean Patent Application No. 2003-17978, filed on March 22, 2003 with the Korean Intellectual Property Office, the disclosure of which is incorporated herein by reference in its entirety.

Technical field

The present invention relates to the encoding and decoding of audio data, and more particularly to a method and apparatus for encoding and/or decoding audio data using bandwidth extension technology.

Background technology

With the development of digital signal processing, audio signals are now mostly stored and played back as digital data. Digital audio storage and/or playback devices sample and quantize an analog audio signal, convert it into pulse code modulation (PCM) audio data, which is a digital signal, and store the PCM audio data on an information storage medium such as a compact disc (CD) or a digital versatile disc (DVD), so that a user can play back the PCM audio data from the medium whenever he or she wants to listen to it. Compared with the analog storage and/or reproduction methods used for long-play (LP) records, magnetic tapes, and the like, digital storage and/or reproduction greatly improves sound quality and markedly reduces the deterioration of sound quality caused by long-term storage. However, the large amount of digital data sometimes causes problems in storage and transmission.

To overcome these problems, a wide variety of compression techniques for reducing the amount of digital audio data are in use. The Moving Picture Experts Group (MPEG) audio standards drafted by the International Organization for Standardization (ISO) and the Dolby AC-2/AC-3 techniques use a psychoacoustic model to reduce the amount of data, and do so effectively regardless of the characteristics of the signal. That is, the MPEG audio standards and AC-2/AC-3 provide almost the same sound quality as a CD at bit rates of only 64 kbps to 384 kbps, that is, at 1/6 to 1/8 of the bit rate of existing digital coding.

However, all of these techniques detect, quantize, and encode digital data under conditions optimized for a fixed bit rate. Consequently, when the digital data is transmitted over a network, the transmission bandwidth may be reduced by poor network conditions, or the network may be disconnected so that the network service stops. Furthermore, when the digital data has to be converted into a smaller bitstream to suit a mobile device with limited storage capacity, it must be re-encoded to reduce the amount of data, which requires a considerable amount of computation.

Accordingly, on November 19, 1997 the applicant filed Korean Patent Application No. 97-61298, "Audio encoding and/or decoding method and apparatus capable of controlling the bit rate using bit-sliced arithmetic coding (BSAC)", with the Korean Intellectual Property Office; the application was granted on April 17, 2002 as Korean Patent Registration No. 261253. According to the BSAC technique, a bitstream encoded at a high bit rate can be converted into a bitstream with a low bit rate. Because reconstruction is possible from only part of the bitstream, a service of moderate sound quality can still be provided to the user from that part alone when the network is overloaded, when the decoder performs poorly, or when the user requests a low bit rate. At low bit rates, however, performance inevitably degrades in proportion to the reduction.

However, the BSAC technique transforms the audio signal with the modified discrete cosine transform (MDCT), which severely degrades the sound quality produced at lower layers. Because the frequency resolution of the MDCT is constant, the resolution is very high, in terms of the psychoacoustic model, even in frequency regions to which the human ear is insensitive. As a result, with the MDCT, sound quality deteriorates in going from the enhancement layers down to the lower layers.

Summary of the invention

The present invention provides a method and apparatus for encoding and/or decoding audio data that can control the bit rate of the audio data and can reproduce high-quality sound even when reconstruction uses only part of the bitstream.

The present invention also provides a method and apparatus for encoding and/or decoding audio data with which high-quality sound can be produced even from a lower layer of a bit-rate-controlled bitstream.

According to an aspect of the present invention, there is provided a method of encoding audio data. The method comprises: bandwidth extension encoding audio data, outputting bandwidth-limited audio data, and generating bandwidth extension information; arithmetic coding the bandwidth-limited audio data into a layered structure having a base layer and at least one enhancement layer so that the bit rate can be controlled; and multiplexing the arithmetic-coded bandwidth-limited audio data and the bandwidth extension information.

The arithmetic coding comprises: differentially coding side information corresponding to the base layer; bit-sliced coding a plurality of quantized samples corresponding to the base layer; and repeating the differential coding and the bit-sliced coding for the next enhancement layer until a predetermined number of layers have been coded.

The arithmetic coding comprises: differentially coding side information corresponding to the base layer, the side information comprising scale factor information and coding model information; bit-sliced coding a plurality of quantized samples corresponding to the base layer with reference to the coding model information; and repeating the differential coding and the bit-sliced coding for the next enhancement layer until a predetermined number of layers have been coded.

The quantized samples are preferably obtained by a pseudo-wavelet transform of the audio data.

The coded bandwidth-limited audio data and the bandwidth extension information are multiplexed in the following order: the part of the coded bandwidth-limited audio data corresponding to the base layer, the bandwidth extension information, and the parts of the coded bandwidth-limited audio data corresponding to the remaining enhancement layers.

Alternatively, the coded bandwidth-limited audio data and the bandwidth extension information are multiplexed in the following order: the bandwidth extension information, the part of the coded bandwidth-limited audio data corresponding to the base layer, and the parts of the coded bandwidth-limited audio data corresponding to the remaining enhancement layers.

According to another aspect of the present invention, there is provided a method of decoding audio data. The method comprises: demultiplexing an input audio bitstream to extract bandwidth extension information and bandwidth-limited audio data that has been coded into a layered structure comprising a base layer and at least one enhancement layer; arithmetic decoding at least the part of the bandwidth-limited audio data corresponding to the base layer; and, based on the decoded part of the bandwidth-limited audio data and with reference to the bandwidth extension information, generating audio data in at least part of the frequency band not covered by the decoded part, and then adding the generated audio data to the decoded part of the bandwidth-limited audio data.

The audio data in that part of the frequency band is generated so that it reaches the boundary of the decoded part of the bandwidth-limited audio data. The audio data in that part of the frequency band is generated so that it reaches a boundary of the filter bank used for the pseudo-wavelet transform. If the audio data does not reach a boundary of the filter bank used for the pseudo-wavelet transform, the portion in which the decoded part of the bandwidth-limited audio data and the generated audio data overlap is interpolated.

The input audio bitstream is demultiplexed in the following order: the data corresponding to the base layer is extracted from the input audio bitstream, then the bandwidth extension information, and then the data corresponding to the remaining enhancement layers.

Alternatively, the input audio bitstream is demultiplexed in the following order: the bandwidth extension information is extracted from the input audio bitstream, then the data corresponding to the base layer, and then the data corresponding to the remaining enhancement layers.

The arithmetic decoding comprises: differentially decoding side information corresponding to the base layer; bit-sliced decoding a plurality of quantized samples corresponding to the base layer; and repeating the differential decoding and the bit-sliced decoding for the next enhancement layer until a predetermined number of layers have been decoded.

The arithmetic decoding comprises: differentially decoding side information corresponding to the base layer, the side information comprising scale factor information and coding model information; bit-sliced decoding a plurality of quantized samples corresponding to the base layer with reference to the coding model information; and repeating the differential decoding and the bit-sliced decoding for the next enhancement layer until a predetermined number of layers have been decoded.

According to still another aspect of the present invention, there is provided an apparatus for encoding audio data. The apparatus comprises: a bandwidth extension encoder that bandwidth extension encodes audio data, outputs bandwidth-limited audio data, and generates bandwidth extension information; a fine grain scalable (FGS) encoder that arithmetic-codes the bandwidth-limited audio data into a layered structure comprising a base layer and at least one enhancement layer so that the bit rate can be controlled; and a multiplexer that multiplexes the arithmetic-coded bandwidth-limited audio data and the bandwidth extension information.

The FGS encoder differentially codes side information corresponding to the base layer, bit-sliced codes a plurality of quantized samples corresponding to the base layer, and codes the side information and the plurality of quantized samples corresponding to the next enhancement layer in the same way until a predetermined number of layers have been coded.

The FGS encoder differentially codes side information corresponding to the base layer, the side information comprising scale factor information and coding model information, and bit-sliced codes a plurality of quantized samples corresponding to the base layer with reference to the coding model information; it then codes the side information, comprising scale factor information and coding model information, corresponding to the next enhancement layer and bit-sliced codes the plurality of quantized samples corresponding to that enhancement layer, until a predetermined number of layers have been coded. The FGS encoder preferably obtains the quantized samples by pseudo-wavelet transforming the audio data.

The multiplexer multiplexes the coded bandwidth-limited audio data and the bandwidth extension information in the following order: the part of the coded bandwidth-limited audio data corresponding to the base layer, the bandwidth extension information, and the parts of the coded bandwidth-limited audio data corresponding to the remaining enhancement layers.

According to yet another aspect of the present invention, there is provided an apparatus for decoding audio data. The apparatus comprises: a demultiplexer that demultiplexes an input audio bitstream to extract bandwidth extension information and bandwidth-limited audio data that has been coded into a layered structure having a base layer and at least one enhancement layer; a fine grain scalable (FGS) arithmetic decoder that decodes at least the part of the bandwidth-limited audio data corresponding to the base layer; and a bandwidth extension decoder that, based on the decoded part of the bandwidth-limited audio data and with reference to the bandwidth extension information, generates audio data in at least part of the frequency band not covered by the decoded part, and then adds the generated audio data to the decoded part of the bandwidth-limited audio data.

The fine grain scalable decoder differentially decodes side information corresponding to the base layer and bit-sliced decodes a plurality of quantized samples corresponding to the base layer; it then decodes the side information corresponding to the next enhancement layer and bit-sliced decodes the plurality of quantized samples corresponding to that enhancement layer, until a predetermined number of layers have been decoded.

The demultiplexer demultiplexes the input audio bitstream in the following order: the data corresponding to the base layer is extracted from the input audio bitstream, then the bandwidth extension information, and then the data corresponding to the remaining enhancement layers. Alternatively, the demultiplexer demultiplexes the input audio bitstream in the following order: the bandwidth extension information is extracted from the input audio bitstream, then the data corresponding to the base layer, and then the data corresponding to the remaining enhancement layers.

Description of drawings

The features and other advantages of the present invention will become more apparent from the following detailed description of exemplary embodiments with reference to the accompanying drawings, in which:

Fig. 1 is a block diagram of an encoding apparatus according to the present invention;

Fig. 2 is a more detailed block diagram of the encoding apparatus shown in Fig. 1;

Fig. 3 is a block diagram of a decoding apparatus according to the present invention;

Fig. 4 is a more detailed block diagram of the decoding apparatus shown in Fig. 3;

Fig. 5 shows the structure of the bitstream output from a fine grain scalable (FGS) encoder 2;

Fig. 6 shows the detailed structure of the side information shown in Fig. 5;

Fig. 7 shows the structure of the bitstream output from a multiplexer 3 or input to a demultiplexer 7;

Fig. 8 is a schematic diagram for explaining the arithmetic encoding and decoding methods performed by the encoding and decoding apparatuses according to the present invention;

Fig. 9 is a schematic diagram for explaining in more detail the bandwidth extension decoding performed by a bandwidth extension (BWE) decoder 9;

Fig. 10 is a flowchart illustrating an encoding method according to the present invention; and

Fig. 11 is a flowchart illustrating a decoding method according to the present invention.

Embodiment

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

Fig. 1 is a block diagram of an encoding apparatus according to the present invention. Referring to Fig. 1, the encoding apparatus receives and encodes PCM audio data and outputs the result as an audio bitstream. The encoding apparatus comprises a bandwidth extension (BWE) encoder 1, a fine grain scalable (FGS) encoder 2, and a multiplexer 3.

The BWE encoder 1 BWE-encodes the PCM audio data, outputs bandwidth-limited audio data, and generates BWE information. BWE encoding refers to a technique that receives audio data, splits off the part of the audio data lying in a high frequency band, and generates the side information needed to restore the split-off part. Here, the remaining part of the audio data is called "bandwidth-limited audio data" and the side information is called "BWE information". One example of a BWE technique is spectral band replication (SBR), developed by Coding Technologies. Details of SBR are disclosed in Convention Paper 5560, presented at the 112th Audio Engineering Society Convention, May 10-13, 2002.

The FGS encoder 2 encodes the bandwidth-limited audio data into a layered structure having a base layer and at least one enhancement layer so that the bit rate can be controlled. FGS encoding refers to a technique that encodes data into a multilayer structure so that the bit rate can be controlled, that is, a technique that provides fine grain scalability. The BSAC technique disclosed in Korean Patent Application No. 97-61298 is one example of FGS encoding. Specifically, the FGS encoder 2 differentially codes the side information corresponding to the base layer and bit-sliced codes the quantized samples corresponding to the base layer, then differentially codes the side information and bit-sliced codes the quantized samples corresponding to the next enhancement layer, and repeats this until a predetermined number of layers have been coded. Here, the side information comprises scale factor information and coding model information, and the quantized samples are obtained by transforming and quantizing the input audio data. The side information and the quantized samples are described in detail below.
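The differential coding of side information mentioned above can be illustrated with a minimal sketch. It assumes a plain first-order difference over a list of scale factors; the function names and the sample values are hypothetical, and the actual BSAC syntax is not reproduced here.

```python
def diff_encode(values):
    """Return the first value followed by successive differences."""
    if not values:
        return []
    out = [values[0]]
    for prev, cur in zip(values, values[1:]):
        out.append(cur - prev)
    return out


def diff_decode(deltas):
    """Invert diff_encode by accumulating the differences."""
    out = []
    acc = 0
    for i, d in enumerate(deltas):
        acc = d if i == 0 else acc + d
        out.append(acc)
    return out


# Hypothetical scale factors for one layer (illustrative values only).
scale_factors = [60, 62, 61, 61, 58]
deltas = diff_encode(scale_factors)
restored = diff_decode(deltas)
```

Neighbouring scale factors tend to be close, so the differences are small numbers that an entropy coder can represent with fewer bits than the raw values.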

The multiplexer 3 multiplexes the bandwidth-limited PCM audio data as encoded by the FGS encoder 2 and the BWE information generated by the BWE encoder 1.

Fig. 2 is a more detailed block diagram of the encoding apparatus shown in Fig. 1. Referring to Fig. 2, the encoding apparatus comprises the BWE encoder 1, the FGS encoder 2, and the multiplexer 3. Blocks that perform the same functions as in Fig. 1 bear the same reference numerals and are not described again.

Specifically, the FGS encoder 2 comprises a pseudo-wavelet transform (PWT) unit 21, a psychoacoustic unit 22, a quantization unit 23, and an FGS arithmetic coding unit 24.

The PWT unit 21 receives PCM audio data in the time domain and, with reference to the psychoacoustic model information provided by the psychoacoustic unit 22, pseudo-wavelet transforms the PCM audio data into an audio signal in the frequency domain. The characteristics of the audio signals that people can perceive (hereinafter, perceptual audio signals) do not differ greatly in the time domain. In the frequency domain, by contrast, the characteristics of perceptual and non-perceptual audio signals differ considerably in terms of the psychoacoustic model. Compression efficiency can therefore be improved by allocating a different number of bits to each frequency band. With the MDCT, because the frequency resolution is high in the low frequency band, even a slight frequency distortion can produce perceptible noise. The PWT, which has a more moderate time/frequency resolution than the MDCT, can provide stable sound quality even from a lower layer covering a lower frequency band.

The psychoacoustic unit 22 provides the PWT unit 21 with information on the psychoacoustic model, such as attack detection information, groups the audio signal transformed by the PWT unit 21 into sub-band audio signals, calculates a masking threshold for each sub-band using the masking effect produced by interaction between the sub-band signals, and supplies the masking thresholds to the quantization unit 23. The masking threshold is the maximum power of an audio signal that a person cannot perceive because of the interaction between audio signals. In the present embodiment, the psychoacoustic unit 22 uses binaural masking level depression (BMLD) to calculate the masking thresholds and similar values for the stereo components.

The quantization unit 23 scalar-quantizes each sub-band audio signal on the basis of the corresponding scale factor information so that the quantization noise energy in each sub-band falls below the masking threshold supplied by the psychoacoustic unit 22, and then outputs the quantized samples; in this way a listener hears the sub-band audio signal but cannot perceive the noise in it. That is, the quantization unit 23 quantizes the sub-band audio signals so that the noise-to-mask ratio (NMR), the ratio of the noise produced in each sub-band to the masking threshold calculated by the psychoacoustic unit 22, is 0 dB or less over the full bandwidth. An NMR of 0 dB or less means that the quantization noise is inaudible.
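The 0 dB NMR criterion can be made concrete with a small sketch. It assumes a uniform scalar quantizer and a hypothetical list of candidate step sizes; how the quantization unit 23 actually derives its step from the scale factor is not specified here, so the names and values below are illustrative only.

```python
import math


def quantize(samples, step):
    """Uniform scalar quantization: map each sample to an integer index."""
    return [round(x / step) for x in samples]


def dequantize(indices, step):
    """Reconstruct sample values from quantizer indices."""
    return [q * step for q in indices]


def nmr_db(samples, step, masking_threshold):
    """Noise-to-mask ratio in dB for one sub-band at a given step size."""
    recon = dequantize(quantize(samples, step), step)
    noise = sum((x - y) ** 2 for x, y in zip(samples, recon)) / len(samples)
    if noise == 0:
        return float("-inf")
    return 10.0 * math.log10(noise / masking_threshold)


def largest_inaudible_step(samples, masking_threshold, candidate_steps):
    """Coarsest candidate step whose noise stays at or below the mask."""
    ok = [s for s in candidate_steps
          if nmr_db(samples, s, masking_threshold) <= 0.0]
    return max(ok) if ok else None


# Hypothetical sub-band samples and masking threshold (mean-square units).
band = [0.9, -2.1, 3.4, -0.2]
step = largest_inaudible_step(band, 0.05, [0.125, 0.25, 0.5, 1.0, 2.0])
```

Choosing the coarsest step that still satisfies NMR <= 0 dB spends the fewest bits on a sub-band while keeping its quantization noise under the masking threshold.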

The FGS arithmetic coding unit 24 encodes the quantized samples and the side information belonging to each layer into the layered structure. The side information comprises the scale band information, coding band information, scale factor information, and coding model information corresponding to each layer. The scale band information and the coding band information may be packed as header information of each frame of the audio bitstream and then transmitted to the decoding apparatus. Alternatively, they may be coded and packed as side information corresponding to each layer and then transmitted to the decoding apparatus. As a further alternative, when the scale band information and the coding band information are stored in the decoding apparatus in advance, they need not be transmitted to it.

In more detail, the FGS arithmetic coding unit 24 differentially codes the side information corresponding to the first layer, which comprises scale factor information and coding model information, and bit-sliced codes the quantized samples corresponding to the first layer with reference to the coding model information. Bit-sliced coding is the coding used in the BSAC technique described above: the bits are losslessly coded in order from the most significant bit, through the next most significant bits, down to the least significant bit. The second layer is processed in the same way as the first. That is, a predetermined number of layers are coded sequentially, layer by layer. The first layer is called the base layer and the remaining layers are called enhancement layers. The layered structure is described in more detail later.
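The most-significant-bit-first ordering can be sketched as follows. This is only an illustration of the slicing step under simplifying assumptions: magnitudes are non-negative integers, sign handling is omitted, and the arithmetic coder that would compress each bit plane is left out. The function names are hypothetical.

```python
def bit_planes(magnitudes, num_bits):
    """Slice non-negative quantized magnitudes into bit planes, MSB first."""
    planes = []
    for bit in range(num_bits - 1, -1, -1):
        planes.append([(m >> bit) & 1 for m in magnitudes])
    return planes


def from_bit_planes(planes, num_bits):
    """Rebuild magnitudes from however many leading planes were received."""
    values = [0] * len(planes[0]) if planes else []
    for i, plane in enumerate(planes):
        shift = num_bits - 1 - i
        values = [v | (b << shift) for v, b in zip(values, plane)]
    return values


quantized = [5, 3, 0, 7]             # illustrative 3-bit magnitudes
planes = bit_planes(quantized, 3)    # MSB plane first, LSB plane last
coarse = from_bit_planes(planes[:2], 3)  # decode with the LSB plane missing
exact = from_bit_planes(planes, 3)
```

Because the most significant planes come first, a decoder that stops early still recovers a coarse approximation of every sample, which is what allows the bit rate to be reduced by truncating the stream.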

The scale band information is needed to perform quantization properly according to the frequency characteristics of the audio signal: when the frequency domain is divided into a plurality of bands and each band is assigned an appropriate scale factor, the scale band information indicates the scale band that corresponds to each layer. Each layer therefore belongs to at least one scale band, and each scale band is assigned one scale factor. Similarly, the coding band information is needed to perform coding properly according to the frequency characteristics of the audio signal: when the frequency domain is divided into a plurality of bands and each band is assigned an appropriate coding model, the coding band information indicates the coding band that corresponds to each layer. The scale bands and coding bands are divided appropriately by experiment, and the corresponding scale factors and coding models are then determined.

The multiplexer 3 multiplexes the coded bandwidth-limited audio data and the BWE information in the following order: the coded quantized samples corresponding to the base layer, the BWE information, and the coded quantized samples corresponding to the remaining enhancement layers. Alternatively, the multiplexer 3 multiplexes them in the following order: the BWE information, the coded quantized samples corresponding to the base layer, and the coded quantized samples corresponding to the remaining enhancement layers.
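The two orderings can be sketched as a list of labelled segments. The segment labels, the `bwe_first` flag, and the byte strings are hypothetical; the actual bitstream syntax is not specified here.

```python
def multiplex(layers, bwe_info, bwe_first=False):
    """Order the segments of one frame.

    layers[0] is the coded base layer; the rest are enhancement layers.
    Returns a list of (label, payload) tuples in transmission order.
    """
    base, enhancements = layers[0], layers[1:]
    if bwe_first:
        segments = [("bwe", bwe_info), ("layer0", base)]
    else:
        segments = [("layer0", base), ("bwe", bwe_info)]
    segments += [(f"layer{i}", data)
                 for i, data in enumerate(enhancements, start=1)]
    return segments


# Illustrative payloads only.
coded_layers = [b"base", b"enh1", b"enh2"]
order_a = [label for label, _ in multiplex(coded_layers, b"bwe")]
order_b = [label for label, _ in multiplex(coded_layers, b"bwe", bwe_first=True)]
```

In both orderings the BWE information precedes every enhancement layer, so a stream cut off after the base layer still carries what the BWE decoder needs.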

Fig. 3 is a block diagram of a decoding apparatus according to the present invention. Referring to Fig. 3, the decoding apparatus receives and decodes an audio bitstream and then outputs audio data. The decoding apparatus comprises a demultiplexer 7, an FGS decoder 8, and a BWE decoder 9.

The demultiplexer 7 demultiplexes an input audio bitstream to extract the BWE information and the bandwidth-limited audio data, which has been coded into a layered structure having a base layer and at least one enhancement layer. Here, the bandwidth-limited audio data and the BWE information are the same as those described with reference to Fig. 1. The FGS decoder 8 arithmetic-decodes at least the part of the bandwidth-limited audio data corresponding to the base layer. The layer up to which decoding is performed depends on the network state, the user's selection, and the like.

Based on the part of the bandwidth-limited audio data arithmetic-decoded by the FGS decoder 8 and with reference to the BWE information extracted by the demultiplexer 7, the BWE decoder 9 generates audio data in at least part of the frequency band not covered by the decoded bandwidth-limited audio data, and then adds the generated audio data to the bandwidth-limited audio data decoded by the FGS decoder 8.

Because the present invention uses the PWT, the BWE decoder 9 proceeds as follows. When decoding is performed with the PWT, the split frequency is chosen by determining the last point in the frequency domain while the bandwidth-limited audio data is being determined. Because the frequency resolution of the PWT is low in the high-frequency region, the PWT cannot limit the bandwidth precisely at the determined last point as the MDCT can. During decoding, the BWE decoder 9 places the core part produced by the FGS decoder 8 in the frequency domain, identifies the frequency bandwidth of the core part, and modifies and decodes the BWE part to fit the appropriate frequency bandwidth.

For example, suppose that only 8 layers are reconstructed from a 16-layer bitstream encoded at a bit rate of 64 kbps, and that the frequency corresponding to the 8th layer is 8.5 kHz. In this case, the BWE decoder 9 has to reconstruct data in the range from 8.5 kHz to 15 kHz, or over a wider frequency range. Because of the characteristics of the quadrature mirror filter (QMF), the BWE decoder 9 adjusts the frequency bandwidth in units of the QMF channel bandwidth. If the n-th QMF band boundary lies at 8.3 kHz, the frequency components between 8.3 kHz and 8.5 kHz fall within both the core part and the BWE part. The core part or the BWE part must therefore be handled appropriately.
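The overlap in this example can be located with a small sketch. It assumes the QMF band edges are available as a list of frequencies in hertz; the edge values other than 8.3 kHz, and the function name, are hypothetical.

```python
def overlap_with_qmf_grid(core_top_hz, qmf_edges_hz):
    """Return (edge, core_top), where edge is the highest QMF band edge
    that does not exceed the top frequency of the decoded core part.

    The interval between the two values is covered by both the core part
    and the BWE part; it is empty when edge == core_top.
    """
    candidates = [e for e in qmf_edges_hz if e <= core_top_hz]
    if not candidates:
        raise ValueError("core bandwidth lies below the first QMF band edge")
    return max(candidates), core_top_hz


# Core decoded up to 8.5 kHz; QMF edges around it are illustrative.
low, high = overlap_with_qmf_grid(8500, [7900, 8300, 8700])
```

With these inputs the overlapping interval is 8.3 kHz to 8.5 kHz, matching the example in the text.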

A first method of handling the core part and the BWE part is to delete the frequency components in the 8.3 kHz to 8.5 kHz range from the core part. In this method, the FGS decoder 8 performs decoding taking into account the bandwidth information of the BWE part. A second method is to filter the data of the core part with the QMF used in the BWE decoder 9, generate QMF data by interpolation, and inverse-QMF-filter the QMF data to reconstruct the data of the core part.

As described above, even when the audio data decoded by the FGS decoder 8 contains only baseband audio data, the BWE decoder 9 generates the audio data of the missing frequency band and adds it to the baseband audio data. The quality of the decoded audio data can therefore be improved.

Fig. 4 is a more detailed block diagram of the decoding apparatus shown in Fig. 3. Referring to Fig. 4, the decoding apparatus comprises the demultiplexer 7, the FGS decoder 8, and the BWE decoder 9. Blocks that perform the same functions as in Fig. 3 bear the same reference numerals and are not described again.

Specifically, to control the bit rate, the FGS decoder 8 performs decoding up to a target layer, which is determined by the network state, the performance of the decoding apparatus, the user's selection, and the like. The FGS decoder 8 comprises an FGS arithmetic decoding unit 81, an inverse quantization unit 82, and an inverse PWT unit 83. The FGS arithmetic decoding unit 81 decodes the audio bitstream up to the target layer. In more detail, the FGS arithmetic decoding unit 81 decodes the side information of each layer to obtain scale factor information and coding model information and, on the basis of the coding model information, arithmetic-decodes the coded quantized samples of each layer to obtain the quantized samples. The process of obtaining the quantized samples is explained in detail below.

The scale band information and the coding band information can be obtained from the header information of the audio bitstream or by decoding the side information of each layer. Alternatively, the decoding apparatus may store the scale band information and the coding band information in advance.

The inverse quantization unit 82 inverse-quantizes and reconstructs the quantized samples of each layer on the basis of the scale factor information of that layer. The inverse PWT unit 83 performs frequency/time mapping on the reconstructed samples, transforming them by the inverse pseudo-wavelet transform into time-domain PCM audio data, and outputs the time-domain PCM audio data.

The BWE decoder 9 comprises a transform unit 91, a high-frequency generation unit 92, an adjustment unit 93, and a synthesis unit 94. The transform unit 91 transforms the time-domain PCM audio data output from the inverse PWT unit 83 into frequency-domain data. This frequency-domain data is called the low-frequency part. The high-frequency generation unit 92 creates the part not covered by the frequency-domain data: with reference to the BWE information, it copies the low-frequency part and patches the copy onto the frequency-domain data, thereby obtaining a high-frequency part from the original low-frequency part. The adjustment unit 93 adjusts the level of the high-frequency part generated by the high-frequency generation unit 92 using the envelope information contained in the BWE information. The envelope information is transmitted from the encoder side and represents the envelope of the audio data of the high-frequency part that was split off at the encoder side during BWE encoding. The synthesis unit 94 combines the low-frequency part output from the transform unit 91 with the high-frequency part output from the adjustment unit 93 and outputs PCM audio data.
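The copy-then-adjust behaviour of units 92 to 94 can be sketched on a list of frequency-domain coefficients. The sketch assumes that the envelope information is a list of target energies, one per equal-width high-frequency sub-band; that representation, the function names, and the numbers are assumptions made for illustration, not the format carried in the BWE information.

```python
import math


def generate_high_band(low_band, num_high):
    """Copy low-band coefficients upward, repeating them if needed."""
    return [low_band[i % len(low_band)] for i in range(num_high)]


def adjust_envelope(high_band, target_energies, band_size):
    """Scale each sub-band so its energy matches the transmitted target."""
    out = []
    for b, target in enumerate(target_energies):
        chunk = high_band[b * band_size:(b + 1) * band_size]
        energy = sum(x * x for x in chunk)
        gain = math.sqrt(target / energy) if energy > 0 else 0.0
        out.extend(x * gain for x in chunk)
    return out


def bwe_decode(low_band, target_energies, band_size):
    """Append an envelope-adjusted copy of the low band as the high band."""
    high = generate_high_band(low_band, band_size * len(target_energies))
    return low_band + adjust_envelope(high, target_energies, band_size)


# Illustrative decoded low band and hypothetical envelope energies.
low = [1.0, -2.0, 0.5, 0.25]
full = bwe_decode(low, target_energies=[2.0, 0.5], band_size=2)
```

The low band passes through unchanged, while each generated high-frequency sub-band keeps the fine structure of the copied coefficients but takes its level from the envelope.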

As described above, while the FGS decoder 8 decodes the base-band audio data, the BWE decoder 9 reconstructs the audio data of the missing band and adds it to the base-band audio data. The quality of the base-band audio data is thereby improved.

Fig. 5 shows the structure of the bitstream output from the FGS encoder 2. Referring to Fig. 5, the FGS encoder 2 encodes a frame of the bitstream by mapping the quantized samples and the side information onto a fine grain scalable (FGS) layered structure. That is, the frame has a layered structure in which the bitstream of a lower layer is contained in the bitstream of an enhancement layer. The side information needed for each layer is encoded layer by layer.

A header area storing header information is located at the beginning of the bitstream, followed by the packed information of the 0th layer and then, in order, the packed information of the first through Nth layers, which are the enhancement layers. The base layer extends from the header area through the 0th-layer information, the first layer extends from the header area through the first-layer information, and the second layer extends from the header area through the second-layer information. Likewise, the highest enhancement layer extends from the header area through the Nth-layer information, that is, from the base layer through the Nth layer. Both side information and coded data are stored as the information of each layer. For example, side information 2 and the coded quantized samples are stored as the second-layer information. Here, N is a natural number greater than or equal to 1.
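
Because every layer extends from the header area through its own layer information, a lower-rate stream can be obtained by cutting the frame after the information of the target layer. The following sketch illustrates this nesting under the assumption that the end offset of each layer is known to the party performing the cut; the byte-level layout and the names `build_frame` and `truncate_to_layer` are hypothetical.

```python
def build_frame(header, layers):
    """Pack a frame as header + layer 0 info + layer 1 info + ...

    layers : list of (side_info, coded_samples) byte strings, index 0 = base layer
    Returns the frame and the end offset of each layer within it.
    """
    frame = bytearray(header)
    end_offsets = []
    for side_info, coded_samples in layers:
        # Each layer stores its side information followed by its coded samples.
        frame += side_info + coded_samples
        end_offsets.append(len(frame))
    return bytes(frame), end_offsets


def truncate_to_layer(frame, end_offsets, target_layer):
    """Keep the header and all layer information up to and including target_layer."""
    return frame[:end_offsets[target_layer]]
```

Truncating to layer 0 yields the base layer alone, and truncating to layer N returns the whole frame.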

Fig. 6 shows the detailed structure of the side information shown in Fig. 5. Referring to Fig. 6, side information and coded quantized samples are stored as the information of any given layer. In the present embodiment, because the quantized samples are arithmetic-coded, the side information comprises arithmetic coding model information, scale factor information, channel side information, and other side information. The arithmetic coding model information is index information identifying the arithmetic coding model used to encode or decode the quantized samples contained in the corresponding layer. The scale factor information indicates the quantization step size used to quantize or inverse-quantize the audio data contained in the corresponding layer. The channel side information is channel-related information, such as information on mid/side (M/S) stereo. The other side information is flag information indicating whether M/S stereo is used.

In the present embodiment, the FGS encoder 2 of the encoding device differentially codes the side information comprising the arithmetic coding model information and the scale factor information. Because each scale factor band has one scale factor, the scale factors are coded by first arithmetic-coding the minimum of the scale factors of the scale factor bands and then arithmetic-coding the difference between the minimum scale factor and each of the other scale factors. The arithmetic coding model information, which corresponds to the bit range allowed for each coding band, is coded in the same way as the quantization step sizes, that is, differentially.

In the present embodiment, the FGS decoder 8 of the decoding device arithmetic-decodes the side information comprising the arithmetic coding model information and the scale factor information. Because each scale factor band has one scale factor, the scale factors are decoded by first arithmetic-decoding the minimum of the scale factors of the scale factor bands and then arithmetic-decoding the difference between the minimum scale factor and each of the other scale factors. The arithmetic coding model information corresponding to the bit range allowed for each coding band is arithmetic-decoded in the same manner as the scale factors.
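
The differential representation of the scale factors on the encoding and decoding sides may be sketched as follows. The sketch shows only the mapping between scale factors and the pair (minimum, differences); the arithmetic coding of those values is omitted, and the function names `diff_encode` and `diff_decode` are hypothetical.

```python
def diff_encode(scale_factors):
    """Represent scale factors as their minimum plus non-negative differences.

    In the embodiment the minimum and the differences would each be
    arithmetic-coded; that entropy-coding stage is NOT shown here.
    """
    minimum = min(scale_factors)
    differences = [sf - minimum for sf in scale_factors]
    return minimum, differences


def diff_decode(minimum, differences):
    """Recover the scale factors from the minimum and the differences."""
    return [minimum + d for d in differences]
```

Because every difference is taken against the minimum, all differences are non-negative and typically small, which suits entropy coding.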

Fig. 7 shows the structure of the bitstream output from the multiplexer 3 or input to the demultiplexer 7. Referring to Fig. 7, the 0th layer, that is, the base layer coded by the FGS encoder 2, is located at the beginning of the bitstream; the BWE information follows the 0th layer; and the enhancement layers, that is, the first layer, the second layer, ..., and the Nth layer, follow the BWE information. Even when the decoding end receives or decodes only the base layer, it can create the audio data of the missing layers based on the decoded audio data of the base layer and with reference to the BWE information.
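
The ordering of Fig. 7 may be illustrated with the following sketch, in which the base layer and the BWE information survive even when the enhancement layers are cut off. The 16-bit length prefix used to delimit the parts is an assumption made purely so that the sketch is self-contained; the disclosure does not specify how the parts are delimited. The names `multiplex` and `demultiplex` are hypothetical.

```python
import struct


def _pack(chunk):
    # ASSUMED framing: 2-byte big-endian length followed by the payload.
    return struct.pack(">H", len(chunk)) + chunk


def multiplex(base_layer, bwe_info, enhancement_layers):
    """Emit base layer, then BWE information, then enhancement layers in order."""
    stream = _pack(base_layer) + _pack(bwe_info)
    for layer in enhancement_layers:
        stream += _pack(layer)
    return stream


def demultiplex(stream):
    """Extract base layer, BWE information, and every complete enhancement layer."""
    chunks, pos = [], 0
    while pos + 2 <= len(stream):
        (length,) = struct.unpack_from(">H", stream, pos)
        pos += 2
        if pos + length > len(stream):
            break  # a truncated trailing layer is discarded
        chunks.append(stream[pos:pos + length])
        pos += length
    if len(chunks) < 2:
        raise ValueError("stream must contain at least base layer and BWE information")
    return chunks[0], chunks[1], chunks[2:]
```

Placing the BWE information directly after the base layer means that a stream cut anywhere in the enhancement layers still allows bandwidth extension from the base layer.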

Fig. 8 is a diagram for explaining the arithmetic coding and decoding methods performed by the encoding and decoding devices according to the present invention. Referring to Fig. 8, the dotted rectangular boxes represent the spectral lines formed by the quantized samples, A denotes the lines forming the boundaries between layers, and B denotes the terminal nodes of the PWT tree structure that correspond to the boundary lines dividing the spectral lines.

The PWT and/or inverse PWT used in the encoding and/or decoding methods according to the present invention perform the frequency transform and/or inverse frequency transform using a tree structure, so that the frequency representation more closely approximates the filter bank of the human ear. The terminal nodes of the tree structure each correspond to a scale factor band for arithmetic coding. Each terminal node therefore has a corresponding scale factor.

The coding band, which is the unit in which the arithmetic coding model information needed for arithmetic coding is transmitted, may be determined in consideration of coding efficiency. For example, suppose that each terminal node has the same scale factor band and coding band. As shown in Fig. 8, the layers are mapped onto the terminal nodes. Because the data of a terminal node lies in the time domain of a single frequency band, the data of a terminal node is not split when the layers are divided.

The 0th layer is fixed so that coding is performed on band a, the first layer on band b, the second layer on band c, the third layer on band d, the fourth layer on band e, the fifth layer on band f, the sixth layer on band g, and the seventh layer on band h.

First, the quantized samples of the 0th layer are arithmetic-coded within the allowed bit range using the corresponding coding model. The side information of the 0th layer is arithmetic-coded. While the quantized samples of the 0th layer are being bit-sliced coded, the number of bits used is counted. If the number of bits exceeds the allowed bit range, coding of the 0th layer stops and arithmetic coding of the first layer begins. When the bit ranges allowed for the first and second layers have bits to spare, the quantized samples of the 0th layer that were left uncoded are coded.

The quantized samples of the first layer are coded using the coding model corresponding to the first layer. The side information of the first layer is arithmetic-coded. If, after all the quantized samples of the first layer have been coded, the bit range allowed for the first layer has bits to spare, the uncoded quantized samples of the 0th layer are coded until the allowed bit range is reached. When the allowed bit range is reached, coding of the first layer stops and coding of the second layer begins. This process is carried out through the seventh layer, completing the coding of the seventh layer.
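
The bit-budget rule described above may be sketched as follows. In the embodiment the bit count comes from the arithmetic coder as it runs; the sketch instead assumes that the cost in bits of each sample is already known, which is a simplification. The names `take_within`, `schedule_layers`, `costs`, and `budgets` are hypothetical.

```python
def take_within(queue, costs, bits_left):
    """Code items from the front of queue while they fit in bits_left.

    Returns (coded items, remaining items, bits used).
    """
    used = 0
    for n, (layer, index) in enumerate(queue):
        if used + costs[layer][index] > bits_left:
            return queue[:n], queue[n:], used
        used += costs[layer][index]
    return list(queue), [], used


def schedule_layers(costs, budgets):
    """Decide which samples are coded in each layer's allowed bit range.

    costs[k][i] : ASSUMED known bit cost of sample i of layer k
    budgets[k]  : bits allowed for layer k
    Returns (per-layer lists of (layer, index) coded there, samples never coded).
    """
    pending, schedule = [], []
    for k, budget in enumerate(budgets):
        own = [(k, i) for i in range(len(costs[k]))]
        # A layer's own samples are coded first; coding stops at the budget.
        coded, own_rest, used = take_within(own, costs, budget)
        if not own_rest:
            # Spare bits are spent on samples left uncoded by earlier layers.
            extra, pending, _ = take_within(pending, costs, budget - used)
            coded += extra
        pending = pending + own_rest
        schedule.append(coded)
    return schedule, pending
```

With costs `[[3, 3, 3], [2, 2], [1]]` and budgets `[6, 7, 5]`, the third sample of the 0th layer does not fit in the 0th layer's budget and is coded with the spare bits of the first layer.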

If all the quantized samples of each layer were coded without regard to the allowed bit range, that is, if all the quantized samples of a layer were coded even when the number of coded bits exceeded the allowed bit range, part of the bit range allowed for the next layer would be consumed. As a result, some quantized samples belonging to the next layer could not be coded. Consequently, when bit-rate scalability is exercised, that is, when only the lower layers rather than all layers are decoded, the quantized samples at certain frequencies are not decoded. The decoded quantized samples then fluctuate from band to band, causing the birdy effect and degrading sound quality.

Because the decoding process is the reverse of the encoding process and counts the number of bits against the allowed bit range as it proceeds, the point at which decoding of a given layer begins can be detected.

The spectral lines are coded from the most significant bit (msb) toward the least significant bit (lsb). Here, within a terminal node of the tree structure used for the wavelet transform, the data bits on the same bit plane must be coded together. For example, suppose a terminal node has the following quantized samples:

00000000101010110101

11111100000000000000

00001100110000000110

With MDCT-based coding, the quantized samples are grouped into five 4 × 4 bit planes and coded from left to right and from top to bottom. With PWT-based coding, however, all the quantized samples are treated as a single bit plane and are coded N bits at a time, from the most significant bit to the least significant bit and from lower frequencies to higher frequencies. The most significant bits "00000000101010110101" are coded from left to right N bits at a time, the next bits "11111100000000000000" are coded from left to right N bits at a time, and the least significant bits "00001100110000000110" are coded N bits at a time. Here, N is an integer greater than or equal to 1. In particular, if N is 1, binary coding is performed. Because arithmetic coding can allocate a fractional number of bits, for example 0.001 bit, to the coding of one bit, a large amount of information can be coded with a small number of bits. That is, coding efficiency is quite high. Huffman coding, another lossless coding method, requires at least one bit per symbol and therefore has poorer coding efficiency than arithmetic coding.
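
The PWT-based scanning order may be sketched as follows. The sketch only forms the N-bit symbols in the order in which they would be presented to the arithmetic coder, from the most significant plane to the least significant plane and from lower to higher frequency; the arithmetic coder itself and the handling of sample signs are omitted. The function name `bitplane_symbols` and its parameters are hypothetical.

```python
def bitplane_symbols(samples, num_planes, n):
    """List N-bit symbols plane by plane, most significant plane first.

    samples    : non-negative quantized sample magnitudes, low to high frequency
    num_planes : number of bit planes (bits per sample)
    n          : number of bits grouped into one symbol (N in the text)
    """
    symbols = []
    for plane in range(num_planes - 1, -1, -1):
        # Collect one bit from every sample on this plane.
        bits = [(s >> plane) & 1 for s in samples]
        # Group the plane's bits from left to right, N at a time.
        for start in range(0, len(bits), n):
            symbols.append(tuple(bits[start:start + n]))
    return symbols
```

For the twenty samples of the example above, three planes and N = 4 give fifteen symbols, the first five of which together spell out the most significant plane.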

Fig. 9 is a diagram for explaining the BWE decoding performed by the BWE decoder 9. Referring to Fig. 9, the striped portions represent the data decoded by the FGS decoder 8, and the dotted portions represent the data created by the BWE decoder 9. Assuming that all the data within 1/4 of the sampling frequency Fs belongs to the base layer, Fig. 9(a) shows the case in which the decoding end decodes only the base-band data, and Figs. 9(b), (c), and (d) show cases in which the FGS decoder 8 decodes the data of the base band and at least one enhancement layer. That is, the FGS decoder 8 can decode data so as to control the bit rate, and the BWE decoder 9 can create the missing-band data that the FGS decoder 8 did not decode.

Encoding and decoding methods according to preferred embodiments of the present invention will now be described on the basis of the structure described above.

Fig. 10 is a flowchart illustrating an encoding method according to the present invention. Referring to Fig. 10, in step 1001 an encoding device BWE-encodes audio data, outputs bandwidth-limited audio data, and generates BWE information corresponding to the base layer. The BWE information of the base layer is needed for the decoding end to create the audio data of the missing band from the audio data belonging to the base layer, and it includes envelope information. The encoding device encodes the bandwidth-limited audio data into a layered structure having a base layer and at least one enhancement layer so as to control the bit rate. In more detail, the encoding device applies the pseudo wavelet transform to the bandwidth-limited audio data layer by layer in step 1002, quantizes the bandwidth-limited audio data in step 1003, and, in step 1004, arithmetic-codes the bandwidth-limited audio data and packs it into the layered structure so as to control the bit rate. In step 1005, the encoding device multiplexes the bandwidth-limited audio data and the BWE information and then outputs an audio bitstream. In more detail, the encoding device multiplexes the coded bandwidth-limited audio data and the BWE information in the following order: the part of the coded bandwidth-limited audio data corresponding to the base layer, then the BWE information, then the part of the coded bandwidth-limited data corresponding to the remaining enhancement layers. Alternatively, the order is: the BWE information, then the part of the coded bandwidth-limited audio data corresponding to the base layer, then the part of the coded bandwidth-limited data corresponding to the remaining enhancement layers.

Fig. 11 is a flowchart illustrating a decoding method according to the present invention. Referring to Fig. 11, in step 1101 the decoding device demultiplexes an input audio bitstream and extracts the BWE information and the bandwidth-limited audio data, which has been encoded into a layered structure having a base layer and at least one enhancement layer. That is, the decoding device demultiplexes the input audio bitstream in one of the following orders: it extracts from the input audio bitstream the data corresponding to the base layer, the BWE information, and the data corresponding to the remaining enhancement layers; or it extracts the BWE information, the data corresponding to the base layer, and the data corresponding to the remaining enhancement layers. The decoding device then decodes at least the part of the bandwidth-limited audio data corresponding to the base layer so as to control the bit rate. In more detail, the decoding device performs arithmetic decoding up to the target layer in step 1102, performs inverse quantization in step 1103, and performs the inverse pseudo wavelet transform in step 1104 to obtain PCM audio data. In step 1105, based on the PCM audio data obtained in step 1104 and with reference to the BWE information, the decoding device creates PCM audio data in at least part of the band not covered by the PCM audio data obtained in step 1104, and then adds the created PCM audio data to the PCM audio data obtained in step 1104.

As described above, the present invention provides a bit-rate scalable encoding and decoding method and apparatus with which high-quality sound can be obtained by recovering only part of the bitstream.

With arithmetic coding, high FGS can be provided using a small amount of data, and with the PWT, the frequency resolution can match the transfer function of the human ear. PWT-based coding therefore offers better time-domain/frequency-domain resolution than MDCT-based coding. As a result, high-quality sound can be produced even from the lower layers.

Although the present invention has been described in detail with reference to exemplary embodiments, it will be apparent to those of ordinary skill in the art that various changes in form and detail may be made without departing from the spirit and scope of the invention as defined by the claims.

Claims

1. A method of encoding audio data, the method comprising:

performing bandwidth extension encoding on audio data, outputting bandwidth-limited audio data, and generating bandwidth extension information;

arithmetic-coding the bandwidth-limited audio data into a layered structure having a base layer and at least one enhancement layer so as to control a bit rate; and

multiplexing the arithmetic-coded bandwidth-limited audio data and the bandwidth extension information.

2. The method according to claim 1, wherein the arithmetic coding comprises:

differentially coding side information corresponding to the base layer;

bit-sliced coding a plurality of quantized samples corresponding to the base layer; and

repeating the differential coding and the bit-sliced coding for the next enhancement layer until coding of a predetermined plurality of layers is completed.

3. The method according to claim 1, wherein the arithmetic coding comprises:

differentially coding side information, including scale factor information and coding model information, corresponding to the base layer;

bit-sliced coding a plurality of quantized samples corresponding to the base layer with reference to the coding model information; and

repeating the differential coding and the bit-sliced coding for the next enhancement layer until coding of a predetermined plurality of layers is completed.

4. The method according to claim 2 or 3, wherein the quantized samples are obtained by applying a pseudo wavelet transform to the audio data.

5. The method according to claim 1, wherein the coded bandwidth-limited audio data and the bandwidth extension information are multiplexed in the following order: the part of the coded bandwidth-limited audio data corresponding to the base layer, the bandwidth extension information, and the part of the coded bandwidth-limited data corresponding to the remaining enhancement layers.

6. The method according to claim 1, wherein the coded bandwidth-limited audio data and the bandwidth extension information are multiplexed in the following order: the bandwidth extension information, the part of the coded bandwidth-limited audio data corresponding to the base layer, and the part of the coded bandwidth-limited data corresponding to the remaining enhancement layers.

7. A method of decoding audio data, the method comprising:

demultiplexing an input audio bitstream and extracting bandwidth-limited audio data and bandwidth extension information, the bandwidth-limited audio data having been encoded into a layered structure comprising a base layer and at least one enhancement layer;

arithmetic-decoding at least the part of the bandwidth-limited audio data corresponding to the base layer; and

generating, based on the decoded part of the bandwidth-limited audio data and with reference to the bandwidth extension information, audio data in at least part of the band not covered by the decoded part of the bandwidth-limited audio data, and then adding the generated audio data to the decoded part of the bandwidth-limited audio data.

8. The method according to claim 7, wherein the audio data in the part of the band is generated so as to reach the boundary of the coded part of the bandwidth-limited audio data.

9. The method according to claim 8, wherein the audio data in the part of the band is generated so as to reach a boundary of the filter bank used for the pseudo wavelet transform.

10. The method according to claim 8, wherein, if the audio data does not reach a boundary of the filter bank used for the pseudo wavelet transform, the overlapping portion of the decoded part of the bandwidth-limited audio signal and the generated audio data is inserted.

11. The method according to claim 7, wherein the input audio bitstream is demultiplexed in the following order: the data corresponding to the base layer is extracted from the input audio bitstream, the bandwidth extension information is extracted from the input audio bitstream, and the data corresponding to the remaining enhancement layers is extracted from the input audio bitstream.

12. The method according to claim 7, wherein the input audio bitstream is demultiplexed in the following order: the bandwidth extension information is extracted from the input audio bitstream, the data corresponding to the base layer is extracted from the input audio bitstream, and the data corresponding to the remaining enhancement layers is extracted from the input audio bitstream.

13. The method according to claim 7, wherein the arithmetic decoding comprises:

differentially decoding side information corresponding to the base layer;

bit-sliced decoding a plurality of quantized samples corresponding to the base layer; and

repeating the differential decoding and the bit-sliced decoding for the next enhancement layer until decoding of a predetermined plurality of layers is completed.

14. The method according to claim 7, wherein the arithmetic decoding comprises:

differentially decoding side information, including scale factor information and coding model information, corresponding to the base layer;

bit-sliced decoding a plurality of quantized samples corresponding to the base layer with reference to the coding model information; and

repeating the differential decoding and the bit-sliced decoding for the next enhancement layer until decoding of a predetermined plurality of layers is completed.

15. An apparatus for encoding audio data, the apparatus comprising:

a bandwidth extension encoder which performs bandwidth extension encoding on audio data, outputs bandwidth-limited audio data, and generates bandwidth extension information;

a fine grain scalable encoder which encodes the bandwidth-limited audio data into a layered structure comprising a base layer and at least one enhancement layer so as to control a bit rate; and

a multiplexer which multiplexes the arithmetic-coded bandwidth-limited audio data and the bandwidth extension information.

16. The apparatus according to claim 15, wherein the fine grain scalable encoder differentially codes side information corresponding to the base layer, bit-sliced codes a plurality of quantized samples corresponding to the base layer, and bit-sliced codes side information and a plurality of quantized samples corresponding to the next enhancement layer until coding of a predetermined plurality of layers is completed.

17. The apparatus according to claim 15, wherein the fine grain scalable encoder differentially codes side information, including scale factor information and coding model information, corresponding to the base layer; bit-sliced codes a plurality of quantized samples corresponding to the base layer with reference to the coding model information; and, until coding of a predetermined plurality of layers is completed, codes side information, including scale factor information and coding model information, corresponding to the next enhancement layer and bit-sliced codes the side information and a plurality of quantized samples corresponding to the next enhancement layer.

18. The apparatus according to claim 15, wherein the fine grain scalable encoder obtains the quantized samples by applying a pseudo wavelet transform to the audio data.

19. The apparatus according to claim 15, wherein the multiplexer multiplexes the coded bandwidth-limited audio data and the bandwidth extension information in the following order: the part of the coded bandwidth-limited audio data corresponding to the base layer, the bandwidth extension information, and the part of the coded bandwidth-limited data corresponding to the remaining enhancement layers.

20. An apparatus for decoding audio data, the apparatus comprising:

a demultiplexer which demultiplexes an input audio bitstream and extracts bandwidth extension information and bandwidth-limited audio data that has been encoded into a layered structure having a base layer and at least one enhancement layer;

a fine grain scalable arithmetic decoder which decodes at least the part of the bandwidth-limited audio data corresponding to the base layer; and

a bandwidth extension decoder which generates, based on the decoded part of the bandwidth-limited audio data and with reference to the bandwidth extension information, audio data in at least part of the band not covered by the decoded part of the bandwidth-limited audio data, and then adds the generated audio data to the decoded part of the bandwidth-limited audio data.

21. The apparatus according to claim 20, wherein the fine grain scalable arithmetic decoder differentially decodes side information corresponding to the base layer, bit-sliced decodes a plurality of quantized samples corresponding to the base layer, and, until decoding of a predetermined plurality of layers is completed, decodes side information corresponding to the next enhancement layer and bit-sliced decodes a plurality of quantized samples corresponding to the next enhancement layer.

22. The apparatus according to claim 20, wherein the demultiplexer demultiplexes the input audio bitstream in the following order: it extracts the data corresponding to the base layer from the input audio bitstream, extracts the bandwidth extension information from the input audio bitstream, and extracts the data corresponding to the remaining enhancement layers from the input audio bitstream.

23. The apparatus according to claim 20, wherein the demultiplexer demultiplexes the input audio bitstream in the following order: it extracts the bandwidth extension information from the input audio bitstream, extracts the data corresponding to the base layer from the input audio bitstream, and extracts the data corresponding to the remaining enhancement layers from the input audio bitstream.