CN101371296B

CN101371296B - Apparatus and method for encoding and decoding signal

Info

Publication number: CN101371296B
Application number: CN2007800026724A
Authority: CN
Inventors: 郑亮源; 吴贤午; 金孝镇; 崔升钟; 李东锦; 姜泓求; 李在晟
Original assignee: IND ACADEMIC COOP; LG Electronics Inc
Current assignee: IND ACADEMIC COOP; LG Electronics Inc
Priority date: 2006-01-18
Filing date: 2007-01-18
Publication date: 2012-08-29
Anticipated expiration: 2027-01-18
Also published as: CN101371295B; CN101371295A; CN101371296A; CN101371297A

Abstract

Encoding and decoding apparatuses and encoding and decoding methods are provided. The decoding method includes extracting a plurality of encoded signals from an input bitstream, determining which of a plurality of decoding methods is to be used to decode each of the encoded signals, decoding the encoded signals using the determined decoding methods, and synthesizing the decoded signals. Accordingly, it is possible to encode signals having different characteristics at an optimum bitrate by classifying the signals into one or more classes according to the characteristics of the signals and encoding each of the signals using an encoding unit that can best serve the class where a corresponding signal belongs. In addition, it is possible to efficiently encode various signals including audio and speech signals.

Description

The equipment and the method that are used for the Code And Decode signal

Technical field

The present invention relates to Code And Decode equipment and Code And Decode method, and more specifically, relating to can be according to the characteristic of signal Code And Decode equipment and the Code And Decode method with best bit rate coding or decoded signal.

Background technology

Conventional audio coder can provide high-quality audio signal with 48kbps or bigger high bit rate, but is inefficient for processes voice signals.On the other hand, conventional sound decorder can be with 12kbps or littler low bit rate encoding speech signal efficiently, but is inefficient for the various sound signals of coding.

Summary of the invention

Technical matters

The present invention provides and can have the Code And Decode equipment and the Code And Decode method of the signal (for example, voice and sound signal) of different qualities with best bit rate coding or decoding.

Technical scheme

According to an aspect of the present invention, a kind of coding/decoding method is provided, has comprised: from incoming bit stream, extracted a plurality of coded signals; Confirm that in a plurality of coding/decoding methods which will be used to each coded signal of decoding; Use determined coding/decoding method decoding and coding signal; With synthetic decoded signal.

According to a further aspect in the invention, a kind of decoding device is provided, has comprised: the position parse module, it extracts a plurality of coded signals from incoming bit stream; The demoder determination module, it confirms that in a plurality of decoding units which will be used to each coded signal of decoding; Decoder module, it comprises decoding unit, and uses determined decoding unit decodes coded signal; And synthesis module, its synthetic decoded signal.

According to a further aspect in the invention, a kind of coding method is provided, has comprised: input signal has been divided into a plurality of splitting signals; Based on each characteristic of splitting signal, confirm that in a plurality of coding methods which will be used for each of code division signal; Use determined encoded splitting signal; With produce bit stream based on the splitting signal of having encoded.

According to a further aspect in the invention, a kind of encoding device is provided, has comprised: the signal segmentation module, it is divided into a plurality of splitting signals with input signal; The scrambler determination module, it is based on each characteristic of splitting signal, confirms that in a plurality of coding units which will be used for each of code division signal; Coding module, it comprises coding unit, and uses determined coding unit code division signal; With position package module, it produces bit stream based on the splitting signal of having encoded.

Advantageous effects

Therefore, through according to the characteristic of signal signal being categorized as a classification or a plurality of classification and using each signal of coding unit coding that can best satisfy the classification under the corresponding signal, can have the signal of different qualities with best bit rate coding.In addition, can encode efficiently and comprise the various signals of audio frequency and voice signal.

Description of drawings

Fig. 1 is the block diagram of encoding device according to an embodiment of the invention;

Fig. 2 is the block diagram of the embodiment of the sort module shown in Fig. 1;

Fig. 3 is the block diagram of the embodiment of the pretreatment unit shown in Fig. 2;

Fig. 4 is the block diagram of equipment that is used for calculating the perceptual entropy of input signal according to an embodiment of the invention;

Fig. 5 is the block diagram of another embodiment of the sort module shown in Fig. 1;

Fig. 6 is the block diagram of the embodiment of the signal segmentation unit shown in Fig. 5;

Fig. 7 and 8 is the views that are used for explaining the method that merges a plurality of splitting signals according to an embodiment of the invention;

Fig. 9 is the block diagram of another embodiment of the signal segmentation unit shown in Fig. 5;

Figure 10 is used for explaining the view that according to an embodiment of the invention input signal is divided into the method for a plurality of splitting signals;

Figure 11 is the block diagram of the embodiment of cell really shown in Fig. 5;

Figure 12 is the block diagram of the embodiment of the coding unit shown in Fig. 1;

Figure 13 is the block diagram of another embodiment of the coding unit shown in Fig. 1;

Figure 14 is the block diagram of encoding device according to another embodiment of the present invention;

Figure 15 is the block diagram of decoding device according to an embodiment of the invention; And

Figure 16 is the block diagram of the embodiment of the synthesis unit shown in Figure 15.

Embodiment

Below with reference to accompanying drawing the present invention is described more fully, exemplary embodiment of the present invention shown in the drawings.

Fig. 1 is the block diagram of encoding device according to an embodiment of the invention.With reference to figure 1, encoding device comprises sort module 100, coding module 200 and position package module 300.

Coding module 200 comprises a plurality of coding units of first coding unit 210 to the m coding unit 220 of carrying out the different coding method.

Sort module 100 is divided into a plurality of splitting signals with input signal, and each of splitting signal is matched in first coding unit 210 to the m coding unit 220.In first coding unit 210 to the m coding unit 220 some can be mated two or more splitting signals or do not mated splitting signal.

Sort module 100 can divide coordination amount (bit quantity) with each splitting signal or the definite order of wanting the code division signal of encoding.

Use the coding module 200 of any one each splitting signal of encoding in first coding unit 210 to the m coding unit 220 to be matched corresponding splitting signal.Sort module 100 is analyzed the characteristic of each splitting signal, and select in first coding unit 210 to the m coding unit 220 can be the most efficiently according to encode of each splitting signal of analysis result.

The coding unit of code division signal can be believed to realize the highest compression efficiency the most efficiently.

For example, the splitting signal that can easily be modeled as coefficient and surplus can be encoded by sound decorder efficiently, and the splitting signal that can not easily be modeled as coefficient and surplus can be encoded by audio coder efficiently.

If less than predefined threshold value, then splitting signal can be thought the signal of modeling easily to the energy of the surplus that obtains through the modeling splitting signal to the ratio of the energy of splitting signal.

Because the splitting signal that on time shaft, presents highly redundant can use wherein linear prediction method based on previous signal estimation current demand signal by modeling well, therefore, use the sound decorder of linear prediction interpretation method this splitting signal of can encoding the most efficiently.

Position package module 300 produces the bit stream that will be transmitted based on the splitting signal of having encoded that is provided by coding module 200 with about the additional coding information of the splitting signal of having encoded.Position package module 300 can be used unformatted (bit-plain) method in position or bit slice (bit sliced) arithmetic coding method and produce the bit stream with variable bit rate.

Owing to bit rate restriction not have splitting signal or the bandwidth of coding can be from the decoded signal or the bandwidth recovery of inserting by in using, the demoder of extrapolation or clone method provides.And, can be included in the bit stream that will be transmitted about the compensated information of the splitting signal that is not encoded.

With reference to figure 1, sort module 110 can comprise a plurality of taxons of first taxon 110 to the n taxon 120.In first taxon 110 to the n taxon 120 each can be divided into a plurality of splitting signals with input signal; The territory of switching signal; Extract the characteristic of input signal, input signal is classified or input signal is matched in first coding unit 210 to the m coding unit 220 according to the characteristic of input signal.

One in first taxon 110 to the n taxon 120 can be pretreatment unit, this pretreatment unit to input signal carry out pretreatment operation make that input signal can be converted into can be by encoded signals efficiently.Pretreatment unit can be divided into a plurality of components with input signal, for example coefficient component and component of signal, and can before other taxon is carried out their operation, carry out pretreatment operation to input signal.

Can be according to characteristic, external environmental factor and the target bit rate of input signal, pre-service input signal optionally, and pre-service some from a plurality of splitting signals that input signal obtains optionally only.

Sort module 100 can be according to the apperceive characteristic information of the input signal that is provided by psychologic acoustics MBM 400 and input signal is classified.The example of apperceive characteristic information comprises masking threshold, signal to noise ratio (S/N ratio) (SMR) and perceptual entropy.

In other words; Apperceive characteristic information according to input signal; The for example masking threshold of input signal and SNR, sort module 100 can be divided into a plurality of splitting signals with input signal maybe can match one or more in first coding unit 210 to the m coding unit 220 with each splitting signal.

In addition, sort module 100 can receive tone, zero crossing rate (ZCR) and the information of linear predictor coefficient and the classified information of previous frame such as input signal, and can classify to input signal according to the information that is received.

With reference to figure 1, the object information of being exported by coding module 200 of having encoded can be fed back to sort module 100.

In case input signal be divided into a plurality of splitting signals by sort module 100 and confirmed will by in first coding unit 210 to the m coding unit 220 which, use what amount, what order to come the code division signal with, just come the code division signal according to determined result.In fact the position amount of each splitting signal of being used to encode can be same as the position amount of being distributed by sort module 100.

The information of the difference between the position amount that specifies the actual position amount of using and distributed can be fed back to sort module 100, and making sort module 100 can be other splitting signal increases the position amount of being distributed.If the position amount of actual use is greater than the position amount of being distributed, then sort module 100 can be the position amount that other splitting signal minimizing is distributed.

The coding unit of actual coding splitting signal can be same as the coding unit that is matched splitting signal by sort module 100.In this case, signal can be fed back to sort module 100, and the coding unit of indication actual coding splitting signal is different from the coding unit that is matched splitting signal by sort module 100.Then, sort module 100 can match the coding unit except the coding unit that before matched splitting signal with splitting signal.

Sort module 100 can be divided into a plurality of splitting signals with input signal once more according to the object information of having encoded that feeds back to it.In this case, sort module 100 can obtain to have a plurality of splitting signals with the structure various structure of the previous splitting signal that obtains.

If sort module 100 selected encoding operations are different from the encoding operation of actual execution, then the information about the difference between them can feed back to sort module 100, makes sort module 100 can confirm the information that encoding operation is relevant once more.

Fig. 2 is the block diagram of the embodiment of the sort module 100 shown in Fig. 1.With reference to figure 2, the first taxons can be pretreatment unit, and this pretreatment unit is carried out pretreatment operation to input signal, makes input signal to be encoded efficiently.

A plurality of pretreaters that can comprise first pretreater 111 to the n pretreater 112 of carrying out different preprocess methods with reference to figure 2, the first taxons 110.First taxon 110 can use in first pretreater 111 to the n pretreater 112 one to come input signal is carried out pre-service with characteristic, external environmental factor and target bit rate according to input signal.And first taxon 110 can use 112 pairs of input signals of first pretreater 111 to n pretreater to carry out two or more pretreatment operation.

Fig. 3 is the block diagram of the embodiment of first pretreater 111 to the n pretreater 112 shown in Fig. 2.With reference to figure 3, pretreater comprises coefficient extraction apparatus 113 and surplus extraction apparatus 114.

Coefficient extraction apparatus 113 is analyzed input signals and is extracted the coefficient of the characteristic of representing input signal from input signal.Surplus extraction apparatus 114 extracts from input signal and has used the coefficient that is extracted to remove the surplus of redundant component from it.

Pretreater can be carried out the linear prediction decoded operation to input signal.In this case, coefficient extraction apparatus 113 passes through input signal is carried out linear prediction analysis and extracted linear predictor coefficient from input signal, and the linear predictor coefficient that surplus extraction apparatus 114 coefficient of performance extraction apparatuss 113 provide extracts surplus from input signal.Removed redundant surplus from it and can have the form identical with white noise.

To describe linear prediction analysis method according to an embodiment of the invention in detail below.

Prediction signal through linear prediction analysis obtains can be made up of the linear combination of previous input signal, and is represented like equation (1):

Mathematic graph 1

\hat{x} (n) = Σ_{j = 1}^{p} α_{j} x (n - j)

Wherein, p representes the linear prediction exponent number, and 1 to p representes the linear predictor coefficient that obtains through the mean square deviation (MSE) that minimizes between input signal and the estimated signal.

The transport function P (z) that is used for linear prediction analysis can be represented by equation (2):

Mathematic graph 2

P (z) = Σ_{k = 1}^{p} α_{k} z^{- k}

With reference to figure 3, pretreater can use linear predictive interpretation (WLPC) method of curling to extract linear predictor coefficient and surplus from input signal, this linear prediction analysis that linear predictive interpretation method is another type of curling.Have about unit delay Z through replacement ^-1The all-pass filter of transport function A (z), can realize the WLPC method.Transport function A (z) can be represented by equation (3):

Mathematic graph 3

A (z) = [\frac{z^{- 1} - λ}{1 - λ z^{- 1}}]

Wherein, λ representes the all-pass coefficient.Through changing the all-pass coefficient, can change the resolution of the signal that will analyze.For example; If the signal height of being analyzed concentrates on a certain frequency band; For example, if be the sound signal of high concentration, then make the resolution of low band signal to be increased and the signal that can encode efficiently and will be analyzed through setting the all-pass coefficient in low-frequency band by the signal analyzed.

In the WLPC method, compare with high-frequency signal, with higher resolution analysis low frequency signal.Like this, the WLPC method can realize high estimated performance and can carry out modeling to low frequency signal better for low frequency signal.

The all-pass coefficient can be according to characteristic, external environmental factor and the target bit rate of input signal and is changed along time shaft.If the all-pass coefficient changes in time, the significantly distortion of sound signal that then obtains through decoding.Like this, when the all-pass index variation, the smooth method may be used on the all-pass coefficient makes the all-pass coefficient little by little to change, and makes distorted signals to minimize.Can confirm as the scope of the value of current all-pass coefficient value can confirm through previous all-pass coefficient value.

Replace original signal, masking threshold can be used as the input that is used for estimating linear predictor coefficient.More specifically, masking threshold can be exchanged into time-domain signal, and can use time-domain signal to carry out WLPC as input.Also can use surplus to carry out the prediction of linear predictor coefficient as input.In other words, linear prediction analysis can be carried out and surpass once, therefore obtains the surplus of further albefaction.

Can comprise with reference to figure 2, the first taxons 110: first pretreater 111, it carries out top linear prediction analysis with reference to equation (1) and (2) description; With the second pretreater (not shown), it carries out WLPC.First taxon 100 can be selected in the first processor 111 and second pretreater, perhaps can not decide according to characteristic, external environmental factor and the target bit rate of input signal or not is carried out linear prediction analysis.

If the value of all-pass coefficient is 0, then second pretreater can be identical with first pretreater 111.In this case, first taxon 110 can only comprise second pretreater, and selects in linear prediction analysis method and the WLPC method according to the value of all-pass coefficient.And first taxon 110 can be carried out linear prediction analysis, and any method in perhaps linear prediction analysis method and the WLPC method is selected in frame unit.

Indicate whether to carry out the information of linear prediction analysis and which the selecteed information in linear prediction analysis method and the WLPC side of indicating can be included in the bit stream that will be transmitted.

Position package module 300 receives linear predictor coefficient, indicates whether to carry out the information of linear prediction decoding and the information of the actual Linear Predictive Coder that uses of identification from first taxon 110.Then, position package module 300 is inserted the bit stream that will be transmitted with the information of all receptions.

Being used for that input signal is encoded to the required position amount of the almost undistinguishable signal of tonequality of tonequality and original input signal can be determined through the perceptual entropy of calculating input signal.

Fig. 4 is the block diagram that is used for calculating the equipment of perceptual entropy according to an embodiment of the invention.With reference to figure 4, this equipment comprises bank of filters 115, linear prediction unit 116, psychologic acoustics modeling unit 117, first computing unit 118 and second computing unit 119.

The perceptual entropy PE of input signal can use equation (4) to calculate:

Mathematic graph 4

PE = \frac{1}{2 π} {&Integral;}_{0}^{π} \max [0, lo g_{2} \frac{X (e^{jw})}{T (e^{jw})}] dw (bit / sample)

Wherein, X (e ^Jw) expression original input signal energy level, T (e ^Jw) the expression masking threshold.

In relating to the WLPC method of using all-pass filter, can use the ratio of masking threshold of energy and surplus of the surplus of input signal to calculate the perceptual entropy of input signal.More specifically, use the encoding device of WLPC method can use equation (5) to calculate the perceptual entropy PE of input signal:

Mathematic graph 5

PE = \frac{1}{2 π} {&Integral;}_{0}^{π} \max [0, lo g_{2} \frac{R (e^{jw})}{T^{'} (e^{jw})}] dw (bit / sample)

Wherein, R (e ^Jw) energy of surplus of expression input signal, T ' (e ^Jw) expression surplus masking threshold.

Masking threshold T ' (e ^Jw) can represent by equation (6):

Mathematic graph 6

T′(e ^jw)＝T(e ^jw)/|H(e ^jw)| ²

Wherein, T (e ^Jw) expression original signal masking threshold, H (e ^Jw) expression is used for the transport function of WLPC.Psychologic acoustics modeling unit 320 is used transfer function H (e ^Jw) and scale factor band territory in masking threshold T (e ^Jw) can calculate masking threshold T ' (e ^Jw).

Receive surplus that obtains by linear prediction unit 116 performed WLPC and the masking threshold of exporting by psychologic acoustics modeling unit 117 with reference to 4, the first computing units of figure 118.Bank of filters 116 can be carried out frequency inverted to original signal, and the result of frequency inverted can be input to psychologic acoustics modeling unit 117 and second computing unit 119.Bank of filters 115 can be carried out Fourier transform to original signal.

First computing unit 118 can use the ratio of energy of masking threshold and the surplus of the original signal of being removed by the spectrum of the transport function of WLPC composite filter to calculate perceptual entropy.

The curling perceptual entropy WPE that is divided into the signal of 60 or more a plurality of inhomogeneous dividing strips (partitionband) with different bandwidth can use WLPC to be calculated, shown in equation (7):

Mathematic graph 7

WPE = - Σ_{b = 1}^{b \max} (w_{high} (b) - w_{low} (b)) \cdot \log_{10} (\frac{{nb}_{res} (b)}{e_{res} (b)})

e_{res} (b) = Σ_{w = w_{low} (b)}^{w_{high} (b)} res {(w)}^{2}

{nb}_{res} (b) = Σ_{w = w_{low} (b)}^{w_{high} (b)} \frac{{nb}_{linear} (w)}{h {(w)}^{2}}

Wherein, b representes the index of the dividing strip that the applied mental acoustic model obtains, e _Res(b) energy of the surplus among the expression dividing strip b with, w_low (b) and w_high (b) represent the minimum and highest frequency among the dividing strip b, nb respectively _Linear(w) masking threshold of the dividing strip of expression linear mapping, h (w) ²Linear prediction decoding (LPC) energy spectrum of expression frame, nb _Res(w) expression is corresponding to the linear masking threshold of surplus.

On the other hand, be divided into the curling perceptual entropy WPE of the signal of 60 or more a plurality of even dividing strips with same band _SubCan use WLPC to be calculated, shown in equation (8):

Mathematic graph 8

{nb}_{sub} (s) = \min_{s_{low} (s) < w < s_{high} (s)} (\frac{{nb}_{linear} (w)}{h {(w)}^{2}})

{WPE}_{sub} = - Σ_{s = 1}^{s \max} (s_{high} (s) - s_{low} (s)) \cdot \log_{10} (\frac{{nb}_{sub} (s)}{e_{sub} (s)})

e_{sub} (s) = Σ_{w = s_{low} (s)}^{s_{high} (s)} res {(w)}^{2}

Wherein, s representes the index of the linear subband of separating, s _Low(w) and s _High(w) the minimum and highest frequency among the subband s of the separation of expression linearity respectively, nb _Sub(s) masking threshold of the linear subband s that separates of expression, e _Sub(s) energy of the linear subband s that separates of expression, the frequency among the promptly linear subband s that separates with.Masking threshold nb _Sub(s) be the minimum value of a plurality of masking thresholds among the linear subband s that separates.

For have same band and have be higher than input spectrum and the band of threshold value, can not calculate perceptual entropy.Like this, the curling perceptual entropy WPE of equation (8) _SubCan be lower than the curling perceptual entropy WPE that high-resolution equation (7) is provided for low-frequency band.

Use WLPC to calculate the perceptual entropy WPE that curls for scale factor band with different bandwidth _Sf, represented like equation (9):

Mathematic graph 9

{nb}_{sf} (s) = \min_{s f_{low} (s) < w < {sf}_{high} (s)} (\frac{{nb}_{linear} (w)}{h {(w)}^{2}})

{WPE}_{sf} = - Σ_{f = 1}^{f \max} (s_{high} (f) - s_{low} (f)) \cdot \log_{10} (\frac{{nb}_{sf} (f)}{e_{sf} (f)})

e_{sf} (s) = Σ_{w = {sf}_{low} (s)}^{{sf}_{high} (s)} res {(w)}^{2}

Wherein, f representes the index of scale factor band, nb _Sf(f) the minimum masking threshold of expression scale factor band f, WPE _SfThe ratio of the input signal of expression scale factor band f and the masking threshold of scale factor band f, e _Sf(s) all frequencies among the expression scale factor band f with, the i.e. energy of scale factor band f.

Fig. 5 is the block diagram of another embodiment of the sort module 100 shown in Fig. 1.With reference to figure 5, sort module comprises signal segmentation unit 121 and definite unit 122.

More specifically, signal segmentation unit 121 is divided into a plurality of splitting signals with input signal.For example, signal segmentation unit 121 can use sub-filter that input signal is divided into a plurality of frequency bands.Frequency band can have identical bandwidth or different bandwidth.As stated, through can best satisfying the coding unit of the characteristic of splitting signal, splitting signal can be encoded with other splitting signal by quilt discretely.

Signal segmentation unit 121 can be divided into a plurality of splitting signals with input signal, and for example, a plurality of band signals make that the interference between the band signal can minimize.Signal segmentation unit 121 can have double filter group structure.In this case, also divisible each splitting signal in signal segmentation unit 121.

About the carve information of the splitting signal that obtained by signal segmentation unit 121, for example the breath of taking a message of the total number of splitting signal and each splitting signal can be included in the bit stream that will be transmitted.The decoding device synthetic decoded signal of splitting signal and reference segmentation information of decoding separably, thus original input signal recovered.

Carve information can be stored as form.Bit stream can comprise the identification information of the form that is used to cut apart original input signal.

Can confirm the importance of each splitting signal (for example, a plurality of band signals), and can regulate bit rate for each splitting signal according to determined result to tonequality.More specifically, the importance of the splitting signal on-fixed value that may be defined as fixed value or change according to the characteristic of the input signal of each frame.

If voice and sound signal are mixed into input signal, signal segmentation unit 121 can be divided into voice signal and sound signal with input signal according to the characteristic of voice signal and the characteristic of sound signal.

Confirm that unit 122 can confirm in first coding unit 210 to the m coding unit 220 in the coding module 200 which each splitting signal of can encoding the most efficiently.

Confirm that unit 122 is categorized as some groups with splitting signal.For example; Confirm that unit 122 can be categorized as N classification with splitting signal; And, confirm that in first coding unit 210 to the m coding unit 220 which will be used to each splitting signal of encoding through each classification in N the classification is matched in first coding unit 210 to the m coding unit 220.

More specifically; Suppose that coding module 200 comprises that first coding unit 210 is to m coding unit 220; Confirm that then unit 122 can be categorized as first to the m classification with splitting signal, this first can be encoded by first coding unit 210 to the m coding unit 220 respectively to the m classification the most efficiently.

For this reason, can by in first coding unit 210 to the m coding unit 220 each the most efficiently the characteristic of encoded signals can be confirmed in advance, and can limit first the characteristic to the m classification according to this result who confirms.After this, confirm that unit 122 can extract the characteristic of each splitting signal, and according to the result who extracts with each splitting signal be categorized as first share identical characteristics to the m classification with corresponding splitting signal a classification.

First example to the m classification comprises voiced speech classification, unvoiced speech classification, background noise classification, noiseless classification, tone audio categories, non-pitch audio categories and voiced speech/audio mix classification.

Through with reference to the apperceive characteristic information that provides by psychologic acoustics MBM 400 about splitting signal; The for example masking threshold of splitting signal, SMR or perceptual entropy level (level) confirm that unit 122 can confirm that in first coding unit 210 to the m coding unit 220 which will be used to each splitting signal of encoding.

Through with reference to about the apperceive characteristic information of splitting signal, confirm that unit 122 can confirm that the position amount so that encode each splitting signal, perhaps confirms to want the order of code division signal.

Can comprise in the bit stream that is transmitted by definite information that is obtained of confirming that unit 122 is carried out; For example; The information of the position amount that indication has with each splitting signal that will be encoded through in first coding unit 210 to the m coding unit 220 which, and the information of the order of indication code division signal.

Fig. 6 is the block diagram of the embodiment of the signal segmentation unit 121 shown in Fig. 5.With reference to figure 6, the signal segmentation unit comprises dispenser 123 and combiner 124.

Dispenser 123 can be divided into a plurality of splitting signals with input signal.Combiner 124 can be merged into individual signals with the splitting signal with similar characteristics.For this reason, combiner 124 can comprise the composite filter group.

For example, dispenser 123 can be divided into 256 bands with input signal.In 256 bands, these bands with similar characteristics can be merged into single band by combiner 124.

With reference to figure 7, combiner 124 can be merged into single combined signal with a plurality of splitting signals located adjacent one another.In this case, combiner 124 can be merged into single combined signal with a plurality of adjacent splitting signals according to predefined rule, and does not consider the characteristic of adjacent splitting signal.

Whether alternatively, with reference to figure 8, combiner 124 can be merged into single combined signal with a plurality of splitting signals with similar characteristics, and adjacent one another are irrelevant with splitting signal.In this case, combiner 124 can be with merging into single combined signal by a plurality of splitting signals that identical coding unit is encoded efficiently.

Fig. 9 is the block diagram of another embodiment of the signal segmentation unit 121 shown in Fig. 5.With reference to figure 9, the signal segmentation unit comprises first dispenser 125, second dispenser 126 and the 3rd dispenser 127.

More specifically, input signal is cut apart in signal segmentation unit 121 gradablely.For example; Input signal can be divided into two splitting signals by first dispenser 125; One in two splitting signals can be divided into three splitting signals by second dispenser 126, and in three splitting signals one can be divided into three splitting signals by the 3rd dispenser 127.By this way, input signal may be partitioned into 6 splitting signals altogether.Signal segmentation unit 121 is divided into a plurality of bands with different bandwidth with input signal gradablely.

In the embodiment show in figure 9, cut apart input signal according to 3 grades of classifications, but the invention is not restricted to this.In other words, according to 2 grades or 4 grades or more multistage classification, input signal may be partitioned into a plurality of splitting signals.

One in first dispenser, 125 to the 3rd dispensers 127 in the signal segmentation unit 121 can be divided into a plurality of time-domain signals with input signal.

Figure 10 explains that signal segmentation unit 121 is divided into input signal the embodiment of a plurality of splitting signals.

During the short frame length, voice or sound signal be stable state normally.Yet sometimes, for example during transition, voice or sound signal can have the unstable state characteristic.

In order to analyze unstable signal efficiently and to improve the efficient of this unstable signal of coding, can use small echo or empirical mode decomposition (EMD) method according to the encoding device of present embodiment.In other words, the encoding device according to present embodiment can use unfixed transforming function transformation function to analyze the characteristic of input signal.For example, signal segmentation unit 121 can use unfixed frequency band subband filter method input signal to be divided into a plurality of bands with bandwidth varying.

To describe the method that input signal is divided into a plurality of splitting signals through EMD below in detail.

In the EMD method, input signal can be decomposed into one or more natural mode functions (IMF).IMF must satisfy following condition: the number of extreme value and the number of zero crossing must equate or differ one at the most; The mean value of envelope of being confirmed by local maximum and the envelope confirmed by local minimum is zero.

The IMF representative is similar to the simple oscillation pattern of the component in the simple harmonic function, therefore makes it can use the EMD method to decompose input signal efficiently.

More specifically; In order to extract IMF from input signal s (t); Envelope can be produced through connecting all local extremums of using the cubic spline interpolating method to confirm by the local maximum of input signal s (t), and envelope can be produced down through connecting all local extremums of using the cubic spline interpolating method to confirm by the local minimum of input signal s (t).The all values that input signal s (t) can have can be between envelope and the following envelope.

After this, can calculate the mean value m (t) of envelope and following envelope.After this, can calculate the first component h through deducting mean value m (t) from input signal s (t) ₁(t), shown in equation (10):

Mathematic graph 10

s(t)-m ₁(t)＝h ₁(t)

If the first component h ₁(t) do not satisfy above-mentioned IMF condition, then the first component h ₁(t) can be confirmed as identically, and can carry out aforesaid operations once more up to an IMF G who obtains to satisfy above-mentioned IMF condition with input signal s (t) ₁(t) till.

In case obtain an IMF C ₁(t), just through deducting an IMF C ₁(t) obtain surplus r ₁(t), shown in equation (11):

Mathematic graph 11

s(t)-c ₁(t)＝r ₁(t)

After this, can use surplus r ₁(t) carry out above-mentioned IMF once more as new input and extract operation, thereby obtain the 2nd IMF C ₂(t) and surplus r ₂(t).

If extract the surplus r that operating period obtains at above-mentioned IMF _n(t) have constant value or monotone increasing function or an extreme value is only arranged or do not have the simply periodic function of extreme value at all, then can stop above-mentioned IMF and extract operation.

Extract the result of operation as above-mentioned IMF, input signal s (t) can be by a plurality of IMF C ₀(t) to C _M(t) and final surplus r _m(t) and the expression, shown in equation (12):

Mathematic graph 12

s (t) = Σ_{m = 0}^{M} C_{m} (t) + r_{m} (t)

The total number of the M IMF that representes to extract wherein.Final surplus r _m(t) but the total characteristic of reflected input signal s (t).

Figure 10 illustrates through using the EMD method to decompose 11 IMF and final surplus that original input signal obtains.With reference to Figure 10, the frequency of the IMF that obtains from original input signal in early days that extracts at IMF is higher than the frequency of the IMF that obtains from original input signal in late period that IMF extracts.

Use previous surplus h _{1 (k-1)}With current surplus h _1kBetween standard deviation S D can simplify IMF and extract, shown in equation (13):

Mathematic graph 13

SD = Σ_{t = 0}^{T} [\frac{{| h_{1 (k - 1)} (t) - h_{1 k} (t) |}^{2}}{h_{1 (k - 1)}^{2} (t)}]

If standard deviation S D is less than reference value, for example less than 0.3, then current surplus h _1kCan regard IMF as.

Simultaneously, signal x (t) can be transformed to analytic signal through Hilbert transform, shown in equation (14):

Mathematic graph 14

z(t)＝x(t)+jH{x(t)}＝a(t)e ^jθ(t)

Wherein, (t) expression instantaneous amplitude, (t) expression instantaneous phase, H{} representes Hilbert transform.

As the result of Hilbert transform, input signal can be exchanged into the analytic signal of being made up of real component and imaginary component.

Through Hilbert transform being applied to mean value is 0 signal, and can obtain to provide high-resolution frequency component for time domain and frequency domain.

To describe below shown in Fig. 4 how cell 122 really to confirm a plurality of coding units in detail which will be used for encoding through decomposing each of a plurality of splitting signals that input signal obtains.

Which each splitting signal of can encoding more efficiently of confirming that unit 122 can confirm sound decorder and audio coder.In other words; What confirm that unit 122 can determine the splitting signal that any one sound decorder that uses in first coding unit 210 to the m coding unit 220 can be encoded is efficiently encoded is sound decorder, and the splitting signal that can encode efficiently to any one audio coder that uses in first coding unit 210 to the m coding unit 220 of decision encodes is audio coder.

To describe which code division signal more efficiently of confirming how unit 122 to confirm sound decorder and audio coder below in detail.

Confirm that unit 122 can measure the variation in the splitting signal, and if the result who measures greater than predefined reference value, confirm that then sound decorder can be than audio coder code division signal more efficiently.

Alternatively, confirm that unit 122 can measure the tonal components in certain part that is included in splitting signal, and if the result who measures greater than predefined reference value, confirm that then sound decorder can be than audio coder code division signal more efficiently.

Figure 11 is the block diagram of the embodiment of cell 122 really shown in Fig. 5.With reference to Figure 11, confirm that the unit comprises audio coding/decoding unit 500, first bank of filters 510, second bank of filters 520, confirms unit 530 and psychologic acoustics modeling unit 540.

Really cell which each splitting signal of can encoding more efficiently that can confirm sound decorder and audio coder shown in Figure 11.

With reference to Figure 11, input signal is by audio coding/decoding unit 500 coding, and coded signal is by 500 decodings of audio coding/decoding unit, thereby recovers original input signal.Audio coding/decoding unit 500 can comprise AMR-WB (AMR-WB) speech coders/decoders, and the AMR-WB speech coders/decoders can have code exciting lnear predict (CELP) structure.

Input signal can be owed sampling (down-sampled) before being input to audio coding/decoding unit 500.The signal of audio coding/decoding unit 500 outputs can be recovered input signal thus by over-sampling (up-sampled).

Input signal can carry out frequency transformation through first bank of filters 510.

Signal by 500 outputs of audio coding/decoding unit is converted into frequency-region signal by second bank of filters 520.First bank of filters 510 or second bank of filters 520 can be carried out cosine transform to the signal that is input to it, for example, revise discrete cosine transform (MDCT).

The frequency component of the input signal of the recovery of the frequency component of the original input signal of first bank of filters, 510 outputs and 520 outputs of second bank of filters all is imported into confirms unit 530.Which coded input signal more efficiently of confirming that unit 530 can confirm sound decorder and audio coder based on the frequency component that is input to it.

More specifically, based on being input to the frequency component of confirming the unit, calculate the perceptual entropy PE of each frequency component through using equation (15) _i, which coded input signal more efficiently of confirming that unit 530 can confirm sound decorder and audio coder:

Mathematic graph 15

{PE}_{i} = Σ_{j = j_{low} (i)}^{j_{high} (i)} N (j)

Wherein

N (j) = \{\begin{matrix} 0, x (j) = 0 \\ \log_{2} (2 | nint (\frac{x (j)}{δ}) | + 1), x (j) &NotEqual; 0 \end{matrix}

Wherein, the coefficient of x (j) expression frequency component, j representes the index of frequency component, and i representes quantization step, and nint () is the function that immediate integer is turned back to its independent variable, j _{Low (i)}And j _{High (i)}Be respectively the beginning frequency index and end frequency index of scale factor band.

Confirm that unit 530 can use equation (15) to calculate the perceptual entropy of frequency component of input signal of perceptual entropy and recovery of the frequency component of original input signal, and confirm for being used for which of audio coder and sound decorder for the coded input signal more efficient based on result calculated.

For example, if, then confirming unit 530 less than the perceptual entropy of the frequency component of the input signal that recovers, the perceptual entropy of the frequency component of original input signal can confirm that audio coder can be than sound decorder coded input signal more efficiently.On the other hand, if, then confirming unit 530 less than the perceptual entropy of the frequency component of original input signal, the perceptual entropy of the frequency component of the input signal that recovers to confirm that sound decorder can be than audio coder coded input signal more efficiently.

Figure 12 is the block diagram of one embodiment in first coding unit 210 to the m coding unit 220 shown in Fig. 1.Coding unit shown in Figure 12 can be a sound decorder.

Usually, sound decorder can be carried out LPC to input signal in frame unit, and uses the Levinson-Durbin algorithm to extract LPC coefficient, for example 16 rank LPC coefficients from each frame of input signal.Through adaptive codebook search or fixed codebook search, can quantize pumping signal.Use the linear prediction method of algebraic code excitation, can quantize pumping signal.Use has the quantification form of conjugated structure, can carry out vector quantization to the gain of pumping signal.

Sound decorder shown in Figure 12 comprises linear prediction analysis unit 600, pitch estimation unit 610, codebook search unit 620, line spectrum pair (LSP) unit 630 and quantifying unit 640.

Linear prediction analysis unit 600 uses coefficient of autocorrelation that input signal is carried out linear prediction analysis, and this coefficient of autocorrelation obtains through using asymmetric window.If leading (look-ahead) cycle, promptly asymmetric window has the length of 30ms, and then linear prediction analysis unit 600 can use the 5ms leading cycle to carry out linear prediction analysis.

Coefficient of autocorrelation uses the Levinson-Durbin algorithm to convert linear predictor coefficient into.For quantizing and linear interpolation, LSP unit 630 converts linear predictor coefficient into LSP.Quantifying unit 640 quantizes LSP.

Pitch estimation unit 610 is estimated the open loop pitch, so that reduce the complicacy of adaptive codebook search.More specifically, pitch estimation unit 610 uses the voice signal territory of the weighting of each frame to estimate the open loop pitch cycle.After this, use estimated open loop pitch to construct the harmonic noise forming filter.After this, use harmonic noise forming filter, linear prediction synthesis filter and resonance peak perceptual weighting filter to calculate impulse response.Impulse response can be used for producing the echo signal that is used for quantizing pumping signal.

Adaptive codebook search and fixed codebook search are carried out in codebook search unit 620.Interior inserting through closed loop pitch search and the pumping signal through the past calculated adaptive codebook vector, in subframe unit, can carry out adaptive codebook search.The adaptive codebook parameter can comprise the gain of pitch cycle and pitch wave filter.Can produce pumping signal through linear prediction synthesis filter, so that simplify the closed loop search.

The fixed codebook structure forms based on crossover monopulse displacement (ISSP) design.The codebook vectors that will comprise 64 positions of locating 64 pulses respectively is divided into four tracks, and each track comprises 16 positions.According to transfer rate, the pulse of predetermined number can be positioned at each of four tracks.Because therefore code book index indication track position and impulse code need not store code book, and can only use this code book index to produce pumping signal.

Sound decorder shown in Figure 12 can be carried out above-mentioned decode procedure in time domain.And if use linear prediction interpretation method coded input signal by the sort module shown in Fig. 1 100, then linear prediction analysis unit 600 can be chosen wantonly.

The invention is not restricted to the sound decorder shown in Figure 12.In other words, can use within the scope of the invention except the sound decorder shown in Figure 12, the various sound decorders of encoding speech signal efficiently.

Figure 13 is the block diagram of another embodiment of one in first coding unit 210 to the m coding unit 220 shown in Fig. 1.Coding unit shown in Figure 13 can be an audio coder.

With reference to Figure 13, audio coder comprises bank of filters 700, psychologic acoustics modeling unit 710 and quantifying unit 720.

Bank of filters 700 converts input signal into frequency-region signal.Bank of filters 700 can be carried out cosine transform to input signal, for example, revises discrete cosine transform (MDCT).

Psychologic acoustics modeling unit 710 is calculated the masking threshold of input signal or the SMR of input signal.Quantifying unit 720 uses the masking threshold that is calculated by psychologic acoustics modeling unit 710 to quantize the MDCT coefficient by bank of filters 700 outputs.Alternatively, in order in given bit rate scope, to minimize audible distortion, quantifying unit 720 can be used the SMR of input signal.

Audio coder shown in Figure 13 can be carried out above-mentioned cataloged procedure in frequency domain.

The invention is not restricted to the audio coder shown in Figure 13.In other words, can use within the scope of the invention except the audio coder shown in Figure 13, various audio coders of coding audio signal (advanced audio code translator) efficiently.

The advanced audio code translator carry out transient noise be shaped (TNS), intensity/coupling, predict and in/the stereo decoding of side (M/S).TNS be in the bank of filters window suitably distribution time domain quantization noise make the quantization noise inaudible operation that can become.Intensity/coupling is a kind of operation, comes the amount of the spatial information that the energy of transmitting audio signal, this operation can reduce to transmit through coding audio signal and this fact of time scale of only depending primarily on energy based on the perception of audio direction in the high-band.

Prediction is to remove redundant operation through the correlativity between the spectral component that uses frame from the indeclinable signal of statistical property.The stereo decoding of M/S is the operation of standardized and (that is) and poor (that is side) of transmission stereophonic signal rather than a left side and right channel signal.

The signal that carries out TNS, intensity/coupling, prediction and the stereo decoding of M/S is quantized by quantizer, and this quantizer uses the SMR that obtains from psychoacoustic model to carry out synthesis analysis (AbS).

As stated because audio coder uses the modeling method coded input signal such as the linear prediction interpretation method, so shown in Fig. 5 really cell 122 can confirm whether input signal can be by modeling easily according to one group of predetermined rule.After this, can be confirmed that then unit 122 can determine to use the sound decorder coded input signal by modeling easily if confirm input signal.On the other hand, can not be confirmed that then unit 122 can determine to use the audio coder coded input signal by modeling easily if confirm input signal.

Figure 14 is the block diagram of encoding device according to another embodiment of the present invention.In Fig. 1 to 14, the identical identical key element of Reference numeral representative, and therefore, will skip its detailed description.

With reference to Figure 14, sort module 100 is divided into first to a plurality of splitting signals of n splitting signal and confirm that in a plurality of coding units 230,240,250,260 and 270 which will be used for encoding first to each of n splitting signal with input signal.

With reference to Figure 14, coding unit 230,240,250,260 and 270 can sequentially encode first respectively to the n splitting signal.And, if input signal is split into a plurality of band signals, then can be according to this band signal of order encoding from the lowest band signal to high-frequency band signals.

Under the situation of sequential encoding splitting signal, the encoding error of first front signal can be used for the current demand signal of encoding.As a result, can use different coding method code division signal, so in case stop signal distortion and the bandwidth retractility is provided.

With reference to Figure 14, coding unit 230 coding first splitting signal, first splitting signal that decoding has been encoded, and the error between the decoded signal and first splitting signal outputed to coding unit 240.Coding unit 240 uses the error of coding units 230 outputs second splitting signal of encoding.By this way, consider the encoding error of their previous splitting signals separately, coding second is to the m splitting signal.Therefore, can realize error-free encoding and improve tonequality.

Encoding device shown in Figure 14 can be from the incoming bit stream restoring signal through the performed operation of the encoding device shown in the reverse ground execution graph 1 to 14.

Figure 15 is the block diagram of decoding device according to an embodiment of the invention.With reference to Figure 15, decoding device comprises a parse module 800, demoder determination module 810, decoder module 820 and synthesis module 830.

Position parse module 800 extracts one or more coded signals and decodes the required additional information of this coded signal from incoming bit stream.

Decoder module 820 comprises a plurality of decoding units of first decoding unit 821 to the m decoding unit 822 of carrying out different coding/decoding methods.

Decoding determination module 810 is confirmed in first decoding unit 821 to the m decoding units 822 which each coded signal of can decoding the most efficiently.Demoder determination module 810 can use with the similar method of method of the sort module 100 shown in Fig. 1 confirms in first decoding unit 821 to the m decoding unit 822 which each coded signal of can decoding the most efficiently.In other words, demoder determination module 810 can be confirmed in first decoding unit 821 to the m decoding unit 822 which each coded signal of can decoding the most efficiently based on the characteristic of each coded signal.Preferably, demoder determination module 810 can be confirmed first decoding unit 821 to the m decoding unit 822 which each coded signal of can decoding the most efficiently based on the additional information of extracting from incoming bit stream.

Additional information can comprise: classification information identifies the classification under the information encoded of being classified through encoding device; Coding unit information, sign is used to produce the coding unit of this coded signal; With decoding unit information, sign will be used to the to decode decoding unit of this coded signal.

For example, demoder determination module 810 can be based on additional information and confirm which classification coded signal belongs to, and selects in first decoding unit 821 to the m decoding unit 822 any one decoding unit corresponding to the classification of coded signal for coded signal.In this case, selected decoding unit can have a kind of structure and makes its can decode the most efficiently signal of the classification that belongs to identical with the classification of coded signal.

Alternatively; Demoder determination module 810 can be discerned the coding unit that is used to produce coded signal based on additional information, and selects in first decoding unit 821 to the m decoding unit 822 any one decoding unit corresponding to the coding unit of being discerned for coded signal.For example, if produced coded signal by sound decorder, then demoder determination module 810 can to select in first decoding unit 821 to the m decoding unit 822 be any one decoding unit of Voice decoder for coded signal.

Alternatively; Demoder determination module 810 can be discerned the decoding unit of decodable code coded signal based on additional information, and selects in first decoding unit 821 to the m decoding unit 822 any one decoding unit corresponding to the decoding unit of being discerned for coded signal.

Alternatively; Demoder determination module 810 can obtain the characteristic of decoded signal from additional information, and any one decoding unit of the signal with characteristic identical with the characteristic of coded signal of selecting can decode the most efficiently in first decoding unit 821 to the m decoding unit 822.

By this way, each coded signal that extracts from incoming bit stream is encoded by any one decoding unit of confirming as the respective coding signal of can decoding the most efficiently first decoding unit 821 to the m decoding unit 822.Decoded signal is synthetic by synthesis module 830, thereby recovers original signal.

Position parse module 800 extracts the carve information about coded signal, the breath of taking a message of the number of coded signal and each coded signal for example, but and the decoded signal that provides of the synthetic decoder module 820 of synthesis module 830 reference segmentation information.

Synthesis module 830 can comprise a plurality of synthesis units of first synthesis unit 831 to the n synthesis unit 832.In first synthesis unit 831 to the n synthesis unit 832 each all can be synthesized the decoded signal that decoder module 820 is provided, perhaps to some in the decoded signal or the decoding of all carrying out the territory conversion or adding.

Can carry out post-processing operation to composite signal for one in first synthesis unit 831 to the n synthesis unit 832, this post-processing operation is the inverse operation (inverse) of the pretreatment operation of encoding device execution.Can extract the information and the decoded information that is used to carry out post-processing operation that indicates whether to carry out post-processing operation from incoming bit stream.

With reference to Figure 16, in first synthesis unit 831 to the n synthesis unit 832 one, particularly, second synthesis unit 833 can comprise a plurality of preprocessors of first preprocessor 834 to the n preprocessor 835.First synthesis unit 831 synthesizes individual signals with a plurality of decoded signals, and in first preprocessor 834 to the n preprocessor 835 one to through the synthetic individual signals execution post-processing operation that obtains.

Indicate in first preprocessor 834 to the n preprocessor 835 which can be included in the incoming bit stream to information through the synthetic individual signals execution post-processing operation that obtains.

In first compositor 831 to the n compositor 832 one can use the linear predictor coefficient that extracts from incoming bit stream to carrying out the linear prediction decoding through the synthetic individual signals that obtains, thus the recovery original signal.

The present invention can be embodied as the embodied on computer readable code that writes on the computer-readable recording medium.Computer-readable recording medium can be a pen recorder of storing any kind of data with the mode of embodied on computer readable.The example of computer readable recording medium storing program for performing comprises ROM, RAM, CD-ROM, tape, floppy disk, optical data memories and the carrier wave data transmission of the Internet (for example, through).Computer-readable recording medium can be distributed on a plurality of computer systems that are connected to network, makes the embodied on computer readable code write on it and carry out from it with the mode of disperseing.Those skilled in the art can easily construct and realize function program, code and code segment required for the present invention.

Though illustrate and described the present invention particularly with reference to exemplary embodiment of the present invention, it will be understood by those skilled in the art that the various changes that to make form and details here and do not depart from the spirit and scope of the present invention that limit like claim.

Industrial applicibility

As stated; According to the present invention; Through according to the characteristic of signal signal being categorized as one or more classifications and using the coding unit that can best satisfy the classification under the corresponding signal each signal of encoding, can have the signal of different qualities with best bit rate coding.Therefore, can encode to the various signals that comprise audio frequency and voice signal efficiently.

Claims

1. coding/decoding method of being carried out by decoding device comprises:

Extract additional information and one or more coded signal from incoming bit stream; Wherein need said additional information so that said coded signal is decoded, said additional information comprises the coding unit information that is used to identify the coding unit that is used to produce said coded signal;

Based on the decoding unit of confirming from the said coding unit of said additional information sign to be used for coded signal is decoded, wherein said decoding unit comprises Voice decoder and audio decoder;

When selecting said Voice decoder, use said Voice decoder that coded signal is decoded; And

When selecting audio decoder, use said audio decoder that said coded signal is decoded;

Wherein, said coding unit comprises speech coder and audio coder, and said speech coder comprises linear prediction analysis unit, and said audio coder comprises bank of filters and psychoacoustic model unit.

2. decoding device comprises:

The position parse module; It extracts additional information and one or more coded signal from incoming bit stream; Wherein need said additional information so that said coded signal is decoded, said additional information comprises the coding unit information that is used to identify the coding unit that is used to produce said coded signal;

The demoder determination module, it is based on the decoding unit of confirming from the said coding unit of said additional information sign to be used for coded signal is decoded, and wherein said decoding unit comprises Voice decoder and audio decoder;

First decoding unit, when selecting said Voice decoder, first decoding unit is decoded to said coded signal; And

Second decoding unit, when selecting audio decoder, second decoding unit uses said audio decoder that said coded signal is decoded;