CN107958670A - Apparatus for determining a coding mode, and audio encoding apparatus - Google Patents
- Publication number: CN107958670A (Application No. CN201711421463.5A)
- Authority: CN (China)
- Prior art keywords: coding mode, current frame, classification, coding, unit
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/005—Correction of errors induced by the transmission channel, if related to the coding algorithm
- G10L19/04—Using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/12—The excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
- G10L19/16—Vocoder architecture
- G10L19/18—Vocoders using multiple modes
- G10L19/20—Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding
- G10L19/22—Mode decision, i.e. based on audio signal content versus external parameters
Abstract
An apparatus for determining a coding mode and an audio encoding apparatus are provided. A method of determining a coding mode includes: determining, according to characteristics of an audio signal, one coding mode among a plurality of coding modes including a first coding mode and a second coding mode as an initial coding mode; and, if there is an error in the determination of the initial coding mode, generating a corrected coding mode by correcting the initial coding mode to a third coding mode.
Description
This application is a divisional of Application No. 201380070268.6, entitled "Method and apparatus for determining a coding mode, method and apparatus for encoding an audio signal, and method and apparatus for decoding an audio signal", filed with the China Intellectual Property Office on November 13, 2013.
Technical field
Apparatuses and methods consistent with exemplary embodiments relate to audio encoding and audio decoding, and more particularly, to a method and apparatus for determining a coding mode for improving the quality of a reconstructed audio signal by determining a coding mode suited to the characteristics of the audio signal while preventing frequent coding mode switching, a method and apparatus for encoding an audio signal, and a method and apparatus for decoding an audio signal.
Background Art
It is widely known that encoding a music signal is efficient in the frequency domain, while encoding a speech signal is efficient in the time domain. Accordingly, various techniques have been proposed for determining the class of an audio signal in which a music signal and a speech signal are mixed, and for determining a coding mode corresponding to the determined class.
However, frequent coding mode switching not only introduces delay but also degrades the decoded sound quality. In addition, since there is no technique for correcting an initially determined coding mode (that is, class), the quality of the reconstructed audio signal degrades if an error occurs while the coding mode is being determined.
Summary of the Invention
Technical Problem
Aspects of one or more exemplary embodiments provide a method and apparatus for determining a coding mode for improving the quality of a reconstructed audio signal by determining a coding mode suited to the characteristics of an audio signal, a method and apparatus for encoding an audio signal, and a method and apparatus for decoding an audio signal.
Aspects of one or more exemplary embodiments also provide a method and apparatus for determining a coding mode suited to the characteristics of an audio signal and for reducing the delay caused by frequent coding mode switching, a method and apparatus for encoding an audio signal, and a method and apparatus for decoding an audio signal.
Technical Solution
According to an aspect of one or more exemplary embodiments, a method of determining a coding mode includes: determining, according to characteristics of an audio signal, one coding mode among a plurality of coding modes including a first coding mode and a second coding mode as an initial coding mode; and, if there is an error in the determination of the initial coding mode, generating a corrected coding mode by correcting the initial coding mode to a third coding mode.
According to an aspect of one or more exemplary embodiments, a method of encoding an audio signal includes: determining, according to characteristics of the audio signal, one coding mode among a plurality of coding modes including a first coding mode and a second coding mode as an initial coding mode; if there is an error in the determination of the initial coding mode, generating a corrected coding mode by correcting the initial coding mode to a third coding mode; and performing different encoding processes on the audio signal based on the initial coding mode or the corrected coding mode.
According to an aspect of one or more exemplary embodiments, a method of decoding an audio signal includes: parsing a bitstream including one of an initial coding mode and a third coding mode; and performing different decoding processes on the bitstream based on the initial coding mode or the third coding mode, wherein the initial coding mode is obtained by determining, according to characteristics of the audio signal, one coding mode among a plurality of coding modes including a first coding mode and a second coding mode, and the third coding mode is obtained by correcting the initial coding mode when there is an error in the determination of the initial coding mode.
Advantageous Effects
According to exemplary embodiments, by determining the final coding mode of a current frame based on a correction of the initial coding mode and on the coding modes of frames corresponding to a hangover length, a coding mode suited to the characteristics of the audio signal can be selected while preventing frequent coding mode switching between frames.
Brief Description of the Drawings
FIG. 1 is a block diagram illustrating a configuration of an audio encoding apparatus according to an exemplary embodiment;
FIG. 2 is a block diagram illustrating a configuration of an audio encoding apparatus according to another exemplary embodiment;
FIG. 3 is a block diagram illustrating a configuration of a coding mode determination unit according to an exemplary embodiment;
FIG. 4 is a block diagram illustrating a configuration of an initial coding mode determination unit according to an exemplary embodiment;
FIG. 5 is a block diagram illustrating a configuration of a characteristic parameter extraction unit according to an exemplary embodiment;
FIG. 6 is a diagram illustrating an adaptive switching method between linear prediction domain coding and spectral domain coding according to an exemplary embodiment;
FIG. 7 is a diagram illustrating an operation of a coding mode correction unit according to an exemplary embodiment;
FIG. 8 is a block diagram illustrating a configuration of an audio decoding apparatus according to an exemplary embodiment;
FIG. 9 is a block diagram illustrating a configuration of an audio decoding apparatus according to another exemplary embodiment.
Detailed Description
Embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout. In this regard, the present embodiments may take different forms and should not be construed as being limited to the descriptions set forth herein. Accordingly, the embodiments below are described, with reference to the accompanying drawings, merely to explain aspects of the present description.
Terms such as "connected" and "linked" may be used to indicate a directly connected or linked state, but it should be understood that another component may be interposed in between.
Terms such as "first" and "second" may be used to describe various components, but the components should not be limited by these terms. The terms may be used only to distinguish one component from another.
The units described in the exemplary embodiments are illustrated independently to indicate distinct characteristic functions, and this does not mean that each unit is formed of a separate hardware component or software component. Each unit is illustrated for convenience of explanation; multiple units may form one unit, and one unit may be divided into multiple units.
FIG. 1 is a block diagram illustrating a configuration of an audio encoding apparatus 100 according to an exemplary embodiment.
The audio encoding apparatus 100 shown in FIG. 1 may include a coding mode determination unit 110, a switching unit 120, a spectral domain encoding unit 130, a linear prediction domain encoding unit 140, and a bitstream generation unit 150. The linear prediction domain encoding unit 140 may include a time domain excitation encoding unit 141 and a frequency domain excitation encoding unit 143, where the linear prediction domain encoding unit 140 may be implemented as at least one of the time domain excitation encoding unit 141 and the frequency domain excitation encoding unit 143. Unless they must be implemented as separate hardware, the above components may be integrated into at least one module and may be implemented as at least one processor (not shown). Here, the term audio signal may refer to a music signal, a speech signal, or a mixture thereof.
Referring to FIG. 1, the coding mode determination unit 110 may analyze the characteristics of the audio signal to determine the class of the audio signal, and may determine a coding mode according to the result of the classification. The determination of the coding mode may be performed in units of superframes, frames, or bands. Alternatively, the determination of the coding mode may be performed in units of groups of superframes, groups of frames, or groups of bands. Here, examples of coding modes may include the spectral domain and the time domain (or linear prediction domain), but are not limited thereto. If the performance and processing speed of the processor are sufficient and the delay caused by coding mode switching can be resolved, the coding modes may be subdivided, and the encoding schemes may also be subdivided according to the coding modes. According to an exemplary embodiment, the coding mode determination unit 110 may determine the initial coding mode of the audio signal as one of a spectral domain coding mode and a time domain coding mode. According to another exemplary embodiment, the coding mode determination unit 110 may determine the initial coding mode of the audio signal as one of a spectral domain coding mode, a time domain excitation coding mode, and a frequency domain excitation coding mode. If the spectral domain coding mode is determined as the initial coding mode, the coding mode determination unit 110 may correct the initial coding mode to one of the spectral domain coding mode and the frequency domain excitation coding mode. If the time domain coding mode, that is, the time domain excitation coding mode, is determined as the initial coding mode, the coding mode determination unit 110 may correct the initial coding mode to one of the time domain excitation coding mode and the frequency domain excitation coding mode. If the time domain excitation coding mode is determined as the initial coding mode, the determination of the final coding mode may be performed selectively. In other words, the initial coding mode, that is, the time domain excitation coding mode, may be maintained. The coding mode determination unit 110 may determine the coding modes of a plurality of frames corresponding to a hangover length, and may determine the final coding mode for the current frame. According to an exemplary embodiment, if the initial coding mode or the corrected coding mode of the current frame is identical to the coding modes of a plurality of previous frames, e.g., 7 previous frames, the corresponding initial coding mode or corrected coding mode may be determined as the final coding mode of the current frame. Meanwhile, if the initial coding mode or the corrected coding mode of the current frame differs from the coding modes of the plurality of previous frames, e.g., 7 previous frames, the coding mode determination unit 110 may determine the coding mode of the frame immediately before the current frame as the final coding mode of the current frame.
As described above, by determining the final coding mode of the current frame based on the correction of the initial coding mode and the coding modes of the frames corresponding to the hangover length, a coding mode suited to the characteristics of the audio signal can be selected while preventing frequent coding mode switching between frames.
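The hangover decision described above can be sketched as follows. This is a minimal illustration, not the patent's reference implementation: the mode names are placeholders, and the hangover length of 7 frames is taken from the example in the text.

```python
def final_coding_mode(candidate_mode, previous_modes, hangover=7):
    """Decide the final coding mode of the current frame.

    candidate_mode: initial (or corrected) coding mode of the current frame.
    previous_modes: coding modes of earlier frames, most recent last.
    If the candidate agrees with the modes of all `hangover` preceding
    frames, adopt it; otherwise keep the mode of the immediately
    preceding frame, which suppresses frequent mode switching.
    """
    recent = previous_modes[-hangover:]
    if len(recent) == hangover and all(m == candidate_mode for m in recent):
        return candidate_mode
    return previous_modes[-1]
```

A history shorter than the hangover length conservatively keeps the previous frame's mode, which matches the goal of suppressing frequent switching.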
In general, time domain coding, that is, time domain excitation coding, may be efficient for speech signals, spectral domain coding may be efficient for music signals, and frequency domain excitation coding may be efficient for vocal and/or harmonic signals.
According to the coding mode determined by the coding mode determination unit 110, the switching unit 120 may provide the audio signal to the spectral domain encoding unit 130 or the linear prediction domain encoding unit 140. If the linear prediction domain encoding unit 140 is implemented as the time domain excitation encoding unit 141, the switching unit 120 may include two branches in total. If the linear prediction domain encoding unit 140 is implemented as the time domain excitation encoding unit 141 and the frequency domain excitation encoding unit 143, the switching unit 120 may have three branches in total.
The spectral domain encoding unit 130 may encode the audio signal in the spectral domain. The spectral domain may refer to the frequency domain or the transform domain. Examples of coding methods suited to the spectral domain encoding unit 130 may include Advanced Audio Coding (AAC), or a combination of the modified discrete cosine transform (MDCT) and factorial pulse coding (FPC), but are not limited thereto. In detail, other quantization techniques and entropy coding techniques may be used instead of FPC. Encoding a music signal in the spectral domain encoding unit 130 may be efficient.
The linear prediction domain encoding unit 140 may encode the audio signal in the linear prediction domain. The linear prediction domain may refer to the excitation domain or the time domain. The linear prediction domain encoding unit 140 may be implemented as the time domain excitation encoding unit 141, or may be implemented to include the time domain excitation encoding unit 141 and the frequency domain excitation encoding unit 143. Examples of coding methods suited to the time domain excitation encoding unit 141 may include code excited linear prediction (CELP) or algebraic CELP (ACELP), but are not limited thereto. Examples of coding methods suited to the frequency domain excitation encoding unit 143 may include generic signal coding (GSC) or transform coded excitation (TCX), but are not limited thereto. Encoding a speech signal in the time domain excitation encoding unit 141 may be efficient, and encoding a vocal and/or harmonic signal in the frequency domain excitation encoding unit 143 may be efficient.
The bitstream generation unit 150 may generate a bitstream so as to include the coding mode provided by the coding mode determination unit 110, the encoding result provided by the spectral domain encoding unit 130, and the encoding result provided by the linear prediction domain encoding unit 140.
FIG. 2 is a block diagram illustrating a configuration of an audio encoding apparatus 200 according to another exemplary embodiment.
The audio encoding apparatus 200 shown in FIG. 2 may include a common preprocessing module 205, a coding mode determination unit 210, a switching unit 220, a spectral domain encoding unit 230, a linear prediction domain encoding unit 240, and a bitstream generation unit 250. Here, the linear prediction domain encoding unit 240 may include a time domain excitation encoding unit 241 and a frequency domain excitation encoding unit 243, and the linear prediction domain encoding unit 240 may be implemented as either the time domain excitation encoding unit 241 or the frequency domain excitation encoding unit 243. Compared with the audio encoding apparatus 100 shown in FIG. 1, the audio encoding apparatus 200 may further include the common preprocessing module 205; therefore, descriptions of components identical to those of the audio encoding apparatus 100 will be omitted.
Referring to FIG. 2, the common preprocessing module 205 may perform joint stereo processing, surround processing, and/or bandwidth extension processing. The joint stereo processing, surround processing, and bandwidth extension processing may be identical to those used by a specific standard, e.g., the MPEG standard, but are not limited thereto. The output of the common preprocessing module 205 may be mono, stereo channels, or multichannel. The switching unit 220 may include at least one switch according to the number of channels of the signal output by the common preprocessing module 205. For example, if the common preprocessing module 205 outputs a signal of two or more channels, that is, stereo channels or multichannel, a switch corresponding to each channel may be arranged. For example, the first channel of a stereo signal may be a speech channel, and the second channel of the stereo signal may be a music channel. In this case, the audio signal may be simultaneously provided to the two switches. Additional information generated by the common preprocessing module 205 may be provided to the bitstream generation unit 250 and included in the bitstream. The additional information is necessary for performing joint stereo processing, surround processing, and/or bandwidth extension processing at the decoding end, and may include spatial parameters, envelope information, energy information, and the like. However, various additional information may be present depending on the processing technique applied.
According to an exemplary embodiment, in the common preprocessing module 205, the bandwidth extension processing may be performed differently based on the encoding domain. The audio signal in the core band may be processed by using the time domain excitation coding mode or the frequency domain excitation coding mode, and the audio signal in the bandwidth extension band may be processed in the time domain. The bandwidth extension processing in the time domain may include a plurality of modes, including a voiced mode and an unvoiced mode. Alternatively, the audio signal in the core band may be processed by using the spectral domain coding mode, and the audio signal in the bandwidth extension band may be processed in the frequency domain. The bandwidth extension processing in the frequency domain may include a plurality of modes, including a transient mode, a normal mode, and a harmonic mode. In order to perform bandwidth extension processing in different domains, the coding mode determined by the coding mode determination unit 110 may be provided to the common preprocessing module 205 as signaling information. According to an exemplary embodiment, the last portion of the core band and the beginning portion of the bandwidth extension band may overlap each other to some extent. The position and size of the overlapping portion may be preset.
FIG. 3 is a block diagram illustrating a configuration of a coding mode determination unit 300 according to an exemplary embodiment.
The coding mode determination unit 300 shown in FIG. 3 may include an initial coding mode determination unit 310 and a coding mode correction unit 330.
Referring to FIG. 3, the initial coding mode determination unit 310 may determine whether the audio signal is a music signal or a speech signal by using characteristic parameters extracted from the audio signal. If the audio signal is determined to be a speech signal, linear prediction domain coding may be suitable. Meanwhile, if the audio signal is determined to be a music signal, spectral domain coding may be suitable. The initial coding mode determination unit 310 may determine the class of the audio signal by using the characteristic parameters extracted from the audio signal, where the class of the audio signal indicates whether spectral domain coding, time domain excitation coding, or frequency domain excitation coding is suitable for the audio signal. A corresponding coding mode may be determined based on the class of the audio signal. If the switching unit 120 of FIG. 1 has two branches, the coding mode may be represented by 1 bit. If the switching unit 120 of FIG. 1 has three branches, the coding mode may be represented by 2 bits. The initial coding mode determination unit 310 may determine whether the audio signal is a music signal or a speech signal by using any of various techniques well known in the art. Examples may include the FD/LPD classification or the ACELP/TCX classification disclosed in the encoder part of the USAC standard, and the ACELP/TCX classification used in the AMR standard, but are not limited thereto. In other words, the initial coding mode may be determined by using any of various methods other than the method according to the embodiments described herein.
The coding mode correction unit 330 may determine a corrected coding mode by correcting, using correction parameters, the initial coding mode determined by the initial coding mode determination unit 310. According to an exemplary embodiment, if the spectral domain coding mode is determined as the initial coding mode, the initial coding mode may be corrected to the frequency domain excitation coding mode based on the correction parameters. If the time domain coding mode is determined as the initial coding mode, the initial coding mode may be corrected to the frequency domain excitation coding mode based on the correction parameters. In other words, it is determined, by using the correction parameters, whether there is an error in the determination of the initial coding mode. If it is determined that there is no error in the determination of the initial coding mode, the initial coding mode may be maintained. If, instead, it is determined that there is an error in the determination of the initial coding mode, the initial coding mode may be corrected. Corrections of the initial coding mode from the spectral domain coding mode to the frequency domain excitation coding mode, and from the time domain excitation coding mode to the frequency domain excitation coding mode, may be obtained.
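The correction behavior described above — keep the initial mode when no error is detected, otherwise redirect the spectral domain or time domain excitation decision to the frequency domain excitation mode — can be sketched as follows. The mode names and the boolean error flag are illustrative assumptions; in the patent, the error decision comes from the correction parameters themselves.

```python
def correct_initial_mode(initial_mode, error_detected):
    """Apply the coding mode correction.

    error_detected: result of evaluating the correction parameters (True
    means the initial decision is judged wrong).  Only two corrections
    exist: spectral domain -> frequency domain excitation, and time
    domain excitation -> frequency domain excitation.
    """
    if not error_detected:
        return initial_mode  # no error: keep the initial coding mode
    if initial_mode in ("spectral_domain", "time_domain_excitation"):
        return "frequency_domain_excitation"
    return initial_mode      # other modes are left unchanged
```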
Meanwhile initial code pattern or corrected coding mode can be the temporary code patterns for present frame,
Wherein, will can be carried out for the temporary code pattern of present frame and the coding mode for presetting the previous frame in trailing length
Compare, and can determine that the final coding mode for present frame.
FIG. 4 is a block diagram illustrating a configuration of an initial coding mode determination unit 400 according to an exemplary embodiment.
The initial coding mode determination unit 400 shown in FIG. 4 may include a characteristic parameter extraction unit 410 and a determination unit 430.
Referring to FIG. 4, the characteristic parameter extraction unit 410 may extract from the audio signal the characteristic parameters necessary for determining the coding mode. Examples of the extracted characteristic parameters include at least one or two of a pitch parameter, a voicing parameter, a correlation parameter, and a linear prediction error, but are not limited thereto. A detailed description of each parameter is given below.
First, the first characteristic parameter F1 relates to the pitch parameter, where N pitch values detected in the current frame and at least one previous frame may be used to assess the pitch behavior. To prevent the effect of random deviations or wrong pitch values, M pitch values that differ significantly from the average of the N pitch values may be removed. Here, N and M may be values obtained in advance through experiment or simulation. In addition, N may be preset, and the difference from the average of the N pitch values beyond which a pitch value is removed may be determined in advance through experiment or simulation. Using the mean m_p' and the variance σ_p' of the (N − M) pitch values, the first characteristic parameter F1 may be expressed as shown in Equation 1 below.
[equation 1]
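Equation 1 itself is rendered as an image in the source, but the pitch screening it builds on can be sketched as follows. Which M values count as "significantly different from the average" is left to experiment in the text, so dropping the M values farthest from the mean is an assumption of this sketch.

```python
from statistics import mean, pvariance

def screened_pitch_stats(pitch_values, m):
    """Drop the m pitch values farthest from the average of the n
    detected values, then return the mean and variance of the
    remaining n - m values, which feed the first feature parameter F1."""
    avg = mean(pitch_values)
    kept = sorted(pitch_values, key=lambda p: abs(p - avg))[:len(pitch_values) - m]
    return mean(kept), pvariance(kept)
```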
The second characteristic parameter F2 also relates to the pitch parameter, and may indicate the reliability of the pitch values detected in the current frame. Using the variances σ_SF1 and σ_SF2 of the pitch values detected in two subframes SF1 and SF2 of the current frame, respectively, the second characteristic parameter F2 may be expressed as shown in Equation 2 below.
[equation 2]
Here, cov(SF1, SF2) denotes the covariance between subframe SF1 and subframe SF2. In other words, the second characteristic parameter F2 indicates the correlation between the two subframes as a pitch distance. According to an exemplary embodiment, the current frame may include two or more subframes, and Equation 2 may be modified according to the number of subframes.
Based on a voicing parameter Voicing and a correlation parameter Corr, the third feature parameter F3 can be expressed as shown in Equation 3 below.
[Equation 3]
Here, the voicing parameter Voicing is related to the vocal characteristics of the sound and may be obtained by any of various methods well known in the art, and the correlation parameter Corr may be obtained by summing the inter-frame correlations for each frequency band.
The fourth feature parameter F4 is related to the linear prediction error E_LPC and can be expressed as shown in Equation 4 below.
[Equation 4]
Here, M(E_LPC) denotes the mean of N linear prediction errors.
The determination unit 430 may determine the class of the audio signal by using at least one feature parameter provided by the feature parameter extraction unit 410, and may determine the initial coding mode based on the determined class. The determination unit 430 may employ a soft-decision mechanism, in which at least one mixture may be formed for each feature parameter. According to an exemplary embodiment, the class of the audio signal may be determined based on mixture probabilities by using a Gaussian mixture model (GMM). The probability f(x) for one mixture may be calculated according to Equation 5 below.
[Equation 5]
x = (x1, …, xN)
m = (m_x1, …, m_xN)
Here, x denotes the input vector of feature parameters, m denotes the mean of the mixture, and C denotes the covariance matrix.
The determination unit 430 may calculate a music probability Pm and a speech probability Ps by using Equation 6 below.
[Equation 6]
Here, the music probability Pm may be calculated by adding the probabilities Pi of the M mixtures related to the feature parameters suitable for music determination, and the speech probability Ps may be calculated by adding the probabilities Pi of the S mixtures related to the feature parameters suitable for speech determination.
Meanwhile, to improve accuracy, the music probability Pm and the speech probability Ps may be calculated according to Equation 7 below.
[Equation 7]
Here, P_i^err denotes the error probability of each mixture. The error probability may be obtained by classifying training data, which includes clean speech signals and clean music signals, with each mixture and counting the number of misclassifications.
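The GMM-based soft decision can be sketched as follows. The per-mixture probability is taken to be a standard multivariate Gaussian with a diagonal covariance matrix (an assumption, since the body of Equation 5 is not reproduced in the text), and Pm and Ps follow the description of Equation 6 as sums of per-mixture probabilities; the index sets are illustrative.

```python
import math

def mixture_probability(x, mean, cov_diag):
    # Multivariate Gaussian density with diagonal covariance -- an assumed
    # shape for the per-mixture probability f(x) of Equation 5.
    n = len(x)
    det = math.prod(cov_diag)
    quad = sum((xi - mi) ** 2 / ci for xi, mi, ci in zip(x, mean, cov_diag))
    return math.exp(-0.5 * quad) / math.sqrt((2 * math.pi) ** n * det)

def music_speech_probabilities(mix_probs, music_mixtures, speech_mixtures):
    # Equation 6 as described: Pm adds the probabilities of the M mixtures
    # tied to music, Ps those of the S mixtures tied to speech.
    pm = sum(mix_probs[i] for i in music_mixtures)
    ps = sum(mix_probs[i] for i in speech_mixtures)
    return pm, ps
```

Evaluating the density at its mean with unit variances gives the familiar 1/(2π)^{N/2} peak value.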
Next, for a number of frames equal to a constant hangover length, the probability P_M that all frames include only music signals and the probability P_S that all frames include only speech signals may be calculated according to Equation 8 below. The hangover length may be set to 8, but is not limited thereto. The eight frames may include the current frame and 7 previous frames.
[Equation 8]
Next, a plurality of condition sets {cond_M} and {cond_S} may be calculated by using the music probability Pm or the speech probability Ps obtained using Equation 5 or Equation 6. A detailed description thereof is given below with reference to Fig. 6. Here, each condition may be set to have a value of 1 for music and a value of 0 for speech.
Referring to Fig. 6, in operations 610 and 620, the sum M of music conditions and the sum S of speech conditions may be obtained from the plurality of condition sets {cond_M} and {cond_S} calculated by using the music probability Pm and the speech probability Ps. In other words, the sum M of music conditions and the sum S of speech conditions can be expressed as shown in Equation 9 below.
[Equation 9]
In operation 630, the sum M of music conditions is compared with a designated threshold Tm. If the sum M of music conditions is greater than the threshold Tm, the coding mode of the current frame is switched to the music mode, that is, the spectral-domain coding mode. If the sum M of music conditions is less than or equal to the threshold Tm, the coding mode of the current frame is not changed.
In operation 640, the sum S of speech conditions is compared with a designated threshold Ts. If the sum S of speech conditions is greater than the threshold Ts, the coding mode of the current frame is switched to the speech mode, that is, the linear prediction domain coding mode. If the sum S of speech conditions is less than or equal to the threshold Ts, the coding mode of the current frame is not changed.
The threshold Tm and the threshold Ts may be set to values obtained in advance via experiment or simulation.
Fig. 5 is a block diagram showing a configuration of a feature parameter extraction unit 500 according to an exemplary embodiment.
The feature parameter extraction unit 500 shown in Fig. 5 may include a transform unit 510, a spectral parameter extraction unit 520, a temporal parameter extraction unit 530, and a determination unit 540.
In Fig. 5, the transform unit 510 may transform an original audio signal from the time domain to the frequency domain. Here, the transform unit 510 may apply any of various transform techniques to represent the audio signal in the spectral domain. Examples of the techniques may include the fast Fourier transform (FFT), the discrete cosine transform (DCT), or the modified discrete cosine transform (MDCT), but are not limited thereto.
The spectral parameter extraction unit 520 may extract at least one spectral parameter from the frequency-domain audio signal provided by the transform unit 510. Spectral parameters may be classified into short-term feature parameters and long-term feature parameters. A short-term feature parameter may be obtained from the current frame, and a long-term feature parameter may be obtained from a plurality of frames including the current frame and at least one previous frame.
The temporal parameter extraction unit 530 may extract at least one temporal parameter from the time-domain audio signal. Temporal parameters may also be classified into short-term feature parameters and long-term feature parameters. A short-term feature parameter may be obtained from the current frame, and a long-term feature parameter may be obtained from a plurality of frames including the current frame and at least one previous frame.
The determination unit 430 (of Fig. 4) may determine the class of the audio signal by using the spectral parameters provided by the spectral parameter extraction unit 520 and the temporal parameters provided by the temporal parameter extraction unit 530, and may determine the initial coding mode based on the determined class. The determination unit 430 (of Fig. 4) may employ a soft-decision mechanism.
Fig. 7 is a diagram illustrating an operation of the coding mode correction unit 310 according to an exemplary embodiment.
Referring to Fig. 7, in operation 700, the initial coding mode determined by the initial coding mode determination unit 310 is received, and it can be determined whether the coding mode is the speech mode (that is, the time-domain excitation mode) or the spectral-domain mode.
In operation 701, if the initial coding mode is determined to be the spectral-domain mode (state_TS == 1 in operation 700), an index state_TTSS, which indicates whether frequency-domain excitation coding is more suitable, may be checked. The index state_TTSS indicating whether frequency-domain excitation coding (e.g., GSC) is more suitable may be obtained by using the tonalities of different frequency bands. A detailed description thereof is given below.
The tonality of a low-band signal may be obtained as the ratio between the sum of a plurality of spectral coefficients having small values, including the minimum value, and the spectral coefficient having the maximum value for a given frequency band. If the given frequency bands are 0~1 kHz, 1~2 kHz, and 2~4 kHz, the tonalities t01, t12, and t24 of the respective frequency bands and the tonality tL of the low-band signal (that is, the core band) can be expressed as shown in Equation 10 below.
[Equation 10]
tL = max(t01, t12, t24)
Meanwhile linear prediction error can be obtained and can be used for by using linear predictive coding (LPC) wave filter
Except strong tonal components.In other words strong tonal components are directed to, spectral domain coding mode is more more efficient than frequency domain excitation coding mode.
For being switched to frequency domain excitation coding mode by using the tone and linear prediction error that obtain as described above
Precondition condfrontIt can be expressed as shown in following equation 11.
[equation 11]
condfront=t12> t12frontAnd t24> t24frontAnd tL> tLfrontAnd err > errfront
Here, t12front、t24front、tLfrontAnd errfrontThreshold value, and can have in advance via experiment or emulation and
The value of acquisition.
Meanwhile for completing frequency domain excitation coding by using the tone and linear prediction error that obtain as described above
The postcondition cond of patternbackIt can be expressed as shown in following equation 12.
[equation 12]
condback=t12< t12backAnd t24< t24backAnd tL< tLback
Here, t12back、t24back、tLbackIt is that threshold value can simultaneously have the value obtained in advance via experiment or emulation.
In other words, can by determine equation 11 shown in precondition whether be satisfied or equation 12 shown in
Postcondition whether be satisfied determine index stateTTSSWhether it is 1, wherein, index stateTTSSIndicate that frequency domain excitation is compiled
It is more suitable whether code (for example, GSC) encodes than spectral domain.Here, the postcondition shown in peer-to-peer 12 determine can be
Optionally.
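The precondition/postcondition pair of Equations 11 and 12 amounts to a hysteresis on the index state_TTSS, which can be sketched as follows; all thresholds are placeholders for values tuned via experiment or simulation.

```python
def update_state_ttss(prev_state, t12, t24, tl, err, thr):
    # Hysteresis per Equations 11-12: the precondition switches the index
    # on, the (optional) postcondition switches it off, and otherwise the
    # previous state is kept. All thresholds in `thr` are placeholders.
    cond_front = (t12 > thr["t12front"] and t24 > thr["t24front"]
                  and tl > thr["tLfront"] and err > thr["errfront"])
    cond_back = (t12 < thr["t12back"] and t24 < thr["t24back"]
                 and tl < thr["tLback"])
    if cond_front:
        return 1
    if cond_back:
        return 0
    return prev_state
```

Keeping the previous state between the two condition regions is what prevents the index from toggling on borderline frames.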
In operation 702, if the index state_TTSS is 1, the frequency-domain excitation coding mode may be determined as the final coding mode. In this case, the spectral-domain coding mode, which is the initial coding mode, is corrected to the frequency-domain excitation coding mode as the final coding mode.
In operation 705, if the index state_TTSS is determined to be 0 in operation 701, an index state_SS for determining whether the audio signal includes strong speech characteristics may be checked. If there is an error in the determination of the spectral-domain coding mode, the frequency-domain excitation coding mode may be more efficient than the spectral-domain coding mode. The index state_SS for determining whether the audio signal includes strong speech characteristics may be obtained by using a difference vc between the voicing parameter and the correlation parameter.
A precondition cond_front for switching to a strong speech mode by using the difference vc between the voicing parameter and the correlation parameter can be expressed as shown in Equation 13 below.
[Equation 13]
cond_front = vc > vcfront
Here, vcfront is a threshold and may have a value obtained in advance via experiment or simulation.
Meanwhile, a postcondition cond_back for ending the strong speech mode by using the difference vc between the voicing parameter and the correlation parameter can be expressed as shown in Equation 14 below.
[Equation 14]
cond_back = vc < vcback
Here, vcback is a threshold and may have a value obtained in advance via experiment or simulation.
In other words, in operation 705, whether the index state_SS is 1, where the index state_SS indicates whether frequency-domain excitation coding (e.g., GSC) is more suitable than spectral-domain coding, may be determined by determining whether the precondition shown in Equation 13 is satisfied or whether the postcondition shown in Equation 14 is not satisfied. Here, the determination of the postcondition shown in Equation 14 may be optional.
In operation 706, if the index state_SS is determined to be 0 in operation 705 (that is, the audio signal does not include strong speech characteristics), the spectral-domain coding mode may be determined as the final coding mode. In this case, the spectral-domain coding mode, which is the initial coding mode, is maintained as the final coding mode.
In operation 707, if the index state_SS is determined to be 1 in operation 705 (that is, the audio signal includes strong speech characteristics), the frequency-domain excitation coding mode may be determined as the final coding mode. In this case, the spectral-domain coding mode, which is the initial coding mode, is corrected to the frequency-domain excitation coding mode as the final coding mode.
By performing operations 700, 701, and 705, an error in the determination of the spectral-domain coding mode as the initial coding mode can be corrected. In detail, the spectral-domain coding mode as the initial coding mode may be maintained as the final coding mode, or may be switched to the frequency-domain excitation coding mode as the final coding mode.
Meanwhile if determine that initial code pattern is linear prediction domain coding mode (state in operation 700TS==0),
Then it is used to determine whether audio signal includes the index state of strong musical specific propertySMIt can be examined.If to linear prediction domain
There are mistake in the determining of coding mode (that is, time domain excitation coding mode), then frequency domain excitation coding mode may swash than time domain
It is more efficient to encourage coding mode.Can be by using the value for subtracting the poor vc between voiced sound parameter and degree of correlation parameter from 1 and obtaining
1-vc is obtained for determining whether audio signal includes the state of strong musical specific propertySM。
For by using by subtracting the poor vc between voiced sound parameter and degree of correlation parameter from 1 the value 1-vc that obtains
And it is switched to the precondition cond of strong music patternfrontIt can be expressed as shown in following equation 15.
[equation 15]
condfront=1-vc > vcmfront
Here, vcmfrontIt is that threshold value can simultaneously have the value obtained in advance via experiment or emulation.
Meanwhile for by using by subtracting the poor vc between voiced sound parameter and degree of correlation parameter from 1 the value that obtains
1-vc and the postcondition cond for terminating strong music patternbackIt can be expressed as shown in following equation 16.
[equation 16]
condback=1-vc < vcmback
Here, vcmbackIt is that threshold value can simultaneously have the value obtained in advance via experiment or emulation.
In other words, in operation 709, whether can be satisfied or wait by the precondition for determining to show in equation 15
Whether the postcondition shown in formula 16 is not satisfied to determine index stateSMWhether it is 1, wherein, index stateSMInstruction
Whether frequency domain excitation coding (for example, GSC) is more suitable for than time domain excitation coding.Here, the postcondition shown in peer-to-peer 16
Determine can be optional.
In operation 710, if determining index state in operation 709SMFor 0, (that is, it is special not include forte pleasure for audio signal
Property), then time domain excitation coding mode can be confirmed as final coding mode.In this case, as initial code pattern
Linear prediction domain coding mode is switched to the time domain excitation coding mode as final coding mode.According to exemplary implementation
Example, if linear prediction domain coding mode is corresponding with time domain excitation coding mode, it is contemplated that initial code pattern is kept not
Become.
In operation 707, if determining index state in operation 709SMFor 1 (that is, audio signal includes the happy characteristic of forte),
Then frequency domain excitation coding mode can be confirmed as final coding mode.In this case, as the linear of initial code pattern
Prediction domain coding mode is corrected as encouraging coding mode as the frequency domain of final coding mode.
By perform operation 700 and 709, to initial code pattern determine in mistake can be corrected.In detail,
Linear prediction domain coding mode (for example, time domain excitation coding mode) as initial code pattern can be kept as final
Coding mode, or frequency domain can be switched to and encourage coding mode as final coding mode.
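The overall correction flow of Fig. 7 reduces to a small decision function, sketched below with illustrative mode labels (the names of the modes and states are labels for this sketch, not identifiers from the patent).

```python
def final_coding_mode(initial_mode, state_ttss=0, state_ss=0, state_sm=0):
    # Overall correction of Fig. 7: a spectral-domain initial mode is
    # corrected to frequency-domain excitation coding when state_TTSS or
    # state_SS is 1; a linear-prediction-domain initial mode becomes
    # frequency-domain excitation coding when state_SM is 1 and
    # time-domain excitation coding otherwise.
    if initial_mode == "spectral_domain":
        return "fd_excitation" if (state_ttss or state_ss) else "spectral_domain"
    return "fd_excitation" if state_sm else "td_excitation"
```

Note that every corrected path lands on frequency-domain excitation coding; the correction step never swaps spectral-domain coding and time-domain excitation coding directly.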
According to an exemplary embodiment, operation 709, which determines whether the audio signal includes strong music characteristics in order to correct an error in the determination of the linear prediction domain coding mode, may be optional.
According to another exemplary embodiment, the order of operation 705, which determines whether the audio signal includes strong speech characteristics, and operation 701, which determines whether the frequency-domain excitation coding mode is suitable, may be reversed. In other words, after operation 700, operation 705 may be performed first, and then operation 701 may be performed. In this case, the parameters used for the determinations may be changed as necessary.
Fig. 8 is a block diagram showing a configuration of an audio decoding apparatus 800 according to an exemplary embodiment.
The audio decoding apparatus 800 shown in Fig. 8 may include a bitstream parsing unit 810, a spectral-domain decoding unit 820, a linear prediction domain decoding unit 830, and a switching unit 840. The linear prediction domain decoding unit 830 may include a time-domain excitation decoding unit 831 and a frequency-domain excitation decoding unit 833, where the linear prediction domain decoding unit 830 may be embodied as at least one of the time-domain excitation decoding unit 831 and the frequency-domain excitation decoding unit 833. Unless they must be embodied as separate hardware, the above components may be integrated into at least one module and may be implemented as at least one processor (not shown).
Referring to Fig. 8, the bitstream parsing unit 810 may parse a received bitstream and separate it into information about a coding mode and coded data. The coding mode may correspond to an initial coding mode obtained by determining, according to the characteristics of the audio signal, one coding mode among a plurality of coding modes including a first coding mode and a second coding mode, or may correspond to a third coding mode corrected from the initial coding mode when there is an error in the determination of the initial coding mode.
The spectral-domain decoding unit 820 may decode, from the separated coded data, data encoded in the spectral domain.
The linear prediction domain decoding unit 830 may decode, from the separated coded data, data encoded in the linear prediction domain. If the linear prediction domain decoding unit 830 includes the time-domain excitation decoding unit 831 and the frequency-domain excitation decoding unit 833, the linear prediction domain decoding unit 830 may perform time-domain excitation decoding or frequency-domain excitation decoding on the separated coded data.
The switching unit 840 may switch between the signal reconstructed by the spectral-domain decoding unit 820 and the signal reconstructed by the linear prediction domain decoding unit 830, and may provide the switched signal as the finally reconstructed signal.
Fig. 9 is a block diagram showing a configuration of an audio decoding apparatus 900 according to another exemplary embodiment.
The audio decoding apparatus 900 may include a bitstream parsing unit 910, a spectral-domain decoding unit 920, a linear prediction domain decoding unit 930, a switching unit 940, and a common post-processing module 950. The linear prediction domain decoding unit 930 may include a time-domain excitation decoding unit 931 and a frequency-domain excitation decoding unit 933, where the linear prediction domain decoding unit 930 may be embodied as at least one of the time-domain excitation decoding unit 931 and the frequency-domain excitation decoding unit 933. Unless they must be embodied as separate hardware, the above components may be integrated into at least one module and may be implemented as at least one processor (not shown). Compared with the audio decoding apparatus 800 shown in Fig. 8, the audio decoding apparatus 900 further includes the common post-processing module 950; therefore, descriptions of the components identical to those of the audio decoding apparatus 800 are omitted.
Referring to Fig. 9, the common post-processing module 950 may perform joint stereo processing, surround processing, and/or bandwidth extension processing, corresponding to the common pre-processing module 205 (of Fig. 2).
The methods according to the exemplary embodiments can be written as computer-executable programs and implemented in general-purpose digital computers that execute the programs by using a non-transitory computer-readable recording medium. In addition, data structures, program instructions, or data files usable in the embodiments can be recorded on the non-transitory computer-readable recording medium in various ways. The non-transitory computer-readable recording medium is any data storage device that can store data that can thereafter be read by a computer system. Examples of the non-transitory computer-readable recording medium include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROM disks and DVDs; magneto-optical media; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. In addition, the non-transitory computer-readable recording medium may be a transmission medium for transmitting signals designating program instructions, data structures, and the like. Examples of the program instructions include not only machine language code produced by a compiler but also higher-level language code executable by the computer using an interpreter or the like.
While the exemplary embodiments have been particularly shown and described above, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the inventive concept as defined by the claims. The exemplary embodiments should be considered in a descriptive sense only and not for purposes of limitation. Therefore, the scope of the inventive concept is defined not by the detailed description of the exemplary embodiments but by the claims, and all differences within the scope will be construed as being included in the inventive concept.
Claims (8)
1. An apparatus for determining a coding mode, the apparatus comprising:
at least one processing unit configured to:
determine a class of a current frame among a plurality of classes including a music class and a speech class, based on signal characteristics;
obtain feature parameters from a plurality of frames including the current frame;
generate at least one condition based on the feature parameters;
determine, based on the at least one condition, whether an error exists in the determined class of the current frame; and
correct the determined class of the current frame when it is determined that an error exists in the determined class of the current frame.
2. The apparatus of claim 1, wherein the processing unit is configured to:
correct the determined class of the current frame to the speech class when an error exists in the determined class of the current frame and the determined class of the current frame is the music class; and
correct the determined class of the current frame to the music class when an error exists in the determined class of the current frame and the determined class of the current frame is the speech class.
3. The apparatus of claim 1, wherein the feature parameters include a tonality and a linear prediction error.
4. The apparatus of claim 3, wherein the feature parameters further include a difference between a voicing parameter and a correlation parameter.
5. An audio encoding apparatus, the apparatus comprising:
at least one processing unit configured to:
determine a class of a current frame among a plurality of classes including a music class and a speech class, based on signal characteristics;
obtain feature parameters from a plurality of frames including the current frame;
generate at least one condition based on the feature parameters;
determine, based on the at least one condition and a hangover parameter, whether an error exists in the determined class of the current frame;
correct the determined class of the current frame when it is determined that an error exists in the determined class of the current frame; and
perform different encoding processing on the current frame based on the determined class of the current frame or the corrected class of the current frame.
6. The apparatus of claim 5, wherein the processing unit is configured to:
correct the determined class of the current frame to the speech class when an error exists in the determined class of the current frame and the determined class of the current frame is the music class; and
correct the determined class of the current frame to the music class when an error exists in the determined class of the current frame and the determined class of the current frame is the speech class.
7. The apparatus of claim 6, wherein the feature parameters include a tonality and a linear prediction error.
8. The apparatus of claim 5, wherein the feature parameters further include a difference between a voicing parameter and a correlation parameter.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201261725694P | 2012-11-13 | 2012-11-13 | |
US61/725,694 | 2012-11-13 | ||
CN201380070268.6A CN104919524B (en) | 2012-11-13 | 2013-11-13 | Method and apparatus for determining a coding mode, method and apparatus for encoding an audio signal, and method and apparatus for decoding an audio signal
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201380070268.6A Division CN104919524B (en) | 2012-11-13 | 2013-11-13 | Method and apparatus for determining a coding mode, method and apparatus for encoding an audio signal, and method and apparatus for decoding an audio signal
Publications (2)
Publication Number | Publication Date |
---|---|
CN107958670A true CN107958670A (en) | 2018-04-24 |
CN107958670B CN107958670B (en) | 2021-11-19 |
Family
ID=50731440
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711424971.9A Active CN108074579B (en) | 2012-11-13 | 2013-11-13 | Method for determining coding mode and audio coding method |
CN201380070268.6A Active CN104919524B (en) | 2012-11-13 | 2013-11-13 | Method and apparatus for determining a coding mode, method and apparatus for encoding an audio signal, and method and apparatus for decoding an audio signal |
CN201711421463.5A Active CN107958670B (en) | 2012-11-13 | 2013-11-13 | Device for determining coding mode and audio coding device |
Family Applications Before (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711424971.9A Active CN108074579B (en) | 2012-11-13 | 2013-11-13 | Method for determining coding mode and audio coding method |
CN201380070268.6A Active CN104919524B (en) | 2012-11-13 | 2013-11-13 | Method and apparatus for determining a coding mode, method and apparatus for encoding an audio signal, and method and apparatus for decoding an audio signal |
Country Status (18)
Country | Link |
---|---|
US (3) | US20140188465A1 (en) |
EP (2) | EP2922052B1 (en) |
JP (2) | JP6170172B2 (en) |
KR (3) | KR102446441B1 (en) |
CN (3) | CN108074579B (en) |
AU (2) | AU2013345615B2 (en) |
BR (1) | BR112015010954B1 (en) |
CA (1) | CA2891413C (en) |
ES (1) | ES2900594T3 (en) |
MX (2) | MX349196B (en) |
MY (1) | MY188080A (en) |
PH (1) | PH12015501114A1 (en) |
PL (1) | PL2922052T3 (en) |
RU (3) | RU2630889C2 (en) |
SG (2) | SG11201503788UA (en) |
TW (2) | TWI612518B (en) |
WO (1) | WO2014077591A1 (en) |
ZA (1) | ZA201504289B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6599368B2 (en) | 2014-02-24 | 2019-10-30 | サムスン エレクトロニクス カンパニー リミテッド | Signal classification method and apparatus, and audio encoding method and apparatus using the same |
US9886963B2 (en) * | 2015-04-05 | 2018-02-06 | Qualcomm Incorporated | Encoder selection |
CN107731238B (en) | 2016-08-10 | 2021-07-16 | 华为技术有限公司 | Coding method and coder for multi-channel signal |
CN109389987B (en) | 2017-08-10 | 2022-05-10 | 华为技术有限公司 | Audio coding and decoding mode determining method and related product |
US10325588B2 (en) | 2017-09-28 | 2019-06-18 | International Business Machines Corporation | Acoustic feature extractor selected according to status flag of frame of acoustic signal |
US11032580B2 (en) | 2017-12-18 | 2021-06-08 | Dish Network L.L.C. | Systems and methods for facilitating a personalized viewing experience |
US10365885B1 (en) * | 2018-02-21 | 2019-07-30 | Sling Media Pvt. Ltd. | Systems and methods for composition of audio content from multi-object audio |
CN111081264B (en) * | 2019-12-06 | 2022-03-29 | 北京明略软件系统有限公司 | Voice signal processing method, device, equipment and storage medium |
WO2023048410A1 (en) * | 2021-09-24 | 2023-03-30 | 삼성전자 주식회사 | Electronic device for data packet transmission or reception, and operation method thereof |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030101050A1 (en) * | 2001-11-29 | 2003-05-29 | Microsoft Corporation | Real-time speech and music classifier |
CN1954364A (en) * | 2004-05-17 | 2007-04-25 | 诺基亚公司 | Audio encoding with different coding frame lengths |
CN101197135A (en) * | 2006-12-05 | 2008-06-11 | 华为技术有限公司 | Aural signal classification method and device |
CN101350199A (en) * | 2008-07-29 | 2009-01-21 | 北京中星微电子有限公司 | Audio encoder and audio encoding method |
US20120069899A1 (en) * | 2002-09-04 | 2012-03-22 | Microsoft Corporation | Entropy encoding and decoding using direct level and run-length/level context-adaptive arithmetic coding/decoding modes |
Family Cites Families (51)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2102080C (en) * | 1992-12-14 | 1998-07-28 | Willem Bastiaan Kleijn | Time shifting for generalized analysis-by-synthesis coding |
DE69926821T2 (en) * | 1998-01-22 | 2007-12-06 | Deutsche Telekom Ag | Method for signal-controlled switching between different audio coding systems |
JP3273599B2 (en) | 1998-06-19 | 2002-04-08 | 沖電気工業株式会社 | Speech coding rate selector and speech coding device |
US6691084B2 (en) * | 1998-12-21 | 2004-02-10 | Qualcomm Incorporated | Multiple mode variable rate speech coding |
US6704711B2 (en) * | 2000-01-28 | 2004-03-09 | Telefonaktiebolaget Lm Ericsson (Publ) | System and method for modifying speech signals |
US6658383B2 (en) * | 2001-06-26 | 2003-12-02 | Microsoft Corporation | Method for coding speech and music signals |
WO2004034379A2 (en) * | 2002-10-11 | 2004-04-22 | Nokia Corporation | Methods and devices for source controlled variable bit-rate wideband speech coding |
US20050096898A1 (en) * | 2003-10-29 | 2005-05-05 | Manoj Singhal | Classification of speech and music using sub-band energy |
FI118834B (en) * | 2004-02-23 | 2008-03-31 | Nokia Corp | Classification of audio signals |
US7512536B2 (en) * | 2004-05-14 | 2009-03-31 | Texas Instruments Incorporated | Efficient filter bank computation for audio coding |
US7739120B2 (en) * | 2004-05-17 | 2010-06-15 | Nokia Corporation | Selection of coding models for encoding an audio signal |
WO2006137425A1 (en) * | 2005-06-23 | 2006-12-28 | Matsushita Electric Industrial Co., Ltd. | Audio encoding apparatus, audio decoding apparatus and audio encoding information transmitting apparatus |
US7733983B2 (en) * | 2005-11-14 | 2010-06-08 | Ibiquity Digital Corporation | Symbol tracking for AM in-band on-channel radio receivers |
US7558809B2 (en) * | 2006-01-06 | 2009-07-07 | Mitsubishi Electric Research Laboratories, Inc. | Task specific audio classification for identifying video highlights |
US8346544B2 (en) * | 2006-01-20 | 2013-01-01 | Qualcomm Incorporated | Selection of encoding modes and/or encoding rates for speech compression with closed loop re-decision |
KR100790110B1 (en) * | 2006-03-18 | 2008-01-02 | Samsung Electronics Co., Ltd. | Apparatus and method of voice signal codec based on morphological approach |
EP2092517B1 (en) * | 2006-10-10 | 2012-07-18 | QUALCOMM Incorporated | Method and apparatus for encoding and decoding audio signals |
CN101197130B (en) * | 2006-12-07 | 2011-05-18 | Huawei Technologies Co., Ltd. | Sound activity detecting method and detector thereof |
KR100964402B1 (en) * | 2006-12-14 | 2010-06-17 | Samsung Electronics Co., Ltd. | Method and apparatus for determining encoding mode of audio signal, and method and apparatus for encoding/decoding audio signal using it |
CN101025918B (en) * | 2007-01-19 | 2011-06-29 | Tsinghua University | Voice/music dual-mode coding-decoding seamless switching method |
KR20080075050A (en) * | 2007-02-10 | 2008-08-14 | Samsung Electronics Co., Ltd. | Method and apparatus for updating parameter of error frame |
US8060363B2 (en) * | 2007-02-13 | 2011-11-15 | Nokia Corporation | Audio signal encoding |
CN101256772B (en) * | 2007-03-02 | 2012-02-15 | Huawei Technologies Co., Ltd. | Method and device for determining attribution class of non-noise audio signal |
US9653088B2 (en) * | 2007-06-13 | 2017-05-16 | Qualcomm Incorporated | Systems, methods, and apparatus for signal encoding using pitch-regularizing and non-pitch-regularizing coding |
ES2533358T3 (en) * | 2007-06-22 | 2015-04-09 | Voiceage Corporation | Procedure and device to estimate the tone of a sound signal |
KR101380170B1 (en) * | 2007-08-31 | 2014-04-02 | Samsung Electronics Co., Ltd. | A method for encoding/decoding a media signal and an apparatus thereof |
CN101393741A (en) * | 2007-09-19 | 2009-03-25 | ZTE Corporation | Audio signal classification apparatus and method used in wideband audio encoder and decoder |
CN101399039B (en) * | 2007-09-30 | 2011-05-11 | Huawei Technologies Co., Ltd. | Method and device for determining non-noise audio signal classification |
CN101236742B (en) * | 2008-03-03 | 2011-08-10 | ZTE Corporation | Music/non-music real-time detection method and device |
AU2009220321B2 (en) * | 2008-03-03 | 2011-09-22 | Intellectual Discovery Co., Ltd. | Method and apparatus for processing audio signal |
JP2011518345A (en) * | 2008-03-14 | 2011-06-23 | ドルビー・ラボラトリーズ・ライセンシング・コーポレーション | Multi-mode coding of speech-like and non-speech-like signals |
WO2009118044A1 (en) * | 2008-03-26 | 2009-10-01 | Nokia Corporation | An audio signal classifier |
EP2144231A1 (en) * | 2008-07-11 | 2010-01-13 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Low bitrate audio encoding/decoding scheme with common preprocessing |
EP2144230A1 (en) | 2008-07-11 | 2010-01-13 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Low bitrate audio encoding/decoding scheme having cascaded switches |
WO2010003521A1 (en) * | 2008-07-11 | 2010-01-14 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Method and discriminator for classifying different segments of a signal |
JP5555707B2 (en) * | 2008-10-08 | 2014-07-23 | フラウンホーファー−ゲゼルシャフト・ツール・フェルデルング・デル・アンゲヴァンテン・フォルシュング・アインゲトラーゲネル・フェライン | Multi-resolution switching audio encoding and decoding scheme |
CN101751920A (en) * | 2008-12-19 | 2010-06-23 | Digital Rise Technology (Beijing) Co., Ltd. | Audio classification and implementation method based on reclassification |
KR101622950B1 (en) * | 2009-01-28 | 2016-05-23 | Samsung Electronics Co., Ltd. | Method of coding/decoding audio signal and apparatus for enabling the method |
JP4977157B2 (en) * | 2009-03-06 | 2012-07-18 | NTT Docomo, Inc. | Sound signal encoding method, sound signal decoding method, encoding device, decoding device, sound signal processing system, sound signal encoding program, and sound signal decoding program |
CN101577117B (en) * | 2009-03-12 | 2012-04-11 | Wuxi Vimicro Co., Ltd. | Extracting method of accompaniment music and device |
CN101847412B (en) * | 2009-03-27 | 2012-02-15 | Huawei Technologies Co., Ltd. | Method and device for classifying audio signals |
US20100253797A1 (en) * | 2009-04-01 | 2010-10-07 | Samsung Electronics Co., Ltd. | Smart flash viewer |
KR20100115215A (en) * | 2009-04-17 | 2010-10-27 | Samsung Electronics Co., Ltd. | Apparatus and method for audio encoding/decoding according to variable bit rate |
KR20110022252A (en) * | 2009-08-27 | 2011-03-07 | Samsung Electronics Co., Ltd. | Method and apparatus for encoding/decoding stereo audio |
PL2491555T3 (en) * | 2009-10-20 | 2014-08-29 | Fraunhofer Ges Forschung | Multi-mode audio codec |
CN102237085B (en) * | 2010-04-26 | 2013-08-14 | Huawei Technologies Co., Ltd. | Method and device for classifying audio signals |
JP5749462B2 (en) | 2010-08-13 | 2015-07-15 | NTT Docomo, Inc. | Audio decoding apparatus, audio decoding method, audio decoding program, audio encoding apparatus, audio encoding method, and audio encoding program |
CN102446504B (en) * | 2010-10-08 | 2013-10-09 | Huawei Technologies Co., Ltd. | Voice/music identifying method and equipment |
CN102385863B (en) * | 2011-10-10 | 2013-02-20 | Hangzhou Mijia Technology Co., Ltd. | Sound coding method based on speech music classification |
US9111531B2 (en) * | 2012-01-13 | 2015-08-18 | Qualcomm Incorporated | Multiple coding mode signal classification |
WO2014010175A1 (en) * | 2012-07-09 | 2014-01-16 | Panasonic Corporation | Encoding device and encoding method |
- 2013
- 2013-11-13 CN CN201711424971.9A patent/CN108074579B/en active Active
- 2013-11-13 JP JP2015542948A patent/JP6170172B2/en active Active
- 2013-11-13 BR BR112015010954-3A patent/BR112015010954B1/en active IP Right Grant
- 2013-11-13 TW TW102141400A patent/TWI612518B/en active
- 2013-11-13 KR KR1020217038093A patent/KR102446441B1/en active IP Right Grant
- 2013-11-13 RU RU2015122128A patent/RU2630889C2/en active
- 2013-11-13 CN CN201380070268.6A patent/CN104919524B/en active Active
- 2013-11-13 MX MX2015006028A patent/MX349196B/en active IP Right Grant
- 2013-11-13 AU AU2013345615A patent/AU2013345615B2/en active Active
- 2013-11-13 MX MX2017009362A patent/MX361866B/en unknown
- 2013-11-13 ES ES13854639T patent/ES2900594T3/en active Active
- 2013-11-13 US US14/079,090 patent/US20140188465A1/en not_active Abandoned
- 2013-11-13 KR KR1020157012623A patent/KR102331279B1/en active IP Right Grant
- 2013-11-13 MY MYPI2015701531A patent/MY188080A/en unknown
- 2013-11-13 PL PL13854639T patent/PL2922052T3/en unknown
- 2013-11-13 TW TW106140629A patent/TWI648730B/en active
- 2013-11-13 EP EP13854639.5A patent/EP2922052B1/en active Active
- 2013-11-13 CA CA2891413A patent/CA2891413C/en active Active
- 2013-11-13 SG SG11201503788UA patent/SG11201503788UA/en unknown
- 2013-11-13 CN CN201711421463.5A patent/CN107958670B/en active Active
- 2013-11-13 SG SG10201706626XA patent/SG10201706626XA/en unknown
- 2013-11-13 EP EP21192621.7A patent/EP3933836A1/en active Pending
- 2013-11-13 KR KR1020227032281A patent/KR102561265B1/en active IP Right Grant
- 2013-11-13 WO PCT/KR2013/010310 patent/WO2014077591A1/en active Application Filing
- 2013-11-13 RU RU2017129727A patent/RU2656681C1/en active
- 2015
- 2015-05-13 PH PH12015501114A patent/PH12015501114A1/en unknown
- 2015-06-12 ZA ZA2015/04289A patent/ZA201504289B/en unknown
- 2017
- 2017-06-29 JP JP2017127285A patent/JP6530449B2/en active Active
- 2017-07-20 AU AU2017206243A patent/AU2017206243B2/en active Active
- 2018
- 2018-04-18 RU RU2018114257A patent/RU2680352C1/en active
- 2018-07-18 US US16/039,110 patent/US10468046B2/en active Active
- 2019
- 2019-10-04 US US16/593,041 patent/US11004458B2/en active Active
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104919524B (en) | Method and apparatus for determining a coding mode, method and apparatus for encoding an audio signal, and method and apparatus for decoding an audio signal | |
TWI459379B (en) | Audio encoder and decoder for encoding and decoding audio samples | |
CN103493129B (en) | Apparatus and method for coding a segment of an audio signal using transient detection and a quality result | |
MX2011000362A (en) | Low bitrate audio encoding/decoding scheme having cascaded switches. | |
CN107112022A (en) | Method and apparatus for packet loss concealment, and encoding/decoding method and device using the same | |
US11922962B2 (en) | Unified speech/audio codec (USAC) processing windows sequence based mode switching | |
US20240212698A1 (en) | Unified speech/audio codec (usac) processing windows sequence based mode switching | |
JP2002244700A (en) | Device and method for sound encoding and storage element |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||