CN107958670A - Apparatus for determining encoding mode, and audio encoding apparatus - Google Patents

Apparatus for determining encoding mode, and audio encoding apparatus

Info

Publication number
CN107958670A
CN107958670A (application CN201711421463.5A)
Authority
CN
China
Prior art keywords
coding mode
present frame
classification
coding
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711421463.5A
Other languages
Chinese (zh)
Other versions
CN107958670B (en)
Inventor
朱基岘
安东·维克托维奇·波罗夫
康斯坦丁·谢尔盖耶维奇·奥斯波夫
李男淑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Samsung Electronics Co Ltd
Original Assignee
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co Ltd filed Critical Samsung Electronics Co Ltd
Publication of CN107958670A publication Critical patent/CN107958670A/en
Application granted granted Critical
Publication of CN107958670B publication Critical patent/CN107958670B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/005Correction of errors induced by the transmission channel, if related to the coding algorithm
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/12Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
    • G10L19/16Vocoder architecture
    • G10L19/18Vocoders using multiple modes
    • G10L19/20Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding
    • G10L19/22Mode decision, i.e. based on audio signal content versus external parameters

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)

Abstract

An apparatus for determining an encoding mode and an audio encoding apparatus are provided. A method of determining an encoding mode includes: determining, according to characteristics of an audio signal, one of a plurality of encoding modes including a first encoding mode and a second encoding mode as an initial encoding mode; and, if there is an error in the determination of the initial encoding mode, generating a corrected encoding mode by correcting the initial encoding mode to a third encoding mode.

Description

Apparatus for determining encoding mode, and audio encoding apparatus
This application is a divisional of application No. 201380070268.6, filed with the China Intellectual Property Office on November 13, 2013 and entitled "Method and apparatus for determining an encoding mode, method and apparatus for encoding an audio signal, and method and apparatus for decoding an audio signal."
Technical field
Apparatuses and methods consistent with exemplary embodiments relate to audio encoding and audio decoding, and more particularly, to a method and apparatus for determining an encoding mode that improves the quality of a reconstructed audio signal by selecting an encoding mode suited to the characteristics of the audio signal while preventing frequent encoding-mode switching, a method and apparatus for encoding an audio signal, and a method and apparatus for decoding an audio signal.
Background technology
It is well known that encoding a music signal in the frequency domain is efficient, while encoding a speech signal in the time domain is efficient. Accordingly, various techniques have been proposed for determining the class of an audio signal in which music and speech are mixed and for determining an encoding mode corresponding to the determined class.
However, frequent switching of the encoding mode not only introduces delay but also degrades decoded sound quality. In addition, because there has been no technique for correcting an initially determined encoding mode (that is, class), the quality of the reconstructed audio signal degrades if an error occurs while determining the encoding mode.
The content of the invention
Technical problem
Aspects of one or more exemplary embodiments provide a method and apparatus for determining an encoding mode that improves the quality of a reconstructed audio signal by determining an encoding mode suited to the characteristics of the audio signal, a method and apparatus for encoding an audio signal, and a method and apparatus for decoding an audio signal.
Aspects of one or more exemplary embodiments also provide a method and apparatus for determining an encoding mode suited to the characteristics of an audio signal while reducing the delay caused by frequent encoding-mode switching, a method and apparatus for encoding an audio signal, and a method and apparatus for decoding an audio signal.
Solution
According to an aspect of one or more exemplary embodiments, a method of determining an encoding mode includes: determining, according to characteristics of an audio signal, one of a plurality of encoding modes including a first encoding mode and a second encoding mode as an initial encoding mode; and, if there is an error in the determination of the initial encoding mode, generating a corrected encoding mode by correcting the initial encoding mode to a third encoding mode.
According to an aspect of one or more exemplary embodiments, a method of encoding an audio signal includes: determining, according to characteristics of the audio signal, one of a plurality of encoding modes including a first encoding mode and a second encoding mode as an initial encoding mode; if there is an error in the determination of the initial encoding mode, generating a corrected encoding mode by correcting the initial encoding mode to a third encoding mode; and performing different encoding processes on the audio signal based on the initial encoding mode or the corrected encoding mode.
According to an aspect of one or more exemplary embodiments, a method of decoding an audio signal includes: parsing a bitstream including one of an initial encoding mode and a third encoding mode, and performing different decoding processes on the bitstream based on the initial encoding mode or the third encoding mode, wherein the initial encoding mode is obtained by determining, according to characteristics of the audio signal, one of a plurality of encoding modes including a first encoding mode and a second encoding mode, and the third encoding mode is obtained by correcting the initial encoding mode when there is an error in the determination of the initial encoding mode.
Beneficial effect
According to exemplary embodiments, by determining the final encoding mode of a current frame based on a correction of the initial encoding mode and on the encoding modes of the frames corresponding to a hangover length, an encoding mode suited to the characteristics of the audio signal can be selected while preventing frequent encoding-mode switching between frames.
Brief description of the drawings
Fig. 1 is a block diagram of a configuration of an audio encoding apparatus according to an exemplary embodiment;
Fig. 2 is a block diagram of a configuration of an audio encoding apparatus according to another exemplary embodiment;
Fig. 3 is a block diagram of a configuration of an encoding mode determination unit according to an exemplary embodiment;
Fig. 4 is a block diagram of a configuration of an initial encoding mode determination unit according to an exemplary embodiment;
Fig. 5 is a block diagram of a configuration of a feature parameter extraction unit according to an exemplary embodiment;
Fig. 6 is a diagram of a method of adaptively switching between linear-prediction-domain encoding and spectral-domain encoding, according to an exemplary embodiment;
Fig. 7 is a diagram of an operation of an encoding mode correction unit according to an exemplary embodiment;
Fig. 8 is a block diagram of a configuration of an audio decoding apparatus according to an exemplary embodiment;
Fig. 9 is a block diagram of a configuration of an audio decoding apparatus according to another exemplary embodiment.
Embodiment
Embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout. In this regard, the present embodiments may have different forms and should not be construed as being limited to the descriptions set forth herein. Accordingly, the embodiments below are described, with reference to the figures, merely to explain aspects of the present description.
Terms such as "connected" and "linked" may be used to indicate a directly connected or linked state, but it should be understood that another component may be interposed therebetween.
Terms such as "first" and "second" may be used to describe various components, but the components should not be limited by these terms. The terms may be used only to distinguish one component from another.
The units described in the exemplary embodiments are illustrated independently to indicate distinct characteristic functions; this does not mean that each unit is formed of a separate hardware or software component. Each unit is illustrated for convenience of explanation; multiple units may form one unit, and one unit may be divided into multiple units.
Fig. 1 is the block diagram for the configuration for showing audio coding apparatus 100 accoding to exemplary embodiment.
The audio encoding apparatus 100 shown in Fig. 1 may include an encoding mode determination unit 110, a switching unit 120, a spectral-domain encoding unit 130, a linear-prediction-domain encoding unit 140, and a bitstream generation unit 150. The linear-prediction-domain encoding unit 140 may include a time-domain excitation encoding unit 141 and a frequency-domain excitation encoding unit 143, and may be implemented as at least one of the time-domain excitation encoding unit 141 and the frequency-domain excitation encoding unit 143. Unless they must be implemented as separate hardware, the above components may be integrated into at least one module and may be implemented as at least one processor (not shown). Here, the term audio signal may refer to a music signal, a speech signal, or a mixture of the two.
Referring to Fig. 1, the encoding mode determination unit 110 may analyze the characteristics of the audio signal to determine the class of the audio signal, and may determine an encoding mode according to the classification result. The encoding mode may be determined in units of superframes, frames, or bands. Alternatively, the encoding mode may be determined in units of groups of superframes, groups of frames, or groups of bands. Examples of the encoding modes may include a spectral-domain mode and a time-domain or linear-prediction-domain mode, but are not limited thereto. If the performance and processing speed of the processor are sufficient and the delay caused by encoding-mode switching can be resolved, the encoding modes may be subdivided, and the encoding schemes may likewise be subdivided according to the encoding modes. According to an exemplary embodiment, the encoding mode determination unit 110 may determine the initial encoding mode of the audio signal as one of a spectral-domain encoding mode and a time-domain encoding mode. According to another exemplary embodiment, the encoding mode determination unit 110 may determine the initial encoding mode of the audio signal as one of a spectral-domain encoding mode, a time-domain excitation encoding mode, and a frequency-domain excitation encoding mode. If the spectral-domain encoding mode is determined as the initial encoding mode, the encoding mode determination unit 110 may correct the initial encoding mode to one of the spectral-domain encoding mode and the frequency-domain excitation encoding mode. If the time-domain encoding mode (that is, the time-domain excitation encoding mode) is determined as the initial encoding mode, the encoding mode determination unit 110 may correct the initial encoding mode to one of the time-domain excitation encoding mode and the frequency-domain excitation encoding mode. If the time-domain excitation encoding mode is determined as the initial encoding mode, the determination of the final encoding mode may be performed selectively; in other words, the initial encoding mode (that is, the time-domain excitation encoding mode) may be maintained. The encoding mode determination unit 110 may determine the encoding modes of a plurality of frames corresponding to a hangover length, and may determine the final encoding mode for the current frame. According to an exemplary embodiment, if the initial encoding mode or the corrected encoding mode of the current frame is identical to the encoding modes of a plurality of previous frames (e.g., 7 previous frames), the corresponding initial or corrected encoding mode may be determined as the final encoding mode of the current frame. Otherwise, the encoding mode determination unit 110 may determine the encoding mode of the frame immediately before the current frame as the final encoding mode of the current frame.
As described above, by determining the final encoding mode of the current frame based on a correction of the initial encoding mode and on the encoding modes of the frames corresponding to the hangover length, an encoding mode suited to the characteristics of the audio signal can be selected while preventing frequent encoding-mode switching between frames.
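The hangover rule described above can be sketched as follows. This is an illustrative sketch, not the patent's implementation; the function and variable names, and the representation of the frame history as a list, are assumptions.

```python
# Hypothetical sketch of the hangover-based final-mode decision: if the
# current frame's (possibly corrected) candidate mode matches the modes of
# all frames in the hangover window (e.g. the 7 previous frames), adopt it;
# otherwise keep the mode of the frame immediately before the current one.

def final_coding_mode(candidate_mode, previous_modes, hangover=7):
    """candidate_mode: initial or corrected mode of the current frame.
    previous_modes: modes of earlier frames, most recent last."""
    window = previous_modes[-hangover:]
    if len(window) == hangover and all(m == candidate_mode for m in window):
        return candidate_mode          # stable history: accept the new mode
    return previous_modes[-1]          # unstable: keep previous frame's mode
```

With a history of seven time-domain frames, a single frame classified as spectral-domain is suppressed, which is exactly the frequent-switching prevention the text describes.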
In general, time-domain encoding (that is, time-domain excitation encoding) may be efficient for speech signals, spectral-domain encoding may be efficient for music signals, and frequency-domain excitation encoding may be efficient for vocal signals and/or harmonic signals.
According to the encoding mode determined by the encoding mode determination unit 110, the switching unit 120 may provide the audio signal to the spectral-domain encoding unit 130 or the linear-prediction-domain encoding unit 140. If the linear-prediction-domain encoding unit 140 is implemented as the time-domain excitation encoding unit 141, the switching unit 120 may include two branches in total. If the linear-prediction-domain encoding unit 140 is implemented as the time-domain excitation encoding unit 141 and the frequency-domain excitation encoding unit 143, the switching unit 120 may have three branches in total.
The spectral-domain encoding unit 130 may encode the audio signal in the spectral domain. The spectral domain may refer to the frequency domain or a transform domain. Examples of encoding methods suitable for the spectral-domain encoding unit 130 may include Advanced Audio Coding (AAC), or a combination of the modified discrete cosine transform (MDCT) and factorial pulse coding (FPC), but are not limited thereto. In detail, other quantization techniques and entropy-coding techniques may be used instead of FPC. Encoding music signals in the spectral-domain encoding unit 130 may be efficient.
The linear-prediction-domain encoding unit 140 may encode the audio signal in the linear prediction domain. The linear prediction domain may refer to an excitation domain or the time domain. The linear-prediction-domain encoding unit 140 may be implemented as the time-domain excitation encoding unit 141, or may be implemented to include the time-domain excitation encoding unit 141 and the frequency-domain excitation encoding unit 143. Examples of encoding methods suitable for the time-domain excitation encoding unit 141 may include code-excited linear prediction (CELP) or algebraic CELP (ACELP), but are not limited thereto. Examples of encoding methods suitable for the frequency-domain excitation encoding unit 143 may include generic signal coding (GSC) or transform coded excitation (TCX), but are not limited thereto. Encoding speech signals in the time-domain excitation encoding unit 141 may be efficient, and encoding vocal signals and/or harmonic signals in the frequency-domain excitation encoding unit 143 may be efficient.
The bitstream generation unit 150 may generate a bitstream including the encoding mode provided by the encoding mode determination unit 110, the encoding result provided by the spectral-domain encoding unit 130, and the encoding result provided by the linear-prediction-domain encoding unit 140.
Fig. 2 is the block diagram for the configuration for showing audio coding apparatus 200 according to another exemplary embodiment.
The audio encoding apparatus 200 shown in Fig. 2 may include a common preprocessing module 205, an encoding mode determination unit 210, a switching unit 220, a spectral-domain encoding unit 230, a linear-prediction-domain encoding unit 240, and a bitstream generation unit 250. Here, the linear-prediction-domain encoding unit 240 may include a time-domain excitation encoding unit 241 and a frequency-domain excitation encoding unit 243, or may be implemented as either the time-domain excitation encoding unit 241 or the frequency-domain excitation encoding unit 243. Compared with the audio encoding apparatus 100 shown in Fig. 1, the audio encoding apparatus 200 additionally includes the common preprocessing module 205; therefore, descriptions of components identical to those of the audio encoding apparatus 100 are omitted.
Referring to Fig. 2, the common preprocessing module 205 may perform joint stereo processing, surround processing, and/or bandwidth extension processing. The joint stereo processing, surround processing, and bandwidth extension processing may be identical to those employed by a specific standard (e.g., an MPEG standard), but are not limited thereto. The output of the common preprocessing module 205 may be mono, stereo, or multichannel. According to the number of channels of the signal output by the common preprocessing module 205, the switching unit 220 may include at least one switch. For example, if the common preprocessing module 205 outputs a signal of two or more channels (that is, stereo or multichannel), a switch corresponding to each channel may be arranged. For example, the first channel of a stereo signal may be a speech channel, and the second channel may be a music channel; in this case, the audio signal may be provided to the two switches simultaneously. The additional information generated by the common preprocessing module 205 may be provided to the bitstream generation unit 250 and included in the bitstream. This additional information is necessary for performing the joint stereo processing, surround processing, and/or bandwidth extension processing at the decoding end, and may include spatial parameters, envelope information, energy information, and the like. However, various kinds of additional information may exist, depending on the processing techniques applied.
According to an exemplary embodiment, in the common preprocessing module 205, the bandwidth extension processing may be performed differently depending on the encoding domain. The audio signal in the core band may be processed by using the time-domain excitation encoding mode or the frequency-domain excitation encoding mode, and the audio signal in the bandwidth extension band may be processed in the time domain. The bandwidth extension processing in the time domain may include a plurality of modes (including a voiced mode or an unvoiced mode). Alternatively, the audio signal in the core band may be processed by using the spectral-domain encoding mode, and the audio signal in the bandwidth extension band may be processed in the frequency domain. The bandwidth extension processing in the frequency domain may include a plurality of modes (including a transient mode, a normal mode, or a harmonic mode). To perform the bandwidth extension processing in different domains, the encoding mode determined by the encoding mode determination unit 110 may be provided to the common preprocessing module 205 as signaling information. According to an exemplary embodiment, the last portion of the core band and the first portion of the bandwidth extension band may overlap each other to some extent. The position and size of the overlap may be set in advance.
Fig. 3 is the block diagram for the configuration for showing coding mode determination unit 300 accoding to exemplary embodiment.
The encoding mode determination unit 300 shown in Fig. 3 may include an initial encoding mode determination unit 310 and an encoding mode correction unit 330.
Referring to Fig. 3, the initial encoding mode determination unit 310 may determine whether the audio signal is a music signal or a speech signal by using feature parameters extracted from the audio signal. If the audio signal is determined to be a speech signal, linear-prediction-domain encoding may be suitable. If the audio signal is determined to be a music signal, spectral-domain encoding may be suitable. The initial encoding mode determination unit 310 may determine the class of the audio signal by using the feature parameters extracted from the audio signal, wherein the class of the audio signal indicates whether spectral-domain encoding, time-domain excitation encoding, or frequency-domain excitation encoding is suitable for the audio signal. A corresponding encoding mode may be determined based on the class of the audio signal. If the switching unit (120 of Fig. 1) has two branches, the encoding mode may be represented with 1 bit; if the switching unit (120 of Fig. 1) has three branches, the encoding mode may be represented with 2 bits. The initial encoding mode determination unit 310 may determine whether the audio signal is a music signal or a speech signal by using any of various techniques well known in the art. Examples may include the FD/LPD classification or the ACELP/TCX classification disclosed in the encoder part of the USAC standard, and the ACELP/TCX classification used in the AMR standards, but are not limited thereto. In other words, the initial encoding mode may be determined by using any of various methods other than the method according to the embodiment described herein.
The encoding mode correction unit 330 may determine a corrected encoding mode by correcting, using correction parameters, the initial encoding mode determined by the initial encoding mode determination unit 310. According to an exemplary embodiment, if the spectral-domain encoding mode is determined as the initial encoding mode, the initial encoding mode may be corrected to the frequency-domain excitation encoding mode based on the correction parameters. If the time-domain encoding mode is determined as the initial encoding mode, the initial encoding mode may be corrected to the frequency-domain excitation encoding mode based on the correction parameters. In other words, it is determined, by using the correction parameters, whether there is an error in the determination of the initial encoding mode. If it is determined that there is no error in the determination of the initial encoding mode, the initial encoding mode may be maintained. If it is determined that there is an error in the determination of the initial encoding mode, the initial encoding mode may be corrected. Corrections of the initial encoding mode from the spectral-domain encoding mode to the frequency-domain excitation encoding mode, and from the time-domain excitation encoding mode to the frequency-domain excitation encoding mode, may be obtained.
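The correction rules just described can be sketched as a small lookup. The error test driven by the correction parameters is abstracted as a boolean; the mode names and structure are illustrative assumptions, not taken from the patent.

```python
# Hypothetical sketch of the correction step: a spectral-domain initial mode
# or a time-domain (excitation) initial mode may be corrected to the
# frequency-domain excitation mode when an error is detected; otherwise the
# initial mode is maintained.

SPECTRAL, TD_EXCITATION, FD_EXCITATION = "spectral", "td_exc", "fd_exc"

# initial mode -> mode it may be corrected to
_ALLOWED_CORRECTION = {
    SPECTRAL: FD_EXCITATION,
    TD_EXCITATION: FD_EXCITATION,
}

def corrected_mode(initial_mode, error_detected):
    """Return the corrected encoding mode for one frame."""
    if error_detected and initial_mode in _ALLOWED_CORRECTION:
        return _ALLOWED_CORRECTION[initial_mode]
    return initial_mode
```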
Meanwhile, the initial encoding mode or the corrected encoding mode may be a temporary encoding mode for the current frame, wherein the temporary encoding mode for the current frame may be compared with the encoding modes of the previous frames within a preset hangover length, and the final encoding mode for the current frame may be determined.
Fig. 4 is the block diagram for the configuration for showing initial code pattern determining unit 400 accoding to exemplary embodiment.
The initial encoding mode determination unit 400 shown in Fig. 4 may include a feature parameter extraction unit 410 and a determination unit 430.
Referring to Fig. 4, the feature parameter extraction unit 410 may extract, from the audio signal, the feature parameters necessary for determining the encoding mode. Examples of the extracted feature parameters include at least one or two of a pitch parameter, a voicing parameter, a correlation parameter, and a linear prediction error, but are not limited thereto. A detailed description of each parameter is given below.
First, the first feature parameter F1 relates to the pitch parameter, wherein the behavior of the pitch may be determined by using N pitch values detected in the current frame and at least one previous frame. To prevent the effect of random deviations or erroneous pitch values, the M pitch values that differ significantly from the mean of the N pitch values may be removed. Here, N and M may be values obtained in advance via experiment or simulation. In addition, N may be set in advance, and the difference between a removed pitch value and the mean of the N pitch values may be determined in advance via experiment or simulation. By using the mean mp' and the variance σp' of the (N−M) pitch values, the first feature parameter F1 may be expressed as shown in Equation 1 below.
[equation 1]
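The body of Equation 1 did not survive extraction, so the sketch below only implements the preparation step the text describes: from N pitch values, discard the M values farthest from the mean, then compute the mean mp' and variance σp' of the remaining (N−M) values from which F1 is formed. How the patent combines the two statistics is not reproduced here; all names are illustrative.

```python
# Trimmed pitch statistics for the first feature parameter F1 (sketch).

def trimmed_pitch_stats(pitch_values, m_discard):
    """Drop the m_discard pitch values farthest from the mean, then return
    (mean, variance) of the remaining values."""
    n = len(pitch_values)
    mean_all = sum(pitch_values) / n
    # keep the (n - m_discard) values closest to the overall mean
    kept = sorted(pitch_values, key=lambda p: abs(p - mean_all))[: n - m_discard]
    mean_kept = sum(kept) / len(kept)
    var_kept = sum((p - mean_kept) ** 2 for p in kept) / len(kept)
    return mean_kept, var_kept
```

A single outlier (e.g. a pitch-doubling error) is removed before the statistics are taken, which is the stated purpose of the trimming.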
The second feature parameter F2 also relates to the pitch parameter, and may indicate the reliability of the pitch values detected in the current frame. By using the variances σSF1 and σSF2 of the pitch values respectively detected in two subframes SF1 and SF2 of the current frame, the second feature parameter F2 may be expressed as shown in Equation 2 below.
[equation 2]
Here, cov(SF1, SF2) represents the covariance between the subframes SF1 and SF2. In other words, the second feature parameter F2 indicates the correlation between the two subframes as a pitch distance. According to an exemplary embodiment, the current frame may include two or more subframes, and Equation 2 may be modified based on the number of subframes.
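Equation 2's body is missing from this text. Since the passage says F2 expresses the correlation between the two subframes via their covariance and variances, a normalized correlation (covariance over the product of standard deviations) is one natural reading; this is an assumption, not necessarily the patent's exact formula.

```python
# Sketch of a subframe pitch correlation in the spirit of F2 (assumed form).

def pitch_correlation(sf1, sf2):
    """Normalized correlation between the pitch tracks of two subframes."""
    n = len(sf1)
    m1, m2 = sum(sf1) / n, sum(sf2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(sf1, sf2)) / n
    var1 = sum((a - m1) ** 2 for a in sf1) / n
    var2 = sum((b - m2) ** 2 for b in sf2) / n
    return cov / ((var1 * var2) ** 0.5)
```

Identical pitch movement in both subframes gives +1 (reliable pitch); opposite movement gives −1.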
Based on the voicing parameter Voicing and the correlation parameter Corr, the third feature parameter F3 may be expressed as shown in Equation 3 below.
[equation 3]
Here, the voicing parameter Voicing relates to the vocal characteristics of the sound and may be obtained by any of various methods well known in the art, and the correlation parameter Corr may be obtained by summing the inter-frame correlations for each band.
The fourth feature parameter F4 relates to the linear prediction error ELPC and may be expressed as shown in Equation 4 below.
[equation 4]
Here, M(ELPC) represents the mean of N linear prediction errors.
The determination unit 430 may determine the class of the audio signal by using at least one feature parameter provided by the feature parameter extraction unit 410, and may determine the initial encoding mode based on the determined class. The determination unit 430 may employ a soft-decision mechanism, in which at least one mixture may be formed for each feature parameter. According to an exemplary embodiment, the class of the audio signal may be determined based on mixture probabilities by using a Gaussian mixture model (GMM). The probability f(x) for one mixture may be calculated according to Equation 5 below.
[equation 5]
X=(x1..., xN)
m(Cx1C ..., CxNC)
Here, x represents the input vector of characteristic parameter, and m represents mixing, and c represents covariance matrix.
The determining unit 430 may calculate a music probability Pm and a speech probability Ps by using Equation 6 below.

[Equation 6]

Here, the music probability Pm may be calculated by adding probabilities Pi of M mixtures related to feature parameters suitable for determining music, whereas the speech probability Ps may be calculated by adding probabilities Pi of S mixtures related to feature parameters suitable for determining speech.
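The GMM-based soft decision described above can be sketched as follows. The mixture parameters, the per-class normalization, and all names are illustrative assumptions; the patent's Equation 6 may combine the mixture probabilities differently.

```python
import numpy as np

def gaussian_mixture_pdf(x, mean, cov):
    """Density of one mixture component (the standard multivariate Gaussian,
    assumed here as the shape of Equation 5)."""
    x, mean = np.asarray(x, dtype=float), np.asarray(mean, dtype=float)
    n = x.size
    diff = x - mean
    norm = 1.0 / np.sqrt((2.0 * np.pi) ** n * np.linalg.det(cov))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff)

def music_speech_probabilities(x, music_mixtures, speech_mixtures):
    """Pm and Ps as sums of per-mixture probabilities (Equation 6 shape).

    Each mixture is a (mean, covariance) pair. The final normalization so
    that Pm + Ps = 1 is an assumption for illustration.
    """
    pm = sum(gaussian_mixture_pdf(x, m, c) for m, c in music_mixtures)
    ps = sum(gaussian_mixture_pdf(x, m, c) for m, c in speech_mixtures)
    total = pm + ps
    return pm / total, ps / total
```

A feature vector near the music mixtures then yields Pm close to 1, steering the initial coding mode toward the spectral-domain (music) mode.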
Meanwhile in order to improve accuracy, music probability P m and speech probability Ps can be calculated according to following equation 7.
[equation 7]
Here,Represent the probability of error each mixed.Can be by using each mixing to including clean speech signal The quantity classified with the training data of pure music signal and classified to mistake is counted general to obtain the error Rate.
Next, for as many frames as a constant hangover length, a music probability P_M that all frames contain only music signals and a speech probability P_S that all frames contain only speech signals may be calculated according to Equation 8 below. The hangover length may be set to 8, but is not limited thereto. The eight frames may include the current frame and seven previous frames.

[Equation 8]
Next, a plurality of sets of conditions may be calculated by using the music probability Pm or the speech probability Ps obtained using Equation 5 or Equation 6. A detailed description thereof is provided below with reference to Fig. 6. Here, each condition may be set so as to have a value of 1 for music and a value of 0 for speech.
Referring to Fig. 6, in operations 610 and 620, a sum M of music conditions and a sum S of speech conditions may be obtained from the plurality of condition sets calculated by using the music probability Pm and the speech probability Ps. In other words, the sum M of music conditions and the sum S of speech conditions may be expressed as shown in Equation 9 below.

[Equation 9]
In operation 630, the sum M of music conditions is compared with a designated threshold Tm. If the sum M of music conditions is greater than the threshold Tm, the coding mode of the current frame is switched to the music mode, i.e., the spectral-domain coding mode. If the sum M of music conditions is less than or equal to the threshold Tm, the coding mode of the current frame is not changed.

In operation 640, the sum S of speech conditions is compared with a designated threshold Ts. If the sum S of speech conditions is greater than the threshold Ts, the coding mode of the current frame is switched to the speech mode, i.e., the linear-prediction-domain coding mode. If the sum S of speech conditions is less than or equal to the threshold Ts, the coding mode of the current frame is not changed.
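Operations 610–640 can be sketched as a single decision step. This is an illustrative sketch, not the patent's implementation; the mode names and the use of Python lists for the per-frame conditions are assumptions.

```python
def corrected_mode(initial_mode, cond_m, cond_s, tm, ts):
    """Hangover decision over operations 610-640.

    cond_m / cond_s are per-frame music and speech conditions over the
    hangover window (each 1 or 0). The mode is switched only when a sum
    exceeds its threshold; otherwise the current mode is kept.
    """
    m_total = sum(cond_m)   # operation 610: sum of music conditions
    s_total = sum(cond_s)   # operation 620: sum of speech conditions
    if m_total > tm:        # operation 630: switch to the music mode
        return "spectral_domain"
    if s_total > ts:        # operation 640: switch to the speech mode
        return "linear_prediction_domain"
    return initial_mode     # neither threshold crossed: keep the mode
```

Because a switch requires several consecutive frames to agree, this suppresses frequent mode toggling on borderline frames.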
The threshold Tm and the threshold Ts may be set to values obtained in advance via experiments or simulations.
Fig. 5 is a block diagram showing a configuration of an initial coding mode determining unit 500 according to an exemplary embodiment.

The initial coding mode determining unit 500 shown in Fig. 5 may include a transforming unit 510, a spectrum parameter extracting unit 520, a temporal parameter extracting unit 530, and a determining unit 540.

In Fig. 5, the transforming unit 510 may transform an original audio signal from the time domain to the frequency domain. Here, the transforming unit 510 may apply any of various transform techniques for representing an audio signal in the spectral domain instead of the time domain. Examples of the techniques may include, but are not limited to, the fast Fourier transform (FFT), the discrete cosine transform (DCT), and the modified discrete cosine transform (MDCT).
The spectrum parameter extracting unit 520 may extract at least one spectrum parameter from the frequency-domain audio signal provided by the transforming unit 510. Spectrum parameters may be classified into short-term feature parameters and long-term feature parameters. A short-term feature parameter may be obtained from the current frame, whereas a long-term feature parameter may be obtained from a plurality of frames including the current frame and at least one previous frame.

The temporal parameter extracting unit 530 may extract at least one temporal parameter from the time-domain audio signal. Temporal parameters may likewise be classified into short-term feature parameters and long-term feature parameters. A short-term feature parameter may be obtained from the current frame, whereas a long-term feature parameter may be obtained from a plurality of frames including the current frame and at least one previous frame.

The determining unit 540 may determine the class of the audio signal by using the spectrum parameters provided by the spectrum parameter extracting unit 520 and the temporal parameters provided by the temporal parameter extracting unit 530, and may determine an initial coding mode based on the determined class. The determining unit 540 may employ a soft-decision mechanism.
Fig. 7 is a diagram showing an operation of the coding mode correcting unit 310 according to an exemplary embodiment.

Referring to Fig. 7, in operation 700, the initial coding mode determined by the initial coding mode determining unit 310 is received, and it may be determined whether the coding mode is the speech mode (i.e., the time-domain excitation mode) or the spectral-domain mode.

In operation 701, if the initial coding mode is determined to be the spectral-domain mode (state_TS == 1), an index state_TTSS indicating whether frequency-domain excitation coding is more suitable may be checked. The index state_TTSS, which indicates whether frequency-domain excitation coding (e.g., GSC) is more suitable, may be obtained by using tonalities of different frequency bands. A detailed description thereof is given below.
The tonality of a low-band signal may be obtained as a ratio between a sum of a plurality of spectral coefficients having small values, including a minimum value, and the spectral coefficient having the maximum value for a given frequency band. If the given frequency bands are 0–1 kHz, 1–2 kHz, and 2–4 kHz, the tonalities t01, t12, and t24 of the respective bands and the tonality tL of the low-band signal (i.e., the core band) may be expressed as shown in Equation 10 below.

[Equation 10]

tL = max(t01, t12, t24)
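The per-band tonality and Equation 10 can be sketched as follows. The exact orientation of the ratio (peak over sum of small coefficients) and the number of small coefficients used are assumptions, as are all names.

```python
import numpy as np

def band_tonality(spectrum, lo, hi, n_small=5):
    """Tonality of one band: ratio of the peak spectral coefficient to the
    sum of the n_small smallest coefficients (including the minimum).

    A tonal band (one dominant peak over a quiet floor) yields a large
    value; a noise-like band yields a small one. n_small and the ratio
    orientation are illustrative assumptions.
    """
    band = np.abs(np.asarray(spectrum[lo:hi], dtype=float))
    small_sum = np.sort(band)[:n_small].sum()
    return band.max() / max(small_sum, 1e-12)  # guard against division by zero

def low_band_tonality(t01, t12, t24):
    """Equation 10: tL = max(t01, t12, t24)."""
    return max(t01, t12, t24)
```

A band containing a single strong spectral peak then scores far higher than a flat, noise-like band, which is the contrast the correction logic relies on.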
Meanwhile linear prediction error can be obtained and can be used for by using linear predictive coding (LPC) wave filter Except strong tonal components.In other words strong tonal components are directed to, spectral domain coding mode is more more efficient than frequency domain excitation coding mode.
For being switched to frequency domain excitation coding mode by using the tone and linear prediction error that obtain as described above Precondition condfrontIt can be expressed as shown in following equation 11.
[equation 11]
condfront=t12> t12frontAnd t24> t24frontAnd tL> tLfrontAnd err > errfront
Here, t12front、t24front、tLfrontAnd errfrontThreshold value, and can have in advance via experiment or emulation and The value of acquisition.
Meanwhile for completing frequency domain excitation coding by using the tone and linear prediction error that obtain as described above The postcondition cond of patternbackIt can be expressed as shown in following equation 12.
[equation 12]
condback=t12< t12backAnd t24< t24backAnd tL< tLback
Here, t12back、t24back、tLbackIt is that threshold value can simultaneously have the value obtained in advance via experiment or emulation.
In other words, whether the index state_TTSS is 1 may be determined by determining whether the front condition shown in Equation 11 is satisfied or whether the back condition shown in Equation 12 is not satisfied, where the index state_TTSS indicates whether frequency-domain excitation coding (e.g., GSC) is more suitable than spectral-domain coding. Here, the determination of the back condition shown in Equation 12 may be optional.
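The front/back condition pair of Equations 11–12 acts as a hysteresis on state_TTSS and can be sketched as follows. The threshold containers and names are illustrative assumptions.

```python
def update_state_ttss(prev_state, t12, t24, tl, err, thr_front, thr_back):
    """Hysteresis update for state_TTSS (Equations 11-12).

    The front condition switches the flag on, the back condition switches
    it off, and otherwise the previous state is kept, so the flag does not
    flicker on borderline frames. thr_front / thr_back are dicts of the
    experimentally obtained thresholds (an assumed representation).
    """
    cond_front = (t12 > thr_front["t12"] and t24 > thr_front["t24"]
                  and tl > thr_front["tl"] and err > thr_front["err"])
    cond_back = (t12 < thr_back["t12"] and t24 < thr_back["t24"]
                 and tl < thr_back["tl"])
    if cond_front:
        return 1
    if cond_back:
        return 0
    return prev_state
```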
In operation 702, if the index state_TTSS is 1, the frequency-domain excitation coding mode may be determined as the final coding mode. In this case, the spectral-domain coding mode, which is the initial coding mode, is corrected to the frequency-domain excitation coding mode, which is the final coding mode.
In operation 705, if the index state_TTSS is determined to be 0 in operation 701, an index state_SS for determining whether the audio signal includes strong speech characteristics may be checked. If there is an error in the determination of the spectral-domain coding mode, the frequency-domain excitation coding mode may be more efficient than the spectral-domain coding mode. The index state_SS for determining whether the audio signal includes strong speech characteristics may be obtained by using a difference vc between the voicing parameter and the correlation parameter.

A front condition cond_front for switching to a strong speech mode by using the difference vc between the voicing parameter and the correlation parameter may be expressed as shown in Equation 13 below.

[Equation 13]

cond_front = vc > vc_front

Here, vc_front is a threshold and may have a value obtained in advance via experiments or simulations.
Meanwhile for terminating strong speech pattern by using the poor vc between voiced sound parameter and degree of correlation parameter after Put condition condbackIt can be expressed as shown in following equation 14.
[equation 14]
condback=vc < vcback
Here, vcbackIt is that threshold value can simultaneously have the value obtained in advance via experiment or emulation.
In other words, in operation 705, whether can be satisfied or wait by the precondition for determining to show in equation 13 Whether the postcondition shown in formula 14 is not satisfied to determine index stateSSWhether it is 1, wherein, index stateSSInstruction It is more suitable whether frequency domain excitation coding (for example, GSC) encodes than spectral domain.Here, shown in peer-to-peer 14 to postcondition Determine can be optional.
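The strong-speech flag follows the same hysteresis pattern on the single parameter vc (Equations 13–14); a minimal sketch, with all names assumed:

```python
def update_state_ss(prev_state, vc, vc_front, vc_back):
    """Hysteresis for state_SS (Equations 13-14): switch to strong-speech
    when vc exceeds vc_front, release when vc falls below vc_back,
    otherwise keep the previous state."""
    if vc > vc_front:
        return 1
    if vc < vc_back:
        return 0
    return prev_state
```

With vc_back below vc_front, values in between leave the flag unchanged, which is what makes the back-condition check optional per frame.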
In operation 706, if the index state_SS is determined to be 0 in operation 705 (i.e., the audio signal does not include strong speech characteristics), the spectral-domain coding mode may be determined as the final coding mode. In this case, the spectral-domain coding mode, which is the initial coding mode, is maintained as the final coding mode.

In operation 707, if the index state_SS is determined to be 1 in operation 705 (i.e., the audio signal includes strong speech characteristics), the frequency-domain excitation coding mode may be determined as the final coding mode. In this case, the spectral-domain coding mode, which is the initial coding mode, is corrected to the frequency-domain excitation coding mode, which is the final coding mode.

By performing operations 700, 701, and 705, errors in the determination of the spectral-domain coding mode as the initial coding mode may be corrected. In detail, the spectral-domain coding mode, which is the initial coding mode, may be maintained as the final coding mode or may be switched to the frequency-domain excitation coding mode as the final coding mode.
Meanwhile if determine that initial code pattern is linear prediction domain coding mode (state in operation 700TS==0), Then it is used to determine whether audio signal includes the index state of strong musical specific propertySMIt can be examined.If to linear prediction domain There are mistake in the determining of coding mode (that is, time domain excitation coding mode), then frequency domain excitation coding mode may swash than time domain It is more efficient to encourage coding mode.Can be by using the value for subtracting the poor vc between voiced sound parameter and degree of correlation parameter from 1 and obtaining 1-vc is obtained for determining whether audio signal includes the state of strong musical specific propertySM
For by using by subtracting the poor vc between voiced sound parameter and degree of correlation parameter from 1 the value 1-vc that obtains And it is switched to the precondition cond of strong music patternfrontIt can be expressed as shown in following equation 15.
[equation 15]
condfront=1-vc > vcmfront
Here, vcmfrontIt is that threshold value can simultaneously have the value obtained in advance via experiment or emulation.
Meanwhile for by using by subtracting the poor vc between voiced sound parameter and degree of correlation parameter from 1 the value that obtains 1-vc and the postcondition cond for terminating strong music patternbackIt can be expressed as shown in following equation 16.
[equation 16]
condback=1-vc < vcmback
Here, vcmbackIt is that threshold value can simultaneously have the value obtained in advance via experiment or emulation.
In other words, in operation 709, whether can be satisfied or wait by the precondition for determining to show in equation 15 Whether the postcondition shown in formula 16 is not satisfied to determine index stateSMWhether it is 1, wherein, index stateSMInstruction Whether frequency domain excitation coding (for example, GSC) is more suitable for than time domain excitation coding.Here, the postcondition shown in peer-to-peer 16 Determine can be optional.
In operation 710, if the index state_SM is determined to be 0 in operation 709 (i.e., the audio signal does not include strong music characteristics), the time-domain excitation coding mode may be determined as the final coding mode. In this case, the linear-prediction-domain coding mode, which is the initial coding mode, is switched to the time-domain excitation coding mode as the final coding mode. According to an exemplary embodiment, if the linear-prediction-domain coding mode corresponds to the time-domain excitation coding mode, the initial coding mode may be considered to be maintained.

In operation 707, if the index state_SM is determined to be 1 in operation 709 (i.e., the audio signal includes strong music characteristics), the frequency-domain excitation coding mode may be determined as the final coding mode. In this case, the linear-prediction-domain coding mode, which is the initial coding mode, is corrected to the frequency-domain excitation coding mode, which is the final coding mode.

By performing operations 700 and 709, errors in the determination of the initial coding mode may be corrected. In detail, the linear-prediction-domain coding mode (e.g., the time-domain excitation coding mode), which is the initial coding mode, may be maintained as the final coding mode or may be switched to the frequency-domain excitation coding mode as the final coding mode.

According to an exemplary embodiment, operation 709, in which it is determined whether the audio signal includes strong music characteristics in order to correct errors in the determination of the linear-prediction-domain coding mode, may be optional.

According to another exemplary embodiment, the order of operation 705, in which it is determined whether the audio signal includes strong speech characteristics, and operation 701, in which it is determined whether the frequency-domain excitation coding mode is suitable, may be reversed. In other words, after operation 700, operation 705 may be performed first, and then operation 701 may be performed. In this case, the parameters used for the determinations may be changed as needed.
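The whole correction flow of Fig. 7 (operations 700–710) can be summarized in one function, assuming the three index states have already been computed from their front/back conditions. The mode strings are hypothetical labels, not identifiers from the patent.

```python
def correct_coding_mode(initial_mode, state_ttss, state_ss, state_sm):
    """Sketch of the Fig. 7 correction logic (operations 700-710)."""
    if initial_mode == "spectral_domain":        # operation 700: state_TS == 1
        if state_ttss == 1:                      # operation 701 -> 702
            return "frequency_domain_excitation"
        if state_ss == 1:                        # operation 705 -> 707
            return "frequency_domain_excitation"
        return "spectral_domain"                 # operation 706: keep the mode
    # linear-prediction-domain initial mode (state_TS == 0)
    if state_sm == 1:                            # operation 709 -> 707
        return "frequency_domain_excitation"
    return "time_domain_excitation"              # operation 710
```

Swapping the order of the state_TTSS and state_SS checks, as the alternative embodiment above allows, does not change the result here because either flag alone selects the frequency-domain excitation mode.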
Fig. 8 is a block diagram showing a configuration of an audio decoding apparatus 800 according to an exemplary embodiment.

The audio decoding apparatus 800 shown in Fig. 8 may include a bitstream parsing unit 810, a spectral-domain decoding unit 820, a linear-prediction-domain decoding unit 830, and a switching unit 840. The linear-prediction-domain decoding unit 830 may include a time-domain excitation decoding unit 831 and a frequency-domain excitation decoding unit 833, where the linear-prediction-domain decoding unit 830 may be implemented as at least one of the time-domain excitation decoding unit 831 and the frequency-domain excitation decoding unit 833. Unless it is necessary to implement them as separate hardware, the above components may be integrated into at least one module and may be implemented as at least one processor (not shown).
Referring to Fig. 8, the bitstream parsing unit 810 may parse a received bitstream and separate it into information about a coding mode and coded data. The coding mode may correspond to an initial coding mode obtained by determining one coding mode, according to the characteristics of the audio signal, from among a plurality of coding modes including a first coding mode and a second coding mode, or may correspond to a third coding mode obtained by correcting the initial coding mode when there is an error in the determination of the initial coding mode.

The spectral-domain decoding unit 820 may decode data coded in the spectral domain from the separated coded data.

The linear-prediction-domain decoding unit 830 may decode data coded in the linear prediction domain from the separated coded data. If the linear-prediction-domain decoding unit 830 includes the time-domain excitation decoding unit 831 and the frequency-domain excitation decoding unit 833, the linear-prediction-domain decoding unit 830 may perform time-domain excitation decoding or frequency-domain excitation decoding on the separated coded data.

The switching unit 840 may switch between the signal reconstructed by the spectral-domain decoding unit 820 and the signal reconstructed by the linear-prediction-domain decoding unit 830, and may provide the switched signal as the finally reconstructed signal.
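The parse-then-route structure of Fig. 8 can be sketched as a dispatch on the signaled coding mode. The decoder registry, mode strings, and payload type are illustrative assumptions, not the patent's interfaces.

```python
def decode_frame(coding_mode, payload, decoders):
    """Sketch of the Fig. 8 flow: the parsed coding mode selects which
    decoding unit (spectral-domain, frequency-domain excitation, or
    time-domain excitation) reconstructs the frame.

    decoders is an assumed dict of callables keyed by decoder name.
    """
    if coding_mode == "spectral_domain":
        return decoders["spectral"](payload)          # unit 820
    if coding_mode == "frequency_domain_excitation":
        return decoders["fd_excitation"](payload)     # unit 833
    return decoders["td_excitation"](payload)         # unit 831
```

The switching unit 840 corresponds to the branch itself: exactly one reconstructed signal is produced and forwarded per frame.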
Fig. 9 is a block diagram showing a configuration of an audio decoding apparatus 900 according to another exemplary embodiment.

The audio decoding apparatus 900 may include a bitstream parsing unit 910, a spectral-domain decoding unit 920, a linear-prediction-domain decoding unit 930, a switching unit 940, and a common post-processing module 950. The linear-prediction-domain decoding unit 930 may include a time-domain excitation decoding unit 931 and a frequency-domain excitation decoding unit 933, where the linear-prediction-domain decoding unit 930 may be implemented as at least one of the time-domain excitation decoding unit 931 and the frequency-domain excitation decoding unit 933. Unless it is necessary to implement them as separate hardware, the above components may be integrated into at least one module and may be implemented as at least one processor (not shown). Compared with the audio decoding apparatus 800 shown in Fig. 8, the audio decoding apparatus 900 may further include the common post-processing module 950; therefore, descriptions of the components identical to those of the audio decoding apparatus 800 will be omitted.

Referring to Fig. 9, the common post-processing module 950 may perform joint stereo processing, surround processing, and/or bandwidth extension processing corresponding to the common pre-processing module (205 of Fig. 2).
The methods according to the exemplary embodiments may be written as computer-executable programs and implemented in a general-purpose digital computer that executes the programs by using a non-transitory computer-readable recording medium. In addition, data structures, program instructions, or data files usable in the embodiments may be recorded on the non-transitory computer-readable recording medium in various ways. The non-transitory computer-readable recording medium is any data storage device that can store data which can thereafter be read by a computer system. Examples of the non-transitory computer-readable recording medium include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical recording media such as CD-ROM disks and DVDs; magneto-optical media such as optical discs; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. In addition, the non-transitory computer-readable recording medium may be a transmission medium for transmitting signals designating program instructions, data structures, and the like. Examples of program instructions include not only machine language code produced by a compiler but also higher-level language code executable by a computer using an interpreter or the like.
While the exemplary embodiments have been particularly shown and described above, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the inventive concept as defined by the claims. The exemplary embodiments should be considered in a descriptive sense only and not for purposes of limitation. Therefore, the scope of the inventive concept is defined not by the detailed description of the exemplary embodiments but by the claims, and all differences within the scope will be construed as being included in the inventive concept.

Claims (8)

1. An apparatus for determining a coding mode, the apparatus comprising:

at least one processing unit configured to:

determine, based on signal characteristics, a class of a current frame from among a plurality of classes including a music class and a speech class;

obtain feature parameters from a plurality of frames including the current frame;

generate at least one condition based on the feature parameters;

determine, based on the at least one condition, whether an error occurs in the determined class of the current frame; and

when it is determined that an error occurs in the determined class of the current frame, correct the determined class of the current frame.
2. The apparatus of claim 1, wherein the processing unit is configured to:

when an error occurs in the determined class of the current frame and the determined class of the current frame is the music class, correct the determined class of the current frame to the speech class; and

when an error occurs in the determined class of the current frame and the determined class of the current frame is the speech class, correct the determined class of the current frame to the music class.
3. The apparatus of claim 1, wherein the feature parameters include a tonality and a linear prediction error.

4. The apparatus of claim 3, wherein the feature parameters further include a difference between a voicing parameter and a correlation parameter.
5. An audio encoding apparatus comprising:

at least one processing unit configured to:

determine, based on signal characteristics, a class of a current frame from among a plurality of classes including a music class and a speech class;

obtain feature parameters from a plurality of frames including the current frame;

generate at least one condition based on the feature parameters;

determine, based on the at least one condition and a hangover parameter, whether an error occurs in the determined class of the current frame;

when it is determined that an error occurs in the determined class of the current frame, correct the determined class of the current frame; and

perform different encoding processing on the current frame based on the determined class of the current frame or the corrected class of the current frame.
6. The apparatus of claim 5, wherein the processing unit is configured to:

when an error occurs in the determined class of the current frame and the determined class of the current frame is the music class, correct the determined class of the current frame to the speech class; and

when an error occurs in the determined class of the current frame and the determined class of the current frame is the speech class, correct the determined class of the current frame to the music class.

7. The apparatus of claim 6, wherein the feature parameters include a tonality and a linear prediction error.

8. The apparatus of claim 5, wherein the feature parameters further include a difference between a voicing parameter and a correlation parameter.
CN201711421463.5A 2012-11-13 2013-11-13 Device for determining coding mode and audio coding device Active CN107958670B (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201261725694P 2012-11-13 2012-11-13
US61/725,694 2012-11-13
CN201380070268.6A CN104919524B (en) 2012-11-13 2013-11-13 For determining the method and apparatus of coding mode, the method and apparatus for the method and apparatus that is encoded to audio signal and for being decoded to audio signal

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201380070268.6A Division CN104919524B (en) 2012-11-13 2013-11-13 For determining the method and apparatus of coding mode, the method and apparatus for the method and apparatus that is encoded to audio signal and for being decoded to audio signal

Publications (2)

Publication Number Publication Date
CN107958670A true CN107958670A (en) 2018-04-24
CN107958670B CN107958670B (en) 2021-11-19

Family ID=50731440

Family Applications (3)

Application Number Title Priority Date Filing Date
CN201711424971.9A Active CN108074579B (en) 2012-11-13 2013-11-13 Method for determining coding mode and audio coding method
CN201380070268.6A Active CN104919524B (en) 2012-11-13 2013-11-13 For determining the method and apparatus of coding mode, the method and apparatus for the method and apparatus that is encoded to audio signal and for being decoded to audio signal
CN201711421463.5A Active CN107958670B (en) 2012-11-13 2013-11-13 Device for determining coding mode and audio coding device

Family Applications Before (2)

Application Number Title Priority Date Filing Date
CN201711424971.9A Active CN108074579B (en) 2012-11-13 2013-11-13 Method for determining coding mode and audio coding method
CN201380070268.6A Active CN104919524B (en) 2012-11-13 2013-11-13 For determining the method and apparatus of coding mode, the method and apparatus for the method and apparatus that is encoded to audio signal and for being decoded to audio signal

Country Status (18)

Country Link
US (3) US20140188465A1 (en)
EP (2) EP2922052B1 (en)
JP (2) JP6170172B2 (en)
KR (3) KR102446441B1 (en)
CN (3) CN108074579B (en)
AU (2) AU2013345615B2 (en)
BR (1) BR112015010954B1 (en)
CA (1) CA2891413C (en)
ES (1) ES2900594T3 (en)
MX (2) MX349196B (en)
MY (1) MY188080A (en)
PH (1) PH12015501114A1 (en)
PL (1) PL2922052T3 (en)
RU (3) RU2630889C2 (en)
SG (2) SG11201503788UA (en)
TW (2) TWI612518B (en)
WO (1) WO2014077591A1 (en)
ZA (1) ZA201504289B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6599368B2 (en) 2014-02-24 2019-10-30 サムスン エレクトロニクス カンパニー リミテッド Signal classification method and apparatus, and audio encoding method and apparatus using the same
US9886963B2 (en) * 2015-04-05 2018-02-06 Qualcomm Incorporated Encoder selection
CN107731238B (en) 2016-08-10 2021-07-16 华为技术有限公司 Coding method and coder for multi-channel signal
CN109389987B (en) 2017-08-10 2022-05-10 华为技术有限公司 Audio coding and decoding mode determining method and related product
US10325588B2 (en) 2017-09-28 2019-06-18 International Business Machines Corporation Acoustic feature extractor selected according to status flag of frame of acoustic signal
US11032580B2 (en) 2017-12-18 2021-06-08 Dish Network L.L.C. Systems and methods for facilitating a personalized viewing experience
US10365885B1 (en) * 2018-02-21 2019-07-30 Sling Media Pvt. Ltd. Systems and methods for composition of audio content from multi-object audio
CN111081264B (en) * 2019-12-06 2022-03-29 北京明略软件系统有限公司 Voice signal processing method, device, equipment and storage medium
WO2023048410A1 (en) * 2021-09-24 2023-03-30 삼성전자 주식회사 Electronic device for data packet transmission or reception, and operation method thereof

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030101050A1 (en) * 2001-11-29 2003-05-29 Microsoft Corporation Real-time speech and music classifier
CN1954364A (en) * 2004-05-17 2007-04-25 诺基亚公司 Audio encoding with different coding frame lengths
CN101197135A (en) * 2006-12-05 2008-06-11 华为技术有限公司 Aural signal classification method and device
CN101350199A (en) * 2008-07-29 2009-01-21 北京中星微电子有限公司 Audio encoder and audio encoding method
US20120069899A1 (en) * 2002-09-04 2012-03-22 Microsoft Corporation Entropy encoding and decoding using direct level and run-length/level context-adaptive arithmetic coding/decoding modes

Family Cites Families (51)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2102080C (en) * 1992-12-14 1998-07-28 Willem Bastiaan Kleijn Time shifting for generalized analysis-by-synthesis coding
DE69926821T2 (en) * 1998-01-22 2007-12-06 Deutsche Telekom Ag Method for signal-controlled switching between different audio coding systems
JP3273599B2 (en) 1998-06-19 2002-04-08 沖電気工業株式会社 Speech coding rate selector and speech coding device
US6691084B2 (en) * 1998-12-21 2004-02-10 Qualcomm Incorporated Multiple mode variable rate speech coding
US6704711B2 (en) * 2000-01-28 2004-03-09 Telefonaktiebolaget Lm Ericsson (Publ) System and method for modifying speech signals
US6658383B2 (en) * 2001-06-26 2003-12-02 Microsoft Corporation Method for coding speech and music signals
WO2004034379A2 (en) * 2002-10-11 2004-04-22 Nokia Corporation Methods and devices for source controlled variable bit-rate wideband speech coding
US20050096898A1 (en) * 2003-10-29 2005-05-05 Manoj Singhal Classification of speech and music using sub-band energy
FI118834B (en) * 2004-02-23 2008-03-31 Nokia Corp Classification of audio signals
US7512536B2 (en) * 2004-05-14 2009-03-31 Texas Instruments Incorporated Efficient filter bank computation for audio coding
US7739120B2 (en) * 2004-05-17 2010-06-15 Nokia Corporation Selection of coding models for encoding an audio signal
WO2006137425A1 (en) * 2005-06-23 2006-12-28 Matsushita Electric Industrial Co., Ltd. Audio encoding apparatus, audio decoding apparatus and audio encoding information transmitting apparatus
US7733983B2 (en) * 2005-11-14 2010-06-08 Ibiquity Digital Corporation Symbol tracking for AM in-band on-channel radio receivers
US7558809B2 (en) * 2006-01-06 2009-07-07 Mitsubishi Electric Research Laboratories, Inc. Task specific audio classification for identifying video highlights
US8346544B2 (en) * 2006-01-20 2013-01-01 Qualcomm Incorporated Selection of encoding modes and/or encoding rates for speech compression with closed loop re-decision
KR100790110B1 (en) * 2006-03-18 2008-01-02 삼성전자주식회사 Apparatus and method of voice signal codec based on morphological approach
EP2092517B1 (en) * 2006-10-10 2012-07-18 QUALCOMM Incorporated Method and apparatus for encoding and decoding audio signals
CN101197130B (en) * 2006-12-07 2011-05-18 华为技术有限公司 Sound activity detecting method and detector thereof
KR100964402B1 (en) * 2006-12-14 2010-06-17 삼성전자주식회사 Method and Apparatus for determining encoding mode of audio signal, and method and appartus for encoding/decoding audio signal using it
CN101025918B (en) * 2007-01-19 2011-06-29 清华大学 Voice/music dual-mode coding-decoding seamless switching method
KR20080075050A (en) * 2007-02-10 2008-08-14 삼성전자주식회사 Method and apparatus for updating parameter of error frame
US8060363B2 (en) * 2007-02-13 2011-11-15 Nokia Corporation Audio signal encoding
CN101256772B (en) * 2007-03-02 2012-02-15 华为技术有限公司 Method and device for determining attribution class of non-noise audio signal
US9653088B2 (en) * 2007-06-13 2017-05-16 Qualcomm Incorporated Systems, methods, and apparatus for signal encoding using pitch-regularizing and non-pitch-regularizing coding
ES2533358T3 (en) * 2007-06-22 2015-04-09 Voiceage Corporation Procedure and device to estimate the tone of a sound signal
KR101380170B1 (en) * 2007-08-31 2014-04-02 삼성전자주식회사 A method for encoding/decoding a media signal and an apparatus thereof
CN101393741A (en) * 2007-09-19 2009-03-25 中兴通讯股份有限公司 Audio signal classification apparatus and method used in wideband audio encoder and decoder
CN101399039B (en) * 2007-09-30 2011-05-11 华为技术有限公司 Method and device for determining non-noise audio signal classification
CN101236742B (en) * 2008-03-03 2011-08-10 中兴通讯股份有限公司 Music/ non-music real-time detection method and device
AU2009220321B2 (en) * 2008-03-03 2011-09-22 Intellectual Discovery Co., Ltd. Method and apparatus for processing audio signal
JP2011518345A (en) * 2008-03-14 2011-06-23 ドルビー・ラボラトリーズ・ライセンシング・コーポレーション Multi-mode coding of speech-like and non-speech-like signals
WO2009118044A1 (en) * 2008-03-26 2009-10-01 Nokia Corporation An audio signal classifier
EP2144231A1 (en) * 2008-07-11 2010-01-13 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Low bitrate audio encoding/decoding scheme with common preprocessing
EP2144230A1 (en) 2008-07-11 2010-01-13 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Low bitrate audio encoding/decoding scheme having cascaded switches
WO2010003521A1 (en) * 2008-07-11 2010-01-14 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Method and discriminator for classifying different segments of a signal
JP5555707B2 (en) * 2008-10-08 2014-07-23 フラウンホーファー−ゲゼルシャフト・ツール・フェルデルング・デル・アンゲヴァンテン・フォルシュング・アインゲトラーゲネル・フェライン Multi-resolution switching audio encoding and decoding scheme
CN101751920A (en) * 2008-12-19 2010-06-23 数维科技(北京)有限公司 Audio classification and implementation method based on reclassification
KR101622950B1 (en) * 2009-01-28 2016-05-23 삼성전자주식회사 Method of coding/decoding audio signal and apparatus for enabling the method
JP4977157B2 (en) * 2009-03-06 2012-07-18 NTT DOCOMO, INC. Sound signal encoding method, sound signal decoding method, encoding device, decoding device, sound signal processing system, sound signal encoding program, and sound signal decoding program
CN101577117B (en) * 2009-03-12 2012-04-11 无锡中星微电子有限公司 Extracting method of accompaniment music and device
CN101847412B (en) * 2009-03-27 2012-02-15 华为技术有限公司 Method and device for classifying audio signals
US20100253797A1 (en) * 2009-04-01 2010-10-07 Samsung Electronics Co., Ltd. Smart flash viewer
KR20100115215A (en) * 2009-04-17 2010-10-27 삼성전자주식회사 Apparatus and method for audio encoding/decoding according to variable bit rate
KR20110022252A (en) * 2009-08-27 2011-03-07 삼성전자주식회사 Method and apparatus for encoding/decoding stereo audio
PL2491555T3 (en) * 2009-10-20 2014-08-29 Fraunhofer Ges Forschung Multi-mode audio codec
CN102237085B (en) * 2010-04-26 2013-08-14 华为技术有限公司 Method and device for classifying audio signals
JP5749462B2 (en) 2010-08-13 2015-07-15 NTT DOCOMO, INC. Audio decoding apparatus, audio decoding method, audio decoding program, audio encoding apparatus, audio encoding method, and audio encoding program
CN102446504B (en) * 2010-10-08 2013-10-09 华为技术有限公司 Voice/Music identifying method and equipment
CN102385863B (en) * 2011-10-10 2013-02-20 杭州米加科技有限公司 Sound coding method based on speech music classification
US9111531B2 (en) * 2012-01-13 2015-08-18 Qualcomm Incorporated Multiple coding mode signal classification
WO2014010175A1 (en) * 2012-07-09 2014-01-16 Panasonic Corporation Encoding device and encoding method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030101050A1 (en) * 2001-11-29 2003-05-29 Microsoft Corporation Real-time speech and music classifier
US20120069899A1 (en) * 2002-09-04 2012-03-22 Microsoft Corporation Entropy encoding and decoding using direct level and run-length/level context-adaptive arithmetic coding/decoding modes
CN1954364A (en) * 2004-05-17 2007-04-25 诺基亚公司 Audio encoding with different coding frame lengths
CN101197135A (en) * 2006-12-05 2008-06-11 华为技术有限公司 Aural signal classification method and device
CN101350199A (en) * 2008-07-29 2009-01-21 北京中星微电子有限公司 Audio encoder and audio encoding method

Also Published As

Publication number Publication date
JP2015535099A (en) 2015-12-07
CN108074579A (en) 2018-05-25
KR20210146443A (en) 2021-12-03
PH12015501114A1 (en) 2015-08-10
ZA201504289B (en) 2021-09-29
US10468046B2 (en) 2019-11-05
WO2014077591A1 (en) 2014-05-22
JP2017167569A (en) 2017-09-21
US11004458B2 (en) 2021-05-11
US20140188465A1 (en) 2014-07-03
EP2922052A1 (en) 2015-09-23
CN108074579B (en) 2022-06-24
US20180322887A1 (en) 2018-11-08
TWI648730B (en) 2019-01-21
KR102331279B1 (en) 2021-11-25
CN104919524B (en) 2018-01-23
SG10201706626XA (en) 2017-09-28
CN104919524A (en) 2015-09-16
TW201805925A (en) 2018-02-16
PL2922052T3 (en) 2021-12-20
AU2013345615A1 (en) 2015-06-18
MX361866B (en) 2018-12-18
AU2013345615B2 (en) 2017-05-04
EP2922052B1 (en) 2021-10-13
MY188080A (en) 2021-11-16
KR102561265B1 (en) 2023-07-28
BR112015010954A2 (en) 2017-08-15
US20200035252A1 (en) 2020-01-30
RU2680352C1 (en) 2019-02-19
CA2891413A1 (en) 2014-05-22
AU2017206243A1 (en) 2017-08-10
RU2015122128A (en) 2017-01-10
MX349196B (en) 2017-07-18
KR102446441B1 (en) 2022-09-22
CN107958670B (en) 2021-11-19
EP3933836A1 (en) 2022-01-05
TW201443881A (en) 2014-11-16
ES2900594T3 (en) 2022-03-17
KR20220132662A (en) 2022-09-30
AU2017206243B2 (en) 2018-10-04
RU2630889C2 (en) 2017-09-13
KR20150087226A (en) 2015-07-29
SG11201503788UA (en) 2015-06-29
JP6170172B2 (en) 2017-07-26
TWI612518B (en) 2018-01-21
MX2015006028A (en) 2015-12-01
JP6530449B2 (en) 2019-06-12
BR112015010954B1 (en) 2021-11-09
EP2922052A4 (en) 2016-07-20
RU2656681C1 (en) 2018-06-06
CA2891413C (en) 2019-04-02

Similar Documents

Publication Publication Date Title
CN104919524B (en) For determining the method and apparatus of coding mode, the method and apparatus for the method and apparatus that is encoded to audio signal and for being decoded to audio signal
TWI459379B (en) Audio encoder and decoder for encoding and decoding audio samples
CN103493129B (en) For using Transient detection and quality results by the apparatus and method of the code segment of audio signal
MX2011000362A (en) Low bitrate audio encoding/decoding scheme having cascaded switches.
CN107112022A (en) The method and apparatus hidden for data-bag lost and the coding/decoding method and device using this method
US11922962B2 (en) Unified speech/audio codec (USAC) processing windows sequence based mode switching
US20240212698A1 (en) Unified speech/audio codec (usac) processing windows sequence based mode switching
JP2002244700A (en) Device and method for sound encoding and storage element

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant