CN107958670A - Apparatus for determining a coding mode, and audio encoding apparatus - Google Patents
- Publication number: CN107958670A (Application No. CN201711421463.5A)
- Authority: CN (China)
- Prior art keywords: coding mode, current frame, classification, coding, unit
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/005—Correction of errors induced by the transmission channel, if related to the coding algorithm
- G10L19/04—Using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/12—The excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
- G10L19/16—Vocoder architecture
- G10L19/18—Vocoders using multiple modes
- G10L19/20—Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding
- G10L19/22—Mode decision, i.e. based on audio signal content versus external parameters
Abstract
An apparatus for determining a coding mode and an audio encoding apparatus are provided. A method of determining a coding mode includes: determining, according to characteristics of an audio signal, one coding mode among a plurality of coding modes including a first coding mode and a second coding mode as an initial coding mode; and, if there is an error in the determination of the initial coding mode, generating a corrected coding mode by correcting the initial coding mode to a third coding mode.
Description
This application is a divisional of Application No. 201380070268.6, entitled "Method and apparatus for determining a coding mode, method and apparatus for encoding an audio signal, and method and apparatus for decoding an audio signal", filed with the China Intellectual Property Office on November 13, 2013.
Technical field
Apparatuses and methods consistent with exemplary embodiments relate to audio encoding and audio decoding, and more particularly, to a method and apparatus for determining a coding mode for improving the quality of a reconstructed audio signal by determining a coding mode suited to the characteristics of the audio signal while preventing frequent coding mode switching, a method and apparatus for encoding an audio signal, and a method and apparatus for decoding an audio signal.
Background Art
It is widely known that encoding a music signal is efficient in the frequency domain, while encoding a speech signal is efficient in the time domain. Accordingly, various techniques have been proposed for determining the class of an audio signal in which a music signal and a speech signal are mixed, and for determining a coding mode corresponding to the determined class.
However, frequent coding mode switching not only introduces delay but also degrades the decoded sound quality. In addition, since there is no technique for correcting an initially determined coding mode (that is, class), the quality of the reconstructed audio signal degrades if an error occurs while the coding mode is being determined.
Summary of the Invention
Technical Problem
Aspects of one or more exemplary embodiments provide a method and apparatus for determining a coding mode for improving the quality of a reconstructed audio signal by determining a coding mode suited to the characteristics of an audio signal, a method and apparatus for encoding an audio signal, and a method and apparatus for decoding an audio signal.
Aspects of one or more exemplary embodiments also provide a method and apparatus for determining a coding mode suited to the characteristics of an audio signal and for reducing the delay caused by frequent coding mode switching, a method and apparatus for encoding an audio signal, and a method and apparatus for decoding an audio signal.
Technical Solution
According to an aspect of one or more exemplary embodiments, a method of determining a coding mode includes: determining, according to characteristics of an audio signal, one coding mode among a plurality of coding modes including a first coding mode and a second coding mode as an initial coding mode; and, if there is an error in the determination of the initial coding mode, generating a corrected coding mode by correcting the initial coding mode to a third coding mode.
According to an aspect of one or more exemplary embodiments, a method of encoding an audio signal includes: determining, according to characteristics of the audio signal, one coding mode among a plurality of coding modes including a first coding mode and a second coding mode as an initial coding mode; if there is an error in the determination of the initial coding mode, generating a corrected coding mode by correcting the initial coding mode to a third coding mode; and performing different encoding processes on the audio signal based on the initial coding mode or the corrected coding mode.
According to an aspect of one or more exemplary embodiments, a method of decoding an audio signal includes: parsing a bitstream including one of an initial coding mode and a third coding mode; and performing different decoding processes on the bitstream based on the initial coding mode or the third coding mode, wherein the initial coding mode is obtained by determining, according to characteristics of the audio signal, one coding mode among a plurality of coding modes including a first coding mode and a second coding mode, and the third coding mode is obtained by correcting the initial coding mode when there is an error in the determination of the initial coding mode.
Advantageous Effects
According to exemplary embodiments, by determining the final coding mode of a current frame based on a correction of the initial coding mode and on the coding modes of frames corresponding to a hangover length, a coding mode suited to the characteristics of the audio signal can be selected while preventing frequent coding mode switching between frames.
Brief Description of the Drawings
FIG. 1 is a block diagram illustrating a configuration of an audio encoding apparatus according to an exemplary embodiment;
FIG. 2 is a block diagram illustrating a configuration of an audio encoding apparatus according to another exemplary embodiment;
FIG. 3 is a block diagram illustrating a configuration of a coding mode determination unit according to an exemplary embodiment;
FIG. 4 is a block diagram illustrating a configuration of an initial coding mode determination unit according to an exemplary embodiment;
FIG. 5 is a block diagram illustrating a configuration of a characteristic parameter extraction unit according to an exemplary embodiment;
FIG. 6 is a diagram illustrating an adaptive switching method between linear prediction domain coding and spectral domain coding according to an exemplary embodiment;
FIG. 7 is a diagram illustrating an operation of a coding mode correction unit according to an exemplary embodiment;
FIG. 8 is a block diagram illustrating a configuration of an audio decoding apparatus according to an exemplary embodiment;
FIG. 9 is a block diagram illustrating a configuration of an audio decoding apparatus according to another exemplary embodiment.
Detailed Description
Embodiments will now be described in detail, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout. In this regard, the present embodiments may take different forms and should not be construed as being limited to the descriptions set forth herein. Accordingly, the embodiments below are described, with reference to the accompanying drawings, merely to explain aspects of the present description.
Terms such as "connected" and "linked" may be used to indicate a directly connected or linked state, but it should be understood that another component may be interposed in between.
Terms such as "first" and "second" may be used to describe various components, but the components should not be limited by these terms. The terms may be used only to distinguish one component from another.
The units described in the exemplary embodiments are illustrated independently to indicate distinct characteristic functions, and this does not mean that each unit is formed of a separate hardware component or software component. Each unit is illustrated for convenience of explanation; multiple units may form one unit, and one unit may be divided into multiple units.
FIG. 1 is a block diagram illustrating a configuration of an audio encoding apparatus 100 according to an exemplary embodiment.
The audio encoding apparatus 100 shown in FIG. 1 may include a coding mode determination unit 110, a switching unit 120, a spectral domain encoding unit 130, a linear prediction domain encoding unit 140, and a bitstream generation unit 150. The linear prediction domain encoding unit 140 may include a time domain excitation encoding unit 141 and a frequency domain excitation encoding unit 143, where the linear prediction domain encoding unit 140 may be implemented as at least one of the time domain excitation encoding unit 141 and the frequency domain excitation encoding unit 143. Unless they must be implemented as separate hardware, the above components may be integrated into at least one module and may be implemented as at least one processor (not shown). Here, the term audio signal may refer to a music signal, a speech signal, or a mixture thereof.
Referring to FIG. 1, the coding mode determination unit 110 may analyze the characteristics of the audio signal to determine the class of the audio signal, and may determine a coding mode according to the result of the classification. The determination of the coding mode may be performed in units of superframes, frames, or bands. Alternatively, the determination of the coding mode may be performed in units of groups of superframes, groups of frames, or groups of bands. Here, examples of coding modes may include the spectral domain and the time domain (or linear prediction domain), but are not limited thereto. If the performance and processing speed of the processor are sufficient and the delay caused by coding mode switching can be resolved, the coding modes may be subdivided, and the encoding schemes may also be subdivided according to the coding modes. According to an exemplary embodiment, the coding mode determination unit 110 may determine the initial coding mode of the audio signal as one of a spectral domain coding mode and a time domain coding mode. According to another exemplary embodiment, the coding mode determination unit 110 may determine the initial coding mode of the audio signal as one of a spectral domain coding mode, a time domain excitation coding mode, and a frequency domain excitation coding mode. If the spectral domain coding mode is determined as the initial coding mode, the coding mode determination unit 110 may correct the initial coding mode to one of the spectral domain coding mode and the frequency domain excitation coding mode. If the time domain coding mode, that is, the time domain excitation coding mode, is determined as the initial coding mode, the coding mode determination unit 110 may correct the initial coding mode to one of the time domain excitation coding mode and the frequency domain excitation coding mode. If the time domain excitation coding mode is determined as the initial coding mode, the determination of the final coding mode may be performed selectively. In other words, the initial coding mode, that is, the time domain excitation coding mode, may be maintained. The coding mode determination unit 110 may determine the coding modes of a plurality of frames corresponding to a hangover length, and may determine the final coding mode for the current frame. According to an exemplary embodiment, if the initial coding mode or the corrected coding mode of the current frame is identical to the coding modes of a plurality of previous frames, e.g., 7 previous frames, the corresponding initial coding mode or corrected coding mode may be determined as the final coding mode of the current frame. Meanwhile, if the initial coding mode or the corrected coding mode of the current frame differs from the coding modes of the plurality of previous frames, e.g., 7 previous frames, the coding mode determination unit 110 may determine the coding mode of the frame immediately before the current frame as the final coding mode of the current frame.
As described above, by determining the final coding mode of the current frame based on the correction of the initial coding mode and the coding modes of the frames corresponding to the hangover length, a coding mode suited to the characteristics of the audio signal can be selected while preventing frequent coding mode switching between frames.
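The hangover decision described above can be sketched as follows. This is a minimal illustration, not the patent's reference implementation: the mode names are placeholders, and the hangover length of 7 frames is taken from the example in the text.

```python
def final_coding_mode(candidate_mode, previous_modes, hangover=7):
    """Decide the final coding mode of the current frame.

    candidate_mode: initial (or corrected) coding mode of the current frame.
    previous_modes: coding modes of earlier frames, most recent last.
    If the candidate agrees with the modes of all `hangover` preceding
    frames, adopt it; otherwise keep the mode of the immediately
    preceding frame, which suppresses frequent mode switching.
    """
    recent = previous_modes[-hangover:]
    if len(recent) == hangover and all(m == candidate_mode for m in recent):
        return candidate_mode
    return previous_modes[-1]
```

A history shorter than the hangover length conservatively keeps the previous frame's mode, which matches the goal of suppressing frequent switching.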
In general, time domain coding, that is, time domain excitation coding, may be efficient for speech signals, spectral domain coding may be efficient for music signals, and frequency domain excitation coding may be efficient for vocal and/or harmonic signals.
According to the coding mode determined by the coding mode determination unit 110, the switching unit 120 may provide the audio signal to the spectral domain encoding unit 130 or the linear prediction domain encoding unit 140. If the linear prediction domain encoding unit 140 is implemented as the time domain excitation encoding unit 141, the switching unit 120 may include two branches in total. If the linear prediction domain encoding unit 140 is implemented as the time domain excitation encoding unit 141 and the frequency domain excitation encoding unit 143, the switching unit 120 may have three branches in total.
The spectral domain encoding unit 130 may encode the audio signal in the spectral domain. The spectral domain may refer to the frequency domain or the transform domain. Examples of coding methods suited to the spectral domain encoding unit 130 may include Advanced Audio Coding (AAC), or a combination of the modified discrete cosine transform (MDCT) and factorial pulse coding (FPC), but are not limited thereto. In detail, other quantization techniques and entropy coding techniques may be used instead of FPC. Encoding a music signal in the spectral domain encoding unit 130 may be efficient.
The linear prediction domain encoding unit 140 may encode the audio signal in the linear prediction domain. The linear prediction domain may refer to the excitation domain or the time domain. The linear prediction domain encoding unit 140 may be implemented as the time domain excitation encoding unit 141, or may be implemented to include the time domain excitation encoding unit 141 and the frequency domain excitation encoding unit 143. Examples of coding methods suited to the time domain excitation encoding unit 141 may include code excited linear prediction (CELP) or algebraic CELP (ACELP), but are not limited thereto. Examples of coding methods suited to the frequency domain excitation encoding unit 143 may include generic signal coding (GSC) or transform coded excitation (TCX), but are not limited thereto. Encoding a speech signal in the time domain excitation encoding unit 141 may be efficient, and encoding a vocal and/or harmonic signal in the frequency domain excitation encoding unit 143 may be efficient.
The bitstream generation unit 150 may generate a bitstream so as to include the coding mode provided by the coding mode determination unit 110, the encoding result provided by the spectral domain encoding unit 130, and the encoding result provided by the linear prediction domain encoding unit 140.
FIG. 2 is a block diagram illustrating a configuration of an audio encoding apparatus 200 according to another exemplary embodiment.
The audio encoding apparatus 200 shown in FIG. 2 may include a common preprocessing module 205, a coding mode determination unit 210, a switching unit 220, a spectral domain encoding unit 230, a linear prediction domain encoding unit 240, and a bitstream generation unit 250. Here, the linear prediction domain encoding unit 240 may include a time domain excitation encoding unit 241 and a frequency domain excitation encoding unit 243, and the linear prediction domain encoding unit 240 may be implemented as either the time domain excitation encoding unit 241 or the frequency domain excitation encoding unit 243. Compared with the audio encoding apparatus 100 shown in FIG. 1, the audio encoding apparatus 200 may further include the common preprocessing module 205; therefore, descriptions of components identical to those of the audio encoding apparatus 100 will be omitted.
Referring to FIG. 2, the common preprocessing module 205 may perform joint stereo processing, surround processing, and/or bandwidth extension processing. The joint stereo processing, surround processing, and bandwidth extension processing may be identical to those used by a specific standard, e.g., the MPEG standard, but are not limited thereto. The output of the common preprocessing module 205 may be mono, stereo channels, or multichannel. The switching unit 220 may include at least one switch according to the number of channels of the signal output by the common preprocessing module 205. For example, if the common preprocessing module 205 outputs a signal of two or more channels, that is, stereo channels or multichannel, a switch corresponding to each channel may be arranged. For example, the first channel of a stereo signal may be a speech channel, and the second channel of the stereo signal may be a music channel. In this case, the audio signal may be simultaneously provided to the two switches. Additional information generated by the common preprocessing module 205 may be provided to the bitstream generation unit 250 and included in the bitstream. The additional information is necessary for performing joint stereo processing, surround processing, and/or bandwidth extension processing at the decoding end, and may include spatial parameters, envelope information, energy information, and the like. However, various additional information may be present depending on the processing technique applied.
According to an exemplary embodiment, in the common preprocessing module 205, the bandwidth extension processing may be performed differently based on the encoding domain. The audio signal in the core band may be processed by using the time domain excitation coding mode or the frequency domain excitation coding mode, and the audio signal in the bandwidth extension band may be processed in the time domain. The bandwidth extension processing in the time domain may include a plurality of modes, including a voiced mode and an unvoiced mode. Alternatively, the audio signal in the core band may be processed by using the spectral domain coding mode, and the audio signal in the bandwidth extension band may be processed in the frequency domain. The bandwidth extension processing in the frequency domain may include a plurality of modes, including a transient mode, a normal mode, and a harmonic mode. In order to perform bandwidth extension processing in different domains, the coding mode determined by the coding mode determination unit 110 may be provided to the common preprocessing module 205 as signaling information. According to an exemplary embodiment, the last portion of the core band and the beginning portion of the bandwidth extension band may overlap each other to some extent. The position and size of the overlapping portion may be preset.
FIG. 3 is a block diagram illustrating a configuration of a coding mode determination unit 300 according to an exemplary embodiment.
The coding mode determination unit 300 shown in FIG. 3 may include an initial coding mode determination unit 310 and a coding mode correction unit 330.
Referring to FIG. 3, the initial coding mode determination unit 310 may determine whether the audio signal is a music signal or a speech signal by using characteristic parameters extracted from the audio signal. If the audio signal is determined to be a speech signal, linear prediction domain coding may be suitable. Meanwhile, if the audio signal is determined to be a music signal, spectral domain coding may be suitable. The initial coding mode determination unit 310 may determine the class of the audio signal by using the characteristic parameters extracted from the audio signal, where the class of the audio signal indicates whether spectral domain coding, time domain excitation coding, or frequency domain excitation coding is suitable for the audio signal. A corresponding coding mode may be determined based on the class of the audio signal. If the switching unit 120 of FIG. 1 has two branches, the coding mode may be represented by 1 bit. If the switching unit 120 of FIG. 1 has three branches, the coding mode may be represented by 2 bits. The initial coding mode determination unit 310 may determine whether the audio signal is a music signal or a speech signal by using any of various techniques well known in the art. Examples may include the FD/LPD classification or the ACELP/TCX classification disclosed in the encoder part of the USAC standard, and the ACELP/TCX classification used in the AMR standard, but are not limited thereto. In other words, the initial coding mode may be determined by using any of various methods other than the method according to the embodiments described herein.
The coding mode correction unit 330 may determine a corrected coding mode by correcting, using correction parameters, the initial coding mode determined by the initial coding mode determination unit 310. According to an exemplary embodiment, if the spectral domain coding mode is determined as the initial coding mode, the initial coding mode may be corrected to the frequency domain excitation coding mode based on the correction parameters. If the time domain coding mode is determined as the initial coding mode, the initial coding mode may be corrected to the frequency domain excitation coding mode based on the correction parameters. In other words, it is determined, by using the correction parameters, whether there is an error in the determination of the initial coding mode. If it is determined that there is no error in the determination of the initial coding mode, the initial coding mode may be maintained. If, instead, it is determined that there is an error in the determination of the initial coding mode, the initial coding mode may be corrected. Corrections of the initial coding mode from the spectral domain coding mode to the frequency domain excitation coding mode, and from the time domain excitation coding mode to the frequency domain excitation coding mode, may be obtained.
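The correction behavior described above — keep the initial mode when no error is detected, otherwise redirect the spectral domain or time domain excitation decision to the frequency domain excitation mode — can be sketched as follows. The mode names and the boolean error flag are illustrative assumptions; in the patent, the error decision comes from the correction parameters themselves.

```python
def correct_initial_mode(initial_mode, error_detected):
    """Apply the coding mode correction.

    error_detected: result of evaluating the correction parameters (True
    means the initial decision is judged wrong).  Only two corrections
    exist: spectral domain -> frequency domain excitation, and time
    domain excitation -> frequency domain excitation.
    """
    if not error_detected:
        return initial_mode  # no error: keep the initial coding mode
    if initial_mode in ("spectral_domain", "time_domain_excitation"):
        return "frequency_domain_excitation"
    return initial_mode      # other modes are left unchanged
```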
Meanwhile initial code pattern or corrected coding mode can be the temporary code patterns for present frame,
Wherein, will can be carried out for the temporary code pattern of present frame and the coding mode for presetting the previous frame in trailing length
Compare, and can determine that the final coding mode for present frame.
FIG. 4 is a block diagram illustrating a configuration of an initial coding mode determination unit 400 according to an exemplary embodiment.
The initial coding mode determination unit 400 shown in FIG. 4 may include a characteristic parameter extraction unit 410 and a determination unit 430.
Referring to FIG. 4, the characteristic parameter extraction unit 410 may extract from the audio signal the characteristic parameters necessary for determining the coding mode. Examples of the extracted characteristic parameters include at least one or two of a pitch parameter, a voicing parameter, a correlation parameter, and a linear prediction error, but are not limited thereto. A detailed description of each parameter is given below.
First, the first characteristic parameter F1 relates to the pitch parameter, where N pitch values detected in the current frame and at least one previous frame may be used to assess the pitch behavior. To prevent the effect of random deviations or wrong pitch values, M pitch values that differ significantly from the average of the N pitch values may be removed. Here, N and M may be values obtained in advance through experiment or simulation. In addition, N may be preset, and the difference from the average of the N pitch values beyond which a pitch value is removed may be determined in advance through experiment or simulation. Using the mean m_p' and the variance σ_p' of the (N − M) pitch values, the first characteristic parameter F1 may be expressed as shown in Equation 1 below.
[equation 1]
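Equation 1 itself is rendered as an image in the source, but the pitch screening it builds on can be sketched as follows. Which M values count as "significantly different from the average" is left to experiment in the text, so dropping the M values farthest from the mean is an assumption of this sketch.

```python
from statistics import mean, pvariance

def screened_pitch_stats(pitch_values, m):
    """Drop the m pitch values farthest from the average of the n
    detected values, then return the mean and variance of the
    remaining n - m values, which feed the first feature parameter F1."""
    avg = mean(pitch_values)
    kept = sorted(pitch_values, key=lambda p: abs(p - avg))[:len(pitch_values) - m]
    return mean(kept), pvariance(kept)
```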
The second characteristic parameter F2 also relates to the pitch parameter, and may indicate the reliability of the pitch values detected in the current frame. Using the variances σ_SF1 and σ_SF2 of the pitch values detected in two subframes SF1 and SF2 of the current frame, respectively, the second characteristic parameter F2 may be expressed as shown in Equation 2 below.
[equation 2]
Here, cov(SF1, SF2) denotes the covariance between subframe SF1 and subframe SF2. In other words, the second characteristic parameter F2 indicates the correlation between the two subframes as a pitch distance. According to an exemplary embodiment, the current frame may include two or more subframes, and Equation 2 may be modified according to the number of subframes.
Based on a voicing parameter Voicing and a correlation parameter Corr, the third feature parameter F3 can be expressed as shown in Equation 3 below.
[Equation 3]
Here, the voicing parameter Voicing is related to the vocal characteristics of the sound and may be obtained by any of various methods well known in the art, and the correlation parameter Corr may be obtained by summing the inter-frame correlations for each frequency band.
The fourth feature parameter F4 is related to the linear prediction error E_LPC and can be expressed as shown in Equation 4 below.
[Equation 4]
Here, M(E_LPC) denotes the mean of N linear prediction errors.
The determination unit 430 may determine the class of the audio signal by using at least one feature parameter provided by the feature parameter extraction unit 410, and may determine the initial coding mode based on the determined class. The determination unit 430 may employ a soft-decision mechanism, in which at least one mixture may be formed for each feature parameter. According to an exemplary embodiment, the class of the audio signal may be determined based on mixture probabilities by using a Gaussian mixture model (GMM). The probability f(x) for one mixture may be calculated according to Equation 5 below.
[Equation 5]
x = (x1, …, xN)
m = (m_x1, …, m_xN)
Here, x denotes the input vector of feature parameters, m denotes the mean of the mixture, and C denotes the covariance matrix.
The determination unit 430 may calculate a music probability Pm and a speech probability Ps by using Equation 6 below.
[Equation 6]
Here, the music probability Pm may be calculated by adding the probabilities Pi of the M mixtures related to the feature parameters suitable for music determination, and the speech probability Ps may be calculated by adding the probabilities Pi of the S mixtures related to the feature parameters suitable for speech determination.
Meanwhile, to improve accuracy, the music probability Pm and the speech probability Ps may be calculated according to Equation 7 below.
[Equation 7]
Here, P_i^err denotes the error probability of each mixture. The error probability may be obtained by classifying training data, which includes clean speech signals and clean music signals, with each mixture and counting the number of misclassifications.
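The GMM-based soft decision can be sketched as follows. The per-mixture probability is taken to be a standard multivariate Gaussian with a diagonal covariance matrix (an assumption, since the body of Equation 5 is not reproduced in the text), and Pm and Ps follow the description of Equation 6 as sums of per-mixture probabilities; the index sets are illustrative.

```python
import math

def mixture_probability(x, mean, cov_diag):
    # Multivariate Gaussian density with diagonal covariance -- an assumed
    # shape for the per-mixture probability f(x) of Equation 5.
    n = len(x)
    det = math.prod(cov_diag)
    quad = sum((xi - mi) ** 2 / ci for xi, mi, ci in zip(x, mean, cov_diag))
    return math.exp(-0.5 * quad) / math.sqrt((2 * math.pi) ** n * det)

def music_speech_probabilities(mix_probs, music_mixtures, speech_mixtures):
    # Equation 6 as described: Pm adds the probabilities of the M mixtures
    # tied to music, Ps those of the S mixtures tied to speech.
    pm = sum(mix_probs[i] for i in music_mixtures)
    ps = sum(mix_probs[i] for i in speech_mixtures)
    return pm, ps
```

Evaluating the density at its mean with unit variances gives the familiar 1/(2π)^{N/2} peak value.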
Next, for a number of frames equal to a constant hangover length, the probability P_M that all frames include only music signals and the probability P_S that all frames include only speech signals may be calculated according to Equation 8 below. The hangover length may be set to 8, but is not limited thereto. The eight frames may include the current frame and 7 previous frames.
[Equation 8]
Next, a plurality of condition sets {cond_M} and {cond_S} may be calculated by using the music probability Pm or the speech probability Ps obtained using Equation 5 or Equation 6. A detailed description thereof is given below with reference to Fig. 6. Here, each condition may be set to have a value of 1 for music and a value of 0 for speech.
Referring to Fig. 6, in operations 610 and 620, the sum M of music conditions and the sum S of speech conditions may be obtained from the plurality of condition sets {cond_M} and {cond_S} calculated by using the music probability Pm and the speech probability Ps. In other words, the sum M of music conditions and the sum S of speech conditions can be expressed as shown in Equation 9 below.
[Equation 9]
In operation 630, the sum M of music conditions is compared with a designated threshold Tm. If the sum M of music conditions is greater than the threshold Tm, the coding mode of the current frame is switched to the music mode, that is, the spectral-domain coding mode. If the sum M of music conditions is less than or equal to the threshold Tm, the coding mode of the current frame is not changed.
In operation 640, the sum S of speech conditions is compared with a designated threshold Ts. If the sum S of speech conditions is greater than the threshold Ts, the coding mode of the current frame is switched to the speech mode, that is, the linear prediction domain coding mode. If the sum S of speech conditions is less than or equal to the threshold Ts, the coding mode of the current frame is not changed.
The threshold Tm and the threshold Ts may be set to values obtained in advance via experiment or simulation.
Fig. 5 is a block diagram showing a configuration of a feature parameter extraction unit 500 according to an exemplary embodiment.
The feature parameter extraction unit 500 shown in Fig. 5 may include a transform unit 510, a spectral parameter extraction unit 520, a temporal parameter extraction unit 530, and a determination unit 540.
In Fig. 5, the transform unit 510 may transform an original audio signal from the time domain to the frequency domain. Here, the transform unit 510 may apply any of various transform techniques to represent the audio signal in the spectral domain. Examples of the techniques may include the fast Fourier transform (FFT), the discrete cosine transform (DCT), or the modified discrete cosine transform (MDCT), but are not limited thereto.
The spectral parameter extraction unit 520 may extract at least one spectral parameter from the frequency-domain audio signal provided by the transform unit 510. Spectral parameters may be classified into short-term feature parameters and long-term feature parameters. A short-term feature parameter may be obtained from the current frame, and a long-term feature parameter may be obtained from a plurality of frames including the current frame and at least one previous frame.
The temporal parameter extraction unit 530 may extract at least one temporal parameter from the time-domain audio signal. Temporal parameters may also be classified into short-term feature parameters and long-term feature parameters. A short-term feature parameter may be obtained from the current frame, and a long-term feature parameter may be obtained from a plurality of frames including the current frame and at least one previous frame.
The determination unit 430 (of Fig. 4) may determine the class of the audio signal by using the spectral parameters provided by the spectral parameter extraction unit 520 and the temporal parameters provided by the temporal parameter extraction unit 530, and may determine the initial coding mode based on the determined class. The determination unit 430 (of Fig. 4) may employ a soft-decision mechanism.
Fig. 7 is a diagram illustrating an operation of the coding mode correction unit 310 according to an exemplary embodiment.
Referring to Fig. 7, in operation 700, the initial coding mode determined by the initial coding mode determination unit 310 is received, and it can be determined whether the coding mode is the speech mode (that is, the time-domain excitation mode) or the spectral-domain mode.
In operation 701, if the initial coding mode is determined to be the spectral-domain mode (state_TS == 1 in operation 700), an index state_TTSS, which indicates whether frequency-domain excitation coding is more suitable, may be checked. The index state_TTSS indicating whether frequency-domain excitation coding (e.g., GSC) is more suitable may be obtained by using the tonalities of different frequency bands. A detailed description thereof is given below.
The tonality of a low-band signal may be obtained as the ratio between the sum of a plurality of spectral coefficients having small values, including the minimum value, and the spectral coefficient having the maximum value for a given frequency band. If the given frequency bands are 0~1 kHz, 1~2 kHz, and 2~4 kHz, the tonalities t01, t12, and t24 of the respective frequency bands and the tonality tL of the low-band signal (that is, the core band) can be expressed as shown in Equation 10 below.
[Equation 10]
tL = max(t01, t12, t24)
Meanwhile linear prediction error can be obtained and can be used for by using linear predictive coding (LPC) wave filter
Except strong tonal components.In other words strong tonal components are directed to, spectral domain coding mode is more more efficient than frequency domain excitation coding mode.
For being switched to frequency domain excitation coding mode by using the tone and linear prediction error that obtain as described above
Precondition condfrontIt can be expressed as shown in following equation 11.
[equation 11]
condfront=t12> t12frontAnd t24> t24frontAnd tL> tLfrontAnd err > errfront
Here, t12front、t24front、tLfrontAnd errfrontThreshold value, and can have in advance via experiment or emulation and
The value of acquisition.
Meanwhile for completing frequency domain excitation coding by using the tone and linear prediction error that obtain as described above
The postcondition cond of patternbackIt can be expressed as shown in following equation 12.
[equation 12]
condback=t12< t12backAnd t24< t24backAnd tL< tLback
Here, t12back、t24back、tLbackIt is that threshold value can simultaneously have the value obtained in advance via experiment or emulation.
In other words, can by determine equation 11 shown in precondition whether be satisfied or equation 12 shown in
Postcondition whether be satisfied determine index stateTTSSWhether it is 1, wherein, index stateTTSSIndicate that frequency domain excitation is compiled
It is more suitable whether code (for example, GSC) encodes than spectral domain.Here, the postcondition shown in peer-to-peer 12 determine can be
Optionally.
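The precondition/postcondition pair of Equations 11 and 12 amounts to a hysteresis on the index state_TTSS, which can be sketched as follows; all thresholds are placeholders for values tuned via experiment or simulation.

```python
def update_state_ttss(prev_state, t12, t24, tl, err, thr):
    # Hysteresis per Equations 11-12: the precondition switches the index
    # on, the (optional) postcondition switches it off, and otherwise the
    # previous state is kept. All thresholds in `thr` are placeholders.
    cond_front = (t12 > thr["t12front"] and t24 > thr["t24front"]
                  and tl > thr["tLfront"] and err > thr["errfront"])
    cond_back = (t12 < thr["t12back"] and t24 < thr["t24back"]
                 and tl < thr["tLback"])
    if cond_front:
        return 1
    if cond_back:
        return 0
    return prev_state
```

Keeping the previous state between the two condition regions is what prevents the index from toggling on borderline frames.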
In operation 702, if the index state_TTSS is 1, the frequency-domain excitation coding mode may be determined as the final coding mode. In this case, the spectral-domain coding mode, which is the initial coding mode, is corrected to the frequency-domain excitation coding mode as the final coding mode.
In operation 705, if the index state_TTSS is determined to be 0 in operation 701, an index state_SS for determining whether the audio signal includes strong speech characteristics may be checked. If there is an error in the determination of the spectral-domain coding mode, the frequency-domain excitation coding mode may be more efficient than the spectral-domain coding mode. The index state_SS for determining whether the audio signal includes strong speech characteristics may be obtained by using a difference vc between the voicing parameter and the correlation parameter.
A precondition cond_front for switching to a strong speech mode by using the difference vc between the voicing parameter and the correlation parameter can be expressed as shown in Equation 13 below.
[Equation 13]
cond_front = vc > vcfront
Here, vcfront is a threshold and may have a value obtained in advance via experiment or simulation.
Meanwhile, a postcondition cond_back for ending the strong speech mode by using the difference vc between the voicing parameter and the correlation parameter can be expressed as shown in Equation 14 below.
[Equation 14]
cond_back = vc < vcback
Here, vcback is a threshold and may have a value obtained in advance via experiment or simulation.
In other words, in operation 705, whether the index state_SS is 1, where the index state_SS indicates whether frequency-domain excitation coding (e.g., GSC) is more suitable than spectral-domain coding, may be determined by determining whether the precondition shown in Equation 13 is satisfied or whether the postcondition shown in Equation 14 is not satisfied. Here, the determination of the postcondition shown in Equation 14 may be optional.
In operation 706, if the index state_SS is determined to be 0 in operation 705 (that is, the audio signal does not include strong speech characteristics), the spectral-domain coding mode may be determined as the final coding mode. In this case, the spectral-domain coding mode, which is the initial coding mode, is maintained as the final coding mode.
In operation 707, if the index state_SS is determined to be 1 in operation 705 (that is, the audio signal includes strong speech characteristics), the frequency-domain excitation coding mode may be determined as the final coding mode. In this case, the spectral-domain coding mode, which is the initial coding mode, is corrected to the frequency-domain excitation coding mode as the final coding mode.
By performing operations 700, 701, and 705, an error in the determination of the spectral-domain coding mode as the initial coding mode can be corrected. In detail, the spectral-domain coding mode as the initial coding mode may be maintained as the final coding mode, or may be switched to the frequency-domain excitation coding mode as the final coding mode.
Meanwhile if determine that initial code pattern is linear prediction domain coding mode (state in operation 700TS==0),
Then it is used to determine whether audio signal includes the index state of strong musical specific propertySMIt can be examined.If to linear prediction domain
There are mistake in the determining of coding mode (that is, time domain excitation coding mode), then frequency domain excitation coding mode may swash than time domain
It is more efficient to encourage coding mode.Can be by using the value for subtracting the poor vc between voiced sound parameter and degree of correlation parameter from 1 and obtaining
1-vc is obtained for determining whether audio signal includes the state of strong musical specific propertySM。
For by using by subtracting the poor vc between voiced sound parameter and degree of correlation parameter from 1 the value 1-vc that obtains
And it is switched to the precondition cond of strong music patternfrontIt can be expressed as shown in following equation 15.
[equation 15]
condfront=1-vc > vcmfront
Here, vcmfrontIt is that threshold value can simultaneously have the value obtained in advance via experiment or emulation.
Meanwhile for by using by subtracting the poor vc between voiced sound parameter and degree of correlation parameter from 1 the value that obtains
1-vc and the postcondition cond for terminating strong music patternbackIt can be expressed as shown in following equation 16.
[equation 16]
condback=1-vc < vcmback
Here, vcmbackIt is that threshold value can simultaneously have the value obtained in advance via experiment or emulation.
In other words, in operation 709, whether can be satisfied or wait by the precondition for determining to show in equation 15
Whether the postcondition shown in formula 16 is not satisfied to determine index stateSMWhether it is 1, wherein, index stateSMInstruction
Whether frequency domain excitation coding (for example, GSC) is more suitable for than time domain excitation coding.Here, the postcondition shown in peer-to-peer 16
Determine can be optional.
In operation 710, if determining index state in operation 709SMFor 0, (that is, it is special not include forte pleasure for audio signal
Property), then time domain excitation coding mode can be confirmed as final coding mode.In this case, as initial code pattern
Linear prediction domain coding mode is switched to the time domain excitation coding mode as final coding mode.According to exemplary implementation
Example, if linear prediction domain coding mode is corresponding with time domain excitation coding mode, it is contemplated that initial code pattern is kept not
Become.
In operation 707, if determining index state in operation 709SMFor 1 (that is, audio signal includes the happy characteristic of forte),
Then frequency domain excitation coding mode can be confirmed as final coding mode.In this case, as the linear of initial code pattern
Prediction domain coding mode is corrected as encouraging coding mode as the frequency domain of final coding mode.
By perform operation 700 and 709, to initial code pattern determine in mistake can be corrected.In detail,
Linear prediction domain coding mode (for example, time domain excitation coding mode) as initial code pattern can be kept as final
Coding mode, or frequency domain can be switched to and encourage coding mode as final coding mode.
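The overall correction flow of Fig. 7 reduces to a small decision function, sketched below with illustrative mode labels (the names of the modes and states are labels for this sketch, not identifiers from the patent).

```python
def final_coding_mode(initial_mode, state_ttss=0, state_ss=0, state_sm=0):
    # Overall correction of Fig. 7: a spectral-domain initial mode is
    # corrected to frequency-domain excitation coding when state_TTSS or
    # state_SS is 1; a linear-prediction-domain initial mode becomes
    # frequency-domain excitation coding when state_SM is 1 and
    # time-domain excitation coding otherwise.
    if initial_mode == "spectral_domain":
        return "fd_excitation" if (state_ttss or state_ss) else "spectral_domain"
    return "fd_excitation" if state_sm else "td_excitation"
```

Note that every corrected path lands on frequency-domain excitation coding; the correction step never swaps spectral-domain coding and time-domain excitation coding directly.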
According to an exemplary embodiment, operation 709, which determines whether the audio signal includes strong music characteristics in order to correct an error in the determination of the linear prediction domain coding mode, may be optional.
According to another exemplary embodiment, the order of operation 705, which determines whether the audio signal includes strong speech characteristics, and operation 701, which determines whether the frequency-domain excitation coding mode is suitable, may be reversed. In other words, after operation 700, operation 705 may be performed first, and then operation 701 may be performed. In this case, the parameters used for the determinations may be changed as necessary.
Fig. 8 is a block diagram showing a configuration of an audio decoding apparatus 800 according to an exemplary embodiment.
The audio decoding apparatus 800 shown in Fig. 8 may include a bitstream parsing unit 810, a spectral-domain decoding unit 820, a linear prediction domain decoding unit 830, and a switching unit 840. The linear prediction domain decoding unit 830 may include a time-domain excitation decoding unit 831 and a frequency-domain excitation decoding unit 833, where the linear prediction domain decoding unit 830 may be embodied as at least one of the time-domain excitation decoding unit 831 and the frequency-domain excitation decoding unit 833. Unless they must be embodied as separate hardware, the above components may be integrated into at least one module and may be implemented as at least one processor (not shown).
Referring to Fig. 8, the bitstream parsing unit 810 may parse a received bitstream and separate it into information about a coding mode and coded data. The coding mode may correspond to an initial coding mode obtained by determining, according to the characteristics of the audio signal, one coding mode among a plurality of coding modes including a first coding mode and a second coding mode, or may correspond to a third coding mode corrected from the initial coding mode when there is an error in the determination of the initial coding mode.
The spectral-domain decoding unit 820 may decode, from the separated coded data, data encoded in the spectral domain.
The linear prediction domain decoding unit 830 may decode, from the separated coded data, data encoded in the linear prediction domain. If the linear prediction domain decoding unit 830 includes the time-domain excitation decoding unit 831 and the frequency-domain excitation decoding unit 833, the linear prediction domain decoding unit 830 may perform time-domain excitation decoding or frequency-domain excitation decoding on the separated coded data.
The switching unit 840 may switch between the signal reconstructed by the spectral-domain decoding unit 820 and the signal reconstructed by the linear prediction domain decoding unit 830, and may provide the switched signal as the finally reconstructed signal.
Fig. 9 is a block diagram showing a configuration of an audio decoding apparatus 900 according to another exemplary embodiment.
The audio decoding apparatus 900 may include a bitstream parsing unit 910, a spectral-domain decoding unit 920, a linear prediction domain decoding unit 930, a switching unit 940, and a common post-processing module 950. The linear prediction domain decoding unit 930 may include a time-domain excitation decoding unit 931 and a frequency-domain excitation decoding unit 933, where the linear prediction domain decoding unit 930 may be embodied as at least one of the time-domain excitation decoding unit 931 and the frequency-domain excitation decoding unit 933. Unless they must be embodied as separate hardware, the above components may be integrated into at least one module and may be implemented as at least one processor (not shown). Compared with the audio decoding apparatus 800 shown in Fig. 8, the audio decoding apparatus 900 further includes the common post-processing module 950; therefore, descriptions of the components identical to those of the audio decoding apparatus 800 are omitted.
Referring to Fig. 9, the common post-processing module 950 may perform joint stereo processing, surround processing, and/or bandwidth extension processing, corresponding to the common pre-processing module 205 (of Fig. 2).
The methods according to the exemplary embodiments can be written as computer-executable programs and implemented in general-purpose digital computers that execute the programs by using a non-transitory computer-readable recording medium. In addition, data structures, program instructions, or data files usable in the embodiments can be recorded on the non-transitory computer-readable recording medium in various ways. The non-transitory computer-readable recording medium is any data storage device that can store data that can thereafter be read by a computer system. Examples of the non-transitory computer-readable recording medium include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROM disks and DVDs; magneto-optical media; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. In addition, the non-transitory computer-readable recording medium may be a transmission medium for transmitting signals designating program instructions, data structures, and the like. Examples of the program instructions include not only machine language code produced by a compiler but also higher-level language code executable by the computer using an interpreter or the like.
While the exemplary embodiments have been particularly shown and described above, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the inventive concept as defined by the claims. The exemplary embodiments should be considered in a descriptive sense only and not for purposes of limitation. Therefore, the scope of the inventive concept is defined not by the detailed description of the exemplary embodiments but by the claims, and all differences within the scope will be construed as being included in the inventive concept.
Claims (8)
1. An apparatus for determining a coding mode, the apparatus comprising:
at least one processing unit configured to:
determine a class of a current frame among a plurality of classes including a music class and a speech class, based on signal characteristics;
obtain feature parameters from a plurality of frames including the current frame;
generate at least one condition based on the feature parameters;
determine, based on the at least one condition, whether an error exists in the determined class of the current frame; and
correct the determined class of the current frame when it is determined that an error exists in the determined class of the current frame.
2. The apparatus of claim 1, wherein the processing unit is configured to:
correct the determined class of the current frame to the speech class when an error exists in the determined class of the current frame and the determined class of the current frame is the music class; and
correct the determined class of the current frame to the music class when an error exists in the determined class of the current frame and the determined class of the current frame is the speech class.
3. The apparatus of claim 1, wherein the feature parameters include a tonality and a linear prediction error.
4. The apparatus of claim 3, wherein the feature parameters further include a difference between a voicing parameter and a correlation parameter.
5. An audio encoding apparatus, the apparatus comprising:
at least one processing unit configured to:
determine a class of a current frame among a plurality of classes including a music class and a speech class, based on signal characteristics;
obtain feature parameters from a plurality of frames including the current frame;
generate at least one condition based on the feature parameters;
determine, based on the at least one condition and a hangover parameter, whether an error exists in the determined class of the current frame;
correct the determined class of the current frame when it is determined that an error exists in the determined class of the current frame; and
perform different encoding processing on the current frame based on the determined class of the current frame or the corrected class of the current frame.
6. The apparatus of claim 5, wherein the processing unit is configured to:
correct the determined class of the current frame to the speech class when an error exists in the determined class of the current frame and the determined class of the current frame is the music class; and
correct the determined class of the current frame to the music class when an error exists in the determined class of the current frame and the determined class of the current frame is the speech class.
7. The apparatus of claim 6, wherein the feature parameters include a tonality and a linear prediction error.
8. The apparatus of claim 5, wherein the feature parameters further include a difference between a voicing parameter and a correlation parameter.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201261725694P | 2012-11-13 | 2012-11-13 | |
US61/725,694 | 2012-11-13 | ||
CN201380070268.6A CN104919524B (en) | 2012-11-13 | 2013-11-13 | Method and apparatus for determining a coding mode, method and apparatus for encoding an audio signal, and method and apparatus for decoding an audio signal
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201380070268.6A Division CN104919524B (en) | 2012-11-13 | 2013-11-13 | Method and apparatus for determining a coding mode, method and apparatus for encoding an audio signal, and method and apparatus for decoding an audio signal
Publications (2)
Publication Number | Publication Date |
---|---|
CN107958670A true CN107958670A (en) | 2018-04-24 |
CN107958670B CN107958670B (en) | 2021-11-19 |
Family
ID=50731440
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711424971.9A Active CN108074579B (en) | 2012-11-13 | 2013-11-13 | Method for determining coding mode and audio coding method |
CN201380070268.6A Active CN104919524B (en) | 2012-11-13 | 2013-11-13 | Method and apparatus for determining a coding mode, method and apparatus for encoding an audio signal, and method and apparatus for decoding an audio signal |
CN201711421463.5A Active CN107958670B (en) | 2012-11-13 | 2013-11-13 | Device for determining coding mode and audio coding device |
Family Applications Before (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711424971.9A Active CN108074579B (en) | 2012-11-13 | 2013-11-13 | Method for determining coding mode and audio coding method |
CN201380070268.6A Active CN104919524B (en) | 2012-11-13 | 2013-11-13 | Method and apparatus for determining a coding mode, method and apparatus for encoding an audio signal, and method and apparatus for decoding an audio signal |
Country Status (18)
Country | Link |
---|---|
US (3) | US20140188465A1 (en) |
EP (2) | EP2922052B1 (en) |
JP (2) | JP6170172B2 (en) |
KR (3) | KR102446441B1 (en) |
CN (3) | CN108074579B (en) |
AU (2) | AU2013345615B2 (en) |
BR (1) | BR112015010954B1 (en) |
CA (1) | CA2891413C (en) |
ES (1) | ES2900594T3 (en) |
MX (2) | MX349196B (en) |
MY (1) | MY188080A (en) |
PH (1) | PH12015501114A1 (en) |
PL (1) | PL2922052T3 (en) |
RU (3) | RU2630889C2 (en) |
SG (2) | SG11201503788UA (en) |
TW (2) | TWI612518B (en) |
WO (1) | WO2014077591A1 (en) |
ZA (1) | ZA201504289B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6599368B2 (en) | 2014-02-24 | 2019-10-30 | サムスン エレクトロニクス カンパニー リミテッド | Signal classification method and apparatus, and audio encoding method and apparatus using the same |
US9886963B2 (en) * | 2015-04-05 | 2018-02-06 | Qualcomm Incorporated | Encoder selection |
CN107731238B (en) | 2016-08-10 | 2021-07-16 | 华为技术有限公司 | Coding method and coder for multi-channel signal |
CN109389987B (en) | 2017-08-10 | 2022-05-10 | 华为技术有限公司 | Audio coding and decoding mode determining method and related product |
US10325588B2 (en) | 2017-09-28 | 2019-06-18 | International Business Machines Corporation | Acoustic feature extractor selected according to status flag of frame of acoustic signal |
US11032580B2 (en) | 2017-12-18 | 2021-06-08 | Dish Network L.L.C. | Systems and methods for facilitating a personalized viewing experience |
US10365885B1 (en) * | 2018-02-21 | 2019-07-30 | Sling Media Pvt. Ltd. | Systems and methods for composition of audio content from multi-object audio |
CN111081264B (en) * | 2019-12-06 | 2022-03-29 | 北京明略软件系统有限公司 | Voice signal processing method, device, equipment and storage medium |
WO2023048410A1 (en) * | 2021-09-24 | 2023-03-30 | 삼성전자 주식회사 | Electronic device for data packet transmission or reception, and operation method thereof |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030101050A1 (en) * | 2001-11-29 | 2003-05-29 | Microsoft Corporation | Real-time speech and music classifier |
CN1954364A (en) * | 2004-05-17 | 2007-04-25 | 诺基亚公司 | Audio encoding with different coding frame lengths |
CN101197135A (en) * | 2006-12-05 | 2008-06-11 | 华为技术有限公司 | Aural signal classification method and device |
CN101350199A (en) * | 2008-07-29 | 2009-01-21 | 北京中星微电子有限公司 | Audio encoder and audio encoding method |
US20120069899A1 (en) * | 2002-09-04 | 2012-03-22 | Microsoft Corporation | Entropy encoding and decoding using direct level and run-length/level context-adaptive arithmetic coding/decoding modes |
Family Cites Families (51)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2102080C (en) * | 1992-12-14 | 1998-07-28 | Willem Bastiaan Kleijn | Time shifting for generalized analysis-by-synthesis coding |
DE69926821T2 (en) * | 1998-01-22 | 2007-12-06 | Deutsche Telekom Ag | Method for signal-controlled switching between different audio coding systems |
JP3273599B2 (en) | 1998-06-19 | 2002-04-08 | 沖電気工業株式会社 | Speech coding rate selector and speech coding device |
US6691084B2 (en) * | 1998-12-21 | 2004-02-10 | Qualcomm Incorporated | Multiple mode variable rate speech coding |
US6704711B2 (en) * | 2000-01-28 | 2004-03-09 | Telefonaktiebolaget Lm Ericsson (Publ) | System and method for modifying speech signals |
US6658383B2 (en) * | 2001-06-26 | 2003-12-02 | Microsoft Corporation | Method for coding speech and music signals |
WO2004034379A2 (en) * | 2002-10-11 | 2004-04-22 | Nokia Corporation | Methods and devices for source controlled variable bit-rate wideband speech coding |
US20050096898A1 (en) * | 2003-10-29 | 2005-05-05 | Manoj Singhal | Classification of speech and music using sub-band energy |
FI118834B (en) * | 2004-02-23 | 2008-03-31 | Nokia Corp | Classification of audio signals |
US7512536B2 (en) * | 2004-05-14 | 2009-03-31 | Texas Instruments Incorporated | Efficient filter bank computation for audio coding |
US7739120B2 (en) * | 2004-05-17 | 2010-06-15 | Nokia Corporation | Selection of coding models for encoding an audio signal |
WO2006137425A1 (en) * | 2005-06-23 | 2006-12-28 | Matsushita Electric Industrial Co., Ltd. | Audio encoding apparatus, audio decoding apparatus and audio encoding information transmitting apparatus |
US7733983B2 (en) * | 2005-11-14 | 2010-06-08 | Ibiquity Digital Corporation | Symbol tracking for AM in-band on-channel radio receivers |
US7558809B2 (en) * | 2006-01-06 | 2009-07-07 | Mitsubishi Electric Research Laboratories, Inc. | Task specific audio classification for identifying video highlights |
US8346544B2 (en) * | 2006-01-20 | 2013-01-01 | Qualcomm Incorporated | Selection of encoding modes and/or encoding rates for speech compression with closed loop re-decision |
KR100790110B1 (en) * | 2006-03-18 | 2008-01-02 | Samsung Electronics Co., Ltd. | Apparatus and method of voice signal codec based on morphological approach |
EP2092517B1 (en) * | 2006-10-10 | 2012-07-18 | QUALCOMM Incorporated | Method and apparatus for encoding and decoding audio signals |
CN101197130B (en) * | 2006-12-07 | 2011-05-18 | Huawei Technologies Co., Ltd. | Sound activity detecting method and detector thereof |
KR100964402B1 (en) * | 2006-12-14 | 2010-06-17 | Samsung Electronics Co., Ltd. | Method and apparatus for determining encoding mode of audio signal, and method and apparatus for encoding/decoding audio signal using it |
CN101025918B (en) * | 2007-01-19 | 2011-06-29 | Tsinghua University | Voice/music dual-mode coding-decoding seamless switching method |
KR20080075050A (en) * | 2007-02-10 | 2008-08-14 | Samsung Electronics Co., Ltd. | Method and apparatus for updating parameter of error frame |
US8060363B2 (en) * | 2007-02-13 | 2011-11-15 | Nokia Corporation | Audio signal encoding |
CN101256772B (en) * | 2007-03-02 | 2012-02-15 | Huawei Technologies Co., Ltd. | Method and device for determining attribution class of non-noise audio signal |
US9653088B2 (en) * | 2007-06-13 | 2017-05-16 | Qualcomm Incorporated | Systems, methods, and apparatus for signal encoding using pitch-regularizing and non-pitch-regularizing coding |
ES2533358T3 (en) * | 2007-06-22 | 2015-04-09 | Voiceage Corporation | Procedure and device to estimate the tone of a sound signal |
KR101380170B1 (en) * | 2007-08-31 | 2014-04-02 | Samsung Electronics Co., Ltd. | A method for encoding/decoding a media signal and an apparatus thereof |
CN101393741A (en) * | 2007-09-19 | 2009-03-25 | ZTE Corporation | Audio signal classification apparatus and method used in wideband audio encoder and decoder |
CN101399039B (en) * | 2007-09-30 | 2011-05-11 | Huawei Technologies Co., Ltd. | Method and device for determining non-noise audio signal classification |
CN101236742B (en) * | 2008-03-03 | 2011-08-10 | ZTE Corporation | Music/non-music real-time detection method and device |
AU2009220321B2 (en) * | 2008-03-03 | 2011-09-22 | Intellectual Discovery Co., Ltd. | Method and apparatus for processing audio signal |
JP2011518345A (en) * | 2008-03-14 | 2011-06-23 | ドルビー・ラボラトリーズ・ライセンシング・コーポレーション | Multi-mode coding of speech-like and non-speech-like signals |
WO2009118044A1 (en) * | 2008-03-26 | 2009-10-01 | Nokia Corporation | An audio signal classifier |
EP2144231A1 (en) * | 2008-07-11 | 2010-01-13 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Low bitrate audio encoding/decoding scheme with common preprocessing |
EP2144230A1 (en) | 2008-07-11 | 2010-01-13 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Low bitrate audio encoding/decoding scheme having cascaded switches |
WO2010003521A1 (en) * | 2008-07-11 | 2010-01-14 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Method and discriminator for classifying different segments of a signal |
JP5555707B2 (en) * | 2008-10-08 | 2014-07-23 | フラウンホーファー−ゲゼルシャフト・ツール・フェルデルング・デル・アンゲヴァンテン・フォルシュング・アインゲトラーゲネル・フェライン | Multi-resolution switching audio encoding and decoding scheme |
CN101751920A (en) * | 2008-12-19 | 2010-06-23 | Digital Rise Technology (Beijing) Co., Ltd. | Audio classification and implementation method based on reclassification |
KR101622950B1 (en) * | 2009-01-28 | 2016-05-23 | Samsung Electronics Co., Ltd. | Method of coding/decoding audio signal and apparatus for enabling the method |
JP4977157B2 (en) * | 2009-03-06 | 2012-07-18 | NTT Docomo, Inc. | Sound signal encoding method, sound signal decoding method, encoding device, decoding device, sound signal processing system, sound signal encoding program, and sound signal decoding program |
CN101577117B (en) * | 2009-03-12 | 2012-04-11 | Wuxi Vimicro Co., Ltd. | Extracting method of accompaniment music and device |
CN101847412B (en) * | 2009-03-27 | 2012-02-15 | Huawei Technologies Co., Ltd. | Method and device for classifying audio signals |
US20100253797A1 (en) * | 2009-04-01 | 2010-10-07 | Samsung Electronics Co., Ltd. | Smart flash viewer |
KR20100115215A (en) * | 2009-04-17 | 2010-10-27 | Samsung Electronics Co., Ltd. | Apparatus and method for audio encoding/decoding according to variable bit rate |
KR20110022252A (en) * | 2009-08-27 | 2011-03-07 | Samsung Electronics Co., Ltd. | Method and apparatus for encoding/decoding stereo audio |
PL2491555T3 (en) * | 2009-10-20 | 2014-08-29 | Fraunhofer Ges Forschung | Multi-mode audio codec |
CN102237085B (en) * | 2010-04-26 | 2013-08-14 | Huawei Technologies Co., Ltd. | Method and device for classifying audio signals |
JP5749462B2 (en) | 2010-08-13 | 2015-07-15 | NTT Docomo, Inc. | Audio decoding apparatus, audio decoding method, audio decoding program, audio encoding apparatus, audio encoding method, and audio encoding program |
CN102446504B (en) * | 2010-10-08 | 2013-10-09 | Huawei Technologies Co., Ltd. | Voice/music identifying method and equipment |
CN102385863B (en) * | 2011-10-10 | 2013-02-20 | Hangzhou Mijia Technology Co., Ltd. | Sound coding method based on speech music classification |
US9111531B2 (en) * | 2012-01-13 | 2015-08-18 | Qualcomm Incorporated | Multiple coding mode signal classification |
WO2014010175A1 (en) * | 2012-07-09 | 2014-01-16 | Panasonic Corporation | Encoding device and encoding method |
- 2013
- 2013-11-13 CN CN201711424971.9A patent/CN108074579B/en active Active
- 2013-11-13 JP JP2015542948A patent/JP6170172B2/en active Active
- 2013-11-13 BR BR112015010954-3A patent/BR112015010954B1/en active IP Right Grant
- 2013-11-13 TW TW102141400A patent/TWI612518B/en active
- 2013-11-13 KR KR1020217038093A patent/KR102446441B1/en active IP Right Grant
- 2013-11-13 RU RU2015122128A patent/RU2630889C2/en active
- 2013-11-13 CN CN201380070268.6A patent/CN104919524B/en active Active
- 2013-11-13 MX MX2015006028A patent/MX349196B/en active IP Right Grant
- 2013-11-13 AU AU2013345615A patent/AU2013345615B2/en active Active
- 2013-11-13 MX MX2017009362A patent/MX361866B/en unknown
- 2013-11-13 ES ES13854639T patent/ES2900594T3/en active Active
- 2013-11-13 US US14/079,090 patent/US20140188465A1/en not_active Abandoned
- 2013-11-13 KR KR1020157012623A patent/KR102331279B1/en active IP Right Grant
- 2013-11-13 MY MYPI2015701531A patent/MY188080A/en unknown
- 2013-11-13 PL PL13854639T patent/PL2922052T3/en unknown
- 2013-11-13 TW TW106140629A patent/TWI648730B/en active
- 2013-11-13 EP EP13854639.5A patent/EP2922052B1/en active Active
- 2013-11-13 CA CA2891413A patent/CA2891413C/en active Active
- 2013-11-13 SG SG11201503788UA patent/SG11201503788UA/en unknown
- 2013-11-13 CN CN201711421463.5A patent/CN107958670B/en active Active
- 2013-11-13 SG SG10201706626XA patent/SG10201706626XA/en unknown
- 2013-11-13 EP EP21192621.7A patent/EP3933836A1/en active Pending
- 2013-11-13 KR KR1020227032281A patent/KR102561265B1/en active IP Right Grant
- 2013-11-13 WO PCT/KR2013/010310 patent/WO2014077591A1/en active Application Filing
- 2013-11-13 RU RU2017129727A patent/RU2656681C1/en active
- 2015
- 2015-05-13 PH PH12015501114A patent/PH12015501114A1/en unknown
- 2015-06-12 ZA ZA2015/04289A patent/ZA201504289B/en unknown
- 2017
- 2017-06-29 JP JP2017127285A patent/JP6530449B2/en active Active
- 2017-07-20 AU AU2017206243A patent/AU2017206243B2/en active Active
- 2018
- 2018-04-18 RU RU2018114257A patent/RU2680352C1/en active
- 2018-07-18 US US16/039,110 patent/US10468046B2/en active Active
- 2019
- 2019-10-04 US US16/593,041 patent/US11004458B2/en active Active
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104919524B (en) | Method and apparatus for determining a coding mode, method and apparatus for encoding an audio signal, and method and apparatus for decoding an audio signal | |
TWI459379B (en) | Audio encoder and decoder for encoding and decoding audio samples | |
CN103493129B (en) | Apparatus and method for coding a segment of an audio signal using transient detection and a quality result | |
MX2011000362A (en) | Low bitrate audio encoding/decoding scheme having cascaded switches. | |
CN107112022A (en) | Method and apparatus for packet loss concealment, and encoding/decoding method and device using the same | |
US11922962B2 (en) | Unified speech/audio codec (USAC) processing windows sequence based mode switching | |
US20240212698A1 (en) | Unified speech/audio codec (usac) processing windows sequence based mode switching | |
JP2002244700A (en) | Device and method for sound encoding and storage element |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||