WO2014077591A1 - Method and apparatus for determining encoding mode, method and apparatus for encoding audio signals, and method and apparatus for decoding audio signals - Google Patents
Method and apparatus for determining encoding mode, method and apparatus for encoding audio signals, and method and apparatus for decoding audio signals
- Publication number
- WO2014077591A1 (PCT/KR2013/010310)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- encoding mode
- encoding
- mode
- initial
- audio signal
- Prior art date
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/005—Correction of errors induced by the transmission channel, if related to the coding algorithm
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/12—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
- G10L19/16—Vocoder architecture
- G10L19/18—Vocoders using multiple modes
- G10L19/20—Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding
- G10L19/22—Mode decision, i.e. based on audio signal content versus external parameters
Definitions
- The present invention relates to audio encoding and decoding, and more particularly, to a method and apparatus for determining an encoding mode suitable for the characteristics of an audio signal while preventing frequent switching between encoding modes, thereby improving reconstructed sound quality, and to an audio encoding method and apparatus and an audio decoding method and apparatus employing the same.
- An object of the present invention is to provide an encoding mode determining method and apparatus, an audio encoding method and apparatus, and an audio decoding method and apparatus capable of improving reconstructed sound quality by determining an encoding mode suitable for the characteristics of an audio signal.
- The present invention also provides an encoding mode determining method and apparatus, an audio encoding method and apparatus, and an audio decoding method and apparatus capable of reducing delay due to encoding mode switching while determining an encoding mode suited to the characteristics of an audio signal.
- A method of determining an encoding mode may include: determining one of a plurality of encoding modes including a first encoding mode and a second encoding mode as an initial encoding mode of a current frame according to characteristics of an audio signal; and generating a modified encoding mode by modifying the initial encoding mode to a third encoding mode when an error exists in the determination of the initial encoding mode.
- An audio encoding method may include: determining one of a plurality of encoding modes including a first encoding mode and a second encoding mode as an initial encoding mode of a current frame according to characteristics of an audio signal, and, when an error exists in the determination, modifying the initial encoding mode to a third encoding mode to generate a modified encoding mode; and performing different encoding processing on the audio signal in response to the initial encoding mode or the modified encoding mode.
- An audio decoding method may include: parsing a bitstream whose encoding mode is either an initial encoding mode, determined as one of a plurality of encoding modes including a first encoding mode and a second encoding mode in response to characteristics of an audio signal, or a third encoding mode obtained by modifying the initial encoding mode when an error exists in the determination of the initial encoding mode; and performing different decoding processing on the bitstream according to the encoding mode.
- By determining the final encoding mode of the current frame in this way, an encoding mode adaptive to the characteristics of the audio signal can be determined while frequent switching between encoding modes is prevented.
- FIG. 1 is a block diagram illustrating a configuration of an audio encoding apparatus according to an embodiment.
- FIG. 2 is a block diagram illustrating a configuration of an audio encoding apparatus according to another embodiment.
- FIG. 3 is a block diagram illustrating a configuration of an encoding mode determiner, according to an exemplary embodiment.
- FIG. 4 is a block diagram illustrating a configuration of an initial encoding mode determiner, according to an embodiment.
- FIG. 5 is a block diagram illustrating a configuration of a feature parameter extractor according to an exemplary embodiment.
- FIG. 6 illustrates an adaptive switching method between linear prediction domain encoding and spectral domain encoding, according to an embodiment.
- FIG. 7 is a diagram illustrating an operation of an encoding mode corrector, according to an exemplary embodiment.
- FIG. 8 is a block diagram illustrating a configuration of an audio decoding apparatus according to an embodiment.
- FIG. 9 is a block diagram illustrating a configuration of an audio decoding apparatus according to another embodiment.
- Terms such as first and second may be used to describe various components, but the components should not be limited by these terms. The terms may be used only for the purpose of distinguishing one component from another.
- The components shown in the embodiments are illustrated independently to represent different characteristic functions, which does not mean that each component consists of separate hardware or a single software unit.
- Each component is listed as each component for convenience of description, and at least two of the components may be combined into one component, or one component may be divided into a plurality of components to perform a function.
- FIG. 1 is a block diagram illustrating a configuration of an audio encoding apparatus according to an embodiment.
- The audio encoding apparatus 100 illustrated in FIG. 1 may include an encoding mode determiner 110, a switching unit 120, a spectral domain encoder 130, a linear prediction domain encoder 140, and a bitstream generator 150.
- the linear prediction domain encoder 140 may include a time domain excitation encoder 141 and a frequency domain excitation encoder 143, and may be implemented with at least one of two excitation encoders 141 and 143.
- The components may be integrated into at least one module and implemented as at least one processor (not shown), unless it is necessary to implement them as separate hardware.
- the audio may mean music or voice, or a mixed signal of music and voice.
- the encoding mode determiner 110 may classify an audio signal type by analyzing characteristics of an audio signal, and determine an encoding mode according to the classification result.
- The determination of the encoding mode may be performed in units of superframes, frames, or bands.
- Alternatively, the determination may be performed in units of a plurality of superframe groups, frame groups, or band groups.
- The encoding modes may be divided into two types, a spectral domain mode and a time domain or linear prediction domain mode, but are not limited thereto.
- the encoding mode may be further subdivided, and the encoding scheme may be further subdivided according to the encoding mode.
- the initial encoding mode may be determined as one of a spectral domain encoding mode and a time domain encoding mode.
- the initial encoding mode may be determined as one of a spectral domain encoding mode, a time domain excitation encoding mode, and a frequency domain excitation encoding mode.
- When the initial encoding mode is determined as the spectral domain encoding mode, the encoding mode determiner 110 may modify it to one of the spectral domain encoding mode and the frequency domain excitation encoding mode. When the initial encoding mode is determined as the time domain encoding mode, that is, the time domain excitation encoding mode, the encoding mode determiner 110 may modify it to one of the time domain (TD) excitation encoding mode and the frequency domain (FD) excitation encoding mode.
- When the initial encoding mode is determined as the time domain excitation encoding mode, the final encoding mode determination process may be performed selectively; in that case, the initial encoding mode, that is, the time domain excitation encoding mode, may be maintained as it is.
- The encoding mode determiner 110 may examine the encoding modes over the number of frames corresponding to a hangover length to determine the final encoding mode of the current frame.
- Specifically, when the initial encoding mode or the modified encoding mode of the current frame is the same as the encoding mode of a plurality of previous frames, for example, seven previous frames, that initial or modified encoding mode may be determined as the final encoding mode of the current frame.
- Otherwise, the encoding mode determiner 110 may determine the encoding mode of the immediately preceding frame as the final encoding mode of the current frame.
- By determining the final encoding mode of the current frame with reference to the modification of the initial encoding mode and to the encoding modes of the frames within the hangover length, frequent switching of the encoding mode can be prevented while an encoding mode adaptive to the characteristics of the audio signal is determined.
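- As a minimal illustration of this hangover rule (a sketch, not the patent's normative procedure; the function name and the seven-frame hangover follow the example above):

```python
def decide_final_mode(candidate_mode, previous_modes, hangover=7):
    """Hangover smoothing of the encoding mode decision.

    candidate_mode: the initial or modified mode of the current frame.
    previous_modes: modes of past frames, most recent last (non-empty).
    """
    recent = previous_modes[-hangover:]
    # Keep the candidate only if the last `hangover` frames all used it;
    # otherwise fall back to the mode of the immediately preceding frame.
    if len(recent) == hangover and all(m == candidate_mode for m in recent):
        return candidate_mode
    return previous_modes[-1]
```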
- In general, time domain encoding, that is, time domain excitation encoding, may be efficient when the signal is classified as a speech signal; spectral domain encoding may be efficient when the signal is classified as a music signal; and frequency domain excitation encoding may be efficient for vocal and/or harmonic signals.
- the switching unit 120 may provide an audio signal to one of the spectral domain encoder 130 and the linear prediction domain encoder 140 in response to the encoding mode determined by the encoding mode determiner 110.
- When the linear prediction domain encoder 140 includes both the time domain excitation encoder 141 and the frequency domain excitation encoder 143, the switching unit 120 may have a total of three branches.
- the spectral domain encoder 130 may encode an audio signal in the spectral domain.
- the spectral domain may mean a frequency domain or a transform domain.
- Coding schemes that can be applied to the spectral domain encoder 130 may include, but are not limited to, an advanced audio coding (AAC) scheme, or a combination of a modified discrete cosine transform (MDCT) and factorial pulse coding (FPC).
- AAC advanced audio coding
- MDCT modified discrete cosine transform
- FPC factorial pulse coding
- other quantization and entropy coding schemes may be used instead of FPC.
- the linear prediction domain encoder 140 may encode the audio signal in the linear prediction domain.
- the linear prediction domain may mean an excitation domain or a time domain.
- the linear prediction domain encoder 140 may be implemented by the time domain excitation encoder 141 or may include a time domain excitation encoder 141 and a frequency domain excitation encoder 143.
- Coding schemes that may be applied to the time domain excitation encoder 141 may include, but are not limited to, code excited linear prediction (CELP) or algebraic CELP (ACELP).
- Coding schemes that may be applied to the frequency domain excitation encoder 143 may include, but are not limited to, General Signal Coding (GSC) or Transform Coded eXcitation (TCX).
- GSC General Signal Coding
- TCX Transform Coded eXcitation
- The bitstream generator 150 may generate a bitstream from the encoding mode provided by the encoding mode determiner 110, the encoding result provided by the spectral domain encoder 130, and the encoding result provided by the linear prediction domain encoder 140.
- FIG. 2 is a block diagram illustrating a configuration of an audio encoding apparatus according to another embodiment.
- The audio encoding apparatus 200 illustrated in FIG. 2 may include a common preprocessing module 205, an encoding mode determiner 210, a switching unit 220, a spectral domain encoder 230, a linear prediction domain encoder 240, and a bitstream generator 250.
- the linear prediction domain encoder 240 may include a time domain excitation encoder 241 and a frequency domain excitation encoder 243, and may be implemented with at least one of two excitation encoders 241 and 243.
- Compared with the audio encoding apparatus 100 of FIG. 1, the common preprocessing module 205 is added, so descriptions of the operations of common components are omitted.
- the common preprocessing module 205 may perform joint stereo processing, surround processing, and / or bandwidth extension processing.
- the joint stereo processing, the surround processing, and the bandwidth extension processing may be applied to a specific standard method, for example, the MPEG standard method, but are not limited thereto.
- the output of the common preprocessing module 205 can be a mono channel, stereo channel or multichannel.
- The switching unit 220 may include at least one switch according to the number of channels of the signal output from the common preprocessing module 205. For example, when the common preprocessing module 205 outputs two or more channels, that is, a stereo or multichannel signal, a switch corresponding to each channel may be provided.
- the first channel of the stereo signal may be a voice channel and the second channel of the stereo signal may be a music channel, in which case the audio signal may be provided to two switches simultaneously.
- the additional information generated by the common preprocessing module 205 may be provided to the bitstream generator 250 to be included in the bitstream.
- The additional information is information required for performing the joint stereo processing, the surround processing, and/or the bandwidth extension processing at the decoding end, and may include spatial parameters, envelope information, energy information, and the like.
- the bandwidth extension processing in the common preprocessing module 205 may be performed differently according to the encoding domain.
- the audio signal of the core band may be processed using a time domain excitation coding scheme or a frequency domain excitation coding scheme, and the audio signal of a bandwidth extension band may be processed in the time domain.
- the bandwidth extension processing mode in the time domain may have a plurality of modes including a voiced sound mode or an unvoiced sound mode.
- the audio signal of the core band may be processed using the spectral domain method, and the audio signal of the bandwidth extension band may be processed in the frequency domain.
- the bandwidth extension processing mode in the frequency domain may exist in a plurality of modes including a transient mode, a normal mode, or a harmonic mode.
- The encoding mode determined by the encoding mode determiner 210 may be provided to the common preprocessing module 205 as signaling information, so that the bandwidth extension processing can be performed in different domains.
- the last portion of the core band and the beginning portion of the bandwidth extension band may overlap.
- the position and size of the overlapping area may be predetermined.
- FIG. 3 is a block diagram illustrating a configuration of an encoding mode determiner, according to an exemplary embodiment.
- the encoding mode determiner 300 illustrated in FIG. 3 may include an initial encoding mode determiner 310 and an encoding mode corrector 330.
- The initial encoding mode determiner 310 may classify an audio signal as a music signal or a speech signal by using feature parameters extracted from the audio signal. When the signal is classified as a speech signal, linear prediction domain encoding may be preferable; when it is classified as a music signal, spectral domain encoding may be preferable. The initial encoding mode determiner 310 may also use the extracted feature parameters to classify whether spectral domain processing, time domain excitation processing, or frequency domain excitation processing is suitable, and a corresponding encoding mode may be determined according to the classified type. In the case of two encoding modes, the encoding mode may be represented by one bit, and in the case of three encoding modes, by two bits, and may be provided to the switching unit (120 of FIG. 1).
- For classifying the audio signal as a music signal or a speech signal in the initial encoding mode determiner 310, various known methods may be used.
- Examples include the FD/LPD classification or the ACELP/TCX classification described in the encoder part of the USAC standard, and the ACELP/TCX classification used in the AMR standard, but the classification method is not limited thereto.
- It is apparent that a variety of methods other than those described in the embodiments may be used to determine the initial encoding mode.
- the encoding mode correction unit 330 may determine the modified encoding mode by modifying the initial encoding mode determined by the initial encoding mode determiner 310 using the correction parameter.
- For example, when the initial encoding mode is determined as the spectral domain encoding mode, it may be modified to the frequency domain excitation encoding mode based on the correction parameter.
- Likewise, when the initial encoding mode is determined as the time domain encoding mode, it may be modified to the frequency domain excitation encoding mode based on the correction parameter. That is, whether an error exists in the determination of the initial encoding mode is judged by using the correction parameter, and if it is judged that no error exists, the initial encoding mode is kept as it is.
- The modification range of the initial encoding mode may be from the spectral domain encoding mode to the frequency domain excitation encoding mode, and from the time domain excitation encoding mode to the frequency domain excitation encoding mode.
- the initial encoding mode or the modified encoding mode is a temporary encoding mode of the current frame.
- The temporary encoding mode of the current frame is compared with the encoding modes of previous frames within a predetermined hangover length, and the final encoding mode of the current frame may be determined according to the comparison result.
- FIG. 4 is a block diagram illustrating a configuration of an initial encoding mode determiner, according to an embodiment.
- the initial encoding mode determiner 400 illustrated in FIG. 4 may include a feature parameter extractor 410 and a determiner 430.
- the feature parameter extractor 410 may extract feature parameters required to determine an encoding mode from an audio signal.
- Examples of feature parameters to be extracted may include at least one or a combination of at least two of pitch parameters, voicing parameters, correlation parameters, and linear prediction errors, but are not limited thereto.
- the feature parameters will be described in more detail as follows.
- the first feature parameter F1 is related to the pitch parameter, and the behavior of pitch may be determined using N pitch values detected from the current frame and at least one previous frame.
- Specifically, M pitch values that differ greatly from the average of the N pitch values can be removed.
- N and M can be set to optimum values in advance, experimentally or through simulation.
- How large the difference from the average of the N pitch values must be for a pitch value to be removed can likewise be set in advance, experimentally or through simulation.
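- Equation 1 itself is not reproduced on this page. A sketch of an F1-style statistic under the description above (trim the M pitch values farthest from the mean of the N detected values, then measure the spread of the rest; the function name and the normalization are assumptions):

```python
import numpy as np

def pitch_behavior_f1(pitch_values, m_outliers):
    """Hypothetical F1-style pitch-behavior statistic."""
    p = np.asarray(pitch_values, dtype=float)
    # Remove the M values that differ most from the mean of all N values.
    keep = np.argsort(np.abs(p - p.mean()))[: len(p) - m_outliers]
    trimmed = p[keep]
    # Normalized average deviation of the remaining pitch values.
    return float(np.mean(np.abs(trimmed - trimmed.mean())) / (trimmed.mean() + 1e-12))
```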
- the second feature parameter F2 is also related to the pitch parameter and may indicate the reliability of the pitch value detected in the current frame.
- The second feature parameter F2 may be expressed by Equation 2, using the variances σ_SF1 and σ_SF2 of the pitch values detected in two subframes SF1 and SF2 of the current frame, respectively.
- Here, cov(SF1, SF2) represents the covariance between subframes SF1 and SF2; that is, the second feature parameter F2 represents the correlation between the two subframes in terms of pitch.
- the current frame may be composed of two or more subframes, and Equation 2 may be modified according to the number of subframes.
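- Equation 2 is likewise not reproduced here; since it is built from the two subframe variances and the covariance cov(SF1, SF2), a natural reading is the normalized covariance (correlation coefficient) of the subframe pitch values, sketched below under that assumption:

```python
import numpy as np

def pitch_reliability_f2(pitch_sf1, pitch_sf2):
    """Assumed form of F2: cov(SF1, SF2) normalized by sigma_SF1 * sigma_SF2."""
    a = np.asarray(pitch_sf1, dtype=float)
    b = np.asarray(pitch_sf2, dtype=float)
    cov = np.mean((a - a.mean()) * (b - b.mean()))
    return float(cov / (a.std() * b.std() + 1e-12))
```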
- The third feature parameter F3 is obtained from the voicing parameter and the correlation parameter Corr and may be represented by Equation 3 below.
- The voicing parameter is related to the vocal characteristics of the sound and can be obtained by various known methods.
- The correlation parameter Corr can be obtained as the sum of the inter-frame correlations for each band.
- The fourth feature parameter F4 is related to the linear prediction error E_LPC and can be expressed as Equation 4 below.
- M (ELPC) represents the average of N linear prediction errors.
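- Equation 4 is not reproduced on this page. One common definition of a per-frame linear prediction error, used here as an assumption, is the prediction-error energy of an LPC fit normalized by the frame energy; a sketch via the Levinson-Durbin recursion:

```python
import numpy as np

def lpc_error(frame, order=10):
    """Assumed per-frame E_LPC: normalized Levinson-Durbin prediction error.
    Expects a non-silent frame longer than `order` samples."""
    frame = np.asarray(frame, dtype=float)
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:][: order + 1]
    err, a = r[0], np.zeros(order)
    for i in range(order):
        k = (r[i + 1] - a[:i] @ r[i:0:-1]) / err  # reflection coefficient
        if i > 0:
            a[:i] = a[:i] - k * a[i - 1::-1]
        a[i] = k
        err *= 1.0 - k * k
    return float(err / (r[0] + 1e-12))

def f4(frames):
    """F4 as M(E_LPC): the average linear prediction error over N frames."""
    return float(np.mean([lpc_error(f) for f in frames]))
```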
- the determiner 430 may classify the type of the audio signal using at least one feature parameter provided from the feature parameter extractor 410, and determine an initial encoding mode according to the classified type.
- the decision unit 430 may preferably apply a soft decision method, and may form at least one mixture for each feature parameter.
- the type of an audio signal may be classified using a Gaussian Mixture Model (GMM) based on a mixture probability.
- GMM Gaussian Mixture Model
- the probability f (x) for one mix may be calculated by Equation 5 below.
- Here, x is the input vector of feature parameters, m is the mean vector of a mixture, and c is a covariance matrix.
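- Equation 5 itself is not reproduced on this page; assuming it denotes the standard multivariate normal density of one Gaussian mixture component with mean vector m and covariance matrix C, it would read

$$
f(x) = \frac{1}{(2\pi)^{d/2}\,\lvert C \rvert^{1/2}}\,\exp\!\left(-\tfrac{1}{2}\,(x-m)^{\top} C^{-1} (x-m)\right),
$$

where d is the dimension of the feature-parameter vector x.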
- the determination unit 430 may calculate the music probability Pm and the voice probability Ps by using Equation 6 below.
- That is, the music probability Pm is calculated by adding the probabilities Pi of the M mixtures related to feature parameters superior for music classification, and the speech probability Ps is calculated by adding the probabilities Pi of the S mixtures related to feature parameters superior for speech classification.
- the music probability Pm and the speech probability Ps may be calculated using Equation 7 below.
- Here, p_i^err represents the error probability of each mixture.
- The error probability may be obtained by classifying training data containing clean speech signals and clean music signals using each mixture, and counting the number of wrong classifications.
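- A minimal sketch of that estimate (the function and argument names are illustrative):

```python
def mixture_error_probability(true_labels, predicted_labels):
    """p_i^err of one mixture: the fraction of labeled training frames
    (clean speech / clean music) that the mixture classifies wrongly."""
    wrong = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return wrong / len(true_labels)
```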
- The probability Pm that all frames within a predetermined hangover length are music and the probability Ps that they are all speech can be calculated using Equation 8 below.
- the hangover length may be set to 8, but is not limited thereto.
- Eight frames may include the current frame and seven previous frames.
- Next, a plurality of condition sets {D_i^M} and {D_i^S} may be calculated using the music probability and the speech probability obtained using Equation 5 or Equation 6. This will be described in more detail with reference to FIG. 6.
- each of the conditions may be set to have a value of 1 for music and 0 for voice.
- Then the sum M of the music conditions and the sum S of the speech conditions are calculated, as represented by Equation 9 below.
- In step 630, the sum M of the music conditions is compared with a predetermined threshold Tm. If M is greater than Tm, the encoding mode of the current frame is switched to the music mode, that is, the spectral domain mode; if M is less than or equal to Tm, the encoding mode of the current frame is not changed.
- In step 640, the sum S of the speech conditions is compared with a predetermined threshold Ts. If S is greater than Ts, the encoding mode of the current frame is switched to the speech mode, that is, the linear prediction domain mode; if S is less than or equal to Ts, the encoding mode of the current frame is not changed.
- the threshold values Tm and Ts used in steps 630 and 640 may be set to optimal values in advance by experimentation or simulation.
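- A sketch of the switching rule of steps 630 and 640, assuming the per-frame music and speech conditions D_i^M and D_i^S have already been evaluated as 0/1 values over the hangover window (mode labels are illustrative):

```python
def switch_mode(current_mode, music_conds, speech_conds, tm, ts):
    """FIG. 6-style decision: compare the condition sums against Tm and Ts."""
    m = sum(music_conds)   # sum M of the music conditions (Equation 9)
    s = sum(speech_conds)  # sum S of the speech conditions (Equation 9)
    if m > tm:
        return "spectral_domain"      # music mode
    if s > ts:
        return "linear_prediction"    # speech mode
    return current_mode               # otherwise unchanged
```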
- FIG. 5 is a block diagram illustrating a configuration of a feature parameter extractor according to an exemplary embodiment.
- the initial encoding mode determiner 500 illustrated in FIG. 5 may include a transformer 510, a spectral parameter extractor 520, a temporal parameter extractor 530, and a determiner 540.
- the converter 510 may convert the original audio signal from the time domain to the frequency domain.
- The transform unit 510 may apply various transform methods that can represent an audio signal in a time representation as a spectral representation, for example, the fast Fourier transform (FFT), the discrete cosine transform (DCT), or the modified discrete cosine transform (MDCT), but is not limited thereto.
- FFT Fast Fourier Transform
- DCT Discrete Cosine Transform
- MDCT Modified Discrete Cosine Transform
- the spectral parameter extractor 520 may extract at least one or more spectral parameters from the audio signal of the frequency domain provided from the converter 510.
- the spectral parameters may be classified into short-term feature parameters and long-term feature parameters.
- the short term feature parameter may be obtained from a single current frame
- the long term feature parameter may be obtained from a plurality of frames including the current frame and at least one past frame.
- the temporal parameter extractor 530 may extract at least one temporal parameter from the audio signal in the time domain.
- temporal parameters may be classified into short-term feature parameters and long-term feature parameters.
- the short term feature parameter may be obtained from a single current frame
- the long term feature parameter may be obtained from a plurality of frames including the current frame and at least one past frame.
- The determination unit 430 of FIG. 4 may classify the type of the audio signal using the spectral parameters provided from the spectral parameter extractor 520 and the temporal parameters provided from the temporal parameter extractor 530, and may determine the initial encoding mode according to the classified type.
- the decision unit 430 of FIG. 4 may preferably apply a soft decision method.
- FIG. 7 is a diagram illustrating an operation of an encoding mode corrector, according to an exemplary embodiment.
- In step 700, it may be determined whether the initial encoding mode determined by the initial encoding mode determiner 310 is the time domain mode, that is, the time domain excitation mode, or the spectral domain mode.
- In step 701, when the initial encoding mode is the spectral domain mode, an indicator state_TTSS indicating whether frequency domain excitation coding, for example GSC, is more suitable can be obtained by using the tonalities of different frequency bands. This will be described in more detail as follows.
- The tonality of the low band signal may be obtained as the ratio between the sum of a plurality of spectral coefficients having small values, including the minimum value, and the spectral coefficient having the maximum value, for a given band. Given bands of 0~1 kHz, 1~2 kHz, and 2~4 kHz, the tonalities t_01, t_12, and t_24 of the respective bands, and the tonality t_L of the low band signal, that is, the core band, may be expressed as in Equation 10.
- The linear prediction error err may be obtained by using an LPC filter and may be used to exclude strong tonal components, because a strong tonal component may be handled more efficiently in the spectral domain coding mode than in the frequency domain excitation coding mode.
- Using the tonality and the linear prediction error obtained as described above, a start condition cond_front for switching to the frequency domain excitation encoding mode may be expressed as in Equation 11 below.
- Here, t_12front, t_24front, t_Lfront, and err_front are thresholds, and may be set to optimal values in advance, experimentally or through simulation.
- Likewise, an end condition cond_back for ending the frequency domain excitation encoding mode, using the tonality and the linear prediction error obtained as described above, may be expressed as in Equation 12 below.
- Here, t_12back, t_24back, and t_Lback are thresholds, and may be set to optimal values in advance, experimentally or through simulation.
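- The start/end conditions of Equations 11 and 12 (and the analogous pairs that follow) act as a hysteresis on their indicator; a sketch of that shared pattern (the concrete threshold comparisons inside cond_front and cond_back follow the equations, which are not reproduced here):

```python
def update_indicator(state, cond_front, cond_back):
    """Hysteresis update shared by state_TTSS, state_SS, and state_SM."""
    if cond_front:   # start condition (e.g., Equation 11) satisfied
        return 1
    if cond_back:    # end condition (e.g., Equation 12) satisfied
        return 0
    return state     # neither satisfied: hold the previous state
```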
- By checking whether the start condition of Equation 11 is satisfied or the end condition of Equation 12 is not satisfied, it may be checked in step 701 whether the indicator state_TTSS, which indicates whether frequency domain excitation coding (for example, GSC) is more suitable than spectral domain coding, is 1. The check of the end condition of Equation 12 may be performed optionally.
- When state_TTSS is 1 in step 701, it is determined that the frequency domain excitation encoding scheme is to be used. In this case, the final encoding mode is modified from the spectral domain mode to the frequency domain excitation mode.
- In step 705, when state_TTSS is 0, an indicator state_SS that determines whether the signal is strong speech may be checked. If there is a determination error for the spectral domain coding mode, the frequency domain excitation coding mode may be more efficient than the spectral domain coding mode.
- The indicator state_SS for determining whether the signal is strong speech can be obtained by using the difference value vc between the voicing parameter and the correlation parameter.
- A start condition cond_front for switching to the strong speech mode may be expressed as in Equation 13 below.
- Here, vc_front is a threshold, and may be set to an optimal value in advance, experimentally or through simulation.
- An end condition cond_back for ending the strong speech mode, using the difference value vc between the voicing parameter and the correlation parameter, may be expressed as in Equation 14 below.
- Here, vc_back is a threshold, and may be set to an optimal value in advance, experimentally or through simulation.
- By checking whether the start condition of Equation 13 is satisfied or the end condition of Equation 14 is not satisfied, it may be checked in step 705 whether the indicator state_SS, which indicates whether frequency domain excitation coding (for example, GSC) is more suitable than spectral domain coding, is 1. The check of the end condition of Equation 14 may be performed optionally.
- When state_SS is 0, that is, when it is determined that the signal is not strong speech, it is determined that the spectral domain coding scheme is to be used. In this case, the initial encoding mode, that is, the spectral domain mode, is maintained as the final encoding mode.
- When state_SS is 1, that is, when it is determined that the signal is strong speech, it is determined that the frequency domain excitation encoding scheme is to be used. In this case, the final encoding mode is modified from the spectral domain mode to the frequency domain excitation mode.
- Through steps 700, 701, and 705, determination errors for the spectral domain encoding mode made when determining the initial encoding mode may be corrected. Specifically, the final encoding mode may either be maintained as the spectral domain mode or changed to the frequency domain excitation mode.
- In step 709, an indicator state_SM for determining whether the signal is strong music may be checked. If there is a determination error for the linear prediction domain encoding mode, that is, the time domain excitation encoding mode, the frequency domain excitation encoding mode may be more efficient than the time domain excitation encoding mode.
- The indicator state_SM for determining whether the signal is strong music can be obtained by using the value (1 − vc), obtained by subtracting the difference value vc between the voicing parameter and the correlation parameter from 1.
- A start condition cond_front for switching to the strong music mode can be expressed as in Equation 15 below.
- Here, vcm_front is a threshold, and may be set to an optimal value in advance, experimentally or through simulation.
- Likewise, an end condition cond_back for ending the strong music mode may be expressed as in Equation 16 below, where vcm_back is a threshold that may be set to an optimal value in advance, experimentally or through simulation.
- By checking whether the start condition of Equation 15 is satisfied or the end condition of Equation 16 is not satisfied, it may be checked in step 709 whether the indicator state_SM, which indicates whether frequency domain excitation coding (for example, GSC) is more suitable than time domain excitation coding, is 1. The check of the end condition of Equation 16 may be performed optionally.
- When state_SM is 0, that is, when it is determined that the signal is not strong music, it is determined that the time domain excitation encoding scheme is to be used. In this case, the initial encoding mode, that is, the time domain excitation mode of the linear prediction domain, is maintained as the final encoding mode without modification.
- When state_SM is 1, that is, when it is determined that the signal is strong music, it is determined that the frequency domain excitation encoding scheme is to be used. In this case, the initial encoding mode, that is, the linear prediction domain mode, is modified to the frequency domain excitation mode as the final encoding mode.
- Through steps 700 and 709, determination errors for the linear prediction domain encoding mode made when determining the initial encoding mode may be corrected.
- Specifically, the final encoding mode may either be maintained as the time domain excitation mode or changed to the frequency domain excitation mode.
- Step 709, the strong music determination step for correcting encoding mode determination errors for the linear prediction domain mode, may be performed optionally.
- The order of step 705, the strong speech determination step, and step 701, the frequency domain excitation mode determination step, may be exchanged. That is, after step 700, step 705 may be performed first and then step 701 may be performed. In this case, the parameters used in each determination step may be changed as necessary.
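- Putting steps 700, 701, 705, and 709 together, the correction stage can be sketched as follows (the indicators come from hysteresis updates as above; mode labels are illustrative, and the step order follows FIG. 7 before any reordering):

```python
def correct_encoding_mode(initial_mode, state_ttss, state_ss, state_sm):
    """FIG. 7-style correction of the initial encoding mode."""
    if initial_mode == "spectral_domain":            # step 700
        if state_ttss == 1:                          # step 701: GSC-like coding fits
            return "frequency_domain_excitation"
        if state_ss == 1:                            # step 705: strong speech
            return "frequency_domain_excitation"
        return "spectral_domain"                     # keep the initial mode
    # time domain (excitation) mode branch
    if state_sm == 1:                                # step 709: strong music
        return "frequency_domain_excitation"
    return "time_domain_excitation"                  # keep the initial mode
```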
- FIG. 8 is a block diagram showing the configuration of an audio decoding apparatus according to an embodiment of the present invention.
- the audio decoding apparatus 800 illustrated in FIG. 8 may include a bitstream parser 810, a spectrum domain decoder 820, a linear prediction domain decoder 830, and a switching unit 840.
- the linear prediction domain decoder 830 may include a time domain excitation decoder 831 and a frequency domain excitation decoder 833, and may be implemented with at least one of two excitation decoders 831 and 833.
- each component may be integrated into at least one module and implemented as at least one processor (not shown), except that the components need to be implemented in separate hardware.
- the bitstream parser 810 may parse the received bitstream to separate information about an encoding mode and encoded data.
- The encoding mode corresponds to the final encoding mode determined by selecting one of a plurality of encoding modes including a first encoding mode and a second encoding mode as an initial encoding mode in response to characteristics of an audio signal, and, when an error exists in the determination of the initial encoding mode, modifying the initial encoding mode to a third encoding mode.
- the spectral domain decoder 820 may decode the data encoded in the spectral domain among the separated encoded data.
- the linear prediction domain decoder 830 may decode data encoded in the linear prediction domain among the separated encoded data.
- When the linear prediction domain decoder 830 includes both the time domain excitation decoder 831 and the frequency domain excitation decoder 833, time domain excitation decoding or frequency domain excitation decoding may be performed on the separated encoded data.
- The switching unit 840 may select one of the signal reconstructed by the spectral domain decoder 820 and the signal reconstructed by the linear prediction domain decoder 830 and provide it as the final reconstructed signal.
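- A sketch of the decoding dispatch this structure implies (mode labels and decoder callables are illustrative, not the patent's interface):

```python
def decode_frame(encoding_mode, payload, spectral_dec, td_exc_dec, fd_exc_dec):
    """Route the parsed payload to the decoder matching the parsed mode,
    mirroring the role of switching unit 840."""
    if encoding_mode == "spectral_domain":
        return spectral_dec(payload)
    if encoding_mode == "time_domain_excitation":
        return td_exc_dec(payload)
    if encoding_mode == "frequency_domain_excitation":
        return fd_exc_dec(payload)
    raise ValueError(f"unknown encoding mode: {encoding_mode}")
```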
- FIG. 9 is a block diagram showing the configuration of an audio decoding apparatus according to another embodiment of the present invention.
- The audio decoding apparatus 900 illustrated in FIG. 9 may include a bitstream parser 910, a spectral domain decoder 920, a linear prediction domain decoder 930, a switching unit 940, and a common post-processing module 950.
- The linear prediction domain decoder 930 may include a time domain excitation decoder 931 and a frequency domain excitation decoder 933, and may be implemented with at least one of the two excitation decoders 931 and 933.
- each component may be integrated into at least one module and implemented as at least one processor (not shown), except that the components need to be implemented in separate hardware.
- Compared with the audio decoding apparatus 800 of FIG. 8, the common post-processing module 950 is added, so descriptions of the operations of common components are omitted.
- The common post-processing module 950 may perform joint stereo processing, surround processing, and/or bandwidth extension processing, corresponding to the common preprocessing module 205 of FIG. 2.
- The methods according to the embodiments can be written as computer-executable programs and can be implemented in general-purpose digital computers that operate the programs using a computer-readable recording medium.
- data structures, program instructions, or data files that can be used in the above-described embodiments of the present invention can be recorded on a computer-readable recording medium through various means.
- The computer-readable recording medium may include all kinds of storage devices in which data readable by a computer system is stored. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.
- the computer-readable recording medium may also be a transmission medium for transmitting a signal specifying a program command, a data structure, or the like.
- Examples of program instructions include machine code such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter.
Claims (11)
- 1. A method of determining an encoding mode, comprising: determining one of a plurality of encoding modes including a first encoding mode and a second encoding mode as an initial encoding mode of a current frame in response to characteristics of an audio signal; and, when an error exists in the determination of the initial encoding mode, modifying the initial encoding mode to a third encoding mode to generate a modified encoding mode.
- 2. The method of claim 1, wherein the first encoding mode is a spectral domain encoding mode, the second encoding mode is a time domain encoding mode, and the third encoding mode is a frequency domain excitation encoding mode.
- 3. The method of claim 1, wherein the modifying of the encoding mode comprises, when the first encoding mode is the spectral domain encoding mode, determining whether to modify the initial encoding mode to the frequency domain excitation encoding mode based on a predetermined correction parameter.
- 4. The method of claim 3, wherein the correction parameter comprises at least one of a tonality of the audio signal, a linear prediction error, and a difference value between a voicing parameter and a correlation parameter.
- 5. The method of claim 1, wherein the modifying of the encoding mode comprises, when the first encoding mode is the spectral domain encoding mode, determining whether to modify the first encoding mode to the frequency domain excitation encoding mode based on the tonality and the linear prediction error of the audio signal, and, according to the determination result, determining whether to modify the first encoding mode to the frequency domain excitation encoding mode based on the difference value between the voicing parameter and the correlation parameter of the audio signal.
- 6. The method of claim 1, wherein the modifying of the encoding mode comprises, when the second encoding mode is the time domain encoding mode, determining whether to modify the second encoding mode to the frequency domain excitation encoding mode based on the difference value between the voicing parameter and the correlation parameter of the audio signal.
- 7. The method of any one of claims 1 to 6, further comprising determining the final encoding mode of the current frame by examining the encoding modes over the number of frames corresponding to a hangover length.
- 8. The method of claim 7, wherein, when the initial encoding mode or the modified encoding mode of the current frame is the same as the encoding modes of a plurality of previous frames, the initial encoding mode or the modified encoding mode is determined as the final encoding mode of the current frame.
- 9. The method of claim 7, wherein, when the initial encoding mode or the modified encoding mode of the current frame is not the same as the encoding modes of a plurality of previous frames, the encoding mode of the immediately preceding frame is determined as the final encoding mode of the current frame.
- 10. An audio encoding method comprising: determining an encoding mode according to any one of claims 1 to 9; and performing different encoding processing on an audio signal according to the determined encoding mode.
- 11. An audio decoding method comprising: parsing a bitstream including an encoding mode determined according to any one of claims 1 to 9; and performing different decoding processing on the bitstream according to the encoding mode.
Priority Applications (24)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020157012623A KR102331279B1 (ko) | 2012-11-13 | 2013-11-13 | Method and apparatus for determining encoding mode, method and apparatus for encoding audio signals, and method and apparatus for decoding audio signals
CN201711424971.9A CN108074579B (zh) | 2012-11-13 | 2013-11-13 | Method for determining encoding mode and audio encoding method
ES13854639T ES2900594T3 (es) | 2012-11-13 | 2013-11-13 | Method for determining an encoding mode
BR112015010954-3A BR112015010954B1 (pt) | 2012-11-13 | 2013-11-13 | Method of encoding an audio signal
BR122020023798-8A BR122020023798B1 (pt) | 2012-11-13 | 2013-11-13 | Method of encoding an audio signal
MYPI2015701531A MY188080A (en) | 2012-11-13 | 2013-11-13 | Method and apparatus for determining encoding mode, method and apparatus for encoding audio signals, and method and apparatus for decoding audio signals
JP2015542948A JP6170172B2 (ja) | 2012-11-13 | 2013-11-13 | Encoding mode determination method and apparatus, audio encoding method and apparatus, and audio decoding method and apparatus
CA2891413A CA2891413C (en) | 2012-11-13 | 2013-11-13 | Method and apparatus for determining encoding mode
EP24182511.6A EP4407616A3 (en) | 2012-11-13 | 2013-11-13 | Method and apparatus for determining encoding mode, method and apparatus for encoding audio signals, and method and apparatus for decoding audio signals
PL13854639T PL2922052T3 (pl) | 2012-11-13 | 2013-11-13 | Method for determining an encoding mode
MX2017009362A MX361866B (es) | 2012-11-13 | 2013-11-13 | Method and apparatus for determining encoding mode, method and apparatus for encoding audio signals, and method and apparatus for decoding audio signals
KR1020227032281A KR102561265B1 (ko) | 2012-11-13 | 2013-11-13 | Method and apparatus for determining encoding mode, method and apparatus for encoding audio signals, and method and apparatus for decoding audio signals
EP21192621.7A EP3933836B1 (en) | 2012-11-13 | 2013-11-13 | Method and apparatus for determining encoding mode, method and apparatus for encoding audio signals, and method and apparatus for decoding audio signals
KR1020217038093A KR102446441B1 (ko) | 2012-11-13 | 2013-11-13 | Method and apparatus for determining encoding mode, method and apparatus for encoding audio signals, and method and apparatus for decoding audio signals
RU2015122128A RU2630889C2 (ru) | 2012-11-13 | 2013-11-13 | Method and apparatus for determining encoding mode, method and apparatus for encoding audio signals, and method and apparatus for decoding audio signals
CN201380070268.6A CN104919524B (zh) | 2012-11-13 | 2013-11-13 | Method and device for determining encoding mode, method and device for encoding audio signals, and method and device for decoding audio signals
MX2015006028A MX349196B (es) | 2012-11-13 | 2013-11-13 | Method and apparatus for determining encoding mode, method and apparatus for encoding audio signals, and method and apparatus for decoding audio signals
BR122020023793-7A BR122020023793B1 (pt) | 2012-11-13 | 2013-11-13 | Method of encoding an audio signal
EP13854639.5A EP2922052B1 (en) | 2012-11-13 | 2013-11-13 | Method for determining an encoding mode |
AU2013345615A AU2013345615B2 (en) | 2012-11-13 | 2013-11-13 | Method and apparatus for determining encoding mode, method and apparatus for encoding audio signals, and method and apparatus for decoding audio signals |
SG11201503788UA SG11201503788UA (en) | 2012-11-13 | 2013-11-13 | Method and apparatus for determining encoding mode, method and apparatus for encoding audio signals, and method and apparatus for decoding audio signals |
PH12015501114A PH12015501114A1 (en) | 2012-11-13 | 2015-05-13 | Method and apparatus for determining encoding mode, method and apparatus for encoding audio signals, and method and apparatus for decoding audio signals |
ZA2015/04289A ZA201504289B (en) | 2012-11-13 | 2015-06-12 | Method and apparatus for determining encoding mode, method and apparatus for encoding audio signals, and method and apparatus for decoding audio signals |
AU2017206243A AU2017206243B2 (en) | 2012-11-13 | 2017-07-20 | Method and apparatus for determining encoding mode, method and apparatus for encoding audio signals, and method and apparatus for decoding audio signals |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201261725694P | 2012-11-13 | 2012-11-13 | |
US61/725,694 | 2012-11-13 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014077591A1 true WO2014077591A1 (ko) | 2014-05-22 |
Family
ID=50731440
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2013/010310 WO2014077591A1 (ko) | 2012-11-13 | 2013-11-13 | Method and apparatus for determining encoding mode, method and apparatus for encoding audio signals, and method and apparatus for decoding audio signals
Country Status (18)
Country | Link |
---|---|
US (3) | US20140188465A1 (ko) |
EP (3) | EP4407616A3 (ko) |
JP (2) | JP6170172B2 (ko) |
KR (3) | KR102446441B1 (ko) |
CN (3) | CN104919524B (ko) |
AU (2) | AU2013345615B2 (ko) |
BR (1) | BR112015010954B1 (ko) |
CA (1) | CA2891413C (ko) |
ES (1) | ES2900594T3 (ko) |
MX (2) | MX349196B (ko) |
MY (1) | MY188080A (ko) |
PH (1) | PH12015501114A1 (ko) |
PL (1) | PL2922052T3 (ko) |
RU (3) | RU2630889C2 (ko) |
SG (2) | SG10201706626XA (ko) |
TW (2) | TWI648730B (ko) |
WO (1) | WO2014077591A1 (ko) |
ZA (1) | ZA201504289B (ko) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107408383A (zh) * | 2015-04-05 | 2017-11-28 | Qualcomm Incorporated | Encoder selection
US10090004B2 (en) | 2014-02-24 | 2018-10-02 | Samsung Electronics Co., Ltd. | Signal classifying method and device, and audio encoding method and device using same |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107731238B (zh) | 2016-08-10 | 2021-07-16 | Huawei Technologies Co., Ltd. | Encoding method and encoder for multi-channel signals
CN114898761A (zh) * | 2017-08-10 | 2022-08-12 | Huawei Technologies Co., Ltd. | Stereo signal encoding and decoding method and apparatus
US10325588B2 (en) | 2017-09-28 | 2019-06-18 | International Business Machines Corporation | Acoustic feature extractor selected according to status flag of frame of acoustic signal
US11032580B2 (en) | 2017-12-18 | 2021-06-08 | Dish Network L.L.C. | Systems and methods for facilitating a personalized viewing experience
US10365885B1 (en) * | 2018-02-21 | 2019-07-30 | Sling Media Pvt. Ltd. | Systems and methods for composition of audio content from multi-object audio
CN111081264B (zh) * | 2019-12-06 | 2022-03-29 | 北京明略软件系统有限公司 | Speech signal processing method, apparatus, device, and storage medium
EP4362366A4 (en) * | 2021-09-24 | 2024-10-23 | Samsung Electronics Co Ltd | ELECTRONIC DEVICE FOR TRANSMITTING OR RECEIVING DATA PACKETS, AND ASSOCIATED OPERATING METHOD |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050256701A1 (en) * | 2004-05-17 | 2005-11-17 | Nokia Corporation | Selection of coding models for encoding an audio signal |
US20070179783A1 (en) * | 1998-12-21 | 2007-08-02 | Sharath Manjunath | Variable rate speech coding |
US20120069899A1 (en) * | 2002-09-04 | 2012-03-22 | Microsoft Corporation | Entropy encoding and decoding using direct level and run-length/level context-adaptive arithmetic coding/decoding modes |
US20120253797A1 (en) * | 2009-10-20 | 2012-10-04 | Ralf Geiger | Multi-mode audio codec and celp coding adapted therefore |
EP2096629B1 (en) * | 2006-12-05 | 2012-10-24 | Huawei Technologies Co., Ltd. | Method and apparatus for classifying sound signals |
Family Cites Families (51)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA2102080C (en) * | 1992-12-14 | 1998-07-28 | Willem Bastiaan Kleijn | Time shifting for generalized analysis-by-synthesis coding |
DE69926821T2 (de) * | 1998-01-22 | 2007-12-06 | Deutsche Telekom Ag | Method for signal-controlled switching between different audio coding systems
JP3273599B2 (ja) * | 1998-06-19 | 2002-04-08 | Oki Electric Industry Co., Ltd. | Speech coding rate selector and speech coding apparatus
US6704711B2 (en) * | 2000-01-28 | 2004-03-09 | Telefonaktiebolaget Lm Ericsson (Publ) | System and method for modifying speech signals |
US6658383B2 (en) * | 2001-06-26 | 2003-12-02 | Microsoft Corporation | Method for coding speech and music signals |
US6785645B2 (en) * | 2001-11-29 | 2004-08-31 | Microsoft Corporation | Real-time speech and music classifier |
WO2004034379A2 (en) * | 2002-10-11 | 2004-04-22 | Nokia Corporation | Methods and devices for source controlled variable bit-rate wideband speech coding |
US20050096898A1 (en) * | 2003-10-29 | 2005-05-05 | Manoj Singhal | Classification of speech and music using sub-band energy |
FI118834B (fi) * | 2004-02-23 | 2008-03-31 | Nokia Corp | Classification of audio signals
US7512536B2 (en) * | 2004-05-14 | 2009-03-31 | Texas Instruments Incorporated | Efficient filter bank computation for audio coding |
DE602004025517D1 (de) | 2004-05-17 | 2010-03-25 | Nokia Corp | Audio coding with different coding frame lengths
CN101203907B (zh) * | 2005-06-23 | 2011-09-28 | Panasonic Corporation | Audio encoding apparatus, audio decoding apparatus, and audio encoding information transmission apparatus
US7733983B2 (en) * | 2005-11-14 | 2010-06-08 | Ibiquity Digital Corporation | Symbol tracking for AM in-band on-channel radio receivers |
US7558809B2 (en) * | 2006-01-06 | 2009-07-07 | Mitsubishi Electric Research Laboratories, Inc. | Task specific audio classification for identifying video highlights |
US8346544B2 (en) * | 2006-01-20 | 2013-01-01 | Qualcomm Incorporated | Selection of encoding modes and/or encoding rates for speech compression with closed loop re-decision |
KR100790110B1 (ko) * | 2006-03-18 | 2008-01-02 | Samsung Electronics Co., Ltd. | Morphology-based speech signal codec method and apparatus
WO2008045846A1 (en) * | 2006-10-10 | 2008-04-17 | Qualcomm Incorporated | Method and apparatus for encoding and decoding audio signals |
CN101197130B (zh) * | 2006-12-07 | 2011-05-18 | Huawei Technologies Co., Ltd. | Voice activity detection method and voice activity detector
KR100964402B1 (ko) * | 2006-12-14 | 2010-06-17 | Samsung Electronics Co., Ltd. | Method and apparatus for determining an encoding mode of an audio signal, and method and apparatus for encoding/decoding an audio signal using the same
CN101025918B (zh) * | 2007-01-19 | 2011-06-29 | Tsinghua University | Seamless switching method for dual-mode speech/music encoding and decoding
KR20080075050A (ko) | 2007-02-10 | 2008-08-14 | Samsung Electronics Co., Ltd. | Method and apparatus for updating parameters of an error frame
US8060363B2 (en) * | 2007-02-13 | 2011-11-15 | Nokia Corporation | Audio signal encoding |
CN101256772B (zh) * | 2007-03-02 | 2012-02-15 | Huawei Technologies Co., Ltd. | Method and apparatus for determining the category of a non-noise audio signal
US9653088B2 (en) * | 2007-06-13 | 2017-05-16 | Qualcomm Incorporated | Systems, methods, and apparatus for signal encoding using pitch-regularizing and non-pitch-regularizing coding |
CA2690433C (en) * | 2007-06-22 | 2016-01-19 | Voiceage Corporation | Method and device for sound activity detection and sound signal classification |
KR101380170B1 (ko) * | 2007-08-31 | 2014-04-02 | Samsung Electronics Co., Ltd. | Media signal encoding/decoding method and apparatus
CN101393741A (zh) * | 2007-09-19 | 2009-03-25 | ZTE Corporation | Audio signal classification apparatus and classification method in a wideband audio codec
CN101399039B (zh) * | 2007-09-30 | 2011-05-11 | Huawei Technologies Co., Ltd. | Method and apparatus for determining the category of a non-noise audio signal
CN101236742B (zh) * | 2008-03-03 | 2011-08-10 | ZTE Corporation | Real-time music/non-music detection method and apparatus
EP2259253B1 (en) | 2008-03-03 | 2017-11-15 | LG Electronics Inc. | Method and apparatus for processing audio signal |
JP2011518345A (ja) * | 2008-03-14 | 2011-06-23 | Dolby Laboratories Licensing Corporation | Multi-mode coding of speech-like and non-speech-like signals
US8856049B2 (en) * | 2008-03-26 | 2014-10-07 | Nokia Corporation | Audio signal classification by shape parameter estimation for a plurality of audio signal samples |
EP2144230A1 (en) * | 2008-07-11 | 2010-01-13 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Low bitrate audio encoding/decoding scheme having cascaded switches |
EP2144231A1 (en) * | 2008-07-11 | 2010-01-13 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Low bitrate audio encoding/decoding scheme with common preprocessing |
MY153562A (en) * | 2008-07-11 | 2015-02-27 | Fraunhofer Ges Forschung | Method and discriminator for classifying different segments of a signal |
CN101350199A (zh) * | 2008-07-29 | 2009-01-21 | 北京中星微电子有限公司 | Audio encoder and audio encoding method |
CN102177426B (zh) * | 2008-10-08 | 2014-11-05 | 弗兰霍菲尔运输应用研究公司 | Multi-resolution switched audio encoding/decoding scheme |
CN101751920A (zh) * | 2008-12-19 | 2010-06-23 | 数维科技(北京)有限公司 | Audio classification apparatus based on re-classification and implementation method thereof |
KR101622950B1 (ko) * | 2009-01-28 | 2016-05-23 | 삼성전자주식회사 | Method and apparatus for encoding and decoding an audio signal |
JP4977157B2 (ja) | 2009-03-06 | 2012-07-18 | 株式会社エヌ・ティ・ティ・ドコモ | Sound signal encoding method, sound signal decoding method, encoding device, decoding device, sound signal processing system, sound signal encoding program, and sound signal decoding program |
CN101577117B (zh) * | 2009-03-12 | 2012-04-11 | 无锡中星微电子有限公司 | Accompaniment music extraction method and apparatus |
CN101847412B (zh) * | 2009-03-27 | 2012-02-15 | 华为技术有限公司 | Audio signal classification method and apparatus |
US20100253797A1 (en) * | 2009-04-01 | 2010-10-07 | Samsung Electronics Co., Ltd. | Smart flash viewer |
KR20100115215A (ko) * | 2009-04-17 | 2010-10-27 | 삼성전자주식회사 | Apparatus and method for variable-bitrate audio encoding and decoding |
KR20110022252A (ko) * | 2009-08-27 | 2011-03-07 | 삼성전자주식회사 | Method and apparatus for encoding and decoding stereo audio |
CN102237085B (zh) * | 2010-04-26 | 2013-08-14 | 华为技术有限公司 | Audio signal classification method and apparatus |
JP5749462B2 (ja) | 2010-08-13 | 2015-07-15 | 株式会社Nttドコモ | Audio decoding device, audio decoding method, audio decoding program, audio encoding device, audio encoding method, and audio encoding program |
CN102446504B (zh) * | 2010-10-08 | 2013-10-09 | 华为技术有限公司 | Speech/music recognition method and apparatus |
CN102385863B (zh) * | 2011-10-10 | 2013-02-20 | 杭州米加科技有限公司 | Sound encoding method based on speech/music classification |
US9111531B2 (en) * | 2012-01-13 | 2015-08-18 | Qualcomm Incorporated | Multiple coding mode signal classification |
WO2014010175A1 (ja) * | 2012-07-09 | 2014-01-16 | パナソニック株式会社 | Encoding device and encoding method |
- 2013
- 2013-11-13 JP JP2015542948A patent/JP6170172B2/ja active Active
- 2013-11-13 SG SG10201706626XA patent/SG10201706626XA/en unknown
- 2013-11-13 CN CN201380070268.6A patent/CN104919524B/zh active Active
- 2013-11-13 MY MYPI2015701531A patent/MY188080A/en unknown
- 2013-11-13 EP EP24182511.6A patent/EP4407616A3/en active Pending
- 2013-11-13 CA CA2891413A patent/CA2891413C/en active Active
- 2013-11-13 BR BR112015010954-3A patent/BR112015010954B1/pt active IP Right Grant
- 2013-11-13 ES ES13854639T patent/ES2900594T3/es active Active
- 2013-11-13 AU AU2013345615A patent/AU2013345615B2/en active Active
- 2013-11-13 PL PL13854639T patent/PL2922052T3/pl unknown
- 2013-11-13 WO PCT/KR2013/010310 patent/WO2014077591A1/ko active Application Filing
- 2013-11-13 KR KR1020217038093A patent/KR102446441B1/ko active IP Right Grant
- 2013-11-13 RU RU2015122128A patent/RU2630889C2/ru active
- 2013-11-13 TW TW106140629A patent/TWI648730B/zh active
- 2013-11-13 SG SG11201503788UA patent/SG11201503788UA/en unknown
- 2013-11-13 KR KR1020157012623A patent/KR102331279B1/ko active IP Right Grant
- 2013-11-13 CN CN201711421463.5A patent/CN107958670B/zh active Active
- 2013-11-13 MX MX2015006028A patent/MX349196B/es active IP Right Grant
- 2013-11-13 TW TW102141400A patent/TWI612518B/zh active
- 2013-11-13 MX MX2017009362A patent/MX361866B/es unknown
- 2013-11-13 EP EP21192621.7A patent/EP3933836B1/en active Active
- 2013-11-13 KR KR1020227032281A patent/KR102561265B1/ko active IP Right Grant
- 2013-11-13 RU RU2017129727A patent/RU2656681C1/ru active
- 2013-11-13 EP EP13854639.5A patent/EP2922052B1/en active Active
- 2013-11-13 US US14/079,090 patent/US20140188465A1/en not_active Abandoned
- 2013-11-13 CN CN201711424971.9A patent/CN108074579B/zh active Active
- 2015
- 2015-05-13 PH PH12015501114A patent/PH12015501114A1/en unknown
- 2015-06-12 ZA ZA2015/04289A patent/ZA201504289B/en unknown
- 2017
- 2017-06-29 JP JP2017127285A patent/JP6530449B2/ja active Active
- 2017-07-20 AU AU2017206243A patent/AU2017206243B2/en active Active
- 2018
- 2018-04-18 RU RU2018114257A patent/RU2680352C1/ru active
- 2018-07-18 US US16/039,110 patent/US10468046B2/en active Active
- 2019
- 2019-10-04 US US16/593,041 patent/US11004458B2/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070179783A1 (en) * | 1998-12-21 | 2007-08-02 | Sharath Manjunath | Variable rate speech coding |
US20120069899A1 (en) * | 2002-09-04 | 2012-03-22 | Microsoft Corporation | Entropy encoding and decoding using direct level and run-length/level context-adaptive arithmetic coding/decoding modes |
US20050256701A1 (en) * | 2004-05-17 | 2005-11-17 | Nokia Corporation | Selection of coding models for encoding an audio signal |
EP2096629B1 (en) * | 2006-12-05 | 2012-10-24 | Huawei Technologies Co., Ltd. | Method and apparatus for classifying sound signals |
US20120253797A1 (en) * | 2009-10-20 | 2012-10-04 | Ralf Geiger | Multi-mode audio codec and celp coding adapted therefore |
Non-Patent Citations (1)
Title |
---|
See also references of EP2922052A4 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10090004B2 (en) | 2014-02-24 | 2018-10-02 | Samsung Electronics Co., Ltd. | Signal classifying method and device, and audio encoding method and device using same |
US10504540B2 (en) | 2014-02-24 | 2019-12-10 | Samsung Electronics Co., Ltd. | Signal classifying method and device, and audio encoding method and device using same |
CN107408383A (zh) * | 2015-04-05 | 2017-11-28 | 高通股份有限公司 | 编码器选择 |
CN107408383B (zh) * | 2015-04-05 | 2019-01-15 | 高通股份有限公司 | 编码器选择 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2014077591A1 (ko) | Method and apparatus for determining encoding mode, method and apparatus for encoding audio signals, and method and apparatus for decoding audio signals | |
WO2011049416A2 (en) | Apparatus and method encoding/decoding with phase information and residual information | |
WO2009096713A2 (ko) | Method and apparatus for encoding and decoding an audio signal using adaptive LPC coefficient interpolation | |
WO2010008185A2 (en) | Method and apparatus to encode and decode an audio/speech signal | |
WO2011002185A2 (ko) | Apparatus and method for encoding and decoding an audio signal using a weighted linear predictive transform | |
CA2455059A1 (en) | Speech bandwidth extension apparatus and speech bandwidth extension method | |
WO2010008179A1 (ko) | Method and apparatus for encoding/decoding an integrated speech and music signal | |
WO2015126228A1 (ko) | Signal classification method and apparatus, and audio encoding method and apparatus using the same | |
Arumugam et al. | Improved long-form speech recognition by jointly modeling the primary and non-primary speakers | |
WO2012169808A2 (ko) | Audio signal processing method, audio encoding apparatus, audio decoding apparatus, and terminal employing the same | |
WO2012177067A2 (ko) | Audio signal processing method and apparatus, and terminal employing the same | |
JP2003140693A (ja) | Speech decoding device and method |
JPS63237100A (ja) | Speech detector |
WO2012044067A1 (ko) | Method and apparatus for decoding an audio signal using adaptive codebook update | |
JPH02246625A (ja) | Predictive coding method for speech signals |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application |
Ref document number: 13854639 | Country of ref document: EP | Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 2891413 | Country of ref document: CA |
Ref document number: 20157012623 | Country of ref document: KR | Kind code of ref document: A |
Ref document number: 2015542948 | Country of ref document: JP | Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
WWE | WIPO information: entry into national phase |
Ref document number: 122020023793 | Country of ref document: BR |
Ref document number: MX/A/2015/006028 | Country of ref document: MX |
|
WWE | WIPO information: entry into national phase |
Ref document number: 12015501114 | Country of ref document: PH |
|
WWE | WIPO information: entry into national phase |
Ref document number: 2013854639 | Country of ref document: EP |
|
WWE | WIPO information: entry into national phase |
Ref document number: IDP00201503494 | Country of ref document: ID |
|
ENP | Entry into the national phase |
Ref document number: 2015122128 | Country of ref document: RU | Kind code of ref document: A |
|
ENP | Entry into the national phase |
Ref document number: 2013345615 | Country of ref document: AU | Date of ref document: 20131113 | Kind code of ref document: A |
|
REG | Reference to national code |
Ref country code: BR | Ref legal event code: B01A | Ref document number: 112015010954 | Country of ref document: BR |
|
ENP | Entry into the national phase |
Ref document number: 112015010954 | Country of ref document: BR | Kind code of ref document: A2 | Effective date: 20150513 |