WO2009110738A2 - Method and apparatus for processing an audio signal - Google Patents
Method and apparatus for processing an audio signal
- Publication number
- WO2009110738A2 (PCT/KR2009/001050)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- signal
- audio signal
- type
- coding
- coding type
- Prior art date
- 2008-03-03
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/18—Vocoders using multiple modes
- G10L19/22—Mode decision, i.e. based on audio signal content versus external parameters
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/00007—Time or data compression or expansion
- G11B2020/00014—Time or data compression or expansion the compressed signal being an audio signal
Definitions
- the present invention relates to an audio signal processing method and apparatus capable of efficiently encoding and decoding audio signals of various types.
- a perceptual audio coder optimized for music reduces the amount of information in the encoding process by exploiting the masking principle, a psychoacoustic property of human hearing, on the frequency axis.
- a linear prediction based coder optimized for speech reduces the amount of information by modeling speech production on the time axis.
- An object of the present invention is to provide an audio signal processing method and apparatus capable of compressing and restoring various kinds of audio signals with higher efficiency.
- the present invention for achieving the above object is to provide an audio coding method suitable for the characteristics of the audio signal.
- an audio coding scheme suitable for each audio signal characteristic provides a more efficient compression and reconstruction of an audio signal.
- FIG. 1 is a block diagram illustrating an audio encoding apparatus according to an embodiment of the present invention.
- FIG. 2 is a flowchart illustrating a method of encoding an audio signal using audio type information according to an embodiment of the present invention.
- FIG. 3 shows an example of the structure of an audio bitstream encoded according to the present invention.
- FIG. 4 is a block diagram illustrating an audio encoding apparatus using a psychoacoustic model according to an embodiment of the present invention.
- FIG. 5 is a block diagram illustrating an audio encoding apparatus using a psychoacoustic model according to another embodiment of the present invention.
- FIG. 6 illustrates a change in the noise distortion reference value using the psychoacoustic model unit according to another embodiment of the present invention.
- FIG. 7 is a flowchart illustrating a method of generating a noise distortion reference value using a psychoacoustic model according to another embodiment of the present invention.
- FIG. 8 is a block diagram illustrating an audio decoding apparatus according to an embodiment of the present invention.
- FIG. 9 illustrates the configuration of a product in which an audio decoding apparatus according to an embodiment of the present invention is implemented.
- FIG. 10 illustrates an example of a relationship between products in which an audio decoding apparatus according to an embodiment of the present invention is implemented.
- FIG. 11 is a flowchart illustrating an audio decoding method according to an embodiment of the present invention.
- 'Coding' may be interpreted as encoding or decoding depending on context, and 'information' is a term covering values, parameters, coefficients, elements, and the like.
- the term 'audio signal' in the present invention refers, as a concept distinguished from a video signal, to any signal that can be aurally identified during reproduction.
- depending on its characteristics, the audio signal may be classified into, for example, a signal centered on human speech or a similar signal (hereinafter referred to as a 'speech signal'), a signal centered on music or machine sound or a similar signal (hereinafter referred to as a 'music signal'), and a 'mixed signal' in which the speech signal and the music signal are mixed.
- An object of the present invention is to provide a method and apparatus for encoding and decoding audio signals classified into three types according to characteristics of each signal.
- the classification of the audio signal is merely a criterion adopted to explain the present invention; even if the audio signal is classified by another method, the technical idea of the present invention is obviously equally applicable.
- FIG. 1 is a block diagram illustrating an audio encoding apparatus according to an embodiment of the present invention.
- FIG. 1 illustrates a process of classifying an input audio signal according to a predetermined criterion and selecting and encoding an audio encoding method suitable for each classified audio signal.
- referring to FIG. 1, the apparatus includes a signal classification unit (sound activity detector) 100 that analyzes the characteristics of an input audio signal and classifies it as one of a speech signal, a music signal, or a mixed signal of speech and music; a linear prediction modeling unit 110 that encodes a speech signal among the signal types determined by the signal classification unit 100; a psychoacoustic modeling unit 120 that encodes a music signal; and a mixed signal modeling unit 130 that encodes a mixed signal of speech and music.
- a switching unit 101 selects a coding scheme suitable for the classified signal type.
- the switching unit 101 is operated with the audio coding type information generated by the signal classification unit 100 (for example, the first type information and the second type information, described in detail later with reference to FIGS. 2 and 3) as a control signal.
- the mixed signal modeling unit 130 may include a linear prediction unit 131, a residual signal extraction unit 132, and a frequency converter 133.
- the signal classification unit 100 generates a control signal for classifying an input audio signal type and selecting an audio encoding scheme suitable for the input audio signal. For example, the signal classifying unit 100 classifies whether an input audio signal is a music signal, a voice signal, or a mixed signal in which both voice and music signals are mixed. That is, the reason for classifying the type of the audio signal input as described above is to select an optimal coding method among audio coding methods to be described later for each audio signal type. As a result, the signal classification unit 100 may correspond to a process of analyzing an input audio signal and selecting an optimal audio coding scheme.
- the signal classification unit 100 analyzes the input audio signal to generate audio coding type information; the generated audio coding type information is not only used as the criterion for selecting a coding scheme but is also included in the final audio bitstream and transmitted to the decoding or receiving apparatus. A decoding method and apparatus using the audio coding type information will be described later in detail with reference to FIGS. 8 and 11.
- the audio coding type information generated by the signal classification unit 100 may include, for example, first type information and second type information. This will be described later in FIGS. 2 and 3.
- the signal classification unit 100 determines the audio signal type according to the characteristics of the input audio signal. For example, if the input audio signal is well modeled by a specific set of coefficients and a residual signal, it is determined to be a speech signal; if it cannot be well modeled in this way, it is determined to be a music signal; and when it is difficult to decide between the two, it may be determined to be a mixed signal. Specifically, when the signal is modeled by a specific set of coefficients and a residual signal, and the ratio of the residual signal's energy to the signal's energy is smaller than a predetermined reference value, the signal can be considered well modeled by linear prediction, which predicts the current signal from past samples, and may therefore be classified as a speech signal. A minimal sketch of this decision rule follows.
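- The sketch below tests the residual-to-signal energy ratio described above. The least-squares LPC fit and the 0.25 threshold are illustrative assumptions, not values from the text.

```python
import numpy as np

def classify_frame(frame, order=10, ratio_threshold=0.25):
    """Classify one frame by how well linear prediction models it: a small
    residual-to-signal energy ratio indicates a signal well modeled by
    linear prediction (treated as speech above)."""
    # Build the prediction problem: predict x[n] from its 'order' past samples.
    X = np.array([frame[i - order:i][::-1] for i in range(order, len(frame))])
    y = frame[order:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)    # least-squares LPC fit
    residual = y - X @ coeffs
    ratio = np.sum(residual ** 2) / np.sum(y ** 2)    # residual energy / signal energy
    return 'speech' if ratio < ratio_threshold else 'music'
```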
- when the input audio signal is classified as a speech signal by the signal classification unit 100, the input signal may be encoded using a speech encoder optimized for the speech signal; the linear prediction modeling unit 110 is used as the coding scheme suited to the speech signal.
- various methods may be applied in the linear prediction modeling unit 110, for example Algebraic Code Excited Linear Prediction (ACELP) coding, Adaptive Multi-Rate (AMR) coding, or Adaptive Multi-Rate Wideband (AMR-WB) coding.
- the linear prediction modeling unit 110 may linearly predict and encode an input audio signal in units of frames, and extract and quantize prediction coefficients for each frame.
- a method of extracting the prediction coefficients using the Levinson-Durbin algorithm is widely used (a sketch follows below).
- that is, when the input audio signal is composed of a plurality of frames, or of superframes each containing a plurality of frames, the linear prediction modeling method can be applied frame by frame.
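- As a reference point, the Levinson-Durbin recursion mentioned above can be sketched as follows; this is a generic textbook implementation, not code taken from the patent.

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations for the LPC coefficients
    a[1..p] (with a[0] = 1). Returns the coefficients and the final
    prediction error energy."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    error = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient from the current coefficients and r[1..i].
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / error
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]   # Levinson update
        a[i] = k
        error *= 1.0 - k * k
    return a, error

# Per-frame usage: autocorrelate one analysis frame, then run the recursion.
frame = np.random.randn(1024)          # stand-in for one frame of audio
p = 16
r = np.array([frame[: len(frame) - lag] @ frame[lag:] for lag in range(p + 1)])
lpc, err = levinson_durbin(r, p)
```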
- when the input audio signal is classified as a music signal by the signal classification unit 100, the input signal may be encoded using a music encoder optimized for the music signal.
- the psychoacoustic modeling unit 120 is used as a suitable coding scheme. In relation to this, an example of the psychoacoustic modeling unit 120 applied to the present invention will be described later in detail with reference to FIGS. 4 to 7.
- when the signal classification unit 100 classifies the input audio signal as a mixed signal in which speech and music are mixed, the input signal may be encoded using an encoder optimized for the mixed signal; the mixed signal modeling unit 130 is used as the encoding method suited to the mixed signal.
- the mixed signal modeling unit 130 may code the mixed signal by combining modified forms of the linear prediction modeling method and the psychoacoustic modeling method described above. That is, the mixed signal modeling unit 130 performs linear prediction coding on the input signal, obtains a residual signal as the difference between the linearly predicted signal and the original signal, and codes that residual signal by frequency transform coding.
- FIG. 1 illustrates an example in which the mixed signal modeling unit 130 includes a linear predictor 131, a residual signal extractor 132, and a frequency converter 133.
- the linear prediction unit 131 performs linear prediction analysis on the input signal to extract linear prediction coefficients representing the characteristics of the signal; using the extracted coefficients, the residual signal extraction unit 132 extracts a residual signal from which the redundant components have been removed from the input signal.
- since the redundancy has been removed, the residual signal can resemble white noise.
- the linear predictor 131 may perform linear prediction coding on the input audio signal in units of frames, extracting and quantizing the prediction coefficients for each frame. That is, when the input audio signal is composed of a plurality of frames, or of superframes each containing a plurality of frames, the linear prediction modeling method can be applied frame by frame.
- the residual signal extractor 132 receives the prediction output of the linear predictor 131 and the original audio signal that has passed through the signal classifier 100, and extracts the residual signal, that is, the difference between the two signals.
- the frequency converter 133 frequency-transforms the input residual signal using a method such as MDCT to calculate a masking threshold value or a signal-to-mask ratio (SMR) of the residual signal, and codes the residual signal accordingly.
- in addition, the frequency converter 133 may code the residual signal using TCX (transform coded excitation). A sketch of this analysis path follows.
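- The following is a minimal sketch of the mixed-signal analysis path just described, assuming the LPC vector is in the form [1, a1, ..., ap]. A plain DCT of the windowed residual stands in for the MDCT/TCX transform named above, which would additionally require overlap-add.

```python
import numpy as np
from scipy.signal import lfilter
from scipy.fft import dct

def mixed_signal_analysis(frame, lpc):
    """Whiten the frame with the LPC analysis filter A(z), then frequency
    transform the residual (DCT here as a stand-in for MDCT/TCX)."""
    # A(z) = 1 + a1*z^-1 + ... + ap*z^-p applied as an FIR filter -> residual.
    residual = lfilter(lpc, [1.0], frame)
    spectrum = dct(residual * np.hanning(len(residual)), type=2, norm='ortho')
    return residual, spectrum
```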
- the linear prediction modeling unit 110 and the linear prediction unit 131 both perform linear prediction analysis on the input audio signal to extract linear prediction coefficients (LPC) reflecting its characteristics.
- in transmitting the linear prediction coefficients, a method using a variable number of bits may be considered.
- the signal classification unit 100 generates the coding type information of the audio signal as two pieces of information, which are included in the bitstream and transmitted to the decoding apparatus.
- audio coding type information according to the present invention will be described in detail with reference to FIGS. 2 and 3.
- FIG. 2 is a flowchart illustrating a method of encoding an audio signal using coding type information of the audio signal according to an embodiment of the present invention.
- the present invention proposes expressing the type of an audio signal with two pieces of information, first type information and second type information. That is, when the input audio signal is determined to be a music signal (S100), the signal classification unit 100 controls the switching unit 101 to select an appropriate coding scheme (for example, the psychoacoustic modeling method of FIG. 2), and encoding is performed according to the selected scheme (S110). The corresponding control information is configured as the first type information and included in the encoded audio bitstream for transmission. The first type information thus serves as coding identification information indicating that the coding type of the audio signal is the music signal coding type, and is used in decoding the audio signal by the decoding method and apparatus.
- if the input audio signal is determined to be a speech signal (S120), the signal classification unit 100 controls the switching unit 101 to select an appropriate coding scheme (for example, the linear prediction modeling method of FIG. 2), and encoding is performed according to the selected scheme (S130). Likewise, if the input audio signal is determined to be a mixed signal (S120), the signal classification unit 100 controls the switching unit 101 to select an appropriate coding scheme (for example, the mixed signal modeling method of FIG. 2), and encoding is performed according to the selected scheme (S140).
- control information indicating either the speech signal coding type or the mixed signal coding type is configured as second type information, and is included in the encoded audio bitstream together with the first type information for transmission.
- the second type information thus serves as coding identification information indicating that the coding type of the audio signal is either the speech signal coding type or the mixed signal coding type, and is used together with the aforementioned first type information to decode the audio signal in the decoding method and apparatus.
- depending on the characteristics of the input audio signal, either only the first type information or both the first type information and the second type information may be transmitted.
- for example, when the coding type of the input audio signal is the music signal coding type, only the first type information may be included in the bitstream, and the second type information may be omitted (FIG. 3(a)). That is, since the second type information is included in the bitstream only when the input audio signal coding type is the speech signal coding type or the mixed signal coding type, unnecessary bits for representing the coding type of the audio signal are avoided.
- although the present embodiment describes the first type information as indicating the music signal coding type, the first type information could obviously instead be used to indicate the speech signal coding type or the mixed signal coding type. That is, by assigning the first type information to the audio coding type that occurs most frequently in the coding environment to which the present invention is applied, the number of bits in the overall bitstream can be reduced.
- FIG. 3 shows an example of the structure of an audio bitstream encoded according to the present invention.
- FIG. 3A illustrates a case where an input audio signal corresponds to a music signal, and includes only the first type information 301 in the bitstream and does not include the second type information.
- the bitstream includes audio data coded with a coding type corresponding to the first type information 301 (for example, the AAC bitstream 302).
- FIG. 3B illustrates a case where an input audio signal corresponds to a voice signal, and includes both first type information 311 and second type information 312 in the bitstream.
- the bitstream includes audio data coded with a coding type corresponding to the second type information 312 (for example, the AMR bitstream 313).
- FIG. 3C illustrates a case in which an input audio signal corresponds to a mixed signal, and includes both first type information 321 and second type information 322 in the bitstream.
- the bitstream includes audio data coded with a coding type corresponding to the second type information 322 (for example, an AAC bitstream 323 to which TCX is applied).
- FIGS. 3(a) to 3(c) are merely examples of the information included in an audio bitstream encoded according to the present invention, and various applications are obviously possible within the scope of the present invention.
- the present invention adds information for identifying AMR and AAC as examples of coding schemes, but various coding schemes are applicable, and coding identification information for identifying them may also be used in various ways.
- the examples of FIGS. 3(a) to 3(c) are applicable per superframe, unit frame, or subframe; that is, the audio signal coding type information can be provided for each preset frame unit. A sketch of this two-level signaling follows.
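- As an illustration of the signaling rule above, the following hypothetical serialization carries only the first type information for music and appends the second type information otherwise. The bit values and widths are chosen purely for illustration; the text does not fix them here.

```python
def write_type_info(coding_type):
    """Emit the type-information bits: music -> first type info only;
    speech/mixed -> first type info plus second type info."""
    if coding_type == 'music':
        return [0]                                   # first type information only
    second = 0 if coding_type == 'speech' else 1     # second type information
    return [1, second]

def read_type_info(bits):
    """Inverse of write_type_info: returns the coding type and leftover bits."""
    if bits[0] == 0:
        return 'music', bits[1:]
    return ('speech' if bits[1] == 0 else 'mixed'), bits[2:]
```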
- as a preprocessing step of the input signal encoding process using the linear prediction modeling unit 110, the psychoacoustic modeling unit 120, and the mixed signal modeling unit 130, a frequency band extension process may be performed (not shown).
- in the frequency band extension process, SBR (spectral band replication) and HBE (high band extension), which generate high frequency components from low frequency components in the bandwidth extension unit, may be used.
- as a further preprocessing step of the input signal encoding process using the linear prediction modeling unit 110, the psychoacoustic modeling unit 120, and the mixed signal modeling unit 130, a channel extension process may be performed (not shown).
- the channel extension process reduces the bit allocation by encoding the channel information of the audio signal as side information.
- an example of the channel extension process is a channel extension encoding unit such as parametric stereo (PS).
- parametric stereo (PS) is a technique for coding a stereo signal by downmixing it to a mono signal.
- after the signal passes through SBR and parametric stereo, a 24 kHz mono signal remains, which may then be encoded by the core encoder.
- the encoder input becomes 24 kHz because the high frequency components are coded by SBR and the signal is downsampled to half its original sampling rate in the process; it becomes mono because the stereo audio is parameterized by parametric stereo (PS) and reduced to the sum of a mono signal and side information.
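- A rough sketch of this preprocessing, under the assumption of a stereo input at twice the core rate: parametric stereo produces the mono downmix (the stereo parameters themselves are not modeled here), and the core coder input is decimated to half the sampling rate, e.g. 48 kHz to 24 kHz, since SBR will reconstruct the high band.

```python
import numpy as np
from scipy.signal import decimate

def sbr_ps_preprocess(left, right):
    """Downmix a stereo pair to mono (PS, side info omitted) and halve the
    sampling rate so the core coder sees e.g. a 24 kHz mono signal."""
    mono = 0.5 * (left + right)      # PS downmix; stereo parameters not modeled
    return decimate(mono, 2)         # half-rate input for the core coder
```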
- FIG. 4 is a block diagram illustrating an audio encoding process using a psychoacoustic model according to an embodiment of the present invention.
- an audio encoder using a psychoacoustic model includes a filter bank 401, a psychoacoustic model unit 402, a quantization and bit allocation unit 403, an entropy coding unit 404, and a multiplexer 405.
- the filter bank 401 converts the audio signal into a frequency axis signal by performing a Modified Discrete Cosine Transform (MDCT) to encode an input audio signal that is a time axis signal.
- the psychoacoustic model unit 402 analyzes the perceptual characteristics of the input audio signal to determine the maximum allowable quantization noise for each frequency required for the bit allocation process.
- the noise shaping reference represents the maximum allowable quantization noise for each frequency.
- since the psychoacoustic model unit 402 analyzes the perceptual characteristics of the input signal on the frequency axis, a frequency conversion of the input signal is required. Although frequency conversion is already performed through the filter bank 401 in the encoding process of the audio signal, most experimental results of psychoacoustic theory are given on the discrete Fourier transform (DFT) axis, so it is preferable to perform an FFT (fast Fourier transform).
- the noise distortion reference value in the psychoacoustic model is obtained by convolution of the frequency spectrum with a spreading function corresponding to each frequency component.
- the difference between the noise distortion reference value obtained from the psychoacoustic model and the input signal spectrum is expressed as the perceptual entropy, and an appropriate number of bits is allocated to quantize the spectrum of the audio signal.
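- A minimal sketch of this computation follows. A real psychoacoustic model works on a Bark scale with level-dependent spreading; the fixed kernel and the 14 dB offset below are assumptions for illustration only.

```python
import numpy as np

def masking_threshold(power_spectrum, spreading, offset_db=14.0):
    """Smear the power spectrum with a spreading function and shift it down
    by a masking offset to obtain a per-bin allowable-noise curve."""
    spread = np.convolve(power_spectrum, spreading, mode='same')
    return spread * 10.0 ** (-offset_db / 10.0)

# Example with a crude triangular spreading kernel (unit sum).
kernel = np.array([0.1, 0.2, 0.4, 0.2, 0.1])
spectrum = np.abs(np.fft.rfft(np.random.randn(512))) ** 2
threshold = masking_threshold(spectrum, kernel)
```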
- the quantization unit 403 quantizes the audio signal that has been converted to a frequency axis signal through the filter bank 401, using lossy coding that keeps the quantization noise below the noise distortion reference value determined by the psychoacoustic model unit 402. It also allocates bits to the quantized audio signal. The bit allocation process is optimized so that, at the given bit rate, the quantization noise generated in the quantization process remains as small as possible relative to the maximum allowable noise obtained from the psychoacoustic model.
- the entropy coding unit 404 maximizes the compression rate of the audio signal by assigning codes according to the statistical frequency of the audio signal values quantized and bit-allocated by the quantization unit 403. That is, compression efficiency is improved by assigning codes so that the average code length comes as close as possible to the entropy.
- the basic principle is to reduce the total amount of data by representing each symbol, or run of symbols, with a code whose length matches the statistical frequency of occurrence of the data symbols. The probability of occurrence of a data symbol determines the average amount of information per symbol, called 'entropy'; the goal of entropy coding is to bring the average code length per symbol close to this entropy.
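- A toy illustration of this principle, assuming nothing about the actual tables used: frequent symbols end up with short Huffman codes. Real audio coders use pre-trained Huffman tables rather than building them per frame.

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Return a code length per symbol by repeatedly merging the two least
    probable subtrees; every symbol in a merged subtree sinks one level."""
    counts = Counter(symbols)
    if len(counts) == 1:                      # degenerate one-symbol case
        return {next(iter(counts)): 1}
    heap = [(n, [s]) for s, n in counts.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    while len(heap) > 1:
        n1, group1 = heapq.heappop(heap)
        n2, group2 = heapq.heappop(heap)
        for s in group1 + group2:
            lengths[s] += 1
        heapq.heappush(heap, (n1 + n2, group1 + group2))
    return lengths

print(huffman_code_lengths("aaaaabbbc"))      # -> {'a': 1, 'b': 2, 'c': 2}
```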
- the multiplexer 405 receives the highly efficient compressed audio data and side information from the entropy coding unit 404 and transmits an audio data bit stream to a receiver decoder.
- FIG. 5 is a block diagram illustrating an audio encoding process using a psychoacoustic model according to another embodiment of the present invention.
- referring to FIG. 5, the audio signal encoder includes an analysis filter bank 501, a psychoacoustic model unit 502, a quantization and bit allocation unit 503, an entropy coding unit 504, and a multiplexer 505.
- the psychoacoustic model unit 502 includes a coefficient generator 502a and a noise distortion reference value determiner 502b.
- the filter bank 501 converts the audio signal into subband samples to remove statistical redundancy from the audio signal, performing an MDCT (modified discrete cosine transform) to convert the input time-axis audio signal into a frequency-axis signal.
- the psychoacoustic model unit 502 analyzes the perceptual characteristics of the input signal to determine the maximum allowable quantization noise for each frequency required for the bit allocation process.
- a quantization process for converting an analog signal into a digital signal is performed.
- an error value generated by rounding consecutive values is called quantization noise.
- the quantization noise varies according to the degree of bit allocation, and a signal-to-quantization-noise ratio (SQNR) value is used to quantify it.
- the noise shaping reference represents the maximum allowable quantization noise for each frequency. Accordingly, increasing the number of allocated bits reduces the quantization noise and increases the probability that it falls below the noise distortion reference value.
- the psychoacoustic model unit 502 includes a coefficient generator 502a, which performs linear prediction analysis to generate linear prediction coefficients and applies weights to them to generate distortion prediction coefficients, and a noise distortion reference value determiner 502b, which determines a noise distortion reference value using the generated distortion prediction coefficients.
- the noise distortion reference value is thus generated from the distortion prediction coefficients obtained by perceptual weighting, which weights the linear prediction coefficients derived through linear prediction coding.
- the quantization unit 503 quantizes the audio signal that has been converted to a frequency axis signal through the filter bank 501, using lossy coding that keeps the quantization noise below the noise distortion reference value determined by the psychoacoustic model unit 502, and allocates bits to the quantized audio signal.
- the bit allocation process is optimized so that, at the given bit rate, the quantization noise generated during quantization remains as small as possible relative to the maximum allowable noise given by the newly set noise distortion reference value. That is, the quantization bits of the MDCT spectrum are allocated so that, in each frame, the quantization noise is masked by the signal according to the noise distortion reference value.
- the frequency-converted audio signal may be divided into a plurality of subband signals, and each subband signal may be quantized using a noise distortion reference value based on the distortion prediction coefficients corresponding to that subband. A sketch of the perceptual weighting follows.
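- A sketch of the perceptual weighting just described: scaling the k-th linear prediction coefficient by gamma**k yields the weighted filter A(z/gamma), whose envelope is a flattened version of A(z). The value gamma = 0.92 is a typical choice, not one taken from the text.

```python
def weight_lpc(lpc, gamma=0.92):
    """Apply bandwidth-expansion weighting to LPC coefficients.

    lpc -- [1.0, a1, ..., ap]; returns the coefficients of A(z/gamma),
    i.e. the distortion prediction coefficients from which the noise
    distortion reference value is derived."""
    return [a * gamma ** k for k, a in enumerate(lpc)]
```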
- the entropy coding unit 504 maximizes the compression rate of the audio signal by assigning codes according to the statistical frequency of the audio signal values quantized and bit-allocated by the quantization unit 503, improving compression efficiency by bringing the average code length as close as possible to the entropy. That is, the amount of data is optimized by representing each symbol, or run of symbols, with a code whose length matches the statistical frequency of occurrence of the data symbols. The probability of occurrence of a data symbol determines the average amount of information per symbol, called 'entropy'; the goal of entropy coding is to bring the average code length per symbol close to this entropy.
- the entropy coding unit 504 is not limited to a specific method in performing entropy coding, and a Huffman coding method, an arithmetic coding method, an LZW coding method, or the like may be used according to a choice of a person skilled in the art.
- the multiplexer 505 receives the highly efficient compressed audio data and side information from the entropy coding unit 504 and transmits the encoded audio data bit stream to a receiver decoder.
- audio data encoded through the audio encoding method of the present invention may be decoded as follows through a decoder.
- a quantized audio signal is received through a demultiplexer of a decoder, and the audio signal is recovered from the quantized audio signal.
- the quantized audio signal is generated using a noise distortion reference value for the frequency-converted audio signal, and the noise distortion reference value may be determined using distortion prediction coefficients generated by applying weights to the linear prediction coefficients of the audio signal.
- FIG. 6 is a diagram illustrating a change in a noise distortion reference value using the psychoacoustic model unit according to another exemplary embodiment of the present invention.
- the horizontal axis represents frequency and the vertical axis represents signal strength (dB).
- in the graph, solid line 1 indicates the spectrum of the input audio signal, dashed line 2 indicates the energy of the input audio signal, solid line 3 represents the newly set noise distortion reference value derived from the conventional one, and dotted line 4 represents the envelope given by the distortion prediction coefficients, generated by applying weights to the linear prediction coefficients calculated through linear prediction analysis.
- the high points of the spectral envelope are called formants and the low points are called valleys; in FIG. 6, part A is a formant region and part B is a valley region.
- based on the fact that human hearing is sensitive to quantization noise in the valley regions of the frequency spectrum, more bits are allocated to the valley regions in audio signal coding to compensate for the quantization noise there.
- the coding efficiency for the speech signal can therefore be improved by adjusting the noise distortion reference value of part A upward and the masking curve value of part B downward relative to the conventional values. That is, in quantizing the frequency-converted audio signal, the weights applied to the linear prediction coefficients may act to increase the allowable quantization noise of the audio signal in the formant regions of the frequency spectrum and to decrease it in the valley regions.
- the coefficient generator 502a shown in FIG. 5 may obtain a transfer function composed of linear predictive coefficients through linear predictive analysis.
- the frequency spectrum of this transfer function corresponds to the envelope of the frequency spectrum of the input signal.
- this transfer function is built from the linear prediction coefficients, and its envelope has a form similar to the noise distortion reference value of the psychoacoustic model (PAM) used in the conventional audio encoding process.
- the coefficient generator 502a may implement a weighting filter by applying appropriate weighting coefficients to the linear prediction coefficients, thereby generating modified prediction coefficients; simply by using these modified prediction coefficients, the relative weight of the formant and valley regions of the spectrum can be adjusted.
- the effect on quantization noise is that, for spectral regions that are perceptually more sensitive, the noise distortion reference value is lowered so that more bits are allocated, while for the formant regions, where errors are relatively less audible, the noise distortion reference value is raised so that fewer bits are allocated, thereby improving the audio encoding performance.
- if the weighting coefficients that control the degree of perceptual weighting are not applied uniformly but are adaptively adjusted according to input signal characteristics such as the spectral flatness, the encoding performance can be further improved.
- the noise distortion reference value may be derived by applying only the perceptual weighting to the psychoacoustic model without analyzing the envelope of the spectrum.
- FIG. 7 is a flowchart illustrating a method of generating a noise distortion reference value using a psychoacoustic model according to another exemplary embodiment of the present invention.
- when an audio signal is input to the psychoacoustic model unit 502, the coefficient generator 502a generates a transfer function composed of linear prediction coefficients using linear prediction coding (S200). The frequency spectrum of this transfer function corresponds to the envelope of the frequency spectrum of the input signal, and it has a form similar to the noise distortion reference value of the psychoacoustic model (PAM) used in the conventional audio encoding process.
- the coefficient generator 502a receives an audio signal to determine a weighting coefficient suitable for the linear prediction coefficients (S210).
- the noise distortion reference value determiner 502b generates a corrected envelope by applying the weighting coefficients determined in step S210 to the envelope of the transfer function formed from the linear prediction coefficients obtained in step S200 (S220). Thereafter, the noise distortion reference value determiner 502b calculates an impulse response of the envelope generated in step S220 (S230); the impulse response plays a kind of filtering role. The noise distortion reference value determiner 502b then performs an FFT on the envelope filtered in step S230 to convert the time axis signal into a frequency axis signal (S240), determines a masking level in order to set the envelope transformed to the frequency axis as the noise distortion reference value (S250), and finally computes the signal-to-mask ratio (SMR) for each subband (S260). The whole pipeline is sketched below.
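- The pipeline below walks the steps S200 to S260 under stated assumptions: the 12 dB masking offset and the uniform band split are placeholders, and a real model would set these per critical band.

```python
import numpy as np
from scipy.signal import lfilter

def noise_reference(lpc, gamma=0.92, n_fft=512, n_bands=32, offset_db=12.0):
    """FIG. 7 sketch: weight the LPC envelope (S210-S220), take the impulse
    response of the weighted synthesis filter (S230), FFT it to the
    frequency axis (S240), apply a masking offset (S250), and report the
    reference per subband (S260). lpc is [1.0, a1, ..., ap]."""
    weighted = [a * gamma ** k for k, a in enumerate(lpc)]     # S210-S220
    impulse = np.zeros(n_fft)
    impulse[0] = 1.0
    h = lfilter([1.0], weighted, impulse)                      # S230: 1/A(z/gamma)
    envelope = np.abs(np.fft.rfft(h)) ** 2                     # S240
    threshold = envelope * 10.0 ** (-offset_db / 10.0)         # S250
    return [band.mean() for band in np.array_split(threshold, n_bands)]  # S260
```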
- in this manner, a weighting filter is implemented by applying weighting coefficients to the linear prediction coefficients, raising the noise distortion reference value in the formant regions of the frequency spectrum and lowering it in the valley regions relative to the conventional reference value, so that more bits can be allocated to the valley regions.
- to reduce the transmission rate, a high-efficiency audio encoder compresses a specific low frequency band with a core audio coding method applying the psychoacoustic model of the present invention, and encodes the remaining high frequency components with a bandwidth extension or spectral band replication (SBR) method that reuses low frequency information. In such a high-efficiency encoder, the noise distortion reference value based on the psychoacoustic model is required only up to the specific low frequency band.
- the audio signal encoder illustrated in FIG. 4 or 5 may operate in a device equipped with both a music signal encoder and a voice signal encoder.
- the audio signal encoder encodes the downmix signal according to a music coding scheme when a specific frame or a specific segment of the downmix signal has a music characteristic.
- the music signal encoder may correspond to a modified discrete cosine transform (MDCT) encoder.
- the speech signal encoder encodes the downmix signal according to a speech coding scheme when a specific frame or segment of the downmix signal mainly has speech characteristics.
- the linear prediction coding method used in the speech signal encoder can be improved by the method proposed by the present invention.
- a harmonic signal may be well modeled by linear prediction, which predicts the current signal from past samples; in this case, applying the linear prediction coding method can increase the coding efficiency.
- the voice signal encoder may correspond to a time domain encoder.
- FIG. 8 is a diagram illustrating a decoding apparatus according to an embodiment of the present invention.
- the decoding apparatus may restore a signal from an input bitstream by performing an inverse process of an encoding process performed by the encoding apparatus described with reference to FIG. 1.
- the decoding apparatus may include a demultiplexer 210, a decoder determiner 220, a decoder 230, and a synthesizer 240.
- the decoder 230 may include a plurality of decoders 231, 232, and 233 which perform decoding by different methods, which are operated under the control of the decoder determiner 220.
- the decoder 230 may include a linear prediction decoder 231, a psychoacoustic decoder 232, and a mixed signal decoder 233.
- the mixed signal decoder 233 may include an information extractor 234, a frequency converter 235, and a linear predictor 236.
- the demultiplexer 210 extracts a plurality of encoded signals and additional information for decoding the signals from an input bitstream. For example, the first type information and the second type information (included only when necessary) included in the aforementioned bitstream are extracted and transmitted to the decoder determiner 220.
- the decoder determiner 220 determines which decoding method among the decoders 231, 232, and 233 to use, based on the first type information and the second type information (which is included only when necessary). Alternatively, the decoder determiner 220 may determine the decoding method using additional information extracted from the bitstream; when no such additional information is present in the bitstream, it may determine the decoding method by an independent determination method, which may utilize the features of the signal classification unit (100 of FIG. 1) described above.
- the linear prediction decoder 231 in the decoder 230 is capable of decoding an audio signal of a voice signal type.
- the psychoacoustic decoder 232 decodes an audio signal of a music signal type.
- the mixed signal decoder 233 decodes an audio signal of a mixed type of voice and music.
- the mixed signal decoder 233 includes an information extractor 234 that extracts spectral data and linear prediction coefficients from the audio signal, a frequency converter 235 that inverse-frequency-transforms the spectral data to generate a residual signal for linear prediction, and a linear predictor 236 that performs linear prediction coding on the linear prediction coefficients and the residual signal to generate the output signal.
- the decoded signals are synthesized by the combining unit 240 and restored to the original audio signal.
- the demultiplexer 210 extracts first type information and second type information (if necessary) from the input bitstream.
- the decoder determiner 220 first determines the coding type of the received audio signal using the first type information among the extracted information (S1000). If it is the music signal coding type, the psychoacoustic decoder 232 in the decoder 230 is used, the coding scheme applied to each frame or subframe is determined from the first type information, and decoding is performed by applying the appropriate coding scheme (S1100).
- if the coding type is not the music signal coding type, the decoder determiner 220 next uses the second type information to determine whether the coding type of the received audio signal is the speech signal coding type or the mixed signal coding type (S1200).
- if it is the speech signal coding type, the linear prediction decoder 231 in the decoder 230 is used; the coding scheme applied to each frame or subframe is determined using coding identification information extracted from the bitstream, and decoding is performed by applying the appropriate coding scheme (S1300).
- if it is the mixed signal coding type, the mixed signal decoder 233 in the decoder 230 is used; the coding scheme applied to each frame or subframe is determined from the second type information, and decoding is performed by applying the appropriate coding scheme (S1400). A dispatch sketch of this flow follows.
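- A dispatch sketch of this decoding flow; 'decoders' is an assumed mapping from coding type to a decoder object with a decode() method, and read_type_info is the hypothetical parser sketched earlier for the type information.

```python
def decode_frame(bits, decoders):
    """Route a frame to the decoder selected by the type information
    (S1000/S1200), then decode it (S1100, S1300, or S1400)."""
    coding_type, payload = read_type_info(bits)
    return decoders[coding_type].decode(payload)
```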
- in a bandwidth extension decoding unit, a frequency band extension process can then be performed.
- in the frequency band extension process, the bandwidth extension decoding unit decodes band extension information included in the audio signal bitstream and generates spectral data of another band (e.g., a high frequency band) from some or all of the decoded spectral data.
- in extending the frequency band, blocks may be generated by grouping units with similar characteristics; this amounts to creating an envelope region by grouping time slots (or samples) that share a common envelope (or envelope characteristic).
- FIG. 9 is a diagram illustrating a configuration of a product on which a decoding apparatus according to an embodiment of the present invention is implemented.
- FIG. 10 is a diagram illustrating a relationship between products in which a decoding apparatus according to an embodiment of the present invention is implemented.
- the wired / wireless communication unit 910 receives a bitstream through a wired / wireless communication scheme.
- the wired / wireless communication unit 910 may include at least one of a wired communication unit 910A, an infrared communication unit 910B, a Bluetooth unit 910C, and a wireless LAN communication unit 910D.
- the user authentication unit 920 performs user authentication by inputting user information, and includes one or more of a fingerprint recognition unit 920A, an iris recognition unit 920B, a face recognition unit 920C, and a voice recognition unit 920D.
- fingerprint, iris, facial contour, or voice information may be input and converted into user information, and user authentication may be performed by determining whether the user information matches previously registered user data.
- the input unit 930 is an input device through which a user enters various kinds of commands, and may include one or more of a keypad unit 930A, a touch pad unit 930B, and a remote controller unit 930C, but is not limited thereto.
- the signal decoding unit 940 analyzes the signal characteristics using the received bitstream and frame type information, and decodes the signal using a decoding unit corresponding to those characteristics to generate an output signal.
- the controller 950 receives input signals from the input apparatuses and controls all processes of the signal decoding unit 940 and the output unit 960.
- the output unit 960 outputs the output signal generated by the signal decoding unit 940, and may include a speaker unit 960A and a display unit 960B. When the output signal is an audio signal, it is output through the speaker; when it is a video signal, it is output through the display.
- FIG. 10 illustrates a relationship between a terminal and a server corresponding to the product illustrated in FIG. 9.
- it can be seen that the first terminal 1001 and the second terminal 1002 can exchange data or bitstreams in both directions through their wired/wireless communication units.
- the server 1003 and the first terminal 1001 may also perform wired or wireless communication with each other.
- the audio signal processing method according to the present invention can be stored in a computer-readable recording medium which is produced as a program for execution in a computer, and multimedia data having a data structure according to the present invention can also be stored in a computer-readable recording medium.
- the computer readable recording medium includes all kinds of storage devices in which data readable by a computer system is stored. Examples of computer readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices; the medium may also be implemented in the form of a carrier wave (for example, transmission over the Internet).
- the bitstream generated by the encoding method may be stored in a computer-readable recording medium or transmitted using a wired / wireless communication network.
Claims (15)
- In an audio signal processing apparatus including an audio decoder, an audio signal processing method comprising: identifying, using first type information, whether the coding type of an audio signal is a music signal coding type; if the coding type of the audio signal is not the music signal coding type, identifying, using second type information, whether the coding type of the audio signal is a speech signal coding type or a mixed signal coding type; if the coding type of the audio signal is the mixed signal coding type, extracting spectral data and linear prediction coefficients from the audio signal; generating a residual signal for linear prediction by inverse frequency transforming the spectral data; and restoring the audio signal by linear prediction coding the linear prediction coefficients and the residual signal; wherein only the first type information is used when the coding type of the audio signal is the music signal coding type, and both the first type information and the second type information are used when the coding type of the audio signal is the speech signal coding type or the mixed signal coding type.
- The method of claim 1, further comprising: if the coding type of the audio signal is the mixed signal coding type, restoring a high frequency band signal using a low frequency band signal of the restored audio signal; and generating a plurality of channels by upmixing the restored audio signal.
- The method of claim 1, wherein the audio signal is composed of a plurality of subframes, and the second type information exists for each subframe.
- The method of claim 1, wherein the audio signal is a frequency domain signal if the coding type of the audio signal is the music signal coding type, a time domain signal if the coding type is the speech signal coding type, and an MDCT domain signal if the coding type is the mixed signal coding type.
- The method of claim 1, wherein extracting the linear prediction coefficients comprises extracting a linear prediction coefficient mode and extracting linear prediction coefficients of a variable bit size corresponding to the extracted mode.
- An audio signal processing apparatus comprising: a demultiplexer extracting first type information and second type information from a bitstream; a decoder determiner identifying, using the first type information, whether the coding type of an audio signal is a music signal coding type, identifying, using the second type information, whether the coding type of the audio signal is a speech signal coding type or a mixed signal coding type if the coding type is not the music signal coding type, and then determining a decoding scheme; an information extractor extracting spectral data and linear prediction coefficients from the audio signal if the coding type of the audio signal is the mixed signal coding type; a frequency converter generating a residual signal for linear prediction by inverse frequency transforming the spectral data; and a linear predictor restoring the audio signal by linear prediction coding the linear prediction coefficients and the residual signal; wherein only the first type information is used when the coding type of the audio signal is the music signal coding type, and both the first type information and the second type information are used when the coding type of the audio signal is the speech signal coding type or the mixed signal coding type.
- The apparatus of claim 6, further comprising, when the coding type of the audio signal is the mixed signal coding type: a bandwidth extension decoding unit restoring a high frequency band signal using a low frequency band signal of the restored audio signal; and a channel extension decoding unit generating a plurality of channels by upmixing the restored audio signal.
- The apparatus of claim 6, wherein the audio signal is composed of a plurality of subframes, and the second type information exists for each subframe.
- The apparatus of claim 6, wherein the audio signal is a frequency domain signal if the coding type of the audio signal is the music signal coding type, a time domain signal if the coding type is the speech signal coding type, and an MDCT domain signal if the coding type is the mixed signal coding type.
- The apparatus of claim 6, wherein the information extractor extracting the linear prediction coefficients checks a linear prediction coefficient mode and extracts linear prediction coefficients of a variable bit size corresponding to the extracted mode.
- In an audio signal processing apparatus including an audio encoder processing an audio signal, an audio signal processing method comprising: determining the coding type of the audio signal; generating, if the audio signal is a music signal, first type information indicating that the audio signal is coded with a music signal coding type; generating, if the audio signal is not a music signal, second type information indicating that the audio signal is coded with either a speech signal coding type or a mixed signal coding type; if the coding type of the audio signal is the mixed signal coding type, generating linear prediction coefficients by linear prediction coding the audio signal, generating a residual signal for the linear prediction coding, and generating spectral coefficients by frequency transforming the residual signal; and generating an audio bitstream including the first type information, the second type information, the linear prediction coefficients, and the residual signal; wherein only the first type information is generated when the coding type of the audio signal is the music signal coding type, and both the first type information and the second type information are generated when the coding type of the audio signal is the speech signal coding type or the mixed signal coding type.
- The method of claim 11, wherein the audio signal is composed of a plurality of subframes, and the second type information is generated for each subframe.
- An audio signal processing apparatus comprising: a signal classification unit determining the coding type of an input audio signal, generating first type information indicating that the audio signal is coded with a music signal coding type if the audio signal is a music signal, and generating second type information indicating that the audio signal is coded with either a speech signal coding type or a mixed signal coding type if the audio signal is not a music signal; a linear prediction modeling unit generating linear prediction coefficients by linear prediction coding the audio signal if the coding type of the audio signal is the mixed signal coding type; a residual signal extraction unit generating a residual signal for the linear prediction; and a frequency conversion unit generating spectral coefficients by frequency transforming the residual signal; wherein only the first type information is generated when the coding type of the audio signal is the music signal coding type, and both the first type information and the second type information are generated when the coding type of the audio signal is the speech signal coding type or the mixed signal coding type.
- The apparatus of claim 13, wherein the audio signal is composed of a plurality of subframes, and the second type information is generated for each subframe.
- The apparatus of claim 13, further comprising, when the coding type of the audio signal is the music signal coding type: a coefficient generator generating linear prediction coefficients using linear prediction coding and applying weights to the linear prediction coefficients; and a reference value determiner generating a noise distortion reference value using the weighted linear prediction coefficients.
Priority Applications (9)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2010549570A JP5266341B2 (ja) | 2008-03-03 | 2009-03-03 | オーディオ信号処理方法及び装置 |
MX2010009571A MX2010009571A (es) | 2008-03-03 | 2009-03-03 | Metodo y aparato para el procesamiento de señales de audio. |
AU2009220321A AU2009220321B2 (en) | 2008-03-03 | 2009-03-03 | Method and apparatus for processing audio signal |
CA2716817A CA2716817C (en) | 2008-03-03 | 2009-03-03 | Method and apparatus for processing audio signal |
KR1020107019538A KR101221919B1 (ko) | 2008-03-03 | 2009-03-03 | 오디오 신호 처리 방법 및 장치 |
BRPI0910285-0A BRPI0910285B1 (pt) | 2008-03-03 | 2009-03-03 | Métodos e aparelhos para processamento de sinal de áudio. |
EP09716372.9A EP2259253B1 (en) | 2008-03-03 | 2009-03-03 | Method and apparatus for processing audio signal |
CN2009801075430A CN101965612B (zh) | 2008-03-03 | 2009-03-03 | 用于处理音频信号的方法和装置 |
US12/497,375 US7991621B2 (en) | 2008-03-03 | 2009-07-02 | Method and an apparatus for processing a signal |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US3303208P | 2008-03-03 | 2008-03-03 | |
US61/033,032 | 2008-03-03 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US12/497,375 Continuation US7991621B2 (en) | 2008-03-03 | 2009-07-02 | Method and an apparatus for processing a signal |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2009110738A2 true WO2009110738A2 (ko) | 2009-09-11 |
WO2009110738A3 WO2009110738A3 (ko) | 2009-10-29 |
Family
ID=41056471
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2009/001050 WO2009110738A2 (ko) | 2008-03-03 | 2009-03-03 | 오디오 신호 처리 방법 및 장치 |
Country Status (11)
Country | Link |
---|---|
US (1) | US7991621B2 (ko) |
EP (1) | EP2259253B1 (ko) |
JP (1) | JP5266341B2 (ko) |
KR (1) | KR101221919B1 (ko) |
CN (1) | CN101965612B (ko) |
AU (1) | AU2009220321B2 (ko) |
BR (1) | BRPI0910285B1 (ko) |
CA (1) | CA2716817C (ko) |
MX (1) | MX2010009571A (ko) |
RU (1) | RU2455709C2 (ko) |
WO (1) | WO2009110738A2 (ko) |
Families Citing this family (58)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101434198B1 (ko) * | 2006-11-17 | 2014-08-26 | 삼성전자주식회사 | 신호 복호화 방법 |
US9343079B2 (en) * | 2007-06-15 | 2016-05-17 | Alon Konchitsky | Receiver intelligibility enhancement system |
KR101380170B1 (ko) * | 2007-08-31 | 2014-04-02 | 삼성전자주식회사 | 미디어 신호 인코딩/디코딩 방법 및 장치 |
JP5108960B2 (ja) * | 2008-03-04 | 2012-12-26 | エルジー エレクトロニクス インコーポレイティド | オーディオ信号処理方法及び装置 |
US8401845B2 (en) * | 2008-03-05 | 2013-03-19 | Voiceage Corporation | System and method for enhancing a decoded tonal sound signal |
CN101567203B (zh) * | 2008-04-24 | 2013-06-05 | 深圳富泰宏精密工业有限公司 | 自动搜寻及播放音乐的系统及方法 |
EP2301028B1 (en) * | 2008-07-11 | 2012-12-05 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | An apparatus and a method for calculating a number of spectral envelopes |
CN102089814B (zh) * | 2008-07-11 | 2012-11-21 | 弗劳恩霍夫应用研究促进协会 | 对编码的音频信号进行解码的设备和方法 |
KR101569702B1 (ko) * | 2009-08-17 | 2015-11-17 | 삼성전자주식회사 | 레지듀얼 신호 인코딩 및 디코딩 방법 및 장치 |
WO2011035813A1 (en) * | 2009-09-25 | 2011-03-31 | Nokia Corporation | Audio coding |
JP5754899B2 (ja) | 2009-10-07 | 2015-07-29 | ソニー株式会社 | 復号装置および方法、並びにプログラム |
PL2491553T3 (pl) | 2009-10-20 | 2017-05-31 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Koder audio, dekoder audio, sposób kodowania informacji audio, sposób dekodowania informacji audio i program komputerowy wykorzystujący iteracyjne zmniejszania rozmiaru przedziału |
JP5624159B2 (ja) * | 2010-01-12 | 2014-11-12 | フラウンホーファーゲゼルシャフトツール フォルデルング デル アンゲヴァンテン フォルシユング エー.フアー. | オーディオ符号化器、オーディオ復号器、オーディオ情報を符号化および復号するための方法、ならびに以前に復号されたスペクトル値のノルムに基づいてコンテキストサブ領域値を取得するコンピュータプログラム |
CA3097372C (en) | 2010-04-09 | 2021-11-30 | Dolby International Ab | Mdct-based complex prediction stereo coding |
JP5609737B2 (ja) | 2010-04-13 | 2014-10-22 | ソニー株式会社 | 信号処理装置および方法、符号化装置および方法、復号装置および方法、並びにプログラム |
JP5850216B2 (ja) | 2010-04-13 | 2016-02-03 | ソニー株式会社 | 信号処理装置および方法、符号化装置および方法、復号装置および方法、並びにプログラム |
JP6075743B2 (ja) | 2010-08-03 | 2017-02-08 | ソニー株式会社 | 信号処理装置および方法、並びにプログラム |
JP5707842B2 (ja) | 2010-10-15 | 2015-04-30 | ソニー株式会社 | 符号化装置および方法、復号装置および方法、並びにプログラム |
US20130066638A1 (en) * | 2011-09-09 | 2013-03-14 | Qnx Software Systems Limited | Echo Cancelling-Codec |
CN103918247B (zh) * | 2011-09-23 | 2016-08-24 | 数字标记公司 | 基于背景环境的智能手机传感器逻辑 |
CN103889335B (zh) * | 2011-10-28 | 2016-06-22 | 皇家飞利浦有限公司 | 用于处理针对听诊的心音的设备与方法 |
CN104040624B (zh) * | 2011-11-03 | 2017-03-01 | 沃伊斯亚吉公司 | 改善低速率码激励线性预测解码器的非语音内容 |
US9111531B2 (en) | 2012-01-13 | 2015-08-18 | Qualcomm Incorporated | Multiple coding mode signal classification |
EP2830062B1 (en) * | 2012-03-21 | 2019-11-20 | Samsung Electronics Co., Ltd. | Method and apparatus for high-frequency encoding/decoding for bandwidth extension |
US9123328B2 (en) * | 2012-09-26 | 2015-09-01 | Google Technology Holdings LLC | Apparatus and method for audio frame loss recovery |
EP2922052B1 (en) | 2012-11-13 | 2021-10-13 | Samsung Electronics Co., Ltd. | Method for determining an encoding mode |
CA2899542C (en) | 2013-01-29 | 2020-08-04 | Guillaume Fuchs | Noise filling without side information for celp-like coders |
EP2951822B1 (en) | 2013-01-29 | 2019-11-13 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Audio encoder, audio decoder, method for providing an encoded audio information, method for providing a decoded audio information, computer program and encoded representation using a signal-adaptive bandwidth extension |
US9601125B2 (en) * | 2013-02-08 | 2017-03-21 | Qualcomm Incorporated | Systems and methods of performing noise modulation and gain adjustment |
US20140355769A1 (en) | 2013-05-29 | 2014-12-04 | Qualcomm Incorporated | Energy preservation for decomposed representations of a sound field |
JP6201043B2 (ja) | 2013-06-21 | 2017-09-20 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and method for improved signal fade-out for switched audio coding systems during error concealment |
EP2830058A1 (en) | 2013-07-22 | 2015-01-28 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Frequency-domain audio coding supporting transform length switching |
CN103413553B (zh) | 2013-08-20 | 2016-03-09 | Tencent Technology (Shenzhen) Co., Ltd. | Audio encoding method, audio decoding method, encoding end, decoding end, and system |
US9666202B2 (en) | 2013-09-10 | 2017-05-30 | Huawei Technologies Co., Ltd. | Adaptive bandwidth extension and apparatus for the same |
JP6531649B2 (ja) | 2013-09-19 | 2019-06-19 | Sony Corporation | Encoding device and method, decoding device and method, and program |
CN103500580B (zh) * | 2013-09-23 | 2017-04-12 | Guangdong Vtron Technology Co., Ltd. | Audio mixing processing method and system |
MY181965A (en) | 2013-10-18 | 2021-01-15 | Fraunhofer Ges Forschung | Coding of spectral coefficients of a spectrum of an audio signal |
JP6593173B2 (ja) | 2013-12-27 | 2019-10-23 | Sony Corporation | Decoding device and method, and program |
US9311639B2 (en) | 2014-02-11 | 2016-04-12 | Digimarc Corporation | Methods, apparatus and arrangements for device to device communication |
CN110992965B (zh) * | 2014-02-24 | 2024-09-03 | Samsung Electronics Co., Ltd. | Signal classification method and apparatus, and audio encoding method and apparatus using the same |
CN111312278B (zh) | 2014-03-03 | 2023-08-15 | Samsung Electronics Co., Ltd. | Method and device for high-frequency decoding for bandwidth extension |
WO2015133795A1 (ko) * | 2014-03-03 | 2015-09-11 | Samsung Electronics Co., Ltd. | Method and apparatus for high-frequency decoding for bandwidth extension |
EP2922055A1 (en) | 2014-03-19 | 2015-09-23 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus, method and corresponding computer program for generating an error concealment signal using individual replacement LPC representations for individual codebook information |
EP2922056A1 (en) | 2014-03-19 | 2015-09-23 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus, method and corresponding computer program for generating an error concealment signal using power compensation |
EP2922054A1 (en) | 2014-03-19 | 2015-09-23 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus, method and corresponding computer program for generating an error concealment signal using an adaptive noise estimation |
KR20240046298A (ko) | 2024-04-08 | Samsung Electronics Co., Ltd. | Method and apparatus for high-band encoding and method and apparatus for high-band decoding |
US9911427B2 (en) * | 2014-03-24 | 2018-03-06 | Nippon Telegraph And Telephone Corporation | Gain adjustment coding for audio encoder by periodicity-based and non-periodicity-based encoding methods |
CN107452391B (zh) | 2014-04-29 | 2020-08-25 | Huawei Technologies Co., Ltd. | Audio encoding method and related apparatus |
US10770087B2 (en) | 2014-05-16 | 2020-09-08 | Qualcomm Incorporated | Selecting codebooks for coding vectors decomposed from higher-order ambisonic audio signals |
JP6398607B2 (ja) * | 2014-10-24 | 2018-10-03 | Fujitsu Limited | Audio encoding device, audio encoding method, and audio encoding program |
WO2016142002A1 (en) * | 2015-03-09 | 2016-09-15 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Audio encoder, audio decoder, method for encoding an audio signal and method for decoding an encoded audio signal |
KR102398124B1 (ko) * | 2015-08-11 | 2022-05-17 | Samsung Electronics Co., Ltd. | Adaptive processing of audio data |
CN105070304B (zh) | 2015-08-11 | 2018-09-04 | Xiaomi Inc. | Method and apparatus for object audio recording, and electronic device |
US10186276B2 (en) * | 2015-09-25 | 2019-01-22 | Qualcomm Incorporated | Adaptive noise suppression for super wideband music |
EP3389046B1 (en) * | 2015-12-08 | 2021-06-16 | Sony Corporation | Transmission device, transmission method, reception device, and reception method |
US10991379B2 (en) * | 2018-06-22 | 2021-04-27 | Babblelabs Llc | Data driven audio enhancement |
FR3085785B1 (fr) * | 2018-09-07 | 2021-05-14 | Gracenote Inc | Methods and apparatus for generating a digital fingerprint of an audio signal by means of normalization |
KR20220017221A (ko) * | 2020-08-04 | 2022-02-11 | Samsung Electronics Co., Ltd. | Electronic device and method for outputting audio data thereof |
Family Cites Families (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE2952113C2 (de) * | 1979-12-22 | 1983-05-19 | Matth. Hohner Ag, 7218 Trossingen | String chorus circuit |
US4628529A (en) * | 1985-07-01 | 1986-12-09 | Motorola, Inc. | Noise suppression system |
DE4202140A1 (de) * | 1992-01-27 | 1993-07-29 | Thomson Brandt Gmbh | Method for transmitting digital audio signals |
US5285498A (en) * | 1992-03-02 | 1994-02-08 | At&T Bell Laboratories | Method and apparatus for coding audio signals based on perceptual model |
JP3707116B2 (ja) * | 1995-10-26 | 2005-10-19 | Sony Corporation | Speech decoding method and apparatus |
US5692102A (en) * | 1995-10-26 | 1997-11-25 | Motorola, Inc. | Method device and system for an efficient noise injection process for low bitrate audio compression |
US5778335A (en) * | 1996-02-26 | 1998-07-07 | The Regents Of The University Of California | Method and apparatus for efficient multiband celp wideband speech and music coding and decoding |
US6092041A (en) * | 1996-08-22 | 2000-07-18 | Motorola, Inc. | System and method of encoding and decoding a layered bitstream by re-applying psychoacoustic analysis in the decoder |
JPH1084284A (ja) * | 1996-09-06 | 1998-03-31 | Sony Corp | Signal reproducing method and apparatus |
CA2233896C (en) * | 1997-04-09 | 2002-11-19 | Kazunori Ozawa | Signal coding system |
ATE302991T1 (de) * | 1998-01-22 | 2005-09-15 | Deutsche Telekom Ag | Method for signal-controlled switching between different audio coding systems |
US6424938B1 (en) * | 1998-11-23 | 2002-07-23 | Telefonaktiebolaget L M Ericsson | Complex signal activity detection for improved speech/noise classification of an audio signal |
SG98418A1 (en) * | 2000-07-10 | 2003-09-19 | Cyberinc Pte Ltd | A method, a device and a system for compressing a musical and voice signal |
US6658383B2 (en) * | 2001-06-26 | 2003-12-02 | Microsoft Corporation | Method for coding speech and music signals |
US7693707B2 (en) * | 2003-12-26 | 2010-04-06 | Panasonic Corporation | Voice/musical sound encoding device and voice/musical sound encoding method |
FI118835B (fi) * | 2004-02-23 | 2008-03-31 | Nokia Corp | Selection of a coding model |
US7596486B2 (en) * | 2004-05-19 | 2009-09-29 | Nokia Corporation | Encoding an audio signal using different audio coder modes |
ES2478004T3 (es) * | 2005-10-05 | 2014-07-18 | Lg Electronics Inc. | Method and apparatus for decoding an audio signal |
CN101086845B (zh) * | 2006-06-08 | 2011-06-01 | Beijing Tianlai Chuanyin Digital Technology Co., Ltd. | Sound encoding device and method, and sound decoding device and method |
CN101512639B (zh) * | 2006-09-13 | 2012-03-14 | Telefonaktiebolaget LM Ericsson | Method and apparatus for a speech/audio transmitter and receiver |
KR20070017378A (ko) * | 2006-11-16 | 2007-02-09 | Nokia Corporation | Audio encoding with different coding models |
CN101025918B (zh) * | 2007-01-19 | 2011-06-29 | Tsinghua University | Seamless switching method for speech/music dual-mode encoding and decoding |
KR101513028B1 (ko) * | 2007-07-02 | 2015-04-17 | LG Electronics Inc. | Broadcast receiver and broadcast signal processing method |
- 2009
- 2009-03-03 JP JP2010549570A patent/JP5266341B2/ja active Active
- 2009-03-03 AU AU2009220321A patent/AU2009220321B2/en active Active
- 2009-03-03 BR BRPI0910285-0A patent/BRPI0910285B1/pt active IP Right Grant
- 2009-03-03 RU RU2010140362/08A patent/RU2455709C2/ru active
- 2009-03-03 CN CN2009801075430A patent/CN101965612B/zh active Active
- 2009-03-03 CA CA2716817A patent/CA2716817C/en active Active
- 2009-03-03 MX MX2010009571A patent/MX2010009571A/es active IP Right Grant
- 2009-03-03 WO PCT/KR2009/001050 patent/WO2009110738A2/ko active Application Filing
- 2009-03-03 KR KR1020107019538A patent/KR101221919B1/ko active IP Right Grant
- 2009-03-03 EP EP09716372.9A patent/EP2259253B1/en active Active
- 2009-07-02 US US12/497,375 patent/US7991621B2/en active Active
Non-Patent Citations (2)
Title |
---|
None |
See also references of EP2259253A4 |
Also Published As
Publication number | Publication date |
---|---|
BRPI0910285A2 (pt) | 2015-09-29 |
AU2009220321B2 (en) | 2011-09-22 |
KR20100134576A (ko) | 2010-12-23 |
AU2009220321A1 (en) | 2009-09-11 |
RU2010140362A (ru) | 2012-04-10 |
EP2259253B1 (en) | 2017-11-15 |
CA2716817A1 (en) | 2009-09-11 |
JP5266341B2 (ja) | 2013-08-21 |
US20100070284A1 (en) | 2010-03-18 |
MX2010009571A (es) | 2011-05-30 |
EP2259253A2 (en) | 2010-12-08 |
JP2011513788A (ja) | 2011-04-28 |
CN101965612A (zh) | 2011-02-02 |
EP2259253A4 (en) | 2013-02-20 |
CN101965612B (zh) | 2012-08-29 |
RU2455709C2 (ru) | 2012-07-10 |
WO2009110738A3 (ko) | 2009-10-29 |
KR101221919B1 (ko) | 2013-01-15 |
CA2716817C (en) | 2014-04-22 |
BRPI0910285B1 (pt) | 2020-05-12 |
US7991621B2 (en) | 2011-08-02 |
Similar Documents
Publication | Title |
---|---|
WO2009110738A2 (ko) | Method and apparatus for processing an audio signal |
WO2009110751A2 (ko) | Method and apparatus for processing an audio signal |
KR101779426B1 (ko) | Method and apparatus for processing an audio signal |
KR100304055B1 (ko) | Method for signaling noise substitution during audio signal coding |
KR100647336B1 (ko) | Apparatus and method for adaptive time/frequency-based audio encoding/decoding |
CN1926608B (zh) | Multi-channel signal processing apparatus and method |
CN102150202A (zh) | Method and apparatus for encoding and decoding an audio/speech signal |
KR100707177B1 (ko) | Method and apparatus for encoding/decoding a digital signal |
JP2010510540A (ja) | Method and apparatus for encoding and/or decoding audio and/or speech signals |
JP2009539132A (ja) | Linear predictive coding of an audio signal |
JP4281131B2 (ja) | Signal encoding device and method, and signal decoding device and method |
JP3348759B2 (ja) | Transform coding method and transform decoding method |
JP4618823B2 (ja) | Signal encoding device and method |
RU2773022C2 (ru) | Time-domain stereo encoding and decoding method and related product |
EP2720223A2 (en) | Audio signal processing method, audio encoding apparatus, audio decoding apparatus, and terminal adopting the same | |
KR20100054749A (ko) | Signal processing method and apparatus therefor |
WO2010058931A2 (en) | A method and an apparatus for processing a signal |
Legal Events
Code | Title | Description |
---|---|---|
WWE | Wipo information: entry into national phase | Ref document number: 200980107543.0; Country of ref document: CN |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 09716372; Country of ref document: EP; Kind code of ref document: A2 |
WWE | Wipo information: entry into national phase | Ref document number: 3130/KOLNP/2010; Country of ref document: IN |
WWE | Wipo information: entry into national phase | Ref document number: 2716817; Country of ref document: CA |
WWE | Wipo information: entry into national phase | Ref document number: MX/A/2010/009571; Country of ref document: MX |
ENP | Entry into the national phase | Ref document number: 20107019538; Country of ref document: KR; Kind code of ref document: A |
WWE | Wipo information: entry into national phase | Ref document number: 2010549570; Country of ref document: JP |
NENP | Non-entry into the national phase | Ref country code: DE |
REEP | Request for entry into the european phase | Ref document number: 2009716372; Country of ref document: EP |
WWE | Wipo information: entry into national phase | Ref document number: 2009716372; Country of ref document: EP |
WWE | Wipo information: entry into national phase | Ref document number: 2009220321; Country of ref document: AU |
WWE | Wipo information: entry into national phase | Ref document number: 2010140362; Country of ref document: RU |
ENP | Entry into the national phase | Ref document number: 2009220321; Country of ref document: AU; Date of ref document: 20090303; Kind code of ref document: A |
ENP | Entry into the national phase | Ref document number: PI0910285; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20100901 |