WO2020223797A1 - Methods and devices for detecting an attack in a sound signal to be coded and for coding the detected attack - Google Patents
- Publication number
- WO2020223797A1 (application PCT/CA2020/050582)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- attack
- frame
- stage
- sub
- current frame
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/18—Vocoders using multiple modes
- G10L19/22—Mode decision, i.e. based on audio signal content versus external parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/022—Blocking, i.e. grouping of samples in time; Choice of analysis windows; Overlap factoring
- G10L19/025—Detection of transients or attacks for time/frequency resolution switching
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/21—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/93—Discriminating between voiced and unvoiced parts of speech signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
- G10L19/032—Quantisation or dequantisation of spectral components
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/12—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L2019/0001—Codebooks
- G10L2019/0002—Codebook adaptations
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/93—Discriminating between voiced and unvoiced parts of speech signals
- G10L2025/935—Mixed voiced class; Transitions
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/93—Discriminating between voiced and unvoiced parts of speech signals
- G10L2025/937—Signal energy in various frequency bands
Definitions
- the present disclosure relates to a technique for coding a sound signal, for example speech or an audio signal, in view of transmitting and synthesizing this sound signal.
- the present disclosure relates to methods and devices for detecting an attack in a sound signal to be coded, for example speech or an audio signal, and for coding the detected attack.
- the term “attack” refers to a low-to-high energy change of a signal, for example voiced onsets (transitions from an unvoiced speech segment to a voiced speech segment), other sound onsets, transitions, plosives, etc., generally characterized by an abrupt energy increase within a sound signal segment.
- the term “onset” refers to the beginning of a significant sound event, for example speech, a musical note, or other sound.
- the term “plosive” refers, in phonetics, to a consonant in which the vocal tract is blocked so that all airflow ceases.
- coding of the detected attack refers to the coding of a sound signal segment whose length is generally a few milliseconds after the beginning of the attack.
- a speech encoder converts a speech signal into a digital bit stream which is transmitted over a communication channel or stored in a storage medium.
- the speech signal is digitized, that is, sampled and quantized, usually with 16 bits per sample.
- the speech encoder has the role of representing these digital samples with a smaller number of bits while maintaining a good subjective speech quality.
- a speech decoder or synthesizer operates on the transmitted or stored digital bit stream and converts it back to a speech signal.
- CELP: Code-Excited Linear Prediction.
- M: the number of speech samples in a frame, typically corresponding to 10-30 ms.
- an LP (Linear Prediction) filter is calculated and transmitted every frame. The calculation of the LP filter typically needs a lookahead, for example a 5-15 ms speech segment from the subsequent frame.
- each M-sample frame is divided into smaller blocks called sub-frames.
- an excitation is usually obtained from two components, a past excitation contribution and an innovative, fixed codebook excitation contribution.
- the past excitation contribution is often referred to as the pitch or adaptive codebook excitation contribution.
- the parameters characterizing the excitation are coded and transmitted to the decoder, where the excitation is reconstructed and supplied as input to an LP synthesis filter.
- CELP-based speech codecs rely heavily on prediction to achieve their high performance.
- Such prediction can be of different types but usually comprises the use of an adaptive codebook storing an adaptive codebook excitation contribution selected from previous frames.
- a CELP encoder exploits the quasi-periodicity of voiced speech by searching the past adaptive codebook excitation contribution for the segment most similar to the segment currently being coded. The same past adaptive codebook excitation contribution is also stored in the decoder. It is then sufficient for the encoder to send a pitch delay and a pitch gain for the decoder to reconstruct the same adaptive codebook excitation contribution as used in the encoder.
- the evolution (difference) between the previous speech segment and the currently coded speech segment is further modeled using a fixed codebook excitation contribution selected from a fixed codebook.
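- as a minimal sketch of the adaptive codebook search described above, the following Python function scans candidate pitch delays in a buffer of past excitation and returns the delay and gain that best match the current segment. The function name, the lag range and the absence of any weighted synthesis filtering are simplifying assumptions made for illustration; a real CELP encoder performs this search in the perceptually weighted domain.

```python
def adaptive_codebook_search(past_exc, target, min_lag, max_lag):
    """Return (lag, gain) for the past-excitation segment closest to target.

    Hypothetical helper: compares raw samples only, with no LP or
    perceptual weighting filter. Requires max_lag <= len(past_exc).
    """
    n = len(target)
    best_lag, best_gain, best_score = None, 0.0, -1.0
    for lag in range(min_lag, max_lag + 1):
        # Candidate starts `lag` samples back; it is repeated periodically
        # when the lag is shorter than the segment being coded.
        cand = [past_exc[len(past_exc) - lag + (i % lag)] for i in range(n)]
        corr = sum(c * t for c, t in zip(cand, target))
        energy = sum(c * c for c in cand)
        if energy <= 0.0 or corr <= 0.0:
            continue
        # Maximizing corr^2 / energy minimizes the squared error
        # obtained with the optimal gain corr / energy.
        score = corr * corr / energy
        if score > best_score:
            best_lag, best_gain, best_score = lag, corr / energy, score
    return best_lag, best_gain
```

With the pair (lag, gain) the decoder, which holds the same past excitation, can rebuild the same contribution without receiving the samples themselves.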
- a problem related to prediction inherent to CELP-based speech codecs appears in the presence of transmission errors (erased frames or packets) when the state of the encoder and the state of the decoder become desynchronized. Due to prediction, the effect of an erased frame is not limited to the erased frame, but continues to propagate after the frame erasure, often during several following frames. Naturally, the perceptual impact can be very annoying. Attacks such as transitions from an unvoiced speech segment to a voiced speech segment (for example transitions between a consonant or a period of inactive speech, and a vowel) or transitions between two different voiced segments (for example transitions between two vowels) are amongst the most problematic cases for frame erasure concealment.
- the periodic part (adaptive codebook excitation contribution) of the excitation is thus completely missing in the adaptive codebook at the decoder after a lost voiced onset and it can take up to several frames for the decoder to recover from this loss.
- a similar situation occurs in the case of a lost voiced-to-voiced transition.
- the excitation contribution stored in the adaptive codebook before the transition frame typically has very different characteristics from the excitation contribution stored in the adaptive codebook after the transition.
- because the decoder usually conceals the lost frame using past frame information, the state of the encoder and the state of the decoder will be very different, and the synthesized signal can suffer from significant distortion.
- a solution to this problem was introduced in Reference [2] where, in a frame following the transition frame, the inter-frame dependent adaptive codebook is replaced by a non-predictive glottal-shape codebook.
- a second issue is related to the gain quantizers, often designed as vector quantizers using a limited bit-budget, which are usually not able to adequately react to an abrupt energy increase within a frame. The closer this abrupt energy increase occurs to the end of a frame, the more critical this second issue is.
- the present disclosure relates to a method for detecting an attack in a sound signal to be coded wherein the sound signal is processed in successive frames each including a number of sub-frames.
- the method comprises a first-stage attack detection for detecting the attack in a last sub-frame of a current frame, and a second-stage attack detection for detecting the attack in one of the sub-frames of the current frame, including the sub-frames preceding the last sub-frame.
- the present disclosure also relates to a method for coding an attack in a sound signal, comprising the above-defined attack detecting method.
- the coding method comprises encoding the sub-frame comprising the detected attack using a coding mode with a non-predictive codebook.
- the present disclosure is concerned with a device for detecting an attack in a sound signal to be coded wherein the sound signal is processed in successive frames each including a number of sub-frames.
- the device comprises a first-stage attack detector for detecting the attack in a last sub-frame of a current frame, and a second-stage attack detector for detecting the attack in one of the sub-frames of the current frame, including the sub-frames preceding the last sub-frame.
- the present disclosure is further concerned with a device for coding an attack in a sound signal, comprising the above-defined attack detecting device and an encoder of the sub-frame comprising the detected attack using a coding mode with a non-predictive codebook.
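- as a minimal sketch of the two-stage structure summarized above, the following Python code first looks for an attack in the last sub-frame of the current frame and, if none is found, examines every sub-frame of the frame. The energy measure, the comparison against a long-term energy, the threshold `ratio` and all function names are hypothetical choices made for illustration; this section does not state the actual detection criteria.

```python
def subframe_energies(frame, n_sub):
    """Energy (sum of squares) of each of n_sub equal-length sub-frames."""
    size = len(frame) // n_sub
    return [sum(x * x for x in frame[i * size:(i + 1) * size])
            for i in range(n_sub)]


def detect_attack(frame, long_term_energy, n_sub=4, ratio=8.0):
    """Return the index of the sub-frame holding an attack, or None.

    Illustrative assumptions: `ratio` and both energy comparisons are
    placeholders, not values taken from the disclosure.
    """
    eps = 1e-9
    e = subframe_energies(frame, n_sub)

    # First stage: attack in the last sub-frame, judged against the
    # mean energy of the preceding sub-frames of the same frame.
    mean_prev = sum(e[:-1]) / (n_sub - 1)
    if e[-1] > ratio * (mean_prev + eps):
        return n_sub - 1

    # Second stage: attack in any sub-frame, including those preceding
    # the last one, judged against a long-term energy estimate.
    for i, energy in enumerate(e):
        if energy > ratio * (long_term_energy + eps):
            return i
    return None
```

The returned index identifies the sub-frame that would be coded with the non-predictive codebook.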
- Figure 1 is a schematic block diagram of a sound processing and communication system depicting a possible context of implementation of the methods and devices for detecting an attack in a sound signal to be coded and for coding the detected attack;
- Figure 2 is a schematic block diagram illustrating the structure of a CELP-based encoder and decoder, forming part of the sound processing and communication system of Figure 1;
- Figure 3 is a block diagram illustrating concurrently the operations of an EVS (Enhanced Voice Services) coding mode classifying method and the modules of an EVS coding mode classifier;
- Figure 4 is a block diagram illustrating concurrently the operations of a method for detecting an attack in a sound signal to be coded and the modules of an attack detector for implementing the method;
- Figure 5 is a graph of a first non-restrictive, illustrative example showing the impact of the attack detector of Figure 4 and a TC (Transition Coding) coding mode on the quality of a decoded speech signal, wherein curve a) represents an input speech signal, curve b) represents a reference speech signal synthesis, and curve c) represents the improved speech signal synthesis when the attack detector of Figure 4 and the TC coding mode are used for processing an onset frame;
- Figure 6 is a graph of a second non-restrictive, illustrative example showing the impact of the attack detector of Figure 4 and TC coding mode on the quality of a decoded speech signal, wherein curve a) represents an input speech signal, curve b) represents a reference speech signal synthesis, and curve c) represents the improved speech signal synthesis when the attack detector of Figure 4 and the TC coding mode are used for processing an onset frame; and
- Figure 7 is a simplified block diagram of an example configuration of hardware components for implementing the methods and devices for detecting an attack in a sound signal to be coded and for coding the detected attack.
- the following description is concerned with detecting an attack in a sound signal, for example speech or an audio signal, and forcing a Transition Coding (TC) mode in sub-frames where an attack is detected.
- the detection of an attack may also be used for selecting a sub-frame in which a glottal-shape codebook, as part of the TC coding mode, is employed in the place of an adaptive codebook.
- a detection algorithm detects an attack in the last sub-frame of a current frame
- a glottal-shape codebook of the TC coding mode is used in this last sub-frame.
- the detection algorithm is complemented with a second-stage logic to not only detect a larger number of frames including an attack but also, upon coding of such frames, to force the use of the TC coding mode and corresponding glottal-shape codebook in all sub-frames in which an attack is detected.
- the above technique improves coding efficiency of not only attacks detected in a sound signal to be coded but, also, of certain music segments (e.g. castanets). More generally, coding quality is improved.
- Figure 1 is a schematic block diagram of a sound processing and communication system 100 depicting a possible context of implementation of the methods and devices for detecting an attack in a sound signal to be coded and for coding the detected attack as disclosed in the following description.
- the sound processing and communication system 100 of Figure 1 supports transmission of a sound signal across a communication channel 101.
- the communication channel 101 may comprise, for example, a wire or an optical fiber link.
- the communication channel 101 may comprise at least in part a radio frequency link.
- the radio frequency link often supports multiple, simultaneous communications requiring shared bandwidth resources such as may be found with cellular telephony.
- the communication channel 101 may be replaced by a storage device in a single device implementation of the system 100 that records and stores the encoded sound signal for later playback.
- a microphone 102 produces an original analog sound signal 103.
- the sound signal 103 may comprise, in particular but not exclusively, speech and/or audio.
- the analog sound signal 103 is supplied to an analog-to-digital (A/D) converter 104 for converting it into an original digital sound signal 105.
- the original digital sound signal 105 may also be recorded and supplied from a storage device (not shown).
- a sound encoder 106 encodes the digital sound signal 105, thereby producing a set of encoding parameters that are multiplexed in the form of a bit stream 107 delivered to an optional error-correcting channel encoder 108.
- the optional error-correcting channel encoder 108, when present, adds redundancy to the binary representation of the encoding parameters in the bit stream 107 before transmitting the resulting bit stream 111 over the communication channel 101.
- an optional error-correcting channel decoder 109 utilizes the above-mentioned redundant information in the received digital bit stream 111 to detect and correct errors that may have occurred during transmission over the communication channel 101, producing an error-corrected bit stream 112 with received encoding parameters.
- a sound decoder 110 converts the received encoding parameters in the bit stream 112 for creating a synthesized digital sound signal 113.
- the digital sound signal 113 reconstructed in the sound decoder 110 is converted to a synthesized analog sound signal 114 in a digital-to-analog (D/A) converter 115.
- the synthesized analog sound signal 114 is played back in a loudspeaker unit 116 (the loudspeaker unit 116 can obviously be replaced by a headphone).
- the digital sound signal 113 from the sound decoder 110 may also be supplied to and recorded in a storage device (not shown).
- the methods and devices for detecting an attack in a sound signal to be coded and for coding the detected attack can be implemented in the sound encoder 106 and decoder 110 of Figure 1. It should be noted that the sound processing and communication system 100 of Figure 1, along with the methods and devices for detecting an attack in a sound signal to be coded and for coding the detected attack, can be extended to cover the case of stereophony where the input of the encoder 106 and the output of the decoder 110 consist of left and right channels of a stereo sound signal.
- the sound processing and communication system 100 of Figure 1, along with the methods and devices for detecting an attack in a sound signal to be coded and for coding the detected attack, can be further extended to cover the case of multi-channel and/or scene-based audio and/or independent streams encoding and decoding (e.g. surround and high-order ambisonics).
- Figure 2 is a schematic block diagram illustrating the structure of a CELP-based encoder and decoder which, according to the illustrative embodiments, is part of the sound processing and communication system 100 of Figure 1.
- a sound codec comprises two basic parts: the sound encoder 106 and the sound decoder 110, both introduced in the foregoing description of Figure 1.
- the encoder 106 is supplied with the original digital sound signal 105 and determines the encoding parameters 107, described herein below, representing the original analog sound signal 103. These parameters 107 are encoded into the digital bit stream 111.
- the bit stream 111 is transmitted using a communication channel, for example the communication channel 101 of Figure 1, to the decoder 110.
- the sound decoder 110 reconstructs the synthesized digital sound signal 113 to be as similar as possible to the original digital sound signal 105.
- the most widespread speech coding techniques are based on Linear Prediction (LP), in particular CELP.
- in LP-based coding, the synthesized digital sound signal 230 (Figure 2) is produced by filtering an excitation 214 through an LP synthesis filter 216 having a transfer function 1/A(z).
- an example of a procedure to find the filter parameters A(z) of the LP filter can be found in Reference [4].
- the excitation 214 is typically composed of two parts: a first-stage, adaptive-codebook contribution 222 produced by selecting a past excitation signal v(n) from an adaptive codebook 218 in response to an index t (pitch lag) and by amplifying the past excitation signal v(n) by an adaptive-codebook gain g_p 226, and a second-stage, fixed-codebook contribution 224 produced by selecting an innovative codevector c_k(n) from a fixed codebook 220 in response to an index k and by amplifying the innovative codevector c_k(n) by a fixed-codebook gain g_c 228.
- the adaptive codebook contribution 222 models the periodic part of the excitation and the fixed codebook excitation contribution 224 is added to model the evolution of the sound signal.
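- as a minimal sketch of the decoder-side synthesis described above, the following Python code forms the excitation as the sum of the two gain-scaled contributions and passes it through the LP synthesis filter 1/A(z). The function names and the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p are assumptions made for illustration; codecs differ in how they store the LP coefficients.

```python
def build_excitation(v, c, g_p, g_c):
    """Excitation = g_p * v(n) + g_c * c_k(n), sample by sample."""
    return [g_p * vi + g_c * ci for vi, ci in zip(v, c)]


def lp_synthesis(excitation, a, memory=None):
    """Filter the excitation through 1/A(z).

    Assumed convention: A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p, so
    s(n) = e(n) - sum_i a[i] * s(n - 1 - i). `memory` optionally holds
    the last p output samples of the previous sub-frame, newest first.
    """
    p = len(a)
    hist = list(memory) if memory is not None else [0.0] * p
    out = []
    for e in excitation:
        s = e - sum(a[i] * hist[i] for i in range(p))
        out.append(s)
        hist = ([s] + hist)[:p]  # keep the p most recent outputs
    return out
```

For example, with the single coefficient a = [-0.5] an impulse excitation decays as 1, 0.5, 0.25, which shows how the filter memory carries energy from one sample to the next.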
- the sound signal is processed by frames of typically 20 ms and the filter parameters A(z) of the LP filter are transmitted from the encoder 106 to the decoder 110 once per frame.
- the frame is further divided into several sub-frames to encode the excitation.
- the sub-frame length is typically 5 ms.
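- as a minimal sketch of this framing, the following Python function cuts a sample buffer into 20 ms frames of four 5 ms sub-frames each. The sampling rate of 12800 Hz and the function name are assumptions made for illustration; this section does not specify the sampling rate.

```python
def split_into_subframes(signal, sample_rate_hz=12800,
                         frame_ms=20, subframe_ms=5):
    """Return a list of frames, each a list of sub-frames (sample lists).

    The default sampling rate is an illustrative assumption. Trailing
    samples that do not fill a whole frame are dropped.
    """
    frame_len = sample_rate_hz * frame_ms // 1000
    sub_len = sample_rate_hz * subframe_ms // 1000
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        frames.append([frame[i:i + sub_len]
                       for i in range(0, frame_len, sub_len)])
    return frames
```

Under these assumptions a frame holds 256 samples and each sub-frame holds 64.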
- CELP uses a principle called Analysis-by-Synthesis where possible decoder outputs are tried (synthesized) already during the coding process at the encoder 106 and then compared to the original digital sound signal 105.
- the encoder 106 thus includes elements similar to those of the decoder 110.
- these elements include an adaptive codebook excitation contribution 250 (corresponding to the adaptive-codebook contribution 222 at the decoder 110) selected in response to the index t (pitch lag) from an adaptive codebook 242 (corresponding to the adaptive codebook 218 at the decoder 110) that supplies a past excitation signal v(n) convolved with the impulse response of a weighted synthesis filter H(z) 238 (cascade of the LP synthesis filter 1/A(z) and a perceptual weighting filter W(z)), the output y_1(n) of which is amplified by an adaptive-codebook gain g_p 240 (corresponding to the adaptive-codebook gain 226 at the decoder 110).
- these elements also include a fixed codebook excitation contribution 252 (corresponding to the fixed-codebook contribution 224 at the decoder 110) selected in response to the index k from a fixed codebook 244 (corresponding to the fixed codebook 220 at the decoder 110) that supplies an innovative codevector c_k(n) convolved with the impulse response of the weighted synthesis filter H(z) 246, the output y_2(n) of which is amplified by a fixed codebook gain g_c 248 (corresponding to the fixed-codebook gain 228 at the decoder 110).
- the encoder 106 comprises the perceptual weighting filter W(z) 233 and a calculator 234 of a zero-input response of the cascade (H(z)) of the LP synthesis filter 1/A(z) and the perceptual weighting filter W(z).
- subtractors 236, 254 and 256 respectively subtract the zero-input response from calculator 234, the adaptive codebook contribution 250 and the fixed codebook contribution 252 from the original digital sound signal 105 filtered by the perceptual weighting filter 233 to provide an error signal used to calculate a mean-squared error 232 between the original digital sound signal 105 and the synthesized digital sound signal 113 (Figure 1).
- minimization of the mean-squared error 232 provides the best candidate past excitation signal v(n) (identified by the index t) and innovative codevector c_k(n) (identified by the index k) for coding the digital sound signal 105.
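- as a minimal sketch of the analysis-by-synthesis principle described above, the following Python code tries every codevector of a small codebook, filters it with the impulse response of the weighted synthesis filter, computes the optimal gain, and keeps the index with the smallest squared error. The exhaustive search, the function names and the toy codebook layout are assumptions made for illustration; practical fixed codebooks are searched with much faster structured methods.

```python
def convolve_truncated(x, h):
    """Convolve x with impulse response h, truncated to len(x) samples."""
    y = []
    for i in range(len(x)):
        acc = 0.0
        for j in range(i + 1):
            if i - j < len(h):
                acc += x[j] * h[i - j]
        y.append(acc)
    return y


def search_codebook(target, codebook, h):
    """Return (index k, gain, squared error) of the best codevector.

    `target` is assumed to be the weighted input signal after removal
    of the zero-input response and of the adaptive contribution.
    """
    best = (None, 0.0, float("inf"))
    for k, code in enumerate(codebook):
        y = convolve_truncated(code, h)      # synthesized candidate
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        corr = sum(t * v for t, v in zip(target, y))
        gain = corr / energy                 # optimal gain for this k
        err = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if err < best[2]:
            best = (k, gain, err)
    return best
```

Only the winning index and its quantized gain need to be transmitted, because the decoder holds the same codebook.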
- the perceptual weighting filter W(z) exploits the frequency masking effect and typically is derived from the LP filter A(z).
- an example of a perceptual weighting filter W(z) for WB (wideband, bandwidth of typically 50-7000 Hz) signals can be found in Reference [4].
- the digital bit stream 111 transmitted from the encoder 106 to the decoder 110 typically contains the following parameters 107: quantized parameters of the LP filter A(z), index t of the adaptive codebook 242 and index k of the fixed codebook 244, and the gains g_p 240 and g_c 248 of the adaptive codebook 242 and of the fixed codebook 244.
- the received gain g_p is used as adaptive-codebook gain 226.
- the received gain g_c is used as fixed-codebook gain 228.
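- as a minimal sketch of how the transmitted parameters listed above could be grouped, the following Python data classes hold the LP filter parameters once per frame and the excitation parameters once per sub-frame. The class and field names are hypothetical, and the actual bit-stream layout and quantization are not described in this section.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubframeParams:
    """Excitation parameters of one sub-frame (hypothetical container)."""
    pitch_lag: int      # index t into the adaptive codebook
    fixed_index: int    # index k into the fixed codebook
    g_p: float          # adaptive-codebook gain
    g_c: float          # fixed-codebook gain


@dataclass
class FrameParams:
    """Parameters of one frame: LP filter plus per-sub-frame excitation."""
    lp_coeffs: List[float]                      # quantized A(z) parameters
    subframes: List[SubframeParams] = field(default_factory=list)
```

Keeping the LP parameters at frame level and the indices and gains at sub-frame level mirrors the once-per-frame and per-sub-frame update rates given in the text.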
- the LP-based core of the EVS codec as described in Reference [4] uses a signal classification algorithm and six (6) distinct coding modes tailored for each category of signal, namely the Inactive Coding (IC) mode, Unvoiced Coding (UC) mode, Transition Coding (TC) mode, Voiced Coding (VC) mode, Generic Coding (GC) mode, and Audio Coding (AC) mode (not shown).
- Figure 3 is a simplified high-level block diagram illustrating concurrently the operations of an EVS coding mode classifying method 300 and the modules of an EVS coding mode classifier 320.
- the coding mode classifying method 300 comprises an active frame detection operation 301, an unvoiced frame detection operation 302, a frame after onset detection operation 303 and a stable voiced frame detection operation 304.
- an active frame detector 311 determines whether the current frame is active or inactive. For that purpose, sound activity detection (SAD) or voice activity detection (VAD) can be used. If an inactive frame is detected, the IC coding mode 321 is selected and the procedure is terminated.
- the unvoiced frame detection operation 302 is performed using an unvoiced frame detector 312. Specifically, if an unvoiced frame is detected, the unvoiced frame detector 312 selects, to code the detected unvoiced frame, the UC coding mode 322.
- the UC coding mode is designed to code unvoiced frames. In the UC coding mode, the adaptive codebook is not used and the excitation is composed of two vectors selected from a linear Gaussian codebook. Alternatively, the coding mode in UC may be composed of a fixed algebraic codebook and a Gaussian codebook.
- the frame after onset detection operation 303 and a corresponding frame after onset detector 313, and the stable voiced frame detection operation 304 and a corresponding stable voiced frame detector 314 are used.
- the detector 313 detects voiced frames following voiced onsets and selects the TC coding mode 323 to code these frames.
- the TC coding mode 323 is designed to enhance the codec performance in the presence of frame erasures by limiting the usage of past information (adaptive codebook). To minimize at the same time the impact of the TC coding mode 323 on a clean channel performance (without frame erasures), mode 323 is used only on the most critical frames from a frame erasure point of view. These most critical frames are voiced frames following voiced onsets.
- the stable voiced frame detection operation 304 is performed.
- the stable voiced frame detector 314 is designed to detect quasi-periodic stable voiced frames. If the current frame is detected as a quasi-periodic stable voiced frame, the detector 314 selects the VC coding mode 324 to encode the stable voiced frame.
- the selection of the VC coding mode by the detector 314 is conditioned by a smooth pitch evolution. The VC coding mode uses Algebraic Code-Excited Linear Prediction (ACELP) technology, but given that the pitch evolution is smooth throughout the frame, more bits are assigned to the fixed (algebraic) codebook than in the GC coding mode.
- ACELP Algebraic Code-Excited Linear Prediction
- the detector 314 selects, for encoding such frame, the GC coding mode 325, for example a generic ACELP coding mode.
- a speech/music classification algorithm (not shown) of the EVS Standard is run to decide whether the current frame shall be coded using the AC mode.
- the AC mode has been designed to efficiently code generic audio signals, in particular but not exclusively music.
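The decision cascade described above (inactive, unvoiced, frame after onset, stable voiced, otherwise generic) can be sketched as follows. This is a minimal illustration, not the EVS implementation: the `FrameFlags` structure and the function name are hypothetical, the detectors themselves are assumed to have already produced boolean outputs, and the speech/music decision leading to the AC mode is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class FrameFlags:
    """Hypothetical per-frame detector outputs (names are illustrative only)."""
    active: bool         # result of the active frame detector 311
    unvoiced: bool       # result of the unvoiced frame detector 312
    after_onset: bool    # result of the frame after onset detector 313
    stable_voiced: bool  # result of the stable voiced frame detector 314


def select_coding_mode(flags: FrameFlags) -> str:
    """Return a coding mode label by testing the detectors in order."""
    if not flags.active:
        return "IC"  # inactive frame: procedure terminates here
    if flags.unvoiced:
        return "UC"
    if flags.after_onset:
        return "TC"
    if flags.stable_voiced:
        return "VC"
    return "GC"      # fallback: generic coding mode
```

The order of the tests matters: a frame that is both inactive and unvoiced is labelled IC, because the cascade stops at the first detector that fires.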
- UNVOICED TRANSITION class comprises unvoiced frames with a possible voiced onset at the end of the frame.
- VOICED TRANSITION class comprises voiced frames with relatively weak voiced characteristics.
- VOICED class comprises voiced frames with stable characteristics.
- ONSET class comprises all voiced frames with stable characteristics following a frame classified as UNVOICED class or UNVOICED TRANSITION class.
- the TC coding mode was introduced to be used in frames following a transition for helping to stop error propagation in case a transition frame is lost (Reference [4]).
- the TC coding mode can be used in transition frames to increase coding efficiency.
- the adaptive codebook usually contains a noise-like signal not very useful or efficient for coding the beginning of a voiced segment. The goal is to supplement the adaptive codebook with a better, non-predictive codebook populated with simplified quantized versions of glottal impulse shapes to encode the voiced onsets.
- the glottal-shape codebook is used only in one sub-frame containing the first glottal impulse within the frame, more precisely in the sub-frame where the LP residual signal (s_w(n) in Figure 2) has its maximum energy within the first pitch period of the frame. Further explanations on the TC coding mode of Figure 3 can be found, for example, in Reference [4].
- the present disclosure proposes to further extend the EVS concept of coding voiced onsets using the glottal-shape codebook of the TC coding mode.
- bit-budget number of available bits
- a difference with the TC coding mode of EVS as described in Reference [4] is that the glottal-shape codebook is usually used in the last sub-frame(s) within the frame, independently of the real maximum energy of the LP residual signal within the first pitch period of the frame.
- the waveform of the sound signal at the beginning of the frame might not be well modeled, especially at low bit-rates where the fixed codebook is formed of, for example, one or two pulses per sub-frame only.
- the human ear sensitivity is exploited here. The human ear is not very sensitive to an inaccurate coding of a sound signal before an attack, but is much more sensitive to any imperfection in coding a sound signal segment, for example a voiced segment, after such an attack.
- the adaptive codebook in subsequent sound signal frames is more efficient because it benefits from the past excitation corresponding to the attack segment that is well modeled. The subjective quality is consequently improved.
- the present disclosure proposes a method for detecting an attack and a corresponding attack detector which operates on frames to be coded with the GC coding mode to determine if these frames should be encoded with the TC coding mode. Specifically, when an attack is detected, these frames are coded using the TC coding mode. Thus, the relative number of frames coded using the TC coding mode increases. Moreover, as the TC coding mode does not use the past excitation, the intrinsic robustness of the codec against frame erasures is increased with this approach.
- Figure 4 is a block diagram illustrating concurrently the operations of an attack detecting method 400 and the modules of an attack detector 450.
- the attack detecting method 400 and attack detector 450 properly select frames to be coded using the TC coding mode.
- the codec in this illustrative example is a CELP codec with an internal sampling rate of 12.8 kHz and with a frame having a length of 20 ms and composed of four (4) sub-frames.
- An example of such a codec is the EVS codec (Reference [4]) at lower bit-rates (≤ 13.2 kbps).
- An application to other types of codecs, with different internal bit-rates, frame lengths and numbers of sub-frames can also be contemplated.
- the detection of attacks starts with a preprocessing where energies in several segments of the input sound signal in the current frame are calculated, followed by a detection performed sequentially in two stages and by a final decision.
- the first-stage detection is based on comparing calculated energies in the current frame while the second-stage detection takes into account also past frame energy values.
- K is the length in samples of the analysis sound signal segment
- i is the index of the segment
- N/K is the total number of segments.
- segments i = 8, ..., 15 to the second sub-frame
- segments i = 16, ..., 23 to the third sub-frame
- segments i = 24, ..., 31 to the last (fourth) sub-frame of the current frame.
- the segments are consecutive.
- partially overlapping segments can be employed.
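As an illustration of the preprocessing step, the sketch below splits one frame into consecutive analysis segments and computes one energy value per segment. It assumes the energy is a plain sum of squared samples, which is an assumption since the corresponding equation is not reproduced in this text. With a 20 ms frame at 12.8 kHz (N = 256 samples) and 32 segments, the segment length is K = 8 samples. The function name is hypothetical.

```python
def segment_energies(frame, seg_len=8):
    """Return the energy of each consecutive, non-overlapping segment.

    frame   : sequence of N samples of the current frame
    seg_len : K, the segment length in samples (8 gives 32 segments for N = 256)

    ASSUMPTION: energy is the unnormalized sum of squared samples.
    """
    if len(frame) % seg_len != 0:
        raise ValueError("frame length must be a multiple of the segment length")
    return [
        sum(x * x for x in frame[start:start + seg_len])
        for start in range(0, len(frame), seg_len)
    ]
```

For example, a 256-sample frame that is silent except for a single sample of amplitude 2.0 at position 100 yields 32 energies, all zero apart from the one at segment index 12 (100 // 8).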
- a maximum energy segment finder 452 finds the segment i with maximum energy.
- the finder 452 may use, for example, the following Equation (2):
- the segment with maximum energy represents the position of a candidate attack which is validated in the following two stages (hereinafter first-stage and second-stage).
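Finding the candidate attack amounts to taking the index of the largest segment energy. A minimal sketch follows; the function name is hypothetical, and returning the earliest index on ties is an assumption of this example rather than something stated in the text.

```python
def find_candidate_attack(seg_energy):
    """Return the index of the segment with maximum energy.

    On ties, the earliest index is returned (max() keeps the first maximum).
    """
    if not seg_energy:
        raise ValueError("at least one segment energy is required")
    return max(range(len(seg_energy)), key=lambda i: seg_energy[i])
```

The returned index is only a candidate position; the two detection stages decide whether it is kept.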
- VAD Active Frame
- first-stage and second-stage attack detection. Further explanations on VAD (Voice Activity Detection) can be found, for example, in Reference [4].
- Both speech and music frames can be classified in the GC coding mode and, therefore, attack detection is applied in coding not only speech signals but general sound signals.
- the first-stage attack detection operation 404 comprises an average energy calculating operation 405.
- the calculator 455 calculates an average energy across the analysis segments starting with segment I_att to the last segment of the current frame, using as an example the following Equation (4):
- the first-stage attack detection operation 404 further comprises a comparison operation 406.
- the first-stage attack detector 454 comprises a comparator 456 for comparing the ratio of the average energy E_1 from Equation (3) and the average energy E_2 from Equation (4) to a threshold depending on the signal classification of the previous frame, denoted as “last_class”, performed by the above discussed frame classification for Frame Error Concealment (FEC) (Reference [4]).
- the comparator 456 determines an attack position from the first-stage attack detection, I_att1, using as a non-limitative example, the following logic of Equation (5):
- I_att1 = 0
- no attack is detected.
- Using the logic of Equation (5), all attacks that are not sufficiently strong are eliminated.
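The ratio test of the first stage can be sketched as below. Several points are assumptions, because Equations (3) to (5) are not reproduced in this text: E_1 is taken as the average energy over the first `n_ref` segments of the frame, the test is taken as E_2 / E_1 > threshold, and the threshold is passed in by the caller (who would choose it according to `last_class`). The function and parameter names are hypothetical.

```python
def first_stage_ratio_test(seg_energy, i_att, n_ref, threshold):
    """Return i_att if the candidate passes the energy ratio test, else 0.

    seg_energy : list of segment energies of the current frame
    i_att      : index of the candidate attack (segment with maximum energy)
    n_ref      : ASSUMPTION, number of leading segments averaged to obtain E_1
    threshold  : ASSUMPTION, caller-supplied value depending on last_class
    """
    # E_1: average energy over the leading reference segments (assumed span).
    e1 = sum(seg_energy[:n_ref]) / n_ref
    # E_2: average energy from the candidate segment to the end of the frame.
    e2 = sum(seg_energy[i_att:]) / (len(seg_energy) - i_att)
    # Written as a product to avoid dividing by zero when E_1 is 0.
    return i_att if e2 > threshold * e1 else 0
```

A return value of 0 follows the convention of the text that a zero position means no attack is detected.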
- the first-stage attack detection operation 404 further comprises a segment energy comparison operation 407.
- the first-stage attack detector 454 comprises a segment energy comparator 457 for comparing the segment with maximum energy E_seg(I_att) with the energy E_seg(i) of the other analysis segments of the current frame.
- threshold β_3 is determined experimentally so as to reduce as much as possible falsely detected attacks without impeding the efficiency of detection of true attacks.
- the second-stage attack detection operation 410 comprises a voiced class comparison operation 411.
- the second-stage attack detector 460 comprises a voiced class decision module 461 to get information from the above discussed EVS FEC classifying method to determine whether the current frame class is VOICED or not. If the current frame class is VOICED, the decision module 461 outputs the decision that no attack is detected.
- the second-stage attack detection operation 410 comprises a mean energy calculating operation 412.
- the second-stage attack detector 460 comprises a mean energy calculator 462 for calculating a mean energy across N/K analysis segments before the candidate attack I_att - including segments from the previous frame - using for example Equation (7):
- the second-stage attack detection operation 410 comprises a logic decision operation 413.
- the second-stage attack detector 460 comprises a logic decision module 463 to find an attack position from the second-stage attack detector, I_att2, by applying, for example, the following logic of Equation (8) to the mean energy from Equation (7):
- the second-stage attack detection operation 410 finally comprises an energy comparison operation 414.
- the energy comparator 464 sets the attack position I_att2 to 0 if an attack was detected in the previous frame. In this case no attack is detected.
- a final decision whether the current frame is determined as an attack frame to be coded using the TC coding mode is conducted based on the positions of the attacks I_att1 and I_att2 obtained during the first-stage 404 and second-stage 410 detection operations, respectively.
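The second stage can be sketched as follows. The mean energy is taken over the N/K segments immediately preceding the candidate, reaching back into the previous frame as described above. The comparison itself is an assumption of this sketch (candidate energy greater than a caller-supplied threshold times the mean), since Equation (8) is not reproduced in this text. Function and parameter names are hypothetical.

```python
def mean_energy_before(prev_seg_energy, cur_seg_energy, i_att):
    """Mean energy over the N/K segments immediately before segment i_att."""
    n_seg = len(cur_seg_energy)
    if len(prev_seg_energy) != n_seg:
        raise ValueError("previous and current frames need the same segment count")
    history = list(prev_seg_energy) + list(cur_seg_energy)
    # Current segment i sits at history[n_seg + i]; the n_seg segments
    # before the candidate are therefore history[i_att : n_seg + i_att].
    window = history[i_att:n_seg + i_att]
    return sum(window) / n_seg


def second_stage(prev_seg_energy, cur_seg_energy, i_att, threshold,
                 current_class_voiced, prev_frame_attack):
    """Return the second-stage attack position, or 0 if no attack is detected."""
    # Both early exits are described in the text: VOICED frames and frames
    # following a detected attack never yield a second-stage attack.
    if current_class_voiced or prev_frame_attack:
        return 0
    mean_e = mean_energy_before(prev_seg_energy, cur_seg_energy, i_att)
    # ASSUMPTION: the form of this comparison and the threshold are illustrative.
    return i_att if cur_seg_energy[i_att] > threshold * mean_e else 0
```

Keeping the segment energies of the previous frame in a small history buffer is what lets this stage judge a candidate located near the start of the current frame.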
- the attack detecting method 400 comprises a first-stage attack decision operation 430.
- the attack detector 450 further comprises a first-stage attack decision module 470 to determine if I_att1 ≥ P. If I_att1 ≥ P, then I_att1 is the position of the detected attack, I_att,final, in the last sub-frame of the current frame and is used to determine that the glottal-shape codebook of the TC coding mode is used in this last sub-frame. Otherwise, no attack is detected.
- the position of the detected attack, I_att,final, is used to determine in which sub-frame the glottal-shape codebook of the TC coding mode is used.
- the information about the final position I_att,final of the detected attack is used to determine in which sub-frame of the current frame the glottal-shape codebook within the TC coding mode is employed and which TC mode configuration (see Reference [3]) is used.
- the glottal-shape codebook is used in the first sub-frame if the final attack position I_att,final is detected in segments 1-7, in the second sub-frame if the final attack position I_att,final is detected in segments 8-15, in the third sub-frame if the final attack position I_att,final is detected in segments 16-23, and finally in the last (fourth) sub-frame of the current frame if the final attack position I_att,final is detected in segments 24-31.
- the value I_att,final = 0 signals that an attack was not found and that the current frame is coded according to the original classification (usually using the GC coding mode).
- the attack detecting method 400 comprises a glottal-shape codebook assignment operation 445.
- the attack detector 450 comprises a glottal-shape codebook assignment module 485 to assign the glottal-shape codebook within the TC coding mode to a given sub-frame of the current frame, which consists of four sub-frames, using the following logic of Equation (12):
- sbfr is the sub-frame index
- sbfr = 0, ..., 3
- index 0 denotes the first sub-frame
- index 1 denotes the second sub-frame
- index 2 denotes the third sub-frame
- index 3 denotes the fourth sub-frame.
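For the four sub-frame case, the segment ranges listed above (1-7, 8-15, 16-23, 24-31) are reproduced by an integer division of the final attack position by 8, the number of segments per sub-frame. The sketch below is one mapping consistent with those ranges, not necessarily the exact form of Equation (12); the function name is hypothetical.

```python
def glottal_subframe_4(i_att_final, segs_per_subframe=8):
    """Map the final attack position to a sub-frame index 0..3.

    Returns None when i_att_final is 0, which signals that no attack was found.
    """
    if i_att_final <= 0:
        return None
    return i_att_final // segs_per_subframe
```

For instance, a final attack position of 15 falls in the second sub-frame (index 1), while 24 falls in the fourth (index 3).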
- the situation is different when the core codec operates at a different internal sampling rate, for example at higher bit-rates (16.4 kbps and more in the case of EVS) where the internal sampling rate is 16 kHz.
- the glottal-shape codebook assignment module 485 selects, in the glottal-shape codebook assignment operation 445, the sub-frame to be coded using the glottal-shape codebook within the TC coding mode using the following logic of Equation (13):
- Equation (13) where the operator ⌊x⌋ indicates the largest integer less than or equal to x.
- the glottal-shape codebook is used in the first sub-frame if the final attack position I_att,final is detected in segments 1-6, in the second sub-frame if the final attack position I_att,final is detected in segments 7-12, in the third sub-frame if the final attack position I_att,final is detected in segments 13-19, in the fourth sub-frame if the final attack position I_att,final is detected in segments 20-25, and finally in the last (fifth) sub-frame of the current frame if the final attack position I_att,final is detected in segments 26-31.
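With five sub-frames and 32 analysis segments, each sub-frame spans 6.4 segments, so the mapping needs a floor operation. The sketch below uses integer arithmetic, floor(I_att,final × 5 / 32), which reproduces the ranges listed above (1-6, 7-12, 13-19, 20-25, 26-31). It is one mapping consistent with those ranges and is not claimed to be the literal form of Equation (13); the function and parameter names are hypothetical.

```python
def glottal_subframe(i_att_final, n_subframes=5, n_segments=32):
    """Map the final attack position to a sub-frame index 0..n_subframes-1.

    Returns None when i_att_final is 0, which signals that no attack was found.
    Integer arithmetic avoids any floating-point rounding in the floor.
    """
    if i_att_final <= 0:
        return None
    return (i_att_final * n_subframes) // n_segments
```

Calling the same function with `n_subframes=4` gives the division by 8 of the four sub-frame case, so the two configurations differ only in the number of sub-frames.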
- Figure 5 is a graph of a first non-restrictive, illustrative example showing the impact of the attack detector of Figure 4 and TC coding mode on the quality of a decoded music signal.
- a music segment of castanets is shown, wherein curve a) represents the input (uncoded) music signal, curve b) represents a decoded reference signal synthesis when only the first-stage attack detection was employed, and curve c) represents the decoded improved synthesis when the whole first-stage and second-stage attack detections and coding using the TC coding mode are employed.
- Figure 6 is a graph of a second non-restrictive, illustrative example showing the impact of the attack detector of Figure 4 and TC coding mode on the quality of a decoded speech signal, wherein curve a) represents an input (uncoded) speech signal, curve b) represents a decoded reference speech signal synthesis when an onset frame is coded using the GC coding mode, and curve c) represents a decoded improved speech signal synthesis when the whole first-stage and second-stage attack detection and coding using the TC coding mode are employed in the onset frame.
- Figure 7 is a simplified block diagram of an example configuration of hardware components forming the devices for detecting an attack in a sound signal to be coded and for coding the detected attack and implementing the methods for detecting an attack in a sound signal to be coded and for coding the detected attack.
- the devices for detecting an attack in a sound signal to be coded and for coding the detected attack may be implemented as a part of a mobile terminal, as a part of a portable media player, or in any similar device.
- the devices for detecting an attack in a sound signal to be coded and for coding the detected attack (identified as 700 in Figure 7) comprise an input 702, an output 704, a processor 706 and a memory 708.
- the input 702 is configured to receive for example the digital input sound signal 105 (Figure 1).
- the output 704 is configured to supply the encoded bit-stream 111.
- the input 702 and the output 704 may be implemented in a common module, for example a serial input/output device.
- the processor 706 is operatively connected to the input 702, to the output 704, and to the memory 708.
- the processor 706 is realized as one or more processors for executing code instructions in support of the functions of the various modules of the sound encoder 106, including the modules of Figures 2, 3 and 4.
- the memory 708 may comprise a non-transient memory for storing code instructions executable by the processor 706, specifically a processor-readable memory comprising non-transitory instructions that, when executed, cause a processor to implement the operations and modules of the sound encoder 106, including the operations and modules of Figures 2, 3 and 4.
- the memory 708 may also comprise a random access memory or buffer(s) to store intermediate processing data from the various functions performed by the processor 706.
- modules, processing operations, and/or data structures described herein may be implemented using various types of operating systems, computing platforms, network devices, computer programs, and/or general purpose machines.
- devices of a less general purpose nature such as hardwired devices, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), or the like, may also be used.
- FPGAs field programmable gate arrays
- ASICs application specific integrated circuits
- where a method comprising a series of operations and sub-operations is implemented by a processor, computer or machine, and those operations and sub-operations may be stored as a series of non-transitory code instructions readable by the processor, computer or machine, they may be stored on a tangible and/or non-transient medium.
- Modules of the methods and devices for detecting an attack in a sound signal to be coded and for coding the detected attack as described herein may comprise software, firmware, hardware, or any combination(s) of software, firmware, or hardware suitable for the purposes described herein.
- 3GPP TS 26.445 "Codec for Enhanced Voice Services (EVS);
- the pseudo-code is based on EVS. New IVAS logic is highlighted in shaded background.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Priority Applications (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020217034717A KR20220006510A (en) | 2019-05-07 | 2020-05-01 | Methods and devices for detecting attack in a sound signal and coding the detected attack |
JP2021566035A JP2022532094A (en) | 2019-05-07 | 2020-05-01 | Methods and Devices for Detecting Attacks in Coding Audio Signals and Coding Detected Attacks |
CA3136477A CA3136477A1 (en) | 2019-05-07 | 2020-05-01 | Methods and devices for detecting an attack in a sound signal to be coded and for coding the detected attack |
US17/602,071 US20220180884A1 (en) | 2019-05-07 | 2020-05-01 | Methods and devices for detecting an attack in a sound signal to be coded and for coding the detected attack |
EP20802156.8A EP3966818A4 (en) | 2019-05-07 | 2020-05-01 | Methods and devices for detecting an attack in a sound signal to be coded and for coding the detected attack |
BR112021020507A BR112021020507A2 (en) | 2019-05-07 | 2020-05-01 | Methods and devices for detecting an attack in a sound signal to be encoded and for encoding the detected attack |
CN202080033815.3A CN113826161A (en) | 2019-05-07 | 2020-05-01 | Method and device for detecting attack in a sound signal to be coded and decoded and for coding and decoding the detected attack |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201962844225P | 2019-05-07 | 2019-05-07 | |
US62/844,225 | 2019-05-07 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020223797A1 (en) | 2020-11-12 |
Family
ID=73050501
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CA2020/050582 WO2020223797A1 (en) | 2019-05-07 | 2020-05-01 | Methods and devices for detecting an attack in a sound signal to be coded and for coding the detected attack |
Country Status (8)
Country | Link |
---|---|
US (1) | US20220180884A1 (en) |
EP (1) | EP3966818A4 (en) |
JP (1) | JP2022532094A (en) |
KR (1) | KR20220006510A (en) |
CN (1) | CN113826161A (en) |
BR (1) | BR112021020507A2 (en) |
CA (1) | CA3136477A1 (en) |
WO (1) | WO2020223797A1 (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2008049221A1 (en) | 2006-10-24 | 2008-05-02 | Voiceage Corporation | Method and device for coding transition frames in speech signals |
US7933769B2 (en) * | 2004-02-18 | 2011-04-26 | Voiceage Corporation | Methods and devices for low-frequency emphasis during audio compression based on ACELP/TCX |
US8630863B2 (en) * | 2007-04-24 | 2014-01-14 | Samsung Electronics Co., Ltd. | Method and apparatus for encoding and decoding audio/speech signal |
US10096323B2 (en) * | 2006-11-28 | 2018-10-09 | Samsung Electronics Co., Ltd. | Frame error concealment method and apparatus and decoding method and apparatus using the same |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7472059B2 (en) * | 2000-12-08 | 2008-12-30 | Qualcomm Incorporated | Method and apparatus for robust speech classification |
JP2006502426A (en) * | 2002-10-11 | 2006-01-19 | ノキア コーポレイション | Source controlled variable bit rate wideband speech coding method and apparatus |
-
2020
- 2020-05-01 CN CN202080033815.3A patent/CN113826161A/en active Pending
- 2020-05-01 US US17/602,071 patent/US20220180884A1/en active Pending
- 2020-05-01 CA CA3136477A patent/CA3136477A1/en active Pending
- 2020-05-01 EP EP20802156.8A patent/EP3966818A4/en active Pending
- 2020-05-01 JP JP2021566035A patent/JP2022532094A/en active Pending
- 2020-05-01 BR BR112021020507A patent/BR112021020507A2/en unknown
- 2020-05-01 KR KR1020217034717A patent/KR20220006510A/en unknown
- 2020-05-01 WO PCT/CA2020/050582 patent/WO2020223797A1/en unknown
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7933769B2 (en) * | 2004-02-18 | 2011-04-26 | Voiceage Corporation | Methods and devices for low-frequency emphasis during audio compression based on ACELP/TCX |
WO2008049221A1 (en) | 2006-10-24 | 2008-05-02 | Voiceage Corporation | Method and device for coding transition frames in speech signals |
US10096323B2 (en) * | 2006-11-28 | 2018-10-09 | Samsung Electronics Co., Ltd. | Frame error concealment method and apparatus and decoding method and apparatus using the same |
US8630863B2 (en) * | 2007-04-24 | 2014-01-14 | Samsung Electronics Co., Ltd. | Method and apparatus for encoding and decoding audio/speech signal |
Non-Patent Citations (5)
Title |
---|
"Codec for Enhanced Voice Services (EVS); Detailed Algorithmic Description", 3GPP TS 26.445 |
See also references of EP3966818A4 |
V. EKSLER, M. JELINEK: "Glottal-Shape Codebook to Improve Robustness of CELP Codecs", IEEE TRANS. ON AUDIO, SPEECH AND LANGUAGE PROCESSING, vol. 18, no. 6, August 2010 (2010-08-01), pages 1208 - 1217 |
V. EKSLER, M. JELINEK, R. SALAMI, METHOD AND DEVICE FOR THE ENCODING OF TRANSITION FRAMES IN SPEECH AND AUDIO |
V. EKSLER, R. SALAMI, M. JELINEK: "Efficient handling of mode switching and speech transitions in the EVS codec", PROC. IEEE INT. CONF. ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2015 |
Also Published As
Publication number | Publication date |
---|---|
JP2022532094A (en) | 2022-07-13 |
EP3966818A4 (en) | 2023-01-04 |
CN113826161A (en) | 2021-12-21 |
BR112021020507A2 (en) | 2021-12-07 |
KR20220006510A (en) | 2022-01-17 |
CA3136477A1 (en) | 2020-11-12 |
EP3966818A1 (en) | 2022-03-16 |
US20220180884A1 (en) | 2022-06-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR101406113B1 (en) | Method and device for coding transition frames in speech signals | |
US6134518A (en) | Digital audio signal coding using a CELP coder and a transform coder | |
TWI362031B (en) | Methods, apparatus and computer program product for obtaining frames of a decoded speech signal | |
TWI582758B (en) | Apparatus and method for selecting one of a first encoding algorithm and a second encoding algorithm using harmonics reduction | |
JP6530449B2 (en) | Encoding mode determination method and apparatus, audio encoding method and apparatus, and audio decoding method and apparatus | |
KR102007972B1 (en) | Unvoiced/voiced decision for speech processing | |
JP2004508597A (en) | Simulation of suppression of transmission error in audio signal | |
KR20020052191A (en) | Variable bit-rate celp coding of speech with phonetic classification | |
MXPA06011957A (en) | Signal encoding. | |
US20140214413A1 (en) | Systems, methods, apparatus, and computer-readable media for adaptive formant sharpening in linear prediction coding | |
US10672411B2 (en) | Method for adaptively encoding an audio signal in dependence on noise information for higher encoding accuracy | |
CN101609681B (en) | Coding method, coder, decoding method and decoder | |
KR20230129581A (en) | Improved frame loss correction with voice information | |
US20220180884A1 (en) | Methods and devices for detecting an attack in a sound signal to be coded and for coding the detected attack | |
Miki et al. | Pitch synchronous innovation code excited linear prediction (PSI‐CELP) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20802156 Country of ref document: EP Kind code of ref document: A1 |
|
ENP | Entry into the national phase |
Ref document number: 3136477 Country of ref document: CA |
|
REG | Reference to national code |
Ref country code: BR Ref legal event code: B01A Ref document number: 112021020507 Country of ref document: BR |
|
ENP | Entry into the national phase |
Ref document number: 2021566035 Country of ref document: JP Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 112021020507 Country of ref document: BR Kind code of ref document: A2 Effective date: 20211013 |
|
ENP | Entry into the national phase |
Ref document number: 2020802156 Country of ref document: EP Effective date: 20211207 |