WO2012158159A1 - Packet loss concealment for audio codec - Google Patents

Info

Publication number
WO2012158159A1
Authority
WO
WIPO (PCT)
Prior art keywords
frame
pitch
signal
residual signals
stored
Application number
PCT/US2011/036662
Other languages
English (en)
French (fr)
Inventor
Turaj ZAKIZADEH SHABESTARY
Tina LE GRAND
Original Assignee
Google Inc.
Application filed by Google Inc. filed Critical Google Inc.
Priority to PCT/US2011/036662 priority Critical patent/WO2012158159A1/en
Priority to CN201180072349.0A priority patent/CN103688306B/zh
Publication of WO2012158159A1 publication Critical patent/WO2012158159A1/en


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L 19/005: Correction of errors induced by the transmission channel, if related to the coding algorithm

Definitions

  • the technical field relates to packet loss concealment in communication systems (such as Voice over IP, also referred to as VoIP) having an audio codec (coder/decoder). One such codec may be iSAC.
  • Real-time communication refers to communication where the delay between one user speaking and another user hearing the speech is so short that it is imperceptible or nearly imperceptible.
  • VoIP is one audio communication approach enabling real-time communication over packet-switched networks, such as the Internet.
  • an audio signal is broken up into short time segments by an audio coder, and the time segments are transmitted individually as audio frames in packets.
  • the packets are received by the receiver, the audio frames are extracted, and the short time segments are reassembled by an audio decoder into the original audio signal, enabling the receiver to hear the transmitted audio signal.
  • Real time audio communication over packet-switched networks has brought with it unique challenges.
  • the available bandwidth of the network may be limited and may change over time. Packets may also get lost or corrupted. A packet is considered lost when it fails to arrive at the intended receiver within a limited time interval, even if the packet does eventually arrive at the receiver.
  • One approach for dealing with lost packets is Backward Error Correction (BEC).
  • Another approach for dealing with lost packets is to use information from received packets to recreate the lost packet or packets.
  • the received packets may contain information specifically for this purpose, such as redundant information about audio data from preceding time segments.
  • Such an approach will result in reduced effective bandwidth available for communication, because the available bandwidth is used for transmitting redundant data, which may not be needed at all if packets are not lost.
  • the present invention recognizes the problem posed by lost packets in real-time audio communication over packet switched networks, and provides a solution that avoids the disadvantages of the above examples.
  • the loss of packets is concealed by simulating the audio information that would have likely been contained in the lost packets based on previously received packets.
  • the invention utilizes packets that were previously received to reconstruct dropped packets in a particular way, without the use of a jitter buffer. Specifically, information from a previously received packet is used to reconstruct a lost packet, but the information is not merely copied. If it were simply copied, the resulting audio would sound unnatural and "robotic." Instead, the information from the previously received packet is modified in a special way to make the reconstructed packet result in natural sounding audio.
  • a method of decoding an audio signal having been encoded as a sequence of consecutive frames may include receiving a first frame of the consecutive frames, the first frame containing decoding parameters and residual signals for reconstructing audio data represented by the first frame, storing the residual signals contained in the first frame, decoding the first frame based on the stored residual signals to reconstruct the audio signal encoded by the first frame, determining that a second frame subsequent to the first frame in time has been lost, modifying the stored residual signals, and reconstructing an estimate of the audio signal encoded by the second frame based on the modified residual signals.
  • the modifying the stored residual signals may include generating a periodic signal, generating a colored pseudo-random signal based on the stored residual signals, multiplying the periodic signal and the colored pseudo-random signal with weight factors selected based on energy of an input and an output signal of a pitch synthesis filter created from the stored residual signals and based on pitch gain of the stored residual signals, and summing the weighted periodic signal and the weighted colored pseudo-random signal.
  • the generating the periodic signal may include retrieving at least two most recently stored pitch cycles, altering periodicity of each pitch cycle, weighting each pitch cycle, and summing the two weighted pitch cycles.
  • the altering the periodicity may include resampling pitch pulses of the pitch cycles.
  • the generating the colored pseudo-random signal may include generating a pseudo-random sequence, and filtering the pseudo-random sequence with an Nth-order all-zero filter with coefficients given by the N latest samples of a previously decoded lower-band residual signal of a previously received frame.
  • the stored residual signals may include input of a pitch synthesis filter, and input of an LPC synthesis filter.
  • the decoding parameters may include pitch gains, pitch lags, and LPC parameters.
  • the frames may contain encoded information for a first frequency band and a distinct second frequency band higher than the first frequency band, and only a residual signal of the first frequency band is pitch post-filtered, not a residual signal of the second frequency band.
  • a decoding apparatus for decoding an audio signal having been encoded as a sequence of consecutive frames includes a receiver configured to receive a first frame of the consecutive frames, the first frame containing decoding parameters and residual signals for reconstructing audio data represented by the first frame, a storage unit storing the residual signals contained in the first frame, a decoding unit configured to decode the first frame based on the stored residual signals to reconstruct the audio signal encoded by the first frame, a loss detector configured to determine that a second frame subsequent to the first frame in time has been lost, a modification unit configured to modify the stored residual signals, and a reconstruction unit configured to reconstruct an estimate of the audio signal encoded by the second frame based on the stored residual signals modified by the modification unit.
  • the modification unit may include a first signal generator configured to generate a periodic signal, a second signal generator configured to generate a colored pseudo-random signal based on the stored residual signals, a multiplier multiplying the periodic signal generated in the first signal generator and the colored pseudo-random signal generated in the second signal generator with weight factors selected based on energy of an input and an output signal of a pitch synthesis filter created from the stored residual signals and based on pitch gain of the stored residual signals, and an adder summing the weighted periodic signal and the weighted colored pseudo-random signal output from the multiplier.
  • the first signal generator may be configured to retrieve at least two most recently stored pitch cycles, alter periodicity of each pitch cycle, weight each pitch cycle, and sum the two weighted pitch cycles.
  • the first signal generator may be configured to alter the periodicity by resampling pitch pulses of the pitch cycles.
  • the second signal generator may be configured to generate a pseudo-random sequence, and filter the pseudo-random sequence with an Nth-order all-zero filter with coefficients given by the N latest samples of a previously decoded lower-band residual signal of a previously received frame.
  • the stored residual signals may include input of a pitch synthesis filter, and input of an LPC synthesis filter.
  • the decoding parameters may include pitch gains, pitch lags, and LPC parameters.
  • a computer readable tangible recording medium is encoded with instructions, wherein the instructions, when executed on a processor, cause the processor to perform a method including receiving a first frame of the consecutive frames, the first frame containing decoding parameters and residual signals for reconstructing audio data represented by the first frame, storing the residual signals contained in the first frame, decoding the first frame based on the stored residual signals to reconstruct the audio signal encoded by the first frame, determining that a second frame subsequent to the first frame in time has been lost, modifying the stored residual signals, and reconstructing an estimate of the audio signal encoded by the second frame based on the modified residual signals.
  • FIG. 1 is a block diagram illustrating an example of a communication system according to an embodiment of the present invention.
  • FIG. 2 illustrates an example of a stream of packets with a lost packet according to an embodiment of the present invention.
  • FIG. 3 illustrates an example of a process flow of receiving packets according to an embodiment of the present invention.
  • FIG. 4 illustrates an example of a process flow of decoding received packets according to an embodiment of the present invention.
  • FIGS. 5A and 5B illustrate an example of a process flow of an algorithm for concealing packet loss according to an embodiment of the present invention.
  • FIGS. 6A and 6B illustrate an example of a process flow of an algorithm for generating a quasi-periodic pulse train according to an embodiment of the present invention.
  • FIG. 7 illustrates an example of a processing system for implementing the packet loss algorithm according to an embodiment of the present invention.
  • Fig. 1 illustrates a communication system. Audio input is passed into one end of the system, and is ultimately output at the other end of the system. The communication can be concurrently bi-directional, as in a telephone conversation between two callers. The audio input can be generated by a user speaking, by a recording, or any other audio source. The audio input is supplied to encoder 102.
  • Encoder 102 encodes the audio input into multiple packets, which are transmitted over packet network 104 to decoder 106.
  • Packet network 104 can be any packet-switched network, whether using physical link connection and/or wireless link connections. Packet network 104 may also be a wireless communication network, and/or an optical link network. Packet network 104 conveys packets from encoder 102 to decoder 106. Some of the packets sent from the encoder 102 may get lost, as illustrated in Fig. 2.
  • the encoder 102 may be the iSAC coder, and produces packets (also referred to as frames) as output.
  • An embodiment of the invention relies on pitch information, and assumes that pitch parameters are available at the decoder. But even if pitch parameters are not embedded in the payload, they could be estimated at the decoder based on the previously decoded audio.
  • Each frame corresponds to a short segment of time, for example 30 or 60 milliseconds for iSAC. Other segment lengths may also be used with other encoders.
  • One-way delay is at least as large as one frame size, so frame sizes longer than 60 ms may create unacceptably long delays.
  • Longer frames are also harder to conceal in the event of a lost packet. Shorter frames, on the other hand, may introduce too much packet overhead, reducing the effective bandwidth. If delay were not a concern (for instance in streaming), high quality could be achieved by allowing long frame sizes for stationary segments.
  • the encoder 102 may separate the incoming audio signal into two frequency bands, referred to as the lower band (LB) and the upper band (UB).
  • the LB may be 0-4kHz
  • the UB may be 4-8kHz.
  • a single frequency band (e.g., 0-8kHz) may also be used, without separating the incoming audio signal into separate bands.
  • each frame contains at least pitch gain, pitch lag, LPC parameters, and DFT coefficients of a residual signal during the corresponding time segment.
  • each of the bands has respective information in the frame; the information for each band can be individually selected from the frame, and there are no pitch parameters associated with the UB band.
  • the encoder used is iSAC
  • the pitch lag can be thought of as the "optimal" delay of a long-term predictor
  • pitch gain can be thought of as the prediction gain
  • LPC coefficients are optimal short-term prediction coefficients.
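  • As a rough illustration of these roles, the "optimal" delay of a long-term predictor can be found by maximizing a normalized correlation between the signal and a delayed copy of itself. The estimator below is a simplified sketch, not the actual iSAC pitch search:

```python
import math

# Sketch: pick the lag that maximizes the normalized autocorrelation.
# The long-term prediction gain grows with this correlation, so the
# maximizing lag plays the role of the pitch lag described above.
def estimate_pitch_lag(x, min_lag=20, max_lag=120):
    best_lag, best_score = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        num = sum(x[n] * x[n - lag] for n in range(lag, len(x)))
        den = math.sqrt(sum(x[n - lag] ** 2 for n in range(lag, len(x))))
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# A synthetic signal with a period of 50 samples.
signal = [math.sin(2 * math.pi * n / 50) for n in range(400)]
print(estimate_pitch_lag(signal))  # 50, the true period
```

The lag range and the simple correlation criterion are assumptions of this sketch; a production estimator would also handle lag multiples and noisy frames.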
  • Decoder 106 receives packets conveyed by network 104 and decodes the packets into audio data, which is output from decoder 106. Details of the processing performed by the decoder 106 are illustrated in Figs. 3-6. Decoder 106 may be implemented on a processor, such as illustrated in Fig. 7, or on other hardware platforms, such as mobile telecommunication devices. The processing performed by decoder 106 is advantageous for mobile devices that lack sufficient processing power to perform alternate types of packet loss concealment, as the approach according to the present invention is of a relatively low computational complexity.
  • Fig. 3 illustrates a high level processing flow of the PLC approach according to an embodiment of the present invention.
  • In step S 306, a determination is made whether frame N has been received, i.e., not lost. If frame N has been received, the processing continues to step S 320, where frame N is decoded.
  • Fig. 4 illustrates additional details of the processing in step S 320.
  • After frame N is decoded in step S 320, the processing increments index N in step S 340 and continues with step S 306 to determine whether frame N+1 has been received. So long as frames are not lost, the processing continues along the loop of steps S 306, S 320, and S 340.
  • If it is determined in step S 306 that a frame has been lost, the processing continues to step S 350, where the loss of the frame is concealed.
  • Figs. 5A-B illustrate additional details of the processing in step S 350.
  • Fig. 4 illustrates an example of the process of decoding frames that are received by decoder 106.
  • frame size and bandwidth information are decoded from the frame in step S 410.
  • the frame size represents the size of the time segment represented by the frame, and can be represented in milliseconds, or count of samples at a particular sampling rate.
  • the sampling rate may also be encoded in the frame. Sampling rate may be negotiated before a call takes place and is not supposed to change during a call.
  • the bandwidth information reflects the bandwidth of the audio data encoded in the frame, and may be LB, UB, or both.
  • step S 415 the pitch lags and the pitch gains are decoded from the frame.
  • Pitch lags and gains may be updated every 7.5 ms, resulting in 4 pitch lags and gains per 30 ms frame.
  • the pitch lag represents the lag of a long-term predictor for the current signal.
  • the pitch gain represents the long-term linear prediction coefficient.
  • The decoded pitch lags and pitch gains are stored in step S 420, as they may be needed for packet loss concealment if subsequent frames are lost.
  • step S 425 the LPC parameters (LPC shape and gain) are decoded.
  • the LPC parameters represent short-term linear prediction coefficients, describing the spectral envelope of the signal.
  • the LPC shape and gain are stored in step S 430, as they may be needed for packet loss concealment, if subsequent frames are lost.
  • step S 435 the DFT coefficients of the residual signal encoded in the frame are decoded.
  • the residual signal is the result of filtering out the short term and the long term linear dependencies.
  • the DFT coefficients are the result of transforming the residual signal into the frequency domain by an operation such as the FFT.
  • the DFT coefficients may include separate information for the LB signal and separate information for the UB signal.
  • step S 440 the DFT coefficients which were decoded in step S 435 are transformed from the frequency domain into the time domain, by an operation such as an inverse FFT, resulting in the residual signal.
  • a separate residual signal is created for LB (referred to as LB_Res) and a separate residual signal is created for UB (referred to as UB_Res).
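  • The transform pair of steps S 435 and S 440 can be sketched with a plain DFT and its inverse. This is a toy round trip (the codec would use an FFT plus quantization of the coefficients); the sample values are illustrative:

```python
import cmath

# Forward DFT: what the encoder would compute on the residual.
def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

# Inverse DFT (step S 440): back from coefficients to the time-domain residual.
def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)).real / N
            for n in range(N)]

residual = [0.5, -0.25, 0.0, 0.125, -0.5, 0.75, 0.25, -0.125]
coeffs = dft(residual)      # frequency-domain representation carried in the frame
recovered = idft(coeffs)    # time-domain residual recovered by the decoder
print(all(abs(a - b) < 1e-9 for a, b in zip(residual, recovered)))  # True
```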
  • In step S 445 the residual signals (LB_Res and UB_Res) are stored, as they may be needed for packet loss concealment.
  • In step S 450 the lower band residual signal (LB_Res) is filtered by a pitch post-filter.
  • the pitch post-filter is a pole-zero filter whose coefficients are given by the pitch gain and lag. It is the inverse of the pitch pre-filter; therefore, it reintroduces the long-term structure that was removed by the pitch pre-filter. Even when both LB_Res and UB_Res are available, only LB_Res is pitch post-filtered.
  • the output of the pitch post-filter (the filtered residual signal) is stored, as it may be needed for packet loss concealment.
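  • A minimal sketch of what the pitch post-filter does, using a single-tap pitch synthesis recursion. The actual filter is a pole-zero filter with per-subframe gains and lags; the single tap here is a simplifying assumption:

```python
# Simplified pitch synthesis: y[n] = x[n] + g * y[n - lag].
# Reintroduces long-term periodicity removed by the pitch pre-filter.
def pitch_postfilter(residual, pitch_lag, pitch_gain):
    out = [0.0] * pitch_lag          # zero filter state before the frame
    for x in residual:
        out.append(x + pitch_gain * out[-pitch_lag])
    return out[pitch_lag:]           # drop the initial state

# A single excitation pulse becomes a decaying pulse train with the
# period pitch_lag, i.e., a periodic (voiced-like) signal.
excitation = [1.0] + [0.0] * 29
y = pitch_postfilter(excitation, pitch_lag=10, pitch_gain=0.5)
print(y[0], y[10], y[20])  # 1.0 0.5 0.25
```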
  • step S 455 the LPC parameters decoded in step S 425 are used to synthesize the lower band and the upper band signals.
  • LPC synthesis is an all-pole filter with coefficients derived from the LPC parameters. This filter is the inverse of LPC analysis (at the encoder); therefore, it reintroduces the short-term structure of the signal.
  • the output of LPC synthesis is the time domain representation of the original encoded signal. In the case where LB and UB are used at the same time, the output is a separate LB signal and UB signal.
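  • LPC synthesis can be sketched as the all-pole recursion below. The coefficient used is illustrative, not taken from a real iSAC frame:

```python
# All-pole LPC synthesis, 1 / A(z) with A(z) = 1 + a[0] z^-1 + a[1] z^-2 + ...
# It inverts the encoder's LPC analysis filter, restoring the short-term
# spectral envelope of the signal.
def lpc_synthesis(residual, a):
    y = []
    for n, x in enumerate(residual):
        acc = x
        for k, ak in enumerate(a):
            if n - 1 - k >= 0:
                acc -= ak * y[n - 1 - k]   # feedback through past outputs
        y.append(acc)
    return y

coeffs = [-0.9]                            # single pole at z = 0.9
out = lpc_synthesis([1.0, 0.0, 0.0, 0.0], coeffs)
print(out)                                 # impulse response, roughly 0.9**n
```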
  • When LB and UB are used together, in step S 460 the LB signal and the UB signal are combined, creating a representation of the original audio input; this output can be the audio output for the receiver illustrated in Fig. 1. In an implementation where LB and UB are not treated separately and only a single frequency band is used, step S 460 may be skipped.
  • the re-creation of the audio depends on the availability of the residual signal, pitch gain and lag, and LPC parameters from the received frame. In case of packet loss, however, that information is not available. As each frame represents a time segment on the order of 30 milliseconds, it is possible to simply copy the information from a preceding frame to represent the lost frame. With that approach, however, the audio would sound artificial and robotic. Thus, the inventors have derived an approach to reconstruct the data from the lost frame based on previously received frames which creates natural sounding audio.
  • When it is determined in step S 306 that a frame has been lost, decoder 106 performs packet loss concealment in step S 350. As shown in Fig. 5A, the stored pitch lag and pitch gain are retrieved in step S 510. The pitch lag and pitch gain were stored in step S 420 for the previously received frame.
  • In step S 515 the residual signal is retrieved for the previously received frame.
  • the residual signal was stored in step S 445.
  • step S 516 the decoder determines whether the current lost frame is one of consecutive lost frames. If the lost frame is not one of multiple consecutive lost frames, the processing proceeds to step S 520.
  • step S 520 the latest two pitch pulses are computed.
  • the pitch pulses used are closest in time to the lost frame. The computation is based on the pitch lag and the residual signal retrieved in steps S 510 and S 515.
  • the two latest pitch pulses are only computed for the LB signal, even when both LB and UB signals are used.
  • the two pitch pulses may be computed for both the LB and UB signals.
  • the choice of using two pitch pulses is a design parameter determined by the inventors for optimal performance, but other numbers of pitch pulses could also be used.
  • step S 525 the pitch pulses obtained in step S 520 are stored.
  • the pitch pulses will be referred to as LB_P1 and LB_P2.
  • In step S 530 the pitch post-filter output stored in step S 450 is retrieved, and in step S 535 the pitch post-filter output is used to compute a long-term similarity measure. More specifically, the long-term similarity measure is a ratio computed based on the energy of pitch pulses before and after the post-filtering of the previous frame. It is a measure of how periodic the previous frame was.
  • a voice indicator is computed based on the long-term similarity measure and the frequency of the computed pitch pulses.
  • the voice indicator may be calculated as log2( sigma2_out / sigma2_in ) + 2 * pitch_gain + pitch_gain / 256, where log2(x) is logarithm of x in base 2, sigma2_out is the variance of the latest pitch pulse at the output of pitch post-filter and sigma2_in is the variance of the corresponding pulse at the input.
  • the voice indicator is an indication of how periodic the last decoded frame was.
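  • The voice indicator formula above can be written out directly. The input values in this sketch are illustrative, not taken from a real frame:

```python
import math

# Voice indicator per the description: large values indicate that the
# last decoded frame was strongly periodic (voiced).
def voice_indicator(sigma2_out, sigma2_in, pitch_gain):
    return (math.log2(sigma2_out / sigma2_in)   # energy ratio across post-filter
            + 2 * pitch_gain                    # periodicity from pitch gain
            + pitch_gain / 256)

# Strongly periodic frame: post-filter boosts pulse energy, high pitch gain.
print(voice_indicator(sigma2_out=4.0, sigma2_in=1.0, pitch_gain=0.9))
# Weakly periodic frame: no energy boost, low pitch gain.
print(voice_indicator(sigma2_out=1.0, sigma2_in=1.0, pitch_gain=0.1))
```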
  • In step S 545 weight factors are computed for voiced and unvoiced segments.
  • the weight factor for voiced segments is w_v, while the weight factor for unvoiced segments is w_u.
  • the weights are stored in step S 550.
  • the description of steps S 520 through S 550 is based on non-consecutive lost frames.
  • the processing differs for multiple consecutive lost frames as compared to a single lost frame. In the case of multiple consecutive lost frames, there is no immediately preceding frame that has been received. However, the first lost frame of a sequence of multiple lost frames will have been processed through steps S 520 to S 550. Any subsequent lost frames follow the processing through S 517 and S 547.
  • In step S 517, a decay rate is increased.
  • the decay rate is the rate that the synthesized residual signal is decayed to zero, and is applied in step S 590.
  • step S 547 the weight factors w_v and w_u calculated during the previous PLC call (stored in step S 550) are retrieved.
  • In step S 556 the weight factors w_v and w_u are analyzed to determine what kind of utterance is contained in the most recently received frame. Voiced utterances have a strong periodic nature, while unvoiced utterances do not. If the most recently received frame contains voiced utterances, w_v will be greater than zero. If the frame also contains unvoiced utterances, w_u will also be greater than zero. The weights reflect the relative mix of voiced to unvoiced utterances in the frame. A frame with only voiced utterances will have w_u equal to zero, while a frame with only unvoiced utterances will have w_v equal to zero. If both w_v and w_u are non-zero, the utterance is considered a mixed utterance.
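  • The three-way branch described above can be sketched as a simple classification on the stored weights (thresholding at zero, per the description):

```python
# Classify the previous frame from its weight factors:
# w_v > 0 only for voiced content, w_u > 0 only for unvoiced content.
def classify(w_v, w_u):
    if w_v > 0 and w_u > 0:
        return "mixed"       # branch to step S 570
    if w_v > 0:
        return "voiced"      # branch to step S 580
    return "unvoiced"        # branch to step S 560

print(classify(0.8, 0.0), classify(0.0, 1.0), classify(0.6, 0.4))
# voiced unvoiced mixed
```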
  • If it is determined that the utterance is unvoiced (i.e., w_v is zero), the processing proceeds to step S 560, where a pseudo random vector is generated.
  • a pseudo random vector may be generated for LB and a separate one for UB, when both LB and UB are used.
  • In step S 562 the pseudo random vector is filtered by an Nth-order all-zero filter with coefficients given by the N latest samples of the most recently decoded residual signal.
  • N may be a fixed number equal to 30. This filtering will color the generated pseudo random vectors to have a spectral envelope similar to that of the previously received packet.
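  • A sketch of steps S 560 and S 562 follows. The uniform noise distribution and the unit filter scaling are assumptions of this sketch (the description does not specify them):

```python
import random

# Shape a pseudo-random vector with an Nth-order all-zero (FIR) filter
# whose taps are the N latest samples of the last decoded residual,
# giving the noise a similar spectral envelope. N = 30 per the text.
def colored_noise(last_residual, length, N=30, seed=0):
    rng = random.Random(seed)
    taps = last_residual[-N:]                      # N most recent samples
    noise = [rng.uniform(-1.0, 1.0) for _ in range(length + N)]
    return [sum(t * noise[n - k] for k, t in enumerate(taps))
            for n in range(N, N + length)]

prev_residual = [((-1) ** n) * 0.1 for n in range(60)]  # toy residual
vec = colored_noise(prev_residual, length=40)
print(len(vec))  # 40
```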
  • If it is determined in step S 556 that the utterance is voiced (i.e., w_u is zero), the processing proceeds to step S 580.
  • step S 580 a quasi periodic pulse train is constructed.
  • the quasi periodic pulse train is a weighted sum of the two latest pitch cycles.
  • the output is the residual signal. In case both LB and UB are used, the output is the LB residual and the UB residual. Details of the process of generating the quasi periodic pulse train are illustrated in Figs. 6A-B.
  • step S 556 If it is determined in step S 556 that the utterance is mixed, the processing proceeds to step S 570.
  • Step S 570 is functionally the same as step S 580. The details of the processing in step S 570 are illustrated in Figs. 6A-B.
  • the output of step S 570 is a lower band pulse train (referred to as LB_P) and an upper band pulse train (referred to as UB_P).
  • step S 572 two pseudo random vectors are generated, one for LB and one for UB.
  • the process of generating the pseudo random vectors is the same as in step S 560.
  • the LB pseudo random vector will be referred to as LB_N and the UB pseudo random vector will be referred to as UB_N.
  • In step S 574 weight factors w_v and w_u are applied to the quasi-periodic pulse train and to the pseudo random vectors as follows.
  • the LB residual is LB_P*w_v + LB_N*w_u.
  • the UB residual is UB_P*w_v + UB_N*w_u.
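  • The weighting of step S 574 can be sketched with toy vectors; the sample values below are illustrative:

```python
# Mixed-utterance reconstruction: the lost frame's residual is a
# weighted sum of the quasi-periodic pulse train and the colored
# pseudo-random vector, e.g. LB residual = LB_P*w_v + LB_N*w_u.
def mix_residual(pulse_train, noise, w_v, w_u):
    return [w_v * p + w_u * n for p, n in zip(pulse_train, noise)]

LB_P = [1.0, 0.0, 0.0, 1.0]      # toy quasi-periodic pulse train
LB_N = [0.2, -0.1, 0.3, -0.2]    # toy colored noise vector
mixed = mix_residual(LB_P, LB_N, w_v=0.75, w_u=0.25)
print(mixed)
```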
  • step S 590 the residual signal is decayed.
  • the decay is linear and applied sample-by-sample. If K is the size of the reconstructed residual signal, the signal is attenuated toward zero over the K samples, where d is a number less than 1 and the decay_rate controls the rate of attenuation.
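  • One plausible realization of this linear decay is sketched below. The patent's own pseudo code is not reproduced in this excerpt, so the exact update rule, the starting factor d, and the decay_rate value here are assumptions:

```python
# Hypothetical sketch of step S 590: scale each of the K samples by a
# factor that falls linearly from d (< 1) toward zero; decay_rate sets
# how much the factor drops per sample.
def decay_residual(residual, d=0.99, decay_rate=0.001):
    out = []
    factor = d
    for sample in residual:              # K iterations, one per sample
        out.append(sample * max(factor, 0.0))
        factor -= decay_rate             # linear decay toward zero
    return out

decayed = decay_residual([1.0] * 5, d=0.5, decay_rate=0.1)
print(decayed)  # factors fall from 0.5 by 0.1 per sample
```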
  • step S 592 the LB residual is pitch post-filtered, similar to step S 450.
  • the pitch post-filtering uses filter coefficients derived from pitch lag and pitch gain stored in step S 420.
  • the UB residual can skip pitch post-filtering.
  • In step S 594 LPC parameters stored in step S 430 are retrieved, and LPC synthesis of the LB and UB signals is performed based on the retrieved parameters.
  • In step S 596 the LB and UB signals are combined to create a synthesized representation of the audio of the lost frame.
  • Figs. 6A-B illustrate a detailed description of the process of constructing a quasi-periodic pulse train according to an embodiment of the present invention.
  • a quasi-periodic pulse train is constructed in steps S 570 and S 580.
  • In step S 610 the pitch lag of a previous frame, LB_P1, LB_P2, and UB_Res are retrieved. These values were previously stored when the previous frame was received.
  • step S 615 loop counters j and p_cntr are initialized to zero.
  • step S 616 the decoder determines whether the current frame is one of consecutive lost frames. If the lost frame is not one of multiple consecutive lost frames, the processing proceeds to step S 617, where the value of variable L is set equal to the retrieved pitch lag from step S610. It can be appreciated that the first lost frame will cause L to be initialized to the value of pitch lag, but subsequent lost frames will bypass step S 617, and the processing will continue to step S 620.
  • step S 620 LB_P1 is resampled to L samples and assigned to Rl. Thus, the length of Rl is L samples.
  • In step S 625 the last L samples of UB_Res are selected, and referred to as Q1.
  • step S 630 loop counter i is initialized to zero.
  • In step S 636 the decoder determines whether j is less than the frame size (extracted in step S 410). As long as j remains less than the frame size, the loop continues. When j reaches the frame size, LB_P and UB_P are returned as the quasi-periodic pulse trains.
  • step S 638 the decoder determines whether i is less than L. If i is less than L, the process returns to step S 635 and continues the loop. Once i reaches L, the process continues to step S 640, shown in Fig. 6B.
  • In step S 640 p_cntr is incremented by one.
  • In step S 642 the decoder determines whether L is greater than pitch_lag. If L is not greater, L is set to pitch_lag+1 in step S 644. If L is greater than pitch_lag, L is set to pitch_lag in step S 646.
  • This processing is an example of resampling of pitch pulses to avoid too much periodicity in the reconstructed signal.
  • In step S 650 LB_P1 is resampled to L samples and assigned to R1.
  • the length of R1 is L samples.
  • In step S 655 LB_P2 is resampled to L samples and assigned to R2.
  • the length of R2 is L samples.
  • step S 656 the decoder determines whether the value of p_cntr is equal to 1, 2, or 3.
  • If the value of p_cntr is 1, R1 is set to (3*R1+R2)/4 in step S 661.
  • If the value of p_cntr is 2, R1 is set to (R1+R2)/2 in step S 662.
  • If the value of p_cntr is 3, R1 is set to (R1+3*R2)/4 in step S 663, and p_cntr is set to 0 in step S 673. At the conclusion of any of steps S 661, S 662, and S 673 the processing returns to step S 630 in Fig. 6A.
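  • The loop of Figs. 6A-B can be sketched as follows. The crude index-mapping resampler, the treatment of the first cycle, and the exact ordering of the L alternation are simplifying assumptions of this sketch:

```python
# Sketch of the quasi-periodic pulse train: the two latest pitch cycles
# are resampled to a length that alternates between pitch_lag and
# pitch_lag + 1 (avoiding an over-periodic result) and blended with
# weights cycling through (3,1)/4, (1,1)/2, and (1,3)/4 as p_cntr runs.
def resample(pulse, L):
    n = len(pulse)                                   # crude zero-order resampling
    return [pulse[min(int(i * n / L), n - 1)] for i in range(L)]

def pulse_train(p1, p2, pitch_lag, frame_size):
    out, L, p_cntr = [], pitch_lag, 0
    blend = {1: (0.75, 0.25), 2: (0.5, 0.5), 3: (0.25, 0.75)}
    while len(out) < frame_size:
        r1, r2 = resample(p1, L), resample(p2, L)
        w1, w2 = blend.get(p_cntr, (1.0, 0.0))       # first cycle uses p1 alone
        out.extend(w1 * a + w2 * b for a, b in zip(r1, r2))
        p_cntr = 0 if p_cntr == 3 else p_cntr + 1
        L = pitch_lag + 1 if L == pitch_lag else pitch_lag  # jitter the period
    return out[:frame_size]

train = pulse_train([1.0, 0.0, 0.0, 0.0], [0.5, 0.0, 0.0, 0.0],
                    pitch_lag=4, frame_size=16)
print(len(train), train[0])  # 16 1.0
```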
  • FIG. 7 is a block diagram illustrating an example of a computing device 700 that is arranged for packet loss concealment in accordance with the present disclosure.
  • computing device 700 typically includes one or more processors 710 and system memory 720.
  • a memory bus 730 can be used for communicating between the processor 710 and the system memory 720.
  • processor 710 can be of any type including but not limited to a microprocessor (μP), a microcontroller (μC), a digital signal processor (DSP), or any combination thereof.
  • Processor 710 can include one or more levels of caching, such as a level one cache 711 and a level two cache 712, a processor core 713, and registers 714.
  • the processor core 713 can include an arithmetic logic unit (ALU), a floating point unit (FPU), a digital signal processing core (DSP Core), or any combination thereof.
  • a memory controller 715 can also be used with the processor 710, or in some implementations the memory controller 715 can be an internal part of the processor 710.
  • system memory 720 can be of any type including but not limited to volatile memory (such as RAM), non-volatile memory (such as ROM, flash memory, etc.) or any combination thereof.
  • System memory 720 typically includes an operating system 721, one or more applications 722, and program data 724.
  • Application 722 includes a decoding processing algorithm with packet loss concealment 723 that is arranged to decode incoming packets, and to conceal lost packets according to the present disclosure.
  • Program Data 724 includes service data 725 that is useful for performing decoding of received packets and concealing lost packets, as will be further described below.
  • application 722 can be arranged to operate with program data 724 on an operating system 721 such as Android, Chrome, Windows, etc. This basic configuration is illustrated in FIG. 7 by those components within dashed line 701.
  • Computing device 700 can have additional features or functionality, and additional interfaces to facilitate communications between the basic configuration 701 and any required devices and interfaces.
  • a bus/interface controller 740 can be used to facilitate communications between the basic configuration 701 and one or more data storage devices 750 via a storage interface bus 741.
  • the data storage devices 750 can be removable storage devices 751, non-removable storage devices 752, or a combination thereof.
  • Examples of removable storage and non-removable storage devices include magnetic disk devices such as flexible disk drives and hard-disk drives (HDD), optical disk drives such as compact disk (CD) drives or digital versatile disk (DVD) drives, solid state drives (SSD), and tape drives, to name a few.
  • Example computer storage media can include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information, such as computer readable instructions, data structures, program modules, or other data.
  • System memory 720, removable storage 751 and non-removable storage 752 are all examples of computer readable storage media, and store information as described in various steps of the processing algorithms described in this disclosure.
  • Computer readable storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by computing device 700. Any such computer storage media can be part of device 700, and can store instructions that are executed by processor 710, and cause the computing device 700 to perform a method of decoding packets and concealing lost packets as described in this disclosure.
  • Computing device 700 can also include an interface bus 742 for facilitating communication from various interface devices (e.g., output interfaces, peripheral interfaces, and communication interfaces) to the basic configuration 701 via the bus/interface controller 740.
  • Example output devices 760 include a graphics processing unit 761 and an audio processing unit 762, which can be configured to communicate to various external devices such as a display or speakers via one or more A/V ports 763.
  • Example peripheral interfaces 770 include a serial interface controller 771 or a parallel interface controller 772, which can be configured to communicate with external devices such as input devices (e.g., keyboard, mouse, pen, voice input device, touch input device, etc.) or other peripheral devices (e.g., printer, scanner, etc.) via one or more I/O ports 773.
  • An example communication device 780 includes a network controller 781, which can be arranged to facilitate communications with one or more other computing devices 790 over a network communication via one or more communication ports 782.
  • The communication connection is one example of communication media.
  • Communication media may typically be embodied by computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism, and includes any information delivery media.
  • A "modulated data signal" can be a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal.
  • By way of example, and not limitation, communication media can include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, radio frequency (RF), infrared (IR) and other wireless media.
  • The term "computer readable media" as used herein can include both storage media and communication media.
  • Computing device 700 can be implemented as a portion of a small-form factor portable (or mobile) electronic device such as a cell phone, a personal data assistant (PDA), a personal media player device, a wireless web-watch device, a personal headset device, an application specific device, or a hybrid device that includes any of the above functions.
  • Computing device 700 can also be implemented as a personal computer including both laptop computer and non-laptop computer configurations.
  • If speed and accuracy are paramount, the implementer may opt for a mainly hardware and/or firmware vehicle; if flexibility is paramount, the implementer may opt for a mainly software implementation; or, yet again alternatively, the implementer may opt for some combination of hardware, software, and/or firmware.
  • Examples of a signal bearing medium include, but are not limited to, the following: a recordable type medium such as a floppy disk, a hard disk drive, a Compact Disc (CD), a Digital Video Disk (DVD), a digital tape, a computer memory, etc.; and a transmission type medium such as a digital and/or an analog communication medium (e.g., a fiber optic cable, a waveguide, a wired communications link, a wireless communication link, etc.).
  • A typical data processing system generally includes one or more of a system unit housing, a video display device, a memory such as volatile and nonvolatile memory, processors such as microprocessors and digital signal processors, computational entities such as operating systems, drivers, graphical user interfaces, and applications programs, one or more interaction devices, such as a touch pad or screen, and/or control systems including feedback loops and control motors (e.g., feedback for sensing position and/or velocity; control motors for moving and/or adjusting components and/or quantities).
  • A typical data processing system may be implemented utilizing any suitable commercially available components, such as those typically found in data computing/communication and/or network computing/communication systems.
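The decoding application 722/723 described above is, in generic terms, a decoder that synthesizes a substitute frame when a packet is lost, using pitch information from previously decoded audio. As a loose, hypothetical sketch only (this is not the patent's claimed algorithm; the function name, parameters, and attenuation scheme are invented for illustration), a minimal pitch-repetition concealment could look like:

```python
def conceal_lost_frame(history, frame_len, pitch_period, attenuation=0.9):
    """Build a substitute frame by cyclically repeating the last pitch
    period of `history`, scaling each further repetition by `attenuation`
    so a long burst of losses fades out rather than buzzing."""
    if not history or pitch_period <= 0:
        return [0.0] * frame_len           # nothing to extrapolate from
    period = history[-pitch_period:]        # last pitch cycle of good audio
    out, gain = [], attenuation
    while len(out) < frame_len:
        out.extend(s * gain for s in period)
        gain *= attenuation                 # fade each further repetition
    return out[:frame_len]
```

Deployed concealment schemes, such as the one described for ITU-T G.711 in its Appendix I, additionally smooth the boundary between the last good frame and the substituted samples to avoid audible discontinuities.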

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Mobile Radio Communication Systems (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
PCT/US2011/036662 2011-05-16 2011-05-16 Packet loss concealment for audio codec WO2012158159A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/US2011/036662 WO2012158159A1 (en) 2011-05-16 2011-05-16 Packet loss concealment for audio codec
CN201180072349.0A CN103688306B (zh) 2011-05-16 2011-05-16 Method and apparatus for decoding an audio signal encoded as a sequence of consecutive frames

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2011/036662 WO2012158159A1 (en) 2011-05-16 2011-05-16 Packet loss concealment for audio codec

Publications (1)

Publication Number Publication Date
WO2012158159A1 true WO2012158159A1 (en) 2012-11-22

Family

ID=44626536

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2011/036662 WO2012158159A1 (en) 2011-05-16 2011-05-16 Packet loss concealment for audio codec

Country Status (2)

Country Link
CN (1) CN103688306B (zh)
WO (1) WO2012158159A1 (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104347076A (zh) * 2013-08-09 2015-02-11 China Telecom Corporation Limited Network audio packet loss concealment method and apparatus
WO2015100999A1 (zh) * 2013-12-31 2015-07-09 Huawei Technologies Co., Ltd. Method and apparatus for decoding a speech/audio bitstream
CN105453173A (zh) * 2013-06-21 2016-03-30 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for improved concealment of the adaptive codebook in ACELP-like concealment employing improved pulse resynchronization
CN106133827A (zh) * 2014-03-19 2016-11-16 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus, method and corresponding computer program for generating an error concealment signal using individual replacement LPC representations for individual codebook information
CN106788876A (zh) * 2015-11-19 2017-05-31 China Academy of Telecommunications Technology Method and system for voice packet loss compensation
US10269357B2 (en) 2014-03-21 2019-04-23 Huawei Technologies Co., Ltd. Speech/audio bitstream decoding method and apparatus
US10381011B2 (en) 2013-06-21 2019-08-13 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for improved concealment of the adaptive codebook in a CELP-like concealment employing improved pitch lag estimation
US10621993B2 (en) 2014-03-19 2020-04-14 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for generating an error concealment signal using an adaptive noise estimation
US10733997B2 (en) 2014-03-19 2020-08-04 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for generating an error concealment signal using power compensation
US11437047B2 (en) 2013-02-05 2022-09-06 Telefonaktiebolaget L M Ericsson (Publ) Method and apparatus for controlling audio frame loss concealment
US11857615B2 (en) 2014-11-13 2024-01-02 Evaxion Biotech A/S Peptides derived from Acinetobacter baumannii and their use in vaccination
US12009002B2 (en) 2019-02-13 2024-06-11 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio transmitter processor, audio receiver processor and related methods and computer programs

Families Citing this family (6)

Publication number Priority date Publication date Assignee Title
NO2780522T3 (zh) 2014-05-15 2018-06-09
CN104021792B (zh) * 2014-06-10 2016-10-26 The 30th Research Institute of China Electronics Technology Group Corporation Voice packet loss concealment method and system
CN111383643B (zh) * 2018-12-28 2023-07-04 南京中感微电子有限公司 Audio packet loss concealment method and apparatus, and Bluetooth receiver
CN111402905B (zh) * 2018-12-28 2023-05-26 南京中感微电子有限公司 Audio data recovery method and apparatus, and Bluetooth device
WO2020164752A1 (en) * 2019-02-13 2020-08-20 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio transmitter processor, audio receiver processor and related methods and computer programs
CN112908346B (zh) * 2019-11-19 2023-04-25 China Mobile Group Shandong Co., Ltd. Packet loss recovery method and apparatus, electronic device, and computer-readable storage medium

Citations (2)

Publication number Priority date Publication date Assignee Title
EP1724756A2 (en) * 2005-05-20 2006-11-22 Broadcom Corporation Packet loss concealment for block-independent speech codecs
WO2008022176A2 (en) * 2006-08-15 2008-02-21 Broadcom Corporation Packet loss concealment for sub-band predictive coding based on extrapolation of full-band audio waveform

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
ATE451685T1 (de) * 2005-09-01 2009-12-15 Ericsson Telefon Ab L M Processing of coded real-time data
JP2008058667A (ja) * 2006-08-31 2008-03-13 Sony Corp Signal processing apparatus and method, recording medium, and program
CN101261833B (zh) * 2008-01-24 2011-04-27 Tsinghua University Method for audio error concealment using a sinusoidal model

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
EP1724756A2 (en) * 2005-05-20 2006-11-22 Broadcom Corporation Packet loss concealment for block-independent speech codecs
WO2008022176A2 (en) * 2006-08-15 2008-02-21 Broadcom Corporation Packet loss concealment for sub-band predictive coding based on extrapolation of full-band audio waveform

Non-Patent Citations (1)

Title
BALAZS KOVESI ET AL: "A low complexity packet loss concealment algorithm for ITU-T G.722", ACOUSTICS, SPEECH AND SIGNAL PROCESSING, 2008. ICASSP 2008. IEEE INTERNATIONAL CONFERENCE ON, IEEE, PISCATAWAY, NJ, USA, 31 March 2008 (2008-03-31), pages 4769 - 4772, XP031251665, ISBN: 978-1-4244-1483-3 *

Cited By (25)

Publication number Priority date Publication date Assignee Title
US11437047B2 (en) 2013-02-05 2022-09-06 Telefonaktiebolaget L M Ericsson (Publ) Method and apparatus for controlling audio frame loss concealment
US10643624B2 (en) 2013-06-21 2020-05-05 Fraunhofer-Gesellschaft zur Föerderung der Angewandten Forschung E.V. Apparatus and method for improved concealment of the adaptive codebook in ACELP-like concealment employing improved pulse resynchronization
CN105453173A (zh) * 2013-06-21 2016-03-30 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for improved concealment of the adaptive codebook in ACELP-like concealment employing improved pulse resynchronization
US11410663B2 (en) 2013-06-21 2022-08-09 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for improved concealment of the adaptive codebook in ACELP-like concealment employing improved pitch lag estimation
CN105453173B (zh) * 2013-06-21 2019-08-06 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for improved concealment of the adaptive codebook in ACELP-like concealment employing improved pulse resynchronization
US10381011B2 (en) 2013-06-21 2019-08-13 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus and method for improved concealment of the adaptive codebook in a CELP-like concealment employing improved pitch lag estimation
CN104347076B (zh) * 2013-08-09 2017-07-14 China Telecom Corporation Limited Network audio packet loss concealment method and apparatus
CN104347076A (zh) * 2013-08-09 2015-02-11 China Telecom Corporation Limited Network audio packet loss concealment method and apparatus
WO2015100999A1 (zh) * 2013-12-31 2015-07-09 Huawei Technologies Co., Ltd. Method and apparatus for decoding a speech/audio bitstream
US9734836B2 (en) 2013-12-31 2017-08-15 Huawei Technologies Co., Ltd. Method and apparatus for decoding speech/audio bitstream
US10121484B2 (en) 2013-12-31 2018-11-06 Huawei Technologies Co., Ltd. Method and apparatus for decoding speech/audio bitstream
US11367453B2 (en) 2014-03-19 2022-06-21 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for generating an error concealment signal using power compensation
US11423913B2 (en) 2014-03-19 2022-08-23 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for generating an error concealment signal using an adaptive noise estimation
US10614818B2 (en) 2014-03-19 2020-04-07 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for generating an error concealment signal using individual replacement LPC representations for individual codebook information
US10621993B2 (en) 2014-03-19 2020-04-14 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for generating an error concealment signal using an adaptive noise estimation
CN106133827B (zh) * 2014-03-19 2020-01-03 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus, method and computer storage medium for generating an error concealment signal
US10733997B2 (en) 2014-03-19 2020-08-04 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for generating an error concealment signal using power compensation
CN106133827A (zh) * 2014-03-19 2016-11-16 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Apparatus, method and corresponding computer program for generating an error concealment signal using individual replacement LPC representations for individual codebook information
US11393479B2 (en) 2014-03-19 2022-07-19 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Apparatus and method for generating an error concealment signal using individual replacement LPC representations for individual codebook information
US10269357B2 (en) 2014-03-21 2019-04-23 Huawei Technologies Co., Ltd. Speech/audio bitstream decoding method and apparatus
US11031020B2 (en) 2014-03-21 2021-06-08 Huawei Technologies Co., Ltd. Speech/audio bitstream decoding method and apparatus
US11857615B2 (en) 2014-11-13 2024-01-02 Evaxion Biotech A/S Peptides derived from Acinetobacter baumannii and their use in vaccination
CN106788876A (zh) * 2015-11-19 2017-05-31 China Academy of Telecommunications Technology Method and system for voice packet loss compensation
CN106788876B (zh) * 2015-11-19 2020-01-21 China Academy of Telecommunications Technology Method and system for voice packet loss compensation
US12009002B2 (en) 2019-02-13 2024-06-11 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio transmitter processor, audio receiver processor and related methods and computer programs

Also Published As

Publication number Publication date
CN103688306B (zh) 2017-05-17
CN103688306A (zh) 2014-03-26

Similar Documents

Publication Publication Date Title
WO2012158159A1 (en) Packet loss concealment for audio codec
JP5186054B2 (ja) マルチステージコードブックおよび冗長コーディング技術フィールドを有するサブバンド音声コーデック
JP4658596B2 (ja) 線形予測に基づく音声コーデックにおける効率的なフレーム消失の隠蔽のための方法、及び装置
US11721349B2 (en) Methods, encoder and decoder for linear predictive encoding and decoding of sound signals upon transition between frames having different sampling rates
KR101290425B1 (ko) 소거된 스피치 프레임을 복원하는 시스템 및 방법
KR101513184B1 (ko) 계층적 디코딩 구조에서의 디지털 오디오 신호의 송신 에러에 대한 은닉
WO2006130226A2 (en) Audio codec post-filter
JP2009175693A (ja) 減衰率を取得する方法および装置
EP3080804A1 (en) Bandwidth extension mode selection
JP5289319B2 (ja) 隠蔽フレーム(パケット)を生成するための方法、プログラムおよび装置
AU2015241092B2 (en) Apparatus and methods of switching coding technologies at a device
JP2013076871A (ja) 音声符号化装置及びプログラム、音声復号装置及びプログラム、並びに、音声符号化システム
JP5604572B2 (ja) 複雑さ分散によるデジタル信号の転送誤り偽装
Chibani Increasing the robustness of CELP speech codecs against packet losses.

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11721213

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11721213

Country of ref document: EP

Kind code of ref document: A1