US20190371345A1 - Smart coding mode switching in audio rate adaptation - Google Patents
- Publication number
- US20190371345A1 (application US 15/997,531)
- Authority
- US
- United States
- Prior art keywords
- coding mode
- frame
- data
- audio signal
- primary
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/005—Correction of errors induced by the transmission channel, if related to the coding algorithm
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/12—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/167—Audio streaming, i.e. formatting and decoding of an encoded audio signal representation into a data stream for transmission or storage purposes
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/173—Transcoding, i.e. converting between two coded representations avoiding cascaded coding-decoding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/18—Vocoders using multiple modes
- G10L19/20—Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/18—Vocoders using multiple modes
- G10L19/22—Mode decision, i.e. based on audio signal content versus external parameters
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W76/00—Connection management
- H04W76/20—Manipulation of established connections
- H04W76/28—Discontinuous transmission [DTX]; Discontinuous reception [DRX]
Definitions
- the present disclosure is generally related to adjusting a coding mode in voice communication.
- wireless telephones, such as mobile and smart phones, tablets, and laptop computers, are small, lightweight, and easily carried by users.
- These devices can communicate voice and data packets over wireless networks.
- many such devices incorporate additional functionality such as a digital still camera, a digital video camera, a digital recorder, and an audio file player.
- such devices can process executable instructions, including software applications, such as a web browser application, that can be used to access the Internet. As such, these devices can include significant computing capabilities.
- Electronic devices, such as wireless telephones, may send and receive data via networks.
- audio data may be sent and received via a circuit-switched network (e.g., the public switched telephone network (PSTN), a global system for mobile communications (GSM) network, etc.) or a packet-switched network (e.g., a voice over internet protocol (VoIP) network, a voice over long-term evolution (VoLTE) network, etc.).
- 3rd Generation Partnership Project (3GPP) speech codecs have evolved from Adaptive Multi-Rate (AMR) to Adaptive Multi-Rate Wideband (AMR-WB) and then to Enhanced Voice Services (EVS).
- audio packets corresponding to a sequence of audio frames may be individually routed from a source device to a destination device. Due to network conditions, the audio packets may arrive out of order.
- the destination device may store received packets in a de-jitter buffer and may rearrange the received packets if the received packets are out-of-order.
- the destination device may generate a processed audio signal based on the received packets.
- a particular packet sent by the source device may not be received, or may be received with errors, by a destination device.
- the destination device may be unable to recover all or a portion of the data associated with the particular packet.
- increasing the depth of the de-jitter buffer may improve the quality of the processed audio signal because more packets have time to reach the destination device.
- however, increasing the buffer depth also increases end-to-end delay, adversely impacting the user experience. Reducing the buffer depth reduces the end-to-end delay but increases the chance that the processed audio signal is generated without data for some frames of the sequence because the corresponding packets have not reached the destination device in time.
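The buffer-depth trade-off described above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation; the class name, the fixed target depth, and the min-heap reordering policy are assumptions made for the example.

```python
import heapq

class DeJitterBuffer:
    """Minimal de-jitter buffer sketch: received packets are held until a
    target depth is reached, then released in sequence-number order.

    A larger depth gives late or out-of-order packets more time to arrive
    (better quality) at the cost of added end-to-end delay.
    """

    def __init__(self, depth):
        self.depth = depth   # target number of buffered frames
        self.heap = []       # min-heap keyed by sequence number

    def push(self, seq, frame):
        heapq.heappush(self.heap, (seq, frame))

    def pop(self):
        # Release the oldest frame only once the target depth is reached,
        # so out-of-order packets can still be re-sorted into place.
        if len(self.heap) >= self.depth:
            return heapq.heappop(self.heap)
        return None          # underflow: the decoder must conceal the gap
```

With depth 2, pushing sequence numbers 3, 1, 2 releases frames in order 1, 2; a pop before enough packets have arrived returns None, modeling a frame the decoder must conceal.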
- in a particular aspect, a device includes a de-jitter buffer, an analyzer, a transmitter, and a speech decoder.
- the de-jitter buffer is configured to receive first data encoded by a transmitting device based on a first coding mode.
- the first coding mode may indicate both primary frame encoding and redundant frame encoding.
- the de-jitter buffer is also configured to receive second data encoded by the transmitting device based on a second coding mode.
- the second coding mode may indicate only primary frame encoding without redundant frame encoding.
- the analyzer is configured to determine whether a condition for coding mode change is satisfied.
- the transmitter is configured to transmit, in response to the analyzer determining that the condition is satisfied, a request to the transmitting device.
- the request may indicate coding mode change to the second coding mode corresponding to primary frame encoding only without redundant frame encoding.
- the speech decoder is configured to decode the first data, which includes a partial copy of a first frame of an audio signal and a primary copy of a second frame of the audio signal.
- the speech decoder is further configured to decode the second data which includes a primary copy of a third frame of the audio signal without a partial copy of another frame.
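As a concrete illustration of the analyzer's role, the sketch below requests the primary-only coding mode when a measured frame loss rate falls below a threshold and the redundant mode otherwise. The mode identifiers, window size, and threshold are hypothetical; the disclosure leaves the exact condition for a coding mode change open.

```python
# Hypothetical mode identifiers; the actual request format is an assumption.
PRIMARY_AND_REDUNDANT = 1   # first coding mode: primary + partial copies
PRIMARY_ONLY = 2            # second coding mode: primary frames only

def choose_coding_mode(frames_lost, window=100, threshold=0.02):
    """Pick the coding mode to request from the transmitting device,
    based on how many of the last `window` frames were lost."""
    loss_rate = frames_lost / window
    if loss_rate < threshold:
        # Clean channel: partial copies waste bits better spent on the
        # primary frame, so ask for primary-only encoding.
        return PRIMARY_ONLY
    # Degraded channel: keep in-band redundancy for error resilience.
    return PRIMARY_AND_REDUNDANT
```

A real analyzer might combine several quality metrics (loss rate, jitter, buffer underflows) rather than a single threshold; the single-metric form keeps the decision logic visible.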
- in another aspect, a method includes receiving first data at a receiving device.
- the first data may be encoded by a transmitting device based on a first coding mode indicating both primary frame encoding and redundant frame encoding.
- the method also includes decoding the first data, which includes a partial copy of a first frame of an audio signal and a primary copy of a second frame of the audio signal.
- the method further includes determining whether a condition for coding mode change is satisfied and transmitting a request to the transmitting device in response to determining that the condition is satisfied.
- the request may indicate coding mode change to a second coding mode corresponding to primary frame encoding without redundant frame encoding.
- the method also includes receiving second data at the receiving device.
- the second data may be encoded by the transmitting device based on the second coding mode.
- the method also includes decoding the second data.
- the second data may include a primary copy of a third frame of the audio signal.
- a non-transitory computer-readable medium includes instructions that, when executed by a processor within a receiving device, cause the processor to perform operations including receiving first data.
- the first data may be encoded by a transmitting device based on a first coding mode indicating both primary frame encoding and redundant frame encoding.
- the operations also include decoding the first data, which includes a partial copy of a first frame of an audio signal and a primary copy of a second frame of the audio signal.
- the operations further include determining whether a condition for coding mode change is satisfied and transmitting a request to the transmitting device in response to determining that the condition is satisfied.
- the request may indicate coding mode change to a second coding mode corresponding to primary frame encoding without redundant frame encoding.
- the operations also include receiving second data at the receiving device.
- the second data may be encoded by the transmitting device based on the second coding mode.
- the operations also include decoding the second data.
- the second data may include a primary copy of a third frame of the audio signal.
- in another aspect, a device includes means for receiving first data.
- the first data may be encoded by a transmitting device based on a first coding mode indicating both primary frame encoding and redundant frame encoding.
- the device also includes means for decoding the first data, which includes a partial copy of a first frame of an audio signal and a primary copy of a second frame of the audio signal.
- the device further includes means for determining whether a condition for coding mode change is satisfied and means for transmitting a request to the transmitting device in response to determining that the condition is satisfied.
- the request may indicate coding mode change to a second coding mode corresponding to primary frame encoding without redundant frame encoding.
- the device also includes means for receiving second data.
- the second data may be encoded by the transmitting device based on the second coding mode.
- the device also includes means for decoding the second data.
- the second data may include a primary copy of a third frame of the audio signal.
- FIG. 1 is a particular illustrative example of a system operable to change coding mode based on a request
- FIG. 2 is a diagram illustrating a particular example for operation of channel aware mode
- FIG. 3 is a particular exemplary diagram for high-level description of channel aware encoding
- FIG. 4 is a particular exemplary diagram for high-level description of channel aware decoding
- FIG. 5 is a flow chart illustrating a particular method of changing coding mode
- FIG. 6 is another flow chart illustrating a particular method of changing coding mode for operating EVS codec
- FIG. 7 is a block diagram conceptually illustrating an example of a telecommunications system based on 3GPP
- FIG. 8 is a block diagram conceptually illustrating an example of a telecommunications system based on 3GPP2;
- FIG. 9 is a diagram conceptually illustrating an example of an access network in a UTRAN or RAN architecture
- FIG. 10 is a block diagram of a base station (BS) in communication with a UE.
- FIG. 11 is a block diagram of a particular illustrative example of a device that is operable to switch coding mode.
- the principles described herein may be applied, for example, to a headset, a handset, or other audio device that is configured to perform redundancy based packet transmission error recovery.
- the term “signal” is used herein to indicate any of its ordinary meanings, including a state of a memory location (or set of memory locations) as expressed on a wire, bus, or other transmission medium.
- the term “generating” is used herein to indicate any of its ordinary meanings, such as computing or otherwise producing.
- the term “calculating” is used herein to indicate any of its ordinary meanings, such as computing, evaluating, smoothing, and/or selecting from a plurality of values.
- the term “obtaining” is used to indicate any of its ordinary meanings, such as calculating, deriving, receiving (e.g., from another component, block or device), and/or retrieving (e.g., from a memory register or an array of storage elements).
- the term “producing” is used to indicate any of its ordinary meanings, such as calculating, generating, and/or providing.
- the term “providing” is used to indicate any of its ordinary meanings, such as calculating, generating, and/or producing.
- the term “coupled” is used to indicate a direct or indirect electrical or physical connection. If the connection is indirect, it is well understood by a person having ordinary skill in the art, that there may be other blocks or components between the structures being “coupled.”
- the term “configuration” may be used in reference to a method, apparatus/device, and/or system as indicated by its particular context. Where the term “comprising” is used in the present description and claims, it does not exclude other elements or operations.
- the term “based on” (as in “A is based on B”) is used to indicate any of its ordinary meanings, including the cases (i) “based on at least” (e.g., “A is based on at least B”) and, if appropriate in the particular context, (ii) “equal to” (e.g., “A is equal to B”). In case (i), where “A is based on B” includes “based on at least B,” this may include a configuration in which A is coupled to B.
- the term “in response to” is used to indicate any of its ordinary meanings, including “in response to at least.”
- the term “at least one” is used to indicate any of its ordinary meanings, including “one or more.”
- the term “at least two” is used to indicate any of its ordinary meanings, including “two or more.”
- any disclosure of an operation of an apparatus having a particular feature is also expressly intended to disclose a method having an analogous feature (and vice versa), and any disclosure of an operation of an apparatus according to a particular configuration is also expressly intended to disclose a method according to an analogous configuration (and vice versa).
- the terms “method,” “process,” “procedure,” and “technique” are used generically and interchangeably unless otherwise indicated by the particular context.
- the terms “element,” “block,” and “module” may be used to indicate a portion of a greater configuration and may be implemented in hardware (e.g., circuitry), software or a combination of both.
- the term “communication device” refers to an electronic device that may be used for voice and/or data communication over a wireless communication network.
- Examples of communication devices include cellular phones, personal digital assistants (PDAs), handheld devices, headsets, wireless modems, laptop computers, personal computers, etc.
- the devices described herein may be compatible with one or more mobile telecommunication technologies.
- the devices described herein may be compatible with third generation (3G) mobile telecommunication technologies, fourth generation (4G) mobile telecommunication technologies, and/or fifth generation (5G) mobile telecommunication technologies.
- the devices described herein may be compatible with different standards (e.g., a Long-Term Evolution (LTE) wireless communication standard, a LTE-A wireless communication standard, a Worldwide Interoperability for Microwave Access (WiMAX) wireless communication standard, etc.).
- LTE Long-Term Evolution
- WiMAX Worldwide Interoperability for Microwave Access
- the term “codec” refers to the combination of an audio encoder and a corresponding audio decoder.
- the term “coding” indicates transfer of an audio signal via a codec, including encoding and subsequent decoding.
- the term “transmitting” indicates propagating (e.g., a signal) into a transmission channel.
- Audio telecommunications applications may be implemented in a packet-switched network.
- audio telecommunications applications may be implemented in a Voice over Internet Protocol (VoIP) network.
- a packet may include one or more frames of the encoded audio signal, and packets with audio (e.g., voice) information may be transmitted from a first device to a second device on the network.
- some of the packets may be lost during the transmission of the packets.
- the loss of multiple packets (sometimes referred to as burst packet loss) may be a reason for the degradation of perceived speech quality at a receiving device.
- a PLC method may be used to mask the effects of packet loss in VoIP communications.
- a PLC method may be implemented to create a substitute packet instead of the one that was lost during transmission.
- Such a PLC method may attempt to create a packet as similar as possible to the one that was lost.
- Receiver-based PLC methods may not need any additional resources or help from the sender in order to create the substitute packet.
- a PLC method may be ineffective at masking effects of the packet loss.
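A minimal receiver-based PLC strategy is to repeat the last correctly received frame, attenuated so that longer losses fade out rather than buzz. This is a deliberately simple sketch, not the concealment method of any particular codec; the frame length and attenuation factor are assumptions.

```python
def decode_stream(frames, frame_len=4, attenuation=0.5):
    """frames: per-frame sample lists, with None marking a lost packet.
    Each lost frame is concealed with an attenuated copy of the
    previous output frame (repeat-and-attenuate concealment)."""
    out = []
    last = [0.0] * frame_len   # silence if the very first frame is lost
    for f in frames:
        if f is None:          # packet lost: substitute a synthetic frame
            f = [attenuation * s for s in last]
        out.append(f)
        last = f
    return out
```

Two consecutive losses after a frame of ones yield concealed frames scaled by 0.5 and then 0.25, so the substitute signal fades toward silence; this also shows why concealment becomes ineffective for long burst losses, motivating the sender-based approaches discussed next.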
- Another solution is a sender-based packet loss resilient approach, such as forward error correction (FEC), in which the sender transmits additional data along with the primary data.
- the additional data may be used to restore errors caused by the loss of data during the transmission.
- FEC schemes may transmit redundant audio frames. In other words, more than one copy (typically two) of an audio frame is transmitted by the sender. These two frames may be referred to as a primary copy and a redundant copy.
- sender-based packet loss resilient schemes may improve the perceptual quality of the decoded speech, these schemes may also increase the bandwidth used during transmission of the speech.
- Traditional FEC schemes may also increase the end-to-end delay, which may be intolerable for real-time conversations.
- conventional sender-based schemes send the same speech frame twice at two different time periods. This approach may at least double the data rate.
- Some conventional schemes may use a low-bit rate codec for the redundant copy in order to reduce the data rate. However, the use of a low-bit rate codec may increase the complexity at the encoder.
- some conventional schemes may use the same low-bit rate codec for both the primary copy of the frame and the redundant copy of the frame, which may degrade the baseline speech quality (i.e., the speech quality when no frames are lost).
- conventional sender-based schemes typically operate under the assumption of an additional delay of at least one frame interval.
- Systems, methods, and apparatus as described herein may be implemented to provide a source- and channel-controlled FEC scheme in order to obtain an optimal trade-off between speech quality, delay, and data rate.
- the FEC scheme may be configured such that no additional delay is introduced, and a significant improvement in speech quality may be achieved with only a moderate increase in data rate.
- An FEC scheme as described herein may also operate at any target data rate. In one example, the FEC scheme and a target data rate may be adaptively adjusted based on the condition of a transmission channel as well as external controls.
- in some codecs, the total number of bits used to encode each frame is a predetermined constant. Examples of such codecs include the Adaptive Multi-Rate (AMR) speech codec, the AMR Wideband (AMR-WB) speech codec, and the Enhanced Voice Services (EVS) speech codec, in which the number of bits is determined by the coding mode selected for the frame.
- transmitting a redundant copy of a past frame may require a corresponding reduction in the number of bits available for coding the signal information in the current frame. This reduction may have a negative impact on the perceptual quality of the decoded speech.
- 3GPP EVS supports four bandwidth modes over an audio frequency range of up to 20 kHz.
- the four supported bandwidths include narrowband (NB), wideband (WB), super wideband (SWB) and full band (FB).
- NB supports voice
- WB supports high definition (HD) voice
- SWB supports voice (including HD voice) and music
- FB supports voice (including HD voice) and high definition (HD) music.
- EVS supports a wide range of audio frequencies with the following attributes: a) the low-range frequencies may improve naturalness and listening comfort; b) the mid-range frequencies may improve voice clarity and intelligibility; and c) the high-range frequencies may improve sense of presence and contribute to better music quality.
- Table 1 illustrates examples of EVS bitrates and supported bandwidths.
- the EVS bitrates are the source bitrates; that is, the bitrates after source compression (source coding).
- the EVS bitrates are in units of kilobits per second (kbps).
- Each EVS bitrate in Table 1 is mapped to corresponding supported bandwidths, where NB is narrowband, WB is wideband, SWB is super wideband and FB is full band.
- Each bitrate is unique in its mapping to the supported bandwidth except for bitrate 13.2 kbps which has a channel aware option that does not include NB as its supported bandwidth.
- all the bitrates illustrated in Table 1 support discontinuous transmission (DTX).
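The bitrate-to-bandwidth mapping of Table 1 can be represented as a simple lookup. The table itself is not reproduced above, so the entries below follow generally published EVS figures and should be treated as illustrative; only the fact that the 13.2 kbps channel aware option excludes NB is stated explicitly in the text.

```python
# Partial, illustrative reconstruction of Table 1. Keys are
# (bitrate in kbps, channel_aware); values are supported bandwidths.
SUPPORTED_BANDWIDTHS = {
    (13.2, False): {"NB", "WB", "SWB"},
    (13.2, True):  {"WB", "SWB"},   # channel aware option: no NB
    (16.4, False): {"NB", "WB", "SWB", "FB"},
    (24.4, False): {"NB", "WB", "SWB", "FB"},
}

def supports(bitrate_kbps, bandwidth, channel_aware=False):
    """True if the given EVS bitrate/mode combination supports the bandwidth."""
    return bandwidth in SUPPORTED_BANDWIDTHS.get((bitrate_kbps, channel_aware), set())
```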
- the EVS channel aware mode (CAM) significantly improves performance under degraded channel conditions while maintaining clean channel quality.
- EVS CAM allows in-band redundancy transmission as part of the codec payload in a constant bitrate stream, and is implemented for wideband (WB) and super-wideband (SWB) at 13.2 kbps.
- the transmission of redundancy (a partial copy or a partial data of another frame) may be either channel-controlled or source-controlled.
- in channel-controlled operation, the encoder's behavior may be based on the condition of the network, such as network congestion or frame error rate.
- an encoder can use properties of an input source signal to determine the frames that are critical for high quality reconstruction and selectively transmit redundancy for those frames only.
- an encoder can also identify the frames that can be best coded at a reduced bitrate in order to accommodate the attachment of redundancy while keeping the bitstream at a constant 13.2 kbps rate.
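The constant-bitrate packing described above amounts to splitting a fixed per-frame bit budget between the primary frame and an optional partial copy of a past frame. At 13.2 kbps with 20 ms frames the payload is 264 bits per frame; the split function below is an assumed illustration, not the codec's actual bit allocation.

```python
TOTAL_BITS = 264   # 13.2 kbps * 20 ms per frame

def bit_budget(partial_copy_bits=0):
    """Split the fixed payload between the primary frame and an optional
    partial copy of a past critical frame.

    With no partial copy, the primary frame uses the full budget; in
    channel aware mode the primary frame is coded at a reduced bitrate so
    the redundancy rides along in the same constant-rate packet."""
    assert 0 <= partial_copy_bits < TOTAL_BITS
    return TOTAL_BITS - partial_copy_bits, partial_copy_bits
```

Unlike the conventional double-send schemes criticized earlier, the packet size (and hence the stream bitrate) never changes; only the internal split does.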
- referring to FIG. 1, the system 100 includes a first device 102 in communication with one or more other devices (e.g., a second device 122 ) via a network 150 .
- the first device 102 may send data to the second device 122 via the network 150 using a first path 152
- the second device 122 may send data to the first device 102 via the network 150 using a second path 154 .
- the first device 102 may communicate with the network 150 via a first reverse channel 152 a (e.g., a first reverse link) and a first forward channel 154 b (e.g., a first forward link). For example, the first device 102 may transmit data to the network 150 using the first reverse channel 152 a , and the first device 102 may receive data from the network 150 using the first forward channel 154 b .
- the second device 122 may communicate with the network 150 via a second reverse channel 154 a (e.g., a second reverse link) and a second forward channel 152 b (e.g., a second forward link). For example, the second device 122 may transmit data to the network 150 using the second reverse channel 154 a , and the second device 122 may receive data from the network 150 using the second forward channel 152 b.
- the network 150 may include one or more base stations or access points to communicate data between the first device 102 and the second device 122 .
- data (e.g., packets, frames, offset values, acknowledgements, etc.) communicated via the first path 152 corresponds to data transmitted from the first device 102 to the network 150 via the first reverse channel 152 a and received at the second device 122 from the network 150 via the second forward channel 152 b .
- data communicated via the second path 154 corresponds to data transmitted from the second device 122 to the network 150 via the second reverse channel 154 a and received at the first device 102 from the network 150 via the first forward channel 154 b.
- the devices 102 , 122 may include fewer or more components than illustrated in FIG. 1 .
- the devices 102 , 122 may include one or more processors, one or more memory units, or both.
- the first device 102 and/or the second device 122 may be a smart phone, a cellular phone, a mobile communication device, a laptop computer, a computer, a tablet, a PDA, a set top box, a video player, an entertainment unit, a display device, a television, a gaming console, a music player, a radio, a digital video player, a digital video disc (DVD) player, a tuner, a camera, a navigation device, or a combination thereof.
- Such devices may include a user interface (e.g., a touch screen, voice recognition capability, or other user interface capabilities).
- the first device 102 includes a first speech vocoder 104 , a receiver 106 , and a transmitter 108 .
- the first speech vocoder 104 includes an encoder 110 , a de-jitter buffer 112 , a decoder 114 , and an analyzer 116 .
- the second device 122 includes a second speech vocoder 124 , a receiver 126 , and a transmitter 128 .
- the second speech vocoder 124 includes an encoder 130 , a de-jitter buffer 132 , a decoder 134 , and an analyzer 136 .
- the first device 102 will be described as a “receiving device” and the second device 122 will be described as a “transmitting device.”
- the second device 122 may transmit packets that are received by the first device 102 .
- each device 102 , 122 may concurrently operate as a receiving device and a transmitting device.
- the first device 102 may transmit packets to the second device 122 via the first path 152 (e.g., transmit packets to the network 150 via the first reverse channel 152 a ) and concurrently receive packets from the second device 122 via the second path 154 (e.g., receive packets from the network 150 via the first forward channel 154 b ).
- the second device 122 may transmit packets to the first device 102 via the second path 154 (e.g., transmit packets to the network 150 via the second reverse channel 154 a ) and concurrently receive packets from the first device 102 via the first path 152 (e.g., receive packets from the network 150 via the second forward channel 152 b ).
- the receiver 106 may be configured to receive data (e.g., one or more packets) from the second device 122 via the second path 154 .
- the transmitter 128 may be configured to transmit a first packet 160 , a second packet 162 , and an Nth packet 164 to the network 150 via the second reverse channel 154 a .
- the receiver 106 may be configured to receive the first packet 160 , the second packet 162 , and the Nth packet 164 from the network 150 via the first forward channel 154 b .
- N may be any integer value that is greater than zero.
- the receiver 106 may provide the received packets 160 - 164 to the de-jitter buffer 112 .
- the de-jitter buffer 112 may be configured to store (e.g., buffer) the packets 160 - 164 received by the receiver 106 .
- the de-jitter buffer 112 may have a storage capacity (e.g., a “depth”) to store a number “D” of packets.
- the packets 160 - 164 may arrive out of order due to the dynamic nature of the wireless network 150 .
- the second packet 162 may arrive at the receiver 106 prior to the first packet 160 arriving at the receiver 106 due to a network delay, a data handling capacity, wireless signal conditions, a network load, etc.
- the de-jitter buffer 112 may be configured to “absorb” or reduce the jitter (e.g., the delay) in the packet arrival time by buffering the packets 160 - 164 received by the receiver 106 and providing a packet to the decoder 114 at regular intervals (e.g., approximately once every 20 milliseconds (ms)).
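- the buffering and regular-interval release described above can be sketched as follows (an illustrative Python model only, not the disclosed implementation; the class and method names are invented):

```python
import heapq

class DeJitterBuffer:
    """Minimal sketch of a de-jitter buffer: packets arrive out of
    order and are released to the decoder in frame-number order at
    each (e.g., 20 ms) playout tick. Returns None when the next
    expected frame has not arrived, i.e., a gap the decoder must
    conceal (or cover with a partial copy)."""

    def __init__(self):
        self._heap = []            # min-heap ordered by frame number
        self._next_frame = 0       # next frame number owed to the decoder

    def push(self, frame_number, payload):
        heapq.heappush(self._heap, (frame_number, payload))

    def pop_for_playout(self):
        # Discard stale duplicates older than the playout point.
        while self._heap and self._heap[0][0] < self._next_frame:
            heapq.heappop(self._heap)
        if self._heap and self._heap[0][0] == self._next_frame:
            self._next_frame += 1
            return heapq.heappop(self._heap)[1]
        self._next_frame += 1      # frame missing: advance and report a gap
        return None

buf = DeJitterBuffer()
buf.push(1, "pkt1")                # frame 1 arrives before frame 0
buf.push(0, "pkt0")
out = [buf.pop_for_playout(), buf.pop_for_playout(), buf.pop_for_playout()]
# out == ["pkt0", "pkt1", None]   (frame 2 never arrived)
```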
- the analyzer 116 or, alternatively, the decoder 114 may reorder one or more of the packets 160 - 164 in the de-jitter buffer 112 if the packets 160 - 164 are out-of-order. Additionally, one or more packets 160 - 164 sent by the second device 122 may not be received, or may be received with errors, by the first device 102 . For example, a packet (e.g., the first packet 160 ) may not be received due to packet loss or may be partially received, due to network conditions, by the receiver 106 . The analyzer 116 or the decoder 114 may determine whether a particular packet is missing from the de-jitter buffer 112 .
- subsequent packets may include error correction data associated with the lost packet.
- the error correction data may include a partial copy of the lost packet.
- the analyzer 116 or the decoder 114 may retrieve a subsequent packet (having error correction data associated with the lost packet) from the de-jitter buffer 112 in response to determining that the lost packet is missing from the de-jitter buffer 112 .
- the analyzer 116 or the decoder 114 may determine whether a subsequent packet (that includes error correction data or partial copy associated with the first packet 160 ) is stored in the de-jitter buffer 112 .
- the actual transmission of the partial (redundant) copy of the first packet 160 may occur at frame N+offset (e.g., the offset value 166 ), where the value of offset k may indicate the distance between the primary copy of a frame and the partial copy of the frame.
- This offset value k may also be referred to as FEC offset value.
- typical values of the offset k may include one, two, three, five, and integers greater than five.
- the offset value k (e.g., the offset value 166 ) may be fixed (e.g., during system implementation or during call set-up) or may be selected and/or updated during a call according to channel conditions based on feedback (e.g., coding mode change request 186 ) from the receiving device (e.g., the first device 102 ). For example, it may be desirable to use a higher value of offset k in an environment that is causing a frequent loss of consecutive frames (e.g., due to long fades).
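- as a hedged illustration of why long fades favor a larger offset k, the following sketch chooses the smallest typical offset that exceeds the longest observed run of consecutive losses (the heuristic, names, and candidate set handling are invented for illustration):

```python
def select_fec_offset(loss_trace, candidates=(1, 2, 3, 5, 7)):
    """Hypothetical heuristic for choosing the FEC offset k: the
    partial copy of frame N rides in frame N+k, so k should exceed
    the longest burst of consecutive losses, or the partial copy is
    lost in the same fade as its primary copy.
    loss_trace is a list of booleans, True meaning a lost packet."""
    longest_burst = run = 0
    for lost in loss_trace:
        run = run + 1 if lost else 0
        longest_burst = max(longest_burst, run)
    for k in candidates:
        if k > longest_burst:
            return k
    return candidates[-1]          # fall back to the largest candidate

# A fade losing three consecutive packets calls for an offset of 5
# among the typical candidate values {1, 2, 3, 5, ...}.
trace = [False, True, True, True, False, False, True, False]
print(select_fec_offset(trace))  # → 5
```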
- the analyzer 116 may determine an optimum offset value (X opt ) and the transmitter 108 may transmit the offset value 166 to the receiver 126 of the second device 122 .
- the optimum FEC offset value (X opt ) may correspond to the offset value 166 that yields the greatest probability of successfully decoding a lost packet.
- the receiver 126 of the second device 122 may receive the offset value 166 and provide the offset value 166 to the encoder 130 .
- the second device 122 may send an acknowledgment 168 to the first device 102 via the second path 154 .
- the encoder 130 (or another processor) may generate the acknowledgement 168
- the transmitter 128 may transmit the acknowledgement 168 to the network 150 via the second reverse channel 154 a .
- one or more packets that include partial redundancy data transmitted from the second device 122 to the first device 102 may include the acknowledgment 168 in addition to, or instead of, sending a dedicated message to signal acknowledgement.
- the second device 122 may also acknowledge receipt of the offset value 166 by including the offset value 166 in the frame that includes the partial redundancy.
- the analyzer 116 may determine whether a condition to trigger a coding mode change is satisfied. In one implementation, the analyzer 116 may determine whether the condition for a coding mode change for the encoders 130 , 110 is satisfied based on how many partial copies among the received packets 160 , 162 , 164 from the second device 122 are decoded by the decoder 114 . In another implementation, the analyzer 116 may determine whether the condition to trigger a coding mode change for the encoders 130 , 110 is satisfied based on statistical information indicating how many partial copies are decoded by the decoder 114 in lieu of their corresponding primary copies.
- the analyzer 116 may monitor the latest M received packets encoded by the encoder 130 according to a channel aware mode, in which case all of the transmitted packets 160 , 162 , 164 from the second device 122 may include primary copies, and all or some of the transmitted packets 160 , 162 , 164 may include partial copies.
- the analyzer 116 may determine that the effectiveness of partial frame encoding is low and may decide to send the second device 122 a request for a coding mode change from, for example, a channel aware mode to a non-channel aware mode.
- for example, in an extremely harsh channel condition scenario, it is likely that many of the partial copies of the transmitted packets 160 , 162 , 164 are also lost and thus are not available in the de-jitter buffer 112 . In that case as well, the analyzer 116 may determine that the effectiveness of partial frame encoding is low and may decide to send the second device 122 a request for a coding mode change from, for example, a channel aware mode to a non-channel aware mode.
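- the effectiveness condition described above can be sketched as a simple counter over the latest frames (an illustrative model; the function name, frame representation, and thresholds are invented and not part of the disclosure):

```python
def should_request_mode_change(recent_frames, min_loss=0.1, min_rescue=0.5):
    """Hypothetical trigger sketch: over the latest M frames, count how
    often a lost primary copy was actually rescued by its partial copy.
    If losses occur but partial copies rarely help (e.g., they are lost
    in the same fades), the channel aware mode is ineffective and a
    switch to a non-channel aware mode may be requested.
    Each frame is a tuple (primary_available, partial_used)."""
    lost = [f for f in recent_frames if not f[0]]
    if len(lost) < min_loss * len(recent_frames):
        return False                         # channel is clean; no change needed
    rescued = sum(1 for f in lost if f[1])
    return rescued / len(lost) < min_rescue  # partial copies rarely help

# 4 of 10 primaries lost, only 1 rescued by a partial copy → request change.
frames = [(True, False)] * 6 + [(False, True)] + [(False, False)] * 3
print(should_request_mode_change(frames))  # → True
```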
- the analyzer 116 may determine whether the condition to trigger a coding mode change for the encoders 130 , 110 is satisfied based on a real-time quality metric of the decoded speech signal.
- the real-time quality metric may be measured objectively or subjectively based on the quality of the decoded audio samples.
- the analyzer 116 may calculate quality measurement of decoded speech sample based on Perceptual Evaluation of Audio Quality (PEAQ) or Perceptual Evaluation of Speech Quality (PESQ).
- the analyzer 116 may compare the quality measurement with a threshold to determine whether the redundant frame coding in a channel aware mode contributes any quality improvement.
- the analyzer 116 may decide to send the second device 122 a request for coding mode change from, for example, a channel aware mode to a non-channel aware mode.
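- the quality-metric comparison above can be sketched as follows (a hedged illustration only: the scores stand in for a PESQ/PEAQ-style per-frame estimate, and the threshold and window values are invented):

```python
def mode_change_from_quality(scores, threshold=3.0, window=50):
    """Sketch of the quality-based trigger: average a per-frame
    quality score (e.g., a MOS-like estimate, represented here as
    plain numbers) over a sliding window and request a coding mode
    change when decoded quality stays below a configurable threshold,
    suggesting the redundant (partial) copies are not improving it."""
    recent = scores[-window:]
    return sum(recent) / len(recent) < threshold

print(mode_change_from_quality([2.1, 2.4, 2.0, 2.6]))   # → True  (poor quality)
print(mode_change_from_quality([3.8, 4.1, 3.9, 4.2]))   # → False (good quality)
```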
- the analyzer 116 may determine whether the condition to trigger a coding mode change for the encoders 130 , 110 is satisfied based on information or an indication received from another layer (e.g., an upper layer) of the communication protocol. For example, this indication may come from the audio layer through an error concealment mechanism.
- the transmitter 108 may transmit a request for coding mode change to the second device 122 through the first path 152 .
- the analyzer 136 , or alternatively the encoder 130 , may configure the encoder 130 from a first coding mode to a second coding mode.
- the first coding mode may correspond to a channel aware mode and the second coding mode may correspond to a non-channel aware mode.
- the first coding mode may correspond to the EVS 13.2 kbps channel aware mode and the second coding mode may correspond to the EVS 13.2 kbps non-channel aware mode.
- the first coding mode may correspond to the EVS 13.2 kbps channel aware mode and the second coding mode may correspond to an EVS non-channel aware mode with a higher bitrate than 13.2 kbps.
- the second coding mode may correspond to EVS 24.4 kbps coding mode.
- the first coding mode may correspond to a non-channel aware mode and the second coding mode may correspond to a channel aware mode.
- the diagram 200 includes de-jitter buffer 210 and speech decoder 220 .
- the de-jitter buffer 210 may correspond to the de-jitter buffer 112 or 132 of FIG. 1
- the speech decoder 220 may correspond to the decoder 114 or 134 of FIG. 1 .
- the de-jitter buffer 210 includes primary copies and partial copies.
- the offset value k is chosen to be 2 for illustration only.
- one or more packets 160 , 162 , 164 may include both a primary copy and a partial copy.
- the primary copy for frame N−1 211 and the partial copy for frame N−3 201 may be included in the first packet 160
- the primary copy for frame N 213 and the partial copy for frame N−2 203 may be included in the second packet 162 .
- the primary copies 211 , 213 , 215 , 217 in the de-jitter buffer may have been encoded by the encoder 130 in the second speech vocoder 124 based on a primary frame coding scheme. In one particular example, the primary copies 211 , 213 , 215 , 217 may have been encoded based on the primary frame coding scheme according to the EVS 13.2 kbps channel aware mode.
- the partial copies 201 , 203 , 205 , 207 in the de-jitter buffer may have been encoded by the encoder 130 in the second speech vocoder 124 based on a redundant frame coding scheme. In one particular example, the partial copies 201 , 203 , 205 , 207 may have been encoded based on the redundant frame coding scheme according to the EVS 13.2 kbps channel aware mode.
- the second device 122 may send the packets 160 , 162 , 164 to the first device 102 over a constant-bit-rate (e.g., 13.2 kbps) channel.
- a primary frame bit-rate corresponding to primary copy may be reduced (e.g., to 9.6 kbps) to accommodate the redundant copy.
- a remaining bit-rate (e.g., 3.6 kbps) of the constant-bit-rate may correspond to the redundant copy.
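- the bit-budget split above follows directly from the 20 ms frame duration; a minimal sketch of the arithmetic (function names are illustrative):

```python
FRAME_MS = 20  # EVS frame duration in milliseconds

def bits_per_frame(kbps, frame_ms=FRAME_MS):
    # kbps * 1000 bits/s * (frame_ms / 1000) s = kbps * frame_ms bits
    return round(kbps * frame_ms)

total   = bits_per_frame(13.2)   # constant-bit-rate channel budget
primary = bits_per_frame(9.6)    # reduced primary frame bit-rate
partial = total - primary        # bits left for the redundant copy

print(total, primary, partial)   # → 264 192 72  (72 bits/frame = 3.6 kbps)
```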
- the speech decoder 220 may decide which copy among the primary copy and the partial copy should be decoded based on the availability of each packet in the de-jitter buffer 210 at the time of decoding each frame.
- the analyzer 116 (not shown in FIG. 2 ) may determine, at the time of frame N−1 decoding, that the primary copy for the frame N−1 211 is available in the de-jitter buffer 210 , and the speech decoder 220 may subsequently decode the primary copy for the frame N−1 211 .
- the analyzer 116 may determine that the primary copy for the frame N 213 is not available in the de-jitter buffer 210 .
- the analyzer 116 may search whether the partial copy for the frame N 207 is available in the de-jitter buffer 210 . In case the partial copy for the frame N 207 is indeed available in the de-jitter buffer 210 , the analyzer 116 may instruct the speech decoder 220 to decode the partial copy for the frame N 207 in lieu of the primary copy for the frame N 213 . In a similar manner, the speech decoder 220 may proceed with decoding the primary copy for the frame N+1 215 and the primary copy for the frame N+2 217 because these primary copies 215 , 217 are already available in the de-jitter buffer 210 .
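- the copy-selection rule described above can be sketched as a small lookup (an illustrative model; the buffer representation and function name are invented):

```python
def select_copy(frame_n, jitter_buffer):
    """Sketch of the selection logic: at decode time prefer the
    primary copy of frame N; if it is missing, fall back to the
    partial copy carried by a later packet; if neither is buffered,
    signal frame erasure concealment. jitter_buffer maps
    (frame_number, kind) -> payload, kind in {"primary", "partial"}."""
    if (frame_n, "primary") in jitter_buffer:
        return "primary", jitter_buffer[(frame_n, "primary")]
    if (frame_n, "partial") in jitter_buffer:
        return "partial", jitter_buffer[(frame_n, "partial")]
    return "conceal", None

buf = {(10, "primary"): b"P10", (11, "partial"): b"R11"}
print(select_copy(10, buf)[0])  # → primary
print(select_copy(11, buf)[0])  # → partial  (primary 11 was lost)
print(select_copy(12, buf)[0])  # → conceal  (nothing buffered)
```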
- the diagram 300 includes pre-processing 310 , primary frame coding 320 , redundant frame coding 340 , redundant frame configuration 330 , and bit-packing 350 blocks.
- the diagram 300 may indicate EVS channel aware encoding scheme.
- the pre-processing block 310 may perform various signal processing on input audio signal that may be sampled at either 16 kHz (WB) or 32 kHz (SWB) and may be segmented, for example, into frames of 20 msec. In one implementation, the pre-processing block 310 may perform resampling of the input frame (e.g., to 12.8 kHz); signal detection such as a voice activity detection (VAD), bandwidth detection, time-domain transient detection; signal classification such as speech/music classification or open-loop classification; and signal parameter estimation such as correlation, frame type, and pitch lag.
- the redundant frame configuration block 330 may determine, based on the parameters from the pre-processing block 310 , the compressibility of a current frame to enable the inclusion of a partial copy associated with a previous frame.
- the compressibility of the current frame may indicate whether the current frame can allow for bitrate reduction with minimal perceptual impact or the current frame is “critical frame.”
- a frame may be considered as critical to protect when loss of that frame would cause significant impact to the speech quality at the receiver.
- the threshold to determine whether a particular frame is critical or not may be a configurable parameter at the encoder, which can be dynamically adjusted depending on network conditions. For example, under high FER conditions it may be desirable to adjust the threshold to classify more frames as critical.
- the criticality may also depend on the ability to quickly recover from the loss of a previous frame. For example, if the current frame depends heavily on the previous frame's synthesis, then the current frame may be re-classified from non-critical to critical.
- the redundant frame configuration block 330 may further determine, based on the parameters from the pre-processing block 310 , redundant frame (RF) type classification which controls the number of bits needed to faithfully reconstruct the current frame through the partial copy that is transmitted in a future frame. For example, the partial copy of the frame N will be transmitted along with frame N+2 because the FEC offset is 2 in this example. Strongly-voiced and unvoiced frames are suitable for carrying partial copies of a previous frame with negligible perceptual impact to the primary frame quality.
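- the criticality decision above can be sketched as follows (a hedged illustration: the dependence score, FER-based threshold adjustment, and all numeric values are invented, not taken from the disclosure):

```python
def classify_frame(dep_on_prev, fer, base_threshold=0.6, high_fer=0.1):
    """Hypothetical criticality sketch: a frame whose loss would
    badly hurt receiver quality is 'critical' and should not be
    compressed to make room for a partial copy. dep_on_prev in
    [0, 1] models how heavily the frame leans on the previous
    frame's synthesis; under high frame-erasure-rate (FER)
    conditions the threshold is lowered so that more frames are
    classified as critical."""
    threshold = base_threshold if fer < high_fer else base_threshold / 2
    return "critical" if dep_on_prev >= threshold else "compressible"

print(classify_frame(0.4, fer=0.02))  # → compressible (clean channel)
print(classify_frame(0.4, fer=0.20))  # → critical (high FER lowers threshold)
```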
- the primary frame coding block 320 may encode a plurality of input frames according to either ACELP coding technology or an MDCT-based coding technology and generate a plurality of primary frame packets 360 .
- the selection between ACELP coding technology and an MDCT-based coding technology for any particular frame may be based on the signal characteristic of the particular frame or may be based on the signal received from redundant frame configuration block 330 .
- the primary frame coding block 320 may determine which frames can best be encoded at a reduced frame rate, and the size of each of the plurality of primary frame packets 360 may vary based on a measure of compressibility of each frame. For example, the primary frame packet 364 shows better compressibility than the primary frame packet 363 or the primary frame packet 365 .
- the primary frame coding block 320 may use the ACELP coding technology to encode the low band core up to 6.4 kHz while the upper band that is beyond 6.4 kHz and up to the Nyquist frequency is encoded using the Time-domain Bandwidth Extension (TBE) technology.
- the upper band is parameterized into LSPs and gain parameters to capture both the temporal evolution per sub-frame as well as over an entire frame.
- the primary frame coding block 320 may also use the MDCT-based coding technologies (e.g., Transform Coded Excitation (TCX) and Intelligent Gap Filling (IGF)) to encode the background noise frames and mixed/music content more efficiently.
- the EVS CAM leverages these ACELP and TCX core advancements for primary frame encoding.
- a classifier, such as an SNR-based open-loop classifier, may be used to decide whether to choose the ACELP/TBE based coding technology or the MDCT-based coding technology.
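- an illustrative stand-in for such an open-loop decision follows (the real EVS classifier uses many more features; the thresholds and function name here are invented for the sketch):

```python
def choose_core(segment_snr_db, is_speech, snr_threshold=20.0):
    """Toy open-loop core selection: clean speech-like segments go
    to the ACELP/TBE core, while noisy or music/mixed content goes
    to the MDCT-based (TCX/IGF) core. All values are invented."""
    if is_speech and segment_snr_db >= snr_threshold:
        return "ACELP/TBE"
    return "MDCT (TCX/IGF)"

print(choose_core(30.0, is_speech=True))    # → ACELP/TBE
print(choose_core(30.0, is_speech=False))   # → MDCT (TCX/IGF)
```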
- the redundant frame coding block 340 may perform compact re-encoding of a small subset of key parameters that are critical to protect and may generate a plurality of redundant frame packets 370 .
- the set of critical parameters may be identified based on the frame's signal characteristics and may be re-encoded at a much lower bitrate (e.g., less than 3.6 kbps).
- the size of the redundant frame packets 370 is variable and may depend on the characteristics of the input signal or may be based on the signal received from redundant frame configuration block 330 . In one implementation, the criticality measure may be used to determine whether redundant frame coding is required or to determine the size of the redundant frame packets 370 .
- the redundant frame coding block 340 may perform redundant frame encoding based on either ACELP-based coding technology or MDCT-based coding technology.
- the redundant frame coding block 340 may perform an ACELP-based redundant frame encoding based on the RF type received from the redundant frame configuration block 330 .
- the redundant frame coding block 340 may perform an ACELP redundant frame encoding for the RF frame types such as RF_NOPRED, RF_ALLPRED, RF_GENPRED, or RF_NELP.
- the redundant frame coding block 340 may perform an MDCT-based redundant frame encoding based on the RF type received from the redundant frame configuration block 330 .
- the redundant frame coding block 340 may perform an MDCT-based redundant frame encoding for the RF frame types such as RF_TCXFD, RF_TCXTD1, or RF_TCXTD2.
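- the RF frame types listed above, grouped by the redundant-frame coder they select, can be expressed as a lookup (the type names come from the text; the dispatch structure itself is an illustrative sketch):

```python
# RF frame types grouped by the redundant-frame coding technology
# they select, per the text above.
ACELP_RF_TYPES = {"RF_NOPRED", "RF_ALLPRED", "RF_GENPRED", "RF_NELP"}
MDCT_RF_TYPES  = {"RF_TCXFD", "RF_TCXTD1", "RF_TCXTD2"}

def redundant_coder_for(rf_type):
    if rf_type in ACELP_RF_TYPES:
        return "ACELP"
    if rf_type in MDCT_RF_TYPES:
        return "MDCT"
    raise ValueError(f"unknown RF type: {rf_type}")

print(redundant_coder_for("RF_NELP"))    # → ACELP
print(redundant_coder_for("RF_TCXFD"))   # → MDCT
```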
- the bit-packing block 350 may arrange the primary frame packets 360 and the redundant frame packets (e.g., the partial copy) 370 , and may generate encoded packets 380 .
- the bit-packing block 350 may arrange the primary frame packets 360 and the redundant frame packets 370 based on FEC offset (e.g., 2 ). For instance, when FEC offset is 2, bit-packing block 350 may piggyback the redundant frame packet 372 (e.g., for redundant frame packet for frame 2 ) on the primary frame packet 366 (e.g., for primary frame packet labelled for frame 4 ).
- the encoded packets 380 may include not only the primary frame packets 360 and the redundant frame packets 370 , but also additional parameters, for example, such as RF frame type and FEC offset.
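- the piggybacking performed by the bit-packing block can be sketched as follows (an illustrative model; the list-based packet representation and function name are invented):

```python
def pack(primaries, partials, fec_offset=2):
    """Sketch of bit-packing with an FEC offset k: the partial
    (redundant) copy of frame N is packed into the same packet as
    the primary copy of frame N+k. The first k frames have no
    partner partial copy and carry only their primary copy."""
    packets = []
    for n, primary in enumerate(primaries):
        partial_idx = n - fec_offset          # partial copy riding along
        partial = partials[partial_idx] if partial_idx >= 0 else None
        packets.append((primary, partial))
    return packets

primaries = ["P0", "P1", "P2", "P3", "P4"]
partials  = ["R0", "R1", "R2", "R3", "R4"]
# With offset 2, the packet carrying primary P4 also carries partial R2,
# mirroring the frame 2 / frame 4 example in the text.
print(pack(primaries, partials)[4])  # → ('P4', 'R2')
```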
- the diagram 400 includes jitter buffer management (JBM) 410 , bitstream parser 430 , primary frame decoding 420 , redundant frame decoding 440 , and synthesis and memory update 450 blocks.
- the diagram 400 may indicate an EVS channel aware decoding scheme.
- the JBM block 410 may receive a plurality of packets 480 and store them in a de-jitter buffer.
- the plurality of packets 480 may be transmitted from the first device 102 and may be received by the second device 122 via the receiver 126 .
- the de-jitter buffer within the JBM block 410 may correspond to the de-jitter buffer 132 .
- the received packets 480 may include a packet 481 that comprises only a primary copy, and packets that comprise both a primary copy 484 , 485 , 486 and a partial copy 487 , 488 , 489 .
- Packets 482 , 483 may represent packets lost during transmission.
- the JBM block 410 may provide the primary copy of the current packet, which was encoded in accordance with primary frame encoding scheme, to primary frame decoding block 420 and disregard any RF information present in the current packet. In case the current packet (or frame) is lost, and a partial copy for the current frame is available in the de-jitter buffer as part of another packet, then the JBM block 410 may provide the partial copy for the current frame to the redundant frame decoding 440 . If the partial copy for the current frame is also not available in the de-jitter buffer, then the decoder 134 may perform any frame erasure concealment processing.
- the bitstream parser block 430 may decide whether any particular packet in received packets 480 needs to be decoded either by the primary frame decoding block 420 or by the redundant frame decoding block 440 .
- the bitstream parser block 430 may correspond to the analyzer 136 .
- the bitstream parser block 430 may extract the RF frame type information in the received packets 480 to decide whether to pass the received packets to the primary frame decoding block 420 or the redundant frame decoding block 440 .
- the bitstream parser block 430 may pass the bitstream to the subsequent decoding blocks based on an information from the JBM block 410 .
- the bitstream parser block 430 may detect whether received packets are successfully received without errors. In some configurations, the bitstream parser block 430 may make this determination based on an indicator from a channel decoder that indicates whether a packet was successfully received or not.
- the primary frame decoding block 420 may receive a plurality of primary copies 481 , 484 , 485 , 486 , decode them according to either the ACELP coding technology or an MDCT-based coding technology, and generate decoded PCM samples for a current frame.
- the redundant frame decoding block 440 may receive a plurality of partial copies 487 , 488 , 489 at the time of decoding a current frame whose primary copy was likely lost during transmission or received with errors, decode them according to either the ACELP coding technology or an MDCT-based coding technology, and generate decoded PCM samples for the current frame.
- the RF parameters may include, for example, LSPs, ACB and/or FCB gains, and an upper band gain.
- ACELP partial copy synthesis follows similar steps to that of the primary frame decoding except that the missing parameters (e.g., certain gains and pitch lags are only transmitted in alternate subframes) are extrapolated.
- the synthesis and memory update block 450 may receive the decoded PCM samples for the current frame from either the primary frame decoding 420 or the redundant frame decoding 440 , and may reconstruct the output audio 460 via post processing.
- the examples of the post processing may include extrapolation of some parameters derived from the previously decoded frames and may be based on signal characteristic of decoded parameters (e.g., spectral tilt).
- the method 500 includes receiving a first data at a de-jitter buffer of a receiving device, at 510 .
- the de-jitter buffer may correspond to the de-jitter buffer 132 and the receiving device may correspond to the second device 122 .
- the first data was encoded by a transmitting device (e.g., the first device 102 ) based on a first coding mode indicating both primary frame encoding and redundant frame encoding.
- the first coding mode may correspond to EVS 13.2 kbps channel aware mode
- the second coding mode may correspond to EVS 13.2 kbps non-channel aware mode or EVS 24.4 kbps.
- the method 500 includes decoding the first data, by a speech decoder of the receiving device, at 520 .
- the speech decoder may correspond to the decoder 134 of the second device 122 .
- the first data may include both a partial copy of a first frame, which was encoded by the redundant frame coding block 340 , and a primary copy of a second frame, which was encoded by the primary frame coding block 320 .
- the method 500 includes determining, at the receiving device, whether a condition for coding mode change is satisfied, at 530 .
- the step 530 may be performed by the analyzer 116 .
- the analyzer 116 may determine whether the condition for coding mode change is satisfied based on how many partial copies among recently received packets were used for decoding by the decoder 114 . For example, the analyzer 116 may make this determination based on the number of partial copies decoded by the redundant frame decoding block 440 among the received partial copies 487 488 489 .
- the analyzer 116 may determine whether the condition to trigger a coding mode change is satisfied based on a real-time quality metric of the decoded speech signal.
- the real-time quality metric may be measured objectively or subjectively based on the quality of the decoded audio samples.
- the analyzer 116 may calculate quality measurement of decoded speech sample based on PEAQ or PESQ.
- the analyzer 116 may compare the quality measurement with a threshold to determine whether the redundant frame coding in a channel aware mode contributes any quality improvement.
- the analyzer 116 may also determine whether the condition to trigger a coding mode change is satisfied based on information or an indication received from another layer (e.g., an upper layer) of the communication protocol. For example, this indication may come from the audio layer through an error concealment mechanism.
- the method 500 includes transmitting a request to the transmitting device, in response to determining that the condition is satisfied, at 540 .
- the request may be transmitted by the transmitter 128 of the second device 122 to the receiver 106 of the first device 102 .
- the request may indicate coding mode change for the first device 102 from the first coding mode to a second coding mode.
- the second coding mode may correspond to encoding by only primary frame coding block 320 (i.e., no use of redundant frame encoding scheme).
- the method 500 includes receiving a second data at the de-jitter buffer of the receiving device, where the second data was encoded by the transmitting device based on the second coding mode, at 550 .
- the method 500 further includes decoding the second data by the speech decoder of the receiving device.
- the speech decoder may correspond to the decoder 134 of the second device 122 .
- the second data may include only a plurality of primary copies, which were encoded by the primary frame coding block 320 , without including any partial copy.
- a method 600 for switching EVS coding mode from a channel aware mode to a non-channel aware mode is disclosed.
- the method 600 may be a particular example of the method 500 .
- the method 600 includes receiving a first data at a receiving device, at 610 .
- the receiving device may correspond to the second device 122 .
- the first data was encoded by a transmitting device (e.g., the first device 102 ) based on EVS 13.2 kbps channel aware mode.
- the method 600 includes decoding a partial copy of the first data, by a speech decoder of the receiving device, at 620 .
- the speech decoder may correspond to the decoder 134 of the second device 122 .
- the first data may include both a partial copy of a first frame, which was encoded by the redundant frame coding block 340 , and a primary copy of a second frame, which was encoded by the primary frame coding block 320 .
- the step of decoding the partial copy of the first data may be performed in response to a determination that a current frame to be decoded is not available in the de-jitter buffer 132 but the partial copy for the current frame is available in the de-jitter buffer 132 .
- the decoding of the partial copy of the first data may be performed by the redundant frame decoding block 440 .
- the method 600 includes calculating a first quality measurement of decoded speech signal, at 630 .
- the decoded speech signal includes decoded samples generated by the redundant frame decoding block 440 based on the partial copy of the first data.
- the analyzer 136 may calculate the first quality measurement of decoded speech signal.
- the analyzer 116 may calculate a real-time quality metric of decoded speech signal.
- the real-time quality metric may be measured objectively or subjectively based on the quality of the decoded audio samples.
- the analyzer 116 may calculate quality measurement of decoded speech sample based on PEAQ or PESQ.
- the method 600 includes comparing the first quality measurement with a first threshold, at 640 .
- the step 640 may be performed by the analyzer 116 .
- the analyzer 116 may compare the quality measurement with the threshold to determine whether the redundant frame coding in a channel aware mode contributes any quality improvement for decoded speech samples.
- the method 600 includes transmitting a request to the transmitting device, in response to determining that the first quality measurement is greater than or equal to the first threshold, at 650 .
- the request may be transmitted by the transmitter 128 of the second device 122 to the receiver 106 of the first device 102 .
- the request may indicate coding mode change for the first device 102 from EVS 13.2 kbps channel aware mode to any EVS non-channel aware mode.
- the EVS non-channel aware mode may include EVS 13.2 kbps non-channel aware mode or EVS 24.4 kbps.
- FIG. 7 is a block diagram conceptually illustrating an example of a telecommunications system based on 3GPP, generally designated 700 .
- the various concepts presented throughout this disclosure may be implemented across a broad variety of telecommunication systems, network architectures, and communication standards.
- the aspects of the present disclosure illustrated in FIG. 7 are presented with reference to a UMTS system 700 employing a W-CDMA air interface.
- a UMTS network includes three interacting domains: a Core Network (CN) 704 , a UMTS Terrestrial Radio Access Network (UTRAN) 702 , and User Equipment (UE) 710 .
- the UTRAN 702 provides various wireless services including telephony, video, data, messaging, broadcasts, and/or other services.
- the UTRAN 702 may include a plurality of Radio Network Subsystems (RNSs) such as an RNS 707 , each controlled by a respective Radio Network Controller (RNC) such as an RNC 706 .
- the UTRAN 702 may include any number of RNCs 706 and RNSs 707 in addition to the RNCs 706 and RNSs 707 illustrated herein.
- the RNC 706 is an apparatus responsible for, among other things, assigning, reconfiguring and releasing radio resources within the RNS 707 .
- the RNC 706 may be interconnected to other RNCs (not shown) in the UTRAN 702 through various types of interfaces such as a direct physical connection, a virtual network, or the like, using any suitable transport network.
- Communication between a UE 710 and a Node B 708 may be considered as including a physical (PHY) layer and a Medium Access Control (MAC) layer. Further, communication between a UE 710 and an RNC 706 by way of a respective Node B 708 may be considered as including a Radio Resource Control (RRC) layer.
- the PHY layer may be considered layer 1; the MAC layer may be considered layer 2; and the RRC layer may be considered layer 3.
- the geographic region covered by the RNS 707 may be divided into multiple cells, with a radio transceiver apparatus serving each cell.
- a radio transceiver apparatus is commonly referred to as a Node B in UMTS applications, but may also be referred to by those skilled in the art as a base station (BS), a base transceiver station (BTS), a radio base station, a radio transceiver, a transceiver function, a basic service set (BSS), an extended service set (ESS), an access point (AP), or some other suitable terminology.
- three Node Bs 708 are shown in each RNS 707 ; however, the RNSs 707 may include any number of wireless Node Bs.
- the Node Bs 708 provide wireless access points to a CN 704 for any number of mobile apparatuses.
- the UE 710 may further include a universal subscriber identity module (USIM) 711 , which contains a user's subscription information to a network.
- one UE 710 is shown in communication with a number of the Node Bs 708 .
- the DL, also called the forward link, refers to the communication link from a Node B 708 to a UE 710.
- the UL, also called the reverse link, refers to the communication link from a UE 710 to a Node B 708.
- the CN 704 interfaces with one or more access networks, such as the UTRAN 702 .
- the CN 704 is a GSM core network.
- the various concepts presented throughout this disclosure may be implemented in a RAN, or other suitable access network, to provide UEs with access to types of CNs other than GSM networks.
- the CN 704 includes a circuit-switched (CS) domain and a packet-switched (PS) domain. Some of the circuit-switched elements are a Mobile Services Switching Centre (MSC), a Visitor Location Register (VLR) and a Gateway MSC. Packet-switched elements include a Serving GPRS Support Node (SGSN) and a Gateway GPRS Support Node (GGSN). In the illustrated example, the CN 704 supports circuit-switched services with an MSC 712 and a GMSC 714. In some applications, the GMSC 714 may be referred to as a media gateway (MGW). One or more RNCs, such as the RNC 706, may be connected to the MSC 712.
- the MSC 712 is an apparatus that controls call setup, call routing, and UE mobility functions.
- the MSC 712 also includes a VLR that contains subscriber-related information for the duration that a UE is in the coverage area of the MSC 712 .
- the GMSC 714 provides a gateway through the MSC 712 for the UE to access a circuit-switched network 716 .
- the GMSC 714 includes a home location register (HLR) 715 containing subscriber data, such as the data reflecting the details of the services to which a particular user has subscribed.
- the HLR is also associated with an authentication center (AuC) that contains subscriber-specific authentication data.
- the CN 704 also supports packet-data services with a SGSN 718 and a GGSN 720 .
- GPRS which stands for General Packet Radio Service, is designed to provide packet-data services at speeds higher than those available with standard circuit-switched data services.
- the GGSN 720 provides a connection for the UTRAN 702 to a packet-based network 722 .
- the packet-based network 722 may be the Internet, a private data network, or some other suitable packet-based network.
- the primary function of the GGSN 720 is to provide the UEs 710 with packet-based network connectivity. Data may be transferred between the GGSN 720 and the UEs 710 through the SGSN 718, which performs primarily the same functions in the packet-based domain as the MSC 712 performs in the circuit-switched domain.
- a 3GPP2 network may include three interacting domains: a UE 850 (which may also be called a Mobile Station (MS)), a Radio Access Network (RAN) 860 , and a Core Network (CN) 870 .
- the RAN 860 provides various wireless services including telephony, video, data, messaging, broadcasts, and/or other services.
- the RAN 860 may include a plurality of BTSs 862 , each controlled by a respective BSC 864 .
- the CN 870 interfaces with one or more access networks, such as the RAN 860 .
- the CN 870 may include a CS domain and a PS domain.
- Some of the circuit-switched elements are a Mobile Switching Center (MSC) 872 to connect to a Public Switched Telephone Network (PSTN) 880 and an Inter-Working Function (IWF) 874 to connect to a network such as the Internet 890.
- Packet-switched elements may include a Packet Data Serving Node (PDSN) 876 and a Home Agent (HA) 878 to connect to a network such as the Internet 890 .
- an Authentication, Authorization, and Accounting (AAA) function (not shown) may be included in the CN 870 to perform various security and administrative functions.
- Examples of a UE may include a cellular phone, a smart phone, a session initiation protocol (SIP) phone, a laptop, a notebook, a netbook, a smartbook, a personal digital assistant (PDA), a satellite radio, a global positioning system (GPS) device, a multimedia device, a video device, a digital audio player (e.g., MP3 player), a camera, a game console, or any other similar functioning device.
- the UE is commonly referred to as a mobile apparatus, but may also be referred to by those skilled in the art as a mobile station, a subscriber station, a mobile unit, a subscriber unit, a wireless unit, a remote unit, a mobile device, a wireless device, a wireless communications device, a remote device, a mobile subscriber station, an access terminal, a mobile terminal, a wireless terminal, a remote terminal, a handset, a terminal, a user agent, a mobile client, a client, or some other suitable terminology.
- the multiple access wireless communication system includes multiple cellular regions (cells), including cells 902 , 904 , and 906 , each of which may include one or more sectors.
- the multiple sectors can be formed by groups of antennas with each antenna responsible for communication with UEs in a portion of the cell. For example, in cell 902 , antenna groups 912 , 914 , and 916 may each correspond to a different sector. In cell 904 , antenna groups 918 , 920 , and 922 each correspond to a different sector.
- antenna groups 924 , 926 , and 928 each correspond to a different sector.
- the cells 902 , 904 and 906 may include several wireless communication devices (e.g., UEs), which may be in communication with one or more sectors of each cell 902 , 904 or 906 .
- UEs 930 and 932 may be in communication with base station 942
- UEs 934 and 936 may be in communication with base station 944
- UEs 938 and 940 can be in communication with base station 946 .
- References to a base station made herein may include the Node B 708 of FIG. 7 and/or the BTS 862 of FIG. 8.
- each base station 942 , 944 , 946 is configured to provide an access point to a CN (see FIGS. 7-8 ) for all the UEs 930 , 932 , 934 , 936 , 938 , 940 in the respective cells 902 , 904 , and 906 .
- a serving cell change (SCC) or handover may occur in which communication with the UE 934 transitions from the cell 904 , which may be referred to as the source cell, to cell 906 , which may be referred to as the target cell.
- Management of the handover procedure may take place at the UE 934 , at the base stations corresponding to the respective cells, at a RNC 706 or Base Station Controller (BSC) 864 (see FIGS. 7-8 ), or at another suitable node in the wireless network.
- the UE 934 may monitor various parameters of the source cell 904 as well as various parameters of neighboring cells such as cells 906 and 902 . Further, depending on the quality of these parameters, the UE 934 may maintain communication with one or more of the neighboring cells.
- the UE 934 may maintain an Active Set, that is, a list of cells to which the UE 934 is simultaneously connected (i.e., the UTRA cells that are currently assigning a downlink dedicated physical channel (DPCH) or fractional downlink dedicated physical channel (F-DPCH) to the UE 934 may constitute the Active Set).
- the modulation and multiple access scheme employed by the access network may vary depending on the particular telecommunications standard being deployed.
- the standard may include Evolution-Data Optimized (EV-DO) or Ultra Mobile Broadband (UMB).
- EV-DO and UMB are air interface standards promulgated by the 3GPP2 as part of the cdma2000 family of standards and employ CDMA to provide broadband Internet access to user equipment (e.g., mobile stations).
- the standard may alternately be Universal Terrestrial Radio Access (UTRA) employing Wideband-CDMA (W-CDMA) and other variants of CDMA, such as TD-SCDMA; Global System for Mobile Communications (GSM) employing TDMA; and Evolved UTRA (E-UTRA), Ultra Mobile Broadband (UMB), IEEE 802.11 (Wi-Fi), IEEE 802.16 (WiMAX), IEEE 802.20, and Flash-OFDM employing OFDMA.
- UTRA, E-UTRA, UMTS, Long-Term Evolution (LTE), LTE Advanced, and GSM are described in documents from the 3GPP organization.
- cdma2000 and UMB are described in documents from the 3GPP2 organization.
- the actual wireless communication standard and the multiple access technology employed will depend on the specific application and the overall design constraints imposed on the system.
- a block diagram of a base station 1010 in communication with a UE 1050 is disclosed and generally designated 1000 .
- the base station 1010 may be the Node B 708 or the BTS 862 in FIGS. 7-8
- the UE 1050 may be the UE 710 , 850 in FIGS. 7-8 .
- a transmit processor 1020 may receive data from a data source 1012 and control signals from a controller/processor 1040 .
- the transmit processor 1020 provides various signal processing functions for the data and control signals, as well as reference signals (e.g., pilot signals).
- the transmit processor 1020 may provide cyclic redundancy check (CRC) codes for error detection, coding and interleaving to facilitate forward error correction (FEC), mapping to signal constellations based on various modulation schemes (e.g., binary phase-shift keying (BPSK), quadrature phase-shift keying (QPSK), M-phase-shift keying (M-PSK), M-quadrature amplitude modulation (M-QAM), and the like), spreading with orthogonal variable spreading factors (OVSF), and multiplying with scrambling codes to produce a series of symbols.
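As an illustration of the CRC error-detection step in this chain, the following sketch uses Python's standard `zlib.crc32`. The actual CRC polynomial, length, and frame format are defined by the air interface, so this is illustrative only:

```python
import zlib

def append_crc(frame: bytes) -> bytes:
    # Transmitter side: attach a 4-byte CRC so the receiver can
    # detect corruption after decoding (the CRC check step).
    return frame + zlib.crc32(frame).to_bytes(4, "big")

def check_crc(data: bytes) -> bool:
    # Receiver side: recompute the CRC over the payload and compare
    # it with the transmitted CRC field.
    frame, crc = data[:-4], data[-4:]
    return zlib.crc32(frame).to_bytes(4, "big") == crc

sent = append_crc(b"speech frame payload")
assert check_crc(sent)                  # clean frame passes
corrupted = b"\x00" + sent[1:]          # corrupt the first byte
assert not check_crc(corrupted)         # corruption detected
```

A frame failing this check is what triggers the NACK/retransmission behavior described later for the controller/processor.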
- the channel estimates may be derived from a reference signal transmitted by the UE 1050 or from feedback from the UE 1050 .
- the symbols generated by the transmit processor 1020 are provided to a transmit frame processor 1030 to create a frame structure.
- the transmit frame processor 1030 creates this frame structure by multiplexing the symbols with information from the controller/processor 1040 , resulting in a series of frames.
- the frames are then provided to a transmitter 1032 , which provides various signal conditioning functions including amplifying, filtering, and modulating the frames onto a carrier for downlink transmission over the wireless medium through antenna 1034 .
- the antenna 1034 may include one or more antennas, for example, including beam steering bidirectional adaptive antenna arrays or other similar beam technologies.
- a receiver 1054 receives the downlink transmission through an antenna 1052 and processes the transmission to recover the information modulated onto the carrier.
- the information recovered by the receiver 1054 is provided to a receive frame processor 1060 , which parses each frame, and provides information from the frames to a channel processor 1094 and the data, control, and reference signals to a receive processor 1070 .
- the receive processor 1070 then performs the inverse of the processing performed by the transmit processor 1020 in the base station 1010 . More specifically, the receive processor 1070 descrambles and despreads the symbols, and then determines the most likely signal constellation points transmitted by the base station 1010 based on the modulation scheme. These soft decisions may be based on channel estimates computed by the channel processor 1094 .
- the soft decisions are then decoded and deinterleaved to recover the data, control, and reference signals.
- the CRC codes are then checked to determine whether the frames were successfully decoded.
- the data carried by the successfully decoded frames will then be provided to a data sink 1072 , which represents applications running in the UE 1050 and/or various user interfaces (e.g., display).
- Control signals carried by successfully decoded frames will be provided to a controller/processor 1090 .
- If some of the frames were unsuccessfully decoded, the controller/processor 1090 may also use an acknowledgement (ACK) and/or negative acknowledgement (NACK) protocol to support retransmission requests for those frames.
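The ACK/NACK feedback described above can be sketched as a toy function; the frame representation (a sequence number paired with a CRC pass/fail flag) and the function name are hypothetical:

```python
def arq_receive(frames):
    """Toy sketch of the ACK/NACK protocol mentioned above: the
    receiver ACKs frames whose CRC check passed and NACKs the rest
    so the transmitter can resend them. Each frame here is a
    hypothetical (seq, crc_ok) pair."""
    feedback = []
    for seq, crc_ok in frames:
        feedback.append((seq, "ACK" if crc_ok else "NACK"))
    return feedback

print(arq_receive([(0, True), (1, False), (2, True)]))
# → [(0, 'ACK'), (1, 'NACK'), (2, 'ACK')]
```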
- a transmit processor 1080 receives data from a data source 1078 and control signals from the controller/processor 1090 and provides various signal processing functions including CRC codes, coding and interleaving to facilitate FEC, mapping to signal constellations, spreading with OVSFs, and scrambling to produce a series of symbols.
- Channel estimates may be used to select the appropriate coding, modulation, spreading, and/or scrambling schemes.
- the symbols produced by the transmit processor 1080 will be provided to a transmit frame processor 1082 to create a frame structure.
- the transmit frame processor 1082 creates this frame structure by multiplexing the symbols with information from the controller/processor 1090 , resulting in a series of frames.
- the frames are then provided to a transmitter 1056 , which provides various signal conditioning functions including amplification, filtering, and modulating the frames onto a carrier for uplink transmission over the wireless medium through the antenna 1052 .
- the uplink transmission is processed at the base station 1010 in a manner similar to that described in connection with the receiver function at the UE 1050 .
- a receiver 1035 receives the uplink transmission through the antenna 1034 and processes the transmission to recover the information modulated onto the carrier.
- the information recovered by the receiver 1035 is provided to a receive frame processor 1036 , which parses each frame, and provides information from the frames to the channel processor 1044 and the data, control, and reference signals to a receive processor 1038 .
- the receive processor 1038 performs the inverse of the processing performed by the transmit processor 1080 in the UE 1050 .
- the data and control signals carried by the successfully decoded frames may then be provided to a data sink 1039 and the controller/processor 1040 , respectively. If some of the frames were unsuccessfully decoded by the receive processor, the controller/processor 1040 may also use an ACK and/or NACK protocol to support retransmission requests for those frames.
- the controller/processors 1040 and 1090 may be used to direct the operation at the base station 1010 and the UE 1050 , respectively.
- the controller/processors 1040 and 1090 may provide various functions including timing, peripheral interfaces, voltage regulation, power management, and other control functions.
- the computer readable media of memories 1042 and 1092 may store data and software for the base station 1010 and the UE 1050 , respectively.
- a scheduler/processor 1046 at the base station 1010 may be used to allocate resources to the UEs and schedule downlink and/or uplink transmissions for the UEs.
- a UE in a wireless network with EVS coverage may be handed over to a wireless network without EVS coverage, i.e., a non-native EVS system.
- a UE within a LTE coverage may be handed over to another coverage, e.g., 3GPP2 coverage, without EVS.
- a transcoder may be used to enable compatibility for EVS coverage with possible increase in delay and decrease in audio quality due to the need for transcoding between different formats.
- Referring to FIG. 11, a block diagram of a particular illustrative implementation of a device (e.g., a wireless communication device) for smart coding mode switching is depicted and generally designated 1100.
- the device 1100 may have more or fewer components than illustrated in FIG. 11 .
- the device 1100 may correspond to one or more of the first device 102 , the second device 122 of FIG. 1 , and the UE 850 of FIG. 8 .
- the device 1100 may operate according to one or more of the systems or methods described with reference to FIGS. 1-10 .
- the device 1100 includes a processor 1106 (e.g., a central processing unit (CPU)).
- the device 1100 may include one or more additional processors 1110 (e.g., one or more digital signal processors (DSPs)).
- the processors 1110 may include the first speech vocoder 104 .
- the first speech vocoder 104 may be included in a different type of processor, such as a CPU (e.g., the processor 1106 ).
- the device 1100 may include the memory 1152 and a CODEC 1134 .
- the memory 1152 may include instructions 1156 that are executable by the processors 1110.
- the device 1100 may include a wireless controller 1140 coupled, via a transceiver 1150 , to an antenna 1142 .
- the transceiver 1150 may include the receiver 106 , the transmitter 108 , or both, of FIG. 1 .
- the antenna 1142 may receive incoming packets corresponding to a sequence of packets sent by another device via a network. The received packets may correspond to a sequence of frames of a user speech signal.
- the device 1100 may include a display 1128 coupled to a display controller 1126 .
- the speaker 1148 , the microphone 1146 , or both, may be coupled to the CODEC 1134 .
- the CODEC 1134 may include a digital-to-analog converter 1102 and an analog-to-digital converter 1104 .
- the CODEC 1134 may receive analog signals from the microphone 1146 , convert the analog signals to digital signals using the analog-to-digital converter 1104 , and provide the digital signals to the first speech vocoder 104 .
- the first speech vocoder 104 may process the digital signals.
- the first speech vocoder 104 may provide digital signals to the CODEC 1134 .
- the CODEC 1134 may convert the digital signals to analog signals using the digital-to-analog converter 1102 and may provide the analog signals to the speaker 1148 .
- the first speech vocoder 104 may include the encoder 110, the decoder 114, the analyzer 116, and the de-jitter buffer 112.
- the decoder 114, the analyzer 116, and the de-jitter buffer 112 may be used to implement hardware that performs smart coding mode switching as described herein. Alternatively, a software implementation or a combined software/hardware implementation may be used.
- the memory 1152 may include instructions 1156 executable by the processors 1110 or other processing unit of the device 1100 (e.g., the processor 1106 , the CODEC 1134 , or both) to perform the method 500 of FIG. 5 or the method 600 of FIG. 6 .
- the device 1100 may be included in a system-in-package or system-on-chip device 1122 .
- the memory 1152 , the processor 1106 , the processors 1110 , the display controller 1126 , the CODEC 1134 , and the wireless controller 1140 may be included in a system-in-package or system-on-chip device 1122 .
- an input device 1130 and a power supply 1144 are coupled to the system-on-chip device 1122 .
- the display 1128, the input device 1130, the speaker 1148, the microphone 1146, the antenna 1142, and the power supply 1144 are external to the system-on-chip device 1122.
- each of the display 1128 , the input device 1130 , the speaker 1148 , the microphone 1146 , the antenna 1142 , and the power supply 1144 may be coupled to a component of the system-on-chip device 1122 , such as an interface or a controller.
- the processors 1110 may be operable to perform all or a portion of the methods or operations described with reference to FIGS. 1-10 .
- the processors 1110 may be configured to execute software (e.g., a program of one or more instructions 1156 ) stored in the memory 1152 .
- the processors 1110 may be configured to operate in accordance with the method 500 of FIG. 5 , the method 600 of FIG. 6 , or a combination thereof.
- the processors 1110 may be configured to execute the instructions 1156 that cause the processors 1110 to perform the operations of receiving a first data at a de-jitter buffer 112 .
- the first data may be encoded by a transmitting device (e.g., the first device 102 ) based on a first coding mode indicating both primary frame encoding and redundant frame encoding.
- the processor may also perform the operations of decoding the first data by the decoder 114 .
- the first data may include both a partial copy of a first frame and a primary copy of a second frame.
- the processor may also perform the operations of determining whether a condition for coding mode change is satisfied.
- the condition to trigger coding mode change may be based on statistical information indicating the number of the received partial copies decoded by redundant frame decoding block, a real-time quality metric of decoded speech signal, or information from another layer.
- the processor may also perform the operations of transmitting a request to the transmitting device, in response to determining that the condition is satisfied.
- the request may indicate coding mode change from the first coding mode to a second coding mode, which corresponds to encoding by only primary frame encoding scheme without the use of redundant frame encoding scheme.
- the processor may also perform the operations of receiving a second data at the de-jitter buffer of the receiving device, where the second data was encoded by the transmitting device based on the second coding mode.
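The receiver-side decision described in this sequence of operations might be sketched as follows; the counters and thresholds are illustrative assumptions rather than values specified in the disclosure:

```python
def coding_mode_change_condition(partial_copies_decoded: int,
                                 total_frames: int,
                                 quality_metric: float,
                                 use_ratio_threshold: float = 0.01,
                                 quality_threshold: float = 4.0) -> bool:
    """Sketch of the condition check: trigger a switch away from the
    channel-aware (redundant) coding mode when partial copies are
    rarely being used by the redundant frame decoding block, or when
    decoded-speech quality is already high without them. Both
    thresholds are hypothetical."""
    if total_frames == 0:
        return False  # no statistics yet; keep the current mode
    redundancy_use_ratio = partial_copies_decoded / total_frames
    return (redundancy_use_ratio < use_ratio_threshold
            or quality_metric >= quality_threshold)

# Few partial copies were ever needed: request primary-only encoding.
print(coding_mode_change_condition(2, 1000, 3.5))   # True
# Lossy channel and quality still low: keep redundant frame encoding.
print(coding_mode_change_condition(80, 1000, 3.2))  # False
```

When the function returns `True`, the device would transmit the request indicating the change from the first coding mode to the second (primary-frame-only) coding mode.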
- an apparatus may include means for receiving a first data.
- the means for receiving the first data may include the receiver 106 , the transceiver 1150 , the de-jitter buffer 112 , one or more other devices, circuits, or modules.
- the first data may be encoded by a transmitting device based on a first coding mode indicating both primary frame encoding and redundant frame encoding.
- the apparatus may also include means for decoding the first data, at the receiving device.
- the means for decoding the first data may include the decoder 114, the first speech vocoder 104, the processors 1106, 1110, one or more other devices, circuits, or modules.
- the first data may include a partial copy of a first frame of the audio signal and a primary copy of a second frame of the audio signal.
- the apparatus may also include means for determining whether a condition for coding mode change is satisfied.
- the means for determining the condition for coding mode change may include the analyzer 116, the first speech vocoder 104, the processors 1106, 1110, one or more other devices, circuits, or modules.
- the apparatus may also include means for transmitting a request to the transmitting device in response to determining that the condition is satisfied.
- the means for transmitting the request may include the transmitter 108 , the transceiver 1150 , one or more other devices, circuits, or modules.
- the request may indicate coding mode change to a second coding mode corresponding to primary frame encoding without redundant frame encoding.
- the apparatus may also include means for receiving a second data at the receiving device.
- the means for receiving the second data may include the receiver 106, the transceiver 1150, the de-jitter buffer 112, one or more other devices, circuits, or modules.
- the second data may be encoded by the transmitting device based on the second coding mode.
- the apparatus may also include means for decoding the second data.
- the means for decoding the second data may include the decoder 114, the first speech vocoder 104, the processors 1106, 1110, one or more other devices, circuits, or modules.
- the second data may include a primary copy of a third frame of the audio signal.
- One or more of the disclosed aspects may be implemented in a system or an apparatus, such as the device 1100 , that may include a communications device, a fixed location data unit, a mobile location data unit, a mobile phone, a cellular phone, a satellite phone, a computer, a tablet, a portable computer, a display device, a media player, or a desktop computer.
- the device 1100 may include a set top box, an entertainment unit, a navigation device, a personal digital assistant (PDA), a monitor, a computer monitor, a television, a tuner, a radio, a satellite radio, a music player, a digital music player, a portable music player, a video player, a digital video player, a digital video disc (DVD) player, a portable digital video player, a satellite, a vehicle, any other device that includes a processor or that stores or retrieves data or computer instructions, or a combination thereof.
- the system or the apparatus may include remote units, such as hand-held personal communication systems (PCS) units, portable data units such as GPS enabled devices, meter reading equipment, or any other device that includes a processor or that stores or retrieves data or computer instructions, or any combination thereof.
- Although FIGS. 1-11 may illustrate systems, apparatuses, and/or methods according to the teachings of the disclosure, the disclosure is not limited to these illustrated systems, apparatuses, and/or methods.
- One or more functions or components of any of FIGS. 1-11 as illustrated or described herein may be combined with one or more other portions of another function or component of FIGS. 1-11 . Accordingly, no single example described herein should be construed as limiting and examples of the disclosure may be suitably combined without departing from the teachings of the disclosure.
- one or more of the methods of FIGS. 5-6 individually or in combination, may be performed by the processors 1110 of FIG. 11 .
- one or more operations described with reference to the FIGS. 5-6 may be optional, may be performed at least partially concurrently, and/or may be performed in a different order than shown or described.
- a software module may reside in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, hard disk, a removable disk, a compact disc read-only memory (CD-ROM), or any other form of non-transient storage medium known in the art.
- An exemplary storage medium is coupled to the processor such that the processor may read information from, and write information to, the storage medium.
- the storage medium may be integral to the processor.
- the processor and the storage medium may reside in an application-specific integrated circuit (ASIC).
- the ASIC may reside in a computing device or a user terminal.
- the processor and the storage medium may reside as discrete components in a computing device or user terminal.
Abstract
Description
- The present disclosure is generally related to adjusting coding mode in voice communication.
- Advances in technology have resulted in smaller and more powerful computing devices. For example, there currently exist a variety of portable personal computing devices, including wireless telephones such as mobile and smart phones, tablets, and laptop computers that are small, lightweight, and easily carried by users. These devices can communicate voice and data packets over wireless networks. Further, many such devices incorporate additional functionality such as a digital still camera, a digital video camera, a digital recorder, and an audio file player. Also, such devices can process executable instructions, including software applications, such as a web browser application, that can be used to access the Internet. As such, these devices can include significant computing capabilities.
- Electronic devices, such as wireless telephones, may send and receive data via networks. For example, audio data may be sent and received via a circuit-switched network (e.g., the public switched telephone network (PSTN), a global system for mobile communications (GSM) network, etc.) or a packet-switched network (e.g., a voice over internet protocol (VoIP) network, a voice over long-term evolution (VoLTE) network, etc.). The 3rd Generation Partnership Project (3GPP) speech codecs have evolved from Adaptive Multi-Rate (AMR) to Adaptive Multi-Rate Wideband (AMR-WB) and to Enhanced Voice Services (EVS).
- In a packet-switched network, audio packets corresponding to a sequence of audio frames may be individually routed from a source device to a destination device. Due to network conditions, the audio packets may arrive out of order. The destination device may store received packets in a de-jitter buffer and may rearrange the received packets if the received packets are out-of-order.
- The destination device may generate a processed audio signal based on the received packets. A particular packet sent by the source device may not be received, or may be received with errors, by the destination device. The destination device may be unable to recover all or a portion of the data associated with the particular packet. As the buffer depth of the de-jitter buffer increases and the buffering time of a packet prior to being processed increases, quality of the processed audio signal may improve because more packets have time to reach the destination device. However, increasing the buffer depth may increase end-to-end delay, adversely impacting the user experience. Reducing the buffer depth reduces the end-to-end delay but increases the chance that the processed audio signal is generated without data for some frames of the sequence of frames, because packets corresponding to those frames have not reached the destination device in time.
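The de-jitter buffering and reordering described above can be sketched as a minimal structure; the class and parameter names are hypothetical and the depth policy is a simplification of the trade-off discussed:

```python
import heapq

class DeJitterBuffer:
    """Minimal sketch of a de-jitter buffer: packets arrive out of
    order and are released in sequence-number order once the buffer
    holds at least `depth` packets. A larger depth absorbs more
    reordering at the cost of added end-to-end delay."""

    def __init__(self, depth=3):
        self.depth = depth
        self.heap = []  # min-heap keyed by sequence number

    def push(self, seq, payload):
        heapq.heappush(self.heap, (seq, payload))

    def pop_ready(self):
        # Release the oldest packet only when the buffer is deep
        # enough to have absorbed typical reordering jitter.
        if len(self.heap) >= self.depth:
            return heapq.heappop(self.heap)
        return None

buf = DeJitterBuffer(depth=3)
for seq, payload in [(2, "b"), (1, "a"), (3, "c")]:
    buf.push(seq, payload)
print(buf.pop_ready())  # (1, 'a'): earliest frame is released first
```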
- In a particular aspect, a device includes a de-jitter buffer, an analyzer, a transmitter, and a speech decoder. The de-jitter buffer is configured to receive a first data encoded by a transmitting device based on a first coding mode. The first coding mode may indicate both primary frame encoding and redundant frame encoding. The de-jitter buffer is configured to receive a second data encoded by the transmitting device based on a second coding mode. The second coding mode may indicate only primary frame encoding without redundant frame encoding. The analyzer is configured to determine whether a condition for coding mode change is satisfied. The transmitter is configured to transmit, in response to the analyzer determining that the condition is satisfied, a request to the transmitting device. The request may indicate a coding mode change to the second coding mode corresponding to primary frame encoding only, without redundant frame encoding. The speech decoder is configured to decode the first data, which includes a partial copy of a first frame of an audio signal and a primary copy of a second frame of the audio signal. The speech decoder is further configured to decode the second data, which includes a primary copy of a third frame of the audio signal without a partial copy of another frame.
- In another aspect, a method includes receiving a first data at a receiving device. The first data may be encoded by a transmitting device based on a first coding mode indicating both primary frame encoding and redundant frame encoding. The method also includes decoding the first data, which includes a partial copy of a first frame of an audio signal and a primary copy of a second frame of the audio signal. The method further includes determining whether a condition for coding mode change is satisfied and transmitting a request to the transmitting device in response to determining that the condition is satisfied. The request may indicate a coding mode change to a second coding mode corresponding to primary frame encoding without redundant frame encoding. The method also includes receiving a second data at the receiving device. The second data may be encoded by the transmitting device based on the second coding mode. The method also includes decoding the second data. The second data may include a primary copy of a third frame of the audio signal.
- In another aspect, a non-transitory computer-readable medium includes instructions that, when executed by a processor, cause the processor to perform operations including receiving a first data at a receiving device. The first data may be encoded by a transmitting device based on a first coding mode indicating both primary frame encoding and redundant frame encoding. The operations also include decoding the first data, which includes a partial copy of a first frame of an audio signal and a primary copy of a second frame of the audio signal. The operations further include determining whether a condition for coding mode change is satisfied and transmitting a request to the transmitting device in response to determining that the condition is satisfied. The request may indicate a coding mode change to a second coding mode corresponding to primary frame encoding without redundant frame encoding. The operations also include receiving a second data at the receiving device. The second data may be encoded by the transmitting device based on the second coding mode. The operations also include decoding the second data. The second data may include a primary copy of a third frame of the audio signal.
- In another aspect, a device includes means for receiving a first data. The first data may be encoded by a transmitting device based on a first coding mode indicating both primary frame encoding and redundant frame encoding. The device also includes means for decoding the first data, which includes a partial copy of a first frame of an audio signal and a primary copy of a second frame of the audio signal. The device further includes means for determining whether a condition for coding mode change is satisfied and means for transmitting a request to the transmitting device in response to determining that the condition is satisfied. The request may indicate a coding mode change to a second coding mode corresponding to primary frame encoding without redundant frame encoding. The device also includes means for receiving a second data. The second data may be encoded by the transmitting device based on the second coding mode. The device also includes means for decoding the second data. The second data may include a primary copy of a third frame of the audio signal.
-
FIG. 1 is a particular illustrative example of a system operable to change coding mode based on a request; -
FIG. 2 is a diagram illustrating a particular example for operation of channel aware mode; -
FIG. 3 is a particular exemplary diagram for high-level description of channel aware encoding; -
FIG. 4 is a particular exemplary diagram for high-level description of channel aware decoding; -
FIG. 5 is a flow chart illustrating a particular method of changing coding mode; -
FIG. 6 is another flow chart illustrating a particular method of changing coding mode for operating EVS codec; -
FIG. 7 is a block diagram conceptually illustrating an example of a telecommunications system based on 3GPP; -
FIG. 8 is a block diagram conceptually illustrating an example of a telecommunications system based on 3GPP2; -
FIG. 9 is a diagram conceptually illustrating an example of an access network in a UTRAN or RAN architecture; -
FIG. 10 is a block diagram of a base station (BS) in communication with a UE; and -
FIG. 11 is a block diagram of a particular illustrative example of a device that is operable to switch coding mode. - The principles described herein may be applied, for example, to a headset, a handset, or other audio device that is configured to perform redundancy based packet transmission error recovery. Unless expressly limited by its context, the term “signal” is used herein to indicate any of its ordinary meanings, including a state of a memory location (or set of memory locations) as expressed on a wire, bus, or other transmission medium. Unless expressly limited by its context, the term “generating” is used herein to indicate any of its ordinary meanings, such as computing or otherwise producing. Unless expressly limited by its context, the term “calculating” is used herein to indicate any of its ordinary meanings, such as computing, evaluating, smoothing, and/or selecting from a plurality of values. Unless expressly limited by its context, the term “obtaining” is used to indicate any of its ordinary meanings, such as calculating, deriving, receiving (e.g., from another component, block or device), and/or retrieving (e.g., from a memory register or an array of storage elements).
- Unless expressly limited by its context, the term “producing” is used to indicate any of its ordinary meanings, such as calculating, generating, and/or providing. Unless expressly limited by its context, the term “providing” is used to indicate any of its ordinary meanings, such as calculating, generating, and/or producing. Unless expressly limited by its context, the term “coupled” is used to indicate a direct or indirect electrical or physical connection. If the connection is indirect, it is well understood by a person having ordinary skill in the art that there may be other blocks or components between the structures being “coupled.”
- The term “configuration” may be used in reference to a method, apparatus/device, and/or system as indicated by its particular context. Where the term “comprising” is used in the present description and claims, it does not exclude other elements or operations. The term “based on” (as in “A is based on B”) is used to indicate any of its ordinary meanings, including the cases (i) “based on at least” (e.g., “A is based on at least B”) and, if appropriate in the particular context, (ii) “equal to” (e.g., “A is equal to B”). In case (i), where “A is based on B” includes “A is based on at least B,” this may include the configuration in which A is coupled to B. Similarly, the term “in response to” is used to indicate any of its ordinary meanings, including “in response to at least.” The term “at least one” is used to indicate any of its ordinary meanings, including “one or more.” The term “at least two” is used to indicate any of its ordinary meanings, including “two or more.”
- The terms “apparatus” and “device” are used generically and interchangeably unless otherwise indicated by the particular context. Unless indicated otherwise, any disclosure of an operation of an apparatus having a particular feature is also expressly intended to disclose a method having an analogous feature (and vice versa), and any disclosure of an operation of an apparatus according to a particular configuration is also expressly intended to disclose a method according to an analogous configuration (and vice versa). The terms “method,” “process,” “procedure,” and “technique” are used generically and interchangeably unless otherwise indicated by the particular context. The terms “element,” “block,” and “module” may be used to indicate a portion of a greater configuration and may be implemented in hardware (e.g., circuitry), software or a combination of both.
- As used herein, the term “communication device” refers to an electronic device that may be used for voice and/or data communication over a wireless communication network. Examples of communication devices include cellular phones, personal digital assistants (PDAs), handheld devices, headsets, wireless modems, laptop computers, personal computers, etc. The devices described herein may be compatible with one or more mobile telecommunication technologies. For example, the devices described herein may be compatible with third generation (3G) mobile telecommunication technologies, fourth generation (4G) mobile telecommunication technologies, and/or fifth generation (5G) mobile telecommunication technologies. Additionally, or in the alternative, the devices described herein may be compatible with different standards (e.g., a Long-Term Evolution (LTE) wireless communication standard, a LTE-A wireless communication standard, a Worldwide Interoperability for Microwave Access (WiMAX) wireless communication standard, etc.).
- Unless otherwise indicated, the terms “codec,” “vocoder,” “audio coder,” and “speech coder” refer to the combination of an audio encoder and a corresponding audio decoder. Unless otherwise indicated, the term “coding” indicates transfer of an audio signal via a codec, including encoding and subsequent decoding. Unless otherwise indicated, the term “transmitting” indicates propagating (e.g., a signal) into a transmission channel.
- Audio telecommunications applications may be implemented in a packet-switched network. For example, audio telecommunications applications may be implemented in a Voice over Internet Protocol (VoIP) network. A packet may include one or more frames of the encoded audio signal, and packets with audio (e.g., voice) information may be transmitted from a first device to a second device on the network. However, some of the packets may be lost during the transmission of the packets. For example, the loss of multiple packets (sometimes referred to as burst packet loss) may be a reason for the degradation of perceived speech quality at a receiving device.
- To alleviate the degradation of perceived speech quality caused by packet losses in a VoIP network, several solutions exist. One solution is a receiver-based packet loss concealment (PLC) approach. A PLC method may be used to mask the effects of packet loss in VoIP communications. For example, a PLC method may be implemented to create a substitute packet in place of the one that was lost during transmission. Such a PLC method may attempt to create a packet as similar as possible to the one that was lost. Receiver-based PLC methods may not need any additional resources or help from the sender in order to create the substitute packet. When important speech frames are lost, however, a PLC method may be ineffective at masking the effects of the packet loss. Another solution is a sender-based packet loss resilient approach. Such an approach includes forward error correction (FEC) methods, which may include sending some additional data with each packet. The additional data may be used to recover from errors caused by the loss of data during transmission. For example, FEC schemes may transmit redundant audio frames. In other words, more than one copy (typically two) of an audio frame is transmitted by the sender. These two copies may be referred to as a primary copy and a redundant copy.
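The primary/redundant-copy scheme can be sketched as follows. The packet layout, field names, and the two-frame offset are illustrative assumptions, and truncation merely stands in for the lower-bitrate encoding a real codec would use for the redundant copy:

```python
def make_packet(frame_index, frames, offset=2):
    """Sender side: bundle the primary copy of the current frame with a
    redundant (partial) copy of the frame sent `offset` frames earlier."""
    packet = {"seq": frame_index, "primary": frames[frame_index]}
    past = frame_index - offset
    if past >= 0:
        # A real codec would re-encode the past frame at a lower bitrate;
        # truncation stands in for that "partial" copy here.
        packet["partial_of"] = past
        packet["partial"] = frames[past][: len(frames[past]) // 2]
    return packet

def recover_frame(n, received, offset=2):
    """Receiver side: prefer the primary copy of frame n; if its packet was
    lost, fall back to the partial copy carried by packet n + offset; if
    neither arrived, packet loss concealment (PLC) must fill the gap."""
    pkt = received.get(n)
    if pkt is not None:
        return ("primary", pkt["primary"])
    carrier = received.get(n + offset)
    if carrier is not None and carrier.get("partial_of") == n:
        return ("partial", carrier["partial"])
    return ("plc", None)
```

The receiver-side fallback is why a single lost packet need not produce an audible gap: a degraded but usable copy of the lost frame rides in a later packet.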
- Although sender-based packet loss resilient schemes may improve the perceptual quality of the decoded speech, these schemes may also increase the bandwidth used during transmission of the speech. Traditional FEC schemes may also increase the end-to-end delay, which may be intolerable for real-time conversations. For example, conventional sender-based schemes send the same speech frame twice at two different time periods. This approach may at least double the data rate. Some conventional schemes may use a low-bit rate codec for the redundant copy in order to reduce the data rate. However, the use of a low-bit rate codec may increase the complexity at the encoder. In addition, some conventional schemes may use the same low-bit rate codec for both the primary copy of the frame and the redundant copy of the frame. Although this approach may reduce the complexity at the encoder as well as reduce the data rate, the baseline speech quality (i.e., the speech quality when no frames are lost) may be greatly reduced. Further, conventional sender-based schemes typically operate under the assumption of an additional delay of at least one frame interval.
- Systems, methods, and apparatus as described herein may be implemented to provide a source- and channel-controlled FEC scheme in order to obtain an optimal trade-off between speech quality, delay, and data rate. The FEC scheme may be configured such that no additional delay is introduced. A significant improvement in speech quality may be achieved with only a moderate increase in data rate. An FEC scheme as described herein may also operate at any target data rate. In one example, the FEC scheme and a target data rate may be adaptively adjusted based on the condition of a transmission channel as well as external controls.
- For some codecs for audio (e.g., voice) communications, the total number of bits used to encode each frame is a predetermined constant. Examples of such codecs include the Adaptive Multi-Rate (AMR) speech codec, the AMR Wideband (AMR-WB) speech codec, and the Enhanced Voice Services (EVS) speech codec, in which the number of bits is determined by the coding mode selected for the frame. In such cases, transmitting a redundant copy of a past frame may require a corresponding reduction in the number of bits available for coding the signal information in the current frame. This reduction may have a negative impact on the perceptual quality of the decoded speech.
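The bit-accounting consequence described above can be made concrete with a small helper; the function name is hypothetical, and the 13.2/3.6 kbps figures in the usage below match the channel aware split discussed elsewhere in this description:

```python
FRAME_MS = 20  # one speech frame every 20 ms

def bit_budget(total_kbps, partial_kbps):
    """Split a fixed per-frame bit budget between the primary copy of the
    current frame and the partial copy of a past frame."""
    total_bits = round(total_kbps * FRAME_MS)    # kbps * ms = bits per frame
    partial_bits = round(partial_kbps * FRAME_MS)
    primary_bits = total_bits - partial_bits     # primary copy gets the rest
    return primary_bits, partial_bits
```

Attaching the redundant copy does not change the total frame size; it only shrinks the share available to the primary copy, which is the quality cost the text refers to.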
- 3GPP EVS supports four bandwidth modes over an audio frequency range up to 20 kHz. The four supported bandwidths are narrowband (NB), wideband (WB), super wideband (SWB), and full band (FB). In various examples, NB supports voice; WB supports high definition (HD) voice; SWB supports voice (including HD voice) and music; and FB supports voice (including HD voice) and HD music. In various examples, EVS supports a wide range of audio frequencies with the following attributes: a) the low-range frequencies may improve naturalness and listening comfort; b) the mid-range frequencies may improve voice clarity and intelligibility; and c) the high-range frequencies may improve the sense of presence and contribute to better music quality.
- Table 1 illustrates examples of EVS bitrates and supported bandwidths. The EVS bitrates are the source bitrates, that is, the bitrates after source compression or source coding, in units of kilobits per second (kbps). Each EVS bitrate in Table 1 is mapped to its supported bandwidths, where NB is narrowband, WB is wideband, SWB is super wideband, and FB is full band. Each bitrate is unique in its mapping to the supported bandwidths except for the 13.2 kbps bitrate, which has a channel aware option that does not include NB as a supported bandwidth. In various examples, all the bitrates illustrated in Table 1 support discontinuous transmission (DTX).
-
TABLE 1

| EVS Bitrate (kbps) | Supported Bandwidths | Notes |
|---|---|---|
| 5.9 (SC-VBR) | NB, WB | Source-controlled variable bitrate with DTX |
| 7.2 | NB, WB | |
| 8.0 | NB, WB | |
| 9.6 | NB, WB, SWB | |
| 13.2 | NB, WB, SWB | |
| 13.2 (Channel Aware Mode) | WB, SWB | |
| 16.4 | NB, WB, SWB, FB | |
| 24.4 | NB, WB, SWB, FB | |
| 32 | WB, SWB, FB | |
| 48 | WB, SWB, FB | |
| 64 | WB, SWB, FB | |
| 96 | WB, SWB, FB | |
| 128 | WB, SWB, FB | |

- The EVS channel aware mode (CAM) significantly improves performance under degraded channel conditions while maintaining clean channel quality. EVS CAM allows in-band redundancy transmission as part of the codec payload in a constant bitrate stream, and is implemented for wideband (WB) and super-wideband (SWB) at 13.2 kbps. The transmission of redundancy (a partial copy, or partial data, of another frame) may be either channel-controlled or source-controlled. In the channel-controlled case, the encoder's behavior may be based on conditions of the network, such as network congestion or frame error rate. In the source-controlled case, the encoder can use properties of the input source signal to determine the frames that are critical for high quality reconstruction and selectively transmit redundancy for those frames only. Furthermore, the encoder can also identify the frames that can best be coded at a reduced bitrate in order to accommodate the attachment of redundancy while keeping the bitstream at a constant 13.2 kbps rate.
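The mapping in Table 1 can be captured as a simple lookup; the dictionary keys are illustrative labels for the rows of the table, not codec identifiers:

```python
# Mapping from Table 1: EVS source bitrate (kbps) -> supported bandwidths.
EVS_BANDWIDTHS = {
    "5.9 (SC-VBR)": {"NB", "WB"},
    "7.2": {"NB", "WB"},
    "8.0": {"NB", "WB"},
    "9.6": {"NB", "WB", "SWB"},
    "13.2": {"NB", "WB", "SWB"},
    "13.2 (channel aware)": {"WB", "SWB"},
    "16.4": {"NB", "WB", "SWB", "FB"},
    "24.4": {"NB", "WB", "SWB", "FB"},
    "32": {"WB", "SWB", "FB"},
    "48": {"WB", "SWB", "FB"},
    "64": {"WB", "SWB", "FB"},
    "96": {"WB", "SWB", "FB"},
    "128": {"WB", "SWB", "FB"},
}

def supports(bitrate_key, bandwidth):
    """Return True if the given Table 1 row lists the bandwidth."""
    return bandwidth in EVS_BANDWIDTHS[bitrate_key]
```

Note the one asymmetry the text calls out: 13.2 kbps supports NB in its normal mode but not in its channel aware option.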
- Referring to
FIG. 1, a particular illustrative example of a system operable to change coding mode based on a request is disclosed and generally designated 100. The system 100 includes a first device 102 in communication with one or more other devices (e.g., a second device 122) via a network 150. The first device 102 may send data to the second device 122 via the network 150 using a first path 152, and the second device 122 may send data to the first device 102 via the network 150 using a second path 154. - The
first device 102 may communicate with the network 150 via a first reverse channel 152a (e.g., a first reverse link) and a first forward channel 154b (e.g., a first forward link). For example, the first device 102 may transmit data to the network 150 using the first reverse channel 152a, and the first device 102 may receive data from the network 150 using the first forward channel 154b. The second device 122 may communicate with the network 150 via a second reverse channel 154a (e.g., a second reverse link) and a second forward channel 152b (e.g., a second forward link). For example, the second device 122 may transmit data to the network 150 using the second reverse channel 154a, and the second device 122 may receive data from the network 150 using the second forward channel 152b. - The
network 150 may include one or more base stations or access points to communicate data between the first device 102 and the second device 122. As used herein, data (e.g., packets, frames, offset values, acknowledgements, etc.) communicated via the first path 152 corresponds to data transmitted from the first device 102 to the network 150 via the first reverse channel 152a and received at the second device 122 from the network 150 via the second forward channel 152b. In a similar manner, data communicated via the second path 154 corresponds to data transmitted from the second device 122 to the network 150 via the second reverse channel 154a and received at the first device 102 from the network 150 via the first forward channel 154b. - The
devices 102, 122 may include fewer or more components than illustrated in FIG. 1. For example, the first device 102 and/or the second device 122 may be a smart phone, a cellular phone, a mobile communication device, a laptop computer, a computer, a tablet, a PDA, a set top box, a video player, an entertainment unit, a display device, a television, a gaming console, a music player, a radio, a digital video player, a digital video disc (DVD) player, a tuner, a camera, a navigation device, or a combination thereof. Such devices may include a user interface (e.g., a touch screen, voice recognition capability, or other user interface capabilities). - The
first device 102 includes a first speech vocoder 104, a receiver 106, and a transmitter 108. The first speech vocoder 104 includes an encoder 110, a de-jitter buffer 112, a decoder 114, and an analyzer 116. The second device 122 includes a second speech vocoder 124, a receiver 126, and a transmitter 128. The second speech vocoder 124 includes an encoder 130, a de-jitter buffer 132, a decoder 134, and an analyzer 136. As described herein, the first device 102 will be described as a "receiving device" and the second device 122 will be described as a "transmitting device." For example, the second device 122 may transmit packets that are received by the first device 102. However, in other implementations, each device 102, 122 may both transmit and receive packets. For example, the first device 102 may transmit packets to the second device 122 via the first path 152 (e.g., transmit packets to the network 150 via the first reverse channel 152a) and concurrently receive packets from the second device 122 via the second path 154 (e.g., receive packets from the network 150 via the first forward channel 154b). Additionally, the second device 122 may transmit packets to the first device 102 via the second path 154 (e.g., transmit packets to the network 150 via the second reverse channel 154a) and concurrently receive packets from the first device 102 via the first path 152 (e.g., receive packets from the network 150 via the second forward channel 152b). - The
receiver 106 may be configured to receive data (e.g., one or more packets) from the second device 122 via the second path 154. To illustrate, the transmitter 128 may be configured to transmit a first packet 160, a second packet 162, and an Nth packet 164 to the network 150 via the second reverse channel 154a. The receiver 106 may be configured to receive the first packet 160, the second packet 162, and the Nth packet 164 from the network 150 via the first forward channel 154b. N may be any integer value that is greater than zero. - The
receiver 106 may provide the received packets 160-164 to the de-jitter buffer 112. The de-jitter buffer 112 may be configured to store (e.g., buffer) the packets 160-164 received by the receiver 106. To illustrate, the de-jitter buffer 112 may have a storage capacity (e.g., a "depth") to store a number "D" of packets. The packets 160-164 may arrive out of order due to the dynamic nature of the wireless network 150. For example, the second packet 162 may arrive at the receiver 106 prior to the first packet 160 arriving at the receiver 106 due to a network delay, a data handling capacity, wireless signal conditions, a network load, etc. The de-jitter buffer 112 may be configured to "absorb" or reduce the jitter (e.g., the delay) in the packet arrival time by buffering the packets 160-164 received by the receiver 106 and providing a packet to the decoder 114 at regular intervals (e.g., approximately once every 20 milliseconds (ms)). - The
analyzer 116 or, alternatively, the decoder 114 may reorder one or more of the packets 160-164 in the de-jitter buffer 112 if the packets 160-164 are out-of-order. Additionally, one or more packets 160-164 sent by the second device 122 may not be received, or may be received with errors, by the first device 102. For example, a packet (e.g., the first packet 160) may not be received due to packet loss or may be partially received, due to network conditions, by the receiver 106. The analyzer 116 or the decoder 114 may determine whether a particular packet is missing from the de-jitter buffer 112. - To circumvent problems at the
decoder 114 caused by a lost packet (or a packet received with errors), subsequent packets may include error correction data associated with the lost packet. According to one implementation, the error correction data may include a partial copy of the lost packet. Thus, the analyzer 116 or the decoder 114 may retrieve a subsequent packet (having error correction data associated with the lost packet) from the de-jitter buffer 112 in response to determining that the lost packet is missing from the de-jitter buffer 112. For example, if the analyzer 116 or the decoder 114 determines that the first packet 160 (e.g., a "current frame") is missing from the de-jitter buffer 112 (or includes errors), the analyzer 116 or the decoder 114 may determine whether a subsequent packet (that includes error correction data or a partial copy associated with the first packet 160) is stored in the de-jitter buffer 112. - The actual transmission of the partial (redundant) copy of the first packet 160 (e.g., for frame N) may occur at frame N+offset (e.g., the offset value 166), where the value of the offset k may indicate the distance between the primary copy of a frame and the partial copy of the frame. This offset value k may also be referred to as the FEC offset value. In one example, typical values of the offset k may include one, two, three, five, and integers greater than five. The offset value k (e.g., the offset value 166) may be fixed (e.g., during system implementation or during call set-up) or may be selected and/or updated during a call according to channel conditions based on feedback (e.g., a coding mode change request 186) from the receiving device (e.g., the first device 102). For example, it may be desirable to use a higher value of the offset k in an environment that is causing a frequent loss of consecutive frames (e.g., due to long fades). According to one implementation, the
analyzer 116 may determine an optimum offset value (Xopt) and the transmitter 108 may transmit the offset value 166 to the receiver 126 of the second device 122. The optimum FEC offset value (Xopt) may correspond to the offset value 166 that yields the greatest probability of successfully decoding a lost packet. - The
receiver 126 of the second device 122 may receive the offset value 166 and provide the offset value 166 to the encoder 130. Upon reception of the offset value 166, the second device 122 may send an acknowledgment 168 to the first device 102 via the second path 154. For example, when the encoder 130 receives the offset value 166, the encoder 130 (or another processor) may generate the acknowledgement 168, and the transmitter 128 may transmit the acknowledgement 168 to the network 150 via the second reverse channel 154a. According to one implementation, one or more packets that include partial redundancy data transmitted from the second device 122 to the first device 102 may include the acknowledgment 168 in addition to, or instead of, sending a dedicated message to signal acknowledgement. The second device 122 may also acknowledge receipt of the offset value 166 by including the offset value 166 in the frame that includes the partial redundancy. - The
analyzer 116 may determine whether a condition to trigger a coding mode change is satisfied. In one implementation, the analyzer 116 may determine whether the condition for a coding mode change for the encoders 130, 110 is satisfied based on how many partial copies among the received packets 160, 162, 164 from the second device are decoded by the decoder 114. In another implementation, the analyzer 116 may determine whether the condition to trigger a coding mode change for the encoders 130, 110 is satisfied based on statistical information indicating how many partial copies have been decoded by the decoder 114 in lieu of their corresponding primary copies. For example, the analyzer 116 may monitor the latest M received packets encoded by the encoder 130 according to a channel aware mode, in which case all of the transmitted packets 160, 162, 164 from the second device 122 may include primary copies, and all or some of the transmitted packets 160, 162, 164 may include partial copies. In an ideal or excellent channel condition scenario, it is likely that all or most of the primary copies of the transmitted packets 160, 162, 164 are received by the first device. In such a case, the likelihood of any partial copies within the packets 160, 162, 164 being decoded by the decoder 114 in lieu of their corresponding primary copies is quite low. The analyzer 116 may then determine that the effectiveness of partial frame encoding is low and may decide to send the second device 122 a request for a coding mode change from, for example, a channel aware mode to a non-channel aware mode.
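One way to sketch such a condition check, assuming a per-frame history of decode outcomes and illustrative thresholds (neither the statistic nor the threshold values are specified by this description):

```python
def redundancy_is_effective(history, low=0.05, high=0.5):
    """Decide whether channel aware (redundant) coding is paying off.
    `history` holds one entry per recent frame: "primary" if the primary
    copy was decoded, "partial" if a partial copy substituted for a lost
    primary, "lost" if neither copy arrived.
    Redundancy looks ineffective at both extremes: in a clean channel the
    partial copies are almost never needed, and in a very harsh channel
    they tend to be lost along with the primaries."""
    n = len(history)
    partial_used = history.count("partial") / n
    lost = history.count("lost") / n
    if partial_used < low and lost < low:
        return False  # clean channel: redundancy rarely used
    if lost > high:
        return False  # harsh channel: redundancy is lost too
    return True
```

When this returns False, an analyzer of the kind described could emit a request to switch the far-end encoder out of channel aware mode.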
analyzer 116 may also determine the effectiveness of partial frame encoding is low and may decide to send the second device 122 a request for coding mode change from a channel aware mode to a non-channel aware mode. For example, in such extremely harsh channel condition scenario, it is likely that many of the partial copies of transmittedpackets 160 162 164 may be also lost and thus not available in thede-jitter buffer 112. Then, theanalyzer 116 may determine the effectiveness of partial frame encoding is low and may decide to send the second device 122 a request for coding mode change from, for example, a channel aware mode to a non-channel aware mode. - According to another implementation, the
analyzer 116 may determine whether the condition to trigger a coding mode change for the encoders 130, 110 is satisfied based on a real-time quality metric of the decoded speech signal. The real-time quality metric may be measured objectively or subjectively based on the quality of the decoded audio samples. For example, the analyzer 116 may calculate a quality measurement of the decoded speech samples based on Perceptual Evaluation of Audio Quality (PEAQ) or Perceptual Evaluation of Speech Quality (PESQ). The analyzer 116 may compare the quality measurement with a threshold to determine whether the redundant frame coding in a channel aware mode contributes any quality improvement. In a case where the analyzer 116 determines that the quality contribution by the partial copies included in the transmitted packets 160, 162, 164 from the second device 122 is below the threshold, the analyzer may decide to send the second device 122 a request for a coding mode change from, for example, a channel aware mode to a non-channel aware mode. - According to another implementation, the
analyzer 116 may determine whether the condition to trigger a coding mode change for the encoders 130, 110 is satisfied based on information or an indication received from another layer (e.g., an upper layer) of the communication protocol. For example, this indication may come from the audio layer through an error concealment mechanism. - If the
analyzer 116 determines that a condition to trigger a coding mode change is satisfied, the transmitter 108 may transmit a request for a coding mode change to the second device 122 through the first path 152. Upon receiving the request at the second device, the analyzer 136, or alternatively the encoder 130, may configure the encoder 130 from a first coding mode to a second coding mode. According to one implementation, the first coding mode may correspond to a channel aware mode and the second coding mode may correspond to a non-channel aware mode. For example, the first coding mode may correspond to the EVS 13.2 kbps channel aware mode and the second coding mode may correspond to the EVS 13.2 kbps non-channel aware mode. In another implementation, the first coding mode may correspond to the EVS 13.2 kbps channel aware mode and the second coding mode may correspond to an EVS non-channel aware mode with a higher bitrate than 13.2 kbps. As a non-limiting example, the second coding mode may correspond to the EVS 24.4 kbps coding mode. Alternatively, the first coding mode may correspond to a non-channel aware mode and the second coding mode may correspond to a channel aware mode. - Referring to
FIG. 2, a diagram illustrating a particular example of operation in a channel aware mode is disclosed and generally designated 200. The diagram 200 includes a de-jitter buffer 210 and a speech decoder 220. The de-jitter buffer 210 may correspond to the de-jitter buffers 112, 132 of FIG. 1, and the speech decoder 220 may correspond to the decoders 114, 134 of FIG. 1. The de-jitter buffer 210 includes primary copies and partial copies. In this particular example, the offset value k is chosen as 2 for illustration only. According to one implementation, one or more packets 160, 162, 164 may include both a primary copy and a partial copy. For example, the primary copy for frame N−1 211 and the partial copy for frame N−3 201 may be included in the first packet 160, and the primary copy for frame N 213 and the partial copy for frame N−2 203 may be included in the second packet 162. - The
primary copies 211, 213, 215, 217 in the de-jitter buffer may have been encoded by the encoder 130 in the second speech vocoder 124 based on a primary frame coding scheme. In one particular example, the primary copies 211, 213, 215, 217 may have been encoded based on the primary frame coding scheme according to the EVS 13.2 kbps channel aware mode. The partial copies 201, 203, 205, 207 in the de-jitter buffer may have been encoded by the encoder 130 in the second speech vocoder 124 based on a redundant frame coding scheme. In one particular example, the partial copies 201, 203, 205, 207 may have been encoded based on the redundant frame coding scheme according to the EVS 13.2 kbps channel aware mode. The second device 122 may send the packets 160, 162, 164 to the first device 102 over a constant-bit-rate (e.g., 13.2 kbps) channel. In this implementation, a primary frame bit-rate corresponding to the primary copy may be reduced (e.g., to 9.6 kbps) to accommodate the redundant copy. For example, a remaining bit-rate (e.g., 3.6 kbps) of the constant-bit-rate may correspond to the redundant copy.
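The constant-bit-rate split described above can be verified with a short calculation; the helper below is purely illustrative and assumes the 20 msec frame duration given in the pre-processing description:

```python
# Worked check of the constant-bit-rate split: a 13.2 kbps channel carrying
# 20 ms frames yields a fixed per-packet bit budget that is divided between
# the reduced-rate primary copy and the piggybacked partial copy.

FRAME_MS = 20  # frame duration assumed from the 20 msec segmentation


def bits_per_frame(kbps, frame_ms=FRAME_MS):
    return round(kbps * frame_ms)  # kbps * ms = bits


total_bits = bits_per_frame(13.2)    # 264 bits per packet
primary_bits = bits_per_frame(9.6)   # 192 bits for the reduced-rate primary copy
partial_bits = bits_per_frame(3.6)   # 72 bits left for the redundant partial copy

assert total_bits == primary_bits + partial_bits == 264
```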
- The speech decoder 220 may decide which copy among the primary copy and the partial copy should be decoded based on the availability of each packet in the de-jitter buffer 210 at the time of decoding each frame. According to one implementation, the analyzer 116 (not shown in FIG. 2) may determine, at the time of frame N−1 decoding, that the primary copy for the frame N−1 211 is available in the de-jitter buffer 210, and the speech decoder 220 may subsequently decode the primary copy for the frame N−1 211. At the time of frame N decoding, however, the analyzer 116 may determine that the primary copy for the frame N 213 is not available in the de-jitter buffer 210. The analyzer 116 may then check whether the partial copy for the frame N 207 is available in the de-jitter buffer 210. In case the partial copy for the frame N 207 is indeed available in the de-jitter buffer 210, the analyzer 116 may instruct the speech decoder 220 to decode the partial copy for the frame N 207 in lieu of the primary copy for the frame N 213. In a similar manner, the speech decoder 220 may proceed with decoding the primary copy for the frame N+1 215 and the primary copy for the frame N+2 217 because these primary copies 215, 217 are already available in the de-jitter buffer 210.
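The decode-time choice illustrated above can be sketched as follows; the buffer representation and function name are assumptions for illustration, not the actual decoder interface:

```python
# Sketch of the decode-time selection: prefer the primary copy of frame n;
# fall back to a partial copy carried by a later packet; otherwise signal
# frame erasure concealment. The (frame, kind) -> payload mapping is an
# assumed stand-in for the de-jitter buffer.

def select_copy(frame_n, dejitter_buffer):
    if (frame_n, "primary") in dejitter_buffer:
        return "primary", dejitter_buffer[(frame_n, "primary")]
    if (frame_n, "partial") in dejitter_buffer:
        return "partial", dejitter_buffer[(frame_n, "partial")]
    return "conceal", None  # neither copy arrived in time


buf = {(1, "primary"): b"P1", (2, "partial"): b"R2", (3, "primary"): b"P3"}
assert select_copy(2, buf)[0] == "partial"  # primary for frame 2 was lost
assert select_copy(4, buf)[0] == "conceal"  # no copy of frame 4 at all
```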
- Referring to FIG. 3, a particular exemplary diagram for a high-level description of channel aware encoding is disclosed and generally designated 300. The diagram 300 includes pre-processing 310, primary frame coding 320, redundant frame coding 340, redundant frame configuration 330, and bit-packing 350 blocks. In one implementation, the diagram 300 may represent an EVS channel aware encoding scheme. - The
pre-processing block 310 may perform various signal processing on an input audio signal that may be sampled at either 16 kHz (WB) or 32 kHz (SWB) and may be segmented, for example, into frames of 20 msec. In one implementation, the pre-processing block 310 may perform resampling of the input frame (e.g., to 12.8 kHz); signal detection such as voice activity detection (VAD), bandwidth detection, and time-domain transient detection; signal classification such as speech/music classification or open-loop classification; and signal parameter estimation such as correlation, frame type, and pitch lag. - The redundant
frame configuration block 330 may determine, based on the parameters from the pre-processing block 310, the compressibility of a current frame to enable the inclusion of a partial copy associated with a previous frame. For example, the compressibility of the current frame may indicate whether the current frame can allow for bitrate reduction with minimal perceptual impact or whether the current frame is a "critical frame." A frame may be considered critical to protect when loss of that frame would cause significant impact to the speech quality at the receiver. The threshold to determine whether a particular frame is critical may be a configurable parameter at the encoder, which can be dynamically adjusted depending on network conditions. For example, under high FER conditions it may be desirable to adjust the threshold to classify more frames as critical. The criticality may also depend on the ability to quickly recover from the loss of a previous frame. For example, if the current frame depends heavily on the previous frame's synthesis, then the current frame may be re-classified from non-critical to critical.
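One way to picture this configurable criticality test is the sketch below; the loss-impact score, the FER breakpoint, and the threshold values are all illustrative assumptions, not the EVS algorithm:

```python
# Hedged sketch of a configurable critical-frame test. A frame whose
# estimated loss impact reaches the threshold is protected, and under high
# frame-erasure rates the effective threshold is lowered so that more frames
# are classified as critical. All constants are illustrative assumptions.

def is_critical(loss_impact, fer, base_threshold=0.5):
    # Under high FER it may be desirable to classify more frames as critical,
    # modelled here by halving the effective threshold.
    threshold = base_threshold * (0.5 if fer > 0.1 else 1.0)
    return loss_impact >= threshold


assert is_critical(0.3, fer=0.2) is True    # high FER: threshold drops to 0.25
assert is_critical(0.3, fer=0.01) is False  # low FER: threshold stays at 0.5
```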
- The redundant frame configuration block 330 may further determine, based on the parameters from the pre-processing block 310, a redundant frame (RF) type classification, which controls the number of bits needed to faithfully reconstruct the current frame through the partial copy that is transmitted in a future frame. For example, the partial copy of frame N will be transmitted along with frame N+2 because the FEC offset is 2 in this example. Strongly-voiced and unvoiced frames are suitable for carrying partial copies of a previous frame with negligible perceptual impact to the primary frame quality. - The primary
frame coding block 320 may encode a plurality of input frames according to either an ACELP coding technology or an MDCT-based coding technology and generate a plurality of primary frame packets 360. The selection between the ACELP coding technology and the MDCT-based coding technology for any particular frame may be based on the signal characteristics of the particular frame or may be based on the signal received from the redundant frame configuration block 330. The primary frame coding block 320 may determine which frames can best be encoded at a reduced frame rate, and the size of each of the plurality of primary frame packets 360 may vary based on a measure of compressibility of each frame. For example, the primary frame packet 364 shows better compressibility than the primary frame packet 363 or the primary frame packet 365. - The primary
frame coding block 320 may use the ACELP coding technology to encode the low band core up to 6.4 kHz, while the upper band beyond 6.4 kHz and up to the Nyquist frequency is encoded using the Time-domain Bandwidth Extension (TBE) technology. The upper band is parameterized into LSPs and gain parameters to capture the temporal evolution both per sub-frame and over an entire frame. The primary frame coding block 320 may also use the MDCT-based coding technologies (e.g., Transform Coded Excitation (TCX) and Intelligent Gap Filling (IGF)) to encode background noise frames and mixed/music content more efficiently. In one implementation, the EVS CAM leverages these ACELP and TCX core advancements for primary frame encoding. Additionally, a classifier, such as an SNR-based open-loop classifier, may be used to decide whether to choose the ACELP/TBE-based coding technology or the MDCT-based coding technology to encode the primary frame. - The redundant
frame coding block 340 may perform compact re-encoding of a small subset of key parameters that are critical to protect and may generate a plurality of redundant frame packets 370. The set of critical parameters may be identified based on the frame's signal characteristics and may be re-encoded at a much lower bitrate (e.g., less than 3.6 kbps). The size of the redundant frame packets 370 is variable and may depend on the characteristics of the input signal or may be based on the signal received from the redundant frame configuration block 330. In one implementation, the criticality measure may be used to determine whether redundant frame coding is required or to determine the size of the redundant frame packets 370. - The redundant
frame coding block 340 may perform redundant frame encoding based on either an ACELP-based coding technology or an MDCT-based coding technology. The redundant frame coding block 340 may perform an ACELP-based redundant frame encoding based on the RF type received from the redundant frame configuration block 330. In one implementation, the redundant frame coding block 340 may perform an ACELP redundant frame encoding for RF frame types such as RF_NOPRED, RF_ALLPRED, RF_GENPRED, or RF_NELP. The redundant frame coding block 340 may perform an MDCT-based redundant frame encoding based on the RF type received from the redundant frame configuration block 330. In one implementation, the redundant frame coding block 340 may perform an MDCT-based redundant frame encoding for RF frame types such as RF_TCXFD, RF_TCXTD1, or RF_TCXTD2. - The bit-
packing block 350 may arrange the primary frame packets 360 and the redundant frame packets (e.g., the partial copies) 370, and may generate encoded packets 380. The bit-packing block 350 may arrange the primary frame packets 360 and the redundant frame packets 370 based on the FEC offset (e.g., 2). For instance, when the FEC offset is 2, the bit-packing block 350 may piggyback the redundant frame packet 372 (e.g., the redundant frame packet for frame 2) on the primary frame packet 366 (e.g., the primary frame packet labelled for frame 4). The encoded packets 380 may include not only the primary frame packets 360 and the redundant frame packets 370, but also additional parameters such as, for example, the RF frame type and the FEC offset.
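The piggybacking rule above can be sketched as a simple pairing; the dictionaries standing in for encoded copies and the function name are assumptions for illustration:

```python
# Sketch of the bit-packing pairing: with FEC offset k, the partial copy of
# frame n rides in the same packet as the primary copy of frame n + k.

def pack(primaries, partials, fec_offset=2):
    """Return a list of (primary_frame, piggybacked_partial_frame) pairs,
    using None when no partial copy is available to piggyback."""
    packets = []
    for n in sorted(primaries):
        rf = n - fec_offset  # the partial copy comes from an earlier frame
        packets.append((n, rf if rf in partials else None))
    return packets


prim = {0: b"", 1: b"", 2: b"", 3: b"", 4: b""}
part = {0: b"", 1: b"", 2: b""}
# Frame 4 carries the partial copy of frame 2, matching the FEC-offset-2 example.
assert pack(prim, part)[4] == (4, 2)
assert pack(prim, part)[1] == (1, None)  # no frame -1 exists to piggyback
```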
- Referring to FIG. 4, a particular exemplary diagram for a high-level description of channel aware decoding is disclosed and generally designated 400. The diagram 400 includes jitter buffer management (JBM) 410, bitstream parser 430, primary frame decoding 420, redundant frame decoding 440, and synthesis and memory update 450 blocks. In one implementation, the diagram 400 may represent an EVS channel aware decoding scheme. - The
JBM block 410 may receive a plurality of packets 480 and store them in a de-jitter buffer. For example, the plurality of packets 480 may be transmitted from the first device 102 and may be received by the second device 122 via the receiver 126. The de-jitter buffer within the JBM block 410 may correspond to the de-jitter buffer 132. The received packets 480 may include a packet 481 that comprises only a primary copy and packets that comprise both a primary copy 484, 485, 486 and a partial copy 487, 488, 489. Packets 482, 483 may represent packets lost during transmission. In one implementation, if a current packet (or frame) is not lost, the JBM block 410 may provide the primary copy of the current packet, which was encoded in accordance with the primary frame encoding scheme, to the primary frame decoding block 420 and disregard any RF information present in the current packet. In case the current packet (or frame) is lost and a partial copy for the current frame is available in the de-jitter buffer as part of another packet, the JBM block 410 may provide the partial copy for the current frame to the redundant frame decoding block 440. If the partial copy for the current frame is also not available in the de-jitter buffer, then the decoder 134 may perform frame erasure concealment processing. - The
bitstream parser block 430 may decide whether any particular packet in the received packets 480 needs to be decoded by either the primary frame decoding block 420 or the redundant frame decoding block 440. In one implementation, the bitstream parser block 430 may correspond to the analyzer 136. The bitstream parser block 430 may extract the RF frame type information in the received packets 480 to decide whether to pass the received packets to the primary frame decoding block 420 or the redundant frame decoding block 440. In some configurations, the bitstream parser block 430 may pass the bitstream to the subsequent decoding blocks based on information from the JBM block 410. The bitstream parser block 430 may detect whether received packets are successfully received without errors. In some configurations, the bitstream parser block 430 may make this determination based on an indicator from a channel decoder that indicates whether a packet was successfully received or not.
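The parser's routing decision can be sketched as follows; the routing table is an illustration built from the RF frame type names given in the encoding description above, not the actual EVS parser:

```python
# Assumed sketch of the bitstream parser routing: intact primary payloads go
# to primary frame decoding, while partial copies are routed to ACELP-based
# or MDCT-based redundant frame decoding according to the RF frame type.

ACELP_RF_TYPES = {"RF_NOPRED", "RF_ALLPRED", "RF_GENPRED", "RF_NELP"}
MDCT_RF_TYPES = {"RF_TCXFD", "RF_TCXTD1", "RF_TCXTD2"}


def route(is_partial_copy, rf_type=None):
    if not is_partial_copy:
        return "primary_frame_decoding"
    if rf_type in ACELP_RF_TYPES:
        return "redundant_frame_decoding_acelp"
    if rf_type in MDCT_RF_TYPES:
        return "redundant_frame_decoding_mdct"
    raise ValueError(f"unknown RF frame type: {rf_type}")


assert route(False) == "primary_frame_decoding"
assert route(True, "RF_NELP") == "redundant_frame_decoding_acelp"
assert route(True, "RF_TCXFD") == "redundant_frame_decoding_mdct"
```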
- The primary frame decoding block 420 may receive a plurality of primary copies 481, 484, 485, 486, decode them according to either the ACELP coding technology or an MDCT-based coding technology, and generate decoded PCM samples for a current frame. The redundant frame decoding block 440 may receive a plurality of partial copies 487, 488, 489 at the time of decoding a current frame whose primary copy likely was lost during the transmission or received with errors, decode them according to either the ACELP coding technology or an MDCT-based coding technology, and generate decoded PCM samples for the current frame. In one implementation, depending on the RF frame type, if the current frame corresponds to an ACELP partial copy, then the RF parameters (e.g., LSPs, ACB and/or FCB gains, and upper band gain) are decoded for ACELP synthesis. ACELP partial copy synthesis follows steps similar to those of the primary frame decoding, except that the missing parameters (e.g., certain gains and pitch lags are only transmitted in alternate subframes) are extrapolated. - The synthesis and
memory update block 450 may receive the decoded PCM samples for the current frame from either the primary frame decoding 420 or the redundant frame decoding 440, and reconstruct output audio 460 via post processing. Examples of the post processing may include extrapolation of some parameters derived from the previously decoded frames and may be based on signal characteristics of decoded parameters (e.g., spectral tilt). - Referring to
FIG. 5, a method 500 for switching a coding mode from a first coding mode to a second coding mode is disclosed. The method 500 includes receiving first data at a de-jitter buffer of a receiving device, at 510. The de-jitter buffer may correspond to the de-jitter buffer 132 and the receiving device may correspond to the second device 122. The first data was encoded by a transmitting device (e.g., the first device 102) based on a first coding mode indicating both primary frame encoding and redundant frame encoding. As a non-limiting example, the first coding mode may correspond to the EVS 13.2 kbps channel aware mode, and the second coding mode may correspond to the EVS 13.2 kbps non-channel aware mode or EVS 24.4 kbps. - The
method 500 includes decoding the first data, by a speech decoder of the receiving device, at 520. The speech decoder may correspond to the decoder 134 of the second device 122. For example, the first data may include both a partial copy of a first frame, which was encoded by the redundant frame coding block 340, and a primary copy of a second frame, which was encoded by the primary frame coding block 320. - The
method 500 includes determining, at the receiving device, whether a condition for a coding mode change is satisfied, at 530. In some implementations, step 530 may be performed by the analyzer 136. The analyzer 136 may determine whether the condition for the coding mode change is satisfied based on how many partial copies among recently received packets were used for decoding by the decoder 134. For example, the analyzer 136 may make this determination based on the number of partial copies decoded by the redundant frame decoding block 440 among the received partial copies 487, 488, 489.
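This usage-based condition can be pictured with a small counter over a sliding window; the window size, the rate threshold, and the class interface are assumptions for illustration:

```python
# Illustrative sketch of the usage-based trigger at step 530: track how many
# of the recently received partial copies were actually needed for decoding;
# if the usage rate stays low, channel conditions may be good enough to
# leave the channel aware mode. Window size and threshold are assumptions.

from collections import deque


class PartialCopyMonitor:
    def __init__(self, window=100, min_usage_rate=0.05):
        self.used = deque(maxlen=window)  # 1 if a partial copy was decoded
        self.min_usage_rate = min_usage_rate

    def record(self, partial_copy_decoded):
        self.used.append(1 if partial_copy_decoded else 0)

    def condition_satisfied(self):
        if len(self.used) < self.used.maxlen:
            return False  # not enough history yet
        return sum(self.used) / len(self.used) < self.min_usage_rate


mon = PartialCopyMonitor(window=10, min_usage_rate=0.2)
for decoded in [False] * 9 + [True]:  # only 1 of 10 frames needed a partial copy
    mon.record(decoded)
assert mon.condition_satisfied() is True
```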
- Additionally, the analyzer 136 may determine whether the condition to trigger the coding mode change is satisfied based on a real-time quality metric of the decoded speech signal. The real-time quality metric may be measured objectively or subjectively based on the quality of the decoded audio samples. For example, the analyzer 136 may calculate a quality measurement of a decoded speech sample based on PEAQ or PESQ. The analyzer 136 may compare the quality measurement with a threshold to determine whether the redundant frame coding in a channel aware mode contributes any quality improvement. The analyzer 136 may also determine whether the condition to trigger the coding mode change is satisfied based on information or an indication received from another layer (e.g., an upper layer) of the communication protocol. For example, this indication may come from the audio layer through an error concealment mechanism. - The
method 500 includes transmitting a request to the transmitting device, in response to determining that the condition is satisfied, at 540. The request may be transmitted by the transmitter 128 of the second device 122 to the receiver 106 of the first device 102. The request may indicate a coding mode change for the first device 102 from the first coding mode to a second coding mode. In one preferred implementation, the second coding mode may correspond to encoding by only the primary frame coding block 320 (i.e., without use of the redundant frame encoding scheme). - The
method 500 includes receiving second data at the de-jitter buffer of the receiving device, where the second data was encoded by the transmitting device based on the second coding mode, at 550. The method 500 further includes decoding the second data by the speech decoder of the receiving device. The speech decoder may correspond to the decoder 134 of the second device 122. The second data may include only a plurality of primary copies, which were encoded by the primary frame coding block 320, without including any partial copy. - Referring to
FIG. 6, a method 600 for switching an EVS coding mode from a channel aware mode to a non-channel aware mode is disclosed. The method 600 may be a particular example of the method 500. The method 600 includes receiving first data at a receiving device, at 610. The receiving device may correspond to the second device 122. The first data was encoded by a transmitting device (e.g., the first device 102) based on the EVS 13.2 kbps channel aware mode. - The
method 600 includes decoding a partial copy of the first data, by a speech decoder of the receiving device, at 620. The speech decoder may correspond to the decoder 134 of the second device 122. For example, the first data may include both a partial copy of a first frame, which was encoded by the redundant frame coding block 340, and a primary copy of a second frame, which was encoded by the primary frame coding block 320. In one implementation, the step of decoding the partial copy of the first data may be performed in response to a determination that a primary copy of a current frame to be decoded is not available in the de-jitter buffer 132 but the partial copy for the current frame is available in the de-jitter buffer 132. The decoding of the partial copy of the first data may be performed by the redundant frame decoding block 440. - The
method 600 includes calculating a first quality measurement of a decoded speech signal, at 630. The decoded speech signal includes samples decoded by the redundant frame decoding block 440 based on the partial copy of the first data. The analyzer 136 may calculate the first quality measurement of the decoded speech signal. In one implementation, the analyzer 136 may calculate a real-time quality metric of the decoded speech signal. The real-time quality metric may be measured objectively or subjectively based on the quality of the decoded audio samples. For example, the analyzer 136 may calculate the quality measurement of a decoded speech sample based on PEAQ or PESQ. - The
method 600 includes comparing the first quality measurement with a first threshold, at 640. In some implementations, step 640 may be performed by the analyzer 136. The analyzer 136 may compare the quality measurement with the threshold to determine whether the redundant frame coding in a channel aware mode contributes any quality improvement for the decoded speech samples. - The
method 600 includes transmitting a request to the transmitting device, in response to determining that the first quality measurement is greater than or equal to the first threshold, at 650. The request may be transmitted by the transmitter 128 of the second device 122 to the receiver 106 of the first device 102. The request may indicate a coding mode change for the first device 102 from the EVS 13.2 kbps channel aware mode to any EVS non-channel aware mode. For example, the EVS non-channel aware mode may include the EVS 13.2 kbps non-channel aware mode or EVS 24.4 kbps. - Referring to
FIG. 7, a block diagram conceptually illustrating an example of a telecommunications system based on 3GPP is disclosed and generally designated 700. The various concepts presented throughout this disclosure may be implemented across a broad variety of telecommunication systems, network architectures, and communication standards. By way of example and without limitation, the aspects of the present disclosure illustrated in FIG. 7 are presented with reference to a UMTS system 700 employing a W-CDMA air interface. A UMTS network includes three interacting domains: a Core Network (CN) 704, a UMTS Terrestrial Radio Access Network (UTRAN) 702, and User Equipment (UE) 710. In this example, the UTRAN 702 provides various wireless services including telephony, video, data, messaging, broadcasts, and/or other services. The UTRAN 702 may include a plurality of Radio Network Subsystems (RNSs) such as an RNS 707, each controlled by a respective Radio Network Controller (RNC) such as an RNC 706. Here, the UTRAN 702 may include any number of RNCs 706 and RNSs 707 in addition to the RNCs 706 and RNSs 707 illustrated herein. The RNC 706 is an apparatus responsible for, among other things, assigning, reconfiguring, and releasing radio resources within the RNS 707. The RNC 706 may be interconnected to other RNCs (not shown) in the UTRAN 702 through various types of interfaces such as a direct physical connection, a virtual network, or the like, using any suitable transport network. - Communication between a
UE 710 and a Node B 708 may be considered as including a physical (PHY) layer and a Medium Access Control (MAC) layer. Further, communication between a UE 710 and an RNC 706 by way of a respective Node B 708 may be considered as including a Radio Resource Control (RRC) layer. In the instant specification, the PHY layer may be considered layer 1; the MAC layer may be considered layer 2; and the RRC layer may be considered layer 3. - The geographic region covered by the
RNS 707 may be divided into multiple cells, with a radio transceiver apparatus serving each cell. A radio transceiver apparatus is commonly referred to as a Node B in UMTS applications, but may also be referred to by those skilled in the art as a base station (BS), a base transceiver station (BTS), a radio base station, a radio transceiver, a transceiver function, a basic service set (BSS), an extended service set (ESS), an access point (AP), or some other suitable terminology. For clarity, three Node Bs 708 are shown in each RNS 707; however, the RNSs 707 may include any number of wireless Node Bs. The Node Bs 708 provide wireless access points to a CN 704 for any number of mobile apparatuses. In a UMTS system, the UE 710 may further include a universal subscriber identity module (USIM) 711, which contains a user's subscription information to a network. For illustrative purposes, one UE 710 is shown in communication with a number of the Node Bs 708. The DL, also called the forward link, refers to the communication link from a Node B 708 to a UE 710, and the UL, also called the reverse link, refers to the communication link from a UE 710 to a Node B 708. - The
CN 704 interfaces with one or more access networks, such as the UTRAN 702. As shown, the CN 704 is a GSM core network. However, as those skilled in the art will recognize, the various concepts presented throughout this disclosure may be implemented in a RAN, or other suitable access network, to provide UEs with access to types of CNs other than GSM networks. - The
CN 704 includes a circuit-switched (CS) domain and a packet-switched (PS) domain. Some of the circuit-switched elements are a Mobile Services Switching Centre (MSC), a Visitor Location Register (VLR), and a Gateway MSC. Packet-switched elements include a Serving GPRS Support Node (SGSN) and a Gateway GPRS Support Node (GGSN). In the illustrated example, the CN 704 supports circuit-switched services with an MSC 712 and a GMSC 714. In some applications, the GMSC 714 may be referred to as a media gateway (MGW). One or more RNCs, such as the RNC 706, may be connected to the MSC 712. The MSC 712 is an apparatus that controls call setup, call routing, and UE mobility functions. The MSC 712 also includes a VLR that contains subscriber-related information for the duration that a UE is in the coverage area of the MSC 712. The GMSC 714 provides a gateway through the MSC 712 for the UE to access a circuit-switched network 716. The GMSC 714 includes a home location register (HLR) 715 containing subscriber data, such as the data reflecting the details of the services to which a particular user has subscribed. The HLR is also associated with an authentication center (AuC) that contains subscriber-specific authentication data. When a call is received for a particular UE, the GMSC 714 queries the HLR 715 to determine the UE's location and forwards the call to the particular MSC serving that location. - The
CN 704 also supports packet-data services with an SGSN 718 and a GGSN 720. GPRS, which stands for General Packet Radio Service, is designed to provide packet-data services at speeds higher than those available with standard circuit-switched data services. The GGSN 720 provides a connection for the UTRAN 702 to a packet-based network 722. The packet-based network 722 may be the Internet, a private data network, or some other suitable packet-based network. The primary function of the GGSN 720 is to provide the UEs 710 with packet-based network connectivity. Data may be transferred between the GGSN 720 and the UEs 710 through the SGSN 718, which performs primarily the same functions in the packet-based domain as the MSC 712 performs in the circuit-switched domain. - Referring to
FIG. 8, a block diagram conceptually illustrating an example of a telecommunications system based on 3GPP2 employing a cdma2000 interface is disclosed and generally designated 800. A 3GPP2 network may include three interacting domains: a UE 850 (which may also be called a Mobile Station (MS)), a Radio Access Network (RAN) 860, and a Core Network (CN) 870. In various examples, the RAN 860 provides various wireless services including telephony, video, data, messaging, broadcasts, and/or other services. The RAN 860 may include a plurality of BTSs 862, each controlled by a respective BSC 864. The CN 870 interfaces with one or more access networks, such as the RAN 860. The CN 870 may include a CS domain and a PS domain. Some of the circuit-switched elements are a Mobile Switching Center (MSC) 872 to connect to a Public Switched Telephony Network (PSTN) 880 and an Inter-Working Function (IWF) 874 to connect to a network such as the Internet 890. Packet-switched elements may include a Packet Data Serving Node (PDSN) 876 and a Home Agent (HA) 878 to connect to a network such as the Internet 890. In addition, an Authentication, Authorization, and Accounting (AAA) function (not shown) may be included in the CN 870 to perform various security and administrative functions. - Examples of a UE may include a cellular phone, a smart phone, a session initiation protocol (SIP) phone, a laptop, a notebook, a netbook, a smartbook, a personal digital assistant (PDA), a satellite radio, a global positioning system (GPS) device, a multimedia device, a video device, a digital audio player (e.g., MP3 player), a camera, a game console, or any other similar functioning device. 
The UE is commonly referred to as a mobile apparatus, but may also be referred to by those skilled in the art as a mobile station, a subscriber station, a mobile unit, a subscriber unit, a wireless unit, a remote unit, a mobile device, a wireless device, a wireless communications device, a remote device, a mobile subscriber station, an access terminal, a mobile terminal, a wireless terminal, a remote terminal, a handset, a terminal, a user agent, a mobile client, a client, or some other suitable terminology.
- Referring to
FIG. 9, a diagram conceptually illustrating an example of an access network in a UTRAN or RAN architecture is illustrated and generally designated 900. The multiple access wireless communication system includes multiple cellular regions (cells), including cells 902, 904, and 906, each of which may include one or more antenna groups. The cells 902, 904, and 906 may include several UEs in communication with respective base stations: in the cell 902, UEs may be in communication with a base station 942; in the cell 904, UEs may be in communication with a base station 944; and in the cell 906, UEs may be in communication with a base station 946. References to a base station made herein may include the Node B 708 of FIG. 7 and/or the BTS 862 of FIG. 8. - Here, each
base station provides an access point to a CN 704 (see FIGS. 7-8) for all the UEs in the respective cells. If the UE 934 moves from the illustrated location in the cell 904 into the cell 906, a serving cell change (SCC) or handover may occur in which communication with the UE 934 transitions from the cell 904, which may be referred to as the source cell, to the cell 906, which may be referred to as the target cell. Management of the handover procedure may take place at the UE 934, at the base stations corresponding to the respective cells, at an RNC 706 or Base Station Controller (BSC) 864 (see FIGS. 7-8), or at another suitable node in the wireless network. For example, during a call with the source cell 904, or at any other time, the UE 934 may monitor various parameters of the source cell 904 as well as various parameters of neighboring cells such as the cells 902 and 906, and the UE 934 may maintain communication with one or more of the neighboring cells. During this time, the UE 934 may maintain an Active Set, that is, a list of cells that the UE 934 is simultaneously connected to (i.e., the UTRA cells that are currently assigning a downlink dedicated physical channel DPCH or fractional downlink dedicated physical channel F-DPCH to the UE 934 may constitute the Active Set). - The modulation and multiple access scheme employed by the access network 900 may vary depending on the particular telecommunications standard being deployed. By way of example, the standard may include Evolution-Data Optimized (EV-DO) or Ultra Mobile Broadband (UMB). EV-DO and UMB are air interface standards promulgated by the 3GPP2 as part of the cdma2000 family of standards and employ CDMA to provide broadband Internet access to user equipment (e.g., mobile stations). 
The standard may alternately be Universal Terrestrial Radio Access (UTRA) employing Wideband-CDMA (W-CDMA) and other variants of CDMA, such as TD-SCDMA; Global System for Mobile Communications (GSM) employing TDMA; and Evolved UTRA (E-UTRA), Ultra Mobile Broadband (UMB), IEEE 802.11 (Wi-Fi), IEEE 802.16 (WiMAX), IEEE 802.20, and Flash-OFDM employing OFDMA. UTRA, E-UTRA, UMTS, Long-Term Evolution (LTE), LTE Advanced, and GSM are described in documents from the 3GPP organization. cdma2000 and UMB are described in documents from the 3GPP2 organization. The actual wireless communication standard and the multiple access technology employed will depend on the specific application and the overall design constraints imposed on the system.
- Referring to
FIG. 10, a block diagram of a base station 1010 in communication with a UE 1050 is disclosed and generally designated 1000. The base station 1010 may be the Node B 708 or the BTS 862 in FIGS. 7-8, and the UE 1050 may be the UE 710 or the UE 850 in FIGS. 7-8. In the downlink communication, a transmit processor 1020 may receive data from a data source 1012 and control signals from a controller/processor 1040. The transmit processor 1020 provides various signal processing functions for the data and control signals, as well as reference signals (e.g., pilot signals). For example, the transmit processor 1020 may provide cyclic redundancy check (CRC) codes for error detection, coding and interleaving to facilitate forward error correction (FEC), mapping to signal constellations based on various modulation schemes (e.g., binary phase-shift keying (BPSK), quadrature phase-shift keying (QPSK), M-phase-shift keying (M-PSK), M-quadrature amplitude modulation (M-QAM), and the like), spreading with orthogonal variable spreading factors (OVSF), and multiplying with scrambling codes to produce a series of symbols. Channel estimates from a channel processor 1044 may be used by the controller/processor 1040 to determine the coding, modulation, spreading, and/or scrambling schemes for the transmit processor 1020. These channel estimates may be derived from a reference signal transmitted by the UE 1050 or from feedback from the UE 1050. The symbols generated by the transmit processor 1020 are provided to a transmit frame processor 1030 to create a frame structure. The transmit frame processor 1030 creates this frame structure by multiplexing the symbols with information from the controller/processor 1040, resulting in a series of frames. The frames are then provided to a transmitter 1032, which provides various signal conditioning functions including amplifying, filtering, and modulating the frames onto a carrier for downlink transmission over the wireless medium through the antenna 1034. 
The antenna 1034 may include one or more antennas, for example, beam-steering bidirectional adaptive antenna arrays or other similar beam technologies. - At the
UE 1050, a receiver 1054 receives the downlink transmission through an antenna 1052 and processes the transmission to recover the information modulated onto the carrier. The information recovered by the receiver 1054 is provided to a receive frame processor 1060, which parses each frame, and provides information from the frames to a channel processor 1094 and the data, control, and reference signals to a receive processor 1070. The receive processor 1070 then performs the inverse of the processing performed by the transmit processor 1020 in the base station 1010. More specifically, the receive processor 1070 descrambles and despreads the symbols, and then determines the most likely signal constellation points transmitted by the base station 1010 based on the modulation scheme. These soft decisions may be based on channel estimates computed by the channel processor 1094. The soft decisions are then decoded and deinterleaved to recover the data, control, and reference signals. The CRC codes are then checked to determine whether the frames were successfully decoded. The data carried by the successfully decoded frames will then be provided to a data sink 1072, which represents applications running in the UE 1050 and/or various user interfaces (e.g., display). Control signals carried by successfully decoded frames will be provided to a controller/processor 1090. When frames are unsuccessfully decoded by the receive processor 1070, the controller/processor 1090 may also use an acknowledgement (ACK) and/or negative acknowledgement (NACK) protocol to support retransmission requests for those frames. - In the uplink, data from a
data source 1078 and control signals from the controller/processor 1090 are provided to a transmit processor 1080. The data source 1078 may represent applications running in the UE 1050 and various user interfaces (e.g., keyboard). Similar to the functionality described in connection with the downlink transmission by the base station 1010, the transmit processor 1080 provides various signal processing functions including CRC codes, coding and interleaving to facilitate FEC, mapping to signal constellations, spreading with OVSFs, and scrambling to produce a series of symbols. Channel estimates, derived by the channel processor 1094 from a reference signal transmitted by the base station 1010 or from feedback contained in the midamble transmitted by the base station 1010, may be used to select the appropriate coding, modulation, spreading, and/or scrambling schemes. The symbols produced by the transmit processor 1080 will be provided to a transmit frame processor 1082 to create a frame structure. The transmit frame processor 1082 creates this frame structure by multiplexing the symbols with information from the controller/processor 1090, resulting in a series of frames. The frames are then provided to a transmitter 1056, which provides various signal conditioning functions including amplification, filtering, and modulating the frames onto a carrier for uplink transmission over the wireless medium through the antenna 1052. - The uplink transmission is processed at the
base station 1010 in a manner similar to that described in connection with the receiver function at the UE 1050. A receiver 1035 receives the uplink transmission through the antenna 1034 and processes the transmission to recover the information modulated onto the carrier. The information recovered by the receiver 1035 is provided to a receive frame processor 1036, which parses each frame, and provides information from the frames to the channel processor 1044 and the data, control, and reference signals to a receive processor 1038. The receive processor 1038 performs the inverse of the processing performed by the transmit processor 1080 in the UE 1050. The data and control signals carried by the successfully decoded frames may then be provided to a data sink 1039 and the controller/processor 1040, respectively. If some of the frames were unsuccessfully decoded by the receive processor, the controller/processor 1040 may also use an ACK and/or NACK protocol to support retransmission requests for those frames. - The controller/
processors 1040 and 1090 may direct the operation at the base station 1010 and the UE 1050, respectively. For example, the controller/processors 1040 and 1090 may be associated with memories that store program code and data for the base station 1010 and the UE 1050, respectively. A scheduler/processor 1046 at the base station 1010 may be used to allocate resources to the UEs and schedule downlink and/or uplink transmissions for the UEs. - In various examples, a UE in a wireless network with EVS coverage may be handed over to a wireless network without EVS coverage, i.e., a non-native EVS system. For example, a UE within LTE coverage may be handed over to another coverage, e.g., 3GPP2 coverage, without EVS. A transcoder may be used to enable compatibility for EVS coverage, with a possible increase in delay and decrease in audio quality due to the need for transcoding between different formats.
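The CRC check and ACK/NACK decision performed by the receive processors above can be sketched as a toy model. This is an illustrative sketch only: it uses a generic 32-bit CRC from Python's standard library, whereas the actual CRC polynomial, frame layout, and retransmission protocol are defined by the applicable 3GPP specifications.

```python
import zlib

def attach_crc(payload: bytes) -> bytes:
    """Transmit side: append a 32-bit CRC so the receiver can detect errors."""
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return payload + crc.to_bytes(4, "big")

def check_and_ack(frame: bytes):
    """Receive side: verify the CRC and choose ACK or NACK.

    Returns (payload, "ACK") when the frame decodes cleanly, or
    (None, "NACK") so the controller/processor can request retransmission.
    """
    payload, received_crc = frame[:-4], int.from_bytes(frame[-4:], "big")
    if zlib.crc32(payload) & 0xFFFFFFFF == received_crc:
        return payload, "ACK"
    return None, "NACK"

frame = attach_crc(b"downlink data")
print(check_and_ack(frame)[1])  # prints: ACK

corrupted = bytes([frame[0] ^ 0xFF]) + frame[1:]  # simulate a channel error
print(check_and_ack(corrupted)[1])  # prints: NACK
```

A successfully checked frame would flow on to the data sink; a failed check would drive the NACK path of the retransmission protocol.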
- Referring to
FIG. 11, a block diagram of a particular illustrative implementation of a device (e.g., a wireless communication device) for smart coding mode switching is depicted and generally designated 1100. In various implementations, the device 1100 may have more or fewer components than illustrated in FIG. 11. In an illustrative implementation, the device 1100 may correspond to one or more of the first device 102, the second device 122 of FIG. 1, and the UE 850 of FIG. 8. In an illustrative implementation, the device 1100 may operate according to one or more of the systems or methods described with reference to FIGS. 1-10. - In a particular implementation, the
device 1100 includes a processor 1106 (e.g., a central processing unit (CPU)). The device 1100 may include one or more additional processors 1110 (e.g., one or more digital signal processors (DSPs)). The processors 1110 may include the first speech vocoder 104. In an alternate implementation, the first speech vocoder 104 may be included in a different type of processor, such as a CPU (e.g., the processor 1106). - The
device 1100 may include the memory 1152 and a CODEC 1134. The memory 1152 may include instructions 1156 that are executable by the processors 1110. The device 1100 may include a wireless controller 1140 coupled, via a transceiver 1150, to an antenna 1142. In a particular implementation, the transceiver 1150 may include the receiver 106, the transmitter 108, or both, of FIG. 1. As a further example, the antenna 1142 may receive incoming packets corresponding to a sequence of packets sent by another device via a network. The received packets may correspond to a sequence of frames of a user speech signal. - The
device 1100 may include a display 1128 coupled to a display controller 1126. The speaker 1148, the microphone 1146, or both, may be coupled to the CODEC 1134. The CODEC 1134 may include a digital-to-analog converter 1102 and an analog-to-digital converter 1104. In a particular implementation, the CODEC 1134 may receive analog signals from the microphone 1146, convert the analog signals to digital signals using the analog-to-digital converter 1104, and provide the digital signals to the first speech vocoder 104. The first speech vocoder 104 may process the digital signals. In a particular implementation, the first speech vocoder 104 may provide digital signals to the CODEC 1134. The CODEC 1134 may convert the digital signals to analog signals using the digital-to-analog converter 1102 and may provide the analog signals to the speaker 1148. - The
first speech vocoder 104 may include the encoder 110, the decoder 114, the analyzer 116, and the de-jitter buffer 112. The decoder 114, the analyzer 116, and the de-jitter buffer 112 may be used to implement hardware that performs smart coding mode switching as described herein. Alternatively, or in addition, a software implementation (or a combined software/hardware implementation) may be used. For example, the memory 1152 may include instructions 1156 executable by the processors 1110 or another processing unit of the device 1100 (e.g., the processor 1106, the CODEC 1134, or both) to perform the method 500 of FIG. 5 or the method 600 of FIG. 6. - The
device 1100 may be included in a system-in-package or system-on-chip device 1122. In a particular implementation, the memory 1152, the processor 1106, the processors 1110, the display controller 1126, the CODEC 1134, and the wireless controller 1140 may be included in the system-in-package or system-on-chip device 1122. In a particular implementation, an input device 1130 and a power supply 1144 are coupled to the system-on-chip device 1122. Moreover, as illustrated in FIG. 11, the display 1128, the input device 1130, the speaker 1148, the microphone 1146, the antenna 1142, and the power supply 1144 are external to the system-on-chip device 1122. In a particular implementation, each of the display 1128, the input device 1130, the speaker 1148, the microphone 1146, the antenna 1142, and the power supply 1144 may be coupled to a component of the system-on-chip device 1122, such as an interface or a controller. - In an illustrative implementation, the
processors 1110 may be operable to perform all or a portion of the methods or operations described with reference to FIGS. 1-10. As an illustrative example, the processors 1110 may be configured to execute software (e.g., a program of one or more instructions 1156) stored in the memory 1152. For example, the processors 1110 may be configured to operate in accordance with the method 500 of FIG. 5, the method 600 of FIG. 6, or a combination thereof. To illustrate, the processors 1110 may be configured to execute the instructions 1156 that cause the processors 1110 to perform the operations of receiving a first data at a de-jitter buffer 112. The first data may be encoded by a transmitting device (e.g., the first device 102) based on a first coding mode indicating both primary frame encoding and redundant frame encoding. The processor may also perform the operations of decoding the first data by the decoder 114. The first data may include both a partial copy of a first frame and a primary copy of a second frame. - The processor may also perform the operations of determining whether a condition for coding mode change is satisfied. The condition to trigger a coding mode change may be based on statistical information indicating the number of received partial copies decoded by the redundant frame decoding block, a real-time quality metric of the decoded speech signal, or information from another layer.
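As a rough illustration, the trigger condition described above might be evaluated as follows. This is a sketch under assumed thresholds: the parameter names `partial_use_ceiling` and `quality_floor` and their values are illustrative assumptions, not figures from the disclosure.

```python
def should_request_primary_only(partial_copies_decoded: int,
                                total_frames: int,
                                speech_quality: float,
                                partial_use_ceiling: float = 0.05,
                                quality_floor: float = 3.5) -> bool:
    """Decide whether to ask the sender to drop redundant (partial-copy) encoding.

    partial_copies_decoded: frames recovered via the redundant frame decoding path
    total_frames: frames received in the current measurement window
    speech_quality: real-time quality metric of the decoded speech signal
    """
    if total_frames == 0:
        return False  # no measurements yet, keep the current mode
    partial_use_rate = partial_copies_decoded / total_frames
    # If partial copies are rarely needed and quality is already high, the
    # redundancy is mostly bitrate overhead: request primary-only encoding.
    return partial_use_rate < partial_use_ceiling and speech_quality >= quality_floor

print(should_request_primary_only(1, 200, 4.2))   # prints: True
print(should_request_primary_only(30, 200, 3.0))  # prints: False
```

Information from another layer (e.g., radio-link measurements) could be folded in as additional conjuncts in the same decision.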
- The processor may also perform the operations of transmitting a request to the transmitting device in response to determining that the condition is satisfied. The request may indicate a coding mode change from the first coding mode to a second coding mode, which corresponds to encoding using only the primary frame encoding scheme, without the redundant frame encoding scheme. The processor may also perform the operations of receiving a second data at the de-jitter buffer of the receiving device, where the second data was encoded by the transmitting device based on the second coding mode.
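Putting the pieces together, the receiver-side flow described above (decode first-mode data, check the condition, request the switch, then expect second-mode data) can be sketched as a small state holder. The constants, class, and method names here are illustrative assumptions, not identifiers from the disclosure.

```python
PRIMARY_AND_REDUNDANT = 1  # first coding mode: primary + partial (redundant) copies
PRIMARY_ONLY = 2           # second coding mode: primary copies only

class ReceiverModeSwitch:
    """Tracks the coding mode the receiver expects and, once the switch
    condition holds, sends a single mode-change request to the transmitter."""

    def __init__(self, transmit_request):
        self.mode = PRIMARY_AND_REDUNDANT
        self.transmit_request = transmit_request  # callback toward the sender

    def on_decoded_frame(self, condition_satisfied: bool) -> None:
        if self.mode == PRIMARY_AND_REDUNDANT and condition_satisfied:
            self.transmit_request(PRIMARY_ONLY)  # ask for primary-only encoding
            self.mode = PRIMARY_ONLY             # expect second-mode data next

requests = []
rx = ReceiverModeSwitch(requests.append)
rx.on_decoded_frame(condition_satisfied=False)  # condition not met: stay in mode 1
rx.on_decoded_frame(condition_satisfied=True)   # triggers the mode-change request
print(rx.mode, requests)  # prints: 2 [2]
```

In a real vocoder the callback would carry the request over the signaling path to the transmitting device's encoder.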
- In conjunction with the described implementations, an apparatus may include means for receiving a first data. For example, the means for receiving the first data may include the
receiver 106, the transceiver 1150, the de-jitter buffer 112, one or more other devices, circuits, or modules. The first data may be encoded by a transmitting device based on a first coding mode indicating both primary frame encoding and redundant frame encoding. The apparatus may also include means for decoding the first data at the receiving device. For example, the means for decoding the first data may include the decoder 114, the first speech vocoder 104, the processors 1106, 1110, one or more other devices, circuits, or modules. The first data may include a partial copy of a first frame of the audio signal and a primary copy of a second frame of the audio signal. - The apparatus may also include means for determining whether a condition for coding mode change is satisfied. For example, the means for determining the condition for coding mode change may include the
analyzer 116, the first speech vocoder 104, the processors 1106, 1110, one or more other devices, circuits, or modules. The apparatus may also include means for transmitting a request to the transmitting device in response to determining that the condition is satisfied. For example, the means for transmitting the request may include the transmitter 108, the transceiver 1150, one or more other devices, circuits, or modules. The request may indicate a coding mode change to a second coding mode corresponding to primary frame encoding without redundant frame encoding. - The apparatus may also include means for receiving a second data at the receiving device. For example, the means for receiving the second data may include the
receiver 106, the transceiver 1150, the de-jitter buffer 112, one or more other devices, circuits, or modules. The second data may be encoded by the transmitting device based on the second coding mode. The apparatus may also include means for decoding the second data. For example, the means for decoding the second data may include the decoder 114, the first speech vocoder 104, the processors 1106, 1110, one or more other devices, circuits, or modules. The second data may include a primary copy of a third frame of the audio signal. - One or more of the disclosed aspects may be implemented in a system or an apparatus, such as the
device 1100, that may include a communications device, a fixed location data unit, a mobile location data unit, a mobile phone, a cellular phone, a satellite phone, a computer, a tablet, a portable computer, a display device, a media player, or a desktop computer. Alternatively or additionally, the device 1100 may include a set top box, an entertainment unit, a navigation device, a personal digital assistant (PDA), a monitor, a computer monitor, a television, a tuner, a radio, a satellite radio, a music player, a digital music player, a portable music player, a video player, a digital video player, a digital video disc (DVD) player, a portable digital video player, a satellite, a vehicle, any other device that includes a processor or that stores or retrieves data or computer instructions, or a combination thereof. As another illustrative, non-limiting example, the system or the apparatus may include remote units, such as hand-held personal communication systems (PCS) units, portable data units such as GPS-enabled devices, meter reading equipment, or any other device that includes a processor or that stores or retrieves data or computer instructions, or any combination thereof. - Although one or more of
FIGS. 1-11 may illustrate systems, apparatuses, and/or methods according to the teachings of the disclosure, the disclosure is not limited to these illustrated systems, apparatuses, and/or methods. One or more functions or components of any of FIGS. 1-11 as illustrated or described herein may be combined with one or more other portions of another function or component of FIGS. 1-11. Accordingly, no single example described herein should be construed as limiting, and examples of the disclosure may be suitably combined without departing from the teachings of the disclosure. As an example, one or more of the methods of FIGS. 5-6, individually or in combination, may be performed by the processors 1110 of FIG. 11. To illustrate, one or more operations described with reference to FIGS. 5-6 may be optional, may be performed at least partially concurrently, and/or may be performed in a different order than shown or described. - Those of skill would further appreciate that the various illustrative logical blocks, configurations, modules, circuits, and algorithm steps described in connection with the implementations disclosed herein may be implemented as electronic hardware, computer software executed by a processor, or combinations of both. Various illustrative components, blocks, configurations, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or processor-executable instructions depends upon the particular application and the design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
- The steps of a method or algorithm described in connection with the implementations disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), flash memory, read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), registers, hard disk, a removable disk, a compact disc read-only memory (CD-ROM), or any other form of non-transient storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor may read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). The ASIC may reside in a computing device or a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a computing device or user terminal.
- The previous description of the disclosed aspects is provided to enable a person skilled in the art to make or use the disclosed aspects. Various modifications to these aspects will be readily apparent to those skilled in the art, and the principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope possible consistent with the principles and novel features as defined by the following claims.
Claims (30)
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/997,531 US10475456B1 (en) | 2018-06-04 | 2018-06-04 | Smart coding mode switching in audio rate adaptation |
PCT/US2019/032864 WO2019236267A1 (en) | 2018-06-04 | 2019-05-17 | Smart coding mode switching in audio rate adaptation |
CN201980037497.5A CN112236818A (en) | 2018-06-04 | 2019-05-17 | Intelligent coding mode switching in audio rate adaptation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US15/997,531 US10475456B1 (en) | 2018-06-04 | 2018-06-04 | Smart coding mode switching in audio rate adaptation |
Publications (2)
Publication Number | Publication Date |
---|---|
US10475456B1 US10475456B1 (en) | 2019-11-12 |
US20190371345A1 true US20190371345A1 (en) | 2019-12-05 |
Family
ID=66770585
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/997,531 Active 2038-07-26 US10475456B1 (en) | 2018-06-04 | 2018-06-04 | Smart coding mode switching in audio rate adaptation |
Country Status (3)
Country | Link |
---|---|
US (1) | US10475456B1 (en) |
CN (1) | CN112236818A (en) |
WO (1) | WO2019236267A1 (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9380327B2 (en) * | 2011-12-15 | 2016-06-28 | Comcast Cable Communications, Llc | System and method for synchronizing timing across multiple streams |
JP2022543292A (en) * | 2019-08-05 | 2022-10-11 | シュアー アクイジッション ホールディングス インコーポレイテッド | transmit antenna diversity wireless audio system |
US11646042B2 (en) * | 2019-10-29 | 2023-05-09 | Agora Lab, Inc. | Digital voice packet loss concealment using deep learning |
US11996932B2 (en) * | 2021-12-16 | 2024-05-28 | Nokia Solutions And Networks Oy | Active-active TDM PW with asymmetry control |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6300888B1 (en) * | 1998-12-14 | 2001-10-09 | Microsoft Corporation | Entrophy code mode switching for frequency-domain audio coding |
US7599840B2 (en) * | 2005-07-15 | 2009-10-06 | Microsoft Corporation | Selectively using multiple entropy models in adaptive coding and decoding |
AU2009267518B2 (en) * | 2008-07-11 | 2012-08-16 | Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. | Apparatus and method for encoding/decoding an audio signal using an aliasing switch scheme |
CN101478782B (en) * | 2009-01-20 | 2011-03-30 | 北京天碁科技有限公司 | Method and apparatus for hybrid automatic repeat request and channel decoding |
CN101741516B (en) * | 2009-12-29 | 2013-05-08 | 电子科技大学 | Speed-free mode-based frame level encoding and decoding method |
CN105355209B (en) * | 2010-07-02 | 2020-02-14 | 杜比国际公司 | Pitch enhancement post-filter |
US9026434B2 (en) * | 2011-04-11 | 2015-05-05 | Samsung Electronic Co., Ltd. | Frame erasure concealment for a multi rate speech and audio codec |
US9275644B2 (en) * | 2012-01-20 | 2016-03-01 | Qualcomm Incorporated | Devices for redundant frame coding and decoding |
US10614816B2 (en) * | 2013-10-11 | 2020-04-07 | Qualcomm Incorporated | Systems and methods of communicating redundant frame information |
TWI602172B (en) * | 2014-08-27 | 2017-10-11 | 弗勞恩霍夫爾協會 | Encoder, decoder and method for encoding and decoding audio content using parameters for enhancing a concealment |
US10148391B2 (en) * | 2015-10-01 | 2018-12-04 | Telefonaktiebolaget Lm Ericsson (Publ) | Method and apparatus for removing jitter in audio data transmission |
US10049682B2 (en) * | 2015-10-29 | 2018-08-14 | Qualcomm Incorporated | Packet bearing signaling information indicative of whether to decode a primary coding or a redundant coding of the packet |
- 2018-06-04: US application US15/997,531 filed; granted as US10475456B1 (Active)
- 2019-05-17: PCT application PCT/US2019/032864 filed; published as WO2019236267A1
- 2019-05-17: CN application 201980037497.5 filed; published as CN112236818A (Pending)
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11522816B2 (en) | 2019-09-25 | 2022-12-06 | MIXHalo Corp. | Multi-stride packet payload mapping for robust transmission of data |
US11646979B2 (en) * | 2019-09-25 | 2023-05-09 | MIXHalo Corp. | Packet payload mapping for robust transmission of data |
US11716294B2 (en) | 2019-09-25 | 2023-08-01 | MIXHalo Corp. | Multi-stride packet payload mapping for robust transmission of data |
US20220059101A1 (en) * | 2019-11-27 | 2022-02-24 | Tencent Technology (Shenzhen) Company Limited | Voice processing method and apparatus, computer-readable storage medium, and computer device |
US11869516B2 (en) * | 2019-11-27 | 2024-01-09 | Tencent Technology (Shenzhen) Company Limited | Voice processing method and apparatus, computer- readable storage medium, and computer device |
US11546705B2 (en) | 2020-06-22 | 2023-01-03 | Gn Audio A/S | Adaptive multi-band hearing device |
Also Published As
Publication number | Publication date |
---|---|
CN112236818A (en) | 2021-01-15 |
US10475456B1 (en) | 2019-11-12 |
WO2019236267A1 (en) | 2019-12-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10475456B1 (en) | Smart coding mode switching in audio rate adaptation | |
US9047863B2 (en) | Systems, methods, apparatus, and computer-readable media for criticality threshold control | |
US6832195B2 (en) | System and method for robustly detecting voice and DTX modes | |
KR102463648B1 (en) | Enhanced voice services (evs) in 3gpp2 network | |
CN109644444B (en) | Method, apparatus, device and computer readable storage medium for wireless communication | |
EP2706709A2 (en) | Speech signal processing method and device, and access network system | |
WO2014101212A1 (en) | Multi-rate voice service channel encoding method, channel decoding method and device | |
WO2019232755A1 (en) | Data transmission method and device | |
US20150319743A1 (en) | Repeated fast associated control channel |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FEPP | Fee payment procedure |
Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
AS | Assignment |
Owner name: QUALCOMM INCORPORATED, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:BHATIA, SARANDEEP;LEE, TONY;RAMACHANDRAN, SHYAMAL;AND OTHERS;SIGNING DATES FROM 20180708 TO 20180724;REEL/FRAME:046614/0900 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 4 |