WO2010000303A1 - Speech decoder with error concealment ("Décodeur de parole avec dissimulation d'erreur")

Publication number: WO2010000303A1
Authority: WIPO (PCT)
Application number: PCT/EP2008/058400
Other languages: English (en)
Inventors: Pasi Ojala, Ari Lakaniemi
Original Assignee: Nokia Corporation
Application filed by Nokia Corporation
Priority to PCT/EP2008/058400
Publication of WO2010000303A1

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/005 - Correction of errors induced by the transmission channel, if related to the coding algorithm

Definitions

  • the present invention relates to an apparatus and method for decoding, and in particular, but not exclusively, to an apparatus and method suitable for the decoding of speech and audio related signals.
  • Audio signals, such as speech or music, may be coded for transmission or storage.
  • Speech encoders and decoders are usually optimised for speech signals, and can operate at either a fixed or variable bit rate.
  • Speech codecs may be provided to code the audio signal in various communication standards.
  • speech codecs may be used for communication on mobile networks such as those based on the WCDMA (wideband code division multiple access), GSM/EDGE (Global System for Mobile communications / Enhanced Data rates for GSM Evolution) and other 3G networks.
  • the speech coding may be used in both circuit switched and packet switched domains. It may also be used in messaging type applications, such as multimedia messaging (MMS).
  • the Adaptive Multi-Rate (AMR) wideband codec was developed by the Third Generation Partnership Project (3GPP) for GSM-EDGE and WCDMA communication networks.
  • AMR is based on the algebraic code excited linear prediction (ACELP) coding algorithm. Details of the AMR codec can be found in the 3GPP TS 26.090 technical specification.
  • the transport of such digitised information over wireless channels invariably introduces errors into the transmitted data. These errors typically manifest themselves either as individual bit errors in the received data frame, or in the extreme case as the loss of the entire data frame.
  • the media receiver may incorporate an error concealment functionality. This has the effect of masking errors in the reconstructed signal and is found to be reasonably effective for error frame rates of up to 2-10 % for speech based communication, depending on the employed codec technology.
  • bit errors in the received signal may be detected using error detection schemes such as those based on cyclic redundancy check (CRC) coding.
  • the error detection scheme may be designed such that the length of the error detection code is sufficiently long that all bits which comprise the source data frame are encompassed by the error detection code.
  • error detection systems are developed such that the data within the source frame is categorised according to its relative perceptual importance. This allows the more sensitive categories of bits to be covered by a checksum, perhaps at the expense of the less important categories of bits.
  • Most communications systems deploy an arrangement in the receiver whereby the received source data frame may be checked for errors in order to determine the extent of the corruption.
  • the results of the checksum can be used to ascertain whether vital information has been corrupted in the received source frame, in which case it may be necessary to discard the frame, or whether only partial corruption has taken place and it may still be possible to use some of the data bits during the source decoding process.
  • the results of the checksum may be fed to the error concealment process within the media or source decoder, in order to facilitate the decoding of the corrupted source frame.
  • error detection is deployed on the most important class A speech bits.
  • the lesser classes of speech bits in the AMR encoded frame, known as class B and class C bits, do not have a CRC checksum for error detection purposes.
  • the CRC checksum for class A bits is decoded in order to determine if any of the most important bits have been corrupted. If errors have been detected within the class A bits then the received frame is marked as being corrupted and the AMR decoder will not use the frame for normal decoding; instead, the frame is used as part of the error concealment process.
  • a further example of the use of error concealment within the AMR codec occurs when the codec is deployed over a packet switched network as part of a Voice over IP (VoIP) communication link.
  • the underlying Internet Protocol (IP) network is not specifically tailored for conversational media transport, and consequently the lower IP network layers as such are not tuned to tolerate any level of corruption in the received data packets. The receiver or an intermediate network node will therefore simply discard the corrupted packet. Contrary to the circuit switched mode of operation of a codec, corruption in a packet received over an IP network will therefore typically not result in the use of any of the corrupted packet in the decoding process.
  • Most speech coding systems have provisions for the handling of lost frames.
  • the third generation reference implementation for the 3GPP AMR codec applies particular provisions when the decoder is notified of a missing frame.
  • the mechanism implemented can comprise a number of different options for combating the effect of lost frames.
  • the deployment of a particular option is dependent on the current operating conditions and decoding state at the point in time the decoder is notified of the missing frame.
  • the 3GPP standard TS 26.093 has identified and catered for the notification of a lost frame during the following decoder operating conditions:
  • the first operating condition caters for the case when the decoder is operating in a normal mode of operation in that it is decoding frames of speech. Whilst operating in this mode the codec is said to be in a speech state.
  • the decoder is then notified of a lost frame by receiving a NO-DATA frame; upon receipt of the notification the decoder changes to an error state of operation and performs error concealment in place of the missing frame.
  • the second operating condition caters for the case when the decoder is operating in a discontinuous transmission (DTX) mode of operation and the decoder is in the silence mode, that is, the decoder is receiving silence insertion descriptor (SID) updates for the generation of comfort noise. If the decoder is notified of a lost frame whilst operating in this mode, the decoder simply continues in the silence mode, and uses the previous value of the SID update parameters in order to generate the comfort noise for the missing frame.
  • the third operating condition caters for the case of the decoder returning from the error state to the normal operating speech state when a valid frame is received after a run of one or more missing or invalid frames.
  • upon receipt of the valid frame the speech decoder returns to the normal operating speech state, whereby the decoder simply decodes the frame according to the received frame type, taking necessary precautions in the knowledge that the current state of the decoder is not synchronised with the encoder due to one or more missing or invalid frames.
  • in a normal mode of operation the decoder will simply decode the newly received first valid speech frame after a region of silence according to the received frame type. However, there is a mismatch between encoder and decoder memory states at this point.
  • the speech data will be encoded at the encoder using codec memories which are formed from previous frames, which in this particular instance will comprise frames of speech, whilst at the decoder the memory state will be populated with samples based on the preceding comfort noise parameters. Consequently this will result in the computation of erroneous speech parameter values at the decoder. It is to be understood that this situation is not particular to this operating scenario; the mismatch will occur whenever the decoder returns to a normal mode of operation after a run of received invalid frames.
  • This invention proceeds from the consideration that, as part of a speech coding scheme deploying error concealment for the masking of lost speech frames, there is a need to be able to compensate for the mismatch between the memories at the encoder and decoder at the point when the decoder resumes a normal mode of operation after a run of invalid speech frames. In most operating conditions this mismatch between memories is not problematic; however, in some operating conditions the mismatch between coding states and memories can result in particularly disturbing artefacts in the decoded speech signal.
  • Embodiments of the present invention aim to address the above problem.
  • a method comprising: detecting, in a bitstream comprising a plurality of speech data frames, the absence of at least one speech data frame; determining that the absent at least one speech data frame is associated with a transition from a silence signal type frame to a speech signal type frame; and adapting at least one speech parameter associated with at least one speech data frame succeeding the absent at least one speech data frame, dependent on the determination that the absent at least one speech data frame is associated with the transition from the silence signal type frame to the speech signal type frame.
  • determining that the absent at least one speech data frame is associated with the transition from the silence signal type frame to the speech signal type frame may comprise: determining that the absent at least one speech data frame directly succeeds at least one silence signal type data frame.
  • the plurality of speech data frames may be grouped into packets and the packets may further comprise a packet header.
  • Detecting the absent at least one speech data frame may comprise: reading from a first data packet a first data packet header value, the first data packet comprising a first speech data frame; reading from a second data packet a second data packet header value, the second data packet comprising a second speech data frame; calculating a difference value between the first data packet header value and the second data packet header value; and determining that the difference value is indicative of the absent at least one speech data frame between the first speech data frame and the second speech data frame.
  • Determining that the absent at least one speech data frame is associated with the transition from the silence signal type frame to the speech signal type frame may comprise: determining that the first speech data frame is the silence signal type frame; determining that the second speech data frame is the speech signal type frame; and determining that a second data packet header value of the first received speech data frame and a second data packet header value of the second received speech data frame are of the same value.
  • Adapting at least one speech parameter associated with at least one further speech data frame succeeding the absent at least one speech data frame, dependent on the determination that the absent at least one speech data frame is associated with the transition from the silence signal type frame to the speech signal type frame, may comprise: applying a bandwidth expansion factor to at least one spectral coefficient associated with the at least one speech data frame succeeding the absent at least one speech data frame.
  • At least one second bandwidth expansion factor is preferably applied to at least one spectral coefficient associated with at least one further speech data frame succeeding the at least one speech data frame succeeding the absent at least one speech data frame.
  • the applying of a bandwidth expansion factor may comprise: multiplying each of the at least one spectral coefficient by the bandwidth expansion factor, wherein an exponent of the bandwidth expansion factor is incremented with a numerical order value associated with a spectral coefficient order value of each of the at least one spectral coefficient.
  • the at least one second bandwidth expansion factor applied to each of the at least one spectral coefficient of the at least one further speech data frame is preferably increased in value relative to the bandwidth expansion factor.
  • the at least one second bandwidth expansion factor applied to each of the at least one spectral coefficient associated with each of the at least one further speech data frame is preferably increased in value relative to the speech data frame order value.
  • the at least one second bandwidth expansion factor applied to each of the at least one spectral coefficient associated with each of the at least one further speech data frame is preferably increased in at least one of the following ways: linearly; and exponentially.
  • the spectral coefficient is preferably a linear prediction coding coefficient.
  • Adapting at least one speech parameter associated with at least one further speech data frame succeeding the absent at least one speech data frame, dependent on the determination that the absent at least one speech data frame is associated with the transition from the silence signal type frame to the speech signal type frame, may comprise: applying an attenuation factor to at least one excitation vector gain value associated with the at least one speech data frame succeeding the absent at least one speech data frame.
  • At least one second attenuation factor may be further applied to at least one excitation vector gain value associated with at least one further speech data frame succeeding the at least one speech data frame succeeding the absent at least one speech data frame.
  • the at least one second attenuation factor applied to each of the at least one further speech data frame is preferably decreased relative to the attenuation factor.
  • the at least one second attenuation factor applied to each of the at least one further speech data frame is preferably decreased in at least one of the following ways: exponentially; and linearly.
  • the first data packet header value is preferably at least one of: a real time transport protocol time stamp; and a real time transport protocol sequence number.
  • the second data packet header value is preferably a real time transport protocol payload specific indicator.
  • the real time transport protocol payload specific indicator is preferably a real time transport protocol M bit.
  • an apparatus for decoding a speech signal configured to: detect, in a bitstream comprising a plurality of speech data frames, the absence of at least one speech data frame; determine that the absent at least one speech data frame is associated with a transition from a silence signal type frame to a speech signal type frame; and adapt at least one speech parameter associated with at least one speech data frame succeeding the absent at least one speech data frame, dependent on the determination that the absent at least one speech data frame is associated with the transition from the silence signal type frame to the speech signal type frame.
  • the apparatus configured to determine that the absent at least one speech data frame is associated with the transition from the silence signal type frame to the speech signal type frame may be further configured to: determine that the absent at least one speech data frame directly succeeds at least one silence signal type data frame.
  • the plurality of speech data frames may be grouped into packets and the packets may further comprise a packet header.
  • the apparatus configured to detect the absent at least one speech data frame may be further configured to: read from a first data packet a first data packet header value, the first data packet comprising a first speech data frame; read from a second data packet a second data packet header value, the second data packet comprising a second speech data frame; calculate a difference value between the first data packet header value and the second data packet header value; and determine that the difference value is indicative of the absent at least one speech data frame between the first speech data frame and the second speech data frame.
  • the apparatus configured to determine that the absent at least one speech data frame is associated with the transition from the silence signal type frame to the speech signal type frame may be further configured to: determine that the first speech data frame is the silence signal type frame; determine that the second speech data frame is the speech signal type frame; and determine that a second data packet header value of the first received speech data frame and a second data packet header value of the second received speech data frame are of the same value.
  • the apparatus configured to adapt at least one speech parameter associated with at least one further speech data frame succeeding the absent at least one speech data frame, dependent on the determination that the absent at least one speech data frame is associated with the transition from the silence signal type frame to the speech signal type frame, may be further configured to: apply a bandwidth expansion factor to at least one spectral coefficient associated with the at least one speech data frame succeeding the absent at least one speech data frame.
  • the apparatus may be further configured to apply at least one second bandwidth expansion factor to at least one spectral coefficient associated with at least one further speech data frame succeeding the at least one speech data frame succeeding the absent at least one speech data frame.
  • the apparatus configured to apply a bandwidth expansion factor may be further configured to: multiply each of the at least one spectral coefficient by the bandwidth expansion factor, wherein an exponent of the bandwidth expansion factor is incremented with a numerical order value associated with a spectral coefficient order value of each of the at least one spectral coefficient.
  • the apparatus may be further configured to apply the at least one second bandwidth expansion factor to each of the at least one spectral coefficient of the at least one further speech data frame, wherein the at least one second bandwidth expansion factor is increased in value relative to the bandwidth expansion factor.
  • the apparatus may be further configured to apply the at least one second bandwidth expansion factor to each of the at least one spectral coefficient associated with each of the at least one further speech data frame, wherein the at least one second bandwidth expansion factor is increased in value relative to the speech data frame order value.
  • the apparatus may be further configured to increase the at least one second bandwidth expansion factor applied to each of the at least one spectral coefficient associated with each of the at least one further speech data frame by at least one of the following ways: linearly; and exponentially.
  • the spectral coefficient may be a linear prediction coding coefficient.
  • the apparatus configured to adapt at least one speech parameter associated with at least one further speech data frame succeeding the absent at least one speech data frame, dependent on the determination that the absent at least one speech data frame is associated with the transition from the silence signal type frame to the speech signal type frame, may be further configured to: apply an attenuation factor to at least one excitation vector gain value associated with the at least one speech data frame succeeding the absent at least one speech data frame.
  • the apparatus may be further configured to apply at least one second attenuation factor to at least one excitation vector gain value associated with at least one further speech data frame succeeding the at least one speech data frame succeeding the absent at least one speech data frame.
  • the apparatus may be further configured to apply at least one second attenuation factor to each of the at least one further speech data frame, wherein the at least one second attenuation factor is decreased relative to the attenuation factor.
  • the apparatus may be further configured to decrease the at least one second attenuation factor applied to each of the at least one further speech data frame in at least one of the following ways: exponentially; and linearly.
  • the first data packet header value may be at least one of: a real time transport protocol time stamp; and a real time transport protocol sequence number.
  • the second data packet header value may be a real time transport protocol payload specific indicator.
  • the real time transport protocol payload specific indicator may be a real time transport protocol M bit.
  • An electronic device may comprise an apparatus as described above.
  • a chip set may comprise an apparatus as described above.
  • a computer program product configured to perform a method of decoding a speech signal, comprising: detecting, in a bitstream comprising a plurality of speech data frames, the absence of at least one speech data frame; determining that the absent at least one speech data frame is associated with a transition from a silence signal type frame to a speech signal type frame; and adapting at least one speech parameter associated with at least one speech data frame succeeding the absent at least one speech data frame, dependent on the determination that the absent at least one speech data frame is associated with the transition from the silence signal type frame to the speech signal type frame.
  • Figure 1 shows schematically an electronic device employing embodiments of the invention
  • Figure 2 shows schematically a decoder system employing embodiments of the invention
  • Figure 3 shows schematically a decoder deploying a first embodiment of the invention
  • Figure 4 shows a flow diagram illustrating the operation of the decoder according to embodiments of the invention
  • Figure 5 shows a flow diagram illustrating in further detail a part of the operation of an embodiment of the decoder as shown in figure 4;
  • Figure 6 depicts an Adaptive Multirate (AMR) speech encoded frame structure according to an example of a first embodiment of the invention
  • Figure 7 shows schematically in further detail a part of the decoder deploying a first embodiment of the invention.
  • Figure 8 shows a flow diagram illustrating in further detail a further part of the decoder as shown in figure 3.
  • Figure 1 shows a schematic block diagram of an exemplary electronic device 10 or apparatus, which may incorporate a codec according to an embodiment of the invention.
  • the electronic device 10 may for example be a mobile terminal or user equipment of a wireless communication system.
  • the electronic device 10 comprises a microphone 11, which is linked via an analogue-to-digital converter (ADC) 14 to a processor 21.
  • the processor 21 is further linked via a digital-to-analogue converter (DAC) 32 to loudspeaker(s) 33.
  • the processor 21 is further linked to a transceiver (RX/TX) 13, to a user interface (Ul) 15 and to a memory 22.
  • the processor 21 may be configured to execute various program codes.
  • the implemented program codes 23 may comprise an audio decoding code or speech decoding code.
  • the implemented program codes 23 may be stored for example in the memory 22 for retrieval by the processor 21 whenever needed.
  • the memory 22 could further provide a section 24 for storing data, for example data that has been encoded in accordance with the invention.
  • the decoding code may in embodiments of the invention be implemented in electronic based hardware or firmware.
  • the user interface 15 enables a user to input commands to the electronic device 10, for example via a keypad, and/or to obtain information from the electronic device 10, for example via a display.
  • the transceiver 13 enables a communication with other electronic devices, for example via a wireless communication network. It is to be understood again that the structure of the electronic device 10 could be supplemented and varied in many ways.
  • a user of the electronic device 10 may use the microphone 11 for inputting speech that is to be transmitted to some other electronic device or that is to be stored in the data section 24 of the memory 22.
  • a corresponding application has been activated to this end by the user via the user interface 15.
  • This application, which may be run by the processor 21, causes the processor 21 to execute the encoding code stored in the memory 22.
  • the analogue-to-digital converter 14 converts the input analogue audio signal into a digital audio signal and provides the digital audio signal to the processor 21.
  • the electronic device 10 could receive a bit stream with correspondingly encoded data from another electronic device via its transceiver 13.
  • coded data could be stored in the data section 24 of the memory 22, for instance for a later presentation by the same electronic device 10.
  • the processor 21 may execute the decoding program code stored in the memory 22.
  • the processor 21 decodes the received data, for instance in the same way as described with reference to Figures 3 and 4, and provides the decoded data to the digital-to- analogue converter 32.
  • the digital-to-analogue converter 32 converts the digital decoded data into analogue audio data and outputs it via the loudspeaker(s) 33. Execution of the decoding program code could be triggered as well by an application that has been called by the user via the user interface 15.
  • the received encoded data could also be stored in the data section 24 of the memory 22 instead of being immediately presented via the loudspeaker(s) 33, for instance to enable a later presentation or a forwarding to still another electronic device.
  • a general decoding system 102 is illustrated schematically in figure 2.
  • the system 102 may comprise a storage or media channel (also known as a communication channel) 106 and a decoder 108.
  • the decoder 108 decompresses the bit stream 112 and produces an output audio signal 114.
  • the bit rate of the bit stream 112 and the quality of the output audio signal 114 in relation to the input signal 110 are the main features which define the performance of the coding system 102.
  • FIG. 3 shows schematically a decoder 108 according to an embodiment of the invention.
  • the decoder 108 comprises an input 302 from which the encoded stream 112 may be received via the media channel 106.
  • the input 302 may be connected to a network interface unit 301.
  • the network interface unit 301 may be configured to receive encoded data from a media or communication channel, whereby the received data may be stored and unpacked.
  • the output from the network interface unit 301 may be connected to the decoding unit 303.
  • the network interface unit 301 may be connected to the decoding unit 303 via at least two separate connections as depicted in figure 3.
  • the first connection 321 may be configured to convey unpacked audio or speech data to the decoding unit 303, and the second connection 323 may be configured to carry any associated packet header information.
  • the network interface unit 301 may be connected to the decoding unit 303 via a single connection.
  • This single connection may be configured to convey both unpacked audio or speech data and any associated packet header data.
  • the associated packet header information may be embedded within the audio or speech data stream.
  • the decoding unit 303 may comprise four functional entities: a data stream parser 305, a speech decoder 307, a discontinuous transmission (DTX) decoder 309, and an error concealment processor 311.
  • the decoding unit 303 may receive the encoded speech data stream and accompanying packet header information, specifically at the data stream parser 305.
  • the data stream parser 305 may provide separate connections to each of the speech decoder 307, discontinuous transmission (DTX) decoder 309, and the error concealment processor 311.
  • the connection 325 from the data stream parser 305 to the error concealment processor 311 may convey data associated with the process of error concealment.
  • connection 327 from the data stream parser 305 to the discontinuous transmission (DTX) decoder 309 may convey data associated with the coding of silence regions.
  • the connection 329 from the data stream parser 305 to the speech decoder 307 may convey speech coding parameters. Additionally, the data stream parser 305 may also have a further signalling connection 328 to the speech decoder 307.
  • the outputs from each of the speech decoder 307, discontinuous transmission (DTX) decoder 309, and the error concealment processor 311 may be connected to the output of the decoding unit 303.
  • the speech decoder 307, DTX decoder 309 and error concealment processor 311 share some core functional processing elements and therefore in further embodiments of the invention they may be implemented as a single body. Further, it is to be understood that entities such as these often share the same filter memories which may be utilised to assist in any transition from one entity to the next.
  • the network interface unit 301 may be connected via the input 302 to a packet switched network typically employed for Voice over IP (VoIP) based communication.
  • the speech signal may be transmitted in packet form using a packet structure according to the Real Time Transport Protocol (RTP) which may be encapsulated in the User Datagram Protocol (UDP), and further encapsulated in the Internet Protocol (IP). Contained within the RTP packet header there may be found the Time Stamp (TS) field, the sequence number (SN) field and the marker bit (M bit).
  • the TS field may be configured to reflect the sampling instant of the first octet in the RTP data packet.
  • the time stamp is typically derived from a system clock which increments monotonically and linearly in time. This information may then be used to provide information on the temporal difference between RTP packets transmitted in the same RTP session, for example to enable restoring the temporal order of received frames.
  • the sequence number field reflects the order in which the RTP packets are sent within an RTP session. The sequence number increments by one for each RTP data packet sent, and therefore may be used by a receiver to detect packet loss and to restore the sequence order of received packets.
  • the M bit is an RTP payload specific bit whereby the information conveyed by the bit is dependent on the format of the RTP payload.
  • the Internet Engineering Task Force (IETF) RFC standard 4867 specifies the payload format for AMR and AMR-WB speech codecs.
  • the standard specifies that the M bit may be used to signify whether the RTP packet contains the first frame of encoded speech after a region of silence.
  • This particular speech frame may be termed the speech transition frame.
  • a one indicates that the packet contains such a speech frame and a zero is used to indicate a packet not carrying such a speech frame.
  • the network interface unit 301 may, as part of the parsing process, unpack the payload data from the stream of received RTP packets and pass the codec specific payload information to the decoding unit 303.
  • The process of parsing the received packets is shown as step 401 in figure 4.
  • the network interface unit 301 may be used to reconstruct the timeline of the received RTP packets. This may be done by monitoring the TS field found within the header of each received RTP packet. The network interface unit 301 may then determine if any RTP packets are missing from the received stream by monitoring for any breaks in the received timeline. It is to be understood that the length of the break in the received timeline may be proportional to the number of missing RTP packets and therefore the number of missing encoded speech data frames.
  • the network interface unit 301 may also be configured to determine if there are any missing packets within the received data stream by monitoring the value of the SN field found in the received packets' RTP header.
  • the number of missing RTP packets may be indicated by a discontinuity or break in the incremental ordering of the SN field.
  • the number of missing packets may be determined by the difference between the SN field value after the break and the SN field value before the break.
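  • As an illustration only, a minimal sketch of this SN-based loss detection might look as follows (the function name and structure are assumptions, not taken from the patent):

```python
# Minimal sketch of RTP packet loss detection from the sequence number
# (SN) field, as described above.

def count_missing_packets(prev_sn: int, curr_sn: int) -> int:
    """Return how many RTP packets were lost between two received packets.

    RTP sequence numbers are 16-bit values that wrap around, so the
    difference is taken modulo 2**16; a difference of 1 means no loss.
    """
    gap = (curr_sn - prev_sn) % (1 << 16)
    return max(gap - 1, 0)

# Example: SN jumps from 1003 to 1006 -> packets 1004 and 1005 are missing.
assert count_missing_packets(1003, 1006) == 2
```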
  • the process of monitoring the received data stream for missing RTP packets from the received data stream is shown as processing step 403 in figure 4.
  • the network interface unit 301 may be implemented as a packet receiver whereby the parsing and missing packet detection functionality may be incorporated as part of the subsequent decoding unit 303.
  • the output from the network interface unit 301 may be used to convey the encoded speech data retrieved from the RTP stream to the decoding unit 303; this may be achieved via connection 321.
  • the encoded speech data may be accompanied by packet header information which may provide information indicative of any packets that may be missing. As described above, this packet header information may be formed or derived from the TS and/or SN RTP header fields.
  • the packet header information stream may be conveyed to the data stream parser 305 via a separate connection 323.
  • the packet header information stream may be embedded as part of the encoded speech data stream which is in turn conveyed via connection 321.
  • the accompanying packet header information may contain the actual data relating to the TS and/or SN RTP header fields. In this embodiment the recipient of the packet header information stream, the data stream parser 305 in the decoding unit 303, may perform the necessary processing in order to evaluate the co-received encoded speech data for missing frames.
  • the accompanying packet header information stream may contain the RTP header field indicating the RTP payload specific information; for example the M bit representing whether the RTP packet contains the first frame of encoded speech after a region of silence.
  • an RTP packet may contain a payload comprising one or more frames of speech data, and any RTP packet header information which is sent via the accompanying packet header information stream will be applicable to the speech frames allocated to that particular payload.
  • the data stream parser 305 is configured to parse the speech data frames and in some embodiments of the invention further determine the status of the frame.
  • This operation of parsing the speech data frame is shown as processing step 405 in Figure 4.
  • the speech decoding unit 303 and more specifically the data stream parser 305 is configured to receive the input unpacked encoded speech data stream via connection 321 together with accompanying packet header information via connection 323 from the network interface unit 301.
  • The process of receiving the unpacked encoded speech data stream together with any accompanying packet header information is shown as step 501 in figure 5.
  • the data stream parser 305 may comprise a decoding state store.
  • the decoding state store may store a value which indicates the decoding state of the last valid frame of received speech encoded data. For example, if the last valid frame of encoded data received was a speech frame, then the decoding state store value would reflect a speech state. Whereas, if the last valid frame received was a silence insertion descriptor (SID) frame, then the decoding state store would store a value reflecting a DTX state.
  • the data stream parser 305 may parse the accompanying signalling stream in order to ascertain if any frames are missing from the encoded media stream.
  • a buffering mechanism may be incorporated into the data stream parser 305. This buffering mechanism may be used to store encoded speech data frames derived from the incoming unpacked speech data stream via connection 321.
  • the accompanying packet header information may be used to determine frame header information for each of the stored speech data frames.
  • the frame header information may be ascertained in conjunction with specific packet header fields such as SN and TS in order to determine the sequence order of the encoded speech frames in the buffering mechanism. From this ordering information it may be possible to reorder the stored speech data frames accordingly. Further, the ordering information may be used to determine which frames, if any, are missing from the sequence of encoded speech data frames currently stored in the buffering mechanism.
  • the content of the buffering mechanism may be updated with the contents of the incoming unpacked speech data stream on a first in, first out (FIFO) basis.
  • the functionality of the buffering mechanism may equally occur in the network interface unit 301, whereby information relating to any missing speech data frames may be conveyed along with the accompanying packet header information stream.
  • The process of parsing the accompanying packet header information stream is depicted as processing step 503 in figure 5.
  • information relating to missing speech encoded frames may be obtained from the buffering mechanism as described above. This information may be used by the data stream parser 305 in order to determine if one or more encoded speech frames are missing. If the result of the parsing process or the buffering mechanism indicates that a speech encoded frame is missing, then the data stream parser 305 may inspect the decoding state store in order to determine the type of the last validly decoded frame.
  • the error concealment processor 311 may then be activated in a particular mode of operation to generate either a substituted speech or silence frame according to the value of the decoding state store. For example, if the value of the decoding state store indicates that the last validly decoded frame was a SID frame, then the data stream parser 305 would activate the error concealment processor to generate a further silence frame. However, if the value of the decoding state store indicates that the last validly decoded frame was a speech frame, then the data stream parser 305 would activate the error concealment processor to generate a speech frame.
  • the process of activating the error concealment processor is shown as processing step 505 in figure 5.
  • the error concealment processor 311 may comprise a state machine based approach for the generation of speech type frames.
  • the value of the state may indicate the number of consecutive bad frames received. If the current frame is declared as either being missing or corrupted then the state of the machine may be increased. However, if the current frame is declared as being correctly received then the state of the machine may be decreased. At each state a form of error concealment may take place, in which the exact process adopted may be dependent on the state of the machine.
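  • As a hedged sketch of the state machine behaviour described above (the class name and the cap on the state counter are illustrative assumptions, not values from the patent):

```python
# Illustrative state machine for error concealment: the state tracks the
# run of consecutive bad frames; it rises on a bad frame and falls on a
# good one, and a higher state implies more aggressive concealment.

class ConcealmentStateMachine:
    MAX_STATE = 6  # assumed cap on the bad-frame counter

    def __init__(self) -> None:
        self.state = 0  # 0 means the previous frame was good

    def update(self, frame_ok: bool) -> int:
        if frame_ok:
            self.state = max(self.state - 1, 0)
        else:
            self.state = min(self.state + 1, self.MAX_STATE)
        return self.state
```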
  • a typical method of error concealment may involve using codec parameters from previous frames in order to form a substitute for the missing or corrupted frame.
  • This method may in further embodiments be further enhanced in situations where the current frame has been erroneously received and declared as corrupted by the network interface unit 301.
  • it may be possible to form a substitute for the erroneously received frame by utilising a mixture of codec parameters from the current and previous frames.
  • Further systems of error concealment may additionally adopt a method of attenuating key codec parameters such as gain terms when formulating the substituted frame. This method of attenuation is often applied progressively in a manner which is dependent on the number of consecutive frames which have either been found to be corrupted or declared as missing.
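  • A minimal sketch of such progressive attenuation follows (the 0.85 base factor is an assumed illustrative value, not one specified by the patent):

```python
# Progressive attenuation of a codec gain term: the attenuation deepens
# with the number of consecutive frames declared missing or corrupted.

def attenuated_gain(last_good_gain: float, n_bad_frames: int,
                    factor: float = 0.85) -> float:
    """Attenuate a gain exponentially with the consecutive bad-frame count."""
    return last_good_gain * (factor ** n_bad_frames)
```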
  • the method adopted by the error concealment processor 311 may comprise simply repeating the silence frame associated with the last valid SID frame.
  • this technique may be further enhanced for further subsequent lost SID frames by applying attenuation factors to the parameters of the last validly received silence frame.
  • the data stream parser 305 may further parse the frame type field contained within the header of the validly received encoded speech data frame in order to determine the type of speech frame contained within.
  • The process of further parsing the validly received encoded speech data frame in order to determine the type of frame contained within is shown as processing step 507 in figure 5.
  • the frame type field may be used to convey information about the operating mode and type of speech frame.
  • the frame type field may indicate whether the frame is a speech frame or a SID frame.
  • the speech decoding unit 303 may be configured to decode a validly received encoded speech frame of Adaptive Multirate (AMR) speech coding type.
  • Figure 6 depicts an AMR encoded frame 601 according to an example of a first embodiment of the invention.
  • the header 603 associated with AMR encoded frame 601 may comprise a codec type field 605 which indicates the type of coding information present within the encoded frame.
  • this codec type field may be used to signify three distinct types of coding information. These three types of coding information may be categorised as: the frame contains coded speech data together with an indication of the codec mode of operation; the frame contains SID update parameters; and the frame contains no data.
  • the data stream parser 305 may, as a result of parsing the encoded speech frame's header, determine that the validly received frame is either a SID frame or an encoded speech frame.
  • The process of determining the validly received encoded speech data frame type is shown as step 508 in figure 5.
  • the data stream parser 305 may activate the DTX decoder 309 in order to decode the silence frame. Activation of the DTX decoder may be performed by passing the SID parameters from the data stream parser 305 to the DTX decoder 309, via connection 327.
  • The process of activating the DTX decoder 309 is shown as step 511 in figure 5.
  • the data stream parser 305 may perform a further processing stage which performs a content check on the received speech encoded data and accompanying packet header information.
  • the data stream parser 305 may further be configured to monitor for particular characteristics in the distribution between encoded frames which have been declared missing and encoded frames which have been validly received. By monitoring for such characteristics in the data stream parser 305 it may be possible to obtain further a posteriori information on the category or type of encoded speech data frame which may have been declared missing by the network interface unit receiver 301. This additional a posteriori information may then be conveyed to subsequent decoding stages in order to assist in the process of decoding the speech data frame.
  • the data stream parser 305 may be configured to identify a particular instance of missing frames which may be associated with a return to a region of speech after a period of silence. This particular instance may be identified by the data stream parser 305 by firstly identifying the first speech frame which is validly received after a period of silence or comfort noise updates, and then noting the value of the RTP header field indicating a first frame of speech after a region of silence (also known as the speech transition frame), which may be conveyed to the data stream parser 305 as part of the accompanying packet header information stream from the network interface unit 301.
  • the value of the RTP header speech transition indicator field may indicate whether a frame within the received RTP packet contains data associated with a first frame of speech after a region of silence. If the RTP header speech transition indicator field does not indicate this, then the data stream parser 305 may ascertain that at least the first speech frame after a region of silence is missing.
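  • A minimal sketch of this check, under assumed names for the decoding state and frame type values (none of these identifiers come from the patent):

```python
# If the last validly decoded frame was a SID (comfort noise) frame, the
# newly received frame is a speech frame, and its RTP M bit is not set,
# then the speech transition frame must have been lost.

def transition_frame_missing(last_decoding_state: str,
                             new_frame_type: str,
                             m_bit: int) -> bool:
    return (last_decoding_state == "DTX"
            and new_frame_type == "SPEECH"
            and m_bit == 0)
```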
  • This information may be conveyed to the speech decoder 307 as data stream parser signalling information, where the information may be used to assist in the decoding of any subsequent speech frames by the decoder.
  • the data stream parser signalling information may be sent to the speech decoder 307 via the connection 328.
  • the data stream parser signalling information may be embedded as part of encoded speech data frames conveyed to the speech decoder via connection 329.
  • the encoded speech data stream may correspond to that of an AMR-NB encoded speech data stream, and the RTP payload format for the AMR-NB encoded speech data is given by the Internet Engineering Task Force (IETF) RFC 4867. In this standard the RTP header field indicating the first frame of speech after a period of silence is determined by the value of the M bit.
  • the encoded speech data stream may correspond to that of an AMR-WB encoded speech data stream, and as above the RTP payload format for the AMR-WB encoded speech data is given by the Internet Engineering Task Force (IETF) RFC 4867. Also as before, this standard specifies that the M bit signifies the first speech frame following a region of silence.
  • The process of determining if the first frame of speech following a region of silence is the first validly received frame of speech is shown as processing step 509 in figure 5.
  • the encoded speech data and data stream parser signalling information output from the data stream parser 305 may be connected to the input of the speech decoder 307.
  • the speech data output from the data stream parser 305 may be used to activate and convey encoded speech data to the speech decoder 307.
  • the data stream parser signalling output connection from the data stream parser 305 may be used to convey information to the speech decoder 307 relating to the particular instance that the first encoded speech frame after a region of silence may be missing from the accompanying encoded speech data stream.
  • the data stream parser signalling information may be used as part of the subsequent decoding process.
  • the process of activating the speech decoder 307 with an encoded speech data frame and data stream parser signalling information is depicted as processing step 510 in figure 5.
  • this data stream parser signalling information may be conveyed to the following speech decoder 307 embedded as part of the accompanying encoded speech data stream.
  • The process of parsing received encoded speech frames and determining if any of the missing frames are associated with a transition from a silence region to a speech region is shown as processing step 405 in figure 4.
  • the speech decoder 307 may be based on the Algebraic Code Excited Linear Prediction (ACELP) architecture of speech coding, such as that deployed by the AMR family of speech codecs. These codecs may typically segment a speech signal into frames of 20ms duration, and then further segment the frame into a plurality of sub frames. Parametric modelling of the signal may then be performed over the frame in order to model its spectral characteristic.
  • the coefficients generated by this process may typically be represented in the form of Linear Predictive Coding (LPC) coefficients, and the model formed from these coefficients may be known as a LPC filter.
  • the audio or speech signal may be further modelled by using tools such as long term prediction (LTP) and secondary excitation generation or fixed codebook excitation.
  • the secondary or fixed codebook excitation models the residual signal, i.e. the signal which remains once the contributions from the parametric modelling and long term prediction tools have been removed.
  • the result of the LTP and secondary excitation stages is a combined excitation vector, comprising a long term prediction contribution and a fixed codebook contribution, which can be used to excite the LPC filter on a per-sub-frame basis.
  • Parameters associated with the LTP and secondary codebook excitations, namely the vector gains, may also be quantised in order to facilitate transmission and storage.
  • Figure 7 depicts a speech decoder 307 suitable for decoding an ACELP encoded speech signal according to an embodiment of the invention.
  • the operation of the speech decoder 307 will hereafter be described in more detail with reference to the flow chart shown in figure 8.
  • the speech decoder 307 receives the input encoded speech frame at the speech frame unpacker 701.
  • the speech frame unpacker may unpack the input encoded speech frame into its constituent encoded speech parameters.
  • the constituent encoded speech parameters for an ACELP derived codec may partly comprise a plurality of quantised and transformed spectral or LPC coefficients.
  • the LPC coefficients may be received by the speech decoder 307 as Line Spectral Frequencies (LSF) or ISP (Immittance Spectral Pair).
  • the speech decoder 307 may equally receive the LPC coefficients in a different format.
  • LPC coefficients may be transformed at the encoder for transmission into any one of a number of formats, of which a non-limiting set of examples may include reflection coefficients and Log Area Ratios (LAR).
  • the encoded speech parameters for an ACELP based codec may further comprise LTP (long-term prediction) parameters describing the periodic structure, and ACELP excitation parameters describing the residual signal.
  • These parameters typically consist of one or more index values representing the LTP lag factor and secondary codebook indices.
  • each of these values may be accompanied by a corresponding quantised gain factor.
  • both the LTP lag and secondary codebook index may have a quantised gain factor associated with it.
  • the encoded constituent speech parameters are passed from the speech frame unpacker 701 to the dequantizer 703.
  • the dequantizer 703 may dequantize a subset of the constituent encoded speech parameters. In embodiments of the invention this subset may comprise the encoded LTP gain, the encoded secondary codebook gain, and the LSF parameters.
  • quantized parameter values are usually represented for transmission and storage in the form of quantization indices which map into a corresponding quantization table. Therefore, the process of dequantization in embodiments of the invention may comprise mapping the quantization index into an appropriate quantization table in order to obtain the corresponding quantized parameter value.
  • speech parameters such as those associated with the transformed LPC coefficients may be quantized using vector or lattice quantization techniques. It is to be understood therefore that in these embodiments of the invention the dequantization process may result in a quantized vector whose component members represent the quantized parameter value.
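  • A minimal sketch of index-based dequantization under these assumptions (the table contents are placeholders, not codec values):

```python
# Scalar dequantization by table lookup: the transmitted quantization
# index selects the corresponding quantized parameter value. For vector
# quantization the table entries would be vectors rather than scalars.

GAIN_TABLE = [0.0, 0.1, 0.25, 0.5, 0.8, 1.0, 1.4, 2.0]  # placeholder values

def dequantize(index: int, table: list = GAIN_TABLE) -> float:
    return table[index]
```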
  • the process of dequantizing the encoded speech parameters is shown as processing step 803 in figure 8.
  • the dequantizer 703 may pass the quantised LSF coefficients to the LSF to LPC converter 705.
  • the LSF to LPC converter may perform a transform in order to convert the LSF coefficients to LPC coefficients for use in subsequent processing steps. It is to be understood that in those further embodiments of the invention which may deploy a different representation for the LPC coefficients the spectral coefficient conversion stage may employ a different transform in order to obtain the LPC coefficients.
  • The process of transforming spectral coefficients is depicted as processing step 805 in figure 8.
  • the LPC coefficients received by the LSF to LPC converter 705 may be passed to the speech parameter adaptor 707.
  • the speech parameter adaptor 707 may be arranged to receive a parameter input from the dequantizer 703.
  • the parameter input from the dequantizer 703 may comprise the LTP gain factor and secondary (fixed) codebook gain factor generated by the decoding process in the dequantizer 703.
  • the speech parameter adaptor 707 may be further arranged to receive the data stream parser signalling information output from the data stream parser 305.
  • the data stream parser signalling information output from the data stream parser 305 may be used to notify the speech parameter adaptor 707 of an occurrence of an instance of a first speech frame being declared missing after a region of silence.
  • the data stream parser signalling information output may indicate an instance of a missing speech transition frame.
  • the speech parameter adaptor 707 may be configured to adapt the values of the LPC coefficients together with the LTP and secondary (fixed) codebook gains in order to counteract any adverse effects that may occur in subsequent processing steps.
  • the LPC coefficients may be adapted in response to the notification by data stream parser 305 signalling channel by performing bandwidth expansion on the set of LPC coefficients.
  • the bandwidth expansion may be achieved by multiplying the set of LPC coefficients by a power series according to the following expression:
  • $\hat{a}_k = a_k \gamma^k$ for $k = 1$ to $K$
  • where $K$ is the order of the set of LPC coefficients, $\hat{a}_k$ are the bandwidth expanded set of coefficients, and $\gamma$ is the factor by which the bandwidth of the LPC coefficients is expanded.
  • These bandwidth expanded coefficients $\hat{a}_k$ may be used as the filter coefficients for the subsequent LPC filtering stage, rather than the original unexpanded LPC filter coefficients $a_k$.
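  • A minimal sketch of this operation (the value of $\gamma$ shown is an assumed example; bandwidth expansion factors are typically somewhat below one):

```python
# Bandwidth expansion of LPC coefficients: each a_k is scaled by
# gamma**k (k = 1..K), which moves the synthesis-filter poles towards
# the origin and broadens the formant peaks.

def bandwidth_expand(lpc: list[float], gamma: float = 0.9) -> list[float]:
    """Return the bandwidth-expanded coefficients a_hat_k = a_k * gamma**k."""
    return [a * gamma ** k for k, a in enumerate(lpc, start=1)]
```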
  • Using the bandwidth expanded LPC coefficients $\hat{a}_k$ rather than the original LPC coefficients in the subsequent LPC filtering stage may have the technical effect of pulling the poles of the subsequent LPC synthesis filter 719 away from the unit circle (otherwise known as the Z-plane stability boundary). This may result in a broadening of the peaks of oscillation, known as formants, in the LPC filter spectrum, which may ultimately have the effect of increasing filter stability.
  • Multiplication of the LPC coefficients $a_k$ by the power series $\gamma^k$ may also be viewed as effectively dampening the impulse response of the subsequent LPC synthesis filter 719 by a decaying exponential.
  • any dampening effect introduced into the subsequent LPC filtering stage may be used to control the oscillatory behaviour of the LPC synthesis filter 719.
  • This oscillatory behaviour may become especially prevalent due to a mismatch of memories between the encoder and decoder.
  • the discrepancy between codec memories may occur when a first frame of speech is missing after a region of silence.
  • the first validly received speech frame at the decoder may be decoded using a memory state dependent on data from the previous silence frame.
  • the encoded speech data frame may have been formed in the encoder using a codec state memory dependent on previous speech frames rather than previous silence frames. This may result in an unpredictable and somewhat erratic behaviour in the subsequent LPC filtering stage at the decoder.
  • the speech parameter adaptor 707 may output either the original or the bandwidth expanded LPC coefficients to the LPC synthesis filter 719.
  • bandwidth expansion of the LPC coefficients may be applied over a number of speech frames following the signalling of the transition frame at the beginning of a speech region.
  • Attenuation of the bandwidth expansion factor γ may be given as
  • γ_n = 1 − β(n) · (1 − γ)
  • where γ_n is the bandwidth expansion factor which is gradually increased to a value of one
  • and β(n) is the bandwidth expansion effect attenuation factor which may be a function of n, the frame number after a declaration of the first speech frame missing following a region of silence. It is to be understood that when the bandwidth expansion factor has a value of one it ceases to have any effect. It is to be further understood in embodiments of the invention that the attenuation factor β(n) may be less than or equal to one for all n.
  • the attenuation factor β(n) may vary linearly with the frame number n.
  • the attenuation factor may be decreased linearly over four frames by using a predefined set of four decreasing values.
  • the attenuation factor β(n) may vary exponentially with the frame number n.
  • the attenuation factor may be exponentially decreased over N frames, for example according to an expression of the form β(n) ∝ r^n for some 0 < r < 1.
  • The process of adapting the LPC coefficients conditional upon the data stream parser signalling information output from the data stream parser 305 is depicted as processing step 807 in figure 8.
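As a non-normative illustration of the adaptation described above, the following Python sketch applies bandwidth expansion to a set of LPC coefficients and attenuates the expansion effect over subsequent frames. The function names, the initial expansion factor of 0.9, the toy coefficient values and the four-value attenuation schedule are assumptions for illustration only, not values taken from the specification.

```python
import numpy as np

def bandwidth_expand(lpc, gamma):
    # Scale each LPC coefficient a_k by gamma**k; this pulls the poles of
    # the synthesis filter towards the origin, away from the unit circle.
    k = np.arange(1, len(lpc) + 1)
    return lpc * gamma ** k

def expansion_factor(gamma0, beta_n):
    # gamma_n = 1 - beta(n) * (1 - gamma0): as beta(n) decays towards zero,
    # gamma_n rises towards one and the expansion ceases to have any effect.
    return 1.0 - beta_n * (1.0 - gamma0)

lpc = np.array([-1.6, 0.9, -0.2, 0.05])   # toy 4th-order coefficients a_1..a_4
gamma0 = 0.9                              # assumed initial expansion factor
betas = [1.0, 0.75, 0.5, 0.25]            # assumed decreasing schedule beta(n)

for n, beta_n in enumerate(betas, start=1):
    gamma_n = expansion_factor(gamma0, beta_n)
    a_hat = bandwidth_expand(lpc, gamma_n)
    print(f"frame {n}: gamma_n = {gamma_n:.3f}, a_hat = {np.round(a_hat, 4)}")
```

After the fourth frame β(n) would reach zero, γ_n would reach one, and the decoder would revert to the unexpanded coefficients.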
  • the LSF decoding process employed by speech decoders in embodiments of the invention may incorporate a predictive quantization stage in the dequantizer 703, whereby the LSF coefficients for the current speech frame may be formed by adding a predictive vector contribution from past decoded LSF coefficients to that of a residual LSF vector.
  • the residual LSF vector may typically represent the difference at the equivalent encoder between the original LSF vector and the predicted LSF vector.
  • the LSF decoding process may be notified of an occurrence of an instance of a first speech frame being declared missing after a region of silence by the data stream parser signalling information output from the data stream parser 305. This may be implemented as a further connection from the data stream parser signalling information output from the data stream parser 305 to the dequantizer 703. If such a notification was received by the dequantizer 703 the predictive quantization stage used within the LSF quantization process may be adapted to counteract any adverse effects.
  • these adverse effects may be as a result of a predictor memory mismatch between the memories of the predictors at the encoder to the memories of the predictors at the decoder.
  • this mismatch may result in the LSF vector for the current frame being formed using invalid past LSF vectors. In the instance of missing speech transition frames the LSF vector for the current frame should be generated in part using LSF vectors from past speech frames. However these frames are missing and instead the quantizer memories are populated with past LSF vectors corresponding to comfort noise frames.
  • the adverse effects of the past mismatch in quantizer memories at the instance of missing speech transition frames may be mitigated by setting the vector of LSF coefficients within the quantizer memories to have a neutral or zero contributory value.
  • the LSF vector of coefficients within the quantizer may either be substituted with a predetermined set of values or set to zero.
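A minimal sketch of such a predictive LSF dequantizer with a memory reset follows; the single-tap predictor, its coefficient of 0.65 and the mean LSF vector are illustrative assumptions, since the specification does not fix a particular predictor structure.

```python
import numpy as np

class PredictiveLsfDecoder:
    def __init__(self, order=10, pred_coeff=0.65):
        self.pred_coeff = pred_coeff
        # Long-term mean LSF vector (illustrative values, in radians).
        self.mean_lsf = np.linspace(0.2, 3.0, order)
        # Memory of past mean-removed LSF vectors used by the predictor.
        self.memory = np.zeros(order)

    def decode(self, residual):
        # Current LSFs = mean + prediction from past LSFs + decoded residual.
        prediction = self.pred_coeff * self.memory
        lsf = self.mean_lsf + prediction + residual
        self.memory = lsf - self.mean_lsf
        return lsf

    def on_missing_transition_frame(self):
        # Zero the predictor memory so that stale comfort-noise LSF history
        # makes a neutral (zero) contribution to the next prediction.
        self.memory[:] = 0.0

dec = PredictiveLsfDecoder()
dec.on_missing_transition_frame()        # notified via the parser signalling
lsf = dec.decode(np.full(10, 0.01))      # first valid frame after the gap
```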
  • the long term predictor (LTP) and secondary (fixed) codebook excitation gains may also be adapted in response to the data stream parser 305 signalling the occurrence of a missing first speech frame when transitioning from a region of silence to a region of speech via the data stream parser signalling output.
  • the values of the received LTP and secondary (fixed) codebook gains may be attenuated over the course of a number of speech sub frames (or frames) following the notification of the missing transition frame. Further, it is to be understood that this attenuation factor may be reduced over the course of the number of speech sub frames.
  • the attenuated LTP or secondary codebook gain may be given by
  • Ĝ = α(n) · Gain
  • where Gain is either the LTP or secondary (fixed) codebook gain
  • α(n) is the gain attenuation factor which may be a function of either the frame number or subframe number n
  • and Ĝ is either the attenuated LTP or secondary codebook gain
  • the attenuation factor α(n) may vary linearly with the frame number n.
  • the attenuation factor may be decreased linearly over four sub frames by using a predefined set of four decreasing values.
  • the attenuation factor α(n) may vary exponentially with the sub frame number n.
  • the attenuation factor may be exponentially decreased over a number of sub frames N, for example according to an expression of the form α(n) ∝ r^n for some 0 < r < 1.
  • the attenuation of the LTP and secondary codebook gains may be applied to just the first sub frame or frame received after the notification of the missing speech transition frame via the data stream parser signalling output from the data stream parser 305.
  • the technical effect of attenuating the LTP and secondary codebook gains after the notification of a missing speech transition frame is to control the energy of the combined excitation vector delivered to the input of the LPC synthesis filter 719.
  • This attenuation of gain factors ensures that the effect of any perceptually disturbing artefacts produced at the output of the LPC filter, as a consequence of the memory mismatch between encoder and decoder, is kept to a minimum.
  • the secondary excitation gains associated with the first speech frame following a missing speech transition frame may be replaced by a fixed predetermined set of secondary excitation gains in response to the notification by the data stream parser 305 signalling channel.
  • the predetermined set of secondary excitation gains may be randomly generated from a select range.
  • Adaptation of the LTP and secondary codebook excitation gains, conditional upon the data stream parser signalling output from the data stream parser 305, are shown as respective processing steps 809 and 810 in figure 8.
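The following sketch illustrates one possible form of the gain attenuation applied in processing steps 809 and 810. The linear and exponential schedules, and the constants 4 and 0.5, are assumptions chosen for illustration rather than values mandated by the specification.

```python
def attenuate_gain(gain, n, schedule="linear", N=4, r=0.5):
    # Apply G_hat = alpha(n) * gain for the first N (sub)frames after the
    # missing transition frame; beyond that the received gain is used as-is.
    if n > N:
        return gain
    if schedule == "linear":
        alpha = 1.0 - n / (N + 1)   # e.g. 0.8, 0.6, 0.4, 0.2 for N = 4
    else:
        alpha = r ** n              # exponential decay, 0 < r < 1
    return alpha * gain

ltp_gain = attenuate_gain(0.9, n=1)                   # attenuated LTP gain
fcb_gain = attenuate_gain(1.2, n=1, schedule="exp")   # attenuated fixed gain
```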
  • the LTP lag output from the dequantizer 703 is passed to the adaptive (LTP) codebook 709. This parameter may be used in conjunction with the adaptive (LTP) codebook memory to generate the adaptive codebook (LTP) vector.
  • the output adaptive codebook (LTP) vector may then be multiplied by the LTP gain output from the speech parameter adaptor 707 at a first multiplier unit 713, to form an adaptive (LTP) excitation vector contribution.
  • The process of generating the LTP excitation vector contribution is shown as processing step 811 in figure 8.
  • the secondary codebook index output from the dequantizer 703 may be connected to the secondary (fixed) codebook 711. This index parameter may be used to select an optimum excitation vector from the secondary codebook.
  • the output secondary excitation vector may then be multiplied with the secondary codebook gain output from the speech parameter adaptor 707 at a second multiplier unit 717, to form a secondary excitation vector contribution.
  • the adaptive (LTP) excitation contribution from the output of the first multiplier unit 713 may be added via the summer unit 715 to the secondary excitation contribution from the output of the second multiplier unit 717 to form a combined excitation vector.
  • The process of generating the combined excitation vector is shown as processing step 815 in figure 8.
  • the combined excitation vector output from the summer unit 715 may be passed to the LPC synthesis filter 719.
  • the combined excitation vector may then be used to excite the LPC synthesis filter 719 in order to obtain a speech signal output.
  • the process of generating the synthetic speech output from the LPC synthesis filter is shown as processing step 817 in figure 8.
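The excitation combination and synthesis described in these steps can be sketched as follows. The filter sign convention (A(z) = 1 + a_1 z^-1 + ... + a_K z^-K), the use of scipy, and the random test vectors are assumptions; filter memory carry-over between sub frames is omitted for brevity.

```python
import numpy as np
from scipy.signal import lfilter

def synthesize_subframe(adaptive_vec, ltp_gain, fixed_vec, fixed_gain, lpc):
    # Combined excitation: gain-scaled adaptive (LTP) contribution plus
    # gain-scaled secondary (fixed) codebook contribution.
    excitation = ltp_gain * adaptive_vec + fixed_gain * fixed_vec
    # Excite the synthesis filter 1/A(z); lpc holds a_1..a_K.
    return lfilter([1.0], np.concatenate(([1.0], lpc)), excitation)

rng = np.random.default_rng(0)
speech = synthesize_subframe(rng.standard_normal(40), 0.8,
                             rng.standard_normal(40), 1.1,
                             np.array([-1.6, 0.9, -0.2, 0.05]))
```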
  • the output from the LPC synthesis filter 719 may be connected to the output of the decoding unit 303.
  • the output from the LPC synthesis filter 719 may be passed to a further postfiltering stage.
  • the output signal from the decoding unit 303 may form the output audio signal 114.
  • the method of adapting coding parameters according to the condition of the first speech frame after a region of silence may be applied to any speech or audio codec which deploys an LPC based filtering technique as a method of generating a synthetic signal.
  • the decoder is shown as a separate apparatus 108 in order to assist the understanding of the processes involved.
  • the apparatus, structures and operations may be jointly implemented with an equivalent encoding structure.
  • both encoder and decoder may share some/all common elements.
  • embodiments of the invention may operate within a codec within an electronic device 610.
  • the invention as described above may be implemented as part of a speech (or audio) codec operating over a packet based network.
  • embodiments of the invention may be implemented in a speech codec which may implement speech coding over fixed or wired communication paths.
  • user equipment may comprise a speech codec such as those described in embodiments of the invention above.
  • user equipment may comprise an audio codec such as those described in embodiments of the invention above.
  • user equipment is intended to cover any suitable type of wireless user equipment, such as mobile telephones, portable data processing devices or portable web browsers.
  • elements of a public land mobile network (PLMN) may also comprise audio codecs as described above.
  • the various embodiments of the invention may be implemented in hardware or special purpose circuits, software, logic or any combination thereof.
  • some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto.
  • While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
  • the embodiments of this invention may be implemented by computer software executable by a data processor of the mobile device, such as in the processor entity, or by hardware, or by a combination of software and hardware.
  • any blocks of the logic flow as in the Figures may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions.
  • the memory may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor-based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory.
  • the data processors may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs) and processors based on multi-core processor architecture, as non-limiting examples.
  • Embodiments of the inventions may be practiced in various components such as integrated circuit modules.
  • the design of integrated circuits is by and large a highly automated process.
  • Complex and powerful software tools are available for converting a logic level design into a semiconductor circuit design ready to be etched and formed on a semiconductor substrate.
  • Programs such as those provided by Synopsys, Inc. of Mountain View, California and Cadence Design, of San Jose, California automatically route conductors and locate components on a semiconductor chip using well established rules of design as well as libraries of pre-stored design modules.
  • the resultant design in a standardized electronic format (e.g., Opus, GDSII, or the like) may be transmitted to a semiconductor fabrication facility or "fab" for fabrication.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention relates to an apparatus for decoding a speech signal, the apparatus being configured to: detect, within a bit stream comprising a plurality of speech data frames, the absence of at least one speech data frame; determine that the at least one absent speech data frame is associated with a transition from a silence-type signal frame to a speech-type signal frame; and adapt at least one speech parameter associated with at least one speech data frame following the at least one absent speech data frame, dependent on the determination that the at least one absent speech data frame is associated with the transition from the silence-type signal frame to the speech-type signal frame.
PCT/EP2008/058400 2008-06-30 2008-06-30 Décodeur de parole avec dissimulation d'erreur WO2010000303A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2008/058400 WO2010000303A1 (fr) 2008-06-30 2008-06-30 Décodeur de parole avec dissimulation d'erreur

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2008/058400 WO2010000303A1 (fr) 2008-06-30 2008-06-30 Décodeur de parole avec dissimulation d'erreur

Publications (1)

Publication Number Publication Date
WO2010000303A1 true WO2010000303A1 (fr) 2010-01-07

Family

ID=39863097

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2008/058400 WO2010000303A1 (fr) 2008-06-30 2008-06-30 Décodeur de parole avec dissimulation d'erreur

Country Status (1)

Country Link
WO (1) WO2010000303A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106575505A (zh) * 2014-07-29 2017-04-19 奥兰吉公司 Fd/lpd转换环境中的帧丢失管理
CN113490981A (zh) * 2019-02-13 2021-10-08 弗劳恩霍夫应用研究促进协会 音频发送器处理器、音频接收器处理器以及相关方法和计算机程序
US12009002B2 (en) 2019-02-13 2024-06-11 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio transmitter processor, audio receiver processor and related methods and computer programs

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1103953A2 (fr) * 1999-11-23 2001-05-30 Texas Instruments Incorporated Procédé de dissimulation de pertes de trames de parole
WO2007073604A1 (fr) * 2005-12-28 2007-07-05 Voiceage Corporation Procede et dispositif de masquage efficace d'effacement de trames dans des codecs vocaux

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1103953A2 (fr) * 1999-11-23 2001-05-30 Texas Instruments Incorporated Procédé de dissimulation de pertes de trames de parole
WO2007073604A1 (fr) * 2005-12-28 2007-07-05 Voiceage Corporation Procede et dispositif de masquage efficace d'effacement de trames dans des codecs vocaux

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SJOBERG J ET AL: "RTP Payload Format and File Storage Format for the Adaptive Multi- Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) Audio Codecs, RFC 4867", IETF STANDARD-WORKING-DRAFT, INTERNET ENGINEERING TASK FORCE, IETF, CH, April 2007 (2007-04-01), XP002502209, Retrieved from the Internet <URL:http://tools.ietf.org/html/rfc4867> [retrieved on 20081031] *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106575505A (zh) * 2014-07-29 2017-04-19 奥兰吉公司 Fd/lpd转换环境中的帧丢失管理
CN113490981A (zh) * 2019-02-13 2021-10-08 弗劳恩霍夫应用研究促进协会 音频发送器处理器、音频接收器处理器以及相关方法和计算机程序
US12009002B2 (en) 2019-02-13 2024-06-11 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Audio transmitter processor, audio receiver processor and related methods and computer programs

Similar Documents

Publication Publication Date Title
JP5265553B2 (ja) フレーム消去回復のシステム、方法、および装置
JP5587405B2 (ja) スピーチフレーム内の情報のロスを防ぐためのシステムおよび方法
US9047863B2 (en) Systems, methods, apparatus, and computer-readable media for criticality threshold control
Valin et al. Definition of the opus audio codec
RU2673847C2 (ru) Системы и способы передачи избыточной информации кадра
TWI484479B (zh) 用於低延遲聯合語音及音訊編碼中之錯誤隱藏之裝置和方法
US10504525B2 (en) Adaptive forward error correction redundant payload generation
JP6687599B2 (ja) Fd/lpd遷移コンテキストにおけるフレーム喪失管理
RU2437170C2 (ru) Ослабление чрезмерной тональности, в частности, для генерирования возбуждения в декодере при отсутствии информации
JP2003504669A (ja) 符号化領域雑音制御
US20080180307A1 (en) Audio quantization
US20080255860A1 (en) Audio decoding apparatus and decoding method
US8862465B2 (en) Determining pitch cycle energy and scaling an excitation signal
WO2010000303A1 (fr) Décodeur de parole avec dissimulation d'erreur
JP6012620B2 (ja) エンコーダおよび予測的に符号化する方法、デコーダおよび復号化する方法、予測的に符号化および復号化するシステムおよび方法、および予測的に符号化された情報信号
CN110770822A (zh) 音频信号编码和解码
JP2010539550A (ja) 複雑さ分散によるデジタル信号の転送誤り偽装
US20150100318A1 (en) Systems and methods for mitigating speech signal quality degradation
US8418032B2 (en) Processing of bit errors in a digital audio bit frame
JP2003140699A (ja) 音声復号化装置
JPH11316600A (ja) ラグパラメ―タの符号化方法及びその装置並びに符号帳作成方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08774553

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08774553

Country of ref document: EP

Kind code of ref document: A1