EP1129451A1 - Closed-loop variable-rate multimode predictive speech coder - Google Patents
- Publication number
- EP1129451A1 (application number EP99957560A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- coding
- coding mode
- mode
- speech
- threshold value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/18—Vocoders using multiple modes
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
- G10L19/18—Vocoders using multiple modes
- G10L19/24—Variable rate codecs, e.g. for generating different qualities using a scalable representation such as hierarchical encoding or layered encoding
Definitions
- the present invention pertains generally to the field of speech processing, and more specifically to closed-loop, variable-rate, multimode, predictive coding of speech.
- Speech coders typically comprise an encoder and a decoder, or a codec.
- the encoder analyzes the incoming speech frame to extract certain relevant parameters, and then quantizes the parameters into binary representation, i.e., to a set of bits or a binary data packet.
- the data packets are transmitted over the communication channel to a receiver and a decoder.
- the decoder processes the data packets, unquantizes them to produce the parameters, and then resynthesizes the speech frames using the unquantized parameters.
- the function of the speech coder is to compress the digitized speech signal into a low-bit-rate signal by removing all of the natural redundancies inherent in speech.
- the challenge is to retain high voice quality of the decoded speech while achieving the target compression factor.
- the performance of a speech coder depends on (1) how well the speech model, or the combination of the analysis and synthesis process described above, performs, and (2) how well the parameter quantization process is performed at the target bit rate of N_0 bits per frame. The goal of the speech model is thus to capture the essence of the speech signal, or the target voice quality, with a small set of parameters for each frame.
- a multimode coder applies different modes, or encoding- decoding algorithms, to different types of input speech frames.
- Each mode, or encoding-decoding process is customized to represent a certain type of speech segment (i.e., voiced, unvoiced, or background noise) in the most efficient manner.
- An external mode decision mechanism examines the input speech frame and makes a decision regarding which mode to apply to the frame.
- the mode decision is done in an open-loop fashion by extracting a number of parameters out of the input frame and evaluating them to make a decision as to which mode to apply.
- the mode decision is made without knowing in advance the exact condition of the output speech, i.e., how similar the output speech will be to the input speech in terms of voice-quality or any other performance measure.
- An exemplary open-loop mode decision for a speech codec is described in U.S. Patent No. 5,414,796, which is assigned to the assignee of the present invention and fully incorporated herein by reference.
- Multimode coding can be fixed-rate, using the same number of bits N_0 for each frame, or variable-rate, in which different bit rates are used for different modes.
- the goal in variable-rate coding is to use only the amount of bits needed to encode the codec parameters to a level adequate to obtain the target quality.
- VBR variable-bit-rate
- Conventional VBR speech coders are designed with modes having different bit-rates.
- An exemplary variable rate speech coder is described in U.S. Patent No. 5,414,796, assigned to the assignee of the present invention and previously fully incorporated herein by reference.
- the codec described in the aforesaid patent has the following four rates: (1) full rate (FR); (2) half rate (HR); (3) quarter rate (QR); and (4) eighth rate (ER).
- FR full rate
- HR half rate
- QR quarter rate
- ER eighth rate
- each frame of speech is encoded by 160, eighty, forty, and twenty bits per frame, respectively.
- An external open-loop mode decision is made regarding which mode (FR, HR, QR or ER) to apply to the input speech frame.
- the application areas include wireless telephony, satellite communications, Internet telephony, various multimedia and voice-streaming applications, voice mail, and other voice storage systems.
- the driving forces are the need for high capacity and the demand for robust performance under packet loss situations.
- Various recent speech coding standardization efforts are another direct driving force propelling research and development of low-rate speech coding algorithms.
- a low-rate speech coder creates more channels, or users, per allowable application bandwidth, and a low-rate speech coder coupled with an additional layer of suitable channel coding can fit the overall bit budget of coder specifications and deliver robust performance under channel error conditions.
- Conventional speech coders typically use some form of prediction mechanism to encode the current frame.
- a speech coder exploits and uses the information contained in the last decoded and recreated frame. This works well because there is typically strong correlation, or similarity, between successive frames.
- P(n) is a conventional prediction filter that produces an approximation of the current frame from the past quantized frame, and Ê_cur(n) is the quantized version of the prediction error E_cur(n) of the current frame.
- SNR signal-to-noise ratio
- PSNR perceptual SNR
- the prediction filter information is necessarily sent to the decoder as a certain number of bits, N_p.
- the remaining available bits, N_0 - N_p, can be used to encode the prediction error signal E_cur. If the prediction from the quantized past frame, S_prev,quantized, generates an excellent predicted representation S_cur,predicted of the current frame S_cur, the prediction error E_cur will be small, having a low dynamic range. Hence, it will be relatively easy to encode the prediction error E_cur with a small number of bits.
- in higher-rate coders, the total number of bits per frame, N_0, is high.
- the QCELP coder, for example, supports 260 bits per 20-ms frame (13 kbps). Therefore, even after allocating a number of bits, N_p, to quantize the prediction filter parameters, there are enough remaining bits, N_0 - N_p, to accurately encode the prediction error.
- Np a number of bits
- No-Np a number of bits
- a speech coder advantageously includes a codec configured to operate in at least one of a plurality of coding modes; and a closed-loop mode decision module coupled to the codec and configured to apply a first coding mode from the plurality of coding modes to an input speech frame, the first coding mode having a first bit rate that is lower than the bit rate of any other coding mode of the plurality of coding modes, the closed-loop mode decision module being further configured to obtain a performance measure of the codec, compare the performance measure with a threshold value, and, if the performance measure does not exceed the threshold value, reject the first coding mode in favor of a second coding mode having a second bit rate that is greater than the first bit rate.
- a method of coding speech frames advantageously includes the steps of selecting a first coding mode to apply to a speech frame, the first coding mode having a first bit rate; obtaining a coding performance measure; comparing the coding performance measure with a threshold value; and rejecting the first coding mode in favor of a second coding mode if the coding performance measure does not exceed the threshold value, the second coding mode having a second bit rate that exceeds the first bit rate.
- a speech coder advantageously includes means for selecting a first coding mode to apply to a speech frame, the first coding mode having a first bit rate; means for obtaining a coding performance measure; means for comparing the coding performance measure with a threshold value; and means for rejecting the first coding mode in favor of a second coding mode if the coding performance measure does not exceed the threshold value, the second coding mode having a second bit rate that exceeds the first bit rate.
- FIG. 1 is a block diagram of a communication channel terminated at each end by speech coders.
- FIG. 2 is a block diagram of an encoder.
- FIG. 3 is a block diagram of a decoder.
- FIG. 4 is a flow chart illustrating the steps of a closed-loop, multimode, predictive coding technique for speech frames at low bit rates.
- a first encoder 10 receives digitized speech samples s(n) and encodes the samples s(n) for transmission on a transmission medium 12, or communication channel 12, to a first decoder 14.
- the decoder 14 decodes the encoded speech samples and synthesizes an output speech signal s_SYNTH(n).
- a second encoder 16 encodes digitized speech samples s(n), which are transmitted on a communication channel 18.
- a second decoder 20 receives and decodes the encoded speech samples, generating a synthesized output speech signal s_SYNTH(n).
- the speech samples s(n) represent speech signals that have been digitized and quantized in accordance with any of various methods known in the art including, e.g., pulse code modulation (PCM), companded μ-law, or A-law.
- PCM pulse code modulation
- the speech samples s(n) are organized into frames of input data wherein each frame comprises a predetermined number of digitized speech samples s(n). In an exemplary embodiment, a sampling rate of 8 kHz is employed, with each 20 ms frame comprising 160 samples.
- the rate of data transmission may advantageously be varied on a frame-to-frame basis from 8 kbps (full rate) to 4 kbps (half rate) to 2 kbps (quarter rate) to 1 kbps (eighth rate). Varying the data transmission rate is advantageous because lower bit rates may be selectively employed for frames containing relatively less speech information. As understood by those skilled in the art, other sampling rates, frame sizes, and data transmission rates may be used.
- the first encoder 10 and the second decoder 20 together comprise a first speech coder, or speech codec.
- the second encoder 16 and the first decoder 14 together comprise a second speech coder.
- speech coders may be implemented with a digital signal processor (DSP), an application-specific integrated circuit (ASIC), discrete gate logic, firmware, or any conventional programmable software module and a microprocessor.
- the software module could reside in RAM memory, flash memory, registers, or any other form of writable storage medium known in the art.
- any conventional processor, controller, or state machine could be substituted for the microprocessor.
- Exemplary ASICs designed specifically for speech coding are described in U.S. Patent No. 5,727,123, assigned to the assignee of the present invention and fully incorporated herein by reference, and U.S. Application Serial No. 08/197,417, entitled VOCODER ASIC, filed February 16, 1994, assigned to the assignee of the present invention, and fully incorporated herein by reference.
- an encoder 100 that may be used in a speech coder includes a mode decision module 102, a pitch estimation module 104, an LP analysis module 106, an LP analysis filter 108, an LP quantization module 110, and a residue quantization module 112.
- Input speech frames s(n) are provided to the mode decision module 102, the pitch estimation module 104, the LP analysis module 106, and the LP analysis filter 108.
- the mode decision module 102 produces a mode index I_M and a mode M based upon the periodicity of each input speech frame s(n).
- Various methods of classifying speech frames according to periodicity are described in U.S. Application Serial No.
- the pitch estimation module 104 produces a pitch index I_P and a lag value
- the LP analysis module 106 performs linear predictive analysis on each input speech frame s(n) to generate an LP parameter a.
- the LP parameter a is provided to the LP quantization module 110.
- the LP quantization module 110 also receives the mode M.
- the LP quantization module 110 produces an LP index I_LP and a quantized LP parameter â.
- the LP analysis filter 108 receives the quantized LP parameter â in addition to the input speech frame s(n).
- the LP analysis filter 108 generates an LP residue signal R[n], which represents the error between the input speech frames s(n) and the speech reconstructed from the quantized linear predicted parameters â.
- the LP residue R[n], the mode M, and the quantized LP parameter â are provided to the residue quantization module 112. Based upon these values, the residue quantization module 112 produces a residue index I_R and a quantized residue signal R̂[n].
- a decoder 200 that may be used in a speech coder includes an LP parameter decoding module 202, a residue decoding module 204, a mode decoding module 206, and an LP synthesis filter 208.
- the mode decoding module 206 receives and decodes a mode index I_M, generating therefrom a mode M.
- the LP parameter decoding module 202 receives the mode M and an LP index I_LP.
- the LP parameter decoding module 202 decodes the received values to produce a quantized LP parameter â.
- the residue decoding module 204 receives a residue index I_R, a pitch index I_P, and the mode index I_M.
- the residue decoding module 204 decodes the received values to generate a quantized residue signal R̂[n].
- the quantized residue signal R̂[n] and the quantized LP parameter â are provided to the LP synthesis filter 208, which synthesizes a decoded output speech signal ŝ[n] therefrom.
- a multimode coder first uses an open-loop mode decision, relying on parameters extracted from the current frame to classify the current frame as background noise/silence (N), unvoiced speech (UV), or voiced speech (V).
- N-type frames are coded with an eighth-rate mode
- UV-type frames are coded with a quarter- rate mode.
- V-type frames i.e., voiced speech frames
- the full-rate mode may advantageously be a prediction-based coding scheme with adequate bits to accurately encode various types of voiced speech, delivering a perceptual signal-to-noise ratio (PSNR) well above the target PSNR (a predefined or variable threshold value).
- PSNR perceptual signal-to-noise ratio
- the half-rate mode is advantageously a prediction-based coding scheme designed to encode frames that have a high degree of correlation with the previous frame (i.e., frames that are quite similar to the previous frame).
- the number of bits available in the half-rate mode is adequate to encode the prediction parameters for frames with high correlation, as well as the prediction error, which is relatively small due to the high correlation between successive frames.
- Such frames are typically encountered in steady voiced speech segments, which are therefore amenable to half-rate coding.
- the performance of prediction-based coding schemes also depends on how accurately the previous frame is quantized.
- a closed-loop mode selection process is employed after the open-loop mode decision to ensure that the coding performance exceeds the predefined (or variable) target PSNR value.
- the open-loop mode decision need not be applied at all.
- the flow chart of FIG. 4 illustrates a closed-loop, multimode, predictive coding technique for speech frames at low bit rates, in accordance with one embodiment.
- a frame number counter is set equal to 1.
- the algorithm then proceeds to step 302, starting the coding process.
- the algorithm then proceeds to step 304.
- the algorithm checks the current frame and the previous quantized frame.
- the algorithm then proceeds to step 306.
- the algorithm determines whether the current frame should be classified as silence or background noise. This determination is made in accordance with various conventional techniques for measuring frame energy, such as, e.g., calculating the sum-of-squares. If the frame is classified as silence or background noise, the algorithm proceeds to step 308.
- in step 308 the algorithm applies an eighth-rate coding mode to the frame.
- in step 312 the algorithm determines whether the current frame should be classified as unvoiced speech. This determination is made in accordance with various known methods of periodicity determination, such as, e.g., the use of zero crossings and normalized autocorrelation functions (NACFs). These techniques are described in the aforementioned U.S. Application Serial No. 08/815,354, previously fully incorporated herein by reference. If the frame is classified as unvoiced speech, the algorithm proceeds to step 314. In step 314 a quarter-rate coding mode is applied to the frame. The algorithm then proceeds to step 310.
- NACFs normalized autocorrelation functions
- if in step 312 the frame is not classified as unvoiced speech, the algorithm proceeds to step 316, considering the frame to contain voiced speech.
- in step 316 the algorithm encodes the frame with a half-rate, prediction-based coding mode.
- in step 318 the PSNR is computed.
- the algorithm then proceeds to step 320.
- in step 320 the algorithm determines whether the computed PSNR is greater than a predefined threshold, or target, PSNR value.
- the threshold, or target, PSNR value may be a function of average bit rate. For example, the average bit rate is calculated periodically and fed back to the algorithm, which adjusts the target threshold value accordingly. Further, it should be understood that any conventional measure of performance may be substituted for PSNR.
- if the computed PSNR exceeds the target PSNR, the algorithm proceeds to step 322. In step 322 a half-rate coding mode is applied to the frame. The algorithm then proceeds to step 310. If, on the other hand, in step 320 the computed PSNR does not exceed the target PSNR, the algorithm proceeds to step 324. In step 324 the algorithm applies a full-rate coding mode to the frame. The algorithm then proceeds to step 310.
- in step 310 the frame number counter is incremented by 1.
- the algorithm then proceeds to step 326.
- step 326 the algorithm determines whether the frame number counter value is greater than or equal to the total number of frames that must be processed (i.e., whether there are any remaining frames to process). If the frame number counter value is less than the total number of frames to be processed, the algorithm returns to step 302, beginning the coding process for the next frame. If, on the other hand, the frame number counter value is greater than or equal to the total number of frames to be processed, the algorithm proceeds to step 328, ending the coding process.
- the full-rate coding mode described above with respect to FIG. 4 could be a higher-bit-rate predictive mechanism (i.e., any bit rate that is greater than half-rate).
- a higher-bit-rate, direct coding mechanism is substituted for the full-rate, predictive coding mode.
- the direct coding mode encodes the current speech frame or residue without using any information from the previous frame.
- a direct encoding method is appropriate for speech segments for which there is no similarity between the current frame and the previous frame.
- An example is during the onset of a voice segment.
- Another example is unvoiced-to-voiced segment transitions.
- a direct encoding method is also useful in the middle of voiced segments when the cumulative effect of prediction-based encoding has degraded the past quantized frame so as to be too far out of sync with the corresponding original speech frame. In this case predictive coding will fail, even at much higher bit rates, due to the lack of similarity between the past quantized frame and the past original frame.
- a fresh capture of the current frame with a direct encoding method will not only enhance the preservation of the current frame, but will also facilitate future prediction-based encoding of the next and later frames because the prediction mechanism will be aided by a more accurate memory.
- the R1 coding method is a higher-rate, direct coding method.
- the R2 coding method is a lower-rate, predictive coding method.
- a closed-loop decision is performed such that the R2 coding method is tried first, the performance is checked by comparing with a performance measure, and the algorithm switches to the R1 coding method if the performance for the R2 coding mode is insufficient.
- the higher-rate R1 coding mode is tried first, the performance is checked by comparing with a performance measure, and, if the performance is satisfactory, the lower-rate R2 coding mode is tried.
- the performance check is then performed for the R2 coding mode, and if the R2 coding mode performance is unsatisfactory, the R1 coding mode is applied to the frame.
- multiple coding modes having bit rates R1,R2,...,RN-1,RN (where R1>R2>...>RN-1>RN) are employed.
- a closed-loop decision is performed such that the lowest rate, RN, is tried first. If the RN coding mode performs adequately, the RN coding mode is retained for the frame. Otherwise, the next, higher-rate coding mode, RN-1, is applied. The process is reiterated until either a coding mode performs adequately or the highest-rate mode, R1, is retained. In an alternate embodiment, the highest rate, R1, is tried first. If the R1 mode performs adequately, the next, lower-rate coding mode, R2, is tried. The process is continued until a given coding mode does not perform adequately (at which time the last coding mode to perform adequately is applied), or until the lowest-rate coding mode, RN, performs satisfactorily and is applied.
- multiple coding modes having bit rates R1,R2,...,Rm-1,Rm,Rm+1,...,RN are employed.
- the bit rates have the following relative magnitudes: R1>R2>...>Rm-1>Rm>Rm+1>...>RN.
- a closed-loop mode decision works in conjunction with an open-loop mode decision.
- the open-loop mode decision, based upon parameters such as frame energy or frame periodicity, tells the coder to apply a mode with a bit rate of Rm, at which point the closed-loop mode decision takes over.
- the closed-loop mode decision applies the Rm coding mode, tests performance, and maintains the Rm coding mode if performance is satisfactory.
- otherwise, the closed-loop mode decision tries the next, higher-rate coding mode, Rm-1. The process is reiterated until either a coding mode performs adequately or the highest-rate mode, R1, is retained. Alternatively, the closed-loop mode decision applies the Rm coding mode, tests performance, and maintains the Rm coding mode if performance is satisfactory. Otherwise, the closed-loop mode decision tries the next, lower-rate coding mode, Rm+1. The process is reiterated until either a coding mode performs inadequately (at which time the last coding mode to perform adequately is applied), or the lowest-rate mode, RN, is retained.
- multiple coding modes having bit rates R1,R2,...,RN (where R1>R2>...>RN) are employed. All of the coding modes are applied in parallel to the input speech frame, and the performances of the coding modes are compared with a set of N threshold performance measures. The coding mode that appears to produce the most accurate result is selected.
- multiple coding modes having bit rates R1,R2,...,RN are employed. All of the coding modes are applied in parallel to the input speech frame, and the performances of the coding modes are compared with a set of N threshold performance measures. If several coding modes exceed the performance threshold target, the coding mode having the lowest bit rate (and also performing above the performance threshold) is selected.
- multiple coding modes having bit rates R1,R2,...,Half Rate,...,Quarter Rate,...,RN (where R1 is Full Rate and RN is Eighth Rate) are employed.
- a closed-loop mode decision works in conjunction with an open-loop mode decision.
- the open-loop mode decision, based upon parameters such as frame energy or frame periodicity, tells the coder to apply the full-rate coding mode to unvoiced-to-voiced transition frames, voiced-to-voiced transition frames, nonstationary voiced segments, and nonstationary unvoiced segments. Also based upon frame parameters, the open-loop mode decision tells the coder to apply the half-rate coding mode to steady-voiced segments that exhibit a significant degree of similarity from frame to frame.
- the open-loop mode decision tells the coder to apply the quarter-rate coding mode to steady unvoiced segments. Also based upon frame parameters, the open-loop mode decision tells the coder to apply the eighth-rate coding mode to background noise and other nonspeech signals such as silence.
- the closed-loop mode decision takes over. The closed-loop mode decision applies the coding mode selected by the open-loop mode decision, tests performance, and maintains the selected coding mode if performance is satisfactory. Otherwise, the closed-loop mode decision tries the next, higher-rate coding mode. The process is reiterated until either a coding mode performs adequately or the full-rate mode is retained.
- the closed-loop mode decision applies the coding mode selected by the open-loop mode decision, tests performance, and maintains the selected coding mode if performance is satisfactory. Otherwise, the closed-loop mode decision tries the next, lower-rate coding mode. The process is reiterated until either a coding mode performs inadequately (at which time the last coding mode to perform adequately is applied), or the lowest-rate mode is retained.
- the MCCi and Mi coding modes each use the same source-coding mode (i.e., the same encoder and decoder).
- the MCCi coding mode includes an additional layer of channel protection, in which (RCCi-Ri) bits are used for robust protection of the parameters of the Mi coding mode under the worst possible channel condition of the communication system.
- the performance, or voice quality, delivered by the Mi coding mode under channel-error-free conditions is similar to the performance, or voice quality, delivered by the MCCi coding mode under the worst possible channel error condition.
- the (RCCi-Ri) channel coding bits serve to provide adequate protection under the assumed, or target, worst channel condition.
- the assumed worst channel condition may advantageously be, e.g., a predefined percentage of frame error rate (FER).
- FER frame error rate
- a closed-loop mode decision advantageously accounts for both channel variation and source variation to deliver a guaranteed quality of service. For example, a source-controlled, closed-loop mode decision such as described above is applied first. The closed-loop mode decision tells the coder to use the Mi coding mode.
- RCCi,j-RCCi represents the minimum number of bits needed to add channel error protection to the channel coding layer so that the channel error protection will be adequate for the worst-case scenario in the j-th channel error condition.
- Such a closed-loop, combined-network-and-source-controlled codec delivers guaranteed quality of service across various channel conditions while also delivering a low average bit rate.
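The predictive relationships described in the definitions (the prediction filter P(n), the past quantized frame S_prev,quantized, the prediction error E_cur and its quantized version Ê_cur, and the split of N_0 bits into N_p and N_0 - N_p) can be written compactly as below. This is a sketch reconstructed from those definitions, not an equation quoted from the patent. It assumes P(n) acts as a filter, written here as convolution (*). The labels S_cur,quantized and N_E are introduced only for this illustration.

```latex
\begin{aligned}
S_{\mathrm{cur,predicted}}(n) &= P(n) * S_{\mathrm{prev,quantized}}(n) \\
E_{\mathrm{cur}}(n) &= S_{\mathrm{cur}}(n) - S_{\mathrm{cur,predicted}}(n) \\
S_{\mathrm{cur,quantized}}(n) &= S_{\mathrm{cur,predicted}}(n) + \hat{E}_{\mathrm{cur}}(n) \\
S_{\mathrm{cur}}(n) - S_{\mathrm{cur,quantized}}(n) &= E_{\mathrm{cur}}(n) - \hat{E}_{\mathrm{cur}}(n) \\
N_E &= N_0 - N_p
\end{aligned}
```

The fourth line shows that the reconstruction error of the current frame equals the quantization error of the prediction error. In the half-rate mode, N_0 = 80 bits per 20-ms frame, so every bit spent on N_p is one fewer bit for Ê_cur. This is why half-rate prediction works well only when E_cur is small, that is, when successive frames are highly correlated.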
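The LP analysis module 106 and the LP analysis filter 108 of FIG. 2 can be illustrated using the autocorrelation method and the Levinson-Durbin recursion. These are a common way to compute LP coefficients, but the description does not say which method is used, so this choice is an assumption. The sketch omits several stages: windowing, LP quantization (it filters with the unquantized coefficients a rather than â), pitch estimation, and residue quantization. The function names are hypothetical.

```python
import random
from typing import List, Sequence


def autocorrelation(frame: Sequence[float], order: int) -> List[float]:
    """r[lag] = sum_n s[n] * s[n - lag] for lag = 0..order (no window applied)."""
    n = len(frame)
    return [sum(frame[i] * frame[i - lag] for i in range(lag, n)) for lag in range(order + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> List[float]:
    """Return predictor coefficients a[1..order] such that s[n] ~ sum_k a[k] * s[n - k]."""
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]  # prediction-error energy of the order-0 predictor
    for i in range(1, order + 1):
        if err <= 0.0:  # degenerate (e.g. all-zero) frame: keep the coefficients found so far
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coefficient
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k  # prediction-error energy after order i
    return a[1:]


def lp_residue(frame: Sequence[float], coeffs: Sequence[float]) -> List[float]:
    """LP analysis filter: R[n] = s[n] - sum_k a[k] * s[n - k], with zero history before the frame."""
    return [
        frame[n] - sum(c * frame[n - k] for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        for n in range(len(frame))
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    s = [0.0]
    for _ in range(159):  # synthetic 160-sample AR(1) frame standing in for speech
        s.append(0.9 * s[-1] + rng.gauss(0.0, 1.0))
    a = levinson_durbin(autocorrelation(s, 2), 2)
    residue = lp_residue(s, a)
    print("LP coefficients:", [round(c, 3) for c in a])
    print("residue/signal energy:", sum(x * x for x in residue) / sum(x * x for x in s))
```

For strongly correlated input, the residue energy is much smaller than the frame energy. This is what lets the residue quantization module spend fewer bits than coding s(n) directly would require.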
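The FIG. 4 flow can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patent's implementation:

- A mean-square energy test stands in for the silence/background-noise decision of step 306.
- A zero-crossing-rate test stands in for the unvoiced decision of step 312 (the description also mentions NACFs).
- Plain SNR stands in for PSNR.
- The thresholds are invented.
- `code_frames`, `TOY_CODERS`, and the toy coders are hypothetical.

The bit table applies the 20 ms frame and the 8/4/2/1 kbps rates given in the description.

```python
import math
from typing import Callable, Dict, List, Sequence

Frame = Sequence[float]
# A coder maps (current frame, previous decoded frame) to the decoded current frame.
Coder = Callable[[Frame, Frame], List[float]]

# Bits per frame = rate (kbps) x frame length (ms): 8/4/2/1 kbps x 20 ms = 160/80/40/20 bits.
FRAME_MS = 20
RATES_KBPS = {"full": 8, "half": 4, "quarter": 2, "eighth": 1}
BITS_PER_FRAME = {name: kbps * FRAME_MS for name, kbps in RATES_KBPS.items()}

# Hypothetical thresholds; the description gives no numeric values.
SILENCE_ENERGY = 1e-4  # mean-square energy below which a frame counts as silence/noise
UNVOICED_ZCR = 0.3     # zero-crossing rate above which a frame counts as unvoiced


def mean_energy(frame: Frame) -> float:
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame: Frame) -> float:
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def snr_db(reference: Frame, decoded: Frame) -> float:
    """Plain SNR in dB, a stand-in for the PSNR used in the description."""
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, decoded))
    if noise == 0:
        return math.inf
    if signal == 0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)


def code_frames(frames: Sequence[Frame], coders: Dict[str, Coder], target_db: float) -> List[str]:
    """Return the coding mode chosen for each frame, following the FIG. 4 flow."""
    prev: Frame = [0.0] * len(frames[0])  # empty prediction memory before the first frame
    chosen: List[str] = []
    for frame in frames:  # steps 302-304; the loop itself plays the role of steps 310 and 326
        if mean_energy(frame) < SILENCE_ENERGY:  # step 306
            mode = "eighth"  # step 308
        elif zero_crossing_rate(frame) > UNVOICED_ZCR:  # step 312
            mode = "quarter"  # step 314
        else:
            mode = "half"  # step 316: voiced, try half-rate prediction first
        decoded = coders[mode](frame, prev)
        if mode == "half" and snr_db(frame, decoded) <= target_db:  # steps 318-320
            mode = "full"  # step 324: target missed, recode at full rate
            decoded = coders[mode](frame, prev)
        chosen.append(mode)  # step 322 keeps "half" when the target is met
        prev = decoded  # the next frame is predicted from this decoded frame
    return chosen


def quantize(values: Sequence[float], step: float) -> List[float]:
    return [step * round(v / step) for v in values]


def toy_half(cur: Frame, prev: Frame) -> List[float]:
    """Toy predictive coder: coarsely quantize the difference from the previous decoded frame."""
    diff = [c - p for c, p in zip(cur, prev)]
    return [p + d for p, d in zip(prev, quantize(diff, 0.25))]


def toy_full(cur: Frame, prev: Frame) -> List[float]:
    """Toy direct coder: finely quantize the frame itself, ignoring the previous frame."""
    return quantize(cur, 1 / 256)


def toy_zero(cur: Frame, prev: Frame) -> List[float]:
    """Toy eighth/quarter-rate stand-in that outputs silence."""
    return [0.0] * len(cur)


TOY_CODERS: Dict[str, Coder] = {"eighth": toy_zero, "quarter": toy_zero, "half": toy_half, "full": toy_full}

if __name__ == "__main__":
    sine = [0.5 * math.sin(2 * math.pi * k / 40) for k in range(160)]
    noise_like = [0.3 if k % 2 == 0 else -0.3 for k in range(160)]
    modes = code_frames([[0.0] * 160, sine, sine, noise_like], TOY_CODERS, target_db=20.0)
    print(modes, sum(BITS_PER_FRAME[m] for m in modes), "bits")
```

With these toy inputs, the first voiced frame misses the 20 dB target at half rate because its prediction memory is silence, so it is recoded at full rate. The toy full-rate coder is direct, in the spirit of the direct coding mode described above. It refreshes the prediction memory, so the identical next frame passes at half rate.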
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The closed-loop, multimode, predictive speech coder of the present invention includes a codec (100, 200) configured to operate in a plurality of coding modes, and a closed-loop mode decision module configured to apply the lowest-bit-rate coding mode to an input speech frame. A performance measure of the codec is obtained and compared with a threshold value. If the performance measure remains below the threshold value, the lower-bit-rate coding mode is rejected in favor of a higher-bit-rate coding mode. The process may continue until satisfactory coding performance is obtained. A higher-bit-rate, direct coding mode may be applied as soon as the lower-bit-rate, predictive coding mode fails to deliver satisfactory performance.
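The closed-loop decision summarized in the abstract, with modes tried from the lowest bit rate upward, can be sketched as follows. This is a minimal illustration, not the patent's implementation. `CodingMode`, `closed_loop_select`, `parallel_select`, `snr_db`, and the toy `uniform_mode` quantizers are hypothetical. Plain SNR in dB stands in for the unspecified performance measure (the description uses a perceptual SNR). `parallel_select` shows the variant in which every mode is run and the lowest-rate mode above the threshold is chosen.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]


@dataclass(frozen=True)
class CodingMode:
    """Hypothetical coding mode: a label, its bit budget, and an encode-then-decode round trip."""

    name: str
    bits_per_frame: int
    round_trip: Callable[[Frame], List[float]]


def snr_db(reference: Frame, decoded: Frame) -> float:
    """Plain SNR in dB; a stand-in for the unspecified performance measure."""
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, decoded))
    if noise == 0:
        return math.inf
    if signal == 0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)


def closed_loop_select(frame: Frame, modes: Sequence[CodingMode], threshold: float) -> Tuple[CodingMode, float]:
    """Try modes from lowest to highest bit rate; keep the first whose SNR exceeds threshold.

    If no mode exceeds the threshold, the highest-rate mode is retained.
    """
    if not modes:
        raise ValueError("at least one coding mode is required")
    for mode in sorted(modes, key=lambda m: m.bits_per_frame):
        score = snr_db(frame, mode.round_trip(frame))
        if score > threshold:  # performance exceeds the threshold: accept this mode
            return mode, score
    return mode, score  # every mode was rejected; the loop ended on the highest-rate mode


def parallel_select(frame: Frame, modes: Sequence[CodingMode], threshold: float) -> Tuple[CodingMode, float]:
    """Run every mode; return the lowest-rate one above threshold.

    Falling back to the highest-rate mode when none passes is an assumption of this sketch.
    """
    scored = [(m, snr_db(frame, m.round_trip(frame))) for m in modes]
    passing = [(m, s) for m, s in scored if s > threshold]
    if passing:
        return min(passing, key=lambda ms: ms[0].bits_per_frame)
    return max(scored, key=lambda ms: ms[0].bits_per_frame)


def uniform_mode(bits_per_sample: int, frame_len: int = 160) -> CodingMode:
    """Toy mode: uniformly quantize samples in [-1, 1] (illustration only)."""
    step = 2.0 / (2 ** bits_per_sample)
    return CodingMode(
        name=f"{bits_per_sample}-bit",
        bits_per_frame=bits_per_sample * frame_len,
        round_trip=lambda f: [step * round(x / step) for x in f],
    )


if __name__ == "__main__":
    frame = [0.9 * math.sin(2 * math.pi * k / 20) for k in range(160)]
    modes = [uniform_mode(b) for b in (1, 2, 4, 8)]
    mode, score = closed_loop_select(frame, modes, threshold=20.0)
    print(mode.name, mode.bits_per_frame, round(score, 1))
```

The sequential and parallel variants pick the same mode when at least one mode passes. The sequential version simply stops encoding as soon as it finds one. A single threshold is used here for all modes; the parallel embodiment compares against a set of N thresholds, which would become a per-mode lookup in this sketch.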
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US191643 | 1980-09-26 | ||
US19164398A | 1998-11-13 | 1998-11-13 | |
PCT/US1999/026850 WO2000030075A1 (fr) | 1998-11-13 | 1999-11-12 | Closed-loop variable-rate multimode predictive speech coder |
Publications (1)
Publication Number | Publication Date |
---|---|
EP1129451A1 (fr) | 2001-09-05 |
Family ID: 22706319
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP99957560A (withdrawn; published as EP1129451A1, fr) | 1998-11-13 | 1999-11-12 | Closed-loop variable-rate multimode predictive speech coder |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP1129451A1 (fr) |
JP (1) | JP2002530706A (fr) |
KR (1) | KR20010087393A (fr) |
AU (1) | AU1524300A (fr) |
WO (1) | WO2000030075A1 (fr) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6330532B1 (en) * | 1999-07-19 | 2001-12-11 | Qualcomm Incorporated | Method and apparatus for maintaining a target bit rate in a speech coder |
US6438518B1 (en) * | 1999-10-28 | 2002-08-20 | Qualcomm Incorporated | Method and apparatus for using coding scheme selection patterns in a predictive speech coder to reduce sensitivity to frame error conditions |
JP3404024B2 (ja) | 2001-02-27 | 2003-05-06 | Mitsubishi Electric Corporation | Speech coding method and speech coding apparatus |
US8725499B2 (en) | 2006-07-31 | 2014-05-13 | Qualcomm Incorporated | Systems, methods, and apparatus for signal change detection |
US8260609B2 (en) | 2006-07-31 | 2012-09-04 | Qualcomm Incorporated | Systems, methods, and apparatus for wideband encoding and decoding of inactive frames |
US8532984B2 (en) | 2006-07-31 | 2013-09-10 | Qualcomm Incorporated | Systems, methods, and apparatus for wideband encoding and decoding of active frames |
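JP2008170488A (ja) | 2007-01-06 | 2008-07-24 | Yamaha Corp | Waveform compression apparatus, waveform decompression apparatus, program, and method of producing compressed data |
CN102254562B (zh) * | 2011-06-29 | 2013-04-03 | Beijing Institute of Technology | Variable-rate audio coding method with switching between adjacent high-rate and low-rate coding modes |
CN118016081B (zh) * | 2024-04-10 | 2024-06-21 | Shandong Computer Science Center (National Supercomputer Center in Jinan) | Variable-rate speech coding method and system based on a speech quality grading model |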
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0398318A (ja) * | 1989-09-11 | 1991-04-23 | Fujitsu Ltd | Speech coding system |
CN1129263C (zh) * | 1994-02-17 | 2003-11-26 | Motorola Inc. | Method and apparatus for group encoding of signals |
1999
- 1999-11-12: AU application AU15243/00A (published as AU1524300A, en): not active, abandoned
- 1999-11-12: KR application KR1020017006035A (published as KR20010087393A, ko): not active, application discontinuation
- 1999-11-12: EP application EP99957560A (published as EP1129451A1, fr): not active, withdrawn
- 1999-11-12: JP application JP2000583004A (published as JP2002530706A, ja): active, pending
- 1999-11-12: WO application PCT/US1999/026850 (published as WO2000030075A1, fr): not active, application discontinuation
Non-Patent Citations (1)
Title |
---|
See references of WO0030075A1 * |
Also Published As
Publication number | Publication date |
---|---|
WO2000030075A1 (fr) | 2000-05-25 |
AU1524300A (en) | 2000-06-05 |
KR20010087393A (ko) | 2001-09-15 |
JP2002530706A (ja) | 2002-09-17 |
Similar Documents
Publication | Title |
---|---|
EP1340223B1 (fr) | Method and apparatus for robust speech classification |
US7203638B2 (en) | Method for interoperation between adaptive multi-rate wideband (AMR-WB) and multi-mode variable bit-rate wideband (VMR-WB) codecs |
JP5543405B2 (ja) | Predictive speech coder using coding scheme patterns to reduce sensitivity to frame errors |
EP1129450B1 (fr) | Low bit-rate coding of unvoiced segments of speech |
KR100798668B1 (ko) | Method and apparatus for coding unvoiced speech |
EP1214705B1 (fr) | Method and apparatus for maintaining a target bit rate in a speech coder |
JP2003525473A (ja) | Closed-loop multimode mixed-domain linear prediction speech coder |
US20010051873A1 (en) | Synthesis of speech from pitch prototype waveforms by time-synchronous waveform interpolation |
EP1181687B1 (fr) | Multipulse interpolative coding of transition speech frames |
US6434519B1 (en) | Method and apparatus for identifying frequency bands to compute linear phase shifts between frame prototypes in a speech coder |
JP2002536694A (ja) | Method and means for 1/8-rate random number generation for voice coders |
EP1129451A1 (fr) | Closed-loop variable-rate multimode predictive speech coder |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PUAI | Public reference made under Article 153(3) EPC to a published international application that has entered the European phase | Free format text: ORIGINAL CODE: 0009012 |
| | 17P | Request for examination filed | Effective date: 2001-05-07 |
| | AK | Designated contracting states | Kind code of ref document: A1. Designated states: AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE |
| | AX | Request for extension of the European patent | Free format text: AL; LT; LV; MK; RO; SI |
| | STAA | Information on the status of an EP patent application or granted EP patent | Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN |
| | 18D | Application deemed to be withdrawn | Effective date: 2003-06-03 |
| | RBV | Designated contracting states (corrected) | Designated states: DE FR GB |