CA2149039C - Coding with modulation, error control, weighting, and bit allocation - Google Patents

Coding with modulation, error control, weighting, and bit allocation Download PDF

Info

Publication number
CA2149039C
CA2149039C (also published as CA2149039A1; application CA002149039A)
Authority
CA
Canada
Prior art keywords
vectors
bit
code
error control
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
CA002149039A
Other languages
French (fr)
Other versions
CA2149039A1 (en)
Inventor
John C. Hardwick
Jae S. Lim
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Digital Voice Systems Inc
Original Assignee
Digital Voice Systems Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US07/982,937 external-priority patent/US5517511A/en
Application filed by Digital Voice Systems Inc filed Critical Digital Voice Systems Inc
Publication of CA2149039A1 publication Critical patent/CA2149039A1/en
Application granted granted Critical
Publication of CA2149039C publication Critical patent/CA2149039C/en
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00Arrangements for detecting or preventing errors in the information received
    • H04L1/004Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0056Systems characterized by the type of code used
    • H04L1/0057Block codes
    • H04L1/0058Block-coded modulation
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/002Dynamic bit allocation
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/005Correction of errors induced by the transmission channel, if related to the coding algorithm
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/06Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/29Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes combining two or more codes or code structures, e.g. product codes, generalised product codes, concatenated codes, inner and outer codes
    • HELECTRICITY
    • H03ELECTRONIC CIRCUITRY
    • H03MCODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M13/00Coding, decoding or code conversion, for error detection or error correction; Coding theory basic assumptions; Coding bounds; Error probability evaluation methods; Channel models; Simulation or testing of codes
    • H03M13/35Unequal or adaptive error protection, e.g. by providing a different level of protection according to significance of source information or by adapting the coding according to the change of transmission channel characteristics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00Arrangements for detecting or preventing errors in the information received
    • H04L1/004Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0041Arrangements at the transmitter end
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00Arrangements for detecting or preventing errors in the information received
    • H04L1/004Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0045Arrangements at the receiver end
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00Arrangements for detecting or preventing errors in the information received
    • H04L1/004Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0056Systems characterized by the type of code used
    • H04L1/007Unequal error protection
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00Arrangements for detecting or preventing errors in the information received
    • H04L1/004Arrangements for detecting or preventing errors in the information received by using forward error control
    • H04L1/0056Systems characterized by the type of code used
    • H04L1/0071Use of interleaving

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computational Linguistics (AREA)
  • Theoretical Computer Science (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

Various forms of coding are performed.
They include fundamental frequency encoding (1) and fundamental frequency decoding (2).

Description

WO 94/12932                                                    PCT/US93/11609

CODING WITH MODULATION, ERROR CONTROL, WEIGHTING, AND BIT ALLOCATION
Background of the Invention

This invention relates to methods for preserving the quality of speech or other acoustic signals when transmitted over a noisy channel.
Relevant publications include: J. L. Flanagan, Speech Analysis, Synthesis and Perception, Springer-Verlag, 1972, pp. 378-386 (discusses phase vocoder - frequency-based speech analysis-synthesis system); Quatieri et al., "Speech Transformations Based on a Sinusoidal Representation", IEEE TASSP, Vol. ASSP-34, No. 6, Dec. 1986, pp. 1449-1464 (discusses analysis-synthesis technique based on a sinusoidal representation); Griffin, "Multiband Excitation Vocoder", Ph.D. Thesis, M.I.T., 1987 (discusses an 8000 bps Multi-Band Excitation speech coder); Griffin et al., "A High Quality 9.6 kbps Speech Coding System", Proc. ICASSP 86, pp. 125-128, Tokyo, Japan, April 13-20, 1986 (discusses a 9600 bps Multi-Band Excitation speech coder); Griffin et al., "A New Model-Based Speech Analysis/Synthesis System", Proc. ICASSP 85, pp. 513-516, Tampa, FL, March 26-29, 1985 (discusses Multi-Band Excitation speech model); Hardwick, "A 4.8 kbps Multi-Band Excitation Speech Coder", S.M. Thesis, M.I.T., May 1988 (discusses a 4800 bps Multi-Band Excitation speech coder); McAulay et al., "Mid-Rate Coding Based on a Sinusoidal Representation of Speech", Proc. ICASSP 85, pp. 945-948, Tampa, FL, March 26-29, 1985 (discusses the sinusoidal transform speech coder); Campbell et al., "The New 4800 bps Voice Coding Standard", Mil Speech Tech Conference, Nov. 1989 (discusses error correction in a U.S. Government speech coder); Campbell et al., "CELP Coding for Land Mobile Radio Applications", Proc. ICASSP 90, pp. 465-468, Albuquerque, NM, April 3-6, 1990 (discusses error correction in a U.S. Government speech coder); Levesque et al., Error-Control Techniques for Digital Communication, Wiley, 1985 (discusses error correction in general); Lin et al., Error Control Coding, Prentice-Hall, 1983 (discusses error correction in general); Jayant et al., Digital Coding of Waveforms, Prentice-Hall, 1984 (discusses speech coding in general); Digital Voice Systems, Inc., "INMARSAT-M Voice Coder", Version 1.9, November 18, 1992 (discusses 6.4 kbps IMBE™ speech coder for INMARSAT-M standard); Digital Voice Systems, Inc., "APCO/NASTD/Fed Project 25 Vocoder Description", Version 1.0, December 1, 1992 (discusses 7.2 kbps IMBE™ speech coder for APCO/NASTD/Fed Project 25 standard) (attached hereto as Appendix A).
The problem of reliably transmitting digital data over noisy communication channels has a large number of applications, and as a result has received considerable attention in the literature. Traditional digital communication systems have relied upon error correction and detection methods to reliably transmit digital data over noisy channels. Sophisticated error coding techniques have been developed to systematically correct and detect bit errors which are introduced by the channel. Examples of commonly used error control codes (ECCs) include Golay codes, Hamming codes, BCH codes, CRC codes, convolutional codes, Reed-Solomon codes, etc. These codes all function by converting a set of information bits into a larger number of bits which are then transmitted across the channel. The increase in the number of bits can be viewed as a form of redundancy which enables the receiver to correct

and/or detect up to a certain number of bit errors. In traditional ECC methods the number of bit errors which can be corrected/detected is a function of the amount of redundancy which is added to the data. This results in a trade-off between reliability (the number of bit errors which can be corrected) and usable data rate (the amount of information which can be transmitted after leaving room for redundancy). The digital communication designer typically performs a sophisticated system analysis to determine the best compromise between these two competing requirements.
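As a concrete illustration of this redundancy trade-off, the sketch below implements a Hamming(7,4) code, one of the block-code families named above. The helper names and the test vector are our own choices for illustration, not anything specified by the patent: four information bits become seven channel bits, and the three redundant parity bits let the receiver correct any single bit error.

```python
# Illustrative sketch: a Hamming(7,4) block code, showing how added
# redundancy lets a receiver correct a single channel bit error.

def hamming74_encode(d):
    """Encode 4 data bits into a 7-bit codeword (parity at positions 1, 2, 4)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_decode(c):
    """Correct up to one bit error and return the 4 data bits."""
    c = list(c)
    # Syndrome bits locate the erroneous position (1-indexed); 0 = no error.
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3
    if pos:
        c[pos - 1] ^= 1          # flip the corrupted bit back
    return [c[2], c[4], c[5], c[6]]

word = [1, 0, 1, 1]
sent = hamming74_encode(word)
received = sent.copy()
received[5] ^= 1                  # channel introduces one bit error
assert hamming74_decode(received) == word
```

The ratio 7/4 is exactly the reliability-versus-rate compromise the text describes: 75% more channel bits buy single-error correction per codeword.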
The reliable transmission of speech or other acoustic signals over a communication channel is a related problem which is made more complicated by the need to first convert the analog acoustic signal into a digital representation.
This is often done by digitizing the analog signal with an A-to-D convertor.
In the case of speech, where an 8-bit A-to-D convertor may sample the signal at a rate of 8 kHz, the digital representation would require 64 kbps. If additional, redundant information must be added prior to transmission across the channel, then the required channel data rate would be significantly greater than 64 kbps. For example, if the channel requires 50% redundancy for reliable transmission, then the required channel data rate would be 64 + 32 = 96 kbps. Unfortunately this data rate is beyond what is practical in many digital communication systems. Consequently some method for reducing the size of the digital representation is needed. This problem, commonly referred to as "compression", is performed by a signal coder. In the case of speech or other acoustic signals a system of this type is often referred to as a speech coder,
voice coder, or simply a vocoder.
A modern speech coder performs a sophisticated analysis on the input signal, which can be viewed as either an analog signal or the output of an A-to-D convertor. The result of this analysis is a compressed digital representation which may be as low as 100 bps. The actual compressed rate which is achieved is generally a function of the desired fidelity (i.e. speech quality) and the type of speech coder which is employed. Different types of speech coders have been designed to operate at high rates (16 - 64 kbps), mid-rates (2 - 10 kbps) and low rates (0 - 2 kbps). Recently, mid-rate speech coders have been the subject of renewed interest due to the increase in mobile communication services (cellular, satellite telephony, land mobile radio, in-flight phones, etc.).
These applications typically require high quality speech at mid-rates. In addition, these applications are all subject to significant channel degradations including high bit error rates (BER) of 1-10% and multipath fading. (Note the problem of bit errors is present to some extent in all digital communication and storage applications. The mobile communication example is presented due to the severity of the problem in the mobile environment.) As discussed above, there are numerous speech coding methods which have been employed in the past. One class of speech coders which has been extensively studied and used in practice is based on an underlying model of speech.
Examples from this class of vocoders include linear prediction vocoders, homomorphic vocoders, sinusoidal transform coders, multi-band excitation speech coders, improved multi-band excitation speech coders and channel vocoders.
In these vocoders, speech is characterized on a short-time basis through a set of model parameters. The model parameters typically consist of some combination of voiced/unvoiced decisions, voiced/unvoiced probability measure, pitch period, fundamental frequency, gain, spectral envelope parameters and residual or error parameters. For this class of speech coders, speech is analyzed by first segmenting speech using a window such as a Hamming window. Then, for each segment of speech, the model parameters are estimated and quantized.
In noisy digital communication systems, the traditional approach is to protect the quantized model parameters with some form of ECC. The redundant information associated with the ECC is used by the receiver to correct and/or detect bit errors introduced by the channel. The receiver then reconstructs the model parameters and then proceeds to synthesize a digital speech signal which is suitable for playback through a D-to-A convertor and a speaker. The inclusion of error control capability allows the receiver to reduce the distortion and other artifacts which would be introduced into the synthesized speech due to the presence of bit errors in the received data. Unfortunately, with any error control code, there is some probability that too many errors will be introduced for the receiver to correct. In this case the remaining bit errors will affect the reconstruction of the model parameters and possibly introduce significant degradations into the synthesized speech. This problem can be lessened by either including additional error control codes, or by including additional error detection capability which can detect errors which cannot be corrected.
These traditional approaches require additional redundancy and hence further increase the channel data rate which is required to transmit a fixed amount of information. This requirement is a disadvantage, since in most applications it is desirable to minimize the total number of bits which are transmitted (or stored).
The invention described herein applies to many different digital communication systems, some of which contain speech coders. Examples of speech coders which may be contained in such a communication system include but are not limited to linear predictive speech coders, channel vocoders, homomorphic vocoders, sinusoidal transform coders, multi-band excitation speech coders and improved multi-band excitation (IMBE™) speech coders. For the purpose of describing the details of this invention, we have focussed on a digital communication system containing the IMBE™ speech coder. This particular speech coder has been standardized at 6.4 kbps for use over the INMARSAT-M (International Marine Satellite Organization) and OPTUS Mobilesat satellite communication systems, and has been selected at 7.2 kbps for use in the APCO/NASTD/Fed Project 25 North American land mobile radio standard.
The IMBE™ coder uses a robust speech model which is referred to as the Multi-Band Excitation (MBE) speech model. The MBE speech model was developed by Griffin and Lim in 1984. This model uses a more flexible representation of the speech signal than traditional speech models. As a consequence it is able to produce more natural sounding speech, and it is more robust to the presence of acoustic background noise. These properties have caused the MBE speech model to be used extensively for high quality mid-rate speech coding.
Let s(n) denote a discrete speech signal obtained by sampling an analog speech signal. In order to focus attention on a short segment of speech over which the model parameters are assumed to be constant, the signal s(n) is multiplied by a window w(n) to obtain a windowed speech segment or frame, sw(n). The speech segment sw(n) is modelled as the response of a linear filter hw(n) to some excitation signal ew(n). Therefore Sw(w), the Fourier Transform of sw(n), can be expressed as

    Sw(w) = Hw(w) Ew(w)    (1)

where Hw(w) and Ew(w) are the Fourier Transforms of hw(n) and ew(n), respectively. The spectrum Hw(w) is often referred to as the spectral envelope of the speech segment.
In traditional speech models speech is divided into two classes depending upon whether the signal is mostly periodic (voiced) or mostly noise-like (unvoiced). For voiced speech the excitation signal is a periodic impulse sequence,
where the distance between impulses is the pitch period. For unvoiced speech the excitation signal is a white noise sequence.
In traditional speech models each speech segment is classified as either entirely voiced or entirely unvoiced. In contrast the MBE speech model divides the excitation spectrum into a number of non-overlapping frequency bands and makes a voiced or unvoiced (V/UV) decision for each frequency band. This approach allows the excitation signal for a particular speech segment to be a mixture of periodic (voiced) energy and aperiodic (unvoiced) energy. This added flexibility in the modelling of the excitation signal allows the MBE speech model to produce high quality speech and to be robust to the presence of background noise.
Speech coders based on the MBE speech model estimate a set of model parameters for each segment of speech. The MBE model parameters consist of a fundamental frequency, a set of V/UV decisions which characterize the excitation signal, and a set of spectral amplitudes which characterize the spectral envelope. Once the MBE model parameters have been estimated for each segment, they are quantized, protected with ECC and transmitted to the decoder. The decoder then performs error control decoding to correct and/or detect bit errors. The resulting bits are then used to reconstruct the MBE model parameters which are in turn used to synthesize a speech signal suitable for playback through a D-to-A convertor and a conventional speaker.
Summary of the Invention
In a first aspect, the invention features a new data encoding method which uses bit modulation to allow uncorrectable bit errors to be detected without requiring any further redundancy to be added to the digital data stream. The digital data is first subdivided into contiguous frames. Then for each frame, a modulation key is generated from a portion of the digital data, which is in turn used to generate a unique modulation sequence. This sequence is then combined with the digital data after error control coding has been applied. A decoder which receives a frame of modulated data attempts to generate the correct demodulation key, demodulate the data and perform error control decoding. An error measure is computed by comparing the data before and after error control decoding. The value of the error measure indicates the probability that the demodulation key is incorrect. If the value of the error measure exceeds a threshold, then the decoder declares the current frame of digital data to be invalid and performs a frame repeat or some other appropriate action.
In a second aspect, the invention features a bit prioritization method which improves the reliability with which a set of quantizer values can be transmitted over a noisy communication channel. This new method assigns a weight to each bit location in a set of quantizer values. In any one quantizer value, the weight is greater for a more significant bit location than for a less significant bit location. The weight of bit locations of the same significance in different quantizer values varies depending upon the sensitivity of the different quantizer values to bit errors, more sensitive bit locations receiving a higher weight than less sensitive bit locations. The bits in each of the bit locations are then prioritized according to their weight, and the prioritized bits are then encoded with error control codes. Error control codes with higher redundancy are typically used to encode the higher priority (i.e. higher weight) bits, while lower redundancy error control codes are used to encode the lower priority (i.e. lower weight) bits. This method improves the efficiency of the error control codes, since only the most critical bits are protected with the high redundancy codes. The decoder which receives the prioritized data performs error control decoding and then rearranges the bits, using the same weighting method, to reconstruct the quantizer values.
In a third aspect, the invention features an improved method for decoding and synthesizing an acoustic signal from a digital data stream. This method divides the digital data into contiguous frames each of which is associated with a time segment of the signal. The method then performs error control decoding of the digital data and then performs further decoding to reconstruct a frequency domain representation of the time segments. The number of errors detected in each frame is determined by comparing the data before and after error control decoding. The frequency domain representation is then smoothed depending upon the number of detected errors, and the smoothed representation is used to synthesize an acoustic signal. Typically the amount of smoothing is increased as the number of detected errors increases, and the amount of smoothing is decreased as the number of detected errors decreases. This method reduces the amount of degradation a listener perceives when hearing the synthesized acoustic signal if the digital data contains a substantial number of bit errors.
In a fourth aspect, the invention features a particularly advantageous bit allocation for a 7.2 kbps speech coder and decoder. In such a system, each frame has 144 bits, which must be allocated to various parameters. We have discovered, after considerable experimentation, that a particularly advantageous allocation of these bits is as follows: 88 bits for the speech model parameters and 56 bits for error control coding. Preferably, the 88 bits allocated to speech model parameters are further allocated as follows: 8 bits for the fundamental frequency, K bits for the voiced/unvoiced decisions, 79 - K bits for the spectral amplitudes, and 1 bit for synchronization.
In accordance with another aspect of the invention, there is provided a method for encoding digital data. The method includes dividing the digital data into one or more frames, and further dividing each of the frames into a plurality of bit vectors. The method further includes encoding one or more of the bit vectors with error control codes, generating a modulation key from one or more of the bit vectors, and using the modulation key to modulate one or more of the encoded bit vectors.
In accordance with another aspect of the invention, there is provided a method for decoding digital data that has been encoded using an encoding method described herein. The decoding method includes dividing the digital data that has been encoded into one or more frames, and further dividing each of the frames into a plurality of code vectors. The decoding method further includes generating a demodulation key from one or more of the code vectors, using the demodulation key to demodulate one or more of the code vectors, and error control decoding one or more of the demodulated code vectors.
In accordance with another aspect of the invention, there is provided an apparatus for encoding a speech signal into digital data. The apparatus includes a processor configured to perform the steps of sampling the speech signal to obtain a series of discrete samples and constructing therefrom a series of frames, each frame spanning a plurality of the samples.
The processor is further configured to analyze the frames to extract the parameters of a speech coder, and to use a quantizer to convert the parameters into a set of discrete quantizer values. The processor is further configured to divide the quantizer values into a plurality of bit vectors, and to encode one or more of the bit vectors with error control codes. The processor is also configured to generate a modulation key from one or more of the bit vectors, and to use the modulation key to modulate one or more of the encoded bit vectors.
In accordance with another aspect of the invention, there is provided an apparatus for decoding speech from digital data which has been encoded using an encoding apparatus as described herein. The apparatus for decoding includes a processor configured to perform the steps of dividing the digital data into one or more frames, and further dividing the frames into a plurality of code vectors. The processor is further configured to generate a demodulation key from one or more of the code vectors, and to use the demodulation key to demodulate one or more of the code vectors. The processor is further configured to error control decode one or more of the demodulated code vectors.
In accordance with another aspect of the invention, there is provided a method for error control coding of digital data. The method includes dividing the digital data into a plurality of bit vectors, including a first bit vector, and encoding the bit vectors with error control codes, to produce encoded bit vectors, including an encoded first bit vector.
The method further includes generating a modulation key from at least the first bit vector, and using the modulation key to modulate at least some of the encoded bit vectors.
In accordance with another aspect of the invention, there is provided a method of decoding digital data that has been encoded by an encoding method. The encoding method includes dividing the digital data into a plurality of bit vectors, including a first bit vector, and encoding the bit vectors with error control codes, to produce encoded vectors, including an encoded first bit vector. The encoding method further includes generating a modulation key from at least the first bit vector, and using the modulation key to modulate at least some of the encoded bit vectors, to produce modulated encoded bit vectors. The method of decoding includes dividing the digital data to be decoded into a plurality of code vectors, the code vectors corresponding to the modulated encoded bit vectors, and generating a demodulation key from at least one of the code vectors. The decoding method further includes using the demodulation key to demodulate at least some of the code vectors, to produce demodulated code vectors, and error control decoding at least some of the demodulated code vectors.
In accordance with another aspect of the invention, there is provided an apparatus for encoding a speech signal into digital data. The apparatus includes means for sampling the speech signal to obtain a series of discrete samples and constructing therefrom a series of frames, each frame spanning a plurality of the samples. The apparatus further includes means for analyzing the frames to extract the parameters of a speech coder, means for converting the parameters into a set of discrete quantizer values, and means for dividing the quantizer values into a plurality of bit vectors. The apparatus further includes means for encoding one or more of the bit vectors with error control codes, means for generating a modulation key from one or more of the bit vectors, and means for using the modulation key to modulate one or more of the encoded bit vectors.
In accordance with another aspect of the invention, there is provided an apparatus for decoding speech from digital data which has been encoded using an encoding apparatus as described herein. The apparatus for decoding includes means for dividing the digital data into one or more frames, means for further dividing the frames into a plurality of code vectors, and means for generating a demodulation key from one or more of the code vectors.
Other features and advantages of the invention will be apparent from the following description of preferred embodiments and from the claims.
Brief Description of the Drawings FIG. 1 is a block diagram showing a preferred embodiment of the invention in which the fundamental frequency used in the IMBETM speech coder is encoded and decoded.
FIG. 2 is a block diagram showing a preferred embodiment of the invention in which the voiced/unvoiced decisions used in the IMBETM speech coder are encoded and decoded.
FIG. 3 is a block diagram showing a preferred embodiment of the invention in which the spectral amplitudes used in the IMBETM speech coder are quantized into a set of quantizer values, denoted b2 through bL+1.
FIG. 4 shows a preferred embodiment of the invention in which the spectral amplitude prediction residuals used in the IMBETM speech coder are divided into six blocks for L = 34.
FIG. 5 is a block diagram showing a preferred embodiment of the invention in which the gain vector, used in part to represent the IMBETM spectral amplitudes, is formed.
FIG. 6 is a block diagram showing a preferred embodiment of the invention in which the spectral amplitudes used in the IMBE™ speech coder are reconstructed (i.e. inverse quantized) from a set of quantizer values, denoted b2 through bL+1.
Figure 7 is a block diagram showing a preferred embodiment of the invention in which the quantizer values for a frame of IMBE™ model parameters are encoded into a frame of digital data.
Figure 8 is a block diagram showing a preferred embodiment of the invention in which a frame of digital data is decoded into a set of quantizer values representing a frame of IMBE™ model parameters.
Figure 9 shows a preferred embodiment of the invention in which the quantizer values representing the spectral amplitudes used in the 7.2 kbps IMBE™ speech coder are prioritized for L = 16.
Figure 10 is a block diagram showing a preferred embodiment of the invention in which the four highest priority code vectors are formed from the quantizer values used in the 7.2 kbps IMBE™ speech coder.
Figure 11 is a block diagram showing a preferred embodiment of the invention in which the four lowest priority code vectors are formed from the quantizer values used in the 7.2 kbps IMBE™ speech coder.
Figure 12 is a block diagram showing a preferred embodiment of the invention in which the spectral amplitudes used in the IMBE™ speech coder are adaptively smoothed.
Figures 13 and 14 are flow charts showing a preferred embodiment of the invention in which the model parameters used in the IMBE™ speech coder are adaptively smoothed and in which frame repeats and frame mutes are performed if an error measure exceeds a threshold.
Figures 15 and 16 are flow charts showing a preferred embodiment of the invention in which the quantizer values representing a frame in the 7.2 kbps IMBE™ speech coder are encoded into a frame of digital data.
Figures 17 and 18 are flow charts showing a preferred embodiment of the invention in which the received data representing a frame in the 7.2 kbps IMBE™ speech coder is decoded into a set of bit vectors.
Description of Preferred Embodiments of the Invention
The preferred embodiment of the invention is described in the context of the 7.2 kbps IMBE™ speech coder adopted as the land mobile radio standard by APCO/NASTD/Fed Project 25. In the IMBE™ speech coder the speech signal is divided into segments and each segment is used to estimate a set of model parameters which consist of the fundamental frequency, ω0, the V/UV decisions, vk for 1 ≤ k ≤ K, and the spectral amplitudes, Ml for 1 ≤ l ≤ L. A quantizer is then employed to reduce each set of model parameters to a frame of quantizer values, denoted by b0, b1, ..., bL+2. Since the Project 25 speech coder is designed to operate at 7.2 kbps with a 20 ms frame length, only 144 bits are available per frame. Of these 144 bits, 56 are reserved for error control, leaving only 88 bits to be divided over the frame of L + 3 quantizer values. These 88 bits must be used in a sophisticated manner to ensure that the MBE model parameters are transmitted with sufficient fidelity to allow the decoder to synthesize high quality speech. The method of achieving this goal is described in the following sections.
The encoder and decoder both allocate the 88 bits per frame in the manner shown in Table 1. This table highlights the fact that the bit allocation varies from frame to frame depending upon the number of model parameters. As will be described below, the exact bit allocation is fully determined by the six most significant bits of the pitch.


Parameter                    Number of Bits
Fundamental Frequency        8
Voiced/Unvoiced Decisions    K
Spectral Amplitudes          79 - K
Synchronization              1

Table 1: Bit Allocation Among Model Parameters

Table 2: Eight Bit Binary Representation (value / bits)

The fundamental frequency is estimated with one-quarter sample resolution in the interval 2π/123.125 ≤ ω0 ≤ 2π/19.875; however, it is only encoded at one-half sample resolution. This is accomplished by finding the value of b0 which satisfies:

    b0 = ⌊4π/ω0 − 39.5⌋    (2)

The quantizer value b0 is represented with 8 bits using the unsigned binary representation shown in Table 2. This representation is used throughout the IMBE™ speech coder.
The fundamental frequency is decoded and reconstructed at the receiver by using Equation (3) to convert b0 to the received fundamental frequency ω0. In addition b0 is used to calculate K and L, the number of V/UV decisions and the number of spectral amplitudes, respectively. These relationships are given in Equations (4) and (5).

    ω0 = 4π / (b0 + 39.5)    (3)

    L = ⌊.9254 ⌊π/ω0 + .25⌋⌋    (4)

    K = ⌊(L + 2)/3⌋  if L < 36
      = 12           otherwise    (5)

A block diagram of the fundamental frequency encoding and decoding process is shown in Figure 1.
Equation (4) shows that the parameter L is only a function of the six most significant bits (MSBs) of the quantizer value b0. As will be discussed below, the IMBE™ speech encoder and decoder interpret each frame using a variable frame format (i.e. variable bit allocation) which is specified by the parameter L at the encoder and by the decoder's estimate L̂, respectively. Since for proper operation the frame format used by the encoder and decoder must be identical, it is extremely important that L̂ = L. Because of this fact these six bits are considered the highest priority bits in the IMBE™ speech frame.
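The quantization and reconstruction rules in Equations (2) through (5), as reconstructed above, can be sketched as follows (Python and the function names are purely illustrative; they are not part of the patent):

```python
import math

def encode_fundamental(w0):
    # Equation (2): 8 bit quantization of the fundamental frequency,
    # i.e. half-sample resolution in the pitch period.
    return int(math.floor(4.0 * math.pi / w0 - 39.5))

def decode_fundamental(b0):
    # Equation (3): reconstruct the received fundamental frequency.
    w0 = 4.0 * math.pi / (b0 + 39.5)
    # Equation (4): note floor(pi/w0 + 0.25) equals floor(b0/4) + 10,
    # so L depends only on the six most significant bits of b0.
    L = int(math.floor(0.9254 * math.floor(math.pi / w0 + 0.25)))
    # Equation (5): number of V/UV decisions.
    K = (L + 2) // 3 if L < 36 else 12
    return w0, L, K
```

For example, b0 = 0 (the shortest pitch period) yields L = 9 and K = 3, while b0 = 206 yields L = 56 and K = 12, matching the stated range of harmonics per frame.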
The V/UV decisions vk, for 1 ≤ k ≤ K, are binary values which classify each frequency band as either voiced or unvoiced. These values are encoded using

    b1 = Σ_{k=1}^{K} vk 2^(K−k)    (6)

The quantizer value b1 is represented with K bits using an unsigned binary representation analogous to that shown in Table 2.
At the receiver the K bits corresponding to b1 are decoded into the V/UV decisions vl for 1 ≤ l ≤ L. Note that this is a departure from the V/UV convention used by the encoder, which used a single V/UV decision to represent an entire frequency band. Instead the decoder uses a separate V/UV decision for each spectral amplitude. The decoder performs this conversion by using b1 to determine which frequency bands are voiced or unvoiced. The state of vl is then set depending upon whether the frequency ω = l·ω0 is within a voiced or unvoiced frequency band. This can be expressed mathematically as shown in the following two equations.

    K = ⌊(L + 2)/3⌋  if L < 36
      = 12           otherwise    (7)

    vl = ⌊b1 / 2^(K−k)⌋ mod 2,  where k = min(⌈l/3⌉, K),  for 1 ≤ l ≤ L    (8)

Figure 2 shows a block diagram of the V/UV decision encoding and decoding process.
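Equations (6) through (8) can be sketched as below, assuming the three-harmonics-per-band structure implied by Equation (5) (function names are illustrative):

```python
def encode_vuv(v, K):
    # Equation (6): pack the K band decisions (v[0] is band 1, MSB
    # first) into the quantizer value b1.
    return sum(v[k] << (K - 1 - k) for k in range(K))

def decode_vuv(b1, K, L):
    # Equation (8) as reconstructed above: harmonic l falls in band
    # min(ceil(l/3), K) and inherits that band's decision bit.
    vl = []
    for l in range(1, L + 1):
        k = min((l + 2) // 3, K)        # 1-based band index
        vl.append((b1 >> (K - k)) & 1)
    return vl
```

With K = 3 and L = 9, the band pattern voiced/unvoiced/voiced expands to one decision per harmonic: three 1's, three 0's, three 1's.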
The spectral amplitudes Ml, for 1 ≤ l ≤ L, are real values which must be quantized prior to encoding. This is accomplished as shown in Figure 3, by forming the spectral amplitude prediction residuals Tl for 1 ≤ l ≤ L according to Equations (9) through (12). For the purpose of this discussion Ml(0) refers to the unquantized spectral amplitudes of the current frame, while M̃l(−1) refers to the quantized spectral amplitudes of the previous frame. Similarly L(0) refers to the number of harmonics in the current frame, while L(−1) refers to the number of harmonics in the previous frame.

    kl = (L(−1)/L(0)) · l    (9)

    δl = kl − ⌊kl⌋    (10)

    Tl = log2 Ml(0) − ρ(1 − δl) log2 M̃⌊kl⌋(−1) − ρ δl log2 M̃⌊kl⌋+1(−1)
         + (ρ/L(0)) Σ_{λ=1}^{L(0)} [ (1 − δλ) log2 M̃⌊kλ⌋(−1) + δλ log2 M̃⌊kλ⌋+1(−1) ]    (11)

The prediction coefficient, ρ, is adjusted each frame according to the following rule:

    ρ = .4                 if L(0) ≤ 15
      = .03 L(0) − .05     if 15 < L(0) ≤ 24    (12)
      = .7                 otherwise

In order to form Tl using Equations (9) through (12), the following assumptions are made:

    M̃0(−1) = 1.0    (13)

    M̃l(−1) = M̃L(−1)(−1)  for l > L(−1)    (14)

Also upon initialization M̃l(−1) should be set equal to 1.0 for all l, and L(−1) = 30.
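Under the reconstructions above, Equations (9) through (14) can be sketched as follows (the list-based layout and the helper name are illustrative assumptions):

```python
import math

def prediction_residuals(M_cur, M_prev, rho):
    # Equations (9)-(11): each current log2 amplitude is predicted from
    # a linearly interpolated, mean-removed version of the previous
    # frame's quantized log2 amplitudes.  Equations (13)-(14) extend
    # the previous frame below index 1 and above index L(-1).
    L0, L1 = len(M_cur), len(M_prev)

    def log2_prev(idx):
        if idx <= 0:
            return 0.0                              # M0(-1) = 1.0
        return math.log2(M_prev[min(idx, L1) - 1])  # clamp above L(-1)

    pred = []
    for l in range(1, L0 + 1):
        k = (L1 / L0) * l                 # Equation (9)
        f = math.floor(k)
        d = k - f                         # Equation (10)
        pred.append((1.0 - d) * log2_prev(f) + d * log2_prev(f + 1))
    mean_pred = sum(pred) / L0
    # Equation (11): residual = log2 amplitude minus the prediction,
    # with the frame-average prediction added back.
    return [math.log2(M_cur[i]) - rho * pred[i] + rho * mean_pred
            for i in range(L0)]
```

As a sanity check, when every previous-frame amplitude equals 1.0 (log2 = 0) the residuals reduce to the raw log2 amplitudes, which is exactly the initialization condition above.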
The L prediction residuals are then divided into 6 blocks. The length of each block, denoted Ji for 1 ≤ i ≤ 6, is adjusted such that the following constraints are satisfied.

    Σ_{i=1}^{6} Ji = L    (15)

    ⌊L/6⌋ ≤ Ji ≤ Ji+1 ≤ ⌈L/6⌉  for 1 ≤ i ≤ 5    (16)
The first or lowest frequency block is denoted by c1,j for 1 ≤ j ≤ J1, and it consists of the first J1 consecutive elements of Tl (i.e. 1 ≤ l ≤ J1). The second block is denoted by c2,j for 1 ≤ j ≤ J2, and it consists of the next J2 consecutive elements of Tl (i.e. J1 + 1 ≤ l ≤ J1 + J2). This continues through the sixth or highest frequency block, which is denoted by c6,j for 1 ≤ j ≤ J6. It consists of the last J6 consecutive elements of Tl (i.e. L + 1 − J6 ≤ l ≤ L). An example of this process is shown in Figure 4 for L = 34.
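One way to satisfy the constraints of Equations (15) and (16) — an illustrative construction, not a procedure stated in the patent — is to give the first blocks length ⌊L/6⌋ and the remaining blocks length ⌈L/6⌉:

```python
def block_lengths(L):
    # Equations (15)-(16): six non-decreasing lengths, each equal to
    # floor(L/6) or ceil(L/6), summing to L.
    low, rem = divmod(L, 6)
    return [low] * (6 - rem) + [low + 1] * rem

def split_blocks(T):
    # Partition the L prediction residuals T[0..L-1] into six blocks.
    blocks, pos = [], 0
    for J in block_lengths(len(T)):
        blocks.append(T[pos:pos + J])
        pos += J
    return blocks
```

For L = 34 this produces block lengths 5, 5, 6, 6, 6, 6, consistent with the Figure 4 example.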
Each of the six blocks is transformed using a Discrete Cosine Transform (DCT). The length of the DCT for the i'th block is equal to Ji. The DCT coefficients are denoted by Ci,k, where 1 ≤ i ≤ 6 refers to the block number, and 1 ≤ k ≤ Ji refers to the particular coefficient within each block. The formula for the computation of these DCT coefficients is as follows:

    Ci,k = (1/Ji) Σ_{j=1}^{Ji} ci,j cos( π(k−1)(j−.5)/Ji )  for 1 ≤ k ≤ Ji    (17)

The DCT coefficients from each of the six blocks are then divided into two groups. The first group consists of the first DCT coefficient from each of the six blocks. These coefficients are used to form a six element vector, Ri for 1 ≤ i ≤ 6, where Ri = Ci,1. The vector Ri is referred to as the gain vector,
The second group consists of the remaining higher order DCT coefficients.
These coefficients correspond to Ci,j, where 1 ≤ i ≤ 6 and 2 ≤ j ≤ Ji. Note that if Ji = 1, then there are no higher order DCT coefficients in the i'th block.
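The transform of Equation (17) and the gain vector construction can be sketched as below (0-based indices replace the 1-based k and j of the equation; function names are illustrative):

```python
import math

def dct(c):
    # Equation (17): length-J DCT of one block.  The 1/J scaling makes
    # the first coefficient C_{i,1} equal to the mean of the block.
    J = len(c)
    return [sum(c[j] * math.cos(math.pi * k * (j + 0.5) / J)
                for j in range(J)) / J
            for k in range(J)]

def gain_vector(blocks):
    # R_i = C_{i,1}: the first DCT coefficient of each of the 6 blocks.
    return [dct(b)[0] for b in blocks]
```

Because C_{i,1} is the block mean of the log2 prediction residuals, the gain vector is indeed a coarse, six-band summary of the spectral envelope, as the following text describes.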
One important feature of the spectral amplitude encoding method is that the spectral amplitude information is transmitted differentially. Specifically, a prediction residual is transmitted which measures the change in the spectral envelope between the current frame and the previous frame. In order for a differential scheme of this type to work properly, the encoder must simulate the operation of the decoder and use the reconstructed spectral amplitudes from the previous frame to predict the spectral amplitudes of the current frame. The IMBE™ spectral amplitude encoder simulates the spectral amplitude decoder by setting L̂ = L and then reconstructing the spectral amplitudes as discussed above. This is shown as the feedback path in Figure 3.
The gain vector can be viewed as a coarse representation of the spectral envelope of the current segment of speech. The quantization of the gain vector begins with a six point DCT of Ri for 1 ≤ i ≤ 6 as shown in the following equation.

    Gm = (1/6) Σ_{i=1}^{6} Ri cos( π(m−1)(i−.5)/6 )  for 1 ≤ m ≤ 6    (18)

The resulting vector, denoted by Gm for 1 ≤ m ≤ 6, is quantized in two parts.
The first element, G1, can be viewed as representing the overall gain or level of the speech segment. This element is quantized using the 6 bit non-uniform quantizer given in Appendix E of the APCO/NASTD/Fed Project 25 Vocoder Description. The 6 bit value b2 is defined as the index of the quantizer value (as shown in this Appendix) which is closest, in a mean-square error sense, to G1. The remaining five elements of Gm are quantized using uniform scalar quantizers where the five quantizer values b3 through b7 are computed from the vector elements as shown in Equation (19).

    bm = 0                             if ⌊Gm−1/Δm⌋ < −2^(Bm−1)
       = 2^Bm − 1                      if ⌊Gm−1/Δm⌋ > 2^(Bm−1) − 1    (19)
       = ⌊Gm−1/Δm⌋ + 2^(Bm−1)          otherwise
    for 3 ≤ m ≤ 7

The parameters Bm and Δm in Equation (19) are the number of bits and the step sizes used to quantize each element. These values are dependent upon L, which is the number of harmonics in the current frame. This dependence is tabulated in Appendix F of the APCO/NASTD/Fed Project 25 Vocoder Description. Since L is known by the encoder, the correct values of Bm and Δm are first obtained using this Appendix and then the quantizer values bm for 3 ≤ m ≤ 7 are computed using Equation (19). The final step is to convert each quantizer value into an unsigned binary representation using the same method as shown in Table 2.
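The uniform quantizer of Equation (19) and its mid-point inverse, Equation (25) below, can be sketched generically as follows (the clamping boundaries follow the reconstruction above and are an assumption; function names are illustrative):

```python
def quantize_uniform(x, B, step):
    # Equations (19)/(20): B bit uniform quantizer with step size
    # 'step'; the index is clamped to the representable range.
    q = int(x // step)                          # floor(x / step)
    q = max(-(1 << (B - 1)), min(q, (1 << (B - 1)) - 1))
    return q + (1 << (B - 1))                   # unsigned offset binary

def dequantize_uniform(b, B, step):
    # Equations (25)/(28): mid-point reconstruction; when 0 bits are
    # allocated the element decodes to 0.
    if B == 0:
        return 0.0
    return step * (b - (1 << (B - 1)) + 0.5)
```

Within the unclamped range the round-trip error is at most half a step, which is the usual figure of merit for a mid-rise uniform quantizer.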
Once the gain vector has been quantized, the remaining bits are used to encode the L − 6 higher order DCT coefficients which complete the representation of the spectral amplitudes. Appendix G of the APCO/NASTD/Fed Project 25 Vocoder Description shows the bit allocation as a function of L for these coefficients. For each value of L the L − 6 entries, labeled b8 through bL+1, provide the bit allocation for the higher order DCT coefficients. The adopted convention is that [b8, b9, ..., bL+1] correspond to [C1,2, C1,3, ..., C1,J1, C2,2, ..., C6,2, C6,3, ..., C6,J6], respectively.
Once the bit allocation for the higher order DCT coefficients has been obtained, these coefficients are quantized using uniform quantization. The step size used to quantize each coefficient must be computed from the bit allocation and the standard deviation of the DCT coefficients using Tables 3 and 4. For example, if 4 bits are allocated for a particular coefficient, then from Table 3 the step size, Δ, equals .40σ. If this was the third DCT coefficient from any block (i.e. Ci,3), then σ = .241 as shown in Table 4. Performing this multiplication gives a step size of .0964. Once the bit allocation and the step sizes for the higher order DCT coefficients have been determined, the quantizer values bm for 8 ≤ m ≤ L + 1 are computed according to Equation (20).

Number of Bits    Step Size
1                 1.2σ
2                 .85σ
3                 .65σ
4                 .40σ
5                 .28σ
6                 .15σ
7                 .08σ
8                 .04σ
9                 .02σ
10                .01σ

Table 3: Uniform Quantizer Step Size for Higher Order DCT Coefficients
    bm = 0                             if ⌊Ci,k/Δm⌋ < −2^(Bm−1)
       = 2^Bm − 1                      if ⌊Ci,k/Δm⌋ > 2^(Bm−1) − 1    (20)
       = ⌊Ci,k/Δm⌋ + 2^(Bm−1)          otherwise
    for 8 ≤ m ≤ L + 1

The parameters bm, Bm and Δm in Equation (20) refer to the quantizer value, the number of bits and the step size which has been computed for Ci,k, respectively. Note that the relationship between m, i, and k in Equation (20) is known and can be expressed as:

    m = 6 + k + Σ_{n=1}^{i−1} (Jn − 1)    (21)

Finally, each quantizer value is converted into the appropriate unsigned binary representation which is analogous to that shown in Table 2.
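The index mapping of Equation (21), as reconstructed above with a (Jn − 1) term per preceding block (each block contributes Jn − 1 higher order coefficients), can be checked mechanically:

```python
def coeff_index(i, k, J):
    # Equation (21)/(29): block i (1-based), coefficient k >= 2, and
    # block lengths J[0..5] determine the quantizer value index m.
    return 6 + k + sum(J[n] - 1 for n in range(i - 1))

def all_indices(J):
    # Enumerate every higher order coefficient in block/coefficient
    # scan order: (1,2), (1,3), ..., (6,J6).
    return [coeff_index(i, k, J)
            for i in range(1, 7)
            for k in range(2, J[i - 1] + 1)]
```

With this form the L − 6 coefficient indices cover exactly m = 8 through m = L + 1 with no gaps, which is what the surrounding text requires.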
In order for the decoder to reconstruct the spectral amplitudes the parameter L̂ must first be computed from b0 using Equations (3) and (4).

DCT Coefficient    Standard Deviation
Ci,2              .307
Ci,3              .241
Ci,4              .207
Ci,5              .190
Ci,6              .190
Ci,7              .179
Ci,8              .173
Ci,9              .163
Ci,10             .170

Table 4: Standard Deviation of Higher Order DCT Coefficients

Then the spectral amplitudes can be decoded and reconstructed by inverting the quantization and encoding procedure described above. A block diagram of the spectral amplitude decoder is shown in Figure 6.
The first step in the spectral amplitude reconstruction process is to divide the spectral amplitudes into six blocks. The length of each block, Ji for 1 ≤ i ≤ 6, is adjusted to meet the following constraints.

    Σ_{i=1}^{6} Ji = L̂    (22)

    ⌊L̂/6⌋ ≤ Ji ≤ Ji+1 ≤ ⌈L̂/6⌉  for 1 ≤ i ≤ 5    (23)
The elements of these blocks are denoted by Ci,k, where 1 ≤ i ≤ 6 denotes the block number and where 1 ≤ k ≤ Ji denotes the element within that block. The first element of each block is then set equal to the decoded gain vector Ri via Equation (24).

    Ci,1 = Ri  for 1 ≤ i ≤ 6    (24)

The remaining elements of each block correspond to the decoded higher order DCT coefficients.
The gain is decoded in two parts. First the six bit quantizer value b2 is used to reconstruct the first element of the transformed gain vector, denoted by G1. This is done by using the 6 bit value b2 as an index into the quantizer values listed in Appendix E of the APCO/NASTD/Fed Project 25 Vocoder Description. Next the five quantizer values b3 through b7 are used to reconstruct the remaining five elements of the transformed gain vector, denoted by G2 through G6. This is done by using L̂, the number of harmonics in the current frame, in combination with the table in this Appendix to establish the bit allocation and step size for each of these five elements. The relationship between the received quantizer values and the transformed gain vector elements is expressed in Equation (25),

    Gm−1 = 0                           if Bm = 0
         = Δm (bm − 2^(Bm−1) + .5)     otherwise    (25)
    for 3 ≤ m ≤ 7

where Δm and Bm are the step sizes and the number of bits found via Appendix F of the APCO/NASTD/Fed Project 25 Vocoder Description.

Once the transformed gain vector has been reconstructed in this manner, the gain vector Ri for 1 ≤ i ≤ 6 must be computed through an inverse DCT of Gm as shown in the following equations.

    Ri = Σ_{m=1}^{6} α(m) Gm cos( π(m−1)(i−.5)/6 )  for 1 ≤ i ≤ 6    (26)

    α(m) = 1  if m = 1
         = 2  otherwise    (27)

The higher order DCT coefficients, which are denoted by Ci,k for 1 ≤ i ≤ 6 and 2 ≤ k ≤ Ji, are reconstructed from the quantizer values b8, b9, ..., bL+1. First the bit allocation table listed in Appendix G of the APCO/NASTD/Fed Project 25 Vocoder Description is used to determine the appropriate bit allocation. The adopted convention is that [b8, b9, ..., bL+1] correspond to [C1,2, C1,3, ..., C1,J1, C2,2, ..., C6,2, C6,3, ..., C6,J6], respectively. Once the bit allocation has been determined the step sizes for each Ci,k are computed using Tables 3 and 4. The determination of the bit allocation and the step sizes for the decoder proceeds in the same manner as in the encoder. Using the notation Bm and Δm to denote the number of bits and the step size, respectively, each higher order DCT coefficient can be reconstructed according to the following formula,

    Ci,k = 0                           if Bm = 0
         = Δm (bm − 2^(Bm−1) + .5)     otherwise    (28)
    for 8 ≤ m ≤ L + 1

where, as in Equation (21), the following equation can be used to relate m, i, and k.
is computed on each of the six blocla to form the vectors c;.j. l his is done using the following equations for 1 < i < f .
~t,J = ~ a(k)Cf.k cos(~(k 1 )(J - z )~ for 1 < j < J; (30) .1:
1 ifk--.1 a(k) _ (31 ) otherwise The six transformed blocla c;,~ are then joined to form a single vector of length L, which is denoted Ti for 1 < I < L. The vector T~ corresponds to the reconstructed spectral amplitude prediction residuals. The adopted convention VVO 94112932 ; PCT/LTS93/11609 ~1~9039 is that the first J~ elements of Tr are equal to ~~,~ for 1 < j < J,. The neat ,l-~ M
elements of Tr are equal to c~,; for 1 < j < J2. This continues until the last J,;
elements of T, are equal to c~,~ for 1 < j < Js. Finally. the reconstructed log., spectral amplitudes for the current frame are computed using the following equations.
kr. = LL(0)) . ! (3~) br = kr - lkr~ ( 33 ) loge ~1%Ir(0) = Tr + p ( 1 - br) logz ~yl~k~~ (-l ) to + p br log ~l~lik~J+I(-1 ) 1.(o) 1 - b a to ., ~tl ( -1 ) -1- b.~ log2 VI k ( -~3'~ ) - ~ ( . ) g_ lkaJ 1 aJ+~
L(0) w, In order to reconstruct Mr(0) using equations (32) through (34), the following assumptions are always made:
:5 l~la(-1) -- 1.0 ~ (35) il~lr(-1 ) - :11~(_~)(-1 ) for ! > L(-1 ) (361 In addition it is assumed that upon initialization :blr(-1 ) = 1 for all 1.
and L(-1 ) = 30. Note that later sections of the IMBET~~ decoder require the 20 Spectral amplitudes, Mr for 1 < 1 < L, which must be computed by applying the inverse loge to each of the values computed with Equation (34).
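The forward transform of Equation (17) and the inverse of Equations (30)-(31) can be checked to be exact inverses of one another (0-based indices and function names are illustrative):

```python
import math

def dct_fwd(c):
    # Equation (17), repeated here for the round-trip check: note the
    # 1/J scaling on the forward transform.
    J = len(c)
    return [sum(c[j] * math.cos(math.pi * k * (j + 0.5) / J)
                for j in range(J)) / J
            for k in range(J)]

def idct(C):
    # Equations (30)-(31): alpha(1) = 1 and alpha(k) = 2 for k > 1
    # compensate for the forward transform's 1/J scaling.
    J = len(C)
    return [C[0] + 2.0 * sum(C[k] * math.cos(math.pi * k * (j + 0.5) / J)
                             for k in range(1, J))
            for j in range(J)]
```

The α(k) weights follow from the cosine orthogonality relations: the k = 0 basis vector has energy J while the others have energy J/2, so the inverse must weight the higher coefficients by 2.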
One final note is that it should be clear that the IMBE™ speech coder uses a variable frame format (i.e. variable bit allocation) which is dependent upon the number of harmonics in each frame. At the encoder the value L is used to determine the bit allocation and quantizer step sizes, while at the decoder the value L̂ is used to determine the bit allocation and quantizer step sizes. In order to ensure proper operation it is necessary that these two values be equal (i.e. L = L̂). The encoder and decoder are designed to ensure this property except in the presence of a very large number of bit errors. In addition the use of bit modulation allows the decoder to detect frames where a large number of bit errors may prevent the generation of the correct bit allocation and quantizer step sizes. In this case the decoder discards the bits for the current frame and repeats the parameters from the previous frame. This is discussed in more detail in later sections of this document.
A final one bit quantizer value is reserved in each speech frame for synchronization. This quantizer value, denoted by bL+2, is set to an alternating sequence by the encoder. If this bit was set to 0 during the previous speech frame, then this bit should be set to a 1 for the current speech frame. Otherwise, if this bit was set to 1 during the previous speech frame, then this bit should be set to a 0 for the current speech frame. This is expressed in the following equation, where bL+2(0) refers to the value for the current frame, while bL+2(−1) refers to the value for the previous frame.

    bL+2(0) = 0  if bL+2(−1) = 1
            = 1  otherwise    (37)

It is assumed that bL+2(0) should be set equal to 0 during the first frame following initialization.
The decoder may use this bit to establish synchronization. As presented later in this description, this bit is not error control encoded or modulated, and it is placed at a fixed offset relative to the beginning of each 144 bit frame of speech data. The decoder may check each possible offset in the received data stream and establish which offset is most likely to correspond to the synchronization bit. The beginning of each speech frame can then be established using the known distance between the beginning of each speech frame and the synchronization bit. Note that the number of received speech frames which is used to establish synchronization can be modified to trade off the probability of false synchronization, the synchronization delay, and the ability to acquire synchronization in the presence of bit errors. Also note that other synchronization fields may be provided outside the IMBE™ speech coder which may eliminate the need to use bL+2 for synchronization.
The IMBE™ encoder combines the quantization methods described above with a sequence of bit manipulations to increase the system's robustness to channel degradations (i.e. bit errors). Each frame of quantizer values, denoted by b0, ..., bL+2, is first prioritized into a set of bit vectors, denoted by u0, ..., u7, according to each bit's sensitivity to bit errors. The result is that bit errors introduced into the highest priority bit vector, u0, cause large distortions in the decoded speech. Conversely, bit errors added to the lowest priority bit vector, u7, cause small distortions in the decoded speech. The bit vectors are then protected with error control codes, including both (23,12) Golay codes and (15,11) Hamming codes, to produce a set of code vectors denoted by v0, ..., v7. The use of bit prioritization increases the effectiveness of the error control codes, since only the most sensitive bits are protected by the high redundancy Golay codes.
The IMBE™ encoder also utilizes bit modulation to further increase the system's robustness to bit errors. One of the bit vectors is used to generate a modulation key which is used to initialize a pseudo-random sequence. This sequence is converted into a set of binary modulation vectors which are added modulo 2 to the code vectors (i.e. after error control encoding). The result is a set of modulated code vectors denoted by c0, ..., c7. Finally, intra-frame bit interleaving is used on the modulated code vectors in order to spread the effect of short burst errors. A block diagram of the bit manipulations performed by the encoder is shown in Figure 7.
The IMBE™ decoder reverses the bit manipulations performed by the encoder. First the decoder de-interleaves each frame of 144 bits to obtain the eight code vectors c0, ..., c7. The highest priority code vector is then error control decoded and used to generate a demodulation key. The demodulation key is then used to initialize a pseudo-random sequence which is converted into a set of binary modulation vectors. These are added modulo 2 to the remaining code vectors to produce a set of demodulated code vectors, denoted by v0, ..., v7, which the decoder then error control decodes to reconstruct the bit vectors u0, ..., u7. Finally, the decoder rearranges these bit vectors to reconstruct the quantizer values, denoted by b0, b1, ..., bL+2, which are then used to reconstruct a frame of MBE model parameters. Each frame of model parameters can then be used by an IMBE™ speech synthesizer to synthesize a time segment of speech. A block diagram of the bit manipulations performed by the decoder is shown in Figure 8.
One should note that the IMBE™ speech decoder employs a number of different mechanisms to improve performance in the presence of bit errors. These mechanisms consist first of error control codes, which are able to remove a significant number of errors. In addition, the IMBE™ speech coder uses bit modulation combined with frame repeats and frame mutes to detect and discard highly corrupted frames. Finally, the IMBE™ speech decoder uses adaptive smoothing to reduce the perceived effect of any remaining errors. These mechanisms are all discussed in the following sections of this description.
The first bit manipulation performed by the IMBE™ encoder is a prioritization of the quantizer values b0, b1, ..., bL+2 into a set of 8 bit vectors denoted by u0, u1, ..., u7. The bit vectors u0 through u3 are 12 bits long, while the bit vectors u4 through u6 are 11 bits long, and the bit vector u7 is seven bits long. Throughout this section the convention has been adopted that bit N, where N is the vector length, is the most significant bit (MSB), and bit 1 is the least significant bit (LSB).
The prioritization of the quantizer values into the set of bit vectors begins with u0. The six most significant bits of u0 (i.e. bits 12 through 7) are set equal to the six most significant bits of b0 (i.e. bits 8 through 3). The next three most significant bits of u0 (i.e. bits 6 through 4) are set equal to the three most significant bits of b2 (i.e. bits 6 through 4). The remaining three bits of u0 are generated from the spectral amplitude quantizer values b3 through bL+1. Specifically these quantizer values are arranged as shown in Figure 9. In this figure the shaded areas represent the number of bits which were allocated to each of these values assuming L = 16. Note that for other values of L this figure would change in accordance with the bit allocation information contained in Appendices F and G of the APCO/NASTD/Fed Project 25 Vocoder Description. The remaining three bits of u0 are then selected by beginning in the upper left hand corner of this figure (i.e. bit 10 of b3) and scanning left to right. When the end of any row is reached the scanning proceeds from left to right on the next lower row. Bit 3 of u0 is set equal to the bit corresponding to the first shaded block which is encountered using the prescribed scanning order. Similarly, bit 2 of u0 is set equal to the bit corresponding to the second shaded block which is encountered and bit 1 of u0 is set equal to the bit corresponding to the third shaded block which is encountered.
The scanning of the spectral amplitude quantizer values b3 through bL+1, which is used to generate the last three bits of u0, is continued for the bit vectors u1 through u3. Each successive bit in these vectors is set equal to the bit corresponding to the next shaded block. This process begins with bit 12 of u1, proceeds through bit 1 of u1 followed by bit 12 of u2, and continues in this manner until finally reaching bit 1 of u3. At this point the 48 highest priority (i.e. most sensitive) bits have been assigned to the bit vectors u0 through u3 as shown in Figure 10.
The formation of the bit vectors u4 through u7 begins with the V/UV decision bits. This is accomplished by inserting into the bit vectors (beginning with bit 11 of u4, proceeding through bit 1 of u4 followed by bit 11 of u5, and continuing in this manner until finally reaching bit 5 of u7) all of the bits of b1 (starting with the MSB), followed by bit 3 and then bit 2 of b2, and then continuing with the scanning of b3 through bL+1 as described above. The final four bits of u7 (beginning with bit 4 and ending with bit 1) are set equal to bit 1 of b2, bit 2 of b0, bit 1 of b0, and then bit 1 of bL+2, respectively. A block diagram of this procedure is shown in Figure 11 for K = 6.
The formation of the bit vectors described above prioritizes the bits according to their sensitivity to bit errors. A bit error introduced into u0 generally causes the largest degradation in speech quality, while a bit error introduced into u7 generally causes little degradation in speech quality. Consequently the 56 bits per frame available for error control are used to protect the first four bit vectors with (23,12) Golay codes, while the next three bit vectors are protected with (15,11) Hamming codes. The last bit vector is left unprotected. This approach is efficient, since it only uses the more redundant (and hence more robust) Golay codes where they are most needed, while using less redundant (and hence less robust) codes in other areas.
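The frame budget implied by these vector lengths and codes can be checked arithmetically (the variable names are illustrative):

```python
# Lengths of the prioritized bit vectors u0..u7, and of the code
# vectors v0..v7 after (23,12) Golay / (15,11) Hamming encoding.
u_lengths = [12, 12, 12, 12, 11, 11, 11, 7]
c_lengths = [23, 23, 23, 23, 15, 15, 15, 7]

source_bits = sum(u_lengths)            # quantizer-value bits per frame
coded_bits = sum(c_lengths)             # transmitted bits per frame
parity_bits = coded_bits - source_bits  # error control overhead
```

These totals reproduce the figures quoted earlier: 88 source bits, a 144 bit frame, and 56 bits of error control per frame.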
The bit prioritization described above can be viewed as assigning a weight to each allocated bit location in the set of quantizer values. Within any one quantizer value the weight is greater for a more significant bit location than for a less significant bit location. In addition the relative weight of a bit location of the same significance in different quantizer values is dependent upon each quantizer value's sensitivity to bit errors (i.e. the perceived degradation that results after speech is synthesized with bit errors in a particular quantizer value). Once weights have been assigned to each bit location, the construction of the bit vectors is performed by ordering each bit according to the weight of its bit location and then partitioning the ordered bit sequence into bit vectors of the appropriate length.
Once the eight bit vectors have been formed, they are each converted into a corresponding code vector. The generation of the eight code vectors vi for 0 ≤ i ≤ 7 is performed according to the following set of equations,

    vi = ui · PG  for 0 ≤ i ≤ 3    (38)

    vi = ui · PH  for 4 ≤ i ≤ 6    (39)

    v7 = u7    (40)

where PG and PH are the parity matrices for the (23,12) Golay code and the (15,11) Hamming code, respectively. These are shown below, where absent entries are assumed to equal zero. Note that all operations are modulo 2, as defined in the references referred to herein, and the vectors vi and ui are assumed to be row vectors, where the "left" most bit is the MSB. This convention is used throughout this section.
    PG : 12 × 23 binary parity matrix of the systematic (23,12) Golay code (matrix entries omitted)

    PH : 11 × 15 binary parity matrix of the systematic (15,11) Hamming code (matrix entries omitted)
the standard (23,12 Golay code. Standard methods of decoding this code and the Hamming code are discussed in the literature. These methods are used by the IMBETM decoder to correct the maximum number of errors for each code.
The IMBE(TM) speech coder uses bit modulation to provide a mechanism for detecting errors in v_0 beyond the three errors that the (23,12) Golay code can correct. The first step in this procedure is to generate a set of binary modulation vectors which are added (modulo 2) to the code vectors v_0 through v_7. The modulation vectors are generated from a pseudo-random sequence which is initialized to the value of a modulation key which is generated from the bit vector u_0. Specifically, the sequence defined in the following equations is used.
    p_r(0) = 16 . u_0                                                                  (41)
    p_r(n) = 173 p_r(n-1) + 13849 - 65536 ⌊(173 p_r(n-1) + 13849) / 65536⌋             (42)

where the bit vector u_0 is interpreted as an unsigned 12 bit number in the range [0, 4095]. Equation (42) is used to recursively compute the pseudo-random sequence p_r(n) over the range 1 <= n <= 114. Each element of this sequence can be interpreted as a 16 bit random number which is uniformly distributed over the interval [0, 65535]. Using this interpretation, a set of binary modulation vectors, denoted by m_0 through m_7, are generated from this sequence as shown below.

    m_0 = [0, 0, ..., 0]                                                          (43)
    m_1 = [⌊p_r(1)/32768⌋, ⌊p_r(2)/32768⌋, ..., ⌊p_r(23)/32768⌋]                   (44)
    m_2 = [⌊p_r(24)/32768⌋, ⌊p_r(25)/32768⌋, ..., ⌊p_r(46)/32768⌋]                 (45)
    m_3 = [⌊p_r(47)/32768⌋, ⌊p_r(48)/32768⌋, ..., ⌊p_r(69)/32768⌋]                 (46)
    m_4 = [⌊p_r(70)/32768⌋, ⌊p_r(71)/32768⌋, ..., ⌊p_r(84)/32768⌋]                 (47)
    m_5 = [⌊p_r(85)/32768⌋, ⌊p_r(86)/32768⌋, ..., ⌊p_r(99)/32768⌋]                 (48)
    m_6 = [⌊p_r(100)/32768⌋, ⌊p_r(101)/32768⌋, ..., ⌊p_r(114)/32768⌋]              (49)
    m_7 = [0, 0, ..., 0]                                                          (50)

Once these modulation vectors have been computed in this manner, the modulated code vectors, c_i for 0 <= i <= 7, are computed by adding (modulo 2) the code vectors to the modulation vectors.

    c_i = v_i + m_i    for 0 <= i <= 7    (51)

One should note that the bit modulation performed by the IMBE(TM) encoder can be inverted by the decoder if c_0 does not contain any uncorrectable bit errors. In this case Golay decoding c_0, which always equals v_0 since m_0 = 0, will yield the correct value of u_0. The decoder can then use u_0 to reconstruct the pseudo-random sequence and the modulation vectors m_1 through m_7. Subtracting these vectors from c_1 through c_7 will then yield the code vectors v_1 through v_7. At this point the remaining error control decoding can be performed. In the other case, where c_0 contains uncorrectable bit errors, the modulation cannot generally be inverted by the decoder. In this case the likely result of Golay decoding c_0 will be some value which does not equal u_0. Consequently the decoder will initialize the pseudo-random sequence incorrectly, and the modulation vectors computed by the decoder will be uncorrelated with the modulation vectors used by the encoder. Using these incorrect modulation vectors to reconstruct the code vectors is essentially the same as passing v_1, ..., v_6 through a 50 percent bit error rate (BER) channel. The IMBE(TM) decoder exploits the fact that, statistically, a 50 percent BER causes the Golay and Hamming codes employed on v_1 through v_6 to correct a number of errors which is near the maximum capability of the code. By counting the total number of errors which are corrected in all of these code vectors, the decoder is able to reliably detect frames in which c_0 is likely to contain uncorrectable bit errors. The decoder performs frame repeats during these frames in order to reduce the perceived degradation in the presence of bit errors. Experimental results have shown that frame repeats are preferable to using an incorrectly decoded c_0, since this code vector controls the bit allocation for the parameter quantizers.
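The pseudo-random sequence, modulation vectors, and modulation step described above can be sketched in Python as follows. The recursion constants come from equations (41)-(42); the vector lengths (four 23 bit Golay vectors, three 15 bit Hamming vectors, one 7 bit unprotected vector) are assumptions based on the 144 bit frame described in this section.

```python
def pseudo_random(u0, n_max=114):
    # Equations (41)-(42): seed with 16*u0, then iterate the linear
    # congruential recursion modulo 65536.
    pr = [16 * u0]
    for _ in range(n_max):
        pr.append((173 * pr[-1] + 13849) % 65536)
    return pr

def modulation_vectors(u0):
    # Equations (43)-(50): each modulation bit is the MSB of a 16 bit
    # pseudo-random value, i.e. floor(pr(n) / 32768).  Vector lengths
    # assume the 144 bit frame layout described in this section.
    bits = [p // 32768 for p in pseudo_random(u0)]
    m, idx = [[0] * 23], 1                    # m_0 is all zeros
    for length in (23, 23, 23, 15, 15, 15):   # m_1 ... m_6
        m.append(bits[idx:idx + length])
        idx += length
    m.append([0] * 7)                         # m_7 is unmodulated
    return m

def modulate(v, m):
    # Equation (51): c_i = v_i + m_i (modulo 2); XOR is its own inverse,
    # so the decoder demodulates with the same operation.
    return [[a ^ b for a, b in zip(vi, mi)] for vi, mi in zip(v, m)]
```

Because modulo-2 addition is self-inverting, a decoder that recovers u_0 from c_0 can regenerate exactly these vectors and strip the modulation, as described above.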
Hence, the use of random bit modulation by the encoder allows the decoder to reliably detect whether there are any uncorrectable bit errors in c_0 without requiring further redundancy to be placed into the data stream. This allows efficient use of the communication channel while preventing large degradations from being introduced into the synthesized speech.
Intra-frame bit interleaving is used to spread short bursts of errors among several code words. This decreases the probability that a short burst of errors will result in an uncorrectable error pattern. The minimum separation between any two bits of the same error correction code is 6 bits. The exact order of the 144 bits in each frame is tabulated in Appendix H of the APCO/NASTD/Fed Project 25 Vocoder Description dated 1 December 1992. The table in this appendix uses the same notation as was discussed above, i.e. bit N (where N is the vector length) is the MSB and bit 1 is the LSB. The speech coder bits should be inserted into the Project 25 frame format beginning with the first bit, t_1, and ending with the last bit, t_144.
The IMBE(TM) speech decoder estimates the number of errors in each received data frame by computing the number of errors corrected by each of the (23,12) Golay codes and (15,11) Hamming codes. The number of errors for each code vector is denoted e_i for 0 <= i <= 6, where e_i refers to the number of bit errors which were detected during the error control decoding of v_i. These seven bit error parameters can easily be determined by re-encoding each error control decoded bit vector and counting the bit positions in which the result differs from the received code vector, where all arithmetic operations are modulo 2.

    v̂_i = û_i . P_G    for 0 <= i <= 3
    v̂_i = û_i . P_H    for 4 <= i <= 6    (52)

From these error values two other error parameters are computed as shown below.

    e_T = e_0 + e_1 + ... + e_6                (53)
    e_R(0) = .95 e_R(-1) + .000356 e_T         (54)

The parameter e_R(0) is the estimate of the error rate for the current frame, while e_R(-1) is the estimate of the error rate for the previous frame. These error parameters are used to control the frame repeat process described below, and to control the adaptive smoothing functions also described below. Both of these functions are designed to improve the perceived quality of the decoded speech, given that the error control decoding is not always able to correct all of the bit errors introduced by a severely degraded channel.
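A minimal sketch of the error parameter computation of equations (53)-(54):

```python
def error_parameters(errors, e_r_prev):
    # errors: the seven per-code-vector counts e_0 ... e_6.
    # e_T is their sum (53); e_R(0) is the recursively smoothed
    # error-rate estimate (54), carried over from frame to frame.
    e_t = sum(errors)
    e_r = 0.95 * e_r_prev + 0.000356 * e_t
    return e_t, e_r
```

The leaky recursion means a single noisy frame raises e_R only slightly, while a sustained run of errored frames drives it toward a high steady-state value, which is what the repeat and muting logic below keys on.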
The IMBE(TM) decoder examines each received data frame in order to detect and discard frames which are highly corrupted. A number of different fault conditions are checked, and if any of these conditions indicate the current frame is invalid, then a frame repeat is performed. The IMBE(TM) speech encoder uses values of b_0 in the range 0 <= b_0 <= 207 to represent valid pitch estimates. In addition, values of b_0 in the range 216 <= b_0 <= 219 are used by the encoder to represent silence frames. The remaining values of b_0 are reserved for future expansion (such as DTMF signals, call progress signals, enhanced speech coders, inband data, etc...) and are currently considered invalid. A frame repeat is performed by the decoder if it receives an invalid value of b_0,

~f~
or if both of the following two equations are true.
s :~ ~,~,~) ;
to 11 f:~6i S Thes° two equations are used to detect the incorrect demodulation which re-sults if there are uncorrectable bit errors in c~. The decoder performs a frame repeat by taking the following steps:
1) The current 144 bit received data frame is marked as invalid and subsequently ignored during future processing steps.

2) The IMBE(TM) model parameters for the current frame are set equal to the IMBE(TM) model parameters for the previous frame. Specifically, the following update expressions are computed.

    ω_0(0) = ω_0(-1)                             (57)
    L(0) = L(-1)                                 (58)
    K(0) = K(-1)                                 (59)
    v_k(0) = v_k(-1)      for 1 <= k <= K        (60)
    M_l(0) = M_l(-1)      for 1 <= l <= L        (61)
    M̃_l(0) = M̃_l(-1)      for 1 <= l <= L        (62)

3) The repeated model parameters are used in all future processing wherever the current model parameters are required (i.e. speech synthesis).
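The frame repeat logic above can be sketched as follows; the numeric thresholds in the demodulation-failure test follow the reconstructed equations (55)-(56) and should be treated as assumptions.

```python
def frame_is_invalid(b0, e0, e_t):
    # A frame repeat is triggered by a reserved/invalid b0 value, or by
    # the two-part incorrect-demodulation test of (55)-(56).
    valid_pitch = 0 <= b0 <= 207
    silence = 216 <= b0 <= 219
    if not (valid_pitch or silence):
        return True                 # reserved b0 value: always invalid
    return e0 >= 2 and e_t >= 11    # likely incorrect demodulation

def repeat_frame(prev_params):
    # Step 2: carry the previous frame's model parameters forward
    # (fundamental frequency, V/UV decisions, spectral amplitudes).
    return dict(prev_params)
```

Note that both conditions of the demodulation test must hold: a couple of corrected errors in c_0 alone, or a scattered handful across the other code vectors, does not force a repeat.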
The IMBE(TM) decoder uses muting to squelch the output in severe bit error environments. The decoder mutes the speech output after four successive frames have been repeated, or if e_R > .085. In addition, the decoder mutes the speech output if a silence frame is received, which is indicated by 216 <= b_0 <= 219. The recommended muting method is to bypass the synthesis procedure and to set the synthesized speech signal, s(n), to random noise which is uniformly distributed over the interval [-5, 5] samples.
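A sketch of the muting rule, assuming the thresholds stated above (four successive repeats, e_R > .085, or a silence frame with 216 <= b_0 <= 219):

```python
import random

def should_mute(successive_repeats, e_r, b0):
    # Mute after four successive frame repeats, when the smoothed
    # error rate is high, or when a silence frame is received.
    return successive_repeats >= 4 or e_r > 0.085 or 216 <= b0 <= 219

def muted_output(n_samples, rng=None):
    # Recommended muting: bypass synthesis and emit noise uniformly
    # distributed over the interval [-5, 5].
    rng = rng or random.Random(0)
    return [rng.uniform(-5.0, 5.0) for _ in range(n_samples)]
```

Low-level noise is preferred over pure silence here because an abruptly dead channel is more noticeable to listeners than faint comfort noise.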
In the embodiment described above, the logic used to detect an incorrect demodulation of the current frame is controlled by the parameters e_i for 0 <= i <= 6, which represent the number of errors detected during the error control decoding of v_i. This detection logic can be generalized to the computation of an error measure based upon the result of comparing the demodulated code vectors before and after error control decoding. If the value of this error measure exceeds a threshold then the current frame is declared invalid. This relies on the fact that incorrect demodulation causes large discrepancies between these vectors, resulting in a high value of the error measure. Some appropriate action, such as a frame repeat or a frame mute, is then performed for invalid frames. The advantage of this generalized viewpoint is that it easily accommodates alternative error measures which may offer improved performance under certain channel conditions. For example, soft-decision (i.e. multi-bit) data from a modem or similar device can be combined with the disclosed demodulation method in a straightforward manner to offer improved performance.
The IMBE(TM) speech decoder attempts to improve the perceived quality of the synthesized speech by enhancing the spectral amplitudes. The unenhanced spectral amplitudes are required by future frames in the computation of Equation (34). However, the enhanced spectral amplitudes are used in speech synthesis. The spectral amplitude enhancement is accomplished by generating a set of spectral weights from the model parameters of the current frame.
First R_M0 and R_M1 are calculated as shown below.

    R_M0 = sum_{l=1}^{L} M_l^2                   (63)
    R_M1 = sum_{l=1}^{L} M_l^2 cos(ω_0 l)        (64)

Next, the parameters R_M0 and R_M1 are used to calculate a set of weights, W_l, given by

    W_l = M_l^(1/2) [ .96 π (R_M0^2 + R_M1^2 - 2 R_M0 R_M1 cos(ω_0 l)) / (ω_0 R_M0 (R_M0^2 - R_M1^2)) ]^(1/4)    for 1 <= l <= L    (65)

These weights are then used to enhance the spectral amplitudes for the current frame according to the relationship:

    M̃_l = M_l        if 8 l <= L
        = 1.2 M_l    else if W_l > 1.2
        = 0.5 M_l    else if W_l < 0.5     for 1 <= l <= L    (66)
        = W_l M_l    otherwise

A final step is to scale the enhanced spectral amplitudes in order to remove any energy difference between the enhanced and unenhanced amplitudes. The correct scale factor, denoted by γ, is given below.

    γ = [ R_M0 / sum_{l=1}^{L} M̃_l^2 ]^(1/2)    (67)

This scale factor is applied to each of the enhanced spectral amplitudes as shown in Equation (68).

    M̃_l = γ M̃_l    for 1 <= l <= L    (68)
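The enhancement procedure of equations (63)-(68) can be sketched in Python as follows; the clamp limits 0.5 and 1.2 follow the reconstruction of equation (66) above and should be treated as assumptions.

```python
import math

def enhance_amplitudes(M, w0):
    # Equations (63)-(68): correlation measures, per-harmonic weights
    # clamped to [0.5, 1.2], then a global rescale so the enhanced
    # frame keeps the unenhanced energy R_M0.
    L = len(M)
    R0 = sum(m * m for m in M)                                   # (63)
    R1 = sum(m * m * math.cos(w0 * l)
             for l, m in enumerate(M, start=1))                  # (64)
    out = []
    for l, m in enumerate(M, start=1):
        if 8 * l <= L:
            out.append(m)                     # low harmonics untouched
            continue
        num = 0.96 * math.pi * (R0 * R0 + R1 * R1
                                - 2.0 * R0 * R1 * math.cos(w0 * l))
        den = w0 * R0 * (R0 * R0 - R1 * R1)
        w = math.sqrt(m) * (num / den) ** 0.25                   # (65)
        out.append(min(max(w, 0.5), 1.2) * m)                    # (66)
    gamma = math.sqrt(R0 / sum(m * m for m in out))              # (67)
    return [gamma * m for m in out]                              # (68)
```

By construction the returned amplitudes have the same energy R_M0 as the input, so the enhancement reshapes the spectrum (sharpening formant peaks relative to valleys) without changing the loudness of the frame.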
For notational simplicity this equation refers to both the scaled and unscaled spectral amplitudes as M̃_l. This convention has been adopted since the unscaled amplitudes are discarded and only the scaled amplitudes are subsequently used by the decoder during parameter smoothing and speech synthesis.
The value of R_M0 expressed in Equation (63) is a measure of the energy in the current frame. This value is used to update a local energy parameter in accordance with the following rule.

    S_E(0) = .95 S_E(-1) + .05 R_M0    if .95 S_E(-1) + .05 R_M0 > 10000.0    (69)
           = 10000.0                   otherwise

This equation generates the local energy parameter for the current frame, S_E(0), from R_M0 and the value of the local energy parameter from the previous frame, S_E(-1). The parameter S_E(0) is used to adaptively smooth the V/UV decisions as described below.
The IMBE(TM) decoder performs adaptive smoothing to reduce the perceived distortion caused by any uncorrectable bit errors in the quantizer values b_0, b_1, .... The adaptive smoothing methods are controlled by two error rate parameters, e_T and e_R, which are estimated for the current frame as discussed above. When e_T and e_R are small, it is assumed that the error control decoding removed all of the bit errors, and the decoded model parameters are not perturbed. Conversely, when e_T and e_R are large, it is assumed that there is a high probability that some uncorrected bit errors have been introduced into the decoded model parameters, and a large amount of smoothing is performed.
s, 25 The first parameters to be smoothed by the decoder are the V/UV deci-s.

WO 94/12932 , ~ PCT/US93111609 ,ions. First an adaptive threshold 1'u is calculated using oc/uatioo - ;.
if E~(O1 < .005 and cT ~ -~
i,.,~~ - .cs.2ssts~:to)t.';~ else if eR(0) < .Ol'?5 and c., - /) ~ .01 exp/2i'.2titp,/U11 ' 1.41-1 (SE(0)).~''s otherwise where the energy parameter SE(0) is defined in Equation (69). after the adaptive threshold is computed each enhanced spectral amplitude .tlc for 1 ~-l < L is compared against V;Sl. and if :ylc > V~~ then the V/j-~%~ decision for that spectral amplitude is declared voiced. regardless of the decoded ''/1.-V
decision. Otherwise the decoded '/LTV decision for that spectral amplitude 1~
is left unchanged. This process can be expressed mathematically as shown below.
1 if Vlc > Lhr for 1 < ! < L ( t 1 ) iy otherwise Once V/UV decisions have been smoothed, the decoder adaptively smooths 15 the spectral amplitudes :ylc for 1 < 1 < h. The spectral amplitude smoothing method computes the following amplitude measure for the current segment.
    A_M = sum_{l=1}^{L} M̃_l    (72)

Next an amplitude threshold is updated according to the following equation,

    T_M(0) = 20480                       if e_R(0) < .005 and e_T(0) <= 6
           = 6000 - 300 e_T + T_M(-1)    otherwise                          (73)

where T_M(0) and T_M(-1) represent the value of the amplitude threshold for the current and previous frames respectively. The two parameters A_M and T_M(0) are then used to compute a scale factor γ_M given below.

    γ_M = 1.0             if T_M(0) > A_M
        = T_M(0) / A_M    otherwise        (74)
This scale factor is multiplied by each of the spectral amplitudes M̃_l for 1 <= l <= L. Note that this step must be completed after spectral amplitude enhancement has been performed and after V_M has been computed according to Equation (70). The correct sequence is shown in Figure 12.


Claims (43)

THE EMBODIMENTS OF THE INVENTION IN WHICH AN EXCLUSIVE
PROPERTY OR PRIVILEGE IS CLAIMED ARE DEFINED AS FOLLOWS:
1. A method for encoding digital data, the method comprising the steps of:
dividing said digital data into one or more frames;
further dividing each of said frames into a plurality of bit vectors;
encoding one or more of said bit vectors with error control codes;
generating a modulation key from one or more of said bit vectors; and using said modulation key to modulate one or more of said encoded bit vectors.
2. The method of claim 1 wherein a first group of said bit vectors are each encoded by a first type of error control code and a second group of said bit vectors are each encoded by a second type of error control code.
3. The method of claims 1 or 2 wherein said modulation key is generated from a high priority bit vector.
4. The method of claims 1 or 2 wherein said frames of digital data are generated by encoding a speech signal with a speech coder.
5. The method of claim 4 wherein said frames of digital data can be grouped into a plurality of frame formats and wherein the said modulation key is generated from one of said bit vectors, said one bit vector determining the frame format used in the current frame.
6. A method for decoding digital data that has been encoded using the method of claim 1, the method comprising the steps:
dividing said digital data that has been encoded, into one or more frames;
further dividing each of said frames into a plurality of code vectors;
generating a demodulation key from one or more of said code vectors;
using said demodulation key to demodulate one or more of said code vectors;
and error control decoding one or more of said demodulated code vectors.
7. The method of claim 6, further comprising the steps of:
computing an error measure which is formed at least in part by comparing said demodulated code vectors before error control decoding with said demodulated code vectors after error control decoding;
comparing the value of said error measure against a threshold; and declaring said frame invalid if said error measure exceeds said threshold.
8. The method of claim 7 wherein said demodulation is performed using a method comprising the steps of:
initializing a pseudo-random sequence using said demodulation key;
using said pseudo-random sequence to generate one or more binary demodulation vectors; and performing modulo 2 addition of said binary demodulation vectors to a plurality of said code vectors.
9. The method of claim 8 wherein said demodulation key is generated from one of said code vectors after said one code vector has been error control decoded.
10. The method of claims 6, 7, 8 or 9 wherein a first group of said demodulated code vectors are each decoded using a first type of error control code and a second group of said demodulated code vectors are each decoded using a second type of error control code.
11. The method of claim 10 wherein said first type of error control code is a Golay code and said second type of error control code is a Hamming code.
12. The method of claims 6, 7, 8 or 9 wherein said demodulation key is generated from a high priority code vector.
13. The method of claim 7, 8, or 9 wherein said error measure is computed at least in part by counting the number of errors detected or corrected by said error control decoding.
14. The method of claim 12 wherein said frames of digital data represent a speech signal which has been encoded with a speech coder.
15. The method of claim 14 wherein said frames of digital data can be grouped into a plurality of frame formats and wherein said demodulation key is generated from one of said code vectors, said one code vector determining the frame format used in each frame.
16. The method of claim 14 wherein said demodulation key is generated from one of said code vectors, said one code vector representing at least in part the level of said speech signal.
17. The method of claim 14 wherein said invalid frames are discarded and replaced by the last decoded frame which was not declared to be invalid.
18. The method of claim 14 wherein said speech coder is one of the following speech coders: Multi-Band Excitation (MBE) speech coder, Improved MultiBand Excitation (IMBE TM) speech coder, or sinusoidal transform speech coder (STC).
19. An apparatus for encoding a speech signal into digital data, the apparatus including:
a processor configured to perform the steps of:
sampling said speech signal to obtain a series of discrete samples and constructing therefrom a series of frames, each frame spanning a plurality of said samples;
analyzing said frames to extract the parameters of a speech coder;
using a quantizer to convert said parameters into a set of discrete quantizer values;
dividing said quantizer values into a plurality of bit vectors;
encoding one or more of said bit vectors with error control codes;
generating a modulation key from one or more of said bit vectors; and using said modulation key to modulate one or more of said encoded bit vectors.
20. An apparatus for decoding speech from digital data which has been encoded using the apparatus of claim 19, the apparatus including:
a processor configured to perform the steps of:
dividing said digital data into one or more frames;
further dividing said frames into a plurality of code vectors;
generating a demodulation key from one or more of said code vectors;
using said demodulation key to demodulate one or more of said code vectors;
and error control decoding one or more of said demodulated code vectors.
21. The apparatus of claim 19 or claim 20 wherein said speech coder is one of the following speech coders: Multi-Band Excitation (MBE) speech coder, Improved Multi-Band Excitation (IMBE TM) speech coder, or sinusoidal transform speech coder (STC).
22. A method for error control coding of digital data, the method comprising the steps of:
dividing said digital data into a plurality of bit vectors, including a first bit vector;
encoding said bit vectors with error control codes, to produce encoded bit vectors, including an encoded first bit vector;
generating a modulation key from at least said first bit vector; and using said modulation key to modulate at least some of said encoded bit vectors.
23. The method of claim 22 wherein said digital data comprises bits of varying priority, and wherein said first bit vector comprises a high priority bit vector containing bits of higher priority than at least some other said bits.
24. The method of claim 22 wherein said digital data is divided into frames, each frame comprising a said plurality of bit vectors.
25. The method of claim 23 wherein said digital data is divided into frames, each frame comprising a said plurality of bit vectors.
26. The method of claim 22 or claim 25 wherein a first group of said bit vectors are each encoded by a first type of error control code and a second group of said bit vectors are each encoded by a second type of error control code.
27. The method of claim 24 or claim 25 wherein said frames of digital data are generated by encoding a speech signal with a speech coder.
28. The method of claim 27 wherein said frames of digital data are grouped into a plurality of frame formats and wherein the said modulation key is generated from one of said bit vectors, said one bit vector determining the frame format used in the current frame.
29. A method of decoding digital data that has been encoded by an encoding method comprising the steps of:
dividing said digital data into a plurality of bit vectors, including a first bit vector;
encoding said bit vectors with error control codes, to produce encoded vectors, including an encoded first bit vector;
generating a modulation key from at least said first bit vector; and using said modulation key to modulate at least some of said encoded bit vectors, to produce modulated encoded bit vectors;
said method of decoding comprising the steps of:
dividing the digital data to be decoded into a plurality of code vectors, said code vectors corresponding to said modulated encoded bit vectors;
generating a demodulation key from at least one of said code vectors;
using said demodulation key to demodulate at least some of said code vectors, to produce demodulated code vectors; and error control decoding at least some of said demodulated code vectors.
30. The method of claim 29 further comprising the steps of:
computing an error measure which is formed at least in part by comparing said demodulated code vectors before error control decoding with said demodulated code vectors after error control decoding;
comparing the value of said error measure against a threshold; and declaring said frame invalid if said error measure exceeds said threshold.
31. The method of claim 30 wherein said demodulation is performed using a method comprising the steps of:
initializing a pseudo-random sequence using said demodulation key;
using said pseudo-random sequence to generate one or more binary demodulation vectors; and performing modulo 2 addition of said binary demodulation vectors to a plurality of said code vectors.
32. The method of claim 31 wherein said demodulation key is generated from one of said code vectors after said one code vector has been error control decoded.
33. The method of claim 29, claim 30, claim 31 or claim 32 wherein a first group of said demodulated code vectors are each decoded using a first type of error control code and a second group of said demodulated code vectors are each decoded using a second type of error control code.
34. The method of claim 33 wherein said first type of error control code is a Golay code and said second type of error control code is a Hamming code.
35. The method of claim 29, claim 30, claim 31 or claim 32 wherein said digital data comprises code vectors of varying priority, and wherein said demodulation key is generated from a high priority code vector.
36. The method of claim 30, claim 31 or claim 32 wherein said error measure is computed at least in part by counting the number of errors detected or corrected by said error control decoding.
37. The method of claim 35 wherein said frames of digital data represent a speech signal which has been encoded with a speech coder.
38. The method of claim 37 wherein said frames of digital data are grouped into a plurality of frame formats and wherein said demodulation key is generated from one of said code vectors, said one code vector determining the frame format used in each frame.
39. The method of claim 37 wherein said demodulation key is generated from one of said code vectors, said one code vector representing at least in part the level of said speech signal.
40. The method of claim 37 wherein said invalid frames are discarded and replaced by the last decoded frame which was not declared to be invalid.
41. The method of claim 37 wherein said speech coder is one of the following speech coders: Multi-Band Excitation (MBE) speech coder, Improved Multi-Band Excitation (IMBE.TM.) speech coder, or sinusoidal transform speech coder (STC).
42. An apparatus for encoding a speech signal into digital data, the apparatus including:
means for sampling said speech signal to obtain a series of discrete samples and constructing therefrom a series of frames, each frame spanning a plurality of said samples;
means for analyzing said frames to extract the parameters of a speech coder;
means for converting said parameters into a set of discrete quantizer values;
means for dividing said quantizer values into a plurality of bit vectors;
means for encoding one or more of said bit vectors with error control codes;

means for generating a modulation key from one or more of said bit vectors;
and means for using said modulation key to modulate one or more of said encoded bit vectors.
43. An apparatus for decoding speech from digital data which has been encoded using the apparatus of claim 42, the apparatus including:
means for dividing said digital data into one or more frames;
means for further dividing said frames into a plurality of code vectors;
means for generating a demodulation key from one or more of said code vectors;
means for using said demodulation key to demodulate one or more of said code vectors; and means for error control decoding one or more of said demodulated code vectors.
CA002149039A 1992-11-30 1993-11-29 Coding with modulation, error control, weighting, and bit allocation Expired - Lifetime CA2149039C (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US07/982,937 1992-11-30
US07/982,937 US5517511A (en) 1992-11-30 1992-11-30 Digital transmission of acoustic signals over a noisy communication channel
PCT/US1993/011609 WO1994012932A1 (en) 1992-11-30 1993-11-29 Coding with modulation, error control, weighting, and bit allocation

Publications (2)

Publication Number Publication Date
CA2149039A1 CA2149039A1 (en) 1994-06-09
CA2149039C true CA2149039C (en) 2006-10-03

Family

ID=37101719

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002149039A Expired - Lifetime CA2149039C (en) 1992-11-30 1993-11-29 Coding with modulation, error control, weighting, and bit allocation

Country Status (1)

Country Link
CA (1) CA2149039C (en)

Also Published As

Publication number Publication date
CA2149039A1 (en) 1994-06-09

Similar Documents

Publication Publication Date Title
EP0671032B1 (en) Coding with modulation, error control, weighting, and bit allocation
US5247579A (en) Methods for speech transmission
US5097507A (en) Fading bit error protection for digital cellular multi-pulse speech coder
US6131084A (en) Dual subframe quantization of spectral magnitudes
AU657508B2 (en) Methods for speech quantization and error correction
US4831624A (en) Error detection method for sub-band coding
EP1061503B1 (en) Error detection and error concealment for encoded speech data
US8359197B2 (en) Half-rate vocoder
US8315860B2 (en) Interoperable vocoder
CA2254567C (en) Joint quantization of speech parameters
US6161089A (en) Multi-subframe quantization of spectral parameters
AU717381B2 (en) Method and arrangement for reconstruction of a received speech signal
AU7174100A (en) Multiband harmonic transform coder
EP0371032A1 (en) Protection of energy information in sub-band coding
CA2149039C (en) Coding with modulation, error control, weighting, and bit allocation
KR100220783B1 (en) Speech quantization and error correction method
Yang et al. Performance of pitch synchronous multi-band (PSMB) speech coder with error-correction coding
van den Berghe et al. Real-time implementation of a scaleable channel coding scheme for mobile transmission of G. 723.1 speech bitstream
Obozinski From Grid Technologies (in 6th of IST RTD 2002-2006) to knowledge Utility

Legal Events

Date Code Title Description
EEER Examination request
MKEX Expiry

Effective date: 20131129