WO1997013242A1 - Trifurcated channel encoding for compressed speech - Google Patents

Trifurcated channel encoding for compressed speech

Info

Publication number
WO1997013242A1
WO1997013242A1 PCT/US1996/013394 US9613394W WO9713242A1 WO 1997013242 A1 WO1997013242 A1 WO 1997013242A1 US 9613394 W US9613394 W US 9613394W WO 9713242 A1 WO9713242 A1 WO 9713242A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
encoder
speech
error sensitive
error
Prior art date
Application number
PCT/US1996/013394
Other languages
French (fr)
Inventor
Xiaojun Li
Jian-Cheng Huang
Floyd Simpson
Original Assignee
Motorola Inc.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Motorola Inc.
Publication of WO1997013242A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/005 Correction of errors induced by the transmission channel, if related to the coding algorithm

Definitions

  • This invention relates generally to communication systems, and more specifically to a compressed voice digital communication system which provides a novel channel encoding method utilizing a plurality of encoders for encoding data of varying importance using varying protection levels to decrease the average overhead for the channel encoding process.
  • Adding to the difficulty in low bit rate data transmission is the requirement that the data be transmitted utilizing a channel encoding process which protects the data from irregularities in the transmission channel. This significantly increases the bit rate required for the data transmission.
  • the speech parameters vary significantly in their importance in speech replication. For example, if one of the energy parameters is altered in the transmission process, speech replication will not be significantly affected. However, if pitch information becomes altered, it will likely render the speech replication unintelligible.
  • pitch generally refers to the period or frequency of the buzzing of the vocal cords or glottis
  • spectrum generally refers to the frequency dependent properties of the vocal tract
  • energy generally refers to the magnitude or intensity of the speech waveform
  • voicing refers to whether or not the vocal cords are active
  • quantizing refers to choosing one of a finite number of discrete values to characterize these ordinarily continuous speech parameters.
  • the number of different quantized values (or levels) for a particular speech parameter is set by the number of bits assigned to code that speech parameter.
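The relationship between assigned bits and quantization levels can be illustrated with a minimal sketch. It assumes a uniform scalar quantizer over a known range; the function names `quantize` and `dequantize` and the range arguments `lo` and `hi` are hypothetical and do not come from the patent, which does not specify the quantizer at this point.

```python
def quantize(value, lo, hi, bits):
    """Map a continuous value in [lo, hi] to one of 2**bits integer levels."""
    levels = 1 << bits                 # number of distinct quantized values
    value = min(max(value, lo), hi)    # clamp to the representable range
    step = (hi - lo) / (levels - 1)    # spacing between adjacent levels
    return round((value - lo) / step)


def dequantize(index, lo, hi, bits):
    """Recover the representative value for a quantization index."""
    levels = 1 << bits
    step = (hi - lo) / (levels - 1)
    return lo + index * step
```

With 3 bits the parameter can take 8 values, with 8 bits 256 values, so each extra bit halves the quantization step at the cost of a higher bit rate.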
  • Vocoders may be built which operate at rates such as 200, 400, 600, 800, 900, 1200, 2400, 4800, 9600 bits per second, with varying results depending on the bit rate.
  • quality of reconstructed voice will vary depending not only on the bit rate chosen, but also on parameters such as previously discussed (e.g., pitch period, spectrum bandwidth, energy, voicing, etc.).
  • the allowable bit rate will fall accordingly. Consequently, as the allowable bit rate falls, it becomes more difficult to find a data compression scheme that provides clear, intelligible, synthesized speech.
  • data compression is intended to refer to the creation of a set of quantized parameters describing the input speech and “de-compression” is intended to refer to the subsequent use of this set of quantized parameters to synthesize a replica of the input speech.
  • channel encoding refers to both the encoding and the decoding of the compressed speech parameter data for the protection of the data when passed through a transmission channel.
  • vocoder has been coined in the art to describe an apparatus which performs the aforementioned functions.
  • compressed input speech is distinguished based on the importance of the data for speech replication.
  • the most error sensitive data is considered to be heading and pitch information and therefore is passed to encoder I wherein a channel encoding method, such as a Hamming code, is utilized to provide for maximum protection of the compressed data from channel errors, but at the cost of increased overhead.
  • the intermediate error sensitive data is considered to be the spectral parameters and is encoded utilizing a method that provides less protection, such as the split vector quantization error detecting method which requires less overhead.
  • the least error sensitive data is considered to be the energy information and is encoded by converting its pure binary code into a Gray code. This adds no redundancy and provides no error correction, but it offers error robustness while significantly decreasing the number of bits required for channel encoding.
  • the encoded data is then multiplexed and transmitted via a transmitter to a receiver capable of de-multiplexing the encoded data.
  • the encoded most error sensitive data is passed to decoder I, wherein a decoding method consistent with encoder I is used, such as a Hamming code decoder in the preferred embodiment.
  • the encoded intermediate error sensitive data is passed to decoder II, wherein a decoding method consistent with encoder II is used, such as the split vector quantization error correcting method.
  • the encoded least error sensitive data is passed to decoder III, wherein the Gray code to pure binary code conversion consistent with encoder III is used.
  • the output of each decoder is a replication of the compressed input speech.
  • FIG. 1 is a block diagram of a communication system, such as a paging system, utilizing the novel channel encoding scheme in accordance with the preferred embodiment of the present invention.
  • FIG. 2 is an electrical block diagram of a paging terminal and associated paging transmitters utilizing the novel channel encoding scheme of the preferred embodiment of the present invention.
  • FIG. 3 is a flow chart showing the operation of the paging terminal of FIG. 2.
  • FIG. 4 is a block diagram of a digital signal processor that can be utilized in the paging terminal of FIG. 2 and paging receiver of FIG. 6.
  • FIG. 5 is a flow chart showing the operation of a digital signal processor utilized in the paging terminal of FIG. 2.
  • FIG. 6 is an electrical block diagram of a paging receiver utilizing the novel channel encoding scheme of the preferred embodiment of the present invention.
  • FIG. 7 is a block diagram illustrating the parallel coding for 3-level error protection of the preferred embodiment of the present invention.
  • FIG. 8 is a flowchart of a Hamming code encoder utilized in the channel encoding process of the present invention to encode the "most error sensitive" speech parameter data.
  • FIG. 9 is a flowchart of the conversion of pure binary code into a Gray code.
  • FIG. 10 is a flowchart of a Hamming code decoder, which decodes the "most error sensitive" speech parameter data encoded by the Hamming code encoder.
  • FIG. 11 is a flow chart of the error correction algorithm utilized to encode and decode the "intermediate error sensitive" speech parameter data.
  • FIG. 12 is a continuation of the error correction algorithm of FIG. 11.
  • FIG. 13 is a flowchart for conversion of a Gray code into a pure binary code.
  • This invention proposes a novel channel encoding scheme for protecting compressed speech data from channel impairments.
  • In voice messaging, the data obtained from a vocoder is highly compressed. To achieve a very low bit rate, the compression is typically done on a multi-frame basis. As a result, some compressed data may have more influence on speech reconstruction than others.
  • the compressed data is classified into three categories according to their importance in speech reconstruction. The new coding scheme guarantees that the most error sensitive data will have better protection than the least error sensitive data. This substantially reduces channel coding overhead while achieving good reliability in message transmission.
  • FIG. 1 shows a block diagram of a communication system, such as a paging system, utilizing a novel channel encoding scheme in accordance with the present invention.
  • the digital voice compression process and channel encoding scheme are adapted to the store and forward type communications systems, which provide the time required to perform the highly computational intensive voice compression and channel encoding processes. Furthermore, it minimizes the processing required to be performed in a portable communication device, such as a pager, making the process ideal for paging applications and other similar store and forward type voice communications.
  • the highly computational intensive portion of the digital voice compression process is performed in a fixed portion of the system and as a result little computation is required to be performed in the portable portion of the system as will be described below.
  • a paging system is designed to provide service to a variety of users each requiring different services. Some of the users will require numeric messaging services, and still other users may require voice messaging services.
  • the caller originates a page by communicating with a paging terminal 106 via a telephone 102 through the public switched telephone network (PSTN) 104 or the like.
  • PSTN public switched telephone network
  • the paging terminal 106 encodes the message and places the encoded message in a transmission queue.
  • the messages are broadcast under control of the paging terminal 106 using a transmitter 108 and a transmitting antenna 110. It will be appreciated that in a simulcast transmission system, a multiplicity of transmitters covering different geographic areas can be utilized as well, thus increasing geographic coverage.
  • the signal transmitted from the transmitting antenna 110 is intercepted by a receiving antenna 112 and processed by a paging receiver 114.
  • the person being paged may be alerted and the message may be displayed or enunciated depending on the type of messaging being employed.
  • An electrical block diagram of the paging terminal 106 and the paging transmitter 108 utilizing digital voice compression and the trifurcated encoding of the present invention is shown in FIG. 2.
  • the paging terminal 106 shown in FIG. 1 is of a type that would be used to service a large number of simultaneous users, such as in a commercial Radio Common Carrier (RCC) system.
  • the paging terminal 106 utilizes a number of input devices, signal processing devices and output devices controlled by a controller 216. Communications between the controller 216 and the various devices that compose the paging terminal 106 are handled by a digital control bus 210. Communication of digitized voice and data is handled by an input time division multiplexed highway 212 and an output time division multiplexed highway 218.
  • the digital control bus 210, input time division multiplexed highway 212 and output time division multiplexed highway 218 can be extended to provide for expansion of the paging terminal 106.
  • the interface between the PSTN 104 and the paging terminal 106 can be either a plurality of multi-call per line multiplexed digital connections shown in FIG. 2 as a digital PSTN connection 202 or plurality of single call per line analog PSTN connections 208.
  • the interface may take the form of a high speed local area or wide area network interface, utilizing such conventional communication protocols as TCP/IP or the like.
  • TCP/IP Transmission Control Protocol/Internet Protocol
  • Each digital PSTN connection 202 is serviced by a digital telephone interface 204.
  • the digital telephone interface 204 provides necessary signal supervision, regulatory protection requirements for operation of the digital voice compression process and data protection in accordance with the present invention.
  • the digital telephone interface 204 can also provide temporary storage of the digitized voice frames to facilitate interchange of time slots and time slot alignment necessary to provide an access to the input time division multiplexed highway 212.
  • requirements for service and supervisory responses are controlled by a controller 216. Communications between the digital telephone interface 204 and the controller 216 pass over the digital control bus 210.
  • Each analog PSTN connection 208 is serviced by an analog telephone interface 206.
  • the analog telephone interface 206 provides the necessary signal conditioning, signaling, supervision, analog to digital and digital to analog conversion, and regulatory protection requirements for operation of the digital voice compression and channel encoding processes in accordance with the present invention.
  • the frames of digitized voice messages from the analog to digital converter are temporarily stored in the telephone interface 206 to facilitate interchange of time slots and time slot alignment necessary to provide an access to the input time division multiplexed highway 212.
  • Communications between the analog telephone interface 206 and the controller 216 pass over the digital control bus 210.
  • a request for service is sent from the analog telephone interface 206 or the digital telephone interface 204 to the controller 216.
  • the controller 216 selects a digital signal processor 214 from a plurality of digital signal processors.
  • the controller 216 couples the analog telephone interface 206 or the digital telephone interface 204 requesting service to the digital signal processor 214 selected via the input time division multiplexed highway 212.
  • the digital signal processor 214 can be programmed to perform all of the signal processing functions required to complete the paging process. Typical signal processing functions performed by the digital signal processor 214 include voice analyzation, digital voice compression, channel encoding in accordance with the present invention, dual tone multi frequency (DTMF) decoding and generation and modem tone generation.
  • the digital signal processor 214 can be programmed to perform one or more of the functions described above.
  • the controller 216 assigns the particular task needed to be performed at the time the digital signal processor 214 is selected, or in the case of a digital signal processor 214 that is programmed to perform only a single task, the controller 216 selects a digital signal processor 214 programmed to perform the particular function needed to complete the next step in the paging process.
  • the operation of the digital signal processor 214 performing dual tone multi frequency (DTMF) decoding and prerecorded voice prompt generation is well known to one of ordinary skill in the art.
  • the processing of a page request proceeds in the following manner.
  • the digital signal processor 214 that is coupled to an analog telephone interface 206 or a digital telephone interface 204 prompts the originator for a voice message.
  • the digital signal processor 214 analyzes the input speech for pitch, energy, spectral parameters and voicing information 214(a) and then compresses the data 214(b) using a compression method such as vector quantization; although, it is appreciated that any compression method can be utilized with this invention.
  • the digital signal processor 214 then processes the digital voice message generated by the compression processes using three distinct channel encoding methods: Encoder I 214(c) (i) for the "most error sensitive” data, Encoder II 214(c) (ii) for the "intermediate error sensitive” data and Encoder III 214
  • the channel encoder I 214(c) (i), encoder II 214(c) (ii) and encoder III 214 (c)(iii) encode the data (described in detail below), each with varying levels of data protection.
  • the compressed and channel encoded data is multiplexed by the digital signal processor 214(d) and then coupled to a paging protocol encoder 228 via the output time division multiplexed highway 218, under the control of the controller 216.
  • the controller 216 directs the paging protocol encoder 228 to store the encoded data in a data storage device 226 via the output time division multiplexed highway 218.
  • the compressed and channel encoded data is downloaded into the transmitter control unit 220, under control of the controller 216, via the output time division multiplexed highway 218 and transmitted using the paging transmitter 108 and the transmitting antenna 110.
  • FIG. 3 is a flow chart which describes the operation of the paging terminal 106 shown in FIG. 1 when processing a voice message.
  • the first entry point is for a process associated with the digital PSTN connection 202 and the second entry point is for a process associated with the analog PSTN connection 208.
  • the process starts with step 302, receiving a request over a digital PSTN line. Requests for service from the digital PSTN connection 202 are indicated by a bit pattern in the incoming data stream.
  • the digital telephone interface 204 receives the request for service and communicates the request to the controller 216.
  • step 304 information received from the digital channel requesting service is separated from the incoming data stream by digital de-multiplexing.
  • the digital signal received from the digital PSTN connection 202 typically includes a plurality of digital channels multiplexed into an incoming data stream.
  • the digital channels requesting service are de-multiplexed and the digitized speech data is then stored temporarily to facilitate time slot alignment and multiplexing of the data onto the input time division multiplexed highway 212.
  • a time slot for the digitized speech data on the input time division multiplexed highway 212 is assigned by the controller 216.
  • digitized speech data generated by the digital signal processor 214 for transmission to the digital PSTN connection 202 is formatted suitably for transmission and multiplexed into the outgoing data stream.
  • step 306 when a request from the analog PSTN line is received.
  • incoming calls are signaled by either low frequency AC signals or by DC signaling.
  • the analog telephone interface 206 receives the request and communicates the request to the controller 216.
  • the analog voice message is converted into a digital data stream.
  • the analog signal received over its total duration is referred to as the analog voice message.
  • the analog signal is sampled and digitized.
  • the samples of the analog signal are referred to as voice samples.
  • the digitized voice samples are referred to as digitized speech data.
  • the digitized speech data is multiplexed onto the input time division multiplexed highway 212 in a time slot assigned by the controller 216. Conversely any voice data on the input time division multiplexed highway 212 that originates from the digital signal processor 214 undergoes a digital to analog conversion before transmission to the analog PSTN connection 208.
  • the processing path for the analog PSTN connection 208 and the digital PSTN connection 202 converge in step 310, when a digital signal processor is assigned to handle the incoming call.
  • the controller 216 selects a digital signal processor 214 programmed to perform the input speech analyzation, compression, channel encoding and multiplexing.
  • the assigned digital signal processor 214 reads the data on the input time division multiplexed highway 212 in the previously assigned time slot.
  • the data read by the digital signal processor 214 is stored for processing, in step 312, as uncompressed speech data.
  • the stored uncompressed, multi-frame speech data is analyzed and grouped according to the pitch, spectral parameters, energy, and voicing information.
  • the grouped parameters are buffered so each set of parameters may be coded independently.
  • the data is compressed utilizing, for example, a vector quantization method, although alternative compression methods known to those skilled in the relevant art may be implemented.
  • the compressed data is assigned to three distinct encoders which utilize distinct encoding methods with varying levels of protection. Header and pitch information are assigned to encoder I; spectral parameters are assigned to encoder II; and energy information is assigned to encoder III.
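The assignment described above can be pictured as a simple routing table. The following is only an illustrative sketch: the dictionary `SENSITIVITY_CLASS`, the function `classify`, and the parameter names used as keys are hypothetical. Routing any unlisted parameter to the least protected subset is an assumption that follows the description of the least error sensitive data as the remaining speech parameters.

```python
# Hypothetical mapping from parameter name to protection class.
# "A" -> encoder I (most protection), "B" -> encoder II, "C" -> encoder III.
SENSITIVITY_CLASS = {
    "header": "A",
    "pitch": "A",
    "spectrum": "B",
    "energy": "C",
}


def classify(params):
    """Split a dict of compressed parameters into subsets A, B and C."""
    subsets = {"A": {}, "B": {}, "C": {}}
    for name, value in params.items():
        # Assumption: anything not explicitly listed is treated as least sensitive.
        subsets[SENSITIVITY_CLASS.get(name, "C")][name] = value
    return subsets
```

Each subset would then be handed to its own channel encoder before the three outputs are multiplexed.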
  • step 320 the header and pitch information are encoded implementing a high redundancy, maximum protection encoding method. This provides for greater protection, but at the cost of higher overhead.
  • step 322 the spectral parameters are encoded implementing a limited channel protection method, which is a compromise between the high protection provided by encoder I and no error correction by encoder III. This requires only minimal channel overhead.
  • step 324 energy information is encoded implementing a zero redundancy encoding method which provides for no error correction, but offers error robustness.
  • Encoders I, II and III then pass their respective coded data to a multiplexer wherein the data is multiplexed 326 and then stored in a paging queue 328 for later transmission. At the appropriate time, the queued data is sent to the transmitter 108 at step 330 and transmitted at step 332.
  • FIG. 4 is a block diagram of a digital signal processor that can be utilized in the paging terminal of FIG. 2 and paging receiver of FIG. 6.
  • the digital signal processor 400 functions both as an analyzer to determine the essential speech parameters and as a synthesizer to reconstruct a replica of the input speech based on such speech parameters.
  • vocoder 400 receives speech input 402 which then passes through gain adjustment block 404 (e.g., an AGC) and analog to digital (A/D) converter 406.
  • A/D 406 supplies digitized input speech to microprocessor or controller 408.
  • Microprocessor 408 communicates over bus 418 with ROM 420 (e.g., an EPROM or
  • EEPROM electrically erasable programmable read-only memory
  • SRAM static random access memory
  • address decoder 424
  • These elements act in concert to execute the instructions stored in ROM 420 to divide the incoming digitized speech into frames and analyze the frames to determine the significant speech parameters associated with each frame of speech, as for example, pitch, spectrum, energy and voicing. Additionally, these elements act in concert to assign the "most error sensitive data" to be data "A" which consists of heading and pitch information; the intermediate error sensitive data to be data "B” which consists of the spectral parameters; and the least error sensitive data to be data "C” which consists of any remaining speech parameters not assigned as "A” or "B” . These parameters are delivered to output 410 from whence they go to three distinct channel coders and a multiplexer for eventual transmission to a receiver.
  • When acting as a synthesizer (i.e., decoder), vocoder 400 receives speech parameters from the de-multiplexer and three distinct channel decoders via input 412. These speech parameters are used by microprocessor 408, in connection with SRAM 424, address decoder 46 and the program stored in ROM 420, to provide digitized synthesized speech to D/A converter 416, which converts the digitized synthesized speech back to analog form and provides synthesized analog speech via optional gain adjustment block 414 to output 426 for delivery to a loudspeaker or headphone.
  • the vocoder 400 of FIG. 4 is capable of not only analyzing speech parameters and compressing them but also channel encoding these quantized speech parameters with varying protection levels by distinct encoding methods, thereby significantly decreasing the bit rate that would be required with previous channel encoding schemes, without affecting the quality of the speech replication.
  • GP-VCM General Purpose Voice Coding Module
  • FIG. 5 is a flow chart showing the functions performed by the digital signal processor utilized in the paging terminal of FIG. 2.
  • the digital speech data 502 that was previously stored in the digital signal processor 214 as uncompressed voice data is passed through a gain normalization step 504 and analyzed at step 506.
  • the amplitude of the digital speech message is adjusted on a syllabic basis to fully utilize the dynamic range of the system and improve the apparent signal-to-noise performance.
  • the normalized uncompressed speech data is grouped into short duration segments of speech in step 506(a).
  • the groups typically contain twenty to thirty milliseconds of speech data.
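The grouping into short duration segments can be sketched as follows. The sample rate of 8000 Hz and the frame length of 25 ms are assumptions chosen for illustration (the text only states twenty to thirty milliseconds), and `split_into_frames` is a hypothetical name.

```python
def split_into_frames(samples, sample_rate_hz=8000, frame_ms=25):
    """Group digitized speech samples into consecutive short segments."""
    frame_len = sample_rate_hz * frame_ms // 1000   # samples per segment
    return [samples[i:i + frame_len]
            for i in range(0, len(samples), frame_len)]
```

Under these assumptions each segment holds 200 samples, and the final segment may be shorter if the message length is not a multiple of the segment length.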
  • an analyzer extracts the energy, spectrum, voice, and pitch parameters in parallel, utilizing respective extraction methods which are well known to those skilled in the relevant art.
  • a linear predictive coding (LPC) process is performed on the short duration segment of speech to provide a short term prediction.
  • the LPC process analyzes the short duration segments of speech and extracts the spectrum parameters.
  • the digital voice compression process illustrated herein calculates thirteen parameters.
  • the first three parameters represent the total energy in the speech segment, a characteristic pitch value, and voicing information.
  • the remaining ten parameters are referred to as spectral parameters and basically represent coefficients of a digital filter. It will be appreciated, however, that varying parameter numbers can be utilized with the present invention.
  • a buffer is provided, in step 508(a), wherein the speech parameters of pitch, energy, voicing and spectral parameters from a plurality of frames are grouped.
  • the parameter sets are then compressed in step 508(b). If the parameter set is spectrum, compression occurs utilizing a vector quantization process step 508(b). If the parameter set is energy, compression occurs utilizing a type of discrete cosine transform and scalar quantization. If the parameter set is pitch, compression occurs utilizing a type of run-length coding.
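The text only says that pitch is compressed with "a type of run-length coding" and gives no further detail. As a generic illustration of the idea, and not the specific variant used in the invention, a plain run-length coder over a sequence of quantized pitch values might look like this (both function names are hypothetical):

```python
def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return [(v, n) for v, n in runs]


def run_length_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out
```

Run-length coding pays off when a parameter stays constant over several consecutive frames, which is plausible for pitch within a voiced segment.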
  • a second buffer is provided to allow for the channel encoding step 510.
  • the quantized vector representing the pitch and heading information is encoded using a method providing for the greatest protection.
  • the quantized vector representing the spectral parameters is encoded implementing an encoding method for intermediate data protection.
  • the quantized vector representing the energy information for the group of parameters is encoded using a method of zero redundancy.
  • the encoded data from steps 510(a-c) is then passed to a multiplexer 512 for subsequent data transmission.
  • FIG. 6 is an electrical block diagram of the paging receiver 114.
  • the signal transmitted from the transmitting antenna 110 is intercepted by the receiving antenna 112.
  • the receiving antenna 112 is coupled to a receiver 600.
  • the receiver 600 processes the signal received by the receiver antenna 112 and produces a receiver output signal 614 which is a replica of the encoded data transmitted.
  • the output signal is passed to a selected digital signal processor 604 wherein a de-multiplexer 604(a) de-multiplexes the encoded speech parameters and sends the data to three distinct decoders.
  • Decoder I 604(b) decodes the information from encoder I and passes it to de-quantizer 604(e); decoder II 604(c) decodes the information from encoder II and passes the information to de-quantizer 604(e); and decoder III 604(d) decodes the information from encoder III and passes it to de-quantizer 604(e).
  • De-quantizer 604(e) then de-quantizes the vectors representing the grouped parameters of speech and passes the information to synthesizer 604(f) which reconnects the analyzed frames of speech and replicates the speech from the speech parameters produced from the analyzer.
  • the digital signal processor 604 also provides the basic control of the various functions of the paging receiver 114.
  • the digital signal processor 604 is coupled to a battery saver switch 618, a code memory 612, a user interface 614, and a message memory 616, via the control bus 610.
  • the code memory 612 stores unique identification information or address information, necessary for the controller to implement the selective call feature.
  • the user interface 614 provides the user with an audio, visual or mechanical signal indicating the reception of a message and can also include a display and push buttons for the user to input commands to control the receiver.
  • the message memory 616 provides a place to store messages for future review, or to allow the user to repeat the message.
  • the battery saver switch 618 provides means of selectively disabling the supply power to the receiver during a period when the system is communicating with other pagers or not transmitting, thereby reducing power consumption and extending battery life in a manner well known to one with ordinary skill in the art.
  • FIG. 7 is a block diagram illustrating parallel coding for 3-level error protection of the present invention.
  • Subset A 702 consisting of the "most error sensitive" speech replication information of heading and pitch is sent to encoder I 708.
  • Subset B 704 consisting of the "intermediate error sensitive” information of the spectral parameters is sent to encoder II 710.
  • Subset C 706, consisting of the "least error sensitive" energy information, is sent to encoder III 712. It will be appreciated that further categorization of speech parameters can be utilized, providing for incremental data protection. For example, the most error sensitive information could be assigned to subset "A", the next most error sensitive information to subset "B", intermediate error sensitive information to subset "C" and the least error sensitive information to subset "D", with a different level of protection provided for each. As the data passing to encoder I 708 is the most error sensitive for speech replication, it is given the greatest degree of protection by utilizing, for example, a Hamming (7,4) code, which adds three parity bits to every four data bits and therefore requires 3/4 overhead.
  • FIG. 8 is a flow chart of a Hamming code encoder, in which encoder I codes every 4 bits in subset A into a 7-bit Hamming codeword using the generator matrix (G) and parity check matrix (H) given in FIG. 8. Note that if all parameters were coded using the Hamming (7,4) code with its concomitant overhead, the final vocoder rate would be approximately 1600 bits per second. In keeping with the object of this invention, different coding techniques are utilized to avoid the larger overhead for the less error sensitive information.
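The matrices of FIG. 8 are not reproduced in this text, so the following sketch assumes one common systematic generator matrix for the Hamming (7,4) code; the actual G of the figure may order its columns differently. The names `G` and `hamming74_encode` are illustrative.

```python
# Assumed systematic generator matrix G = [I4 | P]; not taken from FIG. 8.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (data bits first, then parity)."""
    if len(data_bits) != 4:
        raise ValueError("Hamming (7,4) encodes exactly 4 bits at a time")
    # Codeword = data vector times G, with arithmetic modulo 2.
    return [sum(d * g for d, g in zip(data_bits, column)) % 2
            for column in zip(*G)]
```

Every 4 data bits cost 7 transmitted bits, which is the 3/4 overhead that the less sensitive subsets avoid.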
  • G Generator matrix
  • H parity check matrix
  • subset "B" in encoder II is encoded with a minimum redundancy method, thereby eliminating excessive redundancy for the less error sensitive parameters, which can reduce the effective capacity of a system.
  • Encoder II exploits the inter-frame correlation of the spectrum of speech and operates with less redundancy than Encoder I.
  • Encoder II includes a 2-step procedure illustrated in FIGS. 11 and 12 to correct errors in the bits for the spectrum of a speech frame.
  • An error detection phase and a non-algebraic error correction phase are used. Errors are detected by simple parity and speech parameter checks. Once an error occurs, a flag is set indicating the error. The error is corrected by interpolating the two nearest neighboring spectrum vectors which have no error detected.
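The correction phase can be sketched as follows. The text does not say how the two neighboring vectors are weighted, so linear interpolation by frame distance is an assumption here, as is the fallback of copying the single available neighbor at the edges of a message. The function name `conceal_spectrum_errors` is hypothetical.

```python
def conceal_spectrum_errors(frames, error_flags):
    """Replace flagged spectrum vectors by interpolating error-free neighbors."""
    good = [i for i, bad in enumerate(error_flags) if not bad]
    repaired = [list(f) for f in frames]
    for i, bad in enumerate(error_flags):
        if not bad:
            continue
        prev = max((g for g in good if g < i), default=None)
        nxt = min((g for g in good if g > i), default=None)
        if prev is None and nxt is None:
            continue                               # nothing reliable to use
        if prev is None:
            repaired[i] = list(frames[nxt])        # assumption: copy next
        elif nxt is None:
            repaired[i] = list(frames[prev])       # assumption: copy previous
        else:
            w = (i - prev) / (nxt - prev)          # assumed linear weighting
            repaired[i] = [(1 - w) * a + w * b
                           for a, b in zip(frames[prev], frames[nxt])]
    return repaired
```

This works because the spectrum of speech changes slowly from frame to frame, which is the inter-frame correlation that encoder II relies on.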
  • Encoder II is described in terms of the coding method used for the line spectral frequencies (LSF) spectrum parameters of a digital vocoder. This coding method is fully detailed in United States Patent Application Serial No.
  • the LSFs are encoded by split vector quantization (SVQ).
  • Encoder II's spectrum coding uses two 9-bit codebooks, C1 and C2. With 10th order LPC and a 4/6 split, C1 contains the first four LSFs and C2 the next six of a reconstructed 10-dimensional spectrum vector.
  • At the transmitter side, two 9-bit indexes are transmitted for each 10-dimensional spectrum vector of a speech frame. An even parity bit is added to the 18 bits for each frame, which is the minimum amount of overhead needed to detect channel errors.
  • LSFs for speech frames satisfy the following ordering property: $\omega_1 < \omega_2 < \dots < \omega_{10}$, where $\omega_i$ refers to the i-th LSF.
  • Subset "C" sent to encoder III 712 is the least error sensitive information and is therefore encoded with a zero redundancy coding method providing for the least protection and overhead. This brings the overall bit rate down considerably, without appreciably affecting the quality of the replicated voice.
  • Encoder III simply converts a pure binary code into a Gray code as shown in the flow chart of FIG. 9, which depicts a flowchart of a zero redundancy coding method which is unable to correct any bit error.
  • data "A", "B" and "C" are passed to multiplexer 714 wherein the signals are multiplexed and transmitted via a transmitter 716 over the transmission channel 718 to the receiver 720.
  • the receiver 720 passes the data to a de-multiplexer 722, which passes data "A" to decoder I 724.
  • FIG. 10 is a flowchart of a Hamming Code Decoder, wherein data "A" is decoded utilizing maximum error detection and correction, with the concomitant overhead, producing output "A" 730.
  • the de-multiplexer 722 simultaneously sends data "B" to decoder II 726, wherein it is analyzed to determine if, at the receiver, the last LSF of the vector picked up from C1 is greater than the first LSF of the vector picked up from C2 for a particular frame, in which case an error is assumed to be detected.
  • Decoder II 726 sequentially flips each bit in the received code in which an error is detected, generating 18 candidates, and compares each codebook vector corresponding to a candidate index with an interpolated vector at the error frame position.
  • the interpolated vector is obtained by linear interpolation between the two nearest neighbor vectors which have no error detected. The candidate with the smallest distance to the interpolated vector will be picked to replace the error vector.
  • de-multiplexer 722 passes data "C" to decoder III which converts the Gray code into a pure binary code.
  • FIG. 13 is a flow chart showing the simple conversion of the Gray code into a pure binary code illustrated as output C 734.
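
The two-step Decoder II procedure described above (detection by parity and LSF ordering checks, then correction by testing 18 single-bit-flip candidates against an interpolated vector) can be sketched in Python as follows. This is a minimal illustration only, not the algorithm of FIGS. 11 and 12: the toy codebooks C1 and C2, the placement of the parity bit as the lowest bit, the squared Euclidean distance, the midpoint interpolation weight and all function names are assumptions made for the example.

```python
from typing import List, Sequence

Vector = List[float]

# Toy stand-ins for the two 9-bit codebooks (512 entries each). Real codebooks
# would be trained on speech; these values are purely illustrative.
C1: List[Vector] = [[i / 1000 + 0.05 * k for k in range(4)] for i in range(512)]
C2: List[Vector] = [[1.0 + j / 1000 + 0.1 * k for k in range(6)] for j in range(512)]


def add_parity(index18: int) -> int:
    """Append an even parity bit (assumed to be the lowest bit) to 18 index bits."""
    parity = bin(index18).count("1") % 2
    return (index18 << 1) | parity


def lookup(index18: int, c1: Sequence[Vector], c2: Sequence[Vector]) -> Vector:
    """Rebuild the 10-dimensional LSF vector from the two 9-bit indexes."""
    return list(c1[index18 >> 9]) + list(c2[index18 & 0x1FF])


def detect_error(word19: int, c1: Sequence[Vector], c2: Sequence[Vector]) -> bool:
    """Flag a frame when parity fails or the C1/C2 boundary ordering is violated."""
    if bin(word19).count("1") % 2 != 0:
        return True
    index18 = word19 >> 1
    return c1[index18 >> 9][-1] > c2[index18 & 0x1FF][0]


def interpolate(prev_vec: Vector, next_vec: Vector, weight: float = 0.5) -> Vector:
    """Linear interpolation between the two nearest error-free neighbor vectors."""
    return [(1 - weight) * p + weight * n for p, n in zip(prev_vec, next_vec)]


def correct_frame(index18: int, prev_vec: Vector, next_vec: Vector,
                  c1: Sequence[Vector], c2: Sequence[Vector]) -> int:
    """Try all 18 single-bit flips and keep the candidate nearest the interpolation."""
    target = interpolate(prev_vec, next_vec)
    best_index, best_dist = index18, float("inf")
    for bit in range(18):
        candidate = index18 ^ (1 << bit)
        vec = lookup(candidate, c1, c2)
        dist = sum((a - b) ** 2 for a, b in zip(vec, target))
        if dist < best_dist:
            best_index, best_dist = candidate, dist
    return best_index
```

The sketch flips only the 18 index bits, as the description states; a corrupted parity bit with intact index bits would still trigger the search, which is a limitation of this simplified version.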

Abstract

A method and apparatus for protecting compressed speech in very low bit rate voice messaging comprising the steps of analyzing compressed input speech data to discriminate between the data such as heading, pitch, energy, spectral and timing information (506); providing a plurality of methods for channel encoding wherein the most error sensitive information for speech replication is provided a coding method with the greatest protection and sequentially less error sensitive information is encoded utilizing methods of sequentially less protection (510), thereby providing for significantly less overhead in the overall channel encoding process than would be present under a standard channel encoding scheme; passing the output of each encoder to a multiplexor (512), which multiplexes the plurality of channel encoded data and sends the encoded data via a transmission channel to a de-multiplexer wherein the channel encoded data is separated and passed to a plurality of decoders designed to decode the data of its paired encoder; passing the decoded data to a digital to analog converter wherein the digital data is converted to analog data; and passing the analog data to a speech synthesizer which replicates the input speech.

Description

Trifurcated Channel Encoding for Compressed Speech
Field of the Invention
This invention relates generally to communication systems, and more specifically to a compressed voice digital communication system which provides a novel channel encoding method utilizing a plurality of encoders for encoding data of varying importance using varying protection levels to decrease the average overhead for the channel encoding process.
Background of the Invention
In modern communication systems data transmission occurs under a condition of narrow bandwidth. It is therefore highly advantageous to develop methods which allow for data transmission at low bit rates. Various data compression techniques have been developed to assist in accomplishing the goal of low bit rate data transfer. In lieu of sending the input speech itself, the speech is analyzed to determine its parameters such as, pitch, spectrum, energy and voicing and these parameters are transmitted. The receiver then uses these parameters to synthesize an intelligible replica of the input speech.
Adding to the difficulty in low bit rate data transmission is the requirement that the data be transmitted utilizing a channel encoding process which protects the data from irregularities in the transmission channel. This significantly increases the bit rate required for the data transmission. As will be discussed below, the speech parameters vary significantly in their importance in speech replication. For example, if one of the energy parameters is altered in the transmission process, speech replication will not be significantly affected. However, if pitch information becomes altered, it will likely render the speech replication unintelligible. As used in the art, "pitch" generally refers to the period or frequency of the buzzing of the vocal cords or glottis, "spectrum" generally refers to the frequency dependent properties of the vocal tract, "energy" generally refers to the magnitude or intensity of the speech waveform, "voicing" refers to whether or not the vocal cords are active, and "quantizing" refers to choosing one of a finite number of discrete values to characterize these ordinarily continuous speech parameters. The number of different quantized values (or levels) for a particular speech parameter is set by the number of bits assigned to code that speech parameter. The foregoing terms are well known in the art and commonly used in connection with vocoding (voice-coding).
Vocoders may be built which operate at rates such as 200, 400, 600, 800, 900, 1200, 2400, 4800, 9600 bits per second, with varying results depending on the bit rate. One skilled in the art will readily note that the quality of reconstructed voice will vary depending not only on the bit rate chosen, but also on parameters such as previously discussed (e.g., pitch period, spectrum bandwidth, energy, voicing, etc.). Typically, as the transmission channel bandwidth narrows, the allowable bit rate will fall accordingly. Consequently, as the allowable bit rate falls, it becomes more difficult to find a data compression scheme that provides clear, intelligible, synthesized speech. Low bit rates further aggravate the problem of digital voice transmission since error-free reception requires a channel encoding scheme that adequately protects the selected parameters from corruption. Accordingly, a scheme must be selected that adequately protects the coded voice data without adding significant overhead, which would result in increased bit rate requirements. In addition, practical communication systems must take into consideration the complexity of the coding scheme since unduly complex coding schemes cannot be substantially executed in real time or using computer processors of reasonable size, speed, complexity and cost. Processor power consumption is also an important consideration since vocoders are frequently used in hand-held and portable apparatus.
As used herein the term "data compression" is intended to refer to the creation of a set of quantized parameters describing the input speech and "de-compression" is intended to refer to the subsequent use of this set of quantized parameters to synthesize a replica of the input speech. Also, as used herein "channel encoding" refers to both the encoding and decoding of the compressed speech parameter data for the protection of the data when passed through a transmission channel. The word "vocoder" has been coined in the art to describe an apparatus which performs the aforementioned functions.
While prior art vocoders are used extensively, they suffer from a number of limitations well known in the art, especially when low bit rates are desired. Thus, there is a continuing need for an improved vocoder method and apparatus, especially for vocoders capable of providing highly intelligible speech at low or moderate bit rates.
Summary of the Invention
Briefly, according to the invention, there is provided an improved method for protecting compressed speech in very low bit rate voice messaging by utilizing a novel channel encoding scheme. In its preferred embodiment, compressed input speech is distinguished based on the importance of the data for speech replication. The most error sensitive data is considered to be heading and pitch information and therefore is passed to encoder I wherein a channel encoding method, such as a Hamming code, is utilized to provide for maximum protection of the compressed data from channel errors, but at the cost of increased overhead. The intermediate error sensitive data is considered to be the spectral parameters and is encoded utilizing a method that provides less protection, such as the split vector quantization error detecting method which requires less overhead. The least error sensitive data is considered to be the energy information and is encoded using a zero redundancy code (e.g., conversion of the pure binary code into a Gray code). This provides minimal protection and minimal overhead, while significantly decreasing the number of bits required for channel encoding. The encoded data is then multiplexed and transmitted via a transmitter to a receiver capable of de-multiplexing the encoded data. The encoded most error sensitive data is passed to decoder I, wherein a decoding method consistent with encoder I is used, such as a Hamming code decoder in the preferred embodiment. The encoded intermediate error sensitive data is passed to decoder II, wherein a decoding method consistent with encoder II is used, such as the split vector quantization error correcting method. The encoded least error sensitive data is passed to decoder III, wherein the Gray code to pure binary code conversion consistent with encoder III is used. The output of each decoder is a replication of the compressed input speech.
By utilizing the trifurcated channel encoding method of the present invention significantly less overhead is required while maintaining high quality output speech.
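The overhead saving can be illustrated with a short Python calculation. It uses only the redundancy figures stated in this document (3 parity bits per 4 data bits for the Hamming (7,4) code, one parity bit per 18 spectrum index bits, and zero redundancy for the Gray code); the function name and the per-class bit counts passed to it are hypothetical and chosen only to show the arithmetic.

```python
from fractions import Fraction


def average_overhead(bits_a: int, bits_b: int, bits_c: int) -> Fraction:
    """Redundant bits per information bit across the three protection classes.

    bits_a, bits_b, bits_c are the information bits routed to encoders I, II
    and III. For simplicity, bits_a is assumed to be a multiple of 4 and
    bits_b a multiple of 18, so no padding is needed.
    """
    redundant_a = Fraction(3, 4) * bits_a   # Hamming (7,4): 3 parity bits per 4 data bits
    redundant_b = Fraction(1, 18) * bits_b  # one even parity bit per 18 index bits
    redundant_c = Fraction(0)               # Gray code: zero redundancy
    return (redundant_a + redundant_b + redundant_c) / (bits_a + bits_b + bits_c)


# Hypothetical split of 30 information bits: 4 to encoder I, 18 to II, 8 to III.
print(average_overhead(4, 18, 8))   # 2/15, versus 3/4 if everything used Hamming (7,4)
```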
Brief Description of the Drawings
FIG. 1 is a block diagram of a communication system, such as a paging system, utilizing the novel channel encoding scheme in accordance with the preferred embodiment of the present invention.
FIG. 2 is an electrical block diagram of a paging terminal and associated paging transmitters utilizing the novel channel encoding scheme of the preferred embodiment of the present invention.
FIG. 3 is a flow chart showing the operation of the paging terminal of FIG. 2.
FIG. 4 is a block diagram of a digital signal processor that can be utilized in the paging terminal of FIG. 2 and paging receiver of FIG. 6.
FIG. 5 is a flow chart showing the operation of a digital signal processor utilized in the paging terminal of FIG. 2.
FIG. 6 is an electrical block diagram of a paging receiver utilizing the novel channel encoding scheme of the preferred embodiment of the present invention.
FIG. 7 is a block diagram illustrating the parallel coding for 3-level error protection of the preferred embodiment of the present invention.
FIG. 8 is a flowchart of a Hamming code encoder utilized in the channel encoding process of the present invention to encode the "most error sensitive" speech parameter data.
FIG. 9 is a flowchart of the conversion of pure binary code into a Gray code.
FIG. 10 is a flowchart of a Hamming code decoder, which decodes the "most error sensitive" speech parameter data encoded by the Hamming code encoder.
FIG. 11 is a flow chart of the error correction algorithm utilized to encode and decode the "intermediate error sensitive" speech parameter data.
FIG. 12 is a continuation of the error correction algorithm of FIG. 11.
FIG. 13 is a flowchart for conversion of a Gray code into a pure binary code.
Description of a Preferred Embodiment
Due to the narrow bandwidth constraints mentioned above, a desirable channel encoding scheme is one with good reliability and less overhead. This invention proposes a novel channel encoding scheme for protecting compressed speech data from channel impairments. In voice messaging, the data obtained from a vocoder is highly compressed. To achieve very low bit rate, the compression is typically done on a multi-frame basis. As a result, some compressed data may have more influence on speech reconstruction than others. In the present embodiment of this invention, the compressed data is classified into three categories according to their importance in speech reconstruction. The new coding scheme guarantees that the most error sensitive data will have better protection than the least error sensitive data. This substantially reduces channel coding overhead while achieving good reliability in message transmission.
FIG. 1 shows a block diagram of a communication system, such as a paging system, utilizing a novel channel encoding scheme in accordance with the present invention. The digital voice compression process and channel encoding scheme are adapted to the store and forward type communications systems, which provide the time required to perform the highly computational intensive voice compression and channel encoding processes. Furthermore, it minimizes the processing required to be performed in a portable communication device, such as a pager, making the process ideal for paging applications and other similar store and forward type voice communications. The highly computational intensive portion of the digital voice compression process is performed in a fixed portion of the system and as a result little computation is required to be performed in the portable portion of the system as will be described below.
This invention will be exemplified by incorporation into a paging system, although it will be appreciated that any device which transmits speech parameters and requires channel coding prior to transmission will benefit from the present invention as well. A paging system is designed to provide service to a variety of users each requiring different services. Some of the users will require numeric messaging services, and still other users may require voice messaging services. In the paging system, the caller originates a page by communicating with a paging terminal 106 via a telephone 102 through the public switched telephone network (PSTN) 104 or the like. The paging terminal 106 encodes the message and places the encoded message in a transmission queue. At an appropriate time, the messages are broadcast under control of the paging terminal 106 using a transmitter 108 and a transmitting antenna 110. It will be appreciated that in a simulcast transmission system, a multiplicity of transmitters covering different geographic areas can be utilized as well, thus increasing geographic coverage.
The signal transmitted from the transmitting antenna 110 is intercepted by a receiving antenna 112 and processed by a paging receiver 114. The person being paged may be alerted and the message may be displayed or enunciated depending on the type of messaging being employed.
An electrical block diagram of the paging terminal 106 and the paging transmitter 108 utilizing digital voice compression and the trifurcated encoding of the present invention is shown in FIG. 2. The paging terminal 106 shown in FIG. 1 is of a type that would be used to service a large number of simultaneous users, such as in a commercial Radio Common Carrier (RCC) system. The paging terminal 106 utilizes a number of input devices, signal processing devices and output devices controlled by a controller 216. Communications between the controller 216 and the various devices that compose the paging terminal 106 are handled by a digital control bus 210. Communication of digitized voice and data is handled by an input time division multiplexed highway 212 and an output time division multiplexed highway 218. It will be appreciated that the digital control bus 210, input time division multiplexed highway 212 and output time division multiplexed highway 218 can be extended to provide for expansion of the paging terminal 106. The interface between the PSTN 104 and the paging terminal 106 can be either a plurality of multi-call per line multiplexed digital connections, shown in FIG. 2 as a digital PSTN connection 202, or a plurality of single call per line analog PSTN connections 208. In alternative embodiments, the interface may take the form of a high speed local area or wide area network interface, utilizing such conventional communication protocols as TCP/IP or the like. One of ordinary skill would appreciate that other conventional data transport mechanisms, both analog and digital, may be employed within the scope and intent of the present invention.
Each digital PSTN connection 202 is serviced by a digital telephone interface 204. The digital telephone interface 204 provides necessary signal supervision, regulatory protection requirements for operation of the digital voice compression process and data protection in accordance with the present invention. The digital telephone interface 204 can also provide temporary storage of the digitized voice frames to facilitate interchange of time slots and time slot alignment necessary to provide an access to the input time division multiplexed highway 212. As will be described below, requirements for service and supervisory responses are controlled by a controller 216. Communications between the digital telephone interface 204 and the controller 216 pass over the digital control bus 210. Each analog PSTN connection 208 is serviced by an analog telephone interface 206. The analog telephone interface 206 provides the necessary signal conditioning, signaling, supervision, analog to digital and digital to analog conversion, and regulatory protection requirements for operation of the digital voice compression and channel encoding processes in accordance with the present invention. The frames of digitized voice messages from the analog to digital converter are temporarily stored in the telephone interface 206 to facilitate interchange of time slots and time slot alignment necessary to provide an access to the input time division multiplexed highway 212. Communications between the analog telephone interface 206 and the controller 216 pass over the digital control bus 210. When an incoming call is detected, a request for service is sent from the analog telephone interface 206 or the digital telephone interface 204 to the controller 216. The controller 216 selects a digital signal processor 214 from a plurality of digital signal processors.
The controller 216 couples the analog telephone interface 206 or the digital telephone interface 204 requesting service to the digital signal processor 214 selected via the input time division multiplexed highway 212.
The digital signal processor 214 can be programmed to perform all of the signal processing functions required to complete the paging process. Typical signal processing functions performed by the digital signal processor 214 include voice analyzation, digital voice compression, channel encoding in accordance with the present invention, dual tone multi frequency (DTMF) decoding and generation and modem tone generation. The digital signal processor 214 can be programmed to perform one or more of the functions described above. In the case of a digital signal processor 214 that is programmed to perform more than one task, the controller 216 assigns the particular task needed to be performed at the time the digital signal processor 214 is selected, or in the case of a digital signal processor 214 that is programmed to perform only a single task, the controller 216 selects a digital signal processor 214 programmed to perform the particular function needed to complete the next step in the paging process. The operation of the digital signal processor 214 performing dual tone multi frequency (DTMF) decoding and prerecorded voice prompt generation is well known to one of ordinary skill in the art.
The processing of a page request, in the case of a voice message, proceeds in the following manner. The digital signal processor 214 that is coupled to an analog telephone interface 206 or a digital telephone interface 204 prompts the originator for a voice message. The digital signal processor 214 analyzes the input speech for pitch, energy, spectral parameters and voicing information 214(a) and then compresses the data 214(b) using a compression method such as vector quantization, although it is appreciated that any compression method can be utilized with this invention. The digital signal processor 214 then processes the digital voice message generated by the compression processes using three distinct channel encoding methods: Encoder I 214(c)(i) for the "most error sensitive" data, Encoder II 214(c)(ii) for the "intermediate error sensitive" data and Encoder III 214(c)(iii) for the "least error sensitive" data. The channel encoder I 214(c)(i), encoder II 214(c)(ii) and encoder III 214(c)(iii) encode the data (described in detail below), each with varying levels of data protection. The compressed and channel encoded data is multiplexed by the digital signal processor 214(d) and then coupled to a paging protocol encoder 228 via the output time division multiplexed highway 218, under the control of the controller 216. The controller 216 directs the paging protocol encoder 228 to store the encoded data in a data storage device 226 via the output time division multiplexed highway 218. At an appropriate time, the compressed and channel encoded data is downloaded into the transmitter control unit 220, under control of the controller 216, via the output time division multiplexed highway 218 and transmitted using the paging transmitter 108 and the transmitting antenna 110.
FIG. 3 is a flow chart which describes the operation of the paging terminal 106 shown in FIG. 1 when processing a voice message. There are shown two entry points into the flow chart 300. The first entry point is for a process associated with the digital PSTN connection 202 and the second entry point is for a process associated with the analog PSTN connection 208. In the case of the digital PSTN connection 202, the process starts with step 302, receiving a request over a digital PSTN line. Requests for service from the digital PSTN connection 202 are indicated by a bit pattern in the incoming data stream. The digital telephone interface 204 receives the request for service and communicates the request to the controller 216.
In step 304, information received from the digital channel requesting service is separated from the incoming data stream by digital de-multiplexing. The digital signal received from the digital PSTN connection 202 typically includes a plurality of digital channels multiplexed into an incoming data stream. The digital channels requesting service are de-multiplexed and the digitized speech data is then stored temporarily to facilitate time slot alignment and multiplexing of the data onto the input time division multiplexed highway 212. A time slot for the digitized speech data on the input time division multiplexed highway 212 is assigned by the controller 216. Conversely, digitized speech data generated by the digital signal processor 214 for transmission to the digital PSTN connection 202 is formatted suitably for transmission and multiplexed into the outgoing data stream.
Similarly with the analog PSTN connection 208, the process starts with step 306 when a request from the analog PSTN line is received. On the analog PSTN connection 208, incoming calls are signaled by either low frequency AC signals or by DC signaling. The analog telephone interface 206 receives the request and communicates the request to the controller 216.
In step 308, the analog voice message is converted into a digital data stream. The analog signal received over its total duration is referred to as the analog voice message. The analog signal is sampled and digitized. The samples of the analog signal are referred to as voice samples. The digitized voice samples are referred to as digitized speech data. The digitized speech data is multiplexed onto the input time division multiplexed highway 212 in a time slot assigned by the controller 216. Conversely any voice data on the input time division multiplexed highway 212 that originates from the digital signal processor 214 undergoes a digital to analog conversion before transmission to the analog PSTN connection 208.
As shown in FIG. 3, the processing paths for the analog PSTN connection 208 and the digital PSTN connection 202 converge in step 310, when a digital signal processor is assigned to handle the incoming call. The controller 216 selects a digital signal processor 214 programmed to perform the input speech analyzation, compression, channel encoding and multiplexing. The digital signal processor 214 assigned reads the data on the input time division multiplexed highway 212 in the previously assigned time slot.
The data read by the digital signal processor 214 is stored for processing, in step 312, as uncompressed speech data. In step 314, the stored uncompressed, multi-frame speech data is analyzed and grouped according to the pitch, spectral parameters, energy, and voicing information. In step 315, the grouped parameters are buffered so each set of parameters may be coded independently. In step 316 the data is compressed utilizing, for example, a vector quantization method, although alternative compression methods known to those skilled in the relevant art may be implemented. In step 318 the compressed data is assigned to three distinct encoders which utilize distinct encoding methods with varying levels of protection. Header and pitch information are assigned to encoder I; spectral parameters are assigned to encoder II; and energy information is assigned to encoder III. In step 320 the header and pitch information are encoded implementing a high redundancy, maximum protection encoding method. This provides for greater protection, but at the cost of higher overhead. In step 322 the spectral parameters are encoded implementing a limited channel protection method, which is a compromise between the high protection provided by encoder I and no error correction by encoder III. This requires only minimal channel overhead. In step 324, energy information is encoded implementing a zero redundancy encoding method which provides for no error correction, but offers error robustness.
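The maximum protection encoding of step 320 is identified elsewhere in this description as a Hamming (7,4) code. A minimal Python sketch of such an encoder and single-error-correcting decoder follows. The G and H matrices of FIG. 8 are not reproduced in this text, so the systematic parity equations below are one standard choice and an assumption, as are the function names and the bit ordering (four data bits followed by three parity bits).

```python
from typing import List, Tuple

# Each tuple lists the data-bit positions summed modulo 2 to form one parity
# bit. This is one common systematic Hamming (7,4) layout, assumed here.
PARITY_TAPS: Tuple[Tuple[int, ...], ...] = ((0, 1, 3), (0, 2, 3), (1, 2, 3))


def hamming74_encode(data: List[int]) -> List[int]:
    """Map 4 data bits to a 7-bit codeword: data bits followed by 3 parity bits."""
    parity = [sum(data[i] for i in taps) % 2 for taps in PARITY_TAPS]
    return list(data) + parity


def hamming74_decode(word: List[int]) -> List[int]:
    """Correct at most one bit error in a 7-bit word and return the 4 data bits."""
    word = list(word)
    # Syndrome: recompute each parity equation over the received bits.
    syndrome = tuple(
        (sum(word[i] for i in taps) + word[4 + k]) % 2
        for k, taps in enumerate(PARITY_TAPS)
    )
    if any(syndrome):
        # The syndrome equals the parity check column of the erroneous position.
        for pos in range(7):
            column = tuple(
                int(pos in taps) if pos < 4 else int(pos - 4 == k)
                for k, taps in enumerate(PARITY_TAPS)
            )
            if column == syndrome:
                word[pos] ^= 1
                break
    return word[:4]
```

For example, `hamming74_encode([1, 0, 1, 1])` yields `[1, 0, 1, 1, 0, 1, 0]`, and flipping any single bit of that codeword still decodes to `[1, 0, 1, 1]`.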
Encoders I, II and III then pass their respective coded data to a multiplexer wherein the data is multiplexed 326 and then stored in a paging queue 328 for later transmission. At the appropriate time, the queued data is sent to the transmitter 108 at step 330 and transmitted at step 332.
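The multiplexing of the three encoder outputs, and the matching de-multiplexing at the receiver, can be sketched in Python as simple fixed-width concatenation. The frame layout is not specified in this text; the field order and the widths below (one 7-bit Hamming codeword, 18 spectrum index bits plus one parity bit, and 8 Gray coded bits) are hypothetical values chosen only to make the example concrete.

```python
from typing import Dict, List, Tuple

# Hypothetical frame layout: (stream name, width in bits), in transmission order.
FRAME_LAYOUT: Tuple[Tuple[str, int], ...] = (("A", 7), ("B", 19), ("C", 8))


def multiplex(fields: Dict[str, List[int]]) -> List[int]:
    """Concatenate the encoder I, II and III outputs into one bit stream."""
    stream: List[int] = []
    for name, width in FRAME_LAYOUT:
        bits = fields[name]
        if len(bits) != width:
            raise ValueError(f"stream {name} must be {width} bits, got {len(bits)}")
        stream.extend(bits)
    return stream


def demultiplex(stream: List[int]) -> Dict[str, List[int]]:
    """Split a received bit stream back into the three encoded streams."""
    expected = sum(width for _, width in FRAME_LAYOUT)
    if len(stream) != expected:
        raise ValueError(f"frame must be {expected} bits, got {len(stream)}")
    fields: Dict[str, List[int]] = {}
    pos = 0
    for name, width in FRAME_LAYOUT:
        fields[name] = list(stream[pos:pos + width])
        pos += width
    return fields
```

Because both sides share the same layout table, each decoder receives exactly the bits produced by its paired encoder.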
FIG. 4 is a block diagram of a digital signal processor that can be utilized in the paging terminal of FIG. 2 and paging receiver of FIG. 6. The digital signal processor 400 functions both as an analyzer to determine the essential speech parameters and as a synthesizer to reconstruct a replica of the input speech based on such speech parameters. When acting as an analyzer (i.e. coder), vocoder 400 receives speech input 402 which then passes through gain adjustment block 404 (e.g., an AGC) and analog to digital (A/D) converter 406. A/D 406 supplies digitized input speech to microprocessor or controller 408. Microprocessor 408 communicates over bus 418 with ROM 420 (e.g., an EPROM or EEPROM), alterable memory (e.g. SRAM) 422 and address decoder 424. These elements act in concert to execute the instructions stored in ROM 420 to divide the incoming digitized speech into frames and analyze the frames to determine the significant speech parameters associated with each frame of speech, as for example, pitch, spectrum, energy and voicing. Additionally, these elements act in concert to assign the "most error sensitive data" to be data "A" which consists of heading and pitch information; the intermediate error sensitive data to be data "B" which consists of the spectral parameters; and the least error sensitive data to be data "C" which consists of any remaining speech parameters not assigned as "A" or "B". These parameters are delivered to output 410 from whence they go to three distinct channel coders and a multiplexer for eventual transmission to a receiver.
When acting as a synthesizer (i.e. decoder), vocoder 400 receives speech parameters from the de-multiplexer and three distinct channel decoders via input 412. These speech parameters are used by microprocessor 408 in connection with SRAM 422 and address decoder 424 and the program stored in ROM 420, to provide digitized synthesized speech to D/A converter 416 which converts the digitized synthesized speech back to analog form and provides synthesized analog speech via optional gain adjustment block 414 to output 426 for delivery to a loud speaker or head phone.
Although similar vocoders, such as the General Purpose Voice Coding Module (GP-VCM), Part No. 01-P36780D001 manufactured by Motorola, Inc., exist, by programming ROM 420, the vocoder 400 of FIG. 4 is capable of not only analyzing speech parameters and compressing them but also channel encoding these quantized speech parameters with varying protection levels by distinct encoding methods, thereby significantly decreasing the bit rate that would be required with previous channel encoding schemes, without affecting the quality of the speech replication.
FIG. 5 is a flow chart showing the functions performed by the digital signal processor utilized in the paging terminal of FIG. 2. The digital speech data 502 that was previously stored in the digital signal processor 214 as uncompressed voice data is passed through a gain normalization step 504 and analyzed at step 506. The amplitude of the digital speech message is adjusted on a syllabic basis to fully utilize the dynamic range of the system and improve the apparent signal-to-noise performance.
The normalized uncompressed speech data is grouped into short duration segments of speech in step 506(a). Typically the groups contain twenty to thirty milliseconds of speech data. In step 506(b), an analyzer extracts the energy, spectrum, voicing, and pitch parameters in parallel, utilizing respective extraction methods which are well known to those skilled in the relevant art. For example, a linear predictive coding (LPC) process is performed on the short duration segment of speech to provide a short term prediction. The LPC process analyzes the short duration segments of speech and extracts the spectrum parameters. There are many different LPC processes known in the art of digital processing, and the art continues to develop improved methods. It will be apparent to one of ordinary skill in the art which LPC method will best meet the requirement of the system being designed. The digital voice compression process illustrated herein calculates thirteen parameters. The first three parameters represent the total energy in the speech segment, a characteristic pitch value, and voicing information. The remaining ten parameters are referred to as spectral parameters and basically represent coefficients of a digital filter. It will be appreciated, however, that varying parameter numbers can be utilized with the present invention. Due to the non-real time communications of the present invention a buffer is provided, in step 508(a), wherein the speech parameters of pitch, energy, voicing and spectral parameters from a plurality of frames are grouped. Depending upon the parameter set being addressed, the parameter sets are then compressed in step 508(b). If the parameter set is spectrum, compression occurs utilizing a vector quantization process in step 508(b). If the parameter set is energy, compression occurs utilizing a type of discrete cosine transform and scalar quantization. If the parameter set is pitch, compression occurs utilizing a type of run-length coding. These compression methods are generally well-known to those of ordinary skill in the relevant art.
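As an illustration of the vector quantization step for the spectrum parameters, the following Python sketch quantizes a 10-dimensional LSF vector using a 4/6 split and two 9-bit codebooks, as described elsewhere in this document for Encoder II's spectrum coding. The codebook contents, the squared Euclidean distance measure and the function names are assumptions made for the example; trained codebooks are not given in this text.

```python
from typing import List, Sequence, Tuple

Vector = List[float]

# Toy 9-bit codebooks (512 entries each) with made-up, monotonically
# increasing entries. Real codebooks would be trained on speech data.
C1: List[Vector] = [[0.1 + i / 2048 + 0.2 * k for k in range(4)] for i in range(512)]
C2: List[Vector] = [[1.2 + j / 2048 + 0.3 * k for k in range(6)] for j in range(512)]


def nearest_index(vector: Sequence[float], codebook: Sequence[Vector]) -> int:
    """Index of the codebook entry with the smallest squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((v - c) ** 2 for v, c in zip(vector, codebook[i])),
    )


def split_vq_encode(lsf: Sequence[float], c1: Sequence[Vector],
                    c2: Sequence[Vector]) -> Tuple[int, int]:
    """Quantize the first four LSFs with C1 and the next six with C2."""
    if len(lsf) != 10:
        raise ValueError("expected a 10-dimensional LSF vector")
    return nearest_index(lsf[:4], c1), nearest_index(lsf[4:], c2)


def split_vq_decode(i1: int, i2: int, c1: Sequence[Vector],
                    c2: Sequence[Vector]) -> Vector:
    """Reconstruct the 10-dimensional spectrum vector from the two indexes."""
    return list(c1[i1]) + list(c2[i2])
```

Each frame's spectrum is thus represented by two 9-bit indexes, 18 bits in total, instead of ten separately quantized values.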
Following the compression process, a second buffer is provided to allow for the channel encoding step 510. In step 510(a), the quantized vector representing the pitch and heading information is encoded using a method providing for the greatest protection. In step 510(b), the quantized vector representing the spectral parameters is encoded implementing an encoding method for intermediate data protection. In step 510(c), the quantized vector representing the energy information for the group of parameters is encoded using a method of zero redundancy. The encoded data from steps 510(a-c) is then passed to a multiplexer 512 for subsequent data transmission.
FIG. 6 is an electrical block diagram of the paging receiver 114. The signal transmitted from the transmitting antenna 110 is intercepted by the receiving antenna 112. The receiving antenna 112 is coupled to a receiver 600. The receiver 600 processes the signal received by the receiving antenna 112 and produces a receiver output signal 614 which is a replica of the encoded data transmitted. The output signal is passed to a selected digital signal processor 604 wherein a de-multiplexer 604(a) de-multiplexes the encoded speech parameters and sends the data to three distinct decoders. Decoder I 604(b) decodes the information from encoder I and passes it to de-quantizer 604(e); decoder II 604(c) decodes the information from encoder II and passes it to de-quantizer 604(e); and decoder III 604(d) decodes the information from encoder III and passes it to de-quantizer 604(e). De-quantizer 604(e) then de-quantizes the vectors representing the grouped parameters of speech and passes the information to synthesizer 604(f), which reconnects the analyzed frames of speech and replicates the speech from the speech parameters produced by the analyzer.
The digital signal processor 604 also provides the basic control of the various functions of the paging receiver 114. The digital signal processor 604 is coupled to a battery saver switch 618, a code memory 612, a user interface 614, and a message memory 616, via the control bus 610. The code memory 612 stores unique identification information or address information, necessary for the controller to implement the selective call feature. The user interface 614 provides the user with an audio, visual or mechanical signal indicating the reception of a message and can also include a display and push buttons for the user to input commands to control the receiver. The message memory 616 provides a place to store messages for future review, or to allow the user to repeat the message. The battery saver switch 618 provides means of selectively disabling the supply power to the receiver during a period when the system is communicating with other pagers or not transmitting, thereby reducing power consumption and extending battery life in a manner well known to one with ordinary skill in the art.
FIG. 7 is a block diagram illustrating parallel coding for 3-level error protection of the present invention. Subset A 702, consisting of the "most error sensitive" speech replication information of heading and pitch, is sent to encoder I 708. Subset B 704, consisting of the "intermediate error sensitive" information of the spectral parameters, is sent to encoder II 710. Subset C 706, consisting of the "least error sensitive" energy information, is sent to encoder III 712. It will be appreciated that further categorization of speech parameters can be utilized, providing for incremental data protection. For example, the most error sensitive information can be assigned to subset "A", the next most error sensitive information to subset "B", intermediate error sensitive information to subset "C" and the least error sensitive information to subset "D", with a different level of protection provided for each. As the data passing to encoder I 708 is the most error sensitive for speech replication, it is provided the greatest degree of protection by utilizing, for example, a Hamming (7,4) code, which implies that 3/4 overhead is required.
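The partitioning of FIG. 7 can be sketched as a simple lookup from parameter name to subset. The dictionary keys, the name `partition_parameters`, and the use of "voicing" alongside pitch in subset A (following the wording of the claims) are assumptions made for illustration only.

```python
# Hypothetical mapping of parameter names to error-sensitivity subsets.
SUBSET_OF_PARAMETER = {
    "pitch": "A",     # most error sensitive, sent to encoder I
    "voicing": "A",   # most error sensitive, sent to encoder I
    "spectrum": "B",  # intermediate error sensitive, sent to encoder II
    "energy": "C",    # least error sensitive, sent to encoder III
}


def partition_parameters(params):
    """Split a dict of compressed speech parameters into subsets A, B and C."""
    subsets = {"A": {}, "B": {}, "C": {}}
    for name, value in params.items():
        subsets[SUBSET_OF_PARAMETER[name]][name] = value
    return subsets


print(partition_parameters({"pitch": 40, "voicing": 1, "spectrum": [3, 7], "energy": 12}))
```

Adding a fourth subset "D", as the text suggests, would only require another entry in the mapping and another encoder.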
FIG. 8 is a flow chart of a Hamming code encoder, where encoder I codes every 4 bits in subset A into a 7-bit Hamming codeword using the generator matrix (G) and the parity check matrix (H) given in FIG. 8. Note that if all parameters were coded using the Hamming (7,4) code with its concomitant overhead, the final vocoder rate would be approximately 1600 bits per second. As is the object of this invention, different coding techniques are utilized to avoid the larger overhead for the less error sensitive information. Therefore, subset "B" in encoder II is encoded with a minimum redundancy method, thereby eliminating the excessive redundancy for the less error sensitive parameters that would otherwise reduce the effective capacity of a system. Encoder II exploits the inter-frame correlation of the spectrum of speech and operates with less redundancy than Encoder I.
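The Hamming (7,4) encoding performed by encoder I can be sketched as follows. The matrices of FIG. 8 are not reproduced in this text, so the generator matrix below is an assumed, commonly used systematic form (four data bits followed by three parity bits); the function name is likewise hypothetical.

```python
# Assumed systematic generator matrix G = [I4 | P]; FIG. 8 may use another form.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]


def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (modulo-2 product with G)."""
    if len(data_bits) != 4:
        raise ValueError("expected exactly 4 data bits")
    return [sum(d * g for d, g in zip(data_bits, column)) % 2 for column in zip(*G)]


print(hamming74_encode([1, 0, 1, 1]))  # [1, 0, 1, 1, 0, 1, 0]
```

With this form the first four bits of each codeword are the data bits themselves, and the three appended parity bits account for the 3/4 overhead.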
Encoder II includes a 2-step procedure illustrated in FIGS. 11 and 12 to correct errors in the bits for the spectrum of a speech frame. An error detection phase and a non-algebraic error correction phase are used. Errors are detected by simple parity and speech parameter checks. Once an error occurs, a flag is set indicating the error. The error is corrected by interpolating the two nearest neighboring spectrum vectors which have no error detected. Encoder II is described in terms of the coding method used for the line spectral frequencies (LSF) spectrum parameters of a digital vocoder. This coding method is fully detailed in United States Patent Application Serial No. (PT01726U) entitled "Method and Apparatus for Minimal Redundancy Error Detection and Correction of Voice Spectrum Parameters." In this vocoder, the LSFs are encoded by split vector quantization (SVQ). Encoder II's spectrum coding uses two 9-bit codebooks C1 and C2. With the 10th order LPC and a 4/6 split, C1 contains the first four LSFs and C2 the next six of a reconstructed 10 dimensional spectrum vector. At the transmitter side, for each 10 dimensional spectrum vector for a speech frame, two 9-bit indexes are transmitted. An even parity bit is added to the 18 bits for each frame, which is the minimum amount of overhead needed to detect any channel errors.
Using traditional algebraic coding, a one bit even parity overhead would not be able to correct any channel errors, but it can detect an odd number of channel errors. Additional channel errors can be detected using the ordering property of LSFs. At Encoder II, the LSFs for speech frames satisfy the following ordering property: $\theta_1 < \theta_2 < \dots < \theta_{10}$, where $\theta_i$ refers to the $i$th LSF.
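The two detection checks described above, even parity over the frame bits and the ordering property of the LSFs, can be sketched as follows. The function names and the representation of bits and LSFs as Python lists are assumptions for illustration; the referenced application defines the actual method.

```python
def add_even_parity(index_bits):
    """Append one bit so that the total number of 1s in the frame is even."""
    return list(index_bits) + [sum(index_bits) % 2]


def parity_ok(frame_bits):
    """True when the received frame, parity bit included, has even weight."""
    return sum(frame_bits) % 2 == 0


def lsf_ordering_ok(lsfs):
    """True when the LSFs are strictly increasing, as the ordering property requires."""
    return all(a < b for a, b in zip(lsfs, lsfs[1:]))


def frame_has_error(frame_bits, decoded_lsfs):
    """Flag a frame when either the parity check or the ordering check fails."""
    return not parity_ok(frame_bits) or not lsf_ordering_ok(decoded_lsfs)


sent = add_even_parity([1, 0, 1] * 6)  # 18 index bits plus 1 parity bit
received = list(sent)
received[4] ^= 1                       # a single channel error
print(parity_ok(sent), parity_ok(received))  # True False
```

The parity check misses an even number of bit errors, which is why the ordering check is useful as a second, independent test.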
Subset "C" sent to encoder III 712 is the least error sensitive information and is therefore encoded with a zero redundancy coding method providing for the least protection and overhead. This brings the overall bit rate down considerably, without appreciably affecting the quality of the replicated voice.
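The relative overhead of the three encoders can be compared with a short calculation. The figures used are those stated in the description (7 coded bits per 4 data bits for encoder I, one parity bit per 18 index bits for encoder II, and no added bits for encoder III); the helper name `overhead` and the 8-bit example for encoder III are assumptions.

```python
from fractions import Fraction


def overhead(coded_bits, data_bits):
    """Redundant bits added per data bit."""
    return Fraction(coded_bits - data_bits, data_bits)


print(overhead(7, 4))    # encoder I, Hamming (7,4): 3/4
print(overhead(19, 18))  # encoder II, one parity bit per 18 bits: 1/18
print(overhead(8, 8))    # encoder III, zero redundancy: 0
```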
Encoder III simply converts a pure binary code into a Gray code, as shown in the flow chart of FIG. 9. This is a zero redundancy coding method: it adds no overhead bits and therefore cannot correct any bit error.
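A binary to Gray code conversion of the kind performed by encoder III can be sketched in one line. FIG. 9 is not reproduced here, so the standard reflected binary Gray code is assumed, and the function name is hypothetical.

```python
def binary_to_gray(value):
    """Convert a non-negative integer to its reflected binary Gray code."""
    return value ^ (value >> 1)


print([binary_to_gray(i) for i in range(8)])  # [0, 1, 3, 2, 6, 7, 5, 4]
```

The output has the same number of bits as the input, which is what makes the method zero redundancy. In this code, consecutive integers map to codewords that differ in exactly one bit.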
Upon completion of each encoding process, data "A", "B" and "C" are passed to multiplexer 714 wherein the signals are multiplexed and transmitted via a transmitter 716 over the transmission channel 718 to the receiver 720. The receiver 720 passes the data to a de-multiplexer 722, which passes data "A" to decoder I 724.
FIG. 10 is a flowchart of a Hamming Code Decoder, wherein data "A" is decoded utilizing maximum error detection and correction, with the concomitant overhead, producing output "A" 730.
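Syndrome decoding of a Hamming (7,4) codeword, as decoder I might perform it, can be sketched as follows. The parity check matrix is an assumed systematic form matching a generator matrix of the form [I4 | P]; the matrices and flow of FIGS. 8 and 10 are not reproduced in this text, and the function name is hypothetical.

```python
# Assumed parity check matrix H = [P^T | I3]; FIG. 8 may use another form.
H = [
    [1, 1, 0, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 1],
]


def hamming74_decode(received):
    """Correct up to one bit error in a 7-bit word and return the 4 data bits."""
    if len(received) != 7:
        raise ValueError("expected exactly 7 received bits")
    syndrome = [sum(h * r for h, r in zip(row, received)) % 2 for row in H]
    corrected = list(received)
    if any(syndrome):
        # A non-zero syndrome equals the column of H at the error position.
        columns = [list(column) for column in zip(*H)]
        corrected[columns.index(syndrome)] ^= 1
    return corrected[:4]


# Codeword for data 1011 under the assumed code, with its third bit flipped.
print(hamming74_decode([1, 0, 0, 1, 0, 1, 0]))  # [1, 0, 1, 1]
```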
The de-multiplexer 722 simultaneously sends data "B" to decoder II 726, wherein it is analyzed to determine whether, at the receiver, the last LSF of the vector picked up from C1 is greater than the first LSF of the vector picked up from C2 for a particular frame, in which case an error is assumed to be detected.
In the error correction phase, Decoder II 726 sequentially flips each bit in the received code in which an error is detected, generating 18 candidates, and compares the codebook vector corresponding to each candidate index with an interpolated vector at the error frame position. The interpolated vector is obtained by linear interpolation between the two nearest neighbor vectors which have no error detected. The candidate with the smallest distance to the interpolated vector is picked to replace the error vector.
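The correction phase described above can be sketched as a search over single-bit flips. Several details are assumptions made for illustration: the interpolated vector is taken as the midpoint of the two neighbors, squared Euclidean distance is used, index bits are most significant bit first, and all names are hypothetical. The `bits_per_index` parameter defaults to the 9 bits stated in the description but can be reduced so that small toy codebooks work.

```python
def bits_to_int(bits):
    """Interpret a list of bits, most significant first, as an integer."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def correct_frame(index_bits, codebook1, codebook2, prev_vec, next_vec, bits_per_index=9):
    """Return the single-bit-flip candidate vector closest to the interpolated vector."""
    # Midpoint interpolation assumes the two error-free neighbors are equidistant.
    target = [(a + b) / 2 for a, b in zip(prev_vec, next_vec)]
    best_vec, best_dist = None, float("inf")
    for pos in range(len(index_bits)):
        candidate = list(index_bits)
        candidate[pos] ^= 1  # flip one bit to form a candidate index pair
        first = codebook1[bits_to_int(candidate[:bits_per_index])]
        second = codebook2[bits_to_int(candidate[bits_per_index:])]
        vec = list(first) + list(second)
        dist = sum((v - t) ** 2 for v, t in zip(vec, target))
        if dist < best_dist:
            best_vec, best_dist = vec, dist
    return best_vec


# Toy 2-bit codebooks; the system described uses 9-bit codebooks C1 and C2.
c1 = [[0.1], [0.2], [0.3], [0.4]]
c2 = [[0.5], [0.6], [0.7], [0.8]]
print(correct_frame([1, 1, 1, 0], c1, c2, [0.2, 0.7], [0.2, 0.7], bits_per_index=2))
```

In the toy example the transmitted bits were 0110 and the first bit was corrupted; the search recovers the vector [0.2, 0.7].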
The error correction algorithm is shown in Figures 11 and 12, with the resultant decoded information shown as output "B" 732. Simultaneously, de-multiplexer 722 passes data "C" to decoder III which converts the Gray code into a pure binary code.
FIG. 13 is a flow chart showing the simple conversion of the Gray code into a pure binary code illustrated as output C 734.
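The inverse conversion performed by decoder III can be sketched as follows, again assuming the standard reflected binary Gray code, since FIG. 13 is not reproduced in this text. The function name is hypothetical.

```python
def gray_to_binary(gray):
    """Convert a reflected binary Gray code back to a plain integer."""
    value = 0
    while gray:
        value ^= gray  # each output bit is the XOR of all higher Gray bits
        gray >>= 1
    return value


print([gray_to_binary(g) for g in [0, 1, 3, 2, 6, 7, 5, 4]])  # [0, 1, 2, 3, 4, 5, 6, 7]
```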
Accordingly, it will be understood that the preferred embodiment of the present invention has been disclosed by way of example and that other modifications and alterations may occur to those skilled in the art without departing from the scope and spirit of the appended claims.
What is claimed is:

Claims

1. A method for protecting compressed speech in very low bit rate voice messaging, comprising the steps of: processing a voice message for generating speech parameters; buffering said speech parameters; compressing said speech parameters; separating said speech parameters to produce distinguished groups of speech parameters to allow for the use of a plurality of distinct encoders; encoding said distinguished groups of speech parameters with said plurality of distinct encoders to create a plurality of distinct encoded data; multiplexing said plurality of distinct encoded data creating a multiplexed signal; transmitting said multiplexed signal via a transmitter and a transmission channel to a receiver which passes said multiplexed signal to a de-multiplexer; demultiplexing said multiplexed signal creating a reproduction of said distinct encoded data; decoding said reproduction of said distinct encoded data using a plurality of distinct decoders which produce a reproduction of said speech parameters; passing said reproduction of said speech parameters to a synthesizer; and synthesizing said reproduction of said speech parameters to replicate said voice message.
2. The method of claim 1, wherein said plurality of distinct encoders provide varying levels of compressed speech protection.
3. The method of claim 2, wherein at least three distinct encoders, an encoder I, an encoder II and an encoder III are utilized.
4. The method of claim 3, wherein said encoder I uses an encoding method which provides for the greatest data protection at the cost of greater overhead, said encoder II uses an encoding method providing for moderate data protection with significantly less overhead, said encoder III uses an encoding method providing error robustness with zero overhead.
5. The method of claim 4, wherein said encoding method of said encoder I is a Hamming code encoder, said encoding method of said encoder II is split vector quantization protection algorithm as described herein, said encoding method of encoder III is gray code conversion algorithm which converts binary code to a gray code.
6. The method of claim 5, wherein said speech parameters comprise pitch, energy, spectral parameters and voicing information.
7. The method of claim 6, wherein said distinguished speech parameters are defined as most error sensitive speech parameter information, intermediate error sensitive speech parameter information and least error sensitive speech parameter information.
8. The method of claim 7, wherein said pitch and said voicing are defined as the most error sensitive speech parameter information, said spectral parameters are defined as intermediate error sensitive speech parameter information and said energy is defined as least error sensitive speech parameter information.
9. The method of claim 8, wherein said most error sensitive speech parameter information is assigned to encoder I, said intermediate error sensitive speech parameter information is assigned to encoder II and said least error sensitive speech parameter information is assigned to encoder III.
10. A low overhead channel encoding method, comprising the steps of: receiving a plurality of distinguished groups of data to be encoded for transmission; encoding said distinguished groups of data using a plurality of encoders which provide for varying levels of data protection and overhead, to produce a plurality of encoded outputs of said distinguished groups of data; multiplexing said plurality of encoded outputs of said distinguished groups of data to create a single multiplexed signal for transmission by a transmitter via a transmission channel to a receiver; passing said single multiplexed signal through a de-multiplexer, in order to re-separate said plurality of encoded outputs of said distinguished groups of data, to a plurality of decoders; decoding said plurality of encoded outputs of said distinguished groups of data using a plurality of decoders which use varying levels of error detection and correction to produce a reproduction of said distinguished groups of data.
11. The method of claim 10, wherein three distinct encoders comprise an encoder I, an encoder II and an encoder III.
12. The method of claim 11, wherein said encoder I uses an encoding method which provides for the greatest data protection at the cost of greater overhead, said encoder II uses an encoding method providing for moderate data protection with significantly less overhead, said encoder III uses an encoding method providing little error correction with very low overhead.
13. The method of claim 12, wherein said encoder I comprises a Hamming code encoder, said encoder II comprises split vector quantization protection algorithm as described herein, said encoder III comprises gray code conversion algorithm which converts binary code to a gray code.
14. The method of claim 13, wherein said distinguished groups of data comprise pitch, energy, spectral parameters and voicing information.
15. The method of claim 14, wherein said distinguished groups of data comprise most error sensitive data, intermediate error sensitive data and least error sensitive data.
16. The method of claim 15, wherein said pitch and said voicing comprise the most error sensitive data, said spectral parameters comprise intermediate error sensitive data and said energy comprises least error sensitive data.
17. The method of claim 16, wherein said most error sensitive data is assigned to encoder I, said intermediate error sensitive data is assigned to encoder II and said least error sensitive data is assigned to encoder III.
18. A low overhead channel encoding encoder, comprising: a receiver to receive a plurality of distinguished groups of data to be encoded for transmission; a processor for encoding said distinguished groups of data which uses a plurality of encoders to provide for varying levels of data protection and overhead, to produce a plurality of encoded outputs of said distinguished groups of data; a multiplexer to multiplex said plurality of encoded outputs of said distinguished groups of data in order to create a single multiplexed signal for transmission by a transmitter via a transmission channel to a receiver; a de-multiplexer to re-separate said plurality of encoded outputs of said distinguished groups of data; a plurality of decoders to decode said plurality of encoded outputs of said distinguished groups of data wherein said plurality of decoders use varying levels of error detection and correction to produce a reproduction of said distinguished groups of data.
19. The encoder of claim 18, wherein three distinct encoders, an encoder I, an encoder II and an encoder III are utilized.
20. The encoder of claim 19, wherein said encoder I uses an encoding method which provides for the greatest data protection at the cost of greater overhead, said encoder II uses an encoding method providing for moderate data protection with significantly less overhead, said encoder III uses an encoding method providing error robustness with zero overhead.
21. The encoder of claim 20, wherein said encoder I is a Hamming code encoder, said encoder II is split vector quantization protection algorithm as described herein, said encoder III is gray code conversion algorithm which converts binary code to a gray code.
22. The encoder of claim 21, wherein said distinguished groups of data received by said receiver consist of pitch, energy, spectral parameters and voicing information.
23. The encoder of claim 22, wherein said distinguished groups of data are defined as most error sensitive data, intermediate error sensitive data and least error sensitive data.
24. The encoder of claim 23, wherein said pitch and said voicing comprise the most error sensitive data, said spectral parameters comprise intermediate error sensitive data and said energy comprises least error sensitive data.
25. The encoder of claim 24, wherein said most error sensitive data is assigned to encoder I, said intermediate error sensitive data is assigned to encoder II and said least error sensitive data is assigned to encoder III.
PCT/US1996/013394 1995-10-02 1996-08-19 Trifurcated channel encoding for compressed speech WO1997013242A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US53736995A 1995-10-02 1995-10-02
US08/537,369 1995-10-02

Publications (1)

Publication Number Publication Date
WO1997013242A1 true WO1997013242A1 (en) 1997-04-10

Family

ID=24142360

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1996/013394 WO1997013242A1 (en) 1995-10-02 1996-08-19 Trifurcated channel encoding for compressed speech

Country Status (1)

Country Link
WO (1) WO1997013242A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4038495A (en) * 1975-11-14 1977-07-26 Rockwell International Corporation Speech analyzer/synthesizer using recursive filters
US4724535A (en) * 1984-04-17 1988-02-09 Nec Corporation Low bit-rate pattern coding with recursive orthogonal decision of parameters
US4914702A (en) * 1985-07-03 1990-04-03 Nec Corporation Formant pattern matching vocoder
US5048088A (en) * 1988-03-28 1991-09-10 Nec Corporation Linear predictive speech analysis-synthesis apparatus
US5491771A (en) * 1993-03-26 1996-02-13 Hughes Aircraft Company Real-time implementation of a 8Kbps CELP coder on a DSP pair

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): BR CA CN JP KR MX RU UA

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase