WO2000022743A1 - Method and apparatus for digital signal compression without decoding - Google Patents


Info

Publication number
WO2000022743A1
Authority
WO
WIPO (PCT)
Prior art keywords
frames
parameters
rate
digital signal
frame
Application number
PCT/US1999/021205
Other languages
French (fr)
Inventor
David B. Taubenheim
Miriam R. Boudreux
Sunil Satyamurti
Original Assignee
Motorola Inc.
Application filed by Motorola Inc.
Priority to AU61461/99A (AU6146199A)
Priority to EP99948239A (EP1121764A4)
Publication of WO2000022743A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/087 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters using mixed excitation models, e.g. MELP, MBE, split band LPC or HVXC
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/06 Determination or coding of the spectral characteristics, e.g. of the short-term prediction coefficients
    • G10L19/07 Line spectrum pair [LSP] vocoders

Definitions

  • the present invention is directed to digital signal compression, and more particularly to a decoder or vocoder capable of compressing digital signals that are parametrically modeled and encoded.
  • FIG. 1 is a representation of a higher rate message in accordance with the present invention.
  • FIG. 2 is a representation of a frame in a higher rate message in accordance with the present invention.
  • FIG. 3 is a representation of a voiced anchor frame in a lower rate message in accordance with the present invention.
  • FIG. 4 is a representation of a voiced intermediate frame in a lower rate message in accordance with the present invention.
  • FIG. 5 is a representation of an unvoiced anchor frame for any rate in accordance with the present invention.
  • FIG. 6 is a representation of an unvoiced intermediate frame in accordance with the present invention.
  • FIG. 7 is a block diagram of an electronic device such as a selective call receiver in accordance with the present invention.
  • FIG. 8 is a flow chart illustrating a method of compressing a digital signal in accordance with the present invention.
  • FIG. 9 is another flow chart illustrating a method of compressing a digital signal in accordance with the present invention.
  • Any digital signal that can be modeled and parametrically encoded would be an exemplary signal that could be compressed or converted and subsequently recreated in accordance with benefits of the present invention.
  • the emphasis of the present disclosure is with regards to digital speech signals that can be modeled and parametrically encoded, it should be understood that other signals such as digitally stored video signals may equally benefit from the present invention.
  • a multi-rate vocoder is preferably used in the process of recreating speech.
  • the vocoder preferably has a speech synthesizer that initially performs the function of decoding a binary stream of data into sets of speech model parameters and then subsequently converts the parameters into synthesized speech.
  • the multi-rate vocoder is a multi-band Excitation (MBE) vocoder where the analysis, coding, and synthesis of speech is based on the segmentation of the speech into fixed length segments. Synthesis of the speech preferably proceeds frame-by-frame, using a distinct set of model parameters for each frame. Efficient use of the model parameters requires an understanding of the underlying assumptions of the nature of human speech.
  • Generally, speech consists of mixtures of voiced and unvoiced spectral components and a typical vocoder would process the voiced and unvoiced portions of a speech signal separately to efficiently model and encode the signal and then subsequently combine the signals in re-creating the speech.
  • the sets of speech model parameters that may be within a frame of data could include a frame voicing flag, a fundamental frequency value, band voicing vectors, line spectrum frequencies or spectral parameters, as well as gain.
  • a frame voicing flag would indicate whether a voiced component is present within a given frame and whether the frame data itself would be in a voiced or unvoiced format.
  • the fundamental frequency in voiced speech represents the pitch frequency or the frequency at which the pitch cycles are repeated. Since there is no true fundamental frequency in unvoiced speech, an arbitrary value can be assigned and used for decoding the spectral shape of the unvoiced speech segment.
  • the band voicing vector breaks up the speech signal into a plurality of spectral bands having predefined frequency ranges. Line spectrum frequencies (LSFs) or spectral parameters provide values that are used to encode the spectrum which will be used to generate the synthesized speech signal.
  • the harmonic spectrum shape derived from the LSFs should be scaled by the gain to represent the correct frame energy.
  • a typical vocoder as described above can be used to synthesize speech from three data rates: 600, 1000, or 1400 bits per second for example. While these rates are remarkable and allow a large amount of speech to be stored in memory, the present invention is primarily directed to a method to optimize the memory usage within an electronic device such as a selective call receiver or messaging unit.
  • FIG. 1 shows a typical higher rate message bitstream organization 12.
  • Since frames may be either voiced or unvoiced, bit lengths are not shown for Frames 1...N.
  • FIG. 2 illustrates the bit designations for a higher rate voiced frame 12.
  • a voiced lower rate anchor frame 14 has 2 bits in the BV field, no Harmonic Residue (HR), and may contain fewer spectral parameters or LSFs.
  • FIG. 4 illustrates the bit designations for a voiced lower rate intermediate frame 16 wherein this frame essentially has the same format as the voiced lower rate anchor frame 14 except that the spectral parameters or LSFs are discarded.
  • FIG. 5 shows the bit fields for an unvoiced anchor frame 18 for any rate while FIG. 6 shows an unvoiced intermediate frame 20.
  • a voiced segment of a stored voice signal that has been compressed to a lower rate may contain a lower rate anchor frame 14 followed by a predetermined number of lower rate intermediate frames 16 and bounded by another lower rate anchor frame 14.
  • an unvoiced segment of a stored voice signal that has been compressed to a lower rate may contain an unvoiced anchor frame 18 followed by a predetermined number of unvoiced intermediate frames 20 and bounded by another unvoiced anchor frame 18.
  • the 13 bits in the gain field of a voiced or unvoiced frame are valid for any rate. Therefore, all 13 are copied into the lower rate bitstream from the higher rate bitstream. (A parameter decoder in a messaging unit handles how to partition the gain into left- or right-half energies according to the rate.) Likewise, the 13 bits in the pitch field are copied for voiced frames from the higher rate to the lower rate bitstream.
  • with respect to band voicing (BV), a voiced frame's spectrum is preferably sectioned into four bands, each of which carries a voiced/unvoiced flag.
  • the second, third, and fourth bands may or may not be voiced. Therefore, a higher rate frame may carry the voicing status of bands 2, 3, and 4 explicitly as three bits: BV2, BV3, BV4.
  • a lower rate frame can contain less information, preferably only bits BV2 and BV3. This means that the rate conversion algorithm will simply not copy BV4 from the higher rate bitstream.
  • a parameter decoder would know to set BV4 to BV3 when a lower rate message is decoded.
  • Harmonic residues are not used in lower rate messages and are not copied from the higher rate bitstream to the lower rate bitstream, resulting in a reduction of data.
  • a lower bit rate can be achieved with a lower rate message since a lower rate message contains fewer explicit sets of LSFs than a higher rate message, which contains explicit LSFs because each frame is an anchor frame. It is important to the voice quality of the message to choose appropriate LSFs from the higher rate bitstream to represent the content of the voice message well at the lower rate.
  • Representative LSFs from the higher rate bitstream are preferably chosen according to a distortion-minimizing routine.
  • An electronic device such as a selective call receiver or transceiver having a memory for storing digital signals that are parametrically modeled and encoded and capable of compressing the digital signals in accordance with the present invention would preferably comprise a processor such as a multi-rate vocoder programmed to store the digital signal in the memory in a plurality of frames wherein each frame has a plurality of parameters and wherein the digital signal was encoded at a higher rate. Then, the processor would preferably convert the digital signal to a lower rate by selecting a subset of parameters from each of the plurality of frames and discarding the subset of the plurality of parameters within each of the frames of the plurality of frames.
  • the processor can be further programmed to selectively compress the digital signal by selecting an additional subset of parameters from each frame of the plurality of frames and discarding the additional subset of parameters within each frame of the plurality of frames.
  • an electrical block diagram depicts an electronic device such as communication device 50 which may be embodied as a selective call receiver or transceiver or portable subscriber unit (PSU) in accordance with the present invention.
  • the portable subscriber unit comprises a transceiver antenna 52 for transmitting and intercepting radio signals to and from base stations (not shown).
  • the radio signals linked to the transceiver antenna 52 are coupled to a transceiver 54 comprising a conventional transmitter 51 and receiver 53.
  • the radio signals received from the base stations preferably use conventional two- and four-level FSK modulation, but other modulation schemes could be used as well.
  • the transceiver antenna 52 is not limited to a single antenna for transmitting and receiving radio signals. Separate antennas for receiving and transmitting radio signals would also be suitable.
  • Radio signals received by the transceiver 54 produce demodulated information at the output.
  • the demodulated information is transferred over a signal information bus 55 which is preferably coupled to the input of a processor 58, which processes the information in a manner well known in the art.
  • response messages including acknowledge response messages are processed by the processor 58 and delivered through the signal information bus 55 to the transceiver 54.
  • the response messages transmitted by the transceiver 54 are preferably modulated using four-level FSK operating at a bit rate of ninety-six-hundred bps. It will be appreciated that, alternatively, other bit rates and other types of modulation can be used as well.
  • a conventional power switch 56 coupled to the processor 58, is used to control the supply of power to the transceiver 54, thereby providing a battery saving function.
  • a clock 59 is coupled to the processor 58 to provide a timing signal used to time various events as required in accordance with the present invention.
  • the processor 58 also is preferably coupled to an electrically erasable programmable read only memory (EEPROM) 63 which comprises at least one selective call address 64 assigned to the portable subscriber unit 18 and used to implement the selective call feature.
  • the processor 58 also is coupled to a random access memory (RAM) 66 for storing at least one message in a plurality of message storage locations 68.
  • the communication device 50 in the form of a two-way messaging unit may also comprise a transmitter coupled to an encoder and further coupled to the processor 58. It should be understood that the processor 58 in the present invention could serve as both the decoder and encoder. When an address is received by the processor 58, the call processing element
  • a call alerting signal is preferably generated to alert a user that a message has been received.
  • the call alerting signal is directed to a conventional audible or tactile alert device 72 coupled to the processor 58 for generating an audible or tactile call alerting signal.
  • the call processing element 61 processes the message which preferably is received in a digitized conventional manner, and then stores the message in the message storage location 68 in the RAM 66.
  • the message can be accessed by the user through conventional user controls 70 coupled to the processor 58, for providing functions such as reading, locking, and deleting a message. Alternatively, messages could be read through a serial port (not shown).
  • an output device 62 e.g., a conventional liquid crystal display (LCD), preferably also is coupled to the processor 58.
  • ROM 60 also preferably includes elements for handling the registration process (67) and for compression processing (65) among other elements or programs.
  • a method in accordance with the present invention would preferably convert a higher rate message to a lower rate message within the messaging unit.
  • the conversion is preferably done before the message is decoded. Alternatively, portions of the conversion can be done before decoding and the remaining portion of the conversion can be done after decoding.
  • the vocoder system envisioned for use with the present invention would store voice data as a bit-packed stream of parameters which are later used to re-create a person's voice. More parameters are contained within a higher rate message (such as the 1400 bps or rate 3 message) than in a lower rate message (such as the 1000 bps or rate 2 message or the 600 bps or rate 1 message), thus accounting for the rate and quality increase. (Please note that the number of bits associated with each rate are approximate and represent the average message.)
  • memory savings can be achieved by converting down the rate of the message by effectively reducing the number of parameters stored with only a slight reduction in the resultant speech quality.
  • the average 10 second rate 3 message occupies 875 words of memory, assuming 16 bit words:
  • a method in accordance with the present invention preferably converts a higher rate message to a lower rate message before reconstruction takes place by the vocoder. This significantly reduces the processing required to eventually re-generate the voice message and also provides for a higher quality message in comparison to a message using a method where the message is fully reconstructed and then converted to a lower rate. More specifically, parametric values can be extracted, discarded or at least reduced from the bit-packed stream of parameters received without ever decoding. It should be understood that further parametric values can be discarded or reduced after decoding as well.
  • a method 100 of compressing a digital signal that is parametrically modeled and encoded at a higher rate preferably comprises the steps of storing at step 102 the digital signal in a memory in a plurality of frames having a plurality of parameters in each frame of the plurality of frames and converting the digital signal to a lower rate by selecting at step 106 from each frame of the plurality of frames a subset of the plurality of parameters and discarding at step 108 the subset of the plurality of parameters within each frame of the plurality of frames.
  • the plurality of parameters can be selected from the group consisting of spectrum, gain, pitch, spectral parameters, and band voicing and the conversion of the digital signal to a lower rate is preferably achieved without reconstructing the signal.
  • the conversion could further comprise the step of segmentation by choosing representative frames and respective spectral parameters for the plurality of frames as previously explained above.
  • the conversion may also comprise the step of copying at least portions of gain, pitch, band voicing, and spectral parameters from the higher rate to the slower rate until the end of the message.
  • the digital signal can be further compressed at decision block 110 by selecting at step 112 an additional subset of parameters from each frame of the plurality of frames and discarding at step 114 the additional subset of parameters within each frame of the plurality of frames. All these steps can occur in an electronic device such as a selective call unit, telephone answering device, or dictation device preferably having a vocoder
  • compression of the digital signal can be predicated upon a predetermined event as shown at step 104.
  • the "Compress message" command could easily be implemented in a menu screen of an electronic device such as a pager.
  • Other examples could include compressing automatically the oldest message or messages or automatically compressing messages when a memory is full or approaching a predefined percentage of full capacity.
  • Message(s) over a predetermined number of days old or which has/have not been played/replayed for a predetermined number of days may be compressed automatically.
  • any audio information service message in memory may be compressed if memory has reached a predetermined capacity.
  • an incoming message can be compressed in real-time.
  • the compression algorithm could also be set to compress memory to a predetermined percentage of its present size.
  • a user may even set the compression criterion for a message or series of messages attempting to balance intelligibility or quality versus space savings.
  • the present invention ultimately allows for the option of selecting to keep or discard parameters to achieve a desired compression goal.
  • the first step is to initialize the lower rate message in the unit's memory at step 202 by beginning to compose its header (HD).
  • the first two bits of the HD contain the rate indicator (R).
  • Much of the rest of the data contained in the higher rate header is to be also used in the lower rate header: bits which encode the number of frames in the current message, the number of voiced frames, the mean fundamental frequency, and the mean values of the odd line spectrum frequencies (LSFs) of the voiced frames.
  • spectral parameters such as LSFs are chosen according to segmentation as previously explained above.
  • the Frame Status Indicators (FSI) for the lower rate bitstream are built.
  • the Frame Status Indicators (FSI) describe which frames are voiced or unvoiced.
  • the FSI block of higher rate messages contains one bit per frame, since all higher rate frames are explicit (i.e. no interpolation of LSFs). However, since lower rate messages contain explicit and interpolated frames, the FSI block requires two bits per frame. The conversion process determines which frames are to be explicit or interpolated, so the two FSI bits are set.
  • the gain parameters from the higher rate message bitstream are copied to the lower rate message bitstream.
  • the pitch parameters from the higher rate message bitstream are copied over to the lower rate message bitstream.
  • the higher rate message band voicing bits are retrieved with the last band voicing bit discarded.
  • the remaining band voicing bits are then copied to the lower rate message bit stream.
  • the higher rate harmonic residue bits are ignored at step 218 and therefore not copied to the lower rate message bit stream.
  • representative spectral parameters are copied from the higher rate message bitstream to the lower rate message bit stream. The process described above is repeated for each frame until the end of message is reached as shown at step 222.
  • a multi-rate vocoder can then reconstruct the voice signal from the lower rate parameters and thereby achieve the desired memory savings.
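The two-bit Frame Status Indicator entries described in the list above can be illustrated with a minimal sketch. The text states only that a higher rate FSI block carries one bit per frame and a lower rate FSI block carries two bits per frame; the meaning assigned to each of the two bits here (first bit voiced/unvoiced, second bit explicit/interpolated) and the function name `build_lower_rate_fsi` are assumptions made for illustration.

```python
def build_lower_rate_fsi(voiced_flags, anchor_indices):
    """Build a two-bit-per-frame FSI block as a flat list of bits.

    Assumed encoding (not given in the text): for each frame,
    bit 0 = 1 if the frame is voiced, bit 1 = 1 if the frame is an
    explicit (anchor) frame, 0 if its LSFs are to be interpolated.
    """
    anchors = set(anchor_indices)
    bits = []
    for index, voiced in enumerate(voiced_flags):
        bits.append(1 if voiced else 0)
        bits.append(1 if index in anchors else 0)
    return bits


# Three frames: voiced, voiced, unvoiced; frames 0 and 2 are anchors.
fsi = build_lower_rate_fsi([1, 1, 0], [0, 2])
print(fsi, len(fsi))
```

The lower rate FSI block is twice as long as the one-bit-per-frame higher rate block, which is a small cost compared with the spectral parameters dropped from the interpolated frames.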

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A method (100) of compressing a digital signal that is parametrically modeled and encoded includes the steps of storing (102) the digital signal in a memory in a plurality of frames having a plurality of parameters in each frame of the plurality of frames, wherein the digital signal was encoded at a higher rate, and converting the digital signal to a lower rate by selecting (106) from each frame of the plurality of frames a subset of the plurality of parameters and discarding (108) the subset of the plurality of parameters within each frame of the plurality of frames.

Description

METHOD AND APPARATUS FOR DIGITAL SIGNAL COMPRESSION
WITHOUT DECODING
FIELD OF THE INVENTION
The present invention is directed to digital signal compression, and more particularly to a decoder or vocoder capable of compressing digital signals that are parametrically modeled and encoded.
BACKGROUND OF THE INVENTION
Mobile communication products continue to push the envelope in size and capabilities. Memory optimization of stored digital data is therefore vital in addressing the current and future demands of users of such products. Voice, video and multimedia signals are memory intensive. Compression schemes for such signals can become quite complex with resulting uncompressed signals that fall below acceptable standards in intelligibility or uncompressed data lengths that are still too large to provide a significant advantage in memory space savings. Thus, what is needed is a compression scheme for a stored digital signal that simply reduces the size of the stored digital signal while maintaining intelligibility and a significant savings in memory space in an uncompressed mode.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a representation of a higher rate message in accordance with the present invention.
FIG. 2 is a representation of a frame in a higher rate message in accordance with the present invention.
FIG. 3 is a representation of a voiced anchor frame in a lower rate message in accordance with the present invention.
FIG. 4 is a representation of a voiced intermediate frame in a lower rate message in accordance with the present invention.
FIG. 5 is a representation of an unvoiced anchor frame for any rate in accordance with the present invention.
FIG. 6 is a representation of an unvoiced intermediate frame in accordance with the present invention.
FIG. 7 is a block diagram of an electronic device such as a selective call receiver in accordance with the present invention.
FIG. 8 is a flow chart illustrating a method of compressing a digital signal in accordance with the present invention.
FIG. 9 is another flow chart illustrating a method of compressing a digital signal in accordance with the present invention.
DETAILED DESCRIPTION
Any digital signal that can be modeled and parametrically encoded would be an exemplary signal that could be compressed or converted and subsequently recreated in accordance with benefits of the present invention. Although the emphasis of the present disclosure is with regards to digital speech signals that can be modeled and parametrically encoded, it should be understood that other signals such as digitally stored video signals may equally benefit from the present invention.
With respect to digital speech signals, a multi-rate vocoder is preferably used in the process of recreating speech. The vocoder preferably has a speech synthesizer that initially performs the function of decoding a binary stream of data into sets of speech model parameters and then subsequently converts the parameters into synthesized speech. Preferably, the multi-rate vocoder is a Multi-Band Excitation (MBE) vocoder where the analysis, coding, and synthesis of speech are based on the segmentation of the speech into fixed-length segments. Synthesis of the speech preferably proceeds frame-by-frame, using a distinct set of model parameters for each frame. Efficient use of the model parameters requires an understanding of the underlying assumptions of the nature of human speech.
The primary assumption about speech is that it is often highly periodic and its spectral characteristics change gradually. This is the basis for selecting a fixed-length frame in a vocoder scheme. Of course there are times when speech characteristics do change rapidly. High rate coders with shorter update intervals generally outperform very low rate coders in these circumstances. Thus, pseudo-periodic speech is referred to as "voiced" and aperiodic speech is referred to as "unvoiced." Generally, speech consists of mixtures of voiced and unvoiced spectral components and a typical vocoder would process the voiced and unvoiced portions of a speech signal separately to efficiently model and encode the signal and then subsequently combine the signals in re-creating the speech.
The sets of speech model parameters that may be within a frame of data could include a frame voicing flag, a fundamental frequency value, band voicing vectors, line spectrum frequencies or spectral parameters, as well as gain. A frame voicing flag would indicate whether a voiced component is present within a given frame and whether the frame data itself would be in a voiced or unvoiced format. The fundamental frequency in voiced speech represents the pitch frequency or the frequency at which the pitch cycles are repeated. Since there is no true fundamental frequency in unvoiced speech, an arbitrary value can be assigned and used for decoding the spectral shape of the unvoiced speech segment. The band voicing vector breaks up the speech signal into a plurality of spectral bands having predefined frequency ranges. Line spectrum frequencies (LSFs) or spectral parameters provide values that are used to encode the spectrum which will be used to generate the synthesized speech signal. The harmonic spectrum shape derived from the LSFs should be scaled by the gain to represent the correct frame energy.
Thus, a typical vocoder as described above can be used to synthesize speech at one of three data rates, for example 600, 1000, or 1400 bits per second. While these rates are remarkable and allow a large amount of speech to be stored in memory, the present invention is primarily directed to a method to optimize the memory usage within an electronic device such as a selective call receiver or messaging unit.
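As a rough illustration of the memory involved at each rate, the following sketch computes the storage needed for a message of a given duration. The rate labels 1, 2, and 3 for 600, 1000, and 1400 bps follow the numbering used elsewhere in this document; the 16-bit word size and the function name `words_needed` are assumptions for illustration, and the rates themselves are approximate averages.

```python
import math

# Approximate average bit rates for rate 1, rate 2, and rate 3 messages.
RATES_BPS = {1: 600, 2: 1000, 3: 1400}
WORD_BITS = 16  # assumed memory word size


def words_needed(rate, seconds, word_bits=WORD_BITS):
    """Approximate number of memory words for a message of this duration."""
    bits = RATES_BPS[rate] * seconds
    return math.ceil(bits / word_bits)


for rate in (3, 2, 1):
    print(f"rate {rate}: {words_needed(rate, 10)} words for 10 s")
```

Under these assumptions a 10 second rate 3 message needs 875 words, while the same message converted to rate 2 or rate 1 needs 625 or 375 words respectively.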
FIG. 1 shows a typical higher rate message bitstream organization 12. (Since frames may be either voiced or unvoiced, bit lengths are not shown for Frames 1...N.) FIG. 2 illustrates the bit designations for a higher rate voiced frame 12. As shown in FIG. 3, a voiced lower rate anchor frame 14 has 2 bits in the BV field, no Harmonic Residue (HR), and may contain fewer spectral parameters or LSFs. FIG. 4 illustrates the bit designations for a voiced lower rate intermediate frame 16 wherein this frame essentially has the same format as the voiced lower rate anchor frame 14 except that the spectral parameters or LSFs are discarded. FIG. 5 shows the bit fields for an unvoiced anchor frame 18 for any rate while FIG. 6 shows an unvoiced intermediate frame 20. The notion of anchor frames and intermediate frames becomes apparent with an understanding of segmentation. Segmentation is the process of choosing representative frames (the anchor frames) and their respective spectral parameters while discarding the spectral parameters for the intermediate frames by means of a spectral distortion metric. Thus, a voiced segment of a stored voice signal that has been compressed to a lower rate may contain a lower rate anchor frame 14 followed by a predetermined number of lower rate intermediate frames 16 and bounded by another lower rate anchor frame 14. Likewise, an unvoiced segment of a stored voice signal that has been compressed to a lower rate may contain an unvoiced anchor frame 18 followed by a predetermined number of unvoiced intermediate frames 20 and bounded by another unvoiced anchor frame 18.
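Segmentation as described above might be sketched as follows. The disclosure does not specify the spectral distortion metric or the search strategy, so everything in this example is an assumption: the metric is the worst squared error between a frame's actual LSFs and LSFs linearly interpolated between two anchors, the search is greedy, and the names `interp_error`, `choose_anchors`, `max_gap`, and `threshold` are hypothetical.

```python
def interp_error(lsfs, start, end):
    """Worst squared error over the frames strictly between two anchors
    when their LSFs are replaced by linear interpolation (assumed metric)."""
    worst = 0.0
    span = end - start
    for k in range(start + 1, end):
        t = (k - start) / span
        err = 0.0
        for a, b, actual in zip(lsfs[start], lsfs[end], lsfs[k]):
            estimate = a + t * (b - a)
            err += (actual - estimate) ** 2
        worst = max(worst, err)
    return worst


def choose_anchors(lsfs, max_gap=4, threshold=0.01):
    """Greedily pick anchor frame indices; the first and last frames are kept."""
    n = len(lsfs)
    if n == 0:
        return []
    anchors = [0]
    current = 0
    while current < n - 1:
        nxt = current + 1  # an adjacent frame is always an acceptable anchor
        last = min(current + max_gap, n - 1)
        for candidate in range(current + 2, last + 1):
            if interp_error(lsfs, current, candidate) <= threshold:
                nxt = candidate
            else:
                break
        anchors.append(nxt)
        current = nxt
    return anchors


# A smoothly changing spectrum needs few anchors; an abrupt change needs more.
smooth = [[float(i)] for i in range(5)]
abrupt = [[0.0], [0.0], [1.0], [1.0], [1.0]]
print(choose_anchors(smooth), choose_anchors(abrupt))
```

Frames whose indices are not returned would be stored as intermediate frames, with their spectral parameters discarded and later recovered by interpolation in the decoder.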
Looking at the bit designations within the frames in further detail, the 13 bits in the gain field of a voiced or unvoiced frame are valid for any rate. Therefore, all 13 are copied into the lower rate bitstream from the higher rate bitstream. (A parameter decoder in a messaging unit handles how to partition the gain into left- or right-half energies according to the rate.) Likewise, the 13 bits in the pitch field are copied for voiced frames from the higher rate to the lower rate bitstream. With respect to band voicing (BV), a voiced frame's spectrum is preferably sectioned into four bands, each of which carries a voiced/unvoiced flag. In this example, the digital signal is parametrically modeled so that the first band is always voiced, so BV1=1 always. The second, third, and fourth bands may or may not be voiced. Therefore, a higher rate frame may carry the voicing status of bands 2, 3, and 4 explicitly as three bits: BV2, BV3, BV4. On the other hand, a lower rate frame can contain less information, preferably only bits BV2 and BV3. This means that the rate conversion algorithm will simply not copy BV4 from the higher rate bitstream. A parameter decoder would know to set BV4 to BV3 when a lower rate message is decoded. With respect to Harmonic Residue (HR), harmonic residues are not used in lower rate messages and are not copied from the higher rate bitstream to the lower rate bitstream, resulting in a reduction of data. When a lower rate message is played, zeroes are passed to a synthesizer from a parameter decoder. With respect to spectral parameters such as Line Spectrum Frequencies (LSFs), a lower bit rate can be achieved with a lower rate message since a lower rate message contains fewer explicit sets of LSFs than a higher rate message, which contains explicit LSFs because each frame is an anchor frame.
It is important to the voice quality of the message to choose appropriate LSFs from the higher rate bitstream to represent the content of the voice message well at the lower rate. Representative LSFs from the higher rate bitstream are preferably chosen according to a distortion-minimizing routine. Once the representative LSFs have been determined, the FSI block is updated accordingly.
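The text does not spell out the distortion-minimizing routine, so the following Python sketch is only one plausible reading of segmentation. It assumes a greedy search, a mean squared error distortion between LSF vectors, linear interpolation between anchors, and hypothetical `max_gap` and `threshold` parameters; none of these choices are taken from the source.

```python
def interpolate(a, b, t):
    """Linear interpolation between LSF vectors a and b, 0 <= t <= 1."""
    return [x + t * (y - x) for x, y in zip(a, b)]


def distortion(a, b):
    """Assumed spectral distortion metric: mean squared LSF difference."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def choose_anchors(lsfs, max_gap=4, threshold=1e-3):
    """Return indices of anchor frames; all other frames are intermediate.

    Starting from each anchor, try the farthest allowed next anchor first
    and back off until every skipped frame is reproduced by interpolation
    with distortion at or below the threshold.
    """
    if not lsfs:
        return []
    anchors = [0]
    start, last = 0, len(lsfs) - 1
    while start < last:
        end = min(start + max_gap, last)
        while end > start + 1:
            span = end - start
            worst = max(
                distortion(
                    lsfs[k],
                    interpolate(lsfs[start], lsfs[end], (k - start) / span),
                )
                for k in range(start + 1, end)
            )
            if worst <= threshold:
                break
            end -= 1
        anchors.append(end)
        start = end
    return anchors
```

With slowly varying LSFs the routine keeps few anchors, while an abrupt spectral change forces an anchor at the frame where it occurs. The resulting anchor indices are what would drive the update of the FSI block.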
An electronic device such as a selective call receiver or transceiver having a memory for storing digital signals that are parametrically modeled and encoded and capable of compressing the digital signals in accordance with the present invention would preferably comprise a processor such as a multi-rate vocoder programmed to store the digital signal in the memory in a plurality of frames wherein each frame has a plurality of parameters and wherein the digital signal was encoded at a higher rate. Then, the processor would preferably convert the digital signal to a lower rate by selecting a subset of parameters from each of the plurality of frames and discarding the subset of the plurality of parameters within each of the frames of the plurality of frames. The processor can be further programmed to selectively compress the digital signal by selecting an additional subset of parameters from each frame of the plurality of frames and discarding the additional subset of parameters within each frame of the plurality of frames.
Referring to FIG. 7, an electrical block diagram depicts an electronic device such as communication device 50 which may be embodied as a selective call receiver or transceiver or portable subscriber unit (PSU) in accordance with the present invention. The portable subscriber unit comprises a transceiver antenna 52 for transmitting and intercepting radio signals to and from base stations (not shown). The radio signals linked to the transceiver antenna 52 are coupled to a transceiver 54 comprising a conventional transmitter 51 and receiver 53. The radio signals received from the base stations preferably use conventional two and four-level FSK modulation, but other modulation schemes could be used as well. It will be appreciated by one of ordinary skill in the art that the transceiver antenna 52 is not limited to a single antenna for transmitting and receiving radio signals. Separate antennas for receiving and transmitting radio signals would also be suitable.
Radio signals received by the transceiver 54 produce demodulated information at the output. The demodulated information is transferred over a signal information bus 55 which is preferably coupled to the input of a processor 58, which processes the information in a manner well known in the art. Similarly, response messages including acknowledge response messages are processed by the processor 58 and delivered through the signal information bus 55 to the transceiver 54. The response messages transmitted by the transceiver 54 are preferably modulated using four-level FSK operating at a bit rate of ninety-six-hundred bps. It will be appreciated that, alternatively, other bit rates and other types of modulation can be used as well.
A conventional power switch 56, coupled to the processor 58, is used to control the supply of power to the transceiver 54, thereby providing a battery saving function. A clock 59 is coupled to the processor 58 to provide a timing signal used to time various events as required in accordance with the present invention. The processor 58 also is preferably coupled to an electrically erasable programmable read only memory (EEPROM) 63 which comprises at least one selective call address 64 assigned to the portable subscriber unit 18 and used to implement the selective call feature. The processor 58 also is coupled to a random access memory (RAM) 66 for storing at least one message in a plurality of message storage locations 68. Of course, other information that would be useful in a two-way messaging system could also be stored, such as zone identifiers and general purpose counters to preferably count calls (to and from the PSU).
The communication device 50 in the form of a two-way messaging unit may also comprise a transmitter coupled to an encoder and further coupled to the processor 58. It should be understood that the processor 58 in the present invention could serve as both the decoder and encoder. When an address is received by the processor 58, the call processing element 61, preferably within a ROM 60, compares the received address with the at least one selective call address 64, and when a match is detected, a call alerting signal is preferably generated to alert a user that a message has been received. The call alerting signal is directed to a conventional audible or tactile alert device 72 coupled to the processor 58 for generating an audible or tactile call alerting signal. In addition, the call processing element 61 processes the message, which preferably is received in a digitized conventional manner, and then stores the message in the message storage location 68 in the RAM 66. The message can be accessed by the user through conventional user controls 70 coupled to the processor 58, for providing functions such as reading, locking, and deleting a message. Alternatively, messages could be read through a serial port (not shown). For retrieving or reading a message, an output device 62, e.g., a conventional liquid crystal display (LCD), preferably also is coupled to the processor 58. It will be appreciated that other types of memory, e.g., EEPROM, can be utilized as well for the ROM 60 or RAM 66 and that other types of output devices, e.g., a speaker, can be utilized in place of or in addition to the LCD, particularly in the case of receipt of digitized voice. The ROM 60 also preferably includes elements for handling the registration process (67) and for compression processing (65) among other elements or programs.
A method in accordance with the present invention would preferably convert a higher rate message to a lower rate message within the messaging unit. The conversion is preferably done before the message is decoded. Alternatively, portions of the conversion can be done before decoding and the remaining portion of the conversion can be done after decoding. The vocoder system envisioned for use with the present invention would store voice data as a bit-packed stream of parameters which are later used to re-create a person's voice. More parameters are contained within a higher rate message (such as the 1400 bps or rate 3 message) than in a lower rate message (such as the 1000 bps or rate 2 message or the 600 bps or rate 1 message), thus accounting for the rate and quality increase. (Please note that the bit rates associated with each rate are approximate and represent the average message.) Thus, memory savings can be achieved by converting a message down to a lower rate, which reduces the number of parameters stored with only a slight reduction in the resultant speech quality.
For example, the average 10 second rate 3 message occupies 875 words of memory, assuming 16 bit words:
10 seconds * 1400 bits/second * 1 word/16 bits = 875 words
By converting that 10 second message to rate 1, the memory usage becomes:
10 seconds * 600 bits/second * 1 word/16 bits = 375 words
This results in an average savings of approximately 57% (500 of 875 words). Of course, as previously mentioned, there is a slight loss of voice quality associated with reducing the rate. However, the reduction may be applied judiciously, as described later. Further, a rate reduction may take place from rate 3 to rate 2, or from rate 2 to rate 1. A method in accordance with the present invention preferably converts a higher rate message to a lower rate message before reconstruction takes place by the vocoder. This significantly reduces the processing required to eventually re-generate the voice message and also provides for a higher quality message in comparison to a message using a method where the message is fully reconstructed and then converted to a lower rate. More specifically, parametric values can be extracted, discarded or at least reduced from the bit-packed stream of parameters received without ever decoding. It should be understood that further parametric values can be discarded or reduced after decoding as well.
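The memory arithmetic in the example above can be checked with a short Python sketch. The nominal rates (1400, 1000 and 600 bps) and the 16 bit word size come from the text; the function and constant names are illustrative only, and because the rates are averages the results are estimates.

```python
# Nominal average bit rates from the text, keyed by rate number.
RATES_BPS = {1: 600, 2: 1000, 3: 1400}
WORD_BITS = 16


def words_needed(seconds, rate):
    """Memory words occupied by an average message of the given length."""
    return seconds * RATES_BPS[rate] / WORD_BITS


def savings(seconds, from_rate, to_rate):
    """Fraction of memory freed by converting from_rate down to to_rate."""
    before = words_needed(seconds, from_rate)
    after = words_needed(seconds, to_rate)
    return (before - after) / before


if __name__ == "__main__":
    for src, dst in [(3, 1), (3, 2), (2, 1)]:
        print(f"rate {src} -> rate {dst}: {savings(10, src, dst):.1%} saved")
```

For these nominal rates the saving does not depend on message length, since the duration cancels out of the ratio.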
Referring to FIG. 8, in one aspect of the present invention, a method 100 of compressing a digital signal that is parametrically modeled and encoded at a higher rate preferably comprises the steps of storing at step 102 the digital signal in a memory in a plurality of frames having a plurality of parameters in each frame of the plurality of frames and converting the digital signal to a lower rate by selecting at step 106 from each frame of the plurality of frames a subset of the plurality of parameters and discarding at step 108 the subset of the plurality of parameters within each frame of the plurality of frames. The plurality of parameters can be selected from the group consisting of spectrum, gain, pitch, spectral parameters, and band voicing, and the conversion of the digital signal to a lower rate is preferably achieved without reconstructing the signal. The conversion could further comprise the step of segmentation by choosing representative frames and respective spectral parameters for the plurality of frames as previously explained. The conversion may also comprise the step of copying at least portions of gain, pitch, band voicing, and spectral parameters from the higher rate to the lower rate until the end of the message. The digital signal can be further compressed at decision block 110 by selecting at step 112 an additional subset of parameters from each frame of the plurality of frames and discarding at step 114 the additional subset of parameters within each frame of the plurality of frames. All these steps can occur in an electronic device such as a selective call unit, telephone answering device, or dictation device preferably having a vocoder.
In applying a method in accordance with the present invention, there are several situations when a digital voice message could be compressed. Thus compression of the digital signal can be predicated upon a predetermined event as shown at step 104, for example, a user's request. A "Compress message" command could easily be implemented in a menu screen of an electronic device such as a pager. Other examples could include automatically compressing the oldest message or messages, or automatically compressing messages when a memory is full or approaching a predefined percentage of full capacity. Messages that are more than a predetermined number of days old, or that have not been played or replayed for a predetermined number of days, may be compressed automatically. Additionally, any audio information service message in memory may be compressed if memory has reached a predetermined capacity. If a memory is nearly full, an incoming message can be compressed in real time. The compression algorithm could also be set to compress memory to a predetermined percentage of its present size. A user may even set the compression criterion for a message or series of messages, attempting to balance intelligibility or quality against space savings. The present invention ultimately allows for the option of selecting to keep or discard parameters to achieve a desired compression goal.
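Several of the predetermined events listed above (message age, time since last playback, and memory fill level) can be combined into a simple selection policy. The Python sketch below is hypothetical throughout: the `StoredMessage` record, its field names, and the default thresholds are assumptions made for illustration and do not appear in the source.

```python
from dataclasses import dataclass


@dataclass
class StoredMessage:
    msg_id: int
    age_days: float
    days_since_played: float
    rate: int          # 3, 2 or 1; rate 1 cannot be reduced further
    size_words: int


def select_for_compression(messages, capacity_words, fill_threshold=0.9,
                           max_age_days=7, max_idle_days=7):
    """Return the ids of messages that should be converted to a lower rate."""
    candidates = [m for m in messages if m.rate > 1]
    # Event: message is old, or has not been played for a while.
    chosen = [m for m in candidates
              if m.age_days > max_age_days
              or m.days_since_played > max_idle_days]
    # Event: memory is approaching capacity, so also take the oldest
    # remaining candidate.
    used = sum(m.size_words for m in messages)
    if used >= fill_threshold * capacity_words:
        remaining = [m for m in candidates if m not in chosen]
        if remaining:
            chosen.append(max(remaining, key=lambda m: m.age_days))
    return [m.msg_id for m in chosen]
```

A user initiated request would simply bypass this policy and name the message directly.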
A summary of the algorithm used to change a voice message from a higher rate to a lower rate is outlined in the method 200 of FIG. 9. The first step is to initialize the lower rate message in the unit's memory at step 202 by beginning to compose its header (HD). The first two bits of the HD contain the rate indicator (R). Thus, for a change from 1400 bps to 600 bps, R is written as 01. Much of the rest of the data contained in the higher rate header is also used in the lower rate header: bits which encode the number of frames in the current message, the number of voiced frames, the mean fundamental frequency, and the mean values of the odd line spectrum frequencies (LSFs) of the voiced frames. At step 204, representative spectral parameters such as LSFs are chosen according to segmentation as previously explained. At step 206, the Frame Status Indicators (FSI) for the lower rate bitstream are built. The FSI describe which frames are voiced or unvoiced. The FSI block of higher rate messages contains one bit per frame, since all higher rate frames are explicit (i.e., no interpolation of LSFs). However, since lower rate messages contain both explicit and interpolated frames, the FSI block requires two bits per frame. The conversion process determines which frames are to be explicit or interpolated, and the two FSI bits are set accordingly. At step 208, the gain parameters from the higher rate message bitstream are copied to the lower rate message bitstream. Next, at the decision block 210, if the frame is voiced, the pitch parameters from the higher rate message bitstream are copied over to the lower rate message bitstream. At steps 214 and 216, the higher rate message band voicing bits are retrieved and the last band voicing bit is discarded. The remaining band voicing bits are then copied to the lower rate message bitstream. The higher rate harmonic residue bits are ignored at step 218 and therefore not copied to the lower rate message bitstream.
At step 220, representative spectral parameters are copied from the higher rate message bitstream to the lower rate message bitstream. The process described above is repeated for each frame until the end of message is reached as shown at step 222. At decision block 210, if the frame was unvoiced, then only the spectral parameters are copied from the higher rate message bitstream to the lower rate message bitstream at step 224 until the end of message is reached as shown at step 222. Once the voice message is compressed from a higher rate to a lower rate in accordance with the present invention, a multi-rate vocoder can then reconstruct the voice signal from the lower rate parameters and thereby achieve the desired memory savings.
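The flow of method 200 can be sketched at the level of parameter fields. The Python below is a minimal illustration, not the bit-level procedure: it assumes each frame has already been split into named fields (a dict with hypothetical keys `voiced`, `gain`, `pitch`, `bv`, `hr`, `lsf`), that the anchor indices come from a separate segmentation step, and that the two FSI bits per frame are (voiced, explicit). The text states only that two bits are used, so that encoding is an assumption.

```python
def convert_message(header, frames, anchor_indices, rate_bits="01"):
    """Convert a higher rate message to a lower rate without synthesis.

    Returns (lower_rate_header, fsi, lower_rate_frames).
    """
    anchors = set(anchor_indices)
    # Step 202: reuse the header fields, writing only a new rate indicator
    # ("01" is the value the text gives for a change to 600 bps).
    out_header = dict(header, rate=rate_bits)
    fsi = []
    out_frames = []
    for i, frame in enumerate(frames):
        voiced = frame["voiced"]
        explicit = i in anchors            # anchor frames keep their LSFs
        # Step 206: two FSI bits per frame (assumed encoding).
        fsi.append((int(voiced), int(explicit)))
        # Step 208: gain is valid at any rate and is copied as-is.
        out = {"gain": frame["gain"]}
        if voiced:
            # Voiced branch of decision block 210: copy pitch.
            out["pitch"] = frame["pitch"]
            # Steps 214-216: keep BV2 and BV3, discard the last BV bit.
            out["bv"] = frame["bv"][:2]
            # Step 218: harmonic residue bits are ignored, not copied.
        if explicit:
            # Steps 220 / 224: copy the representative spectral parameters.
            out["lsf"] = frame["lsf"]
        out_frames.append(out)
    return out_header, fsi, out_frames
```

Because every operation is a copy or an omission of an existing field, no speech is synthesized or re-analyzed during conversion, which is the point of performing the rate change before decoding.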
The above description is intended by way of example only and is not intended to limit the present invention in any way except as set forth in the following claims.
What is claimed is:

Claims

1. A method of compressing a digital signal that is parametrically modeled and encoded, comprising the steps of: storing the digital signal in a memory in a plurality of frames having a plurality of parameters in each frame of the plurality of frames, wherein the digital signal was encoded at a higher rate; converting the digital signal to a lower rate by selecting from each frame of the plurality of frames a subset of the plurality of parameters and discarding the subset of the plurality of parameters within each frame of the plurality of frames.
2. The method of claim 1, wherein the method further comprises a method of compressing a digital voice signal having parameters selected from the group consisting of spectrum, gain, pitch, spectral parameters, and band voicing.
3. The method of claim 1, wherein the method further comprises the step of converting the digital signal to a lower rate without reconstructing the signal.
4. The method of claim 1, wherein the method further comprises the step of further compressing the digital signal by selecting an additional subset of parameters from each frame of the plurality of frames and discarding the additional subset of parameters within each frame of the plurality of frames.
5. A method of compressing upon a predetermined event a stored digitally encoded voice message stored in a plurality of frames in a memory within a subscriber unit having a vocoder, comprising the steps of: converting the stored digitally encoded voice message that was encoded at a first rate in the plurality of frames to a stored digitally encoded voice message at a second rate, wherein the second rate is lower than the first rate, wherein the conversion comprises the steps of: selecting a subset of a plurality of parameters with each of the plurality of frames; and discarding the subset of the plurality of parameters residing within the plurality of frames.
6. The method of claim 5, wherein the step of converting further comprises the step of segmentation by choosing representative spectral parameters for a number of the plurality of frames.
7. The method of claim 5, wherein the step of converting further comprises the step of copying at least portions of gain, pitch, band voicing, and spectral parameters from the first rate to the second rate until the end of the message.
8. The method of claim 5, wherein the predetermined event comprises a subscriber unit user initiated request.
9. The method of claim 5, wherein the predetermined event comprises determination of an oldest message or messages stored in the subscriber unit, whereupon the method further comprises the step of automatic compression of at least the oldest message when the memory in the subscriber unit exceeds a threshold percentage of its storage capacity.
10. An electronic device having a memory for storing digital signals that are parametrically modeled and encoded and capable of compressing the digital signals, comprises: a processor programmed to: store the digital signal in the memory in a plurality of frames wherein each frame has a plurality of parameters and wherein the digital signal was encoded at a high rate; convert the digital signal to a lower rate by selecting a subset of parameters from each of the plurality of frames and discarding the subset of the plurality of parameters within each of the frames of the plurality of frames.
PCT/US1999/021205 1998-10-13 1999-09-15 Method and apparatus for digital signal compression without decoding WO2000022743A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
AU61461/99A AU6146199A (en) 1998-10-13 1999-09-15 Method and apparatus for digital signal compression without decoding
EP99948239A EP1121764A4 (en) 1998-10-13 1999-09-15 Method and apparatus for digital signal compression without decoding

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US09/170,744 US6185525B1 (en) 1998-10-13 1998-10-13 Method and apparatus for digital signal compression without decoding
US09/170,744 1998-10-13

Publications (1)

Publication Number Publication Date
WO2000022743A1 true WO2000022743A1 (en) 2000-04-20

Family

ID=22621083

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US1999/021205 WO2000022743A1 (en) 1998-10-13 1999-09-15 Method and apparatus for digital signal compression without decoding

Country Status (5)

Country Link
US (1) US6185525B1 (en)
EP (1) EP1121764A4 (en)
CN (1) CN1192502C (en)
AU (1) AU6146199A (en)
WO (1) WO2000022743A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1516319A2 (en) * 2002-06-19 2005-03-23 Sony Ericsson Mobile Communications AB Methods and systems for compression of stored audio

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6661845B1 (en) * 1999-01-14 2003-12-09 Vianix, Lc Data compression system and method
US7558381B1 (en) * 1999-04-22 2009-07-07 Agere Systems Inc. Retrieval of deleted voice messages in voice messaging system
US6707826B1 (en) * 2000-03-20 2004-03-16 Motorola, Inc. Method and apparatus for wireless bandwidth efficient multi-way calling
JP2002196797A (en) * 2000-12-27 2002-07-12 Toshiba Corp Recording/reproducing device and recording/reproducing method therefor
US6952669B2 (en) * 2001-01-12 2005-10-04 Telecompression Technologies, Inc. Variable rate speech data compression
JP2002258894A (en) * 2001-03-02 2002-09-11 Fujitsu Ltd Device and method of compressing decompression voice data
US20040059835A1 (en) * 2002-09-25 2004-03-25 Zhigang Liu Method and system for in-band signaling between network nodes using state announcement or header field mechanisms
KR100524065B1 (en) * 2002-12-23 2005-10-26 삼성전자주식회사 Advanced method for encoding and/or decoding digital audio using time-frequency correlation and apparatus thereof
US8249873B2 (en) * 2005-08-12 2012-08-21 Avaya Inc. Tonal correction of speech
US8855275B2 (en) * 2006-10-18 2014-10-07 Sony Online Entertainment Llc System and method for regulating overlapping media messages
FI20075426A0 (en) * 2007-06-08 2007-06-08 Polar Electro Oy Performance meter, transmission method and computer program product

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4330689A (en) * 1980-01-28 1982-05-18 The United States Of America As Represented By The Secretary Of The Navy Multirate digital voice communication processor
US4791660A (en) * 1986-08-27 1988-12-13 American Telephone And Telegraph Company Variable data compression announcement circuit
US5506872A (en) * 1994-04-26 1996-04-09 At&T Corp. Dynamic compression-rate selection arrangement
US5675333A (en) * 1994-08-31 1997-10-07 U.S. Philips Corporation Digital compressed sound recorder
US5881104A (en) * 1996-03-25 1999-03-09 Sony Corporation Voice messaging system having user-selectable data compression modes

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0085820B1 (en) * 1982-02-09 1985-11-21 International Business Machines Corporation Method for multi-speed digital transmission and apparatus for carrying out said method
EP0464839B1 (en) * 1990-07-05 2000-09-27 Fujitsu Limited Digitally multiplexed transmission system
US5715367A (en) * 1995-01-23 1998-02-03 Dragon Systems, Inc. Apparatuses and methods for developing and using models for speech recognition
US5682462A (en) 1995-09-14 1997-10-28 Motorola, Inc. Very low bit rate voice messaging system using variable rate backward search interpolation processing
KR100251497B1 (en) * 1995-09-30 2000-06-01 윤종용 Audio signal reproducing method and the apparatus
US5974387A (en) * 1996-06-19 1999-10-26 Yamaha Corporation Audio recompression from higher rates for karaoke, video games, and other applications
KR100261254B1 (en) * 1997-04-02 2000-07-01 윤종용 Scalable audio data encoding/decoding method and apparatus
US5978757A (en) * 1997-10-02 1999-11-02 Lucent Technologies, Inc. Post storage message compaction

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP1121764A4 *

Also Published As

Publication number Publication date
CN1192502C (en) 2005-03-09
AU6146199A (en) 2000-05-01
EP1121764A4 (en) 2004-11-17
CN1323464A (en) 2001-11-21
US6185525B1 (en) 2001-02-06
EP1121764A1 (en) 2001-08-08

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 99812039.1

Country of ref document: CN

AK Designated states

Kind code of ref document: A1

Designated state(s): AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IS JP KE KG KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZW

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101)
WWE Wipo information: entry into national phase

Ref document number: 1999948239

Country of ref document: EP

WWP Wipo information: published in national office

Ref document number: 1999948239

Country of ref document: EP

REG Reference to national code

Ref country code: DE

Ref legal event code: 8642

WWW Wipo information: withdrawn in national office

Ref document number: 1999948239

Country of ref document: EP