US4703505A - Speech data encoding scheme


Publication number: US4703505A (Authority: US; Grant)
Application number: US06/526,065
Inventors: Norman C. Seiler, Stephen S. Walker
Original assignee: Harris Corp.
Current assignee: Intersil Corp.
Legal status: Expired - Lifetime

Classifications

    • G — Physics
    • G10 — Musical instruments; Acoustics
    • G10L — Speech analysis or synthesis; speech recognition; speech or voice processing; speech or audio coding or decoding
    • G10L19/00 — Speech or audio signal analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis

Abstract

A coding scheme which uses Shannon-Fano coding for data headers to identify the type of command signal, uses a first set of formant data in the command signal to generate second sets of formant data for sound-class initialization, and uses delta modulation to update the initialized sound class and to handle sound-type transitions. The header indicates initialization of a sound class, repeat of the previous command, update of the previous command, or end of word. Certain types of command signals and sound classes share the same header, and the data portion of the command signal defines which type of command signal is present. A unique delta-modulation scheme is used wherein an increment, a decrement, or no change is indicated by the bit pair 11, 00, or 01/10, respectively, where each pair consists of the delta-modulation bits for a parameter, one in the present frame and one in the previous frame for that parameter. A repeat code with no data is used when the delta-modulation bits for all parameters change from the previous frame.

Description

BACKGROUND OF THE INVENTION

The present invention relates generally to speech synthesizers and, more specifically, to a memory-efficient speech data encoding scheme.

The application of digital and analog network synthesis to the generation of artificial speech has been an area of active research interest for over two decades. Methods of implementing speech synthesizers range from digital algorithms in large-scale mainframe-based systems to VLSI components intended for commercial consumption. Analysis and synthesis techniques most commonly used for speech processing rely upon concepts such as LPC (Linear Predictive Coding), PARCOR (Partial Autocorrelation), CVSD (Continuously Variable Slope Delta Modulation) and waveform compression. Generally, these methods suffer from one or both of two deficiencies: (1) the speech quality is sufficiently coarse or mechanical to become annoying after repeated listening sessions, and (2) the bit rate of the associated encoding scheme is too high to permit memory-efficient realization of large-vocabulary systems. To date, these limitations have restricted high-volume application of speech synthesizers to the consumer marketplace.

Techniques for defining useful speech synthesizer parameters and extracting time-varying values from actual human speech are diverse. Such procedures fall under the general categories of "speech data extraction" and "speech parameter tracking." Such methods usually involve digitization of original human speech followed by successive application of many complex algorithms in order to produce useful parameter values. These algorithms must be implemented on digital computers and normally do not produce speech data in real time. In addition to computer speech analysis and parameterization from digitized human speech, other methods of deriving the synthesizer parameters may include visual analysis of speech waveforms on sonograph plots, artificial parameter generation by rule, and conversion from analysis data assembled by other synthesis methods.

Once the speech data has been generated, it is desirable to reduce it to some binary format which allows convenient and efficient storage in the memory space of the synthesizer. Methods for achieving this are often termed "speech data compression" or "speech data reduction" and the binary data formats they produce are generally referred to as "speech data coding schemes." The reduction methods are usually implemented as digital algorithms which operate on the output of the parameter tracking routines. To be properly and usefully implemented, a speech data encoding scheme must contain values for all synthesizer parameters necessary for high-quality speech reproduction and should permit storage of these values in significantly less memory space than that required by the output of the parameter tracking routine itself.

Most speech synthesizers and their associated data extraction and compression algorithms are "frame" oriented. A frame is defined as a small fixed time segment of the original speech waveform. The frame duration is short enough (usually on the order of 10 msec) so that the speech signal does not vary greatly during that interval. Thus, the analysis algorithms divide the original speech signal into successive, discrete time intervals, or frames, of uniform duration and extract sets of parameter values for each frame. The data reduction algorithms then condense these values into the encoding scheme which, in turn, is stored in memory. The encoded data are thus bit packets which are also oriented successively in time by frames.

The synthesizer accesses the speech memory at the same frame rate used to analyze the original speech and code the data. During each frame, a single packet of encoded speech data is read into the synthesizer. Each bit packet must contain two general classes of information: (1) an instruction containing the type of sound or speech to be generated (synthesizer architecture configuration), and (2) the encoded speech parameter data required to produce the speech segment. The coding technique by which this is accomplished directly affects the size of the memory necessary to store all the data packets required for any given synthetic utterance.

A figure of merit, called the "bit rate," has been defined for data coding schemes as a measure of performance. The bit rate is the ratio of memory size requirement (binary data) to corresponding speech segment duration (seconds). Given equivalent speech quality, a coding scheme with a low bit rate is considered to be more efficient than a scheme with a higher bit rate. There is, however, a rough correlation between bit rate and speech quality over wide ranges of bit rate when many different coding schemes are considered.

Phoneme synthesizers generally have a bit rate on the order of 100 bits per second and produce mechanical-sounding speech. Linear predictive coding and waveform compression achieve substantially better speech quality, but require a bit rate on the order of 1000 bits per second. Substantially optimum speech quality is achieved by CVSD and pulse code modulation at a bit rate at or above 16,000 bits per second. Formant synthesis has the capability of producing speech quality between LPC and CVSD at a bit rate lower than that of LPC, which runs counter to the general relationship between speech quality and bit rate in prior art methods.

An example of data compression for linear predictive coding is described in U.S. Pat. No. 4,209,836 to Wiggins, Jr., et al., wherein a 6000 bits per second scheme is reduced to 1000 to 1200 bits per second. Recognizing that formant data can be stored more efficiently than the reflective coefficients of linear predictive coding, U.S. Pat. No. 4,304,965 to Blanton et al. uses formant data for storage at an equivalent bit rate as low as 300 bits per second and converts it to LPC-type reflective coefficients for use in an LPC-based speech synthesizer.

There is a need for a data compression scheme for a formant-based synthesizer having reduced memory requirements while maintaining speech quality.

SUMMARY OF THE INVENTION

An object of the present invention is to provide a formant-based coding scheme which reduces storage requirements while maintaining speech quality.

Another object of the present invention is to minimize the data bit rate and storage by judicious selection of independently variable formant parameters.

Still another object of the present invention is to provide an improved delta modulation scheme applicable to any communication system.

These and other objects of the invention are attained by a coding scheme which uses Shannon-Fano coding for data headers to identify the type of command signal, uses a first set of formant data in the command signal to generate second sets of formant data for sound-class initialization, and uses delta modulation to update the initialized sound class and to handle sound-type transitions. The header indicates initialization of a sound class, repeat of the previous command, update of the previous command, or end of word. Certain types of command signals and sound classes share the same header, and the data portion of the command signal defines which type of command signal is present.

A unique delta-modulation scheme is used wherein an increment, a decrement, or no change is indicated by the bit pair 11, 00, or 01/10, respectively, where each pair consists of the delta-modulation bits for a parameter, one in the present frame and one in the previous frame for that parameter. A repeat code with no data is used when the delta-modulation bits for all parameters change from the previous frame.

Other objects, advantages and novel features of the present invention will become apparent from the following detailed description of the invention when considered in conjunction with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of the interconnection of a voice synthesizer, speech ROM and micro-controller.

FIG. 2 is a block diagram of the architecture of a vocal tract model.

FIGS. 3 and 4 are graphs of quantization level of parameters 1 and 2 as a function of frame numbers.

FIG. 5 is a flow chart of the encoder.

FIG. 6 is a flow chart of the decoder.

FIG. 7 is a block diagram of the speech synthesizer architecture incorporating the vocal tract model of FIG. 2.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Generation of synthetic human speech typically requires a system similar to that illustrated in block diagram form in FIG. 1. The diagram details a monolithic, integrated circuit approach to synthesis, but functionally identical systems may be realized via other methods such as discrete circuitry or digital computer software packages. The speech generation system consists of four principal parts: (1) a controller function which determines when speech will be generated and what will be spoken; (2) a synthesizer block which functions as an artificial human vocal tract or waveform generator to produce the speech; (3) a data bank or memory containing the speech (vocal tract) parameter values required by the synthesizer to generate the various words and sounds which constitute its vocabulary; (4) an audio amplifier, filter, and loudspeaker to convert the electrical signal to an acoustic waveform.

As illustrated in FIG. 1, fourteen ROM address lines are supplied, allowing access to 131 K bit memories. At 500 bits per second, this corresponds to approximately 262 seconds of speech. This capacity will be adequate for nearly all possible applications. Data buses for the ROM and controller are separated to avoid bus contention, and a total of five handshake lines are required.
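The capacity figures above follow from simple arithmetic; a quick sketch, assuming a byte-wide ROM so that fourteen address lines select 2^14 bytes:

```python
# Back-of-envelope check of the memory figures in the text: fourteen
# address lines of byte-wide ROM give 2**14 bytes = 131,072 bits
# ("131 K bit"), and at 500 bits per second that memory holds roughly
# 262 seconds of speech.
ADDRESS_LINES = 14
MEMORY_BITS = (2 ** ADDRESS_LINES) * 8   # 131,072 bits
SPEECH_SECONDS = MEMORY_BITS / 500       # ~262 s at 500 bits/s
```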

The controller sends an eight bit indirect utterance address to the synthesizer which in turn uses this information to access the two byte start-of-utterance address located in the lowest page of the speech ROM. The controller's data is flagged valid with write-bar WR. The utterance address is output on the ROM Address bus lines and the speech data is accessed byte by byte until an "end of word" (EOW) code is encountered. Such a code results in termination of the speech generation and the transmission of an interrupt code to the controller via the EOW line. The ROMEN line is available for memory clocking, where necessary, and the RST line resets the synthesizer for the next word. An external power amplifier will be required to drive an 8 ohm speaker.

A vocal tract model of a formant-based speech synthesizer is illustrated in FIG. 2. It includes a glottal (voiced) path in parallel with a fricative path. The glottal path includes a glottal or spectral shaping filter 12; first, second, third and fourth formant filters 14, 16, 18, 20, respectively; and a variable glottal-path attenuator 22 all connected in series. The fricative path includes a modulator 24, a variable fricative-path attenuator 26, a nasal/fricative pole filter 28 and a nasal/fricative zero filter 30. The output of the glottal path and of the fricative path are connected to an output buffer 32 which provides a speech output. A pitch pulse generator 34 provides a periodic signal of a given frequency. A turbulence generator 36 is a pseudorandom white noise source. A rectifier 38 is connected between the output of the first formant filter 14 in the glottal path and the modulator 24 of the fricative path.

A plurality of switches are provided to reconfigure the synthesizer to produce the different classes of sounds. Switch S1, connected to the input of the glottal path at the glottal filter 12, selects between the pitch pulse generator 34 and the turbulence generator 36. Switch S2, connected to the modulator 24 of the fricative path, selects either the rectified signal from the first formant filter 14 and rectifier 38 or a fixed voltage, shown as +1 volt. A third switch S3 either connects the nasal/fricative pole and zero filters 28 and 30 to the output of the fricative attenuator 26 so as to form a fricative path, or disconnects the nasal/fricative pole and zero filters from the fricative path and connects them to a link 40 which becomes part of the glottal path. Switch S4 normally connects the output of the formant filters to the input of the glottal-path attenuator 22 and may instead disconnect the formant filters from the glottal attenuator 22 and connect it to the nasal/fricative pole and zero filters 28 and 30 via the link 40 and switch S3. Switch S5 normally connects the output of the nasal/fricative zero filter 30 to the output buffer 32 but may instead disconnect it from the buffer 32 and connect it to the glottal attenuator 22. Switch S6 connects switch S4 either to the output of the fourth formant filter 20 or to the bypass link 42, which is connected directly to the output of the glottal filter 12. The positions of the switches for the sound classes are illustrated in Table 1:

                                TABLE 1
                 FORMANT SYNTHESIZER SWITCH ASSIGNMENTS

                                          VOICE   FRICATIVE   VOICED
           VOWEL   ASPIRATE   NASAL       BAR     OR STOP     FRICATIVE
   S1      a       b          a           a       a           a
   S2      b       b          b           b       b           a
   S3      a       a          a           a       b           b
   S4      a       a          b           a       b           a
   S5      a       a          b           a       a           a
   S6      b       b          b           a       b           b
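The switch assignments can be applied mechanically; the sketch below is an illustrative Python rendering (not anything specified in the patent), mapping each sound class to its S1-S6 positions as read column by column from Table 1:

```python
# Table 1 as a lookup: each sound class maps to the positions of
# switches S1..S6 ('a' or 'b'), transcribed from the table above.
SWITCH_POSITIONS = {
    "VOWEL":             "abaaab",
    "ASPIRATE":          "bbaaab",
    "NASAL":             "ababbb",
    "VOICE BAR":         "abaaaa",
    "FRICATIVE OR STOP": "abbbab",
    "VOICED FRICATIVE":  "aabaab",
}

def configure(sound_class):
    """Return a {'S1': 'a', ...} dict for one of the sound classes."""
    positions = SWITCH_POSITIONS[sound_class]
    return {f"S{i + 1}": p for i, p in enumerate(positions)}
```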

The sixteen operational parameters required by the synthesizer architecture of FIG. 2 to generate speech and suggested ranges for most male speakers are described in Table 2. The respective points of input are noted in FIG. 2.

                                TABLE 2
                     FORMANT SYNTHESIZER PARAMETERS

  Parameter  Description                                Bits            Range
  F0*        Pitch frequency                            5               0, 65-160 Hz
  Fg         Glottal filter break frequency             fixed           200 Hz
  F1*        Center frequency of first formant          4               200-800 Hz
  BW1        Bandwidth of first formant                 (F1 dependent)  50-80 Hz
  F2*        Center frequency of second formant         4               800-2100 Hz
  BW2        Bandwidth of second formant                (F2 dependent)  50-100 Hz
  F3*        Center frequency of third formant          3               1500-2900 Hz
  BW3        Bandwidth of third formant                 (F3 dependent)  130-200 Hz
  F4         Center frequency of fourth formant         fixed           3200 Hz
  BW4        Bandwidth of fourth formant                fixed           200 Hz
  FZ*        Center frequency of nasal/fricative zero   3               600-2000 Hz
  BWZ        Bandwidth of nasal/fricative zero          (FZ dependent)  100-300 Hz
  FP         Center frequency of nasal/fricative pole   (FZ dependent)  200 Hz (nasal), 1400-4000 Hz
  BWP        Bandwidth of nasal/fricative pole          (FZ dependent)  40 Hz (nasal), 320-800 Hz
  AV*        Voicing amplitude                          3 (6 dB steps)  0, 0.016-1.0
  AF*        Fricative amplitude                        3 (6 dB steps)  0, 0.016-1.0

A brief review of Table 2 indicates that there are variable parameters (signified by asterisks), parameters dependent on the variable parameters, and fixed parameters. The significance of this will be explained below. It should be noted that the numbers of bits for the variable parameters are given for purposes of example and illustrate the efficiency of the present coding scheme.

Although FIG. 2 illustrates a specific vocal tract model, other models will use the formant parameters of Table 2 and thus the coding scheme of the present invention is not to be limited to any specific vocal tract model. The only requirement is that the synthesizer be a frame-oriented, formant synthesizer capable of accepting the variable parameters of Table 2. The apparatus of FIGS. 1 and 2 provide a background to better understand the present invention.

The speech data coding scheme of the present invention consists of binary bit packets, or commands, in four general categories. These commands are frame-oriented; one command per frame (10 msec nominal) is stored in the speech data memory. The four categories are: (1) sound initialization, requiring data for most or all of the parameters; (2) updates, requiring incrementing or decrementing a few parameters; (3) repeats, which require no data since the current sound is maintained; and (4) terminals or halts, which signify the end of a word (EOW) and thus require no data. The commands consist of two parts, namely, header bits to indicate the class or category of command, and parameter data bits.

To minimize the data rate during transmission of the command signals and to reduce memory capacity requirements, a bit-efficient coding scheme must be used. In the present scheme, the headers are generated using a technique similar to the well-known Shannon-Fano method. Each header is a bit string made up of a series of binary or logical "ones" (1) ended with a logical zero (0). The length of the header determines the command type. The synthesizer must read in from memory and decode each header and adjust its functional synthesis configuration to a form appropriate to produce the sound associated with the command. The shorter headers are assigned to the most frequently occurring commands to reduce bit rate.
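Such a header can be decoded simply by counting leading ones. The following is a minimal sketch under the Table 3 code assignments, with the all-ones nine-bit string reserved for END OF WORD; the function name and the bit-string representation are this sketch's assumptions:

```python
# Decode a Shannon-Fano style header: N leading 1-bits terminated by a
# 0 select a command; nine 1-bits with no terminating 0 mean END OF WORD.
COMMANDS = {
    1: "REPEAT",                # header "0"
    2: "UPDATE 1",              # header "10"
    3: "UPDATE 2",              # header "110"
    4: "VOWEL/ASPIRATE",        # header "1110"
    5: "FRICATIVE/STOP/PAUSE",  # header "11110"
    6: "TRANSITION",            # header "111110"
    7: "NASAL",                 # header "1111110"
    8: "VOICED-FRICATIVE",      # header "11111110"
    9: "VOICE BAR",             # header "111111110"
}

def read_header(bits):
    """Return (command name, header bits consumed) from a '0'/'1' string."""
    ones = 0
    for b in bits:
        if b == "0":
            return COMMANDS[ones + 1], ones + 1  # ones 1s plus the 0
        ones += 1
        if ones == 9:                            # all-ones: halt code
            return "END OF WORD", 9
    raise ValueError("truncated header")
```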

Table 3 shows the resulting Shannon-Fano code, data structure and total bit length, given the information of Table 2, for the thirteen proposed variable formant parameters. The REPEAT command, which has the most frequent occurrence, is the shortest and consists of a single "0" bit; the voice bar and EOW (halt) commands, which have the lowest frequency of occurrence, are the longest at nine bits. The EOW header does not end with a logical "0". All commands, except REPEAT and EOW, are structured such that the operating parameter data for each command code directly follow the corresponding header bits. During each speech frame, the synthesizer determines from the header bits which parameters are encoded in the data bits and then routes the data to their appropriate points within the system architecture for sound generation.

The initialize group consists of five types: VOWEL/ASPIRATE, FRICATIVE/STOP/PAUSE, NASAL, VOICED-FRICATIVE, and VOICE-BAR.

                                TABLE 3
                       SPEECH DATA COMMAND CODES

  Command           Header      Data                       Description                     Total Bits
  REPEAT            0           --                         status quo (10 msec); do not        1
                                                           alter configuration; do not
                                                           alter/update parameters
  UPDATE 1          10          0 ΔF1 ΔF2 ΔF3              mod parameters                      6
                                1 ΔF0 ΔA  ΔFZ
  UPDATE 2          110         ΔF0 ΔF1 ΔF2 ΔF3 ΔFZ        mod parameters                      8
  VOWEL/ASPIRATE    1110        F0 F1 F2 F3 AV             reset synthesizer configuration    23
                                                           for vowel generation; zero
                                                           pitch (F0 = 0, all bits) sets
                                                           for aspirate generation
                                                           (10 msec)
  FRICATIVE/STOP/   11110       AF FZ B D                  reset configuration for            17
  PAUSE                                                    fricative/stop; B123 = 3-bit
                                                           pause, 10 msec increments from
                                                           10 msec; D123 = 3-bit fill,
                                                           10 msec increments from 0
  TRANSITION        111110      0 ΔF0 ΔF1 ΔF2 ΔF3 ΔAV      nasal-to-vowel                     12
                                1 ΔF0 ΔF1 ΔF2 ΔF3 ΔAV FZ   vowel-to-nasal                     15
  NASAL             1111110     F0 F1 F2 F3 AV FZ          reset configuration for nasal      29
                                                           generation; FP = 200 Hz,
                                                           BWP = 40 Hz (10 msec)
  VOICED-FRICATIVE  11111110    F0 F1 F2 F3 AV AF FZ       reset configuration for            33
                                                           voiced-fricative generation
                                                           (10 msec)
  VOICE BAR         111111110   F0 AV                      reset configuration for voice      17
                                                           bar (10 msec)
  END OF WORD       111111111   --                         halt synthesis                      9

As shown in Table 3, each of these commands is represented by a unique header followed by a data bit string containing the parameter values necessary for the particular sound to be generated. These values correspond to the electrical parameters associated with FIG. 2; the parameter symbols and the numbers of data bits in Table 3 are explained in Table 2, except for B and D, which are explained in Table 3 as pause and fill durations, respectively. Upon decoding an initialize-class header, the synthesizer must set itself into an appropriate architectural approximation to the human vocal tract for that sound. For the synthesizer of FIG. 2 this is accomplished by positioning the switches as listed in Table 1. The data is then used to drive the energy sources and signal filters to produce the intended synthetic sound. As the word "initialize" implies, these commands are coded into memory for frames which correspond to the beginning of a particular sound and for which a full set of data is required.
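The "Total Bits" column of Table 3 can be cross-checked against the field widths of Table 2; a small sketch (field names abbreviated as in the tables, the helper name being this sketch's own):

```python
# Cross-check Table 3's totals: each command's length is its header
# length plus the per-parameter bit widths listed in Table 2.
FIELD_BITS = {"F0": 5, "F1": 4, "F2": 4, "F3": 3, "FZ": 3,
              "AV": 3, "AF": 3, "B": 3, "D": 3}

def command_bits(header_len, fields):
    """Total command length in bits: header plus the listed data fields."""
    return header_len + sum(FIELD_BITS[f] for f in fields)
```

For example, VOWEL/ASPIRATE is a 4-bit header plus 5+4+4+3+3 data bits, i.e. 23 bits total, matching the table.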

In order to reduce the amount of memory and bit rate for the longer initialization command signals, some of the formant parameters are fixed and others are made dependent on independent parameters so that they can be derived from the independent parameters. As indicated in Table 2, seven of the parameters, marked with an asterisk, are directly coded into the command data bits as independent variables. Six parameters are dependent variables required by the synthesizer, but are not placed directly into memory by the encoding process. Three fixed parameters are also listed; these are required by the synthesizer but need not be coded to memory since they are not variable. The number of binary bits (quantization levels) necessary to yield a 600 bps average bit rate is also listed in Table 2 for each parameter. The formant bandwidths are not independently compressed, but are intended to be decoded by the synthesizer from the data provided for their respective formant center frequencies. A set of "look-up tables" is required in the synthesizer implementation to accomplish this function. For a switched-capacitor hardware implementation, this look-up function is performed automatically by the capacitor values in the filter stages. For other hardware implementations, this look-up function would be served by a small ROM. In a software implementation, look-up could be accomplished by accessing a data file. Thus, by forcing the formant bandwidths to be functions of the formant frequencies, two parameters may be controlled via a single set of code bits. Similarly, the four nasal/fricative parameters FP, FZ, BWP and BWZ are all coded via the 3 bits of data for FZ.
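A software look-up of the kind described might be sketched as below. The linear spacing of the entries is purely an assumption for illustration (the patent leaves the actual table contents to the ROM or capacitor values of the implementation); only the ranges come from Table 2:

```python
# Illustrative bandwidth look-up: one 4-bit F1 code selects both the
# first-formant center frequency (200-800 Hz) and its dependent
# bandwidth (50-80 Hz). Linear spacing is this sketch's assumption.
F1_TABLE = {code: (200 + 40 * code, 50 + 2 * code) for code in range(16)}

def decode_f1(code):
    """Return (center_frequency_hz, bandwidth_hz) for a 4-bit F1 code."""
    return F1_TABLE[code & 0xF]
```

The point of the design is visible here: a single 4-bit code controls two synthesizer parameters, so the bandwidth costs no memory of its own.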

For a vowel sound, the data selects the pitch F0 of the pitch generator, the center frequencies F1, F2, F3 of the first three formant filters and the attenuation AV of the glottal attenuator using nineteen bits of data. The bandwidths of the three formant filters are derived from their respective center frequencies. The center frequency and bandwidth of the fourth formant filter are fixed. For an aspirate sound, the frequency F0 of the pitch generator is set to zero and the turbulence generator is connected to the glottal path.

For unvoiced fricative, stop and pause sound generation, the data sets the attenuation AF of the fricative attenuator, the center frequency FZ of the fricative zero filter, the duration B123 of a pause and the duration D123 of a noise fill using twelve data bits. For an unvoiced fricative, the duration of the pause B is zero. For a pause, the amplitude AF of the fricative attenuator can be set to zero and the duration is (B+D)×10 msec. For a stop, there is a gap of B×10 msec and a noise fill of D×10 msec. The center frequency FP and bandwidth BWP of the fricative pole filter and the bandwidth BWZ of the fricative zero filter are derived from the fricative zero center frequency FZ.
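The B and D timing rules reduce to simple arithmetic; a hypothetical decode, with the helper names being this sketch's own:

```python
# Hypothetical decode of the 3-bit B (pause) and D (fill) fields of a
# FRICATIVE/STOP/PAUSE command, at the 10 msec frame granularity.
def pause_duration_ms(b, d):
    """A pause lasts (B + D) x 10 msec with AF set to zero."""
    return (b + d) * 10

def stop_timing_ms(b, d):
    """A stop is a B x 10 msec silent gap, then a D x 10 msec noise fill."""
    return b * 10, d * 10
```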

For nasal sound generation, the data sets the pitch frequency F0, formant center frequencies F1, F2 and F3, glottal attenuator amplitude AV and nasal zero filter center frequency FZ using 22 data bits. The bandwidths of the formant filters BW1,2,3, the center frequency FP and bandwidth BWP of the nasal pole filter and the bandwidth BWZ of the nasal zero filter are derived.

For voiced fricatives, the same parameters as for the nasal sound generation are set and derived with the addition of the amplitude AF of the fricative attenuator which is set by the data using 25 data bits.

For a voice bar sound, the data selects the pitch F0 of the pitch generator using five data bits and the attenuation AV of the voice attenuator using three data bits. The frequency of the glottal filter is fixed, the formant filters are bypassed and the gain of the glottal attenuator is one.

Since segments, or phonemes, in human speech typically last longer than one frame, the coding scheme also provides the "update" and "repeat" command classes. The nature of human speech is such that its spectral and amplitude characteristics generally change slowly with time. Thus, initialize commands need only be coded for the starting frame of a given phoneme or sound segment. Thereafter, in most frames, repeats and updates may be coded. The REPEAT command consists of a single bit header which is not followed by data. A REPEAT code tells the synthesizer to continue generating its current sound for one more frame. The synthesizer must use its current set of data bits to do so since no new data is coded. A REPEAT may follow any other command, except EOW, including another REPEAT.

The update commands are coded when parameter data variations, or updates, are required during the synthesis of a particular sound or phoneme. Because such changes are typically small, only one data bit per parameter is coded using delta-modulation to increment or decrement one bit at a time. The delta modulation (DM) bits are indicated in Table 3 by a delta (Δ) preceding the parameter notation.

In the case of an initialize command, no delta-modulation process is involved and full n-bit values for the appropriate parameters are included in the data bits. The UPDATE 1 command allows limited parameter updates for cases when only a few parameters have changed between successive frames; namely, either the center frequencies and bandwidths of the formant filters F1, F2, F3 are adjusted, or the pitch F0, the attenuator gains A and the nasal parameters FZ, BWZ, FP, BWP are adjusted. By reading the header and the first data bit, the synthesizer distinguishes which update is present for UPDATE 1. The UPDATE 2 command allows delta modulation of all parameters except the four nasal variables. The update commands thus provide a simple format for coding both allophone and phoneme transitions (diphones) within each sound class.
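The select-bit routing for UPDATE 1 can be sketched as follows; the exact routine is an assumption of form, with the field order taken from Table 3:

```python
# Sketch of UPDATE 1 routing: the first data bit selects which subset of
# parameters the three following delta-modulation bits apply to.
def parse_update1(data_bits):
    """data_bits: the 4-character '0'/'1' string after the '10' header."""
    if data_bits[0] == "0":
        targets = ["F1", "F2", "F3"]   # formant frequencies (bandwidths follow)
    else:
        targets = ["F0", "A", "FZ"]    # pitch, amplitudes, nasal/fricative zero
    return dict(zip(targets, data_bits[1:4]))
```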

The TRANSITION command is also considered to be an update function in order to allow delta modulation of some parameters across vowel-nasal and nasal-vowel phoneme boundaries. However, a synthesizer architecture change is required for TRANSITION commands, whereas no such change is needed for updates. Alternatively, the NASAL and VOWEL/ASPIRATE initialize commands can be used instead at the cost of additional memory space.

For a nasal to vowel transition, the nasal pole and zero filters must be eliminated from the glottal signal path, and the frequency of the pitch generator and the center frequencies and bandwidths of the formant filters adjusted. For a vowel to nasal transition, the nasal pole and zero filters must be inserted in the glottal signal path, their parameters initialized, and the center frequencies and bandwidths of the formant filters adjusted.

It should be noted that for the vowel to nasal transition, the pitch, gain and formant filter center frequencies are delta-modulated, while the frequency of the nasal zero filter is not delta-modulated. Thus, by reading the header and the first data bit, the synthesizer distinguishes between the different types of transitions, each of which uses a different data format. An update command frame may follow any other command frame, except EOW, including other update commands.

The halt command is listed in Table 3 as EOW or "end-of-word". This command consists of a header without data bits and is coded into memory immediately following the last frame of a complete sound, phoneme, or full utterance. The EOW is interpreted by the synthesizer as a "shut-down" or "end of speech" command.

As is evident from Table 3, delta modulation is employed as an integral part of the present coding scheme in conjunction with the update class of commands, namely UPDATE 1, UPDATE 2 and TRANSITION. This reduces the overall bit rate and the memory storage requirements of the associated synthesizer used to reconstruct the encoded speech. A unique form of delta modulation is used which offers greater bit savings and versatility than conventional delta modulation techniques. As a first improvement, the present delta modulation uses a single bit coding not only to signify increments and decrements in the original signal, but also no-change conditions. This permits coding of signals containing substantial steady-state segments and also reduces the net error in the reconstructed signal. A second major improvement is the use of specific update, transition and repeat commands, which results in considerable savings in bit rate.

An encoding table for the enhanced DM scheme is shown in Table 4.

              TABLE 4
______________________________________
Decision Table for a DM Encoder
______________________________________
Original Waveform         Previous Frame   Current Frame
Level Change              Data Bit         Data Bit
VL(x) = V0(x) - V0(x-1)   P(x) = B(x-1)    B(x)
______________________________________
increase 1 LSB            0                1
no change                 0                1*
decrease 1 LSB            0                0
increase 1 LSB            1                1
no change                 1                0**
decrease 1 LSB            1                0
______________________________________
 *For P(x) = 0 and VD(x-1) = -1 LSB, B(x) = 0
**For P(x) = 1 and VD(x-1) = +1 LSB, B(x) = 1

Since the DM bit B(x) can assume only one of two binary states during any frame x, and any one of three possible events may be coded, a comparison is made between the preceding frame's DM bit, denoted by P(x)=B(x-1), and the level change in V0(x) in order to set the state of the current bit B(x). The level change is the difference signal VL introduced earlier,

VL(x) = V0(x) - V0(x-1)

where V0 is the original quantized waveform, x is the number of the frame being coded, and x-1 represents the frame previously coded. For the case of simultaneous coding of multiple waveforms or parameters, the decision process of Table 4 may be applied for each separate waveform during each frame x.
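By way of illustration only (this is not part of the claimed apparatus), the Table 4 decision for a single parameter can be sketched in Python; the function name and calling convention are hypothetical, and the footnote catch-up cases are treated separately later in the description:

```python
def dm_encode_step(level_change, p):
    """Table 4 decision: map the level change in V0 (+1, 0 or -1 LSB)
    and the previous frame's bit P(x) to the current bit B(x).
    The footnote catch-up modification is not modeled here."""
    if level_change > 0:
        return 1        # increase: B(x) = 1 for either value of P(x)
    if level_change < 0:
        return 0        # decrease: B(x) = 0 for either value of P(x)
    return 1 - p        # no change: B(x) is the complement of P(x)
```

Note that an increase coded while P(x)=0 produces the pair (0, 1), which the decoder of Table 5 reads as "no change"; this is the one-frame lag analyzed below in connection with Tables 4 and 5.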

A corresponding decoder table is given in Table 5.

              TABLE 5
______________________________________
Decision Table for a DM Decoder
______________________________________
P(x)     B(x)     Response
______________________________________
0        0        decrement VR 1 LSB
0        1        no change
1        0        no change
1        1        increment VR 1 LSB
______________________________________

The decoder table is used by a receiving system to reconstruct a synthetic waveform VR which relates closely to V0. The receiver performs this reconstruction by accessing the stored (or transmitted) DM bit string B at a rate of one bit per encoded waveform per frame. The receiver then compares, for each frame x, the current bit B(x) with the previous bit P(x), which has been saved, and adjusts VR accordingly. Then, P(x+1) is set equal to B(x), B(x+1) is received, and the comparison and reconstruction process is repeated for frame x+1. To generate a complete reconstructed signal VR, this process must be repeated iteratively for each frame of V0 originally encoded. For the case of simultaneous decoding of multiple waveforms or parameters, the decision process of Table 5 may be applied for each separate waveform during each frame x.
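A minimal Python sketch of this decoding loop for one parameter follows (illustrative only; it assumes a preset previous bit of 1, which must simply match the encoder's preset):

```python
def dm_decode(bits, v_init, preset_bit=1):
    """Reconstruct VR from a DM bit string per Table 5: matching bits
    (P(x) = B(x)) move the level, differing bits leave it unchanged."""
    vr = v_init
    p = preset_bit              # P(1); must match the encoder's preset
    levels = []
    for b in bits:              # one stored bit per frame
        if b == p:
            vr += 1 if b == 1 else -1
        p = b                   # P(x + 1) = B(x)
        levels.append(vr)
    return levels
```

For example, `dm_decode([1, 1, 0, 0], 0)` yields the levels [1, 2, 2, 1]: two increments, a no-change (bits differ), then a decrement.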

An examination of Tables 4 and 5 reveals that waveforms reconstructed from DM codes cannot change directly from an increment in frame x to a decrement in frame x+1, or vice-versa. Since the decoder requires P(x) and B(x) to be identical in any frame x in order to increment or decrement VR, two frames are required to reverse the direction of change. The first frame carries a code equivalent to a no change, namely the present bit B(x) is opposite the previous bit P(x), and the second frame provides the consecutive state, namely the present bit B(x) equals the previous bit P(x). This restriction results in a smoothing effect in VR during frames when V0 alternates states successively. Also, in some cases, two frames may be required to increment or decrement VR from a "no change" state. This occurs where the previous bit P(x) of a no-change state is opposite in value to the desired present bit B(x) value. Thus, one bit is needed to reverse the sequence and a second bit is required to provide a consecutive match.

Thus, ±1 LSB variations in VR may lag the associated changes in V0 by one frame in time. To ensure that VR tracks V0 as closely as possible, the encoding algorithm must determine when this lagging effect is present and adjust its coding procedure accordingly. This determination is best made by computing a value VD which is the difference between V0 and VR. This difference signal, expressed as

VD(x) = V0(x) - VR(x),

is computed for each coded waveform during each frame x. Then, in the following frame, if the value of VD is non-zero, a modification of the encoding process is performed as stated in the footnotes to Table 4. This modification allows VR to "catch up" with V0 during frames when V0 is not changing. The scheme also catches up automatically where VD =-1, P(x)=0 and an increase is required: the present bit B(x) is encoded as a 1 which, with a previous bit P(x) of 0, is decoded as a no change, so the increase in V0 cancels the negative lag. The same is true when VD =+1, P(x)=1 and a decrease is required; the present bit B(x) is then encoded as a 0.
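Combining Table 4, the difference signal VD and the footnote modification, a complete single-parameter encoder might be sketched as follows. This is an illustrative reading of the scheme, not the patented circuit; it assumes v0 changes by at most ±1 LSB per frame, with frame 0 carried by an initialize command:

```python
def dm_encode(v0, preset_bit=1):
    """Encode quantized waveform v0 into DM bits per Table 4,
    applying the footnote catch-up when VD = V0 - VR is non-zero."""
    bits, p, vr = [], preset_bit, v0[0]   # initialize sets VR exactly
    for x in range(1, len(v0)):
        vl = v0[x] - v0[x - 1]            # level change in the original
        vd = v0[x - 1] - vr               # lag VD from the previous frame
        if vl > 0:
            b = 1
        elif vl < 0:
            b = 0
        elif vd == -1 and p == 0:         # footnote *: catch up downward
            b = 0
        elif vd == +1 and p == 1:         # footnote **: catch up upward
            b = 1
        else:
            b = 1 - p                     # ordinary no-change code
        if b == p:                        # track what the decoder does
            vr += 1 if b == 1 else -1
        p = b
        bits.append(b)
    return bits
```

Encoding the ramp [0, 1, 2] yields the bits [1, 1] (two matched increments), while [0, 0, 1, 1] yields [0, 1, 1]: a no-change code, a lagged increase decoded as no change, then the footnote catch-up increment.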

The preceding description of the new Delta Modulation methodology can be applied to speech data encoding as follows. The decoding function is performed by a receiver which in the present example is a formant-based speech synthesizer or its functional equivalent. The V0 waveforms are taken to be quantized speech parameter levels generated by any appropriate speech parameter tracking and analysis algorithm, either with or without user interaction, as necessary. One V0 signal is required for each independently coded speech parameter. Using the formant-based parameters of Table 3, a separate V0 signal would exist for F0, F1, F2, F3, FZ, AV, and AF.

The parameter encoding algorithm assigns a separate pair of DM bits, B(x) and P(x), to each independently coded parameter. A similar bit-pair assignment is required in the synthesizer (receiver) for each parameter. During any frame in which an initialize class command is coded, each B(x) bit for each associated parameter is preset to a given logical state. This state may be either a 1 or a 0; it is necessary only that the preset state be consistent for all DM data bits and all preset events. Then, during the coding of any speech frame to which an update (UPDATE1, UPDATE2, TRANSITION) is to be assigned, the B(x) bits required by the particular update command are given logical states as dictated by Table 4. Note that parameter level variations greater than ±1 LSB must be smoothed prior to coding or accounted for with an initialize command.

As an example, consider encoding the B(x) bits for arbitrary V0 waveforms associated with any two speech parameters, say F1 and F2. Let F1 and F2 be quantized to four bits each as suggested in Table 3, and let the time variation of the associated quantized values be V01 and V02, respectively, over a total of 25 frames as illustrated in FIGS. 3 and 4. The results of the encoding process are shown in Table 6 for both parameters.

                                   TABLE 6
__________________________________________________________________________
Frame  VL1(x)*  P1(x)  B1(x)  VD1(x)**  VL2(x)*  P2(x)  B2(x)  VD2(x)**  Coded
  X                                                                      Command
__________________________________________________________________________
  1                      1       0                         1       0     Initialize
  2      +       1       1       0         -       1       0      -1     Update
  3      NC      1       0       0         NC      0       0       0     Update
  4      +       0       1      +1         +       0       1      +1     Update
  5      NC      1       1       0         +       1       1      +1     Update
  6      NC      1       0       0         -       1       0       0     Update
  7      +       0       1      +1         -       0       0       0     Update
  8      +       1       1      +1         NC      0       1       0     Update
  9      NC      1       1       0         NC      1       0       0     Update
 10      +       1       1       0         -       0       0       0     Update
 11      -       1       0      -1         -       0       0       0     Update
 12      +       0       1       0         NC      0       1       0     Update
 13      -       1       0      -1         -       1       0      -1     Update
 14      -       0       0      -1         NC      0       0       0     Update
 15      NC      0       0       0         NC      0       1       0     Update
 16      NC      0       1       0         NC      1       0       0     REPEAT
 17      NC      1       0       0         +       0       1      +1     Update
 18      -       0       0       0         +       1       1      +1     Update
 19      -       0       0       0         NC      1       1       0     Update
 20      +       0       1      +1         NC      1       0       0     Update
 21      -       1       0       0         +       0       1      +1     Update
 22      +       0       1      +1         NC      1       1       0     Update
 23      -       1       0       0         -       1       0      -1     Update
 24      NC      0       1       0         NC      0       0       0     Update
 25      NC      1       0       0         NC      0       1       0     REPEAT
__________________________________________________________________________
 *VL1(x) = V01(x) - V01(x-1) and VL2(x) = V02(x) - V02(x-1): "+" = 1 LSB increment, "-" = 1 LSB decrement, "NC" = no change
**VD1(x) and VD2(x) are the difference signals V0(x) - VR(x) for each parameter, in LSBs

              TABLE 7______________________________________  Frame  Number Coded  X      Command______________________________________   1     Initialize   2     Update   3     "   4     REPEAT   5     Update   6     REPEAT   7     Update   8     "   9     "  10     "  11     "  12     REPEAT  13     REPEAT  14     Update  15     "  16     REPEAT  17     REPEAT  18     Update  19     "  20     REPEAT  21     REPEAT  22     Update  23     REPEAT  24     Update  25     REPEAT______________________________________

Referring to Table 6, the current frame DM data bits B1(x) and B2(x) are preset to logical state 1 at frame 1, which is coded as an initialize command. Then, using Table 4 and the information in each VL column in Table 6, logical states for B1(x), B2(x) and P1(x), P2(x) are derived for each frame x. Note that the P1(x), P2(x) states for each frame x are the B1(x), B2(x) states, respectively, for the preceding frame x-1, i.e., P1(x)=B1(x-1), P2(x)=B2(x-1). Table 5 is applied during each frame to the P(x) and B(x) bits to generate levels VR1(x) and VR2(x) for the reconstructed waveforms. The only purpose the reconstructed signals VR1 and VR2 serve in the encoder is to provide the input needed to generate the difference signals VD1 and VD2. For frames where changes in the reconstructed waveforms lag changes in the original signals, the difference signals are used to modify the coding process per the footnotes to Table 4. The values of the difference signals VD1 and VD2 for this particular example are listed frame by frame in Table 6. The resulting command codes generated by the encoding algorithm are shown in the right-hand column of Table 6.

The synthesizer (receiver) assembles VR1(x) and VR2(x) by performing a preset event identical in state to the encoder's for each initialize command and by using the absolute parameter data stored (or transmitted) with the initialize header to establish the initial parameter levels VR1(1) and VR2(1) for frame 1. Thereafter, for each frame during which an update command is received, the states of B1(x) and B2(x) are read (or received) and compared with P1(x) and P2(x) via Table 5, and VR1(x) and VR2(x) are altered in level as required. For any frame coded as a REPEAT, the synthesizer sets all B(x) bits artificially by forcing B(x) opposite to P(x) for each parameter. This allows the synthesizer to reconstruct the two-bit "no-change" code for the data-less REPEAT command.
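One frame of this synthesizer-side bookkeeping for a single parameter might be sketched as follows (a hypothetical helper, not the claimed hardware; `None` stands for the data-less REPEAT frame):

```python
def synth_step(p, frame_bit, vr):
    """Advance one frame: frame_bit is the stored B(x) of an update
    frame, or None for a REPEAT, in which case B(x) is forced opposite
    to P(x) so that Table 5 yields 'no change'."""
    b = frame_bit if frame_bit is not None else 1 - p
    if b == p:                          # matching pair moves the level
        vr += 1 if b == 1 else -1
    return b, vr                        # new previous bit, new level
```

The forced complement keeps the P, B toggle sequence intact, so subsequent update frames decode exactly as if the no-change bits had been stored.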

This example has been limited to two parameters for the sake of brevity and clarity and can be extended to all seven independent formant-based parameters with no loss in generality.

It is now possible to observe a major inefficiency in the coding process used to generate Table 6. Observe that update commands (bit changes from a 1 to a 0 or from a 0 to a 1) are generated for frames 4, 6, 12, 13, 17, 20, 21, and 23 in spite of the fact that no parameter changes occur in the reconstructed signals during those frames. Thus, update commands are coded for frames in which the net effect is as though a REPEAT were present, and bits are being wasted. Only during frames 16 and 25 are REPEAT commands actually coded. Generally, the REPEAT command would be coded with greater frequency in a "typical" segment of speech; however, the waveforms of FIGS. 3 and 4 are perfectly valid for speech segments whose phonetic content is rapidly changing.

To correct this inefficiency, the following enhancement over conventional Delta Modulation is invoked. During the encoding process, for any potential UPDATE1 or UPDATE2 frame x, the encoder algorithm checks each parameter for a no-change condition in the corresponding frame of the associated reconstructed waveform VR. This is done by comparing P(x) and B(x) via Table 5. If no reconstructed parameter changes level during that frame, i.e., B(x) is opposite in state to P(x) for every parameter, regardless of whether the original V0 waveforms are changing, the frame is coded as a REPEAT rather than an update. Then P(x+1) is set equal to B(x), as would occur anyway for an update frame, and coding proceeds to the next frame.
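The update-to-repeat check thus reduces to a test over the (P, B) bit pairs of all coded parameters; an illustrative sketch:

```python
def classify_frame(bit_pairs):
    """bit_pairs: one (P(x), B(x)) tuple per coded parameter.
    Per Table 5, P != B decodes as 'no change'; when that holds for
    every parameter the frame collapses to a one-bit REPEAT."""
    if all(p != b for p, b in bit_pairs):
        return "REPEAT"
    return "UPDATE"
```

Applied to frame 4 of Table 6, where both parameters carry the pair (0, 1), the frame is classified as a REPEAT even though the original waveforms are changing.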

The effect of this update-to-repeat transformation scheme upon the encoding of the waveforms of FIGS. 3 and 4 is shown in Table 7. Comparing Table 7 to the right-hand column of Table 6 reveals that a total of 8 update commands were replaced with single-bit REPEAT commands resulting in a bit savings of either 5 or 7 bits per frame, depending on whether an UPDATE1 or UPDATE2 was replaced.

The use of a repeat instead of an update in the delta modulation encoding provides savings in addition to those obtained by using UPDATE1, UPDATE2 and TRANSITION codes as substitutes for initialization frames. During the encoding process, for any potential UPDATE2 frame x, the encoder algorithm checks for a no-change condition in the P(x), B(x) bits of either (a) F0 and FZ simultaneously, or (b) F1, F2 and F3 simultaneously. If condition (a) is true, the frame is coded as an UPDATE1 in which the first data bit following the header is a logical 0 and the delta modulation codes for parameters F1, F2, F3 are used. If condition (b) is true, the frame is coded as an UPDATE1 in which the first data bit following the header is a logical 1 and the delta modulation codes for parameters F0 and FZ are used.
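This second shortening check can be sketched the same way (illustrative only; just the five frequency parameters named in the text are modeled, and the A bit handling described in the next paragraph is omitted):

```python
def choose_update(pb):
    """pb maps parameter name -> (P(x), B(x)); per Table 5, P != B
    means the parameter is unchanged during this frame."""
    def unchanged(name):
        return pb[name][0] != pb[name][1]
    if all(unchanged(n) for n in pb):
        return "REPEAT"                       # nothing changes at all
    if unchanged("F0") and unchanged("FZ"):   # condition (a)
        return "UPDATE1, first data bit 0"    # code F1, F2, F3 only
    if all(unchanged(n) for n in ("F1", "F2", "F3")):   # condition (b)
        return "UPDATE1, first data bit 1"    # code F0 and FZ only
    return "UPDATE2"
```

The command names returned are labels for illustration; the actual headers and data-bit layouts are those of Table 3.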

Note that for any frame in which either AF or AV must be changed via DM, an UPDATE1 is required. If the update occurs during synthesis of a vowel, aspirate, nasal or voice bar, the A bit corresponds to AV. If the update occurs during synthesis of a fricative, stop or pause, the A bit refers to AF. During an update on the amplitude of a voiced-fricative, the A bit is assigned to both AV and AF and both amplitude levels are changed in unison. Laboratory experimentation has indicated no effect on speech quality from enforcing this rule.

This enhanced and bit-efficient scheme for replacing update commands with REPEATs or shorter updates is relatively transparent to the synthesizer, which simply reads headers and data from memory and decodes the bits as described above. FIGS. 5 and 6 contain flowcharts which show the functional structure of the encoder and decoder, respectively. It should be noted that the flowcharts are intended to most clearly indicate and detail aspects of the update process and the handling and control of the Delta Modulation bits.

For any single speech parameter coded via the flowchart of FIG. 5, a potential maximum of 50% of all updates may be replaced with either a REPEAT or a shorter update. Since variations in several parameters must be considered simultaneously, the effective bit savings is less due to limited correlation between parameter changes. A typical reduction in bit rate of between ten and twenty percent compared to coding without removing ineffective updates has been observed after extensive coding of both isolated and connected speech.

A functional diagram of a synthesizer architecture capable of operating with the coding scheme of the present invention is illustrated in FIG. 7. The multiplexer and fourteen-bit address counter control ROM access, while the twenty-five bit PISO counter buffer converts the eight-bit parallel speech data into a serial bit stream for decoding and distribution. The header decode logic and latches identify the type of sound (vowel, nasal, etc.) to be generated and route the incoming data into the appropriate parameter latches for comparison with the previously transmitted data. The new data is blended with the old data via delta modulation, and the resulting formant parameters are applied to the vocal tract circuitry of FIG. 2. Since the elements of FIG. 7 are well known, they are not described in detail.

From the preceding description of the preferred embodiment, it is evident that the objects of the invention are attained. By using seven independent formant parameters to represent and generate thirteen formant parameters for eight sound types, and by combining Shannon-Fano coding with a unique delta-modulation method, natural sounding speech at bit rates of 500 to 600 bits per second is attained. Although the invention has been described and illustrated in detail, it is to be clearly understood that the same is by way of illustration and example only and is not to be taken by way of limitation. The spirit and scope of the invention are to be limited only by the terms of the appended claims.

Claims (13)

What is claimed is:
1. In a formant based speech synthesizer, including at least three formant filters, a pitch and turbulence generator, spectral filter, nasal zero and pole filters, glottal and fricative attenuators, and control means for controlling the configuration and parameters of the above elements, the improvement being said control means which comprises:
storing means for storing command signals each of which includes a first portion indicating the type of command signal and a second portion indicating values of a first set of synthesizer parameters;
said first portion of said command signals indicating initialization of sound classes, repeating the previous command signal, updating previous command signals and end of word types of command signals;
determining means connected to said storing means and responsive to said first portion of said command signal for determining the configuration of said synthesizer to produce a class of sound; and
producing means connected to said storing means and responsive to said second portion of said command signals for producing values for a second set of synthesizer parameters as a function of said values of said first set of synthesizer parameters;
adjusting means connected to said storing means and said producing means for adjusting the operating characteristics of said synthesizer as a function of said first and second set of synthesizer parameters.
2. A formant based speech synthesizer according to claim 1 wherein initialization command signals have the greatest bit lengths, and said repeat command signal has the shortest bit length.
3. A formant based speech synthesizer according to claim 1, wherein said second portion of said updating command signals are in delta modulation and including decoding means responsive to said first portion of said command signal indicating an update for decoding the delta modulated second portion of said command signal and changing said first set of synthesizer parameters.
4. A formant based speech synthesizer according to claim 3 wherein the first bit of the second portion of an update command signal further indicates the class of updates and said delta-modulation decoding means does not decode said first bit.
5. A formant based speech synthesizer according to claim 3 wherein said updating command signals include sound class transition command signals having a shorter bit length than said sound class initialization command signals.
6. A formant based speech synthesizer according to claim 5 wherein said determining means is responsive to the first portion of a transition command signal and said delta-modulation decoding means is responsive to the second portion of a transition command signal.
7. A formant based speech synthesizer according to claim 6 wherein the first bit of the second portion of a transition command signal further indicates the class of transition and said delta-modulation decoding means does not decode said first bit.
8. A formant based speech synthesizer according to claim 7 wherein for one class of transition command signal said delta-modulation decoding means decodes less than all of said second portion in addition to said first bit.
9. A formant based speech synthesizer according to claim 1, wherein said first portion of said command signal indicates a vowel or aspirate sound class, fricative or stop or pause sound class, nasal sound class, voiced fricative sound class or voice bar sound class.
10. A formant based speech synthesizer according to claim 9 wherein said second portion of a command signal for a vowel or aspirate sound class includes a zero frequency value for said pitch generator for an aspirate sound.
11. A formant based speech synthesizer according to claim 9 wherein said second portion of a command signal for a fricative or stop or pause sound class includes a fricative attenuator value, silence duration value and fill duration value and including means connected to said storing means and responsive to said first and second portion of said command signal identifying a fricative or stop or pause sound class for controlling said adjusting means for periods determined by said silence and fill duration values to produce fricative, or stop or pause sounds.
12. In a formant based speech synthesizer, including at least three formant filters, a pitch and turbulence generator, spectral filter, nasal zero and pole filters, glottal and fricative attenuators, and control means for controlling the configuration and parameters of the above elements, the improvement being said control means which comprises:
storing means for storing command signals each of which includes a first portion indicating the type of command signals and a second portion indicating values of a first set of synthesizer parameters;
determining means connected to said storing means and responsive to said first portion of said command signal for determining the configuration of said synthesizer to produce a class of sound;
producing means connected to said storing means and responsive to said second portion of said command signals for producing values for a second set of synthesizer parameters as a function of said values of said first set of synthesizer parameters;
said first set of synthesizer parameters including pitch generator frequency, first, second and third formant filter center frequencies, nasal zero filter center frequency and glottal and fricative attenuator amplitudes; and said second set of synthesizer parameters produced including first, second and third formant filter bandwidths, nasal pole filter center frequency and nasal pole and zero filter bandwidths; and
adjusting means connected to said storing means and said producing means for adjusting the operating characteristics of said synthesizer as a function of said first and second set of synthesizer parameters.
13. A formant based speech synthesizer according to claim 12 including a fourth formant filter having a fixed center frequency and bandwidth; and wherein the break frequency of said spectral filter is fixed.
US06526065 1983-08-24 1983-08-24 Speech data encoding scheme Expired - Lifetime US4703505A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US06526065 US4703505A (en) 1983-08-24 1983-08-24 Speech data encoding scheme


Publications (1)

Publication Number Publication Date
US4703505A (en) 1987-10-27

Family

ID=24095777



Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4833718A (en) * 1986-11-18 1989-05-23 First Byte Compression of stored waveforms for artificial speech
US5171930A (en) * 1990-09-26 1992-12-15 Synchro Voice Inc. Electroglottograph-driven controller for a MIDI-compatible electronic music synthesizer device
US5351338A (en) * 1992-07-06 1994-09-27 Telefonaktiebolaget L M Ericsson Time variable spectral analysis based on interpolation for speech coding
US5459813A (en) * 1991-03-27 1995-10-17 R.G.A. & Associates, Ltd Public address intelligibility system
US5633983A (en) * 1994-09-13 1997-05-27 Lucent Technologies Inc. Systems and methods for performing phonemic synthesis
US5664163A (en) * 1994-04-07 1997-09-02 Sony Corporation Image generating method and apparatus
US5699478A (en) * 1995-03-10 1997-12-16 Lucent Technologies Inc. Frame erasure compensation technique
US6754265B1 (en) * 1999-02-05 2004-06-22 Honeywell International Inc. VOCODER capable modulator/demodulator
US6993480B1 (en) 1998-11-03 2006-01-31 Srs Labs, Inc. Voice intelligibility enhancement system
US20060047506A1 (en) * 2004-08-25 2006-03-02 Microsoft Corporation Greedy algorithm for identifying values for vocal tract resonance vectors
US20090281807A1 (en) * 2007-05-14 2009-11-12 Yoshifumi Hirose Voice quality conversion device and voice quality conversion method
US20110170711A1 (en) * 2008-07-11 2011-07-14 Nikolaus Rettelbach Audio Encoder, Audio Decoder, Methods for Encoding and Decoding an Audio Signal, and a Computer Program
US8050434B1 (en) 2006-12-21 2011-11-01 Srs Labs, Inc. Multi-channel audio enhancement system
US9966084B2 (en) 2015-08-11 2018-05-08 Xiaomi Inc. Method and device for achieving object audio recording and electronic apparatus

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4199722A (en) * 1976-06-30 1980-04-22 Israel Paz Tri-state delta modulator
US4209836A (en) * 1977-06-17 1980-06-24 Texas Instruments Incorporated Speech synthesis integrated circuit device
US4301328A (en) * 1976-08-16 1981-11-17 Federal Screw Works Voice synthesizer
US4304965A (en) * 1979-05-29 1981-12-08 Texas Instruments Incorporated Data converter for a speech synthesizer
US4304964A (en) * 1978-04-28 1981-12-08 Texas Instruments Incorporated Variable frame length data converter for a speech synthesis circuit
US4441201A (en) * 1980-02-04 1984-04-03 Texas Instruments Incorporated Speech synthesis system utilizing variable frame rate


US8983851B2 (en) 2008-07-11 2015-03-17 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Noise filler, noise filling parameter calculator encoded audio signal representation, methods and computer program
US9043203B2 (en) 2008-07-11 2015-05-26 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder, audio decoder, methods for encoding and decoding an audio signal, and a computer program
US20110170711A1 (en) * 2008-07-11 2011-07-14 Nikolaus Rettelbach Audio Encoder, Audio Decoder, Methods for Encoding and Decoding an Audio Signal, and a Computer Program
US9449606B2 (en) 2008-07-11 2016-09-20 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder, audio decoder, methods for encoding and decoding an audio signal, and a computer program
US9711157B2 (en) 2008-07-11 2017-07-18 Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V. Audio encoder, audio decoder, methods for encoding and decoding an audio signal, and a computer program
US9966084B2 (en) 2015-08-11 2018-05-08 Xiaomi Inc. Method and device for achieving object audio recording and electronic apparatus

Similar Documents

Publication Publication Date Title
US6098036A (en) Speech coding system and method including spectral formant enhancer
US5305421A (en) Low bit rate speech coding system and compression
US5864801A (en) Methods of efficiently recording and reproducing an audio signal in a memory using hierarchical encoding
US5787387A (en) Harmonic adaptive speech coding method and system
US5377301A (en) Technique for modifying reference vector quantized speech feature signals
US5765127A (en) High efficiency encoding method
US4220819A (en) Residual excited predictive speech coding system
US4696039A (en) Speech analysis/synthesis system with silence suppression
US6963833B1 (en) Modifications in the multi-band excitation (MBE) model for generating high quality speech at low bit rates
US4833718A (en) Compression of stored waveforms for artificial speech
US5524172A (en) Processing device for speech synthesis by addition of overlapping wave forms
US5689615A (en) Usage of voice activity detection for efficient coding of speech
US4821324A (en) Low bit-rate pattern encoding and decoding capable of reducing an information transmission rate
US5774849A (en) Method and apparatus for generating frame voicing decisions of an incoming speech signal
US5774846A (en) Speech coding apparatus, linear prediction coefficient analyzing apparatus and noise reducing apparatus
US6078880A (en) Speech coding system and method including voicing cut off frequency analyzer
US5400434A (en) Voice source for synthetic speech system
US4979216A (en) Text to speech synthesis system and method using context dependent vowel allophones
US4815134A (en) Very low rate speech encoder and decoder
US5694521A (en) Variable speed playback system
US5630012A (en) Speech efficient coding method
Chen et al. Vector quantization of pitch information in Mandarin speech
US6647063B1 (en) Information encoding method and apparatus, information decoding method and apparatus and recording medium
US4696040A (en) Speech analysis/synthesis system with energy normalization and silence suppression
US5966689A (en) Adaptive filter and filtering method for low bit rate coding

Legal Events

Date Code Title Description
AS Assignment

Owner name: HARRIS CORPORATION, MELBORNE, FLA., 32919 A DE COR

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SEILER, NORMAN C.;WALKER, STEPHEN S.;REEL/FRAME:004167/0933

Effective date: 19830725

FPAY Fee payment

Year of fee payment: 4

FPAY Fee payment

Year of fee payment: 8

FPAY Fee payment

Year of fee payment: 12

AS Assignment

Owner name: INTERSIL CORPORATION, FLORIDA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:HARRIS CORPORATION;REEL/FRAME:010247/0043

Effective date: 19990813

AS Assignment

Owner name: CREDIT SUISSE FIRST BOSTON, AS COLLATERAL AGENT, N

Free format text: SECURITY INTEREST;ASSIGNOR:INTERSIL CORPORATION;REEL/FRAME:010351/0410

Effective date: 19990813