US3564142A - Method of multiplex speech synthesis - Google Patents

Method of multiplex speech synthesis

Info

Publication number
US3564142A
Authority
US
United States
Prior art keywords
speech
line
filters
lines
description
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
US748745A
Other languages
English (en)
Inventor
Ernst H Rothauser
Kurt F Bandat
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Application granted
Publication of US3564142A
Anticipated expiration
Expired - Lifetime

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B1/00 Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission
    • H04B1/66 Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission for reducing bandwidth of signals; for improving efficiency of transmission
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders

Definitions

  • the method provides for the storage of the time-sampled digital description of the transient behaviors of n spectrum channel bandpass filters. Only one such description is needed for synthesizing the speech signals for m speech lines.
  • the transient responses of the band-pass filters are modulated by the frequency function for the given line.
  • the modulated transient values are added for corresponding time samples and stored in a delay line for the given speech line.
  • the stored value of a speech line is released at points in time defined by the excitation function, thus releasing a digital description of the transient response of the set of band-pass filters as if they were excited by a unit pulse and modulated by the frequency function of the given speech line.
  • the digital description is demodulated to an analogue form by conventional means.
  • the invention relates to a method of channel vocoder multiplex speech synthesis of speech data stored in a data processor for a number of m speech lines.
  • the known pulse-excited channel vocoder permits the ready derivation of signals for natural speech generation from data stored in a computer, utilizing in an efficient manner the storage available.
  • the speech signals by means of filters, are divided into a number of frequency channels (aggregate or spectrum channels) and an excitation channel carrying the information relating to the basic speech wave.
  • pulses are generated in the excitation channel of the speech analyzer,
  • the time spacing of which is equivalent to the period of the basic speech wave just analyzed.
  • the output signals of a noise generator are either applied to an excitation channel or a method is used which does not distinguish between voiced and unvoiced sounds.
  • the speech signal of the excitation channel which is limited to a range from to 500 cps, is nonlinearly distorted due to the nonlinear characteristics of the elements used in the circuit consisting, in the main, of diodes.
  • difference frequencies occur. These difference frequencies in the case of vowels, that means the voiced speech segments, in the transient state result in the fundamental frequency of the speech segment just analyzed.
  • the main energy component lies within a frequency range exceeding 3,000 cps and difference frequencies occur which, behind the diodes, contain a distorted energy component in the range from some 20 to 500 cps, resulting in noiselike sound characteristics.
  • the value of the speech energy present in the individual lines can, in a known manner, be transmitted in analogue or digital form or be stored for synthesizing the divided speech signal.
  • the known method of speech signal synthesis in pulse excited recorders invariably starts from the concept that at certain times, for example initiated by the excitation pulses, the aggregate channel values are transmitted in the form of amplitude modulated pulse to the corresponding channel filters of the synthesizer.
  • the invention is characterized in that the description of the transient behavior of n aggregate channel filters is stored, that the values of this description of each aggregate channel filter is separately modulated with the frequency function of the same aggregate channel filter, added, subsequently stored and finally at the times given by the speech excitation the stored modulated values for each speech channel are separately called and demodulated.
  • the method can, in an advantageous manner and with the help of digital means, be performed so that the description of the transient behavior of the aggregate channel filters, as a digital representation of the values of k scanning points, is stored in a delay-line storage.
  • the digital values of the frequency function for all scanning points of all n aggregate channels and all m speech channels are transmitted to another delay-line storage, at such times that the values associated with the two delay-line storages, without additional synchronization, are multiplied, added and, subsequently, through a distributor, are separately transmitted to delay-line storages associated with each speech line.
  • the digital data relating to speech excitation are transferred to a further delay-line storage which, through another distributor, separately controls the synchronous calling of the data for each speech line from the delay-line storages for transmission to the decoders.
  • Another advantageous embodiment is characterized in that during each cycle of the delay-line storage, a line value in a counter is incremented by one until the counter has reached a predetermined value, thus causing a signal to be emitted from the corresponding delay-line storage to the associated decoder.
  • the arrangement of the invention reduces in an advantageous manner the means required for each speech line, permits the dimensions of the vocoder filter set to be readily changed and, in addition, handles the conversion of a major portion of the speech description stored in the computer, particularly coordinating in time the transmission of the speech description to the speech synthesizer.
  • FIG. 1 is a block diagram of typical operation of the method explained.
  • FIG. 2 is a detailed representation of the block diagram of FIG. 1.
  • FIG. 3 is a block diagram showing the excitation-controlled calling of information groups from the delay-line storages.
  • the filter description permits the generation of a pulse code modulated (PCM) description, which in its turn can easily and simply be decoded in a known manner for analogue speech representation, by multiplying the time description of the filters by the applicable amplitude values of the aggregate function and by subsequently adding the filter channel values.
  • the concept of the arrangement provides a block comprising 50 speech channels, having delay lines VOL, AGL, EXL and VL which allow a pulse frequency of 4.5 Mcps. Lower frequencies necessitate a different design of the system such as, for example, a parallel arrangement of the delay lines. For a greater number of speech lines additional blocks comprising 50 speech channels can be connected to the existing vocoder description (stored in VOL).
  • the transient behavior of the filter set is described in coded form, and this description is dynamically stored in the delay line VOL.
  • the transient behavior of the filters must be multiplied with the frequency function of the corresponding speech line.
  • the changes in the frequency functions are low-frequency ones and can be described with adequate accuracy by a 25 cps wide frequency band.
  • the frequency or aggregate information for a number of speech lines can be stored in a single delay-line storage AGL.
  • the values stored in the delay-line storages VOL and AGL are multiplied by each other.
  • the values of one filter and the factor of the frequency channel occur simultaneously on the multiplier arrangement MULT.
  • the results of all frequency channels (generally 16 frequency channels are used) must be added.
  • the result after this addition in the adder AD consists of a number of digits indicating the impulse response of the filter set multiplied by the current frequency function of the line, provided the filter set is excited by an individually selected pulse magnitude.
  • the coded representation of the speech must be stored in the delay-line storages VL1 to VLm.
  • the information groups circulate in these storages, being emitted on the output at the times quantized by means of the kcps quantizing frequency of the speech excitation.
  • the excitation of the filter set, that is the calling of the contents of the delay-line storages VL1 to VLm, is controlled by means of the excitation information which, in coded form for all speech lines, is stored in the delay-line storage EXL.
  • a line value in the counter is incremented by one until the counter has reached a predetermined value, initiating the calling of a value in the corresponding delay-line storage for transmission to the associated decoder D
  • This line must be so designed that it provides the scanning values in a delayed fashion.
  • the pulse code modulated speech signal on the output of a delay-line storage VLi is subsequently converted in the associated decoder Di into an analogue speech signal.
  • excitation pulses occurring at shorter intervals, every 5 msecs. (according to a max. fundamental frequency of less than 200 cps for the average male voice), have to be described accurately to 0.1 msec.
  • Another prerequisite for a good speech quality in the PCM representation consists in 8 bits every 0.1 msec. being provided as a description.
  • the longest time interval to be considered is the interval at which the description of the aggregate functions of the 50 speech lines is transmitted from the data processor EDP to the multiplex speech synthesizer, that means 40.1 msecs. or 180,450 t, where t is the period time of one pulse in the delay lines. At a repetition frequency of 4.5 Mcps, one pulse period is 0.22 μsec.
  • the time interval of 40.1 msecs., in its turn, is divided into 50 periods of 3,609 bits each, the individual bit times being referred to as t to r
  • the time t is the time at which the first information pulse is available on the lines A and B (FIG. 2).
  • the description of the transients of 16 channel filters is dynamically stored in a delay-line arrangement VOL, fifty scanning points of 4 bits each describing one filter.
  • the filter information is stored once and circulates in the delay-line storage VOL, unless a fault occurs, causing the circuit Q for the sum of all digits to respond, thus signalling the need for the vocoder description to be written in anew.
  • After each 64 bit scanning time 8 blanks are provided enabling synchronization with the individual speech line delay lines VL having a 9 bit group length.
  • the delay-line storage VOL is so designed that at 1, every 3,609 t, the values of a succeeding scanning point occur on the output line A. This is necessary so that all aggregate channel values of the 50 speech lines can be multiplied by the 50 scanning points of the filter description.
  • An additional shift of 9 bits transfers the head of the information t, to the next group position in the delay-line storage VL.
  • the description for the 50 speech lines is transmitted from the computer to the speech synthesizer.
  • the information relating to the aggregate function is dynamically stored in the delay-line storage AGL so that the aggregate values of a speech line (16 × 4 = 64 bits) occur one after the other, the blocks of the speech channels being separated by 8 blanks (altogether 72 bits).
  • the fifty blocks are succeeded by 9 further blanks, causing the aggregate function to be shifted by one group length of 9 bits in the delay-line storage VL.
  • the information arriving on the lines A and B is the corresponding information of the filter description, that means 16 channels described by 4 bits each.
  • in the multiplier circuit MULT the binary product 4 bits × 4 bits is formed.
  • the sixteen values, resulting in their entirety in a time value for the final speech signal, are added in the adder AD and fed to the corresponding speech line delay line VLi through a switch S.
  • the results of the adder are thus written into the delay lines VLi.
  • the speech channel delay lines VL1 to VL50 are each 450 bits long, that means they can accommodate 50 groups of (8 + 1) bits, the positions of which are referred to as v1 n/m.
  • the first adding result is emitted by the speech line 1 and is stored in VL1 in position V1 1/1. Subsequently, after 72 t, a signal in VL position v1 2/9, occurs corresponding to the second of the 50 scanning values of VOL. The first scanning value, upon completion of the writing process, can be found in VL position v1 2/8.
  • the following table 1 is a survey of the division of the speech lines 1 to 50 and shows the first group into which a scanning value is entered and to which the first group corresponds. Apart from this, it shows the position in which the first scanning value of the transient, which is derived from the output function, can be found.
  • the information of the filter set is so stored in the delay-line storage VOL that 50 scanning points are described by 16 frequency values each (16 aggregate channels) of 4 bits.
  • the information arriving first is the frequency value fl of the scanning point 1', (tv to tv,), followed by f to f of the scanning point 'r,. Then the frequency value f of the scanning point 1- and finallyf, of the scanning point 1 (tv to w occur. Every 64 bit values are followed by 8 blanks enabling a joint time pattern with the delay-line storages VL to VL
  • FIG. 2 shows that the delay-line storage VOL, in which the filter information is stored, consists in the main of three partial delay-line storages VZ1 to VZ3 connected in series. The total number of bits which can be stored in this arrangement is identical to that of the delay-line storage AGL in which the values of 16 aggregate channels for speech lines are stored.
  • the two delay-line storages have a capacity of 3609 bits.
  • the bits are circulated in the delay-line storage arrangement VOL as hereafter described.
  • the bit 1v, at the time I is on the output of the delay line VZ2, circulating in the latter in the same manner as the succeeding bits tv to tv
  • the bit rv at the time 2 is written into the delay line VZ3 as the first information, appearing on the output of this line, that means on line A at the time t
  • the bit 11 the last one of the filter description, at the time l arrives on the input of the delay line VZ3.
  • This bit is immediately succeeded, at the time t,, by the bit tv from the delay line VZ2. In between the times t and 1 no bits are transferred to the input of the delay line VZ3 (9 blanks).
  • the aggregate function for 50 speech lines circulates in the delay-line storage AGL, each speech line being described by 16 frequency values of 4 bits each. This information, in contrast to the storage VOL, is not shifted.
  • at the multiplier circuit MULT, every 3609 t, that means every 0.8 msec., the full description of a value of the aggregate function of 50 speech lines is available. After 40.1 msecs., in accordance with the slow change in the aggregate function, it is replaced by new values.
  • the description circulating in the storage VOL represents, as already mentioned, the scanning values of the transient, this means the response to the standard excitation pulse sampled at 50 instants of time for the 16 filters of a vocoder aggregate filter set.
  • Each result constitutes a scanning value at the scanning time τ of a line and is transferred to the delay-line storage VLi corresponding to the line.
  • the output signal of the delay line VLi, which is transferred to the associated decoder DEC, is shown in FIG. 3, explaining in particular the conditions for the line 1.
  • Parallel scanning values occur on the input AG 1 on 8 lines at the times t(65 m X 50 X 72). These values are taken up by a static storage BR and, shifted by 9 t, written into the delay line VL through the OR gate 05.
  • the signal written in circulates in the loop formed by VL and VL, at a period of 450 t, the control signal TlN so controlling the processes that during the writing of new information the information in the loop is suppressed. In this manner the possibility of the first pulse of a group of 9 pulses being transmitted is prevented on the AND gate U8. This position is reserved for a control pulse.
  • a signal appears on the input EX1, and a scanning process is initiated during which the pulse of DU initially written into the first position of VL1 is shifted by 9 t at a period of 450 t.
  • This pulse rather than being transferred through the partial delay line VL',, is written back through 06, n9, DLY, FF,, U and 05 via a loop shorter by 9 bits.
  • this process is terminated by a delay arrangement TF The times at which a pulse occurs on EX1, every 5 msec.
  • the succeeding table 4 shows the distribution of the pulse groups and is a survey of the principal timing pulses required for controlling the delay-line storages.
  • the control switch STS controls the arrangement that the pulse groups inf to inl reach the output of the delay-line storage only when a pulse from EXl circulated in the line VL and, on the other, when this pulse, bypassing VL,, is shifted from block to block.
  • the synchronization in time of AG and EX for all 50 lines is ensured by the pulse on EXm occurring at the time at which the first scanning value for line m appears on the line input.
  • a shift by 7 positions of the first scanning value, that means 56 t, occurs between two neighboring lines.
  • the selective switch S in FIG. 1 is so designed that the channel sequence corresponds to the sequence of the first scanning value in table 1 (that means EX1, EX8, EX15 etc.).
  • the time function TlN for controlling the writing into the inputs AGn of the lines n must be staggered by 72 t, switch S of FIG. 1 advancing according to the order of speech lines 1, 2 ... 50.
  • Method of vocoder multiplex speech synthesis of speech data stored in a data processor comprising the steps of:
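
The signal flow described in the definitions above (one stored filter description shared by all speech lines, per-line multiplication by the channel amplitudes, addition per time sample, and release at the excitation times) can be illustrated with a minimal Python sketch. This is not the patented delay-line circuit: the function names, the list-based data layout, and the choice to add responses that overlap in time are assumptions made only for illustration.

```python
from typing import List, Sequence, Tuple


def combined_response(
    filter_responses: Sequence[Sequence[float]],
    channel_amplitudes: Sequence[float],
) -> List[float]:
    """Weight each filter's sampled impulse response by its channel amplitude
    and add the weighted values for corresponding time samples."""
    if len(filter_responses) != len(channel_amplitudes):
        raise ValueError("need one amplitude per filter channel")
    n_samples = len(filter_responses[0])
    return [
        sum(a * h[k] for a, h in zip(channel_amplitudes, filter_responses))
        for k in range(n_samples)
    ]


def synthesize_line(
    filter_responses: Sequence[Sequence[float]],
    frames: Sequence[Sequence[float]],
    pulse_times: Sequence[int],
    n_out: int,
) -> List[float]:
    """Release one combined response at every excitation pulse of one line.

    frames[i] holds the channel amplitudes valid at pulse_times[i]
    (hypothetical layout). Overlapping responses are added, which is an
    assumption of this sketch, not a statement from the source.
    """
    out = [0.0] * n_out
    for start, amplitudes in zip(pulse_times, frames):
        response = combined_response(filter_responses, amplitudes)
        for k, value in enumerate(response):
            if start + k < n_out:
                out[start + k] += value
    return out


def synthesize_lines(
    filter_responses: Sequence[Sequence[float]],
    lines: Sequence[Tuple[Sequence[Sequence[float]], Sequence[int]]],
    n_out: int,
) -> List[List[float]]:
    """Reuse the single filter description for every speech line."""
    return [
        synthesize_line(filter_responses, frames, pulses, n_out)
        for frames, pulses in lines
    ]


if __name__ == "__main__":
    # Two toy filters with two samples each (the source uses 16 filters
    # and 50 scanning points).
    filters = [[1.0, 0.5], [0.0, 1.0]]
    print(synthesize_line(filters, [[2, 3], [1, 0]], [0, 3], 5))
```

The point of the design is visible in `synthesize_lines`: the filter description is passed in once and only the amplitude frames and pulse times differ per line, which mirrors the statement that only one such description is needed for m speech lines.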

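The timing figures quoted in the definitions (72-bit blocks, 3,609-bit circulation, 180,450 pulse periods, 0.22 μsec, 0.8 msec and 40.1 msecs.) can be cross-checked with a short calculation. The sketch below is only an arithmetic check; the function name, the parameter names and the reading of the "50 periods" as one circulation per scanning point are assumptions, not terms from the source.

```python
def timing_budget(
    pulse_rate_hz: float = 4.5e6,   # 4.5 Mcps repetition frequency
    lines: int = 50,                # speech lines per block
    channels: int = 16,             # aggregate (spectrum) channels
    bits_per_value: int = 4,        # bits per channel value
    block_blanks: int = 8,          # blanks after each 64-bit block
    trailing_blanks: int = 9,       # blanks after the fifty blocks
    scanning_points: int = 50,      # samples per filter description
    sample_bits: int = 8,           # PCM bits per output sample
) -> dict:
    """Recompute the bit counts and durations stated in the description."""
    block_bits = channels * bits_per_value + block_blanks   # 64 + 8
    cycle_bits = lines * block_bits + trailing_blanks       # one circulation
    frame_bits = scanning_points * cycle_bits               # full update interval
    return {
        "block_bits": block_bits,
        "cycle_bits": cycle_bits,
        "frame_bits": frame_bits,
        "pulse_period_us": 1e6 / pulse_rate_hz,
        "cycle_ms": cycle_bits / pulse_rate_hz * 1e3,
        "frame_ms": frame_bits / pulse_rate_hz * 1e3,
        # each speech-line delay line holds 50 groups of (8 + 1) bits
        "vl_bits": scanning_points * (sample_bits + 1),
    }


if __name__ == "__main__":
    for name, value in timing_budget().items():
        print(f"{name}: {value}")
```

With the default values the bit counts come out at 72, 3,609, 180,450 and 450, and the durations round to 0.22 μsec, 0.8 msec and 40.1 msecs., in agreement with the figures in the text.
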
Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Computational Linguistics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Time-Division Multiplex Systems (AREA)
  • Telephonic Communication Services (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
US748745A 1967-08-03 1968-07-30 Method of multiplex speech synthesis Expired - Lifetime US3564142A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
AT723167A AT276495B (de) 1967-08-03 1967-08-03 Verfahren zur Multiplex-Sprachsynthese

Publications (1)

Publication Number Publication Date
US3564142A (en) 1971-02-16

Family

ID=3593978

Family Applications (1)

Application Number Title Priority Date Filing Date
US748745A Expired - Lifetime US3564142A (en) 1967-08-03 1968-07-30 Method of multiplex speech synthesis

Country Status (6)

Country Link
US (1) US3564142A
JP (1) JPS5211161B1
AT (1) AT276495B
DE (1) DE1762677A1
FR (1) FR1577550A
GB (1) GB1227578A

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4363050A (en) * 1980-07-28 1982-12-07 Rca Corporation Digitized audio record and playback system
WO1997009712A3 (en) * 1995-09-05 1997-04-10 Frank Uldall Leonhard Method and system for processing auditory signals
US20240313807A1 (en) * 2023-03-16 2024-09-19 International Business Machines Corporation Separable, intelligible, single channel voice communication

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR2544901B1 (fr) * 1983-04-20 1986-02-21 Zurcher Jean Frederic Vocodeur a canaux muni de moyens de compensation des modulations parasites du signal de parole synthetise

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3303335A (en) * 1963-04-25 1967-02-07 Cabell N Pryor Digital correlation system having an adjustable impulse generator

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4363050A (en) * 1980-07-28 1982-12-07 Rca Corporation Digitized audio record and playback system
WO1997009712A3 (en) * 1995-09-05 1997-04-10 Frank Uldall Leonhard Method and system for processing auditory signals
US20240313807A1 (en) * 2023-03-16 2024-09-19 International Business Machines Corporation Separable, intelligible, single channel voice communication
US12255671B2 (en) * 2023-03-16 2025-03-18 International Business Machines Corporation Separable, intelligible, single channel voice communication

Also Published As

Publication number Publication date
AT276495B (de) 1969-11-25
FR1577550A 1969-08-08
JPS5211161B1 1977-03-29
GB1227578A 1971-04-07
DE1762677A1 (de) 1970-09-17

Similar Documents

Publication Publication Date Title
US3662115A (en) Audio response apparatus using partial autocorrelation techniques
US3823390A (en) Musical tone wave shape generating apparatus
US4677499A (en) Digital time base corrector
US4121058A (en) Voice processor
JPS6131658B2
JPH07101840B2 (ja) Digital noise signal generating circuit
US3831167A (en) Digital-to-analog conversion using multiple decoders
US3566035A (en) Real time cepstrum analyzer
US4021616A (en) Interpolating rate multiplier
US3403227A (en) Adaptive digital vocoder
US3789144A (en) Method for compressing and synthesizing a cyclic analog signal based upon half cycles
US3752970A (en) Digital attenuator
US4319084A (en) Multichannel digital speech synthesizer
US3564142A (en) Method of multiplex speech synthesis
US3069507A (en) Autocorrelation vocoder
US4062060A (en) Digital filter
US4122743A (en) Electronic musical instrument with glide
US3908114A (en) Digital Hilbert transformation system
US4064363A (en) Vocoder systems providing wave form analysis and synthesis using fourier transform representative signals
GB2103005A (en) Modulation effect device
US3703609A (en) Noise signal generator for a digital speech synthesizer
US3697699A (en) Digital speech signal synthesizer
US4058682A (en) Expandable memory for PCM signal transmission
US4163871A (en) Digital CVSD telephone conference circuit
US3435147A (en) Adaptive data modem whereby digital data is encoded in time division format and converted to frequency division