US3564142A - Method of multiplex speech synthesis - Google Patents
Method of multiplex speech synthesis
- Publication number
- US3564142A
- Authority
- US
- United States
- Prior art keywords
- speech
- line
- filters
- lines
- description
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04B—TRANSMISSION
- H04B1/00—Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission
- H04B1/66—Details of transmission systems, not covered by a single one of groups H04B3/00 - H04B13/00; Details of transmission systems not characterised by the medium used for transmission for reducing bandwidth of signals; for improving efficiency of transmission
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
Definitions
- the method provides for the storage of the time-sampled digital description of the transient behaviors of n spectrum channel bandpass filters. Only one such description is needed for synthesizing the speech signals for m speech lines.
- the transient responses of the band-pass filters are modulated by the frequency function for the given line.
- the modulated transient values are added for corresponding time samples and stored in a delay line for the given speech line.
- the stored value of a speech line is released at points in time defined by the excitation function, thus releasing a digital description of the transient response of the set of band-pass filters as if they were excited by a unit pulse and modulated by the frequency function of the given speech line.
- the digital description is demodulated to an analogue form by conventional means.
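The five steps above can be read as a table lookup followed by a weighted sum. The following Python sketch illustrates this under stated assumptions: the names `weighted_transient` and `release` are hypothetical, the sample values are made up, and writing one copy of the transient at each pulse time assumes that pulses are spaced at least one transient length apart, so copies do not overlap.

```python
def weighted_transient(filters, amplitudes):
    """Scale each stored filter transient by its channel amplitude and
    add the results sample by sample."""
    samples = len(filters[0])
    return [
        sum(a * h[i] for a, h in zip(amplitudes, filters))
        for i in range(samples)
    ]


def release(transient, pulse_times, length):
    """Place one copy of the transient at every excitation pulse time
    (assumes the copies do not overlap)."""
    out = [0] * length
    for start in pulse_times:
        for i, value in enumerate(transient):
            if start + i < length:
                out[start + i] = value
    return out


# One shared filter description (2 filters, 3 samples each); illustrative values.
FILTERS = [[1, 2, 0], [0, 1, 1]]

# Two speech lines reuse the same description with their own amplitudes
# and their own excitation pulse times.
line_a = release(weighted_transient(FILTERS, [2, 3]), [0, 4], 8)
line_b = release(weighted_transient(FILTERS, [1, 0]), [1], 8)
```

The point of the sketch is that `FILTERS` is stored once, while only the amplitudes and pulse times differ per speech line.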
- the invention relates to a method of channel vocoder multiplex speech synthesis of speech data stored in a data processor for a number of m speech lines.
- the known pulse-excited channel vocoder permits the ready derivation of signals for natural speech generation from data stored in a computer, utilizing in an efficient manner the storage available.
- the speech signals by means of filters, are divided into a number of frequency channels (aggregate or spectrum channels) and an excitation channel carrying the information relating to the basic speech wave.
- pulses are generated in the excitation channel of the speech analyzer, the time spacing of which is equivalent to the period of the basic speech wave just analyzed.
- the output signals of a noise generator are either applied to an excitation channel or a method is used which does not distinguish between voiced and unvoiced sounds.
- the speech signal of the excitation channel which is limited to a range from to 500 cps, is nonlinearly distorted due to the nonlinear characteristics of the elements used in the circuit consisting, in the main, of diodes.
- difference frequencies occur. These difference frequencies in the case of vowels, that means the voiced speech segments, in the transient state result in the fundamental frequency of the speech segment just analyzed.
- the main energy component lies within a frequency range exceeding 3,000 cps and difference frequencies occur which, behind the diodes, contain a distorted energy component in the range from some 20 to 500 cps, resulting in noiselike sound characteristics.
- the value of the speech energy present in the individual lines can, in a known manner, be transmitted in analogue or digital form or be stored for synthesizing the divided speech signal.
- the known method of speech signal synthesis in pulse excited recorders invariably starts from the concept that at certain times, for example initiated by the excitation pulses, the aggregate channel values are transmitted in the form of amplitude-modulated pulses to the corresponding channel filters of the synthesizer.
- the invention is characterized in that the description of the transient behavior of n aggregate channel filters is stored, that the values of this description of each aggregate channel filter are separately modulated with the frequency function of the same aggregate channel filter, added, subsequently stored and finally, at the times given by the speech excitation, the stored modulated values for each speech channel are separately called and demodulated.
- the method can in an advantageous manner, with the help of digital means, be performed so that the description of the transient behavior of the aggregate channel filters, as a digital representation of the values of k scanning points, is stored in a delay-line storage.
- the digital values of the frequency function for all scanning points of all n aggregate channels and all m speech channels are transmitted to another delay-line storage, at such times that the values associated with the two delay-line storages, without additional synchronization, are multiplied, added and, subsequently, through a distributor, separately transmitted to delay-line storages associated with each speech line.
- the digital data relating to speech excitation are transferred to a further delay-line storage which, through another distributor, separately controls the synchronous calling of the data for each speech line from the delay-line storages for transmission to the decoders.
- Another advantageous embodiment is characterized in that during each cycle of the delay-line storage, a line value in a counter is incremented by one until the counter has reached a predetermined value, thus causing a signal to be emitted from the corresponding delay-line storage to the associated decoder.
- the arrangement of the invention reduces in an advantageous manner the means required for each speech line, permits the dimensions of the vocoder filter set to be readily changed and, in addition, handles the conversion of a major portion of the speech description stored in the computer, particularly coordinating in time the transmission of the speech description to the speech synthesizer.
- FIG. 1 is a block diagram of typical operation of the method explained.
- FIG. 2 is a detailed representation of the block diagram of FIG. 1.
- FIG. 3 is a block diagram showing the excitation-controlled calling of information groups from the delay-line storages.
- the filter description permits the generation of a pulse code modulated (PCM) description, which in its turn can easily and simply be decoded in a known manner for analogue speech representation, by multiplying the time description of the filters by the applicable amplitude values of the aggregate function and by subsequently adding the filter channel values.
- PCM pulse code modulated
- the concept of the arrangement provides a block comprising 50 speech channels, having delay lines VOL, AGL, EXL and VL which allow a pulse frequency of 4.5 Mcps. Lower frequencies necessitate a different design of the system such as, for example, a parallel arrangement of the delay lines. For a greater number of speech lines additional blocks comprising 50 speech channels can be connected to the existing vocoder description (stored in VOL).
- the transient behavior of the filter set is described in coded form, and this description is dynamically stored in the delay line VOL.
- the transient behavior of the filters must be multiplied with the frequency function of the corresponding speech line.
- the changes in the frequency functions are low-frequency ones and can be described with adequate accuracy by a 25 cps wide frequency band.
- the frequency or aggregate information for a number of speech lines can be stored in a single delay-line storage AGL.
- the values stored in the delay-line storages VOL and AGL are multiplied by each other.
- the values of one filter and the factor of the frequency channel occur simultaneously on the multiplier arrangement MULT.
- the results of all frequency channels (generally 16 frequency channels are used) must be added.
- the result after this addition in the adder AD consists of a number of digits indicating the impulse response of the filter set multiplied by the current frequency function of the line, provided the filter set is excited by an individually selected pulse magnitude.
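A minimal Python sketch of the multiply-and-add step for one scanning point of one speech line. It assumes unsigned 4-bit filter values and 4-bit channel factors; the description gives the 4-bit widths but not the number coding, and the function name `scan_point_sum` is hypothetical. With 16 channels the sum can reach 16 × 15 × 15 = 3,600, which needs 12 bits. How the result is brought down to the 8-bit PCM value is not stated in the text and is left out here.

```python
CHANNELS = 16
MAX_4BIT = 15


def scan_point_sum(filter_values, channel_factors):
    """Multiply 16 filter values by 16 channel factors and add the products."""
    if len(filter_values) != CHANNELS or len(channel_factors) != CHANNELS:
        raise ValueError("expected 16 values per scanning point")
    for value in list(filter_values) + list(channel_factors):
        if not 0 <= value <= MAX_4BIT:
            raise ValueError("value does not fit in 4 bits")
    return sum(f * a for f, a in zip(filter_values, channel_factors))


# Largest possible sum and the word length it would need.
worst_case = scan_point_sum([MAX_4BIT] * CHANNELS, [MAX_4BIT] * CHANNELS)
bits_needed = worst_case.bit_length()
```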
- the coded representation of the speech must be stored in the delay-line storages VL1 to VLm.
- the information groups circulate in these storages, being emitted on the output at the times quantized by means of the kcps quantizing frequency of the speech excitation.
- the excitation of the filter set, that is, the calling of the contents of the delay-line storages VL1 to VLm, is controlled by means of the excitation information, which is stored in coded form for all speech lines in the delay-line storage EXL.
- a line value in the counter is incremented by one until the counter has reached a predetermined value, initiating the calling of a value in the corresponding delay-line storage for transmission to the associated decoder D
- This line must be so designed that it provides the scanning values in a delayed fashion.
- the pulse code modulated speech signal on the output of a delay-line storage VLi is subsequently converted in the associated decoder Di into an analogue speech signal.
- excitation pulses occurring at shorter intervals, every 5 msecs. (according to a max. fundamental frequency of less than 200 cps for the average male voice), have to be described accurately to 0.1 msec.
- Another prerequisite for a good speech quality in the PCM representation consists in 8 bits every 0.1 msec. being provided as a description.
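The two requirements above fix the data rate of one speech line. A short worked example in Python; the variable names are illustrative.

```python
SAMPLE_INTERVAL_MS = 0.1    # one PCM value every 0.1 msec
BITS_PER_SAMPLE = 8
MIN_PULSE_INTERVAL_MS = 5   # shortest spacing of excitation pulses

samples_per_second = round(1000 / SAMPLE_INTERVAL_MS)
bits_per_second = samples_per_second * BITS_PER_SAMPLE
samples_per_pulse_interval = round(MIN_PULSE_INTERVAL_MS / SAMPLE_INTERVAL_MS)

print(samples_per_second, bits_per_second, samples_per_pulse_interval)
```

The last figure equals the 50 scanning points stored per filter elsewhere in the description, so one stored transient appears to span exactly one shortest pulse interval.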
- the longest time interval to be considered is the interval at which the description of the aggregate functions of the 50 speech lines is transmitted from the data processor EDP to the multiplex speech synthesizer, that means 40.1 msecs. or 180,450 t, where t is the period time of one pulse in the delay lines. At a repetition frequency of 4.5 Mcps, one pulse period is 0.22 µsec.
- the time interval of 40.1 msecs., in its turn, is divided into 50 periods of 3,609 bits each, the individual bit times being referred to as t to r
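The timing figures quoted above can be checked with a few lines of Python. The constant names are illustrative; the values are those given in the description.

```python
PULSE_RATE_HZ = 4_500_000   # 4.5 Mcps
BITS_PER_PERIOD = 3_609     # bit times in one period
PERIODS_PER_FRAME = 50      # periods in one 40.1 msec interval

pulse_period_us = 1e6 / PULSE_RATE_HZ
frame_bits = PERIODS_PER_FRAME * BITS_PER_PERIOD
period_ms = BITS_PER_PERIOD * 1e3 / PULSE_RATE_HZ
frame_ms = frame_bits * 1e3 / PULSE_RATE_HZ

print(f"t = {pulse_period_us:.3f} usec")
print(f"one period = {BITS_PER_PERIOD} t = {period_ms:.3f} msec")
print(f"one frame  = {frame_bits} t = {frame_ms:.1f} msec")
```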
- the time t is the time at which the first information pulse is available on the lines A and B (FIG. 2).
- the description of the transients of 16 channel filters is dynamically stored in a delay-line arrangement VOL, fifty scanning points of 4 bits each describing one filter.
- the filter information is stored once and circulates in the delay-line storage VOL, unless a fault occurs, causing the circuit Q for the sum of all digits to respond, thus signalling the need for the vocoder description to be written in anew.
- After each 64 bit scanning time 8 blanks are provided enabling synchronization with the individual speech line delay lines VL having a 9 bit group length.
- the delay-line storage VOL is so designed that at t1, every 3,609 t, the values of a succeeding scanning point occur on the output line A. This is necessary so that all aggregate channel values of the 50 speech lines can be multiplied by the 50 scanning points of the filter description.
- An additional shift of 9 bits transfers the head of the information t, to the next group position in the delay-line storage VL.
- the description for the 50 speech lines is transmitted from the computer to the speech synthesizer.
- the information relating to the aggregate function is dynamically stored in the delay-line storage AGL so that the aggregate values of a speech line (16 × 4 = 64 bits) occur one after the other, the blocks of the speech channels being separated by 8 blanks (altogether 72 bits).
- the fifty blocks are succeeded by 9 further blanks, causing the aggregate function to be shifted by one group length of 9 bits in the delay-line storage VL.
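The layout of the storage AGL described above adds up to the 3,609-bit capacity quoted elsewhere in the description. The Python sketch below checks the sum and derives a bit offset for a given value. The function `agl_bit_offset` is hypothetical, and counting lines and channels from 0, with the 8 blanks placed after the 64 data bits of each block, is an assumption.

```python
CHANNELS = 16
BITS_PER_VALUE = 4
BLANKS_PER_BLOCK = 8
SPEECH_LINES = 50
TRAILING_BLANKS = 9

BLOCK_BITS = CHANNELS * BITS_PER_VALUE + BLANKS_PER_BLOCK   # 72
TOTAL_BITS = SPEECH_LINES * BLOCK_BITS + TRAILING_BLANKS    # 3609


def agl_bit_offset(line, channel):
    """First bit position of one channel value (line and channel from 0)."""
    if not 0 <= line < SPEECH_LINES or not 0 <= channel < CHANNELS:
        raise ValueError("line or channel out of range")
    return line * BLOCK_BITS + channel * BITS_PER_VALUE
```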
- the information arriving on the lines A and B is the corresponding information of the filter description, that means 16 channels described by 4 bits each.
- in the multiplier circuit MULT the binary product 4 bits × 4 bits is formed.
- the sixteen values, resulting in their entirety in a time value for the final speech signal, are added in the adder AD and fed to the corresponding speech line delay line VLi through a switch Si.
- the results of the adder are thus written into the delay lines VLi.
- the speech channel delay lines VL1 to VL50 are each 450 bits long, that means they can accommodate 50 groups of (8 + 1) bits, the positions of which are referred to as v1 n/m.
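The 9-bit shift mentioned above follows from the two loop lengths: a 3,609-bit cycle is eight full 450-bit circulations plus 9 bits, that is, one group. A short Python check with hypothetical names, assuming group positions are counted from 0:

```python
VL_BITS = 450        # length of one speech line delay line
GROUP_BITS = 9       # 8 data bits plus 1 control position
CYCLE_BITS = 3_609   # one cycle of the storages VOL and AGL

groups = VL_BITS // GROUP_BITS
full_turns, slip = divmod(CYCLE_BITS, VL_BITS)


def group_after(cycles):
    """Group position reached after a number of 3,609-bit cycles."""
    return (cycles * slip // GROUP_BITS) % groups
```

After 50 cycles the position returns to the first group, which matches the 50 scanning points written per speech line.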
- the first adding result is emitted by the speech line 1 and is stored in VL1 in position V1 1/1. Subsequently, after 72 t, a signal in VL position v1 2/9, occurs corresponding to the second of the 50 scanning values of VOL. The first scanning value, upon completion of the writing process, can be found in VL position v1 2/8.
- the following table 1 is a survey of the division of the speech lines 1 to 50 and shows the first group into which a scanning value is entered and to which the first group corresponds. Apart from this, it shows the position in which the first scanning value of the transient, which is derived from the output function, can be found.
- the information of the filter set is so stored in the delay-line storage VOL that 50 scanning points are described by 16 frequency values each (16 aggregate channels) of 4 bits.
- the information arriving first is the frequency value fl of the scanning point 1', (tv to tv,), followed by f to f of the scanning point 'r,. Then the frequency value f of the scanning point 1- and finallyf, of the scanning point 1 (tv to w occur. Every 64 bit values are followed by 8 blanks enabling a joint time pattern with the delay-line storages VL to VL
- FIG. 2 shows that the delay-line storage VOL, in which the filter information is stored, consists in the main of three partial delay-line storages VZ1 to VZ3 connected in series. The total number of bits which can be stored in this arrangement is identical to that of the delay-line storage AGL in which the values of 16 aggregate channels for speech lines are stored.
- the two delay-line storages have a capacity of 3609 bits.
- the bits are circulated in the delay-line storage arrangement VOL as hereafter described.
- the bit 1v, at the time I is on the output of the delay line VZ2, circulating in the latter in the same manner as the succeeding bits tv to tv
- the bit rv at the time 2 is written into the delay line VZ3 as the first information, appearing on the output of this line, that means on line A at the time t
- the bit 11 the last one of the filter description, at the time l arrives on the input of the delay line VZ3.
- This bit is immediately succeeded, at the time t,, by the bit tv from the delay line VZ2. In between the times t and 1 no bits are transferred to the input of the delay line VZ3 (9 blanks).
- the aggregate function for 50 speech lines circulates in the delay-line storage AGL, each speech line being described by 16 frequency values of 4 bits each. This information, in contrast to the storage VOL, is not shifted.
- at the multiplier circuit MULT, every 3,609 t, that means every 0.8 msec., the full description of a value of the aggregate function of 50 speech lines is available. After 40.1 msecs., in accordance with the slow change in the aggregate function, it is replaced by new values.
- the description circulating in the storage VOL represents, as already mentioned, the scanning values of the transient, this means the response to the standard excitation pulse sampled at 50 instants of time for the 16 filters of a vocoder aggregate filter set.
- Each result constitutes a scanning value at the scanning time τ of a line and is transferred to the delay-line storage VLi corresponding to the line.
- the output signal of the delay line VLi, which is transferred to the associated decoder DEC, is shown in FIG. 3, explaining in particular the conditions for the line 1.
- Parallel scanning values occur on the input AG 1 on 8 lines at the times t(65 m X 50 X 72). These values are taken up by a static storage BR and, shifted by 9 t, written into the delay line VL through the OR gate O5.
- the signal written in circulates in the loop formed by VL and VL, at a period of 450 t, the control signal TIN so controlling the processes that during the writing of new information the information in the loop is suppressed. In this manner the possibility of the first pulse of a group of 9 pulses being transmitted is prevented on the AND gate U8. This position is reserved for a control pulse.
- a signal appears on the input EX1, and a scanning process is initiated during which the pulse of DU initially written into the first position of VL1 is shifted by 9 t at a period of 450 t.
- This pulse rather than being transferred through the partial delay line VL',, is written back through 06, n9, DLY, FF,, U and 05 via a loop shorter by 9 bits.
- this process is terminated by a delay arrangement TF. The times at which a pulse occurs on EX1, every 5 msec.
- the succeeding table 4 shows the distribution of the pulse groups and is a survey of the principal timing pulses required for controlling the delay-line storages.
- the control switch STS controls the arrangement that the pulse groups inf to inl reach the output of the delay-line storage only when a pulse from EX1 circulated in the line VL and, on the other, when this pulse, bypassing VL,, is shifted from block to block.
- the synchronization in time of AG and EX for all 50 lines is ensured by the pulse on EXm occurring at the time at which the first scanning value for line m appears on the line input.
- a shift by 7 positions of the first scanning value, that means 56 t, occurs between two neighboring lines.
- the selective switch S in FIG. 1 is so designed that the channel sequence corresponds to the sequence of the first scanning value in table 1 (that means EX1, EX8, EX15 etc.).
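The stated shift of 7 positions between neighboring lines suggests a simple rule for the switch order. The Python sketch below is an assumption and is not taken from table 1, which is not reproduced here: it continues the sequence 1, 8, 15 in steps of 7 and wraps around at 50. Because 7 and 50 share no common factor, such a rule would visit every line exactly once.

```python
LINES = 50
STEP = 7


def switch_order(lines=LINES, step=STEP):
    """Line numbers (counted from 1) in steps of `step`, wrapping at `lines`."""
    return [(i * step) % lines + 1 for i in range(lines)]


order = switch_order()
print(order[:10])
```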
- the time function TIN for controlling the writing into the inputs AGn of the lines n must be staggered by 72 t, switch S of FIG. 1 advancing according to the order of speech lines 1, 2 ... 50.
- Method of vocoder multiplex speech synthesis of speech data stored in a data processor comprising the steps of:
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Computational Linguistics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Time-Division Multiplex Systems (AREA)
- Telephonic Communication Services (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AT723167A AT276495B (de) | 1967-08-03 | 1967-08-03 | Verfahren zur Multiplex-Sprachsynthese |
Publications (1)
Publication Number | Publication Date |
---|---|
US3564142A (en) | 1971-02-16 |
Family
ID=3593978
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US748745A Expired - Lifetime US3564142A (en) | 1967-08-03 | 1968-07-30 | Method of multiplex speech synthesis |
Country Status (6)
Country | Link |
---|---|
US (1) | US3564142A |
JP (1) | JPS5211161B1 |
AT (1) | AT276495B |
DE (1) | DE1762677A1 |
FR (1) | FR1577550A |
GB (1) | GB1227578A |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4363050A (en) * | 1980-07-28 | 1982-12-07 | Rca Corporation | Digitized audio record and playback system |
WO1997009712A3 (en) * | 1995-09-05 | 1997-04-10 | Frank Uldall Leonhard | Method and system for processing auditory signals |
US20240313807A1 (en) * | 2023-03-16 | 2024-09-19 | International Business Machines Corporation | Separable, intelligible, single channel voice communication |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FR2544901B1 (fr) * | 1983-04-20 | 1986-02-21 | Zurcher Jean Frederic | Vocodeur a canaux muni de moyens de compensation des modulations parasites du signal de parole synthetise |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3303335A (en) * | 1963-04-25 | 1967-02-07 | Cabell N Pryor | Digital correlation system having an adjustable impulse generator |
-
1967
- 1967-08-03 AT AT723167A patent/AT276495B/de active
-
1968
- 1968-06-26 FR FR1577550D patent/FR1577550A/fr not_active Expired
- 1968-07-29 GB GB1227578D patent/GB1227578A/en not_active Expired
- 1968-07-30 US US748745A patent/US3564142A/en not_active Expired - Lifetime
- 1968-08-02 DE DE19681762677 patent/DE1762677A1/de active Pending
- 1968-08-02 JP JP43054380A patent/JPS5211161B1/ja active Pending
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US3303335A (en) * | 1963-04-25 | 1967-02-07 | Cabell N Pryor | Digital correlation system having an adjustable impulse generator |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4363050A (en) * | 1980-07-28 | 1982-12-07 | Rca Corporation | Digitized audio record and playback system |
WO1997009712A3 (en) * | 1995-09-05 | 1997-04-10 | Frank Uldall Leonhard | Method and system for processing auditory signals |
US20240313807A1 (en) * | 2023-03-16 | 2024-09-19 | International Business Machines Corporation | Separable, intelligible, single channel voice communication |
US12255671B2 (en) * | 2023-03-16 | 2025-03-18 | International Business Machines Corporation | Separable, intelligible, single channel voice communication |
Also Published As
Publication number | Publication date |
---|---|
AT276495B (de) | 1969-11-25 |
FR1577550A | 1969-08-08 |
JPS5211161B1 | 1977-03-29 |
GB1227578A | 1971-04-07 |
DE1762677A1 (de) | 1970-09-17 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US3662115A (en) | Audio response apparatus using partial autocorrelation techniques | |
US3823390A (en) | Musical tone wave shape generating apparatus | |
US4677499A (en) | Digital time base corrector | |
US4121058A (en) | Voice processor | |
JPS6131658B2 | ||
JPH07101840B2 (ja) | ディジタル雑音信号発生回路 | |
US3831167A (en) | Digital-to-analog conversion using multiple decoders | |
US3566035A (en) | Real time cepstrum analyzer | |
US4021616A (en) | Interpolating rate multiplier | |
US3403227A (en) | Adaptive digital vocoder | |
US3789144A (en) | Method for compressing and synthesizing a cyclic analog signal based upon half cycles | |
US3752970A (en) | Digital attenuator | |
US4319084A (en) | Multichannel digital speech synthesizer | |
US3564142A (en) | Method of multiplex speech synthesis | |
US3069507A (en) | Autocorrelation vocoder | |
US4062060A (en) | Digital filter | |
US4122743A (en) | Electronic musical instrument with glide | |
US3908114A (en) | Digital Hilbert transformation system | |
US4064363A (en) | Vocoder systems providing wave form analysis and synthesis using fourier transform representative signals | |
GB2103005A (en) | Modulation effect device | |
US3703609A (en) | Noise signal generator for a digital speech synthesizer | |
US3697699A (en) | Digital speech signal synthesizer | |
US4058682A (en) | Expandable memory for PCM signal transmission | |
US4163871A (en) | Digital CVSD telephone conference circuit | |
US3435147A (en) | Adaptive data modem whereby digital data is encoded in time division format and converted to frequency division |