EP0051462A2 - Speech processing device (Dispositif de traitement de la parole) - Google Patents


Info

Publication number
EP0051462A2
Authority
EP
European Patent Office
Prior art keywords
digital
filter
processor according
input
speech
Prior art date
Legal status
Withdrawn
Application number
EP81305149A
Other languages
German (de)
English (en)
Other versions
EP0051462A3 (fr)
Inventor
Philip T. Mclaughlin
Current Assignee
Arris Technology Inc
Original Assignee
Arris Technology Inc
General Instrument Corp
Priority date
1980-11-03
Filing date
1981-10-29
Publication date
1982-05-12
Application filed by Arris Technology Inc and General Instrument Corp
Publication of EP0051462A2
Publication of EP0051462A3
Current legal status: Withdrawn

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00

Definitions

  • This invention relates to the synthesis of human speech. More particularly, in a preferred embodiment, it relates to methods and apparatus for synthesizing human speech using a hybrid synthesis technique.
  • The four basic techniques for synthesizing human speech (phoneme synthesis, formant synthesis, linear predictive coding, and waveform digitization) can be compared against three goals: understandability, voice quality, and bit requirement (cost).
  • In phoneme synthesis, the output voice is clearly understandable.
  • However, the voice quality is robotic: there is only one voice, and it is clearly identifiable as an artificial source.
  • The bit requirement is fairly low, about 120 bits per second of speech.
  • In formant synthesis, understandability is good.
  • Quality is better than in phoneme synthesis, and the technique can produce voices that are distinguishable as male or female.
  • The bit requirement is 400-800 bits per second.
  • In linear predictive coding (LPC), understandability is the same as in formant synthesis.
  • Quality can be very high, to the point that an individual speaker's voice is easily recognized, but higher quality requires more bits: typically between 1,200 and 3,000 bits per second of speech.
  • Waveform digitization with compression spans a very broad range on all three goals.
  • Understandability can range from very good to very poor.
  • Quality extends over the same broad range. This reflects the cost, i.e. the number of bits required, which varies from approximately 1,000 to 5,000 bits per second of speech; the best quality and understandability come with the most bits.
  • The present invention provides an inexpensive and very flexible speech synthesizer capable of high understandability, a range of quality from acceptable to highest, and a bit rate adjustable on the chip from 500 to 3,000 bits per second of speech.
  • The entire synthesizer can be built on a single chip, whereas prior circuits required multiple chips. This has an important effect on cost: a single-chip synthesizer costs significantly less than a multiple-chip one.
  • An advantage of the present invention is that it uses formant synthesis for voiced (vowel) sounds and LPC for unvoiced sounds, and both codings can be used within the same word. This reduces the memory size and the number of bits needed to produce the same sound.
  • A further advantage is that more memory can be dedicated to a particular sound or group of sounds, allowing the quality of the synthesizer's output to be increased or adjusted.
  • The present invention is a technique for synthesizing human speech that is relatively inexpensive, overcomes the deficiencies of the prior art, and is suitable for fabrication on a single VLSI chip.
  • The invention comprises a digital, fixed-repertoire processor for generating human speech in response to a sequence of n-bit digital command words input to it.
  • The processor comprises means for electronically modelling the behavior of the human vocal tract, and means, connected to the source of the incoming digital command words, for controlling the operation of the modelling means and thereby the speech generated by an analog signal generating means associated with the vocal tract modelling means.
  • The speech processor disclosed and claimed herein is intended for product applications that require the generation of synthetic speech or complex sounds.
  • In the preferred embodiment, the speech processor is realized as an N-channel, metal-gate LSI device.
  • One skilled in the art will realize, however, that other implementations are possible.
  • The speech processor according to the invention is a fixed-repertoire speech and sound synthesizer which, in the preferred embodiment, can reproduce up to 256 discrete sound sequences. Each sequence is called by loading its 8-bit address into a command register in the speech processor.
  • The sound sequence data is stored in a mask-programmable read-only memory (ROM), which lets the user readily specify the desired speech or sound pattern. Additional ROMs may be added through suitable interfaces, providing an essentially unlimited number of words.
  • The internal organization of the processor allows a large quantity of speech or sound to be specified in 16K bits of read-only memory.
  • The flexible architecture of the on-board controller allows the user to partition the available storage space into as many sequences as desired, up to a maximum of 256.
  • Processor 10 can be divided into two major sections: a controller 11 and a vocal tract model (VTM) 12.
  • VTM 12 is a parametric sound and voice synthesizer that produces complex waveforms under the control of 15 slowly time-varying parameters.
  • Controller 11 executes instructions stored in ROM and modifies the appropriate parameters of VTM 12 to create the desired sound sequence.
  • The interface between controller 11 and VTM 12 is formed by a plurality of parameter registers and related timing signals.
  • The VTM operates under the control of 15 parameter registers, whose sizes and functions are listed in Table A.
  • The duration and pitch of the sounds produced by the processor are controlled by the R and P registers, respectively.
  • In particular, the contents of the P register specify the number of sample periods in each pitch period.
  • For voiced sounds, the pitch source injects unit impulses spaced P sample periods apart.
  • The contents of the R register (repeat count) represent the number of pitch cycles executed before a register update occurs.
  • For unvoiced sounds, the pitch source is replaced by a zero-mean pseudo-random noise source.
  • This mode of operation is referred to as the unvoiced mode.
  • In unvoiced mode, the processor requests a register update after 64 × R samples.
  • The amplitude of the source is controlled by the A register, which is coded as 5 bits of mantissa and 3 bits of exponent (i.e. binary shift).
  • Controller 11 is a sequential processor that fetches instructions and data from internal 16K-bit ROM 16 and can alter the contents of the 15 parameter registers 15 that control the processor's vocal tract model.
  • The controller has 16 executable instructions and supports one level of subroutine nesting.
  • The instruction set is specifically designed to allow selective updates of the parameter registers.
  • The JMP and JSR instructions allow segments to be chained and code sequences to be shared, eliminating redundancy.
  • The processor's instruction set comprises two groups of instructions: register modification instructions and branch control instructions.
  • The processor can be placed into a mode in which each speech or sound sequence is initiated by pulling down a single conductor, for example by grounding SE conductor 19. In this mode, no handshaking is required to select the desired sequence.
  • VTM section 12 drives an internal 7-bit pulse-width-modulation (PWM) digital-to-analog converter (DAC) 26.
  • DAC 26 is designed so that all of its noise components are at or above 10 kHz.
  • The output is low-pass filtered to 5 kHz and amplified; both functions are performed externally.
  • The processor has two power supply leads and a common ground.
  • One supply lead powers the interface logic and provides standby current to the controller and parameter registers.
  • The other lead powers VTM 12, controller 11, and internal ROM 16.
  • When standby lead 27 is high (indicating that the processor is inactive), the second power lead can be powered down externally to conserve power. The standby current is then a fraction of the normal operating current.
  • When the processor is loaded with an entry byte, standby lead 27 is brought low, signalling the external circuitry to power up the second power lead.
  • The processor then delays execution of the selected sequence to allow the power supply to settle. The delay is set by an RC timing circuit that is external to the chip but driven by it. If the standby mode of operation is not wanted, the two power leads can be tied to a common supply.
  • The processor requires one 3.12 MHz clock, which is generated by on-board oscillator 31 under external crystal control.
  • Crystal 32 is connected to oscillator 31 externally to the processor.
  • The processor models speech (and other sounds) using a series of six 2nd-order resonators, excited by either a pseudo-random noise source or a periodic impulse source.
  • VTM 12 is implemented using entirely digital techniques. This approach allows one 2nd-order section to serve as six sections through multiplexing and pipelining of the information line.
  • The section that is implemented is the 2nd-order infinite impulse response (IIR) digital filter shown in Figure 2.
  • This filter comprises a pair of adder stages 41 and 42 and three multipliers 43, 44 and 46.
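  • Assuming the stage of Figure 2 computes $y[n] = x[n] + 2F_t\,y[n-1] - B_t\,y[n-2]$ (with the multiply-by-2 following the F_t multiplier), the filter stage has the transfer function:
    $$H(z) = \frac{1}{1 - 2F_t z^{-1} + B_t z^{-2}}$$
  • The poles of the transfer function occur at $z = F_t \pm \sqrt{F_t^2 - B_t}$. When $B_t > 0$ and $F_t^2 < B_t$, the poles form a complex-conjugate pair of radius $\sqrt{B_t}$, i.e. a resonator with bandwidth
    $$BW = -\frac{F_s}{2\pi}\ln B_t$$
    where $F_s$ is the sampling frequency in Hz, and center frequency $F_k$ given by
    $$F_k = \frac{F_s}{2\pi}\arccos\frac{F_t}{\sqrt{B_t}}$$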
  • Modifying the B coefficient changes both the center frequency and the bandwidth of the resonator.
  • Modifying the F coefficient changes only the center frequency and has no effect on the bandwidth.
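  • Each 2nd-order stage may also be used to place two real-axis poles of variable bandwidth. If $X_1$ is the real-axis location of the first pole and $X_2$ that of the second, then for a stage whose denominator is $1 - 2F_t z^{-1} + B_t z^{-2}$, $F_t = \frac{X_1 + X_2}{2}$ and $B_t = X_1 X_2$, with the bandwidth of each pole given by $BW_i = -\frac{F_s}{\pi}\ln\lvert X_i\rvert$, where $F_t$ and $B_t$ are the coefficients in Figure 2.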
  • Coefficient updates to the filter occur at the beginning of a pitch period, which causes the smallest possible disturbance to the output at update.
  • The information line precision is maintained at 16 bits throughout the VTM filter.
  • The multiply-by-2 unit 44 shown in Fig. 2 is advantageously implemented as a 1-bit binary shift circuit 51 following the F_t multiplier. The shift is performed separately from the multiplication so that F_t is scaled to the same range of values as B_t.
  • The coefficients F_t and B_t are quantized nonlinearly to minimize coefficient sensitivity.
  • The two coefficients are processed by the same nonlinear transformation hardware in the range.
  • C may be either F_t or B_t.
  • The nonlinear transformation T(X) is implemented with table-look-up ROM 53.
  • The input coefficients of each stage are expressed in sign-magnitude form and used to generate the multiplier coefficients as follows:
  • Fig. 3 represents vocal tract model 12 and depicts six cascaded filter stages 61-66.
  • For unvoiced sounds, the input to the filter comes from pseudo-random noise source 68, while for voiced sounds a scaled periodic impulse source 67 is used.
  • The purpose of the processor's register modification instructions is to update the VTM parameters.
  • The R and P registers determine how many sample periods of a particular sound are output by the VTM before control of the parameter registers is returned to the controller.
  • The controller waits until the last of R pitch periods is complete (or 64 × R samples in unvoiced mode) before executing the next register modification instruction.
  • Each of the 13 register modification instructions comprises a 4-bit op code followed by 4 bits of data, which are loaded into the lower 4 bits of register R.
  • The RCU instruction is a 1-byte instruction that loads the upper 2 bits of the repeat register, i.e. register R.
  • After execution, the RCU instruction passes control to the next instruction.
  • The RCU instruction does not cause an immediate transfer of control to the VTM.
  • The lower 4 bits of the instruction byte are set to zero.
  • The last instruction in a series of chained instruction bytes is the only one with non-zero lower 4 bits.
  • The lower 4 bits of the last instruction byte in a chained series are loaded into the lower 4 bits of the R register.
  • The processor's branch control instructions differ from the register modification instructions in that they do not modify any of the VTM parameter registers.
  • The sole purpose of the branch control instructions is to determine the location in internal ROM 16 from which the next instruction will be fetched.
  • The JSR (Jump to Subroutine) instruction stores the present address (i.e. the contents of program counter register 54) in return buffer (RB) register 56.
  • Program counter 54 is then loaded with the 11-bit address specified by the lower 3 bits of the instruction byte and the following data byte.
  • An internal return flag is set to indicate that RB register 56 has been loaded.
  • The controller then fetches and executes the instruction located at PC + 1 in ROM 16. Only one level of subroutine is allowed.
  • The JMP (Jump) instruction loads program counter 54 with the 11-bit address specified by the lower 3 bits of the instruction byte and the following data byte. Neither the return flag nor the return buffer is modified. Upon completion of execution, the next instruction is fetched from location PC + 1 in ROM 16.
  • The function of the RET (Return from Subroutine) instruction depends on the state of the return flag.
  • If the return flag is set, executing RET moves the contents of RB register 56 into program counter 54.
  • The return flag is then reset, and the controller fetches the instruction located at PC + 1 and continues execution from that location. Again, only one level of subroutine is allowed.
  • If the return flag is not set, the status of the input buffer flag (IBF) is checked. If IBF is set, indicating that the starting address of the next sound sequence has been loaded into the speech processor, the 8-bit contents of input buffer register 21 are loaded, left-justified, into program counter 54.
  • If IBF is not set, the controller disables any further output from the VTM and waits for the input buffer flag to become set.
  • Standby conductor 27 then goes high and remains high until the input buffer flag is set; the standby conductor was discussed previously in the section on standby operation. When the input buffer flag is set, execution continues as described above.
  • The preferred embodiment discussed above, i.e. the N-channel, metal-gate LSI device, has the following characteristics:
  • The contents of the 2K × 8 ROM 16 can be read out in serial format.
  • The processor can be placed in a test mode in which instructions and data are input from 8-bit input port 18 instead of ROM 16.
  • The information line (the data path in the VTM) is output on the serial output (SER).
  • Because digital-to-analog converter 26 is a PWM design, the DAC output can be tested as an ordinary digital output; no special level detection is required.
  • The test program for a processor advantageously comprises two sections: a fixed part that tests the functionality and parametrics of the processor, excluding ROM 16, and a ROM test section that is unique to the pattern being tested.
  • The speech processor according to the invention has a fully digital architecture designed to operate in the TTL voltage range.
  • The drive requirements, operating voltages, and speed requirements are all compatible with implementation in standard N-channel metal-gate LSI technology.
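The bit rates quoted in the definitions above translate directly into how much speech fits in a given memory. The following Python sketch is a rough illustration only: it assumes a constant bit rate, ignores the controller's instruction overhead, and uses the 16K-bit (2K × 8) internal ROM size given in the text; the dictionary and function names are hypothetical.

```python
# Rough storage arithmetic for the bit rates quoted in the text.
# Assumes a constant bit rate and ignores controller instruction overhead.

# (low, high) bits per second of speech for each technique, from the text.
BIT_RATES = {
    "phoneme synthesis": (120, 120),
    "formant synthesis": (400, 800),
    "linear predictive coding": (1200, 3000),
    "waveform digitization": (1000, 5000),
    "this processor (hybrid)": (500, 3000),
}

ROM_BITS = 2048 * 8  # 2K x 8 internal ROM = 16K bits


def seconds_of_speech(rom_bits: int, bits_per_second: int) -> float:
    """Seconds of speech that fit in rom_bits at a constant bit rate."""
    return rom_bits / bits_per_second


if __name__ == "__main__":
    for name, (low, high) in BIT_RATES.items():
        # The highest bit rate gives the shortest playing time, and vice versa.
        shortest = seconds_of_speech(ROM_BITS, high)
        longest = seconds_of_speech(ROM_BITS, low)
        print(f"{name:26s} {shortest:6.1f} s to {longest:6.1f} s")
```

At the processor's lowest rate of 500 bits per second, the 16K-bit ROM holds roughly 32.8 s of speech; at 3,000 bits per second, about 5.5 s.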
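The R and P registers described above together fix how long each segment lasts: R × P samples in voiced mode, 64 × R samples in unvoiced mode. A minimal Python sketch of that bookkeeping follows; the function names are hypothetical, and the sample rate `fs_hz` is a free parameter because the text does not state it.

```python
from typing import List


def segment_samples(r: int, p: int, voiced: bool) -> int:
    """Samples output by the VTM before the next register update.

    Voiced mode: R pitch periods of P sample periods each.
    Unvoiced mode: the update is requested after 64 * R samples.
    """
    if r < 0 or p < 0:
        raise ValueError("register contents must be non-negative")
    return r * p if voiced else 64 * r


def segment_seconds(r: int, p: int, voiced: bool, fs_hz: float) -> float:
    """Segment duration in seconds for a given (assumed) sample rate."""
    return segment_samples(r, p, voiced) / fs_hz


def impulse_train(r: int, p: int) -> List[int]:
    """Voiced excitation for one segment: R unit impulses spaced P samples apart."""
    if p < 1:
        raise ValueError("P must be at least 1 sample period")
    out: List[int] = []
    for _ in range(r):
        out.append(1)              # impulse at the start of each pitch period
        out.extend([0] * (p - 1))  # the remaining P - 1 samples of the period
    return out
```

For example, with a hypothetical 10 kHz sample rate, R = 10 and P = 100 give a 1,000-sample segment lasting 0.1 s at a 100 Hz pitch.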
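The A register's 5-bit mantissa and 3-bit exponent (binary shift) can be decoded as a left shift of the mantissa. The text does not say which bits hold which field, so this Python sketch assumes, hypothetically, that the exponent occupies the upper 3 bits of the 8-bit register.

```python
def decode_amplitude(a: int) -> int:
    """Decode an 8-bit A register value as mantissa << exponent.

    Hypothetical layout: bits 7..5 = exponent, bits 4..0 = mantissa.
    """
    if not 0 <= a <= 0xFF:
        raise ValueError("A register is 8 bits")
    exponent = (a >> 5) & 0x7  # 3-bit exponent: number of left shifts
    mantissa = a & 0x1F        # 5-bit mantissa
    return mantissa << exponent
```

Under this assumed layout, amplitudes range from 0 to 31 × 2^7 = 3,968, a much wider dynamic range than a plain 8-bit linear value would give.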
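The relation between the Figure 2 coefficients and the resonator's center frequency and bandwidth can be checked numerically. The original transfer-function equations are not reproduced in this text, so the Python sketch below assumes the stage computes y[n] = x[n] + 2F·y[n-1] - B·y[n-2], i.e. H(z) = 1/(1 - 2F z^-1 + B z^-2). That form is consistent with the multiply-by-2 following the F multiplier and with the stated behavior that F moves only the center frequency while B moves both. The sample rate `fs` is a parameter because the text does not give it, and the function names are hypothetical.

```python
import cmath
import math
from typing import Tuple

# Assumed form: H(z) = 1 / (1 - 2F z^-1 + B z^-2).


def poles(f: float, b: float) -> Tuple[complex, complex]:
    """Roots of z^2 - 2F z + B, i.e. the poles of the assumed H(z)."""
    disc = cmath.sqrt(f * f - b)
    return f + disc, f - disc


def resonator_params(f: float, b: float, fs: float) -> Tuple[float, float]:
    """Center frequency and bandwidth (Hz) for a complex pole pair."""
    if not (b > 0 and f * f < b):
        raise ValueError("need B > 0 and F**2 < B for a complex pole pair")
    r = math.sqrt(b)                            # pole radius
    fk = fs / (2 * math.pi) * math.acos(f / r)  # cos(2 pi Fk / Fs) = F / sqrt(B)
    bw = -fs / math.pi * math.log(r)            # BW = -(Fs / pi) ln r
    return fk, bw


def resonator_coeffs(fk: float, bw: float, fs: float) -> Tuple[float, float]:
    """Inverse mapping: r = exp(-pi BW / Fs), F = r cos(2 pi Fk / Fs), B = r^2."""
    r = math.exp(-math.pi * bw / fs)
    return r * math.cos(2 * math.pi * fk / fs), r * r


def real_pole_coeffs(x1: float, x2: float) -> Tuple[float, float]:
    """Coefficients placing two real-axis poles at X1 and X2."""
    return (x1 + x2) / 2, x1 * x2
```

In this form the bandwidth depends only on B, so retuning a formant through F alone leaves its bandwidth unchanged, which matches the behavior the text describes.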
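The nonlinear coefficient mapping described above can be sketched as a sign-magnitude decode followed by a table look-up standing in for ROM 53. Both the 8-bit code width and the table contents are hypothetical: the text gives neither, so the placeholder curve below merely puts finer steps near a magnitude of 1, where resonator poles approach the unit circle.

```python
from typing import List

# Hypothetical 128-entry table standing in for ROM 53. The real contents are
# not given in the text; this curve only concentrates resolution near |C| = 1.
T_TABLE: List[float] = [1.0 - (1.0 - m / 127) ** 2 for m in range(128)]


def multiplier_coeff(code: int) -> float:
    """Map an 8-bit sign-magnitude code (bit 7 = sign) to a multiplier coefficient."""
    if not 0 <= code <= 0xFF:
        raise ValueError("coefficient code is 8 bits")
    sign = -1.0 if code & 0x80 else 1.0
    magnitude = code & 0x7F  # 7-bit magnitude indexes the table
    return sign * T_TABLE[magnitude]
```

Because the table is indexed by magnitude only, the same hardware serves both F_t and B_t, and the sign is applied afterwards.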
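The structure of Fig. 3, six cascaded second-order sections driven by a noise or impulse source, can be sketched in Python as below. This is a floating-point illustration, not the chip's 16-bit fixed-point pipeline; it reuses one section computation for all six stages with per-stage state, loosely mirroring the multiplexed hardware. It assumes the recursion y[n] = x[n] + 2F·y[n-1] - B·y[n-2] for each stage (an assumed form, since the text's transfer-function equation is not reproduced), and the 16-bit Galois LFSR noise source is a hypothetical choice, because the text only calls for a zero-mean pseudo-random source.

```python
from typing import Iterator, List, Tuple

Coeffs = Tuple[float, float]  # (F, B) for one second-order section


def impulse_source(p: int) -> Iterator[float]:
    """Voiced excitation: unit impulse every P samples."""
    n = 0
    while True:
        yield 1.0 if n % p == 0 else 0.0
        n += 1


def noise_source(seed: int = 0xACE1) -> Iterator[float]:
    """Unvoiced excitation: +/-1 from a 16-bit Galois LFSR (hypothetical choice)."""
    state = seed
    while True:
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0xB400  # taps of a maximal-length 16-bit Galois LFSR
        yield 1.0 if lsb else -1.0


def vtm(excitation: Iterator[float], amplitude: float,
        stages: List[Coeffs], n_samples: int) -> List[float]:
    """Cascade of second-order sections computed by one reused section routine.

    state[k] holds (y[n-1], y[n-2]) for stage k.
    """
    state = [[0.0, 0.0] for _ in stages]
    out: List[float] = []
    for _ in range(n_samples):
        x = amplitude * next(excitation)
        for k, (f, b) in enumerate(stages):
            y1, y2 = state[k]
            y = x + 2 * f * y1 - b * y2  # assumed recursion for each stage
            state[k] = [y, y1]
            x = y  # output of stage k feeds stage k + 1
        out.append(x)
    return out
```

Passing `noise_source()` instead of `impulse_source(p)` switches the same cascade to unvoiced mode, which is the only difference between the two modes at this level of detail.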
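The address arithmetic in the instruction and entry descriptions can be sketched as bit manipulation in Python. The placement of the op code in the upper nibble, of the 3 instruction bits as the high part of the 11-bit branch address, and of the 2 RCU bits above the 4 data bits in R are assumptions consistent with, but not stated in, the text; the function names are hypothetical.

```python
from typing import Tuple

ADDRESS_MASK = 0x7FF  # 11-bit program counter for the 2K x 8 ROM


def split_instruction(byte: int) -> Tuple[int, int]:
    """Register modification byte -> (4-bit op code, 4-bit data); op code assumed high."""
    return (byte >> 4) & 0xF, byte & 0xF


def branch_target(instr: int, data: int) -> int:
    """JMP/JSR target: lower 3 bits of the instruction byte above the data byte."""
    return (((instr & 0x7) << 8) | (data & 0xFF)) & ADDRESS_MASK


def entry_address(input_buffer: int) -> int:
    """8-bit entry byte loaded left-justified into the 11-bit program counter."""
    return ((input_buffer & 0xFF) << 3) & ADDRESS_MASK


def repeat_count(rcu_bits: int, data_bits: int) -> int:
    """R register: 2 bits from RCU above the 4 data bits of the last chained byte."""
    return ((rcu_bits & 0x3) << 4) | (data_bits & 0xF)
```

Because the 8-bit entry byte is left-justified into an 11-bit counter, the 256 possible sequences start on 8-byte boundaries (entry byte n maps to address 8n), which matches the 256-sequence limit quoted for the processor.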
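The branch-control behavior described above (one-level JSR/RET, JMP, and the idle path through the input buffer flag) can be modeled as a small state machine. This Python sketch tracks only the program counter, return buffer, return flag, input buffer flag, and standby lead; it does not model instruction fetch sequencing. Clearing IBF when the entry byte is consumed, and raising an error on a second nested JSR, are hypothetical choices the text does not specify, and the class name is illustrative.

```python
from dataclasses import dataclass


@dataclass
class BranchState:
    """Hypothetical model of the controller's branch-control state."""
    pc: int = 0                # program counter 54 (11 bits)
    rb: int = 0                # return buffer register 56
    return_flag: bool = False  # set when RB holds a return address
    ibf: bool = False          # input buffer flag
    input_buffer: int = 0      # input buffer register 21 (8 bits)
    standby: bool = False      # standby lead 27 (True = high)

    def load_entry(self, byte: int) -> None:
        """External load of an entry byte: sets IBF and brings standby low."""
        self.input_buffer = byte & 0xFF
        self.ibf = True
        self.standby = False

    def jsr(self, target: int) -> None:
        if self.return_flag:
            # Only one level of subroutine is allowed.
            raise RuntimeError("nested JSR not supported")
        self.rb = self.pc
        self.return_flag = True
        self.pc = target & 0x7FF

    def jmp(self, target: int) -> None:
        # Neither the return flag nor the return buffer is modified.
        self.pc = target & 0x7FF

    def ret(self) -> bool:
        """Execute RET; return False if the controller goes idle awaiting IBF."""
        if self.return_flag:
            self.pc = self.rb
            self.return_flag = False
            return True
        if self.ibf:
            # Start the next sequence: entry byte left-justified into the PC.
            self.pc = (self.input_buffer << 3) & 0x7FF
            self.ibf = False  # hypothetical: flag consumed on load
            return True
        # No pending entry: VTM output disabled, standby lead goes high.
        self.standby = True
        return False
```

A RET with no pending subroutine therefore doubles as the end-of-sequence instruction: it either starts the next requested sequence or parks the processor in standby.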

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephone Function (AREA)
  • Electrophonic Musical Instruments (AREA)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US20304280A 1980-11-03 1980-11-03
US203042 1980-11-03

Publications (2)

Publication Number Publication Date
EP0051462A2 true EP0051462A2 (fr) 1982-05-12
EP0051462A3 EP0051462A3 (fr) 1982-06-09

Family

ID=22752227

Family Applications (1)

Application Number Title Priority Date Filing Date
EP81305149A Withdrawn EP0051462A3 (fr) 1980-11-03 1981-10-29 Speech processing device

Country Status (2)

Country Link
EP (1) EP0051462A3 (fr)
JP (1) JPS57105800A (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5918927A (en) * 1993-02-03 1999-07-06 Becker Group Europe Gmbh Outer support for bracket automobile sun visors

Citations (1)

Publication number Priority date Publication date Assignee Title
EP0016427A2 (fr) * 1979-03-15 1980-10-01 CSELT Centro Studi e Laboratori Telecomunicazioni S.p.A. Multi-channel digital speech synthesizer

Non-Patent Citations (9)

Title
Computer Design, Vol. 17, No. 9, September 1978 Concord (US) "Monolithic PMOS Speech Synthesizer Models Vocal Tract on Single Chip" pages 200,202 * the whole article * *
Computer Design, Vol. 18, No. 7, July 1979 Concord (US) L. SCHMIDT: "Implementing a Digital Filter Design in Custom LSI-Reducing Multiplier Area" pages 180,182,183 * figures 2,3 * *
Electronic Engineering, Vol. 53, No. 647, January 1981 London (GB) "Speech Synthesis: Devices and Applications" pages 41, 45-47, 49,51,52,54,57 * figure 7 * *
Electronics International, Vol. 53, No. 24, November 1980 New York (US) P. HAMILTON: "Speech Processor on Single Chip Talks at Low Bit Rate with Novel Coding Technique" pages 41,42 * the whole article * *
Electronics International, Vol. 53, No. 3, January 31, 1980 New York (US) M.E. HOFF et al.: "Software makes a Big Talker out of the 2920 Microcomputer" pages 102-107 * figures 3,4 * *
Electronics International, Vol. 54, No. 5, March 10, 1981, New York (US) P. AHRENS et al.: "Speech Chip Timeshares a 2-Pole Section to Create a 12-Pole Filter" pages 177-180 * figures 1,2 * *
ICASSP 79, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, April 2-4, 1979 Washington, IEEE New York (US) L. NEBBIA et al.: "Eight-Channel Digital Speech Synthesizer Based on LPC Techniques" pages 884-886 * Abstract * *
ICASSP 80, Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Vol. 3, April 9-11, 1980, San Francisco, IEEE New York (US) J.L. CALDWELL: "Programmable Synthesis using a New 'Speech Microprocessor'" pages 868-871 * Abstract * *
Nachrichten Electronik, Vol. 33, No. 12, December 1979 Heidelberg (DE) "Programmierbarer Digital-Signalprozessor für Sprachsynthese" [Programmable digital signal processor for speech synthesis] pages 399,400 * page 399, center column, lines 16-23; figure 1 * *

Also Published As

Publication number Publication date
EP0051462A3 (fr) 1982-06-09
JPS57105800A (en) 1982-07-01

Similar Documents

Publication Publication Date Title
US4577343A (en) Sound synthesizer
US4304964A (en) Variable frame length data converter for a speech synthesis circuit
US4344148A (en) System using digital filter for waveform or speech synthesis
US4209844A (en) Lattice filter for waveform or speech synthesis circuits using digital logic
US4520499A (en) Combination speech synthesis and recognition apparatus
CA1203907A (fr) Speech synthesizer
JPH0773183B2 (ja) Digital signal processing device
JPS5930280B2 (ja) Speech synthesis device
US4435831A (en) Method and apparatus for time domain compression and synthesis of unvoiced audible signals
US4542524A (en) Model and filter circuit for modeling an acoustic sound channel, uses of the model, and speech synthesizer applying the model
US4296279A (en) Speech synthesizer
EP0033510A2 (fr) Speech synthesis device and method for exciting the filter of said device
US5850628A (en) Speech and sound synthesizers with connected memories and outputs
JPH082014B2 (ja) Multistage digital filter
EP0051462A2 (fr) Speech processing device
US4335275A (en) Synchronous method and apparatus for speech synthesis circuit
US4627093A (en) One-chip LSI speech synthesizer
US4847906A (en) Linear predictive speech coding arrangement
CA1118104A (fr) Lattice filter for speech or waveform synthesis circuits using digital logic
EP0299537B1 (fr) Device and method for processing digital signals
Quarmby et al. Implementation of a parallel-formant speech synthesiser using a single-chip programmable signal processor
McLaughlin A single-chip speech synthesis system
Caldwell Programmable synthesis using a new "Speech microprocessor"
Nebbia et al. Eight-channel digital speech synthesizer based on LPC techniques
US4959866A (en) Speech synthesizer using shift register sequence generator

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

PUAL Search report despatched

Free format text: ORIGINAL CODE: 0009013

AK Designated contracting states

Designated state(s): DE FR GB

AK Designated contracting states

Designated state(s): DE FR GB

17P Request for examination filed

Effective date: 19821126

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 19840306

RIN1 Information on inventor provided before grant (corrected)

Inventor name: MCLAUGHLIN, PHILIP T.

P01 Opt-out of the competence of the unified patent court (upc) registered

Effective date: 20230522