GB1068282A

GB1068282A - Speech waveform modification

Info

Publication number: GB1068282A
Application number: GB20363/65A
Authority: GB
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 1964-06-09
Filing date: 1965-05-14
Publication date: 1967-05-10
Also published as: DE1472004C3; DE1472004B2; DE1472004A1; US3369077A

Abstract

1,068,282. Speech waveform modification. INTERNATIONAL BUSINESS MACHINES CORPORATION. May 14, 1965 [June 9, 1964], No. 20363/65. Heading H4R.

The time duration of an audio signal is modified, e.g. to make speech samples from different sources sound as if they came from the same source, by adjusting the lengths of the pitch periods of the speech samples to a common length. Discontinuities due to amplitude differences between the end of an adjusted pitch period and the beginning of the following period are eliminated by adding to the adjusted pitch period signal a "ramp" signal having an amplitude of zero at the commencement of the pitch period and an amplitude equal to the amplitude difference at the end of the adjusted pitch period. The actual pitch period of the samples is determined by measuring the times of occurrence of the maximum peak-to-peak excursions of the speech waveform during time intervals set by a rough determination of the pitch period.

Figs. 3A, 3B and 3C show an embodiment in which a sample of speech from a source 2 is applied via a sampling switch 4 to a store 8 in which the speech sample circulates, together with a synchronizing pulse from single-shot circuit 6 marking the start of the speech sample. On each repetition of the speech sample the synchronizing pulse resets the counter 22, which during the repeat of the sample provides a time scale by counting the output of oscillator 24. The speech sample is applied to a voicing detector 26, which produces a pulse at the beginning of a voiced sound, and to a conventional pitch extractor 10 to 16, which produces a count in counter 18 corresponding to the approximate pitch period. The pulse from the voicing detector is applied to gate 28 to gate a count, corresponding to the start of the voiced speech, from counter 22 into the register 30.
In addition, this count is fed from gate 28 to an "ADD" circuit 34, which is also fed with the count from counter 18 corresponding to the pitch period, and the resulting count is fed into register 36. During the following cycles the counts in registers 30 and 36 are compared in comparators 38 and 40 with the count from counter 22, and signals are produced to trigger the bi-stable 42 to produce an output on lead Q which is positive from the commencement of the voiced signal until approximately one pitch period later. During this time speech is fed via gate 44 to the positive and negative peak detectors 64 and 66, which feed the values of the respective peaks to the gates 56, 58, 60 and 62.

Initially, the synchronizing pulse sets bi-stable 72 so that the output 1a is energized, and the first positive and negative peak values are therefore fed respectively via gates 56 and 58 to hold circuits 46 and 48. The outputs of these hold circuits are fed to a differential amplifier 90 to obtain a signal representative of the first peak-to-peak excursion of the waveform, which is applied via an inverter 92 to adder 94. In addition, the time of occurrence of the positive peak is fed via gate 74 into the register 68. Since no input has yet been applied to gates 60 and 62, the output of differential amplifier 96 is zero; the output of adder 94 is therefore negative and passes via gate 98 and gate 100, operated by the delayed negative peak, to trigger bi-stable 72 so that output 1a is removed and 2a is energized. The following positive and negative peak values are then fed via gates 60 and 62 to hold circuits 50 and 52 and differential amplifier 96, while the time of occurrence of the positive peak is fed into register 70 via gate 104.
The outputs of amplifiers 90 and 96 are then compared and, depending on their relative values, either a positive or a negative output results from adder 94. This output is fed via gate 98 or 108 and gate 100 to trigger bi-stable 72 into such a condition that the following pair of positive and negative peaks is fed in via gates 56 and 58 or gates 60 and 62 to replace the values in the hold circuits corresponding to the smaller peak-to-peak swing. The process is repeated for the remaining duration of the Q signal, always discarding the smaller of the two peak-to-peak swings being compared. At the end of the Q period the negative-going signal detector 54 is energized and applies an output which is gated through the appropriate one of gates 110 and 112 to feed the output of the register 68 or 70, holding the position of the maximum peak-to-peak swing, into the computer 3.

The computer takes the count corresponding to the maximum peak-to-peak value and adds to it a count corresponding to half a pitch period, as stored in counter 18, and a count corresponding to one and a half pitch periods; the two resulting values are fed in to replace the counts stored in registers 30 and 36 respectively. The determination of the maximum peak-to-peak swing is then carried out, as before, for the interval between the counts now stored in registers 30 and 36 to determine the position of the next pitch pulse. In a similar fashion the positions of the maximum peak-to-peak swings of the speech waveform are determined for the remainder of the speech sample stored in the circulating store 8, and these values are stored in the computer 3.

In order to adjust the pitch cycles to the required length the speech is fed to gates 122 and 140. Each pitch period is adjusted in length during a cycle of operations which entails two repeats of the speech sample from store 8. During the first repeat a pitch pulse from computer 3 on line 126 triggers bi-stable 124 to allow speech to pass through gate 122 to gate 134.
At the end of the delay time produced by delay 128, which is equal to the desired pitch period and is equal to or shorter than any actual pitch period in the sample, bi-stable 124 is reset to inhibit gate 122, and gate 134 is operated to apply the voltage value existing at the end of the modified pitch period to the hold circuit 136, where it is stored. The following pitch pulse on line 138 operates gate 140 to feed to hold circuit 142 the voltage value of the speech signal at the beginning of the next pitch period. The two signals from hold circuits 136 and 142 are applied to a differential amplifier 144 to obtain a signal representing the error between the amplitude of the signal at the end of the length-modified pitch period and the amplitude of the signal at the commencement of the following pitch period; this signal is applied to the input of the integrating amplifier circuit 146.

During the second repetition of the period being modified, gate 122 gates through the speech signal for the modified-length period to one input of "add" circuit 130. In addition, the output of bi-stable 124 opens switch 154 on the output of integrating amplifier 146 for the duration of the modified pitch period, so that the output of this amplifier is a ramp waveform which is zero at the beginning of the pitch period and has a value equal to the output of differential amplifier 144 at the end of the modified pitch period. This signal is applied to the other input of "add" circuit 130 and added to the modified-length speech waveform sample, so that the resulting signal is continuous in amplitude with the following sample starting at the following pitch pulse. The output of adder 130 is converted to digital form in the analogue-to-digital converter 132 so that it may be stored in computer 3 to await the following length-modified pitch periods of the speech sample, which are processed in the same way on subsequent cycles of the circulating store 8.
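
The length adjustment with ramp correction can likewise be sketched in software. This is an illustrative approximation under stated assumptions, not the circuit itself: the function names are hypothetical, pitch pulses are given as sample indices, and the error is taken as the value at the start of the next period minus the value at the cut point, a sign convention chosen here so that adding the ramp closes the gap.

```python
def equalize_period(x, mark, next_mark, target_len):
    """Shorten the pitch period x[mark:next_mark] to target_len samples and add
    a linear ramp so that it joins the next period without an amplitude step."""
    # As in the abstract, the target length may not exceed the actual period.
    assert 0 < target_len <= next_mark - mark
    segment = x[mark:mark + target_len]
    # Error between the value at the cut point and the start of the next period
    # (the roles played by the two hold circuits and the differential amplifier).
    error = x[next_mark] - x[mark + target_len]
    # Ramp is zero at the start of the period and reaches `error` at the cut point
    # (the role played by the integrating amplifier).
    return [s + error * k / target_len for k, s in enumerate(segment)]


def equalize_pitch(x, marks, target_len):
    """Apply equalize_period to every complete period between successive marks."""
    out = []
    for mark, next_mark in zip(marks, marks[1:]):
        out.extend(equalize_period(x, mark, next_mark, target_len))
    return out
```

When `target_len` equals the actual period length the error is zero and the period passes through unchanged; otherwise the correction grows linearly across the period, so the largest change is applied near the cut point.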