GB2173679A - Speech coding - Google Patents
- Publication number: GB2173679A (application GB8608031A)
- Authority
- GB
- United Kingdom
- Prior art keywords
- pulse
- pulses
- energy
- positions
- filter
- Prior art date
- Legal status
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/10—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a multipulse excitation
Description
1 GB 2 173 679 A 1
SPECIFICATION
Speech coding

This invention is concerned with speech coding, and more particularly with systems in which a speech signal can be generated by feeding the output of an excitation source through a synthesis filter. The coding problem then becomes one of generating, from input speech, the necessary excitation and filter parameters.
LPC (linear predictive coding) parameters for the filter can be derived using well-established techniques, and the present invention is concerned with the excitation source.
Systems in which a voiced/unvoiced decision on the input speech is made to switch between a noise source and a repetitive pulse source tend to give the speech output an unnatural quality, and it has been proposed to employ a single "multipulse" excitation source in which a sequence of pulses is generated, no prior assumptions being made as to the nature of the sequence. It is found that, with this method, only a few pulses (say 8 in a 10 ms frame) are sufficient for obtaining reasonable results. See B S Atal and J R Remde: "A New Model of LPC Excitation for Producing Natural-sounding Speech at Low Bit Rates", Proc. IEEE ICASSP, Paris, pp. 614, 1982.
Coding methods of this type offer considerable potential for low bit rate transmission - e.g. 9.6 to 4.8 kbit/s.
The coder proposed by Atal and Remde operates in a "trial and error feedback loop" mode in an attempt to define an optimum excitation sequence which, when used as an input to an LPC synthesis filter, minimizes a weighted error function over a frame of speech. However, the unsolved problem of selecting an optimum excitation sequence is at present the main reason for the enormous complexity of the coder, which limits its real time operation.
The excitation signal in multipulse LPC is approximated by a sequence of pulses located at non-uniformly spaced time intervals. It is the task of the analysis by synthesis process to define the optimum locations and amplitudes of the excitation pulses.
In operation, the input speech signal is divided into frames of samples, and a conventional analysis is performed to define the filter coefficients for each frame. It is then necessary to derive a suitable multipulse excitation sequence for each frame. The algorithm proposed by Atal and Remde forms a multipulse sequence which, when used to excite the LPC synthesis filter, minimises (that is, within the constraints imposed by the algorithm) a mean-squared weighted error derived from the difference between the synthesised and original speech. This is illustrated schematically in Figure 1. The positions and amplitudes of the excitation pulses are encoded and transmitted together with the digitized values of the LPC filter coefficients. At the receiver, given the decoded values of the multipulse excitation and the prediction coefficients, the speech signal is recovered at the output of the LPC synthesis filter.
In Figure 1 it is assumed that a frame consists of n speech samples, the input speech samples being $s_0, \dots, s_{n-1}$ and the synthesised samples $s'_0, \dots, s'_{n-1}$, which can be regarded as vectors $\bar s$, $\bar s'$. The excitation consists of pulses of amplitude $a_m$ which are, it is assumed, permitted to occur at any of the n possible time instants within the frame, but there are only a limited number of them (say k). Thus the excitation can be expressed as an n-dimensional vector $\bar a$ with components $a_0, \dots, a_{n-1}$, but only k of them are non-zero. The objective is to find the 2k unknowns (k amplitudes, k pulse positions) which minimise the error:
$$e^2 = \sum_{i=0}^{n-1}(s_i - s'_i)^2 \qquad (1)$$

- ignoring the perceptual weighting, which serves simply to filter the error signal such that, in the final result, the residual error is concentrated in those parts of the speech band where it is least obtrusive.
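By way of illustration, the error criterion of equation (1) can be sketched in Python. This is an invented sketch, not the patent's code: `synthesize` and `frame_error` are hypothetical names, the filter is the all-pole LPC synthesis filter 1/[1 - P(z)], and the perceptual weighting is omitted.

```python
import numpy as np

def synthesize(excitation, lpc, memory=None):
    """Run an excitation vector through an all-pole LPC synthesis filter
    1/(1 - P(z)), where `lpc` holds the predictor coefficients a_1..a_p."""
    p = len(lpc)
    state = list(memory) if memory is not None else [0.0] * p
    out = []
    for x in excitation:
        y = x + sum(lpc[i] * state[i] for i in range(p))  # y[t] = x[t] + sum a_i y[t-i]
        out.append(y)
        state = [y] + state[:-1]
    return np.array(out)

def frame_error(speech, excitation, lpc):
    """Squared error e^2 = sum (s_i - s'_i)^2 of Eq. (1), unweighted."""
    return float(np.sum((speech - synthesize(excitation, lpc)) ** 2))
```

For example, a single unit pulse at the start of a frame driving a one-tap filter with coefficient 0.5 reproduces the decaying sequence 1, 0.5, 0.25, so the error against that target is zero.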
The amount of computation required to do this is enormous and the procedure proposed by Atal and Remde was as follows:
(1) Find the amplitude and position of one pulse, alone, to give a minimum error.
(2) Find the amplitude and position of a second pulse which, in combination with this first pulse, gives a minimum error; the positions and amplitudes of the pulse(s) previously found are fixed during this stage.
(3) Repeat for further pulses.
This procedure could be further refined by finally reoptimising all the pulse amplitudes; or the amplitudes may be reoptimised prior to derivation of each new pulse.
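The sequential procedure above can be sketched as follows. This is a hedged reconstruction, not Atal and Remde's actual code: `greedy_multipulse` is an invented name, perceptual weighting is omitted, and at each stage the single best position/amplitude pair is found with all earlier pulses frozen.

```python
import numpy as np

def greedy_multipulse(speech, h, k):
    """Greedy multistage pulse search: each new pulse minimises the
    residual error, previously found pulses staying fixed.  `h` is the
    synthesis filter's impulse response truncated to the frame length."""
    n = len(speech)
    residual = np.array(speech, dtype=float)
    positions, amplitudes = [], []
    for _ in range(k):
        best = None
        for m in range(n):
            f = np.zeros(n)
            f[m:] = h[: n - m]                 # impulse response shifted by m
            denom = f @ f
            if denom == 0:
                continue
            amp = (residual @ f) / denom       # optimal amplitude at this slot
            err = residual @ residual - amp * (residual @ f)
            if best is None or err < best[0]:
                best = (err, m, amp, f)
        err, m, amp, f = best
        positions.append(m)
        amplitudes.append(amp)
        residual = residual - amp * f          # fix this pulse, continue
    return positions, amplitudes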
It will be apparent that in these procedures the results are not optimum, inter alia because the positions of all but the kth pulse are derived without regard to the positions or values of the later pulses: the contribution of each excitation pulse to the energy of the synthesised signal is influenced by the choice of the other pulses. In vector terms, this can be explained by noting that the contribution of $a_m$ is $a_m \bar f_m$, where $\bar f_m$ is the LPC filter's impulse response vector displaced by m (m = 0, ..., n-1), and that the set of vectors $\bar f_m$ are not, in general, orthogonal.
The present invention offers a method of deriving pulse parameters which, whilst still not optimum, is believed to represent an improvement.
According to one aspect of the present invention we provide a method of speech coding in which an input speech signal is compared with the response of a synthesis filter to an excitation source, to obtain an error signal; the excitation source consisting of a plurality of pulses within a time frame containing a larger plurality of speech samples, the amplitude and timing of the pulses being controlled so as to reduce the error signal; in which control of the pulse amplitude and timing comprises the steps of: (1) deriving an estimate of the positions and amplitudes of the pulses and (2) carrying out an iterative adjustment process in which individual pulses are selected and their positions and amplitudes reassessed.

Some embodiments of the invention will now be described, by way of example, with reference to the accompanying drawings, in which:

Figure 1 is a block diagram illustrating the coding process;
Figure 2 is a flowchart;
Figures 3a and 3b illustrate the operation of the pulse transfer iteration;
Figures 4 to 7 are graphs illustrating the signal-to-noise ratios that may be obtained;
Figure 8 is a graph of energy gain function against pulse energy; and Figures 9 to 11 are graphs illustrating results obtained using the function illustrated in Figure 8.
It has already been explained that the objective is to find, for each time frame, the parameters of the k non-zero pulses of the desired excitation $\bar a$. For convenience the excitation is redefined in terms of a k-dimensional vector $\bar c$ containing the amplitude values $c_1$ to $c_k$, and pulse positions $p_i$ (i = 1, ..., k) which indicate where these pulses occur in the n-dimensional vector. The flow chart of the algorithm used in the invention is shown in Figure 2. An initial estimate of the pulse positions $p_i$, i = 1, 2, ..., k, is first determined. A block solution for the optimum amplitudes then defines the initial k-pulse excitation sequence and a weighted error energy $W_P$ is obtained from the difference between the synthesised and the input speech.
The selection of only one pulse follows, whose position $p_r$ might be altered within the analysis frame. The algorithm decides on a new possible location for this pulse and the block solution is used to determine the optimum amplitudes of this new k-pulse sequence, which shares the same k-1 pulse locations with the previous excitation sequence. The new location is retained only if the corresponding weighted error energy W is smaller than $W_P$ obtained from the previous excitation signal.
The search process continues by selecting again one pulse out of the k available pulses and altering its position, while the above procedure is repeated. The final k-pulse sequence is established when all the available destination positions within the analysis frame have been considered for the possibility of a single pulse transfer.
The search algorithm which defines i) the location of a pulse suitable for transfer and ii) its destination is of importance in the convergence of the method towards a minimum weighted error. Different search algorithms for pulse selection and transfer will be considered below.
Firstly, we consider the initial estimate step. In principle, any of a number of procedures could be used - including the multistage sequential search procedures proposed by other workers, discussed above.
However, a simplified procedure is preferred, on the basis that the reduction in accuracy can be more than compensated for by the pulse transfer stage, and that the overall computational requirement can be kept much the same.
One possibility is to find the maxima of the cross-correlation between the input speech and the LPC filter's impulse response. However, as voiced speech results in a smooth cross-correlation which offers a limited number of local maxima, a multistage sequential search algorithm is preferred.
We recall that

$$\bar s' = \sum_{m=0}^{n-1} a_m \bar f_m + \bar m \qquad (2)$$

where $\bar m$ is the filter's memory from previously synthesised frames.
Since only k values of the excitation are non-zero, Eq. 2 can be written as:

$$\bar s' = \sum_{i=1}^{k} a_{p_i} \bar f_{p_i} + \bar m \qquad (3)$$

where $p_i$ is the location index. Consider that the n normalized vectors $\bar b_m = \bar f_m / \|\bar f_m\|$ define a basis of unit vectors in an n-dimensional space. Eq. 3 shows that the synthesized speech vector can be thought of as the sum of k n-dimensional vectors $a_{p_i}\|\bar f_{p_i}\|\bar b_{p_i}$, which are obtained by analysing $\bar s'$ in a k-dimensional subspace defined by the $\bar b_{p_i}$, i = 1, 2, ..., k unit vectors.
At each stage of the search the location of an additional excitation pulse is determined by first obtaining all the orthogonal projections $\bar q_i$, i = 0, 1, ..., n-1, of an input vector $\bar s_d$ onto the n axes of the analysis space and then selecting the projection $\bar q_{max}$ with the maximum magnitude.
These projections correspond to the cross-correlation between $\bar s_d$ and the basis vectors $\bar b_i$, i = 0, 1, ..., n-1. The vector $\bar s_d$ is updated at each stage of the process by subtracting $\bar q_{max}$ from it. Note that the initial value of $\bar s_d$ is the input speech vector $\bar s$ minus the filter memory $\bar m$.
The algorithm can be implemented without the need to find $\bar s_d$ prior to the calculation of all the cross-correlation values $\|\bar q_i\|$ at each stage of the process. Instead, $\bar q_i$, i = 0, 1, ..., n-1, are defined directly using the linearity property of projection. Thus at the jth stage of the process $\bar q_i(j)$ is formed by subtracting the projection of $\bar q_{max}(j-1)$ onto the n axes from $\bar q_i(j-1)$, i.e.
$$\bar q_i(j) = \bar q_i(j-1) - \mathrm{Proj}[\bar q_{max}(j-1)]_i, \quad i = 0, 1, \dots, n-1 \qquad (4)$$

However, as $\bar q_{max} = \|\bar q_{max}\|\,\bar b_l$, where $\bar b_l$ is the unit basis vector of the axis on which $\bar q_{max}$ lies, the orthogonal projections of $\bar q_{max}$ onto the n axes are:

$$\mathrm{Proj}[\bar q_{max}]_i = \|\bar q_{max}\|\,(\bar b_l \cdot \bar b_i), \quad i = 0, 1, \dots, n-1 \qquad (5)$$

Note that i) the above n dot products $B_{l,i} = \bar b_l \cdot \bar b_i$, i = 0, 1, ..., n-1, are normalized autocovariance estimates of the LPC filter's impulse response, and ii) k.n autocovariance estimates are needed for each analysis frame. Thus during the first stage of the method, n cross-correlation values $\|\bar q_i\|$, i = 0, 1, ..., n-1, are calculated between the input speech vector $\bar s$ and the basis vectors $\bar b_i$. The maximum value $\|\bar q_{max}\|$ is then detected to define the location and amplitude of the first excitation pulse. In the next stage the n values $\|\bar q_{max}\| B_{l,i}$, i = 0, 1, ..., n-1, are subtracted from the previously found cross-correlation values and a new maximum value is determined which provides the location and amplitude of the second pulse. This continues until the locations of the k excitation pulses are found.
The complexity of the algorithm can be considerably reduced by approximating the normalized autocovariance estimates of the LPC filter's impulse response $B_{l,i}$ with normalized autocorrelation estimates $R_{l,i}$ whose value depends only on the difference l - i, viz. $R_{l,i} = R(|l-i|)$. In this case only n autocorrelation estimates are calculated for each analysis frame compared to the k.n previously required. The performance of this simplified algorithm, in accurately locating the excitation pulse positions, is reduced when compared to that of the original method. The above approximation however makes the simplified method very satisfactory in providing the initial position estimates.
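A minimal sketch of this simplified initial-estimate stage, assuming the Toeplitz autocorrelation approximation and omitting perceptual weighting. The names are illustrative, and signed cross-correlations are used where the text works with magnitudes.

```python
import numpy as np

def initial_estimate(x, h, k):
    """Simplified initial pulse-position estimate: q_i starts as the
    cross-correlation of `x` with the unit-norm shifted impulse response;
    at each stage the winning component's projection onto every axis is
    subtracted, using the autocorrelation R(|l - i|) of the response."""
    n = len(x)
    b = np.zeros(n)
    b[: len(h)] = h[:n]
    b = b / np.linalg.norm(b)                      # unit basis vector b_0
    R = np.array([b[: n - d] @ b[d:] for d in range(n)])   # R(|l - i|)
    q = np.array([x[i:] @ b[: n - i] for i in range(n)])   # cross-correlations
    positions = []
    for _ in range(k):
        cand = [i for i in range(n) if i not in positions]
        m = max(cand, key=lambda i: abs(q[i]))     # largest |q_i|
        positions.append(m)
        q = q - q[m] * np.array([R[abs(i - m)] for i in range(n)])
    return sorted(positions)
```

Feeding the basis vector itself through the search places the first pulse at position 0, since the cross-correlation peaks there by construction.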
The initial position estimate may be modified to take account of a perceptual weighting - in which case the filter vectors $\bar f_m$ (and hence the normalised vectors $\bar b_m$) would be replaced by those corresponding to the combined filter response; and the signal for analysis is also modified.
The pulse positions having been determined, the amplitudes may then be derived. Once a set of k pulse positions is given, a "block" approach is used to define the pulse amplitudes. The method is designed to minimize the energy of a weighted error signal formed from the difference between the input $\bar s$ and the synthesized $\bar s'$ speech vectors. $\bar s'$ is obtained at the output of the LPC synthesis filter F(z) = 1/[1 - P(z)] as:
$$\bar s' = R\,\bar a + \bar m \qquad (6)$$

where R is the n x n lower triangular convolution matrix

$$R = \begin{bmatrix} r_0 & 0 & \cdots & 0 \\ r_1 & r_0 & & 0 \\ \vdots & & \ddots & \vdots \\ r_{n-1} & r_{n-2} & \cdots & r_0 \end{bmatrix} \qquad (7)$$

$r_k$ is the kth value of the F(z) filter's impulse response, $\bar a$ is the vector containing the n values of the excitation and $\bar m$ is the filter's memory from the previously synthesised frames.
Since the excitation vector a consists of k pulses and n-k zeros Eq 6 can be written as:
$$\bar s' = S\,\bar c + \bar m \qquad (8)$$

where S is now an n x k convolution matrix formed from the columns of R which correspond to the k pulse locations, and $\bar c$ contains the k unknown pulse amplitudes.
The error vector

$$\bar e = \bar s - \bar m - S\bar c = \bar x - S\bar c \qquad (9)$$

where $\bar x = \bar s - \bar m$, has an energy $\bar e^T \bar e$ which can be minimized using least squares, and the optimum vector $\bar c$ is given by:
$$\bar c = (S^T S)^{-1} S^T \bar x \qquad (10)$$

As previously mentioned, the error however has a flat spectral characteristic and is not a good measure of the perceptual difference between the original and the synthesised speech signals. In general, due to the relatively high concentration of speech energy in formant regions, larger errors can be tolerated in the formant regions than in the regions between formants. The shape of the error spectrum is therefore modified using a linear shaping filter V(z).
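Equation (10) is an ordinary least-squares problem. A sketch (illustrative names only; a library least-squares routine stands in for forming $(S^TS)^{-1}S^T$ explicitly):

```python
import numpy as np

def block_amplitudes(x, h, positions):
    """Block solution of Eq. (10): given fixed pulse positions, solve for
    the amplitudes c = (S^T S)^{-1} S^T x, where each column of S is the
    synthesis filter's impulse response shifted to a pulse position."""
    n = len(x)
    S = np.zeros((n, len(positions)))
    for j, p in enumerate(positions):
        S[p:, j] = h[: n - p]
    c, *_ = np.linalg.lstsq(S, x, rcond=None)
    return c
```

If the memory-corrected input really is a combination of two shifted impulse responses, the solver returns their amplitudes exactly.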
Whence the weighted error u is given by:
$$\bar u = V\bar x - VS\bar h = \bar y - D\bar h \qquad (11)$$

where $\bar y$ and D correspond to the signal $\bar x$ and convolution matrix S "transformed" by V, respectively. An error is therefore defined in terms of both the shaping filter V and the excitation sequence $\bar h$ required to produce the perceptually shaped error $\bar u$. The actual error is still of course $\bar x - S\bar h$ and is designated $\bar e'$, whence

$$\bar e' = V^{-1}\bar u \qquad (12)$$

Furthermore $\bar u^T \bar u$ is minimized when

$$\bar h = (D^T D)^{-1} D^T \bar y \qquad (13)$$

in which case the spectrum of $\bar u$ is flat and its energy is

$$\bar u^T \bar u = \bar y^T \bar y - \bar h^T D^T \bar y \qquad (14)$$

Thus the "perceptually optimum" excitation sequence can be obtained by minimizing the energy of the error vector $\bar u$ of Eq. 11, where both the input signal $\bar x$ and the synthesis filter F(z) have been modified according to the noise shaping filter V(z). Since the minimization is performed in a modified n-dimensional space, the actual error energy $\bar e'^T \bar e'$ (see Figure 1) is expected to be larger than the error energy $\bar e^T \bar e$ found using $\bar c$ from Eq. 10.
The filter V(z) is set to:

$$V(z) = \frac{1 - P(z)}{1 - P(z/g)} \qquad (15)$$

where g controls the degree of shaping applied on the flat spectrum of $\bar u$ (Eq. 12). When g = 1 there is no shaping, while when g = 0 then V(z) = 1 - P(z) and full spectral shaping is applied. The choice of g is not too critical in the performance of the system and a typical value of 0.9 is used.
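Since P(z/g) simply scales the ith predictor coefficient by $g^i$, the coefficients of V(z) in Eq. (15) are easy to form. A sketch (illustrative only, with `weighting_filter` an invented name):

```python
import numpy as np

def weighting_filter(a, g=0.9):
    """Numerator and denominator predictor coefficients of the shaping
    filter V(z) = [1 - P(z)] / [1 - P(z/g)] of Eq. (15).  g = 1 gives no
    shaping; g = 0 reduces V(z) to 1 - P(z), i.e. full spectral shaping."""
    a = np.asarray(a, dtype=float)
    num = a                                   # coefficients of P(z)
    den = a * g ** np.arange(1, len(a) + 1)   # coefficients of P(z/g): a_i g^i
    return num, den
```

With g = 1 the two coefficient sets coincide (V(z) = 1), and with g = 0 the denominator vanishes, leaving the pure inverse filter.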
Notice from Eq. 11 that V deemphasizes the formant regions of the input signal $\bar x$ and that the modified filter T(z) (whose convolution matrix is VR = T) has a transfer function 1/[1 - P(z/g)]. Also an interesting case arises for g = 0, where $\bar y = V\bar x$ becomes the LPC residual and T is a unit matrix. The optimum k-pulse excitation sequence consists in this case (see Eq. 13) of the k most significant in amplitude samples of the LPC residual.
The pulse amplitudes $\bar h$ can be efficiently calculated using Eq. 13 by forming the n-valued cross-correlation $C_{Ty} = T^T \bar y$ between the transformed input signal $\bar y$ and the impulse response of T(z) only once per analysis frame. Note here that T is the full n x n matrix as opposed to the n x k matrix D. $C_{Ty}$ can be conveniently obtained at the output of the modified synthesis filter whose input is the time-reversed signal $\bar y$. Thus instead of calculating the k cross-correlation values $D^T \bar y$ every time Eq. 13 is solved for a particular set of pulse positions, the algorithm selects from $C_{Ty}$ the values which correspond to the positions of the excitation pulses, and in this way the computational complexity is reduced.
Another simplification results from the fact that only one pulse position, out of k, is changed when a different set of positions is tried. As a result the symmetric matrix $D^T D$ found in Eq. 13 only changes in one row and one column every time the configuration of the pulses is altered. Thus, given the initial estimate, the amplitudes $\bar h$ for each of the following multipulse configurations can be efficiently calculated with substantially fewer multiplications than would otherwise be required.
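The single row-and-column update can be sketched as follows (a hypothetical helper, not the patent's implementation):

```python
import numpy as np

def update_gram(G, D, j, new_col):
    """When only pulse j moves, replace column j of D and patch the
    symmetric Gram matrix D^T D: only its row j and column j change, so
    a full recomputation of all k^2 entries is avoided."""
    D = D.copy()
    D[:, j] = new_col
    row = D.T @ new_col        # new j-th row (and, by symmetry, column)
    G = G.copy()
    G[j, :] = row
    G[:, j] = row
    return G, D
```

The patched matrix agrees with a recomputation from scratch, which is the invariant the update relies on.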
Finally, an approximation is introduced to further reduce the computational burden of forming the $D^T D$ matrix for each set of pulse positions.
$D^T D$ is formed from estimates of the autocovariance of the T(z) filter's impulse response. These estimates are also elements of a larger n x n $T^T T$ matrix. The method is considerably simplified by making $T^T T$ Toeplitz.
In this case there are only n different elements in $T^T T$ which can be used to define $D^T D$ for any configuration of excitation pulses. These elements need only be determined once per analysis frame, by feeding through T(z) its time-reversed impulse response. In practice, though, it is more efficient to carry out updating (as opposed to recalculation) processes on the inverse matrix $(D^T D)^{-1}$.
Consider now the pulse transfer stage. The convergence of the proposed scheme towards a minimum weighted error depends on the pulse selection and transfer procedures employed to define various k-pulse excitation sequences. Once the initial excitation estimate has been determined, a pulse is selected for possible transfer to another position within the analysis frame (see Figure 2).
The criteria for this selection - and for selecting its destination - may vary. In the examples which follow, the destination positions are, for convenience, examined sequentially starting at one end of the frame.
Clearly, other sequences would be possible.
The pulse selection procedure employs the term $\bar h^T D^T \bar y$ of Eq. 14, which represents the energy of the synthesised signal and is the sum of k energy terms. Each of these terms, which is the product of an excitation pulse amplitude with the corresponding element of the cross-correlation vector $C_{Ty}$, represents the energy contribution of the pulse towards the total energy of the synthesized signal. The pulse with the smallest energy contribution is considered the most likely one to be located in the wrong position and it is therefore selected for possible transfer to another position.
The procedure adopted is as follows:
a) Choose the "lowest energy" pulse using the above criterion.
b) Define a new excitation vector in which the pulse positions are as before except that the chosen pulse is deleted and replaced by one at position w (w is initially 1).
c) Recalculate the amplitudes for the pulses, as described above.
d) Compare the new weighted error with the reference error:
- if the new error is not lower, increase w by one and return to step b) to try the next position. Repetition of step a) is not necessary at this point since the "lowest energy" pulse is unchanged;
- if the error is lower, retain the new position, make the new error the reference, increment w, and return to step a) to identify which pulse is now the "lowest energy" pulse.
This process continues until w reaches n - ie all possible "destination" positions have been tried. During the process, of course, the previous position of the pulse being tested, and positions already containing a pulse, are not tested - ie w is 'skipped' over those positions. As an extension of this, different selection criteria may be employed in dependence on whether the "destination" in question is a pulse position adjacent an existing pulse; ie each pulse at position j defines a region from j-X to j+X and when w lies within a region a different criterion is used. For example:

A: outside regions - "lowest energy" pulse selected; within regions - no pulse selected (thus when w reaches j-X it is automatically incremented to j+X+1).
B: outside regions - "lowest energy" pulse selected; within a region - the pulse defining the region is selected.
C: outside regions - no pulse selected; within a region - the pulse defining the region is selected.

Figures 3a and 3b illustrate the successive pulse position patterns examined when the algorithm employs the B scheme. In Figure 3a an analysis frame of n = 180 samples is used while n = 120 in Figure 3b. In both cases the number of pulses k is equal to n/10.
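A much-simplified single pass of the transfer iteration can be sketched as follows. This roughly corresponds to scheme A with X = 0 (no exclusion regions) and no perceptual weighting; it is an illustrative sketch with invented names, not the patented procedure itself.

```python
import numpy as np

def pulse_transfer(x, h, positions):
    """One pass of the pulse-transfer iteration: the pulse with the
    smallest energy contribution h_i (d_i . x) is tried at each free
    position w, and a move is kept whenever the least-squares error drops."""
    n = len(x)

    def solve(pos):
        S = np.zeros((n, len(pos)))
        for j, p in enumerate(pos):
            S[p:, j] = h[: n - p]
        c, *_ = np.linalg.lstsq(S, x, rcond=None)
        err = float(x @ x - c @ (S.T @ x))    # residual energy (cf. Eq. 14)
        energies = c * (S.T @ x)              # per-pulse energy contributions
        return list(pos), c, err, energies

    pos, c, ref_err, energies = solve(positions)
    for w in range(n):
        if w in pos:
            continue
        weakest = int(np.argmin(energies))    # "lowest energy" pulse
        trial = pos[:weakest] + pos[weakest + 1:] + [w]
        t_pos, t_c, t_err, t_en = solve(trial)
        if t_err < ref_err:                   # keep the move only if better
            pos, c, ref_err, energies = t_pos, t_c, t_err, t_en
    return pos, c, ref_err
```

Starting from a deliberately wrong initial estimate, the pass below relocates the misplaced pulse and drives the error to (numerically) zero.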
In practice, the coding method might be implemented using a suitably programmed digital computer.
More preferably, however, a digital signal processing (DSP) chip - which is essentially a dedicated microprocessor employing a fast hardware multiplier - might be employed.
The coding method discussed in detail above might conveniently be summarised as follows:
For each frame:

I Evaluate the LPC filter coefficients, using the maximum entropy method.
II (a) Find the impulse response of the weighted filter (this gives us the convolution matrix T = VR).
(b) Find the autocorrelation of the weighted filter's impulse response.
(c) Subtract the memory contribution and weight the result; ie find $\bar y = V\bar x = V(\bar s - \bar m)$.
(d) Find the cross-correlation of the weighted signal and the weighted impulse response.
III Make the initial estimate by - starting with j = 1 and $\bar q_i(1)$ being the cross-correlation values already found:
(a) find the largest of $\|\bar q_i(j)\|$, which is $\|\bar q_{max}(j)\| = \|\bar q_l(j)\|$, noting the value of l
(b) find the n values $\|\bar q_{max}(j)\| R_{l,i}$
(c) subtract these from $\|\bar q_i(j)\|$ to give $\|\bar q_i(j+1)\|$
(d) repeat steps (a) to (c) until k values of l - which are the derived pulse positions - have been found.
IV Find the amplitudes by:
(a) finding $C_{Dy} = D^T \bar y$ (obtained from the k pulse positions simply by selecting the relevant columns of the cross-correlation from II(d) above)
(b) finding the amplitudes $\bar h$ using the steps defined by equation (13); $(D^T D)^{-1}$ is initially calculated and then updated
(c) finding the k energy terms $h_i\,C_{Dy,i}$.
V Carry out the pulse position adjustment by - starting with w = 1:
(a) checking whether w is within X of an existing pulse, and if not (assuming option A) omitting the pulse having the lowest energy term and substituting a pulse at position w
(b) repeating steps IV to find the new amplitudes and error
(c) advancing w to the next available position - if none is available, proceed to step (f)
(d) if the error is not lower than the reference error, return to step V(a)
(e) if the error is lower, make the new error the reference error, retain the new amplitude, position and energy terms and return to step (a)
(f) calculating the memory contribution for the next frame.
VI Encode the following information for transmission:
(a) the filter coefficients
(b) the k pulse positions
(c) the k pulse amplitudes.
VII Upon reception of this information, the decoder
(a) sets the LPC filter coefficients
(b) generates an excitation pulse sequence having k pulses whose positions and amplitudes are as defined by the transmitted data.
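The decoder's side can be sketched in a few lines (illustrative names; the synthesis filter is the all-pole 1/[1 - P(z)] assumed throughout):

```python
import numpy as np

def decode_frame(positions, amplitudes, lpc, n, memory=None):
    """Receiver side: rebuild the excitation from the k transmitted pulse
    positions/amplitudes and pass it through the LPC synthesis filter."""
    excitation = np.zeros(n)
    excitation[list(positions)] = amplitudes
    p = len(lpc)
    state = list(memory) if memory is not None else [0.0] * p
    out = []
    for x in excitation:
        y = x + sum(lpc[i] * state[i] for i in range(p))
        out.append(y)
        state = [y] + state[:-1]
    return np.array(out)
```

A single transmitted pulse at position 0 with a one-tap filter of coefficient 0.5 reproduces the expected decaying response.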
A typical set of parameters for a coder is as follows:

Bandwidth: 3.4 kHz
Sampling rate: 8000 per second
LPC order: 12
LPC update period: 22.5 ms
Frame size (n): 120 samples
Spectral shaping factor (g): 0.9
No. of pulses per frame (k): 12 (800 pulses/sec)

Results obtained by computer simulation, using sentences of both male and female speech, are illustrated in Figures 4 to 7. Except where otherwise indicated, the parameters are as stated above. In Figure 4, segmental signal-to-noise ratio, averaged over 3 sec of speech, for pulse transfer options A and B, is shown for LPC prediction order varying from 6 to 16.
In Figure 5 the noise shaping constant g was varied; 0.9 appears close to optimum. Figure 6 shows the variation of SNR with frame size (pulse rate remaining constant). The small increase in SEG-SNR can be attributed to the improved autocorrelation estimates $R_{l,i}$ obtained when larger analysis frames are used. It is also evident, from Figure 6, that the proposed algorithms operate satisfactorily with small analysis frames, which lead to computationally efficient implementations. Figure 7 compares the SEG-SNR performance of five multipulse excitation algorithms for a range of pulse rates. Curve O gives the performance of the simplified algorithm used to form the Initial Position Estimate for the systems A and B, whose performance curves are A and B. Curve Q corresponds to the algorithm used by Atal and Remde, while curve S shows the performance of that algorithm when amplitude optimization is applied every time a new pulse is added to the excitation sequence. Note that the latter two systems employ the autocovariance estimates $B_{l,i}$ while the first three systems approximate these estimates with the autocorrelation values $R_{l,i}$.
The method proposed here in essence lifts the pulse location search restrictions found in the methods referred to earlier. The error to be minimized is always calculated for a set of k pulses, in a way similar to the amplitude optimization technique previously encountered, and no assumptions are involved regarding pulse amplitudes or locations. The algorithm commences with an initial estimate of the k-dimensional subspace and continues changing sequentially the subspace, and therefore the pulse positions, in search of the optimum solution. The pulse amplitudes are calculated with a "block" method which projects the input signal $\bar s$ onto each subspace under consideration.
The proposed system has the potential to out-perform conventional multipulse excitation systems, and its performance depends on the search algorithms employed to modify sequentially the k-dimensional subspace under consideration.
A further modification of the iterative adjustment process, and more especially of the criteria for selection of pulses whose positions are to be reassessed, will now be considered. The option to be discussed is a modification of scheme C referred to above.
The aim is to reduce the computational requirements of the multipulse LPC algorithm described, without reducing the subjective and SNR performance of the system. In scheme C, given the initial excitation estimate, each excitation pulse defines a +/-X region and only the possibility of transferring a pulse to a location within its own region is examined by the algorithm. Thus each of the k initial excitation pulses is tested for transfer into one of its +/-X neighbouring locations.
The complexity of the algorithm implementing scheme C is, it is proposed, reduced by testing only k1 pulses for possible transfer, where k1 < k. The question then arises of how to select, for possible transfer, k1 out of the k initial excitation pulses.
The proposed pulse selection procedure is based on the following two requirements:
i) the k1 pulses to be tested are associated with a high probability of being transferred to another location within their +/-X region;
ii) given that an initial excitation pulse is to be transferred to another location, this transfer results in a considerable change in the energy of the synthesized signal in approximating the energy of the input signal.
Recall (equation 14) that the energy of the synthesized signal is $\bar h^T D^T \bar y$, which is the sum of k energy terms $h_i(\bar d_{p_i}\cdot\bar y)$, where $D = [\bar d_{p_1}, \bar d_{p_2}, \dots, \bar d_{p_k}]$. Each of these terms represents the energy contribution of an excitation pulse towards the total energy of the synthesized signal. Using the (approximate) assumption that the energy contribution of each pulse is independent of the positions/amplitudes of the remaining excitation pulses, one can then relate the above two requirements to a normalized energy measure $E_i$ associated with an excitation pulse i:
$$E_i = \frac{h_i\,\bar d_{p_i}^T \bar y}{\sum_{j=1}^{k} h_j\,\bar d_{p_j}^T \bar y} \qquad (16)$$

In particular, given that $E_i$ lies within the small energy interval $E^K$, the probability of pulse relocation $p(E^K)$ is

$$p(E^K) = \frac{M_K}{n_K} \qquad (17)$$

where $n_K$ is the number of pulses with energy values within the $E^K$ interval and only $M_K$ of these pulses are actually relocated by the search procedure.
In the second requirement the energy change Q, which results from relocating a pulse from the $p_i$ location to $p_i'$, is given by

$$Q = \frac{h_i'\,\bar d_{p_i'}^T \bar y - h_i\,\bar d_{p_i}^T \bar y}{\sum_{j=1}^{k} h_j\,\bar d_{p_j}^T \bar y} \qquad (18)$$

An average energy change per transferred pulse is now formed as

$$Q_{av}(E^K) = \sum_j P_{Q_{K,j}}\,Q_j, \qquad P_{Q_{K,j}} = \frac{n_{Q_{K,j}}}{M_K} \qquad (19)$$

where $M_K$ is the number of pulses relocated by the search procedure whose energy value lies within the $E^K$ interval, while $n_{Q_{K,j}}$ is the number of those $M_K$ pulses whose relocation resulted in an energy change value Q lying within the small energy interval $E_j$.
Using $p(E^K)$ and $Q_{av}(E^K)$ an Energy Gain Function $G_e$ is thus defined as

$$G_e = p(E^K)\,Q_{av}(E^K) = \frac{1}{n_K}\sum_j n_{Q_{K,j}}\,Q_j \qquad (20)$$

and represents the average energy change per pulse, which results from the relocated pulses whose normalized energy E falls within the $E^K$ interval.
Clearly then, the value of the Energy Gain Function $G_e$ should be larger for the k1 pulses selected to be tested for possible transfer than for the remaining k - k1 pulses in the initial excitation estimate.
In practice, a plot of Energy Gain Function against normalized energy E can be obtained - eg from several seconds of male and female speech - while a piecewise linear representation is a convenient simplification of this function. The problem of selecting for possible relocation k1 out of k pulses can now be solved using this data. That is, given the initial sequence of excitation pulses, the normalized energy $E_i$ is measured for each pulse and the corresponding $G_e$ values are found from the plot - eg as a stored look-up table or computed criteria based on the piecewise linear approximation. Those k1 pulses with the largest $G_e$ values are then selected and tested for relocation.
Figure 8 shows a typical G_e v. E plot, along with a piecewise linear approximation. It will be noted that if, as shown, the curve is monotonic (which is not always the case) then the largest G_e always corresponds to the largest E. In this instance the conversion is unnecessary: the method reduces to selecting only those k_1 pulses with the largest values of E. In some circumstances it may be appropriate to use E' instead of E as the horizontal axis for the plot, and indeed this is in fact so for Figure 8. (E' is given by equation 16 with h' and d' substituted for h and d.)
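The selection step just described can be sketched as below; this is an illustration under assumed names (not part of the specification), using the piecewise linear E-to-G_e approximation as a lookup and returning the k_1 pulses with the largest G_e values:

```python
import numpy as np

def select_pulses_for_relocation(E, k1, E_knots, Ge_knots):
    """Select the k1 pulses to be tested for relocation.

    E                 : normalized energies of the k initial pulses
    k1                : number of pulses to test for transfer
    E_knots, Ge_knots : breakpoints of the piecewise linear G_e(E) curve
    Returns the indices of the k1 pulses with the largest G_e values.
    """
    Ge = np.interp(E, E_knots, Ge_knots)   # piecewise linear lookup
    return np.argsort(Ge)[::-1][:k1]       # indices of the k1 largest G_e
```

When the G_e(E) curve is monotonic, as in Figure 8, the lookup preserves the ordering of the energies and the function reduces to picking the k_1 pulses with the largest E directly.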
Figure 9 shows the signal-to-noise ratio performance against multiplications required per input sample, for the following four multistage sequential search algorithms:
A: ATAL's scheme with amplitude optimization at each stage
Z: ATAL's scheme without amplitude optimization at each stage
X: INITIAL ESTIMATE algorithm with amplitude optimization at each stage
K: INITIAL ESTIMATE algorithm without amplitude optimization at each stage
as well as for the proposed block sequential algorithm using the simplified scheme C of pulse selection and destination when allowing 1/6, 2/6, 3/6 and 4/6 of the initial pulses to be tested for transfer.
The graph shows average segmental SNR obtained at a constant pulse rate with different multipulse algorithms (solid line), for a particular speech sentence. The horizontal axis indicates the algorithm complexity in number of multiplications per sample. The intermittent line shows the SNR performance of each algorithm when its complexity is varied by changing the pulse rate.
Note that the complexity of the proposed algorithm is considerably reduced for small transfer pulse ratios, while the SNR performance is almost unaffected.
Figure 10 shows, for the above system, the number of multiplications required per input sample versus excitation pulses per second.
Figure 11 illustrates the SNR performance of the proposed system for different values of pulse ratios to be tested for transfer. Results are shown for 800 pulses/sec (10 per cent), 1200 pulses/sec (15 per cent) and 1600 pulses/sec (20 per cent). Note that the solid line in Figure 11 corresponds to performance of the Initial Estimate algorithm with amplitude optimization at each stage of the search process.
Claims (13)
1. A method of speech transmission in which speech is regenerated in a decoder by generating within each of successive time frames representing a plurality of speech samples a pulse sequence comprising a smaller plurality of pulses, and passing the pulse sequences through a controllable filter, in which, at a transmitter, input speech is processed to derive filter coefficients and pulse position and amplitude information, the pulse positions and amplitude being selected such as to reduce the error between the input speech and the regenerated filter output, characterised in that the pulse position and amplitude information is derived by:
(1) deriving a first estimate of the positions and amplitudes of the pulses, and (2) carrying out an iterative adjustment process in which individual pulses are selected and their positions and amplitudes reassessed.
2. A method according to Claim 1 in which the initial estimate of the pulse positions is made by selecting those pulses corresponding to the k largest values of the cross-correlation between the set of input speech sample amplitudes during the frame and each of a set of normalised vectors corresponding to the time-shifted impulse responses of the filter, where k is the number of pulses.
3. A method according to Claim 1 in which the initial estimate of the pulse positions is made by selecting a first pulse corresponding to the largest value of the cross-correlations between the set of input speech sample amplitudes during the frame and each of a set of normalised vectors corresponding to the time-shifted impulse responses of the filter and successive pulses corresponding to the largest value of adjusted cross-correlations between the input speech vector and the said normalised vectors, the cross-correlations having been adjusted by subtraction of values at least approximately representing the orthogonal projections of vector representations of earlier pulses onto axes represented by the relevant normalised vectors.
4. A method according to Claim 1, 2 or 3 in which the iterative adjustment process is effected by repeated selection of one of the pulses according to a predetermined criterion, and substituting for that pulse a pulse in an alternative position only if such substitution results in a reduction in the said error, the pulse amplitudes being re-evaluated following each such substitution.
5. A method according to Claim 4 in which the predetermined criterion for pulse selection is effected by deriving k energy terms each of which is the product of a pulse amplitude and the corresponding term of the vector D^T y, where D is the convolution matrix of the filter truncated by omission of all terms other than those relevant to the k non-zero input pulses and y is the difference between the input speech vector and the filter memory; each being adjusted by any perceptual weighting factor.
6. A method according to Claim 4 or 5 in which the alternative positions are selected successively in sequence from the available positions, such that no alternative position is tested for substitution more than once.
7. A method according to Claim 6 in which zones are defined as including a predetermined number of potential alternative positions adjacent a position already occupied by a pulse, and different criteria for selection of a pulse to be substituted are employed dependent on whether the selected alternative position is within or outside the said zones.
8. A method according to Claim 7 in which whenever the selected alternative position falls within a zone, no pulse is selected for substitution.
9. A method according to Claim 7 in which whenever the next available alternative position in sequence is within one of the zones the pulse defining that zone is selected for possible substitution.
10. A method according to Claim 6, 7 or 9 in which only certain pulses are selected for possible substitution, those pulses being those whose normalised energy corresponds to a larger energy gain function than the unselected pulses, the energy gain function for pulses having energies lying within a given energy interval being the average energy change resulting from relocation of a pulse having an energy within that interval.
11. A method according to Claim 10 in which the energy gain function for each pulse is obtained from a look-up table relating energy intervals and energy gain functions, the look-up table having been empirically derived from a training sequence of speech.
12. A method according to any one of the preceding claims in which the pulse amplitudes, in the initial estimate step and/or during the iterative adjustment process, are calculated using the relation b = (D^T D)^{-1} D^T y, where b is a vector consisting of the k amplitudes, D is the set of time-shifted filter impulse responses corresponding to the pulse positions in question, and y is the difference between the input speech vector and the filter response from previous frames; D and y being adjusted by a perceptual weighting, if any.
13. A method of speech transmission substantially as herein described with reference to the accompanying drawings.
Printed in the UK for HMSO, D8818935, 8186, 7102. Published by The Patent Office, 25 Southampton Buildings, London, WC2A 1AY, from which copies may be obtained.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB858508669A GB8508669D0 (en) | 1985-04-03 | 1985-04-03 | Speech coding |
GB858515501A GB8515501D0 (en) | 1985-06-19 | 1985-06-19 | Speech coding |
Publications (3)
Publication Number | Publication Date |
---|---|
GB8608031D0 GB8608031D0 (en) | 1986-05-08 |
GB2173679A true GB2173679A (en) | 1986-10-15 |
GB2173679B GB2173679B (en) | 1989-01-11 |
Family
ID=26289084
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
GB08608031A Expired GB2173679B (en) | 1985-04-03 | 1986-04-02 | Speech coding |
Country Status (2)
Country | Link |
---|---|
US (1) | US4944013A (en) |
GB (1) | GB2173679B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2195220A (en) * | 1986-09-11 | 1988-03-30 | British Telecomm | Speech coding |
EP0397628A1 (en) * | 1989-05-11 | 1990-11-14 | Telefonaktiebolaget L M Ericsson | Excitation pulse positioning method in a linear predictive speech coder |
EP0516439A2 (en) * | 1991-05-31 | 1992-12-02 | Motorola, Inc. | Efficient CELP vocoder and method |
GB2285203A (en) * | 1993-12-10 | 1995-06-28 | Nec Corp | Multipulse processing of speech signals |
DE19647298A1 (en) * | 1995-11-17 | 1997-05-22 | Nat Semiconductor Corp | Digital speech coder excitation data determining method |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4472832A (en) * | 1981-12-01 | 1984-09-18 | At&T Bell Laboratories | Digital speech coder |
CA1197619A (en) * | 1982-12-24 | 1985-12-03 | Kazunori Ozawa | Voice encoding systems |
GB2137054B (en) * | 1983-03-11 | 1987-08-26 | Prutec Ltd | Speech encoder |
US4720865A (en) * | 1983-06-27 | 1988-01-19 | Nec Corporation | Multi-pulse type vocoder |
US4669120A (en) * | 1983-07-08 | 1987-05-26 | Nec Corporation | Low bit-rate speech coding with decision of a location of each exciting pulse of a train concurrently with optimum amplitudes of pulses |
NL8302985A (en) * | 1983-08-26 | 1985-03-18 | Philips Nv | MULTIPULSE EXCITATION LINEAR PREDICTIVE VOICE CODER. |
US4701954A (en) * | 1984-03-16 | 1987-10-20 | American Telephone And Telegraph Company, At&T Bell Laboratories | Multipulse LPC speech processing arrangement |
US4724535A (en) * | 1984-04-17 | 1988-02-09 | Nec Corporation | Low bit-rate pattern coding with recursive orthogonal decision of parameters |
-
1986
- 1986-04-01 US US06/846,854 patent/US4944013A/en not_active Expired - Lifetime
- 1986-04-02 GB GB08608031A patent/GB2173679B/en not_active Expired
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2195220B (en) * | 1986-09-11 | 1990-10-10 | British Telecomm | Method of speech coding |
GB2195220A (en) * | 1986-09-11 | 1988-03-30 | British Telecomm | Speech coding |
US5193140A (en) * | 1989-05-11 | 1993-03-09 | Telefonaktiebolaget L M Ericsson | Excitation pulse positioning method in a linear predictive speech coder |
EP0397628A1 (en) * | 1989-05-11 | 1990-11-14 | Telefonaktiebolaget L M Ericsson | Excitation pulse positioning method in a linear predictive speech coder |
WO1990013891A1 (en) * | 1989-05-11 | 1990-11-15 | Telefonaktiebolaget Lm Ericsson | Excitation pulse positioning method in a linear predictive speech coder |
EP0516439A3 (en) * | 1991-05-31 | 1993-06-16 | Motorola, Inc. | Efficient celp vocoder and method |
EP0516439A2 (en) * | 1991-05-31 | 1992-12-02 | Motorola, Inc. | Efficient CELP vocoder and method |
GB2285203A (en) * | 1993-12-10 | 1995-06-28 | Nec Corp | Multipulse processing of speech signals |
AU676392B2 (en) * | 1993-12-10 | 1997-03-06 | Nec Corporation | Multipulse processing with freedom given to multipulse positions of a speech signal |
US5696874A (en) * | 1993-12-10 | 1997-12-09 | Nec Corporation | Multipulse processing with freedom given to multipulse positions of a speech signal |
GB2285203B (en) * | 1993-12-10 | 1998-10-28 | Nec Corp | Multipulse processing of speech signals |
DE19647298A1 (en) * | 1995-11-17 | 1997-05-22 | Nat Semiconductor Corp | Digital speech coder excitation data determining method |
US5867814A (en) * | 1995-11-17 | 1999-02-02 | National Semiconductor Corporation | Speech coder that utilizes correlation maximization to achieve fast excitation coding, and associated coding method |
DE19647298C2 (en) * | 1995-11-17 | 2001-06-07 | Nat Semiconductor Corp | Coding system |
Also Published As
Publication number | Publication date |
---|---|
US4944013A (en) | 1990-07-24 |
GB8608031D0 (en) | 1986-05-08 |
GB2173679B (en) | 1989-01-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
GB2173679A (en) | Speech coding | |
US5265167A (en) | Speech coding and decoding apparatus | |
US4980916A (en) | Method for improving speech quality in code excited linear predictive speech coding | |
US5371853A (en) | Method and system for CELP speech coding and codebook for use therewith | |
US7398205B2 (en) | Code excited linear prediction speech decoder and method thereof | |
US6161086A (en) | Low-complexity speech coding with backward and inverse filtered target matching and a tree structured mutitap adaptive codebook search | |
US4817157A (en) | Digital speech coder having improved vector excitation source | |
US6073092A (en) | Method for speech coding based on a code excited linear prediction (CELP) model | |
US5265190A (en) | CELP vocoder with efficient adaptive codebook search | |
US5187745A (en) | Efficient codebook search for CELP vocoders | |
US6006174A (en) | Multiple impulse excitation speech encoder and decoder | |
EP0764940A2 (en) | An improved RCELP coder | |
US5179594A (en) | Efficient calculation of autocorrelation coefficients for CELP vocoder adaptive codebook | |
US5953697A (en) | Gain estimation scheme for LPC vocoders with a shape index based on signal envelopes | |
EP0824750B1 (en) | A gain quantization method in analysis-by-synthesis linear predictive speech coding | |
JP4539988B2 (en) | Method and apparatus for speech coding | |
US5173941A (en) | Reduced codebook search arrangement for CELP vocoders | |
US6169970B1 (en) | Generalized analysis-by-synthesis speech coding method and apparatus | |
Deprettere et al. | Regular excitation reduction for effective and efficient LP-coding of speech | |
JPH075899A (en) | Voice encoder having adopted analysis-synthesis technique by pulse excitation | |
US5822721A (en) | Method and apparatus for fractal-excited linear predictive coding of digital signals | |
US7337110B2 (en) | Structured VSELP codebook for low complexity search | |
US5692101A (en) | Speech coding method and apparatus using mean squared error modifier for selected speech coder parameters using VSELP techniques | |
US5105464A (en) | Means for improving the speech quality in multi-pulse excited linear predictive coding | |
AU655090B2 (en) | Speech signal encoding system capable of transmitting a speech signal at a low bit rate |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | PCNP | Patent ceased through non-payment of renewal fee | Effective date: 20030402 |