GB2398980A - Adjustment of non-periodic component in speech coding - Google Patents


Info

Publication number
GB2398980A
GB2398980A (application GB0304483A)
Authority
GB
United Kingdom
Prior art keywords
speech
communication unit
periodic
speech communication
waveform
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB0304483A
Other versions
GB0304483D0 (en)
GB2398980B (en)
Inventor
Halil Fikretler
Jonathan Gibbs
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Motorola Solutions Inc
Original Assignee
Motorola Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Motorola Inc filed Critical Motorola Inc
Priority to GB0304483A priority Critical patent/GB2398980B/en
Publication of GB0304483D0 publication Critical patent/GB0304483D0/en
Publication of GB2398980A publication Critical patent/GB2398980A/en
Application granted granted Critical
Publication of GB2398980B publication Critical patent/GB2398980B/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current


Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/26Pre-filtering or post-filtering

Abstract

A speech communication unit (100) comprises a speech decoder (134) having a receiver for receiving at least a first substantially periodic waveform and a second non-periodic waveform. An adjustment function 324, operably coupled to the receiver, adjusts an amplitude of the second non-periodic waveform. Preferably, the amplitude adjustment is performed using a high-pass filter and/or a high frequency boost. By adjusting an amplitude of a non-periodic portion of a speech signal, preferably of non-periodic components of a Kalman speech coder, an improvement in the quality of the synthesised speech can be achieved without an increase in delay or bit-rate. Also described is phase adjustment of the non-periodic waveform for smoothing fricatives.

Description

Speech Communication Unit And Method For Synthesising Speech Therein
Field of the Invention
This invention relates to speech coding and methods of optimising the performance of speech coders in communication systems. The invention is applicable to, but not limited to, wideband speech coding using prototype waveform interpolation (PWI) based techniques.
Background of the Invention
Many present day speech communication systems, such as the global system for mobile communications (GSM) cellular telephony standard and the TErrestrial Trunked RAdio (TETRA) system for private mobile radio users, use speech-processing units to encode and decode speech patterns. In such voice communication systems, a speech encoder in a transmitting unit converts the analogue speech pattern into a suitable digital format for transmission. A speech decoder in a receiving unit converts a received digital speech signal into an audible analogue speech pattern.
Thus, a digital speech communication system typically uses a speech encoder to produce a parsimonious representation of the analogue speech signal. A corresponding decoder is used to generate an approximation of the speech signal from that representation. The combination of the encoder and decoder is known in the art as a speech codec.
As frequency spectrum for such voice communication systems is a valuable resource, it is desirable to limit the channel bandwidth used by such speech signals, in order to maximise the number of users per frequency band.
Hence, a primary objective in the use of speech coding techniques is to reduce the occupied capacity of the speech patterns as much as possible, by use of compression techniques, without losing fidelity.
A large number of communication technologies have been developed that use such reduced channel bandwidth. These communication systems are able to provide high-quality speech at bit rates below 10 kbit/s. The speech quality at these bit rates is referred to as 'toll' quality speech and facilitates the transmission of speech within the audio frequency telephone bandwidth of 300 Hz to 3.4 kHz. This range of speech signals, generally referred to as "narrowband" speech, is adequate for telephone communications. However, it is perceived as being inadequate for emerging technologies such as multimedia services, teleconferencing, etc., where an improvement in speech quality is required.
By increasing the sampling frequency to 16 kHz, a wider bandwidth of speech signals, from 50 Hz to 7 kHz, can be accommodated, thereby providing for better quality speech. With such 'wideband' speech, it is known that extending the lower frequency range down to 50 Hz increases naturalness, presence and comfort in the synthesized reproduced speech. Furthermore, extending the higher frequencies up to 7 kHz increases intelligibility and makes it easier to differentiate between fricative sounds such as "s" and "f". This results in a speech signal that is more natural.
As will be apparent to a person skilled in the art, many segments of speech signals contain quasi-periodic waveforms. In particular, it is well known that during voiced segments the speech signal is nearly periodic.
Thus, it is relatively easy to identify from any time instant, a first pitch cycle, a second pitch cycle and so on. Notably, when comparing a sequence of such pitch cycle waveforms, it is possible to observe that the waveforms' general shape evolves over time. This slow evolution led some researchers to improve speech-coding techniques by extracting a pitch cycle waveform at regular time intervals in order to obtain a good approximation of the intermediate pitch cycle by means of interpolation. This approach led to the speech-processing concept of waveform interpolation (WI).
For voiced speech, the pitch cycle waveform effectively describes the essential characteristics of the speech signal. Thus, a speech signal can be re-constructed, without distortion, if the pitch cycle waveform and the phase of the speech signal are known at each instance of time. Although such a technique lends itself readily to voiced signals, it is also applicable to non-periodic unvoiced signals. However, whilst the pitch cycle waveform evolves slowly for voiced speech, i.e. there is a slow rate of change of repetitive components such as pitch and harmonic components of pitch, it evolves rapidly for unvoiced speech. Hence, the quasi-periodic component of the speech signal corresponds to a slowly evolving waveform (SEW) component, whereas a non-periodic ("noisy") component of the speech signal corresponds to a rapidly evolving waveform (REW) component.
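The SEW/REW decomposition described above can be illustrated with a short sketch (illustrative Python on synthetic data; the variable names, filter length and noise level are assumptions, not taken from the patent). Stacking extracted pitch cycles as rows, low-pass filtering down the cycle index yields the slowly evolving part, and what remains is the rapidly evolving part:

```python
import numpy as np

# Pitch cycles stacked as rows; the "evolution" axis runs down the
# columns (cycle index), not within a single cycle.
rng = np.random.default_rng(0)
n_cycles, cycle_len = 40, 64
t = np.arange(cycle_len) / cycle_len
# Slowly evolving voiced shape plus rapidly varying noise.
slow = np.sin(2 * np.pi * t)[None, :] * (1 + 0.01 * np.arange(n_cycles))[:, None]
cycles = slow + 0.3 * rng.standard_normal((n_cycles, cycle_len))

# Low-pass filter each column along the cycle index (moving average)
# to estimate the slowly evolving waveform (SEW).
k = 9
kernel = np.ones(k) / k
sew = np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, cycles)
rew = cycles - sew  # rapidly evolving ("noisy") residue

# The SEW tracks the underlying slow component more closely than the
# raw noisy cycles do (edges trimmed to avoid filter boundary effects).
err_raw = np.mean((cycles - slow) ** 2)
err_sew = np.mean((sew[k:-k] - slow[k:-k]) ** 2)
```

The same split underlies the Kalman-filter approach used later in the document; only the smoothing mechanism differs.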
In the "wideband" context of the present invention, a recent paper has proposed using a waveform interpolation technique for wideband speech coding: "Extending waveform interpolation to wideband speech coding" by C. H. Ritz, I. S. Burnett and J. Lukasiak, published in IEEE "Speech coding workshop proceedings", September 2002. However, in reality, this paper provides little insight into the opportunities, implementation issues and problems associated with providing 'wideband speech' using a waveform interpolation technique.
The inventors of the present invention have recognised and appreciated significant limitations in the use of WI, particularly for wideband applications. In particular, an important element of a speech codec is the approach it takes to reconstruct consecutive cycles of quasi-periodic waveforms. Frequently, correlation is exploited by transmitting a single cycle of the waveform, or a filtered version of it, only once every 20-30 msec. In this manner, a portion of the data is missing in the received signal.
The standard approach in dealing with the missing data in the decoder is to linearly interpolate between cycles of the received speech samples.
In practice, the use of linear interpolation by a speech decoder to generate data between the speech cycles only produces an adequate approximation to the speech signal if the speech signal really is quasi-periodic.
Alternatively, and equivalently, an adequate approximation is obtained if the vectors representing consecutive cycles of the waveform evolve sufficiently slowly. However, many segments of speech contain noisy non-periodic signal components. This results in comparatively rapid evolution of the waveform cycles. In order for waveform interpolation in an encoder to be useful for such signals, it is necessary to accurately extract a sufficiently quasi-periodic component from the noisy signal in the encoder.
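The decoder-side linear interpolation between transmitted cycles amounts to a simple cross-fade, sketched below (illustrative code; the function and variable names are hypothetical):

```python
import numpy as np

def interpolate_cycles(prev_cycle, curr_cycle, n_intermediate):
    """Linearly cross-fade from prev_cycle to curr_cycle,
    returning the intermediate (missing) cycles."""
    out = []
    for i in range(1, n_intermediate + 1):
        a = i / (n_intermediate + 1)          # 0 < a < 1
        out.append((1 - a) * prev_cycle + a * curr_cycle)
    return out

prev_cycle = np.zeros(8)   # cycle received at the previous update
curr_cycle = np.ones(8)    # cycle received at the current update
mid = interpolate_cycles(prev_cycle, curr_cycle, 3)
```

This works well only when the true intermediate cycles really do lie between the two endpoints, i.e. when the waveform is quasi-periodic, which is exactly the limitation the text describes.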
Linear low pass filtering a sequence of vectors representing consecutive cycles of speech in the time dimension using finite impulse response (FIR) filters is well known in the speech coding literature. The difficulty with this approach is that in order to obtain good separation of the slowly and rapidly evolving components, the low pass filter frequency response must have a sharp roll-off. This requires a long impulse response, which necessitates an undesirably large filter delay. Hence, the FIR approach is of limited practical use in a wideband speech-coding context for interactive conversational applications.
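The delay penalty follows directly from the group delay of a linear-phase FIR filter, (N-1)/2 taps. Since each "tap" here is a whole pitch cycle of 20-30 msec., even a moderately long filter implies hundreds of milliseconds of delay. A sketch (illustrative; the windowed-sinc design and the 25 msec. cycle duration are assumptions):

```python
import numpy as np

def windowed_sinc_lowpass(n_taps, cutoff):
    """Linear-phase FIR low-pass; cutoff as a fraction of the sample rate."""
    n = np.arange(n_taps) - (n_taps - 1) / 2
    h = 2 * cutoff * np.sinc(2 * cutoff * n)
    h *= np.hamming(n_taps)
    return h / h.sum()          # unity DC gain

# A sharp roll-off needs many taps; group delay is (N-1)/2 samples.
h = windowed_sinc_lowpass(41, 0.1)
delay_taps = (len(h) - 1) / 2   # 20 "samples" of delay
# If each sample is a 25 msec. pitch cycle, that is half a second:
delay_ms = delay_taps * 25
```

A half-second delay is clearly unusable for conversational speech, which motivates the IIR (Kalman) alternative discussed next.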
A Kalman filter technique for estimating quasi-periodic signal components has been described by Gruber and Todtli (IEEE Trans. Signal Processing, Vol. 42, No. 3, March 1994, pp. 552-562). In its favour, the Kalman filtering technique has promise in that the quasi-periodic and non-periodic parts of the speech waveform can be estimated with either little delay or a well-defined delay.
However, because this Kalman filter technique is based on a linear dynamic system model of a frequency domain representation of the signal, it is unnecessarily complex. It also assumes that the dynamic system model parameters (i.e. noise energy and the harmonic signal gain) are known.
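As a sketch of the underlying idea (not the Gruber and Todtli formulation itself), a scalar Kalman filter with a random-walk state model can track a slowly drifting component observed in noise. Note that the process and measurement noise parameters q and r are assumed known here, which is precisely the assumption criticised above:

```python
import numpy as np

def kalman_track(observations, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter: state x_k drifts as a random walk with
    variance q; observations y_k = x_k + noise of variance r."""
    x, p, out = x0, p0, []
    for y in observations:
        p = p + q                # predict
        k = p / (p + r)          # Kalman gain
        x = x + k * (y - x)      # update
        p = (1 - k) * p
        out.append(x)
    return np.array(out)

rng = np.random.default_rng(1)
true_amp = 1.0 + 0.002 * np.arange(200)       # slowly drifting component
obs = true_amp + 0.5 * rng.standard_normal(200)
est = kalman_track(obs, q=1e-4, r=0.25)
# After convergence the estimate beats the raw noisy observations.
err_kalman = np.mean((est[50:] - true_amp[50:]) ** 2)
err_raw = np.mean((obs[50:] - true_amp[50:]) ** 2)
```

Because the update is recursive (an IIR structure), the estimate is available with essentially no look-ahead, unlike the FIR smoothing above.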
Some prototype waveform interpolation (PWI) coders have been known to introduce random phase into the synthesis process. In particular, in the context of a waveform-interpolated codec of the present invention, non-periodic signals are employed in order to synthesise fricative sounds. In a WI codec, there is practically no interest in modifying non-periodic waveform signals when synthesizing speech and incorporating random phase into the synthesis process. The inventors of the present invention have recognized and appreciated that many more opportunities exist to improve a quality of synthesized speech by modifying the non-periodic components.
Hence, a need has arisen to provide a speech codec and, in particular, an improved method of speech synthesis and speech analysis for wideband speech communication, to at least alleviate some of the aforementioned disadvantages.
Summary of the Invention
The present invention provides a speech communication unit, in accordance with claim 1 or claim 16, and a method of synthesising speech in accordance with Claim 15. Further aspects of the present invention are defined in the dependent Claims.
In summary, the present invention aims to provide a speech communication unit and a method of synthesizing speech that at least alleviate some of the aforementioned disadvantages associated with current speech synthesis techniques. In particular, a mechanism to adjust an amplitude of a non-periodic portion of a speech signal, preferably of non-periodic components of a Kalman speech coder, is described. This improves the quality of the synthesized speech without an increase in delay or bit-rate.
Brief Description of the Drawings
Exemplary embodiments of the present invention are described, with reference to the accompanying drawings: FIG. 1 illustrates a block diagram of a wireless communication unit containing a speech coder (encoder and/or decoder) adapted to support the various inventive concepts of a preferred embodiment of the present invention; FIG. 2 illustrates a block diagram of a Kalman encoder adapted to support the various inventive concepts of a preferred embodiment of the present invention; FIG. 3 illustrates a block diagram of a Kalman decoder adapted to support the various inventive concepts of preferred embodiments of the present invention; FIG. 4 illustrates a block diagram of a process of phase adjustment of a non-periodic component in accordance with a preferred embodiment of the present invention; and FIG. 5 illustrates a series of speech waveforms that highlight the reduction in rough output artefacts after implementing the preferred embodiments of the present invention.
Description of Preferred Embodiments
In summary, the inventors of the present invention
have proposed a speech codec implementation that uses a Kalman filter to identify and separate the quasi-periodic (QP) voiced signal component from the non-periodic (NP) unvoiced component of a speech signal.
Advantageously, the inventors of the present invention have recognised that a Kalman filter uses an infinite impulse response (IIR) filter, and as such does not suffer from the known problems associated with FIR approaches. However, the use of such a Kalman filter in a wideband context brings a new range of implementation difficulties. The preferred embodiment of the present invention is primarily focused on improving the quality of the non-periodic components in a speech signal.
Referring now to FIG. 1, there is shown a block diagram of a wireless subscriber speech communication unit,
adapted to support the inventive concepts of the preferred embodiments of the present invention. The speech communication unit 100 contains an antenna 102.
Antenna 102 is preferably coupled to a duplex filter or antenna switch 104 that provides isolation between a receiver chain and a transmitter chain within the speech communication unit 100.
As known in the art, the receiver chain typically includes receiver front-end circuitry 106 (effectively providing reception, filtering and intermediate or base-band frequency conversion). The front-end circuit is serially coupled to a signal processing function 108. An output from the signal processing function is provided to a suitable output device 110, such as a speaker, via a speech-processing unit 130.
The speech-processing unit 130 includes a speech encoding function 134 to encode a user's speech signal into a format suitable for transmitting over the communication medium. The speech-processing unit 130 also includes a speech decoding function 132 to decode received speech signals into a format suitable for outputting via the output device (speaker) 110. The speech-processing unit 130 is operably coupled to a memory unit 116 and a timer 118 via a controller 114 and link 136.
In particular, the operation of the speech-processing unit 130 has been adapted to support the inventive concepts of the preferred embodiments of the present invention. The adaptation of the speech-processing unit is further described with regard to FIG. 2 and FIG. 3.
The receiver chain also includes received signal strength indicator (RSSI) circuitry 112 (shown coupled to the receiver front-end 106, although the RSSI circuitry 112 could be located elsewhere within the receiver chain).
The RSSI circuitry 112 is coupled to a controller 114 for maintaining overall subscriber unit control. The controller 114 is also coupled to the receiver front-end circuitry 106 and the signal processing function 108 (generally realised by a DSP).
The controller 114 may therefore receive bit error rate (BER) or frame error rate (FER) data from recovered information. The controller 114 is coupled to the memory device 116 for storing operating regimes, such as decoding/encoding functions and the like. A timer 118 is typically coupled to the controller 114 to control the timing of operations (transmission or reception of time dependent signals) within speech communication unit 100.
In the context of the present invention, the timer 118 dictates the timing of speech signals, in the transmit (encoding) path and/or the receive (decoding) path.
As regards the transmit chain, this essentially includes an input device 120, such as a microphone transducer, coupled in series via speech encoder 134 to a transmitter/modulation circuit 122. Thereafter, any transmit signal is passed through a power amplifier 124 to be radiated from the antenna 102. The transmitter/modulation circuit 122 and the power amplifier 124 are operationally responsive to the controller, with an output from the power amplifier coupled to the duplex filter or circulator 104. The transmitter/modulation circuit 122 and receiver front-end circuitry 106 comprise frequency up-conversion and frequency down-conversion functions (not shown).
Of course, the various components within the speech communication unit 100 can be arranged in any suitable functional topology able to utilise the inventive concepts of the present invention. Furthermore, the various components within the speech communication unit can be realised in discrete or integrated component form, with an ultimate structure therefore being merely an arbitrary selection.
It is within the contemplation of the invention that the processing and/or storage of speech signals can be implemented in software, firmware or hardware, with the function being implemented in a software processor (or indeed a digital signal processor (DSP)), performing the speech processing function, merely a preferred option.
Referring now to FIG. 2, a block diagram of a wideband speech encoder 200 is shown. The wideband speech encoder uses a Kalman filter 228 to extract (estimate) the periodic components of a waveform making the assumption of Gaussian noise, in accordance with a preferred embodiment of the present invention.
Notably, the inventors of the present invention have proposed a mechanism for applying a Kalman filter approach to a wideband codec, whereas known waveform interpolation techniques have previously been mostly restricted to narrowband speech. With narrowband speech, the end-to-end algorithmic delay, together with the perceptual quality of the synthesized speech, can render the speech synthesis unacceptable when considered for future speech communication systems.
In addressing these limitations, the following assumptions have been made as to what would constitute an acceptable speech codec performance. The end-to-end delay should not exceed 60 msec., including worst-case processing time delays. The target bit rate for an initial quantization design should be around 8 kbit/s.
The speech quality of the synthesized speech should be as high as possible, although ideally comparable to ITU-T Rec. G.722 at 48 kb/s when measured in clean conditions.
In accordance with the preferred embodiment of the present invention, an acoustic (speech) signal to be synthesized is input to the input device 202, for example a microphone. The speech signal is then converted into a sequence of N pulse samples, and the amplitude of each pulse sample is then represented by a digital code via an analogue-to-digital (A/D) converter 204. The A/D samples the speech signal at 16kHz and decimates the signal to 14 kHz, as known to those skilled in the art, to reduce the complexity of the subsequent speech processing.
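The 16 kHz to 14 kHz rate change is a resampling by a factor of 7/8. The index arithmetic can be sketched as follows (illustrative only; a real decimator would low-pass filter first to prevent aliasing, whereas plain linear interpolation is used here for brevity):

```python
import numpy as np

fs_in, fs_out = 16000, 14000          # 7/8 rate change
x = np.sin(2 * np.pi * 100 * np.arange(fs_in) / fs_in)  # 1 s of a 100 Hz tone

# Evaluate the input signal at the output sampling instants.
t_out = np.arange(int(len(x) * fs_out / fs_in)) / fs_out
y = np.interp(t_out, np.arange(len(x)) / fs_in, x)
```

One second of input (16000 samples) becomes 14000 output samples, reducing the complexity of all subsequent per-sample processing by 12.5%.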
The output from the A/D 204 is input to a linear predictive coder (LPC) function, where the sampled speech is analysed. The analysis is preferably performed every msec. using a high order (22nd-order) LPC analysis.
The computation derives the LPC parameters for, say, sub-frame '6' (of eight sub-frames) in the 20-msec. period of the look-ahead frame, i.e. an earlier frame in the sequence of speech frames being processed. The computation preferably uses an asymmetric window of length equal to one frame of speech.
The LPC parameters are then computed for the eight sub-frames of the current frame by converting them to line spectral frequencies (LSFs) in function 210 and then applying LSF interpolation in interpolation function 214.
In this manner, an LSF vector for the last sub-frame of the current frame is computed to act as the target for LSF quantization. This LSF vector is included as one of a number of parameters that are multiplexed onto the transmit signal and sent to the speech decoder.
The residual signals are computed for both the un-quantized and quantized LSF vectors. In both cases the LPC gain is computed and used to scale the residual signals over a linear time window, in an attempt to reduce energy fluctuations in the residual signals. The quantized residual signal is used for further processing.
The un-quantized residual signal is used to estimate the pitch of the sampled speech signal in pitch estimation function 218. In the preferred embodiment, a spectral estimate of the pitch in the look-ahead frame, as well as the current frame, is computed. An estimate of the glottal phase track is derived from the pitch track by integrating a function of the pitch period (not shown). The glottal phase track is preferably used as the basis for extracting the pitch cycles for the current frame and the look-ahead region, in function 222. The extraction process commences with the last pitch cycle of the current frame and utilises this as a reference for pitch phase and duration 223. This pitch parameter is coarsely quantized.
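Deriving a glottal phase track by integrating the pitch track can be sketched as follows (illustrative Python with an assumed pitch glide). The phase advances by 2π per pitch period, so cycle boundaries fall where the accumulated phase crosses multiples of 2π:

```python
import numpy as np

fs = 16000
pitch_hz = np.linspace(110.0, 130.0, fs)       # pitch glides over 1 s
# Integrate instantaneous frequency to obtain the phase track.
phase = np.cumsum(2 * np.pi * pitch_hz / fs)
# Complete cycles in the frame, and their boundary sample indices.
n_cycles = int(phase[-1] // (2 * np.pi))
boundaries = np.searchsorted(phase, 2 * np.pi * np.arange(1, n_cycles + 1))
```

Any leftover phase beyond the last full 2π multiple corresponds to the partial "remnant" cycle mentioned later in the text.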
Accurate pitch information is used in the encoder in the following areas. The pitch cycles are extracted from an over-sampled version of the residual (56 kHz sampled) signal, preferably using at least three-times over-sampling function 220.
Correlation in the perceptually weighted speech domain, followed by translation of the offsets to the residual domain, is used to perform time alignment (warping) of all cycles to that of the last cycle in the frame. In addition to the complete cycles in the current frame there is also a part cycle, known as the remnant, which is extracted.
Each pitch cycle is also normalized in function 224 in terms of power in the fundamental frequency. The normalization of each pitch cycle is achieved by computing the first non-DC discrete Fourier transform (DFT) slot magnitude, which equates to the energy at the pitch frequency. An overall scale factor 226 is derived for cycles within a frame. This scale factor 226 is also one of the parameters that is quantized and sent to the decoder.
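The normalization step can be sketched as follows (illustrative code; the cycle length and function names are assumptions). The first non-DC DFT slot magnitude gives the energy at the pitch frequency, and dividing by it normalizes the cycle:

```python
import numpy as np

def normalise_cycle(cycle):
    """Scale a pitch cycle so its first non-DC DFT slot has unit
    magnitude; return the cycle and the scale factor sent to the decoder."""
    spectrum = np.fft.rfft(cycle)
    pitch_mag = np.abs(spectrum[1])        # first non-DC slot = pitch frequency
    return cycle / pitch_mag, pitch_mag

# One idealised pitch cycle: a single sinusoid of amplitude 3 over 64 samples.
cycle = 3.0 * np.sin(2 * np.pi * np.arange(64) / 64)
norm, scale = normalise_cycle(cycle)
new_mag = np.abs(np.fft.rfft(norm)[1])     # should now be 1.0
```

For a pure sinusoid of amplitude A over N samples, the slot magnitude is A·N/2 (here 3·64/2 = 96), which becomes the per-cycle scale factor.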
The warped and normalized pitch cycles are input to the Kalman filter 228. The Kalman filter state from the previous frame is circularly convolved with the perceptual weighting filter and time aligned with the last cycle of the frame. The Kalman filter 228 is executed on all of the 'complete' pitch cycles of the current frame. Any remaining portion of a pitch cycle, as indicated above, is termed the remnant signal.
In addition, Kalman lag-smoothing 228 is performed on the look-ahead cycles until the normalised correlation of a look-ahead cycle falls below a predetermined level. In this regard, lag-smoothing is performed on the look-ahead cycles to better determine the last pitch period in the current frame. The decision on whether, or not, to use lag-smoothing is made by smooth decision function 230.
This last pitch period is used as the quasi-periodic estimate 232. This QP estimate 232 equates to the slowly evolving waveform (SEW) of a prototype waveform interpolation coder.
Notably, in accordance with the preferred embodiment of the present invention, the Kalman filter state is quantized in the spectral domain. This is well known in the art from SEW quantization for narrowband PWI coding.
In accordance with the preferred embodiment of the present invention, two versions of the quasi-periodic (QP) signal are derived. The normal case is based upon the current frame and both previous and current quantized Kalman states. The second case is based purely upon observations made during the previous frame, implemented by routing the Kalman states through a one-frame delay 234. In this manner, the observations are made for cycles observed in the look-ahead region from the previous frame and modelled as scaled versions of the previous Kalman state only. As previously indicated, this does not constitute the full frame (as there is likely to be a remnant portion). Thus, some of the normal state will be required for at least the end of the frame. It is noteworthy that although the second case is slightly erroneous, as the LPC filter is not matched, the inventors have determined that it does not result in significant errors.
A gain computation function 236 has been introduced.
For each cycle of the frame, the optimum proportion (gains) of previous and current Kalman states are calculated in the perceptually weighted domain, in gain computation function 236. In this regard, it is then possible to produce a waveform that has minimum mean squared error (MSE) distortion, in a perceptually weighted sense.
The QP gains 240, 242, for the new frame and the previous frame, are quantized and sent to the decoder. The number of pitch periods used in the gain computation is between two and ten (i.e. a remnant pitch period and between one and nine full pitch periods).
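Ignoring the perceptual weighting for clarity, the per-cycle gain computation is a two-unknown least-squares problem, sketched below (illustrative code; names are hypothetical):

```python
import numpy as np

def optimal_gains(prev_state, curr_state, target):
    """MSE-optimal gains (g_prev, g_curr) such that
    g_prev*prev_state + g_curr*curr_state best matches target."""
    basis = np.stack([prev_state, curr_state], axis=1)   # (n, 2)
    gains, *_ = np.linalg.lstsq(basis, target, rcond=None)
    return gains

n = 64
rng = np.random.default_rng(2)
prev_state = rng.standard_normal(n)
curr_state = rng.standard_normal(n)
target = 0.3 * prev_state + 0.7 * curr_state   # known mixture for the demo
g = optimal_gains(prev_state, curr_state, target)
```

When the target really is a mixture of the two states, the solver recovers the mixing proportions exactly; in practice a residual error remains and is carried by the NP component.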
From the un-quantized QP gains, an estimate of the quasi-periodic (QP) signal can be obtained in the residual domain, by de-warping 238 the weighted and combined Kalman states. The non-periodic (NP) component is computed by a simple subtraction 246 from the actual pitch cycle residuals in the fast Fourier transform (FFT) domain (after de-warping 244). The NP component is computed in the four-times over-sampled domain. Thus, the power of the FFT slot for the fold-over frequency (7 kHz) is assumed to be equal to the sum of the powers for all slots in the 7-28 kHz range, provided by summing function 248.
The FFT of the NP component is modified to incorporate a high-pass response that increases in frequency for highly voiced regions in DFT function 252.
Thus NP components relating to power 250 and NP components relating to spectrum 254 are sent to the decoder. These equate to a rapidly evolving waveform (REW) signal in a prototype waveform interpolation (PWI) coder.
The above encoder architecture provides a number of significant improvements over the known prototype waveform interpolation (PWI) architectures. PWI architectures linearly interpolate between two successive frames.
However, the inventors of the present invention have recognized that performing such linear interpolation leads to sub-optimal speech synthesis. In particular, better control over the evolution between successive frames, as proposed in the present invention, enhances the quality of the synthesized and ultimately decoded speech.
Thus, for each block of speech in the preferred Kalman speech encoder, the following parameters are derived, quantized and sent to the decoder: (i) Line spectral frequencies (LSFs) 212: this defines the short-term spectral properties of the speech signal.
One set of LSFs per frame are calculated and sent to the decoder. These LSFs are then used to interpolate between the speech frames to derive multiple versions of the synthesis (1/A(z)) filter and/or analysis (A(z)) filter with a smooth evolution. In this manner, abrupt changes in the signal spectrum are removed, as the parameters of an all-pole (synthesis) filter are characterized.
(ii) Pitch 223: The pitch estimation function 218 identifies the pitch period of instantaneous sections of speech down to the resolution of a fraction of a sample.
However, for synthesis of a perceptually identical signal at the decoder, a lower pitch resolution is required.
The pitch is therefore quantized to 7-bits once per speech frame.
(iii) NP power 250: The signal power of the non-periodic signal. This parameter is sent nine times per 20 msec.
(iv) NP spectrum 254: The spectral distribution of the non-periodic signal. This parameter is sent nine times per 20 msec.
(v) Gain factors 240, 242: For each cycle of the frame, the optimum gains of previous and current Kalman states are sent to the decoder. Between two and ten gain pairs are transmitted every 20 msec. depending upon the pitch period and the number of pitch periods within 20 msec.
(vi) QP state. The quasi-periodic estimate of the last pitch period in the frame.
A more detailed description of the functionality of a typical speech encoding unit can be found in "Digital speech coding for low-bit rate communications systems" by A. M. Kondoz, published by John Wiley in 1994.
The decoder functionality is substantially the reverse of that of the encoder (without the additional circuitry following a de-multiplexer), and is therefore not described in great detail. A description of the functionality of a typical speech decoding unit can also be found in "Digital speech coding for low-bit rate communications systems" by A. M. Kondoz, published by John Wiley in 1994.
Referring now to FIG. 3, a preferred embodiment of a Kalman wideband speech decoder 134 is illustrated. The Kalman decoder 134 effectively uses the aforementioned parameters to synthesize the transmitted speech signal.
In summary, the synthesiser (speech decoder) de-quantizes the parameters and a new Kalman state is time aligned according to the phase model. The time alignment is synchronized to the Kalman state of the previous frame.
A pitch phase is derived based upon the pitch period dynamics indicated in the pitch parameter 223 sent from the speech encoder 132. Notably, two versions of the pitch phase are used in the synthesis, namely a current frame pitch phase and a previous frame pitch phase, which is delayed by a one-frame delay function 310.
Similarly, the QP state 232 is derived for a current frame and a previous frame, delayed by a one-frame delay function 320. A sample-by-sample set of gains (gprev, gcurr) is derived by gain computation function 318, based on the QP gain components 240, 242 received from the speech encoder. These gains are applied to the derived QP states of the current and previous frame to ensure the QP signal envelope is smooth. The two gain-adjusted QP state signals are then combined in summer 320.
The NP signal is synthesised with random phase using the received NP parameters 250, 254, as sent by the speech encoder 132.
In accordance with the preferred embodiment of the present invention, the NP parameters 250, 254 are optimised in order to reduce any "peakiness" in the residual signal caused by unfortunate random phases between spectral components. The optimization reduces or removes the effect of peakiness in this residual signal, which would ordinarily lead to disturbing audible artefacts. Such artefacts, in turn, would lead to lower perceived speech quality.
In accordance with the preferred embodiment of the present invention, the Kalman Wideband decoder employs one or more of the following techniques to modify the non-periodic (NP) parameters in the parameter adjustment function 324 prior to synthesis.
A first approach is to adaptively perform high-pass filtering on the NP parameters. The high-pass filter (HPF) characteristic applied to the NP parameters is preferably based upon a determined degree of voicing. A voicing decision is preferably based on a power ratio of the quasi-periodic (QP) and NP components. Furthermore, in this approach, the parameter adjustment function includes a computation function to calculate a normalized correlation value between the respective extracted cycles, since this is another useful indicator of voicing.
In this manner, the inventors of the present invention propose to utilise certain aspects of voiced speech. In particular, the inventors have recognized that a quasi-periodic (QP) component of voiced speech may occasionally leak into the non-periodic (NP) component. As the NP component is coarsely represented, speech quality is reduced by these leakages. This effect can be minimised by high-pass filtering the NP component in voiced sections, since the absence of badly synthesized spectral components is less disturbing than synthesis with incoherent or badly aligned phase.
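The voicing-adaptive high-pass filtering of the first approach can be sketched as follows. This is an illustrative sketch only: the QP/NP power-ratio voicing decision comes from the text, but the threshold, the cutoff frequency, the hard zeroing of low bins and all names are assumptions introduced here.

```python
def voicing_ratio(qp, np_comp):
    """Power ratio of the quasi-periodic to the non-periodic component,
    used here as the voicing decision described in the text."""
    p_qp = sum(x * x for x in qp) / len(qp)
    p_np = sum(x * x for x in np_comp) / len(np_comp)
    return p_qp / p_np if p_np > 0 else float('inf')

def highpass_np_amplitudes(amps, bin_hz, voicing, threshold=2.0, cutoff_hz=2000.0):
    """Attenuate low-frequency NP spectral amplitudes when the frame is voiced.

    threshold and cutoff_hz are illustrative values, not taken from the patent;
    low bins are simply zeroed here, whereas a real HPF would roll off smoothly.
    """
    if voicing < threshold:
        return list(amps)  # unvoiced frame: leave the NP component alone
    return [0.0 if k * bin_hz < cutoff_hz else a for k, a in enumerate(amps)]
```

Zeroing the low NP bins in voiced frames mirrors the rationale above: absent spectral components disturb less than badly phased ones.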
In a second (alternative or additional) approach to modifying the non-periodic (NP) parameters in the parameter adjustment function 324, prior to synthesis, a high frequency boost is applied. The high frequency boost is implemented to account for any anticipated loss in the aforementioned adaptive post filtering approach.
The inventors of the present invention have observed that synthesized speech sounds muffled when the adaptive high-pass filtering of the first approach is applied to the NP component in voiced sections. Boosting the high frequencies of the signal in this manner significantly reduces the muffling effect.
In the preferred embodiment of the present invention, the high frequency boost is achieved by the parameter adjustment function 324 increasing the power of the frequencies in a linear fashion, starting at 0dB at 4.6kHz and increasing to 5dB at 7kHz.
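The linear boost characteristic can be sketched numerically. The 4.6kHz, 7kHz and 5dB figures come from the text above; the behaviour outside that band (0dB below 4.6kHz, held at 5dB above 7kHz) and the function names are assumptions made for this sketch.

```python
def hf_boost_db(freq_hz, start_hz=4600.0, end_hz=7000.0, max_db=5.0):
    """Linear boost in dB: 0 dB at start_hz rising to max_db at end_hz.

    Behaviour above end_hz (clamped at max_db) is an assumption.
    """
    if freq_hz <= start_hz:
        return 0.0
    if freq_hz >= end_hz:
        return max_db
    return max_db * (freq_hz - start_hz) / (end_hz - start_hz)

def boost_factor(freq_hz):
    """Convert the dB boost to a linear amplitude multiplier."""
    return 10.0 ** (hf_boost_db(freq_hz) / 20.0)
```

For example, the midpoint of the ramp, 5.8kHz, would receive a 2.5dB boost under this characteristic.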
It is noteworthy that there is no known prior art in adjusting an amplitude of a non-periodic component of a waveform interpolated speech signal.
In a third (alternative or additional) approach to modifying the non-periodic (NP) parameters in the parameter adjustment function 324, prior to synthesis, phase adjustment is performed on the NP signals. The phase adjustment reduces 'peakiness' following synthesis.
In this approach, the inventors observed that fricatives in the speech typically sound poor due to random phase generation. In some cases, transformation of a frequency-domain signal to a time-domain signal can cause undesired large spectral peaks.
In summary, a phase adjustment function, operably coupled to the receiver, adjusts a phase of the second non- periodic waveform. The phase adjustment function generates a random phase vector within the second non- periodic component of the speech signal and iteratively phase shifts one or more portions of the second non- periodic component, for example to reduce a peakiness of the fricatives.
The inventors of the present invention propose to check for peakiness of the speech signal in the time domain.
Following a transform of the speech signal from a time-domain signal to a frequency-domain signal, the parameter adjustment function 324 determines the largest spectral peaks in the speech signal. For these peaks, the parameter adjustment function 324 iteratively performs phase shifting in the frequency domain. In this manner, the time-domain peaks, i.e. those peaks that degrade the fricatives, are substantially suppressed.
Referring now to FIG. 4, a flowchart 400 illustrates a process of phase adjustment of a non-periodic component in accordance with a preferred embodiment of the present invention. The preferred mechanism to implement the phase adjustment process commences in step 405. A random phase vector is generated within the non-periodic component of the speech signal, in step 410. The non-periodic component of the speech signal is then transformed from a spectrum-domain to the time domain, as shown in step 415.
A 'peakiness' score is then calculated in step 420. The 'peakiness' score is preferably calculated as the peak divided by the average power. Once the 'peakiness' score has been calculated, the appropriate phase shift is performed, as shown in step 425. The phase shift (adjustment) preferably starts from the largest spectral slot. The corresponding phase component is shifted by pi/12. However, it is within the contemplation of the present invention that other phase shifts may be used to produce similar results. The phase component that results in the lowest peakiness score is saved. In this manner, the phase adjustment is performed to reduce the peakiness of the fricatives.
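The steps 410-425 above can be sketched as follows. This is an illustrative sketch only: the peakiness score (peak over average power) and the pi/12 shift of the largest spectral slot come from the text, while the small-signal DFT synthesis, the fixed number of trial shifts and all names are assumptions introduced here.

```python
import math

def synth(amps, phases):
    """Synthesize time samples from spectral amplitudes and phases
    (a toy inverse transform standing in for step 415)."""
    n = 2 * len(amps)
    out = []
    for t in range(n):
        s = 0.0
        for k, (a, p) in enumerate(zip(amps, phases)):
            s += a * math.cos(2 * math.pi * k * t / n + p)
        out.append(s)
    return out

def peakiness(x):
    """Peak power divided by average power, as in step 420."""
    avg = sum(v * v for v in x) / len(x)
    return max(v * v for v in x) / avg if avg > 0 else 0.0

def reduce_peakiness(amps, phases, steps=24):
    """Iteratively shift the phase of the largest spectral slot by pi/12,
    keeping the phase vector with the lowest peakiness score (step 425)."""
    k = max(range(len(amps)), key=lambda i: amps[i])  # largest spectral slot
    best = list(phases)
    best_score = peakiness(synth(amps, best))
    trial = list(phases)
    for _ in range(steps):
        trial[k] += math.pi / 12
        score = peakiness(synth(amps, trial))
        if score < best_score:
            best_score, best = score, list(trial)
    return best, best_score
```

Because the original phase vector is one of the candidates retained, the returned score can never exceed the initial peakiness, mirroring the "save the lowest score" behaviour described above.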
It is within the contemplation of the present invention that any of the aforementioned three approaches can be used individually to improve the speech quality of the non-periodic components. However, the use of any two approaches, and preferably all three, further improves the quality of the synthesized speech.
FIG. 5 illustrates a series of speech waveforms that highlight the reduction in rough output artefacts after implementing the preferred embodiments of the present invention. The lower waveform shows the rough output artefacts that result from the NP components. These artefacts are not present in the original speech waveform, shown in the upper waveform of FIG. 5.
Notably, after employing the inventive concepts herein described, the NP artefacts have been reduced through the phase adjustment, as illustrated in the middle speech waveform of FIG. 5.
Thus, the aforementioned processes modify the phase vector of the NP signal, such that it produces smooth energy envelopes for fricatives.
The QP and NP signals are then combined in summer 322.
The combined signal is then applied to a speech synthesis (1/A(z)) filter to provide an output signal with a smooth evolution. Preferably, a Chen adaptive post-filter 328, with pitch enhancement, is then applied to the decoded speech signal; a method well known in the art.
It is within the contemplation of the invention that any speech processing circuit would benefit from the inventive concepts described herein.
Apparatus of the invention: A speech communication unit comprises a speech decoder having a receiver for receiving at least a first substantially periodic waveform and a second non-periodic waveform. The speech decoder includes an adjustment function, operably coupled to the receiver, to adjust an amplitude of the second non-periodic waveform.
Method of the invention: A method of synthesising speech in a speech communication unit comprises the step of receiving at least a first substantially periodic waveform and a second non-periodic waveform. The method also comprises the step of adjusting an amplitude of the second non-periodic waveform to smooth fricatives therein.
It will be understood that the speech codec and, in particular, an improved method of speech synthesis for wideband speech communication, as described above, tends to provide at least one or more of the following advantages: (i) A speech decoder that applies one or more of the aforementioned techniques improves the quality of the synthesized speech without an increase in delay or bit- rate; (ii) A speech decoder that applies a high-frequency boost of a non- periodic component improves the quality of the synthesized speech by accounting for an anticipated signal loss following adaptive post- filtering, as well as minimising muffling of the synthesised speech; (iii) A speech decoder that applies phase adjustment to a non-periodic component improves the quality of the synthesized speech by reducing peakiness of the fricatives following synthesis; and (iv) A speech decoder that applies a high-pass filter to a non-periodic component improves the quality of the synthesised speech by reducing leakages of quasi-periodic components in non-periodic components.
It will, of course, be appreciated that the above description has been given by way of example only and that modifications may be made within the scope of the
present invention. For example, whilst the preferred embodiment discusses the application of the present invention to a Kalman coder, it is envisaged by the inventors that any other speech waveform-based processing unit can benefit from the inventive concepts contained herein.
However, the inventive concepts described herein find particular use in speech processing units for both fixed network and wireless communication units, such as universal mobile telecommunication system (UMTS) units, global system for mobile communications (GSM) units, Terrestrial Trunked Radio (TETRA) communication units, Digital Interchange of Information and Signalling standard (DIIS) units, Voice over Internet Protocol (VoIP) units, etc. Whilst specific, and preferred, implementations of the present invention are described above, it is clear that one skilled in the art could readily apply further variations and modifications of such inventive concepts.
Thus, a speech communication unit and method of synthesising speech have been described that substantially alleviate at least some of the aforementioned disadvantages with known techniques.

Claims (18)

Claims

1. A speech communication unit (100) comprising a speech decoder (134) comprising: a receiver for receiving at least a first substantially periodic waveform and a second non-periodic waveform, the speech decoder (134) characterized by: an adjustment function (324), operably coupled to the receiver, to adjust an amplitude of the second non-periodic waveform.

2. The speech communication unit (100) according to Claim 1, wherein the adjustment function (324) is a high-pass filter applied to the second non-periodic waveform.

3. The speech communication unit (100) according to Claim 2, wherein the high-pass filter is based upon a determined degree of voicing.

4. The speech communication unit (100) according to Claim 3, wherein the determined degree of voicing is based on a power ratio of a first quasi-periodic and said second non-periodic waveform.

5. The speech communication unit (100) according to any preceding Claim, wherein the adjustment function includes a computation function to calculate a correlation value between the respective extracted cycles.

6. The speech communication unit (100) according to any preceding Claim, wherein said speech communication unit (100) uses waveform interpolation.

7. The speech communication unit (100) according to Claim 6, wherein said speech communication unit (100) employs a Kalman coder and uses quasi-periodic signals and non-periodic signals such that said adjustment function (324) reduces a leakage of non-periodic components into a quasi-periodic signal in said Kalman coder.

8. The speech communication unit (100) according to any preceding Claim, wherein the adjustment function (324) is a high frequency boost applied to the second non-periodic waveform.

9. The speech communication unit (100) according to Claim 8 when dependent upon Claim 2, wherein the high frequency boost is applied to the second non-periodic waveform thereby reducing a loss in signal resulting from applying said high-pass filter to the waveform.

10. The speech communication unit (100) according to Claim 8 or Claim 9, wherein the high frequency boost is applied to the second non-periodic waveform by increasing a signal power of frequencies in a linear manner.

11. The speech communication unit (100) according to any preceding Claim, wherein the adjustment function (324) is further characterized by a phase adjustment function to additionally adjust (400) a phase of the second non-periodic waveform.

12. The speech communication unit (100) according to Claim 11, wherein the phase adjustment function performs the following functions: generates a random phase vector (410) within the second non-periodic component of the speech signal; transforms the synthesized non-periodic component from a spectrum-domain to a time-domain (415); calculates (420) a peakiness score; and iteratively phase shifts (425) the non-periodic component thereby reducing peakiness of speech fricatives.

13. The speech communication unit (100) according to Claim 12, wherein the peakiness score is calculated as the peak signal of the second non-periodic waveform divided by average power.

14. The speech communication unit (100) according to Claim 12 or Claim 13, wherein the phase shift starts from the largest spectral slot, with a phase component that is calculated as having a low peakiness score.

15. A method of synthesising speech in a speech communication unit (100), the method comprising the step of: receiving at least a first substantially periodic waveform and a second non-periodic waveform; the method characterized by the step of: adjusting an amplitude of the second non-periodic waveform thereby smoothing fricatives.

16. A speech communication unit adapted to perform the method steps according to Claim 15.

17. A speech communication unit substantially as hereinbefore described with reference to, and/or as illustrated by, FIG. 1 of the accompanying drawings.

18. A speech coder substantially as hereinbefore described with reference to, and/or as illustrated by, FIG. 2 or FIG. 3 of the accompanying drawings.
GB0304483A 2003-02-27 2003-02-27 Speech communication unit and method for synthesising speech therein Expired - Fee Related GB2398980B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
GB0304483A GB2398980B (en) 2003-02-27 2003-02-27 Speech communication unit and method for synthesising speech therein

Publications (3)

Publication Number Publication Date
GB0304483D0 GB0304483D0 (en) 2003-04-02
GB2398980A true GB2398980A (en) 2004-09-01
GB2398980B GB2398980B (en) 2005-09-14

Family

ID=9953766

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0304483A Expired - Fee Related GB2398980B (en) 2003-02-27 2003-02-27 Speech communication unit and method for synthesising speech therein

Country Status (1)

Country Link
GB (1) GB2398980B (en)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114387989B (en) * 2022-03-23 2022-07-01 北京汇金春华科技有限公司 Voice signal processing method, device, system and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5828811A (en) * 1991-02-20 1998-10-27 Fujitsu, Limited Speech signal coding system wherein non-periodic component feedback to periodic excitation signal source is adaptively reduced


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101847404A (en) * 2010-03-18 2010-09-29 北京天籁传音数字技术有限公司 Method and device for realizing audio pitch shifting
CN101847404B (en) * 2010-03-18 2012-08-22 北京天籁传音数字技术有限公司 Method and device for realizing audio pitch shifting

Also Published As

Publication number Publication date
GB0304483D0 (en) 2003-04-02
GB2398980B (en) 2005-09-14

Similar Documents

Publication Publication Date Title
AU763471B2 (en) A method and device for adaptive bandwidth pitch search in coding wideband signals
JP4662673B2 (en) Gain smoothing in wideband speech and audio signal decoders.
JP4550289B2 (en) CELP code conversion
JP3653826B2 (en) Speech decoding method and apparatus
US6081776A (en) Speech coding system and method including adaptive finite impulse response filter
US6654716B2 (en) Perceptually improved enhancement of encoded acoustic signals
JP2004537739A (en) Method and system for estimating pseudo high band signal in speech codec
WO2005106850A1 (en) Hierarchy encoding apparatus and hierarchy encoding method
WO1997031367A1 (en) Multi-stage speech coder with transform coding of prediction residual signals with quantization by auditory models
GB2398980A (en) Adjustment of non-periodic component in speech coding
GB2398981A (en) Speech communication unit and method for synthesising speech therein
GB2398983A (en) Speech communication unit and method for synthesising speech therein
GB2398982A (en) Speech communication unit and method for synthesising speech therein

Legal Events

Date Code Title Description
PCNP Patent ceased through non-payment of renewal fee

Effective date: 20080227