EP0537948B1 - Method and apparatus for smoothing pitch-cycle waveforms - Google Patents

Method and apparatus for smoothing pitch-cycle waveforms

Info

Publication number
EP0537948B1
EP0537948B1 EP19920309167 EP92309167A EP0537948B1 EP 0537948 B1 EP0537948 B1 EP 0537948B1 EP 19920309167 EP19920309167 EP 19920309167 EP 92309167 A EP92309167 A EP 92309167A EP 0537948 B1 EP0537948 B1 EP 0537948B1
Authority
EP
European Patent Office
Prior art keywords
speech signal
speech
trace
pitch
decoder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
EP19920309167
Other languages
German (de)
English (en)
French (fr)
Other versions
EP0537948A2 (en)
EP0537948A3 (en)
Inventor
Willem Bastiaan Kleijn
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AT&T Corp
Original Assignee
AT&T Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by AT&T Corp
Publication of EP0537948A2
Publication of EP0537948A3
Application granted
Publication of EP0537948B1
Anticipated expiration
Expired - Lifetime

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08: Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters

Definitions

  • The present invention relates generally to speech communication systems and, more specifically, to signal processing associated with the reconstruction of speech from code words.
  • Speech coding can provide data compression useful for communication over a channel of limited bandwidth.
  • Speech coding systems include a coding process which converts speech signals into code words for transmission over the channel, and a decoding process which reconstructs speech from received code words.
  • A goal of most speech coding techniques is to provide faithful reproduction of original speech sounds such as, e.g., voiced speech, produced when the vocal cords are tensed and vibrating quasi-periodically.
  • In the time domain, a voiced speech signal appears as a succession of similar but slowly evolving waveforms referred to as pitch-cycles. A single one of these pitch-cycles has a duration referred to as the pitch-period.
  • LTPs: long-term predictors
  • CELP: code-excited linear predictive coding
  • A frame (or subframe) of coded pitch-cycles is reconstructed by a decoder in part through the use of past pitch-cycle data by the decoder's LTP.
  • A typical LTP may be interpreted as an all-pole filter providing delayed feedback of past pitch-cycle data, or as an adaptive codebook of overlapping vectors of past pitch-cycle data.
  • Past pitch-cycle data works as an approximation of present pitch-cycles to be decoded.
  • a fixed codebook (e.g., a stochastic codebook)
  • Analysis-by-synthesis coding systems like CELP, while providing low bit-rate coding, may not communicate enough information to completely describe the evolution of the pitch-cycle waveform shapes in original speech. If the evolution (or dynamics) of a succession of pitch-cycle waveforms in original speech is not preserved in reconstructed speech, audible distortion may result.
  • The present invention provides a method and apparatus for improving the dynamics of reconstructed speech produced by speech coding systems.
  • Exemplary coding systems include analysis-by-synthesis systems employing LTPs, such as most CELP systems. Improvement is obtained through the identification and smoothing of one or more traces in reconstructed voiced speech signals.
  • A trace refers to an envelope formed by like-features present in a sequence of pitch-cycles of a voiced speech signal.
  • Identified traces are smoothed by any of the known smoothing techniques, such as linear interpolation or low-pass filtering. Smoothed traces are assembled by the present invention into a smoothed reconstructed signal.
  • The identification, smoothing, and assembly of traces may be performed in the reconstructed speech domain, or in any of the excitation domains present in analysis-by-synthesis coding systems.
  • Figure 1 presents a time-domain representation of a voiced speech signal.
  • Figure 2 presents an illustrative embodiment of the present invention.
  • Figure 3 presents illustrative traces for the time-domain representation of the voiced speech signal presented in Figure 1.
  • Figure 4 presents illustrative frames of a speech signal used in trace smoothing.
  • Figure 5 presents an illustrative embodiment of the present invention which combines smoothed and conventional reconstructed speech signals according to a proportionality measure of voiced and non-voiced speech.
  • Figure 1 presents an illustrative stylized time-domain representation of a voiced speech signal (20ms). As shown in the Figure, it is possible to describe voiced speech as a sequence of individual similar waveforms referred to as pitch-cycles. Generally, each pitch-cycle is slightly different from its neighbors in both amplitude and duration. The brackets in the Figure indicate a possible set of boundaries between successive pitch-cycles. Each pitch-cycle in this illustration is approximately 5ms in duration.
  • ICASSP 91, vol. 1, 14 May 1991, Toronto, pages 233-236, M. Copperi, 'Efficient excitation modelling in a low bit-rate CELP coder', discloses an approach to excitation analysis which relies on a pitch tracker incorporated into a closed-loop analysis.
  • The purpose is to derive a pitch contour in order to set a smaller search depth in an adaptive codebook.
  • The pitch tracker is composed of an autocorrelation algorithm and a linear predictor. A suitable interpolation function of adjacent peaks in the autocorrelation function is used to get a smoothed pitch contour. The chosen lag is then fed to a predictor that provides the reference lag, for the current subframe, to the closed pitch loop.
  • A pitch-cycle may be characterized by a series of features which it may share in common with one or more of its neighbors. For example, as shown in Figure 1, pitch-cycles A, B, C, and D have characteristic signal peaks 1-4 in common. While the exact amplitude and location of peaks 1-4 may change with each pitch-cycle, such changes are generally gradual. As such, voiced speech is commonly thought of as periodic or nearly so (i.e., quasi-periodic).
  • A CELP coder may transmit 20 ms frames of coded speech (160 samples at 8 kHz) by coding and assembling four 5 ms subframes, each with its own characteristic LTP delay.
  • The illustrative pitch-cycles shown in Figure 1 correspond to 5 ms subframes. It will be apparent to one of ordinary skill in the art that the present invention is also applicable to situations where pitch-cycles and subframes do not coincide.
  • A trace identifier 100 receives a conventional reconstructed speech signal, V_c(i), and a time-distance function, d(i), from a conventional decoder, such as a CELP decoder.
  • The conventional reconstructed speech signal may take the form of speech itself, or any of the speech-like excitation signals present in a conventional decoder. It is preferred that V_c(i) be the excitation signal produced by the LTP of the decoder.
  • Data from N traces, V_{T_n}(j_k), 1 ≤ n ≤ N, are identified and passed to a plurality of trace smoothing processes 200.
  • These smoothing processes 200 operate to provide smoothed trace data, V_{ST_n}(j_k), 1 ≤ n ≤ N, to a trace combiner 300.
  • Trace combiner 300 forms a smoothed speech signal, V_s(i), from the smoothed trace data.
  • The trace identifier 100 of the illustrative embodiment defines or identifies traces in speech. Each identified trace associates a series of like-features present in a sequence of pitch-cycle waveforms of a reconstructed speech signal.
  • A trace is an envelope formed by the amplitude of samples of the reconstructed speech signal provided by a speech decoder, V_c, at times given by values of an index, j_k.
  • Figure 3 presents illustrative traces for certain sample points in a segment of the voiced speech (a frame) presented in Figure 1.
  • Illustrative values for the time-distance function, d(i), may be obtained from a conventional LTP-based decoder providing frames or subframes of the reconstructed speech signal.
  • d(i) is the delay used by the LTP of the CELP decoder.
  • A typical CELP decoder provides a delay for each subframe of coded speech. In such a case, d(i) is constant for all sample points in the subframe.
  • A trace need not be identified in non-voiced speech (that is, speech which comprises, for example, silence or unvoiced speech).
  • A trace may extend backward and forward in time from a given point in time. There may be as many traces in a given pitch-cycle as there are data samples (e.g., at an 8 kHz sampling rate, 40 traces in a 5 ms pitch-cycle). When pitch-cycles expand over time, certain traces may split into multiple traces. When pitch-cycles contract over time, certain traces may end. Furthermore, because values of d(i) may exceed a single pitch-period, a trace may associate like-features in waveforms which are more than one pitch-cycle apart.
  • Traces identified in a reconstructed speech signal are smoothed by smoothing processes 200 as a way of modifying the dynamics of reconstructed pitch-cycle waveforms. Any of the known data smoothing techniques, such as linear interpolation, polynomial curve fitting, or low-pass filtering, may be used.
  • A smoothing technique is applied to each trace over a time interval, such as a 20 ms frame provided by a CELP decoder.
  • Figure 4 presents illustrative frames of a reconstructed speech signal used in the smoothing of a single trace, T_n, by the embodiment of Figure 2.
  • An exemplary smoothing process 200 maintains past trace values (from a past frame of the signal) which are used in establishing an initial data value for a smoothing operation on a current frame of the speech signal.
  • Delay d(j_4) is used by the smoothing process 200 to identify the first (i.e.
  • This trace value is obtained from the past frame trace values: V_{T_n}(j_5).
  • Smoothed trace data, V_{ST_n}(j_k), are combined on a rolling frame-by-frame basis to form a smoothed reconstructed speech signal, V_s(i), by trace combiner 300.
  • Trace combiner 300 produces smoothed reconstructed speech signal, V s (i), by interlacing samples from individual smoothed traces in temporal order. That is, for example, the smoothed trace having the earliest sample point in the current frame becomes the first sample of the frame of smoothed reconstructed speech signal; the smoothed trace having the next earliest sample in the frame supplies the second sample, and so on.
  • A given smoothed trace will contribute one sample per pitch-cycle of a smoothed reconstructed speech signal.
  • The smoothed reconstructed speech signal, V_s(i), may be provided as output to be used in the manner intended for the unsmoothed version of the speech signal.
  • The parameter α, a measure of periodicity, is used to control the proportion of smoothed and conventional speech in V(i).
  • Because V_s is significant as a manipulation of a voiced speech signal, α operates to provide for V(i) a larger proportion of V_s(i) when speech is voiced, and a larger proportion of V_c(i) when speech is non-voiced.
  • A determination of the presence of voiced speech, and hence a value for α, may be made from the statistical correlation of adjacent frames of V_c(i).
  • An estimate of this correlation may be provided for a CELP decoder by an autocorrelation expression in which d(i) is the delay from the LTP of the CELP decoder and L is the number of samples in the autocorrelation expression, typically 160 samples at an 8 kHz sampling rate (i.e., the number of samples in a frame of the speech signal) (see Fig. 5, 400).
  • The greater the autocorrelation, the more periodic the speech, and the greater the value of α (see Fig. 5, 500). Given the expression for V(i), large values for α provide large contributions to V(i) by V_s, and vice versa.
  • A further illustrative embodiment of the present invention concerns smoothing a subset of traces available from a reconstructed speech signal.
  • One such subset can be defined as those traces associated with sample data of large pulses within a pitch-cycle. Of course, these large pulses form a subset of pulses within the pitch-cycle.
  • This illustrative embodiment may involve smoothing only those traces associated with samples of the speech signal associated with pulses 1-3 of each pitch-cycle.
  • Identification of a subset of pulses to include in the smoothing process can be made by establishing a threshold below which pulses, and thus their traces, will not be included. This threshold may be established by an absolute level or a relative level as a percentage of the largest pulses.
  • The threshold may be selected from experience based upon several test levels.
  • Assembly of smoothed traces into a smoothed reconstructed speech signal may be supplemented by the original reconstructed speech signal for which no smoothing has occurred.
  • Such original reconstructed speech signal samples are those samples which fall below the above-mentioned threshold. As a result, such samples do not form part of a trace which is smoothed.
  • The original reconstructed speech signal may be in the speech domain itself, or it may be in one of the excitation domains available in analysis-by-synthesis decoders. If the speech domain is used, the illustrative embodiments of the present invention may follow a conventional analysis-by-synthesis decoder. However, should the speech signal be in an excitation domain, as is the case with the preferred embodiment, the embodiment would be located within such a decoder. As such, the embodiment would receive the excitation-domain speech signal, process it, and provide a smoothed version of it to that portion of the decoder which expects to receive the excitation speech signal. In this case, that portion of the decoder would receive the smoothed version provided by the embodiment instead.
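
The identify / smooth / interlace steps described above (trace identifier 100, smoothing processes 200, trace combiner 300) can be illustrated with a minimal Python sketch. It is not the patented implementation. The function names are hypothetical, the linking rule (each sample is joined to the sample one LTP delay earlier) is an assumed reading of the time-distance function d(i), and the smoothing is a plain linear interpolation between the first and last value of each trace inside one frame, without the past-frame initial value that the text mentions.

```python
from typing import Dict, List


def identify_traces(delays: List[int], start: int, end: int) -> List[List[int]]:
    """Group sample indices in [start, end) into traces (hypothetical helper).

    Assumption: sample i continues the trace of sample i - delays[i] when that
    sample lies in the frame and is still the newest point of its trace;
    otherwise i starts a new trace (this also covers a trace that splits).
    """
    trace_of: Dict[int, int] = {}      # sample index -> trace number
    traces: List[List[int]] = []
    for i in range(start, end):
        prev = i - delays[i]
        if prev in trace_of and traces[trace_of[prev]][-1] == prev:
            t = trace_of[prev]
        else:
            t = len(traces)
            traces.append([])
        traces[t].append(i)
        trace_of[i] = t
    return traces


def smooth_trace_linear(values: List[float]) -> List[float]:
    """Replace interior trace values by a straight line between the endpoints."""
    n = len(values)
    if n < 3:
        return list(values)
    first, last = values[0], values[-1]
    return [first + (last - first) * k / (n - 1) for k in range(n)]


def smooth_frame(signal: List[float], delays: List[int],
                 start: int, end: int) -> List[float]:
    """Smooth every trace, then write samples back at their own time index,
    which interlaces the smoothed traces in temporal order."""
    out = list(signal)
    for trace in identify_traces(delays, start, end):
        smoothed = smooth_trace_linear([signal[j] for j in trace])
        for j, v in zip(trace, smoothed):
            out[j] = v
    return out


# Three pitch-cycles of 4 samples each, constant delay of 4 samples.
signal = [1.0, 0, 0, 0, 5.0, 0, 0, 0, 3.0, 0, 0, 0]
delays = [4] * len(signal)
print(identify_traces(delays, 0, 12)[0])      # indices of the first trace
print(smooth_frame(signal, delays, 0, 12))
```

With a constant delay, each trace collects one sample per pitch-cycle, so writing the smoothed values back at their original indices reproduces the one-sample-per-cycle interlacing the text describes.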
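
The periodicity-controlled combination of smoothed and conventional speech can be sketched in the same hedged way. The excerpt above does not reproduce the autocorrelation expression or the expression for V(i), so both forms below are assumptions: a normalized autocorrelation at the LTP delay, clamped to [0, 1], and a convex combination of the two signals. The names `periodicity` and `mix` are hypothetical.

```python
import math
from typing import List


def periodicity(v_c: List[float], delays: List[int],
                start: int, length: int) -> float:
    """Assumed periodicity measure: normalized correlation between each sample
    and the sample one LTP delay earlier, over `length` samples from `start`,
    clamped to the range [0, 1]."""
    num = energy_now = energy_past = 0.0
    for i in range(start, start + length):
        j = i - delays[i]
        if j < 0:
            continue                      # no history available for this sample
        num += v_c[i] * v_c[j]
        energy_now += v_c[i] ** 2
        energy_past += v_c[j] ** 2
    if energy_now == 0.0 or energy_past == 0.0:
        return 0.0                        # silence: treat as non-periodic
    r = num / math.sqrt(energy_now * energy_past)
    return min(1.0, max(0.0, r))


def mix(v_s: List[float], v_c: List[float], weight: float) -> List[float]:
    """Assumed combination: weight * smoothed + (1 - weight) * conventional."""
    return [weight * s + (1.0 - weight) * c for s, c in zip(v_s, v_c)]


# A perfectly periodic signal (period 4) correlated at its own period.
v_c = [1.0, 0.0, -1.0, 0.0] * 4
weight = periodicity(v_c, [4] * len(v_c), start=4, length=8)
print(weight)
print(mix([2.0, 2.0], [0.0, 0.0], 0.5))
```

Clamping negative correlations to zero means that strongly anti-correlated frames fall back entirely on the conventional signal; that is a design choice of this sketch, not something stated in the text.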
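
The threshold-based subset embodiment can also be sketched briefly. The text leaves the exact selection rule open, so the rule below is an assumption: a trace is kept for smoothing when its largest sample magnitude reaches a given fraction of the largest magnitude found in any trace, and all other samples would pass through unsmoothed. `select_traces` and `relative_threshold` are hypothetical names.

```python
from typing import List


def select_traces(signal: List[float], traces: List[List[int]],
                  relative_threshold: float = 0.5) -> List[List[int]]:
    """Return the traces whose peak magnitude is at least
    relative_threshold times the largest magnitude over all traces."""
    peak = max((abs(signal[j]) for t in traces for j in t), default=0.0)
    cutoff = relative_threshold * peak
    return [t for t in traces
            if t and max(abs(signal[j]) for j in t) >= cutoff]


# Two pitch-cycles of 4 samples; only the first sample of each cycle is a large pulse.
signal = [1.0, 0.2, 0.0, 0.0, 5.0, 0.1, 0.0, 0.0]
traces = [[0, 4], [1, 5], [2, 6], [3, 7]]
print(select_traces(signal, traces, relative_threshold=0.5))
```

An absolute threshold, which the text also allows, would replace `cutoff` with a fixed level chosen from listening tests.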

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
EP19920309167 1991-10-18 1992-10-08 Method and apparatus for smoothing pitch-cycle waveforms Expired - Lifetime EP0537948B1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US77856091A 1991-10-18 1991-10-18
US778560 1991-10-18

Publications (3)

Publication Number Publication Date
EP0537948A2 EP0537948A2 (en) 1993-04-21
EP0537948A3 EP0537948A3 (en) 1993-06-23
EP0537948B1 EP0537948B1 (en) 1997-09-03

Family

ID=25113764

Family Applications (1)

Application Number Title Priority Date Filing Date
EP19920309167 Expired - Lifetime EP0537948B1 (en) 1991-10-18 1992-10-08 Method and apparatus for smoothing pitch-cycle waveforms

Country Status (4)

Country Link
EP (1) EP0537948B1 (ja)
JP (1) JP3798433B2 (ja)
DE (1) DE69221985T2 (ja)
ES (1) ES2104842T3 (ja)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999030315A1 (fr) * 1997-12-08 1999-06-17 Mitsubishi Denki Kabushiki Kaisha Method and device for processing sound signals
ATE288121T1 (de) * 1999-05-19 2005-02-15 Noisecom Aps Method and device for noise reduction in speech signals
JP4968421B2 (ja) * 2001-09-28 2012-07-04 大日本印刷株式会社 Time-series signal analysis device

Also Published As

Publication number Publication date
DE69221985D1 (de) 1997-10-09
DE69221985T2 (de) 1998-01-08
EP0537948A2 (en) 1993-04-21
EP0537948A3 (en) 1993-06-23
JPH05224698A (ja) 1993-09-03
JP3798433B2 (ja) 2006-07-19
ES2104842T3 (es) 1997-10-16

Similar Documents

Publication Publication Date Title
US5018200A (en) Communication system capable of improving a speech quality by classifying speech signals
EP0409239B1 (en) Speech coding/decoding method
US6345248B1 (en) Low bit-rate speech coder using adaptive open-loop subframe pitch lag estimation and vector quantization
EP0718822A2 (en) A low rate multi-mode CELP CODEC that uses backward prediction
EP0745971A2 (en) Pitch lag estimation system using linear predictive coding residual
EP0342687B1 (en) Coded speech communication system having code books for synthesizing small-amplitude components
US5267317A (en) Method and apparatus for smoothing pitch-cycle waveforms
EP1420391B1 (en) Generalized analysis-by-synthesis speech coding method, and coder implementing such method
EP0784846B1 (en) A multi-pulse analysis speech processing system and method
US5027405A (en) Communication system capable of improving a speech quality by a pair of pulse producing units
US4975955A (en) Pattern matching vocoder using LSP parameters
US6169970B1 (en) Generalized analysis-by-synthesis speech coding method and apparatus
EP0778561B1 (en) Speech coding device
EP1204092B1 (en) Speech decoder capable of decoding background noise signal with high quality
EP1103953B1 (en) Method for concealing erased speech frames
US6704703B2 (en) Recursively excited linear prediction speech coder
EP0537948B1 (en) Method and apparatus for smoothing pitch-cycle waveforms
EP0745972B1 (en) Method of and apparatus for coding speech signal
JPH0782360B2 (ja) Speech analysis-synthesis method
JP3088204B2 (ja) Code-excited linear prediction encoding device and decoding device
JP2736157B2 (ja) Encoding device
EP0713208B1 (en) Pitch lag estimation system
EP0539103A2 (en) Generalized analysis-by-synthesis speech coding method and apparatus
GB2205469A (en) Multi-pulse type coding system
KR100309873B1 (ko) Encoding method based on unvoiced sound detection in a code-excited linear prediction coder

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): DE ES FR GB IT

PUAL Search report despatched

Free format text: ORIGINAL CODE: 0009013

AK Designated contracting states

Kind code of ref document: A3

Designated state(s): DE ES FR GB IT

17P Request for examination filed

Effective date: 19931203

RAP3 Party data changed (applicant data changed or rights of an application transferred)

Owner name: AT&T CORP.

17Q First examination report despatched

Effective date: 19960514

GRAG Despatch of communication of intention to grant

Free format text: ORIGINAL CODE: EPIDOS AGRA

GRAH Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOS IGRA

GRAH Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOS IGRA

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): DE ES FR GB IT

ITF It: translation for a ep patent filed

Owner name: JACOBACCI & PERANI S.P.A.

ET Fr: translation filed
REF Corresponds to:

Ref document number: 69221985

Country of ref document: DE

Date of ref document: 19971009

REG Reference to a national code

Ref country code: ES

Ref legal event code: FG2A

Ref document number: 2104842

Country of ref document: ES

Kind code of ref document: T3

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 19980701

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed
PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 19980915

Year of fee payment: 7

REG Reference to a national code

Ref country code: FR

Ref legal event code: ST

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FR

Free format text: THE PATENT HAS BEEN ANNULLED BY A DECISION OF A NATIONAL AUTHORITY

Effective date: 19991031

REG Reference to a national code

Ref country code: GB

Ref legal event code: IF02

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20061004

Year of fee payment: 15

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: IT

Payment date: 20061031

Year of fee payment: 15

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: ES

Payment date: 20061124

Year of fee payment: 15

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20071008

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20071008

REG Reference to a national code

Ref country code: ES

Ref legal event code: FD2A

Effective date: 20071009

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FR

Free format text: THE PATENT HAS BEEN ANNULLED BY A DECISION OF A NATIONAL AUTHORITY

Effective date: 19981031

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: ES

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20071009

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IT

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20071008