CA2102080C - Time shifting for generalized analysis-by-synthesis coding - Google Patents

Time shifting for generalized analysis-by-synthesis coding

Info

Publication number
CA2102080C
Authority
CA
Canada
Prior art keywords
original signal
signal
trial
original
samples
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
CA002102080A
Other languages
French (fr)
Other versions
CA2102080A1 (en
Inventor
Willem Bastiaan Kleijn
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AT&T Corp
Original Assignee
American Telephone and Telegraph Co Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by American Telephone and Telegraph Co Inc
Publication of CA2102080A1
Application granted
Publication of CA2102080C
Anticipated expiration
Expired - Lifetime

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/12 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L2019/0001 Codebooks
    • G10L2019/0002 Codebook adaptations
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L2019/0001 Codebooks
    • G10L2019/0011 Long term prediction filters, i.e. pitch estimation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/06 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being correlation coefficients

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A generalized analysis-by-synthesis technique is disclosed.
Illustratively, a section of an original signal containing a local maximum energy is identified. A plurality of segments of the original signal containing the local maximum energy are selected based on a plurality of time shifts. These segments are termed "trial original signals." Each trial original signal is compared to a synthesized signal from an adaptive codebook and a measure of similarity (e.g., a cross-correlation) between these signals is evaluated. A trial original signal for use in coding is determined based on one or more evaluated measures of similarity. A signal reflecting a coded representation of the original signal is generated based on one or more determined trial original signals. The signal reflecting a coded representation of the original signal may be provided by an analysis-by-synthesis coder, such as a CELP coder.

Description

TIME SHIFTING FOR GENERALIZED ANALYSIS-BY-SYNTHESIS CODING

Field of the Invention
The present invention relates generally to speech coding systems and more specifically to a reduction of bandwidth requirements in analysis-by-synthesis speech coding systems.

Background of the Invention
Speech coding systems function to provide codeword representations of speech signals for communication over a channel or network to one or more system receivers. Each system receiver reconstructs speech signals from received codewords. The amount of codeword information communicated by a system in a given time period defines system bandwidth and affects the quality of speech reproduced by system receivers.
Designers of speech coding systems often seek to provide high quality speech reproduction capability using as little bandwidth as possible. However, requirements for high quality speech and low bandwidth may conflict and therefore present engineering trade-offs in a design process. This notwithstanding, speech coding techniques have been developed which provide acceptable speech quality at reduced channel bandwidths. Among these are analysis-by-synthesis speech coding techniques. With analysis-by-synthesis speech coding techniques, speech signals are coded through a waveform matching procedure. A candidate speech signal is synthesized from one or more parameters for comparison to an original speech signal to be encoded. By varying parameters, different synthesized candidate speech signals may be determined. The parameters of the closest matching candidate speech signal may then be used to represent the original speech signal.
Many analysis-by-synthesis coders, e.g., most code-excited linear prediction (CELP) coders, employ a long-term predictor (LTP) to model long-term correlations in speech signals. (The term "speech signals" means actual speech or any of the residual and excitation signals present in analysis-by-synthesis coders.) During the synthesis process, an LTP is conventionally realized either as an all-pole filter or as an adaptive codebook with gain scaling. As a general matter, long-term correlations in speech signals allow a past reconstructed speech signal to serve as an approximation of a current speech signal. LTPs work to compare several past speech signals (which have already been coded) to a current (original) speech signal. By such comparisons, the LTP determines which past signal most closely matches the original signal. A past speech signal is identified by a delay which indicates how far in the past (from current time) the signal is found. A coder employing an LTP subtracts a scaled version of the closest matching past speech signal (i.e., the best approximation) from the current speech signal to yield a signal with reduced long-term correlation. This signal is then coded, typically with a fixed stochastic codebook (FSCB). The FSCB index and LTP delay, among other parameters, are transmitted to a CELP decoder which can recover an estimate of the original speech from these parameters.
By modeling long-term correlations of speech, the quality of reconstructed speech at a decoder may be enhanced. This enhancement, however, is not achieved without a significant increase in bandwidth. For example, in order to model long-term correlations in speech, conventional CELP coders may transmit 8-bit delay information every 5 or 7.5 ms (referred to as a subframe). Such time-varying delay parameters require, e.g., between one and two additional kilobits (kb) per second of bandwidth. Because variations in LTP delay may not be predictable over time (i.e., a sequence of LTP delay values may be stochastic in nature), it may prove difficult to reduce the additional bandwidth requirement through improved coding of delay parameters.
One approach to reducing the extra bandwidth requirements of analysis-by-synthesis coders employing an LTP might be to transmit LTP delay values less often and determine intermediate LTP delay values by interpolation. However, interpolation may lead to suboptimal delay values being used by the LTP in individual subframes of the speech signal. For example, if the delay is suboptimal, then the LTP will map past speech signals into the present in a suboptimal fashion. As a result, the difference between past speech mapped into the present and the original signal will be larger than it might otherwise be. The FSCB must then work to undo the effects of this suboptimal time-shift rather than perform its normal function of refining waveform shape. As a result, significant audible distortion may result.

Summary of the Invention
The present invention provides a method and apparatus for reducing bandwidth requirements in analysis-by-synthesis coding systems. In accordance with the present invention, generalized analysis-by-synthesis coding is provided through variation of original signals. Original signal variants are referred to as trial original signals. Use of trial original signals in place of or as a supplement to the use of original signals in analysis-by-synthesis coding reduces coding error and bit rate requirements. In the context of speech coding, reduced coding error affords less frequent transmission of LTP delay information and allows for delay interpolation with little or no degradation in the quality of reconstructed speech. The invention is applicable to, among other things, networks for communicating speech information, such as, for example, wireless (e.g., cellular) and conventional telephone networks.
Regarding speech coding, trial original signals are signals which are perceptually (e.g., audibly) similar to the actual original signal. The degree of audible similarity between a trial original signal and the actual original signal may affect coded bit rate and the quality of speech reproduced by a receiver (e.g., the lower the similarity, the lower the bit rate and speech quality may be). The original signal, and hence the trial original signals, may take the form of actual speech signals or any of the residual or excitation signals present in analysis-by-synthesis coders.
In an illustrative embodiment of the present invention, trial original signals are generated as time-shifted versions of an original speech signal segment. Measures of similarity (e.g., cross-correlations) between trial original signals and contributions from an adaptive codebook are evaluated. A trial original signal which is either the same as one of the trial original signals or a variant of an original or trial original signal is determined based on one or more evaluated measures of similarity. (In the case of a variant of previously generated trial original signals, the determined trial original signal (i.e., the variant) may correspond to a time-shift which falls in between time-shifts which produced previously generated trial original signals.) A signal reflecting a coded representation of the original signal is generated based on the determined trial original signal.

Brief Description of the Drawings
Figure 1 presents a conventional CELP coder.
Figure 2 presents an illustrative embodiment of the present invention.
Figure 3 presents windows of samples used in a correlation process for determining open-loop delay.
Figure 4 presents illustrative time relationships of delay values for use with the embodiment of Figure 2.
Figure 5 presents an illustrative embodiment of an adaptive codebook processor.
Figures 6a-c present illustrative sample time relationships for operation of the adaptive codebook of the embodiment of Figure 2.
Figure 7 presents an illustrative embodiment of the time-shift processor of the embodiment of Figure 2.
Figure 8 presents an illustrative set of initial conditions for the operation of the time-shift processor of Figure 7.
Figure 9 presents a flow diagram of the operation of the time-shift processor of Figure 7.
Figure 10 presents an illustrative segment of original speech used for generating trial original speech signals by time-shifting.
Figure 11 presents an embodiment of the invention.
Figure 12 presents a finite state machine describing the operation of a delay [illegible] as it concerns time [illegible] between original and time-shifted signals.
Figure 13 presents an illustrative decoder for use with the illustrative coder embodiments presented in Figure 2 and in Figure 11.

Detailed Description
Illustrative Embodiment Hardware
For clarity of explanation, the illustrative embodiment of the present invention is presented as comprising individual functional blocks (including functional blocks labeled as "processors"). The functions these blocks represent may be provided through the use of either shared or dedicated hardware, including, but not limited to, hardware capable of executing software. For example, the functions of processors presented in Figures 5 and 7 may be provided by a single shared processor. (Use of the term "processor" should not be construed to refer exclusively to hardware capable of executing software.)
Illustrative embodiments of the present invention may comprise digital signal processor (DSP) hardware, such as the AT&T DSP16 or DSP32C, read-only memory (ROM) for storing software performing the operations discussed below, and random access memory (RAM) for storing DSP results. Very large scale integration (VLSI) hardware embodiments, as well as custom VLSI circuitry in combination with a general purpose DSP circuit, may also be provided.


Introduction to Conventional CELP
A conventional analysis-by-synthesis CELP coder is presented in Figure 1. A sampled speech signal, s(i), (where i is the sample index) is provided to a short-term linear prediction filter (STP) 20 of order N, optimized for a current segment of speech. Signal x(i) is an excitation obtained after filtering with the STP:
$$x(i) = s(i) - \sum_{n=1}^{N} a_n\, s(i-n), \tag{1}$$
where parameters $a_n$ are provided by linear prediction analyzer 10. Since N is usually about 10 samples (for an 8 kHz sampling rate), the excitation signal x(i) generally retains the long-term periodicity of the original signal, s(i). An LTP 30 is provided to remove this redundancy.
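As a minimal sketch of Eq. (1), the short-term prediction residual might be computed as below. The function name `stp_residual`, the list-based signal representation, and the choice to treat samples before the start of the buffer as zero are assumptions made for illustration only.

```python
def stp_residual(s, a):
    """Excitation x(i) = s(i) - sum_{n=1..N} a[n-1] * s(i-n), as in Eq. (1).

    s: list of speech samples; a: list of N predictor coefficients a_1..a_N.
    Samples before the start of `s` are taken to be zero (an assumption).
    """
    order = len(a)
    x = []
    for i in range(len(s)):
        # Prediction from the N most recent samples that exist in the buffer.
        pred = sum(a[n - 1] * s[i - n] for n in range(1, order + 1) if i - n >= 0)
        x.append(s[i] - pred)
    return x
```

For a first-order predictor with coefficient 0.5, a geometrically decaying input with ratio 0.5 is predicted exactly after the first sample, so the residual is an impulse.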
Values for x(i) are usually determined on a blockwise basis. Each block is referred to as a subframe. The linear prediction coefficients, $a_n$, are determined by the analyzer 10 on a frame-by-frame basis, with a frame having a fixed duration which is generally an integral multiple of subframe durations and usually 20-30 ms in length. Subframe values for $a_n$ are usually determined through interpolation.
The LTP, typically implemented with an adaptive codebook, determines a gain $\lambda(i)$ and a delay d(i) for use as follows:
$$r(i) = x(i) - \lambda(i)\, \hat{x}(i - d(i)), \tag{2}$$
where the $\hat{x}(i - d(i))$ are samples of a speech signal synthesized (or reconstructed) in earlier subframes. Thus, the LTP 30 provides the quantity $\lambda(i)\, \hat{x}(i - d(i))$. Signal r(i) is the excitation signal remaining after $\lambda(i)\, \hat{x}(i - d(i))$ is subtracted from x(i). Signal r(i) is then coded with a FSCB 40. The FSCB 40 yields an index indicating the codebook vector and an associated scaling factor, $\mu(i)$. Together these quantities provide an excitation which most closely matches r(i).
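The subtraction in Eq. (2) can be sketched as follows for the simplest case. This sketch assumes a gain and an integer delay that are constant over the subframe, and a delay at least as long as the subframe so that every referenced sample already exists; the name `ltp_residual` and its argument layout are hypothetical.

```python
def ltp_residual(x, past, gain, delay):
    """r(i) = x(i) - gain * x_hat(i - delay) for one subframe, as in Eq. (2).

    x:     excitation samples of the current subframe
    past:  previously reconstructed excitation, most recent sample last
    gain, delay: held constant over the subframe (a simplifying assumption)
    """
    length = len(x)
    if not (length <= delay <= len(past)):
        raise ValueError("delay must lie between subframe length and history length")
    start = len(past) - delay  # index in `past` of the sample `delay` before x[0]
    return [x[i] - gain * past[start + i] for i in range(length)]
```

Delays shorter than the subframe require samples of the current subframe itself, which is the case the adaptive codebook procedure described later in this document addresses.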
Data representative of each subframe of speech, namely, LTP parameters $\lambda(i)$ and d(i), and the FSCB index, are collected for the integer number of subframes equalling a frame (typically 2 to 8). Together with the coefficients $a_n$, this frame of data is communicated to a CELP decoder where it is used in the reconstruction of speech.

A CELP decoder performs the reverse of the coding process discussed above. The FSCB index is received by a FSCB of the receiver (sometimes referred to as a synthesizer) and the associated vector e(i) (an excitation signal) is retrieved from the codebook. Excitation e(i) is used to excite an inverse LTP process (wherein long-term correlations are provided) to yield a quantized equivalent of x(i), $\hat{x}(i)$. A reconstructed speech signal, y(i), is obtained by filtering $\hat{x}(i)$ with an inverse STP process (wherein short-term correlations are provided).
In general, the reconstructed excitation $\hat{x}(i)$ can be interpreted as the sum of scaled contributions from the adaptive and fixed codebooks. To select the vectors from these codebooks, a perceptually relevant error criterion may be used. This can be done by taking advantage of the spectral masking existing in the human auditory system. Thus, instead of using the difference between the original and reconstructed speech signals, this error criterion considers the difference of perceptually weighted signals.
The perceptual weighting of signals deemphasizes the formants present in speech. In this example, the formants are described by an all-pole filter in which spectral deemphasis can be obtained by moving the poles inward. This is equivalent to replacing the filter with predictor coefficients $a_1, a_2, \ldots, a_N$ by a filter with coefficients $\gamma a_1, \gamma^2 a_2, \ldots, \gamma^N a_N$, where $\gamma$ is a perceptual weighting factor (usually set to a value around 0.8).
The sampled error signal in the perceptually weighted domain, g(i), is:
$$g(i) = x(i) - \hat{x}(i) + \sum_{n=1}^{N} \gamma^n a_n\, g(i-n). \tag{3}$$
The error criterion of analysis-by-synthesis coders is formulated on a subframe-by-subframe basis. For a subframe length of L samples, a commonly used criterion is:
$$\sum_{i=\hat{i}}^{\hat{i}+L-1} g(i)^2, \tag{4}$$
where $\hat{i}$ is the first sample of the subframe. Note that this criterion weighs the excitation samples unevenly over the subframe; the sample $x(\hat{i}+L-1)$ affects only $g(\hat{i}+L-1)$, while $x(\hat{i})$ affects all samples of g(i) in the present subframe.
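The recursion of Eq. (3) and the sum of Eq. (4) might be sketched as below. The sketch assumes the weighted filter starts the subframe with zero state, so it leaves out the contribution of earlier subframes; the function names are illustrative.

```python
def weighted_error(x, x_hat, a, gamma=0.8):
    """g(i) = x(i) - x_hat(i) + sum_n gamma**n * a[n-1] * g(i-n), as in Eq. (3).

    Assumes zero filter state before the first sample of the subframe.
    """
    g = []
    for i in range(len(x)):
        feedback = sum((gamma ** n) * a[n - 1] * g[i - n]
                       for n in range(1, len(a) + 1) if i - n >= 0)
        g.append(x[i] - x_hat[i] + feedback)
    return g


def subframe_criterion(g):
    """Sum of squared weighted-error samples over the subframe, as in Eq. (4)."""
    return sum(v * v for v in g)
```

With a single excitation error at the first sample, every later value of g is nonzero, which illustrates the uneven weighting noted above: an early sample influences the whole subframe, while the last sample influences only the final value.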

The criterion of equation (4) includes the effects of differences in x(i) and $\hat{x}(i)$ prior to $\hat{i}$, i.e., prior to the beginning of the present subframe. It is convenient to define an excitation in the present subframe to represent this zero-input response of the weighted synthesis filter:
$$q(i) = \begin{cases} 0, & i < \hat{i}, \\ z(i) - \sum_{n=1}^{N} \gamma^n a_n\, q(i-n), & \hat{i} \le i < \hat{i}+N, \\ 0, & i \ge \hat{i}+N, \end{cases} \tag{5}$$
where z(i) is the zero-input response in the present subframe of the perceptually-weighted synthesis filter when excited with $x(i) - \hat{x}(i)$ prior to the present subframe.
In the time-domain, the spectral deemphasis by the factor $\gamma$ results in a quicker attenuation of the impulse response of the all-pole filter. In practice, for a sampling rate of 8 kHz, and $\gamma = 0.8$, the impulse response never has a significant part of its energy beyond 20 samples.
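One way to inspect the decay described above is to compute the impulse response of the weighted all-pole filter and measure how much of its energy falls within the first R samples. This is only a sketch: the helper names are hypothetical, and any coefficients passed in are the caller's own example values.

```python
def weighted_impulse_response(a, gamma, length):
    """First `length` samples of the impulse response of
    1 / (1 - sum_n gamma**n * a[n-1] * z**-n)."""
    h = []
    for i in range(length):
        value = 1.0 if i == 0 else 0.0  # unit impulse at time zero
        value += sum((gamma ** n) * a[n - 1] * h[i - n]
                     for n in range(1, len(a) + 1) if i - n >= 0)
        h.append(value)
    return h


def energy_fraction(h, r):
    """Fraction of the energy of `h` contained in its first `r` samples."""
    total = sum(v * v for v in h)
    return sum(v * v for v in h[:r]) / total
```

Calling `energy_fraction(weighted_impulse_response(a, 0.8, 200), 20)` for a given coefficient set indicates how much energy a 20-tap truncation would retain for that set.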
Because of its fast decay, the impulse response of the all-pole filter $1/(1 - \gamma a_1 z^{-1} - \cdots - \gamma^N a_N z^{-N})$ can be approximated by a finite-impulse-response filter. Let $h_0, h_1, \ldots, h_{R-1}$ denote the impulse response of the latter filter. This allows vector notation for the error criterion operating on the perceptually-weighted speech. Because the coders operate on a subframe-by-subframe basis, it is convenient to define vectors with the length of the subframe in samples, L. For example, for the excitation signal:
$$\mathbf{x}(\hat{i}) = [\,x(\hat{i}) \;\; x(\hat{i}+1) \;\; \cdots \;\; x(\hat{i}+L-1)\,]^T. \tag{6}$$
Further, the spectral-weighting matrix H is defined as:
$$H = \begin{bmatrix} h_0 & 0 & \cdots & 0 \\ h_1 & h_0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ h_{R-1} & h_{R-2} & \cdots & h_0 \\ 0 & h_{R-1} & \cdots & h_1 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & h_{R-1} \end{bmatrix}. \tag{7}$$
H has dimensions $(L+R-1) \times L$. Thus, the vector $H\mathbf{x}(\hat{i})$ approximates the entire response of the IIR filter $1/(1 - \gamma a_1 z^{-1} - \cdots - \gamma^N a_N z^{-N})$ to the vector $\mathbf{x}(\hat{i})$. With these definitions, a criterion operating on the perceptually weighted speech is:
$$[\mathbf{x}(\hat{i}) + \mathbf{q}(\hat{i}) - \hat{\mathbf{x}}(\hat{i})]^T H^T H\, [\mathbf{x}(\hat{i}) + \mathbf{q}(\hat{i}) - \hat{\mathbf{x}}(\hat{i})]. \tag{8}$$
With the current definition of H the error criterion of equation (8) is of the autocorrelation type (note that $H^T H$ is Toeplitz). If the matrix H is truncated to be square, $L \times L$, equation (8) equals equation (4), which is the more common covariance criterion, as used in the original CELP.

An Illustrative Embodiment for CELP Coding
Figure 2 presents an illustrative embodiment of the present invention as it may be applied to CELP coding. A speech signal in digital form, s(i), is presented for coding. Signal s(i) is provided to a conventional linear predictive analyzer 100 which produces linear predictive coefficients, $a_n$. Signal s(i) is also provided to a conventional linear prediction filter (or "short-term predictor" (STP)) 120, which operates according to a process described by Eq. (1), and to a conventional delay estimator 140.
Delay estimator 140 operates to provide an estimated delay value to the adaptive codebook processor 150. To determine a delay estimate valid at a particular sample time, delay estimator 140 performs correlation computations of a window of samples of s(i), centered about the particular sample in question, with each of a plurality of windows of the same length. The windows involved in this correlation are illustrated in Figure 3.
Figure 3 presents the demarcations for frames (F) and constituent subframes (SF) of samples of signal s(i) (actual sample values of s(i) have been omitted for clarity). Shown are three frames, $F_{n-1}$ (the past frame), $F_n$ (the current frame), and $F_{n+1}$ (the next frame). Each of these frames comprises 160 samples of signal s(i).
The location of frame boundaries is provided by time shift processor 200 discussed below. Time shift processor 200 provides a sample location, dp, defining the end of a subframe of original speech signal, s(i). Delay estimator 140 simply keeps track of the subframe boundaries of original speech to know when a frame boundary is reached (such a frame boundary is at an integral multiple of subframe boundaries). Because delay estimator 140 operates on a frame of speech prior to the operation of the time shift processor 200 on the same frame of speech, delay estimator 140 must predict the position of future frame boundaries. It does this by adding a fixed number of samples equal to a frame length (e.g., 160 samples) to the last frame boundary provided by the time shift processor 200.
Assume delay estimator 140 is to determine a value for delay, M, valid at the boundary between the current and next frames of s(i), $M(FB_{n+1})$. To do this, estimator 140 stores in its memory a window of 160 signal samples surrounding this boundary (estimator 140 must wait to receive samples of signal s(i) valid in the next frame). This window of samples is denoted as window A. Next, estimator 140 performs a correlation computation with samples of s(i) in window B1, the first of 140 other windows of s(i). Window B1 is a window of 160 samples beginning 20 samples earlier than the beginning of window A and ending 20 samples earlier than the end of window A. A correlation value associated with window B1 is stored in memory. The computation process is repeated with window B2, a 160 sample window beginning one sample earlier in time than window B1. Correlation computations are performed for each of the next 138 windows, each window distinct from the one before by one sample.
As shown in Figure 3, estimator 140 must have enough memory to store what is essentially two frames of signal samples. If D is the largest delay value allowed, then the memory should extend D samples prior to the beginning of window A. When D = 160, in order to compute an estimated delay valid at $FB_{n+1}$, estimator 140 must store samples of s(i) from the beginning of the third subframe, SF2, of frame $F_{n-1}$ to the end of the second subframe, SF1, of frame $F_{n+1}$. Delay, M, is determined by estimator 140 based on the B window of samples having the greatest correlation with the samples of window A. That is, delay is equal to the number of samples that the most correlated B window is shifted in time from window A.
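The window search described above might be sketched as follows. The text does not say whether the correlation is normalized, so a plain inner product is assumed here; the function name, argument names, and the tie-breaking rule (the smallest lag wins) are likewise assumptions of this sketch.

```python
def open_loop_delay(s, boundary, win=160, min_lag=20, num_lags=140):
    """Return the shift (in samples) of the B window best correlated with window A.

    Window A holds `win` samples centered on `boundary`. B windows start
    min_lag, min_lag + 1, ..., min_lag + num_lags - 1 samples before window A.
    """
    a_start = boundary - win // 2
    if a_start - (min_lag + num_lags - 1) < 0 or a_start + win > len(s):
        raise ValueError("not enough samples stored around the boundary")
    window_a = s[a_start:a_start + win]
    best_lag, best_corr = min_lag, float("-inf")
    for lag in range(min_lag, min_lag + num_lags):
        window_b = s[a_start - lag:a_start - lag + win]
        corr = sum(p * q for p, q in zip(window_a, window_b))
        if corr > best_corr:  # strict comparison keeps the smallest lag on ties
            best_lag, best_corr = lag, corr
    return best_lag
```

For a pulse train with one pulse every 50 samples, lags of 50, 100 and 150 all align the pulses equally well, and the sketch returns 50 because ties are resolved toward the shorter delay.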
The delay estimator 140 determines a frame boundary delay estimate, M, once per frame. Delay estimator 140 further determines a delay value, m, valid at a fixed number of samples into each subframe (e.g., 10 samples), by conventional linear interpolation of delay values valid at frame boundaries. For this purpose, the delay value required at 10 samples into the next frame is set equal to the delay value at the frame boundary.
The timing associated with the delay values provided by delay estimator 140 is illustrated in Figure 4. As shown in the Figure, delay values valid at the frame boundaries surrounding frame n are $M(FB_n)$ and $M(FB_{n+1})$. Delay values valid at a fixed number of samples past each subframe boundary (SB) within frame n are indicated as $m_n(k)$, k = 0, 1, 2, 3. These values of $m_n(k)$ are determined by interpolation as discussed above. Delay values $m_n(k)$ are provided to the adaptive codebook processor 150. As will be discussed below, the adaptive codebook processor 150 uses this delay information to provide an adaptive codebook contribution to the time shift processor 200.
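The interpolation of subframe delay values from the two frame-boundary estimates can be sketched as below. Equal-length subframes (here 160 / 4 = 40 samples) are an assumption consistent with k = 0, 1, 2, 3, and the function and parameter names are illustrative.

```python
def subframe_delays(m_start, m_end, frame_len=160, num_subframes=4, offset=10):
    """Linearly interpolate delay values valid `offset` samples into each subframe.

    m_start, m_end: delay estimates at the frame boundaries surrounding the frame.
    Returns one delay value per subframe, for k = 0 .. num_subframes - 1.
    """
    sub_len = frame_len // num_subframes
    return [m_start + (m_end - m_start) * (k * sub_len + offset) / frame_len
            for k in range(num_subframes)]
```

With boundary delays of 40 and 56 samples, the four interpolated values fall at sample positions 10, 50, 90 and 130 of the frame.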

The Adaptive Codebook
The adaptive codebook processor 150 provides an estimate of a current subframe of speech (to be coded) to the time shift processor 200 based on delay estimates, $m_n(k)$, from the delay estimator 140 and past reconstructed speech signals from the CELP process. The adaptive codebook processor 150 operates by using delay values, $m_n(k)$, to determine a delay pointer, d(i), to past reconstructed speech signals stored in the memory of processor 150. Selected past speech samples, $\hat{x}(i)$, are then provided to processor 200 as an estimate of the current subframe of speech to be coded. For each subframe of original speech to be coded, adaptive codebook processor 150 provides a corresponding subframe of speech samples plus a fixed number of extra samples which extend into the next subframe. Illustratively, this fixed number of extra samples equals 10.
Figure 5 presents an illustrative embodiment of the adaptive codebook processor 150. The embodiment comprises processor 155 and RAM 157. Processor 155 receives past reconstructed speech signals, $\hat{x}(i)$, and stores them in RAM 157 for use in computing current and next subframe speech samples. Processor 155 also receives delay values, $m_n(k)$, from delay estimator 140 which are used in the computation of such sample values. Processor 155 provides such computed sample values, $\hat{x}(i)$, to time shift processor 200 for use in the generation of trial original signals.
Each sample of speech provided to the time shift processor 200 is determined as follows. First, a delay pointer, d(i), valid for the sample in question (that is, the sample to be provided to the time shift processor 200) is determined by processor 155. This is done by interpolating between a pair of delay values, $m_n(k)$ (provided by delay estimator 140), which surround the sample in question. The interpolation technique used by processor 155 to provide the delay pointers, d(i), is conventional linear interpolation of the provided delay values, $m_n(k)$. Next, processor 155 uses the delay pointer, d(i) (valid for the sample in question), as a pointer backward in time to an earlier speech sample which is to be used in the current frame as the value of the sample in question. Such earlier samples are stored in RAM 157. In general, the delay pointer, d(i), will not point exactly to a past sample. Instead, d(i) will likely point somewhere between consecutive past samples. Under such circumstances, processor 155 interpolates past samples to determine a past sample value valid at the moment in time to which the delay pointer refers. The interpolation technique used by processor 155 to determine past sample values is conventional bandlimited interpolation, such as that described by Rabiner and Schafer, Digital Processing of Speech Signals, pp. 2~31 (1978). The interpolation filter realized by processor 155 illustratively employs 20 taps on either side of the past sample closest to the time indicated by the delay value.
Figures 6a-c illustrate the process by which the adaptive codebook processor 150 selects past samples for use in a current (and next) subframe. For clarity of presentation, Figures 6a-c assume that a computed value of d(i) points exactly to a past sample value, rather than to a point in between past sample values. Also, it will be assumed without loss of generality that the delay values are shorter than the subframe length.
As shown in Figure 6a, the samples to be provided to time shift processor 200 include samples in a current subframe, and a fixed number of samples in the next subframe. Processor 155 receives a delay value for the current subframe, $m_{curr}$, from the delay estimator 140 and has stored in its memory 157 a delay value for the previous subframe, $m_{prev}$. To determine the value of each sample, $\hat{x}(i)$, of the current subframe located prior to the point at which $m_{curr}$ is valid, processor 155 determines a delay pointer, d(i), valid at the sample time i of the sample in question. This is done by linear interpolation to the point in time when the sample is valid using delay $m_{curr}$ and the last delay value received from estimator 140, $m_{prev}$. After this delay pointer, d(i), has been determined, processor 155 computes by bandlimited interpolation of samples in its memory 157 the sample value valid at a point in time which is d(i) samples prior to the sample in question, i.e., $\hat{x}(i - d(i))$. This sample value is then inserted into a memory location reserved for the current subframe sample in question.
In the example of Figure 6, the subframe length is longer than the delay values. The process by which a given sample in the current subframe is determined is based on determining a delay pointer and looking backward in time for a sample value to use as the given sample value. Thus, segments of reconstructed speech may be essentially repeated using bandlimited interpolation within the current subframe. So, for example, in Figure 6b, a given sample, $\hat{x}(i)$, takes its value from a previously determined sample which precedes it in time by a delay d(i), i.e., $\hat{x}(i - d(i))$. This delay is determined as described above, except the delay values which are interpolated are the delays from the current subframe, $m_{curr}$, and the next subframe, $m_{next}$, since these delays surround sample $\hat{x}(i)$. Repeating signal segments with constant gain when the delay is shorter than the subframe length is what distinguishes the adaptive codebook procedure from LTP filtering procedures.
As shown in Figure 6c, the extra samples in the next subframe are determined in the same fashion as those in Figure 6b. In this case, samples from the current subframe are used to provide values for samples in the next subframe.
In practice, the above-described procedure of the adaptive codebook processor 150 may be realized by first computing all delay pointer values, d(i), for all sample times of the current and portion of the next subframe in question. Then, for each sample time, i, of the present or next subframe needing a sample value, d(i) is used as a reference to a past time, i - d(i), at which a sample is "located." In general, there will not be a sample located at time i - d(i). Therefore, bandlimited interpolation of samples surrounding time i - d(i) will be required. Once the bandlimited interpolation is performed, yielding a sample value at i - d(i), that sample value is assigned to time i. This process may be repeated in a recursive process for each sample in the present or next subframe, as needed.
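The two-step procedure just described (compute all delay pointers, then fill each sample recursively) might be sketched as follows. Several details are assumptions of this sketch rather than statements from the text: the interpolator is a Hann-windowed sinc with 20 taps on either side, samples that do not exist yet are treated as zero, delay knots are given as (position, delay) pairs relative to the start of the current subframe, and all names are hypothetical.

```python
import math


def delay_pointers(knots, n_out):
    """Piecewise-linear d(i) for i = 0 .. n_out - 1 from sorted (position, delay) knots."""
    d = []
    for i in range(n_out):
        for (p0, m0), (p1, m1) in zip(knots, knots[1:]):
            if p0 <= i <= p1:
                d.append(m0 + (m1 - m0) * (i - p0) / (p1 - p0))
                break
        else:
            raise ValueError("sample time outside the range of delay knots")
    return d


def sinc_interp(buf, t, taps=20):
    """Value of `buf` at fractional index t by Hann-windowed sinc interpolation."""
    base = math.floor(t)
    if t == base:  # pointer lands exactly on a stored sample
        if not 0 <= base < len(buf):
            raise IndexError("delay pointer outside stored samples")
        return buf[base]
    acc = 0.0
    for k in range(base - taps + 1, base + taps + 1):
        if 0 <= k < len(buf):  # missing samples contribute zero
            u = t - k
            window = 0.5 * (1.0 + math.cos(math.pi * u / taps))
            acc += buf[k] * window * math.sin(math.pi * u) / (math.pi * u)
    return acc


def adaptive_codebook(past, d, taps=20):
    """Fill len(d) new samples; sample i copies the value found d[i] samples earlier."""
    buf = list(past)
    start = len(past)
    for i, delay in enumerate(d):
        # The buffer grows as we go, so delays shorter than the subframe
        # reuse samples produced earlier in this same call.
        buf.append(sinc_interp(buf, start + i - delay, taps))
    return buf[start:]
```

With a constant delay of 25 samples and 50 output samples, the second half of the output repeats the first half, which is the segment repetition that arises when the delay is shorter than the subframe.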
Once the adaptdve co~ plv~v550~ 150 has dvt~ . ~ samples for s use in the current ~ r ~ ~ and a fixed pordon of the next ~ r I , those samples are plo~idvd to the time shift plUCvi;.~ 200 for use as a basis for ~ - g a shifted original signal for use in a OELP coding process. The samples lnu.i~vd to 35 the time shift pl~vSSO~ are referred to as the adap~ive codebook con~ribution to the analysis-by-s~ I.esis process of CELP coding.

It should be understood that an all-pole filter may be used in place of the adaptive codebook realization of an LTP. However, the adaptive codebook realization is particularly well suited to situations where, as illustrated here, delay values are generally less than the length of a subframe. This is because an adaptive codebook realization does not require a determined value of LTP gain (here, codebook gain) simply to provide an LTP contribution in the current subframe. This gain may be determined later. Unlike the case of the adaptive codebook, an all-pole filter realization of an LTP requires the solution of a nonlinear equation to obtain a value for the filter gain when delay is less than subframe length.

The Time-Shift Processor

The time shift processor 200 determines how to shift segments of an original speech signal such that it may be coded (by an analysis-by-synthesis coding process, such as CELP) with less error than if the original signal was always used for coding. To time-shift an original speech signal, the time shift processor 200 first identifies within the original speech signal a local maximum of original speech signal energy. In the illustrative embodiment described below, processor 200 selects a plurality of overlapping segments of the original speech signal, each of which includes the identified local maximum signal energy. Processor 200 compares each selected segment with a segment of the adaptive codebook contribution (provided by the adaptive codebook processor 150). This comparison is made to determine the original speech signal segment which most closely matches the segment of the adaptive codebook contribution. When the segment of the original speech signal which best matches the segment of the adaptive codebook contribution is determined, this segment of original speech is used in the formation of a shifted original speech signal for coding by a CELP process.
As shown in Figure 2, the time shift processor 200 receives an original residual speech signal, x(i), from the STP 120, and provides a time-shifted residual for use in the CELP coding process. As shown in Figure 7, time shift processor 200 illustratively comprises processor 210; conventional buffer memories 220, 230, and 240; conventional ROM 250 for the storage of processor 210 programs; and conventional RAM 260 for the storage of processor 210 results.

The operation of time shift processor 200 will be explained with reference to Figure 8, which presents an illustrative starting point for processor 200 operation on speech signals, and Figure 9, which presents an illustrative flow diagram for the operation of processor 210.

As shown in Figure 8, processor 200 begins operation having received a buffer 220 of reconstructed speech representing the adaptive codebook contribution from the adaptive codebook processor 150. As discussed above, this adaptive codebook contribution comprises samples of past reconstructed speech which have been mapped into the current subframe and a fixed portion of the next subframe (see Figure 6 and associated discussion) by processor 150. This buffer of reconstructed speech is loaded into RAM 260 for use by processor 210. A pointer, dpl, is maintained by processor 210 and stored in RAM 260 to indicate the end of the latest subframe for which both the adaptive codebook and FSCB contributions have been determined. The length of such subframes, subframe_l, is constant and stored in memory, e.g., ROM 250. Based on prior operation of the processor 210, a time-shifted residual has been created up to a point in time identified by a pointer dpm (pointer dpm is always greater than or equal to dpl). Moreover, a portion of the original residual signal, x(i), including that associated with the current subframe, has been received by buffer 230 and stored in RAM 260. Processor 210 maintains (in RAM 260) a value, acc_shift, representing the sample displacement (or accumulated shift) between the last sample in the shifted residual signal and a corresponding sample in the original residual speech signal. (At initialization, the above-described status is modified to include dpm = dpl and acc_shift = 0.)
Given this set of conditions, the time shift processor 200 operates to determine a shifted residual signal for the current subframe (and possibly a portion of the next subframe) which best matches the adaptive codebook contribution.

Figure 9 presents a flow diagram illustrating the operation of the processor 210 of Figure 7. According to Figure 9, the first task performed by processor 210 is to determine whether the time-shifted residual has been extended up to or beyond the end of the current subframe. As shown in Figure 8, the extent to which the time-shifted residual has been extended is given by pointer dpm. The end of the current subframe is identified by the sum of current subframe pointer dpl and the fixed subframe length, subframe_l. If dpm < dpl + subframe_l, further processing is performed to extend the shifted residual; else, no further shift processing is required for the current subframe (see step 305).
If further shift processing is required, processor 210 determines the location of maximum energy in a segment of the original residual speech signal, x(i) (see steps 310-375). Ordinarily, the location of maximum energy corresponds to the location of a pitch-pulse of voiced speech. However, this is not necessarily the case. Regardless of whether the maximum energy is associated with a pitch-pulse or some other signal feature (such as, e.g., energetic noise), the search for the maximum energy location is made so that shifts in the original signal will be made to best align an energetically significant feature in the original speech with a significant feature in the adaptive codebook contribution.

The beginning of the segment of the original residual speech signal to be searched is defined with respect to a pointer to an original residual speech signal sample. This sample corresponds to the sample identified by pointer dpm in the shifted residual signal. This residual speech signal sample pointer, dpm', is determined as the sum of sample pointer dpm and the accumulated shift between the original and shifted residual signals: dpm' = dpm + acc_shift (see step 310). The beginning of the interval to be searched, designated by the pointer offset, is then computed (see step 315). Next, the length of the interval to be searched is defined (see step 320).
The location of maximum energy in the segment of x(i) is then determined (see step 325). This determination is made with use of a window. This window, centered about the ith sample of the original residual speech signal, defines samples of the original residual used in an energy computation. The energy at sample location i is determined by the sum of the squares of the samples in the window. The energy at the (i + 1)th sample location is determined in the same fashion, but with the window moved one sample later in time such that the center window location now contains the (i + 1)th sample. Again, the energy is determined as the sum of the squares of the sample values in the window. The energy of each sample location in the segment is determined in the same fashion. The energy of samples in a current window may be determined as the energy of the immediately preceding window of samples minus the energy of the sample shifted out of the window plus the energy of the sample shifted into the window. The sample location having associated with it the maximum energy determined in this fashion is identified by a pointer location.
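The windowed energy search, including the running update (subtract the sample leaving the window, add the sample entering it), might look like the following C sketch. The function name and the half-width parameter are assumptions made for illustration; the window length used by the actual coder is not recoverable from the text here.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index in [offset, offset + length) whose centered window of
 * 2 * half + 1 samples has the largest energy (sum of squares).
 * The caller must keep the window inside x[0..n). */
size_t max_energy_location(const double *x, size_t n, size_t offset,
                           size_t length, size_t half, double *max_energy)
{
    assert(length > 0 && offset >= half && offset + length + half <= n);

    /* Energy of the first window, computed directly. */
    double e = 0.0;
    for (size_t k = offset - half; k <= offset + half; k++)
        e += x[k] * x[k];

    size_t best = offset;
    double best_e = e;

    /* Slide the window one sample at a time, updating the energy in O(1). */
    for (size_t i = offset + 1; i < offset + length; i++) {
        double out = x[i - 1 - half];   /* sample shifted out of the window */
        double in = x[i + half];        /* sample shifted into the window */
        e += in * in - out * out;
        if (e > best_e) {
            best_e = e;
            best = i;
        }
    }
    if (max_energy)
        *max_energy = best_e;
    return best;
}
```

The running update costs two multiplications per sample regardless of window length, which is the saving the paragraph above points to.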
Once the segment of the original residual signal, x(i), has been searched for the sample having the maximum energy in the segment, processor 210 determines if this maximum energy sample is one which has been considered in the previous subframe (and thus not a maximum of interest). This is done by determining whether location precedes dpm' (see step 330).
If location precedes dpm', another search is performed by processor 210. In this case, however, the segment searched begins at a sample specified as offset = location + 0.75 delay (see step 335), and is of duration 0.5 delay. The value delay is provided by delay estimator 140 as the delay valid at the beginning of the current subframe, M(FBn). Since significant pitch-pulse energy features in the original residual signal are likely separated by one delay period, the computation of a new offset allows the search to skip ahead (0.75 delay) and likely find a maximum energy feature within a segment of length 0.5 delay. The sample location of maximum energy is determined as described above with reference to step 325 (see step 345). If location does not precede dpm', then the first pitch-pulse beyond dpm' has likely been found, and the flow of control jumps to step 350.
If the location of maximum signal energy determined at either steps 325 or 345 follows dpm' + delay, then it is likely, but not certain, that a pitch-pulse located subsequent to dpm' but prior to dpm' + delay has been missed by the searches performed to this time by processor 210 (see step 350). In this case, another segment of the original residual signal is defined and the location of the maximum energy therein is determined. If the location of maximum signal energy determined at either steps 325 or 345 precedes dpm' + delay, then the flow of control jumps to step 380.

Assuming step 350 results in the need to search another segment of the original residual speech signal, this segment is determined to begin at offset = location - 1.25 delay (see step 355) and extend for length = 0.5 delay (see step 360). The location of the maximum energy is determined as described above with reference to step 325, but the sample pointer to this location is saved as location2 (see step 365).

If the location of maximum energy (location2) is subsequent to dpm', then location2 identifies the location of the first pitch-pulse beyond dpm', and location is set equal to location2 (see steps 370 and 375). If, on the other hand, the location of maximum energy is not beyond dpm', then location2 is not the first pitch-pulse beyond dpm', and location remains set to the value it was assigned at either step 325 or 345 (since under such circumstances, pointer location is not overwritten by the operation of step 365).
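The three-stage search of steps 310-375 can be summarized with the C sketch below. It is a simplified illustration, not the appendix code: peak_in is a hypothetical helper using a 3-sample energy window, delays are truncated to whole samples, and the initial offset (dpm' - 0.25 delay) and length (1.5 delay) are borrowed from the cshiftframe listing in the appendix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: index in [offset, offset + length) with the largest
 * 3-sample windowed energy, or -1 if no index has a full window in x[0..n). */
static long peak_in(const double *x, long n, long offset, long length)
{
    long best = -1;
    double best_e = -1.0;
    for (long i = offset; i < offset + length; i++) {
        if (i < 1 || i + 1 >= n)
            continue;
        double e = x[i - 1] * x[i - 1] + x[i] * x[i] + x[i + 1] * x[i + 1];
        if (e > best_e) {
            best_e = e;
            best = i;
        }
    }
    return best;
}

/* Locate the first pitch pulse (or energy maximum) beyond dpm' = dpm + acc_shift. */
long first_pitch_pulse(const double *x, long n, long dpm, long acc_shift,
                       double delay)
{
    assert(delay > 0.0);
    long dpm_p = dpm + acc_shift;                           /* step 310 */
    long location = peak_in(x, n, dpm_p - (long)(0.25 * delay),
                            (long)(1.5 * delay));           /* steps 315-325 */

    if (location < dpm_p) {                                 /* step 330 */
        /* The maximum belongs to the past: skip ahead 0.75 delay. */
        location = peak_in(x, n, location + (long)(0.75 * delay),
                           (long)(0.5 * delay));            /* steps 335-345 */
    }
    if (location > dpm_p + (long)delay) {                   /* step 350 */
        /* A pulse may have been missed: look back 1.25 delay. */
        long location2 = peak_in(x, n, location - (long)(1.25 * delay),
                                 (long)(0.5 * delay));      /* steps 355-365 */
        if (location2 > dpm_p)                              /* steps 370-375 */
            location = location2;
    }
    return location;
}
```

For a pulse train with a 20-sample period, a search starting at dpm' = 25 returns the pulse at sample 30 even when a stronger pulse at the edge of the first search window is found first, because the look-back stage recovers the missed pulse.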
At this point, the location of the first pitch-pulse (or energy maximum) in a segment of the original residual has been found. Now, a segment of the original residual signal containing this location will be defined by processor 210 through the setting of certain pointers to samples in the signal. These pointers specify the beginning (sfstart) and end (sfend) of this segment containing the determined location. This segment is defined for later use as part of the process of aligning (or shifting) original residual speech to best match an adaptive codebook contribution.
First, default values for the segment pointers are set by processor 210. Pointer sfstart is set equal to dpm', the sample location corresponding to dpm + acc_shift (see step 380). This value for sfstart corresponds to an additional accumulated shift between the original and shifted residual signals of zero. That is, use of a section of x(i) beginning at dpm' (= sfstart) adds nothing to the accumulated shift between the original and shifted residual signals.

Pointer sfend is set to location + extra. The value extra is a constant stored in memory (e.g., ROM 250) and is equal to a fixed number of samples, e.g., 10 samples. Use of extra guarantees that the pitch-pulse (or maximum energy) of original residual speech will not fall at the end of the segment of the original residual being identified by these pointers (see step 380).

The default value of pointer sfend may be overwritten under certain circumstances. If the default value of sfend would mean that the segment of original residual speech would extend significantly beyond the end of the adaptive codebook contribution, the pointer sfend is set to end at dpl' + subframe_l + extra, where subframe_l is a constant equalling the number of samples in a fixed adaptive codebook subframe as discussed above (see steps 385 and 390).

The value of sfend may be further overwritten if the location of the identified pitch-pulse (or major energy) is significantly beyond the end of the adaptive codebook subframe. Under such circumstances the segment is deemed to end at the end of the adaptive codebook subframe boundary (see steps 395 and 400). Note that such a setting of sfend means that the location of the pitch-pulse (or major energy) is later than the end of the segment. Therefore, the segment no longer contains the pitch-pulse.
At this point, the location of the identified pitch-pulse (or maximum energy) is checked to determine whether it falls outside a range of samples beginning at sfstart and ending at sfend - 1 (see step 405). If so, the shifted residual may be extended with samples obtained with bandlimited interpolation of x(i) without need for changing acc_shift (that is, flow of control may jump to step 480). Otherwise, shifting is performed (see steps 410-475).

Assuming the location of the identified pitch-pulse (or major energy) is not outside the range defined above, a set (or segment) of L samples of x(i) (within a specified range of samples about the segment defined by sfstart and sfend) which most closely matches an L-length section of the adaptive codebook contribution (which begins at dpm and ends at dpm + L) is determined by processor 210.

This L-length segment of x(i) may comprise those L samples of the segment of x(i) defined by sfstart and sfend, but may also comprise samples (obtained by bandlimited interpolation) of a segment which is shifted with respect to sfstart and sfend, depending upon how closely a given L-length segment of x(i) matches the L-length section of the adaptive codebook contribution. As predicates to this determination, a limit on the range of possible sample shifts (see step 410) and a sample length, L, are determined (see step 415). The determination of the "closeness" (i.e., a measure of similarity) between L-length segments of x(i) and the adaptive codebook contribution is made through a cross-correlation process of these signals (see step 425) (it will be understood that other measures of similarity, such as a difference or error signal, may also be used). The selection of L-length segments of x(i) for use in a cross-correlation with a segment of the adaptive codebook contribution may be advantageously described with reference to Figure 10.
Figure 10 presents an illustrative segment of original residual speech signal x(i) which was located as described previously with reference to steps 310-400. The segment begins at sample sfstart and ends at sample sfend. The pitch-pulse is at sample location, with the distance between samples location and sfend equal to extra. As discussed above, the samples of x(i) falling within the segment defined by pointers sfstart and sfend correspond to a shift of zero. Shifted segments of x(i) are defined with respect to this zero shift position. Each shifted segment is of length L and begins (and ends) a certain positive or negative number of sample lengths (or fractions of sample lengths) with respect to the zero shift position. Expressed another way, each shifted segment begins at sfstart + shift and ends at sfend + shift. As shown in Figure 10, the range of possible values for shift is ±limit.

So for example, one possible shift would be shift = -limit. In this case, the L-length segment of x(i) defined by such a shift would begin at location sfstart - limit and end at location sfend - limit. Similarly, another possible shift would be shift = +limit. In this case, the L-length segment of x(i) defined by such a shift would begin at location sfstart + limit and end at location sfend + limit. As discussed above, ±limit specifies a range of possible shifts. Therefore, shift may take on values in the range -limit ≤ shift ≤ +limit, given a shift step size (i.e., shift precision) of sstep. Step size sstep may be set illustratively to 0.5 samples. Sample values resulting from fractional shifts are determined by conventional bandlimited interpolation. A plurality of 2×limit/sstep segments of the original residual signal x(i) may be defined in this way. All are L-length segments located between ±limit, wherein each segment overlaps its neighbor segments and is distinct from its nearest neighbor segments by sstep samples.
The relative sizes of limit and extra have an effect on system performance. For example, as extra is made larger, greater coding delay is introduced to the system. As extra is made smaller, coding delay is reduced, but the probability that shift will take on a value which excludes a pitch-pulse from the L-length segment of x(i) increases. This exclusion, when it occurs, causes audible distortion in the speech signal. The probability of exclusion is also increased as limit is made larger. To help ensure that exclusion does not occur, the value of limit should be less than the value of extra. For example, if the value of extra is 10, limit may be set to 6.
For each such L-length segment of x(i) thus identified, a measure of similarity between the segment and an L-length segment of the adaptive codebook contribution is computed. This computation is illustratively a cross-correlation. The adaptive codebook segment used for each cross-correlation begins at dpm and ends at dpm + L (see Figure 8). The cross-correlation is performed with a step size equal to sstep (should sstep equal a non-integer value, upsampling is performed in advance to provide the sample values for the segments of the original residual and the adaptive codebook contribution). Each cross-correlation results in a cross-correlation value (i.e., the measure of similarity). All such cross-correlations form a set of cross-correlation values separated in time by sstep. Each cross-correlation value of the set is associated with a shift corresponding to the L-length segment of x(i) used in the computation of that value.
Once the set of cross-correlation values is determined, the segment of the original residual signal having the greatest cross-correlation with the adaptive codebook segment is determined with an increased time resolution (see step 450). Illustratively, this is done by determining a second order polynomial curve for each set of three consecutive cross-correlation values (a set of three values is distinct from its nearest neighboring sets by one value). The middle value of these three cross-correlation values in a set corresponds to a shifted original residual signal as described above. The set of three cross-correlation values, and thus the associated polynomial curve, is identified by this middle value and its associated shift. For each such curve, a maximum and the location of that maximum (loc_max) is identified. (If loc_max is outside the range of the three values, the three values and associated curves are disregarded.) The curve having the greatest maximum value identifies the shift of the original residual signal which produces the best match with the segment of the adaptive codebook contribution.
The shift of the original residual signal producing the best match is refined with knowledge of the location of the maximum of the polynomial curve having the greatest maximum. With the location of the maximum defined with respect to the location of the middle of the three cross-correlation values associated with the curve (i.e., a value of shift), shift may be refined as shift = shift + sstep * loc_max.
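The second-order refinement can be sketched in C as below. The parabola through three equally spaced values c(-1), c(0), c(+1) has curvature a = (c(-1) + c(+1))/2 - c(0) and slope b = (c(+1) - c(-1))/2, so its vertex lies at loc_max = -b/(2a), measured in units of sstep from the middle value. The function name and the convention that corr[k] belongs to shift = -limit + k * sstep are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Fit a parabola to every triple of consecutive cross-correlation values and
 * keep the one with the greatest maximum. On success, *best_shift holds
 * shift + sstep * loc_max and *best_value the parabola maximum.
 * Returns 1 if a usable triple was found, 0 otherwise. */
int refine_shift(const double *corr, size_t count, double limit, double sstep,
                 double *best_shift, double *best_value)
{
    int found = 0;
    assert(sstep > 0.0);

    for (size_t k = 1; k + 1 < count; k++) {
        double a = 0.5 * (corr[k - 1] + corr[k + 1]) - corr[k];
        double b = 0.5 * (corr[k + 1] - corr[k - 1]);
        if (a >= 0.0)
            continue;                    /* curve has no maximum */

        double loc_max = -b / (2.0 * a); /* vertex, in units of sstep */
        if (loc_max < -1.0 || loc_max > 1.0)
            continue;                    /* outside the three values: disregard */

        double peak = corr[k] - b * b / (4.0 * a);
        if (!found || peak > *best_value) {
            found = 1;
            *best_value = peak;
            *best_shift = -limit + (double)k * sstep + sstep * loc_max;
        }
    }
    return found;
}
```

If the correlation values happen to lie exactly on a parabola peaking at a shift of 0.7 samples, every usable triple recovers 0.7 even though the grid only contains 0.5 and 1.0.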
At this point, the best shift of the original residual signal has been determined. This shift may then be used to extend the shifted residual for a duration L. Since this shift is known, the accumulated shift between the original residual signal, x(i), and the shifted residual signal may be updated as acc_shift = acc_shift + shift (see step 475).
With the accumulated shift updated, the shifted residual signal is extended to match acc_shift with use of the segment of the original residual signal corresponding to shift. Note that original residual sample values are available only at original signal sample times. However, in determining an optimal shift of the original residual signal, an upsampling has been performed prior to computing cross-correlations and a value loc_max (which is generally non-integer) has been determined. In general this results in a non-integer sample time relationship between the shifted residual signal and the original residual signal x(i) to be used in extending the shifted residual signal. Therefore, bandlimited interpolation of the L-length segment of the original signal is used to provide sample values of the original signal which are time-aligned with samples of the shifted residual. Once such time-alignment is performed, the samples of this time-aligned signal may be concatenated with the existing shifted residual signal (see step 480).
Note that flow of control may have jumped to step 480 without updating the accumulated shift. In this case, a length of L samples of the original signal is used to provide samples for the shifted residual with the same value of acc_shift as the previous shifted residual segment. In either case, dpm is updated to reflect the extension of the shifted residual (see step 490).
As shown in Figure 9, once dpm is updated, the flow of control returns to step 305. As mentioned above, step 305 determines whether further processing is required to extend the shifted residual beyond the end of the current subframe. If so, control flows through the process presented in steps 310-490 of Figure 9 again so that further extension of the shifted residual may be performed. Steps 310-490 are repeated as long as the condition of step 305 is satisfied. Once the shifted residual has been extended up to or beyond the end of the current adaptive codebook subframe, the pointer to the end of the adaptive codebook subframe is updated (see step 500) and processing associated with time-shifting the original residual ends.
Once the shifted residual is determined by time shift processor 200, a scale factor is determined by processor 210 as follows:

[Equation (13)]

where the shifted residual and the adaptive codebook contribution are signals of length equal to a subframe. This scale factor is multiplied by the adaptive codebook contribution and provided as output from processor 200.
Referring again to Figure 2, the shifted residual and the scaled adaptive codebook estimate are supplied to circuit 160, which subtracts the estimate from the modified original. The result is excitation residual signal r(i), which is supplied to a fixed stochastic codebook search processor 170.
Codebook search processor 170 operates conventionally to determine which of the fixed stochastic codebook vectors, z(i), scaled by a factor μ(i), most closely matches r(i) in a least squares, perceptually weighted sense. The chosen scaled fixed codebook vector is added to the scaled adaptive codebook vector to yield the best estimate of a current reconstructed speech signal. This best estimate is stored by the adaptive codebook processor 150 in its memory.

As is the case with conventional speech coders, adaptive codebook delay and scale factor values, a FSCB index and gain, and linear prediction coefficients are communicated across a channel for reconstruction by a conventional CELP decoder/receiver (see Figure 13). This communication is in the form of a signal reflecting these parameters. Because of the reduced error (in the coding process) afforded by operation of the illustrative embodiment of the present invention, it is possible to transmit adaptive codebook delay information, M, once per frame, rather than once per subframe. Subframe values for delay may be provided at the receiver by interpolating the delay values in a fashion identical to that done by delay estimator 140 of the transmitter.

By transmitting adaptive codebook delay information M every frame rather than every subframe, the bandwidth requirements associated with delay may be significantly reduced.


As discussed above with reference to step 475 of Figure 9, acc_shift represents an accumulated shift over time between the original signal, x(i), and the shifted signal. In order to prevent an ever increasing asynchrony between these signals, the delay estimator 140 can adjust computed values for M over time. An adjustment process suitable for this purpose carried out by estimator 140 is advantageously described with reference to Figure 12.
Figure 12 presents a finite-state machine having states A, B and C. The state of this machine represents an amount of adjustment to computed values for M to prevent ever increasing asynchrony. Transitions between states are based on values for acc_shift provided by time shift processor 200. When the machine is in state A, the delay value M(FBn+1) used to determine values for delays mn(k) is not adjusted. When in state B or state C, the machine adjusts M(FBn+1) by one step, illustratively one sample time; the direction of the adjustment depends on the state, as described below.

Given an initial state (A, B, or C), the finite state machine operates by keeping track of values of acc_shift. If the value of acc_shift is such that a condition for transitioning between the current state and another state is met, a transition to the other state occurs. For example, assuming the machine is in state A (an illustrative initial state for estimator 140) and acc_shift stays between -3 ms and 3 ms, the machine would remain in state A and M(FBn+1) would not be modified. If the value of acc_shift exceeds 3 ms, the machine transitions to state C and M(FBn+1) is increased by one sample time to help offset the asynchrony indicated by acc_shift. If, on the other hand, when in state A acc_shift becomes less than -3 ms, the machine transitions to state B and M(FBn+1) is decreased by one sample to help offset the asynchrony. The operation is similar for states B and C.
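The state machine can be sketched in C as follows. Only the transitions out of state A are spelled out in the text; the rule used here for returning from B or C to A (a separate, smaller threshold leave_ms) is purely an assumption for illustration, as are all the names.

```c
#include <assert.h>

typedef enum { STATE_A, STATE_B, STATE_C } adj_state;

/* One transition of the delay-adjustment machine.
 * enter_ms: threshold for leaving state A (3 ms in the text's example).
 * leave_ms: hypothetical threshold for returning to A from B or C. */
adj_state next_state(adj_state s, double acc_shift_ms,
                     double enter_ms, double leave_ms)
{
    assert(enter_ms > 0.0 && leave_ms >= 0.0 && leave_ms <= enter_ms);
    switch (s) {
    case STATE_A:
        if (acc_shift_ms > enter_ms)
            return STATE_C;
        if (acc_shift_ms < -enter_ms)
            return STATE_B;
        return STATE_A;
    case STATE_C:   /* assumed: stay until acc_shift has come back down */
        return (acc_shift_ms < leave_ms) ? STATE_A : STATE_C;
    case STATE_B:   /* assumed: stay until acc_shift has come back up */
        return (acc_shift_ms > -leave_ms) ? STATE_A : STATE_B;
    }
    return STATE_A;
}

/* Adjustment, in samples, applied to M(FBn+1) in each state:
 * none in A, minus one sample in B, plus one sample in C. */
int delay_adjustment(adj_state s)
{
    switch (s) {
    case STATE_C: return 1;
    case STATE_B: return -1;
    default:      return 0;
    }
}
```

For example, starting in state A with acc_shift at 3.5 ms, the machine moves to state C and the delay is lengthened by one sample until acc_shift has drifted back.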

An Alternative Illustrative Embodiment

One alternative to the illustrative embodiment presented in Figure 2 is presented in Figure 11. In this embodiment, a trial signal generator 610 receives an original digital speech signal, x(i), and generates a plurality of trial original signals. The trial original signal generator 610 comprises a time-shift processor similar to that presented in Figures 2, 7, and 9, but which does not perform a correlation between a trial original signal and an adaptive codebook contribution. Rather, this time shift processor simply provides a plurality of L-length trial original signals based on a plurality of shifts of original speech signal x(i). As discussed above with reference to Figure 10, these trial original signals are L-length segments of the original signal determined by shifts of step size sstep over a range of ±limit with respect to an L-length segment beginning at sample sfstart and ending at sample sfend. Because it performs no cross-correlation between the original residual and trial original signals, generator 610 does not select a trial original signal for coding on its own. Rather it provides the trial original signals it generates to a coder/synthesizer 620 for processing.

Coder/synthesizer 620 comprises a conventional analysis-by-synthesis coder, such as the conventional CELP coder presented in Figure 1. The synthesized (or reconstructed) original signal is that shown in Figure 1 as the sum of the adaptive and fixed codebook output signals (see circuit 45 of Figure 1). The coded signal parameters determined by the analysis processing of the CELP coder (from which the synthesized signal is generated) may be saved in RAM for later use. The output of the coder/synthesizer 620 is thus an estimate of the original signal, x(i), based on a given trial original signal. This estimate of the original signal is thereafter compared with the trial original signal to determine a measure of the similarity between the estimated original and the trial original. Both signals are provided to a subtraction circuit 630, which determines a difference (or error) signal, E(i), between the two signals. The error signal E(i) is provided to the trial signal generator 610, which keeps track of the error associated with a given trial original signal. Once all trial original signals have been processed in this way, the trial signal generator may determine which trial signal produced the best measure of similarity (e.g., the smallest error). Thereafter, generator 610 may signal the coder/synthesizer 620 to use the saved code parameters associated with the trial original signal having the smallest error. These parameters may be communicated to a receiver as a coded representation of the original signal, x(i).
It will be understood by those of ordinary skill in the art that reference to signals such as the "original" signal, "reconstructed" signal, etc., may include reference to segments thereof. Moreover, whether a given signal is upsampled or not does not change its character as an "original" signal, a "trial original" signal, etc. Hence, use of the term "samples" with reference to, e.g., an "original signal" may include those sample values of the signal provided by an upsampling (such as conventional bandlimited interpolation), those samples which are not the result of upsampling, or both.


Introduction to Appendix

Attached as an appendix hereto is an illustrative set of software programs related to the first illustrative embodiment discussed above. The software programs of this set are written in the "C" programming language. An embodiment of this invention may be provided by executing these programs on a general purpose computer, for example, the Iris Indigo work station manufactured by Silicon Graphics Inc. Note that subroutines "cshiftframe" and "modifyorig" correspond generally to those functions presented in Figure 9.

Nov 16 09:07 1992 mod.c Page 1

#include "macro.h"
/*
 * mod - modify residual
 */
void mod( residualm, accshift, d_shift, shiftr, exctation, residual, dpl, dpm,
          lpcw, lpcorder, delay, subframel, extra, fcnt)
float *residualm; /* output: modified residual signal */
float *accshift; /* output: shift from mresidual to residual */
float *d_shift; /* output: local shift for all samples */
float shiftr; /* input: maximum shift range */
float *exctation; /* input: adaptive codebook excitation */
float *residual; /* input: original residual */
int dpl; /* input: pointer to output signals */
int *dpm; /* in/out: pointer to end of residualm */
float *lpcw; /* input: weighted lpc coefficients */
int lpcorder; /* input: lpc order */
float delay; /* input: delay */
int subframel; /* input: subframe length */
int extra; /* input: additional excitation constructed */
long fcnt;
{
void cshiftframe();
void modifyorig();
float shiftr2;
int sfstart, sfend;
while( *dpm < dpl+subframel){
cshiftframe( &sfstart, &sfend, &shiftr2, *dpm, residual, dpl, *accshift, shiftr, delay, subframel, extra, fcnt);
modifyorig( residualm, accshift, d_shift, dpm, shiftr2, exctation, residual, sfstart, sfend);
}
}

Nov [illegible] 19:56 1992 cshiftframe.c Page 1

#include "macro.h"
/*
 * cshiftframe - find optimal frame shift
 */
void cshiftframe( sfstart, sfend, maxshift2, dpm, residual, dpl, accshift, maxshift, delay, subframel, extra, fcnt)
int *sfstart; /* output: shift-frame start */
int *sfend; /* output: shift-frame ending */
float *maxshift2; /* output: one-sided shift range */
int dpm; /* output: up to where residualm exists */
float *residual; /* input : original residual signal */
int dpl; /* input : output signal pointer */
float accshift; /* input : shift of output versus input */
float maxshift; /* input : max. shift range */
float delay; /* input : local pitch value */
int subframel; /* input : subframe length */
int extra; /* input : additional excitation beyond current frame */
long fcnt; /* input : frame counter (DEBUG) */
{
    void maxeloc(); /* determine location of max energy */
    float maxener;
    int offset;
    int iacshift;
    int length;
    int loc, loc2;
    if( delay < 0){
        iacshift = -accshift + 0.5;
        iacshift = -iacshift;
    }
    else iacshift = -accshift + 0.5;
    /* determine first a pitch pulse somewhere near dpm */
    length = 1.5 * delay;
    offset = dpm + iacshift - 0.25 * delay;
    maxeloc( &loc, &maxener, residual, offset, length, 2);
    loc -= iacshift;
    printf("cshiftframe: firstloc %d ", loc - dpl);
    /* now find the first pitch pulse for sure */
    if( loc < dpm){
        offset = loc + iacshift + 0.75 * delay + 0.5;
        length = 0.5 * delay;
        maxeloc( &loc, &maxener, residual, offset, length, 2);
        loc -= iacshift;
        printf(" Aloc %d", loc - dpl);
    }
    if( loc > dpm+delay){
        offset = loc + iacshift - 1.25 * delay + 0.5;
        length = 0.5 * delay;
        maxeloc( &loc2, &maxener, residual, offset, length, 2);
        loc2 -= iacshift;
        if( loc2 >= dpm) loc = loc2;
        printf(" Bloc %d", loc - dpl);
    }
    *sfstart = dpm;
    *sfend = loc + extra;
    *maxshift2 = maxshift;
    if( *sfend > dpl + subframel + extra) *sfend = dpl + subframel + extra;
    if( loc >= dpl + subframel + extra/2) *sfend = dpl + subframel;
    if( loc >= *sfend || loc < *sfstart) *maxshift2 = 0;
    printf(" loc is: %d\n",loc-dpl);
    /* debugging pictures */
    /*
    {
    char title1[100];
    static float w1[200], w2[200];
    register int i;
    for( i=0; i<200; i++) w1[i] = 0.0;
    w1[loc-dpl-1] = 50.0; w1[loc-dpl+1] = 50.0; w1[loc-dpl] = 100;
    for( i=0; i< subframel+extra; i++) w2[i] = residual[dpl+iacshift+i];
    for( i=0; i<*sfstart-dpl; i++) w2[i] = 0.0;
    for( i=*sfend-dpl; i<subframel+extra; i++) w2[i] = 0.0;
    sprintf(title1,"shiftrange %5.3f", *maxshift2);
    pictures3( residual+dpl+iacshift, subframel+extra, w1, subframel+extra,
        w2, subframel+extra, fcnt, title1, "considered", "shifted");
    }
    */
}

Nov [illegible] 09:07 1992 maxeloc.c Page 1

#include "macro.h"
void maxeloc( maxloc, maxener, signal, dp, length, ewl)
int *maxloc; /* output: location of maximum energy */
float *maxener; /* output: energy at loc */
float *signal; /* input: signal for which energy is to be found */
int dp; /* input: data pointer into signal */
int length; /* input: window of data */
int ewl; /* input: half length of energy window */
{
    float ener;
    register int i;
    int tail, front;
    ener = 0.0;
    front = dp + ewl;
    tail = dp - ewl;
    for( i=tail; i<=front; i++) ener += signal[i] * signal[i];
    *maxloc = dp;
    *maxener = ener;
    for( i=1; i<length; i++){
        front++;
        ener += signal[front] * signal[front] - signal[tail] * signal[tail];
        tail++;
        if( *maxener < ener){
            *maxloc = i + dp;
            *maxener = ener;
        }
    }
}
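
The maxeloc routine locates a pitch pulse by sliding a short energy window across the signal and updating the running sum incrementally, adding the sample that enters the window and subtracting the one that leaves it. The following is a minimal sketch of that technique in ANSI C, not part of the original appendix; the name max_energy_loc and its argument layout are hypothetical, and the caller is assumed to guarantee that every index from start - half to start + count - 1 + half lies inside the array.

```c
#include <stddef.h>

/*
 * Sketch (hypothetical helper): return the index c in [start, start + count)
 * whose centered window signal[c - half .. c + half] has the largest energy.
 * On ties the earliest index wins, as in the listing's strict comparison.
 */
static int max_energy_loc(const float *signal, int start, int count,
                          int half, float *energy_out)
{
    int tail = start - half;    /* oldest sample inside the window */
    int front = start + half;   /* newest sample inside the window */
    float ener = 0.0f;
    float best;
    int best_loc = start;
    int i;

    /* energy of the first window, computed directly */
    for (i = tail; i <= front; i++)
        ener += signal[i] * signal[i];
    best = ener;

    /* slide the window one sample at a time: add the new sample, drop the old */
    for (i = 1; i < count; i++) {
        front++;
        ener += signal[front] * signal[front] - signal[tail] * signal[tail];
        tail++;
        if (ener > best) {
            best = ener;
            best_loc = start + i;
        }
    }
    if (energy_out != NULL)
        *energy_out = best;
    return best_loc;
}
```

Each step of the slide costs two multiplications regardless of the window length, which is why the running update is preferred over recomputing the sum at every candidate position.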

Nov [illegible] 12:37 1992 modifyorig.c Page 1

#include "macro.h"
/*
 * modifyorig - modify original
 */
void modifyorig( residualm, accshift, d_shift, dpm, shiftrange,
    exctation, residual, dpl, sfend)
float *residualm; /* output: modified residual signal */
float *accshift; /* in/out: accumulated shift */
float *d_shift; /* output: local shift value */
int *dpm; /* output: first nonvalid sample of residualm */
float shiftrange; /* input : one side of shift range */
float *exctation; /* input : excitation waveform */
float *residual; /* input : original residual signal */
int dpl; /* input : window start */
int sfend; /* input : window end */
{
    void bl_intrp();
    void getcrit();
    void testi_ubound();
    int k;
    float criterion, best;
    float shift;
    float optshift;
    float locmax;
    int leftlimit, rightlimit;
    int length;
#define MAXDIM 100
    float crit[MAXDIM];
    float a, b;
    float sstep;
    length = sfend - dpl;
    /* first we upsample by a factor 2 */
    sstep = 0.5;
    rightlimit = shiftrange/sstep + 0.5;
    leftlimit = -rightlimit;
    if( leftlimit == rightlimit) rightlimit = leftlimit - 1;
    printf("modifyorig: llim %d rlim %d", leftlimit, rightlimit);
    testi_ubound( rightlimit*2+1, MAXDIM, "modifyorig.c1");
    for(k=leftlimit; k<=rightlimit; k++){
        shift = *accshift + k * sstep;
        getcrit( crit+k-leftlimit, residual+dpl, exctation+dpl, shift, length);
    }
    /* then we interpolate the criterion */
    best = 0.0;
    optshift = *accshift;
    for(k=leftlimit+1; k<rightlimit; k++){
        shift = *accshift + k * sstep;
        a = crit[k-leftlimit+1] + crit[k-leftlimit-1] - 2.0 * crit[k-leftlimit];
        criterion = -2.0;
        if( a != 0.0){
            b = crit[k-leftlimit+1] - crit[k-leftlimit-1];
            locmax = - b / (2.0 * a);
            if( locmax <= 0.5 && locmax >= -0.5)
                criterion = a * locmax * locmax + b * locmax + 2.0 * crit[k-leftlimit];
        }
        if( criterion > best){
            optshift = shift + sstep * locmax;
            best = criterion;
        }
    }
    *accshift = optshift;
    printf(" optshift %5.2f best %.4e\n", optshift, best);
    if( best<=1.0) for(k=leftlimit+1; k<rightlimit; k++) printf("k=%d %f\n", k, crit[k-leftlimit]);
    for( k=0; k<length; k++){
        bl_intrp( residualm+dpl+k, residual+dpl+k, *accshift, 0.9, 8);
        d_shift[dpl+k] = *accshift;
    }
    *dpm = dpl+length;
}
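
The modifyorig routine evaluates the criterion on a half-sample grid and then refines the best shift by fitting a parabola through each grid point and its two neighbours. The following is a minimal sketch of that three-point refinement, not part of the original appendix; the name refine_peak and its return convention are hypothetical. Like the listing, it accepts any stationary point within half a step of the centre and does not check that the parabola opens downward.

```c
/*
 * Sketch (hypothetical helper): c_m, c_0 and c_p are criterion values at
 * offsets -1, 0 and +1, measured in units of the search step. The parabola
 * through them is y(x) = c_0 + (b/2) x + (a/2) x^2 with a and b as below.
 * Returns 1 and fills *offset and *peak when the stationary point lies
 * within half a step of the centre point, otherwise returns 0.
 */
static int refine_peak(float c_m, float c_0, float c_p,
                       float *offset, float *peak)
{
    float a = c_p + c_m - 2.0f * c_0;   /* second difference */
    float b = c_p - c_m;                /* central first difference times 2 */
    float x;

    if (a == 0.0f)
        return 0;                       /* collinear points: no stationary point */
    x = -b / (2.0f * a);                /* where dy/dx = 0 */
    if (x > 0.5f || x < -0.5f)
        return 0;                       /* closer to a neighbouring grid point */
    *offset = x;
    *peak = c_0 + 0.5f * b * x + 0.5f * a * x * x;
    return 1;
}
```

The quantity that the listing stores in criterion, a * locmax * locmax + b * locmax + 2.0 * crit[k-leftlimit], is exactly twice the peak value returned here, so comparing it across grid points ranks candidates in the same order. The refined shift is then the grid shift plus the offset multiplied by the step size.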


Nov [illegible] 09:07 1992 bl_intrp.c Page 1

#include "macro.h"
/*
 * bl_intrp - band-limited interpolation
 */
void bl_intrp( output, input, delay, factor, fl)
float *output; /* output: interpolated output value */
float *input; /* input : array to be interpolated */
float delay; /* input : delay where actual input is */
float factor; /* input : cut-off frequency (relative to fs) */
int fl; /* input : filter length is 2*fl+1 */
/* NOTES
 * computes "input" signal value
 * at "delay" prior to the array pointer "input" into the "input" array.
 */
{
    register int n;
    register float t;
    register float *f;
    register float arg1, arg3;
    register float denom;
    int offset;
    if( delay < 0){
        offset = -delay + 0.5;
        offset = -offset;
    }
    else offset = delay + 0.5;
    t = offset - delay;
    f = input - offset; /* center sum around f */
    denom = 2.0 / (2.0 * fl + 1.0);
    *output = 0.0;
    for( n= -fl; n<=fl; n++){
        arg1 = PI * factor * (t-n);
        arg3 = PI * (t-n);
        if( arg1 < 1.e-2 && arg1 > -1.e-2) /* just copy */
            *output += factor * *(f+n);
        else /* sinc function multiplied by hamming window */
            *output += factor * (0.54 + 0.46 * cos( arg3 * denom )) *
                *(f+n) * sin( arg1) / arg1;
    }
}
Nov 16 09:07 1992 testbound.c Page 1

/*
 * testi_ubound - test if argument a exceeds int boundary b and print text
 */
void testi_ubound( a, b, text)
int a; /* input: value to be tested */
int b; /* input: boundary value */
char *text; /* input: program name */
{
    if( a > b){
        printf("\n%s-f-value exceeds range %d > %d\n", text, a, b);
        exit(10);
    }
}
/*
 * testi_bound - test if argument a exceeds range b1,b2 and print text
 */
void testi_bound( a, b1, b2, text)
int a; /* input: value to be tested */
int b1,b2; /* input: boundary values */
char *text; /* input: program name */
{
    if( a < b1 ){
        printf("\n%s-f-value exceeds range %d < %d\n", text, a, b1);
        exit(10);
    }
    else if (a > b2 ){
        printf("\n%s-f-value exceeds range %d > %d\n", text, a, b2);
        exit(10);
    }
}
/*
 * testf_bound - test if argument a exceeds range b1,b2 and print text
 */
void testf_bound( a, b1, b2, text)
float a; /* input: value to be tested */
float b1,b2; /* input: boundary values */
char *text; /* input: program name */
{
    if( a < b1 ){
        printf("\n%s-f-value exceeds range %f < %f\n", text, a, b1);
        exit(10);
    }
    else if (a > b2 ){
        printf("\n%s-f-value exceeds range %f > %f\n", text, a, b2);
        exit(10);
    }
}
/*
 * testd_bound - test if argument a exceeds range b1,b2 and print text
 */
void testd_bound( a, b1, b2, text)
double a; /* input: value to be tested */
double b1,b2; /* input: boundary values */
char *text; /* input: program name */
{
    if( a < b1 ){
        printf("\n%s-f-value exceeds range %f < %f\n", text, a, b1);
        exit(10);
    }
    else if (a > b2 ){
        printf("\n%s-f-value exceeds range %f > %f\n", text, a, b2);
        exit(10);
    }
}
Nov 16 09:07 1992 getcrit.c Page 1

#include "macro.h"
/*
 * getcrit - compute error between excitation and shifted residual
 */
void getcrit( criterion, residual, exctation, shift, length)
float *criterion; /* output: error criterion */
float *residual; /* input : residual signal */
float *exctation; /* input : reference signal */
float shift; /* input : shift */
int length; /* input : vector length */
{
    void bl_intrp();
    float output;
    register int i;
    *criterion = 0.0;
    for( i=0; i<length; i++){
        bl_intrp( &output, residual+i, shift, 0.9, 8);
        *criterion += output * exctation[i];
    }
}
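
The criterion accumulated by getcrit is a cross-correlation between the excitation and the residual read at a given shift, so a larger value indicates a better match. The following is a minimal sketch of a search over that criterion, not part of the original appendix. It assumes integer shifts only, so that plain indexing can stand in for the band-limited interpolation, and the names shift_correlation and best_integer_shift are hypothetical. The caller is assumed to keep every index i - shift inside the signal buffer.

```c
/*
 * Sketch (hypothetical helper): correlation between reference[0..length-1]
 * and the signal delayed by an integer shift, i.e. the sum over i of
 * signal[i - shift] * reference[i].
 */
static float shift_correlation(const float *signal, const float *reference,
                               int shift, int length)
{
    float acc = 0.0f;
    int i;

    for (i = 0; i < length; i++)
        acc += signal[i - shift] * reference[i];
    return acc;
}

/*
 * Sketch (hypothetical helper): try every integer shift in [-range, range]
 * and return the one with the largest correlation (earliest wins on ties).
 */
static int best_integer_shift(const float *signal, const float *reference,
                              int range, int length)
{
    int best_shift = -range;
    float best = shift_correlation(signal, reference, -range, length);
    int s;

    for (s = -range + 1; s <= range; s++) {
        float c = shift_correlation(signal, reference, s, length);
        if (c > best) {
            best = c;
            best_shift = s;
        }
    }
    return best_shift;
}
```

In the appendix the same search runs on a half-sample grid with interpolated residual values, and the winning grid point is refined further before the residual is resampled.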


Claims (20)

1. A method for coding an original signal, the method comprising the steps of:
a. identifying one or more samples of the original signal based on a sample identification criterion;
b. selecting a plurality of segments of the original signal to form a plurality of different trial original signals, each selected segment including one or more of the identified samples;
c. for each of the plurality of trial original signals, evaluating a measure of similarity between the trial original signal and a signal synthesized to estimate the trial original signal;
d. determining a trial original signal for use in coding based on one or more evaluated measures of similarity; and
e. generating a coded representation of the original signal based on one or more determined trial original signals.
2. The method of claim 1 further comprising the steps of:
1. analyzing one or more trial original signals to produce one or more parameters representative thereof; and
2. synthesizing a signal which estimates the original signal, the synthesis based on one or more of the parameters.
3. The method of claim 1 wherein the step of identifying one or more samples of the original signal comprises analyzing the original signal to locate a local energy maximum.
4. The method of claim 1 wherein each selected segment of the original signal comprises, in addition to said one or more of the identified samples, original signal samples other than the identified signal samples.
5. The method of claim 4 wherein a particular selected segment comprises identified samples preceding one or more other original signal samples.
6. The method of claim 1 wherein the step of selecting a plurality of segments comprises:
1. determining a time shift with reference to one or more samples of the original signal; and
2. determining a set of original signal samples based on the time shift.
7. The method of claim 1 wherein the step of evaluating a measure of similarity comprises forming a cross-correlation between the trial original signal and the synthesized signal.
8. The method of claim 1 wherein the step of determining a trial original signal for use in coding comprises the step of selecting a trial original signal from among the plurality of trial original signals, the selection of the trial original signal based upon a comparison of evaluated measures of similarity.
9. The method of claim 1 wherein the step of determining a trial original signal for use in coding comprises the step of generating a trial original signal based on evaluated measures of similarity.
10. The method of claim 9 wherein the step of generating a trial original signal comprises:
1. determining a substantially local maximum measure of similarity from among a plurality of trial original signal similarity measures; and
2. determining a time-shift reflecting the substantial maximum measure of similarity.
11. The method of claim 10 wherein the step of generating a trial original signal further comprises determining sample values for the trial original signal based on a formed trial original signal and the time-shift.
12. The method of claim 10 wherein the step of generating a trial original signal further comprises determining sample values for the trial original signal based on the original signal and the time-shift.
13. The method of claim 1 wherein the step of generating a coded representation of the original signal comprises performing analysis-by-synthesis coding.
14. The method of claim 13 wherein the step of performing analysis-by-synthesis coding comprises performing code-excited linear prediction coding.
15. An apparatus for coding an original signal, the apparatus comprising:
a. means for identifying one or more samples of the original signal based on a sample identification criterion;
b. means for selecting a plurality of segments of the original signal to form a plurality of different trial original signals, each selected segment including one or more of the identified samples;
c. means for evaluating a measure of similarity between each of the plurality of trial original signals and a signal synthesized to estimate the trial original signal;
d. means for determining a trial original signal for use in coding based on one or more evaluated measures of similarity; and
e. means for generating a coded representation of the original signal based on one or more determined trial original signals.
16. The apparatus of claim 15 further comprising:
1. means for analyzing one or more trial original signals to produce one or more parameters representative thereof; and
2. means for synthesizing a signal which estimates the original signal, the synthesis based on one or more of the parameters.
17. The apparatus of claim 15 wherein the means for identifying one or more samples of the original signal comprises a means for analyzing the original signal to locate a local energy maximum.
18. The apparatus of claim 15 wherein the means for selecting a segment comprises:
1. means for determining a time shift with reference to one or more samples of the original signal; and
2. means for determining a set of original signal samples based on the time shift.
19. The apparatus of claim 15 wherein the means for generating a coded representation of the original signal comprises means for performing analysis-by-synthesis coding.
20. The apparatus of claim 19 wherein the means for performing analysis-by-synthesis coding comprises means for performing code-excited linear prediction coding.
CA002102080A 1992-12-14 1993-10-29 Time shifting for generalized analysis-by-synthesis coding Expired - Lifetime CA2102080C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US99030992A 1992-12-14 1992-12-14
US990,309 1992-12-14

Publications (2)

Publication Number Publication Date
CA2102080A1 CA2102080A1 (en) 1994-06-15
CA2102080C true CA2102080C (en) 1998-07-28

Family

ID=25536013

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002102080A Expired - Lifetime CA2102080C (en) 1992-12-14 1993-10-29 Time shifting for generalized analysis-by-synthesis coding

Country Status (6)

Country Link
EP (1) EP0602826B1 (en)
JP (1) JP3770925B2 (en)
CA (1) CA2102080C (en)
DE (1) DE69326126T2 (en)
ES (1) ES2136649T3 (en)
MX (1) MX9307743A (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5704003A (en) * 1995-09-19 1997-12-30 Lucent Technologies Inc. RCELP coder
CA2213909C (en) * 1996-08-26 2002-01-22 Nec Corporation High quality speech coder at low bit rates
FI113903B (en) * 1997-05-07 2004-06-30 Nokia Corp Speech coding
JP4857468B2 (en) * 2001-01-25 2012-01-18 ソニー株式会社 Data processing apparatus, data processing method, program, and recording medium
JP4857467B2 (en) 2001-01-25 2012-01-18 ソニー株式会社 Data processing apparatus, data processing method, program, and recording medium
JP3888097B2 (en) 2001-08-02 2007-02-28 松下電器産業株式会社 Pitch cycle search range setting device, pitch cycle search device, decoding adaptive excitation vector generation device, speech coding device, speech decoding device, speech signal transmission device, speech signal reception device, mobile station device, and base station device
CA2365203A1 (en) * 2001-12-14 2003-06-14 Voiceage Corporation A signal modification method for efficient coding of speech signals
WO2005036527A1 (en) * 2003-10-07 2005-04-21 Matsushita Electric Industrial Co., Ltd. Method for deciding time boundary for encoding spectrum envelope and frequency resolution
FI118704B (en) 2003-10-07 2008-02-15 Nokia Corp Method and device for source coding
US8744091B2 (en) * 2010-11-12 2014-06-03 Apple Inc. Intelligibility control using ambient noise detection
EP3933836A1 (en) * 2012-11-13 2022-01-05 Samsung Electronics Co., Ltd. Method and apparatus for determining encoding mode, method and apparatus for encoding audio signals, and method and apparatus for decoding audio signals

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
NL8902347A (en) * 1989-09-20 1991-04-16 Nederland Ptt METHOD FOR CODING AN ANALOGUE SIGNAL WITHIN A CURRENT TIME INTERVAL, CONVERTING ANALOGUE SIGNAL IN CONTROL CODES USABLE FOR COMPOSING AN ANALOGUE SIGNAL SYNTHESIGNAL.
CA2068526C (en) * 1990-09-14 1997-02-25 Tomohiko Taniguchi Speech coding system
DE69225293T2 (en) * 1991-10-25 1998-09-10 At & T Corp Generalized analysis-by-synthesis method and device for speech coding

Also Published As

Publication number Publication date
EP0602826A3 (en) 1994-12-07
EP0602826A2 (en) 1994-06-22
CA2102080A1 (en) 1994-06-15
ES2136649T3 (en) 1999-12-01
DE69326126T2 (en) 2000-07-06
JPH06214600A (en) 1994-08-05
JP3770925B2 (en) 2006-04-26
MX9307743A (en) 1994-06-30
EP0602826B1 (en) 1999-08-25
DE69326126D1 (en) 1999-09-30

Similar Documents

Publication Publication Date Title
CA2102080C (en) Time shifting for generalized analysis-by-synthesis coding
US4944013A (en) Multi-pulse speech coder
EP0821849B1 (en) Reduced complexity encoder for signal transmission system
US4736428A (en) Multi-pulse excited linear predictive speech coder
CA2021508C (en) Digital speech coder having improved long term lag parameter determination
US5426718A (en) Speech signal coding using correlation valves between subframes
WO2001061687A1 (en) Wideband speech codec using different sampling rates
US5119424A (en) Speech coding system using excitation pulse train
KR100204740B1 (en) Information coding method
EP0577809A1 (en) Double mode long term prediction in speech coding.
EP1420391A1 (en) Generalized analysis-by-synthesis speech coding method, and coder implementing such method
Kleijn et al. Generalized analysis-by-synthesis coding and its application to pitch prediction
CA2132006C (en) Method for generating a spectral noise weighting filter for use in a speech coder
EP0944038A1 (en) Speech encoder with features extracted from current and previous frames
US4764963A (en) Speech pattern compression arrangement utilizing speech event identification
US5920832A (en) CELP coding with two-stage search over displaced segments of a one-dimensional codebook
US5924063A (en) Celp-type speech encoder having an improved long-term predictor
US5621853A (en) Burst excited linear prediction
Paulus Variable bitrate wideband speech coding using perceptually motivated thresholds
FI96248B (en) Method for providing a synthetic filter for long-term interval and synthesis filter for speech coder
EP0539103B1 (en) Generalized analysis-by-synthesis speech coding method and apparatus
EP0520462B1 (en) Speech coders based on analysis-by-synthesis techniques
JP3749838B2 (en) Acoustic signal encoding method, acoustic signal decoding method, these devices, these programs, and recording medium thereof
CA2218223C (en) Reduced complexity signal transmission system
Soheili et al. Techniques for improving the quality of LD-CELP coders at 8 kb/s

Legal Events

Date Code Title Description
EEER Examination request
MKEX Expiry

Effective date: 20131029