WO1999060561A2 - Split band linear prediction vocoder - Google Patents
Split band linear prediction vocoder
- Publication number
- WO1999060561A2 WO1999060561A2 PCT/GB1999/001581 GB9901581W WO9960561A2 WO 1999060561 A2 WO1999060561 A2 WO 1999060561A2 GB 9901581 W GB9901581 W GB 9901581W WO 9960561 A2 WO9960561 A2 WO 9960561A2
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- pitch
- frame
- value
- frequency
- voicing
- Prior art date
Links
- 230000003595 spectral effect Effects 0.000 claims abstract description 41
- 230000015572 biosynthetic process Effects 0.000 claims abstract description 21
- 238000003786 synthesis reaction Methods 0.000 claims abstract description 21
- 238000004458 analytical method Methods 0.000 claims abstract description 19
- 238000001228 spectrum Methods 0.000 claims description 77
- 239000013598 vector Substances 0.000 claims description 44
- 238000000034 method Methods 0.000 claims description 34
- 230000005284 excitation Effects 0.000 claims description 30
- 230000008569 process Effects 0.000 claims description 17
- 238000012545 processing Methods 0.000 claims description 14
- 238000005070 sampling Methods 0.000 claims description 12
- 238000011156 evaluation Methods 0.000 claims description 7
- 230000004044 response Effects 0.000 claims description 6
- 238000001914 filtration Methods 0.000 claims description 5
- 230000001419 dependent effect Effects 0.000 claims description 4
- 238000009499 grossing Methods 0.000 claims description 4
- 230000000717 retained effect Effects 0.000 claims description 2
- 230000003247 decreasing effect Effects 0.000 claims 1
- 230000000875 corresponding effect Effects 0.000 description 32
- 230000000694 effects Effects 0.000 description 18
- 238000005314 correlation function Methods 0.000 description 5
- 238000001514 detection method Methods 0.000 description 3
- 230000004048 modification Effects 0.000 description 3
- 238000012986 modification Methods 0.000 description 3
- 230000002441 reversible effect Effects 0.000 description 3
- 238000010586 diagram Methods 0.000 description 2
- 230000008030 elimination Effects 0.000 description 2
- 238000003379 elimination reaction Methods 0.000 description 2
- 238000010606 normalization Methods 0.000 description 2
- 230000000737 periodic effect Effects 0.000 description 2
- 238000012805 post-processing Methods 0.000 description 2
- 238000000926 separation method Methods 0.000 description 2
- 238000012935 Averaging Methods 0.000 description 1
- 230000003044 adaptive effect Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 230000002596 correlated effect Effects 0.000 description 1
- 238000013213 extrapolation Methods 0.000 description 1
- 230000000750 progressive effect Effects 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 238000012546 transfer Methods 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
- 230000001131 transforming effect Effects 0.000 description 1
- 230000007704 transition Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/93—Discriminating between voiced and unvoiced parts of speech signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/90—Pitch determination of speech signals
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/08—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
- G10L19/10—Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a multipulse excitation
Definitions
- This invention relates to speech coders.
- a speech coder including
- the encoder including: linear predictive
- LPC coding
- pitch determination means for determining at
- the pitch determination means including first
- estimation means for analysing samples using a frequency domain technique
- voicing means for defining a measure of voiced and unvoiced signals in each
- amplitude determination means for generating amplitude information for each
- said first estimation means generates a first measure of pitch for each of a number of candidate
- the second estimation means generates a respective second measure of
- the encoder comprising
- LPC predictive coding
- voicing means for defining a
- pitch estimation means for determining an estimate of the value of pitch
- pitch refinement means for deriving the value of pitch from the estimate, the pitch
- refinement means defining a set of candidate pitch values including fractional values
- the encoder comprising
- LPC predictive coding
- quantisation means for quantising said set of coefficients, said value of pitch, said
- a speech coder including an encoder for encoding an input speech signal, the encoder comprising,
- LPC predictive coding
- voicing means for defining a
- the quantisation means quantises the normalised spectral
- the encoder comprising
- LSF Line Spectral Frequency
- voicing means for defining a measure of voiced and unvoiced signals in each
- amplitude determination means for generating amplitude information for each
- LSF'3 and LSF'1 are respectively sets of quantised LSF coefficients for the
- X is a vector in a first vector quantisation codebook, defines each said set
- Figure 1 is a generalised representation of a speech coder
- Figure 2 is a block diagram showing the encoder of a speech coder according to the
- Figure 3 shows a waveform of an analogue input speech signal
- Figure 4 is a block diagram showing a pitch detection algorithm used in the encoder of Figure 2;
- Figure 5 illustrates the determination of voicing cut-off frequency
- Figure 6(a) shows an LPC Spectrum for a frame
- Figure 6(b) shows spectral amplitudes derived from the LPC spectrum of Figure 6(a);
- Figure 6(c) shows a quantisation vector derived from the spectral amplitudes of
- Figure 7 shows the decoder of the speech coder
- Figure 8 illustrates an energy-dependent interpolation factor for the LSF coefficients
- Figure 9 illustrates a perceptually-enhanced LPC spectrum used to weight the
- Figure 1 is a generalised representation of a speech coder, comprising an encoder 1
- an analogue input speech signal S,(t) is received at the
- sampled speech signal is then divided into frames and each frame is encoded to
- decoder 2 processes the received quantisation indices to synthesize an analogue output
- the speech channel requires an encoder
- duplex link, or the same channel in the case of a simplex link.
- Figure 2 shows the encoder of one embodiment of a speech coder according to the
- SB-LPC Split-Band LPC
- the speech coder uses an Analysis and Synthesis scheme.
- the described speech coder is designed to operate at a bit rate of 2.4kb/s; however,
- quantisation indices are updated.
- the analogue input speech signal is low pass filtered to remove frequencies
- the low pass filtered signal is then sampled at a
- the effect of the high-pass filter 10 is to remove any DC level that might be present.
- the preconditioned digital signal is then passed through a Hamming window 11
- each frame is 160
- the LPC filter 12 attempts to establish a linear relationship
- LPC(0),LPC(1) ... LPC(9) are then transformed to generate
- LSF Line Spectral Frequency
- the LSF coefficients are then passed to a vector quantiser 14 where they undergo a
- coefficients facilitate frame-to-frame interpolation, a process needed in the decoder.
- the vector quantisation process takes account of the relative frequencies of the LSF
- the LSF coefficients are quantised
- LSF(6),LSF(7),LSF(8),LSF(9) form a third group G3 which is also quantised using 8
- the vector quantisation process is carried out using a codebook containing 2^8 entries, numbered 1 to 256, the r-th entry in the codebook consisting of a vector Yr of three elements Vr(0), Vr(1), Vr(2) corresponding to the coefficients LSF(0), LSF(1), LSF(2) respectively.
- the aim of the quantisation process is to select a vector Yr which best matches the actual LSF coefficients.
- W(i) is a weighting factor
- the entry giving the minimum summation defines the 8-bit quantisation index for the LSF coefficients in group G1.
- the effect of the weighting factor is to emphasise the importance in the above summations of the more significant peaks for which the LSF coefficients are relatively close.
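The weighted vector-quantisation search described above can be sketched as follows; the toy codebook, weights and LSF values are illustrative (a real codebook holds 256 entries):

```python
def lsf_vq_search(lsf_group, codebook, weights):
    """Return the codebook index r minimising the weighted error
    err_r = sum_i W(i) * (LSF(i) - Vr(i))**2."""
    best_r, best_err = 0, float("inf")
    for r, entry in enumerate(codebook):
        err = sum(w * (l - v) ** 2
                  for w, l, v in zip(weights, lsf_group, entry))
        if err < best_err:
            best_r, best_err = r, err
    return best_r

# Toy 4-entry codebook of 3-element vectors (hypothetical values).
codebook = [[0.10, 0.20, 0.30],
            [0.20, 0.40, 0.60],
            [0.30, 0.50, 0.70],
            [0.05, 0.15, 0.25]]
weights = [1.0, 2.0, 1.0]   # emphasise the closely spaced coefficients
lsf = [0.21, 0.41, 0.58]
best = lsf_vq_search(lsf, codebook, weights)
print(best)  # entry 1 is nearest under this weighting
```

The same search is run once per LSF group, each returning one 8-bit index.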
- the RMS energy E0 of the 160 samples in the current frame n is calculated in background signal estimation block 15 and this value is used to update the value of a background energy estimate EBGn according to the following criteria:
- EBGn-1 is the background energy estimate for the immediately preceding frame; if EBGn is less than 1, then EBGn is set at 1
- EBGn and E0 are then used to update the values of NRGS and NRGB, which represent the expected values of the RMS energy of the speech and background components respectively of the input signal, according to the following criteria:
- if NRGB < 0.05 then NRGB is set at 0.05
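A minimal sketch of this frame-energy tracking. The first-order smoothing form and the constant alpha are assumptions; only the floors (EBG at 1, NRGB at 0.05) and the speech test E0 > 1.5 EBG are taken from the description:

```python
import math

def rms_energy(samples):
    """RMS energy E0 of one 160-sample frame."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def update_estimates(e0, e_bg, nrgb, nrgs, alpha=0.95):
    """One update step for the background (E_BG, NRGB) and speech (NRGS)
    energy trackers; the smoothing rule is an assumed form."""
    if e0 < e_bg:
        e_bg = e0                                # quiet frame: drop quickly
    else:
        e_bg = alpha * e_bg + (1 - alpha) * e0   # otherwise rise slowly
    e_bg = max(e_bg, 1.0)                        # E_BG is floored at 1
    if e0 > 1.5 * e_bg:                          # frame judged to contain speech
        nrgs = alpha * nrgs + (1 - alpha) * e0
    else:
        nrgb = alpha * nrgb + (1 - alpha) * e0
    nrgb = max(nrgb, 0.05)                       # NRGB is floored at 0.05
    return e_bg, nrgb, nrgs

e_bg, nrgb, nrgs = update_estimates(e0=100.0, e_bg=10.0, nrgb=5.0, nrgs=80.0)
print(e_bg, nrgb, nrgs)
```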
- Figure 3 depicts the waveform of an analogue input speech
- the waveform exhibits relatively large amplitude pitch pulses Pu which are an important
- the pitch or pitch period P for the frame is defined
- pitch period P is inversely related to the fundamental pitch frequency ω0, where ω0
- the fundamental pitch frequency ω0 will, of course, be accompanied
- pitch period P is an important characteristic of the speech signal
- P is central to the determination of other quantisation indices produced by the encoder.
- DFT block 17 uses a 512-point fast Fourier transform (FFT) algorithm.
- Samples are supplied to the DFT block 17 via a 221 point Kaiser window 18 centred
- M max is calculated in block 402, as
- M(i) are preprocessed in blocks 404 to 407.
- a bias is applied in order to de-emphasise the main peaks in the
- each magnitude is weighted by the factor 1 -
- a threshold is then applied to the weighted magnitudes in block 405: each magnitude M(i) is compared with a threshold value, typically in the range from 5 to 20, and where M(i) exceeds the threshold, M(i) is set at the threshold value.
- the resultant magnitudes M'(i) are then analysed in block 406 to detect peaks.
- a smoothing algorithm is then applied to the magnitudes M'(i) in block 407 to generate
- a variable x is initialised at zero and is
- the process is effective to eliminate relatively small peaks residing next
- amp pk is less than a factor c times the magnitude a(i) at the same frequency.
- c is set at 0.5.
- K(ω0) is the number of harmonics below the cut-off frequency
- this expression can be thought of as the cross-correlation function between the frequency response of a comb filter defined by the harmonic amplitudes a(kω0) of
- D(freqpk(l) - kω0) is a distance measure related to the frequency separation between
- pitch candidate is the actual pitch value. Moreover, if the pitch candidate is twice the
- a pitch value which is half the actual pitch value i.e. a pitch halving
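The comb-filter cross-correlation idea can be sketched as a score that rewards spectral peaks lying close to harmonics of a candidate fundamental. The triangular distance weighting D() below is an assumed form, and the example also shows why halving and doubling ambiguities arise:

```python
def pitch_candidate_score(peaks, w0, n_harmonics=10):
    """Score a candidate fundamental w0 (Hz) by rewarding spectral peaks
    lying close to its harmonics k*w0; D() here is a triangular weight,
    an assumed form, falling to zero half-way between harmonics."""
    score = 0.0
    for freq, amp in peaks:
        k = max(1, round(freq / w0))           # nearest harmonic index
        if k > n_harmonics:
            continue
        d = abs(freq - k * w0)                 # distance to that harmonic
        score += amp * max(0.0, 1.0 - d / (w0 / 2))
    return score

# Peaks at 100/200/300 Hz: the 100 Hz candidate wins over 70 Hz, but a
# 50 Hz candidate (pitch doubling in period) would score just as well,
# which is why the extra doubling/halving checks are needed.
peaks = [(100.0, 1.0), (200.0, 0.8), (300.0, 0.6)]
s100 = pitch_candidate_score(peaks, 100.0)
s70 = pitch_candidate_score(peaks, 70.0)
print(s100 > s70)  # True
```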
- the second estimate is evaluated using a time-domain analysis technique by forming
- N is the sample number
- the input samples for the current frame may be autocorrelated in block 412
- V1 and V2 exceeds a preset threshold value (typically about 1.1), then the confidence is high that the values L1, L2 are close to the correct pitch value. If so, the
- the values of Metl and Met2 are further weighted in block 413 according to a tracked
- the current frame contains speech, i.e. if E0 > 1.5 EBGn, the
- b4 is set at 1.56 and b5 is set at 0.72. If it is determined
- NRGB is reduced: if the ratio is below 0.5, b4 is set at 1.1 and b5 is set at 0.9, and for a ratio below 0.3, b4 is set at
- a preset factor e.g. 2.0
- a constant e.g. 0.1
- P 0 is confirmed in block 416 as the estimated pitch value for the frame.
- the pitch algorithm described in detail with reference to Figure 4 is extremely robust
- the pitch value P 0 is estimated to an accuracy within 0.5 samples or 1 sample
- a second discrete Fourier transform is performed in DFT block 20
- the window should still be at least three
- the input samples are supplied to DFT block 20 via a
- variable length window 21 which is sensitive to the pitch value P 0 detected in pitch
- the pitch refinement block 19 generates a new set of candidate pitch values containing fractional values distributed to either side of the estimated pitch value P 0 .
- the new values of Metl are computed in pitch refinement block 19 using substantially
- Equation 1 a first (low frequency) part
- the estimated pitch value P 0 was based on an analysis of the low
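The refinement step can be sketched as a search over a fine grid of fractional candidates around the integer estimate P0, keeping the candidate with the best analysis score; the grid span and step below are illustrative choices, not the patent's exact grid:

```python
def refine_pitch(p0, score_fn, span=2.0, step=0.25):
    """Search fractional pitch candidates around the integer estimate P0
    and return the one maximising the analysis score."""
    best_p, best_s = p0, score_fn(p0)
    p = p0 - span
    while p <= p0 + span + 1e-9:
        s = score_fn(p)
        if s > best_s:
            best_p, best_s = p, s
        p += step
    return best_p

# Toy score peaking at a "true" pitch of 57.5 samples.
score = lambda p: -(p - 57.5) ** 2
p_ref = refine_pitch(58, score)
print(p_ref)  # 57.5
```

In the coder itself the score would be the Metl measure recomputed from the spectrum, not a toy function.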
- the refined pitch value P ref generated in block 19 is passed to vector quantiser 22
- the pitch quantisation index R is defined by seven bits
- the quantised pitch levels Lp(i) are defined as
- frequencies may be contained within the 4kHz bandwidth of the DFT block 20.
- the frequency spectrum derived from DFT block 20 is analysed in a voicing block 23 to set a voicing cut-off
- frequency Fc, below which the speech is treated as periodic (voiced)
- Each harmonic band is centred on a multiple k of a fundamental frequency ω0, given
- variable length window 21 This is done by generating a correlation function S, for
- M(a) is the complex value of the spectrum at position a in the FFT
- a k and b k are the limits of the summation for the band
- W(m) is the corresponding magnitude of the ideal harmonic shape for the
- SF is the size of the FFT and Sbt is an up-sampling ratio, i.e. the ratio of the
- V(k) is further biased by raising it to the power of l3(k-10)
- the function V(k) is compared with a corresponding threshold function THRES(k) at each value of k.
- THRES(k) The form of a typical threshold function THRES(k) is also shown in Figure 5.
- ZC is set to zero, and for each i between -N/2 and N/2
- ZC = ZC + 1 if ip[i] x ip[i-1] < 0,
- ip is the input speech, referenced so that ip[0] corresponds to the input sample lying at the centre of the window used to obtain the spectrum for the current frame.
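The ZC counter above is a plain sign-change count; a sketch using ordinary 0-based indexing rather than the centred -N/2..N/2 indexing of the text:

```python
def zero_crossings(ip):
    """ZC counter from the description: ZC is incremented whenever
    ip[i] * ip[i-1] < 0, i.e. on every sign change."""
    zc = 0
    for i in range(1, len(ip)):
        if ip[i] * ip[i - 1] < 0:
            zc += 1
    return zc

print(zero_crossings([1.0, -1.0, 2.0, -2.0, 3.0]))  # 4 sign changes
```

A high count indicates noise-like (unvoiced) content, a low count periodic (voiced) content.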
- residual(i) is an LPC residual signal generated at the output of an LPC inverse
- L1', L2' are calculated as for L1, L2 respectively, but excluding a predetermined
- PKY1 and PKY2 are both indications of
- LH- Ratio 1.0, and LH-Ratio is clamped between 0.02 and 1.0.
- THRES(k) = 1.0 - (1.0 - THRES(k)) (LH-Ratio x 5)
- THRES(k) = 1.0 - (1.0 - THRES(k)) (2.5 ER), and
- the threshold values are further modified as follows:
- THRES(k) = 1.0 - (1.0 - THRES(k))
- THRES(k) = 0.85 + 1/2 (THRES(k) - 0.85).
- THRES(k) = 1.0 - 1/2 (1.0 - THRES(k)).
- THRES(k) = 1 - (1 - THRES(k)) (—)^2
- the input speech is low-pass filtered and the normalised cross-correlation is then computed for integer lag values P ref -3 to P ref +3, and the maximum value of the cross-
- THRES(k) = 0.5 THRES(k).
- THRES(k) = 0.55 THRES(k).
- THRES(k) = 0.75 THRES(k).
- THRES(k) = 1 - 0.75 (1 - THRES(k)).
- a summation S is then formed as follows:
- tvoice(k) takes either the value "1" or the value "0".
- the values tvoice(k) define a trial voicing cut-off frequency Fc such that tvoice(k)
- the summation S is formed for each of eight different sets of values
- tvoice(k) has the value "0", i.e. at values of k above the cut-off frequency.
- the effect of the function (2tvoice(k) - 1) is to determine
- the corresponding index (1 to 8) provides the voicing quantisation index which is routed to a third output O3
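Selecting the cut-off by maximising S = sum over k of (2 tvoice(k) - 1)(V(k) - THRES(k)) across the eight trial sets can be sketched as below; the mapping from candidate index to a count of voiced bands is an assumption made for the example:

```python
def choose_voicing_cutoff(v, thres, n_candidates=8):
    """Return the trial index (1..n_candidates) maximising
    S = sum_k (2*t_voice(k) - 1) * (V(k) - THRES(k)),
    where t_voice(k) is 1 below the trial cut-off and 0 above it.
    Candidate c is assumed to voice the first c*len(v)/n_candidates bands."""
    n = len(v)
    best_c, best_s = 1, float("-inf")
    for c in range(1, n_candidates + 1):
        cutoff = round(c * n / n_candidates)
        s = 0.0
        for k in range(n):
            t = 1 if k < cutoff else 0
            s += (2 * t - 1) * (v[k] - thres[k])
        if s > best_s:
            best_c, best_s = c, s
    return best_c

# Strongly voiced low bands, noisy high bands -> a mid-range cut-off.
v =     [0.9, 0.9, 0.8, 0.7, 0.3, 0.2, 0.2, 0.1]
thres = [0.5] * 8
print(choose_voicing_cutoff(v, thres))  # 4
```

Bands where V(k) exceeds the threshold pull the cut-off up; bands below it push the cut-off down.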
- the quantisation index Y is defined by three
- the spectral amplitude of each harmonic band is evaluated in amplitude
- the spectral amplitudes are derived from a frequency
- Filter 28 is supplied with the original
- amp(k) of the band is given by the RMS energy in the band, expressed as
- Mr(a) is the complex value at position a in the frequency spectrum derived from
- the LPC residual signal, calculated as before from the real and imaginary parts of the FFT; ak and bk are the limits of the summation for the kth band, and a normalisation
- the harmonic band lies in the voiced part of the frequency
- amp(k) for the kth band is given by the expression
- W(m) is as defined with reference to Equations 2 and 3 above.
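The per-band RMS amplitude of the residual spectrum can be sketched as follows; normalising by the band width is an assumed choice for the normalisation factor mentioned above:

```python
import math

def band_amplitude(spectrum, a_k, b_k):
    """amp(k) as the RMS of the complex residual spectrum Mr(a) over
    the band limits a_k..b_k (inclusive); width normalisation assumed."""
    acc = sum(abs(spectrum[a]) ** 2 for a in range(a_k, b_k + 1))
    return math.sqrt(acc / (b_k - a_k + 1))

# Four FFT bins standing in for one harmonic band.
spectrum = [complex(1, 0), complex(0, 2), complex(2, 0), complex(0, 1)]
print(band_amplitude(spectrum, 0, 3))
```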
- the normalised spectral amplitudes are then quantised in amplitude quantiser 26. It
- the LPC frequency spectrum P(ω) for the frame.
- the LPC frequency spectrum P(ω) represents the
- the LPC frequency spectrum is examined to find four harmonic bands containing the
- amp(1), amp(2), amp(3), amp(5) form the first four elements V(1), V(2), V(3), V(4) of an
- element V(5) is formed by amp(4), element
- V(6) is formed by the average of amp(6) and amp(7), element V(7) is formed by
- amp(8) and element V(8) is formed by the average of amp(9) and amp(10).
- the vector quantisation process is carried out with reference to the entries in a
- the first part of the amplitude quantisation index, S1, represents the "shape" of the
- the first part of the index, S1, consists of 6 bits (corresponding to a
- codebook containing 64 entries, each representing a different spectral "shape"
- second part of the index S2 consists of 5 bits.
- the two parts S1,S2 are combined to
- each entry may comprise a
- the decoder operates on the indices S, P and Y to
- the encoder generates a set of quantisation indices LSF, P, Y, S1 and S2 for each frame of the input speech signal.
- the encoder bit rate depends upon the number of bits used to define the quantisation
- the update period for each quantisation index is 20ms (the
- bit rate is 2.4kb/s.
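The bit budget follows directly: at 2.4 kb/s with a 20 ms update period each frame carries 48 bits. The split below gathers the index sizes stated in the description; the leftover bits are not itemised in the text shown here:

```python
# Bit budget at 2.4 kb/s with a 20 ms update period.
frame_bits = int(round(2400 * 0.020))    # 48 bits per frame

# Index sizes gathered from the description.
bits = {"LSF (3 groups x 8 bits)": 24, "pitch": 7, "voicing": 3,
        "amplitude shape S1": 6, "amplitude gain S2": 5}
used = sum(bits.values())
print(frame_bits, used, frame_bits - used)  # 48 45 3
```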
- Table 1 also summarises the distribution of bits amongst the quantisation indices in
- index E derived during the first 10ms update period in a frame may be defined by a
- the frame length is 40ms.
- pitch and voicing quantisation indices P, Y are determined for one half of each frame
- indices for another half of the frame are obtained by extrapolation from the respective
- Each prediction value P2, P3 is obtained from the respective LSF quantisation vector
- α is a constant prediction factor, typically in the range from 0.5 to 0.7.
- LSF'2 = α LSF'1 + (1 - α) LSF'3 + X (Eq 4)
- X is a vector of 10 elements in a sixteen-entry codebook represented by a 4-bit
- codebooks consist of three groups each containing 2^8 entries, numbered 1 to 256, which
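Equation 4 can be sketched directly; treating the codebook vector X as an additive correction to the interpolated value is an assumption based on the surrounding description:

```python
def predict_mid_lsf(lsf1, lsf3, x, alpha=0.6):
    """LSF'2 = alpha*LSF'1 + (1 - alpha)*LSF'3 + X, with alpha the
    prediction factor (0.5 to 0.7 per the text) and X the codebook
    vector (10 elements in the coder, 3 here for brevity)."""
    return [alpha * a + (1 - alpha) * b + xi
            for a, b, xi in zip(lsf1, lsf3, x)]

mid = predict_mid_lsf([0.10, 0.20, 0.30], [0.20, 0.30, 0.40],
                      [0.01, -0.01, 0.00])
print(mid)
```

Only the 4-bit index of X needs transmitting for the middle sub-frame; the endpoints LSF'1 and LSF'3 are already available at the decoder.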
- the speech coder described with reference to Figures 3 to 6 may operate at a single
- the speech coder may be an adaptive multi-rate (AMR) coder
- the AMR coder is selectively operable at any one of the
- quantisation indices for each rate is summarised in Table 1.
- the quantisation indices generated at outputs O1, O2, O3 and O4 of the speech encoder are
- at the decoder the quantisation indices are regenerated and are supplied to inputs I1, I2, I3 and I4
- Dequantisation block 30 outputs a set of dequantised LSF coefficients for the frame
- Dequantisation blocks 31,32 and 33 respectively output dequantised values of pitch
- the first excitation generator 35 generates a respective sinusoid at the frequency of
- each harmonic band that is at integer multiples of the fundamental pitch frequency
- the first excitation generator 35 generates a set of sinusoids of the form Ak cos(kφ), where k is an integer.
- Pref is the dequantised pitch value
- the phase φ(i) at any sample i is given by the expression
- φ(i) = φ(i-1) + 2π [ωstart (1-x) + ω0 x],
- the amplitude of the current frame is used, but scaled up by a
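Voiced synthesis as a sum of harmonically related sinusoids, with the fundamental linearly interpolated across the frame and the phase accumulated sample by sample, can be sketched as follows (frequencies here are normalised, in cycles per sample):

```python
import math

def voiced_excitation(amps, w_start, w_end, n_samples):
    """Sum of sinusoids Ak*cos(k*phi(i)), one per harmonic band, with the
    fundamental interpolated from the previous frame's w_start to this
    frame's w_end and the phase accumulated sample by sample."""
    out = []
    phi = 0.0
    for i in range(n_samples):
        x = i / n_samples                    # interpolation position 0..1
        w = w_start * (1 - x) + w_end * x    # interpolated fundamental
        phi += 2 * math.pi * w               # phi(i) = phi(i-1) + 2*pi*w
        out.append(sum(a * math.cos(k * phi)
                       for k, a in enumerate(amps, start=1)))
    return out

# 100 Hz gliding to 110 Hz at 8 kHz sampling, two harmonics.
ex = voiced_excitation([1.0, 0.5], 100 / 8000, 110 / 8000, 160)
print(len(ex))  # 160 samples, one 20 ms frame
```

The gradual frequency interpolation avoids phase discontinuities at frame boundaries.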
- voiced part synthesis can be implemented by an inverse DFT method
- the second excitation generator 36 used to synthesise the unvoiced part of the
- the windowed samples are subjected to a 256-point fast Fourier transform and the
- resultant frequency spectrum is shaped by the dequantised spectral amplitudes.
- each harmonic band, k, in the frequency spectrum is shaped
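A sketch of the unvoiced path: white noise is transformed to the frequency domain, each band's bins are scaled so the band carries the dequantised spectral amplitude, and the result is transformed back. A naive DFT stands in for the 256-point FFT, the band edges and scaling rule are illustrative, and only positive-frequency bins are scaled, so taking the real part at the end is a simplification:

```python
import cmath, math, random

def dft(x, inverse=False):
    """Naive O(n^2) DFT; a real coder would use a 256-point FFT."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * f * t / n)
               for t in range(n)) for f in range(n)]
    return [v / n for v in out] if inverse else out

def unvoiced_excitation(band_amps, band_edges, n=32, seed=0):
    """Shape white noise in the frequency domain so each band carries
    the dequantised spectral amplitude (illustrative scaling rule)."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    spec = dft(noise)
    for amp, (a, b) in zip(band_amps, band_edges):
        rms = math.sqrt(sum(abs(spec[i]) ** 2 for i in range(a, b)) / (b - a))
        gain = amp / rms if rms > 0 else 0.0
        for i in range(a, b):
            spec[i] *= gain
    # Taking the real part of the inverse transform is a simplification.
    return [v.real for v in dft(spec, inverse=True)]

ex = unvoiced_excitation([1.0, 0.2], [(1, 8), (8, 16)])
print(len(ex))  # 32
```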
- the LPC synthesis filter 34 receives interpolated LPC coefficients
- the RMS energy E c in the current frame is greater than
- Figure 8 shows the variation of the interpolation factor across the frame for different energy ratios ranging from 0.125 (speech onset) to 8.0 (speech tail-off). It can be seen
- the k th spectral amplitude is derived from the LPC spectrum P( ⁇ ) described earlier.
- LPC spectrum P(ω) is peak-interpolated to generate a peak-interpolated spectrum
- the interpolation constant is in the range from 0.0 to 1.0 and is preferably 0.35.
- synthesis filter 44 which synthesises the smoothed output speech signal.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
- Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)
- Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)
Abstract
Description
Claims
Priority Applications (8)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
BR9906454-5A BR9906454A (en) | 1998-05-21 | 1999-05-18 | Speech encoders. |
AU39454/99A AU761131B2 (en) | 1998-05-21 | 1999-05-18 | Split band linear prediction vocodor |
KR1020007000661A KR20010022092A (en) | 1998-05-21 | 1999-05-18 | Split band linear prediction vocodor |
CA002294308A CA2294308A1 (en) | 1998-05-21 | 1999-05-18 | Split band linear prediction vocodor |
US09/446,646 US6526376B1 (en) | 1998-05-21 | 1999-05-18 | Split band linear prediction vocoder with pitch extraction |
EP99922353A EP0996949A2 (en) | 1998-05-21 | 1999-05-18 | Split band linear prediction vocoder |
JP2000550096A JP2002516420A (en) | 1998-05-21 | 1999-05-18 | Voice coder |
IL13412299A IL134122A0 (en) | 1998-05-21 | 1999-05-18 | Speech coders |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GBGB9811019.0A GB9811019D0 (en) | 1998-05-21 | 1998-05-21 | Speech coders |
GB9811019.0 | 1998-05-21 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO1999060561A2 true WO1999060561A2 (en) | 1999-11-25 |
WO1999060561A3 WO1999060561A3 (en) | 2000-03-09 |
Family
ID=10832524
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/GB1999/001581 WO1999060561A2 (en) | 1998-05-21 | 1999-05-18 | Split band linear prediction vocoder |
Country Status (11)
Country | Link |
---|---|
US (1) | US6526376B1 (en) |
EP (1) | EP0996949A2 (en) |
JP (1) | JP2002516420A (en) |
KR (1) | KR20010022092A (en) |
CN (1) | CN1274456A (en) |
AU (1) | AU761131B2 (en) |
BR (1) | BR9906454A (en) |
CA (1) | CA2294308A1 (en) |
GB (1) | GB9811019D0 (en) |
IL (1) | IL134122A0 (en) |
WO (1) | WO1999060561A2 (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1122717A1 (en) * | 2000-02-03 | 2001-08-08 | Alcatel | Coding method and apparatus for restoring speech signals packet-switched |
WO2004075571A3 (en) * | 2003-02-24 | 2005-01-06 | Ibm | Pitch estimation using low-frequency band noise detection |
EP1620844A2 (en) * | 2003-03-31 | 2006-02-01 | Motorola, Inc. | System and method for combined frequency-domain and time-domain pitch extraction for speech signals |
US9015038B2 (en) | 2010-10-25 | 2015-04-21 | Voiceage Corporation | Coding generic audio signals at low bitrates and low delay |
Families Citing this family (56)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6377919B1 (en) * | 1996-02-06 | 2002-04-23 | The Regents Of The University Of California | System and method for characterizing voiced excitations of speech and acoustic signals, removing acoustic noise from speech, and synthesizing speech |
US7092881B1 (en) * | 1999-07-26 | 2006-08-15 | Lucent Technologies Inc. | Parametric speech codec for representing synthetic speech in the presence of background noise |
JP3558031B2 (en) * | 2000-11-06 | 2004-08-25 | 日本電気株式会社 | Speech decoding device |
US7016833B2 (en) * | 2000-11-21 | 2006-03-21 | The Regents Of The University Of California | Speaker verification system using acoustic data and non-acoustic data |
DE60029147T2 (en) * | 2000-12-29 | 2007-05-31 | Nokia Corp. | QUALITY IMPROVEMENT OF AUDIO SIGNAL IN A DIGITAL NETWORK |
GB2375028B (en) * | 2001-04-24 | 2003-05-28 | Motorola Inc | Processing speech signals |
FI119955B (en) * | 2001-06-21 | 2009-05-15 | Nokia Corp | Method, encoder and apparatus for speech coding in an analysis-through-synthesis speech encoder |
KR100347188B1 (en) * | 2001-08-08 | 2002-08-03 | Amusetec | Method and apparatus for judging pitch according to frequency analysis |
US20030048129A1 (en) * | 2001-09-07 | 2003-03-13 | Arthur Sheiman | Time varying filter with zero and/or pole migration |
DE60307252T2 (en) * | 2002-04-11 | 2007-07-19 | Matsushita Electric Industrial Co., Ltd., Kadoma | DEVICES, METHODS AND PROGRAMS FOR CODING AND DECODING |
US6915256B2 (en) * | 2003-02-07 | 2005-07-05 | Motorola, Inc. | Pitch quantization for distributed speech recognition |
US6961696B2 (en) * | 2003-02-07 | 2005-11-01 | Motorola, Inc. | Class quantization for distributed speech recognition |
WO2004084182A1 (en) * | 2003-03-15 | 2004-09-30 | Mindspeed Technologies, Inc. | Decomposition of voiced speech for celp speech coding |
GB2400003B (en) * | 2003-03-22 | 2005-03-09 | Motorola Inc | Pitch estimation within a speech signal |
US7117147B2 (en) * | 2004-07-28 | 2006-10-03 | Motorola, Inc. | Method and system for improving voice quality of a vocoder |
CN1779779B (en) * | 2004-11-24 | 2010-05-26 | 摩托罗拉公司 | Method and apparatus for providing phonetical databank |
EP1872364B1 (en) * | 2005-03-30 | 2010-11-24 | Nokia Corporation | Source coding and/or decoding |
KR100735343B1 (en) * | 2006-04-11 | 2007-07-04 | 삼성전자주식회사 | Apparatus and method for extracting pitch information of a speech signal |
KR100900438B1 (en) * | 2006-04-25 | 2009-06-01 | 삼성전자주식회사 | Apparatus and method for voice packet recovery |
JP4946293B2 (en) * | 2006-09-13 | 2012-06-06 | 富士通株式会社 | Speech enhancement device, speech enhancement program, and speech enhancement method |
CN1971707B (en) * | 2006-12-13 | 2010-09-29 | 北京中星微电子有限公司 | Method and apparatus for estimating fundamental tone period and adjudging unvoiced/voiced classification |
US8036886B2 (en) | 2006-12-22 | 2011-10-11 | Digital Voice Systems, Inc. | Estimation of pulsed speech model parameters |
EP3629328A1 (en) * | 2007-03-05 | 2020-04-01 | Telefonaktiebolaget LM Ericsson (publ) | Method and arrangement for smoothing of stationary background noise |
JP5355387B2 (en) * | 2007-03-30 | 2013-11-27 | パナソニック株式会社 | Encoding apparatus and encoding method |
US8326617B2 (en) * | 2007-10-24 | 2012-12-04 | Qnx Software Systems Limited | Speech enhancement with minimum gating |
US8260220B2 (en) * | 2009-09-28 | 2012-09-04 | Broadcom Corporation | Communication device with reduced noise speech coding |
FR2961938B1 (en) * | 2010-06-25 | 2013-03-01 | Inst Nat Rech Inf Automat | IMPROVED AUDIO DIGITAL SYNTHESIZER |
US8862465B2 (en) | 2010-09-17 | 2014-10-14 | Qualcomm Incorporated | Determining pitch cycle energy and scaling an excitation signal |
US20140365212A1 (en) * | 2010-11-20 | 2014-12-11 | Alon Konchitsky | Receiver Intelligibility Enhancement System |
US8818806B2 (en) * | 2010-11-30 | 2014-08-26 | JVC Kenwood Corporation | Speech processing apparatus and speech processing method |
PL2676268T3 (en) * | 2011-02-14 | 2015-05-29 | Fraunhofer Ges Forschung | Apparatus and method for processing a decoded audio signal in a spectral domain |
BR112013020324B8 (en) | 2011-02-14 | 2022-02-08 | Fraunhofer Ges Forschung | Apparatus and method for error suppression in low delay unified speech and audio coding |
PT2676270T (en) | 2011-02-14 | 2017-05-02 | Fraunhofer Ges Forschung | Coding a portion of an audio signal using a transient detection and a quality result |
JP5969513B2 (en) | 2011-02-14 | 2016-08-17 | フラウンホーファー−ゲゼルシャフト・ツール・フェルデルング・デル・アンゲヴァンテン・フォルシュング・アインゲトラーゲネル・フェライン | Audio codec using noise synthesis between inert phases |
MY160265A (en) | 2011-02-14 | 2017-02-28 | Fraunhofer-Gesellschaft Zur Forderung Der Angewandten Forschung E V | Apparatus and Method for Encoding and Decoding an Audio Signal Using an Aligned Look-Ahead Portion |
TWI488176B (en) | 2011-02-14 | 2015-06-11 | Fraunhofer Ges Forschung | Encoding and decoding of pulse positions of tracks of an audio signal |
KR101424372B1 (en) | 2011-02-14 | 2014-08-01 | 프라운호퍼 게젤샤프트 쭈르 푀르데룽 데어 안겐반텐 포르슝 에. 베. | Information signal representation using lapped transform |
AR085794A1 (en) | 2011-02-14 | 2013-10-30 | Fraunhofer Ges Forschung | LINEAR PREDICTION BASED ON CODING SCHEME USING SPECTRAL DOMAIN NOISE CONFORMATION |
PT3239978T (en) | 2011-02-14 | 2019-04-02 | Fraunhofer Ges Forschung | Encoding and decoding of pulse positions of tracks of an audio signal |
US9142220B2 (en) | 2011-03-25 | 2015-09-22 | The Intellisis Corporation | Systems and methods for reconstructing an audio signal from transformed audio information |
US8548803B2 (en) | 2011-08-08 | 2013-10-01 | The Intellisis Corporation | System and method of processing a sound signal including transforming the sound signal into a frequency-chirp domain |
US8620646B2 (en) * | 2011-08-08 | 2013-12-31 | The Intellisis Corporation | System and method for tracking sound pitch across an audio signal using harmonic envelope |
US9183850B2 (en) | 2011-08-08 | 2015-11-10 | The Intellisis Corporation | System and method for tracking sound pitch across an audio signal |
JP6010539B2 (en) * | 2011-09-09 | 2016-10-19 | パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカPanasonic Intellectual Property Corporation of America | Encoding device, decoding device, encoding method, and decoding method |
ES2689072T3 (en) * | 2012-05-23 | 2018-11-08 | Nippon Telegraph And Telephone Corporation | Encoding an audio signal |
RU2612589C2 (en) | 2013-01-29 | 2017-03-09 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Frequency emphasis for LPC-based coding in the frequency domain |
US9208775B2 (en) * | 2013-02-21 | 2015-12-08 | Qualcomm Incorporated | Systems and methods for determining pitch pulse period signal boundaries |
US9959886B2 (en) * | 2013-12-06 | 2018-05-01 | Malaspina Labs (Barbados), Inc. | Spectral comb voice activity detection |
US9922668B2 (en) | 2015-02-06 | 2018-03-20 | Knuedge Incorporated | Estimating fractional chirp rate with multiple frequency representations |
US9842611B2 (en) | 2015-02-06 | 2017-12-12 | Knuedge Incorporated | Estimating pitch using peak-to-peak distances |
EP3306609A1 (en) * | 2016-10-04 | 2018-04-11 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus and method for determining a pitch information |
JP6891736B2 (en) | 2017-08-29 | 2021-06-18 | 富士通株式会社 | Speech processing program, speech processing method and speech processor |
CN108281150B (en) * | 2018-01-29 | 2020-11-17 | 上海泰亿格康复医疗科技股份有限公司 | Voice tone-changing voice-changing method based on differential glottal wave model |
TWI684912B (en) * | 2019-01-08 | 2020-02-11 | 瑞昱半導體股份有限公司 | Voice wake-up apparatus and method thereof |
US11270714B2 (en) | 2020-01-08 | 2022-03-08 | Digital Voice Systems, Inc. | Speech coding using time-varying interpolation |
US11990144B2 (en) | 2021-07-28 | 2024-05-21 | Digital Voice Systems, Inc. | Reducing perceived effects of non-voice data in digital speech |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4791671A (en) * | 1984-02-22 | 1988-12-13 | U.S. Philips Corporation | System for analyzing human speech |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4731846A (en) * | 1983-04-13 | 1988-03-15 | Texas Instruments Incorporated | Voice messaging system with pitch tracking based on adaptively filtered LPC residual signal |
US5081681B1 (en) | 1989-11-30 | 1995-08-15 | Digital Voice Systems Inc | Method and apparatus for phase synthesis for speech processing |
US5226108A (en) | 1990-09-20 | 1993-07-06 | Digital Voice Systems, Inc. | Processing a speech signal with estimated pitch |
US5216747A (en) | 1990-09-20 | 1993-06-01 | Digital Voice Systems, Inc. | Voiced/unvoiced estimation of an acoustic signal |
JP3840684B2 (en) * | 1996-02-01 | 2006-11-01 | ソニー株式会社 | Pitch extraction apparatus and pitch extraction method |
- 1998
  - 1998-05-21 GB GBGB9811019.0A patent/GB9811019D0/en not_active Ceased
- 1999
  - 1999-05-18 EP EP99922353A patent/EP0996949A2/en not_active Withdrawn
  - 1999-05-18 WO PCT/GB1999/001581 patent/WO1999060561A2/en not_active Application Discontinuation
  - 1999-05-18 CN CN99801185A patent/CN1274456A/en active Pending
  - 1999-05-18 AU AU39454/99A patent/AU761131B2/en not_active Ceased
  - 1999-05-18 CA CA002294308A patent/CA2294308A1/en not_active Abandoned
  - 1999-05-18 KR KR1020007000661A patent/KR20010022092A/en not_active Application Discontinuation
  - 1999-05-18 US US09/446,646 patent/US6526376B1/en not_active Expired - Fee Related
  - 1999-05-18 BR BR9906454-5A patent/BR9906454A/en not_active IP Right Cessation
  - 1999-05-18 IL IL13412299A patent/IL134122A0/en unknown
  - 1999-05-18 JP JP2000550096A patent/JP2002516420A/en active Pending
Non-Patent Citations (4)
Title |
---|
ATKINSON, I. et al.: "High quality split band LPC vocoder operating at low bit rates", 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Munich, 21-24 April 1997, vol. 2, pages 1559-1562, XP002072023, IEEE, ISBN: 0-8186-7920-4 * |
BOYANOV, B. et al.: "Robust hybrid pitch detector", Electronics Letters, vol. 29, no. 22, 28 October 1993, pages 1924-1926, XP000407587, ISSN: 0013-5194 * |
GRIFFIN, D. W. et al.: "A new model-based speech analysis/synthesis system", International Conference on Acoustics, Speech and Signal Processing (ICASSP), Tampa, Florida, 26-29 March 1985, vol. 2, no. CONF. 10, pages 513-516, XP002015284, IEEE * |
MCAULAY, R. J. et al.: "Pitch estimation and voicing detection based on a sinusoidal speech model", Speech Processing 1, Albuquerque, 3-6 April 1990, vol. 1, no. CONF. 15, pages 249-252, XP000146452, IEEE * |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1122717A1 (en) * | 2000-02-03 | 2001-08-08 | Alcatel | Coding method and apparatus for restoring packet-switched speech signals |
FR2804813A1 (en) * | 2000-02-03 | 2001-08-10 | Cit Alcatel | Encoding method to facilitate the sound restitution of digital spoken signals transmitted to a subscriber terminal during telephone communication by packet transmission, and equipment using the same |
WO2004075571A3 (en) * | 2003-02-24 | 2005-01-06 | Ibm | Pitch estimation using low-frequency band noise detection |
EP1620844A2 (en) * | 2003-03-31 | 2006-02-01 | Motorola, Inc. | System and method for combined frequency-domain and time-domain pitch extraction for speech signals |
EP1620844A4 (en) * | 2003-03-31 | 2008-10-08 | Motorola Inc | System and method for combined frequency-domain and time-domain pitch extraction for speech signals |
US9015038B2 (en) | 2010-10-25 | 2015-04-21 | Voiceage Corporation | Coding generic audio signals at low bitrates and low delay |
Also Published As
Publication number | Publication date |
---|---|
AU3945499A (en) | 1999-12-06 |
EP0996949A2 (en) | 2000-05-03 |
CA2294308A1 (en) | 1999-11-25 |
BR9906454A (en) | 2000-09-19 |
IL134122A0 (en) | 2001-04-30 |
WO1999060561A3 (en) | 2000-03-09 |
AU761131B2 (en) | 2003-05-29 |
GB9811019D0 (en) | 1998-07-22 |
US6526376B1 (en) | 2003-02-25 |
KR20010022092A (en) | 2001-03-15 |
CN1274456A (en) | 2000-11-22 |
JP2002516420A (en) | 2002-06-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
AU761131B2 (en) | Split band linear prediction vocoder | |
US5226084A (en) | Methods for speech quantization and error correction | |
EP0337636B1 (en) | Harmonic speech coding arrangement | |
US6377916B1 (en) | Multiband harmonic transform coder | |
Supplee et al. | MELP: the new federal standard at 2400 bps | |
EP0336658B1 (en) | Vector quantization in a harmonic speech coding arrangement | |
EP1222659B1 (en) | Lpc-harmonic vocoder with superframe structure | |
CA2167025C (en) | Estimation of excitation parameters | |
US6078880A (en) | Speech coding system and method including voicing cut off frequency analyzer | |
US6098036A (en) | Speech coding system and method including spectral formant enhancer | |
US5930747A (en) | Pitch extraction method and device utilizing autocorrelation of a plurality of frequency bands | |
CA2412449C (en) | Improved speech model and analysis, synthesis, and quantization methods | |
US6138092A (en) | CELP speech synthesizer with epoch-adaptive harmonic generator for pitch harmonics below voicing cutoff frequency | |
US20030074192A1 (en) | Phase excited linear prediction encoder | |
JP2003512654A (en) | Method and apparatus for variable rate coding of speech | |
US5884251A (en) | Voice coding and decoding method and device therefor | |
CA2132006C (en) | Method for generating a spectral noise weighting filter for use in a speech coder | |
EP0899720B1 (en) | Quantization of linear prediction coefficients | |
KR100563016B1 (en) | Variable Bitrate Voice Transmission System | |
KR100220783B1 (en) | Speech quantization and error correction method | |
MXPA00000703A (en) | Split band linear prediction vocoder | |
Grassi et al. | Fast LSP calculation and quantization with application to the CELP FS1016 speech coder | |
Stegmann et al. | CELP coding based on signal classification using the dyadic wavelet transform |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase |
Ref document number: 134122 Country of ref document: IL |
Ref document number: 99801185.1 Country of ref document: CN |
|
AK | Designated states |
Kind code of ref document: A2 Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZA ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A2 Designated state(s): GH GM KE LS MW SD SL SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
WWE | Wipo information: entry into national phase |
Ref document number: IN/PCT/1999/5/CHE Country of ref document: IN |
|
ENP | Entry into the national phase |
Ref document number: 2294308 Country of ref document: CA Kind code of ref document: A |
|
WWE | Wipo information: entry into national phase |
Ref document number: 39454/99 Country of ref document: AU |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
WWE | Wipo information: entry into national phase |
Ref document number: PA/a/2000/000703 Country of ref document: MX |
Ref document number: 1999922353 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 1020007000661 Country of ref document: KR |
|
WWE | Wipo information: entry into national phase |
Ref document number: 09446646 Country of ref document: US |
|
AK | Designated states |
Kind code of ref document: A3 Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG US UZ VN YU ZA ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A3 Designated state(s): GH GM KE LS MW SD SL SZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
|
WWP | Wipo information: published in national office |
Ref document number: 1999922353 Country of ref document: EP |
|
REG | Reference to national code |
Ref country code: DE Ref legal event code: 8642 |
|
WWP | Wipo information: published in national office |
Ref document number: 1020007000661 Country of ref document: KR |
|
WWG | Wipo information: grant in national office |
Ref document number: 39454/99 Country of ref document: AU |
|
WWW | Wipo information: withdrawn in national office |
Ref document number: 1020007000661 Country of ref document: KR |
|
WWW | Wipo information: withdrawn in national office |
Ref document number: 1999922353 Country of ref document: EP |