EP1388846A2 - Method and device for wideband coding of speech signals, suitable for independent control of long-term and short-term distortions

Publication number
EP1388846A2
EP1388846A2 EP03291749A EP03291749A EP1388846A2 EP 1388846 A2 EP1388846 A2 EP 1388846A2 EP 03291749 A EP03291749 A EP 03291749A EP 03291749 A EP03291749 A EP 03291749A EP 1388846 A2 EP1388846 A2 EP 1388846A2
Authority
EP
European Patent Office
Prior art keywords
weighting filter
term
filter
long
speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP03291749A
Other languages
English (en)
French (fr)
Other versions
EP1388846A3 (de)
Inventor
Michael Ansorge
Giuseppina Biundo Lotito
Benito Carnero
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
STMicroelectronics NV
Original Assignee
STMicroelectronics NV
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from EP02015919A (EP1383113A1)
Application filed by STMicroelectronics NV
Priority to EP03291749A (EP1388846A3)
Publication of EP1388846A2
Publication of EP1388846A3
Legal status: Withdrawn

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04: Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08: Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/12: Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders

Definitions

  • The invention relates to wideband (extended-band) speech encoding and decoding, in particular but not exclusively for mobile telephony.
  • In wideband coding, the bandwidth of the speech signal lies between 50 and 7000 Hz.
  • Successive speech sequences, sampled at a predetermined sampling frequency, are processed in a CELP-type coding device, i.e. one using code-excited linear prediction (for example ACELP, "algebraic code-excited linear prediction"). This technique is well known to those skilled in the art and is described in particular in ITU-T Recommendation G.729, version 3/96, entitled "Coding of speech at 8 kbit/s using conjugate-structure algebraic-code-excited linear prediction (CS-ACELP)".
  • The prediction coder CD, of the CELP type, is based on the code-excited linear predictive coding model.
  • The coder operates on speech superframes corresponding, for example, to 20 ms of signal, each comprising 320 samples.
  • The linear prediction parameters, i.e. the coefficients of the linear prediction filter, also called the short-term synthesis filter 1/A(z), are extracted for each speech superframe.
  • Each superframe is subdivided into 5 ms frames of 80 samples.
  • The speech signal is analyzed to extract the parameters of the CELP prediction model, namely, in particular: a long-term digital excitation word v_i extracted from an adaptive codebook (coded directory) DLT, also called the "adaptive long-term dictionary"; an associated long-term gain Ga; a short-term excitation word c_j extracted from a fixed codebook (coded repertoire) DCT, also called the "short-term dictionary"; and an associated short-term gain Gc.
  • These parameters are used in a decoder to recover the excitation and the predictive filter parameters. The speech is then reconstructed by filtering this excitation through a short-term synthesis filter.
  • The short-term dictionary DCT is based on a fixed structure, for example of the stochastic type, or of the algebraic type using an interleaved permutation model of Dirac pulses.
  • The fixed codebook contains innovation excitations, also called algebraic or short-term excitations. Each vector contains a number of non-zero pulses, for example four, each of which can have amplitude +1 or -1 at predetermined positions.
  • The processing means of the coder CD functionally comprise first extraction means MEXT1 intended to extract the long-term excitation word, and second extraction means MEXT2 intended to extract the short-term excitation word. These means are implemented, for example, in software within a processor.
  • These extraction means include a predictive filter FP having a transfer function equal to 1/A(z), as well as a perceptual weighting filter FPP having a transfer function W(z).
  • The perceptual weighting filter is applied to the signal to model the perception of the ear.
  • The extraction means also include means MECM intended to minimize a mean square error.
  • The linear prediction synthesis filter FP models the spectral envelope of the signal. Linear predictive analysis is performed every superframe in order to determine the linear predictive filter coefficients. These are converted to line spectrum pairs (LSP) and quantized by two-stage predictive vector quantization.
  • Each 20 ms speech superframe is divided into four frames of 5 ms, each containing 80 samples.
  • The quantized LSP parameters are transmitted to the decoder once per superframe, while the long-term and short-term parameters are transmitted every frame.
  • The quantized and unquantized coefficients of the linear prediction filter are used for the most recent frame of a superframe, while the other three frames of the same superframe use an interpolation of these coefficients.
  • The open-loop pitch delay is estimated, for example, every two frames on the basis of the perceptually weighted speech signal. The following operations are then repeated for each frame:
  • The long-term target signal X_LT is calculated by filtering the sampled speech signal s(n) through the perceptual weighting filter FPP.
  • The impulse response of the weighted synthesis filter is calculated.
  • A closed-loop pitch analysis, minimizing the mean square error, is then carried out to determine the long-term excitation word v_i and the associated gain Ga from the target signal and the impulse response, by searching around the value of the open-loop pitch delay.
  • The long-term target signal is then updated by subtracting the filtered contribution y of the adaptive codebook DLT, and this new short-term target signal X_ST is used when searching the fixed codebook DCT in order to determine the short-term excitation word c_j and the associated gain Gc.
  • The CELP algorithm depends strongly on the richness of the short-term excitation dictionary DCT, for example an algebraic excitation dictionary. While the effectiveness of such an algorithm is unquestionable for narrowband signals (300-3400 Hz), problems arise for wideband signals.
  • The object of the invention is to control the short-term and long-term distortions independently.
  • The invention therefore provides a wideband speech encoding method in which the speech is sampled so as to obtain successive speech frames, each comprising a predetermined number of samples, and in which, for each speech frame, parameters of a code-excited linear prediction model are determined. These parameters comprise a long-term digital excitation word extracted from an adaptive codebook, as well as an associated short-term excitation word extracted from a fixed codebook.
  • The long-term excitation word is extracted using a first perceptual weighting filter comprising a first formant weighting filter.
  • The denominator of the transfer function of the first formant weighting filter is equal to the numerator of the transfer function of the second formant weighting filter.
  • Using two different formant weighting filters makes it possible to control the short-term and long-term distortions independently.
  • The short-term weighting filter is cascaded with the long-term weighting filter.
  • Tying the denominator of the long-term weighting filter to the numerator of the short-term weighting filter allows these two filters to be controlled separately and also yields a clear simplification when the two filters are cascaded.
  • In the corresponding device, the first extraction means include a first perceptual weighting filter comprising a first formant weighting filter; the second extraction means include the first perceptual weighting filter and a second perceptual weighting filter comprising a second formant weighting filter; and the denominator of the transfer function of the first formant weighting filter is equal to the numerator of the transfer function of the second formant weighting filter.
  • The invention also relates to a terminal of a wireless communication system, such as a cellular mobile telephone, incorporating a device as defined above.
  • The perceptual weighting filter FPP exploits the masking properties of the human ear with respect to the spectral envelope of the speech signal, whose shape depends on the resonances of the vocal tract. This filter gives more importance to the error appearing in the spectral valleys than to the error at the formant peaks.
  • The transfer function of the filter FPP is given by formula (I): $W(z) = \frac{A(z/\gamma_1)}{A(z/\gamma_2)}$, in which 1/A(z) is the transfer function of the predictive filter FP and $\gamma_1$ and $\gamma_2$ are the perceptual weighting coefficients, both being positive or zero and less than or equal to 1, with $\gamma_2$ less than or equal to $\gamma_1$.
  • The perceptual weighting filter consists of a formant weighting filter and a filter weighting the slope (tilt) of the spectral envelope of the signal.
  • Such an embodiment according to the invention is illustrated in FIG. 2, in which, compared with FIG. 1, the single filter FPP has been replaced by a first formant weighting filter FPP1 for the long-term search, cascaded with a second formant weighting filter FPP2 for the short-term search.
  • The filters appearing in the long-term search loop must also appear in the short-term search loop.
  • The transfer function $W_1(z)$ of the formant weighting filter FPP1 is given by formula (II): $W_1(z) = \frac{A(z/\gamma_{11})}{A(z/\gamma_{12})}$, while the transfer function $W_2(z)$ of the formant weighting filter FPP2 is given by formula (III): $W_2(z) = \frac{A(z/\gamma_{21})}{A(z/\gamma_{22})}$.
  • The coefficient $\gamma_{12}$ is equal to the coefficient $\gamma_{21}$. This allows a clear simplification when the two filters are cascaded.
  • The filter equivalent to the cascade of these two filters has a transfer function given by formula (IV): $W_1(z)\,W_2(z) = \frac{A(z/\gamma_{11})}{A(z/\gamma_{22})}$.
  • The synthesis filter FP (having the transfer function 1/A(z)) followed by the long-term weighting filter FPP1 and the weighting filter FPP2 is then equivalent to the filter whose transfer function is given by formula (V): $\frac{1}{A(z/\gamma_{22})}$. This form corresponds to $\gamma_{11} = 1$, for which $A(z/\gamma_{11}) = A(z)$ cancels the denominator of the synthesis filter.
  • The invention applies advantageously to mobile telephony, and in particular to all remote terminals belonging to a wireless communication system.
  • Such a terminal, for example a mobile telephone TP such as that illustrated in FIG. 3, conventionally comprises an antenna connected via a duplexer DUP to a reception chain CHR and a transmission chain CHT.
  • A baseband processor BB is connected to the reception chain CHR and to the transmission chain CHT via analog-to-digital converters ADC and digital-to-analog converters DAC, respectively.
  • The processor BB performs baseband processing, including channel decoding DCN followed by source decoding DCS.
  • For transmission, the processor performs source coding CCS followed by channel coding CCN.
  • When the mobile telephone incorporates a coder according to the invention, the coder is incorporated within the source coding means CCS, while the decoder is incorporated within the source decoding means DCS.
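
The framing figures quoted in the definitions (20 ms superframes of 320 samples, split into four 5 ms frames of 80 samples) can be checked with a short sketch. The function name `split_superframe` and the constant names are illustrative assumptions, not taken from the patent; the 16 kHz sampling rate is simply what 320 samples per 20 ms implies.

```python
# Illustrative sketch only: names are hypothetical, figures come from the text.
SUPERFRAME_MS = 20         # duration of one speech superframe
SUPERFRAME_SAMPLES = 320   # samples per superframe
FRAME_MS = 5               # duration of one frame


def split_superframe(samples):
    """Split one superframe into equal-length frames."""
    if len(samples) != SUPERFRAME_SAMPLES:
        raise ValueError("expected exactly one superframe of samples")
    frames_per_superframe = SUPERFRAME_MS // FRAME_MS
    frame_len = SUPERFRAME_SAMPLES // frames_per_superframe
    return [samples[i:i + frame_len]
            for i in range(0, SUPERFRAME_SAMPLES, frame_len)]


# Sampling rate implied by the superframe size.
sampling_rate_hz = SUPERFRAME_SAMPLES * 1000 // SUPERFRAME_MS

# Dummy superframe whose sample values are just their indices.
frames = split_superframe(list(range(SUPERFRAME_SAMPLES)))
```

In this layout the LSP parameters would be attached once to the superframe, while the long-term and short-term excitation parameters would be attached to each of the four frames.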
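
The algebraic short-term dictionary described above (vectors with a few non-zero pulses of amplitude +1 or -1 at predetermined positions) might be sketched as follows. The track layout in `interleaved_tracks` is an assumption about what the "interleaved permutation model" means, namely that track t holds positions t, t + T, t + 2T, ...; the patent text quoted here does not give the actual position tables.

```python
# Illustrative sketch: helper names and the track layout are assumptions.
def interleaved_tracks(frame_len, n_tracks):
    """Allowed pulse positions per track, interleaved across the frame."""
    return [list(range(t, frame_len, n_tracks)) for t in range(n_tracks)]


def algebraic_codevector(frame_len, pulses):
    """Build a sparse excitation vector from (position, sign) pairs."""
    vec = [0] * frame_len
    for pos, sign in pulses:
        if sign not in (1, -1):
            raise ValueError("pulse amplitude must be +1 or -1")
        if not 0 <= pos < frame_len:
            raise ValueError("pulse position outside the frame")
        vec[pos] += sign
    return vec


# One pulse per track for an 80-sample frame with four tracks.
tracks = interleaved_tracks(80, 4)
pulses = [(tracks[0][3], 1), (tracks[1][0], -1),
          (tracks[2][10], 1), (tracks[3][19], -1)]
codevector = algebraic_codevector(80, pulses)
```

Because each pulse is drawn from a different track, no two pulses can land on the same position, so the vector always has exactly four non-zero samples.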
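
The per-frame long-term search described above (closed-loop analysis around the open-loop delay, followed by the update of the target signal) can be illustrated with a minimal sketch. This is a generic textbook-style CELP search under simplifying assumptions, not the patent's implementation: only integer delays at least one frame long are tried, the gain is left unquantized, and all function and variable names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def filter_with_impulse_response(v, h):
    """Zero-state convolution of v with h, truncated to len(v) samples."""
    return [sum(v[k] * h[i - k] for k in range(i + 1) if i - k < len(h))
            for i in range(len(v))]


def closed_loop_pitch_search(target, past_exc, h, open_loop_delay, span=2):
    """Return (delay, gain Ga, updated target) minimizing the squared error."""
    n = len(target)
    best = None
    first = max(n, open_loop_delay - span)  # assumption: delay >= frame length
    for delay in range(first, open_loop_delay + span + 1):
        if delay > len(past_exc):
            break
        start = len(past_exc) - delay
        v = past_exc[start:start + n]            # candidate long-term word
        y = filter_with_impulse_response(v, h)   # filtered contribution
        num, den = dot(target, y), dot(y, y)
        if den <= 0.0:
            continue
        score = num * num / den  # error reduction obtained with the best gain
        if best is None or score > best[0]:
            best = (score, delay, num / den, y)
    if best is None:
        raise ValueError("no usable delay in the search range")
    _, delay, gain, y = best
    new_target = [t - gain * yi for t, yi in zip(target, y)]  # X_ST
    return delay, gain, new_target


# Synthetic check: past excitation with period 40 and a target built from it.
pattern = [math.sin(0.37 * i * i) for i in range(40)]
past_excitation = pattern * 3
h = [1.0, 0.5, 0.25]
frame_len = 20
true_vector = past_excitation[80:80 + frame_len]   # corresponds to delay 40
x_lt = [0.8 * s for s in filter_with_impulse_response(true_vector, h)]
delay, gain_a, x_st = closed_loop_pitch_search(
    x_lt, past_excitation, h, open_loop_delay=41, span=2)
```

The returned `x_st` plays the role of the short-term target X_ST that is then used for the fixed-codebook search.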
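
The simplification obtained when the two formant weighting filters are cascaded can be checked numerically. The sketch below builds A(z/γ) by scaling the k-th coefficient of A(z) by γ^k, then compares impulse responses. The coefficients of A(z) and the γ values are arbitrary illustrative assumptions (the definitions above give no numerical values), and the helper names are hypothetical.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma), given A(z) = a[0] + a[1] z^-1 + ..."""
    return [coef * gamma ** k for k, coef in enumerate(a)]


def iir_filter(b, a, x):
    """Filter x through B(z)/A(z), starting from a zero state."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y


def formant_weighting(a, gamma_num, gamma_den, x):
    """Apply A(z/gamma_num) / A(z/gamma_den) to x."""
    return iir_filter(bandwidth_expand(a, gamma_num),
                      bandwidth_expand(a, gamma_den), x)


a = [1.0, -1.2, 0.8]              # assumed stable second-order A(z)
g11, g12, g22 = 0.9, 0.6, 0.4     # assumed weighting coefficients
g21 = g12                         # the constraint stated in the text
impulse = [1.0] + [0.0] * 31

# Formula (IV): W1(z) W2(z) = A(z/g11) / A(z/g22).
cascade = formant_weighting(a, g21, g22,
                            formant_weighting(a, g11, g12, impulse))
single = formant_weighting(a, g11, g22, impulse)
err_iv = max(abs(c - s) for c, s in zip(cascade, single))

# Formula (V): with g11 = 1, 1/A(z) followed by W1 and W2 is 1/A(z/g22).
synth = iir_filter([1.0], a, impulse)
chain = formant_weighting(a, g21, g22,
                          formant_weighting(a, 1.0, g12, synth))
expected = iir_filter([1.0], bandwidth_expand(a, g22), impulse)
err_v = max(abs(c - e) for c, e in zip(chain, expected))
```

Both error values should be at the level of floating-point rounding, which is the practical meaning of the "clear simplification": the cascade needs only one numerator and one denominator instead of two of each.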

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
EP03291749A 2002-07-17 2003-07-15 Method and device for wideband coding of speech signals, suitable for independent control of long-term and short-term distortions Withdrawn EP1388846A3 (de)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP03291749A EP1388846A3 (de) 2002-07-17 2003-07-15 Method and device for wideband coding of speech signals, suitable for independent control of long-term and short-term distortions

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
EP02015919A EP1383113A1 (de) 2002-07-17 2002-07-17 Method and device for wideband speech coding suitable for controlling short-term and long-term distortions
EP02015919 2002-07-17
EP03291749A EP1388846A3 (de) 2002-07-17 2003-07-15 Method and device for wideband coding of speech signals, suitable for independent control of long-term and short-term distortions

Publications (2)

Publication Number Publication Date
EP1388846A2 (de) 2004-02-11
EP1388846A3 (de) 2008-08-20

Family

ID=30445142

Family Applications (1)

Application Number Title Priority Date Filing Date
EP03291749A Withdrawn EP1388846A3 (de) 2002-07-17 2003-07-15 Method and device for wideband coding of speech signals, suitable for independent control of long-term and short-term distortions

Country Status (1)

Country Link
EP (1) EP1388846A3 (de)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5926785A (en) * 1996-08-16 1999-07-20 Kabushiki Kaisha Toshiba Speech encoding method and apparatus including a codebook storing a plurality of code vectors for encoding a speech signal
US6173257B1 (en) * 1998-08-24 2001-01-09 Conexant Systems, Inc Completed fixed codebook for speech encoder

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHEN J-H ET AL: "Improving the performance of the 16 kb/s LD-CELP speech coder" DIGITAL SIGNAL PROCESSING 2, ESTIMATION, VLSI. SAN FRANCISCO, MAR. 23 - 26, 1992, PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), NEW YORK, IEEE, US, vol. 5 CONF. 17, 23 March 1992 (1992-03-23), pages 69-72, XP010058714 ISBN: 0-7803-0532-9 *

Also Published As

Publication number Publication date
EP1388846A3 (de) 2008-08-20

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL LT LV MK

PUAL Search report despatched

Free format text: ORIGINAL CODE: 0009013

AK Designated contracting states

Kind code of ref document: A3

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL LT LV MK

RIC1 Information provided on ipc code assigned before grant

Ipc: G10L 19/12 20060101AFI20080716BHEP

AKX Designation fees paid
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20090203

REG Reference to a national code

Ref country code: DE

Ref legal event code: 8566