EP1933306A1 - Method and apparatus for transcoding speech signals between two CELP-format coders

Info

Publication number
EP1933306A1
EP1933306A1 EP06025955A EP06025955A EP1933306A1 EP 1933306 A1 EP1933306 A1 EP 1933306A1 EP 06025955 A EP06025955 A EP 06025955A EP 06025955 A EP06025955 A EP 06025955A EP 1933306 A1 EP1933306 A1 EP 1933306A1
Authority
EP
European Patent Office
Prior art keywords
speech signal
parameter
pitch
pitch parameter
pcm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP06025955A
Other languages
German (de)
English (en)
Inventor
Christophe Beaugeant
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nokia Solutions and Networks GmbH and Co KG
Original Assignee
Nokia Siemens Networks GmbH and Co KG
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/08 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters
    • G10L19/12 Determination or coding of the excitation function; Determination or coding of the long-term prediction parameters the excitation function being a code excitation, e.g. in code excited linear prediction [CELP] vocoders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/173 Transcoding, i.e. converting between two coded representations avoiding cascaded coding-decoding

Definitions

  • The present invention relates to a method and apparatus for transcoding a speech signal from a first code excited linear prediction (CELP) format to a second code excited linear prediction (CELP) format.
  • Many different formats of CELP coding are in use today. To successfully decode a CELP-coded speech signal, the decoder must employ the same CELP coding model, referred to in the following as "format", as the encoder that produced the signal. When communications systems employing different CELP formats must share speech data, it is often desirable to convert the speech signal from one CELP coding format to another.
  • A known method of providing interconnectivity consists in decoding one standard compressed bitstream and re-encoding it into the other standard bitstream. This known method is called tandem transcoding.
  • A tandem transcoding system includes an input CELP format decoder and an output CELP format encoder.
  • The input CELP format decoder receives a speech signal that has been encoded using one CELP format and decodes it to produce a PCM speech signal.
  • The output CELP format encoder of the tandem coding system receives the decoded PCM speech signal and encodes it using the output CELP format to produce a compressed output signal in the output CELP format.
  • The primary disadvantage of this approach is the perceptual degradation experienced by the speech signal in passing through multiple encoders and decoders. Further, the known tandem transcoding scheme suffers from the problems of complexity and delay.
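  • The tandem scheme can be sketched as a plain decode-then-encode chain. This is only an illustration: `decode_a`, `encode_b` and the toy stand-in codecs are hypothetical names introduced here and do not correspond to any real codec API.

```python
from typing import Callable, List

PcmFrame = List[float]


def tandem_transcode(
    bitstream_a: bytes,
    decode_a: Callable[[bytes], PcmFrame],
    encode_b: Callable[[PcmFrame], bytes],
) -> bytes:
    """Tandem transcoding: full decode in format A, then full encode in format B.

    Nothing computed by the original format-A encoder is reused, so the
    format-B encoder repeats every analysis step (including the pitch search).
    """
    pcm = decode_a(bitstream_a)   # input CELP format decoder -> PCM
    return encode_b(pcm)          # output CELP format encoder <- PCM


# Toy stand-ins (assumptions, not real CELP codecs): one byte per sample.
def toy_decode(bitstream: bytes) -> PcmFrame:
    return [b / 255.0 for b in bitstream]


def toy_encode(pcm: PcmFrame) -> bytes:
    return bytes(round(s * 255.0) for s in pcm)


transcoded = tandem_transcode(b"\x00\x80\xff", toy_decode, toy_encode)
```

  • In a real system both stages are complete codecs, which is where the complexity and delay mentioned above come from.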
  • The basic idea of the applicant's internal smart transcoding solution is to exploit the redundancy between the standards so that parameters that have already been computed are not computed again. For example, parameters already computed by the encoder of the sending apparatus can be reused at the encoder of the transcoding apparatus to drive the re-encoding.
  • One of the parameters mapped between the speech codecs is the pitch parameter.
  • The pitch mapping is provided by copying the pitch parameter from the bitstream of a first codec to the encoder of a second codec.
  • In standardized CELP coding, the pitch estimation is done in two steps.
  • An open-loop search gives a first estimate T_O of the pitch.
  • A closed-loop pitch T_OP is then obtained as a refinement of T_O by a search in an interval [T_O - T_LOW ; T_O + T_HIGH].
  • A further enhanced internal solution is to provide a mapping that skips either the open-loop search or both the open-loop search and the closed-loop search, dependent on predefined parameters.
  • In the first case, the pitch parameter T_OP(A) of the first codec is taken as the output of the open-loop search, so that the closed-loop search at the encoder of the second codec is done in an interval around T_OP(A).
  • In the second case, the pitch T_OP(A) is taken directly as the output of the closed-loop search and is quantized at the encoder of the second codec.
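  • The two-step estimation and the mapping that skips the open-loop search can be illustrated with a small sketch. It rests on a loud simplification: a real CELP closed-loop search is an analysis-by-synthesis search over the adaptive codebook, whereas the code below uses a plain normalized correlation of the signal as the criterion in both steps. All function names, lag bounds and interval widths are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def normalized_correlation(x: Sequence[float], lag: int) -> float:
    """Normalized correlation between x[n] and x[n - lag] over one frame."""
    idx = range(lag, len(x))
    num = sum(x[n] * x[n - lag] for n in idx)
    e_cur = sum(x[n] * x[n] for n in idx)
    e_del = sum(x[n - lag] * x[n - lag] for n in idx)
    den = math.sqrt(e_cur * e_del)
    return num / den if den > 0.0 else 0.0


def open_loop_pitch(x: Sequence[float], min_lag: int, max_lag: int) -> int:
    """Coarse estimate T_O: best lag over the full allowed range."""
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: normalized_correlation(x, lag))


def closed_loop_pitch(x: Sequence[float], center: int, t_low: int, t_high: int,
                      min_lag: int, max_lag: int) -> int:
    """Refinement: best lag inside [center - t_low ; center + t_high]."""
    center = min(max(center, min_lag), max_lag)   # keep the interval non-empty
    lo = max(min_lag, center - t_low)
    hi = min(max_lag, center + t_high)
    return max(range(lo, hi + 1),
               key=lambda lag: normalized_correlation(x, lag))


# Synthetic periodic frame with a period of 40 samples (illustrative only).
period = 40
pattern = [math.sin(2 * math.pi * i / period)
           + 0.5 * math.sin(4 * math.pi * i / period) for i in range(period)]
frame = [pattern[n % period] for n in range(240)]

# Standard two-step estimation at the second encoder.
t_o = open_loop_pitch(frame, min_lag=20, max_lag=60)
t_op = closed_loop_pitch(frame, t_o, t_low=3, t_high=3, min_lag=20, max_lag=60)

# Mapping: skip the open-loop search and refine around T_OP(A) from codec A.
t_op_a = 38
t_op_b = closed_loop_pitch(frame, t_op_a, t_low=3, t_high=3, min_lag=20, max_lag=60)
```

  • In this sketch the mapped variant evaluates only 7 candidate lags instead of the 41 lags of the open-loop range plus the refinement, which is the kind of saving the mapping aims at.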
  • More advanced approaches try to estimate the pitch parameter at the encoder of the second codec more accurately, given the pitch computed by the first codec.
  • Such approaches are known, for example, from "An efficient transcoding algorithm for G.723.1 and EVRC speech coders", Kyung Tae Kim et al., IEEE 54th Vehicular Technology Conference, 2001, or from "A novel transcoding scheme from EVRC to G.729AB", Pankaj K. R., 37th Asilomar Conference on Signals, Systems and Computers, 2003.
  • These advanced approaches can be called "pitch smoothing" methods.
  • The open-loop pitch computation at the encoder of the second codec is driven by the pitch parameter T_OP(A) of the first codec.
  • The search at the encoder of the second codec can also be driven by the pitch parameter T_OP(A) by limiting the closed-loop search to a restricted interval [T_O - T'_LOW ; T_O + T'_HIGH] with T'_LOW < T_LOW and T'_HIGH < T_HIGH. All the previously mentioned solutions work at the encoder of the second codec either on the output of the open-loop search or on the output of the closed-loop search.
  • An object of the present invention is to provide an optimal compromise for a transcoding scheme between the quality of the transmission of the speech signal and the complexity of generating the pitch parameter.
  • a method of transcoding a speech signal from a first code excited linear prediction (CELP) format to a second code excited linear prediction (CELP) format comprises the following steps:
  • transcoding apparatus for transcoding a speech signal from a first code excited linear prediction (CELP) format to a second code excited linear prediction (CELP) format
  • the transcoding apparatus comprises:
  • Voiced and unvoiced characteristics are defined by the action of the vocal cords.
  • The vocal cords vibrate for voiced sounds but do not vibrate for unvoiced sounds. For example, all the vowels in English are voiced sounds. Some consonants, such as "b" and "d", are partially voiced: the beginning of the phoneme [b] or [d] is plosive and the end is voiced, while "p" and "f", for instance, are completely unvoiced.
  • The estimation of the pitch parameter at the encoder of the second codec is all the more accurate when both the open-loop search and the closed-loop search are performed.
  • Definitions of the closed-loop search and the open-loop search can be found in the speech codec specifications of ITU-T or 3GPP, for instance in "Mandatory Speech Codec speech processing functions; Adaptive Multi-Rate (AMR) speech codec", Release 6, 3GPP TS 26.090, v6.0.0, or in "Coding of Speech at 8 kbit/s using conjugate-structure algebraic-excited linear prediction", ITU-T Recommendation G.729, 03/1996.
  • The pitch parameter T_OP(A) is taken as the output of the open-loop search at the encoder of the second codec, and the closed-loop search can be skipped depending on the energy level of the decoded speech signal and the voiced level of the speech signal.
  • For unvoiced signals or signals of low energy, the influence of the pitch is smaller, so that an accurate estimation of the pitch parameter for the second CELP format is not needed according to the present invention.
  • In that case, the pitch parameter of the first codec can be used as the output of the closed-loop search of the encoder of the second codec.
  • Thus, an optimal compromise between the quality of the transmission of the speech signal and the complexity of generating the pitch parameter at the encoder of the second codec is provided.
  • the method comprises the steps of:
  • the transcoding apparatus comprises:
  • The energy of the signal is an important factor in determining whether or not the quality needs to be optimal at the encoder of the second codec.
  • Artefacts are generally more acceptable on low-energy signals than on high-energy signals. Indeed, low-energy signals are less audible and can generally be degraded more than energetic signals. Accordingly, according to the present invention an accurate estimation of the pitch parameter at the encoder of the second codec is generally applied only to high-energy signals.
  • An advantage of the present invention is to provide an adaptive compromise between the closed-loop search and the open-loop search depending on the first pitch parameter of the first compressed speech signal encoded using the first CELP format and depending on its energy level.
  • The method further comprises the step of encoding the decoded PCM speech signal using the second CELP format to a second coded speech signal including at least a second pitch parameter.
  • The closed-loop search process is performed in a restricted interval [T_OP(A) - T'_LOW ; T_OP(A) + T'_HIGH] around the first pitch parameter T_OP(A), wherein T'_LOW designates a preselected lower pitch threshold value and T'_HIGH a preselected upper pitch threshold value.
  • The lower and the upper pitch threshold values are preselected in dependence on the detected voiced level.
  • The first parameter and/or the second parameter are each provided as a predetermined threshold value.
  • The first CELP format is provided by a first codec and the second CELP format is provided by a second codec which is different from the first codec.
  • Suitable examples for the first codec and the second codec are the following: the first codec and the second codec are selected from the group of AMR, AMR-WB/G.722.2, G.729, annexes of G.729, G.723.1, EVRC and VMR-WB.
  • The first coded speech signal includes at least the first pitch parameter T_OP(A) and an additional parameter set comprising a linear prediction code (LPC) parameter and/or at least one fixed gain parameter and/or at least one adaptive gain parameter and/or one adaptive code-book parameter.
  • The voiced level of the PCM speech signal is detected by using the variability of the first pitch parameter over a predetermined frame or over the predetermined time and/or by using at least one parameter of the additional parameter set.
  • The energy level of the PCM speech signal is detected by using the fixed gain parameter of the first coded speech signal and/or by computing the energy level of the decoded PCM speech signal.
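  • The two detection steps can be sketched as follows. The source names the inputs (variability of the first pitch parameter, energy of the decoded PCM signal) but gives no formulas, so everything below is an assumption made for illustration: the dB reference (full scale = 1.0), the use of the mean absolute lag change as the variability measure, and the mapping of low variability to a high voiced level.

```python
import math
from typing import Sequence


def frame_energy_db(pcm: Sequence[float], eps: float = 1e-12) -> float:
    """Mean-square energy of one PCM frame in dB relative to full scale 1.0.

    eps avoids log10(0) on an all-zero frame (assumed safeguard).
    """
    mean_square = sum(s * s for s in pcm) / len(pcm)
    return 10.0 * math.log10(mean_square + eps)


def pitch_variability(pitch_lags: Sequence[int]) -> float:
    """Mean absolute change of the pitch lag between consecutive (sub)frames."""
    if len(pitch_lags) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(pitch_lags, pitch_lags[1:])]
    return sum(diffs) / len(diffs)


def voiced_level(pitch_lags: Sequence[int]) -> float:
    """Hypothetical voiced-level score in (0, 1]: stable pitch gives a high score."""
    return 1.0 / (1.0 + pitch_variability(pitch_lags))


# Pitch lags T_OP(A) read from the first bitstream over a time window T.
stable_window = [40, 41, 40, 41]       # small lag changes
erratic_window = [30, 80, 25, 110]     # large lag changes

vl_stable = voiced_level(stable_window)
vl_erratic = voiced_level(erratic_window)
el_loud = frame_energy_db([0.5] * 160)
el_quiet = frame_energy_db([0.01] * 160)
```

  • Computing the voiced level from the decoded pitch lags alone avoids any additional analysis of the PCM signal, which matches the aim of keeping the transcoder's complexity low.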
  • Figure 1 shows a schematic flow diagram of a first embodiment of the method for transcoding a speech signal from a first code excited linear prediction (CELP) format to a second code excited linear prediction (CELP) format.
  • The inventive method for transcoding the speech signal is explained by means of the schematic flow diagram of figure 1, with reference to the block diagram of figure 4.
  • The method of transcoding the speech signal of the present invention comprises the method steps S1-S8:
  • A first coded speech signal DS1 is received, in particular a frame of a speech signal. Said coded speech signal DS1 is encoded using the first CELP format and includes at least a first pitch parameter T_OP(A).
  • The first coded speech signal DS1 includes at least the first pitch parameter T_OP(A) and an additional parameter set. The additional parameter set comprises a linear prediction code (LPC) parameter and/or at least one fixed gain parameter and/or at least one adaptive gain parameter and/or at least one adaptive code-book parameter.
  • The received first coded speech signal DS1 is decoded to a decoded PCM speech signal AS2.
  • A voiced level VL of the PCM speech signal AS2 is detected within a predetermined time window T.
  • The voiced level VL of the PCM speech signal AS2 is detected by using the variability of the first pitch parameter T_OP(A) over the predetermined frame or over the predetermined time window T. Alternatively or additionally, the voiced level VL can be detected by using at least one parameter of said additional parameter set.
  • It is determined whether the PCM speech signal AS2 is a voiced or an unvoiced speech signal dependent on at least a first parameter P1. The first parameter P1 can be provided as a predetermined threshold value.
  • An energy level EL of the PCM speech signal AS2 is detected within the predetermined time window T.
  • The energy level EL of the PCM speech signal AS2 is detected by using the fixed gain parameter of the first coded speech signal DS1 and/or by computing the energy level of the decoded PCM speech signal AS2.
  • It is determined whether the energy level EL of the PCM speech signal AS2 is high or low dependent on at least a second parameter P2.
  • The second parameter P2 is provided as a predetermined threshold value.
  • If the PCM speech signal AS2 is voiced and its energy level EL is high, a closed-loop search process is performed which receives at least the first pitch parameter T_OP(A) and estimates a second pitch parameter T_OP(B) for the second CELP format dependent on at least the first pitch parameter T_OP(A).
  • The closed-loop search process is performed in a restricted interval [T_OP(A) - T'_LOW ; T_OP(A) + T'_HIGH] around the first pitch parameter T_OP(A), wherein T'_LOW designates a preselected lower pitch threshold value and T'_HIGH designates a preselected upper pitch threshold value.
  • The lower and the upper pitch threshold values T'_LOW, T'_HIGH are preselected in dependence on the detected voiced level VL.
  • If the PCM speech signal AS2 is unvoiced or its energy level EL is low, the first pitch parameter T_OP(A) is copied as the second pitch parameter T_OP(B) for the second CELP format.
  • a preferable value for energy level EL on a frame is energy level EL ⁇ 20 dB.
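  • The per-frame choice between the closed-loop search and the copy can be summarized in a short sketch. The comparison rules (voiced if VL exceeds P1, high energy if EL exceeds P2) follow the description; the function name, the `closed_loop_search` callable and the concrete numbers in the usage lines are hypothetical placeholders.

```python
from typing import Callable


def select_second_pitch(
    t_op_a: int,
    voiced_level: float,
    energy_level: float,
    p1: float,
    p2: float,
    closed_loop_search: Callable[[int, int], int],
    t_low: int,
    t_high: int,
) -> int:
    """Return T_OP(B) for one frame.

    Voiced and high-energy frames get a closed-loop search restricted to
    [T_OP(A) - T'_LOW ; T_OP(A) + T'_HIGH]; all other frames reuse T_OP(A).
    """
    is_voiced = voiced_level > p1          # threshold P1 on the voiced level VL
    is_high_energy = energy_level > p2     # threshold P2 on the energy level EL
    if is_voiced and is_high_energy:
        return closed_loop_search(t_op_a - t_low, t_op_a + t_high)
    return t_op_a                          # copy: no search at all


# Usage with a stand-in search that records its calls (illustration only).
search_calls = []


def recording_search(lo: int, hi: int) -> int:
    search_calls.append((lo, hi))
    return hi  # a real search would return the best lag in [lo, hi]


refined = select_second_pitch(40, 0.9, 30.0, 0.5, 20.0, recording_search, 2, 3)
copied_unvoiced = select_second_pitch(40, 0.2, 30.0, 0.5, 20.0, recording_search, 2, 3)
copied_quiet = select_second_pitch(40, 0.9, 10.0, 0.5, 20.0, recording_search, 2, 3)
```

  • Only the first call triggers the search; the two other frames cost nothing beyond the two comparisons.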
  • Figure 2 is a schematic flow diagram of a second embodiment of the method of the present invention.
  • The second embodiment of figure 2 comprises the method steps S1-S8 as shown in figure 1 and explained above, and additionally the method step S9.
  • The decoded PCM speech signal AS2 is encoded using the second CELP format to a second coded speech signal DS2 including at least the second pitch parameter T_OP(B).
  • Figure 3 is a diagram showing the pitch parameter T_OP over time t.
  • Figure 3 shows that the amplitude of the pitch parameter T_OP is low in the voiced periods VP and high in the unvoiced periods UP.
  • Figure 4 is a schematic block diagram of an embodiment of the transcoding apparatus 1 of the present invention.
  • The transcoding apparatus 1 of figure 4 is adapted to execute the method of figure 1 or of figure 2, respectively.
  • The transcoding apparatus 1 comprises a receiving means 2, a decoding means 3, a first detecting means 4, a first determining means 5, a second detecting means 6, a second determining means 7 and a pitch parameter providing means 8 comprising at least a closed-loop search means 8a and a copying means 8b (see figure 6).
  • The receiving means 2 is adapted to receive the first coded speech signal DS1 encoded using the first CELP format and including at least the first pitch parameter T_OP(A).
  • The decoding means 3 is adapted to decode the received first coded speech signal DS1 to provide a PCM speech signal AS2.
  • The first detecting means 4 is adapted to detect the voiced level VL of the PCM speech signal AS2 within the predetermined time window T.
  • The first determining means 5 is adapted to determine whether the PCM speech signal AS2 is a voiced speech signal or an unvoiced speech signal dependent on at least the first parameter P1.
  • The second detecting means 6 is adapted to detect an energy level EL of the PCM speech signal AS2 within the predetermined time window T.
  • The second determining means 7 is adapted to determine whether the energy level EL of the PCM speech signal AS2 is high or low dependent on at least the second parameter P2.
  • The closed-loop search means 8a is adapted to perform a closed-loop search.
  • The closed-loop search means 8a receives at least the first pitch parameter T_OP(A) and estimates a second pitch parameter T_OP(B) for the second CELP format dependent on at least the first pitch parameter T_OP(A), if the PCM speech signal AS2 is voiced and its energy level EL is high (see figure 6).
  • The copying means 8b is adapted to copy the first pitch parameter T_OP(A) as the second pitch parameter T_OP(B) for the second CELP format, if the PCM speech signal AS2 is unvoiced or its energy level EL is low (see figure 6).
  • The closed-loop search means 8a and the copying means 8b are shown in detail in figure 6.
  • The pitch parameter providing means 8 further comprises a decision means 8c.
  • The decision means 8c receives the signals EL' and VL'.
  • EL' designates the detection result of the second determining means 7. For example, if the energy level EL is greater than the second parameter P2, which is a threshold value, the decision signal EL' is high. If, on the other hand, the energy level EL is smaller than or equal to the second parameter P2, the decision signal EL' is low. Further, the signal VL' is the decision signal of the first detecting means 4. If, for example, the voiced level VL is greater than the first parameter P1, which is a threshold value, the decision signal VL' is high. Otherwise, the decision signal VL' is low.
  • The closed-loop search means 8a performs the closed-loop search in a restricted interval [T_OP(A) - T'_LOW ; T_OP(A) + T'_HIGH] around the first pitch parameter T_OP(A).
  • Figure 5 shows a schematic block diagram of the transcoding apparatus 1 coupled between two terminal units 11 and 14.
  • The first terminal unit 11 comprises an encoding means 12 and a sending means 13.
  • The encoding means 12 receives a first PCM speech signal AS1 and encodes said first PCM speech signal AS1 using the first CELP format to a first coded speech signal DS1.
  • The sending means 13 receives the first coded speech signal DS1 encoded with the first CELP format and sends it to the transcoding apparatus 1.
  • The second terminal unit 14 comprises a receiving means 15 and a decoding means 16.
  • The receiving means 15 receives the second coded speech signal DS2 encoded with the second CELP format.
  • The receiving means 15 transfers the received second coded speech signal DS2 to the decoding means 16 for decoding.
  • The decoding means 16 works with the second CELP format.
  • The present invention is not limited to the use of one transcoding apparatus between the terminal units; a plurality of different transcoding apparatuses could also be provided, wherein neighbouring transcoding apparatuses which are coupled to each other work on the same CELP format.

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP06025955A EP1933306A1 (en) 2006-12-14 2006-12-14 Method and apparatus for transcoding speech signals between two CELP-format coders

Publications (1)

Publication Number Publication Date
EP1933306A1 2008-06-18

Family

ID=37909828

Family Applications (1)

Application Number Title Priority Date Filing Date
EP06025955A (Withdrawn) EP1933306A1 (en) Method and apparatus for transcoding speech signals between two CELP-format coders 2006-12-14 2006-12-14

Country Status (1)

Country Link
EP (1) EP1933306A1 (fr)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003058407A2 (en) * 2002-01-08 2003-07-17 Dilithium Networks Pty Limited Method and system for transcoding between CELP-type speech codes

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
DALWON JANG ET AL: "A Novel Rate Selection Algorithm for Transcoding CELP-type Codec and SMV", EUROSPEECH 2003, September 2003 (2003-09-01), Geneva, CH, pages 2865, XP007007018 *
GHENANIA M ET AL: "TRANSCODAGE INTELLIGENT A FAIBLE COMPLEXITE ENTRE LES CODEURS UIT-T G.729 ET 3GPP NB-AMR (12.2 KBIT/S)", CORESA. COMPRESSION ET REPRESENTATION DES SIGNAUX AUDIOVISUELS, 25 May 2004 (2004-05-25), pages 85 - 88, XP001199662 *
JIN-KYU CHOI ET AL: "Improvement issues on transcoding algorithms : for the flexible usage to the various pairs of speech codec", IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2004. PROCEEDINGS. (ICASSP '04)., 17 May 2004 (2004-05-17) - 21 May 2004 (2004-05-21), MONTREAL, QUEBEC, CANADA, pages 269 - 272, XP010717617, ISBN: 0-7803-8484-9 *
KYUNG TAE KIM ET AL: "An efficient transcoding algorithm for G.723.1 and EVRC speech coders", VTC FALL 2001. IEEE 54TH. VEHICULAR TECHNOLOGY CONFERENCE. PROCEEDINGS. ATLANTIC CITY, NJ, OCT. 7 - 11, 2001, IEEE VEHICULAR TECHNOLOGY CONFERENCE, NEW YORK, NY : IEEE, US, vol. VOL. 1 OF 4. CONF. 54, 7 October 2001 (2001-10-07), pages 1561 - 1564, XP010562224, ISBN: 0-7803-7005-8 *
PANKAJ K R ED - MATTHEWS M B (ED) INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS: "A novel transcoding scheme from EVRC to G.729AB", CONFERENCE RECORD OF THE 37TH. ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS, & COMPUTERS. PACIFIC GROVE, CA, NOV. 9 - 12, 2003, ASILOMAR CONFERENCE ON SIGNALS, SYSTEMS AND COMPUTERS, NEW YORK, NY : IEEE, US, vol. VOL. 1 OF 2. CONF. 37, 9 November 2003 (2003-11-09), pages 533 - 536, XP010702678, ISBN: 0-7803-8104-1 *

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LI LT LU LV MC NL PL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL BA HR MK RS

AKX Designation fees paid
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20081219