EP3138095A1 - Improved frame loss correction with voice information - Google Patents

Improved frame loss correction with voice information

Info

Publication number
EP3138095A1
Authority
EP
European Patent Office
Prior art keywords
signal
components
frame
decoding
period
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
EP15725801.3A
Other languages
German (de)
French (fr)
Other versions
EP3138095B1 (en)
Inventor
Julien Faure
Stéphane RAGOT
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Orange SA
Original Assignee
Orange SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Orange SA filed Critical Orange SA
Publication of EP3138095A1 publication Critical patent/EP3138095A1/en
Application granted granted Critical
Publication of EP3138095B1 publication Critical patent/EP3138095B1/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/005 Correction of errors induced by the transmission channel, if related to the coding algorithm
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/028 Noise substitution, i.e. substituting non-tonal spectral components by noisy source
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L25/81 Detection of presence or absence of voice signals for discriminating voice from music
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16 Vocoder architecture
    • G10L19/18 Vocoders using multiple modes
    • G10L19/20 Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/93 Discriminating between voiced and unvoiced parts of speech signals
    • G10L2025/932 Decision in previous or following frames
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/93 Discriminating between voiced and unvoiced parts of speech signals

Definitions

  • the present invention relates to the field of coding/decoding in telecommunications, and more particularly to frame loss correction at decoding.
  • a "frame" is understood to mean an audio segment composed of at least one sample (so that the invention applies equally to the loss of one or more samples coded according to the G.711 standard and to the loss of one or more packets of samples coded according to G.723, G.729, etc.).
  • the loss of audio frames occurs when a real-time communication using an encoder and a decoder is disturbed by the conditions of a telecommunication network (radio frequency problems, congestion of the access network, etc.).
  • the decoder uses frame loss correction mechanisms to try to substitute the missing signal with a reconstructed signal by using information available to the decoder (for example the audio signal already decoded for one or more past frames). This technique can maintain quality of service despite degraded network performance.
  • Frame loss correction techniques are most often very dependent on the type of coding used.
  • in the case of CELP coding, it is common to repeat certain parameters decoded in the previous frame (spectral envelope, pitch, codebook gains), with adjustments such as a modification of the spectral envelope to converge towards an average envelope, or the use of a random fixed codebook.
  • the document FR 1350845 proposes a hybrid method that combines the advantages of the two methods by making it possible to maintain phase continuity in the transformed domain.
  • the present invention falls within this framework.
  • a detailed description of the solution that is the subject of document FR 1350845 is given below with reference to FIG. 1.
  • this solution, even if particularly promising, remains to be perfected because, when the coded signal has only one fundamental period ("mono-pitch"), such as a voiced segment of a speech signal, the audio quality after lost-frame correction can be degraded and worse than with a frame loss correction based on a speech model of the CELP ("Code-Excited Linear Prediction") type, for example.
  • the invention improves the situation.
  • the method comprises the steps of period search, spectral analysis and synthesis described below.
  • the amount of noise added to the addition of components is weighted according to a voicing information of the valid signal obtained at decoding.
  • the voicing information used at decoding, transmitted at at least one coder bit rate, makes it possible to give more weight to the sinusoidal components of the past signal if this signal is voiced, or to give more weight to the noise otherwise, which gives a much more satisfying audible result.
  • the complexity of the processing is then advantageously reduced, in particular in the case of an unvoiced signal, without degrading the quality of the synthesis.
  • this noise signal is weighted by a smaller gain when the valid signal is voiced.
  • this noise signal can be obtained from the previously received frame as a residue between the received signal and the addition of the selected components.
  • the number of components selected for the addition is greater when the valid signal is voiced.
  • a complementary embodiment may be chosen, in which more components are selected if the signal is voiced while minimizing the gain to be applied to the noise signal.
  • the overall amount of energy attenuated by applying a gain smaller than 1 on the noise signal is partially offset by the selection of more components.
  • the gain to be applied to the noise signal is not decreased and fewer components are selected if the signal is unvoiced or only slightly voiced.
  • the aforesaid period can be sought in a valid signal segment of greater duration in case of voicing of the valid signal.
  • a search is carried out, by correlation in the valid signal, for a repetition period typically corresponding to at least one pitch period if the signal is voiced; in that case, especially for male voices, the pitch search can be carried out over more than 30 milliseconds, for example.
  • the voicing information is provided in a coded stream received at decoding and corresponding to the aforementioned signal comprising a succession of samples distributed in successive frames. In case of frame loss at decoding, the voicing information contained in a valid signal frame preceding the lost frame is then used.
  • the voicing information thus comes from an encoder that generates the coded stream and determines the voicing information; in a particular embodiment, the voicing information is coded on a single bit in the coded stream.
  • the encoder generation of this voicing data may be conditioned by the fact that the rate is sufficient or not on a communication network between the encoder and the decoder. For example, if the rate is below a threshold, this voicing data is not transmitted by the encoder to save bandwidth.
  • the last voicing information acquired at the decoder can be used for the frame synthesis, or alternatively it may be decided to assume the unvoiced case for the frame synthesis.
  • the value taken by the gain applied to the noise signal can also be binary: if the signal is voiced, the value of the gain is set to 0.25, and it is 1 otherwise.
  • the voicing information comes from an encoder determining a flatness or harmonicity value of the spectrum (obtained, for example, by comparing the amplitudes of the spectral components of the signal with a background noise), the encoder then delivering this value in binary form in the coded stream (on more than one bit).
  • the value of the gain may be a function of the aforementioned flatness value (for example according to a continuous variation increasing as a function of this value).
  • said flatness value can be compared to a threshold to determine:
  • that the signal is voiced if the flatness value is below the threshold, and that it is not voiced otherwise.
  • the selection criteria of the components and / or choice of signal segment duration in which the pitch is sought may be binary.
  • if the signal is voiced, the spectral components whose amplitudes are greater than those of the first neighboring spectral components are selected, as well as those first neighboring components; otherwise, only the spectral components whose amplitudes are greater than those of their first neighbors are selected, and
  • for the pitch search segment duration, for example:
  • the period is sought in a valid signal segment of duration greater than 30 milliseconds (for example 33 milliseconds),
  • the period is searched for in a valid signal segment of duration less than 30 milliseconds (for example 28 milliseconds).
  • the purpose of the invention is to improve the state of the art in the sense of document FR 1350845 by modifying various stages of the processing presented in that document (pitch search, selection of components, noise injection), depending in particular on characteristics of the original signal.
  • these characteristics of the original signal may be coded as specific information in the data stream towards the decoder (the "bitstream"), depending on the speech/music classification and, where applicable, on the speech class in particular.
  • this information in the stream at decoding makes it possible to optimize the trade-off between complexity and quality and, jointly, to: modify the gain of the noise to be injected into the sum of the selected spectral components, modify the number of components selected for the synthesis, and modify the duration of the pitch search segment.
  • such an embodiment can be implemented in an encoder for the determination of the voicing information, and more particularly in a decoder, particularly in the case of frame loss. It can be implemented as software in a coding/decoding implementation for Enhanced Voice Services ("EVS") specified by the 3GPP (SA4) group.
  • the present invention also provides a computer program comprising instructions for implementing the above method, when the program is executed by a processor. An example of a flowchart of such a program is presented in the detailed description below with reference to FIG. 4 for the decoding and with reference to FIG. 3 for the coding.
  • the present invention also relates to a device for decoding a digital audio signal comprising a succession of samples distributed in successive frames.
  • the device comprises means (such as a processor and a memory, or an ASIC component or other circuit) for replacing at least one lost signal frame by the period search, spectral analysis and synthesis steps described above.
  • the present invention also provides a device for coding a digital audio signal, comprising means (such as a memory and a processor, or an ASIC component or other circuit) for providing voicing information in the coded stream delivered by the coding device, distinguishing a speech signal, which may be voiced, from a music signal.
  • FIG. 2 schematically illustrates the main steps of a method within the meaning of the invention
  • FIG. 3 illustrates an example of steps implemented in coding, in one embodiment in the sense of the invention
  • FIG. 4 illustrates an example of steps implemented in decoding, in one embodiment in the sense of the invention
  • FIG. 5 illustrates an example of steps implemented at decoding, for the pitch search in a valid signal segment Nc
  • FIG. 6 schematically illustrates an exemplary encoder and decoder device within the meaning of the invention.
  • the audio buffer corresponds to the previous samples 0 to N-1.
  • in the case of transform coding, the audio buffer corresponds to the samples of the previous frame and cannot be modified, because this type of coding/decoding does not provide for any delay in the rendering of the signal, so that no cross-fade of sufficient duration to cover a frame loss is provided for.
  • Fc denotes the separation frequency between the low band and the high band.
  • This filtering is preferably a filtering without delay.
  • this filtering step may be optional, the following steps being carried out in full band.
  • the next step S3 consists in searching in the low band for a loop point and a segment p(n) corresponding to the fundamental period (or "pitch" hereinafter) within the buffer b(n) resampled at the frequency Fc.
  • this makes it possible to take into account the continuity of the pitch in the lost frame(s) to be reconstructed.
  • step S4 consists in breaking down the segment p(n) into a sum of sinusoidal components.
  • the discrete Fourier transform (DFT) of the signal p(n) can be computed over a duration corresponding to the length of the signal. This gives the frequency, phase and amplitude of each of the sinusoidal components (or "peaks") that make up the signal.
  • Other transforms than DFT are possible. For example, transforms of DCT, MDCT or MCLT type can be implemented.
  • Step S5 is a step of selecting K sinusoidal components so as to keep only the most important components.
  • the selection of the components corresponds firstly to selecting the amplitudes A(n) for which A(n) > A(n-1) and A(n) > A(n+1), with n ∈ [0; P/2 - 1], which ensures that the amplitudes correspond to the spectral peaks.
  • the analysis by FFT is thus performed more efficiently over a length which is a power of 2, without changing the effective pitch period (by interpolation).
  • step S6, the sinusoidal synthesis, consists in generating a segment s(n) of length at least equal to the size of the lost frame (T).
  • the synthesis signal s(n) is calculated as a sum of the selected sinusoidal components, for example s(n) = Σ_k A(k) cos(2π f(k) n + φ(k)), where A(k), f(k) and φ(k) are the amplitude, frequency and phase of the k-th selected peak.
  • Step S7 consists of "injecting noise" (filling the spectral zones corresponding to the unselected lines) so as to compensate for the energy loss linked to the omission of certain frequency peaks in the low band.
  • a particular embodiment consists in calculating the residue r(n) between the segment corresponding to the pitch, p(n), and the synthesized signal s(n).
  • this residue of size P is transformed, for example windowed and repeated with overlap between windows of variable sizes, as described in document FR 1353551.
  • Step S8 applied to the high band may simply consist of repeating the past signal.
  • in step S9, the signal is synthesized by resampling the low band to its original frequency, after being mixed with the high band filtered in step S8 (simply repeated in step S11).
  • step S10 is an overlap-add that ensures continuity between the signal before the frame loss and the synthesized signal.
  • voicing information about the signal before the frame loss, transmitted at at least one coder bit rate, is used at decoding (step DI-1) to quantitatively determine a proportion of noise to be added to the synthesis signal replacing one or more lost frames.
  • the decoder uses the voicing information to decrease, as a function of the voicing, the overall amount of noise mixed with the synthesis signal (by assigning a lower gain G(res) to the noise signal r'(k) obtained from a residue in step DI-3, and/or by selecting more amplitude components A(k) to be used for the construction of the synthesis signal in step DI-4).
  • the decoder can further adjust its parameters, including the pitch search, to optimize the quality/complexity trade-off of the processing, according to the voicing information. For example, for the pitch search, if the signal is voiced, the pitch search window Nc can be larger (in step DI-5), as will be seen later with reference to FIG. 5.
  • this information can be provided by the encoder in two ways, at at least one coder bit rate:
  • this "flatness" data Pl of the spectrum can be received on several bits at the decoder in the optional step DI-10 of FIG. 2, then compared with a threshold in step DI-11, which amounts to determining in steps DI-1 and DI-2 whether the voicing is above or below a threshold, and deducing the appropriate processing, in particular for the selection of peaks and for the choice of the duration of the pitch search segment.
  • this information (whether in the form of a single bit or of a multi-bit value) is received from the encoder (at at least one coder bit rate) in the example described here.
  • the input signal, presented in the form of frames in step C1, is analyzed in step C2.
  • the analysis step consists in determining whether the audio signal of the current frame has characteristics that would require special processing in the event of frame loss at the decoder, as is the case, for example, for voiced speech signals.
  • such an analysis may rely on a speech/music classification or the like.
  • a classification at the coder already makes it possible to adapt the technique used for the coding according to the nature of the signal (speech or music).
  • predictive coders such as, for example, the coder according to the G.718 standard also use a classification so as to adapt the parameters of the coder to the nature of the signal (voiced / unvoiced, transient, generic, inactive).
  • a bit of "characterization for frame loss" is reserved. It is added to the coded stream (or bitstream) in step C3 to indicate whether the signal is a speech signal (voiced or generic). This bit is, for example, set to 1 or to 0 according to the table below:
  • Voiced: 1; Generic: 1; Transient: 0; Unvoiced: 0; Inactive: 0
  • here, "generic" means a usual speech signal (one which is not a transient related to the pronunciation of a plosive, which is not inactive, and which is not necessarily purely voiced like the pronunciation of a vowel without a consonant).
  • in a variant, the information transmitted to the decoder in the coded stream is not binary but corresponds to a quantization of the ratio between the peak levels and the valley levels in the spectrum. This ratio can be expressed by a measure of "flatness" of the spectrum, denoted Pl, for example the ratio in dB between the geometric mean and the arithmetic mean of the amplitude spectrum: Pl = 10 log10 [ (Π_{k=0..N-1} x(k))^{1/N} / ( (1/N) Σ_{k=0..N-1} x(k) ) ],
  • where x(k) is the amplitude spectrum of size N resulting from the analysis of the current frame in the frequency domain (after FFT).
  • in a variant, a sinusoidal analysis decomposing the signal into sinusoidal components and noise is available at the encoder, and the flatness measure is obtained as the ratio between the energy of the sinusoidal components and the overall energy of the frame.
  • this voicing information (the single-bit flag or the flatness measure over several bits) is added to the coded stream in step C3.
  • the audio buffer of the encoder is conventionally coded in a step C4 before possible subsequent transmission to the decoder.
  • in the case where there is no frame loss in step D1 (KO arrow at the output of test D1 in FIG. 4), the decoder reads the information contained in the coded stream, including the "characterization for frame loss" information, in step D2 (at at least one coder bit rate). This information is stored in memory so that it can be reused if a next frame is missing. The decoder then continues the conventional decoding steps D3, etc., to obtain the synthesized output frame SYNTH. In the case where a loss of frame(s) occurs (OK arrow at the output of test D1), steps D4, D5, D6, D7, D8 and D12 are applied, corresponding respectively to steps S2, S3, S4, S5, S6 and S11 of FIG. 1. In particular:
  • step D5 (like step S3) searches for a loop point for the determination of the pitch, and
  • step D7 (like step S5) selects the sinusoidal components.
  • the noise injection in step S7 of FIG. 1 is carried out with a gain determination according to two steps D9 and D10 in FIG. 4 of the decoder within the meaning of the invention.
  • the invention consists in modifying the processing of steps D5, D7 and D9-D10, as follows.
  • the "characterization for frame loss” information is binary, and of value:
  • step D5 consists in finding a loop point and a segment p(n) corresponding to the pitch within the audio buffer resampled at the frequency Fc.
  • this technique, described in document FR 1350845, is illustrated in FIG. 5, in which:
  • the audio buffer at the decoder has a size of N' samples,
  • the loop point, denoted Pt Boucl, is located Ns samples from the correlation maximum,
  • a normalized correlation corr(n) is calculated between the target buffer segment of size Ns lying between N'-Ns and N'-1 (with a duration of, for example, 6 ms) and the sliding segment of size Ns which begins between samples 0 and Nc (with Nc ≤ N'-Ns, so that the sliding segment stays within the buffer): corr(n) = Σ_{k=0..Ns-1} b(n+k) b(N'-Ns+k) / √( Σ_{k=0..Ns-1} b(n+k)² · Σ_{k=0..Ns-1} b(N'-Ns+k)² ), for n ∈ [0; Nc].
  • in step D7 of FIG. 4, sinusoidal components are selected so as to keep only the most important components.
  • the first selection of components retains the amplitudes A(n) for which A(n) > A(n-1) and A(n) > A(n+1), with n ∈ [0; P/2 - 1].
  • the signal that one seeks to reconstruct is a speech signal (voiced or generic), hence with marked peaks and a low noise level.
  • this modification notably makes it possible to lower the noise level (in particular the level of the noise injected in steps D9 and D10 presented below) with respect to the level of the signal synthesized by sinusoidal synthesis in step D8, while maintaining an overall energy level sufficient not to cause audible artifacts related to energy fluctuations.
  • the voicing information is advantageously used here to attenuate the noise by applying a gain G at step D10.
  • the signal s(n) resulting from step D8 is mixed with the noise signal r'(n) resulting from step D9, applying however a gain G which depends on the "characterization for frame loss" information from the coded stream of the previous frame, namely:
  • G may be a constant equal to 0.25 or 1 depending on the voiced or unvoiced nature of the signal of the preceding frame, for example: voiced (bit 1): G = 0.25; unvoiced (bit 0): G = 1.
  • the "frame loss characterization" information has several discrete levels characterizing the PI flatness of the spectrum.
  • the gain G can be expressed directly as a function of the value Pl. The same applies to the limit of the segment Ne for the search for pitch and / or for the number of peaks An to be taken into account for the synthesis of the signal.
  • for example, a processing can be defined as follows:
  • the gain G is defined directly as a function of the value Pl (for example as a continuously increasing function of Pl),
  • the value Pl is compared to a threshold, for example -3 dB, bearing in mind that the value 0 corresponds to a flat spectrum and that -5 dB corresponds to a spectrum with sharp peaks,
  • if Pl is below the threshold, the duration of the pitch search segment Nc is set to 33 ms and the peaks A(n) such that A(n) > A(n-1) and A(n) > A(n+1) are selected, as well as the first neighboring peaks A(n-1) and A(n+1),
  • otherwise, the duration Nc can be chosen shorter, for example 25 ms, and only the peaks A(n) such that A(n) > A(n-1) and A(n) > A(n+1) are selected.
  • the decoding can then continue by mixing the noise, whose gain has thus been obtained, with the components thus selected, to obtain the synthesis signal in the low frequencies at step D13; this is added to the synthesis signal in the high frequencies obtained at step D14, to obtain in step D15 the overall synthesized signal.
  • FIG. 6 illustrates a decoder DECOD comprising, for example, hardware and software means such as a suitably programmed memory MEM and a processor PROC cooperating with this memory,
  • or a component such as an ASIC or other, as well as a communication interface COM,
  • for exploiting voicing information that it receives from a coder COD.
  • this coder comprises, for example, hardware and software means such as a memory MEM' suitably programmed for determining the voicing information and a processor PROC' cooperating with this memory, or alternatively a component such as an ASIC or other, as well as a communication interface COM'.
  • the coder COD is implemented in a telecommunication device such as a telephone TEL'.
  • the present invention is not limited to the embodiments described above by way of example; it extends to other variants.
  • the voicing information can take various forms. In the example described above, it may be a binary value on a single bit (voiced or not), or a value on several bits which may relate to a parameter such as the flatness of the signal spectrum, or to any other parameter characterizing (quantitatively or qualitatively) a voicing.
  • this parameter can be determined at decoding, for example according to the degree of correlation that can be measured during the identification of the pitch period.
  • an embodiment has been presented by way of example comprising a separation of the signal from previous valid frames into a high frequency band and a low frequency band, with in particular a selection of the spectral components in the low frequency band. Nevertheless, this embodiment, although advantageous in that it reduces the complexity of the processing, is optional.
  • the frame replacement method assisted by the voicing information in the sense of the invention can alternatively be carried out by considering the entire spectrum of the valid signal.
  • the aforementioned noise signal can be obtained from the residue (between the valid signal and the sum of the peaks) by weighting this residue temporally. For example, it can be weighted by overlapping windows, as in the usual framework of transform coding/decoding with overlap; a sketch of such a weighting is given below.
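By way of illustration, here is a minimal sketch of such a temporal weighting of the residue by overlapping windows, in Python with NumPy. The sine-squared window, the 50% hop and the fixed window size are illustrative assumptions, not the exact variable-size scheme of FR 1353551.

```python
import numpy as np

def extend_noise(residue, out_len):
    """Extend a short residue to out_len samples by overlap-adding windowed
    copies at 50% hop (sin^2 windows sum to 1 at this hop)."""
    win_len = len(residue)
    hop = max(1, win_len // 2)
    win = np.sin(np.pi * (np.arange(win_len) + 0.5) / win_len) ** 2
    out = np.zeros(out_len + win_len)
    pos = 0
    while pos < out_len:
        out[pos:pos + win_len] += win * residue
        pos += hop
    return out[:out_len]
```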

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Transmission Systems Not Characterized By The Medium Used For Transmission (AREA)

Abstract

The invention relates to the processing of a digital audio signal, including a series of samples distributed in consecutive frames. The processing is implemented in particular when decoding said signal in order to replace at least one signal frame lost during decoding. The method includes the following steps: a) searching, in a valid signal segment available when decoding, for at least one period in the signal, determined in accordance with said valid signal; b) analysing the signal in said period, in order to determine spectral components of the signal in said period; c) synthesising at least one frame for replacing the lost frame, by construction of a synthesis signal from: an addition of components selected among said predetermined spectral components, and a noise added to the addition of components. In particular, the amount of noise added to the addition of components is weighted in accordance with voice information of the valid signal, obtained when decoding.

Description

Improved frame loss correction with voicing information
The present invention relates to the field of coding/decoding in telecommunications, and more particularly to frame loss correction at the decoder.
A "frame" is understood to mean an audio segment composed of at least one sample (so that the invention applies equally to the loss of one or more samples coded according to the G.711 standard and to the loss of one or more packets of samples coded according to the G.723, G.729, etc. standards).
Audio frame losses occur when a real-time communication using an encoder and a decoder is disturbed by the conditions of a telecommunication network (radio-frequency problems, congestion of the access network, etc.). In that case, the decoder uses frame loss correction mechanisms to try to substitute the missing signal with a signal reconstructed from information available at the decoder (for example, the audio signal already decoded for one or more past frames). This technique can maintain quality of service despite degraded network performance.
Frame loss correction techniques are most often highly dependent on the type of coding used.
In the case of CELP coding, it is common to repeat certain parameters decoded in the previous frame (spectral envelope, pitch, codebook gains), with adjustments such as modifying the spectral envelope to converge towards an average envelope, or using a random fixed codebook.
The technique most commonly used to correct frame loss in the case of transform coding consists in repeating the last received frame if a frame is lost, and in setting the repeated frame to zero as soon as more than one frame is lost. This technique is found in several standardized codecs (G.719, G.722.1, G.722.1C).
One can also cite the case of standardized G.711 coding, for which an example of frame loss correction, described in Appendix I of G.711, consists in identifying a fundamental period (called the "pitch") in the already decoded signal and repeating it, taking care to perform an overlap-add between the already decoded signal and the repeated signal. This overlap-add "smooths out" audio artifacts, but its implementation requires an additional delay at the decoder (corresponding to the duration of the overlap). Moreover, in the case of G.722.1 coding, a Modulated Lapped Transform (MLT) with a 50% overlap-add and sinusoidal windows ensures a transition between the last lost frame and the repeated frame that is slow enough to erase the artifacts linked to simple frame repetition, in the case of a single lost frame. Unlike the frame loss correction described in G.711 (Appendix I), this approach requires no additional delay, since it exploits the existing delay and the time-domain aliasing of the MLT transform to perform an overlap-add with the reconstructed signal. This technique is very inexpensive, but its main defect is an inconsistency between the signal decoded before the frame loss and the repeated signal. This results in a phase discontinuity that can produce significant audio artifacts if the overlap duration between the two frames is short, as is the case when the windows used for the MLT transform are "low delay", as described in document FR 1350845 with reference to Figures 1A and 1B of that document. In that case, even a solution combining a pitch search as in the G.711 coder (Appendix I) with an overlap-add according to the MLT window is not sufficient to suppress the audio artifacts. Document FR 1350845 proposes a hybrid method that combines the advantages of the two methods while making it possible to maintain phase continuity in the transform domain. The present invention falls within this framework. A detailed description of the solution that is the subject of document FR 1350845 is given below with reference to FIG. 1. This solution, even if particularly promising, remains to be perfected because, when the coded signal has only one fundamental period ("mono-pitch"), such as a voiced segment of a speech signal, the audio quality after lost-frame correction can be degraded and worse than with a frame loss correction based on a speech model of the CELP ("Code-Excited Linear Prediction") type, for example.
The invention improves the situation.
To that end, it proposes a method for processing a digital audio signal comprising a succession of samples distributed in successive frames, the method being implemented during decoding of said signal in order to replace at least one signal frame lost at decoding.
The method comprises the following steps (a minimal sketch of steps b) and c) is given after this list):
a) searching, in a valid signal segment available at decoding, for at least one period in the signal, determined as a function of said valid signal;
b) analyzing the signal within said period, to determine spectral components of the signal in said period;
c) synthesizing at least one replacement frame for the lost frame, by constructing a synthesis signal from:
- an addition of components selected from among said determined spectral components, and
- a noise added to the addition of components.
In particular, the amount of noise added to the addition of components is weighted as a function of voicing information of the valid signal, obtained at decoding.
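As a rough illustration of steps b) and c), here is a minimal sketch in Python with NumPy. The function names, the amplitude normalization and the noise handling are assumptions for illustration, not the exact processing of the invention (DC and Nyquist scaling are ignored for brevity).

```python
import numpy as np

def analyze_pitch_segment(p):
    """Step b): DFT of one pitch period, giving amplitude, phase and
    normalized frequency for each spectral component."""
    spec = np.fft.rfft(p)
    amp = np.abs(spec) / len(p)
    phase = np.angle(spec)
    freq = np.arange(len(spec)) / len(p)   # cycles per sample
    return amp, phase, freq

def synthesize_frame(amp, phase, freq, selected, frame_len, noise, gain):
    """Step c): sum of the selected sinusoids, plus gain-weighted noise."""
    n = np.arange(frame_len)
    s = np.zeros(frame_len)
    for k in selected:
        s += 2.0 * amp[k] * np.cos(2.0 * np.pi * freq[k] * n + phase[k])
    return s + gain * noise[:frame_len]
```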
Advantageously, the voicing information used at decoding, transmitted at at least one coder bit rate, makes it possible to give more weight to the sinusoidal components of the past signal if that signal is voiced, or more weight to the noise otherwise, which gives a much more satisfying audible result. Indeed, in the case of an unvoiced signal, or of a music signal, it is not useful to keep as many components for synthesizing the signal that replaces the lost frame; in that case, more weight can be given to the noise injected for the synthesis. The complexity of the processing is then advantageously reduced, in particular for an unvoiced signal, without degrading the quality of the synthesis. In an embodiment where a noise signal is added to the components, this noise signal is weighted by a smaller gain when the valid signal is voiced. For example, this noise signal can be obtained from the previously received frame as a residue between the received signal and the addition of the selected components (see the sketch below).
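A sketch of this residue-based noise signal and of its voicing-dependent weighting (the binary gain values 0.25 and 1 are taken from the single-bit embodiment described below; the function itself is illustrative):

```python
import numpy as np

def weighted_noise(pitch_segment, sinusoidal_part, voiced):
    """Noise signal = residue between the received segment and the sum of the
    selected components, weighted by a smaller gain when the signal is voiced."""
    residue = np.asarray(pitch_segment) - np.asarray(sinusoidal_part)[:len(pitch_segment)]
    gain = 0.25 if voiced else 1.0
    return gain * residue
```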
In a complementary or alternative embodiment, the number of components selected for the addition is greater when the valid signal is voiced. Thus, if the signal is voiced, more account is taken of the spectrum of the past signal, as indicated above.
Advantageously, a complementary embodiment may be chosen in which more components are selected if the signal is voiced, while minimizing the gain to be applied to the noise signal. The overall amount of energy attenuated by applying a gain smaller than 1 to the noise signal is thus partially offset by the selection of more components. Conversely, the gain applied to the noise signal is not decreased, and fewer components are selected, if the signal is unvoiced or only weakly voiced.
It is furthermore possible to improve the quality/complexity trade-off at decoding: in step a), the aforementioned period can be sought in a valid signal segment of greater duration when the valid signal is voiced. In an exemplary embodiment presented in the detailed description below, a search is carried out, by correlation in the valid signal, for a repetition period typically corresponding to at least one pitch period if the signal is voiced; in that case, especially for male voices, the pitch search can be carried out over more than 30 milliseconds, for example.
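A sketch of such a correlation-based period search with a voicing-dependent search duration (Python/NumPy; the 6 ms target length and the 33/28 ms durations follow the examples in this text, while the function structure is an assumption):

```python
import numpy as np

def find_period(b, fs, voiced, target_ms=6):
    """Find the lag maximizing the normalized correlation between the last
    target_ms of the past signal b and earlier segments; search over a longer
    history when the signal is voiced."""
    ns = int(fs * target_ms / 1000)              # target segment size Ns
    search_ms = 33 if voiced else 28             # voicing-dependent duration
    nc = min(int(fs * search_ms / 1000), len(b) - ns)
    target = b[-ns:]
    best_n, best_corr = len(b) - ns - 1, -np.inf
    for n in range(len(b) - ns - nc, len(b) - ns):
        seg = b[n:n + ns]
        denom = np.sqrt(np.sum(seg ** 2) * np.sum(target ** 2)) + 1e-12
        c = np.dot(seg, target) / denom
        if c > best_corr:
            best_corr, best_n = c, n
    return (len(b) - ns) - best_n                # candidate period in samples
```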
In an optional embodiment, the voicing information is provided in a coded stream received at decoding and corresponding to the aforementioned signal comprising a succession of samples distributed in successive frames. In case of frame loss at decoding, the voicing information contained in a valid signal frame preceding the lost frame is then used.
Thus, the voicing information comes from an encoder that generates the coded stream and determines the voicing information; in a particular embodiment, the voicing information is coded on a single bit in the coded stream. Nevertheless, as an exemplary embodiment, the generation of this voicing data at the encoder may be conditioned on whether or not the bit rate on the communication network between the encoder and the decoder is sufficient. For example, if the rate is below a threshold, this voicing data is not transmitted by the encoder, in order to save bandwidth. In that case, purely by way of example, the last voicing information acquired at the decoder can be used for the frame synthesis, or alternatively it may be decided to assume the unvoiced case for the frame synthesis. In the embodiment where the voicing information is coded on a single bit in the coded stream, the value taken by the gain applied to the noise signal can also be binary: if the signal is voiced, the gain is set to 0.25, and it is 1 otherwise.
In a variant, the voicing information comes from an encoder that determines a flatness or harmonicity value of the spectrum (obtained, for example, by comparing the amplitudes of the spectral components of the signal with a background noise), the encoder then delivering this value in binary form in the coded stream (on more than one bit).
In such a variant, the value of the gain may be a function of the aforementioned flatness value (for example, varying continuously and increasingly as a function of this value).
In general, said flatness value can be compared to a threshold to determine:
- that the signal is voiced if the flatness value is below the threshold, and
- that the signal is not voiced otherwise
(which amounts to characterizing the voicing in a binary manner). A sketch of such a flatness measure and of the threshold decision is given below.
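The sketch below illustrates one possible flatness measure and the binary decision (the dB formulation as a geometric-to-arithmetic mean ratio and the default threshold are assumptions, chosen to be consistent with the numerical examples given later in this text):

```python
import numpy as np

def spectral_flatness_db(x):
    """Flatness Pl of an amplitude spectrum x: 0 dB for a flat spectrum,
    increasingly negative for a spectrum with sharp peaks."""
    x = np.maximum(np.abs(x), 1e-12)
    geometric = np.exp(np.mean(np.log(x)))
    arithmetic = np.mean(x)
    return 10.0 * np.log10(geometric / arithmetic)

def is_voiced(pl_db, threshold_db=-3.0):
    """Binary voicing decision: a peaky (low-flatness) spectrum is voiced."""
    return pl_db < threshold_db
```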
Thus, in the single-bit embodiment as in its variant, the criteria for selecting the components and/or for choosing the duration of the signal segment in which the pitch is sought may be binary.
For example, for the selection of components (see the sketch after this list):
- if the signal is voiced, the spectral components whose amplitudes are greater than those of the first neighboring spectral components are selected, together with those first neighboring components, and
- otherwise, only the spectral components whose amplitudes are greater than those of the first neighboring spectral components are selected.
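A sketch of this binary selection rule (illustrative; edge bins are handled crudely):

```python
def select_components(amp, voiced):
    """Keep local maxima of the amplitude spectrum; when the signal is voiced,
    also keep their first neighbors so more energy stays in the sinusoids."""
    peaks = [k for k in range(1, len(amp) - 1)
             if amp[k] > amp[k - 1] and amp[k] > amp[k + 1]]
    if not voiced:
        return peaks
    selected = set()
    for k in peaks:
        selected.update((k - 1, k, k + 1))   # peak plus first neighbors
    return sorted(selected)
```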
For the choice of the pitch search segment duration, for example:
- if the signal is voiced, the period is sought in a valid signal segment of duration greater than 30 milliseconds (for example 33 milliseconds),
- and, otherwise, the period is sought in a valid signal segment of duration less than 30 milliseconds (for example 28 milliseconds).
Thus, the invention aims to improve the state of the art in the sense of document FR 1350845 by modifying various stages of the processing presented in that document (pitch search, selection of components, noise injection), as a function, in particular, of characteristics of the original signal.
These characteristics of the original signal may be coded as specific information in the data stream towards the decoder (the "bitstream"), depending on the speech/music classification and, where applicable, on the speech class in particular.
This information in the stream at decoding makes it possible to optimize the trade-off between complexity and quality and, jointly, to (a combined parameter-selection sketch is given after this list):
- modify the gain of the noise to be injected into the sum of the spectral components selected to construct the synthesis signal replacing the lost frame,
- modify the number of components selected for the synthesis,
- modify the duration of the pitch search segment.
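Putting the three adjustments together, a decoder might derive its concealment parameters from the voicing bit as in the sketch below. The numerical values follow the examples given in this text; grouping them in one structure is purely an illustrative choice.

```python
def concealment_params(voiced):
    """Jointly choose the noise gain, the peak-selection rule and the pitch
    search duration from the voicing information of the last valid frame."""
    if voiced:
        return {"noise_gain": 0.25, "keep_peak_neighbors": True, "search_ms": 33}
    return {"noise_gain": 1.0, "keep_peak_neighbors": False, "search_ms": 28}
```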
Such an embodiment can be implemented in an encoder, for the determination of the voicing information, and more particularly in a decoder, in particular in the case of frame loss. It can be implemented in software form within a coding/decoding implementation for Enhanced Voice Services ("EVS") specified by the 3GPP (SA4) group. As such, the present invention also relates to a computer program comprising instructions for implementing the above method when the program is executed by a processor. An example flowchart of such a program is presented in the detailed description below, with reference to FIG. 4 for the decoding and to FIG. 3 for the coding.
The present invention also relates to a device for decoding a digital audio signal comprising a succession of samples distributed in successive frames. The device comprises means (such as a processor and a memory, or an ASIC component or other circuit) for replacing at least one lost signal frame by:
a) searching, in a valid signal segment available at decoding, for at least one period in the signal, determined as a function of said valid signal;
b) analyzing the signal within said period, to determine spectral components of the signal in said period;
c) synthesizing at least one replacement frame for the lost frame, by constructing a synthesis signal from:
- an addition of components selected from among said determined spectral components, and
- a noise added to the addition of components,
the amount of noise added to the addition of components being weighted as a function of voicing information of the valid signal, obtained at decoding.
Similarly, the present invention also relates to a device for coding a digital audio signal, comprising means (such as a memory and a processor, or an ASIC component or other circuit) for providing a voicing information in a coded stream delivered by the coding device, by distinguishing a speech signal, which may be voiced, from a music signal and, in the case of a speech signal, by:
- identifying that the signal is voiced or generic, in order to consider it as globally voiced, or
- identifying that the signal is inactive, transient or unvoiced, in order to consider it as globally unvoiced.
Other features and advantages of the invention will become apparent on examining the detailed description below and the appended drawings, in which:
- FIG. 1 recalls the main steps of the frame loss correction method in the sense of document FR 1350845;
- FIG. 2 schematically illustrates the main steps of a method in the sense of the invention;
- FIG. 3 illustrates an example of steps implemented at coding, in one embodiment in the sense of the invention;
- FIG. 4 illustrates an example of steps implemented at decoding, in one embodiment in the sense of the invention;
- FIG. 5 illustrates an example of steps implemented at decoding for the pitch search in a valid signal segment Nc;
- FIG. 6 schematically illustrates an example of encoder and decoder devices in the sense of the invention.
Reference is made to FIG. 1, which illustrates the main steps described in document FR 1350845. In a first step S1, a succession of N audio samples, denoted b(n) below, is stored in a buffer memory of the decoder. These samples correspond to samples already decoded and are therefore available at the decoder for frame loss correction. If the first sample to be synthesized is sample N, the audio buffer corresponds to the preceding samples 0 to N-1. In the case of transform coding, the audio buffer corresponds to the samples of the previous frame; they cannot be modified, because this type of coding/decoding introduces no delay in the rendering of the signal, so that no cross-fade of sufficient duration to cover a frame loss can be performed.
Then a frequency filtering step S2 is performed, during which the audio buffer b(n) is separated into two bands, a low band BB and a high band BH, with a separation frequency denoted Fc (for example Fc = 4 kHz). This filtering is preferably a zero-delay filtering. The size of the audio buffer is then reduced to N' = N*Fc/fs following the decimation from fs to Fc. In variants of the invention, this filtering step may be optional, the following steps then being carried out over the full band. The next step S3 consists in searching the low band for a loopback point and a segment p(n) corresponding to the fundamental period (or "pitch" hereinafter) within the buffer b(n) resampled at the frequency Fc. This makes it possible to take the continuity of the pitch into account in the lost frame(s) to be reconstructed.
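By way of illustration, the S2 split can be sketched as follows; this is a minimal sketch assuming a numpy buffer b at rate fs, and the polyphase resampler is an illustrative choice, not the filter actually used in the patent:

```python
import numpy as np
from scipy.signal import resample_poly

def split_bands(b, fs=16000, fc=4000):
    # Low band: decimated from fs to Fc, so len(low) = N' = N * Fc / fs.
    low = resample_poly(b, up=fc, down=fs)
    # High band: complement obtained by re-upsampling the low band and
    # subtracting it from the original buffer (one zero-delay-style option).
    high = b - resample_poly(low, up=fs, down=fc)[: len(b)]
    return low, high
```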
Step S4 consists in decomposing the segment p(n) into a sum of sinusoidal components. For example, the discrete Fourier transform (DFT) of the signal p(n) can be computed over a duration corresponding to the length of the signal. The frequency, phase and amplitude of each of the sinusoidal components (or "peaks") making up the signal are thus obtained. Transforms other than the DFT are possible; for example, DCT, MDCT or MCLT transforms may be used.
Step S5 is a step of selecting K sinusoidal components so as to keep only the most significant ones. In a particular embodiment, the selection first consists in selecting the amplitudes A(n) for which A(n) > A(n-1) and A(n) > A(n+1), with n ∈ [0; P'/2 - 1], which ensures that the amplitudes correspond to spectral peaks.
To this end, the samples of the segment p(n) (pitch) are interpolated so as to obtain a segment p'(n) made up of P' samples, with P' = 2^ceil(log2(P)) ≥ P, where ceil(x) is the smallest integer greater than or equal to x. The FFT analysis is thus carried out more efficiently, over a length that is a power of 2, without modifying the effective pitch period (thanks to the interpolation). The FFT of p'(n) is computed: Π(k) = FFT(p'(n)), and the phases φ(k) and amplitudes A(k) of the sinusoidal components, with frequencies normalized between 0 and 1, are obtained directly from this transform.
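A minimal sketch of this analysis, assuming linear interpolation and numpy conventions (the patent does not mandate a particular interpolator, and the frequency normalization shown is an assumed convention):

```python
import numpy as np

def analyse_pitch_segment(p):
    P = len(p)
    P2 = 1 << int(np.ceil(np.log2(P)))     # P' = 2^ceil(log2(P)) >= P
    p2 = np.interp(np.linspace(0.0, P - 1, P2), np.arange(P), p)
    spec = np.fft.fft(p2)
    amp = np.abs(spec)[: P2 // 2]          # A(k), half spectrum
    phase = np.angle(spec)[: P2 // 2]      # phi(k)
    freq = 2.0 * np.arange(P2 // 2) / P2   # normalized: 1.0 at Nyquist (assumed)
    return amp, phase, freq
```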
Then, among the amplitudes of this first selection, the components are selected in decreasing order of amplitude, in such a way that the cumulative amplitude of the selected peaks is at least x% (for example x = 70) of the cumulative amplitude over, typically, half of the spectrum of the current frame. It is also possible, in addition, to limit the number of components (for example to 20) so as to make the synthesis less complex.
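The cumulative-amplitude selection can be sketched as follows; x = 70% and the cap of 20 components are the example values from the text, and the function name is illustrative:

```python
import numpy as np

def select_peaks(amp, x=0.70, max_peaks=20):
    # First selection: local maxima A(n) > A(n-1) and A(n) > A(n+1).
    n = np.arange(1, len(amp) - 1)
    candidates = n[(amp[n] > amp[n - 1]) & (amp[n] > amp[n + 1])]
    # Keep peaks in decreasing amplitude order until they carry at least
    # x of the cumulative amplitude of the (half-)spectrum, capped in number.
    order = candidates[np.argsort(amp[candidates])[::-1]]
    target, total = x * amp.sum(), 0.0
    kept = []
    for k in order[:max_peaks]:
        kept.append(int(k))
        total += amp[k]
        if total >= target:
            break
    return sorted(kept)
```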
The sinusoidal synthesis step S6 consists in generating a segment s(n) of length at least equal to the size of the lost frame (T). The synthesis signal s(n) is computed as a sum of the selected sinusoidal components:

s(n) = Σ_k A(k)·cos(ω(k)·n + φ(k))

where k runs over the indices of the K peaks selected in step S5 and ω(k) is the angular frequency corresponding to the normalized frequency of peak k. Step S7 consists in "injecting noise" (filling the spectral zones corresponding to the unselected lines) so as to compensate for the energy loss due to the omission of certain frequency peaks in the low band. A particular embodiment consists in computing the residual r(n) between the segment corresponding to the pitch, p(n), and the synthesized signal s(n):

r(n) = p(n) - s(n), n ∈ [0; P - 1]
This residual of size P is then transformed, for example windowed and repeated with overlaps between windows of variable sizes, as described in document FR 1353551, yielding a noise signal r'(n) defined over at least two frame durations.
The signal s(n) is then combined with the noise signal r'(n):

s'(n) = s(n) + r'(n)
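Steps S6 and S7 can be sketched as below; the cosine form built from the amplitude/phase/frequency triplets of step S4 is an assumed reading, `freq` follows the normalization of the analysis sketch above, and amplitude scaling for a real signal is left implicit:

```python
import numpy as np

def sinusoidal_synthesis(amp, phase, freq, kept, length):
    # s(n): sum of the selected sinusoids over at least one lost-frame length.
    # amp is assumed already scaled for a real sinusoid (e.g. 2/P' per bin).
    n = np.arange(length)
    s = np.zeros(length)
    for k in kept:
        s += amp[k] * np.cos(np.pi * freq[k] * n + phase[k])  # pi*f since 1.0 = Nyquist
    return s

def residual(p, s):
    # r(n) = p(n) - s(n) over the pitch segment, n in [0; P-1].
    return p - s[: len(p)]
```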
Step S8, applied to the high band, may simply consist in repeating the past signal.
In a step S9, the signal is synthesized by resampling the low band to its original sampling frequency, after it has been mixed in step S8 with the filtered high band (simply repeated in step S11). Step S10 is an overlap-add, which ensures continuity between the signal preceding the frame loss and the synthesized signal. The elements added to the method of FIG. 1 in an embodiment in the sense of the invention are now described. According to the general approach shown in FIG. 2, a voicing information of the signal before the frame loss, transmitted at at least one encoder bit rate, is used at decoding (step DI-1) to determine quantitatively a proportion of noise to be added to the synthesis signal replacing one or more lost frames. Thus the decoder uses the voicing information to reduce, as a function of the voicing, the overall amount of noise mixed with the synthesis signal (by assigning a smaller gain G(res) to the noise signal r'(k) derived from a residual in step DI-3, and/or by selecting more amplitude components A(k) for the construction of the synthesis signal in step DI-4).
The decoder can further adjust its parameters, notably for the pitch search, in order to optimize the quality/complexity trade-off of the processing as a function of the voicing information. For example, for the pitch search, if the signal is voiced, the pitch search window Nc can be made larger (in step DI-5), as will be seen below with reference to FIG. 5.
For the determination of the voicing, an information can be provided by the encoder, at at least one encoder bit rate, in two ways:
- in the form of a bit set to 1 or 0 according to a degree of voicing identified at the encoder (received from the encoder in step DI-1 and read in step DI-2 in case of frame loss, for the subsequent processing), or
- in the form of a value of the average amplitude of the peaks making up the signal at coding, compared with a background noise.
This spectral "flatness" datum Pl can be received over several bits at the decoder, in the optional step DI-10 of FIG. 2, and then compared with a threshold in step DI-11, which amounts to determining, in steps DI-1 and DI-2, whether the voicing is above or below a threshold, and to deducing the appropriate processing therefrom, in particular for the peak selection and for the choice of the duration of the pitch search segment.
This information (whether in the form of a single bit or of a multi-bit value) is received from the encoder (at at least one codec bit rate) in the example described here. Indeed, with reference to FIG. 3, at the encoder, the input signal, presented in the form of frames C1, is analyzed in step C2. The analysis step consists in determining whether the audio signal of the current frame has characteristics that would require a particular processing in case of frame loss at the decoder, as is for example the case for voiced speech signals.
In a particular embodiment, a classification (speech/music or other) already performed at the encoder is advantageously reused, so as not to increase the overall processing complexity. Indeed, in the case of encoders switching between speech and music coding modes, a classification at the encoder already makes it possible to adapt the coding technique to the nature of the signal (speech or music). Likewise, in the case of speech, predictive coders, such as the coder according to the G.718 standard, also use a classification so as to adapt the coder parameters to the nature of the signal (voiced/unvoiced, transient, generic, inactive sounds).
In a first particular embodiment, a single bit of "characterization for frame loss" is reserved. It is added to the coded stream (or "bitstream") in step C3 to indicate whether the signal is a speech signal (voiced or generic). This bit is for example set to 1 or 0 according to the cases of the table below, as a function:
- of the decision of the speech/music classifier,
- and, in addition, of the decision of the speech coding mode classifier.
Decision of the encoder classifier        Value of the characterization bit for frame loss
Music                                     0
Speech, by coding mode:
    Voiced                                1
    Unvoiced                              0
    Transient                             0
    Generic                               1
    Inactive                              0
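A minimal sketch of this mapping (the function name and its inputs are illustrative, not from the patent):

```python
def frame_loss_bit(is_music, speech_mode=None):
    # Music, or inactive/transient/unvoiced speech -> 0;
    # voiced or generic speech -> 1, per the table above.
    if is_music:
        return 0
    return 1 if speech_mode in ("voiced", "generic") else 0
```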
"Generic" is understood here to mean an ordinary speech signal (one that is not a transient linked to the pronunciation of a plosive, that is not inactive, and that is not necessarily purely voiced, such as the pronunciation of a vowel without a consonant). In a second, alternative embodiment, the information transmitted to the decoder in the coded stream is not binary but corresponds to a quantization of the ratio between the peak levels and the valley levels in the spectrum. This ratio can be expressed by a measure of "flatness" of the spectrum, denoted Pl.
The flatness measure is computed from x(k), the amplitude spectrum of size N resulting from the analysis of the current frame in the frequency domain (after FFT).
Alternatively, when a sinusoidal analysis decomposing the signal at the encoder into sinusoidal components and noise is available, the flatness measure is obtained as the ratio between the sinusoidal components and the overall energy over the frame.
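One common way to obtain such a flatness value, consistent with the landmarks used later in the text (about 0 dB for a flat spectrum, strongly negative for pronounced peaks), is the dB ratio of the geometric to the arithmetic mean of x(k). The sketch below assumes that expression; the patent's exact formula is not reproduced here:

```python
import numpy as np

def spectral_flatness_db(x, eps=1e-12):
    x = np.maximum(np.abs(x), eps)        # amplitude spectrum x(k), size N
    geometric = np.exp(np.mean(np.log(x)))
    arithmetic = np.mean(x)
    return 10.0 * np.log10(geometric / arithmetic)  # 0 dB flat, < 0 peaky
```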
Following step C3 (carrying the voicing information on a single bit, or the flatness measure over several bits), the audio buffer of the encoder is conventionally coded in a step C4 before possible subsequent transmission to the decoder.
Reference is now made to FIG. 4 to describe the steps implemented at the decoder in an exemplary embodiment of the invention.
If no frame is lost in step D1 (arrow KO at the output of test D1 in FIG. 4), the decoder reads the information contained in the coded stream, including the "characterization for frame loss" information, in step D2 (at at least one codec bit rate). The latter is stored in memory so that it can be reused in case a subsequent frame is missing. The decoder then continues with the conventional decoding steps D3, etc., so as to obtain the synthesized output frame FR SYNTH. When a frame loss occurs (arrow OK at the output of test D1), steps D4, D5, D6, D7, D8 and D12 are applied, corresponding respectively to steps S2, S3, S4, S5, S6 and S11 of FIG. 1. However, some modifications are made with respect to steps S3 and S5, namely in steps D5 (search for a loopback point for the determination of the pitch) and D7 (selection of the sinusoidal components). Moreover, the noise injection of step S7 of FIG. 1 is carried out with a gain determination over two steps, D9 and D10, in FIG. 4 of the decoder in the sense of the invention. Indeed, when the "characterization for frame loss" information is known (the previous frame having been received), the invention consists in modifying the processing of steps D5, D7 and D9-D10 as follows.
In a first exemplary embodiment, the "characterization for frame loss" information is binary, with a value:
- equal to 0 for an unvoiced signal, a music-type signal or a transient-type signal,
- equal to 1 otherwise (table above).
Step D5 consists in searching for a loopback point and a segment p(n) corresponding to the pitch within the audio buffer resampled at the frequency Fc. This technique, described in document FR 1350845, is illustrated in FIG. 5, in which:
- the audio buffer at the decoder has a size of N' samples,
- a target buffer BC of Ns samples is determined,
- the correlation search is carried out over Nc samples,
- the correlation curve "Correl" has a maximum at mc,
- the loopback point, denoted Pt Boucl, is located Ns samples after the correlation maximum,
- the pitch is then determined over the p(n) samples remaining up to N'-1.
In particular, a normalized correlation corr(n) is computed between the target buffer segment of size Ns lying between N'-Ns and N'-1 (with a duration of, for example, 6 ms) and the sliding segment of size Ns starting between sample 0 and sample Nc (with Nc > N'-Ns):

corr(n) = [ Σ_{k=0..Ns-1} b(n+k)·b(N'-Ns+k) ] / sqrt( [ Σ_{k=0..Ns-1} b(n+k)² ]·[ Σ_{k=0..Ns-1} b(N'-Ns+k)² ] ), n ∈ [0; Nc]
For music signals, given the nature of the signal, the value Nc does not need to be very large (for example Nc = 28 ms). This limitation saves computational complexity during the pitch search.
Conversely, the voicing information of the last validly received frame makes it possible to determine whether the signal to be reconstructed is a voiced speech signal (single pitch). In that case, thanks to this information, the size of the segment Nc can be increased (for example Nc = 33 ms) so as to optimize the pitch search (and potentially find a higher correlation value).
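A sketch of this search; the 28 ms / 33 ms values come from the text, the indexing follows FIG. 5 as described above, and the function name is illustrative:

```python
import numpy as np

def pitch_search(b, ns, fc=4000, voiced=False):
    nc = int((0.033 if voiced else 0.028) * fc)   # Nc enlarged for voiced speech
    target = b[-ns:]                              # target buffer: b[N'-Ns .. N'-1]
    e_t = np.dot(target, target)
    corr = np.empty(nc + 1)
    for n in range(nc + 1):                       # sliding segment starts in [0; Nc]
        seg = b[n:n + ns]
        corr[n] = np.dot(seg, target) / np.sqrt(np.dot(seg, seg) * e_t + 1e-12)
    mc = int(np.argmax(corr))                     # correlation maximum
    loop_point = mc + ns                          # "Pt Boucl", Ns samples further
    return loop_point, b[loop_point:]             # p(n): samples up to N'-1
```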
Moreover, in step D7 of FIG. 4, sinusoidal components are selected so as to keep only the most significant ones. In a particular embodiment, also presented in document FR 1350845, the first selection of components consists in selecting the amplitudes A(n) for which A(n) > A(n-1) and A(n) > A(n+1).
In the case of the invention, it is advantageously known whether the signal to be reconstructed is a speech signal (voiced or generic), hence with marked peaks and a low noise level. Under these conditions, it is preferable to select not only the peaks A(n) for which A(n) > A(n-1) and A(n) > A(n+1), as presented above, but also to widen the selection to A(n-1) and A(n+1), so that the selected peaks represent a large share of the total energy of the spectrum. This modification notably makes it possible to lower the noise level (and in particular the level of the noise injected in steps D9 and D10, presented below) relative to the level of the signal synthesized by sinusoidal synthesis in step D8, while keeping an overall energy level sufficient to avoid audible artifacts linked to energy fluctuations.
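The widening can be sketched as follows, starting from a peak list such as the one produced by the selection sketch above:

```python
def widen_selection(peaks, spectrum_len, voiced):
    # For voiced/generic speech, also keep the first neighbours A(n-1), A(n+1)
    # of each selected peak so that more of the spectrum's energy is retained.
    if not voiced:
        return list(peaks)
    widened = set()
    for n in peaks:
        widened.update(k for k in (n - 1, n, n + 1) if 0 <= k < spectrum_len)
    return sorted(widened)
```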
Furthermore, when the signal is free of noise (at least in the low frequencies), as is the case for a voiced or generic speech signal, it is observed that adding the noise corresponding to the transformed residual r'(n), in the sense of document FR 1350845, actually degrades the quality.
Thus, the voicing information is advantageously used here to attenuate the noise, by applying a gain G to it in step D10. The signal s(n) resulting from step D8 is mixed with the noise signal r'(n) resulting from step D9, this time applying a gain G which depends on the "characterization for frame loss" information read from the coded stream of the previous frame:

s'(n) = s(n) + G·r'(n)

In this particular embodiment, G can be a constant equal to 1 or to 0.25 according to the voiced or unvoiced nature of the signal of the previous frame; by way of example, G = 0.25 if the signal is voiced, and G = 1 otherwise.
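A sketch of steps D9/D10 under these example constants (s and r_prime are assumed to be arrays of the same length):

```python
def mix_noise(s, r_prime, voiced_bit):
    # s'(n) = s(n) + G * r'(n), with G driven by the characterization bit
    # of the last valid frame (0.25 if voiced, 1 otherwise, as above).
    g = 0.25 if voiced_bit == 1 else 1.0
    return s + g * r_prime
```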
In the alternative embodiment, the "characterization for frame loss" information has several discrete levels characterizing the flatness Pl of the spectrum, and the gain G can be expressed directly as a function of the value Pl. The same applies to the limit of the segment Nc for the pitch search and/or to the number of peaks A(n) to be taken into account for the synthesis of the signal.
By way of example, a processing can be defined as follows. The gain G is first defined directly as a function of the value Pl. The value Pl is then compared with an average value of -3 dB, it being understood that the value 0 corresponds to a flat spectrum and -5 dB corresponds to a spectrum with pronounced peaks.
If the value Pl is below the average threshold value of -3 dB (thus corresponding to a spectrum with pronounced peaks, typical of a voiced signal), the duration of the pitch search segment Nc can be set to 33 ms, and the peaks A(n) such that A(n) > A(n-1) and A(n) > A(n+1) are selected together with their first neighbors A(n-1) and A(n+1).
Otherwise (if the value Pl is above the threshold, which corresponds to less marked peaks and more background noise, as for example in a music signal), the duration Nc can be chosen shorter, for example 25 ms, and only the peaks A(n) such that A(n) > A(n-1) and A(n) > A(n+1) are selected.
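These choices can be grouped as in the sketch below; the threshold and durations come from the text, while the dictionary keys and the reuse of the 0.25/1 gains from the single-bit example are illustrative assumptions:

```python
def params_from_flatness(pl_db, threshold_db=-3.0):
    if pl_db < threshold_db:   # pronounced peaks: voiced-like signal
        return {"nc_ms": 33, "widen_peaks": True, "gain": 0.25}
    return {"nc_ms": 25, "widen_peaks": False, "gain": 1.0}
```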
The decoding can then continue by mixing the noise, whose gain has thus been obtained, with the components thus selected, so as to obtain the low-frequency synthesis signal in step D13; this is added to the high-frequency synthesis signal obtained in step D14, to obtain, in step D15, the overall synthesized signal.
With reference to FIG. 6, a possible implementation of the invention is illustrated, in which a decoder DECOD (comprising for example software and hardware such as a suitably programmed memory MEM and a processor PROC cooperating with this memory, or alternatively a component such as an ASIC or the like, together with a communication interface COM), installed for example in a telecommunication device such as a telephone TEL, uses, for the implementation of the method of FIG. 4, a voicing information that it receives from an encoder COD. This encoder comprises, for example, software and hardware such as a memory MEM' suitably programmed to determine the voicing information, and a processor PROC cooperating with this memory, or alternatively a component such as an ASIC or the like, together with a communication interface COM'. The encoder COD is installed in a telecommunication device such as a telephone TEL'.
Of course, the present invention is not limited to the embodiments described above by way of example; it extends to other variants. Thus, for example, it will be understood that the voicing information can take various forms. In the example described above, it can be a binary value on a single bit (voicing or not), or else a multi-bit value, which may relate to a parameter such as the flatness of the signal spectrum or to any other parameter making it possible to characterize a voicing (quantitatively or qualitatively). Furthermore, this parameter can be determined at decoding, for example as a function of the degree of correlation that can be measured when identifying the pitch period. Moreover, an embodiment has been presented above, by way of example, comprising a separation of the signal from previous valid frames into a high frequency band and a low frequency band, with in particular a selection of the spectral components in the low frequency band. Nevertheless, this implementation is optional, although advantageous in that it reduces the processing complexity. As a variant, the frame replacement method assisted by the voicing information in the sense of the invention can equally be carried out by considering the entire spectrum of the valid signal.
Furthermore, an exemplary embodiment has been described above in which the invention is implemented in the context of transform coding with overlap-add. Nevertheless, this type of method can be adapted to any other type of coding (CELP in particular).
It should be noted that, in the context of transform coding with overlap-add (in which the synthesis signal is typically constructed over at least two frame durations because of the overlap), the aforementioned noise signal can be obtained from the residual (between the valid signal and the sum of the peaks) by weighting this residual temporally. It can for example be weighted by overlap windows, as in the usual framework of transform coding/decoding with overlap.
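As an illustration, such a temporal weighting could look like the following; the sine overlap window is one conventional choice, the patent's actual windows being those of document FR 1353551:

```python
import numpy as np

def overlap_weight(r, frame_len):
    # Taper a residual-derived noise segment spanning two frame durations
    # with a symmetric overlap window, before the voicing gain is applied.
    length = 2 * frame_len
    w = np.sin(np.pi * (np.arange(length) + 0.5) / length)
    return r[:length] * w
```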
It will then be understood that the application of the gain as a function of the voicing information adds a further weighting, this time as a function of the voicing.

Claims

1. Method for processing a digital audio signal comprising a succession of samples distributed in successive frames, the method being implemented during a decoding of said signal in order to replace at least one signal frame lost at decoding, the method comprising the steps of:
a) searching, in a valid signal segment available at decoding (Nc), for at least one period in the signal, determined as a function of said valid signal,
b) analyzing the signal within said period, in order to determine spectral components of the signal within said period,
c) synthesizing at least one frame replacing the lost frame, by constructing a synthesis signal from:
- an addition of components selected from among said determined spectral components, and
- noise added to the addition of components,
wherein the amount of noise added to the addition of components is weighted as a function of a voicing information of the valid signal, obtained at decoding.
2. Method according to claim 1, characterized in that a noise signal added to the addition of components is weighted by a smaller gain in case of voicing of the valid signal.
3. Method according to claim 2, characterized in that the noise signal is obtained from a residual between the valid signal and the addition of the selected components.
4. Method according to one of the preceding claims, characterized in that the number of components selected for the addition is larger in case of voicing of the valid signal.
5. Method according to one of the preceding claims, characterized in that, in step a), the period is searched for in a valid signal segment (Nc) of longer duration in case of voicing of the valid signal.
6. Method according to one of the preceding claims, characterized in that the voicing information is provided in a coded stream, received at decoding, corresponding to said signal comprising a succession of samples distributed in successive frames,
and in that, in case of frame loss at decoding, use is made of the voicing information contained in a valid signal frame preceding the lost frame.
7. Method according to claim 6, characterized in that the voicing information comes from an encoder delivering the coded stream and determining the voicing information, and in that the voicing information is coded on a single bit in the coded stream.
8. Method according to claim 7, taken in combination with claim 2, characterized in that, if the signal is voiced, the value of the gain is 0.25, and it is 1 otherwise.
9. Method according to claim 6, characterized in that the voicing information comes from an encoder determining a spectral flatness value (Pl), obtained by comparing the amplitudes of the spectral components of the signal with a background noise, the encoder delivering said value in binary form in the coded stream.
10. Method according to claim 9, taken in combination with claim 2, characterized in that the value of the gain is a function of said flatness value.
11. Method according to one of claims 9 and 10, characterized in that said flatness value is compared with a threshold in order to determine:
- that the signal is voiced if the flatness value is below the threshold, and
- that the signal is not voiced otherwise.
12. Method according to one of claims 7 and 11, taken in combination with claim 4, characterized in that:
- if the signal is voiced, the spectral components whose amplitudes are greater than those of the first neighboring spectral components are selected, together with the first neighboring spectral components, and
- otherwise, only the spectral components whose amplitudes are greater than those of the first neighboring spectral components are selected.
13. Method according to one of claims 7 and 11, taken in combination with claim 5, characterized in that:
- if the signal is voiced, the period is searched for in a valid signal segment of duration greater than 30 milliseconds,
- and, otherwise, the period is searched for in a valid signal segment of duration less than 30 milliseconds.
14. Computer program, characterized in that it comprises instructions for implementing the method according to one of claims 1 to 13 when this program is executed by a processor.
15. Device for decoding a digital audio signal comprising a succession of samples distributed in successive frames, the device comprising means (MEM, PROC) for replacing at least one lost signal frame by:
a) searching, in a valid signal segment available at decoding (Nc), for at least one period in the signal, determined as a function of said valid signal,
b) analyzing the signal within said period, in order to determine spectral components of the signal within said period,
c) synthesizing at least one frame replacing the lost frame, by constructing a synthesis signal from:
- an addition of components selected from among said determined spectral components, and
- noise added to the addition of components,
the amount of noise added to the addition of components being weighted as a function of a voicing information of the valid signal, obtained at decoding.
16. Device for coding a digital audio signal, comprising means (MEM', PROC) for providing a voicing information in a coded stream delivered by the coding device, by distinguishing a speech signal, which may be voiced, from a music signal and, in the case of a speech signal, by:
- identifying that the signal is voiced or generic, in order to consider it as globally voiced, or
- identifying that the signal is inactive, transient or unvoiced, in order to consider it as globally unvoiced.
EP15725801.3A 2014-04-30 2015-04-24 Improved frame loss correction with voice information Active EP3138095B1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR1453912A FR3020732A1 (en) 2014-04-30 2014-04-30 PERFECTED FRAME LOSS CORRECTION WITH VOICE INFORMATION
PCT/FR2015/051127 WO2015166175A1 (en) 2014-04-30 2015-04-24 Improved frame loss correction with voice information

Publications (2)

Publication Number Publication Date
EP3138095A1 2017-03-08
EP3138095B1 EP3138095B1 (en) 2019-06-05

Family

ID=50976942

Family Applications (1)

Application Number Title Priority Date Filing Date
EP15725801.3A Active EP3138095B1 (en) 2014-04-30 2015-04-24 Improved frame loss correction with voice information

Country Status (12)

Country Link
US (1) US10431226B2 (en)
EP (1) EP3138095B1 (en)
JP (1) JP6584431B2 (en)
KR (3) KR20170003596A (en)
CN (1) CN106463140B (en)
BR (1) BR112016024358B1 (en)
ES (1) ES2743197T3 (en)
FR (1) FR3020732A1 (en)
MX (1) MX368973B (en)
RU (1) RU2682851C2 (en)
WO (1) WO2015166175A1 (en)
ZA (1) ZA201606984B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3020732A1 (en) * 2014-04-30 2015-11-06 Orange IMPROVED FRAME LOSS CORRECTION WITH VOICE INFORMATION
EP3389043A4 (en) * 2015-12-07 2019-05-15 Yamaha Corporation Speech interacting device and speech interacting method
CA3145047A1 (en) * 2019-07-08 2021-01-14 Voiceage Corporation Method and system for coding metadata in audio streams and for efficient bitrate allocation to audio streams coding
CN111883171B (en) * 2020-04-08 2023-09-22 珠海市杰理科技股份有限公司 Audio signal processing method and system, audio processing chip and Bluetooth device

Family Cites Families (42)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR1350845A (en) 1962-12-20 1964-01-31 Classification process visible without index
FR1353551A (en) 1963-01-14 1964-02-28 Window intended in particular to be mounted on trailers, caravans or similar installations
US5504833A (en) * 1991-08-22 1996-04-02 George; E. Bryan Speech approximation using successive sinusoidal overlap-add models and pitch-scale modifications
US5956674A (en) * 1995-12-01 1999-09-21 Digital Theater Systems, Inc. Multi-channel predictive subband audio coder using psychoacoustic adaptive bit allocation in frequency, time and over the multiple channels
US5799271A (en) * 1996-06-24 1998-08-25 Electronics And Telecommunications Research Institute Method for reducing pitch search time for vocoder
JP3364827B2 (en) * 1996-10-18 2003-01-08 三菱電機株式会社 Audio encoding method, audio decoding method, audio encoding / decoding method, and devices therefor
WO1999010719A1 (en) * 1997-08-29 1999-03-04 The Regents Of The University Of California Method and apparatus for hybrid coding of speech at 4kbps
EP0932141B1 (en) * 1998-01-22 2005-08-24 Deutsche Telekom AG Method for signal controlled switching between different audio coding schemes
US6640209B1 (en) * 1999-02-26 2003-10-28 Qualcomm Incorporated Closed-loop multimode mixed-domain linear prediction (MDLP) speech coder
US6138089A (en) * 1999-03-10 2000-10-24 Infolio, Inc. Apparatus system and method for speech compression and decompression
US6691092B1 (en) * 1999-04-05 2004-02-10 Hughes Electronics Corporation Voicing measure as an estimate of signal periodicity for a frequency domain interpolative speech codec system
US6912496B1 (en) * 1999-10-26 2005-06-28 Silicon Automation Systems Preprocessing modules for quality enhancement of MBE coders and decoders for signals having transmission path characteristics
US7016833B2 (en) * 2000-11-21 2006-03-21 The Regents Of The University Of California Speaker verification system using acoustic data and non-acoustic data
US20030028386A1 (en) * 2001-04-02 2003-02-06 Zinser Richard L. Compressed domain universal transcoder
JP4089347B2 (en) * 2002-08-21 2008-05-28 沖電気工業株式会社 Speech decoder
US7970606B2 (en) * 2002-11-13 2011-06-28 Digital Voice Systems, Inc. Interoperable vocoder
DE10254612A1 (en) * 2002-11-22 2004-06-17 Humboldt-Universität Zu Berlin Method for determining specifically relevant acoustic characteristics of sound signals for the analysis of unknown sound signals from a sound source
JP2006508386A (en) * 2002-11-27 2006-03-09 コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ Separating sound frame into sine wave component and residual noise
JP3963850B2 (en) * 2003-03-11 2007-08-22 富士通株式会社 Voice segment detection device
US7318035B2 (en) * 2003-05-08 2008-01-08 Dolby Laboratories Licensing Corporation Audio coding systems and methods using spectral component coupling and spectral component regeneration
US7825321B2 (en) * 2005-01-27 2010-11-02 Synchro Arts Limited Methods and apparatus for use in sound modification comparing time alignment data from sampled audio signals
US7930176B2 (en) * 2005-05-20 2011-04-19 Broadcom Corporation Packet loss concealment for block-independent speech codecs
KR100744352B1 (en) * 2005-08-01 2007-07-30 삼성전자주식회사 Method of voiced/unvoiced classification based on harmonic to residual ratio analysis and the apparatus thereof
US7720677B2 (en) * 2005-11-03 2010-05-18 Coding Technologies Ab Time warped modified transform coding of audio signals
US8255207B2 (en) * 2005-12-28 2012-08-28 Voiceage Corporation Method and device for efficient frame erasure concealment in speech codecs
US8135047B2 (en) * 2006-07-31 2012-03-13 Qualcomm Incorporated Systems and methods for including an identifier with a packet associated with a speech signal
BRPI0711094A2 (en) * 2006-11-24 2011-08-23 Lg Eletronics Inc Method and apparatus for encoding and decoding an object-based audio signal
KR100964402B1 (en) * 2006-12-14 2010-06-17 삼성전자주식회사 Method and apparatus for determining encoding mode of audio signal, and method and apparatus for encoding/decoding audio signal using it
US8060363B2 (en) * 2007-02-13 2011-11-15 Nokia Corporation Audio signal encoding
US8990073B2 (en) * 2007-06-22 2015-03-24 Voiceage Corporation Method and device for sound activity detection and sound signal classification
CN100524462C (en) * 2007-09-15 2009-08-05 华为技术有限公司 Method and apparatus for concealing frame error of a high-band signal
US20090180531 (en) * 2008-01-07 2009-07-16 Radlive Ltd. Codec with PLC capabilities
US8036891B2 (en) * 2008-06-26 2011-10-11 California State University, Fresno Methods of identification using voice sound analysis
PL2304723T3 (en) * 2008-07-11 2013-03-29 Fraunhofer Ges Forschung An apparatus and a method for decoding an encoded audio signal
US8718804B2 (en) * 2009-05-05 2014-05-06 Huawei Technologies Co., Ltd. System and method for correcting for lost data in a digital audio signal
FR2966634A1 (en) * 2010-10-22 2012-04-27 France Telecom ENHANCED STEREO PARAMETRIC ENCODING / DECODING FOR PHASE OPPOSITION CHANNELS
WO2014036263A1 (en) * 2012-08-29 2014-03-06 Brown University An accurate analysis tool and method for the quantitative acoustic assessment of infant cry
US8744854B1 (en) * 2012-09-24 2014-06-03 Chengjun Julian Chen System and method for voice transformation
FR3001593A1 (en) * 2013-01-31 2014-08-01 France Telecom IMPROVED FRAME LOSS CORRECTION AT SIGNAL DECODING.
US9564141B2 (en) * 2014-02-13 2017-02-07 Qualcomm Incorporated Harmonic bandwidth extension of audio signals
FR3020732A1 (en) * 2014-04-30 2015-11-06 Orange IMPROVED FRAME LOSS CORRECTION WITH VOICE INFORMATION
US9697843B2 (en) * 2014-04-30 2017-07-04 Qualcomm Incorporated High band excitation signal generation

Also Published As

Publication number Publication date
BR112016024358A2 (en) 2017-08-15
FR3020732A1 (en) 2015-11-06
ES2743197T3 (en) 2020-02-18
RU2016146916A3 (en) 2018-10-26
JP6584431B2 (en) 2019-10-02
RU2016146916A (en) 2018-05-31
KR20220045260A (en) 2022-04-12
KR20230129581A (en) 2023-09-08
US10431226B2 (en) 2019-10-01
BR112016024358B1 (en) 2022-09-27
ZA201606984B (en) 2018-08-30
MX368973B (en) 2019-10-23
US20170040021A1 (en) 2017-02-09
WO2015166175A1 (en) 2015-11-05
CN106463140B (en) 2019-07-26
MX2016014237A (en) 2017-06-06
JP2017515155A (en) 2017-06-08
EP3138095B1 (en) 2019-06-05
CN106463140A (en) 2017-02-22
KR20170003596A (en) 2017-01-09
RU2682851C2 (en) 2019-03-21

Similar Documents

Publication Publication Date Title
EP1316087B1 (en) Transmission error concealment in an audio signal
EP2080195B1 (en) Synthesis of lost blocks of a digital audio signal
EP2951813B1 (en) Improved correction of frame loss when decoding a signal
EP2277172B1 (en) Concealment of transmission error in a digital signal in a hierarchical decoding structure
CA2909401C (en) Frame loss correction by weighted noise injection
EP2727107B1 (en) Delay-optimized overlap transform, coding/decoding weighting windows
EP3175444B1 (en) Frame loss management in an fd/lpd transition context
EP2080194B1 (en) Attenuation of overvoicing, in particular for generating an excitation at a decoder, in the absence of information
EP3138095B1 (en) Improved frame loss correction with voice information
EP3175443B1 (en) Determining a budget for lpd/fd transition frame encoding
EP2795618B1 (en) Method of detecting a predetermined frequency band in an audio data signal, detection device and computer program corresponding thereto
EP2347411B1 (en) Pre-echo attenuation in a digital audio signal
WO2009047461A1 (en) Transmission error concealment in a digital signal with complexity distribution

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20161007

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

RIN1 Information on inventor provided before grant (corrected)

Inventor name: RAGOT, STEPHANE

Inventor name: FAURE, JULIEN

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

17Q First examination report despatched

Effective date: 20171103

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: GRANT OF PATENT IS INTENDED

RIC1 Information provided on ipc code assigned before grant

Ipc: G10L 19/005 20130101AFI20181203BHEP

Ipc: G10L 19/20 20130101ALN20181203BHEP

Ipc: G10L 25/93 20130101ALN20181203BHEP

INTG Intention to grant announced

Effective date: 20190107

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE PATENT HAS BEEN GRANTED

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

Free format text: NOT ENGLISH

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

REG Reference to a national code

Ref country code: AT

Ref legal event code: REF

Ref document number: 1140799

Country of ref document: AT

Kind code of ref document: T

Effective date: 20190615

REG Reference to a national code

Ref country code: DE

Ref legal event code: R096

Ref document number: 602015031383

Country of ref document: DE

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

Free format text: LANGUAGE OF EP DOCUMENT: FRENCH

REG Reference to a national code

Ref country code: NL

Ref legal event code: FP

REG Reference to a national code

Ref country code: LT

Ref legal event code: MG4D

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: NO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190905

Ref country code: LT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190605

Ref country code: SE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190605

Ref country code: HR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190605

Ref country code: AL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190605

Ref country code: FI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190605

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LV

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190605

Ref country code: GR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190906

Ref country code: BG

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190905

Ref country code: RS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190605

REG Reference to a national code

Ref country code: AT

Ref legal event code: MK05

Ref document number: 1140799

Country of ref document: AT

Kind code of ref document: T

Effective date: 20190605

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190605

Ref country code: RO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190605

Ref country code: CZ

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190605

Ref country code: PT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20191007

Ref country code: EE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190605

Ref country code: AT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190605

REG Reference to a national code

Ref country code: ES

Ref legal event code: FG2A

Ref document number: 2743197

Country of ref document: ES

Kind code of ref document: T3

Effective date: 20200218

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20191005

Ref country code: SM

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190605

REG Reference to a national code

Ref country code: DE

Ref legal event code: R097

Ref document number: 602015031383

Country of ref document: DE

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: TR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190605

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190605

Ref country code: PL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190605

26N No opposition filed

Effective date: 20200306

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190605

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MC

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190605

REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LI

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20200430

Ref country code: LU

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20200424

Ref country code: CH

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20200430

REG Reference to a national code

Ref country code: BE

Ref legal event code: MM

Effective date: 20200430

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: BE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20200430

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20200424

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190605

Ref country code: CY

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190605

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20190605

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20230321

Year of fee payment: 9

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: IT

Payment date: 20230322

Year of fee payment: 9

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: ES

Payment date: 20230502

Year of fee payment: 9

Ref country code: DE

Payment date: 20230321

Year of fee payment: 9

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: NL

Payment date: 20240320

Year of fee payment: 10

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20240320

Year of fee payment: 10