EP0208712B1 - Adaptive method and apparatus for coding speech - Google Patents

Adaptive method and apparatus for coding speech

Info

Publication number
EP0208712B1
EP0208712B1 EP86900480A EP86900480A EP0208712B1 EP 0208712 B1 EP0208712 B1 EP 0208712B1 EP 86900480 A EP86900480 A EP 86900480A EP 86900480 A EP86900480 A EP 86900480A EP 0208712 B1 EP0208712 B1 EP 0208712B1
Authority
EP
European Patent Office
Prior art keywords
coefficients
spectrum
subbands
speech
transmitted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
EP86900480A
Other languages
German (de)
English (en)
Other versions
EP0208712A4 (fr)
EP0208712A1 (fr)
Inventor
Israel Bernard Zibman
Baruch Mazor
Dale E. Veeneman
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Verizon Laboratories Inc
Original Assignee
GTE Laboratories Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US06/798,174 (US4790016A)
Application filed by GTE Laboratories Inc
Publication of EP0208712A1
Publication of EP0208712A4
Application granted
Publication of EP0208712B1
Anticipated expiration
Expired - Lifetime

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/038: Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques

Definitions

  • the invention refers to a speech encoder as set forth in the preamble of claim 1.
  • a speech encoder of this type is known from EP-A-0 176 243.
  • a coder for speech signals comprising separation means for receiving speech signals and generating series of values, each series representing respective portions of the frequency spectrum of the input signal; encoding means for digitally encoding each series; and bit allocation means for varying the number of bits used for encoding the respective series in dependence on the relative energy content thereof, wherein the number of series to which any given number of bits is allocated is constant and only the selection of the series to which respective numbers of bits are allocated is varied.
  • analog telephone systems are being replaced by digital systems.
  • in digital systems the analog signals are sampled at a rate of about twice the bandwidth of the analog signals, or about eight kilohertz, and the samples are then encoded.
  • pulse code modulation (PCM) system
  • each sample is quantized as one of a discrete set of prechosen values and encoded as a digital word which is then transmitted over the telephone lines.
  • the analog sample is quantized to 2^8, or 256, levels, each of which is designated by a different eight bit word.
  • with nonlinear quantization, excellent quality speech can be obtained with only seven bits per sample; but since a seven bit word is still required for each sample, transmission bit rates of 56 kilobits per second are necessary.
  • the linear predictive coding (LPC) technique is based on the recognition that speech production involves excitation and a filtering process.
  • the excitation is determined by the vocal cord vibration for voiced speech and by turbulence for unvoiced speech, and that actuating signal is then modified by the filtering process of vocal resonance chambers, including the mouth and nasal passages.
  • a digital filter which simulates the formant effects of the resonance chambers can be defined and the definition can be encoded.
  • a residual signal which approximates the excitation can then be obtained by passing the speech signal through an inverse formant filter, and the residual signal can be encoded.
  • Because sufficient information is contained in the lower-frequency portion of the residual spectrum, it is possible to encode only the low frequency baseband and still obtain reasonably clear speech.
  • at the receiver, a definition of the formant filter and the residual baseband are decoded.
  • the baseband is repeated to complete the spectrum of the residual signal.
  • By applying the decoded filter to the repeated baseband signal, the initial speech can be reconstructed.
  • a major problem of the LPC approach is in defining the formant filter which must be redefined with each window of samples.
  • a complex encoder and a complex decoder are required to obtain transmission rates as low as 16,000 bits per second.
  • Another problem with such systems is that they do not always provide a satisfactory reconstruction of certain formants such as that resulting, for example, from nasal resonance. It is the object of the invention to solve these problems.
  • the approximate envelope of the transform spectrum in each of a plurality of subbands of coefficients is defined and each envelope definition is encoded for transmission.
  • Each spectrum coefficient is then scaled relative to the defined envelope of the respective subband, and each scaled coefficient is encoded in a number of bits which is determined by the defined envelope of its subband.
  • Zero bits may be allotted to a number of less significant subbands as indicated by the defined envelopes; and varying numbers of bits may be used for each encoded coefficient depending on the magnitude of the defined envelope for the respective subband.
  • the subbands which are transmitted and the resolution with which the transmitted subbands are encoded are determined adaptively for each sample window based on the defined envelopes of the subbands.
  • the subbands which are transmitted are replicated to define coefficients of frequencies which are not transmitted.
  • a list replication procedure is followed by which an nth coefficient which is transmitted is replicated as an nth coefficient which is not transmitted.
  • the speech signal can be recreated by using the transmitted envelope definitions to inverse scale the coefficients of the respective subbands and by performing an inverse transform.
  • the spectrum is normalized first with respect to only a few regions and subsequently with respect to a greater number of subregions.
  • the maximum magnitude in each of the regions and in each of the subregions is encoded.
  • the maximums are logarithmically encoded and only a baseband of the normalized spectrum is encoded.
  • A block diagram of the system is shown in Fig. 1. Speech is filtered with a telephone bandpass filter 20 which prevents aliasing when the signal is sampled 8,000 times per second in sampling circuit 22.
  • the analog samples are digitally encoded in an analog to digital encoder 24 and are preprocessed at 26 prior to being applied to a discrete Fourier transform unit 28.
  • the output of the Fourier transform circuit 28 is a sequence of coefficients which indicate the magnitude and phase of the Fourier transform spectrum at each of 97 frequencies spaced 41.667 hertz apart.
  • the magnitude spectrum of the Fourier transform output is illustrated as a continuous function in Fig. 3 but it is recognized that the transform circuit 28 would actually provide only 97 incremental outputs.
  • the Fourier transform spectrum of the full speech within a selected window is equalized and encoded in circuit 30 in a manner which will be discussed below.
  • the resultant digital signal can be transmitted at 16,000 bits per second over a line 32 to a receiver.
  • the full spectrum of Fig. 3 is reconstructed in circuit 34.
  • the inverse Fourier transform is performed in circuit 36 and applied through a post-processor 38 corresponding to the pre-processor 26. That signal is then converted to analog form in digital to analog converter 40.
  • Final filtering in filter 42 provides clear speech to the listener.
  • a pipelined multiprocessor architecture is employed.
  • One microcomputer is dedicated to the analog to digital conversion with preemphasis filtering, one is dedicated to the forward Fourier transform and a third is dedicated to the spectral equalization and coding.
  • one microcomputer is dedicated to spectrum reconstruction, another to inverse Fourier transform and a third to digital to analog conversion with deemphasis filtering.
  • the spectral equalization and encoding technique of the present invention is based on the recognition that the Fourier transform of the total signal includes a relatively flat spectrum of the pitch illustrated in Fig. 4 shaped by formant signals.
  • the signal of Fig. 4 is obtained by normalizing the spectrum of Fig. 3 to at least one curve which itself can be encoded separate from the residual spectrum of Fig. 4.
  • Prior to compression, the analog speech signal is low pass filtered in filter 20 at 3.4 kilohertz, sampled in sampler 22 at a rate of 8 kilohertz, and digitized using a 12 bit linear analog to digital converter 24. It will be recognized that the input to the encoder may already be in digital form and may require conversion to the code which can be accepted by the encoder.
  • the digitized speech signal, in frames of N samples, is first scaled up in a scaler 26 to maximize its dynamic range in each frame. The scaled input samples are then Fourier transformed in a fast Fourier transform device 28 to obtain a corresponding discrete spectrum represented by (N/2) + 1 complex frequency coefficients.
  • the input frame size equals 180 samples and corresponds to a frame every 22.5 milliseconds.
  • the discrete Fourier transform is performed on 192 samples, including 12 samples overlapped with the previous frame, preceded by trapezoidal windowing with a 12 point slope at each end.
  • the resulting output of the FFT includes 97 complex frequency coefficients spaced 41.667 Hertz apart.
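The framing figures above can be checked with a short sketch. It assumes NumPy, and the exact ramp values of the trapezoidal window are an assumption (the text gives only the 12 point slope); the names `trapezoid_window` and `frame_spectrum` are illustrative and do not come from the patent.

```python
import numpy as np

FS = 8000               # sampling rate in Hz (from the text)
HOP = 180               # new samples per frame, i.e. 22.5 ms
OVERLAP = 12            # samples shared with the previous frame
NFFT = HOP + OVERLAP    # 192-point transform

def trapezoid_window(n=NFFT, slope=OVERLAP):
    """Flat-topped window with linear ramps (ramp values are an assumption)."""
    ramp = (np.arange(slope) + 0.5) / slope
    w = np.ones(n)
    w[:slope] = ramp          # rising edge
    w[-slope:] = ramp[::-1]   # falling edge
    return w

def frame_spectrum(signal, frame_index):
    """Return the (NFFT / 2) + 1 = 97 complex coefficients of one frame."""
    start = frame_index * HOP
    frame = signal[start:start + NFFT]
    return np.fft.rfft(frame * trapezoid_window())

# Coefficient spacing: 8000 / 192 = 41.667 Hz, matching the text.
BIN_SPACING_HZ = FS / NFFT
```

With these values a 192-sample real transform yields 97 coefficients, and a 1 kHz tone falls on coefficient 24 (24 x 41.667 Hz = 1000 Hz).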
  • An example magnitude spectrum of a Fourier transform output from FFT 28 is illustrated in Figure 3. Although illustrated as a continuous function, it is recognized that the transform circuit 28 actually provides only 97 incremental complex outputs.
  • the magnitude spectrum of the Fourier transform output is equalized and encoded.
  • the spectrum is partitioned into contiguous subbands and a spectral envelope estimate is based on a piecewise approximation of those subbands at 44.
  • the spectrum is divided into twenty subbands, each including four complex coefficients. Frequencies above 3291.67 Hertz are not encoded and are set to zero at the receiver.
  • the spectral envelope of each subband is assumed constant and is defined by the peak magnitude in each subband as illustrated by the horizontal lines in Figure 3.
  • Each magnitude, or more correctly the inverse thereof, can be treated as a scale factor for its respective subband.
  • Each scale factor is quantized in a quantizer 45 to four bits.
  • a nonuniform bit allocation is used for the complex coefficients which are transmitted.
  • Three separate two dimensional quantizers 50 are used for the transmitted 12 subbands.
  • the sixteen complex coefficients of the four subbands having the smallest scale factors are quantized to seven bits each.
  • the coefficients of the four subbands having the next smallest scale factors are quantized to six bits each, and the coefficients of the remaining four of the transmitted subbands are quantized to four bits each. In effect, the coefficients of the eight subbands which are not transmitted are quantized to zero bits.
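The subband selection described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes NumPy, takes the twenty subbands to be coefficients 0 to 79 (inferred from the 3291.67 Hz cutoff), ranks subbands by unquantized peak magnitude, and breaks ties in favor of lower frequencies, which the text does not specify. A real coder would rank on the quantized scale factors so that the receiver can repeat the same ordering.

```python
import numpy as np

N_SUBBANDS = 20
COEFFS_PER_SUBBAND = 4
# Bits per coefficient by rank: 4 subbands at 7, 4 at 6, 4 at 4, 8 not sent.
BITS_BY_RANK = [7] * 4 + [6] * 4 + [4] * 4 + [0] * 8

def subband_peaks(spectrum):
    """Peak magnitude of each four-coefficient subband (the envelope estimate)."""
    used = np.abs(spectrum[:N_SUBBANDS * COEFFS_PER_SUBBAND])
    return used.reshape(N_SUBBANDS, COEFFS_PER_SUBBAND).max(axis=1)

def allocate_bits(peaks):
    """Bits per coefficient for each subband; largest peaks get the most bits."""
    order = np.argsort(-np.asarray(peaks), kind="stable")  # tie-break: assumption
    bits = np.zeros(N_SUBBANDS, dtype=int)
    bits[order] = BITS_BY_RANK
    return bits
```

Because the scale factor is the inverse of the peak magnitude, "smallest scale factor" in the text corresponds to "largest peak" in this sketch.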
  • Each of the two dimensional quantizers is designed using an approach presented by Linde, et al., "An Algorithm for Vector Quantizer Design," IEEE Trans. on Commun., Vol. COM-28, pp. 84-95, Jan. 1980.
  • the result for the seven bit quantizer is shown in Figure 5.
  • the two dimensions of the quantizer are the real and imaginary components of each complex coefficient.
  • Each cluster has a seven bit representation to which each complex point in the cluster is quantized. Actual quantization may be by table look-up in a read only memory.
  • Time scaling: 4 bits
  • Synchronization: 4 bits
  • TOTAL: 360 bits
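The 360 bit total is consistent with the allocations described earlier. The sketch below recomputes it; the scale-factor and coefficient entries are inferred from the text (20 scale factors at four bits; 16 coefficients each at seven, six and four bits) rather than copied from a table, and the variable names are illustrative.

```python
SAMPLE_RATE = 8000        # samples per second
SAMPLES_PER_FRAME = 180   # one frame every 22.5 ms

budget = {
    "scale factors (20 subbands x 4 bits)": 20 * 4,
    "7-bit coefficients (16 x 7 bits)": 16 * 7,
    "6-bit coefficients (16 x 6 bits)": 16 * 6,
    "4-bit coefficients (16 x 4 bits)": 16 * 4,
    "time scaling": 4,
    "synchronization": 4,
}

total_bits = sum(budget.values())                        # bits per frame
bit_rate = total_bits * SAMPLE_RATE / SAMPLES_PER_FRAME  # bits per second
```

Under these assumptions the frame carries 360 bits, which at one frame per 22.5 milliseconds gives the 16,000 bits per second quoted for line 32.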
  • the transmitted 12 groups of coefficients are applied to corresponding seven bit, six bit and four bit inverse quantizers at 52.
  • the frequency subbands to which the resulting coefficients correspond are determined by the scale factors which are transmitted in sequence for all subbands.
  • the coefficients from the seven bit inverse quantizer are placed in the subbands which the scale factors indicate to be of the greatest magnitude.
  • the coefficients of the eight subbands which are not transmitted are approximated by replication of transmitted subbands at 54.
  • a list replication approach is utilized. This approach is illustrated by Figure 6.
  • the coefficients for each subband are illustrated by a single vector.
  • the transmitted subbands are indicated as T1, T2, T3, ..., Tn, ...
  • the subbands which must be produced by replication in the receiver are indicated as R1, R2, R3, ..., Rn, ...
  • the coefficients of the subband Tn are used both for Tn and for Rn.
  • the scaled coefficients for subband T1 are repeated at subband R1, those of subband T2 are repeated at R2, and those at subband T3 are repeated at R3.
  • the rationale for this list replication technique is that subbands are themselves usually grouped in blocks of transmitted subbands and blocks of nontransmitted subbands. Thus, large blocks of coefficients are typically repeated using this approach and speech harmonics are maintained in the replication process.
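List replication as described above can be sketched in a few lines. The function name and the boolean `transmitted` mask are illustrative, and the sketch assumes that both lists are ordered by increasing frequency, which the text implies but does not state outright.

```python
def replicate_subbands(subbands, transmitted):
    """Fill each nontransmitted subband Rn with a copy of transmitted subband Tn.

    subbands    -- list of per-subband coefficient groups (entries for
                   nontransmitted subbands are ignored)
    transmitted -- list of booleans, True where the subband was sent
    """
    sent = [i for i, flag in enumerate(transmitted) if flag]
    missing = [i for i, flag in enumerate(transmitted) if not flag]
    out = list(subbands)
    # Pair the nth transmitted subband with the nth nontransmitted one.
    for src, dst in zip(sent, missing):
        out[dst] = subbands[src]
    return out
```

With 12 transmitted and 8 nontransmitted subbands, only T1 to T8 are reused; `zip` stops once every missing subband has been filled.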
  • a reproduction of the spectrum of Figure 3 can be generated at 56 by applying the scale factors to the equalized spectrum. From that Fourier transform reproduction of the original Fourier transform, the speech can be obtained through an inverse FFT 36, an inverse scaler 38, a digital to analog converter 40 and a reconstruction filter 42.
  • a distinct advantage of the present system is that the coder is not based on an assumed fixed low pass spectrum model which is speech specific.
  • Voice-band data and signaling take the form of sine waves of some bandwidth which may occur at any frequency. Where only a lower or an upper baseband of coefficients is transmitted, voice-band data can be lost. With the present system, the subbands in which digital information is transmitted are naturally selected because of their higher energy.
  • Embedded coding, important as a method of congestion control in telephone applications, allows the data to leave the encoder at a constant bit rate, yet be received at the decoder at a lower bit rate as some bits are discarded en route.
  • Embedded coding implies a packet or block of bits within which there is a hierarchy of subblocks. Least crucial subblocks can be discarded first as the channel gets overloaded.
  • This hierarchical concept is a natural one in the present system where the partial-band information, described by a set of frequency coefficients, is ordered in a decreasing significance and the missing coefficients can always be approximated from the received ones. The more coefficients in the set, the higher is the rate and the better is the quality. However, speech quality degrades very gracefully with modest drops in the rate.
  • the implementation of an embedded coding system in conjunction with this approach is therefore fairly simple and very attractive.
  • the coding technique described above provides for excellent speech coding and reproduction at 16 kilobits per second. Excellent results as low as 8.0 kilobits per second can be obtained by using this technique in conjunction with a frequency scaling technique known as time domain harmonic scaling and described by D. Malah, "Time Domain Algorithms for Harmonic Bandwidth Reduction and Time Scaling of Speech Signals", IEEE Trans. Acoust., Speech, Signal Processing, Vol. ASSP-27, pp. 121-133, Apr. 1979.
  • speech at twice the rate of the original speech but at the original pitch is generated by combining adjacent pitch cycles.
  • the frequency scaled speech can then be fast Fourier transformed in the technique described above.
  • Although each of the steps of residual extraction, subband selection, and quantizing and the steps of inverse quantizing, replication and envelope excitation are shown as individual elements of the system, it will be recognized that they can be merged in an actual system.
  • the residual spectrum for subbands which are not transmitted need not be obtained.
  • the system can be implemented using a combination of software and hardware.
  • the shape of the spectrum is determined by a two-step process. This process also encodes the shape of the entire 100 to 3,800 Hz spectrum since this is useful in the baseband coding.
  • the spectrum is divided into four regions illustrated in Fig. 7: 125 - 583 Hz; 625 - 1959 Hz; 2000 - 3416 Hz; and 3468 - 3833 Hz. These regions correspond roughly to the usual locations of the first four formants.
  • the dynamic range of the magnitudes of the spectral coefficients is much smaller within each of these regions than in the spectrum as a whole. For voiced phonemes the peak magnitude near 250 Hz can be 30 dB above the magnitudes near 3,800 Hz.
  • the first step of spectral normalization is performed by finding the peak magnitudes within each region, quantizing these peaks to 5 bits each with a logarithmic quantizer, and dividing each spectral coefficient by the quantized peak in its region.
  • the result is a vector of spectral coefficients with maximum magnitude equal to unity.
  • the division into regions should result in the spectral coefficients being reasonably uniformly distributed within the complex disc of radius one.
  • the second step extracts more detailed structure.
  • the spectrum is divided into equal bands of about 165 Hz each.
  • the peak magnitude within each band is located and quantized to 3 bits.
  • the complex spectral coefficients within the band are divided by the quantized magnitude and coded to 6 bits each using a hexagonal quantizer. This coding preserves phase information that is important for reconstruction of frame boundaries.
  • the preprocessor 26 is a single-pole pre-emphasis filter. Low frequencies are attenuated by about 5 dB. High frequencies are boosted. The highest frequency (4 kHz) is boosted by about 24 dB.
  • the filter is useful in equalizing the spectrum by reducing the low-pass effects of the initial anti-aliasing filter and the high-frequency attenuation of the lips. The boosting helps to maintain numerical accuracy in the subsequent computation of the Fourier transform.
  • the spectrum is normalized to a curve which in this case is selected as a horizontal line through the peak magnitude of the spectrum in each region. These curves are shown as lines 58, 60, 62 and 64 in Figure 7.
  • the peak magnitude of the complex numbers in each region is determined and encoded to five bits at unit 66 of Fig. 11 by finding a value k which is encoded such that the peak magnitude is between $162 \times 2^{12(k-1)/32}$ and $162 \times 2^{12k/32}$. This results in logarithmic encoding of the peak magnitude.
  • the four k values, each encoded in five bits, make up a total of 20 bits from the formant encoder which are the most significant bits of the transmitted code for the window. All spectral coefficients in each of the four regions are then divided by the corresponding $162 \times 2^{12k/32}$ in the spectral normalization unit 68. By this method, all of the resultant magnitudes, illustrated in Figure 8, are less than 1.
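The logarithmic peak coding can be illustrated with a small sketch. It assumes the quantizer levels are 162 x 2^(12k/32) for k = 0 to 31, which is one reading of the garbled expression in the source; the function names and the clamping at both ends of the range are assumptions for illustration only.

```python
import math

BASE = 162.0          # lowest quantizer level (assumed reading of the source)
STEP = 12.0 / 32.0    # exponent increment per code value
K_MAX = 31            # largest five-bit code

def encode_peak(peak):
    """Smallest k such that peak <= BASE * 2**(STEP * k), clamped to 0..K_MAX."""
    if peak <= BASE:
        return 0
    k = math.ceil(math.log2(peak / BASE) / STEP)
    return min(k, K_MAX)

def decode_peak(k):
    """Quantized peak value used as the divisor for the region."""
    return BASE * 2.0 ** (STEP * k)
```

Because the peak is always rounded up to the next level, dividing a region by `decode_peak(encode_peak(peak))` leaves magnitudes no greater than one, as long as the peak does not exceed the top level.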
  • the normalized coefficients output from unit 68 are grouped into 20 subregions of four and two subregions of five, illustrated in Figure 8.
  • the peak magnitude in each of these subregions is determined and encoded to three bits with a logarithmic quantizer in unit 70.
  • the peak is always coded to the next largest value.
  • the three bits from each of the 22 subregions provide an additional 66 bits of the final signal for the window.
  • Each output within a subregion is multiplied by the reciprocal of the quantized magnitude in the sample normalization unit 72, thus ensuring that all outputs illustrated in Fig. 9 remain less than 1.
  • Each complex output from the baseband of 125 Hz to 1959 Hz of the normalized spectrum of Fig. 9 is coded to six bits with the two dimensional quantizer and encoder 74.
  • the two-dimensional quantizer is formed by dividing a complex disc of radius one into hexagons as shown in Figure 10.
  • the x, y coordinates are radially warped by an exponential function to approximate a logarithmic coding of the magnitude. All points within a hexagon are quantized to the coordinates of the center of the hexagon.
  • coefficients of large magnitude are coded to better phase resolution than coefficients of small magnitude.
  • Actual quantization is done by table lookup, but efficient computational algorithms are possible.
  • the actual coding transformations, bit allocations, and subband sizes may be changed as the coder is optimized for different applications.
  • All normalization factors (four at 5 bits each, 22 at 3 bits each) and the coded normalized baseband coefficients (45 at 6 bits) are transmitted.
  • the baseband is decoded and duplicated into the upper frequency range.
  • the normalization factors are applied onto the spectrum to restore the original shape. Specifically, in the receiver, the inverse Fourier Transform Inputs 0 to 2 and 93 to 96 are set to zero.
  • the normalized complex coefficients for Inputs 3 to 47 are reconstructed from the quantizer codes by table lookup. They are duplicated into Positions 48 to 92. This duplication is the nonlinear regeneration step. The scale factors for the subregions and larger regions are then applied.
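The index layout of this duplication step can be written out directly. The sketch assumes NumPy and covers only the placement of coefficients; table lookup and the application of scale factors are omitted, and the function name is illustrative.

```python
import numpy as np

def rebuild_spectrum(baseband):
    """Place 45 decoded coefficients at inputs 3..47 and copy them to 48..92."""
    baseband = np.asarray(baseband, dtype=complex)
    if baseband.shape != (45,):
        raise ValueError("expected 45 baseband coefficients")
    spectrum = np.zeros(97, dtype=complex)   # inputs 0..2 and 93..96 stay zero
    spectrum[3:48] = baseband                # decoded baseband
    spectrum[48:93] = baseband               # duplicated upper band
    return spectrum
```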
  • the inverse transform is computed in unit 36.
  • the effects of the windowing are removed by adding the last 12 points of the previous inverse transform to the first 12 points from the current inverse transform.
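The overlap-add step can be sketched as below. It assumes NumPy and a trapezoidal window whose rising and falling ramps are complementary (an assumption; the text gives only the 12 point slope), so that the summed overlap has unit gain.

```python
import numpy as np

HOP, OVERLAP = 180, 12
NFFT = HOP + OVERLAP

# Assumed trapezoidal window: rising and falling ramps sum to one where they overlap.
ramp = (np.arange(OVERLAP) + 0.5) / OVERLAP
window = np.ones(NFFT)
window[:OVERLAP] = ramp
window[-OVERLAP:] = ramp[::-1]

def overlap_add(frames):
    """Join 192-sample inverse-transform outputs, summing each 12-sample overlap."""
    out = np.zeros(HOP * len(frames) + OVERLAP)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + NFFT] += frame
    return out
```

Feeding the window itself in as every frame gives an output of exactly one everywhere except the first and last 12 samples, which shows that the windowing is cancelled in the interior.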
  • the speech now passes through filter 38, which is an inverse to the pre-emphasis filter and which attenuates the high frequencies, removing the effects of the treble boost and reducing high-frequency quantization noise.
  • the outputs are converted to analog with a 12-bit linear digital to analog converter 40.
  • the baseband which is repeated in the spectrum reconstruction has been described as being a band of lower frequencies.
  • the baseband may include any range of frequencies within the spectrum. For some sounds where higher energy levels are found in the higher frequencies, a baseband of the higher frequencies is preferred.
  • the baseband suffers degradations only from quantization errors.
  • the reconstruction of the upper frequencies is only as good as the model and the shaping information.
  • each formant is excited at approximately the right frequency. This is an improvement over baseband residual excitation in which some parts of the spectrum may have too little energy.
  • the reduction in computational complexity due to peak finding and scaling instead of linear prediction analysis and filtering is very significant.
  • This approach is a wideband approach in that the entire voice frequency range is coded.
  • the major problem with other wideband systems at 16 kb/s is that there are barely enough bits available to give a rough description of the waveform.
  • Baseband excitation systems such as the present system meet that problem by devoting most of the bits to the baseband and regenerating the excitation signal for higher frequencies.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A Fourier transform (28) of speech is applied to a speech coder (Fig. 1). The Fourier transform is equalized (30) by normalizing the spectrum coefficients to a curve approximating the shape of the spectrum. The curve and the equalized spectrum are encoded. In one system, scale factors (45) are generated and encoded for each of a plurality of subbands of a speech spectrum obtained by a Fourier transform. The spectrum is equalized (46) on the basis of those scale factors. The coefficients of a limited number of subbands (48), determined by the scale factors, are encoded (50). The number of bits used to encode each coefficient of each transmitted subband is determined by the scale factor of that subband. At the receiver, the coefficients of the nontransmitted subbands are approximated by a list replication technique (54).

Claims (18)

  1. Codeur de la parole comprenant :
       un moyen de transformation de Fourier (28) assurant une transformation discrète de Fourier d'un signal de parole entrant pour engendrer un spectre transformé discret de coefficients;
       un moyen de normalisation (30) pour modifier le spectre transforme pour obtenir un spectre normalisé plus plat et pour coder une fonction par laquelle le spectre discret est modifié; et
       un moyen (30) pour coder au moins une partie du spectre,
    caractérisé en ce que
       le dit moyen de normalisation (30) comprend un moyen (44) pour définir l'enveloppe approximée du spectre discret dans chacune d'une pluralité de sous-bandes de coefficients et pour coder l'enveloppe définie de chaque sous-bande de coefficients et un moyen pour établir chaque coefficient du spectre par rapport à l'enveloppe définie de la sous-bande respective de coefficients; et
       le dit moyen (30) pour coder code les coefficients établis du spectre à l'intérieur de chaque sous-bande dans un nombre de binons déterminé par l'enveloppe définie de la sous-bande.
  2. Système de codage de la parole selon la revendication 1 dans lequel le nombre déterminé de binons pour une pluralité de sous-bandes est zéro, de telle façon que les coefficients établis pour ces sous-bandes ne soient pas transmis.
  3. Système de codage de la parole selon la revendication 2 dans lequel les coefficients établis de différentes sous-bandes sont codés en différents nombres de binons autres que zéro.
  4. Système de codage de la parole selon la revendication 2 dans lequel la parole codée est codée en copiant des sous-bandes des coefficients transmis en tant que substituts pour les sous-bandes de coefficients non-transmis, les coefficients transmis étant copiés de telle façon que la n ième sous-bandes qui est transmisé soit copiée en tant que n ième sous-bande qui n'est pas transmisé.
  5. Système de codage de la parole selon la revendication 1 dans lequel les coefficients des différentes sous-bandes sont codés en différents nombres de binons autres que zéro.
  6. Système de codage de la parole selon la revendication 1 dans lequel :
       le moyen de codage (30) code les coefficients établis de moins de toutes les sous-bandes, les coefficients établis codes étant ceux correspondant aux enveloppes définies de plus grande amplitude, les coefficients établis des sous-bandes correspondant aux enveloppes définies de plus grande amplitude étant codés en plus de binons que les coefficients des sous-bandes correspondant aux enveloppes définies d'amplitude plus petite.
  7. Système de codage de la parole selon la revendication 6 dans lequel la parole codée est décodée en copiant des sous-bandes de coefficients transmis en tant que substituts pour des sous-bandes de coefficients non-transmis, les coefficients transmis étant copiés de telle manière que la n ième sous-bande qui est transmisé soit copiée en tant que n ième sous-bande qui n'est pas transmise.
  8. Système de codage de la parole selon la revendication 6 dans lequel le moyen de transformation (28) réalise une transformation discrète de Fourier.
  9. Système de codage de la parole selon la revendication 1 dans lequel le moyen de normalisation comprend :
       un moyen (44) pour déterminer l'amplitude maximale du spectre discret à l'intérieur de chacune d'une pluralité de régions du spectre; et
       un moyen pour coder en numérique l'amplitude maximale de chaque région; et
       un moyen (45) pour établir chaque coefficient du spectre discret dans chaque région par rapport à l'amplitude maximale de chaque région pour obtenir un premier ensemble de coefficients normalisés.
  10. Système de codage de la parole selon la revendication 9 dans lequel le moyen de normalisation comprend, en outre :
       un moyen pour déterminer l'amplitude maximale du premier ensemble de sorties normalisées dans chacune d'une pluralité de sous-régions du spectre;
       un moyen pour coder en numérique l'amplitude maximale de chaque sous-région; et
       un moyen pour établir chaque sortie du premier ensemble de sorties normalisées par rapport à l'amplitude maximale de chaque sous-région pour obtenir un deuxième sous-ensemble de sorties normalisées.
  11. Codeur de la parole selon la revendication 10 dans lequel chacune des amplitudes maximales est codée de façon logarithmique.
  12. Codeur de la parole selon la revendation 10 dans lequel l'amplitude maximale est déterminée pour chacune des quatre régions correspondantes aux premiers quatre formants.
  13. Système de codage de la parole selon la revendication 10 dans lequel seule une bande de base du spectre normalisé est codée.
  14. Procédé de codage de la parole comprenant les étapes suivantes :
       réalisation d'une transformation discrète de Fourier d'une fenêtre de parole pour engendrer un spectre transforme discret;
       obtention d'un spectre normalisé en définissant au moins une courbe approximant l'amplitude du spectre discret, en codant en numérique la courbe définie et en définissant le spectre discret par rapport à la courbe définie; et
       codage d'au moins une partie du spectre normalisé.
    caractérisé en ce que
       le spectre normalisé est obtenu en définissant l'enveloppe approximée du spectre discret dans chacune d'une pluralité de sous-bandes de coefficients et en codant en numérique l'enveloppe définie de chaque sous-bande de coefficients et en établissant chaque coefficient par rapport à l'amplitude définie de la sous-bande respective de coefficients; et
       les coefficients établis à l'intérieur de chaque sous-bande sont codés en un nombre de binons déterminé par l'enveloppe définie de la sous-bande.
  15. A method as claimed in claim 14 wherein the number of bits determined for a plurality of subbands is zero, such that the scaled coefficients for those subbands are not transmitted.
  16. A method as claimed in claim 15 wherein the scaled coefficients of different subbands are encoded in different nonzero numbers of bits.
  17. A method as claimed in claim 15 wherein the coded speech is decoded by replicating subbands of transmitted coefficients as substitutes for subbands of non-transmitted coefficients, the transmitted coefficients being replicated such that the nth subband which is transmitted is replicated as the nth subband which is not transmitted.
  18. A method as claimed in claim 14 wherein the normalized spectrum is obtained by:
       determining a maximum magnitude of the discrete spectrum within each of a plurality of regions of the spectrum;
       digitally encoding the maximum magnitude of each region; and
       scaling each coefficient of the discrete spectrum in each region relative to the maximum magnitude of each region to determine a set of normalized coefficients.
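
The method of claims 14 to 17 can be illustrated with a minimal Python sketch. It is not the patented implementation. The subband width, the power-of-two (logarithmic) envelope code, the rule of one bit less per halving of the envelope, the uniform quantizer, and the wrap-around used when fewer subbands are transmitted than omitted are all assumptions made for illustration, as are the names `encode`, `decode`, `MAX_BITS` and `MIN_EXP`.

```python
import cmath
import math

MAX_BITS = 4    # assumed bit budget for the strongest subband
MIN_EXP = -30   # assumed exponent for a subband that is exactly zero

def dft_half(window):
    """Naive DFT of a real window; returns the first len(window)//2 coefficients."""
    n = len(window)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(window))
            for k in range(n // 2)]

def quantize(v, bits):
    """Uniform quantizer index for v in [-1, 1] using 2**bits levels."""
    levels = 2 ** bits
    return min(levels - 1, int((v + 1.0) / 2.0 * levels))

def dequantize(index, bits):
    """Midpoint reconstruction of a quantizer index."""
    return (index + 0.5) / 2 ** bits * 2.0 - 1.0

def encode(window, band_size=4):
    """Returns one (exponent, bits, codes) tuple per subband.
    Assumes the number of coefficients is a multiple of band_size."""
    coeffs = dft_half(window)
    bands = [coeffs[i:i + band_size] for i in range(0, len(coeffs), band_size)]
    # Envelope of each subband: its peak magnitude, coded as a power-of-two exponent.
    exps = []
    for band in bands:
        peak = max(abs(c) for c in band)
        exps.append(math.ceil(math.log2(peak)) if peak > 0 else MIN_EXP)
    top = max(exps)
    frame = []
    for band, e in zip(bands, exps):
        # Assumed allocation rule: one bit less per halving of the envelope.
        bits = max(0, MAX_BITS - (top - e))
        scale = 2.0 ** e
        # Zero-bit subbands send only their envelope, no coefficients.
        codes = [(quantize(c.real / scale, bits), quantize(c.imag / scale, bits))
                 for c in band] if bits else []
        frame.append((e, bits, codes))
    return frame

def decode(frame):
    sent = [i for i, (_, bits, _) in enumerate(frame) if bits > 0]
    unsent = [i for i, (_, bits, _) in enumerate(frame) if bits == 0]
    shapes = {}
    for i in sent:
        _, bits, codes = frame[i]
        shapes[i] = [complex(dequantize(r, bits), dequantize(m, bits)) for r, m in codes]
    # The n-th non-transmitted subband reuses the n-th transmitted subband;
    # wrapping around when transmitted subbands run out is an assumption.
    for n, i in enumerate(unsent):
        shapes[i] = shapes[sent[n % len(sent)]]
    out = []
    for i, (e, _, _) in enumerate(frame):
        out.extend(c * 2.0 ** e for c in shapes[i])  # restore each subband's envelope
    return out

# Usage: a strong low-frequency tone plus a weak high-frequency tone.
window = [1.5 * math.cos(2 * math.pi * 2 * t / 32)
          + 0.02 * math.cos(2 * math.pi * 13 * t / 32) for t in range(32)]
frame = encode(window)
spectrum = decode(frame)
```

Because the envelope is sent for every subband, the decoder can rescale a replicated subband to the level of the subband it replaces even though none of that subband's coefficients were transmitted.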
EP86900480A 1984-12-20 1985-12-11 Procede et appareil adaptatifs de codage de la parole Expired - Lifetime EP0208712B1 (fr)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US68438284A 1984-12-20 1984-12-20
US684382 1984-12-20
US06/798,174 US4790016A (en) 1985-11-14 1985-11-14 Adaptive method and apparatus for coding speech
US798174 1985-11-14

Publications (3)

Publication Number Publication Date
EP0208712A1 (fr) 1987-01-21
EP0208712A4 (fr) 1988-01-28
EP0208712B1 (fr) 1993-04-07

Family

ID=27103309

Family Applications (1)

Application Number Title Priority Date Filing Date
EP86900480A Expired - Lifetime EP0208712B1 (fr) 1984-12-20 1985-12-11 Procede et appareil adaptatifs de codage de la parole

Country Status (3)

Country Link
EP (1) EP0208712B1 (fr)
DE (1) DE3587251T2 (fr)
WO (1) WO1986003872A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12131742B2 (en) 2010-07-19 2024-10-29 Dolby International Ab Processing of audio signals during high frequency reconstruction

Families Citing this family (5)

Publication number Priority date Publication date Assignee Title
DE3629434C2 (de) * 1986-08-29 1994-07-28 Karlheinz Dipl Ing Brandenburg Digitales Codierverfahren
US5924060A (en) * 1986-08-29 1999-07-13 Brandenburg; Karl Heinz Digital coding process for transmission or storage of acoustical signals by transforming of scanning values into spectral coefficients
SE0004163D0 (sv) 2000-11-14 2000-11-14 Coding Technologies Sweden Ab Enhancing perceptual performance of high frequency reconstruction coding methods by adaptive filtering
DE102004059979B4 (de) 2004-12-13 2007-11-22 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Vorrichtung und Verfahren zur Berechnung einer Signalenergie eines Informationssignals
PL4016527T3 (pl) 2010-07-19 2023-05-22 Dolby International Ab Przetwarzanie sygnałów audio podczas rekonstrukcji wysokich częstotliwości

Citations (1)

Publication number Priority date Publication date Assignee Title
EP0176243A2 (fr) * 1984-08-24 1986-04-02 BRITISH TELECOMMUNICATIONS public limited company Codage de la parole dans le domaine des fréquences

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
JPS5857758B2 (ja) * 1979-09-28 1983-12-21 株式会社日立製作所 音声ピッチ周期抽出装置
US4330689A (en) * 1980-01-28 1982-05-18 The United States Of America As Represented By The Secretary Of The Navy Multirate digital voice communication processor
DE3102822C2 (de) * 1981-01-28 1984-02-16 Siemens AG, 1000 Berlin und 8000 München Verfahren zur frequenzbandkomprimierten Sprachübertragung
US4535472A (en) * 1982-11-05 1985-08-13 At&T Bell Laboratories Adaptive bit allocator
US4667340A (en) * 1983-04-13 1987-05-19 Texas Instruments Incorporated Voice messaging system with pitch-congruent baseband coding


Also Published As

Publication number Publication date
EP0208712A4 (fr) 1988-01-28
DE3587251T2 (de) 1993-07-15
EP0208712A1 (fr) 1987-01-21
DE3587251D1 (de) 1993-05-13
WO1986003872A1 (fr) 1986-07-03

Legal Events

Date Code Title Description
PUAI Public reference made under Article 153(3) EPC to a published international application that has entered the European phase; Free format text: ORIGINAL CODE: 0009012
17P Request for examination filed; Effective date: 19861104
AK Designated contracting states; Kind code of ref document: A1; Designated state(s): BE DE FR GB IT
A4 Supplementary search report drawn up and despatched; Effective date: 19880128
17Q First examination report despatched; Effective date: 19900712
ITF IT: translation for an EP patent filed
GRAA (Expected) grant; Free format text: ORIGINAL CODE: 0009210
AK Designated contracting states; Kind code of ref document: B1; Designated state(s): BE DE FR GB IT
ET FR: translation filed
REF Corresponds to: Ref document number: 3587251; Country of ref document: DE; Date of ref document: 19930513
PLBE No opposition filed within time limit; Free format text: ORIGINAL CODE: 0009261
STAA Information on the status of an EP patent application or granted EP patent; Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT
26N No opposition filed
REG Reference to a national code; Ref country code: GB; Ref legal event code: IF02
PGFP Annual fee paid to national office [announced via postgrant information from national office to EPO]; Ref country code: BE; Payment date: 20041118; Year of fee payment: 20
PGFP Annual fee paid to national office [announced via postgrant information from national office to EPO]; Ref country code: GB; Payment date: 20041202; Year of fee payment: 20
PGFP Annual fee paid to national office [announced via postgrant information from national office to EPO]; Ref country code: DE; Payment date: 20041209; Year of fee payment: 20
PGFP Annual fee paid to national office [announced via postgrant information from national office to EPO]; Ref country code: FR; Payment date: 20050426; Year of fee payment: 20
PG25 Lapsed in a contracting state [announced via postgrant information from national office to EPO]; Ref country code: GB; Free format text: LAPSE BECAUSE OF EXPIRATION OF PROTECTION; Effective date: 20051210
REG Reference to a national code; Ref country code: GB; Ref legal event code: PE20
BE20 BE: patent expired; Owner name: *VERIZON LABORATORIES INC.; Effective date: 20051211