EP0208712B1 - Adaptive method and apparatus for coding speech - Google Patents
Adaptive method and apparatus for coding speech
- Publication number
- EP0208712B1 (application EP86900480A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- coefficients
- spectrum
- subbands
- speech
- transmitted
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/038—Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
Definitions
- the invention refers to a speech encoder as set forth in the preamble of claim 1.
- a speech encoder of this type is known from EP-A-0 176 243.
- a coder for speech signals comprising separation means for receiving speech signals and generating series of values, each series representing respective portions of the frequency spectrum of the input signal, encoding means for digitally encoding each series, and bit allocation means for varying the number of bits used for encoding the respective series in dependence on the relative energy content thereof, wherein the number of series to which any given number of bits is allocated is constant and only the selection of the series to which respective numbers of bits are allocated is varied.
- analog telephone systems are being replaced by digital systems.
- In digital systems, the analog signals are sampled at a rate of about twice the bandwidth of the analog signals, or about eight kilohertz, and the samples are then encoded.
- In a pulse code modulation (PCM) system, each sample is quantized as one of a discrete set of prechosen values and encoded as a digital word which is then transmitted over the telephone lines.
- the analog sample is quantized to $2^8$ or 256 levels, each of which is designated by a different eight bit word.
- Using nonlinear quantization, excellent quality speech can be obtained with only seven bits per sample; but since a seven bit word is still required for each sample, transmission bit rates of 56 kilobits per second are necessary.
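The PCM figures quoted above follow directly from the sampling rate and the word length. A minimal sketch of that arithmetic (the function names are illustrative and not taken from the patent):

```python
SAMPLE_RATE_HZ = 8000  # sampling rate stated in the text


def quantizer_levels(bits_per_sample):
    """Number of distinct levels a word of the given length can designate."""
    return 2 ** bits_per_sample


def pcm_bit_rate(bits_per_sample, sample_rate_hz=SAMPLE_RATE_HZ):
    """Transmission rate in bits per second when every sample is sent."""
    return bits_per_sample * sample_rate_hz


levels_8_bit = quantizer_levels(8)   # eight bit words
rate_7_bit = pcm_bit_rate(7)         # seven bit nonlinear quantization
```

Under these figures an eight bit word gives 256 levels and seven bits per sample gives 56,000 bits per second, matching the text.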
- the linear predictive coding (LPC) technique is based on the recognition that speech production involves excitation and a filtering process.
- the excitation is determined by the vocal cord vibration for voiced speech and by turbulence for unvoiced speech, and that actuating signal is then modified by the filtering process of vocal resonance chambers, including the mouth and nasal passages.
- a digital filter which simulates the formant effects of the resonance chambers can be defined and the definition can be encoded.
- a residual signal which approximates the excitation can then be obtained by passing the speech signal through an inverse formant filter, and the residual signal can be encoded.
- Because sufficient information is contained in the lower-frequency portion of the residual spectrum, it is possible to encode only the low frequency baseband and still obtain reasonably clear speech.
- At the receiver, a definition of the formant filter and the residual baseband are decoded.
- the baseband is repeated to complete the spectrum of the residual signal.
- By applying the decoded filter to the repeated baseband signal, the initial speech can be reconstructed.
- a major problem of the LPC approach is in defining the formant filter which must be redefined with each window of samples.
- a complex encoder and a complex decoder are required to obtain transmission rates as low as 16,000 bits per second.
- Another problem with such systems is that they do not always provide a satisfactory reconstruction of certain formants such as that resulting, for example, from nasal resonance. It is the object of the invention to solve these problems.
- the approximate envelope of the transform spectrum in each of a plurality of subbands of coefficients is defined and each envelope definition is encoded for transmission.
- Each spectrum coefficient is then scaled relative to the defined envelope of the respective subband, and each scaled coefficient is encoded in a number of bits which is determined by the defined envelope of its subband.
- Zero bits may be allotted to a number of less significant subbands as indicated by the defined envelopes; and varying numbers of bits may be used for each encoded coefficient depending on the magnitude of the defined envelope for the respective subband.
- the subbands which are transmitted and the resolution with which the transmitted subbands are encoded are determined adaptively for each sample window based on the defined envelopes of the subbands.
- the subbands which are transmitted are replicated to define coefficients of frequencies which are not transmitted.
- a list replication procedure is followed by which an nth coefficient which is transmitted is replicated as an nth coefficient which is not transmitted.
- the speech signal can be recreated by using the transmitted envelope definitions to inverse scale the coefficients of the respective subbands and by performing an inverse transform.
- the spectrum is normalized first with respect to only a few regions and subsequently with respect to a greater number of subregions.
- the maximum magnitude in each of the regions and in each of the subregions is encoded.
- the maximums are logarithmically encoded and only a baseband of the normalized spectrum is encoded.
- A block diagram of the system is shown in Fig. 1. Speech is filtered with a telephone bandpass filter 20 which prevents aliasing when the signal is sampled 8,000 times per second in sampling circuit 22.
- the analog samples are digitally encoded in an analog to digital encoder 24 and are preprocessed at 26 prior to being applied to a discrete Fourier transform unit 28.
- the output of the Fourier transform circuit 28 is a sequence of coefficients which indicate the magnitude and phase of the Fourier transform spectrum at each of 97 frequencies spaced 41.667 hertz apart.
- the magnitude spectrum of the Fourier transform output is illustrated as a continuous function in Fig. 3 but it is recognized that the transform circuit 28 would actually provide only 97 incremental outputs.
- the Fourier transform spectrum of the full speech within a selected window is equalized and encoded in circuit 30 in a manner which will be discussed below.
- the resultant digital signal can be transmitted at 16,000 bits per second over a line 32 to a receiver.
- the full spectrum of Fig. 3 is reconstructed in circuit 34.
- the inverse Fourier transform is performed in circuit 36 and applied through a post-processor 38 corresponding to the pre-processor 26. That signal is then converted to analog form in digital to analog converter 40.
- Final filtering in filter 42 provides clear speech to the listener.
- a pipelined multiprocessor architecture is employed.
- One microcomputer is dedicated to the analog to digital conversion with preemphasis filtering, one is dedicated to the forward Fourier transform and a third is dedicated to the spectral equalization and coding.
- one microcomputer is dedicated to spectrum reconstruction, another to inverse Fourier transform and a third to digital to analog conversion with deemphasis filtering.
- the spectral equalization and encoding technique of the present invention is based on the recognition that the Fourier transform of the total signal includes a relatively flat spectrum of the pitch illustrated in Fig. 4 shaped by formant signals.
- the signal of Fig. 4 is obtained by normalizing the spectrum of Fig. 3 to at least one curve which itself can be encoded separate from the residual spectrum of Fig. 4.
- Prior to compression, the analog speech signal is low pass filtered in filter 20 at 3.4 kilohertz, sampled in sampler 22 at a rate of 8 kilohertz, and digitized using a 12 bit linear analog to digital converter 24. It will be recognized that the input to the encoder may already be in digital form and may require conversion to the code which can be accepted by the encoder.
- the digitized speech signal, in frames of N samples, is first scaled up in a scaler 26 to maximize its dynamic range in each frame. The scaled input samples are then Fourier transformed in a fast Fourier transform device 28 to obtain a corresponding discrete spectrum represented by (N/2) + 1 complex frequency coefficients.
- the input frame size equals 180 samples and corresponds to a frame every 22.5 milliseconds.
- the discrete Fourier transform is performed on 192 samples, including 12 samples overlapped with the previous frame, preceded by trapezoidal windowing with a 12 point slope at each end.
- the resulting output of the FFT includes 97 complex frequency coefficients spaced 41.667 Hertz apart.
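The frame geometry described above (180 new samples, 12 overlapped samples, a 192 point transform, 97 coefficients) can be checked with a short sketch. It uses a direct DFT from the standard library rather than an FFT, and the exact ramp values of the trapezoidal window are an assumption, since the text gives only the 12 point slope length.

```python
import cmath
import math

SAMPLE_RATE_HZ = 8000
FRAME_SAMPLES = 180                          # new samples per frame
OVERLAP_SAMPLES = 12                         # samples shared with the previous frame
DFT_SIZE = FRAME_SAMPLES + OVERLAP_SAMPLES   # 192 points

BIN_SPACING_HZ = SAMPLE_RATE_HZ / DFT_SIZE
FRAME_DURATION_MS = 1000 * FRAME_SAMPLES / SAMPLE_RATE_HZ


def trapezoid_window(size=DFT_SIZE, slope=OVERLAP_SAMPLES):
    """Flat window with a linear ramp of `slope` points at each end.

    The half-sample ramp offsets are an assumption, chosen so that the
    falling ramp of one frame and the rising ramp of the next sum to one.
    """
    window = [1.0] * size
    for i in range(slope):
        ramp = (i + 0.5) / slope
        window[i] = ramp
        window[size - 1 - i] = ramp
    return window


def half_spectrum(samples):
    """Direct DFT returning coefficients 0 .. N/2 of a real-valued frame."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n // 2 + 1)
    ]


def analyze_frame(samples):
    """Window one 192-sample frame and return its 97 complex coefficients."""
    if len(samples) != DFT_SIZE:
        raise ValueError("expected %d samples" % DFT_SIZE)
    window = trapezoid_window()
    return half_spectrum([w * x for w, x in zip(window, samples)])
```

With these constants the coefficient spacing is 8000 / 192, about 41.667 Hz, and each frame advances by 22.5 milliseconds.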
- An example magnitude spectrum of a Fourier transform output from FFT 28 is illustrated in Figure 3. Although illustrated as a continuous function, it is recognized that the transform circuit 28 actually provides only 97 incremental complex outputs.
- the magnitude spectrum of the Fourier transform output is equalized and encoded.
- the spectrum is partitioned into contiguous subbands and a spectral envelope estimate is based on a piecewise approximation of those subbands at 44.
- the spectrum is divided into twenty subbands, each including four complex coefficients. Frequencies above 3291.67 Hertz are not encoded and are set to zero at the receiver.
- the spectral envelope of each subband is assumed constant and is defined by the peak magnitude in each subband as illustrated by the horizontal lines in Figure 3.
- Each magnitude, or more correctly the inverse thereof, can be treated as a scale factor for its respective subband.
- Each scale factor is quantized in a quantizer 45 to four bits.
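The piecewise-constant envelope and the scaling step can be sketched as follows. This assumes that subband b covers coefficients 4b to 4b+3, so that the twenty subbands span coefficients 0 to 79. The four bit quantization of the scale factors is left out because the quantizer levels are not given in the text.

```python
SUBBANDS = 20
COEFFS_PER_SUBBAND = 4


def subband_peaks(spectrum):
    """Peak magnitude of each four-coefficient subband (the envelope estimate)."""
    needed = SUBBANDS * COEFFS_PER_SUBBAND
    if len(spectrum) < needed:
        raise ValueError("need at least %d coefficients" % needed)
    peaks = []
    for b in range(SUBBANDS):
        block = spectrum[b * COEFFS_PER_SUBBAND:(b + 1) * COEFFS_PER_SUBBAND]
        peaks.append(max(abs(c) for c in block))
    return peaks


def scale_subbands(spectrum, peaks):
    """Divide every coefficient by the peak of its subband.

    Returns one list of four scaled complex coefficients per subband.
    An all-zero subband is passed through as zeros.
    """
    scaled = []
    for b, peak in enumerate(peaks):
        block = spectrum[b * COEFFS_PER_SUBBAND:(b + 1) * COEFFS_PER_SUBBAND]
        scaled.append([c / peak if peak > 0 else 0j for c in block])
    return scaled
```

After scaling, no coefficient in any subband has a magnitude above one, which is what lets a fixed two dimensional quantizer be applied to every subband.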
- a nonuniform bit allocation is used for the complex coefficients which are transmitted.
- Three separate two dimensional quantizers 50 are used for the transmitted 12 subbands.
- the sixteen complex coefficients of the four subbands having the smallest scale factors are quantized to seven bits each.
- the coefficients of the four subbands having the next smallest scale factors are quantized to six bits each, and the coefficients of the remaining four of the transmitted subbands are quantized to four bits each. In effect, the coefficients of the eight subbands which are not transmitted are quantized to zero bits.
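The adaptive allocation can be sketched by ranking the subbands by envelope peak. Because the scale factor is the inverse of the peak, the smallest scale factors correspond to the largest peaks. The tie-breaking rule (earlier subband wins) is an assumption, as are the function and constant names.

```python
BIT_TIERS = (7, 6, 4)        # bits per coefficient for the three transmitted tiers
SUBBANDS_PER_TIER = 4
COEFFS_PER_SUBBAND = 4


def allocate_bits(peaks):
    """Return the number of bits per coefficient for each subband.

    The four subbands with the largest peaks get 7 bits, the next four
    get 6, the next four get 4, and all remaining subbands get 0.
    """
    # sorted() is stable, so equal peaks keep their original order.
    order = sorted(range(len(peaks)), key=lambda b: peaks[b], reverse=True)
    bits = [0] * len(peaks)
    for tier, tier_bits in enumerate(BIT_TIERS):
        start = tier * SUBBANDS_PER_TIER
        for b in order[start:start + SUBBANDS_PER_TIER]:
            bits[b] = tier_bits
    return bits


def coefficient_bits(bits):
    """Total bits spent on coefficients in one frame."""
    return sum(bits) * COEFFS_PER_SUBBAND
```

For twenty subbands this spends 16 x 7 + 16 x 6 + 16 x 4 = 272 bits on coefficients, whichever subbands are selected.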
- Each of the two dimensional quantizers is designed using an approach presented by Linde, et al., "An Algorithm for Vector Quantizer Design," IEEE Trans. on Commun., Vol. COM-28, pp. 84-95, Jan. 1980.
- the result for the seven bit quantizer is shown in Figure 5.
- the two dimensions of the quantizer are the real and imaginary components of each complex coefficient.
- Each cluster has a seven bit representation to which each complex point in the cluster is quantized. Actual quantization may be by table look-up in a read only memory.
- Time scaling: 4 bits
- Synchronization: 4 bits
- TOTAL: 360 bits
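Only the time scaling, synchronization, and total rows of the bit allocation survive here. The sketch below rebuilds the frame budget under the assumption that the missing rows are the coefficient bits and scale factor bits described earlier; it is a consistency check, not a reproduction of the original table.

```python
FRAME_SAMPLES = 180
SAMPLE_RATE_HZ = 8000

# Assumed from the quantizer description: 16 coefficients per tier.
COEFFICIENT_BITS = 16 * 7 + 16 * 6 + 16 * 4
# Assumed from the text: twenty scale factors at four bits each.
SCALE_FACTOR_BITS = 20 * 4
TIME_SCALING_BITS = 4      # stated in the table
SYNCHRONIZATION_BITS = 4   # stated in the table

FRAME_BITS = (COEFFICIENT_BITS + SCALE_FACTOR_BITS
              + TIME_SCALING_BITS + SYNCHRONIZATION_BITS)

# One frame is sent every 180 samples at 8000 samples per second.
BIT_RATE = FRAME_BITS * SAMPLE_RATE_HZ / FRAME_SAMPLES
```

Under these assumptions the four items sum to the stated 360 bits, and 360 bits every 22.5 milliseconds gives the 16,000 bits per second quoted for the line.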
- the transmitted 12 groups of coefficients are applied to corresponding seven bit, six bit and four bit inverse quantizers at 52.
- the frequency subbands to which the resulting coefficients correspond are determined by the scale factors which are transmitted in sequence for all subbands.
- the coefficients from the seven bit inverse quantizer are placed in the subbands which the scale factors indicate to be of the greatest magnitude.
- the coefficients of the eight subbands which are not transmitted are approximated by replication of transmitted subbands at 54.
- a list replication approach is utilized. This approach is illustrated by Figure 6.
- the coefficients for each subband are illustrated by a single vector.
- the transmitted subbands are indicated as T1, T2, T3, ..., Tn, ...
- the subbands which must be produced by replication in the receiver are indicated as R1, R2, R3, ..., Rn, ...
- the coefficients of the subband Tn are used both for Tn and for Rn.
- the scaled coefficients for subband T1 are repeated at subband R1, those of subband T2 are repeated at R2, and those of subband T3 are repeated at R3.
- the rationale for this list replication technique is that subbands are themselves usually grouped in blocks of transmitted subbands and blocks of nontransmitted subbands. Thus, large blocks of coefficients are typically repeated using this approach and speech harmonics are maintained in the replication process.
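List replication as described above can be sketched in a few lines. The wrap-around used when there are more untransmitted than transmitted subbands is an assumption; in the embodiment described (12 transmitted, 8 not) it never triggers.

```python
def list_replicate(subbands, transmitted):
    """Fill untransmitted subbands by list replication.

    `subbands` holds one block of scaled coefficients per subband, with
    None for subbands that were not transmitted. `transmitted` holds one
    boolean per subband. The nth transmitted subband (Tn) is copied into
    the nth untransmitted subband (Rn), both counted in frequency order.
    """
    sources = [b for b, sent in enumerate(transmitted) if sent]
    targets = [b for b, sent in enumerate(transmitted) if not sent]
    if targets and not sources:
        raise ValueError("nothing was transmitted, so nothing can be replicated")
    rebuilt = list(subbands)
    for n, target in enumerate(targets):
        source = sources[n % len(sources)]  # wrap-around is an assumption
        rebuilt[target] = list(subbands[source])
    return rebuilt
```

For example, with subbands 0, 1, and 4 transmitted out of six, subband 2 receives a copy of subband 0, subband 3 a copy of subband 1, and subband 5 a copy of subband 4.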
- a reproduction of the spectrum of Figure 3 can be generated at 56 by applying the scale factors to the equalized spectrum. From that Fourier transform reproduction of the original Fourier transform, the speech can be obtained through an inverse FFT 36, an inverse scaler 38, a digital to analog converter 40 and a reconstruction filter 42.
- a distinct advantage of the present system is that the coder is not based on an assumed fixed low pass spectrum model which is speech specific.
- Voice-band data and signaling take the form of sine waves of some bandwidth which may occur at any frequency. Where only a lower or an upper baseband of coefficients is transmitted, voice-band data can be lost. With the present system, the subbands in which digital information is transmitted are naturally selected because of their higher energy.
- Embedded coding, important as a method of congestion control in telephone applications, allows the data to leave the encoder at a constant bit rate, yet be received at the decoder at a lower bit rate as some bits are discarded en route.
- Embedded coding implies a packet or block of bits within which there is a hierarchy of subblocks. Least crucial subblocks can be discarded first as the channel gets overloaded.
- This hierarchical concept is a natural one in the present system where the partial-band information, described by a set of frequency coefficients, is ordered in decreasing significance and the missing coefficients can always be approximated from the received ones. The more coefficients in the set, the higher the rate and the better the quality. Moreover, speech quality degrades very gracefully with modest drops in the rate.
- the implementation of an embedded coding system in conjunction with this approach is therefore fairly simple and very attractive.
- the coding technique described above provides for excellent speech coding and reproduction at 16 kilobits per second. Excellent results as low as 8.0 kilobits per second can be obtained by using this technique in conjunction with a frequency scaling technique known as time domain harmonic scaling and described by D. Malah, "Time Domain Algorithms for Harmonic Bandwidth Reduction and Time Scaling of Speech Signals", IEEE Trans. Acoust., Speech, Signal Processing, Vol. ASSP-27, pp. 121-133, Apr. 1979.
- speech at twice the rate of the original speech but at the original pitch is generated by combining adjacent pitch cycles.
- the frequency scaled speech can then be fast Fourier transformed in the technique described above.
- Although each of the steps of residual extraction, subband selection, and quantizing and the steps of inverse quantizing, replication and envelope excitation are shown as individual elements of the system, it will be recognized that they can be merged in an actual system.
- the residual spectrum for subbands which are not transmitted need not be obtained.
- the system can be implemented using a combination of software and hardware.
- the shape of the spectrum is determined by a two-step process. This process also encodes the shape of the entire 100 to 3,800 Hz spectrum since this is useful in the baseband coding.
- the spectrum is divided into four regions illustrated in Fig. 7: 125-583 Hz, 625-1959 Hz, 2000-3416 Hz, and 3468-3833 Hz. These regions correspond roughly to the usual locations of the first four formants.
- the dynamic range of the magnitudes of the spectral coefficients is much smaller within each of these regions than in the spectrum as a whole. For voiced phonemes the peak magnitude near 250 Hz can be 30 dB above the magnitudes near 3,800 Hz.
- the first step of spectral normalization is performed by finding the peak magnitudes within each region, quantizing these peaks to 5 bits each with a logarithmic quantizer, and dividing each spectral coefficient by the quantized peak in its region.
- the result is a vector of spectral coefficients with maximum magnitude equal to unity.
- the division into regions should result in the spectral coefficients being reasonably uniformly distributed within the complex disc of radius one.
- the second step extracts more detailed structure.
- the spectrum is divided into equal bands of about 165 Hz each.
- the peak magnitude within each band is located and quantized to 3 bits.
- the complex spectral coefficients within the band are divided by the quantized magnitude and coded to 6 bits each using a hexagonal quantizer. This coding preserves phase information that is important for reconstruction of frame boundaries.
- the preprocessor 26 is a single-pole pre-emphasis filter. Low frequencies are attenuated by about 5 dB. High frequencies are boosted. The highest frequency (4 kHz) is boosted by about 24 dB.
- the filter is useful in equalizing the spectrum by reducing the low-pass effects of the initializing filter and the high-frequency attenuation of the lips. The boosting helps to maintain numerical accuracy in the subsequent computation of the Fourier transform.
- the spectrum is normalized to a curve which in this case is selected as a horizontal line through the peak magnitude of the spectrum in each region. These curves are shown as lines 58, 60, 62 and 64 in Figure 7.
- the peak magnitude of the complex numbers in each region is determined and encoded to five bits at unit 66 of Fig. 11 by finding a value k which is encoded such that the peak magnitude is between $162 \times 2^{12(k-1)/32}$ and $162 \times 2^{12k/32}$. This results in logarithmic encoding of the peak magnitude.
- the four k values, each encoded in five bits, make up a total of 20 bits from the formant encoder which are the most significant bits of the transmitted code for the window. All spectral coefficients in each of the four regions are then divided by $162 \times 2^{12k/32}$ in the spectral normalization unit 68. By this method, all of the resultant magnitudes, illustrated in Figure 8, are less than 1.
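The five bit logarithmic code for a region peak can be sketched as below. The constant 162 and the exponent step 12/32 are taken literally from the text as printed; clamping k to the range 0 to 31 for peaks outside the coded range is an assumption.

```python
import math

BASE = 162.0          # constant as printed in the text
STEP = 12.0 / 32.0    # exponent increment per code value
MAX_CODE = 31         # largest value of a five bit code


def decode_peak(k):
    """Quantized peak magnitude represented by code k."""
    return BASE * 2.0 ** (STEP * k)


def encode_peak(peak):
    """Smallest k whose decoded value is at least `peak`, clamped to 0..31."""
    if peak <= BASE:
        return 0
    k = math.ceil(math.log2(peak / BASE) / STEP)
    return min(k, MAX_CODE)


def normalize_region(coefficients):
    """Divide a region's coefficients by its quantized peak; return (k, scaled)."""
    k = encode_peak(max(abs(c) for c in coefficients))
    divisor = decode_peak(k)
    return k, [c / divisor for c in coefficients]
```

Because the peak is always rounded up to the next code value, every normalized magnitude stays at or below one as long as the peak is within the coded range.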
- the normalized coefficients output from unit 68 are grouped into 20 subregions of four and two subregions of five illustrated in Figure 8.
- the peak magnitude in each of these subregions is determined and encoded to three bits with a logarithmic quantizer in unit 70.
- the peak is always coded to the next largest value.
- the three bits from each of the 22 subregions provide an additional 66 bits of the final signal for the window.
- Each output within a subregion is multiplied by the reciprocal of the quantized magnitude in the sample normalization unit 72, thus ensuring that all outputs illustrated in Fig. 9 remain less than 1.
- Each complex output from the baseband of 125 Hz to 1959 Hz of the normalized spectrum of Fig. 9 is coded to six bits with the two dimensional quantizer and encoder 74.
- the two-dimensional quantizer is formed by dividing a complex disc of radius one into hexagons as shown in Figure 10.
- the x, y coordinates are radially warped by an exponential function to approximate a logarithmic coding of the magnitude. All points within a hexagon are quantized to the coordinates of the center of the hexagon.
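The core of a hexagonal quantizer is finding the hexagon center nearest to a complex point. A minimal sketch of that step follows, using the fact that a hexagonal lattice is the union of two offset rectangular lattices. The exponential radial warping and the mapping of centers to six bit codes are omitted because their parameters are not given in the text; `spacing` is a hypothetical parameter.

```python
import math


def nearest_hexagon_center(point, spacing):
    """Return the hexagonal-lattice point closest to the complex `point`.

    `spacing` is the distance between adjacent hexagon centers. The
    lattice is treated as two rectangular lattices, one shifted by half
    a cell in each direction; the closer of the two candidates wins.
    """
    row_height = spacing * math.sqrt(3.0)
    offset = complex(spacing / 2.0, row_height / 2.0)

    def nearest_rectangular(p):
        return complex(round(p.real / spacing) * spacing,
                       round(p.imag / row_height) * row_height)

    candidate_a = nearest_rectangular(point)
    candidate_b = nearest_rectangular(point - offset) + offset
    if abs(point - candidate_a) <= abs(point - candidate_b):
        return candidate_a
    return candidate_b
```

Every point within a hexagon maps to that hexagon's center, so the quantization error never exceeds the distance from a center to a hexagon corner, which is spacing divided by the square root of three.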
- coefficients of large magnitude are coded to better phase resolution than coefficients of small magnitude.
- Actual quantization is done by table lookup, but efficient computational algorithms are possible.
- the actual coding transformations, bit allocations, and subband sizes may be changed as the coder is optimized for different applications.
- All normalization factors (four at 5 bits each, 22 at 3 bits each) and the coded normalized baseband coefficients (45 at 6 bits) are transmitted.
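The side information and baseband bits listed above can be totaled as a quick check. The bit rate line assumes this embodiment keeps the 180 sample frame advance at 8000 samples per second described earlier; the text does not state the total for this embodiment.

```python
REGION_FACTOR_BITS = 4 * 5       # four region peaks at 5 bits each
SUBREGION_FACTOR_BITS = 22 * 3   # 22 subregion peaks at 3 bits each
BASEBAND_BITS = 45 * 6           # 45 baseband coefficients at 6 bits each

FRAME_BITS = REGION_FACTOR_BITS + SUBREGION_FACTOR_BITS + BASEBAND_BITS

# Assumed frame timing: one frame per 180 samples at 8000 samples per second.
BIT_RATE = FRAME_BITS * 8000 / 180
```

The three items come to 356 bits per frame, which fits within a 16,000 bits per second channel under the assumed frame timing.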
- the baseband is decoded and duplicated into the upper frequency range.
- the normalization factors are applied onto the spectrum to restore the original shape. Specifically, in the receiver, the inverse Fourier Transform Inputs 0 to 2 and 93 to 96 are set to zero.
- the normalized complex coefficients for Inputs 3 to 47 are reconstructed from the quantizer codes by table lookup. They are duplicated into Positions 48 to 92. This duplication is the nonlinear regeneration step. The scale factors for the subregions and larger regions are then applied.
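The input layout of the inverse transform described above can be sketched directly from the stated index ranges. Applying the subregion and region scale factors is left out; the function name is illustrative.

```python
SPECTRUM_SIZE = 97     # inverse transform inputs 0 .. 96
BASEBAND_START = 3     # baseband occupies inputs 3 .. 47
BASEBAND_SIZE = 45
COPY_START = 48        # duplicate occupies inputs 48 .. 92


def rebuild_normalized_spectrum(baseband):
    """Place the decoded baseband and its duplicate; leave the rest at zero."""
    if len(baseband) != BASEBAND_SIZE:
        raise ValueError("expected %d baseband coefficients" % BASEBAND_SIZE)
    spectrum = [0j] * SPECTRUM_SIZE
    for i, coefficient in enumerate(baseband):
        spectrum[BASEBAND_START + i] = coefficient
        spectrum[COPY_START + i] = coefficient
    return spectrum
```

Inputs 0 to 2 and 93 to 96 stay at zero, as the text specifies.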
- the inverse transform is computed in unit 36.
- the effects of the windowing are removed by adding the last 12 points of the previous inverse transform to the first 12 points from the current inverse transform.
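The overlap-add step can be sketched as follows, assuming each inverse transform yields 192 points and successive frames advance by 180 samples. The window helper is hypothetical and is included only to show that complementary ramps cancel the windowing.

```python
HOP = 180       # frame advance in samples
OVERLAP = 12    # points shared by adjacent frames
FRAME_LEN = HOP + OVERLAP


def trapezoid_window(size=FRAME_LEN, slope=OVERLAP):
    """Hypothetical trapezoidal window whose end ramps sum to one when overlapped."""
    window = [1.0] * size
    for i in range(slope):
        ramp = (i + 0.5) / slope
        window[i] = ramp
        window[size - 1 - i] = ramp
    return window


def overlap_add(frames, hop=HOP, overlap=OVERLAP):
    """Add the last `overlap` points of each frame to the first points of the next."""
    frame_len = hop + overlap
    output = [0.0] * (hop * len(frames) + overlap)
    for f, frame in enumerate(frames):
        if len(frame) != frame_len:
            raise ValueError("expected frames of %d points" % frame_len)
        start = f * hop
        for i, value in enumerate(frame):
            output[start + i] += value
    return output
```

Feeding in three windowed copies of a constant signal returns that constant everywhere except the first and last 12 points, where only one ramp is present.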
- the speech now passes through filter 38, which is an inverse to the pre-emphasis filter and which attenuates the high frequencies, removing the effects of the treble boost and reducing high-frequency quantization noise.
- the outputs are converted to analog with a 12-bit linear digital to analog converter 40.
- the baseband which is repeated in the spectrum reconstruction has been described as being a band of lower frequencies.
- the baseband may include any range of frequencies within the spectrum. For some sounds where higher energy levels are found in the higher frequencies, a baseband of the higher frequencies is preferred.
- the baseband suffers degradations only from quantization errors.
- the reconstruction of the upper frequencies is only as good as the model and the shaping information.
- each formant is excited at approximately the right frequency. This is an improvement over baseband residual excitation in which some parts of the spectrum may have too little energy.
- the reduction in computational complexity due to peak finding and scaling instead of linear prediction analysis and filtering is very significant.
- This approach is a wideband approach in that the entire voice frequency range is coded.
- the major problem with other wideband systems at 16 kb/s is that there are barely enough bits available to give a rough description of the waveform.
- Baseband excitation systems such as the present system meet that problem by devoting most of the bits to the baseband and regenerating the excitation signal for higher frequencies.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
Claims (18)
- A speech encoder comprising:
Fourier transform means (28) for performing a discrete Fourier transform of an incoming speech signal to generate a discrete transform spectrum of coefficients;
normalization means (30) for modifying the transform spectrum to provide a flatter normalized spectrum and for encoding a function by which the discrete spectrum is modified; and
means (30) for encoding at least a portion of the spectrum,
characterized in that
said normalization means (30) comprises means (44) for defining the approximate envelope of the discrete spectrum in each of a plurality of subbands of coefficients and for encoding the defined envelope of each subband of coefficients, and means for scaling each spectrum coefficient relative to the defined envelope of the respective subband of coefficients; and
said means (30) for encoding encodes the scaled spectrum coefficients within each subband in a number of bits determined by the defined envelope of the subband.
- A speech coding system as claimed in claim 1 wherein the determined number of bits for a plurality of subbands is zero, such that the scaled coefficients for those subbands are not transmitted.
- A speech coding system as claimed in claim 2 wherein the scaled coefficients of different subbands are encoded in different nonzero numbers of bits.
- A speech coding system as claimed in claim 2 wherein the coded speech is decoded by replicating subbands of transmitted coefficients as substitutes for subbands of nontransmitted coefficients, the transmitted coefficients being replicated such that the nth subband which is transmitted is replicated as the nth subband which is not transmitted.
- A speech coding system as claimed in claim 1 wherein the coefficients of different subbands are encoded in different nonzero numbers of bits.
- A speech coding system as claimed in claim 1 wherein:
the encoding means (30) encodes the scaled coefficients of fewer than all of the subbands, the encoded scaled coefficients being those corresponding to the defined envelopes of greatest magnitude, the scaled coefficients of the subbands corresponding to the defined envelopes of greater magnitude being encoded in more bits than the coefficients of the subbands corresponding to the defined envelopes of lesser magnitude.
- A speech coding system as claimed in claim 6 wherein the coded speech is decoded by replicating subbands of transmitted coefficients as substitutes for subbands of nontransmitted coefficients, the transmitted coefficients being replicated such that the nth subband which is transmitted is replicated as the nth subband which is not transmitted.
- A speech coding system as claimed in claim 6 wherein the transform means (28) performs a discrete Fourier transform.
- A speech coding system as claimed in claim 1 wherein the normalization means comprises:
means (44) for determining the maximum magnitude of the discrete spectrum within each of a plurality of regions of the spectrum; and
means for digitally encoding the maximum magnitude of each region; and
means (45) for scaling each coefficient of the discrete spectrum in each region relative to the maximum magnitude of each region to provide a first set of normalized coefficients.
- A speech coding system as claimed in claim 9 wherein the normalization means further comprises:
means for determining the maximum magnitude of the first set of normalized outputs in each of a plurality of subregions of the spectrum;
means for digitally encoding the maximum magnitude of each subregion; and
means for scaling each output of the first set of normalized outputs relative to the maximum magnitude of each subregion to provide a second subset of normalized outputs.
- A speech encoder as claimed in claim 10 wherein each of the maximum magnitudes is logarithmically encoded.
- A speech encoder as claimed in claim 10 wherein the maximum magnitude is determined for each of four regions corresponding to the first four formants.
- A speech coding system as claimed in claim 10 wherein only a baseband of the normalized spectrum is encoded.
- A method of encoding speech comprising the following steps:
performing a discrete Fourier transform of a window of speech to generate a discrete transform spectrum;
providing a normalized spectrum by defining at least one curve approximating the magnitude of the discrete spectrum, digitally encoding the defined curve, and defining the discrete spectrum relative to the defined curve; and
encoding at least a portion of the normalized spectrum,
characterized in that
the normalized spectrum is obtained by defining the approximate envelope of the discrete spectrum in each of a plurality of subbands of coefficients, digitally encoding the defined envelope of each subband of coefficients, and scaling each coefficient relative to the defined magnitude of the respective subband of coefficients; and
the scaled coefficients within each subband are encoded in a number of bits determined by the defined envelope of the subband.
- A method as claimed in claim 14 wherein the number of bits determined for a plurality of subbands is zero, such that the scaled coefficients for those subbands are not transmitted.
- A method as claimed in claim 15 wherein the scaled coefficients of different subbands are encoded in different nonzero numbers of bits.
- A method as claimed in claim 15 wherein the coded speech is decoded by replicating subbands of transmitted coefficients as substitutes for subbands of nontransmitted coefficients, the transmitted coefficients being replicated such that the nth subband which is transmitted is replicated as the nth subband which is not transmitted.
- A method according to claim 14, wherein the normalized spectrum is obtained by:
determining a maximum amplitude of the discrete spectrum within each of a plurality of regions of the spectrum;
digitally encoding the maximum amplitude of each region; and
scaling each coefficient of the discrete spectrum in each region relative to the maximum amplitude of that region to determine a set of normalized coefficients.
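As a rough illustration of the method claims above, the following Python sketch scales each subband of coefficients by its peak magnitude, derives a bit count per subband from those peaks, and fills untransmitted subbands by copying transmitted ones. The function names, the `floor` and `max_bits` parameters, the one-bit-per-doubling allocation rule, and the wrap-around used when fewer subbands are transmitted than omitted are assumptions made for illustration only; the claims do not specify them. Quantization of the scaled coefficients is left out.

```python
import math


def normalize_subbands(coeffs, n_subbands):
    """Split coeffs into equal subbands and scale each by its peak magnitude.

    Trailing coefficients that do not fill a whole subband are ignored.
    """
    size = len(coeffs) // n_subbands
    bands = [coeffs[i * size:(i + 1) * size] for i in range(n_subbands)]
    # An all-zero band gets peak 1.0 so the division below is safe.
    peaks = [max(abs(c) for c in band) or 1.0 for band in bands]
    scaled = [[c / p for c in band] for band, p in zip(bands, peaks)]
    return peaks, scaled


def allocate_bits(peaks, floor, max_bits=4):
    """Hypothetical rule: zero bits at or below `floor`, otherwise one bit
    plus one more per doubling of the peak above `floor`, capped at max_bits."""
    bits = []
    for p in peaks:
        if p <= floor:
            bits.append(0)  # subband is not transmitted
        else:
            bits.append(min(max_bits, 1 + int(math.log2(p / floor))))
    return bits


def replicate_missing(scaled, bits):
    """Decoder side: the nth transmitted subband stands in for the nth
    untransmitted subband (wrapping around is an assumption)."""
    sent = [band for band, b in zip(scaled, bits) if b > 0]
    out, k = [], 0
    for band, b in zip(scaled, bits):
        if b > 0:
            out.append(list(band))
        elif sent:
            out.append(list(sent[k % len(sent)]))
            k += 1
        else:
            out.append([0.0] * len(band))  # nothing was transmitted at all
    return out


coeffs = [8.0, -4.0, 2.0, 1.0, 0.5, 0.25, 0.1, -0.1]
peaks, scaled = normalize_subbands(coeffs, 4)
bits = allocate_bits(peaks, floor=1.0)
decoded = replicate_missing(scaled, bits)
# The peak of every subband is still sent, so each copy is rescaled
# to the level of the subband it replaces.
restored = [[c * p for c in band] for band, p in zip(decoded, peaks)]
print(bits)      # [4, 2, 0, 0]
print(restored)
```

In this sketch the two low-energy subbands receive zero bits, so the decoder reuses the shapes of the first and second transmitted subbands and rescales them with the transmitted peaks.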
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US68438284A | 1984-12-20 | 1984-12-20 | |
US684382 | 1984-12-20 | | |
US06/798,174 US4790016A (en) | 1985-11-14 | 1985-11-14 | Adaptive method and apparatus for coding speech |
US798174 | 1985-11-14 | | |
Publications (3)
Publication Number | Publication Date |
---|---|
EP0208712A1 (fr) | 1987-01-21 |
EP0208712A4 (fr) | 1988-01-28 |
EP0208712B1 (fr) | 1993-04-07 |
Family
ID=27103309
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP86900480A Expired - Lifetime EP0208712B1 (fr) | 1984-12-20 | 1985-12-11 | Adaptive method and apparatus for coding speech |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP0208712B1 (fr) |
DE (1) | DE3587251T2 (fr) |
WO (1) | WO1986003872A1 (fr) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US12131742B2 (en) | 2010-07-19 | 2024-10-29 | Dolby International Ab | Processing of audio signals during high frequency reconstruction |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE3629434C2 (de) * | 1986-08-29 | 1994-07-28 | Karlheinz Dipl Ing Brandenburg | Digital coding method |
US5924060A (en) * | 1986-08-29 | 1999-07-13 | Brandenburg; Karl Heinz | Digital coding process for transmission or storage of acoustical signals by transforming of scanning values into spectral coefficients |
SE0004163D0 (sv) | 2000-11-14 | 2000-11-14 | Coding Technologies Sweden Ab | Enhancing perceptual performance of high frequency reconstruction coding methods by adaptive filtering |
DE102004059979B4 (de) | 2004-12-13 | 2007-11-22 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Device and method for calculating a signal energy of an information signal |
PL4016527T3 (pl) | 2010-07-19 | 2023-05-22 | Dolby International Ab | Processing of audio signals during high frequency reconstruction |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0176243A2 (fr) * | 1984-08-24 | 1986-04-02 | BRITISH TELECOMMUNICATIONS public limited company | Frequency-domain speech coding |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS5857758B2 (ja) * | 1979-09-28 | 1983-12-21 | 株式会社日立製作所 | Speech pitch period extraction device |
US4330689A (en) * | 1980-01-28 | 1982-05-18 | The United States Of America As Represented By The Secretary Of The Navy | Multirate digital voice communication processor |
DE3102822C2 (de) * | 1981-01-28 | 1984-02-16 | Siemens AG, 1000 Berlin und 8000 München | Method for frequency-band-compressed speech transmission |
US4535472A (en) * | 1982-11-05 | 1985-08-13 | At&T Bell Laboratories | Adaptive bit allocator |
US4667340A (en) * | 1983-04-13 | 1987-05-19 | Texas Instruments Incorporated | Voice messaging system with pitch-congruent baseband coding |
1985
- 1985-12-11 DE DE8686900480T patent/DE3587251T2/de not_active Expired - Lifetime
- 1985-12-11 WO PCT/US1985/002448 patent/WO1986003872A1/fr active IP Right Grant
- 1985-12-11 EP EP86900480A patent/EP0208712B1/fr not_active Expired - Lifetime
Also Published As
Publication number | Publication date |
---|---|
EP0208712A4 (fr) | 1988-01-28 |
DE3587251T2 (de) | 1993-07-15 |
EP0208712A1 (fr) | 1987-01-21 |
DE3587251D1 (de) | 1993-05-13 |
WO1986003872A1 (fr) | 1986-07-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US4914701A (en) | | Method and apparatus for encoding speech |
US4790016A (en) | | Adaptive method and apparatus for coding speech |
EP0481374B1 (fr) | | Method and apparatus for transform coding with subband excitation and dynamic bit allocation |
US4677671A (en) | | Method and device for coding a voice signal |
US6484140B2 (en) | | Apparatus and method for encoding a signal as well as apparatus and method for decoding signal |
JP3881943B2 (ja) | | Acoustic encoding apparatus and acoustic encoding method |
US5752225A (en) | | Method and apparatus for split-band encoding and split-band decoding of audio information using adaptive bit allocation to adjacent subbands |
US4704730A (en) | | Multi-state speech encoder and decoder |
EP0910067A1 (fr) | | Audio signal coding and decoding methods, and audio signal coder and decoder |
WO2005111568A1 (fr) | | Encoding device, decoding device, and method thereof |
JP4628861B2 (ja) | | Digital signal encoding method using a plurality of lookup tables, digital signal encoding apparatus, and method for generating a plurality of lookup tables |
KR100695125B1 (ko) | | Method and apparatus for encoding/decoding a digital signal |
WO2006051446A2 (fr) | | Method of signal encoding |
EP1228506A1 (fr) | | Method of encoding an audio signal using a quality value for bit allocation |
JP3353868B2 (ja) | | Acoustic signal transform coding method and decoding method |
EP0208712B1 (fr) | | Adaptive method and apparatus for coding speech |
Zelinski et al. | | Approaches to adaptive transform speech coding at low bit rates |
JP4359949B2 (ja) | | Signal encoding apparatus and method, and signal decoding apparatus and method |
JP3297050B2 (ja) | | Computationally efficient adaptive bit allocation coding method and apparatus allowing for decoder spectral distortion |
JP4281131B2 (ja) | | Signal encoding apparatus and method, and signal decoding apparatus and method |
JP2000151413A (ja) | | Adaptive dynamic variable bit allocation method in audio coding |
JPH0537395A (ja) | | Band-division coding method |
JP3297238B2 (ja) | | Adaptive coding system and bit allocation method |
JPH0761016B2 (ja) | | Coding method |
JP4618823B2 (ja) | | Signal encoding apparatus and method |
Legal Events
Code | Title | Description |
---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
17P | Request for examination filed | Effective date: 19861104 |
AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): BE DE FR GB IT |
A4 | Supplementary search report drawn up and despatched | Effective date: 19880128 |
17Q | First examination report despatched | Effective date: 19900712 |
ITF | It: translation for a ep patent filed | |
GRAA | (expected) grant | Free format text: ORIGINAL CODE: 0009210 |
AK | Designated contracting states | Kind code of ref document: B1; Designated state(s): BE DE FR GB IT |
ET | Fr: translation filed | |
REF | Corresponds to: | Ref document number: 3587251; Country of ref document: DE; Date of ref document: 19930513 |
PLBE | No opposition filed within time limit | Free format text: ORIGINAL CODE: 0009261 |
STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT |
26N | No opposition filed | |
REG | Reference to a national code | Ref country code: GB; Ref legal event code: IF02 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: BE; Payment date: 20041118; Year of fee payment: 20 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: GB; Payment date: 20041202; Year of fee payment: 20 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: DE; Payment date: 20041209; Year of fee payment: 20 |
PGFP | Annual fee paid to national office [announced via postgrant information from national office to epo] | Ref country code: FR; Payment date: 20050426; Year of fee payment: 20 |
PG25 | Lapsed in a contracting state [announced via postgrant information from national office to epo] | Ref country code: GB; Free format text: LAPSE BECAUSE OF EXPIRATION OF PROTECTION; Effective date: 20051210 |
REG | Reference to a national code | Ref country code: GB; Ref legal event code: PE20 |
BE20 | Be: patent expired | Owner name: *VERIZON LABORATORIES INC.; Effective date: 20051211 |