EP1021805B1 - Verfahren und vorrichtung zur verbesserung eines digitalen sprachsignals - Google Patents


Info

Publication number
EP1021805B1
EP1021805B1 (application EP98943997A)
Authority
EP
European Patent Office
Prior art keywords
signal
frequency
speech signal
frame
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
EP98943997A
Other languages
English (en)
French (fr)
Other versions
EP1021805A1 (de
Inventor
Philip Lockwood
Stéphane LUBIARZ
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nortel Networks France SAS
Original Assignee
Matra Nortel Communications SAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Matra Nortel Communications SAS filed Critical Matra Nortel Communications SAS
Publication of EP1021805A1 publication Critical patent/EP1021805A1/de
Application granted granted Critical
Publication of EP1021805B1 publication Critical patent/EP1021805B1/de
Anticipated expiration legal-status Critical
Expired - Lifetime legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316 Speech enhancement, e.g. noise reduction or echo cancellation, by changing the amplitude
    • G10L21/0364 Speech enhancement, e.g. noise reduction or echo cancellation, by changing the amplitude for improving intelligibility
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques using spectral analysis, e.g. transform vocoders or subband vocoders
    • G10L19/0212 Speech or audio signals analysis-synthesis techniques using spectral analysis, using orthogonal transformation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/93 Discriminating between voiced and unvoiced parts of speech signals

Definitions

  • The present invention relates to digital speech signal processing techniques.
  • A commonly used method is based on linear prediction, whereby a prediction delay inversely proportional to the pitch frequency is evaluated. This delay may be expressed as an integer or fractional number of sample periods of the digital signal.
  • Other methods directly detect breaks in the signal attributable to closures of the speaker's glottis, the time intervals between these breaks being inversely proportional to the pitch frequency.
  • The discrete frequencies considered are those of the form (a/N)·Fe, where Fe is the sampling frequency, N is the number of samples in the blocks used in the discrete Fourier transform, and a is an integer ranging from 0 to N/2−1. These frequencies do not necessarily include the estimated pitch frequency and/or its harmonics. This results in imprecision in the operations carried out with reference to the estimated pitch frequency, which can distort the processed signal by affecting its harmonic character.
  • A main object of the present invention is to propose a way of conditioning the speech signal that makes it less sensitive to the above drawbacks.
  • The invention therefore proposes a method as set out in claim 1 and a device as claimed in claim 9.
  • The invention thus provides a method of conditioning a digital speech signal processed in successive frames, in which a harmonic analysis of the speech signal is carried out to estimate a pitch frequency of the speech signal on each frame where it exhibits voice activity. After estimating the pitch frequency of the speech signal on a frame, the speech signal of the frame is conditioned by oversampling it at an oversampling frequency that is a multiple of the estimated pitch frequency.
  • An additional improvement is that, after processing each frame, among the samples of the denoised speech signal provided by this processing, a number of samples equal to an integer multiple of the ratio between the sampling frequency and the estimated pitch frequency is retained. This avoids distortion problems caused by phase discontinuities between frames, which are generally not fully corrected by classical overlap-add techniques.
  • Having conditioned the signal by this oversampling technique provides a good measure of the degree of voicing of the speech signal on the frame, from a calculation of the entropy of the autocorrelation of the spectral components computed on the basis of the conditioned signal.
  • Conditioning the speech signal accentuates the irregular aspect of the spectrum and therefore the variations in entropy, so that the latter is a measure of good sensitivity.
  • The denoising system shown in FIG. 1 processes a digital speech signal s.
  • The signal frame is transformed into the frequency domain by a module 11 applying a conventional fast Fourier transform (TFR) algorithm to calculate the modulus of the signal spectrum.
  • TFR: fast Fourier transform.
  • The frequency resolution available at the output of the fast Fourier transform is not used; a lower resolution is used instead, determined by a number I of frequency bands covering the band [0, Fe/2] of the signal.
  • A module 12 calculates the respective averages of the spectral components Sn,f of the speech signal in the bands, for example with a uniform weighting such that:
  • This averaging reduces the fluctuations between bands by averaging the noise contributions within them, which decreases the variance of the noise estimator. It also greatly reduces the complexity of the system.
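The uniform band averaging performed by module 12 can be sketched as below; the equal-width band edges and band count are illustrative assumptions, not the patent's exact band layout.

```python
import numpy as np

def band_averages(spectrum, n_bands):
    """Average FFT magnitude components uniformly over n_bands frequency bands.

    `spectrum` holds |S_{n,f}| for f = 0 .. N/2-1; a sketch of the uniform
    weighting performed by module 12 (equal-width band edges are assumptions).
    """
    edges = np.linspace(0, len(spectrum), n_bands + 1, dtype=int)
    return np.array([spectrum[a:b].mean() for a, b in zip(edges, edges[1:])])

mags = np.abs(np.fft.rfft(np.random.randn(256)))[:128]  # |S_{n,f}|, N = 256
s_bands = band_averages(mags, n_bands=16)               # S_{n,i}, I = 16
```

With equal-size bands, the mean over bands preserves the global mean, which is why the per-band noise fluctuations shrink without biasing the estimate.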
  • The averaged spectral components Sn,i are fed to a voice activity detection module 15 and to a noise estimation module 16. These two modules 15, 16 operate jointly, in the sense that the degrees of voice activity γn,i measured for the different bands by module 15 are used by module 16 to estimate the long-term energy of the noise in the different bands, while these long-term estimates B̂n,i are used by module 15 to perform a priori denoising of the speech signal in the different bands in order to determine the degrees of voice activity γn,i.
  • Modules 15 and 16 can correspond to the flowcharts shown in Figures 2 and 3.
  • Module 15 first performs a priori denoising of the speech signal in the different bands i for the signal frame n.
  • This a priori denoising is carried out by a conventional nonlinear spectral subtraction process, based on noise estimates obtained during one or more previous frames.
  • The a priori denoised spectral components are calculated according to: where βpi is a floor coefficient close to 0, conventionally used to prevent the spectrum of the denoised signal from taking negative or excessively low values that would cause musical noise.
  • Steps 17 to 20 therefore essentially consist in subtracting from the signal spectrum an estimate of the noise spectrum, increased by the overestimation coefficient α'n−1,i, estimated a priori.
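A minimal sketch of nonlinear spectral subtraction with a floor, as described above. The parameter names (`alpha`, `beta`) and the `beta * s_band` floor form are illustrative assumptions standing in for the patent's overestimation and floor coefficients.

```python
import numpy as np

def spectral_subtract(s_band, noise_est, alpha, beta):
    """Nonlinear spectral subtraction in one band: subtract the noise estimate
    scaled by an overestimation coefficient alpha, then floor the result at
    beta * s_band so it never goes negative or too low (musical noise)."""
    return np.maximum(s_band - alpha * noise_est, beta * s_band)

out = spectral_subtract(np.array([10.0, 3.0]),   # band magnitudes
                        np.array([4.0, 4.0]),    # noise estimates
                        alpha=1.5, beta=0.1)
```

In the second band the subtraction would go negative, so the floor 0.1 × 3.0 takes over; this is exactly the mechanism that suppresses musical noise.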
  • Module 15 then calculates, for each band i (0 ≤ i < I), a quantity ΔEn,i representing the short-term variation of the energy of the denoised signal in band i, as well as a long-term value En,i of that energy.
  • In step 25, the quantity ΔEn,i is compared with a threshold ε1. If the threshold ε1 is not reached, the counter bi is incremented by one in step 26.
  • In step 27, the long-term estimator bai is compared with the smoothed energy value En,i. If bai > En,i, the estimator bai is set equal to the smoothed value En,i in step 28, and the counter bi is reset to zero.
  • The quantity ρi, which is taken equal to the ratio bai/En,i (step 36), is then equal to 1.
  • If step 27 shows that bai ≤ En,i, the counter bi is compared with a limit value bmax in step 29. If bi > bmax, the signal is considered too stationary to support voice activity.
  • Bm represents an update coefficient between 0.90 and 1. Its value differs depending on the state of a voice activity detection automaton (steps 30 to 32). This state δn−1 is the one determined during the processing of the previous frame.
  • When the automaton indicates speech, the coefficient Bm takes a value Bmp very close to 1, so that the noise estimator is only slightly updated in the presence of speech. Otherwise, the coefficient Bm takes a lower value Bms, allowing a more significant update of the noise estimator during silence phases.
  • The difference bai − bii between the long-term estimator and an internal noise estimator bii is compared with a threshold ε2. If the threshold ε2 is not reached, the long-term estimator bai is updated with the value of the internal estimator bii in step 35. Otherwise, the long-term estimator bai remains unchanged. This prevents sudden variations due to a speech signal from triggering an update of the noise estimator.
  • After obtaining the quantities ρi, module 15 makes the voice activity decisions in step 37.
  • Module 15 first updates the state of the detection automaton according to the quantity ρ0 calculated over the whole signal band.
  • The new state δn of the automaton depends on the previous state δn−1 and on ρ0, as shown in Figure 4.
  • Module 15 also calculates the degrees of voice activity γn,i in each band i ≥ 1.
  • This function has, for example, the shape shown in FIG. 5.
  • Module 16 calculates the per-band noise estimates that will be used in the denoising process, using the successive values of the components Sn,i and the degrees of voice activity γn,i. This corresponds to steps 40 to 42 of FIG. 3.
  • In step 40, it is determined whether the voice activity detection automaton has just gone from the rising state to the speech state. If so, the last two estimates B̂n−1,i and B̂n−2,i previously calculated for each band i ≥ 1 are corrected in accordance with the value of the preceding estimate B̂n−3,i.
  • In step 42, module 16 updates the per-band noise estimates according to the formulas: where λB denotes a forgetting factor such that 0 < λB < 1.
  • Formula (6) shows how the non-binary degree of voice activity γn,i is taken into account.
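One plausible shape for such an update, sketched below; this is not the exact formula (6) of the patent, only an illustration of how a forgetting factor and a non-binary voice activity degree can combine: gamma = 1 (speech) freezes the estimate, gamma = 0 (silence) gives a plain exponential average.

```python
def update_noise(b_prev, s_band, gamma, lam=0.95):
    """Recursive per-band noise update with forgetting factor lam (0 < lam < 1).

    Illustrative assumption: the degree of voice activity gamma in [0, 1]
    raises the effective forgetting factor, so the noisier the evidence of
    speech, the less the band magnitude s_band moves the estimate."""
    effective = lam + (1.0 - lam) * gamma   # speech throttles the update
    return effective * b_prev + (1.0 - effective) * s_band

frozen = update_noise(2.0, 10.0, gamma=1.0)   # speech: estimate unchanged
eased = update_noise(2.0, 10.0, gamma=0.0)    # silence: moves toward S_{n,i}
```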
  • The long-term noise estimates B̂n,i are overestimated by a module 45 (FIG. 1) before the nonlinear spectral subtraction is carried out.
  • Module 45 calculates the previously mentioned overestimation coefficient α'n,i, as well as an increased estimate B̂'n,i which essentially corresponds to α'n,i · B̂n,i.
  • The organization of the overestimation module 45 is shown in FIG. 6.
  • The increased estimate B̂'n,i is obtained by combining the long-term estimate B̂n,i with a measure ΔBmax n,i of the variability of the noise component in band i around its long-term estimate.
  • In this embodiment, the combination is essentially a simple sum performed by an adder 46. It could also be a weighted sum.
  • The measure ΔBmax n,i of the noise variability reflects the variance of the noise estimator. It is obtained as a function of the values of Sn,i and of B̂n,i calculated for a certain number of previous frames over which the speech signal exhibits no voice activity in band i. It is a function of the deviations
  • The degree of voice activity γn,i is compared to a threshold (block 51) to decide whether the deviation
  • The measure of variability ΔBmax n,i can, as a variant, be obtained as a function of the values Sn,f (and not Sn,i) and B̂n,i.
  • FIFO 54 does not contain
  • The increased estimator B̂'n,i provides excellent robustness against the musical noise of the denoising process.
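The increased estimate can be sketched as follows. The simple sum matches the adder 46 described above; taking the maximum absolute deviation over a FIFO of recent noise-only frames is an assumption about how the variability measure is computed, not the patent's exact definition.

```python
from collections import deque

def overestimate(b_hat, deviation_fifo):
    """Increased noise estimate B'_{n,i}: the long-term estimate plus a
    variability term, here taken as the max |S_{n,i} - B_{n,i}| stored in a
    FIFO of recent frames without voice activity in the band (simple-sum
    combination; the max rule is an illustrative assumption)."""
    delta_b_max = max(deviation_fifo) if deviation_fifo else 0.0
    return b_hat + delta_b_max

fifo = deque([0.4, 1.1, 0.7], maxlen=8)   # recent deviation history
b_inc = overestimate(5.0, fifo)           # 5.0 plus the largest deviation
```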
  • A first phase of the spectral subtraction is carried out by the module 55 shown in FIG. 1.
  • This phase provides, with the resolution of the bands i (1 ≤ i ≤ I), the frequency response H1 n,i of a first denoising filter, as a function of the components Sn,i and B̂n,i and of the overestimation coefficients α'n,i.
  • The coefficient β1 i represents, like the coefficient βpi of formula (3), a floor conventionally used to avoid negative or excessively low values of the denoised signal.
  • The overestimation coefficient α'n,i could be replaced in formula (7) by another coefficient equal to a function of α'n,i and of an estimate of the signal-to-noise ratio (for example Sn,i/B̂n,i), this function being a decreasing function of the estimated value of the signal-to-noise ratio.
  • This function is then equal to α'n,i for the lowest values of the signal-to-noise ratio. Indeed, when the signal is very noisy, there is a priori no point in reducing the overestimation factor.
  • This function decreases to zero for the highest values of the signal-to-noise ratio. This protects the most energetic areas of the spectrum, where the speech signal is most significant, the quantity subtracted from the signal then tending towards zero.
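The SNR-dependent variant just described can be sketched with a linear ramp; the ramp shape and the `snr_lo` / `snr_hi` breakpoints are illustrative assumptions, since the text only requires the function to equal the full coefficient at low SNR and fall to zero at high SNR.

```python
import numpy as np

def snr_dependent_alpha(alpha_prime, snr, snr_lo=1.0, snr_hi=8.0):
    """Decreasing function of the estimated SNR (e.g. S_{n,i} / B_{n,i}):
    equal to alpha_prime for the noisiest bands, zero for the cleanest ones,
    so the most energetic speech regions have nothing extra subtracted."""
    t = np.clip((snr - snr_lo) / (snr_hi - snr_lo), 0.0, 1.0)
    return alpha_prime * (1.0 - t)

low = snr_dependent_alpha(2.0, snr=0.5)    # very noisy: full overestimation
high = snr_dependent_alpha(2.0, snr=20.0)  # clean: no overestimation
```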
  • This strategy can be refined by applying it selectively to the harmonics of the pitch frequency of the speech signal when it exhibits voice activity.
  • A second denoising phase is carried out by a harmonic protection module 56.
  • Module 57 can apply any known method of analysis of the speech signal of the frame to determine the pitch period Tp, expressed as an integer or fractional number of samples, for example a linear prediction method.
  • The protection provided by module 56 may consist in carrying out, for each frequency f belonging to a band i:
  • H2 n,f = 1
  • The quantity subtracted from the component Sn,f will then be zero.
  • The floor coefficients β2 i express the fact that certain harmonics of the pitch frequency fp can be masked by noise, so that it is not worthwhile to protect them.
  • This protection strategy is preferably applied for each of the frequencies closest to the harmonics of fp, that is to say for every integer rank η of harmonic.
  • The deviation between the η-th harmonic of the real pitch frequency and its estimate η × fp (condition (9)) can grow in proportion to η. For high values of η, this deviation can exceed the spectral half-resolution Δf/2 of the Fourier transform.
  • The corrected frequency response H2 n,f can be equal to 1 as indicated above, which corresponds to the subtraction of a zero quantity in the spectral subtraction, that is to say full protection of the frequency in question. More generally, this corrected frequency response H2 n,f could be taken equal to a value between 1 and H1 n,f depending on the degree of protection desired, which corresponds to the subtraction of a quantity smaller than that which would be subtracted if the frequency in question were not protected.
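The full-protection variant (response forced to 1 at the bins nearest each harmonic) can be sketched as below; the `tol_bins` window around each harmonic bin is an illustrative assumption.

```python
import numpy as np

def protect_harmonics(h1, f_pitch, fs, n_fft, tol_bins=1):
    """Harmonic protection sketch: for each FFT bin closest to a harmonic
    eta * f_pitch (eta = 1, 2, ...), force the denoising filter response to 1
    so that nothing is subtracted there. Full protection (value 1.0) is the
    simplest variant described in the text."""
    h2 = h1.copy()
    bin_hz = fs / n_fft
    for eta in range(1, int((fs / 2) // f_pitch) + 1):
        k = int(round(eta * f_pitch / bin_hz))   # nearest discrete frequency
        lo, hi = max(k - tol_bins, 0), min(k + tol_bins + 1, len(h2))
        h2[lo:hi] = 1.0
    return h2

h1 = np.full(128, 0.3)                       # first-pass filter response
h2 = protect_harmonics(h1, f_pitch=200.0, fs=8000.0, n_fft=256)
```

Note that the harmonic 200 Hz lands at bin 6.4 here, so bin 6 is protected rather than the true harmonic frequency; this rounding error is exactly the imprecision the conditioning of FIG. 9 removes.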
  • This signal S2 n,f is supplied to a module 60 which calculates, for each frame n, a masking curve by applying a psychoacoustic model of auditory perception by the human ear.
  • The masking phenomenon is a well-known principle of the operation of the human ear. When two frequencies are heard simultaneously, it is possible that one of the two is no longer audible; it is then said to be masked.
  • The masking curve is seen as the convolution of the spectral spreading function of the basilar membrane, in the Bark domain, with the exciting signal, constituted in the present application by the signal S2 n,f.
  • The spectral spreading function can be modeled as shown in Figure 7.
  • Rq depends on the more or less voiced character of the signal. χ designates a degree of voicing of the speech signal, varying between zero (no voicing) and 1 (strongly voiced signal).
  • The denoising system also includes a module 62 which corrects the frequency response of the denoising filter, as a function of the masking curve Mn,q calculated by module 60 and of the increased estimates B̂'n,i calculated by module 45.
  • Module 62 decides the level of denoising that must actually be reached.
  • The new response H3 n,f, for a frequency f belonging to the band i defined by module 12 and to the Bark band q, thus depends on the relative difference between the increased estimate B̂'n,i of the corresponding spectral component of the noise and the masking curve Mn,q, as follows:
  • The quantity subtracted from a spectral component Sn,f, in the spectral subtraction process having the frequency response H3 n,f, is substantially equal to the minimum between, on the one hand, the quantity subtracted from this spectral component in the spectral subtraction process having the frequency response H2 n,f and, on the other hand, the fraction of the increased estimate B̂'n,i of the corresponding spectral component of the noise which, if applicable, exceeds the masking curve Mn,q.
  • FIG. 8 illustrates the principle of the correction applied by module 62. It schematically shows an example of a masking curve Mn,q calculated on the basis of the spectral components S2 n,f of the denoised signal, as well as the increased estimate B̂'n,i of the noise spectrum.
  • The quantity finally subtracted from the components Sn,f will be that represented by the hatched areas, that is to say limited to the fraction of the increased estimate B̂'n,i of the spectral components of the noise which exceeds the masking curve.
  • This subtraction is carried out by multiplying the frequency response H3 n,f of the denoising filter by the spectral components Sn,f of the speech signal (multiplier 64).
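The minimum rule above can be sketched directly on the subtracted quantities; the variable names are illustrative, and expressing the second-pass subtraction as (1 − H2) × S is an assumption consistent with multiplicative filtering.

```python
import numpy as np

def limit_to_unmasked(s, h2, b_inc, mask):
    """Final correction (module 62 style): the amount subtracted from S_{n,f}
    is the minimum of (a) what the second-pass filter would subtract,
    (1 - H2) * S, and (b) the fraction of the increased noise estimate that
    exceeds the masking curve; noise below the mask is inaudible, so removing
    it would only add distortion."""
    wanted = (1.0 - h2) * s
    audible_noise = np.maximum(b_inc - mask, 0.0)
    subtracted = np.minimum(wanted, audible_noise)
    return s - subtracted

s = np.array([10.0, 10.0])
out = limit_to_unmasked(s, h2=np.array([0.2, 0.2]),
                        b_inc=np.array([6.0, 6.0]),
                        mask=np.array([1.0, 9.0]))
```

In the second bin the noise estimate sits entirely below the masking curve, so nothing is subtracted at all, matching the hatched-area picture of FIG. 8.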
  • TFRI: inverse fast Fourier transform.
  • FIG. 9 shows a preferred embodiment of a denoising system implementing the invention.
  • This system comprises a certain number of elements similar to corresponding elements of the system of FIG. 1, for which the same reference numbers have been used.
  • Modules 10, 11, 12, 15, 16, 45 and 55 provide in particular the quantities Sn,i, B̂n,i, α'n,i, B̂'n,i and H1 n,f needed to perform the selective denoising.
  • The frequency resolution of the fast Fourier transform 11 is a limitation of the system of FIG. 1.
  • The frequency protected by module 56 is not necessarily the precise pitch frequency fp, but the frequency closest to it in the discrete spectrum; in some cases, frequencies relatively far from the true harmonics of the pitch frequency may then be protected.
  • The system of FIG. 9 overcomes this drawback thanks to an appropriate conditioning of the speech signal.
  • The sampling frequency of the signal is modified so that the period 1/fp covers exactly an integer number of sample periods of the conditioned signal.
  • This size N is usually a power of 2 for the implementation of the TFR; it is 256 in the example considered.
  • This choice is made by a module 70 according to the value of the delay Tp supplied by the harmonic analysis module 57.
  • Module 70 supplies the ratio K between the sampling frequencies to three frequency-change modules 71, 72, 73.
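A sketch of how module 70's choice could work. Picking p as the smallest power of 2 not below Tp is only one plausible selection rule for a table like Table I; the real table is not reproduced in this text. It does guarantee that p divides N = 256 and that K stays in [1, 2), consistent with the (2 − K) × 100% block overlap mentioned below.

```python
def conditioning_ratio(t_p, n_fft=256):
    """Choose the oversampling ratio K so that the conditioned pitch period
    spans an integer number p of samples, with p a divisor of the block size
    N (a power of 2, 256 here) as required for the TFR. The smallest-power-
    of-2 rule is an illustrative assumption."""
    p = 1
    while p < t_p:          # smallest power of 2 >= T_p (divides N = 256)
        p *= 2
    k = p / t_p             # f_e = K * F_e, so 1/f_p spans exactly p samples
    return p, k

p, k = conditioning_ratio(50.0)   # pitch period of 50 samples at F_e
```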
  • Module 71 serves to transform the values Sn,i, B̂n,i, α'n,i, B̂'n,i and H1 n,f, relating to the bands i defined by module 12, into the modified frequency scale (sampling frequency fe). This transformation simply consists in dilating the bands i by the factor K. The values thus transformed are supplied to the harmonic protection module 56.
  • Module 72 performs the oversampling of the frame of N samples provided by the windowing module 10.
  • The conditioned signal frame supplied by module 72 comprises KN samples at the frequency fe. These samples are sent to a module 75 which calculates their Fourier transform.
  • The two blocks therefore have an overlap of (2−K) × 100%.
  • The autocorrelations A(k) are calculated by a module 76, for example according to the formula:
  • A module 77 then calculates the normalized entropy H and supplies it to module 60 for the calculation of the masking curve (see S.A. McClellan et al.: "Spectral Entropy: an Alternative Indicator for Rate Allocation?", Proc. ICASSP'94, pages 201-204):
  • The normalized entropy H constitutes a voicing measure that is very robust to noise and to variations in the pitch frequency.
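The normalized-entropy voicing measure can be sketched as follows; the exact normalization of A(k) follows the cited McClellan et al. idea only in spirit, so the details here are assumptions.

```python
import numpy as np

def normalized_entropy(a):
    """Normalized entropy of the normalized autocorrelation A(k) of the
    spectral components, used as a voicing measure: near 0 for a peaky,
    strongly voiced sequence, near 1 for a flat, noise-like one."""
    p = np.abs(a) / np.sum(np.abs(a))        # probability-like weights
    h = -np.sum(p * np.log(p + 1e-12))       # Shannon entropy
    return h / np.log(len(a))                # normalize to [0, 1]

flat = normalized_entropy(np.ones(64))            # unvoiced-like: ~1
peaky = normalized_entropy(np.eye(64)[0] + 1e-6)  # voiced-like: ~0
```

Because the conditioning makes the spectrum of a voiced frame sharply harmonic (peaky autocorrelation), the entropy drops, which is why the measure gains sensitivity on the conditioned signal.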
  • The correction module 62 operates in the same way as that of the system of FIG. 1, taking into account the overestimated noise B̂'n,i rescaled by the frequency-change module 71. It provides the frequency response H3 n,f of the final denoising filter, which is multiplied by the spectral components Sn,f of the conditioned signal by the multiplier 64. The resulting components S3 n,f are brought back into the time domain by the TFRI module 65. At the output of this TFRI 65, a module 80 combines, for each frame, the two signal blocks resulting from the processing of the two overlapping blocks delivered by the TFR 75. This combination can consist of a Hamming-weighted sum of the samples, to form a denoised conditioned signal frame of KN samples.
  • The management module 82 controls the windowing module 10 so that the overlap between the current frame and the next one corresponds to N−M. This overlap of N−M samples will be used in the overlap-add performed by module 66 during the processing of the next frame.
  • In the foregoing, the pitch frequency is estimated as an average over the frame.
  • The pitch frequency can, however, vary somewhat over this period. These variations can be taken into account in the context of the present invention, by conditioning the signal so as to artificially obtain a constant pitch frequency over the frame.
  • For this purpose, the harmonic analysis module 57 provides the time intervals between the consecutive breaks in the speech signal due to the closures of the speaker's glottis occurring during the frame.
  • Methods that can be used to detect such micro-breaks are well known in the field of harmonic analysis of speech signals.
  • The principle of these methods is to perform a statistical test between two models, one short-term and the other long-term. Both models are adaptive linear prediction models.
  • The value of this statistical test wm is the cumulative sum of the a posteriori likelihood ratio of two distributions, corrected by the Kullback divergence. For a distribution of residuals having Gaussian statistics, this value wm is given by: where e0 m and σ²0 represent the residual calculated at sample m of the frame and the variance of the long-term model, e1 m and σ²1 likewise representing the residual and the variance of the short-term model. The closer the two models are, the closer the value wm of the statistical test is to 0. Conversely, when the two models diverge from each other, this value wm becomes negative, which indicates a break R in the signal.
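The per-sample comparison of the two Gaussian prediction models can be sketched as a log-likelihood ratio; this is a generic Gaussian LLR term, not the patent's exact expression, and the Kullback-divergence correction it mentions is omitted here.

```python
import numpy as np

def divergence_term(residual_long, var_long, residual_short, var_short):
    """Per-sample term comparing a long-term and a short-term Gaussian AR
    model: the log-likelihood ratio of the two residual distributions. Terms
    like this are accumulated over m to build a break-detection statistic;
    the value stays near 0 while the models agree and swings when a glottal
    break makes the short-term model fit differently."""
    ll_long = -0.5 * (np.log(2 * np.pi * var_long)
                      + residual_long ** 2 / var_long)
    ll_short = -0.5 * (np.log(2 * np.pi * var_short)
                       + residual_short ** 2 / var_short)
    return ll_short - ll_long

same = divergence_term(0.1, 1.0, 0.1, 1.0)    # identical models: exactly 0
diff = divergence_term(3.0, 1.0, 0.2, 0.25)   # short model fits much better
```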
  • FIG. 10 thus shows a possible example of the evolution of the value wm, showing the breaks R of the speech signal.
  • FIG. 11 shows the means used to compute the conditioning of the signal in the latter case.
  • The harmonic analysis module 57 is designed so as to implement the above analysis method and to provide the intervals tr relating to the signal frame produced by module 10.
  • These oversampling ratios Kr are supplied to the frequency-change modules 72 and 73, so that the interpolations are carried out with the sampling ratio Kr over the corresponding time interval tr.
  • The largest, Tp, of the time intervals tr supplied by module 57 for a frame is selected by module 70 (block 91 in FIG. 11) to obtain a pair (p, K) as indicated in Table I.
  • This embodiment of the invention also involves an adaptation of the window management module 82.
  • The number M of samples of the denoised signal to be retained for the current frame here corresponds to an integer number of consecutive time intervals tr between two glottal breaks (see FIG. 10). This arrangement avoids the problems of phase discontinuity between frames, while taking into account the possible variations of the time intervals tr over a frame.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Soundproofing, Sound Blocking, And Sound Damping (AREA)

Claims (9)

  1. Method for conditioning a digital speech signal (s) processed in successive frames, characterized in that a harmonic analysis of the speech signal is carried out in order to estimate a pitch frequency (fp) of the speech signal on each frame in which it exhibits voice activity, and in that, after estimating the pitch frequency of the speech signal on a frame, the speech signal of the frame is conditioned by oversampling it at an oversampling frequency (fe) which is an integer multiple of the estimated pitch frequency.
  2. Method according to claim 1, in which spectral components (Sn,f) of the speech signal are calculated by delivering the conditioned signal (s') in blocks of N samples subjected to a transform into the frequency domain, N being a predetermined integer, and in which the ratio (p) between the oversampling frequency (fe) and the estimated pitch frequency is a divisor of the number N.
  3. Method according to claim 2, in which the number N is a power of 2.
  4. Method according to claim 2 or 3, in which a degree of voicing (χ) of the speech signal on the frame is estimated from a calculation of the entropy (H) of the autocorrelation of spectral components (S2 n,f) calculated on the basis of the conditioned signal (s').
  5. Method according to claim 4, in which the degree of voicing (χ) is measured from a normalized entropy H of the formula
     Figure 00350001
     where A(k) is the normalized autocorrelation defined by:
     Figure 00350002
     S2 n,f denoting the spectral component of order f calculated on the basis of the oversampled signal.
  6. Method according to any one of the preceding claims, in which, after the processing of each frame of conditioned signal, a number of samples (M) equal to an integer multiple of the ratio (Tp) between the sampling frequency (Fe) and the estimated pitch frequency (fp) is retained from among the signal samples delivered by this processing.
  7. Method according to any one of claims 1 to 5, in which the estimation of the pitch frequency of the speech signal on a frame comprises the following steps:
     estimating time intervals (tr) between two consecutive breaks (R) of the signal attributable to closures of the speaker's glottis occurring during the frame, the estimated pitch frequency being inversely proportional to these time intervals;
     interpolating the speech signal in these time intervals, so that the conditioned signal (s') resulting from this interpolation exhibits a constant time interval between two consecutive breaks.
  8. Method according to claim 7, in which, after the processing of each frame, a number of samples (M) corresponding to an integer number of estimated time intervals (tr) is retained from among the samples of the speech signal delivered by this processing.
  9. Device for conditioning a digital speech signal (s), comprising processing means arranged to implement a conditioning method according to any one of the preceding claims.
EP98943997A 1997-09-18 1998-09-16 Verfahren und vorrichtung zur verbesserung eines digitalen sprachsignals Expired - Lifetime EP1021805B1 (de)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
FR9711641A FR2768545B1 (fr) 1997-09-18 1997-09-18 Procede de conditionnement d'un signal de parole numerique
FR9711641 1997-09-18
PCT/FR1998/001978 WO1999014744A1 (fr) 1997-09-18 1998-09-16 Procede de conditionnement d'un signal de parole numerique

Publications (2)

Publication Number Publication Date
EP1021805A1 EP1021805A1 (de) 2000-07-26
EP1021805B1 true EP1021805B1 (de) 2001-11-07

Family

ID=9511228

Family Applications (1)

Application Number Title Priority Date Filing Date
EP98943997A Expired - Lifetime EP1021805B1 (de) 1997-09-18 1998-09-16 Verfahren und vorrichtung zur verbesserung eines digitalen sprachsignals

Country Status (7)

Country Link
US (1) US6775650B1 (de)
EP (1) EP1021805B1 (de)
AU (1) AU9168798A (de)
CA (1) CA2304013A1 (de)
DE (1) DE69802431T2 (de)
FR (1) FR2768545B1 (de)
WO (1) WO1999014744A1 (de)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1278185A3 (de) * 2001-07-13 2005-02-09 Alcatel Verfahren zur Verbesserung von Geräuschunterdrückung bei der Sprachübertragung
US7103539B2 (en) * 2001-11-08 2006-09-05 Global Ip Sound Europe Ab Enhanced coded speech
EP1559101A4 (de) * 2002-11-07 2006-01-25 Samsung Electronics Co Ltd Mpeg-audiocodierungsverfahren und vorrichtung
CA2697920C (en) * 2007-08-27 2018-01-02 Telefonaktiebolaget L M Ericsson (Publ) Transient detector and method for supporting encoding of an audio signal
US8473283B2 (en) * 2007-11-02 2013-06-25 Soundhound, Inc. Pitch selection modules in a system for automatic transcription of sung or hummed melodies
US8924200B2 (en) * 2010-10-15 2014-12-30 Motorola Mobility Llc Audio signal bandwidth extension in CELP-based speech coder
WO2013054347A2 (en) * 2011-07-20 2013-04-18 Tata Consultancy Services Limited A method and system for detecting boundary of coarticulated units from isolated speech

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE3785189T2 (de) * 1987-04-22 1993-10-07 Ibm Verfahren und Einrichtung zur Veränderung von Sprachgeschwindigkeit.
US5384891A (en) * 1988-09-28 1995-01-24 Hitachi, Ltd. Vector quantizing apparatus and speech analysis-synthesis system using the apparatus
AU633673B2 (en) 1990-01-18 1993-02-04 Matsushita Electric Industrial Co., Ltd. Signal processing device
DE69124005T2 (de) 1990-05-28 1997-07-31 Matsushita Electric Ind Co Ltd Sprachsignalverarbeitungsvorrichtung
US5400434A (en) * 1990-09-04 1995-03-21 Matsushita Electric Industrial Co., Ltd. Voice source for synthetic speech system
US5226084A (en) * 1990-12-05 1993-07-06 Digital Voice Systems, Inc. Methods for speech quantization and error correction
FR2679689B1 (fr) * 1991-07-26 1994-02-25 Etat Francais Procede de synthese de sons.
US5469087A (en) 1992-06-25 1995-11-21 Noise Cancellation Technologies, Inc. Control system using harmonic filters
US5787398A (en) * 1994-03-18 1998-07-28 British Telecommunications Plc Apparatus for synthesizing speech by varying pitch
JP3528258B2 (ja) * 1994-08-23 2004-05-17 ソニー株式会社 符号化音声信号の復号化方法及び装置
US5641927A (en) * 1995-04-18 1997-06-24 Texas Instruments Incorporated Autokeying for musical accompaniment playing apparatus
US5555190A (en) 1995-07-12 1996-09-10 Micro Motion, Inc. Method and apparatus for adaptive line enhancement in Coriolis mass flow meter measurement
BE1010336A3 (fr) * 1996-06-10 1998-06-02 Faculte Polytechnique De Mons Procede de synthese de son.
JP3266819B2 (ja) * 1996-07-30 2002-03-18 株式会社エイ・ティ・アール人間情報通信研究所 周期信号変換方法、音変換方法および信号分析方法
WO1999010719A1 (en) * 1997-08-29 1999-03-04 The Regents Of The University Of California Method and apparatus for hybrid coding of speech at 4kbps
US6064955A (en) * 1998-04-13 2000-05-16 Motorola Low complexity MBE synthesizer for very low bit rate voice messaging

Also Published As

Publication number Publication date
AU9168798A (en) 1999-04-05
EP1021805A1 (de) 2000-07-26
DE69802431T2 (de) 2002-07-18
US6775650B1 (en) 2004-08-10
FR2768545B1 (fr) 2000-07-13
WO1999014744A1 (fr) 1999-03-25
FR2768545A1 (fr) 1999-03-19
DE69802431D1 (de) 2001-12-13
CA2304013A1 (fr) 1999-03-25

Similar Documents

Publication Publication Date Title
EP1016072B1 (de) Verfahren und vorrichtung zur rauschunterdrückung eines digitalen sprachsignals
EP1789956B1 (de) Verfahren zum verarbeiten eines rauschbehafteten tonsignals und einrichtung zur implementierung des verfahrens
EP1356461B1 (de) Rauschverminderungsverfahren und -einrichtung
EP2002428B1 (de) Verfahren zur trainierten diskrimination und dämpfung von echos eines digitalsignals in einem decoder und entsprechende einrichtung
EP1016071B1 (de) Verfahren und vorrichtung zur sprachdetektion
EP0490740A1 (de) Verfahren und Einrichtung zum Bestimmen der Sprachgrundfrequenz in Vocodern mit sehr niedriger Datenrate
EP1016073B1 (de) Verfahren und vorrichtung zur rauschunterdrückung eines digitalen sprachsignals
EP1021805B1 (de) Verfahren und vorrichtung zur verbesserung eines digitalen sprachsignals
EP3192073B1 (de) Unterscheidung und dämpfung von vorechos in einem digitalen audiosignal
EP1429316A1 (de) Verfahren und Vorrichtung zur multi-referenz Korrektur der durch ein Kommunikationsnetzwerk verursachten spektralen Sprachverzerrungen
EP2515300B1 (de) Verfahren und System für die Geräuschunterdrückung
FR2797343A1 (fr) Procede et dispositif de detection d'activite vocale
EP4287648A1 (de) Elektronische vorrichtung und verarbeitungsverfahren, akustische vorrichtung und computerprogramm dafür
FR3051958A1 (fr) Procede et dispositif pour estimer un signal dereverbere
WO1999027523A1 (fr) Procede de reconstruction, apres debruitage, de signaux sonores
WO2006117453A1 (fr) Procede d’attenuation des pre- et post-echos d’un signal numerique audio et dispositif correspondant
FR2664446A1 (fr) Codeur differentiel a filtre predicteur auto-adaptatif a adaptation rapide de gain et decodeur correspondant.

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20000316

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): DE FR GB

RIC1 Information provided on ipc code assigned before grant

Free format text: 7G 10L 11/04 A, 7G 10L 21/02 B

RTI1 Title (correction)

Free format text: METHOD AND APPARATUS FOR CONDITIONING A DIGITAL SPEECH SIGNAL

GRAG Despatch of communication of intention to grant

Free format text: ORIGINAL CODE: EPIDOS AGRA

17Q First examination report despatched

Effective date: 20001123

GRAG Despatch of communication of intention to grant

Free format text: ORIGINAL CODE: EPIDOS AGRA

GRAH Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOS IGRA

GRAH Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOS IGRA

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): DE FR GB

REF Corresponds to:

Ref document number: 69802431

Country of ref document: DE

Date of ref document: 20011213

REG Reference to a national code

Ref country code: GB

Ref legal event code: IF02

GBT Gb: translation of ep patent filed (gb section 77(6)(a)/1977)

Effective date: 20020130

RAP2 Party data changed (patent owner data changed or rights of a patent transferred)

Owner name: NORTEL NETWORKS FRANCE

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20050817

Year of fee payment: 8

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20050902

Year of fee payment: 8

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20050930

Year of fee payment: 8

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20070403

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20060916

REG Reference to a national code

Ref country code: FR

Ref legal event code: ST

Effective date: 20070531

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20060916

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20061002