EP1208561B1 - Procédé et dispositif de réduction du bruit dans des signaux vocaux (Method and device for noise reduction in speech signals) - Google Patents

Publication number
EP1208561B1
Authority
EP
European Patent Office
Prior art keywords
speech signal
speech
signal
threshold value
parameters
Prior art date
Legal status
Expired - Lifetime
Application number
EP00925105A
Other languages
German (de)
English (en)
Other versions
EP1208561A2 (fr)
Inventor
Kjeld Hermansen
Current Assignee
Noisecom APS
Original Assignee
Noisecom APS
Priority date
Filing date
Publication date
Application filed by Noisecom APS
Publication of EP1208561A2
Application granted
Publication of EP1208561B1

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering

Definitions

  • the present invention relates to noise reduction in speech signals, in particular noise reduction in speech signals employed in telecommunication, most particularly in telecommunication employing cellular phones.
  • Noise when added to a speech signal can impair the quality of the signal, reduce intelligibility, and increase listener fatigue. It is therefore of great importance to reduce noise in a speech signal, e.g. in relation to telecommunication, especially when employing cellular phones, or in relation to hearing aids.
  • Known methods of noise reduction in speech signals include spectral subtraction and other filtering methods.
  • the noise reduction may e.g. be based on an estimate of the noise spectrum. Such methods depend on stationarity in the noise signal to perform optimally. As the noise in a speech signal is often non-stationary, the estimated noise spectrum used for spectral subtraction will be different from the actual noise spectrum during speech activity. This results in short duration random tones in the noise reduced signal, and such random tone noise tends to be very irritating to listen to due to psycho-acoustic effects.
  • WO 99/01942 discloses a method of reducing the noise in a speech signal using spectral subtraction. According to this method, a model based representation describing the quasi-stationary part of the speech signal is generated and manipulated, and the resulting speech signal is generated using the manipulated model and a second signal derived from the speech signal.
  • the object of the present invention is to provide a method of noise reduction in speech signals which reduces the noise even more than known methods. It is a further object to provide a method of noise reduction in speech signals which reduces the noise without affecting the actual speech signal, i.e. a method which eliminates, or at least considerably reduces, unwanted components of a signal while reducing wanted components not at all, or only to a very limited extent.
  • the method according to the present invention employs a new model based spectral subtraction algorithm for noise suppression/noise reduction in speech.
  • This new algorithm benefits from available knowledge of the speech dynamics.
  • the method yields better results - especially for low Signal to Noise Ratios (SNR) - with fewer distortions and artefacts, such as musical tones, than other methods, e.g. the usual spectral subtraction.
  • noise suppression in speech processing such as speech coding has gained an increased importance due to the advent of digital cellular telephones. With the low data rate speech coding algorithms the speech quality tends to degrade drastically in high noise. To prevent such quality loss, noise suppression must be achieved without introducing artefacts, speech distortion or significant loss of speech intelligibility.
  • noisy signal is modelled as a sum of the speech signal and the noise assuming statistical independence.
  • the spectral subtraction provides an estimate of the signal spectrum as the difference between the noisy spectrum and an estimate of the noise/background spectrum, the latter being obtained during periods of silence.
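For illustration only, classical spectral subtraction as described above can be sketched as follows. The spectral floor parameter (used to clamp negative differences) and the use of NumPy are assumptions of this sketch, not details taken from the patent:

```python
import numpy as np

def spectral_subtraction(noisy_frame, noise_psd, floor=0.01):
    # Estimate the clean power spectrum as the difference between the
    # noisy power spectrum and the noise estimate obtained during silence.
    spectrum = np.fft.rfft(noisy_frame)
    noisy_psd = np.abs(spectrum) ** 2
    # Clamp at a small spectral floor: negative differences are the
    # source of the short-duration random "musical tones" mentioned above.
    clean_psd = np.maximum(noisy_psd - noise_psd, floor * noisy_psd)
    # Reuse the phase of the noisy signal, which works well for SNR > 10 dB.
    clean_spec = np.sqrt(clean_psd) * np.exp(1j * np.angle(spectrum))
    return np.fft.irfft(clean_spec, n=len(noisy_frame))
```

The clamping step is exactly where the classical method breaks down at low SNR, which motivates the model based approach of the invention.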
  • Transformation of the estimated speech signal to time domain requires knowledge of the phase of the signal. In most situations one uses the phase of the noisy signal. This works well for high Signal to Noise Ratios (SNR >10 dB). The problem is to use this phase for low SNR, which is a serious drawback of the classical spectral subtraction. A possibility to handle this problem is to use an alternative description of the signal.
  • the speech signal is decomposed into two components: a generator signal (the residual signal) and a filter modelling of the vocal tract. This results in a separation of speech into a transient and a quasi-stationary part.
  • Determination of the noise free/reduced synthesis filter is done by a combination of classical spectral subtraction and model based characterisation of the difference spectrum of noisy speech and background noise.
  • the auto correlation function of the quasi stationary part of the speech is mapped into an LPC model spectrum of order 10, and the so-called f, b and g parameters (f: formant frequency, b: bandwidth, g: gain) are determined from this spectrum. This is a pseudo decomposition of the spectrum into second order sections.
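A hedged sketch of this mapping: the Levinson-Durbin recursion and the standard pole-angle/pole-radius formulas are assumed here as one common way to realise the step; the g parameter is omitted for brevity, and the sample rate is an assumption:

```python
import numpy as np

def lpc_formants(frame, fs=8000, order=10):
    # Autocorrelation of the quasi-stationary frame, lags 0..order.
    n = len(frame)
    r = np.correlate(frame, frame, mode='full')[n - 1:n + order]
    # Levinson-Durbin recursion mapping the autocorrelation into the
    # order-10 LPC polynomial A(z).
    a = np.array([1.0])
    err = r[0]
    for i in range(1, order + 1):
        k = -(r[i] + np.dot(a[1:], r[i - 1:0:-1])) / err
        a = np.concatenate([a, [0.0]])
        a = a + k * a[::-1]
        err *= 1.0 - k * k
    # Pseudo decomposition into second order sections: each stable
    # complex pole pair gives one (f, b) formant candidate.
    pairs = []
    for p in np.roots(a):
        if p.imag > 0 and abs(p) < 1.0:
            f = np.angle(p) * fs / (2 * np.pi)   # formant frequency in Hz
            b = -np.log(abs(p)) * fs / np.pi     # bandwidth in Hz
            pairs.append((f, b))
    return sorted(pairs)
```

A damped sinusoid at 1000 Hz, for example, yields a formant candidate near 1000 Hz with a bandwidth set by its decay rate.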
  • a noise robust pitch detector combined with a synthetic glottal pulse generator produces the new residual signal for voiced sounds.
  • the residual signal is used as is.
  • This residual signal is input to the noise free/reduced synthesis filter with noise free/reduced speech as output.
  • the dynamics of the synthesis filter are now constrained via the f, b and g parameters to the range 1 Hz to 10 Hz, eliminating the main part of the usual musical tones while leaving the signal/speech component almost unchanged.
  • the input to the synthesis filter depends on the SNR.
  • a robust pitch detector determines the period of the synthetic glottal pulses used as input to the synthesis filter.
  • the present invention relates to a method of noise reduction in a speech signal as set forth in claim 1.
  • the frequency is preferably the formant frequency of the speech signal.
  • the speech signal is preferably transmitted via a telecommunications means, most preferably via a cellular phone, but it may alternatively or additionally be transmitted via other means such as a hearing aid or other suitable microphone/speaker arrangements.
  • Such microphone/speaker arrangements may be connected to telephones and/or video conference arrangements, thus allowing the person or persons using such arrangements to move freely within a certain distance from the telephone/video conference arrangement in the room in which the telephone/video conference arrangement is positioned. When using existing similar arrangements this is not possible due to the noise generated in the signals.
  • Other suitable microphone/speaker arrangements may alternatively or additionally be employed.
  • the parameters are preferably smoothed electronically.
  • the dynamic information regarding the f, b and g parameters may alternatively or additionally comprise information regarding the difference in frequency between the present speech signal and a previously measured speech signal. If such information is compared to knowledge concerning the human voice regarding the capability to change frequency within a certain time interval, it may be determined whether the present speech signal is in fact the speech signal that was previously measured, i.e. whether the present speech signal and the previously measured speech signal are in fact one and the same. If the difference in frequency exceeds a certain limit, the limit being determined on the basis of knowledge of the human voice and its capability of changing the frequency within a certain time interval, the two signals cannot be the same. If the difference in frequency does not exceed such a limit, the two signals may be the same.
  • the dynamic information regarding the f, b and g parameters most preferably contains knowledge regarding the development of said parameters in time.
  • the a priori knowledge regarding the dynamics of the human voice may alternatively or additionally comprise knowledge regarding the maximum frequency span of a speech signal as described above.
  • the a priori knowledge regarding human speech production may be compared to measured parameters of the present speech signal as described above.
  • the a priori knowledge may be obtained e.g. from knowledge regarding the anatomy of the mouth and throat region and/or of the vocal cords.
  • the a priori knowledge may alternatively or additionally be based on a number of previous measurements of relevant parameters as described above. Such previous measurements, or alternatively or additionally a representative extract of such measurements, may be stored in look up tables.
  • look up tables are preferably stored electronically in a computer or the like, but may alternatively or additionally be stored in a printed medium such as a book or a sheet of paper.
  • the method comprises a step in which the speech signal is deemed to belong to a process, the process being a signal which may extend over one or more measurement frames.
  • the process is preferably a formant process. It may, e.g., correspond to the pronunciation of a word.
  • the process is an active process at a certain time if it extends over one or more preceding measurement frames.
  • the process is active if there is a detectable signal.
  • a process may also be regarded as active if there is presently no detectable signal, but such a signal has been present for a predefined number of measurement frames preceding the present measurement frame. Thereby a process may be kept artificially alive even though the signal disappears for a short time interval. This is very useful in telecommunication employing cellular phones, since the signal may temporarily disappear during such communication, e.g. due to noise or fall-out caused by an uneven geographical distribution of transmitter masts resulting in uneven network coverage. It is not desirable to deem a process "inactive" in such a case.
  • the smoothing step may comprise the step of determining whether a new formant frequency belongs to an active process. This may be based on a comparison between the a priori knowledge regarding human speech production and the obtained dynamic information.
  • the method may in this case further comprise the step of defining a new process in case the new formant frequency does not belong to an active process, and the new formant frequency is then deemed to belong to said new process.
  • the process may be deemed to be inactive in case no new formant frequency is deemed to belong to said process. Thus, in case the signal is permanently terminated, the process is deemed to be inactive.
  • the method may further comprise the step of artificially maintaining the speech signal for a predetermined number of measurement frames in case the corresponding process is abruptly deemed to be inactive. This makes it possible to keep a process alive in case the signal is temporarily interrupted as described above.
  • the predetermined number of measurement frames may correspond to the maximum duration of the speech signal.
  • a process may be artificially maintained for a time interval corresponding to the time interval it normally takes to produce such a sound.
  • the maximum duration of the speech signal is preferably between 40 ms and 80 ms, such as between 50 ms and 70 ms, such as approximately 60 ms.
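The keep-alive behaviour described above can be sketched as follows. The bookkeeping dictionary and the default of 6 frames (which corresponds to approximately 60 ms only under an assumed 10 ms frame shift) are assumptions of this sketch:

```python
def update_process_activity(proc, detected, keep_alive_frames=6):
    # A process stays active while its signal is detected; when the
    # signal disappears it is kept artificially alive for a
    # predetermined number of frames before being deemed inactive.
    if detected:
        proc['missing'] = 0
        proc['active'] = True
    else:
        proc['missing'] = proc.get('missing', 0) + 1
        if proc['missing'] > keep_alive_frames:
            proc['active'] = False
    return proc['active']
```

A process interrupted for fewer frames than the limit thus survives the gap, which is the desired behaviour for temporary signal fall-out.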
  • the new formant may be deemed to belong to an active process if the difference in frequency between said formant and said process does not exceed a predetermined level as described above.
  • the predetermined level is preferably between 200 Hz and 600 Hz, such as between 300 Hz and 500 Hz, such as approximately 400 Hz.
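The assignment rule can be sketched as follows: a new formant frequency joins the nearest active process if the jump does not exceed the predetermined level (approximately 400 Hz per the text); otherwise a new process is defined. Representing processes as a dict from process id to current frequency is an assumption of this sketch:

```python
def assign_formant(new_f, processes, max_jump=400.0):
    # Find the active process whose frequency is nearest to the new
    # formant, subject to the predetermined maximum frequency jump.
    best = None
    for pid, f in processes.items():
        d = abs(new_f - f)
        if d <= max_jump and (best is None or d < best[1]):
            best = (pid, d)
    if best is not None:
        processes[best[0]] = new_f   # update the existing process
        return best[0]
    new_id = max(processes, default=-1) + 1
    processes[new_id] = new_f        # define a new process
    return new_id
```

With a single process at 1000 Hz, a new formant at 1200 Hz joins it, while one at 2000 Hz starts 'process 2', mirroring Fig. 1.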
  • the smoothing step preferably comprises the step of filtering the f, b and g parameters.
  • the filtering step is most preferably performed using a first order Infinite Impulse Response (IIR) filter, but it may alternatively or additionally be performed using any other suitable kind of filter.
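The first order IIR filter is given in the claims as y[n] = b · x[n] + a · y[n - 1]. A minimal sketch, with b = 1 - a so the DC amplification is 1 (as discussed below); initialising the filter state at the first track value is an assumption:

```python
def smooth_track(track, a=0.8):
    # First order IIR feedback filter from the claims:
    #   y[n] = b * x[n] + a * y[n - 1]
    # b = 1 - a makes the DC amplification 1, so a constant parameter
    # track passes through unchanged while fluctuations are damped.
    b = 1.0 - a
    y = []
    prev = track[0]   # start-up value: an assumption of this sketch
    for x in track:
        prev = b * x + a * prev
        y.append(prev)
    return y
```

Applied independently to the f, b and g tracks, this damps the noise-induced fluctuations visible in Fig. 8 while leaving slow speech dynamics intact.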
  • the method may further comprise the steps of
  • the noise eliminated pitch period may be noise eliminated by using known methods or in the manner described above. Most preferably, it is noise eliminated using known methods as well as in the manner described above.
  • the determining step may comprise the steps of comparing the variance of the speech signal to an upper threshold value and to a lower threshold value. Voiced speech is considered to be present in case the variance of the speech signal exceeds the lower threshold value.
  • in case the variance of the speech signal also exceeds the upper threshold value, the speech signal is considered to contain purely voiced speech, i.e. no unvoiced component is present. In this case the original speech signal is completely replaced by the synthetic glottal pulse.
  • in case the variance lies between the two threshold values, the original speech signal is replaced by a new pulse which is an appropriate combination of the synthetic glottal pulse and the original speech signal.
  • the determining step may comprise the steps of comparing the first formant gain of the speech signal to an upper threshold value and to a lower threshold value. Voiced speech is present in case the first formant gain of the speech signal exceeds the lower threshold value.
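The two-threshold decision can be sketched as returning a replacement weight. Interpreting the "appropriate combination" between the thresholds as a linear blend is an assumption of this sketch; the text does not fix the combination rule:

```python
def voicing_decision(variance, lower, upper):
    # Below the lower threshold: unvoiced, keep the original signal.
    # Above the upper threshold: purely voiced, replace the original
    # signal completely by the synthetic glottal pulse.
    # In between: blend synthetic pulse and original (assumed linear).
    if variance <= lower:
        return 0.0
    if variance >= upper:
        return 1.0
    return (variance - lower) / (upper - lower)
```

The same rule applies unchanged when the first formant gain is used in place of the variance.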
  • the noise eliminated pitch period is preferably found from a residual signal of the speech signal.
  • the replacing step is preferably performed by fading out a residual signal and fading in the synthetic glottal pulse.
  • the synthetic glottal pulse is to be understood as either a completely synthetic signal or an appropriate combination of the created synthetic pulse and the original speech signal as described above.
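The fade-out/fade-in of the replacing step can be sketched as a short crossfade. The ramp length and the linear ramp shape are assumptions of this sketch:

```python
import numpy as np

def crossfade(residual, synthetic, fade_len=32):
    # Fade the residual signal out while the synthetic glottal pulse
    # is faded in, over a short linear ramp at the start of the frame.
    n = min(len(residual), len(synthetic))
    residual = np.asarray(residual, dtype=float)[:n]
    out = np.asarray(synthetic, dtype=float)[:n].copy()
    ramp = np.linspace(1.0, 0.0, fade_len)
    out[:fade_len] = ramp * residual[:fade_len] + (1.0 - ramp) * out[:fade_len]
    return out
```

The crossfade avoids the audible clicks that an abrupt switch between the residual and the synthetic pulse would produce.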
  • At least the smoothing step is performed by a computer system, and the speech signal is most preferably generated in a cellular phone.
  • the present invention further relates to an apparatus for performing noise reduction in a speech signal as set forth in claim 28.
  • the means for obtaining dynamic information regarding f, b and g parameters preferably comprises one or more suitable detectors, such as microphones, and/or one or more computers.
  • the smoothing means preferably comprises one or more computers and/or one or more suitable electronic filters, such as low pass filters and/or high pass filters and/or Infinite Impulse Response (IIR) filters and/or any other suitable kind of filters.
  • the means for obtaining and storing a priori knowledge regarding human speech production preferably comprises one or more computers, most preferably comprising electronic storage means. It may further comprise one or more look up tables, the tables being created by using empirically obtained data (i.e. previous measurements of e.g. relevant parameters such as the dynamic information mentioned above) and/or by using theoretical calculation of relevant parameters. Such calculations may be based on knowledge regarding the anatomy of the mouth and throat region of humans as described above.
  • the smoothing means may comprise means for determining whether a new formant frequency belongs to an active process. Such means may comprise means for comparing obtained relevant parameters of the speech signal (as described above) and theoretical and/or empirical data. It is then determined from a predefined set of criteria whether the new formant frequency belongs to the active process or not. This is further described above.
  • the smoothing means preferably comprises means for filtering the f, b and g parameters. It may comprise one or more electronic filters, such as low pass filters and/or high pass filters and/or IIR filters and/or any other suitable filters.
  • the apparatus may further comprise
  • the creating means may comprise one or more tone generators and/or it may comprise a computer.
  • the replacing means may comprise one or more faders, so that the synthetic pulse may be faded in as the original signal is faded out.
  • the determining means may comprise comparing means for comparing the variance of the speech signal to an upper threshold value and to a lower threshold value. Voiced speech is present in case the variance of the speech signal exceeds the lower threshold value.
  • the comparison is most preferably performed by one or more computers having computer storage means for storing information regarding the threshold values.
  • the threshold values may be obtained by theoretical calculations and/or by previous measurements, i.e. it may be empirically obtained.
  • the determining means may comprise comparing means for comparing the first formant gain of the speech signal to an upper threshold value and to a lower threshold value. Voiced speech is present in case the first formant gain of the speech signal exceeds the lower threshold value. The above is equally applicable here.
  • the apparatus may further comprise means for producing a noise eliminated pitch period from a residual signal of the speech signal.
  • a new model based spectral subtraction method has been presented, the method being applicable for noise elimination for high as well as low SNR avoiding linear and non-linear artefacts like musical tones.
  • the method is attractive because of its flexibility, modularity and numerical robustness. Furthermore it is very suitable for real time implementation.
  • Fig. 1 shows the frequency of a speech signal as a function of time. Two processes ('process 1' and 'process 2') are present.
  • when the signal of a process disappears it may be due to the signal "falling out", e.g. due to noise or a bad connection, or it may be due to the fact that the person stops speaking, i.e. the signal actually disappears. If the process has existed for a certain amount of time it is artificially kept alive for some time after the disappearance of the signal, as indicated by "Δd" and the dotted lines of the figure.
  • the signal of 'process 2' also appears at approximately time t1. However, it is deemed not to belong to 'process 1', since the difference in frequency between the two signals exceeds the predefined limit (Δf). Since no other active processes are present, a new process ('process 2') is defined, and the signal is deemed to belong to this process.
  • Fig. 2 is a flow diagram illustrating step one as defined above. New f, b and g parameters are used as input (FBG new), and the frequency nearest to a process is found. It is not possible to investigate the new formants one by one, since two or more of the new formants may fulfil the criteria for belonging to a certain process. Therefore, the formant having the frequency being nearest to the process is deemed to belong to said process.
  • the speech signal is found to belong to a process which already exists, that process is updated in accordance with the new signal.
  • a new process is created, and the speech signal is deemed to belong to the new process.
  • the difference in frequency between the new formant and the existing process exceeds a certain predetermined level. The level may be set in the control system.
  • Fig. 3 illustrates step 2 as defined above. It is investigated whether an active process has been updated. In case it has been updated nothing further is done during the present measurement frame.
  • Fig. 4 is a flow diagram illustrating step 3 of the above algorithm. It is first investigated whether the sign of a given process (Process number X) is locked. If this is the case the energy in the process is compared to a lower threshold value ("energi_lav"). If the energy in the process is below said lower threshold value the sign is unlocked. Otherwise the sign is maintained in a locked mode. The energy in the process is then compared to an upper threshold value ("energi_hoej"). If the energy in the process is above said upper threshold value the process is allocated a sign in relation to locked processes. Otherwise it is investigated whether the process has a sign allocated. If this is not the case, the process is allocated a sign in relation to locked processes. Otherwise the sign is not updated, i.e. the previously allocated sign is maintained.
  • the formants are allocated mutually alternating signs in order to improve the quality of the sound.
  • Processes having frequencies which are between the frequencies of two different other processes are allocated a sign according to the process being closest in frequency to the formant, i.e. the sign allocated to this process is opposite to the sign of the "closest" process.
  • in step 4 of the algorithm above, the f, b and g parameters are individually filtered.
  • the coefficients a and b are calculated using a time constant in order to ensure that the filtering is independent of the frame shift.
  • a is chosen in such a way that said time constant relates to the time constant of the speech signal
  • b is chosen in such a way that the DC amplification is 1 (corresponding to the integral of the impulse response).
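The coefficient choice can be sketched as follows. The exponential discretisation a = exp(-shift/τ) is an assumed (standard) way of deriving a from a time constant so the filtering is independent of the frame shift; b = 1 - a then makes the DC amplification 1, since the impulse response b · aⁿ sums to one:

```python
import math

def iir_coefficients(tau_ms, frame_shift_ms):
    # a from the time constant and the frame shift, so the effective
    # smoothing time constant does not depend on the frame shift
    # (assumed exponential discretisation).
    a = math.exp(-frame_shift_ms / tau_ms)
    # b such that the DC amplification b / (1 - a) equals 1.
    b = 1.0 - a
    return a, b
```

A shorter frame shift yields an a closer to 1, compensating for the more frequent updates.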
  • Fig. 5 is a flow diagram illustrating the adjustment of the pitch of a speech signal.
  • This function determines the pitch period of a source sequence.
  • the general idea of the function is based upon the assumption that the pitch is the most dominant periodical component in the speech signal. It is furthermore assumed that the pitch frequency, for physiological reasons, is limited to a certain frequency span.
  • the main issue of determination of the pitch period is the calculation of the auto correlation and the determination of the pitch as the index of the maximum value of the auto correlation in a limited time interval.
  • the pitch sequence, which is used as an input to the function, is squared in order to avoid negative values and in order to enhance dynamical differences.
  • the pitch sequence should be a frame of the speech signal or of the residual signal.
  • the squared pitch sequence is then rectified. This step emphasises the periodicity of the pitch by using knowledge regarding the structure of the pitch sequences. This is due to the fact that the pitch is much more powerful than other potential periodical components of the speech signal or the residual signal, and due to the fact that said other components are hidden by the rectification.
  • the auto correlation is calculated for "allowed" pitch periods, i.e. for pitch periods having a duration which is between a lower threshold value and an upper threshold value, where said threshold values may be set initially.
  • the calculated auto correlation is subsequently scaled using a linear weighting function. This is done in order to obtain a robust pitch detection.
  • the index of the maximum value of the weighted auto correlation function is used as an initial guess for the pitch period.
  • it is then investigated whether the pitch period thus determined is large; in case a shorter pitch period is more likely, the pitch period is adjusted accordingly.
  • a shorter pitch period may be more likely if e.g. the half pitch period is also an "allowed" pitch period. In this case it is possible that a sub-harmonic period of the pitch has been detected instead of the actual pitch.
  • otherwise the initial guess is maintained as the pitch period. Finally, the pitch period is used as an output.
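The core of the detector (squaring, autocorrelation over allowed lags, linear weighting, maximum search) can be sketched as follows. The lag limits (40-160 samples, i.e. 50-200 Hz at an assumed 8 kHz) and the direction of the linear weighting are assumptions of this sketch; the sub-harmonic check is omitted:

```python
import numpy as np

def detect_pitch(residual, min_lag=40, max_lag=160):
    # Square the sequence to avoid negative values and to enhance
    # dynamical differences before correlating.
    x = np.asarray(residual, dtype=float) ** 2
    # Auto correlation for the "allowed" pitch periods only.
    lags = np.arange(min_lag, max_lag + 1)
    ac = np.array([np.dot(x[:-lag], x[lag:]) for lag in lags])
    # Linear weighting of the auto correlation for robust detection
    # (favouring shorter lags is an assumption of this sketch).
    weights = np.linspace(1.0, 0.5, len(lags))
    # The lag of the maximum weighted auto correlation is the initial
    # guess for the pitch period.
    return int(lags[np.argmax(ac * weights)])
```

On an impulse train of period 80 samples, the detector returns 80 rather than the sub-multiple 160, thanks to the weighting.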
  • Fig. 6 is a flow diagram illustrating the steps of the method according to the present invention in which it is determined whether voiced speech is present, and in which the noise eliminated pitch period is replaced by a synthetic glottal pulse.
  • Fig. 7a-c illustrate how the speech signal may be replaced by a synthetic glottal pulse in case voiced speech is present in the signal.
  • Fig. 7a shows the intonation of a speech signal as a function of time.
  • the intonation at time t2 is slightly larger than the intonation at time t1.
  • the period where no signal is detected may be a period of silence, or it may be a period where only completely unvoiced speech is present, i.e. a period during which no intonation can be detected, since an intonation can only be detected when the vocal cords are active, i.e. when voiced speech is present.
  • Fig. 7b shows a synthetic internal glottal pulse.
  • glottal pulses are very dependent upon the person speaking.
  • the glottal pulse shown in Fig. 7b is an "average" pulse which is constructed in such a way that it has a wide spectrum and at the same time has the maximum length.
  • Fig. 7c shows a signal with the synthetic glottal pulse of Fig. 7b phased in instead of a noisy signal.
  • the synthetic glottal pulse has a certain length.
  • the synthetic signal is artificially "extended" by a "zero-signal", so as to match the length of the original pitch period, 'ipitch'.
  • the length of the second pulse is slightly larger than the length of the first pulse ('ipitch(t2)'>'ipitch(t1)'). This is due to the fact that the intonation at time t2 is slightly larger than the intonation at time t1 as indicated in Fig. 7a. It is clear that the only difference between the two pulses is the length of the "zero-signal" following the synthetic glottal pulse.
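The zero-extension of Fig. 7c can be sketched as follows; truncating a pulse that is longer than the pitch period is an added assumption not stated in the text:

```python
import numpy as np

def fit_to_pitch_period(glottal_pulse, ipitch):
    # The synthetic glottal pulse has a fixed length; extend it with a
    # "zero-signal" so that it matches the original pitch period
    # 'ipitch'. (Truncation of an over-long pulse is assumed here.)
    pulse = np.asarray(glottal_pulse, dtype=float)
    if len(pulse) >= ipitch:
        return pulse[:ipitch]
    return np.concatenate([pulse, np.zeros(ipitch - len(pulse))])
```

Only the length of the trailing zero-signal differs between the pulses at t1 and t2, exactly as described above.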
  • Figs. 8 and 9 show non-filtered and filtered formant tracks (f, b and g parameters), respectively, for the words 'three', 'four' and 'five'.
  • the parameters have been filtered using a so-called Kalman filter.
  • When comparing Fig. 8 and Fig. 9, it is clear that the noise of the signal is considerably reduced during the filtering process, i.e. the fluctuations are reduced. It is also clear that the dynamics of the speech are at the same time left nearly unchanged. What is achieved is thus a speech signal wherein the noise, which was initially present, is removed or at least considerably reduced in such a way that the original signal, i.e. the actual speech, is left nearly unchanged. It is thus possible to remove unwanted components of a signal (i.e. noise) without removing or changing wanted components of the signal (i.e. actual speech).
  • Figs. 10-12 also show speech signals representing the words 'three', 'four' and 'five'.
  • Fig. 10 shows the original speech signal, including noise components. It is clear that this signal is very noisy, i.e. the signal to noise ratio (SNR) is very small.
  • In Fig. 11, part of the signal has been replaced by synthetic glottal pulses, and the figure thus represents the output of the flow diagram of Fig. 6.
  • the shifts between the regions in which the original signal has been replaced and the regions in which the original signal has been maintained are very abrupt. This is because the difference between the lower threshold value (“taerskelnedre”) and the upper threshold value (“taerskeloevre”) is relatively small. It is therefore very likely that the variance or gain (whichever parameter is chosen) is either below the lower threshold value or above the upper threshold value, rather than being between the two values. That is, the sound is most often considered to be either completely voiced or completely unvoiced, rather than being considered to contain voiced as well as unvoiced components.
  • Fig. 12 shows a signal in which the noise has been reduced. Conventional methods as well as the method according to the invention have been employed. It is very clear that the SNR has improved considerably as compared to Fig. 10.
  • Fig. 13 and Fig. 14 show frequency spectra corresponding to Fig. 10 and Fig. 12, respectively.
  • The SNR has improved considerably during the filtering process.

Claims (35)

  1. A method of reducing the amount of noise in a speech signal containing noise, comprising the steps of:
    obtaining from a speech signal a model based representation describing the quasi-stationary part of the speech;
    obtaining, from said model based representation, dynamic information regarding frequency (f), bandwidth (b) and gain (g) parameters of said speech signal with respect to time;
    defining processes as a function of time by deeming said f, b and g parameters to belong to a process according to a priori knowledge regarding the dynamics of the human voice;
    smoothing the f, b and g parameters with respect to time, the smoothing step being performed on said processes.
  2. A method according to claim 1, wherein the a priori knowledge regarding the dynamics of the human voice comprises knowledge regarding the maximum frequency span of a speech signal.
  3. A method according to claim 1 or 2, wherein the speech signal is deemed to belong to a process, the process being a signal which may extend over one or more measurement frames.
  4. A method according to claim 3, wherein the process is an active process at a certain time if it extends over one or more preceding measurement frames.
  5. A method according to claim 3 or 4, wherein the smoothing step comprises the step of determining whether a new formant frequency belongs to an active process.
  6. A method according to claim 5, further comprising the step of defining a new process in case the new formant frequency does not belong to an active process, the new formant frequency then being deemed to belong to said new process.
  7. A method according to any of claims 4 to 6, wherein a process is deemed to be inactive in case no new formant frequency is deemed to belong to said process.
  8. A method according to claim 7, further comprising the step of artificially maintaining the speech signal for a predetermined number of measurement frames in case the corresponding process is abruptly deemed to be inactive.
  9. A method according to claim 8, wherein the predetermined number of measurement frames corresponds to the maximum duration of the speech signal.
  10. A method according to claim 9, wherein the maximum duration of the speech signal is between 40 ms and 80 ms.
  11. A method according to claim 10, wherein the maximum duration of the speech signal is between 50 ms and 70 ms.
  12. A method according to claim 11, wherein the maximum duration of the speech signal is approximately 60 ms.
  13. A method according to any of claims 5 to 12, wherein the new formant is deemed to belong to an active process if the difference in frequency between said formant and said process does not exceed a predetermined level.
  14. A method according to claim 13, wherein the predetermined level is between 200 Hz and 600 Hz.
  15. A method according to claim 14, wherein the predetermined level is between 300 Hz and 500 Hz.
  16. A method according to claim 15, wherein the predetermined level is approximately 400 Hz.
  17. A method according to any of the preceding claims, wherein the smoothing step comprises the step of filtering the f, b and g parameters.
  18. A method according to claim 17, wherein the filtering step is performed using a first order Infinite Impulse Response (IIR) filter.
  19. A method according to claim 18, wherein the first order IIR filter is a feedback filter of the form y[n] = b · x[n] + a · y[n - 1], where x denotes the speech signal, y denotes the filter output, and a and b are parameters to be determined.
  20. A method according to claim 19, wherein the parameters a and b are determined using model knowledge of the speech process.
  21. A method according to any of the preceding claims, further comprising the steps of:
    determining whether voiced speech is present;
    using a noise eliminated pitch period to create a synthetic glottal pulse in case voiced speech is present; and
    replacing at least part of the original speech signal by said synthetic glottal pulse in case voiced speech is present.
  22. A method according to claim 21, wherein the determining step comprises the steps of comparing the variance of the speech signal to an upper threshold value and to a lower threshold value, and wherein voiced speech is present in case the variance of the speech signal exceeds the lower threshold value.
  22. Procédé selon la revendication 21, dans lequel l'étape de détermination comprend les étapes de comparaison de la variance du signal vocal à une valeur de seuil supérieure et à une valeur de seuil inférieure, et dans lequel la voix exprimée est présente dans le cas où la variance du signal vocal dépasse la valeur de seuil inférieure.
  23. Procédé selon la revendication 21 ou 22, dans lequel l'étape de détermination comprend les étapes de comparaison du premier gain de formant du signal vocal à une valeur de seuil supérieure et à une valeur de seuil inférieure, et dans lequel la voix exprimée est présente dans le cas où le premier gain de formant du signal vocal dépasse la valeur de seuil inférieure.
  24. Procédé selon l'une quelconque des revendications 21 à 23, dans lequel la période de pas du bruit éliminé est trouvée à partir d'un signal résiduel du signal vocal.
  25. Procédé selon l'une quelconque des revendications 21 à 24, dans lequel l'étape de remplacement est réalisée par l'affaiblissement d'un signal résiduel et l'affaiblissement dans l'impulsion glottale synthétique.
  26. Précédé selon l'une quelconque des revendications précédentes, dans lequel au moins l'étape de lissage est réalisée par un système d'ordinateur.
  27. Procédé selon l'une quelconque des revendications précédentes, dans lequel le signal vocal est généré dans un téléphone cellulaire.
  28. An apparatus for performing noise reduction in a speech signal, the apparatus comprising:
    means for obtaining, from a speech signal, representations based on a model describing the quasi-stationary part of the voice;
    means for obtaining dynamic information concerning frequency (f), bandwidth (b) and gain (g) parameters of said speech signal with respect to time;
    means for defining processes as a function of time by assuming that said parameters f, b and g belong to a process according to a priori knowledge concerning the dynamics of the human voice;
    smoothing means for smoothing the processes with respect to time.
  29. An apparatus according to claim 28, wherein the a priori knowledge comprises the maximum frequency span of a speech signal.
  30. An apparatus according to claim 28 or 29, wherein the speech signal is assumed to belong to a process, the process being a signal which may extend over one or more measuring frames, wherein a process is an active process at a given time if it extends over one or more preceding measuring frames, and wherein the smoothing means comprises means for determining whether a new formant frequency belongs to an active process.
  31. An apparatus according to any of claims 28 to 30, wherein the smoothing means comprises means for filtering the parameters f, b and g.
  32. An apparatus according to any of claims 28 to 30, further comprising:
    determining means for determining whether voiced speech is present;
    creating means for creating a synthetic glottal pulse using a noise-removed pitch period; and
    replacing means for replacing at least part of the original speech signal with said synthetic glottal pulse in case voiced speech is present.
  33. An apparatus according to claim 32, wherein the determining means comprises comparing means for comparing the variance of the speech signal with an upper threshold value and a lower threshold value, voiced speech being present in case the variance of the speech signal exceeds the lower threshold value.
  34. An apparatus according to claim 32 or 33, wherein the determining means comprises comparing means for comparing the first formant gain of the speech signal with an upper threshold value and a lower threshold value, voiced speech being present in case the first formant gain of the speech signal exceeds the lower threshold value.
  35. An apparatus according to any of claims 32 to 34, further comprising means for producing a noise-removed pitch period from a residual signal of the speech signal.
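
The formant-track smoothing described in claims 13 through 19 can be sketched in code: a per-frame formant frequency is assigned to an active process (track) if it lies within a frequency threshold of that track (approximately 400 Hz per claim 16), a new track is opened otherwise (claim 6), and each track is updated with the first-order IIR feedback filter y[n] = b·x[n] + a·y[n−1] of claim 19. This is an illustrative sketch only; the function names, data structures, and the filter coefficients a and b are assumptions (the patent leaves a and b to be determined from model knowledge), not the patented implementation.

```python
# Illustrative sketch of claims 13-19: formant tracking with a frequency
# threshold and first-order IIR smoothing. Names and coefficients are
# hypothetical; the patent does not fix concrete values for a and b.

FREQ_THRESHOLD_HZ = 400.0   # claim 16: approximately 400 Hz
A, B = 0.7, 0.3             # example filter coefficients (a + b = 1)

def belongs_to_track(new_freq, track_freq, threshold=FREQ_THRESHOLD_HZ):
    """Claim 13: a new formant joins an active process if the frequency
    difference does not exceed a predetermined level."""
    return abs(new_freq - track_freq) <= threshold

def smooth(prev_y, x, a=A, b=B):
    """Claim 19: first-order IIR feedback filter y[n] = b*x[n] + a*y[n-1]."""
    return b * x + a * prev_y

def track_formants(frames):
    """Assign one formant frequency per measuring frame to a track,
    smoothing existing tracks and opening new ones as needed (claim 6)."""
    tracks = []    # each track holds its current smoothed frequency
    history = []   # snapshot of all track frequencies after each frame
    for f in frames:
        for track in tracks:
            if belongs_to_track(f, track["freq"]):
                track["freq"] = smooth(track["freq"], f)
                break
        else:
            tracks.append({"freq": f})   # no active track matched
        history.append([t["freq"] for t in tracks])
    return tracks, history
```

For example, the frame sequence 500, 520, 480, 1500, 1510 Hz yields two tracks: the first three measurements fall within 400 Hz of each other and are smoothed into one track, while the jump to 1500 Hz opens a second.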
EP00925105A 1999-05-19 2000-05-16 Method and device for reducing noise in speech signals Expired - Lifetime EP1208561B1 (fr)

Applications Claiming Priority (5)

Application Number Priority Date Filing Date Title
DKPA199900691 1999-05-19
DK99691 1999-05-19
DKPA200000201 2000-02-08
DK200000201 2000-02-08
PCT/DK2000/000263 WO2000072305A2 (fr) 1999-05-19 2000-05-16 Method and device for reducing noise in speech signals

Publications (2)

Publication Number Publication Date
EP1208561A2 EP1208561A2 (fr) 2002-05-29
EP1208561B1 true EP1208561B1 (fr) 2005-01-26

Family

ID=26064462

Family Applications (1)

Application Number Title Priority Date Filing Date
EP00925105A Expired - Lifetime EP1208561B1 (fr) 1999-05-19 2000-05-16 Method and device for reducing noise in speech signals

Country Status (5)

Country Link
EP (1) EP1208561B1 (fr)
AT (1) ATE288121T1 (fr)
AU (1) AU4394300A (fr)
DE (1) DE60017758D1 (fr)
WO (1) WO2000072305A2 (fr)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2003287927A1 (en) * 2002-12-31 2004-07-22 Microsound A/S A method and apparatus for enhancing the perceptual quality of synthesized speech signals
US9666204B2 (en) 2014-04-30 2017-05-30 Qualcomm Incorporated Voice profile management and speech signal generation
US10332520B2 (en) 2017-02-13 2019-06-25 Qualcomm Incorporated Enhanced speech generation
CN112969130A (zh) * 2020-12-31 2021-06-15 Vivo Mobile Communication Co., Ltd. Audio signal processing method, apparatus and electronic device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ES2104842T3 (es) * 1991-10-18 1997-10-16 At & T Corp Method and apparatus for flattening frequency-cycle waveforms.
US5479560A (en) * 1992-10-30 1995-12-26 Technology Research Association Of Medical And Welfare Apparatus Formant detecting device and speech processing apparatus
JPH10509256A (ja) * 1994-11-25 1998-09-08 Fink, Fleming K. Method for transforming a speech signal using a pitch manipulator

Also Published As

Publication number Publication date
WO2000072305A2 (fr) 2000-11-30
ATE288121T1 (de) 2005-02-15
EP1208561A2 (fr) 2002-05-29
DE60017758D1 (de) 2005-03-03
WO2000072305A3 (fr) 2008-01-10
AU4394300A (en) 2000-12-12

Similar Documents

Publication Publication Date Title
EP1250703B1 (fr) Noise reduction apparatus and method
JP4764995B2 (ja) Quality enhancement of acoustic signals containing noise
Tchorz et al. SNR estimation based on amplitude modulation analysis with applications to noise suppression
CN109065067A (zh) 一种基于神经网络模型的会议终端语音降噪方法
US8521530B1 (en) System and method for enhancing a monaural audio signal
US8010355B2 (en) Low complexity noise reduction method
US6182033B1 (en) Modular approach to speech enhancement with an application to speech coding
Lin et al. Adaptive noise estimation algorithm for speech enhancement
US20080031467A1 (en) Echo reduction system
CA2404027A1 (fr) Techniques de calcul de signaux de puissance d'elimination du bruit de systemes de communication
Löllmann et al. Low delay noise reduction and dereverberation for hearing aids
US20080004868A1 (en) Sub-band periodic signal enhancement system
US6510408B1 (en) Method of noise reduction in speech signals and an apparatus for performing the method
EP1913591B1 (fr) Enhancing speech intelligibility in a mobile communication device by controlling the operation of a vibrator as a function of background noise
Soon et al. Wavelet for speech denoising
EP1208561B1 (fr) Method and device for reducing noise in speech signals
US7392180B1 (en) System and method of coding sound signals using sound enhancement
US7228271B2 (en) Telephone apparatus
US6975984B2 (en) Electrolaryngeal speech enhancement for telephony
Laaksonen et al. Artificial bandwidth expansion method to improve intelligibility and quality of AMR-coded narrowband speech
RU2589298C1 (ru) Method for increasing the intelligibility and informativeness of audio signals in a noisy environment
JP2001249676A (ja) Method for extracting the fundamental period or fundamental frequency of a periodic waveform with added noise
King Enhancing single-channel speech in wind noise using coherent modulation comb filtering
EP0929065A2 (fr) A modular approach to speech enhancement with an application to speech coding
CN115527550A (zh) A single-microphone subband-domain noise reduction method and system

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20011219

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR LI

AX Request for extension of the european patent

Free format text: AL;LT;LV;MK;RO;SI

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

RBV Designated contracting states (corrected)

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT;WARNING: LAPSES OF ITALIAN PATENTS WITH EFFECTIVE DATE BEFORE 2007 MAY HAVE OCCURRED AT ANY TIME BEFORE 2007. THE CORRECT EFFECTIVE DATE MAY BE DIFFERENT FROM THE ONE RECORDED.

Effective date: 20050126

Ref country code: BE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20050126

Ref country code: NL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20050126

Ref country code: CH

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20050126

Ref country code: FI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20050126

Ref country code: FR

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20050126

Ref country code: LI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20050126

Ref country code: AT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20050126

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

Ref country code: GB

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

REF Corresponds to:

Ref document number: 60017758

Country of ref document: DE

Date of ref document: 20050303

Kind code of ref document: P

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20050426

Ref country code: SE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20050426

Ref country code: DK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20050426

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20050427

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: ES

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20050507

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20050516

Ref country code: CY

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20050516

Ref country code: LU

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20050516

Ref country code: IE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20050516

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MC

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20050531

NLV1 Nl: lapsed or annulled due to failure to fulfill the requirements of art. 29p and 29m of the patents act
REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed

Effective date: 20051027

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20050516

REG Reference to a national code

Ref country code: IE

Ref legal event code: MM4A

EN Fr: translation not filed
PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: PT

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20050626

R17D Deferred search report published (corrected)

Effective date: 20020529