EP1429316B1 - Method and device for multi-reference correction of the spectral speech distortions caused by a communication network - Google Patents

Method and device for multi-reference correction of the spectral speech distortions caused by a communication network

Info

Publication number
EP1429316B1
EP1429316B1 EP03027552A EP03027552A EP1429316B1 EP 1429316 B1 EP1429316 B1 EP 1429316B1 EP 03027552 A EP03027552 A EP 03027552A EP 03027552 A EP03027552 A EP 03027552A EP 1429316 B1 EP1429316 B1 EP 1429316B1
Authority
EP
European Patent Office
Prior art keywords
speaker
voice
class
spectrum
filter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Lifetime
Application number
EP03027552A
Other languages
English (en)
French (fr)
Other versions
EP1429316A1 (de)
Inventor
Gael Mahe
André Gilloire
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Orange SA
Original Assignee
France Telecom SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by France Telecom SA
Publication of EP1429316A1
Application granted
Publication of EP1429316B1
Anticipated expiration
Expired - Lifetime

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316 - Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude
    • G10L21/0364 - Speech enhancement, e.g. noise reduction or echo cancellation by changing the amplitude for improving intelligibility
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band

Definitions

  • The invention relates to a method for the multi-reference correction of the spectral deformations of the voice introduced by a communication network. It also relates to a system for implementing the method.
  • The present invention aims to improve the quality of speech transmitted over communication networks by providing means of correcting the spectral deformations of the speech signal caused by the various links in the network's transmission chain.
  • Figure 1 shows a diagram of a PSTN link.
  • The speech uttered by a speaker is transmitted by a transmitting terminal 10, carried by the subscriber line 20, undergoes an analog-to-digital conversion (A-law) 30, is transmitted by the digital network 40, undergoes a digital-to-analog conversion (A-law) 50, is carried by the subscriber line 60, and passes through the receiving terminal 70 before finally being received by the recipient.
  • Each speaker is connected by an analog line (twisted pair) to the nearest telephone exchange. This is an analog baseband transmission, referenced 1 and 3 in Figure 1.
  • The link between exchanges uses a fully digital network 40.
  • The spectrum of the voice is affected by two types of distortion during the analog baseband transmission of the signal.
  • The first type of distortion is the band-pass filtering of the terminals and of the access points to the digital part of the network. Typical characteristics of this filtering are described by the ITU-T under the name "intermediate reference system" (IRS, abbreviated SRI in French) [ITU-T, Recommendation P.48, 1988]. These frequency characteristics, derived from measurements made in the 1970s, are however tending to become obsolete. This is why the ITU-T has recommended since 1996 the use of a "modified" IRS [ITU-T, Recommendation P.830, 1996], whose nominal characteristic is shown in Figure 2 for the sending part and in Figure 3 for the receiving part.
  • The tolerance is ±2.5 dB; below 200 Hz, the roll-off of the overall system characteristic must be at least 15 dB per octave.
  • The second distortion affecting the spectrum of the voice is the attenuation of the subscriber lines.
  • In a simple model of the local analog line [given in CNET Technical Note NT/LAA/ELR/289 by Cadoret, 1983], the line is considered to introduce an attenuation of the signal whose value in dB depends on the line length and is proportional to the square root of the frequency.
  • The attenuation is 3 dB at 800 Hz for a medium line (about 2 km) and 9.5 dB at 800 Hz for the longest lines (up to 10 km).
  • To these is added the anti-aliasing filtering of the PCM (MIC) encoder (ref. 30). It is typically a 200-3400 Hz band-pass filter with an almost flat response in the passband and a high attenuation outside the band, according to the template of Figure 5 for example [National Semiconductor, August 1994: technical documentation "TP3054, TP3057"].
  • The voice undergoes the spectral distortion shown in Figure 6 for various combinations of three types of analog line in transmission and reception (i.e. 6 distortions), assuming equipment that complies with the nominal characteristic of the modified IRS.
  • The voice thus appears muffled if one of the analog lines is long, and in all cases suffers from a lack of "presence" due to the attenuation of the low-frequency components.
  • In the ISDN and the GSM network, the signal is digitized at the terminal.
  • The only analog parts are the sending and receiving transducers, together with their respective amplification and conditioning chains.
  • The ITU-T has defined sending sensitivity masks, shown in Figure 7, and receiving masks, shown in Figure 8, valid both for wired digital telephones [ITU-T, Recommendation P.310, May 2000] and for mobile or cordless digital terminals [ITU-T, Recommendation P.313, September 1999].
  • The effect of this filtering on the timbre is mainly an attenuation of the low-frequency components, although less marked than in the case of the PSTN.
  • The invention relates to the correction of these spectral distortions by centralized processing, that is to say by a device installed in the digital part of the network, as shown in Figure 10 for the PSTN.
  • The aim of correcting the timbre of the voice is for the timbre of the received voice to be as close as possible to that of the voice emitted by the speaker, which will be called the original voice.
  • Compensation for the spectral distortions introduced into the speech signal by the various elements of the telephone link is currently carried out by equalization-based devices. The equalization may be fixed or may adapt to the transmission conditions.
  • The device described in US Patent 5915235 aims to correct the non-ideal frequency response of a mobile telephone transducer.
  • The equalizer is described as being placed between the analog-to-digital converter and the CELP encoder, but it may equally be located in the terminal or in the network.
  • The principle of the equalization is to bring the spectrum of the received signal closer to an ideal spectrum. Two methods are proposed.
  • The signal is filtered by a fixed filter which imprints on it the ideal long-term spectral characteristics, i.e. those it would have at the output of a transducer having the ideal frequency response.
  • These two filters are supplemented by a multiplicative gain equal to the ratio between the long-term energies of the input of the whitening filter and of the output of the second filter.
  • The second method, illustrated by Figure 5 of the aforementioned De Jaco patent, consists in dividing the signal into sub-bands and applying to each sub-band a multiplicative gain so as to reach a target energy, this gain being defined as the ratio between the target energy of the sub-band and the long-term energy of the signal in this sub-band (obtained by smoothing the instantaneous energy).
  • The device of US Patent 5905969 is intended to compensate for the filtering of the sending system and of the subscriber line in order to improve centralized speech recognition and/or the quality of the transmitted speech.
  • The spectrum of the signal is divided into 24 sub-bands, and the energy of each sub-band is multiplied by an adaptive gain.
  • The gain is adapted according to the stochastic gradient algorithm, by minimizing the quadratic error, the error being defined as the difference between the sub-band energy and a reference energy defined for each sub-band.
  • The reference energy is modulated at each frame by the energy of the current frame, so as to respect the natural short-term level variations of the speech signal.
  • The convergence of the algorithm yields the 24 equalized sub-band signals at the output.
  • The equalized speech signal is obtained by inverse Fourier transform of the equalized sub-band energies.
  • Mokbel's patent does not mention results in terms of improved speech quality, and acknowledges that the method is suboptimal in that it performs a circular convolution. Moreover, it is doubtful whether a speech signal can be correctly reconstructed by inverse Fourier transform of band energies distributed according to the mel scale. Finally, the device described does not correct the filtering of the receiving system or of the receiving analog line.
  • In the "Mokbel" method, the line effect is compensated by cepstral subtraction in order to improve the robustness of speech recognition. It is shown that the cepstrum of the transmission channel can be estimated by the mean cepstrum of the received signal, the latter having first been whitened by a pre-emphasis filter. This method clearly improves the performance of recognition systems but is regarded as an off-line method, since 2 to 4 s are needed to estimate the mean cepstrum.
  • A fixed filter compensates for the distortions of an average telephone link, defined as consisting of two medium subscriber lines and of sending and receiving systems complying with the nominal frequency responses defined in [ITU-T, Recommendation P.48, App. I, 1988]. Its frequency response over the band [Fc-3150 Hz] is the inverse of the overall response of the analog part of this average link, Fc being the lower limit frequency of the equalization.
  • This pre-equalization is complemented by an adapted equalizer, which matches the correction more precisely to the actual transmission conditions.
  • The long-term spectrum is defined as the time average of the short-term spectra of the successive signal frames; γ_ref(f), called the reference spectrum, is the average speech spectrum defined by the ITU [ITU-T/P.50/App. I, 1998], taken as an approximation of the speaker's original long-term spectrum. Because of this approximation, the frequency response of the adapted equalizer is very irregular and only its general shape is relevant. This is why it must be smoothed.
  • Since the adapted equalizer is implemented as a time-domain FIR filter, this smoothing in the frequency domain is obtained by a narrow (symmetrical) windowing of the impulse response.
  • The aim of the invention is to remedy the disadvantages of the state of the art. Its object is a method and a system for improving the timbre correction by reducing the error in the approximation of the speakers' original long-term spectrum.
  • The reference spectrum on the equalization frequency band [F1-F2] associated with each class is calculated by Fourier transform of the center of the class, defined by its partial cepstrum.
  • The method further comprises a step of pre-equalizing the digital signal by a fixed filter having a frequency response, in the frequency band [F1-F2], corresponding to the inverse of a reference spectral deformation introduced by the telephone link.
  • The modulus |EQ| restricted to the band F1-F2 is then calculated by discrete Fourier transform of C_p^eq.
  • The first processing block comprises a first subassembly for calculating the coefficients of the partial cepstrum of a speaker in communication and a second subassembly for classifying this speaker, this second subassembly comprising a block for calculating the pitch F0, a block for estimating the mean pitch from the calculated pitch F0, and a classification block applying a discriminant function to the vector x, whose components are the mean pitch and the coefficients of the partial cepstrum, in order to classify said speaker.
  • The system further comprises a pre-equalizer, the signal equalized using reference spectra differentiated according to the speaker's class being the output signal x of the pre-equalizer.
  • A series of processing steps is applied to the speech signal of each speaker (as soon as the system detects voice activity), on the one hand to classify the speakers, i.e. to assign them to a class according to predetermined criteria, and on the other hand to correct the voice using the reference of the speaker's class.
  • Since the reference spectrum is an approximation of the speakers' original long-term spectrum, defining the classes of speakers and their respective reference spectra requires a corpus of speakers recorded under non-degraded conditions.
  • The long-term spectrum of a speaker measured on this recording must be regarded as his original spectrum, i.e. that of his voice at the sending end of a telephone link.
  • The proposed processing provides, for each class, a reference spectrum as close as possible to the long-term spectrum of each member of the class. However, only the part of the spectrum lying within the equalization band F1-F2 is taken into account in the adapted equalization process.
  • The classes are therefore formed according to the long-term spectrum restricted to this band.
  • The comparison between two spectra is performed at a low spectral resolution, so as to reflect only the spectral envelope. This is why the comparison is preferably made in the space of the first cepstral coefficients of order greater than 0 (the coefficient of order 0 representing the energy), the number of coefficients being chosen according to the desired spectral resolution.
  • The "long-term partial cepstrum", denoted Cp, is thus defined in the processing as the cepstral representation of the long-term spectrum restricted to a frequency band. If k1 and k2 denote the frequency indices corresponding respectively to the frequencies F1 and F2, and γ the long-term spectrum of the speech, the partial cepstrum is defined by the relation: where ° denotes the concatenation operation.
  • The inverse discrete Fourier transform (DFT) is calculated, for example, by IFFT after interpolating the samples of the truncated spectrum so as to reach a number of samples that is a power of 2.
  • The interpolation is done simply by inserting one (linearly interpolated) frequency line every three lines in the spectrum restricted to 187-3187 Hz.
  • The classes are formed, for example, in an unsupervised manner, by ascending hierarchical classification.
  • This consists in creating, from N separate individuals, a hierarchy of partitions by the following process: at each step, the two closest elements are aggregated, an element being either a non-aggregated individual or an aggregate of individuals formed during a previous step. The proximity between two elements is determined by a measure of dissimilarity called the distance. The process continues until the whole population has been aggregated.
  • The hierarchy of partitions thus created can be represented as a tree like that of Figure 12, containing N-1 nested partitions. Each cut of the tree provides a partition, which is all the finer as the cut is made lower.
  • The distance chosen between two elements is the variation in intra-class inertia resulting from their aggregation.
  • A partition is all the better as the classes created are homogeneous, that is to say as the intra-class inertia is low.
  • The intra-class inertia is defined by:
  • The intra-class inertia, zero at the initial stage of the algorithm, inevitably increases with each aggregation.
  • The partition thus obtained is improved by a procedure of aggregation around mobile centers, which reduces the intra-class variance.
  • The reference spectrum, on the F1-F2 band, associated with each class is calculated by Fourier transform of the center of the class.
  • The processing described above was applied to a corpus of 63 speakers.
  • The classification tree of the corpus is shown in Figure 12.
  • The height of a horizontal segment aggregating two elements is chosen proportional to their distance, which makes it possible to visualize the proximity of the elements grouped in the same class.
  • This representation makes it easy to choose the level at which to cut the tree, and thus the classes retained. The cut must be made above the low-level aggregations, which bring together close individuals, and below the high-level aggregations, which associate distinct groups of individuals.
  • The processing involves the use of parameters and criteria for assigning a speaker to one of the classes.
  • The previously defined classes are homogeneous from the point of view of sex.
  • The mean pitch is both sufficiently discriminating for a male/female classification and insensitive to the spectral distortions caused by the telephone link; it is therefore used as a classification parameter, together with the partial cepstrum.
  • Each frequency response is a path from left to right in the lattice.
  • The amplitude of their variations over this band does not exceed 20 dB, like the extreme characteristics of the sending systems and lines.
  • Let $(a_k)_{1 \le k \le K-1}$ be the family of linear discriminant functions defined from the training corpus.
  • A speaker represented by the vector $x = [F_0; C_p(1); \ldots; C_p(L)]$ is assigned to the class q for which the conditional probability of q given a(x), denoted $P(q \mid a(x))$, is maximal. By Bayes' rule, $P(q \mid a(x))$ is proportional to $P(a(x) \mid q)\,P(q)$.
  • $S_q$ is the covariance matrix of a within the class q, with generic element $\sigma^q_{jk}$, which can be estimated by:
  • The individual x is assigned to the class q which maximizes $f_q(x)\,P(q)$, which amounts to minimizing over q the function $s_q(x)$, called the discriminant score:
  • The proposed correction method is carried out by the correction system (equalizer) installed in the digital network 40, as illustrated in FIG.
  • Figure 16 illustrates the correction system suitable for implementing the method.
  • Figure 17 illustrates this system according to an alternative embodiment, detailed below. The variants concern the way of calculating the modulus of the frequency response of the adapted equalizer restricted to the F1-F2 band.
  • The pre-equalizer 200 is a fixed filter whose frequency response, on the F1-F2 band, is the inverse of the overall response of the analog part of an average link as defined previously [ITU-T/P.830, 1996].
  • The steepness of the frequency response of this filter implies a long impulse response; this is why, in order to limit the delay introduced by the processing, the pre-equalizer is typically implemented as an IIR filter, of order 20 for example.
  • Figure 15 shows typical frequency responses of the pre-equalizer for three values of F1.
  • The dispersion of the group delays is less than 2 ms, so that the resulting phase distortion is not perceptible.
  • Block 400A calculates the modulus of the frequency response of the equalizer filter restricted to the equalization band: EQ_dB[F1-F2].
  • The second block 400B calculates the impulse response of the equalizer filter in order to obtain the filter coefficients eq(n), differentiated according to the class of the speaker.
  • A voice-activity frame detector 401 triggers the various processing operations.
  • Processing block 410 classifies the speaker.
  • Processing block 420 calculates the long-term spectrum and then the partial cepstrum of this speaker.
  • The outputs of these two blocks are applied to the operator 428a or 428b.
  • The output of this operator provides the modulus in dB of the frequency response of the adapted equalizer restricted to the equalization band F1-F2, via block 429 for 428a and via block 440 for 428b.
  • Processing blocks 430 to 435 calculate the coefficients eq(n) of the filter.
  • The output x(n) of the pre-equalizer is analyzed in successive frames with a typical duration of 32 ms and an inter-frame overlap of typically 50%. An analysis window, represented by blocks 402 and 403, is applied for this purpose.
  • The adapted equalization operation is carried out by an FIR filter 300 whose coefficients are calculated at each voice-activity frame by the processing chain shown in Figures 16 and 17.
  • At each voice-activity frame there is a new vector x, whose components are the mean pitch and the coefficients 1 to L of the partial cepstrum, to which the discriminant function defined from the training corpus is applied. This processing is carried out by block 413.
  • The speaker is then assigned to the class q with the minimum discriminant score.
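  • The modulus in dB of the frequency response of the adapted equalizer restricted to the band F1-F2, EQ_dB[F1-F2], is calculated in one of two ways: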
  • the first method ( Figure 16) is to calculate
  • The second method (Figure 17) is to transcribe equation (0.3) into the partial-cepstrum domain, since the partial cepstrum of the output x of the pre-equalizer, needed to classify the speaker, is already available.
  • The partial cepstra are calculated as indicated previously, by selecting the frequency band F1-F2. This calculation is carried out only for the coefficients 1 to 20, the following coefficients being useless because representative of a spectral fin
  • The 20 coefficients of the partial cepstrum of the adapted equalizer are obtained by the operators 414b and 428b according to relation (0.13).
  • The processing block 441 pads these coefficients with zeros, symmetrizes them and calculates, from the vector thus formed, the modulus in dB of the frequency response of the adapted equalizer restricted to the band F1-F2 by implementing the following relation: $EQ_{dB}[F_1\text{-}F_2] = \mathrm{DFT}^{-1}(C_p^{eq})$.
  • out of the F1-F2 band are calculated by linear extrapolation of the value in dB of
  • the coefficients a1 and a2 are chosen so as to minimize the quadratic error of the approximation over the interval F1-F2 defined by
  • The frequency characteristic thus obtained must be smoothed. Since the filtering is performed in the time domain, the smoothing is achieved by multiplying the corresponding impulse response by a narrow window.
  • The impulse response is obtained by an IFFT operation applied to
  • The resulting impulse response is multiplied, by the operator 435, by a time window 434.
  • The window used is typically a Hamming window of length 31 centered on the peak of the impulse response, and is applied to the impulse response by means of the operator 435.
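
The last steps of the description (inverse FFT of the equalizer response, then multiplication by a Hamming window of length 31 centered on the peak of the impulse response) can be illustrated by the following minimal sketch. It is not the patented implementation: the function name `smoothed_fir_taps`, the zero-phase assumption, the input layout (dB values on the `rfft` bins of a full-band grid) and the use of NumPy are all assumptions made for the illustration.

```python
import numpy as np

def smoothed_fir_taps(eq_db, n_taps=31):
    """Turn a dB magnitude response into windowed FIR coefficients.

    eq_db  : magnitude response in dB at the rfft bins 0..N/2 (assumed layout).
    n_taps : odd window length; 31 is the typical value quoted in the text.
    """
    # dB -> linear modulus; a zero-phase response is assumed here.
    magnitude = 10.0 ** (np.asarray(eq_db, dtype=float) / 20.0)
    # Inverse FFT of a real, zero-phase spectrum: impulse response
    # symmetric about n = 0 (in the circular sense).
    impulse = np.fft.irfft(magnitude)
    # Rotate the response so that its peak sits in the middle of the buffer.
    impulse = np.fft.fftshift(impulse)
    centre = len(impulse) // 2
    half = n_taps // 2
    # Narrow Hamming window centred on the peak: this truncation is what
    # smooths the frequency response.
    return impulse[centre - half:centre + half + 1] * np.hamming(n_taps)

# Example: a gentle 12 dB tilt across the band, sampled on 129 bins.
taps = smoothed_fir_taps(np.linspace(-6.0, 6.0, 129))
```

The resulting taps are symmetric, so the filter has linear phase; a shorter window gives a smoother frequency response at the cost of a coarser match to the computed characteristic.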

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Cable Transmission Systems, Equalization Of Radio And Reduction Of Echo (AREA)
  • Telephone Function (AREA)
  • Telephonic Communication Services (AREA)

Claims (12)

  1. Method for correcting spectral distortions of the voice introduced by a communication network, the method comprising an equalization operation, adapted to the actual distortion of the transmission chain, in a frequency band [F1-F2], this operation being carried out by means of a digital filter having a frequency response that depends on the ratio between a reference spectrum and a spectrum corresponding to the long-term spectrum of the speaker's voice signal, characterized in that it comprises:
    prior to the operation of equalizing the voice signal of a speaker in communication:
    forming classes of speakers with one voice reference per class,
    then, for a given speaker in communication:
    classifying this speaker, i.e. assigning this speaker to a class on the basis of predefined classification criteria, so that a voice reference closest to his own corresponds to him,
    equalizing the digitized signal of the speaker's voice using, as the reference spectrum, the voice reference of the class to which the speaker has been assigned.
  2. Method for correcting spectral distortions of the voice according to claim 1, characterized in that:
    forming classes of speakers comprises:
    selecting a corpus of N speakers recorded under non-degraded conditions and determining their long-term frequency spectrum,
    classifying the speakers of the corpus according to their partial cepstrum, i.e. the cepstrum calculated from the long-term spectrum restricted to the equalization band [F1-F2], a predefined classification criterion being applied to these cepstra in order to obtain K classes,
    calculating the reference spectrum associated with each class, so as to obtain a voice reference corresponding to each of the classes.
  3. Method for correcting spectral distortions of the voice according to claim 2, characterized in that the reference spectrum in the equalization frequency band [F1-F2] associated with each class is calculated by Fourier transform of the center of the class, defined by its partial cepstrum.
  4. Method for correcting spectral distortions of the voice according to claim 1, characterized in that:
    classifying a speaker comprises:
    using the mean pitch of the voice signal and the partial cepstrum of this signal as classification parameters,
    applying a discriminant function to these parameters in order to classify the speaker.
  5. Method for correcting spectral distortions of the voice according to any one of the preceding claims, characterized in that it further comprises a step of pre-equalizing the digital signal by a fixed filter having a frequency response, in the frequency band [F1-F2], corresponding to the inverse of the reference spectral distortion introduced by the telephone link.
  6. Method for correcting spectral distortions of the voice according to any one of the preceding claims, characterized in that equalizing the digitized signal of a speaker's voice comprises:
    detecting voice activity on the line in order to trigger a processing chain comprising calculating the long-term spectrum, classifying the speaker, calculating the modulus of the frequency response of the equalization filter restricted to the equalization band [F1-F2] and, from this modulus, calculating the coefficients of the digital filter, differentiated according to the class of the speaker,
    controlling the filter with the coefficients obtained, and
    filtering the signal leaving the pre-equalizer through the filter.
  7. Method for correcting spectral distortions of the voice according to claim 6, characterized in that the modulus of the frequency response of the equalization filter restricted to the equalization band [F1-F2] is calculated by implementing the following relation: $|EQ(f)| = \frac{1}{|S\_RX(f)\,L\_RX(f)|}\sqrt{\frac{\gamma_{ref}(f)}{\gamma_x(f)}}$, where $\gamma_{ref}(f)$ is the reference spectrum of the class to which the speaker belongs,
       and where L_RX is the frequency response of the receiving line, S_RX is the frequency response of the receiving system, and $\gamma_x(f)$ is the long-term spectrum of the input signal x of the filter.
  8. Method for correcting spectral distortions of the voice according to claim 6, characterized in that the modulus [EQ] of the frequency response of the equalization filter restricted to the equalization band [F1-F2] is calculated by implementing the following relation: $C_p^{eq} = C_p^{ref} - C_p^{x} - C_p^{s\_rx} - C_p^{l\_rx}$, where $C_p^{eq}$, $C_p^{x}$, $C_p^{s\_rx}$ and $C_p^{l\_rx}$ are the partial cepstra of the adapted equalizer, of the input signal x of the equalization filter, of the receiving system and of the receiving line respectively, $C_p^{ref}$ being the reference partial cepstrum, the center of the speaker's class; the modulus [EQ] restricted to the band F1-F2 being calculated by discrete Fourier transform of $C_p^{eq}$.
  9. System for correcting spectral distortions of the voice introduced by a communication network, the system comprising means for adapted equalization in a frequency band [F1-F2] which include a digital filter (300) whose frequency response depends on the ratio between a reference spectrum and a spectrum corresponding to the long-term spectrum of a voice signal, characterized in that these means further comprise:
    signal processing means (400) for calculating the coefficients of the digital filter, provided with:
    a first signal processing block (400A) for calculating the modulus of the frequency response of the equalization filter restricted to the equalization band [F1-F2] according to the following relation: $|EQ(f)| = \frac{1}{|S\_RX(f)\,L\_RX(f)|}\sqrt{\frac{\gamma_{ref}(f)}{\gamma_x(f)}}$, where $\gamma_{ref}(f)$ is the reference spectrum, which may differ from one speaker to the next and corresponds to a predetermined reference of the class to which the speaker belongs, and where L_RX is the frequency response of the receiving line, S_RX is the frequency response of the receiving system, and $\gamma_x(f)$ is the long-term spectrum of the input signal x of the filter;
    a second processing block (400B) for calculating the impulse response from the modulus of the frequency response thus calculated, in order to determine the coefficients of the filter, differentiated according to the class of the speaker.
  10. System for correcting spectral distortions of the voice according to claim 9, characterized in that the first processing block (400A) comprises means (414b, 428b) for calculating the partial cepstrum of the equalization filter according to the following relation: $C_p^{eq} = C_p^{ref} - C_p^{x} - C_p^{s\_rx} - C_p^{l\_rx}$, where $C_p^{eq}$, $C_p^{x}$, $C_p^{s\_rx}$ and $C_p^{l\_rx}$ are the partial cepstra of the adapted equalizer, of the input signal x of the equalization filter, of the receiving system and of the receiving line respectively, $C_p^{ref}$ being the reference partial cepstrum, the center of the speaker's class; the modulus [EQ] restricted to the band F1-F2 being calculated by discrete Fourier transform of $C_p^{eq}$.
  11. System for correcting spectral distortions of the voice according to claim 9 or 10, characterized in that the first processing block comprises a sub-unit (420) for computing the coefficients of the partial cepstrum of a speaker who is in communication, and a second sub-unit (410) for classifying that speaker, this second sub-unit comprising a block (411) for computing the pitch F0, a block (412) for estimating the mean pitch from the computed pitch F0, and a classification block (413) that applies a discriminant function to the vector x, whose components are the mean pitch and the coefficients of the partial cepstrum, in order to classify the speaker.
  12. System for correcting spectral distortions of the voice according to any one of claims 9 to 11, characterized in that it comprises a pre-equalizer (200) and in that the signal equalized on the basis of the reference spectra differentiated according to the class of the speaker is the output signal x of the pre-equalizer.
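The processing chain of claims 9 and 11 can be illustrated with a minimal Python sketch. It is not taken from the patent. The function names, the class labels, the linear form of the discriminant function, the unity gain outside [F1-F2], and the frequency-sampling design with a Hamming window used to obtain the impulse response are all assumptions chosen for illustration. The modulus relation of claim 9 is read here as 1/(|S_RX| |L_RX|) times the square root of the ratio of the reference spectrum to the long-term spectrum. Spectra are lists of values on DFT bins 0..N/2, and the band is a pair of bin indices standing in for F1 and F2.

```python
import math


def classify_speaker(features, discriminants):
    """Return the class with the highest discriminant score.

    features: [mean pitch, partial-cepstrum coefficients...].
    discriminants: {class_name: (weights, bias)}; a linear discriminant
    is an assumption, and all numeric values are hypothetical.
    """
    def score(params):
        weights, bias = params
        return sum(w * v for w, v in zip(weights, features)) + bias
    return max(discriminants, key=lambda name: score(discriminants[name]))


def equalizer_modulus(gamma_ref, gamma_x, s_rx_mag, l_rx_mag, band):
    """|EQ| = sqrt(gamma_ref / gamma_x) / (|S_RX| * |L_RX|) inside the band.

    Outside the band the gain is set to 1.0 (an assumption; the claim only
    specifies the modulus within [F1-F2]).
    """
    lo, hi = band  # inclusive bin indices standing in for F1 and F2
    eq = []
    for k in range(len(gamma_x)):
        if lo <= k <= hi:
            gain = math.sqrt(gamma_ref[k] / gamma_x[k])
            eq.append(gain / (s_rx_mag[k] * l_rx_mag[k]))
        else:
            eq.append(1.0)
    return eq


def impulse_response(eq_half, n_taps):
    """Linear-phase FIR taps from a modulus given on bins 0..N/2.

    Assumed design: zero-phase inverse DFT, centring, truncation to
    n_taps, and a Hamming window.
    """
    n = 2 * (len(eq_half) - 1)
    if n_taps % 2 == 0 or n_taps > n:
        raise ValueError("n_taps must be odd and no larger than N")
    # Mirror the half spectrum so the full spectrum is real and even.
    full = list(eq_half) + list(eq_half[-2:0:-1])
    # A real, even spectrum has a real, even inverse DFT (cosine terms only).
    h = [sum(full[k] * math.cos(2 * math.pi * k * t / n) for k in range(n)) / n
         for t in range(n)]
    half = n_taps // 2
    taps = []
    for i in range(n_taps):
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (n_taps - 1))
        taps.append(h[(i - half) % n] * window)
    return taps


# Example with made-up numbers: 9 bins (N = 16), equalization band = bins 2..6.
refs = {"class_a": [4.0] * 9, "class_b": [1.0] * 9}  # hypothetical class references
rules = {"class_a": ([1.0, 0.0], -150.0), "class_b": ([-1.0, 0.0], 150.0)}
speaker_class = classify_speaker([200.0, 0.3], rules)
eq = equalizer_modulus(refs[speaker_class], [1.0] * 9, [1.0] * 9, [1.0] * 9, (2, 6))
taps = impulse_response(eq, 9)
```

Selecting the reference spectrum by class before computing the modulus is what makes the resulting coefficients differ from one class of speaker to another. The windowed, centred impulse response is only one common way to turn a modulus into filter coefficients.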
EP03027552A 2002-12-11 2003-12-01 Method and device for multi-reference correction of spectral voice distortions caused by a communication network Expired - Lifetime EP1429316B1 (de)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR0215618 2002-12-11
FR0215618A FR2848715B1 (fr) 2002-12-11 2002-12-11 Method and system for multi-reference correction of spectral voice distortions introduced by a communication network

Publications (2)

Publication Number Publication Date
EP1429316A1 EP1429316A1 (de) 2004-06-16
EP1429316B1 EP1429316B1 (de) 2005-01-12

Family

ID=32320172

Family Applications (1)

Application Number Title Priority Date Filing Date
EP03027552A Expired - Lifetime EP1429316B1 (de) 2002-12-11 2003-12-01 Method and device for multi-reference correction of spectral voice distortions caused by a communication network

Country Status (5)

Country Link
US (1) US7359857B2 (de)
EP (1) EP1429316B1 (de)
DE (1) DE60300267T2 (de)
ES (1) ES2236661T3 (de)
FR (1) FR2848715B1 (de)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7574010B2 (en) * 2004-05-28 2009-08-11 Research In Motion Limited System and method for adjusting an audio signal
FR2882171A1 (fr) * 2005-02-14 2006-08-18 France Telecom Method and device for generating a classification tree that unifies supervised and unsupervised approaches, and corresponding computer program product and storage medium
BRPI0612579A2 (pt) * 2005-06-17 2012-01-03 Matsushita Electric Ind Co Ltd Post-filter, decoder, and post-filtering method
JP4765461B2 (ja) * 2005-07-27 2011-09-07 日本電気株式会社 Noise suppression system, method, and program
US20070073770A1 (en) * 2005-09-29 2007-03-29 Morris Robert P Methods, systems, and computer program products for resource-to-resource metadata association
US7797337B2 (en) * 2005-09-29 2010-09-14 Scenera Technologies, Llc Methods, systems, and computer program products for automatically associating data with a resource as metadata based on a characteristic of the resource
US20070073751A1 (en) * 2005-09-29 2007-03-29 Morris Robert P User interfaces and related methods, systems, and computer program products for automatically associating data with a resource as metadata
US7490036B2 (en) * 2005-10-20 2009-02-10 Motorola, Inc. Adaptive equalizer for a coded speech signal
US20070198542A1 (en) * 2006-02-09 2007-08-23 Morris Robert P Methods, systems, and computer program products for associating a persistent information element with a resource-executable pair
US20090287489A1 (en) * 2008-05-15 2009-11-19 Palm, Inc. Speech processing for plurality of users
GB2476043B (en) * 2009-12-08 2016-10-26 Skype Decoding speech signals
CN106297813A (zh) * 2015-05-28 2017-01-04 杜比实验室特许公司 Separated audio analysis and processing
CN106128466B (zh) * 2016-07-15 2019-07-05 腾讯科技(深圳)有限公司 Identity vector processing method and apparatus

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4310721A (en) * 1980-01-23 1982-01-12 The United States Of America As Represented By The Secretary Of The Army Half duplex integral vocoder modem system
JP2791036B2 (ja) * 1988-04-23 1998-08-27 キヤノン株式会社 Speech processing apparatus
CA2083304C (en) * 1991-12-31 1999-01-26 Stephen R. Huszar Equalization and decoding for digital communication channel
US5727124A (en) * 1994-06-21 1998-03-10 Lucent Technologies, Inc. Method of and apparatus for signal recognition that compensates for mismatching
FR2722631B1 (fr) * 1994-07-13 1996-09-20 France Telecom Etablissement P Method and system for adaptive filtering by blind equalization of a digital telephone signal, and their applications
US5915235A (en) * 1995-04-28 1999-06-22 Dejaco; Andrew P. Adaptive equalizer preprocessor for mobile telephone speech coder to modify nonideal frequency response of acoustic transducer
US5839103A (en) * 1995-06-07 1998-11-17 Rutgers, The State University Of New Jersey Speaker verification system using decision fusion logic
US5806029A (en) 1995-09-15 1998-09-08 At&T Corp Signal conditioned minimum error rate training for continuous speech recognition
US5895447A (en) * 1996-02-02 1999-04-20 International Business Machines Corporation Speech recognition using thresholded speaker class model selection or model adaptation
FR2766604B1 (fr) * 1997-07-22 1999-10-01 France Telecom Method and device for blind equalization of the effects of a transmission channel on a digital speech signal
US6216107B1 (en) * 1998-10-16 2001-04-10 Ericsson Inc. High-performance half-rate encoding apparatus and method for a TDM system
US6266633B1 (en) * 1998-12-22 2001-07-24 Itt Manufacturing Enterprises Noise suppression and channel equalization preprocessor for speech and speaker recognizers: method and apparatus
FR2822999B1 (fr) * 2001-03-28 2003-07-04 France Telecom Method and device for centralized correction of speech timbre on a telephone communications network

Also Published As

Publication number Publication date
DE60300267T2 (de) 2006-03-23
EP1429316A1 (de) 2004-06-16
FR2848715A1 (fr) 2004-06-18
US20040172241A1 (en) 2004-09-02
DE60300267D1 (de) 2005-02-17
US7359857B2 (en) 2008-04-15
ES2236661T3 (es) 2005-07-16
FR2848715B1 (fr) 2005-02-18

Similar Documents

Publication Publication Date Title
EP1016072B1 (de) Method and device for noise suppression in a digital speech signal
EP1429316B1 (de) Method and device for multi-reference correction of spectral voice distortions caused by a communication network
EP2002428B1 (de) Method for trained discrimination and attenuation of echoes of a digital signal in a decoder, and corresponding device
RU2507608C2 (ru) Devices and methods for processing an audio signal to improve speech intelligibility using a feature extraction function
CN1985304B (zh) System and method for enhanced artificial bandwidth expansion
EP2122607B1 (de) Method for active reduction of interfering noise
CA2266654C (fr) Method and device for blind equalization of the effects of a transmission channel on a digital speech signal
EP1899961A1 (de) Method and system for assessing speech quality
EP0752181B1 (de) Echo cancellation with an adaptive filter in the frequency domain
FR2596936A1 (fr) System for transmitting a voice signal
EP0608174A1 (de) System for predictive coding/decoding of a digital speech signal by means of an adaptive transform with embedded codes
US8694311B2 (en) Method for processing noisy speech signal, apparatus for same and computer-readable recording medium
US8744846B2 (en) Procedure for processing noisy speech signals, and apparatus and computer program therefor
EP0998166A1 (de) Arrangement for processing audio signals, receiver, and method for filtering and reproducing a useful signal in the presence of ambient noise
EP2347411B1 (de) Pre-echo attenuation in a digital audio signal
EP0692883B1 (de) Method for blind equalization and its application to speech recognition
US20110029305A1 (en) Method for processing noisy speech signal, apparatus for same and computer-readable recording medium
EP1039736B1 (de) Method and device for adaptive identification, and corresponding adaptive echo canceller
FR2894707A1 (fr) Method for measuring the perceived quality of an audio signal degraded by the presence of noise
FR2739481A1 (fr) Apparatus and method for noise elimination
EP3192073A1 (de) Discrimination and attenuation of pre-echoes in a digital audio signal
EP1016073B1 (de) Method and device for noise suppression in a digital speech signal
EP1021805B1 (de) Method and device for enhancing a digital speech signal
EP2515300B1 (de) Method and system for noise suppression
EP0989544A1 (de) Device and method for filtering a speech signal, receiver, and telephone system

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LI LU MC NL PT RO SE SI SK TR

AX Request for extension of the european patent

Extension state: AL LT LV MK

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

17P Request for examination filed

Effective date: 20040429

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): DE ES GB IT

AX Request for extension of the european patent

Extension state: AL LT LV MK

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

Free format text: NOT ENGLISH

REF Corresponds to:

Ref document number: 60300267

Country of ref document: DE

Date of ref document: 20050217

Kind code of ref document: P

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

Free format text: FRENCH

AKX Designation fees paid

Designated state(s): DE ES GB IT

GBT Gb: translation of ep patent filed (gb section 77(6)(a)/1977)

Effective date: 20050419

REG Reference to a national code

Ref country code: ES

Ref legal event code: FG2A

Ref document number: 2236661

Country of ref document: ES

Kind code of ref document: T3

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

26N No opposition filed

Effective date: 20051013

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: GB

Payment date: 20151125

Year of fee payment: 13

Ref country code: IT

Payment date: 20151120

Year of fee payment: 13

Ref country code: DE

Payment date: 20151119

Year of fee payment: 13

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: ES

Payment date: 20151202

Year of fee payment: 13

REG Reference to a national code

Ref country code: DE

Ref legal event code: R119

Ref document number: 60300267

Country of ref document: DE

GBPC Gb: european patent ceased through non-payment of renewal fee

Effective date: 20161201

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IT

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20161201

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20170701

Ref country code: GB

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20161201

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: ES

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20161202

REG Reference to a national code

Ref country code: ES

Ref legal event code: FD2A

Effective date: 20181119