MXPA99002669A - Method and device for blind equalizing of transmission channel effects on a digital speech signal - Google Patents

Method and device for blind equalizing of transmission channel effects on a digital speech signal

Info

Publication number
MXPA99002669A
Authority
MX
Mexico
Application number
MXPA/A/1999/002669A
Other languages
Spanish (es)
Inventor
Mauuary Laurent
Monne Jean
Original Assignee
France Telecom
Application filed by France Telecom

Abstract

The invention concerns a method and a device for blind equalization of the effects of a transmission channel on a digital speech signal. The speech signal ({Sn(t)}) is transformed (1000) into cepstral vectors ({Cn(i)}). Each cepstral vector is subjected to adaptive filtering (1002), based on a reference cepstrum ({Rn(i)}) (1001) representing the long-term cepstrum of the speech signal, to generate equalized cepstral vectors ({Ĉn(i)}) representing an equalized speech signal. The invention is applicable to automatic speech recognition.

Description

METHOD AND DEVICE FOR BLIND EQUALIZATION OF THE EFFECTS OF A TRANSMISSION CHANNEL ON A DIGITAL SPEECH SIGNAL
The invention concerns a method and a device for blind equalization of the effects of a transmission channel on a digital speech signal.
Reliable transmission of a speech signal is currently an important objective, in particular for improving the performance of automatic speech recognition (ASR) systems that operate over the switched telephone network or the GSM mobile radiotelephone network. The main obstacle to obtaining a satisfactory and reasonably constant recognition rate is the variability of the acoustic signal that carries the speech.
The sources of variability are numerous, and two types are usually distinguished: variability intrinsic to the speaker and variability extrinsic to the speaker. The acoustic realizations of the same word differ according to, for example, the state of the speaker and the context of the word in the sentence. The difference is greater still when acoustic realizations from several speakers are compared. The acoustic realizations of words are acoustic waves, which must be captured before they can be subjected to a recognition process. During the acquisition of an acoustic wave (the speech wave), various disturbances are added to it, which increases the variability of the captured signal. Ambient noise is itself an acoustic wave captured by the microphone and is therefore mixed additively with the speech wave. The electrical signal delivered by the microphone is the sum of the signals corresponding to the speech and to the ambient noise.
Furthermore, particularly for applications that use the telephone network, the acquisition module formed by the handset microphone and the telephone lines linking the user to the speech recognition system acts as a linear convolution filter that varies slowly over time.
For a conventional handset that is not operating in hands-free mode, the effects of ambient noise are usually negligible, and it is essentially the convolutive effects of the transmission lines that must be taken into consideration. Each signal observed at the input of the automatic speech recognition system therefore contains a convolutive component that is almost constant for a given call but varies from one call to another.
This convolutive component is detrimental to the effectiveness of speech recognition. To characterize its effect properly, it is essential to project this component into the representation space in which recognition is performed, namely the cepstral space in most recognition systems.
As an illustrative example relating to the cepstral representation, it may be recalled, with reference to Figure 1, that upstream of an automatic speech recognition system a parameterization module transforms the digitized speech signal into a sequence of parameter vectors, calculated over fixed-length frames of 10 to 40 ms that overlap. In principle, the overlap is approximately 50%. These parameter vectors are chosen to represent the most relevant information possible in the signal frame. As shown in the figure illustrating the general principle of cepstrum calculation, a frequency transform (FFT) calculates the spectrum of the signal frame. The logarithm of the spectral energy is then calculated. From this logarithm, the cepstrum {Cn(i)} is obtained by inverse FFT. Generally, only the first ten cepstral coefficients are taken into consideration. These retained coefficients are assumed to model the impulse response of the vocal tract and thus carry the information relevant to the recognition process. In addition, these coefficients are insensitive to the energy of the input signal, an important feature in automatic speech recognition.
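The cepstrum calculation recalled above (spectrum, logarithm of the spectral energy, inverse transform) can be sketched as follows. This is a minimal illustration, not the parameterization module of any particular recognizer: the naive DFT stands in for an FFT, and the function names, the test frame and the logarithm floor are assumptions. The sketch also checks the stated insensitivity to input energy: a gain only shifts coefficient 0.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform, standing in for an FFT."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def real_cepstrum(frame, n_coeffs=10):
    """Cepstrum of one frame: spectrum -> log spectral energy -> inverse transform."""
    n = len(frame)
    # Small floor (an assumption of this sketch) keeps log() defined for empty bins.
    log_energy = [math.log(max(abs(v) ** 2, 1e-300)) for v in dft(frame)]
    return [
        sum(log_energy[k] * cmath.exp(2j * cmath.pi * k * i / n) for k in range(n)).real / n
        for i in range(n_coeffs + 1)
    ]


# Hypothetical test frame: two sinusoids plus a small deterministic dither.
frame = [
    math.sin(0.3 * t) + 0.5 * math.sin(1.1 * t + 0.4) + 0.1 * ((t * 7) % 13 - 6)
    for t in range(64)
]
quiet = real_cepstrum(frame)
loud = real_cepstrum([4.0 * s for s in frame])

# A gain of 4 adds log(16) to coefficient 0 and leaves coefficients 1..10 unchanged.
energy_shift = loud[0] - quiet[0]
shape_change = max(abs(a - b) for a, b in zip(loud[1:], quiet[1:]))
```

Coefficient 0 is returned alongside coefficients 1 to 10 only to make the energy shift visible; a recognizer would typically keep the latter.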
Other representations of the same type have been used, in particular for the specific purpose of speech recognition. This is the case of the PHIL90 automatic speech recognition system developed in France by the National Center for Telecommunications Studies of FRANCE TELECOM, which uses as parameter vectors the Mel Frequency based Cepstral Coefficients (MFCC). These coefficients smooth the spectrum {Sn(f)} in order to estimate the spectral envelope, and make use of psychoacoustic knowledge. The spectral smoothing is performed by a filter bank. Because the human auditory system analyzes low frequencies with a higher resolution than other frequencies, and because in a speech signal the low frequencies carry more information than the high frequencies, the critical bands of the filter bank are distributed on a non-linear perceptual scale called the MEL or BARK scale. The principle of MFCC calculation, with reference to Figure 1b, is as follows: after the frequency transformation (FFT) of a signal frame, MEL filtering calculates a vector formed of the energies in each of the frequency bands. An inverse frequency transformation (inverse FFT) then delivers the MEL-frequency cepstral coefficients.
In the space corresponding to these types of representation, a convolutive filter representing the transmission channel is transformed into an almost constant additive bias contained in the cepstral vectors. For a more detailed study of these representations, reference may be made to the following articles:
- H. HERMANSKY, N. MORGAN, A. BAYYA, P. KOHN, "Compensation for the Effect of the Communication Channel in Auditory-like Analysis of Speech (RASTA-PLP)", Eurospeech, pp. 1367-1370, Genoa, 1991;
- C. MOKBEL, D. JOUVET, J. MONNE, "Deconvolution of Telephone Line Effects for Speech Recognition", Speech Communication, Vol. 19, No. 3, September 1996, pp. 185-196.
The bias thus introduced changes from one call to another. As a result of this transformation and this representation, the cepstral vectors corresponding to a given sound occupy an enlarged region of the representation space, owing to the presence of the bias. This bias reduces the ability to discriminate between cepstral vectors corresponding to different sounds, and therefore requires more complex models to discriminate between the different items of the application vocabulary.
Consequently, to minimize the effects of telephone lines on an observed signal, for example for an automatic speech recognition process, the problem is essentially one of blind deconvolution of two signals, since only a single sensor (the terminal) is available. However, the original speech signal and the channel transfer function occupy the same frequency regions and in fact share a large part of the cepstral space. It is therefore particularly delicate to define lifters (by convention, attenuation modules or filters in the cepstral domain) that reduce or suppress the contribution of the channel transfer function and thereby perform the desired deconvolution.
Current techniques aim, on the one hand, in the field of automatic speech recognition, at making the recognition process robust to the acquisition conditions of the telephone signal and, on the other hand, in the field of signal processing, at reducing disturbances in a telephone signal in order to improve its intelligibility.
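The claim that a convolutive channel becomes an additive bias in the cepstral domain can be checked numerically. The sketch below is an illustration under stated assumptions: it uses circular convolution on a single frame so that the identity is exact, whereas with ordinary framing of a linearly convolved signal it holds only approximately. The signal, the three-tap channel and all function names are hypothetical.

```python
import cmath
import math


def log_spectrum_cepstrum(x, n_coeffs=10):
    """Cepstrum as the inverse DFT of the log spectral energy (naive O(N^2) DFT)."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    log_energy = [math.log(abs(v) ** 2) for v in spectrum]
    # The log energy is real, so the real part of the inverse DFT is a cosine sum.
    return [
        sum(log_energy[k] * math.cos(2 * math.pi * k * i / n) for k in range(n)) / n
        for i in range(n_coeffs + 1)
    ]


def circular_convolve(x, w):
    """Circular convolution of frame x with a short filter w."""
    n = len(x)
    return [sum(w[j] * x[(t - j) % n] for j in range(len(w))) for t in range(n)]


n = 64
x = [math.sin(0.3 * t) + 0.1 * ((t * 7) % 13 - 6) for t in range(n)]  # hypothetical speech frame
w = [1.0, 0.6, 0.2]                                                   # hypothetical channel
y = circular_convolve(x, w)

c_x = log_spectrum_cepstrum(x)
c_w = log_spectrum_cepstrum(w + [0.0] * (n - len(w)))
c_y = log_spectrum_cepstrum(y)

# Observed cepstrum = speech cepstrum + channel cepstrum, coefficient by coefficient.
max_gap = max(abs(cy - (cx + cw)) for cy, cx, cw in zip(c_y, c_x, c_w))
```

Because the channel term `c_w` does not depend on the speech frame, it appears as the same offset on every frame of a call.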
When the recognition process is applied locally, for voice control of hands-free telephones, computers, information terminals and the like, efforts to reduce disturbances in the signal concentrate on the disturbances introduced by additive noise. In this context, the usual techniques include the definition of robust representations, filtering such as spectral subtraction, antenna filtering, filtering by state of Markov models, and the on-line addition of ambient noise to the signal or to the reference models.
In the context of a centralized recognition process, efforts also concern the reduction of the effects of telephone communications. The technique generally used is to subtract from the cepstral vectors their DC component estimated over a sufficiently long horizon. The notion of horizon designates, for a digital telephone signal subdivided into frames, an integer number of successive frames. This subtraction can be done explicitly, by estimating the mean and subtracting it, or implicitly, by high-pass filtering. Recent work has shown that the mean of the cepstral vectors over a sufficiently long horizon represents exactly the effects of the telephone lines; see the article by C. MOKBEL, J. MONNE and D. JOUVET cited above.
In the general context of signal deconvolution, two broad classes of deconvolution processes can be distinguished.
The first class, called blind deconvolution, relies on the spectral, cepstral or even temporal properties of the signals to define deconvolution schemes. In the field of telecommunications, adaptive equalization algorithms are akin to blind deconvolution. For a more detailed description of this type of algorithm, reference may be made to the article by A. BENVENISTE and M. GOURSAT, "Blind Equalizers", IEEE Transactions on Communications, Vol. COM-32, No. 8, August 1984, pp. 871-883.
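The two prior-art variants of DC-component removal mentioned above, explicit mean subtraction and implicit high-pass filtering, can be sketched as follows. The filter form, the pole value and the synthetic data are assumptions chosen for illustration; the text does not specify them.

```python
def subtract_cepstral_mean(vectors):
    """Explicit variant: estimate the mean over the horizon and subtract it."""
    count = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / count for i in range(dim)]
    return [[v[i] - mean[i] for i in range(dim)] for v in vectors]


def highpass_cepstral_filter(vectors, pole=0.95):
    """Implicit variant: first-order high-pass filter along the frame index.

    y[n] = x[n] - x[n-1] + pole * y[n-1]; the pole value is an assumption.
    """
    dim = len(vectors[0])
    prev_in = [0.0] * dim
    prev_out = [0.0] * dim
    filtered = []
    for v in vectors:
        out = [v[i] - prev_in[i] + pole * prev_out[i] for i in range(dim)]
        filtered.append(out)
        prev_in, prev_out = list(v), out
    return filtered


# Hypothetical data: the same "speech" cepstra seen through two different channels.
speech = [[((3 * n + 5 * i) % 7 - 3) / 10.0 for i in range(4)] for n in range(400)]
channel_a = [0.8, -0.3, 0.1, 0.05]
channel_b = [-0.2, 0.4, 0.0, -0.6]
call_a = [[c + b for c, b in zip(v, channel_a)] for v in speech]
call_b = [[c + b for c, b in zip(v, channel_b)] for v in speech]

norm_a = subtract_cepstral_mean(call_a)
norm_b = subtract_cepstral_mean(call_b)
mismatch = max(abs(p - q) for va, vb in zip(norm_a, norm_b) for p, q in zip(va, vb))

# A constant (pure channel) input decays to zero at the high-pass output.
residual = highpass_cepstral_filter([channel_a] * 400)[-1]
```

After mean subtraction the two calls yield the same vectors, which is the sense in which the channel bias is removed. The explicit variant needs the whole horizon before it can output anything, while the high-pass variant works frame by frame.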
The second class, similar to the process used by echo cancellation or dereverberation algorithms, uses adaptive filtering, or spatial filtering in the case of an acoustic antenna. In this case there are generally several sensors, and at least two: one is used for the reference signal and the other for the input signal. For a more detailed description of this type of adaptive filtering, reference may be made to the article by B. WIDROW et al., "Adaptive Noise Cancelling: Principles and Applications", Proc. of IEEE, Vol. 63, No. 12, pp. 1692-1716, Dec. 1975.
In the more specific context of digital signal transmission, the problems raised by the equalization process are of the same nature, owing to the difficulty of obtaining a clean reference signal with which to use a conventional filtering scheme to cancel the effect of the transmission channel. The only signal available is the observed digital signal, which has already been transmitted. To simplify the equalization process, digital sequences known to the receiver can be sent to it in order to identify the channel transfer function. However, this mode of operation rapidly saturates the transmission capacity of the channel.
To remedy this drawback, various studies have sought to establish a blind equalization process. Such a process uses decision logic and known long-term statistics of the transmitted digital signal to calculate the error that serves to update the coefficients of a filter by stochastic gradient descent. The general scheme of such a process is shown in Figure 1. For a more detailed description of this type of process, reference may be made, for example, to the article by J.J. SHYNK, "Frequency-Domain and Multirate Adaptive Filtering", IEEE Signal Processing Magazine, pp. 15-37, January 1992.
In addition, a method and a system for adaptive filtering by blind equalization of a digital telephone signal were the subject of French patent application No. 94 08741, filed on June 13, 1994 in the name of FRANCE TELECOM. In that method and system, the digital signal is subjected to a frequency transformation (FFT) and to sub-band filtering. Each sub-band signal is subjected to adaptive filtering based on a reference signal derived from long-term statistics of the telephone signal. Equalization is thus performed by blind deconvolution of the effects of the telephone line on the digital telephone signal. This mode of operation, essentially based on a blind equalization process in the frequency domain, and therefore the spectral domain, and justified by the long-term statistical properties of the signal passing over the telephone line, is satisfactory.
The object of the present invention is nevertheless to implement a blind equalization process by adaptive filtering applied directly in the cepstral domain.
Another object of the present invention, through this direct application in the cepstral domain, is a reduction of the overall computational cost.
Another object of the present invention is a reduction in the number of outputs of the filter used.
Another object of the present invention, through this direct application in the cepstral domain, is a better match with speech recognition processing, in particular that of the PHIL90 system.
Another object of the present invention is to implement a processing method that, in certain specific situations, improves recognition rates compared with those obtained by the prior-art blind equalization in the spectral domain.
The method and device of the present invention are notable in that the speech signal Sn(t) is transformed into cepstral vectors.
Each cepstral vector is subjected to adaptive filtering based on a reference cepstrum, representative of the long-term cepstrum of the speech signal, to generate equalized cepstral vectors representative of the equalized speech signal.
The method and device for blind equalization of the effects of a transmission channel on a digital speech signal, which are the subject of the present invention, will be better understood on reading the description and examining the attached drawings in which, in addition to the figures relating to the prior art:
- Figure 2a represents, in block diagram form, a general flowchart of the method of the present invention;
- Figure 2b represents, in block diagram form, a detail of the implementation of the method represented in Figure 2a;
- Figure 3a represents, in functional diagram form, a device for blind equalization of the effects of a transmission channel on a digital speech signal, according to the present invention;
- Figures 3b and 3c represent a particular embodiment of the device shown in Figure 3a.
A more detailed description of the method for blind equalization of the effects of a transmission channel on a digital speech signal, according to the present invention, will now be given with reference to Figures 2a and 2b.
In Figure 2a, the digital speech signal is denoted {Sn(t)}; this signal is considered to pass over the transmission channel for which blind equalization is to be performed in accordance with the method of the present invention.
In general, the digital speech signal is conventionally subdivided into successive blocks, which may overlap; n designates the rank of the current block and, by extension, the rank of any data set, or frame, obtained from that current block by the method of the present invention.
According to the aforementioned figure, the method consists at least in subjecting the digital speech signal {Sn(t)} to a transformation into a set of cepstral vectors, the vector associated with the frame of rank n being denoted {Cn(i)}, where Cn(i) designates the component, or cepstral coefficient, of rank i of each cepstral vector. This transformation is carried out in a step denoted 1000 in the figure. The set of cepstral vectors is representative of the digital speech signal {Sn(t)} over a given horizon, the notion of horizon being as defined earlier in the description.
Step 1000, after which the cepstral vectors {Cn(i)} are available, is followed by a step denoted 1001, which consists in calculating a reference cepstrum, denoted {Rn(i)}. This cepstrum is representative, for each of the cepstral vectors {Cn(i)} of the set, of the long-term cepstrum of the speech signal.
It has been established, with reference to the publication by C. MOKBEL, D. JOUVET and J. MONNE cited earlier in the description, that the long-term cepstrum of the speech signal is almost constant over all quefrencies, the notion of quefrency being, in the cepstral domain, the analogue of the notion of frequency in the spectral domain. In addition, and with reference to the same publication, the mean of the logarithms of the spectral densities, and therefore also the mean of the cepstral coefficients, over a sufficiently long horizon represents a constant convolutive component in the observed signal, which can validly be identified with the effect of the transmission channel.
Accordingly, step 1001 is followed by a step 1002 consisting in subjecting each of the cepstral vectors {Cn(i)} to adaptive filtering based on the reference cepstrum {Rn(i)} to generate a set of equalized cepstral vectors, denoted {Ĉn(i)}, in which the effect of the transmission channel is substantially suppressed. This set of equalized cepstral vectors is representative of an equalized digital speech signal.
Thus, according to the method of the present invention, the adaptive filtering performed in step 1002 is driven by the reference input constituted by the reference cepstrum {Rn(i)}; the adaptive filtering is of course implemented in the cepstral domain and applied to the cepstral coefficients Cn(i).
This adaptive filtering may be LMS (Least Mean Square) adaptive filtering, a type of filtering described in the publication by J.J. SHYNK cited above.
However, according to a particularly advantageous aspect of the method of the present invention, this adaptive filtering is applied in the cepstral domain and not in the spectral domain, as was done in the prior art.
Accordingly, and in accordance with a particularly advantageous aspect of the method of the present invention:
- Equalization is based on the fact that the long-term cepstral representation of the signal can be approximated by a constant. For a more detailed description of the long-term cepstral representation of the signal, in particular the statistics relating to this representation and the possibility of identifying the transmission channel using the long-term cepstrum, reference may usefully be made to the article by C. MOKBEL, D. JOUVET and J. MONNE cited earlier in the description.
- Adaptive filtering is applied directly in the cepstral domain, which of course implies block processing of the digital signal considered, as mentioned earlier in the description.
A justification of the blind deconvolution process by adaptive filtering, according to the present invention, is given below.
Assume that the observed signal $s(k)$, that is, the transmitted digital signal {Sn(t)}, is the convolution product of a clean signal, that is, the emitted speech signal $x(k)$, with the filter modeling the telephone channel, of transfer function $w(k)$. The observed signal $s(k)$ then satisfies relation (1):
$$s(k) = x(k) * w(k) \tag{1}$$
In this relation, the operator $*$ denotes the convolution product.
Deconvolution in the spectral domain
To introduce deconvolution in the cepstral domain, the theory of deconvolution in the spectral domain is first recalled. With reference to relation (1), the power spectral densities of the two sides of that relation are related, for each frame or block of the speech signal, by relation (2):
$$S_s(f) = S_x(f)\,W^2(f) \tag{2}$$
In this relation, $S_s(f)$ and $S_x(f)$ respectively represent the power spectral densities of the observed signal $s(k)$ and of the emitted speech signal $x(k)$, while $W(f)$ represents the transfer function of the telephone channel. Recall that $W(f)$ in fact designates the Fourier transform of the filter $w(k)$ modeling the telephone channel.
Given the preceding relations, an adaptive filter of transfer function $H(f)$ can be applied directly to the power spectral density of the observed signal $S_s(f)$ in order to obtain the equalized spectrum $S_n(f)$, in which the effect of the telephone line or transmission channel has been suppressed.
Under these conditions, the equalized spectrum $S_n(f)$ satisfies relation (3):
$$S_n(f) = S_s(f)\,H(f) = S_x(f)\,W^2(f)\,H(f) \tag{3}$$
Taking a constant, flat spectrum as the reference signal $R(f)$, the error $E(f)$ for each frame of the observed signal satisfies relation (4):
$$E(f) = R(f) - S_x(f)\,W^2(f)\,H(f) \tag{4}$$
The optimum filter toward which the transfer function $H(f)$ converges is the one that minimizes the mean square error, denoted EQM, in each of the frequency bands $f$ into which the spectral decomposition is made. The mean square error $\mathrm{EQM}(f)$ satisfies relation (5):
$$\mathrm{EQM}(f) = \mathrm{E}\left[E^2(f)\right] \tag{5}$$
Under hypotheses that are well verified in practice, namely a long-term spectrum of the speech signal that is constant and a telephone channel transfer function $W(f)$ that is constant over an extended horizon, the optimum filter is the one that minimizes the expression given by relation (6):
$$\mathrm{EQM}(f) = R^2(f) + S_x^2(f)\,W^4(f)\,H^2(f) - 2\,R(f)\,S_x(f)\,W^2(f)\,H(f) \tag{6}$$
whatever the value of $f$, that is, in all the frequency bands into which the observed signal has been decomposed.
Minimizing the mean square error $\mathrm{EQM}(f)$ given by relation (6) yields the transfer function of the optimum filter $H_{opt}(f)$, which satisfies relation (7):
$$H_{opt}(f) = \frac{R(f)\,S_x(f)}{S_x^2(f)\,W^2(f)} = \frac{Cte}{W^2(f)} \tag{7}$$
The optimum filter obtained compensates for the effect of the transmission channel, that is, of the telephone communication.
Furthermore, if a specific reference signal $R(f)$ is considered, that is, a reference signal having the same power as the signal, the expression $\mathrm{E}[R(f)\,S_x(f)]$ tends to equal $\mathrm{E}[S_x^2(f)]$ and, under these conditions, the optimum filter approaches the inverse of the transmission channel.
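The step from relation (6) to relation (7) is not written out in the text. A sketch of it, treating $R(f)$, $S_x(f)$ and $W(f)$ as constants over the horizon in line with the stated hypotheses, is to set the derivative of relation (6) with respect to $H(f)$ to zero:

```latex
\begin{aligned}
\frac{\partial\,\mathrm{EQM}(f)}{\partial H(f)}
  &= 2\,S_x^2(f)\,W^4(f)\,H(f) - 2\,R(f)\,S_x(f)\,W^2(f) = 0 \\
\Rightarrow\quad H_{opt}(f)
  &= \frac{R(f)\,S_x(f)\,W^2(f)}{S_x^2(f)\,W^4(f)}
   = \frac{R(f)\,S_x(f)}{S_x^2(f)\,W^2(f)}
   = \frac{R(f)}{S_x(f)}\cdot\frac{1}{W^2(f)}
\end{aligned}
```

The second derivative, $2\,S_x^2(f)\,W^4(f)$, is positive, so this stationary point is a minimum. The constant $Cte$ of relation (7) then corresponds to the ratio $R(f)/S_x(f)$, which is constant when both the reference and the long-term speech spectrum are constant.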
Deconvolution in the cepstral domain
By analogy, in the cepstral domain, relation (3) above is written as relation (8):
$$\hat{C}_n(i) = C_s(i) + C_H(i) = C_x(i) + C_w(i) + C_H(i) \tag{8}$$
In this relation, $\hat{C}_n(i)$, $C_s(i)$, $C_x(i)$, $C_w(i)$ and $C_H(i)$ respectively represent the equalized cepstrum, the cepstrum of the observed signal, the cepstrum of the clean speech signal (that is, before transmission over the telephone line or transmission channel), the cepstral representation of the effect of the line, and the cepstral representation of the equalizing filter.
Taking a constant cepstrum $R(i)$ as reference, the error $E(i)$ for each frame of the observed signal satisfies relation (9):
$$E(i) = R(i) - \left(C_x(i) + C_w(i) + C_H(i)\right) \tag{9}$$
The optimum filter toward which the transfer function $H(i)$ of the filter converges is the one that minimizes the mean square error $\mathrm{EQM}(i)$ at each quefrency, according to relation (10):
$$\mathrm{EQM}(i) = \mathrm{E}\left[E^2(i)\right] \tag{10}$$
Under hypotheses similar to those adopted in the frequency domain, expressed here in the quefrency domain (that is, a constant long-term speech cepstrum and a cepstral representation of the effect of the transmission line that is constant over an extended horizon), the optimum filter is the one that minimizes the mean square error and consequently satisfies relation (11):
$$C_{H_{opt}}(i) = R(i) - C_x(i) - C_w(i) = Cte - C_w(i) \tag{11}$$
The optimum filtering applied in the cepstral domain thus compensates for the effect of the transmission channel.
When the reference cepstrum $R(i)$ is chosen equal to the long-term mean value of the cepstrum, denoted $C_x(i)$, the optimum filtering approaches the inverse of the transmission channel.
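The result of relation (11) can be checked on synthetic data. The sketch below is an illustration only: it computes the least-squares value of the equalizer cepstrum in closed form (the reference minus the mean observed cepstrum) rather than by adaptive filtering, and the data, the channel bias and the function name are hypothetical. With the reference set to the long-term mean of the clean cepstrum, the result is the negative of the channel cepstrum, which in the cepstral domain corresponds to the inverse of the channel.

```python
def optimal_cepstral_filter(observed, reference):
    """Least-squares C_H(i): the value minimizing the mean of (R - C_s - C_H)^2."""
    count = len(observed)
    dim = len(reference)
    mean_observed = [sum(v[i] for v in observed) / count for i in range(dim)]
    return [reference[i] - mean_observed[i] for i in range(dim)]


# Hypothetical clean cepstra (3 coefficients, 500 frames) and a constant channel bias.
clean = [[((2 * n + 3 * i) % 5 - 2) / 4.0 + 0.1 * i for i in range(3)] for n in range(500)]
channel = [0.7, -0.4, 0.25]
observed = [[c + w for c, w in zip(v, channel)] for v in clean]

# Reference chosen equal to the long-term mean of the clean cepstrum.
long_term_mean = [sum(v[i] for v in clean) / len(clean) for i in range(3)]
c_h = optimal_cepstral_filter(observed, long_term_mean)
```

In practice the clean long-term mean is not observable on a given call, which is why the method relies on a reference cepstrum obtained separately.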
A comparison of the blind equalization process according to the method of the present invention with the conventional approaches of high-pass filtering or cepstral subtraction shows that, from the point of view of recognition performance, this mode of operation substantially equals the performance of the prior-art solutions and sometimes significantly exceeds it on certain databases, as will be described in more detail later in the description.
As regards step 1000, the transformation of the observed digital signal {Sn(t)} into a set of cepstral vectors, a detailed description of the transformation process itself will now be given with reference to Figure 2b.
According to that figure, the transformation step 1000 consists in subjecting, in a step 1000a, the digital speech signal {Sn(t)} to a frequency transformation delivering a frequency spectrum {Sn(f)} of the digital speech signal {Sn(t)} over the horizon considered. Recall that the observed digital signal {Sn(t)} is subdivided into successive blocks of samples so that block processing can be carried out. The frequency transform used may, for example, be a fast Fourier transform.
Step 1000a is followed by a step 1000b consisting in subjecting the frequency spectrum {Sn(f)} obtained by the frequency transform of step 1000a to a decomposition into frequency sub-bands, to generate a plurality of frequency sub-band signals denoted {Vn(j)}. The index j here designates the rank of each frequency sub-band considered. In practice, the decomposition into frequency sub-bands performed in step 1000b can be implemented by means of a suitable bank of frequency filters; the spectrum of the signal {Sn(f)} can, for example, be subdivided into 24 adjacent frequency bands.
Step 1000b is then followed by a step 1000c consisting in subjecting each frequency sub-band signal, that is, each signal Vn(j), to logarithmic attenuation to generate a plurality of attenuated frequency sub-band signals, denoted for this reason {LVn(j)}. The set of attenuated frequency sub-band signals {LVn(j)} is then subjected to an inverse frequency transformation to generate the set of cepstral vectors denoted {Cn(i)}. The inverse frequency transform is, for example, performed by an inverse fast Fourier transform when the frequency transform of step 1000a is performed as a direct fast Fourier transform.
As far as the calculation of the reference cepstrum {Rn(i)} is concerned, it may advantageously consist of a cepstrum signal of constant value for each of the cepstral vectors {Cn(i)}. This constant cepstrum signal is representative, in the digital speech signal and in the set of cepstral vectors, of the long-term cepstrum of the speech signal. In general, the reference cepstrum can be obtained from a database, as will be described later in the description.
A more detailed description of a device for blind equalization of the effects of a transmission channel on a digital speech signal, enabling the method of the present invention to be implemented, will now be given with reference to Figures 3a, 3b and 3c.
As shown in Figure 3a, for a digital speech signal {Sn(t)} passing over a telephone connection, the device according to the present invention comprises at least a module 1 for transforming the digital speech signal {Sn(t)} into a set of cepstral vectors {Cn(i)} representative of the digital speech signal over the given horizon, and in particular for the frame of rank n of the observed digital speech signal.
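Steps 1000a to 1000c and the final inverse transform can be sketched for a single block as follows. This is an illustration under several assumptions that are not in the text: a naive DFT stands in for the FFT, the 24 bands are uniform and rectangular, the inverse transform is written as a cosine transform of the log band energies, and M = 12 coefficients are kept. The function name, block length and test signal are hypothetical.

```python
import cmath
import math


def cepstral_vector(block, n_bands=24, n_coeffs=12):
    """Steps 1000a-1000c on one block: spectrum -> sub-bands -> log -> inverse transform."""
    n = len(block)
    half = n // 2
    # Step 1000a: frequency transform (naive DFT standing in for an FFT).
    energy = [
        abs(sum(block[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))) ** 2
        for k in range(half)
    ]
    # Step 1000b: n_bands adjacent bands V(j). Uniform rectangular bands are an
    # assumption; the text only says the bands are adjacent.
    edges = [j * half // n_bands for j in range(n_bands + 1)]
    v = [sum(energy[edges[j]:edges[j + 1]]) for j in range(n_bands)]
    # Step 1000c: logarithmic attenuation LV(j), with a floor to keep log() defined.
    lv = [math.log(max(e, 1e-300)) for e in v]
    # Inverse transform, written here as a cosine transform of the log band
    # energies (an assumption; the text uses an inverse FFT). Coefficients i = 1..M.
    return [
        sum(lv[j] * math.cos(math.pi * i * (j + 0.5) / n_bands) for j in range(n_bands))
        for i in range(1, n_coeffs + 1)
    ]


# Hypothetical block of 128 samples.
block = [
    math.sin(0.2 * t) + 0.3 * math.sin(1.3 * t) + 0.05 * ((t * 11) % 17 - 8)
    for t in range(128)
]
c = cepstral_vector(block)
c_gain = cepstral_vector([2.0 * s for s in block])
gain_effect = max(abs(a - b) for a, b in zip(c, c_gain))
```

Because the coefficient of rank 0 is not returned, a change of input gain leaves the vector unchanged, so `gain_effect` is zero up to rounding.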
Module 1 is followed by a module 2 generating a reference cepstrum representative, for each of the cepstral vectors {Cn(i)}, of the long-term cepstrum of the speech signal. The reference cepstrum makes it possible to associate with each cepstral vector {Cn(i)} a reference cepstrum {Rn(i)} representative of the long-term cepstrum of the speech signal for each of these vectors, under conditions that will be explained later in the description. The value of the reference cepstrum can advantageously be approximated by a constant over time. However, the distribution Rn(i) of this value for each component Cn(i) of rank i of the cepstral vectors can differ according to the rank i considered. Nevertheless, in a particular non-limiting embodiment, the reference cepstrum {Rn(i)} has an identical constant value for each component, or cepstral coefficient Cn(i), of corresponding rank i constituting each cepstral vector {Cn(i)}.
An adaptive filtering module 3 is provided, which filters each of the cepstral vectors {Cn(i)} on the basis of the reference cepstrum; this module 3 generates equalized cepstral vectors in which the effect of the telephone communication is substantially eliminated.
In particular, on the basis of the reference cepstrum {Rn(i)} representative of the long-term cepstrum of the speech signal for each of the cepstral vectors, the adaptive filtering module 3 performs filtering, for example of the LMS type, which makes it possible, from the calculation of the error between the reference cepstrum {Rn(i)} and the set of equalized cepstral vectors, to generate the equalized cepstral vectors {Ĉn(i)}.
The set of equalized cepstral vectors is representative of an equalized digital speech signal.
As shown in Figure 3a, module 1 for transforming the digital speech signal into a set of cepstral vectors comprises at least, receiving the observed speech signal {Sn(t)}, a module 10 for frequency transformation of the digital speech signal, delivering a frequency spectrum of the digital speech signal, denoted {Sn(f)}, over the horizon considered, and a filter bank 11 for decomposing the frequency spectrum of the digital speech signal into N frequency sub-bands; this filter bank 11 delivers N frequency sub-band signals denoted {Vn(j)}.
The filter bank 11 is followed by a module 12 for logarithmic attenuation of each frequency sub-band signal, this module 12 delivering a plurality of attenuated frequency sub-band signals denoted {LVn(j)}.
Finally, module 1 also comprises a module 13 for inverse frequency transformation, which receives the attenuated frequency sub-band signals {LVn(j)} and generates from them the set of cepstral vectors {Cn(i)}.
As far as module 2 generating the reference cepstrum is concerned, for a set of cepstral vectors {Cn(i)} with i ∈ [1, M], the reference cepstrum {Rn(i)} is representative, for each of the cepstral vectors, of the long-term cepstrum of the speech signal. It is thus understood that the database constituting the reference cepstrum generator 2 can be organized so as to deliver the reference cepstrum representative of the long-term cepstrum of the speech signal as a function of the index i designating the component of the cepstral vector {Cn(i)}.
Furthermore, as shown in Figure 3a, the adaptive filtering module 3 comprises at least a module 30 for calculating the error signal between each corresponding equalized cepstral vector {Ĉn(i)} and the corresponding reference cepstrum {Rn(i)}.
The error signal E(i) = Rn(i) - Ĉn(i) is calculated between each component of rank i of the reference cepstrum and the corresponding component of the equalized cepstral vector. In addition, a module 31 is provided for equalizing each cepstral vector {Cn(i)}; from each component Cn(i) and from this error signal, this equalization module generates a corresponding equalized cepstral vector {Ĉn(i)}, under conditions that are explained later with respect to Figures 3b and 3c. The error signal may be weighted or adapted.

Figure 3b shows the device that is the object of the present invention in a particular embodiment directed towards an application to the automatic voice recognition system PHIL90 mentioned previously in the description. The same references designate the same elements. Module 1 of Figure 3a is, however, arranged to calculate the MFCCs designated previously in the description and thus to deliver the corresponding cepstral vector {Cn(i)}, of components Cn(1) to Cn(M), for each successive interval of the observed digital speech signal. Modules 2 and 3 of Figure 3b are similar to modules 2 and 3 of Figure 3a. However, the module for calculating the error between the reference cepstrum {Rn(i)} and each equalized cepstral vector {Ĉn(i)} is made explicit and is represented as 3i, relating to each component Ĉn(i) of the equalized cepstral vector {Ĉn(i)}. The modules 3i are identical, and each performs, for its component Cn(i) of the cepstral vector {Cn(i)}, the same function of error calculation and equalization of this component so as to deliver a component of the corresponding equalized cepstral vector {Ĉn(i)}.

As shown in Figure 3c, each module 3i advantageously contains a subtractor circuit 30i that receives, on its positive input, the component Rn(i) of the reference cepstrum {Rn(i)} and, on its negative input, the equalized cepstral coefficient Ĉn(i), so as to calculate the error in the form of an error signal E(i). Each module 3i also receives the corresponding cepstral coefficient Cn(i) on an adder circuit 31i, which equalizes this cepstral coefficient Cn(i) to deliver an equalized cepstral coefficient Ĉn(i). Furthermore, as shown in Figure 3c, each module 3i, and in particular the module for equalizing each cepstral coefficient Cn(i), advantageously comprises a multiplier circuit 300i that multiplies by a multiplier coefficient μ; this multiplier circuit receives the error signal E(i) delivered by the subtractor circuit 30i and delivers a weighted error signal E*(i). The multiplier circuit 300i is followed by a summing circuit 301i with two inputs and one output: the first input of the summing circuit 301i receives the weighted error signal E*(i), and the second input receives the signal delivered by this same summing circuit, by way of a delay circuit 302i. The delay circuit 302i introduces a delay equal to the duration of one block of samples of the digital signal. The output of the delay circuit 302i delivers an adaptation signal Hn(i), which is sent to the equalizing adder circuit 31i. The adaptation circuit formed by the multiplier circuit 300i, the summing circuit 301i and the delay circuit 302i thus adapts or weights the error signal E(i) to deliver the adaptation signal Hn(i). Once the error signal has been adapted by the adaptation circuit, the equalizing adder circuit 31i delivers the corresponding equalized cepstral coefficient Ĉn(i). As can be seen in Figure 3b, the same applies to all the components of the cepstral vectors of rank i between 1 and M in the embodiment considered.
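The circuit of Figure 3c can be read as a per-component recurrence: the equalized coefficient is the observed coefficient plus the adaptation signal, the error is the reference minus the equalized coefficient, and the adaptation signal for the next block is the current one plus μ times the error. The following Python sketch implements that reading. The class name, the default value of μ and the zero initial adaptation signal are assumptions made for illustration and are not taken from the description.

```python
class CepstralEqualizer:
    """Sketch of the modules 3i of Figure 3c, one filter per rank i."""

    def __init__(self, reference, mu=0.1):
        self.reference = list(reference)      # R(i), assumed constant over time
        self.mu = mu                          # multiplier coefficient of circuit 300i
        self.h = [0.0] * len(self.reference)  # H_n(i), output of delay circuit 302i

    def step(self, cepstrum):
        """Equalize one cepstral vector and update the adaptation signal."""
        equalized = []
        for i, c in enumerate(cepstrum):
            c_hat = c + self.h[i]                # adder circuit 31i
            error = self.reference[i] - c_hat    # subtractor circuit 30i
            # Circuits 300i and 301i; the delay 302i means the new value
            # of H(i) is only used for the next block of samples.
            self.h[i] = self.h[i] + self.mu * error
            equalized.append(c_hat)
        return equalized
```

Each call to `step` costs one multiplication and three additions per component (equalization, error, accumulation). If the observed cepstrum is the reference shifted by a constant channel term, the adaptation signal tends towards the opposite of that term for 0 < μ < 2, so the equalized output tends towards the reference.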
The device that is the object of the present invention, as represented in Figures 3b and 3c, has been implemented, and comparative tests have made it possible to verify the proposed blind equalization technique on various specific databases. The tests were carried out by taking as input a file of 10 cepstral coefficients, applying the adaptive filtering, and storing the vectors of Mel-frequency cepstral coefficients (MFCC) obtained at the filter output. The MFCC files are used directly by the PHIL90 system. The following table presents the improvements obtained with respect to a classical technique, designated as the base system; a cepstral subtraction process as defined, for example, in the prior art by C. MOKBEL, J. MONNE and D. JOUVET cited previously in the description; an adaptive filtering in the spectral domain as performed, for example, according to the process and system described in French patent application No. 94 08741 cited previously in the description; and finally an adaptive filtering in the cepstral domain according to the method and device that are the objects of the present invention. These techniques are applied to four different databases, designated successively in the aforementioned table by Digits, Two-digit numbers, Trégor and Baladins. The first three databases are so-called laboratory databases, whose recordings come from forewarned and cooperative speakers. The last database, Baladins, is a so-called operational database obtained by recording calls to a server in operation. The recording conditions of operational databases are closer to actual operating conditions, so the recognition results on these databases are more in line with the performance obtained in operation. The improvements, given in the last column of the table as the reduction of the error rate, are expressed with respect to the reference version of PHIL90. The range indicated in brackets next to the error rate of the base system represents the 95% confidence interval. As regards the cepstral subtraction process, it appears effective at least for the Digits and Two-digit numbers databases. However, this process is difficult to implement on-line in real time, because it relies on estimating the mean of the cepstral vectors during silence or over a long speech horizon.
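The real-time difficulty of cepstral subtraction can be seen in a generic sketch of cepstral mean subtraction. This is a textbook form given for illustration only, and is not necessarily the exact procedure of the authors cited above; the function name is hypothetical.

```python
def cepstral_mean_subtraction(vectors):
    """Subtract, per component, the mean taken over the whole horizon.

    The mean can only be computed once every vector of the horizon is
    available, which is what makes an on-line use difficult.
    """
    n = len(vectors)
    m = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(m)]
    return [[v[i] - mean[i] for i in range(m)] for v in vectors]
```

A constant term added to every vector of the horizon is removed exactly, but no output can be produced before the last vector has been received, whereas an adaptive filter delivers one equalized vector per input vector.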
As regards the actual implementation of the device that is the object of the present invention, it can of course use the structures already employed in the PHIL90 system mentioned above in the description. The equalization device according to the present invention, applied in the space of the cepstral coefficients, is inexpensive in calculation time. For each parameter vector, that is, every 16 ms, it requires M multiplications and 3M additions, where M designates the number of cepstral coefficients. The number of operations required is thus one multiplication and three additions per filter output. This calculation cost is negligible compared with the volume of calculation involved in determining the parameter vectors. Consequently, the calculation cost of the filter is low and its implementation poses no real-time calculation problem. Compared with the solution proposed by French patent application No. 94 08741, the calculation volume is reduced, since there is only one multiplication and three additions per filter output instead of three multiplications and two additions in the aforementioned solution, and since, in addition, the filter has fewer outputs in the cepstral domain than in the spectral domain. Furthermore, the dynamic range of the cepstral coefficients is smaller than that of the spectral coefficients and corresponding vectors, so the precision required in the calculations, in terms of the number of bits allocated to the aforementioned variables, is lower. The results obtained are equivalent or slightly better than those of the technique retained in the aforementioned patent application No. 94 08741; this is in particular the case for the databases mentioned in the aforementioned table.
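The operation count can be turned into a rate with a few lines of Python. The function name is hypothetical, and the value M = 10 used in the usage note is an assumption borrowed from the test files of 10 cepstral coefficients mentioned above.

```python
def equalizer_cost(n_coeffs, frame_period_s=0.016):
    """Operation counts for one multiplication and three additions per output."""
    frames_per_second = 1.0 / frame_period_s
    mults_per_frame = n_coeffs        # one multiplication per filter output
    adds_per_frame = 3 * n_coeffs     # three additions per filter output
    return {
        "mults_per_frame": mults_per_frame,
        "adds_per_frame": adds_per_frame,
        "mults_per_second": mults_per_frame * frames_per_second,
        "adds_per_second": adds_per_frame * frames_per_second,
    }
```

For example, `equalizer_cost(10)` gives 10 multiplications and 30 additions per 16 ms vector, which is on the order of a few thousand operations per second.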

Claims (9)

1. Method for blind equalization of the effects of a transmission channel on a digital voice signal ({Sn(t)}) transiting on this transmission channel, characterized in that said method consists at least in:
- subjecting said digital voice signal to a transformation into a set of cepstral vectors, the set of cepstral vectors being representative of said digital voice signal over a determined horizon;
- calculating a reference cepstrum ({Rn(i)}) representative, for each of the cepstral vectors of said set of cepstral vectors, of the long-term cepstrum of this voice signal;
- subjecting each of the cepstral vectors to an adaptive filtering, on the basis of said reference cepstrum, to generate a set of equalized cepstral vectors ({Ĉn(i)}) in which the effect of the transmission channel is substantially suppressed, said set of equalized cepstral vectors being representative of an equalized digital voice signal.

2. Method according to claim 1, characterized in that said transformation consists successively in:
- subjecting said digital voice signal ({Sn(t)}) to a frequency transform that delivers a frequency spectrum ({Sn(f)}) of said digital voice signal ({Sn(t)}) over the horizon considered;
- subjecting said frequency spectrum ({Sn(f)}) to a decomposition into frequency sub-bands to generate a plurality of frequency sub-band signals ({Vn(j)});
- subjecting each frequency sub-band signal to a logarithmic attenuation to generate a plurality of attenuated frequency sub-band signals ({LVn(j)});
- subjecting the set of attenuated frequency sub-band signals to an inverse frequency transformation to generate said set of cepstral vectors ({Cn(i)}).

3. Method according to claim 1 or 2, characterized in that the reference cepstrum ({Rn(i)}) is constituted by a constant cepstrum signal for each of the cepstral vectors, representative, in the digital voice signal and in the set of cepstral vectors, of the long-term cepstrum of the voice signal.

4. Method according to claim 3, characterized in that the reference cepstrum ({Rn(i)}) has an identical constant value for each component, or cepstral coefficient (Cn(i)) of corresponding rank i, constituting each cepstral vector ({Cn(i)}).

5. Method according to one of the claims 4, characterized in that said adaptive filtering consists, for each of the cepstral vectors, on the basis of said reference cepstrum ({Rn(i)}) representative of the long-term cepstrum of this voice signal for each of the cepstral vectors, in performing an LMS-type filtering, said LMS-type filtering making it possible, from the calculation of the error between said reference cepstrum ({Rn(i)}) and the set of said equalized cepstral vectors, to generate said equalized cepstral vectors.

6. Device for blind equalization of the effects of a transmission channel on a digital voice signal ({Sn(t)}) transiting on this transmission channel, characterized in that it comprises at least:
- means for transforming said digital voice signal into a set of cepstral vectors, this set of cepstral vectors being representative of said digital voice signal over a determined horizon;
- means for generating a reference cepstrum representative, for each of the cepstral vectors of said set of cepstral vectors, of the long-term cepstrum of this voice signal;
- means for adaptive filtering, on the basis of said reference cepstrum, of each of the cepstral vectors, making it possible to generate equalized cepstral vectors in which the effect of said transmission channel is substantially eliminated, said set of equalized cepstral vectors being representative of an equalized digital voice signal.

7. Device according to claim 6, characterized in that said means for transforming said digital voice signal into a set of cepstral vectors comprise at least:
- means for frequency transformation of said digital voice signal, which deliver a frequency spectrum of said digital voice signal ({Sn(t)}) over the horizon considered;
- a bank of filters for decomposing said frequency spectrum of said digital voice signal into N frequency sub-bands, which delivers N frequency sub-band signals ({Vn(j)});
- means for logarithmic attenuation of each frequency sub-band signal, which deliver a plurality of attenuated frequency sub-band signals ({LVn(j)});
- means for inverse frequency transformation making it possible to generate, from said attenuated frequency sub-band signals ({LVn(j)}), said set of cepstral vectors ({Cn(i)}).

8. Device according to one of claims 6 or 7, characterized in that, for each cepstral vector ({Cn(i)}) with i ∈ [1, M] and for a reference cepstrum ({Rn(i)}) representative, for each of the cepstral vectors, of the long-term cepstrum of this voice signal, said adaptive filtering means comprise at least:
- means for calculating an error signal (E(i)) between each component of rank i (Ĉn(i)) of each equalized cepstral vector ({Ĉn(i)}) and the corresponding component of equal rank (Rn(i)) of the reference cepstrum ({Rn(i)}), E(i) = Rn(i) - Ĉn(i); and
- means for equalizing said cepstral vector ({Cn(i)}), which deliver, from each component (Cn(i)) of each cepstral vector ({Cn(i)}) and from said error signal (E(i)), a component (Ĉn(i)) of the equalized cepstral vector ({Ĉn(i)}).

9. Device according to claim 8, characterized in that said means for calculating the error signal (E(i)) and said means for equalizing said cepstral vector ({Cn(i)}) comprise, for each component (Cn(i)) of said cepstral vector:
- a subtractor circuit receiving said component (Rn(i)) of the reference cepstrum ({Rn(i)}) and said component (Ĉn(i)) of said equalized cepstral vector ({Ĉn(i)}) and delivering said error signal (E(i));
- means for adapting this error signal (E(i)), comprising: a multiplier circuit multiplying by a multiplier coefficient μ, which delivers a weighted error signal (E*(i)); an adder circuit with two inputs and one output, a first input receiving said weighted error signal (E*(i)) and a second input receiving the signal delivered by said adder circuit through a delay circuit of determined duration, the output of said delay circuit delivering an adaptation signal (Hn(i));
- an equalizing adder circuit receiving said cepstral coefficient (Cn(i)) and said adaptation signal (Hn(i)) and delivering said equalized cepstral coefficient (Ĉn(i)).
MXPA/A/1999/002669A 1997-07-22 1999-03-19 Method and device for blind equalizing of transmission channel effects on a digital speech signal MXPA99002669A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
FR97/09273 1997-07-22

Publications (1)

Publication Number Publication Date
MXPA99002669A (en) 2007-04-10

Similar Documents

Publication Publication Date Title
AU751333B2 (en) Method and device for blind equalizing of transmission channel effects on a digital speech signal
CN109686381B (en) Signal processor for signal enhancement and related method
US5859914A (en) Acoustic echo canceler
US5924065A (en) Environmently compensated speech processing
US5706395A (en) Adaptive weiner filtering using a dynamic suppression factor
US6591234B1 (en) Method and apparatus for adaptively suppressing noise
CN108172231B (en) Dereverberation method and system based on Kalman filtering
US5148488A (en) Method and filter for enhancing a noisy speech signal
JP6545419B2 (en) Acoustic signal processing device, acoustic signal processing method, and hands-free communication device
US6744887B1 (en) Acoustic echo processing system
JPH07147548A (en) Adaptable noise eliminating device
JP2003500936A (en) Improving near-end audio signals in echo suppression systems
US7062039B1 (en) Methods and apparatus for improving adaptive filter performance by inclusion of inaudible information
US5905969A (en) Process and system of adaptive filtering by blind equalization of a digital telephone signal and their applications
US6377918B1 (en) Speech analysis using multiple noise compensation
KR100386488B1 (en) Arrangement for communication with a subscriber
US20050008143A1 (en) Echo canceller having spectral echo tail estimator
JP3110201B2 (en) Noise removal device
US20240203439A1 (en) Noise Reduction Based on Dynamic Neural Networks
MXPA99002669A (en) Method and device for blind equalizing of transmission channel effects on a digital speech signal
EP3667662A1 (en) Acoustic echo cancellation device, acoustic echo cancellation method and acoustic echo cancellation program
JP2000252891A (en) Signal processor
EP1748426A2 (en) Method and apparatus for adaptively suppressing noise
Washi et al. Sinusoidal noise reduction method using leaky LMS algorithm
Sasaoka et al. A study on less computational load of noise reduction method based on ALE and noise estimation filter