CN101533642A - Method for processing voice signal and device - Google Patents

Method for processing voice signal and device

Info

Publication number
CN101533642A
CN101533642A CN200910078331A
Authority
CN
China
Prior art keywords
cepstrum
channel
voice signal
logarithm
logarithm cepstrum
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN200910078331A
Other languages
Chinese (zh)
Other versions
CN101533642B (en)
Inventor
张晨
冯宇红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mid Star Technology Ltd By Share Ltd
Original Assignee
Vimicro Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vimicro Corp filed Critical Vimicro Corp
Priority to CN2009100783316A priority Critical patent/CN101533642B/en
Publication of CN101533642A publication Critical patent/CN101533642A/en
Application granted granted Critical
Publication of CN101533642B publication Critical patent/CN101533642B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention provides a voice signal processing method and device for solving the problem of channel interference in voice signals. The method comprises the following steps: in the log-cepstrum domain, extracting cepstral coefficients of the currently observed voice signal to obtain the log-cepstrum of the observed voice; estimating the log-cepstrum of the transmission channel from the statistical mean of the log-cepstrum of voice signals that have not passed through the channel; and subtracting the estimated transmission-channel log-cepstrum from the log-cepstrum of the observed voice to obtain the log-cepstrum of the current voice signal that has not passed through the channel, which is the result of separating the voice signal from the channel interference. The invention can eliminate channel interference in voice signals and enhance resistance to transmission-channel interference during speech recognition feature extraction, thereby improving the recognition rate.

Description

Method for processing voice signal and device
Technical field
The present invention relates to the technical field of speech recognition, and in particular to a voice signal processing method and device.
Background technology
At present, after more than half a century of research worldwide, speech recognition technology has gradually entered the practical stage. In recent years speech chips have been used more and more widely, mainly in: voice dialing in telephone communication, voice identity authentication, voice input, voice control in automobiles, industrial control and the medical field, voice interaction interfaces of personal digital assistants (PDA), intelligent toys, household remote controls, and the like.
A speech recognition process mainly comprises pre-processing of the voice signal, extraction of speech recognition features, and pattern matching based on the extracted features. Among these, the most important step in recognizing a voice signal is the extraction of speech recognition features, and the extracted characteristic parameters must satisfy the following requirements: (1) the extracted characteristic parameters can represent the speech features effectively and have good discriminability; (2) the parameters of different orders have good mutual independence; (3) the characteristic parameters are convenient to calculate, preferably with efficient algorithms, so as to guarantee real-time implementation of speech recognition.
However, in current speech recognition systems, the influence of the transmission channel that carries the voice signal causes a certain change in the characteristics of the voice signal, leading to a decline in recognition performance. This problem manifests itself to different degrees for different transmission channels. Therefore, in order to suppress or offset the signal distortion introduced by the transmission channel, measures need to be taken to eliminate the channel interference.
Summary of the invention
The technical problem to be solved by the present invention is to provide a voice signal processing method and device, so as to solve the problem of channel interference in voice signals.
In order to solve the above problem, the invention discloses a voice signal processing method, comprising:
in the log-cepstrum domain, performing cepstral coefficient extraction on the currently observed voice signal to obtain the log-cepstrum of the observed voice;
estimating the transmission-channel log-cepstrum from the statistical mean of the log-cepstrum of voice signals that have not passed through the channel;
subtracting the estimated value of the transmission-channel log-cepstrum from the log-cepstrum of the observed voice to obtain the log-cepstrum of the current voice signal that has not passed through the channel; the log-cepstrum of the voice signal that has not passed through the channel is the result of separating the voice signal from the channel interference.
Wherein, estimating the transmission-channel log-cepstrum from the statistical mean of the log-cepstrum of voice signals that have not passed through the channel specifically comprises:
calculating E[Tc(K)] = E[Sc(K) - RefCep(K)]; wherein Tc(K) denotes the log-cepstrum of the transmission channel, Sc(K) denotes the log-cepstrum of the observed voice, E[X] denotes the statistical mean of X, and RefCep(K) denotes the statistical mean of the log-cepstrum of voice signals that have not passed through the channel;
when a voice signal is present on the transmission channel, approximating E[Tc(K)] by low-pass filtering the above formula, obtaining TranCep(K)_j = TranCep(K)_{j-1}·(1-α1) + (Sc(K) - RefCep(K))·α1; wherein TranCep(K) denotes the estimated value of the transmission-channel log-cepstrum, j is the frame index, and α1 is a smoothing factor.
Preferably, the method further comprises: when no voice signal is present on the transmission channel, approximating E[Tc(K)] by low-pass filtering the above formula, obtaining TranCep(K)_j = TranCep(K)_{j-1}·(1-α2) + Sc(K)·α2; wherein α1 and α2 take different values.
Preferably, the method further comprises: using the signal-to-noise ratio of the observed voice signal, combining the two formulas that calculate TranCep(K) with α1 and α2 into:
TranCep(K)_j = TranCep(K)_{j-1}·(1-α3) + (Sc(K) - RefCep(K))·β1 + Sc(K)·β2
wherein β1 + β2 = α3, and β1 and β2 are determined according to the signal-to-noise ratio.
Preferably, the method further comprises:
updating, according to the formula RefCep(K)_{j+1} = RefCep(K)_j·(1-γ) + Xc(K)·γ and using the log-cepstrum Xc(K) of the current voice signal that has not passed through the channel, the statistical mean RefCep(K) of the log-cepstrum of voice signals that have not passed through the channel; wherein γ is a small quantity and γ < α.
The present invention also provides a voice signal processing device, comprising:
a cepstral coefficient extraction unit, configured to perform, in the log-cepstrum domain, cepstral coefficient extraction on the currently observed voice signal to obtain the log-cepstrum of the observed voice;
a channel log-cepstrum estimation unit, configured to estimate the transmission-channel log-cepstrum from the statistical mean of the log-cepstrum of voice signals that have not passed through the channel;
an interference separation unit, configured to subtract the estimated value of the transmission-channel log-cepstrum from the log-cepstrum of the observed voice to obtain the log-cepstrum of the current voice signal that has not passed through the channel; the log-cepstrum of the voice signal that has not passed through the channel is the result of separating the voice signal from the channel interference.
Wherein, the channel log-cepstrum estimation unit comprises:
a mean calculation subunit, configured to calculate E[Tc(K)] = E[Sc(K) - RefCep(K)]; wherein Tc(K) denotes the log-cepstrum of the transmission channel, Sc(K) denotes the log-cepstrum of the observed voice, E[X] denotes the statistical mean of X, and RefCep(K) denotes the statistical mean of the log-cepstrum of voice signals that have not passed through the channel;
a first estimation subunit, configured to approximate E[Tc(K)] by low-pass filtering the above formula when a voice signal is present on the transmission channel, obtaining
TranCep(K)_j = TranCep(K)_{j-1}·(1-α1) + (Sc(K) - RefCep(K))·α1
wherein TranCep(K) denotes the estimated value of the transmission-channel log-cepstrum, j is the frame index, and α1 is a smoothing factor.
Preferably, the channel log-cepstrum estimation unit further comprises: a second estimation subunit, configured to approximate E[Tc(K)] by low-pass filtering the above formula when no voice signal is present on the transmission channel, obtaining TranCep(K)_j = TranCep(K)_{j-1}·(1-α2) + Sc(K)·α2; wherein α1 and α2 take different values.
Preferably, the channel log-cepstrum estimation unit further comprises:
a combined estimation subunit, configured to use the signal-to-noise ratio of the observed voice signal to combine the two formulas that calculate TranCep(K) with α1 and α2 into:
TranCep(K)_j = TranCep(K)_{j-1}·(1-α3) + (Sc(K) - RefCep(K))·β1 + Sc(K)·β2
wherein β1 + β2 = α3, and β1 and β2 are determined according to the signal-to-noise ratio.
Preferably, the device further comprises:
an updating unit, configured to update, according to the formula RefCep(K)_{j+1} = RefCep(K)_j·(1-γ) + Xc(K)·γ and using the log-cepstrum Xc(K) of the current voice signal that has not passed through the channel, the statistical mean RefCep(K) of the log-cepstrum of voice signals that have not passed through the channel; wherein γ is a small quantity and γ < α.
Compared with the prior art, the present invention has the following advantages:
First, in the process of extracting speech recognition features, the present invention converts the observed voice signal into the log-cepstrum and estimates the log-cepstrum of the transmission channel by means of the statistical mean of the log-cepstrum of voice signals that have not passed through the channel; the estimated transmission-channel log-cepstrum is then subtracted from the log-cepstrum of the observed voice, so that the voice signal and the transmission-channel interference are separated in the cepstrum domain and the log-cepstrum of the voice signal that has not passed through the channel is extracted. This method can eliminate the interference of the transmission channel with the voice signal and improve resistance to transmission-channel interference in the speech recognition feature extraction process, thereby improving the recognition rate.
Moreover, in estimating the log-cepstrum of the transmission channel, a low-pass filtering method is adopted, so that an approximate mean can be calculated using only the current frame and a few preceding frames; the real-time requirement of speech recognition feature extraction can therefore be met.
Second, the estimation method for the transmission-channel log-cepstrum provided by the invention can treat voice segments (i.e., when a voice signal is present on the transmission channel) and non-speech segments (i.e., when no voice signal is present on the transmission channel) differently, that is, adopt different estimation equations for each, so that the transmission-channel log-cepstrum of non-speech segments is estimated more accurately, further improving resistance to channel interference.
Third, according to the characteristics of the actual speaker, the present invention uses the log-cepstrum of the current voice signal that has not passed through the channel, calculated in the processing of every frame, to update the statistical mean of the log-cepstrum of voice signals that have not passed through the channel (whose initial value is a constant), so that this statistical mean comes closer to the speaker's personal characteristics.
Description of drawings
Fig. 1 is a flowchart of a voice signal processing method according to Embodiment 1 of the invention;
Fig. 2 is a flowchart of the speech recognition feature extraction described in an embodiment of the invention;
Fig. 3 is a structural diagram of a voice signal processing device according to a device embodiment of the invention;
Fig. 4 is a structural diagram of the channel log-cepstrum estimation unit U32 in Fig. 3;
Fig. 5 is another structural diagram of the channel log-cepstrum estimation unit U32 in Fig. 3;
Fig. 6 is a structural diagram of a voice signal processing device according to another device embodiment of the invention.
Detailed description of the embodiments
To make the above objects, features and advantages of the present invention more apparent and understandable, the present invention is described in further detail below in conjunction with the drawings and specific embodiments.
The invention provides a voice signal processing method applicable to general channel transmission, where a general channel satisfies: the channel is a convolutive channel; the channel characteristics are relatively stable and change slowly; and the cepstral features of the voice signal tend to be constant in long-term statistics. Therefore, for general channel transmission the following relations hold:
Assume the voice signal that has not passed through the channel (i.e., the voice signal under an ideally equalized channel) is x(n) and the transmission channel is t(n). Then, by the property of a convolutive channel, the observed voice signal s(n) is:
s(n) = x(n) * t(n)   (* denotes convolution)   (1.1)
In the frequency domain, formula (1.1) becomes:
S(i) = X(i)·T(i)
In the log-cepstrum domain, formula (1.1) becomes:
Sc(K) = Xc(K) + Tc(K)   (1.2)
That is, in the log-cepstrum domain, the log-cepstrum Sc(K) of the observed voice equals the log-cepstrum Xc(K) of the voice signal that has not passed through the channel plus the log-cepstrum Tc(K) of the transmission channel, where K is the cepstral index.
The present invention uses exactly formula (1.2): by processing the observed voice signal in the cepstrum domain, the voice signal is separated from the transmission-channel interference, thereby eliminating the interference of the transmission channel with the voice signal and extracting the log-cepstrum of the voice signal that has not passed through the channel, i.e., the equalized cepstral features of the voice signal.
The principle by which the present invention eliminates channel interference is as follows:
According to formula (1.2), the voice signal that has not passed through the channel (i.e., the equalized voice signal) in the log-cepstrum domain is:
Xc(K) = Sc(K) - Tc(K)   (1.3)
Here Sc(K) can be calculated from the observed signal, so the key to extracting Xc(K) is estimating the log-cepstrum Tc(K) of the transmission channel.
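As a brief illustration, a minimal numerical check of relation (1.2), assuming circular convolution and real log-magnitude cepstra rather than the full MFCC chain (the signal lengths and random test data are arbitrary):

```python
import numpy as np

def log_cepstrum(spectrum):
    # Real log-cepstrum: inverse FFT of the log-magnitude spectrum.
    return np.fft.ifft(np.log(np.abs(spectrum) + 1e-12)).real

n = 256
rng = np.random.default_rng(0)
x = rng.standard_normal(n)                        # "clean" signal x(n)
t = np.zeros(n)
t[:8] = rng.standard_normal(8)                    # short channel impulse response t(n)

X, T = np.fft.fft(x), np.fft.fft(t)
S = X * T                                         # circular convolution <=> spectral product, cf. (1.1)

Xc, Tc, Sc = log_cepstrum(X), log_cepstrum(T), log_cepstrum(S)
print(np.allclose(Sc, Xc + Tc))                   # True: Sc(K) = Xc(K) + Tc(K), cf. (1.2)
```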
The method of eliminating channel interference is described in detail below through embodiments.
Embodiment 1:
Referring to Fig. 1, a flowchart of the voice signal processing method described in this embodiment.
S101: in the log-cepstrum domain, perform cepstral coefficient extraction on the currently observed voice signal to obtain the log-cepstrum Sc(K) of the observed voice.
Cepstral coefficient extraction is a common procedure in speech recognition processing, and Mel-scale Frequency Cepstral Coefficients (MFCC) are one of the characteristic parameters most commonly used in speech recognition. MFCC simulate the auditory characteristics of the human ear and can reflect human perception of speech; they extract the speaker's personal characteristics from the speaker's voice signal and have achieved high recognition rates in practical speech recognition applications.
This embodiment can adopt the standard MFCC extraction algorithm: the time-domain signal is first converted to the frequency domain with an FFT (Fast Fourier Transform), its log energy spectrum is then convolved with a triangular filter bank distributed on the Mel scale, and finally a discrete cosine transform (DCT) is applied to the vector formed by the filter outputs and the first N coefficients are taken. Since this algorithm is well known, it is not described in detail here.
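A minimal sketch of this extraction chain (a common variant that applies the logarithm after the Mel filter bank; the frame length, filter count and N = 12 coefficients are assumptions rather than values fixed by the patent):

```python
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters whose centers are evenly spaced on the Mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    return fb

def mfcc_frame(frame, sr, n_filters=26, n_coeffs=12):
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2   # power spectrum
    energies = mel_filterbank(n_filters, len(frame), sr) @ spec       # Mel filter bank energies
    return dct(np.log(energies + 1e-10), type=2, norm='ortho')[:n_coeffs]

# Example: one 25 ms frame of a synthetic 8 kHz signal
sr = 8000
frame = np.sin(2 * np.pi * 440.0 * np.arange(int(0.025 * sr)) / sr)
Sc = mfcc_frame(frame, sr)   # observed log-cepstrum Sc(K), K = 1..12
```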
S102: estimate the transmission-channel log-cepstrum from the statistical mean of the log-cepstrum of voice signals that have not passed through the channel.
This step estimates the log-cepstrum Tc(K) of the transmission channel; the estimation method adopted by this embodiment is as follows:
In the first step, formula (1.3) is used to derive the statistical mean E[Tc(K)] of the transmission-channel log-cepstrum Tc(K), specifically:
Let E[X] denote the statistical mean of X, where X may stand for Xc(K), Sc(K) or Tc(K) in the formulas.
According to formula (1.3),
E[Xc(K)] = E[Sc(K)] - E[Tc(K)]
that is: E[Tc(K)] = E[Sc(K)] - E[Xc(K)] = E[Sc(K)] - RefCep(K)
= E[Sc(K) - RefCep(K)]   (1.4)
Here RefCep(K) denotes the statistical mean of the log-cepstrum of voice signals that have not passed through the channel, i.e. E[Xc(K)] = RefCep(K). RefCep(K) is obtained in advance through long-term statistics of the log-cepstrum feature vectors of voice signals under an (ideal) equalized channel, with K = 1..N, where N is typically 12. Since RefCep(K) is a constant, its mean is still a constant, i.e. E[RefCep(K)] = RefCep(K).
In the second step, a low-pass filtering method is adopted to approximate the mean E[Tc(K)], yielding the estimated value of the transmission-channel log-cepstrum.
In formula (1.4), E[Sc(K) - RefCep(K)] requires long-term data statistics before E[Tc(K)] can be obtained, so this embodiment estimates E[Tc(K)] by an approximation.
To satisfy the real-time requirement, this embodiment approximates the mean in formula (1.4) by low-pass filtering. Low-pass filtering lets low-frequency components pass while attenuating components above the cutoff frequency, i.e., it removes high-frequency disturbances, so that the sampling rate can be reduced while avoiding aliasing. There are many low-pass filtering methods, and no limitation is imposed here. This embodiment uses first-order IIR (infinite impulse response) low-pass filtering, obtaining
TranCep(K)_j = TranCep(K)_{j-1}·(1-α1) + (Sc(K) - RefCep(K))·α1   (1.5)
where TranCep(K) denotes the estimated value of the transmission-channel log-cepstrum, j is the frame index, and α1 is a smoothing factor.
The physical meaning of formula (1.5) is to extract the slowly varying part of the MFCC coefficients, which approaches the mean; the result TranCep(K) of formula (1.5) can therefore be used to approximate the mean E[Tc(K)].
As formula (1.5) shows, this embodiment can calculate an approximation of the mean of the transmission-channel log-cepstrum using only the current frame and the previous frame, and uses this approximate mean as the estimate of Tc(K), so the real-time requirement of speech recognition feature extraction can be met.
S103: subtract the estimated value TranCep(K) of the transmission-channel log-cepstrum from the log-cepstrum Sc(K) of the observed voice to obtain the log-cepstrum Xc(K) of the current voice signal that has not passed through the channel; Xc(K) is the result of separating the voice signal from the channel interference.
S101 above has calculated Sc(K) and S102 has calculated the estimate of Tc(K), so Xc(K) can be obtained according to formula (1.3). When the voice signal has passed through channel transmission, the log-cepstrum of the voice signal after the channel interference is eliminated is Xc(K).
In summary, the computation of the log-cepstrum Xc(K) of the voice signal that has not passed through the channel is as follows:
Xc(K) = Sc(K) - TranCep(K)   (1.6)
TranCep(K)_j = TranCep(K)_{j-1}·(1-α1) + (Sc(K) - RefCep(K))·α1   (1.5)
where the initial value of TranCep(K) is 0 and RefCep(K) is obtained by prior statistics.
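A per-frame sketch of (1.5) and (1.6), assuming Sc is the MFCC vector of the current frame; the smoothing factor value α1 = 0.02 and the all-zero placeholder for RefCep(K) are assumptions, not values given by the patent:

```python
import numpy as np

class ChannelCompensator:
    # Per-frame channel compensation following formulas (1.5) and (1.6).
    def __init__(self, ref_cep, alpha1=0.02):
        self.ref_cep = np.asarray(ref_cep, dtype=float)   # RefCep(K), from prior statistics
        self.tran_cep = np.zeros_like(self.ref_cep)       # TranCep(K), initial value 0
        self.alpha1 = alpha1                              # smoothing factor α1

    def process(self, sc):
        # (1.5): first-order IIR smoothing of (Sc - RefCep) approximates E[Tc(K)]
        self.tran_cep = (1 - self.alpha1) * self.tran_cep + self.alpha1 * (sc - self.ref_cep)
        # (1.6): subtract the channel estimate to obtain the equalized cepstrum Xc(K)
        return sc - self.tran_cep

# Usage: feed the observed MFCC vectors frame by frame
ref_cep = np.zeros(12)                # placeholder for the pre-computed RefCep(K)
comp = ChannelCompensator(ref_cep)
frames = np.random.randn(100, 12)     # stand-in for the observed Sc(K) of 100 frames
equalized = np.array([comp.process(sc) for sc in frames])
```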
The above method can eliminate the interference of the transmission channel with the voice signal and improve resistance to transmission-channel interference in the speech recognition feature extraction process, thereby improving the recognition rate.
Embodiment 2:
The method of Embodiment 1 above considers only the situation in which a voice signal is present on the transmission channel (i.e., a voice segment). For the situation in which no voice signal is present on the transmission channel (i.e., a non-speech segment), the transmission-channel log-cepstrum cannot be estimated with formula (1.5); the following formula should be used instead:
TranCep(K)_j = TranCep(K)_{j-1}·(1-α2) + Sc(K)·α2   (1.7)
where TranCep(K) denotes the estimated value of the transmission-channel log-cepstrum, j is the frame index, and α2 is also a smoothing factor but takes a different value from α1.
In the prior art, many anti-interference methods do not consider the handling of non-speech segments, for example blind equalization based on LMS (Least Mean Square) in the frequency domain and blind equalization based on LMS in the cepstrum domain. Both are blind equalization methods that use the LMS algorithm to minimize the error between the observed voice features and reference voice features, thereby obtaining convergent equalized speech feature parameters. The first method works in the spectral domain and the second in the cepstrum domain; LMS-based blind equalization in the cepstrum domain gives a smaller computational load and better convergence. In non-speech segments, however, blind equalization algorithms may converge incorrectly and thus affect the extraction of speech recognition features. To address this problem, this embodiment treats voice segments and non-speech segments differently, that is, it adopts different estimation equations for the transmission-channel log-cepstrum, so that the transmission-channel log-cepstrum of non-speech segments is estimated more accurately and resistance to channel interference is further improved.
Preferably, this embodiment can also use the signal-to-noise ratio (SNR) of the observed voice signal to combine the two formulas (1.5) and (1.7) that calculate TranCep(K) with α1 and α2 into:
TranCep(K)_j = TranCep(K)_{j-1}·(1-α3) + (Sc(K) - RefCep(K))·β1 + Sc(K)·β2   (1.8)
where α3 is also a smoothing factor; the relation between α3 and α1, α2 is: in voice segments α3 equals α1, and in non-speech segments α3 equals α2.
β1 + β2 = α3, with β1 and β2 determined by the signal-to-noise ratio. The signal-to-noise ratio is the ratio of the original signal component to the noise caused by the equipment itself, environmental interference and other factors; it is usually denoted "SNR" or "S/N", is generally expressed in decibels (dB), and the higher it is the better. β1 and β2 satisfy: when the SNR is high, β1 >> β2; when the SNR is low, β1 << β2, as detailed in the following table:
SNR (dB)   20        15        10        5         0         -5        -10
β1         100%·α3   90%·α3    80%·α3    70%·α3    50%·α3    20%·α3    0
β2         0         10%·α3    20%·α3    30%·α3    50%·α3    80%·α3    100%·α3
Table 1
In summary, the computation of the log-cepstrum Xc(K) of the voice signal that has not passed through the channel is:
Xc(K) = Sc(K) - TranCep(K)   (1.6)
TranCep(K)_j = TranCep(K)_{j-1}·(1-α3) + (Sc(K) - RefCep(K))·β1 + Sc(K)·β2   (1.8)
According to Table 1, if SNR ≥ 0 dB then α3 = α1, otherwise α3 = α2; that is, SNR = 0 dB is the critical point between voice segments and non-speech segments.
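A sketch of the SNR-dependent update (1.8) using the weights of Table 1; the linear interpolation between the listed SNR points and the example values α1 = 0.02, α2 = 0.05 are assumptions:

```python
import numpy as np

# Table 1: fraction of α3 assigned to β1 at each SNR (dB); β2 = α3 - β1
SNR_POINTS = np.array([-10.0, -5.0, 0.0, 5.0, 10.0, 15.0, 20.0])
BETA1_FRAC = np.array([0.0, 0.2, 0.5, 0.7, 0.8, 0.9, 1.0])

def channel_update(tran_cep, sc, ref_cep, snr_db, alpha1=0.02, alpha2=0.05):
    # α3 follows the voice/non-speech split at SNR = 0 dB
    alpha3 = alpha1 if snr_db >= 0.0 else alpha2
    beta1 = alpha3 * np.interp(snr_db, SNR_POINTS, BETA1_FRAC)  # interpolated between table points
    beta2 = alpha3 - beta1                                      # so that β1 + β2 = α3
    # (1.8): combined voice-segment / non-speech-segment estimate of TranCep(K)
    return (1 - alpha3) * tran_cep + beta1 * (sc - ref_cep) + beta2 * sc
```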
Embodiment 3:
In the above calculations, the statistical mean RefCep(K) of the log-cepstrum of voice signals that have not passed through the channel is a constant obtained by prior statistics and represents only a generic average. To bring this value closer to each speaker's personal characteristics, this embodiment updates RefCep(K) in the computation of every frame according to the actual speaker's characteristics, using the log-cepstrum Xc(K) of the current voice signal that has not passed through the channel, as follows:
RefCep(K)_{j+1} = RefCep(K)_j·(1-γ) + Xc(K)·γ   (1.9)
where γ is a small quantity and γ < α.
That is, for each speaker's voice signal, the initial value of RefCep(K) is the constant obtained by statistics; after Xc(K) of the current frame has been calculated, it is used to update RefCep(K) according to formula (1.9), and the updated RefCep(K) is used in the calculation of the next frame. In this way, the updated RefCep(K) differs from speaker to speaker and comes closer to each speaker's personal characteristics, which can improve the speech recognition rate.
In practical applications, for better results, the update can be performed when the estimated transmission-channel log-cepstrum TranCep(K) has converged and the signal-to-noise ratio SNR is relatively high.
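A sketch of the adaptive update (1.9), gated as suggested above by convergence of TranCep(K) and a sufficiently high SNR; the value of γ, the SNR threshold and the convergence test are assumptions:

```python
import numpy as np

def update_ref_cep(ref_cep, xc, tran_cep, prev_tran_cep, snr_db,
                   gamma=0.001, snr_min=10.0, conv_tol=1e-3):
    # Update only when the channel estimate has converged and the SNR is high enough.
    converged = np.max(np.abs(tran_cep - prev_tran_cep)) < conv_tol
    if converged and snr_db >= snr_min:
        # (1.9): RefCep(K)_{j+1} = RefCep(K)_j·(1-γ) + Xc(K)·γ, with γ a small quantity
        ref_cep = (1 - gamma) * ref_cep + gamma * xc
    return ref_cep
```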
Based on the three embodiments above, the procedure for extracting speech recognition features with the present invention is shown in Fig. 2.
S201: perform voice enhancement on the observed voice signal s(n) to obtain the enhanced voice signal s'(n).
This step is a pre-processing step. The purpose of voice enhancement is to extract as pure an original voice as possible from the noisy voice signal; many enhancement algorithms are in common use, such as spectral subtraction or Wiener filtering, which this embodiment does not elaborate on.
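As an illustration of one such enhancement step, a minimal single-frame magnitude spectral-subtraction sketch (the over-subtraction factor, the spectral floor and the way the noise magnitude estimate is obtained are assumptions, not details given by the patent):

```python
import numpy as np

def spectral_subtraction(frame, noise_mag, over_sub=1.0, floor=0.02):
    # noise_mag: estimated noise magnitude spectrum, same length as np.fft.rfft(frame)
    spec = np.fft.rfft(frame * np.hanning(len(frame)))
    mag, phase = np.abs(spec), np.angle(spec)
    clean_mag = np.maximum(mag - over_sub * noise_mag, floor * mag)   # subtract noise, keep a spectral floor
    return np.fft.irfft(clean_mag * np.exp(1j * phase), n=len(frame))
```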
S202: perform MFCC coefficient extraction on the enhanced voice signal s'(n) to obtain the log-cepstrum Sc(K) of the observed voice.
S203: use the long-term mean RefCep(K) of the voice log-cepstrum features and the signal-to-noise ratio of the observed signal to eliminate the channel interference and obtain the equalized cepstral features of the voice.
RefCep(K) here is the statistical mean of the log-cepstrum of voice signals that have not passed through the channel referred to above, and the equalized cepstral features of the voice are the extracted speech recognition features, which are used in the subsequent pattern matching and recognition process.
Based on the above, a test example is given below to compare and illustrate the performance of the channel interference elimination method of the present invention. The test uses the HTK toolkit as the speech recognition tool, with the standard MFCC coefficients and their first- and second-order derivatives as characteristic parameters. The test sequences are divided into three groups A, B and C; each group contains 50 digit strings of 8 digits each, i.e., 400 digits per group. Group A is collected over the same channel as the training data, group B is collected over a channel different from the training data with a relatively high signal-to-noise ratio, and group C is collected over a channel different from the training data with a lower signal-to-noise ratio.
The following five test cases were considered:
1. no anti-interference method is used;
2. the existing LMS blind equalization algorithm is adopted;
3. the example of the invention constituted by formulas (1.5) and (1.6) is adopted;
4. the example of the invention constituted by formulas (1.6) and (1.8) is adopted;
5. the example of the invention constituted by formula (1.9) is adopted.
Speech recognition tests on the three groups of sequences A, B and C were carried out for the five cases above. The recognition results are shown in the table below (note: the error-rate reduction is relative to test 1):
Table 2 (recognition results; reproduced only as an image in the original publication)
As can be seen from the table, the anti-interference method provided by the invention yields a clear improvement on test sequences collected over channels different from the training data, and compared with the existing method the error rate is further reduced.
In view of the above method embodiments, the present invention also provides corresponding device embodiments.
Referring to Fig. 3, a structural diagram of a voice signal processing device according to an embodiment. The device mainly comprises:
a cepstral coefficient extraction unit U31, configured to perform, in the log-cepstrum domain, cepstral coefficient extraction on the currently observed voice signal to obtain the log-cepstrum of the observed voice;
a channel log-cepstrum estimation unit U32, configured to estimate the transmission-channel log-cepstrum from the statistical mean of the log-cepstrum of voice signals that have not passed through the channel;
an interference separation unit U33, configured to subtract the estimated value of the transmission-channel log-cepstrum from the log-cepstrum of the observed voice to obtain the log-cepstrum of the current voice signal that has not passed through the channel; the log-cepstrum of the voice signal that has not passed through the channel is the result of separating the voice signal from the channel interference.
Referring to Fig. 4, the channel log-cepstrum estimation unit U32 may further comprise:
a mean calculation subunit U321, configured to calculate E[Tc(K)] = E[Sc(K) - RefCep(K)];
wherein Tc(K) denotes the log-cepstrum of the transmission channel, Sc(K) denotes the log-cepstrum of the observed voice, E[X] denotes the statistical mean of X, and RefCep(K) denotes the statistical mean of the log-cepstrum of voice signals that have not passed through the channel;
a first estimation subunit U322, configured to approximate E[Tc(K)] by low-pass filtering the above formula when a voice signal is present on the transmission channel, obtaining
TranCep(K)_j = TranCep(K)_{j-1}·(1-α1) + (Sc(K) - RefCep(K))·α1
wherein TranCep(K) denotes the estimated value of the transmission-channel log-cepstrum, j is the frame index, and α1 is a smoothing factor.
Preferably, referring to Fig. 5, the channel log-cepstrum estimation unit U32 may further comprise:
a second estimation subunit U323, configured to approximate E[Tc(K)] by low-pass filtering the above formula when no voice signal is present on the transmission channel, obtaining TranCep(K)_j = TranCep(K)_{j-1}·(1-α2) + Sc(K)·α2; wherein α1 and α2 take different values.
Preferably, the channel log-cepstrum estimation unit U32 may further comprise:
a combined estimation subunit, configured to use the signal-to-noise ratio of the observed voice signal to combine the two formulas that calculate TranCep(K) with α1 and α2 into:
TranCep(K)_j = TranCep(K)_{j-1}·(1-α3) + (Sc(K) - RefCep(K))·β1 + Sc(K)·β2
wherein β1 + β2 = α3, and β1 and β2 are determined according to the signal-to-noise ratio.
Preferably, referring to Fig. 6, the device may further comprise:
an updating unit U34, configured to update, according to the formula RefCep(K)_{j+1} = RefCep(K)_j·(1-γ) + Xc(K)·γ and using the log-cepstrum Xc(K) of the current voice signal that has not passed through the channel, the statistical mean RefCep(K) of the log-cepstrum of voice signals that have not passed through the channel; wherein γ is a small quantity and γ < α.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another. Since the device embodiments are substantially similar to the method embodiments, their description is relatively brief, and for the relevant details reference may be made to the description of the method embodiments.
The method and device provided by the present invention for eliminating the influence of the transmission channel on voice signals have been described in detail above. Specific examples have been used herein to explain the principle and implementation of the invention, and the description of the above embodiments is only intended to help understand the method of the invention and its core idea. Meanwhile, those of ordinary skill in the art may make changes to the specific implementation and the scope of application in accordance with the idea of the invention. In summary, the content of this description should not be construed as limiting the invention.

Claims (10)

1. A voice signal processing method, characterized by comprising:
in the log-cepstrum domain, performing cepstral coefficient extraction on the currently observed voice signal to obtain the log-cepstrum of the observed voice;
estimating the transmission-channel log-cepstrum from the statistical mean of the log-cepstrum of voice signals that have not passed through the channel;
subtracting the estimated value of the transmission-channel log-cepstrum from the log-cepstrum of the observed voice to obtain the log-cepstrum of the current voice signal that has not passed through the channel; the log-cepstrum of the voice signal that has not passed through the channel being the result of separating the voice signal from the channel interference.
2. The method according to claim 1, characterized in that estimating the transmission-channel log-cepstrum from the statistical mean of the log-cepstrum of voice signals that have not passed through the channel specifically comprises:
calculating E[Tc(K)] = E[Sc(K) - RefCep(K)];
wherein Tc(K) denotes the log-cepstrum of the transmission channel, Sc(K) denotes the log-cepstrum of the observed voice, E[X] denotes the statistical mean of X, and RefCep(K) denotes the statistical mean of the log-cepstrum of voice signals that have not passed through the channel;
when a voice signal is present on the transmission channel, approximating E[Tc(K)] by low-pass filtering the above formula, obtaining TranCep(K)_j = TranCep(K)_{j-1}·(1-α1) + (Sc(K) - RefCep(K))·α1;
wherein TranCep(K) denotes the estimated value of the transmission-channel log-cepstrum, j is the frame index, and α1 is a smoothing factor.
3. The method according to claim 2, characterized by further comprising:
when no voice signal is present on the transmission channel, approximating E[Tc(K)] by low-pass filtering the above formula, obtaining TranCep(K)_j = TranCep(K)_{j-1}·(1-α2) + Sc(K)·α2;
wherein α1 and α2 take different values.
4. The method according to claim 3, characterized by further comprising:
using the signal-to-noise ratio of the observed voice signal, combining the two formulas that calculate TranCep(K) with α1 and α2 into:
TranCep(K)_j = TranCep(K)_{j-1}·(1-α3) + (Sc(K) - RefCep(K))·β1 + Sc(K)·β2
wherein β1 + β2 = α3, and β1 and β2 are determined according to the signal-to-noise ratio.
5. The method according to any one of claims 2 to 4, characterized by further comprising:
updating, according to the formula RefCep(K)_{j+1} = RefCep(K)_j·(1-γ) + Xc(K)·γ and using the log-cepstrum Xc(K) of the current voice signal that has not passed through the channel, the statistical mean RefCep(K) of the log-cepstrum of voice signals that have not passed through the channel;
wherein γ is a small quantity and γ < α.
6. A voice signal processing device, characterized by comprising:
a cepstral coefficient extraction unit, configured to perform, in the log-cepstrum domain, cepstral coefficient extraction on the currently observed voice signal to obtain the log-cepstrum of the observed voice;
a channel log-cepstrum estimation unit, configured to estimate the transmission-channel log-cepstrum from the statistical mean of the log-cepstrum of voice signals that have not passed through the channel;
an interference separation unit, configured to subtract the estimated value of the transmission-channel log-cepstrum from the log-cepstrum of the observed voice to obtain the log-cepstrum of the current voice signal that has not passed through the channel; the log-cepstrum of the voice signal that has not passed through the channel being the result of separating the voice signal from the channel interference.
7. The device according to claim 6, characterized in that the channel log-cepstrum estimation unit comprises:
a mean calculation subunit, configured to calculate E[Tc(K)] = E[Sc(K) - RefCep(K)];
wherein Tc(K) denotes the log-cepstrum of the transmission channel, Sc(K) denotes the log-cepstrum of the observed voice, E[X] denotes the statistical mean of X, and RefCep(K) denotes the statistical mean of the log-cepstrum of voice signals that have not passed through the channel;
a first estimation subunit, configured to approximate E[Tc(K)] by low-pass filtering the above formula when a voice signal is present on the transmission channel, obtaining
TranCep(K)_j = TranCep(K)_{j-1}·(1-α1) + (Sc(K) - RefCep(K))·α1
wherein TranCep(K) denotes the estimated value of the transmission-channel log-cepstrum, j is the frame index, and α1 is a smoothing factor.
8. The device according to claim 7, characterized in that the channel log-cepstrum estimation unit further comprises:
a second estimation subunit, configured to approximate E[Tc(K)] by low-pass filtering the above formula when no voice signal is present on the transmission channel, obtaining TranCep(K)_j = TranCep(K)_{j-1}·(1-α2) + Sc(K)·α2; wherein α1 and α2 take different values.
9. The device according to claim 8, characterized in that the channel log-cepstrum estimation unit further comprises:
a combined estimation subunit, configured to use the signal-to-noise ratio of the observed voice signal to combine the two formulas that calculate TranCep(K) with α1 and α2 into:
TranCep(K)_j = TranCep(K)_{j-1}·(1-α3) + (Sc(K) - RefCep(K))·β1 + Sc(K)·β2
wherein β1 + β2 = α3, and β1 and β2 are determined according to the signal-to-noise ratio.
10. The device according to any one of claims 7 to 9, characterized in that the device further comprises:
an updating unit, configured to update, according to the formula RefCep(K)_{j+1} = RefCep(K)_j·(1-γ) + Xc(K)·γ and using the log-cepstrum Xc(K) of the current voice signal that has not passed through the channel, the statistical mean RefCep(K) of the log-cepstrum of voice signals that have not passed through the channel; wherein γ is a small quantity and γ < α.
CN2009100783316A 2009-02-25 2009-02-25 Method for processing voice signal and device Active CN101533642B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2009100783316A CN101533642B (en) 2009-02-25 2009-02-25 Method for processing voice signal and device

Publications (2)

Publication Number Publication Date
CN101533642A true CN101533642A (en) 2009-09-16
CN101533642B CN101533642B (en) 2013-02-13

Family

ID=41104196

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2009100783316A Active CN101533642B (en) 2009-02-25 2009-02-25 Method for processing voice signal and device

Country Status (1)

Country Link
CN (1) CN101533642B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102947883A (en) * 2010-01-29 2013-02-27 循环逻辑有限责任公司 Method and apparatus for canonical nonlinear analysis of audio signals
CN108848044A (en) * 2018-06-25 2018-11-20 电子科技大学 A kind of extracting method of channel fine feature
CN109599118A (en) * 2019-01-24 2019-04-09 宁波大学 A kind of voice playback detection method of robustness
CN113077787A (en) * 2020-12-22 2021-07-06 珠海市杰理科技股份有限公司 Voice data identification method, device, chip and readable storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1162838C (en) * 2002-07-12 2004-08-18 清华大学 Speech intensifying-characteristic weighing-logrithmic spectrum addition method for anti-noise speech recognization
US7146315B2 (en) * 2002-08-30 2006-12-05 Siemens Corporate Research, Inc. Multichannel voice detection in adverse environments
CN1182513C (en) * 2003-02-21 2004-12-29 清华大学 Antinoise voice recognition method based on weighted local energy

Also Published As

Publication number Publication date
CN101533642B (en) 2013-02-13

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20171221

Address after: 100083 Haidian District, Xueyuan Road, No. 35, the world building, the second floor of the building on the ground floor, No. 16

Patentee after: Zhongxing Technology Co., Ltd.

Address before: 100083 Haidian District, Xueyuan Road, No. 35, the world building, the second floor of the building on the ground floor, No. 16

Patentee before: Beijing Vimicro Corporation

CP01 Change in the name or title of a patent holder
CP01 Change in the name or title of a patent holder

Address after: 100083 Haidian District, Xueyuan Road, No. 35, the world building, the second floor of the building on the ground floor, No. 16

Patentee after: Mid Star Technology Limited by Share Ltd

Address before: 100083 Haidian District, Xueyuan Road, No. 35, the world building, the second floor of the building on the ground floor, No. 16

Patentee before: Zhongxing Technology Co., Ltd.