CN1299254C

CN1299254C - Method for implementing near-end speech detection in an echo suppressor

Info

Publication number: CN1299254C
Application number: CNB2004100091628A
Authority: CN
Inventors: 肖志方; 杜军; 王侃
Original assignee: ZTE Corp
Current assignee: ZTE Corp
Priority date: 2004-05-31
Filing date: 2004-05-31
Publication date: 2007-02-07
Anticipated expiration: 2024-05-31
Also published as: CN1584977A

Abstract

The present invention relates to a method for implementing near-end speech detection in an echo suppressor, in which near-end speech is detected with the Geigel algorithm. M sampling points of the far-end input sequence y(i) are taken as one subframe, where M is smaller than the length N of the filter in the echo suppressor. When the first sampling point of each subframe is processed, the maximum of the newest N-M+1 sampling points in the far-end sliding window, |y(i)| to |y(i-N+M)|, is computed first and saved as Frame_MAX. The M sampling points of the subframe are then processed one by one: for each sampling point, the maximum of Frame_MAX and the remaining M-1 sampling points in the current sliding window is taken as that point's maximum MAX. If the near-end input s(i) of a sampling point in the subframe satisfies |s(i)| ≥ u*MAX, the near end has speech. The method produces results identical to those of the existing method while greatly reducing the required computation, and thereby reduces cost.

Description

Method for implementing near-end speech detection in an echo suppressor

Technical field

The present invention relates to a method for implementing near-end speech detection in an echo suppressor of a communication device.

Background technology

Echo suppression is a technique widely used in the communications field. Its purpose is to eliminate echo caused by line impedance mismatch or by acoustic reflection, so as to provide both parties to a call with higher-quality voice communication.

The most widely used echo suppressors today consist of three parts: a double-talk (duplex) detector, an echo estimation canceller, and a nonlinear processor. The double-talk detector indicates whether the echo estimation canceller should update its filter coefficients and whether the nonlinear processor should be enabled. The core of the echo estimation canceller is an adaptive filter. This filter estimates the echo from the far-end input speech and subtracts the estimate from the near-end input to attenuate or eliminate the echo. At the same time, it updates the filter coefficients from the estimation error, that is, the difference between the near-end input and the estimate, according to a convergence algorithm, so that the filter converges and its coefficients approach the parameters of the echo path. The nonlinear processor removes residual echo by nonlinear processing, further improving voice quality.

The most important task of the double-talk detector is to detect whether there is speech input at the near end. If near-end speech is detected, coefficient updating of the adaptive filter is stopped immediately and the nonlinear processor is disabled. At that moment the difference between the near-end input and the filter's echo estimate is no longer the estimation error but the sum of the estimation error and the near-end speech. Updating the filter coefficients then would put the filter into an abnormal operating state, and with some convergence algorithms could even cause the filter to diverge. Continuing nonlinear processing would clip the near-end speech and degrade voice quality. Near-end speech detection therefore plays an important role in an echo suppressor. To support more echo suppression channels or to integrate more functions on the same hardware platform, an efficient and stable near-end speech detection method is needed.

The classic algorithm for near-end speech detection is the Geigel algorithm. It assumes that the echo is attenuated by at least 6 dB relative to the speech, that is, the echo amplitude is at most half the far-end speech amplitude. Let the filter length be N, the current near-end sample be s(i), and the far-end input sequence be y(i) to y(i-N+1). If the inequality |s(i)| ≥ 0.5 * max(|y(i)|, |y(i-1)|, ..., |y(i-N+1)|) holds, the near end must have speech input.
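The decision rule above can be sketched directly in Python. This is a minimal illustration, not the patent's implementation: the function name `geigel_naive` and its parameters are hypothetical, and far-end samples before the start of the sequence are assumed to be zero.

```python
def geigel_naive(near, far, n, u=0.5):
    """Return one near-end-speech flag per sample, scanning the full window each time."""
    flags = []
    for i in range(len(near)):
        # Window |y(i)| ... |y(i-N+1)|; samples before the start are left out,
        # which is equivalent to treating them as zero (an assumption of this sketch).
        window = [abs(far[k]) for k in range(max(0, i - n + 1), i + 1)]
        # Geigel test: |s(i)| >= u * max over the far-end window.
        flags.append(abs(near[i]) >= u * max(window))
    return flags
```

Each call to `max` touches up to N points, which is the per-sample cost that the rest of the document sets out to reduce.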

As the above inequality shows, if the decision is made directly from this condition, as in method 1 shown in Fig. 1, the maximum of the N-point far-end input sequence must be computed for every near-end input sample, so the computation is proportional to the filter order N. Take a 16 ms echo suppressor in a telephone system sampled at 8 kHz as an example. On a general-purpose processor or DSP, taking the maximum of N points needs at least 2N instruction cycles, so the computation required per second is 16*8*2*8000 = 2,048,000 instruction cycles, about 2 MIPS. This load is close to, or even greater than, that of the core filter operation, so computing it directly in this way on a general-purpose processor or DSP is very uneconomical.
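The arithmetic behind the 2 MIPS figure can be checked with a few lines of Python. The variable names are illustrative only; the cost of two instruction cycles per window point is the figure assumed in the text.

```python
SAMPLE_RATE_HZ = 8000      # telephone-band sampling rate
TAIL_MS = 16               # echo tail covered by the filter
CYCLES_PER_POINT = 2       # assumed cost of one two-point maximum

# Filter length: 8 samples per millisecond times 16 ms.
N = SAMPLE_RATE_HZ // 1000 * TAIL_MS

# Method 1 scans all N points for every one of the 8000 samples per second.
cycles_per_second = N * CYCLES_PER_POINT * SAMPLE_RATE_HZ

print(N, cycles_per_second)  # 128 2048000
```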

An improved implementation is method 2, shown in Fig. 2. This method computes the maximum of the far-end input sequence at initialization. In subsequent processing, the maximum for the current frame is simply the larger of |y(i)| and the previous frame's maximum. After this computation, however, the method must check whether |y(i-N+1)| is the current frame's maximum. If it is, the value that will be shifted out of the sliding window next is the maximum, and the maximum of the remaining N-1 points must be recomputed for use in the next frame. As this principle shows, the method reduces the computation significantly for normal speech. In extreme cases, however, for example when the far-end input is silent for a period of time so that the absolute values of all far-end samples are equal, the computation is the same as for method 1. In other words, the method lowers the average computation to an indeterminate level but leaves the peak unchanged. In real-time processing such as echo suppression, the processing of one sample or data frame must finish before the next sample or data frame arrives, so the developer must allocate computing resources according to the peak processing load. Because the peak load of method 2 is the same as that of method 1, it offers no practical improvement over method 1. Moreover, the average computation of this method varies with the far-end input signal, and the actual peak load cannot be obtained by testing, which makes performance budgeting difficult for the developer.
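The data-dependent cost of method 2 can be illustrated with a small counting sketch. It is an interpretation of the description above, not code from the patent: the function name `method2_costs` is hypothetical, the window is assumed to start filled with zeros, and a rescan of N-1 points is counted as N-2 two-point maximum operations.

```python
def method2_costs(far, n):
    """Count two-point maximum operations per sample for the running-maximum scheme."""
    window = [0] * n   # window[0] is the newest sample; zero history is assumed
    carried = 0        # maximum of the n-1 samples that stay in the window
    costs = []
    for v in far:
        window = [abs(v)] + window[:-1]
        cur_max = max(abs(v), carried)   # one two-point maximum
        ops = 1
        if window[-1] == cur_max:
            # The sample about to leave the window is the maximum:
            # rescan the n-1 samples that remain for the next step.
            carried = max(window[:-1])
            ops += n - 2
        else:
            carried = cur_max
        costs.append(ops)
    return costs
```

With a rising input such as `[1, 2, 3, 4, 5]` and `n = 4`, every sample costs one operation, whereas an all-zero (silent) input triggers the rescan on every sample, which is the peak-load case described above.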

Summary of the invention

The technical problem to be solved by the present invention is to provide a low-computation solution for near-end speech detection, so that more echo suppression channels can be implemented on the same computing platform, or more processor resources are left for other modules, thereby reducing cost. In addition, the computation of the module can be calculated accurately at the development stage, which facilitates performance budgeting and allocation of computing resources and contributes to the stability and compatibility of the product.

To achieve the above object, the present invention provides a method for implementing near-end speech detection in an echo suppressor, in which near-end speech is detected with the Geigel algorithm, characterized in that: M sampling points of the far-end input sequence y(i) are taken as one subframe, M being smaller than the filter length N of the echo suppressor; when the first sampling point of each subframe is processed, the maximum of the newest N-M+1 sampling points in the far-end sliding window, |y(i)| to |y(i-N+M)|, is obtained first and saved as Frame_MAX; the M sampling points of the subframe are then processed one by one, the maximum of Frame_MAX and the remaining M-1 sampling points in the current sliding window being taken as the maximum MAX for each sampling point of the subframe; and if the near-end input s(i) of a sampling point of the subframe satisfies |s(i)| ≥ u*MAX, the near end has speech.

In the above method for implementing near-end speech detection in an echo suppressor, near-end speech detection for a subframe of the far-end input sequence y(i) comprises the following steps:

Step 1, initialize a subframe counter to 0;

Step 2, take the absolute value |s(i)| of the near-end input data and the absolute value |y(i)| of the far-end input data, and save |y(i)| in the far-end data buffer, which holds the sequence of N sampling points from |y(i)| to |y(i-N+1)|;

Step 3, if the subframe counter is 0, the current far-end sampling point is the first sampling point of the subframe: compute the maximum of the newest N-M+1 sampling points in the far-end sliding window, |y(i)| to |y(i-N+M)|, and save it as Frame_MAX; if the subframe counter is not 0, proceed directly to step 4;

Step 4, compute the maximum MAX of Frame_MAX and the remaining M-1 sampling points in the far-end data buffer;

Step 5, compare the product u*MAX of the attenuation factor u and MAX with the near-end absolute value |s(i)|; if |s(i)| ≥ u*MAX, the near end has speech input;

Step 6, increment the subframe counter by 1;

Step 7, if the subframe counter equals M, reset it to 0;

Step 8, return to step 2 and process the next sampling point of the subframe.

In the above method, when the method is used in an application that itself has a speech frame length, M should also be a divisor of the speech frame length.

In the above method, for a given system there are one or two optimal values of M, depending on the design requirements; with an optimal M, the number of processor instruction cycles consumed is minimal.

In the above method, the optimal value of M is obtained by actual testing.

In the above method, the computation is the same for every subframe, but differs among the sampling points within a subframe.

In the above method, given that echo on a subscriber line is attenuated by at least 6 dB relative to the speech, the echo amplitude attenuation factor u is set to 0.5.

Compared with the prior art, the proposed method significantly reduces the computation required while producing results identical to those of the existing method. More echo suppression channels can therefore be implemented on the same processing platform, or sufficient processing capacity is left for additional functions, which reduces cost. Furthermore, the maximum and the average computation of the method are the same: for given parameters the computation is a fixed value. This facilitates performance budgeting and allocation of computing resources and contributes to the stability and compatibility of the product.

The present invention is described below with reference to the drawings and specific embodiments, which are not intended to limit the invention.

Description of drawings

Fig. 1 is a flowchart of an existing method for implementing near-end speech detection in echo suppression;

Fig. 2 is a flowchart of another existing method for implementing near-end speech detection in echo suppression;

Fig. 3 is a flowchart of the method of the present invention.

Embodiment

The principle of the method provided by the present invention for implementing near-end speech detection in an echo suppressor in the communications field is as follows:

(1) Near-end speech detection uses the Geigel algorithm. Let the filter length be N, the near-end input sequence be s(i), the far-end input sequence be y(i), and the echo amplitude attenuation factor be u. Near-end speech is present when the following inequality holds:

|s(i)| ≥ u * max(|y(i)|, |y(i-1)|, ..., |y(i-N+1)|)

(2) M sampling points are taken as one subframe, with M smaller than the filter length N. If the invention is used in an application that itself has a speech frame length, such as VoIP, M should also be a divisor of the frame length. For example, in a G.729 application the frame length is 10 ms, corresponding to 80 sampling points at 8000 Hz, so M should divide 80 exactly. The invention guarantees that the computation for every subframe is identical.

(3) When the first input of each subframe is processed, the maximum of the N-M+1 points |y(i)| to |y(i-N+M)| is computed first and saved as Frame_MAX. The maximum of this value and the remaining M-1 points in the sliding window is then taken as the maximum for this sampling point.

(4) When the remaining M-1 points of the subframe are processed one by one, the N-M+1 points used to compute Frame_MAX are still in the sliding window (they were covered when the first point was processed, so the computation need not be repeated). It is therefore only necessary to compute the maximum of Frame_MAX and the M-1 other points in the sliding window at that moment, which is equivalent to taking the maximum of M points.
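Points (3) and (4) can be written out as a single identity. The notation is introduced here for illustration and does not appear in the original: i_0 is the index of the first sample of the subframe and k (0 ≤ k ≤ M-1) is the position of the current sample within the subframe. The full-window maximum at sample i_0 + k splits into Frame_MAX, the k samples that arrived after i_0, and the M-1-k oldest samples still in the window.

```latex
\begin{aligned}
\mathrm{Frame\_MAX} &= \max_{0 \le j \le N-M} \lvert y(i_0 - j) \rvert, \\
\max_{0 \le j \le N-1} \lvert y(i_0 + k - j) \rvert
 &= \max\Bigl( \mathrm{Frame\_MAX},\;
      \max_{1 \le j \le k} \lvert y(i_0 + j) \rvert,\;
      \max_{N-M+1 \le j \le N-1-k} \lvert y(i_0 - j) \rvert \Bigr).
\end{aligned}
```

The second and third groups together contain k + (M-1-k) = M-1 samples, which is why each sample after the first needs only an M-point maximum, and why the result equals that of the direct method.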

Referring to Fig. 3, the method is implemented by the following steps:

Step 1: Initialize the subframe counter COUNT to 0.

Step 2: Take the absolute value |s(i)| of the near-end input data. Take the absolute value |y(i)| of the far-end input data and save it in the buffer BUFFER, which holds the N-point sequence from |y(i)| to |y(i-N+1)|.

Step 3: Determine whether the subframe counter is 0. If COUNT is 0, go to step 4; if COUNT is not 0, go directly to step 5.

Step 4: COUNT being 0 indicates that the current far-end sample is the first sample of the subframe. Compute the maximum of the newest N-M+1 points in the far-end sliding window, |y(i)| to |y(i-N+M)|, and save it as Frame_MAX.

Step 5: Compute the maximum of Frame_MAX and the remaining M-1 points in the far-end data buffer BUFFER; this value is MAX.

Step 6: Compare the product u*MAX of the attenuation factor u and MAX with the near-end absolute value |s(i)|.

Step 7: If the inequality |s(i)| ≥ u*MAX holds, the near end has speech input (given that echo on a subscriber line is attenuated by at least 6 dB relative to the speech, u can be set to 0.5). If it does not hold, go directly to step 8.

Step 8: Increment the subframe counter by 1.

Step 9: Determine whether the subframe counter equals M. If it does not, return to step 2 and process the next sample; if it does, go to step 10.

Step 10: Reset the subframe counter to 0 and return to step 2 to process the next sample.
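The ten steps above can be sketched as a small Python class. This is one possible reading of the flowchart, not the patent's own code: the class name `SubframeGeigelDetector`, the method name `process`, and the zero-filled initial buffer are assumptions made for illustration.

```python
class SubframeGeigelDetector:
    """Near-end speech detector following steps 1 to 10; names are illustrative."""

    def __init__(self, n, m, u=0.5):
        if not 0 < m < n:
            raise ValueError("subframe length M must satisfy 0 < M < N")
        self.n, self.m, self.u = n, m, u
        self.buffer = [0] * n   # buffer[0] = |y(i)|, buffer[n-1] = |y(i-N+1)|
        self.count = 0          # Step 1: subframe counter
        self.frame_max = 0

    def process(self, s, y):
        """Consume one near-end/far-end sample pair; return True if speech is flagged."""
        # Step 2: store |y(i)| as the newest entry and drop the oldest one.
        self.buffer = [abs(y)] + self.buffer[:-1]
        k = self.count
        if k == 0:
            # Steps 3-4: maximum of the newest N-M+1 points, |y(i)| ... |y(i-N+M)|.
            self.frame_max = max(self.buffer[: self.n - self.m + 1])
        # Step 5: Frame_MAX now covers buffer[k : k+N-M+1]; the other M-1 points
        # are the k newest entries and the M-1-k oldest entries.
        rest = self.buffer[:k] + self.buffer[k + self.n - self.m + 1:]
        cur_max = max([self.frame_max] + rest)
        # Steps 6-7: Geigel decision |s(i)| >= u * MAX.
        speech = abs(s) >= self.u * cur_max
        # Steps 8-10: advance the counter and wrap it at M.
        self.count = (k + 1) % self.m
        return speech
```

A caller would create one detector per channel, for example `SubframeGeigelDetector(128, 16)`, and call `process(s, y)` once per sample. The list shift in step 2 costs N operations by itself and is used here only for readability; a real-time implementation would more likely use a circular buffer so that only the maximum computation contributes to the per-sample cost.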

With the above steps, processing a whole subframe requires N+M*(M-1) two-point maximum operations in total, rather than N*M. The computation is thus reduced, the computation for every subframe is equal, and it can be calculated exactly at the design stage.

Consider a 16 ms echo suppressor, and assume that one two-point maximum operation takes two instruction cycles on the processor. For a system sampled at 8000 Hz, the filter length is N = (8000/1000)*16 = 128. With M = 16, the method of Fig. 1 consumes N*M*2 = 128*16*2 = 4096 instruction cycles on maximum computation for near-end speech detection over one M-point subframe, whereas the method of the present invention consumes (N+M*(M-1))*2 = (128+16*(16-1))*2 = 736 instruction cycles. This saves more than 80% of the processor computation relative to the original method.
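The two cycle counts and the resulting saving can be reproduced as follows. The function names are hypothetical; the formulas and the cost of two cycles per two-point maximum are those given in the text.

```python
def cycles_direct(n, m, cycles_per_max=2):
    """Cycles per subframe when every sample scans all N points (Fig. 1 method)."""
    return n * m * cycles_per_max


def cycles_subframe(n, m, cycles_per_max=2):
    """Cycles per subframe for the subframe method: N + M*(M-1) maximum operations."""
    return (n + m * (m - 1)) * cycles_per_max


N, M = 128, 16
direct = cycles_direct(N, M)
proposed = cycles_subframe(N, M)
saving = 1 - proposed / direct

print(direct, proposed, round(saving * 100, 1))  # 4096 736 82.0
```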

For a given system there are one or two optimal values of M, depending on the design requirements, at which the number of processor instruction cycles consumed is minimal, that is, at which (N+M*(M-1))/M is smallest among all values of M that satisfy the system requirements. The optimal M can also be obtained by actual testing.
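A brute-force search makes the "one or two optimal values" statement concrete. The sketch below is illustrative: the function name `optimal_subframe_lengths` is hypothetical, and the only system requirements it models are M < N and, optionally, that M divides a speech frame length.

```python
from fractions import Fraction


def optimal_subframe_lengths(n, frame_len=None):
    """Return every M in 1..N-1 that minimises (N + M*(M-1)) / M.

    If frame_len is given, only divisors of frame_len are considered.
    """
    candidates = [m for m in range(1, n)
                  if frame_len is None or frame_len % m == 0]
    # Exact rational arithmetic so that ties are detected reliably.
    cost = {m: Fraction(n + m * (m - 1), m) for m in candidates}
    best = min(cost.values())
    return [m for m in candidates if cost[m] == best]


print(optimal_subframe_lengths(128, 80))  # [10]
print(optimal_subframe_lengths(128))      # [11]
print(optimal_subframe_lengths(12))       # [3, 4]
```

For N = 128 with an 80-sample G.729 frame, the per-sample cost is lowest at M = 10. The last call shows a case where two values of M tie. On real hardware, loop overhead and memory access may shift the optimum, which is presumably why the text also suggests determining M by testing.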

In summary, the features of the present invention are as follows:

(1) The invention uses the concept of a subframe and finishes processing one subframe before the data of the next subframe arrive. The subframe length M is smaller than the filter order N. In applications that have a speech frame, the number of samples per speech frame, FRAME, should be exactly divisible by the subframe length M.

(2) The invention guarantees that the computation is the same for every subframe, although it differs among the sampling points within a subframe.

(3) For each subframe, the invention always first computes the maximum Frame_MAX of the newest N-M+1 points; for the subsequent sampling points of the subframe, only an M-point maximum needs to be computed.

Of course, the present invention may have various other embodiments. Those of ordinary skill in the art may make corresponding changes and modifications according to the invention without departing from its spirit and essence, and all such changes and modifications shall fall within the scope of protection of the appended claims.

Claims

1. A method for implementing near-end speech detection in an echo suppressor, in which near-end speech is detected with the Geigel algorithm, characterized in that: M sampling points of the far-end input sequence y(i) are taken as one subframe, M being smaller than the filter length N of the echo suppressor; when the first sampling point of each subframe is processed, the maximum of the newest N-M+1 sampling points in the far-end sliding window, |y(i)| to |y(i-N+M)|, is obtained first and saved as Frame_MAX; the M sampling points of the subframe are then processed one by one, the maximum of Frame_MAX and the remaining M-1 sampling points in the current sliding window being taken as the maximum MAX for each sampling point of the subframe; and if the near-end input s(i) of a sampling point of the subframe satisfies |s(i)| ≥ u*MAX, the near end has speech, where u is the echo amplitude attenuation factor.

2. The method for implementing near-end speech detection in an echo suppressor according to claim 1, characterized in that near-end speech detection for a subframe of the far-end input sequence y(i) comprises the following steps:

Step 1, initialize a subframe counter to 0;

Step 2, take the absolute value |s(i)| of the near-end input data and the absolute value |y(i)| of the far-end input data, and save |y(i)| in the far-end data buffer, which holds the sequence of N sampling points from |y(i)| to |y(i-N+1)|;

3. The method for implementing near-end speech detection in an echo suppressor according to claim 1 or 2, characterized in that, when the method is used in an application that itself has a speech frame length, M should also be a divisor of the speech frame length.

4. The method for implementing near-end speech detection in an echo suppressor according to claim 3, characterized in that, for a given system, there are one or two optimal values of M, depending on the design requirements, and with an optimal M the number of processor instruction cycles consumed is minimal.

5. The method for implementing near-end speech detection in an echo suppressor according to claim 4, characterized in that the optimal value of M is obtained by actual testing.

6. The method for implementing near-end speech detection in an echo suppressor according to claim 4, characterized in that the computation is the same for every subframe, but differs among the sampling points within a subframe.

7. The method for implementing near-end speech detection in an echo suppressor according to claim 3, characterized in that, given that echo on a subscriber line is attenuated by at least 6 dB relative to the speech, the echo amplitude attenuation factor u is set to 0.5.