CN100399419C

CN100399419C - Method for testing silent frame

Info

Publication number: CN100399419C
Application number: CNB2004100971346A
Authority: CN
Inventors: 王麒
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2004-12-07
Filing date: 2004-12-07
Publication date: 2008-07-02
Anticipated expiration: 2024-12-07
Also published as: CN1787071A

Abstract

The present invention discloses a method for detecting silent frames. A background noise energy value variable is maintained; based on the minimum energy of the samples in the currently obtained data frame, the method judges whether this variable needs to change and, if so, adjusts it accordingly. The variable is then used to perform silent frame detection. Because the variable is updated in real time as conditions require, the method adapts to the current detection environment and to the equipment in use, which makes silent frame detection more accurate.

Description

A method for detecting silent frames

Technical field

The present invention relates to the technical field of voice communication, and in particular to a method for detecting silent frames.

Background art

End-to-end voice communication is now very common. One party obtains speech data, encodes and packetizes it, and sends it over the network to the other party, which unpacks and decodes the received data and plays it back, thereby realizing end-to-end voice communication.

In actual communication, either party will spend considerable time only listening to the other party without speaking; a party in this condition is said to be in a mute state. Because such a party is receiving the other side's voice and is not speaking itself, no speech data can be obtained from its user at that moment; only silent data, consisting of useless content such as background noise, is available. If this silent data is still encoded and packetized in the conventional way and sent to the other party to be unpacked and decoded, network bandwidth is wasted unnecessarily.

A communicating party therefore needs to judge whether the data it currently obtains is silent data. Because data normally arrives in the form of data frames, silent frame detection is performed to judge whether the currently obtained data frame is a silent frame. If it is, the encoding, packetizing, unpacking, and decoding described above are skipped for that frame, which saves network bandwidth.

In the prior art, silent frame detection is implemented as follows:

A data frame in the voice stream is sampled to obtain the energy value of each sample. Each energy value is then compared with a silence threshold constant set in advance from experience. If no sample's energy value exceeds this threshold for a sustained period, the data frame is a silent frame.

Although the prior art can detect silent frames, it has the following shortcomings:

1. The silence threshold is a constant preset from experience, yet the appropriate threshold differs from one sound card to another. Detection based on a constant threshold therefore cannot be reasonably accurate for all parties using different sound cards.

2. Depending on where the user is located, background noise may be high. In that case, even when the user is not speaking, the sample energy values in the frames obtained can still be high. With a constant silence threshold, a silent frame containing only loud noise may be mistaken for a speech frame. Conversely, if the constant is set high to allow for loud background noise, then under normal background noise a speech frame with low speech energy may be misjudged as a silent frame, which is equally an error.

In short, because the prior art uses a constant silence threshold, its silent frame detection is inaccurate.

Summary of the invention

In view of this, the main object of the present invention is to provide a silent frame detection method that eliminates the inaccuracy caused by using the constant silence threshold described above.

To achieve this object, the invention provides a method for detecting silent frames, comprising:

Step A: dividing the current data frame into data subframes;

Step B: selecting, from the divided data subframes, one that has not yet undergone silent frame detection as the currently detected data subframe; calculating the minimum energy value and maximum energy value of the samples in this data subframe; comparing the minimum energy value with the background noise energy value variable; and changing the background noise energy value variable according to the comparison result;

Step C: calculating a noise energy upper limit from the current value of the background noise energy value variable, and judging whether the maximum energy value from step B is less than this upper limit; if so, the currently detected data subframe is a silent frame; otherwise it is a speech frame;

Step D: judging whether any other data subframe in the current data frame has not yet undergone silence detection; if so, returning to step B until all data subframes in the data frame have been detected; otherwise, judging whether the data subframes in the current data frame are silent frames; if so, the current data frame is judged to be a silent frame; otherwise it is a speech frame.

Step A may comprise:

dividing a data frame into a plurality of data subframes, using a minimum processing duration as the unit.

Alternatively, step A may comprise:

treating the whole data frame as a single data subframe.

In step B, calculating the minimum energy value and maximum energy value of the samples in the data subframe may comprise:

calculating the difference between the amplitude absolute values of each pair of adjacent samples by cross subtraction, calculating the energy of each sample from the absolute value of this difference, and then comparing the sample energies to obtain the maximum energy value and minimum energy value.

Calculating the energy of each sample from the absolute value of this difference may comprise:

Step B11: taking the absolute value of the difference as the output value of the later sample of the adjacent pair that produced it; and, for the first sample, calculating the difference between that sample and the mean of the amplitude absolute values of all samples, taking the absolute value of this difference, and using it as the output value of the first sample;

Step B12: using the output value of each sample, calculating the energy value of each sample according to formula (1), and comparing the sample energy values to obtain the maximum energy value and minimum energy value;

Formula (1):

In formula (1), Energy_i denotes the energy value of the i-th sample, output[i] denotes the output value of the i-th sample, and i is a natural number denoting the sample index.

Alternatively, step B21: taking the absolute value of the difference as the output value of the earlier sample of the adjacent pair that produced it; and, for the last sample, calculating the difference between that sample and the mean of the amplitude absolute values of all samples, taking the absolute value of this difference, and using it as the output value of the last sample;

Step B22: using the output value of each sample, calculating the energy value of each sample according to formula (1), and comparing the sample energy values to obtain the maximum energy value and minimum energy value;

Formula (1):

Alternatively, in step B, the absolute value of the amplitude of each sample in the data subframe is taken; then, according to formula (2), the energy value of each sample is calculated from its amplitude absolute value, and the sample energy values are compared to obtain the minimum energy value and maximum energy value of the samples;

Formula (2):

In formula (2), Energy_i denotes the energy value of the i-th sample, extent[i] denotes the amplitude absolute value of the i-th sample, and i is a natural number denoting the sample index.

In step B, comparing the minimum energy value with the background noise energy value variable and changing the variable according to the comparison result comprises:

judging whether the minimum energy value is less than or equal to the current value of the background noise energy value variable and, if so, assigning the minimum sample energy value to the variable; and judging whether the minimum sample energy value has remained greater than the variable throughout a preset time and, if so, increasing the background noise energy value.

Increasing the background noise energy value comprises:

adding a constant to the background noise energy value.

In step C, the noise energy upper limit is calculated as follows:

a preset noise energy threshold constant is added to the background noise energy value variable to obtain the noise energy upper limit.

The noise energy threshold constant is 10 dB.

In step C, judging whether the maximum energy value is less than the upper limit comprises:

judging whether the maximum energy value has remained less than the upper limit throughout a preset time.

The present invention thus has the following beneficial effects:

In the present invention, the background noise energy value is updated according to the minimum energy of the samples in the current data frame, so a background noise energy value matching the current conditions can be calculated in every case. The method can therefore mask the differences between sound cards from different vendors and, even when background noise is present, accurately detect whether the current frame is a silent frame, greatly improving the accuracy and precision of silent frame detection.

Description of drawings

Fig. 1 is a flowchart of the method of the present invention.

Embodiment

The present invention is a method for detecting silent frames. The method maintains a background noise energy value variable; based on the minimum energy of the samples in the currently obtained data frame, it judges whether the variable needs to change and, if so, adjusts it accordingly, and then uses the variable to perform silent frame detection. Because the variable's value is changed in real time as required, the method adapts to the current detection environment and to the equipment in use, making silent frame detection more accurate.

The invention is described below with reference to the accompanying drawing.

Referring to Fig. 1, the invention is implemented through the following steps:

Step 101: divide the current data frame into data subframes;

Step 102: select, from the current data frame, a data subframe that has not yet undergone silent frame detection as the currently detected data subframe, sample it, and calculate the minimum energy value and maximum energy value of the samples;

Step 103: compare the minimum energy value with the background noise energy value variable and judge from the result whether the variable currently needs to change; if so, increase or decrease it accordingly;

Step 104: calculate the noise energy upper limit from the current value of the background noise energy value variable, and judge whether the maximum energy value from step 102 is less than this upper limit; if so, the currently detected data subframe is a silent frame; otherwise it is a speech frame;

Step 105: judge whether any other data subframe in the current data frame has not yet undergone silence detection; if so, return to step 102 until all data subframes in the data frame have been detected; otherwise, go to step 106;

Step 106: judge whether the data subframes in the current data frame are silent frames; if so, the current data frame is a silent frame; otherwise it is a speech frame.

The calculation of the maximum sample energy value described in step 102 may instead be performed at any point after step 102 and before the comparison in step 104, without affecting the implementation of the invention.

The implementation of each of the above steps is described below with a concrete example.

In this embodiment, the voice stream is sampled at 8000 Hz with 16-bit signed quantization. In other embodiments, silent frame detection may equally be performed on voice streams with other sampling rates and quantization depths, without affecting the implementation of the invention.

(1) Implementation of step 101:

In this embodiment, to make silent frame detection more accurate, a data frame is divided into a plurality of data subframes using a minimum processing duration as the unit. Here the minimum processing duration is 5 milliseconds. Because the stream is sampled at 8000 Hz with 16-bit quantization, each 5-millisecond data subframe contains 5 × 8000 / 1000 = 40 samples. In other embodiments, a different duration may be used as the unit for dividing the data frame into data subframes, without affecting the implementation of the invention;

Dividing the data frame by the minimum processing duration in step 101 has two benefits. First, data frames with different packetization durations are divided into data subframes of the same size, so a uniform silent frame detection precision is obtained. Second, compared with dividing by a longer duration, dividing by the minimum processing duration greatly improves detection accuracy and avoids errors;

If uniform detection precision and improved accuracy are not a concern, the whole data frame may instead be treated as a single data subframe. The data frame itself is then the data subframe under test, and silence detection for the frame is still achieved by carrying out the subsequent steps, without affecting the implementation of the invention;
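The division in step 101 can be sketched as follows. This is a minimal illustration rather than the patent's own code: the function name `split_into_subframes` and its parameters are hypothetical, and keeping a shorter trailing subframe when the frame length is not a multiple of the subframe size is an assumption, since the text does not address that case.

```python
def split_into_subframes(frame, sample_rate_hz=8000, subframe_ms=5):
    """Split a list of PCM samples into fixed-duration data subframes."""
    # 8000 Hz * 5 ms / 1000 = 40 samples per subframe, as in the embodiment.
    samples_per_subframe = sample_rate_hz * subframe_ms // 1000
    if samples_per_subframe <= 0:
        raise ValueError("subframe duration too short for this sample rate")
    # Assumption: a trailing remainder is kept as a shorter subframe.
    return [
        frame[start:start + samples_per_subframe]
        for start in range(0, len(frame), samples_per_subframe)
    ]


frame = list(range(160))  # hypothetical 20 ms frame at 8000 Hz
subframes = split_into_subframes(frame)
whole_frame = split_into_subframes(frame, subframe_ms=20)
```

With the defaults, the 160-sample frame yields four 40-sample subframes; setting `subframe_ms` to the frame duration reproduces the single-subframe variant.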

(2) Implementation of step 102:

In this embodiment, the data subframes of the current data frame are selected one at a time as the currently detected data subframe. The current data subframe is sampled, and the minimum and maximum energy values of its samples are calculated in one of the following ways:

Mode one: the difference between the amplitude absolute values of each pair of adjacent samples is calculated by cross subtraction, the energy of each sample is calculated from the absolute value of this difference, and the sample energies are then compared to obtain the maximum and minimum energy values. Specifically:

Step a1: take the absolute value of the amplitude of every sample in the data subframe and calculate the mean of these amplitude absolute values. Then subtract the amplitude absolute values of adjacent samples and take the absolute value of the result; this value is the output value of one sample of the adjacent pair that produced it. Also calculate the difference between the mean amplitude absolute value and the amplitude absolute value of the first sample or the last sample, and use the absolute value of this difference as the output value of that first or last sample;

In this embodiment, the absolute difference is used as the output value of the later sample of each adjacent pair; for the first sample, the difference between that sample and the mean amplitude absolute value of all samples is calculated, and its absolute value is used as the output value of the first sample. In other embodiments, the absolute difference may instead be used as the output value of the earlier sample of each adjacent pair; for the last sample, the difference between that sample and the mean amplitude absolute value of all samples is calculated, and its absolute value is used as the output value of the last sample;
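Step a1, in the variant where the difference is assigned to the later sample of each pair, might look like the sketch below. The function name `cross_difference_outputs` is hypothetical, and returning an empty list for an empty subframe is an assumption made only to keep the example total.

```python
def cross_difference_outputs(subframe):
    """Return one output value per sample, following mode one, step a1.

    output[0] = | |x[0]| - mean(|x|) |
    output[i] = | |x[i]| - |x[i-1]| |   for i >= 1
    """
    if not subframe:
        return []  # assumption: nothing to compute for an empty subframe
    magnitudes = [abs(sample) for sample in subframe]
    mean_magnitude = sum(magnitudes) / len(magnitudes)
    # The first sample has no predecessor, so it is compared with the mean.
    outputs = [abs(magnitudes[0] - mean_magnitude)]
    # Each later sample takes the absolute difference with its predecessor.
    for previous, current in zip(magnitudes, magnitudes[1:]):
        outputs.append(abs(current - previous))
    return outputs


outputs = cross_difference_outputs([3, -5, 4, -4])
```

For the samples `[3, -5, 4, -4]` the amplitude absolute values are `[3, 5, 4, 4]` with mean 4, so the output values are `[1.0, 2, 1, 0]`.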

Step a2: utilize the output valve of each sample point that step a1 obtains, calculate the energy value of each sample point according to following formula (1) respectively;

Formula (1):

Wherein, in formula (1), Energy _iThe energy value of representing i sample point, output[i] output valve of i sample point of expression, i is a natural number, representative sample point sequence number;

Step a3: all sample points are compared, therefrom obtain sample point maximum energy value and sample point minimum energy value;

Wherein, in embodiments of the present invention, at first calculate the energy value of all sample points by step a2, and then these energy values are compared by step a3, in other embodiment of the present invention, also can calculate the energy value of current sample point according to formula (1) successively, energy value that then will this current sample point and current minimum energy value and maximum energy value compare respectively, if this current minimum energy value is less than current minimum energy value, energy value assignment that then will this current sample point is given this current minimum energy value, if the energy value of this current sample point is greater than current maximum energy value, energy value assignment that then will this current sample point is given this current maximum energy value, repeat the process of this calculating and comparison energy value, until all sample points are all disposed.
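The single-pass variant just described can be sketched as follows. Formula (1) is not reproduced in this text, so the energy computation is passed in as a callable; `placeholder_energy` (squaring the output value) is purely a stand-in assumption, and `min_max_energy` is a hypothetical name.

```python
def min_max_energy(outputs, energy_of):
    """Single pass over the samples, tracking the running min and max energy."""
    if not outputs:
        raise ValueError("subframe has no samples")
    min_energy = max_energy = energy_of(outputs[0])
    for value in outputs[1:]:
        energy = energy_of(value)
        if energy < min_energy:
            min_energy = energy  # new running minimum
        if energy > max_energy:
            max_energy = energy  # new running maximum
    return min_energy, max_energy


def placeholder_energy(value):
    # Stand-in only: formula (1) is missing from this text, so this is NOT
    # the patent's energy formula.
    return value * value


lowest, highest = min_max_energy([1.0, 2, 1, 0], placeholder_energy)
```

Passing the energy function as a parameter keeps the comparison logic independent of whichever formula, (1) or (2), is actually used.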

Because mode one applies cross subtraction to the amplitude absolute values of the samples, all subsequent processing is based on the small changes between adjacent speech samples. This helps mask the differences between sound cards and allows silence to be judged more accurately when background noise is present.

In step 102, besides calculating the minimum and maximum energy values of the samples according to mode one as described above, mode two may also be used.

Mode two:

Formula (2):

In formula (2), Energy_i denotes the energy value of the i-th sample, extent[i] denotes the amplitude absolute value of the i-th sample, and i is a natural number denoting the sample index;

In mode two, the sample energy values may be compared in the same way as in mode one, without affecting the implementation of the invention.

(3) Implementation of step 103:

Judge whether the minimum sample energy value in the data subframe is less than or equal to the current value of the background noise energy value variable. If so, the current background noise energy value is too large, and the minimum sample energy value is assigned to the variable. At the same time, judge whether the minimum sample energy value in the data subframe is greater than the current value of the variable. If so, the current background noise energy value is too small to reflect the current background correctly, and it is increased;

In this embodiment, to make the judgment more accurate, whether the minimum energy value is greater than the background noise energy value variable may be judged as follows:

Judge whether the minimum sample energy value has remained greater than the background noise energy value variable throughout a preset time; if so, increase the background noise energy value. The preset time corresponds to a certain number of data subframes. The reason for using a preset time in this judgment is as follows:

The data frame under detection may be either a speech frame or a silent frame. If it is a silent frame, the sample energy values contain only background noise energy, so the minimum sample energy value of the frame can clearly be compared with the background noise energy value, and the background noise energy value updated according to the result. If it is a speech frame, the sample energy values contain the energy of the speech itself as well as that of the background noise, so the minimum sample energy value of the frame cannot correctly reflect the current background noise, and comparing it with the background noise energy value should be avoided. Since a person in a voice conversation cannot keep speaking without pause beyond a certain length of time, the invention presets a time corresponding to the limit of a person's continuous speaking time and judges whether the minimum sample energy value of the data frames stays greater than the background noise energy value throughout this time. If so, the minimum sample energy value has exceeded the background noise energy value during silence, and the background noise energy value is increased accordingly. If instead the minimum sample energy value exceeds the background noise energy value only at isolated moments, the data frame may be a speech frame, and the background noise energy value is not increased;

In this embodiment, so that the background noise energy value changes smoothly, it is increased as follows:

If the minimum sample energy value of the data subframes stays greater than the background noise energy value variable throughout the preset time, the variable is increased by 1 and timing restarts. If the minimum sample energy value again stays greater than the variable throughout the next preset time, the variable is increased by 1 once more, giving a smooth transition of the background noise energy value. In this embodiment the background noise energy value is increased by 1 each time; in other embodiments it may be increased by a constant of a different size, without affecting the implementation of the invention.
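The update rule of step 103 can be sketched as a small tracker. The class name `NoiseFloorTracker` and its parameters are hypothetical. Expressing the preset time as a count of consecutive subframes (`raise_after_subframes`) follows the statement that the preset time corresponds to a number of data subframes, but the actual duration is not given in the text, so the value used here is an assumption.

```python
class NoiseFloorTracker:
    """Tracks the background noise energy value variable across subframes."""

    def __init__(self, initial_floor, raise_after_subframes, step=1):
        self.floor = initial_floor
        # Assumption: the preset time is counted in consecutive subframes.
        self.raise_after_subframes = raise_after_subframes
        self.step = step  # the embodiment adds 1 each time
        self._above_count = 0

    def update(self, min_energy):
        if min_energy <= self.floor:
            # Floor was too high: drop straight to the observed minimum.
            self.floor = min_energy
            self._above_count = 0
        else:
            # Floor may be too low: raise it only after a sustained run.
            self._above_count += 1
            if self._above_count >= self.raise_after_subframes:
                self.floor += self.step
                self._above_count = 0  # timing restarts after each increase
        return self.floor


tracker = NoiseFloorTracker(initial_floor=30, raise_after_subframes=3)
history = [tracker.update(e) for e in [25, 40, 40, 40, 40, 20]]
```

In this run the floor drops to 25 immediately, rises to 26 only after three consecutive subframes above it, and drops again to 20 as soon as a lower minimum appears. The asymmetry (fast decrease, slow stepwise increase) is what keeps short bursts of speech from inflating the noise estimate.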

(4) Implementation of step 104:

The background noise energy value variable represents only the lower limit of the background noise energy, that is, pure background sound. During silence, other types of noise may be present besides background sound, so silence detection should take the energy of these other noises into account. An upper limit of the background noise energy is therefore calculated and used to distinguish normal speech from silence. In this embodiment, a preset noise energy threshold constant is added to the background noise energy value variable to obtain the noise energy upper limit. The value of this constant can be determined by actual testing; in this embodiment it is 10 dB;

After the upper limit is calculated, the maximum energy value is compared with it. In this embodiment, to make detection more accurate, the method judges whether the maximum energy value has remained less than the upper limit throughout a preset time; if so, the currently detected data subframe is judged to be a silent frame. In this embodiment the preset time is 500 milliseconds.
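A possible shape for the step 104 decision is sketched below, assuming energies are already expressed in dB. The class name `SilenceDetector` is hypothetical, and mapping the 500-millisecond preset time onto a count of consecutive 5-millisecond subframes is an interpretation rather than something the text spells out.

```python
class SilenceDetector:
    """Declares a subframe silent once it has stayed below the upper limit."""

    def __init__(self, threshold_db=10, hold_ms=500, subframe_ms=5):
        self.threshold_db = threshold_db  # noise energy threshold constant
        # Assumption: the preset time is counted in consecutive subframes.
        self.hold_subframes = hold_ms // subframe_ms
        self._quiet_run = 0

    def is_silent(self, max_energy_db, noise_floor_db):
        upper_limit = noise_floor_db + self.threshold_db
        if max_energy_db < upper_limit:
            self._quiet_run += 1
        else:
            self._quiet_run = 0  # any loud subframe restarts the timing
        return self._quiet_run >= self.hold_subframes


detector = SilenceDetector(hold_ms=15, subframe_ms=5)  # short hold for demo
flags = [detector.is_silent(e, noise_floor_db=30) for e in [35, 35, 35, 50, 35]]
```

With a floor of 30 dB the upper limit is 40 dB; the third quiet subframe in a row is reported silent, and the 50 dB subframe resets the run. With the default parameters the hold is 100 subframes.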

(5) Implementation of step 106:

Where the data frame has been divided into a plurality of data subframes, step 106 judges whether all data subframes in the data frame are silent frames; if so, the data frame is a silent frame. Where the data frame has been treated as a single data subframe, step 106 judges whether the data frame itself, as that data subframe, is a silent frame; if so, the data frame is judged to be a silent frame.
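The frame-level decision of step 106 reduces to a logical AND over the per-subframe results, as in this brief sketch. The function name `frame_is_silent` is hypothetical, and rejecting an empty list of results is an assumption.

```python
def frame_is_silent(subframe_flags):
    """A data frame is silent only if every one of its subframes is silent."""
    flags = list(subframe_flags)
    if not flags:
        # Assumption: a frame with no subframes is treated as an error.
        raise ValueError("frame has no subframes")
    return all(flags)


multi = frame_is_silent([True, True, False, True])  # one speech subframe
single = frame_is_silent([True])                    # whole frame as one subframe
```

The single-subframe case needs no special handling: the list simply has one entry.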

The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A method for detecting silent frames, characterized in that the method comprises:

Step A: dividing the current data frame into data subframes;

2. The method according to claim 1, characterized in that step A comprises:

3. The method according to claim 1, characterized in that step A comprises:

treating the whole data frame as a single data subframe.

4. The method according to claim 1, characterized in that, in step B, calculating the minimum energy value and maximum energy value of the samples in the data subframe comprises:

5. The method according to claim 4, characterized in that calculating the energy of each sample from the absolute value of this difference comprises:

Formula (1):

6. The method according to claim 4, characterized in that calculating the energy of each sample from the absolute value of this difference comprises:

Formula (1):

7. The method according to claim 1, characterized in that, in step B, calculating the minimum energy value and maximum energy value of the samples in the data subframe comprises:

Formula (2):