CN102543079A

CN102543079A - Method and equipment for classifying audio signals in real time

Info

Publication number: CN102543079A
Application number: CN2011104309646A
Authority: CN
Inventors: 林志斌; 孔庆胜
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2011-12-21
Filing date: 2011-12-21
Publication date: 2012-07-04

Abstract

The invention discloses a method and device for classifying audio signals in real time, in the technical field of audio encoding and decoding. The method comprises: preprocessing the input audio signal; extracting multi-level audio features in the time domain and the MDCT domain; classifying the current frame with a single audio feature if it lies within the classification convergence period of I frames; classifying the current frame with coarse-to-fine classification rules, whose decisions use the multi-level features, if it lies after that period; and, after the coarse-to-fine classification, updating the class of the current frame according to the history of the classes of the signal frames preceding it. The method and device classify signals in real time simply and accurately.

Description

Real-time audio signal classification method and device

I. Technical field

The present invention relates to the field of audio coding, decoding and transmission, and in particular to a real-time audio signal classification method and device.

II. Background

Classifying an audio signal before coding, transmission or other processing can effectively improve coding and transmission efficiency. Because multimedia audio is transmitted within real-time transmission frameworks, real-time classification of audio signals is an important research topic.

Research on audio signal classification, in China and abroad, has mostly focused on long-duration classification. Examples include low-energy-ratio classification over 1 s or 10 s and short-time-energy classification over 1 s or 10 s. Classifier design generally relies on statistical methods, such as support vector machines and neural network classifiers. Because their processing time is long, these methods are of limited practical use for real-time audio classification.

Current audio classification algorithms operate essentially in the time domain or the frequency domain. Popular coding schemes such as MP3 and AAC, however, already use the MDCT. Extracting and analyzing features directly in the MDCT domain and the time domain therefore avoids extra arithmetic and effectively improves feature-extraction efficiency. Combined with suitable classification rules, this makes it possible to design a device for fast, real-time audio signal classification.

III. Summary of the invention

1. Object of the invention: the object of this invention is to provide a real-time audio signal classification method and device that:

- classify quickly in real time;
- reduce extra computation;
- improve the accuracy of real-time audio classification.

In this way, signal classification can play its important role in audio coding and audio transmission.

2. Technical solution: to achieve the above object, the invention discloses a real-time audio signal classification method, comprising:

The input audio signal is divided into frames and high-pass filtered. Silence detection is then performed on the current frame, the MDCT is computed, and audio features are extracted in the time domain and the MDCT domain. While the current frame lies within the classification convergence period of I frames, it is classified with a single audio feature. Once the current frame lies after that period, it is classified with the coarse-to-fine classification rules. After coarse-to-fine classification, the class of the current frame is updated according to the history of the classes of the signal frames preceding it.

Further, in the method, the short-time zero-crossing rate is used for silence detection of the current frame. If the short-time zero-crossing rate of the current frame is greater than the first set value, the current frame is marked as a non-silent frame.

The MDCT is applied to each processed audio frame, and a series of audio features is extracted in the time domain and the MDCT domain:

- the short-time zero-crossing rate;
- the MDCT spectral harmonic structure stability;
- the MDCT spectral sub-band energy change statistic;
- the MDCT spectral centroid variation;
- the MDCT spectral sub-band energy;
- the sum of the absolute values of the first four MDCT spectral coefficients.

Further, in the method, single-feature classification is used while the current frame lies within the classification convergence period of I frames. The single feature is the MDCT spectral sub-band energy. If the first sub-band energy of the current frame is greater than the second set value, the current frame is marked as a speech frame.

Once the current frame lies after the classification convergence period of I frames, multi-level feature classification with the coarse-to-fine rules is used. The multi-level features are the six features listed above: the short-time zero-crossing rate, the MDCT spectral harmonic structure stability, the MDCT spectral sub-band energy change statistic, the MDCT spectral centroid variation, the MDCT spectral sub-band energy, and the sum of the absolute values of the first four MDCT spectral coefficients.

The current frame is first coarsely classified, using the first sub-band of the MDCT spectral sub-band energy as the decision feature. If it is greater than the second set value, the current frame is marked as a speech-like frame; otherwise, it is marked as a music-like frame.

Further, in the method, the coarsely classified signal frame is finely classified by combining multiple features. At each stage of the fine classification, one audio feature is compared with its corresponding set value to decide the signal type. The order of the stage-wise feature decisions does not change during classification.

The history of classification results is stored and combined with the current frame's result, and the most frequent class is taken as the classification result of the current frame. There are two exceptions, in which the original result is kept:

- the current frame is a silent frame;
- only two frames among the original results in the classification history are non-silent.

The first set value and the second set value are given thresholds, and the corresponding set values are a series of given thresholds.

The invention also provides a real-time audio signal classification device. It comprises four modules connected to one another (a preprocessing module, a feature extraction module, a coarse-to-fine rule classification module, and a classification result correction module), wherein:

the preprocessing module performs preprocessing and silence detection on the audio signal;

the feature extraction module extracts features from the processed audio signal in real time in the time domain and the MDCT domain;

the coarse-to-fine rule classification module arranges the obtained audio features according to certain rules and classifies frames with the coarse-to-fine rule method;

the classification result correction module corrects the original classification results and finally outputs an accurate audio signal classification result.

Further, in the device, the preprocessing module uses the short-time zero-crossing rate for silence detection of the current frame. If the short-time zero-crossing rate of the current frame is greater than the first set value, the current frame is marked as a non-silent frame.

Further, in the device, the feature extraction module applies the MDCT to each processed audio frame and extracts a series of audio features in the time domain and the MDCT domain. These features are the short-time zero-crossing rate, the MDCT spectral harmonic structure stability, the MDCT spectral sub-band energy change statistic, the MDCT spectral centroid variation, the MDCT spectral sub-band energy, and the sum of the absolute values of the first four MDCT spectral coefficients.

Further, in the device, the coarse-to-fine rule classification module uses single-feature classification while the current frame lies within the classification convergence period of I frames. The single feature is the MDCT spectral sub-band energy. If the first sub-band energy of the current frame is greater than the second set value, the current frame is marked as a speech frame.

The coarsely classified signal frame is finely classified by combining multiple features. At each stage of the fine classification, one audio feature is compared with its corresponding set value to decide the signal type. The order of the stage-wise feature decisions does not change during classification.

Further, in the device, the classification result correction module stores the history of classification results and combines it with the current frame's result. The most frequent class is taken as the classification result of the current frame. If the current frame is a silent frame, or only two frames among the original results in the classification history are non-silent, the original result is kept.

The technical solution of the invention uses simple coarse-to-fine rules to improve the accuracy of real-time audio signal classification, and thereby greatly improves audio coding efficiency. It can be used for audio signal classification decisions in the audio codecs of real-time two-way communication, such as wireless, IP videoconferencing and real-time broadcasting services.

IV. Description of the drawings

Fig. 1 is a block diagram of audio signal classification applied in an audio encoding device.

Fig. 2 is a structural block diagram of the real-time audio signal classification device.

Fig. 3 is a block diagram of audio signal silence detection.

Fig. 4 is a block diagram of single-feature classification within the classification convergence period of I frames.

Fig. 5 is a block diagram of single-feature coarse classification after the classification convergence period of I frames.

Fig. 6 is a block diagram of the speech-like signal rule classifier.

Fig. 7 is a block diagram of the music-like signal rule classifier.

Fig. 8 is a block diagram of the classification result correction module.

V. Detailed description of embodiments

The main idea of the invention is as follows. Before a speech/audio codec, a real-time audio signal classification method makes the speech/audio classification decision (Fig. 1). Based on that decision, an encoder suited to speech or to audio is selected, which improves the codec's efficiency on different signal types. The detailed process is as follows:

Step 1: the signal is divided into frames and high-pass filtered to remove unwanted low-frequency content. Silence detection uses the short-time zero-crossing rate: if the short-time zero-crossing rate of the current frame is greater than the first set value, the current frame is marked as a non-silent frame.

Step 2: the MDCT is computed, and a series of audio features is extracted in the time domain and the MDCT domain. These features are the short-time zero-crossing rate, the MDCT spectral harmonic structure stability, the MDCT spectral sub-band energy change statistic, the MDCT spectral centroid variation, the MDCT spectral sub-band energy, and the sum of the absolute values of the first four MDCT spectral coefficients.

Step 3: while the signal is within the classification convergence period of I frames, the signal class is decided with a single feature.

Step 4: after the classification convergence period of I frames, multi-level feature classification with the coarse-to-fine rules is used. The multi-level features are the six features extracted in Step 2.

Step 5: coarse classification. The coarse classification feature is the first sub-band of the MDCT spectral sub-band energy. If it is greater than the second set value, the current frame is marked as a speech-like frame; otherwise, it is marked as a music-like frame.

Step 6: the coarsely classified signal frame is finely classified by combining multiple features. At each stage of the fine classification, one audio feature is compared with its corresponding set value to decide the signal type. The order of the stage-wise feature decisions does not change during classification.

Step 7: the history of classification results is stored and combined with the current frame's result, and the most frequent class is taken as the classification result of the current frame. If the current frame is a silent frame, or only two frames among the original results in the classification history are non-silent, the original result is kept.

The solution is described in further detail below with reference to the drawings and an embodiment.

The real-time audio signal classification device, shown in Fig. 2, comprises four modules connected to one another: a preprocessing module, a feature extraction module, a coarse-to-fine rule classification module, and a classification result correction module. The function of each module is described below.

Preprocessing module: first, the audio stream x(n) is divided into frames and high-pass filtered. Second, silence detection is performed with the short-time average zero-crossing rate, as shown in Fig. 3: when the short-time average zero-crossing rate is greater than the first set value, the frame is judged to be non-silent. The short-time average zero-crossing rate is computed by Formula 1:

Z_{n} = \frac{1}{2N} \sum_{m=-\infty}^{\infty} \left| \operatorname{sgn}[x(m)] - \operatorname{sgn}[x(m-1)] \right| h(n-m)

(Formula 1)

where N is the frame length and sgn[·] is the sign function:

\operatorname{sgn}[x(n)] = \begin{cases} 1, & x(n) \ge 0 \\ -1, & x(n) < 0 \end{cases}

(Formula 2)

The window function h(n) is a rectangular window, that is:

h(n) = \begin{cases} 1, & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}

(Formula 3)
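With a rectangular window, Formulas 1 and 2 reduce to counting sign changes inside the frame. Each change contributes 2, and the total is divided by 2N. The Python sketch below shows this computation together with the non-silent test of Fig. 3. The function names are hypothetical. The first set value is passed in as a parameter because its numerical value belongs to Table 1. Carrying the last sample of the previous frame across the frame boundary is an assumption.

```python
from typing import Sequence


def sgn(v: float) -> int:
    """Sign function of Formula 2: +1 for v >= 0, otherwise -1."""
    return 1 if v >= 0 else -1


def short_time_zcr(frame: Sequence[float], prev_sample: float = 0.0) -> float:
    """Short-time average zero-crossing rate: Formula 1 with a rectangular window.

    With h(n - m) = 1 over the N samples of the frame and 0 elsewhere, Formula 1
    reduces to (1 / 2N) * sum |sgn x(m) - sgn x(m-1)| over the frame.
    `prev_sample` is the last sample of the previous frame (assumed boundary handling).
    """
    n = len(frame)
    total = 0
    last = sgn(prev_sample)
    for x in frame:
        cur = sgn(x)
        total += abs(cur - last)  # 2 for every sign change, 0 otherwise
        last = cur
    return total / (2 * n)


def is_non_silent(frame: Sequence[float], first_set_value: float,
                  prev_sample: float = 0.0) -> bool:
    """Mark the frame non-silent when its ZCR exceeds the first set value."""
    return short_time_zcr(frame, prev_sample) > first_set_value
```

For example, `short_time_zcr([1.0, -1.0, 1.0, -1.0], prev_sample=1.0)` finds three sign changes in four samples and gives 6 / 8 = 0.75.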

After preprocessing, the feature extraction module performs the MDCT and extracts time-domain and MDCT-domain features. First, the MDCT (Modified Discrete Cosine Transform) is applied to obtain the frequency coefficients.

The N-point time-domain data x(n) of the current frame and the N-point data x(n−N) of the previous frame are stacked into a 2N-point block, and the MDCT is applied to that block. In this embodiment the signal is sampled at 16 kHz and N = 320.

X(k) = \sum_{n=0}^{2N-1} x(n) \, w(n) \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right], \quad k = 0, \ldots, N-1

(Formula 4)

where w(n) is the sine window function:

w(n) = \sin\left[\frac{\pi}{2N}\left(n + \frac{1}{2}\right)\right], \quad n = 0, \ldots, 2N-1

(Formula 5)
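Formulas 4 and 5 can be evaluated directly. The Python sketch below computes the N MDCT coefficients from the previous and current frames. It assumes the 2N-point block is ordered oldest first (previous frame, then current frame). It uses the O(N²) direct sum rather than the fast FFT-based algorithm a production codec would use. The function names are illustrative only.

```python
import math
from typing import List, Sequence


def sine_window(n: int) -> List[float]:
    """Formula 5: w(i) = sin(pi / (2N) * (i + 1/2)) for i = 0 .. 2N-1."""
    return [math.sin(math.pi / (2 * n) * (i + 0.5)) for i in range(2 * n)]


def mdct(prev_frame: Sequence[float], cur_frame: Sequence[float]) -> List[float]:
    """Direct-form MDCT of Formula 4 on the 2N-point block [previous | current].

    Returns the N coefficients X(0) .. X(N-1). This costs O(N^2) per frame;
    it mirrors the formula rather than a fast (FFT-based) implementation.
    """
    n = len(cur_frame)
    if len(prev_frame) != n:
        raise ValueError("previous and current frames must both have N samples")
    block = list(prev_frame) + list(cur_frame)  # assumed ordering: oldest first
    w = sine_window(n)
    coeffs = []
    for k in range(n):
        acc = 0.0
        for i, x in enumerate(block):
            acc += x * w[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
        coeffs.append(acc)
    return coeffs
```

In this embodiment each call would receive two 320-sample frames (20 ms at 16 kHz). The sine window satisfies w(i)² + w(i+N)² = 1, the Princen-Bradley condition that makes overlapped MDCT frames invertible.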

Extraction of the time-domain and MDCT-domain features:

(1) Short-time zero-crossing rate Z_n. Its judgment thresholds are listed in Table 1.

(2) Harmonic structure stability (HSS) of the MDCT spectrum:

Step 1: search each frame's MDCT spectrum for its peak points, denoted P_l, where P_l is the l-th peak of the frame.

Step 2: convert each P_l to a normalized logarithmic scale, denoted LP_l, as in Formula 6:

LP_{l} = \log(P_{l}) - \log\left(\sum_{l} P_{l}\right), \quad l = 1, \ldots, L

(Formula 6)

where L is the index of the last peak.

Step 3: compute the variance of the LP_l values and take it as HSS. The judgment thresholds of HSS are listed in Table 1.
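The three HSS steps above can be sketched in Python as follows. The source does not define a "peak point". This sketch assumes it means a local maximum of |X(k)|, uses the natural logarithm, and uses the population variance. All three choices are assumptions and would change the scale of any HSS thresholds. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def spectral_peaks(spectrum: Sequence[float]) -> List[float]:
    """Magnitudes of local maxima of |X(k)| (assumed definition of a peak)."""
    mag = [abs(v) for v in spectrum]
    peaks = []
    for k in range(1, len(mag) - 1):
        # A plateau contributes its left edge once.
        if mag[k] > mag[k - 1] and mag[k] >= mag[k + 1] and mag[k] > 0:
            peaks.append(mag[k])
    return peaks


def harmonic_structure_stability(spectrum: Sequence[float]) -> float:
    """HSS: population variance of LP_l = log P_l - log(sum of P_l), Formula 6."""
    peaks = spectral_peaks(spectrum)
    if not peaks:
        return 0.0  # assumption: a spectrum without peaks has zero variance
    log_total = math.log(sum(peaks))
    lp = [math.log(p) - log_total for p in peaks]
    mean = sum(lp) / len(lp)
    return sum((v - mean) ** 2 for v in lp) / len(lp)
```

Equal peaks give HSS = 0. Peaks of 1 and 4 give LP values log(1/5) and log(4/5), whose variance is (log 2)².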

(3) MDCT spectral sub-band energy E_b

The MDCT spectral coefficients are divided into M sub-bands of equal width; in this embodiment M = 32. The sub-band energy is computed by Formula 7:

E_{b}(j) = \sqrt{\sum_{k = jN/M}^{(j+1)N/M - 1} X(k)^{2}}, \quad j = 0, \ldots, M-1

(Formula 7)

where j is the sub-band index. The judgment threshold of the first sub-band energy E_b(0) is listed in Table 1.
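A minimal Python sketch of Formula 7 follows. It reads "equally divided into M sub-bands" as band j covering coefficients k = jN/M to (j+1)N/M − 1, so the bands do not overlap. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def subband_energies(spectrum: Sequence[float], m: int = 32) -> List[float]:
    """E_b(j) of Formula 7: square root of the summed squares in each equal band."""
    n = len(spectrum)
    if n % m:
        raise ValueError("N must be divisible by M for equal-width sub-bands")
    width = n // m  # 320 // 32 = 10 coefficients per band in this embodiment
    return [
        math.sqrt(sum(x * x for x in spectrum[j * width:(j + 1) * width]))
        for j in range(m)
    ]
```

The single-feature and coarse decisions described later use only the first element, E_b(0), which covers the lowest N/M coefficients.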

(4) MDCT spectral sub-band energy change statistic C_SF

First compute the MDCT spectral flux SF(j):

SF(j) = \sum_{n=1}^{Q} \left| \log E_{b}(i-n+1, j) - \log E_{b}(i-n, j) \right|

(Formula 8)

where E_b(i, j) is the energy of the j-th sub-band in time frame i, and Q is the number of time frames over which the spectral flux is computed. Q = 6 in this embodiment.

Then count C_SF, the number of sub-bands j whose SF(j) exceeds the set value THR_SF. The corresponding judgment thresholds of C_SF are listed in Table 1.
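The spectral flux of Formula 8 and the count C_SF can be sketched in Python as shown below. The sketch sums Q frame-to-frame differences ending at the current frame, so it needs Q + 1 stored energy vectors. The small `eps` added before taking the logarithm guards against log(0); it is an assumption and not part of the source. The names are hypothetical, and THR_SF is passed in because its value is not given in the text.

```python
import math
from typing import List, Sequence


def spectral_flux(history: Sequence[Sequence[float]], q: int = 6,
                  eps: float = 1e-12) -> List[float]:
    """SF(j) of Formula 8 over the last q frame-to-frame differences.

    `history` holds sub-band energy vectors E_b(i, .), oldest first, with the
    current frame last; at least q + 1 frames are required.
    """
    if len(history) < q + 1:
        raise ValueError("need at least q + 1 frames of sub-band energies")
    recent = history[-(q + 1):]
    bands = len(recent[0])
    flux = [0.0] * bands
    for prev, cur in zip(recent, recent[1:]):
        for j in range(bands):
            flux[j] += abs(math.log(cur[j] + eps) - math.log(prev[j] + eps))
    return flux


def count_flux_exceedances(flux: Sequence[float], thr_sf: float) -> int:
    """C_SF: number of sub-bands whose SF(j) exceeds the set value THR_SF."""
    return sum(1 for v in flux if v > thr_sf)
```

Steady sub-band energies give zero flux in every band, so C_SF = 0. A band whose energy changes strongly from frame to frame raises its SF(j) and can push C_SF up.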

(5) MDCT spectral centroid variation δ_c:

Step 1: compute the MDCT spectral centroid of each frame:

SC = \sum_{k=0}^{N-1} p(k) F(k)

(Formula 9)

where F(k) = k + 1 and p(k) is computed by Formula 10:

p(k) = \frac{\Omega(k)}{\max_{k} \Omega(k)}

(Formula 10)

where Ω(k) = |X(k)|.

Step 2: compute the MDCT spectral centroid variation:

\delta_{c} = \sum_{m=i-O+1}^{i} \left| SC(m) - SC(m-1) \right|

(Formula 11)

where O is the number of adjacent frames used; O = 4 in this embodiment. The judgment thresholds of δ_c are listed in Table 1.
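Formulas 9 to 11 can be sketched in Python as follows. As the source defines it, p(k) is normalized by the maximum magnitude rather than by the sum, so SC is not a centroid in bin units. The sketch keeps that definition as written. Returning 0 for an all-zero spectrum is an assumption, and the function names are hypothetical.

```python
from typing import Sequence


def spectral_centroid(spectrum: Sequence[float]) -> float:
    """SC of Formulas 9-10: sum_k p(k) F(k), p(k) = |X(k)| / max|X|, F(k) = k + 1."""
    mag = [abs(v) for v in spectrum]
    peak = max(mag)
    if peak == 0:
        return 0.0  # assumption: an all-zero spectrum has centroid 0
    return sum((m / peak) * (k + 1) for k, m in enumerate(mag))


def centroid_variation(sc_history: Sequence[float], o: int = 4) -> float:
    """delta_c of Formula 11: sum of |SC(m) - SC(m-1)| over the last O frames."""
    if len(sc_history) < o + 1:
        raise ValueError("need at least O + 1 centroid values")
    recent = sc_history[-(o + 1):]
    return sum(abs(b - a) for a, b in zip(recent, recent[1:]))
```

For centroid values 1, 2, 4, 4, 5 (current frame last) and O = 4, δ_c = 1 + 2 + 0 + 1 = 4.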

(6) Sum of the absolute values of the first four MDCT spectral coefficients, E_l

The sum of the absolute values of the first four MDCT spectral coefficients is computed and denoted E_l. Its judgment thresholds are listed in Table 1.

The coarse-to-fine rule classification module is a rule-based classifier with a coarse stage and a fine stage. It works as follows.

While the current frame lies within the classification convergence period of I frames, a single feature is used: the MDCT spectral sub-band energy E_b. As shown in Fig. 4, if the energy of the first sub-band, E_b(0), is greater than its judgment threshold, the frame is judged to be a speech frame; otherwise, it is judged to be a music frame.

After the classification convergence period, the coarse classification also uses a single feature, as shown in Fig. 5. If the first sub-band energy E_b(0) of the MDCT spectrum is greater than its judgment threshold, the frame is judged to be a speech-like frame; otherwise, it is judged to be a music-like frame.
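The single-feature decision above can be sketched in Python as one function. During the convergence period it returns a final speech/music label. Afterwards it returns a coarse label that selects the fine classifier. The function name, the 0-based frame index, and the parameter `convergence_frames` (the value I, which the source does not give) are assumptions. The threshold is passed in because its value belongs to Table 1.

```python
def single_feature_label(frame_index: int, e_b0: float,
                         threshold: float, convergence_frames: int) -> str:
    """Decision on the first sub-band energy E_b(0) (Figs. 4 and 5).

    Within the convergence period (frame_index < I) the result is final;
    afterwards it is only a coarse label used to pick a fine classifier.
    """
    is_speech = e_b0 > threshold
    if frame_index < convergence_frames:
        return "speech" if is_speech else "music"
    return "speech-like" if is_speech else "music-like"
```

The same comparison serves both periods; only the meaning of the result changes once the classifier has converged.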

The coarsely classified signal frame is then finely classified by combining multiple features. At each stage of the fine classification, one audio feature is compared with its corresponding set value to decide the signal type. The order of the stage-wise feature decisions does not change during classification.

The speech-like rule classification proceeds as shown in Fig. 6:

Stage 1 compares the MDCT spectral sub-band energy change statistic C_SF with its second judgment threshold. If C_SF is greater, the current frame is output as a speech frame; otherwise, the process proceeds to stage 2.

Stage 2 compares C_SF with its third judgment threshold. If C_SF is less, the current frame is output as a music frame; otherwise, the process proceeds to stage 3.

Stage 3 compares the MDCT spectral centroid variation δ_c with its third judgment threshold. If δ_c is greater, the current frame is output as a speech frame; otherwise, the process proceeds to stage 4.

Stage 4 compares the sum of the absolute values of the first four MDCT spectral coefficients, E_l, with its second judgment threshold. If E_l is greater, the current frame is output as a music frame; otherwise, the process proceeds to stage 5.

Stage 5 compares E_l with its third judgment threshold. If E_l is less, the current frame is output as a speech frame; otherwise, the process proceeds to stage 6.

Stage 6 compares the MDCT spectral harmonic structure stability HSS with its third and seventh judgment thresholds. If HSS lies in the interval bounded by these two thresholds, the frame is judged to be a music frame; otherwise, the process proceeds to stage 7.

Stage 7 compares HSS with its second and sixth judgment thresholds, and the short-time zero-crossing rate Z_n with its first and third judgment thresholds. The frame is judged to be a music frame if both of these conditions hold:

- HSS lies in the interval bounded by its second and sixth thresholds;
- Z_n lies in the interval bounded by its first and third thresholds.

Otherwise, the frame is judged to be a speech frame. The speech-like rule classifier then outputs the audio signal classification result.
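The seven-stage speech-like cascade can be sketched in Python as a chain of early returns. The threshold values of Table 1 are not available in this text, so the sketch takes them as a dictionary keyed by a hypothetical (feature name, ordinal) pair. It reads "lies in the interval" as a closed interval between the two thresholds in either order; that reading is an assumption.

```python
from typing import Dict, Tuple

# Hypothetical layout: (feature name, ordinal of judgment threshold) -> value.
Thresholds = Dict[Tuple[str, int], float]


def _in_interval(value: float, a: float, b: float) -> bool:
    """Assumed reading of 'lies in the interval' bounded by two thresholds."""
    lo, hi = min(a, b), max(a, b)
    return lo <= value <= hi


def classify_speech_like(c_sf: float, delta_c: float, e_l: float, hss: float,
                         z_n: float, th: Thresholds) -> str:
    """Seven-stage fine classifier for coarse 'speech-like' frames (Fig. 6)."""
    if c_sf > th[("C_SF", 2)]:                             # stage 1
        return "speech"
    if c_sf < th[("C_SF", 3)]:                             # stage 2
        return "music"
    if delta_c > th[("delta_c", 3)]:                       # stage 3
        return "speech"
    if e_l > th[("E_l", 2)]:                               # stage 4
        return "music"
    if e_l < th[("E_l", 3)]:                               # stage 5
        return "speech"
    if _in_interval(hss, th[("HSS", 3)], th[("HSS", 7)]):  # stage 6
        return "music"
    if (_in_interval(hss, th[("HSS", 2)], th[("HSS", 6)])  # stage 7
            and _in_interval(z_n, th[("Z_n", 1)], th[("Z_n", 3)])):
        return "music"
    return "speech"
```

Because the stages return as soon as one decides, the feature order is fixed, matching the statement that classification does not change the order of the stage-wise decisions.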

The music-like rule classification proceeds as shown in Fig. 7:

Stage 1 compares the MDCT spectral centroid variation δ_c with its first judgment threshold and tests whether δ_c is greater than it.

Stage 2 compares δ_c with its second judgment threshold and tests whether δ_c is less than or equal to it.

Stage 3 compares HSS with its fourth and second judgment thresholds, and the short-time zero-crossing rate Z_n with its first and second judgment thresholds. The current frame is output as a music frame if both of these conditions hold:

- HSS lies in the interval bounded by its fourth and second thresholds;
- Z_n lies in the interval bounded by its first and second thresholds.

Otherwise, the process proceeds to stage 4.

Stage 4 compares HSS with its fifth judgment threshold. If HSS is greater, the current frame is output as a speech frame; otherwise, the process proceeds to stage 5.

Stage 5 compares HSS with its first judgment threshold and C_SF with its first judgment threshold. It tests whether HSS is greater than its threshold and C_SF is greater than its threshold.

Stage 6 compares E_l with its first judgment threshold and C_SF with its fourth judgment threshold. If E_l is less than its threshold and C_SF is greater than its threshold, the frame is judged to be a speech frame; otherwise, it is judged to be a music frame. The music-like rule classifier then outputs the audio signal classification result.

Classification result correction module: this module stores the history of classification results. The history holds the original (uncorrected) results of the T−1 frames preceding the current frame, together with the current frame's result. The original result is kept in either of these cases:

- the current frame is a silent frame;
- only two frames among the original results in the history are non-silent.

Otherwise, the most frequent audio signal class in the history is output as the classification result of the current frame. T = 10 in this embodiment. The block diagram of the classification result correction module is shown in Fig. 8.
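The correction module can be sketched in Python with a fixed-length deque and a majority vote. The sketch makes four assumptions:

- Silent frames are stored under the hypothetical label "silence".
- The vote counts only non-silent labels.
- "Only two frames are non-silent" is read as "at most two".
- Ties go to the label seen first, which is how `Counter.most_common` orders equal counts.

The class name is hypothetical.

```python
from collections import Counter, deque
from typing import Deque


class ResultCorrector:
    """Majority-vote smoothing over the last T raw classification results (Fig. 8)."""

    def __init__(self, t: int = 10) -> None:
        # Raw results of the previous T-1 frames plus the current frame.
        self.history: Deque[str] = deque(maxlen=t)

    def update(self, raw_label: str) -> str:
        """Store the current raw result and return the corrected result."""
        self.history.append(raw_label)
        if raw_label == "silence":
            return raw_label  # silent frame: keep the original result
        non_silent = [lab for lab in self.history if lab != "silence"]
        if len(non_silent) <= 2:  # assumed reading of "only two non-silent frames"
            return raw_label
        # Most frequent non-silent label; equal counts keep first-seen order.
        return Counter(non_silent).most_common(1)[0][0]
```

For example, after six "speech" and three "music" results, a raw "music" frame is corrected to "speech", because speech still holds the majority among the last ten results.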

The multi-level decision thresholds for the audio feature parameters used in this real-time audio signal classification device are listed in Table 1.

The classification performance of the technical solution of the invention is evaluated below.

The evaluation uses 71 audio samples in total:

- audio material from EBU SQAM;
- for Chinese speech, the sample "good story is not beautiful" from the national standard GSBM 6001-89 acoustic evaluation samples.

The signals are original audio sampled at 16 kHz, with a frame length of 20 ms. The results are given in Table 2.

Table 1: Multi-level decision thresholds for the audio feature parameters

Table 2: Audio signal classification test results

Signal class	Category	Accuracy (%)
Music	Single-frequency audio	99.6
Music	Electronic instrument	96.9
Music	String instrument	96.6
Music	Wind instrument	97.8
Music	Percussion instrument	94.5
Music	Accordion	95.0
Speech	Male voice	95.6
Speech	Female voice	96.9

With the technical solution of the invention, the average correct classification rate is 96.22% for speech and 96.23% for music, which indicates good classification performance. The solution extracts audio features in the MDCT domain that existing speech/audio codecs already use, which avoids the complex computation of an additional transform and makes audio classification faster. Because the classification runs in real time, it can effectively improve the efficiency of audio processing such as audio transmission and audio coding.

Claims

1. A real-time audio signal classification method, characterized in that it comprises:

dividing the input audio signal into frames and high-pass filtering it; performing silence detection on the current frame; computing the MDCT; and extracting audio features in the time domain and the MDCT domain; wherein, while the current frame lies within the classification convergence period of I frames, it is classified with a single audio feature, and once the current frame lies after that period, it is classified with the coarse-to-fine classification rules; and wherein, after the coarse-to-fine classification, the class of the current frame is updated according to the history of the classes of the signal frames preceding it.

2. The method according to claim 1, characterized in that the short-time zero-crossing rate is used for silence detection of the current frame, and if the short-time zero-crossing rate of the current frame is greater than the first set value, the current frame is marked as a non-silent frame.

3. The method according to claim 1, characterized in that the MDCT is applied to each processed audio frame and a series of audio features is extracted in the time domain and the MDCT domain, the audio features comprising the short-time zero-crossing rate, the MDCT spectral harmonic structure stability, the MDCT spectral sub-band energy change statistic, the MDCT spectral centroid variation, the MDCT spectral sub-band energy, and the sum of the absolute values of the first four MDCT spectral coefficients.

4. The method according to claim 1, characterized in that single-feature classification is used while the current frame lies within the classification convergence period of I frames, the single feature being the MDCT spectral sub-band energy, and if the first sub-band energy of the current frame is greater than the second set value, the current frame is marked as a speech frame.

5. The method according to claim 1, characterized in that, once the current frame lies after the classification convergence period of I frames, multi-level feature classification with the coarse-to-fine rules is used, the multi-level features being the short-time zero-crossing rate, the MDCT spectral harmonic structure stability, the MDCT spectral sub-band energy change statistic, the MDCT spectral centroid variation, the MDCT spectral sub-band energy, and the sum of the absolute values of the first four MDCT spectral coefficients.

6. The method according to claim 5, characterized in that the current frame is coarsely classified, the coarse classification feature being the first sub-band of the MDCT spectral sub-band energy; if it is greater than the second set value, the current frame is marked as a speech-like frame, and otherwise as a music-like frame.

7. The method according to claim 5 or 6, characterized in that the coarsely classified signal frame is finely classified by combining multiple features; at each stage of the fine classification, one audio feature is compared with its corresponding set value to decide the signal type, and the order of the stage-wise feature decisions does not change during classification.

8. The method according to claim 1, characterized in that the history of classification results is stored and combined with the current frame's result, and the most frequent class is taken as the classification result of the current frame; if the current frame is a silent frame, or only two frames among the original results in the classification history are non-silent, the original result is kept.

9. The method according to claim 1, 2, 4 or 7, characterized in that the first set value and the second set value are given thresholds, and the corresponding set values are a series of given thresholds.

10. A real-time audio signal classification device, characterized in that the device comprises, connected to one another, a preprocessing module, a feature extraction module, a coarse-to-fine rule classification module and a classification result correction module, wherein:

the classification result correction module corrects the original classification results and finally outputs an accurate audio signal classification result.

11. The method according to claim 10, characterized in that the short-time zero-crossing rate is used for silence detection of the current frame, and if the short-time zero-crossing rate of the current frame is greater than the first set value, the current frame is marked as a non-silent frame.

12. The method according to claim 10, characterized in that the MDCT is applied to each processed audio frame and a series of audio features is extracted in the time domain and the MDCT domain, the audio features comprising the short-time zero-crossing rate, the MDCT spectral harmonic structure stability, the MDCT spectral sub-band energy change statistic, the MDCT spectral centroid variation, the MDCT spectral sub-band energy, and the sum of the absolute values of the first four MDCT spectral coefficients.

13. The method according to claim 10, characterized in that single-feature classification is used while the current frame lies within the classification convergence period of I frames, the single feature being the MDCT spectral sub-band energy, and if the first sub-band energy of the current frame is greater than the second set value, the current frame is marked as a speech frame.

14. The method according to claim 10, characterized in that, once the current frame lies after the classification convergence period of I frames, multi-level feature classification with the coarse-to-fine rules is used, the multi-level features being the short-time zero-crossing rate, the MDCT spectral harmonic structure stability, the MDCT spectral sub-band energy change statistic, the MDCT spectral centroid variation, the MDCT spectral sub-band energy, and the sum of the absolute values of the first four MDCT spectral coefficients.

15. The method according to claim 14, characterized in that the current frame is coarsely classified, the coarse classification feature being the first sub-band of the MDCT spectral sub-band energy; if it is greater than the second set value, the current frame is marked as a speech-like frame, and otherwise as a music-like frame.

16. The method according to claim 14 or 15, characterized in that the coarsely classified signal frame is finely classified by combining multiple features; at each stage of the fine classification, one audio feature is compared with its corresponding set value to decide the signal type, and the order of the stage-wise feature decisions does not change during classification.

17. The method according to claim 10, characterized in that the history of classification results is stored and combined with the current frame's result, and the most frequent class is taken as the classification result of the current frame; if the current frame is a silent frame, or only two frames among the original results in the classification history are non-silent, the original result is kept.

18. The method according to claim 10, 11, 13 or 16, characterized in that the first set value and the second set value are given thresholds, and the corresponding set values are a series of given thresholds.