CN102543079A - Method and equipment for classifying audio signals in real time - Google Patents

Method and equipment for classifying audio signals in real time

Info

Publication number
CN102543079A
Authority
CN
China
Prior art keywords
frame
classification
mdct
present frame
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011104309646A
Other languages
Chinese (zh)
Inventor
林志斌
孔庆胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN2011104309646A
Publication of CN102543079A

Abstract

The invention discloses a method and equipment for classifying audio signals in real time, belonging to the technical field of audio encoding and decoding. The method comprises the following steps: preprocessing the input audio signal; extracting multi-level audio features in the time domain and the MDCT domain; classifying the current frame using a single audio feature if the current frame lies within the classification convergence time frame I; classifying the current frame using the coarse-to-fine classification rule, with the multi-level features as decision features, if the current frame lies after the classification convergence time frame I; and, after the current frame has been classified by the coarse-to-fine classification rule, updating the class of the current frame according to the historical classification state of the signal frames preceding it. With the method and the equipment, signals can be classified in real time simply and accurately.

Description

A real-time audio signal classification method and device
I. Technical field
The present invention relates to the field of audio coding, decoding and transmission, and in particular to a real-time audio signal classification method and device.
II. Background
Classifying a signal before it is encoded, transmitted or otherwise processed can effectively improve the efficiency of coding and transmission. Because multimedia audio is transmitted within a real-time transmission framework, real-time classification of audio signals is an important research topic.
Research on audio signal classification, both domestic and international, has mostly concentrated on long-duration classification, for example classification by low-energy ratio or by short-time energy over durations of 1 second or 10 seconds. The classifiers are generally statistical, such as support vector machines and neural network classifiers. Because of their long processing time, these methods are of limited practical use for real-time audio classification.
At present, audio classification algorithms are mostly implemented in the time domain or the frequency domain, whereas popular coding schemes such as MP3 and AAC already use the MDCT. Extracting and analysing features directly in the MDCT domain and the time domain avoids extra computation and effectively improves the efficiency of feature extraction. Combined with suitable classification rules, this makes it possible to design a device that classifies audio signals quickly and in real time.
III. Summary of the invention
1. Purpose of the invention: The purpose of the invention is to provide a real-time audio signal classification method and device that classify quickly and in real time, reduce extra computation, and improve the accuracy of real-time audio signal classification, so that audio signal classification can play its important role in audio coding and audio transmission.
2. Technical solution: To achieve the above purpose, the invention discloses a real-time audio signal classification method, comprising:
The input audio signal is divided into frames and high-pass filtered; silence detection is performed on the current frame; the MDCT is computed; and audio features are extracted in the time domain and the MDCT domain. If the current frame lies within the classification convergence time frame I, it is classified using a single audio feature. If the current frame lies after the classification convergence time frame I, it is classified using the coarse-to-fine classification rule. After the current frame has been classified by the coarse-to-fine classification rule, its class is updated according to the historical classification state of the signal frames preceding it.
Further, in the method, silence detection on the current frame uses the short-time zero-crossing rate: if the short-time zero-crossing rate of the current frame is greater than the first set value, the current frame is marked as a non-silent frame.
The MDCT is applied to each processed frame of the audio signal, and a series of audio features is extracted in the time domain and the MDCT domain. The audio features comprise the short-time zero-crossing rate, the MDCT spectral harmonic structure stability, the MDCT spectral sub-band energy variation statistic, the MDCT spectral centroid change, the MDCT spectral sub-band energy, and the sum of the absolute values of the first four MDCT spectral coefficients.
Further, in the method, when the current frame lies within the classification convergence time frame I, single-feature classification is used. The single feature is the MDCT spectral sub-band energy: if the energy of the first sub-band of the current frame is greater than the second set value, the current frame is marked as a speech frame.
When the current frame lies after the classification convergence time frame I, multi-level feature classification is performed using the coarse-to-fine classification rule. The multi-level features are the short-time zero-crossing rate, the MDCT spectral harmonic structure stability, the MDCT spectral sub-band energy variation statistic, the MDCT spectral centroid change, the MDCT spectral sub-band energy, and the sum of the absolute values of the first four MDCT spectral coefficients.
The current frame is first coarsely classified. The coarse classification feature is the energy of the first MDCT spectral sub-band: if it is greater than the second set value, the current frame is marked as a speech-like frame; otherwise it is marked as a music-like frame.
Further, in the method, the signal frame after coarse classification is finely classified by combining multiple features: at each decision level of the fine classification, an audio feature is compared with its corresponding set value to decide the signal type, and the order of the decision levels is not changed during classification.
The historical state of the classification results is stored and combined with the classification result of the current frame, and the class that occurs most frequently is taken as the classification result of the current frame. If the current frame is a silent frame, or only two frames among the historical original classification results are non-silent frames, the original classification result is kept.
The first set value and the second set value are given thresholds, and the corresponding set values are a series of given thresholds.
The invention also provides a real-time audio signal classification device, comprising a pre-processing module, a feature extraction module, a coarse-to-fine rule classification module and a classification result correction module connected to one another, wherein:
The pre-processing module performs pre-processing and silence detection on the audio signal;
The feature extraction module performs real-time feature extraction on the processed audio signal in the time domain and the MDCT domain;
The coarse-to-fine rule classification module arranges the extracted audio features according to a given rule and classifies the frame by the coarse-to-fine classification rule;
The classification result correction module corrects the original classification result and finally outputs an accurate audio signal classification result.
Further, in the device, the pre-processing module performs silence detection on the current frame using the short-time zero-crossing rate: if the short-time zero-crossing rate of the current frame is greater than the first set value, the current frame is marked as a non-silent frame.
Further, in the device, the feature extraction module applies the MDCT to each processed frame of the audio signal and extracts a series of audio features in the time domain and the MDCT domain. The audio features comprise the short-time zero-crossing rate, the MDCT spectral harmonic structure stability, the MDCT spectral sub-band energy variation statistic, the MDCT spectral centroid change, the MDCT spectral sub-band energy, and the sum of the absolute values of the first four MDCT spectral coefficients.
Further, in the device, the coarse-to-fine rule classification module uses single-feature classification when the current frame lies within the classification convergence time frame I. The single feature is the MDCT spectral sub-band energy: if the energy of the first sub-band of the current frame is greater than the second set value, the current frame is marked as a speech frame.
When the current frame lies after the classification convergence time frame I, multi-level feature classification is performed using the coarse-to-fine classification rule. The multi-level features are the short-time zero-crossing rate, the MDCT spectral harmonic structure stability, the MDCT spectral sub-band energy variation statistic, the MDCT spectral centroid change, the MDCT spectral sub-band energy, and the sum of the absolute values of the first four MDCT spectral coefficients.
The current frame is first coarsely classified. The coarse classification feature is the energy of the first MDCT spectral sub-band: if it is greater than the second set value, the current frame is marked as a speech-like frame; otherwise it is marked as a music-like frame.
The signal frame after coarse classification is finely classified by combining multiple features: at each decision level of the fine classification, an audio feature is compared with its corresponding set value to decide the signal type, and the order of the decision levels is not changed during classification.
Further, in the device, the classification result correction module stores the historical state of the classification results, combines it with the classification result of the current frame, and takes the class that occurs most frequently as the classification result of the current frame. If the current frame is a silent frame, or only two frames among the historical original classification results are non-silent frames, the original classification result is kept.
The first set value and the second set value are given thresholds, and the corresponding set values are a series of given thresholds.
The technical solution improves the accuracy of real-time audio signal classification by means of simple coarse-to-fine classification rules, and thereby greatly improves audio coding and decoding efficiency. It can be used for audio signal class decisions in audio coding and decoding applications such as real-time two-way communication (for example wireless communication and IP video conferencing) and real-time broadcast services.
IV. Description of the drawings
Fig. 1 is a block diagram of audio signal classification applied in an audio coding device.
Fig. 2 is a structural block diagram of the real-time audio signal classification device.
Fig. 3 is a block diagram of the silence decision for the audio signal.
Fig. 4 is a block diagram of single-feature classification within the classification convergence time frame I.
Fig. 5 is a block diagram of single-feature coarse classification after the classification convergence time frame I.
Fig. 6 is a block diagram of the speech-like signal classification rules.
Fig. 7 is a block diagram of the music-like signal classification rules.
Fig. 8 is a block diagram of the classification result correction module.
V. Detailed description of the embodiments
The main idea of the invention is that, before a speech/audio codec encodes a signal, a real-time audio signal classification method decides the class of the signal (see Fig. 1). An encoder suited to speech or to audio is then selected according to the decided class, which improves the coding efficiency of the speech/audio codec for different signal types. The detailed process is as follows:
Step 1: the signal is divided into frames and high-pass filtered to remove unwanted low-frequency components. Silence detection on the current frame uses the short-time zero-crossing rate: when the short-time zero-crossing rate of the current frame is greater than the first set value, the current frame is marked as a non-silent frame;
Step 2: the MDCT is computed, and a series of audio features is extracted in the time domain and the MDCT domain. The audio features comprise the short-time zero-crossing rate, the MDCT spectral harmonic structure stability, the MDCT spectral sub-band energy variation statistic, the MDCT spectral centroid change, the MDCT spectral sub-band energy, and the sum of the absolute values of the first four MDCT spectral coefficients;
Step 3: while the frame lies within the classification convergence time frame I, the signal class is decided by single-feature classification;
Step 4: once the frame lies after the classification convergence time frame I, multi-level feature classification is performed using the coarse-to-fine classification rule. The multi-level features are the short-time zero-crossing rate, the MDCT spectral harmonic structure stability, the MDCT spectral sub-band energy variation statistic, the MDCT spectral centroid change, the MDCT spectral sub-band energy, and the sum of the absolute values of the first four MDCT spectral coefficients;
Step 5: coarse classification of the signal class. The coarse classification feature is the energy of the first MDCT spectral sub-band: if it is greater than the second set value, the current frame is marked as a speech-like frame; otherwise it is marked as a music-like frame;
Step 6: the signal frame after coarse classification is finely classified by combining multiple features. At each decision level of the fine classification, an audio feature is compared with its corresponding set value to decide the signal type, and the order of the decision levels is not changed during classification;
Step 7: the historical state of the classification results is stored and combined with the classification result of the current frame, and the class that occurs most frequently is taken as the classification result of the current frame. If the current frame is a silent frame, or only two frames among the historical original classification results are non-silent frames, the original classification result is kept.
The solution of the invention is described in further detail below with reference to the drawings and the embodiment.
A real-time audio signal classification device, as shown in Fig. 2, comprises a pre-processing module, a feature extraction module, a coarse-to-fine rule classification module and a classification result correction module connected to one another. The function of each module is described below.
The pre-processing module first divides the audio stream x(n) into frames and applies high-pass filtering, and then performs silence detection using the short-time average zero-crossing rate, as shown in Fig. 3. When the short-time average zero-crossing rate is greater than the first set value, the frame is judged to be a non-silent frame. The short-time average zero-crossing rate is calculated by formula 1:
$$Z_n = \frac{1}{2N}\sum_{m=-\infty}^{\infty}\left|\operatorname{sgn}[x(m)] - \operatorname{sgn}[x(m-1)]\right| h(n-m)$$ (formula 1)
where N is the frame length and sgn[·] is the sign function, that is:
$$\operatorname{sgn}[x(n)] = \begin{cases} 1, & x(n) \ge 0 \\ -1, & x(n) < 0 \end{cases}$$ (formula 2)
The window function h(n) is a rectangular window, that is:
$$h(n) = \begin{cases} 1, & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$ (formula 3)
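The silence decision described above can be sketched in a few lines. This is a minimal illustration, assuming a rectangular window that covers exactly one frame; the function names, the `prev_sample` argument and the threshold passed as `first_set_value` are hypothetical, since the patent gives the first set value only in a table that is not reproduced in this text.

```python
import numpy as np

def short_time_zcr(frame, prev_sample=0.0):
    """Short-time average zero-crossing rate of one frame (formula 1).

    Assumes the rectangular window spans exactly this frame.
    prev_sample is the last sample of the preceding frame (hypothetical argument).
    """
    x = np.concatenate(([prev_sample], np.asarray(frame, dtype=float)))
    sign = np.where(x >= 0, 1, -1)           # formula 2: sgn is 1 for x >= 0, else -1
    return np.abs(np.diff(sign)).sum() / (2 * len(frame))

def is_non_silent(frame, first_set_value):
    """A frame is non-silent when its zero-crossing rate exceeds the first set value."""
    return short_time_zcr(frame) > first_set_value
```

For example, `short_time_zcr([1, -1, 1, -1])` counts three sign changes over four samples and evaluates to 0.75.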
The feature extraction module performs the MDCT after pre-processing and extracts the time-domain and MDCT-domain features. First, the MDCT (Modified Discrete Cosine Transform) is applied to obtain the frequency-domain coefficients:
The N-point time-domain data x(n) of the current frame and the N-point time-domain data x(n-N) of the previous frame are overlapped to form 2N points of time-domain data, on which the MDCT is performed. In this embodiment the signal is sampled at 16 kHz and N is 320.
$$X(k) = \sum_{n=0}^{2N-1} x(n)\, w(n) \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right], \quad k = 0, \ldots, N-1$$ (formula 4)
where w(n) is the sine window function:
$$w(n) = \sin\left[\frac{\pi}{2N}\left(n + \frac{1}{2}\right)\right], \quad n = 0, \ldots, 2N-1$$ (formula 5)
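Formulas 4 and 5 can be evaluated directly as a matrix product. The sketch below is a straightforward O(N²) illustration written with NumPy, not a fast implementation such as a codec would use; the function name `mdct` and its two-argument form are assumptions made for the example.

```python
import numpy as np

def mdct(prev_frame, cur_frame):
    """Direct MDCT of the 2N-point block [previous frame, current frame].

    Implements formula 4 with the sine window of formula 5.
    Returns the N coefficients X(0) .. X(N-1).
    """
    block = np.concatenate((prev_frame, cur_frame)).astype(float)
    two_n = block.size
    n_half = two_n // 2                                   # N in the formulas
    n = np.arange(two_n)
    k = np.arange(n_half)
    window = np.sin(np.pi / two_n * (n + 0.5))            # formula 5
    basis = np.cos(np.pi / n_half * np.outer(k + 0.5, n + 0.5 + n_half / 2))
    return basis @ (block * window)                       # formula 4
```

With the embodiment's values (16 kHz sampling, N = 320), each call takes two 320-sample frames and returns 320 coefficients.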
Time-domain and MDCT-domain feature extraction:
(1) Short-time zero-crossing rate Z_n. Its judgment thresholds are given in Table 1.
(2) MDCT spectral harmonic structure stability HSS:
Step 1: search the MDCT spectrum of each frame for its peak points, denoted P_l, where P_l is the l-th peak of the frame;
Step 2: convert P_l to a normalized logarithmic scale, denoted LP_l, as in formula 6:
$$LP_l = \log(P_l) - \log\left(\sum_{l} P_l\right), \quad l = 1, \ldots, L$$ (formula 6)
where L is the index of the last peak;
Step 3: calculate the variance of the LP_l values; this variance is HSS. The judgment thresholds of HSS are given in Table 1.
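The three HSS steps can be sketched as follows. The patent does not say how peak points are detected, so this example assumes a peak is a magnitude value strictly larger than both of its neighbours; that rule and the function name are illustrative assumptions.

```python
import numpy as np

def harmonic_structure_stability(X):
    """HSS: variance of the log-normalised spectral peaks (formula 6).

    Peak picking is an assumption: a peak is a magnitude strictly greater
    than both neighbours. Returns 0.0 when no peak is found.
    """
    mag = np.abs(np.asarray(X, dtype=float))
    inner = mag[1:-1]
    is_peak = (inner > mag[:-2]) & (inner > mag[2:])
    peaks = inner[is_peak]                        # P_l, l = 1 .. L
    if peaks.size == 0:
        return 0.0
    lp = np.log(peaks) - np.log(peaks.sum())      # formula 6
    return float(np.var(lp))
```

Equal-height peaks give an HSS of zero, while peaks of very different heights give a large HSS.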
(3) MDCT spectral sub-band energy E_b
The MDCT spectral coefficients are divided equally into M sub-bands; in this embodiment M is 32. The sub-band energy is calculated by formula 7:
$$E_b(j) = \sum_{k=\frac{N}{M}j}^{\frac{N}{M}(j+1)} X(k)\,X(k), \quad j = 0, \ldots, M-1$$ (formula 7)
where j is the sub-band index. The judgment threshold of the first sub-band energy E_b(0) is given in Table 1.
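A short sketch of the sub-band energy computation follows. It assumes the sub-bands do not overlap, so each holds N/M consecutive coefficients (10 coefficients when N = 320 and M = 32); the function name is hypothetical.

```python
import numpy as np

def subband_energies(X, num_bands=32):
    """Energy of each of num_bands equal-width MDCT sub-bands (formula 7).

    Assumes non-overlapping sub-bands of len(X) // num_bands coefficients.
    """
    X = np.asarray(X, dtype=float)
    width = X.size // num_bands
    return (X[: width * num_bands] ** 2).reshape(num_bands, width).sum(axis=1)
```

The first element of the result is the first sub-band energy E_b(0) used by the single-feature and coarse classification steps.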
(4) MDCT spectral sub-band energy variation statistic C_SF
Calculate the MDCT spectral flux SF(j):
$$SF(j) = \sum_{i=1}^{Q}\left|\log E_b(i,j) - \log E_b(i-1,j)\right|$$ (formula 8)
where E_b(i,j) is the energy of the j-th sub-band in time frame i and Q is the number of time frames over which the spectral flux is calculated; Q is 6 in this embodiment.
Count the number C_SF of values SF(j) that exceed the set value THR_SF. The judgment thresholds of C_SF are given in Table 1.
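The spectral flux of formula 8 and the count C_SF can be sketched as below. The example assumes the caller keeps the sub-band energies of the last Q + 1 frames as rows of an array; the small `eps` added before the logarithm is a safeguard introduced here, not part of the patent, and the function name is hypothetical.

```python
import numpy as np

def flux_count(energy_history, thr_sf, eps=1e-12):
    """Return (C_SF, SF) from sub-band energies of Q + 1 consecutive frames.

    energy_history: array of shape (Q + 1, M), oldest frame first.
    eps guards against log(0) and is an assumption of this sketch.
    """
    log_e = np.log(np.asarray(energy_history, dtype=float) + eps)
    sf = np.abs(np.diff(log_e, axis=0)).sum(axis=0)     # formula 8, one value per sub-band
    return int((sf > thr_sf).sum()), sf
```

With Q = 6 and M = 32 as in the embodiment, `energy_history` has 7 rows and 32 columns, and C_SF ranges from 0 to 32.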
(5) MDCT spectral centroid change δ_c:
Step 1: calculate the MDCT spectral centroid of each frame:
$$SC = \sum_{k=0}^{N-1} p(k)\,F(k)$$ (formula 9)
where F(k) = k + 1 and p(k) is calculated by formula 10:
$$p(k) = \Omega(k) / \max_k \Omega(k)$$ (formula 10)
where Ω(k) = |X(k)|.
Step 2: calculate the MDCT spectral centroid change:
$$\delta_c = \sum_{t=i-O+1}^{i}\left|SC(t) - SC(t-1)\right|$$ (formula 11)
where i is the index of the current frame and O is the number of adjacent frames used in the calculation; O is 4 in this embodiment. The judgment thresholds of δ_c are given in Table 1.
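Formulas 9 to 11 can be sketched as two small functions. The example assumes the caller keeps a running list of per-frame centroid values; the function names and the zero result for an all-zero spectrum are choices made for this illustration.

```python
import numpy as np

def spectral_centroid(X):
    """MDCT spectral centroid SC of one frame (formulas 9 and 10)."""
    omega = np.abs(np.asarray(X, dtype=float))         # Omega(k) = |X(k)|
    peak = omega.max()
    if peak == 0.0:                                    # all-zero frame: assumption, return 0
        return 0.0
    return float(np.sum(omega / peak * np.arange(1, omega.size + 1)))

def centroid_change(sc_history, num_frames=4):
    """delta_c: summed absolute centroid differences over the last num_frames frames."""
    sc = np.asarray(sc_history, dtype=float)[-(num_frames + 1):]
    return float(np.abs(np.diff(sc)).sum())            # formula 11
```

A steady tone keeps the centroid nearly constant and δ_c small, whereas a rapidly changing spectrum produces a larger δ_c.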
(6) Sum of the absolute values of the first four MDCT spectral coefficients, E_l
Calculate the sum of the absolute values of the first four MDCT spectral coefficients and denote the result E_l. The judgment thresholds of E_l are given in Table 1.
The coarse-to-fine rule classification module implements a rule-based classification method built on coarse and fine classification. It comprises the following process:
When the current frame lies within the classification convergence time frame I, single-feature classification is used, and the single feature is the MDCT spectral sub-band energy E_b. As shown in Fig. 4, if the first sub-band energy E_b(0) is greater than its judgment threshold, the frame is judged to be a speech signal frame; otherwise it is a music signal frame.
When the current frame lies after the classification convergence time frame I, a single feature is used for coarse classification, as shown in Fig. 5: if the first sub-band energy E_b(0) is greater than its judgment threshold, the frame is judged to be a speech-like signal frame; otherwise it is a music-like signal frame.
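The two uses of the first sub-band energy can be sketched together. The patent does not state the length of the classification convergence time frame I in this text, so `convergence_frames` is a hypothetical parameter, as are the function name, the string labels and the threshold argument.

```python
def coarse_classify(e_b0, threshold, frame_index, convergence_frames):
    """Single-feature decision on the first sub-band energy E_b(0).

    Within the convergence period the decision is final ('speech' or 'music');
    afterwards it is only a coarse label that the fine rules refine.
    convergence_frames is a hypothetical stand-in for the period I.
    """
    above = e_b0 > threshold
    if frame_index < convergence_frames:
        return "speech" if above else "music"
    return "speech-like" if above else "music-like"
```

A frame labelled `speech-like` would then be passed to the speech-like rule cascade, and a frame labelled `music-like` to the music-like rule cascade.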
The signal frame after coarse classification is then finely classified by combining multiple features: at each decision level of the fine classification, an audio feature is compared with its corresponding set value to decide the signal type, and the order of the decision levels is not changed during classification.
The classification process of the speech-like classification rules is shown in Fig. 6 and proceeds as follows:
First level: compare the MDCT spectral sub-band energy variation statistic C_SF with the second judgment threshold of C_SF. If C_SF is greater than the threshold, output the current frame as a speech signal frame; otherwise go to the second level;
Second level: compare C_SF with the third judgment threshold of C_SF. If C_SF is less than the threshold, output the current frame as a music signal frame; otherwise go to the third level;
Third level: compare the MDCT spectral centroid change δ_c with the third judgment threshold of δ_c. If δ_c is greater than the threshold, output the current frame as a speech signal frame; otherwise go to the fourth level;
Fourth level: compare the sum of the absolute values of the first four MDCT spectral coefficients, E_l, with the second judgment threshold of E_l. If E_l is greater than the threshold, output the current frame as a music signal frame; otherwise go to the fifth level;
Fifth level: compare E_l with the third judgment threshold of E_l. If E_l is less than the threshold, output the current frame as a speech signal frame; otherwise go to the sixth level;
Sixth level: compare the MDCT spectral harmonic structure stability HSS with the third and seventh judgment thresholds of HSS. If HSS lies in the interval defined by these two thresholds, the frame is judged to be a music signal frame; otherwise go to the seventh level;
Seventh level: compare HSS with the second and sixth judgment thresholds of HSS, and at the same time compare the short-time zero-crossing rate Z_n with the first and third judgment thresholds of Z_n. If HSS lies in the interval defined by its two thresholds and Z_n lies in the interval defined by its two thresholds, the frame is judged to be a music signal frame; otherwise it is judged to be a speech signal frame. The speech-like classification rule module then outputs the audio signal classification result.
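The seven-level speech-like cascade can be sketched as an ordered chain of comparisons. The dictionary keys (`c_sf_2`, `hss_7`, and so on) are hypothetical names for the numbered judgment thresholds, and the threshold values must be supplied by the caller because Table 1 is not reproduced in this text. Treating each interval as closed and spanned by its two thresholds is also an assumption.

```python
def in_interval(value, a, b):
    """True if value lies in the closed interval spanned by thresholds a and b (assumption)."""
    return min(a, b) <= value <= max(a, b)

def classify_speech_like(f, t):
    """Fine classification of a speech-like frame; f holds features, t holds thresholds."""
    if f["c_sf"] > t["c_sf_2"]:                       # level 1
        return "speech"
    if f["c_sf"] < t["c_sf_3"]:                       # level 2
        return "music"
    if f["delta_c"] > t["delta_c_3"]:                 # level 3
        return "speech"
    if f["e_l"] > t["e_l_2"]:                         # level 4
        return "music"
    if f["e_l"] < t["e_l_3"]:                         # level 5
        return "speech"
    if in_interval(f["hss"], t["hss_3"], t["hss_7"]):  # level 6
        return "music"
    if (in_interval(f["hss"], t["hss_2"], t["hss_6"])
            and in_interval(f["zcr"], t["zcr_1"], t["zcr_3"])):  # level 7
        return "music"
    return "speech"
```

Because each level returns as soon as its condition holds, the order of the levels is fixed, which matches the requirement that the decision order is not changed during classification.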
The classification process of the music-like classification rules is shown in Fig. 7 and proceeds as follows:
First level: compare the MDCT spectral centroid change δ_c with the first judgment threshold of δ_c. If δ_c is greater than the threshold, output the current frame as a speech signal frame; otherwise go to the second level;
Second level: compare δ_c with the second judgment threshold of δ_c. If δ_c is less than or equal to the threshold, output the current frame as a music signal frame; otherwise go to the third level;
Third level: compare the MDCT spectral harmonic structure stability HSS with the fourth and second judgment thresholds of HSS, and at the same time compare the short-time zero-crossing rate Z_n with the first and second judgment thresholds of Z_n. If HSS lies in the interval defined by its two thresholds and Z_n lies in the interval defined by its two thresholds, output the current frame as a music signal frame; otherwise go to the fourth level;
Fourth level: compare HSS with the fifth judgment threshold of HSS. If HSS is greater than the threshold, output the current frame as a speech signal frame; otherwise go to the fifth level;
Fifth level: compare HSS with the first judgment threshold of HSS, and at the same time compare the MDCT spectral sub-band energy variation statistic C_SF with the first judgment threshold of C_SF. If HSS is greater than its threshold and C_SF is greater than its threshold, output the current frame as a speech signal frame; otherwise go to the sixth level;
Sixth level: compare the sum of the absolute values of the first four MDCT spectral coefficients, E_l, with the first judgment threshold of E_l, and at the same time compare C_SF with the fourth judgment threshold of C_SF. If E_l is less than its threshold and C_SF is greater than its threshold, the frame is judged to be a speech signal frame; otherwise it is judged to be a music signal frame. The music-like classification rule module then outputs the audio signal classification result.
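The six-level music-like cascade can be sketched as an ordered rule table. As with any illustration of these rules, the dictionary keys are hypothetical names for the numbered judgment thresholds, the values must come from Table 1, and the closed-interval test is an assumption.

```python
def between(value, a, b):
    """True if value lies in the closed interval spanned by a and b (assumption)."""
    return min(a, b) <= value <= max(a, b)

def classify_music_like(f, t):
    """Fine classification of a music-like frame; f holds features, t holds thresholds."""
    rules = [
        (f["delta_c"] > t["delta_c_1"], "speech"),                     # level 1
        (f["delta_c"] <= t["delta_c_2"], "music"),                     # level 2
        (between(f["hss"], t["hss_4"], t["hss_2"])
         and between(f["zcr"], t["zcr_1"], t["zcr_2"]), "music"),      # level 3
        (f["hss"] > t["hss_5"], "speech"),                             # level 4
        (f["hss"] > t["hss_1"] and f["c_sf"] > t["c_sf_1"], "speech"),  # level 5
        (f["e_l"] < t["e_l_1"] and f["c_sf"] > t["c_sf_4"], "speech"),  # level 6
    ]
    for fired, label in rules:       # the first rule that fires decides, in fixed order
        if fired:
            return label
    return "music"                   # level 6 fallback
```

Keeping the rules in a list makes the fixed decision order explicit and keeps each level on one line next to its label.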
The classification result correction module stores the historical state of the classification results, that is, the original classification results of the T-1 frames preceding the current frame together with the classification result of the current frame. If the current frame is a silent frame, or only two frames among the historical original classification results are non-silent frames, the original classification result is kept; otherwise the audio signal class that occurs most frequently is taken as the classification result of the current frame. T is 10 in this embodiment. The block diagram of the classification result correction module is shown in Fig. 8.
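The correction step amounts to a majority vote over a sliding window of original results. The sketch below is one possible reading: it interprets "only two non-silent frames" as "at most two", counts only non-silent results in the vote, and uses the hypothetical labels `speech`, `music` and `silence`; the class and method names are also assumptions.

```python
from collections import Counter, deque

class ResultCorrector:
    """Majority-vote smoothing over the last `window` original results (T = 10 here)."""

    def __init__(self, window=10):
        self.history = deque(maxlen=window)    # original, uncorrected results

    def update(self, label):
        """label: original result for the current frame ('speech', 'music' or 'silence')."""
        self.history.append(label)
        non_silent = [r for r in self.history if r != "silence"]
        # Keep the original result for silent frames or when the history is too thin
        # (interpreted here as at most two non-silent frames; an assumption).
        if label == "silence" or len(non_silent) <= 2:
            return label
        return Counter(non_silent).most_common(1)[0][0]
```

For example, feeding the sequence music, speech, music, music, speech returns music for the fifth frame, because music is the most frequent class in the stored history.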
The multi-level decision thresholds corresponding to the audio feature parameters used in the real-time audio signal classification device are listed in Table 1.
The classification performance of the technical solution is evaluated below.
The evaluation uses the speech and audio material of EBU SQAM; for Chinese, it uses the Chinese samples of the national standard GSBM 6001-89 acoustics assessment sample "good story is not beautiful". There are 71 audio samples in total. The signals are original audio signals, the sampling rate is 16 kHz, and each frame is 20 ms long. The evaluation results are shown in Table 2.
Table 1: Multi-level decision thresholds corresponding to the audio feature parameters
(Table 1 is an image in the original publication and is not reproduced here.)
Table 2: Audio signal classification test results
Signal class                Accuracy (%)
Music
  Single-frequency audio    99.6
  Electronic musical instrument  96.9
  Stringed instrument       96.6
  Wind instrument           97.8
  Percussion instrument     94.5
  Concertina                95.0
Speech
  Male voice                95.6
  Female voice              96.9
The technical solution of the invention achieves an average correct recognition rate of 96.22% for speech and 96.23% for music, which is a good classification result. It extracts the audio features in the MDCT domain commonly used by existing speech/audio codecs, which avoids the complex computation of an additional transform and makes audio signal classification faster. The classification is performed in real time and can effectively improve the efficiency of audio signal processing such as audio transmission and audio coding.

Claims (18)

1. a real-time sound signal sorting technique is characterized in that, comprising:
After the sound signal of input carried out branch frame and high-pass filtering and handle; Carry out the present frame silence detection; Calculate the MDCT conversion; Extract audio frequency characteristics in time domain and MDCT territory, in said present frame is in the convergence time frame I of classification, adopt single audio frequency characteristics classification, if said present frame then adopts the sorting technique of thickness grading rule to classify after being in the convergence time frame I of classification; And behind the said present frame process thickness grading rule classification, upgrade said present frame classification type according to the signal frame classification type historic state before the said present frame.
2. The method according to claim 1, characterized in that the short-time zero-crossing rate is used for silence detection of the current frame: if the short-time zero-crossing rate of the current frame is greater than a first set value, the current frame is set as a non-silent frame.
3. The method according to claim 1, characterized in that the MDCT is applied to each processed frame of the audio signal and a series of audio features is extracted in the time domain and the MDCT domain, the audio features comprising the short-time zero-crossing rate, the MDCT spectral harmonic structure stability, the MDCT spectral sub-band energy change statistic, the MDCT spectral centroid change value, the MDCT spectral sub-band energy, and the sum of the absolute values of the first four parameters of the MDCT spectral coefficients.
4. The method according to claim 1, characterized in that single-feature classification is used when the current frame is within the classification convergence time frame I, the single feature being the MDCT spectral sub-band energy: if the energy of the first sub-band of the current frame is greater than a second set value, the current frame is set as a speech frame.
5. The method according to claim 1, characterized in that multi-level feature classification according to the coarse-to-fine classification rule is used when the current frame is after the classification convergence time frame I, the multi-level features being the short-time zero-crossing rate, the MDCT spectral harmonic structure stability, the MDCT spectral sub-band energy change statistic, the MDCT spectral centroid change value, the MDCT spectral sub-band energy, and the sum of the absolute values of the first four parameters of the MDCT spectral coefficients.
6. The method according to claim 5, characterized in that coarse classification is performed on the current frame, the coarse classification feature being the first sub-band of the MDCT spectral sub-band energy: if it is greater than the second set value, the current frame is set as a speech-like frame; otherwise the current frame is set as a music-like frame.
7. The method according to claim 5 or 6, characterized in that fine classification is performed on the coarsely classified signal frame by combining multiple features, the audio feature at each decision level of the fine classification being compared with its respective set value to determine the signal type, and the order of the level-by-level feature decisions not being changed during classification.
8. The method according to claim 1, characterized in that the historical state of the classification results is stored and combined with the classification result of the current frame, and the classification type that occurs most frequently is used as the classification result of the current frame; if the current frame is a silent frame, or only two frames among the historical original classification results are non-silent frames, the original classification result is kept.
9. The method according to claim 1, 2, 4 or 7, characterized in that the first set value and the second set value are given thresholds, and the respective set values are a series of given thresholds.
10. A real-time audio signal classification device, characterized in that the device comprises a preprocessing module, a feature extraction module, a coarse-to-fine rule classification module and a classification result correction module that are connected to one another, wherein:
the preprocessing module performs preprocessing and silence detection on the audio signal;
the feature extraction module performs real-time feature extraction on the processed audio signal in the time domain and the MDCT domain;
the coarse-to-fine rule classification module arranges the obtained audio features according to a certain rule and classifies them with the method based on the coarse-to-fine classification rule;
the classification result correction module corrects the original classification result and finally outputs an accurate audio signal classification result.
11. The method according to claim 10, characterized in that the short-time zero-crossing rate is used for silence detection of the current frame: if the short-time zero-crossing rate of the current frame is greater than a first set value, the current frame is set as a non-silent frame.
12. The method according to claim 10, characterized in that the MDCT is applied to each processed frame of the audio signal and a series of audio features is extracted in the time domain and the MDCT domain, the audio features comprising the short-time zero-crossing rate, the MDCT spectral harmonic structure stability, the MDCT spectral sub-band energy change statistic, the MDCT spectral centroid change value, the MDCT spectral sub-band energy, and the sum of the absolute values of the first four parameters of the MDCT spectral coefficients.
13. The method according to claim 10, characterized in that single-feature classification is used when the current frame is within the classification convergence time frame I, the single feature being the MDCT spectral sub-band energy: if the energy of the first sub-band of the current frame is greater than a second set value, the current frame is set as a speech frame.
14. The method according to claim 10, characterized in that multi-level feature classification according to the coarse-to-fine classification rule is used when the current frame is after the classification convergence time frame I, the multi-level features being the short-time zero-crossing rate, the MDCT spectral harmonic structure stability, the MDCT spectral sub-band energy change statistic, the MDCT spectral centroid change value, the MDCT spectral sub-band energy, and the sum of the absolute values of the first four parameters of the MDCT spectral coefficients.
15. The method according to claim 14, characterized in that coarse classification is performed on the current frame, the coarse classification feature being the first sub-band of the MDCT spectral sub-band energy: if it is greater than the second set value, the current frame is set as a speech-like frame; otherwise the current frame is set as a music-like frame.
16. The method according to claim 14 or 15, characterized in that fine classification is performed on the coarsely classified signal frame by combining multiple features, the audio feature at each decision level of the fine classification being compared with its respective set value to determine the signal type, and the order of the level-by-level feature decisions not being changed during classification.
17. The method according to claim 10, characterized in that the historical state of the classification results is stored and combined with the classification result of the current frame, and the classification type that occurs most frequently is used as the classification result of the current frame; if the current frame is a silent frame, or only two frames among the historical original classification results are non-silent frames, the original classification result is kept.
18. The method according to claim 10, 11, 13 or 16, characterized in that the first set value and the second set value are given thresholds, and the respective set values are a series of given thresholds.
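
The coarse classification step and the history-based correction described in the claims could be sketched as below. This is a hedged illustration only: the label names, the history length of five frames, the choice to include the current frame in the history, and the reading of "only two non-silent frames" as "at most two" are all assumptions, and the fine classification levels are omitted because their set values are not given here.

```python
from collections import Counter, deque

# Hypothetical label names used only for this sketch.
SPEECH, MUSIC, SILENCE = "speech", "music", "silence"


def coarse_classify(first_subband_energy, second_set_value):
    """Coarse stage: speech-like above the threshold, otherwise music-like."""
    return SPEECH if first_subband_energy > second_set_value else MUSIC


class HistorySmoother:
    """Corrects a frame's raw label using the labels of recent frames."""

    def __init__(self, history_len=5):
        # history_len is an assumed value, not taken from the patent.
        self.history = deque(maxlen=history_len)

    def update(self, raw_label):
        self.history.append(raw_label)
        non_silent = [label for label in self.history if label != SILENCE]
        # Keep the raw result for silent frames, or when the stored history
        # holds too few non-silent frames (assumed: at most two).
        if raw_label == SILENCE or len(non_silent) <= 2:
            return raw_label
        # Otherwise use the most frequent non-silent label in the history.
        return Counter(non_silent).most_common(1)[0][0]


if __name__ == "__main__":
    smoother = HistorySmoother()
    for energy in (3.0, 2.5, 4.0, 0.2):
        raw = coarse_classify(energy, second_set_value=1.0)
        print(raw, "->", smoother.update(raw))
```

In this sketch an isolated music-like frame that follows several speech-like frames is relabelled as speech, which is the kind of smoothing the history-based update is meant to provide.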

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011104309646A CN102543079A (en) 2011-12-21 2011-12-21 Method and equipment for classifying audio signals in real time

Publications (1)

Publication Number Publication Date
CN102543079A true CN102543079A (en) 2012-07-04

Family

ID=46349818

Country Status (1)

Country Link
CN (1) CN102543079A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090254352A1 (en) * 2005-12-14 2009-10-08 Matsushita Electric Industrial Co., Ltd. Method and system for extracting audio features from an encoded bitstream for audio classification
CN102099856A (en) * 2008-07-17 2011-06-15 弗劳恩霍夫应用研究促进协会 Audio encoding/decoding scheme having a switchable bypass

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
孔庆胜,林志斌: "一种实时的语音/音乐分类器的设计", 《2010年声频工程学术交流年会论文集》 *

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10803879B2 (en) 2013-03-26 2020-10-13 Dolby Laboratories Licensing Corporation Apparatuses and methods for audio classifying and processing
CN105074822A (en) * 2013-03-26 2015-11-18 杜比实验室特许公司 Device and method for audio classification and audio processing
CN106409310B (en) * 2013-08-06 2019-11-19 华为技术有限公司 A kind of audio signal classification method and apparatus
US10529361B2 (en) 2013-08-06 2020-01-07 Huawei Technologies Co., Ltd. Audio signal classification method and apparatus
CN106409310A (en) * 2013-08-06 2017-02-15 华为技术有限公司 Audio signal classification method and device
US11756576B2 (en) 2013-08-06 2023-09-12 Huawei Technologies Co., Ltd. Classification of audio signal as speech or music based on energy fluctuation of frequency spectrum
US11289113B2 (en) 2013-08-06 2022-03-29 Huawei Technolgies Co. Ltd. Linear prediction residual energy tilt-based audio signal classification method and apparatus
CN106409313B (en) * 2013-08-06 2021-04-20 华为技术有限公司 Audio signal classification method and device
US10090003B2 (en) 2013-08-06 2018-10-02 Huawei Technologies Co., Ltd. Method and apparatus for classifying an audio signal based on frequency spectrum fluctuation
CN106409313A (en) * 2013-08-06 2017-02-15 华为技术有限公司 Audio signal classification method and apparatus
WO2015018121A1 (en) * 2013-08-06 2015-02-12 华为技术有限公司 Audio signal classification method and device
CN106256001A (en) * 2014-02-24 2016-12-21 三星电子株式会社 Modulation recognition method and apparatus and use its audio coding method and device
US10504540B2 (en) 2014-02-24 2019-12-10 Samsung Electronics Co., Ltd. Signal classifying method and device, and audio encoding method and device using same
CN106571150A (en) * 2015-10-12 2017-04-19 阿里巴巴集团控股有限公司 Method and system for positioning human acoustic zone of music
CN108074584A (en) * 2016-11-18 2018-05-25 南京大学 A kind of audio signal classification method based on signal multiple features statistics
CN108242241A (en) * 2016-12-23 2018-07-03 中国农业大学 A kind of pure voice rapid screening method and its device
CN109671425A (en) * 2018-12-29 2019-04-23 广州酷狗计算机科技有限公司 Audio frequency classification method, device and storage medium
CN109671425B (en) * 2018-12-29 2021-04-06 广州酷狗计算机科技有限公司 Audio classification method, device and storage medium
WO2020253694A1 (en) * 2019-06-17 2020-12-24 华为技术有限公司 Method, chip and terminal for music recognition
CN110931044A (en) * 2019-12-12 2020-03-27 上海立可芯半导体科技有限公司 Radio frequency searching method, channel classification method and electronic equipment
CN111161728A (en) * 2019-12-26 2020-05-15 珠海格力电器股份有限公司 Awakening method, device, equipment and medium for intelligent equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20120704