CN102543079A - Method and equipment for classifying audio signals in real time - Google Patents
Method and equipment for classifying audio signals in real time
- Publication number
- CN102543079A CN2011104309646A CN201110430964A
- Authority
- CN
- China
- Prior art keywords
- frame
- classification
- mdct
- present frame
- time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention discloses a method and equipment for classifying audio signals in real time, belonging to the technical field of audio encoding and decoding. The method comprises the following steps: preprocessing the input audio signal; extracting multi-level audio features in the time domain and the MDCT domain; classifying the current frame with a single audio feature if the current frame lies within the classification convergence time frame I; classifying the current frame with the coarse-to-fine classification rule, using the multi-level features for the decisions, if the current frame lies after the classification convergence time frame I; and, after the coarse-to-fine classification, updating the classification type of the current frame according to the history of classification types of the preceding signal frames. The method and equipment allow signals to be classified in real time simply and accurately.
Description
I. Technical Field
The present invention relates to the field of audio coding, decoding and transmission, and in particular to a real-time audio signal classification method and equipment.
II. Background
Classifying a signal before audio coding, transmission or other processing can effectively improve coding and transmission efficiency. Because multimedia audio is transmitted within a real-time transmission framework, real-time classification of audio signals is an important research topic.
Research on audio signal classification, both in China and abroad, has mostly concentrated on long-duration classification, such as classification by low-energy rate or by short-time energy over 1-second or 10-second segments. Classifier designs generally use statistical methods, such as support vector machine classifiers and neural network classifiers. Because of their long processing time, these methods are of limited practical use for real-time audio classification.
At present, audio classification algorithms are mostly implemented in the time domain or the frequency domain, whereas popular coding schemes such as MP3 and AAC use the MDCT. To avoid extra computation, features can be extracted and analyzed directly in the MDCT domain and the time domain, which effectively improves feature extraction efficiency. Combined with suitable classification rules, this makes it possible to design equipment that classifies audio signals quickly and in real time.
III. Summary of the Invention
1. Object of the invention: The object of the invention is to provide a real-time audio signal classification method and equipment that classify quickly in real time, reduce extra computation, improve the accuracy of real-time audio signal classification, and allow audio signal classification to play its important role in audio coding and audio transmission.
2. Technical solution: To achieve the above object, the invention discloses a real-time audio signal classification method, comprising:
The input audio signal is divided into frames and high-pass filtered; silence detection is performed on the current frame; the MDCT is computed; and audio features are extracted in the time domain and the MDCT domain. If the current frame lies within the classification convergence time frame I, it is classified with a single audio feature; if it lies after the classification convergence time frame I, it is classified with the coarse-to-fine classification rule. After the coarse-to-fine classification, the classification type of the current frame is updated according to the history of classification types of the signal frames preceding it.
Further, in the method, the short-time zero-crossing rate is used for silence detection on the current frame: if the short-time zero-crossing rate of the current frame is greater than the first setting value, the current frame is set as a non-silent frame.
The MDCT is applied to each preprocessed frame of the audio signal, and a series of audio features is extracted in the time domain and the MDCT domain. The audio features comprise the short-time zero-crossing rate, the MDCT spectrum harmonic structure stability, the MDCT spectral sub-band energy variation statistic, the MDCT spectral centroid variation, the MDCT spectral sub-band energy, and the sum of the absolute values of the first four MDCT spectral coefficients.
Further, in the method, single-feature classification is used when the current frame lies within the classification convergence time frame I. The single feature is the MDCT spectral sub-band energy: if the first sub-band energy of the current frame is greater than the second setting value, the current frame is set as a speech frame.
When the current frame lies after the classification convergence time frame I, the coarse-to-fine classification rule performs multi-level feature classification. The multi-level features are the short-time zero-crossing rate, the MDCT spectrum harmonic structure stability, the MDCT spectral sub-band energy variation statistic, the MDCT spectral centroid variation, the MDCT spectral sub-band energy, and the sum of the absolute values of the first four MDCT spectral coefficients.
The current frame first undergoes coarse classification, which uses the first sub-band of the MDCT spectral sub-band energy: if it is greater than the second setting value, the current frame is set as a speech-like frame; otherwise it is set as a music-like frame.
Further, in the method, the coarsely classified signal frame undergoes fine classification that combines multiple features. At each decision level of the fine classification, an audio feature is compared with its corresponding setting value to decide the signal type, and the order of the feature decisions is not changed during classification.
The history of classification results is stored and combined with the classification result of the current frame, and the classification type that occurs most frequently is used as the classification result of the current frame; however, if the current frame is a silent frame, or only two frames in the stored original classification results are non-silent, the original classification result is kept.
The first setting value and the second setting value are given thresholds, and the corresponding setting values are a series of given thresholds.
The invention also provides real-time audio signal classification equipment, comprising a preprocessing module, a feature extraction module, a coarse-to-fine rule classification module and a classification result correction module connected to one another, wherein:
The preprocessing module preprocesses the audio signal and performs silence detection;
The feature extraction module extracts features in real time, in the time domain and the MDCT domain, from the preprocessed audio signal;
The coarse-to-fine rule classification module arranges the extracted audio features according to a given rule and classifies by the coarse-to-fine classification rule;
The classification result correction module corrects the original classification result and outputs the final audio signal classification result.
Further, in the equipment, the preprocessing module uses the short-time zero-crossing rate for silence detection on the current frame: if the short-time zero-crossing rate of the current frame is greater than the first setting value, the current frame is set as a non-silent frame.
Further, in the equipment, the feature extraction module applies the MDCT to each preprocessed frame of the audio signal and extracts a series of audio features in the time domain and the MDCT domain. The audio features comprise the short-time zero-crossing rate, the MDCT spectrum harmonic structure stability, the MDCT spectral sub-band energy variation statistic, the MDCT spectral centroid variation, the MDCT spectral sub-band energy, and the sum of the absolute values of the first four MDCT spectral coefficients.
Further, in the equipment, the coarse-to-fine rule classification module uses single-feature classification when the current frame lies within the classification convergence time frame I. The single feature is the MDCT spectral sub-band energy: if the first sub-band energy of the current frame is greater than the second setting value, the current frame is set as a speech frame.
When the current frame lies after the classification convergence time frame I, the coarse-to-fine classification rule performs multi-level feature classification. The multi-level features are the short-time zero-crossing rate, the MDCT spectrum harmonic structure stability, the MDCT spectral sub-band energy variation statistic, the MDCT spectral centroid variation, the MDCT spectral sub-band energy, and the sum of the absolute values of the first four MDCT spectral coefficients.
The current frame first undergoes coarse classification, which uses the first sub-band of the MDCT spectral sub-band energy: if it is greater than the second setting value, the current frame is set as a speech-like frame; otherwise it is set as a music-like frame.
The coarsely classified signal frame undergoes fine classification that combines multiple features. At each decision level of the fine classification, an audio feature is compared with its corresponding setting value to decide the signal type, and the order of the feature decisions is not changed during classification.
Further, in the equipment, the classification result correction module stores the history of classification results, combines it with the classification result of the current frame, and uses the classification type that occurs most frequently as the classification result of the current frame; however, if the current frame is a silent frame, or only two frames in the stored original classification results are non-silent, the original classification result is kept.
The first setting value and the second setting value are given thresholds, and the corresponding setting values are a series of given thresholds.
The technical solution of the invention improves the accuracy of real-time audio signal classification through a simple coarse-to-fine classification rule, and thereby greatly improves audio coding and decoding efficiency. It can be used for audio signal classification decisions in audio coding and decoding for real-time two-way communication, such as wireless communication, IP video conferencing and real-time broadcasting services.
IV. Brief Description of the Drawings
Fig. 1 is a block diagram of audio signal classification applied in an audio coding device.
Fig. 2 is a structural block diagram of the real-time audio signal classification equipment.
Fig. 3 is a block diagram of the silence decision for the audio signal.
Fig. 4 is a block diagram of single-feature classification within the classification convergence time frame I.
Fig. 5 is a block diagram of single-feature coarse classification after the classification convergence time frame I.
Fig. 6 is a block diagram of the speech-like signal classification rule.
Fig. 7 is a block diagram of the music-like signal classification rule.
Fig. 8 is a block diagram of the classification result correction module.
V. Embodiment
The main idea of the invention is that a speech/audio codec can apply a real-time audio signal classification method to make a signal classification decision before encoding (see Fig. 1), and then select the encoder suited to speech or to audio according to the decided category, thereby improving the codec's coding efficiency for different signal types. The detailed process is as follows:
Step 1: Divide the signal into frames and apply high-pass filtering to remove unnecessary low-frequency components. Perform silence detection on the current frame using the short-time zero-crossing rate: when the short-time zero-crossing rate of the current frame is greater than the first setting value, the current frame is set as a non-silent frame;
Step 2: Apply the MDCT and extract a series of audio features in the time domain and the MDCT domain. The audio features comprise the short-time zero-crossing rate, the MDCT spectrum harmonic structure stability, the MDCT spectral sub-band energy variation statistic, the MDCT spectral centroid variation, the MDCT spectral sub-band energy, and the sum of the absolute values of the first four MDCT spectral coefficients;
Step 3: While the frame lies within the classification convergence time frame I, classify the signal with a single feature;
Step 4: Once the frame lies after the classification convergence time frame I, use the coarse-to-fine classification rule to perform multi-level feature classification. The multi-level features are the short-time zero-crossing rate, the MDCT spectrum harmonic structure stability, the MDCT spectral sub-band energy variation statistic, the MDCT spectral centroid variation, the MDCT spectral sub-band energy, and the sum of the absolute values of the first four MDCT spectral coefficients;
Step 5: Coarse classification. The coarse classification uses the first sub-band of the MDCT spectral sub-band energy: if it is greater than the second setting value, the current frame is set as a speech-like frame; otherwise it is set as a music-like frame;
Step 6: Fine classification. The coarsely classified signal frame undergoes fine classification that combines multiple features. At each decision level, an audio feature is compared with its corresponding setting value to decide the signal type, and the order of the feature decisions is not changed during classification;
Step 7: Result correction. The history of classification results is stored and combined with the classification result of the current frame, and the classification type that occurs most frequently is used as the classification result of the current frame; however, if the current frame is a silent frame, or only two frames in the stored original classification results are non-silent, the original classification result is kept.
The solution of the invention is described in further detail below with reference to the drawings and the embodiment.
The real-time audio signal classification equipment, as shown in Fig. 2, comprises a preprocessing module, a feature extraction module, a coarse-to-fine rule classification module and a classification result correction module connected to one another. The function of each module is described below.
Preprocessing module: first, the audio stream x(n) is divided into frames and high-pass filtered; second, silence detection is performed using the short-time average zero-crossing rate, as shown in Fig. 3. When the short-time average zero-crossing rate is greater than the first setting value, the frame is judged to be a non-silent frame. The short-time average zero-crossing rate is calculated by formula 1, where N is the frame length, sgn[·] is the sign function, and the window function h(n) is a rectangular window.
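The silence decision described above can be sketched as follows. This is a minimal illustration, not the patent's formula 1: the normalization of the rectangular window (taken here as 1/(2N)) and the convention sgn[0] = 1 are assumptions, and the function names are hypothetical.

```python
from typing import Sequence


def sgn(v: float) -> int:
    """Sign function; treating zero as positive is an assumed convention."""
    return 1 if v >= 0 else -1


def short_time_zcr(frame: Sequence[float]) -> float:
    """Short-time average zero-crossing rate of one frame.

    Each sign change contributes |sgn(x(m)) - sgn(x(m-1))| = 2, and the
    assumed rectangular window h(n) = 1/(2N) turns the sum into the
    number of sign changes divided by the frame length N.
    """
    n = len(frame)
    total = sum(abs(sgn(frame[m]) - sgn(frame[m - 1])) for m in range(1, n))
    return total / (2 * n)


def is_non_silent(frame: Sequence[float], first_setting_value: float) -> bool:
    """A frame is non-silent when its ZCR exceeds the first setting value."""
    return short_time_zcr(frame) > first_setting_value
```

The threshold is passed in as a parameter because its numerical value belongs to Table 1, which is not reproduced in this text.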
Feature extraction module: performs the MDCT on the preprocessed signal and extracts time-domain and MDCT-domain features. First, the MDCT (Modified Discrete Cosine Transform) is applied to obtain the frequency-domain coefficients:
The N-point time-domain data x(n) of the current frame and the N-point time-domain data x(n-N) of the previous frame are concatenated into 2N points of time-domain data, which are transformed by the MDCT. In this embodiment the signal is sampled at 16 kHz and N is 320.
In the transform, w(n) denotes the sine window function.
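As an illustration of this step, the sketch below computes a direct (non-fast) MDCT of the 2N-sample block formed from the previous and current frames. It uses the textbook MDCT definition and the textbook sine window w(n) = sin(π(n + 1/2) / 2N); the patent's own formulas are not reproduced in this text, so treating them as identical to the textbook forms is an assumption, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def sine_window(length: int) -> List[float]:
    """Textbook sine window over `length` = 2N samples."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def mdct(prev_frame: Sequence[float], cur_frame: Sequence[float]) -> List[float]:
    """Direct O(N^2) MDCT: 2N windowed input samples -> N coefficients.

    prev_frame and cur_frame are the N-sample previous and current frames.
    """
    n_coef = len(cur_frame)                      # N
    block = list(prev_frame) + list(cur_frame)   # 2N time-domain samples
    window = sine_window(2 * n_coef)
    phase = 0.5 + n_coef / 2.0                   # standard MDCT phase offset
    return [
        sum(
            window[n] * block[n]
            * math.cos(math.pi / n_coef * (n + phase) * (k + 0.5))
            for n in range(2 * n_coef)
        )
        for k in range(n_coef)
    ]
```

With the embodiment's values, N = 320 samples at 16 kHz corresponds to a 20 ms frame, and each call returns 320 coefficients X(0) to X(319). A production codec would use an FFT-based MDCT rather than this direct sum.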
Time-domain and MDCT-domain feature extraction:
(2) MDCT spectrum harmonic structure stability HSS:
Step 1: Search each frame's MDCT spectrum for its peak points, denoted P_l, where P_l is the l-th peak of the frame;
Step 2: Convert P_l to a normalized logarithmic scale, denoted LP_l; the conversion is given by formula 6, where L is the index of the last peak;
(3) MDCT spectral sub-band energy E_b
The MDCT spectral coefficients are divided into M sub-bands of equal width; M is 32 in this embodiment. The sub-band energy is calculated by formula 7, where j is the sub-band index. A judgment threshold is defined for the first sub-band energy of E_b.
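A sketch of the sub-band split follows. Formula 7 is not reproduced in this text, so defining the energy of a sub-band as the sum of its squared MDCT coefficients is an assumption, and `subband_energies` is a hypothetical name.

```python
from typing import List, Sequence


def subband_energies(coeffs: Sequence[float], num_bands: int = 32) -> List[float]:
    """Split the MDCT coefficients into equal-width sub-bands and return
    the energy of each one (assumed: sum of squared coefficients)."""
    width = len(coeffs) // num_bands   # coefficients per sub-band
    return [
        sum(c * c for c in coeffs[j * width:(j + 1) * width])
        for j in range(num_bands)
    ]
```

With N = 320 coefficients and M = 32, each sub-band holds 10 coefficients, and E_b(0) is the first element of the returned list.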
(4) MDCT spectral sub-band energy variation statistic C_SF
Calculate the MDCT spectral flux SF(j), where E_b(i, j) is the energy of the j-th sub-band in time frame i and Q is the number of time frames over which the spectral flux is calculated; Q is 6 in this embodiment.
Count the number C_SF of values of SF(j) that exceed the setting value THR_SF. The corresponding judgment thresholds for C_SF are given setting values.
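The counting step can be sketched as below. The patent's expression for SF(j) is not reproduced in this text; the mean absolute frame-to-frame energy difference used here is only an assumed stand-in, and both function names are hypothetical. Only the final count of values above THR_SF follows the description directly.

```python
from typing import List, Sequence


def spectral_flux(energy_history: Sequence[Sequence[float]]) -> List[float]:
    """energy_history[i][j] is E_b(i, j) for the last Q frames.

    Assumed form: SF(j) is the mean absolute change of sub-band j's
    energy between consecutive frames.
    """
    q = len(energy_history)
    bands = len(energy_history[0])
    return [
        sum(abs(energy_history[i][j] - energy_history[i - 1][j])
            for i in range(1, q)) / (q - 1)
        for j in range(bands)
    ]


def flux_count(energy_history: Sequence[Sequence[float]], thr_sf: float) -> int:
    """C_SF: number of sub-bands whose flux SF(j) exceeds THR_SF."""
    return sum(1 for sf in spectral_flux(energy_history) if sf > thr_sf)
```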
(5) MDCT spectral centroid variation δ_c:
Step 1: Calculate the MDCT spectral centroid of each frame, where f(k) = k + 1 and p(k) is calculated by formula 10:
p(k) = Ω(k) / max(Ω(k)) (formula 10)
where Ω(k) = abs(X(k)).
Step 2: Calculate the MDCT spectral centroid variation, where O is the number of adjacent frames used in the calculation; O is 4 in this embodiment. Judgment thresholds are defined for δ_c.
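A sketch of the centroid computation follows. The definitions f(k) = k + 1, Ω(k) = abs(X(k)) and formula 10 come from the text; the centroid itself is assumed to be the usual weighted mean Σ f(k) p(k) / Σ p(k), and the variation over O adjacent frames is assumed to be the mean absolute difference between consecutive centroids, since neither formula is reproduced here. The function names are hypothetical.

```python
from typing import Sequence


def spectral_centroid(coeffs: Sequence[float]) -> float:
    """MDCT spectral centroid with f(k) = k + 1 and
    p(k) = |X(k)| / max|X(k)| (formula 10)."""
    omega = [abs(x) for x in coeffs]
    peak = max(omega)
    if peak == 0.0:
        return 0.0                       # all-zero frame: no centroid
    p = [o / peak for o in omega]
    return sum((k + 1) * p[k] for k in range(len(p))) / sum(p)


def centroid_variation(centroids: Sequence[float]) -> float:
    """Assumed delta_c: mean absolute change between the centroids of
    O adjacent frames (O = 4 in the embodiment)."""
    if len(centroids) < 2:
        return 0.0
    diffs = [abs(centroids[i] - centroids[i - 1]) for i in range(1, len(centroids))]
    return sum(diffs) / len(diffs)
```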
(6) Sum of the absolute values of the first four MDCT spectral coefficients, E_l
Calculate the sum of the absolute values of the first four MDCT spectral coefficients; the result is denoted E_l. Judgment thresholds are defined for E_l.
The coarse-to-fine rule classification module implements a rule-based classification method built on coarse and fine classification, and comprises the following process:
When the current frame lies within the classification convergence time frame I, single-feature classification is used, with the MDCT spectral sub-band energy E_b as the single feature. As shown in Fig. 4, if the first sub-band energy E_b(0) of the MDCT spectrum is greater than its judgment threshold, the frame is judged to be a speech signal frame; otherwise it is a music signal frame.
When the current frame lies after the classification convergence time frame I, single-feature classification is used for the coarse stage: if the first sub-band energy E_b(0) of the MDCT spectrum is greater than its judgment threshold, the frame is judged to be a speech-like signal frame; otherwise it is a music-like signal frame. The coarse classification is shown in Fig. 5.
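The two single-feature decisions above share the same comparison and differ only in the label they produce. A minimal sketch, assuming that "within the convergence time frame I" means the first `convergence_frames` frames of the stream (the text does not state the length of I), with hypothetical names throughout:

```python
def classify_first_stage(eb0: float, eb0_threshold: float,
                         frame_index: int, convergence_frames: int) -> str:
    """Single-feature decision on the first sub-band energy E_b(0).

    Inside the convergence period the result is final ("speech" or
    "music"); afterwards it is only the coarse label that selects which
    fine classification rule runs next.
    """
    above = eb0 > eb0_threshold
    if frame_index < convergence_frames:      # assumed meaning of "within I"
        return "speech" if above else "music"
    return "speech-like" if above else "music-like"
```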
The coarsely classified signal frame then undergoes fine classification that combines multiple features. At each decision level of the fine classification, an audio feature is compared with its corresponding setting value to decide the signal type, and the order of the feature decisions is not changed during classification.
The speech-like classification rule is shown in Fig. 6. The detailed process is as follows:
Level 1: compare the MDCT spectral sub-band energy variation statistic C_SF with the second judgment threshold of C_SF; if C_SF is greater than the threshold, output the current frame as a speech signal frame; otherwise proceed to level 2;
Level 2: compare C_SF with the third judgment threshold of C_SF; if C_SF is less than the threshold, output the current frame as a music signal frame; otherwise proceed to level 3;
Level 3: compare the MDCT spectral centroid variation δ_c with the third judgment threshold of δ_c; if δ_c is greater than the threshold, output the current frame as a speech signal frame; otherwise proceed to level 4;
Level 4: compare the sum of the absolute values of the first four MDCT spectral coefficients, E_l, with the second judgment threshold of E_l; if E_l is greater than the threshold, output the current frame as a music signal frame; otherwise proceed to level 5;
Level 5: compare E_l with the third judgment threshold of E_l; if E_l is less than the threshold, output the current frame as a speech signal frame; otherwise proceed to level 6;
Level 6: compare the MDCT spectrum harmonic structure stability HSS with the third and seventh judgment thresholds of HSS; if HSS lies in the interval they define, the frame is judged to be a music signal frame; otherwise proceed to level 7;
Level 7: compare HSS with the second and sixth judgment thresholds of HSS, and at the same time compare the short-time zero-crossing rate Z_n with the first and third judgment thresholds of Z_n; if HSS lies in the interval defined by its two thresholds and Z_n lies in the interval defined by its two thresholds, the frame is judged to be a music signal frame; otherwise it is judged to be a speech signal frame. The speech-like classification rule module then outputs the audio signal classification result.
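The seven-level speech-like cascade can be sketched as a chain of early returns. The dictionary keys (for example `c_sf_2` for the second judgment threshold of C_SF) are hypothetical names, the threshold values must come from Table 1, which is not reproduced in this text, and treating each interval as closed and bounded by the two named thresholds is an assumption.

```python
from typing import Mapping


def in_range(value: float, a: float, b: float) -> bool:
    """True if value lies between the two thresholds (assumed closed interval)."""
    return min(a, b) <= value <= max(a, b)


def classify_speech_like(f: Mapping[str, float], t: Mapping[str, float]) -> str:
    """Fine classification of a speech-like frame; f holds the features,
    t the thresholds. The level order is fixed, as the text requires."""
    if f["c_sf"] > t["c_sf_2"]:                       # level 1
        return "speech"
    if f["c_sf"] < t["c_sf_3"]:                       # level 2
        return "music"
    if f["delta_c"] > t["delta_c_3"]:                 # level 3
        return "speech"
    if f["e_l"] > t["e_l_2"]:                         # level 4
        return "music"
    if f["e_l"] < t["e_l_3"]:                         # level 5
        return "speech"
    if in_range(f["hss"], t["hss_3"], t["hss_7"]):    # level 6
        return "music"
    if (in_range(f["hss"], t["hss_2"], t["hss_6"])    # level 7
            and in_range(f["z_n"], t["z_n_1"], t["z_n_3"])):
        return "music"
    return "speech"
```

Writing the cascade as early returns keeps the decision order explicit and makes it easy to check each level against Fig. 6.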
The music-like classification rule is shown in Fig. 7. The detailed process is as follows:
Level 1: compare the MDCT spectral centroid variation δ_c with the first judgment threshold of δ_c; if δ_c is greater than the threshold, output the current frame as a speech signal frame; otherwise proceed to level 2;
Level 2: compare δ_c with the second judgment threshold of δ_c; if δ_c is less than or equal to the threshold, output the current frame as a music signal frame; otherwise proceed to level 3;
Level 3: compare the MDCT spectrum harmonic structure stability HSS with the fourth and second judgment thresholds of HSS, and at the same time compare the short-time zero-crossing rate Z_n with the first and second judgment thresholds of Z_n; if HSS lies in the interval defined by its two thresholds and Z_n lies in the interval defined by its two thresholds, output the current frame as a music signal frame; otherwise proceed to level 4;
Level 4: compare HSS with the fifth judgment threshold of HSS; if HSS is greater than the threshold, output the current frame as a speech signal frame; otherwise proceed to level 5;
Level 5: compare HSS with the first judgment threshold of HSS, and at the same time compare the MDCT spectral sub-band energy variation statistic C_SF with the first judgment threshold of C_SF; if HSS is greater than its threshold and C_SF is greater than its threshold, output the current frame as a speech signal frame; otherwise proceed to level 6;
Level 6: compare the sum of the absolute values of the first four MDCT spectral coefficients, E_l, with the first judgment threshold of E_l, and at the same time compare C_SF with the fourth judgment threshold of C_SF; if E_l is less than its threshold and C_SF is greater than its threshold, the frame is judged to be a speech signal frame; otherwise it is judged to be a music signal frame. The music-like classification rule module then outputs the audio signal classification result.
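The six-level music-like cascade can be sketched in the same way. As before, the dictionary keys are hypothetical names, the threshold values belong to Table 1 (not reproduced in this text), and treating each interval as closed is an assumption.

```python
from typing import Mapping


def in_range(value: float, a: float, b: float) -> bool:
    """True if value lies between the two thresholds (assumed closed interval)."""
    return min(a, b) <= value <= max(a, b)


def classify_music_like(f: Mapping[str, float], t: Mapping[str, float]) -> str:
    """Fine classification of a music-like frame; f holds the features,
    t the thresholds. Levels are evaluated in the fixed order of Fig. 7."""
    if f["delta_c"] > t["delta_c_1"]:                       # level 1
        return "speech"
    if f["delta_c"] <= t["delta_c_2"]:                      # level 2
        return "music"
    if (in_range(f["hss"], t["hss_4"], t["hss_2"])          # level 3
            and in_range(f["z_n"], t["z_n_1"], t["z_n_2"])):
        return "music"
    if f["hss"] > t["hss_5"]:                               # level 4
        return "speech"
    if f["hss"] > t["hss_1"] and f["c_sf"] > t["c_sf_1"]:   # level 5
        return "speech"
    if f["e_l"] < t["e_l_1"] and f["c_sf"] > t["c_sf_4"]:   # level 6
        return "speech"
    return "music"
```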
Classification result correction module: the history of classification results is stored, namely the original classification results of the T-1 frames preceding the current frame together with the classification result of the current frame. If the current frame is a silent frame, or only two frames in the stored original classification results are non-silent, the original classification result is kept; otherwise, the audio signal classification result that occurs most frequently is taken as the classification result of the current frame. T is 10 in this embodiment. A block diagram of the classification result correction module is shown in Fig. 8.
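The correction step amounts to a majority vote over a short history with two exceptions. The sketch below makes several assumptions that the text leaves open: "the original result is kept" is read as returning the current frame's uncorrected label, "only two frames are non-silent" is implemented as "at most two", and silent frames in the history still take part in the vote. The function name and data layout are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple


def correct_label(previous: Sequence[Tuple[str, bool]],
                  current_label: str,
                  current_is_silent: bool) -> str:
    """previous holds (original_label, is_silent) for the T-1 preceding
    frames (T = 10 in the embodiment, so up to 9 entries)."""
    non_silent = sum(1 for _, silent in previous if not silent)
    if current_is_silent or non_silent <= 2:
        return current_label            # exception: keep the original result
    votes = Counter(label for label, _ in previous)
    votes[current_label] += 1           # the current frame votes as well
    return votes.most_common(1)[0][0]   # most frequent classification type
```

A caller would keep the history in a fixed-length buffer, for example `collections.deque(maxlen=9)`, and append the original (uncorrected) label after each frame.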
The multi-level decision thresholds corresponding to the audio feature parameters used in the real-time audio signal classification equipment are listed in Table 1.
The classification performance of the technical solution of the invention is evaluated below.
The evaluation uses the speech and audio material of EBU SQAM; for Chinese, it uses the Chinese samples of the national standard GSBM 6001-89 acoustic evaluation sample set "good story is not beautiful", 71 audio samples in total. The signals are original audio signals sampled at 16 kHz, with a frame length of 20 ms. The evaluation results are given in Table 2.
Table 1: Multi-level decision thresholds corresponding to the audio feature parameters
Table 2: Audio signal classification test results
Signal category | Accuracy (%) |
Music | |
Single-frequency audio | 99.6 |
Electronic musical instrument | 96.9 |
Stringed musical instrument | 96.6 |
Wind instrument | 97.8 |
Percussion instrument | 94.5 |
Concertina | 95.0 |
Speech | |
Male voice | 95.6 |
Female voice | 96.9 |
For the technical solution of the invention, the average correct classification rate is 96.22% for speech and 96.23% for music, which indicates good classification performance. The solution extracts audio features in the MDCT domain that is common to existing speech/audio codecs, avoiding the complex computation of an additional transform and making audio signal classification faster; the audio signal is classified in real time, which can effectively improve the efficiency of audio signal processing such as audio transmission and audio coding.
Claims (18)
1. a real-time sound signal sorting technique is characterized in that, comprising:
After the input audio signal is divided into frames and high-pass filtered, silence detection is performed on the current frame, the MDCT transform is computed, and audio features are extracted in the time domain and the MDCT domain; if the current frame is within the classification convergence period of I frames, it is classified using a single audio feature; if the current frame falls after the classification convergence period of I frames, it is classified using the coarse-to-fine grading rule; and after the current frame has been classified by the coarse-to-fine grading rule, its classification type is updated according to the historical classification types of the signal frames preceding the current frame.
2. The method according to claim 1, characterized in that the short-time zero-crossing rate is used for silence detection on the current frame, and if the short-time zero-crossing rate of the current frame is greater than a first set value, the current frame is set as a non-silent frame.
3. The method according to claim 1, characterized in that the MDCT transform is applied to each processed frame of the audio signal and a series of audio features is extracted in the time domain and the MDCT domain, the audio features comprising the short-time zero-crossing rate, the stability of the MDCT spectral harmonic structure, the statistical value of the MDCT spectral sub-band energy change, the MDCT spectral centroid change value, the MDCT spectral sub-band energy, and the sum of the absolute values of the first four MDCT spectral coefficients.
4. The method according to claim 1, characterized in that single-feature classification is used when the current frame is within the classification convergence period of I frames, the single feature being the MDCT spectral sub-band energy, and if the energy of the first sub-band of the current frame is greater than a second set value, the current frame is set as a speech frame.
5. The method according to claim 1, characterized in that multi-stage feature classification under the coarse-to-fine grading rule is used when the current frame falls after the classification convergence period of I frames, the multi-stage features being the short-time zero-crossing rate, the stability of the MDCT spectral harmonic structure, the statistical value of the MDCT spectral sub-band energy change, the MDCT spectral centroid change value, the MDCT spectral sub-band energy, and the sum of the absolute values of the first four MDCT spectral coefficients.
6. The method according to claim 5, characterized in that coarse classification is performed on the current frame, the coarse classification feature being the first sub-band of the MDCT spectral sub-band energy; if it is greater than the second set value, the current frame is set as a speech-like frame, and otherwise the current frame is set as a music-like frame.
7. The method according to claim 5 or 6, characterized in that the signal frame after coarse classification is finely classified by combining multiple features; at each decision stage of the fine classification the audio feature is compared with its corresponding set value to determine the signal type, and the order of the staged feature decisions is not changed during classification.
8. The method according to claim 1, characterized in that the historical state of the classification results is stored and combined with the classification result of the current frame, and the classification type that occurs most frequently is used as the classification result of the current frame; if the current frame is a silent frame, or only two frames among the original historical classification results are non-silent frames, the original classification result is kept.
9. The method according to claim 1, 2, 4 or 7, characterized in that the first set value and the second set value are given thresholds, and the corresponding set values are a series of given thresholds.
10. A real-time audio signal classification device, characterized in that the device comprises a preprocessing module, a feature extraction module, a coarse-to-fine grading rule classification module and a classification result correction module connected to one another, wherein:
The preprocessing module performs preprocessing and silence detection on the audio signal;
The feature extraction module performs real-time feature extraction on the processed audio signal in the time domain and the MDCT domain;
The coarse-to-fine grading rule classification module arranges the obtained audio features according to a certain rule and classifies them by the method based on the coarse-to-fine grading rule;
The classification result correction module corrects the original classification result and finally outputs an accurate audio signal classification result.
11. The method according to claim 10, characterized in that the short-time zero-crossing rate is used for silence detection on the current frame, and if the short-time zero-crossing rate of the current frame is greater than a first set value, the current frame is set as a non-silent frame.
12. The method according to claim 10, characterized in that the MDCT transform is applied to each processed frame of the audio signal and a series of audio features is extracted in the time domain and the MDCT domain, the audio features comprising the short-time zero-crossing rate, the stability of the MDCT spectral harmonic structure, the statistical value of the MDCT spectral sub-band energy change, the MDCT spectral centroid change value, the MDCT spectral sub-band energy, and the sum of the absolute values of the first four MDCT spectral coefficients.
13. The method according to claim 10, characterized in that single-feature classification is used when the current frame is within the classification convergence period of I frames, the single feature being the MDCT spectral sub-band energy, and if the energy of the first sub-band of the current frame is greater than a second set value, the current frame is set as a speech frame.
14. The method according to claim 10, characterized in that multi-stage feature classification under the coarse-to-fine grading rule is used when the current frame falls after the classification convergence period of I frames, the multi-stage features being the short-time zero-crossing rate, the stability of the MDCT spectral harmonic structure, the statistical value of the MDCT spectral sub-band energy change, the MDCT spectral centroid change value, the MDCT spectral sub-band energy, and the sum of the absolute values of the first four MDCT spectral coefficients.
15. The method according to claim 14, characterized in that coarse classification is performed on the current frame, the coarse classification feature being the first sub-band of the MDCT spectral sub-band energy; if it is greater than the second set value, the current frame is set as a speech-like frame, and otherwise the current frame is set as a music-like frame.
16. The method according to claim 14 or 15, characterized in that the signal frame after coarse classification is finely classified by combining multiple features; at each decision stage of the fine classification the audio feature is compared with its corresponding set value to determine the signal type, and the order of the staged feature decisions is not changed during classification.
17. The method according to claim 10, characterized in that the historical state of the classification results is stored and combined with the classification result of the current frame, and the classification type that occurs most frequently is used as the classification result of the current frame; if the current frame is a silent frame, or only two frames among the original historical classification results are non-silent frames, the original classification result is kept.
18. The method according to claim 10, 11, 13 or 16, characterized in that the first set value and the second set value are given thresholds, and the corresponding set values are a series of given thresholds.
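The per-frame decisions recited in the claims can be illustrated with a short Python sketch. It is only an illustration under stated assumptions, not the applicant's implementation: the function names, the number of sub-bands, the label strings and all threshold arguments are hypothetical, since the claims give no numeric values. It covers the zero-crossing-rate silence test, the first-sub-band MDCT energy test (used alone during the convergence period and as the coarse stage afterwards), and the history-based correction; the multi-feature fine stage is omitted because its thresholds and decision order are not specified in the claims.

```python
from collections import Counter

import numpy as np


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    signs = np.signbit(frame)
    return float(np.mean(signs[1:] != signs[:-1]))


def mdct(frame):
    """Direct O(N^2) MDCT: a frame of 2N samples gives N coefficients."""
    two_n = len(frame)
    n_half = two_n // 2
    n = np.arange(two_n)
    k = np.arange(n_half)
    basis = np.cos(np.pi / n_half * np.outer(k + 0.5, n + 0.5 + n_half / 2))
    return basis @ frame


def first_subband_energy(coeffs, n_subbands=8):
    """Energy of the lowest sub-band (n_subbands is an assumed value)."""
    bands = np.array_split(coeffs, n_subbands)
    return float(np.sum(bands[0] ** 2))


def classify_frame(frame, zcr_threshold, energy_threshold):
    """Silence test, then the single-feature / coarse energy test."""
    # Per the claims, a frame is non-silent when its ZCR exceeds the first set value.
    if zero_crossing_rate(frame) <= zcr_threshold:
        return "silence"
    energy = first_subband_energy(mdct(frame))
    return "speech" if energy > energy_threshold else "music"


def correct_with_history(history, current, min_non_silent=3):
    """Majority vote over stored results plus the current one.

    The original result is kept when the current frame is silent or the
    history holds too few non-silent frames (one reading of 'only two frames').
    """
    non_silent = [label for label in history if label != "silence"]
    if current == "silence" or len(non_silent) < min_non_silent:
        return current
    return Counter(non_silent + [current]).most_common(1)[0][0]
```

In use, a caller would keep a fixed-length history of past labels (for example a `collections.deque` with a `maxlen`), call `classify_frame` on each high-pass-filtered frame, pass the result through `correct_with_history`, and then append the original label to the history.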
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011104309646A CN102543079A (en) | 2011-12-21 | 2011-12-21 | Method and equipment for classifying audio signals in real time |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102543079A true CN102543079A (en) | 2012-07-04 |
Family
ID=46349818
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2011104309646A Pending CN102543079A (en) | 2011-12-21 | 2011-12-21 | Method and equipment for classifying audio signals in real time |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102543079A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015018121A1 (en) * | 2013-08-06 | 2015-02-12 | 华为技术有限公司 | Audio signal classification method and device |
CN105074822A (en) * | 2013-03-26 | 2015-11-18 | 杜比实验室特许公司 | Device and method for audio classification and audio processing |
CN106256001A (en) * | 2014-02-24 | 2016-12-21 | 三星电子株式会社 | Signal classification method and apparatus, and audio encoding method and apparatus using same |
CN106571150A (en) * | 2015-10-12 | 2017-04-19 | 阿里巴巴集团控股有限公司 | Method and system for positioning human acoustic zone of music |
CN108074584A (en) * | 2016-11-18 | 2018-05-25 | 南京大学 | A kind of audio signal classification method based on signal multiple features statistics |
CN108242241A (en) * | 2016-12-23 | 2018-07-03 | 中国农业大学 | A kind of pure voice rapid screening method and its device |
CN109671425A (en) * | 2018-12-29 | 2019-04-23 | 广州酷狗计算机科技有限公司 | Audio frequency classification method, device and storage medium |
CN110931044A (en) * | 2019-12-12 | 2020-03-27 | 上海立可芯半导体科技有限公司 | Radio frequency searching method, channel classification method and electronic equipment |
CN111161728A (en) * | 2019-12-26 | 2020-05-15 | 珠海格力电器股份有限公司 | Awakening method, device, equipment and medium for intelligent equipment |
WO2020253694A1 (en) * | 2019-06-17 | 2020-12-24 | 华为技术有限公司 | Method, chip and terminal for music recognition |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090254352A1 (en) * | 2005-12-14 | 2009-10-08 | Matsushita Electric Industrial Co., Ltd. | Method and system for extracting audio features from an encoded bitstream for audio classification |
CN102099856A (en) * | 2008-07-17 | 2011-06-15 | 弗劳恩霍夫应用研究促进协会 | Audio encoding/decoding scheme having a switchable bypass |
Non-Patent Citations (1)
Title |
---|
孔庆胜,林志斌: "一种实时的语音/音乐分类器的设计", 《2010年声频工程学术交流年会论文集》 * |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10803879B2 (en) | 2013-03-26 | 2020-10-13 | Dolby Laboratories Licensing Corporation | Apparatuses and methods for audio classifying and processing |
CN105074822A (en) * | 2013-03-26 | 2015-11-18 | 杜比实验室特许公司 | Device and method for audio classification and audio processing |
CN106409310B (en) * | 2013-08-06 | 2019-11-19 | 华为技术有限公司 | A kind of audio signal classification method and apparatus |
US10529361B2 (en) | 2013-08-06 | 2020-01-07 | Huawei Technologies Co., Ltd. | Audio signal classification method and apparatus |
CN106409310A (en) * | 2013-08-06 | 2017-02-15 | 华为技术有限公司 | Audio signal classification method and device |
US11756576B2 (en) | 2013-08-06 | 2023-09-12 | Huawei Technologies Co., Ltd. | Classification of audio signal as speech or music based on energy fluctuation of frequency spectrum |
US11289113B2 (en) | 2013-08-06 | 2022-03-29 | Huawei Technolgies Co. Ltd. | Linear prediction residual energy tilt-based audio signal classification method and apparatus |
CN106409313B (en) * | 2013-08-06 | 2021-04-20 | 华为技术有限公司 | Audio signal classification method and device |
US10090003B2 (en) | 2013-08-06 | 2018-10-02 | Huawei Technologies Co., Ltd. | Method and apparatus for classifying an audio signal based on frequency spectrum fluctuation |
CN106409313A (en) * | 2013-08-06 | 2017-02-15 | 华为技术有限公司 | Audio signal classification method and apparatus |
WO2015018121A1 (en) * | 2013-08-06 | 2015-02-12 | 华为技术有限公司 | Audio signal classification method and device |
CN106256001A (en) * | 2014-02-24 | 2016-12-21 | 三星电子株式会社 | Signal classification method and apparatus, and audio encoding method and apparatus using same |
US10504540B2 (en) | 2014-02-24 | 2019-12-10 | Samsung Electronics Co., Ltd. | Signal classifying method and device, and audio encoding method and device using same |
CN106571150A (en) * | 2015-10-12 | 2017-04-19 | 阿里巴巴集团控股有限公司 | Method and system for positioning human acoustic zone of music |
CN108074584A (en) * | 2016-11-18 | 2018-05-25 | 南京大学 | A kind of audio signal classification method based on signal multiple features statistics |
CN108242241A (en) * | 2016-12-23 | 2018-07-03 | 中国农业大学 | A kind of pure voice rapid screening method and its device |
CN109671425A (en) * | 2018-12-29 | 2019-04-23 | 广州酷狗计算机科技有限公司 | Audio frequency classification method, device and storage medium |
CN109671425B (en) * | 2018-12-29 | 2021-04-06 | 广州酷狗计算机科技有限公司 | Audio classification method, device and storage medium |
WO2020253694A1 (en) * | 2019-06-17 | 2020-12-24 | 华为技术有限公司 | Method, chip and terminal for music recognition |
CN110931044A (en) * | 2019-12-12 | 2020-03-27 | 上海立可芯半导体科技有限公司 | Radio frequency searching method, channel classification method and electronic equipment |
CN111161728A (en) * | 2019-12-26 | 2020-05-15 | 珠海格力电器股份有限公司 | Awakening method, device, equipment and medium for intelligent equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102543079A (en) | Method and equipment for classifying audio signals in real time | |
CN101599271B (en) | Recognition method of digital music emotion | |
Chou et al. | Robust singing detection in speech/music discriminator design | |
CN103646649B (en) | A kind of speech detection method efficiently | |
Bachu et al. | Separation of voiced and unvoiced using zero crossing rate and energy of the speech signal | |
CN1121681C (en) | Speech processing | |
CN105206270A (en) | Isolated digit speech recognition classification system and method combining principal component analysis (PCA) with restricted Boltzmann machine (RBM) | |
CN103871426A (en) | Method and system for comparing similarity between user audio frequency and original audio frequency | |
CN103489454A (en) | Voice endpoint detection method based on waveform morphological characteristic clustering | |
CN106504772B (en) | Speech-emotion recognition method based on weights of importance support vector machine classifier | |
CN102446504A (en) | Voice/Music identifying method and equipment | |
CN103000172A (en) | Signal classification method and device | |
CN102708861A (en) | Poor speech recognition method based on support vector machine | |
CN108364641A (en) | A kind of speech emotional characteristic extraction method based on the estimation of long time frame ambient noise | |
Sivakumaran et al. | On the use of the bayesian information criterion in multiple speaker detection | |
CN108074584A (en) | A kind of audio signal classification method based on signal multiple features statistics | |
Nilsson et al. | On the mutual information between frequency bands in speech | |
Tsau et al. | Environmental sound recognition with CELP-based features | |
CN102610234B (en) | Method for selectively mapping signal complexity and code rate | |
Velayatipour et al. | A review on speech-music discrimination methods | |
Ravindran et al. | Improving the noise-robustness of mel-frequency cepstral coefficients for speech processing | |
CN104517614A (en) | Voiced/unvoiced decision device and method based on sub-band characteristic parameter values | |
Sunny et al. | Feature extraction methods based on linear predictive coding and wavelet packet decomposition for recognizing spoken words in malayalam | |
Sharma et al. | Non intrusive codec identification algorithm | |
Feki et al. | Audio stream analysis for environmental sound classification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20120704 |