CN101645265B - Method and device for identifying audio category in real time - Google Patents

Method and device for identifying audio category in real time

Info

Publication number
CN101645265B
CN101645265B (application CN2008101422448A)
Authority
CN
China
Prior art keywords
frame
audio signal
real cepstrum
tonality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2008101422448A
Other languages
Chinese (zh)
Other versions
CN101645265A (en)
Inventor
付中华
刘开文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp filed Critical ZTE Corp
Priority to CN2008101422448A priority Critical patent/CN101645265B/en
Publication of CN101645265A publication Critical patent/CN101645265A/en
Application granted granted Critical
Publication of CN101645265B publication Critical patent/CN101645265B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention discloses a method and a device for identifying an audio category in real time. The method comprises the following steps: a. calculating the short-time energy root mean square (RMS) of the audio signal in an analysis interval, and entering step b when the short-time energy RMS is smaller than a preset mute-detection threshold; b. performing real cepstral analysis on each frame of the audio signal; c. calculating short-time features of the audio signal from the real cepstral analysis result, and identifying the category of the audio signal by a threshold method based on those features. The technical scheme of the invention effectively achieves real-time identification of the audio category based on the real cepstrum.

Description

Method and device for real-time identification of audio categories
Technical field
The present invention relates to the field of communications, and in particular to a method and device for real-time identification of audio categories.
Background art
In audio encoding and decoding, music and speech signals are usually coded with different codec modes; the audio category therefore has to be identified before coding, to determine whether the signal is music or speech.
The difficulty in identifying the audio category lies in the variability of music and of the noise in speech. At present, music/speech discrimination relies mainly on short-time analysis and long-time analysis. The short-time features extracted from the audio signal exploit only a small amount of useful information and are insufficient to reflect the difference between the two signal classes. Long-time analysis, lacking a strong feature description, either performs the identification over a long time slice (for example by analysing a whole audio file) or derives statistical features from the audio dynamics. The former reflects the difference between music and speech well, but demands a high sampling rate and heavy computation and introduces long delay, which makes it unsuitable for real-time communication; the latter's features are not robust enough to guarantee reliable identification in a complex communication environment.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method and device for real-time identification of audio categories that effectively realize real-time identification of the audio category based on the real cepstrum.
The technical solution adopted by the present invention to solve this problem is as follows:
A real-time identification method for audio categories comprises the following steps:
a. calculating the short-time energy root mean square (RMS) of the audio signal in an analysis interval, and entering step b when the short-time energy RMS is smaller than a preset mute-detection threshold;
b. performing real cepstral analysis on each frame of the audio signal;
c. calculating each LPH (Longest Pitch Hold, pitch hold time) / PCT (Pitch Continue Time, pitch continuation time) value of the audio signal from the real cepstral analysis result;
d. judging whether the number of occurrences of LPH/PCT = 1 exceeds a preset equality-count threshold; if so, the audio signal is music.
In the above scheme, after judging whether the number of occurrences of LPH/PCT = 1 exceeds the preset equality-count threshold, the method further comprises:
e. when the number of occurrences of LPH/PCT = 1 does not exceed the preset equality-count threshold, determining the mean of LPH/PCT from the individual LPH/PCT values and judging whether it exceeds a preset mean threshold; if so, the audio signal is music; otherwise, proceeding to the next step;
f. calculating the APD (Average Pitch Density) of the audio signal from the real cepstral analysis result and judging whether it exceeds a preset density threshold; if so, the audio signal is music; otherwise, proceeding to the next step;
g. counting the tonal and non-tonal frames of the audio signal and calculating their average energies from the real cepstral analysis result;
h. determining the TNR (Tone/Non-tone Ratio, the ratio of the average energy of the tonal frames to that of the non-tonal frames) and judging whether it is smaller than a preset energy-ratio threshold; if so, the audio signal is music; otherwise, proceeding to the next step;
i. determining the RNT (Ratio of Non-Tone, the proportion of non-tonal frames) from the numbers of tonal and non-tonal frames and judging whether it is smaller than a preset proportion threshold; if so, the audio signal is music; otherwise, the audio signal is speech.
The APD is computed as:
APD = Σ_{i=1}^{N} (1/L) · Σ_{j=l₁}^{l₂} |RC_{x_i}(j)|
where N is the number of signal frames in the audio signal, RC_{x_i}(j) is the j-th point of the real cepstrum of the i-th frame of the audio signal, L = l₂ - l₁ + 1, and l₁ and l₂ are respectively the start and end points of the portion of the real cepstral analysis result that reflects spectral detail.
In the above scheme, step c is realized by the following sub-steps:
c1. based on the portion of the real cepstral analysis result that reflects spectral detail, determining the pitch-continuous signal groups in the audio signal, and, within each pitch-continuous group, the sub-groups in which the pitch remains unchanged;
c2. determining each PCT from the number of frames in the corresponding pitch-continuous group, and each LPH from the number of frames in the corresponding pitch-hold sub-group;
c3. dividing each LPH by the corresponding PCT.
In the above scheme, within the frames of a pitch-continuous group, the difference between the peaks of the spectral-detail portions of the real cepstra of any two adjacent frames is smaller than a preset peak-tracking error; within the frames of a pitch-hold sub-group, this peak difference is smaller than a preset peak-hold error; the peak-hold error is smaller than the peak-tracking error.
In the above scheme, step g is realized by the following sub-steps:
g1. labelling each frame of the audio signal as tonal or non-tonal based on the portion of the real cepstral analysis result that reflects spectral detail;
g2. counting the tonal and non-tonal frames and calculating their average energies.
In the above scheme, a frame is labelled non-tonal when the peak of the spectral-detail portion of its real cepstrum is smaller than a preset tonality threshold, and tonal otherwise.
In the above scheme, the method further comprises, before step a, pre-processing the audio signal by applying pre-emphasis, framing and windowing in turn.
A real-time identification device for audio categories comprises:
a mute-detection module, configured to calculate the short-time energy RMS of the audio signal in an analysis interval and judge from it whether the audio signal is in a mute state;
a real cepstral analysis module, configured to perform real cepstral analysis on each frame of the audio signal when the mute-detection module determines that the audio signal is not in a mute state;
an audio-category identification module, configured to calculate each LPH/PCT value of the audio signal from the real cepstral analysis result, and to judge whether the number of occurrences of LPH/PCT = 1 exceeds the preset equality-count threshold; if so, the audio signal is music.
In the above scheme, the device further comprises a pre-processing module, configured to apply pre-emphasis, framing and windowing to the audio signal in turn and pass the processed audio signal to the mute-detection module.
The audio-category identification module is further configured to: when the number of occurrences of LPH/PCT = 1 does not exceed the preset equality-count threshold, determine the mean of LPH/PCT from the individual LPH/PCT values and judge whether it exceeds the preset mean threshold, in which case the audio signal is music; otherwise calculate the average pitch density APD of the audio signal from the real cepstral analysis result and judge whether it exceeds the preset density threshold, in which case the audio signal is music; otherwise count the tonal and non-tonal frames of the audio signal and calculate their average energies from the real cepstral analysis result, determine the tone/non-tone energy ratio TNR from those average energies and judge whether it is smaller than the preset energy-ratio threshold, in which case the audio signal is music; otherwise determine the ratio of non-tonal frames RNT from the frame counts and judge whether it is smaller than the preset proportion threshold, in which case the audio signal is music; otherwise the audio signal is speech.
The main benefit of the present invention is that the real-time identification device realizes the real-time identification method provided by the invention: the method computes short-time features of the audio signal from the real cepstral analysis result of each frame and applies a threshold method, thereby effectively achieving real-time identification of the audio category based on the real cepstrum.
Description of drawings
Fig. 1 is a flowchart of the real-time identification method for audio categories of the present invention;
Fig. 2 is a structural diagram of the real-time identification device for audio categories of the present invention.
Embodiment
The invention is further described below with reference to the accompanying drawings.
With reference to Fig. 1, a real-time identification method for audio categories comprises the following steps:
S101: pre-process the audio signal by applying pre-emphasis, framing and windowing in turn;
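The pre-processing of S101 can be sketched as follows. This is a minimal sketch, not the patent's implementation: the pre-emphasis coefficient, frame length and hop are illustrative (the worked example later in the description uses 8 kHz audio, 32 ms frames and a 10 ms shift), and the Hamming window is an assumption, since the patent does not name a window function.

```python
import numpy as np

def preprocess(x, pre_emph=0.8, frame_len=256, hop=80):
    """Pre-emphasis, framing and windowing (illustrative parameter values)."""
    x = np.asarray(x, dtype=float)
    # Pre-emphasis boosts high frequencies: y[n] = x[n] - a*x[n-1]
    y = np.append(x[0], x[1:] - pre_emph * x[:-1])
    # Split into overlapping frames of frame_len samples, hop samples apart
    n_frames = 1 + (len(y) - frame_len) // hop
    frames = np.stack([y[i * hop:i * hop + frame_len] for i in range(n_frames)])
    # Apply a Hamming window to each frame (assumed window choice)
    return frames * np.hamming(frame_len)
```

With 8 kHz input, `frame_len=256` and `hop=80` correspond to the 32 ms frames and 10 ms shift of the worked example.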
S102: calculate the short-time energy RMS of the audio signal in the analysis interval; when it is smaller than the preset mute-detection threshold, the audio signal is in a non-mute state and the flow proceeds to the next step; otherwise the audio signal is in a mute state and the flow ends;
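Step S102 amounts to a single RMS comparison. In this sketch the threshold value is a placeholder (the patent presets it but gives no number), and, following the patent text of S102, the flow continues to cepstral analysis when the RMS falls below the threshold.

```python
import numpy as np

def should_analyze(frames, silence_threshold=0.01):
    """Short-time energy RMS over the analysis interval (step S102)."""
    rms = np.sqrt(np.mean(np.asarray(frames, dtype=float) ** 2))
    # Per the patent text, cepstral analysis proceeds when the RMS is
    # below the preset mute-detection threshold; otherwise the flow ends.
    return rms < silence_threshold
```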
S103: perform real cepstral analysis on each frame of the audio signal in the analysis interval. In the real cepstrum of a frame, the part near quefrency 0 mainly reflects large-scale information such as the power-spectrum envelope, while the part away from 0 mainly reflects spectral detail; the real cepstrum thus separates the spectral envelope from the spectral detail;
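The real cepstral analysis of S103 is the inverse FFT of the log magnitude spectrum; this is the standard definition of the real cepstrum, sketched here with a 256-point FFT as in the patent's worked example. The small floor added before the logarithm is a numerical safeguard, not part of the patent.

```python
import numpy as np

def real_cepstrum(frame, nfft=256):
    """Real cepstrum of one frame: IFFT of the log magnitude spectrum.
    Low quefrencies reflect the spectral envelope; quefrencies away
    from 0 carry spectral detail such as pitch harmonics."""
    spectrum = np.abs(np.fft.fft(frame, nfft))
    return np.real(np.fft.ifft(np.log(spectrum + 1e-12)))
```

For a periodic frame, the spectral-detail range shows a peak at the quefrency of the pitch period, which is what the later steps track.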
S104: calculate each LPH/PCT value of the audio signal from the real cepstral analysis result, as follows:
First, based on the spectral-detail portion of the real cepstral analysis result, determine the pitch-continuous signal groups in the audio signal, and, within each pitch-continuous group, the sub-groups in which the pitch remains unchanged.
Within the frames of a pitch-continuous group, the difference between the peaks of the spectral-detail portions of the real cepstra of adjacent frames is smaller than a preset peak-tracking error σ; within the frames of a pitch-hold sub-group, this peak difference is smaller than a preset peak-hold error ε; ε is smaller than σ.
Then, determine each PCT from the number of frames in the corresponding pitch-continuous group, and each LPH from the number of frames in the corresponding pitch-hold sub-group.
Finally, divide each LPH by the corresponding PCT;
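Assuming each frame has been reduced to the position of its strongest cepstral peak in the spectral-detail range, the grouping of S104 can be sketched as below. σ = 4 and ε = 1 follow the worked example at the end of the description; the peak-position representation is an assumption, since the patent speaks of peak differences without fixing a representation.

```python
def lph_pct_ratios(peaks, sigma=4, eps=1):
    """Sketch of sub-steps c1-c3: split frames into pitch-continuous
    groups (adjacent peak drift < sigma), find the longest pitch-hold
    run in each (drift < eps), and return LPH/PCT per group."""
    ratios = []
    start = 0
    for end in range(1, len(peaks) + 1):
        # Close the current group at the end of input or on a pitch jump
        if end == len(peaks) or abs(peaks[end] - peaks[end - 1]) >= sigma:
            group = peaks[start:end]
            pct = len(group)  # Pitch Continue Time, in frames
            # Longest run where consecutive peaks differ by less than eps
            lph, run = 1, 1
            for a, b in zip(group, group[1:]):
                run = run + 1 if abs(b - a) < eps else 1
                lph = max(lph, run)
            ratios.append(lph / pct)
            start = end
    return ratios
```

A steady pitch yields ratios of 1, while a gliding pitch yields ratios well below 1, which is exactly what steps S105 and S106 test.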
S105: judge whether the number of occurrences of LPH/PCT = 1 exceeds the preset equality-count threshold C₁; if so, the audio signal is music; otherwise, proceed to the next step;
S106: determine the mean of LPH/PCT from the individual LPH/PCT values and judge whether it exceeds the preset mean threshold C₂; if so, the audio signal is music; otherwise, proceed to the next step. For music, the pitch dwells on a particular value for some time, so LPH is very likely to equal PCT, and even when the ratio is not exactly 1 it remains close to 1; for speech, the pitch rarely dwells on a particular value, so LPH seldom equals PCT and the two differ considerably;
S107: calculate the APD of the audio signal from the real cepstral analysis result and judge whether it exceeds the preset density threshold C₃. Because of instruments and polyphony, the average pitch density of music is higher than that of speech; if APD exceeds C₃, the audio signal is music; otherwise, proceed to the next step. The APD is computed as:
APD = Σ_{i=1}^{N} (1/L) · Σ_{j=l₁}^{l₂} |RC_{x_i}(j)|
where N is the number of signal frames in the audio signal, RC_{x_i}(j) is the j-th point of the real cepstrum of the i-th frame of the audio signal, L = l₂ - l₁ + 1, and l₁ and l₂ are respectively the start and end points of the spectral-detail portion of the real cepstral analysis result;
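A direct transcription of the APD formula, using the l₁ = 14, l₂ = 128 values from the worked example. Note that the formula as given sums the per-frame mean over all N frames rather than averaging over N; that is followed here.

```python
import numpy as np

def average_pitch_density(cepstra, l1=14, l2=128):
    """APD per the patent's formula: for each frame, the mean absolute
    cepstral value over the spectral-detail range [l1, l2], summed
    over all N frames."""
    L = l2 - l1 + 1
    return sum(np.sum(np.abs(rc[l1:l2 + 1])) / L for rc in cepstra)
```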
S108: count the tonal and non-tonal frames of the audio signal and calculate their average energies from the real cepstral analysis result, as follows:
First, label each frame of the audio signal as tonal or non-tonal based on the spectral-detail portion of the real cepstral analysis result.
A frame in which a pitch is present is tonal, and a frame without a pitch is non-tonal; therefore, when the peak of the spectral-detail portion of a frame's real cepstrum is smaller than a preset tonality threshold θ, the frame is labelled non-tonal, and otherwise tonal.
Then, count the tonal and non-tonal frames and calculate their average energies;
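The two sub-steps of S108 can be sketched as below; θ = 0.2 follows the worked example, and per-frame energy is taken as the mean squared sample value, which is an assumption since the patent does not define the energy measure.

```python
import numpy as np

def mark_tonality(frames, cepstra, theta=0.2, l1=14, l2=128):
    """Label each frame tonal or non-tonal by the peak of the
    spectral-detail range of its real cepstrum (g1), then count the
    frames and compute the average energy of each class (g2)."""
    tonal = np.array([np.max(rc[l1:l2 + 1]) >= theta for rc in cepstra])
    energy = np.mean(np.asarray(frames, dtype=float) ** 2, axis=1)
    counts = (int(tonal.sum()), int((~tonal).sum()))
    avg_e = (energy[tonal].mean() if tonal.any() else 0.0,
             energy[~tonal].mean() if (~tonal).any() else 0.0)
    return tonal, counts, avg_e
```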
S109: determine the TNR from the average energies of the tonal and non-tonal frames and judge whether it is smaller than the preset energy-ratio threshold C₄; if so, the audio signal is music; otherwise, proceed to the next step;
S110: determine the RNT from the numbers of tonal and non-tonal frames and judge whether it is smaller than the preset proportion threshold C₅; if so, the audio signal is music; otherwise, the audio signal is speech.
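The five decisions S105 to S110 form a cascade in which any test that fires declares music, and only a signal failing all five is labelled speech. The threshold values below follow the worked example at the end of the description; the feature values are assumed to have already been computed per steps S104 to S109.

```python
def classify(eq_count, ratios, apd, tnr, rnt,
             c1=0, c2=0.5, c3=0.6, c4=0.2, c5=1.0):
    """Threshold cascade of S105-S110 (thresholds from the worked example)."""
    if eq_count > c1:                                  # S105: count of LPH/PCT == 1
        return "music"
    mean_ratio = sum(ratios) / len(ratios) if ratios else 0.0
    if mean_ratio > c2:                                # S106: mean LPH/PCT
        return "music"
    if apd > c3:                                       # S107: average pitch density
        return "music"
    if tnr < c4:                                       # S109: tone/non-tone energy ratio
        return "music"
    if rnt < c5:                                       # S110: non-tonal frame ratio
        return "music"
    return "speech"
```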
With reference to Fig. 2, a real-time identification device for audio categories implementing the above method comprises:
a pre-processing module, configured to apply pre-emphasis, framing and windowing to the audio signal in turn and pass the processed audio signal to the mute-detection module;
a mute-detection module, configured to calculate the short-time energy RMS of the pre-processed audio signal in the analysis interval and judge from it whether the audio signal is in a mute state;
a real cepstral analysis module, configured to perform real cepstral analysis on each frame of the audio signal when the mute-detection module determines that the audio signal is not in a mute state;
an audio-category identification module, configured to calculate short-time features of the audio signal from the analysis result of the real cepstral analysis module and to identify the category of the audio signal by a threshold method based on those features.
For an audio signal with 8 kHz sampling, 16-bit quantization, a pre-emphasis factor of -0.80, a frame length of 32 ms and a frame shift of 10 ms (22 ms overlap between frames), with a fast Fourier transform length of 256, the start point l₁ of the spectral-detail portion of the real cepstrum is 14 and the end point l₂ is 128. With σ = 4, ε = 1, θ = 0.2 and N = 100 frames in the analysis interval, the thresholds are C₁ = 0, C₂ = 0.5, C₃ = 0.6, C₄ = 0.2 and C₅ = 1. When the method of the invention is applied under these settings, the five short-time features jointly judge the signal and effectively realize identification of the audio category.
The above are merely embodiments of the invention and do not limit it; those skilled in the art may make various changes and variations. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within the scope of the claims of the invention.

Claims (9)

1. A real-time identification method for audio categories, characterized by comprising the following steps:
a. calculating the short-time energy root mean square of the audio signal in an analysis interval, and entering step b when the short-time energy root mean square is smaller than a preset mute-detection threshold;
b. performing real cepstral analysis on each frame of the audio signal;
c. calculating each pitch hold time LPH / pitch continuation time PCT value of the audio signal from the real cepstral analysis result;
d. judging whether the number of occurrences of LPH/PCT = 1 exceeds a preset equality-count threshold; if so, the audio signal is music.
2. The real-time identification method for audio categories of claim 1, characterized in that, after judging whether the number of occurrences of LPH/PCT = 1 exceeds the preset equality-count threshold, the method comprises:
e. when the number of occurrences of LPH/PCT = 1 does not exceed the preset equality-count threshold, determining the mean of LPH/PCT from the individual LPH/PCT values and judging whether it exceeds a preset mean threshold; if so, the audio signal is music; otherwise, proceeding to the next step;
f. calculating the average pitch density APD of the audio signal from the real cepstral analysis result and judging whether it exceeds a preset density threshold; if so, the audio signal is music; otherwise, proceeding to the next step;
g. counting the tonal and non-tonal frames of the audio signal and calculating their average energies from the real cepstral analysis result;
h. determining the tone/non-tone energy ratio TNR from the average energies of the tonal and non-tonal frames and judging whether it is smaller than a preset energy-ratio threshold; if so, the audio signal is music; otherwise, proceeding to the next step;
i. determining the ratio of non-tonal frames RNT from the numbers of tonal and non-tonal frames and judging whether it is smaller than a preset proportion threshold; if so, the audio signal is music; otherwise, the audio signal is speech;
wherein the APD is computed as:
APD = Σ_{i=1}^{N} (1/L) · Σ_{j=l₁}^{l₂} |RC_{x_i}(j)|
where N is the number of signal frames in the audio signal, RC_{x_i}(j) is the j-th point of the real cepstrum of the i-th frame of the audio signal, L = l₂ - l₁ + 1, and l₁ and l₂ are respectively the start and end points of the portion of the real cepstral analysis result that reflects spectral detail.
3. The real-time identification method for audio categories of claim 2, characterized in that step c is realized by the following sub-steps:
c1. based on the portion of the real cepstral analysis result that reflects spectral detail, determining the pitch-continuous signal groups in the audio signal, and, within each pitch-continuous group, the sub-groups in which the pitch remains unchanged;
c2. determining each PCT from the number of frames in the corresponding pitch-continuous group, and each LPH from the number of frames in the corresponding pitch-hold sub-group;
c3. dividing each LPH by the corresponding PCT.
4. The real-time identification method for audio categories of claim 3, characterized in that: within the frames of a pitch-continuous group, the difference between the peaks of the spectral-detail portions of the real cepstra of any two adjacent frames is smaller than a preset peak-tracking error; within the frames of a pitch-hold sub-group, this peak difference is smaller than a preset peak-hold error; and the peak-hold error is smaller than the peak-tracking error.
5. The real-time identification method for audio categories of claim 2, characterized in that step g is realized by the following sub-steps:
g1. labelling each frame of the audio signal as tonal or non-tonal based on the portion of the real cepstral analysis result that reflects spectral detail;
g2. counting the tonal and non-tonal frames and calculating their average energies.
6. The real-time identification method for audio categories of claim 5, characterized in that: a frame is labelled non-tonal when the peak of the spectral-detail portion of its real cepstrum is smaller than a preset tonality threshold, and tonal otherwise.
7. The real-time identification method for audio categories of claim 1, characterized in that: before step a, the method further comprises pre-processing the audio signal by applying pre-emphasis, framing and windowing in turn.
8. A real-time identification device for audio categories, characterized by comprising:
a mute-detection module, configured to calculate the short-time energy root mean square of the audio signal in an analysis interval and judge from it whether the audio signal is in a mute state;
a real cepstral analysis module, configured to perform real cepstral analysis on each frame of the audio signal when the mute-detection module determines that the audio signal is not in a mute state;
an audio-category identification module, configured to calculate each pitch hold time LPH / pitch continuation time PCT value of the audio signal from the real cepstral analysis result, and to judge whether the number of occurrences of LPH/PCT = 1 exceeds a preset equality-count threshold; if so, the audio signal is music.
9. The real-time identification device for audio categories of claim 8, characterized in that the device further comprises a pre-processing module, configured to apply pre-emphasis, framing and windowing to the audio signal in turn and pass the processed audio signal to the mute-detection module;
and in that the audio-category identification module is further configured to: when the number of occurrences of LPH/PCT = 1 does not exceed the preset equality-count threshold, determine the mean of LPH/PCT from the individual LPH/PCT values and judge whether it exceeds the preset mean threshold, in which case the audio signal is music; otherwise calculate the average pitch density APD of the audio signal from the real cepstral analysis result and judge whether it exceeds the preset density threshold, in which case the audio signal is music; otherwise count the tonal and non-tonal frames of the audio signal and calculate their average energies from the real cepstral analysis result, determine the tone/non-tone energy ratio TNR from those average energies and judge whether it is smaller than the preset energy-ratio threshold, in which case the audio signal is music; otherwise determine the ratio of non-tonal frames RNT from the numbers of tonal and non-tonal frames and judge whether it is smaller than the preset proportion threshold, in which case the audio signal is music; otherwise the audio signal is speech.
CN2008101422448A 2008-08-05 2008-08-05 Method and device for identifying audio category in real time Expired - Fee Related CN101645265B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008101422448A CN101645265B (en) 2008-08-05 2008-08-05 Method and device for identifying audio category in real time

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2008101422448A CN101645265B (en) 2008-08-05 2008-08-05 Method and device for identifying audio category in real time

Publications (2)

Publication Number Publication Date
CN101645265A CN101645265A (en) 2010-02-10
CN101645265B true CN101645265B (en) 2011-07-13

Family

ID=41657118

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008101422448A Expired - Fee Related CN101645265B (en) 2008-08-05 2008-08-05 Method and device for identifying audio category in real time

Country Status (1)

Country Link
CN (1) CN101645265B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102568470B (en) * 2012-01-11 2013-12-25 广州酷狗计算机科技有限公司 Acoustic fidelity identification method and system for audio files
US9564128B2 (en) * 2013-12-09 2017-02-07 Qualcomm Incorporated Controlling a speech recognition process of a computing device
CN104036788B (en) * 2014-05-29 2016-10-05 北京音之邦文化科技有限公司 The acoustic fidelity identification method of audio file and device
CN105336344B (en) * 2014-07-10 2019-08-20 华为技术有限公司 Noise detection method and device
CN104091601A (en) * 2014-07-10 2014-10-08 腾讯科技(深圳)有限公司 Method and device for detecting music quality
CN106920543B (en) * 2015-12-25 2019-09-06 展讯通信(上海)有限公司 Audio recognition method and device
CN112750459B (en) * 2020-08-10 2024-02-02 腾讯科技(深圳)有限公司 Audio scene recognition method, device, equipment and computer readable storage medium
CN112652324A (en) * 2020-12-28 2021-04-13 深圳万兴软件有限公司 Speech enhancement optimization method, speech enhancement optimization system and readable storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1354455A (en) * 2000-11-18 2002-06-19 深圳市中兴通讯股份有限公司 Sound activation detection method for identifying speech and music from noise environment
CN1920947A (en) * 2006-09-15 2007-02-28 清华大学 Voice/music detector for audio frequency coding with low bit ratio


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Costas Panagiotakis and George Tziritas. A Speech/Music Discriminator Based on RMS and Zero-Crossings. IEEE Transactions on Multimedia, vol. 7, no. 1, 2005. *
John Saunders. Real-time discrimination of broadcast speech/music. Proceedings of the 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-96), vol. 2, 1996. *
Zhong-Hua Fu, et al. Noise robust features for speech/music discrimination in real-time telecommunication. Proceedings of the 2009 IEEE International Conference on Multimedia and Expo (ICME 2009), 2009. *

Also Published As

Publication number Publication date
CN101645265A (en) 2010-02-10

Similar Documents

Publication Publication Date Title
CN101645265B (en) Method and device for identifying audio category in real time
US8195449B2 (en) Low-complexity, non-intrusive speech quality assessment
CN102089803B (en) Method and discriminator for classifying different segments of a signal
CN109599093B (en) Intelligent quality inspection keyword detection method, device and equipment and readable storage medium
CN109147765B (en) Audio quality comprehensive evaluation method and system
AU712412B2 (en) Speech processing
US7346500B2 (en) Method of translating a voice signal to a series of discrete tones
CN101930735B (en) Speech emotion recognition equipment and speech emotion recognition method
US20030101050A1 (en) Real-time speech and music classifier
Hosseinzadeh et al. Combining vocal source and MFCC features for enhanced speaker recognition performance using GMMs
Dubey et al. Non-intrusive speech quality assessment using several combinations of auditory features
AU2009295251B2 (en) Method of analysing an audio signal
Dubey et al. Non-intrusive objective speech quality assessment using a combination of MFCC, PLP and LSF features
Lam et al. Objective speech quality measure for cellular phone
Zezario et al. A study on incorporating Whisper for robust speech assessment
CN102655000B (en) Method and device for classifying unvoiced sound and voiced sound
El-Maleh Classification-based Techniques for Digital Coding of Speech-plus-noise
CN111354352A (en) Automatic template cleaning method and system for audio retrieval
CN117409761B (en) Method, device, equipment and storage medium for synthesizing voice based on frequency modulation
Jabloun Large vocabulary speech recognition in noisy environments
Aye Speech recognition using Zero-crossing features
Trancoso et al. Harmonic postprocessing of speech synthesised by stochastic coders
Zhen et al. A new feature extraction based the reliability of speech in speaker recognition
Tribolet et al. An improved model for isolated word recognition
CN114255742A (en) Method, device, equipment and storage medium for voice endpoint detection

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20110713

Termination date: 20160805