CN104992708A - Short-time specific audio detection model generating method and short-time specific audio detection method - Google Patents
- Publication number: CN104992708A (application number CN201510236568.8A)
- Authority: CN (China)
- Prior art date
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention relates to a short-time specific audio detection model generation method comprising: extracting features from training speech data, the training speech data comprising non-specific audio data and specific audio data; training a universal background model with the features of the training speech data; adaptively deriving a model of one class of specific audio data from the universal background model and the features of that class in the training speech data; and repeating this operation until the models of all classes of specific audio data in the training speech data are obtained. The invention also provides a short-time specific audio detection method that detects specific audio data by model scoring. The method not only solves well the problem of insufficient training data for specific audio models, but also suppresses the background noise of the input data to a certain extent.
Description
Technical field
The present invention relates to methods for short-time specific audio detection, and more particularly to short-time specific audio detection using Gaussian mixture models.
Background technology
Short-time specific audio plays an important role in many fields, especially security. In certain scenarios, a given class of short-time specific audio must be detected so that urgent events can be handled in time. For example, in public places, public safety must be supervised and accidents detected, such as sudden screams, sudden explosions or gunshots; these short-time specific audio events must be detected promptly so that such emergencies can be dealt with in time. In addition, in some relatively important places, short-time specific audio detection can also be used for abnormal sound detection and can serve as an effective early warning.
Current short-time specific audio detection methods still face several problems. First, short-time specific audio events arise quickly and last only a very short time, so exploiting the information within the short audio segment is essential. Second, specific audio events do not occur very frequently, so there is a problem of insufficient training data. Third, the scenes in which detection is used often have complex background noise, so suppressing background noise is also an important problem for short-time specific audio detection.
Summary of the invention
The object of the present invention is to overcome the defects of existing short-time specific audio detection methods, namely insufficient training data and the inability to suppress background noise, by providing a short-time specific audio model generation method and detection method based on Gaussian mixture models.
The present invention provides a short-time specific audio detection model generation method, comprising:
Step 101: performing feature extraction on training speech data, wherein the training speech data comprise non-specific audio data and specific audio data;
Step 102: training a universal background model with the features of the training speech data obtained in step 101, wherein the universal background model is a Gaussian mixture model whose expression is:
p(x|λ) = Σ_{i=1}^{M} w_i p_i(x)
where w_i denotes the weight of the i-th Gaussian, taking values in [0, 1] and satisfying the normalization condition Σ_{i=1}^{M} w_i = 1; x denotes the frame feature of a training speech segment; λ denotes the set of all parameters of the Gaussian mixture model; M denotes the number of Gaussian components; p_i(x) denotes the probability density function of the i-th single Gaussian, whose expression is:
p_i(x) = (2π)^{−D/2} |Σ_i|^{−1/2} exp(−(1/2)(x − μ_i)^T Σ_i^{−1} (x − μ_i))
where D denotes the dimension of the frame feature of a training speech segment, Σ_i denotes the covariance matrix of this Gaussian, and μ_i denotes its mean vector;
Step 103: adaptively deriving the model of a given class of specific audio data from the universal background model obtained in step 102, using the features of that class of specific audio data in the training speech data; this operation is repeated until the models of all classes of specific audio data in the training speech data have been obtained.
In the above technical solution, in step 101, the feature extracted from the training speech data is the Mel-frequency cepstral coefficients.
In the above technical solution, in step 102, training the universal background model comprises estimating its parameters with the expectation-maximization method. The estimated parameters fall into three classes: the Gaussian weights w, the Gaussian variances δ and the Gaussian means μ, where w is the set of the per-Gaussian weights w_k, δ the set of the per-Gaussian variances δ_k, μ the set of the per-Gaussian means μ_k, and k indexes the single Gaussian components. Specifically:
Step 102-1: updating the k-th Gaussian weight w_k according to:
w_k = (1/T) Σ_{t=1}^{T} p(k|x_t, λ)
where x_t denotes the t-th frame feature vector of the input training speech x, a known vector computed during feature extraction; λ is the collective name of all parameters of the Gaussian mixture model, which are given initial values when training starts and are therefore known; T denotes the total number of frames of all input training speech, a known quantity; k denotes the index of the k-th single Gaussian in the mixture; p(k|x_t, λ) denotes the posterior probability of the input training frame x_t on the k-th Gaussian of the universal background model, computed from the input frame x_t and the mixture parameters λ;
Step 102-2: updating the k-th Gaussian mean μ_k according to:
μ_k = Σ_{t=1}^{T} p(k|x_t, λ) x_t / Σ_{t=1}^{T} p(k|x_t, λ)
where T, x_t and λ are known variables, and p(k|x_t, λ) is computed from the input frame x_t and the mixture parameters λ;
Step 102-3: updating the k-th Gaussian variance δ_k² according to:
δ_k² = Σ_{t=1}^{T} p(k|x_t, λ) x_t² / Σ_{t=1}^{T} p(k|x_t, λ) − μ_k²
where T, x_t, λ and μ_k are all known variables, and p(k|x_t, λ) is computed from the input frame x_t and the mixture parameters λ.
In the above technical solution, in step 103, adaptively deriving the model of a class of specific audio data from the universal background model obtained in step 102 comprises:
Step 103-1: first computing, from the feature vectors of the specific audio training data, the posterior count n_i, the first-order statistic E_i(x) and the second-order statistic E_i(x²) of each speech frame on the universal background model:
n_i = Σ_{t=1}^{T} Pr(i|x_t)
E_i(x) = (1/n_i) Σ_{t=1}^{T} Pr(i|x_t) x_t
E_i(x²) = (1/n_i) Σ_{t=1}^{T} Pr(i|x_t) x_t²
where Pr(i|x_t) denotes the posterior probability of the t-th frame of the input audio x on the i-th Gaussian of the universal background model; x_t denotes the feature of the t-th frame of the input audio x; T denotes the total number of frames of the input audio; i denotes the index of the i-th single Gaussian in the universal background model;
Step 103-2: using the posterior counts, first-order statistics and second-order statistics computed in step 103-1 to adaptively adjust the parameters of the universal background model, yielding the weights ŵ_i, means μ̂_i and covariances δ̂_i² of the specific audio model. The adaptation formulas are:
ŵ_i = [α_i^w n_i / T + (1 − α_i^w) w_i] γ
μ̂_i = α_i^m E_i(x) + (1 − α_i^m) μ_i
δ̂_i² = α_i^v E_i(x²) + (1 − α_i^v)(δ_i² + μ_i²) − μ̂_i²
where α_i^v, α_i^m and α_i^w are the adjustment coefficients for the variance, mean and weight respectively; T denotes the total number of frames of this class of specific audio training data; γ is a normalization parameter ensuring Σ_i ŵ_i = 1; w_i denotes the weight of the i-th Gaussian of the universal background model; μ_i denotes the mean of the i-th Gaussian of the universal background model; δ_i² denotes the covariance of the i-th Gaussian of the universal background model; and μ̂_i denotes the mean of the i-th Gaussian of the specific audio model obtained by adaptation.
The present invention further provides a short-time specific audio detection method, comprising:
Step 201: performing feature extraction on the input test speech;
Step 202: inputting the test speech features extracted in step 201 into the universal background model obtained by the above short-time specific audio detection model generation method, and computing the score of the test speech on the universal background model;
Step 203: inputting the test speech features extracted in step 201 into the Gaussian mixture models of the various classes of specific audio obtained by the above short-time specific audio detection model generation method, and computing the score of the test speech on the Gaussian mixture model of each class of specific audio;
Step 204: subtracting the score of the test speech on the universal background model obtained in step 202 from its scores on the Gaussian mixture models of the various classes of specific audio obtained in step 203, and comparing each difference with a threshold to decide which class of specific audio the test audio belongs to; if several model scores fall within the threshold range, the maximum is taken, i.e. the specific audio class represented by the highest-scoring model is selected as the final decision for the test speech.
In the above technical solution, in step 202, computing the score of the test speech on the universal background model comprises: selecting the N Gaussians of the universal background model with the largest posterior probabilities, computing the sum of these N probabilities, and recording the indices of these N Gaussians.
In the above technical solution, in step 203, computing the score of the test speech on the Gaussian mixture model of each class of specific audio comprises: using the N Gaussian indices of the universal background model recorded in step 202, computing the sum of the posterior probabilities of the corresponding N Gaussians in the specific audio mixture model, and taking this value as the score of the test speech on that model.
In the above technical solution, in step 201, the feature extracted from the test speech is the Mel-frequency cepstral coefficients.
The advantages of the invention are:
The method of the present invention not only overcomes well the problem of insufficient training data for short-time specific audio models, but also suppresses background noise to a certain extent.
Brief description of the drawings
Fig. 1 is a block diagram of the basic principle of training the universal background model in the short-time specific audio detection model generation method;
Fig. 2 is a block diagram of the basic principle of training a specific audio model in the short-time specific audio detection model generation method;
Fig. 3 is a flowchart of the short-time specific audio detection method.
Detailed description
The specific embodiments of the present invention are now described in further detail with reference to Fig. 1 and Fig. 2.
The short-time specific audio detection method of the present invention comprises two stages: the first stage trains the models with training speech data, and the second stage uses the trained models to detect test speech.
1. Model training stage
Step 101: feature extraction is performed on the training speech data. The extracted features are Mel-frequency cepstral coefficients (MFCC features), including the energy value and the first- and second-order differences;
In one embodiment, the MFCCs are extracted with a frame length of 20 ms and a frame shift of 10 ms, including the energy value and first- and second-order differences, for a total feature dimension of 60;
The training speech data should comprise a large amount of non-specific audio data and a certain amount of specific audio data.
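The 60-dimensional feature described above (static coefficients plus first- and second-order differences) can be assembled as in the sketch below. This is an illustration, not the patent's implementation: it assumes the 20 static MFCCs (including the energy term) have already been extracted by some front end, and uses simple finite differences in place of the regression-based deltas a speech toolkit would typically compute.

```python
import numpy as np

def add_deltas(static: np.ndarray) -> np.ndarray:
    """Append first- and second-order time differences to static MFCC frames.

    static: (T, 20) matrix of per-frame MFCCs (including the energy term),
    e.g. extracted with 20 ms frames and a 10 ms shift.
    Returns a (T, 60) matrix laid out as [static | delta | delta-delta].
    """
    d1 = np.gradient(static, axis=0)   # first-order difference over time
    d2 = np.gradient(d1, axis=0)       # second-order difference over time
    return np.concatenate([static, d1, d2], axis=1)
```
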
Step 102: the features of the training speech data obtained in step 101, i.e. the Mel-frequency cepstral coefficients, are used to train the universal background model (the UBM model);
With reference to the training diagram of the universal background model in Fig. 1, the universal background model is given by formula (1):
p(x|λ) = Σ_{i=1}^{M} w_i p_i(x)    (1)
In formula (1), w_i denotes the weight of the i-th Gaussian, taking values in [0, 1] and satisfying the normalization condition Σ_{i=1}^{M} w_i = 1;
x denotes the frame feature of a training speech segment; λ denotes the set of all parameters of the Gaussian mixture model; M denotes the number of Gaussian components in the mixture.
In formula (1), p_i(x) denotes the probability density function of the i-th single Gaussian, given by formula (2):
p_i(x) = (2π)^{−D/2} |Σ_i|^{−1/2} exp(−(1/2)(x − μ_i)^T Σ_i^{−1} (x − μ_i))    (2)
where p_i(x) is characterized by the following parameters: D, the dimension of the frame feature of a training speech segment, determined by the feature dimension chosen during feature extraction; Σ_i, the covariance matrix of this Gaussian; and μ_i, its mean vector.
The above is the concrete expression of the universal background model. A Gaussian mixture model fits the probability distribution of a generic speaker's voiced speech features by a linear weighted sum of several single Gaussians, i.e. it models the distribution's probability density function. The distribution of speakers' utterances, and hence their pronunciation characteristics, can thus be well characterized by a common Gaussian mixture model.
On the basis of the above universal background model, training it with the features of the training speech data means estimating its parameters by the expectation-maximization method.
After parameter estimation, the universal background model is obtained. This model is in essence a Gaussian mixture model, and its parameters comprise three classes: the Gaussian weights w, the Gaussian variances δ and the Gaussian means μ, where w is the set of the per-Gaussian weights w_i, δ the set of the per-Gaussian variances δ_i, μ the set of the per-Gaussian means μ_i, and i indexes the single Gaussian components. The three parameter sets obtained by training on the training data are unique.
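As a minimal illustration of this training step (not the patent's own implementation), scikit-learn's `GaussianMixture` performs the same EM estimation of the weights, means and diagonal variances; the component count, iteration budget and random stand-in features below are assumptions made for the sketch.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
features = rng.standard_normal((500, 60))   # stand-in for pooled 60-dim MFCC frames

# Fit a diagonal-covariance GMM as the universal background model;
# EM estimates the three parameter classes w, mu, delta described above.
ubm = GaussianMixture(n_components=8, covariance_type="diag",
                      max_iter=50, random_state=0).fit(features)
w, mu, var = ubm.weights_, ubm.means_, ubm.covariances_
```
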
The concrete parameter estimation procedure is as follows:
Step 102-1: updating the k-th Gaussian weight w_k;
The update of the k-th Gaussian weight w_k is given by formula (3):
w_k = (1/T) Σ_{t=1}^{T} p(k|x_t, λ)    (3)
where x_t denotes the t-th frame feature vector of the input training speech x, a known vector computed during feature extraction; λ has the same meaning as in formula (1), the collective name of all parameters of the Gaussian mixture model, which are given initial values when training starts and are therefore known; T denotes the total number of frames of all input training speech, a known quantity; k denotes the index of the k-th single Gaussian in the mixture; p(k|x_t, λ) denotes the posterior probability of the input training frame x_t on the k-th Gaussian of the universal background model, computed from the input frame x_t and the mixture parameters λ.
Step 102-2: updating the k-th Gaussian mean μ_k;
The update of the k-th Gaussian mean μ_k is given by formula (4):
μ_k = Σ_{t=1}^{T} p(k|x_t, λ) x_t / Σ_{t=1}^{T} p(k|x_t, λ)    (4)
where each parameter has the same meaning as in formula (3): T, x_t and λ are known variables, and p(k|x_t, λ) is computed from the input frame x_t and the mixture parameters λ.
Step 102-3: updating the k-th Gaussian variance δ_k²;
The update of the k-th Gaussian variance δ_k² is given by formula (5):
δ_k² = Σ_{t=1}^{T} p(k|x_t, λ) x_t² / Σ_{t=1}^{T} p(k|x_t, λ) − μ_k²    (5)
where each parameter has the same meaning as in formulas (3) and (4): T, x_t, λ and μ_k are all known variables, and p(k|x_t, λ) is computed from the input frame x_t and the mixture parameters λ.
Step 103: to obtain the model of each class of specific audio, some speech of that class must first be obtained as model training speech. If data of that class are difficult to obtain, the data of that class used when training the universal background model can be reused; if new data of that class can be obtained, the new audio data are used as the training data. However much training data there is, one class of specific audio yields one corresponding specific audio model.
In this step, as shown in Fig. 2, a small amount of training data of a given specific audio class and Bayesian adaptation are used to adapt that class's specific audio model from the universal background model. The concrete adaptation procedure is as follows:
Step 103-1: first, the posterior count, first-order statistic and second-order statistic of each speech frame on the universal background model are computed from the feature vectors of the specific audio training data, as in formulas (6), (7) and (8):
n_i = Σ_{t=1}^{T} Pr(i|x_t)    (6)
E_i(x) = (1/n_i) Σ_{t=1}^{T} Pr(i|x_t) x_t    (7)
E_i(x²) = (1/n_i) Σ_{t=1}^{T} Pr(i|x_t) x_t²    (8)
where Pr(i|x_t) denotes the posterior probability of the t-th frame of the input audio x on the i-th Gaussian of the universal background model; x_t denotes the feature of the t-th frame of the input audio x; T denotes the total number of frames of the input audio; i denotes the index of the i-th single Gaussian in the universal background model.
Because the adaptation data differ for each specific audio class, the posterior counts and first- and second-order statistics computed when training each specific audio model also differ.
Step 103-2: the posterior counts, first-order statistics and second-order statistics computed in step 103-1 are used to adaptively adjust the parameters of the universal background model, yielding the weights ŵ_i, means μ̂_i and covariances δ̂_i² of the specific audio model. Since the specific audio model is also a Gaussian mixture model in essence, once ŵ_i, μ̂_i and δ̂_i² are obtained, the Gaussian mixture model of this specific audio class is fully characterized.
The concrete adaptation formulas are (9), (10) and (11):
ŵ_i = [α_i^w n_i / T + (1 − α_i^w) w_i] γ    (9)
μ̂_i = α_i^m E_i(x) + (1 − α_i^m) μ_i    (10)
δ̂_i² = α_i^v E_i(x²) + (1 − α_i^v)(δ_i² + μ_i²) − μ̂_i²    (11)
where α_i^v, α_i^m and α_i^w are the adjustment coefficients for the variance, mean and weight respectively; n_i, E_i(x) and E_i(x²) are the posterior count and first- and second-order statistics of this specific audio training data computed by formulas (6), (7) and (8); in formula (9), T denotes the total number of frames of this class of specific audio training data, and γ is a normalization parameter ensuring Σ_i ŵ_i = 1; w_i denotes the weight of the i-th Gaussian of the universal background model; in formula (10), μ_i denotes the mean of the i-th Gaussian of the universal background model; in formula (11), δ_i² denotes the covariance of the i-th Gaussian of the universal background model, μ_i its mean, and μ̂_i the mean of the i-th Gaussian of the specific audio model obtained by adaptation.
After the above computation, the model of this class of specific audio is obtained.
As step 103-1 shows, because the adaptation data of each specific audio model differ, the computed posterior counts and first- and second-order statistics differ, so the specific audio models finally obtained after step 103-2 also differ.
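Formulas (6)-(11) can be sketched as below. This is an illustration under two stated assumptions: diagonal covariances, and a single data-dependent coefficient α_i = n_i / (n_i + r) with a hypothetical relevance factor r shared by weight, mean and variance, whereas the patent allows separate adjustment coefficients for each.

```python
import numpy as np

def map_adapt(X, w, mu, var, r=16.0):
    """MAP-adapt a diagonal-covariance UBM (w, mu, var) to class data X.

    X: (T, D) frames of one specific audio class.
    r: assumed relevance factor; alpha_i = n_i / (n_i + r) stands in for
    the per-parameter adjustment coefficients of formulas (9)-(11).
    """
    T, D = X.shape
    # posteriors Pr(i|x_t) of each frame on the UBM components
    log_det = np.sum(np.log(var), axis=1)
    diff2 = (((X[:, None, :] - mu[None]) ** 2) / var[None]).sum(-1)
    log_w = np.log(w)[None] - 0.5 * (D * np.log(2 * np.pi) + log_det[None] + diff2)
    post = np.exp(log_w - log_w.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)
    n = post.sum(axis=0)                                   # formula (6)
    Ex = post.T @ X / n[:, None]                           # formula (7)
    Ex2 = post.T @ (X ** 2) / n[:, None]                   # formula (8)
    a = (n / (n + r))[:, None]
    w_hat = a[:, 0] * n / T + (1 - a[:, 0]) * w            # formula (9)
    w_hat /= w_hat.sum()                                   # gamma normalization
    mu_hat = a * Ex + (1 - a) * mu                         # formula (10)
    var_hat = a * Ex2 + (1 - a) * (var + mu ** 2) - mu_hat ** 2  # formula (11)
    return w_hat, mu_hat, var_hat
```
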
2. Test phase
With reference to Fig. 3, the test phase comprises the following steps:
Step 201: feature extraction is performed on the input test speech;
The features extracted in this step are of the same type as those extracted in step 101, e.g. Mel-frequency cepstral coefficients;
Step 202: the test speech features extracted in step 201 are input into the universal background model trained in step 102, and the score of the test speech on the universal background model is computed.
As explained above, the universal background model is in essence a Gaussian mixture model, and the score of the test speech on it is the sum of its posterior probabilities on each Gaussian. As a preferred implementation, to speed up scoring, in practice the posterior probabilities of all Gaussians are not computed; instead the N Gaussians with the largest posterior probabilities are selected, the sum of these N probabilities is computed, and the indices of these N Gaussians are recorded.
Step 203: the test speech features extracted in step 201 are input into the Gaussian mixture models of the respective specific audio classes obtained in step 103, and the score of the test speech on each specific audio mixture model is computed; if there are M specific audio models, M scores are finally obtained.
The concrete method of computing the score of the test speech on each specific audio mixture model is again to sum the test speech's posterior probabilities over each Gaussian of the specific audio model. As a preferred implementation, to improve computation speed, the N Gaussian indices of the universal background model recorded in step 202 are used: the sum of the posterior probabilities of the corresponding N Gaussians in the specific audio mixture model is computed, and this value serves as the score of the test speech on that model.
Step 204: the differences between the score of the test speech on the universal background model obtained in step 202 and its scores on the respective specific audio mixture models obtained in step 203 are computed, and each difference is compared with a threshold to decide which specific audio class the test audio belongs to; if several model scores fall within the threshold range, the maximum is taken, i.e. the model scores within the threshold range are compared and the specific audio class represented by the highest-scoring model is selected as the final decision for the test speech.
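The scoring of steps 202-204, including the top-N Gaussian speed-up, can be sketched as below. The exact score normalization and threshold convention are assumptions made for the sketch, since the patent leaves them unspecified; log-domain sums stand in for the posterior-probability sums, and each model is a (weights, means, variances) tuple with diagonal covariances.

```python
import numpy as np

def log_densities(X, w, mu, var):
    """Per-frame log of w_i * p_i(x_t) for a diagonal-covariance GMM, shape (T, M)."""
    D = X.shape[1]
    log_det = np.sum(np.log(var), axis=1)
    diff2 = (((X[:, None, :] - mu[None]) ** 2) / var[None]).sum(-1)
    return np.log(w)[None] - 0.5 * (D * np.log(2 * np.pi) + log_det[None] + diff2)

def detect(X, ubm, class_models, n_best=5, threshold=0.0):
    """Score test frames X against the UBM and each class GMM.

    Per frame, only the n_best highest-scoring UBM Gaussians are summed
    (the top-N speed-up of steps 202-203) and the same indices are reused
    in each class model. Returns the winning class index, or -1 when no
    score difference exceeds the threshold (step 204).
    """
    lw_ubm = log_densities(X, *ubm)
    top = np.argsort(lw_ubm, axis=1)[:, -n_best:]      # per-frame top-N indices
    rows = np.arange(X.shape[0])[:, None]
    ubm_score = np.logaddexp.reduce(lw_ubm[rows, top], axis=1).mean()
    diffs = []
    for cm in class_models:
        lw_c = log_densities(X, *cm)
        c_score = np.logaddexp.reduce(lw_c[rows, top], axis=1).mean()
        diffs.append(c_score - ubm_score)              # step 204 score difference
    diffs = np.asarray(diffs)
    best = int(np.argmax(diffs))
    return best if diffs[best] > threshold else -1
```
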
Finally, it should be noted that the above embodiments merely illustrate, and do not limit, the technical solution of the present invention. Although the invention has been described in detail with reference to the embodiments, those of ordinary skill in the art should understand that modifications to, or equivalent replacements of, the technical solution of the present invention that do not depart from its spirit and scope shall all be encompassed within the claims of the present invention.
Claims (8)
1. A short-time specific audio detection model generation method, comprising:
Step 101: performing feature extraction on training speech data, wherein the training speech data comprise non-specific audio data and specific audio data;
Step 102: training a universal background model with the features of the training speech data obtained in step 101, wherein the universal background model is a Gaussian mixture model whose expression is:
p(x|λ) = Σ_{i=1}^{M} w_i p_i(x)
where w_i denotes the weight of the i-th Gaussian, taking values in [0, 1] and satisfying the normalization condition Σ_{i=1}^{M} w_i = 1; x denotes the frame feature of a training speech segment; λ denotes the set of all parameters of the Gaussian mixture model; p_i(x) denotes the probability density function of the i-th single Gaussian, whose expression is:
p_i(x) = (2π)^{−D/2} |Σ_i|^{−1/2} exp(−(1/2)(x − μ_i)^T Σ_i^{−1} (x − μ_i))
where D denotes the dimension of the frame feature of a training speech segment, Σ_i denotes the covariance matrix of this Gaussian, and μ_i denotes its mean vector;
Step 103: adaptively deriving the model of a given class of specific audio data from the universal background model obtained in step 102, using the features of that class of specific audio data in the training speech data; repeating this operation until the models of all classes of specific audio data in the training speech data are obtained.
2. The short-time specific audio detection model generation method according to claim 1, characterized in that, in step 101, the feature extracted from the training speech data is the Mel-frequency cepstral coefficients.
3. The short-time specific audio detection model generation method according to claim 1, characterized in that, in step 102, training the universal background model comprises estimating its parameters with the expectation-maximization method, the estimated parameters falling into three classes: the Gaussian weights w, the Gaussian variances δ and the Gaussian means μ, where w is the set of the per-Gaussian weights w_k, δ the set of the per-Gaussian variances δ_k, μ the set of the per-Gaussian means μ_k, and k indexes the single Gaussian components; the training specifically comprises:
Step 102-1: updating the k-th Gaussian weight w_k according to:
w_k = (1/T) Σ_{t=1}^{T} p(k|x_t, λ)
where x_t denotes the t-th frame feature vector of the input training speech x, a known vector computed during feature extraction; λ is the collective name of all parameters of the Gaussian mixture model, which are given initial values when training starts and are therefore known; T denotes the total number of frames of all input training speech, a known quantity; k denotes the index of the k-th single Gaussian in the mixture; p(k|x_t, λ) denotes the posterior probability of the input training frame x_t on the k-th Gaussian of the universal background model, computed from the input frame x_t and the mixture parameters λ;
Step 102-2: updating the k-th Gaussian mean μ_k according to:
μ_k = Σ_{t=1}^{T} p(k|x_t, λ) x_t / Σ_{t=1}^{T} p(k|x_t, λ)
where T, x_t and λ are known variables, and p(k|x_t, λ) is computed from the input frame x_t and the mixture parameters λ;
Step 102-3: updating the k-th Gaussian variance δ_k² according to:
δ_k² = Σ_{t=1}^{T} p(k|x_t, λ) x_t² / Σ_{t=1}^{T} p(k|x_t, λ) − μ_k²
where T, x_t, λ and μ_k are all known variables, and p(k|x_t, λ) is computed from the input frame x_t and the mixture parameters λ.
4. The short-time specific audio detection model generation method according to claim 1, characterized in that, in step 103, adaptively deriving the model of a class of specific audio data from the universal background model obtained in step 102 comprises:
Step 103-1: first computing, from the feature vectors of the specific audio training data, the posterior count n_i, the first-order statistic E_i(x) and the second-order statistic E_i(x²) of each speech frame on the universal background model:
n_i = Σ_{t=1}^{T} Pr(i|x_t)
E_i(x) = (1/n_i) Σ_{t=1}^{T} Pr(i|x_t) x_t
E_i(x²) = (1/n_i) Σ_{t=1}^{T} Pr(i|x_t) x_t²
where Pr(i|x_t) denotes the posterior probability of the t-th frame of the input audio x on the i-th Gaussian of the universal background model; x_t denotes the feature of the t-th frame of the input audio x; T denotes the total number of frames of the input audio; i denotes the index of the i-th single Gaussian in the universal background model;
Step 103-2: using the posterior counts, first-order statistics and second-order statistics computed in step 103-1 to adaptively adjust the parameters of the universal background model, yielding the weights ŵ_i, means μ̂_i and covariances δ̂_i² of the specific audio model; the adaptation formulas are:
ŵ_i = [α_i^w n_i / T + (1 − α_i^w) w_i] γ
μ̂_i = α_i^m E_i(x) + (1 − α_i^m) μ_i
δ̂_i² = α_i^v E_i(x²) + (1 − α_i^v)(δ_i² + μ_i²) − μ̂_i²
where α_i^v, α_i^m and α_i^w are the adjustment coefficients for the variance, mean and weight respectively; T denotes the total number of frames of this class of specific audio training data; γ is a normalization parameter ensuring Σ_i ŵ_i = 1; w_i denotes the weight of the i-th Gaussian of the universal background model; μ_i denotes the mean of the i-th Gaussian of the universal background model; δ_i² denotes the covariance of the i-th Gaussian of the universal background model; and μ̂_i denotes the mean of the i-th Gaussian of the specific audio model obtained by adaptation.
5. A short-time specific audio detection method, comprising:
Step 201: performing feature extraction on the input test speech;
Step 202: inputting the test speech features extracted in step 201 into the universal background model obtained by the short-time specific audio detection model generation method of any one of claims 1-4, and computing the score of the test speech on the universal background model;
Step 203: inputting the test speech features extracted in step 201 into the Gaussian mixture models of the various classes of specific audio obtained by the short-time specific audio detection model generation method of any one of claims 1-4, and computing the score of the test speech on the Gaussian mixture model of each class of specific audio;
Step 204: subtracting the score of the test speech on the universal background model obtained in step 202 from its scores on the Gaussian mixture models of the various classes of specific audio obtained in step 203, and comparing each difference with a threshold to decide which class of specific audio the test audio belongs to; if several model scores fall within the threshold range, the maximum is taken, i.e. the specific audio class represented by the highest-scoring model is selected as the final decision for the test speech.
6. the detection method of special audio in short-term according to claim 5, it is characterized in that, in step 202., calculate the score of tested speech on universal background model to comprise: choose N number of Gauss that posterior probability in universal background model is maximum, and calculate this N number of probability sum, with this N number of gaussian sequence number of tense marker.
7. the detection method of special audio in short-term according to claim 6, it is characterized in that, in step 203, calculate the score of tested speech on the mixed Gauss model of each class special audio to comprise: N number of gaussian sequence of the universal background model recorded by step 202, calculate the posterior probability sum of this N number of Gauss in the mixed Gauss model of special audio accordingly, using this value as the score of tested speech on the mixed Gauss model of all kinds of special audio.
8. The short-time specific audio detection method according to claim 5, characterized in that, in step 201, the features extracted from the test speech are Mel-frequency cepstral coefficients.
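For claim 8, a minimal from-scratch MFCC extraction sketch (framing, Hamming window, power spectrum, triangular mel filterbank, log, DCT-II). The parameter values (8 kHz sample rate, 256-sample frames, 26 filters, 13 coefficients) are illustrative assumptions, not taken from the patent, and the signal is assumed to be at least one frame long:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=8000, frame_len=256, hop=128, n_filters=26, n_ceps=13):
    """Claim 8 sketch: Mel-frequency cepstral coefficients per frame."""
    # Frame the signal and apply a Hamming window
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len]
                       for i in range(n_frames)])
    frames = frames * np.hamming(frame_len)
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, frame_len)) ** 2 / frame_len
    # Triangular mel filterbank, equally spaced on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((frame_len + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, frame_len // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):
            fbank[i, k] = (k - l) / max(c - l, 1)   # rising edge
        for k in range(c, r):
            fbank[i, k] = (r - k) / max(r - c, 1)   # falling edge
    # Log filterbank energies, then DCT-II to decorrelate
    feat = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps),
                                  (2 * n + 1) / (2 * n_filters)))
    return feat @ dct.T  # shape: (n_frames, n_ceps)
```

In practice a library extractor would normally be used instead; the sketch only makes explicit what "extracting MFCC features" in step 201 entails.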
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510236568.8A CN104992708B (en) | 2015-05-11 | 2015-05-11 | Specific audio detection model generation in short-term and detection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104992708A true CN104992708A (en) | 2015-10-21 |
CN104992708B CN104992708B (en) | 2018-07-24 |
Family
ID=54304511
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510236568.8A Expired - Fee Related CN104992708B (en) | 2015-05-11 | 2015-05-11 | Specific audio detection model generation in short-term and detection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104992708B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102509546A (en) * | 2011-11-11 | 2012-06-20 | 北京声迅电子股份有限公司 | Noise reduction and abnormal sound detection method applied to rail transit |
CN102623009A (en) * | 2012-03-02 | 2012-08-01 | 安徽科大讯飞信息技术股份有限公司 | Abnormal emotion automatic detection and extraction method and system on basis of short-time analysis |
CN103198605A (en) * | 2013-03-11 | 2013-07-10 | 成都百威讯科技有限责任公司 | Indoor emergent abnormal event alarm system |
CN103226951A (en) * | 2013-04-19 | 2013-07-31 | 清华大学 | Speaker verification system creation method based on model sequence adaptive technique |
CN103366738A (en) * | 2012-04-01 | 2013-10-23 | 佳能株式会社 | Methods and devices for generating sound classifier and detecting abnormal sound, and monitoring system |
- 2015-05-11: CN application CN201510236568.8A, granted as patent CN104992708B; status: not active (Expired - Fee Related)
Non-Patent Citations (1)
Title |
---|
LUO Senlin; WANG Kun; XIE Erman; PAN Limin; LI Jinyu: "High-accuracy recognition method for specific audio events fusing GMM and SVM", Journal of Beijing Institute of Technology * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106251861A (en) * | 2016-08-05 | 2016-12-21 | 重庆大学 | A kind of abnormal sound in public places detection method based on scene modeling |
CN106251861B (en) * | 2016-08-05 | 2019-04-23 | 重庆大学 | A kind of abnormal sound in public places detection method based on scene modeling |
CN107517207A (en) * | 2017-03-13 | 2017-12-26 | 平安科技(深圳)有限公司 | Server, auth method and computer-readable recording medium |
WO2018166187A1 (en) * | 2017-03-13 | 2018-09-20 | 平安科技(深圳)有限公司 | Server, identity verification method and system, and a computer-readable storage medium |
CN108305616A (en) * | 2018-01-16 | 2018-07-20 | 国家计算机网络与信息安全管理中心 | A kind of audio scene recognition method and device based on long feature extraction in short-term |
CN110135492A (en) * | 2019-05-13 | 2019-08-16 | 山东大学 | Equipment fault diagnosis and method for detecting abnormality and system based on more Gauss models |
CN113888777A (en) * | 2021-09-08 | 2022-01-04 | 南京金盾公共安全技术研究院有限公司 | Voiceprint unlocking method and device based on cloud machine learning |
CN113888777B (en) * | 2021-09-08 | 2023-08-18 | 南京金盾公共安全技术研究院有限公司 | Voiceprint unlocking method and device based on cloud machine learning |
Also Published As
Publication number | Publication date |
---|---|
CN104992708B (en) | 2018-07-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104992708A (en) | Short-time specific audio detection model generating method and short-time specific audio detection method | |
Xu et al. | Deep sparse rectifier neural networks for speech denoising | |
Chai et al. | A cross-entropy-guided measure (CEGM) for assessing speech recognition performance and optimizing DNN-based speech enhancement | |
De Leon et al. | Detection of synthetic speech for the problem of imposture | |
CN110308485B (en) | Microseismic signal classification method and device based on deep learning and storage medium | |
CN110349597B (en) | Voice detection method and device | |
CN104900235A (en) | Voiceprint recognition method based on pitch period mixed characteristic parameters | |
CN104835498A (en) | Voiceprint identification method based on multi-type combination characteristic parameters | |
Rao et al. | Target speaker extraction for overlapped multi-talker speaker verification | |
CN103456302B (en) | A kind of emotional speaker recognition method based on the synthesis of emotion GMM Model Weight | |
CN104485108A (en) | Noise and speaker combined compensation method based on multi-speaker model | |
Ghalehjegh et al. | Deep bottleneck features for i-vector based text-independent speaker verification | |
CN108320732A (en) | The method and apparatus for generating target speaker's speech recognition computation model | |
Chazan et al. | A phoneme-based pre-training approach for deep neural network with application to speech enhancement | |
Allen et al. | Language identification using warping and the shifted delta cepstrum | |
CN106297769A (en) | A kind of distinctive feature extracting method being applied to languages identification | |
CN106251861A (en) | A kind of abnormal sound in public places detection method based on scene modeling | |
Yamamoto et al. | Denoising autoencoder-based speaker feature restoration for utterances of short duration. | |
Pohjalainen et al. | Automatic detection of anger in telephone speech with robust autoregressive modulation filtering | |
Hong et al. | Modified-prior PLDA and score calibration for duration mismatch compensation in speaker recognition system. | |
Dong et al. | Long-term SNR estimation using noise residuals and a two-stage deep-learning framework | |
Wang et al. | F0 estimation in noisy speech based on long-term harmonic feature analysis combined with neural network classification | |
Soni et al. | Effectiveness of ideal ratio mask for non-intrusive quality assessment of noise suppressed speech | |
Garg et al. | Deep convolutional neural network-based speech signal enhancement using extensive speech features | |
Mansour et al. | A comparative study in emotional speaker recognition in noisy environment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20180724 |