CN104992708B - Specific audio detection model generation in short-term and detection method - Google Patents
Specific audio detection model generation in short-term and detection method
- Publication number: CN104992708B (application CN201510236568.8A)
- Authority
- CN
- China
- Prior art keywords
- model
- gauss
- specific audio
- training
- universal background
- Prior art date
- Legal status: Expired - Fee Related
Abstract
The present invention relates to a short-term specific audio detection model generation method, including: performing feature extraction on training speech data, wherein the training speech data include non-specific audio data and specific audio data; training a universal background model with the features of the training speech data; for a certain class of specific audio data in the training speech data, adaptively deriving the model of that class from the universal background model; and repeating this operation until the models of all classes of specific audio data in the training speech data are obtained. The present invention also provides a short-term specific audio detection method that detects specific audio by scoring against the models. The method not only solves the problem of insufficient training data for specific audio models, but also suppresses background noise in the input data to a certain degree.
Description
Technical field
The present invention relates to methods for short-term specific audio detection, and more particularly to short-term specific audio detection using Gaussian mixture models.
Background technology
Short-term specific audio plays an important role in many fields, especially security. In certain situations we need to detect a particular class of short-term specific audio so that urgent events can be handled in time. For example, in public places we need to monitor public safety and detect accidents, such as a sudden scream, a sudden explosion, or a gunshot; these short-term specific audio events must be detected promptly so that the accidents can be dealt with in time. In addition, in some important places, short-term specific audio detection can also be used for abnormal sound detection and thus serve as an early warning.
Current short-term specific audio detection methods face several problems. First, because short-term specific audio occurs quickly and the events are very brief, making good use of the information in the short audio segment is critical. Second, short-term specific audio occurs infrequently, so the problem of insufficient training data must be faced. Third, usage scenarios often contain complex background noise, so suppressing background noise well is also an important problem for short-term specific audio detection.
Invention content
The object of the present invention is to overcome the defects of existing short-term specific audio detection methods, namely insufficient training data and the inability to suppress background noise, by providing a short-term specific audio model generation and detection method based on Gaussian mixture models.
The present invention provides a short-term specific audio detection model generation method, including:
Step 101: performing feature extraction on training speech data, wherein the training speech data include non-specific audio data and specific audio data;
Step 102: training a universal background model with the features of the training speech data obtained in step 101, wherein the universal background model is a Gaussian mixture model with the expression:
p(x|λ) = Σ_{i=1}^{M} w_i p_i(x)
where w_i is the weight of the i-th Gaussian, each weight lies in 0~1, and the weights satisfy the normalization condition Σ_{i=1}^{M} w_i = 1; x denotes a frame feature of a training speech segment; λ denotes the set of all parameters of the Gaussian mixture model; M is the number of Gaussians; and p_i(x) is the probability density function of the i-th single Gaussian model:
p_i(x) = (2π)^{-D/2} |Σ_i|^{-1/2} exp(-(1/2)(x − μ_i)^T Σ_i^{-1} (x − μ_i))
where D is the dimension of the frame feature of the training speech segment, Σ_i is the covariance matrix of the Gaussian function, and μ_i is its mean vector;
Step 103: using the features of a certain class of specific audio data in the training speech data, adaptively deriving the model of that class from the universal background model obtained in step 102; this operation is repeated until the models of all classes of specific audio data in the training speech data are obtained.
In the above technical solution, in step 101, the features extracted from the training speech data are mel-frequency cepstral coefficients.
In the above technical solution, in step 102, training the universal background model includes performing parameter estimation for the universal background model with the expectation-maximization (EM) method. The parameters to be estimated are of three kinds: the Gaussian weights w, the Gaussian variances δ and the Gaussian means μ, where w is the set of the individual Gaussian weights w_i, δ is the set of the individual Gaussian variances δ_i, μ is the set of the individual Gaussian means μ_i, and i is the index of each single Gaussian model. Specifically:
Step 102-1: updating the k-th Gaussian weight w_k:
w_k = (1/T) Σ_{t=1}^{T} p(k|x_t, λ)
where x_t is the t-th frame feature vector of the input training speech x, a known vector computed during feature extraction; λ is the collective name for all parameters of the Gaussian mixture model, which are given initial values during initialization at the start of training and are therefore known; T is the total number of frames of all input training speech, a known computable value; k is the index of the k-th single Gaussian model in the mixture; and p(k|x_t, λ) is the posterior probability of the input training speech frame x_t on the k-th Gaussian of the universal background model, computed from the input frame x_t and the mixture model parameters λ;
Step 102-2: updating the k-th Gaussian mean μ_k:
μ_k = Σ_{t=1}^{T} p(k|x_t, λ) x_t / Σ_{t=1}^{T} p(k|x_t, λ)
where T, x_t and λ are known variables, and p(k|x_t, λ) is computed from the input frame x_t and the mixture model parameters λ;
Step 102-3: updating the k-th Gaussian variance δ_k²:
δ_k² = Σ_{t=1}^{T} p(k|x_t, λ)(x_t − μ_k)² / Σ_{t=1}^{T} p(k|x_t, λ)
where T, x_t, λ and μ_k are known variables, and p(k|x_t, λ) is computed from the input frame x_t and the mixture model parameters λ.
In the above technical solution, in step 103, adaptively deriving the model of one class of specific audio data from the universal background model obtained in step 102 includes:
Step 103-1: first computing, from the feature vectors of the specific audio training data, the posterior probability n_i, first-order statistic E_i(x) and second-order statistic E_i(x²) of each speech frame on the universal background model, as follows:
n_i = Σ_{t=1}^{T} Pr(i|x_t)
E_i(x) = (1/n_i) Σ_{t=1}^{T} Pr(i|x_t) x_t
E_i(x²) = (1/n_i) Σ_{t=1}^{T} Pr(i|x_t) x_t²
where Pr(i|x_t) is the posterior probability of the t-th frame of the input audio x on the i-th Gaussian of the universal background model; x_t is the feature of the t-th frame of the input audio x; T is the total number of frames of the input audio; and i is the index of the i-th single Gaussian in the universal background model;
Step 103-2: using the posterior probabilities, first-order statistics and second-order statistics computed in step 103-1 to adaptively adjust the parameters of the universal background model, obtaining the weights ŵ_i, means μ̂_i and covariances σ̂²_i of the specific audio model. The adaptation formulas are:
ŵ_i = [α_i^w n_i / T + (1 − α_i^w) w_i] γ
μ̂_i = α_i^m E_i(x) + (1 − α_i^m) μ_i
σ̂²_i = α_i^v E_i(x²) + (1 − α_i^v)(σ²_i + μ_i²) − μ̂_i²
where α_i^v, α_i^m and α_i^w are the variance, mean and weight adjustment coefficients respectively; T is the total number of frames of the training data of this class of specific audio; γ is a normalization parameter ensuring Σ_i ŵ_i = 1; w_i is the weight of the i-th Gaussian model in the universal background model; μ_i is the mean of the i-th Gaussian model in the universal background model; σ²_i is the covariance of the i-th Gaussian in the universal background model; and μ̂_i is the mean of the i-th Gaussian of the adaptively obtained specific audio model.
The present invention further provides a short-term specific audio detection method, including:
Step 201: performing feature extraction on the input test speech;
Step 202: inputting the test speech features extracted in step 201 into the universal background model obtained by the short-term specific audio detection model generation method, and computing the score of the test speech on the universal background model;
Step 203: inputting the test speech features extracted in step 201 into the Gaussian mixture models of the various classes of specific audio obtained by the short-term specific audio detection model generation method, and computing the score of the test speech on the Gaussian mixture model of each class of specific audio;
Step 204: taking the difference between the score of the test speech on the universal background model obtained in step 202 and the score of the test speech on the Gaussian mixture model of each class of specific audio obtained in step 203, and comparing each difference with a threshold to decide which class of specific audio the test audio belongs to; if several model scores fall within the threshold range, the decision is made by taking the maximum, i.e. the specific audio characterized by the model with the highest score is selected as the final decision result for the test speech.
In the above technical solution, in step 202, computing the score of the test speech on the universal background model includes: selecting the N Gaussians of the universal background model with the largest posterior probabilities, computing the sum of these N probabilities, and recording the sequence numbers of these N Gaussians.
In the above technical solution, in step 203, computing the score of the test speech on the Gaussian mixture model of each class of specific audio includes: using the N Gaussian sequence numbers of the universal background model recorded in step 202, computing the sum of the posterior probabilities of the corresponding N Gaussians in the mixture model of the specific audio, and taking this value as the score of the test speech on the Gaussian mixture model of each class of specific audio.
In the above technical solution, in step 201, the features extracted from the test speech are mel-frequency cepstral coefficients.
The advantage of the invention is that:
The method of the present invention not only overcomes the problem of insufficient training data for short-term specific audio models, but also suppresses background noise well to a certain extent.
Description of the drawings
Fig. 1 is a block diagram of the basic principle of universal background model training in the short-term specific audio detection model generation method;
Fig. 2 is a block diagram of the basic principle of specific audio model training in the short-term specific audio detection model generation method;
Fig. 3 is a flow chart of the short-term specific audio detection method.
Specific implementation mode
The specific implementation mode of the present invention is described in further detail in conjunction with Fig. 1 and Fig. 2.
The short-term specific audio detection method of the present invention includes two stages: in the first stage, a model is trained using training speech data; in the second stage, the test speech is detected using the trained model.
One, model training stage
Step 101: feature extraction is performed on the training speech data; the extracted features are mel cepstral coefficients (MFCC features), which include the energy value and the first- and second-order differences.
In one embodiment, the extracted mel cepstral coefficients use a frame length of 20 ms and a frame shift of 10 ms, include the energy value and the first- and second-order differences, and have a total feature dimension of 60.
The training speech data should include a large amount of non-specific audio data and a certain amount of specific audio data.
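As a sketch of the front end of step 101, the 20 ms frame / 10 ms shift MFCC extraction can be written with plain NumPy. The choice of 40 mel bands and 20 static coefficients (so that static plus first- and second-order differences give the stated 60 dimensions) is an assumption; the embodiment does not fix these numbers.

```python
import numpy as np

def mfcc(y, sr=16000, n_mfcc=20, n_mels=40):
    """Mel cepstral coefficients with 20 ms frames and a 10 ms shift."""
    n_fft, hop = int(0.020 * sr), int(0.010 * sr)
    n_frames = 1 + (len(y) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = y[idx] * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2           # power spectrum
    # triangular mel filterbank
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = imel(np.linspace(mel(0.0), mel(sr / 2.0), n_mels + 2))
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log(power @ fb.T + 1e-10)
    # DCT-II of the log mel energies gives the cepstral coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), 2 * n + 1) / (2 * n_mels))
    return logmel @ dct.T                                      # (frames, n_mfcc)

def delta(f, w=2):
    """Regression-based difference of a (frames, dims) feature matrix."""
    pad = np.pad(f, ((w, w), (0, 0)), mode="edge")
    num = sum(k * (pad[w + k:w + k + len(f)] - pad[w - k:w - k + len(f)])
              for k in range(1, w + 1))
    return num / (2 * sum(k * k for k in range(1, w + 1)))

y = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)         # 1 s test tone
static = mfcc(y)
feats = np.hstack([static, delta(static), delta(delta(static))])
print(feats.shape)   # (99, 60): 99 frames x 60 dims
```

A production system would more likely use a library such as librosa for this step; the hand-rolled version above only illustrates the frame/filterbank/DCT pipeline behind the 60-dimensional feature.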
Step 102: using the features of the training speech data obtained in step 101, i.e. the mel cepstral coefficients, the universal background model (UBM) is trained.
Referring to the training diagram of the universal background model given in Fig. 1, the universal background model is as in formula (1):
p(x|λ) = Σ_{i=1}^{M} w_i p_i(x)   (1)
In formula (1), w_i is the weight of the i-th Gaussian; each weight lies in 0~1 and the weights satisfy the normalization condition Σ_{i=1}^{M} w_i = 1; x denotes a frame feature of a training speech segment; λ denotes the set of all parameters of the Gaussian mixture model; and M is the number of Gaussians in the mixture.
p_i(x) in formula (1) is the probability density function of the i-th single Gaussian model, expressed as formula (2):
p_i(x) = (2π)^{-D/2} |Σ_i|^{-1/2} exp(-(1/2)(x − μ_i)^T Σ_i^{-1} (x − μ_i))   (2)
where p_i(x) is characterized by the following parameters: D is the dimension of the frame feature of the training speech segment, determined by the feature dimension in the feature extraction process; Σ_i is the covariance matrix of the Gaussian function; and μ_i is its mean vector.
The above is the concrete expression of the universal background model. A Gaussian mixture model fits the probability distribution function of the speech features of a general speaker, i.e. the distribution probability density function, as a linearly weighted sum of several single Gaussians. The universal Gaussian mixture model can therefore characterize the distribution of general speech and its pronunciation characteristics well.
On the basis of the above universal background model, training the universal background model with the features of the training speech data means performing parameter estimation with the expectation-maximization method.
After parameter estimation, the universal background model is obtained. The model is a Gaussian mixture model whose parameters comprise three kinds: the Gaussian weights w, the Gaussian variances δ and the Gaussian means μ, where w is the set of the individual Gaussian weights w_i, δ is the set of the individual Gaussian variances δ_i, μ is the set of the individual Gaussian means μ_i, and i is the index of each single Gaussian model. After training on the training data, these three parameters are uniquely determined.
The specific parameter estimation process is as follows:
Step 102-1: updating the k-th Gaussian weight w_k, as in formula (3):
w_k = (1/T) Σ_{t=1}^{T} p(k|x_t, λ)   (3)
where x_t is the t-th frame feature vector of the input training speech x, a known vector computed during feature extraction; λ, as in formula (1), is the collective name for all parameters of the Gaussian mixture model, which are given initial values in the initialization at the start of training and are therefore known; T is the total number of frames of all input training speech, a known computable value; k is the index of the k-th single Gaussian model in the mixture; and p(k|x_t, λ) is the posterior probability of the input training speech frame x_t on the k-th Gaussian of the universal background model, computed from the input frame x_t and the mixture model parameters λ.
Step 102-2: updating the k-th Gaussian mean μ_k, as in formula (4):
μ_k = Σ_{t=1}^{T} p(k|x_t, λ) x_t / Σ_{t=1}^{T} p(k|x_t, λ)   (4)
where each parameter has the same meaning as in formula (3): T, x_t and λ are known variables, and p(k|x_t, λ) is computed from the input frame x_t and the mixture model parameters λ.
Step 102-3: updating the k-th Gaussian variance δ_k², as in formula (5):
δ_k² = Σ_{t=1}^{T} p(k|x_t, λ)(x_t − μ_k)² / Σ_{t=1}^{T} p(k|x_t, λ)   (5)
where each parameter has the same meaning as in formulas (3) and (4): T, x_t, λ and μ_k are known variables, and p(k|x_t, λ) is computed from the input frame x_t and the mixture model parameters λ.
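The re-estimation of steps 102-1 to 102-3 can be sketched for a diagonal-covariance mixture as follows. The variance update uses the algebraically equivalent form E[x²] − μ², and the number of Gaussians, dimensions and initialization are illustrative assumptions, not values from the patent.

```python
import numpy as np

def posteriors(X, w, mu, var):
    """p(k | x_t, lambda) for a diagonal-covariance GMM.
    X: (T, D) frames; w: (M,) weights; mu, var: (M, D)."""
    lp = (np.log(w)
          - 0.5 * (np.log(2 * np.pi * var).sum(1)
                   + (((X[:, None, :] - mu) ** 2) / var).sum(2)))
    lp -= lp.max(1, keepdims=True)            # stabilise before exp
    p = np.exp(lp)
    return p / p.sum(1, keepdims=True)        # (T, M)

def em_step(X, w, mu, var):
    """One EM re-estimation pass implementing formulas (3)-(5)."""
    T = X.shape[0]
    post = posteriors(X, w, mu, var)
    nk = post.sum(0)                          # soft frame counts per Gaussian
    w_new = nk / T                            # (3) weight update
    mu_new = (post.T @ X) / nk[:, None]       # (4) mean update
    # (5) variance update, written as E[x^2] - mu^2 (same value as the
    # weighted sum of (x_t - mu_k)^2 because mu_new is the weighted mean)
    var_new = (post.T @ X**2) / nk[:, None] - mu_new**2
    return w_new, mu_new, var_new

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                 # toy training features
w, mu, var = np.full(3, 1 / 3), rng.normal(size=(3, 4)), np.ones((3, 4))
for _ in range(5):                            # a few EM iterations
    w, mu, var = em_step(X, w, mu, var)
print(round(w.sum(), 6))   # 1.0 (weights stay normalised)
```

In practice the iteration runs until the likelihood stops improving; the fixed five passes above are only for the demonstration.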
Step 103: to obtain the model of each class of specific audio, some speech of that class must first be obtained as model training speech. If specific audio data of the class are difficult to acquire, the specific audio data of that class already used when training the universal background model can be reused; if new specific audio data of the class can be obtained, the new audio data are used as training data. However much training data there is, one class of specific audio yields one corresponding specific audio model.
In this step, as shown in Fig. 2, a small amount of training data of one class of specific audio and the Bayesian adaptation algorithm are used to adaptively derive the model of that class from the universal background model. The adaptation process is as follows:
Step 103-1: first, from the feature vectors of the specific audio training data, the posterior probability, first-order statistic and second-order statistic of each speech frame on the universal background model are computed as in formulas (6), (7) and (8):
n_i = Σ_{t=1}^{T} Pr(i|x_t)   (6)
E_i(x) = (1/n_i) Σ_{t=1}^{T} Pr(i|x_t) x_t   (7)
E_i(x²) = (1/n_i) Σ_{t=1}^{T} Pr(i|x_t) x_t²   (8)
where Pr(i|x_t) is the posterior probability of the t-th frame of the input audio x on the i-th Gaussian of the universal background model; x_t is the feature of the t-th frame of the input audio x; T is the total number of frames of the input audio; and i is the index of the i-th single Gaussian in the universal background model.
Because the adaptation data differ for each specific audio model, the posterior probabilities and first- and second-order statistics computed for training each specific audio model also differ.
Step 103-2: using the posterior probabilities, first-order statistics and second-order statistics computed in step 103-1, the parameters of the universal background model are adaptively adjusted to obtain the weights ŵ_i, means μ̂_i and covariances σ̂²_i of the specific audio model. Because the specific audio model is itself essentially a Gaussian mixture model, once ŵ_i, μ̂_i and σ̂²_i are obtained, the Gaussian mixture model characterizing the specific audio is determined.
The adaptation formulas are (9), (10) and (11):
ŵ_i = [α_i^w n_i / T + (1 − α_i^w) w_i] γ   (9)
μ̂_i = α_i^m E_i(x) + (1 − α_i^m) μ_i   (10)
σ̂²_i = α_i^v E_i(x²) + (1 − α_i^v)(σ²_i + μ_i²) − μ̂_i²   (11)
where α_i^v, α_i^m and α_i^w are the variance, mean and weight adjustment coefficients respectively; n_i, E_i(x) and E_i(x²) are the posterior probability, first-order statistic and second-order statistic of the specific audio training data computed by formulas (6), (7) and (8). In formula (9), T is the total number of frames of the training data of this class of specific audio, γ is a normalization parameter ensuring Σ_i ŵ_i = 1, and w_i is the weight of the i-th Gaussian model in the universal background model. In formula (10), μ_i is the mean of the i-th Gaussian model in the universal background model. In formula (11), σ²_i is the covariance of the i-th Gaussian in the universal background model, μ_i is the mean of the i-th Gaussian in the universal background model, and μ̂_i is the mean of the i-th Gaussian of the adaptively obtained specific audio model.
After the above computation, the model of this class of specific audio is obtained.
From step 103-1 it can be seen that since the adaptation data of each specific audio model are different, the computed posterior probabilities and first- and second-order statistics are different, and so the specific audio models finally obtained through step 103-2 are also different.
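The adaptation of steps 103-1 and 103-2 can be sketched as follows. The concrete form of the adjustment coefficients, α_i = n_i/(n_i + r) with a relevance factor r, is an assumption borrowed from standard GMM-UBM practice; the patent only names them as variance, mean and weight adjustment coefficients.

```python
import numpy as np

def posteriors(X, w, mu, var):
    """Pr(i | x_t) on a diagonal-covariance UBM; X: (T, D)."""
    lp = (np.log(w)
          - 0.5 * (np.log(2 * np.pi * var).sum(1)
                   + (((X[:, None, :] - mu) ** 2) / var).sum(2)))
    lp -= lp.max(1, keepdims=True)
    p = np.exp(lp)
    return p / p.sum(1, keepdims=True)

def map_adapt(X, w, mu, var, r=16.0):
    """Bayesian (MAP) adaptation of the UBM to one specific-audio
    class, following formulas (6)-(11)."""
    T = X.shape[0]
    post = posteriors(X, w, mu, var)
    n = post.sum(0) + 1e-10                        # (6) occupancy n_i
    Ex = (post.T @ X) / n[:, None]                 # (7) first-order statistic
    Ex2 = (post.T @ X**2) / n[:, None]             # (8) second-order statistic
    a = (n / (n + r))[:, None]                     # assumed adjustment coeffs
    w_hat = a[:, 0] * n / T + (1 - a[:, 0]) * w    # (9)
    w_hat /= w_hat.sum()                           # gamma keeps sum(w_hat) = 1
    mu_hat = a * Ex + (1 - a) * mu                 # (10)
    var_hat = a * Ex2 + (1 - a) * (var + mu**2) - mu_hat**2  # (11)
    return w_hat, mu_hat, var_hat

rng = np.random.default_rng(1)
ubm_w = np.full(4, 0.25)
ubm_mu, ubm_var = rng.normal(size=(4, 6)), np.ones((4, 6))
X = rng.normal(loc=0.5, size=(200, 6))             # toy "specific audio" frames
w_hat, mu_hat, var_hat = map_adapt(X, ubm_w, ubm_mu, ubm_var)
print(round(w_hat.sum(), 6))   # 1.0
```

Running `map_adapt` once per class, each with its own frames `X`, yields the one-model-per-class setup of step 103.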
Two, test phase
With reference to Fig. 3, the test phase includes the following steps:
Step 201: feature extraction is performed on the input test speech.
The features extracted in this step are of the same type as those extracted in step 101, e.g. mel cepstral coefficients.
Step 202: the test speech features extracted in step 201 are input into the universal background model trained in step 102, and the score of the test speech on the universal background model is computed.
From the earlier explanation, the universal background model is essentially a Gaussian mixture model, and the score of the test speech on the universal background model is the sum of the posterior probabilities of the Gaussians. As a preferred implementation, to speed up score computation, the actual computation does not evaluate the posterior probabilities of all Gaussians; instead, the N Gaussians with the largest posterior probabilities are selected, the sum of these N probabilities is computed, and the sequence numbers of these N Gaussians are recorded.
Step 203: the test speech features extracted in step 201 are input into the Gaussian mixture models of the respective specific audios obtained in step 103, and the score of the test speech on the mixture model of each specific audio is computed; if there are M specific audio models, M scores are finally obtained.
The concrete method of computing the score of the test speech on the mixture model of each specific audio is again to compute the sum of the posterior probabilities of the test speech on the Gaussians of the specific audio model. As a preferred implementation, to increase computation speed, the N Gaussian sequence numbers of the universal background model recorded in step 202 are used to compute the sum of the posterior probabilities of the corresponding N Gaussians in the mixture model of the specific audio, and this value is taken as the score of the test speech on the mixture model of the respective specific audio.
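The preferred top-N scoring of steps 202 and 203 can be sketched as follows; summing in the log domain and the value N = 5 are implementation assumptions made here for numerical safety, not details fixed by the patent.

```python
import numpy as np

def weighted_logdens(X, w, mu, var):
    """Per-frame weighted log densities log(w_i p_i(x_t)) -> (T, M)."""
    return (np.log(w)
            - 0.5 * (np.log(2 * np.pi * var).sum(1)
                     + (((X[:, None, :] - mu) ** 2) / var).sum(2)))

def topn_scores(X, ubm, models, N=5):
    """Fast scoring: per frame, keep the N best UBM Gaussians, record
    their indices, and evaluate only those indices in each
    specific-audio GMM.  ubm and models are (w, mu, var) tuples."""
    lp = weighted_logdens(X, *ubm)                 # (T, M) on the UBM
    top = np.argsort(lp, axis=1)[:, -N:]           # N best indices per frame
    rows = np.arange(X.shape[0])[:, None]
    ubm_score = np.logaddexp.reduce(lp[rows, top], 1).mean()
    cls = [np.logaddexp.reduce(weighted_logdens(X, *m)[rows, top], 1).mean()
           for m in models]                        # same N indices per model
    return ubm_score, np.array(cls)

rng = np.random.default_rng(2)
ubm = (np.full(8, 0.125), rng.normal(size=(8, 6)), np.ones((8, 6)))
model = (ubm[0], ubm[1] + 0.3, ubm[2])             # one toy adapted class model
X = rng.normal(size=(50, 6))
s_ubm, s_cls = topn_scores(X, ubm, [model])
print(s_cls.shape)   # (1,): one score per specific-audio model
```

Because the adapted models share the UBM's Gaussian ordering, reusing the recorded top-N indices reduces the per-model cost from M Gaussians per frame to N.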
Step 204: the difference between the score of the test speech on the universal background model obtained in step 202 and the score of the test speech on the mixture model of each specific audio obtained in step 203 is computed, and each difference is compared with a threshold to decide which specific audio the test audio belongs to. If several model scores fall within the threshold range, the decision is made by taking the maximum, i.e. the model scores within the threshold range are compared and the specific audio characterized by the model with the highest score is taken as the final decision result for the test speech.
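The decision rule of step 204 can be sketched as follows. Returning -1 when no class passes the threshold, and the direction of the comparison (difference greater than the threshold counts as "within range"), are assumptions: the patent does not state the no-detection case explicitly.

```python
import numpy as np

def decide(ubm_score, class_scores, threshold):
    """Step 204: each class score minus the UBM score is compared with a
    threshold; if several classes pass, the largest difference wins.
    Returns the winning class index, or -1 when no class passes."""
    diff = np.asarray(class_scores) - ubm_score    # likelihood-ratio style
    passing = np.flatnonzero(diff > threshold)
    if passing.size == 0:
        return -1                                  # assumed no-detection case
    return int(passing[np.argmax(diff[passing])])  # maximum-score decision

print(decide(-10.0, [-9.5, -8.0, -9.9], threshold=0.3))   # 1
```

With the sample scores above, classes 0 and 1 both pass the threshold (differences 0.5 and 2.0), and the maximum rule selects class 1.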
Finally, it should be noted that the above embodiments are only intended to illustrate, not to limit, the technical solution of the present invention. Although the invention has been described in detail with reference to embodiments, those of ordinary skill in the art will understand that modifications or equivalent replacements of the technical solution of the present invention, without departing from its spirit and scope, shall all be covered by the scope of the claims of the present invention.
Claims (7)
1. A short-term specific audio detection model generation method, including:
step 101: performing feature extraction on training speech data, wherein the training speech data include non-specific audio data and specific audio data;
step 102: training a universal background model with the features of the training speech data obtained in step 101, wherein the universal background model is a Gaussian mixture model with the expression:
p(x|λ) = Σ_{i=1}^{M} w_i p_i(x)
where w_i is the weight of the i-th Gaussian, each weight lies in 0~1, and the weights satisfy the normalization condition Σ_{i=1}^{M} w_i = 1; x denotes a frame feature of a training speech segment; λ denotes the set of all parameters of the Gaussian mixture model; p_i(x) is the probability density function of the i-th single Gaussian model:
p_i(x) = (2π)^{-D/2} |Σ_i|^{-1/2} exp(-(1/2)(x − μ_i)^T Σ_i^{-1} (x − μ_i))
where D is the dimension of the frame feature of the training speech segment, Σ_i is the covariance matrix of the Gaussian function, and μ_i is its mean vector;
step 103: using the features of a certain class of specific audio data in the training speech data, adaptively deriving the model of that class of specific audio data from the universal background model obtained in step 102; repeating this operation until the models of all classes of specific audio data in the training speech data are obtained;
wherein in step 103, adaptively deriving the model of one class of specific audio data from the universal background model obtained in step 102 includes:
step 103-1: first computing, from the feature vectors of the specific audio training data, the posterior probability n_i, first-order statistic E_i(x) and second-order statistic E_i(x²) of each speech frame on the universal background model, as follows:
n_i = Σ_{t=1}^{T} Pr(i|x_t)
E_i(x) = (1/n_i) Σ_{t=1}^{T} Pr(i|x_t) x_t
E_i(x²) = (1/n_i) Σ_{t=1}^{T} Pr(i|x_t) x_t²
where Pr(i|x_t) is the posterior probability of the t-th frame of the input audio x on the i-th Gaussian of the universal background model; x_t is the feature of the t-th frame of the input audio x; T is the total number of frames of the input audio; and i is the index of the i-th single Gaussian in the universal background model;
step 103-2: using the posterior probabilities, first-order statistics and second-order statistics computed in step 103-1 to adaptively adjust the parameters of the universal background model, obtaining the weights ŵ_i, means μ̂_i and covariances σ̂²_i of the specific audio model, with the adaptation formulas:
ŵ_i = [α_i^w n_i / T + (1 − α_i^w) w_i] γ
μ̂_i = α_i^m E_i(x) + (1 − α_i^m) μ_i
σ̂²_i = α_i^v E_i(x²) + (1 − α_i^v)(σ²_i + μ_i²) − μ̂_i²
where α_i^v, α_i^m and α_i^w are the variance, mean and weight adjustment coefficients respectively; T is the total number of frames of the training data of this class of specific audio; γ is a normalization parameter ensuring Σ_i ŵ_i = 1; w_i is the weight of the i-th Gaussian model in the universal background model; σ²_i is the covariance of the i-th Gaussian in the universal background model; and μ̂_i is the mean of the i-th Gaussian of the adaptively obtained specific audio model.
2. The short-term specific audio detection model generation method according to claim 1, characterized in that in step 101, the features extracted from the training speech data are mel-frequency cepstral coefficients.
3. The short-term specific audio detection model generation method according to claim 1, characterized in that in step 102, training the universal background model includes performing parameter estimation for the universal background model with the expectation-maximization method, the parameters to be estimated being of three kinds: the Gaussian weights w, the Gaussian variances δ and the Gaussian means μ, where w is the set of the individual Gaussian weights w_i, δ is the set of the individual Gaussian variances δ_i, μ is the set of the individual Gaussian means μ_i, and i is the index of each single Gaussian model; specifically including:
step 102-1: updating the k-th Gaussian weight w_k:
w_k = (1/T) Σ_{t=1}^{T} p(k|x_t, λ)
where x_t is the t-th frame feature vector of the input training speech x, a known vector computed during feature extraction; λ is the collective name for all parameters of the Gaussian mixture model, which are given initial values in the initialization at the start of training and are therefore known; T is the total number of frames of all input training speech, a known computable value; k is the index of the k-th single Gaussian model in the mixture; and p(k|x_t, λ) is the posterior probability of the input training speech frame x_t on the k-th Gaussian of the universal background model, computed from the input frame x_t and the mixture model parameters λ;
step 102-2: updating the k-th Gaussian mean μ_k:
μ_k = Σ_{t=1}^{T} p(k|x_t, λ) x_t / Σ_{t=1}^{T} p(k|x_t, λ)
where T, x_t and λ are known variables, and p(k|x_t, λ) is computed from the input frame x_t and the mixture model parameters λ;
step 102-3: updating the k-th Gaussian variance δ_k²:
δ_k² = Σ_{t=1}^{T} p(k|x_t, λ)(x_t − μ_k)² / Σ_{t=1}^{T} p(k|x_t, λ)
where T, x_t, λ and μ_k are known variables, and p(k|x_t, λ) is computed from the input frame x_t and the mixture model parameters λ.
4. A short-term specific audio detection method, comprising:
Step 201, performing feature extraction on the input test speech;
Step 202, inputting the test-speech features extracted in step 201 into the universal background model obtained by the short-term specific audio detection model generation method of any one of claims 1-3, and calculating the score of the test speech on the universal background model;
Step 203, inputting the test-speech features extracted in step 201 into the Gaussian mixture models of the various classes of specific audio obtained by the short-term specific audio detection model generation method of any one of claims 1-3, and calculating the score of the test speech on the Gaussian mixture model of each class of specific audio;
Step 204, taking the difference between the score of the test speech on the universal background model obtained in step 202 and its score on the Gaussian mixture model of each class of specific audio obtained in step 203, and comparing each difference with a threshold, so as to judge which class of specific audio the test audio belongs to; if the scores of several models all fall within the threshold range, the maximum-value rule is applied, and the specific audio characterized by the model with the highest score is selected as the final judgment result for the test speech.
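The decision rule of step 204 can be sketched as follows. The function name `classify`, the dict-based inputs, and the threshold semantics (a class is a candidate when its score exceeds the universal-background-model score by more than the threshold) are assumptions for illustration; the patent does not fix the sign convention of the comparison:

```python
def classify(ubm_score, class_scores, threshold):
    """Step-204 decision sketch.

    ubm_score: score of the test speech on the universal background model.
    class_scores: dict mapping class name -> score on that class's GMM.
    Returns the detected class, or None if no difference passes the threshold.
    """
    # score differences against the universal background model
    diffs = {c: s - ubm_score for c, s in class_scores.items()}
    candidates = {c: d for c, d in diffs.items() if d > threshold}
    if not candidates:
        return None
    # several models pass the threshold: take the maximum, as the claim prescribes
    return max(candidates, key=candidates.get)
```

For example, with a UBM score of -10.0 and class scores {"scream": -7.0, "gunshot": -9.5} at threshold 1.0, only "scream" clears the threshold and is returned.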
5. The short-term specific audio detection method according to claim 4, characterized in that in step 202, calculating the score of the test speech on the universal background model comprises: selecting the N Gaussians of the universal background model with the highest posterior probabilities, computing the sum of these N probabilities, and recording the indices of these N Gaussians.
6. The short-term specific audio detection method according to claim 5, characterized in that in step 203, calculating the score of the test speech on the Gaussian mixture model of each class of specific audio comprises: using the N Gaussian indices of the universal background model recorded in step 202, computing the sum of the posterior probabilities of the corresponding N Gaussians in the Gaussian mixture model of the specific audio, and taking this value as the score of the test speech on the Gaussian mixture model of each class of specific audio.
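Claims 5 and 6 together describe a top-N Gaussian scoring shortcut: the N best Gaussians are picked once on the universal background model and the same indices are reused on each class model. A minimal per-frame sketch, with hypothetical inputs (posterior probability vectors of one frame over the K Gaussians of the UBM and of one class model):

```python
import numpy as np

def topn_scores(post_ubm, post_class, n=5):
    """Top-N scoring sketch for claims 5 and 6.

    post_ubm / post_class: (K,) posterior probabilities of one frame on the
    K Gaussians of the universal background model and of one specific-audio
    GMM respectively. `n` is assumed; the patent leaves N unspecified.
    """
    idx = np.argsort(post_ubm)[-n:]      # indices of the N highest-posterior UBM Gaussians
    ubm_score = post_ubm[idx].sum()      # claim 5: sum of these N probabilities
    class_score = post_class[idx].sum()  # claim 6: sum over the *same* N indices
    return ubm_score, class_score, idx
```

Because only N of the K Gaussians are evaluated on each class model, the per-frame cost of step 203 drops from O(K) to O(N) likelihood sums per class.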
7. The short-term specific audio detection method according to claim 4, characterized in that in step 201, the features extracted from the test speech are Mel-frequency cepstral coefficients.
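Claim 7 only names Mel-frequency cepstral coefficients as the features of step 201. The following is a self-contained numpy sketch of a conventional MFCC pipeline (pre-emphasis, windowed framing, power spectrum, mel filterbank, log, DCT-II); every parameter value (sample rate, FFT size, hop, filter and coefficient counts) is an assumption, not taken from the patent:

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Minimal MFCC extraction sketch. Returns (n_frames, n_ceps)."""
    # pre-emphasis, then frame with a Hamming window
    emph = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    n_frames = 1 + (len(emph) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emph[idx] * np.hamming(n_fft)
    # power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # triangular mel filterbank between 0 Hz and sr/2
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = imel(np.linspace(mel(0.0), mel(sr / 2.0), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log(power @ fbank.T + 1e-10)
    # DCT-II to decorrelate; keep the first n_ceps cepstral coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2.0 * n_mels)))
    return logmel @ dct.T
```

One second of 16 kHz audio yields 97 frames of 13 coefficients with these settings; the resulting feature matrix is what steps 101/201 would feed to the models.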
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510236568.8A CN104992708B (en) | 2015-05-11 | 2015-05-11 | Specific audio detection model generation in short-term and detection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104992708A CN104992708A (en) | 2015-10-21 |
CN104992708B true CN104992708B (en) | 2018-07-24 |
Family
ID=54304511
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510236568.8A Expired - Fee Related CN104992708B (en) | 2015-05-11 | 2015-05-11 | Specific audio detection model generation in short-term and detection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104992708B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106251861B (en) * | 2016-08-05 | 2019-04-23 | 重庆大学 | A kind of abnormal sound in public places detection method based on scene modeling |
CN107068154A (en) * | 2017-03-13 | 2017-08-18 | 平安科技(深圳)有限公司 | The method and system of authentication based on Application on Voiceprint Recognition |
CN108305616B (en) * | 2018-01-16 | 2021-03-16 | 国家计算机网络与信息安全管理中心 | Audio scene recognition method and device based on long-time and short-time feature extraction |
CN110135492B (en) * | 2019-05-13 | 2020-12-22 | 山东大学 | Equipment fault diagnosis and abnormality detection method and system based on multiple Gaussian models |
CN113888777B (en) * | 2021-09-08 | 2023-08-18 | 南京金盾公共安全技术研究院有限公司 | Voiceprint unlocking method and device based on cloud machine learning |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102509546A (en) * | 2011-11-11 | 2012-06-20 | 北京声迅电子股份有限公司 | Noise reduction and abnormal sound detection method applied to rail transit |
CN102623009A (en) * | 2012-03-02 | 2012-08-01 | 安徽科大讯飞信息技术股份有限公司 | Abnormal emotion automatic detection and extraction method and system on basis of short-time analysis |
CN103198605A (en) * | 2013-03-11 | 2013-07-10 | 成都百威讯科技有限责任公司 | Indoor emergent abnormal event alarm system |
CN103226951A (en) * | 2013-04-19 | 2013-07-31 | 清华大学 | Speaker verification system creation method based on model sequence adaptive technique |
CN103366738A (en) * | 2012-04-01 | 2013-10-23 | 佳能株式会社 | Methods and devices for generating sound classifier and detecting abnormal sound, and monitoring system |
Non-Patent Citations (1)
Title |
---|
High-precision recognition method for specific audio events fusing GMM and SVM; Luo Senlin; Wang Kun; Xie Erman; Pan Limin; Li Jinyu; Transactions of Beijing Institute of Technology; 20140731; Sections 1-2 *
Also Published As
Publication number | Publication date |
---|---|
CN104992708A (en) | 2015-10-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104992708B (en) | Specific audio detection model generation in short-term and detection method | |
CN105632501B (en) | A kind of automatic accent classification method and device based on depth learning technology | |
CN105938716B (en) | A kind of sample copying voice automatic testing method based on the fitting of more precision | |
Xu et al. | Dynamic noise aware training for speech enhancement based on deep neural networks. | |
CN104732978B (en) | The relevant method for distinguishing speek person of text based on combined depth study | |
CN105023573B (en) | It is detected using speech syllable/vowel/phone boundary of auditory attention clue | |
CN108122552A (en) | Voice mood recognition methods and device | |
CN106683666B (en) | A kind of domain-adaptive method based on deep neural network | |
CN105654944B (en) | It is a kind of merged in short-term with it is long when feature modeling ambient sound recognition methods and device | |
CN106611604A (en) | An automatic voice summation tone detection method based on a deep neural network | |
CN103810996A (en) | Processing method, device and system for voice to be tested | |
CN106023986B (en) | A kind of audio recognition method based on sound effect mode detection | |
Poorjam et al. | Multitask speaker profiling for estimating age, height, weight and smoking habits from spontaneous telephone speech signals | |
CN109408660A (en) | A method of the music based on audio frequency characteristics is classified automatically | |
CN110085216A (en) | A kind of vagitus detection method and device | |
Tsenov et al. | Speech recognition using neural networks | |
CN103578480B (en) | The speech-emotion recognition method based on context correction during negative emotions detects | |
CN110738986B (en) | Long voice labeling device and method | |
Allen et al. | Language identification using warping and the shifted delta cepstrum | |
CN111133508A (en) | Method and device for selecting comparison phonemes | |
Rabiee et al. | Persian accents identification using an adaptive neural network | |
CN106251861A (en) | A kind of abnormal sound in public places detection method based on scene modeling | |
Wiśniewski et al. | Automatic detection of prolonged fricative phonemes with the hidden Markov models approach | |
Galgali et al. | Speaker profiling by extracting paralinguistic parameters using mel frequency cepstral coefficients | |
Khanum et al. | Speech based gender identification using feed forward neural networks |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20180724 |
|