CN104992708B - Specific audio detection model generation in short-term and detection method - Google Patents


Info

Publication number
CN104992708B
CN104992708B CN201510236568.8A
Authority
CN
China
Prior art keywords
model
gauss
specific audio
training
universal background
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201510236568.8A
Other languages
Chinese (zh)
Other versions
CN104992708A (en)
Inventor
云晓春
颜永红
袁庆升
黄宇飞
任彦
周若华
黄文廷
邹学强
包秀国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Acoustics CAS
National Computer Network and Information Security Management Center
Original Assignee
Institute of Acoustics CAS
National Computer Network and Information Security Management Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Acoustics CAS, National Computer Network and Information Security Management Center filed Critical Institute of Acoustics CAS
Priority to CN201510236568.8A priority Critical patent/CN104992708B/en
Publication of CN104992708A publication Critical patent/CN104992708A/en
Application granted granted Critical
Publication of CN104992708B publication Critical patent/CN104992708B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Electrically Operated Instructional Devices (AREA)
  • Complex Calculations (AREA)

Abstract

The present invention relates to a short-term specific audio detection model generation method, comprising: performing feature extraction on training speech data, where the training speech data includes non-specific audio data and specific audio data; training a universal background model with the features of the training speech data; adaptively deriving, from the universal background model, the model of one class of specific audio data using the features of that class in the training speech data; and repeating this operation until models for all classes of specific audio data in the training speech data are obtained. The present invention also provides a short-term specific audio detection method, which detects specific audio by scoring against the models. The method not only solves well the problem of insufficient training data for specific audio models, but also suppresses, to a certain degree, background noise in the input data.

Description

Specific audio detection model generation in short-term and detection method
Technical field
The present invention relates to methods for short-term specific audio detection, and more particularly to short-term specific audio detection using Gaussian mixture models.
Background technology
Short-term specific audio plays an important role in many fields, especially in security. In certain situations we need to detect a particular class of short-term specific audio so that urgent events can be handled in time. For example, in public places we need to monitor public safety and detect the occurrence of accidents, such as a sudden scream, a sudden explosion, or a gunshot; these short-term specific audio events must be detected promptly so that such emergencies can be dealt with in time. In addition, in relatively important places, short-term specific audio detection can also be used for abnormal sound detection and can serve very well as an early warning.
Current short-term specific audio detection methods face several problems. First, because a short-term specific audio event occurs quickly and lasts only a very short time, making good use of the information in the short audio segment is critical. Second, short-term specific audio events do not occur frequently, so the insufficiency of training data must be confronted. Third, because usage scenarios often contain complex background noise, suppressing background noise well is also an important problem for short-term specific audio detection.
Invention content
The object of the present invention is to overcome the defects of existing short-term specific audio detection methods, namely insufficient training data and the inability to suppress background noise, and to provide a short-term specific audio model generation and detection method based on Gaussian mixture models.
The present invention provides a short-term specific audio detection model generation method, comprising:
Step 101: perform feature extraction on training speech data, where the training speech data includes non-specific audio data and specific audio data;
Step 102: train a universal background model with the features of the training speech data obtained in step 101, where the universal background model is a Gaussian mixture model whose expression is:

p(x|λ) = Σ_{i=1}^{M} w_i · p_i(x)

where w_i denotes the weight of each Gaussian, taking values in the range 0–1 and satisfying the normalization condition Σ_{i=1}^{M} w_i = 1; x denotes a frame feature of a training speech segment; λ denotes the set of all parameters of the Gaussian mixture model; M denotes the number of Gaussians; p_i(x) denotes the probability density function of each single Gaussian model, whose expression is:

p_i(x) = (1 / ((2π)^{D/2} |Σ_i|^{1/2})) exp(−(1/2)(x − μ_i)^T Σ_i^{−1} (x − μ_i))

where D denotes the dimension of the frame feature of the training speech segment; Σ_i denotes the covariance matrix of the Gaussian; μ_i denotes the mean vector of the Gaussian;
Step 103: using the features of one class of specific audio data in the training speech data, adaptively derive the model of that class of specific audio data from the universal background model obtained in step 102; repeat this operation until models for all classes of specific audio data in the training speech data are obtained.
In the above technical solution, in step 101, the features extracted from the training speech data are mel-frequency cepstral coefficients.
In the above technical solution, in step 102, training the universal background model comprises performing parameter estimation on the universal background model using the expectation-maximization (EM) method. The parameters to be estimated fall into three classes: the Gaussian weights w, the Gaussian variances δ, and the Gaussian means μ, where w is the set of the individual Gaussian weights w_i, δ is the set of the individual Gaussian variances δ_i, μ is the set of the individual Gaussian means μ_i, and i indexes the single Gaussian models. Specifically:
Step 102-1: update the k-th Gaussian weight w_k.
The update of the k-th Gaussian weight w_k is given by:

w_k = (1/T) Σ_{t=1}^{T} p(k | x_t, λ)

where x_t denotes the t-th frame feature vector of the input training speech x, a known vector computed during feature extraction; λ is the collective name for all parameters of the Gaussian mixture model, which are given initial values during initialization at the start of training and are therefore known; T denotes the total number of frames of all input training speech, a known quantity; k denotes the index of the k-th single Gaussian model in the mixture; p(k | x_t, λ) denotes the posterior probability of the input training speech frame x_t on the k-th Gaussian of the universal background model, computed from the input frame x_t and the mixture model parameters λ;
Step 102-2: update the k-th Gaussian mean μ_k.
The update of the k-th Gaussian mean μ_k is given by:

μ_k = Σ_{t=1}^{T} p(k | x_t, λ) x_t / Σ_{t=1}^{T} p(k | x_t, λ)

where T, x_t and λ are known variables, and p(k | x_t, λ) is computed from the input frame x_t and the mixture model parameters λ;
Step 102-3: update the k-th Gaussian variance σ_k².
The update of the k-th Gaussian variance σ_k² is given by:

σ_k² = Σ_{t=1}^{T} p(k | x_t, λ) (x_t − μ_k)² / Σ_{t=1}^{T} p(k | x_t, λ)

where T, x_t, λ and μ_k are known variables, and p(k | x_t, λ) is computed from the input frame x_t and the mixture model parameters λ.
In the above technical solution, in step 103, adaptively deriving the model of one class of specific audio data from the universal background model obtained in step 102 comprises:
Step 103-1: first compute, from the feature vectors of the training specific audio, the posterior probability n_i, the first-order statistic E_i(x) and the second-order statistic E_i(x²) of each speech frame on the universal background model:

n_i = Σ_{t=1}^{T} Pr(i | x_t)

E_i(x) = (1/n_i) Σ_{t=1}^{T} Pr(i | x_t) x_t

E_i(x²) = (1/n_i) Σ_{t=1}^{T} Pr(i | x_t) x_t²

where Pr(i | x_t) denotes the posterior probability of the t-th frame of the input audio x on the i-th Gaussian of the universal background model; x_t denotes the feature of the t-th frame of the input audio x; T denotes the total number of frames of the input audio; i denotes the index of the i-th single Gaussian in the universal background model;
Step 103-2: using the posterior probabilities, first-order statistics and second-order statistics computed in step 103-1, adaptively adjust the parameters of the universal background model to obtain the weights ŵ_i, means μ̂_i and covariances σ̂_i² of the specific audio model. The adaptation formulas are:

ŵ_i = γ [α_i^w n_i / T + (1 − α_i^w) w_i]

μ̂_i = α_i^m E_i(x) + (1 − α_i^m) μ_i

σ̂_i² = α_i^v E_i(x²) + (1 − α_i^v)(σ_i² + μ_i²) − μ̂_i²

where α_i^v, α_i^m and α_i^w are the variance, mean and weight regulation coefficients respectively; T denotes the total number of frames of the training data of this class of specific audio; γ denotes a normalization parameter ensuring Σ_i ŵ_i = 1; w_i denotes the weight of the i-th Gaussian model in the universal background model; μ_i denotes the mean of the i-th Gaussian in the universal background model; σ_i² denotes the covariance of the i-th Gaussian in the universal background model; μ̂_i denotes the mean of the i-th Gaussian of the adaptively obtained specific audio model.
The present invention further provides a short-term specific audio detection method, comprising:
Step 201: perform feature extraction on the input test speech;
Step 202: input the test speech features extracted in step 201 into the universal background model obtained by the short-term specific audio detection model generation method, and compute the score of the test speech on the universal background model;
Step 203: input the test speech features extracted in step 201 into the Gaussian mixture models of the various classes of specific audio obtained by the short-term specific audio detection model generation method, and compute the score of the test speech on the Gaussian mixture model of each class of specific audio;
Step 204: take the difference between each score of the test speech on the Gaussian mixture models of the various classes of specific audio obtained in step 203 and the score of the test speech on the universal background model obtained in step 202, and compare the differences with a threshold so as to decide which class of specific audio this test audio belongs to; if several model scores fall within the threshold range, decide by taking the maximum, selecting the specific audio characterized by the model with the highest score as the final decision for the test speech.
In the above technical solution, in step 202, computing the score of the test speech on the universal background model comprises: selecting the N Gaussians with the largest posterior probabilities in the universal background model, computing the sum of these N probabilities, and recording the indices of these N Gaussians.
In the above technical solution, in step 203, computing the score of the test speech on the Gaussian mixture model of each class of specific audio comprises: using the indices of the N Gaussians of the universal background model recorded in step 202, computing the sum of the posterior probabilities of the corresponding N Gaussians in the Gaussian mixture model of the specific audio, and taking this value as the score of the test speech on the Gaussian mixture model of each class of specific audio.
In the above technical solution, in step 201, the features extracted from the test speech are mel-frequency cepstral coefficients.
The advantage of the invention is that:
The method of the present invention not only overcomes well the problem of insufficient training data for short-term specific audio models, but can also suppress background noise well to a certain extent.
Description of the drawings
Fig. 1 is a block diagram of the basic principle of training the universal background model in the short-term specific audio detection model generation method;
Fig. 2 is a block diagram of the basic principle of training the specific audio models in the short-term specific audio detection model generation method;
Fig. 3 is a flow chart of the short-term specific audio detection method.
Specific implementation mode
The specific implementation of the present invention is described in further detail below with reference to Fig. 1 and Fig. 2.
The short-term specific audio detection method of the present invention includes two stages: in the first stage, models are trained using training speech data; in the second stage, test speech is detected using the trained models.
1. Model training stage
Step 101: perform feature extraction on the training speech data; the features extracted are mel-frequency cepstral coefficients (MFCC features), which include an energy value and first- and second-order differences;
In one embodiment, the extracted mel-frequency cepstral coefficients use a frame length of 20 ms and a frame shift of 10 ms, include the energy value and first- and second-order differences, and have 60 dimensions in total;
The training speech data should include a large amount of non-specific audio data and a certain amount of specific audio data.
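The embodiment specifies only the frame length, frame shift, and total dimensionality of the features. As an illustration only (not part of the claimed method), a minimal numpy sketch of such a 60-dimensional MFCC front end might look as follows; it assumes 20 static cepstral coefficients (with c0 carrying the energy term) plus first- and second-order differences, and that 20+20+20 split is our assumption:

```python
import numpy as np

def mfcc_features(signal, sr=16000, frame_ms=20, hop_ms=10,
                  n_fft=512, n_mels=26, n_ceps=20):
    """Sketch of MFCC extraction: 20 ms frames, 10 ms shift, 20 static
    coefficients plus first- and second-order differences -> 60 dims."""
    frame = sr * frame_ms // 1000                  # 320 samples
    hop = sr * hop_ms // 1000                      # 160 samples
    n_frames = 1 + (len(signal) - frame) // hop
    idx = np.arange(frame)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # triangular mel filterbank
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = imel(np.linspace(mel(0.0), mel(sr / 2.0), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        fb[m - 1, bins[m - 1]:bins[m]] = np.linspace(
            0.0, 1.0, bins[m] - bins[m - 1], endpoint=False)
        fb[m - 1, bins[m]:bins[m + 1]] = np.linspace(
            1.0, 0.0, bins[m + 1] - bins[m], endpoint=False)
    log_mel = np.log(power @ fb.T + 1e-10)

    # orthonormal DCT-II -> cepstral coefficients
    n = np.arange(n_mels)
    dct = np.sqrt(2.0 / n_mels) * np.cos(
        np.pi * np.outer(np.arange(n_ceps), n + 0.5) / n_mels)
    dct[0] /= np.sqrt(2.0)
    ceps = log_mel @ dct.T

    # first- and second-order differences along the time axis
    d1 = np.gradient(ceps, axis=0)
    d2 = np.gradient(d1, axis=0)
    return np.hstack([ceps, d1, d2])               # (n_frames, 3 * n_ceps)
```

With sr = 16000 the frame is 320 samples and the hop 160 samples, so one second of audio yields 99 frames of 60-dimensional features.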
Step 102: using the features of the training speech data obtained in step 101, i.e. the mel-frequency cepstral coefficients, train the universal background model (UBM);
Referring to the training schematic of the universal background model given in Fig. 1, the universal background model is as in formula (1):

p(x|λ) = Σ_{i=1}^{M} w_i · p_i(x)   (1)

In formula (1), w_i denotes the weight of each Gaussian, taking values in the range 0–1 and satisfying the normalization condition Σ_{i=1}^{M} w_i = 1; x denotes a frame feature of a training speech segment; λ denotes the set of all parameters of the Gaussian mixture model; M denotes the number of Gaussians in the mixture.
p_i(x) in formula (1) denotes the probability density function of each single Gaussian model, expressed as formula (2):

p_i(x) = (1 / ((2π)^{D/2} |Σ_i|^{1/2})) exp(−(1/2)(x − μ_i)^T Σ_i^{−1} (x − μ_i))   (2)

where p_i(x) is characterized by the following parameters: D denotes the dimension of the frame feature of the training speech segment, determined by the feature dimension in the feature extraction process; Σ_i denotes the covariance matrix of the Gaussian; μ_i denotes the mean vector of the Gaussian.
The above is the concrete expression of the universal background model. A Gaussian mixture model fits the probability distribution (i.e. the probability density function) of a generic speaker's voiced speech features as a linear weighted sum of multiple single Gaussians, so the universal Gaussian mixture model can characterize well the distribution of speakers' voicing and hence their pronunciation characteristics.
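For concreteness, the mixture density of formulas (1) and (2) can be evaluated per frame as follows. This sketch assumes diagonal covariance matrices, a common simplification for universal background models that the patent text does not mandate:

```python
import numpy as np

def gmm_log_likelihood(X, w, mu, var):
    """Log of formula (1): p(x|lambda) = sum_i w_i * p_i(x), with
    diagonal-covariance single Gaussians p_i as in formula (2).
    X: (T, D) frame features; w: (M,) weights; mu, var: (M, D)."""
    D = X.shape[1]
    diff2 = (X[:, None, :] - mu[None, :, :]) ** 2 / var[None, :, :]
    log_pi = -0.5 * (D * np.log(2.0 * np.pi)
                     + np.log(var).sum(axis=1)[None, :]
                     + diff2.sum(axis=2))                        # (T, M)
    # log sum_i w_i p_i(x_t), computed stably in the log domain
    return np.logaddexp.reduce(np.log(w)[None, :] + log_pi, axis=1)  # (T,)
```

For a single standard-normal component in D dimensions, the log-density at the origin reduces to −(D/2)·log(2π), which serves as a quick sanity check.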
On the basis of the above universal background model, training the universal background model with the features of the training speech data means performing parameter estimation using the expectation-maximization method.
After parameter estimation, the universal background model, which is a Gaussian mixture model, is obtained. Its parameters comprise three classes: the Gaussian weights w, the Gaussian variances δ, and the Gaussian means μ, where w is the set of the individual Gaussian weights w_i, δ is the set of the individual Gaussian variances δ_i, μ is the set of the individual Gaussian means μ_i, and i indexes the single Gaussian models. The three parameter sets obtained by training on the training data are unique.
Specific parameter estimation procedure is as follows:
Step 102-1: update the k-th Gaussian weight w_k.
The update of the k-th Gaussian weight w_k is given by formula (3):

w_k = (1/T) Σ_{t=1}^{T} p(k | x_t, λ)   (3)

where x_t denotes the t-th frame feature vector of the input training speech x, a known vector computed during feature extraction; λ has the same meaning as in formula (1): it is the collective name for all parameters of the Gaussian mixture model, which are given initial values during initialization at the start of training and are therefore known; T denotes the total number of frames of all input training speech, a known quantity; k denotes the index of the k-th single Gaussian model in the mixture; p(k | x_t, λ) denotes the posterior probability of the input training speech frame x_t on the k-th Gaussian of the universal background model, computed from the input frame x_t and the mixture model parameters λ.
Step 102-2: update the k-th Gaussian mean μ_k.
The update of the k-th Gaussian mean μ_k is given by formula (4):

μ_k = Σ_{t=1}^{T} p(k | x_t, λ) x_t / Σ_{t=1}^{T} p(k | x_t, λ)   (4)

where the parameters have the same meanings as in formula (3): T, x_t and λ are known variables, and p(k | x_t, λ) is computed from the input frame x_t and the mixture model parameters λ.
Step 102-3: update the k-th Gaussian variance σ_k².
The update of the k-th Gaussian variance σ_k² is given by formula (5):

σ_k² = Σ_{t=1}^{T} p(k | x_t, λ) (x_t − μ_k)² / Σ_{t=1}^{T} p(k | x_t, λ)   (5)

where the parameters have the same meanings as in formulas (3) and (4): T, x_t, λ and μ_k are known variables, and p(k | x_t, λ) is computed from the input frame x_t and the mixture model parameters λ.
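The parameter updates of formulas (3)–(5) can be sketched as a single EM iteration. As an illustration only: the diagonal-covariance form is again an assumption, and formula (5) is computed in its equivalent E[x²] − μ² form:

```python
import numpy as np

def em_step(X, w, mu, var):
    """One EM update implementing formulas (3)-(5) for a
    diagonal-covariance GMM. X: (T, D); w: (M,); mu, var: (M, D)."""
    T, D = X.shape
    # E-step: posteriors p(k | x_t, lambda), computed stably in log domain
    diff2 = (X[:, None, :] - mu[None, :, :]) ** 2 / var[None, :, :]
    log_p = (np.log(w)[None, :]
             - 0.5 * (D * np.log(2.0 * np.pi)
                      + np.log(var).sum(axis=1)[None, :]
                      + diff2.sum(axis=2)))                 # (T, M)
    log_p -= log_p.max(axis=1, keepdims=True)
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)                 # p(k|x_t, lambda)

    # M-step
    Nk = post.sum(axis=0)                                   # sum_t p(k|x_t, lambda)
    w_new = Nk / T                                          # formula (3)
    mu_new = (post.T @ X) / Nk[:, None]                     # formula (4)
    # formula (5), via the equivalent E[x^2] - mu^2 form, floored for safety
    var_new = (post.T @ (X ** 2)) / Nk[:, None] - mu_new ** 2
    return w_new, mu_new, np.maximum(var_new, 1e-6)
```

On well-separated synthetic data a few such iterations already move the component means onto the clusters while keeping the weights normalized.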
Step 103: to obtain the model of each class of specific audio, some speech of that class must first be obtained as model training speech. If specific audio data of a class is difficult to acquire, the data of that class used when training the universal background model can be reused; if new specific audio data of that class can be obtained, the new audio data is used as training data. Regardless of how much training data there is, each class of specific audio yields one corresponding specific audio model.
In this step, as shown in Fig. 2, the model of a class of specific audio is adaptively derived from the universal background model using a small amount of training data of that class and the Bayesian adaptation algorithm. The specific adaptation process is as follows:
Step 103-1: first compute, from the feature vectors of the training specific audio, the posterior probability, first-order statistic and second-order statistic of each speech frame on the universal background model, as in formulas (6), (7) and (8):

n_i = Σ_{t=1}^{T} Pr(i | x_t)   (6)

E_i(x) = (1/n_i) Σ_{t=1}^{T} Pr(i | x_t) x_t   (7)

E_i(x²) = (1/n_i) Σ_{t=1}^{T} Pr(i | x_t) x_t²   (8)

where Pr(i | x_t) denotes the posterior probability of the t-th frame of the input audio x on the i-th Gaussian of the universal background model; x_t denotes the feature of the t-th frame of the input audio x; T denotes the total number of frames of the input audio; i denotes the index of the i-th single Gaussian in the universal background model.
Because the adaptation data used for each specific audio model differ, the posterior probabilities and first- and second-order statistics computed for training each specific audio model also differ.
Step 103-2: using the posterior probabilities, first-order statistics and second-order statistics computed in step 103-1, adaptively adjust the parameters of the universal background model to obtain the weights ŵ_i, means μ̂_i and covariances σ̂_i² of the specific audio model. Because a specific audio model is essentially also a Gaussian mixture model, once ŵ_i, μ̂_i and σ̂_i² are obtained, the Gaussian mixture model of that specific audio is characterized.
The specific adaptation formulas are (9), (10) and (11):

ŵ_i = γ [α_i^w n_i / T + (1 − α_i^w) w_i]   (9)

μ̂_i = α_i^m E_i(x) + (1 − α_i^m) μ_i   (10)

σ̂_i² = α_i^v E_i(x²) + (1 − α_i^v)(σ_i² + μ_i²) − μ̂_i²   (11)

where α_i^v, α_i^m and α_i^w are the variance, mean and weight regulation coefficients respectively; n_i, E_i(x) and E_i(x²) are the posterior probability, first-order statistic and second-order statistic of the specific audio training data computed by formulas (6), (7) and (8). In formula (9), T denotes the total number of frames of the training data of this class of specific audio, γ denotes a normalization parameter ensuring Σ_i ŵ_i = 1, and w_i denotes the weight of the i-th Gaussian model in the universal background model. In formula (10), μ_i denotes the mean of the i-th Gaussian model in the universal background model. In formula (11), σ_i² denotes the covariance of the i-th Gaussian in the universal background model, μ_i denotes the mean of the i-th Gaussian in the universal background model, and μ̂_i denotes the mean of the i-th Gaussian of the adaptively obtained specific audio model.
After the above computation, the model of this class of specific audio is obtained.
As can be seen from step 103-1, because the adaptation data of each specific audio model differ, the posterior probabilities and first- and second-order statistics computed from them also differ, so the specific audio models finally obtained through step 103-2 differ as well.
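The adaptation of formulas (6)–(11) can be sketched as follows. Note two assumptions made for illustration: diagonal covariances, and the relevance-factor form α_i = n_i/(n_i + r) with r = 16 for the regulation coefficients (a conventional choice for Bayesian adaptation; the patent leaves how the coefficients are chosen unspecified):

```python
import numpy as np

def map_adapt(X, w, mu, var, r=16.0):
    """Bayesian adaptation of a diagonal-covariance UBM to one class of
    specific audio, implementing formulas (6)-(11). X: (T, D) adaptation
    features; w: (M,); mu, var: (M, D); r: assumed relevance factor."""
    T, D = X.shape
    # posteriors Pr(i | x_t) of each frame on each UBM Gaussian
    diff2 = (X[:, None, :] - mu[None, :, :]) ** 2 / var[None, :, :]
    log_p = (np.log(w)[None, :]
             - 0.5 * (D * np.log(2.0 * np.pi)
                      + np.log(var).sum(axis=1)[None, :]
                      + diff2.sum(axis=2)))
    log_p -= log_p.max(axis=1, keepdims=True)
    post = np.exp(log_p)
    post /= post.sum(axis=1, keepdims=True)

    n = post.sum(axis=0) + 1e-10                  # n_i, formula (6)
    Ex = (post.T @ X) / n[:, None]                # E_i(x), formula (7)
    Ex2 = (post.T @ X ** 2) / n[:, None]          # E_i(x^2), formula (8)

    a = (n / (n + r))[:, None]                    # assumed regulation coefficients
    w_hat = a[:, 0] * n / T + (1.0 - a[:, 0]) * w
    w_hat /= w_hat.sum()                          # gamma keeps sum = 1, formula (9)
    mu_hat = a * Ex + (1.0 - a) * mu              # formula (10)
    var_hat = a * Ex2 + (1.0 - a) * (var + mu ** 2) - mu_hat ** 2  # formula (11)
    return w_hat, mu_hat, np.maximum(var_hat, 1e-6)
```

Components that the adaptation data touches (large n_i) move toward the data statistics, while untouched components stay close to the universal background model, which is exactly why a small amount of class data suffices.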
2. Test stage
Referring to Fig. 3, the test stage includes the following steps:
Step 201: perform feature extraction on the input test speech;
The features extracted in this step are of the same type as those extracted in step 101, e.g. mel-frequency cepstral coefficients;
Step 202: input the test speech features extracted in step 201 into the universal background model trained in step 102, and compute the score of the test speech on the universal background model.
From the foregoing, the universal background model is essentially a Gaussian mixture model, and the score of the test speech on the universal background model is the sum of the posterior probabilities over the Gaussians. As a preferred implementation, to accelerate scoring, the posterior probabilities of all Gaussians are not computed in practice; instead the N Gaussians with the largest posterior probabilities are selected, the sum of these N probabilities is computed, and the indices of these N Gaussians are recorded.
Step 203: input the test speech features extracted in step 201 into the Gaussian mixture models of the respective specific audio classes obtained in step 103, and compute the score of the test speech on the Gaussian mixture model of each specific audio; if there are M specific audio models, M scores are finally obtained.
The specific method for computing the score of the test speech on the Gaussian mixture model of a specific audio class is still to compute the sum of the posterior probabilities of the test speech on the Gaussians of that specific audio model. As a preferred implementation, to increase computing speed, the indices of the N Gaussians of the universal background model recorded in step 202 are used to compute the sum of the posterior probabilities of the corresponding N Gaussians in the Gaussian mixture model of the specific audio, and this value is taken as the score of the test speech on the Gaussian mixture model of the corresponding specific audio.
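The paired top-N scoring of steps 202 and 203 can be sketched as follows. Working in the log domain and averaging the per-frame scores over the T frames are our choices for illustration, not mandated by the text:

```python
import numpy as np

def _log_comp(X, w, mu, var):
    """Per-frame, per-component log(w_i * p_i(x_t)) for a diagonal GMM."""
    D = X.shape[1]
    diff2 = (X[:, None, :] - mu[None, :, :]) ** 2 / var[None, :, :]
    return (np.log(w)[None, :]
            - 0.5 * (D * np.log(2.0 * np.pi)
                     + np.log(var).sum(axis=1)[None, :]
                     + diff2.sum(axis=2)))

def topn_scores(X, ubm, target, N=5):
    """Fast scoring as in steps 202-203: per frame, keep the N UBM
    components with the largest posteriors, score the UBM on their sum,
    and reuse the same component indices to score the target model.
    ubm and target are (w, mu, var) tuples."""
    lu = _log_comp(X, *ubm)                            # (T, M)
    top = np.argsort(lu, axis=1)[:, -N:]               # top-N indices per frame
    rows = np.arange(X.shape[0])[:, None]
    ubm_score = np.logaddexp.reduce(lu[rows, top], axis=1).mean()
    lt = _log_comp(X, *target)                         # same indices, target model
    tgt_score = np.logaddexp.reduce(lt[rows, top], axis=1).mean()
    return ubm_score, tgt_score
```

Because the per-frame posterior ordering is monotone in log(w_i p_i(x_t)), selecting the top-N of the latter selects the top-N posteriors while touching each target model only on N components per frame.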
Step 204: take the difference between the score of the test speech on the universal background model obtained in step 202 and the score of the test speech on the Gaussian mixture model of each specific audio obtained in step 203, and compare the differences with a threshold to decide which specific audio this test audio belongs to. If several model scores fall within the threshold range, decide by taking the maximum: compare those in-range model scores and select the specific audio characterized by the model with the largest score as the final decision for the test speech.
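The decision rule of step 204 can be sketched as follows; reading "within the threshold range" as "score difference exceeds the threshold" is our interpretation of the text:

```python
def decide(ubm_score, model_scores, threshold):
    """Decision rule of step 204: difference between each specific-audio
    model score and the UBM score; accept classes whose difference
    exceeds the threshold, and break ties by the maximum difference.
    model_scores: {class_name: score}. Returns a class name, or None
    when no specific audio is detected."""
    diffs = {c: s - ubm_score for c, s in model_scores.items()}
    accepted = {c: d for c, d in diffs.items() if d > threshold}
    if not accepted:
        return None
    return max(accepted, key=accepted.get)
```

For example, with a UBM score of -10.0 and a threshold of 0.5, a scream model scoring -8.0 is accepted (difference 2.0) while a gunshot model scoring -9.8 is rejected.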
Finally, it should be noted that the above embodiments are merely intended to illustrate, not to limit, the technical solution of the present invention. Although the present invention has been described in detail with reference to the embodiments, those of ordinary skill in the art will understand that modifications or equivalent replacements of the technical solution of the present invention that do not depart from its spirit and scope shall all be covered by the claims of the present invention.

Claims (7)

1. A short-term specific audio detection model generation method, comprising:
Step 101: performing feature extraction on training speech data, where the training speech data includes non-specific audio data and specific audio data;
Step 102: training a universal background model with the features of the training speech data obtained in step 101, where the universal background model is a Gaussian mixture model whose expression is:

p(x|λ) = Σ_{i=1}^{M} w_i · p_i(x)

where w_i denotes the weight of each Gaussian, taking values in the range 0–1 and satisfying the normalization condition Σ_{i=1}^{M} w_i = 1; x denotes a frame feature of a training speech segment; λ denotes the set of all parameters of the Gaussian mixture model; M denotes the number of Gaussians; p_i(x) denotes the probability density function of each single Gaussian model, whose expression is:

p_i(x) = (1 / ((2π)^{D/2} |Σ_i|^{1/2})) exp(−(1/2)(x − μ_i)^T Σ_i^{−1} (x − μ_i))

where D denotes the dimension of the frame feature of the training speech segment; Σ_i denotes the covariance matrix of the Gaussian; μ_i denotes the mean vector of the Gaussian;
Step 103: using the features of one class of specific audio data in the training speech data, adaptively deriving the model of that class of specific audio data from the universal background model obtained in step 102, and repeating this operation until models for all classes of specific audio data in the training speech data are obtained;
wherein in step 103, adaptively deriving the model of one class of specific audio data from the universal background model obtained in step 102 comprises:
Step 103-1: first computing, from the feature vectors of the training specific audio, the posterior probability n_i, first-order statistic E_i(x) and second-order statistic E_i(x²) of each speech frame on the universal background model:

n_i = Σ_{t=1}^{T} Pr(i | x_t)

E_i(x) = (1/n_i) Σ_{t=1}^{T} Pr(i | x_t) x_t

E_i(x²) = (1/n_i) Σ_{t=1}^{T} Pr(i | x_t) x_t²

where Pr(i | x_t) denotes the posterior probability of the t-th frame of the input audio x on the i-th Gaussian of the universal background model; x_t denotes the feature of the t-th frame of the input audio x; T denotes the total number of frames of the input audio; i denotes the index of the i-th single Gaussian in the universal background model;
Step 103-2: using the posterior probabilities, first-order statistics and second-order statistics computed in step 103-1, adaptively adjusting the parameters of the universal background model to obtain the weights ŵ_i, means μ̂_i and covariances σ̂_i² of the specific audio model, where the adaptation formulas are:

ŵ_i = γ [α_i^w n_i / T + (1 − α_i^w) w_i]

μ̂_i = α_i^m E_i(x) + (1 − α_i^m) μ_i

σ̂_i² = α_i^v E_i(x²) + (1 − α_i^v)(σ_i² + μ_i²) − μ̂_i²

where α_i^v, α_i^m and α_i^w are the variance, mean and weight regulation coefficients respectively; T denotes the total number of frames of the training data of this class of specific audio; γ denotes a normalization parameter ensuring Σ_i ŵ_i = 1; w_i denotes the weight of the i-th Gaussian model in the universal background model; σ_i² denotes the covariance of the i-th Gaussian in the universal background model; μ̂_i denotes the mean of the i-th Gaussian of the adaptively obtained specific audio model.
2. The short-term specific audio detection model generation method according to claim 1, wherein in step 101, the features extracted from the training speech data are mel-frequency cepstral coefficients.
3. The short-term specific audio detection model generation method according to claim 1, wherein in step 102, training the universal background model comprises performing parameter estimation on the universal background model using the expectation-maximization method, the parameters to be estimated falling into three classes: the Gaussian weights w, the Gaussian variances δ, and the Gaussian means μ, where w is the set of the individual Gaussian weights w_i, δ is the set of the individual Gaussian variances δ_i, μ is the set of the individual Gaussian means μ_i, and i indexes the single Gaussian models; specifically comprising:
Step 102-1: updating the k-th Gaussian weight w_k as follows:

w_k = (1/T) Σ_{t=1}^{T} p(k | x_t, λ)

where x_t denotes the t-th frame feature vector of the input training speech x, a known vector computed during feature extraction; λ is the collective name for all parameters of the Gaussian mixture model, which are given initial values during initialization at the start of training and are therefore known; T denotes the total number of frames of all input training speech, a known quantity; k denotes the index of the k-th single Gaussian model in the mixture; p(k | x_t, λ) denotes the posterior probability of the input training speech frame x_t on the k-th Gaussian of the universal background model, computed from the input frame x_t and the mixture model parameters λ;
Step 102-2: updating the k-th Gaussian mean μ_k as follows:

μ_k = Σ_{t=1}^{T} p(k | x_t, λ) x_t / Σ_{t=1}^{T} p(k | x_t, λ)

where T, x_t and λ are known variables, and p(k | x_t, λ) is computed from the input frame x_t and the mixture model parameters λ;
Step 102-3: updating the k-th Gaussian variance σ_k² as follows:

σ_k² = Σ_{t=1}^{T} p(k | x_t, λ) (x_t − μ_k)² / Σ_{t=1}^{T} p(k | x_t, λ)

where T, x_t, λ and μ_k are known variables, and p(k | x_t, λ) is computed from the input frame x_t and the mixture model parameters λ.
4. A short-term specific audio detection method, comprising:
Step 201: performing feature extraction on input test speech;
Step 202: inputting the test speech features extracted in step 201 into the universal background model obtained by the short-term specific audio detection model generation method according to one of claims 1-3, and computing the score of the test speech on the universal background model;
Step 203: inputting the test speech features extracted in step 201 into the Gaussian mixture models of the various classes of specific audio obtained by the short-term specific audio detection model generation method according to one of claims 1-3, and computing the score of the test speech on the Gaussian mixture model of each class of specific audio;
Step 204: taking the difference between each score of the test speech on the Gaussian mixture models of the various classes of specific audio obtained in step 203 and the score of the test speech on the universal background model obtained in step 202, and comparing the differences with a threshold so as to decide which class of specific audio this test audio belongs to; if several model scores fall within the threshold range, deciding by taking the maximum, selecting the specific audio characterized by the model with the highest score as the final decision for the test speech.
5. The short-term specific audio detection method according to claim 4, wherein in step 202, computing the score of the test speech on the universal background model comprises: selecting the N Gaussians with the largest posterior probabilities in the universal background model, computing the sum of these N probabilities, and recording the indices of these N Gaussians.
6. The short-term specific audio detection method according to claim 5, wherein in step 203, computing the score of the test speech on the Gaussian mixture model of each class of specific audio comprises: using the N Gaussian indices of the universal background model recorded in step 202, computing the sum of the posterior probabilities of the corresponding N Gaussians in the Gaussian mixture model of the specific audio, and taking this value as the score of the test speech on the Gaussian mixture model of that class of specific audio.
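The top-N fast scoring of claims 5-6 can be sketched as follows: per frame, keep the N highest-posterior UBM components, then reuse exactly those component indices when scoring the class model. Averaging the per-frame sums over the utterance is an added assumption here, as are the `(weights, means, covs)` model layout and the function names.

```python
import numpy as np

def topn_scores(X, ubm, cls, n=5):
    """Per-frame top-n UBM components (claim 5), re-scored at the same
    indices in the class GMM (claim 6). Returns (ubm_score, class_score)."""
    def posteriors(X, weights, means, covs):
        diff = X[:, None, :] - means[None, :, :]           # (T, K, D)
        log_g = -0.5 * (np.sum(diff**2 / covs, axis=2)
                        + np.sum(np.log(2 * np.pi * covs), axis=1))
        lj = np.log(weights) + log_g
        lj -= lj.max(axis=1, keepdims=True)
        p = np.exp(lj)
        return p / p.sum(axis=1, keepdims=True)            # rows sum to 1

    p_ubm = posteriors(X, *ubm)                            # (T, K)
    top = np.argsort(p_ubm, axis=1)[:, -n:]                # top-n indices per frame
    ubm_score = np.take_along_axis(p_ubm, top, axis=1).sum(axis=1).mean()
    p_cls = posteriors(X, *cls)
    # claim 6: evaluate the class model only at the UBM-selected indices
    cls_score = np.take_along_axis(p_cls, top, axis=1).sum(axis=1).mean()
    return float(ubm_score), float(cls_score)
```

With a large UBM (e.g. 1024 components) and small n, this avoids a full pass over every class model's components, which is the usual motivation for top-N scoring in GMM-UBM systems.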
7. The short-term specific audio detection method according to claim 4, wherein in step 201, the features extracted from the test speech are Mel-frequency cepstral coefficients.
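Claim 7 names MFCCs as the front-end feature. The didactic sketch below computes them from scratch with NumPy (framing, Hamming window, power spectrum, triangular mel filterbank, log, DCT-II); the frame size, hop, and filterbank counts are common defaults chosen for illustration, not values from the patent.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Minimal MFCC extraction: returns an (n_frames, n_ceps) feature matrix."""
    # frame the signal and apply a Hamming window
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frames, n_fft))**2 / n_fft  # (T, n_fft//2 + 1)
    # triangular mel filterbank between 0 Hz and sr/2
    mel = lambda f: 2595 * np.log10(1 + f / 700.0)
    imel = lambda m: 700 * (10**(m / 2595.0) - 1)
    pts = imel(np.linspace(mel(0), mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    logmel = np.log(power @ fbank.T + 1e-10)               # (T, n_mels)
    # DCT-II decorrelates the log-mel energies; keep the first n_ceps
    k = np.arange(n_ceps)[:, None]
    n = np.arange(n_mels)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return logmel @ dct.T                                  # (T, n_ceps)
```

A production front end would typically add pre-emphasis and delta coefficients, which the claims do not specify and this sketch omits.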
CN201510236568.8A 2015-05-11 2015-05-11 Specific audio detection model generation in short-term and detection method Expired - Fee Related CN104992708B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510236568.8A CN104992708B (en) 2015-05-11 2015-05-11 Specific audio detection model generation in short-term and detection method


Publications (2)

Publication Number Publication Date
CN104992708A CN104992708A (en) 2015-10-21
CN104992708B true CN104992708B (en) 2018-07-24

Family

ID=54304511

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510236568.8A Expired - Fee Related CN104992708B (en) 2015-05-11 2015-05-11 Specific audio detection model generation in short-term and detection method

Country Status (1)

Country Link
CN (1) CN104992708B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106251861B (en) * 2016-08-05 2019-04-23 重庆大学 A kind of abnormal sound in public places detection method based on scene modeling
CN107068154A (en) * 2017-03-13 2017-08-18 平安科技(深圳)有限公司 The method and system of authentication based on Application on Voiceprint Recognition
CN108305616B (en) * 2018-01-16 2021-03-16 国家计算机网络与信息安全管理中心 Audio scene recognition method and device based on long-time and short-time feature extraction
CN110135492B (en) * 2019-05-13 2020-12-22 山东大学 Equipment fault diagnosis and abnormality detection method and system based on multiple Gaussian models
CN113888777B (en) * 2021-09-08 2023-08-18 南京金盾公共安全技术研究院有限公司 Voiceprint unlocking method and device based on cloud machine learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102509546A (en) * 2011-11-11 2012-06-20 北京声迅电子股份有限公司 Noise reduction and abnormal sound detection method applied to rail transit
CN102623009A (en) * 2012-03-02 2012-08-01 安徽科大讯飞信息技术股份有限公司 Abnormal emotion automatic detection and extraction method and system on basis of short-time analysis
CN103198605A (en) * 2013-03-11 2013-07-10 成都百威讯科技有限责任公司 Indoor emergent abnormal event alarm system
CN103226951A (en) * 2013-04-19 2013-07-31 清华大学 Speaker verification system creation method based on model sequence adaptive technique
CN103366738A (en) * 2012-04-01 2013-10-23 佳能株式会社 Methods and devices for generating sound classifier and detecting abnormal sound, and monitoring system


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
High-precision recognition method for specific audio events fusing GMM and SVM; 罗森林; 王坤; 谢尔曼; 潘丽敏; 李金玉; Transactions of Beijing Institute of Technology (《北京理工大学学报》); 20140731; Sections 1-2 *


Similar Documents

Publication Publication Date Title
CN104992708B (en) Specific audio detection model generation in short-term and detection method
CN105632501B (en) Automatic accent classification method and device based on deep learning
CN105938716B (en) Automatic detection method for copied-sample speech based on multi-precision fitting
Xu et al. Dynamic noise aware training for speech enhancement based on deep neural networks.
CN104732978B (en) Text-dependent speaker recognition method based on combined deep learning
CN105023573B (en) Speech syllable/vowel/phone boundary detection using auditory attention cues
CN108122552A (en) Speech emotion recognition method and device
CN106683666B (en) Domain adaptation method based on deep neural networks
CN105654944B (en) Environmental sound recognition method and device fusing short-term and long-term feature modeling
CN106611604A (en) Automatic voice summation tone detection method based on a deep neural network
CN103810996A (en) Method, device and system for processing speech under test
CN106023986B (en) Speech recognition method based on sound effect mode detection
Poorjam et al. Multitask speaker profiling for estimating age, height, weight and smoking habits from spontaneous telephone speech signals
CN109408660A (en) Automatic music classification method based on audio features
CN110085216A (en) Infant cry detection method and device
Tsenov et al. Speech recognition using neural networks
CN103578480B (en) Speech emotion recognition method based on context correction in negative emotion detection
CN110738986B (en) Long speech annotation device and method
Allen et al. Language identification using warping and the shifted delta cepstrum
CN111133508A (en) Method and device for selecting comparison phonemes
Rabiee et al. Persian accents identification using an adaptive neural network
CN106251861A (en) Abnormal sound detection method for public places based on scene modeling
Wiśniewski et al. Automatic detection of prolonged fricative phonemes with the hidden Markov models approach
Galgali et al. Speaker profiling by extracting paralinguistic parameters using mel frequency cepstral coefficients
Khanum et al. Speech based gender identification using feed forward neural networks

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180724
