CN109166591B - Classification method based on audio characteristic signals - Google Patents
Classification method based on audio characteristic signals
- Publication number
- CN109166591B CN109166591B CN201810994308.0A CN201810994308A CN109166591B CN 109166591 B CN109166591 B CN 109166591B CN 201810994308 A CN201810994308 A CN 201810994308A CN 109166591 B CN109166591 B CN 109166591B
- Authority
- CN
- China
- Prior art keywords
- audio
- function
- classification
- feature
- characteristic
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
Abstract
The invention relates to a classification method based on audio characteristic signals and belongs to the technical field of audio signal processing. The invention classifies audio characteristic signals after dimension reduction processing by utilizing a Gaussian kernel function and Bayesian prior knowledge. The classification algorithm based on audio characteristic signals can be used for audio broadcast monitoring, artificial-intelligence speech recognition, audio scene mode distinguishing and the like. The invention performs audio classification on coefficient-domain characteristics of the audio characteristic signals and, compared with the prior art of classifying based on audio content, has better universality and stability. By utilizing the excellent nonlinear characteristics of the Gaussian kernel function and a high-efficiency optimization algorithm, the invention avoids the defects caused by linear mapping: a single application scene, low running speed and poor classification effect. The algorithm theory is simple, easy to program and implement, and practical in engineering projects.
Description
Technical Field
The invention relates to a classification method based on audio characteristic signals, and belongs to the technical field of audio characteristic signal processing.
Background
To improve the efficiency and accuracy of recognition based on audio signals, and because audio feature classification occupies an important position in the audio monitoring and control of wireless broadcasting, research on classification algorithms based on audio characteristic signals is particularly important. The current main classification algorithms include the Bayes classification algorithm, the decision tree algorithm, the support vector machine algorithm and the like, and most of them suffer from poor classification effect, complex algorithms, a large amount of calculation and difficulty of programming. The present algorithm utilizes the excellent nonlinear characteristic of the Gaussian kernel function combined with the Bayesian prior theory; it obtains satisfactory results on the classification problem of audio characteristic signals subjected to dimension reduction processing and also shows excellent performance in actual engineering.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a classification method based on audio characteristic signals: first the audio characteristic parameters of the audio signals are extracted and subjected to dimension reduction processing, then the dimension-reduced feature parameters are sent into a built classification model, and the category of a test point is judged by the similarity probability between input and output, thereby achieving the purpose of audio classification, i.e. audio identification.
The technical scheme of the invention is as follows: a classification method based on audio characteristic signals. The method specifically comprises the following steps:
(1) Audio signal acquisition: acquire an audio signal to obtain an audio sample.
(2) Audio signal preprocessing: convert the analog signals in the collected audio samples into digital signals and write the digital signals into a WAV file; filter and frame the digital signals written into the WAV file.
(3) Characteristic parameter extraction: extract, by programming, the high-dimensional characteristic parameters of the preprocessed audio signals: Linear Prediction Coefficients (LPC), Linear Prediction Cepstrum Coefficients (LPCC) and Mel Frequency Cepstrum Coefficients (MFCC).
(4) Characteristic parameter dimension reduction: send the extracted audio characteristic parameters into a built dimension reduction model for dimension reduction processing, and store the dimension-reduced audio characteristic parameters in a table.
(5) Building a classification model: first, describe the similarity of one class to the other with an implicit function f obeying a Gaussian distribution; second, compress the output value of f into the range [0,1] with a compression function, the obtained compression value being the similarity of the two classes; distinguish the classes according to the similarity. The built model is the required classification model.
(6) Audio feature parameter classification: send the audio characteristic quantities after the dimension reduction of step (4) into the classification model of step (5) for audio characteristic parameter classification, and display the classification result by data visualization.
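As an illustration of step (2), the digitized samples can be written to a WAV file with Python's standard-library wave module; a minimal sketch under stated assumptions (the helper name write_wav and the 8 kHz default rate are illustrative, not part of the invention):

```python
import struct
import wave

def write_wav(path_or_file, samples, rate=8000):
    """Quantize float samples in [-1, 1] to 16-bit PCM and write a mono WAV file."""
    with wave.open(path_or_file, "wb") as w:
        w.setnchannels(1)        # number of sampling channels (mono here)
        w.setsampwidth(2)        # 16-bit quantization precision
        w.setframerate(rate)     # sampling frequency (must satisfy Nyquist)
        pcm = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
        )
        w.writeframes(pcm)
```

The sampling rate must exceed twice the highest signal frequency (Nyquist); with speech band-limited to 3400 Hz, 8 kHz is the conventional choice.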
In the audio acquisition of step (1), an audio sample is collected by an audio acquisition device; when acquiring the audio signal, the device sets the sampling frequency (which must satisfy the Nyquist sampling theorem), the number of sampling channels (set according to the acquisition object) and the quantization precision.
In the above classification method based on audio characteristic signals, the signal preprocessing in step (2) includes the following steps:
(1) Filter the collected audio signal x(n) with a rectangular window function w(n) (the upper cutoff frequency is generally f_H = 3400 Hz, the lower cutoff frequency f_L = 60-100 Hz) to obtain the signal y_a(n), where the rectangular window is w(n) = 1 for 0 ≤ n ≤ N − 1 and w(n) = 0 otherwise.
(2) Because the audio signal is not a stationary signal and is not suitable for direct extraction of characteristic parameters, divide the filtered audio signal y_a(n) into several audio signal segments; one segment is called a frame, and the duration of each segment is between 10 and 30 ms. Adjacent frames partially overlap; the overlapped part is called the frame shift, and the frame shift is taken as 1/2 or 1/3 of the frame length.
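The overlapped framing described above can be sketched as follows (frame_signal is an illustrative helper, not named by the patent; at an 8 kHz sampling rate a 20 ms frame is 160 samples, and a frame shift of 80 samples gives the 1/2 overlap):

```python
def frame_signal(x, frame_len, frame_shift):
    """Split a sampled signal into overlapping frames of frame_len samples,
    advancing by frame_shift samples (frame_shift = frame_len // 2 or // 3)."""
    frames = []
    start = 0
    while start + frame_len <= len(x):
        frames.append(x[start:start + frame_len])
        start += frame_shift
    return frames
```

Each returned frame shares its first frame_shift samples with the tail of the previous frame, which is exactly the frame-shift overlap the text describes.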
In the above classification method based on audio characteristic signals, the feature parameter extraction in step (3) extracts Linear Prediction Coefficients (LPC), Linear Prediction Cepstrum Coefficients (LPCC) and Mel Frequency Cepstrum Coefficients (MFCC) from the framed audio signals and puts them into 3 tables respectively.
In the classification method based on audio characteristic signals, the feature parameter dimension reduction in step (4) obtains the optimal projection direction of the feature vectors with the Fisher criterion algorithm, judges the contribution of each feature component to recognition by adding or removing feature components, and combines the two to perform the dimension reduction processing of the audio characteristic signals, obtaining a better dimension reduction result; the larger the Fisher ratio, the more important the dimension component. The Fisher linear discriminant criterion is

r_Fisher = σ²_between / σ²_within

where r_Fisher is the Fisher ratio (Fisher criterion) of a feature component; σ²_between is the inter-class variance of the feature component, i.e. the variance of the means of different speech feature components; σ²_within is the intra-class variance of the feature component, i.e. the mean of the variances of the same speech feature components. Here ρ denotes the dimension of the characteristic parameter; m̄_ρ the mean of the ρ-th dimension component of the speech features over all classes; m̄_ρ,ε the mean of the ρ-th dimension component over the ε-th class; Ω_ε the speech feature sequence of the ε-th class; γ and κ_ε the number of classes and the number of samples of each class of the speech feature sequence, respectively; and x_ρ the ρ-th dimension component of the ε-th class of the speech feature sequence, so that

σ²_between = (1/γ) Σ_{ε=1..γ} (m̄_ρ,ε − m̄_ρ)², σ²_within = (1/γ) Σ_{ε=1..γ} (1/κ_ε) Σ_{x∈Ω_ε} (x_ρ − m̄_ρ,ε)²
The inter-class variance of a feature component reflects the degree of difference between different speech samples, while the intra-class variance reflects the degree of compactness among the same speech samples; together they characterize the separability of the feature component. The larger the Fisher ratio, the more suitable that dimension's characteristic parameter is as feature information for speech recognition, so the dimension components with larger Fisher ratios are selected as the dimension reduction result, thereby achieving the purpose of reducing the dimension.
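The Fisher-ratio computation above can be sketched in plain Python (fisher_ratios is an illustrative name; features are assumed given as lists of equal-length vectors): for each dimension the ratio is the variance of the per-class means divided by the mean of the per-class variances, and the dimensions with the largest ratios are kept.

```python
def fisher_ratios(features, labels):
    """Per-dimension Fisher ratio: inter-class variance (variance of class means)
    divided by intra-class variance (mean of class variances)."""
    classes = sorted(set(labels))
    dims = len(features[0])

    def mean(vals):
        return sum(vals) / len(vals)

    def var(vals):
        m = mean(vals)
        return sum((v - m) ** 2 for v in vals) / len(vals)

    ratios = []
    for d in range(dims):
        class_means, class_vars = [], []
        for c in classes:
            col = [f[d] for f, lab in zip(features, labels) if lab == c]
            class_means.append(mean(col))
            class_vars.append(var(col))
        ratios.append(var(class_means) / mean(class_vars))
    return ratios
```

Selecting, say, the top-k dimensions by this ratio yields the dimension-reduced feature vectors described in step (4).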
In the above classification method based on the audio characteristic signal, the building of the characteristic quantity classification model in the step (5) includes the following steps:
(1) First, consider a binary classification problem: the two classes of dimension-reduced audio characteristic signals are respectively defined as y = 1 and y = 0, and x represents the dimension-reduced audio characteristic signal. The model introduces an implicit function f(x) and a response function δ(f(x)), where the implicit function f(x) follows a Gaussian distribution and the response function compresses the result of f(x) into the interval [0,1]. The likelihood of the data can then be written through the response function as π(x) = p(y = 1 | x) = δ(f(x)), p(y = 0 | f) = 1 − δ(f).
(2) Since f is an implicit function following a Gaussian distribution, its covariance is assumed to be the Gaussian squared-exponential kernel, with the expression

k(x, x') = σ_f² exp(−(x − x')² / (2l²))
where σ_f² is the coefficient parameter of the squared-exponential kernel and l represents the distance influence factor parameter between the two points x and x'; the kernel function thus has only two hyperparameters, θ = (σ_f, l). For a given test point x* and the implicit function values at x, the joint distribution is

[f; f*] ~ N(0, [[K, K*ᵀ]; [K*, K**]])
where K is the n×n covariance matrix of the training points, K_ij = k(x_i, x_j), and

K* = [k(x*, x_1) k(x*, x_2) ... k(x*, x_n)], K** = k(x*, x*) (4)
The conditional distribution of the implicit function is f* | f ~ N(K* K⁻¹ f, K** − K* K⁻¹ K*ᵀ), which gives the prediction conditional probability of the implicit function.
Here the conditional distribution of the implicit function has the same form as its prediction conditional probability distribution, but the expressions are not identical; the mean of the prediction conditional probability output of the implicit function is therefore denoted f̄*, and the corresponding covariance matrix K' likewise differs, their meaning being explained in step (2). Compressing the implicit function into the interval [0,1] yields the probability of class membership; defining δ* = δ(f*) = φ(f*), we have
δ*=∫δ(f*)p(f*|f)df* (5)
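The Gaussian squared-exponential kernel used above can be written out directly; a one-dimensional sketch (sigma_f and length are the two hyperparameters θ = (σ_f, l)):

```python
import math

def se_kernel(x, x_prime, sigma_f=1.0, length=1.0):
    """Squared-exponential kernel k(x, x') = sigma_f^2 * exp(-(x - x')^2 / (2 l^2)).
    sigma_f is the coefficient (signal variance) parameter; length is the
    distance-influence (length-scale) parameter between the two points."""
    return sigma_f ** 2 * math.exp(-((x - x_prime) ** 2) / (2.0 * length ** 2))
```

The kernel attains its maximum σ_f² at x = x' and decays smoothly as the two points move apart, which is the nonlinear similarity the classification model relies on.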
(2) For the likelihood, function analysis is first needed. According to the Bayes formula, the posterior distribution of the implicit function is

p(f | x, y) = p(y | f) p(f | x) / p(y | x)

To maximize the posterior probability of the implicit function, i.e. to solve the maximum likelihood, optimization algorithms such as the simplex method are used.
Substituting the result into equation (5) and iterating a number of times yields the optimal solution of f.
Since p(y | f) is not a Gaussian distribution, the posterior distribution p(f | x, y) of the implicit function has no analytic form; the Laplace approximation is therefore used: the posterior distribution p(f | x, y) is approximated by a Gaussian distribution q(f | x, y), obtained by a second-order Taylor expansion of log p(f | x, y) at the maximum of the posterior distribution.
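The Laplace approximation above requires locating the maximum of the posterior; a minimal numpy sketch that does this by Newton iteration under a logistic compression function (the patent's simplex optimizer is replaced here by Newton steps purely for illustration, and laplace_mode is an assumed name):

```python
import numpy as np

def laplace_mode(K, y, n_iter=25):
    """Newton iteration for the mode of the GP-classification posterior
    log p(f|y) = log p(y|f) - 0.5 f^T K^-1 f + const,
    with logistic compression delta(f) = 1/(1+exp(-f)) and labels y in {0, 1}."""
    K = np.asarray(K, dtype=float)
    y = np.asarray(y, dtype=float)
    f = np.zeros(len(y))
    K_inv = np.linalg.inv(K)
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-f))        # compressed values in [0, 1]
        W = np.diag(pi * (1.0 - pi))         # -Hessian of log p(y|f)
        grad = (y - pi) - K_inv @ f          # gradient of the log posterior
        hess = -(W + K_inv)                  # Hessian of the log posterior
        f = f - np.linalg.solve(hess, grad)  # Newton step
    return f
```

At the returned mode f̂ the gradient vanishes, i.e. (y − π(f̂)) = K⁻¹ f̂, which is the stationarity condition the second-order Taylor expansion is taken around.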
(3) The implicit function assumed by the invention uses the squared-exponential kernel; the hyperparameters σ_f and l are solved from

p(y|x,θ)=∫p(y|f)p(f|x)df (9)
To obtain the optimal hyperparameters, i.e. to maximize the conditional probability as far as possible, a second-order Taylor expansion of log p(y | x, θ) is performed at the local maximum point, followed by probability normalization and finally the Laplace approximation, i.e. the expansion of the log-likelihood function; for the detailed derivation see Chapter 2 of Gaussian Processes for Machine Learning by Rasmussen and Williams (2006).
The solved relevant parameters are substituted into the log-likelihood function, the simplex method is used for optimization to solve the optimal hyperparameters, the relevant parameters are substituted back into the classification model, and the audio characteristic quantities are classified using the classification expressions (5) and (6).
In the above classification method based on audio characteristic signals, the feature parameter classification in step (6) sends the audio characteristic signals subjected to dimension reduction processing into the established classification model for classification processing; finally, the classification result is displayed by data visualization.
Compared with the existing classification methods based on audio features, the invention has the following advantages:
(1) The invention connects the input and the function output by an implicit function f obeying a Gaussian distribution, normalizes the function value of f into the range [0,1] with a compression function, and classifies more intuitively by the size of the probability.
(2) The maximum likelihood is solved by means of the Bayesian prior probability, and the classification algorithm is improved by further introducing the kernel function, so that the binary classification is easily extended further to high-dimensional (multi-class) classification.
(3) The invention is proposed for the classification problem of dimension-reduced audio characteristic signals; it classifies the dimension-reduced data, has a simple principle and easy programming, and has strong robustness for practical audio-recognition artificial intelligence and broadcast audio monitoring.
Drawings
FIG. 1 is a flowchart of the overall classification process of the present invention.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
As shown in fig. 1, a classification method based on audio characteristic signals includes the following specific steps:
(1) Audio signal acquisition: collect audio signals to obtain audio samples.
(2) Audio signal preprocessing: convert the analog signals in the collected audio samples into digital signals and write the digital signals into a WAV file; filter and frame the digital signals written into the WAV file.
(3) Characteristic parameter extraction: extract, by programming, the high-dimensional characteristic parameters of the preprocessed audio signals: Linear Prediction Coefficients (LPC), Linear Prediction Cepstrum Coefficients (LPCC) and Mel Frequency Cepstrum Coefficients (MFCC).
(4) Characteristic parameter dimension reduction: send the extracted audio characteristic parameters into a built dimension reduction model for dimension reduction processing, and store the dimension-reduced audio characteristic parameters in a table.
(5) Building a classification model: first, describe the similarity of one class to the other with an implicit function f obeying a Gaussian distribution; second, compress the output value of f into the range [0,1] with a compression function, the obtained compression value being the similarity of the two classes; distinguish the classes according to the similarity. The built model is the required classification model.
(6) Audio feature parameter classification: send the dimension-reduced audio characteristic quantities of step (4) into the classification model built in step (5) for classification, and display the classification result by data visualization.
The audio acquisition collects an audio sample through an audio acquisition device; when acquiring the audio signal, the device sets the sampling frequency (which must satisfy the Nyquist sampling theorem), the number of sampling channels and the quantization precision.
The signal preprocessing comprises the following steps:
(1) Filter the collected audio signal x(n) with a rectangular window function w(n) (the upper cutoff frequency is generally f_H = 3400 Hz, the lower cutoff frequency f_L = 60-100 Hz) to obtain the signal y_a(n), where the rectangular window is w(n) = 1 for 0 ≤ n ≤ N − 1 and w(n) = 0 otherwise.
(2) Divide the filtered audio signal y_a(n) into several audio signal segments; one segment is called a frame, and the duration of each segment is 10-30 ms. Adjacent frames partially overlap; the overlapped part is called the frame shift, and the frame shift is taken as 1/2 or 1/3 of the frame length.
The characteristic parameter extraction extracts Linear Prediction Coefficients (LPC), Linear Prediction Cepstrum Coefficients (LPCC) and Mel Frequency Cepstrum Coefficients (MFCC) from the framed audio signal and puts them into 3 tables respectively.
The characteristic parameter dimension reduction obtains the optimal projection direction of the feature vectors with the Fisher criterion algorithm, judges the contribution of each feature component to recognition by adding or removing feature components, and combines the two to perform the dimension reduction processing of the audio characteristic signals, obtaining a better dimension reduction result; the larger the Fisher ratio, the more important the dimension component. The Fisher linear discriminant criterion is

r_Fisher = σ²_between / σ²_within

where r_Fisher is the Fisher ratio (Fisher criterion) of the feature component; σ²_between is the inter-class variance of the feature component, i.e. the variance of the means of different speech feature components; σ²_within is the intra-class variance of the feature component, i.e. the mean of the variances of the same speech feature components.
Here ρ denotes the dimension of the characteristic parameter; m̄_ρ the mean of the ρ-th dimension component of the speech features over all classes; m̄_ρ,ε the mean of the ρ-th dimension component over the ε-th class; Ω_ε the speech feature sequence of the ε-th class; γ and κ_ε the number of classes and the number of samples of each class of the speech feature sequence, respectively; and x_ρ the ρ-th dimension component of the ε-th class of the speech feature sequence.
The larger the Fisher ratio, the more suitable that dimension's characteristic parameter is as feature information for speech recognition; the dimension components with larger Fisher ratios are selected as the dimension reduction result, achieving the purpose of reducing the dimension.
The construction of the characteristic quantity classification model comprises the following steps:
(1) The two classes of dimension-reduced audio characteristic signals are respectively defined as y = 1 and y = 0, and x represents the dimension-reduced audio characteristic signal. The model introduces an implicit function f(x) and a response function δ(f(x)), where the implicit function f(x) follows a Gaussian distribution and the response function compresses the result of f(x) into the interval [0,1]; the likelihood of the data can be written as π(x) = p(y = 1 | x) = δ(f(x)), p(y = 0 | f) = 1 − δ(f).
Since f is an implicit function following a Gaussian distribution, its covariance is assumed to be the Gaussian squared-exponential kernel, with the expression

k(x, x') = σ_f² exp(−(x − x')² / (2l²))
where σ_f² is the coefficient parameter of the squared-exponential kernel and l represents the distance influence factor parameter between the two points x and x'; the kernel function thus has only two hyperparameters, θ = (σ_f, l). For a given test point x* and the implicit function values at x, the joint distribution is

[f; f*] ~ N(0, [[K, K*ᵀ]; [K*, K**]])
where K is the n×n covariance matrix of the training points, K_ij = k(x_i, x_j), and

K* = [k(x*, x_1) k(x*, x_2) ... k(x*, x_n)], K** = k(x*, x*) (8)
The condition distribution of its implicit function is
f* | f ~ N(K* K⁻¹ f, K** − K* K⁻¹ K*ᵀ) (9)
The prediction conditional probability of the implicit function is

p(f* | x, x*, y) = ∫ p(f* | f) p(f | x, y) df (10)
The mean of the prediction conditional probability output of the implicit function is denoted f̄*, and the covariance matrix is denoted K'; their meaning is explained in step (2). Compressing the output value of the implicit function into the interval [0,1] through the compression function yields the probability of class membership; defining δ* = δ(f*) = φ(f*), we have
δ*=∫δ(f*)p(f*|f)df* (11)
The compression processing values are given in Chapter 9 of Rasmussen and Williams (2006).
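When the compression function is taken to be the probit Φ (the Gaussian CDF), averaging the compressed value over the Gaussian prediction of the implicit function has a known closed form, ∫ Φ(f) N(f | mean, var) df = Φ(mean / √(1 + var)); a sketch (probit_class_prob is an illustrative name, and the probit choice is one option for the compression function, not a requirement of the patent):

```python
import math

def probit_class_prob(mean, var):
    """Closed form of the averaged probit compression:
    integral of Phi(f) * N(f | mean, var) df = Phi(mean / sqrt(1 + var)),
    i.e. the class-membership probability delta* given the Gaussian
    predictive mean and variance of the implicit function."""
    z = mean / math.sqrt(1.0 + var)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # Phi via erf
```

Note how a larger predictive variance pulls the probability toward 0.5: uncertainty in the implicit function softens the class decision.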
(2) The likelihood function is analysed; by the Bayes formula the posterior distribution of the implicit function is

p(f | x, y) = p(y | f) p(f | x) / p(y | x)
The maximum of the posterior probability, i.e. the maximum likelihood function, is solved using optimization algorithms such as the simplex method.
Substituting the result into equation (10) and iterating a number of times yields the optimal solution of f.
Because p(y | f) is not a Gaussian distribution, the posterior distribution p(f | x, y) of the implicit function has no analytic form; the Laplace approximation is carried out: the posterior distribution p(f | x, y) is approximated by a Gaussian distribution q(f | x, y), and a second-order Taylor expansion of log p(f | x, y) is performed at the maximum of the posterior distribution, giving the Gaussian approximation expression shown as
Thereby obtaining
(3) The premise of the successful implementation of the classification algorithm is the solution of the covariance matrix, so the relevant parameters of the implicit function become the key of the problem. The implicit function assumed by the invention uses the squared-exponential kernel, and the hyperparameters σ_f and l are solved from
p(y|x,θ)=∫p(y|f)p(f|x)df (18)
A second-order Taylor expansion of log p(y | x, θ) is performed at the local maximum point, with probability normalization processing, to obtain the optimal hyperparameters, i.e. to maximize the conditional probability as far as possible; finally the Laplace approximation gives the log-likelihood function expansion.
Formulas (15) to (17) are substituted into the log-likelihood function and the simplex method is used for optimization to solve the optimal hyperparameters (the simplex function can be called directly in programming software); the relevant parameters are substituted back to obtain the classification model, and the data are classified using (11) and (12).
The audio characteristic parameter classification sends the audio characteristic signals subjected to dimension reduction processing into the established classification model for classification processing. Finally, the classification result is displayed by data visualization and the corresponding classification accuracy is reported. The invention describes only the binary classification algorithm; for multi-class problems a corresponding vectorized extension is performed.
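The classification accuracy mentioned above can be computed by thresholding the class-membership probabilities at 0.5 and comparing with the true labels; a minimal sketch (the helper name is illustrative):

```python
def classification_accuracy(probs, labels, threshold=0.5):
    """Threshold class-membership probabilities and report the fraction of
    predictions that match the true labels (1 or 0)."""
    preds = [1 if p >= threshold else 0 for p in probs]
    correct = sum(1 for p, t in zip(preds, labels) if p == t)
    return correct / len(labels)
```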
The present invention is not limited to the above-described embodiments, and can be applied to other related fields within the scope of knowledge of those skilled in the art without departing from the spirit of the present invention.
Claims (5)
1. A method of classification based on audio feature signals, characterized by: the method comprises the following specific steps:
(1) audio signal acquisition: collecting an audio signal to obtain an audio sample;
(2) audio signal preprocessing: converting analog signals in the collected audio samples into digital signals, writing the digital signals into a WAV file, and performing filtering, pre-emphasis and framing processing on the digital signals written into the WAV file;
(3) characteristic parameter extraction: extracting high-dimensional characteristic parameters including a linear prediction coefficient, a linear prediction cepstrum coefficient and a Mel frequency cepstrum coefficient from the preprocessed audio signal;
(4) reducing the dimension of the characteristic parameter: sending the extracted high-dimensional characteristic parameters into a built dimension reduction model for dimension reduction treatment and storage;
(5) building a classification model: first, the similarity of one class to the other is described by an implicit function f obeying a Gaussian distribution; second, the output value of f is compressed into the range [0,1] by a compression function, and the classes are distinguished according to the size of the compressed value; the built model is the required classification model;
the construction of the classification model comprises the following steps:
(1) the two classes of dimension-reduced audio characteristic signals are respectively defined as y = 1 and y = 0, and x is defined as the dimension-reduced audio characteristic signal; an implicit function f(x) and a response function δ(f(x)) are introduced into the classification model, wherein the implicit function f(x) obeys a Gaussian distribution and the response function compresses the result of f(x) into the interval [0,1]; the likelihood function of the data is π(x) = p(y = 1 | x) = δ(f(x)), p(y = 0 | f) = 1 − δ(f), and the response function is as follows:
the implicit function is assumed to be a Gaussian squared-exponential kernel function, whose expression is k(x, x') = σ_f² exp(−(x − x')² / (2l²)):
wherein sigmaf 2Is a coefficient parameter of square exponential kernel, l represents a distance influence parameter between two points of x and x', and two hyperparameters of a kernel function are theta (sigma)fL), the joint distribution of implicit functions given a test point and x-is:
wherein K is a covariance matrix, and the expression is as follows:
K*=[k(x*,x1) k(x*,x2)...k(x*,xn)] K**=k(x*,x*)
the conditional distribution of the implicit function is:
f* | f ~ N(K* K⁻¹ f, K** − K* K⁻¹ K*ᵀ)
the prediction conditional probability of the implicit function is:
p(f* | x, y, x*) = ∫ p(f* | f) p(f | x, y) df
the mean of the conditional probability output of the implicit-function prediction is defined as f̄* and the covariance matrix is defined as K'; the compression function compresses the output value of the implicit function into the [0, 1] interval to yield the class-membership probability, defining δ* = δ(f*) = φ(f*), namely:
δ* = ∫ δ(f*) p(f* | f) df*
(2) analysing the likelihood function according to the Bayes formula gives the posterior distribution of the implicit function:
p(f | x, y) = p(y | f) p(f | x) / p(y | x)
using a simplex optimization algorithm one can obtain:
the optimal solution of f is solved by carrying out iterative solution by the prediction conditional probability calculation formula of the implicit functionThe Gaussian distribution q (f | x, y) is used to approximate the posterior distribution p (f | x, y), and the second-order Taylor expansion is performed on logp (f | x, y) at the maximum of the posterior distribution, thus obtaining the Gaussian distribution
Thereby obtaining
W is a Hessian matrix of negative logp (y | f), the formula is substituted into a log-likelihood function, and an optimal hyper-parameter is solved by using a simplex method optimization algorithm;
(3) since the implicit function assumed by the invention has a squared exponential kernel function, the conditional probability of the output result of the classification model is:
p(y|x,θ)=∫p(y|f)p(f|x)df
a second-order Taylor expansion is performed at the local maximum point of log p(y | x, θ), probability normalization is carried out, and finally the Laplace approximate expansion is applied; the solved hyperparameters are substituted back to obtain the required classification model;
(6) audio characteristic parameter classification: the dimension-reduced characteristic parameters of the audio signals from step (4) are sent into the classification model of step (5) for classification, and the classification results are displayed visually.
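For illustration only (not part of the claims), the two ingredients named in step (5) — a squared exponential kernel with hyperparameters θ = (σ_f, l) and a response function compressing f(x) into [0, 1] — could be sketched in Python as follows. The function names and default hyperparameter values are assumptions of this sketch, as is taking the cumulative Gaussian as the compression function:

```python
import math

def se_kernel(x1, x2, sigma_f=1.0, l=1.0):
    """Squared exponential kernel k(x, x') = sigma_f^2 * exp(-||x - x'||^2 / (2 l^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return sigma_f ** 2 * math.exp(-d2 / (2.0 * l ** 2))

def response(z):
    """Cumulative-Gaussian response compressing an implicit-function value into [0, 1]."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def predict_mean_single(x_star, x1, f1, sigma_f=1.0, l=1.0):
    """Mean K* K^-1 f of the conditional f* | f, for the special case of a
    single training point (so K and K* are scalars)."""
    return se_kernel(x_star, x1, sigma_f, l) / se_kernel(x1, x1, sigma_f, l) * f1
```

At a test point equal to the single training input, the predictive mean reduces to the training value of f, and response(0) sits exactly on the class boundary at 0.5.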
2. The audio feature signal-based classification method according to claim 1, characterized by: the audio signal is collected by an audio collecting device, and the audio collecting device needs to set the sampling frequency, the number of sampling channels and the quantization precision when collecting the audio signal.
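The three acquisition parameters named in claim 2 map directly onto the header fields of the WAV file mentioned in step (2). A minimal sketch using only the Python standard library (the function name and 16-bit default quantization are assumptions):

```python
import struct
import wave

def write_wav(path, samples, sample_rate=16000, n_channels=1, sampwidth=2):
    """Write signed 16-bit PCM samples to a WAV file.

    sample_rate = sampling frequency, n_channels = number of sampling
    channels, sampwidth = quantization precision in bytes (2 -> 16 bit).
    """
    with wave.open(path, "wb") as wf:
        wf.setnchannels(n_channels)
        wf.setsampwidth(sampwidth)
        wf.setframerate(sample_rate)
        # pack each sample as little-endian signed 16-bit
        wf.writeframes(b"".join(struct.pack("<h", s) for s in samples))
```

Reading the file back with `wave.open(path, "rb")` recovers the three parameters from the header, which is how the preprocessing stage of step (2) would know how to interpret the digital signal.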
3. The audio feature signal-based classification method according to claim 1, characterized by: the signal preprocessing comprises the following steps:
(1) the collected audio signal x(n) is filtered by adopting a rectangular window function w(n) to obtain the signal y_a(n) = x(n) · w(n), wherein
w(n) = 1 for 0 ≤ n ≤ N − 1, and w(n) = 0 otherwise;
(2) the filtered signal y_a(n) is pre-emphasized and divided into a plurality of audio frame signals, with adjacent frames partially overlapping.
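The pre-emphasis and overlapping-frame steps of claim 3 can be sketched as follows; the pre-emphasis coefficient 0.97 and the frame length / hop values are common defaults assumed here, not values stated in the claims:

```python
def pre_emphasis(x, alpha=0.97):
    """Pre-emphasis filter y[n] = x[n] - alpha * x[n-1] (boosts high frequencies)."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]

def frame_signal(x, frame_len=256, hop=128):
    """Split a signal into partially overlapping frames; adjacent frames
    share frame_len - hop samples."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]
```

With hop = frame_len // 2 each frame overlaps its neighbour by half, the usual compromise between time resolution and redundancy.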
4. The audio feature signal-based classification method according to claim 1, characterized by: the characteristic parameter extraction is to extract the characteristic parameters of a linear prediction coefficient, a linear prediction cepstrum coefficient and a Mel frequency cepstrum coefficient of the audio signal after the framing, and respectively store the characteristic parameters into 3 tables.
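Of the three characteristic parameters in claim 4, the linear prediction coefficients and the cepstrum coefficients derived from them admit a compact sketch. This is an illustrative implementation, not the patented method: LPC via the standard Levinson-Durbin recursion on the autocorrelation sequence, and LPCC via the usual recursion (given here only for orders n ≤ p):

```python
def autocorr(x, max_lag):
    """Autocorrelation R(k) of x for lags 0..max_lag."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k))
            for k in range(max_lag + 1)]

def lpc(x, order):
    """Linear prediction coefficients a_1..a_p (Levinson-Durbin) for the
    model x[n] ~= sum_k a_k * x[n-k]."""
    r = autocorr(x, order)
    a, err = [], r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - 1 - j] for j in range(len(a)))
        k = acc / err                      # reflection coefficient
        a = [a[j] - k * a[len(a) - 1 - j] for j in range(len(a))] + [k]
        err *= (1.0 - k * k)               # prediction-error update
    return a

def lpcc(a):
    """LPCC c_1..c_p from LPC coefficients:
    c_n = a_n + sum_{k=1}^{n-1} (k/n) c_k a_{n-k}."""
    c = []
    for n in range(1, len(a) + 1):
        c.append(a[n - 1] + sum((k / n) * c[k - 1] * a[n - k - 1]
                                for k in range(1, n)))
    return c
```

On a pure first-order exponential signal x[n] = 0.5ⁿ, the order-1 LPC recovers the generating coefficient 0.5, and c₁ always equals a₁ by the recursion above.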
5. The audio feature signal-based classification method according to claim 1, characterized by: the specific steps of the characteristic parameter dimension reduction are as follows: dimension-reduction processing is performed on the audio characteristic signal using the Fisher criterion, and the dimensional components with large Fisher ratios are selected as x, the dimension-reduction result, so as to achieve the purpose of dimension reduction; the Fisher linear discriminant criterion is:
r_Fisher = σ²_between / σ²_within
where r_Fisher is the Fisher ratio (Fisher criterion) of a characteristic component; σ²_between is the inter-class variance of the characteristic component, namely the variance of the mean values of the different speech characteristic components:
σ²_between = (1/γ) Σ_{ε=1}^{γ} (μ_ρ^(ε) − μ̄_ρ)²
σ²_within is the intra-class variance of the characteristic component, namely the mean variance of the same speech characteristic component:
σ²_within = (1/γ) Σ_{ε=1}^{γ} (1/κ_ε) Σ_{x∈ω_ε} (x_ρ^(ε) − μ_ρ^(ε))²
where ρ denotes the dimension of the characteristic parameter; μ̄_ρ denotes the mean of the ρ-th dimensional component of the speech features over all classes; μ_ρ^(ε) denotes the mean of the ρ-th dimensional component of the speech features over the ε-th class; ω_ε denotes the speech-feature sequence of the ε-th class; γ and κ_ε respectively denote the number of classes and the number of samples in each class of the speech-feature sequence; x_ρ^(ε) denotes the ρ-th dimensional component of the ε-th class of the speech-feature sequence.
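The Fisher-ratio selection of claim 5 can be sketched as a per-dimension ratio of between-class to within-class variance; the function name and the equal weighting of classes are assumptions of this illustration:

```python
def fisher_ratios(classes):
    """classes: list of classes, each a list of equal-length feature vectors.
    Returns r_Fisher per dimension: between-class variance of the class
    means divided by the mean within-class variance."""
    dim = len(classes[0][0])
    ratios = []
    for p in range(dim):
        # class means and grand mean of dimension p
        means = [sum(v[p] for v in cls) / len(cls) for cls in classes]
        grand = sum(means) / len(means)
        between = sum((m - grand) ** 2 for m in means) / len(means)
        within = sum(sum((v[p] - m) ** 2 for v in cls) / len(cls)
                     for cls, m in zip(classes, means)) / len(classes)
        ratios.append(between / within if within > 0 else float("inf"))
    return ratios
```

Keeping the components with the largest ratios (e.g. `sorted(range(len(r)), key=r.__getitem__, reverse=True)[:k]`) yields the reduced representation x that is fed to the classifier.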
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810994308.0A CN109166591B (en) | 2018-08-29 | 2018-08-29 | Classification method based on audio characteristic signals |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109166591A CN109166591A (en) | 2019-01-08 |
CN109166591B true CN109166591B (en) | 2022-07-19 |
Family
ID=64893393
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810994308.0A Active CN109166591B (en) | 2018-08-29 | 2018-08-29 | Classification method based on audio characteristic signals |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109166591B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109065070B (en) * | 2018-08-29 | 2022-07-19 | 昆明理工大学 | Kernel function-based audio characteristic signal dimension reduction method |
CN109949824B (en) * | 2019-01-24 | 2021-08-03 | 江南大学 | City sound event classification method based on N-DenseNet and high-dimensional mfcc characteristics |
CN110931044A (en) * | 2019-12-12 | 2020-03-27 | 上海立可芯半导体科技有限公司 | Radio frequency searching method, channel classification method and electronic equipment |
CN110956965A (en) * | 2019-12-12 | 2020-04-03 | 电子科技大学 | Personalized intelligent home safety control system and method based on voiceprint recognition |
CN113223511B (en) * | 2020-01-21 | 2024-04-16 | 珠海市煊扬科技有限公司 | Audio processing device for speech recognition |
CN117275519B (en) * | 2023-11-22 | 2024-02-13 | 珠海高凌信息科技股份有限公司 | Voice type identification correction method, system, device and medium |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7031530B2 (en) * | 2001-11-27 | 2006-04-18 | Lockheed Martin Corporation | Compound classifier for pattern recognition applications |
US9111547B2 (en) * | 2012-08-22 | 2015-08-18 | Kodak Alaris Inc. | Audio signal semantic concept classification method |
CN103151039A (en) * | 2013-02-07 | 2013-06-12 | 中国科学院自动化研究所 | Speaker age identification method based on SVM (Support Vector Machine) |
CN103854645B (en) * | 2014-03-05 | 2016-08-24 | 东南大学 | A kind of based on speaker's punishment independent of speaker's speech-emotion recognition method |
US10141009B2 (en) * | 2016-06-28 | 2018-11-27 | Pindrop Security, Inc. | System and method for cluster-based audio event detection |
CN107871498A (en) * | 2017-10-10 | 2018-04-03 | 昆明理工大学 | It is a kind of based on Fisher criterions to improve the composite character combinational algorithm of phonetic recognization rate |
CN108109612A (en) * | 2017-12-07 | 2018-06-01 | 苏州大学 | Voice recognition classification method based on self-adaptive dimension reduction |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109166591B (en) | Classification method based on audio characteristic signals | |
CN105976809B (en) | Identification method and system based on speech and facial expression bimodal emotion fusion | |
CN107393554B (en) | Feature extraction method for fusion inter-class standard deviation in sound scene classification | |
CN110852215B (en) | Multi-mode emotion recognition method and system and storage medium | |
CN109034046B (en) | Method for automatically identifying foreign matters in electric energy meter based on acoustic detection | |
CN103280220A (en) | Real-time recognition method for baby cry | |
Deshmukh et al. | Speech based emotion recognition using machine learning | |
Ghai et al. | Emotion recognition on speech signals using machine learning | |
US8812310B2 (en) | Environment recognition of audio input | |
CN114023354A (en) | Guidance type acoustic event detection model training method based on focusing loss function | |
CN113539294A (en) | Method for collecting and identifying sound of abnormal state of live pig | |
CN110931023A (en) | Gender identification method, system, mobile terminal and storage medium | |
CN111933148A (en) | Age identification method and device based on convolutional neural network and terminal | |
CN112906544A (en) | Voiceprint and face-based matching method suitable for multiple targets | |
CN113707175A (en) | Acoustic event detection system based on feature decomposition classifier and self-adaptive post-processing | |
CN116741159A (en) | Audio classification and model training method and device, electronic equipment and storage medium | |
CN111785262B (en) | Speaker age and gender classification method based on residual error network and fusion characteristics | |
Yu | Research on music emotion classification based on CNN-LSTM network | |
Sharma et al. | Speech Emotion Recognition System using SVD algorithm with HMM Model | |
CN113658582A (en) | Voice-video cooperative lip language identification method and system | |
Aurchana et al. | Musical instruments sound classification using GMM | |
Therese et al. | A linear visual assessment tendency based clustering with power normalized cepstral coefficients for audio signal recognition system | |
Li et al. | Research on isolated word recognition algorithm based on machine learning | |
CN116230012B (en) | Two-stage abnormal sound detection method based on metadata comparison learning pre-training | |
CN117079673B (en) | Intelligent emotion recognition method based on multi-mode artificial intelligence |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||