CN109166591B - Classification method based on audio characteristic signals - Google Patents

Classification method based on audio characteristic signals

Info

Publication number
CN109166591B
CN109166591B (granted publication of application CN201810994308.0A)
Authority
CN
China
Prior art keywords
audio
function
classification
feature
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810994308.0A
Other languages
Chinese (zh)
Other versions
CN109166591A (en)
Inventor
龙华 (Long Hua)
杨明亮 (Yang Mingliang)
邵玉斌 (Shao Yubin)
杜庆治 (Du Qingzhi)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN201810994308.0A priority Critical patent/CN109166591B/en
Publication of CN109166591A publication Critical patent/CN109166591A/en
Application granted granted Critical
Publication of CN109166591B publication Critical patent/CN109166591B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - …characterised by the type of extracted parameters
    • G10L25/24 - …the extracted parameters being the cepstrum
    • G10L15/00 - Speech recognition
    • G10L15/08 - Speech classification or search
    • G10L15/18 - Speech classification or search using natural language modelling
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Auxiliary Devices For Music (AREA)

Abstract

The invention relates to a classification method based on audio characteristic signals, belonging to the technical field of audio signal processing. The method classifies dimension-reduced audio characteristic signals using a Gaussian kernel function together with Bayesian prior knowledge. A classification algorithm based on audio characteristic signals can be used for audio broadcast monitoring, artificial-intelligence speech recognition, audio scene discrimination, and similar tasks. The invention performs audio classification on coefficient-domain features of the audio characteristic signal, which gives better universality and stability than prior approaches that classify based on audio content. By exploiting the strong nonlinear properties of the Gaussian kernel function and an efficient optimization algorithm, the method avoids the drawbacks of linear mappings: a narrow range of application scenes, low running speed, and poor classification performance. The underlying theory is simple, the algorithm is easy to implement in code, and it is practical in engineering projects.

Description

Classification method based on audio characteristic signals
Technical Field
The invention relates to a classification method based on audio characteristic signals, and belongs to the technical field of audio characteristic signal processing.
Background
Improving the efficiency and accuracy of recognition based on audio signals is important, and audio feature classification also occupies a significant position in the audio monitoring and control of wireless broadcasting, so research on classification algorithms for audio characteristic signals is particularly important. The main classification algorithms at present include Bayesian classifiers, decision tree algorithms, support vector machines, and others; most suffer from poor classification performance, algorithmic complexity, heavy computation, or difficulty of implementation. The algorithm presented here exploits the strong nonlinear properties of the Gaussian kernel function combined with Bayesian prior theory; it obtains satisfactory results on the problem of classifying dimension-reduced audio characteristic signals and has also shown excellent performance in practical engineering.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a classification method based on audio characteristic signals: first extract audio characteristic parameters from the audio signal and reduce their dimensionality, then feed the dimension-reduced feature parameters into a constructed classification model and judge the category of a test point by the similarity probability between input and output, thereby achieving audio classification, i.e. audio identification.
The technical scheme of the invention is as follows: a classification method based on audio characteristic signals. The method specifically comprises the following steps:
(1) audio signal acquisition: and acquiring an audio signal to obtain an audio sample.
(2) Audio signal preprocessing: and converting the analog signals in the collected audio samples into digital signals, and writing the digital signals into the WAV file. And filtering and framing the digital signals to be written into the WAV file.
(3) Characteristic parameter extraction: extract, in software, the high-dimensional characteristic parameters of the preprocessed audio signals: Linear Prediction Coefficients (LPC), Linear Prediction Cepstrum Coefficients (LPCC), and Mel Frequency Cepstrum Coefficients (MFCC).
(4) Reducing the dimension of the characteristic parameter: and sending the extracted audio characteristic parameters into a built dimension reduction model for dimension reduction treatment, and storing the dimension-reduced audio characteristic parameters into a table.
(5) Building a classification model: first, describe the similarity of one class to the other using an implicit function f that follows a Gaussian distribution; second, compress the output value of f into the range [0, 1] using a compression function. The compressed value is the similarity between the two classes, and classes are distinguished according to this similarity; the model thus built is the required classification model.
(6) Audio feature parameter classification: and (5) sending the audio characteristic quantity subjected to the dimensionality reduction in the step (4) into the classification model in the step (5) for audio characteristic parameter classification, and performing data visualization display on a classification result.
In the audio collection of step (1), an audio sample is collected with an audio collection device; when collecting the audio signal, the device sets the sampling frequency (which must satisfy the Nyquist sampling theorem), the number of sampling channels (set according to the collection object), and the quantization precision.
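As a minimal sketch of steps (1)-(2), the snippet below writes a digitized sample to a WAV container with Python's standard `wave` module. The 440 Hz tone stands in for real acquisition hardware, and the parameter values (8000 Hz, 16-bit, mono) are illustrative choices consistent with the text: 8000 Hz exceeds twice the 3400 Hz upper band edge, satisfying the Nyquist criterion.

```python
import io
import wave

import numpy as np

SR = 8000        # sampling rate: > 2 * 3400 Hz, satisfying the Nyquist criterion
BITS = 16        # quantization precision in bits
CHANNELS = 1     # number of sampling channels (mono)

# Stand-in for a real capture device: a synthetic 440 Hz tone, 1 second long.
t = np.arange(SR) / SR
pcm = (0.5 * np.sin(2 * np.pi * 440.0 * t) * 32767).astype(np.int16)

buf = io.BytesIO()                      # in-memory WAV file
with wave.open(buf, "wb") as w:
    w.setnchannels(CHANNELS)
    w.setsampwidth(BITS // 8)           # bytes per sample
    w.setframerate(SR)
    w.writeframes(pcm.tobytes())

buf.seek(0)
with wave.open(buf, "rb") as w:
    n_frames = w.getnframes()
    rate = w.getframerate()
```

In a real deployment the in-memory buffer would be replaced by a file on disk and the synthetic tone by the device's analog-to-digital output.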
In the above classification method based on audio characteristic signals, the signal preprocessing in step (2) includes the following steps:
(1) Filter the collected audio signal x(n) using a rectangular window function w(n) (upper cutoff frequency generally f_H = 3400 Hz, lower cutoff frequency f_L = 60–100 Hz) to obtain the signal y_a(n), where

  w(n) = 1 for 0 ≤ n ≤ N − 1, w(n) = 0 otherwise, and y_a(n) = x(n)·w(n).
(2) Because the audio signal is not stationary, it is not suitable for direct extraction of characteristic parameters. The filtered audio signal y_a(n) is therefore divided into a number of segments; one segment is called a frame, and each frame spans 10–30 ms. Adjacent frames partially overlap; the overlapped part is called the frame shift, and the frame shift is taken as 1/2 or 1/3 of the frame length.
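The framing rule above can be sketched in a few lines of numpy. The function name and default values (25 ms frames, shift ratio 1/2) are illustrative choices within the ranges the text specifies, not part of the patent.

```python
import numpy as np

def frame_signal(y, sr, frame_ms=25, shift_ratio=0.5):
    """Split a filtered signal y into overlapping frames.

    frame_ms:    frame length in ms (the text suggests 10-30 ms).
    shift_ratio: frame shift as a fraction of the frame length (1/2 or 1/3).
    Returns an array of shape (n_frames, frame_len).
    """
    frame_len = int(sr * frame_ms / 1000)
    shift = int(frame_len * shift_ratio)
    n_frames = 1 + (len(y) - frame_len) // shift
    # Index matrix: row i selects samples [i*shift, i*shift + frame_len)
    idx = np.arange(frame_len)[None, :] + shift * np.arange(n_frames)[:, None]
    return y[idx]

sr = 8000
y = np.random.default_rng(0).standard_normal(sr)  # 1 s of filtered audio
frames = frame_signal(y, sr)                      # 25 ms frames, 12.5 ms shift
```

With a 25 ms frame at 8000 Hz the frame length is 200 samples and the shift 100 samples, so a 1 s signal yields 79 frames, each sharing its second half with the next frame's first half.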
In the above classification method based on audio characteristic signals, the feature parameter extraction of step (3) extracts Linear Prediction Coefficients (LPC), Linear Prediction Cepstrum Coefficients (LPCC), and Mel Frequency Cepstrum Coefficients (MFCC) from the framed audio signals, and stores them in 3 separate tables.
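As one concrete instance of step (3), LPC coefficients for a frame can be computed from its autocorrelation with the Levinson-Durbin recursion. This is a standard method, not the patent's specific implementation; the AR(1) demo signal and all names here are illustrative.

```python
import numpy as np

def lpc(frame, order=10):
    """Linear prediction coefficients via the Levinson-Durbin recursion.

    Returns (a, err): coefficients a[1..order] of the predictor
    x(n) ~ sum_k a_k x(n-k), and the final prediction error power.
    """
    n = len(frame)
    # Autocorrelation r[0..order]
    r = np.array([frame[: n - k] @ frame[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)          # a[0] unused; a[1..i] are current coefficients
    e = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / e                  # reflection coefficient
        a_prev = a.copy()
        a[i] = k
        for j in range(1, i):
            a[j] = a_prev[j] - k * a_prev[i - j]
        e *= (1.0 - k * k)
    return a[1:], e

# Demo: an AR(1) process x(n) = 0.9 x(n-1) + noise; a first-order LPC fit
# should recover a coefficient close to 0.9.
rng = np.random.default_rng(0)
x = np.zeros(4096)
for i in range(1, 4096):
    x[i] = 0.9 * x[i - 1] + rng.standard_normal()
a, err = lpc(x, order=1)
```

LPCC and MFCC extraction would follow the same per-frame pattern, with a cepstral recursion on the LPC output and a mel filterbank plus DCT respectively.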
In the classification method based on audio characteristic signals, the feature parameter dimension reduction of step (4) obtains the optimal projection direction of the feature vector using the Fisher criterion algorithm, and judges the contribution of each feature component to recognition by adding or removing feature components; combining the two gives a better dimension-reduction result. The larger the Fisher ratio, the more important the dimension component. The Fisher linear discriminant criterion is

  r_Fisher = σ²_between / σ²_within

where r_Fisher is the Fisher ratio (Fisher criterion) of a feature component; σ²_between is the between-class variance of the feature component, i.e. the variance of the mean values of different speech feature components; and σ²_within is the within-class variance of the feature component, i.e. the mean of the variances of the same speech feature components.
For the ρ-th dimension,

  σ²_between(ρ) = (1/γ) Σ_{ε=1}^{γ} ( m_ρ^(ε) − m̄_ρ )²

  σ²_within(ρ) = (1/γ) Σ_{ε=1}^{γ} (1/κ_ε) Σ_{x∈ω_ε} ( x_ρ^(ε) − m_ρ^(ε) )²

where ρ denotes the dimension of the characteristic parameter; m̄_ρ is the mean of the ρ-th dimension component of the speech features over all classes; m_ρ^(ε) is the mean of the ρ-th dimension component over the ε-th class; ω_ε is the speech feature sequence of the ε-th class; γ and κ_ε are the number of classes and the number of samples in each class, respectively; and x_ρ^(ε) is the ρ-th dimension component of an ε-th-class speech feature sequence.
The between-class variance of a feature component reflects the degree of difference between different speech samples, while the within-class variance reflects how tightly samples of the same class cluster; together they characterize the separability of the feature component. The larger the Fisher ratio, the more suitable that dimension's characteristic parameter is as feature information for speech recognition, so the dimension components with the largest Fisher ratios are selected as the dimension-reduction result, achieving the purpose of dimensionality reduction.
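The per-dimension Fisher-ratio selection described above can be sketched as follows. The helper names and toy data are assumptions for illustration; the patent itself only specifies the ratio and the keep-largest rule.

```python
import numpy as np

def fisher_ratios(X, labels):
    """Per-dimension Fisher ratio: variance of the class means (between-class)
    divided by the mean of the per-class variances (within-class)."""
    classes = np.unique(labels)
    class_means = np.array([X[labels == c].mean(axis=0) for c in classes])
    between = class_means.var(axis=0)
    within = np.array([X[labels == c].var(axis=0) for c in classes]).mean(axis=0)
    return between / within

def reduce_dims(X, labels, keep):
    """Keep the `keep` dimensions with the largest Fisher ratio."""
    order = np.argsort(fisher_ratios(X, labels))[::-1]
    return X[:, order[:keep]], order[:keep]

# Toy data: dimension 0 separates the classes, dimension 1 is pure noise.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))
y = np.repeat([0, 1], 100)
X[y == 1, 0] += 5.0                  # large between-class gap in dimension 0
Xr, kept = reduce_dims(X, y, keep=1)
```

On this toy set the discriminative dimension is kept and the noise dimension discarded, which is exactly the behavior the criterion is meant to produce.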
In the above classification method based on the audio characteristic signal, the building of the characteristic quantity classification model in the step (5) includes the following steps:
(1) First consider a binary classification problem: the two types of dimension-reduced audio characteristic signals are labelled y = 1 and y = 0 respectively, and x denotes the dimension-reduced audio characteristic signal. The model introduces an implicit function f(x) and a response function δ(f(x)), where f(x) follows a Gaussian distribution and the response function compresses the result of f(x) into the interval [0, 1]. The likelihood of the data can then be written through the response function as π(x) = p(y = 1 | x) = δ(f(x)) and p(y = 0 | f) = 1 − δ(f), with

  δ(z) = Φ(z) = ∫_{−∞}^{z} N(u; 0, 1) du.
(2) Since f is an implicit function following a Gaussian distribution, its covariance is taken as the Gaussian squared-exponential kernel

  k(x, x') = σ_f² exp( −‖x − x'‖² / (2l²) )

where σ_f² is the coefficient parameter of the squared-exponential kernel and l represents the distance influence-factor parameter between the two points x and x'; the kernel function has only the two hyperparameters θ = (σ_f, l). For a given test point x* and the training inputs x, the joint distribution of the implicit function values is

  [f; f*] ~ N( 0, [[K, K*ᵀ], [K*, K**]] )
wherein K is a covariance matrix expressed as

  K = [ k(x_i, x_j) ],  i, j = 1, …, n

  K* = [ k(x*, x₁)  k(x*, x₂) … k(x*, x_n) ],  K** = k(x*, x*)   (4)
The conditional distribution of the implicit function is f* | f ~ N( K* K⁻¹ f, K** − K* K⁻¹ K*ᵀ ), and the predictive conditional probability of the implicit function is

  p(f* | x, y, x*) = ∫ p(f* | f) p(f | x, y) df.
Here the conditional distribution of the implicit function has the same form as its predictive probability distribution, but the expressions are not identical, so the mean of the predictive conditional output of the implicit function is taken as

  f̄* = K* K⁻¹ f̂.

The corresponding covariance matrix K' will likewise differ; its meaning is explained in step (2) below. Compressing the implicit function into the interval [0, 1] yields the probability of class membership; defining δ* = δ(f*) = Φ(f*), we have
  δ̄* = ∫ δ(f*) p(f* | x, y, x*) df*   (5)
The compressed value is given by Rasmussen and Williams (2006) as

  δ̄* = Φ( f̄* / √(1 + Var(f*)) ).
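The kernel and the compression step above can be sketched directly. Function names are assumptions; the formulas are exactly the squared-exponential kernel and the cumulative-Gaussian compression just stated (the closed form Φ(μ/√(1+σ²)) is exact when the response function is the Gaussian CDF).

```python
from math import erf

import numpy as np

def se_kernel(A, B, sigma_f=1.0, ell=1.0):
    """Squared-exponential kernel k(x, x') = sigma_f^2 exp(-|x - x'|^2 / (2 l^2))."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma_f**2 * np.exp(-d2 / (2.0 * ell**2))

def probit_compress(mean, var):
    """Compress a Gaussian predictive N(mean, var) over f* into a class
    probability: integral of Phi(f*) against the Gaussian, which equals
    Phi(mean / sqrt(1 + var))."""
    z = mean / np.sqrt(1.0 + var)
    return 0.5 * (1.0 + erf(z / np.sqrt(2.0)))

X = np.array([[0.0], [1.0], [3.0]])
K = se_kernel(X, X)                  # 3x3 covariance matrix of the implicit function
p_mid = probit_compress(0.0, 2.3)    # zero mean -> probability 0.5 regardless of var
p_pos = probit_compress(3.0, 0.1)    # confidently positive mean -> probability near 1
```

Note how the predictive variance tempers the probability: a large `var` pulls the compressed value toward 0.5, which is the behavior the text relies on for classification by probability size.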
(2) For the likelihood, a functional analysis is needed first. According to the Bayes formula, the posterior distribution of the implicit function is

  p(f | x, y) = p(y | f) p(f | x) / p(y | x).

To maximize the posterior probability of the implicit function, i.e. to solve the maximum likelihood, optimization algorithms such as the simplex method are used, through the update

  f^{new} = ( K⁻¹ + W )⁻¹ ( W f + ∇ log p(y | f) ).
Substituting this into (5) and iterating a certain number of times yields the optimal solution f̂ of f, which satisfies

  f̂ = K ∇ log p(y | f̂).
Since p(y | f) is not a Gaussian distribution, the posterior distribution p(f | x, y) of the implicit function has no analytic form. A Laplace approximation is therefore used: the posterior p(f | x, y) is approximated by a Gaussian q(f | x, y), obtained from a second-order Taylor expansion of log p(f | x, y) around the maximum of the posterior:

  q(f | x, y) = N( f | f̂, (K⁻¹ + W)⁻¹ )

thereby obtaining K' = K + W⁻¹ and the predictive variance

  Var(f*) = K** − K* (K + W⁻¹)⁻¹ K*ᵀ

where W is the Hessian matrix of −log p(y | f).
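The Newton iteration for the posterior mode f̂ can be sketched as below. As an assumption for a compact demo, the sketch uses a logistic likelihood (so W = π(1 − π) on the diagonal) rather than the patent's cumulative-Gaussian response; the update formula is the one stated above, rewritten with the identity (K⁻¹ + W)⁻¹ = K (I + W K)⁻¹ to avoid inverting K.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def laplace_mode(K, y, n_iter=50):
    """Posterior mode f_hat of p(f | x, y) for binary labels y in {0, 1}
    under a logistic likelihood, via the Newton update
    f_new = (K^-1 + W)^-1 (W f + grad log p(y | f))."""
    n = len(y)
    f = np.zeros(n)
    for _ in range(n_iter):
        pi = sigmoid(f)
        grad = y - pi                     # gradient of log p(y | f)
        W = pi * (1.0 - pi)               # diagonal of -Hessian of log p(y | f)
        # f_new = K (I + W K)^-1 (W f + grad), equivalent to the update above
        B = np.eye(n) + W[:, None] * K
        f = K @ np.linalg.solve(B, W * f + grad)
    return f

# Demo: two well-separated clusters on the line, SE kernel with sigma_f = l = 1.
X = np.array([-2.0, -1.8, 2.0, 2.2])[:, None]
y = np.array([0, 0, 1, 1], float)
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-d2 / 2.0) + 1e-8 * np.eye(4)   # jitter for numerical stability
f_hat = laplace_mode(K, y)
```

At the mode, f̂ is negative on the class-0 points and positive on the class-1 points, so compressing it through the response function yields probabilities on the correct side of 0.5.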
(3) The implicit function assumed by the invention uses the squared-exponential kernel, so the hyperparameters σ_f and l are solved from

  p(y | x, θ) = ∫ p(y | f) p(f | x) df.   (9)

To obtain the optimal hyperparameters, i.e. to maximize this conditional probability as far as possible, a second-order Taylor expansion of log p(y | x, θ) is performed at the local maximum point, followed by probability normalization and finally the Laplace approximation, i.e. the log-likelihood function expansion (for a detailed derivation see Gaussian Processes for Machine Learning, Rasmussen and Williams, 2006):

  log q(y | x, θ) = −½ f̂ᵀ K⁻¹ f̂ + log p(y | f̂) − ½ log | I_n + W^{1/2} K W^{1/2} |.

Substituting the solved parameters into the log-likelihood function and optimizing with the simplex method yields the optimal hyperparameters; substituting these back into the classification model, the audio feature quantities are classified using the classification expressions (5) and (6).
In the above classification method based on audio characteristic signals, the feature parameter classification of step (6) sends the dimension-reduced audio characteristic signals into the established classification model for classification, and finally displays the classification result through data visualization.
Compared with existing methods based on audio features, the invention has the following advantages:
(1) The invention connects the input to the function output through an implicit function f that follows a Gaussian distribution, normalizes the function value of f into the range [0, 1] with a compression function, and classifies more intuitively by the size of the resulting probability.
(2) The maximum likelihood is solved through Bayesian prior probability, and the classification algorithm is improved by the further introduction of the kernel function, so that binary classification is easily extended to high-dimensional multi-class classification.
(3) The invention addresses the classification of dimension-reduced audio characteristic signals; it classifies the reduced data with a simple principle that is easy to program, and it is robust for practical audio-recognition artificial intelligence and broadcast audio monitoring.
Drawings
FIG. 1 is a flowchart of the overall classification process of the present invention.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
As shown in fig. 1, a classification method based on audio characteristic signals includes the following specific steps:
(1) audio signal acquisition: and collecting audio signals to obtain audio samples.
(2) Audio signal preprocessing: and converting the analog signals in the collected audio samples into digital signals, and writing the digital signals into the WAV file. And filtering and framing the digital signal to be written into the WAV file.
(3) Characteristic parameter extraction: extract, in software, the high-dimensional characteristic parameters of the preprocessed audio signals: Linear Prediction Coefficients (LPC), Linear Prediction Cepstrum Coefficients (LPCC), and Mel Frequency Cepstrum Coefficients (MFCC).
(4) Reducing the dimension of the characteristic parameter: and sending the extracted audio characteristic parameters into a built dimension reduction model for dimension reduction treatment, and storing the dimension-reduced audio characteristic parameters into a table.
(5) Building a classification model: first, describe the similarity of one class to the other using an implicit function f that follows a Gaussian distribution; second, compress the output value of f into the range [0, 1] using a compression function. The compressed value is the similarity between the two classes, and classes are distinguished according to this similarity; the model thus built is the required classification model.
(6) Audio feature parameter classification: and (5) sending the audio characteristic quantity subjected to the dimension reduction in the step (4) into the built classification model in the step (5) for classification, and carrying out data visualization display on a classification result.
The audio acquisition collects an audio sample through an audio acquisition device; when acquiring the audio signal, the device sets the sampling frequency (satisfying the Nyquist sampling theorem), the number of sampling channels, and the quantization precision.
The signal preprocessing comprises the following steps:
(1) Filter the collected audio signal x(n) using a rectangular window function w(n) (upper cutoff frequency generally f_H = 3400 Hz, lower cutoff frequency f_L = 60–100 Hz) to obtain the signal y_a(n), where

  w(n) = 1 for 0 ≤ n ≤ N − 1, w(n) = 0 otherwise, and y_a(n) = x(n)·w(n).
(2) The filtered audio signal y_a(n) is divided into a number of audio signal segments; one segment is called a frame, and each frame spans 10–30 ms. Adjacent frames partially overlap; the overlapping part is called the frame shift, which is taken as 1/2 or 1/3 of the frame length.
The characteristic parameter extraction extracts the Linear Prediction Coefficients (LPC), Linear Prediction Cepstrum Coefficients (LPCC), and Mel Frequency Cepstrum Coefficients (MFCC) of the framed audio signal, and places them into 3 separate tables.
The feature parameter dimension reduction obtains the optimal projection direction of the feature vectors with the Fisher criterion algorithm, judges the contribution of each feature component to recognition by adding or removing feature components, and combines the two to reduce the dimensionality of the audio characteristic signals, obtaining a better dimension-reduction result; the larger the Fisher ratio, the more important the dimension component. The Fisher linear discriminant criterion is as follows:
  r_Fisher = σ²_between / σ²_within

where r_Fisher is the Fisher ratio (Fisher criterion) of a feature component; σ²_between is the between-class variance of the feature component, i.e. the variance of the mean values of different speech feature components; and σ²_within is the within-class variance, i.e. the mean of the variances of the same speech feature components. For the ρ-th dimension,

  σ²_between(ρ) = (1/γ) Σ_{ε=1}^{γ} ( m_ρ^(ε) − m̄_ρ )²

  σ²_within(ρ) = (1/γ) Σ_{ε=1}^{γ} (1/κ_ε) Σ_{x∈ω_ε} ( x_ρ^(ε) − m_ρ^(ε) )²

where ρ denotes the dimension of the characteristic parameter; m̄_ρ is the mean of the ρ-th dimension component of the speech features over all classes; m_ρ^(ε) is the mean of the ρ-th dimension component over the ε-th class; ω_ε is the ε-th-class speech feature sequence; γ and κ_ε are the number of classes and the number of samples per class, respectively; and x_ρ^(ε) is the ρ-th dimension component of an ε-th-class speech feature sequence.
The larger the Fisher ratio, the more suitable that dimension's characteristic parameter is as feature information for speech recognition; the dimension components with the largest Fisher ratios are selected as the dimension-reduction result, achieving the purpose of dimensionality reduction.
The construction of the characteristic quantity classification model comprises the following steps:
(1) The two types of dimension-reduced audio characteristic signals are labelled y = 1 and y = 0 respectively, and x denotes the dimension-reduced audio characteristic signal. The model introduces an implicit function f(x) and a response function δ(f(x)), where f(x) follows a Gaussian distribution and the response function compresses the result of f(x) into the interval [0, 1]. The likelihood of the data can be written as π(x) = p(y = 1 | x) = δ(f(x)) and p(y = 0 | f) = 1 − δ(f), with the response function

  δ(z) = Φ(z) = ∫_{−∞}^{z} N(u; 0, 1) du.
Since f is an implicit function following a Gaussian distribution, its covariance is taken as the Gaussian squared-exponential kernel

  k(x, x') = σ_f² exp( −‖x − x'‖² / (2l²) )

where σ_f² is the coefficient parameter of the squared-exponential kernel and l represents the distance influence parameter between the two points x and x'; the kernel has only the two hyperparameters θ = (σ_f, l). For a given test point x* and the training inputs x, the joint distribution of the implicit function values is

  [f; f*] ~ N( 0, [[K, K*ᵀ], [K*, K**]] )
wherein K is a covariance matrix expressed as

  K = [ k(x_i, x_j) ],  i, j = 1, …, n

  K* = [ k(x*, x₁)  k(x*, x₂) … k(x*, x_n) ],  K** = k(x*, x*)   (8)
The conditional distribution of the implicit function is

  f* | f ~ N( K* K⁻¹ f, K** − K* K⁻¹ K*ᵀ )   (9)

and the predictive conditional probability of the implicit function is

  p(f* | x, y, x*) = ∫ p(f* | f) p(f | x, y) df.   (10)
The mean of the predictive conditional output of the implicit function is defined as

  f̄* = K* K⁻¹ f̂

and the covariance matrix is defined as K', whose explanation is included in step (2) below. The output value of the implicit function is compressed into the interval [0, 1] through the compression function, yielding the probability of class membership; defining δ* = δ(f*) = Φ(f*), we have

  δ̄* = ∫ δ(f*) p(f* | x, y, x*) df*   (11)

whose compressed value is given by Rasmussen and Williams (2006) as

  δ̄* = Φ( f̄* / √(1 + Var(f*)) ).   (12)
(2) Analyzing the likelihood function according to the Bayes formula, the posterior distribution of the implicit function is

  p(f | x, y) = p(y | f) p(f | x) / p(y | x)   (13)

with normalizing constant

  p(y | x) = ∫ p(y | f) p(f | x) df.   (14)

The maximum of the posterior probability is found with optimization algorithms such as the simplex method, through the update

  f^{new} = ( K⁻¹ + W )⁻¹ ( W f + ∇ log p(y | f) ).   (15)
Substituting this into (10) and iterating a certain number of times yields the optimal solution f̂ of f, which satisfies

  f̂ = K ∇ log p(y | f̂).   (16)
Because p(y | f) is not a Gaussian distribution, the posterior distribution p(f | x, y) of the implicit function has no analytic form; a Laplace approximation is applied, approximating the posterior p(f | x, y) with a Gaussian q(f | x, y) obtained from a second-order Taylor expansion of log p(f | x, y) at the maximum of the posterior distribution:

  q(f | x, y) = N( f | f̂, (K⁻¹ + W)⁻¹ )   (17)

thereby obtaining

  K' = K + W⁻¹,  Var(f*) = K** − K* (K + W⁻¹)⁻¹ K*ᵀ

where W is the Hessian matrix of −log p(y | f). The above constitutes the complete solution process for K'.
(3) The premise for the classification algorithm to proceed smoothly is the solution of the covariance matrix, so the relevant parameters in the implicit function become the key of the problem. The implicit function assumed by the invention uses the squared-exponential kernel, so the hyperparameters σ_f and l are solved from

  p(y | x, θ) = ∫ p(y | f) p(f | x) df   (18)
A second-order Taylor expansion of log p(y | x, θ) at its local maximum point, followed by probability normalization and finally the Laplace approximation, gives the log-likelihood function expansion

  log q(y | x, θ) = −½ f̂ᵀ K⁻¹ f̂ + log p(y | f̂) − ½ log | I_n + W^{1/2} K W^{1/2} |.   (19)
Substituting the relevant formulas (15) to (17) into the log-likelihood function, the optimal hyperparameters are solved by simplex optimization (the simplex function of the programming software can be called directly); back-substituting the parameters of the relevant formulas yields the classification model, and the data are classified using (11) and (12).
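The hyperparameter search described above (simplex optimization of the Laplace-approximate log marginal likelihood) can be sketched with `scipy.optimize.minimize(method="Nelder-Mead")`, SciPy's built-in simplex routine. Everything here is an illustrative assumption: a logistic likelihood instead of the cumulative-Gaussian response, a tiny 1-D data set, and log-space parameterization to keep σ_f and l positive.

```python
import numpy as np
from scipy.optimize import minimize

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_kernel(X, sigma_f, ell):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return sigma_f**2 * np.exp(-d2 / (2.0 * ell**2))

def neg_log_marginal(params, X, y, n_iter=30):
    """Negative Laplace-approximate log marginal likelihood:
    -log q(y|x,theta) = 1/2 f^T K^-1 f - log p(y|f) + 1/2 log|I + W^1/2 K W^1/2|,
    evaluated at the posterior mode f found by Newton iteration."""
    sigma_f, ell = np.exp(params)            # optimize in log space for positivity
    n = len(y)
    K = se_kernel(X, sigma_f, ell) + 1e-8 * np.eye(n)
    f = np.zeros(n)
    for _ in range(n_iter):                  # Newton iteration for the mode
        pi = sigmoid(f)
        W = pi * (1.0 - pi)
        B = np.eye(n) + W[:, None] * K
        f = K @ np.linalg.solve(B, W * f + (y - pi))
    pi = np.clip(sigmoid(f), 1e-12, 1 - 1e-12)
    W = pi * (1.0 - pi)
    sW = np.sqrt(W)
    Bm = np.eye(n) + sW[:, None] * K * sW[None, :]
    log_lik = np.sum(y * np.log(pi) + (1 - y) * np.log(1 - pi))
    fit = 0.5 * f @ np.linalg.solve(K, f)
    _, logdet = np.linalg.slogdet(Bm)
    return fit - log_lik + 0.5 * logdet

X = np.array([-2.0, -1.5, -1.0, 1.0, 1.5, 2.0])[:, None]
y = np.array([0, 0, 0, 1, 1, 1], float)
res = minimize(neg_log_marginal, x0=np.log([1.0, 1.0]), args=(X, y),
               method="Nelder-Mead")
sigma_f_opt, ell_opt = np.exp(res.x)
```

The recovered (σ_f, l) would then be fixed in the kernel, the mode and predictive variance recomputed, and new points classified through the compression function.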
The audio characteristic parameter classification sends the dimension-reduced audio characteristic signals into the established classification model for classification; finally the classification result is displayed through data visualization, together with the corresponding classification accuracy. The invention describes only the binary classification algorithm; for multi-class problems a corresponding vectorized extension is carried out.
The present invention is not limited to the above-described embodiments, and can be applied to other related fields within the scope of knowledge of those skilled in the art without departing from the spirit of the present invention.

Claims (5)

1. A method of classification based on audio feature signals, characterized by: the method comprises the following specific steps:
(1) audio signal acquisition: collecting an audio signal to obtain an audio sample;
(2) audio signal preprocessing: converting analog signals in the collected audio samples into digital signals, writing the digital signals into a WAV file, and performing filtering, pre-emphasis and framing processing on the digital signals written into the WAV file;
(3) characteristic parameter extraction: extracting high-dimensional characteristic parameters including a linear prediction coefficient, a linear prediction cepstrum coefficient and a Mel frequency cepstrum coefficient from the preprocessed audio signal;
(4) reducing the dimension of the characteristic parameter: sending the extracted high-dimensional characteristic parameters into a built dimension reduction model for dimension reduction treatment and storage;
(5) building a classification model: firstly, describing the similarity of one class to the other by using an implicit function f obeying a Gaussian distribution; secondly, compressing the output value of f into the range [0, 1] by using a compression function, and distinguishing the classes according to the size of the compressed value; the model thus built is the required classification model;
the construction of the classification model comprises the following steps:
(1) the two types of audio characteristic signals after dimensionality-reduction processing are labelled y = 1 and y = 0 respectively, and x is defined as the dimension-reduced audio characteristic signal; an implicit function f(x) and a response function δ(f(x)) are introduced into the classification model, wherein the implicit function f(x) obeys a Gaussian distribution and the response function compresses the result of f(x) into the interval [0,1]; the likelihood function of the data is π(x) = p(y = 1|x) = δ(f(x)), p(y = 0|f) = 1 − δ(f), and the response function is the cumulative Gaussian:

δ(z) = Φ(z) = ∫_{−∞}^{z} N(t | 0, 1) dt
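The role of the response function can be illustrated with a short sketch. The probit and logistic forms below are the two standard choices for compressing a real-valued latent output into [0,1]; the claim gives the function only as an image, so the exact form is an assumption:

```python
import numpy as np
from math import erf, sqrt

def probit(z):
    """Cumulative standard normal Phi(z): maps any real z into (0, 1)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def logistic(z):
    """Logistic sigmoid: an alternative response function with the same role."""
    return 1.0 / (1.0 + np.exp(-z))

# Both squash large-magnitude latent values toward 0 or 1, so the
# compressed value can be read as a class-membership probability.
```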
the implicit function is assumed to use a Gaussian squared exponential kernel function, with expression:

k(x, x') = σ_f² exp(−‖x − x'‖² / (2l²))
wherein σ_f² is the variance coefficient of the squared exponential kernel, l is a parameter governing the influence of the distance between the two points x and x', and the two hyperparameters of the kernel function are θ = (σ_f, l); given a test point x* and the training inputs x, the joint distribution of the implicit function values is:

[f, f*]ᵀ ~ N(0, [[K, K*ᵀ], [K*, K**]])
wherein K is the covariance matrix, with expression:

K = [k(x_i, x_j)], i, j = 1, …, n (an n × n matrix)
K* = [k(x*, x1)  k(x*, x2)  …  k(x*, xn)],   K** = k(x*, x*)
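The covariance quantities K, K* and K** can be assembled as below; the squared exponential form is taken from the claim, while the toy inputs and the hyperparameter values are hypothetical:

```python
import numpy as np

def se_kernel(xa, xb, sigma_f=1.0, length=1.0):
    """Squared exponential kernel k(x, x') = sigma_f^2 * exp(-|x - x'|^2 / (2 l^2))."""
    d2 = np.sum((xa[:, None, :] - xb[None, :, :]) ** 2, axis=-1)
    return sigma_f ** 2 * np.exp(-d2 / (2.0 * length ** 2))

# n training inputs (here n = 3, one-dimensional) and a single test point x*
x = np.array([[0.0], [1.0], [2.0]])
x_star = np.array([[0.5]])

K = se_kernel(x, x)                      # n x n covariance matrix of the training inputs
K_star = se_kernel(x_star, x)            # [k(x*, x1) ... k(x*, xn)]
K_star_star = se_kernel(x_star, x_star)  # k(x*, x*)
```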
the conditional distribution of the implicit function is:

f*|f ~ N(K* K⁻¹ f, K** − K* K⁻¹ K*ᵀ)
predictive conditional probability of the implicit function:

p(f*|x, y, x*) = ∫ p(f*|f, x, x*) p(f|x, y) df
the mean value of the conditional probability output of the implicit function prediction is defined as f̄* = E[f*|x, y, x*];
The covariance matrix is defined as K'. The compression function compresses the output value of the implicit function into the interval [0,1], yielding the probability of class membership, and δ* = δ(f*) = φ(f*) is defined, namely:

δ* = ∫ δ(f*) p(f*|f) df*
its compressed value is

δ̄* = φ( f̄* / √(1 + Var(f*)) )
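When the response function is the cumulative Gaussian, the compression integral over the predictive distribution has a well-known closed form; the sketch below assumes that probit form (the patent's image formula is not recoverable verbatim):

```python
from math import erf, sqrt

def averaged_probit(mean, var):
    """Closed form of  integral Phi(f*) N(f* | mean, var) df*
    for a probit response: Phi(mean / sqrt(1 + var))."""
    z = mean / sqrt(1.0 + var)
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))
```

A larger predictive variance pulls the compressed probability toward 0.5, which is the desired behaviour: uncertain latent predictions yield less confident class probabilities.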
(2) analyzing the likelihood function according to the Bayes formula yields the posterior distribution of the implicit function:

p(f|x, y) = p(y|f) p(f|x) / p(y|x)

p(y|x) = ∫ p(y|f) p(f|x) df
Using a simplex optimization algorithm one can obtain:

f̂ = argmax_f p(f|x, y)
the optimal solution of f is obtained by iterating with the predictive conditional-probability formula of the implicit function:

f^(new) = (K⁻¹ + W)⁻¹ (W f + ∇ log p(y|f))
The Gaussian distribution q(f|x, y) is used to approximate the posterior distribution p(f|x, y); performing a second-order Taylor expansion of log p(f|x, y) at the maximum of the posterior distribution gives the Gaussian distribution:

q(f|x, y) = N(f̂, (K⁻¹ + W)⁻¹)
thereby obtaining

K' = K + W⁻¹

Var(f*) = K** − K* (K + W⁻¹)⁻¹ K*ᵀ
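The mode-finding iteration and the W matrix can be sketched as follows; a logistic likelihood is used as a stand-in for the response function, and the toy data are hypothetical:

```python
import numpy as np

def laplace_mode(K, y, n_iter=25):
    """Newton iteration for the posterior mode f_hat of a GP binary classifier
    with logistic likelihood and labels y in {0, 1}.
    Each step evaluates f_new = (K^-1 + W)^-1 (W f + grad) without inverting K."""
    n = len(y)
    f = np.zeros(n)
    for _ in range(n_iter):
        pi = 1.0 / (1.0 + np.exp(-f))   # sigma(f), the response function
        W = np.diag(pi * (1.0 - pi))    # Hessian of -log p(y|f)
        grad = y - pi                   # gradient of log p(y|f)
        f = K @ np.linalg.solve(np.eye(n) + W @ K, W @ f + grad)
    return f, W

# hypothetical toy data: two well-separated 1-D classes
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)  # SE kernel, sigma_f = l = 1
f_hat, W_hat = laplace_mode(K, y)
```

At the converged mode, the squashed latent values should agree with the training labels on such well-separated data.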
wherein W is the Hessian matrix of −log p(y|f); the above formula is substituted into the log-likelihood function, and the optimal hyperparameters are solved using the simplex optimization algorithm;
(3) the implicit function assumed by the invention uses a squared exponential kernel function, and the conditional probability of the classification-model output result is:
p(y|x,θ)=∫p(y|f)p(f|x)df
a second-order Taylor expansion is performed at the local maximum point of log p(y|x, θ), probability normalization is applied, and finally the Laplace approximation gives:

log p(y|x, θ) ≈ log p(y|f̂) − ½ f̂ᵀ K⁻¹ f̂ − ½ log|I + KW|
the solved hyperparameters are substituted back to obtain the required classification model;
(6) audio feature parameter classification: the characteristic parameters of the audio signal after the dimension reduction in step (4) are sent into the classification model of step (5) for classification, and the classification results are visually displayed.
2. The audio feature signal-based classification method according to claim 1, characterized by: the audio signal is collected by an audio acquisition device, and the audio acquisition device needs to set the sampling frequency, the number of sampling channels and the quantization precision when collecting the audio signal.
3. The audio feature signal-based classification method according to claim 1, characterized by: the signal preprocessing comprises the following steps:
(1) filtering the collected audio signal x(n) with a rectangular window function w(n) to obtain the signal y_a(n), wherein

w(n) = 1 for 0 ≤ n ≤ N − 1, and w(n) = 0 otherwise;
(2) the filtered signal y_a(n) is pre-emphasized and divided into a number of audio frame signals, with partial overlap between adjacent frames.
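The pre-emphasis and overlapping-frame steps can be sketched as follows (the filter coefficient 0.97 and the frame sizes are common choices, not values fixed by the claim):

```python
import numpy as np

def preemphasis(signal, alpha=0.97):
    """Pre-emphasis y[n] = x[n] - alpha * x[n-1], boosting high frequencies."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_signal(signal, frame_len=256, hop=128):
    """Split the signal into frames of frame_len samples; with hop < frame_len
    adjacent frames partially overlap (here by 50%)."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.stack([signal[i * hop: i * hop + frame_len] for i in range(n_frames)])
```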
4. The audio feature signal-based classification method according to claim 1, characterized by: the characteristic parameter extraction is to extract the characteristic parameters of a linear prediction coefficient, a linear prediction cepstrum coefficient and a Mel frequency cepstrum coefficient of the audio signal after the framing, and respectively store the characteristic parameters into 3 tables.
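Of the three feature families named in claim 4, the linear prediction coefficients are the simplest to sketch; the autocorrelation method with the Levinson-Durbin recursion below is one standard way to obtain them (the order of 10 is a typical, not claimed, value):

```python
import numpy as np

def lpc(frame, order=10):
    """Linear prediction coefficients a[0..order] (with a[0] = 1) via the
    autocorrelation method and the Levinson-Durbin recursion."""
    r = np.array([frame[: len(frame) - k] @ frame[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + a[1:i] @ r[1:i][::-1]  # prediction residual correlation
        k = -acc / err                      # reflection coefficient
        a[1:i + 1] += k * a[:i][::-1]       # coefficient update; a[i] becomes k
        err *= (1.0 - k * k)                # updated prediction error
    return a
```

For a first-order autoregressive signal x[n] ≈ 0.9 x[n−1], the recursion should recover a first coefficient close to −0.9.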
5. The audio feature signal-based classification method according to claim 1, characterized by: the specific steps of the feature parameter dimension reduction are as follows: the audio characteristic signal is subjected to dimensionality-reduction processing using the Fisher criterion, and the dimensional components with large Fisher ratios are selected as x, the dimensionality-reduction result, achieving the purpose of dimension reduction; the Fisher linear discrimination criterion is

r_Fisher = σ²_between / σ²_within

wherein r_Fisher is the Fisher ratio, or Fisher criterion, of the feature component; σ²_between represents the inter-class variance of the feature component, namely the variance of the means of the different speech feature classes; σ²_within represents the intra-class variance of the feature component, namely the mean of the variances within each class of the same speech feature component;
σ²_between(ρ) = (1/γ) Σ_{ε=1..γ} ( m_ε^ρ − m^ρ )²

σ²_within(ρ) = (1/γ) Σ_{ε=1..γ} (1/κ_ε) Σ_{x∈Ω_ε} ( x^ρ − m_ε^ρ )²
where ρ represents the dimension of the characteristic parameter;
m^ρ = (1/γ) Σ_{ε=1..γ} m_ε^ρ represents the mean of the ρ-th dimension component of the speech feature over all classes;
m_ε^ρ = (1/κ_ε) Σ_{x∈Ω_ε} x^ρ represents the mean of the ρ-th dimension component of the speech feature over the ε-th class; Ω_ε represents the speech feature sequence of the ε-th class; γ and κ_ε represent the number of classes and the number of samples in each class of the speech feature sequence, respectively;
x_ε^ρ represents the ρ-th dimension component of the ε-th class of the speech feature sequence.
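The per-dimension Fisher ratio of claim 5 can be computed as below; the tiny data set is hypothetical, with the first dimension discriminative and the second mostly noise:

```python
import numpy as np

def fisher_ratios(X, labels):
    """Per-dimension Fisher ratio: variance of the class means (between-class)
    divided by the mean of the per-class variances (within-class)."""
    classes = np.unique(labels)
    class_means = np.stack([X[labels == c].mean(axis=0) for c in classes])
    between = ((class_means - class_means.mean(axis=0)) ** 2).mean(axis=0)
    within = np.stack([X[labels == c].var(axis=0) for c in classes]).mean(axis=0)
    return between / within

# hypothetical features: dimension 0 separates the classes, dimension 1 does not
X = np.array([[0.0, 5.0], [0.1, 1.0], [-0.1, 3.0],
              [5.0, 4.0], [5.1, 0.0], [4.9, 2.0]])
labels = np.array([0, 0, 0, 1, 1, 1])
ratios = fisher_ratios(X, labels)
keep = np.argsort(ratios)[::-1][:1]  # keep the dimension(s) with the largest ratio
```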
CN201810994308.0A 2018-08-29 2018-08-29 Classification method based on audio characteristic signals Active CN109166591B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810994308.0A CN109166591B (en) 2018-08-29 2018-08-29 Classification method based on audio characteristic signals


Publications (2)

Publication Number Publication Date
CN109166591A CN109166591A (en) 2019-01-08
CN109166591B (en) 2022-07-19

Family

ID=64893393

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810994308.0A Active CN109166591B (en) 2018-08-29 2018-08-29 Classification method based on audio characteristic signals

Country Status (1)

Country Link
CN (1) CN109166591B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109065070B (en) * 2018-08-29 2022-07-19 昆明理工大学 Kernel function-based audio characteristic signal dimension reduction method
CN109949824B (en) * 2019-01-24 2021-08-03 江南大学 City sound event classification method based on N-DenseNet and high-dimensional mfcc characteristics
CN110931044A (en) * 2019-12-12 2020-03-27 上海立可芯半导体科技有限公司 Radio frequency searching method, channel classification method and electronic equipment
CN110956965A (en) * 2019-12-12 2020-04-03 电子科技大学 Personalized intelligent home safety control system and method based on voiceprint recognition
CN113223511B (en) * 2020-01-21 2024-04-16 珠海市煊扬科技有限公司 Audio processing device for speech recognition
CN117275519B (en) * 2023-11-22 2024-02-13 珠海高凌信息科技股份有限公司 Voice type identification correction method, system, device and medium

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7031530B2 (en) * 2001-11-27 2006-04-18 Lockheed Martin Corporation Compound classifier for pattern recognition applications
US9111547B2 (en) * 2012-08-22 2015-08-18 Kodak Alaris Inc. Audio signal semantic concept classification method
CN103151039A (en) * 2013-02-07 2013-06-12 中国科学院自动化研究所 Speaker age identification method based on SVM (Support Vector Machine)
CN103854645B (en) * 2014-03-05 2016-08-24 东南大学 A kind of based on speaker's punishment independent of speaker's speech-emotion recognition method
US10141009B2 (en) * 2016-06-28 2018-11-27 Pindrop Security, Inc. System and method for cluster-based audio event detection
CN107871498A (en) * 2017-10-10 2018-04-03 昆明理工大学 It is a kind of based on Fisher criterions to improve the composite character combinational algorithm of phonetic recognization rate
CN108109612A (en) * 2017-12-07 2018-06-01 苏州大学 Voice recognition classification method based on self-adaptive dimension reduction



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant