CN116665698A - Pulse sound identification method based on Hilbert-Huang transform and Mel spectrum transform


Publication number: CN116665698A (application CN202310406504.2A)
Authority: CN (China)
Prior art keywords: spectrum, pulse, transform, Hilbert, marginal
Legal status: Pending (status assumed by Google; no legal analysis performed)
Application number: CN202310406504.2A
Other languages: Chinese (zh)
Inventors: 孙裕超, 孟东
Assignees (current and original, as listed): Television Electroacoustic Research Institute Third Research Institute Of China Electronics Technology Corp; Tianjin University
Application filed by Television Electroacoustic Research Institute Third Research Institute Of China Electronics Technology Corp and Tianjin University, with priority to CN202310406504.2A; published as CN116665698A (pending).

Classifications

    • G10L25/03 — Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L15/063 — Speech recognition: training, creation of reference templates
    • G10L17/04 — Speaker identification or verification: training, enrolment or model building
    • G10L19/02 — Analysis-synthesis coding using spectral analysis (e.g. transform or subband vocoders)
    • G10L25/18 — Extracted parameters being spectral information of each sub-band
    • Y02T90/00 — Enabling technologies with a potential or indirect contribution to GHG emissions mitigation

Abstract

The application discloses a pulse sound identification method based on Hilbert-Huang transform and Mel spectrum transform, comprising the following steps: generating MFCC coefficients and marginal spectrum coefficients from training pulse signals in advance to train an MFCC model and a marginal spectrum model, thereby constructing a trained Gaussian mixture model; for a given pulse signal, performing Mel spectrum transform processing and HHT processing respectively to generate the MFCC coefficients and marginal spectrum coefficients of the pulse signal; calculating Gaussian probability density functions of the MFCC coefficients and the marginal spectrum coefficients to generate a Gaussian mixture model of the pulse to be identified; and matching the Gaussian mixture model of the pulse to be identified against the trained Gaussian mixture model, determining the pulse signal type corresponding to the maximum probability. The embodiments of the application adopt the Hilbert-Huang transform (HHT) to extract the marginal spectrum characteristics of pulse signals so as to effectively capture the local time-domain characteristics of nonlinear, non-stationary acoustic signals and improve identification accuracy.

Description

Pulse sound identification method based on Hilbert-Huang transform and Mel spectrum transform
Technical Field
The application relates to the technical field of audio detection, in particular to a pulse sound identification method based on Hilbert-Huang transform and Mel spectrum transform.
Background
A pulse sound is an abrupt, discontinuous sound signal of short duration; common examples include gunshots, thunder, and door slams. Traditional pulse sound identification methods include sound signal identification based on wavelet transform, autocorrelation, zero-crossing-rate analysis, spectrum analysis, linear prediction analysis, and similar theories. In general, traditional methods achieve good recognition rates and can effectively distinguish target signals from interference signals; however, when the interference signal and the target signal occupy the same frequency range and overlap in spectrum, traditional methods struggle to separate them, resulting in a high false-alarm rate and unsatisfactory recognition results. In recent years, new voice recognition methods have been proposed; the most widely used extracts Mel spectral coefficient features of the sound signal and uses a Hidden Markov Model (HMM) to identify the target pulse sound signal. This approach has been widely applied in speech recognition and pattern recognition, but it cannot express the local time-domain characteristics of nonlinear, non-stationary acoustic signals well and cannot accurately identify the details of time-frequency characteristics.
The conventional method extracts the Mel spectral coefficient features of the sound signal and uses a Hidden Markov Model (HMM) to identify the target pulse sound signal. Although widely applied in speech recognition and pattern recognition, Mel spectral features cannot express the local time-domain characteristics of nonlinear, non-stationary acoustic signals well, so the details of the time-frequency characteristics cannot be identified accurately. Moreover, the HMM-based probabilistic recognition method has high computational complexity in data training, probability calculation, and related steps; different input parameters cause the recognition accuracy of the algorithm to fluctuate, and the recognition rate is low when processing non-stationary signals.
Disclosure of Invention
The embodiment of the application provides a pulse sound identification method based on Hilbert-Huang transform and Mel spectrum transform, which adopts Hilbert-Huang transform (HHT) to extract marginal spectrum characteristics of pulse signals so as to effectively identify local time domain characteristics of nonlinear non-stationary sound signals and improve identification accuracy.
The embodiment of the application provides a pulse sound identification method based on Hilbert-Huang transform and Mel spectrum transform, which comprises the following steps:
generating an MFCC coefficient and a marginal spectrum coefficient based on a training pulse signal in advance to train an MFCC model and a marginal spectrum model so as to construct a trained Gaussian mixture model;
for a given pulse signal, respectively performing Mel spectrum conversion processing and HHT conversion processing based on the pulse signal to generate MFCC coefficients and marginal spectrum coefficients of the pulse signal;
calculating a Gaussian probability density function of the MFCC coefficients and the marginal spectrum coefficients to generate a Gaussian mixture model of the pulse to be identified;
and matching based on the Gaussian mixture model of the pulse to be identified and the trained Gaussian mixture model, and determining the type of the pulse signal corresponding to the maximum probability, namely the pulse type of the given pulse signal.
Optionally, generating MFCC coefficients and marginal spectral coefficients based on the training pulse signal in advance to train the MFCC model and the marginal spectral model to construct a trained gaussian mixture model includes:
generating MFCC coefficients and marginal spectral coefficients based on training pulse signals in advance to construct a training sequence;
based on the training sequence, the GMM likelihood is expressed as:

$$P(X\mid\lambda)=\prod_{i=1}^{T} p(X_i\mid\lambda)$$

where $\lambda$ denotes the estimated model parameters and $X_i$ denotes the training sequence;

the training process satisfies:

$$\hat{\lambda}=\arg\max_{\lambda} P(X\mid\lambda)$$

new parameters $\hat{\lambda}$ are estimated with the EM algorithm so that the likelihood under the new model parameters satisfies $P(X\mid\hat{\lambda})\ge P(X\mid\lambda)$;

this is iterated until the model converges.
Optionally, performing Mel-spectrum transformation processing and HHT-transformation processing on the basis of the pulse signal, respectively, to generate MFCC coefficients and marginal spectral coefficients of the pulse signal includes:
after preprocessing the pulse signal, performing FFT conversion;
calculating spectral line energy for the data after FFT of each frame;
multiplying and adding the spectral line energy of each frame with the frequency domain response of the mel filter to determine the energy passing through the mel filter;
based on the energy passed through the mel filter, the DCT is calculated to determine MFCC coefficients and first difference spectral coefficients.
Optionally, performing Mel-spectrum transformation processing and HHT-transformation processing on the basis of the pulse signal, respectively, to generate MFCC coefficients and marginal spectral coefficients of the pulse signal includes:
performing EMD screening on the pulse signal to obtain a plurality of IMF components;
performing Hilbert transform on each IMF component;
based on the results of the Hilbert transform, the Hilbert spectrum, the Hilbert marginal spectrum, and the instantaneous energy density level are determined, satisfying:

$$h(\omega)=\int_{0}^{T} H(\omega,t)\,dt,\qquad \nabla h(\omega)=h(\omega_{t+1})-h(\omega_{t-1})$$

where $h(\omega)$ denotes the signal marginal spectrum, $\nabla h(\omega)$ its first-order differential coefficient, and $H(\omega,t)$ the Hilbert spectrum;

a marginal spectral coefficient is determined based on the signal marginal spectrum.
Optionally, calculating the gaussian probability density function of the MFCC coefficients and the marginal spectral coefficients to generate the gaussian mixture model of the pulse to be identified includes:
defining the probability density function of an M-order Gaussian mixture model to satisfy:

$$P(X\mid\lambda)=\sum_{i=1}^{M}\omega_i\, b_i(X)$$

where $X$ is a D-dimensional random vector, $b_i(X)$, $i=1,\dots,M$, are the sub-distributions, and $\omega_i$ are the mixture weights;

taking the MFCC coefficients, their first-difference spectral coefficients, the marginal spectral coefficients, and the first-difference coefficients of the marginal spectrum as the sub-distributions $b_i(X)$ to generate the Gaussian mixture model of the pulse to be identified.
Optionally, based on the gaussian mixture model of the pulse to be identified and the trained gaussian mixture model, matching is performed, and determining the type of the pulse signal corresponding to the maximum probability includes:
the maximum posterior probability based on Bayesian theory identifies the training-data category to which the pulse signal belongs, satisfying:

$$i^{*}=\arg\max_i P(X\mid\lambda_i)$$

where $i^{*}$ denotes the identified pulse-signal type and $P(X\mid\lambda_i)$ the likelihood that attains the maximum posterior probability.
The embodiment of the application also provides a pulse sound identification apparatus based on Hilbert-Huang transform and Mel spectrum transform, comprising a processor and a memory in which a computer program is stored; when executed by the processor, the computer program implements the steps of the pulse sound identification method based on Hilbert-Huang transform and Mel spectrum transform.
The embodiment of the application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the pulse sound identification method based on Hilbert-Huang transform and Mel spectrum transform.
The embodiment of the application provides a pulse sound identification method based on Hilbert-Huang transform and Mel spectrum transform, which adopts Hilbert-Huang transform (HHT) to extract marginal spectrum characteristics of pulse signals so as to effectively identify local time domain characteristics of nonlinear and non-stationary sound signals and improve identification accuracy.
The foregoing is only an overview of the technical solution of the present application. To make the technical means of the application clear enough to be implemented according to the specification, and to make the above and other objects, features, and advantages more readily apparent, the detailed description follows.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is an overall flow example of a pulse sound recognition method according to an embodiment of the present application;
FIG. 2 is an example of a flow of extracting MFCC coefficients of a pulse tone recognition method according to an embodiment of the present application;
fig. 3 is a marginal spectrum example of a pulse sound recognition method according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The embodiment of the application provides a pulse sound identification method based on Hilbert-Huang transform and Mel spectrum transform, as shown in figure 1, the identification scheme of pulse signals in the embodiment of the application is divided into two parts: a data training section and a model matching section.
First, a standard pulse signal is trained. According to the Mel spectrum theory, generating MFCC coefficients and first-order spectrum difference coefficients thereof; then, generating marginal spectral coefficients and first-order spectral difference coefficients thereof according to the Hilbert-Huang theory; and combining the two obtained coefficients into a whole, calculating a Gaussian probability density function of each coefficient, and generating a trained Gaussian mixture model.
And secondly, performing model matching on the unknown pulse signals. After the unknown pulse signals are subjected to signal pretreatment (windowing and framing), two types of spectrum transformation are carried out, and an MFCC coefficient, a marginal spectrum coefficient and a corresponding first-order differential spectrum coefficient are respectively generated; and calculating Gaussian probability density functions of the coefficients to generate a Gaussian mixture model of the pulse to be identified.
Finally, the trained models are matched against the model to be identified, the likelihood of each match is calculated, and the pulse signal type corresponding to the maximum probability is found; this is the pulse type of the signal to be identified, and identification is complete. The method specifically comprises the following steps:
the MFCC coefficients and the marginal spectrum coefficients are generated in advance based on the training pulse signal to train the MFCC model and the marginal spectrum model to construct a trained gaussian mixture model.
The embodiment of the application now describes the data training of the Gaussian mixture model. Training a GMM means, given a set of training data, determining the model parameters λ according to some criterion; the most common parameter estimation method is maximum likelihood (ML) estimation. In some embodiments, generating MFCC coefficients and marginal spectral coefficients based on the training pulse signal in advance to train the MFCC model and the marginal spectral model to construct a trained Gaussian mixture model includes:
generating MFCC coefficients and marginal spectral coefficients based on training pulse signals in advance to construct a training sequence;
based on the training sequence, the GMM likelihood is expressed as:

$$P(X\mid\lambda)=\prod_{i=1}^{T} p(X_i\mid\lambda)$$

where $\lambda$ denotes the estimated parameters and $X_i$ the training sequence;

the training process satisfies $\hat{\lambda}=\arg\max_{\lambda} P(X\mid\lambda)$:

new parameters $\hat{\lambda}$ are estimated with the EM (Expectation-Maximization) algorithm so that the likelihood under the new model parameters satisfies $P(X\mid\hat{\lambda})\ge P(X\mid\lambda)$, iterating until the model converges.
The new model parameters then serve as the current parameters, and the operation is iterated until the model converges. In each iteration, the following re-estimation formulas guarantee a monotonic increase of the model likelihood:

the re-estimation of the mixture weights satisfies:

$$\hat{\omega}_i=\frac{1}{T}\sum_{t=1}^{T}P(i\mid x_t,\lambda)$$

the re-estimation of the means satisfies:

$$\hat{\mu}_i=\frac{\sum_{t=1}^{T}P(i\mid x_t,\lambda)\,x_t}{\sum_{t=1}^{T}P(i\mid x_t,\lambda)}$$

the re-estimation of the variances satisfies:

$$\hat{\sigma}_i^{2}=\frac{\sum_{t=1}^{T}P(i\mid x_t,\lambda)\,x_t^{2}}{\sum_{t=1}^{T}P(i\mid x_t,\lambda)}-\hat{\mu}_i^{2}$$

where the posterior probability of component $i$ is:

$$P(i\mid x_t,\lambda)=\frac{\omega_i\, b_i(x_t)}{\sum_{k=1}^{M}\omega_k\, b_k(x_t)}$$
the EM (ExpectationMaximization) parameter estimation algorithm steps are as follows:
input: observation variable x= { X j J=1, 2, l, n, error epsilon, number of iterations M.
And (3) outputting: θ= { (ω) iii ),i=1,L,Q}。
1) The parameters are initialized and the parameters are set up,let m=1;
2) Iteration:
step E- -calculating the observed value X by using the idea of statistical averaging j Probability from the ith model:
m step-calculating the model parameters obtained by maximum likelihood using the calculated probability estimates:
if M is more than or equal to M, jumping to the step 3); otherwise, calculating the parameter iteration error, if [ theta ] mm-1 Jump to step 3) if the I is less than or equal to epsilon, otherwise, return to step 1 in step 2) to continue iteration;
3) Output the lastAnd the EM parameter estimation algorithm is finished as an estimation result.
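The EM steps above correspond to what standard GMM libraries implement internally. As a minimal sketch (the use of scikit-learn and the toy feature data are assumptions for illustration, not part of the patent), training an M-order GMM by EM might look like:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Toy stand-in for a training sequence of feature vectors
# (rows = frames, columns = feature dimensions).
features = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(200, 4)),
    rng.normal(loc=5.0, scale=1.0, size=(200, 4)),
])

# An M-order GMM trained by EM until the log-likelihood gain falls
# below tol (the role of epsilon above) or max_iter is reached.
gmm = GaussianMixture(n_components=2, covariance_type='diag',
                      tol=1e-4, max_iter=100, random_state=0)
gmm.fit(features)

print(gmm.converged_)       # whether the EM iteration stabilised
print(gmm.weights_.sum())   # mixture weights are normalised
```

The `tol`/`max_iter` pair plays the role of the stopping rule in steps 2) and 3) above.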
For a given pulse signal, mel spectrum conversion processing and HHT conversion processing are respectively performed based on the pulse signal to generate MFCC coefficients and marginal spectrum coefficients of the pulse signal.
In some embodiments, performing a Mel-spectral transform process and a HHT transform process, respectively, based on the pulse signal to generate MFCC coefficients and marginal spectral coefficients of the pulse signal includes:
after preprocessing the pulse signal, an FFT is performed. In some specific examples the preprocessing includes pre-emphasis, framing, and windowing. Pre-emphasis compensates for the loss of high-frequency components by boosting them, and windowing can be performed with a Hamming window function.
Fast Fourier transform (FFT): an FFT is applied to each frame of the signal, converting time-domain data into frequency-domain data:

$$X(i,k)=\mathrm{FFT}[x_i(m)]$$

the spectral line energy is then computed for each frame's FFT output:

$$E(i,k)=\lvert X(i,k)\rvert^{2}$$
multiplying and adding the spectral line energy of each frame with the frequency domain response of the mel filter to determine the energy passing through the mel filter;
based on the energy passed through the mel filters, the DCT is computed to determine the MFCC coefficients and first-difference coefficients; an exemplary process is as follows:

computing the DCT cepstrum coefficients: the FFT cepstrum $\hat{x}(n)$ of a sequence $x(n)$ is

$$\hat{x}(n)=\mathrm{FT}^{-1}\!\left[\ln\lvert \mathrm{FT}[x(n)]\rvert\right]$$

where $\mathrm{FT}$ and $\mathrm{FT}^{-1}$ denote the Fourier transform and the inverse Fourier transform. The DCT cepstrum of the sequence $x(n)$ is

$$C(k)=c(k)\sum_{n=0}^{N-1}x(n)\cos\!\left[\frac{\pi(2n+1)k}{2N}\right]$$

where the parameter $N$ is the length of the sequence $x(n)$ and $c(k)$ is the orthogonality factor:

$$c(k)=\begin{cases}\sqrt{1/N}, & k=0\\[2pt] \sqrt{2/N}, & k=1,2,\dots,N-1\end{cases}$$

after taking the logarithm of the mel-filter energies, the DCT is computed to obtain:

$$\mathrm{mfcc}(i,n)=\sqrt{\frac{2}{M}}\sum_{m=1}^{M}\ln\!\left[S(i,m)\right]\cos\!\left[\frac{\pi n(2m-1)}{2M}\right]$$

$$\Delta \mathrm{mfcc}(i,n)=\mathrm{mfcc}(i+1,n)-\mathrm{mfcc}(i-1,n)$$

where $S(i,m)$ is the energy of the $m$-th mel filter ($M$ filters in total) for the $i$-th frame of data, $n$ indexes the spectral line after the DCT, and $\Delta \mathrm{mfcc}(i,n)$ is the first-order differential coefficient.
In this manner, the MFCC coefficients and their differential spectral coefficients are generated as one set of mel-spectrum characteristic coefficients of the acoustic target, and are used in the acoustic target model as features for identifying unknown acoustic targets.
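The MFCC pipeline described above (pre-emphasis, framing and windowing, FFT, spectral line energy, mel filtering, log, DCT, first difference) can be sketched in Python with NumPy/SciPy. The frame sizes, sample rate, and filter counts below are illustrative assumptions, not values taken from the patent:

```python
import numpy as np
from scipy.fft import dct

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = inv_mel(np.linspace(mel(0.0), mel(sr / 2.0), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fbank[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[m - 1, k] = (r - k) / max(r - c, 1)
    return fbank

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    # Pre-emphasis compensates the loss of high-frequency components.
    signal = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Framing + Hamming window.
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len) + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    # Spectral line energy of each frame: E(i,k) = |X(i,k)|^2.
    energy = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # Energy through the mel filters, then log and DCT -> MFCC.
    fb_energy = energy @ mel_filterbank(n_filters, n_fft, sr).T
    log_e = np.log(np.maximum(fb_energy, 1e-10))
    coeffs = dct(log_e, type=2, axis=1, norm='ortho')[:, :n_ceps]
    # First-order difference across frames:
    # dmfcc(i,n) = mfcc(i+1,n) - mfcc(i-1,n).
    delta = np.zeros_like(coeffs)
    delta[1:-1] = coeffs[2:] - coeffs[:-2]
    return coeffs, delta
```

For a one-second signal at 16 kHz with these settings, `mfcc` returns 98 frames of 13 coefficients plus their first differences.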
The Hilbert-Huang Transform (HHT) is an effective method for nonlinear, non-stationary signal analysis. Through empirical mode decomposition (EMD) and the intrinsic mode function (IMF), the HHT adaptively decomposes a complex signal into a finite set of IMFs with physically meaningful instantaneous frequencies; the IMFs may be amplitude- or frequency-modulated and are distributed from high to low frequency, and their Hilbert transforms form a three-dimensional time-frequency-energy spectrogram that reveals the time-varying characteristics of nonlinear, non-stationary signals. The HHT requires no prior knowledge: the decomposition adapts to the signal itself, so it carries real physical significance, and the HHT exhibits superior behavior in nonlinear, non-stationary signal analysis. In some embodiments, performing the Mel-spectrum transform processing and the HHT processing respectively based on the pulse signal to generate the MFCC coefficients and marginal spectral coefficients of the pulse signal includes:
EMD screening is performed on the pulse signal to obtain a plurality of IMF components. EMD (empirical mode decomposition) is a process of smoothing a signal; its result progressively separates fluctuations or trends of different scales in the signal, producing a series of data sequences with different characteristic scales, each of which is an IMF component.
Empirical mode decomposition
For any time series $X(t)$, its Hilbert transform is defined as:

$$Y(t)=\frac{1}{\pi}\,P\!\int_{-\infty}^{\infty}\frac{X(t')}{t-t'}\,dt'$$

where $P$ denotes the Cauchy principal value. From this, the analytic signal $Z(t)$ is defined:

$$Z(t)=X(t)+i\,Y(t)=a(t)\,e^{i\theta(t)}$$

where

$$a(t)=\sqrt{X^{2}(t)+Y^{2}(t)},\qquad \theta(t)=\arctan\frac{Y(t)}{X(t)}$$

The instantaneous frequency is then:

$$\omega(t)=\frac{d\theta(t)}{dt}$$
and decomposing the single component IMF of the actual physical meaning of the instantaneous frequency by using an empirical mode decomposition method.
The IMF component provides a guarantee for the validity of the hilbert analysis, and the EMD method divides the nonlinear, non-stationary signal into a group of IMF decomposition methods. The EMD method determines the inherent vibration mode of the signal according to the characteristic time scale of the signal, and then sequentially decomposes; the time interval between successive extrema is used as a time scale definition of the natural modes within the signal, since it not only provides a high time-frequency resolution, but is equally applicable to signals where no zero crossing point is present. The method for screening the internal natural modes of the signal is given below:
first, a screening process is performed using envelopes formed by local maxima and local minima of a signal, respectively. After all local extreme points are determined, the upper envelope is formed by connecting cubic spline curves of maximum values, and likewise, the lower envelope is formed by connecting cubic spline curves of minimum values, so that all data points of the signal are positioned inside an area surrounded by the upper envelope and the lower envelope.
Denote by $m_1(t)$ the mean of the upper and lower envelopes of the original pulse signal $s(t)$; the difference between $s(t)$ and $m_1(t)$ is the first component, denoted $h_1(t)$:

$$s(t)-m_1(t)=h_1(t)$$

In the second screening, $h_1(t)$ is treated as the original pulse signal, and the same method gives:

$$h_1(t)-m_{1,1}(t)=h_{1,1}(t)$$

The screening process is then repeated identically $k$ times until $h_{1,k}(t)$ satisfies the IMF conditions and becomes the first IMF component:

$$h_{1,k-1}(t)-m_{1,k}(t)=h_{1,k}(t)$$

Writing $\mathrm{imf}_1(t)=h_{1,k}(t)$, $\mathrm{imf}_1(t)$ is the first IMF component screened from the original pulse signal $s(t)$; this level of screening is referred to as inner-layer screening in this embodiment.

The inner-layer screening process relies only on the characteristic time scale to first decompose the finest-scale local mode from the signal. $\mathrm{imf}_1(t)$ is then separated from the remaining components of $s(t)$:

$$s(t)-\mathrm{imf}_1(t)=r_1(t)$$

$r_1(t)$ contains all components of $s(t)$ except $\mathrm{imf}_1(t)$, so $r_1(t)$ is treated as a new signal to be decomposed, and the same inner-layer screening process is applied to it. Repeating these steps yields:

$$r_1(t)-\mathrm{imf}_2(t)=r_2(t),\quad\dots,\quad r_{n-1}(t)-\mathrm{imf}_n(t)=r_n(t)$$

The EMD method thus decomposes a nonlinear, non-stationary signal into a group of IMFs, with the decomposition result:

$$s(t)=\sum_{j=1}^{n}\mathrm{imf}_j(t)+r_n(t)$$
for each IMF component, a Hilbert transform is performed, i.e., after decomposition, the non-IMF components are discarded, thereby focusing more on the low-energy high-frequency components, and the result of each Hilbert transform is expressed as:
wherein, the liquid crystal display device comprises a liquid crystal display device,
h is the hilbert transform operator. By the above expression, the amplitude modulation and the frequency modulation are clearly separated, which breaks the limit of constant amplitude and constant frequency in Fourier transform, so that HHT can be successfully applied to processing and analysis of nonlinear and non-stationary signals.
The definition of Hilbert spectrum is:
based on the results of the Hilbert transform, as well as the Hilbert spectrum, hilbert marginal spectra and instantaneous energy density levels are determined. In the embodiment of the application, the square of the amplitude is used for representing the energy density, and the Hilbert energy spectrum (Hilbert Energy Spectrum) is obtained by squaring the amplitude in the Hilbert spectrum. Hilbert marginal spectrum h (ω) (the HilbertMarginal Spectrum, HMS for short) and instantaneous energy density level IE (t) (the Instantaneous Energy Density Level) are defined, respectively:
h(ω)=∫ 0 T H(ω,t)dt
Vh(ω)=h(ω t+1 )-h(ω t-1 )
where T is the signal sampling time, H (ω) represents the signal marginal spectrum, vh (ω) represents the first order differential coefficient of the marginal spectrum, and H (ω, T) represents the Hilbert spectrum. h (ω) reflects the distribution of amplitude values at each frequency point and represents the cumulative amplitude along the whole data span in a probabilistic sense, i.e. the amplitude contribution of each frequency as a whole. Fig. 3 shows an example of a marginal spectrum, and finally, a marginal spectrum coefficient is determined based on the signal marginal spectrum, and the coefficient is used as a characteristic coefficient of the marginal spectrum in the acoustic target, and is used as a characteristic for identifying the unknown acoustic target in the acoustic target model.
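A discrete version of the marginal spectrum $h(\omega)$ can be obtained by binning the instantaneous amplitude of each IMF by its instantaneous frequency and accumulating over time; the bin count and test tone below are assumptions for illustration:

```python
import numpy as np
from scipy.signal import hilbert

def marginal_spectrum(imfs, sr, n_bins=128):
    """Accumulate instantaneous amplitude over time per frequency bin:
    a discrete version of h(w) = integral of H(w, t) dt."""
    edges = np.linspace(0.0, sr / 2.0, n_bins + 1)
    h = np.zeros(n_bins)
    for imf in imfs:
        z = hilbert(imf)
        amp = np.abs(z)[1:]                                  # a(t)
        freq = np.diff(np.unwrap(np.angle(z))) * sr / (2 * np.pi)
        valid = (freq >= 0.0) & (freq < sr / 2.0)
        idx = np.digitize(freq[valid], edges) - 1
        np.add.at(h, idx, amp[valid] / sr)                   # dt = 1/sr
    return edges[:-1], h
```

For a single tone the accumulated amplitude concentrates in the bin containing the tone frequency; the first difference of `h` along the frequency axis gives the $\nabla h(\omega)$ coefficient defined above.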
A Gaussian probability density function of the MFCC coefficients and the marginal spectral coefficients is calculated to generate a Gaussian mixture model of the pulse to be identified.
And matching based on the Gaussian mixture model of the pulse to be identified and the trained Gaussian mixture model, and determining the type of the pulse signal corresponding to the maximum probability, namely the pulse type of the given pulse signal.
The method of the present application aims to reduce the complexity of the existing Hidden Markov Model (HMM) algorithm by adopting a Gaussian mixture model (GMM), which can be regarded as a continuous-distribution hidden Markov model with a single state. The Gaussian mixture model is less complex than the HMM algorithm, and in theory a GMM can decompose any acoustic target signal. The application adopts the Hilbert-Huang transform (HHT) to extract the marginal spectrum characteristics of the pulse signal; the embodiment further adds first-order differential spectrum features to the acoustic target characteristics, which reflect the variation trend of the nonlinear, non-stationary signal, and the introduction of these higher-order features improves the recognition accuracy of the acoustic target.
A Gaussian mixture model has only one state, in which there are multiple Gaussian distribution functions; the GMM can thus be regarded as a continuous-distribution hidden Markov model with a state number of 1. In some embodiments, calculating the Gaussian probability density function of the MFCC coefficients and the marginal spectral coefficients to generate the Gaussian mixture model of the pulse to be identified includes:
defining the probability density function of an M-order Gaussian mixture model to satisfy:

p(X | λ) = Σ_{i=1}^{M} ω_i · b_i(X)

where X is a D-dimensional random vector; b_i(X), i = 1, …, M, are the component distributions; and ω_i are the mixture weights. Each component distribution is a D-dimensional joint Gaussian probability distribution, which can be expressed as:

b_i(X) = (2π)^{-D/2} |Σ_i|^{-1/2} exp( -(1/2)(X − μ_i)^T Σ_i^{-1} (X − μ_i) )
where μ_i is the mean vector and Σ_i is the covariance matrix, and the mixture weights satisfy:

Σ_{i=1}^{M} ω_i = 1

The complete Gaussian mixture model is parameterized by the mean vectors, covariance matrices and mixture weights, and is expressed as:

λ = {ω_i, μ_i, Σ_i}, i = 1, …, M
For a given time sequence X = {X_t}, t = 1, 2, …, T, the log-likelihood under the GMM can be defined as:

log p(X | λ) = Σ_{t=1}^{T} log p(X_t | λ)
The MFCC coefficients, the first-order difference spectrum coefficients, the marginal spectrum coefficients and the first-order difference coefficients of the marginal spectrum are taken as the component distributions b_i(X) to generate the Gaussian mixture model of the pulse to be identified.
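Under these definitions, the mixture density p(X|λ) = Σ_i ω_i b_i(X) and the sequence log-likelihood can be sketched as follows; the parameter containers and function names are illustrative assumptions, not the application's code:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_pdf(x, weights, means, covs):
    """p(x|lambda) = sum_i w_i * N(x; mu_i, Sigma_i) for one D-dim vector x."""
    return sum(w * multivariate_normal.pdf(x, mean=m, cov=c)
               for w, m, c in zip(weights, means, covs))

def gmm_log_likelihood(X, weights, means, covs):
    """log p(X|lambda) = sum_t log p(X_t|lambda) over a feature sequence X (T, D)."""
    return sum(np.log(gmm_pdf(x, weights, means, covs)) for x in X)
```

With a single component (M = 1, ω_1 = 1) the mixture density reduces to a plain multivariate Gaussian, which is a convenient sanity check on an implementation.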
In some embodiments, matching the Gaussian mixture model of the pulse to be identified against the trained Gaussian mixture models and determining the type of the pulse signal corresponding to the maximum probability comprises:
computing the maximum posterior probability based on Bayesian theory:

P(λ_i | X) = P(X | λ_i) P(λ_i) / P(X)

Since the prior probabilities P(λ_i) are assumed equal, i.e. the unknown acoustic target is equally likely to belong to each class in the training set, identifying the training-data class to which the pulse signal belongs reduces to:

i = argmax P(X | λ_i)

where i represents the identified type of the pulse signal and P(X | λ_i) is the likelihood under model λ_i, whose maximization is equivalent to maximizing the posterior probability.
According to the embodiments of the application, the characteristic coefficients of the unknown acoustic target are computed, the posterior probability of those coefficients under each model in the training set is evaluated, and the class corresponding to the maximum probability is determined; this class is the type of the unknown acoustic target, and the identification is complete.
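Assuming equal class priors as stated, choosing the maximum-posterior class reduces to picking the class model with the highest log-likelihood. A sketch using scikit-learn's GaussianMixture in place of a bespoke GMM (an illustrative substitution, not the application's implementation):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def classify(features, class_models):
    """Return the class whose trained GMM gives the highest total log-likelihood.

    features: (T, D) feature matrix of the unknown acoustic target.
    class_models: dict {class_name: fitted GaussianMixture}.
    Equal priors are assumed, so argmax of P(X|lambda_i) is the MAP decision.
    """
    scores = {name: gm.score_samples(features).sum()
              for name, gm in class_models.items()}
    return max(scores, key=scores.get)
```

score_samples returns per-frame log-densities, so summing them is exactly the sequence log-likelihood log p(X|λ_i) defined above.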
The embodiment of the application also provides an application example of the pulse sound identification method based on the Hilbert-Huang transform and Mel spectrum transform, taking certain gunshot data as an example, comprising the following steps:
Step one: select gunshot data as training data and generate Mel spectrum coefficients from it. MFCC coefficients are generated group by group from the multiple groups of training data; each group of data yields MFCC coefficients and spectrum difference coefficients.
Step two: generate marginal spectrum coefficients group by group from the multiple groups of training data; each group of data yields marginal spectrum coefficients and spectrum difference coefficients.
Step three: estimate the parameters of the GMM model. Using the coefficients stored in the preceding two steps, train the GMM model of the data by EM parameter estimation.
Step four: process the gunshot data to be identified to generate identification coefficients. From the data to be detected, generate its MFCC coefficients and marginal spectrum coefficients together with the corresponding spectrum difference coefficients.
Step five: calculate the posterior probability and identify the target. Combine the coefficients of the data to be detected with the GMM model data and compute the maximum-likelihood term, thereby identifying the target type of the data to be detected and completing the identification.
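The MFCC path underlying steps one and four (FFT → spectral-line energy → mel filterbank → log → DCT) can be sketched per frame as follows; windowing, pre-emphasis and the difference-spectrum step are omitted for brevity, and the filter-bank sizes are illustrative assumptions:

```python
import numpy as np
from scipy.fftpack import dct

def mfcc_frame(frame, fs, n_mels=26, n_ceps=13):
    """Per-frame MFCC sketch: FFT -> spectral-line energy -> mel filterbank
    -> log -> DCT, keeping the first n_ceps cepstral coefficients."""
    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame)) ** 2              # spectral line energy
    # triangular filters equally spaced on the mel scale
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = inv_mel(np.linspace(mel(0.0), mel(fs / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / fs).astype(int)
    fbank = np.zeros((n_mels, len(power)))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    energies = np.maximum(fbank @ power, 1e-12)          # energy through each filter
    return dct(np.log(energies), norm='ortho')[:n_ceps]
```

Each frame of the training or test data would be passed through this path, and the resulting coefficient sequences (with their difference spectra) are what the GMM is trained on.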
The applicant verified the method of the present application in MATLAB, counted the recognition rate of each algorithm, and analyzed the recognition accuracy of each algorithm, as shown in Table 1 below:
table 1 algorithm recognition rate statistics under different features
The results show that after the Hilbert marginal spectrum feature is introduced, the algorithm recognition rate improves greatly, demonstrating that this feature effectively captures the local time-domain characteristics of nonlinear, non-stationary acoustic signals; after the first-order difference spectrum feature is introduced, the recognition rate improves further, demonstrating that the difference spectrum is an important recognition feature of the acoustic target signal and can effectively distinguish pulse signals with similar characteristics.
In the method provided by the embodiments of the application, a Gaussian Mixture Model (GMM) of the acoustic target is built and characteristic parameters such as the marginal spectrum coefficients, Mel spectrum coefficients and difference spectrum of the acoustic target are extracted, realizing identification in a multidimensional feature space. Details of the time-frequency characteristics are thereby identified more accurately, the recognition rate of pulse acoustic targets is greatly improved, and the limitations of traditional methods are overcome.
The embodiment of the application also provides a pulse sound identification device based on the Hilbert-Huang transform and Mel spectrum transform, comprising a processor and a memory, the memory storing a computer program which, when executed by the processor, implements the steps of the pulse sound identification method based on the Hilbert-Huang transform and Mel spectrum transform.
The embodiment of the application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the pulse sound identification method based on the Hilbert-Huang transform and Mel spectrum transform.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described methods may be implemented by software plus a necessary general-purpose hardware platform, or alternatively by hardware, though in many cases the former is preferred. Based on this understanding, the technical solution of the present application, or the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, a network device, etc.) to perform the methods of the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive. Many variations may be made by those of ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, all of which fall within the protection of the present application.

Claims (8)

1. A method for pulse sound recognition based on Hilbert-Huang transform and Mel spectrum transform, comprising:
generating an MFCC coefficient and a marginal spectrum coefficient based on a training pulse signal in advance to train an MFCC model and a marginal spectrum model so as to construct a trained Gaussian mixture model;
for a given pulse signal, respectively performing Mel spectrum conversion processing and HHT conversion processing based on the pulse signal to generate MFCC coefficients and marginal spectrum coefficients of the pulse signal;
calculating a Gaussian probability density function of the MFCC coefficients and the marginal spectrum coefficients to generate a Gaussian mixture model of the pulse to be identified;
and matching based on the Gaussian mixture model of the pulse to be identified and the trained Gaussian mixture model, and determining the type of the pulse signal corresponding to the maximum probability, namely the pulse type of the given pulse signal.
2. The pulse sound recognition method based on Hilbert-Huang transform and Mel spectrum transform of claim 1, wherein generating MFCC coefficients and marginal spectrum coefficients based on training pulse signals in advance to train the MFCC model and the marginal spectrum model so as to construct a trained Gaussian mixture model comprises:
generating MFCC coefficients and marginal spectral coefficients based on training pulse signals in advance to construct a training sequence;
based on the training sequence, the GMM likelihood is expressed as:

p(X | λ) = Π_{t=1}^{T} p(X_t | λ)

where λ represents the parameters to be estimated and X_t represents the training sequence;
the training process satisfies: new parameters λ̂ are estimated with the EM algorithm such that the likelihood under the new model parameters satisfies p(X | λ̂) ≥ p(X | λ);
and iterating until the model converges.
3. The pulse sound recognition method based on Hilbert-Huang transform and Mel spectrum transform according to claim 1, wherein performing the Mel spectrum transform processing and HHT transform processing, respectively, based on the pulse signal to generate the MFCC coefficients and marginal spectrum coefficients of the pulse signal comprises:
after preprocessing the pulse signal, performing FFT conversion;
calculating spectral line energy for the data after FFT of each frame;
multiplying and adding the spectral line energy of each frame with the frequency domain response of the mel filter to determine the energy passing through the mel filter;
based on the energy passed through the mel filter, the DCT is calculated to determine MFCC coefficients and first difference spectral coefficients.
4. The pulse sound recognition method based on Hilbert-Huang transform and Mel spectrum transform according to claim 3, wherein performing the Mel spectrum transform processing and HHT transform processing, respectively, based on the pulse signal to generate the MFCC coefficients and marginal spectrum coefficients of the pulse signal comprises:
performing EMD screening on the pulse signal to obtain a plurality of IMF components;
performing Hilbert transform on each IMF component;
determining, based on the results of the Hilbert transform, the Hilbert spectrum, the Hilbert marginal spectrum and the instantaneous energy density level, satisfying:

H(ω) = ∫_0^T H(ω, t) dt

Vh(ω) = h(ω_{t+1}) − h(ω_{t−1})

wherein H(ω) represents the signal marginal spectrum, Vh(ω) represents the first-order difference coefficient of the marginal spectrum, and H(ω, t) represents the Hilbert spectrum;
a marginal spectral coefficient is determined based on the signal marginal spectrum.
5. The pulse sound recognition method based on Hilbert-Huang transform and Mel spectrum transform of claim 4, wherein calculating the Gaussian probability density function of the MFCC coefficients and marginal spectrum coefficients to generate the Gaussian mixture model of the pulse to be identified comprises:
defining the probability density function of an M-order Gaussian mixture model to satisfy:

p(X | λ) = Σ_{i=1}^{M} ω_i · b_i(X)

wherein X is a D-dimensional random vector; b_i(X), i = 1, …, M, are the component distributions; and ω_i are the mixture weights;
taking the MFCC coefficients, the first-order difference spectrum coefficients, the marginal spectrum coefficients and the first-order difference coefficients of the marginal spectrum as the component distributions b_i(X) to generate the Gaussian mixture model of the pulse to be identified.
6. The pulse sound recognition method based on Hilbert-Huang transform and Mel spectrum transform according to claim 5, wherein matching based on the Gaussian mixture model of the pulse to be identified and the trained Gaussian mixture model and determining the type of the pulse signal corresponding to the maximum probability comprises:
identifying, by the maximum posterior probability based on Bayesian theory, the training-data class to which the pulse signal belongs, satisfying:

i = argmax P(X | λ_i)

wherein i represents the identified type of the pulse signal and P(X | λ_i) represents the maximum posterior probability.
7. A pulse sound recognition device based on Hilbert-Huang transform and Mel spectrum transform, comprising a processor and a memory, the memory storing a computer program which, when executed by the processor, implements the steps of the pulse sound recognition method based on Hilbert-Huang transform and Mel spectrum transform according to any one of claims 1 to 6.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the pulse sound recognition method based on Hilbert-Huang transform and Mel spectrum transform according to any one of claims 1 to 6.
CN202310406504.2A 2023-04-17 2023-04-17 Pulse sound identification method based on Hilbert-Huang transform and Mel spectrum transform Pending CN116665698A (en)

Publication Number: CN116665698A; Publication Date: 2023-08-29; Status: Pending.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination