CN109065070B - Kernel function-based audio characteristic signal dimension reduction method - Google Patents

Kernel function-based audio characteristic signal dimension reduction method

Info

Publication number
CN109065070B
Authority
CN
China
Prior art keywords
dimension reduction
audio
signal
kernel function
dimension
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810995309.7A
Other languages
Chinese (zh)
Other versions
CN109065070A (en)
Inventor
龙华
杨明亮
邵玉斌
杜庆治
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology
Priority to CN201810995309.7A
Publication of CN109065070A
Application granted
Publication of CN109065070B
Active legal status
Anticipated expiration legal status

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/45 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention relates to a kernel function-based audio characteristic signal dimension reduction method and belongs to the technical field of audio signal processing. The invention performs dimension reduction on the characteristic parameters of audio signals, achieves the required dimension reduction without discarding audio characteristic information, visually displays the final dimension-reduced data, and compares the results with those obtained by other audio characteristic parameter dimension reduction methods. The dimension reduction is applied mainly to the linear prediction coefficients, linear prediction cepstrum coefficients and Mel frequency cepstrum coefficients of the audio coefficient domain, and the dimension-reduced data are displayed visually. The audio feature dimension reduction of the invention can be used for monitoring broadcast signals and for rapidly identifying and processing audio signals. The algorithm is simple: it uses a nonlinear kernel function to represent the mapping between the Gaussian observation space and the hidden space, and thereby avoids the limited range of use and poor dimension reduction effect of linear mapping methods.

Description

Kernel function-based audio characteristic signal dimension reduction method
Technical Field
The invention relates to a kernel function-based audio characteristic signal dimension reduction method, and belongs to the technical field of audio characteristic signal processing.
Background
To manage and control wireless audio broadcasting and to monitor and discriminate audio broadcasts safely and efficiently in real time, audio information must be processed quickly, because its processing speed determines the speed of the whole pipeline. Dimension reduction of audio characteristic signals is the core of audio information processing, so its efficiency and reliability are problems that currently need to be solved. Existing audio characteristic signal dimension reduction methods mainly include locality preserving projection, multidimensional scaling, locally linear embedding and principal component analysis. Most of these algorithms have high complexity, and reducing the dimension by discarding part of the characteristic signal can cause unpredictable errors in practical engineering applications.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a kernel function-based audio characteristic signal dimension reduction method, which performs dimension reduction analysis on the extracted audio Linear Prediction Coefficients (LPCs), Linear Prediction Cepstrum Coefficients (LPCCs) and Mel Frequency Cepstrum Coefficients (MFCCs) to achieve the purposes of reducing data dimensions and improving information processing rate.
The technical scheme of the invention is as follows: a kernel function-based audio feature signal dimension reduction method. The method comprises the following specific steps:
(1) Audio signal acquisition: acquire an audio signal to obtain an audio sample.
(2) Audio signal preprocessing: convert the analog signal in the collected audio sample into a digital signal and write it to a WAV file; then filter, pre-emphasize and frame the digital signal written to the WAV file.
(3) Characteristic parameter extraction: extract the high-dimensional characteristic parameters, namely the Linear Prediction Coefficients (LPCs), Linear Prediction Cepstrum Coefficients (LPCCs) and Mel Frequency Cepstrum Coefficients (MFCCs), from the processed digital signal.
(4) Building the dimension reduction model: feed the extracted characteristic parameters into a dimension reduction model built with the kernel trick to obtain low-dimensional hidden variables directly; these low-dimensional hidden variables are the dimension-reduced data. The core of the method is to use a Gaussian process regression (GPR) model to describe the relation between the hidden variables and the observed variables nonlinearly.
(5) Dimension reduction analysis: display the dimension-reduced data visually (2D/3D) and compare it with the results obtained by other dimension reduction methods.
In the above kernel function-based audio characteristic signal dimension reduction method, the audio acquisition in step (1) obtains an audio sample through an audio acquisition device; when acquiring the audio signal, the device sets the sampling frequency (which must satisfy the Nyquist sampling theorem), the number of sampling channels and the quantization precision.
In the above method for reducing the dimension of the audio feature signal based on the kernel function, the audio signal preprocessing in step (2) includes the following steps:
(1) Filter the collected audio signal x(n) with a rectangular window function w(n) (the upper cutoff frequency is generally f_H = 3400 Hz and the lower cutoff frequency f_L = 60-100 Hz) to obtain the signal y_a(n), where
Figure BDA0001781726780000021
(2) Apply pre-emphasis to the filtered signal y_a(n) by the difference method to obtain the signal y_b(n), where y_b(n) = y_a(n) - α·y_a(n-1) (α is the pre-emphasis coefficient, generally close to 1). This boosts the high-frequency part, suppresses the low-frequency part and flattens the spectrum of the signal.
(3) Framing: for short-time analysis, the speech signal is divided into a number of segments, each called a frame, with a duration of 10 ms to 30 ms. To ensure a smooth transition between frames, adjacent frames partially overlap; the overlapped part is called the frame shift, and the frame shift is 1/2 or 1/3 of the frame length.
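The pre-emphasis and framing steps above can be sketched in Python with NumPy as follows. This is only an illustrative sketch, not the patent's implementation: the function names `pre_emphasis` and `split_frames`, the default α = 0.97 and the 20 ms frame length are assumptions chosen within the ranges stated in the text, and the frame shift is interpreted here as the offset between the starts of successive frames.

```python
import numpy as np


def pre_emphasis(y_a, alpha=0.97):
    """Difference-method pre-emphasis: y_b(n) = y_a(n) - alpha * y_a(n-1)."""
    y_a = np.asarray(y_a, dtype=float)
    # The first sample has no predecessor, so it is passed through unchanged.
    return np.concatenate([y_a[:1], y_a[1:] - alpha * y_a[:-1]])


def split_frames(y_b, sample_rate, frame_ms=20.0, shift_ratio=0.5):
    """Cut a signal into overlapping frames (one frame per row).

    frame_ms    -- frame duration in milliseconds (the text suggests 10-30 ms)
    shift_ratio -- frame shift as a fraction of the frame length (1/2 or 1/3)
    """
    y_b = np.asarray(y_b, dtype=float)
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = max(1, int(round(frame_len * shift_ratio)))
    if y_b.size < frame_len:
        return np.empty((0, frame_len))
    n_frames = 1 + (y_b.size - frame_len) // hop
    starts = hop * np.arange(n_frames)[:, None]
    # Trailing samples that do not fill a whole frame are dropped.
    return y_b[starts + np.arange(frame_len)[None, :]]
```

For a one-second signal sampled at 8 kHz, `split_frames(pre_emphasis(signal), 8000)` would return a matrix of 160-sample frames, each starting 80 samples after the previous one.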
In the above dimension reduction method for audio feature signals based on kernel functions, the step (3) of extracting feature parameters includes the following steps:
(1) Linear Prediction Coefficients (LPC): call an LPC function package in a program, set the frame length, frame shift, window function and LPC order, extract the characteristic values of the audio signal preprocessed in step (2), and store them in Table 1.
(2) Linear Prediction Cepstrum Coefficients (LPCC): call an LPCC function package in a program, set the frame length, frame shift, window function and LPCC order, extract the characteristic values of the audio signal preprocessed in step (2), and store them in Table 2.
(3) Mel Frequency Cepstrum Coefficients (MFCC): call an MFCC function package in a program, set the frame length, frame shift, window function and MFCC order, extract the characteristic values of the audio signal preprocessed in step (2), and store them in Table 3.
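The text does not name the function packages it calls. As a hedged illustration of what an LPC and LPCC extraction for a single frame can look like, the sketch below uses the autocorrelation method with the Levinson-Durbin recursion, followed by the standard LPC-to-cepstrum recursion. The function names `lpc` and `lpcc` are hypothetical, and the frame is assumed to be already windowed.

```python
import numpy as np


def lpc(frame, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns (a, err): a[0] = 1 and A(z) = 1 + sum_k a[k] z^-k is the
    prediction-error filter; err is the final prediction error energy.
    """
    frame = np.asarray(frame, dtype=float)
    n = frame.size
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:          # silent frame: nothing left to predict
            break
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err          # reflection coefficient
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpcc(a, n_ceps):
    """Cepstrum c[1..n_ceps] of the all-pole model 1/A(z) from LPC values."""
    alpha = -np.asarray(a[1:], dtype=float)   # predictor coefficients
    p = alpha.size
    c = np.zeros(n_ceps + 1)                  # c[0] is not used here
    for n in range(1, n_ceps + 1):
        acc = alpha[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * alpha[n - k - 1]
        c[n] = acc
    return c[1:]
```

Applying `lpc` and then `lpcc` to every frame and stacking the results row by row would give feature tables of the kind the text calls Table 1 and Table 2.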
In the above method for reducing the dimension of the audio characteristic signal based on the kernel function, the building of the dimension reduction model in the step (4) includes the following steps:
(1) To build the characteristic dimension reduction model, first denote the hidden space as
Figure BDA0001781726780000031
with dimension q, and denote the observation space as
Figure BDA0001781726780000032
with dimension d (q < d). Assume that the observed value and the hidden space parameter satisfy y = f(z) + ε, where the noise ε follows a Gaussian distribution with mean 0 and variance β, and assume that the hidden function f follows a Gaussian process with a squared exponential kernel function
Figure BDA0001781726780000033
where σ is the coefficient parameter of the squared exponential kernel, l is the distance influence factor (length scale) between the two points z and z', β is a hyperparameter of the model, δ(z, z') is the Kronecker delta function, and the parameters to be solved in the kernel function are θ = (σ, l, β). The kernel function takes its maximum value when z is very close to z' and its minimum value when they are far apart. For the convenience of the subsequent derivation, the covariance matrix is computed by the formula
Figure BDA0001781726780000034
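The kernel and covariance formulas are images that are not reproduced in this text. As a rough sketch, assuming the commonly used form k(z, z') = σ²·exp(-‖z - z'‖² / (2l²)) + β·δ(z, z'), which matches the parameters described above but may differ in detail from the patent's figure, the covariance matrix over n hidden points could be computed like this (`se_kernel_matrix` is a hypothetical name):

```python
import numpy as np


def se_kernel_matrix(Z, sigma, length, beta):
    """Covariance matrix K with K[i, j] = k(z_i, z_j) for the rows of Z.

    Assumed kernel (not taken from the patent figure):
        k(z, z') = sigma^2 * exp(-||z - z'||^2 / (2 * length^2))
                   + beta * delta(z, z')
    The Kronecker delta contributes the noise variance beta on the diagonal.
    """
    Z = np.asarray(Z, dtype=float)
    sq_norms = np.sum(Z ** 2, axis=1)
    # Pairwise squared distances; clip tiny negatives caused by rounding.
    sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * Z @ Z.T
    sq_dist = np.maximum(sq_dist, 0.0)
    return sigma ** 2 * np.exp(-sq_dist / (2.0 * length ** 2)) + beta * np.eye(len(Z))
```

Under this assumed form the diagonal entries equal σ² + β, the largest value the kernel takes, and the off-diagonal entries decay toward 0 as the hidden points move apart, in line with the behavior described in the text.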
(2) Assuming that the d dimensions of the observation space are sampled independently, the likelihood of the observations Y is as follows, where Y_{:,i} denotes the n elements of the i-th dimension of the observation space Y:
Figure BDA0001781726780000035
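The likelihood formula itself is an image that is not reproduced here. For orientation only, the standard Gaussian process latent variable model likelihood under the independence assumption stated above has the following form, where K is the n × n covariance matrix of the kernel; whether the patent's figure matches it term by term is an assumption.

```latex
p(Y \mid Z, \theta)
  = \prod_{i=1}^{d} \mathcal{N}\!\left(Y_{:,i} \mid 0, K\right)
  = \prod_{i=1}^{d} \frac{1}{(2\pi)^{n/2}\,|K|^{1/2}}
    \exp\!\left(-\tfrac{1}{2}\, Y_{:,i}^{\top} K^{-1} Y_{:,i}\right)

\ln p(Y \mid Z, \theta)
  = -\frac{dn}{2}\ln(2\pi) - \frac{d}{2}\ln|K|
    - \frac{1}{2}\operatorname{tr}\!\left(K^{-1} Y Y^{\top}\right)
```

Maximizing the second expression over θ = (σ, l, β) is equivalent to maximizing the probability itself, and it is numerically better behaved.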
To obtain a better dimension reduction effect, the kernel function hyperparameters are chosen to maximize this probability. A particle swarm optimization algorithm is used to solve for them: θ = (σ, l, β) is written as the particle position A = (a_1, a_2, a_3), the velocity of particle i is denoted v_i = (v_i1, v_i2, v_i3), and the best position the particles have passed through is denoted p_g = (p_g1, p_g2, p_g3). The particle swarm algorithm updates the positions of the particles continuously with the following equations
Figure BDA0001781726780000041
where w is a non-negative inertia factor; the acceleration constants c_1 and c_2 are non-negative; and r_1 and r_2 are random numbers in the range [0, 1]. The particle swarm optimization algorithm adjusts the state of each particle using its current position, its own best position and the information of its neighbors. Applying this information exchange mechanism to the optimization of the kernel parameters means that each particle is influenced both by its own experience and by the experience of the swarm, which gives good global optimization capability and convergence speed.
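The update equations are given only as an image in the source. The sketch below therefore uses the textbook particle swarm update, v ← w·v + c_1·r_1·(p_i - x) + c_2·r_2·(p_g - x) followed by x ← x + v, which is consistent with the symbols defined above but is an assumption about the exact formula. The function name `pso_minimize`, the default constants and the clipping of positions to a search box are illustrative choices, not part of the patent.

```python
import numpy as np


def pso_minimize(objective, lower, upper, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective(x) over the box [lower, upper] with a basic PSO."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    x = rng.uniform(lower, upper, size=(n_particles, dim))   # positions
    v = np.zeros_like(x)                                     # velocities
    p_best = x.copy()                                        # personal bests
    p_best_val = np.array([objective(xi) for xi in x])
    g = int(np.argmin(p_best_val))
    g_best, g_best_val = p_best[g].copy(), p_best_val[g]     # swarm best p_g
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
        x = np.clip(x + v, lower, upper)
        vals = np.array([objective(xi) for xi in x])
        improved = vals < p_best_val
        p_best[improved] = x[improved]
        p_best_val[improved] = vals[improved]
        g = int(np.argmin(p_best_val))
        if p_best_val[g] < g_best_val:
            g_best, g_best_val = p_best[g].copy(), p_best_val[g]
    return g_best, g_best_val
```

To maximize the likelihood, the objective passed in would be the negative log-likelihood as a function of the three-component position A = (a_1, a_2, a_3) = (σ, l, β), with positive lower bounds so that the kernel stays well defined.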
The kernel function used by the model is nonlinear. The solved kernel parameters θ = (σ, l, β) are substituted back into the model, and the extracted characteristic parameters are fed into the dimension reduction model to obtain the hidden parameters, which are the dimension-reduced data.
In the above kernel function-based audio characteristic signal dimension reduction method, in the dimension reduction analysis of step (5), the dimension-reduced data is displayed in two or three dimensions and then analyzed and compared with the results of other dimension reduction algorithms.
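The text does not specify how the comparison with other algorithms is carried out. One simple baseline, sketched here under that caveat, is to project the same feature matrix to two dimensions with principal component analysis (one of the existing methods named in the Background) and plot it next to the hidden variables produced by the kernel-based model. The function name `pca_project` is hypothetical.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto their first principal components."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by explained variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T


# Example: 50 frames of 12-dimensional features (random stand-in data).
features = np.random.default_rng(0).normal(size=(50, 12))
embedding = pca_project(features, n_components=2)   # shape (50, 2)
```

The two columns of `embedding` can be drawn as a 2D scatter plot; using `n_components=3` gives coordinates for a 3D view.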
Compared with existing audio characteristic signal dimension reduction methods, the invention has the following advantages:
(1) The invention uses a nonlinear kernel function to represent the relation between the observation space data and the hidden space parameters directly, which avoids the poor dimension reduction effect that linear mapping produces for some audio characteristic data.
(2) The invention uses the particle swarm algorithm to solve for the hyperparameters of the kernel function. The global optimization capability of the particle swarm and the directionality of its particles allow the optimal hyperparameters to be found quickly, and the kernel function can easily be replaced by another one later.
(3) The audio characteristic dimension reduction theory proposed by the invention is simple and easy to program, is well suited to real engineering projects, and substantially improves audio information processing speed.
Drawings
FIG. 1 is a flow chart of a dimension reduction analysis of the present invention;
FIG. 2 is a flow chart of signal preprocessing according to the present invention;
FIG. 3 is a flow chart of feature parameter extraction and dimension reduction processing according to the present invention;
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
As shown in FIGS. 1-3, a kernel function-based audio characteristic signal dimension reduction method comprises the following specific steps:
(1) Audio signal acquisition: acquire an audio signal to obtain an audio sample.
(2) Audio signal preprocessing: convert the analog signal in the collected audio sample into a digital signal and write it to a WAV file; then filter, pre-emphasize and frame the digital signal written to the WAV file.
(3) Characteristic parameter extraction: extract the high-dimensional characteristic parameters, namely the Linear Prediction Coefficients (LPCs), Linear Prediction Cepstrum Coefficients (LPCCs) and Mel Frequency Cepstrum Coefficients (MFCCs), from the processed digital signal.
(4) Building the dimension reduction model: feed the extracted characteristic parameters into a dimension reduction model built with the kernel trick to obtain low-dimensional hidden variables directly; these low-dimensional hidden variables are the dimension-reduced data.
(5) Dimension reduction analysis: display the dimension-reduced data visually (2D/3D) and compare it with the results obtained by other dimension reduction methods.
Audio acquisition obtains an audio sample through an audio acquisition device. When acquiring the audio signal, the device sets the sampling frequency to 44.1 kHz (which satisfies the Nyquist sampling theorem), the number of sampling channels to one (mono) and the quantization precision to 16 bits.
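As a small illustration of these acquisition settings, the sketch below writes and reads a mono, 16-bit, 44.1 kHz WAV file with Python's standard `wave` module. The helper names `write_wav` and `read_wav` and the scaling of samples to the range [-1, 1] are assumptions made for the example; a real system would take its samples from the acquisition device.

```python
import io
import wave

import numpy as np

SAMPLE_RATE = 44100   # Hz, as in the embodiment
N_CHANNELS = 1        # single channel (mono)
SAMPLE_WIDTH = 2      # bytes per sample, i.e. 16-bit quantization


def write_wav(target, samples):
    """Write float samples in [-1, 1] as 16-bit PCM to a path or file object."""
    pcm = np.clip(np.asarray(samples, dtype=float), -1.0, 1.0)
    pcm = (pcm * 32767.0).astype('<i2')
    with wave.open(target, 'wb') as wf:
        wf.setnchannels(N_CHANNELS)
        wf.setsampwidth(SAMPLE_WIDTH)
        wf.setframerate(SAMPLE_RATE)
        wf.writeframes(pcm.tobytes())


def read_wav(source):
    """Read a mono 16-bit WAV file; returns (sample_rate, float samples)."""
    with wave.open(source, 'rb') as wf:
        rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    return rate, np.frombuffer(raw, dtype='<i2').astype(float) / 32767.0


# Round trip through an in-memory buffer with a 0.1 s, 440 Hz test tone.
t = np.arange(int(0.1 * SAMPLE_RATE)) / SAMPLE_RATE
tone = 0.5 * np.sin(2.0 * np.pi * 440.0 * t)
buffer = io.BytesIO()
write_wav(buffer, tone)
buffer.seek(0)
rate, recovered = read_wav(buffer)
```

The recovered samples differ from the original tone only by the 16-bit quantization error.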
The signal preprocessing comprises the following steps:
(1) Filter the collected audio signal x(n) with a rectangular window function w(n) (the upper cutoff frequency is generally f_H = 3400 Hz and the lower cutoff frequency f_L = 60-100 Hz) to obtain the signal y_a(n), where
Figure BDA0001781726780000051
(2) Apply pre-emphasis to the filtered signal y_a(n) by the difference method to obtain the signal y_b(n), where y_b(n) = y_a(n) - α·y_a(n-1) (α is the pre-emphasis coefficient, generally close to 1).
(3) Divide the pre-emphasized signal y_b(n) into a number of speech segments, each called a frame, with a duration of 10-30 ms. Adjacent frames partially overlap; the overlapped part is called the frame shift, and the frame shift is 1/2 or 1/3 of the frame length.
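The filter formula in step (1) is an image that is not reproduced here, so its exact definition is unknown. One possible reading, sketched below purely as an assumption, is an ideal band-pass whose frequency response is a rectangle between f_L and f_H; the sketch applies it by zeroing FFT bins outside the pass band. The function name `rectangular_bandpass` and the default f_L = 80 Hz are hypothetical choices within the range given in the text.

```python
import numpy as np


def rectangular_bandpass(x, sample_rate, f_low=80.0, f_high=3400.0):
    """Keep only the frequency components between f_low and f_high (in Hz).

    This applies a rectangular (brick-wall) window in the frequency domain.
    """
    x = np.asarray(x, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / sample_rate)
    pass_band = (freqs >= f_low) & (freqs <= f_high)
    return np.fft.irfft(spectrum * pass_band, n=x.size)
```

With these defaults the filter removes the DC offset and mains hum below f_L as well as components above the 3400 Hz telephone-band limit. A brick-wall response causes ringing in the time domain, so a practical system might prefer a smoother filter design.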
The characteristic parameter extraction comprises the following steps:
(1) Linear Prediction Coefficients (LPC): call an LPC function package in a program, set the frame length, frame shift, window function and LPC order, extract the characteristic values of the audio signal preprocessed in step (2), and store them in Table 1.
(2) Linear Prediction Cepstrum Coefficients (LPCC): call an LPCC function package in a program, set the frame length, frame shift, window function and LPCC order, extract the characteristic values of the audio signal preprocessed in step (2), and store them in Table 2.
(3) Mel Frequency Cepstrum Coefficients (MFCC): call an MFCC function package in a program, set the frame length, frame shift, window function and MFCC order, extract the characteristic values of the audio signal preprocessed in step (2), and store them in Table 3.
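The MFCC function package is not identified in the text. The following NumPy-only sketch shows one conventional way to compute MFCCs from already framed audio (Hamming window, power spectrum, triangular mel filter bank, logarithm, DCT-II). All function names and default values (26 filters, 13 coefficients, 512-point FFT) are assumptions for illustration, and the frame length is assumed not to exceed `n_fft`.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale, one per row."""
    mel_edges = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                            n_filters + 2)
    hz_edges = mel_to_hz(mel_edges)
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, fft_freqs.size))
    for m in range(n_filters):
        left, center, right = hz_edges[m:m + 3]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def mfcc(frames, sample_rate, n_filters=26, n_ceps=13, n_fft=512):
    """MFCCs for a (n_frames, frame_len) matrix; returns (n_frames, n_ceps)."""
    frames = np.asarray(frames, dtype=float)
    windowed = frames * np.hamming(frames.shape[1])
    power = np.abs(np.fft.rfft(windowed, n=n_fft, axis=1)) ** 2 / n_fft
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-12))   # avoid log(0)
    # DCT-II (unnormalized) over the filter axis decorrelates the energies.
    k = np.arange(n_ceps)[:, None]
    m = np.arange(n_filters)[None, :]
    dct_basis = np.cos(np.pi * k * (2 * m + 1) / (2.0 * n_filters))
    return log_energies @ dct_basis.T
```

Each row of the result is the MFCC vector of one frame, so the full matrix corresponds to the feature table the text refers to as Table 3.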
The construction of the dimension reduction model comprises the following steps:
(1) Denote the hidden space parameter as
Figure BDA0001781726780000061
and the observation space as
Figure BDA0001781726780000062
That is, the hidden space has dimension q and the observation space has dimension d (q < d). Assume that the observed value and the hidden space parameter satisfy y = f(z) + ε, where the noise ε follows a Gaussian distribution with mean 0 and variance β, and assume that the hidden function f follows a Gaussian process with a squared exponential kernel function:
Figure BDA0001781726780000063
where σ is the coefficient parameter of the squared exponential kernel, l is the distance influence factor (length scale) between the two points z and z', β is a hyperparameter of the model, δ(z, z') is the Kronecker delta function, and the parameters to be solved in the kernel function are θ = (σ, l, β). The kernel function takes its maximum value when z is very close to z' and its minimum value when they are far apart. The covariance matrix is computed as:
Figure BDA0001781726780000064
(2) Assuming that the d dimensions of the observation space are sampled independently, the likelihood of the observations Y is as follows, where Y_{:,i} denotes the n elements of the i-th dimension of the observation space Y:
Figure BDA0001781726780000065
The invention solves for the parameters with a particle swarm optimization algorithm: θ = (σ, l, β) is written as A = (a_1, a_2, a_3), the velocity of particle i is denoted v_i = (v_i1, v_i2, v_i3), and the best position the particles have passed through is denoted p_g = (p_g1, p_g2, p_g3). The position update formulas of the particle swarm algorithm are:
Figure BDA0001781726780000071
Figure BDA0001781726780000072
where w is a non-negative inertia factor; the acceleration constants c_1 and c_2 are non-negative; and r_1 and r_2 are random numbers in the range [0, 1]. The solved kernel parameters θ = (σ, l, β) are substituted back into the model to obtain the kernel function-based dimension reduction model, and the extracted characteristic parameters are fed into this model to obtain the hidden parameters, which are the dimension-reduced data.
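To make the quantity being optimized concrete, the sketch below evaluates the negative log-likelihood of the observations for a given θ = (σ, l, β), which an optimizer such as particle swarm optimization could minimize. It assumes the standard Gaussian process latent variable model likelihood and the kernel form σ²·exp(-‖z - z'‖² / (2l²)) + β·δ(z, z'), because the patent's formulas are images not reproduced in this text. The hidden points Z are treated as given (for example an initial guess), which is also an assumption, and `neg_log_likelihood` is a hypothetical name.

```python
import numpy as np


def neg_log_likelihood(theta, Z, Y):
    """Negative log-likelihood of Y (n x d) given hidden points Z (n x q).

    theta = (sigma, length, beta); all three must be positive.
    """
    sigma, length, beta = theta
    Z = np.asarray(Z, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n, d = Y.shape
    sq_norms = np.sum(Z ** 2, axis=1)
    sq_dist = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * Z @ Z.T, 0.0)
    K = sigma ** 2 * np.exp(-sq_dist / (2.0 * length ** 2)) + beta * np.eye(n)
    # Cholesky factorization gives a stable solve and log-determinant.
    L = np.linalg.cholesky(K)
    K_inv_Y = np.linalg.solve(L.T, np.linalg.solve(L, Y))
    log_det_K = 2.0 * np.sum(np.log(np.diag(L)))
    return 0.5 * (d * n * np.log(2.0 * np.pi) + d * log_det_K + np.sum(Y * K_inv_Y))
```

A call such as `neg_log_likelihood((1.0, 1.0, 0.1), Z, Y)` returns a single number; smaller values mean the kernel parameters explain the extracted features better.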
In the dimension reduction analysis, because people live in three-dimensional space and cannot picture spaces of more than three dimensions, dimension reduction results for large data sets are difficult to analyze directly. The preprocessed audio signal is therefore fed into the dimension reduction model that has been built, and the resulting hidden parameter data is stored and displayed visually so that its strengths and weaknesses can be compared with those of other dimension reduction models. The present invention is not limited to the above embodiments, and the dimension reduction algorithm can be applied to other related fields within the knowledge of those skilled in the art.

Claims (3)

1. A kernel function-based audio feature signal dimension reduction method is characterized in that: the method comprises the following specific steps:
(1) audio signal acquisition: collecting an audio signal to obtain an audio sample;
(2) audio signal preprocessing: converting analog signals in the collected audio samples into digital signals, writing the digital signals into a WAV file, and performing filtering, pre-emphasis and framing processing on the digital signals written into the WAV file;
(3) characteristic parameter extraction: extracting characteristic parameters of a linear prediction coefficient, a linear prediction cepstrum coefficient and a Mel frequency cepstrum coefficient in the processed digital signal;
(4) building a dimension reduction model: feeding the extracted characteristic parameters into a dimension reduction model built with the kernel trick to directly obtain low-dimensional hidden variables, wherein the low-dimensional hidden variables are the dimension-reduced data;
the dimension reduction model is specifically built as follows:
(1) to build the dimension reduction model, first denote the hidden space as
Figure FDA0003562036870000011
with dimension q, and denote the observation space as
Figure FDA0003562036870000012
with dimension d, q < d; assume that the observed value and the hidden space parameter satisfy y = f(z) + ε, where the noise ε follows a Gaussian distribution with mean 0 and variance β, and assume that the hidden function f follows a Gaussian process with a squared exponential kernel function:
Figure FDA0003562036870000013
wherein σ is the coefficient parameter of the squared exponential kernel, l is the distance influence factor parameter between z and z', β is a hyperparameter of the model, δ(z, z') is the Kronecker delta function, and the parameters to be solved in the kernel function are θ = (σ, l, β); from the above formula, the kernel function takes its maximum value when z and z' are very close and its minimum value when they are far apart; the covariance matrix of the kernel function is computed as:
Figure FDA0003562036870000014
(2) assuming that the d dimensions of the observation space are sampled independently, the likelihood of the observations Y is as follows, where Y_{:,i} denotes the n elements of the i-th dimension of the observation space Y:
Figure FDA0003562036870000021
the parameters are solved by a particle swarm optimization algorithm: θ = (σ, l, β) is written as A = (a_1, a_2, a_3), the velocity of particle i is denoted v_i = (v_i1, v_i2, v_i3), and the best position the particles have passed through is denoted p_g = (p_g1, p_g2, p_g3); the position iteration formulas of the particle swarm algorithm are:
Figure FDA0003562036870000022
Figure FDA0003562036870000023
wherein w is a non-negative inertia factor; the acceleration constants c_1 and c_2 are non-negative; r_1 and r_2 are random numbers in the range [0, 1]; the information exchange mechanism of the particle swarm optimization algorithm is applied to the kernel parameter optimization process, the solved kernel parameters θ = (σ, l, β) are substituted back into the model to obtain the dimension reduction model, and the extracted characteristic parameters are fed into the dimension reduction model to obtain hidden parameters, which are the dimension-reduced data;
(5) dimension reduction result analysis: displaying the dimension-reduced data visually.
2. The kernel function-based audio feature signal dimension reduction method according to claim 1, wherein: the audio acquisition is performed by an audio acquisition device, and the audio acquisition device sets the sampling frequency, the number of sampling channels and the quantization precision when acquiring the audio signals.
3. The kernel function-based audio feature signal dimension reduction method according to claim 1, wherein: the audio signal pre-processing comprises the steps of:
(1) filtering the collected audio signal x(n) with a rectangular window function w(n) to obtain a signal y_a(n), wherein
Figure FDA0003562036870000024
(2) applying pre-emphasis to the filtered signal y_a(n) by the difference method to obtain a signal y_b(n), wherein y_b(n) = y_a(n) - α·y_a(n-1), α being the pre-emphasis coefficient, generally close to 1;
(3) dividing the pre-emphasized signal y_b(n) into a plurality of speech frames, with adjacent frames partially overlapping, the overlapped part being called the frame shift.
CN201810995309.7A 2018-08-29 2018-08-29 Kernel function-based audio characteristic signal dimension reduction method Active CN109065070B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810995309.7A CN109065070B (en) 2018-08-29 2018-08-29 Kernel function-based audio characteristic signal dimension reduction method

Publications (2)

Publication Number Publication Date
CN109065070A CN109065070A (en) 2018-12-21
CN109065070B true CN109065070B (en) 2022-07-19

Family

ID=64757611

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810995309.7A Active CN109065070B (en) 2018-08-29 2018-08-29 Kernel function-based audio characteristic signal dimension reduction method

Country Status (1)

Country Link
CN (1) CN109065070B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112444785B (en) * 2019-08-30 2024-04-12 华为技术有限公司 Target behavior recognition method, device and radar system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105679321A (en) * 2016-01-29 2016-06-15 宇龙计算机通信科技(深圳)有限公司 Speech recognition method and device and terminal
CN105913066A (en) * 2016-04-13 2016-08-31 刘国栋 Digital lung sound characteristic dimension reducing method based on relevance vector machine
CN106898362A (en) * 2017-02-23 2017-06-27 重庆邮电大学 The Speech Feature Extraction of Mel wave filters is improved based on core principle component analysis
CN109166591A (en) * 2018-08-29 2019-01-08 昆明理工大学 A kind of classification method based on audio frequency characteristics signal
CN109346104A (en) * 2018-08-29 2019-02-15 昆明理工大学 A kind of audio frequency characteristics dimension reduction method based on spectral clustering

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8756061B2 (en) * 2011-04-01 2014-06-17 Sony Computer Entertainment Inc. Speech syllable/vowel/phone boundary detection using auditory attention cues

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
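Title
"Hierarchical Gaussian Process Latent Variable Models";Neil D.Lawrence;《Machine Learning, Proceedings of the Twenty-Fourth International Conference》;20171230;pp. 20-24 *
"Semi-supervised Gaussian process latent variable model with pairwise";Xiumei Wang et al.;《Neurocomputing》;20101230;full text *
"A survey of speech emotion feature extraction and its dimension reduction methods";刘振焘 et al.;《Chinese Journal of Computers》;20181230;full text *
"Research on dimension reduction and recognition of Chinese digit speech based on speech features";高文曦;《China Master's Theses Full-text Database (Information Science and Technology)》;20120715;pp. 31-33 *
"Data dimension reduction and classification based on the Gaussian process latent variable model";张家源;《China Master's Theses Full-text Database (Information Science and Technology)》;20181015;full text *
"A survey of dimension reduction techniques and methods";张煜东;《Journal of Sichuan Ordnance》;20101030;full text *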

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant