CN109767756A - Speech feature extraction algorithm based on dynamically partitioned inverse discrete cosine transform cepstrum coefficients - Google Patents

Speech feature extraction algorithm based on dynamically partitioned inverse discrete cosine transform cepstrum coefficients

Info

Publication number
CN109767756A
CN109767756A (application CN201910087494.4A; granted as CN109767756B)
Authority
CN
China
Prior art keywords
cosine transform
discrete cosine
audio signal
cepstrum coefficient
inverse discrete
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910087494.4A
Other languages
Chinese (zh)
Other versions
CN109767756B (en)
Inventor
左毅
马赫
李铁山
贺培超
刘君霞
艾佳琪
肖杨
于仁海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dalian Maritime University
Original Assignee
Dalian Maritime University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dalian Maritime University filed Critical Dalian Maritime University
Priority to CN201910087494.4A priority Critical patent/CN109767756B/en
Publication of CN109767756A publication Critical patent/CN109767756A/en
Priority to JP2019186806A priority patent/JP6783001B2/en
Application granted granted Critical
Publication of CN109767756B publication Critical patent/CN109767756B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Landscapes

  • Complex Calculations (AREA)

Abstract

The invention discloses a speech feature extraction algorithm based on dynamically partitioned inverse discrete cosine transform (IDCT) cepstrum coefficients, comprising the following steps: S1, pre-process the audio signal by pre-emphasis, framing, and windowing; S2, transform the pre-processed audio signal from the time domain to the frequency domain; S3, use a clustering analysis algorithm to compute the similarity between the IDCT cepstrum coefficients obtained in step S2 and successively merge the two adjacent classes with the highest similarity, iterating until 24 classes remain. The resulting dynamically partitioned IDCT cepstrum coefficients are the speech features. The invention remedies the shortcoming of the prior art, which does not fully exploit the dynamic characteristics of speech when performing the frequency-domain transform, giving the invention wider applicability and higher recognition accuracy in speaker identification.

Description

Speech feature extraction algorithm based on dynamically partitioned inverse discrete cosine transform cepstrum coefficients
Technical field
The invention belongs to the field of speech feature extraction and applies unsupervised clustering analysis to speech feature extraction; in particular, it relates to a speech feature extraction algorithm based on dynamically partitioned inverse discrete cosine transform cepstrum coefficients.
Background technique
Speaker recognition technology comprises two parts: feature extraction and recognition modeling. Feature extraction is the key step in speaker recognition and directly determines the overall performance of the recognition system. In general, after a speech signal has been pre-processed by framing and windowing it yields high-dimensional data, so when speaker features are extracted the data dimension must be reduced by removing redundant information from the original speech. Existing methods use triangular filtering to convert the speech signal into feature vectors that meet the requirements on characteristic parameters, approximate the perceptual characteristics of human hearing, and to some extent enhance the speech signal while suppressing non-speech signals. Common characteristic parameters include: linear prediction coefficients (LPC), obtained by simulating the human phonation mechanism and analyzing a cascade model of short vocal-tract tubes; perceptual linear prediction (PLP) coefficients, based on an auditory model applied in spectral analysis, in which the input speech is processed by a model of human hearing in place of the time-domain signal used in linear predictive coding, and the all-pole model equivalent to LPC predicts the polynomial coefficients; Tandem and Bottleneck features, two classes of features extracted with neural networks; filter-bank (Fbank) features, which are equivalent to MFCC without the final discrete cosine transform and therefore retain more of the original speech data; and linear prediction cepstral coefficients (LPCC), which are based on a vocal-tract model, discard the excitation information of the signal-generation process, and represent the formants, an important characteristic, with a dozen or so cepstral coefficients.
MFCC is the most widely used speech characteristic parameter. It is extracted by first pre-processing the speech with pre-emphasis, framing, windowing, and the fast Fourier transform; the energy spectrum is then filtered by a bank of Mel-scale triangular filters, the log energy of each filter output is computed, and the MFCC coefficients are obtained through the discrete cosine transform (DCT); finally the Mel-scale cepstral parameters are computed and dynamic difference parameters are extracted, giving the Mel-frequency cepstral coefficients. In 2012, S. Al-Rawahy et al., drawing on the MFCC extraction method, divided the DCT cepstrum coefficients obtained after speech pre-processing into equal-width frequency regions and proposed the histogram-of-DCT-cepstrum-coefficients method. We found that equal-width frequency partitioning of the cepstrum coefficients ignores the dynamic characteristics of the speech data. On this basis the present invention proposes a new speech feature extraction algorithm: a method based on dynamically partitioned inverse discrete cosine transform cepstrum coefficients that, combining unsupervised learning, applies hierarchical clustering to group the speech data by the similarity of its dynamic characteristics, so as to extract dynamic feature vectors that better describe the speech.
In existing research, the most widely applied speech recognition technique uses MFCC as the speech feature vector, combined with machine-learning methods such as Gaussian mixture models (GMM), hidden Markov models (HMM), and support vector machines (SVM) for speaker pattern matching. The MFCC extraction process is: first pre-process the speech with pre-emphasis, framing, windowing, and the fast Fourier transform; then filter the energy spectrum with a bank of Mel-scale triangular filters; compute the log energy of each filter output and apply the discrete cosine transform (DCT) to obtain the MFCC coefficients; finally compute the Mel-scale cepstral parameters and extract the dynamic difference parameters, i.e. the Mel-frequency cepstral coefficients (MFCC).
In 2012, S. Al-Rawahy et al. identified the DCT cepstrum as a new feature and proposed a speech feature extraction algorithm based on equal-width frequency-domain DCT cepstrum coefficients. The pre-processed audio signal is transformed to the frequency domain, so that convolution in the time domain becomes multiplication of spectra; taking the logarithm turns the product into a sum, yielding the discrete cosine transform cepstrum coefficients (DCT cepstrum coefficients). The DCT cepstrum coefficients record the periodicity of the frequency range in non-linear increments: the frequency-domain feature intervals are divided every 50 Hz between 0 Hz and 600 Hz and every 100 Hz between 600 Hz and 1000 Hz, a process that can be regarded as counting the number of periods over the frequency ranges of the speech signal. The method is simpler and faster than MFCC feature extraction.
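For illustration, the fixed equal-width division described above can be sketched as follows. This is our own reading of the 50 Hz / 100 Hz scheme; the function name and the edge convention are assumptions, not part of the prior-art method as published.

```python
import numpy as np

def equal_width_edges():
    # Bin edges for the fixed division described above:
    # 50 Hz steps over 0-600 Hz, then 100 Hz steps over 600-1000 Hz.
    low = np.arange(0, 600, 50)        # 0, 50, ..., 550
    high = np.arange(600, 1001, 100)   # 600, 700, ..., 1000
    return np.concatenate([low, high])

edges = equal_width_edges()
# 12 bins of 50 Hz plus 4 bins of 100 Hz -> 17 edges in total
```

The dynamic partitioning proposed by the invention replaces this fixed grid with boundaries learned by clustering.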
Summary of the invention
The purpose of the present invention is to address the inaccurate frequency division in speech feature extraction algorithms based on equal-width frequency-domain partitioning of inverse discrete cosine transform cepstrum coefficients, by proposing a speech feature extraction algorithm based on dynamically partitioned inverse discrete cosine transform cepstrum coefficients. The technical means adopted by the present invention are as follows:
A speech feature extraction algorithm based on dynamically partitioned inverse discrete cosine transform cepstrum coefficients, comprising the following steps:
S1, pre-process the audio signal:
Apply pre-emphasis, framing, and windowing to the audio signal in turn;
Pre-processing eliminates the influence on audio quality of factors such as aliasing, higher-harmonic distortion, and high-frequency attenuation introduced by the human vocal organs themselves and by the equipment used to capture the audio signal. It ensures that the signal obtained in subsequent processing is more uniform and smooth, provides good parameters for speech feature extraction, and improves the quality of subsequent processing.
S2, transform the pre-processed audio signal from the time domain to the frequency domain:
Transform the pre-processed audio signal to the frequency domain, so that convolution in the time domain becomes multiplication of spectra; taking the logarithm turns the resulting product into a sum, yielding the inverse discrete cosine transform cepstrum coefficients (IDCT cepstrum coefficients). The specific process follows the formula:
C(q) = IDCT{ log |DCT{x(k)}| };
where DCT and IDCT denote the discrete cosine transform and the inverse discrete cosine transform respectively, x(k) is the input audio signal, i.e. the pre-processed audio signal, and C(q) is the output speech signal, i.e. the inverse discrete cosine transform cepstrum coefficients;
The inverse discrete cosine transform cepstrum coefficients form a data matrix. Because of the inherent frequency attribute of speech, all columns are attributes of the same kind during hierarchical clustering, so clustering proceeds by computing the similarity of adjacent columns.
S3, use a clustering analysis algorithm to compute the similarity between the inverse discrete cosine transform cepstrum coefficients obtained in step S2, and successively merge the two adjacent classes with the highest similarity; iterate this process until 24 classes remain. The resulting dynamically partitioned inverse discrete cosine transform cepstrum coefficients (DD-IDCT cepstrum coefficients) are the speech features.
The pre-emphasis is realized with a digital filter; the specific process follows the formula:
Y(n) = X(n) - aX(n-1);
where Y(n) is the output signal after pre-emphasis, X(n) is the input audio signal, a is the pre-emphasis coefficient, and n is the time index.
The average power spectrum of an audio signal is affected by glottal excitation and mouth-nose radiation: above roughly 800 Hz the high-frequency end falls off at about 6 dB/oct (per octave), so the higher the frequency, the smaller the corresponding component. The high-frequency part is therefore boosted before the audio signal is analyzed.
Speech analysis relies throughout on "short-time analysis". An audio signal is time-varying, but over a short interval (generally 10 to 30 ms) its characteristics remain essentially constant, i.e. relatively stable, so it can be regarded as a quasi-stationary process; in other words, the audio signal is short-time stationary. Any analysis and processing of an audio signal must therefore be built on a "short-time" basis: the signal is divided into segments, each of which is analyzed for its characteristic parameters. Each segment is called a "frame", with a frame length generally of 10 to 30 ms. For the whole audio signal, this yields a time series of characteristic parameters composed of the parameters of each frame.
The framing segments the pre-emphasized output signal into 20 ms frames.
Windowing is applied after framing. Its purpose is to make the speech signal more continuous globally and to avoid the Gibbs effect, so that the originally aperiodic speech signal exhibits some of the characteristics of a periodic function. The windowing uses a Hamming window.
The transform is the cepstrum transform.
The clustering analysis algorithm is hierarchical clustering.
The similarity computation uses the Euclidean distance.
Compared with the prior art, the present invention has the following advantages:
First, through an in-depth analysis of the properties of the speech feature extraction algorithm based on equal-width frequency-domain DCT cepstrum coefficients, the present invention remedies the prior art's failure to fully exploit the dynamic characteristics of speech in the frequency-domain transform, which gives the invention wider applicability and higher recognition accuracy in speaker identification.
Second, the present invention applies unsupervised clustering analysis to speech feature extraction, making the process concise and fast while occupying few computing resources.
Description of the drawings
To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from them without creative effort.
Fig. 1 is the flow chart of the speech feature extraction algorithm based on dynamically partitioned inverse discrete cosine transform cepstrum coefficients in a specific embodiment of the invention.
Fig. 2 is the clustering tree (dendrogram) in a specific embodiment of the invention.
Specific embodiment
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
As shown in Fig. 1, a speech feature extraction algorithm based on dynamically partitioned inverse discrete cosine transform cepstrum coefficients comprises the following steps:
S1, pre-process the audio signal:
Apply pre-emphasis, framing, and windowing to the audio signal in turn;
The pre-emphasis is realized with a digital filter; the specific process follows the formula:
Y(n) = X(n) - aX(n-1);
where Y(n) is the output signal after pre-emphasis, X(n) is the input audio signal, a is the pre-emphasis coefficient, and n is the time index; here a takes the value 0.97.
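As a sketch, the pre-emphasis filter Y(n) = X(n) - aX(n-1) with a = 0.97 might be implemented as follows. The vectorized form and the choice to pass the first sample through unchanged are our own; the patent does not specify the boundary handling.

```python
import numpy as np

def preemphasis(x, a=0.97):
    # Y(n) = X(n) - a * X(n-1); Y(0) = X(0) by convention
    x = np.asarray(x, dtype=float)
    y = np.copy(x)
    y[1:] -= a * x[:-1]
    return y

y = preemphasis([1.0, 1.0, 1.0, 1.0])
# each sample after the first becomes 1 - 0.97 = 0.03,
# illustrating how the filter boosts changes (high frequencies)
```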
The framing segments the pre-emphasized output signal into 20 ms frames.
The windowing uses a Hamming window.
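The framing and windowing of S1 could be sketched like this. The 8 kHz sampling rate and the use of non-overlapping frames are assumptions for illustration; the patent fixes only the 20 ms frame length and the Hamming window.

```python
import numpy as np

def frame_and_window(signal, fs, frame_ms=20):
    # Split the pre-emphasized signal into 20 ms frames and apply a Hamming window
    n = int(fs * frame_ms / 1000)            # samples per frame
    num = len(signal) // n                   # whole frames only; the tail is dropped
    frames = np.reshape(np.asarray(signal[:num * n], dtype=float), (num, n))
    return frames * np.hamming(n)            # broadcast the window over every frame

frames = frame_and_window(np.ones(1600), fs=8000)  # 8 kHz -> 160-sample frames
```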
S2, transform the pre-processed audio signal from the time domain to the frequency domain:
Transform the pre-processed audio signal to the frequency domain, so that convolution in the time domain becomes multiplication of spectra; taking the logarithm turns the resulting product into a sum, yielding the inverse discrete cosine transform cepstrum coefficients (IDCT cepstrum coefficients). The specific process follows the formula:
C(q) = IDCT{ log |DCT{x(k)}| };
where DCT and IDCT denote the discrete cosine transform and the inverse discrete cosine transform respectively, x(k) is the input audio signal, i.e. the pre-processed audio signal, and C(q) is the output speech signal, i.e. the inverse discrete cosine transform cepstrum coefficients; the transform is the cepstrum transform.
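The transform of S2, C(q) = IDCT{log|DCT{x(k)}|}, can be sketched per frame with SciPy's DCT routines. The orthonormal normalization and the small epsilon guarding against log(0) are our additions; the patent does not specify either.

```python
import numpy as np
from scipy.fft import dct, idct

def idct_cepstrum(frame, eps=1e-10):
    # C(q) = IDCT( log |DCT( x(k) )| ), computed on one windowed frame
    spectrum = np.abs(dct(frame, norm='ortho'))
    return idct(np.log(spectrum + eps), norm='ortho')

frame = np.sin(2 * np.pi * np.arange(160) / 16)  # a toy 160-sample frame
c = idct_cepstrum(frame)
```

Stacking the coefficient vectors of all frames (and speakers) gives the data matrix A on which step S3 operates.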
S3, use a clustering analysis algorithm to compute the similarity between the inverse discrete cosine transform cepstrum coefficients obtained in step S2, and successively merge the two adjacent classes with the highest similarity; iterate this process until 24 classes remain. The resulting dynamically partitioned inverse discrete cosine transform cepstrum coefficients are the speech features. The specific steps are as follows:
Matrix A represents the n-dimensional inverse discrete cosine transform cepstrum coefficients of m speakers obtained in step S2. As shown in Fig. 2, each column vector V1, V2, …, Vn of the inverse discrete cosine transform cepstrum coefficients is regarded as one of n classes, and the Euclidean distance between Vi and Vj is Dis(Vi, Vj) = sqrt( Σk (Vik - Vjk)² ). The specific steps of the clustering analysis are as follows:
First clustering:
l1 = Dis(V1, V2)
l2 = Dis(V2, V3)
…
ln-1 = Dis(Vn-1, Vn)
If i = arg min(l1, l2, l3, …, ln-1), the clustering result is
(V1), (V2), …, (Vi + Vi+1), …, (Vn).
Update:
li-1 = Dis(Vi-1, (Vi + Vi+1))
li = Dis((Vi + Vi+1), Vi+2)
li+1 = li+2
…
ln-2 = ln-1
Delete ln-1.
Second clustering:
If j = arg min(l1, l2, l3, …, ln-2), the clustering result is
(V1), (V2), …, (Vi + Vi+1), …, (Vj + Vj+1), …, (Vn).
Update again:
lj-1 = Dis(Vj-1, (Vj + Vj+1))
lj = Dis((Vj + Vj+1), Vj+2)
lj+1 = lj+2
…
ln-3 = ln-2
Delete ln-2.
Proceeding in this way, the hierarchical clustering continues until the final result is 24 classes; the resulting dynamically partitioned inverse discrete cosine transform cepstrum coefficients are the speech features. These features are fed into a GMM model for recognition to judge the feasibility of the algorithm.
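The merging procedure above can be sketched as follows. Representing a merged class by the sum of its columns matches the (Vi + Vi+1) notation in the steps, and treating the columns of A as the class vectors is our reading of the coefficient matrix; both are assumptions rather than details the patent spells out.

```python
import numpy as np

def cluster_adjacent_columns(A, target=24):
    # Start with each column as its own class, then repeatedly merge the
    # adjacent pair at minimum Euclidean distance until `target` classes remain.
    groups = [A[:, j].astype(float) for j in range(A.shape[1])]
    while len(groups) > target:
        d = [np.linalg.norm(groups[k] - groups[k + 1])   # lk = Dis(Vk, Vk+1)
             for k in range(len(groups) - 1)]
        i = int(np.argmin(d))                            # i = arg min(l1, ..., ln-1)
        groups[i] = groups[i] + groups[i + 1]            # merge into (Vi + Vi+1)
        del groups[i + 1]                                # distances re-derived next pass
    return np.stack(groups, axis=1)

A = np.random.default_rng(0).normal(size=(5, 40))        # toy coefficient matrix
features = cluster_adjacent_columns(A, target=24)        # 40 columns -> 24 classes
```

Recomputing all adjacent distances each pass is simpler than the in-place index updates of the steps above, at the cost of some redundant work.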
The clustering analysis algorithm is hierarchical clustering.
The similarity computation uses the Euclidean distance.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced by equivalents; such modifications or replacements do not take the essence of the corresponding technical solutions outside the scope of the technical solutions of the embodiments of the present invention.

Claims (7)

1. A speech feature extraction algorithm based on dynamically partitioned inverse discrete cosine transform cepstrum coefficients, characterized by the following steps:
S1, pre-process the audio signal:
Apply pre-emphasis, framing, and windowing to the audio signal in turn;
S2, transform the pre-processed audio signal from the time domain to the frequency domain:
Transform the pre-processed audio signal to the frequency domain, so that convolution in the time domain becomes multiplication of spectra; taking the logarithm turns the resulting product into a sum, yielding the inverse discrete cosine transform cepstrum coefficients. The specific process follows the formula
C(q) = IDCT{ log |DCT{x(k)}| };
where DCT and IDCT denote the discrete cosine transform and the inverse discrete cosine transform respectively, x(k) is the input audio signal, i.e. the pre-processed audio signal, and C(q) is the output speech signal, i.e. the inverse discrete cosine transform cepstrum coefficients;
S3, use a clustering analysis algorithm to compute the similarity between the inverse discrete cosine transform cepstrum coefficients obtained in step S2, and successively merge the two adjacent classes with the highest similarity; iterate this process until 24 classes remain; the resulting dynamically partitioned inverse discrete cosine transform cepstrum coefficients are the speech features.
2. The extraction algorithm according to claim 1, characterized in that the pre-emphasis is realized with a digital filter; the specific process follows the formula:
Y(n) = X(n) - aX(n-1);
where Y(n) is the output signal after pre-emphasis, X(n) is the input audio signal, a is the pre-emphasis coefficient, and n is the time index.
3. The extraction algorithm according to claim 1, characterized in that the framing segments the pre-emphasized output signal into 20 ms frames.
4. The extraction algorithm according to claim 1, characterized in that the windowing uses a Hamming window.
5. The extraction algorithm according to claim 1, characterized in that the transform is the cepstrum transform.
6. The extraction algorithm according to claim 1, characterized in that the clustering analysis algorithm is hierarchical clustering.
7. The extraction algorithm according to claim 1, characterized in that the similarity computation uses the Euclidean distance.
CN201910087494.4A 2019-01-29 2019-01-29 Sound characteristic extraction algorithm based on dynamic segmentation inverse discrete cosine transform cepstrum coefficient Active CN109767756B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910087494.4A CN109767756B (en) 2019-01-29 2019-01-29 Sound characteristic extraction algorithm based on dynamic segmentation inverse discrete cosine transform cepstrum coefficient
JP2019186806A JP6783001B2 (en) 2019-01-29 2019-10-10 Speech feature extraction algorithm based on dynamic division of cepstrum coefficients of inverse discrete cosine transform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910087494.4A CN109767756B (en) 2019-01-29 2019-01-29 Sound characteristic extraction algorithm based on dynamic segmentation inverse discrete cosine transform cepstrum coefficient

Publications (2)

Publication Number Publication Date
CN109767756A true CN109767756A (en) 2019-05-17
CN109767756B CN109767756B (en) 2021-07-16

Family

ID=66455625

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910087494.4A Active CN109767756B (en) 2019-01-29 2019-01-29 Sound characteristic extraction algorithm based on dynamic segmentation inverse discrete cosine transform cepstrum coefficient

Country Status (2)

Country Link
JP (1) JP6783001B2 (en)
CN (1) CN109767756B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110197657A (en) * 2019-05-22 2019-09-03 大连海事大学 A kind of dynamic speech feature extracting method based on cosine similarity
CN110299134A (en) * 2019-07-01 2019-10-01 中科软科技股份有限公司 A kind of audio-frequency processing method and system
CN110488675A (en) * 2019-07-12 2019-11-22 国网上海市电力公司 A kind of substation's Abstraction of Sound Signal Characteristics based on dynamic time warpping algorithm
CN112180762A (en) * 2020-09-29 2021-01-05 瑞声新能源发展(常州)有限公司科教城分公司 Nonlinear signal system construction method, apparatus, device and medium
CN112581939A (en) * 2020-12-06 2021-03-30 中国南方电网有限责任公司 Intelligent voice analysis method applied to power dispatching normative evaluation
CN113449626A (en) * 2021-06-23 2021-09-28 中国科学院上海高等研究院 Hidden Markov model vibration signal analysis method and device, storage medium and terminal

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112669874B (en) * 2020-12-16 2023-08-15 西安电子科技大学 Speech feature extraction method based on quantum Fourier transform
CN113793614B (en) * 2021-08-24 2024-02-09 南昌大学 Speech feature fusion speaker recognition method based on independent vector analysis
CN114783462A (en) * 2022-05-11 2022-07-22 安徽理工大学 Mine hoist fault source positioning analysis method based on CS-MUSIC

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101458950A (en) * 2007-12-14 2009-06-17 安凯(广州)软件技术有限公司 Method for eliminating interference from A/D converter noise to digital recording
US9606530B2 (en) * 2013-05-17 2017-03-28 International Business Machines Corporation Decision support system for order prioritization
CN106971712A (en) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 A kind of adaptive rapid voiceprint recognition methods and system
CN107293308A (en) * 2016-04-01 2017-10-24 腾讯科技(深圳)有限公司 A kind of audio-frequency processing method and device
CN109065071A (en) * 2018-08-31 2018-12-21 电子科技大学 A kind of song clusters method based on Iterative k-means Algorithm
CN109256127A (en) * 2018-11-15 2019-01-22 江南大学 A kind of Robust feature extracting method based on non-linear power transformation Gammachirp filter

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101458950A (en) * 2007-12-14 2009-06-17 安凯(广州)软件技术有限公司 Method for eliminating interference from A/D converter noise to digital recording
US9606530B2 (en) * 2013-05-17 2017-03-28 International Business Machines Corporation Decision support system for order prioritization
CN106971712A (en) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 A kind of adaptive rapid voiceprint recognition methods and system
CN107293308A (en) * 2016-04-01 2017-10-24 腾讯科技(深圳)有限公司 A kind of audio-frequency processing method and device
CN109065071A (en) * 2018-08-31 2018-12-21 电子科技大学 A kind of song clusters method based on Iterative k-means Algorithm
CN109256127A (en) * 2018-11-15 2019-01-22 江南大学 A kind of Robust feature extracting method based on non-linear power transformation Gammachirp filter

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
S.AL-RAWAHY ET AL.: "《Text-independent speaker identification system based on the histogram of DCT-cepstrum coefficients》", 《INTERNATIONAL JOURNAL OF KNOWLEDGE-BASED IN INTELLIGENT ENGINEERING SYSTEMS》 *
WEI HAN ET AL.: "《An efficient MFCC extraction method in speech recognition》", 《ISCAS 2006》 *
TIAN HUIPING: "Evaluation method combining the analytic hierarchy process with cluster analysis", East China Economic Management *
MIAO YUANWU: "Data analysis based on hierarchical clustering", China Masters' Theses Full-text Database, Information Science and Technology *
HU WENJING: "Change-point identification method based on hierarchical cluster analysis", China Masters' Theses Full-text Database, Information Science and Technology *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110197657A (en) * 2019-05-22 2019-09-03 大连海事大学 A kind of dynamic speech feature extracting method based on cosine similarity
CN110197657B (en) * 2019-05-22 2022-03-11 大连海事大学 Dynamic sound feature extraction method based on cosine similarity
CN110299134A (en) * 2019-07-01 2019-10-01 中科软科技股份有限公司 A kind of audio-frequency processing method and system
CN110488675A (en) * 2019-07-12 2019-11-22 国网上海市电力公司 A kind of substation's Abstraction of Sound Signal Characteristics based on dynamic time warpping algorithm
CN112180762A (en) * 2020-09-29 2021-01-05 瑞声新能源发展(常州)有限公司科教城分公司 Nonlinear signal system construction method, apparatus, device and medium
CN112581939A (en) * 2020-12-06 2021-03-30 中国南方电网有限责任公司 Intelligent voice analysis method applied to power dispatching normative evaluation
CN113449626A (en) * 2021-06-23 2021-09-28 中国科学院上海高等研究院 Hidden Markov model vibration signal analysis method and device, storage medium and terminal
CN113449626B (en) * 2021-06-23 2023-11-07 中国科学院上海高等研究院 Method and device for analyzing vibration signal of hidden Markov model, storage medium and terminal

Also Published As

Publication number Publication date
JP2020140193A (en) 2020-09-03
JP6783001B2 (en) 2020-11-11
CN109767756B (en) 2021-07-16

Similar Documents

Publication Publication Date Title
CN109767756A (en) A kind of speech feature extraction algorithm based on dynamic partition inverse discrete cosine transform cepstrum coefficient
CN112017644B (en) Sound transformation system, method and application
CN103928023B (en) A kind of speech assessment method and system
Deshwal et al. Feature extraction methods in language identification: a survey
Kumar et al. Design of an automatic speaker recognition system using MFCC, vector quantization and LBG algorithm
Ali et al. Automatic speech recognition technique for Bangla words
CN110942766A (en) Audio event detection method, system, mobile terminal and storage medium
Ryant et al. Highly accurate mandarin tone classification in the absence of pitch information
Linh et al. MFCC-DTW algorithm for speech recognition in an intelligent wheelchair
Nawas et al. Speaker recognition using random forest
Goyal et al. A comparison of Laryngeal effect in the dialects of Punjabi language
CN114283822A (en) Many-to-one voice conversion method based on gamma pass frequency cepstrum coefficient
CN114495969A (en) Voice recognition method integrating voice enhancement
Rabiee et al. Persian accents identification using an adaptive neural network
Sinha et al. Empirical analysis of linguistic and paralinguistic information for automatic dialect classification
Nancy et al. Audio based emotion recognition using Mel frequency Cepstral coefficient and support vector machine
Luo et al. Emotional Voice Conversion Using Neural Networks with Different Temporal Scales of F0 based on Wavelet Transform.
Gaudani et al. Comparative study of robust feature extraction techniques for ASR for limited resource Hindi language
Deiv et al. Automatic gender identification for hindi speech recognition
Lekshmi et al. An acoustic model and linguistic analysis for Malayalam disyllabic words: a low resource language
Tailor et al. Deep learning approach for spoken digit recognition in Gujarati language
Muthamizh Selvan et al. Spectral histogram of oriented gradients (SHOGs) for Tamil language male/female speaker classification
Bansal et al. Automatic speech recognition by cuckoo search optimization based artificial neural network classifier
Laleye et al. Automatic text-independent syllable segmentation using singularity exponents and rényi entropy
CN109979481A (en) A kind of speech feature extraction algorithm of the dynamic partition inverse discrete cosine transform cepstrum coefficient based on related coefficient

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant