CN109979481A - Speech feature extraction algorithm for dynamic-partition inverse discrete cosine transform cepstrum coefficients based on correlation coefficients - Google Patents
Speech feature extraction algorithm for dynamic-partition inverse discrete cosine transform cepstrum coefficients based on correlation coefficients
- Publication number
- CN109979481A (Application CN201910181526.7A)
- Authority
- CN
- China
- Prior art keywords
- discrete cosine
- cosine transform
- inverse discrete
- coefficient
- cepstrum coefficient
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/14—Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/45—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Probability & Statistics with Applications (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Complex Calculations (AREA)
Abstract
The invention discloses a speech feature extraction algorithm for dynamic-partition inverse discrete cosine transform (IDCT) cepstrum coefficients based on correlation coefficients, comprising the following steps. S1: pre-process the audio signal. S2: transform the pre-processed audio signal from the time domain to the frequency domain. S3: using a clustering algorithm, compute the similarity between adjacent columns of the IDCT cepstrum coefficient matrix obtained in step S2, and merge the adjacent pair whose summed correlation-coefficient vector is largest; iterate this procedure until the matrix is merged into 14 columns, yielding 14 classes. The resulting dynamic-partition IDCT cepstrum coefficients based on correlation coefficients are the speech features. The invention remedies the prior art's failure to exploit the inter-class similarity inherent in the signal after step S2, giving the method wider applicability and higher recognition accuracy in speaker identification.
Description
Technical field
The invention belongs to the field of speech feature extraction. It applies unsupervised clustering analysis to speech feature extraction, and in particular relates to a speech feature extraction algorithm for dynamic-partition inverse discrete cosine transform cepstrum coefficients based on correlation coefficients.
Background art
Speaker recognition technology comprises two parts: feature extraction and model-based identification. Feature extraction is the key step in speaker recognition and directly determines the overall performance of a speech recognition system. Ordinarily, after a speech signal has been pre-processed by framing and windowing, it yields a large volume of high-dimensional data, so when extracting speaker features the data dimensionality must be reduced by removing redundant information from the original speech. Existing methods filter the speech signal with a Mel-scale triangular filter bank, converting it into speech feature vectors that satisfy the characteristic-parameter requirements, approximate the perceptual characteristics of the human auditory system, and to some extent enhance the speech signal while suppressing non-speech signals. Common characteristic parameters include:

Linear prediction analysis coefficients, which simulate the human mechanism of sound production and are obtained by analysing a cascaded short-tube model of the vocal tract. Perceptual linear prediction coefficients, which apply an auditory model to spectral analysis: the input speech signal is processed through a model of human hearing in place of the time-domain signal used in linear predictive coding (LPC), yielding characteristic parameters equivalent to the prediction polynomial of the LPC all-pole model. Tandem and Bottleneck features, two classes of features extracted with neural networks. Filter-bank (Fbank) features, equivalent to MFCC with the final discrete cosine transform removed, which therefore retain more of the original speech data than MFCC features. Linear prediction cepstrum coefficients, important characteristic parameters that discard the vocal-excitation information of the channel-model signal generation process and represent the formant characteristics with a dozen or so cepstrum coefficients. MFCC, the most widely used speech characteristic parameter: the speech is first pre-processed by pre-emphasis, framing, windowing, and the fast Fourier transform; the energy spectrum is then filtered by a bank of Mel-scale triangular filters; the logarithmic energy of each filter output is computed and passed through a discrete cosine transform (DCT) to obtain the MFCC coefficients; finally, dynamic difference parameters are extracted to give the Mel cepstrum coefficients. In 2012, S.Al-Rawahya et al., referring to the MFCC feature extraction method, applied equal division in the frequency domain to the DCT cepstrum coefficients obtained after speech pre-processing and proposed the Histogram DCT cepstrum coefficient method.
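The MFCC pipeline summarised above (pre-emphasis, framing, windowing, Fourier transform, Mel-scale triangular filtering, log energy, DCT) can be sketched as follows. This is an illustrative reimplementation, not code from the patent; the sampling rate, frame and hop sizes, filter count, and coefficient count are common defaults rather than values fixed by this document.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, fs=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13, preemph=0.97):
    # Pre-emphasis: y(n) = x(n) - a*x(n-1)
    sig = np.append(signal[0], signal[1:] - preemph * signal[:-1])
    # Framing with a Hamming window
    n_frames = 1 + (len(sig) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = sig[idx] * np.hamming(frame_len)
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular Mel-scale filter bank
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(fs / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log filter-bank energies, then DCT -> cepstral coefficients
    feats = np.log(power @ fbank.T + 1e-10)
    return dct(feats, type=2, axis=1, norm='ortho')[:, :n_ceps]
```

With 1 s of 16 kHz audio this yields a (frames × 13) coefficient matrix, the kind of high-dimensional frame-wise data the background section describes.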
We observe that equal frequency-domain division of the cepstrum coefficients ignores the dynamic characteristics between adjacent columns of the speech data itself. On this basis, the present invention therefore proposes a new speech feature extraction algorithm, namely a dynamic-partition inverse discrete cosine transform cepstrum coefficient method based on correlation coefficients. Combined with unsupervised learning, it uses hierarchical clustering to cluster the speech data according to the similarity of its dynamic characteristics, thereby extracting dynamic feature vectors that better describe the characteristics of the speech.
In their 2012 research, S.Al-Rawahya et al. identified the DCT cepstrum as a new feature and proposed a speech feature extraction algorithm based on equal-frequency-domain DCT cepstrum coefficients. The pre-processed audio signal is transformed into the frequency domain: the convolution becomes a product of frequency-domain spectra; taking the logarithm turns the resulting components into a sum, giving the discrete cosine transform cepstrum coefficients (DCT cepstrum coefficients). The DCT cepstrum coefficients measure the periodicity of the recorded frequency range with nonlinear increments: the frequency-domain feature intervals are divided every 50 Hz between 0 Hz and 600 Hz, and every 100 Hz between 600 Hz and 1000 Hz, a process that can be regarded as counting the number of periods in the frequency range of the speech signal. It is simpler and faster than MFCC feature extraction.
The Pearson correlation coefficient, also known as the Pearson product-moment correlation coefficient (PPMCC or PCC), measures the correlation between two variables X and Y; its value lies between -1 and 1.
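As a minimal sketch, the Pearson coefficient defined here can be computed directly from its product-moment form; this is the standard definition, not code specific to the patent.

```python
import numpy as np

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))
```

Perfectly linearly related vectors give +1 (or -1 for a negative relation), matching the stated range of the coefficient.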
Summary of the invention
The purpose of the present invention is chiefly to address the inaccuracy of frequency division in speech feature extraction algorithms based on equal-frequency-domain-division inverse discrete cosine transform cepstrum coefficients, by proposing a speech feature extraction algorithm for dynamic-partition inverse discrete cosine transform cepstrum coefficients based on correlation coefficients. The technical means adopted by the invention are as follows:

A speech feature extraction algorithm for dynamic-partition inverse discrete cosine transform cepstrum coefficients based on correlation coefficients, comprising the following steps:
S1. Pre-process the audio signal:

Apply pre-emphasis, framing, and windowing to the audio signal in sequence.

Pre-processing eliminates factors that degrade audio quality, such as aliasing, higher-harmonic distortion, and high-frequency effects introduced both by the human vocal organs themselves and by the equipment used to capture the audio signal. It ensures that the signal obtained by subsequent processing is more uniform and smooth, provides good parameters for speech feature extraction, and improves the quality of subsequent processing.
S2. Transform the pre-processed audio signal from the time domain to the frequency domain:

The pre-processed audio signal is transformed into the frequency domain: the convolution becomes a product of frequency-domain spectra; taking the logarithm turns the resulting components into a sum, giving the inverse discrete cosine transform cepstrum coefficients (IDCT cepstrum coefficients). The specific process follows the formula:

C(q) = IDCT{ log |DCT{x(k)}| }

where DCT and IDCT denote the discrete cosine transform and the inverse discrete cosine transform respectively, x(k) is the pre-processed audio signal, and C(q) is the transformed output signal, i.e. the inverse discrete cosine transform cepstrum coefficients.

The inverse discrete cosine transform cepstrum coefficients form a data matrix. Because of the intrinsic frequency attributes of speech, all column attributes are identical during hierarchical clustering and the relative positions of the columns cannot change, so we compute the similarity between each pair of adjacent columns and merge the most similar adjacent pair, clustering step by step.
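The S2 formula C(q) = IDCT{ log |DCT{x(k)}| } can be sketched for one pre-processed frame as follows; the choice of orthonormal type-II transforms and the small logarithm floor are implementation assumptions not fixed by the patent.

```python
import numpy as np
from scipy.fftpack import dct, idct

def idct_cepstrum(frame):
    """C(q) = IDCT{ log |DCT{x(k)}| } for one pre-processed frame."""
    spec = dct(frame, type=2, norm='ortho')
    # A small floor avoids log(0) at exactly-zero spectral values.
    return idct(np.log(np.abs(spec) + 1e-12), type=2, norm='ortho')
```

Applying this per frame and stacking the results gives the m*n cepstrum coefficient matrix that step S3 clusters column by column.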
S3. Using a clustering algorithm, compute the similarity between adjacent columns of the inverse discrete cosine transform cepstrum coefficient matrix obtained in step S2, and merge the adjacent pair for which the summed correlation-coefficient vector is largest. Iterate this procedure until the matrix is merged into 14 columns, yielding 14 classes; the resulting dynamic-partition inverse discrete cosine transform cepstrum coefficients based on correlation coefficients are the speech features.
The pre-emphasis is realised by a digital filter; the specific process follows the formula:

Y(n) = X(n) - aX(n-1)

where Y(n) is the output signal after pre-emphasis, X(n) is the input audio signal, a is the pre-emphasis coefficient, and n is the time index.

The average power spectrum of the audio signal is shaped by the glottal excitation and by mouth and nose radiation: above roughly 800 Hz it rolls off at about 6 dB/oct (per octave), so the higher the frequency, the smaller the corresponding component. For this reason the high-frequency portion of the audio signal is boosted before analysis.
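The pre-emphasis filter above can be sketched as follows; the default a = 0.97 is the value the specific embodiment later adopts.

```python
import numpy as np

def preemphasis(x, a=0.97):
    """Y(n) = X(n) - a*X(n-1); the embodiment uses a = 0.97."""
    x = np.asarray(x, dtype=float)
    # First sample passes through unchanged (no X(-1) exists).
    return np.append(x[0], x[1:] - a * x[:-1])
```

The filter is a first-order high-pass, which boosts the high-frequency components attenuated by the roughly 6 dB/oct roll-off described above.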
Speech analysis throughout relies on "short-time analysis". An audio signal is time-varying, but within a short interval (generally 10 to 30 ms) its time-varying characteristics remain essentially constant, i.e. relatively stable, so it can be regarded as a quasi-stationary process: the audio signal is short-time stationary. Any analysis and processing of an audio signal must therefore be built on a "short-time" basis, i.e. by performing "short-time analysis": the audio signal is analysed in segments, each segment being called a "frame", with a frame length generally of 10 to 30 ms. In this way the analysis of the whole audio signal yields a time series of characteristic parameters composed of the parameters of each frame.

The framing segments the pre-emphasised output signal into frames of 20 ms each.

Windowing is applied after framing. Its purpose can be understood as making the speech signal more continuous globally and avoiding the Gibbs effect, so that a speech signal that was originally aperiodic exhibits some of the characteristics of a periodic function. The windowing uses a Hamming window.
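The 20 ms framing and Hamming windowing can be sketched as below. The sampling rate and the non-overlapping frame layout are assumptions, since the text fixes only the frame length and the window type.

```python
import numpy as np

def frame_and_window(x, fs=16000, frame_ms=20):
    """Split a signal into 20 ms frames and apply a Hamming window.

    fs and the non-overlapping layout are assumptions; the patent
    specifies only the 20 ms frame length and the Hamming window.
    """
    n = int(fs * frame_ms / 1000)  # samples per frame
    n_frames = len(x) // n
    frames = np.asarray(x[:n_frames * n], dtype=float).reshape(n_frames, n)
    return frames * np.hamming(n)
```

At 16 kHz each 20 ms frame holds 320 samples; the window tapers the frame edges, which is what suppresses the Gibbs effect described above.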
The transform is a cepstrum transform.

The clustering algorithm is a hierarchical clustering algorithm.
The similarity computation is the Pearson product-moment correlation coefficient; the specific steps of step S3 are then as follows:

Matrix A denotes the m*n-dimensional inverse discrete cosine transform cepstrum coefficients of a single person obtained in step S2. Each column vector V_1, V_2, ..., V_n of the inverse discrete cosine transform cepstrum coefficients is regarded as one of n classes, and the Pearson correlation coefficient of V_i and V_{i+1} is computed as r(V_i, V_{i+1}) = cov(V_i, V_{i+1}) / (σ(V_i)·σ(V_{i+1})).
The specific steps of the cluster analysis are as follows:

First clustering:

l_1 = r(V_1, V_2)
l_2 = r(V_2, V_3)
l_3 = r(V_3, V_4)
…
l_{n-1} = r(V_{n-1}, V_n)

Let the correlation-coefficient vector of the columns after clustering the first speaker's inverse discrete cosine transform cepstrum coefficients be p_1 = (l_1, l_2, l_3, ..., l_{n-1}); correspondingly, the vector for the M-th speaker is p_M. Sum the correlation-coefficient vectors of all speakers to obtain L = (L_1, ..., L_{n-1}).

Let i = argmin(L_1, ..., L_{n-1}); then the clustering result is:

(V_1), (V_2), ..., (V_i + V_{i+1}), ..., (V_n)

Update all speakers' inverse discrete cosine transform cepstrum coefficient correlation-coefficient vectors:

l_{i-1} = r(V_{i-1}, (V_i + V_{i+1}))
l_i = r((V_i + V_{i+1}), V_{i+2})
l_{i+1} = l_{i+2}
…
l_{n-2} = l_{n-1}

Delete l_{n-1}.
Second clustering:

Let j = argmin(L_1, ..., L_{n-2}); then the clustering result is:

(V_1), (V_2), ..., (V_i + V_{i+1}), ..., (V_j + V_{j+1}), ..., (V_n)

Update again:

l_{j-1} = r(V_{j-1}, (V_j + V_{j+1}))
l_j = r((V_j + V_{j+1}), V_{j+2})
l_{j+1} = l_{j+2}
…
l_{n-3} = l_{n-2}

Delete l_{n-2}.

Hierarchical clustering proceeds in this way until the final clustering result is 14 classes. The resulting dynamic-partition inverse discrete cosine transform cepstrum coefficients based on correlation coefficients are the speech features; these features are fed into a GMM model for identification to judge the feasibility of the algorithm.
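The iterative adjacent-column merging of step S3 can be sketched for a single speaker as follows. Merging by summation follows the (V_i + V_{i+1}) notation, and selecting the most-correlated adjacent pair follows the textual description of S3; the cross-speaker summation of the p_M vectors is omitted here for brevity, so this is an illustrative single-speaker sketch, not the patent's exact procedure.

```python
import numpy as np

def pearson(x, y):
    xc, yc = x - x.mean(), y - y.mean()
    denom = np.sqrt((xc @ xc) * (yc @ yc))
    return float(xc @ yc / denom) if denom > 0 else 0.0

def dynamic_partition(A, n_classes=14):
    """Merge the most-correlated ADJACENT column pair of the m x n
    cepstrum matrix A repeatedly until n_classes columns remain.

    Only adjacent pairs are candidates, since the relative positions
    of the columns cannot change; merged columns are summed,
    following the (V_i + V_{i+1}) notation.
    """
    cols = [A[:, j].astype(float) for j in range(A.shape[1])]
    while len(cols) > n_classes:
        r = [pearson(cols[j], cols[j + 1]) for j in range(len(cols) - 1)]
        i = int(np.argmax(r))  # most similar adjacent pair
        cols[i] = cols[i] + cols.pop(i + 1)
    return np.column_stack(cols)
```

Because merged columns are summed, each row's total is preserved while the column count shrinks to the 14 classes used as the speech feature.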
Compared with the prior art, the present invention has the following advantages:

First, through an in-depth analysis of the properties of the speech feature extraction algorithm based on equal-frequency-domain-division DCT cepstrum coefficients, the invention remedies the prior art's failure to exploit the inter-class similarity inherent in the signal after step S2, giving the invention wider applicability and higher recognition accuracy in speaker identification.

Second, the invention applies unsupervised clustering analysis to speech feature extraction, so that the invention has the advantages of a simple procedure, fast speed, and low consumption of computing resources.
Description of the drawings

To explain the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below illustrate only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative labour.

Fig. 1 is a flow chart of the speech feature extraction algorithm for dynamic-partition inverse discrete cosine transform cepstrum coefficients based on correlation coefficients in a specific embodiment of the invention.

Fig. 2 is a schematic diagram of the cluster analysis process of the inverse discrete cosine transform cepstrum coefficients in a specific embodiment of the invention.
Specific embodiment
To make the objects, technical solutions, and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the invention, not all of them. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the invention without creative work shall fall within the scope of protection of the invention.
As shown in Fig. 1, a speech feature extraction algorithm for dynamic-partition inverse discrete cosine transform cepstrum coefficients based on similarity computation comprises the following steps:

S1. Pre-process the audio signal:

Apply pre-emphasis, framing, and windowing to the audio signal in sequence.

The pre-emphasis is realised by a digital filter; the specific process follows the formula:

Y(n) = X(n) - aX(n-1)

where Y(n) is the output signal after pre-emphasis, X(n) is the input audio signal, a is the pre-emphasis coefficient, and n is the time index; here a takes the value 0.97.

The framing segments the pre-emphasised output signal into frames of 20 ms each.

The windowing uses a Hamming window.
S2. Transform the pre-processed audio signal from the time domain to the frequency domain:

The pre-processed audio signal is transformed into the frequency domain: the convolution becomes a product of frequency-domain spectra; taking the logarithm turns the resulting components into a sum, giving the inverse discrete cosine transform cepstrum coefficients (IDCT cepstrum coefficients). The specific process follows the formula:

C(q) = IDCT{ log |DCT{x(k)}| }

where DCT and IDCT denote the discrete cosine transform and the inverse discrete cosine transform respectively, x(k) is the pre-processed audio signal, and C(q) is the transformed output signal, i.e. the inverse discrete cosine transform cepstrum coefficients.
S3. Using a clustering algorithm, compute the similarity between adjacent columns of the inverse discrete cosine transform cepstrum coefficient matrix obtained in step S2, and merge the adjacent pair for which the summed correlation-coefficient vector is largest. Iterate this procedure until the matrix is merged into 14 columns, yielding 14 classes; the resulting dynamic-partition inverse discrete cosine transform cepstrum coefficients based on correlation coefficients are the speech features. The specific steps are as follows:

Matrix A denotes the m*n-dimensional inverse discrete cosine transform cepstrum coefficients of a single person obtained in step S2. As shown in Fig. 2, each column vector V_1, V_2, ..., V_n of the inverse discrete cosine transform cepstrum coefficients is regarded as one of n classes, and the Pearson correlation coefficient of V_i and V_{i+1} is computed as r(V_i, V_{i+1}) = cov(V_i, V_{i+1}) / (σ(V_i)·σ(V_{i+1})).
The specific steps of the cluster analysis are as follows:

First clustering:

l_1 = r(V_1, V_2)
l_2 = r(V_2, V_3)
l_3 = r(V_3, V_4)
…
l_{n-1} = r(V_{n-1}, V_n)

Let the correlation-coefficient vector of the columns after clustering the first speaker's inverse discrete cosine transform cepstrum coefficients be p_1 = (l_1, l_2, l_3, ..., l_{n-1}); correspondingly, the vector for the M-th speaker is p_M. Sum the correlation-coefficient vectors of all speakers to obtain L = (L_1, ..., L_{n-1}).

Let i = argmin(L_1, ..., L_{n-1}); then the clustering result is:

(V_1), (V_2), ..., (V_i + V_{i+1}), ..., (V_n)

Update all speakers' inverse discrete cosine transform cepstrum coefficient correlation-coefficient vectors:

l_{i-1} = r(V_{i-1}, (V_i + V_{i+1}))
l_i = r((V_i + V_{i+1}), V_{i+2})
l_{i+1} = l_{i+2}
…
l_{n-2} = l_{n-1}

Delete l_{n-1}.
Second clustering:

Let j = argmin(L_1, ..., L_{n-2}); then the clustering result is:

(V_1), (V_2), ..., (V_i + V_{i+1}), ..., (V_j + V_{j+1}), ..., (V_n)

Update again:

l_{j-1} = r(V_{j-1}, (V_j + V_{j+1}))
l_j = r((V_j + V_{j+1}), V_{j+2})
l_{j+1} = l_{j+2}
…
l_{n-3} = l_{n-2}

Delete l_{n-2}.

Hierarchical clustering proceeds in this way until the final clustering result is 14 classes. The resulting dynamic-partition inverse discrete cosine transform cepstrum coefficients based on correlation coefficients are the speech features; these features are fed into a GMM model for identification to judge the feasibility of the algorithm.
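Feeding the resulting features into a GMM for identification might look like the following sketch. scikit-learn's GaussianMixture, the component count, and the synthetic stand-in features are all assumptions for illustration; the patent does not specify a GMM implementation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Stand-in 14-dimensional feature matrices for two "speakers";
# real input would be the dynamic-partition IDCT cepstrum features.
feats_a = rng.normal(0.0, 1.0, size=(500, 14))
feats_b = rng.normal(5.0, 1.0, size=(500, 14))

# One GMM per enrolled speaker, fit on that speaker's features.
gmm_a = GaussianMixture(n_components=4, random_state=0).fit(feats_a)
gmm_b = GaussianMixture(n_components=4, random_state=0).fit(feats_b)

# Identification: choose the model with the higher average
# log-likelihood on the test utterance's feature matrix.
test = rng.normal(0.0, 1.0, size=(50, 14))
scores = {"A": gmm_a.score(test), "B": gmm_b.score(test)}
best = max(scores, key=scores.get)
```

Comparing recognition accuracy under such per-speaker GMMs is one plausible way to judge the feasibility of the extracted features, as the embodiment suggests.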
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the invention, not to limit them. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features replaced by equivalents; such modifications or replacements do not remove the essence of the corresponding technical solutions from the scope of the technical solutions of the embodiments of the invention.
Claims (7)
1. A speech feature extraction algorithm for dynamic-partition inverse discrete cosine transform cepstrum coefficients based on correlation coefficients, characterised by the following steps:

S1. Pre-process the audio signal:

Apply pre-emphasis, framing, and windowing to the audio signal in sequence.

S2. Transform the pre-processed audio signal from the time domain to the frequency domain:

The pre-processed audio signal is transformed into the frequency domain: the convolution becomes a product of frequency-domain spectra; taking the logarithm turns the resulting components into a sum, giving the inverse discrete cosine transform cepstrum coefficients. The specific process follows the formula

C(q) = IDCT{ log |DCT{x(k)}| }

where DCT and IDCT denote the discrete cosine transform and the inverse discrete cosine transform respectively, x(k) is the pre-processed speech signal, and C(q) is the transformed output signal, i.e. the inverse discrete cosine transform cepstrum coefficients;

S3. Using a clustering algorithm, compute the similarity between adjacent columns of the inverse discrete cosine transform cepstrum coefficient matrix obtained in step S2, and merge the adjacent pair for which the summed correlation-coefficient vector is largest; iterate this procedure until the matrix is merged into 14 columns, obtaining 14 classes; the resulting dynamic-partition inverse discrete cosine transform cepstrum coefficients based on correlation coefficients are the speech features.
2. The extraction algorithm according to claim 1, characterised in that the pre-emphasis is realised by a digital filter; the specific process follows the formula

Y(n) = X(n) - aX(n-1)

where Y(n) is the output signal after pre-emphasis, X(n) is the input audio signal, a is the pre-emphasis coefficient, and n is the time index.

3. The extraction algorithm according to claim 1, characterised in that the framing segments the pre-emphasised output signal into frames of 20 ms each.

4. The extraction algorithm according to claim 1, characterised in that the windowing uses a Hamming window.

5. The extraction algorithm according to claim 1, characterised in that the transform is a cepstrum transform.

6. The extraction algorithm according to claim 1, characterised in that the clustering algorithm is a hierarchical clustering algorithm.

7. The extraction algorithm according to claim 1, characterised in that the similarity computation is the Pearson product-moment correlation coefficient; the specific steps of step S3 are then as follows:
Matrix A denotes the m*n-dimensional inverse discrete cosine transform cepstrum coefficients of a single person obtained in step S2. Each column vector V_1, V_2, ..., V_n of the inverse discrete cosine transform cepstrum coefficients is regarded as one of n classes, and the Pearson correlation coefficient r(V_i, V_{i+1}) of V_i and V_{i+1} is computed.

The specific steps of the cluster analysis are as follows:

First clustering:

l_1 = r(V_1, V_2)
l_2 = r(V_2, V_3)
l_3 = r(V_3, V_4)
…
l_{n-1} = r(V_{n-1}, V_n)

Let the correlation-coefficient vector of the columns after clustering the first speaker's inverse discrete cosine transform cepstrum coefficients be p_1 = (l_1, l_2, l_3, ..., l_{n-1}); correspondingly, the vector for the M-th speaker is p_M. Sum the correlation-coefficient vectors of all speakers to obtain L = (L_1, ..., L_{n-1}).

Let i = argmin(L_1, ..., L_{n-1}); then the clustering result is:

(V_1), (V_2), ..., (V_i + V_{i+1}), ..., (V_n)

Update all speakers' inverse discrete cosine transform cepstrum coefficient correlation-coefficient vectors:

l_{i-1} = r(V_{i-1}, (V_i + V_{i+1}))
l_i = r((V_i + V_{i+1}), V_{i+2})
l_{i+1} = l_{i+2}
…
l_{n-2} = l_{n-1}

Delete l_{n-1}.

Second clustering:

Let j = argmin(L_1, ..., L_{n-2}); then the clustering result is:

(V_1), (V_2), ..., (V_i + V_{i+1}), ..., (V_j + V_{j+1}), ..., (V_n)

Update again:

l_{j-1} = r(V_{j-1}, (V_j + V_{j+1}))
l_j = r((V_j + V_{j+1}), V_{j+2})
l_{j+1} = l_{j+2}
…
l_{n-3} = l_{n-2}

Delete l_{n-2}.

Hierarchical clustering proceeds in this way until the final clustering result is 14 classes; the resulting dynamic-partition inverse discrete cosine transform cepstrum coefficients based on correlation coefficients are the speech features, and these features are fed into a GMM model for identification to judge the feasibility of the algorithm.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910181526.7A CN109979481A (en) | 2019-03-11 | 2019-03-11 | Speech feature extraction algorithm for dynamic-partition inverse discrete cosine transform cepstrum coefficients based on correlation coefficients |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910181526.7A CN109979481A (en) | 2019-03-11 | 2019-03-11 | Speech feature extraction algorithm for dynamic-partition inverse discrete cosine transform cepstrum coefficients based on correlation coefficients |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109979481A true CN109979481A (en) | 2019-07-05 |
Family
ID=67078590
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910181526.7A Pending CN109979481A (en) | 2019-03-11 | 2019-03-11 | Speech feature extraction algorithm for dynamic-partition inverse discrete cosine transform cepstrum coefficients based on correlation coefficients |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109979481A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101458950A (en) * | 2007-12-14 | 2009-06-17 | Anyka (Guangzhou) Software Technology Co., Ltd. | Method for eliminating interference from A/D converter noise to digital recording |
US9606530B2 (en) * | 2013-05-17 | 2017-03-28 | International Business Machines Corporation | Decision support system for order prioritization |
CN106971712A (en) * | 2016-01-14 | 2017-07-21 | Yutou Technology (Hangzhou) Co., Ltd. | A kind of adaptive rapid voiceprint recognition method and system |
CN107293308A (en) * | 2016-04-01 | 2017-10-24 | Tencent Technology (Shenzhen) Co., Ltd. | A kind of audio processing method and device |
CN109065071A (en) * | 2018-08-31 | 2018-12-21 | University of Electronic Science and Technology of China | A kind of song clustering method based on an iterative k-means algorithm |
CN109256127A (en) * | 2018-11-15 | 2019-01-22 | Jiangnan University | A kind of robust feature extraction method based on a nonlinear power-transformation Gammachirp filter |
Non-Patent Citations (5)
Title |
---|
S. Al-Rawahy et al., "Text-independent speaker identification system based on the histogram of DCT-cepstrum coefficients", International Journal of Knowledge-Based and Intelligent Engineering Systems * |
Wei Han et al., "An efficient MFCC extraction method in speech recognition", ISCAS 2006 * |
Tian Huiping, "An evaluation method combining the analytic hierarchy process and cluster analysis", East China Economic Management * |
Miao Yuanwu, "Data analysis based on hierarchical clustering", China Master's Theses Full-text Database, Information Science and Technology Series * |
Hu Wenjing, "A change-point identification method based on hierarchical cluster analysis", China Master's Theses Full-text Database, Information Science and Technology Series * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6783001B2 (en) | Speech feature extraction algorithm based on dynamic division of cepstrum coefficients of inverse discrete cosine transform | |
Ali et al. | Automatic speech recognition technique for Bangla words | |
Kumar et al. | Design of an automatic speaker recognition system using MFCC, vector quantization and LBG algorithm | |
CN108305616A (en) | A kind of audio scene recognition method and device based on long feature extraction in short-term | |
CN110942766A (en) | Audio event detection method, system, mobile terminal and storage medium | |
Almaadeed et al. | Text-independent speaker identification using vowel formants | |
US20080167862A1 (en) | Pitch Dependent Speech Recognition Engine | |
CN110648684B (en) | Bone conduction voice enhancement waveform generation method based on WaveNet | |
Rajesh Kumar et al. | Optimization-enabled deep convolutional network for the generation of normal speech from non-audible murmur based on multi-kernel-based features | |
Mehta et al. | Comparative study of MFCC and LPC for Marathi isolated word recognition system | |
Gamit et al. | Isolated words recognition using mfcc lpc and neural network | |
CN114298019A (en) | Emotion recognition method, emotion recognition apparatus, emotion recognition device, storage medium, and program product | |
Nancy et al. | Audio based emotion recognition using mel frequency cepstral coefficient and support vector machine | |
Luo et al. | Emotional Voice Conversion Using Neural Networks with Different Temporal Scales of F0 based on Wavelet Transform. | |
Yadav et al. | Non-Uniform Spectral Smoothing for Robust Children's Speech Recognition. | |
Gaudani et al. | Comparative study of robust feature extraction techniques for ASR for limited resource Hindi language | |
Deiv et al. | Automatic gender identification for hindi speech recognition | |
Lekshmi et al. | An acoustic model and linguistic analysis for Malayalam disyllabic words: a low resource language | |
Muthamizh Selvan et al. | Spectral histogram of oriented gradients (SHOGs) for Tamil language male/female speaker classification | |
CN109979481A (en) | A kind of speech feature extraction algorithm of the dynamic partition inverse discrete cosine transform cepstrum coefficient based on related coefficient | |
Musaev et al. | Advanced feature extraction method for speaker identification using a classification algorithm | |
CN113689885A (en) | Intelligent auxiliary guide system based on voice signal processing | |
Tripathi et al. | VOP detection for read and conversation speech using CWT coefficients and phone boundaries | |
Iswarya et al. | Speech query recognition for Tamil language using wavelet and wavelet packets | |
Ma et al. | A Euclidean metric based voice feature extraction method using IDCT cepstrum coefficient |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | ||
Application publication date: 20190705 |