CN107274887A - Speaker's Further Feature Extraction method based on fusion feature MGFCC - Google Patents
Speaker's Further Feature Extraction method based on fusion feature MGFCC
- Publication number
- CN107274887A CN107274887A CN201710322792.8A CN201710322792A CN107274887A CN 107274887 A CN107274887 A CN 107274887A CN 201710322792 A CN201710322792 A CN 201710322792A CN 107274887 A CN107274887 A CN 107274887A
- Authority
- CN
- China
- Prior art keywords
- speaker
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/02—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/26—Pre-filtering or post-filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/45—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2218/00—Aspects of pattern recognition specially adapted for signal processing
- G06F2218/08—Feature extraction
Abstract
The invention discloses a speaker secondary feature extraction method based on the fusion feature MGFCC, comprising the steps of: S1, processing the speaker's speech with a Mel filter bank to obtain MFCC features; S2, processing the same speech with a Gammatone filter bank to obtain GFCC features; S3, computing the per-dimension discrimination of the two feature sets in noisy environments; S4, counting, for each dimension of the two feature sets, the number of times it attains the maximum F_R value; S5, performing feature fusion according to the differing maximum counts of the two feature sets under each noise background; S6, differencing and recombining the fused features to obtain the secondary extracted features. The invention can extract features that characterize the speaker more comprehensively.
Description
Technical field
The present invention relates to the field of speaker recognition technology, and in particular to a speaker secondary feature extraction method based on the fusion feature MGFCC.
Background technology
After the series of preprocessing steps described above, feature extraction and computation must be performed on the speaker's speech to produce the sequence of mathematical vectors used as input to the training and recognition stages of a speaker recognition system. The quality of the extracted features is therefore critical to the training of the speaker identification model and the determination of its parameters, and affects the design and performance of the whole speaker system.
The choice of speaker features directly affects the achievable performance of the subsequent speaker recognition system and is the foundation on which the system is built. For a speaker recognition system in a practical application scenario, the choice of feature parameters must consider not only the recognition rate but, even more, the stability and robustness of the overall system. Extracting optimal speaker feature parameters is thus a particularly important processing step in the whole speaker recognition system, as well as one of the difficult problems in speech signal processing, with a direct impact on speaker recognition performance.
Summary of the invention
To improve the recognition rate of speaker recognition systems in noisy environments, the present invention studies speaker feature extraction from a bionic perspective, based on the auditory characteristics of the human ear. First, a Gammatone filter bank based on human auditory characteristics and a Mel filter bank are each used to simulate the human cochlea; then feature fusion is performed according to the discrimination, in noisy environments, of the Mel-frequency cepstral coefficients (MFCC) and the Gammatone-frequency cepstral coefficients (GFCC), yielding a speaker fusion feature based on human auditory characteristics: MGFCC.
To achieve this goal, the present invention adopts the following technical scheme: a speaker secondary feature extraction method based on the fusion feature MGFCC, characterized by comprising the following steps (an illustrative end-to-end sketch follows this list):
S1: processing the speaker's speech signal with a Mel filter bank to obtain MFCC features;
S2: processing the same speech signal with a Gammatone filter bank to obtain GFCC features;
S3: computing the per-dimension feature discrimination F_R of the MFCC features and the GFCC features in noisy environments;
S4: counting, for each dimension of the MFCC features and the GFCC features, the number of times it attains the maximum feature discrimination index;
S5: performing feature fusion according to the maximum feature-discrimination-index counts of the two feature sets under each noise background counted in step S4;
S6: differencing and recombining the fused feature obtained in step S5 to obtain the secondary extracted feature.
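For orientation, the following minimal Python sketch shows how steps S1–S6 chain together. Every name in it (mfcc, gfcc, gammatone_responses, frame_and_window, preemphasis, fuse_mgfcc, secondary_features, and the offline F_R counts passed in) is an illustrative helper sketched in the embodiments below, not terminology from the patent itself:

```python
import numpy as np

def extract_f_mgfcc(signal, fs, fr_mfcc_counts, fr_gfcc_counts):
    # S1: MFCC features of the speech signal (Mel filter bank path)
    mfcc_feats = mfcc(signal, fs)
    # S2: GFCC features of the same signal (Gammatone filter bank path)
    spectrum = np.fft.rfft(frame_and_window(preemphasis(signal)), 256)
    gfcc_feats = gfcc(spectrum, gammatone_responses(24, 256, fs))
    # S3-S4 are performed offline over a noisy corpus and summarized by the
    # per-dimension maximum-F_R counts fr_mfcc_counts / fr_gfcc_counts.
    # S5: per-dimension fusion into the 24-dimensional MGFCC
    fused = fuse_mgfcc(mfcc_feats, gfcc_feats, fr_mfcc_counts, fr_gfcc_counts)
    # S6: differencing and recombination into the secondary feature F_MGFCC
    return secondary_features(fused)
```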
In step S1, the MFCC features are extracted as follows:
S11: pre-emphasis of the speaker's speech signal: the signal is processed with a digital filter whose Z-domain transfer function is H(z) = 1 - 0.95 z^{-1};
S12: framing and windowing of the signal produced by step S11, each frame containing N samples; with window function w(n), the windowed speech signal s_w(n) is

s_w(n) = y(n) \cdot w(n)

where y(n) is the pre-emphasized signal and 0 ≤ n ≤ N;
the window function is a Hamming window, with its wide main lobe and low side lobes:

w(n) = \begin{cases} 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}

S13: fast Fourier transform (FFT): the signal produced by step S12 is transformed from the time domain to the frequency domain, giving the linear speech spectrum X(k):

X(k) = \sum_{n=0}^{N-1} s_w(n)\, e^{-j 2\pi n k / N}, \quad 0 \le n, k \le N-1

S14: the line energy of the data of each FFT frame is computed: E(k) = [X(k)]^2;
S15: the logarithm of each Mel filter's output is taken, giving the log spectrum S(m):

S(m) = \ln\left(\sum_{k=0}^{N-1} |X(k)|^2 H_m(k)\right), \quad 0 \le m < M

where H_m(k) is the frequency response of the m-th Mel filter and M is the number of Mel filters;
S16: the discrete cosine transform is applied to the log spectrum S(m) to obtain the MFCC features; the n-th dimension C(n) is

C(n) = \sum_{m=1}^{M-1} S(m)\cos\left[\frac{\pi n(m+1/2)}{M}\right], \quad 0 \le m < M.
In step S2, the GFCC features are extracted as follows:
S21: after preprocessing, the speaker's speech signal s(n) yields the time-domain signal x(n), whose fast Fourier transform gives the discrete power spectrum L(k):

L(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}, \quad 0 \le n, k \le N-1

S22: the discrete power spectrum L(k) is squared to obtain the speech energy spectrum, which is then filtered with the Gammatone filter bank;
S23: exponential compression is applied to the output of each filter, giving a set of energy spectra s_1, s_2, s_3, ..., s_M:

s_m = \sum_{k=1}^{N} \left[L(k)^2 H_m(k)\right]^{e(f)}

where e(f) is the exponential compression value, M is the number of filter channels, 1 ≤ m < M, and H_m(k) is the frequency response of the m-th Gammatone filter (throughout this invention, H_m(k) denotes a filter's frequency response);
S24: the DCT is applied to the compressed energy spectrum to obtain the GFCC features:

C_{GFCC}(j) = \sqrt{\frac{2}{M}}\sum_{m=1}^{M} s_m \cos\left[\frac{\pi j(m-1/2)}{M}\right], \quad j = 1, 2, 3, \ldots, L

where L is the dimension of the feature parameters, C_GFCC(j) denotes the GFCC feature parameters of the different dimensions, and M is the number of filters.
The feature discrimination index F_R is the ratio of the between-class variance to the within-class variance of a feature:

F_R = \frac{\sum_{i=1}^{H}(\mu_i - \mu)^2}{\frac{1}{K}\sum_{i=1}^{H}\sum_{j=1}^{K}(x_i^j - \mu_i)^2}

where μ is the feature mean over all speakers, x_i^j is the feature value of the j-th frame of the i-th speaker, μ_i is the feature mean of the i-th speaker, H is the total number of speakers, and K is the number of speech frames per speaker.
The feature recombination produces the secondary extracted feature F_MGFCC according to

F\_MGFCC = \begin{cases} MGFCC_i, & i = 0, 1, 2, \ldots, P-1 \\ MGFCC\_D_{(i-P)}, & i = P, P+1, P+2, \ldots, 2P-1 \end{cases}

where MGFCC_i denotes the fused feature, MGFCC_D_{(i-P)} denotes the differenced fused feature, and P is the feature order.
In summary, by adopting the above technical scheme, the beneficial effects of the invention are: under identical experimental conditions, the secondary feature extraction algorithm based on the fusion feature MGFCC retains a good recognition rate under complex noise and exhibits stronger robustness; it can further extract hidden speaker characteristics and thereby substantially improve the performance of a speaker recognition system.
Brief description of the drawings
Fig. 1 is the extraction flow chart of the MFCC feature parameters;
Fig. 2 is the extraction flow chart of the GFCC feature parameters;
Fig. 3 is the flow chart of speaker secondary feature extraction based on the fusion feature MGFCC.
Embodiments
The technical scheme in the embodiments of the present invention is described clearly and in detail below, with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention.
Fig. 1 is the extraction flow chart of the MFCC feature parameters, Fig. 2 is the extraction flow chart of the GFCC feature parameters, and Fig. 3 is the flow chart of speaker secondary feature extraction based on the fusion feature MGFCC. As shown in the figures, the present invention provides a speaker secondary feature extraction method based on the fusion feature MGFCC, comprising the following steps:
S1: processing the speaker's speech with a Mel filter bank to obtain MFCC features;
S2: processing the same speech with a Gammatone filter bank to obtain GFCC features;
S3: computing the per-dimension discrimination of the two feature sets in noisy environments;
S4: counting, for each dimension of the two feature sets, the number of times it attains the maximum F_R value;
S5: performing feature fusion according to the differing maximum counts of the two feature sets under each noise background;
S6: differencing and recombining the fused features to obtain the secondary extracted features.
The extraction of the MFCC features in step S1 of the speaker secondary feature extraction method based on the fusion feature MGFCC proceeds as follows (referring to Fig. 1):
S11: Pre-emphasis. The vocal cords and lips introduce a certain low-frequency influence during human speech production. To boost the high-frequency part of the speech signal and flatten its spectrum, pre-emphasis is applied, usually with a digital filter; here its Z-domain transfer function is

H(z) = 1 - 0.95 z^{-1}

S12: Framing and windowing. Because speech is short-term stationary, the signal can be split into frames, each containing N samples; N is usually 256, corresponding to about 30 ms. Because of edge effects, the two ends of a speech frame change sharply, so the pre-emphasized signal must be windowed. With window function w(n), the windowed speech signal s_w(n) is

s_w(n) = y(n) \cdot w(n)

where y(n) is the pre-emphasized signal and 0 ≤ n ≤ N.
S13: To express the speaker's characteristics better, a Hamming window, with its wide main lobe and low side lobes, is usually chosen:

w(n) = \begin{cases} 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}

S14: Fast Fourier transform (FFT). After preprocessing, each frame of the speech signal undergoes an FFT, transforming it from the time domain to the frequency domain and giving the linear speech spectrum X(k):

X(k) = \sum_{n=0}^{N-1} s_w(n)\, e^{-j 2\pi n k / N}, \quad 0 \le n, k \le N-1

S15: Line energy. The line energy of the data of each FFT frame is computed:

E(k) = [X(k)]^2

S16: To improve the recognition performance of the speaker recognition system, the logarithm of each filter's output is usually taken, giving the log spectrum S(m):

S(m) = \ln\left(\sum_{k=0}^{N-1} |X(k)|^2 H_m(k)\right), \quad 0 \le m < M

where H_m(k) is the frequency response of the m-th Mel filter and M is the number of Mel filters.
S17: Discrete cosine transform (DCT). Applying the DCT to the log spectrum S(m) yields the Mel features MFCC; the n-th dimension C(n) is

C(n) = \sum_{m=1}^{M-1} S(m)\cos\left[\frac{\pi n(m+1/2)}{M}\right], \quad 0 \le m < M

where M is the number of filters.
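As an illustration of steps S11–S17, here is a minimal NumPy sketch. The triangular Mel filters and the 50% frame overlap are common conventions assumed here; the patent does not fix either choice:

```python
import numpy as np

def preemphasis(x, a=0.95):
    # Step S11: H(z) = 1 - 0.95 z^{-1}
    return np.append(x[0], x[1:] - a * x[:-1])

def frame_and_window(x, N=256, hop=128):
    # Steps S12-S13: frames of N samples with a Hamming window
    w = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(N) / (N - 1))
    n_frames = 1 + (len(x) - N) // hop
    return np.stack([x[i * hop:i * hop + N] for i in range(n_frames)]) * w

def mel_filterbank(M, N, fs):
    # M triangular filters H_m(k) with centers equally spaced on the Mel scale
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), M + 2))
    bins = np.floor((N + 1) * pts / fs).astype(int)
    H = np.zeros((M, N // 2 + 1))
    for m in range(1, M + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        H[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        H[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return H

def mfcc(x, fs, N=256, M=24, n_ceps=24):
    frames = frame_and_window(preemphasis(x), N)
    X = np.fft.rfft(frames, N)                          # step S14: FFT
    E = np.abs(X) ** 2                                  # step S15: line energy
    S = np.log(E @ mel_filterbank(M, N, fs).T + 1e-12)  # step S16: log Mel spectrum
    n = np.arange(n_ceps)[:, None]
    m = np.arange(M)
    dct = np.cos(np.pi * n * (m + 0.5) / M)             # step S17: DCT basis
    return S @ dct.T                                    # (n_frames, n_ceps)
```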
The extraction of the GFCC features in the speaker secondary feature extraction method based on the fusion feature MGFCC proceeds as follows (referring to Fig. 2):
S21: after preprocessing, the speech signal s(n) yields the time-domain signal x(n), whose fast Fourier transform gives the discrete power spectrum L(k).
S22: the discrete power spectrum L(k) is squared to obtain the speech energy spectrum, which is then filtered with the Gammatone filter bank.
S23: to further improve the anti-interference capability of the speaker recognition system, exponential compression is applied to the output of each filter, giving a set of energy spectra s_1, s_2, s_3, ..., s_M:

s_m = \sum_{k=1}^{N} \left[L(k)^2 H_m(k)\right]^{e(f)}

where e(f) is the exponential compression value and M is the number of filter channels.
S24: the DCT is applied to the compressed energy spectrum to obtain the GFCC features:

C_{GFCC}(j) = \sqrt{\frac{2}{M}}\sum_{m=1}^{M} s_m \cos\left[\frac{\pi j(m-1/2)}{M}\right], \quad j = 1, 2, 3, \ldots, L

where L is the dimension of the feature parameters.
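A sketch of steps S22–S24 follows, assuming the Gammatone filter bank is represented by its magnitude responses H_m(k) on the FFT grid. The 4th-order magnitude approximation, the ERB-rate spacing of the center frequencies, and the constant compression value e = 0.1 are all assumptions; the patent specifies none of them:

```python
import numpy as np

def gammatone_responses(M, N, fs, f_lo=100.0):
    # Magnitude responses H_m(k) of an assumed 4th-order Gammatone bank,
    # center frequencies spaced on the ERB-rate scale (Glasberg & Moore).
    erbrate = lambda f: 21.4 * np.log10(0.00437 * f + 1.0)
    inv_erbrate = lambda e: (10.0 ** (e / 21.4) - 1.0) / 0.00437
    fc = inv_erbrate(np.linspace(erbrate(f_lo), erbrate(fs / 2.0), M))
    f = np.arange(N // 2 + 1) * fs / N
    b = 1.019 * 24.7 * (4.37 * fc / 1000.0 + 1.0)   # 1.019 * ERB(fc)
    # |H_m(f)| ~ [1 + ((f - fc_m)/b_m)^2]^(-2) for a 4th-order filter
    return (1.0 + ((f[None, :] - fc[:, None]) / b[:, None]) ** 2) ** -2.0

def gfcc(frames_spectrum, H, e=0.1, n_ceps=24):
    # frames_spectrum: spectrum L(k) per frame, shape (n_frames, K)
    # H: filter responses, shape (M, K); e: compression value e(f)
    M = H.shape[0]
    # Steps S22-S23: energy spectrum, filtering, exponential compression
    s = (((np.abs(frames_spectrum) ** 2)[:, None, :] * H[None, :, :]) ** e).sum(axis=2)
    # Step S24: C_GFCC(j) = sqrt(2/M) * sum_m s_m cos[pi j (m - 1/2) / M]
    j = np.arange(1, n_ceps + 1)[:, None]
    m = np.arange(1, M + 1)
    dct = np.sqrt(2.0 / M) * np.cos(np.pi * j * (m - 0.5) / M)
    return s @ dct.T                                 # (n_frames, n_ceps)
```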
In the speaker secondary feature extraction method based on the fusion feature MGFCC, the per-dimension discrimination of the MFCC and GFCC features in noisy environments is then computed as follows.
F_R is the ratio of the between-class variance to the within-class variance of a feature. This discrimination index can be used to judge a speaker feature's adaptability to noise: computing F_R for the speaker features under different environments gives a noise-robustness analysis of the features. F_R is defined as

F_R = \frac{\sum_{i=1}^{H}(\mu_i - \mu)^2}{\frac{1}{K}\sum_{i=1}^{H}\sum_{j=1}^{K}(x_i^j - \mu_i)^2}

where μ is the feature mean over all speakers, x_i^j is the feature value of the j-th frame of the i-th speaker, μ_i is the feature mean of the i-th speaker, H is the total number of speakers, and K is the number of speech frames per speaker.
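A direct NumPy transcription of the F_R formula for a single feature dimension; the (speakers × frames) layout of the input is an assumption of the sketch:

```python
import numpy as np

def feature_discrimination(feature):
    # feature: shape (H, K) -- H speakers, K speech frames per speaker
    mu_i = feature.mean(axis=1)                           # per-speaker means
    mu = mu_i.mean()                                      # mean over all speakers
    between = ((mu_i - mu) ** 2).sum()                    # between-class variance
    within = ((feature - mu_i[:, None]) ** 2).sum() / feature.shape[1]
    return between / within                               # F_R
```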
In the speaker secondary feature extraction method based on the fusion feature MGFCC, counting, for each dimension of the two feature sets, the number of times it attains the maximum F_R value proceeds as follows: features A and B are extracted for the speakers in a self-built speech corpus, and for each dimension of the two feature sets, the number of times P_max that it attains the maximum F_R value in the feature matrix is counted.
In the speaker secondary feature extraction method based on the fusion feature MGFCC, the feature fusion according to the differing maximum counts of the two feature sets under each noise background proceeds as follows (a sketch of the per-dimension selection in S53 is given after this list):
S51: 24-dimensional MFCC and GFCC feature parameters are extracted from the noisy speech signal;
S52: by the formula in step S3, the F_R value of each feature dimension is computed under factory noise and white noise at signal-to-noise ratios of 5 dB, 10 dB, and 15 dB;
S53: by the formula in step S4, the number of times each dimension attains the maximum F_R value is counted, and the features are fused according to the difference in maximum counts between MFCC and GFCC in each dimension, giving the 24-dimensional fusion feature MGFCC.
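One plausible reading of the per-dimension selection in step S53, sketched below: for each of the 24 dimensions, keep whichever feature set (MFCC or GFCC) attained the maximum F_R value more often across the tested noise conditions. The exact fusion rule is an interpretation; the patent only states that fusion follows the difference of the maximum counts:

```python
import numpy as np

def fuse_mgfcc(mfcc_feats, gfcc_feats, fr_mfcc_counts, fr_gfcc_counts):
    # mfcc_feats, gfcc_feats: (n_frames, 24) feature matrices
    # fr_*_counts: (24,) counts of maximum-F_R occurrences per dimension
    pick_mfcc = fr_mfcc_counts >= fr_gfcc_counts          # per-dimension choice
    return np.where(pick_mfcc[None, :], mfcc_feats, gfcc_feats)
```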
In the speaker secondary feature extraction method based on the fusion feature MGFCC, the differencing and recombination of the fusion feature MGFCC into the secondary extracted features proceed as follows.
S61: differencing the fusion feature.
The purpose of differencing a feature is to describe the continuous dynamic trajectory of the corresponding feature vectors. To make better use of the fusion feature MGFCC in expressing the speaker, the MGFCC features are differenced to obtain their continuous dynamic trajectory:

Feature\_D(j)_i = Feature(j)_i - Feature(j-1)_i, \quad 0 \le i \le P, \; 1 \le j \le R

where Feature is the original feature vector sequence, here MGFCC, Feature_D is the first difference of the original feature vectors, P is the feature order, and R is the number of feature vectors.
S62: recombining the fused features.
The formula in step S61 yields several groups of feature vectors, such as Feature and Feature_D, which are now recombined. Because different speaker features correspond to different speech-feature difference vectors, the vectors are weighted and recombined in a specific ratio, so as to better describe the speaker's personal information. A new group of speaker feature parameters F_MGFCC is obtained according to the following formula:

F\_MGFCC = \begin{cases} MGFCC_i, & i = 0, 1, 2, \ldots, P-1 \\ MGFCC\_D_{(i-P)}, & i = P, P+1, P+2, \ldots, 2P-1 \end{cases}
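A minimal sketch of steps S61–S62: each frame's MGFCC vector is differenced over time and concatenated with the original, per the F_MGFCC formula. Equal weighting of the two halves is an assumption, since the patent mentions weighting "in a specific ratio" without giving the ratio:

```python
import numpy as np

def secondary_features(mgfcc):
    # mgfcc: (R, P) sequence of R fused feature vectors of order P
    # Step S61: first difference over frames; the first row is zero-padded
    # so that shapes stay aligned (an assumption of the sketch).
    mgfcc_d = np.diff(mgfcc, axis=0, prepend=mgfcc[:1])
    # Step S62: recombination -- F_MGFCC has dimension 2P per frame
    return np.concatenate([mgfcc, mgfcc_d], axis=1)
```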
The foregoing are only preferred embodiments of the present invention and are not intended to limit the invention. Those skilled in the art can clearly make various changes and modifications to the present invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present invention and their technical equivalents, the present invention is intended to include them.
Claims (6)
1. A speaker secondary feature extraction method based on the fusion feature MGFCC, characterized by comprising the following steps:
S1: processing the speaker's speech signal with a Mel filter bank to obtain MFCC features;
S2: processing the same speech signal with a Gammatone filter bank to obtain GFCC features;
S3: computing the per-dimension feature discrimination F_R of the MFCC features and the GFCC features in noisy environments;
S4: counting, for each dimension of the MFCC features and the GFCC features, the number of times it attains the maximum feature discrimination index;
S5: performing feature fusion according to the maximum feature-discrimination-index counts of the two feature sets under each noise background counted in step S4;
S6: differencing and recombining the fused feature obtained in step S5 to obtain the secondary extracted feature.
2. The speaker secondary feature extraction method based on the fusion feature MGFCC according to claim 1, characterized in that the MFCC features are extracted in step S1 as follows:
S11: pre-emphasizing the speaker's speech signal with a digital filter whose Z-domain transfer function is H(z) = 1 - 0.95 z^{-1};
S12: framing and windowing the signal produced by step S11, each frame containing N samples; with window function w(n), the windowed speech signal s_w(n) is

s_w(n) = y(n) \cdot w(n)

where y(n) is the pre-emphasized signal and 0 ≤ n ≤ N;
S13: fast Fourier transform (FFT): transforming the signal produced by step S12 from the time domain to the frequency domain, giving the linear speech spectrum X(k):
X(k) = \sum_{n=0}^{N-1} s_w(n)\, e^{-j 2\pi n k / N}, \quad 0 \le n, k \le N-1
S14: computing the line energy of the data of each FFT frame: E(k) = [X(k)]^2;
S15: taking the logarithm of each Mel filter's output, giving the log spectrum S(m):
S(m) = \ln\left(\sum_{k=0}^{N-1} |X(k)|^2 H_m(k)\right), \quad 0 \le m < M
where H_m(k) is the frequency response of the m-th Mel filter and M is the number of Mel filters;
S16: applying the discrete cosine transform to the log spectrum S(m) to obtain the MFCC features, the n-th dimension C(n) being:
C(n) = \sum_{m=1}^{M-1} S(m)\cos\left[\frac{\pi n(m+1/2)}{M}\right], \quad 0 \le m < M.
3. The speaker secondary feature extraction method based on the fusion feature MGFCC according to claim 2, characterized in that the window function is a Hamming window, chosen for its wide main lobe and low side lobes:
w(n) = \begin{cases} 0.54 - 0.46\cos\left(\dfrac{2\pi n}{N-1}\right), & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}
4. The speaker secondary feature extraction method based on the fusion feature MGFCC according to claim 1, characterized in that the GFCC features are extracted in step S2 as follows:
S21: after preprocessing, the speaker's speech signal s(n) yields the time-domain signal x(n), whose fast Fourier transform gives the discrete power spectrum L(k):
L(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}, \quad 0 \le n, k \le N-1
S22: squaring the discrete power spectrum L(k) to obtain the speech energy spectrum, which is then filtered with the Gammatone filter bank;
S23: applying exponential compression to the output of each filter, giving a set of energy spectra s_1, s_2, s_3, ..., s_M:
s_m = \sum_{k=1}^{N} \left[L(k)^2 H_m(k)\right]^{e(f)}
where e(f) is the exponential compression value, M is the number of filters, 1 ≤ m < M, and H_m(k) is the frequency response of the m-th Gammatone filter;
S24: applying the DCT to the compressed energy spectrum to obtain the GFCC features:
C_{GFCC}(j) = \sqrt{\frac{2}{M}}\sum_{m=1}^{M} s_m \cos\left[\frac{\pi j(m-1/2)}{M}\right], \quad j = 1, 2, 3, \ldots, L
where L is the dimension of the feature parameters, C_GFCC(j) denotes the GFCC features, and M is the number of filters.
5. The speaker secondary feature extraction method based on the fusion feature MGFCC according to claim 1, characterized in that the feature discrimination index F_R is the ratio of the between-class variance to the within-class variance of a feature:
F_R = \frac{\sum_{i=1}^{H}(\mu_i - \mu)^2}{\frac{1}{K}\sum_{i=1}^{H}\sum_{j=1}^{K}(x_i^j - \mu_i)^2}, \quad i = 1, 2, 3, \ldots, H, \; j = 1, 2, 3, \ldots, K

\mu_i = \frac{1}{K}\sum_{j=1}^{K} x_i^j, \qquad \mu = \frac{1}{H}\sum_{i=1}^{H}\mu_i
where μ is the feature mean over all speakers, x_i^j is the feature value of the j-th frame of the i-th speaker, μ_i is the feature mean of the i-th speaker, H is the total number of speakers, and K is the number of speech frames per speaker.
6. The speaker secondary feature extraction method based on the fusion feature MGFCC according to claim 1, characterized in that the feature recombination produces the secondary extracted feature F_MGFCC according to the following formula:
F\_MGFCC = \begin{cases} MGFCC_i, & i = 0, 1, 2, \ldots, P-1 \\ MGFCC\_D_{(i-P)}, & i = P, P+1, P+2, \ldots, 2P-1 \end{cases}
where MGFCC_i denotes the fused feature, MGFCC_D_{(i-P)} denotes the differenced fused feature, and P is the feature order.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710322792.8A CN107274887A (en) | 2017-05-09 | 2017-05-09 | Speaker's Further Feature Extraction method based on fusion feature MGFCC |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710322792.8A CN107274887A (en) | 2017-05-09 | 2017-05-09 | Speaker's Further Feature Extraction method based on fusion feature MGFCC |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107274887A true CN107274887A (en) | 2017-10-20 |
Family
ID=60073910
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710322792.8A Pending CN107274887A (en) | 2017-05-09 | 2017-05-09 | Speaker's Further Feature Extraction method based on fusion feature MGFCC |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107274887A (en) |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104900229A (en) * | 2015-05-25 | 2015-09-09 | 桂林电子科技大学信息科技学院 | Method for extracting mixed characteristic parameters of voice signals |
Non-Patent Citations (3)
Title |
---|
Zhang Laihong et al., "A speech quality assessment algorithm based on dynamic distortion measurement of perceptual features", Techniques of Automation and Applications * |
Fang Qijun, "Research on speaker recognition technology based on VQ and HMM", China Masters' Theses Full-text Database, Information Science and Technology * |
Luo Yuan et al., "A new robust voiceprint feature extraction and fusion method", Computer Science * |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108109233A (en) * | 2017-12-14 | 2018-06-01 | 华南理工大学 | Multilevel security protection system based on biological information of human body |
CN109003364A (en) * | 2018-07-04 | 2018-12-14 | 深圳市益鑫智能科技有限公司 | A kind of Gate-ban Monitoring System of Home House based on speech recognition |
CN109147818A (en) * | 2018-10-30 | 2019-01-04 | Oppo广东移动通信有限公司 | Acoustic feature extracting method, device, storage medium and terminal device |
CN110363148A (en) * | 2019-07-16 | 2019-10-22 | 中用科技有限公司 | A kind of method of face vocal print feature fusion verifying |
CN111145736A (en) * | 2019-12-09 | 2020-05-12 | 华为技术有限公司 | Speech recognition method and related equipment |
CN111145736B (en) * | 2019-12-09 | 2022-10-04 | 华为技术有限公司 | Speech recognition method and related equipment |
CN111755012A (en) * | 2020-06-24 | 2020-10-09 | 湖北工业大学 | Robust speaker recognition method based on depth layer feature fusion |
CN115116452A (en) * | 2021-12-30 | 2022-09-27 | 昆明理工大学 | Information hiding method based on voice GFCC characteristic parameters |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107274887A (en) | Speaker's Further Feature Extraction method based on fusion feature MGFCC | |
Li et al. | An auditory-based feature extraction algorithm for robust speaker identification under mismatched conditions | |
CN108447495B (en) | Deep learning voice enhancement method based on comprehensive feature set | |
CN109036382B (en) | Audio feature extraction method based on KL divergence | |
CN109256127B (en) | Robust voice feature extraction method based on nonlinear power transformation Gamma chirp filter | |
CN110085245B (en) | Voice definition enhancing method based on acoustic feature conversion | |
CN110931022B (en) | Voiceprint recognition method based on high-low frequency dynamic and static characteristics | |
CN102568476B (en) | Voice conversion method based on self-organizing feature map network cluster and radial basis network | |
CN109949821B (en) | Method for removing reverberation of far-field voice by using U-NET structure of CNN | |
CN102664010B (en) | Robust speaker distinguishing method based on multifactor frequency displacement invariant feature | |
CN110942766A (en) | Audio event detection method, system, mobile terminal and storage medium | |
CN108564965B (en) | Anti-noise voice recognition system | |
CN112017682B (en) | Single-channel voice simultaneous noise reduction and reverberation removal system | |
CN111489763B (en) | GMM model-based speaker recognition self-adaption method in complex environment | |
CN110931023B (en) | Gender identification method, system, mobile terminal and storage medium | |
CN105225672A (en) | Merge the system and method for the directed noise suppression of dual microphone of fundamental frequency information | |
CN103021405A (en) | Voice signal dynamic feature extraction method based on MUSIC and modulation spectrum filter | |
CN115602165B (en) | Digital employee intelligent system based on financial system | |
CN105679321B (en) | Voice recognition method, device and terminal | |
CN104778948B (en) | A kind of anti-noise audio recognition method based on bending cepstrum feature | |
Shi et al. | Robust speaker recognition based on improved GFCC | |
CN114613389A (en) | Non-speech audio feature extraction method based on improved MFCC | |
Riazati Seresht et al. | Spectro-temporal power spectrum features for noise robust ASR | |
CN105845143A (en) | Speaker confirmation method and speaker confirmation system based on support vector machine | |
CN112017658A (en) | Operation control system based on intelligent human-computer interaction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20171020 |