CN113393847A - Voiceprint recognition method based on fusion of Fbank features and MFCC features
- Publication number: CN113393847A
- Application number: CN202110586134.6A
- Authority: CN (China)
- Keywords: fbank, mfcc, feature, fusion, voiceprint recognition
- Legal status: Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/22—Interactive procedures; Man-machine interfaces
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
- G10L17/04—Training, enrolment or model building
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a voiceprint recognition method based on the fusion of Fbank features and MFCC features. The method comprises preprocessing a speech data set, extracting Fbank features and MFCC features, and fusing the resulting 40-dimensional Fbank feature with the 12-dimensional MFCC feature. Tested on a generalized end-to-end model, the proposed fused feature outperforms the Fbank feature and the MFCC feature used alone. The feature fusion method reduces feature dimensionality, redundancy, storage space, and training complexity.
Description
Technical Field
The invention relates to the fields of speech signal processing and artificial intelligence, and in particular to a voiceprint recognition method based on the fusion of Fbank features and MFCC features.
Background
Voiceprint recognition, also known as speaker recognition, is a technique that extracts features representing a speaker's identity from a speech signal and recognizes the speaker based on those features. It is a biometric application whose fields of use are as important as those of fingerprint recognition and face recognition, and it offers advantages such as easy collection, contactless operation, and low cost. Voiceprint recognition can be applied in finance, smart locks, speaker-specific wake-up, and similar fields. As the range of applications expands, the demands placed on voiceprint recognition keep rising, so improving its performance is of great significance.
The voiceprint recognition process is generally divided into three modules: feature extraction, model construction, and scoring and decision. In the feature extraction module, commonly used voiceprint features include MFCC, Fbank, LPC, and PLP. Most current methods train on a single type of feature, and the only feature fusion method in use selects two different features and concatenates them directly.
Disclosure of Invention
To address the excessive dimensionality and redundancy caused by directly concatenating heterogeneous features in the prior art, the invention provides a voiceprint recognition method based on the fusion of Fbank features and MFCC features.
The voiceprint recognition method based on the fusion of Fbank features and MFCC features specifically comprises the following steps:
step one, preparing a speech data set and preprocessing the data set;
step two, extracting Fbank features;
applying a fast Fourier transform to the preprocessed speech frame sequence, taking the power spectrum (the squared magnitude), passing it through a Mel filter bank, and taking the logarithm to obtain the Fbank features;
step three, extracting MFCC features;
performing a discrete cosine transform on the Fbank features to obtain the MFCC features;
step four, feature fusion;
fusing the 40-dimensional Fbank feature and the 12-dimensional MFCC feature once both have been obtained.
Preferably, the Mel filter bank has 40 filters.
Preferably, performing the discrete cosine transform on the Fbank features to obtain the MFCC features specifically comprises: taking the 1st to 12th coefficients and applying the DCT to obtain 12-dimensional MFCC features.
Preferably, fusing the 40-dimensional Fbank feature and the 12-dimensional MFCC feature specifically comprises: embedding the 12 MFCC coefficients into positions 1 to 12 of the 40-dimensional Fbank feature.
Preferably, preprocessing the data set specifically comprises pre-emphasis, framing, and windowing, and finally outputs a speech frame sequence.
Preferably, the framing uses a frame length of 25 ms and a frame shift of 10 ms.
Preferably, the window used for windowing is a Hamming window.
Compared with the prior art, the invention has the following beneficial effects: tested on a generalized end-to-end model, the proposed fused feature outperforms the Fbank feature and the MFCC feature used alone. The feature fusion method also reduces feature dimensionality, redundancy, storage space, and training complexity.
Drawings
FIG. 1 is a flow chart of Fbank and MFCC feature extraction according to the invention;
FIG. 2 is a schematic diagram of the feature fusion method proposed by the invention.
Detailed Description
The following detailed description of the embodiments of the present invention is provided with reference to the accompanying drawings.
FIG. 1 depicts the flow for extracting Fbank features and MFCC features in voiceprint recognition. As shown in FIG. 1, feature extraction comprises preprocessing, a fast Fourier transform, taking the power spectrum (the squared magnitude), passing through a Mel filter bank, and taking the logarithm to obtain the Fbank features (a in FIG. 1), followed by a discrete cosine transform (DCT) to obtain the MFCC features (b in FIG. 1).
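The Fbank branch of this flow can be sketched in NumPy as follows. This is a minimal illustration, not the patented implementation: it assumes windowed frames are already available, and the 256-point FFT, the Hz-to-Mel formula, the triangular filter shape, and all function names are assumptions that the source does not specify.

```python
import numpy as np

def hz_to_mel(hz):
    # A common Hz-to-Mel mapping (assumed; the source does not give one).
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_bank(num_filters=40, nfft=256, sample_rate=8000):
    """Triangular filters spaced evenly on the Mel scale.

    Returns an array of shape (num_filters, nfft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             num_filters + 2)
    hz_points = mel_to_hz(mel_points)
    # Centre frequency of every FFT bin from 0 Hz to the Nyquist frequency.
    freqs = np.linspace(0.0, sample_rate / 2.0, nfft // 2 + 1)
    bank = np.zeros((num_filters, nfft // 2 + 1))
    for m in range(num_filters):
        left, centre, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (freqs - left) / (centre - left)
        falling = (right - freqs) / (right - centre)
        # Triangle: the smaller of the two slopes, clipped at zero.
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank

def fbank(frames, nfft=256, num_filters=40, sample_rate=8000):
    """Log Mel filter bank energies for frames of shape (num_frames, frame_len)."""
    spectrum = np.fft.rfft(frames, n=nfft, axis=1)       # FFT (zero-padded to nfft)
    power = (np.abs(spectrum) ** 2) / nfft                # squared magnitude
    energies = power @ mel_filter_bank(num_filters, nfft, sample_rate).T
    # Floor the energies so the logarithm is always defined.
    return np.log(np.maximum(energies, np.finfo(float).eps))
```

With 200-sample frames (25 ms at 8 kHz), `fbank(frames)` returns one 40-dimensional vector per frame.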
The preprocessing comprises pre-emphasis, framing, and windowing. Specifically, the sampling rate is 8 kHz, the frame length is 25 ms, the frame shift is 10 ms, and a Hamming window is used.
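A minimal NumPy sketch of these preprocessing settings is given below. The pre-emphasis coefficient of 0.97 and the choice to drop trailing samples that do not fill a whole frame are assumptions, since the source states neither; the function name `preprocess` is likewise illustrative.

```python
import numpy as np

def preprocess(signal, sample_rate=8000, frame_ms=25, shift_ms=10, preemph=0.97):
    """Pre-emphasize, frame, and Hamming-window a 1-D signal.

    preemph=0.97 is an assumed, commonly used value (not from the source).
    Returns an array of shape (num_frames, frame_len).
    """
    signal = np.asarray(signal, dtype=np.float64)
    # Pre-emphasis: y[n] = x[n] - a * x[n-1]; the first sample is kept as-is.
    emphasized = np.append(signal[0], signal[1:] - preemph * signal[:-1])

    frame_len = int(round(sample_rate * frame_ms / 1000))    # 200 samples at 8 kHz
    frame_shift = int(round(sample_rate * shift_ms / 1000))  # 80 samples at 8 kHz
    if len(emphasized) < frame_len:
        raise ValueError("signal is shorter than one frame")

    # Assumption: trailing samples that do not fill a whole frame are dropped.
    num_frames = 1 + (len(emphasized) - frame_len) // frame_shift
    starts = frame_shift * np.arange(num_frames)
    index = starts[:, None] + np.arange(frame_len)[None, :]
    frames = emphasized[index]

    # Apply a Hamming window to every frame.
    return frames * np.hamming(frame_len)
```

At 8 kHz, a 25 ms frame is 200 samples and a 10 ms shift is 80 samples, so one second of audio (8000 samples) yields 98 frames under these assumptions.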
FIG. 2 depicts the feature fusion method of the invention. The speech signal is preprocessed, transformed with a fast Fourier transform, converted to a power spectrum (the squared magnitude), and passed through a Mel filter bank. The Mel filter bank is a set of filters that models the response characteristics of the human ear; the number of filters is set to 40, which yields the 40-dimensional Fbank feature. Dimensions 1 to 12 of the Fbank feature are then transformed with the DCT to obtain the 12-dimensional MFCC feature, and the 12-dimensional MFCC feature is embedded into positions 1 to 12 of the Fbank feature to obtain the fused feature.
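One possible reading of this fusion step is sketched below in NumPy. It follows the text literally (a DCT over Fbank dimensions 1 to 12) and interprets "embedding" as overwriting those 12 positions, so the fused vector stays 40-dimensional rather than growing to 52 as direct concatenation would. Both the overwrite interpretation and the orthonormal DCT-II scaling are assumptions, and the names `dct_ii` and `fuse` are illustrative.

```python
import numpy as np

def dct_ii(x):
    """Orthonormal DCT-II along the last axis (scaling is an assumption)."""
    n = x.shape[-1]
    k = np.arange(n)[:, None]   # output coefficient index
    i = np.arange(n)[None, :]   # input sample index
    basis = np.cos(np.pi * (2 * i + 1) * k / (2 * n))  # basis[k, i]
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return (x @ basis.T) * scale

def fuse(fbank_feats, num_mfcc=12):
    """Fuse Fbank and MFCC features for an array of shape (num_frames, 40).

    Returns (fused, mfcc): fused has the same shape as the input, with its
    first num_mfcc columns replaced by the DCT of those same columns.
    """
    fbank_feats = np.asarray(fbank_feats, dtype=np.float64)
    # DCT over Fbank dimensions 1-12 (columns 0-11 with zero-based indexing).
    mfcc = dct_ii(fbank_feats[:, :num_mfcc])
    fused = fbank_feats.copy()
    # Assumed meaning of "embedding": overwrite positions 1-12 with the MFCCs.
    fused[:, :num_mfcc] = mfcc
    return fused, mfcc
```

Under this reading, positions 13 to 40 of the fused feature are the untouched Fbank values, and the input array is left unmodified.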
The proposed feature fusion method was evaluated on LSTM and BiLSTM network models trained with the generalized end-to-end loss. Compared with the MFCC and Fbank features used alone, it was verified to improve voiceprint recognition performance, which benefits the application of voiceprint recognition.
Results for the MFCC feature alone, the Fbank feature alone, and the fused feature were compared experimentally on the Bi-LSTM and LSTM models. The results show that the proposed feature fusion method effectively improves speaker recognition performance. Table 1 shows the results for the Bi-LSTM model, and Table 2 shows the results for the LSTM model.
TABLE 1
TABLE 2
This experiment compares the MFCC feature alone, the Fbank feature alone, and the proposed fused feature. The results on the Bi-LSTM and LSTM models show that the proposed method improves speaker recognition performance. The evaluation metric is the equal error rate (EER); the lower the EER, the better the result.
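For readers unfamiliar with the metric, the following sketch shows one simple way to approximate the EER from verification scores. It is an illustration only: the source does not describe its scoring procedure, and the function name, the threshold sweep over observed scores, and the convention that higher scores mean "same speaker" are all assumptions.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Approximate the EER by sweeping the threshold over all observed scores.

    target_scores: scores of same-speaker trials (higher = more similar).
    impostor_scores: scores of different-speaker trials.
    """
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # False rejection: a target trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: an impostor trial scored at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

The EER is the operating point at which the false acceptance rate equals the false rejection rate; with a finite set of trials the two rates rarely match exactly, so this sketch reports their average at the closest threshold.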
Claims (7)
1. A voiceprint recognition method based on the fusion of Fbank features and MFCC features, characterized by comprising the following steps:
step one, preparing a speech data set and preprocessing the data set;
step two, extracting Fbank features;
applying a fast Fourier transform to the preprocessed speech frame sequence, taking the power spectrum (the squared magnitude), passing it through a Mel filter bank, and taking the logarithm to obtain the Fbank features;
step three, extracting MFCC features;
performing a discrete cosine transform on the Fbank features to obtain the MFCC features;
step four, feature fusion;
fusing the 40-dimensional Fbank feature and the 12-dimensional MFCC feature once both have been obtained.
2. The voiceprint recognition method based on the fusion of Fbank features and MFCC features of claim 1, wherein: the Mel filter bank has 40 filters.
3. The voiceprint recognition method based on the fusion of Fbank features and MFCC features of claim 1, wherein: performing the discrete cosine transform on the Fbank features to obtain the MFCC features specifically comprises: taking the 1st to 12th coefficients and applying the DCT to obtain 12-dimensional MFCC features.
4. The voiceprint recognition method based on the fusion of Fbank features and MFCC features of claim 1, wherein: fusing the 40-dimensional Fbank feature and the 12-dimensional MFCC feature specifically comprises: embedding the 12 MFCC coefficients into positions 1 to 12 of the 40-dimensional Fbank feature.
5. The voiceprint recognition method based on the fusion of Fbank features and MFCC features of claim 1, wherein: preprocessing the data set specifically comprises pre-emphasis, framing, and windowing, and finally outputs a speech frame sequence.
6. The voiceprint recognition method based on the fusion of Fbank features and MFCC features of claim 5, wherein: the framing uses a frame length of 25 ms and a frame shift of 10 ms.
7. The voiceprint recognition method based on the fusion of Fbank features and MFCC features of claim 5, wherein: the window used for windowing is a Hamming window.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110586134.6A CN113393847B (en) | 2021-05-27 | 2021-05-27 | Voiceprint recognition method based on fusion of Fbank features and MFCC features |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110586134.6A CN113393847B (en) | 2021-05-27 | 2021-05-27 | Voiceprint recognition method based on fusion of Fbank features and MFCC features |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113393847A (en) | 2021-09-14 |
CN113393847B (en) | 2022-11-15 |
Family
ID=77619314
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110586134.6A (Active, granted as CN113393847B) | 2021-05-27 | 2021-05-27 | Voiceprint recognition method based on fusion of Fbank features and MFCC features |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113393847B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113782034A (en) * | 2021-09-27 | 2021-12-10 | 镁佳(北京)科技有限公司 | Audio identification method and device and electronic equipment |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105869644A (en) * | 2016-05-25 | 2016-08-17 | 百度在线网络技术(北京)有限公司 | Deep learning based voiceprint authentication method and device |
JP2017037222A (en) * | 2015-08-11 | 2017-02-16 | 日本電信電話株式会社 | Feature amount vector calculation device, voice recognition device, feature amount spectrum calculation method, and feature amount vector calculation program |
CN108305641A (en) * | 2017-06-30 | 2018-07-20 | 腾讯科技(深圳)有限公司 | The determination method and apparatus of emotion information |
CN108922556A (en) * | 2018-07-16 | 2018-11-30 | 百度在线网络技术(北京)有限公司 | sound processing method, device and equipment |
CN111724899A (en) * | 2020-06-28 | 2020-09-29 | 湘潭大学 | Parkinson audio intelligent detection method and system based on Fbank and MFCC fusion characteristics |
CN111785285A (en) * | 2020-05-22 | 2020-10-16 | 南京邮电大学 | Voiceprint recognition method for home multi-feature parameter fusion |
CN111863003A (en) * | 2020-07-24 | 2020-10-30 | 苏州思必驰信息科技有限公司 | Voice data enhancement method and device |
Non-Patent Citations (1)
Title |
---|
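Li Zheng et al.: "Optimization method for feature extraction in speaker recognition systems", Journal of Xiamen University (Natural Science) * |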
Also Published As
Publication number | Publication date |
---|---|
CN113393847B (en) | 2022-11-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||