CN113393847A - Voiceprint recognition method based on fusion of Fbank features and MFCC features - Google Patents


Publication number
CN113393847A
Authority
CN
China
Prior art keywords
fbank
mfcc
feature
fusion
voiceprint recognition
Prior art date
Legal status
Granted
Application number
CN202110586134.6A
Other languages
Chinese (zh)
Other versions
CN113393847B (en)
Inventor
周后盘
赵将焜
Current Assignee
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202110586134.6A
Publication of CN113393847A
Application granted
Publication of CN113393847B
Status: Active


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L17/04 Training, enrolment or model building
    • G10L17/22 Interactive procedures; Man-machine interfaces

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a voiceprint recognition method based on fusion of Fbank features and MFCC features. A speech data set is preprocessed, Fbank features and MFCC features are extracted, and feature fusion is performed on the resulting 40-dimensional Fbank features and 12-dimensional MFCC features. Tested on a generalized end-to-end model, the proposed fusion method outperforms single Fbank or MFCC features, while reducing feature dimensionality, redundancy, storage space, and training complexity.

Description

Voiceprint recognition method based on fusion of Fbank features and MFCC features
Technical Field
The invention relates to the fields of speech signal processing and artificial intelligence, and in particular to a voiceprint recognition method based on fusion of Fbank features and MFCC features.
Background
Voiceprint recognition, also known as speaker recognition, is a technique that extracts features representing a speaker's identity from a speech signal and recognizes the speaker's identity based on those features. Like fingerprint recognition and face recognition, voiceprint recognition is a biometric application with important fields of use, and it offers the advantages of convenient, contactless collection and low cost. It can be applied in finance, smart locks, wake-up by a specific speaker, and other fields. As its range of application expands, the demands placed on voiceprint recognition keep rising, so improving its performance is of great significance.
The voiceprint recognition process is generally divided into three modules: feature extraction, model construction, and scoring/decision. In the feature extraction module, commonly used voiceprint features include MFCC, Fbank, LPC, and PLP. Most current methods train on a single class of features; the only common feature fusion method is to select two different features and concatenate them directly.
Disclosure of Invention
Aiming at the excessive dimensionality and redundancy caused by directly concatenating heterogeneous features in the prior art, the invention provides a voiceprint recognition method based on fusion of Fbank features and MFCC features.
The voiceprint recognition method based on fusion of Fbank features and MFCC features disclosed by the invention specifically comprises the following steps:
step one, preparing a speech data set and preprocessing the data set;
step two, extracting the Fbank features:
applying a fast Fourier transform to the preprocessed sequence of speech frames, squaring the magnitude to obtain the power spectrum, passing it through a Mel filter bank, and taking the logarithm to obtain the Fbank features;
step three, extracting the MFCC features:
performing a discrete cosine transform on the Fbank features to obtain the MFCC features;
step four, feature fusion:
fusing the obtained 40-dimensional Fbank features with the 12-dimensional MFCC features.
Preferably, the Mel filter bank contains 40 filters.
Preferably, the MFCC features are obtained by performing a discrete cosine transform on the Fbank features, specifically: the 1st to 12th groups of coefficients are extracted and the DCT is performed to obtain the 12-dimensional MFCC features.
Preferably, the feature fusion of the 40-dimensional Fbank features and the 12-dimensional MFCC features is performed as follows: the MFCC features of groups 1-12 are embedded into groups 1-12 of the 40-dimensional Fbank features.
Preferably, preprocessing the data set specifically comprises pre-emphasis, framing, and windowing, finally outputting a sequence of speech frames.
Preferably, the framing adopts a 25 ms frame length and a 10 ms frame shift.
Preferably, a Hamming window is used for windowing.
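For illustration, steps two and three above can be sketched in NumPy as follows. This is a minimal sketch under common defaults: the triangular Mel filterbank construction, the FFT size of 256, and the unnormalized DCT-II basis are standard textbook choices that the patent text does not spell out.

```python
import numpy as np

def fbank_and_mfcc(frames, sample_rate=8000, n_fft=256, n_mels=40, n_mfcc=12):
    """Compute 40-dim Fbank and 12-dim MFCC features from windowed frames.

    `frames` is a (num_frames, frame_len) array of preprocessed speech frames.
    """
    # Step two: fast Fourier transform and squared magnitude -> power spectrum
    mag = np.abs(np.fft.rfft(frames, n_fft, axis=1))
    power = (mag ** 2) / n_fft

    # Triangular Mel filter bank with 40 filters (textbook construction)
    high_mel = 2595 * np.log10(1 + (sample_rate / 2) / 700)
    mel_pts = np.linspace(0.0, high_mel, n_mels + 2)
    hz_pts = 700 * (10 ** (mel_pts / 2595) - 1)
    bins = np.floor((n_fft + 1) * hz_pts / sample_rate).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        fb[m - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fb[m - 1, center:right] = (right - np.arange(center, right)) / max(right - center, 1)

    # Filter bank energies plus logarithm -> 40-dim Fbank features
    fbank = np.log(power @ fb.T + 1e-10)

    # Step three: DCT of the log energies, keeping coefficients 1-12 -> MFCC
    n = np.arange(n_mels)
    basis = np.cos(np.pi * np.outer(np.arange(n_mfcc), (2 * n + 1) / (2.0 * n_mels)))
    mfcc = fbank @ basis.T
    return fbank, mfcc
```

With 25 ms frames at the 8 kHz sampling rate used later in the description, each frame has 200 samples and is zero-padded to the 256-point FFT.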
Compared with the prior art, the invention has the following beneficial effects: tested on a generalized end-to-end model, the proposed feature fusion method outperforms single Fbank or MFCC features, while reducing feature dimensionality, redundancy, storage space, and training complexity.
Drawings
FIG. 1 is a flow chart of the Fbank and MFCC feature extraction according to the present invention;
FIG. 2 is a schematic diagram of the feature fusion method proposed by the present invention.
Detailed Description
The following detailed description of the embodiments of the present invention is provided with reference to the accompanying drawings.
Fig. 1 depicts the flow of extracting the Fbank features and the MFCC features in voiceprint recognition. As shown in Fig. 1, the extraction process comprises preprocessing, fast Fourier transform, and squaring the magnitude to obtain the power spectrum; passing through the Mel filter bank and taking the logarithm yields the Fbank features (a in Fig. 1), and a further discrete cosine transform (DCT) yields the MFCC features (b in Fig. 1).
The preprocessing comprises pre-emphasis, framing, and windowing. The specific details are as follows: the sampling rate is 8 kHz, the frame length is 25 ms, the frame shift is 10 ms, and a Hamming window is adopted.
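A minimal NumPy sketch of this preprocessing follows; the pre-emphasis coefficient of 0.97 is a common default assumed here, as the patent does not state one.

```python
import numpy as np

def preprocess(signal, sample_rate=8000, frame_ms=25, shift_ms=10, alpha=0.97):
    """Pre-emphasis, framing, and Hamming windowing of a 1-D speech signal."""
    # Pre-emphasis: boost high frequencies (coefficient 0.97 is an assumption)
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])

    # Framing: 25 ms frames with a 10 ms shift
    frame_len = int(sample_rate * frame_ms / 1000)    # 200 samples at 8 kHz
    frame_shift = int(sample_rate * shift_ms / 1000)  # 80 samples at 8 kHz
    num_frames = 1 + max(0, (len(emphasized) - frame_len) // frame_shift)
    idx = (np.arange(frame_len)[None, :]
           + frame_shift * np.arange(num_frames)[:, None])
    frames = emphasized[idx]

    # Windowing: apply a Hamming window to every frame
    return frames * np.hamming(frame_len)
```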
Fig. 2 depicts the specific process of the feature fusion method of the invention. The speech signal is preprocessed and fast Fourier transformed, the magnitude is squared to obtain the power spectrum, and the power spectrum is passed through the Mel filter bank. The Mel filter bank is a bank of filters fitted to the receiving characteristics of the human ear; with 40 filters, the 40-dimensional Fbank features are obtained. The 1st-12th dimensions of the Fbank features are taken and DCT transformed to obtain the 12-dimensional MFCC features, which are then embedded into positions 1-12 of the Fbank features to obtain the fused features.
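The fusion step itself reduces to an embedding, sketched below: the fused feature stays 40-dimensional, instead of the 52 dimensions that direct concatenation of the two features would produce.

```python
import numpy as np

def fuse(fbank, mfcc):
    """Embed the 12-dim MFCC features into dimensions 1-12 of the
    40-dim Fbank features (the fusion of Fig. 2)."""
    assert fbank.shape[1] == 40 and mfcc.shape[1] == 12
    fused = fbank.copy()
    fused[:, :12] = mfcc  # overwrite the first 12 Fbank dims with the MFCC
    return fused
```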
The proposed feature fusion method was evaluated on LSTM and BiLSTM network models trained with the generalized end-to-end loss. Compared with single MFCC or Fbank features, it is verified to improve the performance of voiceprint recognition, which is beneficial to its application.
The experiments compare single MFCC features, single Fbank features, and the fused features on both the Bi-LSTM and LSTM models; the results show that the proposed feature fusion method effectively improves speaker recognition performance. Table 1 shows the results on the Bi-LSTM model, and Table 2 shows the results on the LSTM model.
Table 1: results on the Bi-LSTM model (provided as an image in the original publication).
Table 2: results on the LSTM model (provided as an image in the original publication).
This experiment compares single MFCC features, single Fbank features, and the feature fusion method proposed by the invention. The results on the Bi-LSTM and LSTM models prove that the proposed method improves speaker recognition performance. The Equal Error Rate (EER) is adopted as the evaluation criterion; the lower the EER, the better the result.
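The EER criterion can be sketched as a threshold sweep that finds the operating point where the false-acceptance rate equals the false-rejection rate; this is a generic implementation for illustration, not code from the patent.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Equal Error Rate: sweep decision thresholds and return the error
    rate at the point where false acceptance and false rejection are
    (closest to) equal. `labels` marks target trials as true."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best_gap, eer = np.inf, 1.0
    for t in np.sort(np.unique(scores)):
        accept = scores >= t
        far = np.mean(accept[~labels])   # false acceptance rate
        frr = np.mean(~accept[labels])   # false rejection rate
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

A perfectly separating system has an EER of 0; random scoring approaches 0.5.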

Claims (7)

1. A voiceprint recognition method based on fusion of Fbank features and MFCC features, characterized by comprising the following steps:
step one, preparing a speech data set and preprocessing the data set;
step two, extracting the Fbank features:
applying a fast Fourier transform to the preprocessed sequence of speech frames, squaring the magnitude to obtain the power spectrum, passing it through a Mel filter bank, and taking the logarithm to obtain the Fbank features;
step three, extracting the MFCC features:
performing a discrete cosine transform on the Fbank features to obtain the MFCC features;
step four, feature fusion:
fusing the obtained 40-dimensional Fbank features with the 12-dimensional MFCC features.
2. The voiceprint recognition method based on fusion of Fbank features and MFCC features according to claim 1, wherein the Mel filter bank contains 40 filters.
3. The voiceprint recognition method based on fusion of Fbank features and MFCC features according to claim 1, wherein the MFCC features are obtained by performing a discrete cosine transform on the Fbank features, specifically: the 1st to 12th groups of coefficients are extracted and the DCT is performed to obtain the 12-dimensional MFCC features.
4. The voiceprint recognition method based on fusion of Fbank features and MFCC features according to claim 1, wherein the feature fusion of the 40-dimensional Fbank features and the 12-dimensional MFCC features specifically comprises: the MFCC features of groups 1-12 are embedded into groups 1-12 of the 40-dimensional Fbank features.
5. The voiceprint recognition method based on fusion of Fbank features and MFCC features according to claim 1, wherein preprocessing the data set specifically comprises pre-emphasis, framing, and windowing, finally outputting a sequence of speech frames.
6. The voiceprint recognition method based on fusion of Fbank features and MFCC features according to claim 5, wherein the framing adopts a 25 ms frame length and a 10 ms frame shift.
7. The voiceprint recognition method based on fusion of Fbank features and MFCC features according to claim 5, wherein a Hamming window is used for windowing.
CN202110586134.6A 2021-05-27 2021-05-27 Voiceprint recognition method based on fusion of Fbank features and MFCC features Active CN113393847B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110586134.6A CN113393847B (en) 2021-05-27 2021-05-27 Voiceprint recognition method based on fusion of Fbank features and MFCC features


Publications (2)

Publication Number Publication Date
CN113393847A true CN113393847A (en) 2021-09-14
CN113393847B CN113393847B (en) 2022-11-15

Family

ID=77619314

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110586134.6A Active CN113393847B (en) 2021-05-27 2021-05-27 Voiceprint recognition method based on fusion of Fbank features and MFCC features

Country Status (1)

Country Link
CN (1) CN113393847B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113782034A (en) * 2021-09-27 2021-12-10 镁佳(北京)科技有限公司 Audio identification method and device and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105869644A (en) * 2016-05-25 2016-08-17 百度在线网络技术(北京)有限公司 Deep learning based voiceprint authentication method and device
JP2017037222A (en) * 2015-08-11 2017-02-16 日本電信電話株式会社 Feature amount vector calculation device, voice recognition device, feature amount spectrum calculation method, and feature amount vector calculation program
CN108305641A (en) * 2017-06-30 2018-07-20 腾讯科技(深圳)有限公司 The determination method and apparatus of emotion information
CN108922556A (en) * 2018-07-16 2018-11-30 百度在线网络技术(北京)有限公司 sound processing method, device and equipment
CN111724899A (en) * 2020-06-28 2020-09-29 湘潭大学 Parkinson audio intelligent detection method and system based on Fbank and MFCC fusion characteristics
CN111785285A (en) * 2020-05-22 2020-10-16 南京邮电大学 Voiceprint recognition method for home multi-feature parameter fusion
CN111863003A (en) * 2020-07-24 2020-10-30 苏州思必驰信息科技有限公司 Voice data enhancement method and device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Li Zheng et al.: "Optimization method of feature extraction in speaker recognition systems", Journal of Xiamen University (Natural Science Edition) *


Also Published As

Publication number Publication date
CN113393847B (en) 2022-11-15

Similar Documents

Publication Publication Date Title
McLaren et al. Advances in deep neural network approaches to speaker recognition
CN109036382B (en) Audio feature extraction method based on KL divergence
CN102968990B (en) Speaker identifying method and system
CN103345923B (en) A kind of phrase sound method for distinguishing speek person based on rarefaction representation
CN108922541B (en) Multi-dimensional characteristic parameter voiceprint recognition method based on DTW and GMM models
CN110931022B (en) Voiceprint recognition method based on high-low frequency dynamic and static characteristics
CN109215665A (en) A kind of method for recognizing sound-groove based on 3D convolutional neural networks
CN109119072A (en) Civil aviaton's land sky call acoustic model construction method based on DNN-HMM
CN108281146A (en) A kind of phrase sound method for distinguishing speek person and device
CN103794207A (en) Dual-mode voice identity recognition method
CN1877697A (en) Method for identifying speaker based on distributed structure
CN109215634A (en) A kind of method and its system of more word voice control on-off systems
CN113393847B (en) Voiceprint recognition method based on fusion of Fbank features and MFCC features
CN109448702A (en) Artificial cochlea's auditory scene recognition methods
CN104778948A (en) Noise-resistant voice recognition method based on warped cepstrum feature
CN106782503A (en) Automatic speech recognition method based on physiologic information in phonation
CN115101076B (en) Speaker clustering method based on multi-scale channel separation convolution feature extraction
CN105845143A (en) Speaker confirmation method and speaker confirmation system based on support vector machine
CN112017658A (en) Operation control system based on intelligent human-computer interaction
CN104464738A (en) Vocal print recognition method oriented to smart mobile device
CN115472168B (en) Short-time voice voiceprint recognition method, system and equipment for coupling BGCC and PWPE features
CN110197657A (en) A kind of dynamic speech feature extracting method based on cosine similarity
CN112992155B (en) Far-field voice speaker recognition method and device based on residual error neural network
CN109003613A (en) The Application on Voiceprint Recognition payment information method for anti-counterfeit of combining space information
CN113628639A (en) Voice emotion recognition method based on multi-head attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant