CN112542174A - VAD-based multi-dimensional characteristic parameter voiceprint identification method - Google Patents
- Publication number: CN112542174A
- Application number: CN202011557161.2A
- Authority
- CN
- China
- Prior art keywords
- characteristic parameters
- frame
- signal
- frequency
- mfcc
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G10L17/20 — Speaker identification or verification techniques: pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions
- G10L17/02 — Speaker identification or verification techniques: preprocessing operations, e.g. segment selection; pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; feature selection or extraction
- G10L25/03 — Speech or voice analysis techniques characterised by the type of extracted parameters
- G10L25/24 — Speech or voice analysis techniques in which the extracted parameters are the cepstrum
- G10L25/78 — Detection of presence or absence of voice signals
- G10L25/87 — Detection of discrete points within a voice signal
Abstract
The invention discloses a VAD-based multi-dimensional characteristic parameter voiceprint recognition method. Step S1: read, pre-emphasize, frame and window the input voice signal, converting it into a preprocessed voice signal. Step S2: accurately detect the start and stop frames of the framed preprocessed signal through endpoint detection and remove the silent segments. Step S3: extract the MFCC characteristic parameters, MFCC normalized characteristic parameters, GFCC characteristic parameters and PNCC characteristic parameters of the voice signal after endpoint detection, and combine all four to form the multi-dimensional characteristic parameters. The method improves the accuracy of endpoint detection, reduces the amount of training data in the template-training phase, enhances robustness to noise interference, and effectively improves the efficiency of voiceprint recognition.
Description
Technical Field
The invention belongs to the technical field of voiceprint recognition, and particularly relates to a multi-dimensional characteristic parameter voiceprint recognition method based on VAD.
Background
Voiceprint recognition, also known as speaker recognition, is one of the biometric identification techniques. It is divided into two categories: speaker identification and speaker verification. Its theoretical basis is that each person's voice has unique characteristics by which different speakers can be effectively distinguished.
The main flow of voiceprint recognition is to read the voice files of the speakers in the training set and extract discriminative characteristic information from the read voice data through specific filters. Common feature-extraction methods include Mel-frequency cepstral coefficients (MFCC), Gammatone-filter cepstral coefficients (GFCC), power-normalized cepstral coefficients (PNCC) and identity vectors (i-vectors); template training is then performed with methods such as Gaussian mixture models (GMM), dynamic time warping (DTW) or artificial-neural-network template matching. Finally, features extracted from the audio data in the test set are matched against the trained templates to achieve voiceprint recognition.
In recent years, in order to improve the accuracy of voiceprint recognition, the following two types of feature extraction methods are mainly used in the field of speaker recognition.
(1) A single feature-extraction method is selected for training according to the type of audio and the signal-to-noise ratio. For example, with mainstream MFCC extraction, endpoint detection sometimes cannot accurately locate the start and stop points of the speech because of variations in frame length and frame shift; moreover, the standard cepstral parameter MFCC only reflects the static characteristics of the speech and performs poorly in terms of robustness.
(2) A difference spectrum is computed from the extracted static features by a difference (delta) method to represent the dynamic characteristics of the speech, and the dynamic and static features are then combined. This can effectively improve the recognition performance of the system, but recognition accuracy still drops sharply when the speech is heavily contaminated by noise.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a multi-dimensional characteristic parameter voiceprint recognition method based on VAD, so that the accuracy of endpoint detection is improved, the data volume of training in the template training stage is reduced, the anti-noise interference capability is enhanced, and the recognition efficiency of voiceprint recognition is effectively improved.
The invention provides a VAD-based multi-dimensional characteristic parameter voiceprint recognition method, which comprises the following steps:
step S1, reading, pre-emphasizing, framing and windowing the input voice signal, and converting the voice signal into a voice preprocessing signal;
step S2, accurately detecting the start and stop frames of the framed voice preprocessing signal through end point detection, and removing the mute section;
and step S3, extracting the MFCC characteristic parameters, MFCC normalized characteristic parameters, GFCC characteristic parameters and PNCC characteristic parameters of the voice signal after endpoint detection, and combining all four to form the multi-dimensional characteristic parameters.
As a further technical solution of the present invention, in step S1, the voice signal reading is to read wav format audio files in a training set, and a wavfile () method in a scipy. io library in python is adopted to obtain a one-dimensional array representing audio information.
Further, in step S1, the pre-emphasis boosts the high-frequency components with the transfer function H(z) = 1 − u·z⁻¹, where u is the pre-emphasis coefficient, with a value range of 0.9 to 1.
Further, in step S1, the framing and windowing are specifically as follows: the parameter model of the speech signal is approximately stationary within 10 ms to 30 ms, so 1 second contains 33 to 100 frames; adjacent frames have an overlap region, i.e. the frame shift, and the ratio of frame shift to frame length is 1/3 to 1/2. Finally each frame signal is multiplied by a Hamming window, whose expression is w(n) = (1 − a) − a·cos(2πn/(N − 1)), 0 ≤ n ≤ N − 1, where a is the Hamming-window coefficient.
Further, the endpoint detection in step S2 adopts the spectral-entropy method, where entropy characterizes the degree of order of a signal; the spectral-entropy method detects speech endpoints by detecting the flatness of the spectrum. Let the speech signal be x(m); after windowing and framing, the n-th frame x_n(m) is obtained, whose FFT is X_n(k), k being the index of the k-th spectral line. The short-time energy of the speech frame in the frequency domain is E_n = Σ_{k=0}^{N/2} |X_n(k)|², where N is the FFT length and only the positive-frequency part is taken. The energy spectrum of the k-th spectral line is Y_n(k) = |X_n(k)|², and the normalized spectral probability density of each frequency component is p_n(k) = Y_n(k) / Σ_{l=0}^{N/2} Y_n(l). The short-time spectral entropy of the frame is then H_n = −Σ_{k=0}^{N/2} p_n(k)·ln p_n(k), computed for every frame. The audio information from step S1 is trimmed by the spectral-entropy method, and the region rich in speech information is retained.
Further, in step S2, the endpoint detection adds a result-verification mechanism: when the spectral-entropy method fails, the framed speech signal is screened through an energy valve to remove the silent segments, the valve value being computed from the array of speech-signal frame energies.
Furthermore, in step S3, the specific method for extracting the MFCC characteristic parameters is,
FFT is carried out on each frame signal of the voice preprocessing signal to obtain the frequency spectrum of each frame, and the power spectrum of the voice signal is obtained by taking the modulus square of the frequency spectrum of the voice signal;
passing the energy spectrum through a Mel triangular filter bank, where the centre frequencies of the filters are spaced uniformly on the Mel scale, the bank contains 22 to 26 filters, the base points of each triangular filter lie at the centre frequencies of the adjacent filters, and the approximate relation between Mel frequency and frequency is Mel(f) = 2595·lg(1 + f/700), where f is the frequency in Hz;
and calculating the logarithmic energy output by each Mel filter, obtaining the MFCC feature vector through the discrete cosine transform (DCT) and retaining the default 13 cepstral coefficients.
Further, in step S3, the specific method for extracting the GFCC characteristic parameters is,
carrying out Fourier transform calculation on a specific signal frame; and taking an absolute value of the output frequency spectrum;
the energy spectrum is passed through a Gammatone filter bank containing 20 filters to obtain the output response of the filter bank, where N is the number of filter channels and M is the number of frames after sampling;
and calculating the logarithmic energy output by each filter, obtaining the GFCC feature vector through the discrete cosine transform (DCT) and retaining the default 13 cepstral coefficients.
Further, in step S3, the specific method for extracting the PNCC characteristic parameter is,
performing Fourier transform calculation on the signal frame, and taking an absolute value of an output frequency spectrum;
the energy spectrum is passed through a Gammatone filter bank containing 20 filters to obtain the output response of the filter bank, where N is the number of filter channels and M is the number of frames after sampling;
smoothing each frame, i.e. averaging over the 2 frames on each side, to obtain the average power;
and normalizing the average power, then obtaining the PNCC characteristic parameters through an exponential (power-law) function, which better matches the auditory characteristics of the human ear, and the discrete cosine transform (DCT).
Further, in step S3, the dimension of the multi-dimensional feature parameter formed by combination is 52 dimensions.
The advantages of the invention are as follows: the speech frames are trimmed by endpoint detection, and the added energy-valve device can eliminate the interference of silent and noise frames on the feature-extraction result, strengthening the reliability of the endpoint-detection result; the MFCC normalized characteristic parameters, MFCC characteristic parameters, Gammatone-filter (GFCC) characteristic parameters and power-normalized (PNCC) characteristic parameters are combined into a multi-dimensional feature coefficient, markedly improving recognition accuracy.
Drawings
FIG. 1 is a flow chart of voiceprint recognition in accordance with the present invention;
FIG. 2 is a schematic diagram illustrating the effect of the endpoint detection method of the present invention;
FIG. 3 is a diagram of a multi-dimensional feature set according to the present invention.
Detailed Description
Referring to fig. 1, the present embodiment provides a VAD-based multi-dimensional characteristic parameter voiceprint recognition method. Building on conventional feature-parameter extraction and training over a speech library, it improves the feature-extraction stage and mainly comprises three parts — speech-signal preprocessing, endpoint detection and feature-parameter extraction — with the following steps:
step S1, reading, pre-emphasizing, framing and windowing the input voice signal, and converting the voice signal into a voice preprocessing signal;
step S2, accurately detecting the start and stop frames of the framed voice preprocessing signal through end point detection, and removing the mute section;
and step S3, extracting the MFCC characteristic parameters, MFCC normalized characteristic parameters, GFCC characteristic parameters and PNCC characteristic parameters of the voice signal after endpoint detection, and combining all four to form the multi-dimensional characteristic parameters.
In step S1, the voice signal reading is to read wav format audio files in the training set, and obtain a one-dimensional array representing audio information by using the wavfile () method in the scipy.
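The patent reads wav files with `scipy.io.wavfile`; the dependency-free sketch below uses Python's standard-library `wave` module instead (an assumed stand-in, behaving analogously for 16-bit mono PCM) to obtain the same kind of one-dimensional sample array:

```python
import wave
import array

def read_wav_mono(path):
    """Read a 16-bit mono PCM wav file into a one-dimensional array of samples."""
    with wave.open(path, "rb") as w:
        assert w.getsampwidth() == 2 and w.getnchannels() == 1
        frames = w.readframes(w.getnframes())
        return w.getframerate(), array.array("h", frames)

# Write a short demo file, then read it back as the training set would be read.
samples = array.array("h", [0, 1000, -1000, 2000])
with wave.open("demo.wav", "wb") as w:
    w.setnchannels(1)      # mono
    w.setsampwidth(2)      # 16-bit
    w.setframerate(16000)  # 16 kHz sampling rate
    w.writeframes(samples.tobytes())

rate, data = read_wav_mono("demo.wav")
```

With `scipy` available, `scipy.io.wavfile.read(path)` returns the same (rate, array) pair directly.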
In step S1, since the frequency response of the glottal pulse approximates a second-order low-pass filter and the radiation response of the oral cavity approximates a low-order high-pass filter, the spectrum of the speech signal exhibits a raised-cosine roll-off: the high-frequency components are usually much smaller than the low-frequency components. To raise the high-frequency resolution of the speech signal and highlight the formants of the high-frequency part, the speech signal is pre-emphasized. The purpose of pre-emphasis is to compensate for the loss of the high-frequency components; they are boosted by passing the input speech signal through a high-pass filter with transfer function H(z) = 1 − u·z⁻¹, where u is the pre-emphasis coefficient, with a value range of 0.9 to 1 and typically 0.97.
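In the time domain the pre-emphasis filter reduces to a first-order difference, which can be sketched as:

```python
def pre_emphasis(x, u=0.97):
    """y[n] = x[n] - u * x[n-1]: the time-domain form of H(z) = 1 - u*z^-1,
    boosting high-frequency components relative to low ones."""
    return [x[0]] + [x[n] - u * x[n - 1] for n in range(1, len(x))]

y = pre_emphasis([1.0, 1.0, 1.0])
```

A constant (DC) signal is almost entirely suppressed after the first sample, illustrating the high-pass behaviour.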
In step S1, the framing and windowing are specifically as follows: the parameter model of the speech signal is approximately stationary within 10 ms to 30 ms, so 1 second contains 33 to 100 frames; adjacent frames have an overlap region, i.e. the frame shift, and the ratio of frame shift to frame length is 1/3 to 1/2. Finally each frame signal is multiplied by a Hamming window, whose expression is w(n) = (1 − a) − a·cos(2πn/(N − 1)), 0 ≤ n ≤ N − 1, where a is the Hamming-window coefficient, taken as 0.46.
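The framing and windowing step can be sketched as follows (frame length 400 and shift 160 correspond to 25 ms and 10 ms at 16 kHz, a shift/length ratio of 0.4, within the stated 1/3–1/2 range):

```python
import math

def frame_signal(x, frame_len, frame_shift):
    """Split the signal into overlapping frames of frame_len samples,
    advancing by frame_shift samples each time."""
    return [x[i:i + frame_len]
            for i in range(0, len(x) - frame_len + 1, frame_shift)]

def hamming(N, a=0.46):
    """w(n) = (1 - a) - a*cos(2*pi*n/(N-1)) with a = 0.46."""
    return [(1 - a) - a * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

win = hamming(400)
frames = frame_signal([float(i) for i in range(1000)], frame_len=400, frame_shift=160)
windowed = [[s * w for s, w in zip(f, win)] for f in frames]
```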
In step S2, the endpoint detection adopts the spectral-entropy method, where entropy characterizes the degree of order of a signal; the spectral-entropy method detects speech endpoints by detecting the flatness of the spectrum. Let the speech signal be x(m); after windowing and framing, the n-th frame x_n(m) is obtained, whose FFT is X_n(k), k being the index of the k-th spectral line. The short-time energy of the speech frame in the frequency domain is E_n = Σ_{k=0}^{N/2} |X_n(k)|², where N is the FFT length and only the positive-frequency part is taken. The energy spectrum of the k-th spectral line is Y_n(k) = |X_n(k)|², and the normalized spectral probability density of each frequency component is p_n(k) = Y_n(k) / Σ_{l=0}^{N/2} Y_n(l). The short-time spectral entropy of the frame is then H_n = −Σ_{k=0}^{N/2} p_n(k)·ln p_n(k), computed for every frame. The audio information from step S1 is trimmed by the spectral-entropy method, and the region rich in speech information is retained, as shown in fig. 2.
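The per-frame spectral-entropy computation can be sketched as below (a naive DFT is used so the example stays dependency-free; a real implementation would use an FFT). A peaky, voiced-like spectrum yields low entropy, while a flat spectrum yields the maximum entropy ln(number of bins):

```python
import cmath
import math

def spectral_entropy(frame):
    """Short-time spectral entropy H_n = -sum p(k) ln p(k) over the
    positive-frequency power spectrum of one frame (naive DFT)."""
    N = len(frame)
    half = N // 2 + 1  # positive-frequency part only
    power = []
    for k in range(half):
        X = sum(frame[m] * cmath.exp(-2j * math.pi * k * m / N) for m in range(N))
        power.append(abs(X) ** 2)
    total = sum(power)
    if total == 0:
        return 0.0
    p = [y / total for y in power]
    return -sum(pk * math.log(pk) for pk in p if pk > 0)

tone = [math.sin(2 * math.pi * 4 * m / 64) for m in range(64)]  # energy in one bin
impulse = [1.0] + [0.0] * 63                                    # perfectly flat spectrum
```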
In step S2, the endpoint detection adds a result-verification mechanism: when the spectral-entropy method fails, the framed speech signal is screened through an energy valve to remove the silent segments, the valve value being computed from the array of speech-signal frame energies.
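The source does not give the valve formula, so the sketch below assumes a simple mean-energy threshold purely for illustration; only the overall screening mechanism follows the text:

```python
def frame_energy(frame):
    """Short-time energy of one frame."""
    return sum(s * s for s in frame)

def energy_valve(frames):
    """Fallback VAD: keep frames whose energy exceeds a valve value derived
    from the array of frame energies. The mean is an assumed threshold,
    not the patent's (unstated) formula."""
    amp = [frame_energy(f) for f in frames]
    valve = sum(amp) / len(amp)  # illustrative threshold only
    return [f for f, e in zip(frames, amp) if e > valve]

kept = energy_valve([[0.0] * 4, [1.0] * 4, [0.01] * 4])
```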
In step S3, the specific method for extracting the MFCC characteristic parameters is,
FFT is carried out on each frame signal of the voice preprocessing signal to obtain the frequency spectrum of each frame, and the power spectrum of the voice signal is obtained by taking the modulus square of the frequency spectrum of the voice signal;
passing the energy spectrum through a Mel triangular filter bank, where the centre frequencies of the filters are spaced uniformly on the Mel scale, the bank contains 22 to 26 filters, the base points of each triangular filter lie at the centre frequencies of the adjacent filters, and the approximate relation between Mel frequency and frequency is Mel(f) = 2595·lg(1 + f/700), where f is the frequency in Hz;
the MFCC characteristic parameter is normalized by centering the MFCC characteristic parameter on the mean, centering on the component unit and centering on the unit variance, and the data is subjected to the normalization processing by subtracting the mean from the attribute (by column) and dividing by the variance. The net result is that for each attribute/column all data is clustered around 0, with a variance value of 1, resulting in a normalized feature that is the same size as the MFCC feature parameter dimension.
Finally, the logarithmic energy output by each Mel filter is calculated, and the MFCC feature vector is obtained through the discrete cosine transform (DCT), retaining the default 13 cepstral coefficients.
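Two pieces of the MFCC pipeline above are compact enough to sketch directly: the Mel-scale mapping Mel(f) = 2595·lg(1 + f/700) used to place the filter centres, and the DCT-II of the log filter-bank energies that yields the 13 cepstral coefficients:

```python
import math

def hz_to_mel(f):
    """Approximate Mel scale: Mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def dct_cepstra(log_energies, n_ceps=13):
    """Unnormalized DCT-II of the log filter-bank energies, keeping the
    first n_ceps cepstral coefficients."""
    M = len(log_energies)
    return [sum(log_energies[m] * math.cos(math.pi * k * (m + 0.5) / M)
                for m in range(M))
            for k in range(n_ceps)]

# 26 filter centres spaced uniformly on the Mel scale between 0 Hz and 8 kHz.
lo, hi = hz_to_mel(0.0), hz_to_mel(8000.0)
centers_mel = [lo + i * (hi - lo) / 25 for i in range(26)]
ceps = dct_cepstra([1.0] * 26)
```

A constant log-energy vector produces a single nonzero cepstral coefficient (the zeroth), which is why the DCT decorrelates the filter-bank outputs so effectively.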
In step S3, the specific method for extracting GFCC characteristic parameters is,
carrying out Fourier transform calculation on a specific signal frame; and taking an absolute value of the output frequency spectrum;
the energy spectrum is passed through a Gammatone filter bank containing 20 filters to obtain the output response of the filter bank, where N is the number of filter channels and M is the number of frames after sampling;
and calculating the logarithmic energy output by each filter bank, obtaining a GFCC characteristic vector through Discrete Cosine Transform (DCT), and returning to the default 13-dimensional cepstrum number.
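The source omits the Gammatone output-response formula, so the sketch below uses the textbook Gammatone impulse response g(t) = t^(n−1)·e^(−2πbt)·cos(2πf_c·t) with an ERB-derived bandwidth — an assumed standard form, not necessarily the patent's exact expression:

```python
import math

def gammatone_ir(fc, fs, order=4, duration=0.025):
    """Impulse response of one Gammatone channel with centre frequency fc (Hz)
    at sampling rate fs, using the common ERB bandwidth approximation."""
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)  # equivalent rectangular bandwidth
    b = 1.019 * erb                          # common bandwidth scaling
    out = []
    for m in range(int(duration * fs)):
        t = m / fs
        out.append(t ** (order - 1)
                   * math.exp(-2 * math.pi * b * t)
                   * math.cos(2 * math.pi * fc * t))
    return out

ir = gammatone_ir(1000.0, 16000)
```

A 20-channel bank would instantiate this for 20 centre frequencies and filter each frame through every channel.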
In step S3, the specific method for extracting the PNCC characteristic parameters is,
performing Fourier transform calculation on the signal frame, and taking an absolute value of an output frequency spectrum;
passing the energy spectrum through a Gammatone filter bank comprising 20 filters to obtain the output response of the filter bank, where N is the number of filter channels and M is the number of frames after sampling;
smoothing each frame, i.e. averaging over the 2 frames on each side, to obtain the average power;
and normalizing the average power, then obtaining the PNCC characteristic parameters through an exponential (power-law) function, which better matches the auditory characteristics of the human ear, and the discrete cosine transform (DCT).
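The smoothing and nonlinearity steps above can be sketched as follows; the 1/15 power-law exponent is the value commonly used in PNCC front ends and is an assumption here, as the text only says "exponential function":

```python
def smooth_power(power, radius=2):
    """Medium-time smoothing: average each frame's power with the `radius`
    frames on each side (window clipped at the signal edges)."""
    out = []
    for i in range(len(power)):
        lo, hi = max(0, i - radius), min(len(power), i + radius + 1)
        out.append(sum(power[lo:hi]) / (hi - lo))
    return out

def power_nonlinearity(p, exponent=1.0 / 15.0):
    """Power-law nonlinearity used by PNCC in place of the logarithm."""
    return p ** exponent

smoothed = smooth_power([0.0, 0.0, 5.0, 0.0, 0.0])
```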
As shown in fig. 3, in step S3 the MFCC characteristic parameters, MFCC normalized characteristic parameters, GFCC characteristic parameters and PNCC characteristic parameters are combined to form the multi-dimensional characteristic parameters. Each selected characteristic parameter is 13-dimensional, so the combined multi-dimensional characteristic parameter has 52 dimensions.
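The combination step reduces to concatenating the four 13-dimensional per-frame vectors:

```python
def combine_features(mfcc, mfcc_norm, gfcc, pncc):
    """Concatenate the four 13-dimensional per-frame feature vectors into
    one 52-dimensional multi-dimensional characteristic parameter."""
    assert all(len(v) == 13 for v in (mfcc, mfcc_norm, gfcc, pncc))
    return mfcc + mfcc_norm + gfcc + pncc

vec = combine_features([0.1] * 13, [0.2] * 13, [0.3] * 13, [0.4] * 13)
```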
The foregoing illustrates and describes the principles, general features and advantages of the present invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above, which merely illustrate its principles; various changes and modifications may be made without departing from the spirit and scope of the invention, which is intended to be protected by the appended claims. The scope of the invention is defined by the claims and their equivalents.
Claims (10)
1. A multi-dimensional characteristic parameter voiceprint recognition method based on VAD is characterized by comprising the following steps,
step S1, reading, pre-emphasizing, framing and windowing the input voice signal, and converting the voice signal into a voice preprocessing signal;
step S2, accurately detecting the start and stop frames of the framed voice preprocessing signal through end point detection, and removing the mute section;
and step S3, extracting the MFCC characteristic parameters, MFCC normalized characteristic parameters, GFCC characteristic parameters and PNCC characteristic parameters of the voice signal after endpoint detection, and combining all four to form the multi-dimensional characteristic parameters.
2. The VAD-based multi-dimensional characteristic parameter voiceprint recognition method according to claim 1, wherein in step S1 the voice-signal reading reads the wav-format audio files in the training set, using the wavfile() method of the scipy.io library in Python to obtain a one-dimensional array representing the audio information.
3. The VAD-based multi-dimensional characteristic parameter voiceprint recognition method according to claim 1, wherein in step S1 the framing and windowing are specifically as follows: the parameter model of the speech signal is approximately stationary within 10 ms to 30 ms, so 1 second contains 33 to 100 frames; adjacent frames have an overlap region, i.e. the frame shift, and the ratio of frame shift to frame length is 1/3 to 1/2; finally each frame signal is multiplied by a Hamming window, whose expression is w(n) = (1 − a) − a·cos(2πn/(N − 1)), 0 ≤ n ≤ N − 1, where a is the Hamming-window coefficient.
4. The method according to claim 1, wherein the endpoint detection in step S2 adopts the spectral-entropy method, entropy characterizing the degree of order of a signal; the spectral-entropy endpoint detection detects speech endpoints by detecting the flatness of the spectrum. Let the speech signal be x(m); after windowing and framing, the n-th frame x_n(m) is obtained, whose FFT is X_n(k), k being the index of the k-th spectral line. The short-time energy of the speech frame in the frequency domain is E_n = Σ_{k=0}^{N/2} |X_n(k)|², where N is the FFT length and only the positive-frequency part is taken. The energy spectrum of the k-th spectral line is Y_n(k) = |X_n(k)|², and the normalized spectral probability density of each frequency component is p_n(k) = Y_n(k) / Σ_{l=0}^{N/2} Y_n(l). The short-time spectral entropy of the frame is then H_n = −Σ_{k=0}^{N/2} p_n(k)·ln p_n(k), computed for every frame. The audio information from step S1 is trimmed by the spectral-entropy method, and the region rich in speech information is retained.
5. The method according to claim 1, wherein in step S2 the endpoint detection adds a result-verification mechanism: when the spectral-entropy method fails, the framed speech signal is screened through an energy valve to remove the silent segments, the valve value being computed from the array of speech-signal frame energies.
6. The method for identifying the voiceprint of the multi-dimensional characteristic parameter based on the VAD according to the claim 1, wherein in the step S3, the specific method for extracting the MFCC characteristic parameter is,
FFT is carried out on each frame signal of the voice preprocessing signal to obtain the frequency spectrum of each frame, and the power spectrum of the voice signal is obtained by taking the modulus square of the frequency spectrum of the voice signal;
passing the energy spectrum through a Mel triangular filter bank, where the centre frequencies of the filters are spaced uniformly on the Mel scale, the bank contains 22 to 26 filters, the base points of each triangular filter lie at the centre frequencies of the adjacent filters, and the approximate relation between Mel frequency and frequency is Mel(f) = 2595·lg(1 + f/700), where f is the frequency in Hz;
and calculating the logarithmic energy output by each Mel filter, obtaining the MFCC feature vector through the discrete cosine transform (DCT) and retaining the default 13 cepstral coefficients.
7. The method according to claim 1, wherein in step S3, the specific method for extracting GFCC characteristic parameters is,
carrying out Fourier transform calculation on a specific signal frame; and taking an absolute value of the output frequency spectrum;
the energy spectrum is passed through a Gammatone filter bank containing 20 filters to obtain the output response of the filter bank, where N is the number of filter channels and M is the number of frames after sampling;
and calculating the logarithmic energy output by each filter, obtaining the GFCC feature vector through the discrete cosine transform (DCT) and retaining the default 13 cepstral coefficients.
8. The method as claimed in claim 1, wherein in step S3, the PNCC characteristic parameters are extracted by the specific method,
performing Fourier transform calculation on the signal frame, and taking an absolute value of an output frequency spectrum;
9. The method according to claim 8, wherein each frame is smoothed, i.e. averaged over the 2 frames on each side, to obtain the average power; and the average power is normalized, the PNCC characteristic parameters being obtained through an exponential function and the discrete cosine transform.
10. The method according to claim 1, wherein the dimension of the multidimensional characteristic parameter formed by combining in step S3 is 52 dimensions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011557161.2A CN112542174A (en) | 2020-12-25 | 2020-12-25 | VAD-based multi-dimensional characteristic parameter voiceprint identification method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011557161.2A CN112542174A (en) | 2020-12-25 | 2020-12-25 | VAD-based multi-dimensional characteristic parameter voiceprint identification method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112542174A true CN112542174A (en) | 2021-03-23 |
Family
ID=75017464
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011557161.2A Pending CN112542174A (en) | 2020-12-25 | 2020-12-25 | VAD-based multi-dimensional characteristic parameter voiceprint identification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112542174A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113179442A (en) * | 2021-04-20 | 2021-07-27 | 浙江工业大学 | Voice recognition-based audio stream replacement method in video |
CN113205803A (en) * | 2021-04-22 | 2021-08-03 | 上海顺久电子科技有限公司 | Voice recognition method and device with adaptive noise reduction capability |
CN113823290A (en) * | 2021-08-31 | 2021-12-21 | 杭州电子科技大学 | Multi-feature fusion voiceprint recognition method |
CN114038469A (en) * | 2021-08-03 | 2022-02-11 | 成都理工大学 | Speaker identification method based on multi-class spectrogram feature attention fusion network |
CN115273863A (en) * | 2022-06-13 | 2022-11-01 | 广东职业技术学院 | Compound network class attendance system and method based on voice recognition and face recognition |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104835498A (en) * | 2015-05-25 | 2015-08-12 | 重庆大学 | Voiceprint identification method based on multi-type combination characteristic parameters |
CN108922541A (en) * | 2018-05-25 | 2018-11-30 | 南京邮电大学 | Multidimensional characteristic parameter method for recognizing sound-groove based on DTW and GMM model |
CN110349598A (en) * | 2019-07-15 | 2019-10-18 | 桂林电子科技大学 | A kind of end-point detecting method under low signal-to-noise ratio environment |
CN111785285A (en) * | 2020-05-22 | 2020-10-16 | 南京邮电大学 | Voiceprint recognition method for home multi-feature parameter fusion |
- 2020-12-25 — CN application CN202011557161.2A filed; publication CN112542174A (en); status: active, Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104835498A (en) * | 2015-05-25 | 2015-08-12 | 重庆大学 | Voiceprint identification method based on multi-type combination characteristic parameters |
CN108922541A (en) * | 2018-05-25 | 2018-11-30 | 南京邮电大学 | Multidimensional characteristic parameter method for recognizing sound-groove based on DTW and GMM model |
CN110349598A (en) * | 2019-07-15 | 2019-10-18 | 桂林电子科技大学 | A kind of end-point detecting method under low signal-to-noise ratio environment |
CN111785285A (en) * | 2020-05-22 | 2020-10-16 | 南京邮电大学 | Voiceprint recognition method for home multi-feature parameter fusion |
Non-Patent Citations (1)
Title |
---|
Wang Meng; Wang Fulong: "MFCC Speaker Recognition Based on Endpoint Detection and Gaussian Filter Banks", 计算机系统应用 (Computer Systems & Applications), no. 10 *
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113179442A (en) * | 2021-04-20 | 2021-07-27 | 浙江工业大学 | Voice recognition-based audio stream replacement method in video |
CN113179442B (en) * | 2021-04-20 | 2022-04-29 | 浙江工业大学 | Voice recognition-based audio stream replacement method in video |
CN113205803A (en) * | 2021-04-22 | 2021-08-03 | 上海顺久电子科技有限公司 | Voice recognition method and device with adaptive noise reduction capability |
CN113205803B (en) * | 2021-04-22 | 2024-05-03 | 上海顺久电子科技有限公司 | Voice recognition method and device with self-adaptive noise reduction capability |
CN114038469A (en) * | 2021-08-03 | 2022-02-11 | 成都理工大学 | Speaker identification method based on multi-class spectrogram feature attention fusion network |
CN114038469B (en) * | 2021-08-03 | 2023-06-20 | 成都理工大学 | Speaker identification method based on multi-class spectrogram characteristic attention fusion network |
CN113823290A (en) * | 2021-08-31 | 2021-12-21 | 杭州电子科技大学 | Multi-feature fusion voiceprint recognition method |
CN115273863A (en) * | 2022-06-13 | 2022-11-01 | 广东职业技术学院 | Compound network class attendance system and method based on voice recognition and face recognition |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112542174A (en) | VAD-based multi-dimensional characteristic parameter voiceprint identification method | |
CN107610715B (en) | Similarity calculation method based on multiple sound characteristics | |
CN106935248B (en) | Voice similarity detection method and device | |
JP4802135B2 (en) | Speaker authentication registration and confirmation method and apparatus | |
CN102543073B (en) | Shanghai dialect phonetic recognition information processing method | |
CN108986824B (en) | Playback voice detection method | |
CN102968990B (en) | Speaker identifying method and system | |
CN110931022B (en) | Voiceprint recognition method based on high-low frequency dynamic and static characteristics | |
CN109256127B (en) | Robust voice feature extraction method based on nonlinear power transformation Gamma chirp filter | |
CN108198545B (en) | Speech recognition method based on wavelet transformation | |
CN110299141B (en) | Acoustic feature extraction method for detecting playback attack of sound record in voiceprint recognition | |
CN103646649A (en) | High-efficiency voice detecting method | |
Vyas | A Gaussian mixture model based speech recognition system using Matlab | |
WO2018095167A1 (en) | Voiceprint identification method and voiceprint identification system | |
CN111489763B (en) | GMM model-based speaker recognition self-adaption method in complex environment | |
Couvreur et al. | Automatic noise recognition in urban environments based on artificial neural networks and hidden markov models | |
CN112233657A (en) | Speech enhancement method based on low-frequency syllable recognition | |
Nijhawan et al. | A new design approach for speaker recognition using MFCC and VAD | |
Kaminski et al. | Automatic speaker recognition using a unique personal feature vector and Gaussian Mixture Models | |
Tazi | A robust speaker identification system based on the combination of GFCC and MFCC methods | |
Kumar et al. | Text dependent speaker identification in noisy environment | |
Wang et al. | Robust Text-independent Speaker Identification in a Time-varying Noisy Environment. | |
Shu-Guang et al. | Isolated word recognition in reverberant environments | |
CN108962249B (en) | Voice matching method based on MFCC voice characteristics and storage medium | |
Bonifaco et al. | Comparative analysis of filipino-based rhinolalia aperta speech using mel frequency cepstral analysis and Perceptual Linear Prediction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||