CN112951245A - Dynamic voiceprint feature extraction method integrated with static component - Google Patents
Dynamic voiceprint feature extraction method integrated with static component
- Publication number: CN112951245A
- Application number: CN202110257723.XA
- Authority: CN (China)
- Prior art keywords: voice data, target voice, dynamic, MFCC, frame
- Prior art date: 2021-03-09
- Legal status: Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
Abstract
The invention discloses a dynamic voiceprint feature extraction method integrated with a static component, which comprises: preprocessing target voice data to obtain preprocessed target voice data; processing the preprocessed target voice with a Fourier transform and a Mel filter bank to obtain the MFCC coefficients of the target voice data; and substituting the MFCC coefficients of the target voice data into a dynamic voiceprint feature extraction model fused with the static component to obtain an MFCC dynamic feature difference parameter matrix of the target voice data, which is defined as the dynamic voiceprint features of the target voice data. When extracting voiceprint features from voice data, the method ensures sound continuity while reducing the average equal error rate and improving the recognition rate.
Description
Technical Field
The invention relates to the technical field of artificial-intelligence voiceprint recognition, and in particular to a dynamic voiceprint feature extraction method fused with a static component.
Background
At present, smart homes are increasingly widely used in people's life and work. Smart homes adopt technologies such as wireless communication, image processing and voice processing; a smart home system based on voice interaction is more convenient to use, collects information over a wider space, and offers a friendlier user experience.
Voiceprint recognition has developed greatly in recent years, and in some settings its recognition rate already meets people's basic security requirements; being economical and convenient, it has a very broad application prospect. Suppressing external noise as far as possible and extracting voice features as pure as possible from the acquired signal is a precondition for putting the various voice processing techniques into practical use.
Today, people's quality of life is improving rapidly, and the public's requirements for smart home systems are no longer limited to standard, common control functions; the intelligence, convenience, safety and comfort of the whole home are expected to improve. Adding a voiceprint recognition function to a smart home system, and using speech enhancement to improve the system's stability in noisy environments, can further improve the human-computer interaction experience of the smart home and the efficiency with which users operate it. A permission hierarchy can also be set for smart home control and operation, providing differentiated service functions to users of different permission levels and further improving the overall safety and practicability of the system. Such a system will have strong appeal in the future market, especially against the background of the currently slow development of the smart home market, and will play an increasingly important role with a profound influence on public life. However, the voice recognition and voice feature extraction methods in the prior art suffer from a high average equal error rate and a low recognition rate.
Therefore, in order to further reduce the average equal error rate and improve the recognition rate, the invention provides a dynamic voiceprint feature extraction method integrated with a static component.
Disclosure of Invention
The purpose of the invention is as follows: to provide a dynamic voiceprint feature extraction method with a low average equal error rate and a high recognition rate.
The technical scheme is as follows: the invention provides a dynamic voiceprint feature extraction method fused with a static component, used for extracting voiceprint features from target voice data, characterized by comprising the following steps:
Step 1: preprocessing the target voice data to obtain preprocessed target voice data;
Step 2: processing the preprocessed target voice with a Fourier transform and a Mel filter bank to obtain the MFCC coefficients of the target voice data;
Step 3: substituting the MFCC coefficients of the target voice data into the dynamic voiceprint feature extraction model fused with the static component, obtaining the MFCC dynamic feature difference parameter matrix of the target voice data, and defining that matrix as the dynamic voiceprint features of the target voice data.
As a preferred aspect of the present invention, in step 1 the method for preprocessing the target voice data comprises: dividing the target voice data into T frames to obtain multi-frame voice data;
in step 2, the method for processing the preprocessed target voice with the Fourier transform and the Mel filter bank comprises the following steps:
processing each frame of voice data with the Fourier transform to obtain the frequency spectrum of each frame of voice data;
inputting the frequency spectrum of each frame of voice data into the Mel filter bank to obtain the MFCC coefficients of each frame of voice data, i.e. the MFCC coefficients of the target voice data.
As a preferable aspect of the present invention, in step 3 the dynamic voiceprint feature extraction model fused with the static component is:

d(l,t) = 0.5·[C(l,t+1) − C(l,t)] + 0.5·C(l,t), for t < K
d(l,t) = 0.5·( Σ_{k=1}^{K} k·[C(l,t+k) − C(l,t−k)] ) / √( 2·Σ_{k=1}^{K} k² ) + 0.5·C(l,t), for K ≤ t ≤ T−K
d(l,t) = 0.5·[C(l,t) − C(l,t−1)] + 0.5·C(l,t), for t > T−K

wherein d(l,t) is the extraction result of the lth-order dynamic voiceprint feature of the tth frame of voice data and constitutes the tth element of the lth order in the MFCC dynamic feature difference parameter matrix of the target voice data; C(l,t) is the tth parameter of the lth order in the MFCC coefficients, and C(l,t+1), C(l,t+k) and C(l,t−k) are the (t+1)th, (t+k)th and (t−k)th parameters of the lth order, respectively; k is the difference step ordinal for the tth frame of voice data, and K is the preset total step length of the difference.
As a preferred aspect of the present invention, the lth-order characteristic coefficient C(l,t) of the tth frame of voice data in the MFCC coefficients is obtained according to the following formula:

C(l,t) = √(2/M) · Σ_{m=1}^{M} S(m) · cos( π·l·(m − 0.5) / M ), l = 1, 2, …, L

wherein L is the order of the MFCC coefficients, M is the total number of Mel filters, m is the serial number of the Mel filter, and S(m) is the logarithmic energy output by the mth Mel filter.
As a preferred aspect of the present invention, the logarithmic energy S(m) output by the mth Mel filter is obtained according to the following formula:

S(m) = ln( Σ_{k=0}^{N−1} |X(k)|² · H_m(k) )

wherein M is the total number of filters in the filter bank, N is the data length of the tth frame of voice data, |X(k)|² is the power corresponding to the kth frequency, and H_m(k) is the transfer function of the mth Mel filter at the kth frequency.
Beneficial effects: compared with the prior art, the dynamic voiceprint feature extraction method fused with a static component provided by the invention extracts voiceprint features based on a dynamic voiceprint feature extraction model fused with the static component, reducing the average equal error rate and improving the recognition rate while ensuring sound continuity.
Drawings
FIG. 1 is a flow chart of the dynamic voiceprint feature extraction method provided by an embodiment of the invention;
FIG. 2 is a schematic diagram of the variation of the average equal error rate with the ratio of the dynamic to the static feature coefficient, provided by an embodiment of the invention;
FIG. 3 is a schematic diagram of the variation of the average equal error rate with the static feature coefficient, provided by an embodiment of the invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present invention is not limited thereby.
Referring to FIG. 1, the dynamic voiceprint feature extraction method fused with a static component provided by the invention comprises the following steps:
Step 1: preprocessing the target voice data to obtain the preprocessed target voice data.
The method for preprocessing the target voice data comprises: dividing the target voice data into T frames to obtain multi-frame voice data.
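For illustration, a minimal NumPy sketch of this framing step; the frame length of 512 samples is an assumed value (the experiments below use 16 kHz audio and a frame shift of half the frame length, which the defaults here mirror):

```python
import numpy as np

def frame_signal(x, frame_len=512, frame_shift=256):
    """Split a 1-D speech signal into T overlapping frames (step 1)."""
    n_frames = 1 + max(0, (len(x) - frame_len) // frame_shift)
    return np.stack([x[t * frame_shift : t * frame_shift + frame_len]
                     for t in range(n_frames)])  # shape (T, frame_len)
```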
Step 2: processing the preprocessed target voice with a Fourier transform and a Mel filter bank to obtain the MFCC coefficients of the target voice data.
The method for processing the preprocessed target voice with the Fourier transform and the Mel filter bank comprises the following steps:
processing each frame of voice data with the Fourier transform to obtain the frequency spectrum of each frame of voice data;
inputting the frequency spectrum of each frame of voice data into the Mel filter bank to obtain the MFCC coefficients of each frame of voice data, i.e. the MFCC coefficients of the target voice data.
The method of step 1 and step 2 specifically comprises the following steps:
extraction of Mel-frequency cepstrum coefficients (MFCCs) is performed on data that has been subjected to speech preprocessing, and desired feature coefficients are obtained by performing operations such as fourier transform, Mel (Mel) filter, and the like on the data.
(1) A Fourier transform is applied to each frame of preprocessed speech to obtain the corresponding spectrum, and the power spectrum |X(j,k)|² of each frame is obtained. X(j,k) is calculated as follows:

X(j,k) = Σ_{n=0}^{N−1} x_j(n) · e^{−i2πkn/N}, 0 ≤ k < N

wherein N is the length of each frame (equal to the fast Fourier transform length), J is the total number of frames, j takes values from 1 to J and denotes the jth frame, and x_j(n) is the nth sample of voice data in the jth frame.
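A sketch of this step; the Hamming window is an assumption (a common choice in MFCC pipelines) rather than something stated in the patent:

```python
import numpy as np

def power_spectrum(frames):
    """Per-frame FFT and power spectrum |X(k)|^2 (step 2, Fourier part)."""
    N = frames.shape[1]
    windowed = frames * np.hamming(N)        # assumed window, see lead-in
    X = np.fft.rfft(windowed, n=N, axis=1)   # spectrum of each frame
    return np.abs(X) ** 2                    # shape (T, N//2 + 1)
```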
(2) A Mel filter bank is designed, and the power spectrum of the signal is filtered by the configured Mel filter bank; a logarithm operation then converts the frequency scale to the Mel scale. The center frequency f(m) of the mth filter in the filter bank satisfies:

Mel(f(m+1)) − Mel(f(m)) = Mel(f(m)) − Mel(f(m−1))

where m is the serial number of the filter in the filter bank, and Mel(f(m)) denotes the conversion of the frequency f(m) to the Mel scale.
The transfer function H_m(f) of each band-pass filter in the Mel filter bank is:

H_m(f) = 0, for f < f(m−1)
H_m(f) = ( f − f(m−1) ) / ( f(m) − f(m−1) ), for f(m−1) ≤ f ≤ f(m)
H_m(f) = ( f(m+1) − f ) / ( f(m+1) − f(m) ), for f(m) < f ≤ f(m+1)
H_m(f) = 0, for f > f(m+1)

wherein f is the frequency.
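A sketch of the filter-bank construction under these definitions, using the common mapping Mel(f) = 2595·log10(1 + f/700) and M = 24 filters as chosen below; the FFT-bin mapping details are conventional assumptions, not specified in the patent:

```python
import numpy as np

def mel_filterbank(M=24, N=512, fs=16000):
    """Triangular Mel filters H_m(k) evaluated on the rFFT frequency bins."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_inv = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # M + 2 points equally spaced on the Mel scale give the edges f(m-1), f(m), f(m+1)
    pts = mel_inv(np.linspace(mel(0.0), mel(fs / 2.0), M + 2))
    bins = np.floor((N + 1) * pts / fs).astype(int)
    H = np.zeros((M, N // 2 + 1))
    for m in range(1, M + 1):
        lo, ctr, hi = bins[m - 1], bins[m], bins[m + 1]
        for k in range(lo, ctr):             # rising edge of the triangle
            H[m - 1, k] = (k - lo) / max(ctr - lo, 1)
        for k in range(ctr, hi):             # falling edge of the triangle
            H[m - 1, k] = (hi - k) / max(hi - ctr, 1)
    return H
```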
After the voice data passes through the Mel filters, the logarithmic energy S(m) output by each filter is calculated:

S(m) = ln( Σ_{k=0}^{N−1} |X(k)|² · H_m(k) ), 1 ≤ m ≤ M

wherein m is the serial number of the filter in the filter bank and M is the total number of filters, generally 22 to 26; the invention takes M = 24. |X(k)|² is the power spectrum of the frame at the kth frequency, and H_m(k) is the transfer function of the mth filter evaluated at the kth frequency.
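The log energies then follow directly by applying the filter bank to the per-frame power spectrum (reusing the sketches above; the small epsilon guarding log(0) is an implementation assumption):

```python
import numpy as np

def log_mel_energies(pow_spec, H):
    """S(m) = ln( sum_k |X(k)|^2 * H_m(k) ) for every frame and filter."""
    return np.log(pow_spec @ H.T + 1e-10)   # shape (T, M)
```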
(3) A discrete cosine transform is applied to the logarithmic Mel power spectrum of each frame to decorrelate its energies, eliminating the correlation between the dimensions of the signal and mapping it to a low-dimensional space, which yields the corresponding MFCC coefficients C(l):

C(l) = √(2/M) · Σ_{m=1}^{M} S(m) · cos( π·l·(m − 0.5) / M ), l = 1, 2, …, L

wherein L is the total order of the MFCC coefficients, usually 12 to 18; the invention takes L = 15. l takes values from 1 to L and denotes the lth order of the MFCC coefficients.
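A sketch of this DCT step, with L = 15 as chosen above:

```python
import numpy as np

def mfcc_from_log_energies(S, L=15):
    """C(l) = sqrt(2/M) * sum_m S(m) * cos(pi*l*(m - 0.5)/M), l = 1..L."""
    M = S.shape[1]
    m = np.arange(1, M + 1)
    basis = np.cos(np.pi * np.outer(np.arange(1, L + 1), m - 0.5) / M)
    return np.sqrt(2.0 / M) * S @ basis.T   # shape (T, L)
```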
Step 3: substituting the MFCC coefficients of the target voice data into the dynamic voiceprint feature extraction model fused with the static component, obtaining the MFCC dynamic feature difference parameter matrix of the target voice data, and defining that matrix as the dynamic voiceprint features of the target voice data.
In step 3, a dynamic voiceprint feature extraction model fused with the static component is constructed according to the following method:
the essence of the dynamic feature extraction is the MFCC coefficient difference mode, that is, when the MFCC coefficient difference parameter of the t-th frame is calculated, the parameters of the t-1-th frame and the t + 1-th frame are used for carrying out the downsampling. Therefore, the classical dynamic feature extraction formula is as follows:
wherein J represents the fast Fourier transform length, usually 1 or 2, represents a first-order MFCC coefficient differential parameter and a second-order MFCC coefficient differential parameter, and is the value of J (J is more than or equal to 1 and is less than or equal to J); l is the order of the Mel cepstrum coefficient, T is the frame number, T is the total frame number of a section of audio, C (l, T) is the T-th parameter of the L-th order of the Mel cepstrum coefficient matrix of the voice signal, and d (l, T) is the MFCC dynamic characteristic parameter.
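A sketch of the classical difference computation as reconstructed above, with 0-indexed frames and the boundary cases handled by simple forward/backward differences:

```python
import numpy as np

def classical_delta(C, K=2):
    """Classical MFCC difference parameters d(l,t) for a (T, L) matrix C."""
    T = C.shape[0]
    d = np.zeros_like(C)
    denom = np.sqrt(2.0 * sum(k * k for k in range(1, K + 1)))
    for t in range(T):
        if t < K:                   # leading frames: forward difference
            d[t] = C[t + 1] - C[t] if T > 1 else 0.0
        elif t >= T - K:            # trailing frames: backward difference
            d[t] = C[t] - C[t - 1]
        else:                       # interior: weighted symmetric difference
            d[t] = sum(k * (C[t + k] - C[t - k])
                       for k in range(1, K + 1)) / denom
    return d
```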
The new dynamic voiceprint feature Mel-frequency cepstral coefficient formula proposed by the invention is:

MFCC_new = α·MFCC + β·ΔMFCC = α·( MFCC + δ·ΔMFCC )

wherein MFCC_new denotes the dynamic voiceprint feature proposed by the invention, MFCC is the static voiceprint feature, ΔMFCC is the classical dynamic voiceprint feature, i.e. the difference dynamic parameter, α is the static feature coefficient, β is the dynamic feature coefficient, and δ = β/α is the ratio of the dynamic feature coefficient to the static feature coefficient.
The values of α and δ are determined according to the following method:
Assuming α = 1, the optimal value of the ratio δ of the dynamic coefficient to the static coefficient is determined by experiment.
The number of Gaussian components in the experiment was set to 64, and the voice data of 100 speakers (50 female and 50 male) was selected from the TIMIT corpus as the experimental voice data. The voice data of 60 speakers was used as training data for the UBM model: 10 voice segments of each speaker were combined into 10 seconds of speech for UBM training, and the resulting UBM model parameters were stored. For each of the remaining 40 speakers, 5 voice segments were combined into 10 seconds of voice data to train that speaker's GMM model, and the obtained model parameters were stored. The remaining voice data of these 40 speakers was divided into 10 segments of 5 seconds each and matched against the system in turn. The complete test process comprises 400 speaker-acceptance trials and 15600 speaker-rejection trials, and the equal error rate is obtained as the output of one experiment.
For the voiceprint features obtained from the voice data, each test utterance yields a number of voice frames. The MFCC order is set to 15, so each frame of voice data yields 15 MFCC coefficients and, after the difference calculation, 15 dynamic feature coefficients; after concatenation, each voice frame carries 30 coefficients. The sampling frequency in the experiment was 16 kHz, and the frame shift was 1/2 of the frame length.
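The equal error rate used as the experimental metric can be computed from the acceptance-trial (genuine) and rejection-trial (impostor) scores; a minimal sketch, independent of the GMM-UBM modelling details:

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """EER: operating point where false-acceptance rate = false-rejection rate."""
    eer, best_gap = 1.0, np.inf
    for th in np.sort(np.concatenate([genuine, impostor])):
        far = np.mean(impostor >= th)   # impostors wrongly accepted
        frr = np.mean(genuine < th)     # targets wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer
```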
According to the experimental conditions, δ takes 5 different values; 5 experiments are carried out for each value, giving the average equal error rate data shown in Table 1:
TABLE 1
Based on the data shown in Table 1, the curve of the average equal error rate against the ratio δ of the dynamic to the static feature coefficient is obtained, as shown in FIG. 2.
As can be seen from FIG. 2, the average equal error rate is lowest when δ = 1, so the optimal value of the ratio δ of the dynamic to the static feature coefficient is 1.
Accordingly, the dynamic voiceprint feature Mel-frequency cepstral coefficient formula proposed by the invention becomes:

MFCC_new = α·( MFCC + ΔMFCC )
according to the experimental conditions, α takes 5 different values, and 5 experiments are performed respectively to obtain average equal error rate data as shown in table 2:
TABLE 2
Based on the data shown in Table 2, the curve of the average equal error rate against the static feature coefficient α is obtained, as shown in FIG. 3.
As can be seen from FIG. 3, the average equal error rate is lowest when α = 0.5, so the optimal value of the static feature coefficient is 0.5.
Accordingly, the dynamic voiceprint feature Mel-frequency cepstral coefficient formula proposed by the invention becomes:

MFCC_new = 0.5·( MFCC + ΔMFCC )

Here ΔMFCC = d(l,t) is the dynamic feature parameter given by the classical difference formula, and MFCC = C(l,t) is the static feature parameter; the two are added with a weight of 0.5 each to obtain the dynamic feature extraction formula fused with the static component. Rearranging, the constructed dynamic voiceprint feature extraction model fused with the static component is:

d(l,t) = 0.5·[C(l,t+1) − C(l,t)] + 0.5·C(l,t), for t < K
d(l,t) = 0.5·( Σ_{k=1}^{K} k·[C(l,t+k) − C(l,t−k)] ) / √( 2·Σ_{k=1}^{K} k² ) + 0.5·C(l,t), for K ≤ t ≤ T−K
d(l,t) = 0.5·[C(l,t) − C(l,t−1)] + 0.5·C(l,t), for t > T−K

wherein d(l,t) is the extraction result of the lth-order dynamic voiceprint feature of the tth frame of voice data, and d(l,t) constitutes the tth element of the lth order in the MFCC dynamic feature difference parameter matrix of the target voice data, namely: d(l,t) is the tth parameter of the lth order of the MFCC dynamic feature difference parameter matrix; C(l,t) is the tth parameter of the lth order in the MFCC coefficients, and C(l,t+1), C(l,t+k) and C(l,t−k) are the (t+1)th, (t+k)th and (t−k)th parameters of the lth order, respectively; k is the difference step ordinal for the tth frame of voice data, and K is the preset total step length of the difference.
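A sketch of the constructed model, combining the static coefficients with the classical difference at equal weights of 0.5 (reusing classical_delta from the sketch above):

```python
def fused_dynamic_feature(C, K=2):
    """d(l,t) = 0.5*C(l,t) + 0.5*delta(l,t): dynamic features fused with
    the static component, per the constructed model."""
    return 0.5 * C + 0.5 * classical_delta(C, K)
```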
For the constructed dynamic voiceprint feature extraction model fused with the static component, the lth-order characteristic coefficient C(l,t) of the tth frame of voice data in the MFCC coefficients is obtained according to the following formula:

C(l,t) = √(2/M) · Σ_{m=1}^{M} S(m) · cos( π·l·(m − 0.5) / M )

wherein L is the order of the MFCC coefficients, m is the serial number of the Mel filter, and S(m) is the logarithmic energy output by the mth Mel filter.
The logarithmic energy S(m) output by the mth Mel filter is obtained according to the following formula:

S(m) = ln( Σ_{k=0}^{N−1} |X(k)|² · H_m(k) )

wherein M is the total number of filters in the filter bank, N is the data length of the tth frame of voice data, |X(k)|² is the power corresponding to the kth frequency, and H_m(k) is the transfer function of the mth Mel filter at the kth frequency.
Based on the above model and method, given parameters such as the Mel cepstral coefficient matrix and the audio duration, the static feature parameters can be calculated first, and the dynamic feature parameters fused with the static component can then be calculated for voiceprint recognition.
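Putting the sketches together, an end-to-end feature-extraction pass (under all the assumptions noted in the sketches above) would look like:

```python
import numpy as np

# A synthetic 1-second, 16 kHz signal stands in for real target voice data.
x = np.random.randn(16000)

frames = frame_signal(x, frame_len=512, frame_shift=256)  # step 1: T frames
P = power_spectrum(frames)                                # step 2: |X(k)|^2
H = mel_filterbank(M=24, N=512, fs=16000)
S = log_mel_energies(P, H)                                # S(m), shape (T, 24)
C = mfcc_from_log_energies(S, L=15)                       # static MFCC, (T, 15)
D = fused_dynamic_feature(C, K=2)                         # step 3: fused features
features = np.hstack([C, D])                              # 30 coefficients per frame
```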
In voiceprint recognition algorithms, a Gaussian mixture model (GMM) and a universal background model (UBM) are commonly used to model a speaker's voiceprint features; the main steps are inputting training speech for the Gaussian mixture model, speech preprocessing, voiceprint feature extraction, inputting the universal background model parameters, constructing the Gaussian mixture model, and storing the Gaussian mixture model parameters. Voiceprint recognition algorithms generally adopt the classical dynamic feature extraction algorithm in the voiceprint feature extraction step; the invention improves this step by fusing a static component into the calculation of the dynamic feature extraction parameters, thereby improving the performance of the voiceprint recognition algorithm.
The above description is only a preferred embodiment of the present invention. It will be apparent to those skilled in the art that various modifications and variations can be made without departing from the technical principle of the present invention, and such modifications and variations shall also fall within the protection scope of the present invention.
Claims (5)
1. A dynamic voiceprint feature extraction method fused with a static component, used for extracting voiceprint features from target voice data, characterized by comprising the following steps:
Step 1: preprocessing the target voice data to obtain preprocessed target voice data;
Step 2: processing the preprocessed target voice with a Fourier transform and a Mel filter bank to obtain the MFCC coefficients of the target voice data;
Step 3: substituting the MFCC coefficients of the target voice data into the dynamic voiceprint feature extraction model fused with the static component, obtaining the MFCC dynamic feature difference parameter matrix of the target voice data, and defining that matrix as the dynamic voiceprint features of the target voice data.
2. The method for extracting dynamic voiceprint features fused with a static component according to claim 1, wherein in step 1 the method for preprocessing the target voice data comprises: dividing the target voice data into T frames to obtain multi-frame voice data;
in step 2, the method for processing the preprocessed target voice with the Fourier transform and the Mel filter bank comprises the following steps:
processing each frame of voice data with the Fourier transform to obtain the frequency spectrum of each frame of voice data;
inputting the frequency spectrum of each frame of voice data into the Mel filter bank to obtain the MFCC coefficients of each frame of voice data, i.e. the MFCC coefficients of the target voice data.
3. The method according to claim 2, wherein in step 3 the dynamic voiceprint feature extraction model fused with the static component is:

d(l,t) = 0.5·[C(l,t+1) − C(l,t)] + 0.5·C(l,t), for t < K
d(l,t) = 0.5·( Σ_{k=1}^{K} k·[C(l,t+k) − C(l,t−k)] ) / √( 2·Σ_{k=1}^{K} k² ) + 0.5·C(l,t), for K ≤ t ≤ T−K
d(l,t) = 0.5·[C(l,t) − C(l,t−1)] + 0.5·C(l,t), for t > T−K

wherein d(l,t) is the extraction result of the lth-order dynamic voiceprint feature of the tth frame of voice data and constitutes the tth element of the lth order in the MFCC dynamic feature difference parameter matrix of the target voice data; C(l,t) is the tth parameter of the lth order in the MFCC coefficients, and C(l,t+1), C(l,t+k) and C(l,t−k) are the (t+1)th, (t+k)th and (t−k)th parameters of the lth order, respectively; k is the difference step ordinal for the tth frame of voice data, and K is the preset total step length of the difference.
4. The method according to claim 3, characterized in that the lth-order characteristic coefficient C(l,t) of the tth frame of voice data in the MFCC coefficients is obtained according to the following formula:

C(l,t) = √(2/M) · Σ_{m=1}^{M} S(m) · cos( π·l·(m − 0.5) / M )

wherein L is the order of the MFCC coefficients, m is the serial number of the Mel filter, and S(m) is the logarithmic energy output by the mth Mel filter.
5. The method for extracting a dynamic voiceprint feature fused with a static component according to claim 4, characterized in that the logarithmic energy S(m) output by the mth Mel filter is obtained according to the following formula:

S(m) = ln( Σ_{k=0}^{N−1} |X(k)|² · H_m(k) )

wherein M is the total number of filters in the filter bank, N is the data length of the tth frame of voice data, |X(k)|² is the power corresponding to the kth frequency, and H_m(k) is the transfer function of the mth Mel filter at the kth frequency.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110257723.XA CN112951245B (en) | 2021-03-09 | 2021-03-09 | Dynamic voiceprint feature extraction method integrated with static component |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110257723.XA CN112951245B (en) | 2021-03-09 | 2021-03-09 | Dynamic voiceprint feature extraction method integrated with static component |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112951245A true CN112951245A (en) | 2021-06-11 |
CN112951245B CN112951245B (en) | 2023-06-16 |
Family
ID=76228612
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110257723.XA Active CN112951245B (en) | 2021-03-09 | 2021-03-09 | Dynamic voiceprint feature extraction method integrated with static component |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112951245B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113689863A (en) * | 2021-09-24 | 2021-11-23 | 广东电网有限责任公司 | Voiceprint feature extraction method, device, equipment and storage medium |
CN115762529A (en) * | 2022-10-17 | 2023-03-07 | 国网青海省电力公司海北供电公司 | Method for preventing cable from being broken outside by using voice recognition perception algorithm |
- 2021-03-09: application CN202110257723.XA filed in China; granted as CN112951245B (active)
Patent Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CA1246745A (en) * | 1985-03-25 | 1988-12-13 | Melvyn J. Hunt | Man/machine communications system using formant based speech analysis and synthesis |
CA2158847A1 (en) * | 1993-03-25 | 1994-09-29 | Mark Pawlewski | A Method and Apparatus for Speaker Recognition |
KR100779242B1 (en) * | 2006-09-22 | 2007-11-26 | (주)한국파워보이스 | Speaker recognition methods of a speech recognition and speaker recognition integrated system |
CN102290048A (en) * | 2011-09-05 | 2011-12-21 | 南京大学 | Robust voice recognition method based on MFCC (Mel frequency cepstral coefficient) long-distance difference |
CN102982803A (en) * | 2012-12-11 | 2013-03-20 | 华南师范大学 | Isolated word speech recognition method based on HRSF and improved DTW algorithm |
US20170365259A1 (en) * | 2015-02-05 | 2017-12-21 | Beijing D-Ear Technologies Co., Ltd. | Dynamic password voice based identity authentication system and method having self-learning function |
CN104835498A (en) * | 2015-05-25 | 2015-08-12 | 重庆大学 | Voiceprint identification method based on multi-type combination characteristic parameters |
WO2018107810A1 (en) * | 2016-12-15 | 2018-06-21 | 平安科技(深圳)有限公司 | Voiceprint recognition method and apparatus, and electronic device and medium |
US20200135171A1 (en) * | 2017-02-28 | 2020-04-30 | National Institute Of Information And Communications Technology | Training Apparatus, Speech Synthesis System, and Speech Synthesis Method |
CN107610708A (en) * | 2017-06-09 | 2018-01-19 | 平安科技(深圳)有限公司 | Identify the method and apparatus of vocal print |
CN107993663A (en) * | 2017-09-11 | 2018-05-04 | 北京航空航天大学 | A kind of method for recognizing sound-groove based on Android |
CN109256138A (en) * | 2018-08-13 | 2019-01-22 | 平安科技(深圳)有限公司 | Auth method, terminal device and computer readable storage medium |
CN108847244A (en) * | 2018-08-22 | 2018-11-20 | 华东计算技术研究所(中国电子科技集团公司第三十二研究所) | Voiceprint recognition method and system based on MFCC and improved BP neural network |
CN110428841A (en) * | 2019-07-16 | 2019-11-08 | 河海大学 | A kind of vocal print dynamic feature extraction method based on random length mean value |
CN111489763A (en) * | 2020-04-13 | 2020-08-04 | 武汉大学 | Adaptive method for speaker recognition in complex environment based on GMM model |
Non-Patent Citations (5)
Title |
---|
YUE Qianqian; ZHOU Ping; JING Xinxing: "Research on an auditory feature extraction algorithm based on a nonlinear power function", Microelectronics & Computer, no. 06 |
SHEN Xiaohu; WAN Rongchun; ZHANG Xinye: "A speaker recognition system with improved dynamic feature parameters", Computer Simulation, no. 04 |
ZHAO Qing; CHENG Xiefeng; ZHU Dongmei: "Cough sound identity recognition based on improved MFCC and short-time energy", Computer Technology and Development, no. 06 |
GUO Chunxia: "Research on speaker recognition algorithms", Journal of Xi'an Institute of Posts and Telecommunications, no. 05 |
WEI Danfang; LI Ying: "Environmental sound classification based on MFCC and weighted dynamic feature combination", Computer & Digital Engineering, no. 02 |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113689863A (en) * | 2021-09-24 | 2021-11-23 | 广东电网有限责任公司 | Voiceprint feature extraction method, device, equipment and storage medium |
CN113689863B (en) * | 2021-09-24 | 2024-01-16 | 广东电网有限责任公司 | Voiceprint feature extraction method, voiceprint feature extraction device, voiceprint feature extraction equipment and storage medium |
CN115762529A (en) * | 2022-10-17 | 2023-03-07 | 国网青海省电力公司海北供电公司 | Method for preventing cable from being broken outside by using voice recognition perception algorithm |
CN115762529B (en) * | 2022-10-17 | 2024-09-10 | 国网青海省电力公司海北供电公司 | Method for preventing cable from being broken outwards by utilizing voice recognition sensing algorithm |
Also Published As
Publication number | Publication date |
---|---|
CN112951245B (en) | 2023-06-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111223493B (en) | Voice signal noise reduction processing method, microphone and electronic equipment | |
CN109326299B (en) | Speech enhancement method, device and storage medium based on full convolution neural network | |
Sarikaya et al. | High resolution speech feature parametrization for monophone-based stressed speech recognition | |
CN113129897B (en) | Voiceprint recognition method based on attention mechanism cyclic neural network | |
CN110428849A (en) | A kind of sound enhancement method based on generation confrontation network | |
CN109256127B (en) | Robust voice feature extraction method based on nonlinear power transformation Gamma chirp filter | |
CN111128209B (en) | Speech enhancement method based on mixed masking learning target | |
EP1250699B1 (en) | Speech recognition | |
CN112951245A (en) | Dynamic voiceprint feature extraction method integrated with static component | |
CN102982801A (en) | Phonetic feature extracting method for robust voice recognition | |
CN106024010A (en) | Speech signal dynamic characteristic extraction method based on formant curves | |
CN113744749B (en) | Speech enhancement method and system based on psychoacoustic domain weighting loss function | |
CN110428841B (en) | Voiceprint dynamic feature extraction method based on indefinite length mean value | |
Plahl et al. | Improved pre-training of deep belief networks using sparse encoding symmetric machines | |
CN107274887A (en) | Speaker's Further Feature Extraction method based on fusion feature MGFCC | |
CN112017658A (en) | Operation control system based on intelligent human-computer interaction | |
CN106297769B (en) | A kind of distinctive feature extracting method applied to languages identification | |
CN112992131A (en) | Method for extracting ping-pong command of target voice in complex scene | |
CN111920390A (en) | Snore detection method based on embedded terminal | |
Das et al. | Robust front-end processing for speech recognition in noisy conditions | |
Li et al. | An auditory system-based feature for robust speech recognition | |
Hurmalainen et al. | Modelling spectro-temporal dynamics in factorisation-based noise-robust automatic speech recognition | |
He et al. | An adaptive multi-band system for low power voice command recognition. | |
Chen et al. | Entropy-based feature parameter weighting for robust speech recognition | |
Saha et al. | Modified mel-frequency cepstral coefficient |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||