CN112053694A - Voiceprint recognition method based on CNN and GRU network fusion - Google Patents
Voiceprint recognition method based on CNN and GRU network fusion
- Publication number
- CN112053694A (application CN202010719665.3A)
- Authority
- CN
- China
- Prior art keywords
- voice
- spectrogram
- cnn
- gru network
- voiceprint recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G10L17/02 — Speaker identification or verification: preprocessing operations, e.g. segment selection; pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; feature selection or extraction
- G10L17/04 — Speaker identification or verification: training, enrolment or model building
- G10L17/14 — Speaker identification or verification: use of phonemic categorisation or speech recognition prior to speaker recognition or verification
- G10L21/0216 — Speech enhancement: noise filtering characterised by the method used for estimating noise
- G10L25/12 — Speech or voice analysis: extracted parameters being prediction coefficients
- G10L25/18 — Speech or voice analysis: extracted parameters being spectral information of each sub-band
- G10L25/27 — Speech or voice analysis: characterised by the analysis technique
- G10L25/30 — Speech or voice analysis: analysis technique using neural networks
- G10L25/45 — Speech or voice analysis: characterised by the type of analysis window
Abstract
The invention discloses a voiceprint recognition method based on the fusion of a CNN and a GRU network (C-GRU for short), which comprises the following steps: preprocessing the voice signal sample to be recognized and then enhancing it with an adaptive filtering algorithm; generating a spectrogram of the voice segment to be recognized; inputting the spectrogram into the trained C-GRU network model to extract voiceprint features; and feeding the extracted features into a softmax function to obtain identity classification information for the voice signal segment to be recognized. The C-GRU feature extraction method avoids information loss in the frequency domain and, by exploiting the GRU network's strong temporal feature extraction capability, achieves higher recognition accuracy and faster convergence.
Description
Technical Field
The invention relates to a voiceprint recognition method, in particular to a voiceprint recognition method based on CNN and GRU network fusion (C-GRU for short).
Background Art
In recent years, biometric identification technology has become a highly reliable and convenient way of authenticating identity, attracting attention from both inside and outside the industry. Voice is one of the everyday ways people communicate. It has been scientifically established that each person's vocal organs differ, and that different growth environments influence those organs differently, so each person's voice is unique. Recognizing identity by voice has clear advantages: voice is very easy to collect, and the equipment required is cheap and readily available, so the approach has great potential, can be used for remote identity authentication, is easily accepted by users, and so on.
In terms of content, voiceprint recognition techniques can be divided into two directions: text-dependent and text-independent. In a text-dependent method, the speaker must speak a fixed script, and the text content of the training speech must be the same as that of the test speech. Although such a method can be trained to perform well, its biggest disadvantage is that the speaker must pronounce the fixed text exactly; once the spoken content deviates from the text or is not pronounced as required, recognition performance is difficult to guarantee, which greatly limits the method's adoption in practical applications.
Traditional voiceprint recognition usually adopts the universal background model (GMM-UBM): a speaker-independent model is first trained on the speech of a large number of speakers; the MAP adaptation algorithm is then used to effectively mitigate the problems of scarce speech data and multi-channel mismatch that afflict the conventional Gaussian mixture model; finally, the recognition model is trained with a maximum a posteriori or maximum likelihood regression criterion. However, this model consumes a significant amount of storage when modeling each speaker. Neural networks are a basic topic of current deep learning research, and as deep learning gradually penetrates more fields, voiceprint recognition research has likewise turned to deep learning. Conventional deep learning methods for voiceprint recognition mainly use the convolutional neural network (CNN) and the long short-term memory network (LSTM). A CNN-based voiceprint recognition system ignores the original sequential structure of speech when extracting voiceprint features, while the LSTM network model, although it does account for the speech feature sequence, is extremely difficult to train because of the enormous computation demanded by its three gates.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a voiceprint recognition method based on the fusion of a CNN and a GRU network. It exploits the CNN's ability to extract features autonomously to avoid the frequency-domain information loss caused by traditional speaker voice feature extraction, and at the same time exploits the GRU network's strong temporal feature extraction, achieving high-accuracy voiceprint recognition through the fusion of the two networks.
The invention is realized by adopting the following technical scheme:
a voiceprint recognition method based on CNN and GRU network fusion comprises the following steps:
step 1, acquiring a voice segment to be recognized;
step 2, preprocessing the original voice signal to generate a spectrogram of the voice segment to be recognized;
step 3, inputting the spectrogram into a time-sequence-aware combined neural network voiceprint recognition model to obtain identity classification information of the voice segment to be recognized;
the training method of the fused CNN and GRU network voiceprint recognition model specifically comprises the following steps:
step 201, acquiring a training set of voice signals and a test set of the voice signals;
step 202, performing voice signal preprocessing by methods such as pre-emphasis, framing, windowing, and endpoint detection;
step 203, improving the signal-to-noise ratio of the voice signal with an improved RLS algorithm;
step 204, converting each voice segment of the voice segment training set and the voice signal testing set through operations such as discrete Fourier transform and the like to obtain a spectrogram training set and a spectrogram testing set;
step 205, inputting a training set of the spectrogram into a CNN and GRU network to be trained, and training the CNN and GRU network to be trained;
step 206, inputting the testing set of the spectrogram into the trained CNN and GRU network, finishing the training of the CNN and GRU network if the output testing result meets the preset condition, or returning to the step 205 to perform the training again until the testing result meets the preset condition;
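Steps 201-206 describe a train-then-test loop that repeats until a preset condition is met. A minimal sketch of that control flow follows; the names `model_step`, `evaluate`, and `target_acc` are illustrative assumptions, not terms from the patent:

```python
def train_until(model_step, evaluate, max_rounds=20, target_acc=0.95):
    """Steps 205-206: train on the spectrogram training set, then test;
    repeat until the test result meets the preset condition."""
    for round_no in range(1, max_rounds + 1):
        model_step()                  # step 205: one training pass
        if evaluate() >= target_acc:  # step 206: check the preset condition
            return round_no           # training finished
    return max_rounds                 # stop after max_rounds regardless

# toy usage: simulated test accuracies that improve each round
acc = iter([0.50, 0.80, 0.96])
rounds = train_until(lambda: None, lambda: next(acc))
```

With the simulated accuracies above, the preset condition (0.95) is first met on the third round.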
the spectrogram generating process comprises the following steps:
step 301, pre-emphasizing the voice segment based on the first-order digital filter H(z) = 1 − αz^{-1}, where α is the filter coefficient;
step 302, performing framing processing on the pre-emphasized voice segment while maintaining smooth transitions and continuity between frames;
step 303, performing a Fourier transform on each frame of the signal based on the formula X(n,k) = Σ_{m=0}^{M−1} x_m(n)e^{−j2πmk/M}, where M is the number of sampling points per frame and the sequence formed by the M sampling points of the nth frame of voice is x_0(n), x_1(n), …, x_{M−1}(n);
step 304, calculating the energy spectral density of each frame of the signal based on the formula E(n,k) = |X(n,k)|² = X_R(n,k)² + X_I(n,k)², where X(n,k) is the complex sequence obtained after the M-point FFT of the nth frame of voice;
step 305, taking the logarithm of the energy spectral density of step 304 to obtain the spectrogram grey levels Q(a,b);
step 306, normalizing the spectrogram based on the formula Q′(a,b) = (Q(a,b) − Q_min(a,b)) / (Q_max(a,b) − Q_min(a,b)) to obtain the normalized spectrogram, where Q_max(a,b) and Q_min(a,b) are respectively the maximum and minimum values of the spectrogram grey scale;
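The spectrogram pipeline above can be sketched end-to-end in NumPy. This is a minimal illustration, not the patent's implementation: the frame length, hop size, and Hamming window are assumptions, and a small constant is added before the logarithm for numerical safety.

```python
import numpy as np

def spectrogram(x, alpha=0.97, frame_len=256, hop=128):
    # Step 301: pre-emphasis H(z) = 1 - alpha * z^-1
    x = np.append(x[0], x[1:] - alpha * x[:-1])
    # Step 302: overlapping frames for smooth inter-frame transitions
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop: i * hop + frame_len] for i in range(n_frames)])
    frames *= np.hamming(frame_len)               # windowing
    # Step 303: M-point Fourier transform of each frame
    X = np.fft.rfft(frames, axis=1)
    # Steps 304-305: energy spectral density E = |X|^2, then log compression
    E = np.log(np.abs(X) ** 2 + 1e-10)
    # Step 306: min-max normalisation of the grey levels to [0, 1]
    return (E - E.min()) / (E.max() - E.min())

sg = spectrogram(np.random.randn(4000))           # toy 4000-sample signal
```

For a 4000-sample input with these assumed parameters, the result is a 30-frame by 129-bin image with values in [0, 1].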
the improved RLS algorithm comprises:
step 401, obtaining the prediction error based on the formula e(n) = d(n) − x^T(n)ω(n−1), so as to perform enhancement processing on the speech signal;
step 403, improving the forgetting factor so that the algorithm has a faster tracking speed and a smaller steady-state error;
step 404, updating the filter coefficients based on the formula ω(n) = ω(n−1) + k(n)e(n);
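The RLS steps above can be sketched on a toy system-identification problem. The gain-vector computation (elided in the excerpt) is the standard RLS step, and the forgetting-factor update uses the variable form λ(n) = λ_min + (1 − λ_min)·2^{L(n)}, L(n) = −round(μe²(n)) quoted in the description; the parameter values and the test signal are illustrative assumptions.

```python
import numpy as np

def vff_rls(d, x, order=4, lam_min=0.9, mu=50.0, delta=100.0):
    """RLS adaptive filter with a variable forgetting factor
    lam(n) = lam_min + (1 - lam_min) * 2**L(n),  L(n) = -round(mu * e(n)**2)."""
    w = np.zeros(order)                            # filter coefficients omega
    P = delta * np.eye(order)                      # inverse correlation matrix
    for n in range(order - 1, len(x)):
        u = x[n - order + 1:n + 1][::-1]           # regressor [x(n), ..., x(n-order+1)]
        e = d[n] - u @ w                           # step 401: prediction error
        lam = lam_min + (1 - lam_min) * 2.0 ** (-round(mu * e * e))  # step 403
        k = P @ u / (lam + u @ P @ u)              # gain vector (standard RLS step)
        w = w + k * e                              # step 404: coefficient update
        P = (P - np.outer(k, u @ P)) / lam
    return w

rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
h_true = np.array([0.5, -0.3, 0.2, 0.1])           # unknown FIR system to identify
d = np.convolve(x, h_true)[:len(x)]                # noiseless desired signal
w_est = vff_rls(d, x)
```

On this noiseless toy problem the estimated coefficients converge to the true system, which is the behavior the variable forgetting factor is meant to preserve while tracking changes quickly.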
In summary, the present invention discloses a voiceprint recognition method based on the fusion of a CNN and a GRU network, comprising the following steps. Step 1: obtain the voice segment to be recognized. Step 2: preprocess the original voice signal and generate a spectrogram of the voice segment to be recognized. Step 3: input the spectrogram into a time-sequence-aware combined neural network voiceprint recognition model to obtain identity classification information for the voice segment. The feature extraction method based on the fusion of the CNN and the GRU network avoids the frequency-domain information loss caused by traditional speaker voice feature extraction and, by exploiting the GRU network's strong temporal feature extraction, achieves high-accuracy voiceprint recognition.
The technical scheme provided by the implementation of the invention has the beneficial effects that at least:
The voiceprint recognition method based on the fusion of the CNN and the GRU network overcomes the insufficient feature extraction of a single neural network and the accuracy limits a single network faces on complex problems, while also effectively improving training efficiency.
Drawings
FIG. 1 is a flow chart of a voiceprint recognition method based on the fusion of a CNN and a GRU network implemented by the present invention;
FIG. 2 is a schematic diagram of a spectrogram implemented in the present invention;
FIG. 3 is a comparison chart of the improved RLS algorithm implemented in the present invention;
fig. 4 is a schematic structural diagram of CNN and GRU network convergence implemented in the present invention;
Detailed Description
The present disclosure is described in further detail below with reference to the attached drawing figures.
As shown in fig. 1, the present invention discloses a voiceprint recognition method based on the fusion of a CNN and a GRU network, comprising the following steps. First, a speech segment to be recognized is acquired (step 1). Then, the original voice signal is preprocessed to generate a spectrogram of the segment (step 2). Finally, the spectrogram is input into a time-sequence-aware combined neural network voiceprint recognition model to obtain identity classification information for the segment (step 3).
The resulting spectrogram is shown schematically in fig. 2, and the specific generation method is as follows. First, the speech signal is preprocessed to obtain frame-by-frame short-time speech (step 301). A Fourier transform X(n,k) = Σ_{m=0}^{M−1} x_m(n)e^{−j2πmk/M} is applied to each frame, where M is the number of sampling points per frame and the M sampling points of the nth frame form the sequence x_0(n), x_1(n), …, x_{M−1}(n) (step 302). The energy spectral density of each frame is then computed as E(n,k) = |X(n,k)|² = X_R(n,k)² + X_I(n,k)², where X(n,k) is the complex sequence obtained from the M-point FFT of the nth frame (step 303). The logarithm of the energy spectral density of step 303 is taken (step 304). Finally, the spectrogram is normalized by Q′(a,b) = (Q(a,b) − Q_min(a,b)) / (Q_max(a,b) − Q_min(a,b)), where Q_max(a,b) and Q_min(a,b) are respectively the maximum and minimum spectrogram grey-scale values (step 305).
A comparison of the improved RLS algorithm is shown in fig. 3. The cost function of the exponentially weighted RLS algorithm with forgetting factor λ is J_n = Σ_{i=1}^{n} λ^{n−i} e²(i). The smaller λ is, the stronger the tracking of time-varying parameters, but also the more sensitive the algorithm is to noise and the larger the steady-state error; the larger λ is, the weaker the tracking, but the less sensitive the algorithm is to noise. An RLS algorithm based on a variable forgetting factor retains strong tracking capability while keeping the parameter estimation error small; the algorithm is as follows:
e(n) = d(n) − x^T(n)ω(n−1)
ω(n) = ω(n−1) + k(n)e(n)
λ(n) = λ_min + (1 − λ_min)·2^{L(n)}
L(n) = −round[μe²(n)]
In summary, when the error becomes smaller, λ(n) approaches 1 and the parameter error is reduced; conversely, when the squared error becomes larger, λ(n) falls to the minimum value λ_min.
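The limiting behavior of the variable forgetting factor can be checked numerically; λ_min and μ below are illustrative values, not ones prescribed by the patent:

```python
lam_min, mu = 0.9, 50.0

def lam(e):
    # lam(n) = lam_min + (1 - lam_min) * 2**L(n), with L(n) = -round(mu * e**2)
    return lam_min + (1 - lam_min) * 2.0 ** (-round(mu * e * e))

small = lam(0.0)   # error -> 0: forgetting factor approaches 1 (small steady-state error)
large = lam(2.0)   # large squared error: forgetting factor falls to lam_min (fast tracking)
```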
Based on the above understanding, the following formula is proposed as the correction function, with the parameter a introduced to control the function shape; the specific function is as follows:
In the formula, the constants m and n control the value range of the function, and the constants a and b control the convergence speed and improve the shape of the function. When e_t is large, λ(n) approaches n; when e_t = 0, λ(n) = m + n, i.e. n < λ(n) < m + n. Experiments show that 0.8 < λ(n) < 1 is preferred, so m and n finally take the values m = 0.2 and n = 0.8.
The structure of the resulting C-GRU network is shown schematically in fig. 4, and the training process is as follows. First, a training set and a test set of speech signals are obtained (step 201). The speech signals are preprocessed by methods such as pre-emphasis, framing, windowing, and endpoint detection (step 202), and enhanced by the improved RLS algorithm (step 203). Each voice segment of the training set and the test set is then converted, through operations such as the discrete Fourier transform, into a spectrogram training set and a spectrogram test set (step 204). The spectrogram training set is input into the CNN and GRU network to be trained, and the network is trained (step 205). Finally, the spectrogram test set is input into the trained CNN and GRU network; if the output test result meets the preset condition, training is complete, otherwise the process returns to step 205 until the test result satisfies the preset condition (step 206).
The following scheme can be adopted during training. The jth spectrogram of the ith speaker is assigned the label i − 1. The spectrograms are fed into the network for training with an input dimension of 128 × 128, corresponding to the length and height of the spectrogram. The data then enter the C-GRU network, in which the CNN uses 100 convolution kernels of size 5 × 5 and a pooling layer of size 2 × 2; the most salient features in the feature map are extracted, and the network model uses max pooling. To avoid losing timing information in the spectrogram, the signals are pooled only along the frequency axis. To prevent overfitting, Dropout is added inside the GRU units: connections between neurons within a GRU unit and between different GRU units are temporarily dropped at a fixed ratio, which can be set to 0.2. Finally, classification and identification are performed by a softmax classifier.
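The data path described above (spectrogram → convolution → frequency-only pooling → GRU over time steps → softmax) can be sketched with toy sizes in NumPy. All weights are random and the dimensions are scaled down from the patent's 128 × 128 input and 100 kernels; this only illustrates how the tensors flow, not a trained model, and the particular GRU gating convention used is one common variant.

```python
import numpy as np

rng = np.random.default_rng(1)
n_speakers, M = 5, 16                         # toy sizes (patent: 128 x 128 input)

def conv2d_valid(img, kernels):
    """'Valid' 2-D convolution with a bank of kernels, followed by ReLU."""
    kh, kw = kernels.shape[1:]
    H, W = img.shape
    out = np.empty((kernels.shape[0], H - kh + 1, W - kw + 1))
    for c, k in enumerate(kernels):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[c, i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return np.maximum(out, 0.0)

def pool_freq(x, p=2):
    """Max-pool along the frequency axis only, keeping every time step."""
    C, F, T = x.shape
    return x[:, :F - F % p].reshape(C, F // p, p, T).max(axis=2)

def gru_last(seq, Wz, Wr, Wh):
    """Minimal GRU; returns the hidden state after the last time step."""
    h = np.zeros(Wz.shape[0])
    for x in seq:
        xh = np.concatenate([x, h])
        z = 1.0 / (1.0 + np.exp(-Wz @ xh))            # update gate
        r = 1.0 / (1.0 + np.exp(-Wr @ xh))            # reset gate
        h_cand = np.tanh(Wh @ np.concatenate([x, r * h]))
        h = (1.0 - z) * h + z * h_cand
    return h

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

img = rng.standard_normal((M, M))                     # toy "spectrogram" (freq x time)
feat = pool_freq(conv2d_valid(img, 0.1 * rng.standard_normal((3, 5, 5))))
seq = feat.transpose(2, 0, 1).reshape(feat.shape[2], -1)  # one feature vector per time step
hdim, idim = 8, seq.shape[1]
weights = [0.1 * rng.standard_normal((hdim, idim + hdim)) for _ in range(3)]
h_last = gru_last(seq, *weights)
probs = softmax(0.1 * rng.standard_normal((n_speakers, hdim)) @ h_last)
```

Because pooling is applied only on the frequency axis, the number of GRU time steps equals the time dimension of the convolution output, which is the property the patent relies on to preserve timing information.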
The present disclosure uses the TIMIT speech database, whose corpus has a 16 kHz sampling rate and 16-bit depth and contains 10 sentences from each of 630 speakers. In the tests, 80% of the data set is used as training samples and the remaining 20% as the test set. The network model was iterated 20 times; the results are shown in table 1:
The accuracy of the C-GRU network is superior to that of the other two network structures; a single feature has a large influence on voiceprint recognition and cannot meet practical requirements.
The foregoing is one of the exemplary embodiments of the present disclosure; those skilled in the art may make various modifications without departing from the spirit and scope of the present invention.
Claims (4)
1. A voiceprint recognition method based on CNN and GRU network fusion is characterized by comprising the following steps:
step 1, acquiring a voice segment to be recognized;
step 2, preprocessing an original voice signal to generate a spectrogram of a voice fragment to be recognized;
and 3, inputting the spectrogram into a combined neural network voiceprint recognition model related to the time sequence to obtain identity classification information of the voice fragment to be recognized.
2. The method for voiceprint recognition based on CNN and GRU network convergence according to claim 1, wherein the training method for the CNN and GRU network voiceprint recognition model comprises the following steps:
step 201, acquiring a training set of voice signals and a test set of the voice signals;
step 202, performing voice signal preprocessing by methods such as pre-emphasis, framing, windowing, and endpoint detection;
step 203, improving the signal-to-noise ratio of the voice signal with an improved RLS algorithm;
step 204, converting each voice segment of the voice segment training set and the voice signal testing set through operations such as discrete Fourier transform and the like to obtain a spectrogram training set and a spectrogram testing set;
step 205, inputting a training set of the spectrogram into a CNN and GRU network to be trained, and training the CNN and GRU network to be trained;
and step 206, inputting the testing set of the spectrogram into the trained CNN and GRU network, finishing the training of the CNN and GRU network if the output testing result meets the preset condition, or returning to the step 205 to perform the training again until the testing result meets the preset condition.
3. The method of claim 2, wherein the improved RLS algorithm comprises:
step 301, obtaining the prediction error based on the formula e(n) = d(n) − x^T(n)ω(n−1), so as to perform enhancement processing on the speech signal;
step 303, based on the formulaThe forgetting factor is improved to have a faster tracking speed and a smaller steady-state error;
step 304, updating the filter coefficients based on the formula ω(n) = ω(n−1) + k(n)e(n).
4. The method for voiceprint recognition based on CNN and GRU network convergence according to claim 1, wherein the generation process of the spectrogram comprises:
step 401, pre-emphasizing the voice segment based on the first-order digital filter H(z) = 1 − αz^{-1}, where α is the filter coefficient;
step 402, performing framing processing on the pre-emphasized voice segment, and maintaining smooth transition and continuity between frames;
step 403, performing a Fourier transform on each frame of the signal based on the formula X(n,k) = Σ_{m=0}^{M−1} x_m(n)e^{−j2πmk/M}, where M is the number of sampling points per frame and the sequence formed by the M sampling points of the nth frame of voice is x_0(n), x_1(n), …, x_{M−1}(n);
step 404, calculating the energy spectral density of each frame of the signal based on the formula E(n,k) = |X(n,k)|² = X_R(n,k)² + X_I(n,k)², where X(n,k) is the complex sequence obtained after the M-point FFT of the nth frame of voice.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010719665.3A CN112053694A (en) | 2020-07-23 | 2020-07-23 | Voiceprint recognition method based on CNN and GRU network fusion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010719665.3A CN112053694A (en) | 2020-07-23 | 2020-07-23 | Voiceprint recognition method based on CNN and GRU network fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112053694A true CN112053694A (en) | 2020-12-08 |
Family
ID=73602385
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010719665.3A Pending CN112053694A (en) | 2020-07-23 | 2020-07-23 | Voiceprint recognition method based on CNN and GRU network fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112053694A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107276562A (en) * | 2017-06-16 | 2017-10-20 | 国网重庆市电力公司潼南区供电分公司 | One kind is based on improvement adaptive equalization hybrid algorithm RLS LMS transformer noise-eliminating methods |
CN108985167A (en) * | 2018-06-14 | 2018-12-11 | 兰州交通大学 | The gyro denoising method of improved RLS adaptive-filtering |
CN109524014A (en) * | 2018-11-29 | 2019-03-26 | 辽宁工业大学 | A kind of Application on Voiceprint Recognition analysis method based on depth convolutional neural networks |
CN109523993A (en) * | 2018-11-02 | 2019-03-26 | 成都三零凯天通信实业有限公司 | A kind of voice languages classification method merging deep neural network with GRU based on CNN |
CN110299142A (en) * | 2018-05-14 | 2019-10-01 | 桂林远望智能通信科技有限公司 | A kind of method for recognizing sound-groove and device based on the network integration |
CN110634491A (en) * | 2019-10-23 | 2019-12-31 | 大连东软信息学院 | Series connection feature extraction system and method for general voice task in voice signal |
- 2020-07-23: application CN202010719665.3A filed; patent CN112053694A/en, status Pending
Non-Patent Citations (1)
Title |
---|
Chang Tieyuan et al., "Research on an improved RLS algorithm with fast tracking capability", Computer Engineering and Applications * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112613481A (en) * | 2021-01-04 | 2021-04-06 | 上海明略人工智能(集团)有限公司 | Bearing abrasion early warning method and system based on frequency spectrum |
CN113129897A (en) * | 2021-04-08 | 2021-07-16 | 杭州电子科技大学 | Voiceprint recognition method based on attention-mechanism recurrent neural network |
CN113129897B (en) * | 2021-04-08 | 2024-02-20 | 杭州电子科技大学 | Voiceprint recognition method based on attention-mechanism recurrent neural network |
CN113314144A (en) * | 2021-05-19 | 2021-08-27 | 中国南方电网有限责任公司超高压输电公司广州局 | Voice recognition and power equipment fault early warning method, system, terminal and medium |
CN113409795A (en) * | 2021-08-19 | 2021-09-17 | 北京世纪好未来教育科技有限公司 | Training method, voiceprint recognition method and device and electronic equipment |
CN113823291A (en) * | 2021-09-07 | 2021-12-21 | 广西电网有限责任公司贺州供电局 | Voiceprint recognition method and system applied to power operation |
WO2023070874A1 (en) * | 2021-10-28 | 2023-05-04 | 中国科学院深圳先进技术研究院 | Voiceprint recognition method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110491416B (en) | Telephone voice emotion analysis and identification method based on LSTM and SAE | |
CN112053694A (en) | Voiceprint recognition method based on CNN and GRU network fusion | |
CN110400579B (en) | Speech emotion recognition based on a directional self-attention mechanism and a bidirectional long short-term network | |
CN109524014A (en) | A voiceprint recognition analysis method based on deep convolutional neural networks | |
CN110459225B (en) | Speaker recognition system based on CNN fusion characteristics | |
CN105096955B (en) | A fast speaker identification method and system based on model-growth clustering | |
CN111161744B (en) | Speaker clustering method that jointly optimizes deep representation learning and speaker identity estimation | |
CN111462729B (en) | Fast language identification method based on phoneme log-likelihood ratio and sparse representation | |
CN105206270A (en) | Isolated digit speech recognition classification system and method combining principal component analysis (PCA) with restricted Boltzmann machine (RBM) | |
CN113129897B (en) | Voiceprint recognition method based on attention-mechanism recurrent neural network | |
CN110299132B (en) | Voice digital recognition method and device | |
Todkar et al. | Speaker recognition techniques: A review | |
CN113763965B (en) | Speaker identification method with multiple attention feature fusion | |
CN112735435A (en) | Voiceprint open set identification method with unknown class internal division capability | |
CN108564956A (en) | A voiceprint recognition method and device, server and storage medium | |
Khdier et al. | Deep learning algorithms based voiceprint recognition system in noisy environment | |
CN114550703A (en) | Training method and device of voice recognition system, and voice recognition method and device | |
CN114783418B (en) | End-to-end voice recognition method and system based on sparse self-attention mechanism | |
CN111785262B (en) | Speaker age and gender classification method based on residual error network and fusion characteristics | |
Koolagudi et al. | Speaker recognition in the case of emotional environment using transformation of speech features | |
CN111524520A (en) | Voiceprint recognition method based on error reverse propagation neural network | |
CN110415685A (en) | A speech recognition method | |
Singh et al. | A critical review on automatic speaker recognition | |
CN113516987B (en) | Speaker recognition method, speaker recognition device, storage medium and equipment | |
CN115064175A (en) | Speaker recognition method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20201208 |