CN112053694A - Voiceprint recognition method based on CNN and GRU network fusion - Google Patents

Voiceprint recognition method based on CNN and GRU network fusion

Info

Publication number
CN112053694A
Authority
CN
China
Prior art keywords
voice
spectrogram
cnn
gru network
voiceprint recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010719665.3A
Other languages
Chinese (zh)
Inventor
崔建伟 (Cui Jianwei)
陈宝远 (Chen Baoyuan)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin University of Science and Technology
Original Assignee
Harbin University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin University of Science and Technology filed Critical Harbin University of Science and Technology
Priority to CN202010719665.3A priority Critical patent/CN112053694A/en
Publication of CN112053694A publication Critical patent/CN112053694A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L17/04 Training, enrolment or model building
    • G10L17/06 Decision making techniques; Pattern matching strategies
    • G10L17/14 Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/12 Speech or voice analysis techniques characterised by the extracted parameters being prediction coefficients
    • G10L25/18 Speech or voice analysis techniques characterised by the extracted parameters being spectral information of each sub-band
    • G10L25/27 Speech or voice analysis techniques characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques using neural networks
    • G10L25/45 Speech or voice analysis techniques characterised by the type of analysis window

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a voiceprint recognition method based on the fusion of a CNN and a GRU network (C-GRU for short), comprising the following steps: a speech signal sample to be recognized is preprocessed and then enhanced with an adaptive filtering algorithm; a spectrogram of the speech segment to be recognized is generated; the spectrogram is input into the trained C-GRU network model to extract voiceprint features; and the extracted features are passed to a softmax function to obtain the identity classification information of the speech segment to be recognized. The C-GRU feature extraction method avoids information loss in the frequency domain and, by exploiting the GRU network's good temporal feature extraction capability, realizes a voiceprint recognition method with higher recognition accuracy and faster convergence.

Description

Voiceprint recognition method based on CNN and GRU network fusion
Technical Field
The invention relates to a voiceprint recognition method, in particular to a voiceprint recognition method based on CNN and GRU network fusion (C-GRU for short).
Background Art
In recent years, biometric identification technology has become a very reliable and convenient way of authenticating identity, attracting attention both inside and outside the industry. Voice is one of the everyday ways people communicate. It has been scientifically established that each person's vocal organs differ, and that different growth environments affect those organs differently, so each person's voice is unique. Recognizing identity by voice therefore has great potential: voice is very convenient to collect, the required equipment is cheap and easy to obtain, it supports remote identity authentication, and users accept it readily.
In terms of content, voiceprint recognition techniques can be divided into two directions: text-dependent and text-independent. In text-dependent voiceprint recognition, the speaker must speak a fixed script, and the text content of the training speech must be the same as that of the test speech. Although such a system can be trained to perform well, its biggest disadvantage is that the speaker must pronounce the fixed text; once the spoken content is inconsistent with the text or is not pronounced as required, recognition is difficult to guarantee, which greatly limits the adoption of the method in practical applications.
Traditional voiceprint recognition usually adopts the universal background model (GMM-UBM). First, a speaker-independent model is trained on the speech of a large number of speakers; the MAP algorithm then effectively alleviates the problems of scarce speech data and the channel mismatch that afflicts the conventional Gaussian mixture model; finally, the recognition model is trained with the maximum a posteriori or maximum likelihood regression criterion. This model, however, occupies a significant amount of storage resources when modeling each speaker. Neural network methods are a basic topic of current deep learning research, and as deep learning gradually penetrates various fields, voiceprint recognition has likewise turned to deep learning for exploration. Traditional deep learning methods for voiceprint recognition mainly comprise the convolutional neural network (CNN) and the long short-term memory network (LSTM). A CNN-based voiceprint recognition system ignores the original sequential structure of speech when extracting voiceprint features; and although the LSTM network model does take the speech feature sequence into account, the huge computational demand brought by the LSTM's three gates makes the network extremely difficult to train.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a voiceprint recognition method based on the fusion of a CNN and a GRU network. It exploits the CNN's ability to extract features autonomously, avoiding the frequency-domain information loss caused by traditional speaker feature extraction, while at the same time exploiting the GRU network's good time-sequence feature extraction; by fusing the CNN and the GRU network, highly accurate voiceprint recognition is achieved.
The invention is realized by adopting the following technical scheme:
a voiceprint recognition method based on CNN and GRU network fusion comprises the following steps:
step 1, acquiring a voice segment to be recognized;
step 2, preprocessing an original voice signal to generate a spectrogram of a voice fragment to be recognized;
step 3, inputting the spectrogram into a combined neural network voiceprint recognition model related to a time sequence to obtain identity classification information of the voice fragment to be recognized;
the training method of the CNN-GRU fused voiceprint recognition model specifically comprises the following steps:
step 201, acquiring a training set of voice signals and a test set of the voice signals;
step 202, performing voice signal preprocessing by methods of pre-emphasis, framing, windowing, endpoint detection and the like;
step 203, improving the signal-to-noise ratio of the voice signal by means of an improved RLS algorithm;
step 204, converting each voice segment of the training set and the test set, through operations such as the discrete Fourier transform, to obtain a spectrogram training set and a spectrogram test set;
step 205, inputting a training set of the spectrogram into a CNN and GRU network to be trained, and training the CNN and GRU network to be trained;
step 206, inputting the testing set of the spectrogram into the trained CNN and GRU network, finishing the training of the CNN and GRU network if the output testing result meets the preset condition, or returning to the step 205 to perform the training again until the testing result meets the preset condition;
the spectrogram generating process comprises the following steps:
step 301, pre-emphasizing the voice segment with the first-order digital filter H(z) = 1 - αz^(-1), wherein α is the filter coefficient;
step 302, performing framing processing on the pre-emphasized voice segment, maintaining smooth transition and continuity between frames;
step 303, performing a Fourier transform on each frame of signal,
X(n,k) = Σ_{m=0}^{M-1} x_m(n) e^(-j2πkm/M), k = 0, 1, …, M-1,
wherein M is the number of sampling points per frame and the sequence formed by the M sampling points of the n-th frame of voice is x_0(n), x_1(n), …, x_{M-1}(n);
step 304, calculating the energy spectral density of each frame of signal from E(n,k) = |X(n,k)|² = X_R(n,k)² + X_I(n,k)², wherein X(n,k) is the complex sequence obtained after the n-th frame of voice undergoes the M-point FFT;
step 305, taking the logarithm of the energy spectral density obtained in step 304 to obtain the grey-level spectrogram Q(n,k) (the exact formula is published only as an equation image in the original);
step 306, normalizing the spectrogram,
Q′(a,b) = (Q(a,b) - Q_min(a,b)) / (Q_max(a,b) - Q_min(a,b)),
to obtain the normalized spectrogram, wherein Q_max(a,b) and Q_min(a,b) are respectively the maximum value and the minimum value of the spectrogram grey levels;
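The six steps above translate directly into a few lines of NumPy. The following is a minimal illustrative sketch, not the patent's reference implementation: the frame length, hop size, pre-emphasis coefficient α = 0.97 and the 10·log10 scaling in step 305 are assumptions, since the patent publishes those values only in equation images.

import numpy as np

def spectrogram(signal, frame_len=256, hop=128, alpha=0.97):
    """Grey-level spectrogram following steps 301-306 (illustrative sketch)."""
    # Step 301: pre-emphasis with H(z) = 1 - alpha * z^-1
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Step 302: framing with overlap (hop < frame_len keeps transitions
    # smooth), plus a Hamming window on each frame
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    window = np.hamming(frame_len)
    frames = np.stack([emphasized[i * hop: i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Step 303: M-point FFT of every frame
    X = np.fft.rfft(frames, axis=1)
    # Step 304: energy spectral density E = |X|^2 = X_R^2 + X_I^2
    E = np.abs(X) ** 2
    # Step 305: logarithm (10*log10 assumed; the patent's formula is an image)
    Q = 10.0 * np.log10(E + 1e-10)
    # Step 306: min-max normalisation of the grey levels
    return (Q - Q.min()) / (Q.max() - Q.min())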
the improved RLS algorithm comprises:
step 401, obtaining the prediction error e(n) = d(n) - x^T(n)ω(n-1) used to perform enhancement processing on the speech signal;
step 402, obtaining the Kalman gain coefficient
k(n) = P(n-1)x(n) / (λ(n) + x^T(n)P(n-1)x(n));
step 403, improving the forgetting factor, λ(n) = λ_min + (1 - λ_min)·2^(L(n)) with L(n) = -round[μe²(n)], so as to obtain a faster tracking speed and a smaller steady-state error;
step 404, completing the update of the filter coefficients based on the formula ω(n) = ω(n-1) + k(n)e(n);
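Steps 401 to 404, combined with the variable forgetting factor given in the detailed description below, can be sketched as an adaptive noise canceller in NumPy. This is an illustrative sketch under assumptions: the filter order, λ_min, μ and the initialisation P(0) = δ⁻¹I are not specified in the patent text, and the P(n) recursion is taken as the standard RLS form.

import numpy as np

def vff_rls(x, d, order=8, lam_min=0.95, mu=50.0, delta=0.01):
    """Variable-forgetting-factor RLS enhancer (sketch of steps 401-404).

    x: noise reference signal; d: noisy speech. In an adaptive noise
    canceller the error signal e(n) is the enhanced speech output.
    """
    w = np.zeros(order)            # filter coefficients omega(n)
    P = np.eye(order) / delta      # inverse correlation matrix P(0)
    e_out = np.zeros(len(x))
    for n in range(order, len(x)):
        xn = x[n - order:n][::-1]  # input vector x(n)
        # Step 401: prediction error e(n) = d(n) - x^T(n) omega(n-1)
        e = d[n] - xn @ w
        # Step 403: lambda(n) = lam_min + (1 - lam_min) * 2^L(n),
        #           with L(n) = -round(mu * e^2(n))
        lam = lam_min + (1.0 - lam_min) * 2.0 ** (-round(mu * e * e))
        # Step 402: Kalman gain k(n) = P(n-1)x(n) / (lambda(n) + x^T P(n-1) x)
        Px = P @ xn
        k = Px / (lam + xn @ Px)
        # Step 404: coefficient update omega(n) = omega(n-1) + k(n) e(n)
        w = w + k * e
        # Standard RLS inverse-correlation update (assumed form)
        P = (P - np.outer(k, Px)) / lam
        e_out[n] = e
    return e_out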
in summary, the present invention discloses a voiceprint recognition method based on the fusion of a CNN and a GRU network, comprising the following steps. Step 1: obtain the speech segment to be recognized. Step 2: preprocess the original speech signal and generate the spectrogram of the speech segment to be recognized. Step 3: input the spectrogram into the time-sequence-aware combined neural network voiceprint recognition model to obtain the identity classification information of the speech segment to be recognized. The CNN-GRU fused feature extraction method avoids the frequency-domain information loss caused by traditional speaker feature extraction and, by exploiting the GRU network's good time-sequence feature extraction in a CNN-GRU fused architecture, achieves highly accurate voiceprint recognition.
The technical scheme provided by the implementation of the invention has at least the following beneficial effects:
the voiceprint recognition method based on the fusion of the CNN and the GRU network overcomes a single neural network's insufficient feature extraction and limited accuracy on complex problems, while effectively improving training efficiency.
Drawings
FIG. 1 is a flow chart of a voiceprint recognition method based on the fusion of a CNN and a GRU network implemented by the present invention;
FIG. 2 is a schematic diagram of a spectrogram implemented in the present invention;
FIG. 3 is a comparison chart of the improved RLS algorithm implemented in the present invention;
fig. 4 is a schematic structural diagram of CNN and GRU network convergence implemented in the present invention;
Detailed Description
The present disclosure is described in further detail below with reference to the attached drawing figures.
As shown in fig. 1, the present invention discloses a voiceprint recognition method based on the fusion of a CNN and a GRU network, comprising the following steps. A speech segment to be recognized is acquired (step 1). The original speech signal is then preprocessed to generate a spectrogram of the speech segment to be recognized (step 2). Finally, the spectrogram is input into the time-sequence-aware combined neural network voiceprint recognition model to obtain the identity classification information of the speech segment to be recognized (step 3).
The resulting spectrogram is shown schematically in fig. 2, and the specific generation method comprises the following steps. First, a preprocessing operation is performed on the speech signal to obtain frame-by-frame short-time speech (step 301). A Fourier transform is then applied to each frame of signal,
X(n,k) = Σ_{m=0}^{M-1} x_m(n) e^(-j2πkm/M), k = 0, 1, …, M-1,
wherein M is the number of sampling points per frame and the sequence formed by the M sampling points of the n-th frame of voice is x_0(n), x_1(n), …, x_{M-1}(n) (step 302). The energy spectral density of each frame of signal is calculated from E(n,k) = |X(n,k)|² = X_R(n,k)² + X_I(n,k)², wherein X(n,k) is the complex sequence obtained after the n-th frame of voice undergoes the M-point FFT (step 303). Taking the logarithm of the energy spectral density of step 303 yields the grey-level spectrogram Q(n,k) (step 304). Finally, the spectrogram is normalized,
Q′(a,b) = (Q(a,b) - Q_min(a,b)) / (Q_max(a,b) - Q_min(a,b)),
wherein Q_max(a,b) and Q_min(a,b) are respectively the maximum value and the minimum value of the spectrogram grey levels (step 305).
A comparison of the improved RLS is shown in FIG. 3. The cost function of the exponentially weighted RLS algorithm with forgetting factor is J(n) = Σ_{i=1}^{n} λ^(n-i) e²(i). The smaller λ is, the stronger the ability to track time-varying parameters, but also the more sensitive the algorithm is to noise and the larger the steady-state error; conversely, a larger λ weakens tracking but reduces sensitivity to noise. The RLS algorithm based on a variable forgetting factor achieves strong tracking capability while keeping the parameter estimation error small. The algorithm is as follows:
e(n) = d(n) - x^T(n)ω(n-1)
k(n) = P(n-1)x(n) / (λ(n) + x^T(n)P(n-1)x(n))
ω(n) = ω(n-1) + k(n)e(n)
P(n) = [P(n-1) - k(n)x^T(n)P(n-1)] / λ(n)
λ(n) = λ_min + (1 - λ_min)·2^(L(n))
L(n) = -round[μe²(n)]
In summary, when the error becomes smaller, λ(n) approaches 1 and the parameter error is reduced; conversely, when the squared error becomes larger, λ(n) approaches the minimum value λ_min.
Based on the above understanding, it is proposed to use the following formula as a correction function, and introduce the parameter a to control the function shape, and the specific function is as follows:
Figure BDA0002599498700000043
in the formula, the constant m and n control the value range of the function, and the constant a and the constant b control the convergence speed of the function and improve the shape of the function. When e istWhen larger as 6, λ (n) approaches n; when e istWhen 0 is equal, λ (n) is m + n, i.e. n<λ(n)<m + n; experiments show that when the temperature is 0.8<λ(n)<1 is preferred, so that m, n finally takes the value of m 0.2 and n 0.8.
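Since the correction function is published only as an equation image, it cannot be reproduced verbatim here. Purely as an illustration, a hypothetical function with the stated boundary behaviour, λ(n) = m + n at zero error and λ(n) approaching n as the error grows, with a and b shaping the transition, can be sketched as follows.

def corrected_lambda(e, a=5.0, b=2.0, m_const=0.2, n_const=0.8):
    """Hypothetical stand-in for the patent's correction function.

    Satisfies the stated properties: lambda = m + n = 1.0 at e = 0,
    lambda -> n = 0.8 for large errors; a and b control the transition.
    """
    return n_const + m_const / (1.0 + a * abs(e) ** b)

print(corrected_lambda(0.0))  # 1.0
print(corrected_lambda(6.0))  # ~0.801, close to n = 0.8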
The schematic structure of the resulting C-GRU network is shown in FIG. 4, and the training process is as follows. First, a training set and a test set of speech signals are obtained (step 201), and the speech signals are preprocessed by pre-emphasis, framing, windowing, endpoint detection and the like (step 202). The speech signals are then enhanced with the improved RLS algorithm (step 203). Next, each speech segment of the training set and the test set is converted, through operations such as the discrete Fourier transform, into a spectrogram training set and a spectrogram test set (step 204). The spectrogram training set is input into the CNN and GRU network to be trained, and the network is trained (step 205). Finally, the spectrogram test set is input into the trained CNN and GRU network; if the output test result meets the preset condition, the training of the CNN and GRU network is complete, otherwise the procedure returns to step 205 for retraining until the test result satisfies the preset condition (step 206).
During training, the following scheme can be adopted. The j-th spectrogram of the i-th speaker is assigned the label i-1. The spectrograms are fed into the network for training with an input dimension of 128 × 128, corresponding to the length and height of the spectrogram. The data then enter the C-GRU network: the CNN uses 100 convolution kernels of size 5 × 5 and a pooling layer of size 2 × 2, extracting the most salient features in the feature map; the network model adopts max pooling. To prevent loss of timing information in the spectrogram, the signals are pooled only along the frequency axis. To prevent overfitting in the network, Dropout is added inside the GRU units: connections between GRU internal neurons and between different GRU units are temporarily disconnected at a certain ratio, which can be set to 0.2. Finally, classification and identification are performed with a softmax classifier.
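The architecture just described can be sketched in Keras. This is an illustrative reconstruction rather than the patent's reference code: the GRU width, the ReLU activation and the optimizer are assumptions, and the pooling window is taken as (1, 2) to honour the statement that pooling is applied only along the frequency axis.

from tensorflow.keras import layers, models

def build_c_gru(num_speakers, time_steps=128, freq_bins=128):
    """C-GRU sketch: 100 conv kernels of size 5x5, frequency-only max
    pooling, a GRU with dropout 0.2, and a softmax classifier."""
    inp = layers.Input(shape=(time_steps, freq_bins, 1))  # 128x128 spectrogram
    x = layers.Conv2D(100, (5, 5), padding="same", activation="relu")(inp)
    # Max pooling along frequency only, so timing information is preserved
    x = layers.MaxPooling2D(pool_size=(1, 2))(x)
    # Collapse (frequency, channels) into one feature vector per time step
    x = layers.Reshape((time_steps, (freq_bins // 2) * 100))(x)
    # GRU over the time sequence; dropout realises the "temporarily
    # disconnected at ratio 0.2" behaviour described above
    x = layers.GRU(256, dropout=0.2, recurrent_dropout=0.2)(x)
    out = layers.Dense(num_speakers, activation="softmax")(x)
    return models.Model(inp, out)

model = build_c_gru(num_speakers=630)  # TIMIT: 630 speakers
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

With the labels i-1 assigned as integers, sparse categorical cross-entropy matches the labelling scheme directly.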
The experiments use the TIMIT speech database, with a corpus sampling rate of 16 kHz at 16 bits, containing 10 sentences from each of 630 speakers. During testing, 80% of the data set is used as training samples and the remaining 20% as the test set. The network model was iterated 20 times, and the results are shown in Table 1:
[Table 1: recognition accuracy of the three network structures; the original table is published only as an image and is not reproduced here.]
The accuracy of the C-GRU network is superior to that of the other two network structures, showing that a single type of feature strongly limits voiceprint recognition and cannot meet practical requirements.
The foregoing is one of the exemplary embodiments of the present disclosure, and various modifications may be made by those skilled in the art without departing from the spirit and scope of the present invention.

Claims (4)

1. A voiceprint recognition method based on CNN and GRU network fusion is characterized by comprising the following steps:
step 1, acquiring a voice segment to be recognized;
step 2, preprocessing an original voice signal to generate a spectrogram of a voice fragment to be recognized;
and 3, inputting the spectrogram into a combined neural network voiceprint recognition model related to the time sequence to obtain identity classification information of the voice fragment to be recognized.
2. The method for voiceprint recognition based on CNN and GRU network convergence according to claim 1, wherein the training method for the CNN and GRU network voiceprint recognition model comprises the following steps:
step 201, acquiring a training set of voice signals and a test set of the voice signals;
step 202, performing voice signal preprocessing by methods of pre-emphasis, framing, windowing, endpoint detection and the like;
step 203, improving the signal-to-noise ratio of the voice signal by improving the RLS algorithm;
step 204, converting each voice segment of the training set and the test set, through operations such as the discrete Fourier transform, to obtain a spectrogram training set and a spectrogram test set;
step 205, inputting a training set of the spectrogram into a CNN and GRU network to be trained, and training the CNN and GRU network to be trained;
and step 206, inputting the testing set of the spectrogram into the trained CNN and GRU network, finishing the training of the CNN and GRU network if the output testing result meets the preset condition, or returning to the step 205 to perform the training again until the testing result meets the preset condition.
3. The method of claim 2, wherein the improved RLS algorithm comprises:
step 301, obtaining the prediction error e(n) = d(n) - x^T(n)ω(n-1) used to perform enhancement processing on the speech signal;
step 302, obtaining the Kalman gain coefficient
k(n) = P(n-1)x(n) / (λ(n) + x^T(n)P(n-1)x(n));
step 303, improving the forgetting factor, λ(n) = λ_min + (1 - λ_min)·2^(L(n)) with L(n) = -round[μe²(n)], so as to obtain a faster tracking speed and a smaller steady-state error;
step 304, updating the filter coefficients based on the formula ω(n) = ω(n-1) + k(n)e(n).
4. The method for voiceprint recognition based on CNN and GRU network convergence according to claim 1, wherein the generation process of the spectrogram comprises:
step 401, pre-emphasizing the voice segment with the first-order digital filter H(z) = 1 - αz^(-1), wherein α is the filter coefficient;
step 402, performing framing processing on the pre-emphasized voice segment, maintaining smooth transition and continuity between frames;
step 403, performing a Fourier transform on each frame of signal,
X(n,k) = Σ_{m=0}^{M-1} x_m(n) e^(-j2πkm/M), k = 0, 1, …, M-1,
wherein M is the number of sampling points per frame and the sequence formed by the M sampling points of the n-th frame of voice is x_0(n), x_1(n), …, x_{M-1}(n);
step 404, calculating the energy spectral density of each frame of signal from E(n,k) = |X(n,k)|² = X_R(n,k)² + X_I(n,k)², wherein X(n,k) is the complex sequence obtained after the n-th frame of voice undergoes the M-point FFT;
step 405, taking the logarithm of the energy spectral density obtained in step 404 to obtain the grey-level spectrogram Q(n,k);
step 406, normalizing the spectrogram,
Q′(a,b) = (Q(a,b) - Q_min(a,b)) / (Q_max(a,b) - Q_min(a,b)),
to obtain the normalized spectrogram, wherein Q_max(a,b) and Q_min(a,b) are respectively the maximum value and the minimum value of the spectrogram grey levels.
CN202010719665.3A 2020-07-23 2020-07-23 Voiceprint recognition method based on CNN and GRU network fusion Pending CN112053694A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010719665.3A CN112053694A (en) 2020-07-23 2020-07-23 Voiceprint recognition method based on CNN and GRU network fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010719665.3A CN112053694A (en) 2020-07-23 2020-07-23 Voiceprint recognition method based on CNN and GRU network fusion

Publications (1)

Publication Number Publication Date
CN112053694A true CN112053694A (en) 2020-12-08

Family

ID=73602385

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010719665.3A Pending CN112053694A (en) 2020-07-23 2020-07-23 Voiceprint recognition method based on CNN and GRU network fusion

Country Status (1)

Country Link
CN (1) CN112053694A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107276562A (en) * 2017-06-16 2017-10-20 国网重庆市电力公司潼南区供电分公司 One kind is based on improvement adaptive equalization hybrid algorithm RLS LMS transformer noise-eliminating methods
CN110299142A (en) * 2018-05-14 2019-10-01 桂林远望智能通信科技有限公司 A kind of method for recognizing sound-groove and device based on the network integration
CN108985167A (en) * 2018-06-14 2018-12-11 兰州交通大学 The gyro denoising method of improved RLS adaptive-filtering
CN109523993A (en) * 2018-11-02 2019-03-26 成都三零凯天通信实业有限公司 A kind of voice languages classification method merging deep neural network with GRU based on CNN
CN109524014A (en) * 2018-11-29 2019-03-26 辽宁工业大学 A kind of Application on Voiceprint Recognition analysis method based on depth convolutional neural networks
CN110634491A (en) * 2019-10-23 2019-12-31 大连东软信息学院 Series connection feature extraction system and method for general voice task in voice signal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
常铁原 (Chang Tieyuan) et al.: "Research on an improved RLS algorithm with fast tracking capability", 《计算机工程与应用》 (Computer Engineering and Applications) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112613481A (en) * 2021-01-04 2021-04-06 上海明略人工智能(集团)有限公司 Bearing abrasion early warning method and system based on frequency spectrum
CN113129897A (en) * 2021-04-08 2021-07-16 杭州电子科技大学 Voiceprint recognition method based on attention mechanism recurrent neural network
CN113129897B (en) * 2021-04-08 2024-02-20 杭州电子科技大学 Voiceprint recognition method based on attention mechanism cyclic neural network
CN113314144A (en) * 2021-05-19 2021-08-27 中国南方电网有限责任公司超高压输电公司广州局 Voice recognition and power equipment fault early warning method, system, terminal and medium
CN113409795A (en) * 2021-08-19 2021-09-17 北京世纪好未来教育科技有限公司 Training method, voiceprint recognition method and device and electronic equipment
CN113823291A (en) * 2021-09-07 2021-12-21 广西电网有限责任公司贺州供电局 Voiceprint recognition method and system applied to power operation
WO2023070874A1 (en) * 2021-10-28 2023-05-04 中国科学院深圳先进技术研究院 Voiceprint recognition method

Similar Documents

Publication Publication Date Title
CN110491416B (en) Telephone voice emotion analysis and identification method based on LSTM and SAE
CN112053694A (en) Voiceprint recognition method based on CNN and GRU network fusion
CN110400579B (en) Speech emotion recognition based on direction self-attention mechanism and bidirectional long-time and short-time network
CN109524014A (en) A kind of Application on Voiceprint Recognition analysis method based on depth convolutional neural networks
CN110459225B (en) Speaker recognition system based on CNN fusion characteristics
CN105096955B A kind of speaker's method for quickly identifying and system based on model growth cluster
CN111161744B (en) Speaker clustering method for simultaneously optimizing deep characterization learning and speaker identification estimation
CN111462729B (en) Fast language identification method based on phoneme log-likelihood ratio and sparse representation
CN105206270A (en) Isolated digit speech recognition classification system and method combining principal component analysis (PCA) with restricted Boltzmann machine (RBM)
CN113129897B (en) Voiceprint recognition method based on attention mechanism cyclic neural network
CN110299132B (en) Voice digital recognition method and device
Todkar et al. Speaker recognition techniques: A review
CN113763965B (en) Speaker identification method with multiple attention feature fusion
CN112735435A (en) Voiceprint open set identification method with unknown class internal division capability
CN108564956A (en) A kind of method for recognizing sound-groove and device, server, storage medium
Khdier et al. Deep learning algorithms based voiceprint recognition system in noisy environment
CN114550703A (en) Training method and device of voice recognition system, and voice recognition method and device
CN114783418B (en) End-to-end voice recognition method and system based on sparse self-attention mechanism
CN111785262B (en) Speaker age and gender classification method based on residual error network and fusion characteristics
Koolagudi et al. Speaker recognition in the case of emotional environment using transformation of speech features
CN111524520A (en) Voiceprint recognition method based on error reverse propagation neural network
CN110415685A (en) A kind of audio recognition method
Singh et al. A critical review on automatic speaker recognition
CN113516987B (en) Speaker recognition method, speaker recognition device, storage medium and equipment
CN115064175A (en) Speaker recognition method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20201208)