CN109524014A - A voiceprint recognition analysis method based on deep convolutional neural networks - Google Patents

A voiceprint recognition analysis method based on deep convolutional neural networks

Info

Publication number
CN109524014A
CN109524014A (application CN201811439719.XA)
Authority
CN
China
Prior art keywords
voice signal
neural networks
convolutional neural
frame
voiceprint recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811439719.XA
Other languages
Chinese (zh)
Inventor
仲珩
李昕
褚治广
蔡盼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Liaoning University of Technology
Original Assignee
Liaoning University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Liaoning University of Technology
Priority to CN201811439719.XA
Publication of CN109524014A
Current legal status: Pending


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/18 - Artificial neural networks; Connectionist approaches
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/02 - Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/04 - Training, enrolment or model building

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Complex Calculations (AREA)

Abstract

The present invention discloses a voiceprint recognition analysis method based on deep convolutional neural networks, comprising: Step 1: collecting the voice signal of a known speaker, generating a grayscale spectrogram after preprocessing the voice signal, and extracting characteristic parameters from the grayscale spectrogram; Step 2: building a deep convolutional neural network on the characteristic parameters of the grayscale spectrogram and training it; Step 3: collecting a voice signal to be identified, obtaining the characteristic parameters of its grayscale spectrogram according to Step 1, and identifying the speaker of the voice signal to be identified using the trained convolutional neural network. The method extracts the characteristic parameters of the voice signal and, through training and recognition with a deep convolutional neural network, correctly identifies the speaker's identity, achieving good results and effectively improving the accuracy and efficiency of voiceprint recognition.

Description

A voiceprint recognition analysis method based on deep convolutional neural networks
Technical field
The present invention relates to the field of artificial intelligence, and more particularly to a voiceprint recognition analysis method based on deep convolutional neural networks.
Background technique
With the improvement of scientific and technological levels and the rapid development of artificial intelligence, the importance of voiceprint recognition in numerous fields is increasingly prominent. For example, in finance, telephone speech authentication is used to confirm user identity; in security, the voiceprint serves as authorization information for entering and leaving important confidential premises; in police and judicial work, the voiceprint serves as an effective auxiliary means of judging the identity of criminal suspects; in the military, the voiceprint is used to verify the identity of personnel; and in medical applications, the voiceprint is used in the diagnosis of certain related diseases. Voiceprint signals are extremely convenient to collect and are present everywhere in daily life, so researching high-performance voiceprint recognition systems has important practical value. For this reason, in order to promote the accuracy and efficiency of voiceprint recognition, designing a voiceprint recognition analysis method based on machine learning is essential.
Summary of the invention
The present invention designs and develops a voiceprint recognition analysis method based on deep convolutional neural networks, which extracts the characteristic parameters of the voice signal and, through training and recognition with a deep convolutional neural network, correctly identifies the speaker's identity, effectively improving the accuracy and efficiency of voiceprint recognition.
The technical solution provided by the invention is as follows:
A voiceprint recognition analysis method based on deep convolutional neural networks, comprising the following steps:
Step 1: collect the voice signal of a known speaker, generate a grayscale spectrogram after preprocessing the voice signal, and extract characteristic parameters from the grayscale spectrogram;
Step 2: build a deep convolutional neural network on the characteristic parameters of the grayscale spectrogram and train it; the network comprises 5 hidden layers (3 convolutional layers and 2 downsampling layers), with convolution and downsampling alternating:
a first convolutional layer, composed of 8 feature maps, using a 5 × 5 convolution kernel, the convolution computed without zero-padding at the edges;
a first downsampling layer, composed of 8 feature maps, using a 2 × 2 kernel to realize downsampling and local averaging;
a second convolutional layer, composed of 20 feature maps, using a 5 × 5 convolution kernel, each feature map consisting of 10 × 10 neurons;
a second downsampling layer, composed of 20 feature maps, using a 2 × 2 kernel;
a third convolutional layer, which applies a 5 × 5 convolution kernel to each feature map and pulls the resulting feature maps into a vector;
Step 3: collect a voice signal to be identified, obtain the characteristic parameters of its grayscale spectrogram according to Step 1, and identify the speaker of the voice signal to be identified using the trained convolutional neural network.
Preferably, in Step 1, the preprocessing of the voice signal includes sampling and quantization, pre-emphasis, framing and windowing, and endpoint detection.
Preferably, the sampling and quantization of the voice signal include: digitizing the voice signal at a sampling rate of 8 kHz, with each sample represented by 8 bits.
Preferably, the pre-emphasis of the voice signal includes:
passing the digitized voice signal, after sampling and quantization, through a first-order high-pass filter to apply pre-emphasis and highlight the high-frequency part; the transfer function of the first-order high-pass filter is:
H(z) = 1 − 0.9375z⁻¹
where z is the complex variable of the z-transform of the voice signal.
Preferably, the framing and windowing of the voice signal include:
splitting the continuous voice signal into frames of 10–30 ms;
windowing each frame with a Hamming window, whose window function is:
W(n) = 0.54 − 0.46cos(2πn/(N − 1)), 0 ≤ n ≤ N − 1
where W(n) is the value of the Hamming window at sample n and N is the number of samples per frame.
Preferably, the endpoint detection of the voice signal includes:
rejecting the silent segments in the voice signal using the short-time energy method and the short-time zero-crossing rate method.
Preferably, in Step 1, the generation of the grayscale spectrogram includes:
decomposing each frame of the voice signal into an amplitude spectrum by discrete Fourier transform:
X(n, k) = Σ_{p=0}^{M−1} x_p(n) e^{−j2πkp/M}, k = 0, 1, …, M − 1
where M is the number of samples per frame, X(n, k) is the sequence obtained from the Fourier transform of the n-th frame voice signal, k is the Fourier transform index, j is the imaginary unit, e is the base of the natural logarithm, and x_p(n) is the signal at the p-th sample of the n-th frame;
obtaining the energy density spectrum of the complex sequence produced by the Fourier transform of each frame:
E(n, k) = |X(n, k)|² = X_R(n, k)² + X_I(n, k)²
where E(n, k) is the energy density spectrum of the complex sequence obtained from the Fourier transform of the n-th frame voice signal, and X_R(n, k) and X_I(n, k) are the real and imaginary parts of that sequence;
taking the logarithm of the energy spectral density:
10log₁₀E(n, k) = 10log₁₀|X(n, k)|² = 20log₁₀|X(n, k)|;
mapping the log-form energy spectral density to pixel values Q(n, m) between 0 and 255 to obtain the grayscale spectrogram:
Q(n, m) = 255 × (T(n, m) − T_min(n, m)) / (T_max(n, m) − T_min(n, m))
where T(n, m) is the m-th log-form energy spectral density value of the n-th frame voice signal, and T_max(n, m) and T_min(n, m) are the maximum and minimum among the m log-form energy spectral density values of the n-th frame.
Preferably, Mel-frequency cepstral coefficient (MFCC) parameters are used as the characteristic parameters of the grayscale spectrogram, and obtaining the MFCC parameters includes:
applying a discrete cosine transform after taking the logarithm of the energy spectral density and discarding the DC component; the remaining coefficients are the MFCC parameters.
Preferably, Step 3 includes:
initializing the decision values A_1, A_2, …, A_ω, …, A_S of the voice signals corresponding to the S speakers, such that A_1 = A_2 = … = A_ω = … = A_S = 0;
obtaining the feature set of the grayscale spectrogram of the voice signal to be identified according to Step 1 and inputting the features one by one into the trained convolutional neural network, and whenever a grayscale spectrogram feature of the voice signal under test is recognized as belonging to the voice of the ω-th speaker, setting A_ω = A_ω + 1;
outputting the speaker whose voice signal corresponds to the maximum decision value max(A_1, A_2, …, A_ω, …, A_S).
Preferably, the continuous voice signal is split into frames with a frame length of 10 ms.
Beneficial effects of the present invention:
The voiceprint recognition analysis method based on deep convolutional neural networks provided by the invention extracts the characteristic parameters of the voice signal and, through training and recognition with a deep convolutional neural network, correctly identifies the speaker's identity, achieving good results and effectively improving the accuracy and efficiency of voiceprint recognition.
Description of the drawings
Fig. 1 is a schematic diagram of the voiceprint recognition analysis framework based on deep convolutional neural networks according to the present invention.
Fig. 2 is a flow chart of the complete voiceprint recognition algorithm based on deep convolutional neural networks according to the present invention.
Fig. 3 is an overall structure diagram of the recognition model according to the present invention.
Specific embodiment
The present invention is described in further detail below with reference to the accompanying drawings, so that those skilled in the art can implement it by referring to the specification.
Referring to Fig. 1, a schematic diagram of the voiceprint recognition analysis framework based on deep convolutional neural networks. The raw speech of the speaker is taken as input and preprocessed. For the preprocessed voice information, a Fourier transform is applied to each frame, the energy spectral density of each frame is calculated and its logarithm taken, and the log-form energy spectral density is mapped to a grayscale spectrogram. Features are then extracted from the voiceprint features in the spectrogram, a deep convolutional neural network (CNN) is built and trained for classification on the characteristic parameters of the training samples, and finally the test samples are recognized by template matching to obtain the recognition analysis result.
Referring to Fig. 2, the flow chart of the complete voiceprint recognition analysis algorithm of the present invention.
The voice signal preprocessing proceeds as follows:
To compensate for the attenuation introduced by the sound acquisition process and its influence on the voice signal, the high-frequency part of the signal is emphasized before processing, while the influence of noise is reduced; this flattens the spectrum of the voice signal and improves the signal-to-noise ratio. The digitized voice signal after sampling is passed through a first-order high-pass filter with transfer function
H(z) = 1 − 0.9375z⁻¹
to realize pre-emphasis.
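As a concrete illustration, here is a minimal numpy sketch of this pre-emphasis step. Only the coefficient 0.9375 comes from the transfer function above; the function name and the 8 kHz test tone are illustrative.

```python
import numpy as np

def pre_emphasis(signal: np.ndarray, alpha: float = 0.9375) -> np.ndarray:
    """First-order high-pass filter y[p] = x[p] - alpha * x[p-1],
    i.e. H(z) = 1 - alpha * z^-1, which boosts the high-frequency part."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

# Example: one second of a 440 Hz tone sampled at 8 kHz (the rate used here)
fs = 8000
t = np.arange(fs) / fs
y = pre_emphasis(np.sin(2 * np.pi * 440 * t))
```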
The system's voice sampling frequency is 8 kHz, and accordingly a frame length of 10 ms is used when extracting the Mel-frequency cepstral coefficient (MFCC) parameters. To reduce the prediction error at both ends of the signal and avoid spectral leakage, each frame of the voice signal is windowed with a Hamming window:
W(n) = 0.54 − 0.46cos(2πn/(N − 1)), 0 ≤ n ≤ N − 1
where W(n) is the value of the Hamming window at sample n and N is the number of samples per frame.
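A minimal sketch of this framing and windowing step follows. The 10 ms frame length and the Hamming coefficients come from the text; non-overlapping frames are an assumption, since the patent does not state a frame shift.

```python
import numpy as np

def frame_and_window(signal: np.ndarray, fs: int = 8000,
                     frame_ms: float = 10.0) -> np.ndarray:
    """Split the signal into frames of `frame_ms` milliseconds and multiply
    each frame by the Hamming window W(n) = 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    N = int(fs * frame_ms / 1000)                 # samples per frame (80 at 8 kHz)
    n_frames = len(signal) // N
    frames = signal[:n_frames * N].reshape(n_frames, N)
    window = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(N) / (N - 1))
    return frames * window                        # window applied per frame by broadcasting
```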
Besides valid speech segments, a speaker's voice usually also contains silent segments, whose presence reduces the accuracy and efficiency of voiceprint recognition. Endpoint detection is used to eliminate the silent segments; this method removes them by combining short-time energy and the short-time zero-crossing rate.
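This double-threshold idea can be sketched as below; the thresholds (fractions of the per-utterance maxima) are illustrative assumptions, not values from the patent.

```python
import numpy as np

def remove_silence(frames: np.ndarray, energy_frac: float = 0.1,
                   zcr_frac: float = 0.5) -> np.ndarray:
    """Keep frames judged as speech by short-time energy (voiced sounds)
    or short-time zero-crossing rate (unvoiced sounds)."""
    energy = np.sum(frames ** 2, axis=1)                       # short-time energy per frame
    signs = np.sign(frames)
    zcr = np.mean(np.abs(np.diff(signs, axis=1)) > 0, axis=1)  # zero-crossing rate per frame
    speech = (energy > energy_frac * energy.max()) | (zcr > zcr_frac * zcr.max())
    return frames[speech]
```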
After the voice signal has gone through sampling and quantization, pre-emphasis, framing and windowing, and endpoint detection, the spectrogram can be generated.
The spectrogram generation method specifically includes:
decomposing each frame into an amplitude spectrum by discrete Fourier transform; the Fourier transform of the n-th frame is:
X(n, k) = Σ_{p=0}^{M−1} x_p(n) e^{−j2πkp/M}, k = 0, 1, …, M − 1
where M is the number of samples per frame, X(n, k) is the sequence obtained from the Fourier transform of the n-th frame voice signal, k is the Fourier transform index, j is the imaginary unit, e is the base of the natural logarithm, and x_p(n) is the signal at the p-th sample of the n-th frame.
Next, the energy density spectrum of the complex sequence X(n, k), k = 0, 1, …, M − 1, obtained from the M-point Fourier transform of each frame, is calculated as
E(n, k) = |X(n, k)|² = X_R(n, k)² + X_I(n, k)²
where E(n, k) is the energy density spectrum of the complex sequence obtained from the Fourier transform of the n-th frame voice signal, and X_R(n, k) and X_I(n, k) are the real and imaginary parts of that sequence.
The logarithm of the energy spectral density is then taken, converting it to decibel form:
10log₁₀E(n, k) = 10log₁₀|X(n, k)|² = 20log₁₀|X(n, k)|
After the above steps, suppose the n-th frame of the voice signal yields m values, denoted T(n, m), each of which is a log-form energy spectral density. Let the maximum of T(n, m) be T_max(n, m) and the minimum be T_min(n, m); the log-form energy spectral density can then be mapped to pixel values Q(n, m) between 0 and 255 by
Q(n, m) = 255 × (T(n, m) − T_min(n, m)) / (T_max(n, m) − T_min(n, m)).
Finally, with n as abscissa, m as ordinate, and Q(n, m) as pixel value, the generated two-dimensional image is the grayscale spectrogram.
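Putting the DFT, energy density, logarithm, and pixel mapping together, a compact numpy sketch of the spectrogram computation might look as follows; the small epsilon guarding log(0) is an implementation assumption.

```python
import numpy as np

def grayscale_spectrogram(frames: np.ndarray) -> np.ndarray:
    """M-point DFT per frame, energy density E = |X|^2, decibel form,
    then a per-frame linear map onto pixel values 0-255."""
    X = np.fft.fft(frames, axis=1)               # X(n, k), k = 0, ..., M-1
    E = np.abs(X) ** 2                           # energy density spectrum |X(n, k)|^2
    T = 10 * np.log10(E + 1e-12)                 # log (dB) form; epsilon avoids log(0)
    T_min = T.min(axis=1, keepdims=True)         # per-frame minimum, as in the text
    T_max = T.max(axis=1, keepdims=True)
    Q = 255 * (T - T_min) / (T_max - T_min)      # pixel values Q(n, m) in [0, 255]
    return Q.astype(np.uint8)                    # row n = frame, column m = frequency bin
```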
After preprocessing the voice signal and generating the spectrogram, the characteristic parameters are extracted. MFCC parameters are used as the characteristic parameters for voiceprint recognition, and the MFCC calculation proceeds as follows:
compute the discrete cosine transform of the log energy spectral density to obtain D(n, m), then discard its DC component; the remaining coefficients are the MFCC parameters.
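A sketch of this cepstral step under the recipe given in the text (DCT of the log energy spectrum, DC component discarded): note that a textbook MFCC pipeline would insert a Mel filterbank before the DCT, which the text omits, and the number of retained coefficients is an illustrative choice.

```python
import numpy as np
from scipy.fftpack import dct

def cepstral_parameters(log_energy: np.ndarray, n_coeffs: int = 12) -> np.ndarray:
    """DCT of the per-frame log energy spectral density -> D(n, m);
    dropping the DC term D(n, 0) leaves the cepstral feature parameters."""
    D = dct(log_energy, type=2, axis=1, norm='ortho')
    return D[:, 1:n_coeffs + 1]                  # discard DC, keep the next n_coeffs terms
```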
Next, the deep convolutional neural network (CNN) model, the core of this method, is built. The CNN structure is designed as follows: the network contains 5 hidden layers, namely 3 convolutional layers and 2 downsampling layers, plus 1 fully connected layer. Convolution and downsampling alternate; the specific computation proceeds as described below:
the first hidden layer performs convolution; it is designed with 8 feature maps, each using a 5 × 5 convolution kernel, so the feature maps obtained after mapping are 28 × 28; the convolution is computed without zero-padding at the edges;
the second hidden layer realizes downsampling and local averaging; it likewise consists of 8 feature maps, downsampling with a 2 × 2 kernel, so each resulting feature map is 14 × 14; the number of feature maps does not change in a downsampling layer, only the size of each feature shrinks, which is a form of dimensionality reduction;
the third hidden layer performs a second convolution; it consists of 20 feature maps, again with 5 × 5 convolution kernels, each feature map consisting of 10 × 10 neurons; the convolution operation is the same as in the first convolutional layer, except that during feature mapping one feature map may be connected to multiple feature maps of the previous layer;
the fourth hidden layer performs a second downsampling; it consists of 20 feature maps with a 2 × 2 downsampling template, yielding 5 × 5 maps;
the fifth hidden layer pulls the resulting feature maps into a vector, applying 5 × 5 convolutions to each feature map, which yields a 120-dimensional output vector;
the last layer of the network is a fully connected layer, whose output vector is obtained by backpropagation training.
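The layer sizes above (28 × 28 after the first convolution, 10 × 10 after the second, a 120-dimensional vector after the third) imply a 32 × 32 single-channel input, so the structure can be sketched in PyTorch as below. The ReLU activations are an assumption; the patent does not name the nonlinearity.

```python
import torch
import torch.nn as nn

class VoiceprintCNN(nn.Module):
    """Three 5x5 convolutions (8, 20, 120 feature maps, no zero padding)
    alternating with two 2x2 average-pooling (downsampling / local
    averaging) layers, then one fully connected output layer."""
    def __init__(self, n_speakers: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=5),     # 1x32x32 -> 8x28x28
            nn.ReLU(),
            nn.AvgPool2d(2),                    # -> 8x14x14
            nn.Conv2d(8, 20, kernel_size=5),    # -> 20x10x10
            nn.ReLU(),
            nn.AvgPool2d(2),                    # -> 20x5x5
            nn.Conv2d(20, 120, kernel_size=5),  # -> 120x1x1 ("pulled into a vector")
            nn.ReLU(),
        )
        self.classifier = nn.Linear(120, n_speakers)  # Softmax is applied inside the loss

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))
```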
Next, classification training is performed on the characteristic parameters of the training samples. The purpose of training is to obtain the connection weights between neurons and the neuron biases in the network model; these values constitute the model library. Training uses a supervised learning algorithm: the training-sample characteristic parameters are labeled before training, the CNN model labels the characteristic parameters of the i-th speaker with the value i − 1, and all characteristic parameters of a given speaker share the same label value, which represents that speaker's ID. The training model selects the cross-entropy function as the cost function, and the output layer uses a Softmax classifier. Gradients are computed with the backpropagation (BP) algorithm, and the CNN model can be trained with any general gradient-based technique.
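A sketch of this training stage under the stated choices (cross-entropy cost, Softmax output, gradients by backpropagation); Adam stands in for the unspecified "general gradient-based technique", and the loader is assumed to yield (spectrogram batch, speaker-ID batch) pairs.

```python
import torch
import torch.nn as nn

def train(model: nn.Module, loader, n_epochs: int = 10, lr: float = 1e-3) -> None:
    """Supervised training: labels are speaker IDs (i - 1 for the i-th
    speaker), cross-entropy cost, gradients computed by backpropagation."""
    criterion = nn.CrossEntropyLoss()            # cross entropy with built-in LogSoftmax
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(n_epochs):
        for spectrograms, speaker_ids in loader:
            optimizer.zero_grad()
            loss = criterion(model(spectrograms), speaker_ids)
            loss.backward()                      # BP: gradients of the cost
            optimizer.step()                     # gradient-based weight update
```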
Referring to Fig. 3, the overall structure diagram of the recognition model. Recognition with the network model corresponds to the recognition stage of voiceprint recognition:
initialize the decision values A_1, A_2, …, A_ω, …, A_S of the voice signals corresponding to the S speakers, such that A_1 = A_2 = … = A_ω = … = A_S = 0;
obtain the feature set of the grayscale spectrogram of the voice signal to be identified according to Step 1 and input the features one by one into the trained convolutional neural network, and whenever a grayscale spectrogram feature of the voice signal under test is recognized as belonging to the voice of the ω-th speaker, set A_ω = A_ω + 1;
output the speaker whose voice signal corresponds to the maximum decision value max(A_1, A_2, …, A_ω, …, A_S).
The specific recognition analysis process is as follows: suppose the speech of the speaker under test yields N characteristic-parameter spectrograms after the spectrogram generation process above. These N spectrograms are then fed one by one into the CNN model; before each spectrogram is input into the network, the same preprocessing steps as in the training stage are applied. The CNN model then outputs the speaker for each characteristic-parameter spectrogram. The N spectrograms thus correspond to N speaker votes, and the speaker appearing most often is identified as the speaker of the tested speech, achieving the purpose of voiceprint recognition.
To illustrate: suppose the database contains speakers A, B, C, D, E, F and G. During training, the characteristic parameters obtained after preprocessing the speech of A, B, C, D, E, F and G are used as the training set, with speakers A, B, C, D, E, F and G as the output vector, yielding the deep convolutional neural network model. The speech of the speaker under test is collected and, after the spectrogram generation process above, yields N characteristic-parameter spectrograms. These N spectrograms are fed one by one into the CNN model (before each spectrogram is input into the network, the same preprocessing steps as in the training stage are applied), then a vote is taken, finally giving one group of results. The voting process is as follows:
initialize the decision values of samples A, B, C, D, E, F and G: A = B = C = D = E = F = G = 0;
input the characteristic parameters of the voice signal under test into the CNN model; if the voice signal under test belongs to speaker A, then A = A + 1; if it belongs to speaker B, then B = B + 1;
and so on;
if the voice signal under test belongs to speaker G, then G = G + 1;
the final output is max(A, B, C, D, E, F, G), i.e. the speaker with the most votes is the speaker of the voice signal under test.
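The voting stage thus reduces to a majority vote over per-spectrogram predictions, as in this sketch (the spectrograms are assumed batched as an N x 1 x 32 x 32 tensor, matching the model sketch above):

```python
import torch

def identify_speaker(model: torch.nn.Module, spectrograms: torch.Tensor) -> int:
    """Each of the N spectrograms casts one vote A_w <- A_w + 1 for its
    predicted speaker w; the speaker with max(A_1, ..., A_S) is returned."""
    model.eval()
    with torch.no_grad():
        votes = model(spectrograms).argmax(dim=1)   # one speaker vote per spectrogram
    return int(torch.bincount(votes).argmax())      # most frequent speaker ID wins
```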
The voiceprint recognition analysis method based on deep convolutional neural networks provided by the invention extracts the characteristic parameters of the voice signal and, through training and recognition with a deep convolutional neural network, correctly identifies the speaker's identity, achieving good results and effectively improving the accuracy and efficiency of voiceprint recognition.
Although the embodiments of the present invention have been disclosed above, they are not limited to the applications listed in the description and the implementations; the invention can be applied to various fields suitable for it. Those skilled in the art can easily realize further modifications, so without departing from the general concept defined by the claims and their equivalent scope, the invention is not limited to the specific details and illustrations shown and described herein.

Claims (10)

1. A voiceprint recognition analysis method based on deep convolutional neural networks, characterized by comprising the following steps:
Step 1: collect the voice signal of a known speaker, generate a grayscale spectrogram after preprocessing the voice signal, and extract characteristic parameters from the grayscale spectrogram;
Step 2: build a deep convolutional neural network on the characteristic parameters of the grayscale spectrogram and train it; the network comprises 5 hidden layers (3 convolutional layers and 2 downsampling layers), with convolution and downsampling alternating:
a first convolutional layer, composed of 8 feature maps, using a 5 × 5 convolution kernel, the convolution computed without zero-padding at the edges;
a first downsampling layer, composed of 8 feature maps, using a 2 × 2 kernel to realize downsampling and local averaging;
a second convolutional layer, composed of 20 feature maps, using a 5 × 5 convolution kernel, each feature map consisting of 10 × 10 neurons;
a second downsampling layer, composed of 20 feature maps, using a 2 × 2 kernel;
a third convolutional layer, which applies a 5 × 5 convolution kernel to each feature map and pulls the resulting feature maps into a vector;
Step 3: collect a voice signal to be identified, obtain the characteristic parameters of its grayscale spectrogram according to Step 1, and identify the speaker of the voice signal to be identified using the trained convolutional neural network.
2. The voiceprint recognition analysis method based on deep convolutional neural networks according to claim 1, characterized in that, in Step 1, the preprocessing of the voice signal includes sampling and quantization, pre-emphasis, framing and windowing, and endpoint detection.
3. The voiceprint recognition analysis method based on deep convolutional neural networks according to claim 2, characterized in that the sampling and quantization of the voice signal include: digitizing the voice signal at a sampling rate of 8 kHz, with each sample represented by 8 bits.
4. The voiceprint recognition analysis method based on deep convolutional neural networks according to claim 3, characterized in that the pre-emphasis of the voice signal includes:
passing the digitized voice signal, after sampling and quantization, through a first-order high-pass filter to apply pre-emphasis and highlight the high-frequency part, the transfer function of the first-order high-pass filter being:
H(z) = 1 − 0.9375z⁻¹
where z is the complex variable of the z-transform of the voice signal.
5. The voiceprint recognition analysis method based on deep convolutional neural networks according to claim 4, characterized in that the framing and windowing of the voice signal include:
splitting the continuous voice signal into frames with a frame length of 10–30 ms;
windowing each frame with a Hamming window, whose window function is:
W(n) = 0.54 − 0.46cos(2πn/(N − 1)), 0 ≤ n ≤ N − 1
where W(n) is the value of the Hamming window at sample n and N is the number of samples per frame.
6. The voiceprint recognition analysis method based on deep convolutional neural networks according to claim 5, characterized in that the endpoint detection of the voice signal includes:
rejecting the silent segments in the voice signal using the short-time energy method and the short-time zero-crossing rate method.
7. The voiceprint recognition analysis method based on deep convolutional neural networks according to any one of claims 2-6, characterized in that, in Step 1, the generation of the grayscale spectrogram includes:
decomposing each frame of the voice signal into an amplitude spectrum by discrete Fourier transform:
X(n, k) = Σ_{p=0}^{M−1} x_p(n) e^{−j2πkp/M}, k = 0, 1, …, M − 1
where M is the number of samples per frame, X(n, k) is the sequence obtained from the Fourier transform of the n-th frame voice signal, k is the Fourier transform index, j is the imaginary unit, e is the base of the natural logarithm, and x_p(n) is the signal at the p-th sample of the n-th frame;
obtaining the energy density spectrum of the complex sequence produced by the Fourier transform of each frame:
E(n, k) = |X(n, k)|² = X_R(n, k)² + X_I(n, k)²
where E(n, k) is the energy density spectrum of the complex sequence obtained from the Fourier transform of the n-th frame voice signal, and X_R(n, k) and X_I(n, k) are the real and imaginary parts of that sequence;
taking the logarithm of the energy spectral density:
10log₁₀E(n, k) = 10log₁₀|X(n, k)|² = 20log₁₀|X(n, k)|;
mapping the log-form energy spectral density to pixel values Q(n, m) between 0 and 255 to obtain the grayscale spectrogram:
Q(n, m) = 255 × (T(n, m) − T_min(n, m)) / (T_max(n, m) − T_min(n, m))
where T(n, m) is the m-th log-form energy spectral density value of the n-th frame voice signal, and T_max(n, m) and T_min(n, m) are the maximum and minimum among the m log-form energy spectral density values of the n-th frame.
8. The voiceprint recognition analysis method based on deep convolutional neural networks according to claim 7, characterized in that Mel-frequency cepstral coefficient (MFCC) parameters are used as the characteristic parameters of the grayscale spectrogram, and obtaining the MFCC parameters includes:
applying a discrete cosine transform after taking the logarithm of the energy spectral density and discarding the DC component, the remaining coefficients being the MFCC parameters.
9. The voiceprint recognition analysis method based on deep convolutional neural networks according to claim 8, characterized in that Step 3 includes:
initializing the decision values A_1, A_2, …, A_ω, …, A_S of the voice signals corresponding to the S speakers, such that A_1 = A_2 = … = A_ω = … = A_S = 0;
obtaining the feature set of the grayscale spectrogram of the voice signal to be identified according to Step 1 and inputting the features one by one into the trained convolutional neural network, and whenever a grayscale spectrogram feature of the voice signal under test is recognized as belonging to the voice of the ω-th speaker, setting A_ω = A_ω + 1;
outputting the speaker whose voice signal corresponds to the maximum decision value max(A_1, A_2, …, A_ω, …, A_S).
10. The voiceprint recognition analysis method based on deep convolutional neural networks according to claim 5, characterized in that the continuous voice signal is split into frames with a frame length of 10 ms.
CN201811439719.XA 2018-11-29 2018-11-29 A voiceprint recognition analysis method based on deep convolutional neural networks Pending CN109524014A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811439719.XA CN109524014A (en) 2018-11-29 2018-11-29 A voiceprint recognition analysis method based on deep convolutional neural networks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811439719.XA CN109524014A (en) 2018-11-29 2018-11-29 A voiceprint recognition analysis method based on deep convolutional neural networks

Publications (1)

Publication Number Publication Date
CN109524014A true CN109524014A (en) 2019-03-26

Family

ID=65793759

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811439719.XA Pending CN109524014A (en) A voiceprint recognition analysis method based on deep convolutional neural networks

Country Status (1)

Country Link
CN (1) CN109524014A (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160099010A1 (en) * 2014-10-03 2016-04-07 Google Inc. Convolutional, long short-term memory, fully connected deep neural networks
WO2018053518A1 (en) * 2016-09-19 2018-03-22 Pindrop Security, Inc. Channel-compensated low-level features for speaker recognition
CN106952649A (en) * 2017-05-14 2017-07-14 北京工业大学 Method for distinguishing speek person based on convolutional neural networks and spectrogram
CN108597539A (en) * 2018-02-09 2018-09-28 桂林电子科技大学 Speech-emotion recognition method based on parameter migration and sound spectrograph
CN108831485A (en) * 2018-06-11 2018-11-16 东北师范大学 Method for distinguishing speek person based on sound spectrograph statistical nature

Cited By (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110265035B (en) * 2019-04-25 2021-08-06 武汉大晟极科技有限公司 Speaker recognition method based on deep learning
CN110265035A (en) * 2019-04-25 2019-09-20 武汉大晟极科技有限公司 A kind of method for distinguishing speek person based on deep learning
CN111951809A (en) * 2019-05-14 2020-11-17 深圳子丸科技有限公司 Multi-person voiceprint identification method and system
CN110277100A (en) * 2019-06-19 2019-09-24 南京邮电大学 Based on the improved method for recognizing sound-groove of Alexnet, storage medium and terminal
CN110349588A (en) * 2019-07-16 2019-10-18 重庆理工大学 A kind of LSTM network method for recognizing sound-groove of word-based insertion
CN110534118A (en) * 2019-07-29 2019-12-03 安徽继远软件有限公司 Transformer/reactor method for diagnosing faults based on Application on Voiceprint Recognition and neural network
CN110534118B (en) * 2019-07-29 2021-10-08 安徽继远软件有限公司 Transformer/reactor fault diagnosis method based on voiceprint recognition and neural network
CN111124108A (en) * 2019-11-22 2020-05-08 Oppo广东移动通信有限公司 Model training method, gesture control method, device, medium and electronic equipment
CN111048097A (en) * 2019-12-19 2020-04-21 中国人民解放军空军研究院通信与导航研究所 Twin network voiceprint recognition method based on 3D convolution
WO2021127990A1 (en) * 2019-12-24 2021-07-01 广州国音智能科技有限公司 Voiceprint recognition method based on voice noise reduction and related apparatus
CN111210835A (en) * 2020-01-08 2020-05-29 华南理工大学 Multi-channel voice noise reduction method based on auditory model and information source direction
CN111210835B (en) * 2020-01-08 2023-07-18 华南理工大学 Multichannel voice noise reduction method based on auditory model and information source direction
CN111210807B (en) * 2020-02-21 2023-03-31 厦门快商通科技股份有限公司 Speech recognition model training method, system, mobile terminal and storage medium
CN111210807A (en) * 2020-02-21 2020-05-29 厦门快商通科技股份有限公司 Speech recognition model training method, system, mobile terminal and storage medium
CN111429921A (en) * 2020-03-02 2020-07-17 厦门快商通科技股份有限公司 Voiceprint recognition method, system, mobile terminal and storage medium
CN111429921B (en) * 2020-03-02 2023-01-03 厦门快商通科技股份有限公司 Voiceprint recognition method, system, mobile terminal and storage medium
CN113470653A (en) * 2020-03-31 2021-10-01 华为技术有限公司 Voiceprint recognition method, electronic equipment and system
CN111524525B (en) * 2020-04-28 2023-06-16 平安科技(深圳)有限公司 Voiceprint recognition method, device, equipment and storage medium of original voice
CN111524525A (en) * 2020-04-28 2020-08-11 平安科技(深圳)有限公司 Original voice voiceprint recognition method, device, equipment and storage medium
CN111862989B (en) * 2020-06-01 2024-03-08 北京捷通华声科技股份有限公司 Acoustic feature processing method and device
CN111862989A (en) * 2020-06-01 2020-10-30 北京捷通华声科技股份有限公司 Acoustic feature processing method and device
CN112053694A (en) * 2020-07-23 2020-12-08 哈尔滨理工大学 Voiceprint recognition method based on CNN and GRU network fusion
CN112201258A (en) * 2020-10-15 2021-01-08 杭州电子科技大学 AMBP-based noise robustness camouflage voice detection method
CN112712814A (en) * 2020-12-04 2021-04-27 中国南方电网有限责任公司 Voiceprint recognition method based on deep learning algorithm
CN112614492A (en) * 2020-12-09 2021-04-06 通号智慧城市研究设计院有限公司 Voiceprint recognition method, system and storage medium based on time-space information fusion
CN112767949A (en) * 2021-01-18 2021-05-07 东南大学 Voiceprint recognition system based on binary weight convolutional neural network
CN112883562A (en) * 2021-02-01 2021-06-01 上海交通大学三亚崖州湾深海科技研究院 Method for repairing ocean platform actual measurement stress spectrum based on neural network algorithm
CN112883562B (en) * 2021-02-01 2023-02-24 上海交通大学三亚崖州湾深海科技研究院 Ocean platform actual measurement stress spectrum repairing method based on neural network algorithm
CN112786059A (en) * 2021-03-11 2021-05-11 合肥市清大创新研究院有限公司 Voiceprint feature extraction method and device based on artificial intelligence
CN113129897A (en) * 2021-04-08 2021-07-16 杭州电子科技大学 Voiceprint recognition method based on attention mechanism recurrent neural network
CN113129897B (en) * 2021-04-08 2024-02-20 杭州电子科技大学 Voiceprint recognition method based on attention mechanism cyclic neural network
WO2023036016A1 (en) * 2021-09-07 2023-03-16 广西电网有限责任公司贺州供电局 Voiceprint recognition method and system applied to electric power operation
CN113823291A (en) * 2021-09-07 2021-12-21 广西电网有限责任公司贺州供电局 Voiceprint recognition method and system applied to power operation
CN114333850A (en) * 2022-03-15 2022-04-12 清华大学 Voice voiceprint visualization method and device
CN114598565A (en) * 2022-05-10 2022-06-07 深圳市发掘科技有限公司 Kitchen electrical equipment remote control system and method and computer equipment
CN115206335A (en) * 2022-09-15 2022-10-18 北京中环高科环境治理有限公司 Noise monitoring method for automatic sample retention and evidence collection
CN115206335B (en) * 2022-09-15 2022-12-02 北京中环高科环境治理有限公司 Noise monitoring method for automatic sample retention and evidence collection

Similar Documents

Publication Publication Date Title
CN109524014A (en) A kind of Application on Voiceprint Recognition analysis method based on depth convolutional neural networks
CN102509547B (en) Method and system for voiceprint recognition based on vector quantization based
Kumar et al. Design of an automatic speaker recognition system using MFCC, vector quantization and LBG algorithm
CN106952649A (en) Method for distinguishing speek person based on convolutional neural networks and spectrogram
CN108831485A (en) Method for distinguishing speek person based on sound spectrograph statistical nature
CN107610707A (en) A kind of method for recognizing sound-groove and device
CN107393554A (en) In a kind of sound scene classification merge class between standard deviation feature extracting method
CN111128209B (en) Speech enhancement method based on mixed masking learning target
CN109559736A (en) A kind of film performer's automatic dubbing method based on confrontation network
CN109256138A (en) Auth method, terminal device and computer readable storage medium
CN112053694A (en) Voiceprint recognition method based on CNN and GRU network fusion
CN113658583B (en) Ear voice conversion method, system and device based on generation countermeasure network
Shi et al. End-to-End Monaural Speech Separation with Multi-Scale Dynamic Weighted Gated Dilated Convolutional Pyramid Network.
CN114783418B (en) End-to-end voice recognition method and system based on sparse self-attention mechanism
CN110782902A (en) Audio data determination method, apparatus, device and medium
CN102496366B (en) Speaker identification method irrelevant with text
Jing et al. Speaker recognition based on principal component analysis of LPCC and MFCC
Xue et al. Cross-modal information fusion for voice spoofing detection
Zheng et al. MSRANet: Learning discriminative embeddings for speaker verification via channel and spatial attention mechanism in alterable scenarios
CN111785262B (en) Speaker age and gender classification method based on residual error network and fusion characteristics
Wu et al. A Characteristic of Speaker's Audio in the Model Space Based on Adaptive Frequency Scaling
CN110136741A (en) A kind of single-channel voice Enhancement Method based on multiple dimensioned context
CN115064175A (en) Speaker recognition method
Chelali et al. MFCC and vector quantization for Arabic fricatives speech/speaker recognition
Lavania et al. Reviewing Human-Machine Interaction through Speech Recognition approaches and Analyzing an approach for Designing an Efficient System

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190326)