CN109524014A - Voiceprint recognition and analysis method based on a deep convolutional neural network - Google Patents
Voiceprint recognition and analysis method based on a deep convolutional neural network
- Publication number
- CN109524014A (application number CN201811439719.XA)
- Authority
- CN
- China
- Prior art keywords
- voice signal
- neural networks
- convolutional neural
- frame
- voiceprint recognition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/18—Artificial neural networks; Connectionist approaches
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
- G10L17/04—Training, enrolment or model building
Abstract
The present invention discloses a voiceprint recognition and analysis method based on a deep convolutional neural network, comprising: Step 1: collecting the voice signal of a known speaker, pre-processing the voice signal to generate a grayscale spectrogram, and extracting characteristic parameters from the grayscale spectrogram; Step 2: building a deep convolutional neural network on the characteristic parameters of the grayscale spectrogram and training it; Step 3: collecting a voice signal to be identified, obtaining the characteristic parameters of its grayscale spectrogram according to Step 1, and identifying the speaker of the voice signal using the trained convolutional neural network. The method extracts the characteristic parameters of the voice signal and, through training and recognition with the deep convolutional neural network, correctly identifies the speaker, achieving good results and effectively improving the accuracy and efficiency of voiceprint recognition.
Description
Technical field
The present invention relates to the field of artificial intelligence, and more particularly to a voiceprint recognition and analysis method based on a deep convolutional neural network.
Background art
With the advance of science and technology and the rapid development of artificial intelligence, voiceprint recognition has become increasingly important in numerous fields. For example, in finance, telephone speech authentication is used to verify a user's identity; in security, the voiceprint serves as authorization for entering and leaving sensitive facilities; in police and judicial work, the voiceprint is an effective auxiliary means of establishing a suspect's identity; in the military, personnel are identified by voiceprint; and in medicine, the voiceprint assists in diagnosing certain related diseases. Voiceprint signals are extremely convenient to collect and are present throughout people's daily lives, so research on high-performance voiceprint recognition systems has significant practical value. To improve the accuracy and efficiency of voiceprint recognition, it is therefore important to design a voiceprint recognition and analysis method based on machine learning.
Summary of the invention
The present invention designs and develops a voiceprint recognition and analysis method based on a deep convolutional neural network. It extracts the characteristic parameters of the voice signal and, through training and recognition with the deep convolutional neural network, correctly identifies the speaker, effectively improving the accuracy and efficiency of voiceprint recognition.
The technical solution provided by the invention is as follows:
A voiceprint recognition and analysis method based on a deep convolutional neural network, comprising the following steps:
Step 1: collect the voice signal of a known speaker, pre-process the voice signal to generate a grayscale spectrogram, and extract characteristic parameters from the grayscale spectrogram;
Step 2: build a deep convolutional neural network on the characteristic parameters of the grayscale spectrogram and train it. The network comprises 5 hidden layers — 3 convolutional layers and 2 down-sampling layers — with convolution and down-sampling alternating:
the first convolutional layer consists of 8 feature maps with 5 × 5 convolution kernels; the convolution is computed without zero-padding the edges;
the first down-sampling layer consists of 8 feature maps with a 2 × 2 kernel, realizing down-sampling and local averaging;
the second convolutional layer consists of 20 feature maps with 5 × 5 convolution kernels, each feature map consisting of 10 × 10 neurons;
the second down-sampling layer consists of 20 feature maps with a 2 × 2 kernel;
the third convolutional layer applies a 5 × 5 convolution kernel to each feature map and flattens the resulting feature maps into a vector;
Step 3: collect the voice signal to be identified, obtain the characteristic parameters of its grayscale spectrogram according to Step 1, and identify the speaker of the voice signal using the trained convolutional neural network.
Preferably, in Step 1, the pre-processing of the voice signal comprises sampling and quantization, pre-emphasis, framing and windowing, and endpoint detection.
Preferably, the sampling and quantization of the voice signal comprise: digitizing the voice signal at a sampling rate of 8 kHz, with each sample represented by 8 bits.
Preferably, the pre-emphasis of the voice signal comprises:
passing the digital voice signal obtained after sampling and quantization through a first-order high-pass filter to apply pre-emphasis, which highlights the high-frequency components; the transfer function of the first-order high-pass filter is:
H(z) = 1 − 0.9375z⁻¹
where z is the complex variable of the z-transform of the voice signal.
Preferably, the framing and windowing of the voice signal comprise:
splitting the continuous speech signal into frames of 10–30 ms;
applying a Hamming window to each frame, the window function being:
W(n) = 0.54 − 0.46 cos(2πn / (N − 1)), 0 ≤ n ≤ N − 1
where W(n) is the Hamming window function applied to each frame of the voice signal and N is the number of samples per frame.
Preferably, the endpoint detection of the voice signal comprises:
removing the silent segments of the voice signal using the short-time energy and short-time zero-crossing rate methods.
Preferably, in Step 1, the generation of the grayscale spectrogram comprises:
decomposing each frame of the voice signal into an amplitude spectrum by the discrete Fourier transform:
X(n, k) = Σ_{p=0}^{M−1} x_p(n) e^{−j2πpk/M}, k = 0, 1, …, M − 1
where M is the number of samples per frame, X(n, k) is the sequence obtained from the Fourier transform of the n-th frame, k is the Fourier transform index, e is the base of the natural logarithm, and x_p(n) is the signal at the p-th sample of the n-th frame;
obtaining the energy density spectrum of the complex sequence of each frame after the Fourier transform:
E(n, k) = |X(n, k)|² = X_R(n, k)² + X_I(n, k)²
where E(n, k) is the energy density spectrum of the complex sequence obtained from the Fourier transform of the n-th frame, X_R(n, k) is its real part, and X_I(n, k) is its imaginary part;
taking the logarithm of the energy spectral density:
10 log₁₀ E(n, k) = 10 log₁₀ |X(n, k)|² = 20 log₁₀ |X(n, k)|;
mapping the log-domain energy spectral density to pixel values Q(n, m) between 0 and 255 to obtain the grayscale spectrogram:
Q(n, m) = round(255 × (T(n, m) − T_min(n)) / (T_max(n) − T_min(n)))
where T(n, m) is the m-th log-domain energy spectral density value of the n-th frame, T_max(n) is the maximum of the log-domain energy spectral density values of the n-th frame, and T_min(n) is their minimum.
Preferably, Mel-frequency cepstral coefficient (MFCC) parameters are used as the characteristic parameters of the grayscale spectrogram, and the extraction of the MFCC parameters comprises:
applying the discrete cosine transform to the log energy spectral density and discarding the DC component; the remaining coefficients are the MFCC parameters.
Preferably, Step 3 comprises:
initializing the decision values A₁, A₂, …, A_ω, …, A_S of the voice signals corresponding to the S speakers so that A₁ = A₂ = … = A_ω = … = A_S = 0;
obtaining the set of grayscale-spectrogram features of the voice signal to be identified according to Step 1 and inputting them one by one into the trained convolutional neural network; whenever a grayscale-spectrogram feature of the voice signal under test is recognized as belonging to the voice of the ω-th speaker, setting A_ω = A_ω + 1;
outputting the speaker whose voice signal corresponds to the decision value max(A₁, A₂, …, A_ω, …, A_S).
Preferably, the continuous speech signal is split into frames with a frame length of 10 ms.
The beneficial effects of the present invention are as follows:
The voiceprint recognition and analysis method based on a deep convolutional neural network extracts the characteristic parameters of the voice signal and, through training and recognition with the deep convolutional neural network, correctly identifies the speaker, achieving good results and effectively improving the accuracy and efficiency of voiceprint recognition.
Brief description of the drawings
Fig. 1 is a schematic diagram of the voiceprint recognition and analysis framework based on a deep convolutional neural network according to the present invention.
Fig. 2 is a flow chart of the complete voiceprint recognition algorithm based on a deep convolutional neural network according to the present invention.
Fig. 3 is an overall structural diagram of the recognition model according to the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings, so that those skilled in the art can implement it by referring to the text of the specification.
Referring to Fig. 1, the schematic diagram of the voiceprint recognition and analysis framework based on a deep convolutional neural network: the raw speech of the speaker is taken as input, and the speaker's voice information is pre-processed; the pre-processed voice information is transformed frame by frame with the Fourier transform, the energy spectral density of each frame is calculated, its logarithm is taken, and the log-domain energy spectral density is mapped to a grayscale spectrogram; features are extracted from the voiceprint features in the spectrogram; a deep convolutional neural network (CNN) is built and trained for classification on the characteristic parameters of the training samples; finally, the test samples are recognized by template matching, yielding the recognition and analysis result.
Referring to Fig. 2, the flow chart of the complete voiceprint recognition and analysis algorithm of the present invention.
The speech signal pre-processing proceeds as follows:
To compensate for the attenuation introduced during sound acquisition and its influence on the voice signal, the high-frequency components of the signal must be emphasized before processing; this also reduces the influence of noise, flattens the spectrum of the voice signal, and improves the signal-to-noise ratio. The digital voice signal obtained after sampling is passed through a first-order high-pass filter with transfer function
H(z) = 1 − 0.9375z⁻¹
to realize the pre-emphasis.
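In the time domain, the filter H(z) = 1 − 0.9375z⁻¹ corresponds to the difference equation y[t] = x[t] − 0.9375·x[t−1]. A minimal Python sketch of this step (the function name and the pass-through handling of the first sample are illustrative choices, not specified in the patent):

```python
def pre_emphasis(signal, alpha=0.9375):
    """First-order high-pass pre-emphasis: y[t] = x[t] - alpha * x[t-1],
    the time-domain form of H(z) = 1 - alpha * z^-1."""
    return [signal[0]] + [signal[t] - alpha * signal[t - 1]
                          for t in range(1, len(signal))]
```

Applied to a constant (purely low-frequency) signal, the filter leaves only a small residue after the first sample, which is exactly the intended suppression of low frequencies.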
The system's speech sampling frequency is 8 kHz, and accordingly a frame length of 10 ms is used when extracting the Mel-frequency cepstral coefficient (MFCC) parameters. To reduce the prediction error at both ends of the signal and avoid spectral leakage, each frame is windowed with a Hamming window, whose function is
W(n) = 0.54 − 0.46 cos(2πn / (N − 1)), 0 ≤ n ≤ N − 1
where W(n) is the Hamming window function applied to each frame of the voice signal and N is the number of samples per frame.
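The framing and windowing step above can be sketched as follows. At 8 kHz, a 10 ms frame holds 80 samples; non-overlapping frames are assumed here, since the patent does not state a frame shift:

```python
import math

def frame_and_window(signal, sample_rate=8000, frame_ms=10):
    """Split a signal into non-overlapping frames of frame_ms milliseconds
    and apply the Hamming window W(n) = 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    n_samples = int(sample_rate * frame_ms / 1000)  # 80 samples at 8 kHz / 10 ms
    frames = []
    for start in range(0, len(signal) - n_samples + 1, n_samples):
        frame = signal[start:start + n_samples]
        frames.append([frame[n] * (0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1)))
                       for n in range(n_samples)])
    return frames
```

The window tapers each frame toward its ends (W(0) = W(N−1) = 0.08), which is what suppresses the leakage at the frame boundaries.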
Besides the effective speech segments, a speaker's voice usually also contains silent segments, whose presence reduces the accuracy and efficiency of voiceprint recognition. Endpoint detection is used to eliminate the silent segments; in this method they are removed by combining the short-time energy and the short-time zero-crossing rate.
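A frame can be treated as silence when both its short-time energy and its short-time zero-crossing rate are low. The sketch below illustrates the idea; the threshold values and the per-frame decision rule are illustrative assumptions, since the patent does not fix them:

```python
def short_time_energy(frame):
    """Short-time energy: sum of squared samples of the frame."""
    return sum(x * x for x in frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0)
    return crossings / (len(frame) - 1)

def drop_silence(frames, energy_thr=0.01, zcr_thr=0.1):
    """Keep only frames that look like speech: either enough energy
    (voiced sounds) or enough zero crossings (unvoiced fricatives)."""
    return [f for f in frames
            if short_time_energy(f) >= energy_thr or zero_crossing_rate(f) >= zcr_thr]
```

Combining the two measures is what lets the detector keep low-energy but high-zero-crossing unvoiced consonants that an energy-only rule would discard.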
After sampling and quantization, pre-emphasis, framing and windowing, and endpoint detection, the voice signal is ready for spectrogram generation.
The spectrogram generation specifically comprises:
Each frame is decomposed into an amplitude spectrum by the discrete Fourier transform; for the n-th frame the Fourier transform is
X(n, k) = Σ_{p=0}^{M−1} x_p(n) e^{−j2πpk/M}, k = 0, 1, …, M − 1
where M is the number of samples per frame, X(n, k) is the sequence obtained from the Fourier transform of the n-th frame, k is the Fourier transform index, e is the base of the natural logarithm, and x_p(n) is the signal at the p-th sample of the n-th frame.
Next, the energy density spectrum of the complex sequence X(n, k), k = 0, 1, …, M − 1, obtained from the M-point Fourier transform of each frame, is calculated as
E(n, k) = |X(n, k)|² = X_R(n, k)² + X_I(n, k)²
where E(n, k) is the energy density spectrum of the complex sequence obtained from the Fourier transform of the n-th frame, X_R(n, k) is its real part, and X_I(n, k) is its imaginary part.
Then the logarithm of the energy spectral density is taken and converted to decibel form:
10 log₁₀ E(n, k) = 10 log₁₀ |X(n, k)|² = 20 log₁₀ |X(n, k)|
After these steps, suppose the n-th frame has yielded m values, denoted T(n, m), each of which is a log-domain energy spectral density. Let the maximum of T(n, m) be T_max(n) and the minimum be T_min(n); the log-domain energy spectral density can then be mapped to pixel values Q(n, m) between 0 and 255 by
Q(n, m) = round(255 × (T(n, m) − T_min(n)) / (T_max(n) − T_min(n)))
Finally, with n as the abscissa, m as the ordinate, and Q(n, m) as the pixel value, the resulting two-dimensional image is the grayscale spectrogram.
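The per-frame computation above — DFT, energy density, decibel conversion, min-max mapping to pixels — can be sketched in plain Python. The small 1e-12 floor inside the logarithm is an implementation convenience to avoid log10(0), not part of the patent:

```python
import cmath
import math

def frame_to_log_energy(frame):
    """DFT of one frame, then energy density E(n,k) = |X(n,k)|^2
    converted to decibels: 10*log10(E)."""
    M = len(frame)
    spectrum = []
    for k in range(M):
        X = sum(frame[p] * cmath.exp(-2j * math.pi * p * k / M) for p in range(M))
        E = X.real ** 2 + X.imag ** 2
        spectrum.append(10 * math.log10(E + 1e-12))  # floor avoids log10(0)
    return spectrum

def to_gray(T):
    """Min-max map a list of log-energy values onto 0-255 pixel values."""
    t_min, t_max = min(T), max(T)
    return [round(255 * (t - t_min) / (t_max - t_min)) for t in T]
```

Stacking `to_gray(frame_to_log_energy(f))` column by column over the frames yields the grayscale spectrogram image described in the text.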
After pre-processing and spectrogram generation, the characteristic parameters are extracted. MFCC parameters are used as the characteristic parameters for voiceprint recognition, and they are computed as follows:
the discrete cosine transform of the log energy spectral density is computed, yielding D(n, m); its DC component is discarded, and the remaining coefficients are the MFCC parameters.
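The DCT step can be sketched as follows. A type-II DCT is assumed, since the patent does not specify the variant; dropping coefficient 0 removes the DC component:

```python
import math

def dct_drop_dc(log_energy):
    """Type-II DCT of the log-energy values; coefficient 0 (the DC
    component) is discarded, the rest serve as the MFCC-style parameters."""
    M = len(log_energy)
    coeffs = [sum(log_energy[p] * math.cos(math.pi * k * (p + 0.5) / M)
                  for p in range(M))
              for k in range(M)]
    return coeffs[1:]  # cast out the DC component
```

For a constant input, all retained coefficients are (numerically) zero — the information was entirely in the discarded DC term, which is why dropping it removes only the overall energy offset.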
Next, the deep convolutional neural network (CNN) model is built; this is the core of the method. The structure of the CNN is designed as follows: the network contains 5 hidden layers, namely 3 convolutional layers and 2 down-sampling layers, plus 1 fully connected layer. Convolution and down-sampling alternate, and the computation proceeds as follows:
The first hidden layer performs convolution with 8 feature maps, each using a 5 × 5 convolution kernel, so each feature map obtained after the mapping is 28 × 28; the convolution is computed without zero-padding the edges.
The second hidden layer realizes down-sampling and local averaging; it likewise consists of 8 feature maps and down-samples with a 2 × 2 kernel, so each resulting feature map is 14 × 14. A down-sampling layer does not change the number of feature maps, only the size of each; this is a form of dimensionality reduction.
The third hidden layer performs the second convolution; it consists of 20 feature maps, again with 5 × 5 kernels, and each feature map consists of 10 × 10 neurons. The convolution operation is the same as in the first convolutional layer, except that during the mapping one feature map may be connected to several feature maps of the previous layer.
The fourth hidden layer performs the second down-sampling; it consists of 20 feature maps with a 2 × 2 down-sampling template, yielding maps of size 5 × 5.
The fifth hidden layer flattens the resulting feature maps into a vector: a 5 × 5 convolution is applied to each feature map, yielding a 120-dimensional output vector.
The last layer of the network is a fully connected layer, whose output vector is obtained by back-propagation training.
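The layer sizes above follow from simple arithmetic: a valid (unpadded) k × k convolution shrinks a map from side s to s − k + 1, and a 2 × 2 down-sampling halves it. The sketch below checks the chain; the 32 × 32 input size is inferred from the 28 × 28 first-layer maps, since the patent does not state it explicitly:

```python
def conv_size(s, k):
    """Output side length of a valid (no zero-padding) k x k convolution."""
    return s - k + 1

def pool_size(s, k):
    """Output side length of non-overlapping k x k down-sampling."""
    return s // k

s = 32                 # inferred input spectrogram patch size
s = conv_size(s, 5)    # conv1: 8 maps of 28 x 28
assert s == 28
s = pool_size(s, 2)    # pool1: 8 maps of 14 x 14
s = conv_size(s, 5)    # conv2: 20 maps of 10 x 10
s = pool_size(s, 2)    # pool2: 20 maps of 5 x 5
s = conv_size(s, 5)    # conv3: each map collapses to 1 x 1
assert s == 1          # flattened into the 120-dimensional vector
```

The final 5 × 5 convolution over 5 × 5 maps collapses each map to a single value, which is how the flattening into the 120-dimensional vector arises.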
Next, the characteristic parameters of the training samples are used for classification training; the purpose of training is to obtain the connection weights between neurons and the neuron biases in the network model, which together constitute the model library. Training uses a supervised learning algorithm: the characteristic parameters of the training samples are labeled before training, the CNN model assigning label value i − 1 to the characteristic parameters of the i-th speaker; all characteristic parameters of a given speaker share the same label value, which represents that speaker's ID. The training model uses the cross-entropy function as the cost function, and the output layer uses a Softmax classifier. Gradients are computed with the back-propagation algorithm, and the CNN model can be trained with any general gradient-based technique.
Referring to Fig. 3, the overall structural diagram of the recognition model. The recognition performed by the network model corresponds to the recognition stage of voiceprint recognition:
initialize the decision values A₁, A₂, …, A_ω, …, A_S of the voice signals corresponding to the S speakers so that A₁ = A₂ = … = A_ω = … = A_S = 0;
obtain the set of grayscale-spectrogram features of the voice signal to be identified according to Step 1 and input them one by one into the trained convolutional neural network; whenever a grayscale-spectrogram feature of the voice signal under test is recognized as belonging to the voice of the ω-th speaker, set A_ω = A_ω + 1;
output the speaker whose voice signal corresponds to the decision value max(A₁, A₂, …, A_ω, …, A_S).
The specific recognition and analysis process is as follows: suppose the speech of the speaker under test yields N characteristic-parameter spectrograms after the spectrogram generation process above. These N spectrograms are input one by one into the CNN model; before being input, each spectrogram goes through the same pre-processing steps as in the training stage. The CNN model then outputs the speaker for each characteristic-parameter spectrogram, so the N spectrograms correspond to N speaker votes; the speaker appearing most often is taken as the speaker of the speech under test, achieving the goal of voiceprint recognition.
As an illustration: suppose the database contains speakers A, B, C, D, E, F and G. During training, the characteristic parameters obtained after pre-processing the speech of A, B, C, D, E, F and G form the training set, with the speakers A–G as the output vector, yielding the deep convolutional neural network model. The speech of the speaker under test is collected and, after the spectrogram generation process above, yields N characteristic-parameter spectrograms; these are input one by one into the CNN model (each going through the same pre-processing steps as in the training stage), a vote is taken, and finally a set of results is obtained. The voting process is as follows:
initialize the decision values of samples A, B, C, D, E, F and G: A = B = C = D = E = F = G = 0;
input the characteristic parameters of the voice signal under test into the CNN model; if the voice signal is recognized as speaker A, then A = A + 1; if as speaker B, then B = B + 1;
and so on;
if the voice signal is recognized as speaker G, then G = G + 1;
the final output is max(A, B, C, D, E, F, G): the speaker with the most votes is the speaker of the voice signal under test.
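The voting stage above amounts to a majority vote over the per-spectrogram predictions. A minimal sketch, with the per-spectrogram classifier passed in as a function (an interface assumed here for illustration, not specified in the patent):

```python
def identify_speaker(spectrograms, classify, speakers):
    """Majority vote: classify each spectrogram, tally one vote per
    prediction, and return the speaker with the most votes."""
    votes = {s: 0 for s in speakers}
    for spec in spectrograms:
        votes[classify(spec)] += 1
    return max(votes, key=votes.get)
```

For example, with a toy classifier that maps value 1 to "A" and everything else to "B", three spectrograms [1, 2, 2] produce two votes for "B", so "B" is returned.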
The voiceprint recognition and analysis method based on a deep convolutional neural network provided by the invention extracts the characteristic parameters of the voice signal and, through training and recognition with the deep convolutional neural network, correctly identifies the speaker, achieving good results and effectively improving the accuracy and efficiency of voiceprint recognition.
Although the embodiments of the present invention have been disclosed above, they are not limited to the applications listed in the description and the embodiments; the invention can be applied in all fields suitable for it, and those skilled in the art can easily realize further modifications. Therefore, without departing from the general concept defined by the claims and their equivalent scope, the present invention is not limited to the specific details and illustrations shown and described herein.
Claims (10)
1. A voiceprint recognition and analysis method based on a deep convolutional neural network, characterized by comprising the following steps:
Step 1: collecting the voice signal of a known speaker, pre-processing the voice signal to generate a grayscale spectrogram, and extracting characteristic parameters from the grayscale spectrogram;
Step 2: building a deep convolutional neural network on the characteristic parameters of the grayscale spectrogram and training it, the network comprising 5 hidden layers — 3 convolutional layers and 2 down-sampling layers — with convolution and down-sampling alternating:
the first convolutional layer consists of 8 feature maps with 5 × 5 convolution kernels, the convolution being computed without zero-padding the edges;
the first down-sampling layer consists of 8 feature maps with a 2 × 2 kernel, realizing down-sampling and local averaging;
the second convolutional layer consists of 20 feature maps with 5 × 5 convolution kernels, each feature map consisting of 10 × 10 neurons;
the second down-sampling layer consists of 20 feature maps with a 2 × 2 kernel;
the third convolutional layer applies a 5 × 5 convolution kernel to each feature map and flattens the resulting feature maps into a vector;
Step 3: collecting the voice signal to be identified, obtaining the characteristic parameters of its grayscale spectrogram according to Step 1, and identifying the speaker of the voice signal using the trained convolutional neural network.
2. The voiceprint recognition and analysis method based on a deep convolutional neural network according to claim 1, characterized in that, in Step 1, the pre-processing of the voice signal comprises sampling and quantization, pre-emphasis, framing and windowing, and endpoint detection.
3. The voiceprint recognition and analysis method based on a deep convolutional neural network according to claim 2, characterized in that the sampling and quantization of the voice signal comprise: digitizing the voice signal at a sampling rate of 8 kHz, with each sample represented by 8 bits.
4. The voiceprint recognition and analysis method based on a deep convolutional neural network according to claim 3, characterized in that the pre-emphasis of the voice signal comprises:
passing the digital voice signal obtained after sampling and quantization through a first-order high-pass filter to apply pre-emphasis, highlighting the high-frequency components, the transfer function of the first-order high-pass filter being:
H(z) = 1 − 0.9375z⁻¹
where z is the complex variable of the z-transform of the voice signal.
5. The voiceprint recognition and analysis method based on a deep convolutional neural network according to claim 4, characterized in that the framing and windowing of the voice signal comprise:
splitting the continuous speech signal into frames with a frame length of 10–30 ms;
applying a Hamming window to each frame, the window function being:
W(n) = 0.54 − 0.46 cos(2πn / (N − 1)), 0 ≤ n ≤ N − 1
where W(n) is the Hamming window function applied to each frame of the voice signal and N is the number of samples per frame.
6. The voiceprint recognition and analysis method based on a deep convolutional neural network according to claim 5, characterized in that the endpoint detection of the voice signal comprises:
removing the silent segments of the voice signal using the short-time energy and short-time zero-crossing rate methods.
7. the Application on Voiceprint Recognition analysis method based on depth convolutional neural networks as described in any one of claim 2-6,
It is characterized in that, in step 1, the generation of the gray scale sound spectrograph includes:
Each frame voice signal is resolved into amplitude spectrum by discrete Fourier transform:
Wherein, M is the sampling number of each frame, and X (n, k) is the sequence that n-th frame voice signal passes through that Fourier transformation obtains, k
For Fourier transformation parameter, e is the truth of a matter of natural logrithm, xpIt (n) is the signal of p-th of sampled point of n-th frame voice signal;
obtaining the energy density spectrum of the complex sequence obtained from each frame of the voice signal after the Fourier transform:
E(n,k) = |X(n,k)|^2 = X_R(n,k)^2 + X_I(n,k)^2
wherein E(n,k) is the energy density spectrum of the complex sequence obtained from the n-th frame of the voice signal after the Fourier transform, X_R(n,k) is the real part of that complex sequence, and X_I(n,k) is its imaginary part;
taking the logarithm of the energy spectral density:
10·log10 E(n,k) = 10·log10 |X(n,k)|^2 = 20·log10 |X(n,k)|;
mapping the energy spectral density in logarithmic form to pixel values Q(n,m) between 0 and 255 to obtain the grayscale spectrogram:
Q(n,m) = round(255 × (T(n,m) - T_min(n,m)) / (T_max(n,m) - T_min(n,m)))
wherein T(n,m) is the m-th logarithmic energy spectral density value of the n-th frame of the voice signal, T_max(n,m) is the maximum among the logarithmic energy spectral density values of the n-th frame of the voice signal, and T_min(n,m) is the minimum among the logarithmic energy spectral density values of the n-th frame of the voice signal.
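The spectrogram pipeline of claim 7 (DFT, energy density spectrum, logarithm, 0-255 mapping) can be sketched as follows. Per-frame min-max normalization is assumed from the definitions of T_max and T_min; the small epsilon guarding against log(0) and division by zero is an implementation detail, not part of the claim.

```python
import numpy as np

def grayscale_spectrogram(frames):
    """DFT each windowed frame, take the energy density |X|^2,
    convert to dB (10*log10), then linearly map each frame's dB
    values onto 0-255 pixel values Q(n, m)."""
    X = np.fft.rfft(frames, axis=1)                    # X(n, k), one row per frame
    E = np.abs(X) ** 2                                 # energy density spectrum E(n, k)
    T = 10 * np.log10(E + 1e-12)                       # logarithmic form (epsilon avoids log 0)
    Tmin = T.min(axis=1, keepdims=True)
    Tmax = T.max(axis=1, keepdims=True)
    Q = np.round(255 * (T - Tmin) / (Tmax - Tmin + 1e-12))
    return Q.astype(np.uint8)                          # grayscale pixel values
```

`rfft` keeps only the non-redundant half of the spectrum (M/2 + 1 bins for real input), which is sufficient since the energy spectrum of a real signal is symmetric.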
8. The voiceprint recognition analysis method based on a deep convolutional neural network as claimed in claim 7, wherein Mel-frequency cepstral coefficient parameters are used as the characteristic parameters of the grayscale spectrogram, and the acquisition of the Mel-frequency cepstral coefficient parameters comprises:
taking the logarithm of the energy spectral density and performing a discrete cosine transform, then casting out the DC component; the remainder constitutes the Mel-frequency cepstral coefficient parameters.
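A minimal sketch of claim 8's coefficient extraction, applying a DCT-II directly to the log energy spectrum and casting out the DC term. Note that the claim as written omits the Mel filterbank that a conventional MFCC front end would insert before the DCT; `n_coeffs=13` is an illustrative choice, not part of the claim.

```python
import numpy as np

def cepstral_coeffs(log_energy, n_coeffs=13):
    """DCT-II of the log energy spectrum, discarding the 0th (DC)
    coefficient. log_energy is a length-N vector (or batch of them)."""
    N = log_energy.shape[-1]
    k = np.arange(N)
    # DCT-II basis: basis[u, k] = cos(pi * u * (2k + 1) / (2N))
    basis = np.cos(np.pi * np.outer(np.arange(N), 2 * k + 1) / (2 * N))
    c = log_energy @ basis.T                   # all N DCT coefficients
    return c[..., 1:n_coeffs + 1]              # cast out the DC component
```

A quick sanity check: a constant log spectrum carries all its energy in the DC term, so every retained coefficient is (numerically) zero.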
9. The voiceprint recognition analysis method based on a deep convolutional neural network as claimed in claim 8, wherein step 3 comprises:
initializing the decision values A1, A2, ..., Aω, ..., AS of the voice signals corresponding to the S speakers, so that A1 = A2 = ... = Aω = ... = AS = 0;
obtaining the feature set of the voice signal to be identified from its grayscale spectrogram according to step 1, and inputting the features one by one into the trained convolutional neural network; whenever a feature of the grayscale spectrogram of the voice signal to be identified is identified as belonging to the voice signal of the ω-th speaker, setting Aω = Aω + 1;
outputting the speaker corresponding to the maximum decision value max(A1, A2, ..., Aω, ..., AS).
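Claim 9's tallying rule reduces to a majority vote over per-frame CNN decisions. A sketch, assuming (as an illustration, not something the claim specifies) that the network outputs one score per speaker for each spectrogram feature:

```python
import numpy as np

def vote_speaker(frame_scores):
    """frame_scores: (n_frames, S) array of per-frame CNN outputs,
    one column per speaker. Each frame increments the decision value
    A_w of its winning speaker; the speaker with max(A_1..A_S) wins."""
    S = frame_scores.shape[1]
    A = np.zeros(S, dtype=int)                 # A_1 ... A_S initialised to 0
    for scores in frame_scores:
        A[np.argmax(scores)] += 1              # A_w = A_w + 1 for the winning class
    return int(np.argmax(A))                   # index of max(A_1, ..., A_S)
```

Because the vote aggregates many short-frame decisions, occasional per-frame misclassifications are outvoted by the majority.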
10. The voiceprint recognition analysis method based on a deep convolutional neural network as claimed in claim 5, wherein the continuous speech signal is split into multi-frame voice signals with a frame length of 10 ms.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811439719.XA CN109524014A (en) | 2018-11-29 | 2018-11-29 | A kind of Application on Voiceprint Recognition analysis method based on depth convolutional neural networks |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109524014A true CN109524014A (en) | 2019-03-26 |
Family
ID=65793759
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811439719.XA Pending CN109524014A (en) | 2018-11-29 | 2018-11-29 | A kind of Application on Voiceprint Recognition analysis method based on depth convolutional neural networks |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109524014A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160099010A1 (en) * | 2014-10-03 | 2016-04-07 | Google Inc. | Convolutional, long short-term memory, fully connected deep neural networks |
CN106952649A (en) * | 2017-05-14 | 2017-07-14 | 北京工业大学 | Speaker recognition method based on convolutional neural networks and spectrogram |
WO2018053518A1 (en) * | 2016-09-19 | 2018-03-22 | Pindrop Security, Inc. | Channel-compensated low-level features for speaker recognition |
CN108597539A (en) * | 2018-02-09 | 2018-09-28 | 桂林电子科技大学 | Speech emotion recognition method based on parameter migration and spectrogram |
CN108831485A (en) * | 2018-06-11 | 2018-11-16 | 东北师范大学 | Speaker recognition method based on spectrogram statistical features |
Cited By (37)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110265035B (en) * | 2019-04-25 | 2021-08-06 | 武汉大晟极科技有限公司 | Speaker recognition method based on deep learning |
CN110265035A (en) * | 2019-04-25 | 2019-09-20 | 武汉大晟极科技有限公司 | Speaker recognition method based on deep learning |
CN111951809A (en) * | 2019-05-14 | 2020-11-17 | 深圳子丸科技有限公司 | Multi-person voiceprint identification method and system |
CN110277100A (en) * | 2019-06-19 | 2019-09-24 | 南京邮电大学 | Improved voiceprint recognition method based on Alexnet, storage medium and terminal |
CN110349588A (en) * | 2019-07-16 | 2019-10-18 | 重庆理工大学 | LSTM network voiceprint recognition method based on word embedding |
CN110534118A (en) * | 2019-07-29 | 2019-12-03 | 安徽继远软件有限公司 | Transformer/reactor fault diagnosis method based on voiceprint recognition and neural network |
CN110534118B (en) * | 2019-07-29 | 2021-10-08 | 安徽继远软件有限公司 | Transformer/reactor fault diagnosis method based on voiceprint recognition and neural network |
CN111124108A (en) * | 2019-11-22 | 2020-05-08 | Oppo广东移动通信有限公司 | Model training method, gesture control method, device, medium and electronic equipment |
CN111048097A (en) * | 2019-12-19 | 2020-04-21 | 中国人民解放军空军研究院通信与导航研究所 | Twin network voiceprint recognition method based on 3D convolution |
WO2021127990A1 (en) * | 2019-12-24 | 2021-07-01 | 广州国音智能科技有限公司 | Voiceprint recognition method based on voice noise reduction and related apparatus |
CN111210835A (en) * | 2020-01-08 | 2020-05-29 | 华南理工大学 | Multi-channel voice noise reduction method based on auditory model and information source direction |
CN111210835B (en) * | 2020-01-08 | 2023-07-18 | 华南理工大学 | Multichannel voice noise reduction method based on auditory model and information source direction |
CN111210807B (en) * | 2020-02-21 | 2023-03-31 | 厦门快商通科技股份有限公司 | Speech recognition model training method, system, mobile terminal and storage medium |
CN111210807A (en) * | 2020-02-21 | 2020-05-29 | 厦门快商通科技股份有限公司 | Speech recognition model training method, system, mobile terminal and storage medium |
CN111429921A (en) * | 2020-03-02 | 2020-07-17 | 厦门快商通科技股份有限公司 | Voiceprint recognition method, system, mobile terminal and storage medium |
CN111429921B (en) * | 2020-03-02 | 2023-01-03 | 厦门快商通科技股份有限公司 | Voiceprint recognition method, system, mobile terminal and storage medium |
CN113470653A (en) * | 2020-03-31 | 2021-10-01 | 华为技术有限公司 | Voiceprint recognition method, electronic equipment and system |
CN111524525B (en) * | 2020-04-28 | 2023-06-16 | 平安科技(深圳)有限公司 | Voiceprint recognition method, device, equipment and storage medium of original voice |
CN111524525A (en) * | 2020-04-28 | 2020-08-11 | 平安科技(深圳)有限公司 | Original voice voiceprint recognition method, device, equipment and storage medium |
CN111862989B (en) * | 2020-06-01 | 2024-03-08 | 北京捷通华声科技股份有限公司 | Acoustic feature processing method and device |
CN111862989A (en) * | 2020-06-01 | 2020-10-30 | 北京捷通华声科技股份有限公司 | Acoustic feature processing method and device |
CN112053694A (en) * | 2020-07-23 | 2020-12-08 | 哈尔滨理工大学 | Voiceprint recognition method based on CNN and GRU network fusion |
CN112201258A (en) * | 2020-10-15 | 2021-01-08 | 杭州电子科技大学 | AMBP-based noise robustness camouflage voice detection method |
CN112712814A (en) * | 2020-12-04 | 2021-04-27 | 中国南方电网有限责任公司 | Voiceprint recognition method based on deep learning algorithm |
CN112614492A (en) * | 2020-12-09 | 2021-04-06 | 通号智慧城市研究设计院有限公司 | Voiceprint recognition method, system and storage medium based on time-space information fusion |
CN112767949A (en) * | 2021-01-18 | 2021-05-07 | 东南大学 | Voiceprint recognition system based on binary weight convolutional neural network |
CN112883562A (en) * | 2021-02-01 | 2021-06-01 | 上海交通大学三亚崖州湾深海科技研究院 | Method for repairing ocean platform actual measurement stress spectrum based on neural network algorithm |
CN112883562B (en) * | 2021-02-01 | 2023-02-24 | 上海交通大学三亚崖州湾深海科技研究院 | Ocean platform actual measurement stress spectrum repairing method based on neural network algorithm |
CN112786059A (en) * | 2021-03-11 | 2021-05-11 | 合肥市清大创新研究院有限公司 | Voiceprint feature extraction method and device based on artificial intelligence |
CN113129897A (en) * | 2021-04-08 | 2021-07-16 | 杭州电子科技大学 | Voiceprint recognition method based on attention mechanism recurrent neural network |
CN113129897B (en) * | 2021-04-08 | 2024-02-20 | 杭州电子科技大学 | Voiceprint recognition method based on attention mechanism cyclic neural network |
WO2023036016A1 (en) * | 2021-09-07 | 2023-03-16 | 广西电网有限责任公司贺州供电局 | Voiceprint recognition method and system applied to electric power operation |
CN113823291A (en) * | 2021-09-07 | 2021-12-21 | 广西电网有限责任公司贺州供电局 | Voiceprint recognition method and system applied to power operation |
CN114333850A (en) * | 2022-03-15 | 2022-04-12 | 清华大学 | Voice voiceprint visualization method and device |
CN114598565A (en) * | 2022-05-10 | 2022-06-07 | 深圳市发掘科技有限公司 | Kitchen electrical equipment remote control system and method and computer equipment |
CN115206335A (en) * | 2022-09-15 | 2022-10-18 | 北京中环高科环境治理有限公司 | Noise monitoring method for automatic sample retention and evidence collection |
CN115206335B (en) * | 2022-09-15 | 2022-12-02 | 北京中环高科环境治理有限公司 | Noise monitoring method for automatic sample retention and evidence collection |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109524014A (en) | Voiceprint recognition analysis method based on a deep convolutional neural network | |
CN102509547B (en) | Method and system for voiceprint recognition based on vector quantization | |
Kumar et al. | Design of an automatic speaker recognition system using MFCC, vector quantization and LBG algorithm | |
CN106952649A (en) | Speaker recognition method based on convolutional neural networks and spectrogram | |
CN108831485A (en) | Speaker recognition method based on spectrogram statistical features | |
CN107610707A (en) | Voiceprint recognition method and device | |
CN107393554A (en) | Feature extraction method fusing inter-class standard deviation for acoustic scene classification | |
CN111128209B (en) | Speech enhancement method based on mixed masking learning target | |
CN109559736A (en) | Automatic dubbing method for film actors based on adversarial networks | |
CN109256138A (en) | Auth method, terminal device and computer readable storage medium | |
CN112053694A (en) | Voiceprint recognition method based on CNN and GRU network fusion | |
CN113658583B (en) | Whispered speech conversion method, system and device based on generative adversarial networks | |
Shi et al. | End-to-End Monaural Speech Separation with Multi-Scale Dynamic Weighted Gated Dilated Convolutional Pyramid Network. | |
CN114783418B (en) | End-to-end voice recognition method and system based on sparse self-attention mechanism | |
CN110782902A (en) | Audio data determination method, apparatus, device and medium | |
CN102496366B (en) | Text-independent speaker identification method | |
Jing et al. | Speaker recognition based on principal component analysis of LPCC and MFCC | |
Xue et al. | Cross-modal information fusion for voice spoofing detection | |
Zheng et al. | MSRANet: Learning discriminative embeddings for speaker verification via channel and spatial attention mechanism in alterable scenarios | |
CN111785262B (en) | Speaker age and gender classification method based on residual error network and fusion characteristics | |
Wu et al. | A Characteristic of Speaker's Audio in the Model Space Based on Adaptive Frequency Scaling | |
CN110136741A (en) | Single-channel speech enhancement method based on multi-scale context | |
CN115064175A (en) | Speaker recognition method | |
Chelali et al. | MFCC and vector quantization for Arabic fricatives speech/speaker recognition | |
Lavania et al. | Reviewing Human-Machine Interaction through Speech Recognition approaches and Analyzing an approach for Designing an Efficient System |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190326 |