CN102509547B - Method and system for voiceprint recognition based on vector quantization - Google Patents

Method and system for voiceprint recognition based on vector quantization

Info

Publication number
CN102509547B
Authority
CN
China
Prior art keywords
speaker
code word
code book
sound
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2011104503646A
Other languages
Chinese (zh)
Other versions
CN102509547A (en)
Inventor
霍春宝
赵立辉
崔文翀
张彩娟
曹景胜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Liaoning University of Technology
Original Assignee
Liaoning University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Liaoning University of Technology
Priority to CN2011104503646A
Publication of CN102509547A
Application granted
Publication of CN102509547B
Expired - Fee Related (current legal status)
Anticipated expiration

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a method and a system for voiceprint recognition based on vector quantization, which offer high recognition performance and noise immunity, require little modeling data, make decisions quickly, and have low complexity. The method includes the steps of: acquiring audio signals; preprocessing the audio signals; extracting audio signal feature parameters using MFCC (mel-frequency cepstral coefficient) parameters, wherein the order of the MFCC ranges from 12 to 16; template training, namely using the LBG (Linde, Buzo and Gray) clustering algorithm to build a codebook for each speaker and storing the codebooks in an audio database as the speakers' audio templates; and voiceprint recognition, namely comparing the feature parameters of the audio signal to be recognized with the speaker audio templates in the audio database and deciding according to a weighted Euclidean distance measure: the speaker whose template gives the feature vector sequence X of the speech to be recognized the minimum average distance measure is taken as the recognized speaker.

Description

Voiceprint recognition method and system based on vector quantization
Technical field
The invention belongs to the field of speech processing technology, and in particular relates to a vector-quantization-based voiceprint recognition method and system that identify a speaker from the speaker's speech signal.
Background art
In recent years, with the widespread application of information processing and artificial intelligence technology and the urgent demand for fast and effective identity authentication, traditional password authentication has gradually lost ground, while in the field of biometric recognition, identification technology based on the speaker's voice has gained favor with more and more people.
Because each person's vocal organs differ physiologically and acquired behavior differs as well, articulation and speaking habits vary from person to person, which makes it possible to identify a person from his or her voice. Besides the advantages of being impossible to forget, requiring no memorization, and being easy to use, voiceprint recognition has the following properties. First, the authentication mode is easy to accept: the "password" is the voice itself, so the user only has to speak. Second, the content of the text to be recognized can be random, so it is hard to steal and security is high. Third, the terminal device used for recognition is a microphone or a telephone, which is inexpensive and easy to combine with existing communication systems. The application prospects of voiceprint recognition are therefore broad: in economic activity, it can support bank remittance, balance inquiry, transfers, and so on; in security and confidentiality, a designated voice can be used to check personnel at a secure site, so that the system responds only to a specific speaker; in forensic identification, the true identity of the offender among the suspects can be determined from recordings made at the time; in biomedicine, the system can be made to respond only to the patient's commands, thereby controlling the user's artificial limb.
The key technologies of voiceprint recognition are mainly speech feature parameter extraction and model matching. Speech feature parameters fall roughly into two classes. One class is low-level features that mainly reflect the physiological characteristics of the speaker's vocal organs, such as the Mel-frequency cepstral coefficients (MFCC), extracted according to the human ear's differing sensitivity to speech signals of different frequencies, and the linear prediction cepstral coefficients (LPCC), obtained from an all-pole model of the speech signal. The other class is high-level features that mainly reflect the speaker's word usage habits and pronunciation characteristics, such as prosodic features, which reflect the rise and fall of the speaker's intonation, and phone features, which reflect the statistical regularities of phonemes in the speaker's habitual speech. LPCC is built on a production model of the speech signal and is easily affected by the assumed model; high-level features have been used in some of the literature, but the recognition rate is not very high.
The model matching methods proposed for the various speech feature parameters mainly include dynamic time warping (DTW), vector quantization (VQ), Gaussian mixture models (GMM), and artificial neural networks (ANN). The DTW model depends on the time order of the parameters, has poor real-time performance, and is suited to speaker recognition based on isolated words. GMM is mainly used for speaker recognition with large amounts of speech; it needs more model training data, longer training and recognition times, and more memory. For ANN models, the training algorithm for designing the best model topology is not guaranteed to converge, and over-learning can occur. In VQ-based speaker recognition, template matching does not depend on the time order of the parameters, real-time performance is good, little modeling data is needed, decisions are fast, and complexity is low. The principle of speaker recognition based on a vector quantization model is to quantize each speaker's speech feature parameters into a codebook and store it in a speech database as that speaker's speech template. During recognition, the feature vectors of the speech to be recognized are compared with the speech templates of the speakers in the database, the total average quantization distortion is computed for each, and the template with the minimum distortion is taken as the recognition result. A shortcoming, however, is that speech signals follow an elliptical normal distribution in which the components are not equally distributed, and this is not well reflected in the Euclidean distance measure used by traditional VQ speaker recognition systems.
Summary of the invention
The technical problem to be solved by the present invention is to provide a voiceprint recognition method and system based on vector quantization that have good recognition performance and noise immunity, require little modeling data, make decisions quickly, and have low complexity.
A voiceprint recognition method based on vector quantization, with the following steps:
1. Speech signal acquisition: a telephone of a program-controlled switching comprehensive experiment box serves as the terminal device for capturing speech, and the speech signal is acquired through a sound card;
2. Speech signal preprocessing: the computer frames and windows the acquired speech signal; during framing, each frame contains 256 samples with a frame shift of 128 samples, and the window function is a Hamming window; endpoint detection uses a method combining short-time energy and short-time zero-crossing rate; for pre-emphasis, the emphasis coefficient takes a value from 0.90 to 1.00;
3. Speech feature parameter extraction: MFCC parameters are used, with an MFCC order of 12 to 16;
4. Template training: the LBG clustering algorithm is used to build a codebook for each speaker in the system, and the codebook is stored in the speech database as that speaker's speech template;
5. Voiceprint recognition: the feature parameters of the acquired speech signal to be recognized are compared with the speaker speech templates built in the database through steps 1 to 4, and a decision is made according to the weighted Euclidean distance measure; the speaker whose template gives the feature vector sequence X of the speech to be recognized the minimum average distance measure is taken as the identified speaker.
The speech feature parameter extraction proceeds as follows:
(1) A short-time Fourier transform is applied to the preprocessed speech signal to obtain its spectrum $X(k)$. The DFT of the speech signal is:
$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}, \quad 0 \le k \le N-1$ (1)
where $x(n)$ is the input speech signal, taken frame by frame, and N is the number of points of the Fourier transform, set to 256;
(2) The squared magnitude of the spectrum $X(k)$, i.e. the energy spectrum $|X(k)|^2$, is computed and then passed through the Mel frequency filters, which smooth the spectrum of the speech signal, remove the harmonics, and highlight the formants of the original speech;
The Mel frequency filter bank is a set of triangular bandpass filters with center frequencies $f(m)$, $m = 1, 2, \ldots, Q$, where Q is the number of triangular bandpass filters. The Mel filter $H_m(k)$ is expressed as follows:
$H_m(k) = \begin{cases} 0, & k < f(m-1) \\ \dfrac{k - f(m-1)}{f(m) - f(m-1)}, & f(m-1) \le k \le f(m) \\ \dfrac{f(m+1) - k}{f(m+1) - f(m)}, & f(m) < k \le f(m+1) \\ 0, & k > f(m+1) \end{cases}$ (2)
(3) The logarithm of the Mel spectrum output by the filter bank is taken, which compresses the dynamic range of the speech spectrum and converts multiplicative noise components in the frequency domain into additive components. The log Mel spectrum $S(m)$ is:
$S(m) = \ln\left( \sum_{k=0}^{N-1} |X(k)|^2 H_m(k) \right), \quad m = 1, 2, \ldots, Q$ (3)
(4) Discrete cosine transform (DCT)
The log Mel spectrum $S(m)$ obtained from formula (3) is transformed back to the time (cepstral) domain; the result is the Mel-frequency cepstral coefficients (MFCC). The n-th coefficient $C(n)$ is computed as:
$C(n) = \sum_{m=1}^{Q} S(m) \cos\left( \dfrac{\pi n (m - 0.5)}{Q} \right), \quad n = 1, 2, \ldots, L$ (4)
where L is the order of the MFCC parameters and Q is the number of Mel filters; L takes a value from 12 to 16 and Q from 23 to 26;
The LBG clustering algorithm used in the above template training has the following steps:
(1) Obtain all training vectors X in the input feature vector set S, and set the codeword $Y_1$ of the initial codebook by the splitting codebook method;
(2) Using a small threshold $\varepsilon$ ([value not recovered]), split each codeword $Y_m$ in two according to the following rule:
$Y_m^{+} = Y_m (1 + \varepsilon), \quad Y_m^{-} = Y_m (1 - \varepsilon)$ (5)
After splitting, the codewords $Y_m^{+}$ and $Y_m^{-}$ of the new codebook are obtained;
(3) According to the nearest-neighbor criterion, find the nearest codeword of the new codebook for each training vector, so that S is finally divided into M subsets; that is, when $X \in S_m$,
$d(X, Y_m) \le d(X, Y_i), \quad i = 1, 2, \ldots, M, \; i \ne m$ (6)
where M is the number of codewords in the current codebook;
(4) Compute the centroid of the feature vectors in each subset and replace the codeword of that subset with the centroid, which gives a new codebook;
(5) Iterate steps (3) and (4) to obtain the codewords of the new codebook;
(6) Then repeat step (2), splitting each newly obtained codeword in two, and iterate steps (3) and (4) again. Continue in this way until the codebook reaches the required number of codewords $M = 2^r$, where r is an integer; r rounds of the above loop are needed in total. When clustering is complete, the centroids of the classes are the required codewords.
The initial codebook in the above LBG clustering algorithm is initialized by the splitting codebook method, as follows:
(1) Take the mean of the feature vectors of all extracted frames as the codeword $Y_1$ of the initial codebook;
(2) Split each $Y_m$ according to the following rule to form 2m codewords:
$Y_m^{+} = Y_m (1 + \varepsilon), \quad Y_m^{-} = Y_m (1 - \varepsilon)$ (7)
where m runs from 1 to the number of codewords in the current codebook, and $\varepsilon$ is the splitting parameter ([value not recovered]);
(3) Cluster all feature vectors according to the new codewords, then compute the total distance measures D and D':
$D = \sum_{X \in S} \min_{1 \le m \le M} d(X, Y_m)$ (8)
where D' is the total distance measure of the next iteration and $d(X, Y)$ is the distance measure between a training feature vector X and a codeword Y of the trained codebook;
Compute the relative distance measure:
$\delta = \left| \dfrac{D - D'}{D} \right|$ (9)
If $\delta$ does not exceed the preset threshold, stop iterating; the current codebook is the designed codebook. Otherwise, go to the next step.
(4) Recompute the new centroid of each region;
(5) Repeat steps (3) and (4) until the best codebook of 2m codewords is formed;
(6) Repeat steps (2), (3), and (4) until a codebook of M codewords is formed;
In the above discrete cosine transform, L = 13 and Q = 25.
A voiceprint recognition system based on vector quantization, composed of:
a speech signal acquisition module, a speech signal preprocessing module, a speech feature parameter extraction module, a speech template training module, and a voiceprint recognition module.
Compared with the prior art, the beneficial effects of the present invention are:
The speech signal is acquired through a sound card, the acquired signal is preprocessed using speech processing techniques, the speech feature parameters are then extracted, and vector quantization is used to build speech models from these parameters, thereby constructing a speaker recognition system. The MFCC parameters give good recognition performance and noise immunity and model human auditory perception well; in speaker recognition, the most useful speaker information is contained in the 2nd to 16th orders of the MFCC parameters. Adopting the vector quantization (VQ) method gives good recognition performance and noise immunity, good real-time behavior, and good recognition results, with little modeling data, a simple algorithm, fast decisions, and low complexity.
Description of drawings
Fig. 1 is a block diagram of the system of the present invention;
Fig. 2 is the main flow chart of the present invention;
Fig. 3 is a flow chart of the LBG algorithm;
Fig. 4 is the VQ-based voiceprint recognition human-computer interaction interface.
Embodiment
As shown in Fig. 1, the voiceprint recognition system based on vector quantization identifies the speaker's voice through a combination of software and hardware and is composed of:
a speech signal acquisition module, a speech signal preprocessing module, a speech feature parameter extraction module, a speech model training module, and a voiceprint recognition module.
As shown in Fig. 2 and Fig. 3, the voiceprint recognition method based on vector quantization has the following steps:
1. Speech signal acquisition
Speech signal acquisition converts the original analog speech signal into a digital signal, with the number of channels and the sampling frequency configured. The present invention acquires the speech signal with an SHT-8B/PCI sound card produced by Hangzhou Sanhui; the number of channels is 2 (the sound card default) and the sampling frequency is 8 kHz (the sound card default). The terminal device for recognition is a telephone set of the program-controlled switching comprehensive experiment box. The experiment box uses space-division switching, and the speech path is path A2 (there are four paths in total: A1, A2, B1, and B2; the present invention chose A2 at random, which has no effect on the experimental results).
2. Speech signal preprocessing
(1) Windowing and framing
Because speech signals are time-varying, they must be processed over short segments, so the signal is divided into frames. To ensure that framing does not cause loss of information, adjacent frames must overlap by a certain amount, set by the frame shift; the ratio of frame shift to frame length is generally between 0 and 1/2. The frame length used in the present invention is 256 samples and the frame shift is 128 samples. The window function $w(n)$ is the Hamming window, which has good smoothing characteristics:
$w(n) = 0.54 - 0.46 \cos\left( \dfrac{2\pi n}{N-1} \right), \quad 0 \le n \le N-1$ (10)
where N is the window length, 256 points in the present invention.
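The framing and windowing step can be illustrated with a minimal Python sketch. It assumes the standard Hamming window definition and drops any incomplete frame at the end of the signal; the function names `hamming` and `frame_signal` and the 440 Hz test tone are illustrative choices, not taken from the patent.

```python
import math

def hamming(n_points):
    """Hamming window w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1)), n = 0..N-1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]

def frame_signal(signal, frame_len=256, frame_shift=128):
    """Split a signal into overlapping, Hamming-windowed frames.

    An incomplete frame at the end of the signal is dropped (an assumption;
    the patent does not say how the tail is handled).
    """
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_shift):
        frame = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames

# Usage: one second of a 440 Hz tone sampled at 8 kHz (illustrative input)
fs = 8000
tone = [math.sin(2 * math.pi * 440 * t / fs) for t in range(fs)]
frames = frame_signal(tone)
print(len(frames), len(frames[0]))
```

With a frame length of 256 and a frame shift of 128, consecutive frames overlap by half, which matches the upper end of the 0 to 1/2 shift-to-length ratio mentioned in the text.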
(2) Endpoint detection
The present invention applies an endpoint detection method combining short-time energy and short-time average zero-crossing rate to the speech signal, thereby determining its start and end points. Short-time energy detects voiced sounds, and the zero-crossing rate detects unvoiced sounds. Let $x(n)$ be the speech signal and $w(n)$ the Hamming window function; the short-time energy $E_n$ is defined as
$E_n = \sum_{m=-\infty}^{\infty} [x(m)\, w(n-m)]^2 = \sum_{m=-\infty}^{\infty} x^2(m)\, h(n-m)$ (11)
where $h(n) = w^2(n)$, and $E_n$ denotes the short-time energy when the window function is applied starting at the n-th sample of the speech signal.
The short-time average zero-crossing rate $Z_n$ is:
$Z_n = \dfrac{1}{2} \sum_{m=1}^{N-1} \left| \mathrm{sgn}[x_n(m)] - \mathrm{sgn}[x_n(m-1)] \right|$ (12)
where $x_n(m)$ is the m-th sample of the frame starting at sample n, N is the length of the window function, and $\mathrm{sgn}[\cdot]$ is the sign function, namely
$\mathrm{sgn}[x] = \begin{cases} 1, & x \ge 0 \\ -1, & x < 0 \end{cases}$
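The two per-frame measures used for endpoint detection can be sketched in Python as follows. The patent does not give the thresholds or the exact decision logic, so `detect_endpoints`, its "energy OR zero-crossing" rule, and the threshold values in the usage example are assumptions made purely for illustration.

```python
import math

def short_time_energy(frame):
    """Sum of squared samples of one (already windowed) frame."""
    return sum(s * s for s in frame)

def zero_crossing_rate(frame):
    """Half the sum of |sgn(x(m)) - sgn(x(m-1))| over the frame."""
    sgn = [1 if s >= 0 else -1 for s in frame]
    return 0.5 * sum(abs(sgn[i] - sgn[i - 1]) for i in range(1, len(sgn)))

def detect_endpoints(frames, energy_thr, zcr_thr):
    """Return (first, last) index of frames flagged as speech, or None.

    Assumed rule: a frame is speech if its energy is high (voiced sound)
    or its zero-crossing rate is high (unvoiced sound).
    """
    active = [i for i, f in enumerate(frames)
              if short_time_energy(f) > energy_thr
              or zero_crossing_rate(f) > zcr_thr]
    return (active[0], active[-1]) if active else None

# Usage with synthetic frames (illustrative data, not from the patent)
silence = [0.0] * 256
voiced = [math.sin(2 * math.pi * 5 * n / 256) for n in range(256)]  # strong, low frequency
unvoiced = [0.01 * (-1) ** n for n in range(256)]                   # weak, rapid sign changes
frames = [silence, silence, voiced, unvoiced, silence]
endpoints = detect_endpoints(frames, energy_thr=10.0, zcr_thr=100.0)
print(endpoints)
```

The voiced frame is caught by its energy and the unvoiced frame by its zero-crossing rate, which is the division of labor between the two measures that the text describes.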
(3) Pre-emphasis
Because the average power spectrum of the speech signal is affected by glottal excitation and mouth-nose radiation, the high-frequency end falls off at 6 dB/octave above approximately 8000 Hz. Pre-emphasis is therefore applied to boost the high-frequency part of the speech signal and flatten its spectrum. Pre-emphasis is implemented with a digital filter that boosts high frequencies at 6 dB/octave, generally a first-order digital filter $H(z)$, namely
$H(z) = 1 - u z^{-1}$ (13)
The recognition rate of the system is highest when u lies between 0.90 and 1.00; the present invention takes u = 0.97.
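In the time domain, the first-order filter $H(z) = 1 - u z^{-1}$ corresponds to the difference equation y(n) = x(n) - u·x(n-1). A minimal Python sketch, assuming the first sample is passed through unchanged (the patent does not specify the boundary handling):

```python
def pre_emphasis(signal, u=0.97):
    """Apply y(n) = x(n) - u*x(n-1); the first sample is kept as is (assumption)."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - u * signal[n - 1]
                          for n in range(1, len(signal))]

# Usage: a constant (zero-frequency) signal is almost entirely suppressed,
# while a signal that alternates sign every sample is nearly doubled.
print(pre_emphasis([1.0, 1.0, 1.0]))
print(pre_emphasis([1.0, -1.0, 1.0]))
```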
3. Speech feature parameter extraction
Speech feature parameter extraction extracts, from the speaker's speech signal, parameters that reflect the speaker's individuality. The process is as follows:
(1) A short-time Fourier transform (DFT) is applied to the preprocessed speech signal to obtain its spectrum $X(k)$. The DFT of the speech signal is:
$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}, \quad 0 \le k \le N-1$ (14)
where $x(n)$ is the input speech signal, taken frame by frame, and N is the number of points of the Fourier transform, set to 256.
(2) The squared magnitude of the spectrum $X(k)$, i.e. the energy spectrum $|X(k)|^2$, is computed and then passed through the Mel filters, which smooth the spectrum of the speech signal, remove the harmonics, and highlight the formants of the original speech.
The Mel frequency filter bank is a set of triangular bandpass filters with center frequencies $f(m)$, $m = 1, 2, \ldots, Q$, where Q is the number of triangular bandpass filters. The Mel filter $H_m(k)$ is expressed as follows:
$H_m(k) = \begin{cases} 0, & k < f(m-1) \\ \dfrac{k - f(m-1)}{f(m) - f(m-1)}, & f(m-1) \le k \le f(m) \\ \dfrac{f(m+1) - k}{f(m+1) - f(m)}, & f(m) < k \le f(m+1) \\ 0, & k > f(m+1) \end{cases}$ (15)
(3) The logarithm of the filter bank output is taken, which compresses the dynamic range of the speech spectrum and converts multiplicative noise components in the frequency domain into additive components. The resulting log Mel spectrum $S(m)$ is:
$S(m) = \ln\left( \sum_{k=0}^{N-1} |X(k)|^2 H_m(k) \right), \quad m = 1, 2, \ldots, Q$ (16)
(4) Discrete cosine transform (DCT)
The log Mel spectrum $S(m)$ obtained in the previous step is transformed back to the time (cepstral) domain; the result is the Mel-frequency cepstral coefficients (MFCC). The n-th coefficient $C(n)$ is computed as:
$C(n) = \sum_{m=1}^{Q} S(m) \cos\left( \dfrac{\pi n (m - 0.5)}{Q} \right), \quad n = 1, 2, \ldots, L$ (17)
where L is the order of the MFCC and Q is the number of Mel filters; both values are usually chosen according to experimental conditions. This embodiment uses L = 13 and Q = 25, but practice is not limited to this embodiment.
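The four extraction steps (DFT, Mel filtering of the energy spectrum, logarithm, DCT) can be sketched for a single frame in plain Python. Several details below are assumptions rather than statements of the patent: the Hz-to-mel mapping 2595·log10(1 + f/700) is a common convention, the filter edges are spaced uniformly on the mel scale between 0 and half the sampling rate, the filters are left unnormalized, and a small floor guards the logarithm. The names `mel_filterbank` and `mfcc` are illustrative.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)  # common convention (assumed)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(num_filters, n_fft, fs):
    """Triangular filters H_m(k) for bins k = 0..n_fft/2, edges uniform in mel."""
    top = hz_to_mel(fs / 2.0)
    edges = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            f = k * fs / n_fft                      # frequency of bin k in Hz
            if lo <= f <= mid:
                row.append((f - lo) / (mid - lo))   # rising edge
            elif mid < f <= hi:
                row.append((hi - f) / (hi - mid))   # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc(frame, bank, order=13):
    """MFCC C(1)..C(order) of one frame via direct DFT, log Mel spectrum, DCT."""
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):
        xk = sum(frame[t] * cmath.exp(-2j * math.pi * t * k / n) for t in range(n))
        power.append(abs(xk) ** 2)                  # energy spectrum |X(k)|^2
    q = len(bank)
    log_mel = [math.log(max(sum(p * h for p, h in zip(power, row)), 1e-12))
               for row in bank]
    # The 0-based filter index m corresponds to (m + 1) - 0.5 = m + 0.5 in the DCT
    return [sum(log_mel[m] * math.cos(math.pi * c * (m + 0.5) / q) for m in range(q))
            for c in range(1, order + 1)]

# Usage with the embodiment's values L = 13, Q = 25, N = 256, fs = 8 kHz
bank = mel_filterbank(25, 256, 8000)
frame = [math.sin(0.3 * t) + 0.5 * math.sin(1.7 * t) for t in range(256)]
coeffs = mfcc(frame, bank)
print(len(coeffs))
```

A direct DFT is used only to keep the sketch dependency-free; a real system would use an FFT. Because the coefficients start at n = 1, a constant gain applied to the frame shifts every log Mel value equally and cancels in the cosine sums.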
4. Template training
(1) Basic principle
In voiceprint recognition, the vector quantization codebook is generally used as the speaker's speech template: the speech of each speaker in the system is quantized into a codebook and stored in the speech database as that speaker's speech template. During recognition, a feature vector sequence is extracted from any input speech, the total average quantization distortion of this sequence with respect to each speech template is computed, and the speaker whose template gives the minimum total average distortion is the recognition result.
(2) Distance measure
Let X be the K-dimensional feature vector of the unknown pattern, to be compared with a K-dimensional codeword vector Y in the codebook, and let $x_i$ and $y_i$ denote the components of X and Y in the same dimension. The Euclidean distance measure $d(X, Y)$ is:
$d(X, Y) = \sum_{i=1}^{K} (x_i - y_i)^2$ (18)
The traditional Euclidean distance measure gives every component of the feature vector equal weight. It yields good recognition results only when the natural distribution of the feature vectors is spherical or nearly spherical, that is, when the components of the feature vector are equally distributed. Speech signals, however, follow an elliptical normal distribution in which the components are not equally distributed; this is not well reflected in the Euclidean distance measure, so using it directly for the speaker decision degrades the recognition rate of the system.
The present invention uses 13th-order MFCC. To reflect the different contributions of the components to clustering, a weighted Euclidean distance measure is adopted, giving different weights to differently distributed vectors: vectors with a more dispersed distribution receive a small weight, and vectors with a concentrated distribution receive a large weight. The degree of dispersion is measured by the Euclidean distance from the vector to the cluster center (the vector mean). The weighting factor $w(k)$ is:
[Equation (19): image not recovered]
where K is the dimension of the feature vector. During training and recognition, the Euclidean distances obtained are sorted in descending order and then weighted with the weighting factor. In essence this amounts to using the unweighted Euclidean distance during training and recognition while scaling each dimension of the feature vector by a scale factor. In this way, vectors ranked very high, which are disruptive, such as isolated points or noise, receive a small weight, and well-behaved vectors ranked very low receive a larger weight, so that the contribution of each vector to recognition is well reflected.
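The idea of down-weighting dispersed components can be illustrated with a small Python sketch. The patent's own weighting formula is not available in this text, so the weights below are a stand-in chosen for illustration only: the inverse of each dimension's variance, normalized so that the weights sum to K. The names `dimension_weights` and `weighted_distance` are hypothetical.

```python
def dimension_weights(vectors):
    """Assumed weighting (NOT the patent's formula): inverse per-dimension
    variance, normalized so that the weights sum to the dimension K."""
    k = len(vectors[0])
    n = len(vectors)
    means = [sum(v[i] for v in vectors) / n for i in range(k)]
    variances = [sum((v[i] - means[i]) ** 2 for v in vectors) / n for i in range(k)]
    inverse = [1.0 / (var + 1e-12) for var in variances]  # small floor avoids division by zero
    scale = k / sum(inverse)
    return [w * scale for w in inverse]

def weighted_distance(x, y, weights):
    """Weighted squared Euclidean distance between two K-dimensional vectors."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y))

# Usage: the second dimension is far more spread out, so it gets a smaller weight
training = [[0.0, 0.0], [2.0, 0.0], [0.0, 10.0], [2.0, 10.0]]
weights = dimension_weights(training)
print(weights)
print(weighted_distance([0.0, 0.0], [1.0, 1.0], weights))
```

With all weights equal to 1, `weighted_distance` reduces to the unweighted squared Euclidean distance.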
(3) Template training
The present invention uses the LBG algorithm based on the splitting method, with the following steps:
1) Obtain all training vectors X in the input feature vector set S, and set the codeword $Y_1$ of the initial codebook by the splitting codebook method (a codebook is a set of vectors, that is, a set of codewords);
2) Using a small threshold $\varepsilon$ ([value not recovered]), split each codeword $Y_m$ in two according to the following rule:
$Y_m^{+} = Y_m (1 + \varepsilon), \quad Y_m^{-} = Y_m (1 - \varepsilon)$ (20)
After splitting, the codewords $Y_m^{+}$ and $Y_m^{-}$ of the new codebook are obtained;
3) According to the nearest-neighbor criterion, find the nearest codeword of the new codebook for each training vector, so that S is finally divided into M subsets; that is, when $X \in S_m$,
$d(X, Y_m) \le d(X, Y_i), \quad i = 1, 2, \ldots, M, \; i \ne m$ (21)
where M is the number of codewords in the current codebook;
4) Compute the centroid of the feature vectors in each subset and replace the codeword of that subset with the centroid, which gives a new codebook;
5) Iterate steps 3) and 4) to obtain the codewords of the new codebook;
6) Then repeat step 2), splitting each newly obtained codeword in two, and iterate steps 3) and 4) again. Continue in this way until the codebook reaches the required number of codewords $M = 2^r$ (r is an integer); r rounds of the above loop are needed in total. When clustering is complete, the centroids of the classes are the required codewords.
The initial codebook in the above LBG clustering algorithm is initialized by the splitting codebook method, as follows:
1. Take the mean of the feature vectors of all extracted frames as the codeword $Y_1$ of the initial codebook;
2. Split each $Y_m$ according to the following rule to form 2m codewords:
$Y_m^{+} = Y_m (1 + \varepsilon), \quad Y_m^{-} = Y_m (1 - \varepsilon)$ (22)
where m runs from 1 to the number of codewords in the current codebook, and $\varepsilon$ is the splitting parameter ([value not recovered]);
3. Cluster all feature vectors according to the new codewords, then compute the total distance measures D and D':
$D = \sum_{X \in S} \min_{1 \le m \le M} d(X, Y_m)$ (23)
where D' is the total distance measure of the next iteration and $d(X, Y)$ is the distance measure between a training feature vector X and a codeword Y of the trained codebook.
Compute the relative distance measure $\delta$:
$\delta = \left| \dfrac{D - D'}{D} \right|$ (24)
If $\delta$ does not exceed the preset threshold, stop iterating; the current codebook is the designed codebook. Otherwise, go to the next step;
4. Recompute the new centroid of each region;
5. Repeat steps 3 and 4 until the best codebook of 2m codewords is formed;
6. Repeat steps 2, 3, and 4 until a codebook of M codewords is formed;
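The split-and-refine training loop can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the splitting parameter `eps=0.01`, the stopping tolerance `tol`, the iteration cap `max_iter`, the use of the unweighted squared Euclidean distance, and the choice to keep the old codeword when a cell is empty are all assumptions, since the corresponding values and details are not available in this text.

```python
def sq_dist(x, y):
    """Unweighted squared Euclidean distance (the patent uses a weighted variant)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def centroid(vectors):
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def lbg(vectors, codebook_size, eps=0.01, tol=1e-3, max_iter=100):
    """Train a codebook with the splitting LBG method.

    codebook_size is expected to be a power of two, M = 2**r.
    """
    codebook = [centroid(vectors)]                   # initial codeword: global mean
    while len(codebook) < codebook_size:
        # Split every codeword Y into Y(1 + eps) and Y(1 - eps)
        codebook = [[c * (1 + sign * eps) for c in cw]
                    for cw in codebook for sign in (1, -1)]
        previous = float("inf")
        for _ in range(max_iter):
            # Nearest-neighbor partition of the training set
            cells = [[] for _ in codebook]
            total = 0.0
            for v in vectors:
                distances = [sq_dist(v, cw) for cw in codebook]
                j = distances.index(min(distances))
                cells[j].append(v)
                total += distances[j]
            # Centroid update; an empty cell keeps its old codeword (assumption)
            codebook = [centroid(cell) if cell else cw
                        for cell, cw in zip(cells, codebook)]
            # Stop when the relative change in total distortion is small
            if total == 0 or abs(previous - total) / total <= tol:
                break
            previous = total
    return codebook

# Usage: two well-separated clusters (illustrative data)
data = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2], [5.0, 5.0], [5.2, 4.8], [4.8, 5.2]]
codebook = lbg(data, 2)
print(sorted(codebook))
```

Note that multiplicative splitting cannot separate a codeword that is exactly the zero vector; an additive perturbation would be needed in that case.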
5. Voiceprint recognition
(1) Extract the feature vector sequence $X = \{x_1, x_2, \ldots, x_T\}$ of length T from the speech signal of the speaker to be recognized; the codebooks in the speech database formed in the training stage are $\{B^1, B^2, \ldots, B^N\}$ (N is the number of speakers).
(2) Compute the distance measure between each feature vector and the speech templates of the speakers in the database, i.e. obtain $d(x_j, B_m^i)$:
$d(x_j, B_m^i) = \sum_{k=1}^{K} w(k)\, [x_j(k) - B_m^i(k)]^2$ (25)
where $x_j$ is the feature vector of the j-th frame of X, $B_m^i$ is the m-th codeword of the i-th speaker (M codewords in total), and K is the dimension of the feature vector. The weighting factor $w(k)$ is:
[Equation (26): image not recovered]
(3) Compute the average distance measure $D_i$ from X to the i-th codebook:
$D_i = \dfrac{1}{T} \sum_{j=1}^{T} \min_{1 \le m \le M} d(x_j, B_m^i)$ (27)
(4) Compute $D_i$ for $i = 1, 2, \ldots, N$ to obtain all of $D_1, D_2, \ldots, D_N$;
(5) Find the smallest of $D_1, D_2, \ldots, D_N$; the corresponding speaker i is the one sought.
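The decision rule (average minimum distortion per codebook, then the smallest average wins) can be sketched in Python. The weights are passed in as a parameter because the patent's weighting formula is not available in this text; the function names, the dictionary layout of the codebooks, and the `candidates` parameter (mirroring the candidate count of the user interface) are illustrative assumptions.

```python
def weighted_distance(x, y, weights):
    """Weighted squared Euclidean distance between a frame and a codeword."""
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y))

def average_distortion(features, codebook, weights):
    """Mean over frames of the distance to the nearest codeword."""
    return sum(min(weighted_distance(x, cw, weights) for cw in codebook)
               for x in features) / len(features)

def identify(features, codebooks, weights, candidates=1):
    """Return the `candidates` best (speaker, distortion) pairs, best first.

    codebooks maps a speaker name to that speaker's list of codewords.
    """
    scores = [(name, average_distortion(features, cb, weights))
              for name, cb in codebooks.items()]
    scores.sort(key=lambda item: item[1])
    return scores[:candidates]

# Usage with two toy speaker templates (illustrative data)
codebooks = {
    "A": [[0.0, 0.0], [1.0, 1.0]],
    "B": [[5.0, 5.0], [6.0, 6.0]],
}
test_features = [[0.1, 0.0], [0.9, 1.1]]
ranking = identify(test_features, codebooks, weights=[1.0, 1.0], candidates=2)
print(ranking)
```

Because the system is closed-set, the best-ranked speaker is always returned, even when the test speech matches none of the templates well.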
Native system belongs to closed set identification, that is to say that all speakers to be identified belong to known speaker's set.The human-computer interaction interface of Speaker Identification as shown in Figure 4.In the human-computer interaction interface of Voiceprint Recognition System, " demonstration of sound card state " List View shows the available voice channel of current speech card number and channel status; " speech samples storehouse " List View shows speaker's number of samples and the speaker's name in the current speech Sample Storehouse." setting of Application on Voiceprint Recognition parameter " hurdle shows the parameter that voice collecting will arrange, and comprising: training duration (acquiescence 23s), length of testing speech (acquiescence 15s) and candidate's number (acquiescence 1).
Be specifically described below in conjunction with example: suppose to have deposited in advance in the speech samples storehouse 100 people's voice, when an XX puts through phone, the process how its sound is identified.
If 1 XX does not belong to known speech samples storehouse
(1) collection of voice signal: as the terminal device that gathers voice, gather voice by sound card with the phone of programme-controlled exchange comprehensive experiment box;
At first, " training duration " parameter (scope: 10-39s), then add speaker's name " XX " in the name edit box, click " adding the speaker " button of the training utterance that needs collection is set.After interpolation is completed, click " is determined ", then put through the phone (number: 8700) of programme-controlled exchange comprehensive experiment box, after connection, the state of sound card passage 2 (being defaulted as passage 2) is updated to " in recording ", and this moment, sound card just can gather voice.The voice that gather reach predetermined training duration, phone meeting auto-hang up;
(2) Voice signal preprocessing: the computer, together with the VC software, applies framing and windowing to the captured voice signal; each frame contains 256 samples, the frame shift is 128 samples, and the window function is a Hamming window. Endpoint detection combines short-time energy with short-time zero-crossing rate. Pre-emphasis uses a coefficient of 0.97;
(3) Feature extraction: the computer, together with the VC software, extracts 13th-order MFCC parameters;
(4) Template training: the codebook is initialized with the codebook splitting method, and the LBG clustering algorithm then builds one codebook for each speaker in the system, which is stored in the speech database as that speaker's voice template;
(5) Speaker identification
First, set the "test duration" parameter for the test speech to be collected (range: 5-20 s), dial the telephone of the program-controlled exchange comprehensive experiment box (number: 8700), and capture speech through the sound card (channel 2). When the captured speech reaches the preset test duration, the call hangs up automatically;
The software then disables the "Perform speaker identification" button and applies steps (2) and (3) to the speaker's speech, after which the features extracted from the test speech are compared with the voice templates in the library. Click the "Perform speaker identification" button and select the number of candidates to display (range: 1-3). If a speaker's template gives the feature vectors X of the speaker to be identified the minimum average distance measure, that speaker is taken as identified, and the identification result "XX" and the resolution are shown in the "Speaker identification" list view.
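The preprocessing in step (2) can be sketched in a few lines. This is a minimal illustration, assuming a mono signal held as a list of floats; the frame length, frame shift, and pre-emphasis coefficient are the values stated above, while the function names, the two endpoint-detection thresholds, and the way energy and zero-crossing rate are combined are assumptions, since the text does not give the decision rule.

```python
import math

FRAME_LEN = 256       # samples per frame (from the text)
FRAME_SHIFT = 128     # frame shift in samples (from the text)
PRE_EMPHASIS = 0.97   # pre-emphasis coefficient (from the text)


def pre_emphasize(signal, coeff=PRE_EMPHASIS):
    """First-order high-pass filter: y[n] = x[n] - coeff * x[n-1]."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - coeff * signal[n - 1]
                          for n in range(1, len(signal))]


def hamming(length):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (length - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def frame_and_window(signal, frame_len=FRAME_LEN, frame_shift=FRAME_SHIFT):
    """Split the signal into overlapping frames and window each one."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_shift):
        frames.append([signal[start + n] * window[n] for n in range(frame_len)])
    return frames


def short_time_energy(frame):
    return sum(x * x for x in frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def speech_frames(frames, energy_ratio=0.1, zcr_threshold=0.25):
    """Flag frames as speech; both thresholds are illustrative assumptions."""
    energies = [short_time_energy(f) for f in frames]
    peak = max(energies, default=0.0)
    flags = []
    for frame, energy in zip(frames, energies):
        loud = energy > energy_ratio * peak
        # Quieter frames with many zero crossings (e.g. unvoiced sounds) also count.
        busy = (energy > 0.1 * energy_ratio * peak
                and zero_crossing_rate(frame) > zcr_threshold)
        flags.append(loud or busy)
    return flags
```

A typical call would be `speech_frames(frame_and_window(pre_emphasize(samples)))`; applying pre-emphasis before framing is a common ordering and is assumed here rather than taken from the text.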
2. If XX is in the known speech sample library
If XX is already in the known speech sample library, speaker identification is carried out directly. First, set the "test duration" parameter for the test speech to be collected (range: 5-20 s), dial the telephone of the program-controlled exchange comprehensive experiment box (number: 8700), and capture speech through the sound card (channel 2). When the captured speech reaches the preset test duration, the call hangs up automatically;
The software then disables the "Perform speaker identification" button and applies steps (2) and (3) to the speaker's speech, after which the features extracted from the test speech are compared with the voice templates in the library. If a speaker's template gives the feature vectors X of the speaker to be identified the minimum average distance measure, that speaker is taken as identified, and the identification result "XX" and the resolution are shown in the "Speaker identification" list view.
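The matching step described above, picking the template with the minimum average distance measure and listing up to three candidates, might look like the sketch below. It assumes each template is a codebook stored as a list of codeword vectors, and it uses the plain squared Euclidean distance as a stand-in; the function names and the `templates` dictionary layout are hypothetical.

```python
def squared_distance(x, y):
    """Plain squared Euclidean distance (a stand-in for the system's measure)."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def average_distortion(features, codebook):
    """Mean distance from each feature vector to its nearest codeword."""
    nearest = (min(squared_distance(x, c) for c in codebook) for x in features)
    return sum(nearest) / len(features)


def identify(features, templates, n_candidates=1):
    """Rank enrolled speakers by average distortion, smallest first."""
    ranked = sorted((average_distortion(features, codebook), name)
                    for name, codebook in templates.items())
    return [(name, score) for score, name in ranked[:n_candidates]]
```

Because the system is closed-set, the top-ranked name is always returned as the answer; no rejection threshold is applied in this sketch.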

Claims (2)

1. A voiceprint recognition method based on vector quantization, characterized in that the specific steps are as follows:
(1) Voice signal acquisition: the telephone of the program-controlled exchange comprehensive experiment box serves as the terminal device for collecting speech, and the voice signal is captured through the sound card;
(2) Voice signal preprocessing: the computer applies framing and windowing to the captured voice signal; each frame contains 256 samples, the frame shift is 128 samples, and the window function is a Hamming window; endpoint detection combines short-time energy with short-time zero-crossing rate; pre-emphasis uses a coefficient of 0.90 to 1.00;
(3) Feature extraction: MFCC parameters are used, with an MFCC order of 12 to 16;
(4) Template training: the LBG clustering algorithm builds one codebook for each speaker in the system, which is stored in the speech database as that speaker's voice template; the specific steps of the LBG clustering algorithm are as follows:
(4.1) Obtain all training vectors X in the input feature vector set S, and obtain the codeword of the initial codebook by the codebook splitting method;
(4.2) Using a small threshold [formula image 002], [formula image 003], split the codeword into two according to the following rule:
[formula image 004] (5)
After splitting, the codewords of the new codebook are [formula image 005], [formula image 006];
(4.3) According to the nearest-neighbor criterion, find the nearest codeword in the new codebook for each training vector, thereby partitioning S into m subsets; that is, when [formula image 007],
[formula image 008] [formula image 009] (6)
where M is the number of codewords in the current initial codebook;
(4.4) Compute the centroid of the feature vectors in each subset and replace that subset's codeword with the centroid, which gives a new codebook;
(4.5) Iterate steps (4.3) and (4.4) to obtain the codewords of the new codebook, [formula image 011];
(4.6) Repeat step (4.2) to split each newly obtained codeword into two, then iterate steps (4.3) and (4.4) again; continue in this way until the codebook has the required number of codewords [formula image 012], where r is an integer. In total, r rounds of this loop are needed; when clustering is complete, the centroid of each class is the required codeword.
(5) Voiceprint identification: the feature parameters of the collected speech to be identified are compared with the speaker voice templates built in the library by steps (1) to (4), and the decision is made using a weighted Euclidean distance measure; if a speaker's template gives the feature vectors X of the speaker to be identified the minimum average distance measure, that speaker is taken as identified.
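Steps (4.1) to (4.6) can be sketched as follows. This is a hedged illustration of the standard LBG procedure rather than the claimed implementation: the multiplicative split by (1 ± eps), the values of `eps` and `tol`, the relative-improvement stopping test, and all function names are assumptions, because the corresponding formula images are not available. Each round doubles the codebook, so r rounds starting from one codeword yield 2**r codewords. The optional `weights` argument shows where a weighted Euclidean measure would enter; the source does not state the weights.

```python
def weighted_sq_distance(x, y, weights=None):
    """Weighted squared Euclidean distance; unit weights if none are given."""
    if weights is None:
        weights = [1.0] * len(x)
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def lbg(vectors, r, eps=0.01, tol=1e-3, max_iter=100):
    """Train a codebook of 2**r codewords by repeated splitting and refinement."""
    codebook = [centroid(vectors)]                  # (4.1) single initial codeword
    for _ in range(r):                              # (4.6) r rounds of doubling
        # (4.2) split every codeword into a slightly larger and a slightly smaller copy
        codebook = ([[c * (1 + eps) for c in cw] for cw in codebook] +
                    [[c * (1 - eps) for c in cw] for cw in codebook])
        previous = float("inf")
        for _ in range(max_iter):                   # (4.5) iterate (4.3) and (4.4)
            # (4.3) assign each training vector to its nearest codeword
            cells = [[] for _ in codebook]
            total = 0.0
            for x in vectors:
                dists = [weighted_sq_distance(x, cw) for cw in codebook]
                k = dists.index(min(dists))
                cells[k].append(x)
                total += dists[k]
            # (4.4) move each codeword to the centroid of its cell;
            # a codeword with an empty cell is left where it is
            codebook = [centroid(cell) if cell else cw
                        for cell, cw in zip(cells, codebook)]
            if total == 0 or (previous - total) / total < tol:
                break
            previous = total
    return codebook
```

One limitation of the multiplicative split is that a codeword at the origin splits into two identical copies; an additive perturbation is a common alternative in that case.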
2. The voiceprint recognition method based on vector quantization according to claim 1, characterized in that the initial codebook in the LBG clustering algorithm is initialized by the codebook splitting method, as follows:
(1) Take the mean of the feature vectors of all extracted frames as the codeword of the initial codebook, [formula image 001];
(2) Split [formula image 001] according to the following rule to form 2m codewords:
[formula image] (7)
where m runs from 1 to the number of codewords in the current codebook, and [formula image 002] is the splitting parameter, taken as [formula image 003];
(3) Cluster all feature vectors according to the new codewords, then compute the total distance measures D and [formula image 014]:
[formula image 015] (8)
where [formula image 014] is the total distance measure of the next iteration, and [formula image 016] is the distance measure between a training feature vector X and the trained codebook;
Compute the relative distance measure:
[formula image 018] (9)
If [formula image 019] ([formula image 020]), stop iterating; the current codebook is the designed codebook. Otherwise, go to the next step;
(4) Recompute the new centroid of each region;
(5) Repeat steps (3) and (4) until the best codebook of 2m codewords is formed;
(6) Repeat steps (2), (3), and (4) until a codebook of M codewords is formed.
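For orientation, the splitting rule, total distance measure, and relative-distance stopping test of LBG codebook design are commonly written as below. This is a hedged reconstruction in textbook notation, not the patent's own formulas (7) to (9), whose images are unavailable here: the symbols $Y_m$ (a codeword), $\varepsilon$ (the splitting parameter), $\delta$ (the stopping threshold), and the exact form of each expression are assumptions.

```latex
% Splitting rule, cf. (7): each codeword Y_m yields two perturbed copies
Y_m^{+} = Y_m (1 + \varepsilon), \qquad Y_m^{-} = Y_m (1 - \varepsilon)

% Total distance measure over the training set S, cf. (8)
D' = \sum_{X \in S} \min_{1 \le m \le M} d(X, Y_m)

% Relative distance measure and stopping test, cf. (9)
\frac{\lvert D - D' \rvert}{D'} \le \delta
```

Here $D$ is the total distance measure of the current iteration and $D'$ that of the next, matching the roles the claim assigns to the two quantities.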
CN2011104503646A 2011-12-29 2011-12-29 Method and system for voiceprint recognition based on vector quantization based Expired - Fee Related CN102509547B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011104503646A CN102509547B (en) 2011-12-29 2011-12-29 Method and system for voiceprint recognition based on vector quantization based

Publications (2)

Publication Number Publication Date
CN102509547A CN102509547A (en) 2012-06-20
CN102509547B true CN102509547B (en) 2013-06-19

Family

ID=46221622

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011104503646A Expired - Fee Related CN102509547B (en) 2011-12-29 2011-12-29 Method and system for voiceprint recognition based on vector quantization based

Country Status (1)

Country Link
CN (1) CN102509547B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109102810A (en) * 2017-06-21 2018-12-28 北京搜狗科技发展有限公司 Method for recognizing sound-groove and device

Families Citing this family (42)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103794207A (en) * 2012-10-29 2014-05-14 西安远声电子科技有限公司 Dual-mode voice identity recognition method
CN103714826B (en) * 2013-12-18 2016-08-17 讯飞智元信息科技有限公司 Formant automatic matching method towards vocal print identification
CN103794219B (en) * 2014-01-24 2016-10-05 华南理工大学 A kind of Codebook of Vector Quantization based on the division of M code word generates method
CN104485102A (en) * 2014-12-23 2015-04-01 智慧眼(湖南)科技发展有限公司 Voiceprint recognition method and device
CN105989842B (en) * 2015-01-30 2019-10-25 福建星网视易信息系统有限公司 The method, apparatus for comparing vocal print similarity and its application in digital entertainment VOD system
CN106340298A (en) * 2015-07-06 2017-01-18 南京理工大学 Voiceprint unlocking method integrating content recognition and speaker recognition
CN104994400A (en) * 2015-07-06 2015-10-21 无锡天脉聚源传媒科技有限公司 Method and device for indexing video by means of acquisition of host name
CN105304087B (en) * 2015-09-15 2017-03-22 北京理工大学 Voiceprint recognition method based on zero-crossing separating points
US10262654B2 (en) * 2015-09-24 2019-04-16 Microsoft Technology Licensing, Llc Detecting actionable items in a conversation among participants
CN105355206B (en) * 2015-09-24 2020-03-17 车音智能科技有限公司 Voiceprint feature extraction method and electronic equipment
CN105355195A (en) * 2015-09-25 2016-02-24 小米科技有限责任公司 Audio frequency recognition method and audio frequency recognition device
CN106920558B (en) * 2015-12-25 2021-04-13 展讯通信(上海)有限公司 Keyword recognition method and device
CN106971735B (en) * 2016-01-14 2019-12-03 芋头科技(杭州)有限公司 A kind of method and system regularly updating the Application on Voiceprint Recognition of training sentence in caching
CN106971711A (en) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 A kind of adaptive method for recognizing sound-groove and system
CN106971729A (en) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 A kind of method and system that Application on Voiceprint Recognition speed is improved based on sound characteristic scope
CN106981287A (en) * 2016-01-14 2017-07-25 芋头科技(杭州)有限公司 A kind of method and system for improving Application on Voiceprint Recognition speed
CN106971712A (en) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 A kind of adaptive rapid voiceprint recognition methods and system
CN106971726A (en) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 A kind of adaptive method for recognizing sound-groove and system based on code book
CN105931637A (en) * 2016-04-01 2016-09-07 金陵科技学院 User-defined instruction recognition speech photographing system
CN106057212B (en) * 2016-05-19 2019-04-30 华东交通大学 Driving fatigue detection method based on voice personal characteristics and model adaptation
CN106448682A (en) * 2016-09-13 2017-02-22 Tcl集团股份有限公司 Open-set speaker recognition method and apparatus
CN107945807B (en) * 2016-10-12 2021-04-13 厦门雅迅网络股份有限公司 Voice recognition method and system based on silence run
CN108269573A (en) * 2017-01-03 2018-07-10 蓝盾信息安全技术有限公司 Speaker Recognition System based on vector quantization and gauss hybrid models
CN106847292B (en) * 2017-02-16 2018-06-19 平安科技(深圳)有限公司 Method for recognizing sound-groove and device
CN107039036B (en) * 2017-02-17 2020-06-16 南京邮电大学 High-quality speaker recognition method based on automatic coding depth confidence network
CN107068154A (en) * 2017-03-13 2017-08-18 平安科技(深圳)有限公司 The method and system of authentication based on Application on Voiceprint Recognition
CN107799114A (en) * 2017-04-26 2018-03-13 珠海智牧互联科技有限公司 A kind of pig cough sound recognition methods and system
CN107993663A (en) * 2017-09-11 2018-05-04 北京航空航天大学 A kind of method for recognizing sound-groove based on Android
CN108022584A (en) * 2017-11-29 2018-05-11 芜湖星途机器人科技有限公司 Office Voice identifies optimization method
CN107993661A (en) * 2017-12-07 2018-05-04 浙江海洋大学 The method and system that a kind of anti-spoken language impersonates
CN108417226A (en) * 2018-01-09 2018-08-17 平安科技(深圳)有限公司 Speech comparison method, terminal and computer readable storage medium
CN108460081B (en) * 2018-01-12 2019-07-12 平安科技(深圳)有限公司 Voice data base establishing method, voiceprint registration method, apparatus, equipment and medium
CN110047491A (en) * 2018-01-16 2019-07-23 中国科学院声学研究所 A kind of relevant method for distinguishing speek person of random digit password and device
CN108922541B (en) * 2018-05-25 2023-06-02 南京邮电大学 Multi-dimensional characteristic parameter voiceprint recognition method based on DTW and GMM models
CN109147798B (en) * 2018-07-27 2023-06-09 北京三快在线科技有限公司 Speech recognition method, device, electronic equipment and readable storage medium
CN109146002B (en) * 2018-09-30 2021-06-01 佛山科学技术学院 Quick identification method of GMM (Gaussian mixture model) identifier
CN109841229A (en) * 2019-02-24 2019-06-04 复旦大学 A kind of Neonate Cry recognition methods based on dynamic time warping
CN110889009B (en) * 2019-10-18 2023-07-21 平安科技(深圳)有限公司 Voiceprint clustering method, voiceprint clustering device, voiceprint processing equipment and computer storage medium
CN111128198B (en) * 2019-12-25 2022-10-28 厦门快商通科技股份有限公司 Voiceprint recognition method, voiceprint recognition device, storage medium, server and voiceprint recognition system
CN111341327A (en) * 2020-02-28 2020-06-26 广州国音智能科技有限公司 Speaker voice recognition method, device and equipment based on particle swarm optimization
CN111583938B (en) * 2020-05-19 2023-02-03 威盛电子股份有限公司 Electronic device and voice recognition method
CN113611284A (en) * 2021-08-06 2021-11-05 工银科技有限公司 Voice library construction method, recognition method, construction system and recognition system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011004098A1 (en) * 2009-07-07 2011-01-13 France Telecom Allocation of bits in an enhancement coding/decoding for improving a hierarchical coding/decoding of digital audio signals
CN102231277A (en) * 2011-06-29 2011-11-02 电子科技大学 Method for protecting mobile terminal privacy based on voiceprint recognition

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张彩娟,霍春宝,吴峰,韦春丽.《改进K-means算法在声纹识别中的应用》.《辽宁工业大学学报》.2011,第31卷(第5期),第1-4节. *

Also Published As

Publication number Publication date
CN102509547A (en) 2012-06-20

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130619

Termination date: 20131229