CN101315771A - Compensation method for different speech coding influence in speaker recognition - Google Patents
- Publication number
- CN101315771A (application numbers CN200810064669A, CNA2008100646691A)
- Authority
- CN
- China
- Prior art keywords
- map
- speaker
- sequence
- lambda
- deviation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention relates to a method for compensating the influence of different speech codings in speaker recognition, and in particular to a method for compensating the speech-coding mismatch in speaker recognition over the Internet, so as to solve the degradation of speaker recognition performance caused by a coding mismatch between the training speech and the test speech. The method performs feature processing on the speakers' speech signals under a standard coding mode and takes the speaker models under the standard coding mode, obtained by expectation-maximization training, as the matching-object library. The speech signal of the speaker to be recognized is input and subjected to feature extraction to obtain a feature vector sequence; the first T frames of the feature sequence are selected, and a MAP algorithm is applied to them to adaptively obtain the deviation between the current coding and the standard coding. The original feature sequence is adjusted and compensated with the obtained deviation to produce a new feature vector sequence, which is then matched against each speaker model under the standard coding mode; a decision is made to obtain the recognition result.
Description
Technical field
The present invention relates to a compensation method in the field of speaker recognition technology, and specifically to a method for compensating speech-coding mismatch in speaker recognition over the Internet.
Background technology
Speaker recognition refers to automatically determining, by analyzing and processing a speaker's speech signal, whether the speaker belongs to a recorded set of speakers, and further determining who the speaker is. Although speaker recognition systems have achieved good results on clean laboratory speech, in real-world applications their performance is limited by many factors and the recognition results are often unsatisfactory. One of the main causes of performance degradation is the mismatch between the codings of the training and test speech signals. With the development of modern network technology, more and more speech signals are transmitted over the Internet, and network transmission mostly uses medium- or low-rate speech or audio codings with relatively high compression ratios. Although low-rate compressed coding of speech (audio) facilitates channel transmission and saves storage space, most speech (audio) codings are lossy, so speech quality inevitably suffers; moreover, different coding schemes use different coding mechanisms, especially when streaming-media codings are adopted. Speech signals produced by different codings therefore mismatch in their characteristic parameters. In networked speaker recognition, the available training data are often signals recorded under one speech (audio) coding, while in actual use the test speech is coded differently; the recognition system then faces a mismatch between training and test speech caused by the different codings, which degrades the performance of
speaker recognition. For this reason, a compensation method that effectively overcomes the influence of different speech codings needs to be researched.
Summary of the invention
To solve the problem that speaker recognition performance degrades when the codings of the training speech and the test speech do not match, the present invention provides a method for compensating the influence of different speech codings in speaker recognition. The invention is realized by the following steps:
Step 1: adopt a certain coding mode as the standard coding mode, perform feature processing on the speech signals of N speakers under the standard coding mode, and train with the expectation-maximization algorithm to obtain the N speaker models {λ_n} (n = 1, ..., N) under the standard coding mode as the matching-object library, where N is a natural number.
Step 2: input the speech signal s(n) of the speaker to be recognized and perform feature extraction on it to obtain the feature vector sequence X = {x_1, x_2, ..., x_S}, where S is a natural number.
Step 3: select the first T frames of X to obtain the sequence X_T = {x_1, x_2, ..., x_T}, and run the MAP algorithm on X_T to adaptively obtain the deviation h_MAP between the current coding and the standard coding, where T is a natural number.
Step 4: use the obtained deviation h_MAP between the current coding and the standard coding to adjust and compensate the feature vector sequence, obtaining the new feature vector sequence X̂ = {x_1 − h_MAP, x_2 − h_MAP, ..., x_S − h_MAP}.
Step 5: match the new feature vector sequence X̂ against each of the N speaker models {λ_n} (n = 1, ..., N) under the standard coding mode and make a decision to obtain the recognition result.
Beneficial effects: the present invention adjusts the features extracted under the coding used at recognition time so that they approach the speech features of the matching-object library, and estimates the coding deviation with a Gaussian distribution, thereby reducing the distortion of the speaker's speech features caused by coding and the drop in recognition rate caused by speech-coding mismatch; the average recognition rate of the system under coding mismatch is improved by 7.1%.
Description of drawings
Fig. 1 shows the variation of the system recognition rate as the adjustment factor α takes values from 0 to 0.9. Fig. 2 shows the variation of the system recognition rate when coding compensation is performed with the baseline system and with the maximum a posteriori probability algorithm, respectively; one curve ("→" in the legend) shows the system recognition rate obtained with the MAP algorithm, and the other shows the system recognition rate obtained with the baseline system.
Embodiment
Embodiment 1: referring to Fig. 1 and Fig. 2, this embodiment consists of the following steps:
Step 1: adopt a certain coding mode as the standard coding mode, perform feature processing on the speech signals of N speakers under the standard coding mode, and train with the expectation-maximization algorithm to obtain the N speaker models {λ_n} (n = 1, ..., N) under the standard coding mode as the matching-object library, where N is a natural number.
Step 2: input the speech signal s(n) of the speaker to be recognized and perform feature extraction on it to obtain the feature vector sequence X = {x_1, x_2, ..., x_S}, where S is a natural number.
Step 3: select the first T frames of X to obtain the sequence X_T = {x_1, x_2, ..., x_T}, and run the MAP algorithm on X_T to adaptively obtain the deviation h_MAP between the current coding and the standard coding, where T is a natural number.
Step 4: use the obtained deviation h_MAP between the current coding and the standard coding to adjust and compensate the feature vector sequence, obtaining the new feature vector sequence X̂ = {x_1 − h_MAP, x_2 − h_MAP, ..., x_S − h_MAP}.
Step 5: match the new feature vector sequence X̂ against each of the N speaker models {λ_n} (n = 1, ..., N) under the standard coding mode and make a decision to obtain the recognition result.
The feature extraction in step 2 of this embodiment proceeds as follows: the speaker's signal s(n) is sampled, quantized, and pre-emphasized. Assuming the speaker's signal is short-time stationary, it is divided into frames; framing is carried out by weighting the signal with a movable finite-length window. Linear predictive coding (LPC) coefficients are then computed for the weighted speech signal s_w(n), and the feature vector sequence X = {x_1, x_2, ..., x_S} is obtained from the relation between the LPC coefficients and the linear prediction cepstrum coefficients (LPCC), which is as follows:

c_LP(n) = a_n + Σ_{k=1}^{n−1} (k/n) · c_LP(k) · a_{n−k}, 1 ≤ n ≤ p

where c_LP(n) is the n-th component of the LPCC, a_n is the n-th component of the LPC, p is the LPC order, and n is a natural number.
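The LPC-to-LPCC relation described above can be sketched as follows. The function name `lpc_to_lpcc` is illustrative, and the sign convention of the coefficients `a` is an assumption, since the patent's exact formula is given only as an image:

```python
import numpy as np

def lpc_to_lpcc(a, n_ceps=None):
    """Convert LPC coefficients a_1..a_p to LPC cepstral coefficients.

    Uses the standard recursion (one common sign convention; the patent's
    image-only formula may differ in sign):
        c[n] = a[n] + sum_{k=1}^{n-1} (k/n) c[k] a[n-k],  1 <= n <= p
        c[n] = sum_{k=n-p}^{n-1}     (k/n) c[k] a[n-k],   n > p
    `a` is a length-p array holding a_1..a_p.
    """
    p = len(a)
    if n_ceps is None:
        n_ceps = p
    c = np.zeros(n_ceps + 1)          # c[0] unused; c[n] holds c_LP(n)
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):   # keep n-k within 1..p
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]
```

For example, a single LPC coefficient a_1 = 0.5 gives c_LP(1) = 0.5 and c_LP(2) = (1/2)·c_LP(1)·a_1 = 0.125.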
According to the MAP algorithm described in step 3, the MAP estimate of the deviation is:

h_MAP = argmax_h p(h | X, λ) (1)

where λ is the reference speaker model and X represents the selected first-T-frame sequence X_T.
By Bayes' formula and the monotonicity of the logarithmic function, formula (1) is equivalent to:

h_MAP = argmax_h [ln p(X | h, λ) + ln p(h)] (2)

where p(h) is the prior of the coding deviation h.
To limit the proportion taken by the prior of the coding deviation h when the amount of adaptation data varies, an adjustment factor α is added to formula (2), giving:

h_MAP = argmax_h [α ln p(X | h, λ) + (1 − α) ln p(h)] (3)
where p(X | h, λ) has the Gaussian mixture form, that is:

p(X | h, λ) = Π_{t=1}^{T} Σ_{i=1}^{M} c_i N(x_t − h; μ_i, Σ_i) (4)

where M is 64, i denotes the i-th mixture component, and c_i is the weight of the i-th mixture component.
Formula (3) is solved by estimating the current coding deviation on the T-frame adaptation data set with the expectation-maximization (EM) algorithm. After a series of formula transformations over the hidden state sequence Q of the Gaussian mixture model, an auxiliary function is obtained, in which h̄ is the result of the previous iteration and h is the current iteration result; x_t is the speech feature of frame t; p(x_t, i | h̄, λ) denotes the probability of frame t, after adjustment by the deviation h̄, on the i-th mixture component of model λ; p(x_t | h̄, λ) denotes the corresponding probability over all mixture components of model λ; p(x_t, i | h, λ) denotes the probability of frame t, after adjustment by the deviation h, on the i-th mixture component of model λ; and p(h) is the prior of the coding deviation h.
Assume that the covariance matrix Σ_h of the coding deviation h is diagonal. The maximization can then be carried out dimension by dimension, giving a per-dimension update formula in which h_j is the j-th dimension of the current iterate h, j = 1, 2, ..., L, and L is the dimension of the feature vector; x_tj is the j-th dimension of the t-th frame feature vector of the test speech; μ_ij and σ²_ij are the j-th mean and j-th variance of the i-th mixture component of the speaker model under the standard coding; and μ_hj and σ²_hj are the j-th dimensions of the mean μ_h and of the covariance matrix Σ_h of the coding deviation h.
In the above estimation formula for the coding deviation, the prior parameters μ_hj and σ²_hj are unknown; therefore, before performing the MAP estimation, the prior of the coding deviation h must first be obtained.
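Under the diagonal-covariance assumption above, one plausible reconstruction of the iterative MAP update is sketched below. The patent's formulas are given only as images, so the exact placement of the factor α is an assumption: here α weights the prior term, so α = 0 gives a pure maximum-likelihood estimate, whereas the patent obtains the ML estimate by setting α = 1 in its own formula.

```python
import numpy as np

def map_estimate_deviation(X_T, weights, means, variances, mu_h, var_h,
                           alpha=0.5, n_iter=10):
    """Sketch of the MAP estimate h_MAP of the coding deviation.

    X_T: (T, L) features of the first T frames of the test speech.
    weights/means/variances: diagonal-covariance reference GMM (model lambda).
    mu_h, var_h: prior mean and diagonal variance of the coding deviation.
    alpha: factor weighting the prior term (an assumption; see lead-in).
    """
    T, L = X_T.shape
    h = np.zeros(L)                       # h_0 could also come from formula (10)
    for _ in range(n_iter):
        Z = X_T - h                       # adjust frames by the current deviation
        # responsibilities p(i | x_t - h, lambda) per frame and component
        log_p = (np.log(weights)
                 - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)
                 - 0.5 * np.sum((Z[:, None, :] - means) ** 2 / variances, axis=2))
        log_p -= log_p.max(axis=1, keepdims=True)
        gamma = np.exp(log_p)
        gamma /= gamma.sum(axis=1, keepdims=True)        # shape (T, M)
        # per-dimension update: data term plus alpha-weighted prior term
        num = np.einsum('tm,tml->l', gamma,
                        (X_T[:, None, :] - means) / variances) + alpha * mu_h / var_h
        den = np.einsum('tm,ml->l', gamma, 1.0 / variances) + alpha / var_h
        h = num / den
    return h
```

With a single-component model of zero mean and unit variance and α = 0, the update reduces to the sample mean of the adaptation frames, which matches the intuition that the deviation is the average feature shift introduced by the coding.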
To obtain the prior of the coding deviation h, the factor α in formula (6) is set to 1, whereupon the maximum a posteriori estimate becomes the maximum likelihood estimate, with the corresponding iterative formula given as formula (7). If there are H classes of coding, the estimates of the H coding deviations can be obtained by formula (7) and denoted {h_M1, h_M2, ..., h_MH}; finally, formulas (8) and (9) are used to estimate the values of μ_h and Σ_h.
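Given ML deviation estimates for the H coding classes, formulas (8) and (9) presumably reduce to a sample mean and a (diagonal) sample covariance; the sketch below assumes exactly that, and the function name is illustrative:

```python
import numpy as np

def deviation_prior(h_classes):
    """Estimate the prior (mu_h, diagonal of Sigma_h) of the coding deviation
    from the ML deviation estimates {h_M1, ..., h_MH} of H coding classes.
    Formulas (8) and (9) are images in the source; the sample mean and
    sample variance used here are an assumption.
    """
    H = np.asarray(h_classes)          # shape (H, L)
    mu_h = H.mean(axis=0)
    var_h = H.var(axis=0)              # diagonal entries of Sigma_h
    return mu_h, var_h
```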
Formula (7) requires an initial value for the coding deviation h. Here the initial value h_0 is taken as the weighted accumulation of the differences between the mean of the speech under the current non-standard coding and the component means of the reference speaker model under the standard coding, as shown in the following formula, where c_i is the weight of the i-th mixture component of the reference speaker model GMM.
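A minimal sketch of the h_0 initialization described above, assuming the image-only formula is the difference between the adaptation-data mean and the c_i-weighted accumulation of the reference GMM component means:

```python
import numpy as np

def initial_deviation(X_T, weights, means):
    """Initial deviation h_0: mean of the current (non-standard-coded)
    adaptation frames minus the weight-accumulated component means of the
    reference speaker GMM. The exact form is an assumption (see lead-in)."""
    model_mean = weights @ means       # sum_i c_i * mu_i, shape (L,)
    return X_T.mean(axis=0) - model_mean
```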
With the estimate of the deviation h, the original feature space under the current coding can be compensated and mapped to the feature space of the standard coding. The concrete compensation strategy is:

X̂ = X − h_MAP (11)
The matching and decision process in step 5 is as follows. For the feature vector sequence X̂ (the compensated sequence of formula (11)), the posterior probability of the n-th speaker is:

p(λ_n | X̂) = p(X̂ | λ_n) p(λ_n) / p(X̂) (12)

where p(λ_n) is the prior probability of the n-th speaker; p(X̂) is the probability density of the feature vector sequence X̂ under the condition of the N speakers in the matching-object library; and p(X̂ | λ_n) is the class-conditional probability that the n-th speaker produces X̂. The recognition result follows the maximum a posteriori probability criterion:

n* = argmax_{1≤n≤N} p(λ_n | X̂) (13)

where n* denotes the recognition decision. Assuming that the prior probabilities of all speakers are equal, and noting that p(X̂) in formula (12) is the same for every speaker, formula (13) can be written as

n* = argmax_{1≤n≤N} p(X̂ | λ_n)

so the maximum a posteriori criterion reduces to a maximum likelihood criterion. To simplify the computation, a log-likelihood function is generally adopted, and the decision is:

n* = argmax_{1≤n≤N} ln p(X̂ | λ_n) (16)

Formula (16) is the closed-set test decision rule. Only the closed-set test is discussed here, in order to avoid the influence of the open-set test threshold on the recognition rate, to highlight the influence of coding mismatch, and to reduce the complexity of the problem.
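The closed-set decision rule of formula (16) can be sketched with diagonal-covariance GMM speaker models. The helper names and the single-Gaussian toy models in the test are illustrative, not from the patent:

```python
import numpy as np

def gmm_loglik(X, weights, means, variances):
    """Frame-summed log-likelihood of X under a diagonal-covariance GMM."""
    log_p = (np.log(weights)
             - 0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)
             - 0.5 * np.sum((X[:, None, :] - means) ** 2 / variances, axis=2))
    m = log_p.max(axis=1, keepdims=True)          # stable log-sum-exp per frame
    return float(np.sum(m[:, 0] + np.log(np.exp(log_p - m).sum(axis=1))))

def closed_set_decide(X_hat, models):
    """Closed-set decision: pick the speaker model with the highest
    log-likelihood for the compensated sequence X_hat.
    `models` is a list of (weights, means, variances) triples, one per speaker."""
    scores = [gmm_loglik(X_hat, *m) for m in models]
    return int(np.argmax(scores))
```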
Claims (2)
1. A method for compensating the influence of different speech codings in speaker recognition, characterized in that it is realized by the following steps:
Step 1: adopt a certain coding mode as the standard coding mode, perform feature processing on the speech signals of N speakers under the standard coding mode, and train with the expectation-maximization algorithm to obtain the N speaker models {λ_n} (n = 1, ..., N) under the standard coding mode as the matching-object library, where N is a natural number;
Step 2: input the speech signal s(n) of the speaker to be recognized and perform feature extraction on the input speech signal to obtain the feature vector sequence X = {x_1, x_2, ..., x_S}, where S is a natural number;
Step 3: select the first T frames of the feature sequence X to obtain the sequence X_T = {x_1, x_2, ..., x_T}, and run the MAP algorithm on X_T to adaptively obtain the deviation h_MAP between the current coding and the standard coding, where T is a natural number;
Step 4: use the obtained deviation h_MAP between the current coding and the standard coding to adjust and compensate the feature sequence, obtaining the new feature vector sequence X̂ = {x_1 − h_MAP, x_2 − h_MAP, ..., x_S − h_MAP};
Step 5: match the new feature vector sequence X̂ against each of the N speaker models {λ_n} (n = 1, ..., N) under the standard coding mode and make a decision to obtain the recognition result.
2. The method for compensating the influence of different speech codings in speaker recognition according to claim 1, characterized in that, in the MAP algorithm described in step 3, the MAP estimate h_MAP is:

h_MAP = argmax_h p(h | X, λ) (1)

where λ is the reference speaker model and X represents the selected first-T-frame sequence X_T;
by Bayes' formula and the monotonicity of the logarithmic function, formula (1) is equivalent to:

h_MAP = argmax_h [ln p(X | h, λ) + ln p(h)] (2)

where p(h) is the prior of the coding deviation h;
to limit the proportion taken by the prior of the coding deviation h when the amount of adaptation data varies, an adjustment factor α is added to formula (2), giving:

h_MAP = argmax_h [α ln p(X | h, λ) + (1 − α) ln p(h)] (3)

where p(X | h, λ) has the Gaussian mixture form, that is:

p(X | h, λ) = Π_{t=1}^{T} Σ_{i=1}^{M} c_i N(x_t − h; μ_i, Σ_i) (4)

where i denotes the i-th mixture component and c_i denotes the weight of the i-th mixture component.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNA2008100646691A CN101315771A (en) | 2008-06-04 | 2008-06-04 | Compensation method for different speech coding influence in speaker recognition |
Publications (1)
Publication Number | Publication Date |
---|---|
CN101315771A true CN101315771A (en) | 2008-12-03 |
Family
ID=40106754
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNA2008100646691A Pending CN101315771A (en) | 2008-06-04 | 2008-06-04 | Compensation method for different speech coding influence in speaker recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101315771A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102024455B (en) * | 2009-09-10 | 2014-09-17 | 索尼株式会社 | Speaker recognition system and method |
CN108899032A (en) * | 2018-06-06 | 2018-11-27 | 平安科技(深圳)有限公司 | Method for recognizing sound-groove, device, computer equipment and storage medium |
CN109036386A (en) * | 2018-09-14 | 2018-12-18 | 北京网众共创科技有限公司 | A kind of method of speech processing and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110211575A (en) | Voice for data enhancing adds method for de-noising and system | |
US11482241B2 (en) | Characterizing, selecting and adapting audio and acoustic training data for automatic speech recognition systems | |
JPH0850499A (en) | Signal identification method | |
CN111161744B (en) | Speaker clustering method for simultaneously optimizing deep characterization learning and speaker identification estimation | |
CN110491400B (en) | Speech signal reconstruction method based on depth self-encoder | |
CN101315771A (en) | Compensation method for different speech coding influence in speaker recognition | |
Ion et al. | A novel uncertainty decoding rule with applications to transmission error robust speech recognition | |
Gomez et al. | One-pulse FEC coding for robust CELP-coded speech transmission over erasure channels | |
Martin et al. | Estimation of missing LSF parameters using Gaussian mixture models | |
Lee et al. | KLT-based adaptive entropy-constrained quantization with universal arithmetic coding | |
CN104183239B (en) | Method for identifying speaker unrelated to text based on weighted Bayes mixture model | |
Wan et al. | Histogram-based quantization for robust and/or distributed speech recognition | |
Suh et al. | Probabilistic class histogram equalization based on posterior mean estimation for robust speech recognition | |
Suman et al. | Speech enhancement and recognition of compressed speech signal in noisy reverberant conditions | |
HoChoi et al. | Speech recognition method using quantised LSP parameters in CELP-type coders | |
Ito et al. | Designing Side Information of Multiple Description Coding. | |
Amro et al. | Speech compression exploiting linear prediction coefficients codebook and hamming correction code algorithm | |
KR100984094B1 (en) | A voiced/unvoiced decision method for the smv of 3gpp2 using gaussian mixture model | |
Tseng et al. | Quantization for adapted GMM-based speaker verification | |
Athaudage et al. | Model-based speech signal coding using optimized temporal decomposition for storage and broadcasting applications | |
Kohata et al. | Bit rate reduction of the MELP coder using Lempel-Ziv segment quantization | |
Bozantzis et al. | Combined Source Adaptive and Channel Optimized Matrix Quantization for Noisy Channels | |
Shin et al. | Signal modification for ADPCM based on analysis-by-synthesis framework | |
Petrovsky | Audio/Speech Coding Based on the Perceptual Sparse Representation of the Signal with DAE Neural Network Quantizer and Near-End Listening Enhancement | |
KR101647058B1 (en) | Missing Feature Reconstruction Based on HMM of Feature Vectors for Robust Speech Recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |
Open date: 20081203 |