CN103730121B - Method and device for recognizing disguised voice - Google Patents

Method and device for recognizing disguised voice

Info

Publication number
CN103730121B
CN103730121B CN201310728591.XA
Authority
CN
China
Prior art keywords
voice
speaker
coefficient
calculate
probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201310728591.XA
Other languages
Chinese (zh)
Other versions
CN103730121A (en)
Inventor
王泳
黄继武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
National Sun Yat Sen University
Original Assignee
Shenzhen University
National Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University and National Sun Yat-sen University
Priority to CN201310728591.XA priority Critical patent/CN103730121B/en
Publication of CN103730121A publication Critical patent/CN103730121A/en
Application granted granted Critical
Publication of CN103730121B publication Critical patent/CN103730121B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The present invention discloses a method and device for recognizing disguised voice. The recognition method uses the fundamental-frequency characteristics of speech to estimate the voice-conversion coefficient and improves the Mel-frequency cepstral coefficient (MFCC) extraction algorithm: linear-interpolation stretching with the estimated coefficient is incorporated into the MFCC extraction so that the MFCCs of the converted speech as they were before conversion can be approximately computed. The method is then incorporated into the GMM-UBM (Gaussian mixture model - universal background model) recognition framework to compute the similarity between voices. In addition, the converted voice can be restored to the original voice using the estimated conversion coefficient. Compared with conventional forensic identification methods, the present invention greatly improves recognition performance, with miss and false-alarm rates both lower than those of conventional schemes.

Description

Method and device for recognizing disguised voice
Technical field
The present invention relates to the field of multimedia information security, and more particularly to a method and device for recognizing disguised voice.
Background technology
Voice conversion (Voice Transformation) is one of the most commonly used speech-processing methods. Its function is to turn one voice into another that sounds natural but entirely different. Voice conversion is usually used in music production or to protect a speaker's safety and privacy, but it can also be used by criminals to disguise their voice and avoid identification. Speaker recognition after voice conversion therefore has important practical value.
The general steps of voice conversion are:
1) Frame and window the signal x(n):
F(k) = \sum_{n=0}^{N-1} x(n)\, w(n)\, e^{-j\frac{2\pi}{N}kn}, \qquad 0 \le k < N \qquad (1)
2) Compute the instantaneous amplitude:
|F(k)| = \left|\sum_{n=0}^{N-1} x(n)\, w(n)\, e^{-j\frac{2\pi}{N}kn}\right|, \qquad 0 \le k < N \qquad (2)
3) Compute the instantaneous frequency from the phase relation between this frame and the previous frame:
\omega(k) = (k + \Delta)\cdot\frac{F_s}{N} \qquad (3)
where F_s is the sampling frequency and Δ is the frequency deviation relative to the bin's center frequency.
4) Spectrum stretching. First, linearly interpolate the instantaneous amplitude:
|F(k')| = \mu|F(k)| + (1-\mu)|F(k+1)|, \qquad 0 \le k < N/2,\ 0 \le k' < N/2 \qquad (4)

\mu = k'/\alpha - k \qquad (6)
Where no confusion can arise, the interpolated instantaneous amplitude is still denoted |F(k)|.
Then shift the frequency lines:
\omega'(k\alpha) = \omega(k)\cdot\alpha, \qquad 0 \le k < N/2,\ 0 \le k\alpha < N/2 \qquad (7)
Where no confusion can arise, the shifted instantaneous frequency is still denoted ω(k).
5) Compute the instantaneous phase φ(k) from the instantaneous frequency and obtain the FFT coefficients of the converted voice:
F(k) = |F(k)|\, e^{j\phi(k)} \qquad (8)
6) Apply the inverse FFT to F(k) to obtain the converted voice signal.
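To make the stretching concrete, the following minimal Python sketch covers the magnitude part of steps 1)-6) for a single frame: windowing, FFT, linear-interpolation stretching of the magnitudes by a factor α, and inverse FFT. It reuses the original phase instead of tracking instantaneous frequency and phase across frames as in equations (3), (7) and (8), so it is only an illustration under that simplification; the function name and the Hanning window are choices made for the sketch, not taken from the text.

import numpy as np

def stretch_frame_spectrum(x_frame, alpha):
    """Magnitude-only spectrum stretching of one frame
    (cf. equations (1), (2), (4)-(6) and (8))."""
    N = len(x_frame)
    w = np.hanning(N)                        # analysis window w(n)
    F = np.fft.fft(x_frame * w)              # equation (1)
    mag, phase = np.abs(F), np.angle(F)      # instantaneous amplitude |F(k)|, equation (2)

    half = N // 2
    stretched = np.zeros(half)
    for k_prime in range(half):
        k = int(k_prime / alpha)             # source bin index
        if k + 1 >= half:
            break                            # no source data left for this bin
        mu = k_prime / alpha - k             # equation (6), with the true factor alpha
        stretched[k_prime] = mu * mag[k] + (1 - mu) * mag[k + 1]   # equation (4)

    # rebuild a conjugate-symmetric magnitude spectrum and reuse the phase, equation (8)
    new_mag = np.concatenate([stretched, stretched[::-1]])
    return np.real(np.fft.ifft(new_mag * np.exp(1j * phase)))      # step 6): inverse FFT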
The MFCC extraction process is shown in Fig. 1. The specific steps are as follows:
1) Windowing and spectrum computation.
The MFCC here uses a Hamming window of N = 1024 points:
w(n) = 0.53836 - 0.46164\cos\!\left(\frac{2\pi n}{N-1}\right), \qquad 0 \le n < N \qquad (9)
Apply the FFT to the windowed source signal x(n):
F(k) = \sum_{n=0}^{N-1} x(n)\, w(n)\, e^{-j\frac{2\pi}{N}kn}, \qquad 0 \le k < N \qquad (10)
2) Mel segmentation (triangular filtering) and logarithmic transformation:
The weighting windows are triangular, defined as follows:
H_m(k) = \begin{cases} 0, & k < k_{m-1} \\ \dfrac{k - k_{m-1}}{k_m - k_{m-1}}, & k_{m-1} \le k \le k_m \\ \dfrac{k_{m+1} - k}{k_{m+1} - k_m}, & k_m < k \le k_{m+1} \\ 0, & k > k_{m+1} \end{cases} \qquad (11)
where k_m = f(m)·N/F_s and F_s is the sampling frequency.
Weight the FFT energy spectrum with the triangular windows and then take the logarithm:
Y(m) = \log\!\left[\sum_{k=0}^{N-1} |F(k)|^2\, H_m(k)\right], \qquad 1 \le m \le M \qquad (12)
3) Inverse cosine transform.
Finally, applying the inverse cosine transform yields the Mel cepstral coefficients, i.e. the MFCCs:
\mathrm{MFCC}(n) = \frac{1}{M}\sum_{m=1}^{M} Y(m)\cos\!\left(\frac{n(m-0.5)\pi}{M}\right), \qquad 1 \le m \le M,\ 0 \le n \le N-1 \qquad (13)
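The following Python sketch walks through equations (9)-(13) for a single frame. The Hamming window and the triangular filters follow the formulas above; the mel-scale mapping 2595·log10(1 + f/700) used to place the filter edges is a common convention assumed here, since the text does not spell it out, and the function name is illustrative.

import numpy as np

def mfcc_frame(x_frame, fs, n_mel=24, n_ceps=12):
    """Single-frame MFCC following equations (9)-(13)."""
    N = len(x_frame)
    n = np.arange(N)
    w = 0.53836 - 0.46164 * np.cos(2 * np.pi * n / (N - 1))    # Hamming window, equation (9)
    F = np.fft.fft(x_frame * w)                                # equation (10)
    power = np.abs(F[:N // 2]) ** 2                            # |F(k)|^2, non-redundant half

    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)              # assumed mel convention

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    # filter edges k_m = f(m) * N / Fs for the triangular windows of equation (11)
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_mel + 2))
    k_edges = np.clip((edges_hz * N / fs).astype(int), 0, N // 2 - 1)

    Y = np.zeros(n_mel)
    for m in range(1, n_mel + 1):
        lo, mid, hi = k_edges[m - 1], k_edges[m], k_edges[m + 1]
        H = np.zeros(N // 2)                                   # triangular window H_m(k)
        if mid > lo:
            H[lo:mid + 1] = (np.arange(lo, mid + 1) - lo) / (mid - lo)
        if hi > mid:
            H[mid:hi + 1] = (hi - np.arange(mid, hi + 1)) / (hi - mid)
        Y[m - 1] = np.log(np.sum(power * H) + 1e-12)           # equation (12)

    mvec = np.arange(1, n_mel + 1)
    return np.array([np.sum(Y * np.cos(c * (mvec - 0.5) * np.pi / n_mel))
                     for c in range(n_ceps)]) / n_mel          # equation (13)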
GMM-UBM
Speaker identification can be viewed as a test between two hypotheses:
H0: Y is from speaker S;
H1: Y is not from speaker S.
Mathematically, H0 is represented by the speaker model λ_hyp and H1 by the universal background model λ_bkg. The probability is computed as shown in formula (14):
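The body of formula (14) is not reproduced in the text above. As a hedged reconstruction (the conventional GMM-UBM form, consistent with the score Λ(X) and the threshold θ used later in this description, but not necessarily a verbatim copy of the original formula), the decision statistic is the log-likelihood ratio:

\Lambda(X) = \log p(X \mid \lambda_{hyp}) - \log p(X \mid \lambda_{bkg}), \qquad \Lambda(X) \ge \theta \Rightarrow \text{accept } H_0, \quad \Lambda(X) < \theta \Rightarrow \text{accept } H_1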
With the wide application of audio technology, protecting audio products has become a research hotspot in information security, and speech forensics is an important part of it. In judicial, commercial and other applications, identifying the speaker of converted voice has significant practical value. Experimental results show that, for voice that has undergone a large conversion, conventional speaker-recognition schemes suffer very high miss and false-alarm rates and identification fails completely.
Summary of the invention
The primary purpose of the present invention is to propose a method for recognizing disguised voice, so that the method can be used to identify the speaker of a converted audio product; identifying the speaker after voice conversion has very important practical value.
A further object of the present invention is to propose a device for recognizing disguised voice.
To remedy the deficiencies of the prior art, the technical solution adopted by the present invention is as follows:
A method for recognizing disguised voice, the method comprising:
in the training stage, using the expectation-maximization (EM) algorithm to compute the universal background model (UBM) λ_bkg from a background speech corpus;
in the training stage, extracting the Mel cepstral coefficients (MFCC) and the fundamental frequency of the test speech S_j of speaker j, computing the Gaussian mixture model (GMM) λ_j of speaker j with the maximum a posteriori (MAP) algorithm, and computing the fundamental-frequency mean f_j; building the model V_j = (λ_j, f_j) of speaker j and storing it in the model database;
obtaining the threshold θ in the training stage; the threshold θ is obtained as follows: compute client scores and impostor scores, and use the distributions of these two kinds of scores to select a threshold θ that meets the miss and false-alarm rates required by the application, where a client score (Client Score) is the probability of a speaker's speech segment under that speaker's own model, and an impostor score (Imposter Score) is the probability of a speaker's speech segment under another speaker's model;
in the test stage, where the voice Y is a converted voice, extracting the fundamental-frequency mean f_Y of voice Y; computing the conversion coefficient from f_Y/f_j; computing the original MFCC coefficients X of Y before conversion with the modified MFCC extraction algorithm; and obtaining the probability Λ(X) that Y belongs to model V_j with the GMM-UBM-based probability estimation algorithm;
comparing the probability Λ(X) with the threshold θ: if the probability exceeds the threshold θ, voice Y is a segment spoken by j; otherwise voice Y was not spoken by j;
wherein the modified MFCC extraction algorithm is as follows: after the windowing and FFT steps of the MFCC extraction algorithm, the FFT coefficient amplitudes |F(k)| are stretched by linear interpolation to obtain |F(k′)|, the linear-interpolation stretching of the FFT coefficient amplitudes being given by:

|F(k')| = \mu|F(k)| + (1-\mu)|F(k+1)|, \qquad 0 \le k < N/2,\ 0 \le k' < N/2

\mu = k'/(1/\alpha') - k

where 1/α′ is the reciprocal of the estimated conversion coefficient and α′ is the estimated conversion coefficient, α′ = f_Y/f_j.
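As an illustration of the test stage just described, the following Python sketch strings the steps together: pitch-mean estimation, conversion-coefficient estimation, modified MFCC extraction and GMM-UBM scoring. The helper names (estimate_pitch_mean, modified_mfcc) and the model interface (a score method returning the average frame log-likelihood, as scikit-learn's GaussianMixture.score does) are assumptions made for the sketch, not names from the patent.

def identify_disguised_voice(y, fs, speaker_model, ubm, theta,
                             estimate_pitch_mean, modified_mfcc):
    """Hedged sketch of the test stage: estimate the conversion coefficient
    from the pitch-mean ratio, compute approximate pre-conversion MFCCs with
    the modified extraction, score them against the speaker GMM and the UBM,
    and compare the log-likelihood ratio with the threshold theta."""
    gmm_j, f_j = speaker_model           # V_j = (lambda_j, f_j)
    f_y = estimate_pitch_mean(y, fs)     # fundamental-frequency mean of the test voice
    alpha_est = f_y / f_j                # estimated conversion coefficient alpha'
    X = modified_mfcc(y, fs, alpha_est)  # approximate MFCCs of Y before conversion
    llr = gmm_j.score(X) - ubm.score(X)  # GMM-UBM log-likelihood ratio Lambda(X)
    return llr > theta, alpha_est, llr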
In a preferred scheme, the fundamental frequency is extracted as follows:
(1) window the signal to obtain the signal of a predetermined length before and after any instant t_mid;
(2) compute the autocorrelation function of this signal and the autocorrelation function of the window function;
(3) divide the two autocorrelation functions; the lag of the maximum is the period T, which gives the fundamental frequency F at instant t_mid.
In a preferred scheme, the fundamental-frequency mean is mean(F), where mean(·) denotes averaging.
In a preferred scheme, when α′ > 1 spectrum compensation is required. Let the Nyquist frequency be F_n. The compensation symmetrically copies the spectrum between 2F_n/(2α′) − F_n/2 and F_n/(2α′) into the range from F_n/(2α′) to F_n/2. The effect of this compensation is to approximately restore the amplitudes of the band from F_n/(2α′) to F_n/2, so that the restored MFCC values are close to the original MFCC values.
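A minimal Python sketch of this compensation, under the reading given above (the band just below the highest bin that still has source data after stretching is mirrored into the empty band above it). Because the band bounds in the original text are partly garbled, this is an assumption rather than a verbatim implementation.

import numpy as np

def compensate_spectrum(mag, alpha):
    """Fill the empty upper band of a stretched magnitude spectrum (alpha > 1)
    by mirroring the adjacent band about the cutoff bin; an assumed reading of
    the compensation step, not a verbatim implementation."""
    half = len(mag)                  # bins 0 .. half-1 cover 0 .. Nyquist
    cutoff = int(half / alpha)       # last bin that still has source data
    width = half - cutoff            # size of the empty upper band
    out = mag.copy()
    if 0 < width <= cutoff:
        out[cutoff:half] = mag[cutoff - width:cutoff][::-1]   # symmetric copy
    return out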
A device for recognizing disguised voice, comprising:
a training module for computing the universal background model (UBM) λ_bkg from a background speech corpus with the expectation-maximization (EM) algorithm; extracting the Mel cepstral coefficients (MFCC) and the fundamental frequency of the test speech S_j of speaker j; computing the Gaussian mixture model (GMM) λ_j of speaker j with the maximum a posteriori algorithm; computing the fundamental-frequency mean f_j; building the model V_j = (λ_j, f_j) of speaker j and storing it in the model database; and obtaining the threshold θ in the training stage;
wherein the threshold θ is obtained as follows: compute client scores and impostor scores, and use the distributions of these two kinds of scores to select a threshold θ that meets the miss and false-alarm rates required by the application, where a client score (Client Score) is the probability of a speaker's speech segment under that speaker's own model, and an impostor score (Imposter Score) is the probability of a speaker's speech segment under another speaker's model;
a test module for, when the voice Y is a converted voice, extracting its fundamental-frequency mean f_Y; computing the conversion coefficient from f_Y/f_j; computing the original MFCC coefficients X of Y before conversion with the modified MFCC extraction algorithm; and obtaining the probability Λ(X) that Y belongs to model V_j with the GMM-UBM-based probability estimation algorithm;
an identification module for comparing the probability Λ(X) with the threshold θ: if the probability exceeds the threshold θ, voice Y is a segment spoken by j; otherwise voice Y was not spoken by j;
wherein the modified MFCC extraction algorithm in the test module is implemented as follows: after the windowing and FFT steps of the MFCC extraction algorithm, the FFT coefficient amplitudes |F(k)| are stretched by linear interpolation to obtain |F(k′)|, the linear-interpolation stretching of the FFT coefficient amplitudes being given by:

|F(k')| = \mu|F(k)| + (1-\mu)|F(k+1)|, \qquad 0 \le k < N/2,\ 0 \le k' < N/2

\mu = k'/(1/\alpha') - k

where 1/α′ is the reciprocal of the estimated conversion coefficient and α′ is the estimated conversion coefficient, α′ = f_Y/f_j.
Compared with the prior art, the present invention has the following benefits: recognition performance is greatly improved over conventional forensic identification methods. The fundamental-frequency mean is used to estimate the conversion coefficient, and the MFCC extraction algorithm is modified so that the MFCC features before voice conversion can be computed directly; the GMM-UBM-based probability calculation then determines whether the test voice was spoken by a given target speaker. Both the miss and false-alarm rates are lower than those of conventional schemes.
Brief description of the drawings
Fig. 1 is a schematic of the Mel-frequency cepstral coefficient extraction process.
Fig. 2 compares the estimated conversion coefficients with the true conversion coefficients (true coefficient α(k) = 2^(k/12), estimated coefficient α′(y) = 2^(y/12)).
Fig. 3 shows the EER curves for the four frequency-domain disguise methods.
Fig. 4 shows the DET curves for the four frequency-domain methods.
Fig. 5 shows the EER curve for TD-PSOLA.
Fig. 6 shows the DET curve for TD-PSOLA.
Detailed description of the invention
As shown in Figs. 3-6, the present invention discloses the following. In the training stage, the EM (expectation maximization) algorithm is used to compute the UBM (universal background model) λ_bkg from a background speech corpus. Also in the training stage, the MFCC coefficients and the fundamental frequency of the test speech S_j of speaker j are extracted, the MAP (maximum a posteriori) algorithm is used to compute the GMM (Gaussian mixture model) λ_j of speaker j, and the fundamental-frequency mean f_j is computed. The model V_j = (λ_j, f_j) of speaker j is built and stored in the model database. The threshold θ is obtained in the training stage. In the test stage, the voice Y is a converted voice; its fundamental-frequency mean f_Y is extracted, the conversion coefficient is computed from f_Y/f_j, and the original MFCC coefficients X of Y before conversion are computed with the modified MFCC extraction algorithm. The probability Λ(X) that Y belongs to model V_j is obtained with the GMM-UBM-based probability estimation algorithm. If the probability exceeds the threshold θ, voice Y is identified as a segment spoken by j; if it does not exceed the threshold, voice Y is identified as not spoken by j.
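A hedged Python sketch of the training stage described above, using scikit-learn's GaussianMixture as the EM implementation. Re-fitting each speaker model from a UBM initialisation is a simplification that stands in for the MAP adaptation named in the text; the function name, the 2048-component setting and the data layout are assumptions made for illustration.

from sklearn.mixture import GaussianMixture

def train_models(background_frames, speaker_frames_by_id, pitch_means, n_comp=2048):
    """Train the UBM with EM, then build V_j = (gmm_j, f_j) for every speaker."""
    ubm = GaussianMixture(n_components=n_comp, covariance_type="diag")
    ubm.fit(background_frames)                   # EM on the background corpus

    models = {}
    for j, frames in speaker_frames_by_id.items():
        gmm_j = GaussianMixture(n_components=n_comp, covariance_type="diag",
                                weights_init=ubm.weights_,
                                means_init=ubm.means_,
                                precisions_init=ubm.precisions_)
        gmm_j.fit(frames)                        # stand-in for MAP adaptation
        models[j] = (gmm_j, pitch_means[j])      # V_j = (lambda_j, f_j)
    return ubm, models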
The method for estimating the conversion coefficient is α′ = f_Y/f_j, where α′ is the estimated conversion coefficient and the fundamental-frequency mean is obtained by averaging the fundamental frequency.
The fundamental frequency is extracted as follows (a minimal sketch is given after the list):
(1) window the signal to obtain the signal of a predetermined length before and after any instant t_mid;
(2) compute the autocorrelation function of this signal and the autocorrelation function of the window function;
(3) divide the two autocorrelation functions; the lag of the maximum is the period T, which gives the fundamental frequency F at instant t_mid.
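A minimal Python sketch of steps (1)-(3). The Hanning window, the caller-supplied window length (standing in for the unspecified predetermined length) and the 50-500 Hz pitch search range are all assumptions made for the sketch.

import numpy as np

def pitch_at(x, fs, t_mid, win_len):
    """Autocorrelation pitch estimate at time t_mid (seconds), following
    steps (1)-(3): window the signal, divide the signal autocorrelation by
    the window autocorrelation, take the lag of the maximum as the period T."""
    center = int(t_mid * fs)
    seg = x[max(0, center - win_len):center + win_len]
    w = np.hanning(len(seg))
    xw = seg * w

    r_x = np.correlate(xw, xw, mode="full")[len(xw) - 1:]   # signal autocorrelation
    r_w = np.correlate(w, w, mode="full")[len(w) - 1:]      # window autocorrelation
    r = r_x / np.maximum(r_w, 1e-12)                        # step (3): divide

    lo, hi = int(fs / 500), int(fs / 50)     # assume pitch between 50 and 500 Hz
    lag = lo + np.argmax(r[lo:hi])           # lag of the maximum gives the period T
    return fs / lag                          # fundamental frequency F = fs / T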
The modified extraction algorithm works as follows: after the windowing and FFT steps of the Mel-frequency cepstral coefficient extraction algorithm, the FFT coefficient amplitudes |F(k)| are stretched by linear interpolation to obtain |F(k′)|. The linear-interpolation stretching of the FFT coefficient amplitudes is given by:

|F(k')| = \mu|F(k)| + (1-\mu)|F(k+1)|, \qquad 0 \le k < N/2,\ 0 \le k' < N/2

\mu = k'/(1/\alpha') - k

where the stretching value 1/α′ is the reciprocal of the estimated conversion coefficient.
The matching calculation is the GMM-UBM-based probability calculation. Matching calculation means computing the probability of a speech segment under a given model; this probability reflects how likely it is that the segment was spoken by the speaker that the model represents.
The speech corpus used with the method of the invention and some experimental results are given below.
The corpus is TIMIT, the most commonly used corpus in speech and speaker recognition. It comprises 192 female and 438 male speakers from 8 different regions, 630 speakers in total. Each speaker reads 10 different utterances, for a total of 6300 utterances. All speech is in WAV format with an 8 kHz sampling rate and 16-bit quantization. In this experiment TIMIT is divided into three sub-corpora:
1) Universal background corpus: all the speech segments of 60 male and 60 female speakers are concatenated to train the UBM.
2) Score-normalization corpus: the speech segments of 40 female and 90 male speakers are used for score normalization (TNorm).
3) Development/evaluation corpus: 92 female and 288 male speakers. For speaker j, 5 segments are concatenated to train the 2048-component GMM model λ_j and the fundamental-frequency mean f_j. The remaining 5 segments are concatenated into one segment, and disguises with different conversion coefficients are applied to it. The corpus used to train the speaker models is called the development corpus; the corpus used for disguise is called the evaluation corpus.
Five transformation tools (methods) are considered: the frequency-domain tools Adobe Audition, Audacity, GoldWave and RSITI, and the time-domain tool TD-PSOLA. With the shift strength measured in semitones (12 per octave, as in music), the conversion coefficient is related to the semitone shift k as follows:

\alpha(k) = 2^{k/12}

The experiments only consider voice conversions with −11 ≤ k ≤ 11, because practical audio (voice) tools generally provide conversions in this range.
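For concreteness, the mapping from semitone shift k to conversion coefficient can be tabulated with a small snippet (illustrative only, not part of the patent):

# conversion coefficient alpha(k) = 2**(k/12) for a semitone shift k
for k in (-11, -4, 0, 4, 11):
    print(k, round(2 ** (k / 12), 3))   # e.g. k = 4 -> ~1.260, k = -4 -> ~0.794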
The speech signal is pre-emphasized with the transfer function H(z) = 1 − 0.97z^{−1}.
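A one-line sketch of this pre-emphasis (the function name is illustrative):

import numpy as np

def pre_emphasis(x, a=0.97):
    """Apply H(z) = 1 - 0.97 z^{-1}, i.e. y(n) = x(n) - 0.97 x(n-1)."""
    return np.append(x[0], x[1:] - a * x[:-1])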
The frame length is 1024 samples, and the 24-dimensional MFCC feature consists of 12 MFCC coefficients and 12 ΔMFCC coefficients.
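The Δ coefficients can be appended to the 12 static MFCCs as sketched below; the two-frame regression width is an assumed choice, since the text does not specify how the ΔMFCC coefficients are computed.

import numpy as np

def add_delta(mfcc, width=2):
    """Append first-order delta coefficients to a (frames x 12) MFCC matrix,
    giving the 24-dimensional feature."""
    padded = np.pad(mfcc, ((width, width), (0, 0)), mode="edge")
    denom = 2 * sum(i * i for i in range(1, width + 1))
    delta = sum(i * (padded[width + i:len(mfcc) + width + i]
                     - padded[width - i:len(mfcc) + width - i])
                for i in range(1, width + 1)) / denom
    return np.hstack([mfcc, delta])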
An experimental example of estimating the voice-conversion coefficient is given below. The estimates of the conversion coefficient for the same speaker are averaged and compared with the true conversion coefficient; the comparison is shown in Fig. 2.
The recognition performance is given below. Fig. 3 shows the EER (Equal Error Rate, the operating point where the miss rate, the False Reject Rate (FRR), equals the false-alarm rate, the False Alarm Rate (FAR)). The overall EER is listed in Table 1 and Table 2.
Table 1. Overall EER, |k| ≤ 11 (%)
Table 2. Overall EER, |k| ≤ 8 (%)
Fig. 4 shows the DET (Detection Error Tradeoff) curves. It can be seen that the performance of the conventional scheme (baseline) is destroyed completely by the various disguise methods; that is, a conventional speaker-recognition system cannot correctly identify the speaker of disguised voice. The method of the present invention (proposed, with estimated scaling factor) significantly reduces the error probability and in most cases can identify the speaker, reaching a level acceptable for many applications. The performance of the method when the conversion coefficient is exactly correct is also given (this is the best performance the present invention can reach). The charts show that the performance reached by the present invention is very close to this optimum.
The method of the present invention also covers recognition of TD-PSOLA disguise. The results are shown in Figs. 5 and 6. Here the conventional scheme performs slightly better than the proposed method. However, because TD-PSOLA cannot preserve the auditory naturalness of the voice even when the shift strength is small, its range of application is narrow and current practical application software rarely uses this method.

Claims (5)

1. A method for recognizing disguised voice, characterized in that the method comprises:
in the training stage, using the expectation-maximization (EM) algorithm to compute the universal background model (UBM) λ_bkg from a background speech corpus;
in the training stage, extracting the Mel cepstral coefficients (MFCC) and the fundamental frequency of the test speech S_j of speaker j, computing the Gaussian mixture model (GMM) λ_j of speaker j with the maximum a posteriori (MAP) algorithm, and computing the fundamental-frequency mean f_j; building the model V_j = (λ_j, f_j) of speaker j and storing it in the model database;
obtaining the threshold θ in the training stage, the threshold θ being obtained as follows: compute client scores and impostor scores, and use the distributions of these two kinds of scores to select a threshold θ that meets the miss and false-alarm rates required by the application, where a client score (Client Score) is the probability of a speaker's speech segment under that speaker's own model, and an impostor score (Imposter Score) is the probability of a speaker's speech segment under another speaker's model;
in the test stage, where the voice Y is a converted voice, extracting the fundamental-frequency mean f_Y of voice Y; computing the conversion coefficient from f_Y/f_j; computing the original MFCC coefficients X of Y before conversion with the modified MFCC extraction algorithm; and obtaining the probability Λ(X) that Y belongs to model V_j with the GMM-UBM-based probability estimation algorithm;
comparing the probability Λ(X) with the threshold θ: if the probability exceeds the threshold θ, voice Y is a segment spoken by j; otherwise voice Y was not spoken by j;
wherein the modified MFCC extraction algorithm is as follows: after the windowing and FFT steps of the MFCC extraction algorithm, the FFT coefficient amplitudes |F(k)| are stretched by linear interpolation to obtain |F(k′)|, the linear-interpolation stretching of the FFT coefficient amplitudes being given by:

|F(k')| = \mu|F(k)| + (1-\mu)|F(k+1)|, \qquad 0 \le k < N/2,\ 0 \le k' < N/2

\mu = k'/(1/\alpha') - k

where 1/α′ is the reciprocal of the estimated conversion coefficient and α′ is the estimated conversion coefficient, α′ = f_Y/f_j.
2. The method for recognizing disguised voice according to claim 1, characterized in that the fundamental frequency is extracted as follows:
(1) window the signal to obtain the signal of a predetermined length before and after any instant t_mid;
(2) compute the autocorrelation function of this signal and the autocorrelation function of the window function;
(3) divide the two autocorrelation functions; the lag of the maximum is the period T, which gives the fundamental frequency F at instant t_mid.
3. The method for recognizing disguised voice according to claim 2, characterized in that the fundamental-frequency mean is mean(F), where mean(·) denotes averaging.
4. The method for recognizing disguised voice according to claim 1, characterized in that when α′ > 1 spectrum compensation is required; letting the Nyquist frequency be F_n, the compensation symmetrically copies the spectrum between 2F_n/(2α′) − F_n/2 and F_n/(2α′) into the range from F_n/(2α′) to F_n/2.
5. A device for recognizing disguised voice, characterized by comprising:
a training module for computing the universal background model (UBM) λ_bkg from a background speech corpus with the expectation-maximization (EM) algorithm, extracting the Mel cepstral coefficients (MFCC) and the fundamental frequency of the test speech S_j of speaker j, computing the Gaussian mixture model (GMM) λ_j of speaker j with the maximum a posteriori algorithm, computing the fundamental-frequency mean f_j, building the model V_j = (λ_j, f_j) of speaker j, storing it in the model database, and obtaining the threshold θ in the training stage;
wherein the threshold θ is obtained as follows: compute client scores and impostor scores, and use the distributions of these two kinds of scores to select a threshold θ that meets the miss and false-alarm rates required by the application, where a client score (Client Score) is the probability of a speaker's speech segment under that speaker's own model, and an impostor score (Imposter Score) is the probability of a speaker's speech segment under another speaker's model;
a test module for, when the voice Y is a converted voice, extracting its fundamental-frequency mean f_Y, computing the conversion coefficient from f_Y/f_j, computing the original MFCC coefficients X of Y before conversion with the modified MFCC extraction algorithm, and obtaining the probability Λ(X) that Y belongs to model V_j with the GMM-UBM-based probability estimation algorithm;
an identification module for comparing the probability Λ(X) with the threshold θ: if the probability exceeds the threshold θ, voice Y is a segment spoken by j; otherwise voice Y was not spoken by j;
wherein the modified MFCC extraction algorithm used in the test module is as follows: after the windowing and FFT steps of the MFCC extraction algorithm, the FFT coefficient amplitudes |F(k)| are stretched by linear interpolation to obtain |F(k′)|, the linear-interpolation stretching of the FFT coefficient amplitudes being given by:

|F(k')| = \mu|F(k)| + (1-\mu)|F(k+1)|, \qquad 0 \le k < N/2,\ 0 \le k' < N/2

\mu = k'/(1/\alpha') - k

where 1/α′ is the reciprocal of the estimated conversion coefficient and α′ is the estimated conversion coefficient, α′ = f_Y/f_j.
CN201310728591.XA 2013-12-24 2013-12-24 Method and device for recognizing disguised voice Expired - Fee Related CN103730121B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310728591.XA CN103730121B (en) 2013-12-24 2013-12-24 Method and device for recognizing disguised voice


Publications (2)

Publication Number Publication Date
CN103730121A CN103730121A (en) 2014-04-16
CN103730121B true CN103730121B (en) 2016-08-24

Family

ID=50454168

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310728591.XA Expired - Fee Related CN103730121B (en) 2013-12-24 2013-12-24 Method and device for recognizing disguised voice

Country Status (1)

Country Link
CN (1) CN103730121B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104183245A (en) * 2014-09-04 2014-12-03 福建星网视易信息系统有限公司 Method and device for recommending music stars with tones similar to those of singers
CN105976819A (en) * 2016-03-23 2016-09-28 广州势必可赢网络科技有限公司 Rnorm score normalization based speaker verification method
CN109215680B (en) * 2018-08-16 2020-06-30 公安部第三研究所 Voice restoration method based on convolutional neural network
CN109741761B (en) * 2019-03-13 2020-09-25 百度在线网络技术(北京)有限公司 Sound processing method and device
CN109920435B (en) * 2019-04-09 2021-04-06 厦门快商通信息咨询有限公司 Voiceprint recognition method and voiceprint recognition device
CN110363406A (en) * 2019-06-27 2019-10-22 上海淇馥信息技术有限公司 Appraisal procedure, device and the electronic equipment of a kind of client intermediary risk
US11227601B2 (en) * 2019-09-21 2022-01-18 Merry Electronics(Shenzhen) Co., Ltd. Computer-implement voice command authentication method and electronic device
CN111739547B (en) * 2020-07-24 2020-11-24 深圳市声扬科技有限公司 Voice matching method and device, computer equipment and storage medium
CN112967712A (en) * 2021-02-25 2021-06-15 中山大学 Synthetic speech detection method based on autoregressive model coefficient
CN113270112A (en) * 2021-04-29 2021-08-17 中国人民解放军陆军工程大学 Electronic camouflage voice automatic distinguishing and restoring method and system
CN116013323A (en) * 2022-12-27 2023-04-25 浙江大学 Active evidence obtaining method oriented to voice conversion

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1914667A (en) * 2004-06-01 2007-02-14 东芝泰格有限公司 Speaker recognizing device, program, and speaker recognizing method
CN1967657A (en) * 2005-11-18 2007-05-23 成都索贝数码科技股份有限公司 Automatic tracking and tonal modification system of speaker in program execution and method thereof
CN101399044A (en) * 2007-09-29 2009-04-01 国际商业机器公司 Voice conversion method and system
CN102354496A (en) * 2011-07-01 2012-02-15 中山大学 PSM-based (pitch scale modification-based) speech identification and restoration method and device thereof

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005345599A (en) * 2004-06-01 2005-12-15 Toshiba Tec Corp Speaker-recognizing device, program, and speaker-recognizing method


Also Published As

Publication number Publication date
CN103730121A (en) 2014-04-16

Similar Documents

Publication Publication Date Title
CN103730121B (en) Method and device for recognizing disguised voice
CN106847292B (en) Method for recognizing sound-groove and device
CN103345923B (en) A kind of phrase sound method for distinguishing speek person based on rarefaction representation
CN102968990B (en) Speaker identifying method and system
CN108694954A (en) A kind of Sex, Age recognition methods, device, equipment and readable storage medium storing program for executing
CN104900229A (en) Method for extracting mixed characteristic parameters of voice signals
CN104464724A (en) Speaker recognition method for deliberately pretended voices
CN109346084A (en) Method for distinguishing speek person based on depth storehouse autoencoder network
WO2013040981A1 (en) Speaker recognition method for combining emotion model based on near neighbour principles
CN103077728B (en) A kind of patient&#39;s weak voice endpoint detection method
CN103456302B (en) A kind of emotional speaker recognition method based on the synthesis of emotion GMM Model Weight
CN106409298A (en) Identification method of sound rerecording attack
CN102789779A (en) Speech recognition system and recognition method thereof
CN104240706A (en) Speaker recognition method based on GMM Token matching similarity correction scores
CN101887722A (en) Rapid voiceprint authentication method
Alam et al. Tandem Features for Text-Dependent Speaker Verification on the RedDots Corpus.
CN104464738B (en) A kind of method for recognizing sound-groove towards Intelligent mobile equipment
CN100570712C (en) Based on anchor model space projection ordinal number quick method for identifying speaker relatively
Mahesha et al. LP-Hillbert transform based MFCC for effective discrimination of stuttering dysfluencies
CN116434759B (en) Speaker identification method based on SRS-CL network
CN105976819A (en) Rnorm score normalization based speaker verification method
Wen et al. Multi-Path GMM-MobileNet Based on Attack Algorithms and Codecs for Synthetic Speech and Deepfake Detection.
CN116665649A (en) Synthetic voice detection method based on prosody characteristics
Dai et al. An improved feature fusion for speaker recognition
Li et al. Voice-based recognition system for non-semantics information by language and gender

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160824

Termination date: 20211224

CF01 Termination of patent right due to non-payment of annual fee