CN105810198A - Channel robust speaker identification method and device based on characteristic domain compensation - Google Patents


Info

Publication number
CN105810198A
CN105810198A (application CN201610173039.2A)
Authority
CN
China
Prior art keywords: score, speaker, voice signal, domain compensation, identifying
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610173039.2A
Other languages
Chinese (zh)
Inventor
陈昊亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Speakin Network Technology Co Ltd
Original Assignee
Guangzhou Speakin Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Guangzhou Speakin Network Technology Co Ltd filed Critical Guangzhou Speakin Network Technology Co Ltd
Priority to CN201610173039.2A priority Critical patent/CN105810198A/en
Publication of CN105810198A publication Critical patent/CN105810198A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/06 Decision making techniques; Pattern matching strategies
    • G10L17/10 Multimodal systems, i.e. based on the integration of multiple recognition engines or fusion of expert systems
    • G10L17/12 Score normalisation
    • G10L17/20 Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention relates to a channel-robust speaker identification method based on feature-domain compensation. The method comprises the following steps: testing a voice signal with a main system based on MFCC (mel-frequency cepstral coefficient) parameters and an auxiliary system based on fundamental-frequency (pitch) parameters, obtaining a first score and a second score; and judging whether the first score is higher than a threshold; if so, identifying the voice signal according to the first score, otherwise identifying the voice signal according to the fused value of the first score and the second score. The disclosed embodiments effectively protect a good score distribution of the main system and improve the compensation effect of pitch feature parameters on speaker verification.

Description

Channel-robust speaker identification method and device based on feature-domain compensation
Technical field
The present invention relates to a speaker identification method and device, and in particular to a channel-robust speaker identification method and device based on feature-domain compensation.
Background art
With the growing maturity and popularity of speech recognition technology, interacting with computers by voice has become a reality. Speech recognition is a cross-disciplinary field involving computer science, psychology, biology, and other subjects, and can be broadly divided into three main research directions: semantic recognition, language identification, and speaker recognition. Speaker recognition, also called voiceprint recognition, extracts the information in speech that characterizes the speaker's individual traits and, through a series of processing steps, distinguishes or confirms the speaker's identity. Identifying a person by voice has many unique advantages: a voice is not easily forgotten or lost, little cooperation is demanded of the user, the equipment is relatively inexpensive, and the technique is readily applied in communication systems.
At present, speaker recognition technology is widely used; examples include the voiceprint screen lock in the Lingxi voice assistant jointly released by iFLYTEK and China Mobile. Another example is judicial expertise: determining whether a voice left at a crime scene actually comes from a suspect requires the assistance of speaker recognition technology.
Speaker recognition further divides into several sub-directions. By recognition task, it can be divided into speaker identification and speaker verification: speaker identification takes an input speech segment and picks, from a set of speakers, the one most likely to have spoken it, whereas speaker verification judges whether the input speech comes from a claimed target speaker, so its result can only be acceptance or rejection. By the requirement on speech content, it can be divided into text-dependent and text-independent speaker recognition; text-independent speaker recognition needs no model-training material of specific content and does not require the test material to be consistent with the training corpus. By whether the speaker to be recognized is guaranteed to have been enrolled in advance, speaker recognition can be divided into two classes, open-set and closed-set recognition: in open-set recognition the speaker to be recognized is not necessarily within the enrolled set, while in closed-set recognition the speaker is always within it. Open-set identification must therefore make rejection decisions for out-of-set speakers, and closed-set identification performs better than open-set identification. Both speaker verification and open-set speaker identification must reject non-target or impostor speakers; to achieve good rejection performance, the common practice is to provide an impostor model or background model as the reference for comparison during rejection, which also simplifies threshold selection.
Traditional speaker recognition methods adopt the MFCC, a short-time vocal-tract parameter whose advantages derive from its focus on human auditory characteristics. The MFCC is currently one of the dominant feature parameters in the speech field, but the speaker-specific information carried by the speech signal goes far beyond it. The fundamental frequency (pitch) is a source feature that carries relatively rich speaker-specific information; it depends only weakly on the short-time vocal-tract parameters and has a certain noise robustness, so it can to some degree overcome the susceptibility of short-time vocal-tract parameters to channel noise and, after parameterization, be used as a supplement to the short-term-spectrum-based MFCC parameters. There are two ways to use pitch features to compensate speaker verification; one is to append the L-dimensional pitch parameters to the D-dimensional MFCC vector, forming a large (D+L)-dimensional vector used as the verification feature. Although this is simple, it has difficulty handling silent and unvoiced frames, which carry no pitch. Moreover, because a stand-alone pitch auxiliary system has poor performance indicators, it may destroy a good score distribution of the main system, preventing accurate speaker decisions.
Summary of the invention
The object of the present invention is to provide a channel-robust speaker identification method and device based on feature-domain compensation, so as to solve the technical problem that an auxiliary system based on pitch parameters has poor performance indicators and may therefore degrade the speaker verification performance of an MFCC-based main system whose score distribution is already good.
To this end, the channel-robust speaker identification method and device based on feature-domain compensation of the present invention adopt the following technical scheme.
A channel-robust speaker identification method based on feature-domain compensation comprises:
testing a voice signal with a main system based on MFCC parameters and an auxiliary system based on pitch parameters, obtaining a first score and a second score respectively;
judging whether the first score is higher than a threshold;
if so, identifying the voice signal according to the first score; otherwise, identifying the voice signal according to the fused value of the first score and the second score.
The step of identifying the voice signal according to the fusion result of the first score and the second score comprises fusing according to the following formula:

$$\mathrm{Score} = \sum_{n=1}^{N} \omega_n s_n$$

where Score is the fused score, the number of subsystems N = 2, ω_n is the weight of subsystem n, and s_n is the first score of the main system or the second score of the auxiliary system.
The step of identifying the voice signal according to the fusion result of the first score and the second score may also comprise fusing according to the following formula:

$$\mathrm{Score} = \frac{1}{T} \sum_{t=1}^{T} \left\{ \log[p(o_t \mid \lambda)] - \log[p(o_t \mid \mathrm{UBM})] \right\}$$

where T is the total number of frames of the test speech, o_t is the feature vector of frame t, λ is the target speaker model, and UBM is the background model.
The threshold is 0.
A channel-robust speaker identification device based on feature-domain compensation comprises:
a scoring module, configured to test a voice signal with a main system based on MFCC parameters and an auxiliary system based on pitch parameters, obtaining a first score and a second score respectively;
a judging module, configured to judge whether the first score is higher than a threshold;
an identification module, configured to identify the voice signal according to the first score if the absolute value of the first score is higher than the threshold, and to identify the voice signal according to the fused value of the first score and the second score if the absolute value of the first score is lower than the threshold.
The identification device fuses according to the following formula:

$$\mathrm{Score} = \sum_{n=1}^{N} \omega_n s_n$$

where Score is the fused score, the number of subsystems N = 2, ω_n is the weight of subsystem n, and s_n is the first score of the main system or the second score of the auxiliary system.
The identification device may also fuse according to the following formula:

$$\mathrm{Score} = \frac{1}{T} \sum_{t=1}^{T} \left\{ \log[p(o_t \mid \lambda)] - \log[p(o_t \mid \mathrm{UBM})] \right\}$$

where T is the total number of frames of the test speech, λ is the target speaker model, and UBM is the background model.
The threshold is 0.
Compared with the prior art, the channel-robust speaker identification method and device based on feature-domain compensation of the embodiments of the present invention improve the compensation performance of pitch features for speaker verification by setting a threshold: among the test results of the main system, a result whose first score has an absolute value above the threshold is regarded as a high-quality result and is identified directly from the first score of the main system, while a result whose first score has an absolute value below the threshold is improved by fusion with the second score of the auxiliary system. The embodiments of the present invention thus effectively protect a good score distribution of the main system and improve the compensation effect of pitch feature parameters on speaker verification.
Accompanying drawing explanation
Fig. 1 is a flow chart of one embodiment of the channel-robust speaker identification method based on feature-domain compensation of the present invention;
Fig. 2 is a waveform of the autocorrelation function;
Fig. 3 is a waveform of the autocorrelation function after band-pass filtering;
Fig. 4 shows the DET curves of the pitch auxiliary system;
Fig. 5 is a schematic diagram of score fusion in one embodiment of the channel-robust speaker identification method based on feature-domain compensation of the present invention;
Fig. 6 is a schematic diagram of the changes in test scores after fusion of the main and auxiliary systems;
Fig. 7 is a structural schematic diagram of one embodiment of the channel-robust speaker identification device based on feature-domain compensation of the present invention.
Detailed description of the invention
Embodiments of the present invention are further described below with reference to the accompanying drawings.
Referring to Fig. 1, a flow chart of one embodiment of the channel-robust speaker identification method based on feature-domain compensation of the present invention, the method of this embodiment comprises steps S101-S104.
In step S101, the voice signal is tested with a main system based on MFCC parameters and an auxiliary system based on pitch parameters, yielding a first score and a second score respectively.
Here the MFCC is a typical short-time vocal-tract information parameter, while the fundamental frequency describes a key characteristic of the voice excitation source: its value equals the vocal-fold vibration frequency, and fundamental frequency and pitch period are reciprocals of each other. Given the importance of pitch detection, a variety of pitch-period extraction methods have been proposed, such as the autocorrelation function (ACF) method, the average magnitude difference function (AMDF) method, the YIN method, the cepstrum method, and wavelet methods.
(1) The autocorrelation method
A short segment s_n(m) is cut out of the speech signal s(m) with a rectangular window of length N; the autocorrelation function of s_n(m) is:

$$R_n(k) = \sum_{m=0}^{N-k-1} s_n(m)\, s_n(m+k)$$

The autocorrelation function is even, and for a voiced signal it has a peak at every multiple of the pitch period, so the pitch period can be estimated from the first peak of the autocorrelation function. In the formula above, as the lag k increases the number of accumulated terms decreases, so the envelope of the autocorrelation function generally decays with k; in practice the lag of the maximum peak is usually taken as the pitch period. When estimating the pitch period by autocorrelation, a rectangular window is normally used, and the accuracy of pitch detection depends strongly on whether the window length is appropriate. The window should cover no fewer than two pitch periods, so that s_n(m) contains enough of them, but it must not be too long, or the short-time property is lost. Since the fundamental frequency is normally no lower than 50 Hz, the longest pitch period in speech is about 20 ms, so the window length for pitch extraction should be no less than 40 ms. Referring to Fig. 2, which shows the autocorrelation waveform of one frame of a voiced signal sampled at 8 kHz with a frame length of 40 ms and a frame shift of 10 ms (the horizontal axis is the sample index), the extracted fundamental frequency is 108.1 Hz, corresponding to roughly the 75th sample. As Fig. 2 shows, the peak at the pitch period is not prominent enough and is close in height to some earlier peaks, so it is easily swamped by spurious peaks, causing pitch-extraction errors; this is mainly the harmful effect of the formant structure. In fact, in many cases pitch-extraction errors are caused by formants, and the most direct way to reduce formant interference is filtering: the speech signal is passed through a band-pass filter from 60 Hz to 900 Hz. The 900 Hz upper cutoff removes the influence of most formants while avoiding the loss of low-order harmonics even when the fundamental frequency is as high as 450 Hz, and the 60 Hz lower cutoff effectively suppresses 50 Hz mains interference. The autocorrelation waveform after filtering is shown in Fig. 3; compared with Fig. 2, the peak at the pitch-period position in Fig. 3 is more prominent, the filtering having reduced the influence of formants on the pitch period.
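As an illustration of the band-pass-filtered autocorrelation estimator just described, the following is a minimal sketch (not part of the claimed invention): it assumes the 8 kHz sample rate and 60-900 Hz pass band of Figs. 2-3, and the function name and filter order are our own choices.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def acf_pitch(frame, fs=8000, f_min=50.0, f_max=450.0):
    """Estimate the F0 of one voiced frame from its autocorrelation peak."""
    # Band-pass 60-900 Hz: the upper cutoff suppresses formants,
    # the lower cutoff suppresses 50 Hz mains interference.
    b, a = butter(4, [60.0 / (fs / 2), 900.0 / (fs / 2)], btype="band")
    x = filtfilt(b, a, frame)
    # Autocorrelation R(k) for non-negative lags k.
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    # Search only lags corresponding to plausible pitch periods.
    lo, hi = int(fs / f_max), int(fs / f_min)
    k = lo + int(np.argmax(r[lo:hi]))
    return fs / k  # F0 in Hz; e.g. lag 74 at 8 kHz gives about 108 Hz, as in Fig. 2
```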
(2) Praat's improved autocorrelation pitch-extraction algorithm
To remove the influence of formants on pitch extraction, the method above uses a band-pass filter with a 900 Hz upper cutoff, but this is ineffective against a first formant lying below 900 Hz. For this problem, Paul Boersma proposed a simple improved autocorrelation pitch-extraction algorithm, which is implemented in the speech analysis and synthesis software Praat; most of the subsequent pitch-extraction work herein is done with Praat scripts.
The improved algorithm uses a normalized autocorrelation function; for a speech segment x(t) it is:

$$r_x'(\tau) = \frac{r_x(\tau)}{r_x(0)}$$

By the properties of the autocorrelation function, the value of this expression lies in the range 0 to 1. The algorithm mainly comprises the following steps:
● Window the speech signal, here with a Hanning window, whose window function is:

$$w(t) = 0.5 - 0.5 \cos\frac{2\pi t}{T}$$

The DC component of the speech signal is removed before windowing; the windowed signal a(t) is:

$$a(t) = [x(t) - \mu_x]\, w(t)$$
● Compute the normalized autocorrelation $r_a'(\tau)$ of the windowed signal a(t); research shows that the maximum peak of this autocorrelation function may appear at a strong first formant. The normalized autocorrelation of the window function is:

$$r_w'(\tau) = \left(1 - \frac{|\tau|}{T}\right)\left(\frac{2}{3} + \frac{1}{3}\cos\frac{2\pi\tau}{T}\right) + \frac{1}{2\pi}\sin\frac{2\pi|\tau|}{T}$$

● Divide $r_a'(\tau)$ by the normalized autocorrelation of the window function to obtain the final improved autocorrelation function:

$$r_x(\tau) = \frac{r_a'(\tau)}{r_w'(\tau)}$$
In the improved autocorrelation function, the peak at the pitch period is boosted relative to the others, avoiding the influence of the peak caused by a strong first formant.
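A minimal sketch of the three steps above, assuming a single frame and omitting Praat's subsequent candidate selection and path tracking; the guard against division by a vanishing window autocorrelation at large lags is our own addition:

```python
import numpy as np

def boersma_acf(x):
    """Window-corrected normalized autocorrelation of one frame (Boersma/Praat style)."""
    T = len(x)
    a = (x - x.mean()) * np.hanning(T)               # remove DC, apply Hanning window
    r = np.correlate(a, a, mode="full")[T - 1:]       # r_a(tau) for tau >= 0
    r_a = r / r[0]                                    # normalize so r_a(0) = 1
    tau = np.arange(T) / T                            # lag as a fraction of window length
    r_w = ((1 - tau) * (2/3 + np.cos(2 * np.pi * tau) / 3)
           + np.sin(2 * np.pi * tau) / (2 * np.pi))   # normalized ACF of the window
    return r_a / np.maximum(r_w, 1e-6)                # boost the true pitch peak
```

The pitch period is then read off as the lag of the highest peak of the returned function, searched within the plausible lag range as before.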
To examine the speaker verification performance of pitch features, speakers can be modeled under the GMM-UBM framework with a UBM of 128 mixtures. The raw speech of the experiment is taken from a NIST evaluation corpus: there are 40 target speakers, each with about 5 min of training speech (about 3 min after VAD), giving 148 target trials and 1560 impostor trials in total, with about 90 s of each test utterance used after VAD. Pitch extraction uses the speech analysis and synthesis software Praat, with a 40 ms frame and a 10 ms frame shift; the pitch values of every 3 consecutive frames form one 3-dimensional feature vector. The pitch of unvoiced frames is set to 0, and whenever any dimension of a frame vector is 0 the whole group of parameters is discarded. Fig. 4 gives the DET curves of the MFCC main system and the pitch auxiliary system; for a fair comparison of the two, the MFCC system uses the same corpus as the auxiliary system and omits the front-end RASTA filtering step. As Fig. 4 shows, the pitch auxiliary system performs worse than the baseline: in terms of equal error rate (EER), the pitch auxiliary system reaches 25.00%, against 14.84% for the MFCC system. Because the pitch system lags far behind the MFCC system based on vocal-tract parameters, it is used only as a compensating auxiliary system.
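A small sketch of the pitch-feature construction described above (grouping the pitches of 3 consecutive frames and discarding groups containing unvoiced frames); the function name is illustrative:

```python
import numpy as np

def pitch_feature_vectors(f0, group=3):
    """Stack per-frame F0 values into 3-dimensional feature vectors.

    Unvoiced frames carry F0 = 0; any group in which some dimension is 0
    is discarded, as described above.
    """
    usable = len(f0) - len(f0) % group
    vecs = np.asarray(f0[:usable], dtype=float).reshape(-1, group)
    return vecs[(vecs != 0).all(axis=1)]
```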
In step S102, it is judged whether the first score is higher than a threshold.
In step S103, if so, the voice signal is identified according to the first score; otherwise, the flow proceeds to step S104.
In step S104, the voice signal is identified according to the fused value of the first score and the second score.
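A minimal sketch of the decision logic of steps S101-S104, assuming GMM-UBM log-likelihood-ratio scores (theoretical decision threshold 0, so score quality is judged by absolute value, as in the embodiments below) and the 0.8/0.2 weights used in the experiments; names and defaults are illustrative, not normative:

```python
def final_score(s_main, s_aux, threshold=0.0, w_main=0.8, w_aux=0.2):
    """Return the score used for the accept/reject decision.

    A main-system result whose absolute score exceeds the threshold is
    treated as high quality and used directly; the rest are improved by
    linear weighted fusion with the pitch auxiliary system.
    """
    if abs(s_main) > threshold:
        return s_main                           # protect a good main-system score
    return w_main * s_main + w_aux * s_aux      # Score = w1*s1 + w2*s2

# Accept the speaker if the final score is positive (GMM-UBM threshold 0).
accept = final_score(s_main=1.3, s_aux=-0.4) > 0
```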
The fusion of the first score and the second score is described in detail below.
There are two main approaches to compensating speaker verification performance with pitch features. One appends the L-dimensional pitch parameters to the D-dimensional MFCC vector to form a large (D+L)-dimensional vector used as the verification feature; although simple, this approach has difficulty handling silent and unvoiced frames, which carry no pitch, and it lacks a clear physical interpretation. Herein, compensation is therefore performed by system-level fusion: a main system based on MFCC parameters and an auxiliary system based on pitch features are built separately under the GMM-UBM framework, and the two systems are combined by linear weighted fusion at the score level.
1. Linear weighted score fusion
The traditional score-domain fusion method is shown in Fig. 5: the main system uses MFCC parameters and the auxiliary system uses pitch parameters. The voice signal obtains one score under each subsystem, and the scores are fused according to:

$$\mathrm{Score} = \sum_{n=1}^{N} \omega_n s_n$$

where Score is the fused score; with only one auxiliary system, the number of subsystems is N = 2; ω_n is the weight of subsystem n and s_n its score, the weights being chosen empirically.
2. GMM-UBM scoring strategy and threshold determination
Since speaker verification adopts the GMM-UBM framework, the log-likelihood ratio of the test speech is expressed as:

$$\mathrm{Score} = \frac{1}{T} \sum_{t=1}^{T} \left\{ \log[p(o_t \mid \lambda)] - \log[p(o_t \mid \mathrm{UBM})] \right\}$$

In the above formula, T is the total number of frames of the test speech, o_t is the feature vector of frame t, λ is the target speaker model, and UBM is the background model. The formula normalizes the score against the UBM background model. When the test speech comes from the target speaker, its output likelihood under the target speaker model λ should in theory exceed the output likelihood under the background model UBM; when the test speech comes from an impostor, the likelihood under the UBM should in theory exceed the output likelihood under the target speaker model λ. It can therefore be concluded that Score > 0 indicates the test speech comes from the target speaker, while Score < 0 indicates it comes from an impostor. The higher a target-trial score, the better the test speech matches the target speaker's feature distribution; the lower an impostor-trial score, the more strongly the target speaker model rejects the test speech. In other words, a higher target-trial score or a lower impostor-trial score marks a better test result, and the theoretical decision threshold of a speaker verification system under the GMM-UBM framework is 0.
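A minimal sketch of this log-likelihood-ratio scoring, assuming the target model and UBM are, for example, scikit-learn GaussianMixture objects (any model exposing a per-frame log-likelihood would do); training and MAP adaptation are outside the scope of this sketch:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def llr_score(frames, target_gmm: GaussianMixture, ubm: GaussianMixture) -> float:
    """Average per-frame log-likelihood ratio between target model and UBM.

    frames: array of shape (T, D) holding the feature vectors o_t.
    Positive values favor the target-speaker hypothesis, negative the impostor one.
    """
    # score_samples returns log p(o_t | model) for each frame.
    return float(np.mean(target_gmm.score_samples(frames) - ubm.score_samples(frames)))
```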
To examine the effect of the pitch auxiliary system on the fused scores, the main-system weight w_1 in the score-fusion formula above is set to 0.8 and the pitch auxiliary-system weight w_2 to 0.2, and 1560 impostor test utterances and 148 target test utterances are evaluated. The experiment finds that among the 1560 impostor utterances, 1329 score higher after fusion, and among the 148 target utterances, 110 score lower after fusion; that is, for most trials (84.25% of the total) the test score becomes less discriminative after fusion. As shown in Fig. 6, in the traditional linear weighted fusion method the pitch auxiliary system clearly exerts a negative effect on the final score and degrades the score distribution of the main system.
In addition, an embodiment of the present invention further provides a channel-robust speaker identification device based on feature-domain compensation. Referring to Fig. 7, a structural schematic diagram of one embodiment of this device, the speaker identification device shown in Fig. 7 comprises a scoring module 10, a judging module 20, and an identification module 30.
The scoring module 10 tests the voice signal with a main system based on MFCC parameters and an auxiliary system based on pitch parameters, obtaining a first score and a second score respectively. The judging module 20 judges whether the first score is higher than a threshold. The identification module 30 identifies the voice signal according to the first score if the absolute value of the first score is higher than the threshold, and according to the fused value of the first score and the second score if the absolute value of the first score is lower than the threshold.
The identification module 30 fuses according to the following formula:

$$\mathrm{Score} = \sum_{n=1}^{N} \omega_n s_n$$

where Score is the fused score, the number of subsystems N = 2, ω_n is the weight of subsystem n, and s_n is the first score of the main system or the second score of the auxiliary system.
In addition, in some embodiments the identification module 30 fuses according to the following formula:

$$\mathrm{Score} = \frac{1}{T} \sum_{t=1}^{T} \left\{ \log[p(o_t \mid \lambda)] - \log[p(o_t \mid \mathrm{UBM})] \right\}$$

where T is the total number of frames of the test speech, λ is the target speaker model, and UBM is the background model. In the above embodiments, the threshold is 0.
As the above embodiments show, the channel-robust speaker identification method and device based on feature-domain compensation of the embodiments of the present invention improve the compensation performance of pitch features for speaker verification by setting a threshold: among the test results of the main system, a result whose first score has an absolute value above the threshold is regarded as a high-quality result and is identified directly from the first score of the main system, while a result whose first score has an absolute value below the threshold is improved by fusion with the second score of the auxiliary system. The embodiments of the present invention thus effectively protect a good score distribution of the main system and improve the compensation effect of pitch feature parameters on speaker verification.
It should be appreciated that the invention is not limited to the above embodiments; any change or modification of the present invention that does not depart from its spirit and scope, provided it falls within the claims of the present invention and their technical equivalents, is also intended to be encompassed by the present invention.

Claims (8)

1. A channel-robust speaker identification method based on feature-domain compensation, characterized by comprising:
testing a voice signal with a main system based on MFCC parameters and an auxiliary system based on pitch parameters, obtaining a first score and a second score respectively;
judging whether the first score is higher than a threshold;
if so, identifying the voice signal according to the first score; otherwise, identifying the voice signal according to the fused value of the first score and the second score.
2. The channel-robust speaker identification method based on feature-domain compensation according to claim 1, characterized in that the step of identifying the voice signal according to the fusion result of the first score and the second score comprises:
fusing according to the following formula:

$$\mathrm{Score} = \sum_{n=1}^{N} \omega_n s_n$$

wherein Score is the fused score, the number of subsystems N = 2, ω_n is the weight of subsystem n, and s_n is the first score of the main system or the second score of the auxiliary system.
3. The channel-robust speaker identification method based on feature-domain compensation according to claim 1, characterized in that the step of identifying the voice signal according to the fusion result of the first score and the second score comprises:
fusing according to the following formula:

$$\mathrm{Score} = \frac{1}{T} \sum_{t=1}^{T} \left\{ \log[p(o_t \mid \lambda)] - \log[p(o_t \mid \mathrm{UBM})] \right\}$$

wherein T is the total number of frames of the test speech, λ is the target speaker model, and UBM is the background model.
4. The channel-robust speaker identification method based on feature-domain compensation according to claim 3, characterized in that the threshold is 0.
5. A channel-robust speaker identification device based on feature-domain compensation, characterized by comprising:
a scoring module, configured to test a voice signal with a main system based on MFCC parameters and an auxiliary system based on pitch parameters, obtaining a first score and a second score respectively;
a judging module, configured to judge whether the first score is higher than a threshold;
an identification module, configured to identify the voice signal according to the first score if the absolute value of the first score is higher than the threshold, and to identify the voice signal according to the fused value of the first score and the second score if the absolute value of the first score is lower than the threshold.
6. The channel-robust speaker identification device based on feature-domain compensation according to claim 5, characterized in that the identification device fuses according to the following formula:

$$\mathrm{Score} = \sum_{n=1}^{N} \omega_n s_n$$

wherein Score is the fused score, the number of subsystems N = 2, ω_n is the weight of subsystem n, and s_n is the first score of the main system or the second score of the auxiliary system.
7. The channel-robust speaker identification device based on feature-domain compensation according to claim 5, characterized in that the identification device fuses according to the following formula:

$$\mathrm{Score} = \frac{1}{T} \sum_{t=1}^{T} \left\{ \log[p(o_t \mid \lambda)] - \log[p(o_t \mid \mathrm{UBM})] \right\}$$

wherein T is the total number of frames of the test speech, λ is the target speaker model, and UBM is the background model.
8. The channel-robust speaker identification device based on feature-domain compensation according to claim 7, characterized in that the threshold is 0.
CN201610173039.2A 2016-03-23 2016-03-23 Channel robust speaker identification method and device based on characteristic domain compensation Pending CN105810198A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610173039.2A CN105810198A (en) 2016-03-23 2016-03-23 Channel robust speaker identification method and device based on characteristic domain compensation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610173039.2A CN105810198A (en) 2016-03-23 2016-03-23 Channel robust speaker identification method and device based on characteristic domain compensation

Publications (1)

Publication Number Publication Date
CN105810198A 2016-07-27

Family

ID=56454367

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610173039.2A Pending CN105810198A (en) 2016-03-23 2016-03-23 Channel robust speaker identification method and device based on characteristic domain compensation

Country Status (1)

Country Link
CN (1) CN105810198A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2028647A1 (en) * 2007-08-24 2009-02-25 Deutsche Telekom AG Method and device for speaker classification
CN101178897A (en) * 2007-12-05 2008-05-14 浙江大学 Speaking man recognizing method using base frequency envelope to eliminate emotion voice
CN101241699A (en) * 2008-03-14 2008-08-13 北京交通大学 A speaker identification system for remote Chinese teaching
CN102664010A (en) * 2012-05-04 2012-09-12 山东大学 Robust speaker distinguishing method based on multifactor frequency displacement invariant feature
CN104240706A (en) * 2014-09-12 2014-12-24 浙江大学 Speaker recognition method based on GMM Token matching similarity correction scores

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
花城 (Hua Cheng): "Research on Feature-Domain Compensation Methods for Speaker Verification" (说话人确认的特征域补偿方法研究), China Master's Theses Full-text Database, Information Science and Technology series *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108510991A (en) * 2018-03-30 2018-09-07 厦门大学 Utilize the method for identifying speaker of harmonic series
CN109308894A (en) * 2018-09-26 2019-02-05 中国人民解放军陆军工程大学 One kind being based on the pronunciation modeling method of Bloomfield ' s model


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2016-07-27)