WO2017162053A1 - Identity authentication method and device - Google Patents


Info

Publication number
WO2017162053A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
segmentation
score
target
hmm
Application number
PCT/CN2017/076336
Other languages
French (fr)
Chinese (zh)
Inventor
朱长宝
李欢欢
袁浩
王金明
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2017162053A1


Classifications

    • G10L 15/02 — Speech recognition; Feature extraction for speech recognition; Selection of recognition unit
    • G10L 15/04 — Speech recognition; Segmentation; Word boundary detection
    • G10L 15/142 — Speech classification or search using statistical models; Hidden Markov Models [HMMs]
    • G10L 17/00 — Speaker identification or verification
    • G10L 17/24 — Interactive procedures; Man-machine interfaces; the user being prompted to utter a password or a predefined phrase
    • H04L 9/3226 — Entity authentication using a predetermined code, e.g. password, passphrase or PIN
    • H04L 9/3231 — Entity authentication using biological data, e.g. fingerprint, voice or retina

Definitions

  • adjacent pairs of initial segmentation points are selected in turn as the start and end of an interval; within the interval, the average energy is computed in units of a specified number of frames, a point at which the average energy increases a specified number of consecutive times is located, and the point where the increase begins is taken as a new initial segmentation point (otherwise, the initial segmentation point is not updated); the initial segmentation units are the segments delimited by the initial segmentation points.
  • performing forced segmentation on the initial segmentation units so that the total number of segmentation units equals the preset number of target texts.
  • the voiceprint matching module is configured to match the target voice with the target voiceprint model to obtain a first voiceprint score, and match the non-target voice with the target voiceprint model to obtain a second voiceprint score;
  • the processing module is further configured to: select each target text model in turn and match voice features of non-target texts against the corresponding target text model to obtain impostor text scores, and compute the mean and standard deviation of the impostor text scores for that target text model; subtract the corresponding impostor-score mean from the first text scores and the second text scores and divide by the standard deviation, obtaining regularized text scores; combine the regularized first text scores with the first voiceprint scores and obtain the maximum and minimum values for each target text; and use these maxima and minima to normalize the regularized scores.
  • the text matching module matches the voice features of each segmentation unit against all the target text models to obtain a unit text matching score for each segmentation unit and each target text model: the voice features of each segmentation unit are used as input to each target text hidden Markov model (HMM), and the output probability obtained with the Viterbi algorithm is taken as the corresponding unit text matching score.
  • step d: combine the voiceprint scores and the regularized text scores, obtain the maximum and minimum values for each target text, and use these maxima and minima to normalize the voiceprint and text scores.
  • Target speaker: a person trusted by the system, who needs to pass voiceprint authentication.
  • Step 110: a decision is made by applying the integrated decision classifier to the input feature vector new_score. For each input the output is 1 or 0: when the output is 1 the test voice is accepted, and when it is 0 the test voice is rejected.

Abstract

An identity authentication method, comprising: acquiring a voice feature of an input voice and matching the voice feature against a pre-stored target voiceprint model to obtain a voiceprint matching score (11); segmenting the input voice according to the voice feature and a target text model, and acquiring initial segmentation units and the number of initial voice segmentation units (12); if the number of initial voice segmentation units is greater than or equal to a first threshold, performing forced segmentation on the initial segmentation units so that the total number of segmentation units equals the preset number of target texts; matching the voice feature of each segmentation unit against every target text model to obtain a unit text matching score for each segmentation unit and each target text model (13); and performing identity authentication according to the unit text matching scores, the voiceprint matching score, and a pre-trained probabilistic neural network (PNN) classifier (14). The method achieves two-factor authentication of a user and increases system security.

Description

Method and device for identity authentication

Technical field

This document relates to, but is not limited to, the technical field of dynamic biometric security authentication, and in particular to a method and device for identity authentication.

Background art

With the continuous development of Internet information technology, online business and e-commerce are increasingly prosperous, people are ever more closely connected to computer networks, and a variety of network security threats have followed; protecting users' personal information has become an urgent problem. Dynamic voiceprint password recognition combines two authentication technologies, speaker recognition and speech recognition, so it can effectively prevent recording attacks and greatly enhance system security. Typically, after receiving a user's voice containing the password, the system first computes separate scores for the voiceprint and the dynamic password, then either compares each score with its own threshold, or fuses the two scores and compares the result with a combined threshold; if the score exceeds the preset threshold the requester is admitted to the protected system, otherwise entry is refused. In practice, however, owing to environmental influences, the distributions of speaker voiceprint matching scores and text matching scores often differ, and judging only against preset thresholds loses accuracy.
Summary of the invention

The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of the claims.
An embodiment of the present invention provides an identity authentication method, including:

acquiring a voice feature of an input voice, and matching the voice feature against a pre-stored target voiceprint model to obtain a voiceprint matching score;

segmenting the input voice according to the voice feature and a preset target text model, and acquiring initial segmentation units and the number of initial voice segmentation units; if the number of initial voice segmentation units is smaller than a first threshold, determining that the input voice is illegal; if the number of initial voice segmentation units is greater than or equal to the first threshold, performing forced segmentation on the initial segmentation units so that the total number of segmentation units equals the preset number of target texts;

matching the voice feature of each segmentation unit against all the target text models to obtain a unit text matching score for each segmentation unit and each target text model;

performing identity authentication according to the unit text matching scores, the voiceprint matching score, and a pre-trained probabilistic neural network (PNN) classifier.
Optionally, the PNN classifier is trained in the following manner:

matching target voices against the target text models and the target voiceprint model to obtain first text scores and first voiceprint scores respectively, and combining the first text scores and first voiceprint scores into the acceptance feature information of the decision classifier;

matching non-target voices against the target text models and the target voiceprint model to obtain second text scores and second voiceprint scores respectively, and combining the second text scores and second voiceprint scores into the rejection feature information of the decision classifier;

training the PNN classifier with the acceptance feature information and the rejection feature information.
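The training step above can be sketched as a minimal probabilistic neural network, i.e. a Parzen-window classifier with one Gaussian kernel per training sample. The two-class layout, feature dimensionality, and smoothing parameter `sigma` below are illustrative assumptions, not taken from the patent:

```python
import numpy as np

class PNN:
    """Minimal probabilistic neural network (Parzen-window classifier).

    Pattern layer: one Gaussian kernel per training sample.
    Summation layer: per-class mean of the kernel activations.
    Output layer: argmax over the class densities (1 = accept, 0 = reject).
    """

    def __init__(self, sigma=0.1):
        self.sigma = sigma  # kernel width (illustrative default)

    def fit(self, accept_feats, reject_feats):
        # Feature vectors are e.g. [regularized text score, voiceprint score].
        self.classes = [np.asarray(reject_feats, float),   # label 0
                        np.asarray(accept_feats, float)]   # label 1
        return self

    def predict(self, x):
        x = np.asarray(x, float)
        densities = []
        for samples in self.classes:
            d2 = np.sum((samples - x) ** 2, axis=1)
            densities.append(np.mean(np.exp(-d2 / (2.0 * self.sigma ** 2))))
        return int(np.argmax(densities))
```

Training it with acceptance features from target speech and rejection features from impostor speech mirrors the procedure above; no weight optimization is involved, which is why the patent can train the classifier directly from the score vectors.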
Optionally, before the PNN classifier is trained with the acceptance feature information and the rejection feature information, the method further includes performing score regularization on the voiceprint scores and text scores of the target voices and the non-target voices, including:

selecting each target text model in turn and matching voice features of non-target texts against the corresponding target text model to obtain impostor text scores, and computing the mean and standard deviation of the impostor text scores for that target text model;

subtracting the corresponding impostor-score mean from the first text scores and the second text scores and dividing by the standard deviation, to obtain the regularized text scores;

combining the regularized first text scores with the first voiceprint scores, obtaining the maximum and minimum values for each target text, and normalizing the regularized first text scores and first voiceprint scores with these maxima and minima as the acceptance feature information of the PNN classifier;

combining the regularized second text scores with the second voiceprint scores, obtaining the maximum and minimum values for each target text, and normalizing the regularized second text scores and second voiceprint scores with these maxima and minima as the rejection feature information of the PNN classifier.
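The regularization steps above amount to a z-norm against impostor statistics followed by min-max scaling. A minimal sketch, assuming scores are plain floating-point arrays (the function names are ours):

```python
import numpy as np

def znorm_text_scores(text_scores, impostor_mean, impostor_std):
    """Regularize raw text scores with per-target-text impostor statistics:
    subtract the impostor-score mean and divide by its standard deviation."""
    return (np.asarray(text_scores, float) - impostor_mean) / impostor_std

def minmax_normalize(scores, lo=None, hi=None):
    """Map scores into [0, 1] using the minimum/maximum obtained per
    target text (here defaulted to the observed extremes)."""
    scores = np.asarray(scores, float)
    lo = scores.min() if lo is None else lo
    hi = scores.max() if hi is None else hi
    return (scores - lo) / (hi - lo)
```

The z-norm makes text scores comparable across target texts despite differing score distributions, and the min-max step puts text and voiceprint scores on a common [0, 1] scale before they are fed to the classifier.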
Optionally, segmenting the input voice according to the voice feature and the preset target text model to obtain the initial segmentation units includes:

combining the corresponding target text hidden Markov models (HMMs) into a first composite HMM according to the target text sequence in the target password;

performing Viterbi decoding with the voice feature as the input of the first composite HMM to obtain a first state output sequence, and taking as initial segmentation points the positions in the first state output sequence whose state index is an integer multiple of the number of states of a single target text HMM;

selecting adjacent pairs of initial segmentation points in turn as the start and end of an interval; within the interval, computing the average energy in units of a specified number of frames, looking for a point at which the average energy increases a specified number of consecutive times, and taking the point where the increase begins as a new initial segmentation point; the initial segmentation units are the segments delimited by the initial segmentation points.
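The energy-based refinement of one initial segmentation point might look like the sketch below. The window size `win` and the required number of consecutive rises `rises` stand in for the "specified" values in the text, and the exact frame returned for "the point where the increase begins" is our assumption:

```python
import numpy as np

def refine_cut_point(frame_energy, start, end, win=5, rises=3):
    """Between two adjacent initial cut points [start, end), average the
    frame energy over windows of `win` frames and look for `rises`
    consecutive increases; the first frame of the window where the rise
    begins becomes the new cut point. If no sustained rise is found, the
    initial cut point is not updated."""
    n_win = (end - start) // win
    if n_win < rises + 1:
        return start
    avg = [np.mean(frame_energy[start + i * win : start + (i + 1) * win])
           for i in range(n_win)]
    run = 0
    for i in range(1, n_win):
        run = run + 1 if avg[i] > avg[i - 1] else 0
        if run == rises:
            return start + (i - rises + 1) * win  # window where the rise began
    return start  # no sustained rise: keep the old cut point
```

Moving the cut point to where energy starts to climb pulls the boundary away from trailing silence onto the onset of the next spoken unit.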
Optionally, combining the corresponding target text HMMs into the first composite HMM includes:

the number of states of the first composite HMM is the sum of the numbers of states of the single target text HMMs, and the Gaussian mixture model parameters of each state of the first composite HMM are identical to those of the corresponding state of the single target text HMM;

the self-transition probability of the last state in the state transition matrix of each single target text HMM is set to 0 and its transition probability to the next state is set to 1, while the state transition probability matrix of the last single target text HMM of the target text is left unchanged;

the state transition probability matrices of the single target text HMMs are merged in the order in which the single target texts appear in the target text, yielding the state transition probability matrix of the composite HMM.
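A sketch of the transition-matrix merge described above, assuming each per-text HMM is represented here only by its transition matrix (the per-state GMM parameters are carried over unchanged and omitted):

```python
import numpy as np

def compose_transition_matrix(trans_mats):
    """Build the composite HMM transition matrix from per-text HMM
    transition matrices, merged block-diagonally in password order.
    For every model except the last, the final state's self-transition
    is set to 0 and its transition into the first state of the next
    model is set to 1; the last model is left unchanged."""
    sizes = [m.shape[0] for m in trans_mats]
    A = np.zeros((sum(sizes), sum(sizes)))
    off = 0
    for idx, m in enumerate(trans_mats):
        n = m.shape[0]
        A[off:off + n, off:off + n] = m
        if idx < len(trans_mats) - 1:       # not the last target text
            A[off + n - 1, off + n - 1] = 0.0  # drop the final self-loop
            A[off + n - 1, off + n] = 1.0      # bridge to the next model
        off += n
    return A
```

Forcing the bridge transition to probability 1 makes a Viterbi pass through the composite model visit every target text in order, which is what lets the state sequence be cut at multiples of the single-model state count.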
Optionally, performing forced segmentation on the initial segmentation units so that the total number of segmentation units equals the preset number of target texts includes:

selecting the initial segmentation unit with the longest feature segment for forced segmentation, so that the total number of segmentation units after forced segmentation equals the preset number of target texts.
Optionally, performing forced segmentation on the initial segmentation units so that the total number of segmentation units equals the preset number of target texts includes:

starting forced splitting in order of initial segmentation unit length from longest to shortest, each time splitting one initial segmentation unit evenly into two segments, until the total number of segmentation units after splitting equals the number of target texts;

if the number of forced segmentations is greater than or equal to a second threshold, the forced segmentation ends; if the number of forced segmentations is smaller than the second threshold, each current segmentation unit is matched and scored against each target text hidden Markov model (HMM), the target text HMM with the highest score is selected for each unit, and the selected target text HMMs are combined into a second composite HMM; Viterbi decoding is performed with the voice feature as the input of the second composite HMM to obtain a second state output sequence, and the positions in the second state output sequence whose state index is an integer multiple of the number of states of a single target text HMM are taken as segmentation points; the segments into which these points divide the voice feature are the segmentation units; if the current number of segmentation units is smaller than a third threshold, the current segmentation units are taken as the initial segmentation units and forced segmentation continues; if the current number of segmentation units is greater than or equal to the third threshold, the forced segmentation ends.
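The length-ordered halving in the first part of this procedure can be sketched as follows; representing units as (start, end) frame intervals is our assumption, and the threshold-controlled re-decoding loop is omitted:

```python
def force_split(units, target_count):
    """Split the longest unit in half, repeatedly, until the number of
    units equals the number of target texts. Each unit is a
    (start, end) frame interval."""
    units = sorted(units)
    while len(units) < target_count:
        # Pick the currently longest unit (longest-first order).
        i = max(range(len(units)), key=lambda k: units[k][1] - units[k][0])
        s, e = units[i]
        mid = (s + e) // 2
        units[i:i + 1] = [(s, mid), (mid, e)]  # replace it by two halves
        units.sort()
    return units
```

Because the longest unit is the one most likely to contain two merged password digits, halving it first is a cheap heuristic before the more expensive re-decoding with the second composite HMM.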
Optionally, matching the voice feature of each segmentation unit against all the target text models to obtain a unit text matching score for each segmentation unit and each target text model includes:

using the voice feature of each segmentation unit as the input of each target text hidden Markov model (HMM), and taking the output probability obtained with the Viterbi algorithm as the corresponding unit text matching score.
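Scoring one segmentation unit against one target HMM is a standard Viterbi best-path computation; a sketch in the log domain, assuming log-space emission, transition, and initial probabilities are already available:

```python
import numpy as np

def viterbi_score(log_emis, log_trans, log_init):
    """Text matching score of one segmentation unit against one target
    HMM: the Viterbi (best-path) log output probability.

    log_emis:  (T, N) log emission probability of each frame per state
    log_trans: (N, N) log state-transition probabilities
    log_init:  (N,)   log initial-state probabilities
    """
    T, N = log_emis.shape
    delta = log_init + log_emis[0]          # best score ending in each state
    for t in range(1, T):
        # Best predecessor per state, then add the frame's emission term.
        delta = np.max(delta[:, None] + log_trans, axis=0) + log_emis[t]
    return float(np.max(delta))
```

Computing this score for every (unit, target text) pair yields the score matrix that the m-best candidate-text check in the decision step operates on.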
Optionally, performing identity authentication according to the unit text matching scores, the voiceprint matching score, and the pre-trained decision classifier includes:

taking, for each segmentation unit, the texts corresponding to the m highest of its unit text matching scores as candidate texts; if the candidate texts contain the target text corresponding to the segmentation unit, the segmentation unit passes authentication; counting the total number of segmentation units that pass: if this total is smaller than or equal to a fourth threshold, text authentication fails and identity authentication fails; if the total is greater than the fourth threshold, text authentication of the input voice passes;

determining whether the voiceprint matching score is greater than a fifth threshold; if so, voiceprint authentication passes and identity authentication passes; if not, performing score regularization on the text scores of each segmentation unit against its corresponding target text model and on the voiceprint matching score, and feeding the regularized scores to the decision classifier for identity authentication.
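The decision cascade above (per-unit m-best text check, a voiceprint threshold, then the classifier as a fallback) can be sketched as follows; the threshold values and the score-dictionary representation are illustrative assumptions, and `pnn_fallback` stands in for score regularization plus the trained classifier:

```python
def authenticate(unit_scores, target_texts, voiceprint_score,
                 m=3, pass_threshold=4, vp_threshold=0.8,
                 pnn_fallback=None):
    """Decision cascade for one input voice.

    unit_scores:  list of {candidate_text: score} dicts, one per unit
    target_texts: expected target text per unit, same order
    """
    passed = 0
    for scores, target in zip(unit_scores, target_texts):
        m_best = sorted(scores, key=scores.get, reverse=True)[:m]
        if target in m_best:                # unit passes text authentication
            passed += 1
    if passed <= pass_threshold:            # text authentication failed
        return False
    if voiceprint_score > vp_threshold:     # voiceprint passes outright
        return True
    # Borderline voiceprint: defer to the trained decision classifier.
    if pnn_fallback is not None:
        return bool(pnn_fallback(unit_scores, voiceprint_score))
    return False
```

Only the borderline cases reach the classifier, which is what lets the PNN absorb the environment-dependent score-distribution shifts that fixed thresholds handle poorly.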
An embodiment of the present invention further provides an identity authentication device, comprising a probabilistic neural network (PNN) classifier and including:

a voiceprint matching module, configured to acquire a voice feature of an input voice and match the voice feature against a pre-stored target voiceprint model to obtain a voiceprint matching score;

a segmentation module, configured to segment the input voice according to the voice feature and a preset target text model, acquire initial segmentation units and the number of initial voice segmentation units, determine that the input voice is illegal if the number of initial voice segmentation units is smaller than a first threshold, and perform forced segmentation on the initial segmentation units if the number of initial voice segmentation units is greater than or equal to the first threshold, so that the total number of segmentation units equals the preset number of target texts;

a text matching module, configured to match the voice feature of each segmentation unit against all the target text models to obtain a unit text matching score for each segmentation unit and each target text model;

an authentication module, configured to perform identity authentication according to the unit text matching scores, the voiceprint matching score, and the pre-trained PNN classifier.
Optionally, the device further includes a processing module;

the voiceprint matching module is configured to match target voices against the target voiceprint model to obtain first voiceprint scores, and match non-target voices against the target voiceprint model to obtain second voiceprint scores;

the text matching module is configured to match the target voices against the target text models to obtain first text scores, and match the non-target voices against the target text models to obtain second text scores;

the processing module is configured to combine the first text scores and first voiceprint scores into the acceptance feature information of the PNN classifier, and combine the second text scores and second voiceprint scores into the rejection feature information of the PNN classifier;

the PNN classifier is trained with the acceptance feature information and the rejection feature information.
Optionally, the processing module is further configured to: select each target text model in turn, match voice features of non-target texts against the corresponding target text model to obtain impostor text scores, and compute the mean and standard deviation of the impostor text scores for that target text model; subtract the corresponding impostor-score mean from the first text scores and the second text scores and divide by the standard deviation, to obtain the regularized text scores; combine the regularized first text scores with the first voiceprint scores, obtain the maximum and minimum values for each target text, and normalize the regularized first text scores and first voiceprint scores with these maxima and minima as the acceptance feature information of the PNN classifier; and combine the regularized second text scores with the second voiceprint scores, obtain the maximum and minimum values for each target text, and normalize the regularized second text scores and second voiceprint scores with these maxima and minima as the rejection feature information of the PNN classifier.
Optionally, the segmentation module segments the input voice according to the voice feature and the preset target text model and obtains the initial segmentation units by: combining the corresponding target text hidden Markov models (HMMs) into a first composite HMM according to the target text sequence in the target password; performing Viterbi decoding with the voice feature as the input of the first composite HMM to obtain a first state output sequence, and taking as initial segmentation points the positions in the first state output sequence whose state index is an integer multiple of the number of states of a single target text HMM; and selecting adjacent pairs of initial segmentation points in turn as the start and end of an interval, computing within the interval the average energy in units of a specified number of frames, looking for a point at which the average energy increases a specified number of consecutive times, and taking the point where the increase begins as a new initial segmentation point, the initial segmentation units being the segments delimited by the initial segmentation points.
Optionally, the segmentation module combines the corresponding target text HMMs into the first composite HMM by: making the number of states of the first composite HMM equal to the sum of the numbers of states of the single target text HMMs, with the Gaussian mixture model parameters of each state of the first composite HMM identical to those of the corresponding state of the single target text HMM; setting the self-transition probability of the last state in the state transition matrix of each single target text HMM to 0 and its transition probability to the next state to 1, leaving the state transition probability matrix of the last single target text HMM of the target text unchanged; and merging the state transition probability matrices of the single target text HMMs in the order in which the single target texts appear in the target text, yielding the state transition probability matrix of the composite HMM.
Optionally, the segmentation module performs forced segmentation on the initial segmentation units so that the total number of segmentation units equals the preset number of target texts by: selecting the initial segmentation unit with the longest feature segment for forced segmentation, so that the total number of segmentation units after forced segmentation equals the preset number of target texts.
Optionally, the segmentation module performs forced segmentation on the initial segmentation units so that the total number of segmentation units equals the preset number of target texts by: starting splitting in order of initial segmentation unit length from longest to shortest, each time splitting one initial segmentation unit evenly into two segments, until the total number of units after splitting equals the number of target texts; if the number of forced segmentations is greater than or equal to a second threshold, the forced segmentation ends; if the number of forced segmentations is smaller than the second threshold, each current segmentation unit is matched and scored against each target text hidden Markov model (HMM), the target text HMM with the highest score is selected for each unit, and the selected target text HMMs are combined into a second composite HMM; Viterbi decoding is performed with the voice feature as the input of the second composite HMM to obtain a second state output sequence, and the positions in the second state output sequence whose state index is an integer multiple of the number of states of a single target text HMM are taken as segmentation points; the segments into which these points divide the voice feature are the segmentation units; if the current number of segmentation units is smaller than a third threshold, the current segmentation units are taken as the initial segmentation units and forced segmentation continues; if the current number of segmentation units is greater than or equal to the third threshold, the forced segmentation ends.
可选地,所述文本匹配模块,将每个所述切分单元的语音特征与所有所述目标文本模型进行匹配,得到每个所述切分单元与每个所述目标文本模型的切分单元文本匹配分数,包括:将每个所述切分单元的语音特征作为每个目标文本隐马尔可夫模型HMM的输入,将根据维特比算法获得的输出概率作为对应的切分单元文本匹配分数。Optionally, the text matching module matches the speech features of each segmentation unit against all the target text models to obtain a unit text matching score for each segmentation unit against each target text model, including: taking the speech features of each segmentation unit as the input of each target-text hidden Markov model (HMM), and taking the output probability obtained by the Viterbi algorithm as the corresponding unit text matching score.
可选地,所述认证模块,根据所述切分单元文本匹配分数、所述声纹匹配分数和预先训练的判决分类器进行身份认证,包括:取每个所述切分单元对应的所述切分单元文本匹配分数中m个最高分数对应的文本作为待选文本,若所述待选文本中包含所述切分单元对应的目标文本,则所述切分单元认证通过,计算通过的切分单元的总数,若通过的切分单元总数小于或等于第四阈值,则文本认证不通过,身份认证不通过;若通过的切分单元总数大于所述第四阈值,则所述输入语音的文本认证通过;判断所述声纹匹配分数是否大于第五阈值,如是,则声纹认证通过,身份认证通过;如不是,则将每个所述切分单元与对应目标文本模型的文本打分以及所述声纹匹配分数进行得分规整,将规整后的打分作为所述PNN分类器的输入进行身份认证。Optionally, the authentication module performs identity authentication according to the unit text matching scores, the voiceprint matching score, and a pre-trained decision classifier, including: for each segmentation unit, taking the texts corresponding to the m highest unit text matching scores as candidate texts; if the candidate texts contain the target text corresponding to that unit, the unit passes authentication. The total number of passing units is counted: if it is less than or equal to a fourth threshold, text authentication fails and identity authentication fails; if it is greater than the fourth threshold, text authentication of the input speech passes. It is then determined whether the voiceprint matching score is greater than a fifth threshold; if so, voiceprint authentication passes and identity authentication passes; if not, the text score of each segmentation unit against its corresponding target text model and the voiceprint matching score are score-regularized, and the regularized scores are used as the input of the PNN classifier for identity authentication.
本发明实施例还提供了一种计算机可读存储介质,存储有计算机可执行指令,所述计算机可执行指令用于上述的一种身份认证的方法。The embodiment of the invention further provides a computer readable storage medium storing computer executable instructions for the above method for identity authentication.
综上,本发明实施例提供一种身份认证的方法及装置,将声纹与动态密码认证两者相结合,实现了对用户进行双重验证的目的,提高了系统的安全性、可靠性和准确性。In summary, the embodiments of the present invention provide a method and device for identity authentication that combine voiceprint and dynamic password authentication, achieving dual verification of the user and improving the security, reliability and accuracy of the system.
附图说明 BRIEF DESCRIPTION OF THE DRAWINGS
图1是本发明实施例提供的一种身份认证的方法的流程图;FIG. 1 is a flowchart of a method for identity authentication according to an embodiment of the present invention;
图2是本发明实施例的训练PNN分类器的方法的流程图;2 is a flow chart of a method of training a PNN classifier according to an embodiment of the present invention;
图3是本发明实施例一的一种身份认证的方法的流程图;3 is a flowchart of a method for identity authentication according to Embodiment 1 of the present invention;
图4是本发明实施例一的语音信号初始切分的方法的流程图;4 is a flowchart of a method for initial segmentation of a speech signal according to Embodiment 1 of the present invention;
图5是本发明实施例一的声纹与文本初步认证的方法的流程图;FIG. 5 is a flowchart of a method for initial authentication of voiceprint and text according to Embodiment 1 of the present invention; FIG.
图6是本发明实施例一的得分规整的方法的流程图;6 is a flowchart of a method for score regularization according to Embodiment 1 of the present invention;
图7是本发明实施例二的一种身份认证的方法的流程图;7 is a flowchart of a method for identity authentication according to Embodiment 2 of the present invention;
图8是本发明实施例二的语音信号初始切分的方法的流程图;8 is a flowchart of a method for initial segmentation of a voice signal according to Embodiment 2 of the present invention;
图9为本发明实施例的一种身份认证的装置的示意图。FIG. 9 is a schematic diagram of an apparatus for identity authentication according to an embodiment of the present invention.
具体实施方式 DETAILED DESCRIPTION
下文中将结合附图对本发明的实施例进行详细说明。需要说明的是,在不冲突的情况下,本申请中的实施例及实施例中的特征可以相互任意组合。Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be noted that, in the case of no conflict, the features in the embodiments and the embodiments in the present application may be arbitrarily combined with each other.
图1是本发明实施例提供的一种身份认证的方法的流程图,如图1所示,本实施例的方法包括以下步骤: FIG. 1 is a flowchart of a method for identity authentication according to an embodiment of the present invention. As shown in FIG. 1 , the method in this embodiment includes the following steps:
步骤11、获取输入语音的语音特征,将所述语音特征与预存的目标声纹模型进行匹配,得到声纹匹配分数;Step 11: Acquire a voice feature of the input voice, and match the voice feature with the pre-stored target voiceprint model to obtain a voiceprint matching score;
步骤12、根据所述语音特征和预设的目标文本模型对所述输入语音进行切分,获取初始切分单元以及初始语音切分单元的个数,如所述初始语音切分单元的个数小于第一阈值,则判定所述输入语音为非法语音,结束流程;如所述初始语音切分单元的个数大于或等于第一阈值,则对所述初始切分单元进行强制切分,使得切分单元的总个数与预设的目标文本的个数相同;Step 12: Segment the input speech according to the speech features and a preset target text model to obtain initial segmentation units and the number of initial speech segmentation units. If the number of initial speech segmentation units is less than a first threshold, the input speech is determined to be illegitimate and the process ends; if the number is greater than or equal to the first threshold, forced segmentation is performed on the initial segmentation units so that the total number of segmentation units equals the preset number of target texts;
步骤13、将每个所述切分单元的语音特征与所有所述目标文本模型进行匹配,得到每个所述切分单元与每个所述目标文本模型的切分单元文本匹配分数;Step 13: Matching the voice features of each of the segmentation units with all the target text models to obtain a segmentation unit text matching score of each of the segmentation units and each of the target text models;
步骤14、根据所述切分单元文本匹配分数、所述声纹匹配分数和预先训练的PNN(Probabilistic neural networks,概率神经网络)分类器进行身份认证。 Step 14. Perform identity authentication according to the segmentation unit text matching score, the voiceprint matching score, and a pre-trained PNN (Probabilistic Neural Networks) classifier.
本发明实施例提供的一种身份认证方法,将声纹与动态密码认证两者相结合,实现了对用户进行双重验证的目的,提高了系统的安全性、可靠性和准确性。An identity authentication method provided by the embodiment of the invention combines voiceprint and dynamic password authentication to achieve the purpose of double verification for the user, and improves the security, reliability and accuracy of the system.
本实施例中,需要预先对PNN分类器进行训练,根据已有的语音获取目标文本模型和目标声纹模型;将已有语音与所述目标文本模型和目标声纹模型进行匹配得到文本打分和声纹打分,根据所述声纹打分和文本打分组合成接受特征信息和拒绝特征信息,将所述接受特征信息和所述拒绝特征信息作为综合PNN判决分类器的输入进行训练,得到最终的综合判决分类器;实现方式如下:In this embodiment, the PNN classifier needs to be trained in advance: a target text model and a target voiceprint model are obtained from existing speech; the existing speech is matched against the target text model and target voiceprint model to obtain text scores and voiceprint scores; acceptance feature information and rejection feature information are composed from the voiceprint scores and text scores; and the acceptance feature information and rejection feature information are used as the input for training the integrated PNN decision classifier, yielding the final integrated decision classifier. This is implemented as follows:
将目标语音与所述目标文本模型和目标声纹模型进行匹配分别得到第一文本打分和第一声纹打分,将所述第一文本打分和第一声纹打分组合成为所述判决分类器的接受特征信息;Matching the target speech with the target text model and the target voiceprint model to obtain a first text score and a first voice score, respectively, and combining the first text score and the first voice score into the decision classifier. Accept feature information;
将非目标语音与所述目标文本模型和目标声纹模型进行匹配分别得到第二文本打分和第二声纹打分,将所述第二文本打分和第二声纹打分组合成为所述判决分类器的拒绝特征信息; Matching the non-target speech with the target text model and the target voiceprint model to obtain a second text score and a second voice score, respectively, and combining the second text score and the second voice score into the decision classifier Rejection characteristic information;
根据所述接受特征信息和所述拒绝特征信息对所述PNN分类器进行训练。The PNN classifier is trained according to the acceptance feature information and the rejection feature information.
所述目标语音为所述目标话者读取所述目标文本的语音,所述非目标语音为所述目标话者读取非目标文本的语音以及非目标话者的语音。The target voice is a voice of the target speaker reading the target text, and the non-target voice is a voice of the target speaker reading non-target text and a voice of a non-target speaker.
可选地,在训练所述综合分类器之前对所述声纹打分和文本打分进行得分规整,例如包括以下步骤:Optionally, the voice score and the text score are scored before the training of the integrated classifier, for example, including the following steps:
a.依次选取目标文本模型,取非目标文本语音特征与该目标文本模型匹配,得到冒认文本打分;a. Select the target text model in turn, and take the non-target text speech feature to match the target text model, and obtain the fake text score;
b.求所述目标文本模型对应的所述冒认文本打分均值及标准差;b. Find the mean value and standard deviation of the falsified text corresponding to the target text model;
c.将所述第一文本打分和所述第二文本打分分别减去对应的所述冒认文本打分的均值且除以所述标准差,分别得到规整后的文本打分;c. Subtract the corresponding impostor text score mean from the first text scores and the second text scores respectively and divide by the standard deviation, to obtain the regularized text scores;
d.合并所述声纹打分和规整后的文本打分,求得每一目标文本对应的最大值和最小值,利用步骤d中的所述最大值和最小值将所述声纹打分和文本打分进行归一化;例如:d. Combining the voice scores and the regularized text scores, obtaining the maximum and minimum values corresponding to each target text, and using the maximum and minimum values in step d to score the voiceprint scores and texts. Normalize; for example:
合并规整后的第一文本打分和所述第一声纹打分,获取每一目标文本对应的最大值和最小值;利用该最大值和最小值将规整后的第一文本打分和所述第一声纹打分进行归一化,作为所述PNN分类器的接受特征信息;Combining the normalized first text score and the first voiceprint score, obtaining a maximum value and a minimum value corresponding to each target text; using the maximum value and the minimum value to score the normalized first text and the first The voiceprint score is normalized as the acceptance feature information of the PNN classifier;
合并规整后的第二文本打分和所述第二声纹打分,获取每一目标文本对应的最大值和最小值;利用该最大值和最小值将规整后的第二文本打分和所述第二声纹打分进行归一化,作为所述PNN分类器的拒绝特征信息。Combining the normalized second text score and the second voiceprint score, obtaining a maximum value and a minimum value corresponding to each target text; using the maximum value and the minimum value to score the regular second text and the second The voiceprint score is normalized as the rejection feature information of the PNN classifier.
本实施例中为方便描述,做以下定义:For convenience of description in this embodiment, the following definitions are made:
目标文本:事先选定作为备选密码的文本,如0~9数字;Target text: text selected as an alternate password in advance, such as 0-9 digits;
目标话者:系统受信任的话者,在声纹认证时需要让其通过的话者;Target speaker: A person who is trusted by the system, who needs to pass the voiceprint authentication;
冒认话者:系统非受信任话者,在声纹认证时需要拒绝其进入的话者;Impostor speaker: a speaker not trusted by the system, who must be rejected during voiceprint authentication;
目标密码:系统受信任的目标文本组合,在文本认证时需要让其通过;Target password: A combination of the target texts trusted by the system, which needs to be passed when the text is authenticated;
冒认密码:系统不受信任的文本组合,在文本认证时需要拒绝其进入的文本。 False password: A system-untrusted combination of text that needs to be rejected for text authentication.
系统进行认证之前,需要选择目标文本集,并针对目标文本集中的每个目标文本进行训练,得到目标文本模型集。以下实施例目标文本集选择为:0~9十个数字,目标模型集由0~9十个数字训练出来的模型组成,目标模型种类可以为HMM(Hidden Markov Model,隐马尔可夫模型)。为方便描述,动态密码均由0~9十个数字中的8个组成,即系统选择8个目标文本,作为目标密码。同时在系统进行认证之前,需要注册目标话者的声纹信息,通过训练生成声纹模型,并通过声纹模型和目标模型训练综合判决分类器,如图2所示包括如下步骤:Before the system performs authentication, it is necessary to select the target text set and train each target text in the target text set to obtain the target text model set. In the following embodiment, the target text set is selected as: 0 to 9 ten numbers, and the target model set is composed of 0 to 9 ten numbers trained models, and the target model type may be HMM (Hidden Markov Model). For convenience of description, the dynamic password is composed of 8 out of 0 to 9 ten digits, that is, the system selects 8 target texts as the target password. At the same time, before the system is authenticated, it is necessary to register the voiceprint information of the target speaker, generate a voiceprint model through training, and train the comprehensive decision classifier through the voiceprint model and the target model, as shown in FIG. 2, including the following steps:
步骤001:训练目标文本模型:使用0~9的数字录音训练单个数字的HMM,每个数字的模型称为目标文本模型,训练方法可使用现有的训练方法;Step 001: Training target text model: training a single digital HMM using digital recordings of 0-9, each digital model is called a target text model, and the training method can use an existing training method;
HMM是一个双重随机过程,一个过程用来描述短时平稳信号的时变性,另一个过程用来描述HMM模型的状态数与特征序列之间的对应关系。两个过程相互作用,不仅能够描述语音信号的动态特性,而且可以解决短时平稳信号之间的过渡问题。HMM is a double stochastic process, one process is used to describe the time-varying of the short-term stationary signal, and the other process is used to describe the correspondence between the state number of the HMM model and the feature sequence. The interaction of the two processes not only describes the dynamic characteristics of the speech signal, but also solves the transition problem between short-term stationary signals.
步骤002:注册目标话者声纹模型:系统在使用之前,事先注册目标话者声纹模型,目标话者即为系统受信任的话者,在认证时需要让其通过;Step 002: Register the target voiceprint model: Before the system is used, register the target voiceprint model in advance, and the target speaker is the system trusted speaker, and needs to pass it when authenticating;
步骤003:求接受特征:使用目标话者的目标文本对应的语音与其对应的HMM进行匹配,得到目标文本接受打分;使用目标话者的目标文本对应的语音与目标话者声纹模型进行打分,得到目标话者声纹接受打分;一系列的目标话者声纹接受打分和目标文本接受打分组成综合分类器的接受特征,对应综合分类器输出为1;Step 003: Compute acceptance features: the target speaker's speech for each target text is matched against the corresponding HMM to obtain target text acceptance scores; the same speech is scored against the target speaker's voiceprint model to obtain target voiceprint acceptance scores. A series of target voiceprint acceptance scores and target text acceptance scores form the acceptance features of the integrated classifier, with corresponding classifier output 1;
步骤004:求拒绝特征:使用目标文本对应的语音与非对应的HMM模型进行匹配,得到冒认文本的拒绝打分;使用冒认话者与目标声纹模型进行打分,得到冒认声纹拒绝打分,由一系列的冒认文本拒绝打分和冒认声纹拒绝打分组成综合分类器的拒绝特征,对应综合分类器输出为0;Step 004: Compute rejection features: speech for the target texts is matched against non-corresponding HMM models to obtain impostor text rejection scores; impostor speakers' speech is scored against the target voiceprint model to obtain impostor voiceprint rejection scores. A series of impostor text rejection scores and impostor voiceprint rejection scores form the rejection features of the integrated classifier, with corresponding classifier output 0;
步骤005:训练分类器:合并综合分类器的接受特征和拒绝特征,将合并后的特征进行得分规整(详见步骤109)后作为分类器的训练输入, 根据现有训练算法(如梯度下降算法)可得到综合分类器。Step 005: Train the classifier: combine the acceptance feature and the rejection feature of the integrated classifier, and perform the score regularization (see step 109) as the training input of the classifier. A comprehensive classifier can be obtained according to an existing training algorithm such as a gradient descent algorithm.
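As a rough illustration of how the trained integrated PNN classifier could produce the 1/0 decision described in steps 003-005, the sketch below implements a minimal Parzen-window PNN in Python. The function name, the kernel width `sigma`, and the toy feature dimension are illustrative assumptions, not details given by the patent:

```python
import math

def pnn_classify(x, accept_patterns, reject_patterns, sigma=0.1):
    """Parzen-window PNN decision: estimate each class's density as the
    mean of Gaussian kernels centred on that class's training patterns
    (the merged acceptance / rejection features), then pick the class
    with the larger density.  Returns 1 (accept) or 0 (reject)."""
    def class_density(patterns):
        total = 0.0
        for p in patterns:
            sq_dist = sum((xi - pi) ** 2 for xi, pi in zip(x, p))
            total += math.exp(-sq_dist / (2.0 * sigma * sigma))
        return total / len(patterns)

    return 1 if class_density(accept_patterns) >= class_density(reject_patterns) else 0
```

A toy 9-dimensional feature vector close to the acceptance patterns would classify as 1, and one close to the rejection patterns as 0.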
实施例一:Embodiment 1:
如图3所示,包括以下步骤:As shown in Figure 3, the following steps are included:
步骤101、预处理:根据短时能量和短时过零率,对用户输入的测试语音进行预处理,去掉语音中的非语音段;Step 101: Pre-processing: pre-processing the test voice input by the user according to the short-time energy and the short-time zero-crossing rate, and removing the non-speech segment in the voice;
步骤102、特征参数提取:对预处理后的测试语音进行特征参数提取,该系统可以采用12维梅尔频域倒谱系数(Mel Frequency Cepstrum Coefficient,简称MFCC)和其一阶差分系数作为特征参数,共24维;Step 102: Feature parameter extraction: feature parameters are extracted from the pre-processed test speech; the system may use 12-dimensional Mel-frequency cepstral coefficients (MFCC) and their first-order difference coefficients as feature parameters, 24 dimensions in total;
步骤103、计算声纹匹配分数:将测试语音特征与目标话者的声纹模型进行匹配,得到声纹匹配分数;Step 103: Calculating a voiceprint matching score: matching the test voice feature with the voiceprint model of the target speaker to obtain a voiceprint matching score;
步骤104、对语音特征初始切分:通过对测试语音特征的初始切分,获得初始切分单元以及初始切分单元个数。Step 104: Initially segment the speech feature: obtain an initial segmentation unit and an initial segmentation unit number by initial segmentation of the test speech feature.
本实施例中,根据目标密码中的目标文本序列,将对应的目标文本HMM组合成复合HMM;In this embodiment, the corresponding target text HMM is combined into a composite HMM according to the target text sequence in the target password;
将所述语音特征作为所述复合HMM的输入进行Viterbi(维特比)解码,得到第一状态输出序列,将所述第一状态输出序列中为单个目标文本HMM的状态数的整数倍的状态对应的位置作为初始切分点;The speech features are taken as the input of the composite HMM for Viterbi decoding to obtain a first state output sequence; the positions in the first state output sequence corresponding to states whose index is an integer multiple of the number of states of a single target-text HMM are taken as initial segmentation points;
依次选取所述相邻两个初始切分点作为区间起止点,在所述区间内,以指定帧为单位计算平均能量,寻找平均能量连续指定次增大的点,并将开始增大的点作为新的初始切分点,否则,不更新初始切分点,由所述初始切分点分割成的所述初始切分单元。Adjacent pairs of initial segmentation points are selected in turn as the start and end of an interval; within the interval, the average energy is computed in units of a specified number of frames, and a point at which the average energy increases a specified number of consecutive times is sought; the point where the increase begins is taken as a new initial segmentation point; otherwise, the initial segmentation point is not updated. The units into which the initial segmentation points divide the speech features are the initial segmentation units.
其中,所述复合HMM的状态数为单个目标文本HMM的状态数总和;所述复合HMM的每个状态具有的高斯混合模型参数与所述单个目标文本HMM的每个状态具有的高斯混合模型参数相同,The state number of the composite HMM is a sum of state numbers of a single target text HMM; each state of the composite HMM has a Gaussian mixture model parameter and a Gaussian mixture model parameter of each state of the single target text HMM the same,
将所述单个目标文本HMM的状态转移矩阵中的最后一个状态自身转移概率设为0,转移到下一个状态的状态转移概率设为1;所述目标文本的最后一个单个目标文本HMM的状态转移概率矩阵不作改变;The last state self transition probability in the state transition matrix of the single target text HMM is set to 0, the state transition probability transferred to the next state is set to 1; the state transition of the last single target text HMM of the target text The probability matrix is not changed;
将所述单个目标文本HMM的状态转移概率矩阵按照所述目标文本的 单个目标文本排列顺序合并,得到所述复合HMM的状态转移概率矩阵。Determining a state transition probability matrix of the single target text HMM according to the target text A single target text arrangement order is merged to obtain a state transition probability matrix of the composite HMM.
对语音特征初始切分的方法如图4所示,包括步骤如下:The method for initial segmentation of speech features is shown in Figure 4, including the following steps:
步骤104a、复合HMM模型的组合:按照目标密码中目标文本序列,将对应的单个目标文本HMM组合为复合HMM模型。 Step 104a: Combination of composite HMM models: Combine the corresponding single target texts HMM into a composite HMM model according to the target text sequence in the target password.
假设每个数字的HMM模型有8个状态数,每个状态由3个高斯函数拟合,那么,复合HMM模型的状态数为单个目标文本HMM模型状态数之和,每个状态仍由3个高斯函数拟合,且其高斯混合模型参数与单个HMM模型每个状态的高斯混合模型参数相同,复合HMM的状态转移概率矩阵参数的变化以3个单个目标文本HMM模型连接成一个复合型HMM为例进行说明,该例中单个目标文本HMM模型状态数为3,如下式所示:Suppose each digit's HMM has 8 states and each state is fitted by 3 Gaussian functions. Then the number of states of the composite HMM equals the sum of the numbers of states of the single target-text HMMs; each state is still fitted by 3 Gaussian functions, and its Gaussian mixture parameters are identical to those of the corresponding state of the single HMM. The change in the composite HMM's state transition probability matrix is illustrated by concatenating three single target-text HMMs into one composite HMM; in this example each single target-text HMM has 3 states, as shown below:
$$A_k=\begin{pmatrix}a^{(k)}_{11}&a^{(k)}_{12}&0\\0&a^{(k)}_{22}&a^{(k)}_{23}\\0&0&a^{(k)}_{33}\end{pmatrix},\qquad k=1,2,3$$
组合成复合HMM模型时,每个状态矩阵将改写成如下形式:When synthesizing a composite HMM model, each state matrix is rewritten as follows:
$$A'_k=\begin{pmatrix}a^{(k)}_{11}&a^{(k)}_{12}&0\\0&a^{(k)}_{22}&a^{(k)}_{23}\\0&0&0\end{pmatrix},\qquad k=1,2\quad\text{(the last model's matrix }A_3\text{ is unchanged)}$$
于是复合HMM模型的状态转移概率矩阵为:Then the state transition probability matrix of the composite HMM model is:
$$A=\begin{pmatrix}
a^{(1)}_{11}&a^{(1)}_{12}&0&0&0&0&0&0&0\\
0&a^{(1)}_{22}&a^{(1)}_{23}&0&0&0&0&0&0\\
0&0&0&1&0&0&0&0&0\\
0&0&0&a^{(2)}_{11}&a^{(2)}_{12}&0&0&0&0\\
0&0&0&0&a^{(2)}_{22}&a^{(2)}_{23}&0&0&0\\
0&0&0&0&0&0&1&0&0\\
0&0&0&0&0&0&a^{(3)}_{11}&a^{(3)}_{12}&0\\
0&0&0&0&0&0&0&a^{(3)}_{22}&a^{(3)}_{23}\\
0&0&0&0&0&0&0&0&a^{(3)}_{33}
\end{pmatrix}$$
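The block construction of the composite state transition matrix described above can be sketched as follows; the helper name and the plain nested-list matrix representation are illustrative assumptions:

```python
def composite_transition_matrix(mats):
    """Stack per-digit left-to-right HMM transition matrices into one
    composite matrix: for every model except the last, the final
    state's self-transition is set to 0 and its transition into the
    first state of the next model is set to 1; the last model's
    matrix is left unchanged."""
    n = sum(len(a) for a in mats)
    comp = [[0.0] * n for _ in range(n)]
    offset = 0
    for k, a in enumerate(mats):
        s = len(a)
        for i in range(s):
            for j in range(s):
                comp[offset + i][offset + j] = a[i][j]
        if k != len(mats) - 1:
            comp[offset + s - 1][offset + s - 1] = 0.0  # kill self-loop
            comp[offset + s - 1][offset + s] = 1.0      # jump to next model
        offset += s
    return comp
```

For three 3-state models this yields the 9×9 matrix shown above.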
步骤104b、Viterbi(维特比)解码:利用Viterbi解码将步骤102中得到的特征序列与步骤104a中得到的复合HMM模型匹配,得到一个最佳状态输出序列,使每一帧特征都有其对应的状态; Step 104b, Viterbi decoding: matching the feature sequence obtained in step 102 with the composite HMM model obtained in step 104a by Viterbi decoding to obtain an optimal state output sequence, so that each frame feature has its corresponding status;
步骤104c、寻找初始切分点:由步骤104a可知单个数字HMM模型的状态数为8,在步骤104b中所得最佳状态输出序列中寻找对应状态为8的整数倍的位置作为初始切分点P(i); Step 104c: Find an initial segmentation point: it can be known from step 104a that the number of states of the single digital HMM model is 8, and the position of the optimal state output sequence obtained in step 104b is found as the initial segmentation point P. (i);
步骤104d、更新初始切分点:依次选取步骤104c中相邻的两个初始切分点P(i-1)和P(i),并分别作为区间的起始点和终止点。在该区间内,每K帧组成一段,共L段,每段平均能量为E(n),n为段索引号,计算S(n-1)=E(n)-E(n-1)n=2…L,从S(n1)>0,n1=1…L-1的索引号开始向后搜索,若S(n1+1),S(n1+2),……,S(n1+q)均大于0,其中q是一个大于1的常数,则将n1段的起始点作为新的初始切分点代替P(i-1);若无该类索引号,则不更新初始切分点。由初始切分点分割成的不同单元即初始切分单元,假设初始切分单元个数为M,由于最佳状态序列的最大状态为64,所以初始切分单元个数小于等于8个(该更新过程并未改变初始切分点个数); Step 104d: Update the initial segmentation point: sequentially select two adjacent initial segmentation points P(i-1) and P(i) in step 104c, and respectively serve as a starting point and a termination point of the interval. In this interval, each K frame is composed of a segment, a total of L segments, the average energy of each segment is E(n), n is the segment index number, and S(n-1)=E(n)-E(n-1) is calculated. n=2...L, search backward from the index number of S(n1)>0, n1=1...L-1, if S(n1+1), S(n1+2),...,S(n1 +q) is greater than 0, where q is a constant greater than 1, then the starting point of the n1 segment is replaced by P(i-1) as the new initial segmentation point; if there is no such index number, the initial slice is not updated. Points. The initial segmentation unit is divided into different units, that is, the initial segmentation unit, assuming that the number of initial segmentation units is M, since the maximum state of the optimal state sequence is 64, the number of initial segmentation units is less than or equal to 8 (the The update process does not change the number of initial segmentation points);
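Under one reading of step 104d, the energy-based update of an initial segmentation point might be sketched as below. The per-frame energy representation, the choice of which segment boundary becomes the new point, and the defaults for K and q are assumptions:

```python
def update_split_point(frame_energy, start, end, K=5, q=2):
    """Within the interval [start, end), group frames into K-frame
    segments with average energies E(n), compute the differences
    S(n) = E(n+1) - E(n), and move the split point to the start of the
    first segment that begins a run of q+1 consecutive increases.
    Returns the original `start` if no such run exists."""
    segs = [sum(frame_energy[i:i + K]) / K
            for i in range(start, end - K + 1, K)]
    diffs = [segs[n] - segs[n - 1] for n in range(1, len(segs))]
    for n1 in range(len(diffs) - q):
        # S(n1) > 0 followed by q further positive differences
        if all(d > 0 for d in diffs[n1:n1 + q + 1]):
            return start + n1 * K  # frame index where the rise begins
    return start
```

With a flat-then-rising energy profile, the split point moves to where the rise starts; with flat energy it is left unchanged.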
步骤105、初始切分单元个数判决:步骤104将语音切分后得到若干 个初始切分单元,对于目标密码语音,其初始切分单元个数一般近似等于目标密码中目标文本个数;对于冒认密码语音,其切分单元个数往往远小于目标密码中目标文本个数。由步骤104可知测试语音初始切分单元数为M,假设最少切分单元个数为T,当M<T时,系统直接拒绝该请求人,判决结束,否则,执行步骤106;Step 105: Initial segmentation unit number decision: Step 104: After segmenting the speech, a number of segments are obtained. The initial segmentation unit, for the target cipher voice, the number of initial segmentation units is generally equal to the target text number in the target password; for the cryptographic voice, the number of segmentation units is often much smaller than the target text in the target password. number. It can be known from step 104 that the initial number of test speech units is M, assuming that the minimum number of split units is T, when M < T, the system directly rejects the requester, the decision ends, otherwise, step 106 is performed;
步骤106、强制切分:当8-M>0时,取初始切分单元中对应特征段最长的切分单元,并将该特征段平均切分为(8-M+1)份,强制切分后的切分单元总数变为8;Step 106: Forced segmentation: when 8-M>0, take the longest segmentation unit of the corresponding feature segment in the initial segmentation unit, and divide the feature segment into (8-M+1) portions, forcibly The total number of segmentation units after segmentation becomes 8;
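Step 106's forced segmentation of the longest unit might be sketched as follows, representing each unit as a (start, end) frame range; the function name and the rounding of piece boundaries are assumptions:

```python
def force_split(units, target_count=8):
    """If there are fewer than target_count units, cut the longest one
    into (target_count - M + 1) equal parts so the total becomes
    exactly target_count (M is the current number of units)."""
    if len(units) >= target_count:
        return list(units)
    units = list(units)
    k = max(range(len(units)), key=lambda i: units[i][1] - units[i][0])
    start, end = units[k]
    parts = target_count - len(units) + 1      # (8 - M + 1) pieces
    step = (end - start) / parts
    pieces = [(round(start + i * step), round(start + (i + 1) * step))
              for i in range(parts)]
    return units[:k] + pieces + units[k + 1:]
```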
步骤107、计算文本匹配分数:将步骤106中得到的切分单元对应特征序列与0~9十个目标文本的目标模型HMM进行匹配,每个切分单元对应10个匹配打分,假设该打分为word_score(i,j),该变量表示动态密码中第i个切分单元与数字j的模型的文本匹配分数;Step 107: Calculate the text matching score: match the corresponding feature sequence of the segmentation unit obtained in step 106 with the target model HMM of 0 to 9 target texts, and each segmentation unit corresponds to 10 matching scores, assuming the score is divided. Word_score(i,j), the variable represents the text matching score of the model of the i-th segmentation unit and the number j in the dynamic password;
步骤108、声纹与文本初步认证: Step 108, voiceprint and text preliminary certification:
取每个所述切分单元对应的所述切分单元文本匹配分数中m个最高分数对应的文本作为待选文本,若所述待选文本中包含所述切分单元对应的目标文本,则所述切分单元认证通过,计算通过的切分单元的总数,若通过的切分单元总数小于或等于第四阈值,则文本认证不通过,身份认证不通过,判决结束;若通过的切分单元总数大于所述第四阈值,则所述输入语音的文本认证通过;Taking the text corresponding to the m highest scores in the segment matching unit text matching score corresponding to each of the segmentation units as the candidate text, and if the candidate text includes the target text corresponding to the segmentation unit, The segmentation unit passes the authentication, and calculates the total number of the segmentation units that pass. If the total number of the segmentation units passed is less than or equal to the fourth threshold, the text authentication fails, the identity authentication fails, and the decision ends; if the segmentation is passed If the total number of units is greater than the fourth threshold, the text authentication of the input voice passes;
判断所述声纹匹配分数是否大于第五阈值,如是,则声纹认证通过,身份认证通过,判决结束;如不是,则将每个所述切分单元与对应目标文本模型的文本打分以及所述声纹匹配分数进行得分规整,将规整后的打分作为所述判决分类器的输入进行身份认证。Determining whether the voiceprint matching score is greater than a fifth threshold, if yes, the voiceprint authentication is passed, the identity authentication is passed, and the decision is ended; if not, the text of each of the segmentation units and the corresponding target text model is scored and The voiceprint matching score is scored, and the regularized score is used as the input of the decision classifier for identity authentication.
如图5所示,其实施方法如下:As shown in Figure 5, the implementation method is as follows:
步骤108a、每个切分单元各取m个最高得分:由上述步骤106可知,每个切分单元对应有10个得分,各取m(一般为2或3)个最高打分,分别对应m个待匹配文本; Step 108a: Each of the segmentation units takes m highest scores: as shown in step 106 above, each segmentation unit has 10 scores, each taking m (generally 2 or 3) highest scores, corresponding to m The text to be matched;
步骤108b、切分单元文本认证:对每个切分单元进行文本认证,若切分单元对应的m个待匹配文本中包含该切分单元对应的目标文本,则该切分单元的文本认证通过,反之,认证不通过;Step 108b: Segmentation unit text authentication: text authentication is performed for each segmentation unit; if the m candidate texts of a segmentation unit contain the target text corresponding to that unit, the text authentication of that unit passes; otherwise, it fails;
步骤108d、测试语音文本认证:假设测试语音切分单元文本认证通过的最小数为p,当W大于p时,则判定该语音文本认证通过,并转至步骤108e,否则,文本认证不通过,身份认证不通过,判决结束; Step 108d: Test voice text authentication: Assume that the minimum number of text authentication passes by the test voice segmentation unit is p. When W is greater than p, it is determined that the voice text authentication passes, and the process proceeds to step 108e. Otherwise, the text authentication fails. The identity authentication fails and the judgment ends;
步骤108e、测试语音声纹认证:设置一个较大的声纹阈值,以保证系统的严格性,当声纹匹配分数大于阈值时,声纹认证通过,该测试语音身份认证通过,否则,转至步骤109; Step 108e: testing the voice voiceprint authentication: setting a larger voiceprint threshold to ensure the strictness of the system. When the voiceprint matching score is greater than the threshold, the voiceprint authentication is passed, and the test voice identity authentication is passed, otherwise, the process proceeds to Step 109;
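The three-way decision of steps 108a-108e (reject on text, accept on a strict voiceprint threshold, otherwise fall through to the classifier) can be sketched as below; the parameter defaults m, p and the voiceprint threshold are placeholders, not values fixed by the patent:

```python
def preliminary_decision(unit_scores, target_digits, voice_score,
                         m=2, p=5, voice_threshold=0.9):
    """unit_scores: for each segmentation unit, a list of 10 text-match
    scores (index = digit 0-9).  target_digits: the 8 digits of the
    dynamic password.  Returns 'accept', 'reject', or 'classifier'
    (fall through to the PNN stage)."""
    passed = 0
    for scores, target in zip(unit_scores, target_digits):
        best_m = sorted(range(10), key=lambda d: scores[d], reverse=True)[:m]
        if target in best_m:
            passed += 1
    if passed <= p:
        return 'reject'       # text authentication failed
    if voice_score > voice_threshold:
        return 'accept'       # voiceprint clears the strict threshold
    return 'classifier'       # defer to the integrated classifier
```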
步骤109、得分规整:首先求得大量冒认密码语音对应目标文本模型的打分均值与方差,在得到测试语音中每个切分单元对应的文本打分后减去冒认得分均值并除以标准差。如图6所示,其实施方法如下:Step 109: Score regularization: first, the mean and standard deviation of the scores of a large number of impostor password utterances against the target text models are computed; then, from the text score of each segmentation unit of the test speech, the impostor score mean is subtracted and the result is divided by the standard deviation. As shown in Figure 6, this is implemented as follows:
步骤109a、求大量冒认文本打分:依次取0~9的单个数字模型HMM,假设取数字l的模型HMMl,根据Viterbi算法,取大量非l的冒认语音特征作为模型HMMl的输入,得到大量冒认文本打分; Step 109a, seeking a large number of spoofed text scores: taking a single digital model HMM of 0-9 in turn, assuming that the model HMM l of the number l is taken, according to the Viterbi algorithm, taking a large number of non-l spoofed speech features as input of the model HMM l , Get a lot of fake text scores;
步骤109b、求均值与标准差:计算每个文本对应的冒认文本打分均值与标准差; Step 109b: Find the mean value and the standard deviation: calculate the average value and the standard deviation of the text of the text corresponding to each text;
步骤109c、零归整及归一化:在步骤107计算文本匹配分数的基础上,找出每个切分单元与其对应目标文本模型的打分,此时每个切分单元对应一个文本打分。根据零归整方法,将每个文本打分分别减去对应文本的冒认打分均值并除以标准差,得到规整后的文本匹配分数,将步骤103中得到的声纹匹配分数与规整后的8个文本匹配分数合并组成一个9维的特征向量score(得分)。由于该特征向量中的声纹匹配分数不论是目标话者还是冒认话者的声纹打分,其打分一般远大于文本匹配分数,因此,又对特征向量增加了归一化处理,使得声纹匹配分数与文本匹配分数均在[0,1]之间。假设该特征向量的最大值和最小值分别为max_score和min_score,对特征向量作线性变换,得到一个新的特征向量new_score=(score-min_score)/(max_score-min_score); Step 109c: Zero-normalization and normalization: based on the text matching scores computed in step 107, the score of each segmentation unit against its corresponding target text model is found, so that each unit corresponds to one text score. Following the zero-normalization method, the impostor score mean of the corresponding text is subtracted from each text score and the result divided by the standard deviation, yielding regularized text matching scores. The voiceprint matching score obtained in step 103 and the 8 regularized text matching scores are merged into a 9-dimensional feature vector score. Because the voiceprint matching score in this vector, whether from the target speaker or an impostor, is generally far larger than the text matching scores, a further normalization step is applied so that both the voiceprint matching score and the text matching scores lie in [0, 1]. Let the maximum and minimum of the feature vector be max_score and min_score; a linear transform of the feature vector gives a new feature vector new_score = (score - min_score)/(max_score - min_score);
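A compact sketch of step 109c's zero-normalization followed by min-max scaling, under the assumption that the impostor statistics are stored in lists indexed by digit:

```python
def regularize(text_scores, targets, impostor_mean, impostor_std, voice_score):
    """Zero-normalize each unit's score against its target digit's
    impostor statistics, append the voiceprint score, then min-max
    normalize the resulting 9-dimensional vector into [0, 1]."""
    norm = [(s - impostor_mean[d]) / impostor_std[d]
            for s, d in zip(text_scores, targets)]
    score = norm + [voice_score]            # the 9-dim vector `score`
    lo, hi = min(score), max(score)
    return [(v - lo) / (hi - lo) for v in score]
```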
步骤110综合判决:利用综合判决分类器对输入特征向量new_score进行判决,对于每一个输入,其输出为1或0,当输出为1时表示测试语音判决通过,输出为0时拒绝测试语音通过。Step 110: The decision is made by using the integrated decision classifier to determine the input feature vector new_score. For each input, the output is 1 or 0. When the output is 1, the test voice decision is passed, and when the output is 0, the test voice is rejected.
实施例二:Embodiment 2:
For the initial segmentation of the speech features in step 104, the decision on the number of segmentation units in step 105, and the forced segmentation in step 106 of the first embodiment, this embodiment performs segmentation and decision as follows:

Step 201, initial segmentation of the speech signal.

In this embodiment, the initial segmentation units are split in descending order of length; each time, one initial segmentation unit is split evenly into two segments, until the total number of segmentation units equals the number of target texts.

If the number of forced splits is greater than or equal to the second threshold, the forced segmentation ends. If the number of forced splits is less than the second threshold, each current segmentation unit is scored against every target-text HMM, the target-text HMM with the highest score is selected for each unit, and the selected target-text HMMs are concatenated into a second composite HMM.

The speech features are used as input to the second composite HMM for Viterbi decoding, producing a second state output sequence. The positions in this sequence corresponding to states whose index is an integer multiple of the number of states of a single target-text HMM are taken as segmentation points, and the portions of the speech features delimited by these points are the segmentation units. If the current number of segmentation units is less than the third threshold, the current segmentation units are taken as the initial segmentation units and forced segmentation continues; if the current number is greater than or equal to the third threshold, forced segmentation ends and the resulting units are the final segmentation units. As shown in Figure 8, the procedure includes the following steps:
Step 201a, initial segmentation: compute the envelope of the speech signal and select the regions around the 8 largest envelope maxima as the initial segmentation result.

Step 201b, decision on the initial segments by score: score each segment against the ten digit models 0-9; for each segment, take the digit with the highest score as that segment's decision result.

Step 201c, composition of the composite HMM: according to the segmentation decisions of step 201b, select the corresponding HMMs and concatenate them into a composite HMM; for the composition procedure, see step 104a of the first embodiment.

Step 201d, further segmentation by Viterbi decoding: perform Viterbi decoding of the input signal against the combined model output by step 201c, and further segment the signal according to the optimal state sequence; for the segmentation procedure, see step 104c of the first embodiment.

Step 202, forced segmentation: sort the segments by length and, in descending order of length, split one segment evenly into two at a time, until there are 8 segments.

Step 203, initial segmentation decision: if the number of segments from step 201d is less than X (corresponding to the third threshold, X < 8), return to step 201b, using the output of step 202 as the input of step 201b, and continue segmenting; if the number of segments is greater than or equal to X, segmentation ends. A maximum iteration count D (corresponding to the second threshold) is set: if after D iterations the number of segments from step 201b is still less than X, the iteration stops and the speech is rejected; if within D iterations the number of segments reaches X or more, the decision continues with step 107 and the subsequent steps of the first embodiment.
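The iteration control of steps 201b-203 (decode, force-split, retry up to D times) can be sketched as follows. This is one possible reading of the loop, with `decode_fn` and `force_split_fn` as hypothetical callbacks standing in for the Viterbi-based segmentation of steps 201b-201d and the halving of step 202:

```python
def segment_with_retry(decode_fn, force_split_fn, signal, X=6, D=3):
    """Iteration control of steps 201b-203: decode once, and while fewer
    than X segments are found, force-split to 8 segments (step 202) and
    re-decode (steps 201b-201d), rejecting after D failed iterations."""
    segments = decode_fn(signal)
    iterations = 0
    while len(segments) < X:
        if iterations >= D:
            return None                 # reject the speech
        segments = decode_fn(force_split_fn(segments, 8))
        iterations += 1
    return segments                     # proceed to step 107
```

With a decoder that keeps finding too few segments, the loop gives up after D rounds instead of iterating forever.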
Figure 9 is a schematic diagram of an identity authentication apparatus according to an embodiment of the present invention. The apparatus of this embodiment includes a PNN classifier and, as shown in Figure 9, comprises:

a voiceprint matching module, configured to acquire the speech features of an input speech and match the speech features against a pre-stored target voiceprint model to obtain a voiceprint matching score;

a segmentation module, configured to segment the input speech according to the speech features and preset target-text models and obtain initial segmentation units and the number of initial segmentation units; if the number of initial segmentation units is less than a first threshold, the input speech is judged to be illegitimate speech; if the number of initial segmentation units is greater than or equal to the first threshold, forced segmentation is applied to the initial segmentation units so that the total number of segmentation units equals the number of preset target texts;

a text matching module, configured to match the speech features of each segmentation unit against all the target-text models to obtain a segmentation-unit text matching score for each segmentation unit against each target-text model;

an authentication module, configured to perform identity authentication according to the segmentation-unit text matching scores, the voiceprint matching score, and the pre-trained PNN classifier.
In an optional embodiment, the apparatus further includes a processing module, wherein:

the voiceprint matching module is configured to match target speech against the target voiceprint model to obtain a first voiceprint score, and to match non-target speech against the target voiceprint model to obtain a second voiceprint score;

the text matching module is configured to match the target speech against the target-text models to obtain a first text score, and to match the non-target speech against the target-text models to obtain a second text score;

the processing module is configured to combine the first text score and the first voiceprint score into acceptance feature information for the PNN classifier, and to combine the second text score and the second voiceprint score into rejection feature information for the PNN classifier;

the PNN classifier is trained according to the acceptance feature information and the rejection feature information.
In an optional embodiment, the processing module is further configured to: select the target-text models in turn, match the speech features of non-target texts against the corresponding target-text model to obtain impostor text scores, and obtain the mean and standard deviation of the impostor text scores corresponding to each target-text model; subtract the corresponding mean of the impostor text scores from the first text score and the second text score respectively and divide by the standard deviation, obtaining normalized text scores; combine the normalized first text scores with the first voiceprint score and obtain the maximum and minimum corresponding to each target text, then normalize the normalized first text scores and the first voiceprint score with this maximum and minimum, as the acceptance feature information of the PNN classifier; combine the normalized second text scores with the second voiceprint score and obtain the maximum and minimum corresponding to each target text, then normalize the normalized second text scores and the second voiceprint score with this maximum and minimum, as the rejection feature information of the PNN classifier.
In an optional embodiment, the segmentation module segments the input speech according to the speech features and the preset target-text models and obtains the initial segmentation units by: concatenating, according to the target-text sequence in the target password, the corresponding target-text hidden Markov models (HMMs) into a first composite HMM; performing Viterbi decoding with the speech features as input to the first composite HMM to obtain a first state output sequence, and taking as initial segmentation points the positions in the first state output sequence corresponding to states whose index is an integer multiple of the number of states of a single target-text HMM; then taking each pair of adjacent initial segmentation points in turn as the start and end of an interval, computing the average energy within the interval in units of a specified number of frames, finding the point at which the average energy increases a specified number of consecutive times, and taking the point where the increase begins as a new initial segmentation point; the initial segmentation units are the portions delimited by the initial segmentation points.
In an optional embodiment, the segmentation module concatenates the corresponding target-text HMMs into the first composite HMM as follows: the number of states of the first composite HMM is the sum of the numbers of states of the individual target-text HMMs; each state of the first composite HMM has the same Gaussian mixture model parameters as the corresponding state of the individual target-text HMM; in the state transition matrix of each individual target-text HMM, the self-transition probability of the last state is set to 0 and its probability of transitioning to the next state is set to 1, except that the state transition probability matrix of the last individual target-text HMM of the target text is left unchanged; the state transition probability matrices of the individual target-text HMMs are then merged in the order of the individual target texts of the target text, obtaining the state transition probability matrix of the composite HMM.
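The block composition of the state transition matrix described above can be sketched as follows. This is an illustrative reading, assuming left-to-right HMMs with square per-model transition matrices:

```python
import numpy as np

def compose_transition_matrix(mats):
    """Merge per-model transition matrices into one block-diagonal matrix,
    forcing each model's last state to jump to the next model's first state
    (self-loop probability 0, forward probability 1). The last model's
    matrix is left unchanged, as the description requires."""
    n = sum(m.shape[0] for m in mats)
    A = np.zeros((n, n))
    offset = 0
    for i, m in enumerate(mats):
        k = m.shape[0]
        A[offset:offset + k, offset:offset + k] = m
        if i < len(mats) - 1:                 # not the final model
            last = offset + k - 1
            A[last, last] = 0.0               # no self-loop on the exit state
            A[last, last + 1] = 1.0           # jump into the next model
        offset += k
    return A
```

The Gaussian mixture parameters of the composite states are simply those of the corresponding states of the individual models, so only the transition matrix needs assembling.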
In an optional embodiment, the segmentation module applies forced segmentation to the initial segmentation units so that the total number of segmentation units equals the number of preset target texts by: selecting the initial segmentation unit with the longest feature segment for forced splitting, so that after forced segmentation the total number of segmentation units equals the number of preset target texts.
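A sketch of this longest-first halving, with segmentation units represented as hypothetical (start_frame, end_frame) pairs:

```python
def forced_split(units, target_count):
    """Halve the longest segmentation unit repeatedly until there are
    target_count units in total; units are (start, end) frame ranges."""
    units = list(units)
    while len(units) < target_count:
        # pick the currently longest unit and split it evenly in two
        longest = max(units, key=lambda u: u[1] - u[0])
        units.remove(longest)
        start, end = longest
        mid = (start + end) // 2
        units += [(start, mid), (mid, end)]
    return sorted(units)        # restore time order
```

Each split adds exactly one unit, so the loop terminates after (target_count - len(units)) iterations.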
In an optional embodiment, the segmentation module applies forced segmentation to the initial segmentation units so that the total number of segmentation units equals the number of preset target texts by: splitting the initial segmentation units in descending order of length, each time splitting one initial segmentation unit evenly into two segments, until the total number of segmentation units equals the number of target texts; if the number of forced splits is greater than or equal to the second threshold, the forced segmentation ends; if the number of forced splits is less than the second threshold, each current segmentation unit is scored against every target-text HMM, the target-text HMM with the highest score is selected for each unit, and the selected target-text HMMs are concatenated into a second composite HMM; the speech features are used as input to the second composite HMM for Viterbi decoding to obtain a second state output sequence, and the positions in this sequence corresponding to states whose index is an integer multiple of the number of states of a single target-text HMM are taken as segmentation points, the portions of the speech features delimited by these points being the segmentation units; if the current number of segmentation units is less than the third threshold, the current segmentation units are taken as the initial segmentation units and forced segmentation continues; if the current number of segmentation units is greater than or equal to the third threshold, forced segmentation ends and the resulting units are the final segmentation units.
In an optional embodiment, the text matching module matches the speech features of each segmentation unit against all the target-text models to obtain the segmentation-unit text matching score for each segmentation unit against each target-text model by: taking the speech features of each segmentation unit as input to each target-text hidden Markov model (HMM), and taking the output probability obtained according to the Viterbi algorithm as the corresponding segmentation-unit text matching score.
In an optional embodiment, the authentication module performs identity authentication according to the segmentation-unit text matching scores, the voiceprint matching score, and the pre-trained decision classifier by: taking, for each segmentation unit, the texts corresponding to the m highest of its segmentation-unit text matching scores as candidate texts; if the candidate texts include the target text corresponding to the segmentation unit, the segmentation unit passes authentication; counting the total number of segmentation units that pass; if the total number of passing units is less than or equal to the fourth threshold, text authentication fails, identity authentication fails, and the decision ends; if the total number of passing units is greater than the fourth threshold, text authentication of the input speech passes; then judging whether the voiceprint matching score is greater than the fifth threshold: if so, voiceprint authentication passes, identity authentication passes, and the decision ends; if not, performing score normalization on the text score of each segmentation unit against its corresponding target-text model and on the voiceprint matching score, and using the normalized scores as input to the PNN classifier for identity authentication.
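The decision flow of the authentication module can be sketched as follows. The threshold values, m, and the `pnn_decide` callback are assumptions for illustration, not values given by the disclosure:

```python
def authenticate(unit_scores, targets, vp_score, pnn_decide,
                 m=3, text_pass_thresh=6, vp_thresh=0.9):
    """Two-stage decision: m-best text authentication per segmentation unit,
    then a voiceprint threshold, falling back to the PNN classifier.
    unit_scores: per unit, a dict mapping candidate text -> matching score
    targets:     target text expected for each unit"""
    passed = 0
    for scores, target in zip(unit_scores, targets):
        # m-best candidate texts for this segmentation unit
        best = sorted(scores, key=scores.get, reverse=True)[:m]
        if target in best:
            passed += 1
    if passed <= text_pass_thresh:
        return False                      # text authentication failed
    if vp_score > vp_thresh:
        return True                       # voiceprint alone suffices
    # borderline voiceprint: defer to the classifier on normalized scores
    return pnn_decide(unit_scores, vp_score)
```

The fallback branch corresponds to the score normalization plus PNN decision of steps 109-110.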
An embodiment of the present invention further provides a computer-readable storage medium. Optionally, in this embodiment, the storage medium may be configured to store program code to be executed by a processor, the program code performing the following steps:

S1: acquiring the speech features of an input speech, and matching the speech features against a pre-stored target voiceprint model to obtain a voiceprint matching score;

S2: segmenting the input speech according to the speech features and preset target-text models, and obtaining initial segmentation units and the number of initial segmentation units; if the number of initial segmentation units is less than a first threshold, judging the input speech to be illegitimate speech; if the number of initial segmentation units is greater than or equal to the first threshold, applying forced segmentation to the initial segmentation units so that the total number of segmentation units equals the number of preset target texts;

S3: matching the speech features of each segmentation unit against all the target-text models to obtain a segmentation-unit text matching score for each segmentation unit against each target-text model;

S4: performing identity authentication according to the segmentation-unit text matching scores, the voiceprint matching score, and a pre-trained probabilistic neural network (PNN) classifier.
Optionally, in this embodiment, the storage medium may include, but is not limited to, media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.

Those of ordinary skill in the art will appreciate that all or some of the steps of the above methods may be completed by a program instructing the relevant hardware, the program being stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc. Optionally, all or some of the steps of the above embodiments may also be implemented with one or more integrated circuits. Accordingly, each module/unit in the above embodiments may be implemented in the form of hardware or in the form of a software function module. The present invention is not limited to any specific combination of hardware and software.

The above are only preferred embodiments of the present invention. The present invention may of course have various other embodiments, and those familiar with the art may make various corresponding changes and variations according to the present invention without departing from its spirit and essence, all of which shall fall within the protection scope of the claims appended to the present invention.
Industrial applicability

The technical solution provided by the embodiments of the present invention can be applied in an identity authentication process, combining voiceprint authentication with dynamic password authentication to achieve double verification of the user, improving the security, reliability, and accuracy of the system.

Claims (18)

  1. A method of identity authentication, comprising:
    acquiring the speech features of an input speech, and matching the speech features against a pre-stored target voiceprint model to obtain a voiceprint matching score;
    segmenting the input speech according to the speech features and preset target-text models, and obtaining initial segmentation units and the number of initial segmentation units; if the number of initial segmentation units is less than a first threshold, judging the input speech to be illegitimate speech; if the number of initial segmentation units is greater than or equal to the first threshold, applying forced segmentation to the initial segmentation units so that the total number of segmentation units equals the number of preset target texts;
    matching the speech features of each segmentation unit against all the target-text models to obtain a segmentation-unit text matching score for each segmentation unit against each target-text model;
    performing identity authentication according to the segmentation-unit text matching scores, the voiceprint matching score, and a pre-trained probabilistic neural network (PNN) classifier.
  2. The method according to claim 1, wherein the PNN classifier is trained in the following manner:
    matching target speech against the target-text models and the target voiceprint model to obtain a first text score and a first voiceprint score respectively, and combining the first text score and the first voiceprint score into acceptance feature information for the decision classifier;
    matching non-target speech against the target-text models and the target voiceprint model to obtain a second text score and a second voiceprint score respectively, and combining the second text score and the second voiceprint score into rejection feature information for the decision classifier;
    training the PNN classifier according to the acceptance feature information and the rejection feature information.
  3. The method according to claim 2, wherein before training the PNN classifier according to the acceptance feature information and the rejection feature information, the method further comprises score normalization of the voiceprint scores and text scores of the target speech and the non-target speech, comprising:
    selecting the target-text models in turn, matching the speech features of non-target texts against the corresponding target-text model to obtain impostor text scores, and obtaining the mean and standard deviation of the impostor text scores corresponding to each target-text model;
    subtracting the corresponding mean of the impostor text scores from the first text score and the second text score respectively and dividing by the standard deviation, to obtain normalized text scores;
    combining the normalized first text scores with the first voiceprint score, obtaining the maximum and minimum corresponding to each target text, and normalizing the normalized first text scores and the first voiceprint score with the maximum and minimum, as the acceptance feature information of the PNN classifier;
    combining the normalized second text scores with the second voiceprint score, obtaining the maximum and minimum corresponding to each target text, and normalizing the normalized second text scores and the second voiceprint score with the maximum and minimum, as the rejection feature information of the PNN classifier.
    根据目标密码中的目标文本序列,将对应的目标文本隐马尔可夫模型HMM组合成第一复合HMM;Correlating the corresponding target text hidden Markov model HMM into a first composite HMM according to the target text sequence in the target password;
    将所述语音特征作为所述第一复合HMM的输入进行维特比解码,得到第一状态输出序列,将所述第一状态输出序列中为单个目标文本HMM的状态数的整数倍的状态对应的位置作为初始切分点;Performing Viterbi decoding as the input of the first composite HMM to obtain a first state output sequence, and corresponding to a state in the first state output sequence that is an integer multiple of a state number of a single target text HMM Position as the initial segmentation point;
    依次选取所述相邻两个初始切分点作为区间起止点,在所述区间内,以指定帧为单位计算平均能量,寻找平均能量连续指定次增大的点,并将开始增大的点作为新的初始切分点,由所述初始切分点分割成的所述初始切分单元。The adjacent two initial segmentation points are sequentially selected as a range start and end point, in which the average energy is calculated in units of specified frames, and the point where the average energy continuously increases by a specified number of times is found, and the point at which the increase is started is started. As the new initial segmentation point, the initial segmentation unit is divided by the initial segmentation point.
  5. The method according to claim 4, wherein concatenating the corresponding target-text HMMs into the first composite HMM comprises:
    the number of states of the first composite HMM being the sum of the numbers of states of the individual target-text HMMs, and the Gaussian mixture model parameters of each state of the first composite HMM being identical to those of the corresponding state of the individual target-text HMM;
    setting the self-transition probability of the last state in the state transition matrix of each individual target-text HMM to 0 and its probability of transitioning to the next state to 1, the state transition probability matrix of the last individual target-text HMM of the target text being left unchanged;
    merging the state transition probability matrices of the individual target-text HMMs in the order of the individual target texts of the target text, to obtain the state transition probability matrix of the composite HMM.
  6. The method according to claim 1, wherein applying forced segmentation to the initial segmentation units so that the total number of segmentation units equals the number of preset target texts comprises:
    selecting the initial segmentation unit with the longest feature segment for forced splitting, so that after forced segmentation the total number of segmentation units equals the number of preset target texts.
  7. The method according to claim 1, wherein applying forced segmentation to the initial segmentation units so that the total number of segmentation units equals the number of preset target texts comprises:
    forcibly splitting the initial segmentation units in descending order of length, each time splitting one initial segmentation unit evenly into two segments, until the total number of segmentation units after splitting equals the number of target texts;
    if the number of forced splits is greater than or equal to a second threshold, ending the forced segmentation; if the number of forced splits is less than the second threshold, scoring each current segmentation unit against each target-text hidden Markov model (HMM), selecting for each unit the target-text HMM with the highest score, and concatenating the selected target-text HMMs into a second composite HMM; performing Viterbi decoding with the speech features as input to the second composite HMM to obtain a second state output sequence, and taking as segmentation points the positions in the second state output sequence corresponding to states whose index is an integer multiple of the number of states of a single target-text HMM, the portions of the speech features delimited by these points being the segmentation units; if the current number of segmentation units is less than a third threshold, taking the current segmentation units as the initial segmentation units and continuing the forced segmentation; if the current number of segmentation units is greater than or equal to the third threshold, ending the forced segmentation.
  8. The method according to claim 1, wherein matching the speech features of each segmentation unit against all the target-text models to obtain the segmentation-unit text matching score for each segmentation unit against each target-text model comprises:
    taking the speech features of each segmentation unit as input to each target-text hidden Markov model (HMM), and taking the output probability obtained according to the Viterbi algorithm as the corresponding segmentation-unit text matching score.
  9. The method according to any one of claims 1-8, wherein performing identity authentication according to the segmentation unit text matching scores, the voiceprint matching score, and a pre-trained decision classifier comprises:
    taking the texts corresponding to the m highest scores among the segmentation unit text matching scores of each segmentation unit as candidate texts; if the candidate texts include the target text corresponding to the segmentation unit, the segmentation unit passes authentication; counting the total number of segmentation units that pass; if the total number of passing segmentation units is less than or equal to a fourth threshold, text authentication fails and identity authentication fails; if the total number of passing segmentation units is greater than the fourth threshold, text authentication of the input speech passes;
    determining whether the voiceprint matching score is greater than a fifth threshold; if so, voiceprint authentication passes and identity authentication passes; if not, performing score normalization on the text score of each segmentation unit against its corresponding target text model and on the voiceprint matching score, and using the normalized scores as input to the decision classifier for identity authentication.
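The two-stage decision in claim 9 reduces to plain control flow. A sketch under illustrative assumptions: the thresholds, m, and the score values are placeholders, and the trained decision classifier is abstracted as an optional callable rather than a specific PNN implementation:

```python
def authenticate(unit_scores, targets, voiceprint_score,
                 m=3, pass_min=4, vp_threshold=0.0, classifier=None):
    """unit_scores: one dict per segmentation unit mapping candidate
    text -> matching score; targets: expected target text per unit."""
    passed = 0
    for scores, target in zip(unit_scores, targets):
        top_m = sorted(scores, key=scores.get, reverse=True)[:m]
        if target in top_m:          # unit passes text authentication
            passed += 1
    if passed <= pass_min:           # fourth-threshold check: reject outright
        return False
    if voiceprint_score > vp_threshold:  # fifth-threshold check: accept directly
        return True
    # Borderline case: defer to the trained decision classifier.
    return classifier(unit_scores, voiceprint_score) if classifier else False

units = [{"one": 2.0, "two": 1.0, "six": 0.5},
         {"two": 3.0, "one": 0.2, "six": 0.1}] * 3   # 6 segmentation units
targets = ["one", "two"] * 3
result = authenticate(units, targets, voiceprint_score=1.5)
```

With all six units passing text authentication and a voiceprint score above the threshold, the call accepts without consulting the classifier; a low voiceprint score would instead hand the normalized scores to the classifier stage.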
  10. An identity authentication apparatus, comprising a probabilistic neural network (PNN) classifier, and further comprising:
    a voiceprint matching module, configured to acquire speech features of an input speech and match the speech features with a pre-stored target voiceprint model to obtain a voiceprint matching score;
    a segmentation module, configured to segment the input speech according to the speech features and preset target text models to obtain initial segmentation units and the number of initial segmentation units; if the number of initial segmentation units is less than a first threshold, determine that the input speech is illegal speech; if the number of initial segmentation units is greater than or equal to the first threshold, perform forced segmentation on the initial segmentation units so that the total number of segmentation units equals the number of preset target texts;
    a text matching module, configured to match the speech features of each segmentation unit with all the target text models to obtain a segmentation unit text matching score for each segmentation unit and each target text model;
    an authentication module, configured to perform identity authentication according to the segmentation unit text matching scores, the voiceprint matching score, and the pre-trained PNN classifier.
  11. The apparatus according to claim 10, further comprising a processing module, wherein:
    the voiceprint matching module is configured to match target speech with the target voiceprint model to obtain a first voiceprint score, and match non-target speech with the target voiceprint model to obtain a second voiceprint score;
    the text matching module is configured to match the target speech with the target text models to obtain a first text score, and match the non-target speech with the target text models to obtain a second text score;
    the processing module is configured to combine the first text score and the first voiceprint score into acceptance feature information for the PNN classifier, and combine the second text score and the second voiceprint score into rejection feature information for the PNN classifier;
    the PNN classifier is trained according to the acceptance feature information and the rejection feature information.
  12. The apparatus according to claim 11, wherein
    the processing module is further configured to: select the target text models in turn, match the speech features of non-target texts with the corresponding target text model to obtain impostor text scores, and obtain the mean and standard deviation of the impostor text scores for each target text model; subtract the corresponding impostor-score mean from the first text score and the second text score respectively and divide by the standard deviation to obtain normalized text scores; combine the normalized first text scores with the first voiceprint score and obtain the maximum and minimum values corresponding to each target text; normalize the combined normalized first text scores and first voiceprint score using these maximum and minimum values, as the acceptance feature information of the PNN classifier; combine the normalized second text scores with the second voiceprint score, obtain the maximum and minimum values corresponding to each target text, and normalize the combined normalized second text scores and second voiceprint score using these maximum and minimum values, as the rejection feature information of the PNN classifier.
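The normalization chain in claim 12 — a z-norm against impostor-score statistics followed by per-dimension min-max scaling over the combined feature vectors — can be sketched with NumPy. All numbers below are synthetic stand-ins for real genuine/impostor trial scores:

```python
import numpy as np

def z_norm(scores, impostor_mean, impostor_std):
    # Subtract the impostor-score mean and divide by its standard deviation.
    return (scores - impostor_mean) / impostor_std

def min_max(feat, lo, hi):
    # Scale each dimension into [0, 1] using the observed extrema.
    return (feat - lo) / (hi - lo)

# Impostor text scores per target text (one row per target text model).
impostor_scores = np.array([[-8.0, -7.0, -9.0],
                            [-6.0, -5.0, -7.0]])
mu = impostor_scores.mean(axis=1)
sd = impostor_scores.std(axis=1)

first_text = z_norm(np.array([-2.0, -1.0]), mu, sd)   # genuine attempt
second_text = z_norm(np.array([-9.0, -8.0]), mu, sd)  # impostor attempt
first_vp, second_vp = 1.2, -0.5                       # voiceprint scores

accept = np.append(first_text, first_vp)   # text scores + voiceprint score
reject = np.append(second_text, second_vp)
lo = np.minimum(accept, reject)
hi = np.maximum(accept, reject)
accept_feat = min_max(accept, lo, hi)   # acceptance feature information
reject_feat = min_max(reject, lo, hi)   # rejection feature information
```

The z-norm step compensates for target texts that are intrinsically easier or harder to impersonate, and the min-max step keeps every classifier input dimension on a comparable scale.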
  13. The apparatus according to claim 10, wherein
    the segmentation module segments the input speech according to the speech features and the preset target text models to obtain the initial segmentation units by: combining the corresponding target text hidden Markov models (HMMs) into a first composite HMM according to the target text sequence in a target password; using the speech features as input to the first composite HMM for Viterbi decoding to obtain a first state output sequence, and taking the positions in the first state output sequence corresponding to states whose index is an integer multiple of the number of states of a single target text HMM as initial segmentation points; taking each pair of adjacent initial segmentation points in turn as the start and end of an interval, computing the average energy within the interval in units of a specified number of frames, finding points where the average energy increases for a specified number of consecutive times, and taking the point where the increase begins as a new initial segmentation point; the initial segmentation units being the units divided by the initial segmentation points.
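The energy-refinement step at the end of claim 13 (scanning between adjacent segmentation points for the onset of a sustained rise in average energy) might look like the following; the frame-group size and the run length required are illustrative parameters, not values specified by the patent:

```python
def refine_split_point(frame_energy, group=2, rises=3):
    """Return the frame index where a run of `rises` consecutive
    increases in group-averaged energy begins, or None if absent."""
    avg = [sum(frame_energy[i:i + group]) / group
           for i in range(0, len(frame_energy) - group + 1, group)]
    run = 0
    for i in range(1, len(avg)):
        run = run + 1 if avg[i] > avg[i - 1] else 0
        if run >= rises:
            # (i - rises + 1) is the first group of the rising run;
            # convert that group index back to a frame index.
            return (i - rises + 1) * group
    return None

# Quiet stretch followed by a steady energy ramp (a speech onset).
energy = [0.1, 0.1, 0.1, 0.1, 0.5, 0.6, 1.0, 1.2, 2.0, 2.5, 3.0, 3.5]
onset = refine_split_point(energy)
```

A flat-energy interval yields no refined point, so the original Viterbi-derived segmentation point would be kept in that case.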
  14. The apparatus according to claim 13, wherein
    the segmentation module combines the corresponding target text HMMs into the first composite HMM as follows: the number of states of the first composite HMM is the sum of the numbers of states of the single target text HMMs; each state of the first composite HMM has the same Gaussian mixture model parameters as the corresponding state of the single target text HMM; in the state transition matrix of each single target text HMM, the self-transition probability of the last state is set to 0 and the transition probability to the next state is set to 1; the state transition probability matrix of the last single target text HMM of the target text is left unchanged; the state transition probability matrices of the single target text HMMs are merged in the order in which the single target texts appear in the target text, to obtain the state transition probability matrix of the composite HMM.
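The matrix construction in claim 14 can be sketched directly with NumPy: place each per-text transition matrix on the diagonal of the composite matrix, then rewire the final state of every model except the last to jump deterministically into the next model. The 2-state matrices below are illustrative:

```python
import numpy as np

def compose_transition(matrices):
    """Concatenate left-to-right HMM transition matrices into one
    composite matrix; every model except the last has its final state
    rewired to enter the next model with probability 1."""
    sizes = [m.shape[0] for m in matrices]
    A = np.zeros((sum(sizes), sum(sizes)))
    offset = 0
    for idx, m in enumerate(matrices):
        n = m.shape[0]
        A[offset:offset + n, offset:offset + n] = m
        if idx < len(matrices) - 1:
            last = offset + n - 1
            A[last, last] = 0.0        # no self-loop on the final state
            A[last, last + 1] = 1.0    # deterministic jump to next model
        offset += n
    return A

m1 = np.array([[0.6, 0.4], [0.0, 1.0]])  # target text 1 (2 states)
m2 = np.array([[0.7, 0.3], [0.0, 1.0]])  # target text 2 (2 states)
A = compose_transition([m1, m2])
```

Because the last model's matrix is left unchanged, the composite chain can absorb in its final state, matching the claim's construction.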
  15. The apparatus according to claim 10, wherein
    the segmentation module performs forced segmentation on the initial segmentation units so that the total number of segmentation units equals the number of preset target texts by: selecting the initial segmentation unit with the longest feature segment for forced segmentation, so that the total number of segmentation units after forced segmentation equals the number of preset target texts.
  16. The apparatus according to claim 10, wherein
    the segmentation module performs forced segmentation on the initial segmentation units so that the total number of segmentation units equals the number of preset target texts by: splitting the initial segmentation units in descending order of length, each time dividing one initial segmentation unit evenly into two segments, until the total number of units after splitting equals the number of target texts; if the number of forced segmentations is greater than or equal to a second threshold, ending the forced segmentation; if the number of forced segmentations is less than the second threshold, matching and scoring each current segmentation unit against each target text hidden Markov model (HMM), selecting for each unit the target text HMM with the highest score, and combining the selected target text HMMs into a second composite HMM; using the speech features as input to the second composite HMM for Viterbi decoding to obtain a second state output sequence, taking the positions in the second state output sequence corresponding to states whose index is an integer multiple of the number of states of a single target text HMM as segmentation points, the units obtained by dividing the speech features at these segmentation points being the segmentation units; if the current number of segmentation units is less than a third threshold, taking the current segmentation units as the initial segmentation units and continuing the forced segmentation; if the current number of segmentation units is greater than or equal to the third threshold, ending the forced segmentation.
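The length-ordered splitting in claim 16 — repeatedly halve the longest unit until the unit count matches the target text count — reduces to a simple loop. In this sketch each unit is represented as a `(start, end)` frame range, an assumption made for illustration:

```python
def force_split(units, target_count):
    """Split (start, end) segments, longest first, halving one segment
    per step, until the number of segments reaches target_count."""
    units = sorted(units)
    while len(units) < target_count:
        longest = max(units, key=lambda u: u[1] - u[0])
        start, end = longest
        mid = (start + end) // 2           # divide evenly into two segments
        units.remove(longest)
        units.extend([(start, mid), (mid, end)])
        units.sort()
    return units

# Two initial units forced into four, matching four target texts.
units = force_split([(0, 100), (100, 140)], target_count=4)
```

In the full procedure of claim 16 this loop would be followed by the HMM rescoring and second-composite-HMM decoding pass, bounded by the second and third thresholds.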
  17. The apparatus according to claim 10, wherein
    the text matching module matches the speech features of each segmentation unit with all the target text models to obtain a segmentation unit text matching score for each segmentation unit and each target text model by: using the speech features of each segmentation unit as input to each target text hidden Markov model (HMM), and taking the output probability obtained by the Viterbi algorithm as the corresponding segmentation unit text matching score.
  18. The apparatus according to any one of claims 10-17, wherein
    the authentication module performs identity authentication according to the segmentation unit text matching scores, the voiceprint matching score, and the pre-trained decision classifier by: taking the texts corresponding to the m highest scores among the segmentation unit text matching scores of each segmentation unit as candidate texts; if the candidate texts include the target text corresponding to the segmentation unit, the segmentation unit passes authentication; counting the total number of segmentation units that pass; if the total number of passing segmentation units is less than or equal to a fourth threshold, text authentication fails and identity authentication fails; if the total number of passing segmentation units is greater than the fourth threshold, text authentication of the input speech passes; determining whether the voiceprint matching score is greater than a fifth threshold; if so, voiceprint authentication passes and identity authentication passes; if not, performing score normalization on the text score of each segmentation unit against its corresponding target text model and on the voiceprint matching score, and using the normalized scores as input to the PNN classifier for identity authentication.
PCT/CN2017/076336 2016-03-21 2017-03-10 Identity authentication method and device WO2017162053A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610162027.X 2016-03-21
CN201610162027.XA CN107221333B (en) 2016-03-21 2016-03-21 Identity authentication method and device

Publications (1)

Publication Number Publication Date
WO2017162053A1 (en) 2017-09-28

Family

ID=59899353

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/076336 WO2017162053A1 (en) 2016-03-21 2017-03-10 Identity authentication method and device

Country Status (2)

Country Link
CN (1) CN107221333B (en)
WO (1) WO2017162053A1 (en)


Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108154588B (en) * 2017-12-29 2020-11-27 深圳市艾特智能科技有限公司 Unlocking method and system, readable storage medium and intelligent device
CN108831484A (en) * 2018-05-29 2018-11-16 广东声将军科技有限公司 A kind of offline and unrelated with category of language method for recognizing sound-groove and device
CN109545226B (en) * 2019-01-04 2022-11-22 平安科技(深圳)有限公司 Voice recognition method, device and computer readable storage medium
WO2020206455A1 (en) * 2019-04-05 2020-10-08 Google Llc Joint automatic speech recognition and speaker diarization
CN110502610A (en) * 2019-07-24 2019-11-26 深圳壹账通智能科技有限公司 Intelligent sound endorsement method, device and medium based on text semantic similarity
CN111862967A (en) * 2020-04-07 2020-10-30 北京嘀嘀无限科技发展有限公司 Voice recognition method and device, electronic equipment and storage medium
CN111882543B (en) * 2020-07-29 2023-12-26 南通大学 Cigarette filter stick counting method based on AA R2Unet and HMM

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6671672B1 (en) * 1999-03-30 2003-12-30 Nuance Communications Voice authentication system having cognitive recall mechanism for password verification
CN102413101A (en) * 2010-09-25 2012-04-11 盛乐信息技术(上海)有限公司 Voice-print authentication system having voice-print password voice prompting function and realization method thereof
CN102457845A (en) * 2010-10-14 2012-05-16 阿里巴巴集团控股有限公司 Method, equipment and system for authenticating identity by wireless service
CN103220286A (en) * 2013-04-10 2013-07-24 郑方 Identity verification system and identity verification method based on dynamic password voice
CN104021790A (en) * 2013-02-28 2014-09-03 联想(北京)有限公司 Sound control unlocking method and electronic device
CN104064189A (en) * 2014-06-26 2014-09-24 厦门天聪智能软件有限公司 Vocal print dynamic password modeling and verification method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060294390A1 (en) * 2005-06-23 2006-12-28 International Business Machines Corporation Method and apparatus for sequential authentication using one or more error rates characterizing each security challenge
CN102543084A (en) * 2010-12-29 2012-07-04 盛乐信息技术(上海)有限公司 Online voiceprint recognition system and implementation method thereof


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019194787A1 (en) * 2018-04-02 2019-10-10 Visa International Service Association Real-time entity anomaly detection
CN111131237A (en) * 2019-12-23 2020-05-08 深圳供电局有限公司 Microgrid attack identification method based on BP neural network and grid-connected interface device
CN111131237B (en) * 2019-12-23 2020-12-29 深圳供电局有限公司 Microgrid attack identification method based on BP neural network and grid-connected interface device
CN111862933A (en) * 2020-07-20 2020-10-30 北京字节跳动网络技术有限公司 Method, apparatus, device and medium for generating synthesized speech
CN112423063A (en) * 2020-11-03 2021-02-26 深圳Tcl新技术有限公司 Automatic setting method and device for smart television and storage medium
CN112751838A (en) * 2020-12-25 2021-05-04 中国人民解放军陆军装甲兵学院 Identity authentication method, device and system

Also Published As

Publication number Publication date
CN107221333B (en) 2019-11-08
CN107221333A (en) 2017-09-29


Legal Events

Date Code Title Description
NENP Non-entry into the national phase — Ref country code: DE
121 Ep: the EPO has been informed by WIPO that EP was designated in this application — Ref document number: 17769330; Country of ref document: EP; Kind code of ref document: A1
122 Ep: PCT application non-entry in European phase — Ref document number: 17769330; Country of ref document: EP; Kind code of ref document: A1