WO2018166112A1 - Voiceprint recognition-based identity verification method, electronic device, and storage medium - Google Patents

Voiceprint recognition-based identity verification method, electronic device, and storage medium Download PDF

Info

Publication number
WO2018166112A1
WO2018166112A1 (PCT/CN2017/091361)
Authority
WO
WIPO (PCT)
Prior art keywords
voiceprint
voice data
vector
feature vector
gaussian mixture
Prior art date
Application number
PCT/CN2017/091361
Other languages
French (fr)
Chinese (zh)
Inventor
王健宗
丁涵宇
郭卉
肖京
Original Assignee
Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
Publication of WO2018166112A1 publication Critical patent/WO2018166112A1/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00: Network architectures or network communication protocols for network security
    • H04L 63/08: Network architectures or network communication protocols for network security for authentication of entities
    • H04L 63/0861: Network architectures or network communication protocols for network security for authentication of entities using biometrical features, e.g. fingerprint, retina-scan
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/26: Speech to text systems
    • G10L 17/00: Speaker identification or verification techniques
    • G10L 17/02: Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L 17/04: Training, enrolment or model building
    • G10L 17/06: Decision making techniques; Pattern matching strategies
    • G10L 25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00
    • G10L 25/03: Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L 25/18: Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being spectral information of each sub-band
    • G10L 25/24: Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being the cepstrum

Definitions

  • the present invention relates to the field of communications technologies, and in particular, to a method, an electronic device, and a storage medium for identity verification based on voiceprint recognition.
  • a first aspect of the present invention provides a voiceprint recognition based authentication method, and the voiceprint recognition based identity verification method includes:
  • a second aspect of the present invention provides an electronic device, including a processing device, a storage device, and a voiceprint recognition-based identity verification system, wherein the voiceprint recognition-based identity verification system is stored in the storage device and comprises at least one computer readable instruction, the at least one computer readable instruction being executable by the processing device to:
  • a third aspect of the invention provides a computer readable storage medium having stored thereon at least one computer readable instruction executable by a processing device to:
  • the invention has the following beneficial effects: the background channel model generated by pre-training is obtained by mining and comparing a large amount of voice data, and the model can accurately depict the background voiceprint features of the user's voice while retaining the user's own voiceprint features to the utmost extent.
  • these background voiceprint features can be removed at recognition time so that the intrinsic features of the user's voice are extracted, which can greatly improve the accuracy of user identity verification and improve the efficiency of identity verification.
  • FIG. 1 is a schematic diagram of an application environment of a preferred embodiment of a voiceprint recognition based authentication method according to the present invention
  • FIG. 2 is a schematic flow chart of a preferred embodiment of a voiceprint recognition based identity verification method according to the present invention
  • FIG. 3 is a schematic diagram showing the refinement process of step S1 shown in FIG. 2;
  • FIG. 4 is a schematic diagram showing the refinement process of step S3 shown in FIG. 2;
  • FIG. 5 is a schematic structural diagram of a system for authenticating a voiceprint recognition based authentication method according to the present invention.
  • FIG. 1 is a schematic diagram of the application environment of a preferred embodiment of the voiceprint recognition-based identity verification method according to the present invention.
  • the application environment diagram includes an electronic device 1 and a terminal device 2.
  • the electronic device 1 can perform data interaction with the terminal device 2 through a suitable technology such as a network or a near field communication technology.
  • the terminal device 2 includes, but is not limited to, any electronic product that can interact with a user through a keyboard, a mouse, a remote controller, a touch pad, or a voice control device, for example, a personal computer, a tablet computer, a smartphone, a personal digital assistant (PDA), a game console, an Internet Protocol Television (IPTV), or a smart wearable device.
  • the electronic device 1 is a device capable of automatically performing numerical calculation and/or information processing according to instructions that are set or stored in advance.
  • the electronic device 1 may be a computer, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a type of distributed computing: a super virtual computer consisting of a group of loosely coupled computers.
  • the electronic device 1 includes, but is not limited to, a storage device 11, a processing device 12, and a network interface 13 that are communicably connected to each other through a system bus. It should be noted that FIG. 1 only shows the electronic device 1 having the components 11-13, but it should be understood that not all illustrated components are required to be implemented, and more or fewer components may be implemented instead.
  • the storage device 11 includes a memory and at least one type of readable storage medium.
  • the memory provides a cache for the operation of the electronic device 1;
  • the readable storage medium may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, a card type memory, or the like.
  • the readable storage medium may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1; in other embodiments, the non-volatile storage medium may also be a storage device external to the electronic device 1, such as a plug-in hard disk, a smart memory card (SMC), a Secure Digital (SD) card, or a flash card equipped on the electronic device 1.
  • the readable storage medium of the storage device 11 is generally used to store the operating system installed on the electronic device 1 and various types of application software, such as the program code of the voiceprint recognition-based identity verification system 10 in an embodiment of the present application. Further, the storage device 11 can also be used to temporarily store various types of data that have been output or are to be output.
  • Processing device 12 may, in some embodiments, include one or more microprocessors, microcontrollers, digital processors, and the like.
  • the processing device 12 is generally used to control the operation of the electronic device 1, for example, to perform control and processing related to data interaction or communication with the terminal device 2.
  • the processing device 12 is operative to run program code or process data stored in the storage device 11, such as a system 10 that runs voiceprint recognition based authentication.
  • the network interface 13 may comprise a wireless network interface or a wired network interface, which is typically used to establish a communication connection between the electronic device 1 and other electronic devices.
  • the network interface 13 is mainly used to connect the electronic device 1 with one or more terminal devices 2, and establish a data transmission channel and a communication connection between the electronic device 1 and one or more terminal devices 2.
  • the voiceprint recognition-based identity verification system 10 includes at least one computer readable instruction stored in the storage device 11, the at least one computer readable instruction being executable by the processing device 12 to implement the voiceprint recognition-based identity verification method of the embodiments of the present application. As described later, the at least one computer readable instruction can be classified into different logic modules depending on the functions implemented by its various parts.
  • when the voiceprint recognition-based identity verification system 10 is executed by the processing device 12, the following operations are performed: first, after receiving the voice data of the user undergoing identity verification, the voiceprint feature of the voice data is acquired, and a corresponding voiceprint feature vector is constructed based on the voiceprint feature; then, the voiceprint feature vector is input into a background channel model generated by pre-training to construct a current voiceprint discrimination vector corresponding to the voice data; finally, the spatial distance between the current voiceprint discrimination vector and a pre-stored standard voiceprint discrimination vector of the user is calculated, the user is authenticated based on the distance, and a verification result is generated.
  • FIG. 2 is a schematic flowchart of a preferred embodiment of the voiceprint recognition-based identity verification method according to the present invention.
  • the voiceprint recognition-based identity verification method of this embodiment is not limited to the steps shown in the flowchart; in addition, among the steps shown in the flowchart, some steps may be omitted, and the order between the steps may be changed.
  • the method for voiceprint recognition based authentication includes the following steps:
  • Step S1 after receiving the voice data of the user who performs the authentication, acquiring the voiceprint feature of the voice data, and constructing a corresponding voiceprint feature vector based on the voiceprint feature;
  • in this embodiment, the voice data is collected by a voice collection device (for example, a microphone), and the voice collection device sends the collected voice data to the voiceprint recognition-based identity verification system.
  • when collecting voice data, environmental noise and interference from the voice collection equipment should be minimized as far as possible. The voice collection device should be kept at an appropriate distance from the user, and large voice collection equipment should be avoided where possible. The power supply should preferably be mains power, keeping the current stable; a sensor should be used when recording over the telephone.
  • the voice data may be denoised prior to extracting the voiceprint features in the voice data to further reduce interference.
  • the collected voice data is voice data of a preset data length, or voice data greater than a preset data length.
  • voiceprint features include various types, such as wide-band voiceprints, narrow-band voiceprints, and amplitude voiceprints; the voiceprint feature of this embodiment is preferably the Mel Frequency Cepstrum Coefficients (MFCC) of the voice data.
  • the voiceprint feature of the voice data is composed into a feature data matrix, which is a voiceprint feature vector of the voice data.
  • Step S2 input the voiceprint feature vector into a background channel model generated by pre-training to construct a current voiceprint discrimination vector corresponding to the voice data;
  • the voiceprint feature vector is input into the background channel model generated by the pre-training.
  • the background channel model is a Gaussian mixture model, and the background channel model is used to calculate the voiceprint feature vector to obtain a corresponding current voiceprint discrimination vector (i-vector).
  • the calculation process involves computing a likelihood logarithmic matrix Loglike from the voiceprint feature vector, where E(X) is the mean matrix trained by the universal background channel model, D(X) is the covariance matrix, X is the data matrix, and X.^2 denotes the element-wise square of each value of the matrix.
  • to extract the current voiceprint discrimination vector, the first-order and second-order coefficients are calculated first. The first-order coefficients can be obtained by summing the probability matrix: gamma_i = Σ_j loglikes_ji, where gamma_i is the i-th element of the first-order coefficient vector and loglikes_ji is the element in the j-th row and i-th column of the probability matrix. The second-order coefficients can be obtained by multiplying the transpose of the probability matrix by the data matrix: X = loglike^T * feats, where X is the second-order coefficient matrix, loglike is the probability matrix, and feats is the feature data matrix.
  • in this embodiment, the first-order and second-order coefficients are calculated in parallel, and the current voiceprint discrimination vector is then calculated from the first-order and second-order coefficients.
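The first-order and second-order coefficient computations described above can be sketched in a few lines of NumPy (a minimal illustrative sketch; the function and variable names are not from the patent):

```python
import numpy as np

def first_second_order_stats(loglikes: np.ndarray, feats: np.ndarray):
    """Per-component statistics as described in the text.

    loglikes : (T, K) probability matrix, one row per frame,
               one column per Gaussian component.
    feats    : (T, D) feature data matrix, one MFCC row per frame.
    """
    # First-order coefficients: gamma_i = sum_j loglikes[j, i]
    gamma = loglikes.sum(axis=0)   # shape (K,)
    # Second-order coefficients: X = loglike^T * feats
    X = loglikes.T @ feats         # shape (K, D)
    return gamma, X
```

Both statistics depend only on independent reductions over the frame axis, which is why the text notes they can be computed in parallel.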
  • the background channel model is a Gaussian mixture model
  • before step S2, the method further includes:
  • the voiceprint feature vector corresponding to each voice data sample is divided into a training set of a first ratio and a verification set of a second ratio, wherein a sum of the first ratio and the second ratio is less than or equal to 1;
  • the Gaussian mixture model is trained by using the voiceprint feature vector in the training set, and after the training is completed, the accuracy of the trained Gaussian mixture model is verified by using the verification set;
  • if the accuracy is greater than a preset threshold, the model training ends, and the trained Gaussian mixture model is used as the background channel model of step S2; if the accuracy is less than or equal to the preset threshold, the number of voice data samples is increased and the model is re-trained based on the increased voice data samples.
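The train/verification split described above can be sketched as follows (a hypothetical helper; the function name, seed, and ratio values are illustrative assumptions):

```python
import numpy as np

def split_samples(vectors, first_ratio=0.7, second_ratio=0.3, seed=0):
    """Split per-sample voiceprint feature vectors into a training set of
    `first_ratio` and a verification set of `second_ratio`, where the sum
    of the two ratios must be <= 1, as the text requires."""
    assert first_ratio + second_ratio <= 1.0
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(vectors))       # shuffle before splitting
    n_train = int(len(vectors) * first_ratio)
    n_valid = int(len(vectors) * second_ratio)
    train = [vectors[i] for i in idx[:n_train]]
    valid = [vectors[i] for i in idx[n_train:n_train + n_valid]]
    return train, valid
```

The verification set is then used only to measure accuracy after training; if accuracy falls at or below the threshold, more samples are gathered and training is repeated.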
  • in this embodiment, the likelihood probability corresponding to an extracted D-dimensional voiceprint feature can be expressed by K Gaussian components: P(x) = Σ_{k=1..K} w_k p(x|k), where P(x) is the probability that a voice data sample is generated by the Gaussian mixture model, w_k is the weight of the k-th Gaussian component, p(x|k) is the probability of x under the k-th Gaussian component, and K is the number of Gaussian components.
  • the parameters of the entire Gaussian mixture model can be expressed as {w_i, μ_i, Σ_i}, where w_i is the weight of the i-th Gaussian component, μ_i is the mean of the i-th Gaussian component, and Σ_i is the covariance matrix of the i-th Gaussian component.
  • the Gaussian mixture model can be trained with the unsupervised EM (expectation-maximization) algorithm. After training is completed, the weight vector, constant vector, N covariance matrices, and the means multiplied by the covariance matrices of the Gaussian mixture model are obtained; together these constitute a trained Gaussian mixture model.
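The mixture density P(x) = Σ_k w_k p(x|k) can be evaluated directly in NumPy, assuming diagonal covariances for simplicity (an illustrative sketch, not the patent's implementation):

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Diagonal-covariance Gaussian density N(x; mean, diag(var))."""
    d = x - mean
    return np.exp(-0.5 * np.sum(d * d / var)) / np.sqrt(np.prod(2 * np.pi * var))

def mixture_density(x, weights, means, variances):
    """P(x) = sum_k w_k * p(x | k) for a K-component diagonal GMM."""
    return sum(w * gaussian_pdf(x, m, v)
               for w, m, v in zip(weights, means, variances))
```

Fitting {w_i, μ_i, Σ_i} with EM alternates between computing per-component posteriors for each sample (E-step) and re-estimating weights, means, and covariances from those posteriors (M-step).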
  • Step S3 Calculate a spatial distance between the current voiceprint discrimination vector and a pre-stored standard voiceprint discrimination vector of the user, perform identity verification on the user based on the distance, and generate a verification result.
  • the spatial distance of this embodiment is a cosine distance, which uses the cosine of the angle between two vectors in the vector space as a measure of the magnitude of the difference between two individuals.
  • the standard voiceprint discriminant vector is a voiceprint discriminant vector obtained and stored in advance, and the standard voiceprint discriminant vector carries the identifier information of the corresponding user when stored, which can accurately represent the identity of the corresponding user.
  • the stored voiceprint discrimination vector is obtained according to the identification information provided by the user before calculating the spatial distance.
  • if the spatial distance is within a preset distance threshold, the verification passes; otherwise, the verification fails.
  • the background channel model generated by pre-training in this embodiment is obtained by mining and comparing a large amount of voice data. While maximally retaining the user's own voiceprint features, the model can accurately depict the background voiceprint features present when the user speaks, remove these background features at recognition time, and extract the inherent features of the user's voice, which can greatly improve the accuracy of user identity verification and improve its efficiency. The method also makes full use of the voiceprint features related to the vocal organs in the human voice.
  • this voiceprint feature does not require restricting the spoken text, so it offers greater flexibility in the process of recognition and verification.
  • the foregoing step S1 includes:
  • Step S11 Perform pre-emphasis, framing, and windowing on the voice data.
  • first, the voice data is pre-emphasized and divided into frames; after framing, each frame signal is regarded as a stationary signal. Because the start and end of each frame of the framed speech are discontinuous, framing introduces a deviation from the original speech; therefore, the voice data also needs to be windowed.
  • Step S12: perform a Fourier transform on each windowed frame to obtain the corresponding spectrum;
  • Step S13: input the spectrum into a Mel filter bank to output a Mel spectrum;
  • Step S14: perform cepstral analysis on the Mel spectrum to obtain the Mel frequency cepstral coefficients (MFCC), and compose a corresponding voiceprint feature vector based on the MFCC.
  • the cepstral analysis consists, for example, of taking the logarithm and applying an inverse transform. The inverse transform is generally implemented by the discrete cosine transform (DCT), and the 2nd through 13th coefficients after the DCT are taken as the MFCC coefficients.
  • the Mel frequency cepstrum coefficient MFCC is the voiceprint feature of the speech data of this frame, and the Mel frequency cepstral coefficient MFCC of each frame is composed into a feature data matrix, which is the voiceprint feature vector of the speech data.
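Steps S11 through S14 can be sketched end-to-end in NumPy/SciPy. This is a simplified illustration: the sampling rate, frame length, FFT size, and filter count are assumed values, and the triangular Mel filter bank below is a common textbook construction rather than the patent's own:

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the Mel scale.
    mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for j in range(l, c):
            fb[i - 1, j] = (j - l) / max(c - l, 1)
        for j in range(c, r):
            fb[i - 1, j] = (r - j) / max(r - c, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, n_fft=512, n_filters=26):
    # S11a: pre-emphasis, a high-pass filter that boosts high frequencies.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # S11b: framing with 1/2-frame overlap, then Hamming windowing.
    step = frame_len // 2
    n_frames = max(1, 1 + (len(emphasized) - frame_len) // step)
    window = np.hamming(frame_len)
    frames = np.stack([emphasized[i * step:i * step + frame_len] * window
                       for i in range(n_frames)])
    # S12: Fourier transform of each window -> power spectrum.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # S13: Mel filter bank -> Mel spectrum.
    fb = mel_filterbank(n_filters, n_fft, sr)
    mel_energies = np.maximum(power @ fb.T, 1e-10)
    # S14: cepstral analysis: log, then DCT.
    cepstra = dct(np.log(mel_energies), type=2, axis=1, norm='ortho')
    # Keep the 2nd-13th coefficients, as described in the text.
    return cepstra[:, 1:13]
```

Stacking the per-frame MFCC rows yields the feature data matrix that serves as the voiceprint feature vector.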
  • Step S3 includes:
  • Step S31: calculate the cosine distance between the current voiceprint discrimination vector and the pre-stored standard voiceprint discrimination vector of the user: cos θ = (A · B) / (‖A‖ ‖B‖), where A is the standard voiceprint discrimination vector and B is the current voiceprint discrimination vector;
  • Step S32: if the cosine distance is less than or equal to a preset distance threshold, generate verification-pass information;
  • Step S33: if the cosine distance is greater than the preset distance threshold, generate verification-failure information.
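Steps S31 through S33 amount to a threshold test on the cosine distance between the two i-vectors (a minimal sketch; the threshold value is an assumed placeholder, not from the patent):

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cos(theta) between two i-vectors; smaller means more similar."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 - cos

def verify(current_vec, standard_vec, threshold=0.5):
    """Pass verification when the cosine distance is within the threshold."""
    return cosine_distance(current_vec, standard_vec) <= threshold
```

Because the comparison is a single vector operation, the decision cost is negligible next to feature extraction and model scoring.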
  • in another embodiment, the foregoing step S3 is replaced by: calculating the spatial distance between the current voiceprint discrimination vector and each of the pre-stored standard voiceprint discrimination vectors, obtaining the smallest spatial distance, authenticating the user based on the minimum spatial distance, and generating a verification result.
  • the difference between this embodiment and the foregoing embodiment is that the standard voiceprint discrimination vectors are stored without carrying the user's identification information. When verifying the user's identity, the spatial distance between the current voiceprint discrimination vector and each pre-stored standard voiceprint discrimination vector is calculated and the minimum spatial distance is obtained; if the minimum spatial distance is less than a preset distance threshold (which may be the same as or different from the distance threshold of the foregoing embodiment), the verification passes; otherwise, the verification fails.
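The 1:N variant above, which matches against every stored standard vector and keeps the nearest one, can be sketched as follows (illustrative; the function name and threshold are assumptions):

```python
import numpy as np

def identify(current_vec, stored_vecs, threshold=0.5):
    """Compare the current i-vector against every stored standard i-vector;
    verification passes if the smallest cosine distance is below the
    threshold. Returns (passed, index_of_best_match, min_distance)."""
    def cos_dist(a, b):
        return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    distances = [cos_dist(current_vec, v) for v in stored_vecs]
    best = int(np.argmin(distances))
    return distances[best] < threshold, best, distances[best]
```

Since no identification information accompanies the stored vectors, the index of the nearest vector doubles as the implied identity of the speaker when verification passes.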
  • FIG. 5 is a functional block diagram of a preferred embodiment of the voiceprint recognition based authentication system 10 of the present invention.
  • the voiceprint recognition-based identity verification system 10 can be partitioned into one or more modules, the one or more modules being stored in a memory and executed by one or more processors to implement the present invention.
  • for example, the voiceprint recognition-based identity verification system 10 can be divided into a first obtaining module 101, a constructing module 102, and a first verification module 103.
  • module refers to a series of computer program instruction segments capable of performing a specific function, which is more suitable than the program for describing the execution of the voiceprint recognition based authentication system 10 in an electronic device, wherein:
  • the first obtaining module 101 is configured to acquire a voiceprint feature of the voice data after receiving the voice data of the user who performs the identity verification, and construct a corresponding voiceprint feature vector based on the voiceprint feature;
  • in this embodiment, the voice data is collected by a voice collection device (for example, a microphone), and the voice collection device sends the collected voice data to the voiceprint recognition-based identity verification system.
  • when collecting voice data, environmental noise and interference from the voice collection equipment should be minimized as far as possible. The voice collection device should be kept at an appropriate distance from the user, and large voice collection equipment should be avoided where possible. The power supply should preferably be mains power, keeping the current stable; a sensor should be used when recording over the telephone.
  • the voice data may be denoised prior to extracting the voiceprint features in the voice data to further reduce interference.
  • the collected voice data is voice data of a preset data length, or voice data greater than the preset data length.
  • voiceprint features include various types, such as wide-band voiceprints, narrow-band voiceprints, and amplitude voiceprints; the voiceprint feature of this embodiment is preferably the Mel Frequency Cepstrum Coefficients (MFCC) of the voice data.
  • the voiceprint feature of the voice data is composed into a feature data matrix, which is a voiceprint feature vector of the voice data.
  • the constructing module 102 is configured to input the voiceprint feature vector into a background channel model generated by pre-training to construct a current voiceprint discrimination vector corresponding to the voice data;
  • the voiceprint feature vector is input into the background channel model generated by the pre-training.
  • the background channel model is a Gaussian mixture model, and the background channel model is used to calculate the voiceprint feature vector to obtain a corresponding current voiceprint discrimination vector (i-vector).
  • the calculation process involves computing a likelihood logarithmic matrix Loglike from the voiceprint feature vector, where E(X) is the mean matrix trained by the universal background channel model, D(X) is the covariance matrix, X is the data matrix, and X.^2 denotes the element-wise square of each value of the matrix.
  • to extract the current voiceprint discrimination vector, the first-order and second-order coefficients are calculated first. The first-order coefficients can be obtained by summing the probability matrix: gamma_i = Σ_j loglikes_ji, where gamma_i is the i-th element of the first-order coefficient vector and loglikes_ji is the element in the j-th row and i-th column of the probability matrix. The second-order coefficients can be obtained by multiplying the transpose of the probability matrix by the data matrix: X = loglike^T * feats, where X is the second-order coefficient matrix, loglike is the probability matrix, and feats is the feature data matrix.
  • in this embodiment, the first-order and second-order coefficients are calculated in parallel, and the current voiceprint discrimination vector is then calculated from the first-order and second-order coefficients.
  • the background channel model is a Gaussian mixture model
  • the voiceprint recognition based authentication system further comprises:
  • a second acquiring module is configured to acquire a preset number of voice data samples, obtain the voiceprint feature corresponding to each voice data sample, and construct a voiceprint feature vector corresponding to each voice data sample based on the voiceprint feature corresponding to each voice data sample;
  • a dividing module configured to divide the voiceprint feature vector corresponding to each voice data sample into a training set of a first ratio and a verification set of a second ratio, wherein a sum of the first ratio and the second ratio is less than or equal to 1;
  • a training module is configured to train the Gaussian mixture model by using the voiceprint feature vector in the training set, and after the training is completed, verify the accuracy of the trained Gaussian mixture model by using the verification set;
  • a processing module: if the accuracy is greater than a preset threshold, the model training ends and the trained Gaussian mixture model is used as the background channel model; if the accuracy is less than or equal to the preset threshold, the number of voice data samples is increased and the model is re-trained based on the increased voice data samples.
  • in this embodiment, the likelihood probability corresponding to an extracted D-dimensional voiceprint feature can be expressed by K Gaussian components: P(x) = Σ_{k=1..K} w_k p(x|k), where P(x) is the probability that a voice data sample is generated by the Gaussian mixture model, w_k is the weight of the k-th Gaussian component, p(x|k) is the probability of x under the k-th Gaussian component, and K is the number of Gaussian components.
  • the parameters of the entire Gaussian mixture model can be expressed as {w_i, μ_i, Σ_i}, where w_i is the weight of the i-th Gaussian component, μ_i is the mean of the i-th Gaussian component, and Σ_i is the covariance matrix of the i-th Gaussian component.
  • the Gaussian mixture model can be trained with the unsupervised EM (expectation-maximization) algorithm. After training is completed, the weight vector, constant vector, N covariance matrices, and the means multiplied by the covariance matrices of the Gaussian mixture model are obtained; together these constitute a trained Gaussian mixture model.
  • the first verification module 103 is configured to calculate a spatial distance between the current voiceprint discrimination vector and the pre-stored standard voiceprint discrimination vector of the user, perform identity verification on the user based on the distance, and generate a verification result.
  • the spatial distance of this embodiment is a cosine distance, which uses the cosine of the angle between two vectors in the vector space as a measure of the magnitude of the difference between two individuals.
  • the standard voiceprint discriminant vector is a voiceprint discriminant vector obtained and stored in advance, and the standard voiceprint discriminant vector carries the identifier information of the corresponding user when stored, which can accurately represent the identity of the corresponding user.
  • the stored voiceprint discrimination vector is obtained according to the identification information provided by the user before calculating the spatial distance.
  • if the spatial distance is within a preset distance threshold, the verification passes; otherwise, the verification fails.
  • the first acquiring module 101 is specifically configured to: perform pre-emphasis, framing, and windowing on the voice data; perform a Fourier transform on each windowed frame to obtain the corresponding spectrum; input the spectrum into a Mel filter bank to output a Mel spectrum; and perform cepstral analysis on the Mel spectrum to obtain the Mel frequency cepstral coefficients (MFCC), composing a corresponding voiceprint feature vector based on the MFCC.
  • the pre-emphasis processing is actually a high-pass filtering process, filtering out the low-frequency data, so that the high-frequency characteristics in the speech data are more prominent.
  • the voice data is divided into N frames of short-time signals; there is an overlapping area between adjacent frames, generally 1/2 of the frame length. After framing, each frame signal is regarded as a stationary signal. Because the start and end of each frame are discontinuous, framing introduces a deviation from the original speech; therefore, the voice data also needs to be windowed.
  • the cepstral analysis consists, for example, of taking the logarithm and applying an inverse transform. The inverse transform is generally implemented by the discrete cosine transform (DCT), and the 2nd through 13th coefficients after the DCT are taken as the MFCC coefficients.
  • the Mel frequency cepstrum coefficient MFCC is the voiceprint feature of the speech data of this frame, and the Mel frequency cepstral coefficient MFCC of each frame is composed into a feature data matrix, which is the voiceprint feature vector of the speech data.
  • the first verification module 103 is specifically configured to calculate the cosine distance between the current voiceprint discrimination vector and the pre-stored standard voiceprint discrimination vector of the user: cos θ = (A · B) / (‖A‖ ‖B‖), where A is the standard voiceprint discrimination vector and B is the current voiceprint discrimination vector; if the cosine distance is less than or equal to a preset distance threshold, generate verification-pass information; if the cosine distance is greater than the preset distance threshold, generate verification-failure information.
  • the first verification module is replaced by a second verification module, configured to calculate the current voiceprint discrimination vector and pre-stored standard voiceprint identification.
  • the spatial distance between the vectors, the minimum spatial distance is obtained, the user is authenticated based on the minimum spatial distance, and a verification result is generated.
  • the present embodiment does not carry the identification information of the user when storing the standard voiceprint authentication vector, and calculates the current voiceprint authentication vector and the pre-stored standard when verifying the identity of the user.
  • the voiceprint discriminates the spatial distance between the vectors and obtains a minimum spatial distance. If the minimum spatial distance is less than a preset distance threshold (the distance threshold is the same as or different from the distance threshold of the above embodiment), the verification passes, otherwise verification failed.

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Collating Specific Patterns (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present invention relates to a voiceprint recognition-based identity verification method, an electronic device, and a storage medium. The voiceprint recognition-based identity verification method comprises: after receiving voice data of a user undergoing identity verification, obtaining voiceprint features of the voice data, and constructing a corresponding voiceprint feature vector on the basis of the voiceprint features; inputting the voiceprint feature vector into a background channel model generated by training in advance, to construct a current voiceprint identification vector corresponding to the voice data; and calculating the spatial distance between the current voiceprint identification vector and a pre-stored standard voiceprint identification vector of the user, performing identity verification on the user on the basis of the distance, and generating a verification result. The present invention can improve the accuracy and efficiency of user identity verification.

Description

Voiceprint Recognition-Based Identity Verification Method, Electronic Device, and Storage Medium
Priority Claim
This application claims priority under the Paris Convention to Chinese patent application No. CN201710147695X, entitled "Voiceprint Recognition-Based Identity Verification Method and System" and filed on March 13, 2017, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of communications technologies, and in particular to a voiceprint recognition-based identity verification method, an electronic device, and a storage medium.
Background
At present, the business of large financial companies spans insurance, banking, investment, and other areas, and each of these usually requires communication with customers in a variety of ways (for example, by telephone or face to face). Verifying the customer's identity before such communication takes place is an important part of ensuring business security. To meet real-time business requirements, financial companies usually analyze and verify customer identities manually. Given the large customer base, relying on manual discriminant analysis to verify customer identities is neither accurate nor efficient.
Summary of the Invention
An object of the present invention is to provide a voiceprint recognition-based identity verification method, an electronic device, and a storage medium, with the aim of improving the accuracy and efficiency of user identity verification.
A first aspect of the present invention provides a voiceprint recognition-based identity verification method, comprising:
S1: after receiving voice data of a user undergoing identity verification, acquiring voiceprint features of the voice data, and constructing a corresponding voiceprint feature vector based on the voiceprint features;
S2: inputting the voiceprint feature vector into a background channel model generated by training in advance, to construct a current voiceprint identification vector corresponding to the voice data;
S3: calculating the spatial distance between the current voiceprint identification vector and a pre-stored standard voiceprint identification vector of the user, verifying the user's identity based on the distance, and generating a verification result.
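Taken together, steps S1-S3 amount to a feature-to-vector-to-distance pipeline. The sketch below is a minimal illustration, not the patent's implementation: `extract_ivector` stands in for the background channel model of step S2, the name `verify_identity` and the 0.5 threshold are hypothetical, and the convention that the cosine distance equals one minus the cosine similarity (smaller meaning more similar) is an assumption consistent with "verification passes when the distance is within the threshold".

```python
import numpy as np

def verify_identity(feature_matrix, extract_ivector, standard_ivector,
                    distance_threshold=0.5):
    """S2 + S3: map the voiceprint features to a current identification
    vector, then compare it with the user's stored standard vector."""
    current = extract_ivector(feature_matrix)                  # step S2
    cos_sim = (current @ standard_ivector
               / (np.linalg.norm(current) * np.linalg.norm(standard_ivector)))
    cosine_distance = 1.0 - cos_sim                            # step S3
    return bool(cosine_distance <= distance_threshold)         # pass / fail
```

Under this convention, identical vectors give distance 0 (pass) and orthogonal vectors give distance 1 (fail).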
A second aspect of the present invention provides an electronic device comprising a processing device, a storage device, and a voiceprint recognition-based identity verification system stored in the storage device; the system includes at least one computer-readable instruction executable by the processing device to implement the following operations:
S1: after receiving voice data of a user undergoing identity verification, acquiring voiceprint features of the voice data, and constructing a corresponding voiceprint feature vector based on the voiceprint features;
S2: inputting the voiceprint feature vector into a background channel model generated by training in advance, to construct a current voiceprint identification vector corresponding to the voice data;
S3: calculating the spatial distance between the current voiceprint identification vector and a pre-stored standard voiceprint identification vector of the user, verifying the user's identity based on the distance, and generating a verification result.
A third aspect of the present invention provides a computer-readable storage medium storing at least one computer-readable instruction executable by a processing device to implement the following operations:
S1: after receiving voice data of a user undergoing identity verification, acquiring voiceprint features of the voice data, and constructing a corresponding voiceprint feature vector based on the voiceprint features;
S2: inputting the voiceprint feature vector into a background channel model generated by training in advance, to construct a current voiceprint identification vector corresponding to the voice data;
S3: calculating the spatial distance between the current voiceprint identification vector and a pre-stored standard voiceprint identification vector of the user, verifying the user's identity based on the distance, and generating a verification result.
The present invention is advantageous in that the background channel model generated by training in advance is obtained by mining and comparing a large amount of voice data. This model can accurately characterize the background voiceprint features present when a user speaks while preserving the user's own voiceprint features as far as possible, and can remove the background features at recognition time so that the intrinsic features of the user's voice are extracted, substantially improving both the accuracy and the efficiency of user identity verification.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of the application environment of a preferred embodiment of the voiceprint recognition-based identity verification method of the present invention;
FIG. 2 is a schematic flowchart of a preferred embodiment of the voiceprint recognition-based identity verification method of the present invention;
FIG. 3 is a schematic diagram of the refined flow of step S1 shown in FIG. 2;
FIG. 4 is a schematic diagram of the refined flow of step S3 shown in FIG. 2;
FIG. 5 is a schematic structural diagram of a preferred embodiment of the voiceprint recognition-based identity verification system of the present invention.
Detailed Description
The principles and features of the present invention are described below with reference to the accompanying drawings; the examples given are intended only to explain the invention, not to limit its scope.
Referring to FIG. 1, which is a schematic diagram of the application environment of a preferred embodiment of the voiceprint recognition-based identity verification method of the present invention, the application environment includes an electronic device 1 and a terminal device 2. The electronic device 1 can exchange data with the terminal device 2 through a suitable technology such as a network or near-field communication.
The terminal device 2 includes, but is not limited to, any electronic product that can interact with a user through a keyboard, a mouse, a remote controller, a touch pad, a voice-control device, or the like, for example a personal computer, a tablet computer, a smartphone, a personal digital assistant (PDA), a game console, an Internet Protocol television (IPTV), or a smart wearable device.
The electronic device 1 is a device capable of automatically performing numerical computation and/or information processing according to preset or stored instructions. The electronic device 1 may be a computer, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, cloud computing being a form of distributed computing in which a super virtual computer is composed of a group of loosely coupled computers.
In this embodiment, the electronic device 1 includes, but is not limited to, a storage device 11, a processing device 12, and a network interface 13 that are communicably connected to one another through a system bus. It should be noted that FIG. 1 shows only an electronic device 1 having components 11-13, but it should be understood that not all of the illustrated components are required, and that more or fewer components may be implemented instead.
The storage device 11 includes a memory and at least one type of readable storage medium. The memory provides a cache for the operation of the electronic device 1; the readable storage medium may be a non-volatile storage medium such as a flash memory, hard disk, multimedia card, or card-type memory. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device 1, for example a hard disk of the electronic device 1; in other embodiments, the non-volatile storage medium may also be an external storage device of the electronic device 1, for example a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the electronic device 1. In this embodiment, the readable storage medium of the storage device 11 is generally used to store the operating system and various application software installed on the electronic device 1, for example the program code of the voiceprint recognition-based identity verification system 10 of an embodiment of the present application. The storage device 11 may also be used to temporarily store various types of data that have been or are to be output.
In some embodiments, the processing device 12 may include one or more microprocessors, microcontrollers, digital processors, and the like. The processing device 12 is generally used to control the operation of the electronic device 1, for example to perform control and processing related to data exchange or communication with the terminal device 2. In this embodiment, the processing device 12 is used to run the program code stored in the storage device 11 or to process data, for example to run the voiceprint recognition-based identity verification system 10.
The network interface 13 may include a wireless network interface or a wired network interface, and is generally used to establish communication connections between the electronic device 1 and other electronic devices. In this embodiment, the network interface 13 is mainly used to connect the electronic device 1 with one or more terminal devices 2 and to establish data transmission channels and communication connections between the electronic device 1 and the one or more terminal devices 2.
The voiceprint recognition-based identity verification system 10 includes at least one computer-readable instruction stored in the storage device 11, and the at least one computer-readable instruction is executable by the processing device 12 to implement the voiceprint recognition-based identity verification method of the embodiments of the present application. As described later, the at least one computer-readable instruction can be divided into different logic modules according to the functions implemented by its respective parts.
In an embodiment, when the voiceprint recognition-based identity verification system 10 is executed by the processing device 12, the following operations are implemented: first, after voice data of a user undergoing identity verification is received, voiceprint features of the voice data are acquired, and a corresponding voiceprint feature vector is constructed based on the voiceprint features; the voiceprint feature vector is then input into a background channel model generated by training in advance, to construct a current voiceprint identification vector corresponding to the voice data; finally, the spatial distance between the current voiceprint identification vector and a pre-stored standard voiceprint identification vector of the user is calculated, the user's identity is verified based on the distance, and a verification result is generated.
As shown in FIG. 2, which is a schematic flowchart of a preferred embodiment of the voiceprint recognition-based identity verification method of the present invention, the method is not limited to the steps shown in the flow; moreover, among the steps shown in the flowchart, some steps may be omitted and the order of the steps may be changed. The voiceprint recognition-based identity verification method includes the following steps:
Step S1: after receiving voice data of a user undergoing identity verification, acquire voiceprint features of the voice data, and construct a corresponding voiceprint feature vector based on the voiceprint features.
In this embodiment, the voice data is collected by a voice collection device (for example, a microphone), which sends the collected voice data to the voiceprint recognition-based identity verification system.
When collecting voice data, environmental noise and interference from the voice collection device should be prevented as far as possible. The voice collection device should be kept at an appropriate distance from the user, and a voice collection device with high distortion should be avoided; the power supply is preferably mains power with a stable current, and a sensor should be used when recording telephone calls. Before the voiceprint features are extracted, the voice data may be denoised to further reduce interference. To ensure that voiceprint features can be extracted, the collected voice data has a preset data length, or a length greater than the preset data length.
Voiceprint features come in various types, for example wideband voiceprints, narrowband voiceprints, and amplitude voiceprints; the voiceprint features of this embodiment are preferably the Mel-frequency cepstral coefficients (MFCC) of the voice data. When the corresponding voiceprint feature vector is constructed, the voiceprint features of the voice data are assembled into a feature data matrix, which is the voiceprint feature vector of the voice data.
Step S2: input the voiceprint feature vector into the background channel model generated by training in advance, to construct the current voiceprint identification vector corresponding to the voice data.
The voiceprint feature vector is input into the background channel model generated by training in advance; preferably, the background channel model is a Gaussian mixture model, which is used to compute on the voiceprint feature vector to obtain the corresponding current voiceprint identification vector (i.e., the i-vector).
Specifically, the computation process includes:
1) Selecting Gaussian components: first, the parameters of the universal background channel model are used to compute the log-likelihood of each frame of data under the different Gaussian components; by sorting each column of the log-likelihood matrix in parallel, the top N Gaussian components are selected, finally yielding a matrix of the values of each frame of data under the Gaussian mixture model:
Loglike = E(X) · D(X)^(-1) · X^T - 0.5 · D(X)^(-1) · (X.^2)^T,
where Loglike is the log-likelihood matrix, E(X) is the mean matrix trained in the universal background channel model, D(X) is the covariance matrix, X is the data matrix, and X.^2 denotes squaring each entry of the matrix.
2) Computing posterior probabilities: X·X^T is computed for each frame of data X to obtain a symmetric matrix, which can be reduced to a lower triangular matrix whose elements are arranged in order into one row, so that the data becomes a vector whose dimension is the number of elements of that lower triangular matrix for each of the N frames; the vectors of all frames are combined into a new data matrix. At the same time, each covariance matrix used for probability computation in the universal background model is likewise reduced to a lower triangular matrix, becoming a matrix similar to the new data matrix. The log-likelihood of each frame of data under the selected Gaussian components is then computed from the mean matrix and covariance matrix of the universal background channel model, followed by a Softmax regression and finally a normalization, yielding the posterior probability distribution of each frame under the Gaussian mixture model; the probability distribution vectors of all frames form the probability matrix.
3) Extracting the current voiceprint identification vector: first, the first-order and second-order coefficients are computed; the first-order coefficients can be obtained by summing the columns of the probability matrix:
Gamma_i = sum_j Loglike_ji,
where Gamma_i is the i-th element of the first-order coefficient vector and Loglike_ji is the element in row j, column i of the probability matrix.
The second-order coefficients can be obtained by multiplying the transpose of the probability matrix by the data matrix:
X = Loglike^T · feats, where X is the second-order coefficient matrix, Loglike is the probability matrix, and feats is the feature data matrix.
After the first-order and second-order coefficients have been computed, the linear and quadratic terms are computed in parallel, and the current voiceprint identification vector is then computed from the linear and quadratic terms.
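As a concrete illustration of steps 1)-3), the sketch below computes per-frame values under a diagonal-covariance mixture (dropping the constant terms, exactly as the Loglike formula above does), keeps the top-N components per frame with a Softmax and normalization, and then forms the first-order coefficients as column sums and the second-order coefficient matrix as the transposed probability matrix times the data. The function name, array shapes, and top-N value are illustrative assumptions, not the patent's code.

```python
import numpy as np

def frame_posteriors_and_stats(feats, means, variances, top_n=5):
    """Sketch of the i-vector statistics computation described in the text."""
    # Step 1: Loglike = E(X)*D(X)^-1*X^T - 0.5*D(X)^-1*(X.^2)^T, with
    # constants dropped; here rows are frames and columns are components.
    loglike = (feats @ (means / variances).T
               - 0.5 * (feats ** 2) @ (1.0 / variances).T)
    # Step 2: keep only the top-N Gaussian components per frame, apply a
    # stable Softmax over the selected entries, and renormalize.
    post = np.zeros_like(loglike)
    for i, row in enumerate(loglike):
        top = np.argsort(row)[-top_n:]
        w = np.exp(row[top] - row[top].max())
        post[i, top] = w / w.sum()
    # Step 3: first-order coefficients Gamma = column sums of the probability
    # matrix; second-order coefficient matrix = probability matrix^T * feats.
    gamma = post.sum(axis=0)          # shape (K,)
    second = post.T @ feats           # shape (K, D)
    return post, gamma, second
```

Because each posterior row sums to one, the first-order coefficients always sum to the number of frames, which is a useful sanity check.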
Preferably, the background channel model is a Gaussian mixture model, and the following is performed before step S1 above:
acquiring a preset number of voice data samples, acquiring the voiceprint features corresponding to each voice data sample, and constructing the voiceprint feature vector corresponding to each voice data sample based on its voiceprint features;
dividing the voiceprint feature vectors corresponding to the voice data samples into a training set of a first proportion and a validation set of a second proportion, the sum of the first proportion and the second proportion being less than or equal to 1;
training the Gaussian mixture model with the voiceprint feature vectors in the training set and, after the training is completed, verifying the accuracy of the trained Gaussian mixture model with the validation set;
if the accuracy is greater than a preset threshold, ending the model training and using the trained Gaussian mixture model as the background channel model of step S2; or, if the accuracy is less than or equal to the preset threshold, increasing the number of voice data samples and retraining based on the increased voice data samples.
When the Gaussian mixture model is trained with the voiceprint feature vectors in the training set, the likelihood of the extracted D-dimensional voiceprint features can be expressed with K Gaussian components as:
P(x) = sum_{k=1..K} w_k · p(x|k),
where P(x) is the probability that a voice data sample is generated by the Gaussian mixture model, w_k is the weight of the k-th Gaussian component, p(x|k) is the probability that the sample is generated by the k-th Gaussian component, and K is the number of Gaussian components.
The parameters of the entire Gaussian mixture model can be expressed as {w_i, μ_i, Σ_i}, where w_i is the weight of the i-th Gaussian component, μ_i its mean, and Σ_i its covariance. The Gaussian mixture model can be trained with the unsupervised EM algorithm. After training, the weight vector, constant vector, N covariance matrices, and the matrix of means multiplied by covariances of the Gaussian mixture model are obtained, constituting a trained Gaussian mixture model.
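The unsupervised EM training mentioned above can be sketched with a minimal diagonal-covariance implementation. This is only meant to show the E-step/M-step estimation of the parameters {w_i, μ_i, Σ_i}; a production universal background model would use far more data, more components, and a hardened library trainer, and the function name and regularization constants below are assumptions.

```python
import numpy as np

def train_diag_gmm(X, K, n_iter=50, seed=0):
    """Minimal unsupervised EM for a diagonal-covariance GMM (UBM sketch)."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    w = np.full(K, 1.0 / K)                        # weights w_k
    mu = X[rng.choice(N, K, replace=False)]        # means mu_k (random init)
    var = np.tile(X.var(axis=0), (K, 1)) + 1e-6    # diagonal covariances
    for _ in range(n_iter):
        # E-step: responsibilities p(k|x) via log-domain Gaussian densities
        logp = (np.log(w)
                - 0.5 * np.sum(np.log(2 * np.pi * var), axis=1)
                - 0.5 * (((X[:, None, :] - mu) ** 2) / var).sum(axis=2))
        logp -= logp.max(axis=1, keepdims=True)    # numerical stability
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)          # (N, K)
        # M-step: re-estimate {w_k, mu_k, var_k} from the responsibilities
        nk = r.sum(axis=0) + 1e-10
        w = nk / N
        mu = (r.T @ X) / nk[:, None]
        var = (r.T @ (X ** 2)) / nk[:, None] - mu ** 2 + 1e-6
    return w, mu, var
```

Each iteration alternates soft assignment of frames to components with re-estimation of the component parameters, which is exactly the EM scheme the text refers to.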
Step S3: calculate the spatial distance between the current voiceprint identification vector and the pre-stored standard voiceprint identification vector of the user, verify the user's identity based on the distance, and generate a verification result.
There are various kinds of distances between vectors, including the cosine distance and the Euclidean distance; preferably, the spatial distance in this embodiment is the cosine distance, which uses the cosine of the angle between two vectors in a vector space as a measure of the difference between two individuals.
The standard voiceprint identification vector is a voiceprint identification vector obtained and stored in advance; it carries the identification information of the corresponding user when stored, so that it can accurately represent that user's identity. Before the spatial distance is calculated, the stored voiceprint identification vector is retrieved according to the identification information provided by the user.
When the calculated spatial distance is less than or equal to the preset distance threshold, the verification passes; otherwise, the verification fails.
Compared with the prior art, the background channel model generated by training in advance in this embodiment is obtained by mining and comparing a large amount of voice data. This model can accurately characterize the background voiceprint features present when a user speaks while preserving the user's own voiceprint features as far as possible, and can remove the background features at recognition time so that the intrinsic features of the user's voice are extracted, substantially improving the accuracy and efficiency of user identity verification. In addition, this embodiment makes full use of the voiceprint features of the human voice that are related to the vocal tract; such voiceprint features impose no restriction on the spoken text, allowing greater flexibility in recognition and verification.
In a preferred embodiment, as shown in FIG. 3, based on the embodiment of FIG. 2 above, step S1 includes:
Step S11: perform pre-emphasis, framing, and windowing on the voice data. In this embodiment, the voice data is processed after the voice data of the user undergoing identity verification is received. The pre-emphasis is in fact high-pass filtering, which filters out low-frequency data so that the high-frequency characteristics of the voice data stand out more; specifically, the transfer function of the high-pass filter is H(Z) = 1 - αZ^(-1), where Z is the voice data and α is a constant coefficient, preferably α = 0.97. Because a sound signal is stationary only over short periods, the signal is divided into N short-time segments (i.e., N frames), and, to avoid losing the continuity of the sound, adjacent frames share a repeat (overlap) region that is generally half the frame length. After framing, each frame signal is treated as a stationary signal; however, because of the Gibbs effect, the start and end of each frame are discontinuous, so framing alone deviates further from the original speech, and the voice data therefore also needs to be windowed.
Step S12: perform a Fourier transform on each windowed frame to obtain the corresponding spectrum.
Step S13: input the spectrum into a Mel filter bank to output the Mel spectrum.
Step S14: perform cepstral analysis on the Mel spectrum to obtain the Mel-frequency cepstral coefficients (MFCC), and form the corresponding voiceprint feature vector based on the MFCC. The cepstral analysis consists, for example, of taking a logarithm and performing an inverse transform; the inverse transform is generally implemented as a discrete cosine transform (DCT), and the 2nd through 13th coefficients after the DCT are taken as the MFCC coefficients. The MFCC coefficients are the voiceprint feature of each frame of voice data; the MFCC coefficients of all frames are assembled into a feature data matrix, which is the voiceprint feature vector of the voice data.
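Steps S11-S14 can be sketched end to end as follows. The pre-emphasis filter H(Z) = 1 - αZ^(-1) with α = 0.97, the half-frame overlap, the windowing, and keeping the 2nd-13th DCT coefficients follow the text above; the Hamming window, frame length, sampling rate, 26-filter Mel bank, and the function name are illustrative assumptions.

```python
import numpy as np

def mfcc_matrix(signal, sr=16000, frame_len=400, alpha=0.97,
                n_mels=26, n_ceps=12):
    """Sketch of steps S11-S14: signal -> per-frame MFCC feature matrix."""
    # S11a: pre-emphasis, the high-pass filter H(z) = 1 - alpha * z^-1
    emph = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # S11b: framing with an overlap region of half the frame length
    hop = frame_len // 2
    n_frames = max(1, 1 + (len(emph) - frame_len) // hop)
    frames = np.stack([emph[i*hop:i*hop+frame_len] for i in range(n_frames)])
    # S11c: windowing to soften frame-edge discontinuities (Gibbs effect)
    frames *= np.hamming(frame_len)
    # S12: power spectrum via the FFT of each windowed frame
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # S13: triangular Mel filter bank applied to the spectrum
    mel_pts = np.linspace(0.0, 2595 * np.log10(1 + (sr / 2) / 700), n_mels + 2)
    hz_pts = 700 * (10 ** (mel_pts / 2595) - 1)
    bins = np.floor((frame_len + 1) * hz_pts / sr).astype(int)
    fbank = np.zeros((n_mels, spec.shape[1]))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        for k in range(l, c):
            fbank[m - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[m - 1, k] = (r - k) / max(r - c, 1)
    mel_spec = np.log(spec @ fbank.T + 1e-10)
    # S14: cepstral analysis via DCT-II; keep the 2nd-13th coefficients
    n = np.arange(n_mels)
    dct_basis = np.cos(np.pi * np.outer(np.arange(1, n_ceps + 1), 2 * n + 1)
                       / (2 * n_mels))
    return mel_spec @ dct_basis.T      # feature data matrix (n_frames, 12)
```

The returned matrix is exactly the "feature data matrix" of the text: one row of 12 MFCC coefficients per frame.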
在一优选的实施例中,如图4所示,在上述图2的实施例的基础上,上述步骤S3包括:In a preferred embodiment, as shown in FIG. 4, based on the embodiment of FIG. 2 above, the above step S3 includes:
步骤S31,计算所述当前声纹鉴别向量与预存的该用户的标准声纹鉴别向量之间的余弦距离:

cos(A, B) = (A·B) / (‖A‖ ‖B‖)

其中,A为所述标准声纹鉴别向量,B为当前声纹鉴别向量;Step S31, calculate the cosine distance between the current voiceprint discrimination vector and the pre-stored standard voiceprint discrimination vector of the user: cos(A, B) = (A·B) / (‖A‖ ‖B‖), where A is the standard voiceprint discrimination vector and B is the current voiceprint discrimination vector;
步骤S32,若所述余弦距离小于或者等于预设的距离阈值,则生成验证通过的信息;Step S32, if the cosine distance is less than or equal to a preset distance threshold, generating verification pass information;
步骤S33,若所述余弦距离大于预设的距离阈值,则生成验证不通过的信息。Step S33: If the cosine distance is greater than a preset distance threshold, generate information that the verification fails.
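Steps S31 to S33 amount to a single threshold comparison. A minimal sketch, assuming the cosine distance is taken as 1 minus the cosine similarity (one common convention) and using an illustrative threshold of 0.4 (the patent leaves the threshold preset but unspecified):

```python
import numpy as np

def verify(current_vec, standard_vec, threshold=0.4):
    """Steps S31-S33: pass when the cosine distance between the current
    and stored voiceprint discrimination vectors is within the preset
    threshold. distance = 1 - cosine similarity is an assumed convention."""
    cos_sim = np.dot(current_vec, standard_vec) / (
        np.linalg.norm(current_vec) * np.linalg.norm(standard_vec))
    distance = 1.0 - cos_sim        # smaller = more similar
    return "pass" if distance <= threshold else "fail"
```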
在一优选的实施例中,在上述图2的实施例的基础上,上述的步骤S3替换为:计算所述当前声纹鉴别向量与预存的各标准声纹鉴别向量之间的空间距离,获取最小的空间距离,基于所述最小的空间距离对该用户进行身份验证,并生成验证结果。In a preferred embodiment, based on the embodiment of FIG. 2 above, step S3 is replaced by: calculating the spatial distance between the current voiceprint discrimination vector and each pre-stored standard voiceprint discrimination vector, obtaining the smallest spatial distance, authenticating the user based on the smallest spatial distance, and generating a verification result.
本实施例与图1的实施例不同的是,本实施例在存储标准声纹鉴别向量时并不携带用户的标识信息,在验证用户的身份时,计算当前声纹鉴别向量与预存的各标准声纹鉴别向量之间的空间距离,并取得最小的空间距离,如果该最小的空间距离小于预设的距离阈值(该距离阈值与上述实施例的距离阈值相同或者不同),则验证通过,否则验证失败。This embodiment differs from the embodiment of FIG. 1 in that the stored standard voiceprint discrimination vectors do not carry user identification information. When verifying the user's identity, the spatial distance between the current voiceprint discrimination vector and each pre-stored standard voiceprint discrimination vector is calculated and the smallest spatial distance is taken; if this smallest spatial distance is less than a preset distance threshold (which may be the same as or different from the distance threshold of the above embodiment), the verification passes; otherwise, the verification fails.
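The 1:N variant described above can be sketched the same way: compute the distance to every stored standard voiceprint discrimination vector and pass only when the minimum falls below the preset threshold. Cosine distance is assumed here, and the 0.4 threshold is illustrative; the patent also allows other spatial distances such as the Euclidean distance.

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity; one common convention for cosine distance."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def identify(current_vec, stored_vecs, threshold=0.4):
    """1:N variant: no user ID is supplied, so the distance to every stored
    standard vector is computed and the minimum is compared against the
    preset threshold."""
    min_dist = min(cosine_distance(current_vec, s) for s in stored_vecs)
    return bool(min_dist < threshold)
```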
请参阅图5,是本发明基于声纹识别的身份验证的系统10较佳实施例的功能模块图。在本实施例中,基于声纹识别的身份验证的系统10可以被分割成一个或多个模块,一个或者多个模块被存储于存储器中,并由一个或多个处理器所执行,以完成本发明。例如,在图5中,基于声纹识别的身份验证的系统10可以被分割成侦测模块21、识别模块22、复制模块23、安装模块24及启动模块25。本发明所称的模块是指能够完成特定功能的一系列计算机程序指令段,比程序更适合于描述基于声纹识别的身份验证的系统10在电子装置中的执行过程,其中:Please refer to FIG. 5, which is a functional block diagram of a preferred embodiment of the voiceprint recognition based identity verification system 10 of the present invention. In this embodiment, the voiceprint recognition based identity verification system 10 may be divided into one or more modules, which are stored in a memory and executed by one or more processors to implement the present invention. For example, in FIG. 5, the voiceprint recognition based identity verification system 10 may be divided into a detection module 21, an identification module 22, a replication module 23, an installation module 24, and a startup module 25. The term "module" as used in the present invention refers to a series of computer program instruction segments capable of performing a specific function, and is more suitable than a program for describing the execution of the voiceprint recognition based identity verification system 10 in an electronic device, wherein:
第一获取模块101,用于在接收到进行身份验证的用户的语音数据后,获取所述语音数据的声纹特征,并基于所述声纹特征构建对应的声纹特征向量;The first obtaining module 101 is configured to acquire a voiceprint feature of the voice data after receiving the voice data of the user who performs the identity verification, and construct a corresponding voiceprint feature vector based on the voiceprint feature;
本实施例中,语音数据由语音采集设备采集得到(语音采集设备例如为麦克风),语音采集设备将采集的语音数据发送给基于声纹识别的身份验证的系统。In this embodiment, the voice data is collected by the voice collection device (the voice collection device is, for example, a microphone), and the voice collection device sends the collected voice data to the voice recognition-based identity verification system.
在采集语音数据时,应尽量防止环境噪声和语音采集设备的干扰。语音采集设备与用户保持适当距离,且尽量不用失真大的语音采集设备,电源优选使用市电,并保持电流稳定;在进行电话录音时应使用传感器。在提取语音数据中的声纹特征之前,可以对语音数据进行去噪音处理,以进一步减少干扰。为了能够提取得到语音数据的声纹特征,所采集的语音数据为预设数据长度的语音数据,或者为大于预设数据长度的语音数据。When collecting voice data, interference from environmental noise and from the voice collection device should be avoided as far as possible. The voice collection device should be kept at an appropriate distance from the user, a device with large distortion should be avoided, mains power is preferably used with the current kept stable, and a sensor should be used when recording telephone calls. Before extracting the voiceprint features, the voice data may be denoised to further reduce interference. So that voiceprint features can be extracted, the collected voice data has the preset data length, or is longer than the preset data length.
声纹特征包括多种类型,例如宽带声纹、窄带声纹、振幅声纹等,本实施例的声纹特征优选地为语音数据的梅尔频率倒谱系数(Mel Frequency Cepstrum Coefficient,MFCC)。在构建对应的声纹特征向量时,将语音数据的声纹特征组成特征数据矩阵,该特征数据矩阵即为语音数据的声纹特征向量。Voiceprint features come in various types, such as wide-band voiceprints, narrow-band voiceprints, and amplitude voiceprints; the voiceprint feature of this embodiment is preferably the Mel Frequency Cepstrum Coefficient (MFCC) of the voice data. When constructing the corresponding voiceprint feature vector, the voiceprint features of the voice data are assembled into a feature data matrix, which is the voiceprint feature vector of the voice data.
构建模块102,用于将所述声纹特征向量输入预先训练生成的背景信道模型,以构建出所述语音数据对应的当前声纹鉴别向量;The constructing module 102 is configured to input the voiceprint feature vector into a background channel model generated by pre-training to construct a current voiceprint discrimination vector corresponding to the voice data;
其中,将声纹特征向量输入预先训练生成的背景信道模型,优选地,该背景信道模型为高斯混合模型,利用该背景信道模型来计算声纹特征向量,得出对应的当前声纹鉴别向量(即i-vector)。The voiceprint feature vector is input into the background channel model generated by pre-training; preferably, the background channel model is a Gaussian mixture model, which is used to compute on the voiceprint feature vector and obtain the corresponding current voiceprint discrimination vector (i.e., the i-vector).
具体地,该计算过程包括:Specifically, the calculation process includes:
1)、选择高斯模型:首先,利用通用背景信道模型中的参数来计算每帧数据在不同高斯模型的似然对数值,通过对似然对数值矩阵每列并行排序,选取前N个高斯模型,最终获得一每帧数据在混合高斯模型中数值的矩阵:1) Select the Gaussian model: First, use the parameters in the general background channel model to calculate the likelihood value of each frame of data in different Gaussian models. By sorting the columns of the likelihood logarithmic matrix in parallel, select the first N Gaussian models. Finally, a matrix of values per frame of data in the mixed Gaussian model is obtained:
Loglike=E(X)*D(X)-1*XT-0.5*D(X)-1*(X.2)TLoglike=E(X)*D(X) -1 *X T -0.5*D(X) -1 *(X. 2 ) T ,
其中,Loglike为似然对数值矩阵,E(X)为通用背景信道模型训练出来的均值矩阵,D(X)为协方差矩阵,X为数据矩阵,X.2为矩阵每个值取平方。Among them, Loglike is a likelihood logarithmic matrix, E(X) is a mean matrix trained by a general background channel model, D(X) is a covariance matrix, X is a data matrix, and X. 2 is a square of each value of the matrix.
2)、计算后验概率:将每帧数据X进行X*X^T计算,得到一个对称矩阵,可简化为下三角矩阵,并将元素按顺序排列为1行,变成一个N帧乘以该下三角矩阵元素个数纬度的一个向量进行计算,将所有帧的该向量组合成新的数据矩阵,同时将通用背景模型中计算概率的协方差矩阵,每个矩阵也简化为下三角矩阵,变成与新数据矩阵类似的矩阵,再通过通用背景信道模型中的均值矩阵和协方差矩阵算出每帧数据在该选择的高斯模型下的似然对数值,然后进行Softmax回归,最后进行归一化操作,得到每帧在混合高斯模型下的后验概率分布,将每帧的概率分布向量组成概率矩阵。2) Compute posterior probabilities: for each frame of data X, compute X*X^T to obtain a symmetric matrix, which can be simplified to its lower triangular part; its elements are arranged in order into one row, giving, over the N frames, a matrix whose second dimension is the number of lower-triangular elements. The vectors of all frames are combined into this new data matrix. The covariance matrices used for probability computation in the universal background model are likewise each simplified to a lower triangular matrix, forming a matrix similar to the new data matrix. The log-likelihood of each frame under the selected Gaussian components is then computed from the mean matrix and covariance matrix of the universal background channel model, followed by softmax regression and finally normalization, yielding each frame's posterior probability distribution over the Gaussian mixture model; the per-frame probability distribution vectors form the probability matrix.
3)、提取当前声纹鉴别向量:首先进行一阶,二阶系数的计算,一阶系数计算可以通过概率矩阵列求和得到:3) Extract the current voiceprint discrimination vector: firstly calculate the first-order and second-order coefficients, and the first-order coefficient calculation can be obtained by summing the probability matrix:
Gamma_i = Σ_j loglikes_ji

其中,Gamma_i为一阶系数向量的第i个元素,loglikes_ji为概率矩阵第j行的第i个元素。Among them, Gamma_i is the i-th element of the first-order coefficient vector, and loglikes_ji is the i-th element of the j-th row of the probability matrix.
二阶系数可以通过概率矩阵的转置乘以数据矩阵获得:The second-order coefficients can be obtained by multiplying the transposition of the probability matrix by the data matrix:
X=LoglikeT*feats,其中,X为二阶系数矩阵,loglike为概率矩阵,feats为特征数据矩阵。X=Loglike T *feats, where X is a second-order coefficient matrix, loglike is a probability matrix, and feats is a feature data matrix.
在计算得到一阶,二阶系数以后,并行计算一次项和二次项,然后通过一次项和二次项计算当前声纹鉴别向量。After the first-order and second-order coefficients are calculated, the primary term and the quadratic term are calculated in parallel, and then the current voiceprint discrimination vector is calculated by the primary term and the quadratic term.
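Steps 1) through 3) above can be sketched in matrix form: the frame-dependent part of the per-Gaussian log-likelihoods, top-N Gaussian selection, and the first-/second-order coefficients Gamma_i = Σ_j loglikes_ji and X = Loglike^T · feats. Diagonal covariances are assumed, and constant terms of the Gaussian density are omitted, as in the Loglike formula above.

```python
import numpy as np

def loglike_matrix(X, means, variances):
    """Frame-dependent part of the diagonal-covariance Gaussian log-likelihoods:
    Loglike = E(X)*D(X)^-1*X^T - 0.5*D(X)^-1*(X.^2)^T (constants omitted)."""
    inv_var = 1.0 / variances                     # D(X)^-1, shape (K, D)
    ll = (means * inv_var) @ X.T - 0.5 * inv_var @ (X ** 2).T
    return ll.T                                   # (N frames, K Gaussians)

def top_n_gaussians(loglikes, n=5):
    """Step 1): keep the indices of the N best-scoring Gaussians per frame."""
    return np.argsort(loglikes, axis=1)[:, ::-1][:, :n]

def sufficient_stats(posteriors, feats):
    """Step 3): 'first-order' coefficients Gamma_i = sum_j loglikes_ji
    (column sums of the per-frame posterior matrix) and 'second-order'
    coefficients X = Loglike^T * feats."""
    gamma = posteriors.sum(axis=0)                # (K,)
    second = posteriors.T @ feats                 # (K, D)
    return gamma, second
```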
优选地,背景信道模型为高斯混合模型,基于声纹识别的身份验证的系统还包括:Preferably, the background channel model is a Gaussian mixture model, and the voiceprint recognition based authentication system further comprises:
第二获取模块,用于获取预设数量的语音数据样本,并获取各语音数据样本对应的声纹特征,并基于各语音数据样本对应的声纹特征构建各语音数据样本对应的声纹特征向量;a second acquisition module, configured to acquire a preset number of voice data samples, acquire the voiceprint features corresponding to each voice data sample, and construct the voiceprint feature vector corresponding to each voice data sample based on those voiceprint features;
划分模块,用于将各语音数据样本对应的声纹特征向量分为第一比例的训练集和第二比例的验证集,所述第一比例及第二比例的和小于等于1;a dividing module, configured to divide the voiceprint feature vector corresponding to each voice data sample into a training set of a first ratio and a verification set of a second ratio, wherein a sum of the first ratio and the second ratio is less than or equal to 1;
训练模块,用于利用所述训练集中的声纹特征向量对高斯混合模型进行训练,并在训练完成后,利用所述验证集对训练后的高斯混合模型的准确率进行验证;a training module is configured to train the Gaussian mixture model by using the voiceprint feature vector in the training set, and after the training is completed, verify the accuracy of the trained Gaussian mixture model by using the verification set;
处理模块,用于若所述准确率大于预设阈值,则模型训练结束,以训练后的高斯混合模型作为所述背景信道模型,或者,若所述准确率小于等于预设阈值,则增加所述语音数据样本的数量,并基于增加后的语音数据样本重新进行训练。a processing module, configured to: if the accuracy is greater than a preset threshold, end the model training and use the trained Gaussian mixture model as the background channel model; or, if the accuracy is less than or equal to the preset threshold, increase the number of voice data samples and retrain based on the increased voice data samples.
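The division module's split into a first-proportion training set and a second-proportion validation set, with the two proportions summing to at most 1, might look like the following; the 0.7/0.2 ratios are illustrative assumptions.

```python
import numpy as np

def split_samples(samples, train_ratio=0.7, valid_ratio=0.2, seed=0):
    """Divide per-sample voiceprint feature vectors into a training set of a
    first proportion and a validation set of a second proportion."""
    assert train_ratio + valid_ratio <= 1.0
    idx = np.random.default_rng(seed).permutation(len(samples))
    n_train = int(len(samples) * train_ratio)
    n_valid = int(len(samples) * valid_ratio)
    train = [samples[i] for i in idx[:n_train]]
    valid = [samples[i] for i in idx[n_train:n_train + n_valid]]
    return train, valid
```

If the validation accuracy is at or below the preset threshold, more samples are collected and the split and training are repeated, as the processing module describes.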
其中,在利用训练集中的声纹特征向量对高斯混合模型进行训练时,抽取出来的D维声纹特征对应的似然概率可用K个高斯分量表示为:When the Gaussian mixture model is trained by using the voiceprint feature vector in the training set, the likelihood probability corresponding to the extracted D-dimensional voiceprint feature can be expressed by K Gaussian components:
P(x) = Σ_{k=1}^{K} w_k · p(x|k)

其中,P(x)为语音数据样本由高斯混合模型生成的概率(混合高斯模型),w_k为每个高斯模型的权重,p(x|k)为样本由第k个高斯模型生成的概率,K为高斯模型数量。Where P(x) is the probability that a voice data sample is generated by the Gaussian mixture model, w_k is the weight of each Gaussian component, p(x|k) is the probability that the sample is generated by the k-th Gaussian component, and K is the number of Gaussian components.
整个高斯混合模型的参数可以表示为:{wiii},wi为第i个高斯模型的权重,μi为第i个高斯模型的均值,∑i为第i个高斯模型的协方差。训练该高斯混合模型可以用非监督的EM算法。训练完成后,得到高斯混合模型的权重向量、常数向量、N个协方差矩阵、均值乘以协方差的矩阵等,即为一个训练后的高斯混合模型。The parameters of the entire Gaussian mixture model can be expressed as: {w i , μ i , Σ i }, w i is the weight of the i-th Gaussian model, μ i is the mean of the i-th Gaussian model, and ∑ i is the i-th Gaussian The covariance of the model. Training the Gaussian mixture model can use an unsupervised EM algorithm. After the training is completed, the Gaussian mixture model weight vector, constant vector, N covariance matrix, and the mean multiplied by the covariance matrix are obtained, which is a trained Gaussian mixture model.
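The unsupervised EM training of the Gaussian mixture parameters {w_i, μ_i, Σ_i} described above can be sketched as follows, assuming diagonal covariances. A real universal background model uses far more components, iterations, and data; this is a minimal illustration.

```python
import numpy as np

def train_gmm(X, K=4, iters=20, seed=0):
    """Minimal unsupervised EM for a diagonal-covariance Gaussian mixture.
    Returns the weights w, means mu, and diagonal variances var."""
    rng = np.random.default_rng(seed)
    N, D = X.shape
    w = np.full(K, 1.0 / K)
    mu = X[rng.choice(N, K, replace=False)]          # init means from data
    var = np.var(X, axis=0) * np.ones((K, D)) + 1e-6
    for _ in range(iters):
        # E-step: per-frame responsibilities from log densities
        ll = -0.5 * (((X[:, None, :] - mu) ** 2) / var
                     + np.log(2 * np.pi * var)).sum(-1)
        ll += np.log(w)
        ll -= ll.max(axis=1, keepdims=True)          # numerical stability
        resp = np.exp(ll)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, variances
        nk = resp.sum(axis=0) + 1e-10
        w = nk / N
        mu = (resp.T @ X) / nk[:, None]
        var = (resp.T @ (X ** 2)) / nk[:, None] - mu ** 2 + 1e-6
    return w, mu, var
```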
第一验证模块103,用于计算所述当前声纹鉴别向量与预存的该用户的标准声纹鉴别向量之间的空间距离,基于所述距离对该用户进行身份验证,并生成验证结果。The first verification module 103 is configured to calculate a spatial distance between the current voiceprint discrimination vector and the pre-stored standard voiceprint discrimination vector of the user, perform identity verification on the user based on the distance, and generate a verification result.
向量与向量之间的距离有多种,包括余弦距离及欧氏距离等等,优选地,本实施例的空间距离为余弦距离,余弦距离为利用向量空间中两个向量夹角的余弦值作为衡量两个个体间差异的大小的度量。There are various distances between the vector and the vector, including the cosine distance and the Euclidean distance, etc. Preferably, the spatial distance of the present embodiment is a cosine distance, and the cosine distance is a cosine value of the angle between two vectors in the vector space. A measure of the magnitude of the difference between two individuals.
其中,标准声纹鉴别向量为预先获得并存储的声纹鉴别向量,标准声纹鉴别向量在存储时携带其对应的用户的标识信息,其能够准确代表对应的用户的身份。在计算空间距离前,根据用户提供的标识信息获得存储的声纹鉴别向量。The standard voiceprint discriminant vector is a voiceprint discriminant vector obtained and stored in advance, and the standard voiceprint discriminant vector carries the identifier information of the corresponding user when stored, which can accurately represent the identity of the corresponding user. The stored voiceprint discrimination vector is obtained according to the identification information provided by the user before calculating the spatial distance.
其中,在计算得到的空间距离小于等于预设距离阈值时,验证通过,反之,则验证失败。Wherein, when the calculated spatial distance is less than or equal to the preset distance threshold, the verification passes, and vice versa, the verification fails.
在一优选的实施例中,在上述图5的实施例的基础上,上述第一获取模块101具体用于对所述语音数据进行预加重、分帧和加窗处理;对每一个加窗进行傅立叶变换得到对应的频谱;将所述频谱输入梅尔滤波器以输出得到梅尔频谱;在梅尔频谱上面进行倒谱分析以获得梅尔频率倒谱系数MFCC,基于所述梅尔频率倒谱系数MFCC组成对应的声纹特征向量。In a preferred embodiment, based on the embodiment of FIG. 5 above, the first acquisition module 101 is specifically configured to: perform pre-emphasis, framing, and windowing on the voice data; perform a Fourier transform on each windowed frame to obtain the corresponding spectrum; input the spectrum into a Mel filter to output the Mel spectrum; and perform cepstral analysis on the Mel spectrum to obtain the Mel frequency cepstral coefficients MFCC, composing the corresponding voiceprint feature vector from the MFCC.
其中,预加重处理实际是高通滤波处理,滤除低频数据,使得语音数据中的高频特性更加突显,具体地,高通滤波的传递函数为:H(Z)=1-αZ-1,其中,Z为语音数据,α为常量系数,优选地,α的取值为0.97;由于声音信号只在较短时间内呈现平稳性,因此将一段声音信号分成N段短时间的信号(即N帧),且为了避免声音的连续性特征丢失,相邻帧之间有一段重复区域,重复区域一般为每帧长的1/2;在对语音数据进行分帧后,每一帧信号都当成平稳信号来处理,但吉布斯效应的存在,语音数据的起始帧和结束帧是不连续的,在分帧之后,更加背离原始语音,因此,需要对语音数据进行加窗处理。The pre-emphasis processing is actually a high-pass filtering process, filtering out the low-frequency data, so that the high-frequency characteristics in the speech data are more prominent. Specifically, the transfer function of the high-pass filter is: H(Z)=1-αZ -1 , wherein Z is voice data, α is a constant coefficient, preferably, the value of α is 0.97; since the sound signal exhibits smoothness only in a short time, a sound signal is divided into N short-time signals (ie, N frames). In order to avoid the loss of the continuity feature of the sound, there is a repeating area between adjacent frames, and the repeating area is generally 1/2 of the length of each frame; after the framed speech data, each frame signal is regarded as a stationary signal. To deal with, but the existence of the Gibbs effect, the start frame and the end frame of the speech data are discontinuous, and after the framing, the original speech is further deviated. Therefore, the voice data needs to be windowed.
其中,倒谱分析例如为取对数、做逆变换,逆变换一般是通过DCT离散余弦变换来实现,取DCT后的第2个到第13个系数作为MFCC系数。梅尔频率倒谱系数MFCC即为这帧语音数据的声纹特征,将每帧的梅尔频率倒谱系数MFCC组成特征数据矩阵,该特征数据矩阵即为语音数据的声纹特征向量。The cepstrum analysis is, for example, taking logarithm and inverse transform. The inverse transform is generally implemented by DCT discrete cosine transform, and the second to thirteenth coefficients after DCT are taken as MFCC coefficients. The Mel frequency cepstrum coefficient MFCC is the voiceprint feature of the speech data of this frame, and the Mel frequency cepstral coefficient MFCC of each frame is composed into a feature data matrix, which is the voiceprint feature vector of the speech data.
在一优选的实施例中,在上述图5的实施例的基础上,所述第一验证模块103具体用于计算所述当前声纹鉴别向量与预存的该用户的标准声纹鉴别向量之间的余弦距离:

cos(A, B) = (A·B) / (‖A‖ ‖B‖)

其中,A为所述标准声纹鉴别向量,B为当前声纹鉴别向量;若所述余弦距离小于或者等于预设的距离阈值,则生成验证通过的信息;若所述余弦距离大于预设的距离阈值,则生成验证不通过的信息。In a preferred embodiment, based on the embodiment of FIG. 5 above, the first verification module 103 is specifically configured to calculate the cosine distance between the current voiceprint discrimination vector and the pre-stored standard voiceprint discrimination vector of the user: cos(A, B) = (A·B) / (‖A‖ ‖B‖), where A is the standard voiceprint discrimination vector and B is the current voiceprint discrimination vector; if the cosine distance is less than or equal to a preset distance threshold, information that the verification passes is generated; if the cosine distance is greater than the preset distance threshold, information that the verification fails is generated.
在一优选的实施例中,在上述图5的实施例的基础上,上述的第一验证模块替换为第二验证模块,用于计算所述当前声纹鉴别向量与预存的各标准声纹鉴别向量之间的空间距离,获取最小的空间距离,基于所述最小的空间距离对该用户进行身份验证,并生成验证结果。In a preferred embodiment, based on the embodiment of FIG. 5 above, the first verification module is replaced by a second verification module, configured to calculate the spatial distance between the current voiceprint discrimination vector and each pre-stored standard voiceprint discrimination vector, obtain the smallest spatial distance, authenticate the user based on the smallest spatial distance, and generate a verification result.
本实施例与图5的实施例不同的是,本实施例在存储标准声纹鉴别向量时并不携带用户的标识信息,在验证用户的身份时,计算当前声纹鉴别向量与预存的各标准声纹鉴别向量之间的空间距离,并取得最小的空间距离,如果该最小的空间距离小于预设的距离阈值(该距离阈值与上述实施例的距离阈值相同或者不同),则验证通过,否则验证失败。This embodiment differs from the embodiment of FIG. 5 in that the stored standard voiceprint discrimination vectors do not carry user identification information. When verifying the user's identity, the spatial distance between the current voiceprint discrimination vector and each pre-stored standard voiceprint discrimination vector is calculated and the smallest spatial distance is taken; if this smallest spatial distance is less than a preset distance threshold (which may be the same as or different from the distance threshold of the above embodiment), the verification passes; otherwise, the verification fails.
以上所述仅为本发明的较佳实施例,并不用以限制本发明,凡在本发明的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本发明的保护范围之内。 The above are only the preferred embodiments of the present invention, and are not intended to limit the present invention. Any modifications, equivalents, improvements, etc., which are within the spirit and scope of the present invention, should be included in the protection of the present invention. Within the scope.

Claims (20)

  1. 一种基于声纹识别的身份验证的方法,其特征在于,所述基于声纹识别的身份验证的方法包括:A method for identity verification based on voiceprint recognition, characterized in that the method for authenticating voiceprint recognition based authentication comprises:
    S1,在接收到进行身份验证的用户的语音数据后,获取所述语音数据的声纹特征,并基于所述声纹特征构建对应的声纹特征向量;S1, after receiving the voice data of the user who performs the authentication, acquiring the voiceprint feature of the voice data, and constructing a corresponding voiceprint feature vector based on the voiceprint feature;
    S2,将所述声纹特征向量输入预先训练生成的背景信道模型,以构建出所述语音数据对应的当前声纹鉴别向量;S2, input the voiceprint feature vector into a background channel model generated by pre-training to construct a current voiceprint discrimination vector corresponding to the voice data;
    S3,计算所述当前声纹鉴别向量与预存的该用户的标准声纹鉴别向量之间的空间距离,基于所述距离对该用户进行身份验证,并生成验证结果。S3. Calculate a spatial distance between the current voiceprint discrimination vector and a pre-stored standard voiceprint discrimination vector of the user, perform identity verification on the user based on the distance, and generate a verification result.
  2. 根据权利要求1所述的基于声纹识别的身份验证的方法,其特征在于,所述背景信道模型为高斯混合模型,所述步骤S1之前包括:The voiceprint recognition-based authentication method according to claim 1, wherein the background channel model is a Gaussian mixture model, and the step S1 comprises:
    获取预设数量的语音数据样本,并获取各语音数据样本对应的声纹特征,并基于各语音数据样本对应的声纹特征构建各语音数据样本对应的声纹特征向量;Obtaining a preset number of voice data samples, and acquiring voiceprint features corresponding to each voice data sample, and constructing a voiceprint feature vector corresponding to each voice data sample based on voiceprint features corresponding to each voice data sample;
    将各语音数据样本对应的声纹特征向量分为第一比例的训练集和第二比例的验证集,所述第一比例及第二比例的和小于等于1;The voiceprint feature vector corresponding to each voice data sample is divided into a training set of a first ratio and a verification set of a second ratio, wherein a sum of the first ratio and the second ratio is less than or equal to 1;
    利用所述训练集中的声纹特征向量对高斯混合模型进行训练,并在训练完成后,利用所述验证集对训练后的高斯混合模型的准确率进行验证;The Gaussian mixture model is trained by using the voiceprint feature vector in the training set, and after the training is completed, the accuracy of the trained Gaussian mixture model is verified by using the verification set;
    若所述准确率大于预设阈值,则模型训练结束,以训练后的高斯混合模型作为所述步骤S2的背景信道模型,或者,若所述准确率小于等于预设阈值,则增加所述语音数据样本的数量,并基于增加后的语音数据样本重新进行训练。If the accuracy is greater than the preset threshold, the model training ends, and the trained Gaussian mixture model is used as the background channel model of step S2; or, if the accuracy is less than or equal to the preset threshold, the number of voice data samples is increased and training is performed again based on the increased voice data samples.
  3. 根据权利要求1所述的基于声纹识别的身份验证的方法,其特征在于,所述步骤S3替换为:计算所述当前声纹鉴别向量与预存的各标准声纹鉴别向量之间的空间距离,获取最小的空间距离,基于所述最小的空间距离对该用户进行身份验证,并生成验证结果。The method according to claim 1, wherein the step S3 is replaced by: calculating a spatial distance between the current voiceprint discrimination vector and each of the pre-stored standard voiceprint discrimination vectors. Obtaining a minimum spatial distance, authenticating the user based on the minimum spatial distance, and generating a verification result.
  4. 根据权利要求1所述的基于声纹识别的身份验证的方法,其特征在于,所述步骤S1包括:The voiceprint recognition based authentication method according to claim 1, wherein the step S1 comprises:
    S11,对所述语音数据进行预加重、分帧和加窗处理;S11: Perform pre-emphasis, framing, and windowing on the voice data.
    S12,对每一个加窗进行傅立叶变换得到对应的频谱;S12, performing Fourier transform on each window to obtain a corresponding spectrum;
    S13,将所述频谱输入梅尔滤波器以输出得到梅尔频谱;S13, input the spectrum into a Mel filter to output the Mel spectrum;
    S14,在梅尔频谱上面进行倒谱分析以获得梅尔频率倒谱系数MFCC,基于所述梅尔频率倒谱系数MFCC组成对应的声纹特征向量。S14, performing cepstrum analysis on the Mel spectrum to obtain a Mel frequency cepstral coefficient MFCC, and composing a corresponding voiceprint feature vector based on the Mel frequency cepstral coefficient MFCC.
  5. 根据权利要求4所述的基于声纹识别的身份验证的方法,其特征在于,所述背景信道模型为高斯混合模型,所述步骤S1之前包括:The voiceprint recognition based authentication method according to claim 4, wherein the background channel model is a Gaussian mixture model, and the step S1 comprises:
    获取预设数量的语音数据样本,并获取各语音数据样本对应的声纹特征,并基于各语音数据样本对应的声纹特征构建各语音数据样本对应的声纹特征向量; Obtaining a preset number of voice data samples, and acquiring voiceprint features corresponding to each voice data sample, and constructing a voiceprint feature vector corresponding to each voice data sample based on voiceprint features corresponding to each voice data sample;
    将各语音数据样本对应的声纹特征向量分为第一比例的训练集和第二比例的验证集,所述第一比例及第二比例的和小于等于1;The voiceprint feature vector corresponding to each voice data sample is divided into a training set of a first ratio and a verification set of a second ratio, wherein a sum of the first ratio and the second ratio is less than or equal to 1;
    利用所述训练集中的声纹特征向量对高斯混合模型进行训练,并在训练完成后,利用所述验证集对训练后的高斯混合模型的准确率进行验证;The Gaussian mixture model is trained by using the voiceprint feature vector in the training set, and after the training is completed, the accuracy of the trained Gaussian mixture model is verified by using the verification set;
    若所述准确率大于预设阈值,则模型训练结束,以训练后的高斯混合模型作为所述步骤S2的背景信道模型,或者,若所述准确率小于等于预设阈值,则增加所述语音数据样本的数量,并基于增加后的语音数据样本重新进行训练。If the accuracy is greater than the preset threshold, the model training ends, and the trained Gaussian mixture model is used as the background channel model of step S2; or, if the accuracy is less than or equal to the preset threshold, the number of voice data samples is increased and training is performed again based on the increased voice data samples.
  6. 根据权利要求4所述的基于声纹识别的身份验证的方法,其特征在于,所述步骤S3替换为:计算所述当前声纹鉴别向量与预存的各标准声纹鉴别向量之间的空间距离,获取最小的空间距离,基于所述最小的空间距离对该用户进行身份验证,并生成验证结果。The method according to claim 4, wherein the step S3 is replaced by: calculating a spatial distance between the current voiceprint discrimination vector and each of the pre-stored standard voiceprint discrimination vectors. Obtaining a minimum spatial distance, authenticating the user based on the minimum spatial distance, and generating a verification result.
  7. 根据权利要求1所述的基于声纹识别的身份验证的方法,其特征在于,所述步骤S3包括:The voiceprint recognition based authentication method according to claim 1, wherein the step S3 comprises:
    S31,计算所述当前声纹鉴别向量与预存的该用户的标准声纹鉴别向量之间的余弦距离:cos(A, B) = (A·B) / (‖A‖ ‖B‖),其中,A为所述标准声纹鉴别向量,B为当前声纹鉴别向量;S31. Calculate the cosine distance between the current voiceprint discrimination vector and the pre-stored standard voiceprint discrimination vector of the user: cos(A, B) = (A·B) / (‖A‖ ‖B‖), where A is the standard voiceprint discrimination vector and B is the current voiceprint discrimination vector;
    S32,若所述余弦距离小于或者等于预设的距离阈值,则生成验证通过的信息;S32. If the cosine distance is less than or equal to a preset distance threshold, information that the verification passes is generated;
    S33,若所述余弦距离大于预设的距离阈值,则生成验证不通过的信息。S33. If the cosine distance is greater than a preset distance threshold, generate information that the verification fails.
  8. 根据权利要求7所述的基于声纹识别的身份验证的方法,其特征在于,所述背景信道模型为高斯混合模型,所述步骤S1之前包括:The voiceprint recognition-based authentication method according to claim 7, wherein the background channel model is a Gaussian mixture model, and the step S1 comprises:
    获取预设数量的语音数据样本,并获取各语音数据样本对应的声纹特征,并基于各语音数据样本对应的声纹特征构建各语音数据样本对应的声纹特征向量;Obtaining a preset number of voice data samples, and acquiring voiceprint features corresponding to each voice data sample, and constructing a voiceprint feature vector corresponding to each voice data sample based on voiceprint features corresponding to each voice data sample;
    将各语音数据样本对应的声纹特征向量分为第一比例的训练集和第二比例的验证集,所述第一比例及第二比例的和小于等于1;The voiceprint feature vector corresponding to each voice data sample is divided into a training set of a first ratio and a verification set of a second ratio, wherein a sum of the first ratio and the second ratio is less than or equal to 1;
    利用所述训练集中的声纹特征向量对高斯混合模型进行训练,并在训练完成后,利用所述验证集对训练后的高斯混合模型的准确率进行验证;The Gaussian mixture model is trained by using the voiceprint feature vector in the training set, and after the training is completed, the accuracy of the trained Gaussian mixture model is verified by using the verification set;
    若所述准确率大于预设阈值,则模型训练结束,以训练后的高斯混合模型作为所述步骤S2的背景信道模型,或者,若所述准确率小于等于预设阈值,则增加所述语音数据样本的数量,并基于增加后的语音数据样本重新进行训练。If the accuracy is greater than the preset threshold, the model training ends, and the trained Gaussian mixture model is used as the background channel model of step S2; or, if the accuracy is less than or equal to the preset threshold, the number of voice data samples is increased and training is performed again based on the increased voice data samples.
  9. 一种电子装置,其特征在于,包括处理设备、存储设备及基于声纹识别的身份验证的系统,该基于声纹识别的身份验证的系统存储于该存储设备中,包括至少一个计算机可读指令,该至少一个计算机可读指令可被所述处理设备执行,以实现以下操作: An electronic device, comprising: a processing device, a storage device, and a voiceprint recognition based authentication system, wherein the voiceprint recognition based authentication system is stored in the storage device, including at least one computer readable instruction The at least one computer readable instruction is executable by the processing device to:
    S1,在接收到进行身份验证的用户的语音数据后,获取所述语音数据的声纹特征,并基于所述声纹特征构建对应的声纹特征向量;S1, after receiving the voice data of the user who performs the authentication, acquiring the voiceprint feature of the voice data, and constructing a corresponding voiceprint feature vector based on the voiceprint feature;
    S2,将所述声纹特征向量输入预先训练生成的背景信道模型,以构建出所述语音数据对应的当前声纹鉴别向量;S2, input the voiceprint feature vector into a background channel model generated by pre-training to construct a current voiceprint discrimination vector corresponding to the voice data;
    S3,计算所述当前声纹鉴别向量与预存的该用户的标准声纹鉴别向量之间的空间距离,基于所述距离对该用户进行身份验证,并生成验证结果。S3. Calculate a spatial distance between the current voiceprint discrimination vector and a pre-stored standard voiceprint discrimination vector of the user, perform identity verification on the user based on the distance, and generate a verification result.
  10. 根据权利要求9所述的电子装置,其特征在于,所述背景信道模型为高斯混合模型,于所述步骤S1之前,所述至少一个计算机可读指令还可被处理设备执行,以实现以下操作:The electronic device according to claim 9, wherein the background channel model is a Gaussian mixture model, and the at least one computer readable instruction is further executable by the processing device to perform the following operations before the step S1 :
    获取预设数量的语音数据样本,并获取各语音数据样本对应的声纹特征,并基于各语音数据样本对应的声纹特征构建各语音数据样本对应的声纹特征向量;Obtaining a preset number of voice data samples, and acquiring voiceprint features corresponding to each voice data sample, and constructing a voiceprint feature vector corresponding to each voice data sample based on voiceprint features corresponding to each voice data sample;
    将各语音数据样本对应的声纹特征向量分为第一比例的训练集和第二比例的验证集,所述第一比例及第二比例的和小于等于1;The voiceprint feature vector corresponding to each voice data sample is divided into a training set of a first ratio and a verification set of a second ratio, wherein a sum of the first ratio and the second ratio is less than or equal to 1;
    利用所述训练集中的声纹特征向量对高斯混合模型进行训练,并在训练完成后,利用所述验证集对训练后的高斯混合模型的准确率进行验证;The Gaussian mixture model is trained by using the voiceprint feature vector in the training set, and after the training is completed, the accuracy of the trained Gaussian mixture model is verified by using the verification set;
    若所述准确率大于预设阈值,则模型训练结束,以训练后的高斯混合模型作为所述步骤S2的背景信道模型,或者,若所述准确率小于等于预设阈值,则增加所述语音数据样本的数量,并基于增加后的语音数据样本重新进行训练。If the accuracy is greater than the preset threshold, the model training ends, and the trained Gaussian mixture model is used as the background channel model of step S2; or, if the accuracy is less than or equal to the preset threshold, the number of voice data samples is increased and training is performed again based on the increased voice data samples.
  11. 根据权利要求9所述的电子装置,其特征在于,所述步骤S3替换为:计算所述当前声纹鉴别向量与预存的各标准声纹鉴别向量之间的空间距离,获取最小的空间距离,基于所述最小的空间距离对该用户进行身份验证,并生成验证结果。The electronic device according to claim 9, wherein the step S3 is replaced by: calculating a spatial distance between the current voiceprint discrimination vector and each of the pre-stored standard voiceprint discrimination vectors, and obtaining a minimum spatial distance, The user is authenticated based on the minimum spatial distance and a verification result is generated.
  12. The electronic device according to claim 9, wherein step S1 comprises:
    S11, performing pre-emphasis, framing, and windowing on the voice data;
    S12, performing a Fourier transform on each windowed frame to obtain a corresponding spectrum;
    S13, inputting the spectrum into a Mel filter bank to output a Mel spectrum;
    S14, performing cepstral analysis on the Mel spectrum to obtain Mel-frequency cepstral coefficients (MFCCs), and composing the corresponding voiceprint feature vector from the MFCCs.
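Steps S11–S14 can be sketched as a compact numpy implementation. The frame size, hop, 26-filter Mel bank, and 13 retained coefficients are common illustrative defaults, not values fixed by the claims.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_filters=26, n_mfcc=13):
    # S11: pre-emphasis, then split into overlapping frames and window them.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)

    # S12: Fourier transform of each windowed frame -> power spectrum.
    n_fft = 512
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # S13: triangular Mel filter bank applied to the power spectrum.
    mel_pts = np.linspace(0, 2595 * np.log10(1 + (sr / 2) / 700), n_filters + 2)
    hz_pts = 700 * (10 ** (mel_pts / 2595) - 1)
    bins = np.floor((n_fft + 1) * hz_pts / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    mel_spec = np.log(power @ fbank.T + 1e-10)

    # S14: cepstral analysis (DCT of the log Mel spectrum), keep n_mfcc coeffs.
    return dct(mel_spec, type=2, axis=1, norm="ortho")[:, :n_mfcc]

features = mfcc(np.random.default_rng(0).normal(size=16000))  # 1 s of noise
```

Each row of `features` is the MFCC vector of one frame; stacking or pooling these rows yields the per-utterance voiceprint feature vector the claim refers to.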
  13. The electronic device according to claim 12, wherein the background channel model is a Gaussian mixture model, and before step S1 the at least one computer-readable instruction is further executable by the processing device to perform the following operations:
    obtaining a preset number of voice data samples, acquiring the voiceprint features corresponding to each voice data sample, and constructing, based on the voiceprint features corresponding to each voice data sample, a voiceprint feature vector corresponding to that sample;
    dividing the voiceprint feature vectors corresponding to the voice data samples into a training set of a first proportion and a validation set of a second proportion, a sum of the first proportion and the second proportion being less than or equal to 1;
    training the Gaussian mixture model with the voiceprint feature vectors in the training set and, after training is complete, verifying the accuracy of the trained Gaussian mixture model with the validation set; and
    if the accuracy is greater than a preset threshold, ending model training and using the trained Gaussian mixture model as the background channel model of step S2, or, if the accuracy is less than or equal to the preset threshold, increasing the number of voice data samples and retraining based on the increased voice data samples.
  14. The electronic device according to claim 12, wherein step S3 is replaced by: calculating the spatial distance between the current voiceprint discrimination vector and each pre-stored standard voiceprint discrimination vector, obtaining the minimum spatial distance, authenticating the user based on the minimum spatial distance, and generating a verification result.
  15. The electronic device according to claim 9, wherein step S3 comprises:
    S31, calculating the cosine distance between the current voiceprint discrimination vector and the pre-stored standard voiceprint discrimination vector of the user, where
    Figure PCTCN2017091361-appb-100003
    is the standard voiceprint discrimination vector and
    Figure PCTCN2017091361-appb-100004
    is the current voiceprint discrimination vector;
    S32, if the cosine distance is less than or equal to a preset distance threshold, generating information that the verification passes;
    S33, if the cosine distance is greater than the preset distance threshold, generating information that the verification fails.
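Steps S31–S33 amount to a thresholded cosine comparison. A minimal sketch, in which taking the cosine distance as 1 − cosine similarity and the 0.3 threshold are illustrative assumptions:

```python
import numpy as np

def verify(current, standard, threshold=0.3):
    # S31: cosine distance between the current and stored discrimination vectors.
    cos_sim = np.dot(current, standard) / (
        np.linalg.norm(current) * np.linalg.norm(standard))
    distance = 1.0 - cos_sim
    # S32 / S33: pass iff the distance does not exceed the preset threshold.
    return distance <= threshold

standard = np.array([0.2, 0.9, -0.1, 0.4])
assert verify(standard + 0.01, standard)   # near-identical vector passes
assert not verify(-standard, standard)     # opposite vector fails
```

Because cosine distance ignores vector magnitude, this comparison is insensitive to overall loudness differences between the enrollment and probe utterances, which is one reason it is a common choice for i-vector style speaker scoring.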
  16. The electronic device according to claim 15, wherein the background channel model is a Gaussian mixture model, and before step S1 the at least one computer-readable instruction is further executable by the processing device to perform the following operations:
    obtaining a preset number of voice data samples, acquiring the voiceprint features corresponding to each voice data sample, and constructing, based on the voiceprint features corresponding to each voice data sample, a voiceprint feature vector corresponding to that sample;
    dividing the voiceprint feature vectors corresponding to the voice data samples into a training set of a first proportion and a validation set of a second proportion, a sum of the first proportion and the second proportion being less than or equal to 1;
    training the Gaussian mixture model with the voiceprint feature vectors in the training set and, after training is complete, verifying the accuracy of the trained Gaussian mixture model with the validation set; and
    if the accuracy is greater than a preset threshold, ending model training and using the trained Gaussian mixture model as the background channel model of step S2, or, if the accuracy is less than or equal to the preset threshold, increasing the number of voice data samples and retraining based on the increased voice data samples.
  17. A computer-readable storage medium having stored thereon at least one computer-readable instruction executable by a processing device to perform the following operations:
    S1, after receiving the voice data of a user undergoing identity verification, acquiring the voiceprint features of the voice data, and constructing a corresponding voiceprint feature vector based on the voiceprint features;
    S2, inputting the voiceprint feature vector into a background channel model generated by pre-training, to construct the current voiceprint discrimination vector corresponding to the voice data;
    S3, calculating the spatial distance between the current voiceprint discrimination vector and the pre-stored standard voiceprint discrimination vector of the user, authenticating the user based on the distance, and generating a verification result.
  18. The storage medium according to claim 17, wherein the background channel model is a Gaussian mixture model, and before step S1 the at least one computer-readable instruction is further executable by the processing device to perform the following operations:
    obtaining a preset number of voice data samples, acquiring the voiceprint features corresponding to each voice data sample, and constructing, based on the voiceprint features corresponding to each voice data sample, a voiceprint feature vector corresponding to that sample;
    dividing the voiceprint feature vectors corresponding to the voice data samples into a training set of a first proportion and a validation set of a second proportion, a sum of the first proportion and the second proportion being less than or equal to 1;
    training the Gaussian mixture model with the voiceprint feature vectors in the training set and, after training is complete, verifying the accuracy of the trained Gaussian mixture model with the validation set; and
    if the accuracy is greater than a preset threshold, ending model training and using the trained Gaussian mixture model as the background channel model of step S2, or, if the accuracy is less than or equal to the preset threshold, increasing the number of voice data samples and retraining based on the increased voice data samples.
  19. The storage medium according to claim 17, wherein step S3 is replaced by: calculating the spatial distance between the current voiceprint discrimination vector and each pre-stored standard voiceprint discrimination vector, obtaining the minimum spatial distance, authenticating the user based on the minimum spatial distance, and generating a verification result.
  20. The storage medium according to claim 17, wherein step S1 comprises:
    S11, performing pre-emphasis, framing, and windowing on the voice data;
    S12, performing a Fourier transform on each windowed frame to obtain a corresponding spectrum;
    S13, inputting the spectrum into a Mel filter bank to output a Mel spectrum;
    S14, performing cepstral analysis on the Mel spectrum to obtain Mel-frequency cepstral coefficients (MFCCs), and composing the corresponding voiceprint feature vector from the MFCCs.
PCT/CN2017/091361 2017-03-13 2017-06-30 Voiceprint recognition-based identity verification method, electronic device, and storage medium WO2018166112A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710147695.XA CN107068154A (en) 2017-03-13 2017-03-13 The method and system of authentication based on Application on Voiceprint Recognition
CN201710147695.X 2017-03-13

Publications (1)

Publication Number Publication Date
WO2018166112A1 true WO2018166112A1 (en) 2018-09-20

Family

ID=59622093

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/CN2017/091361 WO2018166112A1 (en) 2017-03-13 2017-06-30 Voiceprint recognition-based identity verification method, electronic device, and storage medium
PCT/CN2017/105031 WO2018166187A1 (en) 2017-03-13 2017-09-30 Server, identity verification method and system, and a computer-readable storage medium

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/105031 WO2018166187A1 (en) 2017-03-13 2017-09-30 Server, identity verification method and system, and a computer-readable storage medium

Country Status (3)

Country Link
CN (2) CN107068154A (en)
TW (1) TWI641965B (en)
WO (2) WO2018166112A1 (en)

Families Citing this family (52)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107068154A (en) * 2017-03-13 2017-08-18 平安科技(深圳)有限公司 The method and system of authentication based on Application on Voiceprint Recognition
CN107527620B (en) * 2017-07-25 2019-03-26 平安科技(深圳)有限公司 Electronic device, the method for authentication and computer readable storage medium
CN107993071A (en) * 2017-11-21 2018-05-04 平安科技(深圳)有限公司 Electronic device, auth method and storage medium based on vocal print
CN108172230A (en) * 2018-01-03 2018-06-15 平安科技(深圳)有限公司 Voiceprint registration method, terminal installation and storage medium based on Application on Voiceprint Recognition model
CN108269575B (en) * 2018-01-12 2021-11-02 平安科技(深圳)有限公司 Voice recognition method for updating voiceprint data, terminal device and storage medium
CN108154371A (en) * 2018-01-12 2018-06-12 平安科技(深圳)有限公司 Electronic device, the method for authentication and storage medium
CN108091326B (en) * 2018-02-11 2021-08-06 张晓雷 Voiceprint recognition method and system based on linear regression
CN108694952B (en) * 2018-04-09 2020-04-28 平安科技(深圳)有限公司 Electronic device, identity authentication method and storage medium
CN108766444B (en) * 2018-04-09 2020-11-03 平安科技(深圳)有限公司 User identity authentication method, server and storage medium
CN108768654B (en) * 2018-04-09 2020-04-21 平安科技(深圳)有限公司 Identity verification method based on voiceprint recognition, server and storage medium
CN108447489B (en) * 2018-04-17 2020-05-22 清华大学 Continuous voiceprint authentication method and system with feedback
CN108806695A (en) * 2018-04-17 2018-11-13 平安科技(深圳)有限公司 Anti- fraud method, apparatus, computer equipment and the storage medium of self refresh
CN108630208B (en) * 2018-05-14 2020-10-27 平安科技(深圳)有限公司 Server, voiceprint-based identity authentication method and storage medium
CN108650266B (en) * 2018-05-14 2020-02-18 平安科技(深圳)有限公司 Server, voiceprint verification method and storage medium
CN108834138B (en) * 2018-05-25 2022-05-24 北京国联视讯信息技术股份有限公司 Network distribution method and system based on voiceprint data
CN109101801B (en) * 2018-07-12 2021-04-27 北京百度网讯科技有限公司 Method, apparatus, device and computer readable storage medium for identity authentication
CN109087647B (en) * 2018-08-03 2023-06-13 平安科技(深圳)有限公司 Voiceprint recognition processing method and device, electronic equipment and storage medium
CN109256138B (en) * 2018-08-13 2023-07-07 平安科技(深圳)有限公司 Identity verification method, terminal device and computer readable storage medium
CN110867189A (en) * 2018-08-28 2020-03-06 北京京东尚科信息技术有限公司 Login method and device
CN110880325B (en) * 2018-09-05 2022-06-28 华为技术有限公司 Identity recognition method and equipment
CN109450850B (en) * 2018-09-26 2022-10-11 深圳壹账通智能科技有限公司 Identity authentication method, identity authentication device, computer equipment and storage medium
CN109377662A (en) * 2018-09-29 2019-02-22 途客易达(天津)网络科技有限公司 Charging pile control method, device and electronic equipment
CN109257362A (en) * 2018-10-11 2019-01-22 平安科技(深圳)有限公司 Method, apparatus, computer equipment and the storage medium of voice print verification
CN109378002B (en) * 2018-10-11 2024-05-07 平安科技(深圳)有限公司 Voiceprint verification method, voiceprint verification device, computer equipment and storage medium
CN109147797B (en) * 2018-10-18 2024-05-07 平安科技(深圳)有限公司 Customer service method, device, computer equipment and storage medium based on voiceprint recognition
CN109473105A (en) * 2018-10-26 2019-03-15 平安科技(深圳)有限公司 The voice print verification method, apparatus unrelated with text and computer equipment
CN109524026B (en) * 2018-10-26 2022-04-26 北京网众共创科技有限公司 Method and device for determining prompt tone, storage medium and electronic device
CN109360573A (en) * 2018-11-13 2019-02-19 平安科技(深圳)有限公司 Livestock method for recognizing sound-groove, device, terminal device and computer storage medium
CN109493873A (en) * 2018-11-13 2019-03-19 平安科技(深圳)有限公司 Livestock method for recognizing sound-groove, device, terminal device and computer storage medium
CN109636630A (en) * 2018-12-07 2019-04-16 泰康保险集团股份有限公司 Method, apparatus, medium and electronic equipment of the detection for behavior of insuring
CN110046910B (en) * 2018-12-13 2023-04-14 蚂蚁金服(杭州)网络技术有限公司 Method and equipment for judging validity of transaction performed by customer through electronic payment platform
CN109816508A (en) * 2018-12-14 2019-05-28 深圳壹账通智能科技有限公司 Method for authenticating user identity, device based on big data, computer equipment
CN109473108A (en) * 2018-12-15 2019-03-15 深圳壹账通智能科技有限公司 Auth method, device, equipment and storage medium based on Application on Voiceprint Recognition
CN109545226B (en) * 2019-01-04 2022-11-22 平安科技(深圳)有限公司 Voice recognition method, device and computer readable storage medium
CN110322888B (en) * 2019-05-21 2023-05-30 平安科技(深圳)有限公司 Credit card unlocking method, apparatus, device and computer readable storage medium
CN110298150B (en) * 2019-05-29 2021-11-26 上海拍拍贷金融信息服务有限公司 Identity verification method and system based on voice recognition
CN110334603A (en) * 2019-06-06 2019-10-15 视联动力信息技术股份有限公司 Authentication system
CN110473569A (en) * 2019-09-11 2019-11-19 苏州思必驰信息科技有限公司 Detect the optimization method and system of speaker's spoofing attack
CN110738998A (en) * 2019-09-11 2020-01-31 深圳壹账通智能科技有限公司 Voice-based personal credit evaluation method, device, terminal and storage medium
CN110971755B (en) * 2019-11-18 2021-04-20 武汉大学 Double-factor identity authentication method based on PIN code and pressure code
CN111402899B (en) * 2020-03-25 2023-10-13 中国工商银行股份有限公司 Cross-channel voiceprint recognition method and device
CN111597531A (en) * 2020-04-07 2020-08-28 北京捷通华声科技股份有限公司 Identity authentication method and device, electronic equipment and readable storage medium
CN111625704A (en) * 2020-05-11 2020-09-04 镇江纵陌阡横信息科技有限公司 Non-personalized recommendation algorithm model based on user intention and data cooperation
CN111710340A (en) * 2020-06-05 2020-09-25 深圳市卡牛科技有限公司 Method, device, server and storage medium for identifying user identity based on voice
CN111613230A (en) * 2020-06-24 2020-09-01 泰康保险集团股份有限公司 Voiceprint verification method, voiceprint verification device, voiceprint verification equipment and storage medium
CN111899566A (en) * 2020-08-11 2020-11-06 南京畅淼科技有限责任公司 Ship traffic management system based on AIS
CN112289324B (en) * 2020-10-27 2024-05-10 湖南华威金安企业管理有限公司 Voiceprint identity recognition method and device and electronic equipment
CN112669841A (en) * 2020-12-18 2021-04-16 平安科技(深圳)有限公司 Training method and device for multilingual speech generation model and computer equipment
CN112802481A (en) * 2021-04-06 2021-05-14 北京远鉴信息技术有限公司 Voiceprint verification method, voiceprint recognition model training method, device and equipment
CN113421575B (en) * 2021-06-30 2024-02-06 平安科技(深圳)有限公司 Voiceprint recognition method, voiceprint recognition device, voiceprint recognition equipment and storage medium
CN114780787A (en) * 2022-04-01 2022-07-22 杭州半云科技有限公司 Voiceprint retrieval method, identity verification method, identity registration method and device
CN114826709A (en) * 2022-04-15 2022-07-29 马上消费金融股份有限公司 Identity authentication and acoustic environment detection method, system, electronic device and medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1403953A (en) * 2002-09-06 2003-03-19 浙江大学 Palm acoustic-print verifying system
CN102820033A (en) * 2012-08-17 2012-12-12 南京大学 Voiceprint identification method
US20120330663A1 (en) * 2011-06-27 2012-12-27 Hon Hai Precision Industry Co., Ltd. Identity authentication system and method
US20130225128A1 (en) * 2012-02-24 2013-08-29 Agnitio Sl System and method for speaker recognition on mobile devices

Family Cites Families (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8645137B2 (en) * 2000-03-16 2014-02-04 Apple Inc. Fast, language-independent method for user authentication by voice
TWI234762B (en) * 2003-12-22 2005-06-21 Top Dihital Co Ltd Voiceprint identification system for e-commerce
US7447633B2 (en) * 2004-11-22 2008-11-04 International Business Machines Corporation Method and apparatus for training a text independent speaker recognition system using speech data with text labels
US7536304B2 (en) * 2005-05-27 2009-05-19 Porticus, Inc. Method and system for bio-metric voice print authentication
CN101064043A (en) * 2006-04-29 2007-10-31 上海优浪信息科技有限公司 Sound-groove gate inhibition system and uses thereof
CN102479511A (en) * 2010-11-23 2012-05-30 盛乐信息技术(上海)有限公司 Large-scale voiceprint authentication method and system
CN102238190B (en) * 2011-08-01 2013-12-11 安徽科大讯飞信息科技股份有限公司 Identity authentication method and system
CN102509547B (en) * 2011-12-29 2013-06-19 辽宁工业大学 Method and system for voiceprint recognition based on vector quantization based
CN102695112A (en) * 2012-06-09 2012-09-26 九江妙士酷实业有限公司 Automobile player and volume control method thereof
CN102916815A (en) * 2012-11-07 2013-02-06 华为终端有限公司 Method and device for checking identity of user
CN103220286B (en) * 2013-04-10 2015-02-25 郑方 Identity verification system and identity verification method based on dynamic password voice
CN104427076A (en) * 2013-08-30 2015-03-18 中兴通讯股份有限公司 Recognition method and recognition device for automatic answering of calling system
CN103632504A (en) * 2013-12-17 2014-03-12 上海电机学院 Silence reminder for library
CN104765996B (en) * 2014-01-06 2018-04-27 讯飞智元信息科技有限公司 Voiceprint password authentication method and system
CN104978507B (en) * 2014-04-14 2019-02-01 中国石油化工集团公司 A kind of Intelligent controller for logging evaluation expert system identity identifying method based on Application on Voiceprint Recognition
CN105100911A (en) * 2014-05-06 2015-11-25 夏普株式会社 Intelligent multimedia system and method
CN103986725A (en) * 2014-05-29 2014-08-13 中国农业银行股份有限公司 Client side, server side and identity authentication system and method
CN104157301A (en) * 2014-07-25 2014-11-19 广州三星通信技术研究有限公司 Method, device and terminal deleting voice information blank segment
CN105321293A (en) * 2014-09-18 2016-02-10 广东小天才科技有限公司 Danger detection and warning method and danger detection and warning smart device
CN104485102A (en) * 2014-12-23 2015-04-01 智慧眼(湖南)科技发展有限公司 Voiceprint recognition method and device
CN104751845A (en) * 2015-03-31 2015-07-01 江苏久祥汽车电器集团有限公司 Voice recognition method and system used for intelligent robot
CN104992708B (en) * 2015-05-11 2018-07-24 国家计算机网络与信息安全管理中心 Specific audio detection model generation in short-term and detection method
CN105096955B (en) * 2015-09-06 2019-02-01 广东外语外贸大学 A kind of speaker's method for quickly identifying and system based on model growth cluster
CN105611461B (en) * 2016-01-04 2019-12-17 浙江宇视科技有限公司 Noise suppression method, device and system for front-end equipment voice application system
CN105575394A (en) * 2016-01-04 2016-05-11 北京时代瑞朗科技有限公司 Voiceprint identification method based on global change space and deep learning hybrid modeling
CN106971717A (en) * 2016-01-14 2017-07-21 芋头科技(杭州)有限公司 Robot and audio recognition method, the device of webserver collaborative process
CN105869645B (en) * 2016-03-25 2019-04-12 腾讯科技(深圳)有限公司 Voice data processing method and device
CN106210323B (en) * 2016-07-13 2019-09-24 Oppo广东移动通信有限公司 A kind of speech playing method and terminal device
CN106169295B (en) * 2016-07-15 2019-03-01 腾讯科技(深圳)有限公司 Identity vector generation method and device
CN106373576B (en) * 2016-09-07 2020-07-21 Tcl科技集团股份有限公司 Speaker confirmation method and system based on VQ and SVM algorithms
CN107068154A (en) * 2017-03-13 2017-08-18 平安科技(深圳)有限公司 The method and system of authentication based on Application on Voiceprint Recognition

Also Published As

Publication number Publication date
WO2018166187A1 (en) 2018-09-20
TWI641965B (en) 2018-11-21
TW201833810A (en) 2018-09-16
CN107517207A (en) 2017-12-26
CN107068154A (en) 2017-08-18

Similar Documents

Publication Publication Date Title
WO2018166112A1 (en) Voiceprint recognition-based identity verification method, electronic device, and storage medium
WO2019100606A1 (en) Electronic device, voiceprint-based identity verification method and system, and storage medium
JP6621536B2 (en) Electronic device, identity authentication method, system, and computer-readable storage medium
WO2020181824A1 (en) Voiceprint recognition method, apparatus and device, and computer-readable storage medium
WO2018107810A1 (en) Voiceprint recognition method and apparatus, and electronic device and medium
TWI527023B (en) A voiceprint recognition method and apparatus
WO2019136912A1 (en) Electronic device, identity authentication method and system, and storage medium
WO2017215558A1 (en) Voiceprint recognition method and device
CN110956966B (en) Voiceprint authentication method, voiceprint authentication device, voiceprint authentication medium and electronic equipment
KR20180104595A (en) Method for identifying a gate, device, storage medium and backstage server
CN107886943A (en) A kind of method for recognizing sound-groove and device
CN103794207A (en) Dual-mode voice identity recognition method
CN105096955A (en) Speaker rapid identification method and system based on growing and clustering algorithm of models
US9947323B2 (en) Synthetic oversampling to enhance speaker identification or verification
CN102881291A (en) Sensing Hash value extracting method and sensing Hash value authenticating method for voice sensing Hash authentication
Duraibi Voice biometric identity authentication model for iot devices
WO2019196305A1 (en) Electronic device, identity verification method, and storage medium
CN113223536A (en) Voiceprint recognition method and device and terminal equipment
WO2019218512A1 (en) Server, voiceprint verification method, and storage medium
Biagetti et al. Speaker identification with short sequences of speech frames
Zhang et al. Imperceptible black-box waveform-level adversarial attack towards automatic speaker recognition
Lin et al. A multiscale chaotic feature extraction method for speaker recognition
Guo et al. Voice-based user-device physical unclonable functions for mobile device authentication
WO2019218515A1 (en) Server, voiceprint-based identity authentication method, and storage medium
Nagakrishnan et al. Generic speech based person authentication system with genuine and spoofed utterances: different feature sets and models

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17900320

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 09/12/2019)

122 Ep: pct application non-entry in european phase

Ref document number: 17900320

Country of ref document: EP

Kind code of ref document: A1