WO2018166187A1 - Server, identity verification method, system, and computer-readable storage medium - Google Patents


Info

Publication number
WO2018166187A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
voiceprint feature
feature vector
voiceprint
password
Prior art date
Application number
PCT/CN2017/105031
Other languages
English (en)
French (fr)
Inventor
王健宗
查高密
程宁
肖京
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2018166187A1

Classifications

    • H - ELECTRICITY
      • H04 - ELECTRIC COMMUNICATION TECHNIQUE
        • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
          • H04L63/00 - Network architectures or network communication protocols for network security
            • H04L63/08 - Network architectures or network communication protocols for network security for authentication of entities
              • H04L63/0861 - Network architectures or network communication protocols for network security for authentication of entities using biometrical features, e.g. fingerprint, retina-scan
    • G - PHYSICS
      • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
          • G10L15/00 - Speech recognition
            • G10L15/26 - Speech to text systems
          • G10L17/00 - Speaker identification or verification
            • G10L17/02 - Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
            • G10L17/04 - Training, enrolment or model building
            • G10L17/06 - Decision making techniques; Pattern matching strategies
          • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
            • G10L25/03 - Speech or voice analysis techniques characterised by the type of extracted parameters
              • G10L25/18 - the extracted parameters being spectral information of each sub-band
              • G10L25/24 - the extracted parameters being the cepstrum

Definitions

  • the present invention relates to the field of communications technologies, and in particular, to a server, an authentication method, a system, and a computer readable storage medium.
  • the present invention provides a server including a memory and a processor connected to the memory, the memory storing an identity verification system operable on the processor; when executed by the processor, the identity verification system implements the following steps:
  • S2: receive a password voice recorded by the user based on the voice acquisition text, and perform character recognition on the password voice to identify the password character corresponding to the password voice;
  • S3: if the password character is consistent with the standard password character corresponding to the voice acquisition text, construct a current voiceprint feature vector of the password voice, determine a standard voiceprint feature vector corresponding to the identity identifier of the user according to a predetermined mapping relationship between identity identifiers and standard voiceprint feature vectors, calculate the distance between the current voiceprint feature vector and the determined standard voiceprint feature vector using a predetermined distance calculation formula, and authenticate the user according to that distance.
  • the present invention also provides a server including a memory and a processor connected to the memory, the memory storing a voiceprint recognition-based identity verification system executable on the processor; when executed by the processor, the voiceprint recognition-based identity verification system implements the following steps:
  • S102 Input the voiceprint feature vector into a background channel model generated by pre-training to construct a current voiceprint discrimination vector corresponding to the voice data;
  • the present invention also provides an identity verification method, where the identity verification method includes:
  • S2: receive a password voice recorded by the user based on the voice acquisition text, and perform character recognition on the password voice to identify the password character corresponding to the password voice;
  • S3: if the password character is consistent with the standard password character corresponding to the voice acquisition text, construct a current voiceprint feature vector of the password voice, determine the standard voiceprint feature vector corresponding to the identity identifier of the user according to the predetermined mapping relationship between identity identifiers and standard voiceprint feature vectors, calculate the distance between the current voiceprint feature vector and the determined standard voiceprint feature vector using a predetermined distance calculation formula, and perform identity verification on the user according to the distance.
  • the present invention also provides an identity verification method, where the identity verification method includes:
  • S102 Input the voiceprint feature vector into a background channel model generated by pre-training to construct a current voiceprint discrimination vector corresponding to the voice data;
  • the present invention also provides an identity verification system, where the identity verification system includes:
  • a sending module configured to, after receiving an identity verification request carrying an identity identifier sent by the client, randomly send a voice acquisition text for the user to respond to back to the client;
  • a character recognition module configured to receive the password voice recorded by the user based on the voice acquisition text, perform character recognition on the password voice, and identify the password character corresponding to the password voice;
  • an identity verification module configured to, if the password character is consistent with the standard password character corresponding to the voice acquisition text, construct a current voiceprint feature vector of the password voice, determine the standard voiceprint feature vector corresponding to the identity identifier of the user according to the predetermined mapping relationship between identity identifiers and standard voiceprint feature vectors, calculate the distance between the current voiceprint feature vector and the determined standard voiceprint feature vector using a predetermined distance calculation formula, and authenticate the user according to the distance.
  • the present invention also provides a voiceprint recognition-based identity verification system, comprising:
  • a building module configured to acquire the voiceprint features of the voice data after receiving the voice data of the user to be authenticated, and construct a corresponding voiceprint feature vector based on those voiceprint features;
  • An input module configured to input the voiceprint feature vector into a background channel model generated by pre-training to construct a current voiceprint discrimination vector corresponding to the voice data;
  • an identity verification module configured to calculate a spatial distance between the current voiceprint discrimination vector and the pre-stored standard voiceprint discrimination vector of the user, perform identity verification on the user based on the distance, and generate a verification result.
  • the present invention also provides a computer-readable storage medium having stored thereon an identity verification system which, when executed by a processor, implements the steps of the identity verification method described above.
  • the present invention also provides another computer-readable storage medium having stored thereon a voiceprint recognition-based identity verification system which, when executed by a processor, implements the steps of the identity verification method described above.
  • the beneficial effects of the present invention are: if another person uses an existing or previously prepared fake recording for identity verification, the recognized password character will be inconsistent with the corresponding standard password character because the voice acquisition text is sent at random, which prevents others from using existing or prepared fake recordings for authentication; if another person records his or her own voice for authentication, the attempt cannot pass the subsequent voiceprint feature verification. This embodiment is therefore equivalent to performing two authentications, giving a double-verification effect that improves the security of identity verification while ensuring its accuracy and efficiency.
  • FIG. 1 is a schematic diagram of an optional application environment according to various embodiments of the present invention.
  • FIG. 2 is a schematic structural diagram of an embodiment of an identity verification system according to the present invention.
  • FIG. 3 is a schematic flowchart diagram of an embodiment of an identity verification method according to the present invention.
  • Referring to FIG. 1, it is a schematic diagram of the application environment of a preferred embodiment of the identity verification method of the present invention.
  • the application environment diagram includes a server 1 and a terminal device 2.
  • the server 1 can perform data interaction with the terminal device 2 through a suitable technology such as a network or a near field communication technology.
  • the terminal device 2 is installed with a client for sending an authentication request to the server 1.
  • the terminal device 2 includes, but is not limited to, any electronic product capable of human-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel, or a voice control device, such as movable devices including personal computers, tablets, smart phones, personal digital assistants (PDAs), game consoles, Internet Protocol Television (IPTV) devices, smart wearable devices, and navigation devices, as well as fixed terminals such as digital TVs, desktop computers, notebooks, and servers.
  • the server 1 is a device capable of automatically performing numerical calculation and/or information processing in accordance with instructions that are set or stored in advance.
  • the server 1 may be a computer, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a type of distributed computing: a super virtual computer consisting of a group of loosely coupled computers.
  • the server 1 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13 communicably connected to each other through a system bus, the memory 11 storing an identity verification system executable on the processor 12. It is pointed out that FIG. 1 only shows the server 1 with components 11-13, but it should be understood that not all of the illustrated components are required; more or fewer components may be implemented instead.
  • the memory 11 includes a memory and at least one type of readable storage medium.
  • the memory provides a cache for the operation of the server 1;
  • the readable storage medium can be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, or the like.
  • In some embodiments, the readable storage medium may be an internal storage unit of the server 1, such as a hard disk of the server 1; in other embodiments, it may also be an external storage device of the server 1, for example a plug-in hard disk, a smart media card (SMC), a Secure Digital (SD) card, or a flash card provided on the server 1.
  • the readable storage medium of the memory 11 is generally used to store an operating system installed on the server 1 and various types of application software, such as program codes of the identity verification system in an embodiment of the present invention. Further, the memory 11 can also be used to temporarily store various types of data that have been output or are to be output.
  • the processor 12 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments.
  • the processor 12 is typically used to control the overall operation of the server 1, such as performing control and processing related to data interaction or communication with the terminal device 2.
  • the processor 12 is configured to run program code or process data stored in the memory 11, such as running an identity verification system or the like.
  • the network interface 13 may comprise a wireless network interface or a wired network interface, which is typically used to establish a communication connection between the server 1 and other electronic devices.
  • the network interface 13 is mainly used to connect the server 1 with one or more terminal devices 2, and establish a data transmission channel and a communication connection between the server 1 and one or more terminal devices 2.
  • the identity verification system is stored in the memory 11 and includes at least one computer-readable instruction stored in the memory 11, the at least one computer-readable instruction being executable by the processor 12 to implement the methods of the embodiments of the present application; the at least one computer-readable instruction can be classified into different logic modules according to the functions its parts implement. As shown in FIG. 2, the identity verification system is divided into a sending module 1, a character recognition module 2, and an identity verification module 3.
  • Step S1: after receiving an identity verification request carrying an identity identifier sent by the client, randomly send a voice acquisition text for the user to respond to back to the client;
  • the user performs an operation on the client, and sends an identity verification request carrying the identity identifier to the server.
  • After receiving the identity verification request, the server randomly sends a voice acquisition text for the user to respond to back to the client.
  • the identity identifier may be the user's ID number or mobile phone number, etc.; there are many kinds of voice acquisition texts for the user to respond to, and the server randomly sends one of them to the client, so as to prevent others from using an existing fake recording for authentication.
  • the voice acquisition text may be text corresponding to a random password that the user is required to record by voice, or it may be the text of a question whose answer the user is required to record by voice.
  • For example, if the voice acquisition text is "Please record the string of digits ***", the user records the digit string *** when responding; if the voice acquisition text is the question "Where is your birthplace?", the user records "My birthplace is ***" when responding.
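The random selection in step S1 can be sketched as follows; the challenge pool and its contents are illustrative assumptions, not values from the patent.

```python
import random

# Hypothetical pool of voice acquisition texts, each paired with its
# standard password characters (None means the answer is checked per user).
CHALLENGES = [
    {"text": "Please record the string of digits 4 7 2 9 1 5", "standard": "472915"},
    {"text": "Please record the string of digits 8 3 6 0 2 7", "standard": "836027"},
    {"text": "Where is your birthplace?", "standard": None},
]

def pick_voice_acquisition_text():
    """Randomly choose one challenge so that a pre-made fake recording
    is unlikely to match the standard password characters."""
    return random.choice(CHALLENGES)

challenge = pick_voice_acquisition_text()
```

Because the selection is random, an attacker cannot know in advance which text the server will send.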
  • Step S2: receive the password voice recorded by the user based on the voice acquisition text, perform character recognition on the password voice, and identify the password character corresponding to the password voice;
  • the manner in which the user records the password voice on the client may be: after reading the voice acquisition text, the user presses a predetermined physical button or virtual button to start the voice recording unit; when the user releases the button, recording stops, and the recorded voice is sent to the server as the password voice.
  • When recording, the recording device should be kept at an appropriate distance from the user, and a large recording device should be avoided where possible; the power supply should preferably be mains power with a stable current, and a sensor should be used when recording over a telephone.
  • After receiving the password voice, the server performs character recognition on it, that is, converts the password voice into characters; the password voice can be converted directly, or it can first be denoised to further reduce interference.
  • the recorded password voice is voice data of a preset data length, or voice data longer than a preset data length.
  • Step S3: if the password character is consistent with the standard password character corresponding to the voice acquisition text, construct a current voiceprint feature vector of the password voice, determine the standard voiceprint feature vector corresponding to the identity identifier of the user according to the predetermined mapping relationship between identity identifiers and standard voiceprint feature vectors, calculate the distance between the current voiceprint feature vector and the determined standard voiceprint feature vector using a predetermined distance calculation formula, and authenticate the user according to the distance.
  • There are multiple kinds of voice acquisition texts and, correspondingly, multiple standard password characters pre-stored on the server, with voice acquisition texts in one-to-one correspondence with standard password characters. The standard password character corresponding to the sent voice acquisition text is obtained, and it is determined whether the identified password character is consistent with it.
  • If they are consistent, the current voiceprint feature vector of the password voice is further constructed.
  • the voiceprint features include multiple types, such as wide-band voiceprint, narrow-band voiceprint, and amplitude voiceprint; the voiceprint feature of this embodiment is preferably the Mel Frequency Cepstrum Coefficients (MFCC) of the voice data.
  • the distance between the current voiceprint feature vector and the determined standard voiceprint feature vector in this embodiment is a cosine distance; the cosine distance uses the cosine of the angle between two vectors in vector space as a measure of the difference between two individuals.
  • the standard voiceprint feature vector is a pre-stored voiceprint feature vector; before the distance is calculated, the corresponding standard voiceprint feature vector is obtained according to the user's identity identifier.
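A minimal sketch of the cosine-distance comparison, assuming the common convention distance = 1 - cosine similarity (the patent does not spell out the exact formula), with an illustrative threshold:

```python
import math

def cosine_distance(a, b):
    """1 - cos(angle between a and b); smaller means closer voiceprints."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def verify(current_vec, standard_vec, threshold=0.3):
    """Pass when the distance is within the preset threshold.
    The threshold value 0.3 is illustrative, not from the patent."""
    return cosine_distance(current_vec, standard_vec) <= threshold
```

Identical directions give distance 0; orthogonal vectors give distance 1.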
  • If the distance is within the preset threshold, the verification passes; otherwise, the verification fails.
  • the embodiment is equivalent to performing two authentications, which has the effect of double verification, and improves the security of the identity verification while ensuring the accuracy and efficiency of the user identity verification.
  • In an embodiment, step S2 includes: receiving the password voice recorded by the user and sent by the client, and analyzing whether the password voice is usable; if the password voice is not usable, prompting the client to re-record it; if it is usable, performing character recognition on the password voice.
  • Whether the password voice is usable is analyzed based on whether the duration of the user's speaking part is greater than a preset duration, whether the background noise volume of the password voice is less than a first preset volume, and/or whether the speaking volume is greater than a second preset volume. If these analysis conditions are satisfied, the password voice is usable and subsequent character recognition may proceed; otherwise, if the duration of the speaking part is not greater than the preset duration, or the background noise volume is greater than or equal to the first preset volume, or the speaking volume is less than or equal to the second preset volume, the password voice is not usable, and the client is prompted to re-record it.
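The availability analysis reduces to three threshold checks; a sketch follows, where the concrete default thresholds are assumptions for illustration only.

```python
def password_voice_available(speaking_duration_s: float,
                             noise_volume_db: float,
                             speaking_volume_db: float,
                             preset_duration_s: float = 1.0,
                             first_preset_volume_db: float = 40.0,
                             second_preset_volume_db: float = 55.0) -> bool:
    """Usable only if the speaking part is long enough, the background
    noise is below the first preset volume, and the speaking volume is
    above the second preset volume."""
    return (speaking_duration_s > preset_duration_s
            and noise_volume_db < first_preset_volume_db
            and speaking_volume_db > second_preset_volume_db)
```

When this returns False, the client would be prompted to re-record the password voice.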
  • In an embodiment, the following steps are further implemented: if the password character is inconsistent with the standard password character corresponding to the voice acquisition text, a voice acquisition text for the user to respond to is again randomly sent to the client; the number of times voice acquisition texts have been sent to the client is accumulated, and if that number is greater than or equal to a preset number of times, the response to the identity verification request is terminated.
  • If verification fails, the user may be given further opportunities by randomly sending another voice acquisition text to the client; at the same time, to prevent excessive password verification from wasting computing resources, the number of password verifications can be limited: voice acquisition texts are sent to the client only while the send count is less than the preset number of times, and the response to the identity verification request is terminated once the count reaches or exceeds that preset number.
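The retry accounting described above can be sketched as follows; MAX_SENDS stands in for the preset number of times, and the challenge text is a placeholder.

```python
MAX_SENDS = 3  # stand-in for the preset number of times

class VerificationSession:
    """Counts how many voice acquisition texts have been sent, and
    terminates the response once the preset number is reached."""

    def __init__(self):
        self.sends = 0

    def send_voice_acquisition_text(self):
        if self.sends >= MAX_SENDS:
            return None  # terminate the response to the request
        self.sends += 1
        return "challenge #%d" % self.sends

session = VerificationSession()
results = [session.send_voice_acquisition_text() for _ in range(5)]
```

The first three calls return challenges; further calls return None, modeling the terminated response.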
  • In an embodiment, the step of constructing the current voiceprint feature vector of the password voice in step S3 includes: processing the password voice with a preset filter to extract voiceprint features of a preset type, and constructing the voiceprint feature vector corresponding to the password voice based on the extracted features; then inputting the constructed voiceprint feature vector into the pre-trained background channel model to construct the current voiceprint feature vector.
  • the preset filter is preferably a Mel filter.
  • Before filtering, the password voice is processed by pre-emphasis, framing, and windowing.
  • the pre-emphasis processing is in effect high-pass filtering, which filters out low-frequency data so that the high-frequency characteristics of the password voice are more prominent.
  • each frame signal is regarded as a stationary signal.
  • Because the start and end of each frame of the password voice are discontinuous, framing introduces a deviation from the original voice; therefore, the password voice needs to be windowed.
  • a cepstrum analysis is performed on the Mel spectrum to obtain the Mel frequency cepstral coefficients (MFCC), and the corresponding voiceprint feature vector is formed based on these coefficients.
  • the cepstrum analysis consists, for example, of taking the logarithm and applying an inverse transform.
  • the inverse transform is generally implemented by the discrete cosine transform (DCT), and the 2nd through 13th coefficients after the DCT are taken as the MFCC coefficients.
  • the MFCC of each frame is the voiceprint feature of that frame of the password voice; the MFCCs of all frames form a feature data matrix, which is the voiceprint feature vector of the password voice.
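The pre-emphasis, framing, and windowing steps described above can be sketched in a few lines; the frame length, hop size, and pre-emphasis coefficient (25 ms and 10 ms at a 16 kHz sampling rate, alpha = 0.97) are conventional values, not taken from the patent, and the Mel filtering, logarithm, and DCT stages are omitted.

```python
import math

def preprocess(signal, frame_len=400, hop=160, alpha=0.97):
    """Return Hamming-windowed, pre-emphasized frames of the signal."""
    # Pre-emphasis: y[n] = x[n] - alpha * x[n-1], a high-pass step that
    # makes high-frequency characteristics more prominent.
    emphasized = [signal[0]] + [signal[n] - alpha * signal[n - 1]
                                for n in range(1, len(signal))]
    # Framing: overlapping frames so each can be treated as stationary.
    frames = [emphasized[i:i + frame_len]
              for i in range(0, len(emphasized) - frame_len + 1, hop)]
    # Hamming window: smooths the discontinuous frame boundaries.
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    return [[frame[n] * window[n] for n in range(frame_len)]
            for frame in frames]

frames = preprocess([0.0] * 1000)
```

Each returned frame would then pass through the Mel filterbank and cepstrum analysis to yield that frame's MFCC row of the feature data matrix.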
  • the voiceprint feature vector is input into the background channel model generated by the pre-training.
  • In this embodiment, the background channel model is a Gaussian mixture model; it is used to process the voiceprint feature vector to obtain the corresponding current voiceprint feature vector (i-vector).
  • the calculation process uses a likelihood logarithm matrix Loglike, computed from the mean matrix E(X) trained by the universal background channel model, the covariance matrix D(X), and the feature data matrix X, where X.^2 denotes the element-wise square of X.
  • The first-order coefficients are obtained by summing the probability matrix over its rows: γ_i = Σ_j loglikes_ji, where γ_i is the i-th element of the first-order coefficient vector and loglikes_ji is the element in row j, column i of the probability matrix.
  • The second-order coefficients are obtained by multiplying the transpose of the probability matrix by the feature data matrix: X = Loglike^T × feats, where X is the second-order coefficient matrix, Loglike is the probability matrix, and feats is the feature data matrix.
  • The first-order and second-order terms are calculated in parallel, and the current voiceprint feature vector is then obtained from them.
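The first-order and second-order coefficient computations above reduce to column sums and a matrix product; a plain-Python sketch (representing matrices as lists of lists) is:

```python
def first_order_coefficients(loglikes):
    """gamma_i = sum over rows j of loglikes[j][i]:
    the column sums of the probability matrix."""
    cols = len(loglikes[0])
    return [sum(row[i] for row in loglikes) for i in range(cols)]

def second_order_coefficients(loglikes, feats):
    """X = Loglike^T * feats: the transpose of the probability matrix
    multiplied by the feature data matrix."""
    n_frames = len(loglikes)   # rows of both matrices
    n_comp = len(loglikes[0])  # mixture components
    n_feat = len(feats[0])     # feature dimensions
    return [[sum(loglikes[j][i] * feats[j][m] for j in range(n_frames))
             for m in range(n_feat)] for i in range(n_comp)]
```

In practice these statistics would be computed with optimized linear algebra; this sketch only makes the index conventions explicit.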
  • In an embodiment, the step in step S3 of calculating the distance between the current voiceprint feature vector and the determined standard voiceprint feature vector by a predetermined distance calculation formula and authenticating the user according to the distance includes: calculating the cosine distance between the current voiceprint feature vector and the determined standard voiceprint feature vector; if the cosine distance is less than or equal to a preset distance threshold, the identity verification passes; if the cosine distance is greater than the preset distance threshold, the identity verification fails.
  • the present invention also provides another server with a hardware architecture similar to that of the server of FIG. 1, including a memory and a processor connected to the memory, and connected to external terminal devices through a network interface.
  • A voiceprint recognition-based identity verification system is stored in the memory, comprising at least one computer-readable instruction stored in the memory and executable by the processor to implement the methods of the embodiments of the present application; the at least one computer-readable instruction can be divided into different logic modules according to the functions its parts implement, and the voiceprint recognition-based identity verification system can be divided into a building module, an input module, and an identity verification module.
  • When executed by the processor, the voiceprint recognition-based identity verification system implements the following steps:
  • the voice data is collected by a voice collection device (for example, a microphone), which sends the collected voice data to the voiceprint recognition-based identity verification system.
  • When collecting voice data, environmental noise and interference from the voice collection device itself should be prevented as far as possible.
  • The voice collection device should be kept at an appropriate distance from the user, and a large collection device should be avoided where possible; the power supply should preferably be mains power with a stable current, and a sensor should be used when recording over a telephone.
  • the voice data may be denoised prior to extracting the voiceprint features in the voice data to further reduce interference.
  • the collected voice data is voice data of a preset data length, or voice data greater than a preset data length.
  • the voiceprint features include various types, such as wide-band voiceprint, narrow-band voiceprint, and amplitude voiceprint; the voiceprint feature of this embodiment is preferably the Mel Frequency Cepstrum Coefficients (MFCC) of the voice data.
  • the voiceprint features of the voice data are composed into a feature data matrix, which is the voiceprint feature vector of the voice data.
  • S102 Input the voiceprint feature vector into a background channel model generated by pre-training to construct a current voiceprint discrimination vector corresponding to the voice data;
  • the voiceprint feature vector is input into the background channel model generated by the pre-training.
  • In this embodiment, the background channel model is a Gaussian mixture model; it is used to process the voiceprint feature vector to obtain the corresponding current voiceprint discrimination vector (i-vector).
  • the calculation process uses a likelihood logarithm matrix Loglike, computed from the mean matrix E(X) trained by the universal background channel model, the covariance matrix D(X), and the feature data matrix X, where X.^2 denotes the element-wise square of X.
  • To extract the current voiceprint discrimination vector, the first-order and second-order coefficients are calculated first. The first-order coefficients are obtained by summing the probability matrix over its rows: γ_i = Σ_j loglikes_ji, where γ_i is the i-th element of the first-order coefficient vector and loglikes_ji is the element in row j, column i of the probability matrix.
  • The second-order coefficients are obtained by multiplying the transpose of the probability matrix by the feature data matrix: X = Loglike^T × feats, where X is the second-order coefficient matrix, Loglike is the probability matrix, and feats is the feature data matrix.
  • The first-order and second-order terms are calculated in parallel, and the current voiceprint discrimination vector is then obtained from them.
  • the background channel model is a Gaussian mixture model
  • In an embodiment, the training of the background channel model includes:
  • the voiceprint feature vectors corresponding to the voice data samples are divided into a training set of a first ratio and a validation set of a second ratio, where the sum of the first ratio and the second ratio is less than or equal to 1;
  • the Gaussian mixture model is trained using the voiceprint feature vectors in the training set, and after training is complete, the accuracy of the trained Gaussian mixture model is verified using the validation set;
  • the model training ends, and the trained Gaussian mixture model is used as the background channel model of the step S102, or if the accuracy is less than or equal to the preset threshold, the voice is added.
  • the number of data samples is re-trained based on the increased speech data samples.
  • the likelihood probability corresponding to the extracted D-dimensional voiceprint feature can be expressed by K Gaussian components:
  • P(x) is the probability that the speech data samples are generated by the Gaussian mixture model (mixed Gaussian model), w k is the weight of each Gaussian model, and p(x
  • K is the number of Gaussian models.
  • the parameters of the entire Gaussian mixture model can be expressed as: ⁇ w i , ⁇ i , ⁇ i ⁇ , w i is the weight of the i-th Gaussian model, ⁇ i is the mean of the i-th Gaussian model, and ⁇ i is the i-th Gaussian
  • Training the Gaussian mixture model can use an unsupervised EM algorithm. After the training is completed, the Gaussian mixture model weight vector, constant vector, N covariance matrix, and the mean multiplied by the covariance matrix are obtained, which is a trained Gaussian mixture model.
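A self-contained sketch of the unsupervised EM training mentioned above, using a diagonal-covariance mixture and synthetic two-cluster data in place of real voiceprint feature vectors. The quantile-based initialisation, iteration count, and data are illustrative assumptions, not details from the patent.

```python
import numpy as np

def train_gmm_em(X, K, n_iter=50):
    """Unsupervised EM for a diagonal-covariance Gaussian mixture.

    Returns weights w, means mu, variances var -- the parameters
    {w_i, mu_i, Sigma_i} described in the text."""
    T, D = X.shape
    mu = np.quantile(X, np.linspace(0.1, 0.9, K), axis=0)  # spread initial means
    var = np.tile(X.var(axis=0) + 1e-6, (K, 1))
    w = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities p(k|x) via log-domain Gaussian densities
        logp = (np.log(w)[:, None]
                - 0.5 * np.sum(np.log(2 * np.pi * var), axis=1)[:, None]
                - 0.5 * ((X[None] - mu[:, None]) ** 2 / var[:, None]).sum(-1))
        logp -= logp.max(axis=0)
        resp = np.exp(logp)
        resp /= resp.sum(axis=0)
        # M-step: re-estimate weights, means, variances
        Nk = resp.sum(axis=1)
        w = Nk / T
        mu = (resp @ X) / Nk[:, None]
        var = (resp @ (X ** 2)) / Nk[:, None] - mu ** 2 + 1e-6
    return w, mu, var

# two well-separated clusters as stand-ins for two speakers' feature vectors
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-5, 1, (300, 2)), rng.normal(5, 1, (300, 2))])
w, mu, var = train_gmm_em(X, K=2)
```

On this synthetic data the fitted means land near the two cluster centres and the weights near 0.5 each, which is the behaviour the verification-set accuracy check above is meant to confirm on real samples.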
  • the spatial distance of the present embodiment is a cosine distance;
  • the cosine distance uses the cosine of the angle between two vectors in the vector space as a measure of the difference between two individuals.
  • the standard voiceprint discrimination vector is a voiceprint discrimination vector obtained and stored in advance; when stored, it carries the identification information of the corresponding user, so it can accurately represent the identity of that user.
  • before the spatial distance is calculated, the stored voiceprint discrimination vector is obtained according to the identification information provided by the user.
  • when the calculated spatial distance is less than or equal to a preset distance threshold, the verification passes; otherwise, the verification fails.
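The cosine comparison and threshold decision can be sketched as follows. The vectors and the 0.75 threshold are illustrative assumptions; the patent only specifies a "preset distance threshold". Note that the cosine here is a similarity (1 means identical direction), so "distance within threshold" is expressed as "cosine at least as large as the threshold".

```python
import numpy as np

def cosine(a, b):
    """Cosine of the angle between two voiceprint vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(current_vec, standard_vec, threshold=0.75):
    """Pass when the two vectors point in nearly the same direction.

    threshold=0.75 is an illustrative preset, not a value from the patent.
    """
    return cosine(current_vec, standard_vec) >= threshold

enrolled = np.array([0.9, 0.1, 0.4])                    # stored standard vector
same_user = enrolled + np.array([0.02, -0.01, 0.03])    # small channel noise
impostor = np.array([-0.5, 0.8, 0.1])                   # different speaker
```

A vector close in angle to the enrolled one passes; one pointing elsewhere fails, which is the pass/fail rule stated above.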
  • the background channel model generated by pre-training in this embodiment is obtained by mining and comparative training on a large amount of voice data. The model can accurately characterize the background voiceprint features present when the user speaks while preserving the user's own voiceprint features to the greatest extent, and it can remove these background features at recognition time so that the inherent features of the user's voice are extracted, which can greatly improve the accuracy of user identity verification and improve its efficiency.
  • In addition, the method makes full use of the vocal-tract-related voiceprint features of the human voice. These voiceprint features do not require the text to be constrained, so they allow greater flexibility in the process of recognition and verification.
  • FIG. 3 is a schematic flowchart of an embodiment of the identity verification method of the present invention.
  • the identity verification method includes the following steps:
  • Step S1: after receiving an identity verification request carrying an identity identifier sent by the client, randomly send voice acquisition text for the user to respond to the client;
  • the user performs an operation on the client, and sends an identity verification request carrying the identity identifier to the server.
  • the server After receiving the identity verification request, the server randomly sends the voice acquisition text for the user response to the client.
  • the identity identifier may be the user's ID card number or the user's mobile phone number, etc.; there are multiple kinds of voice acquisition text for the user to respond to, and the server randomly sends one of them to the client, so as to prevent others from using an existing fake recording for authentication.
  • the voice acquisition text may be the text corresponding to a random password that requires voice recording, or the text of a question about the random password that requires voice recording.
  • For example, if the voice acquisition text is "Please record the string of digits ***", the user records the voice "Please record the string of digits ***" when responding; as another example, if the voice acquisition text is the question "Where is your birthplace?", the user records "My birthplace is ***" when responding.
  • Step S2: receive the password voice broadcast by the user based on the voice acquisition text, perform character recognition on the password voice, and recognize the password characters corresponding to the password voice;
  • the manner in which the user records the password voice on the client may be: according to the voice acquisition text, after the user presses a predetermined physical button or virtual button, the sound recording unit is controlled to record; after the user releases the button, recording stops, and the recorded voice is sent to the server as the password voice.
  • when recording, the voice recording device should be kept at an appropriate distance from the user, and recording devices with large distortion should be avoided as far as possible;
  • the power supply preferably uses mains power with a stable current, and a sensor should be used when recording over the telephone.
  • After receiving the password voice, the server performs character recognition on it, that is, converts the password voice into individual characters; the password voice may be converted into characters directly, or it may first be denoised to further reduce interference.
  • In order to extract the voiceprint features of the password voice, the recorded password voice is voice data of a preset data length, or voice data longer than the preset data length.
  • Step S3: if the password characters are consistent with the standard password characters corresponding to the voice acquisition text, construct the current voiceprint feature vector of the password voice, and determine the standard voiceprint feature vector corresponding to the user's identity identifier according to a predetermined mapping relationship between identity identifiers and standard voiceprint feature vectors;
  • calculate the distance between the current voiceprint feature vector and the determined standard voiceprint feature vector using a predetermined distance calculation formula, and authenticate the user according to the distance.
  • there are multiple kinds of voice acquisition text and multiple kinds of standard password characters pre-stored on the server, in one-to-one correspondence.
  • after the password characters are recognized, the standard password characters corresponding to the sent voice acquisition text are obtained, and it is determined whether the recognized password characters are consistent with the corresponding standard password characters.
  • if they are consistent, the current voiceprint feature vector of the password voice is further constructed.
  • the voiceprint features include multiple types, such as wide-band voiceprints, narrow-band voiceprints, amplitude voiceprints, etc.; the voiceprint feature of the present embodiment is preferably the Mel Frequency Cepstrum Coefficients (MFCC) of the voice data.
  • when the corresponding voiceprint feature vector is constructed, the MFCC features of the password voice are assembled into a feature data matrix, which is the voiceprint feature vector of the password voice.
  • the distance between the current voiceprint feature vector of the embodiment and the determined standard voiceprint feature vector is a cosine distance.
  • the cosine distance is a measure of the magnitude of the difference between two individuals using the cosine of the angle between the two vectors in the vector space.
  • the standard voiceprint feature vector is a pre-stored voiceprint feature vector. Before calculating the distance, the corresponding standard voiceprint feature vector is obtained according to the user identification.
  • when the calculated distance is less than or equal to a preset distance threshold, the verification passes; otherwise, the verification fails.
  • the step S2 includes: receiving the password voice broadcast by the user and sent by the client, and analyzing whether the password voice is usable; if the password voice is not usable, prompting the client to re-record the password voice, or, if the password voice is usable, performing character recognition on the password voice.
  • Whether the password voice is usable is based on the following analysis: whether the duration of the user's speaking part is greater than a preset duration, whether the background noise volume of the password voice is less than a first preset volume, and/or whether the speaking volume is greater than a second preset volume. If all of the above analysis results are satisfied, the password voice is usable, and subsequent operations such as character recognition may be performed; otherwise, if the duration of the user's speaking part is less than the preset duration, or the background noise volume of the password voice is greater than or equal to the first preset volume, or the speaking volume is less than or equal to the second preset volume, the password voice is not usable, and the client is prompted to re-record the password voice.
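The three usability conditions can be collapsed into one predicate. The concrete thresholds below (2 s of speech, a 45 dB noise ceiling, a 55 dB speaking-volume floor) are illustrative assumptions; the patent only names a "preset duration" and two "preset volumes".

```python
def password_voice_usable(speech_seconds, noise_db, speech_db,
                          min_speech=2.0, max_noise=45.0, min_volume=55.0):
    """Usability check sketched from the three conditions above:
    speaking duration above the preset duration, background noise
    below the first preset volume, speaking volume above the second."""
    return (speech_seconds > min_speech
            and noise_db < max_noise
            and speech_db > min_volume)
```

A clean recording passes all three checks; one that is too short, too noisy, or too quiet triggers the re-record prompt.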
  • the identity verification method further includes the following steps: if the password characters are inconsistent with the standard password characters corresponding to the voice acquisition text, randomly sending voice acquisition text for the user to respond to the client again; accumulating the number of times voice acquisition text has been sent to the client, and terminating the response to the identity verification request if that number is greater than or equal to a preset number.
  • If the user records a wrong password voice, an opportunity can be provided to randomly send voice acquisition text to the client again; at the same time, to prevent excessive
  • password verification from wasting computer resources, the number of password verifications can be limited to less than the preset number, that is, the accumulated number of times voice acquisition text is sent to the client is kept below the preset number, and the response to the identity verification request is terminated when that number is greater than or equal to the preset number.
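The retry accounting just described can be sketched as a small per-request counter. `MAX_SENDS = 3` is an illustrative preset; the patent only says the response is terminated once the count reaches a preset number.

```python
class VerificationSession:
    """Tracks how many voice-acquisition texts were sent for one request."""
    MAX_SENDS = 3   # illustrative preset number

    def __init__(self):
        self.sends = 0
        self.terminated = False

    def request_new_text(self):
        """Return True if another voice-acquisition text may be sent;
        terminate the response once the preset number is reached."""
        if self.sends >= self.MAX_SENDS:
            self.terminated = True
            return False
        self.sends += 1
        return True

session = VerificationSession()
outcomes = [session.request_new_text() for _ in range(5)]
```

After the preset number of failed attempts, further requests are refused and the session is marked terminated.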
  • the step of constructing the current voiceprint feature vector of the password voice in step S3 includes: processing the password voice with a preset filter to extract voiceprint features of a preset type, and constructing the voiceprint feature vector corresponding to the password voice based on the extracted preset-type voiceprint features; and inputting the constructed voiceprint feature vector into the pre-trained background channel model to construct the current voiceprint feature vector.
  • the preset filter is preferably a Mel filter.
  • first, the password voice is pre-emphasized, framed, and windowed.
  • the pre-emphasis processing is in fact high-pass filtering, which filters out low-frequency data so that the high-frequency characteristics of the password voice stand out; specifically, the transfer function of the high-pass filter is H(Z) = 1 - αZ^(-1), where Z is the voice data and α is a constant coefficient, preferably 0.97.
  • since a sound signal is stationary only over short periods, a segment of sound is divided into N short-time signals (i.e., N frames), and, to avoid losing the continuity of the sound, adjacent frames share an overlap region, generally half the frame length; after framing, each frame signal is treated as a stationary signal.
  • however, owing to the Gibbs effect, the starting frame and ending frame of the password voice are discontinuous, and after framing the signal deviates further from the original voice; therefore, the password voice needs to be windowed.
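The three front-end steps above can be sketched in numpy. The frame length of 400 samples (25 ms at 16 kHz) and the Hamming window are illustrative choices; the patent specifies only α = 0.97 and half-frame overlap.

```python
import numpy as np

def preprocess(signal, frame_len=400, alpha=0.97):
    """Pre-emphasis, framing with 50% overlap, then Hamming windowing."""
    # pre-emphasis H(z) = 1 - alpha*z^-1: y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # framing: adjacent frames overlap by half a frame length
    hop = frame_len // 2
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    frames = np.stack([emphasized[i * hop: i * hop + frame_len]
                       for i in range(n_frames)])
    # windowing suppresses the frame-edge discontinuities (Gibbs effect)
    return frames * np.hamming(frame_len)

sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s, 440 Hz tone
frames = preprocess(sig)
```

One second of 16 kHz audio yields 79 overlapping 400-sample frames, each tapered toward zero at its edges by the window.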
  • a Fourier transform is applied to each windowed frame to obtain the corresponding spectrum, and the spectrum is input into the Mel filter bank to output the Mel spectrum;
  • cepstrum analysis is then performed on the Mel spectrum to obtain the Mel frequency cepstrum coefficients (MFCC), and the corresponding voiceprint feature vector is formed on the basis of the MFCC.
  • the cepstrum analysis consists, for example, of taking logarithms and performing an inverse transform; the inverse transform is generally implemented by the DCT (discrete cosine transform), and the 2nd to 13th coefficients after the DCT are taken as the MFCC coefficients.
  • the MFCC of a frame is the voiceprint feature of that frame of password voice, and the MFCC of all frames are assembled into a feature data matrix, which is the voiceprint feature vector of the password voice.
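A compact numpy sketch of the spectrum, Mel filter bank, log, and DCT pipeline described above, keeping the 2nd to 13th DCT coefficients. The filter count (26), FFT size (512), and 16 kHz sample rate are illustrative assumptions, and random frames stand in for real windowed speech.

```python
import numpy as np

def mel_filterbank(n_filters=26, n_fft=512, sr=16000):
    """Triangular filters spaced evenly on the Mel scale."""
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    inv_mel = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = inv_mel(np.linspace(mel(0), mel(sr / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope
    return fb

def mfcc(frames, n_fft=512):
    """Spectrum -> Mel spectrum -> log -> DCT; keep coefficients 2..13."""
    spectrum = np.abs(np.fft.rfft(frames, n_fft)) ** 2       # power spectrum
    log_mel = np.log(spectrum @ mel_filterbank(n_fft=n_fft).T + 1e-10)
    N = log_mel.shape[1]
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    dct_basis = np.cos(np.pi / N * (n + 0.5) * k)            # DCT-II basis
    cepstra = log_mel @ dct_basis.T
    return cepstra[:, 1:13]                                  # 2nd..13th coefficients

frames = np.random.default_rng(3).normal(size=(79, 400))     # stand-in frames
feature_matrix = mfcc(frames)       # one 12-dim MFCC row per frame
```

Stacking the per-frame MFCC rows gives the feature data matrix that the text calls the voiceprint feature vector of the speech.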
  • the voiceprint feature vector is input into the background channel model generated by the pre-training.
  • the background channel model is a Gaussian mixture model, and the background channel model is used to operate on the voiceprint feature vector to obtain a corresponding current voiceprint feature vector (i-vector).
  • the calculation process includes: Loglike = E(X) * D(X)^(-1) * X^T - 0.5 * D(X)^(-1) * (X.^2)^T, where:
  • Loglike is the log-likelihood matrix;
  • E(X) is the mean matrix trained by the universal background channel model;
  • D(X) is the covariance matrix;
  • X is the data matrix;
  • X.^2 denotes squaring each value of the matrix.
  • the first-order coefficients can be obtained by summing the columns of the probability matrix: Gamma_i = sum_j loglikes_ji, where Gamma_i is the i-th element of the first-order coefficient vector and loglikes_ji is the element in row j, column i of the probability matrix.
  • the second-order coefficients can be obtained by multiplying the transpose of the probability matrix by the data matrix:
  • X = Loglike^T * feats, where X is the second-order coefficient matrix, Loglike is the probability matrix, and feats is the feature data matrix.
  • after the first-order and second-order coefficients are obtained, the linear and quadratic terms are computed in parallel, and the current voiceprint feature vector is then obtained from them.
  • in step S3, the step of calculating the distance between the current voiceprint feature vector and the determined standard voiceprint feature vector using a predetermined distance calculation formula and authenticating the user according to the distance
  • includes calculating the cosine distance between the current voiceprint feature vector and the determined standard voiceprint feature vector: cos θ = (A · B) / (‖A‖ ‖B‖), where A is the standard voiceprint feature vector and B is the current voiceprint feature vector.
  • if the cosine distance is less than or equal to a preset distance threshold, the identity verification passes; if the cosine distance is greater than the preset distance threshold, the identity verification fails.
  • the background channel model is a Gaussian mixture model,
  • and training the background channel model includes:
  • dividing the voiceprint feature vectors corresponding to the voice data samples into a training set of a first proportion and a verification set of a second proportion, wherein the sum of the first proportion and the second proportion is less than or equal to 1;
  • training the Gaussian mixture model with the voiceprint feature vectors in the training set, and, after training is completed, verifying the accuracy of the trained Gaussian mixture model with the verification set;
  • if the accuracy is greater than a preset threshold, ending the model training and using the trained Gaussian mixture model as the background channel model to be applied; or, if the accuracy is less than or equal to the preset threshold, increasing the number of voice data samples and retraining on the enlarged sample set.
  • when the Gaussian mixture model is trained, the likelihood probability corresponding to the extracted D-dimensional voiceprint features can be expressed with K Gaussian components as: P(x) = sum_{k=1..K} w_k * p(x|k)
  • where P(x) is the probability that a voice data sample is generated by the Gaussian mixture model, w_k is the weight of each Gaussian model, p(x|k) is the probability that the sample is generated by the k-th Gaussian model, and K is the number of Gaussian models.
  • the parameters of the entire Gaussian mixture model can be expressed as {w_i, μ_i, Σ_i}, where w_i is the weight of the i-th Gaussian model, μ_i is the mean of the i-th Gaussian model, and Σ_i is the covariance of the i-th Gaussian model.
  • training the Gaussian mixture model can use the unsupervised EM algorithm. After training is completed, the weight vector, constant vector, N covariance matrices, and the matrix of the means multiplied by the covariances of the Gaussian mixture model are obtained, which constitute a trained Gaussian mixture model.
  • the present invention also provides a computer readable storage medium on which an identity verification system is stored, and the identity verification system implements the steps of the above-described identity verification method when executed by a processor.
  • the methods of the foregoing embodiments can be implemented by means of software plus a necessary general hardware platform, and of course also by hardware, but in many cases the former is the better
  • implementation. Based on such understanding, the part of the technical solution of the present invention that is essential, or that contributes to the prior art, may be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, magnetic disk,
  • or optical disc) and including a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the methods described in the various embodiments of the present invention.

Abstract

The present invention relates to a server, an identity verification method and system, and a computer-readable storage medium. The server includes a memory and a processor connected to the memory; the memory stores an identity verification system runnable on the processor which, when executed by the processor, implements the following steps: after an identity verification request is received, voice acquisition text is randomly sent to the client; the password voice broadcast by the user and sent by the client is received, and the password characters corresponding to the password voice are recognized; if the password characters are consistent with the standard password characters corresponding to the voice acquisition text, the current voiceprint feature vector of the password voice is constructed, the corresponding standard voiceprint feature vector is determined according to a predetermined mapping relationship, the distance between the current voiceprint feature vector and the determined standard voiceprint feature vector is calculated using a predetermined distance calculation formula, and the user is authenticated according to the distance. The present invention can improve the security of identity verification.

Description

Server, identity verification method and system, and computer-readable storage medium
Claim of priority
This application claims, under the Paris Convention, priority to the Chinese patent application filed on 13 March 2017 with application number CN201710147695.X and entitled "Method and system for identity verification based on voiceprint recognition", the entire content of which is incorporated into this application by reference.
This application claims, under the Paris Convention, priority to the Chinese patent application filed on 20 August 2017 with application number CN201710715433.9 and entitled "Server, identity verification method and computer-readable storage medium", the entire content of which is incorporated into this application by reference.
Technical field
The present invention relates to the field of communication technologies, and in particular to a server, an identity verification method and system, and a computer-readable storage medium.
Background
At present, the business scope of large financial companies covers insurance, banking, investment and other categories, and each category usually requires communication with customers in various ways (for example, by telephone or face to face). Before such communication takes place, verifying the customer's identity is an important part of ensuring business security.
To meet real-time business requirements, many financial companies verify customers' identities manually; however, because the customer base is large, relying on manual discrimination and analysis to verify customers' identities is neither accurate nor efficient. To solve this problem, other existing schemes use a voiceprint scheme for identity verification, but such a scheme cannot prevent criminals from passing voiceprint identity verification with fake recordings, and therefore carries a certain security risk.
Summary of the invention
The purpose of the present invention is to provide a server, an identity verification method and system, and a computer-readable storage medium, aiming to improve the security of identity verification.
To achieve the above purpose, the present invention provides a server comprising a memory and a processor connected to the memory, wherein the memory stores an identity verification system runnable on the processor, and the identity verification system, when executed by the processor, implements the following steps:
S1: after receiving an identity verification request carrying an identity identifier sent by a client, randomly sending voice acquisition text for the user to respond to the client;
S2: receiving the password voice broadcast by the user based on the voice acquisition text and sent by the client, performing character recognition on the password voice, and recognizing the password characters corresponding to the password voice;
S3: if the password characters are consistent with the standard password characters corresponding to the voice acquisition text, constructing the current voiceprint feature vector of the password voice, determining the standard voiceprint feature vector corresponding to the user's identity identifier according to a predetermined mapping relationship between identity identifiers and standard voiceprint feature vectors, calculating the distance between the current voiceprint feature vector and the determined standard voiceprint feature vector using a predetermined distance calculation formula, and authenticating the user according to the distance.
To achieve the above purpose, the present invention also provides a server comprising a memory and a processor connected to the memory, wherein the memory stores a voiceprint-recognition-based identity verification system runnable on the processor, and the voiceprint-recognition-based identity verification system, when executed by the processor, implements the following steps:
S101: after receiving the voice data of a user to be authenticated, acquiring the voiceprint features of the voice data, and constructing a corresponding voiceprint feature vector based on the voiceprint features;
S102: inputting the voiceprint feature vector into a background channel model generated by pre-training, to construct the current voiceprint discrimination vector corresponding to the voice data;
S103: calculating the spatial distance between the current voiceprint discrimination vector and the pre-stored standard voiceprint discrimination vector of the user, authenticating the user based on the distance, and generating a verification result.
To achieve the above purpose, the present invention also provides an identity verification method, comprising:
S1: after receiving an identity verification request carrying an identity identifier sent by a client, randomly sending voice acquisition text for the user to respond to the client;
S2: receiving the password voice broadcast by the user based on the voice acquisition text and sent by the client, performing character recognition on the password voice, and recognizing the password characters corresponding to the password voice;
S3: if the password characters are consistent with the standard password characters corresponding to the voice acquisition text, constructing the current voiceprint feature vector of the password voice, determining the standard voiceprint feature vector corresponding to the user's identity identifier according to a predetermined mapping relationship between identity identifiers and standard voiceprint feature vectors, calculating the distance between the current voiceprint feature vector and the determined standard voiceprint feature vector using a predetermined distance calculation formula, and authenticating the user according to the distance.
To achieve the above purpose, the present invention also provides an identity verification method, comprising:
S101: after receiving the voice data of a user to be authenticated, acquiring the voiceprint features of the voice data, and constructing a corresponding voiceprint feature vector based on the voiceprint features;
S102: inputting the voiceprint feature vector into a background channel model generated by pre-training, to construct the current voiceprint discrimination vector corresponding to the voice data;
S103: calculating the spatial distance between the current voiceprint discrimination vector and the pre-stored standard voiceprint discrimination vector of the user, authenticating the user based on the distance, and generating a verification result.
To achieve the above purpose, the present invention also provides an identity verification system, comprising:
a sending module, configured to, after receiving an identity verification request carrying an identity identifier sent by a client, randomly send voice acquisition text for the user to respond to the client;
a character recognition module, configured to receive the password voice broadcast by the user based on the voice acquisition text and sent by the client, perform character recognition on the password voice, and recognize the password characters corresponding to the password voice;
an identity verification module, configured to, if the password characters are consistent with the standard password characters corresponding to the voice acquisition text, construct the current voiceprint feature vector of the password voice, determine the standard voiceprint feature vector corresponding to the user's identity identifier according to a predetermined mapping relationship between identity identifiers and standard voiceprint feature vectors, calculate the distance between the current voiceprint feature vector and the determined standard voiceprint feature vector using a predetermined distance calculation formula, and authenticate the user according to the distance.
To achieve the above purpose, the present invention also provides a voiceprint-recognition-based identity verification system, comprising:
a construction module, configured to, after receiving the voice data of a user to be authenticated, acquire the voiceprint features of the voice data and construct a corresponding voiceprint feature vector based on the voiceprint features;
an input module, configured to input the voiceprint feature vector into a background channel model generated by pre-training, to construct the current voiceprint discrimination vector corresponding to the voice data;
an identity verification module, configured to calculate the spatial distance between the current voiceprint discrimination vector and the pre-stored standard voiceprint discrimination vector of the user, authenticate the user based on the distance, and generate a verification result.
The present invention also provides a computer-readable storage medium storing an identity verification system which, when executed by a processor, implements the steps of the above identity verification method.
The present invention also provides another computer-readable storage medium storing a voiceprint-recognition-based identity verification system which, when executed by a processor, implements the steps of the above identity verification method.
The beneficial effects of the present invention are as follows: if another person uses an existing or prepared fake recording for identity verification, then, because the voice acquisition text is sent randomly, the recognized password characters will not be consistent with the corresponding standard password characters, which prevents others from passing verification with existing or prepared fake recordings; if another person records the password in his or her own voice, the subsequent voiceprint feature verification will fail. This embodiment therefore effectively performs identity verification twice and has a double-verification effect, improving the security of identity verification while ensuring its accuracy and efficiency.
Brief description of the drawings
FIG. 1 is a schematic diagram of an optional application environment of various embodiments of the present invention;
FIG. 2 is a schematic structural diagram of an embodiment of the identity verification system of the present invention;
FIG. 3 is a schematic flowchart of an embodiment of the identity verification method of the present invention.
Detailed description of the embodiments
In order to make the purpose, technical solutions and advantages of the present invention clearer, the present invention is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention and are not intended to limit it. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present invention.
It should be noted that descriptions involving "first", "second", etc. in the present invention are for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of the technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments can be combined with each other, but only on the basis that they can be implemented by those of ordinary skill in the art; when a combination of technical solutions is contradictory or cannot be implemented, it should be considered that such a combination does not exist and is not within the scope of protection claimed by the present invention.
Referring to FIG. 1, which is a schematic diagram of the application environment of a preferred embodiment of the identity verification method of the present invention, the application environment includes a server 1 and a terminal device 2. The server 1 can exchange data with the terminal device 2 through suitable technologies such as a network or near-field communication.
The terminal device 2 is installed with a client for sending identity verification requests to the server 1. The terminal device 2 includes, but is not limited to, any electronic product that can interact with a user through a keyboard, mouse, remote control, touchpad, voice control device, or the like, for example, mobile devices such as personal computers, tablet computers, smart phones, personal digital assistants (PDA), game consoles, Internet Protocol Television (IPTV), smart wearable devices and navigation devices, or fixed terminals such as digital TVs, desktop computers, notebooks and servers.
The server 1 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. The server 1 may be a computer, a single network server, a server group composed of multiple network servers, or a cloud-computing-based cloud composed of a large number of hosts or network servers, where cloud computing is a type of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
In this embodiment, the server 1 may include, but is not limited to, a memory 11, a processor 12 and a network interface 13 that are communicatively connected to each other through a system bus, the memory 11 storing an identity verification system runnable on the processor 12. It should be noted that FIG. 1 only shows the server 1 with components 11-13, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
The memory 11 includes internal memory and at least one type of readable storage medium. The internal memory provides a cache for the operation of the server 1; the readable storage medium may be a non-volatile storage medium such as flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, or optical disc. In some embodiments, the readable storage medium may be an internal storage unit of the server 1, such as its hard disk; in other embodiments, the non-volatile storage medium may also be an external storage device of the server 1, such as a plug-in hard disk, smart media card (SMC), secure digital (SD) card or flash card equipped on the server 1. In this embodiment, the readable storage medium of the memory 11 is generally used to store the operating system and various application software installed on the server 1, such as the program code of the identity verification system in an embodiment of the present invention. In addition, the memory 11 can also be used to temporarily store various types of data that have been output or are to be output.
The processor 12 may, in some embodiments, be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 12 is generally used to control the overall operation of the server 1, such as performing control and processing related to data exchange or communication with the terminal device 2. In this embodiment, the processor 12 is used to run the program code or process the data stored in the memory 11, such as running the identity verification system.
The network interface 13 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the server 1 and other electronic devices. In this embodiment, the network interface 13 is mainly used to connect the server 1 with one or more terminal devices 2 and to establish data transmission channels and communication connections between the server 1 and the one or more terminal devices 2.
The identity verification system is stored in the memory 11 and includes at least one computer-readable instruction stored in the memory 11, which can be executed by the processor 12 to implement the methods of the embodiments of the present application; and the at least one computer-readable instruction can be divided into different logical modules according to the functions implemented by its parts. As shown in FIG. 2, the identity verification system is divided into a sending module 1, a character recognition module 2 and an identity verification module 3.
In one embodiment, the above identity verification system, when executed by the processor 12, implements the following steps:
Step S1: after receiving an identity verification request carrying an identity identifier sent by a client, randomly send voice acquisition text for the user to respond to the client.
The user operates on the client and sends an identity verification request carrying an identity identifier to the server; after receiving the request, the server randomly sends voice acquisition text for the user to respond to the client.
The identity identifier may be the user's ID card number or mobile phone number, etc. There are multiple kinds of voice acquisition text for the user to respond to, and the server randomly sends one of them to the client, in order to prevent others from using existing fake recordings for identity verification. The voice acquisition text may be the text corresponding to a random password to be recorded by voice, or the text of a question about the random password to be recorded by voice. For example, if the voice acquisition text is "Please record the string of digits ***", the user records the voice "Please record the string of digits ***" when responding; as another example, if the voice acquisition text is the question "Where is your birthplace?", the user records "My birthplace is ***" when responding.
Step S2: receive the password voice broadcast by the user based on the voice acquisition text and sent by the client, perform character recognition on the password voice, and recognize the password characters corresponding to the password voice.
In this embodiment, the user may record the password voice on the client as follows: according to the voice acquisition text, after the user presses a predetermined physical or virtual button, the sound recording unit is controlled to record; after the user releases the button, recording stops, and the recorded voice is sent to the server as the password voice.
When recording the password voice, interference from environmental noise and the recording device should be avoided as much as possible. The recording device should be kept at an appropriate distance from the user, and recording devices with large distortion should be avoided; mains power is preferably used, with a stable current; a sensor should be used when recording over the telephone.
After receiving the password voice, the server performs character recognition on it, that is, converts the password voice into individual characters; the password voice may be converted into characters directly, or it may first be denoised to further reduce interference. In order to extract the voiceprint features of the password voice, the recorded password voice is voice data of a preset data length, or voice data longer than the preset data length.
Step S3: if the password characters are consistent with the standard password characters corresponding to the voice acquisition text, construct the current voiceprint feature vector of the password voice, determine the standard voiceprint feature vector corresponding to the user's identity identifier according to the predetermined mapping relationship between identity identifiers and standard voiceprint feature vectors, calculate the distance between the current voiceprint feature vector and the determined standard voiceprint feature vector using a predetermined distance calculation formula, and authenticate the user according to the distance.
In this embodiment, there are multiple kinds of voice acquisition text and multiple kinds of standard password characters pre-stored on the server, in one-to-one correspondence. After the password characters corresponding to the password voice are recognized, the standard password characters corresponding to the sent voice acquisition text are obtained, and it is determined whether the recognized password characters are consistent with the corresponding standard password characters.
If the recognized password characters are consistent with the corresponding standard password characters, the current voiceprint feature vector of the password voice is further constructed. Voiceprint features include multiple types, such as wide-band voiceprints, narrow-band voiceprints and amplitude voiceprints; the voiceprint features of this embodiment are preferably the Mel Frequency Cepstrum Coefficients (MFCC) of the voice data. When the corresponding voiceprint feature vector is constructed, the voiceprint features of the password voice are assembled into a feature data matrix, which is the voiceprint feature vector of the password voice.
There are multiple kinds of distances between vectors, including cosine distance and Euclidean distance. Preferably, the distance between the current voiceprint feature vector and the determined standard voiceprint feature vector in this embodiment is the cosine distance, which uses the cosine of the angle between two vectors in a vector space as a measure of the difference between two individuals.
The standard voiceprint feature vector is a pre-stored voiceprint feature vector. Before the distance is calculated, the corresponding standard voiceprint feature vector is obtained according to the user identifier.
When the calculated distance is less than or equal to a preset distance threshold, verification passes; otherwise, verification fails.
Compared with the prior art, if another person uses an existing or prepared fake recording for identity verification, then, because the voice acquisition text is sent randomly, the recognized password characters will not be consistent with the corresponding standard password characters, which prevents others from passing verification with existing or prepared fake recordings; if another person records the password in his or her own voice, the subsequent voiceprint feature verification will fail. This embodiment therefore effectively performs identity verification twice and has a double-verification effect, improving the security of identity verification while ensuring its accuracy and efficiency.
In a preferred embodiment, to prevent the audio quality of the password voice from affecting the result of voiceprint feature verification, on the basis of the embodiment of FIG. 1 above, step S2 includes: receiving the password voice broadcast by the user and sent by the client, and analyzing whether the password voice is usable; if the password voice is not usable, prompting the client to re-record the password voice, or, if the password voice is usable, performing character recognition on the password voice.
Whether the password voice is usable is based on the following analysis: whether the duration of the user's speaking part is greater than a preset duration, whether the background noise volume of the password voice is less than a first preset volume, and/or whether the speaking volume is greater than a second preset volume. If all of the above analysis results are satisfied, the password voice is usable, and subsequent operations such as character recognition can be performed; otherwise, if the duration of the user's speaking part is less than the preset duration, or the background noise volume of the password voice is greater than or equal to the first preset volume, or the speaking volume is less than or equal to the second preset volume, the password voice is not usable, and the client is prompted to re-record the password voice.
In a preferred embodiment, when the identity verification system is executed by the processor, the following steps are also implemented: if the password characters are inconsistent with the standard password characters corresponding to the voice acquisition text, randomly sending voice acquisition text for the user to respond to the client again; accumulating the number of times voice acquisition text has been sent to the client, and terminating the response to the identity verification request if that number is greater than or equal to a preset number.
If the user records a wrong password voice, that is, the password characters are inconsistent with the standard password characters corresponding to the voice acquisition text, an opportunity can be provided to randomly send voice acquisition text to the client again. At the same time, to prevent excessive password verification from wasting computer resources, the number of password verifications can be limited to less than a preset number, that is, the accumulated number of times voice acquisition text is sent to the client is kept below the preset number, and the response to the identity verification request is terminated when that number is greater than or equal to the preset number.
In a preferred embodiment, on the basis of the above embodiments, the step of constructing the current voiceprint feature vector of the password voice in step S3 includes: processing the password voice with a preset filter to extract voiceprint features of a preset type, and constructing the voiceprint feature vector corresponding to the password voice based on the extracted preset-type voiceprint features; and inputting the constructed voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint feature vector.
The preset filter is preferably a Mel filter. First, the password voice is pre-emphasized, framed and windowed; in this embodiment, after the password voice of the user to be authenticated is received, it is processed as follows. Pre-emphasis is in fact high-pass filtering, which filters out low-frequency data so that the high-frequency characteristics of the password voice stand out; specifically, the transfer function of the high-pass filter is H(Z) = 1 - αZ^(-1), where Z is the voice data and α is a constant coefficient, preferably 0.97. Since a sound signal is stationary only over short periods, a segment of sound is divided into N short-time signals (i.e., N frames), and, to avoid losing the continuity of the sound, adjacent frames share an overlap region, generally half the frame length. After framing, each frame signal is treated as a stationary signal; however, owing to the Gibbs effect, the starting and ending frames of the password voice are discontinuous, and after framing the signal deviates further from the original voice, so the password voice needs to be windowed.
A Fourier transform is applied to each windowed frame to obtain the corresponding spectrum;
the spectrum is input into the Mel filter bank to output the Mel spectrum;
cepstrum analysis is performed on the Mel spectrum to obtain the Mel frequency cepstrum coefficients (MFCC), and the corresponding voiceprint feature vector is composed on the basis of the MFCC. Cepstrum analysis consists, for example, of taking logarithms and performing an inverse transform; the inverse transform is generally implemented by the DCT (discrete cosine transform), and the 2nd to 13th coefficients after the DCT are taken as the MFCC coefficients. The MFCC of a frame is the voiceprint feature of that frame of password voice, and the MFCC of all frames are assembled into a feature data matrix, which is the voiceprint feature vector of the password voice.
Then the voiceprint feature vector is input into the background channel model generated by pre-training. Preferably, the background channel model is a Gaussian mixture model; the background channel model is used to operate on the voiceprint feature vector to obtain the corresponding current voiceprint feature vector (i.e., i-vector).
Specifically, the calculation process includes:
1) Selecting Gaussian models: first, the parameters of the universal background channel model are used to calculate the log-likelihood of each frame of data under the different Gaussian models; by sorting each column of the log-likelihood matrix in parallel, the top N Gaussian models are selected, finally obtaining a matrix of values of each frame of data in the Gaussian mixture model:
Loglike = E(X) * D(X)^(-1) * X^T - 0.5 * D(X)^(-1) * (X.^2)^T
where Loglike is the log-likelihood matrix, E(X) is the mean matrix trained by the universal background channel model, D(X) is the covariance matrix, X is the data matrix, and X.^2 denotes squaring each value of the matrix.
2) Calculating posterior probabilities: compute X*X^T for each frame of data X to obtain a symmetric matrix, which can be simplified to a lower triangular matrix; its elements are arranged in order into one row, turning the N frames into a matrix of vectors whose dimension is the number of elements of the lower triangular matrix, and the vectors of all frames are combined into a new data matrix. At the same time, each covariance matrix used for probability calculation in the universal background model is likewise simplified to a lower triangular matrix, forming a matrix similar to the new data matrix. The log-likelihood of each frame of data under the selected Gaussian models is then computed from the mean matrix and covariance matrix of the universal background channel model, followed by Softmax regression and finally normalization, yielding the posterior probability distribution of each frame under the Gaussian mixture model; the per-frame probability distribution vectors are assembled into a probability matrix.
3) Extracting the current voiceprint feature vector: first the first-order and second-order coefficients are computed. The first-order coefficients can be obtained by summing the columns of the probability matrix:
Gamma_i = sum_j loglikes_ji
where Gamma_i is the i-th element of the first-order coefficient vector and loglikes_ji is the element in row j, column i of the probability matrix.
The second-order coefficients can be obtained by multiplying the transpose of the probability matrix by the data matrix:
X = Loglike^T * feats, where X is the second-order coefficient matrix, Loglike is the probability matrix, and feats is the feature data matrix.
After the first-order and second-order coefficients are obtained, the linear and quadratic terms are computed in parallel, and the current voiceprint feature vector is then obtained from them.
In a preferred embodiment, on the basis of the above embodiments, the step in step S3 of calculating the distance between the current voiceprint feature vector and the determined standard voiceprint feature vector using a predetermined distance calculation formula and authenticating the user according to the distance includes: calculating the cosine distance between the current voiceprint feature vector and the determined standard voiceprint feature vector:
cos θ = (A · B) / (‖A‖ ‖B‖)
where A is the standard voiceprint feature vector and B is the current voiceprint feature vector. If the cosine distance is less than or equal to a preset distance threshold, the identity verification passes; if the cosine distance is greater than the preset distance threshold, the identity verification fails.
The present invention also provides another server, similar in hardware architecture to the server of FIG. 1 above, including a memory and a processor connected to the memory, and connected to external terminal devices through a network interface. The difference is that the memory stores a voiceprint-recognition-based identity verification system runnable on the processor. The voiceprint-recognition-based identity verification system is stored in the memory and includes at least one computer-readable instruction stored in the memory, which can be executed by the processor to implement the methods of the embodiments of the present application; and the at least one computer-readable instruction can be divided into different logical modules according to the functions implemented by its parts: the voiceprint-recognition-based identity verification system can be divided into a construction module, an input module and an identity verification module.
When executed by the processor, the voiceprint-recognition-based identity verification system implements the following steps:
S101: after receiving the voice data of a user to be authenticated, acquire the voiceprint features of the voice data, and construct a corresponding voiceprint feature vector based on the voiceprint features.
In this embodiment, the voice data is collected by a voice collection device (for example, a microphone), which sends the collected voice data to the voiceprint-recognition-based identity verification system.
When collecting voice data, interference from environmental noise and the collection device should be avoided as much as possible. The collection device should be kept at an appropriate distance from the user, and devices with large distortion should be avoided; mains power is preferably used, with a stable current; a sensor should be used when recording over the telephone. Before the voiceprint features are extracted, the voice data may be denoised to further reduce interference. In order to extract the voiceprint features, the collected voice data is voice data of a preset data length, or voice data longer than the preset data length.
Voiceprint features include multiple types, such as wide-band voiceprints, narrow-band voiceprints and amplitude voiceprints; the voiceprint features of this embodiment are preferably the Mel Frequency Cepstrum Coefficients (MFCC) of the voice data. When the corresponding voiceprint feature vector is constructed, the voiceprint features of the voice data are assembled into a feature data matrix, which is the voiceprint feature vector of the voice data.
S102: input the voiceprint feature vector into the background channel model generated by pre-training, to construct the current voiceprint discrimination vector corresponding to the voice data.
The voiceprint feature vector is input into the background channel model generated by pre-training. Preferably, the background channel model is a Gaussian mixture model; the background channel model is used to operate on the voiceprint feature vector to obtain the corresponding current voiceprint discrimination vector (i.e., i-vector).
Specifically, the calculation process includes:
1) Selecting Gaussian models: first, the parameters of the universal background channel model are used to calculate the log-likelihood of each frame of data under the different Gaussian models; by sorting each column of the log-likelihood matrix in parallel, the top N Gaussian models are selected, finally obtaining a matrix of values of each frame of data in the Gaussian mixture model:
Loglike = E(X) * D(X)^(-1) * X^T - 0.5 * D(X)^(-1) * (X.^2)^T
where Loglike is the log-likelihood matrix, E(X) is the mean matrix trained by the universal background channel model, D(X) is the covariance matrix, X is the data matrix, and X.^2 denotes squaring each value of the matrix.
2) Calculating posterior probabilities: compute X*X^T for each frame of data X to obtain a symmetric matrix, which can be simplified to a lower triangular matrix; its elements are arranged in order into one row, turning the N frames into a matrix of vectors whose dimension is the number of elements of the lower triangular matrix, and the vectors of all frames are combined into a new data matrix. At the same time, each covariance matrix used for probability calculation in the universal background model is likewise simplified to a lower triangular matrix, forming a matrix similar to the new data matrix. The log-likelihood of each frame of data under the selected Gaussian models is then computed from the mean matrix and covariance matrix of the universal background channel model, followed by Softmax regression and finally normalization, yielding the posterior probability distribution of each frame under the Gaussian mixture model; the per-frame probability distribution vectors are assembled into a probability matrix.
3) Extracting the current voiceprint discrimination vector: first the first-order and second-order coefficients are computed. The first-order coefficients can be obtained by summing the columns of the probability matrix:
Gamma_i = sum_j loglikes_ji
where Gamma_i is the i-th element of the first-order coefficient vector and loglikes_ji is the element in row j, column i of the probability matrix.
The second-order coefficients can be obtained by multiplying the transpose of the probability matrix by the data matrix:
X = Loglike^T * feats, where X is the second-order coefficient matrix, Loglike is the probability matrix, and feats is the feature data matrix.
After the first-order and second-order coefficients are obtained, the linear and quadratic terms are computed in parallel, and the current voiceprint discrimination vector is then calculated from them.
Preferably, the background channel model is a Gaussian mixture model, and before step S101 the method includes:
acquiring a preset number of voice data samples, acquiring the voiceprint features corresponding to each voice data sample, and constructing the voiceprint feature vector corresponding to each voice data sample based on those features;
dividing the voiceprint feature vectors corresponding to the voice data samples into a training set of a first proportion and a verification set of a second proportion, the sum of the first proportion and the second proportion being less than or equal to 1;
training the Gaussian mixture model with the voiceprint feature vectors in the training set, and, after training is completed, verifying the accuracy of the trained Gaussian mixture model with the verification set;
if the accuracy is greater than a preset threshold, ending the model training and using the trained Gaussian mixture model as the background channel model of step S102; or, if the accuracy is less than or equal to the preset threshold, increasing the number of voice data samples and retraining on the enlarged sample set.
When the Gaussian mixture model is trained with the voiceprint feature vectors in the training set, the likelihood probability corresponding to the extracted D-dimensional voiceprint features can be expressed with K Gaussian components as:
P(x) = sum_{k=1..K} w_k * p(x|k)
where P(x) is the probability that a voice data sample is generated by the Gaussian mixture model, w_k is the weight of each Gaussian model, p(x|k) is the probability that the sample is generated by the k-th Gaussian model, and K is the number of Gaussian models.
The parameters of the entire Gaussian mixture model can be expressed as {w_i, μ_i, Σ_i}, where w_i is the weight of the i-th Gaussian model, μ_i is the mean of the i-th Gaussian model, and Σ_i is the covariance of the i-th Gaussian model. The Gaussian mixture model can be trained with the unsupervised EM algorithm. After training is completed, the weight vector, constant vector, N covariance matrices, and the matrix of the means multiplied by the covariances of the Gaussian mixture model are obtained, which constitute a trained Gaussian mixture model.
S103: calculate the spatial distance between the current voiceprint discrimination vector and the pre-stored standard voiceprint discrimination vector of the user, authenticate the user based on the distance, and generate a verification result.
There are multiple kinds of distances between vectors, including cosine distance and Euclidean distance. Preferably, the spatial distance of this embodiment is the cosine distance, which uses the cosine of the angle between two vectors in a vector space as a measure of the difference between two individuals.
The standard voiceprint discrimination vector is a voiceprint discrimination vector obtained and stored in advance; when stored, it carries the identification information of the corresponding user, so it can accurately represent that user's identity. Before the spatial distance is calculated, the stored voiceprint discrimination vector is obtained according to the identification information provided by the user.
When the calculated spatial distance is less than or equal to a preset distance threshold, verification passes; otherwise, verification fails.
Compared with the prior art, the background channel model generated by pre-training in this embodiment is obtained by mining and comparative training on a large amount of voice data. This model can accurately characterize the background voiceprint features present when the user speaks while preserving the user's own voiceprint features to the greatest extent, and it can remove these background features at recognition time so that the inherent features of the user's voice are extracted, which can greatly improve the accuracy of user identity verification and improve its efficiency. In addition, this embodiment makes full use of the vocal-tract-related voiceprint features of the human voice; such voiceprint features do not require the text to be constrained, and thus allow greater flexibility in the process of recognition and verification.
As shown in FIG. 3, FIG. 3 is a schematic flowchart of an embodiment of the identity verification method of the present invention. The identity verification method includes the following steps:
Step S1: after receiving an identity verification request carrying an identity identifier sent by a client, randomly send voice acquisition text for the user to respond to the client.
The user operates on the client and sends an identity verification request carrying an identity identifier to the server; after receiving the request, the server randomly sends voice acquisition text for the user to respond to the client.
The identity identifier may be the user's ID card number or mobile phone number, etc. There are multiple kinds of voice acquisition text for the user to respond to, and the server randomly sends one of them to the client, in order to prevent others from using existing fake recordings for identity verification. The voice acquisition text may be the text corresponding to a random password to be recorded by voice, or the text of a question about the random password to be recorded by voice. For example, if the voice acquisition text is "Please record the string of digits ***", the user records the voice "Please record the string of digits ***" when responding; as another example, if the voice acquisition text is the question "Where is your birthplace?", the user records "My birthplace is ***" when responding.
Step S2: receive the password voice broadcast by the user based on the voice acquisition text and sent by the client, perform character recognition on the password voice, and recognize the password characters corresponding to the password voice.
In this embodiment, the user may record the password voice on the client as follows: according to the voice acquisition text, after the user presses a predetermined physical or virtual button, the sound recording unit is controlled to record; after the user releases the button, recording stops, and the recorded voice is sent to the server as the password voice.
When recording the password voice, interference from environmental noise and the recording device should be avoided as much as possible. The recording device should be kept at an appropriate distance from the user, and recording devices with large distortion should be avoided; mains power is preferably used, with a stable current; a sensor should be used when recording over the telephone.
After receiving the password voice, the server performs character recognition on it, that is, converts the password voice into individual characters; the password voice may be converted into characters directly, or it may first be denoised to further reduce interference. In order to extract the voiceprint features of the password voice, the recorded password voice is voice data of a preset data length, or voice data longer than the preset data length.
Step S3: if the password characters are consistent with the standard password characters corresponding to the voice acquisition text, construct the current voiceprint feature vector of the password voice, determine the standard voiceprint feature vector corresponding to the user's identity identifier according to the predetermined mapping relationship between identity identifiers and standard voiceprint feature vectors, calculate the distance between the current voiceprint feature vector and the determined standard voiceprint feature vector using a predetermined distance calculation formula, and authenticate the user according to the distance.
In this embodiment, there are multiple kinds of voice acquisition text and multiple kinds of standard password characters pre-stored on the server, in one-to-one correspondence. After the password characters corresponding to the password voice are recognized, the standard password characters corresponding to the sent voice acquisition text are obtained, and it is determined whether the recognized password characters are consistent with the corresponding standard password characters.
If the recognized password characters are consistent with the corresponding standard password characters, the current voiceprint feature vector of the password voice is further constructed. Voiceprint features include multiple types, such as wide-band voiceprints, narrow-band voiceprints and amplitude voiceprints; the voiceprint features of this embodiment are preferably the Mel Frequency Cepstrum Coefficients (MFCC) of the voice data. When the corresponding voiceprint feature vector is constructed, the voiceprint features of the password voice are assembled into a feature data matrix, which is the voiceprint feature vector of the password voice.
There are multiple kinds of distances between vectors, including cosine distance and Euclidean distance. Preferably, the distance between the current voiceprint feature vector and the determined standard voiceprint feature vector in this embodiment is the cosine distance, which uses the cosine of the angle between two vectors in a vector space as a measure of the difference between two individuals.
The standard voiceprint feature vector is a pre-stored voiceprint feature vector. Before the distance is calculated, the corresponding standard voiceprint feature vector is obtained according to the user identifier.
When the calculated distance is less than or equal to a preset distance threshold, verification passes; otherwise, verification fails.
In a preferred embodiment, to prevent the audio quality of the password voice from affecting the result of voiceprint feature verification, on the basis of the embodiment of FIG. 3 above, step S2 includes: receiving the password voice broadcast by the user and sent by the client, and analyzing whether the password voice is usable; if the password voice is not usable, prompting the client to re-record the password voice, or, if the password voice is usable, performing character recognition on the password voice.
Whether the password voice is usable is based on the following analysis: whether the duration of the user's speaking part is greater than a preset duration, whether the background noise volume of the password voice is less than a first preset volume, and/or whether the speaking volume is greater than a second preset volume. If all of the above analysis results are satisfied, the password voice is usable, and subsequent operations such as character recognition can be performed; otherwise, if the duration of the user's speaking part is less than the preset duration, or the background noise volume of the password voice is greater than or equal to the first preset volume, or the speaking volume is less than or equal to the second preset volume, the password voice is not usable, and the client is prompted to re-record the password voice.
In a preferred embodiment, on the basis of the embodiment of FIG. 3 above, the identity verification method further includes the following steps: if the password characters are inconsistent with the standard password characters corresponding to the voice acquisition text, randomly sending voice acquisition text for the user to respond to the client again; accumulating the number of times voice acquisition text has been sent to the client, and terminating the response to the identity verification request if that number is greater than or equal to a preset number.
If the user records a wrong password voice, that is, the password characters are inconsistent with the standard password characters corresponding to the voice acquisition text, an opportunity can be provided to randomly send voice acquisition text to the client again. At the same time, to prevent excessive password verification from wasting computer resources, the number of password verifications can be limited to less than a preset number, that is, the accumulated number of times voice acquisition text is sent to the client is kept below the preset number, and the response to the identity verification request is terminated when that number is greater than or equal to the preset number.
In a preferred embodiment, on the basis of the above embodiments, the step of constructing the current voiceprint feature vector of the password voice in step S3 includes: processing the password voice with a preset filter to extract voiceprint features of a preset type, and constructing the voiceprint feature vector corresponding to the password voice based on the extracted preset-type voiceprint features; and inputting the constructed voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint feature vector.
The preset filter is preferably a Mel filter. First, the password voice is pre-emphasized, framed and windowed; in this embodiment, after the password voice of the user to be authenticated is received, it is processed as follows. Pre-emphasis is in fact high-pass filtering, which filters out low-frequency data so that the high-frequency characteristics of the password voice stand out; specifically, the transfer function of the high-pass filter is H(Z) = 1 - αZ^(-1), where Z is the voice data and α is a constant coefficient, preferably 0.97. Since a sound signal is stationary only over short periods, a segment of sound is divided into N short-time signals (i.e., N frames), and, to avoid losing the continuity of the sound, adjacent frames share an overlap region, generally half the frame length. After framing, each frame signal is treated as a stationary signal; however, owing to the Gibbs effect, the starting and ending frames of the password voice are discontinuous, and after framing the signal deviates further from the original voice, so the password voice needs to be windowed.
A Fourier transform is applied to each windowed frame to obtain the corresponding spectrum;
the spectrum is input into the Mel filter bank to output the Mel spectrum;
cepstrum analysis is performed on the Mel spectrum to obtain the Mel frequency cepstrum coefficients (MFCC), and the corresponding voiceprint feature vector is composed on the basis of the MFCC. Cepstrum analysis consists, for example, of taking logarithms and performing an inverse transform; the inverse transform is generally implemented by the DCT (discrete cosine transform), and the 2nd to 13th coefficients after the DCT are taken as the MFCC coefficients. The MFCC of a frame is the voiceprint feature of that frame of password voice, and the MFCC of all frames are assembled into a feature data matrix, which is the voiceprint feature vector of the password voice.
然后,将声纹特征向量输入预先训练生成的背景信道模型,优选地,该背景信道模型为高斯混合模型,利用该背景信道模型来计算声纹特征向量,得出对应的当前声纹特征向量(即i-vector)。
具体地,该计算过程包括:
1)、选择高斯模型:首先,利用通用背景信道模型中的参数来计算每帧数据在不同高斯模型的似然对数值,通过对似然对数值矩阵每列并行排序,选取前N个高斯模型,最终获得一每帧数据在混合高斯模型中数值的矩阵:
Loglike=E(X)*D(X)-1*XT-0.5*D(X)-1*(X.2)T
其中,Loglike为似然对数值矩阵,E(X)为通用背景信道模型训练出来的均值矩阵,D(X)为协方差矩阵,X为数据矩阵,X.2为矩阵每个值取平方。
2)、计算后验概率:将每帧数据X进行X*XT计算,得到一个对称矩阵,可简化为下三角矩阵,并将元素按顺序排列为1行,变成一个N帧乘以该下三角矩阵个数纬度的一个向量进行计算,将所有帧的该向量组合成新的数据矩阵,同时将通用背景模型中计算概率的协方差矩阵,每个矩阵也简化为下三角矩阵,变成与新数据矩阵类似的矩阵,在通过通用背景信道模型中的均值矩阵和协方差矩阵算出每帧数据的在该选择的高斯模型下的似然对数值,然后进行Softmax回归,最后进行归一化操作,得到每帧在混合高斯模型后验概率分布,将每帧的概率分布向量组成概率矩阵。
3) Extracting the current voiceprint feature vector: first, the first-order and second-order statistics are computed. The first-order statistics are obtained by summing the columns of the probability matrix:
Gamma_i = Σ_j loglike_ji
where Gamma_i is the i-th element of the first-order statistic vector, and loglike_ji is the i-th element in row j of the probability matrix.
The second-order statistics are obtained by multiplying the transpose of the probability matrix by the data matrix:
X = Loglike^T · feats, where X is the second-order statistic matrix, Loglike is the probability matrix, and feats is the feature data matrix.
Once the first- and second-order statistics have been computed, the first-order and second-order terms are computed in parallel, and the current voiceprint feature vector is then obtained from these terms.
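The Gaussian selection, posterior computation, and statistics described above can be sketched as follows for diagonal-covariance components. This sketch stops at the sufficient statistics; producing the final i-vector additionally requires a trained total-variability matrix, which the description does not detail and which is omitted here.

```python
import numpy as np

def sufficient_statistics(feats, means, covars, weights):
    """Per-frame log-likelihoods under each diagonal Gaussian, softmax to
    posteriors, then the 'first-order' statistics (column sums of the
    posterior matrix) and 'second-order' statistics (posterior^T times the
    feature matrix), following the naming used in the description."""
    # log N(x; mu_k, diag(sigma_k^2)) for every frame (T) and component (K)
    diff = feats[:, None, :] - means[None, :, :]          # shape (T, K, D)
    loglike = -0.5 * (np.sum(diff ** 2 / covars[None], axis=2)
                      + np.sum(np.log(2 * np.pi * covars), axis=1))
    loglike += np.log(weights)                            # add mixture weights
    # numerically stable softmax per frame -> posterior matrix (T, K)
    loglike -= loglike.max(axis=1, keepdims=True)
    post = np.exp(loglike)
    post /= post.sum(axis=1, keepdims=True)
    gamma = post.sum(axis=0)                              # 'first-order' stats, (K,)
    second = post.T @ feats                               # 'second-order' stats, (K, D)
    return gamma, second
```

In standard i-vector terminology these are the zeroth- and first-order Baum-Welch statistics; the description's "first-order/second-order" labels are kept here to match the text.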
In a preferred embodiment, building on the above embodiments, the step in step S3 of computing the distance between the current voiceprint feature vector and the determined standard voiceprint feature vector using a predetermined distance formula and verifying the user's identity from that distance comprises: computing the cosine distance between the current voiceprint feature vector and the determined standard voiceprint feature vector:
cos(x, y) = (x · y) / (||x|| · ||y||)
where x is the standard voiceprint feature vector and y is the current voiceprint feature vector. If the cosine distance is less than or equal to the preset distance threshold, identity verification passes; if the cosine distance is greater than the preset distance threshold, identity verification fails.
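The cosine-distance check can be sketched as follows. Note one assumption: the formula above computes a cosine similarity, so this sketch takes distance as 1 minus the similarity so that a small distance means similar voiceprints; the threshold value is illustrative, as the specification only requires some preset threshold.

```python
import numpy as np

def verify_identity(current_ivec, standard_ivec, threshold=0.4):
    """Pass identity verification iff the cosine distance between the current
    and the stored standard voiceprint vectors is within the threshold."""
    cos_sim = np.dot(current_ivec, standard_ivec) / (
        np.linalg.norm(current_ivec) * np.linalg.norm(standard_ivec))
    distance = 1.0 - cos_sim           # small distance -> similar voiceprints
    return distance <= threshold
```

The standard vector would be the one looked up from the mapping between identity identifiers and standard voiceprint feature vectors.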
In a preferred embodiment, building on the above embodiments, the background channel model is a Gaussian mixture model, and training the background channel model comprises:
acquiring a preset number of speech data samples, obtaining the voiceprint features corresponding to each speech data sample, and constructing each sample's voiceprint feature vector from those features;
dividing the voiceprint feature vectors of the samples into a training set of a first proportion and a validation set of a second proportion, the sum of the first and second proportions being less than or equal to 1;
training the Gaussian mixture model on the voiceprint feature vectors in the training set and, after training completes, verifying the accuracy of the trained Gaussian mixture model on the validation set;
if the accuracy exceeds a preset threshold, ending model training and taking the trained Gaussian mixture model as the background channel model to be applied, or, if the accuracy is less than or equal to the preset threshold, increasing the number of speech data samples and retraining on the enlarged sample set.
When the Gaussian mixture model is trained on the voiceprint feature vectors in the training set, the likelihood of an extracted D-dimensional voiceprint feature can be expressed with K Gaussian components as:
P(x) = Σ_{k=1}^{K} w_k · p(x|k)
where P(x) is the probability of the speech data sample being generated by the Gaussian mixture model, w_k is the weight of the k-th Gaussian component, p(x|k) is the probability of the sample being generated by the k-th Gaussian component, and K is the number of Gaussian components.
The parameters of the whole Gaussian mixture model can be expressed as {w_i, μ_i, Σ_i}, where w_i is the weight, μ_i the mean, and Σ_i the covariance of the i-th Gaussian component. The Gaussian mixture model can be trained with the unsupervised EM algorithm. After training completes, the model's weight vector, constant vector, N covariance matrices, the matrix of means multiplied by covariances, and so on are obtained; together these constitute a trained Gaussian mixture model.
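The training loop above can be sketched with scikit-learn's EM-based `GaussianMixture`. One caveat: the specification does not define how the validation "accuracy" of an unsupervised mixture model is measured, so as an assumed stand-in a validation vector counts as correct here when its log-likelihood under the model exceeds a floor derived from the training data; the split ratios, component count, and thresholds are likewise illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_background_model(samples, train_ratio=0.7, val_ratio=0.3,
                           n_components=8, accuracy_threshold=0.8):
    """Split samples, fit a GMM with EM, validate, and either accept the model
    or signal that more speech data samples are needed."""
    n_train = int(len(samples) * train_ratio)
    n_val = int(len(samples) * val_ratio)          # ratios sum to at most 1
    train, val = samples[:n_train], samples[n_train:n_train + n_val]
    gmm = GaussianMixture(n_components=n_components, covariance_type='diag',
                          random_state=0).fit(train)   # unsupervised EM training
    # assumed accuracy rule: floor = 5th percentile of training log-likelihoods
    floor = np.percentile(gmm.score_samples(train), 5)
    accuracy = np.mean(gmm.score_samples(val) > floor)
    if accuracy > accuracy_threshold:
        return gmm        # accepted as the background channel model
    return None           # caller should enlarge the sample set and retrain
```

A `None` return corresponds to the branch where the sample count is increased and training repeated.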
The present invention also provides a computer-readable storage medium on which an identity verification system is stored; when the identity verification system is executed by a processor, it implements the steps of the identity verification method described above.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, and of course also by hardware, though in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing over the prior art, can be embodied as a software product stored on a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and comprising instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and do not thereby limit its patent scope; any equivalent structural or process transformation made using the contents of the specification and drawings of the present invention, applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present invention.

Claims (20)

  1. A server, characterized in that the server comprises a memory and a processor connected to the memory, the memory storing an identity verification system operable on the processor, and the identity verification system, when executed by the processor, implementing the following steps:
    S1: after receiving an identity verification request carrying an identity identifier from a client, randomly sending the client a speech-acquisition text for the user to respond to;
    S2: receiving the password speech spoken by the user and sent by the client based on the speech-acquisition text, and performing character recognition on the password speech to recognize the password characters corresponding to the password speech;
    S3: if the password characters match the standard password characters corresponding to the speech-acquisition text, constructing the current voiceprint feature vector of the password speech, determining the standard voiceprint feature vector corresponding to the user's identity identifier according to a predetermined mapping between identity identifiers and standard voiceprint feature vectors, computing the distance between the current voiceprint feature vector and the determined standard voiceprint feature vector using a predetermined distance formula, and verifying the user's identity from that distance.
  2. The server according to claim 1, characterized in that step S2 comprises:
    receiving the password speech spoken by the user and sent by the client, and analyzing whether the password speech is usable; if the password speech is not usable, prompting the client to re-record the password speech, or, if the password speech is usable, performing character recognition on the password speech.
  3. The server according to claim 1 or 2, characterized in that the identity verification system, when executed by the processor, further implements the following steps:
    if the password characters do not match the standard password characters corresponding to the speech-acquisition text, randomly sending the client another speech-acquisition text for the user to respond to;
    counting the cumulative number of speech-acquisition texts sent to the client, and terminating the response to the identity verification request if the count reaches or exceeds a preset limit.
  4. The server according to claim 1 or 2, characterized in that the step of constructing the current voiceprint feature vector of the password speech comprises:
    processing the password speech with a preset filter to extract voiceprint features of a preset type, and constructing the voiceprint feature vector corresponding to the password speech from the extracted features;
    feeding the constructed voiceprint feature vector into a pre-trained background channel model to produce the current voiceprint feature vector;
    and the step of computing the distance between the current voiceprint feature vector and the determined standard voiceprint feature vector using a predetermined distance formula and verifying the user's identity from that distance comprises:
    computing the cosine distance between the current voiceprint feature vector and the determined standard voiceprint feature vector:
    cos(x, y) = (x · y) / (||x|| · ||y||), where x is the standard voiceprint feature vector and y is the current voiceprint feature vector;
    if the cosine distance is less than or equal to a preset distance threshold, identity verification passes;
    if the cosine distance is greater than the preset distance threshold, identity verification fails.
  5. A server, characterized in that the server comprises a memory and a processor connected to the memory, the memory storing a voiceprint-recognition-based identity verification system operable on the processor, and the voiceprint-recognition-based identity verification system, when executed by the processor, implementing the following steps:
    S101: after receiving the speech data of a user undergoing identity verification, obtaining the voiceprint features of the speech data and constructing the corresponding voiceprint feature vector from those features;
    S102: feeding the voiceprint feature vector into a pre-trained background channel model to produce the current voiceprint discrimination vector corresponding to the speech data;
    S103: computing the spatial distance between the current voiceprint discrimination vector and the user's pre-stored standard voiceprint discrimination vector, verifying the user's identity from that distance, and generating the verification result.
  6. The server according to claim 5, characterized in that step S101 comprises:
    S1011: pre-emphasizing, framing, and windowing the speech data;
    S1012: applying a Fourier transform to each windowed frame to obtain the corresponding spectrum;
    S1013: passing the spectrum through a Mel filter bank to output the Mel spectrum;
    S1014: performing cepstral analysis on the Mel spectrum to obtain the Mel-frequency cepstral coefficients (MFCCs), and assembling the corresponding voiceprint feature vector from the MFCCs.
  7. The server according to claim 5, characterized in that step S103 comprises:
    S1031: computing the cosine distance between the current voiceprint discrimination vector and the user's pre-stored standard voiceprint discrimination vector:
    cos(x, y) = (x · y) / (||x|| · ||y||), where x is the standard voiceprint discrimination vector and y is the current voiceprint discrimination vector;
    S1032: if the cosine distance is less than or equal to a preset distance threshold, generating verification-passed information;
    S1033: if the cosine distance is greater than the preset distance threshold, generating verification-failed information.
  8. An identity verification method, characterized in that the identity verification method comprises:
    S1: after receiving an identity verification request carrying an identity identifier from a client, randomly sending the client a speech-acquisition text for the user to respond to;
    S2: receiving the password speech spoken by the user and sent by the client based on the speech-acquisition text, and performing character recognition on the password speech to recognize the password characters corresponding to the password speech;
    S3: if the password characters match the standard password characters corresponding to the speech-acquisition text, constructing the current voiceprint feature vector of the password speech, determining the standard voiceprint feature vector corresponding to the user's identity identifier according to a predetermined mapping between identity identifiers and standard voiceprint feature vectors, computing the distance between the current voiceprint feature vector and the determined standard voiceprint feature vector using a predetermined distance formula, and verifying the user's identity from that distance.
  9. The identity verification method according to claim 8, characterized in that step S2 comprises:
    receiving the password speech spoken by the user and sent by the client, and analyzing whether the password speech is usable; if the password speech is not usable, prompting the client to re-record the password speech, or, if the password speech is usable, performing character recognition on the password speech.
  10. The identity verification method according to claim 8 or 9, characterized in that, after step S2, the method further comprises:
    if the password characters do not match the standard password characters corresponding to the speech-acquisition text, randomly sending the client another speech-acquisition text for the user to respond to;
    counting the cumulative number of speech-acquisition texts sent to the client, and terminating the response to the identity verification request if the count reaches or exceeds a preset limit.
  11. The identity verification method according to claim 8 or 9, characterized in that the step of constructing the current voiceprint feature vector of the password speech comprises:
    processing the password speech with a preset filter to extract voiceprint features of a preset type, and constructing the voiceprint feature vector corresponding to the password speech from the extracted features;
    feeding the constructed voiceprint feature vector into a pre-trained background channel model to produce the current voiceprint feature vector;
    and the step of computing the distance between the current voiceprint feature vector and the determined standard voiceprint feature vector using a predetermined distance formula and verifying the user's identity from that distance comprises:
    computing the cosine distance between the current voiceprint feature vector and the determined standard voiceprint feature vector:
    cos(x, y) = (x · y) / (||x|| · ||y||), where x is the standard voiceprint feature vector and y is the current voiceprint feature vector;
    if the cosine distance is less than or equal to a preset distance threshold, identity verification passes;
    if the cosine distance is greater than the preset distance threshold, identity verification fails.
  12. The identity verification method according to claim 11, characterized in that the background channel model is a Gaussian mixture model, and training the background channel model comprises:
    acquiring a preset number of speech data samples, obtaining the voiceprint features corresponding to each speech data sample, and constructing each sample's voiceprint feature vector from those features;
    dividing the voiceprint feature vectors of the samples into a training set of a first proportion and a validation set of a second proportion, the sum of the first and second proportions being less than or equal to 1;
    training the Gaussian mixture model on the voiceprint feature vectors in the training set and, after training completes, verifying the accuracy of the trained Gaussian mixture model on the validation set;
    if the accuracy exceeds a preset threshold, ending model training and taking the trained Gaussian mixture model as the background channel model, or, if the accuracy is less than or equal to the preset threshold, increasing the number of speech data samples and retraining on the enlarged sample set.
  13. An identity verification method, characterized in that the identity verification method comprises:
    S101: after receiving the speech data of a user undergoing identity verification, obtaining the voiceprint features of the speech data and constructing the corresponding voiceprint feature vector from those features;
    S102: feeding the voiceprint feature vector into a pre-trained background channel model to produce the current voiceprint discrimination vector corresponding to the speech data;
    S103: computing the spatial distance between the current voiceprint discrimination vector and the user's pre-stored standard voiceprint discrimination vector, verifying the user's identity from that distance, and generating the verification result.
  14. The identity verification method according to claim 13, characterized in that step S101 comprises:
    S1011: pre-emphasizing, framing, and windowing the speech data;
    S1012: applying a Fourier transform to each windowed frame to obtain the corresponding spectrum;
    S1013: passing the spectrum through a Mel filter bank to output the Mel spectrum;
    S1014: performing cepstral analysis on the Mel spectrum to obtain the Mel-frequency cepstral coefficients (MFCCs), and assembling the corresponding voiceprint feature vector from the MFCCs.
  15. The identity verification method according to claim 13, characterized in that step S103 comprises:
    S1031: computing the cosine distance between the current voiceprint discrimination vector and the user's pre-stored standard voiceprint discrimination vector:
    cos(x, y) = (x · y) / (||x|| · ||y||), where x is the standard voiceprint discrimination vector and y is the current voiceprint discrimination vector;
    S1032: if the cosine distance is less than or equal to a preset distance threshold, generating verification-passed information;
    S1033: if the cosine distance is greater than the preset distance threshold, generating verification-failed information.
  16. The identity verification method according to any one of claims 13 to 15, characterized in that the background channel model is a Gaussian mixture model, and before step S101 the method comprises:
    acquiring a preset number of speech data samples, obtaining the voiceprint features corresponding to each speech data sample, and constructing each sample's voiceprint feature vector from those features;
    dividing the voiceprint feature vectors of the samples into a training set of a first proportion and a validation set of a second proportion, the sum of the first and second proportions being less than or equal to 1;
    training the Gaussian mixture model on the voiceprint feature vectors in the training set and, after training completes, verifying the accuracy of the trained Gaussian mixture model on the validation set;
    if the accuracy exceeds a preset threshold, ending model training and taking the trained Gaussian mixture model as the background channel model of step S102, or, if the accuracy is less than or equal to the preset threshold, increasing the number of speech data samples and retraining on the enlarged sample set.
  17. An identity verification system, characterized in that the identity verification system comprises:
    a sending module configured to, after receiving an identity verification request carrying an identity identifier from a client, randomly send the client a speech-acquisition text for the user to respond to;
    a character recognition module configured to receive the password speech spoken by the user and sent by the client based on the speech-acquisition text, and to perform character recognition on the password speech to recognize the password characters corresponding to the password speech;
    an identity verification module configured to, if the password characters match the standard password characters corresponding to the speech-acquisition text, construct the current voiceprint feature vector of the password speech, determine the standard voiceprint feature vector corresponding to the user's identity identifier according to a predetermined mapping between identity identifiers and standard voiceprint feature vectors, compute the distance between the current voiceprint feature vector and the determined standard voiceprint feature vector using a predetermined distance formula, and verify the user's identity from that distance.
  18. A voiceprint-recognition-based identity verification system, characterized in that the voiceprint-recognition-based identity verification system comprises:
    a construction module configured to, after receiving the speech data of a user undergoing identity verification, obtain the voiceprint features of the speech data and construct the corresponding voiceprint feature vector from those features;
    an input module configured to feed the voiceprint feature vector into a pre-trained background channel model to produce the current voiceprint discrimination vector corresponding to the speech data;
    an identity verification module configured to compute the spatial distance between the current voiceprint discrimination vector and the user's pre-stored standard voiceprint discrimination vector, verify the user's identity from that distance, and generate the verification result.
  19. A computer-readable storage medium, characterized in that an identity verification system is stored on the computer-readable storage medium, and the identity verification system, when executed by a processor, implements the steps of the identity verification method according to any one of claims 8 to 12.
  20. A computer-readable storage medium, characterized in that a voiceprint-recognition-based identity verification system is stored on the computer-readable storage medium, and the voiceprint-recognition-based identity verification system, when executed by a processor, implements the steps of the identity verification method according to any one of claims 13 to 16.
PCT/CN2017/105031 2017-03-13 2017-09-30 Server, identity verification method and system, and computer-readable storage medium WO2018166187A1 (zh)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201710147695.X 2017-03-13
CN201710147695.XA CN107068154A (zh) 2017-03-13 2017-03-13 Identity verification method and system based on voiceprint recognition
CN201710715433.9 2017-08-20
CN201710715433.9A CN107517207A (zh) 2017-03-13 2017-08-20 Server, identity verification method, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2018166187A1 true WO2018166187A1 (zh) 2018-09-20

Family

ID=59622093

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/CN2017/091361 WO2018166112A1 (zh) 2017-03-13 2017-06-30 Identity verification method based on voiceprint recognition, electronic device, and storage medium
PCT/CN2017/105031 WO2018166187A1 (zh) 2017-03-13 2017-09-30 Server, identity verification method and system, and computer-readable storage medium

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/091361 WO2018166112A1 (zh) 2017-03-13 2017-06-30 Identity verification method based on voiceprint recognition, electronic device, and storage medium

Country Status (3)

Country Link
CN (2) CN107068154A (zh)
TW (1) TWI641965B (zh)
WO (2) WO2018166112A1 (zh)



Also Published As

Publication number Publication date
TWI641965B (zh) 2018-11-21
WO2018166112A1 (zh) 2018-09-20
TW201833810A (zh) 2018-09-16
CN107517207A (zh) 2017-12-26
CN107068154A (zh) 2017-08-18


Legal Events

121: Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 17900712; Country of ref document: EP; Kind code of ref document: A1)
NENP: Non-entry into the national phase (Ref country code: DE)
32PN: Ep: public notification in the ep bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 09/12/2019))
122: Ep: pct application non-entry in european phase (Ref document number: 17900712; Country of ref document: EP; Kind code of ref document: A1)