WO2019218512A1 - Server, voiceprint verification method, and storage medium - Google Patents


Info

Publication number
WO2019218512A1
Authority
WO
WIPO (PCT)
Prior art keywords
voiceprint
vector
current
verification
graphic code
Application number
PCT/CN2018/102049
Other languages
French (fr)
Chinese (zh)
Inventor
程序
彭俊清
王健宗
肖京
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/08: Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/0807: Network architectures or network communication protocols for network security for authentication of entities using tickets, e.g. Kerberos
    • H04L63/083: Network architectures or network communication protocols for network security for authentication of entities using passwords
    • H04L63/0838: Network architectures or network communication protocols for network security for authentication of entities using passwords using one-time-passwords
    • H04L63/0861: Network architectures or network communication protocols for network security for authentication of entities using biometrical features, e.g. fingerprint, retina-scan
    • H04L63/0876: Network architectures or network communication protocols for network security for authentication of entities based on the identity of the terminal or configuration, e.g. MAC address, hardware or software configuration or device fingerprint
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification
    • G10L17/22: Interactive procedures; Man-machine interfaces
    • G10L17/24: Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase

Definitions

  • The present application relates to the field of communications technologies, and in particular to a server, a voiceprint verification method, and a storage medium.
  • Verifying user identity with voiceprint verification technology has become an important means of verification for major customer service companies (e.g., banks, insurance companies, and game companies).
  • The traditional business solution for user authentication with voiceprint verification technology is to develop a separate client program against the interface of the voiceprint verification server, use that client program to collect and pre-process the user's voice, and then transmit the pre-processed voiceprint data to the voiceprint verification server, which performs authentication verification and processing on the transmitted voiceprint data.
  • The purpose of the present application is to provide a server, a voiceprint verification method, and a storage medium that improve the flexibility of voiceprint verification and prevent sound hijacking.
  • The present application provides a server including a memory and a processor coupled to the memory; the memory stores a processing system that can run on the processor, and when the processing system is executed by the processor, the following steps are implemented:
  • a generating step: after receiving an identity verification request that is sent by the client computer and carries a user identity identifier, generating graphic code parameters for the graphic code corresponding to the user identity identifier, and sending the graphic code parameters to the client computer so that the client computer generates and displays the corresponding graphic code, where the graphic code parameters include a random key and a voiceprint data collection link address;
  • an analyzing step: after the handheld terminal parses the graphic code to obtain the random key and the voiceprint data collection link address, receiving a voiceprint verification request that the handheld terminal sends through the voiceprint data collection link address and that carries the random key, and analyzing whether the random key in the graphic code parameters sent to the client computer is consistent with the random key received from the handheld terminal;
  • The present application further provides a voiceprint verification method, which includes:
  • after receiving an identity verification request that is sent by the client computer and carries a user identity identifier, the server generates graphic code parameters for the graphic code corresponding to the user identity identifier and sends them to the client computer so that the client computer generates and displays the corresponding graphic code, where the graphic code parameters include a random key and a voiceprint data collection link address;
  • after the handheld terminal parses the graphic code to obtain the random key and the voiceprint data collection link address, the server receives a voiceprint verification request that the handheld terminal sends through the voiceprint data collection link address and that carries the random key, and analyzes whether the random key in the graphic code parameters sent to the client computer is consistent with the random key received from the handheld terminal;
  • the server establishes a voice data collection channel with the handheld terminal and, through this channel, acquires the user's current voiceprint verification voice data collected by the handheld terminal;
  • The present application also provides a computer-readable storage medium storing a processing system which, when executed by a processor, implements the steps of the voiceprint verification method described above.
  • The beneficial effects of the present application are as follows: no purpose-built client program is needed to collect the user's voice data; voiceprint verification with a handheld terminal is highly flexible and not easily interfered with; and the server is first bound to the client computer through the user identity identifier, after which the random key binds the client computer, the server, and the handheld terminal together, which prevents sound hijacking and improves the authenticity and security of voiceprint verification.
  • FIG. 1 is a schematic diagram of an optional application environment for the embodiments of the present application.
  • FIG. 2 is a schematic flowchart of an embodiment of the voiceprint verification method of the present application.
  • FIG. 1 shows the application environment of a preferred embodiment of the voiceprint verification method of the present application.
  • The application environment includes a server 1, a client computer 2, and a handheld terminal 3.
  • The server 1 can exchange data with the client computer 2 and the handheld terminal 3 through a suitable technology such as a network or near field communication.
  • The client computer 2 includes, but is not limited to, any electronic product that can interact with a user through a keyboard, a mouse, a remote controller, a touch panel, or a voice control device, for example mobile devices such as personal computers, tablets, smartphones, personal digital assistants (PDAs), game consoles, Internet Protocol Television (IPTV), smart wearable devices, and navigation devices, or fixed terminals such as digital TVs, desktop computers, notebooks, and servers.
  • The handheld terminal 3 can be a tablet computer, a smartphone, or the like.
  • The server 1 is a device that can automatically perform numerical calculation and/or information processing according to preset or pre-stored instructions.
  • The server 1 may be a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a kind of distributed computing: a super virtual computer consisting of a group of loosely coupled computers.
  • The server 1 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13 communicatively connected to each other through a system bus, and the memory 11 stores a processing system that can run on the processor 12. Note that FIG. 1 shows only the server 1 with components 11 to 13; not all of the illustrated components are required, and more or fewer components may be implemented instead.
  • The memory 11 includes internal memory and at least one type of readable storage medium.
  • The internal memory provides a cache for the operation of the server 1;
  • the readable storage medium may be, for example, a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), or a non-volatile storage medium such as a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, or an optical disk.
  • In some embodiments, the readable storage medium may be an internal storage unit of the server 1, such as a hard disk of the server 1; in other embodiments, the non-volatile storage medium may also be an external storage device of the server 1, such as a plug-in hard disk, a smart memory card (SMC), a Secure Digital (SD) card, or a flash card provided on the server 1.
  • The readable storage medium of the memory 11 is generally used to store the operating system and the various application software installed on the server 1, such as the program code of the processing system in an embodiment of the present application. The memory 11 can also be used to temporarily store various types of data that have been output or are to be output.
  • In some embodiments, the processor 12 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip.
  • The processor 12 is typically used to control the overall operation of the server 1, such as performing control and processing related to data interaction or communication with the client computer 2 and the handheld terminal 3.
  • The processor 12 is used to run program code stored in the memory 11 or to process data, for example to run the processing system.
  • The network interface 13 may include a wireless or wired network interface and is typically used to establish a communication connection between the server 1 and other electronic devices.
  • The network interface 13 is mainly used to connect the server 1 to the client computer 2 and the handheld terminal 3 and to establish data transmission channels and communication connections between the server 1 and the client computer 2 and handheld terminal 3.
  • The processing system is stored in the memory 11 and includes at least one computer-readable instruction stored in the memory 11; the at least one computer-readable instruction can be executed by the processor 12 to implement the methods of the embodiments of the present application.
  • Depending on the functions implemented by its parts, the at least one computer-readable instruction can be divided into different logic modules.
  • A generating step: after receiving an identity verification request that is sent by the client computer and carries a user identity identifier, generating graphic code parameters for the graphic code corresponding to the user identity identifier, and sending the graphic code parameters to the client computer so that the client computer generates and displays the corresponding graphic code;
  • The user identity identifier uniquely identifies the user; for example, it is an identity card number.
  • The graphic code is preferably a two-dimensional code, but is not limited to this; it may also be, for example, a barcode.
  • The graphic code parameters are used to generate the corresponding graphic code: two-dimensional code parameters generate a two-dimensional code, and barcode parameters generate a barcode.
  • The graphic code parameters include the random key and the voiceprint data collection link address, and may further include the valid time of the graphic code, detailed information about the graphic code, the scene value ID of the graphic code, and so on. The random key may be a random numeric string, a random character string, or the like.
  • The client computer sends an identity verification request carrying the user identity identifier to the server. After receiving the request, the server generates the random key corresponding to the user identity identifier, the server's voiceprint data collection link address, the valid time of the graphic code, detailed information about the graphic code, the scene value ID of the graphic code, and so on, and sends these graphic code parameters to the client computer. After receiving the graphic code parameters, the client computer generates the corresponding graphic code and displays it for the handheld terminal to scan.
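The generating step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the endpoint URL `COLLECTION_LINK`, the function `generate_graphic_code_params`, the in-memory `PENDING` store, the 5-minute validity window, and the use of `secrets.token_hex` for the random key are all hypothetical choices.

```python
import secrets
from datetime import datetime, timedelta
from typing import Dict, Optional

# Hypothetical collection endpoint: the source does not give a concrete URL.
COLLECTION_LINK = "https://voiceprint.example.com/collect"

# Hypothetical server-side record of issued keys: random_key -> user identity identifier.
PENDING: Dict[str, str] = {}


def generate_graphic_code_params(user_id: str, valid_minutes: int = 5,
                                 now: Optional[datetime] = None) -> dict:
    """Build graphic code parameters for one identity verification request."""
    now = now or datetime.now()
    random_key = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    PENDING[random_key] = user_id       # bind the key to the requesting user
    return {
        "random_key": random_key,
        "link": COLLECTION_LINK,                          # voiceprint data collection link address
        "valid_from": now,                                # start of the graphic code's valid time
        "valid_until": now + timedelta(minutes=valid_minutes),
        "scene_id": secrets.randbelow(10**8),             # optional scene value ID
    }
```

A client computer could then encode something like `params["link"] + "?key=" + params["random_key"]` into the two-dimensional code with any QR library. Keeping the key-to-user mapping on the server means the identity card number itself never has to appear in the graphic code.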
  • An analyzing step: after the handheld terminal parses the graphic code to obtain the random key and the voiceprint data collection link address, receiving the voiceprint verification request that the handheld terminal sends through the voiceprint data collection link address and that carries the random key, and analyzing whether the random key in the graphic code parameters sent to the client computer is consistent with the random key received from the handheld terminal;
  • After scanning the graphic code, the handheld terminal parses it with its own graphic code parsing module to obtain the random key, the server's voiceprint data collection link address, the valid time of the graphic code, detailed information about the graphic code, the scene value ID of the graphic code, and so on; the handheld terminal then sends a voiceprint verification request carrying the random key to the server through the voiceprint data collection link address.
  • After receiving the voiceprint verification request, the server analyzes whether the random key in the graphic code parameters sent to the client computer is consistent with the random key received from the handheld terminal, in order to prevent other handheld terminals from stealing the current random key.
  • In one embodiment, after receiving the voiceprint verification request that the handheld terminal sends through the voiceprint data collection link address and that carries the random key, the server first analyzes whether the number of times the random key has been received is greater than a preset number. If it is greater than the preset number (for example, if the preset number is 1 and the key has been received more than once), the server refuses to respond to the voiceprint verification request and may send information about the handheld terminal to the server for later use as a reference in judging whether the voiceprint verification is fraudulent. If it is less than or equal to the preset number, the server performs the operation of analyzing whether the random key in the graphic code parameters sent to the client computer is consistent with the random key received from the handheld terminal.
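The receive-count check followed by the key comparison might look like the sketch below. `RandomKeyGate` and its `fraud_log` are hypothetical names, the count is kept per received key in memory, and `hmac.compare_digest` is a constant-time comparison chosen here so response timing does not leak key prefixes; none of these details come from the source. The preset number defaults to 1 to match the example above.

```python
import hmac
from collections import Counter
from typing import List


class RandomKeyGate:
    """Refuse over-used random keys, then check the key against the issued one."""

    def __init__(self, max_receipts: int = 1):
        self.max_receipts = max_receipts      # the "preset number" of receipts
        self.receipt_counts: Counter = Counter()
        self.fraud_log: List[str] = []        # terminal info kept as a fraud reference

    def check(self, issued_key: str, received_key: str, terminal_info: str) -> bool:
        self.receipt_counts[received_key] += 1
        if self.receipt_counts[received_key] > self.max_receipts:
            # Key presented too many times: refuse and record the terminal.
            self.fraud_log.append(terminal_info)
            return False
        # Consistency check between the issued key and the received key.
        return hmac.compare_digest(issued_key.encode(), received_key.encode())
```

With the default of one receipt, a second terminal replaying a captured key is refused even though the key itself matches.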
  • In another embodiment, after receiving the voiceprint verification request that the handheld terminal sends through the voiceprint data collection link address and that carries the random key, the server first analyzes whether the time at which the random key is received falls within the valid time range of the graphic code. For example, if the valid time of the graphic code is 2018.03.01 to 2018.03.10 and the server receives the random key from the handheld terminal on 2018.03.08, the receipt time is within the valid time range of the graphic code.
  • If the receipt time is outside the valid time range, the server refuses to respond to the voiceprint verification request and may send information about the handheld terminal to the server for later use as a reference in judging whether the voiceprint verification is fraudulent. If the receipt time is within the valid time range, the server performs the operation of analyzing whether the random key in the graphic code parameters sent to the client computer is consistent with the random key received from the handheld terminal.
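A minimal sketch of the valid-time check, using the dates from the example above. The helper names `parse_dotted_date` and `admit_by_valid_time` are hypothetical, and treating both ends of the window as inclusive is an assumption the source does not state.

```python
from datetime import date


def parse_dotted_date(text: str) -> date:
    """Parse the 'YYYY.MM.DD' style used in the example, e.g. '2018.03.08'."""
    year, month, day = (int(part) for part in text.split("."))
    return date(year, month, day)


def admit_by_valid_time(received_on: date, valid_from: date, valid_until: date) -> str:
    """Return the server's next action for a voiceprint verification request."""
    # Bounds are treated as inclusive (an assumption).
    if valid_from <= received_on <= valid_until:
        return "compare_keys"  # proceed to the random key consistency check
    return "refuse"            # outside the graphic code's valid time
```

For the example window, a key received on 2018.03.08 yields `"compare_keys"`, while one received on 2018.03.11 would be refused.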
  • The handheld terminal collects the user's current voiceprint verification voice data through a voice collection device such as a microphone. During collection, environmental noise and interference from the handheld terminal should be avoided as far as possible.
  • The handheld terminal should be kept at an appropriate distance from the user, and handheld terminals with large distortion should be avoided where possible.
  • The power supply should preferably be mains power with a stable current, and a sensor should be used during recording.
  • Before framing and sampling, the current voiceprint verification voice data can be denoised to further reduce interference.
  • The collected voiceprint verification voice data has a preset data length or is longer than the preset data length.
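The length requirement and the framing that precedes feature extraction could be sketched as below. The 16 kHz sample rate, 25 ms frames with a 10 ms hop, and the 1-second minimum standing in for the "preset data length" are common speech-processing defaults assumed for illustration; the source does not specify them, and `frame_signal` is a hypothetical name.

```python
from typing import List


def frame_signal(samples: List[float], sample_rate: int = 16000,
                 frame_ms: int = 25, hop_ms: int = 10,
                 min_seconds: float = 1.0) -> List[List[float]]:
    """Check the preset minimum length, then split audio into overlapping frames."""
    min_len = int(min_seconds * sample_rate)
    if len(samples) < min_len:
        # Shorter than the preset data length: reject the recording.
        raise ValueError("voice data shorter than the preset data length")
    frame_len = sample_rate * frame_ms // 1000  # 400 samples at 16 kHz
    hop = sample_rate * hop_ms // 1000          # 160 samples at 16 kHz
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]
```

One second of 16 kHz audio yields 98 overlapping frames of 400 samples each under these settings.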
  • The step of constructing the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data includes: processing the current voiceprint verification voice data to extract voiceprint features of a preset type, and constructing the corresponding voiceprint feature vector from those features; and inputting the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data.
  • Voiceprint features come in several types, such as wide-band voiceprint, narrow-band voiceprint, and amplitude voiceprint.
  • The preset type of voiceprint feature is preferably the Mel-frequency cepstral coefficients (MFCC) of the current voiceprint verification voice data, and the preset filter is a Mel filter.
  • The voiceprint features of the current voiceprint verification voice data are assembled into a feature data matrix, which is the corresponding voiceprint feature vector.
  • Specifically, the current voiceprint verification voice data is pre-emphasized and windowed, a Fourier transform is performed on each window to obtain the corresponding spectrum, and the spectrum is input into the Mel filter to output a Mel spectrum;
  • cepstral analysis is then performed on the Mel spectrum to obtain the Mel-frequency cepstral coefficients (MFCC), and the corresponding voiceprint feature vector is formed from the MFCC.
  • Pre-emphasis is in effect a high-pass filtering process: it attenuates low-frequency content so that the high-frequency characteristics of the current voiceprint verification voice data stand out more.
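The pre-emphasis, windowing, and Fourier transform steps can be illustrated in plain Python. This sketch assumes a pre-emphasis coefficient of 0.97 and a Hamming window, both conventional choices not given in the source, and uses a naive DFT for clarity rather than speed; the function names are hypothetical.

```python
import cmath
import math
from typing import List


def pre_emphasis(x: List[float], alpha: float = 0.97) -> List[float]:
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1]."""
    if not x:
        return []
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def hamming(frame: List[float]) -> List[float]:
    """Taper a frame (length >= 2) with a Hamming window to reduce spectral leakage."""
    N = len(frame)
    return [s * (0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)))
            for n, s in enumerate(frame)]


def power_spectrum(frame: List[float], n_fft: int = 0) -> List[float]:
    """Naive DFT power spectrum |X[k]|^2 / n_fft for k = 0 .. n_fft // 2."""
    n_fft = n_fft or len(frame)
    padded = frame + [0.0] * (n_fft - len(frame))  # zero-pad up to n_fft
    spectrum = []
    for k in range(n_fft // 2 + 1):
        acc = sum(padded[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                  for n in range(n_fft))
        spectrum.append(abs(acc) ** 2 / n_fft)
    return spectrum
```

In practice each frame would go through `power_spectrum(hamming(frame), 512)` after the whole signal has been pre-emphasized; a real system would use an FFT library instead of the O(N²) loop.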
  • The cepstral analysis of the Mel spectrum consists, for example, of taking the logarithm and then applying an inverse transform.
  • The inverse transform is generally implemented with the discrete cosine transform (DCT).
  • The 2nd to 13th coefficients after the DCT are taken as the Mel-frequency cepstral coefficients.
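The Mel filtering, logarithm, and DCT steps, keeping the 2nd to 13th coefficients as described, might be sketched as follows. The Mel-scale formula $m = 2595 \log_{10}(1 + f/700)$ is the widely used one; the triangular filter shape, the 26-filter and 512-point FFT configuration in the usage example, and the function names are assumptions for illustration.

```python
import math
from typing import List


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: int) -> List[List[float]]:
    """Triangular filters equally spaced on the Mel scale from 0 Hz to Nyquist."""
    n_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2)
    mels = [low + i * (high - low) / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * n_bins
        for k in range(left, center):   # rising edge of the triangle
            row[k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def mfcc_from_power(power: List[float], bank: List[List[float]],
                    n_keep: int = 12) -> List[float]:
    """Mel energies -> log -> DCT-II; keep coefficients 2..13 (indices 1..12)."""
    energies = [max(sum(w * p for w, p in zip(row, power)), 1e-10) for row in bank]
    log_e = [math.log(e) for e in energies]  # the 1e-10 floor avoids log(0)
    M = len(log_e)
    cepstrum = [sum(log_e[m] * math.cos(math.pi * n * (m + 0.5) / M) for m in range(M))
                for n in range(M)]
    return cepstrum[1:1 + n_keep]
```

For a 16 kHz signal, `bank = mel_filterbank(26, 512, 16000)` gives 26 filters over 257 frequency bins; applying `mfcc_from_power` to every frame's power spectrum and stacking the results row by row gives the per-utterance feature data matrix.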
  • These MFCC are the voiceprint features of that frame of speech data.
  • The MFCC of all frames are assembled into a feature data matrix, which is the voiceprint feature vector of the speech sample data.
  • Forming the voiceprint feature vector from the MFCC of the speech data can improve the accuracy of identity verification, because the Mel-spaced frequency bands approximate the human auditory system more closely than the linearly spaced frequency bands used in the ordinary cepstrum.
  • The voiceprint feature vector is input into the pre-trained background channel model to construct the current voiceprint discrimination vector; for example, the pre-trained background channel model is used to calculate the feature matrix corresponding to the current voiceprint verification voice data, from which the current voiceprint discrimination vector is determined.
  • The background channel model is a set of Gaussian mixture models, and its training process includes the following steps: 1. Obtain a preset number of voice data samples, each of which corresponds to a standard voiceprint discrimination vector; 2. Process each voice data sample to extract its preset type of voiceprint features, and construct the voiceprint feature vector of each voice data sample from those features; 3. Train the Gaussian mixture model with the voiceprint feature vectors and, after training, verify the accuracy of the trained Gaussian mixture model with a verification set;
  • 4. If the accuracy is greater than a preset threshold (for example, 98.5%), training ends and the trained Gaussian mixture model is used as the background channel model; if the accuracy is less than or equal to the preset threshold, increase the number of voice data samples and retrain until the accuracy of the Gaussian mixture model exceeds the preset threshold.
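The accept-or-retrain loop in the training procedure can be expressed as a small driver. The callables `extract`, `fit`, `evaluate`, and `fetch_more` are hypothetical placeholders for feature extraction, Gaussian mixture model fitting, accuracy measurement on the verification set, and acquiring more voice data samples; the 70/30 train/verification split and the round limit are assumptions, since the source gives neither.

```python
from typing import Any, Callable, List, Tuple


def train_background_model(
    samples: List[Any],
    extract: Callable[[Any], Any],                # hypothetical: sample -> voiceprint feature vector
    fit: Callable[[List[Any]], Any],              # hypothetical: trains a GMM on training vectors
    evaluate: Callable[[Any, List[Any]], float],  # hypothetical: accuracy on the verification set
    fetch_more: Callable[[], List[Any]],          # hypothetical: returns additional samples
    threshold: float = 0.985,                     # the 98.5% example threshold
    train_fraction: float = 0.7,                  # assumed split ratio
    max_rounds: int = 10,
) -> Tuple[Any, float, int]:
    """Retrain with more data until verification accuracy exceeds the threshold."""
    for round_no in range(1, max_rounds + 1):
        vectors = [extract(s) for s in samples]
        cut = round(len(vectors) * train_fraction)
        model = fit(vectors[:cut])                 # training set
        accuracy = evaluate(model, vectors[cut:])  # verification set
        if accuracy > threshold:                   # strictly greater, as in the text
            return model, accuracy, round_no
        samples = samples + fetch_more()           # add voice data samples and retrain
    raise RuntimeError("accuracy never exceeded the preset threshold")
```

Passing the training and evaluation routines in as functions keeps the stopping rule separate from the choice of GMM library.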
  • The background channel model pre-trained in this embodiment is obtained by mining and comparing a large amount of voice data.
  • This model can accurately characterize the background voiceprint characteristics present when the user speaks while retaining the user's own voiceprint features as fully as possible. These background characteristics can be removed during identification so that the intrinsic characteristics of the user's voice are extracted, which greatly improves the accuracy and efficiency of user identity verification.
  • Calculating the distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector and generating the identity verification result based on the calculated distance includes:
  • calculating the cosine distance between the two vectors, $d(\tilde{x}, \tilde{y}) = 1 - \frac{\tilde{x} \cdot \tilde{y}}{\lVert \tilde{x} \rVert \, \lVert \tilde{y} \rVert}$, where $\tilde{x}$ is the standard voiceprint discrimination vector and $\tilde{y}$ is the current voiceprint discrimination vector; if the cosine distance is less than or equal to a preset distance threshold, generating information that verification has passed; if the cosine distance is greater than the preset distance threshold, generating information that verification has failed.
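A sketch of the distance check, taking cosine distance as one minus cosine similarity so that smaller values mean more similar voiceprints, consistent with the pass condition above. The function names are hypothetical and the threshold of 0.3 is a placeholder; the source leaves the preset distance threshold unspecified.

```python
import math
from typing import Sequence


def cosine_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """1 - cos(angle between x and y); 0 for vectors pointing the same way."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("a zero vector has no direction")
    return 1.0 - dot / (norm_x * norm_y)


def verification_result(current: Sequence[float], standard: Sequence[float],
                        threshold: float = 0.3) -> str:
    """Pass when the distance is at most the (placeholder) preset distance threshold."""
    return "pass" if cosine_distance(current, standard) <= threshold else "fail"
```

Because cosine distance ignores vector length, a discrimination vector that is simply scaled (for example by recording volume) yields the same decision.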
  • When the user's standard voiceprint discrimination vector is stored, it may carry the user identity identifier.
  • During identity verification, the corresponding standard voiceprint discrimination vector is obtained by matching the identification information of the current voiceprint discrimination vector, and the cosine distance between the current voiceprint discrimination vector and the matched standard voiceprint discrimination vector is calculated.
  • Using the cosine distance to verify the identity of the target user improves the accuracy of authentication.
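Storing standard vectors keyed by the user identity identifier and matching them at verification time could look like this. `VoiceprintRegistry` and its placeholder threshold are hypothetical, the distance is again assumed to be one minus cosine similarity, and treating a zero vector as infinitely distant is a choice made here for safety.

```python
import math
from typing import Dict, List, Sequence


class VoiceprintRegistry:
    """Hypothetical store mapping user identity identifiers to standard vectors."""

    def __init__(self, threshold: float = 0.3):
        self.threshold = threshold                   # placeholder distance threshold
        self._standard: Dict[str, List[float]] = {}

    def enroll(self, user_id: str, vector: Sequence[float]) -> None:
        # Store the standard voiceprint discrimination vector with the user ID.
        self._standard[user_id] = list(vector)

    def verify(self, user_id: str, current: Sequence[float]) -> dict:
        standard = self._standard.get(user_id)
        if standard is None:
            return {"user_id": user_id, "passed": False, "reason": "not enrolled"}
        dot = sum(a * b for a, b in zip(current, standard))
        norms = (math.sqrt(sum(a * a for a in current))
                 * math.sqrt(sum(b * b for b in standard)))
        # Zero vectors are treated as infinitely distant, so they never pass.
        distance = 1.0 - dot / norms if norms else float("inf")
        return {"user_id": user_id, "passed": distance <= self.threshold,
                "distance": distance}
```

The returned dictionary is one possible shape for the identity verification result sent back to the client computer.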
  • The present application uses an architecture composed of a client computer, a server, and a handheld terminal for voiceprint verification. The client computer sends a request carrying the user identity identifier to the server, and the server generates graphic code parameters corresponding to the user identity identifier.
  • The graphic code parameters are sent to the client computer, which displays the corresponding graphic code; the user scans the graphic code with a handheld terminal, which then sends the random key to the server through the link address for verification, and once verification passes, a channel is established with the server.
  • The user's voice data collected by the handheld terminal is obtained through this channel, and voiceprint verification is performed.
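The three-party flow can be modeled as a small state machine on the server. This is an illustration rather than the patented system: `VoiceprintServer`, the `State` names, and the example URL are assumptions. A key may advance its session only once, so a second scan of the same code is refused, and voice upload is accepted only after the key has been verified.

```python
import secrets
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Dict, List


class State(Enum):
    CODE_ISSUED = auto()     # graphic code parameters sent to the client computer
    KEY_VERIFIED = auto()    # handheld terminal presented the matching random key
    VOICE_RECEIVED = auto()  # voice data arrived over the collection channel


@dataclass
class Session:
    user_id: str
    state: State = State.CODE_ISSUED
    voice: List[float] = field(default_factory=list)


class VoiceprintServer:
    def __init__(self, link: str = "https://voiceprint.example.com/collect"):
        self.link = link                        # hypothetical collection link address
        self.sessions: Dict[str, Session] = {}  # random key -> session

    def request_code(self, user_id: str) -> dict:
        """Client computer -> server: returns graphic code parameters."""
        key = secrets.token_urlsafe(16)
        self.sessions[key] = Session(user_id)
        return {"random_key": key, "link": self.link}

    def scan_and_submit(self, random_key: str) -> bool:
        """Handheld terminal -> server via the link: verify the random key once."""
        session = self.sessions.get(random_key)
        if session is None or session.state is not State.CODE_ISSUED:
            return False  # unknown key or key already used
        session.state = State.KEY_VERIFIED  # collection channel may now be opened
        return True

    def upload_voice(self, random_key: str, samples: List[float]) -> str:
        """Accept voice data only on a verified channel; return the bound user ID."""
        session = self.sessions.get(random_key)
        if session is None or session.state is not State.KEY_VERIFIED:
            raise PermissionError("no voice data collection channel for this key")
        session.voice = list(samples)
        session.state = State.VOICE_RECEIVED
        return session.user_id
```

Because the session is looked up by the random key, the voice data uploaded by the handheld terminal is tied back to the user identity identifier that the client computer originally submitted.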
  • The present application does not require a purpose-built client program to collect the user's voice data, and voiceprint verification with a handheld terminal is highly flexible and not easily interfered with. The user identity identifier binds the server to the client computer, and the random key then binds the client computer, the server, and the handheld terminal together, which prevents sound hijacking and improves the authenticity and security of voiceprint verification.
  • FIG. 2 is a schematic flowchart of a method for voiceprint verification according to an embodiment of the present invention.
  • The voiceprint verification method includes the following steps:
  • Step S1: after receiving an identity verification request that is sent by the client computer and carries the user identity identifier, the server generates graphic code parameters for the graphic code corresponding to the user identity identifier and sends them to the client computer, which generates and displays the corresponding graphic code; the graphic code parameters include a random key and a voiceprint data collection link address;
  • Step S2: after the handheld terminal parses the graphic code to obtain the random key and the voiceprint data collection link address, the server receives the voiceprint verification request that the handheld terminal sends through the voiceprint data collection link address and that carries the random key, and analyzes whether the random key in the graphic code parameters sent to the client computer is consistent with the random key received from the handheld terminal;
  • the handheld terminal After scanning the graphic code, the handheld terminal parses the graphic code by using its own function module for analyzing the graphic code, and obtains the corresponding random key, the voiceprint data collection link address of the server, and the effective time and graphic code of the graphic code. The detailed information, the scene value ID of the graphic code, and the like, the handheld terminal sends a voiceprint verification request carrying the random key to the server through the voiceprint data collection link address.
  • the server After receiving the voiceprint verification request, the server analyzes whether the random key in the graphic code parameter sent to the client computer is consistent with the random key received from the handheld terminal, in order to prevent other handheld terminals from stealing the current random key.
  • the server receives the voiceprint verification request that is sent by the handheld terminal through the voiceprint data collection link address and carries the random key, and first analyzes the received Whether the number of times the random key is greater than a preset number; if the number of times the random key is received is greater than a preset number, for example, greater than one time, the server refuses to respond to the voiceprint verification request, and may information about the handheld terminal Sending to the server for the server to use as a reference for whether the voiceprint verification is fraudulent. If the preset number of times is less than or equal to the preset number of times, for example, the random key in the graphic code parameter sent to the client computer is analyzed. Whether the operation is consistent with the random key received from the
  • the server receives the portable terminal to transmit by using the voiceprint data collection link address.
  • the voiceprint verification request with the random key first analyzes whether the time when the random key is received is within the valid time range of the graphic code, for example, the effective time of the graphic code is 2018.03.01-2018.03.10, and the server receives the handheld The time of the terminal's random key is 2018.03.08, which is within the valid time range of the graphic code.
  • If the key was received within the validity period, the server then checks whether the number of times the random key has been received exceeds the preset number. If it does, the server refuses to respond to the voiceprint verification request and may send the related information of the handheld terminal to the server, for later use as a reference for whether the voiceprint verification was fraudulent. If the number is less than or equal to the preset number, the server finally checks whether the random key in the graphic code parameters sent to the client computer is consistent with the random key received from the handheld terminal.
  • Step S3: if so, the server establishes a voice data collection channel with the handheld terminal and, over that channel, acquires the user's current voiceprint verification voice data collected by the handheld terminal;
  • The handheld terminal collects the user's current voiceprint verification voice data with a voice collection device such as a microphone. During collection, environmental noise and interference from the handheld terminal itself should be avoided as far as possible.
  • The handheld terminal should be kept at an appropriate distance from the user, and a handheld terminal with large distortion should be avoided where possible.
  • The power supply is preferably mains power with a stable current, and a sensor should be used when recording.
  • The current voiceprint verification voice data may be denoised before framing and sampling to further reduce interference.
  • The collected voiceprint verification voice data has a preset data length or is longer than the preset data length.
  • Step S4: construct a current voiceprint discrimination vector corresponding to the current voiceprint verification voice data; determine the standard voiceprint discrimination vector corresponding to the user identity identifier according to a predetermined mapping between user identity identifiers and standard voiceprint discrimination vectors; calculate the distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector; generate an identity verification result based on the calculated distance; and send the identity verification result to the client computer.
  • The step of constructing the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data includes: processing the current voiceprint verification voice data to extract a preset type of voiceprint feature, and constructing a corresponding voiceprint feature vector based on that feature; and inputting the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data.
  • Voiceprint features come in many types, such as wideband voiceprints, narrowband voiceprints, and amplitude voiceprints.
  • The preset type of voiceprint feature is preferably the Mel-frequency cepstral coefficients (MFCC) of the current voiceprint verification voice data, and the preset filter is a Mel filter.
  • The voiceprint features of the current voiceprint verification voice data are assembled into a feature data matrix, which is the corresponding voiceprint feature vector.
  • Specifically, the current voiceprint verification voice data is pre-emphasized and windowed, a Fourier transform is applied to each window to obtain the corresponding spectrum, and the spectrum is passed through a Mel filter to output a Mel spectrum;
  • cepstral analysis is then performed on the Mel spectrum to obtain the Mel-frequency cepstral coefficients (MFCC), and the corresponding voiceprint feature vector is formed from the MFCC.
  • Pre-emphasis is in effect a high-pass filter: it attenuates the low-frequency components so that the high-frequency characteristics of the current voiceprint verification voice data are more prominent.
  • The cepstral analysis of the Mel spectrum consists, for example, of taking the logarithm and then applying an inverse transform.
  • The inverse transform is generally implemented as a discrete cosine transform (DCT).
  • The 2nd to 13th coefficients after the DCT are taken as the MFCC.
  • These MFCC are the voiceprint feature of that frame of speech data.
  • The MFCC of all frames are assembled into a feature data matrix, which is the voiceprint feature vector of the speech sample data.
  • Forming the voiceprint feature vector from the MFCC of the speech data improves authentication accuracy, because the Mel frequency bands approximate the human auditory system more closely than the linearly spaced frequency bands used in the ordinary cepstrum.
  • The voiceprint feature vector is input into the pre-trained background channel model to construct the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data, for example by using the pre-trained background channel model to compute the feature matrix corresponding to the current voiceprint verification voice data and thereby determine the current voiceprint discrimination vector.
  • The background channel model is a set of Gaussian mixture models, and its training process includes the following steps: (1) obtain a preset number of voice data samples, each corresponding to a standard voiceprint discrimination vector; (2) process each voice data sample to extract its preset type of voiceprint feature, and construct the voiceprint feature vector of each sample from that feature;
  • (3) train the Gaussian mixture model with the voiceprint feature vectors and, once training is complete, check the accuracy of the trained model on a verification set. If the accuracy is greater than a preset threshold (for example, 98.5%), training ends and the trained Gaussian mixture model is used as the background channel model; if the accuracy is less than or equal to the threshold, the number of voice data samples is increased and the model is retrained until its accuracy exceeds the threshold.
  • The background channel model pre-trained in this embodiment is obtained by mining and comparing a large amount of voice data.
  • The model accurately characterizes the background voiceprint while preserving the user's own voiceprint features as fully as possible. These background characteristics can be removed during identification so that the intrinsic characteristics of the user's voice are extracted, which greatly improves the accuracy and efficiency of user identity verification.
  • Calculating the distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector, and generating the identity verification result based on the calculated distance, comprises:
  • calculating the cosine distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector; if the cosine distance is less than or equal to a preset distance threshold, generating information indicating that verification has passed; if the cosine distance is greater than the preset distance threshold, generating information indicating that verification has failed.
  • When the user's standard voiceprint discrimination vector is stored, the user identity identifier may be stored with it.
  • During identity verification, the corresponding standard voiceprint discrimination vector is retrieved according to the identification information associated with the current voiceprint discrimination vector, and the cosine distance between the current voiceprint discrimination vector and the matched standard voiceprint discrimination vector is calculated. Verifying the target user's identity by this cosine distance improves the accuracy of authentication.
  • The present application does not require a developed client program to collect the user's voice data, and voiceprint verification using a handheld terminal is highly flexible and resistant to interference. The user identity identifier binds the server to the client computer, and the random code then binds the client computer, the server, and the handheld terminal together, which avoids sound hijacking and improves the authenticity and security of voiceprint verification.
  • The present application also provides a computer-readable storage medium storing a processing system which, when executed by a processor, implements the steps of the voiceprint verification method described above.
  • The methods of the foregoing embodiments can be implemented by software plus the necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation.
  • Based on this understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, may be embodied as a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and including a number of instructions for causing a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, etc.) to perform the methods described in the embodiments of the present application.
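The MFCC extraction described in the bullets above (pre-emphasis, windowing, Fourier transform, Mel filtering, logarithm, DCT, keeping the 2nd to 13th coefficients of each frame and stacking the frames into a feature data matrix) can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the applicant's implementation: the 16 kHz sample rate, 25 ms frames with a 10 ms hop, 512-point FFT, 26 Mel filters, pre-emphasis coefficient 0.97, Hamming window, and the 2595·log10(1 + f/700) Mel scale are common conventions assumed here, and the DCT is a plain unnormalized DCT-II. Only the choice of the 2nd to 13th coefficients comes from the text.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: int) -> np.ndarray:
    """Triangular filters spaced evenly on the Mel scale, shape (n_filters, n_fft//2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):   # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank


def dct_ii(x: np.ndarray, n_out: int) -> np.ndarray:
    """Unnormalized DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T


def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, alpha=0.97) -> np.ndarray:
    """Return the feature data matrix: one row of 12 MFCCs (2nd..13th) per frame."""
    signal = np.asarray(signal, dtype=float)
    # Pre-emphasis: a first-order high-pass filter.
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Framing: always at least one frame; short inputs are zero-padded.
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)
    if len(emphasized) < frame_len:
        emphasized = np.pad(emphasized, (0, frame_len - len(emphasized)))
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)            # windowing
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft    # Fourier transform
    mel_spectrum = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_mel = np.log(np.maximum(mel_spectrum, 1e-10))           # logarithm
    cepstrum = dct_ii(log_mel, 13)                               # inverse transform (DCT)
    return cepstrum[:, 1:13]                                     # 2nd..13th coefficients
```

For one second of 16 kHz audio these settings yield 98 frames, so the returned matrix has shape (98, 12); that matrix plays the role of the voiceprint feature vector fed to the background channel model.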

Abstract

The present application relates to a server, a voiceprint verification method, and a storage medium. The method comprises: after receiving an identity verification request, generating graphic code parameters of a graphic code corresponding to a user identity, and sending the graphic code parameters to a client computer; after a handheld terminal parses the graphic code, receiving a voiceprint verification request carrying a random key, sent by the handheld terminal via a voiceprint data acquisition link address, and analyzing whether the two random keys are consistent; if so, establishing a voice data acquisition channel with the handheld terminal, and obtaining, via the channel, the user's current voiceprint verification voice data acquired by the handheld terminal; and constructing a corresponding current voiceprint discrimination vector, determining a standard voiceprint discrimination vector corresponding to the user identity, calculating the distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector, and generating an identity verification result based on the calculated distance. The present application can improve the flexibility of voiceprint verification and avoid sound hijacking.

Description

Server, voiceprint verification method, and storage medium
Priority claim
This application claims priority under the Paris Convention to Chinese Patent Application No. CN2018104572671, filed on May 14, 2018 and entitled "Server, Voiceprint Verification Method and Storage Medium", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of communications technologies, and in particular to a server, a voiceprint verification method, and a storage medium.
Background
At present, verifying user identity with voiceprint verification technology has become an important means of verification for major customer service companies (e.g., banks, insurance companies, and game companies). The traditional business solution for user authentication by voiceprint verification is to use the interface of a voiceprint verification server to separately develop a corresponding client program, which collects and pre-processes the user's voice; the pre-processed voiceprint data is then transmitted to the voiceprint verification server, which performs authentication and operational processing on it.
However, the drawback of this traditional voiceprint verification scheme is that the user's voice must be collected through the developed client program. In practice this is inflexible and vulnerable to interference from human voices; moreover, collecting sound with the client computer is susceptible to sound hijacking, so the authenticity of the voiceprint verification cannot be accurately controlled and security cannot be guaranteed.
Summary of the invention
The purpose of the present application is to provide a server, a voiceprint verification method, and a storage medium that improve the flexibility of voiceprint verification and avoid sound hijacking.
To achieve the above object, the present application provides a server comprising a memory and a processor connected to the memory, the memory storing a processing system executable on the processor, wherein the processing system, when executed by the processor, implements the following steps:
A generating step: after receiving an identity verification request carrying a user identity identifier sent by a client computer, generating graphic code parameters of a graphic code corresponding to the user identity identifier, and sending the graphic code parameters to the client computer so that the client computer generates and displays the graphic code corresponding to the graphic code parameters, wherein the graphic code parameters include a random key and a voiceprint data collection link address;
An analyzing step: after a handheld terminal parses the graphic code to obtain the random key and the voiceprint data collection link address, receiving a voiceprint verification request carrying the random key that the handheld terminal sends via the voiceprint data collection link address, and analyzing whether the random key in the graphic code parameters sent to the client computer is consistent with the random key received from the handheld terminal;
An acquiring step: if so, establishing a voice data collection channel with the handheld terminal, and acquiring, via the voice data collection channel, the user's current voiceprint verification voice data collected by the handheld terminal;
A verifying step: constructing a current voiceprint discrimination vector corresponding to the current voiceprint verification voice data; determining the standard voiceprint discrimination vector corresponding to the user identity identifier according to a predetermined mapping between user identity identifiers and standard voiceprint discrimination vectors; calculating the distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector; generating an identity verification result based on the calculated distance; and sending the identity verification result to the client computer.
To achieve the above object, the present application further provides a voiceprint verification method, comprising:
S1: after receiving an identity verification request carrying a user identity identifier sent by a client computer, a server generates graphic code parameters of a graphic code corresponding to the user identity identifier and sends the graphic code parameters to the client computer, so that the client computer generates and displays the graphic code corresponding to the graphic code parameters, wherein the graphic code parameters include a random key and a voiceprint data collection link address;
S2: after a handheld terminal parses the graphic code to obtain the random key and the voiceprint data collection link address, the server receives a voiceprint verification request carrying the random key that the handheld terminal sends via the voiceprint data collection link address, and analyzes whether the random key in the graphic code parameters sent to the client computer is consistent with the random key received from the handheld terminal;
S3: if so, the server establishes a voice data collection channel with the handheld terminal and, via the voice data collection channel, acquires the user's current voiceprint verification voice data collected by the handheld terminal;
S4: constructing a current voiceprint discrimination vector corresponding to the current voiceprint verification voice data; determining the standard voiceprint discrimination vector corresponding to the user identity identifier according to a predetermined mapping between user identity identifiers and standard voiceprint discrimination vectors; calculating the distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector; generating an identity verification result based on the calculated distance; and sending the identity verification result to the client computer.
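The distance test in S4 can be sketched as below. This is a minimal sketch under assumptions not fixed by the text: the cosine distance is taken to be 1 minus the cosine similarity (so a smaller distance means more similar voices, consistent with the rule that a distance at or below the threshold passes verification), discrimination vectors are plain lists of floats, and the mapping from user identity identifiers to standard voiceprint discrimination vectors is an in-memory dictionary. The function names, the result dictionary, and the example threshold are hypothetical; the source does not give a threshold value.

```python
import math
from typing import Dict, Mapping, Sequence


def cosine_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """1 minus the cosine similarity of x and y (0.0 means same direction)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_x * norm_y)


def verify_identity(user_id: str,
                    current_vector: Sequence[float],
                    standard_vectors: Mapping[str, Sequence[float]],
                    distance_threshold: float) -> Dict[str, object]:
    """Build the identity verification result sent back to the client computer."""
    standard_vector = standard_vectors.get(user_id)
    if standard_vector is None:
        # No standard voiceprint discrimination vector enrolled for this identifier.
        return {"user_id": user_id, "passed": False, "reason": "not enrolled"}
    distance = cosine_distance(standard_vector, current_vector)
    return {
        "user_id": user_id,
        "distance": distance,
        "passed": distance <= distance_threshold,  # at or below threshold: pass
    }
```

For example, with a hypothetical threshold of 0.3, a current vector [1.0, 1.0] checked against an enrolled vector [1.0, 0.0] has a distance of about 0.293 and passes, while [0.0, 1.0] has a distance of 1.0 and fails.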
The present application also provides a computer-readable storage medium storing a processing system which, when executed by a processor, implements the steps of the voiceprint verification method described above.
Beneficial effects of the present application: a developed client program is no longer needed to collect the user's voice data; voiceprint verification using a handheld terminal is highly flexible and not easily interfered with; the user identity identifier binds the server to the client computer, and the random code then binds the client computer, the server, and the handheld terminal together, which avoids sound hijacking and improves the authenticity and security of voiceprint verification.
Brief description of the drawings
FIG. 1 is a schematic diagram of an optional application environment for the embodiments of the present application;
FIG. 2 is a schematic flowchart of an embodiment of the voiceprint verification method of the present application.
Detailed description
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely explain the present application and are not intended to limit it. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
It should be noted that descriptions such as "first" and "second" in the present application are for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with one another, but only where a person of ordinary skill in the art can implement the combination; where a combination of technical solutions is contradictory or cannot be implemented, the combination shall be deemed not to exist and does not fall within the protection scope claimed by the present application.
Referring to FIG. 1, a schematic diagram of the application environment of a preferred embodiment of the voiceprint verification method of the present application is shown. The application environment includes a server 1, a client computer 2, and a handheld terminal 3. The server 1 can exchange data with the client computer 2 and the handheld terminal 3 through any suitable technology, such as a network or near-field communication.
The client computer 2 includes, but is not limited to, any electronic product capable of human-computer interaction with a user via a keyboard, mouse, remote control, touch pad, or voice control device, for example a mobile device such as a personal computer, tablet computer, smartphone, personal digital assistant (PDA), game console, Internet Protocol Television (IPTV) device, smart wearable device, or navigation device, or a fixed terminal such as a digital TV, desktop computer, notebook computer, or server. The handheld terminal 3 may be a tablet computer, a smartphone, or the like.
The server 1 is a device that can automatically perform numerical computation and/or information processing according to preset or stored instructions. The server 1 may be a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a form of distributed computing: a super virtual computer composed of a group of loosely coupled computers.
In this embodiment, the server 1 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13 communicatively connected to one another via a system bus, the memory 11 storing a processing system executable on the processor 12. It should be noted that FIG. 1 shows only the server 1 with components 11-13; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
The memory 11 includes internal memory and at least one type of readable storage medium. The internal memory provides a cache for the operation of the server 1; the readable storage medium may be a non-volatile storage medium such as flash memory, a hard disk, a multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, or an optical disc. In some embodiments, the readable storage medium may be an internal storage unit of the server 1, such as a hard disk of the server 1; in other embodiments, the non-volatile storage medium may be an external storage device of the server 1, such as a plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card, or flash card fitted to the server 1. In this embodiment, the readable storage medium of the memory 11 is generally used to store the operating system and various application software installed on the server 1, such as the program code of the processing system in an embodiment of the present application. In addition, the memory 11 may also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 12 may be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 12 is generally used to control the overall operation of the server 1, for example to perform control and processing related to data interaction or communication with the client computer 2 and the handheld terminal 3. In this embodiment, the processor 12 is used to run the program code stored in the memory 11 or to process data, for example to run the processing system.
The network interface 13 may include a wireless network interface or a wired network interface and is generally used to establish communication connections between the server 1 and other electronic devices. In this embodiment, the network interface 13 is mainly used to connect the server 1 to the client computer 2 and the handheld terminal 3, and to establish data transmission channels and communication connections between the server 1 and the client computer 2 and handheld terminal 3.
The processing system is stored in the memory 11 and includes at least one computer-readable instruction stored in the memory 11 that can be executed by the processor 12 to implement the methods of the embodiments of the present application. Depending on the functions implemented by its parts, the at least one computer-readable instruction can be divided into different logical modules.
In one embodiment, the processing system, when executed by the processor 12, implements the following steps:
A generating step: after receiving an identity verification request carrying a user identity identifier sent by the client computer, generating graphic code parameters of a graphic code corresponding to the user identity identifier, and sending the graphic code parameters to the client computer so that the client computer generates and displays the graphic code corresponding to those parameters;
Here, the user identity identifier is an identifier that uniquely identifies the user; preferably, it is the user's ID card number. The graphic code is preferably a two-dimensional code but is not limited thereto; it may also be, for example, a barcode. The graphic code parameters are used to generate the corresponding graphic code: for example, two-dimensional code parameters generate the corresponding two-dimensional code, and barcode parameters generate the corresponding barcode. The graphic code parameters include the random key and the voiceprint data collection link address, and may further include the validity period of the graphic code, detailed information of the graphic code, the scene value ID of the graphic code, and so on. The random key may be a random numeric string, a random character string, or the like.
The client computer sends the server an identity verification request carrying the user identity identifier. After receiving the request, the server generates graphic code parameters such as a random key corresponding to the user identity identifier, the server's voiceprint data collection link address, the validity period of the graphic code, the detailed information of the graphic code, and the scene value ID of the graphic code, and sends these parameters to the client computer. After receiving the graphic code parameters, the client computer generates the corresponding graphic code from them and displays it for the handheld terminal to scan.
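The generating step can be sketched as follows. This is a hypothetical illustration, not the applicant's code: the field names, the JSON text format used as the graphic code payload, the 32-character hexadecimal random key from Python's `secrets` module, the random scene value ID, and the in-memory `ISSUED_KEYS` table that remembers which user each key was issued for are all assumptions. The "detailed information of the graphic code" parameter is not modeled.

```python
import json
import secrets
from dataclasses import asdict, dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional


@dataclass
class GraphicCodeParams:
    random_key: str   # random character string, intended for single use
    collect_url: str  # voiceprint data collection link address
    valid_from: str   # start of the graphic code's validity period (ISO 8601)
    valid_until: str  # end of the validity period (ISO 8601)
    scene_id: str     # scene value ID of the graphic code


# Hypothetical server-side table: random key -> user identity identifier.
ISSUED_KEYS: Dict[str, str] = {}


def generate_graphic_code_params(user_id: str, collect_url: str,
                                 ttl: timedelta,
                                 now: Optional[datetime] = None) -> GraphicCodeParams:
    """Create the parameters the client computer will render as a graphic code."""
    now = now or datetime.now()
    params = GraphicCodeParams(
        random_key=secrets.token_hex(16),  # 16 random bytes -> 32 hex characters
        collect_url=collect_url,
        valid_from=now.isoformat(timespec="seconds"),
        valid_until=(now + ttl).isoformat(timespec="seconds"),
        scene_id=secrets.token_hex(4),
    )
    # Remember which user this key belongs to, so that the later voiceprint
    # verification request from the handheld terminal can be bound back to it.
    ISSUED_KEYS[params.random_key] = user_id
    return params


def to_payload(params: GraphicCodeParams) -> str:
    """Serialize the parameters into the text encoded in the graphic code."""
    return json.dumps(asdict(params), sort_keys=True)
```

Keeping the user identity identifier on the server, keyed by the random key, rather than placing it in the payload means the scanned code itself does not expose the identifier (which the text suggests may be an ID card number).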
An analyzing step: after the handheld terminal parses the graphic code to obtain the random key and the voiceprint data collection link address, receiving a voiceprint verification request carrying the random key that the handheld terminal sends via the voiceprint data collection link address, and analyzing whether the random key in the graphic code parameters sent to the client computer is consistent with the random key received from the handheld terminal;
After scanning the graphic code, the handheld terminal parses it with its own graphic-code parsing module and obtains the graphic code parameters, including the corresponding random key, the server's voiceprint data collection link address, the validity period of the graphic code, the detailed information of the graphic code, and the scene value ID of the graphic code. The handheld terminal then sends a voiceprint verification request carrying the random key to the server via the voiceprint data collection link address.
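On the handheld terminal side, parsing the scanned code and preparing the voiceprint verification request might look like the sketch below. It assumes, hypothetically, that the graphic code carries its parameters as a JSON object with the field names shown; the text lists the parameters but not their encoding. The HTTP call that actually sends the request to the link address is deliberately left out.

```python
import json
from typing import Dict, Tuple

# Hypothetical field names; the text names these parameters but not their encoding.
REQUIRED_FIELDS = ("random_key", "collect_url")
OPTIONAL_FIELDS = ("valid_from", "valid_until", "details", "scene_id")


def parse_graphic_code(payload: str) -> Dict[str, str]:
    """Decode the text read from the scanned graphic code."""
    data = json.loads(payload)
    missing = [name for name in REQUIRED_FIELDS if name not in data]
    if missing:
        raise ValueError(f"graphic code lacks required parameters: {missing}")
    # Keep only the known parameters; ignore anything else in the payload.
    return {name: data[name] for name in REQUIRED_FIELDS + OPTIONAL_FIELDS if name in data}


def build_verification_request(payload: str) -> Tuple[str, Dict[str, str]]:
    """Return the link address to send to and a request body carrying the random key."""
    params = parse_graphic_code(payload)
    return params["collect_url"], {"random_key": params["random_key"]}
```

Rejecting a payload that lacks the random key or the link address early keeps the terminal from sending a request the server could never match.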
On receiving the voiceprint verification request, the server determines whether the random key in the graphic code parameters sent to the client computer matches the random key received from the handheld terminal. Another handheld terminal might steal this random key and use it for voiceprint verification with the server. To prevent this and so improve the accuracy of voiceprint verification, one embodiment adds a count check. On receiving the voiceprint verification request carrying the random key, sent via the voiceprint data collection link address, the server first determines whether the number of times this random key has been received exceeds a preset number.

If it does (for example, more than once), the server refuses to respond to the voiceprint verification request. It may also pass information about the handheld terminal to the server as a later reference when judging whether the voiceprint verification was fraudulent. If the number of times is less than or equal to the preset number (for example, once), the server then determines whether the random key in the graphic code parameters sent to the client computer matches the random key received from the handheld terminal.
To further prevent another handheld terminal from stealing this random key and using it for voiceprint verification with the server, another embodiment checks the time of receipt first. On receiving the voiceprint verification request carrying the random key, the server determines whether the key was received within the validity period of the graphic code. For example, suppose the validity period is 2018.03.01 to 2018.03.10 and the server receives the random key from the handheld terminal on 2018.03.08. The time of receipt then falls within the validity period.

If it does, the server next determines whether the number of times the random key has been received exceeds a preset number, for example whether it has been received more than once. If so, the server refuses to respond to the voiceprint verification request. It may also pass information about the handheld terminal to the server as a later reference when judging whether the voiceprint verification was fraudulent. If the number is less than or equal to the preset number, the server finally determines whether the random key in the graphic code parameters sent to the client computer matches the random key received from the handheld terminal.
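The check order of this embodiment (validity period, then receipt count, then key comparison) can be sketched as follows. The in-memory `Counter`, the return strings, and the use of `hmac.compare_digest` for a constant-time comparison are assumptions for illustration. A real server would persist the counts and scope them per issued graphic code. Keys are assumed to be ASCII strings, as `hmac.compare_digest` requires for `str` inputs.

```python
import hmac
from collections import Counter
from datetime import datetime

# Hypothetical in-memory store: received random key -> number of receipts.
received_counts: Counter = Counter()


def check_verification_request(issued_key: str, valid_from: datetime,
                               valid_until: datetime, received_key: str,
                               received_at: datetime, max_times: int = 1) -> str:
    # 1. The key must arrive within the graphic code's validity period.
    if not (valid_from <= received_at <= valid_until):
        return "reject: outside validity period"
    # 2. Count this receipt; refuse once the key has been seen more than max_times.
    received_counts[received_key] += 1
    if received_counts[received_key] > max_times:
        return "reject: key received too many times"  # terminal details could be logged here
    # 3. Finally compare the issued key with the key from the handheld terminal.
    if not hmac.compare_digest(issued_key, received_key):
        return "reject: key mismatch"
    return "accept"
```

With the dates from the example above, a key first received on 2018-03-08 is accepted, and any second submission of the same key is rejected.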
Acquisition step: if the keys match, establish a voice data collection channel with the handheld terminal. Through that channel, acquire the user's current voiceprint verification voice data collected by the handheld terminal;
If the random key in the graphic code parameters sent to the client computer matches the random key received from the handheld terminal, a voice data collection channel is established with the handheld terminal. The handheld terminal captures the user's current voiceprint verification voice data in real time through a voice capture device such as a microphone. During capture, environmental noise and interference from the handheld terminal should be minimized:

- the handheld terminal should be kept at an appropriate distance from the user;
- handheld terminals with high distortion should be avoided where possible;
- mains power is preferred, and the current should be kept stable;
- a sensor should be used when recording.

Before framing and sampling, the current voiceprint verification voice data may be denoised to further reduce interference. So that voiceprint features can be extracted from it, the captured data must have at least a preset data length.
Verification step:

- construct the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data;
- determine the standard voiceprint discrimination vector for the user identity, using a predetermined mapping between user identities and standard voiceprint discrimination vectors;
- calculate the distance between the current and standard voiceprint discrimination vectors;
- generate an identity verification result based on that distance, and send the result to the client computer.
To reduce the computation required for voiceprint recognition and to speed it up, one embodiment constructs the current voiceprint discrimination vector in two parts. First, the current voiceprint verification voice data is processed to extract voiceprint features of a preset type, and the corresponding voiceprint feature vector is built from those features. Second, this voiceprint feature vector is input into a pre-trained background channel model, which constructs the current voiceprint discrimination vector for the current voiceprint verification voice data.
Voiceprint features come in many types, such as wideband voiceprints, narrowband voiceprints, and amplitude voiceprints. In this embodiment, the preset type of voiceprint feature is preferably the Mel Frequency Cepstrum Coefficients (MFCC) of the current voiceprint verification voice data, and the preset filter is a Mel filter. To construct the voiceprint feature vector, the voiceprint features of the current voiceprint verification voice data are assembled into a feature data matrix. This matrix is the corresponding voiceprint feature vector.
Specifically, the MFCC feature vector is computed as follows:

1. Pre-emphasize and window the current voiceprint verification voice data.
2. Apply a Fourier transform to each window to obtain its spectrum.
3. Pass the spectrum through the Mel filter to obtain the Mel spectrum.
4. Perform cepstral analysis on the Mel spectrum to obtain the Mel frequency cepstral coefficients (MFCC).
5. Form the corresponding voiceprint feature vector from the MFCCs.
Pre-emphasis is in effect high-pass filtering. It attenuates the low-frequency content so that the high-frequency characteristics of the current voiceprint verification voice data stand out. The transfer function of the high-pass filter is $H(z) = 1 - \alpha z^{-1}$, where z represents the speech data in the z-domain and α is a constant coefficient, preferably 0.97.

Because framed speech deviates to some extent from the original speech, the speech data must also be windowed.

Cepstral analysis on the Mel spectrum consists, for example, of taking the logarithm and then applying an inverse transform. The inverse transform is generally implemented as a discrete cosine transform (DCT), and the 2nd to 13th DCT coefficients are taken as the MFCCs. These MFCCs are the voiceprint features of that frame of speech data. The MFCCs of all frames are assembled into a feature data matrix, which is the voiceprint feature vector of the speech sample data.
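The MFCC pipeline described above can be sketched in dependency-free Python. Several parameters are common defaults assumed here, not values from the text: a 16 kHz sample rate, 400-sample frames with a 160-sample hop, a 512-point DFT, 26 Mel filters, and a Hamming window. The direct DFT is slow; it stands in for an FFT only to keep the example self-contained. Pre-emphasis uses α = 0.97, and the 2nd to 13th DCT coefficients are kept, as described.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, alpha=0.97):
    # Pre-emphasis y[n] = x[n] - alpha * x[n-1], i.e. H(z) = 1 - alpha z^-1.
    emph = [signal[0]] + [signal[i] - alpha * signal[i - 1] for i in range(1, len(signal))]

    # Triangular Mel filterbank over the n_fft // 2 + 1 non-negative frequency bins.
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2)
    mel_pts = [i * mel_max / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_pts]
    fbank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * n_bins
        for k in range(left, center):
            row[k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            row[k] = (right - k) / max(right - center, 1)
        fbank.append(row)

    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    features = []
    for start in range(0, len(emph) - frame_len + 1, hop):
        frame = [emph[start + n] * window[n] for n in range(frame_len)]
        # Power spectrum via a direct DFT, implicitly zero-padded to n_fft points.
        power = []
        for k in range(n_bins):
            re = sum(frame[n] * math.cos(2 * math.pi * k * n / n_fft) for n in range(frame_len))
            im = -sum(frame[n] * math.sin(2 * math.pi * k * n / n_fft) for n in range(frame_len))
            power.append((re * re + im * im) / n_fft)
        # Mel spectrum, then logarithm (small offset avoids log(0)).
        log_mel = [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10) for row in fbank]
        # DCT-II of the log Mel spectrum; keep coefficients 2 to 13 (indices 1..12).
        cep = [sum(log_mel[m] * math.cos(math.pi * q * (m + 0.5) / n_filters)
                   for m in range(n_filters)) for q in range(13)]
        features.append(cep[1:13])
    return features  # one 12-value row per frame: the feature data matrix
```

Each returned row holds one frame's 12 MFCCs, and stacking the rows gives the feature data matrix. In practice a vectorized FFT, such as NumPy's `numpy.fft.rfft`, would replace the inner DFT loops.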
In this embodiment, the MFCCs of the speech data form the corresponding voiceprint feature vector. Mel-scale frequency bands approximate the human auditory system more closely than the linearly spaced bands of the ordinary cepstrum, so this choice improves the accuracy of identity verification.
The voiceprint feature vector is then input into the pre-trained background channel model, which constructs the current voiceprint discrimination vector for the current voiceprint verification voice data. For example, the model computes the feature matrix of the current voiceprint verification voice data, and the current voiceprint discrimination vector is determined from that matrix.
To construct the current voiceprint discrimination vector efficiently and with high quality, in a preferred embodiment the background channel model is a set of Gaussian mixture models. Training it comprises the following steps:
1. Obtain a preset number of voice data samples, each of which corresponds to a standard voiceprint discrimination vector.
2. Process each voice data sample to extract its preset-type voiceprint features, and construct each sample's voiceprint feature vector from those features.
3. Divide all extracted preset-type voiceprint feature vectors into a training set (a first percentage) and a validation set (a second percentage). The two percentages sum to at most 100%.
4. Train the set of Gaussian mixture models on the training set, then check the accuracy of the trained models on the validation set. If the accuracy is greater than a preset threshold (for example, 98.5%), training ends, and the trained set of Gaussian mixture models becomes the background channel model. If the accuracy is less than or equal to the threshold, increase the number of voice data samples and retrain until the accuracy exceeds the threshold.
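The train-validate-grow loop in step 4 can be sketched as below. Training the Gaussian mixture models themselves is delegated to hypothetical callables (`load_samples`, `extract`, `train`, `evaluate`), because the text does not name a GMM library or define how accuracy is measured. The 70/30 split, the sample increment, and the `max_rounds` cap (added so the loop always terminates) are also assumptions.

```python
import random
from typing import Any, Callable, List


def train_background_model(load_samples: Callable[[int], List[Any]],
                           extract: Callable[[Any], Any],
                           train: Callable[[List[Any]], Any],
                           evaluate: Callable[[Any, List[Any]], float],
                           n_samples: int = 1000,
                           train_frac: float = 0.7,
                           val_frac: float = 0.3,
                           threshold: float = 0.985,
                           growth: int = 1000,
                           max_rounds: int = 10) -> Any:
    # Step 3 requires the two percentages to sum to at most 100%.
    if train_frac + val_frac > 1.0:
        raise ValueError("train_frac + val_frac must not exceed 1.0")
    for _ in range(max_rounds):
        # Steps 1-2: obtain samples and build one feature vector per sample.
        features = [extract(s) for s in load_samples(n_samples)]
        random.shuffle(features)
        # Step 3: split into training and validation sets.
        n_train = int(len(features) * train_frac)
        n_val = int(len(features) * val_frac)
        train_set = features[:n_train]
        val_set = features[n_train:n_train + n_val]
        # Step 4: train, then accept only if validation accuracy exceeds the threshold.
        model = train(train_set)
        if evaluate(model, val_set) > threshold:
            return model
        n_samples += growth  # otherwise enlarge the sample pool and retrain
    raise RuntimeError("accuracy threshold not reached within max_rounds")
```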
The pre-trained background channel model of this embodiment is obtained by mining and comparative training on a large amount of voice data. The model preserves the user's voiceprint features as fully as possible while accurately characterizing the background voiceprint features present when the user speaks. These background features can therefore be removed at recognition time, leaving the intrinsic features of the user's voice. This can substantially improve the accuracy and efficiency of user identity verification.
In one embodiment, calculating the distance between the current and standard voiceprint discrimination vectors, and generating the identity verification result from that distance, includes:
Calculating the cosine distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector:

$d(\vec{x}, \vec{y}) = 1 - \dfrac{\vec{x} \cdot \vec{y}}{\|\vec{x}\|\,\|\vec{y}\|}$

where $\vec{x}$ is the standard voiceprint discrimination vector and $\vec{y}$ is the current voiceprint discrimination vector. If the cosine distance is less than or equal to a preset distance threshold, information indicating that verification has passed is generated. If the cosine distance is greater than the threshold, information indicating that verification has failed is generated.
A user's standard voiceprint discrimination vector may be stored together with the user identity. When the user's identity is verified, the matching standard voiceprint discrimination vector is retrieved using the identification information of the current voiceprint discrimination vector. The cosine distance between the current vector and the retrieved standard vector is then calculated. Using the cosine distance to verify the target user's identity improves the accuracy of identity verification.
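The lookup and decision just described can be sketched as follows. The sketch assumes that discrimination vectors are plain lists of floats and that the user-identity-to-standard-vector mapping is an in-memory dict. The 0.3 threshold is a placeholder, not a value from the text; a deployment would tune it on held-out data. The distance is 1 minus the cosine similarity, so a smaller value means more similar, matching the pass condition above.

```python
import math


def cosine_distance(x, y):
    """1 - cos(theta) between standard vector x and current vector y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (norm_x * norm_y)  # raises ZeroDivisionError for a zero vector


def verify(user_id, current_vec, standard_vecs, threshold=0.3):
    # Look up the enrolled standard vector via the user identity mapping.
    standard = standard_vecs.get(user_id)
    if standard is None:
        return "fail: no enrolled voiceprint"
    # Pass when the distance is at or below the (hypothetical) threshold.
    return "pass" if cosine_distance(standard, current_vec) <= threshold else "fail"
```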
Compared with the prior art, the present application performs voiceprint verification with an architecture of three parts: a client computer, a server, and a handheld terminal. The flow is as follows. The client computer sends a request carrying the user identity to the server. The server generates graphic code parameters for that user identity and sends them to the client computer, which displays the corresponding graphic code. The user scans the graphic code with the handheld terminal they carry, which then sends the random key to the server via the link address for verification. Once verification passes, a channel is established with the server, and the user's voice data captured by the handheld terminal is acquired for voiceprint verification.

This design has several advantages. No dedicated client program is needed to collect the user's voice data. Verifying through a handheld terminal is flexible and resistant to interference. The user identity binds the server to the client computer, and the random key then binds the client computer, server, and handheld terminal together. This prevents voice hijacking and improves the authenticity and security of voiceprint verification.
FIG. 2 is a schematic flowchart of an embodiment of the voiceprint verification method of the present application. As shown in FIG. 2, the method includes the following steps:
Step S1: the server receives an identity verification request carrying a user identity from the client computer. It then generates the graphic code parameters of a graphic code corresponding to that user identity and sends them to the client computer, which generates and displays the corresponding graphic code. The graphic code parameters include a random key and a voiceprint data collection link address;
The user identity is an identifier that uniquely identifies the user; preferably, it is the user's ID card number. The graphic code is preferably a two-dimensional code but is not limited to this; it may also be, for example, a barcode. The graphic code parameters are used to generate the matching graphic code: two-dimensional code parameters produce a two-dimensional code, and barcode parameters produce a barcode.

The graphic code parameters include the random key and the voiceprint data collection link address. They may further include the validity period of the graphic code, the details of the graphic code, the scene value ID of the graphic code, and so on. The random key may be a random numeric string, a random character string, and so on.
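Either form of random key mentioned here can be drawn from Python's `secrets` module, which is intended for security tokens. The 16-character default length and the alphanumeric alphabet are assumptions for illustration.

```python
import secrets
import string


def random_key(length: int = 16, digits_only: bool = False) -> str:
    """Return a random numeric string or a random alphanumeric string."""
    alphabet = string.digits if digits_only else string.ascii_letters + string.digits
    # secrets.choice draws from a cryptographically strong source.
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

For example, `random_key(6, digits_only=True)` yields a six-digit numeric key, and `random_key()` yields a 16-character alphanumeric one.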
The client computer sends the server an identity verification request carrying the user identity. On receiving the request, the server generates the graphic code parameters corresponding to that user identity and sends them to the client computer. These parameters include a random key, the server's voiceprint data collection link address, the validity period of the graphic code, the details of the graphic code, and the scene value ID of the graphic code. The client computer then generates the corresponding graphic code from the parameters and displays it for the handheld terminal to scan.
Step S2: the handheld terminal parses the graphic code to obtain the random key and the voiceprint data collection link address. It then sends a voiceprint verification request carrying the random key via that link address. The server receives this request and determines whether the random key in the graphic code parameters sent to the client computer matches the random key received from the handheld terminal;
After scanning the graphic code, the handheld terminal parses it with its own graphic-code parsing module. This yields the corresponding graphic code parameters: the random key, the server's voiceprint data collection link address, the validity period of the graphic code, the details of the graphic code, the scene value ID of the graphic code, and so on. The handheld terminal then sends a voiceprint verification request carrying the random key to the server via the voiceprint data collection link address.
On receiving the voiceprint verification request, the server determines whether the random key in the graphic code parameters sent to the client computer matches the random key received from the handheld terminal. Another handheld terminal might steal this random key and use it for voiceprint verification with the server. To prevent this and so improve the accuracy of voiceprint verification, one embodiment adds a count check. On receiving the voiceprint verification request carrying the random key, sent via the voiceprint data collection link address, the server first determines whether the number of times this random key has been received exceeds a preset number.

If it does (for example, more than once), the server refuses to respond to the voiceprint verification request. It may also pass information about the handheld terminal to the server as a later reference when judging whether the voiceprint verification was fraudulent. If the number of times is less than or equal to the preset number (for example, once), the server then determines whether the random key in the graphic code parameters sent to the client computer matches the random key received from the handheld terminal.
To further prevent another handheld terminal from stealing this random key and using it for voiceprint verification with the server, another embodiment checks the time of receipt first. On receiving the voiceprint verification request carrying the random key, the server determines whether the key was received within the validity period of the graphic code. For example, suppose the validity period is 2018.03.01 to 2018.03.10 and the server receives the random key from the handheld terminal on 2018.03.08. The time of receipt then falls within the validity period.

If it does, the server next determines whether the number of times the random key has been received exceeds a preset number, for example whether it has been received more than once. If so, the server refuses to respond to the voiceprint verification request. It may also pass information about the handheld terminal to the server as a later reference when judging whether the voiceprint verification was fraudulent. If the number is less than or equal to the preset number, the server finally determines whether the random key in the graphic code parameters sent to the client computer matches the random key received from the handheld terminal.
Step S3: if the keys match, the server establishes a voice data collection channel with the handheld terminal. Through that channel, it acquires the user's current voiceprint verification voice data collected by the handheld terminal;
If the random key in the graphic code parameters sent to the client computer matches the random key received from the handheld terminal, a voice data collection channel is established with the handheld terminal. The handheld terminal captures the user's current voiceprint verification voice data in real time through a voice capture device such as a microphone. During capture, environmental noise and interference from the handheld terminal should be minimized:

- the handheld terminal should be kept at an appropriate distance from the user;
- handheld terminals with high distortion should be avoided where possible;
- mains power is preferred, and the current should be kept stable;
- a sensor should be used when recording.

Before framing and sampling, the current voiceprint verification voice data may be denoised to further reduce interference. So that voiceprint features can be extracted from it, the captured data must have at least a preset data length.
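The minimum-length requirement can be enforced before feature extraction. In this sketch the preset data length is expressed as a duration. The 3-second minimum and the 16 kHz sample rate are hypothetical values, since the text leaves the preset length unspecified.

```python
def enough_voice_data(samples, sample_rate=16000, min_seconds=3.0):
    """Return True if the captured audio is at least min_seconds long.

    min_seconds stands in for the unspecified preset data length (hypothetical value).
    """
    duration = len(samples) / sample_rate  # seconds of audio captured
    return duration >= min_seconds
```

A server could reject a recording that fails this check and ask the handheld terminal to record again, rather than extracting unreliable features from a clip that is too short.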
Step S4:

- construct the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data;
- determine the standard voiceprint discrimination vector for the user identity, using a predetermined mapping between user identities and standard voiceprint discrimination vectors;
- calculate the distance between the current and standard voiceprint discrimination vectors;
- generate an identity verification result based on that distance, and send the result to the client computer.
To reduce the computation required for voiceprint recognition and to speed it up, one embodiment constructs the current voiceprint discrimination vector in two parts. First, the current voiceprint verification voice data is processed to extract voiceprint features of a preset type, and the corresponding voiceprint feature vector is built from those features. Second, this voiceprint feature vector is input into a pre-trained background channel model, which constructs the current voiceprint discrimination vector for the current voiceprint verification voice data.
Voiceprint features come in many types, such as wideband voiceprints, narrowband voiceprints, and amplitude voiceprints. In this embodiment, the preset type of voiceprint feature is preferably the Mel Frequency Cepstrum Coefficients (MFCC) of the current voiceprint verification voice data, and the preset filter is a Mel filter. To construct the voiceprint feature vector, the voiceprint features of the current voiceprint verification voice data are assembled into a feature data matrix. This matrix is the corresponding voiceprint feature vector.
Specifically, the MFCC feature vector is computed as follows:

1. Pre-emphasize and window the current voiceprint verification voice data.
2. Apply a Fourier transform to each window to obtain its spectrum.
3. Pass the spectrum through the Mel filter to obtain the Mel spectrum.
4. Perform cepstral analysis on the Mel spectrum to obtain the Mel frequency cepstral coefficients (MFCC).
5. Form the corresponding voiceprint feature vector from the MFCCs.
Pre-emphasis is in effect high-pass filtering. It attenuates the low-frequency content so that the high-frequency characteristics of the current voiceprint verification voice data stand out. The transfer function of the high-pass filter is $H(z) = 1 - \alpha z^{-1}$, where z represents the speech data in the z-domain and α is a constant coefficient, preferably 0.97.

Because framed speech deviates to some extent from the original speech, the speech data must also be windowed.

Cepstral analysis on the Mel spectrum consists, for example, of taking the logarithm and then applying an inverse transform. The inverse transform is generally implemented as a discrete cosine transform (DCT), and the 2nd to 13th DCT coefficients are taken as the MFCCs. These MFCCs are the voiceprint features of that frame of speech data. The MFCCs of all frames are assembled into a feature data matrix, which is the voiceprint feature vector of the speech sample data.
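To see why $H(z) = 1 - \alpha z^{-1}$ acts as a high-pass filter, evaluate it on the unit circle $z = e^{j\omega}$. The figures below use α = 0.97, as stated.

```latex
\begin{aligned}
y[n] &= x[n] - \alpha\, x[n-1] \\
H(e^{j\omega}) &= 1 - \alpha e^{-j\omega} \\
\lvert H(e^{j\omega}) \rvert^2 &= (1 - \alpha\cos\omega)^2 + (\alpha\sin\omega)^2 = 1 - 2\alpha\cos\omega + \alpha^2 \\
\lvert H(e^{j0}) \rvert &= 1 - \alpha = 0.03, \qquad \lvert H(e^{j\pi}) \rvert = 1 + \alpha = 1.97
\end{aligned}
```

Components near the Nyquist frequency are therefore amplified by a factor of about 1.97 / 0.03 ≈ 66 (roughly 36 dB) relative to DC. This is the "high-frequency characteristics stand out" effect described above.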
In this embodiment, the MFCCs of the speech data form the corresponding voiceprint feature vector. Mel-scale frequency bands approximate the human auditory system more closely than the linearly spaced bands of the ordinary cepstrum, so this choice improves the accuracy of identity verification.
The voiceprint feature vector is then input into the pre-trained background channel model, which constructs the current voiceprint discrimination vector for the current voiceprint verification voice data. For example, the model computes the feature matrix of the current voiceprint verification voice data, and the current voiceprint discrimination vector is determined from that matrix.
To construct the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data efficiently and with high quality, in a preferred embodiment the background channel model is a set of Gaussian mixture models, and its training process includes the following steps:
1. Obtain a preset number of voice data samples, each of which corresponds to a standard voiceprint discrimination vector;
2. Process each voice data sample to extract its voiceprint features of a preset type, and construct the voiceprint feature vector of each sample from those features;
3. Divide all extracted voiceprint feature vectors into a training set containing a first percentage of them and a validation set containing a second percentage, where the sum of the first and second percentages is less than or equal to 100%;
4. Train the set of Gaussian mixture models with the voiceprint feature vectors in the training set and, once training is complete, verify the accuracy of the trained models with the validation set. If the accuracy is greater than a preset threshold (for example, 98.5%), training ends and the trained set of Gaussian mixture models is used as the background channel model; if the accuracy is less than or equal to the preset threshold, the number of voice data samples is increased and training is repeated until the accuracy of the set of Gaussian mixture models exceeds the preset threshold.
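The train, validate and retrain loop of steps 1 to 4 can be outlined as below. This is a control-flow sketch only: `train_fn`, `accuracy_fn` and `more_samples_fn` are hypothetical callables standing in for the Gaussian mixture model training, the accuracy measurement and the collection of additional voice samples, and the 80%/20% split, fixed shuffle seed and round limit are assumptions (the text fixes only the example threshold of 98.5%).

```python
import random

def train_background_model(samples, train_fn, accuracy_fn, more_samples_fn,
                           train_fraction=0.8, validation_fraction=0.2,
                           threshold=0.985, max_rounds=10, seed=0):
    """Train, validate, and retrain with more samples until accuracy exceeds the threshold."""
    if train_fraction + validation_fraction > 1.0 + 1e-9:
        raise ValueError("the two percentages must sum to at most 100%")
    rng = random.Random(seed)
    pool = list(samples)
    for _ in range(max_rounds):
        rng.shuffle(pool)
        n_train = int(len(pool) * train_fraction)         # step 3: first percentage
        n_val = int(len(pool) * validation_fraction)      # step 3: second percentage
        train_set = pool[:n_train]
        validation_set = pool[n_train:n_train + n_val]
        model = train_fn(train_set)                       # step 4: fit the models
        accuracy = accuracy_fn(model, validation_set)     # step 4: check on the validation set
        if accuracy > threshold:
            return model, accuracy
        pool.extend(more_samples_fn(len(pool)))           # add voice samples and retrain
    raise RuntimeError("accuracy did not exceed the threshold within max_rounds")
```

Bounding the number of rounds is a design choice of this sketch; the text simply repeats training until the threshold is exceeded.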
The background channel model pre-trained in this embodiment is obtained by mining and comparative training on a large amount of voice data. While retaining the user's voiceprint features as fully as possible, the model accurately characterizes the background voiceprint features present when the user speaks, so that these background features can be removed during recognition and the intrinsic characteristics of the user's voice extracted. This can substantially improve the accuracy and efficiency of user identity verification.
In an embodiment, the step of calculating the distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector and generating the identity verification result based on the calculated distance includes:
Calculating the cosine distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector:

$$d(\vec{x}, \vec{y}) = 1 - \frac{\vec{x} \cdot \vec{y}}{\|\vec{x}\| \, \|\vec{y}\|}$$

where $\vec{x}$ is the standard voiceprint discrimination vector and $\vec{y}$ is the current voiceprint discrimination vector. If the cosine distance is less than or equal to a preset distance threshold, information indicating that verification has passed is generated; if the cosine distance is greater than the preset distance threshold, information indicating that verification has failed is generated.
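A small worked example, assuming the cosine distance is one minus the cosine similarity of the two vectors, and using made-up two-dimensional vectors $\vec{x} = (1, 0)$ (standard) and $\vec{y} = (1, 1)$ (current):

```latex
\cos\theta = \frac{\vec{x}\cdot\vec{y}}{\|\vec{x}\|\,\|\vec{y}\|}
           = \frac{1\cdot 1 + 0\cdot 1}{1\cdot\sqrt{2}} \approx 0.7071,
\qquad
d = 1 - \cos\theta \approx 0.2929
```

With a hypothetical distance threshold of 0.3 this attempt would pass; with a threshold of 0.25 it would fail.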
When the user's standard voiceprint discrimination vector is stored, it may carry the user identity identifier. When the user's identity is verified, the corresponding standard voiceprint discrimination vector is retrieved by matching the identifier information associated with the current voiceprint discrimination vector, and the cosine distance between the current vector and the matched standard vector is calculated. Verifying the target user's identity by cosine distance improves the accuracy of identity verification.
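The enrol-and-look-up flow described above might be sketched as follows, assuming an in-memory dictionary keyed by user identity identifier and the cosine distance taken as one minus cosine similarity. The class name, the `enroll`/`verify` methods and the 0.3 default threshold are hypothetical; the text gives no threshold value.

```python
import math
from typing import Dict, List, Sequence

def cosine_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """1 - cosine similarity; 0 means the vectors point in the same direction."""
    dot = sum(a * b for a, b in zip(x, y))
    norms = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norms == 0.0:
        raise ValueError("voiceprint vectors must be non-zero")
    return 1.0 - dot / norms

class VoiceprintRegistry:
    """Hypothetical store mapping user identity identifiers to standard vectors."""

    def __init__(self, threshold: float = 0.3) -> None:
        self.threshold = threshold
        self._standard: Dict[str, List[float]] = {}

    def enroll(self, user_id: str, standard_vector: Sequence[float]) -> None:
        self._standard[user_id] = list(standard_vector)

    def verify(self, user_id: str, current_vector: Sequence[float]) -> bool:
        standard = self._standard.get(user_id)
        if standard is None:
            return False  # no enrolled voiceprint for this identifier
        # Pass when the distance is at most the threshold, as in the decision rule above.
        return cosine_distance(standard, current_vector) <= self.threshold
```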
This application does not require a purpose-built client program to collect the user's voice data, and performing voiceprint verification with a handheld terminal is flexible and resistant to interference. The user identity identifier binds the server to the client computer, and the random key in turn binds the client computer, the server and the handheld terminal together, which prevents voice hijacking and improves the authenticity and security of voiceprint verification.
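The key-binding check that ties the client computer, server and handheld terminal together (spelled out in claims 1 to 3) could look roughly like this. It is a sketch under several assumptions: the collection URL is a placeholder, the session identifier embedded in the link is a hypothetical way for the server to locate the issued key, and the 120-second validity period and limit of three receipts are made-up defaults. `secrets.token_urlsafe` generates the random key and `hmac.compare_digest` performs a constant-time comparison.

```python
import hmac
import secrets
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional

COLLECT_URL = "https://example.com/voiceprint/collect"  # placeholder link address

@dataclass
class PendingCode:
    user_id: str
    random_key: str
    expires_at: float
    receipts: int = 0

class GraphicCodeIssuer:
    def __init__(self, validity_seconds: float = 120.0, max_receipts: int = 3,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.validity_seconds = validity_seconds
        self.max_receipts = max_receipts
        self._clock = clock
        self._pending: Dict[str, PendingCode] = {}

    def issue(self, user_id: str) -> Dict[str, object]:
        """Generating step: build the graphic code parameters sent to the client computer."""
        session_id = secrets.token_urlsafe(8)  # hypothetical lookup handle carried in the link
        random_key = secrets.token_urlsafe(16)
        expires_at = self._clock() + self.validity_seconds
        self._pending[session_id] = PendingCode(user_id, random_key, expires_at)
        return {"random_key": random_key,
                "link": f"{COLLECT_URL}?session={session_id}",
                "valid_seconds": self.validity_seconds}

    def check(self, session_id: str, received_key: str) -> Optional[str]:
        """Analyzing step: return the user id if the handheld's key is accepted, else None."""
        pending = self._pending.get(session_id)
        if pending is None:
            return None
        if self._clock() > pending.expires_at:    # outside the validity period (claim 3)
            return None
        pending.receipts += 1
        if pending.receipts > self.max_receipts:  # key received too many times (claim 2)
            return None
        if not hmac.compare_digest(pending.random_key.encode(), received_key.encode()):
            return None
        return pending.user_id
```

If `check` returns a user id, the server would go on to open the voice data collection channel with the handheld terminal.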
The present application also provides a computer-readable storage medium storing a processing system which, when executed by a processor, implements the steps of the voiceprint verification method described above.
The serial numbers of the embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the methods described in the embodiments of the present application.
The above are only preferred embodiments of the present application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the description and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims (20)

  1. A server, characterized in that the server comprises a memory and a processor connected to the memory, the memory storing a processing system operable on the processor, and the processing system, when executed by the processor, implements the following steps:
    a generating step: after receiving an identity verification request that carries a user identity identifier and is sent by a client computer, generating graphic code parameters of a graphic code corresponding to the user identity identifier and sending the graphic code parameters to the client computer, so that the client computer generates and displays the graphic code corresponding to the graphic code parameters, the graphic code parameters including a random key and a voiceprint data collection link address;
    an analyzing step: after a handheld terminal parses the graphic code to obtain the random key and the voiceprint data collection link address, receiving a voiceprint verification request that carries the random key and is sent by the handheld terminal via the voiceprint data collection link address, and analyzing whether the random key in the graphic code parameters sent to the client computer is consistent with the random key received from the handheld terminal;
    an acquiring step: if so, establishing a voice data collection channel with the handheld terminal, and acquiring, via the voice data collection channel, the user's current voiceprint verification voice data collected by the handheld terminal;
    a verifying step: constructing a current voiceprint discrimination vector corresponding to the current voiceprint verification voice data; determining the standard voiceprint discrimination vector corresponding to the user identity identifier according to a predetermined mapping between user identity identifiers and standard voiceprint discrimination vectors; calculating the distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector; generating an identity verification result based on the calculated distance; and sending the identity verification result to the client computer.
  2. The server according to claim 1, characterized in that the analyzing step specifically comprises:
    the server receiving the voiceprint verification request that carries the random key and is sent by the handheld terminal via the voiceprint data collection link address, and analyzing whether the number of times the random key has been received is greater than a preset number;
    if it is less than or equal to the preset number, analyzing whether the random key in the graphic code parameters sent to the client computer is consistent with the random key received from the handheld terminal.
  3. The server according to claim 1, characterized in that the graphic code parameters further include a validity period of the graphic code, and the analyzing step specifically comprises:
    the server receiving the voiceprint verification request that carries the random key and is sent by the handheld terminal via the voiceprint data collection link address, and analyzing whether the time at which the random key is received falls within the validity period of the graphic code;
    if it falls within the validity period of the graphic code, analyzing whether the number of times the random key has been received is greater than a preset number;
    if it is less than or equal to the preset number, analyzing whether the random key in the graphic code parameters sent to the client computer is consistent with the random key received from the handheld terminal.
  4. The server according to claim 1, characterized in that the step of constructing the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data specifically comprises:
    processing the current voiceprint verification voice data to extract voiceprint features of a preset type, and constructing a corresponding voiceprint feature vector based on the voiceprint features of the preset type;
    inputting the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data;
    and the step of calculating the distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector and generating the identity verification result based on the calculated distance comprises:
    calculating the cosine distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector:
    $$d(\vec{x}, \vec{y}) = 1 - \frac{\vec{x} \cdot \vec{y}}{\|\vec{x}\| \, \|\vec{y}\|}$$
    where $\vec{x}$ is the standard voiceprint discrimination vector and $\vec{y}$ is the current voiceprint discrimination vector;
    if the cosine distance is less than or equal to a preset distance threshold, generating information indicating that verification has passed;
    if the cosine distance is greater than the preset distance threshold, generating information indicating that verification has failed.
  5. The server according to claim 2, characterized in that the step of constructing the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data specifically comprises:
    processing the current voiceprint verification voice data to extract voiceprint features of a preset type, and constructing a corresponding voiceprint feature vector based on the voiceprint features of the preset type;
    inputting the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data;
    and the step of calculating the distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector and generating the identity verification result based on the calculated distance comprises:
    calculating the cosine distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector:
    $$d(\vec{x}, \vec{y}) = 1 - \frac{\vec{x} \cdot \vec{y}}{\|\vec{x}\| \, \|\vec{y}\|}$$
    where $\vec{x}$ is the standard voiceprint discrimination vector and $\vec{y}$ is the current voiceprint discrimination vector;
    if the cosine distance is less than or equal to a preset distance threshold, generating information indicating that verification has passed;
    if the cosine distance is greater than the preset distance threshold, generating information indicating that verification has failed.
  6. The server according to claim 3, characterized in that the step of constructing the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data specifically comprises:
    processing the current voiceprint verification voice data to extract voiceprint features of a preset type, and constructing a corresponding voiceprint feature vector based on the voiceprint features of the preset type;
    inputting the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data;
    and the step of calculating the distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector and generating the identity verification result based on the calculated distance comprises:
    calculating the cosine distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector:
    $$d(\vec{x}, \vec{y}) = 1 - \frac{\vec{x} \cdot \vec{y}}{\|\vec{x}\| \, \|\vec{y}\|}$$
    where $\vec{x}$ is the standard voiceprint discrimination vector and $\vec{y}$ is the current voiceprint discrimination vector;
    if the cosine distance is less than or equal to a preset distance threshold, generating information indicating that verification has passed;
    if the cosine distance is greater than the preset distance threshold, generating information indicating that verification has failed.
  7. The server according to claim 4, 5 or 6, characterized in that the step of processing the current voiceprint verification voice data to extract voiceprint features of a preset type and constructing the corresponding voiceprint feature vector based on the voiceprint features of the preset type specifically comprises:
    performing pre-emphasis, framing and windowing on the current voiceprint verification voice data, performing a Fourier transform on each windowed frame to obtain the corresponding spectrum, and inputting the spectrum into a Mel filter bank to output a Mel spectrum;
    performing cepstral analysis on the Mel spectrum to obtain Mel frequency cepstral coefficients (MFCC), and forming the corresponding voiceprint feature vector from the MFCCs.
  8. A voiceprint verification method, characterized in that the method comprises:
    S1: after receiving an identity verification request that carries a user identity identifier and is sent by a client computer, a server generates graphic code parameters of a graphic code corresponding to the user identity identifier and sends the graphic code parameters to the client computer, so that the client computer generates and displays the graphic code corresponding to the graphic code parameters, the graphic code parameters including a random key and a voiceprint data collection link address;
    S2: after a handheld terminal parses the graphic code to obtain the random key and the voiceprint data collection link address, the server receives a voiceprint verification request that carries the random key and is sent by the handheld terminal via the voiceprint data collection link address, and analyzes whether the random key in the graphic code parameters sent to the client computer is consistent with the random key received from the handheld terminal;
    S3: if so, the server establishes a voice data collection channel with the handheld terminal and acquires, via the voice data collection channel, the user's current voiceprint verification voice data collected by the handheld terminal;
    S4: constructing a current voiceprint discrimination vector corresponding to the current voiceprint verification voice data; determining the standard voiceprint discrimination vector corresponding to the user identity identifier according to a predetermined mapping between user identity identifiers and standard voiceprint discrimination vectors; calculating the distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector; generating an identity verification result based on the calculated distance; and sending the identity verification result to the client computer.
  9. The voiceprint verification method according to claim 8, characterized in that step S2 specifically comprises:
    the server receiving the voiceprint verification request that carries the random key and is sent by the handheld terminal via the voiceprint data collection link address, and analyzing whether the number of times the random key has been received is greater than a preset number;
    if it is less than or equal to the preset number, analyzing whether the random key in the graphic code parameters sent to the client computer is consistent with the random key received from the handheld terminal.
  10. The voiceprint verification method according to claim 8, characterized in that the graphic code parameters further include a validity period of the graphic code, and step S2 specifically comprises:
    the server receiving the voiceprint verification request that carries the random key and is sent by the handheld terminal via the voiceprint data collection link address, and analyzing whether the time at which the random key is received falls within the validity period of the graphic code;
    if it falls within the validity period of the graphic code, analyzing whether the number of times the random key has been received is greater than a preset number;
    if it is less than or equal to the preset number, analyzing whether the random key in the graphic code parameters sent to the client computer is consistent with the random key received from the handheld terminal.
  11. The voiceprint verification method according to claim 8, characterized in that the step of constructing the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data specifically comprises:
    processing the current voiceprint verification voice data to extract voiceprint features of a preset type, and constructing a corresponding voiceprint feature vector based on the voiceprint features of the preset type;
    inputting the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data;
    and the step of calculating the distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector and generating the identity verification result based on the calculated distance comprises:
    calculating the cosine distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector:
    $$d(\vec{x}, \vec{y}) = 1 - \frac{\vec{x} \cdot \vec{y}}{\|\vec{x}\| \, \|\vec{y}\|}$$
    where $\vec{x}$ is the standard voiceprint discrimination vector and $\vec{y}$ is the current voiceprint discrimination vector;
    if the cosine distance is less than or equal to a preset distance threshold, generating information indicating that verification has passed;
    if the cosine distance is greater than the preset distance threshold, generating information indicating that verification has failed.
  12. The voiceprint verification method according to claim 9, characterized in that the step of constructing the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data specifically comprises:
    processing the current voiceprint verification voice data to extract voiceprint features of a preset type, and constructing a corresponding voiceprint feature vector based on the voiceprint features of the preset type;
    inputting the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data;
    and the step of calculating the distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector and generating the identity verification result based on the calculated distance comprises:
    calculating the cosine distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector:
    $$d(\vec{x}, \vec{y}) = 1 - \frac{\vec{x} \cdot \vec{y}}{\|\vec{x}\| \, \|\vec{y}\|}$$
    where $\vec{x}$ is the standard voiceprint discrimination vector and $\vec{y}$ is the current voiceprint discrimination vector;
    if the cosine distance is less than or equal to a preset distance threshold, generating information indicating that verification has passed;
    if the cosine distance is greater than the preset distance threshold, generating information indicating that verification has failed.
  13. The voiceprint verification method according to claim 10, characterized in that the step of constructing the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data specifically comprises:
    processing the current voiceprint verification voice data to extract voiceprint features of a preset type, and constructing a corresponding voiceprint feature vector based on the voiceprint features of the preset type;
    inputting the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data;
    and the step of calculating the distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector and generating the identity verification result based on the calculated distance comprises:
    calculating the cosine distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector:
    $$d(\vec{x}, \vec{y}) = 1 - \frac{\vec{x} \cdot \vec{y}}{\|\vec{x}\| \, \|\vec{y}\|}$$
    where $\vec{x}$ is the standard voiceprint discrimination vector and $\vec{y}$ is the current voiceprint discrimination vector;
    if the cosine distance is less than or equal to a preset distance threshold, generating information indicating that verification has passed;
    if the cosine distance is greater than the preset distance threshold, generating information indicating that verification has failed.
  14. The voiceprint verification method according to claim 11, 12 or 13, characterized in that the step of processing the current voiceprint verification voice data to extract voiceprint features of a preset type and constructing the corresponding voiceprint feature vector based on the voiceprint features of the preset type specifically comprises:
    performing pre-emphasis, framing and windowing on the current voiceprint verification voice data, performing a Fourier transform on each windowed frame to obtain the corresponding spectrum, and inputting the spectrum into a Mel filter bank to output a Mel spectrum;
    performing cepstral analysis on the Mel spectrum to obtain Mel frequency cepstral coefficients (MFCC), and forming the corresponding voiceprint feature vector from the MFCCs.
  15. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a processing system which, when executed by a processor, implements the following steps:
    a generating step: after receiving an identity verification request that carries a user identity identifier and is sent by a client computer, generating graphic code parameters of a graphic code corresponding to the user identity identifier and sending the graphic code parameters to the client computer, so that the client computer generates and displays the graphic code corresponding to the graphic code parameters, the graphic code parameters including a random key and a voiceprint data collection link address;
    an analyzing step: after a handheld terminal parses the graphic code to obtain the random key and the voiceprint data collection link address, receiving a voiceprint verification request that carries the random key and is sent by the handheld terminal via the voiceprint data collection link address, and analyzing whether the random key in the graphic code parameters sent to the client computer is consistent with the random key received from the handheld terminal;
    an acquiring step: if so, establishing a voice data collection channel with the handheld terminal, and acquiring, via the voice data collection channel, the user's current voiceprint verification voice data collected by the handheld terminal;
    a verifying step: constructing a current voiceprint discrimination vector corresponding to the current voiceprint verification voice data; determining the standard voiceprint discrimination vector corresponding to the user identity identifier according to a predetermined mapping between user identity identifiers and standard voiceprint discrimination vectors; calculating the distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector; generating an identity verification result based on the calculated distance; and sending the identity verification result to the client computer.
  16. The computer-readable storage medium according to claim 15, wherein the analyzing step specifically comprises:
    receiving, by the server, the voiceprint verification request that carries the random key and is sent by the handheld terminal via the voiceprint data collection link address, and analyzing whether the number of times the random key has been received is greater than a preset number;
    if it is less than or equal to the preset number, analyzing whether the random key in the graphic code parameters sent to the client computer is consistent with the random key received from the handheld terminal.
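Claim 16 places a receipt counter in front of the key comparison. A minimal sketch, assuming the count is kept per issued graphic code (keyed here by a hypothetical session id) and that each request increments it before the limit is checked; the class name and the default limit of 3 are illustrative only, since the claim leaves the preset number open.

```python
import hmac


class KeyCheckWithRetryLimit:
    """Hypothetical analyzing step with a cap on how often a key may be received."""

    def __init__(self, issued_keys, max_attempts=3):
        self.issued_keys = issued_keys    # session id -> key sent to the client computer
        self.max_attempts = max_attempts  # the "preset number of times" (assumed value)
        self.attempts = {}                # session id -> receipts so far

    def check(self, session_id, received_key):
        # Count this receipt, then refuse once the preset number is exceeded.
        count = self.attempts.get(session_id, 0) + 1
        self.attempts[session_id] = count
        if count > self.max_attempts:
            return False
        # Within the limit: compare the issued key with the received key.
        expected = self.issued_keys.get(session_id)
        return expected is not None and hmac.compare_digest(
            expected.encode(), received_key.encode()
        )
```

Counting before comparing means a replayed or brute-forced key stops working after the limit even if it is correct.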
  17. The computer-readable storage medium according to claim 15, wherein the graphic code parameters further comprise a validity period of the graphic code, and the analyzing step specifically comprises:
    receiving, by the server, the voiceprint verification request that carries the random key and is sent by the handheld terminal via the voiceprint data collection link address, and analyzing whether the time at which the random key is received falls within the validity period of the graphic code;
    if it falls within the validity period of the graphic code, analyzing whether the number of times the random key has been received is greater than a preset number;
    if it is less than or equal to the preset number, analyzing whether the random key in the graphic code parameters sent to the client computer is consistent with the random key received from the handheld terminal.
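Claim 17 adds a validity-period check ahead of the counter and the key comparison. The sketch below follows that order (time, then count, then key). It assumes the validity period is stored as a number of seconds from issuance and that an injectable clock is used so the logic is testable; `TimedKeyCheck`, `register`, and the default values are hypothetical names, not part of the patent.

```python
import hmac
import time


class TimedKeyCheck:
    """Hypothetical analyzing step: validity period, then receipt count, then key."""

    def __init__(self, max_attempts=3, clock=time.monotonic):
        self.max_attempts = max_attempts  # assumed "preset number of times"
        self.clock = clock                # injectable time source
        self.codes = {}                   # session id -> (key, issued_at, valid_seconds)
        self.attempts = {}                # session id -> receipts within the validity period

    def register(self, session_id, key, valid_seconds):
        # Record the graphic code parameters at the moment they are issued.
        self.codes[session_id] = (key, self.clock(), valid_seconds)

    def check(self, session_id, received_key):
        code = self.codes.get(session_id)
        if code is None:
            return False
        key, issued_at, valid_seconds = code
        # 1. Was the key received within the graphic code's validity period?
        if self.clock() - issued_at > valid_seconds:
            return False
        # 2. Has the key been received more than the preset number of times?
        count = self.attempts.get(session_id, 0) + 1
        self.attempts[session_id] = count
        if count > self.max_attempts:
            return False
        # 3. Does the received key match the key in the graphic code parameters?
        return hmac.compare_digest(key.encode(), received_key.encode())
```

Because the time check returns early, requests arriving after expiry do not consume the attempt budget; whether the real system counts them is not stated in the claim.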
  18. The computer-readable storage medium according to claim 15, wherein the step of constructing the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data specifically comprises:
    processing the current voiceprint verification voice data to extract voiceprint features of a preset type, and constructing a corresponding voiceprint feature vector based on the voiceprint features of the preset type;
    inputting the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data;
    and the step of calculating the distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector and generating the identity verification result based on the calculated distance comprises:
    calculating the cosine distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector:
    $$d(\vec{x}_i, \vec{x}_j) = 1 - \frac{\vec{x}_i \cdot \vec{x}_j}{\lVert \vec{x}_i \rVert \, \lVert \vec{x}_j \rVert}$$
    where $\vec{x}_i$ is the standard voiceprint discrimination vector and $\vec{x}_j$ is the current voiceprint discrimination vector;
    if the cosine distance is less than or equal to a preset distance threshold, generating information indicating that the verification has passed;
    if the cosine distance is greater than the preset distance threshold, generating information indicating that the verification has failed.
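The distance-and-threshold decision can be illustrated with plain Python. This sketch assumes the cosine distance is 1 minus the cosine similarity, so a smaller distance means more similar voiceprints, which is what the claim's "less than or equal to the threshold passes" rule requires. The dictionary `standard_vectors` stands in for the predetermined mapping from user identity identifiers to standard voiceprint discrimination vectors, and the threshold of 0.3 is purely illustrative; the patent does not disclose a value.

```python
import math


def cosine_distance(standard, current):
    """1 - cosine similarity of two equal-length vectors (assumed definition)."""
    if len(standard) != len(current):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(standard, current))
    norm_s = math.sqrt(sum(a * a for a in standard))
    norm_c = math.sqrt(sum(b * b for b in current))
    if norm_s == 0 or norm_c == 0:
        raise ValueError("a zero vector has no direction")
    return 1.0 - dot / (norm_s * norm_c)


def verify(user_id, current_vector, standard_vectors, threshold=0.3):
    """Look up the enrolled vector for user_id and apply the threshold rule.

    standard_vectors: hypothetical mapping of user id -> standard vector.
    threshold: illustrative value only.
    """
    standard = standard_vectors.get(user_id)
    if standard is None:
        return "verification failed"  # no enrolled voiceprint for this user
    d = cosine_distance(standard, current_vector)
    return "verification passed" if d <= threshold else "verification failed"
```

For example, with an enrolled vector `[1.0, 0.0, 0.0]`, a current vector `[0.9, 0.1, 0.0]` gives a distance of about 0.006 and passes, while an orthogonal vector gives a distance of 1.0 and fails.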
  19. The computer-readable storage medium according to claim 16, wherein the step of constructing the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data specifically comprises:
    processing the current voiceprint verification voice data to extract voiceprint features of a preset type, and constructing a corresponding voiceprint feature vector based on the voiceprint features of the preset type;
    inputting the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data;
    and the step of calculating the distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector and generating the identity verification result based on the calculated distance comprises:
    calculating the cosine distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector:
    $$d(\vec{x}_i, \vec{x}_j) = 1 - \frac{\vec{x}_i \cdot \vec{x}_j}{\lVert \vec{x}_i \rVert \, \lVert \vec{x}_j \rVert}$$
    where $\vec{x}_i$ is the standard voiceprint discrimination vector and $\vec{x}_j$ is the current voiceprint discrimination vector;
    if the cosine distance is less than or equal to a preset distance threshold, generating information indicating that the verification has passed;
    if the cosine distance is greater than the preset distance threshold, generating information indicating that the verification has failed.
  20. The computer-readable storage medium according to claim 17, wherein the step of constructing the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data specifically comprises:
    processing the current voiceprint verification voice data to extract voiceprint features of a preset type, and constructing a corresponding voiceprint feature vector based on the voiceprint features of the preset type;
    inputting the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data;
    and the step of calculating the distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector and generating the identity verification result based on the calculated distance comprises:
    calculating the cosine distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector:
    $$d(\vec{x}_i, \vec{x}_j) = 1 - \frac{\vec{x}_i \cdot \vec{x}_j}{\lVert \vec{x}_i \rVert \, \lVert \vec{x}_j \rVert}$$
    where $\vec{x}_i$ is the standard voiceprint discrimination vector and $\vec{x}_j$ is the current voiceprint discrimination vector;
    if the cosine distance is less than or equal to a preset distance threshold, generating information indicating that the verification has passed;
    if the cosine distance is greater than the preset distance threshold, generating information indicating that the verification has failed.
PCT/CN2018/102049 2018-05-14 2018-08-24 Server, voiceprint verification method, and storage medium WO2019218512A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810457267.1A CN108650266B (en) 2018-05-14 2018-05-14 Server, voiceprint verification method and storage medium
CN201810457267.1 2018-05-14

Publications (1)

Publication Number Publication Date
WO2019218512A1 (en) 2019-11-21

Family

ID=63755329

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/102049 WO2019218512A1 (en) 2018-05-14 2018-08-24 Server, voiceprint verification method, and storage medium

Country Status (2)

Country Link
CN (1) CN108650266B (en)
WO (1) WO2019218512A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109462482B (en) * 2018-11-09 2023-08-08 深圳壹账通智能科技有限公司 Voiceprint recognition method, voiceprint recognition device, electronic equipment and computer readable storage medium
CN113129903A (en) * 2019-12-31 2021-07-16 深圳市航盛电子股份有限公司 Automatic audio test method and device, computer equipment and storage medium
CN113973299B (en) * 2020-07-22 2023-09-29 中国石油化工股份有限公司 Wireless sensor with identity authentication function and identity authentication method
CN111931146B (en) * 2020-07-24 2024-01-19 捷德(中国)科技有限公司 Identity verification method, device, equipment and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
WO2015059365A1 (en) * 2013-10-25 2015-04-30 Aplcomp Oy Audiovisual associative authentication method and related system
CN105100123A (en) * 2015-09-11 2015-11-25 深圳市亚略特生物识别科技有限公司 Application registration method and system
CN107517207A (en) * 2017-03-13 2017-12-26 平安科技(深圳)有限公司 Server, identity verification method and computer-readable recording medium
CN107993071A (en) * 2017-11-21 2018-05-04 平安科技(深圳)有限公司 Electronic device, voiceprint-based identity verification method and storage medium

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
JP4463526B2 (en) * 2003-10-24 2010-05-19 株式会社ユニバーサルエンターテインメント Voiceprint authentication system
CN107610707B (en) * 2016-12-15 2018-08-31 平安科技(深圳)有限公司 Voiceprint recognition method and device

Also Published As

Publication number Publication date
CN108650266A (en) 2018-10-12
CN108650266B (en) 2020-02-18

Similar Documents

Publication Publication Date Title
WO2018166187A1 (en) Server, identity verification method and system, and a computer-readable storage medium
WO2019100606A1 (en) Electronic device, voiceprint-based identity verification method and system, and storage medium
US10825452B2 (en) Method and apparatus for processing voice data
JP6621536B2 (en) Electronic device, identity authentication method, system, and computer-readable storage medium
WO2019218512A1 (en) Server, voiceprint verification method, and storage medium
JP6649474B2 (en) Voiceprint identification method, apparatus and background server
WO2019205369A1 (en) Electronic device, identity recognition method based on human face image and voiceprint information, and storage medium
WO2019136912A1 (en) Electronic device, identity authentication method and system, and storage medium
CN109660509A (en) Login method, device, system and storage medium based on recognition of face
WO2021051572A1 (en) Voice recognition method and apparatus, and computer device
WO2019196305A1 (en) Electronic device, identity verification method, and storage medium
CN110247898B (en) Identity verification method, identity verification device, identity verification medium and electronic equipment
CN109947971B (en) Image retrieval method, image retrieval device, electronic equipment and storage medium
CN105224844B (en) Verification method, system and device
WO2019218515A1 (en) Server, voiceprint-based identity authentication method, and storage medium
WO2020007191A1 (en) Method and apparatus for living body recognition and detection, and medium and electronic device
CN111709851B (en) Hotel safety check-in method, device and equipment based on RFID and facial recognition
US20170277423A1 (en) Information processing method and electronic device
WO2021196458A1 (en) Intelligent loan entry method, and apparatus and storage medium
CN115690920B (en) Credible living body detection method for medical identity authentication and related equipment
CN113436633B (en) Speaker recognition method, speaker recognition device, computer equipment and storage medium
CN113393318A (en) Bank card application wind control method and device, electronic equipment and medium
CN116629901A (en) Request processing method, device, computer equipment and storage medium
CN113762060A (en) Face image detection method and device, readable medium and electronic equipment

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 18919055

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN EP: public notification in the EP Bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 26/02/2021)

122 EP: PCT application non-entry in European phase

Ref document number: 18919055

Country of ref document: EP

Kind code of ref document: A1