WO2019218512A1 - Server, voiceprint verification method and storage medium - Google Patents

Server, voiceprint verification method and storage medium

Info

Publication number
WO2019218512A1
WO2019218512A1 (PCT/CN2018/102049)
Authority
WO
WIPO (PCT)
Prior art keywords
voiceprint
vector
current
verification
graphic code
Prior art date
Application number
PCT/CN2018/102049
Other languages
English (en)
Chinese (zh)
Inventor
程序
彭俊清
王健宗
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2019218512A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 63/00: Network architectures or network communication protocols for network security
    • H04L 63/08: Network architectures or network communication protocols for network security for authentication of entities
    • H04L 63/0861: Network architectures or network communication protocols for network security for authentication of entities using biometrical features, e.g. fingerprint, retina-scan
    • H04L 63/0807: Network architectures or network communication protocols for network security for authentication of entities using tickets, e.g. Kerberos
    • H04L 63/083: Network architectures or network communication protocols for network security for authentication of entities using passwords
    • H04L 63/0838: Network architectures or network communication protocols for network security for authentication of entities using passwords using one-time-passwords
    • H04L 63/0876: Network architectures or network communication protocols for network security for authentication of entities based on the identity of the terminal or configuration, e.g. MAC address, hardware or software configuration or device fingerprint
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00: Speaker identification or verification techniques
    • G10L 17/22: Interactive procedures; Man-machine interfaces
    • G10L 17/24: Interactive procedures; Man-machine interfaces, the user being prompted to utter a password or a predefined phrase

Definitions

  • the present application relates to the field of communications technologies, and in particular, to a server, a method for voiceprint verification, and a storage medium.
  • Using voiceprint verification technology to verify user identity has become an important means of verification for major customer-service companies (e.g., banks, insurance companies, game companies, etc.).
  • The traditional business solution for user authentication with voiceprint verification technology is to develop a dedicated client program against the interface of the voiceprint verification server; the developed client program collects and pre-processes the user's voice, transmits the pre-processed voiceprint data to the voiceprint verification server, and the voiceprint verification server performs authentication verification and further processing on the transmitted voiceprint data.
  • The purpose of the present application is to provide a server, a voiceprint verification method and a storage medium, which aim to improve the flexibility of voiceprint verification and avoid sound hijacking.
  • The present application provides a server including a memory and a processor coupled to the memory, the memory storing a processing system operable on the processor; when executed by the processor, the processing system implements the following steps:
  • A generating step: after receiving the identity verification request sent by the client computer and carrying the user identity, generate the graphic code parameter of the graphic code corresponding to the user identity, and send the graphic code parameter to the client computer so that the client computer generates and displays the graphic code corresponding to the graphic code parameter, where the graphic code parameter includes a random key and a voiceprint data collection link address;
  • An analyzing step: after the handheld terminal parses the graphic code to obtain the random key and the voiceprint data collection link address, receive the voiceprint verification request sent by the handheld terminal through the voiceprint data collection link address and carrying the random key, and analyze whether the random key in the graphic code parameter sent to the client computer is consistent with the random key received from the handheld terminal;
  • The present application further provides a voiceprint verification method, which includes:
  • After receiving the identity verification request sent by the client computer and carrying the user identity, the server generates the graphic code parameter of the graphic code corresponding to the user identity, and sends the graphic code parameter to the client computer so that the client computer generates and displays the graphic code corresponding to the graphic code parameter, where the graphic code parameter includes a random key and a voiceprint data collection link address;
  • After the handheld terminal parses the graphic code to obtain the random key and the voiceprint data collection link address, the server receives the voiceprint verification request sent by the handheld terminal through the voiceprint data collection link address and carrying the random key, and analyzes whether the random key in the graphic code parameter sent to the client computer is consistent with the random key received from the handheld terminal;
  • the server establishes a voice data collection channel with the handheld terminal, and acquires current voiceprint verification voice data of the user collected from the handheld terminal based on the voice data collection channel;
  • The present application also provides a computer-readable storage medium storing a processing system which, when executed by a processor, implements the steps of the voiceprint verification method described above.
  • The beneficial effects of the present application are as follows: no specially developed client program is required to collect the user's voice data; voiceprint verification using a handheld terminal is highly flexible and not easily interfered with; and the server is bound to the client computer by the user identity, after which the random key binds the client computer, the server and the handheld terminal together, avoiding sound hijacking and improving the authenticity and security of voiceprint verification.
  • FIG. 1 is a schematic diagram of an optional application environment of each embodiment of the present application.
  • FIG. 2 is a schematic flow chart of an embodiment of a method for voiceprint verification according to the present application.
  • Referring to FIG. 1, it is a schematic diagram of the application environment of a preferred embodiment of the voiceprint verification method of the present application.
  • the application environment diagram includes a server 1, a client computer 2, and a handheld terminal 3.
  • the server 1 can perform data interaction with the client computer 2 and the handheld terminal 3 through a suitable technology such as a network or a near field communication technology.
  • The client computer 2 includes, but is not limited to, any electronic product that can interact with a user through a keyboard, a mouse, a remote controller, a touch panel, or a voice control device, for example, mobile devices such as personal computers, tablets, smart phones, personal digital assistants (PDA), game consoles, Internet Protocol television (IPTV) devices, smart wearable devices and navigation devices, or fixed terminals such as digital TVs, desktop computers, notebooks and servers.
  • the handheld terminal 3 can be a tablet computer, a smart phone, or the like.
  • The server 1 is a device capable of automatically performing numerical calculation and/or information processing in accordance with preset or pre-stored instructions.
  • The server 1 may be a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of hosts or network servers based on cloud computing, where cloud computing is a kind of distributed computing: a super virtual computer consisting of a group of loosely coupled computers.
  • The server 1 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13 communicably connected to each other through a system bus, with the memory 11 storing a processing system operable on the processor 12. It is pointed out that FIG. 1 shows only the server 1 with components 11-13, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.
  • The memory 11 includes an internal memory and at least one type of readable storage medium.
  • the memory provides a cache for the operation of the server 1;
  • The readable storage medium may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk or an optical disk.
  • In some embodiments, the readable storage medium may be an internal storage unit of the server 1, such as a hard disk of the server 1; in other embodiments, the non-volatile storage medium may also be an external storage device of the server 1, such as a plug-in hard drive, a smart media card (SMC), a Secure Digital (SD) card or a flash card equipped on the server 1.
  • the readable storage medium of the memory 11 is generally used to store an operating system installed on the server 1 and various types of application software, such as program code for storing the processing system in an embodiment of the present application. Further, the memory 11 can also be used to temporarily store various types of data that have been output or are to be output.
  • the processor 12 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments.
  • The processor 12 is typically used to control the overall operation of the server 1, such as performing control and processing related to data interaction or communication with the client computer 2 and the handheld terminal 3.
  • the processor 12 is configured to run program code or process data stored in the memory 11, such as running a processing system or the like.
  • the network interface 13 may comprise a wireless network interface or a wired network interface, which is typically used to establish a communication connection between the server 1 and other electronic devices.
  • the network interface 13 is mainly used to connect the server 1 to the client computer 2 and the handheld terminal 3, and establish a data transmission channel and a communication connection between the server 1 and the client computer 2 and the handheld terminal 3.
  • the processing system is stored in the memory 11 and includes at least one computer readable instruction stored in the memory 11, the at least one computer readable instruction being executable by the processor 12 to implement the methods of various embodiments of the present application;
  • the at least one computer readable instruction can be classified into different logic modules depending on the functions implemented by its various parts.
  • A generating step: after receiving the identity verification request sent by the client computer and carrying the user identity, generate the graphic code parameter of the graphic code corresponding to the user identity, and send the graphic code parameter to the client computer so that the client computer generates and displays the graphic code corresponding to the graphic code parameter;
  • The user identity is an identifier that uniquely identifies the user; for example, the user identity may be an identity card number.
  • the graphic code is preferably a two-dimensional code, but is not limited thereto, and may be, for example, a barcode or the like.
  • the graphic code parameter is used to generate a corresponding graphic code, for example, a two-dimensional code parameter generates a corresponding two-dimensional code, and the barcode parameter generates a corresponding barcode.
  • The graphic code parameter includes a random key and a voiceprint data collection link address, and may further include the valid time of the graphic code, detailed information of the graphic code, a scene value ID of the graphic code, and the like; the random key may be a random number string, a random character string, and so on.
  • The client computer sends an identity verification request carrying the user identity to the server; after receiving the identity verification request, the server generates the graphic code parameter corresponding to the user identity (the random key, the voiceprint data collection link address of the server, the valid time of the graphic code, the detailed information of the graphic code, the scene value ID of the graphic code, and the like) and sends it to the client computer; after receiving the graphic code parameter, the client computer generates the corresponding graphic code according to the graphic code parameter and displays it for scanning by the handheld terminal.
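As a concrete illustration of the generation flow above, the following Python sketch shows how a server might assemble the graphic code parameter (random key, voiceprint data collection link address, valid time). The field names, the example URL, and the ten-day validity window are illustrative assumptions, not values specified by the application.

```python
import json
import secrets
from datetime import datetime, timedelta

def generate_graphic_code_params(user_identity: str,
                                 collection_url: str = "https://example.com/voiceprint/collect",
                                 valid_days: int = 10) -> dict:
    """Build the graphic code parameter the server returns to the client
    computer: a random key, the voiceprint data collection link address,
    and a validity window for the graphic code."""
    now = datetime.now()
    return {
        "user_identity": user_identity,
        "random_key": secrets.token_hex(16),   # hard-to-guess one-time key
        "collection_url": collection_url,      # voiceprint data collection link address
        "valid_from": now.isoformat(timespec="seconds"),
        "valid_until": (now + timedelta(days=valid_days)).isoformat(timespec="seconds"),
    }

# The client computer would encode this JSON payload into a QR code
# (e.g. with a QR library) and display it for the handheld terminal to scan.
params = generate_graphic_code_params("110101199001011234")
payload = json.dumps(params)
```

Binding the random key to the user identity at generation time is what later lets the server associate the handheld terminal's request with the client computer's session.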
  • An analyzing step: after the handheld terminal parses the graphic code to obtain the random key and the voiceprint data collection link address, receive the voiceprint verification request sent by the handheld terminal through the voiceprint data collection link address and carrying the random key, and analyze whether the random key in the graphic code parameter sent to the client computer is consistent with the random key received from the handheld terminal;
  • After scanning the graphic code, the handheld terminal parses the graphic code with its own graphic-code parsing module and obtains the corresponding random key, the voiceprint data collection link address of the server, the valid time of the graphic code, the detailed information of the graphic code, the scene value ID of the graphic code, and the like; the handheld terminal then sends a voiceprint verification request carrying the random key to the server through the voiceprint data collection link address.
  • After receiving the voiceprint verification request, the server analyzes whether the random key in the graphic code parameter sent to the client computer is consistent with the random key received from the handheld terminal, in order to prevent another handheld terminal from stealing the current random key.
  • Further, the server receives the voiceprint verification request sent by the handheld terminal through the voiceprint data collection link address and carrying the random key, and first analyzes whether the number of times this random key has been received is greater than a preset number. If the number of times the random key has been received is greater than the preset number (for example, greater than one), the server refuses to respond to the voiceprint verification request, and may record the related information of the handheld terminal for subsequent use as a reference on whether the voiceprint verification is fraudulent; if it is less than or equal to the preset number, the server performs the operation of analyzing whether the random key in the graphic code parameter sent to the client computer is consistent with the random key received from the handheld terminal.
  • In addition, the server receives the voiceprint verification request sent by the handheld terminal through the voiceprint data collection link address and carrying the random key, and first analyzes whether the time at which the random key is received falls within the valid time range of the graphic code. For example, if the valid time of the graphic code is 2018.03.01-2018.03.10 and the server receives the handheld terminal's random key at 2018.03.08, that time is within the valid time range of the graphic code. If the time is outside the valid time range, the server refuses to respond to the voiceprint verification request, and may record the related information of the handheld terminal for subsequent use as a reference on whether the voiceprint verification is fraudulent; if the time is within the valid time range, the server performs the operation of analyzing whether the random key in the graphic code parameter sent to the client computer is consistent with the random key received from the handheld terminal.
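The three checks described above (use count against a preset number, valid time range, and key consistency) can be sketched as one server-side validation routine. The `GraphicCodeSession` structure and the one-use limit are illustrative assumptions; the application leaves these details open.

```python
from datetime import datetime

class GraphicCodeSession:
    """Tracks one issued graphic code on the server side (hypothetical
    structure; field names are illustrative, not from the application)."""
    def __init__(self, random_key, valid_from, valid_until, max_uses=1):
        self.random_key = random_key
        self.valid_from = valid_from
        self.valid_until = valid_until
        self.max_uses = max_uses
        self.use_count = 0          # every presentation of the key is counted

def check_voiceprint_request(session, received_key, received_at):
    """Apply the checks described above before opening a voice data
    collection channel; returns (accepted, reason)."""
    session.use_count += 1
    if session.use_count > session.max_uses:
        return False, "random key presented more than the preset number of times"
    if not (session.valid_from <= received_at <= session.valid_until):
        return False, "request outside the graphic code's valid time range"
    if received_key != session.random_key:
        return False, "random key does not match the one sent to the client computer"
    return True, "ok"

session = GraphicCodeSession("abc123",
                             datetime(2018, 3, 1), datetime(2018, 3, 10))
print(check_voiceprint_request(session, "abc123", datetime(2018, 3, 8)))  # first use: accepted
print(check_voiceprint_request(session, "abc123", datetime(2018, 3, 8)))  # replay: rejected
```

Counting every presentation of the key (even failed ones) is a deliberately strict choice here; a real deployment would also log the handheld terminal's details on rejection, as the passage above suggests.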
  • The handheld terminal collects the user's current voiceprint verification voice data through a voice collection device such as a microphone. When collecting the current voiceprint verification voice data, environmental noise and interference from the handheld terminal itself should be prevented as far as possible: the handheld terminal should be kept at an appropriate distance from the user, a large handheld terminal should preferably not be used, the power supply should preferably be mains power with a stable current, and a sensor should be used when recording. The current voiceprint verification voice data may be denoised before framing and sampling to further reduce interference. The collected voiceprint verification voice data is voice data of a preset data length, or voice data longer than a preset data length.
  • The step of constructing the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data includes: processing the current voiceprint verification voice data to extract a preset-type voiceprint feature, and constructing the corresponding voiceprint feature vector based on the preset-type voiceprint feature; and inputting the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data.
  • the voiceprint feature includes a plurality of types, such as a wide-band voiceprint, a narrow-band voiceprint, an amplitude voiceprint, and the like.
  • The preset-type voiceprint feature is preferably the Mel-frequency cepstral coefficients (MFCC) of the current voiceprint verification voice data; the default filter used is a Mel filter.
  • The voiceprint features of the current voiceprint verification voice data are composed into a feature data matrix, and this feature data matrix is the corresponding voiceprint feature vector.
  • Specifically, the current voiceprint verification voice data is pre-emphasized and windowed; a Fourier transform is performed on each window to obtain the corresponding spectrum, and the spectrum is input into a Mel filter to output the Mel spectrum; cepstral analysis is performed on the Mel spectrum to obtain the Mel-frequency cepstral coefficients (MFCC), and the corresponding voiceprint feature vector is formed based on the MFCCs.
  • The pre-emphasis processing is in effect a high-pass filtering process that attenuates the low-frequency data, so that the high-frequency characteristics of the current voiceprint verification voice data are more prominent.
  • The cepstral analysis on the Mel spectrum consists, for example, of taking the logarithm and applying an inverse transform; the inverse transform is generally realized by a discrete cosine transform (DCT), and the second to thirteenth coefficients after the DCT are taken as the Mel-frequency cepstral coefficients (MFCC). These MFCCs are the voiceprint feature of this frame of speech data.
  • The Mel-frequency cepstral coefficients of each frame are composed into a feature data matrix, which is the voiceprint feature vector of the speech sample data.
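The MFCC pipeline described above (pre-emphasis, windowing, Fourier transform, Mel filtering, logarithm, DCT, keeping the second to thirteenth coefficients) can be sketched in Python with NumPy. This is an illustrative implementation, not the application's code: the frame size, hop, filter count, and FFT size are assumed values.

```python
import numpy as np

def mfcc_features(signal, sample_rate=16000, frame_len=400, hop=160,
                  n_filters=26, n_coeffs=12):
    """Return a (frames, n_coeffs) MFCC feature matrix, i.e. the
    voiceprint feature vector described in the text."""
    # Pre-emphasis: attenuate low frequencies relative to high ones.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Frame the signal and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    frames = np.stack([emphasized[i*hop:i*hop+frame_len] for i in range(n_frames)])
    frames *= np.hamming(frame_len)
    # Power spectrum of each frame.
    nfft = 512
    power = np.abs(np.fft.rfft(frames, nfft)) ** 2 / nfft
    # Triangular Mel filter bank between 0 Hz and the Nyquist frequency.
    def hz_to_mel(hz): return 2595 * np.log10(1 + hz / 700.0)
    def mel_to_hz(mel): return 700 * (10 ** (mel / 2595.0) - 1)
    mel_points = np.linspace(0, hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((nfft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, nfft // 2 + 1))
    for m in range(1, n_filters + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log Mel spectrum, then cepstral analysis via DCT-II per frame,
    # keeping the 2nd-13th coefficients (indices 1..12).
    mel_spectrum = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_filters), 2 * n + 1) / (2 * n_filters))
    cepstrum = mel_spectrum @ dct.T
    return cepstrum[:, 1:1 + n_coeffs]
```

Each row of the returned matrix is one frame's voiceprint feature; stacking the rows gives the feature data matrix the text calls the voiceprint feature vector.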
  • Forming the voiceprint feature vector from the Mel-frequency cepstral coefficients can improve the accuracy of identity verification, because the Mel-spaced frequency bands approximate the human auditory system more closely than the linearly spaced bands used in the normal cepstrum.
  • The voiceprint feature vector is input into the pre-trained background channel model to construct the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data; for example, the pre-trained background channel model processes the feature matrix corresponding to the current voiceprint verification voice data to determine the current voiceprint discrimination vector.
  • The background channel model is a set of Gaussian mixture models, and its training process includes the following steps: 1. obtain a preset number of voice data samples, each voice data sample corresponding to a standard voiceprint discrimination vector; 2. process each voice data sample to extract its preset-type voiceprint feature, and construct the voiceprint feature vector corresponding to each voice data sample based on that feature; 3. train the Gaussian mixture model with the voiceprint feature vectors, and after training verify the accuracy of the trained Gaussian mixture model with a verification set; 4. if the accuracy is greater than a preset threshold (for example, 98.5%), end the training and use the trained Gaussian mixture model as the background channel model; if the accuracy is less than or equal to the preset threshold, increase the number of voice data samples and retrain until the accuracy of the Gaussian mixture model is greater than the preset threshold.
  • The background channel model pre-trained in this embodiment is obtained by mining and comparing a large amount of voice data. The model can accurately depict the background channel characteristics of the user's voice while maximally retaining the user's voiceprint features; these background characteristics can be removed at recognition time so that the inherent characteristics of the user's voice are extracted, which can greatly improve the accuracy and efficiency of user identity verification.
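The train-verify-retrain loop in steps 1-4 above can be sketched as follows, using scikit-learn's `GaussianMixture` as a stand-in for the background channel model. The application does not specify the accuracy metric, so the evaluation is passed in as a function; the batch sizes, component count, and dummy evaluator are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_background_model(sample_batches, evaluate, threshold=0.985,
                           n_components=8, seed=0):
    """Fit a Gaussian mixture on voiceprint feature vectors; stop once
    the verification-set accuracy returned by `evaluate(model)` exceeds
    the preset threshold, otherwise continue with more samples."""
    model = None
    for batch in sample_batches:          # successively larger sample sets
        model = GaussianMixture(n_components=n_components,
                                covariance_type="diag",
                                random_state=seed).fit(batch)
        if evaluate(model) > threshold:   # accuracy above the preset threshold
            break
    return model

# Toy usage with synthetic 12-dimensional MFCC-like vectors and a
# stand-in evaluator (a real one would score a labeled verification set).
rng = np.random.default_rng(0)
batches = [rng.normal(size=(200, 12)), rng.normal(size=(400, 12))]
model = train_background_model(batches, evaluate=lambda m: 0.99)
```

Growing the training set between iterations mirrors step 4's "increase the number of voice data samples and retrain" until the threshold is met.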
  • Calculating the distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector and generating the identity verification result based on the calculated distance includes: calculating the cosine distance between the two vectors, one being the standard voiceprint discrimination vector and the other being the current voiceprint discrimination vector; if the cosine distance is less than or equal to a preset distance threshold, generating information that the verification passes; if the cosine distance is greater than the preset distance threshold, generating information that the verification fails.
  • When storing the user's standard voiceprint discrimination vector, the user identity may be carried with it; in this way, the corresponding standard voiceprint discrimination vector can be obtained by matching the identity information associated with the current voiceprint discrimination vector, and the cosine distance between the current voiceprint discrimination vector and the matched standard voiceprint discrimination vector is calculated and used to verify the identity of the target user, which improves the accuracy of identity verification.
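The decision rule above can be sketched as follows, taking the cosine distance as 1 minus the cosine similarity (one common definition, consistent with the pass condition "distance less than or equal to the threshold" above). The 0.3 threshold is an illustrative value, not from the application.

```python
import numpy as np

def verify_identity(current_vec, standard_vec, distance_threshold=0.3):
    """Cosine-distance check: d = 1 - (x . y) / (||x|| * ||y||), where x is
    the standard voiceprint discrimination vector and y is the current one.
    Pass if d is at most the preset distance threshold."""
    x = np.asarray(standard_vec, dtype=float)
    y = np.asarray(current_vec, dtype=float)
    cosine = float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))
    distance = 1.0 - cosine
    verdict = ("verification passed" if distance <= distance_threshold
               else "verification failed")
    return verdict, distance

# Identical vectors have cosine distance ~0, so verification passes.
result, d = verify_identity([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
```

The verdict string stands in for the pass/fail information the server would send back to the client computer as the identity verification result.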
  • When performing voiceprint verification, the present application adopts an architecture composed of a client computer, a server and a handheld terminal: the client computer sends a request carrying the user identity to the server; the server generates the graphic code parameter corresponding to the user identity and sends it to the client computer, which displays the graphic code corresponding to the graphic code parameter; the user scans the graphic code with the carried handheld terminal and then sends the random key to the server through the link address for verification; after the verification passes, a channel is established with the server, the user's voice data collected by the handheld terminal is obtained, and voiceprint verification is performed.
  • The present application does not require a specially developed client program to collect the user's voice data; voiceprint verification using a handheld terminal is highly flexible and not easily interfered with; and the user identity binds the server to the client computer, after which the random key binds the client computer, the server and the handheld terminal together, avoiding sound hijacking and improving the authenticity and security of voiceprint verification.
  • FIG. 2 is a schematic flowchart of an embodiment of the voiceprint verification method of the present application.
  • the voiceprint verification method includes the following steps:
  • Step S1: after receiving the identity verification request sent by the client computer and carrying the user identity, the server generates the graphic code parameter of the graphic code corresponding to the user identity and sends it to the client computer, so that the client computer generates and displays the graphic code corresponding to the graphic code parameter, where the graphic code parameter comprises a random key and a voiceprint data collection link address;
  • Step S2: after the handheld terminal parses the graphic code to obtain the random key and the voiceprint data collection link address, the server receives the voiceprint verification request sent by the handheld terminal through the voiceprint data collection link address and carrying the random key, and analyzes whether the random key in the graphic code parameter sent to the client computer is consistent with the random key received from the handheld terminal;
  • the handheld terminal After scanning the graphic code, the handheld terminal parses the graphic code by using its own function module for analyzing the graphic code, and obtains the corresponding random key, the voiceprint data collection link address of the server, and the effective time and graphic code of the graphic code. The detailed information, the scene value ID of the graphic code, and the like, the handheld terminal sends a voiceprint verification request carrying the random key to the server through the voiceprint data collection link address.
  • the server After receiving the voiceprint verification request, the server analyzes whether the random key in the graphic code parameter sent to the client computer is consistent with the random key received from the handheld terminal, in order to prevent other handheld terminals from stealing the current random key.
  • the server receives the voiceprint verification request that is sent by the handheld terminal through the voiceprint data collection link address and carries the random key, and first analyzes the received Whether the number of times the random key is greater than a preset number; if the number of times the random key is received is greater than a preset number, for example, greater than one time, the server refuses to respond to the voiceprint verification request, and may information about the handheld terminal Sending to the server for the server to use as a reference for whether the voiceprint verification is fraudulent. If the preset number of times is less than or equal to the preset number of times, for example, the random key in the graphic code parameter sent to the client computer is analyzed. Whether the operation is consistent with the random key received from the
  • Alternatively, when the server receives the voiceprint verification request carrying the random key through the voiceprint data collection link address, it first checks whether the time at which the random key is received falls within the valid time range of the graphic code. For example, if the valid time of the graphic code is 2018.03.01-2018.03.10 and the random key from the handheld terminal arrives on 2018.03.08, the key is within the valid time range.
  • If the key arrives outside the valid time range, the server refuses to respond to the voiceprint verification request, and may record the related information of the handheld terminal for later use as a reference for judging whether the voiceprint verification is fraudulent. Otherwise, the server proceeds to check whether the random key in the graphic code parameter sent to the client computer is consistent with the random key received from the handheld terminal.
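The server-side checks above (replay count, validity window, and key match) can be sketched as follows. The class and function names, the preset limit of one use, and the epoch-second timestamps are illustrative assumptions, not part of the original disclosure:

```python
# Illustrative sketch only: names, the one-use limit, and epoch-second
# timestamps are assumptions, not values from the original disclosure.
PRESET_MAX_USES = 1  # "preset number" of times a random key may be received


class GraphicCodeRecord:
    """State the server keeps for one issued graphic code parameter."""

    def __init__(self, random_key, valid_from, valid_until):
        self.random_key = random_key    # key sent to the client computer
        self.valid_from = valid_from    # start of the graphic code's valid time
        self.valid_until = valid_until  # end of the graphic code's valid time
        self.times_received = 0         # how often this key has been presented


def check_voiceprint_request(record, received_key, received_at):
    """Return (accepted, reason); on refusal the caller may log the
    handheld terminal's details as a fraud-analysis reference."""
    record.times_received += 1
    if record.times_received > PRESET_MAX_USES:
        return False, "random key received more than the preset number of times"
    if not (record.valid_from <= received_at <= record.valid_until):
        return False, "random key received outside the graphic code's valid time"
    if received_key != record.random_key:
        return False, "key does not match the one sent to the client computer"
    return True, "ok"
```

A request is accepted only when all three checks pass; a second presentation of the same key is refused as a replay.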
  • Step S3: if the two random keys are consistent, the server establishes a voice data collection channel with the handheld terminal, and acquires the user's current voiceprint verification voice data, collected by the handheld terminal, through this channel;
  • The handheld terminal collects the user's current voiceprint verification voice data through a voice collection device such as a microphone. When collecting this data, environmental noise and interference from the handheld terminal itself should be minimized.
  • The handheld terminal should be kept at an appropriate distance from the user, and large movements of the terminal should be avoided while recording; the power supply is preferably the mains, with the current kept stable.
  • The collected current voiceprint verification voice data can also be denoised before framing and sampling to further reduce interference.
  • the collected voiceprint verification voice data is voice data of a preset data length, or voice data greater than a preset data length.
  • Step S4: construct a current voiceprint discrimination vector corresponding to the current voiceprint verification voice data, determine the standard voiceprint discrimination vector corresponding to the user identity identifier according to a predetermined mapping between user identity identifiers and standard voiceprint discrimination vectors, calculate the distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector, generate the identity verification result based on the calculated distance, and send the identity verification result to the client computer.
  • The step of constructing the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data includes: processing the current voiceprint verification voice data to extract a preset type of voiceprint feature, and constructing a corresponding voiceprint feature vector based on that feature; then inputting the voiceprint feature vector into a pre-trained background channel model to construct the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data.
  • the voiceprint feature includes a plurality of types, such as a wide-band voiceprint, a narrow-band voiceprint, an amplitude voiceprint, and the like.
  • The preset type of voiceprint feature is preferably the Mel Frequency Cepstrum Coefficient (MFCC) of the current voiceprint verification voice data, and the default filter is a Mel filter.
  • The voiceprint features of the current voiceprint verification voice data are assembled into a feature data matrix, and this feature data matrix is the corresponding voiceprint feature vector.
  • The current voiceprint verification voice data is pre-emphasized and windowed; a Fourier transform is performed on each window to obtain the corresponding spectrum, and the spectrum is passed through a Mel filter to output a Mel spectrum;
  • a cepstrum analysis is then performed on the Mel spectrum to obtain the Mel frequency cepstral coefficients (MFCC), and the corresponding voiceprint feature vector is formed based on these coefficients.
  • the pre-emphasis processing is actually a high-pass filtering process, filtering out the low-frequency data, so that the high-frequency characteristics in the current voiceprint verification voice data are more prominent.
  • the cepstrum analysis on the Mel spectrum is, for example, taking the logarithm and inverse transform.
  • The inverse transform is generally realized by the discrete cosine transform (DCT).
  • The second to thirteenth coefficients after the DCT are taken as the Mel frequency cepstrum coefficients.
  • The Mel frequency cepstrum coefficients (MFCC) of a frame constitute the voiceprint feature of that frame of speech data.
  • The Mel frequency cepstral coefficients MFCC of all frames are composed into a feature data matrix, which is the voiceprint feature vector of the speech sample data.
  • Composing the voiceprint feature vector from the MFCC of the speech data can improve the accuracy of the authentication, because the Mel frequency scale is closer to the human auditory system than the linearly spaced frequency bands used in the normal cepstrum.
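The MFCC pipeline above (pre-emphasis, windowing, FFT, Mel filtering, logarithm, DCT, keeping the 2nd-13th coefficients) can be sketched in plain NumPy. The frame sizes, the 0.97 pre-emphasis coefficient, and the 26-filter bank are common defaults assumed here, not values stated in the text:

```python
import numpy as np

# Sketch under assumed defaults (16 kHz audio, 25 ms frames, 10 ms hop,
# 26 Mel filters); these parameters are not specified in the original text.

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the Mel scale."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def dct_ii(x):
    """Cepstral analysis step: DCT-II of the log Mel spectrum."""
    n = len(x)
    k = np.arange(n)
    return np.array([np.sum(x * np.cos(np.pi * (k + 0.5) * i / n))
                     for i in range(n)])

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_filters=26, n_coeffs=12):
    # Pre-emphasis: high-pass filtering that makes the high-frequency
    # characteristics of the voice data more prominent.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    window = np.hamming(frame_len)
    fb = mel_filterbank(n_filters, frame_len, sr)
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    feats = []
    for t in range(n_frames):
        frame = emphasized[t * hop: t * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2        # spectrum per window
        mel_spec = np.log(fb @ power + 1e-10)          # Mel spectrum + log
        cep = dct_ii(mel_spec)                         # inverse transform (DCT)
        feats.append(cep[1:1 + n_coeffs])              # 2nd..13th coefficients
    return np.array(feats)  # feature data matrix: one MFCC row per frame
```

Each row of the returned matrix is the MFCC voiceprint feature of one frame; the stacked rows form the feature data matrix used as the voiceprint feature vector.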
  • The voiceprint feature vector is input into the pre-trained background channel model to construct the current voiceprint discrimination vector corresponding to the current voiceprint verification voice data; for example, the pre-trained background channel model processes the feature matrix corresponding to the current voiceprint verification voice data to determine the current voiceprint discrimination vector.
  • The background channel model is a set of Gaussian mixture models, and its training process includes the following steps: 1. obtain a preset number of voice data samples, each of which corresponds to a standard voiceprint discrimination vector; 2. process each voice data sample to extract the preset type of voiceprint feature, and construct the voiceprint feature vector corresponding to each sample based on that feature; 3. train the Gaussian mixture model with these voiceprint feature vectors, and verify the accuracy of the trained model on a verification set after training completes.
  • If the accuracy is greater than a preset threshold (for example, 98.5%), training ends and the trained Gaussian mixture model is used as the background channel model; if the accuracy is less than or equal to the preset threshold, the number of voice data samples is increased and the model is retrained until its accuracy exceeds the preset threshold.
  • The background channel model pre-trained in this embodiment is obtained by mining and comparing a large amount of voice data.
  • The model can accurately depict the background channel characteristics while maximally retaining the user's voiceprint features; the background characteristics can then be removed at recognition time so that the inherent characteristics of the user's voice are extracted, which greatly improves the accuracy and efficiency of user identity verification.
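The train-then-verify loop described above can be sketched with scikit-learn's `GaussianMixture`, an assumed stand-in for the patent's unspecified GMM implementation; fitting one model per speaker and scoring a held-out verification set is likewise an illustrative proxy for the accuracy check:

```python
import numpy as np
from sklearn.mixture import GaussianMixture  # assumed GMM implementation


def train_background_models(samples_by_speaker, verification_set,
                            threshold=0.985, n_components=4):
    """Fit one diagonal-covariance GMM per speaker, then check accuracy
    on a held-out verification set, as in step 3 of the text.

    samples_by_speaker: {speaker_id: (n_frames, n_dims) feature matrix}
    verification_set:   list of (true_speaker_id, feature matrix)
    Returns (models, accuracy); models is None when accuracy <= threshold,
    signalling that more voice data samples are needed before retraining.
    """
    models = {}
    for spk, feats in samples_by_speaker.items():
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag", random_state=0)
        gmm.fit(feats)
        models[spk] = gmm
    correct = 0
    for true_spk, feats in verification_set:
        # Classify by highest average log-likelihood across the models.
        predicted = max(models, key=lambda s: models[s].score(feats))
        if predicted == true_spk:
            correct += 1
    accuracy = correct / len(verification_set)
    return (models if accuracy > threshold else None), accuracy
```

With well-separated speakers the verification accuracy exceeds the 98.5% threshold and the trained models are returned; otherwise the caller enlarges the sample set and retrains.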
  • Calculating the distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector, and generating the identity verification result based on the calculated distance, comprises:
  • calculating the cosine distance between the current voiceprint discrimination vector B and the standard voiceprint discrimination vector A, for example as 1 - (A·B)/(‖A‖‖B‖); if the cosine distance is less than or equal to a preset distance threshold, information that the verification passes is generated; if the cosine distance is greater than the preset distance threshold, information that the verification fails is generated.
  • When storing a user's standard voiceprint discrimination vector, the user identity identifier may be stored with it.
  • At verification time, the corresponding standard voiceprint discrimination vector is retrieved according to the identity information associated with the current voiceprint discrimination vector, and the cosine distance between the current voiceprint discrimination vector and the matched standard voiceprint discrimination vector is calculated; using this cosine distance to verify the identity of the target user improves the accuracy of the authentication.
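The decision rule can be sketched as follows, interpreting "cosine distance" as 1 minus the cosine similarity (so smaller means more alike); the 0.2 threshold is an illustrative assumption, not a value from the text:

```python
import numpy as np

# Sketch of the verification decision. "Cosine distance" is assumed to
# mean 1 - cosine similarity; the 0.2 threshold is illustrative only.

def cosine_distance(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify_identity(current_vec, standard_vec, threshold=0.2):
    """Pass when the current voiceprint discrimination vector is within
    the preset distance of the user's stored standard vector."""
    return cosine_distance(current_vec, standard_vec) <= threshold
```

Identical vectors give a distance of zero (verification passes), while orthogonal vectors give a distance of one (verification fails).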
  • The present application does not require a specially developed client program to collect the user's voice data; recording the verification voice with a handheld terminal is highly flexible and difficult to interfere with. The user identity identifier binds the server to the client computer, and the random key then binds the client computer, the server, and the handheld terminal together, which prevents voice hijacking and improves the authenticity and security of the voiceprint verification.
  • The present application also provides a computer readable storage medium on which a processing system is stored; when executed by a processor, the processing system implements the steps of the voiceprint verification method described above.
  • The method of the foregoing embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware alone, but in many cases the former is the better implementation.
  • Based on this understanding, the part of the technical solution of the present application that is essential, or that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc), which includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the various embodiments of the present application.


Abstract

The present invention relates to a server, a voiceprint verification method, and a storage medium. The method comprises: after receiving an identity verification request, generating a graphic code parameter of a graphic code corresponding to a user identity and sending the graphic code parameter to a client computer; after a handheld terminal parses the graphic code, receiving a voiceprint verification request carrying a random key, sent by the handheld terminal through a voiceprint data collection link address, and analyzing whether the two random keys are consistent; if so, establishing a voice data collection channel with the handheld terminal and obtaining, based on the channel, the user's current voiceprint verification voice data collected from the handheld terminal; and constructing a corresponding current voiceprint discrimination vector, determining a standard voiceprint discrimination vector corresponding to the user identity, calculating the distance between the current voiceprint discrimination vector and the standard voiceprint discrimination vector, and generating an identity verification result according to the calculated distance. The present invention can improve the flexibility of voiceprint verification while avoiding voice hijacking.
PCT/CN2018/102049 2018-05-14 2018-08-24 Serveur, procédé de vérification d'empreinte vocale et support d'informations WO2019218512A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810457267.1A CN108650266B (zh) 2018-05-14 2018-05-14 服务器、声纹验证的方法及存储介质
CN201810457267.1 2018-05-14

Publications (1)

Publication Number Publication Date
WO2019218512A1 true WO2019218512A1 (fr) 2019-11-21

Family

ID=63755329

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/102049 WO2019218512A1 (fr) 2018-05-14 2018-08-24 Serveur, procédé de vérification d'empreinte vocale et support d'informations

Country Status (2)

Country Link
CN (1) CN108650266B (fr)
WO (1) WO2019218512A1 (fr)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109462482B (zh) * 2018-11-09 2023-08-08 深圳壹账通智能科技有限公司 声纹识别方法、装置、电子设备及计算机可读存储介质
CN113129903A (zh) * 2019-12-31 2021-07-16 深圳市航盛电子股份有限公司 一种自动化音频测试方法、装置、计算机设备及存储介质
CN113973299B (zh) * 2020-07-22 2023-09-29 中国石油化工股份有限公司 具有身份认证功能的无线传感器以及身份认证方法
CN111931146B (zh) * 2020-07-24 2024-01-19 捷德(中国)科技有限公司 身份验证方法、装置、设备及存储介质

Citations (4)

Publication number Priority date Publication date Assignee Title
WO2015059365A1 (fr) * 2013-10-25 2015-04-30 Aplcomp Oy Procédé d'authentification associative audiovisuelle et système associé
CN105100123A (zh) * 2015-09-11 2015-11-25 深圳市亚略特生物识别科技有限公司 应用登录方法及系统
CN107517207A (zh) * 2017-03-13 2017-12-26 平安科技(深圳)有限公司 服务器、身份验证方法及计算机可读存储介质
CN107993071A (zh) * 2017-11-21 2018-05-04 平安科技(深圳)有限公司 电子装置、基于声纹的身份验证方法及存储介质

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
JP4463526B2 (ja) * 2003-10-24 2010-05-19 株式会社ユニバーサルエンターテインメント 声紋認証システム
CN107610707B (zh) * 2016-12-15 2018-08-31 平安科技(深圳)有限公司 一种声纹识别方法及装置

Also Published As

Publication number Publication date
CN108650266B (zh) 2020-02-18
CN108650266A (zh) 2018-10-12


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18919055

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 26/02/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18919055

Country of ref document: EP

Kind code of ref document: A1