WO2020073519A1 - Voiceprint verification method and apparatus, computer device and storage medium - Google Patents

Voiceprint verification method and apparatus, computer device and storage medium

Info

Publication number
WO2020073519A1
WO2020073519A1 · PCT/CN2018/124402
Authority
WO
WIPO (PCT)
Prior art keywords
voiceprint
feature
voiceprint feature
vector
distance value
Prior art date
Application number
PCT/CN2018/124402
Other languages
French (fr)
Chinese (zh)
Inventor
杨翘楚
王健宗
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2020073519A1 publication Critical patent/WO2020073519A1/en

Classifications

    • H: ELECTRICITY
      • H04: ELECTRIC COMMUNICATION TECHNIQUE
        • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
          • H04L 63/00: Network architectures or network communication protocols for network security
            • H04L 63/08: ... for authentication of entities
              • H04L 63/0861: ... using biometrical features, e.g. fingerprint, retina-scan
    • G: PHYSICS
      • G10: MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
          • G10L 17/00: Speaker identification or verification techniques
            • G10L 17/02: Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
            • G10L 17/16: Hidden Markov models [HMM]
          • G10L 25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
            • G10L 25/03: ... characterised by the type of extracted parameters
              • G10L 25/18: ... the extracted parameters being spectral information of each sub-band
              • G10L 25/24: ... the extracted parameters being the cepstrum

Definitions

  • the present application relates to the field of voiceprint verification, in particular to a method, device, computer equipment and storage medium for voiceprint verification.
  • the main purpose of this application is to provide a voiceprint verification method that solves the technical problem that, in the existing voiceprint verification process, the voice data collected by the client must be sent to the background for voiceprint feature extraction, resulting in poor confidentiality of the voice data in transmission.
  • This application proposes a method for voiceprint verification, including:
  • the voiceprint verification server receives the first voiceprint feature sent by the client server
  • the voiceprint verification server judges whether the feature distance value between the voiceprint discrimination vectors (i-vectors) respectively corresponding to the first voiceprint feature and the pre-stored voiceprint feature meets a preset requirement;
  • if it does, the first voiceprint feature is determined to be the same as the pre-stored voiceprint feature; otherwise, they are determined to be different.
  • This application also provides a voiceprint verification system, including a client, a client server, and a voiceprint verification server;
  • the client collects the voice signal of the identity to be verified, and sends the voice signal to the client server;
  • the client server receives the voice signal, extracts voiceprint features from the voice signal to obtain a first voiceprint feature, and transmits the first voiceprint feature to the voiceprint verification server;
  • the voiceprint verification server receives the first voiceprint feature and compares it with a pre-stored voiceprint feature to judge whether the two are the same, feeding the judgment result back to the client server;
  • the client server controls the client to perform a feedback response according to the judgment result.
  • the present application also provides a computer device, including a memory and a processor, where the memory stores a computer program, and when the processor executes the computer program, the steps of the foregoing method are implemented.
  • the present application also provides a computer non-volatile readable storage medium on which a computer program is stored, which when executed by a processor implements the steps of the above method.
  • in this application, the function of extracting the voiceprint feature vector is moved forward onto the client server: after the client collects the voice signal by recording, the local client server directly extracts the voiceprint feature vector of the voice signal, and only then is the vector transmitted to the third-party verification server for voiceprint verification, voiceprint verification model training, and speaker recognition.
  • because the voiceprint feature vector cannot be reversed to restore the original voice signal, the customer's recorded voice signal stays confidential, which improves data security and the security of the customer identity authentication process.
  • only the data obtained after extracting the voiceprint feature vector is transmitted to the server for voiceprint verification; since the voiceprint feature vector is far smaller than the original voice signal data, transmission efficiency is greatly increased.
  • based on a GMM-UBM, this application maps each voiceprint feature vector to a low-dimensional voiceprint discrimination vector (i-vector), reducing the computation cost and the usage cost of voiceprint verification.
  • during verification, comparison against the pre-stored data of multiple people lowers the equal error rate of voiceprint verification and reduces the influence of its model errors.
  • FIG. 1 is a schematic flowchart of a method for voiceprint verification according to an embodiment of the present application
  • FIG. 2 is a schematic diagram of an internal structure of a computer device according to an embodiment of the present application.
  • a method of voiceprint verification collects information through a client and performs voiceprint verification through a server.
  • the method includes:
  • the MFCC (Mel Frequency Cepstral Coefficient) voiceprint feature used in this embodiment is non-linear, so that the analysis of the customer's voice signal in each frequency band is closer to the characteristics of real human speech, improving the effect of voiceprint verification.
  • the client server is used to construct the MFCC-type voiceprint feature into voiceprint feature vectors corresponding to each frame of voice data to form a first voiceprint feature.
  • the voiceprint feature vector corresponding to each frame of voice data is constructed from the extracted MFCC voiceprint features, and the per-frame vectors are then combined in frame order to obtain the first voiceprint feature corresponding to the client's voice signal; this step is still completed on the client server to enhance data confidentiality during transmission.
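The per-frame assembly described above can be sketched as follows. This is an illustrative sketch only: the frame count, MFCC dimensionality, and the random stand-in for real MFCC extraction are assumptions, not values from the patent.

```python
import random

# Illustrative sketch: each frame of speech yields one MFCC vector, and
# combining the per-frame vectors in frame order forms the "first
# voiceprint feature". Dimensions below are typical, assumed values.

N_FRAMES = 200   # e.g. about 2 s of speech at a 10 ms frame shift
N_MFCC = 13      # a common MFCC dimensionality

random.seed(0)
# stand-in for real per-frame MFCC extraction from the voice signal
frame_mfccs = {i: [random.gauss(0, 1) for _ in range(N_MFCC)]
               for i in range(N_FRAMES)}

# combine the per-frame vectors by sorting on frame index, as the text describes
first_voiceprint_feature = [frame_mfccs[i] for i in sorted(frame_mfccs)]
print(len(first_voiceprint_feature), len(first_voiceprint_feature[0]))
```

The resulting matrix (frames by coefficients) is what would be transmitted to the voiceprint verification server in place of the raw audio.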
  • the voiceprint verification server receives the first voiceprint feature sent by the client server.
  • the extraction of the first voiceprint feature is moved forward onto the client server, so that after receiving the customer's recorded voice signal, the client server directly extracts the first voiceprint feature corresponding to the voice signal and then transfers it to the third-party voiceprint verification server for voiceprint verification. Because the first voiceprint feature cannot be reversed to restore the original voice signal, the customer's recorded voice signal stays confidential, which improves data security and the security of the customer identity authentication process. At the same time, the first voiceprint feature has a smaller data volume than the voice signal, greatly increasing transmission efficiency.
  • the voiceprint features are extracted from the collected voice signal by the client server and transmitted to the voiceprint verification server for voiceprint verification, so that voiceprint feature extraction (on the client server) and voiceprint verification (on the voiceprint verification server) are separated.
  • the voiceprint verification server determines whether the feature distance value between the voiceprint identification vector i-vector corresponding to the first voiceprint feature and the pre-stored voiceprint feature respectively meets the preset requirements.
  • the preset requirement in this embodiment includes that the feature distance value falls within a specified preset threshold range, and it can be customized for specific application scenarios to meet a wider range of personalized usage requirements.
  • if the requirement is met, the voiceprint verification server feeds the verification-passed result back to the client through the client server; otherwise, it feeds back the verification-failure result, so that the client can perform further application operations based on the feedback. For example, after verification passes, a smart door is controlled to open; as another example, after a specified number of verification failures, the security system locks the screen to prevent criminals from further attacking the electronic banking system.
  • step S4 of this embodiment includes:
  • This embodiment is based on a GMM-UBM (Gaussian Mixture Model – Universal Background Model) to map the voiceprint feature vectors corresponding to each frame of speech data into low-dimensional voiceprint discrimination vectors (i-vectors).
  • the training process of the GMM-UBM in this embodiment is as follows:
  • B1. Obtain a preset number of voice data samples (for example, 100,000); each voice data sample corresponds to one voiceprint discrimination vector, and the samples may be collected from different people speaking in different environments, so that they can train a universal background model (GMM-UBM) characterizing general speech.
  • B2. Pre-process each voice data sample separately to extract its voiceprint features of the preset type, and construct the voiceprint feature vector corresponding to each sample from those features.
  • B3. Divide all the constructed voiceprint feature vectors into a training set of a first percentage and a verification set of a second percentage, where the first percentage and the second percentage are each less than or equal to 100%.
  • B4. Train the model using the voiceprint feature vectors in the training set, and after training is completed, verify the accuracy of the trained model using the verification set.
  • B5. If the accuracy reaches the preset standard rate (e.g., 98.5%), model training ends; otherwise, increase the number of voice data samples and re-execute steps B2 to B5 on the enlarged sample set.
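The B1–B5 loop can be sketched as a simple train/validate/grow cycle. Everything here is a stand-in: `train_model`, `evaluate`, and `collect_more_samples` are hypothetical placeholders, not the patent's actual GMM-UBM training code; only the control flow mirrors the steps above.

```python
# Hypothetical sketch of steps B1-B5: train on a split, validate, and grow
# the sample pool until accuracy reaches the preset standard rate (98.5%).

def train_model(train_set):
    # placeholder for GMM-UBM training on voiceprint feature vectors
    return {"n_train": len(train_set)}

def evaluate(model, val_set):
    # placeholder: pretend accuracy improves as the training pool grows
    return min(1.0, 0.90 + model["n_train"] / 1e6)

def collect_more_samples(n):
    # placeholder for gathering additional voice data samples (B1/B5)
    return [None] * n

def train_until_standard(samples, standard_rate=0.985, train_frac=0.8):
    while True:
        split = int(len(samples) * train_frac)           # B3: train/verify split
        model = train_model(samples[:split])             # B4: train the model
        acc = evaluate(model, samples[split:])           # B4: check accuracy
        if acc >= standard_rate:                         # B5: standard reached
            return model, acc
        samples = samples + collect_more_samples(10000)  # B5: grow, redo B2-B4

model, acc = train_until_standard([None] * 100000)       # B1: 100,000 samples
print(acc >= 0.985)
```

With the toy accuracy curve above, the loop adds one batch of samples before meeting the 98.5% standard and stops.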
  • the voiceprint discrimination vector of this embodiment is expressed as the i-vector. Compared with the dimension of the Gaussian space, the i-vector has a much lower dimension, which reduces computing costs.
  • the preset conditions in this embodiment include that the cosine distance value is within a specified threshold value range, etc., which can be set as needed.
  • it is determined whether the first few sorted first cosine distance values (the preset sorting positions) include the first cosine distance value corresponding to the target person's pre-stored voiceprint feature; if so, the cosine distance value is determined to satisfy the preset condition.
  • step S41 of this embodiment includes:
  • S410 Input voiceprint feature vectors corresponding to each frame of extracted speech data to the GMM-UBM model, respectively, to obtain a Gaussian supervector representing the probability distribution of each frame of speech data on each Gaussian component.
  • S411 Use the above Gaussian supervectors to calculate the low-dimensional voiceprint discrimination vector i-vector corresponding to each frame of speech data via the formula M = μ + Tω, where M is the Gaussian supervector of the frame of voice data, μ is the mean supervector of the GMM-UBM model, ω is the low-dimensional voiceprint discrimination vector i-vector of the frame, and T is the transformation matrix that maps ω into the high-dimensional Gaussian space.
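The supervector relation can be illustrated numerically. This is a sketch under stated assumptions: real i-vector extraction uses a posterior (MAP) estimate of ω, whereas here a plain least-squares inverse stands in to show the dimensionality reduction; the dimensions and random matrices are made up.

```python
import numpy as np

# Sketch of the relation M = mu + T @ w: M is a frame's Gaussian
# supervector, mu the UBM mean supervector, T the transformation
# (total-variability) matrix, w the low-dimensional i-vector.

rng = np.random.default_rng(1)
D_SUPER, D_IVEC = 1024, 40            # assumed dimensions

T = rng.standard_normal((D_SUPER, D_IVEC))
mu = rng.standard_normal(D_SUPER)

w_true = rng.standard_normal(D_IVEC)  # a hidden low-dimensional i-vector
M = mu + T @ w_true                   # its high-dimensional Gaussian supervector

# recover the i-vector: w = argmin ||M - mu - T w||  (least squares)
w_est, *_ = np.linalg.lstsq(T, M - mu, rcond=None)
print(np.allclose(w_est, w_true))
```

Since M was generated exactly from the model, least squares recovers ω; the point is that a 1024-dimensional supervector is represented by a 40-dimensional vector, which is what lowers the computation cost mentioned above.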
  • the EM algorithm (Expectation-Maximization algorithm) is an iterative algorithm used in statistics to find maximum-likelihood estimates of parameters in probability models that depend on unobservable latent variables.
  • it alternates between two steps: 1) the expectation step (E) computes the expectation of the latent variables using the current estimates of the model parameters; 2) the maximization step (M) re-estimates the parameters by maximum likelihood using the latent-variable expectations obtained in the E-step.
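The alternation just described can be shown on a toy problem. This is not the patent's GMM-UBM training; it is a minimal EM loop that re-estimates only the means of a two-component 1-D Gaussian mixture (unit variances and equal weights are fixed, simplifying assumptions for illustration).

```python
import math
import random

# Toy EM: fit the means of a 1-D two-component Gaussian mixture.
random.seed(0)
data = [random.gauss(-4, 1) for _ in range(300)] + \
       [random.gauss(4, 1) for _ in range(300)]

mu = [-1.0, 1.0]   # crude initial means

def pdf(x, m):
    # unit-variance Gaussian density
    return math.exp(-0.5 * (x - m) ** 2) / math.sqrt(2 * math.pi)

for _ in range(50):
    # E-step: expected component responsibilities under current means
    resp = []
    for x in data:
        p = [pdf(x, m) for m in mu]
        s = sum(p)
        resp.append([pi / s for pi in p])
    # M-step: maximize likelihood -> responsibility-weighted mean per component
    for k in range(2):
        w = sum(r[k] for r in resp)
        mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / w

print(sorted(round(m) for m in mu))
```

The estimated means converge to the true component means near -4 and 4, illustrating how alternating E and M steps climbs the likelihood without ever observing which component generated each point.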
  • step S43 of this embodiment includes:
  • S430 Obtain the first cosine distance value between the first voiceprint feature and the pre-stored voiceprint feature of each person in the pre-stored voiceprint feature data, where the voiceprint feature data of multiple people includes the pre-stored voiceprint feature of the target person.
  • the pre-stored voiceprint feature data of multiple persons including the target person is used to determine whether the voiceprint feature of the currently collected voice signal is the same as the target person's voiceprint feature, so as to improve the judgment accuracy.
  • This embodiment uses the cosine distance formula d(x, y) = 1 − (x·y)/(‖x‖‖y‖) to compute the first cosine distance value between each pre-stored voiceprint feature and the first voiceprint feature, where x is each pre-stored voiceprint discrimination vector and y is the voiceprint discrimination vector i-vector of the first voiceprint feature; the smaller the cosine distance value, the closer (or more similar) the two voiceprint features are.
  • the "first" in this embodiment is only used for distinction, not for limitation, and the functions in other places are the same, and will not be repeated.
  • S431 Sort the first cosine distance values in ascending order.
  • the first cosine distance values between each pre-stored voiceprint feature and the first voiceprint feature are sorted from small to large, so that the similarity distribution between the first voiceprint feature and each pre-stored voiceprint feature can be analyzed, and the verification result for the first voiceprint feature obtained, more accurately.
  • S432 Determine whether the first preset number of sorted first cosine distance values includes the first cosine distance value corresponding to the pre-stored voiceprint feature of the target person.
  • if the first preset number of first cosine distance values includes the one corresponding to the target person's pre-stored voiceprint feature, the first voiceprint feature is determined to be the same as the target person's pre-stored voiceprint feature; this reduces the recognition error rate caused by model errors.
  • here, the error rate refers to "the frequency with which verification fails when it should pass, and the frequency with which verification passes when it should fail."
  • the preset number of first cosine distance values in this embodiment includes 1, 2, or 3, etc., which can be set according to usage requirements.
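Steps S430–S432 can be sketched end to end. The vectors, names, and the 1 − cosine-similarity distance convention below are illustrative assumptions consistent with the text (smaller means more similar), not data from the patent.

```python
import math

# Sketch of S430-S432: compute cosine distances from the first voiceprint
# feature to every pre-stored i-vector, sort ascending, and pass only if
# the target person's entry lands in the first preset number of results.

def cosine_distance(x, y):
    # 1 - cosine similarity: smaller means more similar
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return 1.0 - dot / (nx * ny)

prestored = {                      # multiple people's pre-stored i-vectors
    "target":  [1.0, 0.9, 0.1],
    "other_a": [-1.0, 0.2, 0.5],
    "other_b": [0.0, -1.0, 0.3],
}
first_feature = [0.9, 1.0, 0.0]   # i-vector of the collected voice signal

ranked = sorted(prestored,
                key=lambda k: cosine_distance(first_feature, prestored[k]))
TOP_N = 1                          # the preset number (1, 2, or 3 per the text)
verified = "target" in ranked[:TOP_N]
print(verified)
```

Because the made-up first feature is nearly collinear with the target's stored vector, the target ranks first and verification passes.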
  • step S43 of another embodiment of the present application includes:
  • S434 Obtain a second cosine distance value between the pre-stored voiceprint feature of the target person and the first voiceprint feature.
  • S435 Determine whether the second cosine distance value is less than or equal to a preset threshold.
  • the preset threshold is 0.6.
  • if the cosine distance between the first voiceprint feature and the target user's pre-stored voiceprint feature is less than or equal to the preset threshold, the cosine distance value is determined to satisfy the preset condition, the first voiceprint feature is determined to be the same as the target user's pre-stored voiceprint feature, and the verification passes; if the cosine distance is greater than the preset threshold, the distance value is determined not to satisfy the preset condition, the two features are determined to be different, and the verification fails.
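The single-target variant (S434–S435) reduces to one threshold comparison. The 0.6 threshold is from the text; the sample distance values are made up for illustration.

```python
# Sketch of S434-S435: single-target verification against a preset threshold.

def verify(second_cosine_distance, threshold=0.6):
    # pass iff the distance to the target's pre-stored feature is small enough
    return second_cosine_distance <= threshold

print(verify(0.31), verify(0.74))
```

This trades the robustness of the multi-person top-N comparison for a cheaper check against a single enrolled speaker.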
  • This application also provides a voiceprint verification system, including a client, a client server, and a voiceprint verification server;
  • the client collects the voice signal of the identity to be verified and sends the voice signal to the client server;
  • the client server receives the voice signal, extracts voiceprint features from the voice signal to obtain a first voiceprint feature, and transmits the first voiceprint feature to the voiceprint verification server;
  • the voiceprint verification server receives the first voiceprint feature and compares it with a pre-stored voiceprint feature to judge whether the two are the same, feeding the judgment result back to the client server;
  • the client server controls the client to perform a feedback response according to the judgment result.
  • in this embodiment, the client samples the continuous analog voice signal at a specified sampling period to form a discrete signal, which is quantized into a digital signal according to specified encoding rules; the client server then receives this voice signal.
  • the process of extracting voiceprint features from the voice signal to obtain the first voiceprint feature is as follows:
  • pre-emphasis: due to the physiological characteristics of the human vocal tract, the high-frequency components of the voice signal are often suppressed, and pre-emphasis compensates for these high-frequency components;
  • framing: because of the short-time stationarity of the voice signal, spectrum analysis and feature extraction are performed in units of frames, the voice signal being divided into frames of usually 10 to 30 milliseconds each;
  • windowing: after framing, a window is applied to attenuate the signal at the beginning and end of each frame; this embodiment uses a Hamming window.
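The three front-end steps just listed can be sketched as follows. The pre-emphasis coefficient, frame length, hop, and the synthetic test tone are typical assumed values, not parameters stated in the patent.

```python
import math

# Front-end sketch: pre-emphasis, framing (10-30 ms frames), Hamming window.

def preprocess(signal, frame_len=400, hop=160, alpha=0.97):
    # pre-emphasis: y[n] = x[n] - alpha * x[n-1], boosting high frequencies
    emphasized = [signal[0]] + [signal[n] - alpha * signal[n - 1]
                                for n in range(1, len(signal))]
    # Hamming window tapers frame ends to reduce boundary artifacts
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(emphasized) - frame_len + 1, hop):
        frame = emphasized[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames

# 1 s of a 200 Hz tone "sampled" at 16 kHz stands in for recorded speech
sig = [math.sin(2 * math.pi * 200 * n / 16000) for n in range(16000)]
frames = preprocess(sig)   # 400-sample (25 ms) frames at a 10 ms hop
print(len(frames), len(frames[0]))
```

Each windowed frame would then feed the MFCC computation on the client server.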
  • in this embodiment, the function of extracting the voiceprint feature vector is moved forward onto the client server: the client collects the voice signal by recording, the local client server directly extracts the voiceprint feature vector of the voice signal, and the vector is then transmitted to the third-party verification server for voiceprint verification, voiceprint verification model training, and speaker recognition.
  • because the voiceprint feature vector cannot be reversed to restore the original voice signal, the customer's recorded voice signal stays confidential, which improves data security and the security of the customer identity authentication process.
  • only the data obtained after extracting the voiceprint feature vector is transmitted to the server for voiceprint verification; since this data is far smaller than the original voice signal, transmission efficiency is greatly increased.
  • each voiceprint feature vector is mapped to a low-dimensional voiceprint discrimination vector i-vector, which reduces the computation cost and the usage cost of voiceprint verification.
  • during verification, comparison against the pre-stored data of multiple people lowers the equal error rate of voiceprint verification and reduces the influence of its model errors.
  • when the judgment result is that the first voiceprint feature is not the same as the pre-stored voiceprint feature,
  • the process by which the client server controls the client's feedback response according to the judgment result includes:
  • the client server generates feedback information indicating unsuccessful authentication and sends it to the client;
  • the client is controlled into a disabled state and an alarm is issued.
  • the voiceprint verification system includes an alarm and a safety control device to enhance the functional completeness of the voiceprint verification system in the actual application process and improve management security and information security.
  • an embodiment of the present application further provides a computer device.
  • the computer device may be a server, and its internal structure may be as shown in FIG. 2.
  • the computer device includes a processor, a memory, a network interface, and a database connected by a system bus, where the processor of the computer device provides computing and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, computer-readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and the computer-readable instructions in the non-volatile storage medium.
  • the database of the computer device is used to store data such as voiceprint verification data.
  • the network interface of the computer device is used to communicate with external terminals through a network connection.
  • FIG. 2 is only a block diagram of a part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution of the present application is applied.
  • An embodiment of the present application further provides a computer non-volatile readable storage medium on which computer-readable instructions are stored; when the computer-readable instructions are executed by a processor, the processes of the foregoing method embodiments are performed.
  • the above are only preferred embodiments of the present application and do not limit its patent scope; any equivalent structure or equivalent process transformation made using the description and drawings of this application, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of this application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Hardware Design (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Collating Specific Patterns (AREA)

Abstract

Disclosed is a voiceprint verification method. The method comprises: a client server extracting a voice signal whose identity is to be verified, extracting the corresponding MFCC voiceprint features, and constructing from them a first voiceprint feature composed of the voiceprint feature vectors of each frame of voice data; a voiceprint verification server receiving the first voiceprint feature; the voiceprint verification server determining whether a feature distance value between the voiceprint identification vectors (i-vectors) respectively corresponding to the first voiceprint feature and a pre-stored voiceprint feature meets a preset requirement; and if so, determining that the first voiceprint feature is the same as the pre-stored voiceprint feature.

Description

声纹验证的方法、装置、计算机设备以及存储介质Voiceprint verification method, device, computer equipment and storage medium
本申请要求于2018年10月11日提交中国专利局、申请号为2018111847753,发明名称为“声纹验证的方法、装置、计算机设备以及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application requires the priority of the Chinese patent application filed on October 11, 2018 in the Chinese Patent Office with the application number 2018111847753 and the invention titled "Method, Device, Computer Equipment, and Storage Media for Voiceprint Verification", all of which are approved by The reference is incorporated in this application.
技术领域Technical field
本申请涉及到声纹验证领域,特别是涉及到声纹验证的方法、装置、计算机设备以及存储介质。The present application relates to the field of voiceprint verification, in particular to a method, device, computer equipment and storage medium for voiceprint verification.
背景技术Background technique
目前,很多大型金融公司的业务范围涉及保险、银行、投资等多个业务范畴,而每个业务范畴通常都需要同客户进行沟通,且都需要进行反欺诈识别,因此,对客户的身份验证及反欺诈识别也就成为保证业务安全的重要组成部分。在客户身份验证环节中,声纹验证由于其具有的实时性和方便性而被许多公司采用。客户声纹模型的训练和客户身份的验证需要采集客户的语音数据,而语音数据的获得往往来源于金融公司与客户的谈话录音。发明人意识到,由于商业洽谈往往涉及机密内容,将语音数据由网络传输到后台再进行语音特征参数的提取不利于数据保密性。At present, the business scope of many large financial companies involves multiple business areas such as insurance, banking, and investment. Each business area usually needs to communicate with customers and anti-fraud identification is required. Therefore, the identity verification and Anti-fraud identification has become an important part of ensuring business security. In the process of customer identity verification, voiceprint verification is adopted by many companies due to its real-time and convenience. The training of the customer voiceprint model and the verification of the customer's identity need to collect the customer's voice data, and the acquisition of the voice data often comes from the recording of the conversation between the financial company and the customer. The inventor realized that since business negotiations often involve confidential content, transferring voice data from the network to the background and then extracting voice feature parameters is not conducive to data confidentiality.
技术问题technical problem
本申请的主要目的为提供声纹验证的方法,旨在解决现有声纹验证过程中需将客户端采集的语音数据发送至后台进行声纹特征提取,导致传输中语音数据的保密性较差的技术问题。The main purpose of this application is to provide a method of voiceprint verification, which aims to solve the problem that the voice data collected by the client needs to be sent to the background for voiceprint feature extraction in the existing voiceprint verification process, resulting in poor confidentiality of the voice data in transmission technical problem.
技术解决方案Technical solution
本申请提出一种声纹验证的方法,包括:This application proposes a method for voiceprint verification, including:
通过客户端服务器提取待验证身份的语音信号,并提取所述语音信号中各帧语音数据分别对应的MFCC类型声纹特征;Extract the voice signal of the identity to be verified through the client server, and extract the MFCC type voiceprint features corresponding to each frame of voice data in the voice signal;
通过所述客户端服务器将所述MFCC类型声纹特征构建成各帧语音数据分别对应的声纹特征向量,以形成第一声纹特征;Constructing the MFCC type voiceprint feature into voiceprint feature vectors corresponding to each frame of voice data through the client server to form a first voiceprint feature;
声纹验证服务器接收所述客户端服务器发送的所述第一声纹特征;The voiceprint verification server receives the first voiceprint feature sent by the client server;
声纹验证服务器判断所述第一声纹特征与预存声纹特征分别对应的声纹鉴别向量i-vector之间的特征距离值是否满足预设要求;The voiceprint verification server judges whether the feature distance value between the voiceprint discrimination vector i-vector corresponding to the first voiceprint feature and the pre-stored voiceprint feature meets the preset requirements;
若满足,则判定所述第一声纹特征与所述预存声纹特征相同,否则不相同。If satisfied, it is determined that the first voiceprint feature is the same as the pre-stored voiceprint feature, otherwise it is not the same.
本申请还提供了一种声纹验证系统,包括客户端、客户端服务器和声纹验证服务器;This application also provides a voiceprint verification system, including a client, a client server, and a voiceprint verification server;
所述客户端采集待验证身份的语音信号,并将所述语音信号发送到所述客户端服务器;The client collects the voice signal of the identity to be verified, and sends the voice signal to the client server;
所述客户端服务器接收所述语音信号,并对所述语音信号进行声纹特征提取得到第一声纹特征,将第一声纹特征传输至声纹验证服务器;The client server receives the voice signal, extracts voiceprint features from the voice signal to obtain a first voiceprint feature, and transmits the first voiceprint feature to the voiceprint verification server;
所述声纹验证服务器接收所述第一声纹特征,并将所述第一声纹特征与预存声纹特征进行比较分析,以判断所述第一声纹特征与所述预存声纹特征是否相同,并将判断结果反馈至所述客户端服务器;The voiceprint verification server receives the first voiceprint feature, and compares the first voiceprint feature with a pre-stored voiceprint feature to determine whether the first voiceprint feature and the pre-stored voiceprint feature The same, and feedback the judgment result to the client server;
所述客户端服务器根据所述判断结果控制所述客户端进行反馈响应。The client server controls the client to perform a feedback response according to the judgment result.
本申请还提供了一种计算机设备,包括存储器和处理器,所述存储器存储有计算机程序,所述处理器执行所述计算机程序时实现上述方法的步骤。The present application also provides a computer device, including a memory and a processor, where the memory stores a computer program, and when the processor executes the computer program, the steps of the foregoing method are implemented.
本申请还提供了一种计算机非易失性可读存储介质,其上存储有计算机程序,所述计算机程序被处理器执行时实现上述的方法的步骤。The present application also provides a computer non-volatile readable storage medium on which a computer program is stored, which when executed by a processor implements the steps of the above method.
有益效果Beneficial effect
本申请将声纹特征向量提取的功能前置到客户端服务器上完成，客户端通过录音采集语音信号后直接在本地的客户端服务器提取语音信号的声纹特征向量，然后再将声纹特征向量传输至第三方技术支持的验证服务器上进行声纹验证，声纹验证模型的训练和说话人辨认过程，由于声纹特征向量无法再反推还原为语音信号的原始数据，有利于对客户录音的语音信号进行数据保密，提高数据安全性，使客户身份认证流程的安全性得到了提高。本申请通过提取声纹特征向量后的数据传输至服务器进行声纹验证，声纹特征向量数据比原始语音信号数据更为轻便，大大增加了传输效率。本申请基于GMM-UBM实现将各所述声纹特征向量分别映射为低维度的声纹鉴别向量i-vector，降低计算成本，降低声纹验证的使用成本。在验证过程中通过与多人的预存数据进行比较分析，降低声纹验证的等错率，降低声纹验证的模型误差带来的影响。In this application, the function of extracting the voiceprint feature vector is moved forward onto the client server: after the client collects the voice signal by recording, the voiceprint feature vector is extracted directly on the local client server, and only then is it transmitted to a verification server operated with third-party technical support for voiceprint verification, voiceprint verification model training, and speaker recognition. Because the voiceprint feature vector cannot be inverted to recover the original voice signal data, this keeps the customer's recorded voice signal confidential, improves data security, and makes the customer identity authentication process more secure. Since only the data obtained after voiceprint feature vector extraction is transmitted to the server for voiceprint verification, and voiceprint feature vector data is far lighter than raw voice signal data, transmission efficiency is greatly increased. Based on GMM-UBM, this application maps each voiceprint feature vector to a low-dimensional voiceprint discrimination vector i-vector, reducing computation cost and the cost of using voiceprint verification. During verification, comparison against the pre-stored data of multiple people lowers the equal error rate of voiceprint verification and mitigates the impact of model error.
附图说明BRIEF DESCRIPTION OF THE DRAWINGS
图1 本申请一实施例的声纹验证的方法流程示意图;FIG. 1 is a schematic flowchart of a method for voiceprint verification according to an embodiment of the present application;
图2本申请一实施例的计算机设备内部结构示意图。FIG. 2 is a schematic diagram of the internal structure of a computer device according to an embodiment of the present application.
本发明的最佳实施方式Best Mode of the Invention
参照图1,本申请一实施例的声纹验证的方法,通过客户端采集信息,通过服务器进行声纹验证,方法包括:Referring to FIG. 1, a method of voiceprint verification according to an embodiment of the present application collects information through a client and performs voiceprint verification through a server. The method includes:
S1:通过客户端服务器提取待验证身份的语音信号,并提取所述语音信号中各帧语音数据分别对应的MFCC类型声纹特征。S1: Extract the voice signal of the identity to be verified through the client server, and extract the MFCC type voiceprint features corresponding to each frame of voice data in the voice signal.
本实施例的MFCC(Mel Frequency Cepstrum Coefficient,梅尔频率倒谱系数)类型声纹特征具有非线性特征，使客户的语音信号在各频段上的分析结果更贴近人体发出的真实语音的特征，提高声纹验证的效果。The MFCC (Mel Frequency Cepstrum Coefficient) type voiceprint feature of this embodiment has a non-linear characteristic, so that the analysis of the customer's voice signal in each frequency band more closely matches the characteristics of real speech produced by the human body, improving the effect of voiceprint verification.
S2:通过所述客户端服务器将所述MFCC类型声纹特征构建成各帧语音数据分别对应的声纹特征向量,以形成第一声纹特征。S2: The client server is used to construct the MFCC-type voiceprint feature into voiceprint feature vectors corresponding to each frame of voice data to form a first voiceprint feature.
本实施例根据提取的MFCC类型声纹特征构建各帧语音数据分别对应的声纹特征向量，然后通过语音信号的各帧语音数据的排序，将分别对应的MFCC类型声纹特征组合在一起，得到客户的语音信号对应的第一声纹特征，上述构建过程依然在客户端服务器完成，以增强数据传输过程中的数据保密性。In this embodiment, the voiceprint feature vector corresponding to each frame of voice data is constructed from the extracted MFCC type voiceprint features; the per-frame features are then combined according to the order of the frames in the voice signal to obtain the first voiceprint feature corresponding to the customer's voice signal. This construction process is still completed on the client server, to enhance data confidentiality during data transmission.
S3:声纹验证服务器接收所述客户端服务器发送的所述第一声纹特征。S3: The voiceprint verification server receives the first voiceprint feature sent by the client server.
本实施例将第一声纹特征的提取工作前置到客户端服务器完成，以便客户端服务器接收录音采集的客户的语音信号后，直接在客户端服务器提取语音信号对应的第一声纹特征，然后再传输至第三方技术支持的声纹验证服务器进行声纹验证。由于第一声纹特征无法再通过反推还原为原始的语音信号，有利于对客户录音的语音信号进行数据保密，提高数据安全性，使客户身份认证流程的安全性得到了提高，同时，第一声纹特征比语音信号数据量更小，大大增加了传输效率。通过客户端服务器对采集的语音信号提取声纹特征，将提取后的声纹特征传输至声纹验证服务器进行声纹验证，使声纹特征提取的客户端服务器和声纹验证服务器进行分离。In this embodiment, extraction of the first voiceprint feature is moved forward to the client server, so that after the client server receives the customer's recorded voice signal, the first voiceprint feature corresponding to the voice signal is extracted directly on the client server and only then transmitted to the voiceprint verification server operated with third-party technical support. Because the first voiceprint feature cannot be inverted to recover the original voice signal, this keeps the customer's recorded voice signal confidential, improves data security, and makes the customer identity authentication process more secure; at the same time, the first voiceprint feature has a much smaller data volume than the voice signal, greatly increasing transmission efficiency. Extracting the voiceprint feature from the collected voice signal on the client server and transmitting the extracted feature to the voiceprint verification server separates the feature-extraction client server from the voiceprint verification server.
S4:声纹验证服务器判断所述第一声纹特征与预存声纹特征分别对应的声纹鉴别向量i-vector之间的特征距离值是否满足预设要求。S4: The voiceprint verification server determines whether the feature distance value between the voiceprint identification vector i-vector corresponding to the first voiceprint feature and the pre-stored voiceprint feature respectively meets the preset requirements.
本实施例的预设要求包括特征距离值达到指定的预设阈值范围等,可根据具体的应用场景进行自定义设定,以更广泛地满足个性化使用需求。The preset requirements in this embodiment include that the characteristic distance value reaches a specified preset threshold range, etc., and can be customized according to specific application scenarios to meet the personalized usage requirements more widely.
S5:若满足,则判定第一声纹特征与预存声纹特征相同,否则不相同。S5: If satisfied, it is determined that the first voiceprint feature is the same as the pre-stored voiceprint feature, otherwise it is not the same.
本实施例将判定所述第一声纹特征与所述预存声纹特征相同，则通过服务器向客户端反馈验证通过的结果到客户端，否则，反馈验证失败的结果到客户端，以便客户端根据反馈结果进行进一步的应用操作。举例地，验证通过后控制智能门打开等。再举例地，验证失败指定次数后控制安全系统进行锁屏，以防犯罪分子进一步破坏电子银行系统。In this embodiment, if it is determined that the first voiceprint feature is the same as the pre-stored voiceprint feature, the server feeds a verification-passed result back to the client; otherwise, it feeds back a verification-failed result, so that the client can perform further application operations based on the feedback. For example, a smart door is opened after verification passes. As another example, after verification fails a specified number of times, the security system locks the screen to prevent criminals from further attacking the electronic banking system.
进一步地,本实施例的步骤S4,包括:Further, step S4 of this embodiment includes:
S41:将各帧语音数据分别对应的声纹特征向量分别映射为低维度的声纹鉴别向量i-vector。S41: Map voiceprint feature vectors corresponding to each frame of speech data to low-dimensional voiceprint identification vectors i-vector, respectively.
本实施例基于GMM-UBM(Gaussian Mixture Model-Universal Background Model，高斯混合模型-通用背景模型)实现将各帧语音数据分别对应的声纹特征向量分别映射为低维度的声纹鉴别向量i-vector。本实施例的GMM-UBM的训练过程如下：B1：获取预设数量(例如，10万个)的语音数据样本，每个语音数据样本对应一个声纹鉴别向量，每个语音样本可以采集自不同的人在不同环境中的语音，这样的语音数据样本用来训练能够表征一般语音特性的通用背景模型(GMM-UBM)；B2、分别对各个语音数据样本进行处理以提取出各个语音数据样本对应的预设类型声纹特征，并基于各个语音数据样本对应的预设类型声纹特征构建各个语音数据样本对应的声纹特征向量；B3、将构建出的所有预设类型声纹特征向量分为第一百分比的训练集和第二百分比的验证集，所述第一百分比和第二百分比之和小于或等于100%；B4、利用训练集中的声纹特征向量对所述第二模型进行训练，并在训练完成之后利用验证集对训练的所述第二模型的准确率进行验证；B5、若准确率大于预设准确率(例如，98.5%)，则模型训练结束，否则，增加语音数据样本的数量，并基于增加后的语音数据样本重新执行上述步骤B2、B3、B4、B5。This embodiment maps the voiceprint feature vector corresponding to each frame of voice data to a low-dimensional voiceprint discrimination vector i-vector based on GMM-UBM (Gaussian Mixture Model-Universal Background Model). The training process of the GMM-UBM in this embodiment is as follows. B1: obtain a preset number (for example, 100,000) of voice data samples, each corresponding to one voiceprint discrimination vector; the samples may be collected from different people in different environments and are used to train a universal background model (GMM-UBM) that characterizes general speech properties. B2: process each voice data sample to extract its preset-type voiceprint features, and construct the voiceprint feature vector of each sample from those features. B3: divide all the constructed preset-type voiceprint feature vectors into a training set of a first percentage and a validation set of a second percentage, the sum of the first and second percentages being less than or equal to 100%. B4: train the second model with the voiceprint feature vectors in the training set, and after training, verify the accuracy of the trained second model with the validation set. B5: if the accuracy is greater than a preset accuracy (for example, 98.5%), the model training ends; otherwise, increase the number of voice data samples and re-execute steps B2, B3, B4 and B5 with the enlarged sample set.
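The B1-B5 loop above can be sketched as follows; this is a minimal illustration of the split-train-validate-retrain control flow only, and every callable name (`extract`, `train`, `evaluate`, `get_more`) is a hypothetical placeholder, not an API from the patent:

```python
import random

def split_train_validation(feature_vectors, train_pct=0.8, val_pct=0.2):
    """B3: split voiceprint feature vectors into a training set and a
    validation set; the two percentages sum to at most 100%."""
    assert train_pct + val_pct <= 1.0
    shuffled = feature_vectors[:]
    random.shuffle(shuffled)
    n_train = int(len(shuffled) * train_pct)
    n_val = int(len(shuffled) * val_pct)
    return shuffled[:n_train], shuffled[n_train:n_train + n_val]

def train_until_accurate(samples, extract, train, evaluate, get_more,
                         target_acc=0.985):
    """B2-B5: extract features, split, train, validate; if accuracy is
    not above the preset threshold, enlarge the sample set and repeat."""
    while True:
        features = [extract(s) for s in samples]               # B2
        train_set, val_set = split_train_validation(features)  # B3
        model = train(train_set)                               # B4
        if evaluate(model, val_set) > target_acc:              # B5
            return model
        samples = samples + get_more()
```

The 0.8/0.2 split and the 0.985 accuracy threshold mirror the example values in the text; both are configurable parameters rather than fixed constants.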
本实施例的声纹鉴别向量采用声纹鉴别向量i-vector表达，声纹鉴别向量i-vector是一个向量，相对于高斯空间的维度来讲，声纹鉴别向量i-vector维度更低，便于降低计算成本。The voiceprint discrimination vector of this embodiment is expressed as the i-vector. Compared with the dimensionality of the Gaussian space, the i-vector has a much lower dimensionality, which helps reduce computation cost.
S42：通过余弦距离公式 cos(x, y) = (x · y) / (‖x‖ · ‖y‖)，计算第一声纹特征对应的声纹鉴别向量i-vector与预存声纹特征对应的声纹鉴别向量i-vector之间的余弦距离值，其中，x代表预存声纹特征对应的声纹鉴别向量i-vector，y代表第一声纹特征对应的声纹鉴别向量i-vector。S42: Using the cosine distance formula cos(x, y) = (x · y) / (‖x‖ · ‖y‖), calculate the cosine distance value between the voiceprint discrimination vector i-vector corresponding to the first voiceprint feature and the voiceprint discrimination vector i-vector corresponding to the pre-stored voiceprint feature, where x represents the voiceprint discrimination vector i-vector corresponding to the pre-stored voiceprint feature and y represents the voiceprint discrimination vector i-vector corresponding to the first voiceprint feature.
S43:判断所述余弦距离值是否满足预设条件。S43: Determine whether the cosine distance value meets a preset condition.
本实施例的预设条件包括余弦距离值在指定的阈值范围内等，可根据需要设定。本实施例通过将预存的多个人的声纹特征数据中各自对应的预存声纹特征与所述第一声纹特征分别计算的第一余弦距离值进行从小到大排序，判断预设排序在前的几个第一余弦距离值中是否包括目标人的预存声纹特征对应的第一余弦距离值，若包括则判定余弦距离值满足预设条件。本申请另一实施例通过判断目标人的预存声纹特征与所述第一声纹特征之间的第二余弦距离值是否小于或等于预设阈值，若小于或等于，则判定余弦距离值满足预设条件。The preset conditions in this embodiment include the cosine distance value being within a specified threshold range, and can be set as needed. In this embodiment, the first cosine distance values calculated between the first voiceprint feature and each of the pre-stored voiceprint features of multiple people are sorted from smallest to largest, and it is judged whether the preset number of smallest first cosine distance values includes the first cosine distance value corresponding to the target person's pre-stored voiceprint feature; if so, the cosine distance value is determined to satisfy the preset condition. In another embodiment of this application, it is judged whether the second cosine distance value between the target person's pre-stored voiceprint feature and the first voiceprint feature is less than or equal to a preset threshold; if so, the cosine distance value is determined to satisfy the preset condition.
S44：若所述余弦距离值满足预设条件，则判定所述第一声纹特征与预存声纹特征分别对应的声纹鉴别向量i-vector之间的特征距离值满足预设要求，否则不满足预设要求。S44: If the cosine distance value satisfies the preset condition, it is determined that the feature distance value between the voiceprint discrimination vectors i-vector respectively corresponding to the first voiceprint feature and the pre-stored voiceprint feature meets the preset requirement; otherwise, it does not meet the preset requirement.
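The S42 comparison can be sketched with plain Python lists standing in for i-vectors. The patent's figure placeholder points to the standard cosine formula; since the text later treats smaller distance values as more similar, this sketch also derives a distance as 1 − cos(x, y) under that convention, which is an assumption of the sketch rather than a formula stated in the patent:

```python
import math

def cosine_similarity(x, y):
    """cos(x, y) = (x . y) / (||x|| * ||y||) between two i-vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def cosine_distance(x, y):
    """Distance convention in which smaller values mean the two
    voiceprint features are closer, matching the ranking in S431."""
    return 1.0 - cosine_similarity(x, y)
```

Identical i-vectors give cos(x, y) = 1 and a distance of 0; orthogonal ones give cos(x, y) = 0.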
进一步地,本实施例的步骤S41,包括:Further, step S41 of this embodiment includes:
S410:将提取得到的各帧语音数据分别对应的声纹特征向量分别输入到GMM-UBM模型,得到表征各帧语音数据在各高斯分量上的概率分布的高斯超向量。S410: Input voiceprint feature vectors corresponding to each frame of extracted speech data to the GMM-UBM model, respectively, to obtain a Gaussian supervector representing the probability distribution of each frame of speech data on each Gaussian component.
S411：将各所述高斯超向量利用公式 M = μ + Tω 计算得到各帧语音数据分别对应的低维度的声纹鉴别向量i-vector，其中，M为各帧语音数据的高斯超向量，μ为所述GMM-UBM模型的均值超向量，ω为各帧语音数据的低维度的声纹鉴别向量i-vector，T为映射到高维度的高斯空间的转换矩阵。S411: Using the formula M = μ + Tω, compute from each Gaussian supervector the low-dimensional voiceprint discrimination vector i-vector corresponding to each frame of voice data, where M is the Gaussian supervector of each frame of voice data, μ is the mean supervector of the GMM-UBM model, ω is the low-dimensional voiceprint discrimination vector i-vector of each frame of voice data, and T is the transformation matrix that maps into the high-dimensional Gaussian space.
本实施例的T训练采用EM算法。EM算法，指的是最大期望算法(Expectation Maximization Algorithm，又译期望最大化算法)，是一种迭代算法，在统计学中被用于寻找依赖于不可观察的隐性变量的概率模型中参数的最大似然估计。最大期望算法经过两个步骤交替进行计算：1)计算期望(E)，利用概率模型参数的现有估计值，计算隐藏变量的期望；2)最大化(M)，利用E步上求得的隐藏变量的期望，对参数模型进行最大似然估计。上步找到的参数估计值被用于下步计算中，不断交替进行。The training of T in this embodiment uses the EM algorithm. The EM algorithm (Expectation-Maximization algorithm) is an iterative algorithm used in statistics to find maximum likelihood estimates of parameters in probability models that depend on unobservable latent variables. It alternates between two steps: 1) the expectation step (E), which computes the expectation of the hidden variables using the current estimates of the model parameters; 2) the maximization step (M), which performs maximum likelihood estimation of the model parameters using the hidden-variable expectations obtained in the E step. The parameter estimates found in one step are used in the next, and the two steps alternate until convergence.
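Given a trained transformation matrix T, the mapping in S411 can be sketched as solving M = μ + Tω for ω. This is a deliberate simplification for illustration: production i-vector extractors compute ω from Baum-Welch posterior statistics rather than a direct least-squares solve, and the toy dimensions below are illustrative:

```python
import numpy as np

def extract_ivector(M, mu, T):
    """Solve M = mu + T @ w for the low-dimensional i-vector w.

    M  : Gaussian supervector of one frame (high-dimensional)
    mu : mean supervector of the GMM-UBM
    T  : transformation matrix mapping w into the Gaussian space
    """
    w, *_ = np.linalg.lstsq(T, M - mu, rcond=None)
    return w

# Toy example: a 6-dimensional supervector space, 2-dimensional i-vectors.
rng = np.random.default_rng(0)
T = rng.standard_normal((6, 2))
mu = rng.standard_normal(6)
w_true = np.array([0.5, -1.2])
M = mu + T @ w_true          # synthesize a supervector from a known w
w_est = extract_ivector(M, mu, T)
```

Because M is built exactly as μ + Tω here, the least-squares solve recovers ω; with real supervectors the solution is only an approximation.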
进一步地,本实施例的步骤S43,包括:Further, step S43 of this embodiment includes:
S430：分别获取预存的多个人的声纹特征数据中各自对应的预存声纹特征与所述第一声纹特征之间的第一余弦距离值，其中，多个人的声纹特征数据中包括目标人的预存声纹特征。S430: Obtain the first cosine distance values between the first voiceprint feature and each of the pre-stored voiceprint features in the pre-stored voiceprint feature data of multiple people, where the voiceprint feature data of the multiple people includes the pre-stored voiceprint feature of the target person.
本实施例通过将预存的包括目标人的多人的声纹特征数据，同时用于判断当前采集的语音信号的声纹特征是否与目标人的声纹特征相同，以提高判断准确性。本实施例通过余弦距离公式 cos(x, y) = (x · y) / (‖x‖ · ‖y‖) 表示各所述预存声纹特征与所述第一声纹特征之间的第一余弦距离值，其中，x代表各预存声纹鉴别向量，y代表第一声纹特征的声纹鉴别向量i-vector，余弦距离值越小，表明两声纹特征更接近或相同。本实施例的"第一"，仅用作区别，不用于限定，其他处的作用相同，不赘述。In this embodiment, the pre-stored voiceprint feature data of multiple people, including the target person, is used to judge whether the voiceprint feature of the currently collected voice signal is the same as the target person's voiceprint feature, so as to improve judgment accuracy. The cosine distance formula cos(x, y) = (x · y) / (‖x‖ · ‖y‖) expresses the first cosine distance value between each pre-stored voiceprint feature and the first voiceprint feature, where x represents each pre-stored voiceprint discrimination vector and y represents the voiceprint discrimination vector i-vector of the first voiceprint feature; the smaller the cosine distance value, the closer or more similar the two voiceprint features. The term "first" in this embodiment is used only for distinction, not limitation; it has the same role elsewhere and is not repeated.
S431:将各所述第一余弦距离值按照从小到大的顺序进行排序。S431: Sort the first cosine distance values in ascending order.
本实施例通过将各所述预存声纹特征与所述第一声纹特征之间的第一余弦距离值进行从小到大排序，以便更准确地分析第一声纹特征与各预存声纹特征的相似度分布状态，以便更准确地获得对第一声纹特征的验证。In this embodiment, the first cosine distance values between the first voiceprint feature and each pre-stored voiceprint feature are sorted from smallest to largest, so that the similarity distribution between the first voiceprint feature and the pre-stored voiceprint features can be analyzed more accurately, allowing more accurate verification of the first voiceprint feature.
S432：判断排序在前的预设数量的第一余弦距离值中，是否包括所述目标人的预存声纹特征对应的第一余弦距离值。S432: Determine whether the preset number of smallest first cosine distance values includes the first cosine distance value corresponding to the target person's pre-stored voiceprint feature.
本实施例通过排序在前的预设数量的第一余弦距离值中包括所述目标人的预存声纹特征对应的第一余弦距离值，则判定第一声纹特征与预存的目标人的声纹特征相同，以减小模型误差带来的识别等错率，上述等错率为"应验证通过时发生的验证未通过的频率，与应验证未通过时发生的验证通过的频率相等"。本实施例的预设数量的第一余弦距离值包括1个、2个或3个等，可根据使用需求进行自设定。In this embodiment, if the preset number of smallest first cosine distance values includes the first cosine distance value corresponding to the target person's pre-stored voiceprint feature, the first voiceprint feature is determined to be the same as the target person's pre-stored voiceprint feature, which reduces the equal error rate caused by model error. The equal error rate is the operating point at which the frequency of verifications that fail when they should pass equals the frequency of verifications that pass when they should fail. The preset number of first cosine distance values in this embodiment may be 1, 2, 3 and so on, and can be set according to usage requirements.
S433:若是,则判定余弦距离值满足预设条件,否则不满足预设条件。S433: If yes, it is determined that the cosine distance value meets the preset condition, otherwise, the preset condition is not met.
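Steps S430-S433 reduce to ranking the first cosine distance values and checking whether the target person appears among the smallest few. In this sketch, `distances` maps person identifiers to already-computed first cosine distance values, and the identifiers and values are illustrative:

```python
def target_in_top(distances, target_id, top_n=3):
    """S431/S432: sort first cosine distance values from smallest to
    largest and check whether the target person's pre-stored voiceprint
    ranks within the first top_n entries."""
    ranked = sorted(distances, key=distances.get)  # smallest distance first
    return target_id in ranked[:top_n]

# Example: the target's pre-stored voiceprint is the 2nd closest match,
# so with top_n=3 the preset condition is satisfied (S433).
distances = {"alice": 0.31, "target": 0.12, "bob": 0.08, "carol": 0.77}
accepted = target_in_top(distances, "target", top_n=3)
```

`top_n` corresponds to the preset number (1, 2, 3, ...) mentioned above.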
进一步地,本申请另一实施例的步骤S43,包括:Further, step S43 of another embodiment of the present application includes:
S434:获取目标人的预存声纹特征与第一声纹特征间的第二余弦距离值。S434: Obtain a second cosine distance value between the pre-stored voiceprint feature of the target person and the first voiceprint feature.
本实施例通过只针对性地比较一个第二余弦距离值，减小比较计算量，提高验证速率。In this embodiment, by comparing only a single second cosine distance value in a targeted manner, the amount of comparison computation is reduced and the verification speed is improved.
S435:判断所述第二余弦距离值是否小于或等于预设阈值。S435: Determine whether the second cosine distance value is less than or equal to a preset threshold.
本实施例通过设定第一声纹特征与目标用户的预存声纹特征的距离阈值,实现有效的声纹验证。举例地,预设阈值为0.6。In this embodiment, by setting a distance threshold between the first voiceprint feature and the pre-stored voiceprint feature of the target user, effective voiceprint verification is achieved. For example, the preset threshold is 0.6.
S436:若是,则判定余弦距离值满足预设条件,否则不满足预设条件。S436: If yes, it is determined that the cosine distance value meets the preset condition, otherwise, the preset condition is not met.
本实施例计算第一声纹特征与目标用户的预存声纹特征的余弦距离小于或等于预设阈值，则判定余弦距离值满足预设条件，确定第一声纹特征与目标用户的预存声纹特征相同，则验证通过；若计算第一声纹特征与目标用户的预存声纹特征的余弦距离大于预设阈值，则判定所述距离值不满足预设条件，确定第一声纹特征与目标用户的预存声纹特征不相同，则验证失败。In this embodiment, if the computed cosine distance between the first voiceprint feature and the target user's pre-stored voiceprint feature is less than or equal to the preset threshold, the cosine distance value is determined to satisfy the preset condition, the first voiceprint feature is determined to be the same as the target user's pre-stored voiceprint feature, and the verification passes; if the cosine distance is greater than the preset threshold, the distance value is determined not to satisfy the preset condition, the first voiceprint feature is determined to differ from the target user's pre-stored voiceprint feature, and the verification fails.
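The threshold embodiment of S434-S436 reduces to a single comparison; 0.6 is the example threshold given above, not a fixed value:

```python
def verify_against_target(second_cosine_distance, threshold=0.6):
    """S435/S436: verification passes when the second cosine distance
    between the target person's pre-stored voiceprint feature and the
    first voiceprint feature is less than or equal to the preset
    threshold; otherwise it fails."""
    return second_cosine_distance <= threshold
```

A distance exactly equal to the threshold passes, matching the "less than or equal to" wording.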
本申请还提供了一种声纹验证系统,包括客户端、客户端服务器和声纹验证服务器;This application also provides a voiceprint verification system, including a client, a client server, and a voiceprint verification server;
客户端采集待验证身份的语音信号,并将语音信号发送到客户端服务器;The client collects the voice signal of the identity to be verified and sends the voice signal to the client server;
所述客户端服务器接收所述语音信号,并对所述语音信号进行声纹特征提取得到第一声纹特征,将第一声纹特征传输至声纹验证服务器;The client server receives the voice signal, extracts voiceprint features from the voice signal to obtain a first voiceprint feature, and transmits the first voiceprint feature to the voiceprint verification server;
所述声纹验证服务器接收所述第一声纹特征，并将所述第一声纹特征与预存声纹特征进行比较分析，以判断所述第一声纹特征与所述预存声纹特征是否相同，并将判断结果反馈至所述客户端服务器；The voiceprint verification server receives the first voiceprint feature, compares and analyzes it against a pre-stored voiceprint feature to determine whether the first voiceprint feature is the same as the pre-stored voiceprint feature, and feeds the judgment result back to the client server;
所述客户端服务器根据所述判断结果控制所述客户端进行反馈响应。The client server controls the client to perform a feedback response according to the judgment result.
进一步地，本实施例的所述语音信号的连续模拟信号通过客户端按照指定采样周期进行采样，以形成离散模拟信号，并按指定编码规则量化为数字信号；所述客户端服务器接收所述语音信号，并对所述语音信号进行声纹特征提取得到第一声纹特征的过程如下：Further, in this embodiment, the continuous analog voice signal is sampled by the client at a specified sampling period to form a discrete analog signal, which is then quantized into a digital signal according to a specified encoding rule. The process by which the client server receives the voice signal and extracts voiceprint features to obtain the first voiceprint feature is as follows:
S101，所述客户端服务器将所述数字信号进行预加重后，对预加重的数字信号进行分帧处理，得到各帧语音数据；S102，根据公式 f_mel = 2595 × log10(1 + f / 700) 将各帧语音数据从线性频谱域映射到梅尔频谱域，其中，f_mel表示梅尔频谱值，f表示线性频谱值；S103，将转化为梅尔频谱域的各帧语音数据输入到一组梅尔三角滤波器组，计算每个频段的梅尔三角滤波器输出的对数能量，得到各帧语音数据分别对应的对数能量序列；S104，将各所述对数能量序列进行离散余弦变换，得到各帧语音数据分别对应的MFCC类型声纹特征；将所述MFCC类型声纹特征构建成各帧语音数据分别对应的声纹特征向量，以形成所述第一声纹特征。S101: After pre-emphasizing the digital signal, the client server performs framing on the pre-emphasized digital signal to obtain each frame of voice data. S102: Map each frame of voice data from the linear spectrum domain to the mel spectrum domain according to f_mel = 2595 × log10(1 + f / 700), where f_mel represents the mel spectrum value and f represents the linear spectrum value. S103: Input each frame of voice data converted to the mel spectrum domain into a set of mel triangular filter banks, and compute the log energy output by the mel triangular filter of each frequency band to obtain the log energy sequence corresponding to each frame of voice data. S104: Apply a discrete cosine transform to each log energy sequence to obtain the MFCC type voiceprint features corresponding to each frame of voice data; construct the MFCC type voiceprint features into voiceprint feature vectors corresponding to each frame of voice data to form the first voiceprint feature.
上述预加重，由于人体的生理特性，语音信号的高频成分往往被压抑，预加重的作用是补偿高频成分；上述分帧处理中，由于语音信号的"瞬时平稳性"，在进行频谱分析时对一段话音信号进行分帧处理(一般为10至30毫秒一帧)，然后以帧为单位进行特征提取；上述分帧处理后进行了加窗处理，作用是减少帧起始和结束地方信号的不连续性问题，本实施例采用汉明窗进行加窗处理。Regarding the above pre-emphasis: due to the physiological characteristics of the human body, the high-frequency components of the voice signal are often suppressed, and pre-emphasis compensates for them. In the framing step, because of the short-time stationarity of the voice signal, spectrum analysis is performed by splitting the voice signal into frames (generally 10 to 30 milliseconds per frame) and extracting features frame by frame. After framing, windowing is applied to reduce discontinuities at the beginning and end of each frame; this embodiment uses a Hamming window for windowing.
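The S101-S102 front end described above can be sketched as follows. The frame length (400 samples, i.e. 25 ms at 16 kHz), frame shift (160 samples) and the 0.97 pre-emphasis coefficient are common defaults assumed for illustration, not values fixed by the patent:

```python
import numpy as np

def pre_emphasis(signal, alpha=0.97):
    """S101: boost the high-frequency components suppressed by human
    physiology via a first-order difference filter."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_signal(signal, frame_len=400, frame_shift=160):
    """S101: split into overlapping 10-30 ms frames and apply a
    Hamming window to reduce edge discontinuities."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // frame_shift)
    window = np.hamming(frame_len)
    return np.stack([
        signal[i * frame_shift:i * frame_shift + frame_len] * window
        for i in range(n_frames)
    ])

def hz_to_mel(f):
    """S102: map a linear spectrum value to the mel spectrum domain."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

# 0.1 s of dummy audio at 16 kHz yields 8 windowed frames.
frames = frame_signal(pre_emphasis(np.ones(1600)))
```

The mel filter banks (S103) and discrete cosine transform (S104) would then operate on the per-frame spectra; those steps are omitted here for brevity.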
本实施例将声纹特征向量提取的功能前置到客户端服务器上完成，客户端通过录音采集语音信号后直接在本地的客户端服务器提取语音信号的声纹特征向量，然后再将声纹特征向量传输至第三方技术支持的验证服务器上进行声纹验证，声纹验证模型的训练和说话人辨认过程，由于声纹特征向量无法再反推还原为语音信号的原始数据，有利于对客户录音的语音信号进行数据保密，提高数据安全性，使客户身份认证流程的安全性得到了提高。本实施例通过提取声纹特征向量后的数据传输至服务器进行声纹验证，声纹特征向量数据比原始语音信号数据更为轻便，大大增加了传输效率。本实施例基于GMM-UBM实现将各所述声纹特征向量分别映射为低维度的声纹鉴别向量i-vector，降低计算成本，降低声纹验证的使用成本。在验证过程中通过与多人的预存数据进行比较分析，降低声纹验证的等错率，降低声纹验证的模型误差带来的影响。In this embodiment, the function of extracting the voiceprint feature vector is moved forward onto the client server: after the client collects the voice signal by recording, the voiceprint feature vector is extracted directly on the local client server, and only then is it transmitted to a verification server operated with third-party technical support for voiceprint verification, voiceprint verification model training, and speaker recognition. Because the voiceprint feature vector cannot be inverted to recover the original voice signal data, this keeps the customer's recorded voice signal confidential, improves data security, and makes the customer identity authentication process more secure. Since only the data obtained after voiceprint feature vector extraction is transmitted to the server for voiceprint verification, and voiceprint feature vector data is far lighter than raw voice signal data, transmission efficiency is greatly increased. Based on GMM-UBM, this embodiment maps each voiceprint feature vector to a low-dimensional voiceprint discrimination vector i-vector, reducing computation cost and the cost of using voiceprint verification. During verification, comparison against the pre-stored data of multiple people lowers the equal error rate of voiceprint verification and mitigates the impact of model error.
进一步地,判断结果包括第一声纹特征与预存声纹特征不相同,所述客户端服务器根据所述判断结果控制所述客户端进行反馈响应的过程,包括:Further, the judgment result includes that the first voiceprint feature is not the same as the pre-stored voiceprint feature, and the client server controlling the feedback response process of the client according to the judgment result includes:
客户端服务器生成身份验证不成功的反馈信息并发送至所述客户端;The client server generates feedback information about unsuccessful authentication and sends it to the client;
判断预设时间内根据所述第一声纹特征生成身份验证不成功的反馈信息的次数,是否超过预设次数。It is determined whether the number of times that the feedback information of unsuccessful identity verification is generated according to the first voiceprint feature within a preset time exceeds a preset number of times.
若超过预设次数,则控制所述客户端处于禁用状态,并发出警报。If the preset times are exceeded, the client is controlled to be in a disabled state and an alarm is issued.
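The lockout logic above can be sketched with a sliding time window. The window length and retry limit are illustrative values; the patent specifies only that both are preset:

```python
import time

class FailureGuard:
    """Disable the client after too many failed verifications of the
    first voiceprint feature within a preset time window."""

    def __init__(self, max_failures=3, window_seconds=300):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self.failures = []  # timestamps of unsuccessful verifications

    def record_failure(self, now=None):
        """Register one unsuccessful verification; return True when the
        client should be put in the disabled state and an alarm raised."""
        now = time.time() if now is None else now
        # Keep only failures that fall inside the preset time window.
        self.failures = [t for t in self.failures
                         if now - t <= self.window_seconds]
        self.failures.append(now)
        return len(self.failures) > self.max_failures
```

Failures older than the window are discarded, so only a burst of failed attempts within the preset time triggers the disabled state.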
本声纹验证系统包括警报和安全管控装置,以增强该声纹验证系统在实际应用过程的功能完备性,提高管理安全和信息安全。The voiceprint verification system includes an alarm and a safety control device to enhance the functional completeness of the voiceprint verification system in the actual application process and improve management security and information security.
参照图2，本申请实施例中还提供一种计算机设备，该计算机设备可以是服务器，其内部结构可以如图2所示。该计算机设备包括通过系统总线连接的处理器、存储器、网络接口和数据库。其中，该计算机设备的处理器用于提供计算和控制能力。该计算机设备的存储器包括非易失性存储介质、内存储器。该非易失性存储介质存储有操作系统、计算机可读指令和数据库。该内存储器为非易失性存储介质中的操作系统和计算机可读指令的运行提供环境。该计算机设备的数据库用于存储声纹验证等数据。该计算机设备的网络接口用于与外部的终端通过网络连接通信。该计算机可读指令在执行时，执行如上述各方法的实施例的流程。本领域技术人员可以理解，图2中示出的结构，仅仅是与本申请方案相关的部分结构的框图，并不构成对本申请方案所应用于其上的计算机设备的限定。Referring to FIG. 2, an embodiment of the present application further provides a computer device. The computer device may be a server, and its internal structure may be as shown in FIG. 2. The computer device includes a processor, a memory, a network interface and a database connected by a system bus. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the non-volatile storage medium. The database of the computer device is used to store data such as voiceprint verification data. The network interface of the computer device is used to communicate with external terminals through a network connection. When executed, the computer-readable instructions perform the processes of the method embodiments described above. Those skilled in the art can understand that the structure shown in FIG. 2 is only a block diagram of part of the structure related to the solution of this application, and does not limit the computer device to which the solution is applied.
本申请一实施例还提供一种计算机非易失性可读存储介质，其上存储有计算机可读指令，该计算机可读指令在执行时，执行如上述各方法的实施例的流程。以上所述仅为本申请的优选实施例，并非因此限制本申请的专利范围，凡是利用本申请说明书及附图内容所作的等效结构或等效流程变换，或直接或间接运用在其他相关的技术领域，均同理包括在本申请的专利保护范围内。An embodiment of the present application further provides a computer non-volatile readable storage medium storing computer-readable instructions which, when executed, perform the processes of the method embodiments described above. The above are only preferred embodiments of the present application and do not limit its patent scope; any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of this application, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of this application.

Claims (18)

  1. A voiceprint verification method, characterized by comprising:
    extracting, by a client server, a voice signal of an identity to be verified, and extracting MFCC-type voiceprint features corresponding to each frame of voice data in the voice signal;
    constructing, by the client server, the MFCC-type voiceprint features into voiceprint feature vectors corresponding to each frame of voice data, so as to form a first voiceprint feature;
    receiving, by a voiceprint verification server, the first voiceprint feature sent by the client server;
    determining, by the voiceprint verification server, whether a feature distance value between the voiceprint identification vectors (i-vectors) respectively corresponding to the first voiceprint feature and a pre-stored voiceprint feature meets a preset requirement;
    if so, determining that the first voiceprint feature is the same as the pre-stored voiceprint feature; otherwise, determining that they are different.
  2. [Incorporated by reference (Rule 20.5) 01.02.2019]
    The voiceprint verification method according to claim 1, characterized in that the step of the voiceprint verification server determining whether the feature distance value between the i-vectors respectively corresponding to the first voiceprint feature and the pre-stored voiceprint feature meets the preset requirement comprises:
    mapping the voiceprint feature vectors corresponding to each frame of voice data to low-dimensional voiceprint identification vectors (i-vectors);
    calculating, by the cosine distance formula cos(x, y) = (x · y) / (|x| · |y|), the cosine distance value cos(x, y) between the i-vector corresponding to the first voiceprint feature and the i-vector corresponding to the pre-stored voiceprint feature, where x denotes the i-vector corresponding to the pre-stored voiceprint feature and y denotes the i-vector corresponding to the first voiceprint feature;
    determining whether the cosine distance value meets a preset condition;
    if so, determining that the feature distance value between the i-vectors respectively corresponding to the first voiceprint feature and the pre-stored voiceprint feature meets the preset requirement; otherwise, determining that it does not.
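The cosine-distance comparison recited above can be sketched in plain Python (a minimal illustration only: the function name and list-based vectors are my own, and real i-vectors would be higher-dimensional):

```python
import math

def cosine_distance(x, y):
    """cos(x, y) = (x . y) / (|x| * |y|): the cosine distance value between
    the pre-stored i-vector x and the first voiceprint feature's i-vector y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)
```

A value of 1.0 indicates identical direction; the claimed "preset condition" would then be evaluated against this score.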
  3. [Incorporated by reference (Rule 20.5) 01.02.2019]
    The voiceprint verification method according to claim 2, characterized in that the step of mapping the voiceprint feature vectors corresponding to each frame of voice data to the low-dimensional i-vectors comprises:
    inputting the extracted voiceprint feature vectors corresponding to each frame of voice data into a GMM-UBM model, to obtain Gaussian supervectors characterizing the probability distribution of each frame of voice data over the Gaussian components;
    calculating, from each Gaussian supervector by the formula M = μ + Tω, the low-dimensional i-vector corresponding to each frame of voice data, where M is the Gaussian supervector of each frame of voice data, μ is the mean supervector of the GMM-UBM model, ω is the low-dimensional i-vector of each frame of voice data, and T is the transformation matrix that maps to the high-dimensional Gaussian space.
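Full i-vector extraction estimates ω from Baum-Welch statistics with a posterior covariance term; as a minimal sketch of just the relation M = μ + Tω recited above, and assuming T and μ are already trained and known, ω can be recovered by least squares (all names and dimensions here are illustrative, not from the application):

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 12, 3                          # supervector / i-vector dims (illustrative)
T = rng.standard_normal((D, d))       # transformation matrix (assumed trained)
mu = rng.standard_normal(D)           # mean supervector of the GMM-UBM model
w_true = np.array([0.5, -1.0, 2.0])   # the frame's low-dimensional i-vector
M = mu + T @ w_true                   # Gaussian supervector: M = mu + T w

# Recover the i-vector from the supervector by least squares on M - mu = T w
w_hat, *_ = np.linalg.lstsq(T, M - mu, rcond=None)
```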
  4. The voiceprint verification method according to claim 2, characterized in that the step of determining whether the cosine distance value meets the preset condition comprises:
    obtaining the first cosine distance values between the first voiceprint feature and each of the pre-stored voiceprint features in the pre-stored voiceprint feature data of multiple persons, wherein the voiceprint feature data of the multiple persons includes the pre-stored voiceprint feature of a target person;
    sorting the first cosine distance values in ascending order;
    determining whether the first cosine distance value corresponding to the pre-stored voiceprint feature of the target person is among the first preset number of sorted first cosine distance values;
    if so, determining that the first cosine distance value meets the preset condition; otherwise, determining that it does not.
  5. The voiceprint verification method according to claim 2, characterized in that the step of determining whether the cosine distance value meets the preset condition comprises:
    obtaining a second cosine distance value between the pre-stored voiceprint feature of the target person and the first voiceprint feature;
    determining whether the second cosine distance value is less than or equal to a preset threshold;
    if so, determining that the second cosine distance value meets the preset condition; otherwise, determining that it does not.
  6. A voiceprint verification system, characterized by comprising a client, a client server, and a voiceprint verification server, wherein:
    the client collects a voice signal of an identity to be verified, and sends the voice signal to the client server;
    the client server receives the voice signal, performs voiceprint feature extraction on the voice signal to obtain a first voiceprint feature, and transmits the first voiceprint feature to the voiceprint verification server;
    the voiceprint verification server receives the first voiceprint feature, compares and analyzes the first voiceprint feature against a pre-stored voiceprint feature to determine whether the first voiceprint feature and the pre-stored voiceprint feature are the same, and feeds the determination result back to the client server;
    the client server controls the client to make a feedback response according to the determination result.
  7. [Incorporated by reference (Rule 20.5) 01.02.2019]
    The voiceprint verification system according to claim 6, characterized in that the continuous analog signal of the voice signal is sampled by the client at a specified sampling period to form a discrete analog signal, which is quantized into a digital signal according to a specified encoding rule; and the process of the client server receiving the voice signal and performing voiceprint feature extraction on the voice signal to obtain the first voiceprint feature comprises:
    pre-emphasizing, by the client server, the digital signal, and framing the pre-emphasized digital signal to obtain each frame of voice data;
    mapping each frame of voice data from the linear spectrum domain to the Mel spectrum domain according to mel(f) = 2595 · log10(1 + f/700), where mel(f) denotes the Mel spectrum value and f denotes the linear spectrum value;
    inputting each frame of voice data converted into the Mel spectrum domain into a bank of Mel triangular filters, and computing the logarithmic energy output by the Mel triangular filter of each frequency band, to obtain the logarithmic energy sequence corresponding to each frame of voice data;
    performing a discrete cosine transform on each logarithmic energy sequence, to obtain the MFCC-type voiceprint features corresponding to each frame of voice data;
    constructing the MFCC-type voiceprint features into the voiceprint feature vectors corresponding to each frame of voice data, so as to form the first voiceprint feature.
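The Mel mapping and the discrete cosine transform steps above can be sketched as follows (a simplified illustration: function names are mine, and the pre-emphasis, framing, and triangular filter-bank stages are omitted):

```python
import math

def hz_to_mel(f):
    """Map a linear-spectrum frequency to the Mel spectrum value:
    mel(f) = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def dct_ii(log_energies, n_coeffs):
    """Type-II discrete cosine transform of the log filter-bank energies,
    yielding the first n_coeffs MFCC-type coefficients (unnormalized)."""
    n = len(log_energies)
    return [sum(e * math.cos(math.pi * k * (m + 0.5) / n)
                for m, e in enumerate(log_energies))
            for k in range(n_coeffs)]
```

The zeroth DCT coefficient is the sum of the log energies (overall frame energy); higher coefficients capture the spectral envelope shape used as the voiceprint feature.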
  8. The voiceprint verification system according to claim 6, characterized in that the determination result includes that the first voiceprint feature is different from the pre-stored voiceprint feature, and the process of the client server controlling the client to make a feedback response according to the determination result comprises:
    generating, by the client server, feedback information indicating unsuccessful identity verification, and sending it to the client;
    determining whether the number of times that feedback information indicating unsuccessful identity verification has been generated for the first voiceprint feature within a preset time exceeds a preset number of times;
    if so, controlling the client to be in a disabled state, and issuing an alarm.
  9. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements a voiceprint verification method, the voiceprint verification method comprising:
    extracting, by a client server, a voice signal of an identity to be verified, and extracting MFCC-type voiceprint features corresponding to each frame of voice data in the voice signal;
    constructing, by the client server, the MFCC-type voiceprint features into voiceprint feature vectors corresponding to each frame of voice data, so as to form a first voiceprint feature;
    receiving, by a voiceprint verification server, the first voiceprint feature sent by the client server;
    determining, by the voiceprint verification server, whether a feature distance value between the voiceprint identification vectors (i-vectors) respectively corresponding to the first voiceprint feature and a pre-stored voiceprint feature meets a preset requirement;
    if so, determining that the first voiceprint feature is the same as the pre-stored voiceprint feature; otherwise, determining that they are different.
  10. [Incorporated by reference (Rule 20.5) 01.02.2019]
    The computer device according to claim 9, characterized in that the step of the voiceprint verification server determining whether the feature distance value between the i-vectors respectively corresponding to the first voiceprint feature and the pre-stored voiceprint feature meets the preset requirement comprises:
    mapping the voiceprint feature vectors corresponding to each frame of voice data to low-dimensional voiceprint identification vectors (i-vectors);
    calculating, by the cosine distance formula cos(x, y) = (x · y) / (|x| · |y|), the cosine distance value cos(x, y) between the i-vector corresponding to the first voiceprint feature and the i-vector corresponding to the pre-stored voiceprint feature, where x denotes the i-vector corresponding to the pre-stored voiceprint feature and y denotes the i-vector corresponding to the first voiceprint feature;
    determining whether the cosine distance value meets a preset condition;
    if so, determining that the feature distance value between the i-vectors respectively corresponding to the first voiceprint feature and the pre-stored voiceprint feature meets the preset requirement; otherwise, determining that it does not.
  11. [Incorporated by reference (Rule 20.5) 01.02.2019]
    The computer device according to claim 10, characterized in that the step of mapping the voiceprint feature vectors corresponding to each frame of voice data to the low-dimensional i-vectors comprises:
    inputting the extracted voiceprint feature vectors corresponding to each frame of voice data into a GMM-UBM model, to obtain Gaussian supervectors characterizing the probability distribution of each frame of voice data over the Gaussian components;
    calculating, from each Gaussian supervector by the formula M = μ + Tω, the low-dimensional i-vector corresponding to each frame of voice data, where M is the Gaussian supervector of each frame of voice data, μ is the mean supervector of the GMM-UBM model, ω is the low-dimensional i-vector of each frame of voice data, and T is the transformation matrix that maps to the high-dimensional Gaussian space.
  12. The computer device according to claim 10, characterized in that the step of determining whether the cosine distance value meets the preset condition comprises:
    obtaining the first cosine distance values between the first voiceprint feature and each of the pre-stored voiceprint features in the pre-stored voiceprint feature data of multiple persons, wherein the voiceprint feature data of the multiple persons includes the pre-stored voiceprint feature of a target person;
    sorting the first cosine distance values in ascending order;
    determining whether the first cosine distance value corresponding to the pre-stored voiceprint feature of the target person is among the first preset number of sorted first cosine distance values;
    if so, determining that the first cosine distance value meets the preset condition; otherwise, determining that it does not.
  13. The computer device according to claim 10, characterized in that the step of determining whether the cosine distance value meets the preset condition comprises:
    obtaining a second cosine distance value between the pre-stored voiceprint feature of the target person and the first voiceprint feature;
    determining whether the second cosine distance value is less than or equal to a preset threshold;
    if so, determining that the second cosine distance value meets the preset condition; otherwise, determining that it does not.
  14. A computer non-volatile readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements a voiceprint verification method, the voiceprint verification method comprising:
    extracting, by a client server, a voice signal of an identity to be verified, and extracting MFCC-type voiceprint features corresponding to each frame of voice data in the voice signal;
    constructing, by the client server, the MFCC-type voiceprint features into voiceprint feature vectors corresponding to each frame of voice data, so as to form a first voiceprint feature;
    receiving, by a voiceprint verification server, the first voiceprint feature sent by the client server;
    determining, by the voiceprint verification server, whether a feature distance value between the voiceprint identification vectors (i-vectors) respectively corresponding to the first voiceprint feature and a pre-stored voiceprint feature meets a preset requirement;
    if so, determining that the first voiceprint feature is the same as the pre-stored voiceprint feature; otherwise, determining that they are different.
  15. [Incorporated by reference (Rule 20.5) 01.02.2019]
    The computer non-volatile readable storage medium according to claim 14, characterized in that the step of the voiceprint verification server determining whether the feature distance value between the i-vectors respectively corresponding to the first voiceprint feature and the pre-stored voiceprint feature meets the preset requirement comprises:
    mapping the voiceprint feature vectors corresponding to each frame of voice data to low-dimensional voiceprint identification vectors (i-vectors);
    calculating, by the cosine distance formula cos(x, y) = (x · y) / (|x| · |y|), the cosine distance value cos(x, y) between the i-vector corresponding to the first voiceprint feature and the i-vector corresponding to the pre-stored voiceprint feature, where x denotes the i-vector corresponding to the pre-stored voiceprint feature and y denotes the i-vector corresponding to the first voiceprint feature;
    determining whether the cosine distance value meets a preset condition;
    if so, determining that the feature distance value between the i-vectors respectively corresponding to the first voiceprint feature and the pre-stored voiceprint feature meets the preset requirement; otherwise, determining that it does not.
  16. [Incorporated by reference (Rule 20.5) 01.02.2019]
    The computer non-volatile readable storage medium according to claim 15, characterized in that the step of mapping the voiceprint feature vectors corresponding to each frame of voice data to the low-dimensional i-vectors comprises:
    inputting the extracted voiceprint feature vectors corresponding to each frame of voice data into a GMM-UBM model, to obtain Gaussian supervectors characterizing the probability distribution of each frame of voice data over the Gaussian components;
    calculating, from each Gaussian supervector by the formula M = μ + Tω, the low-dimensional i-vector corresponding to each frame of voice data, where M is the Gaussian supervector of each frame of voice data, μ is the mean supervector of the GMM-UBM model, ω is the low-dimensional i-vector of each frame of voice data, and T is the transformation matrix that maps to the high-dimensional Gaussian space.
  17. The computer non-volatile readable storage medium according to claim 15, characterized in that the step of determining whether the cosine distance value meets the preset condition comprises:
    obtaining the first cosine distance values between the first voiceprint feature and each of the pre-stored voiceprint features in the pre-stored voiceprint feature data of multiple persons, wherein the voiceprint feature data of the multiple persons includes the pre-stored voiceprint feature of a target person;
    sorting the first cosine distance values in ascending order;
    determining whether the first cosine distance value corresponding to the pre-stored voiceprint feature of the target person is among the first preset number of sorted first cosine distance values;
    if so, determining that the first cosine distance value meets the preset condition; otherwise, determining that it does not.
  18. The computer non-volatile readable storage medium according to claim 15, characterized in that the step of determining whether the cosine distance value meets the preset condition comprises:
    obtaining a second cosine distance value between the pre-stored voiceprint feature of the target person and the first voiceprint feature;
    determining whether the second cosine distance value is less than or equal to a preset threshold;
    if so, determining that the second cosine distance value meets the preset condition; otherwise, determining that it does not.
PCT/CN2018/124402 2018-10-11 2019-02-01 Voiceprint verification method and apparatus, computer device and storage medium WO2020073519A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811184775.3A CN109257362A (en) 2018-10-11 2018-10-11 Method, apparatus, computer equipment and the storage medium of voice print verification
CN201811184775.3 2018-10-11

Publications (1)

Publication Number Publication Date
WO2020073519A1 true WO2020073519A1 (en) 2020-04-16

Family

ID=65046070

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/124402 WO2020073519A1 (en) 2018-10-11 2019-02-01 Voiceprint verification method and apparatus, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN109257362A (en)
WO (1) WO2020073519A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109257362A (en) * 2018-10-11 2019-01-22 平安科技(深圳)有限公司 Method, apparatus, computer equipment and the storage medium of voice print verification
CN112687274A (en) * 2019-10-17 2021-04-20 北京猎户星空科技有限公司 Voice information processing method, device, equipment and medium
CN111477251B (en) * 2020-05-21 2023-09-05 北京百度网讯科技有限公司 Model evaluation method and device and electronic equipment
CN111865926A (en) * 2020-06-24 2020-10-30 深圳壹账通智能科技有限公司 Call channel construction method and device based on double models and computer equipment
CN112509587B (en) * 2021-02-03 2021-04-30 南京大正智能科技有限公司 Method, device and equipment for dynamically matching mobile number and voiceprint and constructing index
CN112992152B (en) * 2021-04-22 2021-09-14 北京远鉴信息技术有限公司 Individual-soldier voiceprint recognition system and method, storage medium and electronic equipment
CN113366567A (en) * 2021-05-08 2021-09-07 腾讯音乐娱乐科技(深圳)有限公司 Voiceprint identification method, singer authentication method, electronic equipment and storage medium
CN114202891A (en) * 2021-12-28 2022-03-18 深圳市锐明技术股份有限公司 Method and device for sending alarm indication

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103955677A (en) * 2014-05-12 2014-07-30 南京大学 Electrocardiogram recognizing method based on privacy protection
CN104680375A (en) * 2015-02-28 2015-06-03 优化科技(苏州)有限公司 Identification verifying system for living human body for electronic payment
CN107993071A (en) * 2017-11-21 2018-05-04 平安科技(深圳)有限公司 Electronic device, auth method and storage medium based on vocal print
US20180158464A1 (en) * 2013-07-17 2018-06-07 Verint Systems Ltd. Blind Diarization of Recorded Calls With Arbitrary Number of Speakers
CN109257362A (en) * 2018-10-11 2019-01-22 平安科技(深圳)有限公司 Method, apparatus, computer equipment and the storage medium of voice print verification

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN203102365U (en) * 2012-12-28 2013-07-31 国民技术股份有限公司 Terminal and authentication apparatus
CN106098068B (en) * 2016-06-12 2019-07-16 腾讯科技(深圳)有限公司 A kind of method for recognizing sound-groove and device
CN107610707B (en) * 2016-12-15 2018-08-31 平安科技(深圳)有限公司 A kind of method for recognizing sound-groove and device
CN107068154A (en) * 2017-03-13 2017-08-18 平安科技(深圳)有限公司 The method and system of authentication based on Application on Voiceprint Recognition
CN106991312B (en) * 2017-04-05 2020-01-10 百融云创科技股份有限公司 Internet anti-fraud authentication method based on voiceprint recognition


Also Published As

Publication number Publication date
CN109257362A (en) 2019-01-22

Similar Documents

Publication Publication Date Title
WO2020073519A1 (en) Voiceprint verification method and apparatus, computer device and storage medium
WO2020177380A1 (en) Voiceprint detection method, apparatus and device based on short text, and storage medium
WO2018166187A1 (en) Server, identity verification method and system, and a computer-readable storage medium
WO2020073518A1 (en) Voiceprint verification method and apparatus, computer device, and storage medium
US10083693B2 (en) Method and system for using conversational biometrics and speaker identification/verification to filter voice streams
WO2019100606A1 (en) Electronic device, voiceprint-based identity verification method and system, and storage medium
CN108053318B (en) Method and device for identifying abnormal transactions
US9099085B2 (en) Voice authentication systems and methods
US20180047397A1 (en) Voice print identification portal
KR20180034507A (en) METHOD, APPARATUS AND SYSTEM FOR BUILDING USER GLONASS MODEL
WO2020224114A1 (en) Residual delay network-based speaker confirmation method and apparatus, device and medium
KR20190022432A (en) ELECTRONIC DEVICE, IDENTIFICATION METHOD, SYSTEM, AND COMPUTER READABLE STORAGE MEDIUM
CN109346086A (en) Method for recognizing sound-groove, device, computer equipment and computer readable storage medium
CN112562691A (en) Voiceprint recognition method and device, computer equipment and storage medium
CN109256138A (en) Auth method, terminal device and computer readable storage medium
WO2022126964A1 (en) Service data verification method and apparatus, device and storage medium
CN113886792A (en) Application method and system of print control instrument combining voiceprint recognition and face recognition
US11841932B2 (en) System and method for updating biometric evaluation systems
Poh et al. A biometric menagerie index for characterising template/model-specific variation
CN112201254A (en) Non-sensitive voice authentication method, device, equipment and storage medium
WO2021217979A1 (en) Voiceprint recognition method and apparatus, and device and storage medium
Zhang et al. Speech Perceptual Hashing Authentication Algorithm Based on Spectral Subtraction and Energy to Entropy Ratio.
WO2023078115A1 (en) Information verification method, and server and storage medium
EP4184355A1 (en) Methods and systems for training a machine learning model and authenticating a user with the model
TW202032536A (en) Speaker verification system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18936666

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18936666

Country of ref document: EP

Kind code of ref document: A1